Similar Articles
10 similar articles found.
1.
Cultural modeling aims to develop behavioral models of groups and to analyze the impact of cultural factors on group behavior using computational methods. Machine learning methods, and classification in particular, play a central role in such applications. When modeling cultural data, standard classifiers can be expected to perform well under the assumption that different classification errors have uniform costs. In practice, however, this assumption is often violated, which severely hinders the performance of standard classifiers. To address this problem, this paper empirically studies cost-sensitive learning in cultural modeling. We take misclassification costs into account when building the classifiers, with the aim of minimizing the total misclassification cost. We conduct experiments on four typical cost-sensitive learning methods, combine them with six standard classifiers, and evaluate their performance under various conditions. Our empirical study verifies the effectiveness of cost-sensitive learning in cultural modeling. Based on the experimental results, we gain thorough insight into the problem of non-uniform misclassification costs, as well as into the selection of cost-sensitive methods, base classifiers, and method-classifier pairs for this domain. Furthermore, we propose an improved algorithm that outperforms the best method-classifier pair on the benchmark cultural datasets.
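
As a rough illustration of what minimizing total misclassification cost means at prediction time, the sketch below applies the minimum-expected-cost decision rule to class-probability estimates. This is one common cost-sensitive technique. It is not necessarily one of the four methods evaluated in the paper. The cost-matrix convention (`cost[i][j]` is the cost of predicting class j when the true class is i) and the 5:1 cost ratio are assumptions chosen for the example.

```python
# Minimal sketch of a minimum-expected-cost decision rule (assumed technique,
# not necessarily one of the paper's four methods).
# Convention (assumption): cost[i][j] = cost of predicting class j when the true class is i.


def min_expected_cost_class(probs, cost):
    """Return the class j minimizing sum_i P(i | x) * cost[i][j]."""
    n = len(probs)
    expected = [sum(probs[i] * cost[i][j] for i in range(n)) for j in range(n)]
    return min(range(n), key=lambda j: expected[j])


def total_cost(y_true, y_pred, cost):
    """Total misclassification cost of a set of predictions."""
    return sum(cost[t][p] for t, p in zip(y_true, y_pred))


# Hypothetical 2-class cost matrix: missing class 1 costs five times a false alarm.
COST = [[0, 1],
        [5, 0]]

if __name__ == "__main__":
    print(min_expected_cost_class([0.7, 0.3], COST))
```

With this hypothetical matrix, a case with P(class 1) = 0.3 is labeled class 1. The expected cost of predicting class 0 (0.3 × 5 = 1.5) exceeds that of predicting class 1 (0.7 × 1 = 0.7). With a uniform 0/1 cost matrix the rule reduces to picking the most probable class.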

2.
We discuss how a large class of regularization methods, collectively known as spectral regularization and originally designed for solving ill-posed inverse problems, gives rise to regularized learning algorithms. All of these algorithms are consistent kernel methods that can be easily implemented. The intuition behind their derivation is that the same principle that allows the numerical stabilization of a matrix inversion problem is also crucial for avoiding overfitting. The various methods share a common derivation but have different computational and theoretical properties. We describe examples of such algorithms, analyze their classification performance on several data sets and discuss their applicability to real-world problems.
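
One member of the spectral-regularization family is Landweber iteration. It is plain gradient descent on the kernel least-squares objective, and the number of iterations plays the role of the regularization parameter. The sketch below is a minimal pure-Python version under assumed choices: an RBF kernel with `gamma=1.0`, step size `eta=1.0`, and labels in {-1, +1}. It illustrates the idea and does not reproduce the paper's implementation.

```python
import math


def rbf(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors (assumed kernel choice)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def landweber_fit(X, y, n_iter=200, eta=1.0, gamma=1.0):
    """Landweber iteration: c <- c + (eta / n) * (y - K c).

    Fewer iterations mean stronger regularization (early stopping).
    """
    n = len(X)
    K = [[rbf(X[i], X[j], gamma) for j in range(n)] for i in range(n)]
    c = [0.0] * n
    for _ in range(n_iter):
        Kc = [sum(K[i][j] * c[j] for j in range(n)) for i in range(n)]
        c = [c[i] + (eta / n) * (y[i] - Kc[i]) for i in range(n)]
    return c


def decision(X_train, c, x, gamma=1.0):
    """Kernel expansion f(x) = sum_i c_i k(x_i, x); its sign gives the class."""
    return sum(ci * rbf(xi, x, gamma) for ci, xi in zip(c, X_train))


if __name__ == "__main__":
    X = [[-2.0], [-1.0], [1.0], [2.0]]
    y = [-1.0, -1.0, 1.0, 1.0]
    c = landweber_fit(X, y)
    print(decision(X, c, [-1.5]) < 0, decision(X, c, [1.5]) > 0)
```

The entries of an RBF kernel matrix lie in (0, 1], so the eigenvalues of K/n are at most 1, and `eta=1.0` keeps the iteration stable. Stopping early leaves the components tied to small eigenvalues of K only partly fitted. That filtering effect is what curbs overfitting.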

3.
Novelty detection, also referred to as one-class classification, is the process of detecting ‘abnormal’ behavior in a system by learning the ‘normal’ behavior. Novelty detection has been of particular interest to researchers in domains where it is difficult or expensive to find examples of abnormal behavior (such as in medical/equipment diagnosis and IT network surveillance). Effective representation of normal data is of primary interest in pursuing one-class classification. While the literature offers several methods for one-class classification, very few methods can support representation of non-stationary classes without making stringent assumptions about the class distribution. This paper proposes a one-class classification method for non-stationary classes using a modified support vector machine and an efficient online version for reducing computational time. The presented method is applied to several simulated datasets and actual data from a drilling machine. In addition, we present comparison results with other methods that demonstrate its superior performance.
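
The paper's method is a modified support vector machine. The sketch below deliberately swaps in something much simpler, an online hypersphere detector, to illustrate two ideas the abstract stresses. First, the model is learned from normal data alone. Second, it keeps adapting, so slow drift in a non-stationary normal class is not flagged as novel. The forgetting rate `alpha`, band width `k` and `warmup` length are hypothetical defaults.

```python
import math


class OnlineSphereDetector:
    """Hypersphere novelty detector whose center and radius track slow drift.

    A simplified stand-in, NOT the paper's modified SVM: the center is an
    exponentially weighted mean of accepted points, and the radius is
    mean + k * std of the exponentially weighted distances to that center.
    """

    def __init__(self, alpha=0.1, k=3.0, warmup=5):
        self.alpha = alpha    # forgetting rate (hypothetical default)
        self.k = k            # width of the acceptance band, in std units
        self.warmup = warmup  # points accepted unconditionally at the start
        self.center = None
        self.d_mean = 0.0
        self.d_var = 0.0
        self.seen = 0

    def _dist(self, x):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, self.center)))

    def radius(self):
        return self.d_mean + self.k * math.sqrt(self.d_var)

    def update(self, x):
        """Return True if x is flagged as novel; only normal points update the model."""
        if self.center is None:
            self.center = list(x)
            self.seen = 1
            return False
        d = self._dist(x)
        novel = self.seen >= self.warmup and d > self.radius()
        if not novel:
            a = self.alpha
            self.center = [(1 - a) * c + a * xi for c, xi in zip(self.center, x)]
            diff = d - self.d_mean
            self.d_mean += a * diff
            self.d_var = (1 - a) * (self.d_var + a * diff * diff)  # EW variance update
            self.seen += 1
        return novel


if __name__ == "__main__":
    det = OnlineSphereDetector()
    stream = [[0.01 * t + 0.1 * (-1) ** t] for t in range(50)]  # slowly drifting "normal" data
    print(sum(det.update(x) for x in stream), det.update([10.0]))
```

Only points judged normal update the center and radius, so an isolated outlier does not drag the model toward itself. A sustained abrupt shift, by contrast, would keep being flagged, which is a limitation of this simple scheme.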

4.
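Malware family classification is an important topic in cybersecurity research, and the dynamic execution features of malware accurately reflect its functionality and family attributes. By studying how malware calls the Windows API, this paper finds that malicious behavior depends on both the preceding and the following API calls in a sequence. This dependency matches the way a bidirectional LSTM model computes its features. We design a deep learning model based on a bidirectional LSTM that models the probabilistic relationships between preceding and following API calls. In experiments it reaches a test accuracy of 99.28%, showing that the proposed model combination captures well how malware calls system APIs. To test the classification performance of the deep learning approach more thoroughly, the experiments also include an adversarial-example study. Simulated adversarial examples are constructed by randomly inserting API sequences and used to test the classification performance of the model with its original parameters. These experiments show that the deep learning approach is more stable than some shallow machine learning methods. The experiments offer a useful reference for the adoption of deep learning techniques in industry.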
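
The abstract does not specify how many API calls are inserted, where they go, or which vocabulary they come from. The sketch below makes three assumptions: single calls are drawn uniformly from a hypothetical vocabulary of benign-looking Windows API names, they are inserted at uniformly random positions, and the trace itself is made up. Perturbed sequences like these would then be fed to the already-trained model, with its original parameters, to measure how much accuracy drops.

```python
import random


def insert_random_apis(seq, api_vocab, n_insert, seed=None):
    """Return a copy of `seq` with `n_insert` API names inserted at random positions.

    The original calls keep their relative order; only extra calls are added,
    mimicking (under the assumptions above) the random-insertion perturbation.
    """
    rng = random.Random(seed)
    out = list(seq)
    for _ in range(n_insert):
        pos = rng.randint(0, len(out))  # randint is inclusive, so appending is possible
        out.insert(pos, rng.choice(api_vocab))
    return out


if __name__ == "__main__":
    trace = ["CreateFileW", "WriteFile", "RegSetValueExW", "CreateProcessW"]  # hypothetical trace
    noise = ["GetTickCount", "Sleep", "GetSystemTime"]  # hypothetical insertion vocabulary
    print(insert_random_apis(trace, noise, n_insert=3, seed=42))
```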

5.
Multi-dimensional classification (MDC) is the supervised learning problem where an instance is associated with multiple classes, rather than with a single class, as in traditional classification problems. Since these classes are often strongly correlated, modeling the dependencies between them allows MDC methods to improve their performance – at the expense of an increased computational cost. In this paper we focus on the classifier chains (CC) approach for modeling dependencies, one of the most popular and highest-performing methods for multi-label classification (MLC), a particular case of MDC which involves only binary classes (i.e., labels). The original CC algorithm makes a greedy approximation, and is fast but tends to propagate errors along the chain. Here we present novel Monte Carlo schemes, both for finding a good chain sequence and performing efficient inference. Our algorithms remain tractable for high-dimensional data sets and obtain the best predictive performance across several real data sets.
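
The sketch below makes the greedy-versus-Monte-Carlo contrast concrete. Each link of a binary classifier chain is a function returning P(y_j = 1 | x, y_1..y_{j-1}). Greedy inference thresholds each link at 0.5 in turn. The Monte Carlo variant samples complete label vectors from the chain and keeps the one with the highest joint probability. This simplified illustration covers Monte Carlo inference only (the chain order is fixed), and the two toy link models are hypothetical.

```python
import random


def chain_joint_prob(models, x, labels):
    """Joint probability of a label vector under the chain's conditionals."""
    p = 1.0
    for j, model in enumerate(models):
        p1 = model(x, labels[:j])
        p *= p1 if labels[j] == 1 else 1.0 - p1
    return p


def greedy_chain(models, x):
    """Original CC inference: commit to each label in order (threshold 0.5)."""
    labels = []
    for model in models:
        labels.append(1 if model(x, tuple(labels)) >= 0.5 else 0)
    return tuple(labels)


def monte_carlo_chain(models, x, n_samples=100, seed=0):
    """Sample label vectors from the chain; return the most probable one seen."""
    rng = random.Random(seed)
    best, best_p = None, -1.0
    for _ in range(n_samples):
        labels = []
        for model in models:
            labels.append(1 if rng.random() < model(x, tuple(labels)) else 0)
        labels = tuple(labels)
        p = chain_joint_prob(models, x, labels)
        if p > best_p:
            best, best_p = labels, p
    return best, best_p


# Hypothetical toy links: P(y1 = 1) = 0.6; P(y2 = 1 | y1) = 0.5 if y1 == 1 else 0.1.
def toy_link_1(x, prev):
    return 0.6


def toy_link_2(x, prev):
    return 0.5 if prev[0] == 1 else 0.1


if __name__ == "__main__":
    models = [toy_link_1, toy_link_2]
    print(greedy_chain(models, None), monte_carlo_chain(models, None, n_samples=200))
```

With these toy links, greedy inference commits to y_1 = 1 (probability 0.6) and returns (1, 1) with joint probability 0.3. The most probable vector is actually (0, 0), with 0.4 × 0.9 = 0.36, and sampling recovers it. This is the kind of error propagation the abstract attributes to the greedy approximation.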

6.
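Skeleton-based human action recognition aims to correctly identify the action categories in an input skeleton sequence containing one or more actions, and it is a research hotspot in computer vision. Compared with image-based human action recognition, skeleton-based methods are not affected by interfering factors such as background and human appearance, and they offer higher accuracy, robustness, and computational efficiency. Given the importance and cutting-edge nature of skeleton-based action recognition, a comprehensive and systematic review of these methods is highly valuable. This paper first reviews nine widely used skeleton-based action recognition datasets and divides them into single-view and multi-view datasets according to the viewpoints used for data collection, with particular attention to the characteristics and usage of each dataset. Next, according to the underlying network, skeleton-based action recognition methods are divided into methods based on hand-crafted features, recurrent neural networks, convolutional neural networks, graph convolutional networks, and Transformers, and the principles, strengths, and weaknesses of each are analyzed in detail. Among these, graph convolutional methods have become the most widely used because of their strong ability to capture spatial relationships. Using a new categorization scheme, the paper gives a comprehensive review of graph convolutional methods, aiming to provide researchers with more ideas and approaches. Finally, it summarizes the problems of existing methods from eight perspectives and proposes corresponding directions for future work.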
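
Graph convolutional methods treat joints as graph nodes and bones as edges. The code below is a minimal sketch of one spatial graph-convolution step for a single frame. It uses the common symmetric normalization Â = D^{-1/2}(A + I)D^{-1/2} and computes ReLU(Â X W). The three-joint chain (e.g. shoulder, elbow, wrist), the features and the weights are hypothetical. Practical skeleton GCNs add temporal convolutions across frames and more elaborate adjacency partitions.

```python
import math


def normalized_adjacency(n, edges):
    """D^{-1/2} (A + I) D^{-1/2} for an undirected skeleton graph with n joints."""
    A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops
    for i, j in edges:
        A[i][j] = A[j][i] = 1.0
    deg = [sum(row) for row in A]
    return [[A[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(P, Q):
    """Plain list-of-lists matrix product."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def gcn_layer(A_hat, X, W):
    """One graph-convolution layer: ReLU(A_hat @ X @ W)."""
    H = matmul(matmul(A_hat, X), W)
    return [[max(0.0, v) for v in row] for row in H]


if __name__ == "__main__":
    # Hypothetical 3-joint chain: 0 = shoulder, 1 = elbow, 2 = wrist.
    A_hat = normalized_adjacency(3, [(0, 1), (1, 2)])
    X = [[1.0], [0.0], [0.0]]  # one scalar feature per joint
    W = [[1.0]]                # identity weight, to expose the propagation
    print(gcn_layer(A_hat, X, W))
```

After one layer the wrist (joint 2) still has zero activation because it is two hops from the shoulder. Stacking layers widens each joint's receptive field along the skeleton.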

7.
This paper presents a data mining algorithm based on supervised clustering that learns data patterns and uses these patterns for data classification. The algorithm enables scalable, incremental learning of patterns from data with both numeric and nominal variables. Two methods of combining numeric and nominal variables when calculating the distance between clusters are investigated. In the first, separate distance measures are calculated for the numeric and nominal variables and then combined into an overall distance measure. In the second, nominal variables are converted into numeric variables, and a single distance measure is then calculated over all variables. We analyze the computational complexity, and thus the scalability, of the algorithm, and test its performance on a number of data sets from various application domains. The prediction accuracy and reliability of the algorithm are analyzed, tested, and compared with those of several other data mining algorithms.
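
The abstract does not give the exact distance formulas. The sketch below therefore assumes common choices to contrast the two strategies for a pair of records (or cluster centroids). Method 1 combines a Euclidean distance over the numeric variables with a mismatch count over the nominal ones. Method 2 one-hot encodes the nominal variables and takes a single Euclidean distance. The weights, the encoding and the example records are hypothetical.

```python
import math


def combined_distance(a, b, num_idx, nom_idx, w_num=1.0, w_nom=1.0):
    """Method 1 (assumed form): separate numeric and nominal distances, then a weighted sum."""
    d_num = math.sqrt(sum((a[i] - b[i]) ** 2 for i in num_idx))
    d_nom = sum(1 for i in nom_idx if a[i] != b[i])  # simple mismatch count
    return w_num * d_num + w_nom * d_nom


def one_hot_distance(a, b, num_idx, nom_idx, categories):
    """Method 2 (assumed form): nominal values become 0/1 indicators, then one Euclidean distance."""
    def encode(record):
        vec = [float(record[i]) for i in num_idx]
        for i in nom_idx:
            vec.extend(1.0 if record[i] == c else 0.0 for c in categories[i])
        return vec

    ea, eb = encode(a), encode(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ea, eb)))


if __name__ == "__main__":
    # Hypothetical records: two numeric fields and one nominal field (protocol).
    a = (1.0, 2.0, "tcp")
    b = (4.0, 6.0, "udp")
    cats = {2: ["tcp", "udp", "icmp"]}
    print(combined_distance(a, b, [0, 1], [2]))      # 5.0 + 1 = 6.0
    print(one_hot_distance(a, b, [0, 1], [2], cats))  # sqrt(25 + 2), about 5.196
```

Under one-hot encoding a nominal mismatch adds 2 to the squared distance. A cluster's centroid also becomes a vector of category frequencies, which can make this conversion convenient when distances are computed between clusters.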

8.
Bank failures threaten the economic system as a whole. Predicting bank financial failures is therefore crucial to prevent or lessen their negative effects on the economic system. This is essentially a classification problem: banks are categorized as healthy or non-healthy. This study applies various neural network techniques, support vector machines and multivariate statistical methods to the bank failure prediction problem in a Turkish case study, and presents a comprehensive computational comparison of the classification performance of the techniques tested. Twenty financial ratios, grouped into six feature groups (capital adequacy, asset quality, management quality, earnings, liquidity and sensitivity to market risk, i.e. CAMELS), are selected as predictor variables. Four data sets with different characteristics are developed from official financial data to improve prediction performance, and each data set is divided into training and validation sets. Four neural network architectures are employed: multi-layer perceptron, competitive learning, self-organizing map and learning vector quantization. The multivariate statistical methods tested are multivariate discriminant analysis, k-means cluster analysis and logistic regression analysis. The experimental results are evaluated in terms of the classification accuracy of each technique. The results show that the multi-layer perceptron and learning vector quantization can be considered the most successful models for predicting the financial failure of banks.
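
Learning vector quantization is one of the two best-performing models reported. The sketch below implements the textbook LVQ1 rule (Kohonen): find the prototype nearest to a training sample, then pull it toward the sample if their labels agree and push it away otherwise. The two-feature data set, prototype initialization, learning rate and epoch count are all hypothetical. The study used twenty CAMELS ratios, and the abstract does not give its exact LVQ variant or settings.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(prototypes, x):
    return min(range(len(prototypes)), key=lambda i: sq_dist(prototypes[i], x))


def lvq1_train(X, y, prototypes, proto_labels, lr=0.1, epochs=20):
    """LVQ1: move the winning prototype toward x if labels match, away otherwise."""
    P = [list(p) for p in prototypes]
    for epoch in range(epochs):
        rate = lr * (1.0 - epoch / epochs)  # linearly decaying learning rate
        for x, label in zip(X, y):
            k = nearest(P, x)
            sign = 1.0 if proto_labels[k] == label else -1.0
            P[k] = [p + sign * rate * (xi - p) for p, xi in zip(P[k], x)]
    return P


def lvq_predict(P, proto_labels, x):
    return proto_labels[nearest(P, x)]


if __name__ == "__main__":
    # Hypothetical normalized ratios; label 0 = healthy, 1 = failed.
    X = [[0.9, 0.8], [0.8, 0.9], [0.85, 0.75], [0.1, 0.2], [0.2, 0.1], [0.15, 0.25]]
    y = [0, 0, 0, 1, 1, 1]
    P = lvq1_train(X, y, [[0.6, 0.6], [0.4, 0.4]], [0, 1])
    print(P, lvq_predict(P, [0, 1], [0.95, 0.9]))
```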

9.
Langley, Pat. Machine Learning, 1986, 1(3): 243-248
Summary: Although science can be characterized in terms of search, some search methods let one explore multiple paths in parallel. We have argued that more machine learning researchers should focus their efforts on modeling human behavior, but we have not argued that the field should limit itself to this approach. For those interested in general principles, the study of nonhuman learning methods is also necessary for useful results. In terms of applications, some of machine learning's greatest achievements have involved nonincremental methods that are clearly poor models of human learning. Planes are terrible imitations of birds (and fly less efficiently), but there are still excellent reasons for using aircraft. However, we do believe that too little research has drawn on results from the literature on human learning, and that greater attention in this direction would benefit the field as a whole. Science is a complex and bewildering process, and the scientist should employ all available knowledge to direct his steps in useful directions. This strategy seems especially important in young fields like machine learning, in which conflicting views and methods abound. We encourage the reader to join us in applying machine learning techniques to explain the mysteries of human behavior, and in using knowledge of human behavior to constrain our computational theories of learning.

10.
Gaussian processes are powerful modeling tools in machine learning that are widely applicable to regression and classification tasks owing to their non-parametric, non-linear behavior. However, one of their main drawbacks is the training time complexity, which scales cubically with the number of examples. Our work addresses this issue by combining Gaussian processes with random decision forests to enable fast learning. Important advantages of our method are its simplicity and the ability to directly control the trade-off between classification performance and computational speed. Experiments on an indoor place recognition task and on standard machine learning benchmarks show that our method can handle large training sets of up to three million examples in reasonable time while retaining good classification accuracy.
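
Restricting GPs to the leaves of decision trees speeds up training for a simple reason. Exact GP training solves an n × n linear system at O(n³) cost, whereas splitting the n points evenly over L leaves costs about L · (n/L)³ = n³/L² in total. The sketch below shows this with a single threshold split, a hypothetical depth-1 stand-in for one tree of a random forest, and GP regression on ±1 labels in each leaf. The kernel, the length scale `ls=0.3`, `noise=0.1`, and the use of label regression instead of a full GP classifier are assumptions, not details taken from the paper.

```python
import math


def rbf(a, b, ls=0.3):
    """Squared-exponential kernel on scalar inputs (assumed kernel and length scale)."""
    return math.exp(-0.5 * ((a - b) / ls) ** 2)


def cholesky(A):
    """Lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def cho_solve(L, b):
    """Solve (L L^T) x = b by forward, then back, substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


class GPLeaf:
    """GP regression on +/-1 labels; fitting costs O(m^3) for m leaf points."""

    def __init__(self, xs, ys, noise=0.1, ls=0.3):
        self.xs, self.ls = xs, ls
        K = [[rbf(a, b, ls) + (noise ** 2 if i == j else 0.0)
              for j, b in enumerate(xs)] for i, a in enumerate(xs)]
        self.alpha = cho_solve(cholesky(K), ys)  # the cubic-cost step

    def mean(self, x):
        return sum(a * rbf(x, xi, self.ls) for a, xi in zip(self.alpha, self.xs))


class StumpGP:
    """Hypothetical depth-1 'tree': one threshold split, one GP per leaf."""

    def __init__(self, xs, ys, threshold, **gp_kwargs):
        self.threshold = threshold
        left = [(x, y) for x, y in zip(xs, ys) if x < threshold]
        right = [(x, y) for x, y in zip(xs, ys) if x >= threshold]
        self.left = GPLeaf([p[0] for p in left], [p[1] for p in left], **gp_kwargs)
        self.right = GPLeaf([p[0] for p in right], [p[1] for p in right], **gp_kwargs)

    def predict(self, x):
        leaf = self.left if x < self.threshold else self.right
        return 1 if leaf.mean(x) >= 0 else -1


if __name__ == "__main__":
    xs = [i / 10 for i in range(-20, 21)]
    ys = [1.0 if x >= 0.5 else -1.0 for x in xs]
    model = StumpGP(xs, ys, threshold=0.0)
    print(model.predict(-1.5), model.predict(1.5))
```

A forest would grow many randomized, deeper trees and combine their leaf predictions. In this sketch, leaf size is what trades accuracy for speed.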

