Similar articles
 20 similar articles found (search time: 0 ms)
1.
Saul C.  Raul Fonseca   《Neurocomputing》2008,71(7-9):1550-1560
In this contribution, we introduce a new on-line approximate maximal margin learning algorithm based on an extension of the perceptron algorithm. This extension, which we call fixed margin perceptron (FMP), finds the solution of a linearly separable learning problem given a fixed margin. It is shown that this algorithm converges in a finite number of updates, bounded in terms of γf, γ* and R, where γf<γ* is the fixed margin, γ* is the optimum margin and R is the radius of the ball that circumscribes the data. The incremental margin algorithm (IMA) approximates the large margin solution by successively using FMP with increasing margin values. This incremental approach always guarantees a good solution at hand. Also, it is easy to implement and avoids quadratic programming methods. IMA was tested using several different data sets and it yields results similar to those found by an SVM.
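
The fixed-margin idea in the abstract above can be illustrated with a minimal sketch. This is a simplified margin perceptron, not the authors' exact FMP update rule: it assumes a hyperplane through the origin (no bias term), a plain perceptron update, and the hypothetical names `fixed_margin_perceptron`, `geometric_margins`, `eta` and `max_epochs`. A sample triggers an update whenever its geometric margin does not exceed the fixed margin `gamma_f`.

```python
import math


def fixed_margin_perceptron(X, y, gamma_f, eta=1.0, max_epochs=1000):
    """Find w with y_i * (w . x_i) / ||w|| > gamma_f for every sample.

    Simplified illustration only: no bias term, plain perceptron update.
    """
    w = [0.0] * len(X[0])
    for _ in range(max_epochs):
        updated = False
        for x_i, y_i in zip(X, y):
            norm = math.sqrt(sum(v * v for v in w))
            score = y_i * sum(wj * xj for wj, xj in zip(w, x_i))
            # Update when the sample is not beyond the fixed margin.
            if score <= gamma_f * norm:
                w = [wj + eta * y_i * xj for wj, xj in zip(w, x_i)]
                updated = True
        if not updated:
            return w  # every sample clears the fixed margin
    raise RuntimeError("no solution found within max_epochs")


def geometric_margins(w, X, y):
    """Signed distance of each sample from the hyperplane w . x = 0."""
    norm = math.sqrt(sum(v * v for v in w))
    return [y_i * sum(wj * xj for wj, xj in zip(w, x_i)) / norm
            for x_i, y_i in zip(X, y)]
```

An incremental scheme in the spirit of IMA would call such a routine repeatedly with increasing values of `gamma_f`, keeping the last weight vector that was found.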

2.

We consider the problem of cost sensitive multiclass classification, where we would like to increase the sensitivity of an important class at the expense of a less important one. We adopt an apportioned margin framework to address this problem, which enables an efficient margin shift between classes that share the same boundary. The decision boundary between all pairs of classes divides the margin between them in accordance with a given prioritization vector, which yields a tighter error bound for the important classes while also reducing the overall out-of-sample error. In addition to demonstrating an efficient implementation of our framework, we derive generalization bounds, demonstrate Fisher consistency, adapt the framework to Mercer’s kernel and to neural networks, and report promising empirical results on all accounts.


3.
Due to being fast, easy to implement and relatively effective, some state-of-the-art naive Bayes text classifiers with the strong assumption of conditional independence among attributes, such as multinomial naive Bayes, complement naive Bayes and the one-versus-all-but-one model, have received a great deal of attention from researchers in the domain of text classification. In this article, we revisit these naive Bayes text classifiers and empirically compare their classification performance on a large number of widely used text classification benchmark datasets. Then, we propose a locally weighted learning approach to these naive Bayes text classifiers. We call our new approach locally weighted naive Bayes text classifiers (LWNBTC). LWNBTC weakens the attribute conditional independence assumption made by these naive Bayes text classifiers by applying the locally weighted learning approach. The experimental results show that our locally weighted versions significantly outperform these state-of-the-art naive Bayes text classifiers in terms of classification accuracy.

4.
5.
6.
Species’ potential distribution modelling consists of building a representation of the fundamental ecological requirements of a species from biotic and abiotic conditions where the species is known to occur. Such models can be valuable tools to understand the biogeography of species and to support the prediction of its presence/absence considering a particular environment scenario. This paper investigates the use of different supervised machine learning techniques to model the potential distribution of 35 plant species from Latin America. Each technique was able to extract a different representation of the relations between the environmental conditions and the distribution profile of the species. The experimental results highlight the good performance of random trees classifiers, indicating this particular technique as a promising candidate for modelling species’ potential distribution.

7.
8.
Large margin vs. large volume in transductive learning   Cited by: 2 (self-citations: 0, citations by others: 2)
We consider a large volume principle for transductive learning that prioritizes the transductive equivalence classes according to the volume they occupy in hypothesis space. We approximate volume maximization using a geometric interpretation of the hypothesis space. The resulting algorithm is defined via a non-convex optimization problem that can still be solved exactly and efficiently. We provide a bound on the test error of the algorithm and compare it to transductive SVM (TSVM) using 31 datasets.

9.
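程昊翔  王坚 《控制与决策》 (Control and Decision) 2016,31(5):949-952
To improve the generalization ability of the twin support vector machine, a new twin large margin distribution machine algorithm is proposed that increases the influence of the margin distribution on the trained model. Theoretical studies show that the margin distribution has a very important effect on a model's generalization performance. The algorithm adds the influence of the margin distribution to the optimization objective of the standard twin support vector machine, with the margin distribution expressed through first-order and second-order statistics of the data. Experimental results on benchmark datasets show that the proposed algorithm achieves higher classification accuracy than the SVM, TWSVM, and TBSVM algorithms.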

10.
We propose a novel discriminative learning approach for Bayesian pattern classification, called ‘constrained maximum margin (CMM)’. We define the margin between two classes as the difference between the minimum decision value for positive samples and the maximum decision value for negative samples. The learning problem is to maximize the margin under the constraint that each training pattern is classified correctly. This nonlinear programming problem is solved using the sequential unconstrained minimization technique. We applied the proposed CMM approach to learn Bayesian classifiers based on Gaussian mixture models, and conducted the experiments on 10 UCI datasets. The performance of our approach was compared with those of the expectation-maximization algorithm, the support vector machine, and other state-of-the-art approaches. The experimental results demonstrated the effectiveness of our approach.
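
The margin definition in the abstract above can be made concrete with a small sketch. It only evaluates the margin for decision values that are already computed; the function name `class_margin` and the sign convention (positive decision values mean the positive class) are assumptions for illustration, and the optimization itself is not shown.

```python
def class_margin(positive_scores, negative_scores):
    """Margin between two classes: the smallest decision value among
    positive samples minus the largest decision value among negatives."""
    return min(positive_scores) - max(negative_scores)


def all_correct(positive_scores, negative_scores):
    """Constraint from the abstract, under the assumed sign convention:
    every positive sample scores above zero, every negative below."""
    return min(positive_scores) > 0 and max(negative_scores) < 0
```

Under this reading, the learner would look for classifier parameters that maximize `class_margin` while `all_correct` holds on the training set.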

11.
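刘忠宝  王士同 《控制与决策》 (Control and Decision) 2012,27(12):1870-1875
Inspired by spatial geometry and by the beam angle concept from optics, a maximum margin learning machine based on the beam angle idea (BAMLM) is proposed. The method tries to find a "light source" in the pattern space that illuminates the two classes of samples separately, and then determines the class of each sample according to the region that is illuminated. Analysis shows that the kernelized form of BAMLM is equivalent to the kernelized center-constrained minimum enclosing ball (CCMEB). By introducing the core vector machine, BAMLM is extended to a core vector machine based BAMLM (BACVM), which effectively solves the classification problem for large-scale samples. Experiments on benchmark and synthetic datasets demonstrate the effectiveness of BAMLM and BACVM.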

12.
Gupta  Umesh  Gupta  Deepak 《Applied Intelligence》2021,51(10):7058-7093

Better prediction ability is the main objective of any regression-based model. Large margin Distribution Machine for Regression (LDMR) is an efficient approach that tries to reduce two loss functions, i.e. the ε-insensitive loss and the quadratic loss, to diminish the effects of outliers. However, it still has a significant drawback, i.e. high computational complexity. To achieve improved generalization of the regression model with less computational cost, we propose an enhanced form of LDMR named Least Squares Large margin Distribution Machine-based Regression (LS-LDMR) by transforming the inequality constraints into equality constraints. The solution is obtained by solving a system of linear equations, which requires only the inverse of a matrix. Hence, there is no need to solve a large quadratic programming problem, unlike in other regression-based algorithms such as SVR, Twin SVR, and LDMR. Numerical experiments have been performed on benchmark real-life datasets along with synthetically generated datasets using the linear and Gaussian kernels. All the experiments with the presented LS-LDMR are compared against standard SVR, Twin SVR, primal least squares Twin SVR (PLSTSVR), ε-Huber SVR (ε-HSVR), ε-support vector quantile regression (ε-SVQR), minimum deviation regression (MDR), and LDMR, which shows the effectiveness and usability of LS-LDMR. This approach is also statistically validated and verified in terms of various metrics.


13.
Multivariate classification models play an increasingly important role in human factors research. In the past, these models have been based primarily on discriminant analysis and logistic regression. Models developed from machine learning research offer the human factors professional a viable alternative to these traditional statistical classification methods. To illustrate this point, two machine learning approaches--genetic programming and decision tree induction--were used to construct classification models designed to predict whether or not a student truck driver would pass his or her commercial driver license (CDL) examination. The models were developed and validated using the curriculum scores and CDL exam performances of 37 student truck drivers who had completed a 320-hr driver training course. Results indicated that the machine learning classification models were superior to discriminant analysis and logistic regression in terms of predictive accuracy. Actual or potential applications of this research include the creation of models that more accurately predict human performance outcomes.

14.
The analysis of travel mode choice is an important task in transportation planning and policy making in order to understand and predict travel demands. While advances in machine learning have led to numerous powerful classifiers, their usefulness for modeling travel mode choice remains largely unexplored. Using extensive Dutch travel diary data from the years 2010 to 2012, enriched with variables on the built and natural environment as well as on weather conditions, this study compares the predictive performance of seven selected machine learning classifiers for travel mode choice analysis and makes recommendations for model selection. In addition, it addresses the importance of different variables and how they relate to different travel modes. The results show that random forest performs significantly better than any other of the investigated classifiers, including the commonly used multinomial logit model. While trip distance is found to be the most important variable, the importance of the other variables varies with classifiers and travel modes. The importance of the meteorological variables is highest for support vector machine, while temperature is particularly important for predicting bicycle and public transport trips. The results suggest that the analysis of variable importance with respect to the different classifiers and travel modes is essential for a better understanding and effective modeling of people’s travel behavior.

15.

Fraudulent online sellers often collude with reviewers to garner fake reviews for their products. This act undermines the trust of buyers in product reviews, and potentially reduces the effectiveness of online markets. Being able to accurately detect fake reviews is, therefore, critical. In this study, we investigate several preprocessing and textual-based featuring methods along with machine learning classifiers, including single and ensemble models, to build a fake review detection system. Given the nature of product review data, where the number of fake reviews is far less than that of genuine reviews, we look into the results of each class in detail in addition to the overall results. We recognise from our preliminary analysis that, owing to imbalanced data, there is a high imbalance between the accuracies for different classes (e.g., 1.3% for the fake review class and 99.7% for the genuine review class), despite the overall accuracy looking promising (around 89.7%). We propose two dynamic random sampling techniques that are possible for textual-based featuring methods to solve this class imbalance problem. Our results indicate that both sampling techniques can improve the accuracy of the fake review class—for balanced datasets, the accuracies can be improved to a maximum of 84.5% and 75.6% for random under and over-sampling, respectively. However, the accuracies for genuine reviews decrease to 75% and 58.8% for random under and over-sampling, respectively. We also discover that, for smaller datasets, the Adaptive Boosting ensemble model outperforms other single classifiers; whereas, for larger datasets, the performance improvement from ensemble models is insignificant compared to the best results obtained by single classifiers.
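
Plain random under-sampling and over-sampling, the baseline ideas behind the class-balancing step described above, can be sketched as follows. This is a generic illustration and not the authors' dynamic sampling techniques; the function names, the `seed` parameter and the `(sample, label)` output format are assumptions.

```python
import random
from collections import defaultdict


def _group_by_label(samples, labels):
    """Collect samples into one list per class label."""
    groups = defaultdict(list)
    for sample, label in zip(samples, labels):
        groups[label].append(sample)
    return groups


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples so every class matches the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_label(samples, labels)
    target = min(len(items) for items in groups.values())
    balanced = []
    for label, items in groups.items():
        balanced.extend((s, label) for s in rng.sample(items, target))
    return balanced


def random_oversample(samples, labels, seed=0):
    """Randomly duplicate samples so every class matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_label(samples, labels)
    target = max(len(items) for items in groups.values())
    balanced = []
    for label, items in groups.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        balanced.extend((s, label) for s in items + extra)
    return balanced
```

Under-sampling discards genuine reviews and over-sampling repeats fake ones, which is one plausible reason the per-class accuracies reported above move in opposite directions after balancing.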


16.
Think globally, act locally: decentralized supervisory control   Cited by: 5 (self-citations: 0, citations by others: 5)
Decentralized supervisory control is investigated by considering problem formulations that model systems whose specifications are given as global constraints, but whose solution is described by local controllers. A necessary and sufficient condition is given for the existence of a solution to the problem of finding decentralized supervisors that ensure that the behavior of the closed-loop system lies in a given range. Where the range of behavior can be described by regular languages, it can be effectively tested whether the decentralized control problem is solvable; in this case, a procedure is given to compute the associated supervisors.

17.
Multimedia Tools and Applications - Link prediction is a widely studied topic in graph data analytics and finds numerous applications like friend recommendations in social networks and product...

18.
This research synthesizes a taxonomy for classifying detection methods of new malicious code by Machine Learning (ML) methods based on static features extracted from executables. The taxonomy is then operationalized to classify research on this topic and pinpoint critical open research issues in light of emerging threats. The article addresses various facets of the detection challenge, including: file representation and feature selection methods, classification algorithms, weighting ensembles, as well as the imbalance problem, active learning, and chronological evaluation. From the survey we conclude that a framework for detecting new malicious code in executable files can be designed to achieve very high accuracy while maintaining low false positives (i.e. misclassifying benign files as malicious). The framework should include training of multiple classifiers on various types of features (mainly OpCode and byte n-grams and Portable Executable Features), applying a weighting algorithm to the classification results of the individual classifiers, as well as an active learning mechanism to maintain high detection accuracy. The training of classifiers should also consider the imbalance problem by generating classifiers that will perform accurately in a real-life situation where the percentage of malicious files among all files is estimated to be approximately 10%.

19.
Although activity recognition is an emerging general area of research in computer science, its potential in construction engineering and management (CEM) domain has not yet been fully investigated. Due to the complex and dynamic nature of many construction and infrastructure projects, the ability to detect and classify key activities performed in the field by various equipment and human crew can improve the quality and reliability of project decision-making and control. In particular to simulation modeling, process-level knowledge obtained as a result of activity recognition can help verify and update the input parameters of simulation models. Such input parameters include but are not limited to activity durations and precedence, resource flows, and site layout. The goal of this research is to investigate the prospect of using built-in smartphone sensors as ubiquitous multi-modal data collection and transmission nodes in order to detect detailed construction equipment activities which can ultimately contribute to the process of simulation input modeling. A case study of front-end loader activity recognition is presented to describe the methodology for action recognition and evaluate the performance of the developed system. In the designed methodology, certain key features are extracted from the collected data using accelerometer and gyroscope sensors, and a subset of the extracted features is used to train supervised machine learning classifiers. In doing so, several important technical details such as selection of discriminating features to extract, sensitivity analysis of data segmentation window size, and choice of the classifier to be trained are investigated. It is shown that the choice of the level of detail (LoD) in describing equipment actions (classes) is an important factor with major impact on the classification performance. 
Results also indicate that although decreasing the number of classes generally improves the classification output, considering other factors such as actions to be combined as a single activity, methodologies to extract knowledge from classified activities, computational efficiency, and end use of the classification process may as well influence one’s decision in selecting an optimal LoD in describing equipment activities (classes).

20.
Remote sensing scientists are increasingly adopting machine learning classifiers for land cover and land use (LCLU) mapping, but model selection, a critical step of the machine learning classification, has usually been ignored in the past research. In this paper, step-by-step guidance (for classifier training, model selection, and map production) with supervised learning model selection is first provided. Then, model selection is exhaustively applied to different machine learning (e.g. Artificial Neural Network (ANN), Decision Tree (DT), Support Vector Machine (SVM), and Random Forest (RF)) classifiers to identify optimal polynomial degree of input features (d) and hyperparameters with Landsat imagery of a study region in China and Ghana. We evaluated the map accuracy and computing time associated with different versions of machine learning classification software (i.e. ArcMap, ENVI, TerrSet, and R).

The optimal classifiers and their associated polynomial degree of input features and hyperparameters vary for the two image datasets that were tested. The optimum combination of d and hyperparameters for each type of classifier was used across software packages, but some classifiers (i.e. ENVI and TerrSet ANN) were customized due to the constraints of software packages. The LCLU map derived from ENVI SVM has the highest overall accuracy (72.6%) for the Ghana dataset, while the LCLU map derived from R DT has the highest overall accuracy (48.0%) for the FNNR dataset. All LCLU maps for the Ghana dataset are more accurate compared to those from the China dataset, likely due to more limited and uncertain training data for the China (FNNR) dataset. For the Ghana dataset, LCLU maps derived from tree-based classifier routines (ArcMap RF, TerrSet DT, and R RF) are accurate, but these maps have artefacts resulting from model overfitting problems.

