Found 20 similar documents; search took 15 ms
1.
Applied Intelligence - In real-world applications of machine learning and cybernetics, data with an imbalanced distribution of classes or skewed class proportions is pervasive. When...
2.
Supervised classification is one of the tasks most frequently carried out by so-called Intelligent Systems. Thus, a large
number of techniques have been developed based on Artificial Intelligence (Logic-based techniques, Perceptron-based techniques)
and Statistics (Bayesian Networks, Instance-based techniques). The goal of supervised learning is to build a concise model
of the distribution of class labels in terms of predictor features. The resulting classifier is then used to assign class
labels to the testing instances where the values of the predictor features are known, but the value of the class label is
unknown. This paper describes various classification algorithms and a recent approach to improving classification accuracy: ensembles
of classifiers.
3.
Mechanical excavators are widely used in mining, tunneling and civil engineering projects because they can bring productivity to a project quickly, accurately and safely. There are several types of mechanical excavators, such as the roadheader, tunnel boring machine and impact hammer. Among these, roadheaders have some advantages like selective mining, mobility, less over excavation, minimal ground disturbances, elimination of blast vibration, reduced ventilation requirements and initial investment cost. A critical issue in successful roadheader application is the ability to evaluate and predict machine performance, expressed as the instantaneous (net) cutting rate. Although there are several prediction methods in the literature for roadheader performance, only a few of them have been developed via artificial neural network techniques. In this study, for this purpose, 333 data sets including uniaxial compressive strength and power on cutting boom, 103 data sets including RQD, and 125 data sets including machine weight are accumulated from the literature. This paper focuses on roadheader performance prediction using six different machine learning algorithms and a combination of various machine learning algorithms via ensemble techniques. The algorithms are ZeroR, random forest (RF), Gaussian process, linear regression, logistic regression and multi-layer perceptron (MLP). MLP and RF give better results than the other algorithms, and the best solution achieved was the bagging technique on RF and principal component analysis (PCA). The best success rate obtained in this study is 90.2%, which is relatively better than contemporary research.
4.
Deep learning techniques for Sentiment Analysis have become very popular. They provide automatic feature extraction and both richer representation capabilities and better performance than traditional feature-based techniques (i.e., surface methods). Traditional surface approaches are based on complex manually extracted features, and this extraction process is a fundamental question in feature-driven methods. These long-established approaches can yield strong baselines, and their predictive capabilities can be used in conjunction with the emerging deep learning methods. In this paper we seek to improve the performance of deep learning techniques by integrating them with traditional surface approaches based on manually extracted features. The contributions of this paper are sixfold. First, we develop a deep learning based sentiment classifier using a word embeddings model and a linear machine learning algorithm. This classifier serves as a baseline to compare to subsequent results. Second, we propose two ensemble techniques which aggregate our baseline classifier with other surface classifiers widely used in Sentiment Analysis. Third, we also propose two models for combining both surface and deep features to merge information from several sources. Fourth, we introduce a taxonomy for classifying the different models found in the literature, as well as the ones we propose. Fifth, we conduct several experiments to compare the performance of these models with the deep learning baseline. For this, we use seven public datasets that were extracted from the microblogging and movie reviews domain. Finally, as a result, a statistical study confirms that the performance of these proposed models surpasses that of our original baseline on F1-Score.
5.
Abstract
Error Correcting Output Coding (ECOC) methods for
multiclass classification present several open problems ranging
from the trade-off between their error recovering capabilities
and the learnability of the induced dichotomies to the selection
of proper base learners and to the design of well-separated
codes for a given multiclass problem. We experimentally analyse
some of the main factors affecting the effectiveness of ECOC
methods. We show that the architecture of ECOC learning machines
influences the accuracy of the ECOC classifier, highlighting
that ensembles of parallel and independent dichotomic
Multi-Layer Perceptrons are well-suited to implement ECOC
methods. We quantitatively evaluate the dependence among
codeword bit errors using mutual information based measures,
experimentally showing that a low dependence enhances the
generalisation capabilities of ECOC. Moreover we show that the
proper selection of the base learner and the decoding function
of the reconstruction stage significantly affects the
performance of the ECOC ensemble. The analysis of the
relationships between the error recovering power, the accuracy
of the base learners, and the dependence among codeword bits
shows that all these factors contribute to the effectiveness of ECOC
methods in a non-straightforward way, very likely dependent on
the distribution and complexity of the data. An erratum to this article can be found at
6.
The Journal of Supercomputing - Protein secondary structure is the local conformation assigned to protein sequences with the help of their three-dimensional structure. Assigning the local...
7.
Data Mining and Knowledge Discovery - Over the years, a plethora of cost-sensitive methods have been proposed for learning on data when different types of misclassification errors incur different...
8.
In using a neural network for an application, data representation and network structure are critical to performance. While most improvements to networks focus on these aspects, we have found that modification of the error function based on current performance can result in significant advantages. We consider here a multilayered network trained by the backpropagation error reduction rule. We also consider a specific task, namely that of direct recognition of handwriting patterns, without any feature extraction to optimise the representation used. We show that the relaxation of the definition of error improves the final performance and accelerates learning. Since the application used in this study has generic qualities, we believe that the results of this numerical experiment are pertinent to a wide class of applications.
9.
Machine Learning - In this paper, we show that the way internal estimates are used to measure variable importance in Random Forests is also applicable to feature selection in unsupervised...
10.
Co-training is a good paradigm for semi-supervised learning, which requires the data set to be described by two views of features. A notable characteristic shared by many co-training algorithms is that the selected unlabeled instances should be predicted with high confidence, since a high confidence score usually implies that the corresponding prediction is correct. Unfortunately, these high-confidence unlabeled instances do not always improve classification performance. In this paper, a new semi-supervised learning algorithm is proposed that combines the benefits of both co-training and active learning. The algorithm applies co-training to select the most reliable instances according to the two criteria of high confidence and nearest neighbor for boosting the classifier, and also exploits the most informative instances with human annotation to improve classification performance. Experiments on several UCI data sets and a natural language processing task demonstrate that our method achieves a more significant improvement for the same amount of human effort.
11.
Despite notable advances over the past decade, current virtual reality systems have numerous drawbacks. The FlatWorld project at the University of Southern California's Institute for Creative Technologies seeks to overcome these limitations by exploring a new approach to virtual environments (VEs) inspired by Hollywood set-design techniques. Since the dawn of the film industry, movie sets have been constructed using modular panels called flats. Set designers use flats to create physical structures to represent various places and activities. The paper considers how FlatWorld is developing a reconfigurable system of digital flats. Using large-screen displays and real-time computer graphics technology, a single digital flat can appear as an interior room wall or an exterior building face.
12.
A novel feature selection algorithm is designed for high-dimensional data classification. The relevant features are selected with the least square loss function and \(\ell_{2,1}\)-norm regularization term if the minimum representation error rate between the features and labels is approached with respect to only these features. Taking into account both the local and global structures of data distribution with subspace learning, an efficient optimization algorithm is proposed to solve the joint objective function, so as to select the most representative and noise-resistant features to enhance the performance of classification. Experiments conducted on benchmark datasets show that the proposed approach is more effective and robust than existing feature selection algorithms.
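For orientation, a least-squares loss with \(\ell_{2,1}\)-norm regularization is commonly written in the generic form below. This is only a sketch of the standard formulation, not the paper's joint objective, which also includes subspace-learning terms that the abstract does not give. The symbols \(X\) (data matrix), \(Y\) (label matrix), \(W\) (feature weight matrix with rows \(w_i\)) and \(\lambda\) (regularization weight) are assumptions introduced here for illustration.

\[
\min_{W}\; \lVert XW - Y \rVert_F^2 + \lambda \lVert W \rVert_{2,1},
\qquad
\lVert W \rVert_{2,1} = \sum_{i=1}^{d} \lVert w_i \rVert_2 = \sum_{i=1}^{d} \sqrt{\sum_{j=1}^{c} W_{ij}^2}
\]

Because the penalty sums the Euclidean norms of the rows of \(W\), it tends to drive whole rows to zero, and the features whose rows remain nonzero are the ones selected.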
13.
Ontology alignment identifies semantically matching entities in different ontologies. Various ontology alignment strategies have been proposed; however, few systems have explored how to automatically combine multiple strategies to improve the matching effectiveness. This paper presents a dynamic multistrategy ontology alignment framework, named RiMOM. The key insight in this framework is that similarity characteristics between ontologies may vary widely. We propose a systematic approach to quantitatively estimate the similarity characteristics for each alignment task and propose a strategy selection method to automatically combine the matching strategies based on two estimated factors. In the approach, we consider both textual and structural characteristics of ontologies. With RiMOM, we participated in the 2006 and 2007 campaigns of the Ontology Alignment Evaluation Initiative (OAEI). Our system is among the top three performers in benchmark data sets.
14.
Intelligence is strongly connected with learning and adapting abilities; therefore such capabilities are considered indispensable features of intelligent manufacturing systems (IMSs). A number of approaches have been described to apply different machine learning (ML) techniques to manufacturing problems, starting with rule induction in symbolic domains and pattern recognition techniques in numerical, subsymbolic domains. In recent years, artificial neural network (ANN) based learning has been the dominant ML technique in manufacturing. However, mainly because of the black box nature of ANNs, these solutions have limited industrial acceptance. In the paper, the integration of neural and fuzzy techniques is treated and former solutions are analysed. A genetic algorithm (GA) based approach is introduced to overcome problems that are experienced during manufacturing applications with other algorithms.
15.
The annoyance of spam emails increasingly plagues both individuals and organizations. In response, most prior research investigates spam filtering as a classical text categorization task, in which training examples must include both spam (positive examples) and legitimate (negative examples) emails. However, in many spam filtering scenarios, obtaining legitimate emails for training purposes can be more difficult than collecting spam and unclassified emails. Hence, it is more appropriate to construct a classification model for spam filtering that uses positive training examples (i.e., spam) and unlabeled instances only and does not require legitimate emails as negative training examples. Several single-class learning techniques, such as PNB and PEBL, have been proposed in the literature. However, they incur inherent limitations with regard to spam filtering. In this study, we propose and develop an ensemble approach, referred to as E2, to address these limitations. Specifically, we follow the two-stage framework of PEBL but extend each stage with an ensemble strategy. The empirical evaluation results from two spam filtering corpora suggest that our proposed E2 technique generally outperforms benchmark techniques (i.e., PNB and PEBL) and exhibits more stable performance than its counterparts.
16.
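To address the shortcomings of ensemble learning algorithms, a novel ensemble learning algorithm is proposed: the maximum-margin ensemble learning algorithm (MMEA). Its time and space complexity are both O(N), whereas the standard SVM algorithm has time complexity O(N^3) and space complexity O(N^2), where N is the number of data samples. The convergence of MMEA is also proven theoretically. MMEA and popular ensemble algorithms (Bagging LibSVM, AdaBoost LibSVM, Bagging Liblinear, and AdaBoost Liblinear) are used to classify the extended MIT face data set. Experimental results show that the proposed MMEA achieves the best results on multiple metrics.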
17.
Ensemble learning has attracted considerable attention owing to its good generalization performance. The main issues in constructing a powerful ensemble include training a set of diverse and accurate base classifiers, and effectively combining them. Ensemble margin, computed as the difference between the number of votes received by the correct class and the number received by the other class with the most votes, is widely used to explain the success of ensemble learning. This definition of the ensemble margin does not consider the classification confidence of base classifiers. In this work, we explore the influence of the classification confidence of the base classifiers in ensemble learning and obtain some interesting conclusions. First, we extend the definition of ensemble margin based on the classification confidence of the base classifiers. Then, an optimization objective is designed to compute the weights of the base classifiers by minimizing the margin-induced classification loss. Several strategies are tried to utilize the classification confidences and the weights. It is observed that weighted voting based on classification confidence is better than simple voting if all the base classifiers are used. In addition, ensemble pruning can further improve the performance of a weighted voting ensemble. We also compare the proposed fusion technique with some classical algorithms. The experimental results also show the effectiveness of weighted voting with classification confidence.
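The vote-count margin described in the abstract above can be sketched in a few lines of Python. The function names `ensemble_margin` and `confidence_margin` are hypothetical. The second function is only one plausible way to fold classifier confidence into the margin (summing confidences instead of counting votes) and is an assumption, not the extended definition proposed in the paper.

```python
from collections import Counter


def ensemble_margin(votes, true_label):
    """Vote-count margin: votes for the correct class minus the votes
    for the most-voted other class (0 if no other class got a vote)."""
    counts = Counter(votes)
    correct = counts.get(true_label, 0)
    others = [c for label, c in counts.items() if label != true_label]
    return correct - (max(others) if others else 0)


def confidence_margin(votes, confidences, true_label):
    """Hypothetical confidence-weighted variant (an assumption, not the
    paper's definition): each vote counts for its confidence score."""
    scores = {}
    for label, conf in zip(votes, confidences):
        scores[label] = scores.get(label, 0.0) + conf
    correct = scores.get(true_label, 0.0)
    others = [s for label, s in scores.items() if label != true_label]
    return correct - (max(others) if others else 0.0)


# Five base classifiers vote on one instance.
votes = ["a", "a", "b", "c", "a"]
confidences = [0.5, 0.5, 0.9, 0.2, 0.5]
print(ensemble_margin(votes, "a"))
print(confidence_margin(votes, confidences, "a"))
```

A positive margin means the ensemble classifies the instance correctly under the corresponding voting rule, and a negative margin means another class wins the vote.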
18.
This study presents various soft computing techniques for forecasting hourly precipitation during tropical cyclones. The purpose of the study is to present a concise and synthesized documentation of the current level of skill of various models at precipitation forecasts. The techniques involve artificial neural networks (ANN) comprising the multilayer perceptron (MLP) with five training methods (denoted as ANN-1, ANN-2, ANN-3, ANN-4, and ANN-5), and decision trees including classification and regression tree (CART), Chi-squared automatic interaction detector (CHAID), and exhaustive CHAID (E-CHAID). The developed models were applied to the Shihmen Reservoir Watershed in Taiwan. Traditional statistical models, including multiple linear regression (MLR) and the climatology average model (CLIM), were selected as benchmarks and compared with these machine learning models. A total of 157 typhoons affecting the watershed were collected. The measures used include numerical statistics and categorical statistics. The RMSE criterion was employed to assess the suitable scenario, while the categorical scores, bias, POD, FAR, HK, and ETS were based on the rain contingency table. The study found that ANN and decision trees provide better predictions than the traditional statistical models according to the various average skill scores.
19.
Many recommendation systems find similar users based on a profile of a target user and recommend products that he/she may be interested in. The profile is constructed from his/her purchase history. However, histories of new customers are not stored, so it is difficult to recommend products to them in the same fashion. This is called the cold start problem. We propose a recommendation method using access logs instead of purchase histories, because access logs are gathered more easily than purchase histories and include much information on users' interests. In this study, we construct user profiles from the product categories users browsed, as recorded in their access logs, and predict products with Gradient Boosting Decision Tree. In addition, we carry out evaluation experiments using access logs from a real online shop and discuss the performance of our proposed method in comparison with conventional machine learning and Support Vector Machine (SVM). We confirmed that the proposed method achieved higher precision than SVM over 10 data sets. In particular, on unbalanced data sets, the proposed method is superior to SVM.
20.
Due to its fast learning speed, simplicity of implementation and minimal human intervention, extreme learning machine has received considerable attention recently, mostly from the machine learning community. Generally, extreme learning machine and its variants focus on classification and regression problems. Its potential application in analyzing censored time-to-event data is yet to be verified. In this study, we present an extreme learning machine ensemble to model right-censored survival data by combining the Buckley-James transformation and the random forest framework. According to experimental and statistical analysis results, we show that the proposed model outperforms popular survival models such as random survival forest and Cox proportional hazard models on well-known low-dimensional and high-dimensional benchmark datasets in terms of both prediction accuracy and time efficiency.