20 similar documents retrieved.
1.
In this paper we introduce a method called CL.E.D.M. (CLassification through ELECTRE and Data Mining) that employs aspects of the methodological framework of the ELECTRE I outranking method and aims at increasing the accuracy of existing data mining classification algorithms. In particular, the method chooses the best decision rules extracted from the training process of the data mining classification algorithms, and then assigns the classes that correspond to these rules to the objects that must be classified. Three well-known data mining classification algorithms are tested on five widely used databases to verify the robustness of the proposed method.
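As a rough illustration of the ELECTRE I-style selection step, the sketch below scores candidate decision rules on a few criteria and keeps the rules not outranked by any other. The criteria (confidence, support, coverage), weights, and thresholds are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

criteria_weights = np.array([0.5, 0.3, 0.2])   # assumed weights, sum to 1
concordance_threshold = 0.6                    # assumed ELECTRE thresholds
discordance_threshold = 0.4

# rows = candidate rules, columns = criterion scores scaled to [0, 1]
# (confidence, support, coverage -- illustrative criteria)
rules = np.array([
    [0.90, 0.40, 0.70],   # rule A
    [0.80, 0.60, 0.50],   # rule B
    [0.60, 0.30, 0.40],   # rule C
])

def outranks(a, b):
    """True if rule a outranks rule b under ELECTRE I-style tests."""
    concordant = rules[a] >= rules[b]
    concordance = criteria_weights[concordant].sum()
    discordance = np.max(np.where(~concordant, rules[b] - rules[a], 0.0))
    return (concordance >= concordance_threshold
            and discordance <= discordance_threshold)

n = len(rules)
# Keep the rules not outranked by any other rule (a simplified kernel).
best = [i for i in range(n)
        if not any(outranks(j, i) for j in range(n) if j != i)]
print("non-outranked rules:", best)
```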
2.
3.
We investigate empirically the performance under damage conditions of single- and multilayer perceptrons (MLPs), with various numbers of hidden units, in a representative pattern-recognition task. While some degree of graceful degradation was observed, the single-layer perceptron was considerably less fault tolerant than any of the multilayer perceptrons, including one with fewer adjustable weights. Our initial hypothesis that fault tolerance would be significantly improved for multilayer nets with larger numbers of hidden units proved incorrect. Indeed, there appeared to be a liability to having excess hidden units. A simple technique (called augmentation) is described, which was successful in translating excess hidden units into improved fault tolerance. Finally, our results were supported by applying singular value decomposition (SVD) analysis to the MLPs' internal representations.
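The kind of damage experiment described can be sketched with a tiny hand-wired MLP that solves XOR, in which weights are zeroed one at a time and the accuracy recorded. The network and the stuck-at-zero damage model are our assumptions, not the paper's setup.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

W1 = np.array([[1.0, 1.0],      # column 0: hidden unit fires when x1+x2 > 0.5
               [1.0, 1.0]])     # column 1: hidden unit fires when x1+x2 > 1.5
b1 = np.array([-0.5, -1.5])
W2 = np.array([1.0, -1.0])      # output: h1 AND NOT h2 -> XOR
b2 = -0.5

def predict(X, W1, W2):
    h = (X @ W1 + b1 > 0).astype(float)
    return (h @ W2 + b2 > 0).astype(int)

assert (predict(X, W1, W2) == y).all()   # undamaged net is correct

# Damage each first-layer weight in turn and record accuracy.
for i in range(2):
    for j in range(2):
        W1_damaged = W1.copy()
        W1_damaged[i, j] = 0.0           # stuck-at-zero fault
        acc = (predict(X, W1_damaged, W2) == y).mean()
        print(f"W1[{i},{j}] zeroed -> accuracy {acc:.2f}")
```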
4.
This paper presents a theoretical approach to determining the probability of misclassification of the multilayer perceptron (MLP) neural model subject to weight errors. The applications considered are classification/recognition tasks involving binary input-output mappings. The analytical models are validated via simulation of a small illustrative example. The theoretical results, in agreement with simulation results, show that, for the example considered, Gaussian weight errors with standard deviation up to 22% of the weight value can be tolerated. The theoretical method developed here adds predictability to the fault tolerance capability of neural nets and shows that this capability is heavily dependent on the problem data.
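A Monte Carlo check of the kind the analytical model is validated against might look like the following: every weight is perturbed with Gaussian noise whose standard deviation is a fraction of the weight's magnitude, and the misclassification probability is estimated by sampling. The trivial linear network for the AND mapping is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0, 0, 0, 1])                 # binary AND mapping
w, b = np.array([1.0, 1.0]), -1.5          # exact weights for AND

def misclassification_probability(rel_sigma, trials=10_000):
    errors = 0
    for _ in range(trials):
        # Each weight gets noise with sigma = rel_sigma * |weight|.
        w_noisy = w + rng.normal(0, rel_sigma * np.abs(w))
        b_noisy = b + rng.normal(0, rel_sigma * abs(b))
        pred = (X @ w_noisy + b_noisy > 0).astype(int)
        errors += (pred != y).any()        # trial fails if any input is wrong
    return errors / trials

for rel_sigma in (0.05, 0.10, 0.22, 0.40):
    p = misclassification_probability(rel_sigma)
    print(f"sigma = {rel_sigma:4.0%} of |w| -> P(misclassification) ~ {p:.3f}")
```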
5.
Different clustering algorithms are designed with their own strategies, yet each technique has limitations when applied to particular datasets, and choosing an appropriate discriminative information method (DIM) can ensure effective document clustering. To address these problems, a DIM based on document clustering by consensus and classification (DCCC) is proposed. First, clustering by discriminative information maximization (CDIM) is chosen as the solution for generating initial clusterings of the dataset, and two different CDIM methods are used to generate two initial clusterings. Second, the two initial clusterings are re-initialized with different parameter settings, and a consensus is built through the relationship between their cluster label information, maximizing the sum of the documents' discrimination scores. Finally, a discriminative term-weight text classifier (DTWC) is selected to assign new cluster labels to the consensus; the base partitions are altered by training the text classifier, and the final partition is generated according to the predicted label information. Experiments on eight web datasets, using BCubed precision and recall as clustering validation measures, show that the clustering results of the proposed consensus-and-classification method are superior to those of the comparison methods.
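The consensus step alone can be sketched as follows: the labels of two base clusterings are aligned by maximum overlap and the agreed label is kept where the partitions coincide. DCCC's actual consensus maximizes a discrimination score and retrains a DTWC classifier on the result; this stand-in is a simplification.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

labels_a = np.array([0, 0, 1, 1, 2, 2, 2, 0])   # clustering from CDIM run 1
labels_b = np.array([2, 2, 0, 0, 1, 1, 0, 2])   # clustering from CDIM run 2

k = 3
# Contingency matrix: overlap between cluster i of A and cluster j of B.
overlap = np.zeros((k, k), dtype=int)
for a, b in zip(labels_a, labels_b):
    overlap[a, b] += 1

# Hungarian assignment maximizing total overlap (minimize the negation).
row, col = linear_sum_assignment(-overlap)
mapping = dict(zip(col, row))                # relabel B into A's label space
labels_b_aligned = np.array([mapping[b] for b in labels_b])

# Consensus: keep the label where the partitions agree, mark -1 otherwise
# (undecided documents would be relabeled by the text classifier).
consensus = np.where(labels_a == labels_b_aligned, labels_a, -1)
print(consensus)
```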
6.
Most research on class imbalance to date has focused on the two-class problem. Multi-class imbalance is considerably more complicated, and little knowledge or experience exists for Internet traffic classification. In this paper we study the challenges posed by Internet traffic classification using machine learning with multi-class imbalanced data, and the ability of several adjusting methods, including resampling (random under-sampling, random over-sampling) and cost-sensitive learning. We then empirically compare the effectiveness of these methods for Internet traffic classification and determine which produces a better overall classifier and under what circumstances. The main contributions are as follows. (1) Cost-sensitive learning is derived with MetaCost, which incorporates misclassification costs into the learning algorithm, to improve multi-class imbalance based on the flow ratio. (2) A new resampling model comprising under-sampling and over-sampling is presented to make the multi-class training data more balanced. (3) A solution is presented for comparing the three methods with one another and with the original case. Experimental results on sixteen datasets show that flow g-mean and byte g-mean are statistically increased by 8.6% and 3.7%, 4.4% and 2.8%, and 11.1% and 8.2%, respectively, when the three methods are compared with the original case. Cost-sensitive learning is the first choice when the sample size is sufficient; otherwise, resampling is more practical.
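The resampling plumbing and the g-mean score used above can be sketched directly; the per-class target size and the toy data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def resample_per_class(X, y, target):
    """Random under-sampling of majority classes and over-sampling (with
    replacement) of minority classes toward `target` samples per class."""
    idx = []
    for c in np.unique(y):
        members = np.flatnonzero(y == c)
        replace = len(members) < target       # over-sample small classes
        idx.extend(rng.choice(members, size=target, replace=replace))
    idx = np.array(idx)
    return X[idx], y[idx]

def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (the paper's g-mean measure)."""
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.prod(recalls) ** (1.0 / len(recalls)))

# Tiny imbalanced example: class 0 dominates.
y = np.array([0] * 90 + [1] * 8 + [2] * 2)
X = rng.normal(size=(len(y), 4))
Xb, yb = resample_per_class(X, y, target=30)
print(np.bincount(yb))                        # -> [30 30 30]
```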
7.
Question classification aims to automatically determine the type of a question and is a basic task in question answering research. An answer-assisted semi-supervised question classification method is proposed. First, answer features are combined with question features to represent each sample. Then, a classifier is trained on the labeled questions and label propagation is used to automatically label the classes of the unlabeled questions. Finally, the initially labeled questions and the automatically labeled questions are merged as training samples, and a maximum entropy model is used to classify the question test texts. Experimental results show that the proposed answer-assisted semi-supervised classification method can fully exploit unlabeled samples to improve performance and clearly outperforms the baseline methods.
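A minimal sketch of this pipeline with scikit-learn: question vectors are augmented with answer vectors, label propagation labels the unlabeled questions, and multinomial logistic regression (a maximum entropy model) is trained on the union. The random features stand in for real question/answer representations.

```python
import numpy as np
from sklearn.semi_supervised import LabelPropagation
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, dq, da = 200, 20, 20
question_feats = rng.normal(size=(n, dq))
answer_feats = rng.normal(size=(n, da))
X = np.hstack([question_feats, answer_feats])   # answer-assisted representation

y_true = (X[:, 0] + X[:, dq] > 0).astype(int)   # synthetic question types
y = y_true.copy()
y[30:] = -1                                     # only 30 questions are labeled

# Steps 1-2: propagate labels from labeled to unlabeled questions.
lp = LabelPropagation(kernel="knn", n_neighbors=7).fit(X, y)
y_auto = lp.transduction_                       # labels for all questions

# Step 3: train the maximum entropy classifier on initial + auto labels.
maxent = LogisticRegression(max_iter=1000).fit(X, y_auto)
print("agreement with ground truth:", (maxent.predict(X) == y_true).mean())
```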
8.
A new efficient computational technique for training multilayer feedforward neural networks is proposed. The proposed algorithm consists of two learning phases. The first phase is a local search that implements gradient descent; the second phase is a direct search scheme that implements dynamic tunneling in weight space, avoiding local traps and thereby generating the point of next descent. The repeated alternating application of these two phases forms a new training procedure that results in a global minimum point from any arbitrary initial choice in the weight space. Simulation results are provided for five test examples to demonstrate the efficiency of the proposed method, which overcomes the problems of initialization and local minima in multilayer perceptrons.
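A conceptual sketch of the two-phase alternation on a one-dimensional loss with local minima follows. Phase 1 is plain gradient descent; phase 2 stands in for dynamic tunneling by searching outward from the converged point for any weight with a strictly lower loss, from which descent restarts. The actual method integrates a tunneling differential equation; this is a deliberate simplification.

```python
import numpy as np

f = lambda w: np.sin(3 * w) + 0.1 * w**2           # loss with local minima
df = lambda w: 3 * np.cos(3 * w) + 0.2 * w         # its gradient

def descend(w, lr=0.01, steps=2000):
    for _ in range(steps):
        w -= lr * df(w)
    return w

rng = np.random.default_rng(0)
w = 3.0                                            # arbitrary initial weight
for phase in range(5):
    w = descend(w)                                 # phase 1: local search
    # Phase 2: look for a point of next descent with a lower loss.
    candidates = w + rng.normal(0, 2.0, size=200)
    better = candidates[f(candidates) < f(w) - 1e-9]
    if len(better) == 0:
        break                                      # no lower basin found
    w = better[0]
print(f"final weight {w:.4f}, loss {f(w):.4f}")
```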
9.
A one-dimensional simulation procedure is developed for use in estimating structural reliability in multi-dimensional load and resistance space with the loads represented as stochastic processes. The technique employed is based on the idea of using 'strips' of points parallel to each other and sampled on the limit state hyperplanes. The 'local' outcrossing rate and the zero-time failure probability Pf(0) associated with the narrow strips are derived using the conditional reliability index. When the domain boundary consists of a set of limit states, second-order bounds are used to obtain a lower-bound approximation of the outcrossing rate and Pf(0) associated with the union of a set of λ strips. It is shown by examples that for high-reliability problems, λ may be much less than the number of limit states without significant loss of accuracy and with considerable saving in computation time. It was also found that the rate of convergence of the simulations is quite fast even without using importance sampling.
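For orientation, the quantity the strip method targets can be cross-checked by brute force: the zero-time failure probability for a union of linear limit states g_i(x) = β_i − a_i·x in standard normal space. The limit states below are invented for illustration; the paper's procedure exists precisely to avoid this kind of crude Monte Carlo.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

betas = np.array([3.0, 3.2, 3.5])                # reliability indices
a = np.array([[1.0, 0.0],                        # unit normal vectors of the
              [0.8, 0.6],                        # limit-state hyperplanes
              [0.0, 1.0]])

x = rng.standard_normal((1_000_000, 2))
g = betas - x @ a.T                              # g_i(x) for every sample
pf_union = np.mean((g < 0).any(axis=1))          # failure on any limit state

# First-order bounds: max single-mode probability <= Pf <= sum of them.
p_single = norm.cdf(-betas)
print(f"MC Pf(0) ~ {pf_union:.2e}")
print(f"bounds: [{p_single.max():.2e}, {p_single.sum():.2e}]")
```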
10.
Current classification algorithms usually do not try to achieve a balance between fitting and generalization when they infer models from training data. Furthermore, current algorithms ignore the fact that there may be different penalty costs for the false-positive, false-negative, and unclassifiable types. Thus, their performance may not be optimal or may even be coincidental. This paper proposes a meta-heuristic approach, called the Convexity Based Algorithm (CBA), to address these issues. The new approach aims at optimally balancing the data fitting and generalization behaviors of models when some traditional classification approaches are used. The CBA first defines the total misclassification cost (TC) as a weighted function of the three penalty costs and the corresponding error rates as mentioned above. Next it partitions the training data into regions. This is done according to some convexity properties derivable from the training data and the traditional classification method to be used in conjunction with the CBA. Next the CBA uses a genetic approach to determine the optimal levels of fitting and generalization. The TC is used as the fitness function in this genetic approach. Twelve real-life datasets from a wide spectrum of domains were used to better understand the effectiveness of the proposed approach. The computational results indicate that the CBA may potentially fill in a critical gap in the use of current or future classification algorithms.
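The TC fitness function is straightforward to write down; a sketch follows, where predictions may be 1 (positive), 0 (negative), or None (unclassified), and the three penalty weights are application-dependent assumptions.

```python
# Sketch of the total misclassification cost used as the GA fitness in CBA.
def total_cost(y_true, y_pred, c_fp=1.0, c_fn=5.0, c_uc=0.5):
    n = len(y_true)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p == 0 and t == 1)
    uc = sum(1 for p in y_pred if p is None)
    # TC = weighted sum of the false-positive, false-negative and
    # unclassifiable rates, following the paper's definition.
    return (c_fp * fp + c_fn * fn + c_uc * uc) / n

y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, None, 1, 1]
print(total_cost(y_true, y_pred))   # lower is fitter in the genetic search
```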
11.
To address the problem that traditional similarity measures ignore differences among the multi-attribute categories of items, a recommendation algorithm with improved item multi-attribute category partitioning is proposed. First, an item-user membership matrix is constructed to mine the membership relations among users, and a similar-neighbor FP-Tree is created to extract the nearest-neighbor set. Then, the similarity of users' common items and the differences in item multi-attribute category partitioning are analyzed, the common items and the multi-attribute categories are fused through a weighting factor, and a CNB measurement model is constructed to measure the degree of neighbor similarity. Finally, the similar users obtained are sorted in descending order to obtain more accurate similar users and complete the recommendation. The algorithm is validated on a medical dataset; the results show that its time complexity, recommendation accuracy, and mean average precision are all improved.
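The fusion idea can be sketched as a weighting factor alpha combining common-item rating similarity with the overlap of the attribute categories the two users have touched. The full CNB model in the paper is richer; alpha, the ratings, and the category sets below are assumptions.

```python
import numpy as np

def cosine(u, v, mask):
    """Cosine similarity restricted to commonly rated items."""
    if not mask.any():
        return 0.0
    u, v = u[mask], v[mask]
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def fused_similarity(r_u, r_v, cats_u, cats_v, alpha=0.7):
    common = (r_u > 0) & (r_v > 0)               # commonly rated items
    sim_items = cosine(r_u, r_v, common)
    sim_cats = len(cats_u & cats_v) / len(cats_u | cats_v)  # Jaccard overlap
    return alpha * sim_items + (1 - alpha) * sim_cats

r_u = np.array([5, 3, 0, 4, 0])                  # 0 = unrated
r_v = np.array([4, 0, 2, 5, 1])
cats_u = {"analgesic", "antibiotic"}             # hypothetical categories
cats_v = {"antibiotic", "antiviral"}
print(fused_similarity(r_u, r_v, cats_u, cats_v))
```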
12.
Engineering Applications of Artificial Intelligence, 2005, 18(1): 13-19
The Gaussian mixture model (GMM) has been widely used for modeling speakers. In speaker identification, one major problem is how to generate a set of GMMs for identification purposes from the training data. Due to the hill-climbing characteristic of the maximum likelihood (ML) method, any arbitrary estimate of the initial model parameters will usually lead to a sub-optimal model in practice. To resolve this problem, this paper proposes a hybrid training method based on a genetic algorithm (GA). It utilizes the global searching capability of the GA and combines it with the effectiveness of the ML method. Experimental results on TI46 and TIMIT showed that this hybrid GA obtains better-optimized GMMs and better results than the simple GA and the traditional ML method.
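The hybrid idea can be sketched as a small genetic search over initial component means, each candidate refined by a short EM run and scored by data log-likelihood. Population size, mutation scale, and generation count are assumptions; the paper's GA operators are more elaborate.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3, 1, (200, 2)), rng.normal(3, 1, (200, 2))])
k, pop_size, generations = 2, 10, 5

def fitness(means):
    gmm = GaussianMixture(n_components=k, means_init=means,
                          max_iter=5, random_state=0).fit(X)
    return gmm.score(X)                           # mean log-likelihood

# Initial population: random subsets of the data as candidate means.
population = [X[rng.choice(len(X), k, replace=False)] for _ in range(pop_size)]
for _ in range(generations):
    population.sort(key=fitness, reverse=True)
    parents = population[: pop_size // 2]         # selection
    children = [p + rng.normal(0, 0.5, p.shape) for p in parents]  # mutation
    population = parents + children

best = max(population, key=fitness)
final = GaussianMixture(n_components=k, means_init=best, random_state=0).fit(X)
print("log-likelihood:", final.score(X))
```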
13.
Pattern Analysis and Applications - The classification of EEG signals is an essential step in the design of a brain–computer interface. In order to optimize this step, a method for EEG...
14.
Since the concept of structural classes of proteins was proposed, the problem of protein classification has been tackled by many groups. Most of their classification criteria are based only on the helix/strand contents of proteins. In this paper, we propose a method for protein structural classification based on their secondary structure sequences. It is a classification scheme that can confirm existing classifications. A mathematical model is constructed to describe protein secondary structure sequences, in which each sequence corresponds to a transition probability matrix that characterizes and differentiates protein structure numerically. Application to a set of real data has indicated that our method can classify protein structures correctly. The final classification result is shown schematically, which makes the structural classifications easy to observe visually, unlike traditional methods.
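The core construction is easy to sketch: a secondary-structure sequence over the states H (helix), E (strand), C (coil) is summarized by its first-order transition probability matrix. The example sequence is made up.

```python
import numpy as np

states = "HEC"
index = {s: i for i, s in enumerate(states)}

def transition_matrix(seq):
    counts = np.zeros((3, 3))
    for a, b in zip(seq, seq[1:]):                # consecutive state pairs
        counts[index[a], index[b]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1                   # avoid division by zero
    return counts / row_sums                      # rows sum to 1

seq = "HHHHCCEEEECCHHHHHCCCEEEE"
P = transition_matrix(seq)
print(P.round(2))
# Structures can then be compared by a distance between their matrices,
# e.g. np.abs(P1 - P2).sum(), before grouping them into classes.
```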
15.
Eklundh J O, Yamamoto H, Rosenfeld A. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1980, (1): 72-75
Three approaches to reducing errors in multispectral pixel classification were compared: 1) postprocessing (iterated reclassification based on comparison with the neighbors' classes); 2) preprocessing (iterated smoothing, by averaging with selected neighbors, prior to classification); and 3) relaxation (probabilistic classification followed by iterative probability adjustment). In experiments using a color image of a house, the relaxation approach gave markedly superior performance; relaxation eliminated 4-8 times as many errors as the other methods did.
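The relaxation approach (method 3) can be sketched as follows: start from per-pixel class probabilities and repeatedly pull each pixel's distribution toward the average of its 4-neighbors, then renormalize. Real relaxation labeling uses learned compatibility coefficients; this uniform-compatibility version is a toy.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, K = 8, 8, 3                                 # grid size, class count
p = rng.dirichlet(np.ones(K), size=(H, W))        # initial probabilistic labels

for _ in range(10):                               # iterative adjustment
    padded = np.pad(p, ((1, 1), (1, 1), (0, 0)), mode="edge")
    neighbors = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                 padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
    p = p * neighbors                             # support from the context
    p /= p.sum(axis=2, keepdims=True)             # renormalize per pixel

labels = p.argmax(axis=2)                         # final hard classification
print(labels)
```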
16.
Journal of Molecular Graphics & Modelling, 2008, 26(6): 852-855
Since the concept of structural classes of proteins was proposed, the problem of protein classification has been tackled by many groups. Most of their classification criteria are based only on the helix/strand contents of proteins. In this paper, we propose a method for protein structural classification based on their secondary structure sequences. It is a classification scheme that can confirm existing classifications. A mathematical model is constructed to describe protein secondary structure sequences, in which each sequence corresponds to a transition probability matrix that characterizes and differentiates protein structure numerically. Application to a set of real data has indicated that our method can classify protein structures correctly. The final classification result is shown schematically, which makes the structural classifications easy to observe visually, unlike traditional methods. (This abstract duplicates entry 14; see the transition-matrix sketch there.)
17.
Over the last few decades, the use of classification in science, engineering, business, and industry has increased rapidly, especially for big data. However, classifiers that handle complicated high-dimensional problems with non-conforming patterns at high accuracy are rare, especially for bit-level features; this is a challenging research problem. This paper proposes a novel efficient classifier based on a cellular automata model, called the Cellular Automata-based Classifier (CAC). CAC possesses a promising capability to deal with non-conforming patterns in bit-level features. It is developed on a new kind of elementary cellular automaton proposed here, called the Decision Support Elementary Cellular Automata (DS-ECA). The classification capability of DS-ECA is promising since it can describe very complicated decision rules in high-dimensional problems with low complexity. CAC comprises double rule vectors and a decision function; its structure has two layers, the first of which evolves an input pattern into feature space, while the other interprets the patterns in feature space as a binary answer through the decision function. Learning has a time complexity of O(n²), while classifying one instance is O(1), where n is the number of bit patterns. For classification performance, 12 datasets consisting of binary and non-binary features are empirically evaluated in comparison with Support Vector Machines (SVM) using k-fold cross-validation. In this respect, CAC outperforms SVM with the best kernel on binary features, and provides promising results equivalent to SVM on average for non-binary features.
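The two-layer structure can be illustrated generically: an elementary cellular automaton rule evolves the input bit pattern into "feature space", and a decision function maps that to a binary answer. Rule 110 and the majority-vote decision below are placeholders; DS-ECA uses its own rule vectors, which this sketch does not reproduce.

```python
import numpy as np

def eca_step(cells, rule=110):
    """One synchronous update of an elementary CA with wraparound."""
    left, right = np.roll(cells, 1), np.roll(cells, -1)
    neighborhood = 4 * left + 2 * cells + right    # 0..7 per cell
    table = (rule >> np.arange(8)) & 1             # the rule's truth table
    return table[neighborhood]

def classify(bits, steps=8):
    cells = np.array(bits, dtype=int)
    for _ in range(steps):                         # layer 1: evolve pattern
        cells = eca_step(cells)
    return int(cells.mean() > 0.5)                 # layer 2: decision function

print(classify([1, 0, 0, 1, 1, 0, 1, 0]))
```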
18.
19.
This article proposes a method of exploiting spatial information to improve classification rules constructed by automated methods such as k-nearest neighbour or linear discriminant analysis. The method is intended for polygon-based, land cover type mapping using remote sensing information in a GIS. Our approach differs from contextual allocation methods used in lattice- or pixel-based mapping because it does not rely on spatial dependence models. Instead, the method uses a Bayes-type formula to modify the estimated posterior probabilities of group membership produced by automated classifiers. The method is found to substantially improve classification accuracy estimates in areas where there is a moderate or greater degree of physiographic variation across the map extent.
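One plausible reading of the Bayes-type adjustment is sketched below: the classifier's posterior probabilities for a polygon are reweighted by prior probabilities derived from spatial context (e.g., class frequencies among adjacent polygons) and renormalized. The numbers are invented for illustration.

```python
import numpy as np

posterior = np.array([0.45, 0.40, 0.15])      # from k-NN or discriminant analysis
spatial_prior = np.array([0.20, 0.60, 0.20])  # from neighboring polygons
uniform = np.full(3, 1.0 / 3.0)               # prior implicit in the classifier

# Replace the implicit uniform prior with the spatial one, then renormalize.
adjusted = posterior * spatial_prior / uniform
adjusted /= adjusted.sum()
print(adjusted.round(3))                      # the second class now wins
```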
20.
Epistemic uncertainties always exist in engineering structures due to a lack of knowledge or information, and can be mathematically described by either fuzzy-set theory or evidence theory (ET). In this work, the authors present a novel uncertainty model, namely the evidence-based fuzzy model, in which fuzzy sets and ET are combined to represent epistemic uncertainty. A novel method for combining multiple membership functions and a corresponding reliability analysis method are also developed. In the combination method, the combined fuzzy-set representation is approximated by the enveloping lines of the multiple membership functions (smoothed by neglecting the valleys in the membership function curves), and Murphy's average combination rule is applied to compute the basic probability assignment for focal elements. Then, the combined membership function is transformed into the equivalent probability density function by means of a normalizing factor. Finally, the Markov Chain Monte Carlo (MCMC) subset simulation method is used to solve for reliability by introducing intermediate failure events. A numerical example and two engineering examples are provided to demonstrate the effectiveness of the proposed method.
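Two of these steps are simple to sketch: (1) combining several expert membership functions by taking their upper envelope, and (2) turning the combined membership function into an equivalent PDF via a normalizing factor. The triangular membership functions below are illustrative assumptions, and the envelope here does not reproduce the paper's valley-smoothing or Murphy combination.

```python
import numpy as np

x = np.linspace(0.0, 10.0, 1001)

def triangle(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    return np.clip(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0, None)

experts = [triangle(x, 1, 3, 6), triangle(x, 2, 5, 8), triangle(x, 3, 6, 9)]

# Step 1: the enveloping line of the multiple membership functions.
combined = np.maximum.reduce(experts)

# Step 2: equivalent probability density, normalized to unit area
# (trapezoidal integration done by hand for portability).
area = np.sum((combined[:-1] + combined[1:]) * np.diff(x)) / 2.0
pdf = combined / area
check = np.sum((pdf[:-1] + pdf[1:]) * np.diff(x)) / 2.0
print("area under pdf:", check)   # -> 1.0
# Samples from this PDF could then feed an MCMC subset simulation of the
# failure probability.
```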