Similar literature
20 similar documents found (search time: 469 ms)
1.
Di Wang  Peng Zhang 《Pattern recognition》2010,43(10):3468-3482
The support vector machine (SVM) is a widely used classification technique. However, it is difficult to use SVMs to handle very large data sets efficiently. Although decomposed SVMs (DSVMs) and core vector machines (CVMs) have been proposed to overcome this difficulty, they cannot be applied to online classification (or classification with learning ability) because, when newly arriving samples are misclassified, the classifier has to be adjusted based on both the misclassified new samples and all the training samples. The purpose of this paper is to address this issue by proposing an online CVM classifier with adaptive minimum-enclosing-ball (MEB) adjustment, called online CVMs (OCVMs). The OCVM algorithm has two features: (1) many training samples are permanently deleted during the training process without influencing the final trained classifier; (2) with a limited number of selected samples obtained in the training step, the classifier can be adjusted online based on newly arriving misclassified samples. Experiments on both synthetic and real-world data have shown the validity and effectiveness of the OCVM algorithm.

2.
Classification is a key problem in machine learning and data mining. Classification algorithms predict the class of a new instance after having been trained on data representing past experience in classifying instances. However, the presence of a large number of features in the training data can hurt the classification capacity of a machine learning algorithm. The feature selection problem involves discovering a subset of features such that a classifier built only with this subset attains predictive accuracy no worse than a classifier built from the entire set of features. Several algorithms have been proposed to solve this problem. In this paper we discuss how parallelism can be used to improve the performance of feature selection algorithms. In particular, we present, discuss and evaluate a coarse-grained parallel version of the feature selection algorithm FortalFS. This algorithm performs well compared with other solutions, and it has certain characteristics that make it a good candidate for parallelization. Our parallel design is based on the master-slave design pattern. Promising results show that this approach is able to achieve near-optimum speedups in the context of Amdahl's Law.
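
The reference to Amdahl's Law above can be made concrete with a short calculation. The sketch below is only an illustration of the general law, not of the FortalFS implementation; the function name `amdahl_speedup` and the 95% parallel fraction used in the demo are assumptions chosen for the example.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only part of a job can run in parallel.

    parallel_fraction: share of the sequential run time that parallelizes.
    workers: number of parallel workers (e.g. slaves in a master-slave design).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part is unchanged; the parallel part is divided among workers.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical workload in which 95% of the time is parallelizable.
    for n in (1, 2, 4, 8, 16):
        print(n, round(amdahl_speedup(0.95, n), 2))
```

A "near-optimum" speedup in this sense means the measured speedup approaches the bound returned by the function for the given number of workers, rather than approaching the worker count itself.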

3.
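Audio example recognition and retrieval based on incremental-learning support vector machines   (Total citations: 5; self-citations: 0; citations by others: 5)
The main task of audio example recognition and retrieval is to construct a good classifier. During construction, selecting the best training examples from a training library that contains redundant samples, and thereby saving training time, is a challenge, especially when recognizing audio examples from a large training library. Because support vectors are the key examples in a support vector machine, an incremental-learning SVM training algorithm is proposed. In this algorithm, the training samples are divided into training subsets that are trained batch by batch; in each round, only the support vectors are retained and the non-support vectors are discarded. Experiments comparing it with ordinary and decremental SVMs show that the algorithm achieves good recognition and retrieval accuracy while significantly reducing training time.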

4.
Most existing classification methods used for voice pathology assessment are built from labeled pathological and normal voice signals. This paper studies the problem of building a classifier from both labeled and unlabeled data. We propose a novel learning technique, called Partitioning and Biased Support Vector Machine Classification (PBSVM), which tries to utilize all the available data in two steps: (1) a new heuristic partition-based algorithm, which extracts high-quality pathological and normal samples from an unlabeled set, and (2) a more principled approach based on a biased formulation of the support vector machine, which is fairly robust to mislabeling and to imbalanced data. Experiments with wavelet-based energy features extracted from sustained vowels show that the new recognition scheme is highly feasible and significantly outperforms the baseline classical SVM classifier, especially when the labeled training data set is small.

5.
MILES: multiple-instance learning via embedded instance selection   (Total citations: 4; self-citations: 0; citations by others: 4)
Multiple-instance problems arise in situations where training class labels are attached to sets of samples (named bags) instead of to the individual samples within each bag (called instances). Most previous multiple-instance learning (MIL) algorithms were developed on the assumption that a bag is positive if and only if at least one of its instances is positive. Although the assumption works well in a drug activity prediction problem, it is rather restrictive for other applications, especially those in the computer vision area. We propose a learning method, MILES (multiple-instance learning via embedded instance selection), which converts the multiple-instance learning problem to a standard supervised learning problem that does not impose the assumption relating instance labels to bag labels. MILES maps each bag into a feature space defined by the instances in the training bags via an instance similarity measure. This feature mapping often provides a large number of redundant or irrelevant features. Hence, a 1-norm SVM is applied to select important features and construct classifiers simultaneously. We have performed extensive experiments. In comparison with other methods, MILES demonstrates competitive classification accuracy, high computational efficiency, and robustness to labeling uncertainty.
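
The bag-to-feature mapping described above can be sketched as follows. The abstract only says that bags are embedded "via an instance similarity measure"; the Gaussian similarity, the use of the closest instance in the bag, the parameter `sigma`, and all function names here are assumptions made for illustration. The 1-norm SVM step is not shown.

```python
import math


def instance_bag_similarity(concept, bag, sigma=1.0):
    """Similarity between one candidate instance and a bag.

    Assumed measure: a Gaussian of the squared distance to the closest
    instance in the bag, so the best-matching instance determines the score.
    """
    best = 0.0
    for instance in bag:
        sq_dist = sum((a - b) ** 2 for a, b in zip(concept, instance))
        best = max(best, math.exp(-sq_dist / sigma ** 2))
    return best


def embed_bag(bag, concepts, sigma=1.0):
    """Map a bag to one feature per candidate instance."""
    return [instance_bag_similarity(c, bag, sigma) for c in concepts]


def embed_all(training_bags, sigma=1.0):
    """Use every instance of every training bag as a feature dimension."""
    concepts = [inst for bag in training_bags for inst in bag]
    return [embed_bag(bag, concepts, sigma) for bag in training_bags]


if __name__ == "__main__":
    bags = [[(0.0, 0.0), (1.0, 0.0)], [(3.0, 4.0)]]
    for row in embed_all(bags):
        print([round(v, 4) for v in row])
```

Each bag becomes a fixed-length vector whose length equals the total number of training instances, which is why the mapping tends to produce many redundant features and a sparse feature selector is useful afterwards.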

6.

In this paper, we explore the adaptation of techniques previously used in the domains of adversarial machine learning and differential privacy to mitigate the ML-powered analysis of streaming traffic. Our findings are twofold. First, constructing adversarial samples effectively confounds an adversary with a predetermined classifier but is less effective when the adversary can adapt to the defense by using alternative classifiers or by training the classifier with adversarial samples. Second, differential-privacy guarantees are very effective against such statistical-inference-based traffic analysis, while remaining agnostic to the machine learning classifiers used by the adversary. We propose three mechanisms for enforcing differential privacy for encrypted streaming traffic and evaluate their security and utility. Our empirical implementation and evaluation suggest that the proposed statistical privacy approaches are promising solutions in the underlying scenarios.

7.
8.
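An SVM-based offline image object classification algorithm   (Total citations: 1; self-citations: 0; citations by others: 1)
Object classification is a key step in computer vision and pattern recognition. The support vector machine (SVM) is a new machine learning method built on statistical learning theory. This paper proposes an offline image object classification algorithm that combines an SVM with histogram-of-gradients features. The training set is first preprocessed, histogram-of-gradients features are then extracted from the processed images, and finally a classifier that can detect image objects is obtained through training. The resulting classifier is applied to test images, and the results show that it performs well for object classification and detection.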

9.
Joint optimization strategies across various layers of the protocol stack have recently been proposed for improving the performance of real-time video transmission over wireless networks. In this paper, we propose a new, low-complexity system for determining the optimal cross-layer strategies for wireless multimedia transmission based on classification and machine learning techniques. We first determine offline the optimal cross-layer strategy for various video sequences and channel conditions (training data). Subsequently, we extract relevant and easy-to-compute content features, encoder-specific parameters, and channel resources from the training data, and train a statistical classifier based on these optimal results. At run-time, the classifier predicts the optimal cross-layer compression and transmission strategy from these simple features, which are computed on the fly. Hence, we consider the complex problem of finding the optimal cross-layer strategy during the training phase only, and rely at transmission time on low-complexity classification techniques. We illustrate the proposed classification-based system by performing MAC-application layer optimizations for video transmission over 802.11a wireless LANs. Specifically, we predict the optimal MAC retry limits for the various video packets and compare our results against both optimal and conventionally used ad-hoc cross-layer solutions. Our results indicate that considerable improvements can be obtained through the proposed cross-layer techniques relying on classification as opposed to optimized ad-hoc solutions. The improvements are especially important at high packet-loss rates (5% and higher), where deploying a judicious mixture of strategies at the various layers becomes essential. Furthermore, our proposed classification-based system can be easily modified to include other layers from the OSI stack during the cross-layer optimization.

10.
This paper presents a novel approach to the automatic classification of very large data sets composed of terahertz pulse transient signals, highlighting their potential use in biochemical, biomedical, pharmaceutical and security applications. Two different types of THz spectra are considered in the classification process. First, a binary classification study of poly-A and poly-C ribonucleic acid samples is performed. This is then contrasted with a difficult multi-class classification problem involving spectra from six different powder samples which, although they have fairly indistinguishable features in the optical spectrum, possess a few discernible spectral features in the terahertz part of the spectrum. Classification is performed using a complex-valued extreme learning machine algorithm that takes into account features in both the amplitude and the phase of the recorded spectra. Classification speed and accuracy are contrasted with those achieved using a support vector machine classifier. The study systematically compares the classifier performance achieved after adopting different Gaussian kernels when separating amplitude and phase signatures. The two signatures are presented as feature vectors for both training and testing purposes. The study confirms the utility of complex-valued extreme learning machine algorithms for classification of the very large data sets generated with current terahertz imaging spectrometers. The classifier can take into consideration heterogeneous layers within an object, as would be required within a tomographic setting, and is sufficiently robust to detect patterns hidden inside noisy terahertz data sets. The proposed study opens up the opportunity for the establishment of complex-valued extreme learning machine algorithms as new chemometric tools that will assist the wider proliferation of terahertz sensing technology for chemical sensing, quality control, security screening and clinical diagnosis. Furthermore, the proposed algorithm should also be very useful in other applications requiring the classification of very large data sets.

11.
This paper addresses the classification problem for applications with extensive amounts of data and a large number of features. The learning system developed utilizes a hierarchical multiple classifier scheme and is flexible, efficient, highly accurate and of low cost. The system has several novel features: (1) it uses a graph-theoretic clustering algorithm to group the training data into possibly overlapping clusters, each representing a dense region in the data space; (2) component classifiers trained on these dense regions are specialists whose probabilistic outputs are gated inputs to a super-classifier, and only those classifiers whose training clusters are most related to an unknown data instance send their outputs to the super-classifier; and (3) sub-class labelling is used to improve the classification of super-classes. The learning system achieves the goals of reducing the training cost and increasing the prediction accuracy compared to other multiple classifier algorithms. The system was tested on three large sets of data, two from the medical diagnosis domain and one from a forest cover classification problem. The results are superior to those obtained by several other learning algorithms.

12.
Isotonic separation is a supervised machine learning technique in which classification is represented as a linear programming problem (LPP) with the objective of minimizing the number of misclassifications. Solving the LPP with traditional methods becomes computationally expensive as the dataset grows. Evolutionary isotonic separation (EIS), a hybrid classification algorithm, is introduced to tackle this issue. Here, isotonic separation acts as a host architecture in which an evolutionary framework based on a genetic algorithm is embedded in the training phase to find an optimum or near-optimum solution for the LPP. The evolutionary framework deploys a newly introduced slack vector to find a feasible solution. It also employs a position-based crossover operator to obtain the optimum or near-optimum solution. Experimental studies are conducted on the Wisconsin Breast Cancer dataset and a synthetic dataset. Experimental and statistical results show that EIS outperforms its predecessors and state-of-the-art machine learning techniques in terms of accuracy.

13.
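Traditional machine learning methods assume that the training data and the test data follow the same distribution. In some real-world applications, however, the training and test data come from different domains. If the data distribution is ignored, traditional machine learning algorithms may fail. To address this problem, a text transfer learning algorithm based on fuzzy C-means (FCM) is proposed. First, a simple classifier classifies the test samples; next, the natural neighbor algorithm is used to build the initial fuzzy memberships of the samples; then, the FCM algorithm iteratively updates the fuzzy memberships and corrects the sample labels; finally, isolated samples are processed to obtain the final classification result. Experimental results show that the algorithm achieves good accuracy and effectively solves the text classification problem when the training and test data follow different distributions.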
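
The iterative membership update in the FCM step can be illustrated with the standard fuzzy C-means equations. This is a minimal sketch under stated assumptions: the initial memberships are simply passed in (the simple classifier, the natural neighbor construction and the isolated-sample handling are not reproduced), the fuzzifier `m = 2` and the iteration count are arbitrary choices, and every function name is hypothetical.

```python
import math


def _dist(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def update_centers(points, memberships, m=2.0):
    """Each center is the mean of all points weighted by membership**m."""
    n_clusters = len(memberships[0])
    dim = len(points[0])
    centers = []
    for j in range(n_clusters):
        weights = [u[j] ** m for u in memberships]
        total = sum(weights)
        centers.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ))
    return centers


def update_memberships(points, centers, m=2.0, eps=1e-12):
    """Standard FCM update: closer centers receive higher membership."""
    memberships = []
    for p in points:
        # Clamp distances so a point sitting on a center does not divide by 0.
        dists = [max(_dist(p, c), eps) for c in centers]
        row = []
        for dj in dists:
            denom = sum((dj / dk) ** (2.0 / (m - 1.0)) for dk in dists)
            row.append(1.0 / denom)
        memberships.append(row)
    return memberships


def refine_memberships(points, memberships, m=2.0, iterations=20):
    """Alternate center and membership updates from initial memberships."""
    for _ in range(iterations):
        centers = update_centers(points, memberships, m)
        memberships = update_memberships(points, centers, m)
    return memberships


if __name__ == "__main__":
    points = [(0.0,), (0.2,), (5.0,), (5.2,)]
    initial = [[0.6, 0.4], [0.6, 0.4], [0.4, 0.6], [0.4, 0.6]]
    for row in refine_memberships(points, initial):
        print([round(v, 3) for v in row])
```

A corrected label for each sample can then be read off as the cluster with the largest membership in its row.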

14.
We consider the problem of classification in environments where training and test data may come from different probability distributions. When the fundamental stationary distribution assumption made in supervised learning (and often not satisfied in practice) does not hold, the classifier performance may deteriorate significantly. Several proposals have been made to deal with classification problems where the class priors change after training, but they may fail when the class-conditional data densities also change. To cope with this problem, we propose an algorithm that uses unlabeled test data to adapt the classifier outputs to new operating conditions without re-training the classifier. The algorithm is based on a posterior probability model with two main assumptions: (1) the classes may be decomposed into several (unknown) subclasses, and (2) all changes in data distributions arise from changes in prior subclass probabilities. Experimental results with a neural network model on synthetic data and on practical remote sensing settings show that adaptation at the subclass level adjusts better to the new operating conditions than methods based on class prior changes.
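
For context, the class-prior baseline that the abstract contrasts itself with can be sketched in a few lines. This is not the subclass-level algorithm proposed in the paper; it is the common class-level correction, in which posteriors are re-weighted by the ratio of new to training priors and the new priors are estimated from unlabeled test data with an EM-style loop. All names and the iteration count are assumptions for illustration.

```python
def adjust_posterior(posterior, train_priors, new_priors):
    """Re-weight one posterior vector for changed class priors, then renormalize."""
    weighted = [
        p * new / old
        for p, old, new in zip(posterior, train_priors, new_priors)
    ]
    total = sum(weighted)
    return [w / total for w in weighted]


def estimate_new_priors(posteriors, train_priors, iterations=50):
    """Estimate test-time priors from classifier outputs on unlabeled data.

    posteriors: one probability vector per unlabeled test sample, produced
    by a classifier trained under train_priors. No re-training is needed.
    """
    priors = list(train_priors)
    for _ in range(iterations):
        # E-step: adapt every output to the current prior estimate.
        adjusted = [adjust_posterior(p, train_priors, priors) for p in posteriors]
        # M-step: the new prior is the average adapted posterior per class.
        priors = [sum(col) / len(adjusted) for col in zip(*adjusted)]
    return priors


if __name__ == "__main__":
    outputs = [[0.9, 0.1]] * 8 + [[0.2, 0.8]] * 2
    new_priors = estimate_new_priors(outputs, [0.5, 0.5])
    print([round(p, 3) for p in new_priors])
    print([round(p, 3) for p in adjust_posterior([0.5, 0.5], [0.5, 0.5], new_priors)])
```

This correction assumes that only the class priors move. When the class-conditional densities change as well, which is the case the abstract targets, a class-level re-weighting of this kind can fail.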

15.
《Information Fusion》2007,8(3):252-265
This work developed and demonstrated a machine learning approach for robust ATR. The primary innovation of this work was an automated way of developing inference rules that can draw on multiple models and multiple feature types to make robust ATR decisions. The key realization is that this "meta learning" problem is one of structural learning, and that it can be conducted independently of the parameter learning associated with each model- and feature-based technique. This was accomplished by using a learning classifier system (LCS), which is based on genetics-based machine learning, for the ill-conditioned combinatorial problem of structural rule learning, while using statistical and mathematical techniques for parameter learning. The system was tested on MSTAR Public Release SAR data using standard and extended operating conditions. The results were also compared against two baseline classifiers, a PCA-based distance classifier and an MSE classifier. The classifiers were evaluated for accuracy (via training set classification) and robustness (via testing set classification). In both cases, the LCS-based robust ATR system performed well, with accuracy over 99% and robustness over 80%.

16.
Real-world datasets often contain large numbers of unlabeled data points because obtaining labels incurs additional cost. Semi-supervised learning (SSL) algorithms use both labeled and unlabeled data points for training, which can result in higher classification accuracy on these datasets. Generally, traditional SSLs tentatively label the unlabeled data points on the basis of the smoothness assumption that neighboring points should have the same label. When this assumption is violated, unlabeled points are mislabeled, injecting noise into the final classifier. An alternative SSL approach is cluster-then-label (CTL), which partitions all the data points (labeled and unlabeled) into clusters and creates a classifier by using those clusters. CTL is based on the less restrictive cluster assumption that data points in the same cluster should have the same label. As shown, this allows CTLs to achieve higher classification accuracy on many datasets where the cluster assumption holds for the CTLs but smoothness does not hold for the traditional SSLs. However, cluster configuration problems (e.g., irrelevant features, insufficient clusters, and incorrectly shaped clusters) could violate the cluster assumption. We propose a new framework for CTLs that uses a genetic algorithm (GA) to evolve classifiers without the cluster configuration problems (e.g., the GA removes irrelevant attributes, updates the number of clusters, and changes the shape of the clusters). We demonstrate that a CTL based on this framework achieves accuracy comparable to or higher than that of both traditional SSLs and CTLs on 12 University of California, Irvine machine learning datasets.
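
The labeling half of cluster-then-label can be sketched as a majority vote inside each cluster. The example assumes the clustering has already been done by some other routine and only shows how the cluster assumption turns a few labels into labels for every point; the GA-based evolution of the cluster configuration is not reproduced, and the function name and `fallback` parameter are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_then_label(cluster_ids, labels, fallback=None):
    """Give every point the majority label of its cluster.

    cluster_ids: cluster index for each point (from any clustering routine).
    labels: known label for each point, or None when the point is unlabeled.
    fallback: label used for clusters that contain no labeled point.
    """
    votes = defaultdict(Counter)
    for cid, lab in zip(cluster_ids, labels):
        if lab is not None:
            votes[cid][lab] += 1
    cluster_label = {
        cid: counts.most_common(1)[0][0] for cid, counts in votes.items()
    }
    return [cluster_label.get(cid, fallback) for cid in cluster_ids]


if __name__ == "__main__":
    cluster_ids = [0, 0, 0, 1, 1, 2]
    labels = ["a", None, "a", None, "b", None]
    print(cluster_then_label(cluster_ids, labels))
```

Cluster 2 in the demo has no labeled member, so its point receives the fallback value; an insufficient or badly shaped clustering leaves more points in that situation, which is one of the configuration problems the abstract mentions.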

17.
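To address the problem that many multiple-instance algorithms make assumptions about the instances in positive bags, a multiple-instance ensemble algorithm combined with fuzzy clustering (ISFC) is proposed. Drawing on fuzzy clustering and the characteristics of negative bags in multiple-instance learning, the concept of a "positive score" is introduced to measure the likelihood that an instance's label is positive, which reduces the ambiguity of instance labels in multiple-instance learning. Considering that misclassifying negative instances is more costly in multiple-instance learning, a strategy for selecting the representative instances of a bag is designed; the selected representative inst...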

18.
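In imbalanced data classification, the minority class, which is the target of interest, is often hard to identify. Common methods have drawbacks such as requiring instance importance to be set explicitly and supporting minority-class recognition only indirectly. This paper therefore proposes an instance-importance-based support vector machine, IISVM. It has three stages. The first two stages use a one-class support vector machine and a binary support vector machine, respectively, to reorganize the data into three tiers: "most important", "fairly important" and "unimportant". Stage 3 first selects the most important data to train an initial classifier and directly supports minority-class recognition by explicitly setting an early-stopping condition. Experiments show that the average classification performance of IISVM is better than that of current mainstream methods.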

19.
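Multi-instance multi-label learning (MIML) is a new machine learning framework in which each sample consists of multiple instances and is associated with multiple classes. Because it represents polysemous objects well, the framework has become a hot topic in machine learning research. The most direct way to solve MIML classification is a degeneration strategy: by degenerating to multi-instance learning or multi-label learning, the classification problem under the MIML framework is reduced to a series of binary classification problems. However, the degeneration loses the correlation information between labels and lowers classification accuracy. To address this, this paper proposes the MIMLSVM-LOC algorithm, which combines an improved MIMLSVM algorithm with ML-LOC, a method that exploits local label correlations, so that label correlation information is used during training. The algorithm first improves the K-medoids clustering step of MIMLSVM by adopting a mixed Hausdorff distance, converts each bag of instances into a single instance, and thereby degenerates the MIML problem. The single-instance multi-label algorithm ML-LOC then carries out the remaining classification. In experiments, comparison with other multi-instance multi-label algorithms shows that the proposed algorithm achieves better classification results.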

20.
The challenges of classifying large-scale, high-dimensional datasets are: (1) training and classification impose a huge computational burden; (2) storing the many training samples requires a large amount of storage; and (3) it is difficult to determine decision rules in high-dimensional data. The nonlinear support vector machine (SVM) is a popular classifier that performs well on high-dimensional datasets. However, it easily overfits, especially when the data are not evenly distributed. Recently, the profile support vector machine (PSVM) was proposed to solve this problem. Because local learning is superior to global learning, multiple linear SVM models are trained to achieve performance similar to that of a nonlinear SVM model. However, PSVM is inefficient in the training phase. In this paper, we propose a fast classification strategy for PSVM that reduces both the training time and the classification time. We first choose border samples near the decision boundary from the training samples. Then, the reduced training samples are clustered into several local subsets with the MagKmeans algorithm, for which we propose a fast search method to find the optimal solution. The clusters are used to learn the multiple linear SVM models. Both artificial and real datasets are used to evaluate the performance of the proposed method. The experimental results show that the proposed method avoids both overfitting and underfitting, and that the proposed strategy is effective and efficient.
