Similar Literature
20 similar documents found
1.
For detecting malicious bidding activities in e-auctions, this study develops a chunk-based incremental learning framework that can operate in real-world auction settings. The self-adaptive framework first classifies incoming bidder chunks to counter fraud in each auction and take any necessary actions. The fraud classifier is then adjusted with confident bidders' labels, validated via bidder verification and one-class classification. Based on real fraud data produced from commercial auctions, we conduct an extensive experimental study in which the classifier is adapted incrementally using only relevant bidding data, evaluating the detection and misclassification rates of the adjusted models. We also compare our classifier with static learning and with learning without data relevancy.

2.
Incremental learning has been developed for supervised classification, where knowledge is accumulated and represented incrementally during the learning process. However, labeling sufficient samples in each data chunk is costly, and incremental techniques are seldom discussed in the semi-supervised paradigm. In this paper we propose an Incremental Semi-Supervised classification approach via Self-Representative Selection (IS3RS) for data stream classification that exploits both labeled and unlabeled dynamic samples. An incremental self-representative data selection strategy finds the most representative exemplars in each sequential data chunk. These exemplars are incrementally labeled to expand the training set, accumulating knowledge over time to benefit future prediction. Extensive experimental evaluations on benchmark datasets demonstrate the effectiveness of the proposed framework.

3.
An Incremental Bayesian Learning Algorithm Based on Class Support
丁厉华, 张小刚. 《计算机工程》, 2008, 34(22): 218-219
This paper introduces the principle of the incremental Bayesian classifier and proposes an optimized incremental Bayesian learning algorithm based on class support. For the sample-selection problem in incremental learning, the algorithm introduces a class support factor λ and, according to its value, iteratively selects samples from the test set to add to the classifier. Experiments show that with a small training set, the algorithm achieves higher accuracy than the original incremental Bayesian algorithm and greatly reduces the computation time of sample selection.
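A minimal Python sketch of the idea in this abstract: an incremental naive Bayes classifier that admits new samples only when their "class support" clears a threshold. The margin-based definition of support, the binary-feature Laplace smoothing, and the name `lam` for the factor λ are illustrative assumptions, since the abstract does not give the exact formula.

```python
import math
from collections import defaultdict

class IncrementalNB:
    """Incremental naive Bayes over discrete features. 'Class support' is
    sketched here as the posterior margin between the top two classes; the
    threshold `lam` stands in for the paper's factor lambda (assumption)."""

    def __init__(self, lam=0.3):
        self.lam = lam
        self.class_counts = defaultdict(int)
        # per-class counts of (feature_index, feature_value) pairs
        self.feat_counts = defaultdict(lambda: defaultdict(int))

    def fit_one(self, x, y):
        self.class_counts[y] += 1
        for i, v in enumerate(x):
            self.feat_counts[y][(i, v)] += 1

    def posterior(self, x):
        total = sum(self.class_counts.values())
        scores = {}
        for c, nc in self.class_counts.items():
            logp = math.log(nc / total)
            for i, v in enumerate(x):
                # Laplace smoothing, assuming binary features (hence +2)
                logp += math.log((self.feat_counts[c][(i, v)] + 1) / (nc + 2))
            scores[c] = logp
        m = max(scores.values())
        exp = {c: math.exp(s - m) for c, s in scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}

    def predict(self, x):
        p = self.posterior(x)
        return max(p, key=p.get)

    def select_and_learn(self, pool):
        """One pass over unlabeled samples: absorb those whose class support
        (top-two posterior margin) reaches lam, labeled with the prediction."""
        added = []
        for x in pool:
            probs = sorted(self.posterior(x).values(), reverse=True)
            support = probs[0] - (probs[1] if len(probs) > 1 else 0.0)
            if support >= self.lam:
                y = self.predict(x)
                self.fit_one(x, y)
                added.append((x, y))
        return added
```

Only high-support samples enter the model, which is what keeps early mislabeled samples from polluting the classifier.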

4.
In this paper, we investigate a comprehensive learning algorithm for text classification without a pre-labeled training set, based on incremental learning. To overcome the high cost of obtaining labeled training examples, the approach reforms fuzzy partition clustering to obtain a small quantity of labeled training data, then applies incremental learning of a Bayesian classifier. The proposed classifier combines a naive-Bayes-based incremental learning algorithm with a modified fuzzy partition clustering method. For improved efficiency, a feature reduction based on the quadratic entropy of mutual information is designed. Experiments demonstrate that the approach is feasible and effective.
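The feature-reduction step can be sketched as ranking features by their mutual information with the class labels and keeping the top k. Note this sketch uses standard Shannon mutual information as a stand-in for the paper's quadratic-entropy variant, which the abstract does not specify.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in nats between two discrete
    sequences of equal length."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        pj = c / n
        mi += pj * math.log(pj / ((px[x] / n) * (py[y] / n)))
    return mi

def select_features(X, y, k):
    """Rank the feature columns of X by mutual information with labels y
    and return the indices of the top k."""
    d = len(X[0])
    scores = [(mutual_information([row[j] for row in X], y), j) for j in range(d)]
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]
```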

5.
Incremental learning has been used extensively for data stream classification, but most attention has been paid to non-evolutionary methods. In this paper, we introduce new incremental learning algorithms based on harmony search. We first propose a classification algorithm for batch data, the harmony-based classifier, and then give its incremental version for data stream classification, the incremental harmony-based classifier. Finally, we improve it to reduce its computational overhead in the absence of drifts and to increase its robustness in the presence of noise; this version is called the improved incremental harmony-based classifier. The proposed methods are evaluated on real-world and synthetic data sets. Experimental results show that the proposed batch classifier outperforms several batch classifiers and that the proposed incremental methods effectively address the issues usually encountered in data stream environments. The improved incremental harmony-based classifier captures concept drifts with significantly better speed and accuracy than the non-incremental harmony-based method, and its accuracy is comparable to non-evolutionary algorithms. The experimental results also demonstrate its robustness.

6.
Applying the Co-training Machine Learning Method to Chinese Chunk Identification
Chinese chunk identification is implemented with the semi-supervised machine learning method co-training. The paper first formalizes the definitions of Chinese chunks and of the co-training algorithm, then proposes an agreement-based co-training selection method that combines a transductive hidden Markov model (Transductive HMM) and a transformation-rule-based classifier (fnTBL) into one classification system, and compares it with self-training. Chinese chunk identification experiments on a small-scale Chinese treebank corpus plus a large-scale unlabeled Chinese corpus outperform using the small treebank corpus alone, with F-scores reaching 85.34% and 83.41%, improvements of 2.13% and 7.21% respectively.

7.
Incremental learning has been widely addressed in the machine learning literature to cope with learning tasks where the environment is ever changing or training samples become available over time. However, most research explores incremental learning with statistical algorithms or neural networks rather than evolutionary algorithms. The work in this paper employs genetic algorithms (GAs) as the basic learning algorithms for incremental learning within one or more classifier agents in a multiagent environment. Four new approaches with different initialization schemes are proposed. They keep the old solutions and use an "integration" operation to integrate them with new elements to accommodate new attributes, while biased mutation and crossover operations further evolve a reinforced solution. Simulation results on benchmark classification data sets show that the proposed approaches can deal with the arrival of new input attributes and integrate them with the original input space. They can also be used successfully for incremental learning and improve classification rates compared to a retrained GA. Possible applications to continuous incremental training and feature selection are also discussed.

8.
To address the standard support vector machine's lack of incremental learning support in P2P network traffic identification, a fast SVM incremental learning method suited to P2P traffic identification is proposed. After clustering the newly added positive and negative samples that violate the Karush-Kuhn-Tucker (KKT) conditions, the cluster centers are used to train the SVM into a transitional hyperplane close to the optimal separating hyperplane of incremental learning. With this hyperplane as the reference, the mutual conversions between non-support vectors and support vectors in the initial training set are determined, and a new sample set is generated to realize SVM incremental learning. Theoretical analysis and experimental results show that the method effectively reduces the incremental training set and markedly shortens SVM incremental learning and identification time without degrading P2P traffic identification accuracy.
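The KKT-violation test at the heart of this method can be sketched for a linear SVM as follows. A sample unseen by the trained model carries an implicit multiplier α = 0, so KKT optimality requires a functional margin of at least 1; the per-class mean below is a deliberately crude stand-in for the paper's cluster analysis.

```python
import numpy as np

def kkt_violators(w, b, X_new, y_new):
    """Indices of new samples that violate the KKT optimality of the current
    hyperplane: for an unseen point (implicit alpha = 0) the condition is
    y * (w.x + b) >= 1."""
    margins = y_new * (X_new @ w + b)
    return np.flatnonzero(margins < 1.0)

def class_centers(X, y):
    """One centre per class label -- a simplistic stand-in for clustering the
    KKT-violating positive and negative samples."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}
```

Training on the centers instead of all violators is what buys the speed-up the abstract reports.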

9.
In traditional models for acquiring network security situation elements, when the sample distribution is imbalanced, minority classes (small samples) cannot be detected effectively, and accurately identifying every class of attack sample has become a research focus. A deep-learning-based element acquisition model for imbalanced samples is proposed: a convolutional neural network serves as the base classifier to extract deep features of the network data, and a generative adversarial network (GAN) augments the small-sample classes to resolve the distribution imbalance. Transfer learning on the augmented, balanced dataset speeds up retraining the base classifier into a new classifier adapted to the small samples. Experiments on the NSL-KDD dataset show that the GAN-augmented dataset combined with transfer learning effectively accelerates training convergence and improves the classification accuracy of network security situation element acquisition.

10.
Chinese Question Classification Based on an Incremental Bayesian Model
A classifier built from a fixed training set performs unsatisfactorily and cannot track user needs, so a method applying incremental Bayesian ideas to question classification is proposed. A genetic algorithm selects the optimal feature subset to optimize the classifier and avoid overly redundant training-set features, letting the classifier dynamically enlarge the training set and revise its parameters during learning. For classification, the interrogative word, syntactic structure, question-focus word, and the first sememe of the question-focus word in HowNet are extracted as features. To validate the incremental Bayesian method, questions of different scales were randomly drawn from the corpus to form increment sets, and the same test set was classified under each. Experimental results show that the incremental Bayesian classifier achieves higher accuracy than naive Bayes, reaching 90.2% on coarse classes and 76.3% on fine classes, while also improving runtime efficiency.

11.
In recent years, the use of multi-view data has attracted much attention, resulting in many multi-view batch learning algorithms. However, these algorithms prove expensive in training time and memory when applied to incremental data. In this paper, we propose Multi-view Incremental Discriminant Analysis (MvIDA), which updates a trained model to incorporate new data samples, requiring only the old model and the newly added data. Depending on the nature of the increments, MvIDA is presented in two forms, sequential MvIDA and chunk MvIDA. We compare the proposed method against batch Multi-view Discriminant Analysis (MvDA) on discriminability, order independence, the effect of the number of views, training time, and memory requirements, and against single-view Incremental Linear Discriminant Analysis (ILDA) on accuracy and training time. Experiments on four datasets with a wide range of dimensions per view show that, through order independence and faster construction of the optimal discriminant subspace, MvIDA addresses the issues faced by batch multi-view algorithms in the incremental setting.

12.

In this paper, we propose the problem of online cost-sensitive classifier adaptation and the first algorithm to solve it. We assume a base classifier for a cost-sensitive classification problem that was trained under a cost setting different from the desired one, together with training data samples streaming to the algorithm one by one. The problem is to adapt the given base classifier to the desired cost setting using the streaming training samples online. To solve it, we learn a new classifier by adding an adaptation function to the base classifier and updating the adaptation function's parameter from the streaming data samples. Given an input sample and the cost of misclassifying it, we update the parameter by minimizing the cost-weighted hinge loss while simultaneously respecting the previously learned parameter. The proposed algorithm is compared to both online and offline cost-sensitive algorithms on two cost-sensitive classification problems; the experiments show that it not only outperforms them in classification performance but also requires significantly less running time.
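The update described above can be sketched as a per-sample subgradient step. Everything below is an illustrative reading of the abstract, not the paper's exact update: the adapted score is base_score(x) + v·x, the learning rate `eta` is a free choice, and the damping `mu` stands in for the term that keeps v close to its previous value.

```python
import numpy as np

class CostSensitiveAdapter:
    """Sketch of online cost-sensitive adaptation: a linear adaptation
    function v.x is added to a fixed base classifier's score, and v is
    updated per streaming sample by a damped subgradient step on the
    cost-weighted hinge loss cost * max(0, 1 - y * score(x))."""

    def __init__(self, base_score, dim, eta=0.1, mu=0.01):
        self.base_score = base_score   # callable: x -> float
        self.v = np.zeros(dim)
        self.eta, self.mu = eta, mu

    def score(self, x):
        return self.base_score(x) + self.v @ x

    def update(self, x, y, cost):
        # step only when the margin constraint is violated; the damping
        # factor 1/(1 + mu) plays the role of the proximity term
        if y * self.score(x) < 1.0:
            self.v += (self.eta / (1.0 + self.mu)) * cost * y * x
```

Higher misclassification costs produce proportionally larger corrections, which is the cost-sensitive part of the scheme.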


13.
When training samples are incomplete, incremental Bayesian learning over new training sets of unknown class can add misclassified training samples to the classifier too early and degrade its performance; moreover, using a fixed confidence-evaluation parameter makes incremental learning inefficient and its generalization unstable. To solve these problems, a sequence-selection incremental learning method with dynamic confidence is proposed. First, texts classified correctly by the existing classifier are selected to form a new training subset. Second, confidence that dynamically monitors classifier performance drives batch instance selection on that subset. Finally, a reasonable learning sequence reinforces the positive influence of complete data, weakens the negative influence of noisy data, and classifies the test texts. Experimental results show that the method markedly improves incremental learning efficiency while effectively raising classification accuracy.
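The selection step can be sketched as follows. The adjustment rule tau = base_tau * (2 - accuracy), which tightens the threshold as chunk accuracy drops, is a hypothetical stand-in for the paper's dynamic confidence monitor, which the abstract does not specify.

```python
import numpy as np

def select_by_dynamic_confidence(probs, preds, labels, base_tau=0.7):
    """From a new chunk, keep only samples the current classifier labels
    correctly, then filter by a confidence threshold that tightens when the
    classifier performs worse on the chunk.

    probs  -- classifier confidence for each predicted label
    preds  -- predicted labels;  labels -- true labels
    """
    correct = preds == labels
    acc = correct.mean()
    # lower chunk accuracy -> stricter threshold (illustrative rule)
    tau = min(0.95, base_tau * (2.0 - acc))
    return np.flatnonzero(correct & (probs >= tau))
```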

14.
肖建鹏, 张来顺, 任星. 《计算机应用》, 2008, 28(7): 1642-1644
To address the low accuracy, slow learning, and extensive backtracking of the transductive support vector machine when classifying large volumes of data, a transductive SVM classification algorithm based on incremental learning is proposed. Incremental learning is introduced into the transductive SVM so that during training only useful samples are retained and useless ones discarded, reducing learning time and increasing classification speed. Experimental results show that the algorithm achieves faster classification with higher accuracy.

15.
This paper demonstrates, from both theoretical and experimental perspectives, that existing SVM incremental learning algorithms lose information from incremental samples, and proposes a new hyperplane-distance-based incremental learning algorithm for support vector machines. Exploiting the geometric character of support vectors, the algorithm uses the distance to the hyperplane to extract samples, selects those most likely to become support vectors to form an edge vector set, and trains the support vector machine on that set. This reduces the number of training samples and effectively improves the training speed of incremental learning. Experiments on Chinese webpage classification show that the HD-SVM algorithm reduces the number of training samples effectively, accumulates historical information, and attains higher training speed and better classification precision.
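The hyperplane-distance filter can be sketched for a linear SVM as below. The threshold `delta` is an illustrative parameter; the abstract does not give the paper's exact selection rule.

```python
import numpy as np

def select_near_hyperplane(w, b, X, y, delta=1.0):
    """Keep only samples within geometric distance `delta` of the current
    hyperplane w.x + b = 0 -- the points most likely to become support
    vectors in the next round of training."""
    dist = np.abs(X @ w + b) / np.linalg.norm(w)
    keep = dist < delta
    return X[keep], y[keep]
```

Points far from the boundary cannot become support vectors of a similar hyperplane, so discarding them shrinks the training set without losing the margin-defining examples.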

16.
Incremental training has been used for genetic algorithm (GA)-based classifiers in dynamic environments where training samples or new attributes/classes become available over time. In this article, ordered incremental genetic algorithms (OIGAs) are proposed to address the incremental training of input attributes for classifiers. Rather than learning input attributes in batch as with normal GAs, OIGAs learn input attributes one after another, and the resulting classification rule sets are evolved incrementally to accommodate the new attributes. Furthermore, attributes are arranged in different orders by evaluating their individual discriminating ability. By experimenting with different attribute orders, different OIGA approaches are evaluated on four benchmark classification data sets, and their performance is compared with normal GAs. The simulation results show that OIGAs generally achieve better performance than normal GAs, and that the order of attributes does affect final classifier performance, with OIGA training in descending order of attributes performing best. © 2004 Wiley Periodicals, Inc. Int J Int Syst 19: 1239-1256, 2004.

17.
莫建文, 陈瑶嘉. 《控制与决策》, 2021, 36(10): 2475-2482
To address the catastrophic forgetting that arises when a neural network model is trained class-incrementally, a class-incremental learning method based on a variational pseudo-sample generator constrained by classification features is proposed. First, a pseudo-sample generator memorizes old-class samples to train the new classifier and the new pseudo-sample generator; the generator is built on a variational autoencoder and constrained with classification features so that the generated samples better preserve the old classes' performance on the classifier. Then, the old classifier's outputs serve as distillation labels for the pseudo-samples, further retaining the knowledge obtained from the old classes. Finally, to balance the number of generated old-class samples, classifier-score-based pseudo-sample selection picks more representative old-class pseudo-samples while keeping each old class's pseudo-sample count balanced. Experimental results on the MNIST, FASHION, E-MNIST, and SVHN datasets show that the method effectively mitigates catastrophic forgetting and improves image classification accuracy.

18.
To remedy the shortcomings of the classical support vector machine in incremental learning, an incremental learning algorithm for the proximal support vector machine based on the cloud model is proposed. The method exploits the proximal SVM's fast learning ability to generate an initial separating hyperplane, reduces the full training set together with the k-nearest-neighbor rule, and builds a cloud-model classifier on the resulting smaller reduced set for direct classification. The model is simple, requires no iterative solving, has low time complexity and good noise tolerance, and reflects the distribution of newly added samples well. Simulation experiments show that the algorithm maintains good classification accuracy and generalization ability with fast computation.

19.
Audio Example Recognition and Retrieval Based on Incremental Learning Support Vector Machines
The main task in audio example recognition and retrieval is building a good classification learner, and a key challenge in doing so is selecting the best training examples from a training library containing redundant samples so as to save training time, especially when recognizing audio examples from large-sample libraries. Since support vectors are the key examples in a support vector machine, an incremental learning SVM training algorithm is proposed: the training samples are divided into sub-libraries trained batch by batch, and after each batch only the support vectors are retained while non-support vectors are discarded. Experiments comparing against ordinary and decremental SVMs show that the algorithm achieves good recognition and retrieval accuracy while significantly reducing training time.
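The batch-wise "keep only support vectors" loop can be sketched as follows. The tiny Pegasos-style trainer (no bias term) is a stand-in, not the paper's SVM implementation, and "support vectors" are approximated as points whose functional margin is at most 1 under the current model.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Tiny Pegasos-style linear SVM without bias -- an illustrative
    trainer for the sketch below."""
    rng = np.random.default_rng(seed)
    w, t = np.zeros(X.shape[1]), 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * (X[i] @ w)
            w = (1.0 - eta * lam) * w          # regularization shrinkage
            if margin < 1.0:
                w = w + eta * y[i] * X[i]      # hinge subgradient step
    return w

def incremental_sv_training(chunks, lam=0.01):
    """Train chunk by chunk, carrying forward only the (approximate)
    support vectors: points with functional margin <= 1."""
    d = chunks[0][0].shape[1]
    X_keep, y_keep = np.empty((0, d)), np.empty(0)
    w = np.zeros(d)
    for X_c, y_c in chunks:
        X_all = np.vstack([X_keep, X_c])
        y_all = np.concatenate([y_keep, y_c])
        w = train_linear_svm(X_all, y_all, lam)
        sv = y_all * (X_all @ w) <= 1.0 + 1e-6
        X_keep, y_keep = X_all[sv], y_all[sv]
    return w
```

Because only margin-defining points survive each round, the working set stays small while the history of earlier batches is still represented.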

20.
杨海涛, 肖军, 王佩瑶, 王威. 《信息与控制》, 2016, 45(4): 432-436, 443
When handling large volumes of time-series data or data streams, classification training of the twin parametric-margin support vector machine (TPMSVM) remains slow. This paper derives the numerical conditions under which samples satisfy the KKT conditions of TPMSVM and, based on this result, proposes an incremental learning algorithm for TPMSVM suited to time-series data. The algorithm selects, for classifier training, the newly added data that violate the generalized KKT conditions together with part of the original data that satisfy them. Experiments show that the proposed incremental algorithm improves TPMSVM training speed while maintaining classification accuracy.
