Similar Documents
 20 similar documents found (search time: 218 ms)
1.
An improved method for selective neural network ensembles is proposed. First, a pool of individual neural networks is constructed and trained in parallel on several training sets generated with the Bootstrap algorithm. Then, a clustering algorithm computes the diversity among the trained individual networks and each network's prediction accuracy on a validation set. Finally, suitable individual networks are selected into the ensemble according to their accuracy and diversity. Experimental results confirm that this ensemble method effectively improves the ensemble's prediction accuracy and generalization ability.
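
The selection step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: candidate networks are represented only by their validation-set predictions, and the accuracy and diversity thresholds are assumed values.

```python
# Hypothetical sketch of accuracy- and diversity-based ensemble selection.
# `candidates` maps a model name to its predictions on a shared validation set.

def accuracy(preds, truth):
    return sum(p == t for p, t in zip(preds, truth)) / len(truth)

def disagreement(a, b):
    return sum(x != y for x, y in zip(a, b)) / len(a)

def select_members(candidates, truth, min_acc=0.6, min_div=0.2):
    """Greedily keep accurate models that disagree enough with those kept."""
    kept = {}
    for name, preds in sorted(candidates.items(),
                              key=lambda kv: -accuracy(kv[1], truth)):
        if accuracy(preds, truth) < min_acc:
            continue
        if all(disagreement(preds, p) >= min_div for p in kept.values()):
            kept[name] = preds
    return kept

truth = [1, 0, 1, 1, 0]
candidates = {
    "net_a": [1, 0, 1, 1, 0],   # perfect, kept first
    "net_b": [1, 0, 1, 1, 1],   # accurate and different enough, kept
    "net_c": [1, 0, 1, 1, 0],   # duplicate of net_a -> rejected
    "net_d": [0, 1, 0, 0, 1],   # too inaccurate -> rejected
}
selected = select_members(candidates, truth)
print(sorted(selected))  # ['net_a', 'net_b']
```

The greedy order (best accuracy first) ensures a redundant copy of an already-kept network is filtered out by the diversity test.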

2.
周末  金敏 《计算机应用》2017,37(11):3317-3322
To improve the accuracy of short-term electric load forecasting, a method combining multiple algorithms and multiple models with online second-stage learning is proposed for the first time. First, input variables are selected using mutual information and statistical methods. Then, the data set is sampled for diversity with the Bootstrap method, and several strongly differentiated heterogeneous forecasting models are trained with different artificial intelligence and machine learning algorithms. Finally, for each time point to be predicted, a new training set is formed from the recent actual load values and the load forecasts of the heterogeneous models produced in the first stage; online second-stage learning on this new training set yields the final forecast. In a forecasting study on load data from Guangzhou, China, compared with the best single model, the single-algorithm multi-model approach, and the multi-algorithm single-model approach, the annual mean absolute percentage error (MAPE) for daily total load decreased by 21.07%, 7.64%, and 5.00%, respectively, and for daily peak load by 16.02%, 7.60%, and 13.14%, respectively. The results show that the proposed method effectively improves load forecasting accuracy, benefiting smart-grid energy saving, refined dispatch management, and grid security early warning.
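
The second-stage learning step can be illustrated with a deliberately small stand-in: two base forecasts are blended by a single weight fitted on a window of recent actual loads. The closed-form weight and all numbers below are illustrative assumptions, not the paper's models or data.

```python
# A minimal sketch of the two-stage idea: base forecasts from heterogeneous
# models are combined by a second learner fitted on recent actual loads.

def fit_blend_weight(f1, f2, actual):
    """Closed-form weight w minimizing sum((w*f1 + (1-w)*f2 - actual)^2)."""
    num = sum((a - b) * (y - b) for a, b, y in zip(f1, f2, actual))
    den = sum((a - b) ** 2 for a, b in zip(f1, f2))
    return num / den

# Recent window: forecasts of two first-stage models and the true loads.
f1 = [100, 110, 120, 130]        # e.g. a neural-network forecaster
f2 = [90, 100, 110, 120]         # e.g. a regression forecaster
actual = [95, 105, 115, 125]     # exactly halfway between the two

w = fit_blend_weight(f1, f2, actual)
final = w * 135 + (1 - w) * 125  # blend the next-period base forecasts
print(w, final)  # 0.5 130.0
```

In the paper the second stage is itself a learned model over many base forecasters; the single-weight blend above only shows why fitting on recent actuals lets the combination track the current regime.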

3.
Introducing a mechanism for exploiting unlabeled examples into multi-instance learning can reduce training cost and improve the learner's generalization ability. Most current semi-supervised multi-instance learning algorithms label every instance within a bag, converting multi-instance learning into a single-instance semi-supervised problem. Considering that a bag's label is determined by both its instances and its structure, a multi-instance algorithm that performs semi-supervised learning directly at the bag level is proposed. By defining a multi-instance kernel, a bag-level graph Laplacian matrix is computed from all bags (labeled and unlabeled) and used as the smoothness penalty term in the optimization objective. Finding the optimal solution in the RKHS spanned by the multi-instance kernel reduces to determining a multi-instance kernel function modified by the unlabeled data, which can be used directly with classical kernel learning methods. The algorithm was tested on experimental data sets and compared with existing algorithms. The results show that the semi-supervised multi-instance kernel algorithm reaches the same accuracy as supervised algorithms with less training data, and that, given the same labeled data, exploiting unlabeled data effectively improves the learner's generalization ability.
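
The bag-level graph Laplacian mentioned above is L = D − W, where W is a similarity matrix over all bags computed with a multi-instance kernel. The sketch below assumes one common choice of set kernel (average pairwise instance similarity) and scalar instances; the paper's kernel may differ.

```python
import math

# Sketch: build the bag-level graph Laplacian L = D - W used as the
# smoothness penalty, where W comes from a multi-instance (set) kernel.

def instance_sim(x, y):
    """RBF-like similarity on scalar instances (gamma assumed to be 1)."""
    return math.exp(-(x - y) ** 2)

def bag_kernel(bag_a, bag_b):
    """Average pairwise instance similarity between two bags."""
    total = sum(instance_sim(x, y) for x in bag_a for y in bag_b)
    return total / (len(bag_a) * len(bag_b))

def graph_laplacian(bags):
    n = len(bags)
    w = [[bag_kernel(bags[i], bags[j]) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in w]
    return [[(deg[i] if i == j else 0.0) - w[i][j] for j in range(n)]
            for i in range(n)]

bags = [[0.0, 0.1], [0.05], [5.0, 5.1]]   # labeled and unlabeled bags alike
L = graph_laplacian(bags)
# Every row of a graph Laplacian sums to zero.
print(all(abs(sum(row)) < 1e-9 for row in L))  # True
```

Because unlabeled bags contribute to W exactly as labeled ones do, the penalty term lets unlabeled data shape the solution, which is the point of the bag-level approach.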

4.
Let A be a training set and B a subset of A generated by selecting representative examples from A. It is shown that, for an appropriately chosen B, the decision tree trained on B achieves better generalization accuracy than the one trained on A. Furthermore, an algorithm for selecting representative examples from A to generate B is designed and implemented, and its design rationale is analyzed from the perspectives of data distribution and information-entropy theory.

5.
Classification of imbalanced data sets is a research hotspot in machine learning. To address the difficulty of classifying imbalanced data, and in particular the poor recognition of the minority class caused by the imbalanced distribution, an improved algorithm, AdaBoost-SVM-OBMS, is proposed. It combines Boosting with an over-sampling technique that generates new samples from misclassified ones. In the new algorithm, a support vector machine serves as the base classifier; in each Boosting iteration, the misclassified samples are marked, and a number of new samples of the same class as each misclassified sample are generated at random between it and its nearest neighbors. The newly generated samples are added to the original training set for retraining, improving recognition of hard-to-classify samples. AdaBoost-SVM-OBMS was compared with AdaBoost-SVM and APLSC on 8 benchmark data sets under three different evaluation metrics, AUC, F-value, and G-mean; the results demonstrate the effectiveness of AdaBoost-SVM-OBMS for imbalanced classification.
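
The over-sampling step can be sketched as below: for each misclassified point, synthetic points of the same class are placed on the segment between it and a same-class nearest neighbour. This SMOTE-like rule and all parameters are assumptions for illustration, not the paper's exact procedure.

```python
import random

def nearest_same_class(point, cls, data):
    """Nearest neighbour of `point` among other samples of class `cls`."""
    same = [(x, c) for x, c in data if c == cls and x != point]
    return min(same, key=lambda xc: sum((a - b) ** 2
                                        for a, b in zip(point, xc[0])))[0]

def oversample_misclassified(misclassified, data, n_new=2, seed=0):
    rng = random.Random(seed)
    new_points = []
    for point, cls in misclassified:
        nn = nearest_same_class(point, cls, data)
        for _ in range(n_new):
            t = rng.random()  # random position on the segment [point, nn]
            new = tuple(a + t * (b - a) for a, b in zip(point, nn))
            new_points.append((new, cls))
    return new_points

data = [((0.0, 0.0), 0), ((1.0, 0.0), 0), ((5.0, 5.0), 1), ((6.0, 5.0), 1)]
miscls = [((1.0, 0.0), 0)]  # suppose this minority point was misclassified
extra = oversample_misclassified(miscls, data)
# Synthetic points keep the misclassified sample's class and lie on the
# segment to its same-class neighbour (x in [0, 1], y == 0 here).
print(all(c == 0 for _, c in extra))  # True
```

The augmented set `data + extra` would then be fed back into the next Boosting round.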

6.
Multi-instance learning has performed well in region-based image retrieval, but its one-vote-pass rule easily leads to misjudgments in face identification: two faces may still differ even when one facial feature, or even all of them, are similar. To suit this special scenario, the concept of equity-share multi-instance learning is proposed: each instance class holds a different share (equity weight) in the test gallery, whose characteristics the training set can approximately represent; the decisions for the different instance classes, weighted by their shares, determine the bag's class label. In addition, a holistic feature is introduced as a special instance for feature fusion, and a share threshold on the holistic instance controls the weighting to prevent cases where the facial parts are similar but the whole face differs; tuning this share threshold improves the recognition rate. Comparative experiments on the ORL and FERET image sets show that the algorithm's classification accuracy is superior to that of traditional algorithms.

7.
An improved incremental learning algorithm for support vector machines is proposed. It analyzes which samples, among the original and the newly added ones, may become new support vectors after new samples arrive. Based on this analysis, the algorithm discards samples that are useless for the final classification and retains the useful ones. Experimental results on standard data sets show that the algorithm greatly reduces training time while maintaining classification accuracy.
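
The retention idea can be sketched as a margin filter: samples on or inside the margin (the support-vector candidates) are kept, far-away samples are dropped, and retraining runs on "kept + new batch". The fixed linear decision function below stands in for a trained SVM; the exact retention rule in the paper may differ.

```python
# Sketch of sample retention for incremental SVM learning (illustrative).

def decision(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def retain_candidates(w, b, samples, margin=1.0):
    """Keep samples whose functional margin is at most `margin`."""
    kept = []
    for x, y in samples:
        if y * decision(w, b, x) <= margin:  # on/inside margin, or misclassified
            kept.append((x, y))
    return kept

w, b = [1.0, 0.0], 0.0                 # stand-in trained hyperplane x1 = 0
samples = [((0.5, 0.0), +1),           # inside margin -> kept
           ((3.0, 0.0), +1),           # far outside   -> dropped
           ((-0.5, 0.0), -1),          # inside margin -> kept
           ((-4.0, 1.0), -1)]          # far outside   -> dropped
kept = retain_candidates(w, b, samples)
print(len(kept))  # 2
```

Only the two margin-area samples survive, which is what shrinks the training set, and hence the training time, on the next increment.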

8.
To further exploit the dynamic variation of historical QoS data, an existing cloud-service selection algorithm based on QoS history is improved. Instead of deriving the criterion weights of each time period from the average of the QoS history, as in the original algorithm, the weights are derived from the QoS data of the corresponding period, which better reflects the dynamics of the history. The ARIMA time-series forecasting model is applied to the original QoS history, the forecasts are merged into the original data to form a new data set, and service selection is performed on the new set. Three models are designed for a progressive experimental analysis, and the comparative results verify the performance of the improved algorithm.
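
The "forecast, then append to the history" step can be illustrated with a one-parameter AR(1) model standing in for the paper's ARIMA model, to keep the example self-contained; the series and horizon are made up.

```python
# Illustrative sketch: extend a QoS history with model forecasts before
# running service selection on the extended series.

def fit_ar1(series):
    """Least-squares estimate of phi in x_t ~ phi * x_{t-1}."""
    num = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    den = sum(series[t - 1] ** 2 for t in range(1, len(series)))
    return num / den

def extend_history(series, horizon):
    phi = fit_ar1(series)
    out = list(series)
    for _ in range(horizon):
        out.append(phi * out[-1])   # roll the model forward
    return out

history = [8.0, 4.0, 2.0, 1.0]      # a decaying response-time series
extended = extend_history(history, horizon=2)
print(extended)  # [8.0, 4.0, 2.0, 1.0, 0.5, 0.25]
```

Service selection would then score candidates on `extended` rather than `history`, so anticipated QoS trends influence the choice.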

9.
To address the blindness of the relief feature selection algorithm when training individual attribute weights, a new algorithm based on adaptive partitioning of the instance set, Q-relief, is proposed. It corrects the blind attribute selection of the original algorithm and selects the feature subset that best expresses the image information for pattern recognition. The algorithm was applied to fault recognition in the train operation fault dynamic image monitoring system (TFDS); experiments verify that, compared with other algorithms, Q-relief clearly improves the accuracy of fault-image recognition.
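
For context, the relief weight-update rule that Q-relief builds on can be sketched as follows: a feature's weight grows when it separates an instance from its nearest miss (other class) and shrinks when it differs from its nearest hit (same class). This is the classic rule in miniature, not the paper's Q-relief variant.

```python
# Minimal relief-style feature weighting (deterministic full pass).

def nearest(target, pool):
    return min(pool, key=lambda x: sum((a - b) ** 2 for a, b in zip(target, x)))

def relief_weights(data, labels):
    n_feat = len(data[0])
    w = [0.0] * n_feat
    for i, x in enumerate(data):
        hits = [d for j, d in enumerate(data)
                if labels[j] == labels[i] and j != i]
        misses = [d for j, d in enumerate(data) if labels[j] != labels[i]]
        h, m = nearest(x, hits), nearest(x, misses)
        for f in range(n_feat):
            w[f] += abs(x[f] - m[f]) - abs(x[f] - h[f])
    return w

# Feature 0 separates the classes; feature 1 is pure noise.
data = [(0.0, 0.3), (0.1, 0.9), (1.0, 0.8), (0.9, 0.2)]
labels = [0, 0, 1, 1]
w = relief_weights(data, labels)
print(w[0] > w[1])  # True: the discriminative feature gets the higher weight
```

Q-relief's contribution, per the abstract, is replacing the blind per-attribute treatment with an adaptive partition of the instance set before such updates.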

10.
Most existing label distribution learning algorithms exploit label correlations from a global perspective, ignoring the local label correlations that exist only within subsets of examples, and their performance also suffers from interference by irrelevant and redundant features. To address these problems, a label distribution learning algorithm based on local label correlation (LDL-LLC) is proposed. By grouping the training data and constraining the label correlation of each group on the label outputs, local label correlations are explored and exploited; norm constraints commonly used in feature selection are introduced to learn label-specific and shared features. Comparative experiments on several real label-distribution data sets show that LDL-LLC performs well.

11.

This paper begins with a general theory of error in cross-validation testing of algorithms for supervised learning from examples. It is assumed that the examples are described by attribute-value pairs, where the values are symbolic. Cross-validation requires a set of training examples and a set of testing examples. The value of the attribute that is to be predicted is known to the learner in the training set, but unknown in the testing set. The theory demonstrates that cross-validation error has two components: error on the training set (inaccuracy) and sensitivity to noise (instability). This general theory is then applied to voting in instance-based learning. Given an example in the testing set, a typical instance-based learning algorithm predicts the designated attribute by voting among the k nearest neighbours (the k most similar examples) to the testing example in the training set. Voting is intended to increase the stability (resistance to noise) of instance-based learning, but a theoretical analysis shows that there are circumstances in which voting can be destabilising. The theory suggests ways to minimize cross-validation error, by ensuring that voting is stable and does not adversely affect accuracy.
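
The voting scheme analysed above, in its simplest form, predicts the designated attribute by majority vote among the k nearest neighbours; the data below are made up for illustration.

```python
from collections import Counter

def knn_vote(train, query, k=3):
    """train: list of (features, label); vote among the k most similar examples."""
    dist = lambda x: sum((a - b) ** 2 for a, b in zip(x, query))
    nearest = sorted(train, key=lambda fx: dist(fx[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [((0.0,), "yes"), ((0.2,), "yes"), ((0.9,), "no"),
         ((1.0,), "no"), ((1.1,), "no")]
print(knn_vote(train, (0.1,), k=3))  # 'yes'  (neighbours: 0.0, 0.2, 0.9)
print(knn_vote(train, (1.0,), k=3))  # 'no'
```

The first query illustrates the stabilising intent of voting: one noisy-looking "no" neighbour is outvoted by the two closer "yes" examples.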

12.
Tri-training: exploiting unlabeled data using three classifiers   (total citations: 24; self-citations: 0; citations by others: 24)
In many practical data mining applications, such as Web page classification, unlabeled training examples are readily available, but labeled ones are fairly expensive to obtain. Therefore, semi-supervised learning algorithms such as co-training have attracted much attention. In this paper, a new co-training style semi-supervised learning algorithm, named tri-training, is proposed. This algorithm generates three classifiers from the original labeled example set. These classifiers are then refined using unlabeled examples in the tri-training process. In detail, in each round of tri-training, an unlabeled example is labeled for a classifier if the other two classifiers agree on the labeling, under certain conditions. Since tri-training neither requires the instance space to be described with sufficient and redundant views nor does it put any constraints on the supervised learning algorithm, its applicability is broader than that of previous co-training style algorithms. Experiments on UCI data sets and application to the Web page classification task indicate that tri-training can effectively exploit unlabeled data to enhance the learning performance.
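
Tri-training's core labeling rule can be sketched as below: an unlabeled example receives a pseudo-label for one classifier when the other two agree on it. Classifiers are plain callables here, and the paper's error-rate conditions governing when such labels are accepted are omitted for brevity.

```python
# Sketch of the agreement-based pseudo-labeling step of tri-training.

def pseudo_label(classifiers, unlabeled):
    """For each classifier, collect examples the other two agree on."""
    new_sets = {i: [] for i in range(3)}
    for x in unlabeled:
        votes = [clf(x) for clf in classifiers]
        for i in range(3):
            others = [votes[j] for j in range(3) if j != i]
            if others[0] == others[1]:
                new_sets[i].append((x, others[0]))
    return new_sets

h1 = lambda x: x > 0.5   # three stand-in classifiers with
h2 = lambda x: x > 0.4   # slightly different decision boundaries
h3 = lambda x: x > 0.9
sets = pseudo_label([h1, h2, h3], [0.45, 0.95, 0.2])
print(len(sets[1]))  # 3  (h1 and h3 agree on every example)
print(len(sets[0]))  # 2  (h2 and h3 disagree on 0.45)
```

Each classifier would then be retrained on its original labeled data plus its pseudo-labeled set, and the rounds repeat until no classifier changes.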

13.
Image collections are currently widely available and are being generated at a fast pace due to mobile and accessible equipment. In principle, that is a good scenario for the design of successful visual pattern recognition systems. However, in particular for classification tasks, one may need to choose which examples are more relevant in order to build a training set that represents the data well, since such systems often require representative and sufficient observations to be accurate. In this paper we investigated three methods for selecting relevant examples from image collections based on learning models from small portions of the available data. We considered supervised methods that need labels to allow selection, and an unsupervised method that is agnostic to labels. The image datasets studied were described using both handcrafted and deep learning features. A general purpose algorithm is proposed which uses learning methods as subroutines. We show that our relevance selection algorithm outperforms random selection, in particular when using unlabelled data in an unsupervised approach, significantly reducing the size of the training set with little decrease in the test accuracy.

14.
苏本跃  倪钰  盛敏  赵丽丽 《控制与决策》2021,36(12):3031-3038
Traditional motion-intent recognition algorithms for powered lower-limb prostheses use machine learning classifiers and rely on manually extracted features. To address this, deep learning is applied to motion-intent recognition: a conventional convolutional neural network is improved to make the algorithm better suited to intent recognition from short-duration behavior samples, while suppressing the overfitting that arises when deep learning is applied to this task. The intent-recognition data set is preprocessed with a sliding window to augment the time-series samples; enlarging the target data set makes the training set richer and more comprehensive and improves recognition accuracy. The improved convolutional neural network then performs feature learning and classification on the augmented data set. Experimental results show a recognition rate of 93% over 13 motion modes.

15.
The key to PU text classification (training a classifier from positive and unlabeled example sets) is to extract as many reliable negative examples as possible from U (the unlabeled set) and then build an effective classifier from the positives and reliable negatives with machine learning methods. Existing methods extract few or unreliable negatives, and the classifiers built on them accordingly have low accuracy. A PU text classification algorithm based on SVM active learning is proposed: it performs active learning with an SVM combined with an improved Rocchio classifier, and uses the spy technique to improve the SVM's accuracy, addressing the practical problem that training samples, especially negative ones, are too expensive to obtain in some machine learning settings. Experiments show that this method achieves higher precision and recall than current active learning and PU-oriented text classification methods.
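
The spy technique mentioned above can be sketched as follows: a few known positives ("spies") are hidden in the unlabeled set, everything is scored, and anything scoring below every spy is taken as a reliable negative. A centroid-similarity scorer (Rocchio-like) is assumed here in place of the paper's classifiers, and all vectors are made up.

```python
# Sketch of the spy technique for extracting reliable negatives in PU learning.

def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def score(c, v):
    """Negative squared distance to the positive centroid; higher = more positive."""
    return -sum((a - b) ** 2 for a, b in zip(c, v))

def reliable_negatives(positives, unlabeled, spies):
    c = centroid(positives)
    threshold = min(score(c, s) for s in spies)  # lowest-scoring spy
    return [u for u in unlabeled if score(c, u) < threshold]

positives = [(1.0, 1.0), (0.9, 1.1)]
spies = [(0.8, 0.9)]                       # positives planted into U
unlabeled = [(0.85, 1.0), (0.0, 0.1), (0.1, 0.0)] + spies
rn = reliable_negatives(positives, unlabeled, spies)
print(rn)  # [(0.0, 0.1), (0.1, 0.0)]
```

Since genuine positives hidden in U should score at least as high as the worst spy, whatever falls below that threshold is negative with high confidence, which is exactly the reliable-negative set the classifier is then trained on.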

16.
Classification in imbalanced domains is a recent challenge in data mining. We refer to imbalanced classification when data presents many examples from one class and few from the other class, and the less representative class is the one which has more interest from the point of view of the learning task. One of the most used techniques to tackle this problem consists of preprocessing the data prior to the learning process. This preprocessing can be done through under-sampling, removing examples mainly belonging to the majority class, or over-sampling, by means of replicating or generating new minority examples. In this paper, we propose an under-sampling procedure guided by evolutionary algorithms to perform a training set selection for enhancing the decision trees obtained by the C4.5 algorithm and the rule sets obtained by the PART rule induction algorithm. The proposal has been compared with other under-sampling and over-sampling techniques and the results indicate that the new approach is very competitive in terms of accuracy when compared with over-sampling, and it outperforms standard under-sampling. Moreover, the obtained models are smaller in terms of the number of leaves or rules generated, and they can be considered more interpretable. The results have been contrasted through non-parametric statistical tests over multiple data sets.

17.
Most machine learning tasks in data classification and information retrieval require manually labeled data examples in the training stage. The goal of active learning is to select the most informative examples for manual labeling in these learning tasks. Most of the previous studies in active learning have focused on selecting a single unlabeled example in each iteration. This could be inefficient, since the classification model has to be retrained for every acquired labeled example. It is also inappropriate for the setup of information retrieval tasks where the user's relevance feedback is often provided for the top K retrieved items. In this paper, we present a framework for batch mode active learning, which selects a number of informative examples for manual labeling in each iteration. The key feature of batch mode active learning is to reduce the redundancy among the selected examples such that each example provides unique information for model updating. To this end, we employ the Fisher information matrix as the measurement of model uncertainty, and choose the set of unlabeled examples that can efficiently reduce the Fisher information of the classification model. We apply our batch mode active learning framework to both text categorization and image retrieval. Promising results show that our algorithms are significantly more effective than the active learning approaches that select unlabeled examples based only on their informativeness for the classification model.
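
The batch-selection principle can be illustrated with a simplified stand-in: pick a batch of uncertain examples while penalizing redundancy with examples already chosen. The paper's criterion is built on the Fisher information matrix; the greedy uncertainty/diversity trade-off, the similarity function, and the data below are assumptions used only to show the principle.

```python
# Simplified sketch of redundancy-aware batch selection in active learning.

def select_batch(pool, uncertainty, batch_size, penalty=1.0):
    """pool: name -> feature vector; uncertainty: name -> score."""
    sim = lambda a, b: 1.0 / (1.0 + sum((x - y) ** 2 for x, y in zip(a, b)))
    chosen, candidates = [], set(pool)
    while len(chosen) < batch_size and candidates:
        def gain(name):
            # Redundancy = similarity to the closest already-chosen example.
            red = max((sim(pool[name], pool[c]) for c in chosen), default=0.0)
            return uncertainty[name] - penalty * red
        best = max(candidates, key=gain)
        chosen.append(best)
        candidates.remove(best)
    return chosen

pool = {"a": (0.0, 0.0), "b": (0.1, 0.0), "c": (5.0, 5.0)}
uncertainty = {"a": 1.0, "b": 0.9, "c": 0.5}
print(select_batch(pool, uncertainty, batch_size=2))  # ['a', 'c']
```

Note that "b" is more uncertain than "c" but nearly duplicates "a", so the batch takes the less uncertain but non-redundant "c" instead, which is the behaviour the Fisher-information criterion formalizes.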

18.
In the multi-instance learning framework, the training set consists of bags, each containing several instances represented as attribute-value pairs, and the system learns from the multiple instances within each bag. Traditional MIL-based local outlier detection algorithms apply the multi-instance framework to the data set by converting the multi-instance problem into a single-instance one. However, during bag conversion they weight instances only by the proportion of their feature lengths, without examining the instances that most strongly affect the result, analyzing the causes, or adjusting the weights dynamically, which degrades outlier detection. To address this problem and better fit the internal distribution of the data, an improved MIL-based local outlier detection algorithm, FWMIL-LOF, is proposed. Within the MIL (Multi-Instance Learning) framework, a weight function describing data importance is introduced into the bag conversion, and a penalty strategy is defined to adjust the weight function accordingly, thereby determining the weights of instances with different feature attributes within their bags. Simulations on a real enterprise real-time acquisition and monitoring system, compared against other classical local outlier detection algorithms, verify the improved algorithm's gains in outlier detection.

19.
We present a novel algorithm using new hypothesis representations for learning context-free grammars from a finite set of positive and negative examples. We propose an efficient hypothesis representation method which consists of a table-like data structure similar to the parse table used in efficient parsing algorithms for context-free grammars such as Cocke-Younger-Kasami algorithm. By employing this representation method, the problem of learning context-free grammars from examples can be reduced to the problem of partitioning the set of nonterminals. We use genetic algorithms for solving this partitioning problem. Further, we incorporate partially structured examples to improve the efficiency of our learning algorithm, where a structured example is represented by a string with some parentheses inserted to indicate the shape of the derivation tree of the unknown grammar. We demonstrate some experimental results using these algorithms and theoretically analyse the completeness of the search space using the tabular method for context-free grammars.

20.
Existing classification algorithms use a set of training examples to select classification features, which are then used for all future applications of the classifier. A major problem with this approach is the selection of a training set: a small set will result in reduced performance, and a large set will require extensive training. In addition, class appearance may change over time requiring an adaptive classification system. In this paper, we propose a solution to these basic problems by developing an on-line feature selection method, which continuously modifies and improves the features used for classification based on the examples provided so far. The method is used for learning a new class, and to continuously improve classification performance as new data becomes available. In ongoing learning, examples are continuously presented to the system, and new features arise from these examples. The method continuously measures the value of the selected features using mutual information, and uses these values to efficiently update the set of selected features when new training information becomes available. The problem is challenging because at each stage the training process uses a small subset of the training data. Surprisingly, with sufficient training data the on-line process reaches the same performance as a scheme that has a complete access to the entire training data.
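
The mutual-information measurement described above can be computed from running feature/label counts, which is what makes it cheap to update as new examples stream in. The sketch below assumes binary features and labels; the counting scheme and data are illustrative, not the paper's.

```python
import math
from collections import Counter

def mutual_information(joint, n):
    """joint[(f, y)] -> count of feature value f with label y; n = total."""
    pf = {v: sum(c for (f, _), c in joint.items() if f == v) / n for v in (0, 1)}
    py = {v: sum(c for (_, y), c in joint.items() if y == v) / n for v in (0, 1)}
    mi = 0.0
    for (f, y), c in joint.items():
        if c:
            p = c / n
            mi += p * math.log2(p / (pf[f] * py[y]))
    return mi

# Feature A copies the label exactly; feature B is independent of it.
joint_a, joint_b = Counter(), Counter()
for a, b, y in [(1, 1, 1), (1, 0, 1), (0, 1, 0), (0, 0, 0)]:  # (A, B, label)
    joint_a[(a, y)] += 1   # counts can keep growing as examples arrive
    joint_b[(b, y)] += 1
print(mutual_information(joint_a, 4))  # 1.0  (perfectly informative)
print(mutual_information(joint_b, 4))  # 0.0  (independent of the label)
```

Keeping only the counters per candidate feature lets the selected feature set be re-ranked in O(features) whenever new labeled data arrives, without revisiting old examples.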


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号