Similar Documents
1.
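The k-nearest neighbor method predicts with a single k value, so it cannot account for feature differences that may exist between instances, and its overall prediction accuracy is hard to guarantee. To address this problem, a Bagging-based combined k-NN prediction model is proposed, and on this basis the Bgk-NN prediction method with attribute selection is implemented. The method builds a set of individualized prediction models through training; each model independently produces a prediction for an unknown instance, and the median of these predictions is taken as the combined prediction. Bgk-NN prediction applies to all kinds of data sets containing discrete-valued and continuous-valued attributes. Experiments on standard data sets show that Bgk-NN is clearly more accurate than the traditional k-NN method.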
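
The combination step can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' Bgk-NN: each base model is a plain k-NN regressor trained on its own bootstrap sample, its k is picked from a hypothetical candidate list by out-of-bag squared error, and attribute selection is omitted. The function names are illustrative only.

```python
import random
import statistics


def knn_predict(train, x, k):
    """Plain k-NN regression: mean target of the k training rows closest to x."""
    nearest = sorted(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def fit_bagged_knn(train, n_models=15, k_choices=(1, 3, 5, 7), seed=0):
    """Build (bootstrap sample, k) pairs; each k is tuned on that sample's out-of-bag rows."""
    rng = random.Random(seed)
    n = len(train)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        sample = [train[i] for i in idx]
        oob = [train[i] for i in set(range(n)) - set(idx)]
        if oob:
            # choose the k with the smallest squared error on the out-of-bag rows
            k = min(k_choices,
                    key=lambda kk: sum((knn_predict(sample, x, kk) - y) ** 2 for x, y in oob))
        else:
            k = k_choices[0]
        models.append((sample, k))
    return models


def predict_bagged_knn(models, x):
    """Every model predicts independently; the median is the combined prediction."""
    return statistics.median(knn_predict(sample, x, k) for sample, k in models)
```

Using the median rather than the mean keeps a single badly tuned base model from pulling the combined prediction far off.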

3.
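Software reliability is affected by many uncertain factors that are multicollinear, so a single prediction model cannot fully and accurately describe how it changes, and software reliability prediction accuracy is low as a result. To improve this accuracy, a combined software reliability prediction model based on the entropy method is proposed. First, principal component analysis removes the multicollinearity among the software reliability metric attributes and speeds up learning; then an AR model and an RBF neural network each predict software reliability, and the entropy method determines the weights of the two models, yielding the combined model's software reliability prediction. Predictions on NASA software metric data show that the combined model clearly improves software reliability prediction accuracy, indicating that combined prediction is feasible for software reliability.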
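
The entropy weighting step can be illustrated as follows. The formulation used here (relative errors, then proportions, then entropy, then a variation degree d_j, then weights) is one common version of the entropy method for combined forecasting and is an assumption; the abstract does not give the exact formula, and the AR and RBF forecasts are stood in for by plain lists.

```python
import math


def entropy_weights(actual, forecasts):
    """Entropy-method weights for m >= 2 forecast series (assumed formulation).

    Relative errors of each model are normalised to proportions, their entropy E_j
    is computed, d_j = 1 - E_j measures how uneven the errors are, and
    w_j = (1 - d_j / sum(d)) / (m - 1), so the weights sum to 1.
    """
    n, m = len(actual), len(forecasts)
    if m < 2 or n < 2:
        raise ValueError("need at least two models and two observations")
    d = []
    for f in forecasts:
        errs = [abs((a - p) / a) for a, p in zip(actual, f)]  # actual values must be non-zero
        total = sum(errs)
        if total == 0:
            d.append(0.0)  # error-free model: treat its error sequence as non-varying
            continue
        p = [e / total for e in errs]
        entropy = -sum(q * math.log(q) for q in p if q > 0) / math.log(n)
        d.append(1.0 - entropy)
    s = sum(d)
    if s == 0:
        return [1.0 / m] * m
    return [(1.0 - dj / s) / (m - 1) for dj in d]


def combine(forecasts, weights):
    """Weighted sum of the individual forecasts, point by point."""
    return [sum(w * f[i] for w, f in zip(weights, forecasts)) for i in range(len(forecasts[0]))]
```

Under this formulation a model whose error sequence is more uneven (lower entropy) receives a smaller weight.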

4.
Yang Jie, Yan Xuefeng, Zhang Deping. Computer Science (计算机科学), 2017, 44(8): 176-180, 206
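Boosting resampling is a common way to enlarge small-sample data sets. First, to counter the curse of dimensionality that arises during sampling, a random attribute subset selection method is proposed for dimensionality reduction. Next, because software defect prediction penalizes missed defects and false alarms differently, a cost-sensitive algorithm is added to the attribute selection process. With multiple base k-NN predictors as weak learners and minimum cost as the criterion for deleting attributes, a set of predictors, each with a k value and an attribute subset for the current sample set, is obtained; a cost-sensitive weight-update mechanism assigns corresponding weights to the data instances during sampling, and all the predictor sets together form an adaptive ensemble k-NN strong learner from which the software defect prediction model is built. Experiments on NASA data sets show that, with small samples, the cost-sensitive Boosting-based software defect prediction method substantially lowers the miss rate while increasing the false-alarm rate somewhat, and its overall performance is better than that of the original Boosting ensemble prediction method.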
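
Two ingredients of this method, cost-driven attribute deletion and cost-sensitive reweighting of instances, can be sketched in Python. The update rule is an AdaCost-style assumption (misclassified instances are up-weighted by a cost factor, with missed defects costing more than false alarms); the costs `c_fn=5.0` and `c_fp=1.0` and all function names are hypothetical, and `cost_of` stands for whatever routine trains a k-NN predictor on an attribute subset and returns its cost.

```python
import math


def expected_cost(y_true, y_pred, c_fn=5.0, c_fp=1.0):
    """Total misclassification cost; label 1 = defective module."""
    cost = 0.0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 0:
            cost += c_fn  # missed defect
        elif t == 0 and p == 1:
            cost += c_fp  # false alarm
    return cost


def cost_sensitive_update(weights, y_true, y_pred, c_fn=5.0, c_fp=1.0):
    """One AdaBoost-style reweighting round with cost factors (AdaCost-like sketch)."""
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / sum(weights)
    err = min(max(err, 1e-10), 1 - 1e-10)    # keep the log finite
    alpha = 0.5 * math.log((1 - err) / err)  # weight of this weak learner
    new = []
    for w, t, p in zip(weights, y_true, y_pred):
        if t == p:
            new.append(w * math.exp(-alpha))
        else:
            new.append(w * math.exp(alpha) * (c_fn if t == 1 else c_fp))
    z = sum(new)
    return [w / z for w in new], alpha


def backward_eliminate(attrs, cost_of):
    """Greedy deletion: drop the attribute whose removal yields the lowest cost,
    as long as the cost does not rise."""
    current = list(attrs)
    best = cost_of(current)
    while len(current) > 1:
        trial = min(([a for a in current if a != r] for r in current), key=cost_of)
        c = cost_of(trial)
        if c > best:
            break
        current, best = trial, c
    return current, best
```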

5.
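Software defect prediction is an important way to improve software development quality and testing efficiency. This paper proposes an ensemble k-NN software defect prediction method based on software metrics. First, the method iteratively trains a set of base k-NN predictors on different Bootstrap sample data sets. These base predictors then predict each software module independently, and their predictions are fused into the final result. To decide whether a new software module is defective, an adaptive learning method for the classification threshold is designed: modules whose ensemble prediction exceeds the threshold are identified as defective, and the others as normal. Experiments on the NASA MDP and PROMISE AR standard software defect data sets show that ensemble k-NN defect prediction performs noticeably better than widely used comparison methods, and also confirm the effectiveness of software metrics in defect prediction.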
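
A minimal sketch of the ensemble score and the adaptive threshold. Assumptions: each base predictor outputs the fraction of defective neighbours, the base outputs are fused by averaging, and the threshold is chosen to maximise balanced accuracy on labelled scores; the abstract specifies neither the fusion rule nor the threshold criterion, and the function names are hypothetical.

```python
import random


def knn_score(train, x, k):
    """Fraction of the k nearest training modules labelled defective (1)."""
    nearest = sorted(train, key=lambda r: sum((a - b) ** 2 for a, b in zip(r[0], x)))[:k]
    return sum(label for _, label in nearest) / len(nearest)


def ensemble_score(train, x, n_models=11, k=3, seed=0):
    """Mean of base k-NN scores, each base predictor using its own bootstrap sample."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_models):
        sample = [rng.choice(train) for _ in train]
        total += knn_score(sample, x, k)
    return total / n_models


def learn_threshold(scores, labels):
    """Return the cut-off maximising balanced accuracy; a module is flagged
    defective when its score is strictly greater than the cut-off."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both defective and clean examples")
    u = sorted(set(scores))
    candidates = [u[0] - 1e-9] + [(a + b) / 2 for a, b in zip(u, u[1:])]
    best_t, best_ba = candidates[0], -1.0
    for t in candidates:
        tp = sum(1 for s, y in zip(scores, labels) if s > t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s <= t and y == 0)
        ba = 0.5 * (tp / pos + tn / neg)
        if ba > best_ba:
            best_t, best_ba = t, ba
    return best_t
```

In practice the threshold would be learned on scores of modules not used to build the base predictors (for example, out-of-bag modules), so that it is not tuned on optimistic in-sample scores.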

6.
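Using the combined-function method, and in particular the fact that transforming the original data sequence with logarithmic and power functions improves its smoothness and the prediction accuracy, a grey prediction model based on the combined-function method is proposed. The model is applied to forecast the annual production of an oil field, and the forecasts are checked against the actual values with the relative-error method, with good results.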
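
The grey model is presumably the standard GM(1,1). The sketch below implements GM(1,1) (accumulated series, background values, least-squares estimate of the development coefficient a and grey input b) with a logarithmic pre-transform as one example of a function transformation; the paper also uses power functions, and its exact combination is not given in the abstract. `gm11` and `gm11_log` are illustrative names.

```python
import math
from itertools import accumulate


def gm11(x0, horizon=1):
    """Classic GM(1,1): fitted values for x0 followed by `horizon` forecasts."""
    n = len(x0)
    if n < 3:
        raise ValueError("GM(1,1) needs at least three observations")
    x1 = list(accumulate(x0))                             # 1-AGO (accumulated) series
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]  # background values
    y = x0[1:]
    m = n - 1
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(u * v for u, v in zip(z, y))
    det = m * szz - sz * sz
    a = (sz * sy - m * szy) / det    # development coefficient (least squares)
    b = (szz * sy - sz * szy) / det  # grey input
    if a == 0:
        raise ValueError("degenerate fit: a == 0")

    def x1_hat(k):  # time response of the whitened equation, 0-based k
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    # restore by first-order differencing of the fitted accumulated series
    return [x0[0]] + [x1_hat(k) - x1_hat(k - 1) for k in range(1, n + horizon)]


def gm11_log(x0, horizon=1):
    """GM(1,1) on ln(x0), mapped back with exp (log-transform variant)."""
    return [math.exp(v) for v in gm11([math.log(v) for v in x0], horizon)]
```

The relative-error check mentioned in the abstract then amounts to comparing `abs(pred - actual) / actual` for each held-out year.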

7.
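Predicting changes in human blood pressure promptly and accurately, so as to prevent conditions from worsening because of unstable blood pressure, is becoming increasingly important. This paper proposes a blood pressure prediction model that combines wavelet analysis with a BP neural network. The model decomposes and reconstructs the non-stationary blood pressure series with wavelets, separating the high-frequency detail components from the low-frequency trend component of the original series; a BP neural network prediction model is then built for each component, and finally the component predictions are added to obtain the prediction of the original blood pressure series. The results show that the combined model is clearly more accurate than a traditional BP neural network model, providing an effective and reliable combined method for blood pressure prediction.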
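
The decompose, predict, and add structure can be sketched with a one-level Haar wavelet, chosen only because it fits in a few lines; the abstract does not name the wavelet or the number of levels. Each component is reconstructed to full length, so the trend and detail parts sum back to the original series, and `predict` is a placeholder for the per-component BP network.

```python
def haar_split(x):
    """One-level Haar decomposition, reconstructed back to full length.

    Returns (trend, detail) with trend[i] + detail[i] == x[i]; len(x) must be even.
    """
    if len(x) % 2:
        raise ValueError("series length must be even")
    trend, detail = [], []
    for i in range(0, len(x), 2):
        mean = (x[i] + x[i + 1]) / 2  # low-frequency approximation of the pair
        half = (x[i] - x[i + 1]) / 2  # high-frequency detail of the pair
        trend += [mean, mean]
        detail += [half, -half]
    return trend, detail


def decompose_and_predict(x, predict):
    """Forecast each component with `predict` (a one-step forecaster) and add the results."""
    trend, detail = haar_split(x)
    return predict(trend) + predict(detail)
```

With a persistence forecaster such as `lambda s: s[-1]`, the summed forecast equals the last observation, a quick check that the decomposition loses nothing.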

8.
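A general prediction method for time-series data streams is proposed. The algorithm first uses empirical mode decomposition to decompose the data set taken from a chained rewrite window into a finite number of intrinsic mode function components, each with a characteristic oscillation period, plus a residue that represents the mean trend of the original series; it then builds a largest-Lyapunov-exponent prediction model for each component; finally, it combines the component predictions into the final prediction. Power-load prediction experiments show that, compared with single time-series data-stream prediction models, this model achieves higher prediction accuracy and good adaptability.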

9.
Kang Qi, Lin Jun. Microcomputer & Its Applications (微型机与应用), 2013, 32(16): 93-96
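Based on an analysis of the characteristics of urban gas load data, a decomposition-combination prediction model is proposed for short-term forecasting of urban gas load, and three decomposition methods are proposed to validate it. First, before modeling, data mining methods are used to detect and correct outliers in the original data set. Second, to verify accuracy, the predictions of the three methods are compared with those of other single and combined models. Finally, to verify the model's effectiveness and applicability, special dates, weather conditions, and another gas load data set are modeled and predicted; error analysis of the predicted versus actual values further confirms the adaptability and accuracy of the decomposition-combination model.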

10.
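To improve the prediction accuracy of noisy gas (methane) concentration data, a back-propagation artificial neural network (BP-ANN) prediction model based on independent component analysis (ICA) and the k-nearest neighbor (kNN) method is proposed. A sliding time window algorithm generates the training sample matrix, ICA estimates the independent components of that matrix, and the training set is rebuilt from the noise-free independent components; the k-NN method is used to reduce the size of the training set, and a mixed distance measure function is introduced to lower the computational complexity of training. Experimental results show that, compared with an ordinary BP-ANN model, this model effectively reduces both the gas concentration prediction error and the training time.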
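
Building the training matrix with a sliding time window is straightforward. The sketch also shows one possible reading of "using k-NN to reduce the training set" (keep only the k windows nearest to the current query). Both helper names are hypothetical, the ICA denoising and the mixed distance measure are not reproduced, and plain Euclidean distance is assumed.

```python
def sliding_window_samples(series, window, horizon=1):
    """Each row holds `window` consecutive readings; the target is the
    reading `horizon` steps after the end of the window."""
    X, y = [], []
    for start in range(len(series) - window - horizon + 1):
        X.append(series[start:start + window])
        y.append(series[start + window + horizon - 1])
    return X, y


def nearest_subset(X, y, query, k):
    """Keep only the k training rows closest to `query` (Euclidean distance)."""
    order = sorted(range(len(X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], query)))[:k]
    return [X[i] for i in order], [y[i] for i in order]
```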

11.
The k-Nearest Neighbour (k-NN) estimation and prediction technique is widely used to produce pixel-level predictions and areal estimates of continuous forest variables such as area and volume, often by sub-categories such as species. An advantage of k-NN is that the same parameters (e.g., k-value, distance metric, weight vector for the feature space variables) can be used for all variables, whether continuous or categorical. An obvious question is the degree to which accuracy can be improved if the k-NN estimation parameters are tailored for specific variable groups such as volumes by tree species or categorical variables. We investigated prediction of categorical forest attribute variables from satellite image spectral data using k-NN with optimisation of the weight vector for the ancillary variables obtained using a genetic algorithm. We tested several genetic algorithm fitness functions, all derived from well-known accuracy measures. For a Finnish test site, the categorical forest attribute variables were site fertility and tree species dominance, and for an Italian test site, the variables were forest type and conifer/broad-leaved dominance. The results for both test sites were validated using independent data sets. Our results indicate that use of the genetic algorithm to optimize the weight vector for prediction of a single forest attribute variable had a slight positive effect on the prediction accuracies for other variables. Errors can be further decreased if the optimisation is done by variable groups.
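
The core idea, k-NN classification with a weighted distance whose weight vector is tuned by a genetic algorithm against an accuracy-based fitness function, can be sketched as below. The GA operators (truncation selection, uniform crossover, clipped Gaussian mutation), the leave-one-out overall-accuracy fitness, and all parameter values are assumptions for illustration; the study itself compared several fitness functions.

```python
import random
from collections import Counter


def knn_classify(train, x, w, k=3, skip=None):
    """Majority vote of the k nearest rows under a feature-weighted squared Euclidean distance."""
    dists = sorted(
        (sum(wi * (a - b) ** 2 for wi, a, b in zip(w, row, x)), label)
        for i, (row, label) in enumerate(train)
        if i != skip
    )
    return Counter(label for _, label in dists[:k]).most_common(1)[0][0]


def loo_accuracy(train, w, k=3):
    """Fitness: leave-one-out overall accuracy of the weighted k-NN classifier."""
    hits = sum(knn_classify(train, row, w, k, skip=i) == label
               for i, (row, label) in enumerate(train))
    return hits / len(train)


def ga_weights(train, n_features, pop=20, gens=15, k=3, seed=0):
    """Tiny GA: truncation selection, uniform crossover, Gaussian mutation clipped to [0, 1]."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_features)] for _ in range(pop)]
    for _ in range(gens):
        ranked = sorted(population, key=lambda w: loo_accuracy(train, w, k), reverse=True)
        parents = ranked[: pop // 2]  # the better half survives unchanged (elitism)
        children = []
        while len(parents) + len(children) < pop:
            p1, p2 = rng.sample(parents, 2)
            child = [rng.choice(genes) for genes in zip(p1, p2)]
            child = [min(1.0, max(0.0, g + rng.gauss(0.0, 0.1))) for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=lambda w: loo_accuracy(train, w, k))
```

Tuning one weight vector per variable group, as the abstract suggests, would simply mean running `ga_weights` once per group with that group's labels.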

12.
Lazy learning methods for function prediction use different prediction functions. Given a set of stored instances, a similarity measure, and a novel instance, a prediction function determines the value of the novel instance. A prediction function consists of three components: a positive integer k specifying the number of instances to be selected, a method for selecting the k instances, and a method for calculating the value of the novel instance given the k selected instances. This paper introduces a novel method called k surrounding neighbor (k-SN) for intelligently selecting instances and describes a simple k-SN algorithm. Unlike k nearest neighbor (k-NN), k-SN selects k instances that surround the novel instance. We empirically compared k-SN with k-NN using the linearly weighted average and local weighted regression methods. The experimental results show that k-SN outperforms k-NN with linearly weighted average and performs slightly better than k-NN with local weighted regression for the selected datasets.

13.
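A high-dimensional approximate retrieval method based on a clustering index tree is proposed, and its tree-construction and search algorithms are described in detail. Because traditional indexes improve the efficiency of k-nearest-neighbor search in high-dimensional spaces only to a very limited extent, we combine approximate retrieval with the clustering index tree, trading a very small loss of precision for much higher retrieval efficiency. Experiments show that, compared with exact retrieval, the method's error is very small while its retrieval speed is far better than that of other methods, so it has broad application prospects.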
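
The precision-for-speed trade-off can be illustrated with a flat, one-level cluster index: k-means partitions the data, and a query scans only the clusters whose centroids are nearest. This is a simplified stand-in for a clustering index tree, not the paper's algorithm; `n_probe` (a hypothetical parameter) controls the trade-off, and probing every cluster reduces to exact search.

```python
import random


def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Group points by their nearest centroid."""
    members = [[] for _ in centroids]
    for p in points:
        members[min(range(len(centroids)), key=lambda c: sqdist(p, centroids[c]))].append(p)
    return members


def build_index(points, n_clusters=4, iters=10, seed=0):
    """One-level cluster index built with plain k-means."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, n_clusters)]
    for _ in range(iters):
        for c, group in enumerate(assign(points, centroids)):
            if group:  # an empty cluster keeps its old centroid
                centroids[c] = [sum(col) / len(group) for col in zip(*group)]
    return centroids, assign(points, centroids)


def approx_knn(index, q, k, n_probe=1):
    """Search only the n_probe clusters whose centroids are closest to q."""
    centroids, members = index
    probe = sorted(range(len(centroids)), key=lambda c: sqdist(q, centroids[c]))[:n_probe]
    candidates = [p for c in probe for p in members[c]]
    return sorted(candidates, key=lambda p: sqdist(q, p))[:k]
```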

14.
This paper centres on a new GMDH (group method of data handling) algorithm based on the k-nearest neighbour (k-NN) method. Instead of the transfer function that has been used in traditional GMDH, the k-NN kernel function is adopted in the proposed GMDH to characterise relationships between the input and output variables. The proposed method combines the advantages of the k-nearest neighbour (k-NN) algorithm and GMDH algorithm, and thus improves the predictive capability of the GMDH algorithm. It has been proved that when the bandwidth of the kernel is less than a certain constant C, the predictive capability of the new model is superior to that of the traditional one. As an illustration, it is shown that the new method can accurately forecast consumer price index (CPI).
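
A sketch of the two pieces: a k-NN kernel regressor (Nadaraya-Watson weighting restricted to the k nearest neighbours, with the distance to the k-th neighbour as bandwidth) and one GMDH-style layer that fits such a model on each pair of inputs and ranks the pairs by validation error. The Gaussian kernel, the bandwidth rule, and the function names are assumptions; the paper's exact kernel is not given in the abstract.

```python
import math
from itertools import combinations


def knn_kernel_predict(X, y, x, k=5):
    """Nadaraya-Watson estimate over the k nearest neighbours; Gaussian kernel whose
    bandwidth is the distance to the k-th neighbour."""
    nearest = sorted((math.dist(xi, x), yi) for xi, yi in zip(X, y))[:k]
    h = nearest[-1][0] or 1.0  # fall back if all k neighbours coincide with x
    w = [math.exp(-0.5 * (d / h) ** 2) for d, _ in nearest]
    return sum(wi * yi for wi, (_, yi) in zip(w, nearest)) / sum(w)


def gmdh_layer(X_tr, y_tr, X_va, y_va, keep=3, k=5):
    """One GMDH-style layer: fit a k-NN kernel model on every pair of inputs and
    keep the `keep` pairs with the lowest validation mean squared error."""
    scored = []
    for i, j in combinations(range(len(X_tr[0])), 2):
        pairs = [(r[i], r[j]) for r in X_tr]
        mse = sum((knn_kernel_predict(pairs, y_tr, (r[i], r[j]), k) - t) ** 2
                  for r, t in zip(X_va, y_va)) / len(y_va)
        scored.append((mse, (i, j)))
    return sorted(scored)[:keep]
```

A full GMDH network would feed the outputs of the surviving pair models into the next layer and stop adding layers once the validation error no longer improves.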

15.
This paper introduces a binary neural network-based prediction algorithm incorporating both spatial and temporal characteristics into the prediction process. The algorithm is used to predict short-term traffic flow by combining information from multiple traffic sensors (spatial lag) and time series prediction (temporal lag). It extends previously developed Advanced Uncertain Reasoning Architecture (AURA) k-nearest neighbour (k-NN) techniques. Our task was to produce a fast and accurate traffic flow predictor. The AURA k-NN predictor is comparable to other machine learning techniques with respect to recall accuracy but is able to train and predict rapidly. We incorporated consistency evaluations to determine whether the AURA k-NN has an ideal algorithmic configuration or an ideal data configuration or whether the settings needed to be varied for each data set. The results agree with previous research in that settings must be bespoke for each data set. This configuration process requires rapid and scalable learning to allow the predictor to be set-up for new data. The fast processing abilities of the AURA k-NN ensure this combinatorial optimisation will be computationally feasible for real-world applications. We intend to use the predictor to proactively manage traffic by predicting traffic volumes to anticipate traffic network problems.

16.
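For continuous k-nearest-neighbor query processing in road networks, a new directed-graph model of the road network is proposed; an in-memory hash table stores and manages the current positions of moving objects, and a linear linked list stores and manages the directed-graph road network model. By introducing a unidirectional and a bidirectional network distance measure, the unidirectional network expansion (UNE) and bidirectional network expansion (BNE) algorithms are proposed to support continuous k-NN queries with different semantics, and an influence tree together with a network expansion strategy reduces the search cost of updating continuous k-NN queries. Experimental results show that both algorithms outperform existing continuous k-NN query algorithms such as IMA and MKNN.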
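
Network expansion for a single k-NN query can be sketched as Dijkstra's algorithm on the directed road graph, reporting objects as their nodes are settled. Simplifying assumptions: objects sit on graph nodes and are looked up in a dict (in the spirit of the hash table described in the abstract), only the one-directional distance from the query node is used, and continuous updates through the influence tree are not shown. `network_knn` is a hypothetical name.

```python
import heapq


def network_knn(graph, objects_at, source, k):
    """Expand from `source` in order of network distance until k objects are found.

    graph: {node: [(neighbour, edge_length), ...]} with directed edges
    objects_at: {node: [object_id, ...]}
    Returns [(object_id, distance), ...] in increasing distance.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    settled = set()
    found = []
    while heap and len(found) < k:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale heap entry
        settled.add(u)
        for obj in objects_at.get(u, []):
            found.append((obj, d))
            if len(found) == k:
                break
        for v, w in graph.get(u, []):  # relax outgoing (directed) edges only
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return found
```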

17.
Fast k-nearest neighbor classification using cluster-based trees
Most fast k-nearest neighbor (k-NN) algorithms exploit metric properties of distance measures for reducing computation cost and a few can work effectively on both metric and nonmetric measures. We propose a cluster-based tree algorithm to accelerate k-NN classification without any presuppositions about the metric form and properties of a dissimilarity measure. A mechanism of early decision making and minimal side-operations for choosing searching paths largely contribute to the efficiency of the algorithm. The algorithm is evaluated through extensive experiments over standard NIST and MNIST databases.

18.
We propose five different ways of integrating Dempster-Shafer theory of evidence and the rank nearest neighbor classification rules with a view to exploiting the benefits of both. These algorithms have been tested on both real and synthetic data sets and compared with the k-nearest neighbour rule (k-NN), m-multivariate rank nearest neighbour rule (m-MRNN), and k-nearest neighbour Dempster-Shafer theory rule (k-NNDST), which is an algorithm that also combines Dempster-Shafer theory with the k-NN rule. If different features have widely different variances then the distance-based classifier algorithms like k-NN and k-NNDST may not perform well, but in this case the proposed algorithms are expected to perform better. Our simulation results indeed reveal this. Moreover, the proposed algorithms are found to exhibit significant improvement over the m-MRNN rule.
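
For readers unfamiliar with the evidence-theoretic side, the sketch below shows Dempster's rule of combination and an evidential k-NN in the style of Denoeux's rule, where each neighbour lends its own class a mass that decays with distance and leaves the remainder on the whole frame of discernment. It illustrates the kind of k-NN/Dempster-Shafer combination the comparison baseline performs; the abstract does not state that k-NNDST is exactly this rule, it is not one of the five rank-based methods proposed in the paper, and `alpha` and `gamma` are hypothetical defaults.

```python
import math
from itertools import product


def dempster(m1, m2):
    """Dempster's rule for mass functions keyed by frozenset focal elements."""
    combined, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y  # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: masses cannot be combined")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


def evidential_knn(train, x, classes, k=3, alpha=0.95, gamma=1.0):
    """Each of the k neighbours supports its class with mass alpha * exp(-gamma * d^2)."""
    omega = frozenset(classes)
    nearest = sorted(train, key=lambda r: math.dist(r[0], x))[:k]
    m = {omega: 1.0}  # start from total ignorance
    for row, label in nearest:
        s = alpha * math.exp(-gamma * math.dist(row, x) ** 2)
        m = dempster(m, {frozenset([label]): s, omega: 1.0 - s})
    # decide by the singleton class with the largest combined mass
    return max(classes, key=lambda c: m.get(frozenset([c]), 0.0)), m
```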

19.
Combining different machine learning algorithms in the same system can produce benefits above and beyond what either method could achieve alone. This paper demonstrates that genetic algorithms can be used in conjunction with lazy learning to solve examples of a difficult class of delayed reinforcement learning problems better than either method alone. This class, the class of differential games, includes numerous important control problems that arise in robotics, planning, game playing, and other areas, and solutions for differential games suggest solution strategies for the general class of planning and control problems. We conducted a series of experiments applying three learning approaches – lazy Q-learning, k-nearest neighbor (k-NN), and a genetic algorithm – to a particular differential game called a pursuit game. Our experiments demonstrate that k-NN had great difficulty solving the problem, while a lazy version of Q-learning performed moderately well and the genetic algorithm performed even better. These results motivated the next step in the experiments, where we hypothesized k-NN was having difficulty because it did not have good examples – a common source of difficulty for lazy learning. Therefore, we used the genetic algorithm as a bootstrapping method for k-NN to create a system to provide these examples. Our experiments demonstrate that the resulting joint system learned to solve the pursuit games with a high degree of accuracy – outperforming either method alone – and with relatively small memory requirements.

20.
The goal of this study is to propose a new classification of African ecosystems based on an 8-year analysis of Normalized Difference Vegetation Index (NDVI) data sets from SPOT/VEGETATION. We develop two methods of classification. The first method is obtained from a k-nearest neighbour (k-NN) classifier, which represents a simple machine learning algorithm in pattern recognition. The second method is hybrid in that it combines k-NN clustering, hierarchical principles and the Fast Fourier Transform (FFT). The nomenclature of the two classifications relies on three levels of vegetation structural categories based on the Land Cover Classification System (LCCS). The two main outcomes are: (i) the delineation of the spatial distribution of ecosystems into five bioclimatic ecoregions at the African continental scale; (ii) two ecosystem maps made sequentially: an initial map with 92 ecosystems from the k-NN, plus a deduced hybrid classification with 73 classes, which better reflects the bio-geographical patterns. The inclusion of bioclimatic information and successive k-NN clustering elements helps to enhance the discrimination of ecosystems. Adopting this hybrid approach makes the ecosystem identification and labelling more flexible and more accurate in comparison to straightforward methods of classification. The validation of the hybrid classification, conducted by cross-comparisons with validated continental maps, displayed a mapping accuracy of 54% to 61%.
