Tomek's preprocessing scheme is discussed for editing the training set before it is processed by Hart's condensed nearest neighbor (CNN) technique. Preprocessing was performed with a k-nearest-neighbor pdf estimation scheme, although other methods are also suggested in this paper. Experiments show that the procedure significantly reduces the storage requirements of the CNN method while keeping the error rate approximately the same, or even improving it.
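
The edit-then-condense pipeline can be sketched as below. This is a minimal illustration, not Tomek's exact procedure: it assumes Euclidean distance and substitutes Wilson-style leave-one-out k-NN editing for the k-NN pdf-estimation editing step. The function names are hypothetical.

```python
import math
from collections import Counter

def knn_label(x, points, labels, k, skip=None):
    """Majority label among the k points nearest to x, optionally skipping one index."""
    ranked = sorted((math.dist(x, p), i) for i, p in enumerate(points) if i != skip)
    votes = Counter(labels[i] for _, i in ranked[:k])
    return votes.most_common(1)[0][0]

def edit_training_set(points, labels, k=3):
    """Wilson-style editing (assumed stand-in): drop samples misclassified by their k nearest neighbors."""
    keep = [i for i in range(len(points))
            if knn_label(points[i], points, labels, k, skip=i) == labels[i]]
    return [points[i] for i in keep], [labels[i] for i in keep]

def condense(points, labels):
    """Hart's CNN: grow a store until 1-NN on the store classifies every sample correctly."""
    store = [0]
    changed = True
    while changed:
        changed = False
        for i in range(len(points)):
            if i in store:
                continue
            ref_points = [points[j] for j in store]
            ref_labels = [labels[j] for j in store]
            if knn_label(points[i], ref_points, ref_labels, k=1) != labels[i]:
                store.append(i)
                changed = True
    return [points[i] for i in store], [labels[i] for i in store]
```

Editing first removes points that sit inside the wrong class region, so the condensing pass has fewer "exceptions" to store, which is the storage saving the abstract reports.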

Feature selection and instance selection are two effective data reduction processes that can be applied to classification tasks with promising results. Although the two processes are defined separately, they can be applied simultaneously. This paper proposes an evolutionary model that performs feature and instance selection for nearest neighbor classification. It is based on cooperative coevolution, which has been applied successfully to many computational problems. The proposed approach is compared with a wide range of evolutionary feature and instance selection methods for classification. Results checked with non-parametric statistical tests show that the model outperforms previously proposed evolutionary approaches to data reduction combined with the nearest neighbor rule.
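
The cooperative coevolution itself is not reproduced here. As a hedged sketch, one candidate in such a search could be scored as below, assuming a binary feature mask, a binary instance mask, leave-one-out 1-NN accuracy on the training set, and a hypothetical weight `alpha` that trades accuracy against the instance reduction rate.

```python
import math

def masked_dist(a, b, feature_mask):
    """Euclidean distance restricted to the selected features."""
    return math.sqrt(sum((x - y) ** 2 for x, y, m in zip(a, b, feature_mask) if m))

def fitness(points, labels, feature_mask, instance_mask, alpha=0.5):
    """Hypothetical score: alpha * 1-NN accuracy + (1 - alpha) * instance reduction rate."""
    refs = [i for i, m in enumerate(instance_mask) if m]
    if not refs or not any(feature_mask):
        return 0.0
    correct = 0
    for i, x in enumerate(points):
        candidates = [j for j in refs if j != i]  # leave-one-out over the selected instances
        if not candidates:
            continue
        nearest = min(candidates, key=lambda j: masked_dist(x, points[j], feature_mask))
        correct += labels[nearest] == labels[i]
    accuracy = correct / len(points)
    reduction = 1 - len(refs) / len(points)
    return alpha * accuracy + (1 - alpha) * reduction
```

Selecting only the informative feature scores higher than selecting only the noisy one, which is the signal a feature-selecting population would follow.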

Nearest neighbor rules are commonly used in pattern recognition and statistics. Their performance relies on three crucial choices: a distance metric, a set of prototypes, and a classification scheme. This paper focuses on the second, challenging issue: instance selection. A maximum a posteriori criterion is applied to the evaluation of sets of instances, and a new optimization algorithm is proposed. Together they yield Eva, a new instance selection method. The method is benchmarked on real datasets with a multi-criteria analysis covering compression rate, predictive accuracy, reliability, and computational time. Experiments on synthetic datasets separate the respective contributions of the criterion and the algorithm and illustrate the advantages of Eva over state-of-the-art algorithms. The study shows that Eva outputs smaller and more reliable sets of instances in competitive time, while preserving the predictive accuracy of the associated classifier.

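Cluster analysis is an important data mining method, and the K-means clustering algorithm has considerable practical value in data mining. To address the need to set the number of clusters for K-means manually and its tendency to get trapped in local optima, a K-means clustering algorithm based on shared nearest neighbor nodes (KSNN) is proposed. KSNN searches the dataset for center points and uses them to determine the number of clusters, thereby supplying the parameters for K-means. This removes the need to set the number of clusters by hand and also gives better global convergence. Experiments show that KSNN produces better clustering results than K-means, particle swarm optimization K-means (PSO), and the multi-center clustering algorithm (MCA).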
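
KSNN's center search is not described in enough detail to reproduce. Assuming it returns a list of center points, the K-means stage that consumes them could look like this Lloyd-iteration sketch, where K is simply the number of supplied centers (`initial_centers` is a hypothetical parameter name).

```python
import math

def kmeans(points, initial_centers, max_iter=100):
    """Lloyd's K-means started from externally supplied centers (K = len(initial_centers))."""
    centers = [tuple(c) for c in initial_centers]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centers, assignment
```

Because K and the starting centers both come from the center search, the usual manual choice of K and the sensitivity to random initialization are both moved out of K-means itself.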

This paper presents a novel nearest-neighbor-rule-based implementation of the structural risk minimization principle for a generic classification problem. A fast reference-set thinning algorithm, similar in spirit to a support vector machine (SVM), is applied to the training data. The nearest neighbor rule based on the reduced set is then shown to implement the structural risk minimization principle without requiring the choice of a convenient feature space. Simulation results on real data indicate that the method significantly reduces the computational cost of conventional SVMs while achieving nearly comparable test error.

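Existing node classification methods based on network structure focus only on local connections. To capture broader network information, CBGN, a semi-supervised node classification algorithm based on the structural information of neighboring nodes, is proposed. First, a penalty factor is added to the random walk strategy to obtain variable-length walk sequences of nodes; these node sequences are fed to a word2vec model as sentences, converting latent information in the network structure into vectors that serve as node feature representations. Second, the support vector machine algorithm is improved by combining gradient descent and coordinate descent to optimize the parameter space, so that unlabeled nodes are classified more accurately. Finally, comparative experiments against several state-of-the-art methods are carried out on four standard datasets. The results show that CBGN improves classification accuracy and classifies better than existing methods.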

Qian Jiangbo, Hu Wei, Chen Huahui, Dong Yihong. Control and Decision (《控制与决策》), 2019, 34(12): 2567-2575
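Hash-based nearest neighbor search is widely used in information retrieval tasks such as image retrieval, text matching, and data mining. The technique compresses raw data into low-dimensional binary codes with hash functions and then ranks candidates by Hamming distance, which makes it fast, efficient, and insensitive to dimensionality. However, there is little research on real-time online hashing for streaming data, and the update frequency and stability of the hash functions have hardly been discussed. To address this, a confidence interval is added to reduce how often the hash functions are replaced, and an online learning objective is constructed so that the algorithm stays as stable as possible and converges quickly. To verify its efficiency and effectiveness, the proposed algorithm is compared with the online hashing algorithms OSH and OKH on public datasets; the results show that it has some advantage in average precision and training time.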
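
The encode-then-rank-by-Hamming-distance idea can be illustrated with random-hyperplane hashing. This is a stand-in: the paper learns its hash functions online with a confidence interval, whereas the hyperplanes below are random and fixed, and all names are hypothetical.

```python
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Random Gaussian hyperplanes; each one yields one bit of the code."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def encode(x, planes):
    """Binary code packed into an int: bit b is 1 if x lies on the positive side of hyperplane b."""
    code = 0
    for b, w in enumerate(planes):
        if sum(wi * xi for wi, xi in zip(w, x)) >= 0:
            code |= 1 << b
    return code

def hamming(a, b):
    """Number of differing bits between two codes."""
    return bin(a ^ b).count("1")

def search(query, database, planes, top=3):
    """Rank database items by Hamming distance between their codes and the query's code."""
    q = encode(query, planes)
    codes = [encode(x, planes) for x in database]
    order = sorted(range(len(database)), key=lambda i: hamming(q, codes[i]))
    return order[:top]
```

Comparing packed integer codes with XOR and a bit count is what makes Hamming ranking cheap regardless of the original dimensionality.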

We propose a two-layer decision fusion technique, called Fuzzy Stacked Generalization (FSG), which establishes a hierarchical distance learning architecture. At the base layer of an FSG, fuzzy k-NN classifiers receive different feature sets, each extracted from the same dataset, to gain multiple views of the dataset. At the meta-layer, a fusion space is first constructed by aggregating the decision spaces of all base-layer classifiers. Then a fuzzy k-NN classifier is trained in the fusion space by minimizing the difference between the large-sample and N-sample classification errors. To measure the degree of collaboration among the base-layer classifiers and the diversity of the feature spaces, a new measure called shareability is introduced. Shareability is defined as the number of samples that are correctly classified by at least one of the base-layer classifiers in FSG. In the experiments, FSG performs better than popular distance learning and ensemble learning algorithms when shareability is large enough that most samples are correctly classified by at least one base-layer classifier. The relationship between the proposed measure and state-of-the-art diversity measures is analyzed experimentally. Tests on a variety of artificial and real-world benchmark datasets show that the advantage of FSG over state-of-the-art ensemble learning and distance learning methods grows as the number of classes increases.
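
Shareability, as defined above, is a simple count over base-layer predictions. A minimal sketch, assuming predictions are given as one label list per base-layer classifier (the function name is hypothetical):

```python
def shareability(predictions, truth):
    """Number of samples correctly classified by at least one base-layer classifier.

    predictions: one list of predicted labels per base-layer classifier.
    truth: the true label of each sample.
    """
    return sum(
        any(pred[i] == truth[i] for pred in predictions)
        for i in range(len(truth))
    )
```

Dividing the count by the number of samples gives a ratio, which makes "large enough" easier to compare across datasets of different sizes.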

Nearest neighbor classification is a simple yet effective technique for pattern recognition. Its performance depends significantly on the distance function used to compute similarity between examples. Several techniques have been developed to learn feature weights that change the distance structure of samples in nearest neighbor classification. In this paper, we propose an approach that learns sample weights to enlarge the margin, using a gradient descent algorithm to minimize a margin-based classification loss. Experimental analysis on several datasets shows that distances trained in this way reduce the margin loss and enlarge the hypothesis margin. Moreover, the proposed approach consistently outperforms nearest neighbor classification and several other state-of-the-art methods.
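
The hypothesis margin mentioned above is commonly defined for a sample as half the difference between its distance to the nearest sample of another class (nearest miss) and its distance to the nearest sample of its own class (nearest hit). The sketch below computes it with unweighted Euclidean distance; the paper's learned sample weights and gradient-descent loss are not reproduced.

```python
import math

def hypothesis_margin(i, points, labels):
    """0.5 * (distance to nearest miss - distance to nearest hit), excluding sample i itself."""
    hit = min(math.dist(points[i], p) for j, p in enumerate(points)
              if j != i and labels[j] == labels[i])
    miss = min(math.dist(points[i], p) for j, p in enumerate(points)
               if labels[j] != labels[i])
    return 0.5 * (miss - hit)
```

A positive margin means 1-NN classifies the sample correctly; a margin-based loss penalizes small or negative margins, and learning weights to enlarge them is what the abstract describes.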

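In big-data settings, the high time complexity of the multi-label k-nearest-neighbor algorithm (ML-KNN) becomes a serious problem; moreover, ML-KNN does not consider how each of the k nearest neighbors influences the final classification. To address these problems, the training set is first clustered, and for each test sample the nearest training cluster is taken as the new training set. Then distance weights of the nearest neighbor samples are computed and used to describe the influence of the nearest neighbor and the other neighbors on the prediction. Finally, a new objective function is used to classify the test sample. Experiments on image, web-page text, and other datasets show that the proposed algorithm achieves better classification results and greatly reduces time complexity.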
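
The cluster-then-search idea can be sketched as follows, assuming the training set has already been clustered, that the nearest cluster is chosen by centroid distance, and that a distance-weighted vote with a hypothetical 0.5 threshold stands in for the paper's new objective function.

```python
import math

def nearest_cluster(x, clusters):
    """Index of the training cluster whose centroid is closest to x."""
    def centroid(cluster):
        return [sum(v) / len(cluster) for v in zip(*(p for p, _ in cluster))]
    return min(range(len(clusters)), key=lambda i: math.dist(x, centroid(clusters[i])))

def predict_labels(x, clusters, k=3, threshold=0.5):
    """Distance-weighted k-NN multi-label vote restricted to the nearest cluster.

    clusters: list of clusters, each a list of (point, label_set) pairs.
    """
    members = clusters[nearest_cluster(x, clusters)]
    neighbors = sorted(members, key=lambda m: math.dist(x, m[0]))[:k]
    weights = [1.0 / (math.dist(x, p) + 1e-9) for p, _ in neighbors]  # closer => larger weight
    total = sum(weights)
    scores = {}
    for w, (_, labels) in zip(weights, neighbors):
        for lab in labels:
            scores[lab] = scores.get(lab, 0.0) + w / total
    return {lab for lab, s in scores.items() if s >= threshold}
```

Only the members of one cluster are searched, which is where the time saving comes from; the weights let the closest neighbor dominate the vote.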

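Locality preserving projection (LPP) is a recent dimensionality reduction technique, but it is an unsupervised learning algorithm and does not perform well on classification problems. Combining LPP with adaptive nearest neighbors, a supervised locality preserving projection algorithm (ANNLPP) is proposed. By modifying the weight matrix of LPP, the method incorporates class information during dimensionality reduction, making it a supervised learning algorithm. Two-dimensional data visualization and face recognition experiments on UMIST and ORL show that the method achieves good dimensionality reduction for classification problems.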
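
For reference, standard LPP and one plausible supervised weight matrix are written out below. The class-restricted heat-kernel weights are an assumption illustrating how class information can enter through $W$; the abstract does not give ANNLPP's exact adaptive-nearest-neighbor weights. Here $\mathbf{x}_i$ are the columns of $X$, $N_k(\cdot)$ is a k-nearest-neighbor set, $c_i$ is the class label of $\mathbf{x}_i$, and $t$ is a kernel width.

```latex
% Standard LPP objective with projection y_i = a^T x_i
\min_{\mathbf{a}} \; \tfrac{1}{2}\sum_{i,j}\left(\mathbf{a}^{\top}\mathbf{x}_i-\mathbf{a}^{\top}\mathbf{x}_j\right)^2 W_{ij}
  = \mathbf{a}^{\top} X L X^{\top}\mathbf{a}
\quad \text{s.t.} \quad \mathbf{a}^{\top} X D X^{\top}\mathbf{a} = 1,
\qquad D_{ii}=\sum_j W_{ij}, \quad L = D - W

% Solution: generalized eigenvectors with the smallest eigenvalues
X L X^{\top}\mathbf{a} = \lambda\, X D X^{\top}\mathbf{a}

% Assumed supervised weights: neighbors of the same class only
W_{ij} =
\begin{cases}
  \exp\!\left(-\lVert \mathbf{x}_i-\mathbf{x}_j\rVert^2 / t\right), &
  \left(\mathbf{x}_j \in N_k(\mathbf{x}_i) \ \text{or}\ \mathbf{x}_i \in N_k(\mathbf{x}_j)\right) \ \text{and}\ c_i = c_j,\\
  0, & \text{otherwise.}
\end{cases}
```

Zeroing the weights between classes means the projection is only asked to keep same-class neighbors close, which is how the unsupervised objective becomes supervised.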

Xing Yan, Zhou Yong. Application Research of Computers (《计算机应用研究》), 2012, 29(7): 2524-2526
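Affinity propagation (AP) is a recently proposed clustering algorithm that works on the similarity matrix of the data points and obtains the clustering result by exchanging messages between data points. An affinity propagation algorithm based on mutual nearest neighbor consistency, KMNC-AP, is proposed; it uses mutual nearest neighbor consistency to adjust the similarities between data points, improving clustering efficiency and accuracy. Experimental results show that the algorithm outperforms the original algorithm in processing capability and speed.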
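
For orientation, the message passing of the original AP algorithm (responsibilities and availabilities with damping) is sketched below on a negative squared-distance similarity matrix with the median similarity as preference. This is plain AP, not KMNC-AP: the mutual-nearest-neighbor adjustment of the similarities is not reproduced, and the fixed iteration count is an arbitrary choice.

```python
import statistics

def similarity_matrix(points, preference=None):
    """Negative squared Euclidean similarities; the diagonal holds the preference (median by default)."""
    n = len(points)
    S = [[-sum((a - b) ** 2 for a, b in zip(points[i], points[j])) for j in range(n)]
         for i in range(n)]
    if preference is None:
        preference = statistics.median(S[i][j] for i in range(n) for j in range(n) if i != j)
    for k in range(n):
        S[k][k] = preference
    return S

def affinity_propagation(S, damping=0.5, iterations=200):
    """Returns the exemplar index chosen for each point (empty list if no exemplar emerged)."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max_{k' != k} (a(i,k') + s(i,k'))
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(v for kk, v in enumerate(vals) if kk != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)));  a(k,k) = sum_{i' != k} max(0, r(i',k))
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
    if not exemplars:
        return []
    return [i if i in exemplars else max(exemplars, key=lambda e: S[i][e]) for i in range(n)]
```

Every message depends only on the similarity matrix, which is why adjusting the similarities, as KMNC-AP does, changes the clustering without changing the message-passing loop.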

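To identify outliers in mixed-attribute datasets, an outlier detection algorithm based on shared nearest neighbors is proposed. By computing the shared-nearest-neighbor similarity between the clusters produced by incremental clustering, it can both discover clusters of arbitrary shape and detect global outliers in datasets of varying density. The time complexity of the algorithm is approximately linear in the dataset size and the number of attributes. Experiments on synthetic and real datasets show that the proposed algorithm effectively detects outliers.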

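To address the inability of traditional community detection algorithms to identify special nodes in a network, and the strong parameter dependence of the SCAN algorithm, CD3N, a community detection algorithm based on the concept of natural nearest neighbors, is proposed. Exploiting the parameter-free nature of natural nearest neighbors, the algorithm first computes the natural nearest neighbors of each network node using structural similarity as the criterion and builds a small-value nearest-neighborhood graph from them. It then takes the node with the most neighbors in the neighborhood graph as a core node and builds a community around it according to reachability. The process of selecting a core node and building a community is repeated until no remaining node can be assigned to a community. The algorithm was applied to the karate club network and the dolphin network and compared with SCAN. Experimental results show that CD3N effectively resolves the parameter sensitivity problem and detects communities well.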
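
The two ingredients named above can be sketched on a graph stored as adjacency sets. Assumptions: structural similarity is taken to be SCAN's closed-neighborhood cosine, and the natural-neighbor search is simplified to growing r until every node appears in some other node's r-nearest list, with natural neighbors taken as mutual r-nearest neighbors at that r. This is not CD3N's full procedure.

```python
import math

def structural_similarity(adj, u, v):
    """SCAN-style similarity: |G(u) & G(v)| / sqrt(|G(u)| * |G(v)|), with G(x) = neighbors of x plus x."""
    gu = adj[u] | {u}
    gv = adj[v] | {v}
    return len(gu & gv) / math.sqrt(len(gu) * len(gv))

def natural_neighbors(adj):
    """Simplified natural-neighbor search: grow r until every node is in some other node's r-NN list."""
    nodes = sorted(adj)
    if len(nodes) < 2:
        return {u: set() for u in nodes}
    ranked = {
        u: sorted((v for v in nodes if v != u),
                  key=lambda v: -structural_similarity(adj, u, v))
        for u in nodes
    }
    for r in range(1, len(nodes)):
        knn = {u: set(ranked[u][:r]) for u in nodes}
        reverse = {u: {v for v in nodes if u in knn[v]} for u in nodes}
        if all(reverse[u] for u in nodes):
            break  # no parameter: r is found from the data
    return {u: knn[u] & reverse[u] for u in nodes}  # mutual neighbors at the final r
```

On two triangles joined by one edge, the search stops at r = 2 and each node's natural neighbors are the other two members of its own triangle.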

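Collaborative filtering is the most successful and widely used recommendation technique in current e-commerce recommender systems, but it faces serious challenges from sparse user rating data and real-time requirements. To address data sparsity in collaborative filtering, a nearest-neighbor-based personalized recommendation algorithm is proposed. The rating matrix is optimized with dimensionality reduction to reduce sparsity, and a novel similarity measure is used to find the target user's nearest neighbors and generate predictions. Experimental results show that the algorithm effectively addresses data sparsity and improves recommendation quality.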
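
A user-based nearest-neighbor prediction step can be sketched as follows. The paper's dimensionality reduction and novel similarity measure are not reproduced; cosine similarity over co-rated items and a mean-centred weighted average are assumed instead, and ratings are hypothetical dicts mapping item to score.

```python
import math

def mean(r):
    """Average rating of one user."""
    return sum(r.values()) / len(r)

def cosine_sim(ra, rb):
    """Cosine similarity over the items both users rated."""
    common = ra.keys() & rb.keys()
    if not common:
        return 0.0
    dot = sum(ra[i] * rb[i] for i in common)
    na = math.sqrt(sum(ra[i] ** 2 for i in common))
    nb = math.sqrt(sum(rb[i] ** 2 for i in common))
    return dot / (na * nb) if na and nb else 0.0

def predict(ratings, user, item, k=2):
    """Mean-centred weighted average over the k most similar users who rated the item."""
    candidates = [(cosine_sim(ratings[user], ratings[v]), v)
                  for v in ratings if v != user and item in ratings[v]]
    neighbors = sorted(candidates, reverse=True)[:k]
    denom = sum(abs(s) for s, _ in neighbors)
    if denom == 0:
        return mean(ratings[user])  # no usable neighbor: fall back to the user's mean
    offset = sum(s * (ratings[v][item] - mean(ratings[v])) for s, v in neighbors)
    return mean(ratings[user]) + offset / denom
```

Sparsity hurts this step directly: with few co-rated items the similarities are unreliable, which is what the paper's dimensionality reduction is meant to mitigate.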

Pattern Recognition Letters, 2001, 22(3-4): 407-412
This paper extends a previous risk study of the well-known nearest neighbor (NN) rule with fixed, finite reference samples. The result is competitive with some previously obtained in fairly restrictive and complex settings, and improves on them in general cases.

Nearest Neighbor Clustering Algorithm Based on Genetic Evolution and Its Application
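A nearest neighbor clustering algorithm based on genetic evolution is proposed, combining a genetic algorithm (GA) with nearest neighbor (NN) clustering. The samples to be classified and their features are selected by optimization: ambiguous samples lying on class boundaries are removed, features that help classification are amplified, and features that hinder it are suppressed, which improves classification accuracy. Applied to operating-condition classification of pumped-storage generating units, the algorithm greatly improves the recognition of unit operating conditions, confirming the effectiveness of the GA-based nearest neighbor clustering algorithm.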

Solving the Traveling Salesman Problem with a Nearest-Neighbor Strategy
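Based on the characteristics of the TSP and ideas borrowed from neighborhood search algorithms, a TSP algorithm based on a nearest-neighbor strategy is proposed. The algorithm first derives nearest-neighbor patterns from the structure of the particular TSP instance, uses these patterns to generate the initial population, and then randomly introduces such patterns during evolution. This greatly shortens the genetic process and improves evolutionary efficiency. Simulation experiments verify the algorithm's effectiveness, and its advantage becomes more pronounced as the number of cities increases.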
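
The abstract does not define its nearest-neighbor patterns precisely. One plausible reading, sketched below as an assumption, seeds the GA's initial population with greedy nearest-neighbor tours started from random cities; crossover and mutation are omitted, and the function names are hypothetical.

```python
import math
import random

def nearest_neighbor_tour(cities, start=0):
    """Greedy tour: always move to the closest unvisited city (ties broken by index)."""
    unvisited = set(range(len(cities))) - {start}
    tour = [start]
    while unvisited:
        last = cities[tour[-1]]
        nxt = min(unvisited, key=lambda c: (math.dist(last, cities[c]), c))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def tour_length(cities, tour):
    """Length of the closed tour, including the edge back to the start."""
    return sum(math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def initial_population(cities, size, seed=0):
    """Seed a GA population with nearest-neighbor tours from random start cities."""
    rng = random.Random(seed)
    return [nearest_neighbor_tour(cities, rng.randrange(len(cities))) for _ in range(size)]
```

Starting from short, locally sensible tours instead of random permutations is one way such a strategy could shorten the genetic process.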

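To address the inaccurate results of density-based clustering algorithms on high-dimensional and multi-density datasets, a clustering algorithm based on shared nearest neighbor affinity (SNNA) is proposed. The algorithm introduces k-nearest neighbors and shared nearest neighbors, and defines shared nearest neighbor affinity as the local density measure of an object. It first extracts core points according to affinity, then clusters the core points with breadth-first search, and finally assigns the non-core points, completing the clustering of the whole dataset. Experimental results show that the algorithm can discover clusters of arbitrary shape, size, and density; compared with similar algorithms, SNNA achieves higher clustering accuracy on high-dimensional data.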
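
A compact sketch of the SNN-affinity pipeline follows: k-NN sets, an affinity-based density, core points, breadth-first search over core points, then assignment of the rest. The affinity used here (the sum of shared-neighbor counts between a point and each of its k nearest neighbors) and the `core_threshold` parameter are hypothetical choices; the abstract does not give SNNA's exact definitions.

```python
import math
from collections import deque

def knn_sets(points, k):
    """Index set of the k nearest neighbors of every point (excluding the point itself)."""
    return [set(sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))[:k])
            for i in range(len(points))]

def snn_cluster(points, k=3, core_threshold=2.0):
    nn = knn_sets(points, k)
    # hypothetical affinity: summed shared-neighbor counts with each of the point's k-NN
    density = [sum(len(nn[i] & nn[j]) for j in nn[i]) for i in range(len(points))]
    core = {i for i, d in enumerate(density) if d >= core_threshold}
    labels = [-1] * len(points)
    cluster = 0
    for seed in sorted(core):
        if labels[seed] != -1:
            continue
        labels[seed] = cluster
        queue = deque([seed])
        while queue:  # breadth-first search over core points linked through k-NN lists
            i = queue.popleft()
            for j in nn[i]:
                if j in core and labels[j] == -1:
                    labels[j] = cluster
                    queue.append(j)
        cluster += 1
    for i in range(len(points)):  # assign each non-core point to its nearest core point
        if labels[i] == -1 and core:
            nearest = min(core, key=lambda c: math.dist(points[i], points[c]))
            labels[i] = labels[nearest]
    return labels
```

Because density is measured by neighbor overlap rather than by a fixed radius, sparse and dense clusters can both produce core points, which is the motivation the abstract gives for SNN-based density.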
