首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
In this paper, we present a fast and versatile algorithm which can rapidly perform a variety of nearest neighbor searches. Efficiency improvement is achieved by utilizing the distance lower bound to avoid the calculation of the distance itself if the lower bound is already larger than the global minimum distance. At the preprocessing stage, the proposed algorithm constructs a lower bound tree (LB-tree) by agglomeratively clustering all the sample points to be searched. Given a query point, the lower bound of its distance to each sample point can be calculated by using the internal node of the LB-tree. To reduce the amount of lower bounds actually calculated, the winner-update search strategy is used for traversing the tree. For further efficiency improvement, data transformation can be applied to the sample and the query points. In addition to finding the nearest neighbor, the proposed algorithm can also (i) provide the k-nearest neighbors progressively; (ii) find the nearest neighbors within a specified distance threshold; and (iii) identify neighbors whose distances to the query are sufficiently close to the minimum distance of the nearest neighbor. Our experiments have shown that the proposed algorithm can save substantial computation, particularly when the distance of the query point to its nearest neighbor is relatively small compared with its distance to most other samples (which is the case for many object recognition problems).  相似文献   

2.
This paper presents a study of the Multi-Type Reverse Nearest Neighbor (MTRNN) query problem. Traditionally, a reverse nearest neighbor (RNN) query finds all the objects that have the query point as their nearest neighbor. In contrast, an MTRNN query finds all the objects that have the query point in their multi-type nearest neighbors. Existing RNN queries find an influence set by considering only one feature type. However, the influence from multiple feature types is often critical for strategic decision making in many business scenarios, such as site selection for a new shopping center. To that end, we first formalize the notion of the MTRNN query by considering the influence of multiple feature types. We also propose R-tree based algorithms to find the influence set for a given query point and multiple feature types. Finally, experimental results are provided to show the strength of the proposed algorithms as well as design decisions related to performance tuning.  相似文献   

3.
A partially specified nearest neighbor query is a nearest neighbor search in which only some of the possible keys are specified. An algorithm that uses k-d trees to perform such searching is described. The expected time complexity is O(N1-jk). where k is the total number of keys and j the number of keys specified in the query. Experimental results, which are consistent with the theoretical predictions, are also presented.  相似文献   

4.
在文本分类中,最近邻搜索算法具有思想简单、准确率高等优点,但通常在分类过程中的计算量较大。为克服这一不足,提出了一种基于最近邻子空间搜索的两类文本分类方法。首先提取每一类样本向量组的特征子空间,并通过映射将子空间变换为高维空间中的点,然后把最近邻子空间搜索转化为最近邻搜索完成分类过程。在Reuters-21578数据集上的实验表明,该方法能够有效提高文本分类的性能,具有较高的准确率、召回率和F1值。  相似文献   

5.
6.
In this paper,a new approach is presented to find the reference set for the nearest neighbor classifer.The optimal reference set,which has minimum sample size and satisfies a certain error rate threshold,is obtained through a Tabu search algorithm.When the error rate threshold is set to zero,the algorithm obtains a near minimal consistent subset of a given training set.While the threshold is set to a small appropriate value,the obtained reference set may compensate the bias of the nearest neighbor estimate.An aspiration criterion for Tabu search is introduced,which aims to prevent the search process form the inefficient wandering between the feasible and infeasible regions in the search space and speed up the convergence.Experimental results based on a number of typical data sets are presented and analyzed to illustrate the benefits of the proposed method.Compared to conventional methods,such as CNN and Dasarathy‘s algorithm,the size of the reduced reference sets is much smaller,and the nearest neighbor classification performance is better,especially when the error rate thresholds are set to appropriate nonzerovalues,The experimental results also illustrate that the MCS(inimal consistent set)of Dasarathy‘s algorithm is not minimal,and its candidate consistent set is not always ensured to reduce monotonically.A counter example is also given to confirm this claim.  相似文献   

7.
An efficient and universal similarity search solution is a holy grail for multimedia information retrieval. Most similarity indexes work by mapping the original multimedia objects into simpler representations, which are then searched by proximity using a suitable distance function.  相似文献   

8.
9.
基于网格的共享近邻聚类算法   总被引:1,自引:0,他引:1  
刘敏娟  柴玉梅 《计算机应用》2006,26(7):1673-1675
提出了一种基于网格的共享近邻聚类算法(Grid based shared Nearest Neighbor algorithm, GNN)。该算法主要利用网格技术去除数据集中的部分孤立点或噪声,使用密度阈值处理技术来处理网格的密度阈值,使用中心点技术提高聚类效率。GNN算法仅对数据集进行一遍扫描,且能处理任意形状和大小的聚类。实验表明,GNN有较好的可扩展性,其精度和效率明显地好于共享近邻SNN算法。  相似文献   

10.
11.
移动对象的动态反向k最近邻研究   总被引:1,自引:1,他引:0       下载免费PDF全文
反向最近邻查询是空间数据库中最重要的算法之一。传统的反向最近邻查询方法主要是针对静态对象的查询,随着无线通讯和定位技术的快速发展,移动对象发出的查询请求成为新的研究热点。该文将TPR-tree作为算法的索引结构,并提出了基于矩形框的对角线的修剪策略,将半平面修剪策略进行改进,给出了移动对象的动态反向k最近邻的查询方案。  相似文献   

12.
13.
目的 海量图像检索技术是计算机视觉领域研究热点之一,一个基本的思路是对数据库中所有图像提取特征,然后定义特征相似性度量,进行近邻检索。海量图像检索技术,关键的是设计满足存储需求和效率的近邻检索算法。为了提高图像视觉特征的近似表示精度和降低图像视觉特征的存储空间需求,提出了一种多索引加法量化方法。方法 由于线性搜索算法复杂度高,而且为了满足检索的实时性,需把图像描述符存储在内存中,不能满足大规模检索系统的需求。基于非线性检索的优越性,本文对非穷尽搜索的多索引结构和量化编码进行了探索新研究。利用多索引结构将原始数据空间划分成多个子空间,把每个子空间数据项分配到不同的倒排列表中,然后使用压缩编码的加法量化方法编码倒排列表中的残差数据项,进一步减少对原始空间的量化损失。在近邻检索时采用非穷尽搜索的策略,只在少数倒排列表中检索近邻项,可以大大减少检索时间成本,而且检索过程中不用存储原始数据,只需存储数据集中每个数据项在加法量化码书中的码字索引,大大减少内存消耗。结果 为了验证算法的有效性,在3个数据集SIFT、GIST、MNIST上进行测试,召回率相比近几年算法提升4%~15%,平均查准率提高12%左右,检索时间与最快的算法持平。结论 本文提出的多索引加法量化编码算法,有效改善了图像视觉特征的近似表示精度和存储空间需求,并提升了在大规模数据集的检索准确率和召回率。本文算法主要针对特征进行近邻检索,适用于海量图像以及其他多媒体数据的近邻检索。  相似文献   

14.
With the recent surge in the use of the location-based service (LBS), the importance of spatial database queries has increased. The reverse nearest neighbor (RNN) search is one of the most popular spatial database queries. In most previous studies, the spatial distance is used for measuring the distance between objects. However, as the demands of users of the LBSs are becoming more complex, considering only the spatial factor as a distance measure is not sufficient. For example, through a hotel finding service, users want to choose a hotel considering not only the spatial distance, but also the non-spatial aspect of the hotel such as the quality which can be represented by the number of stars. Therefore, services that consider both spatial and non-spatial factors in measuring the distance are more useful for users. In such a case, techniques proposed in the previous studies cannot be used since the distance measure is different. In this paper, we propose an efficient method for the RNN search in which a distance measure involves both the spatial distance and the non-spatial aspect of an object. We conduct extensive experiments on a large dataset to evaluate the efficiency of the proposed method. The experimental results show that the proposed method is significantly efficient and scalable.  相似文献   

15.
聚类是一种无监督的机器学习方法,其任务是发现数据中的自然簇。共享最近邻聚类算法(SNN)在处理大小不同、形状不同以及密度不同的数据集上具有很好的聚类效果,但该算法还存在以下不足:(1)时间复杂度为O(n2),不适合处理大规模数据集;(2)没有明确给出参数阈值的简单指导性操作方法;(3)只能处理数值型属性数据集。对共享最近邻算法进行改进,使其能够处理混合属性数据集,并给出参数阈值的简单选择方法,改进后算法运行时间与数据集大小成近似线性关系,适用于大规模高维数据集。在真实数据集和人造数据集上的实验结果表明,提出的改进算法是有效可行的。  相似文献   

16.
为解决密度聚类算法在处理高维和多密度数据集时聚类结果不精确的问题,提出一种基于共享近邻亲和度(SNNA)的聚类算法。该算法引入[k]近邻和共享近邻,定义共享近邻亲和度作为对象的局部密度度量。算法首先根据亲和度来提取核心点,然后利用广度优先搜索算法对核心点进行聚类,最后对非核心点进行指派即完成整个数据集的聚类。实验结果表明,该算法能够发现任意形状、大小、密度的聚类;与同类算法相比,SNNA算法在处理高维数据时具有较高的聚类准确率。  相似文献   

17.
协同过滤是目前电子商务推荐系统中广泛应用的最成功的推荐技术,但面临严峻的用户评分数据稀疏性和推荐实时性挑战。针对协同过滤中的数据稀疏问题,提出了一种基于最近邻的个性化推荐算法。通过维数简化技术对评分矩阵进行优化,降低数据稀疏性;采用一种新颖的相似性度量方法计算目标用户的最近邻居,产生推荐预测。实验结果表明,该算法有效地解决了数据稀疏,提高了推荐系统的推荐质量。  相似文献   

18.
在大数据环境下,K近邻多标签算法(ML-KNN)高时间复杂度的问题显得尤为突出;此外,ML-KNN也没有考虑◢k◣个近邻对最终分类结果的影响。针对上述问题进行研究,首先将训练集进行聚类,再为测试集找到一个距离其最近的训练数据簇作为新的训练数据集;然后计算最近邻样本的距离权重,并用该权重描述最近邻和其他近邻对预测结果的影响;最后使用新的目标函数为待测样本分类。通过在图片、Web页面文本数据等数据集上的实验表明,所提算法得到了更好的分类结果,并且大大降低了时间复杂度。  相似文献   

19.
The problem of k nearest neighbors (kNN) is to find the nearest k neighbors for a query point from a given data set. In this paper, a novel fast kNN search method using an orthogonal search tree is proposed. The proposed method creates an orthogonal search tree for a data set using an orthonormal basis evaluated from the data set. To find the kNN for a query point from the data set, projection values of the query point onto orthogonal vectors in the orthonormal basis and a node elimination inequality are applied for pruning unlikely nodes. For a node, which cannot be deleted, a point elimination inequality is further used to reject impossible data points. Experimental results show that the proposed method has good performance on finding kNN for query points and always requires less computation time than available kNN search algorithms, especially for a data set with a big number of data points or a large standard deviation.  相似文献   

20.
空间数据库中反向最近邻查询在低维查询时一般利用基于R-Tree的改进树作为索引结构,由于树型索引结构本身的限制,R-Tree等索引结构的查询在高维中都会出现维数灾难。针对这个问题,提出了一种基于VARdnn-Tree的索引结构,采用量化压缩的方法存储数据,能够有效地支持高维查询。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号