首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 312 毫秒
1.
Reverse Nearest Neighbor Search in Metric Spaces   总被引:7,自引:0,他引:7  
Given a set {cal D} of objects, a reverse nearest neighbor (RNN) query returns the objects o in {cal D} such that o is closer to a query object q than to any other object in {cal D}, according to a certain similarity metric. The existing RNN solutions are not sufficient because they either 1) rely on precomputed information that is expensive to maintain in the presence of updates or 2) are applicable only when the data consists of "Euclidean objects” and similarity is measured using the L_2 norm. In this paper, we present the first algorithms for efficient RNN search in generic metric spaces. Our techniques require no detailed representations of objects, and can be applied as long as their mutual distances can be computed and the distance metric satisfies the triangle inequality. We confirm the effectiveness of the proposed methods with extensive experiments.  相似文献   

2.
Similarity searching often reduces to finding the k nearest neighbors to a query object. Finding the k nearest neighbors is achieved by applying either a depth- first or a best-first algorithm to the search hierarchy containing the data. These algorithms are generally applicable to any index based on hierarchical clustering. The idea is that the data is partitioned into clusters which are aggregated to form other clusters, with the total aggregation being represented as a tree. These algorithms have traditionally used a lower bound corresponding to the minimum distance at which a nearest neighbor can be found (termed MinDist) to prune the search process by avoiding the processing of some of the clusters as well as individual objects when they can be shown to be farther from the query object q than all of the current k nearest neighbors of q. An alternative pruning technique that uses an upper bound corresponding to the maximum possible distance at which a nearest neighbor is guaranteed to be found (termed MaxNearestDist) is described. The MaxNearestDist upper bound is adapted to enable its use for finding the k nearest neighbors instead of just the nearest neighbor (i.e., k=1) as in its previous uses. Both the depth-first and best-first k-nearest neighbor algorithms are modified to use MaxNearestDist, which is shown to enhance both algorithms by overcoming their shortcomings. In particular, for the depth-first algorithm, the number of clusters in the search hierarchy that must be examined is not increased thereby potentially lowering its execution time, while for the best-first algorithm, the number of clusters in the search hierarchy that must be retained in the priority queue used to control the ordering of processing of the clusters is also not increased, thereby potentially lowering its storage requirements.  相似文献   

3.
Similarity search (e.g., k-nearest neighbor search) in high-dimensional metric space is the key operation in many applications, such as multimedia databases, image retrieval and object recognition, among others. The high dimensionality and the huge size of the data set require an index structure to facilitate the search. State-of-the-art index structures are built by partitioning the data set based on distances to certain reference point(s). Using the index, search is confined to a small number of partitions. However, these methods either ignore the property of the data distribution (e.g., VP-tree and its variants) or produce non-disjoint partitions (e.g., M-tree and its variants, DBM-tree); these greatly affect the search efficiency. In this paper, we study the effectiveness of a new index structure, called Nested-Approximate-eQuivalence-class tree (NAQ-tree), which overcomes the above disadvantages. NAQ-tree is constructed by recursively dividing the data set into nested approximate equivalence classes. The conducted analysis and the reported comparative test results demonstrate the effectiveness of NAQ-tree in significantly improving the search efficiency.  相似文献   

4.
移动对象的动态反向k最近邻研究   总被引:1,自引:1,他引:0       下载免费PDF全文
反向最近邻查询是空间数据库中最重要的算法之一。传统的反向最近邻查询方法主要是针对静态对象的查询,随着无线通讯和定位技术的快速发展,移动对象发出的查询请求成为新的研究热点。该文将TPR-tree作为算法的索引结构,并提出了基于矩形框的对角线的修剪策略,将半平面修剪策略进行改进,给出了移动对象的动态反向k最近邻的查询方案。  相似文献   

5.
对顺序索引方法进行了研究,提出一种基于向量近似的高维顺序索引结构,该结构顺序访问部分文件就能完成k近邻查询。在查询过程中依据投影值来终止查询过程,依据距离来排除不匹配的数据。为进一步降低数据访问率,采用椭圆体聚类算法对数据集进行划分。新索引结构支持以多个顺序访问过程完成k近邻查询,能够同时降低查询过程中的I/O开销和CPU开销。在大型高维图像特征库上的实验表明,新的高维索引结构的查询性能优于其他高维索引方法。  相似文献   

6.
Large-scale similarity search engines are complex systems devised to process unstructured data like images and videos. These systems are deployed on clusters of distributed processors communicated through high-speed networks. To process a new query, a distance function is evaluated between the query and the objects stored in the database. This process relays on a metric space index distributed among the processors. In this paper, we propose a cache-based strategy devised to reduce the number of computations required to retrieve the top-k object results for user queries by using pre-computed information. Our proposal executes an approximate similarity search algorithm, which takes advantage of the links between objects stored in the cache memory. Those links form a graph of similarity among pre-computed queries. Compared to the previous methods in the literature, the proposed approach reduces the number of distance evaluations up to 60%.  相似文献   

7.
Fast k-nearest neighbor classification using cluster-based trees   总被引:5,自引:0,他引:5  
Most fast k-nearest neighbor (k-NN) algorithms exploit metric properties of distance measures for reducing computation cost and a few can work effectively on both metric and nonmetric measures. We propose a cluster-based tree algorithm to accelerate k-NN classification without any presuppositions about the metric form and properties of a dissimilarity measure. A mechanism of early decision making and minimal side-operations for choosing searching paths largely contribute to the efficiency of the algorithm. The algorithm is evaluated through extensive experiments over standard NIST and MNIST databases.  相似文献   

8.
提出一种新的基于图论的聚类算法NeiMu。该算法首先分析数据中的对象,寻找每个对象的k近邻,根据k近邻关系构造k近邻有向图,然后通过k近邻有向图中的k-互邻居关系构造k-聚类图,发现数据中的自然聚类。算法的特点是根据数据之间的互为k近邻关系确定数据中的自然簇,而不必引入其他方法来划分小簇,从而能够保证对象不会被错误聚类,仅会与其他小簇一起融合到一个大簇中。这一优点可以有效保证NeiMu算法的聚类质量。而且,NeiMu算法给出的这种类似自底向上的层次聚类结果还有利于用户根据渐变的结果确定最佳的k值。实验结果表明,该算法对密度变化大的数据、大小相差大的数据、任意分布形状的数据均具有很好的聚类质量,对孤立点也很健壮。  相似文献   

9.
We propose a new efficient and accurate technique for generic approximate similarity searching, based on the use of inverted files. We represent each object of a dataset by the ordering of a number of reference objects according to their distance from the object itself. In order to compare two objects in the dataset, we compare the two corresponding orderings of the reference objects. We show that this representation enables us to use inverted files to obtain very efficiently a very small set of good candidates for the query result. The candidate set is then reordered using the original similarity function to obtain the approximate similarity search result. The proposed technique performs several orders of magnitude better than exact similarity searches, still guaranteeing high accuracy. To also demonstrate the scalability of the proposed approach, tests were executed with various dataset sizes, ranging from 200,000 to 100 million objects.  相似文献   

10.
Rav-tree:一种有效支持反向近似近邻查询的索引结构   总被引:1,自引:0,他引:1  
空间数据库的索引结构是实现有效数据查询的前提和基础。空间数据反向近似近邻查询是空间查询的一个新方向,它避免了精确查询中过多的距离计算,从而能够在效率与准确性上取得平衡。提出的Rav-tree不同于基于启发式规则的索引结构,首先利用局部近似,然后根据Voronoi cell区域和估计圆的方法实现近似近邻查询,并利用过滤结果和分域查询得到初步的候选集,最终通过反向近似近邻查询(RANNQuery)算法得到RANN集,并完整地给出基于Rav-tree的ANN查询算法和RANN查询算法。实验结果表明,Rav-tree对RANN等查询具有较好的查询效率和查全率。  相似文献   

11.
在时空数据库中,最近邻查询用于对某个查询对象,在被查询对象中找出离它最近的一个或多个对象。该文在TPR树这一时空索引的基础上,提出了一种高效的最近邻查询算法,能够支持移动对象的多个最近邻对象的查询,并在性能上也有所提高。  相似文献   

12.
使用R树进行k-NN搜索   总被引:1,自引:0,他引:1  
在地理信息系统中经常要做k-NN搜索,进行这些查询用到的算法与位置和范围查询的算法不同,需要专门进行研究,介绍了一种分支界限遍历R树算法,并将该算法概括为k-NN算法。文中讨论了两种方法。对R树进行结点内MBR的排序以及剪枝过程,以减少搜索空间中需访问结点的数量,有效地进行k-NN搜索。  相似文献   

13.
一种自适应k-最近邻算法的研究   总被引:3,自引:0,他引:3  
针对传统k-最近邻算法(k-Nearest Neighbor, kNN)存在搜索慢的缺陷,提出了一种改进型的自适应k-最近邻算法。该方法在以测试样本点为中心的超球内进行搜索,对超球半径的生长进行采样,建立半径生长的BP神经网络模型,逼近半径变化函数,并用该函数指导超球体的生长。该方法有效地缩小了搜索范围,减少了超球体半径生长的试探次数,对处理稀疏数据集有明显的优越性。  相似文献   

14.
路网中互近邻查询处理方法   总被引:1,自引:0,他引:1  
提出路网中的互近邻查询问题.给定路网G(V,E),对象集P,查询点q,近邻数k1和k2,互近邻查询返回既是q的k1近邻,又是q的反k2近邻的对象集.为解决该问题,首先提出基础算法,即先求出查询点q的k1近邻作为候选,再验证这些候选是否为真正的结果.然后,在此基础上提出了优化算法,根据落在对象点与查询点最短路径边上的标记点个数直接排除掉一些错误的候选对象.最后,通过实验验证了优化算法的有效性.  相似文献   

15.
Processing moving queries over moving objects using motion-adaptive indexes   总被引:2,自引:0,他引:2  
This paper describes a motion-adaptive indexing scheme for efficient evaluation of moving continual queries (MCQs) over moving objects. It uses the concept of motion-sensitive bounding boxes (MSBs) to model moving objects and moving queries. These bounding boxes automatically adapt their sizes to the dynamic motion behaviors of individual objects. Instead of indexing frequently changing object positions, we index less frequently changing object and query MSBs, where updates to the bounding boxes are needed only when objects and queries move across the boundaries of their boxes. This helps decrease the number of updates to the indexes. More importantly, we use predictive query results to optimistically precalculate query results, decreasing the number of searches on the indexes. Motion-sensitive bounding boxes are used to incrementally update the predictive query results. Furthermore, we introduce the concepts of guaranteed safe radius and optimistic safe radius to extend our motion-adaptive indexing scheme to evaluating moving continual k-nearest neighbor (kNN) queries. Our experiments show that the proposed motion-adaptive indexing scheme is efficient for the evaluation of both moving continual range queries and moving continual kNN queries.  相似文献   

16.
Given a set of data points P and a query point q in a multidimensional space, reverse nearest neighbor (RNN) query finds data points in P whose nearest neighbors are q. Reverse k-nearest neighbor (RkNN) query (where k ges 1) generalizes RNN query to find data points whose kNNs include q. For RkNN query semantics, q is said to have influence to all those answer data points. The degree of q's influence on a data point p (isin P) is denoted by kappap where q is the kappap-th NN of p. We introduce a new variant of RNN query, namely, ranked reverse nearest neighbor (RRNN) query, that retrieves t data points most influenced by q, i.e., the t data points having the smallest kappa's with respect to q. To answer this RRNN query efficiently, we propose two novel algorithms, kappa-counting and kappa-browsing that are applicable to both monochromatic and bichromatic scenarios and are able to deliver results progressively. Through an extensive performance evaluation, we validate that the two proposed RRNN algorithms are superior to solutions derived from algorithms designed for RkNN query.  相似文献   

17.
Reverse nearest neighbor (RNN) queries have a broad application base such as decision support, profile-based marketing, resource allocation, etc. Previous work on RNN search does not take obstacles into consideration. In the real world, however, there are many physical obstacles (e.g., buildings) and their presence may affect the visibility between objects. In this paper, we introduce a novel variant of RNN queries, namely, visible reverse nearest neighbor (VRNN) search, which considers the impact of obstacles on the visibility of objects. Given a data set P, an obstacle set O, and a query point q in a 2D space, a VRNN query retrieves the points in P that have q as their visible nearest neighbor. We propose an efficient algorithm for VRNN query processing, assuming that P and O are indexed by R-trees. Our techniques do not require any preprocessing and employ half-plane property and visibility check to prune the search space. In addition, we extend our solution to several variations of VRNN queries, including: 1) visible reverse k-nearest neighbor (VRkNN) search, which finds the points in P that have q as one of their k visible nearest neighbors; 2) delta-VRkNN search, which handles VRkNN retrieval with the maximum visible distance delta constraint; and 3) constrained VRkNN (CVRkNN) search, which tackles the VRkNN query with region constraint. Extensive experiments on both real and synthetic data sets have been conducted to demonstrate the efficiency and effectiveness of our proposed algorithms under various experimental settings.  相似文献   

18.
大型网络中近似子图匹配研究   总被引:1,自引:0,他引:1       下载免费PDF全文
为降低噪声对近似子图匹配准确率的影响,提出一种改进的近似子图匹配方法。在预处理阶段,利用k-近邻顶点集为数据图中的每个顶点建立标签-权重向量索引。在查询过程中,基于单个近邻标签的权重距离和所有近邻标签的整体匹配程度进行两级过滤,生成顶点候选集,采用生成树匹配和图匹配的方式确定查询图在大型网络中的位置。在真实数据集上的实验结果表明,该方法具有较高的执行效率和匹配准确率。  相似文献   

19.
边界约束的非相交球树实体对象多维统一索引   总被引:1,自引:0,他引:1  
俞肇元  袁林旺  罗文  胡勇  闾国年 《软件学报》2012,23(10):2746-2759
针对现有空间索引剖分结构复杂、节点重叠率高及对多维实体对象检索及运算支撑较弱等问题,构建了一种边界约束的非相交球实体对象多维统一空间索引;利用球的几何代数外积表达,提出了基于求交算子的直线-平面和直线-球面的相交判定与交点提取方法,建立了多维实体对象体元化剖分方法及包含边界约束的非相交离散球实体填充算法,实现了实体对象空间均匀、非重叠的分割,并在填充球的个数、重叠率以及对象逼近近似度等约束条件上获得了较好的平衡.定义了最小外包球生成与更新的迭代算法与包含球体积修正的批量Neural Gas层次聚类算法,在尽可能保证球树各分支平衡性的前提下,实现了索引层次体系的稳健构建.利用几何代数下球对象间几何关系计算的内蕴性与参数更新的动态性,实现了索引结构的动态生成与更新,进而设计了实体对象表面及其内部任意位置及区域的检索策略及基于实体索引的空间关系计算方法.基于不同实体对象的模拟实验显示,基于几何代数的实体对象索引可以有效实现多维实体对象表面及其内部任意位置及区域的快速检索,并能在有限时间内以较高的精度实现多维实体对象最近邻距离和动态实体对象相交状态的检索.相对于常用球树索引,所提出的索引方法在填充率、节点重叠率、填充误差、体元个数、层次球个数、体积百分比和时间占用等方面均具有明显优势,且不同分辨率剖分条件下的索引结构及空间关系计算精度具有更高的稳健性,可运用于具有较强时间约束下复杂多维动态场景中对象检索与空间关系计算.  相似文献   

20.
古凌岚  彭利民 《计算机科学》2016,43(12):213-217
针对传统的基于欧氏距离的相似性度量不能完全反映复杂结构的数据分布特性的问题,提出了一种基于相对密度和流形上k近邻的聚类算法。基于能描述全局一致性信息的流形距离,及可体现局部相似性和紧密度的k近邻概念,通过流形上k近邻相似度度量数据对象间的相似性,采用k近邻的相对紧密度发现不同密度下的类簇,设计近邻点对约束规则搜寻k近邻点对构成的近邻链,归类数据对象及识别离群点。与标准k-means算法、流形距离改进的k-means算法进行了性能比较,在人工数据集和UCI数据集上的仿真实验结果均表明,该算法能有效地处理复杂结构的数据聚类问题,且聚类效果更好。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号