Similar Literature
1.
Similarity searching often reduces to finding the k nearest neighbors of a query object. The k nearest neighbors are found by applying either a depth-first or a best-first algorithm to the search hierarchy containing the data. These algorithms are generally applicable to any index based on hierarchical clustering: the data is partitioned into clusters, which are aggregated to form other clusters, and the total aggregation is represented as a tree. These algorithms have traditionally used a lower bound corresponding to the minimum distance at which a nearest neighbor can be found (termed MinDist) to prune the search. This avoids processing some of the clusters, as well as individual objects, when they can be shown to be farther from the query object q than all of the current k nearest neighbors of q. An alternative pruning technique is described that uses an upper bound corresponding to the maximum possible distance at which a nearest neighbor is guaranteed to be found (termed MaxNearestDist). The MaxNearestDist upper bound is adapted so that it can be used to find the k nearest neighbors, rather than just the nearest neighbor (i.e., k=1) as in its previous uses. Both the depth-first and best-first k-nearest neighbor algorithms are modified to use MaxNearestDist, which is shown to enhance both algorithms by overcoming their shortcomings. In particular, for the depth-first algorithm, the number of clusters in the search hierarchy that must be examined is not increased, thereby potentially lowering its execution time. For the best-first algorithm, the number of clusters that must be retained in the priority queue controlling the order in which clusters are processed is also not increased, thereby potentially lowering its storage requirements.
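The following is a minimal Python sketch of the baseline best-first search with MinDist pruning. It does not implement the MaxNearestDist extension described above. The `Node` layout (axis-aligned bounding boxes holding either child clusters or points) and every function name are assumptions made for illustration.

```python
import heapq
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    lo: tuple                                     # lower corner of the cluster's bounding box
    hi: tuple                                     # upper corner of the cluster's bounding box
    children: list = field(default_factory=list)  # child clusters (internal node)
    points: list = field(default_factory=list)    # data objects (leaf node)


def bounding_box(points):
    """Smallest axis-aligned box enclosing the given points."""
    dims = range(len(points[0]))
    return (tuple(min(p[i] for p in points) for i in dims),
            tuple(max(p[i] for p in points) for i in dims))


def min_dist(q, node):
    """MinDist: no object inside the node's box can be closer to q than this."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2
                         for x, l, h in zip(q, node.lo, node.hi)))


def best_first_knn(root, q, k):
    """Return the k nearest (distance, point) pairs, closest first."""
    heap = [(min_dist(q, root), 0, False, root)]  # (key, tie-breaker, is_object, item)
    counter = 1
    result = []
    while heap and len(result) < k:
        d, _, is_object, item = heapq.heappop(heap)
        if is_object:
            # Every entry still queued has key >= d, so this object is the next neighbour.
            result.append((d, item))
            continue
        for child in item.children:
            heapq.heappush(heap, (min_dist(q, child), counter, False, child))
            counter += 1
        for p in item.points:
            heapq.heappush(heap, (math.dist(q, p), counter, True, p))
            counter += 1
    return result
```

Entries leave the priority queue in non-decreasing key order, and MinDist never overestimates. An object popped from the queue is therefore guaranteed to be the next nearest neighbor, and clusters whose MinDist exceeds the k-th distance are never expanded.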

2.
The k-nearest neighbor (KNN) rule is a classical yet very effective nonparametric technique in pattern classification, but its classification performance is severely affected by outliers. The local mean-based k-nearest neighbor classifier (LMKNN) was first introduced to achieve robustness against outliers by computing the local mean vector of the k nearest neighbors in each class. However, its performance suffers from the use of a single value of k for each class and a uniform value of k across classes. In this paper, we propose a new KNN-based classifier, the multi-local means-based k-harmonic nearest neighbor (MLM-KHNN) rule. In our method, the k nearest neighbors in each class are first found and then used to compute k different local mean vectors, whose harmonic mean distance to the query sample is computed. Finally, MLM-KHNN assigns the query sample to the class with the minimum harmonic mean distance. Experimental results on twenty real-world datasets from the UCI and KEEL repositories demonstrate that, compared with nine related competitive KNN-based classifiers, the proposed MLM-KHNN classifier achieves a lower classification error rate and is less sensitive to the parameter k, especially when the training sample size is small.
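The classification rule can be sketched as follows. The sketch assumes, as in LMKNN, that the i-th local mean vector is the centroid of the i nearest same-class neighbors, and that distances are Euclidean. Function names are hypothetical, and the code illustrates the rule rather than reproducing the authors' implementation.

```python
import math
from collections import defaultdict


def harmonic_mean(values):
    """Harmonic mean of non-negative distances; 0 if any distance is 0."""
    if any(v == 0 for v in values):
        return 0.0
    return len(values) / sum(1.0 / v for v in values)


def mlm_khnn_predict(train_x, train_y, q, k):
    by_class = defaultdict(list)
    for x, y in zip(train_x, train_y):
        by_class[y].append(x)

    best_label, best_score = None, math.inf
    for label, xs in by_class.items():
        # k nearest neighbours of q within this class, closest first
        nn = sorted(xs, key=lambda x: math.dist(q, x))[:k]
        # i-th local mean vector (assumption): centroid of the i nearest neighbours
        dists = []
        for i in range(1, len(nn) + 1):
            mean = tuple(sum(c) / i for c in zip(*nn[:i]))
            dists.append(math.dist(q, mean))
        score = harmonic_mean(dists)
        if score < best_score:
            best_label, best_score = label, score
    return best_label
```

The harmonic mean is dominated by the smallest of the k local-mean distances. A single good local mean therefore counts heavily, which is one intuition for the reduced sensitivity to k.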

3.
The k-nearest neighbors (kNN) problem is to find the k nearest neighbors of a query point in a given data set. Among the available methods, the principal axis search tree (PAT) algorithm consistently performs well at finding the k nearest neighbors, using the PAT structure and a node elimination criterion. In this paper, a novel kNN search algorithm is proposed that stores the projection values of all data points in the leaf nodes. If a leaf node of the PAT cannot be rejected by the node elimination criterion, the data points in that leaf are further checked against their pre-stored projection values to reject more impossible data points. Experimental results show that the proposed method effectively reduces the number of distance calculations and the computation time of the PAT algorithm, especially for high-dimensional data sets or for search trees with many data points per leaf node.
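The leaf-level rejection step rests on a simple fact. For a unit-length projection axis v, |<q, v> - <x, v>| <= ||q - x|| (Cauchy-Schwarz), so a pre-stored projection can rule out a point without computing its full distance. The sketch below assumes a single unit axis per leaf and uses hypothetical names. The PAT construction itself is omitted.

```python
import heapq
import math


def project(x, axis):
    """Projection of x onto a unit-length axis."""
    return sum(a * b for a, b in zip(x, axis))


def scan_leaf(q, leaf_points, leaf_proj, axis, k, heap):
    """Update heap (a max-heap of (-distance, point) holding the current k best)
    with the points of one leaf; return how many full distances were computed."""
    pq = project(q, axis)
    computed = 0
    for x, px in zip(leaf_points, leaf_proj):
        # |<q, axis> - <x, axis>| <= ||q - x||, so this gap alone can rule x out
        if len(heap) == k and abs(pq - px) >= -heap[0][0]:
            continue
        d = math.dist(q, x)
        computed += 1
        if len(heap) < k:
            heapq.heappush(heap, (-d, x))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, x))
    return computed
```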

4.
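The local mean-based pseudo nearest neighbor classifier (LMPNN) remains sensitive to outliers and noisy points. To address this, a pseudo nearest neighbor algorithm based on bidirectional selection (BS-PNN) is proposed. The k nearest neighbors are first selected with a proximity measure, and the test sample and its neighbors then select each other bidirectionally according to the definition of mutual nearest neighbors. The Euclidean distance from the test sample to the pseudo nearest neighbor is obtained by counting the mutual nearest neighbors in each class and computing the weighted distance of their local means. Finally, an improved class credibility serves as the voting measure for classifying the test sample. When handling complex classification tasks, BS-PNN accurately identifies noisy points, reduces sensitivity to the number of neighbors k, and improves classification accuracy. Simulation experiments were run on 15 real-world data sets from UCI and KEEL, comparing BS-PNN with the KNN, WKNN, LMKNN, PNN, LMPNN, DNN, and P-KNN algorithms. The results show that the classification performance of the bidirectional-selection pseudo nearest neighbor algorithm is clearly superior to that of the other nearest neighbor classifiers.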
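The bidirectional (mutual nearest neighbor) selection step can be sketched as below. A training point is kept only if it is among the k nearest neighbors of the test sample and the test sample is also among its k nearest neighbors. The sketch assumes Euclidean distance and covers only the selection step, not the weighted local means or the class-credibility vote. Names are hypothetical.

```python
import math


def knn_indices(points, i, k):
    """Indices of the k points closest to points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    return set(sorted(others, key=lambda j: math.dist(points[i], points[j]))[:k])


def mutual_neighbors(train, q, k):
    """Training indices j such that j is in kNN(q) and q is in kNN(train[j])."""
    pts = list(train) + [q]
    qi = len(pts) - 1                      # q is appended as the last point
    forward = knn_indices(pts, qi, k)      # q selects its neighbours
    # keep only the neighbours that also select q
    return sorted(j for j in forward if qi in knn_indices(pts, j, k))
```

Consider a point that lies on the edge of a tight group. It may be close to the test sample, yet it does not choose the test sample back, so one-sided neighbors like this are dropped before voting.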

5.
Given a set of data points P and a query point q in a multidimensional space, a reverse nearest neighbor (RNN) query finds the data points in P whose nearest neighbor is q. A reverse k-nearest neighbor (RkNN) query (where k ≥ 1) generalizes the RNN query to find the data points whose kNNs include q. Under RkNN query semantics, q is said to have influence on all of those answer data points. The degree of q's influence on a data point p ∈ P is denoted by κ_p, where q is the κ_p-th NN of p. We introduce a new variant of the RNN query, the ranked reverse nearest neighbor (RRNN) query, which retrieves the t data points most influenced by q, i.e., the t data points with the smallest κ values with respect to q. To answer RRNN queries efficiently, we propose two novel algorithms, κ-counting and κ-browsing, that are applicable to both monochromatic and bichromatic scenarios and can deliver results progressively. Through an extensive performance evaluation, we validate that the two proposed RRNN algorithms are superior to solutions derived from algorithms designed for RkNN queries.
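A brute-force reference for the RRNN semantics in the monochromatic case is sketched below. It computes κ_p directly as one plus the number of other data points strictly closer to p than q. That tie-handling rule is an assumption. This is not the κ-counting or κ-browsing algorithm, both of which are designed to avoid this quadratic work.

```python
import math


def kappa(p, q, P):
    """Rank of q among p's neighbours: 1 + number of other data points strictly closer to p than q."""
    dq = math.dist(p, q)
    return 1 + sum(1 for o in P if o != p and math.dist(p, o) < dq)


def ranked_rnn(P, q, t):
    """The t data points most influenced by q, i.e. with the smallest kappa (ties keep input order)."""
    return sorted(P, key=lambda p: kappa(p, q, P))[:t]
```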

6.
Reverse nearest neighbor (RNN) queries have a broad range of applications, such as decision support, profile-based marketing, and resource allocation. Previous work on RNN search does not take obstacles into consideration. In the real world, however, there are many physical obstacles (e.g., buildings), and their presence may affect the visibility between objects. In this paper, we introduce a novel variant of RNN queries, visible reverse nearest neighbor (VRNN) search, which considers the impact of obstacles on the visibility of objects. Given a data set P, an obstacle set O, and a query point q in 2D space, a VRNN query retrieves the points in P that have q as their visible nearest neighbor. We propose an efficient algorithm for VRNN query processing, assuming that P and O are indexed by R-trees. Our techniques do not require any preprocessing and employ the half-plane property and visibility checks to prune the search space. In addition, we extend our solution to several variations of VRNN queries, including: 1) visible reverse k-nearest neighbor (VRkNN) search, which finds the points in P that have q as one of their k visible nearest neighbors; 2) δ-VRkNN search, which handles VRkNN retrieval under a maximum visible distance constraint δ; and 3) constrained VRkNN (CVRkNN) search, which tackles VRkNN queries with a region constraint. Extensive experiments on both real and synthetic data sets demonstrate the efficiency and effectiveness of the proposed algorithms under various experimental settings.
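The visibility notion can be illustrated with a brute-force VRNN check in 2D. This sketch assumes that obstacles are given as line segments, whereas the paper indexes obstacles with R-trees. It treats a sight line that merely touches an obstacle as unobstructed, and it counts q as the nearest neighbor on distance ties. It performs no R-tree pruning, and its names are hypothetical.

```python
import math


def orient(a, b, c):
    """Twice the signed area of triangle abc (> 0 if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_cross(p1, p2, p3, p4):
    """True if segments p1p2 and p3p4 properly intersect (touching is ignored)."""
    d1, d2 = orient(p3, p4, p1), orient(p3, p4, p2)
    d3, d4 = orient(p1, p2, p3), orient(p1, p2, p4)
    return d1 * d2 < 0 and d3 * d4 < 0


def visible(a, b, obstacles):
    """a and b see each other if no obstacle segment blocks the segment ab."""
    return not any(segments_cross(a, b, s, e) for s, e in obstacles)


def vrnn(P, q, obstacles):
    """Points of P whose nearest visible point (among P and q) is q; ties favour q."""
    result = []
    for p in P:
        if not visible(p, q, obstacles):
            continue
        dq = math.dist(p, q)
        if all(math.dist(p, o) >= dq
               for o in P if o != p and visible(p, o, obstacles)):
            result.append(p)
    return result
```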

7.
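To better handle uneven density and to characterize similarity in high-dimensional data, a density peak clustering algorithm based on shared k-nearest neighbors and shared reverse nearest neighbors is proposed. For two points, the algorithm computes the number of shared k-nearest neighbors and the number of shared reverse nearest neighbors, and combines them with the Euclidean distance to determine the shared similarity between the points. The sum of the shared similarities between a sample point and its reverse nearest neighbors is defined as the point's shared density, and the cluster centers are then selected according to this shared density. Experiments show that the algorithm produces more accurate clustering results than other density-based clustering algorithms on both synthetic and real data sets. It also handles uneven density better and improves clustering accuracy on high-dimensional data.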
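The shared-neighbor quantities can be sketched as follows. The kNN and reverse-kNN sets follow their usual definitions. The abstract does not state how the shared counts are combined with Euclidean distance, so the `similarity` formula below is a hypothetical stand-in. Only the density step is shown, not the selection of cluster centers.

```python
import math


def knn_sets(X, k):
    """k nearest neighbours (as index sets) of every point in X."""
    n = len(X)
    return [set(sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(X[i], X[j]))[:k])
            for i in range(n)]


def reverse_sets(knn):
    """rnn[j] = set of points that list j among their k nearest neighbours."""
    rnn = [set() for _ in knn]
    for i, nbrs in enumerate(knn):
        for j in nbrs:
            rnn[j].add(i)
    return rnn


def shared_density(X, k):
    knn = knn_sets(X, k)
    rnn = reverse_sets(knn)

    def similarity(i, j):
        # HYPOTHETICAL combination: shared kNN count plus shared reverse-NN count,
        # damped by Euclidean distance. The paper's exact formula is not given here.
        shared = len(knn[i] & knn[j]) + len(rnn[i] & rnn[j])
        return shared / (1.0 + math.dist(X[i], X[j]))

    # shared density of i: sum of similarities to the points in i's reverse neighbourhood
    return [sum(similarity(i, j) for j in rnn[i]) for i in range(len(X))]
```

A point that no other point lists among its neighbors has an empty reverse neighborhood and hence zero shared density, whatever the similarity formula.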

8.
Yao Hongjuan, Zhao Xiaoqiang, Li Wei, Hui Yongyong. Control and Decision, 2021, 36(12): 3023-3030
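The dynamic characteristics of batch process data complicate fault detection. To address this, a double weight multiple neighborhoods preserving embedding (DWMNPE) algorithm is proposed. First, temporal neighbors are found for each sample point to describe the time-series correlation between samples. Second, angular neighbors are defined, and both angular neighbors and distance neighbors are found for each sample point to characterize the spatial similarity between samples; extracting these three different manifold features fully characterizes the local structure of the data. Third, a new objective function is constructed that minimizes the reconstruction error while also taking into account the order information of the three kinds of neighbors. This prevents the NPE algorithm from losing neighbor-order information when computing reconstruction weights, so that the essential local structure of the original data is fully extracted while the data dynamics are handled. Finally, a local outlier factor (LOF) statistic is constructed on the dimension-reduced data for monitoring, which eliminates the adverse effect of non-Gaussian data on monitoring performance. A numerical example and a simulated penicillin fermentation process verify the effectiveness of DWMNPE for fault detection in dynamic batch processes.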
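The final monitoring step relies on the local outlier factor. For reference, a compact LOF computation is sketched below. It assumes Euclidean distance, exactly k neighbors per point (ties at the k-distance are not expanded), and no duplicate points. The DWMNPE embedding that would produce the low-dimensional inputs is not reproduced.

```python
import math


def lof_scores(X, k):
    """Local outlier factor of every point in X (Euclidean distance)."""
    n = len(X)
    d = [[math.dist(a, b) for b in X] for a in X]
    # exactly k nearest neighbours of each point, excluding the point itself
    nbrs = [sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])[:k]
            for i in range(n)]
    kdist = [d[i][nbrs[i][-1]] for i in range(n)]  # distance to the k-th neighbour

    def reach(i, j):
        # reachability distance of i with respect to neighbour j
        return max(kdist[j], d[i][j])

    # local reachability density: inverse of the mean reachability distance
    lrd = [k / sum(reach(i, j) for j in nbrs[i]) for i in range(n)]
    # LOF: mean ratio of the neighbours' density to the point's own density
    return [sum(lrd[j] for j in nbrs[i]) / (k * lrd[i]) for i in range(n)]
```

A score near 1 indicates a point whose local density matches that of its neighbors. Markedly larger scores flag candidate faults.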

9.
A Fast k Nearest Neighbor Finding Algorithm Based on the Ordered Partition
We propose a fast nearest neighbor finding algorithm, tentatively named the ordered partition, based on ordered lists of the training samples along each projection axis. The ordered partition has two properties: ordering, which bounds the search region, and partitioning, which rejects unwanted samples without actual distance computations. It is proved that the proposed algorithm finds the k nearest neighbors in constant expected time. Simulations show that the algorithm is largely distribution-free: on average, only 4.6 distance calculations were required to find a nearest neighbor among 10 000 samples drawn from a bivariate normal distribution.
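The ordering property can be illustrated with a single projection axis. Samples are sorted by one coordinate, the search expands outward from the query's position in that order, and it stops once the gap along the axis alone exceeds the current k-th distance. This is a simplified single-axis sketch in the spirit of projection-based search, not the paper's multi-axis ordered partition. Names are hypothetical.

```python
import bisect
import heapq
import math


def build_ordered(points, axis=0):
    """Sort the training samples by one coordinate (the 'ordered list' for that axis)."""
    pts = sorted(points, key=lambda p: p[axis])
    return pts, [p[axis] for p in pts]


def knn_ordered(pts, keys, q, k, axis=0):
    """Return (sorted list of (distance, point), number of distances computed)."""
    heap = []                                 # max-heap of (-distance, index)
    computed = 0
    hi = bisect.bisect_left(keys, q[axis])    # first sample at or right of q on the axis
    lo = hi - 1
    while lo >= 0 or hi < len(pts):
        gap_lo = q[axis] - keys[lo] if lo >= 0 else math.inf
        gap_hi = keys[hi] - q[axis] if hi < len(pts) else math.inf
        if len(heap) == k and min(gap_lo, gap_hi) >= -heap[0][0]:
            break  # the axis gap alone already exceeds the k-th distance
        if gap_lo <= gap_hi:                  # expand toward the closer side first
            i, lo = lo, lo - 1
        else:
            i, hi = hi, hi + 1
        d = math.dist(q, pts[i])
        computed += 1
        if len(heap) < k:
            heapq.heappush(heap, (-d, i))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, i))
    return sorted((-nd, pts[i]) for nd, i in heap), computed
```

For example, take the points (0,0), (1,5), (2,0), (3,3), (10,0) and the query (2.1,0). A 1-NN search computes a single distance, to (2,0), and then stops: the next axis gap (0.9) already exceeds the best distance found (about 0.1).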

10.
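DBSCAN has several weaknesses: its minimum number of points and maximum neighborhood radius are hard to determine, its time overhead is high, it is sensitive to the choice of starting data point, and it has difficulty discovering adjacent clusters of different densities. To address these problems, this paper proposes a density clustering algorithm based on extended region queries (GISN-DBSCAN). The method first introduces an extended region query algorithm. It then uses the neighborhood relations given by nearest neighbors and reverse nearest neighbors to build a k-influence space for each point. Finally, it introduces an outlier determination function that allows the algorithm to identify boundary points and noise points accurately. Experimental results show that GISN-DBSCAN effectively overcomes the shortcomings of DBSCAN.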
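The structure can be sketched below under one assumption: as in influence-space variants of DBSCAN, a point's k-influence space is taken to be the intersection of its k nearest neighbors and its reverse k nearest neighbors. The noise rule (an empty influence space) is a hypothetical illustration, not the paper's outlier determination function.

```python
import math


def influence_spaces(X, k):
    """k-influence space of each point, ASSUMED here to be kNN(p) ∩ RkNN(p)."""
    n = len(X)
    knn = [set(sorted((j for j in range(n) if j != i),
                      key=lambda j: math.dist(X[i], X[j]))[:k])
           for i in range(n)]
    rknn = [{j for j in range(n) if i in knn[j]} for i in range(n)]
    return [knn[i] & rknn[i] for i in range(n)]


def noise_candidates(X, k):
    """Hypothetical rule: a point whose influence space is empty is flagged as noise."""
    return [i for i, s in enumerate(influence_spaces(X, k)) if not s]
```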

11.
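This paper addresses nearest neighbor queries over uncertain objects in network space under a range constraint. It proposes the concept of a range-constrained nearest neighbor query over fuzzy objects in network space and, depending on the query order, gives two algorithms: NN-R and R-NN. Both algorithms store network location information and connectivity information separately and organize them in clustered files to reduce I/O operations. During the nearest neighbor search, the NN-R algorithm uses the α-distance between the query object and the restricted range as a constraint to narrow the search range. The R-NN algorithm takes the Euclidean nearest neighbors of the query object within the restricted range as candidates, and exploits the fact that Euclidean distance is a lower bound on network distance and is cheap to compute to reduce time complexity. The time complexities of the two algorithms are $O(\log_{m_1}|E| + (|V^*| m_3 + 1)\log_{m_2}|V| + |E| + |V|\log|V| + n(\lg n + 1))$ and $O(\log_{m_4} n + (k+1)\log_{m_1}|E| + |E| + |V|\log|V|)$, respectively. Experimental results show that both algorithms perform well under their respective applicable conditions.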
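The R-NN idea of using Euclidean distance as a cheap lower bound on network distance can be sketched as follows. Candidates inside the restricted range are visited in Euclidean order, each is verified with Dijkstra's algorithm, and the scan stops once the next Euclidean distance already exceeds the best network distance found. The sketch assumes crisp (not fuzzy) objects located at graph nodes, and edge weights no shorter than the straight-line distance between their endpoints. Names are hypothetical, and no clustered-file storage is modeled.

```python
import heapq
import math


def network_distance(graph, src, dst):
    """Dijkstra shortest-path distance from src to dst (math.inf if unreachable)."""
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            return d
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u]:
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return math.inf


def range_restricted_nn(graph, coords, q, objects, in_range):
    """Network nearest neighbour of q among objects whose location satisfies in_range."""
    cands = sorted((o for o in objects if in_range(coords[o])),
                   key=lambda o: math.dist(coords[q], coords[o]))
    best, best_d = None, math.inf
    for o in cands:
        if math.dist(coords[q], coords[o]) >= best_d:
            break  # Euclidean distance lower-bounds network distance: no later candidate can win
        d = network_distance(graph, q, o)
        if d < best_d:
            best, best_d = o, d
    return best, best_d
```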

12.
The k nearest neighbors (kNN) problem is to find the k nearest neighbors of a query point in a given data set. In this paper, a novel fast kNN search method using an orthogonal search tree is proposed. The method builds an orthogonal search tree for a data set using an orthonormal basis computed from that data set. To find the kNN of a query point, the projections of the query point onto the vectors of the orthonormal basis are combined with a node elimination inequality to prune unlikely nodes. For a node that cannot be eliminated, a point elimination inequality is further applied to reject impossible data points. Experimental results show that the proposed method performs well at finding the kNN of query points and always requires less computation time than the available kNN search algorithms, especially for data sets with a large number of data points or a large standard deviation.
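The node elimination test can be illustrated by storing, for each node, the interval of its points' projections onto each orthonormal basis vector. Projections onto orthonormal vectors never increase distances (Bessel's inequality). The distance from the query's projections to these intervals is therefore a valid lower bound. The sketch below assumes Euclidean distance and uses hypothetical names; tree construction is omitted.

```python
import math


def project(x, v):
    """Projection of x onto a unit basis vector v."""
    return sum(a * b for a, b in zip(x, v))


def node_summary(points, basis):
    """Per basis vector, the [min, max] interval of the node's projection values."""
    return [(min(project(p, v) for p in points), max(project(p, v) for p in points))
            for v in basis]


def node_lower_bound(q, summary, basis):
    """Lower bound on the distance from q to any point of the node."""
    s = 0.0
    for (lo, hi), v in zip(summary, basis):
        t = project(q, v)
        gap = max(lo - t, 0.0, t - hi)   # how far q's projection lies outside the interval
        s += gap * gap
    return math.sqrt(s)


def can_eliminate(q, summary, basis, kth_dist):
    """A node can be pruned when even its lower bound is no better than the k-th distance."""
    return node_lower_bound(q, summary, basis) >= kth_dist
```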

13.
k-NN Search Using R-trees
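k-NN searches are frequently required in geographic information systems. The algorithms used for such queries differ from those used for point (location) and range queries and therefore require dedicated study. This paper introduces a branch-and-bound R-tree traversal algorithm and generalizes it into a k-NN algorithm. Two methods are discussed: ordering the MBRs within each R-tree node, and pruning. Both reduce the number of nodes that must be visited in the search space, so that k-NN search is performed efficiently.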
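A depth-first branch-and-bound k-NN search over an R-tree can be sketched as below. Child MBRs are visited in MINDIST order and pruned against the current k-th distance. The tuple-based node layout and the names are assumptions. A production R-tree would also need insertion and node splitting, which are not shown.

```python
import heapq
import math


def mindist(q, lo, hi):
    """MINDIST from q to the rectangle [lo, hi]."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi)))


def dfs_knn(node, q, k, heap=None):
    """Depth-first branch-and-bound k-NN.

    node: ('leaf', [point, ...]) or ('inner', [(lo, hi, child), ...]).
    Returns a max-heap of (-distance, point) holding the k best found.
    """
    if heap is None:
        heap = []
    kind, entries = node
    if kind == 'leaf':
        for p in entries:
            d = math.dist(q, p)
            if len(heap) < k:
                heapq.heappush(heap, (-d, p))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, p))
        return heap
    # Visit child MBRs in increasing MINDIST order.
    for lo, hi, child in sorted(entries, key=lambda e: mindist(q, e[0], e[1])):
        if len(heap) == k and mindist(q, lo, hi) >= -heap[0][0]:
            break  # this MBR and all later ones are too far away
        dfs_knn(child, q, k, heap)
    return heap
```

Sorting the MBRs by MINDIST tends to find good candidates early. This tightens the k-th distance sooner, so more of the remaining MBRs fail the pruning test.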

14.
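Existing clustering algorithms commonly suffer from low clustering quality, strong dependence on parameters, and difficulty identifying isolated points. To address these problems, a clustering algorithm based on data fields is proposed. The algorithm computes the potential value of each data object. Cluster centers are identified by two properties: a cluster center has a larger potential than its surrounding neighbors, and it lies relatively far from other cluster centers. Isolated points are identified by the property that their potential equals zero. Finally, every other data object is assigned to the cluster of its nearest neighbor with a larger potential than its own, which completes the clustering. Simulation experiments show that, without manual parameter tuning, the algorithm accurately finds cluster centers and isolated points, produces good clustering results, and is independent of the shape of the data set.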
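The potential-based procedure can be sketched as follows. It assumes a Gaussian potential φ(x) = Σ_j exp(-(‖x - x_j‖/σ)²), truncated at radius 3σ/√2 so that isolated points get exactly zero potential. The abstract states that no manual tuning is needed; here, `sigma` and the center distance threshold `center_gap` are explicit assumptions, as are all names.

```python
import math


def potentials(X, sigma):
    """Gaussian data-field potential of each point, truncated at radius 3*sigma/sqrt(2)."""
    cutoff = 3 * sigma / math.sqrt(2)
    pot = []
    for i, x in enumerate(X):
        s = 0.0
        for j, y in enumerate(X):
            d = math.dist(x, y)
            if j != i and d <= cutoff:
                s += math.exp(-(d / sigma) ** 2)
        pot.append(s)
    return pot


def data_field_cluster(X, sigma, center_gap):
    """Labels: cluster ids >= 0, or -1 for isolated points (zero potential)."""
    pot = potentials(X, sigma)
    order = sorted(range(len(X)), key=lambda i: -pot[i])  # decreasing potential
    labels = [None] * len(X)
    n_clusters = 0
    for rank, i in enumerate(order):
        if pot[i] == 0.0:
            labels[i] = -1  # isolated point: no other point within the cutoff
            continue
        # nearest point already processed, i.e. with potential >= pot[i]
        nearest = min(order[:rank], key=lambda h: math.dist(X[i], X[h]), default=None)
        if nearest is None or math.dist(X[i], X[nearest]) > center_gap:
            labels[i] = n_clusters  # far from every higher-potential point: new center
            n_clusters += 1
        else:
            labels[i] = labels[nearest]
    return labels
```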

15.
Mutual Nearest Neighbor Query Processing in Road Networks
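This paper introduces the mutual nearest neighbor query problem in road networks. Given a road network G(V,E), an object set P, a query point q, and neighbor counts k1 and k2, a mutual nearest neighbor query returns the set of objects that are both k1 nearest neighbors of q and reverse k2 nearest neighbors of q. To solve this problem, a baseline algorithm is first proposed: the k1 nearest neighbors of q are computed as candidates, and each candidate is then verified as a true result. Building on this, an optimized algorithm is proposed that directly eliminates some false candidates according to the number of marker points lying on the edges of the shortest path between an object point and the query point. Finally, experiments verify the effectiveness of the optimized algorithm.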
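The baseline algorithm (compute the k1 nearest neighbors of q, then verify each candidate) can be sketched with Dijkstra's algorithm. The sketch assumes that objects and the query sit on nodes of an undirected weighted graph. The marker-point optimization is not shown, and names are hypothetical.

```python
import heapq
import math


def network_dists(graph, src):
    """Shortest-path distances from src to every reachable node (Dijkstra)."""
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u]:
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist


def knn_on_graph(graph, src, targets, k):
    """The k targets (excluding src) closest to src in network distance."""
    dist = network_dists(graph, src)
    reachable = [t for t in targets if t != src and t in dist]
    return sorted(reachable, key=lambda t: dist[t])[:k]


def mutual_nn(graph, P, q, k1, k2):
    """Objects that are among q's k1-NN and have q among their own k2-NN."""
    candidates = knn_on_graph(graph, q, P, k1)          # step 1: candidates
    return [p for p in candidates                        # step 2: verification
            if q in knn_on_graph(graph, p, list(P) + [q], k2)]
```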

16.
Given a collection of n points in the plane, we exhibit an algorithm that computes, for each point, its nearest neighbor in the north-east (first) quadrant under the L1 metric. By applying a suitable transformation to the input points, the same procedure can be used to compute each point's L1 nearest neighbor in any given octant. This is the basis of an algorithm for computing the minimum spanning tree of the n points in the L1 metric. All three algorithms run in O(n lg n) total time and O(n) space.
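To see how the quadrant query can be answered, note that inside the closed north-east quadrant of (x, y), the L1 distance to (x', y') equals (x' + y') - (x + y). The task therefore reduces to finding the minimum of x' + y' over the points that dominate (x, y). The sketch below realizes this with a sweep over decreasing x and a Fenwick tree of prefix minima, which gives O(n log n) time. It assumes distinct points and illustrates the bound; it is not necessarily the authors' procedure.

```python
import math


def ne_nearest_l1(points):
    """For each point, the index of its L1-nearest neighbour in the closed
    north-east quadrant {x' >= x, y' >= y} (excluding itself), or None.
    Assumes the points are distinct."""
    INF = (math.inf, -1)                      # sentinel: (x' + y', index); -1 means "none"
    ys = sorted({y for _, y in points}, reverse=True)
    rank = {y: r for r, y in enumerate(ys)}   # rank 0 = largest y, so "y' >= y" is a prefix
    size = len(ys)
    tree = [INF] * (size + 1)                 # Fenwick tree holding prefix minima

    def update(r, val):
        r += 1
        while r <= size:
            if val < tree[r]:
                tree[r] = val
            r += r & -r

    def query(r):                             # min over ranks 0..r, i.e. over y' >= ys[r]
        r += 1
        best = INF
        while r > 0:
            if tree[r] < best:
                best = tree[r]
            r -= r & -r
        return best

    result = [None] * len(points)
    # Sweep by decreasing x (ties: decreasing y), so every inserted point has x' >= x.
    for i in sorted(range(len(points)), key=lambda i: (-points[i][0], -points[i][1])):
        x, y = points[i]
        _, j = query(rank[y])
        # Inside the quadrant, (x'-x) + (y'-y) is minimised by minimising x' + y'.
        result[i] = j if j >= 0 else None
        update(rank[y], (x + y, i))
    return result
```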

17.
In multimedia databases, k-nearest neighbor queries are popular and frequently contain non-spatial predicates. Among the available techniques for such queries, the incremental nearest neighbor algorithm proposed by Hjaltason and Samet is known as the most useful [16]. The reason is that if k' > k neighbors are needed, it can provide the next neighbor to the upper operator without restarting the query from scratch. However, the R-tree in their algorithm has no facility for partially pruning tuple candidates that will turn out not to satisfy the remaining predicates, which makes their algorithm inefficient. In this paper, we propose an RS-tree-based incremental nearest neighbor algorithm that complements theirs. The RS-tree used in our algorithm is a hybrid of the R-tree and, as its buddy tree, the S-tree, which is based on the hierarchical signature file. Experimental results show that our RS-tree enhances the performance of Hjaltason and Samet's algorithm.
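Incremental NN is attractive because it produces neighbors one at a time, so non-spatial predicates can be applied afterwards. That property can be sketched as a Python generator. The tuple node layout, attribute dictionaries, and names are assumptions. The predicate is applied only after retrieval, which is precisely the inefficiency that the RS-tree's signature filtering targets.

```python
import heapq
import itertools
import math


def mindist(q, lo, hi):
    """MINDIST from q to the rectangle [lo, hi]."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi)))


def incremental_nn(root, q):
    """Yield (distance, point, attrs) in increasing distance, one neighbour at a time.

    Nodes are ('inner', [(lo, hi, child), ...]) or ('leaf', [(point, attrs), ...]).
    """
    tie = itertools.count()                    # tie-breaker so items are never compared
    heap = [(0.0, next(tie), 'node', root)]
    while heap:
        d, _, kind, item = heapq.heappop(heap)
        if kind == 'obj':
            yield d, item[0], item[1]          # suspended here until the caller asks for more
            continue
        node_kind, entries = item
        for e in entries:
            if node_kind == 'leaf':
                heapq.heappush(heap, (math.dist(q, e[0]), next(tie), 'obj', e))
            else:
                lo, hi, child = e
                heapq.heappush(heap, (mindist(q, lo, hi), next(tie), 'node', child))
```

A caller can filter the stream with an ordinary generator expression and take as many results as it needs with `itertools.islice`, without restarting the search.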

18.
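An automatic multi-image stitching algorithm based on feature points is proposed. Feature points are extracted in the image scale space with the SIFT or SURF algorithm, localized to sub-pixel accuracy, and assigned a dominant orientation. A feature vector is then computed from the distribution of information in each feature point's neighborhood. Nearest-neighbor and second-nearest-neighbor searches are performed with a k-d tree, and initial matching pairs are obtained from the ratio of the nearest-neighbor distance to the second-nearest-neighbor distance. The RANSAC (Random Sample Consensus) algorithm removes mismatched feature-point pairs while robustly estimating the transformation parameters between images, and a multi-band blending algorithm eliminates stitching seams. Experiments verify that the algorithm stitches multiple images automatically and seamlessly.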
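The nearest/second-nearest ratio test used to form the initial matches can be sketched as below. For brevity, the sketch replaces the k-d tree with a brute-force scan, which gives the same result at a higher cost. The threshold `ratio=0.8` is an assumed value; the abstract does not give one.

```python
import math


def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Pairs (i, j): desc_b[j] is desc_a[i]'s nearest neighbour and passes the ratio test."""
    matches = []
    for i, d in enumerate(desc_a):
        # brute-force stand-in for the k-d tree's 2-NN query
        ranked = sorted((math.dist(d, e), j) for j, e in enumerate(desc_b))
        if len(ranked) < 2:
            continue
        (d1, j1), (d2, _) = ranked[0], ranked[1]
        if d1 < ratio * d2:  # keep only distinctive matches
            matches.append((i, j1))
    return matches
```

A descriptor whose two best candidates are about equally close is ambiguous, so it is rejected before RANSAC ever sees it.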

19.
Reverse nearest neighbors in large graphs
A reverse nearest neighbor (RNN) query returns the data objects that have a query point as their nearest neighbor (NN). Although such queries have been studied quite extensively in Euclidean spaces, there is no previous work in the context of large graphs. In this paper, we provide a fundamental lemma that can be used to prune the search space while traversing the graph in search of RNNs. Based on it, we develop two RNN methods: an eager algorithm that attempts to prune network nodes as soon as they are visited, and a lazy technique that prunes the search space when a data point is discovered. We study retrieval of an arbitrary number k of reverse nearest neighbors, investigate the benefits of materialization, cover several query types, and deal with cases where the queries and the data objects reside on nodes or edges of the graph. The proposed techniques are evaluated in various practical scenarios involving spatial maps, computer networks, and the DBLP coauthorship graph.
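An eager-style traversal can be sketched from the following pruning idea. Suppose some data point p satisfies d(n, p) < d(n, q) at node n. Then no data point whose shortest path to q passes through n can have q as its nearest neighbor. The sketch assumes that data points and q sit on nodes (q itself is not a data node), handles only k = 1, and uses a plain bounded Dijkstra for the range check. Names are hypothetical.

```python
import heapq
import math


def closer_point_exists(graph, n, radius, data_nodes):
    """True if some data node other than n lies at network distance < radius from n."""
    dist = {n: 0.0}
    pq = [(0.0, n)]
    while pq:
        d, u = heapq.heappop(pq)
        if d >= radius:
            return False  # entries pop in increasing order; nothing closer remains
        if d > dist[u]:
            continue  # stale heap entry
        if u != n and u in data_nodes:
            return True
        for v, w in graph[u]:
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return False


def rnn_eager(graph, data_nodes, q):
    """Data nodes whose nearest data point is q (k = 1), found by a pruned Dijkstra from q."""
    dist = {q: 0.0}
    pq = [(0.0, q)]
    done, result = set(), []
    while pq:
        d, n = heapq.heappop(pq)
        if n in done:
            continue
        done.add(n)
        if n != q and closer_point_exists(graph, n, d, data_nodes):
            continue  # prune: a data point is closer to n than q is
        if n in data_nodes:
            result.append(n)  # no other data point is closer to n than q
            continue  # n itself is closer than q to anything reached through n
        for v, w in graph[n]:
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return result
```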

20.
Many standard image processing operations can be implemented on quadtrees as a simple tree traversal in which, at each terminal node, a computation is performed involving some of that node's neighbors. Most of this work has used bottom-up neighbor-finding techniques, which search for a nearest common ancestor. Recently, top-down techniques have been proposed that maintain a neighbor vector as the tree is traversed. A simplified version of the top-down method for a quadtree is presented in the context of a general-purpose tree traversal algorithm. It differs from prior work in part because it can compute diagonally adjacent neighbors, not just horizontally and vertically adjacent ones. It builds a neighbor vector for each node using a minimal amount of information. Analysis shows that the algorithm's execution time is directly proportional to the number of nodes in the tree, although it requires some extra storage. Use of the algorithm leads to lower execution time bounds for some common quadtree image processing operations, such as connected component labeling.
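The top-down neighbor vector can be sketched as follows. For each child, the neighbor in each of the 8 directions is either a sibling or is derived from the parent's neighbor in the corresponding direction, descending one level when that neighbor is subdivided. This yields neighbors of equal or larger size, including diagonal ones, with constant work per node. The `Quad` class, the (x, y) child indexing (x to the right, y downward), and all names are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from itertools import product

# The 8 neighbour directions (dx, dy); x grows to the right, y grows downward.
DIRS = [(dx, dy) for dx, dy in product((-1, 0, 1), repeat=2) if (dx, dy) != (0, 0)]


@dataclass(eq=False)
class Quad:
    color: str = 'gray'                           # 'black' or 'white' for leaves
    children: dict = field(default_factory=dict)  # (x, y) in {0, 1}^2 -> Quad; empty for a leaf


def traverse(node, nbrs, visit, path=()):
    """Top-down traversal; nbrs maps each direction to the adjacent node of equal
    or larger size (None at the image border). visit(path, leaf, nbrs) runs at leaves."""
    if not node.children:
        visit(path, node, nbrs)
        return
    for (cx, cy), child in node.children.items():
        child_nbrs = {}
        for dx, dy in DIRS:
            tx, ty = cx + dx, cy + dy
            # which cell of the parent's 3x3 neighbourhood holds the target: -1, 0 or +1
            px, py = (tx > 1) - (tx < 0), (ty > 1) - (ty < 0)
            holder = node if (px, py) == (0, 0) else nbrs[(px, py)]
            if holder is not None and holder.children:
                holder = holder.children[(tx % 2, ty % 2)]  # descend one level
            child_nbrs[(dx, dy)] = holder
        traverse(child, child_nbrs, visit, path + ((cx, cy),))
```

The traversal starts at the root with every direction mapped to `None`. Each node then performs a fixed 8 lookups, which matches the linear running time stated above; the extra storage is the neighbor vector carried down each level.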
