
1.

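于亚新 王国仁 林利增 李淼 朱歆华 《计算机研究与发展》2010,47(4)

Because similarity retrieval over a case library determines whether physicians receive sufficient and correct candidate cases, performing similarity retrieval of medical image cases efficiently and accurately is an active research topic in both academia and medicine. Many papers so far have proposed retrieval strategies that improve query precision, but few address retrieval efficiency. This paper therefore proposes the M2+-tree, a high-dimensional index that integrates similarity computation over multiple metric spaces. The index combines the text and images of a case into one high-dimensional multi-feature vector, uses that vector to partition the data space into subspaces in metric space, and then uses key vectors to partition each resulting subspace a second time in vector space. The non-overlapping partitioning by key vectors and filtering based on the triangle inequality speed up case retrieval. In short, the two rounds of partitioning, in metric space and in vector space, let the M2+-tree greatly reduce the number of unnecessary similarity computations between the query case and the database cases, which speeds up similar-case retrieval. Experimental results show that the M2+-tree outperforms the M2-tree, a representative multi-feature index for metric spaces.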
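The triangle-inequality filtering mentioned in this abstract can be illustrated with a minimal sketch. It is not the M2+-tree itself: the single pivot, the function names, and the toy data are assumptions made for illustration. The idea is that distances from every stored point to a pivot are precomputed, so a point x can be skipped whenever |d(q, p) - d(x, p)| exceeds the query radius.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def range_search(query, radius, points, pivot, pivot_dists, dist=euclidean):
    """Return (matches, number of distance computations against data points).

    pivot_dists[i] must hold dist(points[i], pivot), computed at build time.
    """
    d_qp = dist(query, pivot)
    matches, computed = [], 0
    for point, d_xp in zip(points, pivot_dists):
        # Triangle inequality: d(q, x) >= |d(q, p) - d(x, p)|.
        if abs(d_qp - d_xp) > radius:
            continue  # x cannot lie within the radius, so skip the real distance
        computed += 1
        if dist(query, point) <= radius:
            matches.append(point)
    return matches, computed


# Hypothetical toy data: four 2-D points and one pivot.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (9.0, 9.0)]
pivot = (0.0, 0.0)
pivot_dists = [euclidean(p, pivot) for p in points]
matches, computed = range_search((1.0, 1.0), 1.5, points, pivot, pivot_dists)
```

In this toy run only two of the four stored points need an actual distance computation; the other two are rejected by the pivot bound alone.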

2.

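A High-Dimensional Index Based on Relative-Distance Hashing

骆吉洲 李建中 祝园园 高宏 《计算机技术》2008,2(1):32-44

Many indexes have been designed to process nearest-neighbor and range queries in high-dimensional spaces efficiently. It has been shown that, when dimensionality is high, processing these two kinds of queries with a high-dimensional index can hardly be faster than a linear scan. This paper proposes a two-level index that adaptively identifies clusters in the data set. When the data set is clustered, the index answers nearest-neighbor and range queries faster than existing index structures; on other data sets, its performance is roughly equal to that of a linear scan. The upper level organizes a set of reference points into a binary tree, and the lower level is a series of dynamic hash tables. Each data point is hashed into a bucket according to its relative distance to the reference points. At query time, the distances from the query point to the reference points are used to prune the search. Experiments show that the proposed index structure performs well.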

3.

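A High-Dimensional Index Based on Relative-Distance Hashing

骆吉洲 李建中 祝园园 高宏 《计算机科学与探索》2008,2(1):32-44

Many indexes have been designed to process nearest-neighbor and range queries in high-dimensional spaces efficiently. It has been shown that, when dimensionality is high, processing these two kinds of queries with a high-dimensional index can hardly be faster than a linear scan. This paper proposes a two-level index that adaptively identifies clusters in the data set. When the data set is clustered, the index answers nearest-neighbor and range queries faster than existing index structures; on other data sets, its performance is roughly equal to that of a linear scan. The upper level organizes a set of reference points into a binary tree, and the lower level is a series of dynamic hash tables. Each data point is hashed into a bucket according to its relative distance to the reference points. At query time, the distances from the query point to the reference points are used to prune the search. Experiments show that the proposed index structure performs well.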

4.

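Interactive Face Image Indexing Based on Manifold Space

庄毅 胡华 袁承祥 蒋国昌 胡海洋 琚春华 《计算机研究与发展》2010,47(Z1)

Cognitive science indicates that face image retrieval based on manifold learning accurately reflects the intrinsic similarity of face images and the nature of human visual perception. This paper proposes NDL, a relevance-feedback-based high-dimensional index for faces, to improve face image retrieval performance. On top of this index it also proposes a similarity query in manifold space, the virtual k-nearest-neighbor query (Vk-NN), designed specifically for NDL-based face retrieval. First, the similarity of every pair of face images is computed under a threshold constraint to build a two-dimensional distance map called the adjacency distance table (NDL). The distance values are then indexed with a B+-tree. Finally, a Vk-NN query in the high-dimensional manifold space is transformed into a B+-tree query in one-dimensional space. Experiments show that the NDL index retrieves far more efficiently in manifold space than sequential retrieval and is particularly suited to very large collections of face images.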

5.

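CMRS: A Clustered Multi-Resolution String Index Structure

郑若石 王镝 徐恒宇 王国仁 陈白尘 《小型微型计算机系统》2006,27(3):497-502

With the development of gene sequencing technology and the Human Genome Project, finding similar sequences in large volumes of biological data has increasingly become a hot research topic. This paper proposes a clustered multi-resolution string index structure for similarity queries over biological sequences. First, a multi-resolution string index over gene sequences is built from MBRs (minimum bounding rectangles) of small capacity. Then, by clustering the MBRs and applying an order-preserving technique, the average volume of the MBRs in the index is reduced, which increases the spatial distance from the query vector to the index and improves its filtering power. A new post-processing method is also given that improves index performance by greatly reducing the number of edit-distance computations. The paper presents the index structure and describes the related algorithms in detail. Experiments show that the structure is effective for similarity queries over biological data.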

6.

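An Index Structure Supporting Block Edit Distance (Cited by: 1; self-citations: 0; citations by others: 1)

王斌 郭庆 李中博 杨晓春 《计算机研究与发展》2010,47(1)

In approximate string matching, the traditional edit distance does not measure the similarity of data such as personal names and addresses well, whereas the block edit distance does. Supporting block edit distance efficiently in approximate string query processing is therefore important. Computing the block edit distance between two strings is an NP-complete problem, so effective methods are needed to strengthen filtering and reduce the false-positive rate. This paper designs SHV-Trie, a novel index structure supporting the edit distance with moves. By studying the properties of the operations in the edit distance with moves, it uses letter frequencies as a lower bound on this distance and proposes a corresponding query filtering algorithm. To address the excessive space overhead of SHV-Trie, it also proposes an index structure with an optimized letter ordering, a compressed index structure, and the associated query filtering algorithms. Experiments and analysis on real data sets show that the proposed index structures filter well and improve query efficiency by reducing the false-positive rate.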
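A small sketch may help show why letter frequencies give a lower bound here. It assumes the allowed operations are single-character insertion, deletion, and substitution plus moving a substring; the function names are hypothetical and this is not the SHV-Trie filter itself. A move leaves every letter count unchanged, and each of the other operations fixes at most one surplus letter and one missing letter, so the larger of the two totals cannot exceed the true distance.

```python
from collections import Counter


def frequency_lower_bound(s, t):
    """Lower bound on the edit distance with moves between s and t.

    Assumes unit-cost single-character insert/delete/substitute plus
    substring moves (moves do not change letter counts).
    """
    count_s, count_t = Counter(s), Counter(t)
    surplus = sum((count_s - count_t).values())  # letters s has too many of
    deficit = sum((count_t - count_s).values())  # letters s is missing
    return max(surplus, deficit)


def passes_filter(s, t, threshold):
    """Keep the pair as a candidate only if the bound does not exceed the threshold."""
    return frequency_lower_bound(s, t) <= threshold


# Swapping two name parts is one block move, and the bound is 0 as expected.
bound_swapped = frequency_lower_bound("john smith", "smith john")
bound_subst = frequency_lower_bound("abc", "abd")
bound_insert = frequency_lower_bound("abc", "abcde")
```

Pairs that fail `passes_filter` can be discarded without running the expensive distance computation; pairs that pass still need verification.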

7.

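An LSH-Based Query Algorithm for Time Subsequences

汤春蕾 董家麒 《计算机学报》2012,35(11):2228-2236

Subsequence similarity query, which includes range queries and k-nearest-neighbor queries, is an important operation on time series data sets. Most existing algorithms are based on the Euclidean distance or the DTW distance and suffer from low query efficiency. This paper proposes a new LSH-based distance measure that greatly improves the efficiency of similarity queries while preserving result quality. On this basis it presents the DS-Index structure, which prunes using a distance lower bound, and further proposes two optimized algorithms, OLSH-Range and OLSH-kNN. Experiments on real stock series show that the algorithms find similarity query results quickly and accurately.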

8.

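RB-Tree: An External-Memory Index Supporting Spatial Approximate Keyword Queries

王金宝 高宏 李建中 杨东华 《计算机研究与发展》2012,49(10):2142-2152

A spatial approximate keyword query consists of a spatial condition and a set of keyword similarity conditions. It returns the objects in a spatial database that satisfy both of the following: 1) the object's location satisfies the spatial condition of the query; 2) for every keyword in the query, the object contains at least one keyword whose similarity to it exceeds a given threshold. With the explosive growth of data, spatial databases can no longer be held entirely in memory, so they need an external-memory index that supports such queries. At present no index structure supports exact spatial approximate keyword queries in external memory. This paper designs a new external-memory index, the RB-tree, that does so. The RB-tree supports multiple spatial conditions, such as range queries and NN queries, and multiple keyword similarity measures, including edit distance and normalized edit distance. Performance tests on real data verify the efficiency of the RB-tree.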

9.

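A k-Nearest-Neighbor Search Algorithm Based on Angular Similarity

余小高 余小鹏 《计算机应用研究》2009,26(9):3296-3299

k-nearest-neighbor search (KNNS) is widely used in high-dimensional spaces, but many current KNNS algorithms index and search data using the Euclidean distance and are unsuitable for applications that use angular similarity. This paper proposes a k-nearest-neighbor search algorithm based on angular similarity (BA-KNNS). The algorithm first introduces an angular-similarity-based index structure (BA-Index) that, with respect to a center line and a reference line, organizes the data as a series of shell-hypercones, each stored linearly. It then locates the query object in space, determines a hypercone whose center line is the line from the origin to the query object, and searches within it. Experimental results show that BA-KNNS performs better than other k-nearest-neighbor search algorithms.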

10.

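RAKDB-Tree: A Multidimensional Index Structure Based on Approximate Regions

黄维辉 熊翱 《软件》2013,(11):77-79

Processing multidimensional data has become a key factor in the development of many fields, and similarity queries over multidimensional data in particular are now widely used. When the dimensionality is high, the performance of most index structures degrades, a phenomenon known as the "curse of dimensionality". To address it, this paper proposes the RAKDB-Tree, an index structure for processing multidimensional data efficiently. The structure first partitions the data space into subspaces and then indexes the subspaces with an improved KDB-Tree. The query, insertion, and deletion algorithms of the RAKDB-Tree keep the index in a near-optimal state. Experimental results show that the RAKDB-Tree copes well with the problems caused by increasing dimensionality.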

11.

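A Hybrid Index for High-Dimensional Data Based on Query Sampling

张军旗 周向东 施伯乐 《软件学报》2008,19(8):2054-2065

Improving query efficiency in high-dimensional databases usually requires choosing an index strategy suited to the data distribution. Classical distribution models, however, struggle to estimate the complex distributions of high-dimensional data such as images and video in real applications. This paper proposes a method that estimates the data distribution by query sampling and, on that basis, a hybrid index supporting nearest-neighbor queries: given the non-uniform distribution of multimedia data, it adaptively applies different index structures to differently distributed data within one unified index. The hybrid index is built constructively. First, the data are partitioned by clustering and decomposition and a tree index is built. Next, a query sampling algorithm estimates the actual data distribution. Finally, based on the distribution, sparse data are pruned from the tree index and indexed with a sequential-scan strategy, while densely distributed data remain in the tree index. Extensive experiments on four real image data sets show that the method clearly outperforms metric-space indexes such as iDistance and M-Tree, and that its query efficiency remains higher than sequential scan at 336 dimensions. The experiments also show that the query sampling algorithm obtains a distribution estimate adequate for indexing with a sample of only N^(1/2) items, where N is the data size.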

12.

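BC-iDistance: An Optimized High-Dimensional Index Based on Bit Codes (Cited by: 1; self-citations: 0; citations by others: 1)

梁俊杰 冯玉才 《小型微型计算机系统》2007,28(9):1647-1651

In KNN query algorithms for high-dimensional spaces, approximation vectors and one-dimensional transformations can both effectively overcome the curse of dimensionality. Combining the two ideas, this paper proposes BC-iDistance, an optimized high-dimensional index structure based on bit codes. iDistance loses a large amount of information when it maps high-dimensional data to one dimension. To address this weakness, BC-iDistance not only uses a one-dimensional distance to represent how far a point is from its reference point but also introduces a bit code that approximates their relative position, compressing each high-dimensional vector into a two-dimensional one. Organized in a special B+-tree, the index supports two levels of pruning during KNN retrieval, reducing I/O and distance-computation costs. Experiments on synthetic and real data confirm that the optimized index achieves higher retrieval efficiency.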

13.

Querying high-dimensional data in single-dimensional space (Cited by: 1; self-citations: 0; citations by others: 1)

Cui Yu Stéphane Bressan Beng Chin Ooi Kian-Lee Tan 《The VLDB Journal The International Journal on Very Large Data Bases》2004,13(2):105-119

In this paper, we propose a new tunable index scheme, called iMinMax(θ), that maps points in high-dimensional spaces to single-dimensional values determined by their maximum or minimum values among all dimensions. By varying the tuning knob, θ, we can obtain different families of iMinMax structures that are optimized for different distributions of data sets. The transformed data can then be indexed using existing single-dimensional indexing structures such as the B⁺-tree. Queries in the high-dimensional space have to be transformed into queries in the single-dimensional space and evaluated there. We present efficient algorithms for evaluating window queries as range queries on the single-dimensional space. We conducted an extensive performance study to evaluate the effectiveness of the proposed schemes. Our results show that iMinMax(θ) outperforms existing techniques, including the Pyramid scheme and VA-file, by a wide margin. We then describe how iMinMax could be used in approximate K-nearest neighbor (KNN) search, and we present a comparative study against the recently proposed iDistance, a specialized KNN indexing method. Received: 21 May 2000, Revised: 14 March 2002, Published online: 8 April 2004. Edited by: M. Kitsuregawa.
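The mapping from a point to a single-dimensional key can be sketched as follows. The exact rule is not spelled out in the abstract, so the comparison used here (the form in which the scheme is commonly described, for points in the unit hypercube and zero-based dimension numbers) should be read as an assumption, and `iminmax_key` is a hypothetical name.

```python
def iminmax_key(point, theta=0.0):
    """Map a point in [0, 1]^d to one value using its min or max coordinate.

    Assumed rule: the point is keyed on its minimum edge when
    x_min + theta < 1 - x_max, and on its maximum edge otherwise.
    The integer part of the key is the dimension, the fraction is the value.
    """
    x_min = min(point)
    x_max = max(point)
    d_min = point.index(x_min)  # zero-based dimension holding the minimum
    d_max = point.index(x_max)  # zero-based dimension holding the maximum
    if x_min + theta < 1.0 - x_max:
        return d_min + x_min
    return d_max + x_max


# Hypothetical example point; changing theta moves it to a different key range.
key_neutral = iminmax_key((0.2, 0.5, 0.6), theta=0.0)
key_biased = iminmax_key((0.2, 0.5, 0.6), theta=0.3)
```

With `theta=0.0` the example point is keyed on dimension 0 (its minimum); raising `theta` to 0.3 tips the comparison and keys it on dimension 2 (its maximum), which is how the knob shifts points between the two families of keys.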

14.

Composite Distance Transformation for Indexing and k-Nearest-Neighbor Searching in High-Dimensional Spaces (Cited by: 1; self-citations: 0; citations by others: 1)

庄毅 庄越挺 吴飞 《计算机科学技术学报》2007,22(2)

Due to the famous dimensionality curse problem, search in a high-dimensional space is considered a "hard" problem. In this paper, a novel composite distance transformation method, called CDT, is proposed to support fast k-nearest-neighbor (k-NN) search in high-dimensional spaces. In CDT, all n data points are first grouped into clusters by a k-Means clustering algorithm. Then a composite distance key is computed for each data point. Finally, the index keys of the n data points are inserted into a partition-based B⁺-tree. Thus, given a query point, its k-NN search in high-dimensional spaces is transformed into a search in a single-dimensional space with the aid of the CDT index. Extensive performance studies are conducted to evaluate the effectiveness and efficiency of the proposed scheme. Our results show that this method outperforms state-of-the-art high-dimensional search techniques such as the X-Tree, VA-file, iDistance and NB-Tree.
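The cluster, key, and one-dimensional search pipeline described above can be sketched in a few lines. The abstract does not define the composite key, so this sketch substitutes the simpler iDistance-style key (cluster number times a constant, plus distance to the cluster centroid). It also assumes the centroids are already known and uses a sorted list with `bisect` as a stand-in for the B⁺-tree. All names and the constant `C` are hypothetical.

```python
import bisect
import math

C = 1000.0  # assumed to exceed any centroid distance plus any query radius


def build_index(points, centroids):
    """Assign each point to its nearest centroid and sort by the 1-D key."""
    entries = []
    for p in points:
        cid = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        entries.append((cid * C + math.dist(p, centroids[cid]), p))
    entries.sort()
    return entries


def range_query(entries, centroids, q, r):
    """Find all points within distance r of q using one key interval per cluster."""
    keys = [k for k, _ in entries]
    result = []
    for cid, c in enumerate(centroids):
        d = math.dist(q, c)
        # Triangle inequality: any match x satisfies d - r <= dist(x, c) <= d + r.
        lo = bisect.bisect_left(keys, cid * C + max(0.0, d - r))
        hi = bisect.bisect_right(keys, cid * C + d + r)
        result.extend(p for _, p in entries[lo:hi] if math.dist(q, p) <= r)
    return result


# Hypothetical toy data with two clusters.
centroids = [(0.0, 0.0), (10.0, 10.0)]
points = [(1.0, 0.0), (0.0, 2.0), (9.0, 10.0), (10.0, 12.0), (5.0, 5.0)]
index = build_index(points, centroids)
hits = range_query(index, centroids, (1.0, 1.0), 1.5)
```

A k-NN search can be layered on top by repeating the range query with a growing radius until k results are found, which is the usual way such one-dimensional transformations are queried.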

15.

An encoding-based dual distance tree high-dimensional index

Yi Zhuang YueTing Zhuang Fei Wu 《中国科学F辑(英文版)》2008,51(10):1401-1414

The paper proposes a novel symmetrical encoding-based index structure, called the EDD-tree (for encoding-based dual distance tree), to support fast k-nearest neighbor (k-NN) search in high-dimensional spaces. In the EDD-tree, all data points are first grouped into clusters by a k-means clustering algorithm. Then the uniform ID number of each data point is obtained by a dual-distance-driven encoding scheme, in which each cluster sphere is partitioned twice according to the dual distances of start- and centroid-distance. Finally, the uniform ID number and the centroid-distance of each data point are combined to get a uniform index key, which is then indexed through a partition-based B⁺-tree. Thus, given a query point, its k-NN search in high-dimensional spaces can be transformed into a search in a single-dimensional space with the aid of the EDD-tree index. Extensive performance studies are conducted to evaluate the effectiveness and efficiency of the proposed scheme, and the results demonstrate that this method outperforms state-of-the-art high-dimensional search techniques such as the X-tree, VA-file, iDistance and NB-tree, especially when the query radius is not very large.

16.

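RR_tree: A New Method for Implementing Multidimensional Indexes in the Relational Model

于利胜 张倩 王珊 张延松 《计算机科学与探索》2010,4(3):193-201

To manage multimedia information, geographic information, and spatial data effectively, many index methods for multidimensional data have been proposed. Some have been implemented in existing commercial database management systems (DBMS), but academic research and practical applications need support for a wider range of multidimensional and even high-dimensional index methods. Earlier work proposed using storage structures, stored procedures, and triggers in a relational database to emulate the multidimensional indexing of the X_tree. This paper improves on that work: it redesigns the schema, adds key indexes, introduces clustered storage, and implements creation, insertion, and query operations for a multidimensional index in the relational model. It also compares insertion and query performance against Oracle Spatial, the multidimensional index of an existing commercial database system. The experimental results demonstrate that implementing multidimensional indexes in the relational model is feasible and usable.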

17.

A hyperplane based indexing technique for high-dimensional data (Cited by: 1; self-citations: 0; citations by others: 1)

Guoren Wang Xiangmin Zhou Bin Wang Baiyou Qiao Donghong Han 《Information Sciences》2007,177(11):2255-2268

In this paper, we propose a novel hyperplane based indexing method to support efficient processing of similarity search queries in high-dimensional spaces. The main idea of the proposed index is to improve data partitioning efficiency in a high-dimensional space by using a hyperplane, which further partitions a subspace and can also take advantage of the twin node concept used in the key dimension based index. Compared with the key dimension concept, the hyperplane is more effective in data filtering. High space utilization is achieved by dynamically performing data reallocation between twin nodes. In addition, a post-processing step is used after index building to ensure effective filtration. Extensive experiments based on two types of real data sets are conducted and the results illustrate a significantly improved filtering efficiency. Because of the nature of the hyperplane, the proposed indexing method is suitable only for Euclidean spaces.

18.

CVA file: an index structure for high-dimensional datasets

Jiyuan An Hanxiong Chen Kazutaka Furuse Nobuo Ohbo 《Knowledge and Information Systems》2005,7(3):337-357

Similarity search is important in information-retrieval applications where objects are usually represented as vectors of high dimensionality. This paper proposes a new dimensionality-reduction technique and an indexing mechanism for high-dimensional datasets. The proposed technique reduces the dimensions for which coordinates are less than a critical value with respect to each data vector. This flexible datawise dimensionality reduction contributes to improving indexing mechanisms for high-dimensional datasets that are in skewed distributions in all coordinates. To apply the proposed technique to information retrieval, a CVA file (compact VA file), which is a revised version of the VA file is developed. By using a CVA file, the size of index files is reduced further, while the tightness of the index bounds is held maximally. The effectiveness is confirmed by synthetic and real data.
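For readers unfamiliar with the VA file that the CVA file revises, the sketch below shows the basic idea of index bounds: each coordinate is quantized to a few bits, and the cell boundaries yield a lower bound on the distance to the original vector. It covers only the plain VA-file approximation, not the datawise dimension dropping of the CVA file, and it assumes data in the unit hypercube with a uniform grid. The function names are hypothetical.

```python
import math


def approximate(point, bits):
    """Quantize each coordinate in [0, 1] to a cell number using `bits` bits."""
    cells = 1 << bits
    return tuple(min(int(x * cells), cells - 1) for x in point)


def lower_bound(query, approx, bits):
    """Smallest possible Euclidean distance from query to any point in the cell."""
    cells = 1 << bits
    total = 0.0
    for q, cell in zip(query, approx):
        lo, hi = cell / cells, (cell + 1) / cells  # cell boundaries in this dimension
        if q < lo:
            gap = lo - q
        elif q > hi:
            gap = q - hi
        else:
            gap = 0.0  # the query falls inside the cell along this dimension
        total += gap * gap
    return math.sqrt(total)


# Hypothetical example: 2 bits per dimension.
point = (0.1, 0.9)
query = (0.6, 0.9)
approx = approximate(point, bits=2)
bound = lower_bound(query, approx, bits=2)
exact = math.dist(query, point)
```

A search scans the compact approximations first and fetches the full vector only when the lower bound is small enough to matter, so tighter bounds mean fewer fetches.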

19.

Composite Distance Transformation for Indexing and k-Nearest-Neighbor Searching in High-Dimensional Spaces

Yi Zhuang Yue-Ting Zhuang and Fei Wu 《计算机科学技术学报》2007,22(2):208-217

Due to the famous dimensionality curse problem, search in a high-dimensional space is considered a "hard" problem. In this paper, a novel composite distance transformation method, called CDT, is proposed to support fast k-nearest-neighbor (k-NN) search in high-dimensional spaces. In CDT, all n data points are first grouped into clusters by a k-Means clustering algorithm. Then a composite distance key is computed for each data point. Finally, the index keys of the n data points are inserted into a partition-based B⁺-tree. Thus, given a query point, its k-NN search in high-dimensional spaces is transformed into a search in a single-dimensional space with the aid of the CDT index. Extensive performance studies are conducted to evaluate the effectiveness and efficiency of the proposed scheme. Our results show that this method outperforms state-of-the-art high-dimensional search techniques such as the X-Tree, VA-file, iDistance and NB-Tree.

20.

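A KNN Text Classification Algorithm Based on Vector Projection (Cited by: 2; self-citations: 0; citations by others: 2)

卜凡军 钱雪忠 《计算机工程与设计》2009,30(21)

To address the long classification time of the KNN algorithm, this paper analyzes ways to improve classification efficiency. Building on KNN and combining vector projection theory with the iDistance index structure, it proposes an improved KNN algorithm, PKNN. By comparing the one-dimensional projection distances between the sample to be classified and the training samples, the algorithm obtains the most likely neighboring samples and reduces the number of training samples involved in the computation, so the cost of each classification drops. Experimental results show that PKNN clearly improves the efficiency of KNN, and its principle makes it suitable for large-scale, high-dimensional text classification.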
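The projection step can be illustrated with a rough sketch. It is not the PKNN algorithm as published: the iDistance index is replaced by a plain sort, and the projection axis, the shortlist size `candidates`, and all names are assumptions. Training samples are ranked by how close their one-dimensional projection is to the query's projection, and exact distances are computed only for that shortlist.

```python
import math
from collections import Counter


def project(vector, axis):
    """Scalar projection of a vector onto a unit-length axis."""
    return sum(a * b for a, b in zip(vector, axis))


def pknn_classify(query, samples, labels, axis, k=3, candidates=4):
    """Classify query by majority vote among k neighbors from a projected shortlist."""
    q_proj = project(query, axis)
    # Rank training samples by 1-D projection distance and keep a shortlist.
    order = sorted(range(len(samples)),
                   key=lambda i: abs(project(samples[i], axis) - q_proj))
    shortlist = order[:candidates]
    # Exact distances are computed only for the shortlist.
    nearest = sorted(shortlist, key=lambda i: math.dist(query, samples[i]))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]


# Hypothetical toy training set with two classes.
samples = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.2), (0.9, 1.0), (1.0, 0.9), (0.8, 0.8)]
labels = ["a", "a", "a", "b", "b", "b"]
axis = (1 / math.sqrt(2), 1 / math.sqrt(2))
label_low = pknn_classify((0.15, 0.1), samples, labels, axis)
label_high = pknn_classify((0.9, 0.9), samples, labels, axis)
```

Because the projection distance never exceeds the true Euclidean distance, samples far away in projection are guaranteed to be far away in the full space; a fixed-size shortlist, as used here, trades that guarantee for speed and can miss true neighbors.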