首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 62 毫秒
1.
大数据时代的到来,快速而准确的索引算法对信息检索至关重要。针对基于随机投影构成的单表哈希检索方法导致搜索性能低的问题,提出一种基于主成分的多表图像哈希检索方法。为了得到高效的哈希编码保证不同语义样本特征的区分性,首先通过主元分析方法保留训练集具有区分性图像特征,此外利用特征聚类作为学习哈希投影的指引构建多个索引表;其次采用正交旋转矩阵对哈希投影进行优化,保证了相同语义的样本具有相似的哈希码。最后分别在CIFAR-10和Caltech-256数据集上与相关方法进行比较,实验结果表明提出的方法提高了检索性能。  相似文献   

2.
张鸿骏  武延军  张珩  张立波 《软件学报》2020,31(10):3038-3055
散列表(hash table)作为一类根据关键码值(key value)提供高效数据访问的数据索引结构,其广泛应用于各类计算机应用中,尤其是在对性能要求极高的系统软件、数据库以及高性能计算领域.在网络、云计算和物联网服务方面,以散列表为核心结构已经成为缓存系统的重要系统组件.然而,随着大规模数据量的大幅度增加,以多核CPU为核心设计散列表结构的系统已经逐渐出现性能瓶颈,亟需进一步改进散列表的高性能和可扩展性.随着通用图形处理器(graphic processing unit,简称GPU)的日益普及以及硬件计算能力和并发性能的大幅度提升,各类以并行计算为核心的系统软件任务在GPU上进行了优化设计并得到可观的性能提升.由于存在稀疏性和随机性,采用现有散列表的并行结构直接在GPU上应用势必会带来高频次的内存访问和频繁的总线数据传输,影响了散列表在GPU上的性能发挥.重点分析了缓存系统中散列表索引的内存访问、命中率与索引开销,提出并设计了一种适应GPU的混合访问缓存索引框架CCHT(cache cuckoo hash table),提供了两种适应不同命中率和索引开销要求的缓存策略,允许写入与查询操作并发执行,最大程度地利用了GPU硬件的计算性能与并发特性,减少了内存访问与总线传输.通过在GPU硬件上的实现与实验验证,CCHT在保证缓存命中率的同时,性能优于其他用于缓存索引的散列表.  相似文献   

3.
Hash tables, as a type of data indexing structure that provides efficient data access based on key values, are widely used in various computer applications, especially in system software, databases, and high-performance computing field that requires extremely high performance. In network, cloud computing and IoT services, hash tables have become the core system components of cache systems. However, with the large-scale increase in the amount of large-scale data, performance bottlenecks have gradually emerged in systems designed with a multi-core CPU as the core of the hash table structure. There is an urgent need to further improve the high performance and scalability of the hash tables. With the increasing popularity of general-purpose Graphic Processing Units (GPUs) and the substantial improvement of hardware computing capabilities and concurrency performance, various types of system software tasks with parallel computing as the core have been optimized on the GPU and have achieved considerable performance promotion. Due to the sparseness and randomness, using the existing parallel structure of the hash tables directly on the GPUs will inevitably bring high-frequency memory access and frequent bus data transmission, which affects the performance of the hash tables on the GPUs. This study focuses on the analysis of memory access, hit ratio, and index overhead of hash table indexes in the cache system. A hybrid access cache indexing framework CCHT (Cache Cuckoo Hash Table) adapted to GPU is proposed and provided. The cache strategy suitable to different requirements of hit ratios and index overheads allows concurrent execution of write and query operations, maximizing the use of the computing performance and concurrency characteristics of GPU hardware, reducing memory access and bus transferring overhead. Through GPU hardware implementation and experimental verification, CCHT has better performance than other cache indexing hash tables while ensuring cache hit ratios.  相似文献   

4.
The perceptual video hash function defines a feature vector that characterizes a video depending on its perceptual contents. This function must be robust to the content preserving manipulations and sensitive to the content changing manipulations. In the literature, the subspace projection techniques such as the reduced rank PARAllel FACtor analysis (PARAFAC), have been successfully applied to extract perceptual hash for the videos. We propose a robust perceptual video hash function based on Tucker decomposition, a multi-linear subspace projection method. We also propose a method to find the optimum number of components in the factor matrices of the Tucker decomposition. The Receiver Operating Characteristics (ROC) curves are used to evaluate the performance of the proposed algorithm compared to the other state-of-the-art projection techniques. The proposed algorithm shows superior performance for most of the image processing attacks. An application for indexing and retrieval of near-identical videos is developed using the proposed algorithm and the performance is evaluated using average recall/precision curves. The experimental results show that the proposed algorithm is suitable for indexing and retrieval of near-identical videos.  相似文献   

5.
现有的指纹索引方法大多是基于实数值特征向量,当应用于大规模指纹库时无法避免计算资源与存储空间消耗巨大的问题。为了在海量指纹库中进行高效快速检索并得到实时响应结果,提出了一种全新的基于有监督深度哈希的指纹索引方法。将传统指纹领域知识与自注意力深度哈希模型相结合。传统领域知识用于指纹图像预处理来获取指纹二值骨架图,自注意力深度哈希模型进行特征提取与哈希映射得到二进制编码。其中特征提取模块使用Transformer结构替换卷积神经网络来提取指纹细节特征,此外模型中加入了自动对齐模块并设计了一种STN-AE的结构来辅助训练该模块。最后在NIST4、NIST14、FVC2000、FVC2002、FVC2004等公开指纹数据集上进行了实验,实验结果证实该方法在提高海量指纹库中的检索速度以及降低存储消耗等方面是卓有成效的。  相似文献   

6.
以往的研究大多针对文件系统,而DBMS存在更多细粒度的更新.本文综合考虑闪存自身的特点、设备种类繁多及不同闪存设备读写特性差别大等,提出了一种基于闪存的DBMS索引结构:LD_B+树.LD_B+树根据工作负载的读写特性动态地调节索引模式使之能够适应于不同种类的闪存设备.LD_B+树采用日志结构组织结点,通过结点转换表和日志缓冲区维护索引结构.模拟实验结果表明,不同闪存设备及工作负载下,LD_B+索引结构比B+树和日志型B+树(BFTL)具有6%-63%的性能提高.  相似文献   

7.
基于最小生成树的图数据库索引算法   总被引:1,自引:0,他引:1  
李楠  高宏  李建中 《软件学报》2009,20(Z1):144-153
对复杂数据进行图模式建模近几年越来越流行,因此,在查询执行的优化过程中图索引技术变得至关重要.研究了图模式的索引问题,并且提出了一种近似的索引方法,称为MSTA方法.MSTA方法利用最小生成树结构作为索引特征,依据最小生成树边序列的包含关系和基于最大公共子图的图距离度量,将最小生成树组织到一个称为MST树的索引结构中.MST树索引结构可以高效地支持多种查询,例如子图查询.MSTA方法具备高效的索引性能.在索引大小和索引建立时间方面,传统方法是MSTA方法的数十倍,甚至上百倍.MSTA方法虽然不能返回完整结果,但是可以返回经图距离度量排序最好的部分结果.  相似文献   

8.
多维向量动态索引结构研究   总被引:4,自引:0,他引:4  
多维向量的索引技术是多媒体数据库系统中的关键技术之一.集中研究基于向量空间模型的动态索引结构,以解决在图像数据库系统中按内容快速检索图像的对象问题.在分析研究R-Tree和R*-Tree的基础上,提出了ER-Tree动态索引结构.该索引树用超球体划分多维向量空间,以有利于计算最近邻;吸取R*-Tree树的重插技术,以增强索引树对数据集整体特征的表达能力,从而提高检索效率;通过引入插入安全点和删除安全点概念,有效地提高建树的效率.同时,给出了基于该结构的特征向量插入算法.实验结果表明,所提出的索引结构建树的  相似文献   

9.
在此提出了一种基于速度分布的HR树索引结构,首先在速度域中对移动对象集进行规则划分,根据速度标量大小将移动对象划分到不同的速度树中,每棵速度树中移动对象具有相近的速度;对每棵速度树中的移动对象,则利用时间间隔进行划分。HR树索引增加了两个分别建于叶节点和根节点之上的Hash辅助索引结构,并基于HR树提出了反向最近邻查询算法,具有很好的动态更新性能和并发性。实验结果与分析表明,基于HR树索引的反向最近邻查询算法具有良好的更新及查询性能,优于通用的TPR树索引。  相似文献   

10.
In wireless mobile computing environments, broadcasting is an effective and scalable technique to disseminate information to a massive number of clients, wherein the energy usage and latency are considered major concerns. This paper presents an indexing scheme for the energy- and latency-efficient processing of full-text searches over the wireless broadcast data stream. Although a lot of access methods and index structures have been proposed in the past for full-text searches, all of them are targeted for data in disk storage, not wireless broadcast channels. For full-text searches on a wireless broadcast stream, we firstly introduce a naive, inverted list-style indexing method, where inverted lists are placed in front of the data on the wireless channel. In order to reduce the latency overhead, we propose a two-level indexing method which adds another level of index structure to the basic inverted list-style index. In addition, we propose a replication strategy of the index list and index tree to further improve the latency performance. We analyze the performance of the proposed indexing scheme with respect to the latency and energy usage measures, and show the optimality of index replication. The correctness of the analysis is demonstrated through simulation experiments, and the effectiveness of the proposed scheme is shown by implementing a real wireless information delivery system.  相似文献   

11.
近年来,深度有监督哈希检索方法已成功应用于众多图像检索系统中。但现有方法仍然存在一些不足:一是大部分深度哈希学习方法都采用对称策略来训练网络,但该策略训练通常比较耗时,难以用于大规模哈希学习过程;二是哈希学习过程中存在离散优化问题,现有方法将该问题进行松弛,但难以保证得到最优解。为解决上述问题,提出了一种贪心非对称深度有监督哈希图像检索方法,该方法将贪心算法和非对称策略的优势充分结合,进一步提高了哈希检索性能。在两个常用数据集上与17种先进方法进行比较。在CIFAR-10数据集上48 bit条件下,与性能最好的方法相比mAP提高1.3%;在NUS-WIDE数据集上所有bit下,mAP平均提高2.3%。在两个数据集上的实验结果表明,该方法可以进一步提高哈希检索性能。  相似文献   

12.
The study of indexing techniques on object oriented databases   总被引:1,自引:0,他引:1  
An object-oriented database (OODB) has been becoming more important in recent years. It can deal with a large amount of complex objects and relationships that relational database (RDB) systems cannot handle well. However, the retrieval and update performance of an OODB depends on indexing techniques. In this paper, we study the indexing techniques on OODBs, based on an inheritance hierarchy and an aggregation hierarchy. Given the access probability and the size of each class, we propose a cost function to evaluate the gain of building an index on an inheritance hierarchy. For an aggregation hierarchy, we use a path-catenation technique to evaluate how to build index files on classes. Through some experiments, we found our methods have better retrieval performance than most ones proposed before.  相似文献   

13.
一种基于固定网络的移动对象运动轨迹索引模型   总被引:2,自引:0,他引:2  
实际应用中移动对象通常运动在城市固定道路上,针对此特征研究人员已提出一些相关索引模型,但都存在一定的局限性,表现为索引模型只管理对象的历史位置信息或实时位置信息以及只对窗口查询或轨迹查询进行优化.IMTFN是一种基于固定网络的移动对象运动轨迹索引模型,管理移动对象的实时位置信息和历史轨迹信息,并且有效优化窗口查询及轨迹查询操作.IMTFN由一个管理固定网络的2D R^*-Tree、一组管理移动对象运动轨迹的1D R^*-Tree以及记录移动对象实时位置信息的Hash结构组成.最后通过实验IMTFN分别与STR-Tree与FNR-Tree进行性能比较,证明IMTFN模型提供速度更快的查询操作.  相似文献   

14.
Content-based indexing of multimedia databases   总被引:1,自引:0,他引:1  
Content-based retrieval of multimedia database calls for content-based indexing techniques. Different from conventional databases, where data items are represented by a set of attributes of elementary data types, multimedia objects in multimedia databases are represented by a collection of features; similarity of object contents depends on context and frame of reference; and features of objects are characterized by multimodal feature measures. These lead to great challenges for content-based indexing. On the other hand, there are special requirements on content-based indexing: to support visual browsing, similarity retrieval, and fuzzy retrieval, nodes of the index should represent certain meaningful categories. That is to say that certain semantics must be added when performing indexing. ContIndex, the context-based indexing technique presented in this paper, is proposed to meet these challenges and special requirements. The indexing tree is formally defined by adapting a classification-tree concept. Horizontal links among nodes in the same level enhance the flexibility of the index. A special neural-network model, called Learning based on Experiences and Perspectives (FEP), has been developed to create node categories by fusing multimodal feature measures. It brings into the index the capability of self-organizing nodes with respect to certain context and frames of reference. An icon image is generated for each intermediate node to facilitate visual browsing. Algorithms have been developed to support multimedia object archival and retrieval using Contlndex  相似文献   

15.
随着多媒体技术的发展,许多领域产生大量的高维数据集。为了有效地检索这些高维数据,高维索引成为人们研究的热点。聚类树是一种有效地支持高维数据检索的索引结构。提出了一种基于子空间聚类的聚类树结构,该索引结构基于一种改进的CLIQUE聚类算法,利用小波变换的多尺度特性对图像特征分布曲线进行不同尺度的小波变换,去除一些小的分类和可能的噪声干扰,从而得到不同粒度下的层次聚类。在层次聚类的基础上,建立起分层索引结构。由于改进的聚类算法使用爬山法确定子空间聚类,因而有效地避免了用户参数的定义。实验结果证明,该方法在不需要用户设定聚类参数下能够进行有效聚类,在不同尺度下构建的聚类结构能够有效地组织图像关系,大大提高图像的检索效率。  相似文献   

16.
Persistent indexing structures are proposed in response to emerging non-volatile memory(NVM)to provide high performance yet durable indexes.However,due to the lack of real NVM hardware,many prior persistent indexing structures were evaluated via emulation,which varies a lot across different setups and differs from the real deployment.Recently,Intel has released its Optane DC Persistent Memory Module(PMM),which is the first production-ready NVM.In this paper,we revisit popular persistent indexing structures on PMM and conduct comprehensive evaluations to study the performance differences among persistent indexing structures,including persistent hash tables and persistent trees.According to the evaluation results,we find that Cacheline-Conscious Extendible Hashing(CCEH)achieves the best performance among all evaluated persistent hash tables,and Failure-Atomic ShifT B+-Tree(FAST)and Write Optimal Radix Tree(WORT)perform better than other trees.Besides,we find that the insertion performance of hash tables is heavily influenced by data locality,while the insertion latency of trees is dominated by the flush instructions.We also uncover that no existing emulation methods accurately simulate PMM for all the studied data structures.Finally,we provide three suggestions on how to fully utilize PMM for better performance,including using clflushopt/clwb with sfence instead of clflush,flushing continuous data in a batch,and avoiding data access immediately after it is flushed to PMM.  相似文献   

17.
Recent advances in digital video compression and networks have made video more accessible than ever. However, the existing content-based video retrieval systems still suffer from the following problems. 1) Semantics-sensitive video classification problem because of the semantic gap between low-level visual features and high-level semantic visual concepts; 2) Integrated video access problem because of the lack of efficient video database indexing, automatic video annotation, and concept-oriented summary organization techniques. In this paper, we have proposed a novel framework, called ClassView, to make some advances toward more efficient video database indexing and access. 1) A hierarchical semantics-sensitive video classifier is proposed to shorten the semantic gap. The hierarchical tree structure of the semantics-sensitive video classifier is derived from the domain-dependent concept hierarchy of video contents in a database. Relevance analysis is used for selecting the discriminating visual features with suitable importances. The Expectation-Maximization (EM) algorithm is also used to determine the classification rule for each visual concept node in the classifier. 2) A hierarchical video database indexing and summary presentation technique is proposed to support more effective video access over a large-scale database. The hierarchical tree structure of our video database indexing scheme is determined by the domain-dependent concept hierarchy which is also used for video classification. The presentation of visual summary is also integrated with the inherent hierarchical video database indexing tree structure. Integrating video access with efficient database indexing tree structure has provided great opportunity for supporting more powerful video search engines.  相似文献   

18.
Fast similarity retrieval is critical for content-based image retrieval systems. Tree indexing is a classical technique for fast retrieval, but the practical performance increase offered by the indexing tree depends on the intrinsic dimension of the data. Data with a low intrinsic dimension can be indexed more efficiently than data with high intrinsic dimension. This suggests that an indexing tree that is adapted to the data distribution may be more efficient. This paper proposes two adaptation procedures that are guaranteed to improve indexing efficiency. The procedures are based on a formula for average number of node tests incurred during the retrieval. The formula clearly shows how indexing performance varies with the distribution of feature points and the query. Greedy and optimal tree adaptation procedures are derived based on the formula. Both procedures explicitly enhance the retrieval performance of indexing trees. The optimally adapted tree carries the mathematical guarantee that it is the best performing tree in a set of possible trees obtained by node elimination. The adaptation procedures are applied to kdb-trees and hierarchical clustering trees for indexing synthetic as well as real data sets in medical image databases. Experimental results validate the claim that adaptation procedures increase retrieval efficiency.  相似文献   

19.
在时空数据库中,频繁更新会导致TPR树更新与查询性能下降。针对该问题,提出MAH—TPR索引方法,分别对预处理过程、索引结构及更新算法进行优化。在构建索引及更新操作时,通过使用空间聚类来减少节点间空间区域的交叠几率。引入基于磁盘的Hash辅助存储结构,在直接访问叶节点的基础上进一步减少磁盘I/O的操作。引入基于内存的移动对象辅助存储结构,用于存储发出频繁更新请求,以避免主索引结构节点的合并和分裂。实验结果表明,MAH—TPR索引方法的查询性能优于HTPR方法和LGU方法,更新性能优于HTPR索引方法。  相似文献   

20.
Storing and querying high-dimensional data are important problems in designing an information retrieval system. Two crucial issues, time and space efficiencies, must be considered when evaluating the performance of such a system. The KDB-tree and its variants have been reported to have good performance by using them as the index structure for retrieving multidimensional data. However, they all suffer from low storage utilization problem caused by imperfect “splitting policies.” Unnecessary splits increase the size of the index structure and deteriorate the performance of the system. In this paper, a new data insertion algorithm with a better splitting policy was proposed, which arranges data entries in the leaf nodes as many as possible. Our new index scheme can increase the storage utilization up to nearly 100% and reduce the index size to a smaller scale. As a result, both time and space efficiencies are significantly improved. Analytical and experimental results show that our indexing method outperforms the traditional KDB-tree and its variants.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号