Similar Documents
20 similar documents found (search time: 206 ms)
1.
To address the long generation time, high memory consumption, and low classification accuracy of the traditional bag-of-visual-words model on large-scale datasets, a bag-of-visual-words model based on Supervised Hashing with Kernels (KSH) is proposed. First, SIFT feature points are extracted from the images to build a feature-point sample set. Then, a KSH function is learned to map nearby feature points to identical hash codes; each hash code represents a cluster center, and together they form the visual dictionary. Finally, using the generated dictionary, each image is represented as a histogram vector and applied to image classification. Experimental results on standard datasets show that the visual dictionary generated by this model is highly discriminative and effectively improves both the accuracy and the efficiency of image classification.
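The pipeline in this abstract, hashing descriptors into visual-word ids and pooling them into a histogram, can be sketched as follows. This is a minimal illustration: random projections stand in for the learned KSH functions, and all names and sizes are hypothetical.

```python
import numpy as np

def hash_visual_words(descriptors, projections):
    """Map descriptors to integer visual-word ids via binary hashing.
    Each descriptor is projected and thresholded to a k-bit code, and the
    code read as an integer serves as the visual-word index."""
    bits = (descriptors @ projections > 0).astype(np.int64)  # (n, k) bits
    weights = 2 ** np.arange(projections.shape[1])
    return bits @ weights  # one integer word id per descriptor

def bow_histogram(word_ids, vocab_size):
    """L1-normalized bag-of-words histogram as the image representation."""
    hist = np.bincount(word_ids, minlength=vocab_size).astype(float)
    return hist / max(hist.sum(), 1.0)

rng = np.random.default_rng(0)
desc = rng.normal(size=(200, 128))   # stand-in for SIFT descriptors
proj = rng.normal(size=(128, 8))     # 8-bit codes -> 256-word vocabulary
hist = bow_histogram(hash_visual_words(desc, proj), 256)
```

Because each hash code directly indexes a dictionary entry, no nearest-centroid search is needed at encoding time, which is the source of the claimed efficiency gain.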

2.
To address the problem that deep descriptors cannot capture the correlations among image features, a deep-hashing image representation method that fuses feature correlations is proposed; it incorporates the relationships among deep descriptors into the description of image content to improve retrieval performance. First, feature maps are generated by a pre-trained network, from which deep feature descriptors are extracted. These descriptors are then mapped to deep visual words, which are used for frequent-itemset discovery over the deep visual words. Next, the discrete-valued deep-visual-word representation and the hash-valued frequent-itemset representation are concatenated into the image representation. Finally, an optimization built on intra-class and inter-class similarity relationships yields the optimal threshold used to convert the representation into hash values. In experiments, the proposed method is compared with several strong image representation methods on image retrieval over the Holidays, Oxford, and Paris datasets to demonstrate its effectiveness.
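The frequent-itemset step over deep visual words can be illustrated with a minimal pair-counting sketch. The paper's mining procedure is not specified here; this toy version, with hypothetical data, only shows what "frequent itemsets of visual words across images" means.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Minimal frequent-itemset step: treat each image as a transaction of
    visual-word ids, count co-occurring word pairs, and keep pairs whose
    support (fraction of images containing both words) meets the threshold."""
    counts = Counter()
    for words in transactions:
        for pair in combinations(sorted(set(words)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair for pair, c in counts.items() if c / n >= min_support}

images_as_words = [[1, 2, 3], [1, 2, 4], [1, 2], [3, 4]]  # toy word ids
pairs = frequent_pairs(images_as_words, min_support=0.5)   # -> {(1, 2)}
```

Frequent pairs capture which visual words tend to appear together, which is the feature-correlation signal the abstract folds into the image representation.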

3.
To give image hashing better robustness and security, a new perceptual image hashing method in the ridgelet domain is proposed. The method first preprocesses the original image, then exploits human visual characteristics to extract coefficients in the ridgelet transform domain, and finally quantizes and compresses them into the final hash value. Experimental results show that the method withstands attacks such as JPEG compression, filtering, noise addition, cropping, and rotation, while still distinguishing malicious tampering. The generated hash values exhibit good robustness and security.

4.
汪海龙, 禹晶, 肖创柏. 《自动化学报》, 2021, 47(5): 1077-1086
Hash learning projects high-dimensional data into a low-dimensional binary space while preserving the semantic similarity between data points, reducing dimensionality for fast retrieval. Traditional supervised hashing algorithms take hand-crafted features as model input and generate hash codes through classification and quantization. Hand-crafted features lack adaptivity and are independent of the quantization process, which limits retrieval accuracy. This paper proposes a deep non-relaxed hashing algorithm based on pairwise similarity. At the output of a convolutional neural network, a differentiable soft-threshold function replaces the usual sign function so that the quasi-hash codes approach -1 or 1 non-linearly, and the network output is used directly to compute the training error; an $\ell_1$-norm term in the loss constrains each bit of the quasi-hash code toward a binary value. After training, a sign function applied outside the network quantizes the output into low-dimensional binary hash codes, and storage and retrieval are performed in this binary space. Experiments on public datasets show that the algorithm extracts image features effectively, generates accurate binary hash codes, and outperforms other algorithms in accuracy.
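The idea of a differentiable in-network surrogate plus an out-of-network sign quantizer can be sketched in a few lines. Note the hedge: `tanh` is used here as a stand-in for the paper's soft-threshold function, whose exact form is not given in the abstract.

```python
import numpy as np

def soft_quantize(z, beta=3.0):
    """Smooth, differentiable surrogate for sign(): pushes network outputs
    toward +/-1 so gradients can flow during training."""
    return np.tanh(beta * z)

def binarization_penalty(q):
    """L1-style penalty encouraging each quasi-hash bit to sit near -1 or +1,
    mirroring the l1-norm constraint described in the abstract."""
    return np.abs(np.abs(q) - 1.0).mean()

z = np.array([-2.0, -0.1, 0.05, 1.5])  # hypothetical network outputs
q = soft_quantize(z)                   # quasi-hash code used in the loss
codes = np.sign(q)                     # final binary codes, outside the net
```

Training uses `q` (continuous, differentiable), while storage and retrieval use `codes` (exactly binary), which is what "non-relaxed" quantization outside the network refers to.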

5.
To address the high training complexity and slow inference faced by mainstream visual question answering (VQA) systems that use region features as the image representation, a composite vision-linguistic convnet (CVlCN) is proposed to represent images in VQA. The method composes image features with question semantics into composite vision-language features, then computes attention distributions over these features along the spatial and channel dimensions to selectively retain the visual information relevant to the question semantics. Test results on the VQA-v2 dataset show a clear accuracy improvement on the VQA task, with an overall accuracy of 64.4%, at lower computational complexity and faster inference.

6.
Objective: Visual retrieval must accurately and efficiently find the most relevant visual content in large image or video datasets, but because such datasets contain huge numbers of images with high-dimensional features, existing methods struggle to deliver both fast retrieval and good accuracy. Method: For high-dimensional visual retrieval over image and video data, a weighted semantic locality-sensitive hashing (WSLSH) algorithm is proposed. The algorithm performs a two-stage partition of the reference feature space using a two-layer visual vocabulary, and within each subspace indexes features precisely with weighted semantic locality-sensitive hashing. It also uses dynamically variable-length hash codes to reduce the number of hash tables while maintaining retrieval performance. In addition, to counter the random instability of locality-sensitive hashing (LSH), statistics reflecting the semantics of the reference feature space are injected into the LSH functions, and a simple-projection semantic hash function is designed to keep retrieval performance stable. Results: Experiments on the Holidays, Oxford5k, and DataSetB datasets show that WSLSH achieves the shortest average retrieval time, 0.03425 s, on DataSetB; with 64-bit codes, its mean average precision (mAP) on the three datasets improves by 1.2%-32.6%, 1.7%-19.1%, and 2.6%-28.6% respectively, giving it an edge over several recent unsupervised hashing methods. Conclusion: LSH is improved through two-stage space partitioning, weighting of the hash-index counts of reference features, dynamically variable-length hash codes, and the proposed simple-projection semantic hash function. The resulting WSLSH algorithm retrieves faster than existing work and, with long codes, achieves superior performance.

7.
Multiple kernel learning (MKL) is widely used in visual semantic concept detection, but traditional MKL mostly combines kernels in a linear, stationary way and cannot accurately model complex data distributions. This paper applies Exact Euclidean Locality-Sensitive Hashing (E2LSH) to clustering and, combining the advantages of non-linear multiple-kernel combination, proposes a non-linear, non-stationary combination method, E2LSH-MKL. The method uses the Hadamard product to weight different kernel functions non-linearly, fully exploiting the information obtained from interactions between kernels. Meanwhile, a clustering algorithm based on the E2LSH hashing principle first hash-clusters the original image dataset into image subsets, then assigns each kernel a different weight according to its relative contribution on each subset, achieving non-stationary multi-kernel weighting that improves learner performance. Finally, E2LSH-MKL is applied to visual semantic concept detection. Experiments on the Caltech-256 and TRECVID 2005 datasets show that the new method outperforms several existing multiple kernel learning methods.
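The Hadamard-product idea, injecting kernel-interaction terms beyond a plain weighted sum, can be sketched as below. This is a simplified illustration, not the paper's exact E2LSH-MKL formulation; the weighting scheme here is hypothetical.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian RBF kernel matrix over the rows of X."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nonlinear_mkl(kernels, weights):
    """Non-stationary combination: the usual weighted sum plus Hadamard
    (element-wise) products of kernel pairs, which add interaction
    information between base kernels. By the Schur product theorem the
    Hadamard product of PSD kernels is PSD, so the result stays a valid
    kernel."""
    K = sum(w * k for w, k in zip(weights, kernels))
    for i in range(len(kernels)):
        for j in range(i + 1, len(kernels)):
            K = K + weights[i] * weights[j] * (kernels[i] * kernels[j])
    return K

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
K = nonlinear_mkl([rbf_kernel(X, 0.5), rbf_kernel(X, 2.0)], [0.6, 0.4])
```

The interaction terms are what a linear combination cannot express; E2LSH-MKL additionally makes the weights data-dependent per hash-cluster, which this sketch omits.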

8.
丁光耀, 徐辰, 钱卫宁, 周傲英. 《软件学报》, 2024, 35(3): 1207-1230
Thanks to its powerful learning ability, computer vision is widely applied in real-world scenarios. With the development of databases, using mature data management techniques to support visual analytics applications has become a growing research trend, and the fused processing of multimodal data such as images, video, and text has broadened the diversity and accuracy of visual analytics applications. In recent years, with the rise of deep learning, visual analytics applications backed by deep learning have attracted wide attention. However, traditional database management techniques face challenges in deep learning scenarios: complex visual analytics semantics are hard to express, and application execution is inefficient. Visual database management systems that support deep learning have therefore drawn broad interest. This paper surveys the state of research on visual database management systems. First, it summarizes the challenges such systems face at different layers, including programming interfaces, query optimization, execution scheduling, and data storage; second, it discusses the relevant techniques at each of these four layers; finally, it outlines future research directions for visual database management systems.

9.
刘冶, 潘炎, 夏榕楷, 刘荻, 印鉴. 《计算机科学》, 2016, 43(9): 39-46, 51
In the big-data era, applying image retrieval to large-scale data is an active research area. In recent years, image hashing algorithms have attracted wide attention in large-scale image retrieval systems because they improve retrieval efficiency while reducing storage. Existing supervised hashing algorithms have several problems. Mainstream supervised methods obtain hand-crafted image representations through feature extractors; the resulting feature loss hurts hashing performance and handles semantic similarity within image datasets poorly. With the rise of deep learning on large-scale data, some studies have learned supervised hash functions with deep neural networks and improved their quality, but such methods require manually designing complex networks for each dataset, which raises the difficulty of hash-function design, and training deep networks demands substantial data and time. These issues hinder the application of deep-learning-based hashing to large-scale datasets. To address them, a fast image hashing algorithm based on deep convolutional neural networks is proposed. By designing a solution method for the optimization problem and using a pre-trained large-scale deep network, the algorithm improves hashing quality while markedly shortening the training time of complex networks. Experiments on different image datasets show that, compared with existing baselines, the proposed algorithm improves substantially in both the quality and the training time of the learned hash functions.

10.
Visual storytelling is a cross-modal learning task derived from image captioning, with promising applications in automatic illustrated travelogue generation and early education. Current mainstream methods describe fine-grained image features weakly, produce stories with low image-text relevance, and use unvaried language. This paper therefore proposes a visual storytelling algorithm based on fine-grained visual features and knowledge graphs. To mine and expand the representation of image content, the algorithm designs two key modules covering the visual and high-level semantic aspects: a fine-grained visual feature generator and a generator of image semantic concept word sets. In these modules, fine-grained visual information is learned by graph convolution over scene-graph structures containing entity relations, while high-level semantic information is enriched by combining an external knowledge graph with the semantic associations of neighboring images, yielding a fairly comprehensive and detailed representation of the image sequence. The algorithm is evaluated against strong mainstream methods on VIST, currently the largest dataset in visual storytelling. Experimental results show that the generated stories lead by a clear margin on objective metrics such as Distinct-N and TTR in image-text relevance, story coherence, and lexical diversity, indicating good application prospects.

11.
Learning-based hashing methods are becoming the mainstream for large-scale visual search. They consist of two main components: hash-code learning for training data and hash-function learning for encoding new data points. The performance of a content-based image retrieval system crucially depends on the feature representation, and Convolutional Neural Networks (CNNs) have proved effective for extracting high-level visual features for large-scale image retrieval. In this paper, we propose a Multiple Hierarchical Deep Hashing (MHDH) approach for large-scale image retrieval. MHDH integrates multiple hierarchical non-linear transformations with a hidden neural network layer for hash-code generation. The learned binary codes represent potential concepts that connect to class labels. Extensive experiments on two popular datasets demonstrate the superiority of MHDH over both supervised and unsupervised hashing methods.

12.
Currently, research on content-based image copy detection mainly focuses on robust feature extraction. However, due to the exponential growth of online images, it is necessary to search among large-scale image collections, which is time-consuming and hard to scale, so the efficiency of detection deserves close attention. In this paper, we propose a fast feature aggregating method for image copy detection that uses machine-learning-based hashing to achieve fast feature aggregation. Since the learned hashing effectively preserves the neighborhood structure of the data, it yields visual words with strong discriminability. Furthermore, the generated binary codes make building the image representation low in complexity, keeping it efficient and scalable to large databases. Experimental results show the good performance of our approach.

13.
Efficient near-duplicate image detection is important for several applications in which feature extraction and matching must be performed online. Most image representations targeting conventional image retrieval problems are either computationally expensive to extract and match, or limited in robustness. Aiming at this problem, we propose an effective and efficient local-based representation method that encodes an image as a binary vector, called Local-based Binary Representation (LBR). Local regions are extracted densely from the image, and each region is converted to a simple and effective feature describing its texture. A statistical histogram is calculated over all the local features and then encoded into a binary vector as the holistic image representation. The proposed binary representation jointly utilizes the local region texture and the global visual distribution of the image, on which a similarity measure can be applied to detect near-duplicate images effectively. The binary encoding scheme not only greatly speeds up online computation but also reduces memory cost in real applications. In experiments, the precision, recall, and computational time of the proposed method are compared with other state-of-the-art image representations, and LBR shows clear advantages on online near-duplicate image detection and video keyframe detection tasks.
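The core encode-then-compare step of a binary holistic representation can be sketched as follows. This is a toy illustration of the idea, not LBR's actual encoding scheme; the median threshold and the toy histograms are assumptions.

```python
import numpy as np

def binarize_histogram(hist):
    """Encode a feature histogram as a binary vector: bins above the median
    become 1, others 0. (A stand-in for LBR's binary encoding step.)"""
    return (hist > np.median(hist)).astype(np.uint8)

def hamming_similarity(a, b):
    """Fraction of matching bits; high values flag near-duplicate
    candidates, and the comparison is cheap bitwise work."""
    return float((a == b).mean())

h1 = np.array([5, 0, 3, 1, 9, 2])   # histogram of image A
h2 = np.array([6, 0, 2, 1, 8, 2])   # slightly perturbed near-duplicate
b1, b2 = binarize_histogram(h1), binarize_histogram(h2)
```

Binary vectors trade a little precision for constant-time, low-memory comparisons, which is exactly the online speed/memory argument the abstract makes.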

14.
Learning-based hashing methods are becoming the mainstream for approximate scalable multimedia retrieval. They consist of two main components: hash-code learning for training data and hash-function learning for new data points. Tremendous efforts have been devoted to designing novel methods for these two components, i.e., supervised and unsupervised methods for learning hash codes, and different models for inferring hash functions. However, there is little work integrating supervised and unsupervised hash-code learning into a single framework. Moreover, the hash-function learning component is usually based on hand-crafted visual features extracted from the training images. The performance of a content-based image retrieval system crucially depends on the feature representation, and such hand-crafted visual features may degrade the accuracy of the hash functions. In this paper, we propose a semi-supervised deep learning hashing (DLH) method for fast multimedia retrieval. More specifically, in the first component, we utilize both visual and label information to learn an optimal similarity graph that can more precisely encode the relationship among training data, and then generate the hash codes based on the graph. In the second stage, we apply a deep convolutional network to simultaneously learn a good multimedia representation and a set of hash functions. Extensive experiments on five popular datasets demonstrate the superiority of our DLH over both supervised and unsupervised hashing methods.

15.
A visual simultaneous localization and mapping (SLAM) system usually contains a relocalization module to recover the camera pose after tracking failure. The core of this module is to establish correspondences between map points and key points in the image, which is typically achieved by local image feature matching. Since recently emerged binary features have orders of magnitudes higher extraction speed than traditional features such as scale invariant feature transform, they can be applied to develop a real-time relocalization module once an efficient method of binary feature matching is provided. In this paper, we propose such a method by indexing binary features with hashing. Being different from the popular locality sensitive hashing, the proposed method constructs the hash keys by an online learning process instead of pure randomness. Specifically, the hash keys are trained with the aim of attaining uniform hash buckets and high collision rates of matched feature pairs, which makes the method more efficient on approximate nearest neighbor search. By distributing the online learning into the simultaneous localization and mapping process, we successfully apply the method to SLAM relocalization. Experiments show that camera poses can be recovered in real time even when there are tens of thousands of landmarks in the map.
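The bucket-based matching this abstract describes can be sketched with a hash index keyed on a few descriptor bit positions. In the paper those key positions are learned online for uniform buckets and high match-collision rates; here fixed positions and random descriptors are used purely for illustration.

```python
import numpy as np
from collections import defaultdict

def build_index(descriptors, key_bits):
    """Bucket binary descriptors by their values at a few key bit positions."""
    index = defaultdict(list)
    for i, d in enumerate(descriptors):
        index[tuple(d[key_bits])].append(i)
    return index

def nearest_in_bucket(index, descriptors, q, key_bits):
    """Approximate NN search: scan only the bucket that shares the query's
    hash key, instead of all descriptors, then rank by Hamming distance."""
    bucket = index.get(tuple(q[key_bits]), [])
    if not bucket:
        return None
    dists = [int(np.count_nonzero(descriptors[i] != q)) for i in bucket]
    return bucket[int(np.argmin(dists))]

rng = np.random.default_rng(1)
desc = rng.integers(0, 2, size=(1000, 64), dtype=np.uint8)  # ORB-like bits
key_bits = np.array([3, 17, 30, 44, 58])                    # 5-bit hash key
idx = build_index(desc, key_bits)
match = nearest_in_bucket(idx, desc, desc[42], key_bits)
```

With k key bits, a query touches roughly a 1/2^k fraction of the database on average, which is what makes real-time relocalization over tens of thousands of landmarks plausible.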

16.
Efficient and compact representation of images is a fundamental problem in computer vision. In this paper, we propose methods that use Haar-like binary box functions to represent a single image or a set of images. A desirable property of these box functions is that their inner product with an image can be computed very efficiently. We propose two closely related novel subspace methods to model images: the non-orthogonal binary subspace (NBS) method and the binary principal component analysis (B-PCA) algorithm. NBS is spanned directly by binary box functions and can be used for image representation, fast template matching, and many other vision applications. B-PCA is a structured subspace that inherits the merits of both NBS (fast computation) and PCA (modeling data structure information). B-PCA base vectors are obtained by a novel PCA-guided NBS method. We also show that B-PCA base vectors are nearly orthogonal to each other. As a result, in the non-orthogonal vector decomposition process, the computationally intensive pseudo-inverse projection operator can be approximated by the direct dot product without causing significant distance distortion. Experiments on real image datasets show promising performance in image matching, reconstruction, and recognition tasks with significant speed improvement.
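The "very efficient inner product" of a box function with an image is the classic summed-area-table trick: after one pass to build the integral image, any axis-aligned box sum costs four lookups. A minimal sketch (the NBS/B-PCA machinery itself is not reproduced here):

```python
import numpy as np

def integral_image(img):
    """Summed-area table: cumulative sums along both axes."""
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1+1, c0:c1+1] in O(1) via the integral image.
    This constant-time box inner product is what makes Haar-like binary
    box bases attractive for fast template matching."""
    s = ii[r1, c1]
    if r0 > 0:
        s -= ii[r0 - 1, c1]
    if c0 > 0:
        s -= ii[r1, c0 - 1]
    if r0 > 0 and c0 > 0:
        s += ii[r0 - 1, c0 - 1]
    return s

img = np.arange(12.0).reshape(3, 4)
ii = integral_image(img)
```

A Haar-like feature is then just a signed combination of a few such box sums, so projecting onto an NBS base vector costs a handful of additions regardless of box size.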

17.
Subspace and similarity metric learning are important issues for image and video analysis in the scenarios of both computer vision and multimedia fields. Many real-world applications, such as image clustering/labeling and video indexing/retrieval, involve feature space dimensionality reduction as well as feature matching metric learning. However, the loss of information from dimensionality reduction may degrade the accuracy of similarity matching. In practice, such basic conflicting requirements for both feature representation efficiency and similarity matching accuracy need to be appropriately addressed. In the style of "Thinking Globally and Fitting Locally", we develop Locally Embedded Analysis (LEA) based solutions for visual data clustering and retrieval. LEA reveals the essential low-dimensional manifold structure of the data by preserving the local nearest neighbor affinity, and allowing a linear subspace embedding through solving a graph embedded eigenvalue decomposition problem. A visual data clustering algorithm, called Locally Embedded Clustering (LEC), and a local similarity metric learning algorithm for robust video retrieval, called Locally Adaptive Retrieval (LAR), are both designed upon the LEA approach, with variations in local affinity graph modeling. For large size database applications, instead of learning a global metric, we localize the metric learning space with kd-tree partition to localities identified by the indexing process. Simulation results demonstrate the effective performance of proposed solutions in both accuracy and speed aspects.

18.
Feature matching is a fundamental problem in image recognition. The usual approach is a greedy linear scan, which suits only low-dimensional data: once the dimensionality exceeds a certain point, the time efficiency of such matching methods drops sharply, sometimes to no better than brute-force linear scan. This paper proposes a MinHash-based matching method for binary features. Through MinHash mapping, the original feature set is split into multiple subsets, converting the problem of finding neighboring elements within a very large set into finding them within a small one and reducing the computation. A MinHash function under the Jaccard distance maximally preserves the similarity of originally similar vector pairs after hashing. Experiments show that, applied to binary features, this matching method achieves better results than a KD-tree.
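The MinHash/Jaccard property the abstract relies on, that two signatures agree at a position with probability equal to the sets' Jaccard similarity, can be demonstrated directly. The universe size, permutation count, and toy sets below are assumptions for illustration.

```python
import numpy as np

def minhash_signature(item_set, permutations):
    """MinHash signature: for each random permutation of the universe,
    record the smallest permuted id present in the set."""
    return np.array([min(perm[i] for i in item_set) for perm in permutations])

def estimated_jaccard(sig_a, sig_b):
    """Fraction of agreeing signature positions estimates Jaccard(A, B)."""
    return float((sig_a == sig_b).mean())

rng = np.random.default_rng(0)
universe = 64                      # e.g. set-bit indices of a binary feature
perms = [rng.permutation(universe) for _ in range(128)]
a = {1, 5, 9, 33, 40}
b = {1, 5, 9, 33, 41}              # true Jaccard(a, b) = 4/6 ~ 0.667
sig_a = minhash_signature(a, perms)
sig_b = minhash_signature(b, perms)
est = estimated_jaccard(sig_a, sig_b)
```

Matching then compares short signatures (or buckets elements by signature bands) instead of scanning the full high-dimensional feature set, which is where the claimed speedup comes from.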

19.
20.
Objective: Sparse coding is a widely used image representation, but sparse coding and its refinements are computationally complex and slow; this paper proposes an image classification algorithm that combines hash coding with spatial pyramids. Method: First, local feature points are extracted from the image to form a local-descriptor set. Second, an autoencoder-based hash function is learned to represent each local feature point as a binary hash code. K-means clustering is then performed on the binary hash codes to generate a binary visual dictionary. Finally, combined with the spatial pyramid model, each image is represented as a spatial-pyramid histogram vector and applied to image classification. Results: Experiments on the common Caltech-101 and Scene-15 datasets, compared against current sparse-coding-based algorithms, show that the proposed algorithm cuts dictionary learning time by 50%, speeds up online encoding by a factor of 1.3-12.4, and raises classification accuracy by 1%-5%. Conclusion: An image classification algorithm combining hash coding with spatial pyramids is proposed, in which hash codes replace sparse codes for encoding local feature points, combined with the spatial pyramid model for classification. Experiments show shorter dictionary learning time and faster encoding, making the algorithm suitable for online dictionary learning and application.
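The final pooling step, turning per-point visual-word ids into a spatial-pyramid histogram, can be sketched as below. The hash-encoding and dictionary-learning stages are omitted; the word ids and keypoint positions here are hypothetical stand-ins.

```python
import numpy as np

def spatial_pyramid_histogram(points, word_ids, vocab_size, levels=2):
    """Concatenate per-cell bag-of-words histograms over a spatial pyramid.
    points are (x, y) in [0, 1); level l splits the image into 2^l x 2^l
    cells, so levels=2 yields 1 + 4 + 16 = 21 cell histograms."""
    feats = []
    for level in range(levels + 1):
        n = 2 ** level
        cells = np.clip(np.floor(points * n).astype(int), 0, n - 1)
        cell_id = cells[:, 1] * n + cells[:, 0]
        for c in range(n * n):
            feats.append(np.bincount(word_ids[cell_id == c],
                                     minlength=vocab_size))
    v = np.concatenate(feats).astype(float)
    return v / max(v.sum(), 1.0)

rng = np.random.default_rng(0)
pts = rng.random((50, 2))              # normalized keypoint positions
words = rng.integers(0, 16, size=50)   # ids from a 16-word binary dictionary
v = spatial_pyramid_histogram(pts, words, vocab_size=16, levels=2)
```

The pyramid restores the coarse spatial layout that a flat bag-of-words discards, while the vector stays a fixed-length input for a standard classifier.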


Copyright©北京勤云科技发展有限公司  京ICP备09084417号