Similar literature
20 similar documents found (search time: 31 ms)
1.
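To achieve fast and accurate retrieval of video copies, a compact video fingerprint based on a non-local 3D residual network is proposed. The algorithm builds on a triplet network architecture and uses a 3D residual network with non-local modules to capture both the global and the local spatio-temporal information of a video. A quantization coding layer is added at the end of the feature extraction part, giving an end-to-end mapping from raw video data to discrete fingerprint codes. A network objective function composed of an angular-relation triplet loss and a quantization error loss is designed. Extensive experimental results show that, compared with the baseline algorithms, the proposed algorithm stays compact while performing strongly in both robustness and distinctiveness, with clear gains in precision and recall.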

2.
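Objective: Large-scale image retrieval is a research hotspot in computer vision. A basic approach is to extract features from every image in the database, define a similarity measure over those features, and then perform nearest neighbor retrieval. The key to large-scale image retrieval is designing a nearest neighbor search algorithm that meets storage and efficiency requirements. To improve the approximation accuracy of image visual features and reduce their storage requirements, a multi-index additive quantization method is proposed. Method: Linear search has high complexity, and to meet real-time retrieval requirements the image descriptors must be kept in memory, so it cannot meet the needs of large-scale retrieval systems. Given the advantages of non-linear search, this paper explores multi-index structures for non-exhaustive search together with quantization coding. The multi-index structure partitions the original data space into multiple subspaces and assigns the data items of each subspace to different inverted lists. Additive quantization, a compressed coding method, then encodes the residual data items in the inverted lists, further reducing the quantization loss with respect to the original space. At query time a non-exhaustive search strategy looks for neighbors in only a small number of inverted lists, which greatly reduces retrieval time. The original data need not be stored during retrieval; only the codeword indices of each data item in the additive quantization codebooks are stored, which greatly reduces memory consumption. Results: To verify the effectiveness of the algorithm, tests were run on three datasets: SIFT, GIST, and MNIST. Recall improves by 4% to 15% over algorithms from recent years, mean precision improves by about 12%, and retrieval time is on par with the fastest algorithm. Conclusion: The proposed multi-index additive quantization coding algorithm effectively improves the approximation accuracy and storage requirements of image visual features, and raises retrieval precision and recall on large-scale datasets. The algorithm targets nearest neighbor retrieval over features and is applicable to nearest neighbor retrieval of large-scale image data and other multimedia data.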
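The non-exhaustive search idea in the abstract above can be illustrated with a small sketch. This is not the paper's method: the centroids are hand-picked rather than learned, a single coarse index stands in for the multi-index structure, and the residuals are stored uncompressed where the paper would encode them with additive quantization. The function names and the `nprobe` parameter are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_inverted_lists(data, centroids):
    """Assign each item to its nearest coarse centroid and store (id, residual)."""
    lists = {c: [] for c in range(len(centroids))}
    for item_id, vec in enumerate(data):
        c = min(range(len(centroids)), key=lambda j: sq_dist(vec, centroids[j]))
        residual = [x - y for x, y in zip(vec, centroids[c])]
        # Assumption: a real system would store a compact code for the residual here.
        lists[c].append((item_id, residual))
    return lists

def search(query, centroids, lists, nprobe=1, topk=1):
    """Scan only the nprobe inverted lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda j: sq_dist(query, centroids[j]))
    candidates = []
    for c in order[:nprobe]:
        for item_id, residual in lists[c]:
            # Reconstruct the item as centroid + residual before measuring distance.
            approx = [m + r for m, r in zip(centroids[c], residual)]
            candidates.append((sq_dist(query, approx), item_id))
    candidates.sort()
    return [item_id for _, item_id in candidates[:topk]]

data = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
centroids = [[0.0, 0.0], [5.0, 5.0]]  # hand-picked for illustration
lists = build_inverted_lists(data, centroids)
print(search([5.0, 5.0], centroids, lists, nprobe=1, topk=2))  # → [2, 3]
```

With `nprobe=1` only one of the two lists is scanned, which is where the time saving comes from; raising `nprobe` trades speed for recall.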

3.
4.
A scene graph provides a powerful intermediate knowledge structure for various visual tasks, including semantic image retrieval, image captioning, and visual question answering. In this paper, the task of predicting a scene graph for an image is formulated as two connected problems, i.e., recognizing the relationship triplets, structured as <subject‐predicate‐object>, and constructing the scene graph from the recognized relationship triplets. For relationship triplet recognition, we develop a novel hierarchical recurrent neural network with a visual attention mechanism. This model is composed of two attention‐based recurrent neural networks in a hierarchical organization. The first network generates a topic vector for each relationship triplet, whereas the second network predicts each word in that relationship triplet given the topic vector. This approach successfully captures the compositional structure and contextual dependency of an image and the relationship triplets describing its scene. For scene graph construction, an entity localization approach that determines the graph structure with the assistance of the available attention information is presented. Then, the procedure for automatically converting the generated relationship triplets into a scene graph is given as an algorithm. Extensive experimental results on two widely used data sets verify the feasibility of the proposed approach.

5.
Video Shot Boundary Detection (SBD) is the fundamental process towards video summarization and retrieval. A fast and efficient SBD algorithm is necessary for real-time video processing applications. Extensive work has focused on accurate shot boundary detection at the expense of high computational cost. In this paper, we propose a fast SBD approach that reduces the computation pixel-wise and frame-wise while still giving satisfactory accuracy. The proposed approach substantially speeds up the computation by reducing both the detection region and the detection scope. Color histogram and mutual information are used together to measure the difference between frames. The corner distribution of frames is used to exclude most false boundaries. We conduct extensive experiments to evaluate the proposed approach, and the results show that it not only speeds up SBD but also detects both cut (CUT) and gradual transition (GT) boundaries with high accuracy.
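As a rough illustration of histogram-based frame differencing, the sketch below flags a cut when the L1 distance between the intensity histograms of consecutive frames exceeds a threshold. It is a simplification under stated assumptions: frames are flat lists of grayscale values, and the mutual information measure, the region and scope reduction, and the corner-based filtering described in the abstract are all omitted. The names and the threshold are hypothetical.

```python
def histogram(frame, bins=4, max_value=256):
    """Normalized intensity histogram of a frame given as a flat list of pixel values."""
    counts = [0] * bins
    for p in frame:
        counts[p * bins // max_value] += 1
    total = len(frame)
    return [c / total for c in counts]

def hist_difference(f1, f2, bins=4):
    """L1 distance between two normalized histograms, in the range [0, 2]."""
    h1, h2 = histogram(f1, bins), histogram(f2, bins)
    return sum(abs(a - b) for a, b in zip(h1, h2))

def detect_cuts(frames, threshold=1.0):
    """Return each index i where a cut is declared between frame i-1 and frame i."""
    return [i for i in range(1, len(frames))
            if hist_difference(frames[i - 1], frames[i]) > threshold]

dark = [10, 20, 30, 40]
bright = [200, 210, 220, 230]
frames = [dark, dark, bright, bright]
print(detect_cuts(frames))  # → [2]
```

A fixed threshold like this handles abrupt cuts only; gradual transitions spread the change over several frames and need a different test.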

6.
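Song Lixin, Xu Jun. Information and Control (《信息与控制》), 2020(2): 188-194, 202
To address the problem that the rapid growth of web image data leaves traditional image retrieval unable to meet current needs, an unsupervised image retrieval method based on a deep belief network (DBN) and iterative quantization (ITQ) is proposed. First, a DBN model is constructed by stacking three restricted Boltzmann machines. Next, the DBN extracts intermediate-dimensional features from the high-dimensional features of the original images, and the ITQ hashing method encodes these intermediate-dimensional features as binary codes. Finally, the model is validated and evaluated on the MNIST, CIFAR-10, and Corel-1000 datasets. The results show that the proposed method achieves better retrieval performance than several current mainstream methods. The method also performs well on retrieval in the breast image dataset DDSM and the lung nodule CT image dataset LIDC-IDRI.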

7.
Based on the analysis of temporal slices, we propose novel approaches for clustering and retrieval of video shots. Temporal slices are a set of two-dimensional (2-D) images extracted along the time dimension of an image volume. They encode a rich set of visual patterns for similarity measurement. In this paper, we first demonstrate that tensor histogram features extracted from temporal slices are suitable for motion retrieval. Subsequently, we integrate both tensor and color histograms to construct a two-level hierarchical clustering structure. Each cluster in the top level contains shots with similar color, while each cluster in the bottom level consists of shots with similar motion. The constructed structure is then used for cluster-based retrieval. The proposed approaches are found to be particularly useful for sports videos, where motion and color are important visual cues when searching and browsing for the desired video shots.

8.
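Wang Hailong, Yu Jing, Xiao Chuangbai. Acta Automatica Sinica (《自动化学报》), 2021, 47(5): 1077-1086
Hash learning projects high-dimensional data into a low-dimensional binary space while preserving the semantic similarity between data items, reducing dimensionality and enabling fast retrieval. Traditional supervised hash learning algorithms mainly take hand-crafted features as model input and generate hash codes through classification and quantization. Hand-crafted features lack adaptivity and are independent of the quantization process, so retrieval accuracy is low. This paper proposes a deep non-relaxed hashing algorithm based on pairwise similarity; at the output of the convolutional neural network...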

9.
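He Qing, Sun Hongxia. Computer Simulation (《计算机仿真》), 2020(4): 456-459, 475
When current methods are used to retrieve the features present in an image, retrieval takes a long time and returns few features, so both retrieval efficiency and recall are low. An image feature feedback retrieval method based on stacked product quantization is proposed. The product quantization and additive quantization algorithms are combined to obtain the stacked product quantization algorithm, which is used to reduce the dimensionality of the image and remove redundant information and useless data. The dimension-reduced image is then searched in terms of luminance, color, and gradient to obtain its luminance, color, and gradient features, achieving feedback retrieval of image features. Simulation results show that the proposed method has high retrieval efficiency and high recall.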

10.
11.
An Image Retrieval Method Using DCT Features (total citations: 1; self-citations: 0; citations by others: 1)

12.
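Objective: Video-based person re-identification across cameras and scenes is an important task in computer vision. In real-world scenes, illumination changes, occlusion, viewpoint changes, and cluttered backgrounds cause drastic changes in pedestrian appearance, which makes re-identification harder. To improve the robustness of video person re-identification systems in complex application scenarios, a video person re-identification algorithm combining a bidirectional long short-term memory recurrent neural network (BiLSTM) with an attention mechanism is proposed. Method: First, a convolutional neural network (CNN) based on a residual network structure is trained to learn spatial appearance features. A BiLSTM then extracts bidirectional temporal motion information. Finally, an attention mechanism fuses the learned spatial appearance features and temporal motion information into a discriminative video-level representation. Results: The algorithm was compared experimentally with existing methods on two public large-scale datasets. On the iLIDS-VID dataset, the first-match rate (Rank1) improves by 4.5% over the second-best method; on the PRID2011 dataset, Rank1 improves by 3.9% over the second-best method. Ablation experiments on both datasets verify the effectiveness of the proposed algorithm. Conclusion: The proposed video person re-identification algorithm combining BiLSTM and an attention mechanism makes full use of the information in a video sequence and learns more robust sequence features. The experimental results show that it significantly improves recognition performance on different datasets.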

13.
This paper proposes a novel representation space for multimodal information, enabling fast and efficient retrieval of video data. We suggest describing the documents not directly by selected multimodal features (audio, visual or text), but rather by considering cross-document similarities relative to their multimodal characteristics. This idea leads us to propose a particular form of dissimilarity space that is adapted to the asymmetric classification problem, and in turn to the query-by-example and relevance feedback paradigm widely used in information retrieval. Based on the proposed dissimilarity space, we then define various strategies to fuse modalities through a kernel-based learning approach. The problem of automatic kernel setting to adapt the learning process to the queries is also discussed. The properties of our strategies are studied and validated on artificial data. In a second phase, a large annotated video corpus (i.e., TRECVID-05), indexed by visual, audio and text features, is used to evaluate the overall performance of the dissimilarity space and fusion strategies. The results confirm the validity of the proposed approach for the representation and retrieval of multimodal information in a real-time framework.

14.
An edge-preserving image compression algorithm based on an unsupervised competitive neural network is proposed. The proposed neural network, called the weighted centroid neural network (WCNN), exploits the characteristics of image blocks from edge areas. The mean/residual vector quantization (M/RVQ) scheme serves as the framework of the proposed algorithm. The edge strength of the image block data is used to allocate the proper code vectors in the proposed WCNN. Compared with conventional neural-network-based VQ algorithms, the WCNN allocates more code vectors to image block data from edge areas and fewer code vectors to image block data from shade or non-edge areas. As a result, a simple application of WCNN to an image compression problem gives improved edge characteristics in reconstructed images over conventional neural-network-based VQ algorithms such as the self-organizing map (SOM) and adaptive SOM.
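The mean/residual VQ framework mentioned above can be sketched as follows: each block is stored as its mean plus the index of the nearest residual code vector. This is only an illustration of the M/RVQ encoding step. The codebook below is hand-picked for a flattened 2x2 block, whereas in the abstract the code vectors are allocated by the WCNN, whose training is not shown. All names are hypothetical.

```python
def encode_block(block, codebook):
    """Mean/residual VQ: return the block mean and the index of the nearest residual codeword."""
    mean = sum(block) / len(block)
    residual = [p - mean for p in block]
    index = min(range(len(codebook)),
                key=lambda k: sum((r - c) ** 2 for r, c in zip(residual, codebook[k])))
    return mean, index

def decode_block(mean, index, codebook):
    """Reconstruct the block by adding the chosen codeword back onto the mean."""
    return [mean + c for c in codebook[index]]

# Hand-picked residual codebook for 2x2 blocks in row-major order (an assumption).
codebook = [
    [0, 0, 0, 0],        # flat (shade) block
    [-50, -50, 50, 50],  # top row dark, bottom row bright
    [-50, 50, -50, 50],  # left column dark, right column bright
]
block = [60, 160, 62, 158]
mean, index = encode_block(block, codebook)
print(mean, index, decode_block(mean, index, codebook))
# → 110.0 2 [60.0, 160.0, 60.0, 160.0]
```

Giving edge patterns more entries in such a codebook is what lets the reconstruction keep edges sharp, which is the allocation behavior the abstract attributes to the WCNN.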

15.
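Objective: Traditional sketch-based image retrieval methods focus mainly on retrieving images of the same category and ignore the fine-grained features of sketches. To address this, a new sketch-based image retrieval method combining fine-grained features with a deep convolutional network is proposed. It attends both to overall matching through deep cross-domain learning and to the matching of fine-grained details. Method: First, a multi-channel hybrid convolutional neural network is built that processes sketches and natural images differently. Second, an attention model is added to the network to obtain fine-grained features. Finally, the coarse and fine features are fused and a similarity measure is computed to obtain the retrieval results. Results: Experiments on different databases compare the method with five baselines: the traditional scale-invariant feature transform (SIFT) and histogram of oriented gradients (HOG), and the deep sketch models Deep SaN (sketch-a-net), Deep 3DS (sketch), and Deep TSN (triplet sketch net), using Top-1 and Top-10 accuracy. On the shoe dataset, the Top-1 accuracy of the proposed method improves by 12%; on the chair dataset, Top-1 accuracy improves by 11% and Top-10 by 3%. The proposed method is more accurate than traditional sketch retrieval methods. In the experiments, the method retrieves the target image as the first result for the great majority of sketches, achieving instance-level sketch retrieval. Conclusion: A new sketch-based image retrieval method is proposed that offers a new approach to cross-domain retrieval between sketches and natural images. It performs instance-level sketch retrieval with clearly higher retrieval accuracy than existing methods, which demonstrates its feasibility.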

16.
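To address the weak representational power, slow training, and low retrieval accuracy of existing hashing-based image retrieval methods, which make them ill-suited to large-scale image retrieval, an iterative quantization hashing image retrieval method based on a deep residual network (DRITQH) is proposed. First, a deep residual network applies multiple non-linear transformations to the image data to extract features and obtain high-dimensional feature vectors carrying semantic information. Principal component analysis (PCA) then reduces the dimensionality of the high-dimensional image features, while iterative quantization binarizes the resulting feature vectors, updating the rotation matrix and mapping the data to a zero-centered binary hypercube, which minimizes the quantization error and yields the optimal projection matrix. Finally, hash learning is performed to obtain the optimal binary hash codes for image retrieval in Hamming space. Experimental results show that on the NUS-WIDE dataset DRITQH achieves retrieval accuracies of 0.789, 0.831, 0.838, and 0.846 for four hash codes, improvements of 0.5, 3.8, 3.7, and 4.2 percentage points respectively over the improved deep hashing network (IDHN), and its average encoding time is 1,717 μs shorter. DRITQH reduces the impact of quantization error in large-scale image retrieval, speeds up training, and achieves higher retrieval performance.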
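Two pieces of the pipeline above are easy to show in isolation: binarizing zero-centered features by sign, and ranking database items by Hamming distance. The sketch below covers only those two steps plus the per-vector quantization error. It does not implement the residual network, PCA, or the iterative rotation update of ITQ, and the feature values are made up for illustration.

```python
def binarize(v):
    """Map a zero-centered real vector to a {-1, +1} code by sign."""
    return [1 if x >= 0 else -1 for x in v]

def quantization_error(v):
    """Squared distance between v and its binary code (the per-vector term ITQ tries to shrink)."""
    return sum((b - x) ** 2 for b, x in zip(binarize(v), v))

def hamming(a, b):
    """Number of positions at which two codes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def rank_by_hamming(query_code, database_codes):
    """Return database indices ordered from nearest to farthest in Hamming space."""
    return sorted(range(len(database_codes)),
                  key=lambda i: hamming(query_code, database_codes[i]))

# Made-up, already zero-centered features (an assumption for the example).
features = [[0.9, -1.1, 0.8, 1.0], [-0.7, -0.9, 1.2, 1.1], [-1.0, 1.0, -0.9, -1.2]]
codes = [binarize(f) for f in features]
query = binarize([0.8, -0.6, 0.5, 0.9])
print(rank_by_hamming(query, codes))  # → [0, 1, 2]
```

Features that already sit near the corners of the hypercube have a small quantization error, which is why ITQ rotates the projected data before taking signs.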

17.
Attempts have been made to extend SQL to work with multimedia databases. We have reservations about the ability of extended SQL to represent the richness in content of multimedia data. In this paper we present an example of a multimedia database system, the Computer Aided Facial Image Inference and Retrieval system (CAFIIR). The system stores and manages facial images and criminal records, providing the functions necessary for crime identification. We use CAFIIR to demonstrate some core techniques for multimedia databases. Firstly, CAFIIR is an integrated system. Besides database management, there are image analysis, image composition, image aging, and report generation subsystems, providing means for problem solving. Secondly, the richness of multimedia data calls for a feature-based database to manage it, and CAFIIR is feature-based. An indexing mechanism, the iconic index, has been proposed for indexing facial images using a hierarchical self-organization neural network. The indexing method operates on complex feature measures and provides means for visual navigation. Thirdly, special retrieval methods for facial images have been developed, including visual browsing, similarity retrieval, free text retrieval and fuzzy retrieval.

18.
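Convolutional neural networks are favored in image retrieval for their high image recognition accuracy, but on large-scale datasets the deep features they extract are high-dimensional, which readily leads to the "curse of dimensionality". To address the high dimensionality of deep features in image retrieval, an image retrieval algorithm based on adaptive fusion network feature extraction and hash-based dimensionality reduction is proposed. Because traditional hashing has high complexity when processing high-dimensional features, an adaptive fusion module is added to the convolutional neural network to re-integrate the features, strengthening their representational power while reducing their dimensionality. A sparsification optimization algorithm then performs a second dimensionality reduction on the deep features, and compact hash codes are obtained through mapping. Finally, with the Inception network as the base model, extensive experiments are conducted on the CIFAR-10 and ImageNet datasets. The results show that the algorithm effectively improves image retrieval efficiency.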

19.
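With the rapid development and widespread use of online video, identifying and filtering objectionable online video is increasingly important. Based on an analysis of technical developments in two areas, image content recognition and filtering, and video structure analysis and retrieval, a solution is described that combines key video analysis techniques: temporal video segmentation, key frame extraction, image content recognition, and skin detection. The method is simple and easy to implement. The current state of research and the main applications of online video content recognition and filtering are also introduced, and the main problems it faces and its future trends are analyzed.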

20.
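To address spatio-temporal modeling in video action recognition, a temporally enhanced action recognition method based on fused spatio-temporal features is proposed within a deep learning framework. First, a sparse temporal sampling strategy is applied to the input video to accommodate varying video durations and reduce the cost of video-level temporal modeling. In the recognition stage, the temporal differences between adjacent feature maps are computed, and the results are used to enhance feature-level motion information. Finally, residual structures are combined with the temporal enhancement structure to improve the network's overall spatio-temporal modeling capability. Experiments show that the algorithm achieves high accuracy on the UCF101 and HMDB51 datasets and, in a real industrial operation action recognition scenario, achieves good recognition results with a relatively small network.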
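The two temporal operations named above, sparse sampling and adjacent-feature differencing, can be sketched on toy data. The assumptions are that sampling takes the middle frame of each equal-length segment and that feature maps are flattened to short vectors; the abstract does not specify either detail, and the function names are hypothetical.

```python
def sparse_sample(num_frames, num_segments):
    """Pick the middle frame index of each of num_segments equal-length segments."""
    seg_len = num_frames / num_segments
    return [int(seg_len * (k + 0.5)) for k in range(num_segments)]

def temporal_differences(features):
    """Element-wise difference between each pair of adjacent feature vectors."""
    return [[b - a for a, b in zip(prev, cur)]
            for prev, cur in zip(features, features[1:])]

indices = sparse_sample(num_frames=100, num_segments=4)
# Toy per-frame features standing in for flattened feature maps.
features = [[1.0, 2.0], [1.5, 2.0], [1.5, 4.0], [0.5, 4.0]]
print(indices)                         # → [12, 37, 62, 87]
print(temporal_differences(features))  # → [[0.5, 0.0], [0.0, 2.0], [-1.0, 0.0]]
```

Because the number of sampled frames is fixed by `num_segments`, the cost of the later stages does not grow with the length of the video.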
