Similar Documents
Found 20 similar documents (search time: 468 ms)
1.
Bag-of-visual-words has been shown to be a powerful image representation and has attained success in many computer vision and pattern recognition applications. Usually, for a given classification task, researchers choose to build a task-specific visual vocabulary; the problem of building a universal visual vocabulary is rarely addressed. In this paper we conduct extensive classification experiments with three features on four image datasets and show that visual vocabularies built from different datasets can be exchanged without apparent performance loss. Furthermore, we investigate the correlation between the visual vocabularies built from different datasets and find that they are nearly identical, which explains why they are universal across classification tasks. We believe that this work reveals what lies behind the universality of visual vocabularies and narrows the gap between bag-of-visual-words and the bag-of-words model in the text domain.
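The vocabulary-building step common to several entries in this list can be sketched in a few lines. This is a minimal illustration (plain NumPy k-means over synthetic descriptors; every name and parameter here is illustrative, not taken from any of the cited papers):

```python
import numpy as np

def build_vocabulary(descriptors, k, iters=20, seed=0):
    """Cluster local descriptors into k visual words with plain k-means."""
    rng = np.random.default_rng(seed)
    # Initialize words from a random subset of descriptors.
    words = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        # Assign each descriptor to its nearest word.
        d = np.linalg.norm(descriptors[:, None] - words[None], axis=2)
        labels = d.argmin(axis=1)
        # Move each word to the mean of its assigned descriptors.
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                words[j] = members.mean(axis=0)
    return words

def bow_histogram(descriptors, words):
    """Encode one image's descriptors as a normalized BoW histogram."""
    d = np.linalg.norm(descriptors[:, None] - words[None], axis=2)
    hist = np.bincount(d.argmin(axis=1), minlength=len(words)).astype(float)
    return hist / hist.sum()
```

Swapping which dataset's descriptors feed `build_vocabulary` while keeping `bow_histogram` fixed is the kind of vocabulary-exchange experiment this entry describes.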

2.
胡正平  涂潇蕾 《信号处理》2011,27(10):1536-1542
To address the facts that, in scene classification, the traditional bag-of-words model contains no image context information and ignores class differences among image features, this paper proposes a scene classification method combining multi-directional context features with a spatial pyramid model. The method first partitions each image into a uniform grid and extracts scale-invariant feature transform (SIFT) features; each local patch is then combined with its spatially adjacent regions in three directions to form three kinds of context features. Next, the context features of each training class are clustered separately into visual words, which are concatenated into the final visual vocabulary, yielding a visual-word histogram for each image. Finally, a pyramid histogram is formed via spatial pyramid matching and an SVM classifier performs classification. By organically combining patch similarity in the feature domain with contextual relations in the spatial domain, and by distinguishing classes, the method produces a more discriminative visual vocabulary. Experiments on a general scene image database show better classification performance than traditional methods.
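The spatial pyramid matching step used above can be sketched as concatenating per-region BoW histograms over increasingly fine grids. The block weighting below loosely follows the standard SPM scheme and is an assumption, not this paper's exact formulation:

```python
import numpy as np

def spatial_pyramid(word_map, vocab_size, levels=2):
    """Concatenate BoW histograms over an L-level spatial pyramid.

    word_map: 2-D array of visual-word indices, one per grid cell.
    Level l splits the map into 2^l x 2^l regions; coarser levels
    get smaller weights (illustrative weighting).
    """
    H, W = word_map.shape
    parts = []
    for l in range(levels + 1):
        n = 2 ** l
        for i in range(n):
            for j in range(n):
                block = word_map[i*H//n:(i+1)*H//n, j*W//n:(j+1)*W//n]
                hist = np.bincount(block.ravel(), minlength=vocab_size)
                parts.append(hist * (1.0 / 2 ** (levels - l)))
    return np.concatenate(parts)
```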

3.
To address the high time complexity of traditional visual dictionary methods, the synonymy and ambiguity of visual words, and the instability of clustering high-dimensional local features, this paper proposes an object classification method based on randomized visual vocabularies and cluster ensembles. Exact Euclidean locality-sensitive hashing (E2LSH) hashes the local feature points of the training image library to generate a group of randomized visual vocabularies; these vocabularies are then combined by cluster ensemble into a randomized visual vocabulary ensemble dictionary (RVVAD); finally, visual-word histograms built on this dictionary are classified with a support vector machine (SVM). Experimental results show that the method effectively enhances the expressive power of the dictionary and improves object classification accuracy.
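The E2LSH hashing mentioned above follows the standard p-stable scheme h(v) = floor((a·v + b) / w). A minimal sketch, with illustrative parameters (the paper's actual hash-family settings are not given here):

```python
import numpy as np

class E2LSH:
    """p-stable (Gaussian) LSH in the spirit of E2LSH."""

    def __init__(self, dim, n_hashes=8, w=4.0, seed=0):
        rng = np.random.default_rng(seed)
        self.a = rng.normal(size=(n_hashes, dim))  # Gaussian projection directions
        self.b = rng.uniform(0, w, size=n_hashes)  # random offsets in [0, w)
        self.w = w

    def hash(self, v):
        # Nearby points fall into the same bucket with high probability.
        return tuple(np.floor((self.a @ v + self.b) / self.w).astype(int))
```

Running several such hash families with different seeds yields the group of randomized vocabularies that the ensemble step then merges.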

4.
To address the problems that, in outdoor scenes with strong illumination changes or occlusions, existing deep-learning-based loop closure detection methods for visual simultaneous localization and mapping (visual SLAM) make poor use of image semantics and scene detail and run slowly, this paper proposes a YOLO-NKLT loop closure detection method for visual SLAM. A YOLOv5 network with an improved loss function extracts image features carrying semantic information; a training set is built and the network retrained so that the extracted features better suit loop closure detection in complex scenes. To further improve real-time performance, a KLT dimensionality reduction method based on non-dominated sorting is proposed. Experiments on the New College dataset and on the Nordland dataset, whose illumination and other changes are more severe, show that in complex outdoor scenes the proposed method is more robust than other traditional and deep-learning-based methods and achieves better accuracy and real-time performance.

5.
6.
Scene recognition is a fundamental task in computer vision research. Unlike image classification, scene recognition must jointly account for background information, local scene features, and object features, so classic convolutional neural networks perform poorly on it. To solve this problem, the paper proposes a global-and-local scene representation based on deep convolutional features: the convolutional features of each scene image are transformed to generate a comprehensive feature representation per image, using CAM...

7.
8.
Typically, k-means clustering or sparse coding is used for codebook generation in the bag-of-visual-words (BoW) model. Local features are then encoded by calculating their similarities with visual words. However, some useful information is lost during this process. To make use of this information, in this paper we propose a novel image representation method that goes one step beyond visual word ambiguity by considering the governing regions of visual words. For each visual application, the weights of local features are determined by the corresponding visual application classifiers. Each weighted local feature is then encoded not only by its similarities with visual words, but also by the visual words' governing regions. In addition, a locality constraint is imposed for efficient encoding. A weighted feature sign search algorithm is proposed to solve the problem. We conduct image classification experiments on several public datasets to demonstrate the effectiveness of the proposed method.
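Soft assignment beyond the single nearest visual word, the visual-word-ambiguity starting point this entry builds on, can be sketched as a kernel-weighted code; `sigma` is an assumed bandwidth, not a value from the paper:

```python
import numpy as np

def soft_assign(descriptor, words, sigma=1.0):
    """Encode one local feature by its Gaussian-kernel similarity to
    every visual word (soft assignment), not just the nearest word."""
    d2 = ((words - descriptor) ** 2).sum(axis=1)
    sim = np.exp(-d2 / (2 * sigma ** 2))
    return sim / sim.sum()
```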

9.
Traditional visual dictionary models consider neither multiple image scales nor contextual semantic co-occurrence. This paper proposes an image scene classification algorithm based on multi-scale contextual semantic information. First, the image is decomposed at multiple scales to extract visual information of different granularities. Second, a density-based adaptive selection algorithm determines the optimal number of topics for the probabilistic latent semantic analysis model. Then a Markov random field jointly mines the contextual semantic co-occurrence of image patches, yielding a multi-scale histogram representation of the image. Finally, a support vector machine performs scene classification. Experimental results show that the algorithm makes effective use of multi-scale and contextual semantic information and improves the semantic accuracy of visual words, thereby improving scene classification performance.

10.
To raise the utilization rate of features in remote sensing scene classification and thereby improve classification accuracy, a classification method based on dual-channel deep dense feature fusion was analyzed theoretically and verified experimentally. First, a composite dense network model extracts convolutional-layer features and fully-connected-layer features separately. Then, to mine and exploit deep image information, the extracted deep convolutional features are re-encoded with a bag-of-visual-words model to capture deep local features. Finally, the local and global features are fused by linear weighting and classified. Experiments on the UC Merced Land-Use and NWPU-RESISC45 datasets achieve classification accuracies of 93.81% and 92.62%, respectively. By fully exploiting the complementarity of local and global features, the method achieves thorough use and expression of deep image information.
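The final linear weighted fusion of local and global features can be sketched as a weighted concatenation; `alpha` is a hypothetical weight and the paper's actual fusion details may differ:

```python
import numpy as np

def fuse(local_feat, global_feat, alpha=0.5):
    """Linearly weighted fusion of a local and a global feature vector
    by weighted concatenation (alpha is an assumed hyperparameter)."""
    return np.concatenate([alpha * local_feat, (1 - alpha) * global_feat])
```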

11.
With the tremendous success of visual question answering (VQA) tasks, visual attention mechanisms have become an indispensable part of VQA models. However, these attention-based methods do not consider any relationship among regions, which is crucial for the model's thorough understanding of the image. We propose local relation networks (LRNs) for generating context-aware image features for each image region, containing information on the relationships among the other image regions. Furthermore, we propose a multilevel attention mechanism to combine semantic information from the LRNs and the original image regions, rendering the model's decisions more reasonable. With these two measures, we improve the region representation and achieve a better attentive effect and better VQA performance. We conduct numerous experiments on the COCO-QA dataset and the large VQA v2.0 benchmark dataset. Our model achieves competitive results, demonstrating the effectiveness of the proposed LRNs and multilevel attention mechanism through visual demonstrations.

12.
In this study we present an efficient image categorization and retrieval system applied to medical image databases, in particular large radiograph archives. The methodology is based on local patch representation of the image content, using a "bag of visual words" approach. We explore the effects of various parameters on system performance, and show best results using dense sampling of simple features with spatial content, and a nonlinear kernel-based support vector machine (SVM) classifier. In a recent international competition the system was ranked first in discriminating orientation and body regions in X-ray images. In addition to organ-level discrimination, we show an application to pathology-level categorization of chest X-ray data, the most popular examination in radiology. The system discriminates between healthy and pathological cases, and is also shown to successfully identify specific pathologies in a set of chest radiographs taken from a routine hospital examination. This is a first step towards similarity-based categorization, which has major clinical implications for computer-assisted diagnostics.

13.
刘硕研  须德  冯松鹤  刘镝  裘正定 《电子学报》2010,38(5):1156-1161
The bag-of-words representation built on visual words is currently the mainstream approach to scene classification. Traditional visual words are obtained by unsupervised clustering of the feature vectors of image patches, without any semantic information. To remedy this, the paper proposes a visual word generation algorithm for image patches based on contextual semantic information. First, the contextual semantic information used here is the semantic co-occurrence probability between visual words, obtained automatically by a probabilistic latent semantic analysis (pLSA) model without any manual annotation. Second, borrowing the pseudo-likelihood approximation of class labels from Markov random field theory, the algorithm organically combines patch similarity in the feature domain with contextual semantic co-occurrence in the spatial domain, thereby assigning visual words to image patches more accurately. Finally, visual-word frequencies form the scene representation, and a support vector machine classifier completes the scene classification task. Experimental results show that the algorithm effectively improves the semantic accuracy of visual words and, on that basis, improves scene classification performance.

14.
黄鸿  徐科杰  石光耀 《电子学报》2000,48(9):1824-1833
High-resolution remote sensing images are rich in ground-object information but have complex scene composition. Current hand-crafted feature extraction methods cannot meet the needs of complex scene classification, and although unsupervised feature learning can mine the intrinsic structure of local image patches, features of a single type and scale struggle to express the characteristics of complex remote sensing scenes in practice, limiting classification performance. To address this, the paper proposes a multi-scale, multi-feature remote sensing scene classification method. The algorithm first designs an improved unsupervised spectral clustering feature (iUFL-SC) to represent the intrinsic structure of image patches effectively; it then densely samples three kinds of multi-scale local patch features (iUFL-SC, LBP, and SIFT) from each scene and obtains a mid-level scene representation through a bag-of-visual-words (BoVW) model for a more accurate and detailed description. Classification uses a support vector machine with the histogram intersection kernel (HIKSVM). Experiments on the UC Merced and WHU-RS19 datasets show that the method extracts discriminative features from remote sensing scenes and effectively improves classification performance.
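The histogram intersection kernel behind HIKSVM has a one-line form, K(x, y) = sum_i min(x_i, y_i); a minimal sketch:

```python
import numpy as np

def hik(x, y):
    """Histogram intersection kernel: overlap between two histograms."""
    return np.minimum(x, y).sum()
```

Precomputing `hik` over all training-histogram pairs yields the Gram matrix that a kernel SVM can consume directly.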

15.
Considering the different photosensitive characteristics of infrared (non-visible) and visible imaging, and targeting the typical "black hole" and "white hole" problems in tunnels, this paper studies visual recognition and fused sensing under abrupt illumination changes from the viewpoint of an autonomous vehicle. Two situations are selected: a vehicle entering a tunnel under low illumination, and a vehicle leaving a tunnel under weak light. Fusion experiments on the two image types use local energy and the convolutional sparse representation (CSR) algorithm, evaluated with six metrics: MI, SF, AG, QAB/F, SSIM, and PSNR. The results show that, compared with the Curvelet, NSCT, NSCT-T, SR-C&L, and SF-Energy-Q algorithms, the CSR-E algorithm raises the edge information transfer factor (QAB/F) by 14.14% on tunnel-entrance images; on tunnel-exit images it reduces average running time by 1.17 ms and raises structural similarity (SSIM) by 3.38%. The proposed infrared-visible fusion method compensates for the incomplete scene description of any single sensor, achieves a full, clear, and accurate expression of the scene, effectively avoids losing edge information from the source images, and enhances the images' spectral information.

16.
This paper presents an image representation and matching framework for image categorization in medical image archives. Categorization enables one to determine automatically, based on the image content, the examined body region and imaging modality. It is a basic step in content-based image retrieval (CBIR) systems, the goal of which is to augment text-based search with visual information analysis. CBIR systems are currently being integrated with picture archiving and communication systems for increasing the overall search capabilities and tools available to radiologists. The proposed methodology is comprised of a continuous and probabilistic image representation scheme using Gaussian mixture modeling (GMM) along with information-theoretic image matching via the Kullback-Leibler (KL) measure. The GMM-KL framework is used for matching and categorizing X-ray images by body regions. A multidimensional feature space is used to represent the image input, including intensity, texture, and spatial information. Unsupervised clustering via the GMM is used to extract coherent regions in feature space that are then used in the matching process. A dominant characteristic of the radiological images is their poor contrast and large intensity variations. This presents a challenge to matching among the images, and is handled via an illumination-invariant representation. The GMM-KL framework is evaluated for image categorization and image retrieval on a dataset of 1500 radiological images. A classification rate of 97.5% was achieved. The classification results compare favorably with reported global and local representation schemes. Precision versus recall curves indicate a strong retrieval result as compared with other state-of-the-art retrieval techniques. Finally, category models are learned and results are presented for comparing images to learned category models.
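KL divergence between single diagonal-covariance Gaussians has a closed form that serves as a building block in GMM-KL-style matching (KL between full mixtures generally requires approximation). A sketch of the closed form, not this paper's full pipeline:

```python
import numpy as np

def kl_diag_gauss(mu1, var1, mu2, var2):
    """Closed-form KL(N1 || N2) for diagonal-covariance Gaussians.

    mu*, var*: 1-D arrays of per-dimension means and variances.
    """
    return 0.5 * np.sum(
        np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0
    )
```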

17.
Scene graph generation is an active research direction in computer vision that links upstream and downstream visual tasks. A scene graph consists of triples of the form <subject-predicate-object>, and a model must encode the global visual information of the whole image to aid scene understanding. However, current models still have trouble with special visual relations such as one-to-many, many-to-one, and symmetric relations. Exploiting the similarity between knowledge graphs and scene graphs, we transfer translation embedding models from knowledge graphs to scene graph generation. To better encode such visual relations, this paper proposes a scene graph generation framework based on multimodal feature translation embeddings, which remaps the extracted multimodal (visual and linguistic) features and uses the remapped features for predicate classification, thereby building better relation representations without noticeably increasing model complexity. The framework covers and extends nearly all existing translation-embedding scene graph implementations, applying four translation embedding models (TransE, TransH, TransR, TransD) to scene graph generation and detailing which relation types each model suits. The framework extends traditional usage: beyond a standalone model, a new usage is designed in which it is inserted into other network models as a plug-and-play submodule. Experiments on the Visual Genome dataset for large-scale semantic understanding fully verify the framework's effectiveness, and it yields richer category prediction...
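TransE, the simplest of the four translation embeddings listed above, scores a <subject-predicate-object> triple by the distance ||h + r - t||: a plausible predicate translates the subject embedding onto the object embedding. A minimal sketch with illustrative vectors:

```python
import numpy as np

def transe_score(h, r, t):
    """TransE plausibility: smaller ||h + r - t|| means a more
    plausible <subject (h), predicate (r), object (t)> triple."""
    return np.linalg.norm(h + r - t)
```

In scene graph generation the analogous idea is to remap the subject and object region features so that the predicate acts as a translation between them; TransH/TransR/TransD refine this by projecting onto relation-specific hyperplanes or spaces.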

18.
Eye movements provide important insight into the cognitive processes underlying visual search tasks. For image understanding, although the visual search patterns of different observers while studying the same scene bear some common characteristics, the idiosyncrasy associated with individual observers provides both research opportunities and challenges. The aim of this paper is to study the spatial characteristics of visual search, together with the intrinsic visual features of the fixation points, for comparing different visual search strategies. An analysis framework based on earth mover's distance (EMD) in normalized anatomical space is proposed, and the results are demonstrated with high resolution computed tomography (HRCT) images of the lungs. The study shows that through the effective use of both spatial and feature space representation, it is possible to untangle what appear to be uncorrelated fixation distribution patterns to reveal common visual search behaviors.
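For 1-D histograms defined on a shared set of bins, the earth mover's distance reduces to the summed absolute difference of the cumulative distributions (assuming unit spacing between bins). A sketch of that special case, not the paper's full anatomical-space framework:

```python
import numpy as np

def emd_1d(p, q):
    """EMD between two 1-D histograms over the same bins:
    sum of |CDF(p) - CDF(q)| with unit bin spacing."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()  # normalize to equal total mass
    return np.abs(np.cumsum(p - q)).sum()
```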

19.
The traditional bag-of-words (BoW) algorithm lacks distribution information among features, which easily causes action confusion, and the choice of codebook size strongly affects recognition results. To capture the distribution of interest points, this paper computes the positional relations among interest points within a spatio-temporal neighborhood as a local spatio-temporal distribution consistency feature, and proposes an enhanced bag-of-words algorithm that fuses it with the appearance features of the interest points; classification uses a multi-class support vector machine (SVM). Experiments target single-person and multi-person action recognition on the KTH and UT-interaction datasets, respectively. Compared with the traditional bag-of-words algorithm, the enhanced algorithm not only improves recognition performance but also weakens the influence of codebook size on the recognition rate; the experimental results verify its effectiveness.

20.
Effective categorization of the millions of aerial images from unmanned planes is a useful technique with several important applications. Previous methods for this task usually encountered these problems: (1) it is hard to efficiently represent the aerial images' topologies, which are the key feature distinguishing aerial images, rather than conventional appearance; and (2) the computational load is usually too high to build a real-time image categorization system. Addressing these problems, this paper proposes an efficient and effective aerial image categorization method based on a contextual topological codebook. The codebook of aerial images is learned within a multitask learning framework. The topology of each aerial image is represented with a region adjacency graph (RAG). Furthermore, a codebook containing topologies is learned by jointly modeling the contextual information, based on the extracted discriminative graphlets. These graphlets are integrated into a bag-of-words (BoW) representation for predicting aerial image categories. Contextual relations among local patches are taken into account during categorization to yield high performance. Experimental results show that our approach is both effective and efficient.
