Similar Articles
Found 20 similar articles.
1.
A SAR Target Recognition Method Combining Shadow and Target Region Images   Cited by: 1 (self-citations: 0, others: 1)
A ground target's SAR image contains not only the region formed by the target's scattered echoes but also a shadow region cast where the target occludes the ground. Because the image characteristics of the two regions differ, traditional SAR automatic target recognition uses either the target region alone or the shadow region alone. This paper proposes a sparse representation model that jointly exploits the shadow-region and target-region images. The model is solved by ℓ1/ℓ2 mixed-norm minimization to obtain a joint sparse representation, and targets are then recognized by the criterion of minimum joint reconstruction error. Recognition experiments on the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset show that the joint sparse representation model effectively fuses target-region and shadow-region information and outperforms sparse-representation recognition methods that use either region alone.
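As a rough illustration (not the authors' code), the sketch below solves the joint sparse code over the two views with a proximal-gradient loop under an ℓ1/ℓ2 row-group penalty, then classifies by minimum joint reconstruction error; all function names, shapes, the step size, and the regularization weight are our assumptions.

```python
import numpy as np

def joint_sparse_code(ys, Ds, lam=0.1, n_iter=200):
    """ys: list of view feature vectors [(d,), ...] (target, shadow);
    Ds: list of (d, n) per-view dictionaries of training samples.
    Returns X (n, n_views) whose rows are jointly sparse across views."""
    n = Ds[0].shape[1]
    X = np.zeros((n, len(Ds)))
    # Lipschitz constant of the smooth part: max over views of ||D||_2^2
    L = max(np.linalg.norm(D, 2) ** 2 for D in Ds)
    for _ in range(n_iter):
        G = np.column_stack([D.T @ (D @ X[:, v] - y)      # gradient per view
                             for v, (D, y) in enumerate(zip(Ds, ys))])
        Z = X - G / L
        # Row-wise group soft-thresholding: prox of (lam/L)*sum_i ||X[i,:]||_2
        norms = np.linalg.norm(Z, axis=1, keepdims=True)
        X = np.maximum(0.0, 1.0 - (lam / L) / np.maximum(norms, 1e-12)) * Z
    return X

def classify(ys, Ds, atom_labels, classes, lam=0.1):
    """atom_labels: (n,) array giving the class of each dictionary atom."""
    X = joint_sparse_code(ys, Ds, lam)
    errs = []
    for c in classes:
        mask = (atom_labels == c)
        # Joint reconstruction error using only class-c atoms, summed over views
        errs.append(sum(np.linalg.norm(y - D[:, mask] @ X[mask, v]) ** 2
                        for v, (D, y) in enumerate(zip(Ds, ys))))
    return classes[int(np.argmin(errs))]
```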

2.
With the tremendous success of visual question answering (VQA) tasks, visual attention mechanisms have become an indispensable part of VQA models. However, these attention-based methods do not consider the relationships among regions, which are crucial for a thorough understanding of the image by the model. We propose local relation networks (LRNs) for generating a context-aware feature for each image region that carries information about its relationships with the other image regions. Furthermore, we propose a multilevel attention mechanism to combine semantic information from the LRNs and the original image regions, rendering the decisions of the model more reasonable. With these two measures, we improve the region representation and achieve a better attention effect and better VQA performance. We conduct extensive experiments on the COCO-QA dataset and the largest VQA v2.0 benchmark dataset. Our model achieves competitive results, and visual demonstrations confirm the effectiveness of the proposed LRNs and multilevel attention mechanism.
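To picture what a relation layer computes, the sketch below augments each region feature with an attention-weighted summary of its pairwise relations to all other regions; the dot-product compatibility, dimensions, and class name are our assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class LocalRelation(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.g = nn.Linear(dim, dim)   # transform of the related region

    def forward(self, regions):        # regions: (batch, n_regions, dim)
        q, k, g = self.q(regions), self.k(regions), self.g(regions)
        # Pairwise compatibility between every pair of regions
        rel = torch.softmax(q @ k.transpose(1, 2) / regions.size(-1) ** 0.5, dim=-1)
        # Context-aware feature: the region plus its relational summary
        return regions + rel @ g

regions = torch.randn(2, 36, 512)      # e.g. 36 detected region features per image
out = LocalRelation(512)(regions)      # (2, 36, 512) context-aware features
```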

3.
Zero-shot learning (ZSL) aims to recognize images of unseen classes without requiring any training samples of those classes. ZSL is typically approached by building a semantic embedding space, such as an attribute space, to bridge the visual features and class labels of images. Currently, most ZSL approaches learn a visual-semantic alignment from the seen classes using only human-designed attributes, and then solve the ZSL problem by transferring semantic knowledge from the seen classes to the unseen ones. However, few works examine whether the human-designed attributes are discriminative enough for class prediction. To address this issue, we propose a semantic-aware dictionary learning (SADL) framework that explores discriminative visual attributes across seen and unseen classes. Furthermore, the semantic cues are elegantly integrated into the feature representations via the learned visual attributes for the recognition task. Experiments conducted on two challenging benchmark datasets show that our approach outperforms other state-of-the-art ZSL methods.
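For orientation, here is a generic attribute-regression ZSL baseline with a learned dictionary code; it is not SADL itself, and the synthetic shapes, the DictionaryLearning/Ridge choices, and the nearest-prototype rule are all our assumptions.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X_seen = rng.normal(size=(200, 64))      # seen-class image features
A_seen = rng.normal(size=(200, 16))      # attribute vector of each seen sample
A_unseen = rng.normal(size=(5, 16))      # attribute prototypes of 5 unseen classes
X_test = rng.normal(size=(10, 64))       # test images from unseen classes

# Learn a sparse visual dictionary on seen data, then regress codes -> attributes
coder = DictionaryLearning(n_components=32, alpha=1.0, max_iter=20).fit(X_seen)
W = Ridge(alpha=1.0).fit(coder.transform(X_seen), A_seen)

A_pred = W.predict(coder.transform(X_test))          # predicted attributes
# Nearest unseen-class attribute prototype gives the zero-shot label
labels = np.argmin(((A_pred[:, None, :] - A_unseen[None]) ** 2).sum(-1), axis=1)
```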

4.
Image quality assessment (IQA) is a useful technique in computer vision and machine intelligence, widely applied in image retrieval, image clustering, and image recognition. IQA algorithms generally rely on the human visual system (HVS), which reflects how humans perceive salient regions in an image. In this paper, we leverage both low-level features and high-level semantic features to select salient regions, which are concatenated to form GSPs by the designed saliency-constraint algorithm so as to mimic the human visual system. We design an enhanced IQA index based on the GSPs that calculates the similarity between the reference image and the test image to assess image quality. Experiments demonstrate that our IQA method achieves satisfactory performance.

5.
Robust loop-closure detection is essential for visual SLAM. Traditional methods often focus on the geometric and visual features of a scene but ignore the semantic information provided by objects. Based on this consideration, we present a strategy that models the visual scene as a semantic sub-graph, preserving only the semantic and geometric information from object detection. To align two sub-graphs efficiently, we use a sparse Kuhn–Munkres algorithm to speed up the search for correspondences among nodes. Shape similarity and the Euclidean distance between objects in 3-D space are jointly leveraged to measure image similarity through graph matching. The proposed approach has been analyzed and compared with state-of-the-art algorithms on several datasets as well as two indoor real scenes; the results indicate that our semantic graph-based representation, without extracting visual features, is feasible for loop-closure detection at competitive precision.
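A small sketch of the alignment step, assuming each node carries an object-class label and a 3-D centroid: build a node cost from class agreement and Euclidean distance, then align with SciPy's Hungarian (Kuhn–Munkres) solver. The cost weighting and the similarity score are illustrative, not the paper's exact formulation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_subgraphs(classes_a, pts_a, classes_b, pts_b, w_dist=1.0):
    """classes_*: (n,) semantic labels; pts_*: (n, 3) object centroids."""
    dist = np.linalg.norm(pts_a[:, None, :] - pts_b[None, :, :], axis=-1)
    same = classes_a[:, None] == classes_b[None, :]
    # Large penalty when object classes disagree, else the Euclidean distance
    cost = np.where(same, w_dist * dist, 1e6)
    rows, cols = linear_sum_assignment(cost)        # Kuhn–Munkres assignment
    score = np.exp(-cost[rows, cols]).mean()        # crude image-similarity score
    return list(zip(rows, cols)), score
```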

6.
7.
刘硕研, 须德, 冯松鹤, 刘镝, 裘正定. 《电子学报》 (Acta Electronica Sinica), 2010, 38(5): 1156-1161
The bag-of-visual-words (BoW) representation is currently the dominant approach to scene classification. Traditional visual words are obtained by unsupervised clustering of the feature vectors of image patches. To address the fact that traditional visual-word generation ignores semantic information, this paper proposes a visual-word generation algorithm for image patches that exploits contextual semantic information. First, the contextual semantic information used here is the semantic co-occurrence probability between visual words, derived automatically by a probabilistic Latent Semantic Analysis (pLSA) model without any manual annotation. Second, we introduce the pseudo-likelihood approximation of class labels from Markov random field theory to organically combine patch similarity in the feature domain with contextual semantic co-occurrence in the spatial domain, thereby assigning visual words to image patches more accurately. Finally, the frequencies of the visual words are counted as the scene representation of the image, and a support vector machine classifier completes the scene-classification task. Experimental results show that the algorithm effectively improves the semantic accuracy of the visual words and thereby improves scene-classification performance.
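For context, the baseline BoW pipeline that the paper refines looks roughly like the sketch below (the pLSA co-occurrence and MRF pseudo-likelihood refinements are omitted, and random patch features stand in for real local descriptors):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)
patches = [rng.normal(size=(50, 128)) for _ in range(40)]  # 40 images, 50 patches each
labels = rng.integers(0, 4, size=40)                       # 4 scene categories

# Visual vocabulary by unsupervised clustering of all patch descriptors
kmeans = KMeans(n_clusters=64, n_init=4).fit(np.vstack(patches))

def bow_histogram(p):
    words = kmeans.predict(p)                 # assign each patch to a visual word
    return np.bincount(words, minlength=64) / len(words)

H = np.array([bow_histogram(p) for p in patches])
clf = SVC(kernel="rbf").fit(H, labels)        # scene classifier on word histograms
```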

8.
We address the problem of visual classification with multiple features and/or multiple instances. Motivated by the recent success of multitask joint covariate selection, we formulate this problem as a multitask joint sparse representation model that combines the strength of multiple features and/or instances for recognition. A joint sparsity-inducing norm is utilized to enforce class-level joint sparsity patterns among the multiple representation vectors. The proposed model can be efficiently optimized by a proximal gradient method. Furthermore, we extend our method to the setting where features are described by kernel matrices. We then investigate two applications of our method to visual classification: 1) fusing multiple kernel features for object categorization and 2) robust face recognition in video with an ensemble of query images. Extensive experiments on challenging real-world datasets demonstrate that the proposed method is competitive with state-of-the-art methods in the respective applications.
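An off-the-shelf analogue of the joint covariate selection at the heart of this model is scikit-learn's MultiTaskLasso, which applies the same row-wise (ℓ2,1) joint-sparsity penalty; the sketch below recovers a support shared across tasks from synthetic data and is illustrative only, not the authors' solver.

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 30))           # shared design matrix (e.g. training samples)
W_true = np.zeros((30, 3))
W_true[:5] = rng.normal(size=(5, 3))     # only the first 5 covariates matter, jointly
Y = X @ W_true + 0.01 * rng.normal(size=(100, 3))

model = MultiTaskLasso(alpha=0.1, fit_intercept=False).fit(X, Y)
# Rows with nonzero norm form the support selected jointly across all tasks
support = np.flatnonzero(np.linalg.norm(model.coef_.T, axis=1))
```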

9.
Fine-grained visual categorization (FGVC) in computer vision aims to recognize images belonging to multiple subordinate categories of a super-category. The difficulty of FGVC lies in the close resemblance between classes and the large variation within classes. Most existing networks focus on only a few discriminative regions while ignoring many subtle complementary features, so we propose a Progressive Erasing Network (PEN). In PEN, a multi-grid erasure mechanism augments the data samples and assists in capturing local discriminative features, indirectly destroying the overall structure of the image through pixel-wise erasure. Cross-layer aggregation of salient class features is of great significance in FGVC, but the cross-layer representations produced by a simple aggregation strategy are still inefficient. To this end, the proposed consistency loss exploits cross-layer semantic affinity, guiding the Cross-Layer Incentive (CLI) block to mine more efficient feature representations at different granularities. We also integrate cross entropy and complementary entropy to take the distribution of negative classes into account for better classification performance. Our method is trained end to end with only classification labels. Experimental results show that our model outperforms the state of the art on three fine-grained benchmarks.
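A multi-grid erasure step can be sketched in a few lines; the grid size, erase ratio, and zero-filling below are illustrative assumptions rather than PEN's exact settings.

```python
import torch

def multi_grid_erase(img, n_grid=4, erase_ratio=0.25):
    """img: (C, H, W) tensor; zeroes out ~erase_ratio of the grid cells so the
    network must rely on complementary local cues elsewhere in the image."""
    c, h, w = img.shape
    gh, gw = h // n_grid, w // n_grid
    out = img.clone()
    mask = torch.rand(n_grid, n_grid) < erase_ratio   # cells chosen for erasure
    for i in range(n_grid):
        for j in range(n_grid):
            if mask[i, j]:
                out[:, i * gh:(i + 1) * gh, j * gw:(j + 1) * gw] = 0.0
    return out

augmented = multi_grid_erase(torch.randn(3, 224, 224))
```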

10.
马龙, 王鲁平, 李飚, 沈振康. 《信号处理》 (Journal of Signal Processing), 2010, 26(12): 1825-1832
This paper proposes a visual-attention-driven motion detection method based on chaos analysis (MDSA). MDSA first extracts salient regions of the image with a visual attention mechanism and then applies chaos analysis to the salient regions to detect moving targets. The algorithm proceeds as follows: multiple visually sensitive low-level image features are first extracted from the scene image; following feature-integration theory, these features are fused into a saliency map that reflects the visual saliency of every image location; motion detection by chaos analysis is then performed on the most salient region, after which the next most salient region is selected under the proximity-priority and inhibition-of-return rules, until all salient regions have been traversed. To reduce computation, the traditional salient-region extraction is modified in two ways: the local standard deviation of a neighborhood replaces the center-surround operator for evaluating local saliency, and salient-point clustering replaces the scale-saliency criterion for extracting salient regions. The chaos analysis first tests whether the joint histogram (JH) of each salient region exhibits chaotic behavior, then classifies the scattered points of a chaotic JH by applying a fixed threshold to the fractal dimension, and finally maps the classification back to the salient region to achieve motion segmentation. MDSA achieves good motion segmentation and noise robustness; comparative experiments and a cost analysis show that MDSA outperforms the mosaic-based motion detection method (MDM).
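The paper's cheaper saliency cue, local standard deviation in place of a center-surround operator, can be sketched with SciPy's uniform filter; the window size is our assumption.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_std_saliency(img, size=9):
    """img: 2-D grayscale array; returns a per-pixel local-std saliency map.
    Uses the identity var = E[x^2] - E[x]^2 over a size x size neighborhood."""
    mean = uniform_filter(img.astype(float), size)
    mean_sq = uniform_filter(img.astype(float) ** 2, size)
    return np.sqrt(np.maximum(mean_sq - mean ** 2, 0.0))
```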

11.
A challenging problem in image content extraction and classification is building a system that automatically learns high-level semantic interpretations of images. We describe a Bayesian framework for a visual grammar that aims to reduce the gap between low-level features and high-level user semantics. Our approach models image pixels by automatically fusing their spectral, textural, and other ancillary attributes; segments image regions with an iterative split-and-merge algorithm; and represents scenes by decomposing them into prototype regions and modeling the interactions between these regions in terms of their spatial relationships. Naive Bayes classifiers are used to learn models for region segmentation and classification from positive and negative examples of user-defined semantic land-cover labels. The system also automatically learns representative region groups that can distinguish different scenes, from which it builds visual grammar models. Experiments using Landsat scenes show that the visual grammar enables the creation of high-level classes that cannot be modeled by individual pixels or regions. Furthermore, learning the classifiers requires only a few training examples.
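The region-classification stage can be pictured with a compact naive Bayes stand-in over fused region attributes, trained from positive and negative examples of one land-cover label; the synthetic features below are placeholders.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)
pos = rng.normal(1.0, 1.0, size=(50, 8))    # regions labelled as the cover type
neg = rng.normal(-1.0, 1.0, size=(50, 8))   # negative examples
clf = GaussianNB().fit(np.vstack([pos, neg]), [1] * 50 + [0] * 50)

# Posterior P(label | region attributes) for three unlabelled regions
p_label = clf.predict_proba(rng.normal(size=(3, 8)))[:, 1]
```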

12.
Image retrieval has lagged far behind text retrieval despite more than two decades of intensive research effort. Most image retrieval research over the last two decades has been on content-based image retrieval, i.e., retrieval based on low-level features. Recent research in this area focuses on semantic image retrieval using automatic image annotation. Most semantic image-retrieval techniques in the literature, however, treat an image as a bag of features/words while ignoring the structural or spatial information in the image. In this paper, we propose a structural image-retrieval method based on automatic image annotation and a region-based inverted file. In the proposed system, regions in an image are treated the same way as keywords in a structured text document: semantic concepts are learnt from image data to label image regions as keywords, and a weight is assigned to each keyword according to its spatial position and relationships. As a result, images are indexed and retrieved in the same way as structured documents. Specifically, images are broken down into regions that are represented using colour, texture, and shape features. Region features are then quantized to create visual dictionaries, which are analogous to monolingual dictionaries such as an English or a Chinese dictionary. Next, a semantic dictionary, analogous to a bilingual dictionary such as an English–Chinese dictionary, is learnt to map image regions to semantic concepts. Finally, images are indexed and retrieved using a novel region-based inverted-file data structure. Results show that the proposed method has a significant advantage over the widely used Bayesian annotation models.
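A toy version of the region-based inverted file: each image is posted under the semantic keywords of its regions together with a position/relationship weight, and queries accumulate those weights exactly as in structured-text retrieval. The names and weights below are placeholders.

```python
from collections import defaultdict

inverted = defaultdict(list)                 # keyword -> [(image_id, weight), ...]

def index_image(image_id, region_keywords):
    """region_keywords: {keyword: weight derived from spatial position/relations}."""
    for kw, w in region_keywords.items():
        inverted[kw].append((image_id, w))

def query(keywords):
    scores = defaultdict(float)
    for kw in keywords:
        for image_id, w in inverted[kw]:
            scores[image_id] += w            # accumulate per-keyword evidence
    return sorted(scores.items(), key=lambda kv: -kv[1])

index_image("img1", {"sky": 0.4, "beach": 0.6})
index_image("img2", {"sky": 0.7, "building": 0.3})
print(query(["sky", "beach"]))               # img1 ranked first
```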

13.
Typically, k-means clustering or sparse coding is used for codebook generation in the bag-of-visual-words (BoW) model. Local features are then encoded by calculating their similarities to the visual words. However, some useful information is lost during this process. To make use of this information, in this paper we propose a novel image representation method that goes one step beyond visual-word ambiguity and considers the governing regions of visual words. For each visual application, the weights of local features are determined by the corresponding application classifiers. Each weighted local feature is then encoded by considering not only its similarities to the visual words but also the visual words' governing regions. In addition, a locality constraint is imposed for efficient encoding. A weighted feature-sign search algorithm is proposed to solve the problem. We conduct image classification experiments on several public datasets to demonstrate the effectiveness of the proposed method.
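The locality constraint can be pictured as encoding each feature over only its k nearest visual words, as in the sketch below; the paper's governing-region terms and weighted feature-sign search are not reproduced here, and k and the Gaussian weights are assumptions.

```python
import numpy as np

def locality_encode(feature, codebook, k=5):
    """feature: (d,); codebook: (n_words, d); returns a sparse (n_words,) code
    supported on the k visual words nearest to the feature."""
    d2 = ((codebook - feature) ** 2).sum(axis=1)
    nearest = np.argsort(d2)[:k]
    sim = np.exp(-d2[nearest])               # similarity to the k nearest words
    code = np.zeros(len(codebook))
    code[nearest] = sim / sim.sum()          # normalized soft assignment
    return code
```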

14.
An efficient and effective region-based image retrieval framework   Cited by: 15 (self-citations: 0, others: 15)
An image retrieval framework is proposed that integrates a region-based representation that is efficient in terms of storage and complexity with effective online learning capability. The framework consists of methods for region-based image representation and comparison, indexing using modified inverted files, relevance feedback, and learning of region weights. By exploiting a vector quantization method, both compact and sparse (vector) region-based image representations are achieved. Using the compact representation, an indexing scheme similar to inverted-file technology and an image similarity measure based on the Earth Mover's Distance are presented. Moreover, the vector representation facilitates a weighted query-point-movement algorithm, and the compact representation enables a classification-based algorithm for relevance feedback. Based on users' feedback, a region-weighting strategy is also introduced to weight the regions optimally and enable the system to improve itself. Experimental results on a database of 10,000 general-purpose images demonstrate the efficiency and effectiveness of the proposed framework.
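The Earth Mover's Distance between two region signatures can be written as a small transportation linear program; the SciPy sketch below, with Euclidean ground distance and weights normalized to sum to one, illustrates the similarity measure but is not the framework's optimized implementation.

```python
import numpy as np
from scipy.optimize import linprog

def emd(w1, f1, w2, f2):
    """w*: region weights, each summing to 1; f*: (n, d) region feature vectors."""
    n, m = len(w1), len(w2)
    # Ground distance between every source/sink region pair, flattened over flows
    D = np.linalg.norm(f1[:, None, :] - f2[None, :, :], axis=-1).ravel()
    A_eq, b_eq = [], []
    for i in range(n):                       # source region i ships exactly w1[i]
        row = np.zeros(n * m); row[i * m:(i + 1) * m] = 1
        A_eq.append(row); b_eq.append(w1[i])
    for j in range(m):                       # sink region j receives exactly w2[j]
        col = np.zeros(n * m); col[j::m] = 1
        A_eq.append(col); b_eq.append(w2[j])
    res = linprog(D, A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=(0, None))
    return res.fun                           # total transport cost = EMD

d = emd(np.array([0.5, 0.5]), np.random.rand(2, 3),
        np.array([1.0]), np.random.rand(1, 3))
```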

15.
Time-frequency (t-f) analysis has clearly reached a certain maturity. One can now often provide striking visual representations of the joint time-frequency energy of a signal. However, it has been difficult to take advantage of this rich source of information about the signal, especially for multidimensional signals. Properly constructed time-frequency distributions enjoy many desirable properties, yet attempts to incorporate t-f analysis results into pattern recognition schemes have not been notably successful to date. Aided by Cohen's scale transform, one may construct representations from the t-f results that are highly useful in pattern classification. Such methods can produce two-dimensional representations that are invariant to time shift, frequency shift, and scale changes. In addition, two-dimensional objects such as images can be represented in a like manner in a four-dimensional form. Even so, remaining extraneous variations often defeat the pattern classification approach. This paper presents a method based on noise-subspace concepts. The noise-subspace enhancement allows one to separate the desired invariant forms from the extraneous variations, yielding much improved classification results. Examples from sound classification are discussed.
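One way to picture a subspace separation of this kind (our reading of the abstract, not necessarily the paper's exact construction): an SVD of the stacked, vectorized invariant representations splits the space into a dominant subspace and its orthogonal complement, and each pattern can be decomposed into the two components before classification.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 256))          # 40 vectorized t-f training patterns
_, _, Vt = np.linalg.svd(X, full_matrices=False)
r = 10                                   # assumed rank of the dominant subspace

def split(pattern):
    """Return (dominant-subspace component, residual/noise-subspace component)."""
    sig = Vt[:r].T @ (Vt[:r] @ pattern)  # projection onto the dominant subspace
    return sig, pattern - sig
```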

16.
This letter presents an efficient and simple image segmentation method for spatial segmentation of semantic objects. First, the image is filtered using contour-preserving filters and then quasi-flat labeled. The small regions near the contours are classified as uncertain regions and are eliminated by region growing and merging; further region merging is used to reduce the number of regions. Simulation results show the method's efficiency and simplicity. It preserves the shape of the semantic object while emphasizing the perceptually complex parts of the object, and so conforms well to human visual perception.

17.
In this paper, we present an approach based on probabilistic latent semantic analysis (PLSA) to achieve automatic image annotation and retrieval. In order to model the training data precisely, each image is represented as a bag of visual words. A probabilistic framework is then designed to capture semantic aspects from the visual and textual modalities, respectively. Furthermore, an adaptive asymmetric learning algorithm is proposed to fuse these aspects: for each image, the aspect distributions of the different modalities are fused by multiplying them by different weights, which are determined by the visual representation of the image. Consequently, the probabilistic framework can predict semantic annotations precisely for unseen images because it properly associates the visual and textual modalities. We compare our approach with several state-of-the-art approaches on a standard Corel dataset. The experimental results show that our approach performs more effectively and accurately.
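The fusion arithmetic alone is a one-liner; the sketch below combines the two aspect distributions with an image-dependent weight as a convex combination. How the weight w is chosen per image from the visual representation is the paper's contribution and is not reproduced here.

```python
import numpy as np

def fuse_aspects(p_visual, p_textual, w):
    """p_*: per-image aspect distributions (each sums to 1);
    w in [0, 1] is the image-dependent modality weight."""
    fused = w * p_visual + (1.0 - w) * p_textual
    return fused / fused.sum()              # guard against numerical drift
```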

18.
Fine-grained visual classification (FGVC) is a critical task in computer vision. FGVC is full of challenges, however, because the classes to be distinguished in an image show large intra-class variation and small inter-class variation. The key to the problem is capturing subtle visual differences in the image and representing the discriminative features effectively. Existing methods are often limited by insufficient localization accuracy and insufficient feature-representation capability. In this paper, we propose a cross-layer progressive attention bilinear fusion (CPABF) method that efficiently expresses the characteristics of discriminative regions. CPABF involves three components: 1) Cross-Layer Attention (CLA), which locates and reinforces the discriminative region at low computational cost; 2) the Cross-Layer Bilinear Fusion Module (CBFM), which effectively integrates semantic information from low-level to high-level layers; and 3) progressive training, which delicately optimizes the network parameters to their best state. CPABF shows excellent performance on four FGVC datasets and outperforms some state-of-the-art methods.
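A minimal cross-layer bilinear fusion in the spirit of CBFM: pool the outer product of a low-level and a high-level feature map into a single descriptor. The channel sizes and the signed square-root plus ℓ2 normalization are common bilinear-pooling conventions assumed here, not details taken from the paper.

```python
import torch

def bilinear_fuse(low, high):
    """low: (B, C1, H, W); high: (B, C2, H, W) -> (B, C1*C2) descriptor."""
    b, c1, h, w = low.shape
    c2 = high.shape[1]
    # Outer product of the two feature maps, average-pooled over all locations
    x = torch.einsum("bchw,bdhw->bcd", low, high) / (h * w)
    x = x.reshape(b, c1 * c2)
    x = torch.sign(x) * torch.sqrt(x.abs() + 1e-12)   # signed square root
    return torch.nn.functional.normalize(x, dim=1)    # l2 normalization

desc = bilinear_fuse(torch.randn(2, 64, 14, 14), torch.randn(2, 128, 14, 14))
```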

19.
We propose a new automatic image segmentation method. Color edges in an image are first obtained automatically by combining an improved isotropic edge detector and a fast entropic thresholding technique. Once the color edges have provided the major geometric structures in the image, the centroids between adjacent edge regions are taken as the initial seeds for seeded region growing (SRG). These seeds are then replaced by the centroids of the generated homogeneous image regions as the required additional pixels are incorporated step by step. Moreover, the results of color-edge extraction and SRG are integrated to provide homogeneous image regions with accurate, closed boundaries. We also discuss the application of our image segmentation method to automatic face detection. Furthermore, semantic human objects are generated by a seeded-region-aggregation procedure that takes the detected faces as object seeds.
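The core SRG loop can be sketched compactly: pixels join an adjacent seeded region in order of color difference while that difference stays under a threshold. The 4-connectivity, grayscale input, fixed threshold, and pixel-to-pixel (rather than pixel-to-region-mean) comparison are simplifications relative to the paper.

```python
import numpy as np
from heapq import heappush, heappop

def grow(img, seeds, thresh=20.0):
    """img: (H, W) grayscale array; seeds: [(y, x), ...] initial seed pixels.
    Returns a label map (0 = unassigned)."""
    h, w = img.shape
    labels = np.zeros((h, w), dtype=int)
    heap = []
    for lab, (y, x) in enumerate(seeds, start=1):
        labels[y, x] = lab
        heappush(heap, (0.0, y, x, lab))
    while heap:                               # grow most-similar pixels first
        _, y, x, lab = heappop(heap)
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and labels[ny, nx] == 0:
                diff = abs(float(img[ny, nx]) - float(img[y, x]))
                if diff < thresh:
                    labels[ny, nx] = lab
                    heappush(heap, (diff, ny, nx, lab))
    return labels
```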

20.
Research on Image Retrieval Based on Visual Perception   Cited by: 2 (self-citations: 0, others: 2)
张菁, 沈兰荪. 《电子学报》 (Acta Electronica Sinica), 2008, 36(3): 494-499
A prominent problem in content-based image retrieval is the wide gap between low-level image features and high-level semantics. Relevance feedback and region-of-interest detection, the usual means of bridging this semantic gap, are subjective and time-consuming. This paper proposes that visual perception information is a new kind of feature that objectively reflects the high-level semantics of an image, and that image retrieval based on visual information can effectively narrow the semantic gap. Building on a summary of the research progress and implementation methods of visual perception, it lays out research directions for visual-perception-based image retrieval in four areas: region-of-interest detection, image segmentation, relevance feedback, and personalized retrieval.
