Similar Documents
20 similar documents found (search time: 15 ms)
1.
The bag of visual words (BOW) model is an efficient image representation technique for image categorization and annotation tasks. Building good visual vocabularies from automatically extracted image feature vectors produces discriminative visual words, which can improve the accuracy of image categorization. Most approaches that use the BOW model to categorize images ignore useful information that can be obtained from the image classes when building visual vocabularies. Moreover, most BOW models use intensity features extracted from local regions and disregard colour information, an important characteristic of any natural scene image. In this paper, we show that integrating visual vocabularies generated from each image category enriches the BOW image representation and improves accuracy in natural scene image classification. We use a keypoint density-based weighting method to combine the BOW representation with image colour information on a spatial pyramid layout. In addition, we show that visual vocabularies generated from the training images of one scene dataset can plausibly represent another scene dataset in the same domain, reducing the time and effort needed to build new visual vocabularies. The proposed approach is evaluated on three well-known scene classification datasets with 6, 8 and 15 scene categories, respectively, using 10-fold cross-validation. The experimental results, using support vector machines with a histogram intersection kernel, show that the proposed approach outperforms baseline methods such as Gist features, rgbSIFT features and different configurations of the BOW model.
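A minimal sketch of the classification stage described above, assuming scikit-learn: L1-normalized BOW histograms fed to an SVM through a precomputed histogram intersection kernel. The vocabulary construction, keypoint-density weighting and spatial pyramid are omitted, and the data below is random stand-in data.

```python
import numpy as np
from sklearn.svm import SVC

def histogram_intersection_kernel(X, Y):
    # K[i, j] = sum_k min(X[i, k], Y[j, k])
    return np.array([[np.minimum(x, y).sum() for y in Y] for x in X])

# Illustrative data: rows are L1-normalized BOW histograms over a 200-word vocabulary.
rng = np.random.default_rng(0)
X_train = rng.random((40, 200)); X_train /= X_train.sum(axis=1, keepdims=True)
y_train = rng.integers(0, 6, 40)            # e.g., 6 scene categories
X_test = rng.random((5, 200)); X_test /= X_test.sum(axis=1, keepdims=True)

clf = SVC(kernel="precomputed")
clf.fit(histogram_intersection_kernel(X_train, X_train), y_train)
print(clf.predict(histogram_intersection_kernel(X_test, X_train)))
```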

2.
In this paper, we present an approach based on probabilistic latent semantic analysis (PLSA) for automatic image annotation and retrieval. To model the training data precisely, each image is represented as a bag of visual words. A probabilistic framework is then designed to capture semantic aspects from the visual and textual modalities, respectively, and an adaptive asymmetric learning algorithm is proposed to fuse these aspects. For each image document, the aspect distributions of the different modalities are fused with different weights, which are determined by the visual representations of the images. Consequently, the probabilistic framework can predict semantic annotations precisely for unseen images because it associates the visual and textual modalities properly. We compare our approach with several state-of-the-art approaches on a standard Corel dataset; the experimental results show that our approach performs more effectively and accurately.
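The abstract does not give the fusion formula; as an illustration only, the sketch below assumes a per-image weighted linear combination of the two aspect (topic) distributions, one plausible reading of "fused with different weights". In the original method the weight w is derived from the visual representation; here it is simply passed in.

```python
import numpy as np

def fuse_aspects(p_visual, p_text, w):
    """Fuse per-image aspect distributions from the two modalities.

    w in [0, 1] is an image-specific weight (hypothetical stand-in for the
    weight the paper derives from the visual representation)."""
    fused = w * np.asarray(p_visual) + (1.0 - w) * np.asarray(p_text)
    return fused / fused.sum()              # renormalize to a distribution

p_vis = np.array([0.6, 0.3, 0.1])           # toy 3-aspect distributions
p_txt = np.array([0.2, 0.5, 0.3])
print(fuse_aspects(p_vis, p_txt, w=0.7))
```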

3.
柯逍  邹嘉伟  杜明智  周铭柯 《电子学报》2017,45(12):2925-2935
To address the long training times and sensitivity to low-frequency keywords of traditional image annotation models, this paper proposes an automatic image annotation model based on Monte Carlo dataset balancing and a robust incremental extreme learning machine. The model first performs automatic segmentation on the training images of a public image library, selects seed annotation words for the resulting segments, and matches them automatically with a proposed comprehensive-distance-based image feature matching algorithm to form per-category training sets. Because the amounts of data for different annotation words in public databases vary widely, a Monte Carlo dataset balancing algorithm is proposed to bring the data scale of each annotation word roughly into line. To overcome the limitations of a single feature descriptor, a multi-scale feature fusion algorithm is then proposed to extract effective features from the images of different annotation words. Finally, to address the randomness of hidden-layer nodes and the uniform weighting of input vectors in the traditional extreme learning machine, a robust incremental extreme learning machine is proposed, improving the accuracy of the discriminative model. Experimental results on public datasets show that the model can annotate images automatically in a very short time, is robust to low-frequency keywords, and surpasses most popular automatic image annotation models on average recall, average precision, the combined measure and other metrics.
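The paper's exact Monte Carlo balancing algorithm is not reproduced here. Purely as an illustration of the stated goal, the sketch below randomly over- and under-samples each annotation word's image list to a common size; all names are hypothetical.

```python
import numpy as np

def balance_by_keyword(samples_per_word, target=None, rng=None):
    """Randomly resample each keyword's image list to a common size.

    samples_per_word: dict mapping annotation word -> list of sample ids.
    Draws with replacement only when a keyword has too few samples."""
    rng = rng or np.random.default_rng(0)
    if target is None:
        target = int(np.mean([len(v) for v in samples_per_word.values()]))
    return {
        word: list(rng.choice(ids, size=target, replace=len(ids) < target))
        for word, ids in samples_per_word.items()
    }

data = {"sky": list(range(500)), "tiger": list(range(12))}
balanced = balance_by_keyword(data, target=100)
print({k: len(v) for k, v in balanced.items()})   # both words now have 100
```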

4.
刘杰  杜军平 《电子学报》2014,42(5):987-991
Semantic image annotation is an important problem in image semantic analysis. Building on topic models, this paper proposes a novel cross-media image annotation method to propagate semantics between images. First, a topic model is applied to the training images to extract latent semantic topics from the visual and textual modalities. Then, the topic distributions of the two modalities are fused through a weight parameter to learn a fused topic distribution. Finally, an annotation model is trained on the fused topic distribution to assign appropriate semantic labels to target images. The proposed method is compared with recent well-known annotation methods on the standard MSRC and Corel5K datasets, and a detailed evaluation of annotation performance demonstrates its effectiveness.

5.
An image clustering algorithm based on visual and annotation correlation
于林森  张田文 《电子学报》2006,34(7):1265-1269
The algorithm first scores annotation words by their degree of visual correlation; a word's score reflects the visual coherence of the semantically consistent images that share it. Exploiting the inherent linguistic descriptiveness of image semantic categories, annotation words with clear visual coherence are extracted from the image annotations to serve as semantic categories, reducing the tedious manual cataloguing work of database designers. Classifying images semantically by their annotation words improves the semantic consistency of the resulting clusters. Clustering results on 4500 annotated Corel images confirm the effectiveness of the algorithm.

6.
This paper presents a generalized relevance model for automatic image annotation that learns the correlations between images and annotation keywords. Different from previous relevance models, which can only propagate keywords from the training images to the test ones, the proposed model can additionally propagate keywords among the test images. We also give a convergence analysis of the iterative algorithm inspired by the proposed model. Moreover, to estimate the joint probability of observing an image with possible annotation keywords, we define the inter-image relations through a new spatial Markov kernel based on 2D Markov models. The main advantage of our spatial Markov kernel is that the intra-image context can be exploited for automatic image annotation, unlike in traditional bag-of-words methods. Experiments on two standard image databases demonstrate that the proposed model outperforms state-of-the-art annotation models.
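The generalized relevance model itself is not reproduced here; the sketch below only illustrates the general fixed-point style of keyword propagation among test images that such convergence analyses concern, assuming a row-normalized affinity matrix S (e.g., built from a spatial Markov kernel) and initial keyword scores Y obtained from the training images.

```python
import numpy as np

def propagate_keywords(S, Y, alpha=0.5, n_iter=50):
    """Iterate F <- alpha * S @ F + (1 - alpha) * Y.

    S: (n, n) row-normalized affinity among test images;
    Y: (n, k) initial keyword scores. The iteration converges for
    alpha < 1 because the spectral radius of a row-stochastic S is 1."""
    F = Y.copy()
    for _ in range(n_iter):
        F = alpha * S @ F + (1.0 - alpha) * Y
    return F

rng = np.random.default_rng(1)
S = rng.random((6, 6)); S /= S.sum(axis=1, keepdims=True)
Y = rng.random((6, 4))                      # 6 test images, 4 candidate keywords
print(propagate_keywords(S, Y).round(2))
```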

7.
Translating multiple real-world source images to a single prototypical image is a challenging problem, notably because the source images belong to unseen categories that did not exist during model training. We address this problem by proposing an adaptive adversarial prototype network (AAPN) and enhancing existing one-shot classification techniques. To overcome the limitation that traditional methods cannot extract samples from novel categories, our method solves the image translation task for unseen categories through a meta-learner. We train the model in an adversarial learning manner and introduce a style encoder to guide the model with an initial target style; the encoded style latent code enhances the performance of the network given conditional target-style images. AAPN outperforms state-of-the-art methods in one-shot classification on a brand logo dataset and achieves competitive accuracy on a traffic sign dataset. Our model also improves the visual quality of the reconstructed prototypes of unseen categories. Qualitative and quantitative analyses demonstrate the effectiveness of the model for few-shot classification and generation.

8.
This paper presents an image representation and matching framework for image categorization in medical image archives. Categorization makes it possible to determine automatically, from the image content, the examined body region and imaging modality. It is a basic step in content-based image retrieval (CBIR) systems, whose goal is to augment text-based search with visual information analysis. CBIR systems are currently being integrated with picture archiving and communication systems to increase the overall search capabilities and tools available to radiologists. The proposed methodology comprises a continuous and probabilistic image representation scheme using Gaussian mixture modeling (GMM) along with information-theoretic image matching via the Kullback-Leibler (KL) measure. The GMM-KL framework is used for matching and categorizing X-ray images by body region. A multidimensional feature space is used to represent the image input, including intensity, texture, and spatial information. Unsupervised clustering via the GMM is used to extract coherent regions in feature space that are then used in the matching process. A dominant characteristic of radiological images is their poor contrast and large intensity variations; this makes matching among images challenging and is handled via an illumination-invariant representation. The GMM-KL framework is evaluated for image categorization and image retrieval on a dataset of 1500 radiological images, achieving a classification rate of 97.5%. The classification results compare favorably with reported global and local representation schemes, and precision versus recall curves indicate strong retrieval results compared with other state-of-the-art retrieval techniques. Finally, category models are learned and results are presented for comparing images to the learned category models.
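A small sketch of the GMM-KL matching idea, assuming scikit-learn: fit a Gaussian mixture to each image's pixel-level feature vectors, then compare images with a Monte Carlo estimate of the KL divergence, which has no closed form between GMMs. Feature extraction and the illumination-invariant representation are omitted; the data is synthetic.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_kl(gmm_p, gmm_q, n_samples=2000):
    """Monte Carlo estimate of KL(p || q): sample from p and
    average log p(x) - log q(x)."""
    X, _ = gmm_p.sample(n_samples)
    return float(np.mean(gmm_p.score_samples(X) - gmm_q.score_samples(X)))

rng = np.random.default_rng(0)
# Toy stand-ins for per-pixel feature vectors (intensity, texture, position).
feats_a = rng.normal(0.0, 1.0, (500, 3))
feats_b = rng.normal(0.5, 1.2, (500, 3))
gmm_a = GaussianMixture(n_components=3, random_state=0).fit(feats_a)
gmm_b = GaussianMixture(n_components=3, random_state=0).fit(feats_b)
print(gmm_kl(gmm_a, gmm_b))                 # smaller value -> more similar images
```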

9.
In recent years, there have been several improvements in lossless image compression. The recently proposed state-of-the-art lossless image compressors can be roughly divided into two categories: single-pass and double-pass compressors. Linear prediction is rarely used in the first category, while TMW, a state-of-the-art double-pass image compressor, relies on linear prediction for its performance. We propose a single-pass adaptive algorithm that uses context classification and multiple linear predictors, locally optimized on a pixel-by-pixel basis. Locality is also exploited in the entropy coding of the prediction error. The results we obtained on a test set of several standard images are encouraging: on average, our ALPC obtains a compression ratio comparable to CALIC while improving on some images.
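A toy version of locally optimized linear prediction, assuming a least-squares fit over a causal window; ALPC's context classification, multiple predictors and entropy coding are omitted, and all sizes are illustrative.

```python
import numpy as np

def predict_pixel(img, r, c, win=8):
    """Least-squares linear predictor from three causal neighbours (W, N, NW),
    fitted on the pixels in a causal window above and to the left of (r, c)."""
    A, b = [], []
    for i in range(max(1, r - win), r + 1):
        j_end = c if i == r else min(img.shape[1], c + win)
        for j in range(max(1, c - win), j_end):
            A.append([img[i, j - 1], img[i - 1, j], img[i - 1, j - 1]])
            b.append(img[i, j])
    coef, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float), rcond=None)
    return coef @ np.array([img[r, c - 1], img[r - 1, c], img[r - 1, c - 1]])

img = np.arange(100, dtype=float).reshape(10, 10)   # smooth toy "image"
print(predict_pixel(img, 5, 5), img[5, 5])  # the residual would feed the entropy coder
```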

10.
For real-world simulation, terrain models must combine various types of information on material and texture in terrain reconstruction for the three-dimensional numerical simulation of terrain. However, constructing such models with conventional methods often involves high costs in both manpower and time. Therefore, this study used a convolutional neural network (CNN) architecture to classify materials in multispectral remote sensing images to simplify the construction of future models. Visible light (i.e., RGB), near infrared (NIR), normalized difference vegetation index (NDVI), and digital surface model (DSM) images were examined.
This paper proposes the robust U-Net (RUNet) model, which integrates multiple CNN architectures, for material classification. The model, based on an improved U-Net architecture combined with the shortcut connections of the ResNet model, preserves the features of shallow network extraction. The architecture is divided into an encoding layer and a decoding layer. The encoding layer comprises 10 convolutional layers and 4 pooling layers; the decoding layer contains 4 upsampling layers, 8 convolutional layers, and 1 classification convolutional layer. The material classification process involved training and testing the RUNet model. Because of the large size of remote sensing images, the training process randomly cuts subimages of the same size from the training set and inputs them into the RUNet model. To account for the spatial information of the material, the test process cuts multiple test subimages from the test set through mirror padding and overlapping cropping; RUNet then classifies the subimages, and the subimage classification results are finally merged back into the original test image.
The aerial image labeling dataset of the National Institute for Research in Digital Science and Technology (Inria, abbreviated from the French Institut national de recherche en sciences et technologies du numérique) was used, along with its configured dataset (called Inria-2) and a dataset from the International Society for Photogrammetry and Remote Sensing (ISPRS). Material classification was performed with RUNet, and the effects of mirror padding, overlapping cropping, and subimage size on classification performance were analyzed. The Inria dataset achieved the optimal results; after the morphological optimization of RUNet, the overall intersection over union (IoU) and classification accuracy reached 70.82% and 95.66%, respectively. On the Inria-2 dataset, the IoU and accuracy were 75.5% and 95.71%, respectively, after classification refinement. Although the overall IoU and accuracy were 0.46% and 0.04% lower than those of the improved fully convolutional network, the training time of the RUNet model was approximately 10.6 h shorter. In the ISPRS dataset experiment, the overall accuracy of the combined multispectral, NDVI, and DSM images reached 89.71%, surpassing that of the RGB images. NIR and DSM provide more information on material features, reducing the likelihood of misclassification caused by similar features (e.g., in color, shape, or texture) in RGB images. Overall, RUNet outperformed the other models in the material classification of remote sensing images. The present findings indicate that it has potential for application in land use monitoring and disaster assessment as well as in model construction for simulation systems.
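A sketch of the test-time procedure described above (mirror padding, overlapping cropping, merging), assuming numpy; model_fn is a placeholder standing in for RUNet, and the tile and overlap sizes are illustrative rather than the paper's.

```python
import numpy as np

N_CLASSES = 3

def tiled_inference(image, model_fn, tile=256, overlap=32):
    """Mirror-pad a large image, score overlapping tiles with model_fn
    ((tile, tile, C) -> (tile, tile, N_CLASSES)), average the overlapping
    scores, and crop back to the original size."""
    H, W, _ = image.shape
    pad = overlap
    padded = np.pad(image, ((pad, pad + tile), (pad, pad + tile), (0, 0)), mode="reflect")
    scores = np.zeros((H + 2 * pad + tile, W + 2 * pad + tile, N_CLASSES))
    counts = np.zeros_like(scores[..., :1])
    step = tile - 2 * overlap
    for y in range(0, H + pad, step):
        for x in range(0, W + pad, step):
            scores[y:y + tile, x:x + tile] += model_fn(padded[y:y + tile, x:x + tile])
            counts[y:y + tile, x:x + tile] += 1
    merged = scores / np.maximum(counts, 1)
    return merged[pad:pad + H, pad:pad + W].argmax(-1)

dummy_model = lambda p: np.ones((p.shape[0], p.shape[1], N_CLASSES))
seg = tiled_inference(np.zeros((500, 700, 4)), dummy_model)   # RGB+NIR toy input
print(seg.shape)                            # (500, 700) label map
```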

11.
Weak illumination at capture time seriously degrades image quality. Correcting the various degradations of low-light images can effectively improve the visual quality of images and the performance of high-level vision tasks. In this study, a novel Retinex-based Real-low to Real-normal Network (R2RNet) is proposed for low-light image enhancement, comprising three subnets: a Decom-Net, a Denoise-Net, and a Relight-Net, used for decomposition, denoising, and contrast enhancement with detail preservation, respectively. R2RNet not only uses the spatial information of the image to improve contrast but also uses the frequency information to preserve details, so the model achieves more robust results across all degraded images. Unlike most previous methods trained on synthetic images, we collected the first Large-Scale Real-World paired low/normal-light image dataset (LSRW dataset) to satisfy the training requirements and give the model better generalization in real-world scenes. Extensive experiments on publicly available datasets demonstrate that our method outperforms existing state-of-the-art methods both quantitatively and visually. In addition, our results show that the performance of a high-level vision task (i.e., face detection) can be effectively improved in low-light conditions by using the enhanced results produced by our method. Our code and the LSRW dataset are available at: https://github.com/JianghaiSCU/R2RNet.

12.
13.
Recent studies have shown that sparse representation (SR) can deal well with many computer vision problems, and its kernel version has powerful classification capability. In this paper, we address the application of a cooperative SR to semi-supervised image annotation, which increases the amount of labeled images available for training image classifiers. Given a set of labeled (training) images and a set of unlabeled (test) images, the usual SR method, which we call forward SR, represents each unlabeled image with several labeled ones and then annotates the unlabeled image according to the annotations of those labeled images. However, to the best of our knowledge, the SR method in the opposite direction, which we call backward SR, has not been addressed so far: it represents each labeled image with several unlabeled images and then annotates an unlabeled image according to the annotations of the labeled images that selected it in their backward representations. In this paper, we explore how much the backward SR can contribute to image annotation and how complementary it is to the forward SR. Co-training, a semi-supervised method that has been proved to improve two classifiers only if they are relatively independent, is adopted to test this complementarity between the two SRs in opposite directions. Finally, the co-training of two SRs in kernel space yields a cooperative kernel sparse representation (Co-KSR) method for image annotation. Experimental results and analyses show that the two KSRs in opposite directions are indeed complementary, and that Co-KSR improves considerably over either of them, with annotation performance better than other state-of-the-art semi-supervised classifiers such as the transductive support vector machine, local and global consistency, and Gaussian fields and harmonic functions. Comparative experiments with a nonsparse solution also show that sparsity plays an important role in the cooperation of image representations in the two opposite directions. This paper extends the application of SR in image annotation and retrieval.
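A minimal sketch of the forward SR step under stated assumptions: scikit-learn's orthogonal matching pursuit supplies the sparse coefficients, and keywords are scored by the coefficient weights of the selected labeled images. The kernel version, the backward direction and the co-training loop are omitted, and all data is synthetic.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def forward_sr_annotate(D_train, labels_train, x_test, n_nonzero=5):
    """Represent one unlabeled image as a sparse combination of labeled
    images, then score keywords by the absolute coefficients.

    D_train: (d, n_train), columns are labeled-image features;
    labels_train: (n_train, n_keywords) binary annotation matrix."""
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero)
    omp.fit(D_train, x_test)
    w = np.abs(omp.coef_)                   # one weight per labeled image
    scores = w @ labels_train               # propagate annotations
    return scores / (scores.sum() + 1e-12)

rng = np.random.default_rng(0)
D = rng.normal(size=(64, 30))               # 30 labeled images, 64-dim features
L = (rng.random((30, 8)) > 0.7).astype(float)   # 8 candidate keywords
x = D[:, 3] + 0.05 * rng.normal(size=64)    # test image close to labeled image #3
print(forward_sr_annotate(D, L, x).round(2))
```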

14.
A multiple kernel learning image classification method based on sparse coding
亓晓振  王庆 《电子学报》2012,40(4):773-779
This paper proposes an image classification method that combines sparse coding with multiple kernel learning. Traditional sparse coding discards spatial information when classifying images; we incorporate spatial constraints by partitioning each image with a spatial pyramid. When classifying images with a nonlinear SVM, each level of the spatial pyramid forms a kernel matrix; multiple kernel learning is used to solve for the weight of each kernel matrix, and a linear combination of the kernel matrices yields the kernel with the strongest discriminative power over the whole classification set. Experimental results demonstrate the effectiveness and robustness of the proposed method: it achieves 83.10% classification accuracy on the Scene Categories dataset, the highest accuracy reported on this dataset to date.
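A sketch of the kernel-combination step, assuming precomputed Gram matrices, one per pyramid level. A real MKL solver would learn the weights jointly with the SVM; here they are fixed by hand purely for illustration.

```python
import numpy as np
from sklearn.svm import SVC

def combine_kernels(kernels, weights):
    """Linear combination of per-pyramid-level Gram matrices."""
    weights = np.asarray(weights, float)
    weights = weights / weights.sum()
    return sum(w * K for w, K in zip(weights, kernels))

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 20)
# Toy Gram matrices standing in for spatial-pyramid levels 0..2.
Ks = []
for _ in range(3):
    F = rng.normal(size=(20, 10))
    Ks.append(F @ F.T)                      # positive semidefinite by construction
K = combine_kernels(Ks, weights=[0.5, 0.3, 0.2])
clf = SVC(kernel="precomputed").fit(K, y)
print(clf.predict(K)[:5])
```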

15.
Brain CT image classification is critical for assisting brain disease diagnosis. Brain CT images contain much noisy information, and the lesions vary in shape and location, making the classification task difficult for conventional CNN models. In this paper, we propose a novel Multi-scale Superpixel based Hierarchical Attention (MSHA) model for brain CT classification that introduces multi-scale superpixels into a hierarchical fusion structure to remove noise and help the model focus on the lesion areas. MSHA contains three modules: (1) a Semantic-level Information Extractor that extracts appearance and geometry information based on the superpixels of the image, (2) a Mixed Multi-head Attention module that obtains mixed attention features from the semantic-level information, and (3) a Hierarchical Fusion Structure that fuses the multi-scale attention features from coarse to fine. Experiments on a brain CT dataset demonstrate the effectiveness of the proposed model.

16.
Face age estimation, a challenging computer vision task with potential applications in identity authentication, human–computer interfaces, video retrieval and robot vision, has been attracting increasing attention. In recent years, deep convolutional neural networks (DCNNs) have achieved state-of-the-art performance in age classification from face images. We propose a deep hybrid framework for age classification that exploits a DCNN as the raw feature extractor together with several effective methods: fine-tuning the DCNN into a fine-tuned deep age feature extraction (FDAFE) model, introducing a new feature extraction method, applying a maximum joint probability classifier to age classification, and a strategy to incorporate information from face images more effectively to further improve estimation. In addition, we pre-process the original images to represent age information more accurately. Based on this discriminative and compact framework, state-of-the-art classification accuracy is achieved on several face image datasets.

17.
With the spread and growth of the Internet, the number of e-commerce websites has risen sharply, and a platform is urgently needed to label goods sold online so that users can search for them. The method extracts the full pyramid histogram of oriented gradients (PHOG) features of the category images and the test images and then computes the distance between them; a test image is assigned to the same class as the category image closest to it. A retrieval and classification model implementing this scheme was developed in Matlab. Experiments show that the model is flexible and accurate.
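A toy PHOG-style descriptor plus a chi-square distance for the nearest-category matching described above; the original PHOG's edge detection and shape masking are simplified here to magnitude-weighted gradient-orientation histograms, and the code is Python rather than the Matlab used in the paper.

```python
import numpy as np

def phog(gray, levels=2, bins=8):
    """Gradient-orientation histograms pooled over a spatial pyramid
    (whole image, 2x2, 4x4) and concatenated."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    ori = np.mod(np.arctan2(gy, gx), np.pi)         # orientations in [0, pi)
    feats = []
    for lv in range(levels + 1):
        n = 2 ** lv
        for ys in np.array_split(np.arange(gray.shape[0]), n):
            for xs in np.array_split(np.arange(gray.shape[1]), n):
                h, _ = np.histogram(ori[np.ix_(ys, xs)], bins=bins,
                                    range=(0, np.pi), weights=mag[np.ix_(ys, xs)])
                feats.append(h)
    f = np.concatenate(feats).astype(float)
    return f / (f.sum() + 1e-12)

def chi2(a, b):
    return 0.5 * np.sum((a - b) ** 2 / (a + b + 1e-12))

rng = np.random.default_rng(0)
img_a, img_b = rng.random((64, 64)), rng.random((64, 64))
print(chi2(phog(img_a), phog(img_b)))       # smaller distance -> same class
```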

18.
To address the problem that sparse representation classifiers do not adapt well to multi-feature frameworks, this paper proposes a spatially constrained multi-feature joint sparse coding model and applies it to the automatic annotation of remote sensing images. The method regularizes the multi-feature coding coefficients with the l1,2 mixed norm, constraining the coefficients to share the same sparsity pattern, which preserves the correlations among features without imposing overly strict constraints. Dictionary learning is also extended to the multi-feature framework: by constraining the transformation matrix of the dictionary update, the loss of inter-feature correlation during dictionary learning is avoided. In addition, since the spatial relations in remote sensing images are often ignored or underexploited, a classification criterion combining spatial consistency with multi-feature joint sparse coding is proposed, improving annotation performance. Experiments on public remote sensing datasets and large satellite images demonstrate the effectiveness of the method.

19.
Content-based image retrieval is a common problem for large image databases. Many methods have been proposed for image retrieval on particular types of datasets; this work introduces a new image retrieval technique that is useful for different kinds of datasets. In the proposed method, the center-symmetric local binary pattern is extracted from the original image to obtain local information. The co-occurrence of pixel pairs in the local pattern map is then observed at different directions and distances using the gray-level co-occurrence matrix. Earlier methods used a histogram to extract the frequency information of the local pattern map, but the co-occurrence of pixel pairs is more robust than the frequency of patterns. The proposed method is tested on three different categories of images, i.e., texture, face and medical image databases, and compared with typical state-of-the-art local patterns.
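A sketch of the descriptor pipeline under stated assumptions: a 16-level center-symmetric LBP map, then scikit-image's graycomatrix for pixel-pair co-occurrence at several distances and directions, in place of a plain histogram. The threshold t and the image data are illustrative.

```python
import numpy as np
from skimage.feature import graycomatrix

def cslbp(gray, t=0.01):
    """Center-symmetric LBP: compare the 4 center-symmetric pixel pairs of
    each 3x3 neighbourhood, giving a pattern map with values 0..15."""
    g = gray.astype(float)
    c = g[1:-1, 1:-1]
    pairs = [
        (g[:-2, :-2], g[2:, 2:]),           # NW vs SE
        (g[:-2, 1:-1], g[2:, 1:-1]),        # N  vs S
        (g[:-2, 2:], g[2:, :-2]),           # NE vs SW
        (g[1:-1, 2:], g[1:-1, :-2]),        # E  vs W
    ]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (a, b) in enumerate(pairs):
        code |= ((a - b) > t).astype(np.uint8) << bit
    return code

rng = np.random.default_rng(0)
pattern = cslbp(rng.random((64, 64)))
# Co-occurrence of pattern pairs at several distances/directions,
# instead of a plain histogram of the pattern map.
glcm = graycomatrix(pattern, distances=[1, 2], angles=[0, np.pi / 2], levels=16)
print(glcm.shape)                           # (16, 16, 2, 2)
```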

20.
Automatic image annotation plays a key role in retrieving large collections of digital images: it converts the visual features of an image into annotation keywords, greatly facilitating use and retrieval. This work studies automatic semantic image annotation and designs and implements a Matlab-based automatic annotation system that extracts colour and texture features, measures similarity against already-annotated images, and outputs semantic keywords for the image.
