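Automatic image annotation is an important yet challenging problem in pattern recognition, computer vision and related fields. Existing models make poor use of the available data and are easily affected by the imbalance between positive and negative samples. To address this, a novel cascaded image annotation model that combines a discriminative model with a generative model is proposed. In the first layer, the discriminative model assigns topic labels to an unlabeled image and retrieves the corresponding set of relevant images. In the second layer, a proposed keyword-oriented method links images to keywords, and a proposed iterative algorithm expands both the semantic keywords and the relevant images. Finally, the generative model uses the expanded relevant image set to produce detailed annotations for the unlabeled image. By combining the strengths of discriminative and generative models, the approach achieves better annotation results while using fewer relevant training images. Experiments on the Corel 5K image dataset verify its effectiveness.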

Given an unstructured collection of captioned images of cluttered scenes featuring a variety of objects, our goal is to simultaneously learn the names and appearances of the objects. Only a small fraction of local features within any given image are associated with a particular caption word, and captions may contain irrelevant words not associated with any image object. We propose a novel algorithm that uses the repetition of feature neighborhoods across training images and a measure of correspondence with caption words to learn meaningful feature configurations (representing named objects). We also introduce a graph-based appearance model that captures some of the structure of an object by encoding the spatial relationships among the local visual features. In an iterative procedure, we use language (the words) to drive a perceptual grouping process that assembles an appearance model for a named object. Results of applying our method to three data sets in a variety of conditions demonstrate that, from complex, cluttered, real-world scenes with noisy captions, we can learn both the names and appearances of objects, resulting in a set of models invariant to translation, scale, orientation, occlusion, and minor changes in viewpoint or articulation. These named models, in turn, are used to automatically annotate new, uncaptioned images, thereby facilitating keyword-based image retrieval.

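Automatic image annotation based on an ensemble classification algorithm
Jiang Lixing, Hou Jin. Acta Automatica Sinica, 2012, 38(8): 1257-1262
In semantic image retrieval, automatically annotating images according to their semantics is a challenging task. This paper recasts automatic image annotation as image classification: each image region is classified by supervised learning and assigned the corresponding keyword, which yields the annotation. A fast random forest (FRF) ensemble classifier is adopted, which can efficiently classify and annotate large amounts of training data. In experiments on the Corel dataset, FRF runs faster than classical algorithms while keeping classification accuracy stable, which makes it well suited to image annotation.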

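Automatic image labeling means automatically recognizing the meaningful objects in an image and assigning them the corresponding semantic keywords. Although this is easy for humans, it is a difficult and challenging task for computers. Since humans usually recognize objects in a coarse-to-fine manner, this paper proposes a hierarchical labeling scheme. First, the input image is automatically segmented into regions, and each region is coarsely classified by a support vector machine. Because the coarse labels directly affect the subsequent fine classification, statistical contextual semantic relations are built to correct wrong coarse labels. Next, to finely classify each coarsely labeled region, a semi-supervised expectation-maximization algorithm is proposed; it not only finds representative patterns for the fine classes under each coarse category but also reclassifies the coarsely labeled regions to give them fine labels. Finally, the contextual semantic relations are applied again to correct inappropriate fine labels. To demonstrate the effectiveness of this scheme, a prototype image labeling system was developed, and the experimental results show that the hierarchical labeling scheme is effective.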

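The ultrametric contour map (UCM) hierarchical image segmentation algorithm adapts poorly to contours, matches hierarchy levels weakly and produces many fragmented segments. To address these problems, an improved UCM hierarchical segmentation algorithm with adaptive object-content matching is proposed. The algorithm first extracts the key image contours with contour boxes, then merges regions with a weighted watershed algorithm to improve contour adaptivity and builds the UCM hierarchy tree; next, it adaptively matches objects to content by dynamic programming; finally, it segments the image with the scale-adjusted UCM hierarchy tree. Segmentation experiments on the BSDS500 dataset show significant improvements on every segmentation metric: segmentation covering (SC), the probabilistic Rand index (PRI) and variation of information (VI) all achieve the best results at both the optimal dataset scale (ODS) and the optimal image scale (OIS). Adjusting the scale of the UCM hierarchy tree keeps segmentations of the same scale on the same level, which reduces fragments and ensures consistent hierarchical matching. The algorithm surpasses most current mainstream segmentation algorithms in accuracy while keeping its time complexity on the same order.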

Image compression based on color quantization and image segmentation are two typical tasks in image processing. Several techniques based on splitting algorithms or cluster analysis have been proposed in the literature. Self-organizing maps have also been applied to these problems, although with some limitations due to their fixed network architecture and their inability to represent hierarchical relations among data. In this paper, both problems are addressed using growing hierarchical self-organizing models. An advantage of these models is their hierarchical architecture, which adapts more flexibly to the input data and reflects the inherent hierarchical relations among them. Comparative results are provided for image compression and image segmentation. Experimental results show that the proposed approach is promising for image processing and demonstrate the power of the hierarchical information provided by the proposed model.

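Image classification using a kernel function based on visual-word context
Encoding local features as visual words is currently very popular in image analysis. Building on ordinary visual words, a new kernel function that fuses multi-level word context is proposed. Its design captures: 1) multi-level word histograms; 2) multi-level "phrase" histograms; 3) the categories of the contexts of words (and phrases). The kernel is then used in a support vector machine to classify images. On public benchmarks such as the Corel image database, the method achieves excellent performance. In addition, it was compared with a classical method on a highly practical and complex problem, distinguishing adult images from swimsuit images, where its recognition accuracy was about 7% higher. The experimental results show that combining a kernel similarity measure with a multi-level description of visual words can significantly improve image recognition.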
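
A minimal sketch of how per-level similarities could be fused into one kernel, assuming histogram intersection as the base kernel and hand-picked level weights. The abstract does not give the kernel's exact form, so the function names and weights here are hypothetical:

```python
from typing import Sequence

Histogram = Sequence[float]


def histogram_intersection(h1: Histogram, h2: Histogram) -> float:
    """Sum of bin-wise minima, a common kernel for non-negative histograms."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))


def multilevel_kernel(levels1: Sequence[Histogram],
                      levels2: Sequence[Histogram],
                      weights: Sequence[float]) -> float:
    """Weighted sum of per-level intersection kernels.

    levels1[i] and levels2[i] are the (normalized) word or phrase histograms
    of two images at level i; weights[i] is a hypothetical level weight.
    """
    if not (len(levels1) == len(levels2) == len(weights)):
        raise ValueError("level counts must match")
    return sum(w * histogram_intersection(a, b)
               for w, a, b in zip(weights, levels1, levels2))


# Word level overlaps by 0.75, phrase level by 0.5 (weighted 0.5): total 1.0.
print(multilevel_kernel([[0.5, 0.5, 0.0], [1.0, 0.0]],
                        [[0.25, 0.75, 0.0], [0.5, 0.5]],
                        [1.0, 0.5]))
```

With non-negative weights, a weighted sum of intersection kernels stays a valid positive semi-definite kernel, so a Gram matrix built this way can be handed to an SVM that accepts precomputed kernels.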

One of the most promising new technologies for widespread application is image annotation and retrieval. Nevertheless, this task is very difficult to accomplish as target images differ significantly in appearance and belong to a wide variety of categories. In this paper, we propose a new image annotation and retrieval method for miscellaneous weakly labeled images, by combining higher-order local auto-correlation (HLAC) features and a framework of probabilistic canonical correlation analysis. The distance between images can be defined in the intrinsic space for annotation using conceptual learning of images and their labels. Because this intrinsic space is highly compressed compared to the image feature space, our method achieves both faster and more accurate image annotation and retrieval. The HLAC features are powerful global features with additive and position-invariant properties. These properties work well with images that contain an arbitrary number of objects at arbitrary locations. The proposed method is shown to outperform existing methods on a standard benchmark dataset.
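
The additive and position-invariant properties follow from how HLAC features are computed: each feature sums, over all reference pixels, the product of the image values at a fixed set of displacements. The standard binary HLAC set has 25 masks up to second order; the sketch below keeps only orders 0 and 1 for brevity, and its function names are illustrative:

```python
from itertools import product
from typing import List, Sequence, Tuple

Offset = Tuple[int, int]

# Masks up to first order inside a 3x3 window, with translation duplicates
# removed: order 0 (no displacement) plus 4 first-order displacements.
MASKS: List[Tuple[Offset, ...]] = [(), ((0, 1),), ((1, -1),), ((1, 0),), ((1, 1),)]


def hlac_feature(image: Sequence[Sequence[int]], offsets: Tuple[Offset, ...]) -> int:
    """Sum over reference pixels r of f(r) * prod_i f(r + a_i).

    Reference pixels whose displaced neighbours fall outside the image
    contribute zero.
    """
    rows, cols = len(image), len(image[0])
    total = 0
    for r, c in product(range(rows), range(cols)):
        value = image[r][c]
        for dr, dc in offsets:
            rr, cc = r + dr, c + dc
            if not (0 <= rr < rows and 0 <= cc < cols):
                value = 0
                break
            value *= image[rr][cc]
        total += value
    return total


def hlac_vector(image: Sequence[Sequence[int]]) -> List[int]:
    """Feature vector with one entry per mask in MASKS."""
    return [hlac_feature(image, mask) for mask in MASKS]
```

Moving a 2x2 blob within the frame leaves `hlac_vector` unchanged, and an image containing two such blobs that do not touch yields exactly twice the single-blob vector.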

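Existing deep-learning image inpainting methods have limited ability to perceive and represent features at different scales. To address this, an inpainting model based on multi-scale channel attention and a hierarchical residual network is proposed. First, U-Net serves as the backbone of the generator to encode and decode the damaged image. Then, multi-scale hierarchical residual structures are built in both the encoder and the decoder to strengthen the network's ability to extract and represent features of the damaged image. Finally, a dilated multi-scale channel attention module is embedded in the skip connections between the encoder and the decoder so that the model makes better use of the low-level image features from the encoder. Experimental results show that, when inpainting damaged images from face, street-view and other datasets, the model outperforms other classic inpainting methods in both subjective visual quality and objective evaluation metrics.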

This paper addresses parameter drift in stochastic models. We define a notion of context that represents invariant, stable-over-time behavior and we then propose an algorithm for detecting context changes in processing a stream of data. A context change is seen as model failure, when a probabilistic model representing current behavior is no longer able to “fit” newly encountered data. We specify our stochastic models using a first-order logic-based probabilistic modeling language called Generalized Loopy Logic (GLL). An important component of GLL is its learning mechanism that can identify context drift. We demonstrate how our algorithm can be incorporated into a failure-driven context-switching probabilistic modeling framework and offer several examples of its application.
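
The abstract specifies its models in GLL. As a language-neutral illustration of failure-driven context switching only, the sketch below swaps in a one-dimensional Gaussian as the "current model" and declares a context change when the mean log-likelihood of a new window falls below a threshold. The window size, threshold and refit-on-failure rule are assumptions, not the paper's mechanism:

```python
import math
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple


def gaussian_log_likelihood(x: float, mu: float, sigma: float) -> float:
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def fit(window: Sequence[float]) -> Tuple[float, float]:
    # Floor sigma so a constant window does not yield a degenerate model.
    return fmean(window), max(pstdev(window), 1e-6)


def detect_context_changes(stream: Sequence[float], window: int,
                           threshold: float) -> List[int]:
    """Return the start index of every window on which the current model fails.

    The model is fitted to the first window. When the mean log-likelihood of a
    later window drops below `threshold`, the model no longer "fits" the data:
    a context change is recorded and the model is refitted on that window.
    A trailing partial window is ignored.
    """
    if len(stream) < window:
        return []
    mu, sigma = fit(stream[:window])
    changes = []
    for start in range(window, len(stream) - window + 1, window):
        chunk = stream[start:start + window]
        score = fmean(gaussian_log_likelihood(x, mu, sigma) for x in chunk)
        if score < threshold:
            changes.append(start)
            mu, sigma = fit(chunk)
    return changes
```

On a stream that alternates 0 and 1 for 20 samples and then alternates 10 and 11, `detect_context_changes(stream, 4, -5.0)` flags only the window starting at index 20; after the refit, the following window fits again.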

Cybernetics and Systems Analysis - The method of parametric estimation for hierarchical stochastic models under incomplete observations is considered. The method is based on the features of the...

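Virtual reconstruction of a crime scene not only recreates a realistic crime-scene environment but also makes it possible to manage virtual physical evidence. It solves the difficulty of reproducing a crime scene, lets investigators observe the scene from any angle and analyze the offender's behavior more intuitively, and provides a platform for case analysis.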

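Feature extraction and retrieval for ID photos
This work studies effective ways of describing facial features and discusses feature extraction and retrieval for ID-card photos. An adaptive skin-color detection technique improves on the general-purpose skin detection algorithm and is used to locate the face region. A DCT-coefficient projection method is proposed to segment the regions of the facial organs (eyes, nose, mouth and so on), and geometric facial features are extracted in each region. Curve parameters describing the contour of the cheeks and jaw are introduced as face-shape features, which gives a more accurate description of the face. Combining geometric feature-vector matching, face-shape curve-parameter matching and face-image correlation matching achieves accurate retrieval of portrait photos. Experiments show that the method performs well.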
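
The abstract does not describe its adaptive skin-color detector. For reference, a common non-adaptive baseline of the kind such methods improve on labels a pixel as skin when its chrominance falls inside a fixed box in YCbCr space. The sketch below assumes the full-range BT.601 (JPEG) conversion and the widely used ranges Cb in [77, 127] and Cr in [133, 173]; both are assumptions here, not the paper's parameters:

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def rgb_to_cbcr(pixel: RGB) -> Tuple[float, float]:
    """Full-range BT.601 (JPEG) chroma components of an 8-bit RGB pixel."""
    r, g, b = pixel
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def is_skin(pixel: RGB,
            cb_range: Tuple[float, float] = (77.0, 127.0),
            cr_range: Tuple[float, float] = (133.0, 173.0)) -> bool:
    """Fixed chroma-box rule; an adaptive detector would re-estimate the ranges."""
    cb, cr = rgb_to_cbcr(pixel)
    return cb_range[0] <= cb <= cb_range[1] and cr_range[0] <= cr <= cr_range[1]


def skin_mask(image: Sequence[Sequence[RGB]]) -> List[List[bool]]:
    """Per-pixel skin mask from which a face region could then be delimited."""
    return [[is_skin(p) for p in row] for row in image]
```

Ignoring luminance makes the rule somewhat tolerant to lighting changes, which is also why it fails on skin-colored backgrounds; an adaptive variant would tune the box per image.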

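In image dehazing, an inaccurate estimate of the atmospheric transmission lowers the scene brightness of the dehazed image and causes halo artifacts in sky regions. To address this, a dehazing algorithm based on block-wise transmission optimization and adaptive scene-brightness optimization is proposed. The transmission is optimized block by block according to a criterion for the degree of haze in the image; the haze-free image is obtained by solving the atmospheric scattering model with the estimated atmospheric light intensity; and image gray levels are adjusted locally and adaptively to raise scene brightness. Experimental results show that, compared with guided-filter and contrast-enhancement algorithms, this algorithm produces clearer dehazed images with noticeably better edge preservation and better visual quality, making it suitable for applications such as traffic monitoring, security surveillance and target recognition.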
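
The recovery step relies on the atmospheric scattering model I = J·t + A·(1 − t), where I is the hazy image, J the scene radiance, t the transmission and A the atmospheric light. Solving for J gives J = (I − A)/max(t, t0) + A. The sketch below shows that inversion on grayscale images; the block-wise transmission uses a dark-channel-style block minimum as a stand-in, because the paper's haze-degree criterion is not given, and ω = 0.95 and t0 = 0.1 are assumed values:

```python
from typing import List

Image = List[List[float]]  # grayscale intensities in [0, 1]


def block_transmission(img: Image, atmos: float, block: int,
                       omega: float = 0.95) -> Image:
    """One transmission value per block, taken from the block minimum.

    This dark-channel-style estimate is a stand-in; the paper's own
    haze-degree criterion for block-wise optimization is not reproduced.
    """
    h, w = len(img), len(img[0])
    t = [[1.0] * w for _ in range(h)]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            ys = range(by, min(by + block, h))
            xs = range(bx, min(bx + block, w))
            dark = min(img[y][x] for y in ys for x in xs)
            t_block = 1.0 - omega * dark / atmos
            for y in ys:
                for x in xs:
                    t[y][x] = t_block
    return t


def dehaze(img: Image, t: Image, atmos: float, t0: float = 0.1) -> Image:
    """Invert I = J*t + A*(1 - t) for the scene radiance J.

    t is clamped from below by t0 to avoid amplifying noise where haze is
    dense, and the result is clipped back to [0, 1].
    """
    return [[min(1.0, max(0.0, (i - atmos) / max(ti, t0) + atmos))
             for i, ti in zip(row, t_row)]
            for row, t_row in zip(img, t)]
```

As a synthetic check, a pixel with true radiance 0.2 seen through t = 0.5 with A = 1.0 is observed as 0.6, and `dehaze([[0.6]], [[0.5]], 1.0)` recovers approximately 0.2. The paper's local brightness adjustment would run after this step.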

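Multilevel color image segmentation based on watershed and overlap-rate measurement
Watershed segmentation is usually performed on gradient images and often over-segments. To overcome over-segmentation and improve segmentation accuracy, a new color image segmentation algorithm called HWO is proposed, based on the watershed and a hierarchical merging strategy driven by an overlap-rate measure. The algorithm first converts the image from RGB to Lab color space and builds a statistical 2-D histogram over the a and b dimensions; the watershed method is then applied to this histogram, and an initial segmentation is obtained by filling its peaks. Next, the region samples corresponding to each filled peak are modeled with Gaussian distributions, and the image parameters are estimated under a Gaussian mixture model assumption. Finally, the overlap rate between pairs of models is measured and regions are merged hierarchically to obtain the final segmentation. In the experiments, a training image set is first used to determine the algorithm's two parameters; the segmentation quality and running time are then evaluated on a test image set against a standard database of human segmentations. The results show that the algorithm resolves over-segmentation: its combined precision-recall score (F-measure) is 0.609, compared with 0.79 for human segmentation, and its segmentation time is only one third that of the traditional method, a substantial speed-up.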
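
The abstract does not define its overlap rate. As an illustrative stand-in, the sketch below measures overlap between mixture components with the Bhattacharyya coefficient and greedily merges the most-overlapping pair, using moment matching, until no pair exceeds a threshold. It uses 1-D Gaussians for brevity, whereas the method models the 2-D (a, b) plane, and the threshold is hypothetical:

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Component:
    weight: float  # pixel count (or fraction) of the region
    mean: float
    var: float


def bhattacharyya_coefficient(p: Component, q: Component) -> float:
    """Overlap in (0, 1] between two 1-D Gaussians; 1.0 means identical."""
    s = p.var + q.var
    dist = (0.25 * (p.mean - q.mean) ** 2 / s
            + 0.5 * math.log(s / (2.0 * math.sqrt(p.var * q.var))))
    return math.exp(-dist)


def merge(p: Component, q: Component) -> Component:
    """Moment-matched Gaussian describing the union of two regions."""
    w = p.weight + q.weight
    mean = (p.weight * p.mean + q.weight * q.mean) / w
    second = (p.weight * (p.var + p.mean ** 2) + q.weight * (q.var + q.mean ** 2)) / w
    return Component(w, mean, second - mean ** 2)


def hierarchical_merge(components: Sequence[Component],
                       threshold: float) -> List[Component]:
    """Repeatedly merge the most-overlapping pair while its overlap >= threshold."""
    comps = list(components)
    while len(comps) > 1:
        best, pair = -1.0, (0, 1)
        for i in range(len(comps)):
            for j in range(i + 1, len(comps)):
                bc = bhattacharyya_coefficient(comps[i], comps[j])
                if bc > best:
                    best, pair = bc, (i, j)
        if best < threshold:
            break  # no remaining pair overlaps enough
        i, j = pair
        merged = merge(comps[i], comps[j])
        comps = [c for k, c in enumerate(comps) if k not in (i, j)] + [merged]
    return comps
```

With components N(0, 1), N(0.1, 1) and N(10, 1) and a threshold of 0.9, the first two merge into a component of mean 0.05 and variance 1.0025, while the distant third stays separate.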

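By detecting a standard template in photographs taken at a traffic-accident scene, a correspondence between the 2-D photograph and the 3-D scene is established. Using the known dimensions of the template, the distance between any two points at the scene can be computed directly from the photograph, which provides objective data for assessing traffic accidents and drawing accident-scene plans.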
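
One standard way to obtain such a photo-to-scene mapping is a planar homography estimated from the template's four corners with the direct linear transform (DLT). The sketch below assumes the template is a square of known side length lying on the road surface and that both measured points lie on that same plane, since a single homography cannot recover distances off that plane. All function names are illustrative, not the paper's:

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography(src: Sequence[Point], dst: Sequence[Point]) -> Matrix:
    """3x3 H mapping 4 src points to 4 dst points (DLT with h33 fixed to 1)."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_homography(h: Matrix, p: Point) -> Point:
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def ground_distance(template_px: Sequence[Point], side: float,
                    p: Point, q: Point) -> float:
    """Distance on the road plane between image points p and q.

    template_px lists the template corners in the photo, in the same order
    as the world corners below (a hypothetical convention).
    """
    world = [(0.0, 0.0), (side, 0.0), (side, side), (0.0, side)]
    h = homography(template_px, world)
    (x1, y1), (x2, y2) = apply_homography(h, p), apply_homography(h, q)
    return math.hypot(x2 - x1, y2 - y1)
```

For a 1 m template imaged at 100 px per metre with corners at (100, 100), (200, 100), (200, 200) and (100, 200), the pixels (100, 100) and (400, 500) map to (0, 0) and (3, 4) m, so `ground_distance` returns 5.0 up to rounding. Under real perspective the same code applies unchanged, since the homography absorbs the foreshortening.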

We propose a novel framework of using a nonparametric Bayesian model, called Dual Hierarchical Dirichlet Processes (Dual-HDP) (Wang et al. in IEEE Trans. Pattern Anal. Mach. Intell. 31:539–555, 2009), for unsupervised trajectory analysis and semantic region modeling in surveillance settings. In our approach, trajectories are treated as documents and observations of an object on a trajectory are treated as words in a document. Trajectories are clustered into different activities. Abnormal trajectories are detected as samples with low likelihoods. The semantic regions, which are subsets of paths commonly taken by objects and are related to activities in the scene, are also modeled. Under Dual-HDP, both the number of activity categories and the number of semantic regions are automatically learnt from data. In this paper, we further extend Dual-HDP to a Dynamic Dual-HDP model which allows dynamic update of activity models and online detection of normal/abnormal activities. Experiments are conducted on a simulated data set and two real data sets, which include 8,478 radar tracks collected from a maritime port and 40,453 visual tracks collected from a parking lot.
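
To make the "trajectories as documents, observations as words" idea concrete, the sketch below quantizes positions into grid cells (the words) and scores a trajectory by its average per-word log-likelihood, flagging it as abnormal when the score is low. It deliberately replaces Dual-HDP with a single add-one-smoothed unigram model, so it shows only the document view and the low-likelihood test, not activity clustering or semantic regions. The cell size, vocabulary size and threshold are assumptions:

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Observation = Tuple[float, float]  # (x, y) position of the tracked object
Word = Tuple[int, int]             # grid cell index used as a "word"


def to_words(trajectory: Sequence[Observation], cell: float) -> List[Word]:
    """Quantize each observation into a grid cell, turning a trajectory into a document."""
    return [(int(x // cell), int(y // cell)) for x, y in trajectory]


class UnigramTrajectoryModel:
    """One smoothed word distribution shared by all training trajectories
    (a simplification; Dual-HDP learns activity topics and clusters)."""

    def __init__(self, training: Sequence[Sequence[Observation]],
                 cell: float, vocab_size: int):
        self.cell = cell
        self.vocab_size = vocab_size
        self.counts = Counter(w for t in training for w in to_words(t, cell))
        self.total = sum(self.counts.values())

    def avg_log_likelihood(self, trajectory: Sequence[Observation]) -> float:
        words = to_words(trajectory, self.cell)
        # Add-one (Laplace) smoothing gives unseen cells non-zero probability.
        denom = self.total + self.vocab_size
        return sum(math.log((self.counts[w] + 1) / denom) for w in words) / len(words)

    def is_abnormal(self, trajectory: Sequence[Observation], threshold: float) -> bool:
        return self.avg_log_likelihood(trajectory) < threshold
```

Averaging per word keeps long trajectories from being penalized merely for their length; a trajectory that crosses only cells never seen in training gets the smoothed floor probability for every word and falls below the threshold.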

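On the Flickr image-sharing site, many images have no tags or too few tags, and because their tag information is incomplete they cannot be used or retrieved effectively. Flickr users often submit an uploaded image to several related groups according to the theme the image implies. Building on this observation, a novel automatic image annotation algorithm based on mining latent topics within groups and fusing information across groups is proposed to support effective image retrieval. Unlike traditional automatic annotation methods, the algorithm first uses latent Dirichlet allocation (LDA) to mine the latent topics within each individual group and uses these topics to filter the initial "noisy" tags produced by propagating tags from similar images; then, for images that belong to several groups, it generates the final annotation by fusing information from those groups. Experimental results demonstrate the effectiveness of the new algorithm.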

We propose a novel unsupervised learning framework to model activities and interactions in crowded and complicated scenes. Hierarchical Bayesian models are used to connect three elements in visual surveillance: low-level visual features, simple "atomic" activities, and interactions. Atomic activities are modeled as distributions over low-level visual features, and multi-agent interactions are modeled as distributions over atomic activities. These models are learnt in an unsupervised way. Given a long video sequence, moving pixels are clustered into different atomic activities and short video clips are clustered into different interactions. In this paper, we propose three hierarchical Bayesian models: the Latent Dirichlet Allocation (LDA) mixture model, the Hierarchical Dirichlet Process (HDP) mixture model, and the Dual Hierarchical Dirichlet Processes (Dual-HDP) model. They advance existing language models, such as LDA [1] and HDP [2]. Our data sets are challenging video sequences from crowded traffic scenes and train station scenes with many kinds of activities co-occurring. Without tracking or human labeling effort, our framework completes many challenging visual surveillance tasks of broad interest, such as: (1) discovering typical atomic activities and interactions; (2) segmenting long video sequences into different interactions; (3) segmenting motions into different activities; (4) detecting abnormality; and (5) supporting high-level queries on activities and interactions.

