1.

Latent semantic factorization for multimedia representation learning

Zhang Hong Huang Yu Xu Xin Zhu Ziqi Deng Chunhua 《Multimedia Tools and Applications》2018,77(3):3353-3368

Due to the rapid development of multimedia applications, cross-media semantics learning is becoming increasingly important. One of the most challenging issues in cross-media semantics understanding is how to mine the semantic correlation between different modalities. Most traditional multimedia semantics analysis approaches are based on unimodal data and neglect the semantic consistency between different modalities. In this paper, we propose a novel multimedia representation learning framework via latent semantic factorization (LSF). First, the posterior probability under the learned classifiers serves as the latent semantic representation for each modality. Moreover, we explore the semantic representation of a multimedia document, which consists of image and text, by latent semantic factorization. In addition, two projection matrices are learned to project images and text into a common semantic space that is more similar to that of the multimedia document. Experiments conducted on three real-world datasets for cross-media retrieval demonstrate the effectiveness of the proposed approach compared with state-of-the-art methods.

2.

Bridging the Gap: Query by Semantic Example (total citations: 4; self-citations: 0; cited by others: 4)

Nikhil Rasiwasia Moreno P.J. Vasconcelos N. 《Multimedia, IEEE Transactions on》2007,9(5):923-938

A combination of query-by-visual-example (QBVE) and semantic retrieval (SR), denoted as query-by-semantic-example (QBSE), is proposed. Images are labeled with respect to a vocabulary of visual concepts, as is usual in SR. Each image is then represented by a vector, referred to as a semantic multinomial, of posterior concept probabilities. Retrieval is based on the query-by-example paradigm: the user provides a query image, for which 1) a semantic multinomial is computed and 2) matched to those in the database. QBSE is shown to have two main properties of interest, one mostly practical and the other philosophical. From a practical standpoint, because it inherits the generalization ability of SR inside the space of known visual concepts (referred to as the semantic space) but performs much better outside of it, QBSE produces retrieval systems that are more accurate than what was previously possible. Philosophically, because it allows a direct comparison of visual and semantic representations under a common query paradigm, QBSE enables the design of experiments that explicitly test the value of semantic representations for image retrieval. An implementation of QBSE under the minimum probability of error (MPE) retrieval framework, previously applied with success to both QBVE and SR, is proposed, and used to demonstrate the two properties. In particular, an extensive objective comparison of QBSE with QBVE is presented, showing that the former significantly outperforms the latter both inside and outside the semantic space. By carefully controlling the structure of the semantic space, it is also shown that this improvement can only be attributed to the semantic nature of the representation on which QBSE is based.
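The matching step described in this abstract, where a query's semantic multinomial is compared with those stored in the database, might be sketched as follows. This is only an illustration: the abstract does not name the similarity function, so the use of KL divergence here is an assumption, and the function names, file names, and concept scores are hypothetical.

```python
import math


def semantic_multinomial(scores):
    """Normalize non-negative concept scores into a probability vector."""
    total = sum(scores)
    return [s / total for s in scores]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q); eps guards against log(0). Assumed similarity measure."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def rank_database(query, database):
    """Return database keys ordered from closest to farthest from the query."""
    return sorted(database, key=lambda name: kl_divergence(query, database[name]))


# Hypothetical posterior concept probabilities over three visual concepts.
database = {
    "beach.jpg": semantic_multinomial([8, 1, 1]),
    "forest.jpg": semantic_multinomial([1, 8, 1]),
    "city.jpg": semantic_multinomial([1, 1, 8]),
}
query = semantic_multinomial([7, 2, 1])
ranking = rank_database(query, database)
print(ranking)
```

Because every image lives in the same concept-probability space, the comparison never touches raw pixels; only the posterior vectors are matched.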

3.

Unsupervised approximate-semantic vocabulary learning for human action and video classification

Qiong Zhao Horace H.S. Ip 《Pattern recognition letters》2013

4.

Visual and semantic context modeling for scene-centric image annotation

Zand Mohsen Doraisamy Shyamala Abdul Halin Alfian Mustaffa Mas Rina 《Multimedia Tools and Applications》2017,76(6):8547-8571

5.

Statistical Model of Shape Moments with Active Contour Evolution for Shape Detection and Segmentation

Yan Zhang Bogdan J. Matuszewski Aymeric Histace Frédéric Precioso 《Journal of Mathematical Imaging and Vision》2013,47(1-2):35-47

This paper describes a novel method for shape representation and robust image segmentation. The proposed method combines two well-known methodologies, namely, statistical shape models and active contours implemented in a level set framework. The shape detection is achieved by maximizing a posterior function that consists of a prior shape probability model and an image likelihood function conditioned on shapes. The statistical shape model is built as a result of a learning process based on nonparametric probability estimation in a PCA-reduced feature space formed by the Legendre moments of training silhouette images. A greedy strategy is applied to optimize the proposed cost function by iteratively evolving an implicit active contour in the image space and subsequent constrained optimization of the evolved shape in the reduced shape feature space. Experimental results presented in the paper demonstrate that the proposed method, contrary to many other active contour segmentation methods, is highly resilient to severe random and structural noise that could be present in the data.

6.

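Image semantic segmentation method based on a context and shallow-space encoder-decoder network

罗会兰 黎宵 《自动化学报》 (Acta Automatica Sinica) 2022,48(7):1834-1846

Current research on image semantic segmentation largely revolves around two factors in designing more effective algorithms: how to extract effective semantic context information and how to recover spatial detail information. Among existing semantic segmentation models, some adopt a fully convolutional network structure to obtain effective semantic context information but neglect the spatial detail information in the shallow layers of the network; others adopt a U-shaped structure and exploit the spatial detail information of the encoder through complex network connections, but fail to obtain high-quality semantic context features. To address this problem, this paper proposes a new semantic segmentation solution based on a context and shallow-space encoder-decoder network. On the encoder side, a two-branch strategy is adopted: the context branch uses a newly designed semantic context module to obtain high-quality semantic context information, while the spatial branch is designed as an inverted-U structure combined with chained inverted residual modules, which enhances semantic information while preserving spatial detail information. On the decoder side, a refinement module is designed to further optimize the fused context and spatial information. The proposed method achieves competitive results on three benchmark datasets: CamVid, SUN RGB-D, and Cityscapes.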

7.

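Occluded face recognition based on sparse-representation multi-classifier fusion (total citations: 2; self-citations: 0; cited by others: 2)

邓楠 徐正光 王珺 《计算机应用研究》 (Application Research of Computers) 2013,30(6):1914-1916

To make simultaneous use of local facial information, an occluded face recognition method based on sparse-representation multi-classifier fusion is proposed. The face is first partitioned into blocks at multiple resolutions; the recognition rate of each sub-block's sparse representation classifier is computed and used to determine its weight, and its posterior probability estimate is calculated; finally, a weighted fusion rule is used to combine the classifiers for recognition. Experimental results on the AR and YaleA databases show that the algorithm performs better than sparse-representation occluded face recognition and is more robust.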
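The weighted fusion rule in this abstract can be illustrated with a small sketch. It assumes that each block classifier's weight is its recognition rate divided by the sum of all rates; the abstract only says the weights are determined from the recognition rates, so this normalization, the function name, and the sample numbers are all hypothetical.

```python
def fuse_posteriors(posteriors, recognition_rates):
    """Combine per-block class posteriors with rate-proportional weights.

    posteriors: one probability vector per block classifier.
    recognition_rates: one validation recognition rate per block classifier.
    Assumption: weight_i = rate_i / sum(rates).
    """
    total = sum(recognition_rates)
    weights = [r / total for r in recognition_rates]
    n_classes = len(posteriors[0])
    return [
        sum(w * p[c] for w, p in zip(weights, posteriors))
        for c in range(n_classes)
    ]


# Hypothetical posteriors from three face blocks over three identities.
block_posteriors = [
    [0.6, 0.3, 0.1],  # clean block, reliable classifier
    [0.2, 0.5, 0.3],  # occluded block, unreliable classifier
    [0.7, 0.2, 0.1],  # clean block, reliable classifier
]
block_rates = [0.9, 0.3, 0.8]

fused = fuse_posteriors(block_posteriors, block_rates)
predicted = max(range(len(fused)), key=fused.__getitem__)
print(fused, predicted)
```

The occluded block votes for a different identity, but its low recognition rate keeps its influence on the fused decision small.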

8.

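An image retrieval method based on sparse canonical correlation analysis (total citations: 1; self-citations: 0; cited by others: 1)

庄凌 庄越挺 吴江琴 叶振超 吴飞 《软件学报》 (Journal of Software) 2012,23(5):1295-1304

A key problem in semantic image retrieval is finding the association between low-level image features and semantics. Because text is an effective means of expressing semantics, the paper proposes studying the relationship between the text and image modalities to build an effective model that reflects the latent semantic association between them. Based on this model, retrieval intent can be expressed in natural language (text sentences) to retrieve relevant images. The model is based on sparse canonical correlation analysis (sparse CCA) and is trained in the following steps: first, a text semantic space is constructed using latent semantic analysis; then, the image corresponding to each text is represented by a bag of visual words; finally, the sparse CCA algorithm is used to find a semantically correlated space that maps between text semantics and image visual words. Using a sparse correlation analysis method improves model interpretability and ensures stable retrieval results. Experimental results verify the effectiveness of the sparse CCA method and also confirm the feasibility of the proposed semantic image retrieval method.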

9.

Image representation for generic object recognition using higher-order local autocorrelation features on posterior probability images

Tetsu Matsukawa Takio Kurita 《Pattern recognition》2012,45(2):707-719

This paper presents a novel image representation method for generic object recognition by using higher-order local autocorrelations on posterior probability images. The proposed method is an extension of the bag-of-features approach to posterior probability images. The standard bag-of-features approach can be approximately thought of as a method that assigns an image to the category whose sum of posterior probabilities on a posterior probability image is maximal. However, by using local autocorrelations of posterior probability images, the proposed method extracts richer information than the standard bag-of-features. Experimental results reveal that the proposed method exhibits higher classification performance than the standard bag-of-features method.

10.

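Cross-media retrieval algorithm based on latent semantic topic reinforcement

黄育 张鸿 《计算机应用》 (Journal of Computer Applications) 2017,37(4):1061-1064

To address the fact that different modalities express the same semantic topic differently, and that traditional cross-media retrieval algorithms ignore the ability of different modalities to cooperatively explore the intrinsic semantic information of the data, a new cross-media retrieval algorithm based on latent semantic topic reinforcement (LSTR) is proposed. First, a latent Dirichlet allocation (LDA) model is used to construct the text semantic space, and a bag-of-words (BoW) model is used to represent the image corresponding to each text. Second, multiclass logistic regression is used to classify images and text, and the resulting multiclass posterior probabilities represent the latent semantic topics of text and images. Finally, the text latent semantic topics are used to regularize the image latent semantic topics, so that the image latent semantic topics are reinforced and the semantic correlation between them is maximized. On the Wikipedia dataset, the average precision for text-to-image and image-to-text retrieval is 57.0%, which is 35.1%, 34.8%, and 32.1% higher than that of canonical correlation analysis (CCA), SM (Semantic Matching), and SCM (Semantic Correlation Matching), respectively. Experimental results show that the LSTR algorithm effectively improves the average precision of cross-media retrieval.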

11.

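An ontology semantic model for GIS multiple representation

郑茂辉 冯学智 蒋莹滢 邓敏 宋拥军 《遥感信息》 (Remote Sensing Information) 2006,(5):12-16

Multiple representation is an inherent requirement for the flexible access and integrated analysis of geographic information; in essence, it provides an information integration mechanism spanning multiple scales and multiple application themes. Conceptual modeling of multiple representation needs to achieve an integrated, flexible representation of geographic information at the level of highly abstract geographic concepts, rather than being confined to a consistent description of geometric diversity in a database. Ontology-based geographic information modeling is closer to cognitive models and also facilitates the expression of model semantics and semantics-based information integration and sharing. The paper discusses ontology-based representation of geographic concepts and the formalization of GIS semantics and, by abstracting the context of multiple representation and extending the logical foundation of ontologies with context, gives an ontology semantic model that supports GIS multiple representation. The model can provide an easily understood and shareable formal foundation for implementing multiple-representation databases.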

12.

Temporal Contexts for Discourse Representation: An Extension of the Conceptual Graph Approach

Bernard Moulin 《Applied Intelligence》1997,7(3):227-255

A discourse is composed of a sequence of sentences that must be interpreted with respect to the context in which they are uttered and to the actions that produce them: locutors' speech acts. The analysis of discourse content must be based on a pragmatic approach to the study of language in use. Some of the most obvious linguistic elements that require contextual information for their representation are deictic forms such as here, now, I, you, this, and verb tenses. Several authors have recognized a need for introducing contextual structures in knowledge representation models such as semantic networks. Sowa's Conceptual Graph Theory is a powerful approach to conceptually represent knowledge contained in discourses. However, it must be extended in order to represent several semantic and pragmatic mechanisms related to the expression of time in natural language. In this paper we present such an extension as a framework for modeling temporal knowledge in discourses integrating several features borrowed from speech act theory. First, we introduce the notions of time interval, temporal object, temporal situation, and temporal relation. Then, we discuss the importance of explicitly introducing the concept of time coordinate system in a discourse representation and we present different kinds of temporal contexts: narrator's perspective, agent's perspective and temporal localization. We show how this conceptual framework can be used to represent various referential mechanisms in discourse such as anaphoras, indexicals, direct and indirect styles. We also discuss how to model several linguistic phenomena such as speech act characteristics and the specification of performative and attitude utterances. Finally, we briefly discuss how verb tenses can be determined in a discourse on the basis of this temporal approach.

13.

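Image retrieval algorithm based on saliency-guided semantic region weighting

陈宏宇 邓德祥 颜佳 范赐恩 《计算机应用》 (Journal of Computer Applications) 2019,39(1):136-142

For the problem of image instance retrieval in computer vision, a semantic region weighted aggregation method guided by the saliency of deep convolutional features is proposed. First, the tensor output after the fully convolutional layers of a deep convolutional network is extracted as the deep feature, and the inverse document frequency (IDF) method is used to weight the deep features to obtain a feature saliency map. This map is then used as a constraint to guide the ranking of deep feature channels by importance, so as to extract deep features of different specific semantic regions and exclude interference from background and noise. Finally, global average pooling is used for feature aggregation, and principal component analysis (PCA) dimensionality reduction and whitening yield the global feature representation of the image, which is used for distance-based retrieval. Experimental results show that the image feature vectors extracted by the proposed algorithm carry richer semantic information and are more discriminative, and that on four standard databases the algorithm achieves higher accuracy and better robustness than current mainstream algorithms.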

14.

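Semi-supervised ensemble learning for video semantic detection with confidence-based pseudo-label selection

尹玉 詹永照 姜震 《计算机应用》 (Journal of Computer Applications) 2019,39(8):2204-2209

In video semantic detection, a shortage of labeled samples seriously degrades detection performance, and noise in pseudo-labeled samples also limits the performance gain of the base classifiers in ensemble learning. To address this, a semi-supervised ensemble learning algorithm with confidence-based pseudo-label selection is proposed. First, three base classifiers are trained on three different feature spaces to obtain their label vectors. Then, the label confidence of a base classifier is defined as a weighted fusion of two errors: the difference between the largest and second-largest class probabilities of a sample, and the difference between the largest class probability and the average probability over the other classes; the label vectors and label confidences are fused to obtain the sample's pseudo-label and ensemble confidence. Next, samples with high ensemble confidence are added to the labeled set and the base classifiers are retrained iteratively. Finally, the trained base classifiers are combined to detect video semantic concepts. The algorithm reaches an average accuracy of 83.48% on the experimental dataset UCF11, 3.48 percentage points higher than the Co-KNN-SVM algorithm. The selected pseudo-labels reflect both the overall difference between a sample's class and the other classes and the uniqueness of the sample's class, which reduces the risk of using pseudo-labeled samples and effectively improves the accuracy of video semantic concept detection.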
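The label confidence described in this abstract, a weighted fusion of the top-versus-second margin and the top-versus-average-of-others margin, can be sketched for a single base classifier as follows. The fusion weight `alpha`, the selection `threshold`, the function names, and the sample probabilities are assumptions made for illustration; the abstract does not give their values, and the fusion across the three base classifiers is left out.

```python
def label_confidence(probs, alpha=0.5):
    """Weighted fusion of two margins over one classifier's class probabilities.

    margin_1: largest probability minus second-largest probability.
    margin_2: largest probability minus the mean of all other probabilities.
    alpha is a hypothetical fusion weight in [0, 1].
    """
    ordered = sorted(probs, reverse=True)
    top, second = ordered[0], ordered[1]
    mean_others = sum(ordered[1:]) / (len(ordered) - 1)
    return alpha * (top - second) + (1 - alpha) * (top - mean_others)


def select_pseudo_labels(samples, threshold=0.5, alpha=0.5):
    """Keep (name, label, confidence) for samples whose confidence passes the threshold."""
    selected = []
    for name, probs in samples.items():
        conf = label_confidence(probs, alpha)
        if conf >= threshold:
            label = max(range(len(probs)), key=probs.__getitem__)
            selected.append((name, label, conf))
    return selected


# Hypothetical class probabilities for two unlabeled video clips.
unlabeled = {
    "clip_a": [0.8, 0.1, 0.1],    # clear winner, large margins
    "clip_b": [0.4, 0.35, 0.25],  # ambiguous, small margins
}
selected = select_pseudo_labels(unlabeled)
print(selected)
```

Only the unambiguous clip is promoted to the labeled set, which is the intent of filtering pseudo-labels by confidence before retraining.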

15.

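Image retrieval algorithm based on high-level semantics (total citations: 16; self-citations: 0; cited by others: 16)

王崇骏 杨育彬 陈世福 《软件学报》 (Journal of Software) 2004,15(10):1461-1469

Using Bayesian statistical learning and decision theory, an image probability semantic model (IPSM) is established. The model is a layered architecture based on descriptive feature modeling and consists of six parts: the original image layer, image feature layer, image semantic layer, integrated probability layer, probability propagation layer, and semantic mapping layer. Building on the IPSM's description and extraction of the semantic classification features of images, a semantic high-level retrieval algorithm (SHR) and a semantic relevance feedback algorithm (SRF) are proposed and implemented. Experimental results show that the IPSM model and the SHR and SRF algorithms effectively characterize the high-level semantics of images, achieve good image matching and retrieval results, and have stable retrieval performance.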

16.

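Image semantic concept detection based on local color-spatial features (total citations: 1; self-citations: 0; cited by others: 1)

刘洁敏 姚豫 张瑞 杨小康 《中国图象图形学报》 (Journal of Image and Graphics) 2008,13(10)

For semantics-based image retrieval systems, an image semantic concept detection method based on local color-spatial features is proposed. Global features based on color, texture, and shape all contain many redundant and interfering components, whereas the local color-spatial features proposed here are extracted after feature dimensionality reduction using prior knowledge at the semantic concept level. They describe the semantic content of an image better and are easy to extract with low computational complexity. Experimental results show that concept detection based on local color-spatial features outperforms concept detection based on global features, and that when used for image retrieval, the retrieval precision is 36.4% higher than that of the method based on global color features.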

17.

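Deformable hand gesture tracking algorithm based on feature space partition modeling

张彦彬 陈晓春 《机器人》 (Robot) 2018,40(4):401-412

To solve the tracking difficulties caused by hand gesture deformation and irregular motion in human-computer interaction, a nonparametric kernel density estimation algorithm based on feature space partition modeling is proposed for gesture tracking. First, in the detection module, an AdaBoost classifier detects the presence of a gesture in the image, and the detected gesture position is passed to the tracking module, which precisely extracts the gesture target and models its color. Then, the target's color model is used to estimate the posterior probability density for each frame, yielding a probability density image of the moving target, which is decomposed into a gesture motion region and a same-color interference region. Finally, Gaussian mixture modeling is applied to the same-color interference region to weaken interference from same-colored objects. When the target is lost, a re-detection module is started, and a Bayesian classifier and a variance classifier are used to re-detect the gesture target. Experimental results show that, through feature space partition modeling and the cascade of different classifiers, the algorithm solves the same-color interference and re-detection problems of deformable gesture tracking. The algorithm improves tracking accuracy (>81.5%) and is suitable for complex scenes in which non-rigid objects move irregularly.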

18.

A robust face hallucination technique based on adaptive learning method

Rohit U. Abdu Rahiman V. George Sudhish N. 《Multimedia Tools and Applications》2017,76(15):16809-16829

Position-patch based approaches have been proposed for single-image face hallucination. This paper models the face hallucination problem as a coefficient recovery problem with respect to an adaptive training set for improved noise robustness. The image-adaptive training set is constructed by corrupting a local training set of position-patches with specific amounts of noise that depend on the input image noise level. In the proposed method, image denoising and super-resolution are carried out simultaneously to obtain superior results. Though the principle is general and can be extended to most super-resolution algorithms, we discuss it in the context of the existing locality-constrained representation (LcR) approach in order to compare their performances. It is demonstrated that the proposed approach yields quantitatively and qualitatively better results in highly noisy environments.

19.

Contextual modeling on auxiliary points for robust image reranking

Ying LI Xiangwei KONG Haiyan FU Qi TIAN 《Frontiers of Computer Science》2019,13(5):1010

20.

A middleware platform for the dynamic evolution of distributed component-based systems

Yu Zhou Xiaoxing Ma Harald Gall 《Computing》2014,96(8):725-747

In this paper, we present a middleware platform that supports the dynamic evolution of distributed component-based systems. It leverages the concept of ontologies to model the context of a system, and an intrinsic mechanism is integrated to causally connect the dynamic architecture specification to the running system implementation. The ontological modeling covers both the environmental and the architectural knowledge using semantic data modeling. The intrinsic mechanism can automatically derive a run-time polymorphic architecture object to coordinate the involved components. The ontology-based contextual representation and the polymorphic architecture-driven dynamic evolution are the two underpinnings of the platform. A scenario application, including the two primitive evolution actions, is discussed together with a performance analysis to illustrate the feasibility.