1.
In recent years we have witnessed a surge of interest in subspace learning for image classification. However, previous methods suffer from limited accuracy because they do not consider the multiple features of an image. For instance, a color image can be represented by a set of visual features describing its color, texture, and shape. Building on the "Patch Alignment" framework, we developed a new subspace learning method, termed Semi-Supervised Multimodal Subspace Learning (SS-MMSL), which encodes features from different modalities to build a meaningful subspace. In particular, the new method uses the discriminative information in the labeled data to construct local patches and aligns these patches to obtain the optimal low-dimensional subspace for each modality. For local patch construction, the data distribution revealed by the unlabeled data is utilized to enhance the subspace learning. To find a low-dimensional subspace in which the distribution of each modality is sufficiently smooth, SS-MMSL adopts an alternating, iterative optimization algorithm that exploits the complementary characteristics of the different modalities. The iterative procedure reaches the global minimum of the criterion thanks to the criterion's strong convexity. Our experiments on image classification and cartoon retrieval demonstrate the validity of the proposed method.
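As a rough illustration of the alternating scheme described above, the sketch below alternates between a shared low-dimensional embedding and per-modality weights, assuming per-modality alignment matrices have already been built from local patches. The function name, the weighting rule with exponent r, and the shared-embedding formulation are illustrative stand-ins, not the authors' exact SS-MMSL formulation.

```python
import numpy as np

def multimodal_alignment(Ls, dim, r=2.0, iters=20):
    """Alternating optimization sketch (not the authors' exact method).

    Ls : list of (n, n) alignment matrices, one per modality,
         built from local patches.
    """
    M = len(Ls)
    beta = np.full(M, 1.0 / M)                # uniform initial weights
    for _ in range(iters):
        # Step 1: fix beta, embed via the eigenvectors of the weighted
        # alignment matrix with the smallest eigenvalues.
        L = sum(b ** r * Lm for b, Lm in zip(beta, Ls))
        _, V = np.linalg.eigh(L)
        Y = V[:, :dim]                        # shared (n, dim) embedding
        # Step 2: fix the embedding, update the weights; r > 1 makes
        # this subproblem strictly convex with a closed-form solution.
        costs = np.array([np.trace(Y.T @ Lm @ Y) for Lm in Ls])
        inv = (1.0 / np.maximum(costs, 1e-12)) ** (1.0 / (r - 1.0))
        beta = inv / inv.sum()
    return Y, beta
```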
2.
Improvements in digital technology have made possible the production and distribution of huge quantities of digital multimedia data. Tools for high-level multimedia documentation are becoming indispensable for efficiently accessing and retrieving desired content from such data. In this context, automatic genre classification provides a simple and effective way to describe multimedia content in a structured, readily understandable form. In this article we propose a methodology for classifying the genre of television programmes. Features are extracted from four informative sources: visual-perceptual information (colour, texture and motion), structural information (shot length, shot distribution, shot rhythm, shot cluster duration and saturation), cognitive information (face properties such as number, position and dimensions) and aural information (transcribed text, sound characteristics). These features are used to train a parallel neural network system able to distinguish between seven video genres: football, cartoons, music, weather forecast, newscast, talk show and commercials. Experiments conducted on more than 100 h of audiovisual material confirm the effectiveness of the proposed method, which reaches a classification accuracy of 95%.
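A parallel network of this kind can be sketched as one small branch per feature source with late fusion of the branch outputs. The feature dimensions, layer sizes, and the averaging rule below are assumptions for illustration, not the architecture from the paper.

```python
import torch
import torch.nn as nn

class ParallelGenreNet(nn.Module):
    """Sketch of a parallel network: one small MLP per feature source,
    with outputs averaged over the branches (late fusion)."""
    def __init__(self, source_dims, n_genres=7, hidden=64):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear(d, hidden), nn.ReLU(),
                          nn.Linear(hidden, n_genres))
            for d in source_dims)

    def forward(self, features):  # features: list of (batch, d_i) tensors
        logits = [branch(f) for branch, f in zip(self.branches, features)]
        return torch.stack(logits).mean(dim=0)

# four sources: visual-perceptual, structural, cognitive, aural
net = ParallelGenreNet(source_dims=[32, 16, 8, 24])
```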
Maurizio Montagnuolo
Born in 1975, Maurizio Montagnuolo received his Laurea degree in Telecommunications Engineering from the Polytechnic of Turin in 2004, after developing his thesis at the RAI Research Centre. He is currently attending the Ph.D. course in "Business and Management" at the University of Turin, in collaboration with RAI and supported by EuriX S.r.l., Turin. His main research interests concern the semantic classification of audiovisual content.
Alberto Messina
is from the RAI (Radiotelevisione Italiana) Centre for Research and Technological Innovation (CRIT), Turin. He began his collaboration with RAI as a research engineer in 1996, when he completed his MS thesis in Electronic Engineering (at Politecnico di Torino) on objective quality evaluation of MPEG-2 video coding. After starting his career as a designer of RAI's Multimedia Catalogue, he has been involved in several internal and international research projects in the field of digital archiving, with particular emphasis on automated documentation and automated production. His current interests range from file formats and metadata standards to content analysis and information extraction algorithms, which are now his main focus. He has recently started promising research activities on semantic information extraction from the numerical analysis of audiovisual material, particularly conceptual characterisation of multimedia objects, genre classification of multimedia items, and automatic editorial segmentation of TV programmes. He is also the author of technical and scientific publications in this subject area. He collaborates extensively with the Computer Science Department of the University of Torino, including common research projects and student tutorship. To complete his scientific training, he has recently decided to take a PhD in Computer Science.
He is an active member of several EBU projects, including P/TVFILE, P/MAG and P/CP, and chairman of the P/SCAIE project, which deals with automatic metadata extraction techniques. He is currently working in the EU PrestoSpace project in the Metadata Access and Delivery area. He has served as a Programme Committee member of a special track of the 10th Conference of the Italian Association for Artificial Intelligence, and of the First Workshop on Ambient Media Delivery and Interactive Television (AMDIT08).
3.
To address the problems that multimodal fusion performs poorly and that key sentiment information from specific time periods and multiple views cannot be fully mined, a multi-view temporal multimodal sentiment classification model is proposed to extract key sentiment information from multiple views over specific time periods. First, low-dimensional word embeddings and sequence representations are computed for the data under two textual views, the title and the body text, to extract multimodal temporal features per view, and features are extracted from two image views obtained by cropping and horizontal mirroring. Second, a recurrent neural network builds temporal interaction features over the multimodal sequences, increasing mutual information. Finally, joint training based on contrastive learning completes the sentiment classification. The model is evaluated on two multimodal sentiment classification benchmark datasets, Yelp and Multi-ZOL, achieving accuracies of 73.92% and 69.15%, respectively. Comprehensive experiments show that multi-view, time-specific multimodal sentence sequences can improve model performance.
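The contrastive joint-training step could, for instance, use an InfoNCE-style objective between paired view embeddings. The sketch below is a generic stand-in for that idea, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(za, zb, tau=0.1):
    """InfoNCE-style loss between two view embeddings of the same batch.
    za, zb: (batch, dim) projections of paired views; matching pairs
    sit on the diagonal of the similarity matrix."""
    za, zb = F.normalize(za, dim=1), F.normalize(zb, dim=1)
    logits = za @ zb.T / tau                 # pairwise cosine similarities
    targets = torch.arange(za.size(0))       # i-th row matches i-th column
    return F.cross_entropy(logits, targets)
```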
4.
International Journal on Document Analysis and Recognition (IJDAR) - The Brazilian Supreme Court receives tens of thousands of cases each semester. Court employees spend thousands of hours to...
5.
It is well known that various feature extraction approaches are used in polarimetric synthetic aperture radar (PolSAR) terrain classification to represent the data characteristics, and effective feature fusion algorithms are needed to process the resulting complicated features. To address this issue, this article presents an algorithm based on a multimodal sparse representation (MSR) framework to fuse different feature vectors from the complicated data space. Polarimetric data features, decomposition features, and texture features from the Pauli colour-coded image are selected to represent multimodal data in different observation modes. Corresponding multimodal manifold regularizations are added to the MSR framework to approximate the data structure. Considering the independence and correlation of the features, intrinsic affinity matrices are calculated from this framework and processed via a locality preserving projection algorithm, which projects the multimodal features into a low-dimensional intrinsic feature space for subsequent classification. Three datasets from the C-band Radarsat-2 system are used in the experiments: Western Xi'an, Flevoland, and San Francisco Bay. The effect of the regularization parameters and of fused features of different dimensions is analysed both visually and quantitatively. The experimental results demonstrate that the proposed method is superior to other state-of-the-art methods.
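The locality preserving projection step has a standard closed form as a generalized eigenproblem. A minimal sketch follows, with the affinity matrix W assumed to be given (e.g. the fused intrinsic affinities the abstract mentions); the regularization constant is an illustrative choice.

```python
import numpy as np
from scipy.linalg import eigh

def lpp(X, W, dim):
    """Locality preserving projection: solve X L X^T a = lam X D X^T a
    and keep the eigenvectors with the smallest eigenvalues.
    X: (d, n) data matrix; W: (n, n) symmetric non-negative affinities."""
    D = np.diag(W.sum(axis=1))
    L = D - W                                  # graph Laplacian
    A = X @ L @ X.T
    B = X @ D @ X.T + 1e-6 * np.eye(X.shape[0])  # regularize for stability
    _, vecs = eigh(A, B)                       # ascending eigenvalues
    return vecs[:, :dim]                       # (d, dim) projection matrix
```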
6.
Multimodal biometrics technology consolidates information obtained from multiple sources at the sensor, feature, match score, and decision levels. It is used to increase robustness and provide broader population coverage. Due to the inherent challenges of feature-level fusion, multiple evidences are usually combined at the score, rank, or decision level, where only a minimal amount of information is preserved. In this paper, we propose the Group Sparse Representation based Classifier (GSRC), which removes the need for a separate feature-level fusion mechanism and integrates multi-feature representation seamlessly into classification. The performance of the proposed algorithm is evaluated on two multimodal biometric datasets. Experimental results indicate that the proposed classifier efficiently utilizes a multi-feature representation of the input data to perform accurate biometric recognition.
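For orientation, the sketch below shows the plain sparse-representation classifier (SRC) pattern that GSRC builds on: code a test sample over the training dictionary, then pick the class whose atoms reconstruct it best. The group-sparse penalty that gives GSRC its name is omitted, and the Lasso solver and alpha are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import Lasso

def src_predict(D, labels, y, alpha=0.01):
    """Plain SRC baseline. D: (d, n) dictionary of training samples as
    columns; labels: (n,) class label per column; y: (d,) test sample."""
    model = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
    model.fit(D, y)
    x = model.coef_
    residuals = {}
    for c in np.unique(labels):
        xc = np.where(labels == c, x, 0.0)   # keep only class-c coefficients
        residuals[c] = np.linalg.norm(y - D @ xc)
    return min(residuals, key=residuals.get)  # class with smallest residual
```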
7.
Classification using multimodal data arises in many machine learning applications. It is crucial not only to model cross-modal relationships effectively but also to ensure robustness against the loss of part of the data or modalities. In this paper, we propose a novel deep learning-based multimodal fusion architecture for classification tasks, which guarantees compatibility with any kind of learning model, handles cross-modal information carefully, and prevents performance degradation due to the partial absence of data. We employ two datasets for multimodal classification tasks, build models based on our architecture and other state-of-the-art models, and analyze their performance in various situations. The results show that our architecture outperforms the other multimodal fusion architectures when some parts of the data are not available.
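One common way to keep a fusion model usable under partial absence of data is to encode each modality separately and pool only the modalities actually present. The sketch below shows that pattern under assumed dimensions; it is not the paper's architecture.

```python
import torch
import torch.nn as nn

class RobustFusion(nn.Module):
    """Fusion head tolerant of missing modalities: encode each modality
    separately, average over the modalities that are present."""
    def __init__(self, dims, hidden=64, n_classes=2):
        super().__init__()
        self.encoders = nn.ModuleList(nn.Linear(d, hidden) for d in dims)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, inputs):  # inputs: list of (batch, d_i) tensors or None
        encoded = [enc(x) for enc, x in zip(self.encoders, inputs)
                   if x is not None]
        fused = torch.stack(encoded).mean(dim=0)  # mean over present modalities
        return self.head(fused)
```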
8.
Extracting aspect-sentiment pairs from sentences via aspect term extraction and aspect-level sentiment classification helps social media platforms such as Twitter and Facebook mine users' sentiment towards different aspects, which is important for personalized recommendation. In the multimodal domain, existing methods use two independent models to complete the two subtasks: aspect term extraction extracts the entities contained in a sentence, such as products and important people, or aspects of those entities, and aspect-level sentiment classification predicts the user's sentiment towards a given aspect term. These methods have two problems: 1) using two independent models loses the continuity of the low-level features shared between the two tasks and cannot model the latent semantic associations within a sentence; 2) aspect-level sentiment classification predicts the sentiment of one aspect at a time, which mismatches the throughput of aspect term extraction, which extracts multiple aspects at once, and running the two models serially makes aspect-sentiment pair extraction inefficient. To solve these two problems, UMAS, a unified framework for multimodal aspect term extraction and aspect-level sentiment classification, is proposed. First, a shared feature module models the latent semantic associations between the tasks, and the shared representation layer lets each subtask attend only to its own upper-layer network, reducing model complexity. Second, the model uses sequence labeling to simultaneously output the multiple aspects contained in a sentence and their corresponding sentiment categories, improving the efficiency of aspect-sentiment pair extraction. In addition, part-of-speech information is introduced into both subtasks: the syntactic information it carries improves aspect term extraction, and the opinion-word information obtained through part of speech improves aspect-level sentiment classification. Experimental results show that the unified framework outperforms multiple baseline models on the two benchmark datasets Twitter2015 and Restaurant2014.
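The joint sequence-labeling idea can be realized with a collapsed tag set that encodes aspect boundaries and sentiment together, so one labeling pass yields complete aspect-sentiment pairs. A minimal sketch with an assumed BiLSTM tagger follows; UMAS itself is multimodal and more elaborate.

```python
import torch
import torch.nn as nn

# Collapsed tag set: boundary and sentiment in one label, so a single
# sequence-labeling pass outputs aspects and their sentiments together.
TAGS = ["O", "B-POS", "I-POS", "B-NEU", "I-NEU", "B-NEG", "I-NEG"]

class JointTagger(nn.Module):
    """Minimal joint tagger sketch (assumed architecture, not UMAS)."""
    def __init__(self, vocab_size, emb=100, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.lstm = nn.LSTM(emb, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, len(TAGS))

    def forward(self, token_ids):               # (batch, seq_len)
        h, _ = self.lstm(self.embed(token_ids))
        return self.out(h)                      # per-token tag logits
```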
9.
During disasters, multimedia content on social media sites offers vital information: reports of injured or deceased people, infrastructure destruction, and missing or found people are among the types of information exchanged. While several studies have demonstrated the importance of both text and image content for disaster response, previous research has concentrated primarily on the text modality, with less success on multimodal approaches. The latest research on multimodal classification of disaster-related tweets uses comparatively primitive models such as KimCNN and VGG16. In this work we take this further and use state-of-the-art models in both text and image classification to improve multimodal classification of disaster-related tweets. The research was conducted on two classification tasks: first, detecting whether a tweet is informative; second, understanding the response needed. The multimodal analysis incorporates different methods of feature extraction from the textual corpus and pre-processing of the corresponding image corpus; several classification models are then trained and their performances compared while the parameters are tuned to improve the results. Models such as XLNet, BERT and RoBERTa for text classification and ResNet, ResNeXt and DenseNet for image classification were trained and analysed. The results show that the proposed multimodal architecture outperforms models trained on a single modality (text or image alone), and that the newer state-of-the-art models outperform the baseline models by a reasonable margin on both classification tasks.
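A late-fusion pairing of one text model and one image model from the families listed might look like the sketch below. The choice of bert-base-uncased and resnet18, and the simple concatenation head, are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from transformers import AutoModel
from torchvision.models import resnet18

class TweetFusion(nn.Module):
    """Late-fusion classifier sketch: BERT [CLS] embedding concatenated
    with a ResNet image embedding, followed by a linear head."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.text = AutoModel.from_pretrained("bert-base-uncased")
        cnn = resnet18(weights=None)
        cnn.fc = nn.Identity()           # expose the 512-d pooled feature
        self.image = cnn
        self.head = nn.Linear(self.text.config.hidden_size + 512, n_classes)

    def forward(self, input_ids, attention_mask, pixels):
        t = self.text(input_ids=input_ids,
                      attention_mask=attention_mask).last_hidden_state[:, 0]
        v = self.image(pixels)
        return self.head(torch.cat([t, v], dim=-1))
```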
10.
In this paper, we tackle the problem of multimodal learning for autonomous robots. Autonomous robots interacting with humans in an evolving environment need the ability to acquire knowledge from their multiple perceptual channels in an unsupervised way. Most approaches in the literature exploit engineered methods to process each perceptual modality. In contrast, robots should be able to acquire their own features from raw sensors, leveraging the information elicited by interaction with their environment: learning from their sensorimotor experience results in a more efficient strategy over a life-long perspective. To this end, we propose an architecture based on deep networks, which is used by the humanoid robot iCub to learn a task from multiple perceptual modalities (proprioception, vision, audition). By structuring high-dimensional multimodal information into a set of distinct sub-manifolds in a fully unsupervised way, it performs substantial dimensionality reduction while providing both a symbolic representation of the data and fine discrimination between similar stimuli. Moreover, the proposed network is able to exploit multimodal correlations to improve the representation of each modality alone.
11.
To address algorithm efficiency in online learning, an incremental within-class locality-preserving dimensionality reduction algorithm is proposed. The algorithm combines the advantages of QR-decomposition-based dimensionality reduction with within-class-preserving Fisher discriminant analysis, updating the projection matrix online as new samples arrive during training, and thus avoids the heavily redundant computation that traditional batch training incurs in online learning. At the same time, by accounting for both the local structure and the global distribution of the input samples, the algorithm can be applied effectively to multi-cluster, overlapping data. Experiments on the ORL face database and the COIL20 image library show that the incremental algorithm not only matches the batch algorithm in dimensionality reduction quality but also has a significant efficiency advantage.
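For reference, scikit-learn's IncrementalPCA exhibits the same online-update pattern the abstract argues for, refreshing the projection from each new batch instead of retraining on all accumulated data; it is unsupervised, so it is a stand-in for the update pattern only, not the proposed class-aware algorithm.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

# Online-update pattern: one partial_fit per incoming batch.
ipca = IncrementalPCA(n_components=20)
rng = np.random.default_rng(0)
for _ in range(10):                          # stream of training batches
    batch = rng.normal(size=(50, 1024))      # e.g. flattened face images
    ipca.partial_fit(batch)                  # update projection in place
Z = ipca.transform(rng.normal(size=(5, 1024)))  # project new samples
```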
12.
Neural Computing and Applications - MRI is a broadly used imaging method to determine glioma-based tumors. During image processing, MRI provides large image information, and therefore, an accurate...
13.
Since early mild cognitive impairment (MCI) very likely cannot be identified from clinical cognitive scale assessments alone, a multimodal network fusion method for computer-aided MCI diagnosis and classification is proposed. Graph-theoretic complex network analysis is widely accepted in neuroimaging, but studying the effect of brain disease on the topological properties of brain networks with different imaging modalities produces different results. First, diffusion tensor imaging (DTI) and resting-state functional magnetic resonance imaging (rs-fMRI) data are used to construct a fused network of brain structural and functional connectivity. Then, one-way analysis of variance (ANOVA) is applied to the topological properties of the fused network, and the properties with significant differences are selected as classification features. Finally, a support vector machine (SVM) with leave-one-out cross-validation classifies the healthy and MCI groups and estimates the accuracy. Experimental results show that the proposed method reaches a classification accuracy of 94.44%, a clear improvement over single-modality methods. The MCI patients identified by the method show significant abnormalities in many brain regions, including the cingulate gyrus, the superior temporal gyrus, and parts of the frontal and parietal lobes, largely consistent with previous findings.
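The selection-plus-evaluation loop maps directly onto standard tooling. A minimal sketch with synthetic stand-in features follows; the real features would be the topological properties of the fused DTI/rs-fMRI networks, and placing the ANOVA selection inside the pipeline keeps it within each leave-one-out fold, avoiding selection bias.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# X: subjects x network topological properties; y: 0 = healthy, 1 = MCI.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 30))        # synthetic stand-in data
y = np.repeat([0, 1], 20)

# ANOVA F-test selection followed by a linear SVM, evaluated with
# leave-one-out cross-validation as in the abstract.
clf = make_pipeline(SelectKBest(f_classif, k=10), SVC(kernel="linear"))
acc = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
print(f"LOOCV accuracy: {acc:.4f}")
```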
14.
To address the low recognition rate of principal component analysis (PCA) in face recognition, a modular PCA algorithm based on within-class weighted means is proposed. For each sub-block of every training sample in each class, the algorithm computes the within-class weighted mean and uses it to normalize the corresponding sub-blocks within that class. The total scatter matrix is formed from all normalized sub-blocks to obtain the optimal projection matrix; training and test sub-blocks are then normalized by the median of all training-set sub-blocks and projected onto the optimal projection matrix to obtain recognition features, which are classified with a nearest-distance classifier. Experimental results on the ORL face database show that the algorithm outperforms ordinary modular PCA.
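The modular (block-based) part of the pipeline can be sketched as follows: each image is split into sub-blocks and one PCA is fitted over the pooled sub-blocks. Block size, image size, and component count are illustrative, and the within-class weighted-mean normalization described above is omitted for brevity.

```python
import numpy as np
from sklearn.decomposition import PCA

def image_blocks(img, bh, bw):
    """Split an image into non-overlapping (bh, bw) sub-blocks and
    flatten each; modular PCA treats every sub-block as a sample."""
    H, W = img.shape
    return np.array([img[i:i + bh, j:j + bw].ravel()
                     for i in range(0, H - bh + 1, bh)
                     for j in range(0, W - bw + 1, bw)])

# Pool sub-blocks from all training images, fit one PCA over them, then
# concatenate each image's projected sub-blocks into its feature vector.
rng = np.random.default_rng(0)
train_imgs = rng.normal(size=(30, 112, 92))     # ORL-sized stand-ins
blocks = np.vstack([image_blocks(im, 28, 23) for im in train_imgs])
pca = PCA(n_components=40).fit(blocks)
features = pca.transform(image_blocks(train_imgs[0], 28, 23)).ravel()
```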
16.
Multimedia Tools and Applications - Due to the availability of an enormous amount of multimodal content on the social web and its applications, automatic sentiment analysis, and emotion detection...
17.
To better capture facial texture features and solve the problem of weighting the multiple frequency bands of a face image, a face recognition algorithm based on dual-tree complex wavelet multiband fusion with within-class and between-class uncertainty is proposed. First, dual-tree complex wavelet multiband features are used to build the facial texture representation, and the within-class and between-class uncertainty of the multiple bands is introduced to compute the multiband feature weights. Two-dimensional principal component analysis is then used to reconstruct a linear subspace for the multiband facial features, and the weighted fusion of the face subspaces yields final features that guarantee the projected samples have minimal within-class distance and maximal between-class distance in the new space. Experiments and analysis on the ORL face database show that the proposed method achieves better recognition than classical two-dimensional PCA, traditional wavelets, Gabor wavelets, and dual-tree complex wavelet methods.
18.
Moving vehicle detection and classification using multimodal data is a challenging task in data collection, audio-visual alignment, data labeling and feature selection under uncontrolled environments with occlusions, motion blur, varying image resolutions and perspective distortions. In this work, we propose an effective multimodal temporal panorama (MTP) approach for moving vehicle detection and classification using a novel long-range audio-visual sensing system. A new audio-visual vehicle (AVV) dataset is created, which features automatic vehicle detection and audio-visual alignment, accurate vehicle extraction and reconstruction, and efficient data labeling. In particular, vehicles' visual images are reconstructed once detected in order to remove most occlusions, motion blur, and variations in perspective view. Multimodal audio-visual features are extracted, including global geometric features (aspect ratios, profiles), local structure features (HOGs), as well as various audio features (MFCCs, etc.). Using SVMs with radial basis function kernels, the effectiveness of integrating these multimodal features is thoroughly and systematically studied. The MTP concept is not limited to the visual, motion and audio modalities; it could also be applied to other sensing modalities that acquire data in the temporal domain.
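The feature families named above map onto widely available implementations. A sketch follows, assuming skimage for HOG and librosa for MFCCs; the HOG cell size, MFCC count, and pooling are illustrative defaults, not the paper's settings.

```python
import numpy as np
from skimage.feature import hog
from librosa.feature import mfcc
from sklearn.svm import SVC

def vehicle_features(image_gray, audio, sr):
    """Concatenate the feature families the abstract lists: HOG for
    local visual structure, MFCCs (mean-pooled over time) for audio."""
    visual = hog(image_gray, pixels_per_cell=(16, 16))
    audio_feat = mfcc(y=audio, sr=sr, n_mfcc=13).mean(axis=1)
    return np.concatenate([visual, audio_feat])

# RBF-kernel SVM over the fused features, as in the abstract.
clf = SVC(kernel="rbf", gamma="scale")
```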
19.
With the availability of a humongous amount of multimodal content on the internet, multimodal sentiment classification and emotion detection have become heavily researched topics. Feature selection, context extraction, and multimodal fusion are the most important challenges in multimodal sentiment classification and affective computing. To address these challenges, this paper presents a multilevel feature optimization and multimodal contextual fusion technique. Evolutionary-computing-based feature selection models extract a subset of features from the multiple modalities. The contextual information between neighbouring utterances is extracted using bidirectional long short-term memory at multiple levels. Bimodal fusion is first performed by fusing two unimodal modalities at a time, and finally trimodal fusion is performed by fusing all three modalities. The proposed method is demonstrated on two publicly available datasets: CMU-MOSI for sentiment classification and IEMOCAP for affective computing. Incorporating a subset of features and contextual information, the proposed model obtains better classification accuracy than the two standard baselines by over 3% and 6% in sentiment and emotion classification, respectively.
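The contextual-extraction step can be sketched as a BiLSTM running over utterance-level feature vectors, so each utterance's representation absorbs its neighbours. The dimensions below are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

# BiLSTM over a sequence of utterance-level feature vectors: the output
# for each utterance is conditioned on its neighbours in both directions.
context = nn.LSTM(input_size=100, hidden_size=64,
                  bidirectional=True, batch_first=True)
utterances = torch.randn(1, 20, 100)   # one video, 20 utterances
contextual, _ = context(utterances)    # (1, 20, 128) context-aware features
```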
20.
The rough one-class support vector machine (ROC-SVM) handles overfitting by constructing rough upper and lower hyperplanes based on rough set theory, but in searching for the optimal separating hyperplane it ignores the within-class structure of the training samples, a very important piece of prior knowledge. Therefore, a within-class-scatter-based rough one-class support vector machine (WSROC-SVM) is proposed. The method optimizes the within-class structure of the training samples by minimizing their within-class scatter, on the one hand making the margin between the training samples and the coordinate origin in the high-dimensional feature space as large as possible, and on the other hand keeping the training samples as tight as possible around the rough upper hyperplane. Experimental results on synthetic and UCI datasets show that, compared with the original algorithm, the method achieves a higher recognition rate and better generalization, making it more effective for real classification problems.
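For reference, the standard one-class SVM that ROC-SVM and WSROC-SVM extend is available in scikit-learn; it separates the target class from the origin in feature space with maximal margin. The rough hyperplanes and the within-class-scatter term are the paper's additions and are not part of this baseline.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.0, scale=1.0, size=(200, 2))   # target class only
clf = OneClassSVM(kernel="rbf", nu=0.1).fit(X_train)

X_test = np.vstack([rng.normal(size=(5, 2)),               # inliers
                    rng.normal(loc=5.0, size=(5, 2))])     # outliers
print(clf.predict(X_test))   # +1 = target class, -1 = outlier
```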