20 similar documents retrieved (search time: 0 ms)
1.
Person identification using multiple cues (total citations: 6; self-citations: 0; citations by others: 6)
Brunelli R. Falavigna D. 《IEEE transactions on pattern analysis and machine intelligence》1995,17(10):955-966
This paper presents a person identification system based on acoustic and visual features. The system is organized as a set of non-homogeneous classifiers whose outputs are integrated after a normalization step. In particular, two classifiers based on acoustic features and three based on visual ones provide data for an integration module whose performance is evaluated. A novel technique for the integration of multiple classifiers at a hybrid rank/measurement level is introduced using HyperBF networks. Two different methods for the rejection of an unknown person are introduced. The performance of the integrated system is shown to be superior to that of the acoustic and visual subsystems. The resulting identification system can be used to log personal access and, with minor modifications, as an identity verification system.
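The integration step described above can be illustrated with a small score-fusion sketch in Python. This is not the paper's HyperBF-based integration; it is a minimal example assuming per-classifier min-max normalization followed by a weighted sum of the normalized scores, and all weights and scores below are illustrative.

```python
import numpy as np

def normalize_scores(scores):
    """Min-max normalize one classifier's scores over the candidate identities."""
    lo, hi = scores.min(), scores.max()
    return (scores - lo) / (hi - lo + 1e-12)

def fuse_classifiers(score_matrix, weights):
    """Combine normalized scores from several classifiers.

    score_matrix: shape (n_classifiers, n_identities), raw similarity scores.
    weights:      per-classifier reliability weights (illustrative values).
    Returns the index of the best-scoring identity after fusion.
    """
    normalized = np.vstack([normalize_scores(s) for s in score_matrix])
    fused = np.average(normalized, axis=0, weights=weights)
    return int(np.argmax(fused)), fused

# Toy example: 2 acoustic + 3 visual classifiers scoring 4 enrolled identities.
rng = np.random.default_rng(0)
scores = rng.random((5, 4))
best, fused = fuse_classifiers(scores, weights=[1.0, 1.0, 1.2, 1.2, 1.2])
print("identified person:", best, "fused scores:", fused.round(3))
```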
2.
《Computer Speech and Language》2007,21(1):219-230
In this study we present various techniques to evaluate the pronunciation of students of a foreign language without any knowledge of the uttered text. Previous attempts have shown that it is feasible to evaluate the pronunciation of a non-native speaker by having implicit or explicit knowledge of the uttered text, provided that enough utterances are available. Our approach is to use characteristics of the mother tongue (SOURCE language) of the speaker in the evaluation of his/her pronunciation. We recorded 20 Greek students speaking English (TARGET language) and evaluated their pronunciation using algorithms that include characteristics of the SOURCE language (Greek). We show that the pronunciation scores that are based on both TARGET- and SOURCE-language characteristics have better correlation with the human scores than those based only on characteristics of the TARGET language. As in previous studies, we found that the best-performing algorithms for automatic evaluation of pronunciation are based on speech recognition technology.
3.
4.
Objective: Traditional monocular depth measurement methods have the advantages of simple equipment, low cost, and fast computation, but they require complex camera calibration and are only applicable under specific scene conditions. To address this, an object depth measurement method based on motion parallax cues is proposed: feature points are extracted from the images, and the relationship between feature points and image depth is used to obtain the measurement. Method: The two images are first segmented to locate the region containing the measured object. An improved scale-invariant feature transform (SIFT) algorithm proposed in this paper is then used to match the two images, and the matching and segmentation results are combined to obtain the matches belonging to the measured object. The Graham scan algorithm is used to compute the convex hull of the matched feature points and the length of the longest segment on the hull. Finally, the basic principles of camera imaging and triangle geometry are used to compute the image depth. Results: Experiments show that the method improves both measurement accuracy and speed. When the object in the image is unoccluded, the error between the actual and measured distance is 2.60% and measurement takes 1.577 s; when the object is partially occluded, the method still performs well, with an error of 3.19% and a measurement time of 1.689 s. Conclusion: Estimating image depth from feature points in two images is robust to partial occlusion of the object, avoids the complex camera-calibration process, and has practical application value.
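A minimal OpenCV sketch of the matching and convex-hull steps, using standard SIFT rather than the paper's improved SIFT and omitting the image-segmentation stage. The final depth relation assumes the camera moved a known distance towards the object between the two shots; this is an illustrative simplification, not the paper's exact geometric derivation.

```python
import cv2
import numpy as np
from itertools import combinations

def longest_hull_segment(points):
    """Convex hull of matched keypoints, then the longest segment between hull vertices."""
    hull = cv2.convexHull(np.float32(points)).reshape(-1, 2)
    return max(np.linalg.norm(a - b) for a, b in combinations(hull, 2))

def estimate_depth(img1, img2, forward_motion_m):
    """Sketch: match standard SIFT features (not the paper's improved SIFT),
    take the longest convex-hull segment of the matches in each view, and
    recover depth from the change in apparent size, assuming the camera moved
    forward_motion_m metres towards the object between the shots."""
    sift = cv2.SIFT_create()
    k1, d1 = sift.detectAndCompute(img1, None)
    k2, d2 = sift.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(d1, d2)
    pts1 = [k1[m.queryIdx].pt for m in matches]
    pts2 = [k2[m.trainIdx].pt for m in matches]
    L1, L2 = longest_hull_segment(pts1), longest_hull_segment(pts2)
    # Pinhole geometry: L1 = f*S/Z and L2 = f*S/(Z - d)  =>  Z = d*L2 / (L2 - L1)
    return forward_motion_m * L2 / (L2 - L1)
```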
5.
Multimedia Tools and Applications - Multi-label classification is one of the most challenging tasks in the computer vision community, owing to different composition and interaction (e.g. partial...
6.
For the problem of speech emotion recognition, a multi-classifier fusion method based on decision templates is proposed, in which sub-classifiers are built from different subsets of acoustic features. Using different subsets substantially increases the "diversity" among the sub-classifiers, a prerequisite for the successful application of multi-classifier fusion. Compared with majority-voting fusion and support vector machines, the method achieves better recognition results. In addition, an analysis of the diversity measures is used to explore why the method performs well.
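A minimal numpy sketch of decision-template fusion as described above: each template is the mean decision profile of the sub-classifiers over the training samples of one emotion class, and a test sample is assigned to the class whose template is nearest to its profile. The sub-classifier outputs here are random placeholders standing in for classifiers trained on different acoustic feature subsets.

```python
import numpy as np

def build_decision_templates(profiles, labels, n_classes):
    """profiles: (n_samples, n_classifiers, n_classes) soft outputs of the
    sub-classifiers on training data. A decision template is the mean
    profile of the samples belonging to one class."""
    return np.stack([profiles[labels == c].mean(axis=0) for c in range(n_classes)])

def classify_with_templates(profile, templates):
    """Assign the class whose template is nearest (Euclidean) to the
    test sample's decision profile."""
    dists = [np.linalg.norm(profile - t) for t in templates]
    return int(np.argmin(dists))

# Toy example: 3 sub-classifiers (e.g. built on different acoustic feature
# subsets), 4 emotion classes, 60 training profiles.
rng = np.random.default_rng(1)
train_profiles = rng.random((60, 3, 4))
train_labels = rng.integers(0, 4, 60)
templates = build_decision_templates(train_profiles, train_labels, 4)
print("predicted emotion:", classify_with_templates(rng.random((3, 4)), templates))
```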
7.
Enrique M. Albornoz Diego H. Milone Hugo L. Rufiner 《Computer Speech and Language》2011,25(3):556-570
The recognition of the emotional state of speakers is a multi-disciplinary research area that has received great interest in recent years. One of the most important goals is to improve voice-based human–machine interaction. Several works in this domain use the prosodic features or the spectral characteristics of the speech signal with neural networks, Gaussian mixtures and other standard classifiers, but usually offer no acoustic interpretation of the types of errors in the results. In this paper, the spectral characteristics of emotional signals are used to group emotions based on acoustic rather than psychological considerations. Standard classifiers based on Gaussian Mixture Models, Hidden Markov Models and Multilayer Perceptrons are tested. These classifiers have been evaluated with different configurations and input features in order to design a new hierarchical method for emotion classification. The proposed multiple-feature hierarchical method for seven emotions, based on spectral and prosodic information, improves performance over the standard classifiers and fixed features.
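A hedged sketch of a two-stage (hierarchical) emotion classifier of the kind described above: a first stage assigns an utterance to an acoustically defined emotion group, and a second stage discriminates emotions within that group. The grouping, GMM configuration and feature vectors below are placeholders, not the authors' setup.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical acoustic grouping of seven emotions (illustrative only).
GROUPS = {0: [0, 1], 1: [2, 3, 4], 2: [5, 6]}   # group id -> emotion ids

def train_gmms(X, y, labels):
    """One GMM per label; classification is by maximum log-likelihood."""
    return {l: GaussianMixture(n_components=2, random_state=0).fit(X[y == l])
            for l in labels}

def gmm_predict(models, x):
    return max(models, key=lambda l: models[l].score(x.reshape(1, -1)))

def fit_hierarchy(X, emotions):
    """X: (n_samples, n_features) acoustic features; emotions: int array in 0..6."""
    group_of = {e: g for g, es in GROUPS.items() for e in es}
    groups = np.array([group_of[e] for e in emotions])
    stage1 = train_gmms(X, groups, GROUPS.keys())
    stage2 = {g: train_gmms(X[groups == g], emotions[groups == g], es)
              for g, es in GROUPS.items()}
    return stage1, stage2

def predict_hierarchy(stage1, stage2, x):
    g = gmm_predict(stage1, x)        # coarse decision: acoustic group
    return gmm_predict(stage2[g], x)  # fine decision: emotion within the group
```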
8.
To address the problem that a single speech feature cannot fully capture speech emotion, the LSF parameters, which have good quantization and interpolation properties, are fused with the MFCC parameters, which reflect the characteristics of human hearing, yielding a new line-spectrum-weighted MFCC (WMFCC) feature. A Gaussian mixture model is then used to build a model space for this feature, producing GW-MFCC model-space parameters that capture higher-dimensional detail and further improve emotion recognition performance. Validation on the Berlin emotional speech corpus shows that the new feature improves the recognition rate by 5.7% and 6.9% over traditional MFCC and LSF, respectively. The experimental results indicate that the proposed WMFCC and GW-MFCC features effectively represent emotional information in speech and improve the speech emotion recognition rate.
9.
Ismail Mohd Adnan Shahin 《International Journal of Speech Technology》2013,16(3):341-351
Speaker recognition performance in emotional talking environments is not as high as it is in neutral talking environments. This work focuses on proposing, implementing, and evaluating a new approach to enhance performance in emotional talking environments. The proposed approach identifies the unknown speaker using both gender and emotion cues. Both Hidden Markov Models (HMMs) and Suprasegmental Hidden Markov Models (SPHMMs) have been used as classifiers in this work. The approach has been tested on our collected emotional speech database, which covers six emotions. The results show that speaker identification performance based on both gender and emotion cues is higher than that based on gender cues only, emotion cues only, and neither gender nor emotion cues by 7.22%, 4.45%, and 19.56%, respectively. This work also shows that the optimum speaker identification performance is obtained when the classifiers are completely biased towards the suprasegmental models, with no contribution from the acoustic models, in the emotional talking environments. The achieved average speaker identification performance based on the new approach falls within 2.35% of that obtained in subjective evaluation by human judges.
10.
M. J. Marín-Jiménez R. Muñoz-Salinas E. Yeguas-Bolivar N. Pérez de la Blanca 《Machine Vision and Applications》2014,25(1):71-84
Human Interaction Recognition (HIR) in uncontrolled TV video material is a very challenging problem because of the huge intra-class variability of the classes (due to large differences in the way actions are performed, lighting conditions and camera viewpoints, amongst others) as well as the small inter-class variability (e.g., the visual difference between hug and kiss is very subtle). Most previous work has focused only on visual information (i.e., the image signal), thus missing an important source of information present in human interactions: the audio. So far, such approaches have not proved discriminative enough. This work proposes the use of an Audio-Visual Bag of Words (AVBOW) as a more powerful mechanism to approach the HIR problem than the traditional Visual Bag of Words (VBOW). We show in this paper that the combined use of video and audio information yields better classification results than video alone. Our approach has been validated on the challenging TVHID dataset, showing that the proposed AVBOW provides statistically significant improvements over the VBOW employed in the related literature.
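A minimal scikit-learn sketch of the audio-visual bag-of-words idea: local visual and audio descriptors are quantized with separate k-means codebooks, the two normalized histograms are concatenated per clip, and the result feeds a standard classifier. Descriptor extraction from the video and audio streams is left as a placeholder, and the codebook sizes are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def build_codebook(descriptor_list, k):
    """Fit a k-means codebook on all local descriptors (visual or audio)."""
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(np.vstack(descriptor_list))

def bow_histogram(descriptors, codebook):
    """Quantize one clip's descriptors and return a normalized word histogram."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / (hist.sum() + 1e-12)

def avbow_features(visual_descs, audio_descs, k_vis=100, k_aud=50):
    """visual_descs / audio_descs: one array of local descriptors per clip.
    Returns the concatenated audio-visual BoW vectors (the AVBOW idea)."""
    cb_v = build_codebook(visual_descs, k_vis)
    cb_a = build_codebook(audio_descs, k_aud)
    return np.array([np.hstack([bow_histogram(v, cb_v), bow_histogram(a, cb_a)])
                     for v, a in zip(visual_descs, audio_descs)])

# Downstream: clf = SVC(kernel="linear").fit(avbow_features(vis, aud), labels)
```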
11.
Zhi-hua Chen Yi Liu Bin Sheng Jian-ning Liang Jing Zhang Yu-bo Yuan 《Multimedia Tools and Applications》2016,75(24):16943-16958
Image saliency analysis plays an important role in various applications such as object detection, image compression, and image retrieval. Traditional methods for saliency detection ignore texture cues. In this paper, we propose a novel method that combines color and texture cues to robustly detect image saliency. Superpixel segmentation and the mean-shift algorithm are adopted to segment an original image into small regions. Then, based on the responses of a Gabor filter, color and texture features are extracted to produce color and texture sub-saliency maps. Finally, the color and texture sub-saliency maps are combined in a nonlinear manner to obtain the final saliency map for detecting salient objects in the image. Experimental results show that the proposed method outperforms other state-of-the-art algorithms for images with complex textures.
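A rough skimage-based sketch of a color-plus-texture saliency pipeline in the spirit of the method above: SLIC superpixels, per-region Lab color and Gabor-response features, region contrast as sub-saliency, and a simple nonlinear combination. The segmentation method, single Gabor filter and combination rule are illustrative simplifications, not the paper's exact pipeline.

```python
import numpy as np
from skimage.segmentation import slic
from skimage.filters import gabor
from skimage.color import rgb2gray, rgb2lab

def region_contrast(features, n_regions):
    """Saliency of a region = mean feature distance to all other regions."""
    d = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    return d.sum(axis=1) / (n_regions - 1)

def saliency_map(image):
    segments = slic(image, n_segments=200, compactness=10, start_label=0)
    n = segments.max() + 1
    lab, gray = rgb2lab(image), rgb2gray(image)
    # Texture cue: magnitude of a single Gabor filter response (illustrative).
    real, imag = gabor(gray, frequency=0.3)
    tex = np.hypot(real, imag)
    color_feat = np.array([lab[segments == i].mean(axis=0) for i in range(n)])
    tex_feat = np.array([[tex[segments == i].mean()] for i in range(n)])
    s_color = region_contrast(color_feat, n)
    s_tex = region_contrast(tex_feat, n)
    s_color /= s_color.max() + 1e-12
    s_tex /= s_tex.max() + 1e-12
    combined = s_color * np.exp(s_tex)   # simple nonlinear combination
    return combined[segments]            # map region scores back to pixels
```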
12.
In human–computer interaction (HCI), electroencephalogram (EEG) signals can be added as an additional input to the computer. Integrating real-time EEG-based human emotion recognition algorithms into human–computer interfaces can make the user experience more complete, more engaging, and less or more emotionally stressful depending on the target of the application. Currently, the most accurate EEG-based emotion recognition algorithms are subject-dependent, and a training session is needed for the user each time right before running the application. In this paper, we propose a novel real-time subject-dependent algorithm based on the most stable features that gives better accuracy than other available algorithms when it is crucial to have only one training session for the user and no re-training is allowed subsequently. The proposed algorithm is tested on an affective EEG database that contains five subjects. For each subject, four emotions (pleasant, happy, frightened and angry) are induced, and the affective EEG is recorded for two sessions per day on eight consecutive days. Testing results show that the novel algorithm can be used in real-time emotion recognition applications with adequate accuracy and without re-training. The proposed algorithm is integrated with the real-time applications "Emotional Avatar" and "Twin Girls" to monitor the users' emotions in real time.
13.
14.
In this paper we show how surface orientation information inferred using shape-from-shading can be used to aid the process of fitting a 3D morphable model to an image of a face. We consider the problem of model dominance and show how shading constraints can be used to refine morphable model shape estimates, offering the possibility of exceeding the maximum possible accuracy of the model. We use this observation to motivate an optimisation scheme based on surface normal error. This ensures the fullest possible use of the information conveyed by the shading in an image. Moreover, our framework allows estimation of per-vertex albedo and bump maps which are not constrained to lie within the span of the model. This means the recovered model is capable of describing shape and reflectance phenomena not present in the training set. We show reconstruction and synthesis results and demonstrate that the shape and albedo estimates can be used for illumination insensitive recognition using only a single gallery image.
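A small helper illustrating the kind of surface-normal error such a fitting objective can minimize: the mean angular difference between per-vertex normals predicted by the morphable model and normals inferred via shape-from-shading. This is illustrative only, not the authors' optimisation scheme.

```python
import numpy as np

def normal_error(model_normals, shading_normals):
    """Mean angular error (radians) between per-vertex normals predicted by the
    morphable model and normals inferred via shape-from-shading.
    Both inputs: (n_vertices, 3). Illustrative cost only."""
    a = model_normals / np.linalg.norm(model_normals, axis=1, keepdims=True)
    b = shading_normals / np.linalg.norm(shading_normals, axis=1, keepdims=True)
    cosines = np.clip(np.sum(a * b, axis=1), -1.0, 1.0)
    return float(np.mean(np.arccos(cosines)))
```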
15.
This paper presents a probabilistic approach for sensor-based localization with weak sensor data. Wireless received signal strength measurements are used to disambiguate sonar measurements in symmetric environments. Particle filters are used to model the multi-hypothesis estimation problem. Experiments indicate that multiple weak cues can provide robust position estimates and that multiple sensors also aid in solving the kidnapped robot problem.
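A minimal particle-filter sketch showing how two weak cues (a sonar range and a Wi-Fi RSSI reading) can jointly weight particles, so that ambiguity in one cue is resolved by the other. The motion and measurement models, noise levels and map-based prediction functions are invented placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(particles, motion, noise=0.05):
    """Propagate particles with the odometry motion plus noise."""
    return particles + motion + rng.normal(0.0, noise, particles.shape)

def weight(particles, sonar_meas, rssi_meas, sonar_pred, rssi_pred,
           sonar_sigma=0.3, rssi_sigma=4.0):
    """Combine two weak cues: each likelihood alone may be ambiguous
    (sonar in symmetric rooms, RSSI is noisy), but their product is less so."""
    w_sonar = np.exp(-0.5 * ((sonar_meas - sonar_pred(particles)) / sonar_sigma) ** 2)
    w_rssi = np.exp(-0.5 * ((rssi_meas - rssi_pred(particles)) / rssi_sigma) ** 2)
    w = w_sonar * w_rssi
    return w / w.sum()

def resample(particles, weights):
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]

# sonar_pred / rssi_pred would be map-based measurement models (placeholders).
```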
16.
Recently, deep learning methodologies have become popular for analysing physiological signals in multiple modalities via hierarchical architectures for human emotion recognition. Most state-of-the-art approaches use deep learning directly for emotion classification; however, deep learning is most effective for deep feature extraction. Therefore, in this research, we applied an unsupervised deep belief network (DBN) for depth-level feature extraction from fused observations of Electro-Dermal Activity (EDA), Photoplethysmogram (PPG) and Zygomaticus Electromyography (zEMG) sensor signals. The DBN-produced features are then combined with statistical features of EDA, PPG and zEMG to prepare a feature-fusion vector, which is used to classify five basic emotions: Happy, Relaxed, Disgust, Sad and Neutral. As the emotion classes are not linearly separable in the feature-fusion vector, a Fine Gaussian Support Vector Machine (FGSVM) with a radial basis function kernel is used for non-linear classification of human emotions. Our experiments on a public multimodal physiological signal dataset show that the DBN- and FGSVM-based model significantly increases the emotion recognition accuracy compared with existing state-of-the-art emotion classification techniques.
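A hedged scikit-learn sketch of the feature-fusion idea above. scikit-learn has no DBN, so two stacked Bernoulli RBMs stand in for the unsupervised deep feature extractor, and an RBF-kernel SVC stands in for the Fine Gaussian SVM; the statistics, layer sizes and hyperparameters are illustrative.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

def statistical_features(signals):
    """Simple per-channel statistics (mean, std, min, max) of EDA/PPG/zEMG windows."""
    return np.hstack([signals.mean(axis=-1), signals.std(axis=-1),
                      signals.min(axis=-1), signals.max(axis=-1)])

# Unsupervised depth-level feature extractor: two stacked RBMs stand in for a DBN.
dbn = Pipeline([
    ("scale", MinMaxScaler()),   # RBMs expect inputs in [0, 1]
    ("rbm1", BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=20, random_state=0)),
    ("rbm2", BernoulliRBM(n_components=32, learning_rate=0.05, n_iter=20, random_state=0)),
])

def feature_fusion(raw_windows):
    """raw_windows: (n_samples, n_channels, n_timesteps) fused sensor windows.
    Concatenate DBN-style deep features with hand-crafted statistics."""
    flat = raw_windows.reshape(len(raw_windows), -1)
    deep = dbn.fit_transform(flat)
    stats = statistical_features(raw_windows)
    return np.hstack([deep, stats])

# Non-linear emotion classifier standing in for the Fine Gaussian SVM (RBF kernel):
# clf = SVC(kernel="rbf", gamma="scale").fit(feature_fusion(X_train), y_train)
```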
17.
《Expert systems with applications》2014,41(9):4330-4336
Social media platforms such as Twitter are becoming increasingly mainstream and provide valuable user-generated information through the publishing and sharing of content. Identifying interesting and useful content in large text streams is a crucial issue in social media because many users struggle with information overload. Retweeting, as a forwarding function, plays an important role in information propagation, and retweet counts simply reflect a tweet's popularity; however, the main reasons for retweets may be limited to personal interests and satisfaction. In this paper, we use topic identification as a proxy to understand a large number of tweets and to score the interestingness of an individual tweet based on its latent topics. Our assumption is that fascinating topics generate content that may be of potential interest to a wide audience. We propose a novel topic model called Trend Sensitive-Latent Dirichlet Allocation (TS-LDA) that can efficiently extract latent topics from content by modeling temporal trends on Twitter over time. Experimental results on real-world data from Twitter demonstrate that the proposed method outperforms several baseline methods.
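TS-LDA is not available in standard libraries, so the sketch below uses plain LDA from scikit-learn as a stand-in to show the general idea of scoring a tweet's interestingness from its latent topic distribution. The per-topic scores and the scoring rule are invented placeholders, not the paper's model.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

def fit_topics(tweets, n_topics=20):
    """Plain LDA stands in for TS-LDA (which additionally models temporal trends)."""
    vec = CountVectorizer(max_features=5000, stop_words="english")
    counts = vec.fit_transform(tweets)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0).fit(counts)
    return vec, lda

def interestingness(tweet, vec, lda, topic_scores):
    """Score a tweet by the weighted interestingness of its latent topics.
    topic_scores is an assumed per-topic score (e.g. estimated from retweet data)."""
    theta = lda.transform(vec.transform([tweet]))[0]   # topic distribution
    return float(theta @ topic_scores)

# Usage sketch:
# vec, lda = fit_topics(corpus_of_tweets)
# score = interestingness("new deep learning paper ...", vec, lda, topic_scores)
```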
18.
Classification models based on statistical data have been developed that make it possible to identify a potential insider from behavioural indicators that manifest themselves even when the data describing the insider's behavior are incomplete.
19.
Identifying facsimile duplicates using radial pixel densities (total citations: 2; self-citations: 0; citations by others: 2)
P. Chatelain 《International Journal on Document Analysis and Recognition》2002,4(4):219-225
A method for detecting full-layout facsimile duplicates based on radial pixel densities is proposed. It caters for facsimiles including text and/or graphics. Pages may be positioned upright or inverted on the scanner bed, and the method does not depend on the computation of text skew or text orientation. Using a database of original documents, 92% of non-duplicates and upright duplicates, as well as 89% of inverted duplicates, could be correctly identified. The method is vulnerable to double scanning, which occurs when documents are copied using a photocopier and the copies are subsequently transmitted using a facsimile machine.
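A small sketch of the radial-density idea: count ink pixels in concentric rings around the page centre, normalize per ring, and compare two pages by the distance between their ring profiles. Because the profile is unchanged by a 180-degree rotation, upright and inverted scans of the same page match naturally. The number of rings and the decision threshold are illustrative values, not the paper's.

```python
import numpy as np

def radial_density(binary_page, n_rings=32):
    """Fraction of ink pixels (nonzero values) in concentric rings around the
    page centre; the signature is insensitive to 180-degree rotation."""
    h, w = binary_page.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - h / 2.0, xx - w / 2.0)
    edges = np.linspace(0, r.max() + 1e-9, n_rings + 1)
    ring = np.digitize(r, edges) - 1
    ink = binary_page > 0
    return np.array([ink[ring == i].mean() if np.any(ring == i) else 0.0
                     for i in range(n_rings)])

def is_duplicate(page_a, page_b, threshold=0.05):
    """Declare a duplicate when the radial-density profiles are close
    (mean absolute difference; the threshold is illustrative)."""
    d = np.abs(radial_density(page_a) - radial_density(page_b)).mean()
    return d < threshold
```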
20.
Stereo using monocular cues within the tensor voting framework (total citations: 3; self-citations: 0; citations by others: 3)
Mordohai P Medioni G 《IEEE transactions on pattern analysis and machine intelligence》2006,28(6):968-982
We address the fundamental problem of matching in two static images. The remaining challenges are related to occlusion and lack of texture. Our approach addresses these difficulties within a perceptual organization framework, considering both binocular and monocular cues. Initially, matching candidates for all pixels are generated by a combination of matching techniques. The matching candidates are then embedded in disparity space, where perceptual organization takes place in 3D neighborhoods and, thus, does not suffer from problems associated with scanline or image neighborhoods. The assumption is that correct matches produce salient, coherent surfaces, while wrong ones do not. Matching candidates that are consistent with the surfaces are kept and grouped into smooth layers. Thus, we achieve surface segmentation based on geometric and not photometric properties. Surface overextensions, which are due to occlusion, can be corrected by removing matches whose projections are not consistent in color with their neighbors of the same surface in both images. Finally, the projections of the refined surfaces on both images are used to obtain disparity hypotheses for unmatched pixels. The final disparities are selected after a second tensor voting stage, during which information is propagated from more reliable pixels to less reliable ones. We present results on widely used benchmark stereo pairs.