Found 20 similar documents (search time: 0 ms)
1.
This paper presents a person identification system based on acoustic and visual features. The system is organized as a set of non-homogeneous classifiers whose outputs are integrated after a normalization step. In particular, two classifiers based on acoustic features and three based on visual ones provide data for an integration module whose performance is evaluated. A novel technique for the integration of multiple classifiers at a hybrid rank/measurement level is introduced using HyperBF networks. Two different methods for the rejection of an unknown person are introduced. The performance of the integrated system is shown to be superior to that of the acoustic and visual subsystems. The resulting identification system can be used to log personal access and, with minor modifications, as an identity verification system.
2.
In this study we present various techniques to evaluate the pronunciation of students of a foreign language without any knowledge of the uttered text. Previous attempts have shown that it is feasible to evaluate the pronunciation of a non-native speaker by having implicit or explicit knowledge of the uttered text, provided that enough utterances are available. Our approach is to use characteristics of the mother tongue (SOURCE language) of the speaker in the evaluation of his/her pronunciation. We recorded 20 Greek students speaking English (TARGET language) and evaluated their pronunciation using algorithms that include characteristics of the SOURCE language (Greek). We show that the pronunciation scores that are based on both TARGET- and SOURCE-language characteristics have better correlation with the human scores than those based only on characteristics of the TARGET language. As in previous studies, we found that the best-performing algorithms for automatic evaluation of pronunciation are based on speech recognition technology.
3.
Multimedia Tools and Applications - Multi-label classification is one of the most challenging tasks in the computer vision community, owing to different composition and interaction (e.g. partial...
4.
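To address the problem that a single speech feature cannot fully express the emotion in speech, LSF parameters, which have good quantization and interpolation properties, are fused with MFCC parameters, which reflect the auditory characteristics of the human ear, to produce a new feature: the line-spectrum-weighted MFCC (WMFCC). A Gaussian mixture model is then used to build a model space for this parameter, yielding GW-MFCC model-space parameters that capture higher-dimensional detail and further improve emotion recognition performance. In validation on the Berlin emotional speech corpus, the new parameters improve the recognition rate by 5.7% and 6.9% over conventional MFCC and LSF, respectively. The experimental results show that the proposed WMFCC and GW-MFCC parameters effectively represent emotional information in speech and improve the speech emotion recognition rate.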
5.
The recognition of the emotional state of speakers is a multi-disciplinary research area that has received great interest in recent years. One of the most important goals is to improve voice-based human–machine interaction. Several works in this domain use the prosodic features or the spectral characteristics of the speech signal, with neural networks, Gaussian mixtures and other standard classifiers. Usually, there is no acoustic interpretation of the types of errors in the results. In this paper, the spectral characteristics of emotional signals are used in order to group emotions based on acoustic rather than psychological considerations. Standard classifiers based on Gaussian Mixture Models, Hidden Markov Models and Multilayer Perceptron are tested. These classifiers have been evaluated with different configurations and input features, in order to design a new hierarchical method for emotion classification. The proposed multiple feature hierarchical method for seven emotions, based on spectral and prosodic information, improves the performance over the standard classifiers and the fixed features.
6.
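For speech emotion recognition, a multi-classifier fusion method based on decision templates is proposed, in which sub-classifiers are built from different types of acoustic feature subsets. Using different subsets substantially increases the "diversity" measure among the sub-classifiers, which is a prerequisite for the successful application of multi-classifier fusion algorithms. Compared with majority-voting fusion and support vector machines, the method achieves better recognition results. In addition, the reasons why the method achieves better recognition are explored through an analysis of diversity measures.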
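As a rough illustration of decision-template fusion in general (not the authors' implementation), the sketch below averages the soft outputs of several sub-classifiers into one template per class and assigns a new sample to the nearest template. The function names, the toy scores, and the squared Euclidean distance are assumptions made for this example; the abstract does not state which similarity measure or features were used.

```python
from collections import defaultdict

def build_templates(profiles, labels):
    """Average the decision profiles of each class into one template per class.

    profiles: list of decision profiles; each profile holds one row of
              class scores per sub-classifier.
    labels:   the true class label of each profile.
    """
    sums, counts = {}, defaultdict(int)
    for profile, label in zip(profiles, labels):
        if label not in sums:
            sums[label] = [[0.0] * len(row) for row in profile]
        for i, row in enumerate(profile):
            for j, score in enumerate(row):
                sums[label][i][j] += score
        counts[label] += 1
    return {
        label: [[v / counts[label] for v in row] for row in total]
        for label, total in sums.items()
    }

def classify(profile, templates):
    """Return the label whose template is closest to the profile."""
    def distance(template):
        # Squared Euclidean distance (an assumed choice of measure).
        return sum(
            (a - b) ** 2
            for row_p, row_t in zip(profile, template)
            for a, b in zip(row_p, row_t)
        )
    return min(templates, key=lambda label: distance(templates[label]))

# Hypothetical toy data: two sub-classifiers (rows) scoring two
# emotions (columns: angry, sad).
train_profiles = [
    [[0.9, 0.1], [0.7, 0.3]],
    [[0.8, 0.2], [0.6, 0.4]],
    [[0.2, 0.8], [0.4, 0.6]],
    [[0.1, 0.9], [0.3, 0.7]],
]
train_labels = ["angry", "angry", "sad", "sad"]
templates = build_templates(train_profiles, train_labels)
prediction = classify([[0.6, 0.4], [0.8, 0.2]], templates)
```

Because the whole profile is compared with each template, a sub-classifier that is weak on its own can still contribute when its errors differ from those of the others, which is where the diversity measure mentioned in the abstract comes in.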
7.
Speaker recognition performance in emotional talking environments is not as high as it is in neutral talking environments. This work proposes, implements, and evaluates a new approach to enhance performance in emotional talking environments. The proposed approach identifies the unknown speaker using both his/her gender and emotion cues. Both Hidden Markov Models (HMMs) and Suprasegmental Hidden Markov Models (SPHMMs) have been used as classifiers in this work. The approach has been tested on our collected emotional speech database, which is composed of six emotions. The results show that speaker identification performance based on both gender and emotion cues is higher than that based on gender cues only, emotion cues only, and neither gender nor emotion cues by 7.22%, 4.45%, and 19.56%, respectively. This work also shows that the optimum speaker identification performance in emotional talking environments occurs when the classifiers are completely biased towards suprasegmental models, with no contribution from acoustic models. The average speaker identification performance achieved with the proposed approach falls within 2.35% of that obtained in subjective evaluation by human judges.
8.
In human–computer interaction (HCI), electroencephalogram (EEG) signals can be added as an additional input to a computer. Integrating real-time EEG-based human emotion recognition algorithms into human–computer interfaces can make the user's experience more complete, more engaging, and less or more emotionally stressful, depending on the target of the application. Currently, the most accurate EEG-based emotion recognition algorithms are subject-dependent, and the user needs a training session each time, right before running the application. In this paper, we propose a novel real-time subject-dependent algorithm that uses the most stable features and gives better accuracy than other available algorithms when it is crucial to have only one training session for the user and no re-training is allowed subsequently. The proposed algorithm is tested on an affective EEG database that contains five subjects. For each subject, four emotions (pleasant, happy, frightened and angry) are induced, and the affective EEG is recorded in two sessions per day on eight consecutive days. Testing results show that the novel algorithm can be used in real-time emotion recognition applications without re-training and with adequate accuracy. The proposed algorithm is integrated with the real-time applications “Emotional Avatar” and “Twin Girls” to monitor the user's emotions in real time.
9.
Human Interaction Recognition (HIR) in uncontrolled TV video material is a very challenging problem because of the huge intra-class variability of the classes (due to large differences in the way actions are performed, lighting conditions and camera viewpoints, amongst others) as well as the small inter-class variability (e.g., the visual difference between hug and kiss is very subtle). Most previous work has focused only on visual information (i.e., the image signal), thus missing an important source of information present in human interactions: the audio. So far, such approaches have not been shown to be discriminative enough. This work proposes the use of Audio-Visual Bag of Words (AVBOW) as a more powerful mechanism to approach the HIR problem than the traditional Visual Bag of Words (VBOW). We show in this paper that the combined use of video and audio information yields better classification results than video alone. Our approach has been validated on the challenging TVHID dataset, showing that the proposed AVBOW provides statistically significant improvements over the VBOW employed in the related literature.
10.
Image saliency analysis plays an important role in various applications such as object detection, image compression, and image retrieval. Traditional methods for saliency detection ignore texture cues. In this paper, we propose a novel method that combines color and texture cues to robustly detect image saliency. Superpixel segmentation and the mean-shift algorithm are adopted to segment an original image into small regions. Then, based on the responses of a Gabor filter, color and texture features are extracted to produce color and texture sub-saliency maps. Finally, the color and texture sub-saliency maps are combined in a nonlinear manner to obtain the final saliency map for detecting salient objects in the image. Experimental results show that the proposed method outperforms other state-of-the-art algorithms for images with complex textures.
11.
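An algorithm for identifying long flows using a Bloom filter is presented. A layered hashing method is proposed that reduces collisions during hashing. Hash functions that carry partial host information are used, and the overlap and numerical consistency of the hash strings make it easy to recover host information while identifying long flows. Giving each hash function its own independent storage space also greatly reduces the internal collisions that arise during hashing.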
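One idea in this abstract, giving each hash function its own storage, can be sketched as a partitioned counting filter. This is a minimal illustration under stated assumptions, not the paper's algorithm: the class name, the SHA-256-based hashing, the table sizes, and the flow identifiers are all hypothetical, and the layered hashing and host-information recovery described in the abstract are not modelled.

```python
import hashlib

class PartitionedCountingFilter:
    """Counting Bloom filter with a separate counter array per hash function."""

    def __init__(self, num_hashes=3, slots_per_hash=1024):
        self.num_hashes = num_hashes
        self.slots = slots_per_hash
        self.tables = [[0] * slots_per_hash for _ in range(num_hashes)]

    def _indexes(self, flow_id):
        # One index per table, derived by salting SHA-256 with the table number.
        for k in range(self.num_hashes):
            digest = hashlib.sha256(f"{k}:{flow_id}".encode()).digest()
            yield k, int.from_bytes(digest[:8], "big") % self.slots

    def add_packet(self, flow_id):
        for k, idx in self._indexes(flow_id):
            self.tables[k][idx] += 1

    def estimate(self, flow_id):
        # Collisions can only inflate counters, so the minimum over the
        # tables never underestimates the true packet count.
        return min(self.tables[k][idx] for k, idx in self._indexes(flow_id))

    def is_long_flow(self, flow_id, threshold):
        return self.estimate(flow_id) >= threshold

# Hypothetical flow identifiers, for illustration only.
long_flow = "10.0.0.1:443->10.0.0.2:51000"
short_flow = "10.0.0.3:80->10.0.0.4:51001"

flt = PartitionedCountingFilter()
for _ in range(50):
    flt.add_packet(long_flow)
for _ in range(3):
    flt.add_packet(short_flow)

long_detected = flt.is_long_flow(long_flow, threshold=20)
```

With separate tables, two hash functions can never collide with each other on the same counter, so the only remaining collisions are between different flows within one table.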
12.
In this paper we show how surface orientation information inferred using shape-from-shading can be used to aid the process of fitting a 3D morphable model to an image of a face. We consider the problem of model dominance and show how shading constraints can be used to refine morphable model shape estimates, offering the possibility of exceeding the maximum possible accuracy of the model. We use this observation to motivate an optimisation scheme based on surface normal error. This ensures the fullest possible use of the information conveyed by the shading in an image. Moreover, our framework allows estimation of per-vertex albedo and bump maps which are not constrained to lie within the span of the model. This means the recovered model is capable of describing shape and reflectance phenomena not present in the training set. We show reconstruction and synthesis results and demonstrate that the shape and albedo estimates can be used for illumination-insensitive recognition using only a single gallery image.
13.
This paper presents a probabilistic approach for sensor-based localization with weak sensor data. Wireless received signal strength measurements are used to disambiguate sonar measurements in symmetric environments. Particle filters are used to model the multi-hypothesis estimation problem. Experiments indicate that multiple weak cues can provide robust position estimates and that multiple sensors also aid in solving the kidnapped robot problem.
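The disambiguation idea can be sketched with a one-dimensional particle filter. Everything concrete here is an assumption for illustration: the 10 m corridor, the linear signal-strength model, the noise levels, and the function names are not taken from the paper. The sonar reading is identical at positions x and 10 - x, and a broad signal-strength likelihood is enough to suppress the mirrored hypothesis.

```python
import math
import random

CORRIDOR = 10.0  # corridor length in metres (hypothetical)

def gaussian(error, sigma):
    """Unnormalised Gaussian likelihood of a measurement error."""
    return math.exp(-0.5 * (error / sigma) ** 2)

def sonar_model(x):
    # Distance to the nearest end wall: the same for x and CORRIDOR - x.
    return min(x, CORRIDOR - x)

def rss_model(x):
    # Toy signal strength (dBm) falling off linearly with distance
    # from an access point at x = 0.
    return -40.0 - 2.0 * x

def localize(true_x, steps=20, n=500, seed=0):
    rng = random.Random(seed)
    particles = [rng.uniform(0.0, CORRIDOR) for _ in range(n)]
    for _ in range(steps):
        sonar = sonar_model(true_x) + rng.gauss(0.0, 0.1)
        rss = rss_model(true_x) + rng.gauss(0.0, 3.0)
        # Weight each particle by both cues: sharp sonar, broad signal strength.
        weights = [
            gaussian(sonar - sonar_model(p), 0.1) * gaussian(rss - rss_model(p), 3.0)
            for p in particles
        ]
        if sum(weights) == 0:
            weights = [1.0] * n  # fall back to uniform if all weights underflow
        # Resample, then jitter so the particles do not collapse to one point.
        particles = rng.choices(particles, weights=weights, k=n)
        particles = [
            min(CORRIDOR, max(0.0, p + rng.gauss(0.0, 0.05))) for p in particles
        ]
    return sum(particles) / n

estimate = localize(true_x=2.0)
```

With the sonar cue alone, particles would settle around both 2 m and 8 m and the mean would be meaningless; the second cue is what makes a single-mode estimate possible in this toy setting.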
14.
Classification models based on statistical data have been developed that make it possible to identify a potential insider from indicators that manifest when data on the insider’s behavior are incomplete.
15.
A method for detecting full layout facsimile duplicates based on radial pixel densities is proposed. It caters for facsimiles, including text and/or graphics. Pages may be positioned upright or inverted on the scanner bed. The method is not dependent on the computation of text skew or text orientation. Using a database of original documents, 92% of non-duplicates and upright duplicates as well as 89% of inverted duplicates could be correctly identified. The method is vulnerable to double scanning. This occurs when documents are copied using a photocopier and the copies are subsequently transmitted using a facsimile machine.
Received September 29, 2000 / Revised: August 23, 2001
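The abstract does not define how the radial pixel densities are computed, so the following is only one plausible reading, offered as an assumption: count the fraction of dark pixels in concentric rings around the page centre. A signature of this kind is unchanged when the page is rotated by 180 degrees, which would account for inverted pages being matched without any orientation estimate. The function names and the tiny binary page are hypothetical.

```python
import math

def radial_densities(image, rings=4):
    """Fraction of dark pixels in each concentric ring around the page centre.

    image: list of rows of 0/1 values, where 1 marks a dark pixel.
    """
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    max_r = math.hypot(cy, cx) or 1.0
    dark = [0] * rings
    total = [0] * rings
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            radius = math.hypot(y - cy, x - cx)
            ring = min(rings - 1, int(rings * radius / max_r))
            total[ring] += 1
            dark[ring] += pixel
    return [d / t if t else 0.0 for d, t in zip(dark, total)]

def rotate_180(image):
    """Simulate a page placed upside down on the scanner bed."""
    return [row[::-1] for row in image[::-1]]

# Hypothetical 4 x 5 binary page.
page = [
    [1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
upright_signature = radial_densities(page)
inverted_signature = radial_densities(rotate_180(page))
```

In practice two scans of the same page would differ by noise, so signatures would be compared with a tolerance rather than for exact equality; the threshold is left open here because the abstract does not give one.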
16.
Social media platforms such as Twitter are becoming increasingly mainstream and provide valuable user-generated information through the publishing and sharing of content. Identifying interesting and useful content in large text streams is a crucial issue in social media because many users struggle with information overload. Retweeting, as a forwarding function, plays an important role in information propagation, and retweet counts simply reflect a tweet’s popularity. However, the main reason for retweets may be limited to personal interests and satisfaction. In this paper, we use topic identification as a proxy to understand a large number of tweets and to score the interestingness of an individual tweet based on its latent topics. Our assumption is that fascinating topics generate content that may be of potential interest to a wide audience. We propose a novel topic model called Trend Sensitive-Latent Dirichlet Allocation (TS-LDA) that can efficiently extract latent topics from content by modeling temporal trends on Twitter over time. The experimental results on real-world data from Twitter demonstrate that our proposed method outperforms several other baseline methods.
17.
Recently, recommendation systems have become popular on many e-commerce websites. They help users by suggesting products that they might buy. Existing work uses the user's past feedback, the similarity of other users’ buying patterns, or a hybrid approach in which both types of information are used. The pitfall of these approaches is that a huge amount of data must be collected and processed to produce good recommendations. This paper aims to develop an efficient recommendation system that incorporates the user’s emotion and interest to provide good recommendations. The proposed system does not require any of the aforementioned data and works without the continuous and interminable attention of the user. In this framework, we capture the user’s eye gaze and facial expression while they explore websites, using an inexpensive, visible-light “webcam”. The eye-gaze detection method extracts the pupil centers of both eyes and calculates the reference point through a joint probability. The facial expression method uses landmark points of the face to analyze the emotion of the user. Both methods work in approximately real time, and the proposed framework thus provides intelligent recommendations on the fly without requiring feedback or buying patterns from users.
18.
We address the fundamental problem of matching in two static images. The remaining challenges are related to occlusion and lack of texture. Our approach addresses these difficulties within a perceptual organization framework, considering both binocular and monocular cues. Initially, matching candidates for all pixels are generated by a combination of matching techniques. The matching candidates are then embedded in disparity space, where perceptual organization takes place in 3D neighborhoods and, thus, does not suffer from problems associated with scanline or image neighborhoods. The assumption is that correct matches produce salient, coherent surfaces, while wrong ones do not. Matching candidates that are consistent with the surfaces are kept and grouped into smooth layers. Thus, we achieve surface segmentation based on geometric and not photometric properties. Surface overextensions, which are due to occlusion, can be corrected by removing matches whose projections are not consistent in color with their neighbors of the same surface in both images. Finally, the projections of the refined surfaces on both images are used to obtain disparity hypotheses for unmatched pixels. The final disparities are selected after a second tensor voting stage, during which information is propagated from more reliable pixels to less reliable ones. We present results on widely used benchmark stereo pairs.
19.
Semantic image segmentation aims to partition an image into non-overlapping regions and assign a pre-defined object class label to each region. In this paper, a semantic method combining low-level features and high-level contextual cues is proposed to segment natural scene images. The proposed method first takes the gist representation of an image as its global feature. The image is then over-segmented into many super-pixels, and histogram representations of these super-pixels are used as local features. In addition, co-occurrence and spatial layout relations among object classes are exploited as contextual cues. Finally, the features and cues are integrated into an inference framework based on a conditional random field by defining specific potential terms and introducing weighting functions. The proposed method has been compared with state-of-the-art methods on the MSRC database, and the experimental results show its effectiveness.
20.
This paper proposes a multimodal approach to distinguish silence from speech situations, and to identify the location of the active speaker in the latter case. In our approach, a video camera is used to track the faces of the participants, and a microphone array is used to estimate the Sound Source Location (SSL) using the Steered Response Power with the phase transform (SRP-PHAT) method. The audiovisual cues are combined, and two competing Hidden Markov Models (HMMs) are used to detect silence or the presence of a person speaking. If speech is detected, the corresponding HMM also provides the spatio-temporally coherent location of the speaker. Experimental results show that incorporating the HMM improves the results over the unimodal SRP-PHAT, and the inclusion of video cues provides even further improvements.