20 similar documents found (search time: 15 ms)
Optimal algorithms for robust estimation and filtering are constructed. The noise is considered a deterministic variable belonging to a set described by a norm. Previous results, obtained for complete (one-to-one) and approximate information by M. Milanese and R. Tempo (ibid., vol.30, p.730-8, 1985), are extended to partial and approximate information. This information is considered useful in dealing with dynamic systems not completely identifiable and/or with two different sources of noise, such as process and measurement noise. For different norms characterizing the noise, optimal algorithms (in a min-max sense) are shown. In particular, for Hilbert norms a linear optimal algorithm is the well-known minimum variance estimator. For the l∞ norm an optimal algorithm, computable by linear programming, is presented. State estimation is formalized in the context of the general theory. For stable systems, an approximate state estimation is obtained by neglecting higher order powers. An upper bound determining the approximation introduced is derived. A numerical example illustrates the application of the theory.
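The min-max idea in this abstract can be illustrated with a minimal sketch for the simplest possible case: one unknown scalar and measurements whose errors are bounded in absolute value. This is not the algorithm from the paper. The function name `central_estimate` and the parameters `a`, `y`, and `eps` are hypothetical, and the multi-parameter case would need linear programming as the abstract states. Every measurement restricts the unknown to an interval, and the midpoint of the intersection minimizes the worst-case error.

```python
def central_estimate(a, y, eps):
    """Min-max estimate of a scalar x from y_i = a_i * x + e_i with |e_i| <= eps.

    Returns (estimate, worst_case_error). All names are illustrative assumptions.
    """
    lo, hi = float("-inf"), float("inf")
    for ai, yi in zip(a, y):
        if ai == 0:
            # This measurement carries no information about x, but it must
            # still be consistent with the noise bound.
            if abs(yi) > eps:
                raise ValueError("measurement inconsistent with noise bound")
            continue
        # |yi - ai * x| <= eps restricts x to an interval.
        b1, b2 = (yi - eps) / ai, (yi + eps) / ai
        lo = max(lo, min(b1, b2))
        hi = min(hi, max(b1, b2))
    if lo > hi:
        raise ValueError("no value of x is consistent with all measurements")
    if lo == float("-inf") or hi == float("inf"):
        raise ValueError("feasible set is unbounded")
    # The midpoint of the feasible interval minimizes the worst-case error,
    # which then equals half the interval width.
    return (lo + hi) / 2, (hi - lo) / 2


estimate, radius = central_estimate(a=[1.0, 2.0], y=[1.0, 2.4], eps=0.5)
```

In this example the first measurement confines x to [0.5, 1.5] and the second to [0.95, 1.45], so the estimate is the midpoint of [0.95, 1.45].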

Emotional speech identification and classification has become a predominant research area in the effort to enhance the naturalness and efficiency of spoken-language man–machine interfaces. The reliability and accuracy of such emotion identification depend greatly on feature selection and extraction. In this paper, a combined feature selection technique is proposed which uses the reduced feature set produced by a vector quantizer (VQ) in a Radial Basis Function Neural Network (RBFNN) environment for classification. In the initial stage, Linear Prediction Coefficients (LPC) and the time–frequency Hurst parameter (pH) are used to extract the relevant features, both exhibiting complementary information from the emotional speech. Extensive simulations have been carried out using the Berlin Database of Emotional Speech (EMO-DB) with various combinations of feature sets. The experimental results reveal 76% accuracy for pH and 68% for LPC using standalone feature sets, whereas the combination of feature sets (LP VQC and pH VQC) raises the average accuracy to 90.55%.

Bagging schemes on the presence of class noise in classification   Cited by: 1 (self-citations: 0, citations by others: 1)
In this paper, we study one application of Bagging credal decision trees, i.e. decision trees built using imprecise probabilities and uncertainty measures, on data sets with class noise (data sets with wrong assignments of the class label). For this aim, we first extend an original method that builds credal decision trees to one that works with continuous features and missing data. Through an experimental study, we show that Bagging credal decision trees outperforms more complex Bagging approaches on data sets with class noise. Finally, using a bias-variance error decomposition analysis, we also justify the performance of the Bagging credal decision trees method, showing that it achieves a stronger reduction of the variance error component.
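As a rough illustration of the Bagging scheme itself, the sketch below trains several base learners on bootstrap resamples of a data set with one mislabeled example and combines them by majority vote. It deliberately swaps in one-dimensional threshold stumps for the credal decision trees of the abstract, which are not reproduced here. The toy data set and all function names are assumptions made for the example.

```python
import random
from collections import Counter


def fit_stump(sample):
    """Pick the threshold t with the fewest training errors for 'predict 1 if x >= t'."""
    candidates = sorted({x for x, _ in sample}) + [float("inf")]

    def errors(t):
        return sum((x >= t) != bool(label) for x, label in sample)

    return min(candidates, key=errors)


def bagging_fit(data, n_models=25, seed=0):
    """Train one stump per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_models)]


def bagging_predict(thresholds, x):
    """Majority vote over the stumps; n_models is odd, so there are no ties."""
    votes = Counter(int(x >= t) for t in thresholds)
    return votes.most_common(1)[0][0]


# Toy data: the true class is 1 when x >= 10; the label at x = 3 is flipped (class noise).
data = [(x, int(x >= 10)) for x in range(20)]
data[3] = (3, 1)
model = bagging_fit(data)
```

Because each stump sees a different resample, the single noisy label influences only some of the base learners, and the vote averages that influence out. This is the variance-reduction effect the abstract refers to.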

This paper examines the performance of a Distributed Speech Recognition (DSR) system in the presence of both background noise and packet loss. Recognition performance is examined for feature vectors extracted from speech using a physiologically-based auditory model, as an alternative to the more commonly-used Mel Frequency Cepstral Coefficient (MFCC) front-end. The feature vectors produced by the auditory model are vector quantised and combined in pairs for transmission over a statistically modelled channel that is subject to packet burst loss. In order to improve recognition performance in the presence of noise, the speech is enhanced prior to feature extraction using Wiener filtering. Packet loss mitigation to compensate for missing features is also used to further improve performance. Speech recognition results show the benefit of combining speech enhancement and packet loss mitigation to compensate for channel and environmental degradations.

This paper deals with speech emotion analysis within the context of increasing awareness of the wide application potential of affective computing. Unlike most works in the literature which mainly rely on classical frequency and energy based features along with a single global classifier for emotion recognition, we propose in this paper some new harmonic and Zipf based features for better speech emotion characterization in the valence dimension and a multi-stage classification scheme driven by a dimensional emotion model for better emotional class discrimination. Experimented on the Berlin dataset with 68 features and six emotion states, our approach shows its effectiveness, displaying a 68.60% classification rate and reaching a 71.52% classification rate when a gender classification is first applied. Using the DES dataset with five emotion states, our approach achieves an 81% recognition rate when the best performance in the literature to our knowledge is 76.15% on the same dataset.

We have developed a program, the Early Vocalization Analyzer (EVA), that analyses digitized recordings of infant vocalizations. The purpose of such a system is to automatically and reliably screen infants who may be at risk for later communication problems. EVA applies the landmark detection theory of Stevens et al., for the recognition of acoustic features in adult speech, to detect syllables in vocalizations produced by typically developing six- to thirteen-month-old infants. We discuss the differences between adult-specific code and code written to analyse infant vocalizations. In a validity test, EVA achieved 90% agreement in marking 128 landmarks commonly identified by two human judges, and was often closer to one or both judges than the humans were to each other. In a second test EVA and a human judge had 86% agreement in identifying 150 landmarks.

This paper studies the synthesis of speech over a wide vocal effort continuum and its perception in the presence of noise. Three types of speech are recorded and studied along the continuum: breathy, normal, and Lombard speech. Corresponding synthetic voices are created by training and adapting the statistical parametric speech synthesis system GlottHMM. Natural and synthetic speech along the continuum is assessed in listening tests that evaluate the intelligibility, quality, and suitability of speech in three different realistic multichannel noise conditions: silence, moderate street noise, and extreme street noise. The evaluation results show that the synthesized voices with varying vocal effort are rated similarly to their natural counterparts both in terms of intelligibility and suitability.

The presence of outliers can considerably degrade the performance of linear recursive algorithms based on the assumption that measurements have a Gaussian distribution. Outliers are rare observations that are inconsistent with the bulk of the observed population. Therefore, the synthesis of robust algorithms is of primary interest. The Masreliez–Martin filter is used as a natural framework for realizing the state estimation algorithm for linear systems. The motivation for designing the modified extended Masreliez–Martin filter is to improve the performance and practical value of the Masreliez–Martin filter and to extend its application to nonlinear systems. The behaviour of the new approach to nonlinear filtering, in the case when measurements have non-Gaussian distributions, is illustrated by intensive simulations. Copyright © 2015 John Wiley & Sons, Ltd.
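To show why outliers hurt a Gaussian-based recursive filter and how bounding the innovation helps, here is a simplified scalar sketch. It is not the modified extended Masreliez–Martin filter of the abstract. It assumes a random-walk state, a Huber-type clipping function applied to the normalized innovation, and hypothetical parameter names (`q`, `r`, `c`). Setting `c` to infinity recovers the ordinary scalar Kalman filter.

```python
def huber_psi(v, c):
    """Huber-type influence function: identity near zero, clipped at +/- c."""
    return max(-c, min(c, v))


def robust_scalar_filter(measurements, q=1e-3, r=1.0, x0=0.0, p0=1.0, c=1.5):
    """Scalar filter for x_k = x_{k-1} + w_k, y_k = x_k + v_k with a clipped innovation.

    q and r are the assumed process and measurement noise variances.
    """
    x, p = x0, p0
    estimates = []
    for y in measurements:
        p = p + q                       # predict (random-walk model)
        s = p + r                       # innovation variance
        nu = (y - x) / s ** 0.5         # normalized innovation
        gain = p / s
        x = x + gain * s ** 0.5 * huber_psi(nu, c)
        # The covariance is reduced only when the innovation was not clipped.
        slope = 1.0 if abs(nu) <= c else 0.0
        p = p - gain * p * slope
        estimates.append(x)
    return estimates


data = [1.0] * 20 + [50.0] + [1.0] * 5           # one gross outlier at index 20
robust = robust_scalar_filter(data)
plain = robust_scalar_filter(data, c=float("inf"))
```

Comparing `robust[20]` with `plain[20]` shows how far the single outlier pulls each estimate. The clipped version moves by a bounded amount, while the unclipped version jumps in proportion to the outlier size.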

Maffiolo V, Chateau N. Ergonomics, 2003, 46(13-14): 1375-1385
The emotional quality of speech is defined as the global qualitative and hedonic impressions experienced by listeners. This research investigated the emotional quality of speech samples used in voice services. In a first experiment listening tests were conducted using 200 messages generated by 20 female speakers who pronounced two sentences in five elocution styles. Listeners grouped the messages according to similarities in terms of the impression of the messages. Verbal comments regarding hedonic effect on the listener and acoustic parameters of the voices' timbre and intonation were analysed. In a second experiment, the 200 messages were evaluated according to 20 criteria extracted from the first experiment. The results produced a precise perceptive portrait for each sequence, giving a full picture of the listeners' impressions of what they heard. The results can be applied to the design of voice services, as was done for the voicemail of France Telecom Orange.

In recent years, classification learning for data streams has become an important and active research topic. A major challenge posed by data streams is that their underlying concepts can change over time, which requires current classifiers to be revised accordingly and timely. To detect concept change, a common methodology is to observe the online classification accuracy. If accuracy drops below some threshold value, a concept change is deemed to have taken place. An implicit assumption behind this methodology is that any drop in classification accuracy can be interpreted as a symptom of concept change. Unfortunately however, this assumption is often violated in the real world where data streams carry noise that can also introduce a significant reduction in classification accuracy. To compound this problem, traditional noise cleansing methods are incompetent for data streams. Those methods normally need to scan data multiple times whereas learning for data streams can only afford one-pass scan because of data’s high speed and huge volume. Another open problem in data stream classification is how to deal with missing values. When new instances containing missing values arrive, how a learning model classifies them and how the learning model updates itself according to them is an issue whose solution is far from being explored. To solve these problems, this paper proposes a novel classification algorithm, flexible decision tree (FlexDT), which extends fuzzy logic to data stream classification. The advantages are three-fold. First, FlexDT offers a flexible structure to effectively and efficiently handle concept change. Second, FlexDT is robust to noise. Hence it can prevent noise from interfering with classification accuracy, and accuracy drop can be safely attributed to concept change. Third, it deals with missing values in an elegant way. 
Extensive evaluations are conducted to compare FlexDT with representative existing data stream classification algorithms using a large suite of data streams and various statistical tests. Experimental results suggest that FlexDT offers a significant benefit to data stream classification in real-world scenarios where concept change, noise and missing values coexist.
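The common detection methodology that this abstract criticizes (flag a concept change when online accuracy drops below a threshold) can be sketched as follows. This is not FlexDT. The class name, window size, and threshold value are assumptions chosen for illustration.

```python
from collections import deque


class AccuracyDropDetector:
    """Flags a possible concept change when windowed accuracy falls below a threshold."""

    def __init__(self, window=50, threshold=0.7):
        self.results = deque(maxlen=window)   # most recent outcomes, 1 = correct
        self.threshold = threshold

    def update(self, correct):
        """Record one prediction outcome; return True if a change is signalled."""
        self.results.append(1 if correct else 0)
        if len(self.results) < self.results.maxlen:
            return False                      # not enough evidence yet
        accuracy = sum(self.results) / len(self.results)
        return accuracy < self.threshold


detector = AccuracyDropDetector(window=50, threshold=0.7)
flags = [detector.update(True) for _ in range(50)]     # stable period
flags += [detector.update(False) for _ in range(20)]   # accuracy collapses
```

The detector cannot tell whether the wrong predictions come from a genuine concept change or from noisy labels. That ambiguity is the weakness of the accuracy-threshold assumption that the abstract points out.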

We consider robust regulation (against steps and sinusoids) in the presence of unstructured uncertainty. The unstructured uncertainty is norm bounded by a constant that is given a priori. This problem is equivalent to a certain multiobjective problem where one objective is robust regulation and the other is the standard objective of H∞ suboptimal control. It is shown that a solution to this problem exists if and only if the standard H∞ problem admits a solution and certain matrix inequalities are satisfied. These solvability conditions are readily computable. Controller synthesis is also addressed.

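This paper summarizes and analyzes key research results and methods from recent years in the field of emotional visual speech synthesis. According to the underlying synthesis mechanism, emotional visual speech synthesis techniques are systematically classified and described from two perspectives, image-based methods and model-based methods, and their respective advantages, disadvantages, and performance differences are analyzed and compared. The discussion focuses on the mechanisms by which, and the degree to which, the visual speech synthesized in each work achieves realism and emotional expressiveness. Finally, the paper points out several issues that deserve particular attention when synthesizing emotionally expressive visual speech, indicating directions for further research on emotional visual speech synthesis.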

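Speech emotion information is nonlinear, redundant, and high-dimensional, and the data contain a large amount of noise. Traditional recognition models have difficulty eliminating redundant and noisy information, which leads to very low speech emotion recognition accuracy. To improve recognition accuracy, an intelligent speech emotion recognition model based on a process neuron network is proposed, exploiting wavelet-analysis denoising and the nonlinear processing capability of neural networks. Wavelet analysis is used to denoise the speech emotion signal, principal component analysis is used to eliminate redundant information in the speech emotion features, and a process neuron network is used to classify and recognize speech emotion. Simulation results show that the recognition rate of the process-neuron-network-based model is 13% higher than that of K-nearest neighbors and 8.75% higher than that of a support vector machine, and that the model is an effective tool for intelligent speech emotion recognition.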

A survey of noise-robust speech recognition*   Cited by: 3 (self-citations: 1, citations by others: 2)
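Addressing the problem of speech recognition in noisy environments, this paper discusses existing noise-robust speech recognition techniques and sets out the main problems in noise-robust speech recognition research. Following the structure of a speech recognition system, the techniques are classified and summarized according to signal space, feature space, and model space, and the characteristics and implementation of each technique, as well as its application in speech recognition, are analyzed. Finally, directions for further research are outlined.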

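As the third most widely used application among Chinese Internet users, after instant messaging and search engines, online music and its related technologies have attracted the attention of researchers. Music, as humanity's most important medium of communication, carries rich emotional information, and computer-based music emotion analysis has received great attention in the field of affective human-computer interaction. In lyric-text-based music emotion analysis, a well-designed emotion lexicon for the music domain yields more fine-grained and more accurate results. Based on an improved Hevner emotion ring model, and drawing on the semantic resources provided by HowNet and a corpus of lyrics crawled from the web, a Chinese emotion lexicon for the music domain with a tree-shaped hierarchical structure is constructed. The time tags carried by LRC lyrics are used to obtain the speech-rate information of each song, and lyric emotion classification is implemented based on an emotion vector space model and the emotion lexicon. Experiments show that, compared with a manually constructed emotion lexicon, the constructed lexicon is better suited to the music domain.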

Many object-tracking algorithms are based on low-level features detected in the image. Typically, the object shape and position are estimated to fit the observed features. Unfortunately, image analysis methods often produce invalid features (outliers) which do not belong to the object boundary. These features have a strong influence on the shape estimates, leading to meaningless tracking results. This paper proposes a robust tracking algorithm which is able to deal with outliers, inspired by the probabilistic data association filter proposed in the context of point tracking. The algorithm is based on two key concepts. First, middle-level features (strokes) are used instead of low-level ones (edge points). Second, two labels (valid/invalid) are considered for each stroke. Since the stroke labels are unknown, all labeling sequences are considered and a probability (confidence degree) is assigned to each of them. In this way, all the strokes contribute to tracking the moving object but with different weights. This allows robust performance of the tracker in the presence of outliers. Experimental tests are provided to assess the performance of the proposed algorithm in lip and gesture tracking and surveillance applications.
