Found 20 similar documents; search took 15 ms
1.
The measurement of the word error rate (WER) of a speech recognizer is valuable for the development of new algorithms but provides only the most limited information about the performance of the recognizer. We propose the use of a human reference standard to assess the performance of speech recognizers, so that the performance of a recognizer could be quoted as being equivalent to the performance of a human hearing speech which is subject to X dB of degradation. This approach should have the major advantage of being independent of the database and speakers used for testing. Furthermore, it would allow factors beyond the word error rate to be measured, such as the performance within an interactive speech system. In this paper, we report on preliminary work to explore the viability of this approach. This has consisted of recording a suitable database for experimentation, devising a method of degrading the speech in a controlled way and conducting two sets of experiments on listeners to measure their responses to degraded speech to establish a reference. Results from these experiments raise several questions about the technique but encourage us to experiment with comparisons with automatic recognizers.
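For readers unfamiliar with the metric, the word error rate mentioned above is conventionally the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that conventional definition; the function name `word_error_rate` and the whitespace tokenization are assumptions for illustration, not something taken from the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()  # assumption: whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions are counted, this value can exceed 1.0 when the hypothesis is much longer than the reference.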
2.
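A speaker-independent isolated-word speech recognition system was implemented on the SEED-DEC5502 DSP embedded development platform. Compared with traditional speaker-dependent speech recognition systems, it requires no user training and is easy to use. The system adopts an improved real-time endpoint detection algorithm based on the rate of change of log-domain speech energy and extracts features only from the detected voiced segments, which reduces the number of speech frames to be processed. An improved decoding strategy that shares state emission probabilities among shared acoustic units is proposed, further reducing the computational load. Experiments show that with a 100-word vocabulary the system achieves a recognition rate of 98.1%, with a recognition time of 1.03 times real time.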
3.
Large vocabulary continuous speech recognition can benefit from an efficient data structure for representing a large number of acoustic hypotheses compactly. Word graphs or lattices have been chosen as such an efficient interface between acoustic recognition engines and subsequent language processing modules. This paper first investigates the effect of pruning during acoustic decoding on the quality of word lattices and shows that by combining different pruning options (at the model level and word level), we can obtain word lattices with comparable accuracy to the original lattices and a manageable size. In order to use the word lattices as the input for a post-processing language module, they should preserve the target hypotheses and their scores while being as small as possible. In this paper, we introduce a word graph compression algorithm that significantly reduces the number of words in the graphical representation without eliminating utterance hypotheses or distorting their acoustic scores. We compare this word graph compression algorithm with several other lattice size-reducing approaches and demonstrate the relative strength of the new word graph compression algorithm for decreasing the number of words in the representation. Experiments are conducted across corpora and vocabulary sizes to determine the consistency of the pruning and compression results.
4.
This paper presents an intelligent recognizer of the cognitive state of an e-learner as an integral part of a confidence-based e-learning (CBeL) system. It addresses the problem of providing technology-driven pedagogical support to an e-learner to achieve the desired cognitive state of mastery, which is characterized by high levels of both knowledge and confidence. To the best of our knowledge, no prior work has been done in the area of CBeL. The issue is crucial in the present scenario of teaching–learning in the twenty-first century, where lifelong learning is gaining increasing importance vis-à-vis traditional classroom teaching–learning, and self-learning is an indispensable mode of lifelong learning. It is felt that e-learning systems should have the capacity to simulate the behavior of a human expert to identify the gap between the learner's cognitive state and the learning objective, with the intention of guiding the self-learner to take the initiative to bridge the gap with appropriate action and eventually achieve his learning objective. An artificial neural network-based intelligent recognizer has been designed to identify the CBeL state of the learner on the basis of his performance in a CBeL test. This recognizer is the major agent that facilitates the implementation of the proposed CBeL system. Extensive experimentation has been carried out to verify the performance of the recognizer. Results show ample evidence that the ANN-based intelligent recognizer is able to faithfully simulate the behavior of a human evaluator.
5.
Given the challenge of enhancing the naturalness and efficiency of spoken-language man–machine interfaces, emotional speech identification and classification has been a predominant research area. The reliability and accuracy of such emotion identification depend greatly on feature selection and extraction. In this paper, a combined feature selection technique is proposed which uses the reduced feature set produced by a vector quantizer (VQ) in a Radial Basis Function Neural Network (RBFNN) environment for classification. In the initial stage, the Linear Prediction Coefficient (LPC) and the time–frequency Hurst parameter (pH) are utilized to extract the relevant features, both exhibiting complementary information from the emotional speech. Extensive simulations have been carried out using the Berlin Database of Emotional Speech (EMO-DB) with various combinations of feature sets. The experimental results reveal 76% accuracy for pH and 68% for LPC using standalone feature sets, whereas the combination of feature sets (LP VQC and pH VQC) enhances the average accuracy level up to 90.55%.
6.
Metamodeling techniques have been widely used in engineering design to improve efficiency in the simulation and optimization of design systems that involve computationally expensive simulation programs. Many existing applications are restricted to deterministic optimization. Very few studies have examined the accuracy of using metamodels for optimization under uncertainty. In this paper, using a two-bar structure system design as an example, various metamodeling techniques are tested for different formulations of optimization under uncertainty. Observations are made on the applicability and accuracy of these techniques, the impact of sample size, and the optimization performance when different formulations are used to incorporate uncertainty. Some important issues for applying metamodels to optimization under uncertainty are discussed.
7.
The Internet connects hundreds of millions of computers across the world running on multiple hardware and software platforms providing communication and commercial services. However, this interconnectivity among computers also enables malicious users to misuse resources and mount Internet attacks. The continuously growing number of Internet attacks poses severe challenges to the development of flexible, adaptive, security-oriented methods. An intrusion detection system (IDS) is one of the most important components used to detect Internet attacks. In the literature, different techniques from various disciplines have been utilized to develop efficient IDS. Artificial intelligence (AI) based techniques play a prominent role in the development of IDS and have many benefits over other techniques. However, there is no comprehensive review of AI based techniques that examines the current status of these techniques in solving intrusion detection problems. In this paper, various AI based techniques are reviewed with a focus on the development of IDS. Related studies are compared by their source of audit data, processing criteria, technique used, dataset, classifier design, feature reduction technique employed and other experimental environment setup. Benefits and limitations of AI based techniques are discussed. The paper will help readers better understand the different directions in which research has been done in the field of IDS. The findings of this paper provide useful insights into the literature and are beneficial for those who are interested in applications of AI based techniques to IDS and related fields. The review also provides future directions for research in this area.
8.
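To study the transient dynamic characteristics of a torpedo turbine rotor system, the transient equations of motion of the rotor system were established with the transfer matrix method, taking actual start-up conditions into account, and solved with the Newmark-β numerical integration method. The transient response histories of the rotor during different start-up processes were simulated and analyzed. The results show that when run-up processes of different functional forms (linear, exponential, piecewise) are considered, the critical speeds of each order of the turbine rotor system do not differ significantly, but the resonance peaks and the oscillation settling times differ considerably. Of these, the piecewise run-up process matches actual operating conditions best, and it yields the smallest resonance peak when passing through the second-order critical speed. This work can serve as a reference for the optimal design of torpedo turbine rotor systems.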
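As a rough illustration of the Newmark-β time integration named in this abstract, the sketch below applies the method to a single-degree-of-freedom oscillator m·x'' + c·x' + k·x = f(t). The single-degree-of-freedom model, the function name `newmark_beta`, and the default parameters β = 1/4, γ = 1/2 (the average-acceleration variant) are assumptions chosen for brevity; the paper's rotor model built with the transfer matrix method is not reproduced here.

```python
def newmark_beta(m, c, k, force, x0, v0, dt, steps, beta=0.25, gamma=0.5):
    """Integrate m*x'' + c*x' + k*x = force(t) with the Newmark-beta scheme.

    Returns the displacement and velocity histories as two lists of
    length steps + 1 (initial state included).
    """
    x, v = x0, v0
    a = (force(0.0) - c * v - k * x) / m  # initial acceleration from the ODE
    xs, vs = [x], [v]

    # Effective mass: constant for a linear system with a fixed time step.
    m_eff = m + gamma * dt * c + beta * dt * dt * k

    for n in range(1, steps + 1):
        t = n * dt
        # Predictors use only quantities known at the previous step.
        x_pred = x + dt * v + (0.5 - beta) * dt * dt * a
        v_pred = v + (1.0 - gamma) * dt * a
        # Enforce the equation of motion at the new time to get acceleration.
        a = (force(t) - c * v_pred - k * x_pred) / m_eff
        # Correctors add the contribution of the new acceleration.
        x = x_pred + beta * dt * dt * a
        v = v_pred + gamma * dt * a
        xs.append(x)
        vs.append(v)
    return xs, vs
```

A run-up simulation of the kind described would pass a time-varying `force` (for example an unbalance load whose frequency follows a linear, exponential, or piecewise speed profile); those profiles are left out here because the abstract does not specify them.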
9.
Automatic speech recognition is central to natural person-to-machine interaction. Because speaking styles vary widely, speech recognition requires composite methods to handle this variability. A speech recognition method can operate in many distinct modes, such as speaker-dependent or speaker-independent recognition, isolated, continuous or spontaneous speech recognition, and small to very large vocabularies. The Punjabi language is spoken by about 104 million people in India, Pakistan and other countries with Punjabi migrants. It is written in the Gurmukhi script in Indian Punjab and in the Shahmukhi script in Pakistani Punjab. The objective of this paper is to build a speaker-independent automatic spontaneous speech recognition system for the Punjabi language. The system is also capable of recognizing spontaneous live Punjabi speech. So far, no work has been done on spontaneous speech recognition for the Punjabi language. The user interface for the Punjabi live speech system was created in Java. So far, the system has been trained with 6012 Punjabi words and 1433 Punjabi sentences. The recognition accuracy is 93.79% for Punjabi words and 90.8% for Punjabi sentences.
10.
In this paper we propose a new method for utilising phase information by complementing it with traditional magnitude-only spectral subtraction speech enhancement through complex spectrum subtraction (CSS). The proposed approach has the following advantages over traditional magnitude-only spectral subtraction: (a) it introduces complementary information to the enhancement algorithm; (b) it reduces the total number of algorithmic parameters; and (c) it is designed for improving clean speech magnitude spectra and is therefore suitable for both automatic speech recognition (ASR) and speech perception applications. Oracle-based ASR experiments verify this approach, showing an average of 20% relative word accuracy improvements when accurate estimates of the phase spectrum are available. Based on sinusoidal analysis and assuming stationarity between observations (which is shown to be better approximated as the frame rate is increased), this paper also proposes a novel method for acquiring the phase information called Phase Estimation via Delay Projection (PEDEP). Further oracle ASR experiments validate the potential for the proposed PEDEP technique in ideal conditions. Realistic implementation of CSS with PEDEP shows performance comparable to state-of-the-art spectral subtraction techniques in a range of 15–20 dB signal-to-noise ratio environments. These results clearly demonstrate the potential for using phase spectra in spectral subtractive enhancement applications, and at the same time highlight the need for deriving more accurate phase estimates in a wider range of noise conditions.
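To make the contrast between magnitude-only and complex spectral subtraction concrete, the sketch below processes a single frame of complex spectral bins in both ways. It is an illustration under strong assumptions: the noise spectrum is treated as known exactly (an oracle), the function names are hypothetical, and the paper's PEDEP phase estimator is not implemented.

```python
import cmath


def magnitude_subtraction(noisy, noise_mag, floor=0.0):
    """Subtract noise magnitudes per bin and reuse the noisy phase."""
    enhanced = []
    for y, n in zip(noisy, noise_mag):
        mag = max(abs(y) - n, floor)  # floor prevents negative magnitudes
        enhanced.append(cmath.rect(mag, cmath.phase(y)))
    return enhanced


def complex_subtraction(noisy, noise_spec):
    """Subtract a complex noise estimate per bin (magnitude and phase)."""
    return [y - d for y, d in zip(noisy, noise_spec)]


# Two-bin toy frame: clean speech plus additive noise in the complex domain.
clean = [1 + 0j, 0 + 2j]
noise = [0 + 1j, -1 + 0j]
noisy = [s + d for s, d in zip(clean, noise)]

css_estimate = complex_subtraction(noisy, noise)
mag_estimate = magnitude_subtraction(noisy, [abs(d) for d in noise])
```

With an exact noise spectrum, the complex subtraction returns the clean bins, while the magnitude-only estimate is off in both magnitude and phase because speech and noise are not in phase. In practice the noise phase is unknown, which is why the quality of the phase estimate matters so much in the abstract's results.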
11.
We present a practical technique for using a writer-independent recognition engine to improve the accuracy and speed while reducing the training requirements of a writer-dependent symbol recognizer. Our writer-dependent recognizer uses a set of binary classifiers based on the AdaBoost learning algorithm, one for each possible pairwise symbol comparison. Each classifier consists of a set of weak learners, one of which is based on a writer-independent handwriting recognizer. During online recognition, we also use the n-best list of the writer-independent recognizer to prune the set of possible symbols and thus reduce the number of required binary classifications. In this paper, we describe the geometric and statistical features used in our recognizer and our all-pairs classification algorithm. We also present the results of experiments that quantify the effect incorporating a writer-independent recognition engine into a writer-dependent recognizer has on accuracy, speed, and user training time.
12.
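To address the poor robustness of speech recognition systems in noisy environments, a switching speech power spectrum estimation algorithm is proposed. Assuming that the speech magnitude spectrum follows a Chi distribution, an improved speech power spectrum estimator based on the minimum mean square error (MMSE) is derived. The speech presence probability (SPP) is then incorporated to obtain an improved SPP-based MMSE estimator. Next, the improved MMSE estimator is combined with a conventional Wiener filter: when noise interference is strong, the improved MMSE estimator is used to estimate the clean speech power spectrum, and when noise interference is weak, the conventional Wiener filter is used instead to reduce computation. The result is a switching speech power spectrum estimation algorithm for recognition systems. Experimental results show that, compared with the conventional MMSE estimator under a Rayleigh distribution, the proposed algorithm improves the recognition rate by about 8 percentage points on average across various noise conditions, and that it reduces the power consumption of the speech recognition system while removing noise interference and improving robustness.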
13.
Some necessary background in speech recognition and window systems is given, with an analysis of how they might be combined. Xspeak, a navigation application, and its operation and a field study of its use are described. With Xspeak, window navigation tasks usually performed with a mouse can be controlled by voice. An improved version, Xspeak II, which incorporates a language for translating spoken commands, is introduced.
14.
In this paper, we provide a comparative study of spectral front-end features used as representations for speech signals by processing multitaper magnitude and phase spectra, for speaker verification with expressive speech. In particular, the multitaper modified group delay function (MT-MOGDF) and multitaper magnitude (MT-MAG) spectra of the speech signals are employed to obtain low-variance estimates of speech spectra. We observe that the cues that aid in the representation of expressive speech are more evident in the MT-MOGDF spectrum than in the MT-MAG spectrum in terms of mean formant value and formant bandwidth. Our extensive experimental study on a speaker verification system with a Gaussian mixture model based universal background model classifier on expressive speech using the IITKGP-SESC and EMODB databases shows that MT-MOGDF performs better than the MT-MAG technique in terms of equal error rate and minimum decision cost function. This improvement is attributed to the better representation and low-variance estimate of the speech spectrum that MT-MOGDF provides. Our results highlight the utility of MT-MOGDF as a potential alternative to the MT-MAG representation for speaker verification problems in general.
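The equal error rate used as an evaluation measure above is the operating point at which the false acceptance rate and the false rejection rate coincide. The following sketch estimates it from two lists of verification scores by sweeping the observed scores as thresholds. The function name, the convention that higher scores mean "same speaker", and the example scores are assumptions for illustration; they are not data from the paper.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping thresholds over the observed scores.

    A trial is accepted when its score is >= the threshold. The returned
    value is the mean of FAR and FRR at the threshold where they are closest.
    """
    if not genuine_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")

    best_gap, best_eer = None, None
    for threshold in sorted(set(genuine_scores) | set(impostor_scores)):
        # False acceptance: impostor trials that pass the threshold.
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        # False rejection: genuine trials that fail the threshold.
        frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Hypothetical scores: one genuine trial scores below one impostor trial.
eer = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
```

With few trials the two error curves may never cross exactly, so averaging the rates at the closest threshold is one common approximation; interpolation between thresholds is another.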
17.
Although Modern Standard Arabic is taught in schools and used in written communication and TV/radio broadcasts, all informal communication is typically carried out in dialectal Arabic. In this work, we focus on the design of speech tools and resources required for the development of an Automatic Speech Recognition system for the Tunisian dialect. The development of such a system faces the challenges of the lack of annotated resources and tools, apart from the lack of standardization at all linguistic levels (phonological, morphological, syntactic and lexical) together with the mispronunciation dictionary needed for ASR development. In this paper, we present a historical overview of the Tunisian dialect and its linguistic characteristics. We also describe and evaluate our rule-based phonetic tool. Next, we go deeper into the details of Tunisian dialect corpus creation. This corpus is finally approved and used to build the first ASR system for Tunisian dialect with a Word Error Rate of 22.6%.
18.
International Journal of Speech Technology - Due to the properties of chaotic systems, namely randomness, sensitivity to the initial values and large key size, chaos masking plays a vital...
19.
The present paper introduces a new data analyzer, a compression-based self-organizing recognizer, the PRDC-CSOR (Pattern Representation scheme using Data Compression – Compression based Self ORganizing Recognizer), with a preliminary application to image data. The PRDC-CSOR is an extension of the authors' previously proposed pattern representation scheme using data compression (PRDC). In contrast to traditional statistical-model-based recognition methods, the PRDC-CSOR constructs itself using incoming data only. The basic tool, compressibility, is an approximation of the Kolmogorov complexity K(x), which is defined on an individual text x, in contrast to the Shannon entropy H(X), which is defined on an ensemble X. Due to this feature, a highly automatic self-organizing recognition system becomes possible, as demonstrated in this paper.
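The idea of using compressibility as a computable stand-in for Kolmogorov complexity can be illustrated with the normalized compression distance (NCD). NCD is a related, well-known measure and is named here as a substitute; it is not the PRDC or PRDC-CSOR procedure from the paper. The choice of `zlib` as the compressor and the example byte strings are likewise assumptions.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, a crude proxy for K(x)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Values near 0 suggest the inputs share most of their structure;
    values near 1 suggest they share very little.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical inputs: two near-identical texts and one unrelated byte pattern.
text_a = b"the quick brown fox jumps over the lazy dog " * 20
text_b = text_a.replace(b"lazy", b"idle")
unrelated = bytes(range(256)) * 4

similar_distance = ncd(text_a, text_b)
dissimilar_distance = ncd(text_a, unrelated)
```

No statistical model is trained here: the compressor alone decides how much structure two inputs share, which mirrors the abstract's point that the recognizer builds itself from incoming data only.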
20.
This paper addresses the problem of recognising speech in the presence of a competing speaker. We review a speech fragment decoding technique that treats segregation and recognition as coupled problems. Data-driven techniques are used to segment a spectro-temporal representation into a set of fragments, such that each fragment is dominated by one or other of the speech sources. A speech fragment decoder is used which employs missing data techniques and clean speech models to simultaneously search for the set of fragments and the word sequence that best matches the target speaker model. The paper investigates the performance of the system on a recognition task employing artificially mixed target and masker speech utterances. The fragment decoder produces significantly lower error rates than a conventional recogniser, and mimics the pattern of human performance that is produced by the interplay between energetic and informational masking. However, at around 0 dB the performance is generally quite poor. An analysis of the errors shows that a large number of target/masker confusions are being made. The paper presents a novel fragment-based speaker identification approach that allows the target speaker to be reliably identified across a wide range of SNRs. This component is combined with the recognition system to produce significant improvements. When the target and masker utterance have the same gender, the recognition system has a performance at 0 dB equal to that of humans; in other conditions the error rate is roughly twice the human error rate.