期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Supporting dictation speech recognition error correction: the impact of external information

《Behaviour & Information Technology》2012,31(6):761-774

相似文献

2.

The Military Computer Family: a New. Joint-Service Approach to Military Computer Acquisition

Coleman A.H. Smith W.R. 《Computer》1977,10(10):12-15

A new approach to developing and acquiring computers for the military has been under development over the past 2½ years by the Army's Electronics Command and the Navy's Air Systems Command. Known as the Military Computer Family Program, the joint effort now includes the Air Force's Electronic Systems Division and Rome Air Development Center (as observers). It is aimed at providing defense system developers with a software-compatible family of military computers that have extensive systems/support software. The various phases of the work being performed under the MCF Program are the subject of the principal articles in this issue. The overall objectives and major thrusts are summarized below. 相似文献

3.

Military applications of expert systems

John F. Gilmore 《Future Generation Computer Systems》1985,1(6):403-410

The advent of expert systems has led to the development of advanced computer systems in areas of medicine, geology, mathematics, chemistry, vision, speech and electronics. The recent acceptance of artificial intelligence as an appropriate technology for military applications has evolved into the development of a number of defense related expert systems. This paper reviews several military applications of expert systems in the areas of advanced visual target recognition, autonomous tactical vehicles, and combat pilot aid systems. System concepts for these areas are described. Several references are provided for each. A number of other military applications of expert systems are discussed through the paper. 相似文献

4.

Survey on speech emotion recognition: Features, classification schemes, and databases

Moataz El Ayadi Author Vitae Mohamed S. Kamel Author Vitae Author Vitae 《Pattern recognition》2011,44(3):572-587

Recently, increasing attention has been directed to the study of the emotional content of speech signals, and hence, many systems have been proposed to identify the emotional content of a spoken utterance. This paper is a survey of speech emotion classification addressing three important aspects of the design of a speech emotion recognition system. The first one is the choice of suitable features for speech representation. The second issue is the design of an appropriate classification scheme and the third issue is the proper preparation of an emotional speech database for evaluating system performance. Conclusions about the performance and limitations of current speech emotion recognition systems are discussed in the last section of this survey. This section also suggests possible ways of improving speech emotion recognition systems. 相似文献

5.

基于GMM-UBM的声纹识别技术的特征参数研究

周玥媛孔钦《计算机技术与发展》2020,(5):76-83

声纹识别技术实现的关键点在于从语音信号中提取语音特征参数,此参数具备表征说话人特征的能力。基于GMM-UBM模型,通过Matlab实现文本无关的声纹识别系统,对主流静态特征参数MFCC、LPCC、LPC以及结合动态参数的MFCC,从说话人确认与说话人辨认两种应用角度进行性能比较。在取不同特征参数阶数、不同高斯混合度和使用不同时长的训练语音与测试语音的情况下,从理论识别效果、实际识别效果、识别所用时长、识别时长占比等多个方面进行了分析与研究。最终结果表明:在GMM-UBM模式识别方法下,三种静态特征参数中MFCC绝大多数时候具有最佳识别效果,同时其系统识别耗时最长;识别率与语音特征参数的阶数之间并非单调上升关系。静态参数在结合较佳阶数的动态参数时能够提升识别效果;增加动态参数阶数与提高系统识别效果之间无必然联系。相似文献

6.

浅谈连续语音识别中的关键技术

车士伟吾守尔·斯拉木《电脑与信息技术》2010,18(2):6-9,65

连续语音识别系统的出现,更进一步的推动了连续语音识别的研究及应用,但识别技术的成熟也同时推动了更高层次的语音理解技术的研究。文章分别对连续语音识别中可能出现的关联词技术、关键词技术、顽健性技术、自适应性技术及搜索策略给予了介绍。相似文献

7.

Significance of the Modified Group Delay Feature in Speech Recognition

Rajesh M. Hegde Hema A. Murthy Venkata Ramana Rao Gadde 《IEEE transactions on audio, speech, and language processing》2007,15(1):190-202

Spectral representation of speech is complete when both the Fourier transform magnitude and phase spectra are specified. In conventional speech recognition systems, features are generally derived from the short-time magnitude spectrum. Although the importance of Fourier transform phase in speech perception has been realized, few attempts have been made to extract features from it. This is primarily because the resonances of the speech signal which manifest as transitions in the phase spectrum are completely masked by the wrapping of the phase spectrum. Hence, an alternative to processing the Fourier transform phase, for extracting speech features, is to process the group delay function which can be directly computed from the speech signal. The group delay function has been used in earlier efforts, to extract pitch and formant information from the speech signal. In all these efforts, no attempt was made to extract features from the speech signal and use them for speech recognition applications. This is primarily because the group delay function fails to capture the short-time spectral structure of speech owing to zeros that are close to the unit circle in the z-plane and also due to pitch periodicity effects. In this paper, the group delay function is modified to overcome these effects. Cepstral features are extracted from the modified group delay function and are called the modified group delay feature (MODGDF). The MODGDF is used for three speech recognition tasks namely, speaker, language, and continuous-speech recognition. Based on the results of feature and performance evaluation, the significance of the MODGDF as a new feature for speech recognition is discussed 相似文献

8.

A review of speech recognition applications in the office

Jan M. Noyes Clive R. Frankish 《Behaviour & Information Technology》1989,8(6):475-486

Since the 1970s, many improvements have been made in the technology available for automatic speech recognition (ASR). Changes in the methods of analysing the incoming speech have resulted in larger, more complex vocabularies being used with greater recognition accuracy. Despite this enhanced performance and substantial research activity, the introduction of voice input into the office is still largely unrealized. This paper reviews the state-of-the-art of office applications of ASR, dividing them into the areas of voice messaging and word processing activities, data entry and information retrieval systems, and environmental control. Within these areas, cartographic computer-aided-design systems are identified as an application with proven success. The slow growth of voice input in the office is discussed in the light of constraints imposed by existing speech technology, and the need for human factors evaluation of potential applications. 相似文献

9.

Towards an account of intuitiveness

《Behaviour & Information Technology》2012,31(6):475-482

Abstract

Since the 1970s, many improvements have been made in the technology available for automatic speech recognition (ASR). Changes in the methods of analysing the incoming speech have resulted in larger, more complex vocabularies being used with greater recognition accuracy. Despite this enhanced performance and substantial research activity, the introduction of voice input into the office is still largely unrealized. This paper reviews the state-of-the-art of office applications of ASR, dividing them into the areas of voice messaging and word processing activities, data entry and information retrieval systems, and environmental control. Within these areas, cartographic computer-aided-design systems are identified as an application with proven success. The slow growth of voice input in the office is discussed in the light of constraints imposed by existing speech technology, and the need for human factors evaluation of potential applications. 相似文献

10.

汉语大词汇量连续语音识别系统研究进展 总被引：6，自引：1，他引：5

倪崇嘉刘文举徐波《中文信息学报》2009,23(1):112

大词汇量连续语音识别(LVCSR)技术近年来发展迅速,并在许多领域得到了广泛的应用,国内外许多大公司加大了对语音识别技术的研究,不少商业化的语音识别系统已经面世,并得到较为广泛的使用。该文综述了近年来大词汇量连续语音识别技术的研究进展,描述了汉语大词汇量连续语音识别系统,主要是基于统计方法的语音识别系统的框架与设计方法,对语音识别系统的一些关键技术和原理进行了分析,并对近年来国内外对语音识别研究发展动向进行了讨论。相似文献

11.

连续语音识别技术及其应用前景分析

刘豫军夏聪《网络安全技术与应用》2014,(8):15-16

连续语音识别技术,是集语音处理、模式识别、句法和语义分析于一体的综合性语音处理技术,能够识别任意的连续语音,如一个句子或一段话,大大提高了语音交互的连续性和体验度,是语音识别技术的核心之一。本文介绍了连续语音识别技术的研究现状及几种常见的技术方法,并且分析探讨了连续语音识别技术的应用和发展前景。相似文献

12.

Improving the performance of keyword spotting system for children's speech through prosody modification

《Digital Signal Processing》2019

Searching for words of interest from a speech sequence is referred to as keyword spotting (KWS). A myriad of techniques have been proposed over the years for effectively spotting keywords from adults' speech. However, not much work has been reported on KWS for children's speech. The speech data for adult and child speakers differs significantly due to physiological differences between the two groups of speakers. Consequently, the performance of a KWS system trained on adults' speech degrades severely when used by children due to the acoustic mismatch. In this paper, we present our efforts towards improving the performance of keyword spotting systems for children's speech under limited data scenario. In this regard, we have explored prosody modification in order to reduce the acoustic mismatch resulting from the differences in pitch and speaking-rate. The prosody modification technique explored in this paper is the one based on glottal closure instant (GCI) events. The approach based on zero-frequency filtering (ZFF) is used to compute the GCI locations. Further, we have presented two different ways for effectively applying prosody modification. In the first case, prosody modification is applied to the children's speech test set prior to the decoding step in order to improve the recognition performance. Alternatively, we have also applied prosody modification to the training data from adult speakers. The original as well as the prosody modified adults' speech data are then augmented together before learning the statistical parameters of the KWS system. The experimental evaluations presented in this paper show that, significantly improved performances for children's speech are obtained by both of the aforementioned approaches of applying prosody modification. Prosody-modification-based data augmentation helps in improving the performance with respect to adults' speech as well. 相似文献

13.

DSP-based voice activity detection and background noise reduction

Charu Singh Maarten Venter Rajesh Kumar Muthu David Brown 《International Journal of Speech Technology》2018,21(4):851-859

These days’ speech processing devices like voice-controlled devices, radio, and cell phones have gained more popularity in the area of military, audio forensics, speech recognition, education and health sectors. In the real world, speech signal during communication always contains background noise. The main task of speech related applications is voice activity detection (VAD) which include speech communication, speech recognition, and speech coding. Noise-reduction schemes for speech communication may increase the quality of speech and improve working efficiency in military aviation. Most of the developed algorithms can improve the quality of speech but unable to remove the background noise from the speech. This study provides researchers with a summary of the challenges in speech communication with background noise and provides research directions in the area of military personnel and workforces who work in noisy environments. Results of the study reveal that the DSP-based voice activity detection and background noise reduction algorithm reduced the spurious values of the speech signal. 相似文献

14.

Speech Recognition and In-Vehicle Telematics Devices: Potential Reductions in Driver Distraction

Marvin C. McCallum John L. Campbell Joel B. Richman James L. Brown Emily Wiese 《International Journal of Speech Technology》2004,7(1):25-33

Speech Recognition is frequently cited as a potential remedy to distraction resulting from drivers' operation of in-vehicle devices. This position typically assumes that the introduction of speech recognition will result in reduced cognitive workload and improved driving performance. Past research neither fully supports nor fully discounts this assumption. However, it is difficult to compare many of these studies, due to differences in device operation tasks, the pacing of those tasks, speech recognition system performance, and system interface designs. In an effort to directly address the effect of voice recognition on driver distraction, the present authors developed a capability to manipulate the performance characteristics of a speech recognition system through a Wizard of Oz speech recognition system and installed this system in a simulated driving environment. The sensitivity of the simulated driving environment and speech recognition accuracy manipulation were evaluated in an initial study comparing driver cognitive workload and driving performance during self-paced simulated operation of a personal digital assistant (PDA) during no PDA use, manual control of the PDA, and speech control of the PDA. In the Speech PDA condition, speech recognition accuracy was varied between drivers. Analysis of drivers' emergency braking response times and rated cognitive workload revealed significantly lower cognitive demand and better performance in the No PDA condition when compared to the Manual PDA condition. The Speech PDA condition resulted in response times and rated cognitive workload levels that were between the No PDA and Manual PDA conditions, but not significantly different from either of these conditions. Further analysis of emergency braking performance revealed a non-significant trend towards better performance in conjunction with higher speech recognition accuracy levels. The potential for reducing driver distraction through the careful development and evaluation of speech recognition systems is discussed. 相似文献

15.

Speech Recognition Moves from Software to Hardware

《Computer》2006,39(11):15-18

Speech recognition has long promised a natural way to improve user interaction with computers, cars, and other devices. During the past 30 years, researchers have gradually upgraded the technology to the point that it is used in a number of these settings. However, because of limitations in processing power and other factors, the applications typically have been relatively simple, and speech recognition has not been widely used, despite the growing desire to implement it in PCs, cell phones, applications that automate home utilities and entertainment devices, and other systems. Researchers have been working on implementing speech recognition in dedicated processors for about 20 years, but the chips still have limited capabilities and work with only relatively small vocabularies. As such, few companies sell speech chips. Now, though, scientists are interested in developing high-end speech chips that work with large vocabularies of words and that recognize continuous speech. Despite its promise, speech-chip technology faces technical and marketplace challenges 相似文献

16.

Expanding a multilingual media monitoring and information extraction tool to a new language: Swahili

Ralf Steinberger Sylvia Ombuya Mijail Kabadjov Bruno Pouliquen Leo Della Rocca Jenya Belyaeva Monica de Paola Camelia Ignat Erik van der Goot 《Language Resources and Evaluation》2011,45(3):311-330

The Europe Media Monitor (EMM) family of applications is a set of multilingual tools that gather, cluster and classify news in currently fifty languages and that extract named entities and quotations (reported speech) from twenty languages. In this paper, we describe the recent effort of adding the African Bantu language Swahili to EMM. EMM is designed in an entirely modular way, allowing plugging in a new language by providing the language-specific resources for that language. We thus describe the type of language-specific resources needed, the effort involved, and ways of boot-strapping the generation of these resources in order to keep the effort of adding a new language to a minimum. The text analysis applications pursued in our efforts include clustering, classification, recognition and disambiguation of named entities (persons, organisations and locations), recognition and normalisation of date expressions, as well as the identification of reported speech quotations by and about people. 相似文献

17.

Design patterns in object-oriented frameworks

Srinivasan S. 《Computer》1999,32(2):24-32

Developing interactive software systems with complex user interfaces has become increasingly common. Given this trend, it is important that new technology be based on flexible architectures that do not require developers to understand all the complexities inherent in a system. Object-oriented frameworks provide an important enabling technology for reusing both the architecture and the functionality of software components. Frameworks typically have a steep learning curve since the user must understand the abstract design of the underlying framework as well as the object collaboration rules or contracts-which are often not apparent in the framework interface-prior to using the framework. The author describes her experience with developing an object-oriented framework for speech recognition applications that use IBM's ViaVoice speech recognition technology. Design patterns help to effectively communicate the internal framework design and reduce dependence on the documentation 相似文献

18.

Environment dependent noise tracking for speech enhancement

Nitish Krishnamurthy John H. L. Hansen 《International Journal of Speech Technology》2013,16(3):303-312

Numerous efforts have focused on the problem of reducing the impact of noise on the performance of various speech systems such as speech recognition, speaker recognition, and speech coding. These approaches consider alternative speech features, improved speech modeling, or alternative training for acoustic speech models. This study presents an alternative viewpoint by approaching the same problem from the noise perspective. Here, a framework is developed to analyze and use the noise information available for improving performance of speech systems. The proposed framework focuses on explicitly modeling the noise and its impact on speech system performance in the context of speech enhancement. The framework is then employed for development of a novel noise tracking algorithm for achieving better speech enhancement under highly evolving noise types. The first part of this study employs a noise update rate in conjunction with a target enhancement algorithm to evaluate the need for tracking in many enhancement algorithms. It is shown that noise tracking is more beneficial in some environments than others. This is evaluated using the Log-MMSE enhancement scheme for a corpus of four noise types consisting of Babble (BAB), White Gaussian (WGN), Aircraft Cockpit (ACN), and Highway Car (CAR) using the Itakura-Saito (IS) (Gray et al. in IEEE Trans. Acoust. Speech Signal Process. 28:367–376, 1980) quality measure. A test set of 200 speech utterances from the TIMIT corpus are used for evaluations. The new Environmentally Aware Noise Tracking (EA-NT) method is shown to be superior in comparison with the contemporary noise tracking algorithms. Evaluations are performed for speech degraded using a corpus of four noise types consisting of: Babble (BAB), Machine Gun (MGN), Large Crowd (LCR), and White Gaussian (WGN). Unlike existing approaches, this study provides an effective foundation for addressing noise in speech by emphasizing noise modeling so that available resources can be used to achieve more reliable overall performance in speech systems. 相似文献

19.

Constructing accurate and robust HMM/GMM models for an Arabic speech recognition system

Mohamed O. M. Khelifa Yahya Mohamed Elhadj Yousfi Abdellah Mostafa Belkasmi 《International Journal of Speech Technology》2017,20(4):937-949

Conventional Hidden Markov Model (HMM) based Automatic Speech Recognition (ASR) systems generally utilize cepstral features as acoustic observation and phonemes as basic linguistic units. Some of the most powerful features currently used in ASR systems are Mel-Frequency Cepstral Coefficients (MFCCs). Speech recognition is inherently complicated due to the variability in the speech signal which includes within- and across-speaker variability. This leads to several kinds of mismatch between acoustic features and acoustic models and hence degrades the system performance. The sensitivity of MFCCs to speech signal variability motivates many researchers to investigate the use of a new set of speech feature parameters in order to make the acoustic models more robust to this variability and thus improve the system performance. The combination of diverse acoustic feature sets has great potential to enhance the performance of ASR systems. This paper is a part of ongoing research efforts aspiring to build an accurate Arabic ASR system for teaching and learning purposes. It addresses the integration of complementary features into standard HMMs for the purpose to make them more robust and thus improve their recognition accuracies. The complementary features which have been investigated in this work are voiced formants and Pitch in combination with conventional MFCC features. A series of experimentations under various combination strategies were performed to determine which of these integrated features can significantly improve systems performance. The Cambridge HTK tools were used as a development environment of the system and experimental results showed that the error rate was successfully decreased, the achieved results seem very promising, even without using language models. 相似文献

20.

Analysis of lip geometric features for audio-visual speech recognition

Kaynak M.N. Qi Zhi Cheok A.D. Sengupta K. Zhang Jian Ko Chi Chung 《IEEE transactions on systems, man, and cybernetics. Part A, Systems and humans : a publication of the IEEE Systems, Man, and Cybernetics Society》2004,34(4):564-570

Audio-visual speech recognition employing both acoustic and visual speech information is a novel extension of acoustic speech recognition and it significantly improves the recognition accuracy in noisy environments. Although various audio-visual speech-recognition systems have been developed, a rigorous and detailed comparison of the potential geometric visual features from speakers' faces is essential. Thus, in this paper the geometric visual features are compared and analyzed rigorously for their importance in audio-visual speech recognition. Experimental results show that among the geometric visual features analyzed, lip vertical aperture is the most relevant; and the visual feature vector formed by vertical and horizontal lip apertures and the first-order derivative of the lip corner angle leads to the best recognition results. Speech signals are modeled by hidden Markov models (HMMs) and using the optimized HMMs and geometric visual features the accuracy of acoustic-only, visual-only, and audio-visual speech recognition methods are compared. The audio-visual speech recognition scheme has a much improved recognition accuracy compared to acoustic-only and visual-only speech recognition especially at high noise levels. The experimental results showed that a set of as few as three labial geometric features are sufficient to improve the recognition rate by as much as 20% (from 62%, with acoustic-only information, to 82%, with audio-visual information at a signal-to-noise ratio of 0 dB). 相似文献