1.

《Behaviour & Information Technology》2012,31(6):475-482

Abstract

Since the 1970s, many improvements have been made in the technology available for automatic speech recognition (ASR). Changes in the methods of analysing the incoming speech have resulted in larger, more complex vocabularies being used with greater recognition accuracy. Despite this enhanced performance and substantial research activity, the introduction of voice input into the office is still largely unrealized. This paper reviews the state of the art in office applications of ASR, dividing them into the areas of voice messaging and word processing activities, data entry and information retrieval systems, and environmental control. Within these areas, cartographic computer-aided-design systems are identified as an application with proven success. The slow growth of voice input in the office is discussed in the light of constraints imposed by existing speech technology, and the need for human factors evaluation of potential applications.

2.

Ergonomic improvement using natural language processing for voice-directed order selection in large industrial settings

David T. Goomas Timothy D. Ludwig 《Human Factors and Ergonomics in Manufacturing & Service Industries》2023,33(6):537-544

This field study examined the automatic speech recognition (ASR) of voice-directed computerized systems for order selectors employed in large industrial settings (e.g., fulfillment centers, distribution centers, warehouses, and manufacturing plants). Voice-directed systems for order selection require selectors to listen to instructions via a headset and speak into a microphone, directing each worker to select products for store orders throughout the facility. Originally, ASR used voice recognition that required “voice enrollment” (voice setup) for each worker, plus a trainer's time as part of the setup. Voice setup generally averaged about 60 min for both the worker and the trainer. More recently, a newer technology utilizes “speech recognition,” which eliminates voice enrollment altogether. This study compared order selector voice setup times under voice recognition and speech recognition in five facilities. In two distribution centers where speech recognition was implemented, all voice setup hours for all order selectors (n = 55) plus the trainer's time were eliminated. This amounted to a total savings of 110 h. Moreover, speech recognition yields a recurring saving for each new employee entering the organization. The focus of training now shifts from voice setup to immediately training workers to select orders via voice, an ergonomic improvement.
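The reported total can be checked with a short calculation, assuming that "about 60 min for both the worker and the trainer" means roughly one hour for each of them per enrollment (that reading is an assumption, not stated explicitly in the abstract):

```latex
\[
  55 \ \text{selectors} \times \bigl(1\,\text{h (worker)} + 1\,\text{h (trainer)}\bigr) = 110\,\text{h}
\]
```

Under that reading, the 110 h figure is consistent with the stated sample size of n = 55.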

3.

Motion does matter: an examination of speech-based text entry on the move

Kathleen J. Price Min Lin Jinjuan Feng Rich Goldman Andrew Sears Julie A. Jacko 《Universal Access in the Information Society》2006,4(3):246-257

Desktop interaction solutions are often inappropriate for mobile devices due to small screen size and portability needs. Speech recognition can improve interactions by providing a relatively hands-free solution that can be used in various situations. While mobile systems are designed to be transportable, few studies have examined the effects of motion on mobile interactions. This paper investigates the effect of motion on automatic speech recognition (ASR) input for mobile devices. Recognition error rates (RER) were examined with subjects walking or seated while performing text input tasks, along with the effect of ASR enrollment conditions on RER. The results suggest changes in user training of ASR systems for mobile and seated usage.

4.

AUTOMATIC SPEECH RECOGNITION

Louis Fried 《Information Systems Management》1996,13(1):29-37

Automatic speech recognition (ASR) technology provides a natural interface for mission-critical multimedia applications. This article discusses the state of ASR technology, selection of an ASR system, and an approach for developing ASR applications.

5.

Speech recognition and synthesis technology development at NTT for telecommunications services

Kazuo Hakoda Mikio Kitai Shigeki Sagayama 《International Journal of Speech Technology》1997,2(2):145-153

This paper describes recent developments at NTT in the areas of speech recognition, speech synthesis, and interactive voice systems as they relate to telecommunications applications. Speaker-independent large-vocabulary speech recognition based on context-dependent phone models and an LR parser, and high-quality text-to-speech (TTS) conversion using the waveform concatenation method, both realized as software, have enabled interactive voice systems for fast and easy prototyping of telephone-based applications. Practical applications are discussed with examples.

6.

Merge-Weighted Dynamic Time Warping for Speech Recognition

Zhang Xianglilan Luo Zhigang Li Ming 《Journal of Computer Science and Technology》2014,29(6):1072-1082

Obtaining training material for rarely used English words and common given names from countries where English is not spoken is difficult due to excessive time, storage and cost factors. By considering pe...

7.

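Design and application of a CT-based speech recognition control system

Tan Baohua Xiong Jianmin Liu Yaohe 《Techniques of Automation and Applications》2005,24(6):10-12

Voice control is one of the hot topics in intelligent interface technology. Voice control mainly means that, under various conditions, a machine can accurately recognize the content of human speech and carry out the speaker's intentions according to the information the speech contains. Based on the authors' engineering development practice, a CT system supported by IVR, TTS, and ASR systems was implemented, and, building on an implementation of computer graphics motion logic, CT-based voice control logic was established and tested.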

8.

Automatic speech recognition in practice

Dylan M. Jones Clive R. Frankish Kevin Hapeshi 《Behaviour & Information Technology》1992,11(2):109-122

There is a growing interest in the commercial possibilities offered by automatic speech recognition (ASR) technology. Unfortunately the prospective user has little independent guidance with respect to the potential success of any proposed implementation. There do exist a few general human factors guidelines on the use of ASR, but most of the corpus of knowledge that forms part of the lore within the ASR community is based on the unpublished experiences of system developers and users themselves. The present paper attempts to redress this balance; it is a summary of the experiences of users and system designers at 30 research and commercial sites in the UK and USA where ASR has been extensively used or tested. The applications represented were classified as vehicle, office, industrial, and aids for disabled people. A number of important human factors issues were identified, and the relative success of the various applications is discussed.

9.

Multi-band automatic speech recognition

《Computer Speech and Language》2001,15(2):151-174

This paper presents a new architecture for automatic speech recognition systems which is characterized by the division of the spectral domain of the speech signal into several independent frequency bands. This model is based on the psycho-acoustic work of Fletcher (1953), who proposed a similar principle for the human auditory system. Jont B. Allen published a paper in 1994 in which he summarized the work of Fletcher and also proposed adapting the multi-band paradigm to automatic speech recognition (ASR) (Allen, 1994). Many researchers have since studied this principle and built such ASR systems. The goal of this paper is to analyse some of the most important issues in the design of a multi-band ASR system in order to determine which architecture it should have in which environment. Two other major problems are then considered: how to train multi-band systems and how to use them for continuous ASR.

10.

Automatic speech recognition- an approach for designing inclusive games

Moyen Mohammad Mustaquim 《Multimedia Tools and Applications》2013,66(1):131-146

Computer games are now a part of our modern culture. However, certain categories of people are excluded from this form of entertainment and social interaction because they are unable to use the interface of the games. The reason for this can be deficits in motor control, vision or hearing. By using automatic speech recognition (ASR) systems, voice-driven commands can be used to control the game, which opens up the possibility for people with motor system difficulties to be included in game communities. This paper aims to find a standard way of using voice commands in games, with a speech recognition system in the backend, that can be universally applied for designing inclusive games. Present speech recognition systems, however, do not support emotions, attitudes, tones, etc. This is a drawback because such expressions can be vital for gaming. Taking multiple existing game genres into account and analyzing their voice command requirements, a general ASRS module is proposed which can work as a common platform for designing inclusive games. A fuzzy logic controller is then proposed to enhance the system. The standard voice-driven module can be based on an algorithm or a fuzzy controller, which can be used to design software plug-ins or can be included in a microchip. It can then be integrated with game engines, creating the possibility of voice-driven universal access for controlling games.

11.

Commercial Speech Recognition Technology in the Military Domain: Results of Two Recent Research Efforts

David T. Williamson Mark H. Draper Gloria L. Calhoun Timothy P. Barry 《International Journal of Speech Technology》2005,8(1):9-16

While speech recognition technology has long held the potential for improving the effectiveness of military operations, it has only been within the last several years that speech systems have enabled the realization of that potential. Commercial speech recognition technology developments aimed at improving robustness for automotive and cellular phone applications have capabilities that can be exploited in various military systems. This paper discusses the results of two research efforts directed toward applying commercial-off-the-shelf speech recognition technology in the military domain. The first effort discussed is the development and evaluation of a speech recognition interface to the Theater Air Planning system responsible for the generation of air tasking orders in a military Air Operations Center. The second effort examined the utility of speech versus conventional manual input for tasks performed by operators in an unmanned aerial vehicle control station simulator. Both efforts clearly demonstrate the military benefits obtainable from the proper application of speech technology.

12.

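Design of an SoPC-based isolated-word speech recognition system

Sun Yu Guo Baozeng 《Microcomputer & Its Applications》2012,31(2):74-76,79

Using the SoPC approach, an isolated-word speech recognition system based on the dynamic time warping (DTW) algorithm was implemented; the system can serve as a voice command control module for electrical appliance systems. Taking the characteristics of embedded systems into account, the endpoint detection algorithm and the pattern matching algorithm were selected and adjusted. Experiments show that the system's running speed and recognition accuracy meet the requirements of voice control. The SoPC design approach is flexible and well suited to improving and upgrading the system.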
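For readers unfamiliar with DTW-based isolated-word matching, the following is a minimal Python sketch of the textbook algorithm, not the paper's implementation. It assumes toy one-dimensional features (a real system would typically compare frames of spectral feature vectors), and the names `dtw_distance`, `recognize`, and the template labels are hypothetical.

```python
from typing import Dict, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW: minimal accumulated cost of aligning sequence a with b."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the best accumulated cost for aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # toy local distance (assumption)
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its frame
                cost[i][j - 1],      # b advances, a repeats its frame
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def recognize(utterance: Sequence[float],
              templates: Dict[str, Sequence[float]]) -> str:
    """Return the label of the stored template closest to the utterance."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


# Hypothetical command templates and a time-stretched utterance of "on".
templates = {"on": [1, 3, 5, 3, 1], "off": [5, 5, 1, 1]}
print(recognize([1, 3, 3, 5, 3, 1], templates))
```

The quadratic table is the simplest formulation; embedded implementations such as the one described usually restrict the warping path or reduce memory, which this sketch does not attempt.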

13.

Beyond ASR 1-best: Using word confusion networks in spoken language understanding

Dilek Hakkani-Tür Frédéric Béchet Giuseppe Riccardi Gokhan Tur 《Computer Speech and Language》2006,20(4):495-514

We are interested in the problem of robust understanding from noisy spontaneous speech input. With the advances in automated speech recognition (ASR), there has been increasing interest in spoken language understanding (SLU). A challenge in large vocabulary spoken language understanding is robustness to ASR errors. State-of-the-art spoken language understanding relies on the best ASR hypotheses (ASR 1-best). In this paper, we propose methods for a tighter integration of ASR and SLU using word confusion networks (WCNs). WCNs obtained from ASR word graphs (lattices) provide a compact representation of multiple aligned ASR hypotheses along with word confidence scores, without compromising recognition accuracy. We present our work on exploiting WCNs instead of simply using ASR one-best hypotheses. In this work, we focus on the tasks of named entity detection and extraction and call classification in a spoken dialog system, although the idea is more general and applicable to other spoken language processing tasks. For named entity detection, we have improved the F-measure by 6–10% absolute by using both word lattices and WCNs. The processing of WCNs was 25 times faster than that of lattices, which is very important for real-life applications. For call classification, we have shown between 5% and 10% relative reduction in error rate using WCNs compared to ASR 1-best output.
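To make the contrast between 1-best output and a word confusion network concrete, here is a small illustrative Python sketch. It is not the authors' system: it models a WCN as a list of bins of competing words with posterior scores, and the `<eps>` empty-word label, the `threshold` parameter, and the example data are all assumptions made for illustration.

```python
from typing import List, Tuple

Bin = List[Tuple[str, float]]  # competing words with posterior probabilities
EPS = "<eps>"                  # hypothetical label for "no word in this slot"


def one_best(wcn: List[Bin]) -> List[str]:
    """Pick the highest-posterior word in every bin (the ASR 1-best view)."""
    words = []
    for bin_ in wcn:
        word, _ = max(bin_, key=lambda wp: wp[1])
        if word != EPS:
            words.append(word)
    return words


def candidates(wcn: List[Bin], threshold: float) -> List[List[str]]:
    """Keep, per bin, every real word whose posterior reaches the threshold."""
    return [[w for w, p in bin_ if p >= threshold and w != EPS] for bin_ in wcn]


# Made-up network: the last bin is nearly tied between two city names.
example_wcn = [
    [("fly", 0.9), (EPS, 0.1)],
    [("to", 0.8), ("two", 0.2)],
    [("austin", 0.45), ("boston", 0.55)],
]
print(one_best(example_wcn))
print(candidates(example_wcn, 0.3))
```

In this toy case the 1-best string commits to one city, while the thresholded view keeps both near-tied alternatives available to a downstream named entity detector, which is the kind of information the abstract argues is lost when only the 1-best hypothesis is used.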

14.

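Speech signal recognition based on blind source separation and noise suppression

Liu Jing 《Computer Measurement & Control》2018,26(12):140-144

To recognize different speech signals more accurately in noisy environments, a self-optimizing voice activity detection (VAD) algorithm for pervasive speech environments is proposed. The algorithm operates on the speech signals of a personalized automatic voice-command recognition system and can effectively separate an individual speaker's voice from the mixed speech of multiple speakers, detecting the speaker's speech signal by tracking the higher-magnitude portion of the speech power spectrum and adaptively suppressing noise. An automatic speech recognition (ASR) system that handles multi-speaker tasks was designed and implemented; it requires no prior estimate of clean speech variation and instead uses the noise itself to generate the speech/non-speech decision threshold, completing the self-optimization process. Evaluation tests were carried out on the NOIZEUS speech database, and the experimental results show that the proposed blind source separation and noise suppression method needs no additional computation, effectively reducing the computational burden.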

15.

Efficient Noise Robust Feature Extraction Algorithms for Distributed Speech Recognition (DSR) Systems

Bojan Kotnik Damjan Vlaj Bogomir Horvat 《International Journal of Speech Technology》2003,6(3):205-219

The evolution of robust speech recognition systems that maintain a high level of recognition accuracy in difficult and dynamically-varying acoustical environments is becoming increasingly important as speech recognition technology becomes a more integral part of mobile applications. In a distributed speech recognition (DSR) architecture the recogniser's front-end is located in the terminal and is connected over a data network to a remote back-end recognition server. The terminal performs the feature parameter extraction, or the front-end of the speech recognition system. These features are transmitted over a data channel to the remote back-end recogniser. DSR provides particular benefits for the applications of mobile devices such as improved recognition performance compared to using the voice channel and ubiquitous access from different networks with a guaranteed level of recognition performance. A feature extraction algorithm integrated into the DSR system is required to operate in real-time as well as with the lowest possible computational costs. In this paper, two innovative front-end processing techniques for noise robust speech recognition are presented and compared: time-domain based frame-attenuation (TD-FrAtt) and frequency-domain based frame-attenuation (FD-FrAtt). These techniques include different forms of frame-attenuation, improvement of spectral subtraction based on minimum statistics, as well as a mel-cepstrum feature extraction procedure. Tests are performed using the Slovenian SpeechDat II fixed telephone database and the Aurora 2 database together with the HTK speech recognition toolkit. The results obtained are especially encouraging for mobile DSR systems with limited sizes of available memory and processing power.

16.

Voice input/output in inspection data entry

Thomas J. Betty Andrew A. Jens E. Clay Watkins 《Computers & Industrial Engineering》1985,9(3):215-223

Today's voice technology can provide voice input/output systems that can be used effectively in the factory. This will happen only if the proper technology is connected to suitable applications. Discrete word-dependent speaker recognition in inspection data entry is one such combination.

17.

Design and implementation of a user-oriented speech recognition interface: the synergy of technology and human factors

《Interacting with computers》1994,6(1):41-60

The design and implementation of a user-oriented speech recognition interface are described. The interface enables the use of speech recognition in so-called interactive voice response systems which can be accessed via a telephone connection. In the design of the interface a synergy of technology and human factors is achieved. This synergy is very important for making speech interfaces a natural and acceptable form of human-machine interaction. Important concepts such as interfaces, human factors and speech recognition are discussed. Additionally, a sketch of the interface's implementation indicates how the synergy of human factors and technology can be realised. An explanation is also provided of how the interface might be fruitfully integrated into different applications.

18.

Exploring the Use of Speech Features and Their Corresponding Distribution Characteristics for Robust Speech Recognition

《IEEE transactions on audio, speech, and language processing》2009,17(1):84-94

The performance of current automatic speech recognition (ASR) systems often deteriorates radically when the input speech is corrupted by various kinds of noise sources. Several methods have been proposed to improve ASR robustness over the last few decades. The related literature can be generally classified into two categories according to whether the methods are directly based on the feature domain or consider some specific statistical feature characteristics. In this paper, we present a polynomial regression approach that has the merit of directly characterizing the relationship between speech features and their corresponding distribution characteristics to compensate for noise interference. The proposed approach and a variant were thoroughly investigated and compared with a few existing noise robustness approaches. All experiments were conducted using the Aurora-2 database and task. The results show that our approaches achieve considerable word error rate reductions over the baseline system and are comparable to most of the conventional robustness approaches discussed in this paper.

19.

A study on the challenges and opportunities of speech recognition for Bengali language

Mridha M. F. Ohi Abu Quwsar Hamid Md Abdul Monowar Muhammad Mostafa 《Artificial Intelligence Review》2022,55(4):3431-3455

Speech recognition is a fascinating process that offers the opportunity to interact with and command machines in the field of human-computer interaction. Speech recognition is a language-dependent system constructed directly on the linguistic and textual properties of a language. Automatic speech recognition (ASR) systems are currently being used to translate speech to text flawlessly. Although ASR systems are widely deployed for international languages, their implementation for the Bengali language has not reached an acceptable state. In this research work, we carefully survey the current status of research on Bengali ASR systems. We then present the challenges most often encountered while constructing a Bengali ASR system. We split the challenges into language-dependent and language-independent challenges and give guidance on how particular complications may be overcome. Following a rigorous investigation and highlighting the challenges, we conclude that Bengali ASR systems require ASR architectures constructed specifically around the Bengali language’s grammatical and phonetic structure.

20.

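Design of an intelligent control system based on speech recognition technology

Wang Fuzhong Huang Wenhao 《Automation & Instrumentation》2006,21(4):8-10

Speech recognition technology has developed very rapidly in recent years and has already been applied successfully in many areas. Taking the application of speech recognition technology in a talking doll as an example, this paper explains how to use speech recognition technology to design an intelligent control system and describes the system's structure and principles in detail. The system is highly extensible: with minor modifications, a wide variety of voice control systems can be designed.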