Similar literature
1.
Since the 1970s, many improvements have been made in the technology available for automatic speech recognition (ASR). Changes in the methods of analysing the incoming speech have resulted in larger, more complex vocabularies being used with greater recognition accuracy. Despite this enhanced performance and substantial research activity, the introduction of voice input into the office is still largely unrealized. This paper reviews the state of the art in office applications of ASR, dividing them into the areas of voice messaging and word processing activities, data entry and information retrieval systems, and environmental control. Within these areas, cartographic computer-aided-design systems are identified as an application with proven success. The slow growth of voice input in the office is discussed in the light of constraints imposed by existing speech technology, and the need for human factors evaluation of potential applications.

2.
This field study examined the automatic speech recognition (ASR) of voice-directed computerized systems for order selectors employed in large industrial settings (e.g., fulfillment centers, distribution centers, warehouses, and manufacturing plants). Voice-directed systems for order selection require selectors to listen to instructions via a headset and speak into a microphone, directing each worker to select products for store orders throughout the facility. Originally, ASR used voice recognition that required “voice enrollment” (voice setup) for each worker, plus a trainer's time as part of the setup. Voice setup generally averaged about 60 min for both the worker and the trainer. A newer technology utilizes “speech recognition,” which eliminates voice enrollment altogether. This study compared order selector voice setup times under voice recognition and speech recognition in five facilities. In two distribution centers where speech recognition was implemented, all voice setup hours for all order selectors (n = 55) plus the trainer's time were eliminated. This amounted to a total savings of 110 h. Moreover, speech recognition yields a recurring saving for each new employee entering the organization. The focus of training shifts from voice setup to immediately training workers to select orders via voice, an ergonomic improvement.
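
The 110 h figure in the abstract above can be checked with a short calculation. This is only a sketch: the abstract gives the number of selectors and the average setup time, but the reading that each setup session costs one worker hour plus one trainer hour is an assumption, and the function name is hypothetical.

```python
# Figures taken from the abstract; counting two people (worker + trainer)
# per setup session is an assumption about how the 110 h total arises.
SELECTORS = 55        # order selectors in the two distribution centers
SETUP_MINUTES = 60    # approximate average voice setup time per person


def setup_hours_saved(selectors: int, setup_minutes: float,
                      people_per_session: int = 2) -> float:
    """Hours freed when voice enrollment is eliminated entirely."""
    return selectors * people_per_session * setup_minutes / 60


total_saved = setup_hours_saved(SELECTORS, SETUP_MINUTES)
print(total_saved)
```

Under that reading, 55 selectors at about one hour each for two people matches the reported total.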

3.
This paper describes recent developments at NTT in the areas of speech recognition, speech synthesis, and interactive voice systems as they relate to telecommunications applications. Speaker-independent large-vocabulary speech recognition based on context-dependent phone models and an LR parser, and high-quality text-to-speech (TTS) conversion using the waveform concatenation method, both realized as software, have enabled interactive voice systems for fast and easy prototyping of telephone-based applications. Practical applications are discussed with examples.

4.
This article presents an overview of different approaches for providing automatic speech recognition (ASR) technology to mobile users. Three principal system architectures with respect to the employment of a wireless communication link are analyzed: Embedded Speech Recognition Systems, Network Speech Recognition (NSR) and Distributed Speech Recognition (DSR). An overview of the solutions standardized so far and a critical analysis of the latest developments in the field of speech recognition in mobile environments are given. Open issues, pros and cons of the different methodologies and techniques are highlighted. Special emphasis is placed on the constraints and limitations that ASR applications face under different architectures.

5.
Desktop interaction solutions are often inappropriate for mobile devices due to small screen size and portability needs. Speech recognition can improve interactions by providing a relatively hands-free solution that can be used in various situations. While mobile systems are designed to be transportable, few studies have examined the effects of motion on mobile interactions. This paper investigates the effect of motion on automatic speech recognition (ASR) input for mobile devices. Speech recognition error rates (RER) were examined with subjects walking or seated while performing text input tasks, as was the effect of ASR enrollment conditions on RER. The obtained results suggest changes in user training of ASR systems for mobile and seated usage.

6.

Speech recognition is a fascinating process that offers the opportunity to interact with and command the machine in the field of human-computer interaction. A speech recognition system is language-dependent, constructed directly from the linguistic and textual properties of a language. Automatic speech recognition (ASR) systems are currently being used to translate speech to text flawlessly. Although ASR systems are well developed for international languages, their implementation for the Bengali language has not reached an acceptable state. In this research work, we carefully survey the current status of research on Bengali ASR systems. We then present the challenges most often encountered while constructing a Bengali ASR system. We split the challenges into language-dependent and language-independent challenges and suggest how each may be overcome. Following a rigorous investigation and highlighting the challenges, we conclude that Bengali ASR systems require ASR architectures constructed specifically around the Bengali language’s grammatical and phonetic structure.

7.
Ergonomics, 2012, 55(11): 1359-1370
Abstract

This paper presents the results of two experiments examining the effect of voice generation and recognition systems on dual-task performance. In the first experiment subjects performed a task combination consisting of a spatial short-term memory task and a verbal short-term memory task. In the second experiment the subjects performed a combination consisting of a one-dimensional compensatory tracking task and the verbal short-term memory task used in experiment I. In both experiments, stimuli for the verbal short-term memory task were presented either visually on a CRT or auditorily using a voice generation system. Subjects responded either by using a keypad or by using a voice recognition system. A strictly between-subjects design was used in both experiments to avoid problems associated with asymmetric transfer. In both experiments the use of a voice generation system benefited dual-task performance. Experiment I showed no significant difference between speech responses and manual responses on any dependent measure. Experiment II showed significantly faster correct reaction times (RTs) for speech responses to the verbal short-term memory task only when the responses were adjusted for the delay inherent in the speech recognition system. The implications of these studies for the application of voice generation and recognition systems are discussed.

8.
Automatic speech recognition (ASR) technology provides a natural interface for mission-critical multimedia applications. This article discusses the state of ASR technology, selection of an ASR system, and an approach for developing ASR applications.

9.
There is a growing interest in the commercial possibilities offered by automatic speech recognition (ASR) technology. Unfortunately the prospective user has little independent guidance with respect to the potential success of any proposed implementation. There do exist a few general human factors guidelines on the use of ASR, but most of the corpus of knowledge that forms part of the lore within the ASR community is based on the unpublished experiences of system developers and users themselves. The present paper attempts to redress this balance; it is a summary of the experiences of users and system designers at 30 research and commercial sites in the UK and USA where ASR has been extensively used or tested. The applications represented were classified as vehicle, office, industrial, and aids for disabled people. A number of important human factors issues were identified, and the relative success of the various applications is discussed.

10.
Obtaining training material for rarely used English words and common given names from countries where English is not spoken is difficult due to excessive time, storage and cost factors. By considering pe...

11.
Computer games are now a part of our modern culture. However, certain categories of people are excluded from this form of entertainment and social interaction because they are unable to use the interface of the games. The reason for this can be deficits in motor control, vision or hearing. By using automatic speech recognition (ASR) systems, voice-driven commands can be used to control the game, which opens up the possibility for people with motor system difficulty to be included in game communities. This paper aims to find a standard way of using voice commands in games that use a speech recognition system in the backend, one that can be universally applied for designing inclusive games. Present speech recognition systems, however, do not support emotions, attitudes, tones, etc. This is a drawback because such expressions can be vital for gaming. Taking multiple existing game genres into account and analyzing their voice command requirements, a general ASRS module is proposed which can work as a common platform for designing inclusive games. A fuzzy logic controller is then proposed to enhance the system. The standard voice-driven module can be based on an algorithm or a fuzzy controller, and can be used to design software plug-ins or be embedded in a microchip. It can then be integrated with game engines, creating the possibility of voice-driven universal access for controlling games.

12.
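Voice control is one of the active topics in intelligent interface technology. It refers mainly to a machine's ability, under a variety of conditions, to recognize the content of human speech accurately and to carry out the speaker's intentions according to the information the speech contains. Drawing on the author's engineering development practice, a CT technology system supported by IVR, TTS, and ASR systems was implemented, and, building on an implementation of computer-graphics motion logic, voice control logic based on CT technology was established and tested.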

13.
14.
This paper presents a new architecture for automatic speech recognition systems which is characterized by the division of the spectral domain of the speech signal into several independent frequency bands. This model is based on the psycho-acoustic work of Fletcher (1953), who proposed a similar principle for the human auditory system. Jont B. Allen published a paper in 1994 in which he summarized the work of Fletcher and also proposed to adapt the multi-band paradigm to automatic speech recognition (ASR) (Allen, 1994). Many researchers have since studied this principle and built such ASR systems. The goal of this paper is to analyse some of the most important issues in the design of a multi-band ASR system in order to determine which architecture it should have in which environment. Two other major problems are then considered: how to train multi-band systems and how to use them for continuous ASR.
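
The multi-band idea described in the abstract above can be illustrated with a toy sketch. The abstract does not say how the bands are delimited or how their outputs are merged, so everything here is an assumption for illustration: equal-width bands, a weighted sum of per-band log-likelihoods as the recombination rule, and the hypothetical names `split_into_bands` and `recombine`.

```python
def split_into_bands(spectrum, n_bands):
    """Split a list of spectral magnitudes into n_bands contiguous sub-bands."""
    n = len(spectrum)
    bounds = [i * n // n_bands for i in range(n_bands + 1)]
    return [spectrum[bounds[i]:bounds[i + 1]] for i in range(n_bands)]


def recombine(band_log_likelihoods, weights):
    """Weighted sum of per-band log-likelihoods for each candidate word.

    Each band is assumed to have been scored by its own independent
    recognizer; a low weight models a band judged unreliable (e.g. noisy).
    """
    words = band_log_likelihoods[0].keys()
    return {
        w: sum(wt * band[w] for wt, band in zip(weights, band_log_likelihoods))
        for w in words
    }


bands = split_into_bands(list(range(8)), 4)

# Hypothetical per-band scores: the second band disagrees with the others.
band_scores = [
    {"yes": -1.0, "no": -3.0},
    {"yes": -4.0, "no": -2.0},
    {"yes": -1.5, "no": -2.5},
]
weights = [0.5, 0.1, 0.4]  # down-weight the band assumed to be corrupted

combined = recombine(band_scores, weights)
best = max(combined, key=combined.get)
print(bands, best)
```

Down-weighting one band keeps a localized disturbance from dominating the decision, which is the intuition usually given for processing frequency bands independently.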

15.
This paper explains a new hybrid method for Automatic Speaker Recognition (ASR) using speech signals, based on the Artificial Neural Network (ANN). ASR performance is regarded as the foremost challenge and needs to be improved. This research work mainly focuses on resolving the ASR problems and on improving the accuracy of speaker prediction. Mel Frequency Cepstral Coefficients (MFCC) are used for signal feature extraction. The input samples are created from these extracted features, and their dimensionality is reduced using a Self Organizing Feature Map (SOFM). Finally, using the reduced input samples, recognition is performed using a Multilayer Perceptron (MLP) with Bayesian Regularization. The network was trained and verified on real speech datasets from the Multivariability speaker recognition database for 10 speakers. The proposed method is validated by performance estimation and by comparing classification accuracies against other models. The proposed method gives a better recognition rate, and 93.33% accuracy is attained.
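
The first stage of the pipeline in the abstract above (MFCC feature extraction) rests on the mel frequency scale. The sketch below shows only that ingredient, not the authors' method: the abstract does not say which mel formula or filter count was used, so the common `2595 * log10(1 + f / 700)` convention and the helper names are assumptions.

```python
import math


def hz_to_mel(hz: float) -> float:
    """Convert frequency in Hz to mels (one widely used convention)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(low_hz: float, high_hz: float, n_filters: int):
    """Center frequencies (Hz) of n_filters filters spaced evenly in mels."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (n_filters + 1)
    return [mel_to_hz(low_mel + step * (i + 1)) for i in range(n_filters)]


# Hypothetical settings: 10 filters between 0 Hz and 4 kHz.
centers = mel_center_frequencies(0.0, 4000.0, 10)
print([round(c) for c in centers])
```

Because the filters are evenly spaced in mels, their centers crowd together at low frequencies and spread out at high ones, which is the property MFCC features rely on.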

16.
While speech recognition technology has long held the potential for improving the effectiveness of military operations, it has only been within the last several years that speech systems have enabled the realization of that potential. Commercial speech recognition technology developments aimed at improving robustness for automotive and cellular phone applications have capabilities that can be exploited in various military systems. This paper discusses the results of two research efforts directed toward applying commercial-off-the-shelf speech recognition technology in the military domain. The first effort discussed is the development and evaluation of a speech recognition interface to the Theater Air Planning system responsible for the generation of air tasking orders in a military Air Operations Center. The second effort examined the utility of speech versus conventional manual input for tasks performed by operators in an unmanned aerial vehicle control station simulator. Both efforts clearly demonstrate the military benefits obtainable from the proper application of speech technology.

17.
We are interested in the problem of robust understanding from noisy spontaneous speech input. With the advances in automated speech recognition (ASR), there has been increasing interest in spoken language understanding (SLU). A challenge in large vocabulary spoken language understanding is robustness to ASR errors. State-of-the-art spoken language understanding relies on the best ASR hypotheses (ASR 1-best). In this paper, we propose methods for a tighter integration of ASR and SLU using word confusion networks (WCNs). WCNs obtained from ASR word graphs (lattices) provide a compact representation of multiple aligned ASR hypotheses along with word confidence scores, without compromising recognition accuracy. We present our work on exploiting WCNs instead of simply using ASR 1-best hypotheses. In this work, we focus on the tasks of named entity detection and extraction and call classification in a spoken dialog system, although the idea is more general and applicable to other spoken language processing tasks. For named entity detection, we improved the F-measure by 6–10% absolute by using both word lattices and WCNs. The processing of WCNs was 25 times faster than that of lattices, which is very important for real-life applications. For call classification, we have shown between 5% and 10% relative reduction in error rate using WCNs compared to ASR 1-best output.
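
A word confusion network, as described in the abstract above, can be pictured as a sequence of slots, each holding competing words with confidence scores. The sketch below is a toy illustration under stated assumptions, not the authors' system: the list-of-dicts representation, the example words and scores, the gazetteer lookup, and the threshold are all hypothetical.

```python
# A WCN modeled as a list of slots; each slot maps candidate words to
# confidence scores. All words and scores here are invented for illustration.
wcn = [
    {"fly": 0.9, "buy": 0.1},
    {"to": 0.8, "two": 0.2},
    {"austin": 0.45, "boston": 0.55},
]


def one_best(network):
    """Pick the highest-confidence word in every slot (the ASR 1-best path)."""
    return [max(slot, key=slot.get) for slot in network]


def find_entities(network, gazetteer, threshold=0.3):
    """Return (position, word, confidence) for every gazetteer word whose
    confidence reaches the threshold, including words not on the 1-best path."""
    hits = []
    for pos, slot in enumerate(network):
        for word, conf in slot.items():
            if word in gazetteer and conf >= threshold:
                hits.append((pos, word, conf))
    return hits


cities = {"austin", "boston"}
print(one_best(wcn))
print(find_entities(wcn, cities))
```

The 1-best path keeps only "boston", while the network still exposes "austin" as a plausible alternative with its score, which is the kind of information a downstream understanding component can exploit when the top hypothesis is wrong.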

18.
19.
Today's voice technology can provide voice input/output systems that can be used effectively in the factory. This will happen only if the proper technology is connected to suitable applications. Discrete word-dependent speaker recognition in inspection data entry is one such combination.

20.
Abstract

Although automatic speech recognition (ASR) can provide a medium of controlling computers which is relatively easy to use, novice users often have problems with it during their initial practices. In this study, two methods for training subjects to use ASR are compared. One group of subjects received a short demonstration given by an experienced ASR user and the other group received verbal instructions on how to use the device. The results show that subjects given a demonstration achieved better performance than those given instructions (p < 0.005). This is explained by virtue of the fact that the successful use of ASR requires procedural knowledge which is better acquired through some form of practice than through instruction. It is concluded that a demonstration provides ‘practice by proxy’. ‘Task like’ forms of enrolment are discussed. It is suggested that although they can provide the possibility of practice, they are not applicable to all types of ASR use. A demonstration provides users with task familiarization, and an appropriate style of speech.
