1.
The art and science of speech recognition have been advanced to the state where it is now possible to communicate reliably with a computer by speaking to it in a disciplined manner using a vocabulary of moderate size. It is the purpose of this paper to outline two aspects of speech-recognition research. First, we discuss word recognition as a classical pattern-recognition problem and show how some fundamental concepts of signal processing, information theory, and computer science can be combined to give us the capability of robust recognition of isolated words and simple connected word sequences. We then describe methods whereby these principles, augmented by modern theories of formal language and semantic analysis, can be used to study some of the more general problems in speech recognition. It is anticipated that these methods will ultimately lead to accurate mechanical recognition of fluent speech under certain controlled conditions.
2.
Accurate detection of the boundaries of a speech utterance during a recording interval has been shown to be crucial for reliable and robust automatic speech recognition. The endpoint detection problem is fairly straightforward for high-level speech signals spoken in low-level stationary noise environments (e.g. signal-to-noise ratios greater than 30 dB). However, these ideal conditions do not always exist. One example where reliable word detection is difficult is speech spoken in a mobile environment: because of road, tire, and fan noise, detecting speech often becomes problematic. Currently, most endpoint detection algorithms use only signal energy and duration information to perform the endpoint detection task. These algorithms perform quite well at reasonable signal-to-noise ratios. However, under the harshest conditions (e.g. in a car traveling at 60 mph with the fan on high) these algorithms begin to fail. In this paper, an endpoint detection algorithm based on hidden Markov model (HMM) technology is presented. The algorithm explicitly determines a set of speech endpoints from the output of a Viterbi decoding algorithm. It was tested with both a template-based speech recognition system and an HMM-based system. Using a speaker-dependent speech database from four talkers, recorded in a mobile environment under five different driving conditions (including traveling at 60 mph with the fan on), we tested several endpoint detection schemes. The results showed that, under some conditions, the HMM-based approach to endpoint detection performed significantly better than the energy-based system. When trained on clean inputs and tested on the 11-word digit vocabulary (zero through nine, and oh) with speech recorded in various mobile environments, the system using the HMM endpoint detector achieved an overall accuracy of 99.7%. The equivalent accuracy of the energy-based endpoint detector was 95.2% in a template-based recognizer.
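The two detector families contrasted in this abstract can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's algorithm: the frame sizes (20 ms frames with a 10 ms hop at an assumed 8 kHz sampling rate), the noise-floor margin, the minimum-duration rule, and the two-state silence/speech HMM with Gaussian log-energy emissions are all hypothetical simplifications. The paper decodes with full recognition HMMs; here a Viterbi path over just two states stands in for that.

```python
import numpy as np

def frame_log_energy(x, frame_len=160, hop=80):
    """Log energy (dB) per frame; 160/80 samples = 20/10 ms at an assumed 8 kHz rate."""
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])
    return 10.0 * np.log10(np.sum(frames ** 2, axis=1) + 1e-10)

def energy_endpoints(log_e, noise_frames=10, margin_db=10.0, min_frames=5):
    """Energy + duration detector (hypothetical parameters).

    The noise floor is the mean log energy of the first `noise_frames` frames.
    Frames more than `margin_db` above it are 'speech'; runs shorter than
    `min_frames` are discarded. Returns (first_frame, last_frame) or None.
    """
    floor = float(np.mean(log_e[:noise_frames]))
    above = log_e > floor + margin_db
    runs, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(above) - 1))
    long_runs = [r for r in runs if r[1] - r[0] + 1 >= min_frames]
    if not long_runs:
        return None
    return long_runs[0][0], long_runs[-1][1]

def viterbi_endpoints(log_e, mu, sigma, p_stay=0.95):
    """Two-state HMM (0 = silence, 1 = speech) with Gaussian log-energy emissions.

    mu, sigma: length-2 arrays of per-state mean/std of log energy (assumed known).
    Endpoints are the first and last frames labeled 'speech' on the Viterbi path.
    """
    log_a = np.log(np.array([[p_stay, 1 - p_stay], [1 - p_stay, p_stay]]))
    # Per-frame, per-state Gaussian log-likelihood (constant term dropped).
    ll = -0.5 * ((log_e[:, None] - mu) / sigma) ** 2 - np.log(sigma)
    T = len(log_e)
    delta = np.log([0.5, 0.5]) + ll[0]
    back = np.zeros((T, 2), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_a          # scores[i, j]: best path ending i -> j
        back[t] = np.argmax(scores, axis=0)
        delta = scores[back[t], [0, 1]] + ll[t]
    path = np.empty(T, dtype=int)
    path[-1] = np.argmax(delta)
    for t in range(T - 1, 0, -1):                # backtrace
        path[t - 1] = back[t, path[t]]
    speech = np.flatnonzero(path == 1)
    return (int(speech[0]), int(speech[-1])) if speech.size else None
```

The energy detector makes a hard per-frame decision against a threshold, while the Viterbi version weighs per-frame likelihoods against a transition penalty over the whole utterance; that global decision is one plausible reason a decoding-based detector can degrade more gracefully as the energy margin shrinks.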
3.
Gunn R.N. Sargent P.A. Bench C.J. Rabiner E.A. Osman S. Pike V.W. Hume S.P. Grasby P.M. Lammertsma A.A. 《NeuroImage》1998,8(4):426-440
[Carbonyl-¹¹C]WAY-100635 is a promising PET radioligand for the 5-HT1A receptor, having demonstrated more favorable characteristics for in vivo imaging than the previously available [O-methyl-¹¹C]WAY-100635. The current study evaluates different tracer kinetic modelling strategies for the quantification of 5-HT1A receptor binding in human brain. Mathematical modelling of the carbonyl-labeled radiotracer is investigated using compartmental structures, including both plasma input and reference tissue approaches. Furthermore, the application of basis function methods allows for the investigation of parametric imaging, providing functional maps of both delivery and binding of the radioligand. Parameter estimates of binding from normal volunteers indicate a low intra- versus a high intersubject variability. It is concluded that a simplified reference tissue approach may be used to quantify 5-HT1A binding either in terms of ROI data or as parametric images.
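To make "simplified reference tissue approach" and "basis function methods" concrete, here is a minimal sketch of a basis-function fit of the standard simplified reference tissue model (SRTM). It is a generic illustration, not the study's implementation: it assumes uniformly sampled time-activity curves, a rectangle-rule convolution, and a hypothetical linear grid of θ3 values (practical implementations usually use non-uniform PET frames and log-spaced grids). With θ1 = R1, θ2 = k2 − R1·k2/(1 + BP) and θ3 = k2/(1 + BP), the model is linear in θ1 and θ2 for each fixed θ3, and BP = θ1 + θ2/θ3 − 1.

```python
import numpy as np

def srtm_basis_fit(t, c_ref, c_tis, theta3_grid):
    """Basis-function fit of the simplified reference tissue model (SRTM).

    Model, for uniformly sampled t (an assumption of this sketch):
        C_T(t) = theta1 * C_R(t) + theta2 * conv(C_R, exp(-theta3 * t))(t)
    For each theta3 on the grid, theta1 and theta2 follow from linear least
    squares; the theta3 with the smallest residual sum of squares is kept.
    """
    dt = t[1] - t[0]
    best = None
    for theta3 in theta3_grid:
        # Basis function: reference curve convolved with a decaying exponential.
        basis = np.convolve(c_ref, np.exp(-theta3 * t))[: len(t)] * dt
        A = np.column_stack([c_ref, basis])
        coef, *_ = np.linalg.lstsq(A, c_tis, rcond=None)
        rss = float(np.sum((A @ coef - c_tis) ** 2))
        if best is None or rss < best[0]:
            best = (rss, coef[0], coef[1], theta3)
    _, theta1, theta2, theta3 = best
    return {
        "R1": theta1,                       # relative delivery
        "k2": theta2 + theta1 * theta3,     # since theta2 = k2 - R1 * theta3
        "BP": theta1 + theta2 / theta3 - 1.0,
    }
```

Because each voxel needs only a handful of small linear solves, running this fit voxel by voxel is what makes parametric maps of delivery (R1) and binding (BP) practical.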
4.
Ephraim Y. Rabiner L.R. 《IEEE Transactions on Information Theory》1990,36(2):372-380
Some relations among approaches that have been applied to estimating models for acoustic signals in speech recognition systems are examined. In particular, the modeling approaches based on maximum likelihood (ML), maximum mutual information (MMI), and minimum discrimination information (MDI) are studied. It is shown that all three approaches can be formulated uniformly as MDI modeling approaches for simultaneous estimation of the acoustic models for all words in the vocabulary and that none of the approaches requires any model correctness assumption. The three approaches differ in the effective source being modeled and in the probability distribution attributed to this source.
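For readers unfamiliar with the three criteria, the sketch below only contrasts the quantities they optimize on a toy problem; it does not reproduce the paper's unified MDI formulation. It assumes the per-token log-likelihoods log p(x | w) under each word model and the word priors p(w) are already given as arrays (a hypothetical setup): ML scores each token only under its correct model, MMI scores the posterior of the correct word against all competitors, and discrimination information is the Kullback-Leibler divergence between two distributions.

```python
import numpy as np

def ml_objective(log_lik, labels):
    """ML criterion: sum over tokens of log p(x | correct word).

    log_lik: (n_tokens, n_words) array of log p(x_n | w); labels: correct word index per token.
    """
    return float(np.sum(log_lik[np.arange(len(labels)), labels]))

def mmi_objective(log_lik, labels, log_prior):
    """MMI criterion: sum of log p(w | x) = log p(x|w)p(w) - log sum_w' p(x|w')p(w')."""
    joint = log_lik + log_prior
    log_evidence = np.logaddexp.reduce(joint, axis=1)   # log-sum-exp over competing words
    return float(np.sum(joint[np.arange(len(labels)), labels] - log_evidence))

def discrimination_information(p, q):
    """Discrimination information (Kullback-Leibler divergence) D(p || q), discrete case."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0                                         # terms with p = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
```

Raising the ML score can leave competing models equally likely, whereas the MMI score increases only when the correct word gains probability relative to its competitors.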
5.
Cox R.V. Kamm C.A. Rabiner L.R. Schroeter J. Wilpon J.G. 《Proceedings of the IEEE》2000,88(8):1314-1337
In the future, the world of telecommunications will be vastly different from what it is today. The driving force will be the seamless integration of real-time communications (e.g. voice, video, music) and data into a single network, with ubiquitous access to that network anywhere, anytime, and by a wide range of devices. The only currently available ubiquitous access device to the network is the telephone, and the only ubiquitous user access mode is spoken voice commands and natural language dialogues with machines. In the future, new access devices and modes will augment speech in this role, but they are unlikely to supplant the telephone and access by speech anytime soon. Speech technologies have progressed to the point where they are now viable for a broad range of communications services, including: compression of speech for use over wired and wireless networks; speech synthesis, recognition, and understanding for dialogue access to information, people, and messaging; and speaker verification for secure access to information and services. The paper provides brief overviews of these technologies, discusses some of the unique properties of wireless, plain old telephone service, and Internet protocol networks that make voice communication and control problematic, and describes the types of voice services available in the past and today, and those that we foresee becoming available over the next several years.
6.
The two methods described for giving voices to computers recognize the importance of economical storage of speech information and extensive vocabularies, and consequently are based on principles of speech synthesis. The first, formant synthesis, generates connected speech from low-bit-rate representations of spoken words. The second, text synthesis, produces connected speech solely from printed English text. For both methods the machine must contain stored knowledge of fundamental rules of language and acoustic constraints of human speech. Formant synthesis from an input information rate of about 1000 bits per second is demonstrated, as is text synthesis from a rate of about 75 bits per second. To give the reader an opportunity to evaluate some of the results described, a sample recording is available; see Appendix A for details.
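The storage argument behind these two rates can be checked with simple arithmetic. The sketch below computes storage per minute of speech at the quoted input rates (about 1000 and 75 bits per second) and compares them with a 64 kbit/s reference; that reference (8 kHz sampling at 8 bits per sample, as in G.711 telephone PCM) is an assumption added for comparison and does not appear in the abstract.

```python
# Assumption: 64 kbit/s telephone PCM (8 kHz x 8 bits) as a baseline; not from the abstract.
PCM_RATE_BPS = 64_000
# Approximate input information rates quoted in the abstract.
RATES_BPS = {"formant synthesis": 1000, "text synthesis": 75}

def storage_bytes(duration_s, rate_bps):
    """Bytes needed to store duration_s seconds of speech at rate_bps bits per second."""
    return duration_s * rate_bps / 8

for name, rate in RATES_BPS.items():
    print(f"{name}: {storage_bytes(60, rate):,.1f} bytes per minute, "
          f"about {PCM_RATE_BPS / rate:.0f}x less than the PCM baseline")
```

One minute of speech needs 7,500 bytes at 1000 bits per second and 562.5 bytes at 75 bits per second, which is why rule-based synthesis from compact representations makes large vocabularies affordable to store.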
7.
The performance of isolated-word speech recognition systems has steadily improved over time as we have learned more about how to represent the significant events in speech and how to capture these events via appropriate analysis procedures and training algorithms. In particular, algorithms based on both template matching (via dynamic time warping (DTW) procedures) and hidden Markov models (HMMs) have been developed that yield high accuracy on several standard vocabularies, including the 10 digits (zero to nine) and the 26 letters of the English alphabet (A-Z). Results are given showing the currently attainable performance of a laboratory system for both template-based (DTW) and HMM-based recognizers, operating in both speaker-trained and speaker-independent modes, on the digit and alphabet vocabularies using telephone recordings. We show that the average error rates of these systems on standard vocabularies are significantly lower than those reported several years earlier on exactly the same databases, reflecting the progress made in all aspects of the speech recognition process.
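Template matching via dynamic time warping, as referred to above, can be sketched as a nearest-template classifier. This is a textbook-style illustration, not the laboratory system described: the Euclidean local distance, the three local step moves, the normalization by the combined sequence length, and the toy one-dimensional "features" are all assumptions (real recognizers use spectral or cepstral feature vectors and tuned path constraints).

```python
import numpy as np

def dtw_distance(a, b):
    """DTW distance between feature sequences a (n x d) and b (m x d).

    Local distance: Euclidean. Allowed predecessors of cell (i, j):
    (i-1, j), (i, j-1), (i-1, j-1). The accumulated cost is divided by n + m
    (a common, but assumed, length normalization).
    """
    n, m = len(a), len(b)
    local = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = local[i - 1, j - 1] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m] / (n + m)

def recognize(test, templates):
    """Return the label of the nearest template; templates maps label -> list of arrays."""
    best_label, best_dist = None, np.inf
    for label, refs in templates.items():
        for ref in refs:
            d = dtw_distance(test, ref)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label
```

Storing a list of templates per word reflects the usual distinction between modes: a speaker-trained system keeps templates from one talker, while a speaker-independent system typically keeps several templates per word drawn from many talkers.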
8.
Cox R.V. Haskell B.G. LeCun Y. Shahraray B. Rabiner L. 《Proceedings of the IEEE》1998,86(5):755-824
The challenge of multimedia processing is to provide services that seamlessly integrate text, sound, image, and video information and to do so in a way that preserves the ease of use and interactivity of conventional plain old telephone service (POTS) telephony. To achieve this goal, a number of technological problems must be considered, including: compression and coding of multimedia signals, including algorithmic issues, standards issues, and transmission issues; synthesis and recognition of multimedia signals, including speech, images, handwriting, and text; organization, storage, and retrieval of multimedia signals, including the appropriate method and speed of delivery, resolution, and quality of service; access methods to the multimedia signal, including spoken natural language interfaces, agent interfaces, and media conversion tools; searching by text, speech, and image queries; and browsing by accessing the text, by voice, or by indexed images. In each of these areas, a great deal of progress has been made in the past few years, driven in part by the relentless growth in multimedia personal computers and in part by the promise of broadband access from the home and from wireless connections. Standards have also played a key role in driving new multimedia services, both on the POTS network and on the Internet. It is the purpose of this paper to review the status of the technology in each of the areas listed above and to illustrate current capabilities by describing several multimedia applications that have been implemented at AT&T Labs over the past several years.