Similar literature
 Found 10 similar documents; search took 125 ms
1.
From 1991 to 2005, China’s High Technology Research and Development Program (HTRDP) sponsored a series of technology evaluations on Chinese information processing and intelligent human-machine interfaces, known as the HTRDP evaluations, or “863” evaluations for short. This paper introduces the HTRDP evaluations in detail. General information about the evaluations is presented first, including their history, the technology categories concerned, the organizer, the participants, and the procedure. The evaluations of each technology are then described in turn, covering Chinese word segmentation, machine translation, acoustic speech recognition, text to speech, text summarization, text categorization, information retrieval, character recognition, and face detection and recognition. For each technology category, the history, the evaluation tasks, the data, and the evaluation method are given. The last section concludes the paper and discusses possible future work.

2.
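Content-Based Audio Retrieval: Concepts and Methods   Total citations: 38, self-citations: 1, citations by others: 37
A great deal of research has been devoted to the retrieval of visual media such as images and video. Audio, however, is also a typical medium in multimedia and a common carrier of information. Conventional processing treats digital audio as an unstructured stream. Yet audio carries speech, contains rich auditory features, and has structural information, so it is both necessary and possible to access audio on the basis of this content. Drawing on recent progress in related research, this paper surveys content-based audio retrieval methods, including retrieval oriented to speech, music, and audio analysis, as well as audio segmentation; it also analyzes and summarizes audio content and its retrieval…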

3.
In recent years, the volume of available audio corpora has grown rapidly, driven by the fast-growing Internet and digital libraries. Classifying and retrieving sound files relevant to the user's interest from large databases is crucial for building multimedia web search engines. In this paper, content-based technology is applied to classify and retrieve audio clips using a fuzzy logic system, which is an intuitive choice given the fuzzy nature of human perception of audio, especially audio clips of mixed types. Two features selected from the various extracted features are used as input to a constructed fuzzy inference system (FIS). The outputs of the FIS are two types of hierarchical audio classes. The membership functions and rules are derived from the distributions of the extracted audio features. Speech and music can thus be discriminated by the FIS. Furthermore, female and male speech can be separated by another FIS, and percussion can be distinguished from other musical instruments. In addition, multiple FISs can be combined into a “fuzzy tree” for retrieving more types of audio clips. With this approach, generic audio can be classified and retrieved more accurately, using fewer features and less computation time, than with other existing approaches.

4.
Research on Key Technologies of Content-Based Audio Retrieval   Total citations: 4, self-citations: 0, citations by others: 4
朱爱红  李连 《现代计算机》2003,(11):37-40,51
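Audio is an important medium that contains rich auditory features. Drawing on current progress in audio retrieval research, this paper surveys content-based audio retrieval methods and discusses several key technologies in the field: audio feature extraction, audio classification, and speech recognition, among others. It concludes with an outlook on the future development of audio retrieval technology.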

5.
When human beings converse, they alternate between talking and listening. Participating in such turn-taking behaviors is more difficult for machines that use speech recognition to listen and speech output to talk. This paper describes an algorithm for managing such turn-taking through the use of a sliding capture window. The device is specific to discrete speech recognition technologies that do not have access to echo cancellation. As such, it addresses those inexpensive applications that suffer the most from turn-taking errors, providing a “speech button” that stabilizes the interface. Correcting for short-lived turn-taking errors can be thought of as “debouncing” the button. An informal study based on ten subjects using a voice dialing application illuminates the design.

6.
We have previously developed a method for recognizing the facial expression of a speaker. For facial expression recognition, we previously selected three images: (i) just before speaking, (ii) speaking the first vowel, and (iii) speaking the last vowel in an utterance. Using the speech recognition system Julius, thermal static images are saved at the timed positions just before speaking and while speaking the phonemes of the first and last vowels. To implement our method, we recorded three subjects who spoke 25 Japanese first names, which provided all combinations of first and last vowels. These recordings were used to prepare first the training data and then the test data. Julius sometimes makes a mistake in recognizing the first and/or last vowel(s). For example, /a/ as the first vowel is sometimes misrecognized as /i/. In the training data, we corrected this misrecognition. However, the correction cannot be carried out in the test data. In the implementation of our method, the facial expressions of the three subjects were distinguished with a mean accuracy of 79.8% when they exhibited one of the intentional facial expressions of “angry,” “happy,” “neutral,” “sad,” and “surprised.” The mean accuracy of the speech recognition of vowels by Julius was 84.1%.

7.
Because of media digitization, a large amount of information such as speech, audio, and video data is produced every day. To retrieve data from these databases quickly and precisely, multimedia technologies for structuring and retrieving speech, audio, and video data are strongly required. In this paper, we give an overview of the multimedia technologies that exist today for TV news databases, such as structuring and retrieval of speech, audio, and video data, speaker indexing, audio summarization, and cross-media retrieval. The main purpose of structuring is to produce tables of contents and indices from audio and video data automatically. To make these technologies feasible, processing units such as words in audio data and shots in video data are extracted first. In a second step, they are meaningfully integrated into topics. Furthermore, the units extracted from different types of media are integrated for higher functions. Yasuo Ariki, Ph.D.: He is a Professor in the Department of Electronics and Informatics at Ryukoku University. He received his B.E., M.E., and Ph.D. in information science from Kyoto University in 1974, 1976, and 1979, respectively. He was an Assistant at Kyoto University from 1980 to 1990 and stayed at Edinburgh University as a visiting academic from 1987 to 1990. His research interests are in speech and image recognition and in information retrieval and databases. He is a member of IPSJ, IEICE, ASJ, Soc. Artif. Intel., and IEEE.

8.
The automated software system “Black Square,” Version 1.2 is described. The system is intended for the automation of image processing, analysis, and recognition. It is an open system for generating new knowledge: objects, algorithms of image processing, recognition procedures originally not intended for image processing, and methods for solving applied problems. The system combines the features of information retrieval, reference, training, and computing systems. This work was partially supported by the Russian Foundation for Basic Research, project nos. 03-07-90406, 05-04-49846, and 05-07-08000; by the INTAS grant no. 04-77-7067; by the Cooperative grant “Image Analysis and Synthesis: Theoretical Foundations and Prototypical Applications in Medical Imaging” within agreement between Italian National Research Council and Russian Academy of Sciences (RAS); by the grant of the RAS in the framework of the Program “Fundamental Science to Medicine.” An erratum to this article is available.

9.
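The television monitoring service of the State Administration of Radio, Film and Television (广电总局) has already automated equipment control and made satellite signal acquisition digital, information-based, and networked, but content-based monitoring of abnormal events and the associated information processing still rely entirely on manual work. Advances in speech processing, speech recognition, and associative retrieval make it possible to bring intelligence to television monitoring. This paper presents the architecture of an intelligent assistance system for television monitoring. The system automatically locates television programs, transcribes television news speech into text, raises alerts on sensitive language content, and links and clusters related information to ease subsequent manual processing.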

10.
The automated software system Black Square, Version 1.2 is described. The system is intended for the automation of image processing, analysis, and recognition. It is an open system for generating new knowledge: objects, algorithms of image processing, recognition procedures originally not intended for image processing, and methods for solving applied problems. The system combines the features of information retrieval, reference, training, and computing systems. This work was partially supported by the Russian Foundation for Basic Research, project nos. 04-07-90187 and 05-07-08000; by the INTAS grant no. 04-77-7067; by the Cooperative grant “Image Analysis and Synthesis: Theoretical Foundations and Prototypical Applications in Medical Imaging” within agreement between Italian National Research Council and Russian Academy of Sciences (RAS); by the RAS Program “Fundamental Science to Medicine”; and by the Program of Scientific Research of the Presidium of RAS “Mathematical Modeling and Intellectual Systems.”
