Found 10 similar documents (search took 125 ms)
1.
Liu Qun Wang Xiangdong Liu Hong Sun Le Tang Sheng Xiong Deyi Hou Hongxu Lv Yuanhua Li Wenbo Lin Shouxun Qian Yueliang 《Frontiers of Computer Science in China》2007,1(1):58-93
From 1991 to 2005, China's High Technology Research and Development Program (HTRDP) sponsored a series of technology evaluations
on Chinese information processing and intelligent human-machine interfaces, known as the HTRDP evaluations, or "863" evaluations
for short. This paper introduces the HTRDP evaluations in detail. General information about the HTRDP evaluations is presented
first, including their history, the technology categories covered, the organizer, the participants, and the procedure.
The evaluations on each technology are then described in detail, covering Chinese word segmentation, machine
translation, acoustic speech recognition, text-to-speech, text summarization, text categorization, information retrieval,
character recognition, and face detection and recognition. For each technology category, the history,
the evaluation tasks, the data, and the evaluation method are given. The last section concludes the paper and discusses
possible future work.
2.
Content-Based Audio Retrieval: Concepts and Methods (total citations: 38; self-citations: 1; citations by others: 37)
Much research has been devoted to the retrieval of visual media such as images and video. Audio, however, is also a typical medium in multimedia and a common carrier of information. Conventional processing treats digital audio as an unstructured stream medium. Yet audio is a carrier of speech, contains rich auditory features, and has structural information, so audio can and should be accessed based on this content. Drawing on the progress of current research, this paper surveys content-based audio retrieval methods, including retrieval oriented to speech, music, and audio analysis, as well as audio segmentation, and analyzes and summarizes audio content and its retrieval.
3.
M. Liu C. Wan L. Wang 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2002,6(5):357-364
In recent years, available audio corpora have been growing rapidly with the fast-growing Internet and digital libraries. How to
classify and retrieve sound files relevant to a user's interest from large databases is crucial for building multimedia
web search engines. In this paper, content-based technology is applied to classify and retrieve audio clips using a
fuzzy logic system, which is intuitive given the fuzzy nature of human perception of audio, especially for audio clips of mixed
type. Two features selected from the various extracted features are used as input to a constructed fuzzy inference system (FIS).
The outputs of the FIS are two types of hierarchical audio classes. The membership functions and rules are derived from the
distributions of the extracted audio features. Speech and music can thus be discriminated by the FIS. Furthermore, female and
male speech can be separated by another FIS, and percussion can be distinguished from other musical instruments. In addition,
multiple FISs can be combined into a "fuzzy tree" for retrieval of more types of audio clips. With this approach, generic audio can be classified
and retrieved more accurately, using fewer features and less computation time, than with other existing approaches.
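As a rough illustration of this kind of classifier (the feature names, membership shapes, and rules below are invented for the sketch, not taken from the paper), a minimal two-feature fuzzy inference step might look like:

```python
# Minimal sketch of a two-feature fuzzy inference system (FIS) for
# speech/music discrimination. The features (zero-crossing rate and
# spectral centroid, both normalized to [0, 1]), the triangular
# membership functions, and the two rules are illustrative
# assumptions, not the paper's actual design.

def tri(x, a, b, c):
    """Triangular membership: rises from a, peaks at b, falls to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fis_classify(zcr, centroid):
    """Mamdani-style min/max inference over two toy rules."""
    low_z = tri(zcr, -0.5, 0.0, 0.6)    # "zcr is low"
    high_z = tri(zcr, 0.4, 1.0, 1.5)    # "zcr is high"
    low_c = tri(centroid, -0.5, 0.0, 0.6)
    high_c = tri(centroid, 0.4, 1.0, 1.5)
    # Rule 1: IF zcr is high AND centroid is high THEN class is speech
    speech = min(high_z, high_c)
    # Rule 2: IF zcr is low AND centroid is low THEN class is music
    music = min(low_z, low_c)
    return "speech" if speech >= music else "music"
```

Cascading several such classifiers (speech vs. music, then female vs. male speech, then percussion vs. other instruments) is one way to realize the "fuzzy tree" the abstract describes.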
4.
Research on Key Technologies for Content-Based Audio Retrieval (total citations: 4; self-citations: 0; citations by others: 4)
Audio is an important medium containing rich auditory features. Based on current progress in audio retrieval research, this paper surveys content-based audio retrieval methods and discusses several key technologies in audio retrieval research: audio feature extraction, audio classification, speech recognition, and related techniques. It concludes with an outlook on the future development of audio retrieval technology.
5.
Bruce E. Balentine Colin M. Ayer Clint L. Miller Brian L. Scott 《International Journal of Speech Technology》1997,2(1):7-19
When human beings converse, they alternate between talking and listening. Participating in such turn-taking behavior is more
difficult for machines that use speech recognition to listen and speech output to talk. This paper describes an algorithm
for managing such turn-taking through the use of a sliding capture window. The device is specific to discrete speech recognition
technologies that do not have access to echo cancellation. As such, it addresses the inexpensive applications that suffer
most from turn-taking errors, providing a "speech button" that stabilizes the interface. Correcting for short-lived turn-taking
errors can be thought of as "debouncing" the button. An informal study of ten subjects using a voice dialing application
illuminates the design.
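The debouncing analogy can be sketched as follows (the hold time and polling interface are invented for illustration; the paper's actual sliding-capture-window algorithm is more involved):

```python
# Sketch of "debouncing" a speech button: a press is reported only
# after speech activity has stayed on continuously for `hold_ms`
# milliseconds, so short-lived flips (e.g., the recognizer briefly
# triggering on the machine's own prompt) are ignored. The parameter
# name and the polling interface are illustrative assumptions.

class SpeechButton:
    def __init__(self, hold_ms=150):
        self.hold_ms = hold_ms
        self._on_since = None  # timestamp when activity last turned on

    def update(self, t_ms, speech_active):
        """Feed one voice-activity sample; True once the press is debounced."""
        if not speech_active:
            self._on_since = None  # any drop-out resets the window
            return False
        if self._on_since is None:
            self._on_since = t_ms
        return t_ms - self._on_since >= self.hold_ms
```

A burst of activity shorter than `hold_ms` never fires, which is exactly the short-lived turn-taking error the abstract wants to suppress.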
6.
Yasunari Yoshitomi Taro Asada Kyouhei Shimada Masayoshi Tabuse 《Artificial Life and Robotics》2011,16(3):318-323
We have previously developed a method for recognizing the facial expression of a speaker. For facial expression recognition,
we select three images: (i) just before speaking, (ii) while speaking the first vowel, and (iii) while speaking the last
vowel of an utterance. Using the speech recognition system Julius, thermal static images are saved at the timed positions
just before speaking and while the phonemes of the first and last vowels are spoken. To implement our method, we recorded
three subjects who spoke 25 Japanese first names covering all combinations of first and last vowels. These recordings
were used to prepare first the training data and then the test data. Julius sometimes misrecognizes the first
and/or last vowel; for example, /a/ as the first vowel is sometimes misrecognized as /i/. In the training data, we corrected
this misrecognition; however, no such correction can be carried out on the test data. In the implementation of our method,
the facial expressions of the three subjects were distinguished with a mean accuracy of 79.8% when they exhibited one of the
intentional facial expressions "angry," "happy," "neutral," "sad," and "surprised." The mean accuracy of vowel recognition
by Julius was 84.1%.
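The frame-selection step could be sketched roughly as below; the `(label, start_ms)` input format and the helper name are assumed simplifications of Julius's phoneme alignment output, not the paper's actual interface:

```python
# Sketch of picking the three timestamps the method uses from a
# phoneme-level ASR alignment: the utterance onset (standing in for
# "just before speaking"), the first vowel, and the last vowel.

VOWELS = {"a", "i", "u", "e", "o"}  # the five Japanese vowels

def select_frame_times(phonemes):
    """phonemes: list of (label, start_ms) pairs in utterance order."""
    vowel_times = [t for label, t in phonemes if label in VOWELS]
    if not vowel_times:
        return None  # no vowel recognized; nothing to select
    onset = phonemes[0][1]  # a real system would back off a little earlier
    return (onset, vowel_times[0], vowel_times[-1])
```

Because the 25 first names cover all first/last vowel combinations, every utterance yields one such triple of thermal-image timestamps.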
7.
Yasuo Ariki 《New Generation Computing》2000,18(4):341-357
With the digitization of media, a large amount of information such as speech, audio, and video data is produced every day.
To retrieve data from these databases quickly and precisely, multimedia technologies for structuring and retrieving
speech, audio, and video data are strongly required. In this paper, we overview the multimedia technologies existing today
for TV news databases, including structuring and retrieval of speech, audio, and video data, speaker indexing, audio
summarization, and cross-media retrieval. The main purpose of structuring is to produce tables of contents and indices
from audio and video data automatically. To make these technologies feasible, processing units such as words in audio data
and shots in video data are first extracted. In a second step, they are meaningfully integrated into topics. Furthermore,
the units extracted from different types of media are integrated for higher-level functions.
Yasuo Ariki, Ph.D.: He is a Professor in the Department of Electronics and Informatics at Ryukoku University. He received his B.E., M.E.,
and Ph.D. in information science from Kyoto University in 1974, 1976, and 1979, respectively. He was an Assistant at Kyoto
University from 1980 to 1990 and stayed at Edinburgh University as a visiting academic from 1987 to 1990. His research interests
are in speech and image recognition and in information retrieval and databases. He is a member of IPSJ, IEICE, ASJ, Soc. Artif.
Intel., and IEEE.
8.
I. B. Gurevich D. V. Harazishvili O. Salvetti A. A. Trykova I. A. Vorob’ev 《Pattern Recognition and Image Analysis》2006,16(1):113-115
The automated software system “Black Square,” Version 1.2 is described. The system is intended for the automation of image
processing, analysis, and recognition. It is an open system for generating new knowledge: objects, algorithms of image processing,
recognition procedures originally not intended for image processing, and methods for solving applied problems. The system
combines the features of information retrieval, reference, training, and computing systems.
This work was partially supported by the Russian Foundation for Basic Research, project nos. 03-07-90406, 05-04-49846, and
05-07-08000; by the INTAS grant no. 04-77-7067; by the Cooperative grant “Image Analysis and Synthesis: Theoretical Foundations
and Prototypical Applications in Medical Imaging” within agreement between Italian National Research Council and Russian Academy
of Sciences (RAS); by the grant of the RAS in the framework of the Program “Fundamental Science to Medicine.”
An erratum to this article is available at .
9.
10.
I. B. Gurevich 《Pattern Recognition and Image Analysis》2006,16(1):138-140
The automated software system Black Square, Version 1.2 is described. The system is intended for the automation of image processing,
analysis, and recognition. It is an open system for generating new knowledge: objects, algorithms of image processing, recognition
procedures originally not intended for image processing, and methods for solving applied problems. The system combines the
features of information retrieval, reference, training, and computing systems.
This work was partially supported by the Russian Foundation for Basic Research, project nos. 04-07-90187 and 05-07-08000;
by the INTAS grant no. 04-77-7067; by the Cooperative grant “Image Analysis and Synthesis: Theoretical Foundations and Prototypical
Applications in Medical Imaging” within agreement between Italian National Research Council and Russian Academy of Sciences
(RAS); by the RAS Program “Fundamental Science to Medicine”; and by the Program of Scientific Research of the Presidium of
RAS “Mathematical Modeling and Intellectual Systems.”