
1.

Indexing and Retrieval of Audio: A Survey (Cited by: 3; self-citations: 0; citations by others: 3)

Lu Guojun 《Multimedia Tools and Applications》2001,15(3):269-290

With more and more audio being captured and stored, there is a growing need for automatic audio indexing and retrieval techniques that can retrieve relevant audio pieces quickly on demand. This paper provides a comprehensive survey of audio indexing and retrieval techniques. We first describe the main audio characteristics and features and discuss techniques for classifying audio into speech and music based on these features. Indexing and retrieval of speech and of music are then described separately. Finally, the significance of audio in multimedia indexing and retrieval is discussed.

2.

Concept framework for audio information retrieval: ARF

Li Guohui, Wu Defeng, Zhang Jun 《Journal of Computer Science and Technology (计算机科学技术学报)》2003,18(5):0-0


3.

An analytical evaluation of search by content and interaction patterns on multimodal meeting records

Matt-M. Bouamrane Saturnino Luz 《Multimedia Systems》2007,13(2):89-102

It has been suggested that combining content-based indexing with automatically generated temporal metadata might help improve search and browsing of recordings of computer-mediated collaborative activities such as on-line meetings, which are characterised by extensive multimodal communication. This paper presents an analytical evaluation of the effectiveness of these techniques as implemented through automatic speech recognition and temporal mapping. In particular, it assesses the extent to which this strategy can help uncover contextual relationships between audio and text segments in recorded remote meetings. Results show that even simple temporal mapping can effectively support retrieval of recorded audio segments, improve retrieval performance in situations where speech recognition alone would have exhibited prohibitively high word error rates, and provide a basic form of semantic adaptation. The authors are listed in alphabetical order.

4.

Content-based audio classification and retrieval using a fuzzy logic system: towards multimedia search engines

M. Liu C. Wan L. Wang 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2002,6(5):357-364

In recent years, available audio corpora have been growing rapidly, driven by the fast-growing Internet and digital libraries. Classifying and retrieving sound files relevant to the user's interest from large databases is crucial for building multimedia web search engines. In this paper, content-based technology is applied to classify and retrieve audio clips using a fuzzy logic system, which is intuitive given the fuzzy nature of human perception of audio, especially of audio clips with mixed types. Two features selected from various extracted features are used as input to a constructed fuzzy inference system (FIS). The outputs of the FIS are two types of hierarchical audio classes. The membership functions and rules are derived from the distributions of extracted audio features. Speech and music can thus be discriminated by the FIS. Furthermore, female and male speech can be separated by another FIS, whereas percussion can be distinguished from other musical instruments. In addition, multiple FISs can be combined into a “fuzzy tree” for retrieval of more types of audio clips. With this approach, generic audio can be classified and retrieved more accurately, using fewer features and less computation time, than with other existing approaches.

5.

Content-based audio retrieval: concepts and methods (Cited by: 38; self-citations: 1; citations by others: 37)

Li Guohui, Li Hengfeng 《Journal of Chinese Computer Systems (小型微型计算机系统)》2000,21(11):1173-1177

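A great deal of past research has addressed the retrieval of visual media such as images and video. However, audio is also a typical medium in multimedia and a common carrier of information. Conventional processing treats digital audio as an unstructured stream medium. Yet audio is the carrier of speech, contains rich auditory features, and has structural information. It is therefore both necessary and possible to access audio on the basis of this content. Drawing on the progress of current related research, this paper surveys content-based audio retrieval methods, including retrieval oriented to speech, music and audio analysis, audio segmentation, and so on; it analyzes and summarizes audio content and its …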

6.

A segment-based audio retrieval algorithm (Cited by: 3; self-citations: 0; citations by others: 3)

Zheng Guibin, Han Jiqing, Li Haifeng, Zheng Tieran 《Computer Science (计算机科学)》2005,32(3):73-75

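This paper proposes a fast, segment-based audio retrieval algorithm. The algorithm divides the retrieval target into several smaller segments, each of which can be searched for independently; during retrieval, a search window controls which segments, and how many, take part in the search. The speed of the algorithm does not vary with the length of the retrieval target, the retrieval speed is adjustable, and good recall and precision are achieved. The algorithm is suited to retrieving specific audio data of arbitrary length from unknown audio sources and to real-time applications.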

7.

The Cambridge University Multimedia Document Retrieval Demo System

A. Tuerk S.E. Johnson P. Jourlin K. Spärck Jones P.C. Woodland 《International Journal of Speech Technology》2001,4(3-4):241-250

The Cambridge University Multimedia Document Retrieval (CU-MDR) Demo System is a web-based application that allows the user to query a database of radio broadcasts that are available on the Internet. The audio from several radio stations is downloaded and transcribed automatically. This gives a collection of text and audio documents that can be searched by a user. The paper describes how speech recognition and information retrieval techniques are combined in the CU-MDR Demo System and shows how the user can interact with it.

8.

Relevance feedback for category search in music retrieval based on semantic concept learning

Man-Kwan Shan Meng-Fen Chiang Fang-Fei Kuo 《Multimedia Tools and Applications》2008,39(2):243-262

Traditional content-based music retrieval systems retrieve a specific music object which is similar to what a user has requested. However, the need exists for the development of category search for the retrieval of a specific category of music objects which share a common semantic concept. The concept of category search in content-based music retrieval is subjective and dynamic. Therefore, this paper investigates a relevance feedback mechanism for category search of polyphonic symbolic music based on semantic concept learning. For the consideration of both global and local properties of music objects, a segment-based music object modeling approach is presented. Furthermore, in order to discover the user semantic concept in terms of discriminative features of discriminative segments, a concept learning mechanism based on data mining techniques is proposed to find the discriminative characteristics between relevant and irrelevant objects. Moreover, three strategies, the Most-Positive, the Most-Informative, and the Hybrid, to return music objects concerning user relevance judgments are investigated. Finally, comparative experiments are conducted to evaluate the effectiveness of the proposed relevance feedback mechanism. Experimental results show that, for a database of 215 polyphonic music objects, 60% average precision can be achieved through the use of the proposed relevance feedback mechanism.


9.

A multimodal approach for extracting content descriptive metadata from lecture videos

Vidhya Balasubramanian Sooryanarayan Gobu Doraisamy Navaneeth Kumar Kanakarajan 《Journal of Intelligent Information Systems》2016,46(1):121-145


10.

Two-step fixed audio retrieval based on audio fingerprints

Qiao Lineng, Xia Xiuyu, Ye Yulin 《Computer Systems & Applications (计算机系统应用)》2017,26(5):266-271

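A two-step fixed audio retrieval algorithm based on the zero-crossing rate and audio fingerprints is proposed. In the preliminary search, which is based on zero-crossing-rate histograms, iterative histogram computation and a dynamic sliding step for the observation window are used to reduce the amount of computation and speed up the search, quickly screening out candidate audio segments with high similarity. The candidates are then searched precisely using dimension-reduced Philips audio fingerprints, which further improves retrieval accuracy. Experimental results show that the algorithm greatly increases retrieval speed while maintaining good retrieval accuracy, and that it is fairly robust.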
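The coarse screening stage described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the frame length, the bin count, the histogram-intersection similarity and the threshold are all assumptions made for illustration. The window advances by a fixed step of one frame rather than a dynamic step, and the fingerprint-based second stage is omitted. The sketch only shows how a zero-crossing-rate histogram can be updated incrementally as the observation window slides.

```python
from typing import List, Sequence


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def zcr_bins(signal: Sequence[float], frame_len: int, bins: int) -> List[int]:
    """Histogram bin index of the ZCR of each non-overlapping frame."""
    out = []
    for i in range(0, len(signal) - frame_len + 1, frame_len):
        zcr = zero_crossing_rate(signal[i:i + frame_len])
        out.append(min(int(zcr * bins), bins - 1))
    return out


def intersection(h1: Sequence[int], h2: Sequence[int]) -> float:
    """Normalised histogram intersection (an assumed similarity measure)."""
    total = sum(h1)
    return sum(min(a, b) for a, b in zip(h1, h2)) / total if total else 0.0


def coarse_candidates(stream, query, frame_len=4, bins=4, threshold=0.9):
    """Return frame offsets in `stream` whose ZCR histogram resembles the query's."""
    q = zcr_bins(query, frame_len, bins)
    s = zcr_bins(stream, frame_len, bins)
    n = len(q)
    if n == 0 or len(s) < n:
        return []
    q_hist = [0] * bins
    for b in q:
        q_hist[b] += 1
    # Histogram of the first observation window.
    window = [0] * bins
    for b in s[:n]:
        window[b] += 1
    hits = []
    for start in range(len(s) - n + 1):
        if start > 0:
            # Incremental update: drop the frame that left, add the one that entered.
            window[s[start - 1]] -= 1
            window[s[start + n - 1]] += 1
        if intersection(q_hist, window) >= threshold:
            hits.append(start)
    return hits
```

In a two-step scheme of this kind, the offsets returned here would be handed to a more expensive matcher, so the threshold trades missed matches against the number of candidates that must be verified.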

11.

Soft indexing of speech content for search in spoken documents

《Computer Speech and Language》2007,21(3):458-478


12.

Topic detection in news video and audio

Chen Kaijiang, Ou Jiazhi, Huang Xuanjing, Wu Lide 《Computer Science (计算机科学)》2002,29(11):98-100

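1. Introduction. Faced with an ever-growing volume of information, effectively retrieving the content of interest is crucial. Compared with written reports, news video and audio (including television and radio) are more vivid and more expressive, but they also have drawbacks: the data volume is large, and the material is hard to organize, index and retrieve. This shows mainly in two respects. Text has explicit auxiliary markers such as headings and paragraphs, whereas video and audio do not. Ordinary browsing tools offer only simple means such as play, fast-forward, rewind and drag-to-seek. For video and audio databases that hold tens or hundreds of hours of material and keep growing, this falls far short of what is required.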

13.

Let's hear it for audio mining

《Computer》2002,35(10):23-25


14.

Sentence boundary detection in conversational speech transcripts using noisily labeled examples

Hironori Takeuchi L. Venkata Subramaniam Shourya Roy Diwakar Punjani Tetsuya Nasukawa 《International Journal on Document Analysis and Recognition》2007,10(3-4):147-155


15.

Developing a semantic-enable information retrieval mechanism

Ming-Yen Chen Hui-Chuan Chu Yuh-Min Chen 《Expert systems with applications》2010,37(1):322-340


16.

Audio information retrieval (Cited by: 10; self-citations: 0; citations by others: 10)

Li Hengfeng, Li Guohui 《Computer Engineering (计算机工程)》1999,25(8):78-80

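This paper reviews the audio information retrieval methods currently in use in China and abroad, analyzes common audio data processing techniques, including speech recognition and content-based audio retrieval, proposes a general method for content-based audio retrieval, and points out the key problems in the corresponding research.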

17.

Multimedia technologies for structuring and retrieval of TV news

Yasuo Ariki 《New Generation Computing》2000,18(4):341-357

Because of media digitization, a large amount of information such as speech, audio and video data is produced every day. To retrieve data from these databases quickly and precisely, multimedia technologies for structuring and retrieving speech, audio and video data are strongly required. In this paper, we give an overview of the multimedia technologies that exist today for TV news databases, such as structuring and retrieval of speech, audio and video data, speaker indexing, audio summarization and cross-media retrieval. The main purpose of structuring is to produce tables of contents and indices from audio and video data automatically. To make these technologies feasible, processing units such as words in audio data and shots in video data are first extracted. In a second step, they are meaningfully integrated into topics. Furthermore, the units extracted from different types of media are integrated for higher functions.

Yasuo Ariki, Ph.D.: He is a Professor in the Department of Electronics and Informatics at Ryukoku University. He received his B.E., M.E. and Ph.D. in information science from Kyoto University in 1974, 1976 and 1979, respectively. He was an Assistant at Kyoto University from 1980 to 1990, and stayed at Edinburgh University as a visiting academic from 1987 to 1990. His research interests are in speech and image recognition and in information retrieval and databases. He is a member of IPSJ, IEICE, ASJ, Soc. Artif. Intel. and IEEE.

18.

Precise Environmental Searches: Integrating Hierarchical Information Search with EnviroDaemon

George Chang Gunjan Samtani Marcus Healey Franz Kurfess Jason Wang 《Journal of Systems Integration》2001,10(3):253-267

Information retrieval has evolved from searches of references, to abstracts, to documents. Search on the Web involves search engines that promise to parse full-text and other files: audio, video, and multimedia. With the indexable Web at 320 million pages and growing, difficulties with locating relevant information have become apparent. The most prevalent means for information retrieval relies on syntax-based methods: keywords or strings of characters are presented to a search engine, and it returns all the matches in the available documents. This method is satisfactory and easy to implement, but it has some inherent limitations that make it unsuitable for many tasks. Instead of looking for syntactical patterns, the user often is interested in keyword meaning or the location of a particular word in a title or header. This paper describes some precise search approaches in the environmental domain that locate information according to syntactic criteria, augmented by the utilization of information in a certain context. The main emphasis of this paper lies in the treatment of structured knowledge, where essential aspects of the topic of interest are encoded not only by the individual items, but also by their relationships to each other. Examples of such structured knowledge are hypertext documents, diagrams, and logical and chemical formulae. Benefits of this approach are enhanced precision and approximate search in an already focused, context-specific search engine for the environment: EnviroDaemon.

19.

Web image retrieval using majority-based ranking approach (Cited by: 1; self-citations: 0; citations by others: 1)

Gunhan Park Yunju Baek Heung-Kyu Lee 《Multimedia Tools and Applications》2006,31(2):195-219

Web image retrieval has characteristics different from typical content-based image retrieval; web images have associated textual cues. However, a web image retrieval system often yields undesirable results, because it uses limited text information such as surrounding text, URLs, and image filenames. In this paper, we propose a new approach to retrieval, which uses the image content of retrieved results without relying on assistance from the user. Our basic hypothesis is that more popular images have a higher probability of being the ones that the user wishes to retrieve. According to this hypothesis, we propose a retrieval approach that is based on a majority of the images under consideration. We define four methods for finding the visual features of the majority of images: (1) the majority-first method, (2) the centroid-of-all method, (3) the centroid-of-top-K method, and (4) the centroid-of-largest-cluster method. In addition, we implement a graph/picture classifier for improving the effectiveness of web image retrieval. We evaluate the retrieval effectiveness of both our methods and conventional ones by using precision and recall graphs. Experimental results show that the proposed methods are more effective than conventional keyword-based retrieval methods.
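The abstract names a centroid-of-all method but does not spell it out, so the following is only one plausible reading, sketched under stated assumptions: each retrieved image is represented by a numeric feature vector, the centroid of all retrieved vectors stands in for the "majority", and results are re-ranked by Euclidean distance to that centroid. The function names and the toy feature vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of equally sized feature vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rerank_by_centroid(results: Dict[str, Sequence[float]]) -> List[str]:
    """Order result names by distance to the centroid of all results (closest first)."""
    c = centroid(list(results.values()))
    return sorted(results, key=lambda name: math.dist(results[name], c))


# Hypothetical two-dimensional features for four keyword-search results.
retrieved = {
    "a": [0.0, 0.0],
    "b": [0.1, 0.0],
    "c": [0.2, 0.1],
    "outlier": [5.0, 5.0],
}
ranking = rerank_by_centroid(retrieved)
```

Because the mean is pulled toward outliers, a single distant image shifts the centroid, which may be one motivation for variants restricted to the top K results or to the largest cluster.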

20.

Turkish Broadcast News Transcription and Retrieval

《IEEE transactions on audio, speech, and language processing》2009,17(5):874-883
