Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
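A Survey of Audio Information Retrieval   Total citations: 2 (self-citations: 0; citations by others: 2)
With the widespread adoption of multimedia and Internet technologies, the volume of multimedia data is growing rapidly, and audio data, an important component of multimedia data, is expanding with it. How to retrieve audio information effectively has become an important research area in modern information retrieval. Audio data differs from traditional text data, however, and text retrieval techniques cannot simply be applied to audio information retrieval: because the latter is usually semantics-based, it inevitably relies on techniques such as audio feature extraction and pattern matching. This paper surveys the techniques and systems related to audio data retrieval.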

2.
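Audio Information Retrieval   Total citations: 10 (self-citations: 0; citations by others: 10)
This paper reviews current audio information retrieval methods in China and abroad, analyzes common audio data processing techniques, including speech recognition and content-based audio retrieval, proposes a general approach to content-based audio retrieval, and identifies the key problems in the corresponding research.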

3.
Kernel Canonical Correlation Analysis (KCCA) is a method of correlating linear relationships between two variables in a kernel-defined feature space. A machine learning algorithm based on KCCA is studied for cross-language information retrieval. We apply the algorithm to Japanese–English cross-language information retrieval. The results are quite encouraging and are significantly better than those obtained by other state-of-the-art methods. Computational complexity is an important issue when applying KCCA to large datasets, as in information retrieval. We experimentally evaluate several methods to alleviate the problem of applying KCCA to large datasets. We also investigate cross-language document classification using KCCA as well as other methods. Our results show that it is feasible to use a classifier learned in one language to classify documents in other languages.

4.
The Cambridge University Multimedia Document Retrieval (CU-MDR) Demo System is a web-based application that allows the user to query a database of radio broadcasts that are available on the Internet. The audio from several radio stations is downloaded and transcribed automatically. This gives a collection of text and audio documents that can be searched by a user. The paper describes how speech recognition and information retrieval techniques are combined in the CU-MDR Demo System and shows how the user can interact with it.

5.
The amount of information available to information workers recently has become overwhelming. This confronts information workers with two major problems: finding the information needed, and accessing it; they are called the search problem and the access problem, respectively. As the main result of our research an architecture is specified of an automated tool that provides integrated support for searching and accessing multimedia documents that may be located at arbitrary places. The architecture contains a database with information about the documents and with thesaurus-like information. The architecture also contains a browse mechanism and a query mechanism for inspecting the database. In the design process of the architecture, several fundamental questions arose, like “What is a document?” and “What is a medium kind?”. The developed answers to some of these questions are considered to have a general character and thus to be useful also outside the scope of the research at hand. The paper concludes with an overview of the current status of the project and a discussion of future work.

6.
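Personal computers hold large numbers of unstructured documents, and extracting useful information from them is both a key task and a major difficulty in semantic desktop management. Entity recognition and extraction is, in turn, an important prerequisite and key step of information extraction. This paper first proposes a method that uses textual cues and ontology metadata to recognize entities in unstructured documents, then manually builds a document collection and uses it to verify the method's entity recognition performance in a specific domain.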

7.
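Retrieving the translation of a document in another language is valuable for building bilingual parallel corpora. This paper proposes an improved cross-language similar-document retrieval algorithm. The algorithm uses a bilingual dictionary or a statistical translation model as its bilingual knowledge base, finds the translation word pairs shared by two documents, treats the weights of those pairs as a feature for similarity computation, and computes the similarity of bilingual documents with an improved variant of the Dice method. In the experiments, performance is measured by the total number of times the translation of a query document is ranked among the top N retrieval results, and two noisy data sets are used to assess the algorithm's effectiveness. The experiments show that, even under heavy interference from noisy data, the translation is ranked among the top 5 results in close to 90% of cases. They also show that translation-pair weights help similarity computation considerably and that the algorithm can effectively find the translation, in another language, of a document written in one language.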
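The abstract above does not give the exact formula of its improved Dice measure, so the following is only a minimal sketch of the general idea: a Dice-style score in which matched translation word pairs contribute their weight instead of a flat count. The function name `weighted_dice`, the toy English/French lexicon, and the choice to credit each word with its best-matching pair are all assumptions made for illustration, not the paper's method.

```python
from typing import Dict, Iterable, Tuple


def weighted_dice(
    src_tokens: Iterable[str],
    tgt_tokens: Iterable[str],
    pair_weights: Dict[Tuple[str, str], float],
) -> float:
    """Dice-style similarity between a source- and a target-language document.

    pair_weights maps (source_word, target_word) to a weight in [0, 1];
    the values used below are hypothetical, e.g. dictionary confidences.
    """
    src, tgt = set(src_tokens), set(tgt_tokens)
    if not src or not tgt:
        return 0.0
    best_src: Dict[str, float] = {}  # best pair weight found for each source word
    best_tgt: Dict[str, float] = {}  # best pair weight found for each target word
    for (s, t), w in pair_weights.items():
        if s in src and t in tgt:
            best_src[s] = max(best_src.get(s, 0.0), w)
            best_tgt[t] = max(best_tgt.get(t, 0.0), w)
    # Plain Dice is 2|A ∩ B| / (|A| + |B|); here the matched weight on each
    # side takes the place of the intersection count.
    matched = sum(best_src.values()) + sum(best_tgt.values())
    return matched / (len(src) + len(tgt))


# Toy data (hypothetical): rank two candidate documents for one query document.
lexicon = {("cat", "chat"): 1.0, ("mat", "tapis"): 0.5}
query = ["cat", "sits", "mat"]
candidates = {
    "doc_a": ["chat", "assis", "tapis"],
    "doc_b": ["chien", "court", "jardin"],
}
ranking = sorted(
    candidates,
    key=lambda name: weighted_dice(query, candidates[name], lexicon),
    reverse=True,
)
```

With all weights equal to 1 and a one-to-one match between the two vocabularies, the score reduces to the ordinary Dice coefficient, which is why this form was chosen for the sketch.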

8.
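Application of LAN-Based Parallel Processing to Speech Recognition   Total citations: 1 (self-citations: 0; citations by others: 1)
In speech recognition, both training a recognition system and using it to recognize speech require a large amount of data processing, which makes research on and implementation of speech recognition very difficult. This paper proposes a fast parallel data processing method, based on a distributed computer system on a local area network, for model training and speech recognition. The method not only speeds up training and recognition and saves a great deal of time, but also lowers the hardware requirements of speech recognition tasks, with satisfactory results.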

9.
For small, portable devices, speech input has the advantages of low-cost and small hardware, can be used on the move or whilst the eyes and hands are busy, and is natural and quick. Rather than rely on imperfect speech recognition, we propose that information entered as speech is kept as speech and suitable tools are provided to allow quick and easy access to the speech-as-data records. This paper summarises our work on the technologies needed for these tools – for organising, browsing, searching and compressing the stored speech. These technologies go a long way towards giving stored speech the characteristics of text without the associated input problems.
Received: 5 March 2002 / Accepted: 1 September 2002
Nick Haddock, Consultant
Acknowledgements: The authors would like to thank the whole HP Labs Gryphon team for their valuable contributions to this work - Mike Collins for the hierarchical chunking algorithm, Erik Geelhoed and David Frohlich for the users perspective, Richard Hull for starting off the compression work, Steve Loughran for productisation, and Dave Reynolds for his consistent advice and support. We would also like to thank our partners at Cambridge University, Steve Young and Tony Robinson, whose expertise and technology formed the foundation for this work, as well as the efforts of Kate Knill on wordspotting, Carl Seymour on compression, James Christie on recognition, and Robin Valenza whose brief excursion into the world of speech technology helped develop a simple and effective summarisation technique. Finally we would like to thank the reviewers for their many helpful comments.

10.
Genome resequencing with short reads generated from pyrosequencing generally relies on mapping the short reads against a single reference genome. However, mapping of reads from multiple reference genomes is not possible using a pairwise mapping algorithm. In order to align the reads with respect to each other and the reference genomes, existing multiple sequence alignment (MSA) methods cannot be used because they do not take into account the position of these short reads with respect to the genome, and are highly inefficient for a large number of sequences. In this paper, we develop a highly scalable parallel algorithm based on domain decomposition, referred to as P-Pyro-Align, to align such a large number of reads from single or multiple reference genomes. The proposed alignment algorithm accurately aligns the erroneous reads, and has been implemented on a cluster of workstations using the MPI library. Experimental results for different problem sizes are analyzed in terms of execution time, quality of the alignments, and the ability of the algorithm to handle reads from multiple haplotypes. We report high-quality multiple alignment of up to 0.5 million reads. The algorithm is shown to be highly scalable and exhibits super-linear speedups with an increasing number of processors.

11.
Document management inside an organization is a complex and broadly scoped problem. This paper approaches the technical and social issues of Intranet document management by developing a straightforward document lifecycle model consisting of five phases: creation, publication, organization, access, and destruction. A document management system (DMS) which encompasses these areas should also have an evaluation component so its effectiveness can be measured. The document lifecycle is visualized as a waterfall model to help explore the discrete phases of an idealized Intranet DMS. The discussion of this model pinpoints where traditional DMS have fallen short, most notably in the areas of user-to-user and user-to-evaluator communication and coordination. From the document lifecycle, we derive an agent framework to integrate technical and social considerations and guide the design, implementation, and evaluation of a flexible and efficient DMS. The lifecycle model and agent framework are useful to organize both technical and social perspectives in this area.

12.
Customer relationship management (CRM) is the overall process of building and maintaining profitable customer relationships by delivering superior customer value and satisfaction. A CRM strategy involves the entire enterprise and is employed on an ongoing basis. Despite the fact that CRM projects incur huge expenditures, a large percentage fail to achieve the stated objectives. Failure in CRM initiatives could be avoided if a firm's CRM strategies are intelligently linked with its employees, customers, channels, and IT infrastructure. In this paper, we focus on those linkages, particularly on the linkages between an organization's CRM strategies and its IT infrastructure. Even though the relationships between IT and business strategies have been extensively explored in the IT alignment literature, prior research has not addressed how a firm's CRM strategies are aligned with its IT infrastructure. In this paper, we investigate the issues relating to CRM-IT alignment based on an in-depth case study of a large, well-known Internet travel agency.

13.
This paper investigates speech prosody for automatic story segmentation in Mandarin broadcast news. Prosodic cues effectively used in English story segmentation deserve a re-investigation since the lexical tones of Mandarin may complicate the expressions of pitch declination and reset. Our data-oriented study shows that story boundaries cannot be clearly discriminated from utterance boundaries by speaker normalized pitch features due to their large variations across different Mandarin syllable tones. We thus propose to use speaker- and tone-normalized pitch features that can provide clear separations between utterance and story boundaries. Our study also shows that speaker-normalized pause duration is quite effective to separate between story and utterance boundaries, while speaker-normalized speech energy and syllable duration are not effective. Experiments using decision trees for story boundary detection reinforce the difference between English and Chinese, i.e., speaker- and tone-normalized pitch features should be favorably adopted in Mandarin story segmentation. We show that the combination of different prosodic cues can achieve a very high F-measure of 93.04% due to the complementarity between pause, pitch and energy. Analysis of the decision tree uncovered five major heuristics that show how speakers jointly utilize pause duration and pitch to separate speech into stories.

14.
Current mobility protocols and architectures are mainly targeted at devices or applications and usually lack the ability to support user-centric paradigms; moreover, they usually address a single aspect of the problem, i.e., terminal handover or session mobility. Full mobility support is only available to specific applications or protocols (e.g., SIP), but these approaches do not exploit all facilities for movement detection at the network/link layers and do not allow the same framework to be used for different applications. This paper proposes a generic mobility framework for terminal handover and session migration. It pursues the user-centric paradigm and builds a cross-layer architecture, yielding a high level of generality, applicability and flexibility. Unlike other approaches, it does not require any modification in correspondent peers and works with a minimal network infrastructure. Software implementations are described for two representative real-time multimedia applications, i.e., media streaming and interactive conference. The effectiveness of the framework was analyzed by means of both performance measurements in local and Internet testbeds and user evaluation during a live demo conducted at a national science exhibition.

15.
In recent years, the audio corpora available from the fast-growing Internet and digital libraries have been increasing rapidly. How to classify and retrieve sound files relevant to the user's interest from large databases is crucial for building multimedia web search engines. In this paper, content-based technology has been applied to classify and retrieve audio clips using a fuzzy logic system, which is intuitive due to the fuzzy nature of human perception of audio, especially audio clips with mixed types. Two features selected from various extracted features are used as input to a constructed fuzzy inference system (FIS). The outputs of the FIS are two types of hierarchical audio classes. The membership functions and rules are derived from the distributions of extracted audio features. Speech and music can thus be discriminated by the FIS. Furthermore, female and male speech can be separated by another FIS, whereas percussion can be distinguished from other musical instruments. In addition, we can use multiple FISs to form a “fuzzy tree” for retrieval of more types of audio clips. With this approach, we can classify and retrieve generic audio more accurately, using fewer features and less computation time, compared to other existing approaches.

16.
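A Method for Querying Document Databases by Content and Structure   Total citations: 4 (self-citations: 0; citations by others: 4)
Documents have a logical structure: concepts such as titles, chapters, sections, and paragraphs form a document's inherent logic. Different users have different needs when retrieving documents, and how a retrieval system can provide meaningful information has long been a central research task. Combining document structure and content, this paper proposes a new similarity computation method for retrieving structured documents. The method supports retrieval of document content at multiple granularities, from words and phrases to paragraphs or sections. A question answering system was built on this method, with Microsoft's Encarta encyclopedia as the test collection; experimental comparison with traditional methods shows that the passages retrieved by this method are more reasonable and more effective.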

17.
    
A typical spoken content retrieval solution integrates multiple technologies that belong to the areas of automatic speech recognition and information retrieval. Due to the rich set of challenges – many of them language specific – as well as widespread impact, numerous research sites in the world are actively engaged in this research area. This special issue highlights some of the recent advances in spoken content retrieval.

18.
    
This paper proposes an efficient speech data selection technique that can identify those data that will be well recognized. Conventional confidence measure techniques can also identify well-recognized speech data. However, those techniques require a lot of computation time for speech recognition processing to estimate confidence scores. Speech data with low confidence should not go through the time-consuming recognition process since they will yield erroneous spoken documents that will eventually be rejected. The proposed technique can select the speech data that will be acceptable for speech recognition applications. It rapidly selects speech data with high prior confidence based on acoustic likelihood values and using only speech and monophone models. Experiments show that the proposed confidence estimation technique is over 50 times faster than the conventional posterior confidence measure while providing equivalent data selection performance for speech recognition and spoken document retrieval.

19.
In this paper we introduce a dynamic programming algorithm which performs linear text segmentation by global minimization of a segmentation cost function which incorporates two factors: (a) within-segment word similarity and (b) prior information about segment length. We evaluate segmentation accuracy of the algorithm by precision, recall and Beeferman's segmentation metric. On a segmentation task which involves Choi's text collection, the algorithm achieves the best segmentation accuracy so far reported in the literature. The algorithm also achieves high accuracy on a second task which involves previously unused texts.
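The abstract above names the two cost factors but not their exact form, so the following is only a minimal sketch of dynamic-programming segmentation under assumed definitions. Here within-segment similarity is the sum of pairwise Jaccard similarities between sentences divided by segment length, and the length prior is a quadratic penalty around an expected length. The function name `segment` and the parameters `mean_len` and `length_weight` are hypothetical.

```python
import math
from typing import List, Sequence, Set


def segment(
    sentences: Sequence[Set[str]],
    mean_len: float = 3.0,
    length_weight: float = 0.1,
) -> List[int]:
    """Return exclusive end indices of the segments with minimal total cost."""
    n = len(sentences)
    if n == 0:
        return []

    def sim(a: Set[str], b: Set[str]) -> float:
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    def cost(i: int, j: int) -> float:
        # Cost of one segment covering sentences[i:j] (assumed form):
        # low cohesion and unusual length both make a segment expensive.
        length = j - i
        pair_sum = sum(
            sim(sentences[p], sentences[q])
            for p in range(i, j)
            for q in range(i, j)
            if p != q
        )
        cohesion = pair_sum / length
        return -cohesion + length_weight * (length - mean_len) ** 2

    # best[j] is the minimal cost of segmenting the first j sentences;
    # back[j] is the start index of the last segment in that solution.
    best = [0.0] + [math.inf] * n
    back = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            candidate = best[i] + cost(i, j)
            if candidate < best[j]:
                best[j] = candidate
                back[j] = i

    # Walk the back pointers from the end to recover the boundaries.
    boundaries = []
    j = n
    while j > 0:
        boundaries.append(j)
        j = back[j]
    return sorted(boundaries)


# Toy input (hypothetical): three sentences on one topic, then three on another.
docs = [
    {"audio", "speech"}, {"speech", "retrieval"}, {"audio", "retrieval"},
    {"genome", "reads"}, {"reads", "alignment"}, {"genome", "alignment"},
]
boundaries = segment(docs)
```

The sketch recomputes each segment cost from scratch, which is fine for a handful of sentences; a practical version would precompute cumulative similarity sums so that each segment cost is a constant-time lookup.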

20.

Copyright©北京勤云科技发展有限公司  京ICP备09084417号