Similar literature
19 similar documents found
1.
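This paper briefly analyzes the principles of continuous speech recognition and introduces a retrieval system for massive multimedia news material built on a speech recognition grid. The technology significantly raises the degree to which material in the multimedia news production and broadcasting system is managed as assets, and it brings revolutionary change to content retrieval for audio and video media. Taking China Radio International (CRI) as an example, the paper describes the practical effects of speech recognition grid technology.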

2.
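Implementation of an image retrieval system based on multimedia fusion   Total citations: 1, self-citations: 0, citations by others: 1
This paper proposes an image retrieval scheme that fuses text, speech, and image features, and builds a new image retrieval system based on speech recognition on the MATLAB platform. Compared with text-based or content-based image retrieval systems, the system both improves retrieval performance and makes human-computer interaction more convenient.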

3.
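With the development of multimedia technology and the Internet, content-based image retrieval has become a key technology in multimedia processing. Building on retrieval with a single texture feature, this paper proposes a method that retrieves with two features together: comprehensive texture and pixel center. Retrieval experiments on a real image database show that combined-feature retrieval matches retrieval requirements better than single-feature retrieval and therefore achieves better results.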

4.
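Research on image retrieval technology combining semantic and color features   Total citations: 2, self-citations: 2, citations by others: 0
For image retrieval in multimedia search engine systems, this paper proposes using an image's high-level semantic features and low-level color features together as the combined index for retrieval, fusing the image's text and visual information. It presents the architecture of an image retrieval system that combines semantic and color features in order to bridge the gap between low-level multimedia features and high-level semantics. On this basis it proposes related algorithms so that image retrieval meets users' needs and gains in efficiency and precision.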

5.
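Embedded systems are gradually becoming the platform of choice for practical speech recognition applications. This paper studies the factors behind the computational complexity of HMM continuous speech recognition on an embedded platform and proposes a slimmed-down computation method that combines feature coefficient masking with comprehensive pruning, reducing computational complexity while maintaining the recognition rate. Experimental data from the embedded platform show that, compared with the baseline system, the slimmed HMM continuous speech recognition system cuts computation time from 100% of the baseline to 27.91%, while the recognition rate falls only from 89.65% to 89.41%.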
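The trade-off reported in this abstract can be restated as a speedup factor and an accuracy loss in percentage points. The short sketch below only redoes that arithmetic from the four figures quoted above; the function names are illustrative and do not come from the paper.

```python
def speedup(baseline_time_pct: float, reduced_time_pct: float) -> float:
    """Return how many times faster the reduced system runs than the baseline."""
    return baseline_time_pct / reduced_time_pct


def accuracy_drop(baseline_acc_pct: float, reduced_acc_pct: float) -> float:
    """Return the loss in recognition rate, in percentage points."""
    return baseline_acc_pct - reduced_acc_pct


# Figures quoted in the abstract: time 100% -> 27.91%, accuracy 89.65% -> 89.41%.
time_ratio = speedup(100.0, 27.91)
drop = accuracy_drop(89.65, 89.41)
print(f"speedup: {time_ratio:.2f}x, accuracy drop: {drop:.2f} points")
```

In other words, the slimmed system runs roughly 3.6 times faster at a cost of about a quarter of a percentage point in recognition rate.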

6.
张飞宇 《电子科技》2012,25(10):43-45,48
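In an online collaborative learning platform, video speech recognition is a key step in helping users find multimedia files on the network more quickly and in retrieving information based on the content of teaching videos. The teaching-video speech recognition system is an instance of speech recognition based on the hidden Markov model; it aims to extract text from teaching audio and video files and has significant practical value. The paper presents a requirements analysis of the system's application software, and performance tests of the relevant functions show that the system implements and demonstrates the conversion of a video's audio information into text.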

7.
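An automatic speech recognition device is implemented on the TMS320C5416. The device builds its recognition feature vector set from parameters that include a new r-th order cepstral linear regression coefficient of the speech signal, and it uses fuzzy vector quantization to achieve speaker-dependent speech recognition. Experimental results show that the system offers high recognition accuracy and fast recognition, making it an effective hardware implementation of an automatic speech recognition device.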

8.
檀蕊莲 《信息技术》2010,34(8):103-104
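Speaker recognition is a special form of speech recognition: its goal is not to recognize what is said but who is speaking, that is, to extract individual characteristics from the speech signal. Vector quantization (VQ) avoids the difficult problems of speech segmentation and time normalization and, as a data compression technique, greatly reduces the amount of data the system must store. Based on a study of speaker recognition techniques, a VQ-based speaker recognition system is proposed and designed. Experiments show that fairly stable recognition performance can be obtained even when the amount of training data is small.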
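A minimal sketch of how VQ-based speaker identification is commonly set up, assuming one codebook per speaker trained with plain Lloyd (k-means) iterations and identification by lowest average quantization distortion. The abstract does not give the paper's actual training procedure, features, or codebook size, so the helper names, the toy two-dimensional "features", and the codebook size of 4 are all hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(vectors, size, iterations=20, seed=0):
    """Train a VQ codebook with Lloyd (k-means) iterations (assumed method)."""
    rng = random.Random(seed)
    codebook = rng.sample(vectors, size)  # initialise from distinct training vectors
    for _ in range(iterations):
        cells = [[] for _ in codebook]
        for v in vectors:
            nearest = min(range(size), key=lambda i: sq_dist(v, codebook[i]))
            cells[nearest].append(v)
        for i, cell in enumerate(cells):
            if cell:  # an empty cell keeps its previous codeword
                codebook[i] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook


def avg_distortion(vectors, codebook):
    """Mean distance from each vector to its nearest codeword."""
    return sum(min(sq_dist(v, c) for c in codebook) for v in vectors) / len(vectors)


def identify(vectors, codebooks):
    """Return the speaker whose codebook quantizes the utterance best."""
    return min(codebooks, key=lambda name: avg_distortion(vectors, codebooks[name]))


def toy_features(center, n, seed):
    """Hypothetical stand-in for real acoustic features (e.g. cepstral vectors)."""
    rng = random.Random(seed)
    return [[c + rng.gauss(0, 0.3) for c in center] for _ in range(n)]


codebooks = {
    "speaker_a": train_codebook(toy_features([0.0, 0.0], 50, seed=1), size=4),
    "speaker_b": train_codebook(toy_features([3.0, 3.0], 50, seed=2), size=4),
}
test_utterance = toy_features([3.0, 3.0], 10, seed=3)
print(identify(test_utterance, codebooks))
```

Because each utterance is scored frame by frame against a codebook, no alignment in time is needed, which is the property the abstract points to when it says VQ sidesteps segmentation and time normalization.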

9.
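To let students log in to a registration system by speaking their student number and name in Chinese or English, a Chinese-English spoken-digit login system was designed. Using MFCC (Mel-frequency cepstral coefficients) as the speech feature parameters, a Chinese-English continuous digit speech recognition system was built with the HTK speech recognition toolkit under the hidden Markov model (HMM) framework. The system covers preprocessing of the speech signal, extraction of feature parameters, training of the recognition templates, and finally recognition by the recognizer. Acoustic models were built from Chinese, English, and mixed Chinese-English training and test sets and achieved a high recognition rate, strengthening the stability and robustness of the multimedia registration system.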
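MFCC extraction rests on warping frequency onto the Mel scale before placing the filterbank. The sketch below shows one commonly used Hz-to-Mel formula and the resulting filter centre frequencies. The 26 filters and the 0 to 4000 Hz band are assumed values for illustration; the abstract does not state the paper's HTK configuration.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Convert frequency in Hz to Mel, using the common 2595*log10(1 + f/700) form."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters: int, f_low: float, f_high: float):
    """Centre frequencies (Hz) of filters spaced evenly on the Mel scale."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


# Assumed setup: 26 filters covering 0-4000 Hz (8 kHz sampled speech).
centers = mel_center_frequencies(26, 0.0, 4000.0)
print([round(c) for c in centers[:5]], "...", round(centers[-1]))
```

The centres are packed densely at low frequencies and spread out at high frequencies, which is the perceptual weighting that MFCC features inherit.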

10.
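To meet Hebei Radio and Television Station's need to archive, catalog, and retrieve its on-air radio programs according to the program schedule, a secure and efficient radio media asset system was designed and built. The system catalogs and stores radio programs so that they are easy to query and reuse. Redundant primary and backup servers, storage arrays, and switches ensure data security and stable operation. Alibaba Cloud's speech recognition service performs preliminary cataloging, and a network isolation gateway is configured to let only audio files through, which better ensures safe and convenient use of the system's content. Practice shows that the design is sound and runs stably, and that it meets the requirements of saving audio content by program schedule, cataloging and retrieval, and preliminary cataloging with speech recognition.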

11.
The rapid growth of digital audio collections, which vary in format, type, duration, and other parameters found across the digital multimedia world, demands a generic framework for robust and efficient indexing and retrieval based on aural content. Moreover, from the content-based multimedia retrieval point of view, the audio information can be even more important than the visual part, as it is mostly unique and significantly stable over the entire duration of the content. A generic and robust audio-based multimedia indexing and retrieval framework, which has been developed and tested under the MUVIS system, is presented. This framework supports the dynamic integration of audio feature extraction modules during the indexing and retrieval phases and therefore provides a test-bed platform for developing robust and efficient aural feature extraction techniques. Furthermore, the proposed framework is designed around high-level content classification and segmentation in order to improve the speed and accuracy of aural retrieval. Both theoretical and experimental results are finally presented, including comparative measures of retrieval performance with respect to the visual counterpart.

12.
This paper describes an indexing system that automatically creates metadata for multimedia broadcast news content by integrating audio, speech, and visual information. The automatic multimedia content indexing system includes acoustic segmentation (AS), automatic speech recognition (ASR), topic segmentation (TS), and video indexing features. The new spectral-based features and smoothing method in the AS module improved speech detection in the audio stream of the input news content. In the speech recognition module, automatic selection of acoustic models achieved both a low WER, as with parallel recognition using multiple acoustic models, and fast recognition, as with a single acoustic model. The TS method using word concept vectors achieved more accurate results than the conventional method using local word frequency vectors. The information integration module integrates results from the AS module, TS module, and SC module. Story boundary detection accuracy improved when the TS results were combined with the AS and SC results, compared with the TS results alone.
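The WER (word error rate) mentioned in this abstract is conventionally the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. A minimal sketch of that standard definition follows; the example sentences are invented and are not taken from the paper's data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one substitution ("starts" -> "start") and one insertion ("pm").
wer = word_error_rate("the news starts at nine", "the news start at nine pm")
print(f"WER = {wer:.2f}")  # prints "WER = 0.40"
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.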

13.
张小博  蒋铭 《电视技术》2015,39(13):36-39
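Current media asset management systems rely on cataloging information for retrieval, which causes several problems: cataloging information can hardly cover all the semantic content of the media asset data, differing human interpretations make it inconsistent, and cataloging is laborious and time-consuming. To address these problems, an intelligent media asset retrieval system is designed that does not depend on cataloging information and is based on full-text retrieval, speech recognition, face recognition, key-frame extraction, and related techniques. The paper describes automatic analysis of media asset content, feature indexing, and feature retrieval, and implements the system with a B/S (browser/server) distributed architecture. The results show that the design is highly reliable and stable and has been applied successfully in media asset management.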

14.
The fundamentals of speech recognition are reviewed. The dimensions of the speech recognition task, speech feature analysis, pattern classification using hidden Markov models, language processing, and the current accuracy of speech recognition systems are discussed. The applications of speech recognition in telecommunications, voice dictation, speech understanding for data retrieval, and consumer products are examined.
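Pattern classification with hidden Markov models is often illustrated by scoring an observation sequence against one HMM per word with the forward algorithm and picking the best-scoring model. The sketch below does that for two toy discrete HMMs. The word labels, states, symbols, and probabilities are all invented for illustration, and practical recognizers typically use continuous densities and log probabilities instead.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) for a discrete HMM, computed with the forward algorithm."""
    states = list(start)
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            s: sum(alpha[p] * trans[p][s] for p in states) * emit[s][symbol]
            for s in states
        }
    return sum(alpha.values())


def classify(obs, models):
    """Return the label of the model that assigns obs the highest likelihood."""
    return max(models, key=lambda name: forward_likelihood(obs, *models[name]))


# Two hypothetical left-to-right word models over quantized symbols "a" and "b".
start = {"s1": 1.0, "s2": 0.0}
trans = {"s1": {"s1": 0.5, "s2": 0.5}, "s2": {"s1": 0.0, "s2": 1.0}}
models = {
    # "yes" tends to emit a's first, then b's; "no" does the opposite.
    "yes": (start, trans, {"s1": {"a": 0.9, "b": 0.1}, "s2": {"a": 0.1, "b": 0.9}}),
    "no": (start, trans, {"s1": {"a": 0.1, "b": 0.9}, "s2": {"a": 0.9, "b": 0.1}}),
}

observation = ["a", "a", "b", "b"]
print(classify(observation, models))
```

Summing over all state paths, as the forward recursion does, is what distinguishes this score from the single best path found by Viterbi decoding.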

15.
Understanding of the scene content of a video sequence is very important for content-based indexing and retrieval of multimedia databases. Research in this area in the past several years has focused on the use of speech recognition and image analysis techniques. As a complementary effort to the prior work, we have focused on using the associated audio information (mainly the nonspeech portion) for video scene analysis. As an example, we consider the problem of discriminating five types of TV programs, namely commercials, basketball games, football games, news reports, and weather forecasts. A set of low-level audio features is proposed for characterizing the semantic content of short audio clips. The linear separability of different classes under the proposed feature space is examined using a clustering analysis. The effective features are identified by evaluating the intracluster and intercluster scattering matrices of the feature space. Using these features, a neural net classifier was successful in separating the above five types of TV programs. By evaluating the changes between the feature vectors of adjacent clips, we can also identify scene breaks in an audio sequence quite accurately. These results demonstrate the capability of the proposed audio features for characterizing the semantic content of an audio sequence.
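One simple way to see how intracluster and intercluster scatter can rank features is to compare, feature by feature, the between-class scatter with the within-class scatter. The sketch below is a simplification that uses only the diagonal entries of the two scatter matrices; the paper evaluates the full matrices, and the class names and feature values here are invented.

```python
def scatter_ratios(samples_by_class):
    """Per-feature ratio of between-class to within-class scatter (diagonal terms only)."""
    all_samples = [x for xs in samples_by_class.values() for x in xs]
    dim = len(all_samples[0])
    global_mean = [sum(x[d] for x in all_samples) / len(all_samples) for d in range(dim)]
    within = [0.0] * dim
    between = [0.0] * dim
    for xs in samples_by_class.values():
        class_mean = [sum(x[d] for x in xs) / len(xs) for d in range(dim)]
        for d in range(dim):
            # Intracluster term: spread of samples around their own class mean.
            within[d] += sum((x[d] - class_mean[d]) ** 2 for x in xs)
            # Intercluster term: spread of class means around the global mean.
            between[d] += len(xs) * (class_mean[d] - global_mean[d]) ** 2
    return [b / w if w > 0 else float("inf") for b, w in zip(between, within)]


# Invented two-feature clips: feature 0 separates the classes, feature 1 does not.
clips = {
    "news": [[0.10, 0.5], [0.20, 0.9], [0.15, 0.1]],
    "basketball": [[0.90, 0.4], [0.80, 0.8], [0.85, 0.2]],
}
ratios = scatter_ratios(clips)
print([round(r, 3) for r in ratios])
```

A feature with a high ratio clusters tightly within each class while keeping the class means far apart, which is the property the abstract uses to identify effective features.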

16.
With the rapid development of social media platforms, huge amounts of user-generated content (UGC) are produced ceaselessly. In recent years, content-based microblog retrieval has attracted extensive research attention. Effective microblog retrieval requires complex analysis of short text and multimedia content. In this paper, we present a quality-biased multimedia microblog retrieval framework. First, we develop an anchor-graph-based multiview embedding framework which maps the multimedia content features into a unified latent space. Then, the content matching scores of testing microblogs related to the query are obtained by a Markov random field. Further, we employ a quality model to incorporate both microblog quality and content matching. Experimental results demonstrate the effectiveness of the proposed approach compared with state-of-the-art methods.

17.
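Visual feature extraction is a hot topic in audio-visual speech recognition research. This paper introduces a robust dynamic mouth-shape feature based on Visemic LDA, which takes full account of changes in the mouth contour during articulation and of the visual viseme classes. It also proposes a method that uses speech recognition results to label LDA training data automatically, which eliminates heavy manual labeling and avoids labeling errors. Experiments show that introducing the Visemic LDA visual feature into audio-visual speech recognition greatly improves the recognition rate under noisy conditions. When the visual feature is combined with a multi-stream HMM, the recognition rate still exceeds 80% under strong noise at a signal-to-noise ratio of 10 dB.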

18.
19.
In this article, we propose a novel system for feature selection, which is one of the key problems in content-based image indexing and retrieval as well as various other research fields such as pattern classification and genomic data analysis. The proposed system aims at enhancing semantic image retrieval results, decreasing retrieval process complexity, and improving the overall system usability for end-users of multimedia search engines. The feature selection system consists of three feature selection criteria and a decision method. Two novel feature selection criteria based on inner-cluster and intercluster relations are proposed in the article. A majority voting-based method is adapted for efficient selection of features and feature combinations. The performance of the proposed criteria is assessed over a large image database and a number of features, and is compared against competing techniques from the literature. Experiments show that the proposed feature selection system improves semantic performance results in image retrieval systems. This work was supported by the Academy of Finland, Project No. 213462 (Finnish Centre of Excellence Program 2006–2011).
