Similar articles
 Found 20 similar articles; search took 31 ms
1.
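Current speech recognition models already perform well on languages with phonographic scripts, such as English and French. Chinese, however, uses a typical ideographic script: Chinese characters have no direct correspondence to pronunciation. Pinyin, the notation for the pronunciation of Chinese characters, is nevertheless inherently interconvertible with them. Using pinyin as a decoding constraint in Chinese speech recognition can therefore introduce an inductive bias that is closer to the speech signal. Based on a multi-task learning framework, this paper proposes a Chinese speech recognition method with joint learning under pinyin constraints. End-to-end Chinese character speech recognition is the main task and pinyin speech recognition is the auxiliary task; through a shared encoder, both the character and the pinyin recognition results serve as supervision signals, strengthening the encoder's representation of Chinese speech. Experimental results show that, compared with the baseline model, the proposed method achieves better recognition, reducing the word error rate by 2.24%.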
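The abstract does not state how the main and auxiliary objectives are combined. A common choice in multi-task learning is a weighted sum of the two losses, and the minimal sketch below assumes that form. The function names, the `weight` parameter and its default of 0.7 are illustrative assumptions, not details taken from the paper.

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of the target index under a probability vector."""
    return -math.log(probs[target])


def joint_loss(char_probs, char_target, pinyin_probs, pinyin_target, weight=0.7):
    """Weighted sum of the main (character) and auxiliary (pinyin) losses.

    `weight` is a hypothetical hyperparameter: 1.0 trains on characters only,
    0.0 trains on pinyin only.
    """
    main = cross_entropy(char_probs, char_target)
    aux = cross_entropy(pinyin_probs, pinyin_target)
    return weight * main + (1.0 - weight) * aux


# Toy output distributions from two heads that would share one encoder.
loss = joint_loss([0.5, 0.5], 0, [0.25, 0.75], 1, weight=0.5)
print(round(loss, 4))
```

Because both heads would backpropagate into the same encoder, the auxiliary pinyin term influences the shared representation even though only the character output is used at inference time.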

2.
This work, divided into Parts I and II, describes the development of GorUP, a semantic speech recognition system in the Basque context. Part I analyses cross-lingual approaches oriented to under-resourced languages, and Part II the development of the language identification system. During development, data optimization methods and soft computing methodologies oriented to complex environments are used to overcome the lack of resources. Three languages coexist in this context: French, Spanish and Basque. Our main goal is the development of robust automatic speech recognition (ASR) systems for Basque, but all language variability has to be analysed. Basque speakers mix not only sounds but also words of the three languages in their speech, which results in a strong presence of cross-lingual elements. In addition, Basque is an agglutinative language with a special morpho-syntactic structure inside words that may lead to intractable vocabularies. Our current work is oriented to information retrieval, mainly for small internet mass media. In these cases the resources available for Basque in general, and for this task in particular, are scarce and complex to process because of the noisy environment. The methods employed in this development (an ontology-based approach or cross-lingual methodologies that profit from more powerful languages) could therefore suit the requirements of many under-resourced languages.

3.
This paper presents our work on automatic speech recognition (ASR) for under-resourced languages, with application to Vietnamese. Different techniques for bootstrapping acoustic models are presented. First, we present the use of acoustic-phonetic unit distances and the potential of crosslingual acoustic modeling for under-resourced languages. Experimental results on Vietnamese showed that, with only a few hours of target-language speech data, crosslingual context-independent modeling worked better than crosslingual context-dependent modeling; however, it was outperformed by the latter when more speech data were available. We therefore concluded that in both cases crosslingual systems are better than monolingual baseline systems. Grapheme-based acoustic modeling, which avoids building a phonetic dictionary, is also investigated. Finally, since sub-word units (morphemes, syllables, characters, etc.) can reduce the high out-of-vocabulary rate and compensate for the lack of text resources in statistical language modeling for under-resourced languages, we propose several methods to decompose, normalize and combine word and sub-word lattices generated by different ASR systems. The proposed lattice combination scheme yields a relative syllable error rate reduction of 6.6% over the sentence MAP baseline method on a Vietnamese ASR task.

4.
5.
6.
Many recent studies show that Augmented Reality (AR) and Automatic Speech Recognition (ASR) technologies can be used to help people with disabilities. Many of these studies have been performed only within their own specialized field. Audio-Visual Speech Recognition (AVSR) is one of the advances in ASR technology; it combines audio, video, and facial expressions to capture a narrator's voice. In this paper, we combine AR and AVSR technologies to build a new system to help deaf and hard-of-hearing people. The proposed system takes a narrator's speech, converts it instantly into readable text and shows the text directly on an AR display, so that deaf people can read the narrator's speech easily. In addition, people do not need to learn sign language to communicate with deaf people. The evaluation results show that this system has a lower word error rate than ASR and VSR under different noisy conditions. Furthermore, the results of using AVSR techniques show that the recognition accuracy of the system improves in noisy places. A survey conducted with 100 deaf people also shows that more than 80% of them are very interested in using the system as an assistant on portable devices to communicate with people.
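Several abstracts in this list report word error rate (WER). As a point of reference, WER is usually computed as the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below follows that standard definition; the function name and the example sentences are illustrative and are not taken from any of the listed papers.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.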

7.
Massive Open Online Courses (MOOCs) are becoming an essential source of information for both students and teachers. MOOCs have to adapt to the fast development of new technologies, and they also have to satisfy the current generation of online students. Current MOOC management systems, such as Coursera, Udacity and edX, use content management platforms in which content is organized in a hierarchical structure. We envision a new generation of MOOCs that support interpretability with formal semantics by using the Semantic Web and online social networks. Semantic technologies support more flexible information management than that offered by current MOOC platforms. Annotated information about courses, video lectures, assignments, students, teachers, etc., can be composed from heterogeneous sources, including contributions from the communities in the forum space. These annotations, combined with legacy data, build foundations for more efficient information discovery in MOOC platforms. In this article we review various Collaborative Semantic Filtering technologies for building a semantic MOOC management system, and then present a prototype of a semantic middle-sized platform, implemented at Western Kentucky University, that answers these requirements.

8.
The need for content-based access to image and video information from media archives has captured the attention of researchers in recent years. Research efforts have led to the development of methods that provide access to image and video data. These methods have their roots in pattern recognition. The methods are used to determine the similarity in the visual information content extracted from low level features. These features are then clustered for generation of database indices. This paper presents a comprehensive survey on the use of these pattern recognition methods which enable image and video retrieval by content.

9.
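Effective detection and real-time tracking of vehicles in traffic surveillance video is the prerequisite for vehicle behavior analysis and recognition, and a core component and key technology of intelligent transportation systems (ITS). Building on an in-depth analysis of current vehicle attribute recognition methods and key techniques for vehicle video retrieval, and taking into account the characteristics of traffic surveillance video, this paper designs, from an application perspective, a motor vehicle retrieval system that performs hierarchical recognition by fusing several vehicle appearance attributes, including license plate, body color and vehicle type. The system offers users multiple query modes, enabling accurate retrieval of the relevant motor vehicles in traffic surveillance video.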

10.
This paper concentrates on the problem of designing and developing a spoken query retrieval (SQR) system to access large document databases via voice. The main challenge is to identify and address issues related to the adaptation and scalability of integrating automatic speech recognition (ASR) and information retrieval (IR). A Context Aware Language Model (CALM) framework that allows information retrieval from large document databases via voice is presented, and findings from a research study using the framework are discussed.

11.
Automatic Speaker Recognition (ASR) refers to the task of identifying a person from his or her voice with the help of machines. ASR has potential applications in telephone-based financial transactions and credit card purchases, in forensic science, and in social anthropology for the study of different cultures and languages. Results of ASR are highly dependent on the database; that is, the results obtained are meaningless if the recording conditions are not known. This paper describes a methodology and a typical experimental setup used to develop corpora for various tasks in text-independent speaker identification in different Indian languages, viz., Marathi, Hindi, Urdu and Oriya. Finally, an ASR system is presented to evaluate the corpora.

12.
Most efforts at automatically creating multilingual lexicons require input lexical resources with rich content (e.g. semantic networks, domain codes, semantic categories) or large corpora. Such material is often unavailable and difficult to construct for under-resourced languages. In some cases, particularly for some ethnic languages, even unannotated corpora are still in the process of collection. We show how multilingual lexicons with under-resourced languages can be constructed using simple bilingual translation lists, which are more readily available. The prototype multilingual lexicon developed comprises six member languages: English, Malay, Chinese, French, Thai and Iban, the last of which is an under-resourced language in Borneo. Quick evaluations showed that 91.2% of 500 random multilingual entries in the generated lexicon require minimal or no human correction.

13.
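This paper describes a novel system for structured browsing and annotation of news video. News video is structured using an anchorperson detection method based on spatio-temporal slice analysis and a shot segmentation method based on color histograms. Automatic speech recognition and models of specific semantic concepts are used to annotate anchor scenes with text and news story shots with semantic concepts. The system helps users browse and edit news video according to personal preference, and effectively supports indexing and browsing of news video.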
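Histogram-based shot segmentation is typically done by comparing the histograms of consecutive frames and declaring a cut where the difference exceeds a threshold. The sketch below illustrates that general idea under simplifying assumptions: each frame is a flat list of 0-255 intensity values (a single channel rather than full color), and the bin count and threshold are arbitrary illustrative values. It is not the specific method of the paper above.

```python
def histogram(frame, bins=4):
    """Normalized histogram of a frame given as a flat list of 0-255 values."""
    counts = [0] * bins
    for value in frame:
        counts[min(value * bins // 256, bins - 1)] += 1
    return [count / len(frame) for count in counts]


def shot_boundaries(frames, threshold=0.5, bins=4):
    """Return indices of frames that start a new shot.

    The distance between consecutive histograms is half the L1 distance,
    which lies in [0, 1]; `threshold` is a hypothetical tuning parameter.
    """
    if not frames:
        return []
    cuts = []
    prev = histogram(frames[0], bins)
    for index in range(1, len(frames)):
        curr = histogram(frames[index], bins)
        distance = 0.5 * sum(abs(a - b) for a, b in zip(prev, curr))
        if distance > threshold:
            cuts.append(index)
        prev = curr
    return cuts


# Three dark frames followed by two bright frames: one cut, at frame 3.
dark = [10, 20, 30, 40]
bright = [200, 210, 220, 230]
print(shot_boundaries([dark, dark, dark, bright, bright]))
```

A fixed threshold detects abrupt cuts only; gradual transitions such as fades and dissolves spread the change over many frames and would need a different test.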

14.
Hierarchical video browsing and feature-based video retrieval are two standard methods for accessing video content. Very little research, however, has addressed the benefits of integrating these two methods for more effective and efficient video content access. In this paper, we introduce InsightVideo, a video analysis and retrieval system that joins video content hierarchy, hierarchical browsing and retrieval for efficient video access. We propose several video processing techniques to organize the content hierarchy of the video. We first apply a camera motion classification and key-frame extraction strategy that operates in the compressed domain to extract video features. Then, shot grouping, scene detection and pairwise scene clustering strategies are applied to construct the video content hierarchy. We introduce a video similarity evaluation scheme at different levels (key-frame, shot, group, scene, and video). By combining the video content hierarchy with the video similarity evaluation scheme, hierarchical video browsing and retrieval are seamlessly integrated for efficient content access. We construct a progressive video retrieval scheme to refine user queries through the interactions of browsing and retrieval. Experimental results and comparisons of camera motion classification, key-frame extraction, scene detection, and video retrieval are presented to validate the effectiveness and efficiency of the proposed algorithms and the performance of the system.

15.
16.
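With the arrival of the big-data era, audio and video files are multiplying, and efficiently locating key sensitive information is of great research significance. Speech retrieval for English and Chinese has been studied in depth, whereas speech retrieval for Uyghur is still in its infancy. This paper studies Uyghur spoken keyword retrieval and builds a Uyghur speech retrieval system using large-vocabulary continuous speech recognition, conversion of multi-candidate word lattices into confusion networks with a clustering algorithm, inverted indexing, and confidence and relevance scoring. The system was then evaluated on a test set: with a speech recognition accuracy of 82.1%, the retrieval system had false alarm rates of 13.5% and 8.5% at recall rates of 97.0% and 79.1%, respectively.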
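The inverted-index and confidence steps named in this abstract can be illustrated in a simplified form. The sketch below assumes the recognizer output has already been reduced to per-utterance lists of `(word, confidence)` pairs; the confusion-network and relevance-scoring stages are omitted. All names, the placeholder tokens and the confidence threshold are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_index(transcripts):
    """Map each recognized word to the (utterance id, confidence) pairs where it occurs."""
    index = defaultdict(list)
    for utterance_id, words in transcripts.items():
        for word, confidence in words:
            index[word].append((utterance_id, confidence))
    return index


def search(index, keyword, min_confidence=0.5):
    """Return sorted ids of utterances where the keyword meets the confidence threshold."""
    hits = {utt for utt, conf in index.get(keyword, []) if conf >= min_confidence}
    return sorted(hits)


def recall(returned, relevant):
    """Fraction of truly relevant utterances that were returned."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(relevant & set(returned)) / len(relevant)


# Placeholder tokens stand in for recognized words.
transcripts = {
    "u1": [("alpha", 0.9), ("beta", 0.4)],
    "u2": [("alpha", 0.3)],
    "u3": [("beta", 0.8)],
}
index = build_index(transcripts)
hits = search(index, "alpha")
print(hits, recall(hits, ["u1", "u2"]))
```

Lowering `min_confidence` returns more utterances, which raises recall at the cost of more false alarms. This is the same trade-off visible in the two operating points reported in the abstract.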

17.
Automatic text segmentation and text recognition for video indexing   Cited by: 13 (self-citations: 0, citations by others: 13)
Efficient indexing and retrieval of digital video is an important function of video databases. One powerful index for retrieval is the text appearing in the videos, which enables content-based browsing. We present our new methods for automatic segmentation of text in digital videos. The algorithms we propose make use of typical characteristics of text in videos in order to enable and enhance segmentation performance. The unique features of our approach are the tracking of characters and words over their complete duration of occurrence in a video and the integration of the multiple bitmaps of a character over time into a single bitmap. The output of the text segmentation step is then passed directly to a standard OCR software package in order to translate the segmented text into ASCII. A straightforward indexing and retrieval scheme is also introduced. It is used in the experiments to demonstrate that the proposed text segmentation algorithms, together with existing text recognition algorithms, are suitable for indexing and retrieval of relevant video sequences in and from a video database. Our experimental results are very encouraging and suggest that these algorithms can be used in video retrieval applications as well as to recognize higher level semantics in videos.

18.
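Video caption retrieval is an important part of video retrieval. As OCR technology has matured, video caption retrieval algorithms have made many major breakthroughs. While retrieval quality has improved, however, the large amount of image and text information contained in video has made data processing the performance bottleneck of caption extraction. Many-core high-performance coprocessors have developed rapidly in recent years, laying a good hardware foundation for high-performance computing research. This work applies the Intel many-core MIC architecture to video caption extraction and uses OpenMP for parallel acceleration. Tests on an Intel Xeon Phi 7110P achieved a fairly satisfactory speedup.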

19.
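朱成军 (Zhu Chengjun), 李超 (Li Chao), 熊璋 (Xiong Zhang). 《计算机工程》 (Computer Engineering), 2007, 33(10): 218-219
Text in video provides useful information describing video content and plays an important role in building multimedia retrieval systems based on high-level semantics. Starting from the characteristics of video text, this paper analyses the various techniques for video text detection and recognition together with their strengths and weaknesses, the state of the field in China and abroad, and the key directions for further research.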

20.
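With the rapid growth and wide use of online video, identifying and filtering objectionable online video is increasingly important. Based on an analysis of technical developments in two fields, image content recognition and filtering, and video structure analysis and retrieval, this paper describes a solution that combines key video analysis techniques: temporal video segmentation, key-frame extraction, image content recognition and skin detection. The method is simple and easy to implement. The paper also introduces the current state of research and the main applications of online video content recognition and filtering, and analyses the main problems it faces and future trends.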
