Found 20 similar documents (search time: 31 ms)
1.
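Current speech recognition models already perform well on phonographic writing systems such as English and French. Chinese, however, is a typical ideographic writing system: its characters have no direct correspondence to speech sounds. Pinyin, as the notation for how characters are pronounced, is nevertheless intrinsically convertible to and from Chinese characters. Using pinyin as a constraint during decoding in Chinese speech recognition can therefore introduce an inductive bias that is closer to the speech signal. Based on a multi-task learning framework, this paper proposes a Chinese speech recognition method built on joint learning with pinyin constraints. End-to-end Chinese-character speech recognition is the main task and pinyin speech recognition is the auxiliary task; the two tasks share an encoder, and both the character and the pinyin recognition results serve as supervision signals, which strengthens the encoder's ability to represent Chinese speech. Experimental results show that, compared with the baseline model, the proposed method achieves better recognition performance, reducing the word error rate by 2.24%.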
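The word error rate reported above is conventionally the edit distance between the reference and hypothesis token sequences, divided by the reference length. The following is a minimal sketch of that metric, not the paper's evaluation code; the function name `word_error_rate` and the sample sentences are illustrative assumptions. For Chinese, the tokens could be individual characters or pinyin syllables.

```python
def word_error_rate(reference, hypothesis):
    """Levenshtein distance between two token lists, divided by the reference length."""
    if not reference:
        raise ValueError("reference must contain at least one token")
    # previous[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis tokens.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_token in enumerate(reference, start=1):
        current = [i]  # deleting all i reference tokens
        for j, hyp_token in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_token != hyp_token)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(reference)


# Hypothetical example: one substitution and one deletion against six reference words.
reference = "the cat sat on the mat".split()
hypothesis = "the cat sit on mat".split()
wer = word_error_rate(reference, hypothesis)
```

Because insertions count as errors, this value can exceed 1.0 when the hypothesis is much longer than the reference.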
2.
Nora Barroso Karmele López de Ipiña Odei Barroso Aitzol Ezeiza Carmen Hernández Manuel Graña 《International Journal of Speech Technology》2012,15(1):33-40
This work, divided into Part I and Part II, describes the development of GorUP, a Semantic Speech Recognition System in the Basque
context. Part I analyses cross-lingual approaches oriented to under-resourced languages, and Part II the development of the
Language Identification system. During the development, data optimization methods and Soft Computing methodologies oriented
to complex environments are used in order to overcome the lack of resources. Moreover, in this context three languages coexist:
French, Spanish and Basque. Our main goal is the development of robust Automatic Speech Recognition (ASR) systems for
Basque, but all language variability has to be analyzed. In this regard, Basque speakers mix not only sounds but also words
of the three languages in their speech, which results in a strong presence of cross-lingual elements. Besides, Basque is an
agglutinative language with a special morpho-syntactic structure inside the words that may lead to intractable vocabularies.
Currently, our work is oriented to Information Retrieval and mainly to small internet mass-media. In these cases the available
resources for Basque in general, and for this task in particular, are very scarce and complex to process because of the noisy
environment. Thus, the methods employed in this development (an ontology-based approach or cross-lingual methodologies oriented
to profit from more powerful languages) could suit the requirements of many under-resourced languages.
3.
《IEEE transactions on audio, speech, and language processing》2009,17(8):1471-1482
4.
5.
6.
Many recent studies show that Augmented Reality (AR) and Automatic Speech Recognition (ASR) technologies can be used to help people with disabilities. Many of these studies have been performed only within their own specialized field. Audio-Visual Speech Recognition (AVSR) is one of the advances in ASR technology; it combines audio, video, and facial expressions to capture a narrator's voice. In this paper, we combine AR and AVSR technologies to build a new system that helps deaf and hard-of-hearing people. Our proposed system can take a narrator's speech instantly, convert it into readable text, and show the text directly on an AR display. In this system, deaf people can therefore read the narrator's speech easily. In addition, people do not need to learn sign language to communicate with deaf people. The evaluation results show that this system has a lower word error rate than ASR and VSR in different noisy conditions. Furthermore, the results of using AVSR techniques show that the recognition accuracy of the system is improved in noisy places. Also, the results of a survey conducted with 100 deaf people show that more than 80 % of them are very interested in using our system as an assistant in portable devices to communicate with people.
7.
Massive Open Online Courses (MOOCs) are becoming an essential source of information for both students and teachers. Noticeably, MOOCs have to adapt to the fast development of new technologies; they also have to satisfy the current generation of online students. Current MOOC management systems, such as Coursera, Udacity, edX, etc., use content management platforms where content is organized in a hierarchical structure. We envision a new generation of MOOCs that support interpretability with formal semantics by using the Semantic Web and online social networks. Semantic technologies support more flexible information management than that offered by current MOOC platforms. Annotated information about courses, video lectures, assignments, students, teachers, etc., can be composed from heterogeneous sources, including contributions from the communities in the forum space. These annotations, combined with legacy data, build foundations for more efficient information discovery in MOOC platforms. In this article we review various Collaborative Semantic Filtering technologies for building a Semantic MOOC management system. We then present a prototype of a semantic middle-sized platform, implemented at Western Kentucky University, that answers the aforementioned requirements.
8.
The need for content-based access to image and video information from media archives has captured the attention of researchers in recent years. Research efforts have led to the development of methods that provide access to image and video data. These methods have their roots in pattern recognition. The methods are used to determine the similarity in the visual information content extracted from low level features. These features are then clustered for generation of database indices. This paper presents a comprehensive survey on the use of these pattern recognition methods which enable image and video retrieval by content.
9.
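Effective detection and real-time tracking of vehicles in traffic surveillance video is a prerequisite for vehicle behavior analysis and recognition, and is also a core component and key technology of intelligent transportation systems (ITS). Building on an in-depth analysis of current vehicle attribute recognition methods and of the key technologies for vehicle video retrieval, and taking into account the characteristics of traffic surveillance video, this paper designs, from an application perspective, a motor vehicle retrieval system that performs hierarchical recognition by fusing multiple vehicle appearance attributes such as license plate, body color, and vehicle type. The system offers users several query modes, enabling accurate retrieval of the relevant motor vehicles in traffic surveillance video.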
10.
This paper concentrates on the problem of designing and developing a spoken query retrieval (SQR) system to access large document
databases via voice. The main challenge is to identify and address issues related to the adaptation and scalability of integrating
automatic speech recognition (ASR) and information retrieval (IR). In this paper, a Context Aware Language Model (CALM) framework
that enables voice-based information retrieval from large document databases is presented, and findings from a research study using
the framework are discussed as well.
11.
Automatic Speaker Recognition (ASR) refers to the task of identifying a person based on his or her voice with the help of
machines. ASR finds its potential applications in telephone based financial transactions, purchase of credit card and in forensic
science and social anthropology for the study of different cultures and languages. Results of ASR are highly dependent on
database, i.e., the results obtained in ASR are meaningless if recording conditions are not known. In this paper, a methodology
and a typical experimental setup used for development of corpora for various tasks in the text-independent speaker identification
in different Indian languages, viz., Marathi, Hindi, Urdu and Oriya have been described. Finally, an ASR system is presented
to evaluate the corpora.
12.
Lian Tze Lim Lay-Ki Soon Tek Yong Lim Enya Kong Tang Bali Ranaivo-Malançon 《Language Resources and Evaluation》2014,48(3):479-492
Most efforts at automatically creating multilingual lexicons require input lexical resources with rich content (e.g. semantic networks, domain codes, semantic categories) or large corpora. Such material is often unavailable and difficult to construct for under-resourced languages. In some cases, particularly for some ethnic languages, even unannotated corpora are still in the process of collection. We show how multilingual lexicons with under-resourced languages can be constructed using simple bilingual translation lists, which are more readily available. The prototype multilingual lexicon developed comprises six member languages: English, Malay, Chinese, French, Thai and Iban, the last of which is an under-resourced language in Borneo. Quick evaluations showed that 91.2 % of 500 random multilingual entries in the generated lexicon require minimal or no human correction.
13.
14.
InsightVideo: toward hierarchical video content organization for efficient browsing, summarization and retrieval (total citations: 2; self-citations: 0; citations by others: 2)
Xingquan Zhu Elmagarmid A.K. Xiangyang Xue Lide Wu Catlin A.C. 《Multimedia, IEEE Transactions on》2005,7(4):648-666
Hierarchical video browsing and feature-based video retrieval are two standard methods for accessing video content. Very little research, however, has addressed the benefits of integrating these two methods for more effective and efficient video content access. In this paper, we introduce InsightVideo, a video analysis and retrieval system, which joins video content hierarchy, hierarchical browsing and retrieval for efficient video access. We propose several video processing techniques to organize the content hierarchy of the video. We first apply a camera motion classification and key-frame extraction strategy that operates in the compressed domain to extract video features. Then, shot grouping, scene detection and pairwise scene clustering strategies are applied to construct the video content hierarchy. We introduce a video similarity evaluation scheme at different levels (key-frame, shot, group, scene, and video). By integrating the video content hierarchy and the video similarity evaluation scheme, hierarchical video browsing and retrieval are seamlessly integrated for efficient content access. We construct a progressive video retrieval scheme to refine user queries through the interactions of browsing and retrieval. Experimental results and comparisons of camera motion classification, key-frame extraction, scene detection, and video retrieval are presented to validate the effectiveness and efficiency of the proposed algorithms and the performance of the system.
15.
16.
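With the arrival of the big-data era, audio and video files of all kinds are multiplying, and efficiently locating key sensitive information is of great research significance. Researchers have studied speech retrieval techniques for English and Chinese in depth, whereas speech retrieval for Uyghur is still in its early stages. This paper studies Uyghur spoken keyword retrieval and builds a Uyghur speech retrieval system using techniques that include large-vocabulary continuous speech recognition, conversion of multi-candidate word lattices into confusion networks with a clustering algorithm, inverted indexing, and the computation of confidence and relevance scores. Finally, the system is evaluated on a test set. The results show that, at a speech recognition accuracy of 82.1%, the false alarm rates are 13.5% and 8.5% when the recall of the retrieval system reaches 97.0% and 79.1%, respectively.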
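The inverted index and confidence scoring mentioned above can be illustrated with a small sketch. This is not the paper's system: the data layout (each utterance mapped to a list of candidate words with confidences, as might be read off a confusion network), the function names, the placeholder words `w1` and `w2`, and the threshold values are all assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(utterances):
    """Map each candidate word to the (utterance_id, confidence) pairs where it occurs."""
    index = defaultdict(list)
    for utterance_id, candidates in utterances.items():
        for word, confidence in candidates:
            index[word].append((utterance_id, confidence))
    return index


def search(index, keyword, min_confidence=0.5):
    """Return hits for keyword at or above the threshold, highest confidence first."""
    hits = [
        (utterance_id, confidence)
        for utterance_id, confidence in index.get(keyword, [])
        if confidence >= min_confidence
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical recognizer output: utterance id -> candidate words with confidences.
utterances = {
    "utt1": [("w1", 0.9), ("w2", 0.4)],
    "utt2": [("w1", 0.3)],
    "utt3": [("w2", 0.8)],
}
index = build_inverted_index(utterances)
strict_hits = search(index, "w1", min_confidence=0.5)
loose_hits = search(index, "w1", min_confidence=0.2)
```

Lowering `min_confidence` returns more candidate hits, which in general raises recall at the cost of more false alarms; the two operating points reported in the abstract reflect a trade-off of this kind.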
17.
Automatic text segmentation and text recognition for video indexing (total citations: 13; self-citations: 0; citations by others: 13)
Efficient indexing and retrieval of digital video is an important function of video databases. One powerful index for retrieval
is the text appearing in them. It enables content-based browsing. We present our new methods for automatic segmentation of
text in digital videos. The algorithms we propose make use of typical characteristics of text in videos in order to enable
and enhance segmentation performance. The unique features of our approach are the tracking of characters and words over their
complete duration of occurrence in a video and the integration of the multiple bitmaps of a character over time into a single
bitmap. The output of the text segmentation step is then directly passed to a standard OCR software package in order to translate
the segmented text into ASCII. Also, a straightforward indexing and retrieval scheme is introduced. It is used in the experiments
to demonstrate that the proposed text segmentation algorithms together with existing text recognition algorithms are suitable
for indexing and retrieval of relevant video sequences in and from a video database. Our experimental results are very encouraging
and suggest that these algorithms can be used in video retrieval applications as well as to recognize higher level semantics
in videos.
18.
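Video caption retrieval is an important part of the video retrieval field. As OCR technology has steadily matured, video caption retrieval algorithms have achieved many major breakthroughs. Yet even as retrieval quality has improved, the large volume of image and text information contained in video has made data processing the performance bottleneck that constrains caption extraction. Many-core high-performance coprocessors have developed rapidly in recent years, laying a solid hardware foundation for high-performance computing research. This work applies the Intel many-core MIC architecture to video caption extraction and uses OpenMP for parallel acceleration. Tests on an Intel Xeon Phi 7110P yielded a fairly satisfactory speedup.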
19.
20.
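With the rapid growth and widespread use of online video, identifying and filtering objectionable online video has become increasingly important. By analyzing technical developments in two fields, image content recognition and filtering, and video structure analysis and retrieval, this paper describes a solution that combines key video analysis technologies, including temporal video segmentation, key-frame extraction, image content recognition, and skin detection. The method is simple and easy to implement. The paper also reviews the current state of research on, and the main applications of, online video content recognition and filtering, and analyzes the main problems the field faces and its future development trends.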