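Automatic recognition, annotation, and retrieval of broadcast speech is a comprehensive research topic spanning speech technology, natural language processing, and information retrieval. After surveying research on automatic annotation and retrieval of broadcast speech and analyzing the key technologies involved, this paper proposes a multi-level automatic annotation framework for Mandarin broadcast speech and a speech retrieval scheme based on the multi-level annotation. It discusses the annotation attributes at the document, sentence, and word levels, adopts a recursive annotation method that refines the attributes level by level, and examines issues critical to automatic speech annotation, such as the speech recognition engine and speech stream segmentation. Using the proposed method, 10 hours of Mandarin broadcast speech were annotated and retrieved, with fairly satisfactory experimental results.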

This paper applies a multi-scale paradigm to Chinese spoken document retrieval (SDR) to improve retrieval performance. Multi-scale refers to the use of both words and subwords for retrieval. Words are the basic units of a language that carry lexical meaning, and subword units (such as phonemes, syllables, or characters) are the building blocks of words. Retrieval with subword indexing units outperforms retrieval with words because subword units are robust to out-of-vocabulary (OOV) words in speech recognition and to ambiguities in word segmentation. Experimental results show that subword bigrams improve retrieval performance over words (~9.56%). Applying multi-scale fusion to SDR aims to combine the lexical information of words with the robustness of subwords. This work presents the first detailed investigation of a Cantonese broadcast news retrieval task using two multi-scale fusion approaches: pre-retrieval fusion and post-retrieval fusion. Multi-scale retrieval using both words and syllable bigrams improves retrieval performance (~1.90%) over retrieval on the composite scales.
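The idea of combining word-scale and subword-scale evidence can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the method of the paper: characters stand in for syllables, the relevance score is a toy overlap ratio, and post-retrieval fusion is modeled as a linear interpolation with a hypothetical `weight` parameter.

```python
from collections import Counter

def char_bigrams(text):
    """Overlapping character bigrams (characters stand in for syllables here)."""
    chars = [c for c in text if not c.isspace()]
    return [a + b for a, b in zip(chars, chars[1:])]

def overlap_score(query_units, doc_units):
    """Toy relevance score: fraction of query units that occur in the document."""
    if not query_units:
        return 0.0
    doc_counts = Counter(doc_units)
    hits = sum(1 for unit in query_units if doc_counts[unit] > 0)
    return hits / len(query_units)

def fused_score(query_words, doc_words, query_text, doc_text, weight=0.5):
    """Assumed post-retrieval fusion: interpolate word-scale and bigram-scale scores."""
    word_score = overlap_score(query_words, doc_words)
    bigram_score = overlap_score(char_bigrams(query_text), char_bigrams(doc_text))
    return weight * word_score + (1.0 - weight) * bigram_score
```

If the query is segmented as the single word 广播新闻 while the document is segmented as 广播 / 新闻 / 检索, the word-scale score is zero, but every query bigram still appears in the document text, so the fused score remains non-zero. This is the kind of segmentation mismatch that subword units are meant to tolerate.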

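With the rapid development of networked news production and broadcasting at television stations in China, stations in many provinces have designed network systems for news production. This paper analyzes the design of news production and broadcasting network systems at Chinese television stations, summarizes the technical requirements of a command system for live news broadcasts, and proposes a command system scheme that integrates a variety of technical means.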

In this paper, we introduce the Interactive Systems Laboratories multimedia data indexing and retrieval system 'View4You'. The main components of the system, namely the segmenter, the speech recognizer, and the information retrieval engine, are described in detail. In the View4You system, public television newscasts are recorded on a daily basis. The newscasts are automatically segmented, and an index is created for each segment by means of automatic speech recognition. The user can query the system in natural language. The system returns a list of segments sorted by relevance to the user query. By selecting a segment, the user can watch the corresponding part of the news show on his or her computer screen. Several end-to-end evaluations on real-world data, using questions from naive users, are described. By substituting each component of the system with a perfect (manually simulated) one, the effect of each component's imperfection on the end-to-end result can be determined. We show that the information retrieval component has the largest impact on system performance, followed by the segmentation. The quality of the speech recognizer, as long as its error rate is below approximately 25%, is shown to be of only relatively small importance.

张红  黄泰翼  徐波 《自动化学报》2001,27(3):338-345
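Automatic transcription systems for broadcast television news have emerged internationally in the past two years as a new focus of research on large-vocabulary continuous speech recognition, and they are an important transitional form as speech recognition technology moves further toward practical use. This paper introduces the background and development history of automatic broadcast news transcription systems internationally, presents and analyzes the current state of research from the two perspectives of system performance and theoretical research, and finally proposes a concrete plan for developing China's own automatic broadcast news transcription system.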

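A sensitive-word retrieval system for continuous speech in Uyghur broadcast news   Cited by: 1 (self-citations: 0, citations by others: 1)
The paper first describes the construction of a sensitive-word speech corpus whose speech signals come from Uyghur-language news broadcast by Xinjiang People's Broadcasting Station. The corpus is then used to train HMM-based models; the description of model training covers in detail the endpoint detection of recognition units, feature extraction, vector quantization, codebook construction, and the HMM training process and results. Finally, the corpus and the trained HMMs are used to retrieve sensitive words from continuous speech in Uyghur broadcast news, and the retrieval results are analyzed.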

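To meet the demand for on-demand news over the network, this paper develops NeWeb, a Web-oriented news video retrieval system. The system consists of a Web service system and a query system. The former interacts with the client, passes the client's retrieval request to the query system, and returns the results to the user in a suitable form; the latter organizes the content of the news video and executes queries. NeWeb integrates content-based video retrieval technology with Web technology, improving retrieval efficiency and broadening the range of applications.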

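Research on a sensitive-word retrieval system for Uyghur broadcast news   Cited by: 1 (self-citations: 0, citations by others: 1)
The Uyghur broadcast news sensitive-word retrieval system is based on HMMs and was designed and implemented on the MATLAB platform. Its characteristics include: 1. Because the number of Uyghur sensitive words is small, the system's speech corpus is very small. 2. Because pronunciation in broadcast news is relatively standard, recognition avoids irregular speaker pronunciation, which helps improve the performance of the speech recognition system. 3. Because morphemes are chosen as the recognition units, endpoint detection of the recognition units is easy.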

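First, a sensitive-word retrieval system is developed in MATLAB. The system is then used to perform continuous sensitive-word retrieval on speech from the Uyghur-language news program "新闻60分", whose speech signal comes from the website of the Xinjiang broadcasting station. Finally, the recognition results are analyzed and ideas for improving the correct recognition rate are proposed.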

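The goal of language model adaptation is to reduce the linguistic mismatch between the model and the recognition task. The mismatch includes differences in vocabulary, in style and content, and in the model's probability distribution. This paper proposes a new non-iterative method for extracting new Chinese words and a new open-vocabulary Chinese language model. Based on these techniques, it proposes a language model adaptation framework for broadcast speech recognition that combines the following: the non-iterative new-word extraction method, the open-vocabulary Chinese language model, a perplexity (PPL) based method for selecting background corpus data, and an N-gram probability distribution adaptation module. The paper also specifically analyzes the recognition of named-entity words during language model adaptation. Experiments show that with this framework the error rate drops by 10% relative and the recognition accuracy of entity words improves by 4%.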
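Perplexity-based selection of background text can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which model scores the background corpus, so the sketch assumes an add-one smoothed unigram model trained on in-domain tokens, and the function names and the `threshold` parameter are hypothetical.

```python
import math
from collections import Counter

def train_unigram(tokens):
    """Return (counts, total) for an add-one smoothed unigram model."""
    return Counter(tokens), len(tokens)

def perplexity(sentence, counts, total, vocab_size):
    """Perplexity of a non-empty token list under the smoothed unigram model."""
    log_prob = 0.0
    for token in sentence:
        prob = (counts[token] + 1) / (total + vocab_size)
        log_prob += math.log(prob)
    return math.exp(-log_prob / len(sentence))

def select_by_ppl(background, in_domain_tokens, threshold):
    """Keep background sentences whose perplexity is at or below the threshold."""
    counts, total = train_unigram(in_domain_tokens)
    vocab = set(in_domain_tokens)
    for sentence in background:
        vocab.update(sentence)
    vocab_size = len(vocab)
    return [s for s in background
            if s and perplexity(s, counts, total, vocab_size) <= threshold]
```

Sentences that share vocabulary with the in-domain data receive lower perplexity and survive the filter, while off-topic sentences are dropped. A practical system would use a higher-order N-gram model and tune the threshold on held-out data.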

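This paper designs and implements a salient-face retrieval system for news video. The news video is first segmented into a sequence of shots. A trained Cascade AdaBoost face detector detects a number of candidate faces in each shot, and those with high confidence are selected by a set of rules as samples from which the skin-color model for the shot is extracted. Position and size analysis and template matching are then applied to the regions obtained by skin-color segmentation to eliminate non-face regions and determine the list of objects to track. For accurate tracking and recognition, the system builds a finer skin-color model for each tracked object. During tracking, face detection is rerun every fixed number of frames to reduce error accumulation and to check whether new faces have appeared. Finally, the images best suited to face recognition are chosen from each face sequence to build its eigenface space, and skin-color information is combined with the PCA algorithm to decide whether it is the target face being searched for.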

Personalcasting: Tailored Broadcast News   Cited by: 1 (self-citations: 0, citations by others: 1)
Broadcast news sources and newspapers provide society with the vast majority of real-time information. Unfortunately, cost efficiencies and real-time pressures demand that producers, editors, and writers select and organize content for stereotypical audiences. In this article we illustrate how content understanding, user modeling, and tailored presentation generation promise personalcasts on demand. Specifically, we report on the design and implementation of a personalized version of a broadcast news understanding system, MITRE’s Broadcast News Navigator (BNN), that tracks and infers user content interests and media preferences. We report on the incorporation of Local Context Analysis both to expand the user’s original query to the most related terms in the corpus and to allow the user to provide interactive feedback that enhances the relevance of selected news stories. We describe an empirical study of the search for stories on ten topics from a video corpus. By personalizing both the selection of stories and the form in which they are delivered, we provide users with tailored broadcast news. This individual news personalization provides more fine-grained content tailoring than current personalized television program level recommenders and does not rely on externally provided program metadata.

We propose a Mandarin Chinese singing voice synthesis system that uses a hidden Markov model (HMM)-based speech synthesis technique. A Mandarin Chinese singing voice corpus is recorded, and musical contextual features are designed for training. The F0 and spectrum of the singing voice are modeled simultaneously with context-dependent HMMs. A new problem arises: the F0 data of the singing voice are always sparse because of the large number of contexts, i.e., tempo and pitch of the note, key, time signature, etc., so features that hardly ever appear in the training data cannot be modeled well. To address this problem, the difference between the F0 of the singing voice and that of the musical score (DF0) is modeled by a single Viterbi training. To overcome over-smoothing of the generated F0 contour, a syllable-level F0 model based on the discrete cosine transform (DCT) is applied, and the F0 contour is generated by integrating the two-level statistical models. The experimental results demonstrate that the proposed system outperforms the baseline system in both objective and subjective evaluations. The proposed system can generate a more natural F0 contour. Furthermore, the syllable-level F0 model can make the singing voice more expressive.
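As a rough illustration of how a DCT can parameterize a syllable-level F0 contour, the sketch below represents a contour by its first few DCT-II coefficients and reconstructs it from them. The function names, the unnormalized DCT-II convention, and the sample values are assumptions for illustration; the abstract does not specify the parameterization the authors use.

```python
import math

def dct2(contour):
    """Unnormalized DCT-II of a list of F0 samples."""
    n = len(contour)
    return [sum(x * math.cos(math.pi / n * (i + 0.5) * k)
                for i, x in enumerate(contour))
            for k in range(n)]

def idct2(coeffs, n):
    """Inverse of dct2 at n points; omitted high-order coefficients count as zero."""
    out = []
    for i in range(n):
        value = coeffs[0] / n
        for k in range(1, len(coeffs)):
            value += 2.0 / n * coeffs[k] * math.cos(math.pi / n * (i + 0.5) * k)
        out.append(value)
    return out

def smooth_contour(contour, num_coeffs):
    """Keep the first num_coeffs (>= 1) DCT coefficients and reconstruct."""
    coeffs = dct2(contour)[:num_coeffs]
    return idct2(coeffs, len(contour))
```

Keeping all coefficients reproduces the contour, keeping only the first yields its mean, and intermediate values give a compact description of the overall shape (level, slope, curvature) of the syllable's pitch movement.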

A number of researchers have been building high-level semantic concept detectors, such as outdoors, face, and building, to help with semantic video retrieval. Our goal is to examine how many concepts would be needed, and how they should be selected and used. Simulating performance of video retrieval under different assumptions of concept detection accuracy, we find that good retrieval can be achieved even when detection accuracy is low, if sufficiently many concepts are combined. We also derive suggestions regarding the types of concepts that would be most helpful for a large concept lexicon. Since our user study finds that people cannot predict which concepts will help their query, we also suggest ways to find the best concepts to use. Ultimately, this paper concludes that "concept-based" video retrieval with fewer than 5000 concepts, detected with a minimal accuracy of 10% mean average precision, is likely to provide high accuracy results in broadcast news retrieval.

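To address anchor tracking in broadcast television news programs, this paper proposes an algorithm that effectively combines speaker segmentation and clustering with speaker verification, and designs an anchor tracking system based on it. The system first uses a voice activity detection algorithm to remove silent segments from the news audio; the speaker segmentation and clustering algorithm then splits multi-speaker speech into single-speaker segments; finally, a GMM-UBM based speaker verification algorithm determines whether the speaker of each single-speaker segment is the target anchor. The effect of T-Norm on system performance is also analyzed. Evaluated on China Central Television's 《新闻联播》, the algorithm performs well: the tracking system achieves a precision of 93.03% and a recall of 84.34%.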
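The score normalization and the evaluation metrics mentioned above can be sketched briefly. T-Norm (test normalization) standardizes a verification score using the scores that a set of cohort models give the same test segment. The sketch below is an assumption-laden illustration: the function names are hypothetical, the population standard deviation is an arbitrary choice, and it does not reproduce the paper's GMM-UBM scoring.

```python
import statistics

def t_norm(raw_score, cohort_scores):
    """Standardize a raw score by cohort-model scores for the same test segment."""
    mean = statistics.mean(cohort_scores)
    std = statistics.pstdev(cohort_scores)
    if std == 0.0:
        return raw_score - mean  # degenerate cohort: only shift the score
    return (raw_score - mean) / std

def precision_recall(decisions, labels):
    """Precision and recall for boolean accept decisions against true labels."""
    tp = sum(1 for d, y in zip(decisions, labels) if d and y)
    fp = sum(1 for d, y in zip(decisions, labels) if d and not y)
    fn = sum(1 for d, y in zip(decisions, labels) if not d and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

A segment would be accepted as the target anchor when its normalized score exceeds a threshold. Normalizing per segment makes a single threshold more comparable across segments with different acoustic conditions.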

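NewsVideoCAR: a content-based browsing and retrieval system for video news programs   Cited by: 3 (self-citations: 0, citations by others: 3)
This paper describes the composition of the NewsVideoCAR system, the basic ideas behind its core technologies, and the key points of its browsing interface design.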

