Similar Literature
20 similar records found.
1.
In this paper, we propose three divide-and-conquer approaches for speaker segmentation based on the Bayesian information criterion (BIC). The approaches detect speaker changes by recursively partitioning a large analysis window into two sub-windows and recursively verifying whether two adjacent audio segments should be merged, using $\Delta\mathrm{BIC}$, a widely adopted distance measure between two audio segments. We compare our approaches with three popular distance-based approaches, namely Chen and Gopalakrishnan's window-growing approach, Siegler's fixed-size sliding-window approach, and Delacourt and Wellekens's DISTBIC approach, through a computational cost analysis and speaker change detection experiments on two broadcast news data sets. The results show that the proposed approaches are more efficient and achieve higher segmentation accuracy than the compared distance-based approaches. In addition, we apply the segmentation approaches discussed in this paper to the speaker diarization task. The experimental results show that a more effective segmentation approach leads to better diarization accuracy.
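The $\Delta\mathrm{BIC}$ test and the idea of recursive splitting can be illustrated with a minimal Python sketch. It assumes one-dimensional features modelled by a single Gaussian per segment (real systems use multi-dimensional MFCC vectors with full covariances), a penalty weight `lam`, and a minimum segment length `min_len`; these names are hypothetical, and the recursion is a simplified stand-in rather than the paper's exact algorithms. For $d = 1$ the penalty $\tfrac{1}{2}\left(d + \tfrac{d(d+1)}{2}\right)\ln N$ reduces to $\ln N$.

```python
import math


def _var(xs):
    """Maximum-likelihood variance of a 1-D sample, floored to avoid log(0)."""
    m = sum(xs) / len(xs)
    return max(sum((x - m) ** 2 for x in xs) / len(xs), 1e-8)


def delta_bic(x, t, lam=1.0):
    """Delta-BIC for splitting the 1-D sequence x into x[:t] and x[t:].

    Positive values favour two separate Gaussians, i.e. a speaker change.
    """
    n, n1, n2 = len(x), t, len(x) - t
    penalty = lam * math.log(n)  # 0.5 * (d + d*(d+1)/2) * ln N with d = 1
    return (0.5 * n * math.log(_var(x))
            - 0.5 * n1 * math.log(_var(x[:t]))
            - 0.5 * n2 * math.log(_var(x[t:]))
            - penalty)


def detect_changes(x, offset=0, min_len=10, lam=1.0):
    """Recursively split x at its best Delta-BIC point while that value is positive."""
    if len(x) < 2 * min_len:
        return []
    candidates = range(min_len, len(x) - min_len + 1)
    best_t = max(candidates, key=lambda t: delta_bic(x, t, lam))
    if delta_bic(x, best_t, lam) <= 0:
        return []  # no evidence of a speaker change in this window
    left = detect_changes(x[:best_t], offset, min_len, lam)
    right = detect_changes(x[best_t:], offset + best_t, min_len, lam)
    return left + [offset + best_t] + right
```

For example, for `x = [(-1) ** i for i in range(60)] + [5 * (-1) ** i for i in range(60)]`, whose variance jumps from 1 to 25 at index 60, `detect_changes(x)` returns `[60]`. Raising `lam` makes the detector more conservative.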

3.
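To address the problem of anchor tracking in broadcast TV news programs, an algorithm that effectively combines speaker segmentation and clustering with speaker verification is proposed, and an anchor tracking system is designed on that basis. The system first uses a voice activity detection algorithm to remove silent segments from the news audio; a speaker segmentation-and-clustering algorithm then splits multi-speaker speech into single-speaker segments; finally, a GMM-UBM-based speaker verification algorithm decides whether the speaker of each single-speaker segment is the target anchor. In addition, the effect of T-Norm on system performance is analyzed. Evaluated on China Central Television's "Xinwen Lianbo" (《新闻联播》), the algorithm performs well: the tracking system achieves a precision of 93.03% and a recall of 84.34%.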
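T-Norm (test normalization) rescales a verification score using the mean and standard deviation of the scores that the same test segment obtains against a cohort of impostor models. A minimal sketch, assuming the cohort scores are already computed (in a GMM-UBM system they would typically be log-likelihood ratios against the UBM) and a hypothetical decision threshold of 2.0:

```python
import statistics


def t_norm(raw_score, cohort_scores):
    """Normalize a verification score by impostor-cohort statistics (T-Norm)."""
    mu = statistics.mean(cohort_scores)
    sigma = statistics.pstdev(cohort_scores)  # population standard deviation
    if sigma == 0:
        raise ValueError("cohort scores have zero spread")
    return (raw_score - mu) / sigma


def is_target_anchor(raw_score, cohort_scores, threshold=2.0):
    """Accept the segment as the target anchor if its T-normed score exceeds threshold."""
    return t_norm(raw_score, cohort_scores) > threshold
```

Because the normalization is computed per test segment, a single threshold can be shared across segments whose raw score ranges differ.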

4.
In this paper, we propose a two-stage segmentation approach for splitting TV broadcast news bulletins into a sequence of news stories, and use codebooks derived from vector quantization to retrieve the segmented stories. In the first stage, speaker-specific (news reader) characteristics present in the initial headlines of the bulletin are used for gross-level segmentation. Because the speaker-specific information captured during the headlines is mixed with background music, this first-stage segmentation may not be accurate; in the second stage, its errors are therefore corrected by exploiting speaker-specific information captured from the individual news stories rather than the headlines. In this work, speaker-specific information is represented by mel-frequency cepstral coefficients (MFCCs) and captured by Gaussian mixture models (GMMs). The proposed two-stage segmentation method is evaluated on manually segmented TV broadcast news bulletins. The evaluation shows that about 93% of the news stories are correctly segmented, 7% are missed, and 6% are spurious. For navigating the bulletins, a quick-navigation indexing method based on speaker change points is developed. The performance of the proposed two-stage segmentation and quick-navigation methods is evaluated using GMM and neural network models. To retrieve target news stories from a news corpus, sequences of codebook indices derived from vector quantization are explored. The proposed retrieval approach is evaluated with queries of different sizes, and the results indicate that retrieval accuracy is proportional to the query size.
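The quantization and retrieval idea can be sketched as follows. The codebook is assumed to be given (for example, trained by k-means on MFCC frames), frames are plain tuples, and the matching rule (best fraction of aligned index matches over all offsets) is a hypothetical choice; the abstract does not specify how index sequences are compared.

```python
def nearest_codeword(vec, codebook):
    """Index of the codebook entry closest to vec in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(vec, codebook[k])))


def quantize(frames, codebook):
    """Map a sequence of feature vectors to a sequence of codebook indices."""
    return [nearest_codeword(f, codebook) for f in frames]


def match_score(query_idx, story_idx):
    """Best fraction of aligned index matches over all offsets of the query in the story."""
    m, n = len(query_idx), len(story_idx)
    if m == 0 or m > n:
        return 0.0
    best = 0
    for off in range(n - m + 1):
        hits = sum(q == s for q, s in zip(query_idx, story_idx[off:off + m]))
        best = max(best, hits)
    return best / m


def retrieve(query_frames, stories, codebook):
    """Rank story IDs by the match score of the quantized query, best first."""
    q = quantize(query_frames, codebook)
    scores = {sid: match_score(q, quantize(frames, codebook))
              for sid, frames in stories.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Longer queries give the score more indices to agree on, which is one intuition for why retrieval accuracy could grow with query size.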

8.
Speaker Segmentation and Clustering Algorithm Using EHMM and CLR
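In conventional speaker segmentation-and-clustering systems, insufficient speaker information at the clustering stage degrades segmentation accuracy. To address this problem, this paper proposes a multi-level speaker segmentation-and-clustering algorithm based on the evolutive hidden Markov model (EHMM) and the cross log-likelihood ratio (CLR) distance measure. Building on the conventional algorithm, it introduces resegmentation and reclustering mechanisms, together with a hierarchical clustering algorithm based on the distance measure and the Bayesian information criterion, which effectively removes the dependence of segmentation accuracy on the available speaker information. Experiments on the NIST (U.S. National Institute of Standards and Technology) 2003 Spring RT database show that the proposed algorithm improves system performance by 41% relative to the conventional algorithm.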
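One common form of the cross likelihood ratio between clusters $c_i$ and $c_j$ is $\mathrm{CLR}(c_i, c_j) = \frac{1}{n_i}\log\frac{L(x_i \mid M_j)}{L(x_i \mid B)} + \frac{1}{n_j}\log\frac{L(x_j \mid M_i)}{L(x_j \mid B)}$, where $M_i$ is the model of cluster $i$, $B$ the background model (UBM), and $n_i$ the number of frames in cluster $i$; higher values mean more similar clusters. The sketch below assumes 1-D features and single Gaussians in place of the MAP-adapted GMMs a real system would use; the EHMM resegmentation stage is not shown.

```python
import math


def gauss_logpdf(x, mean, var):
    """Log-density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def fit_gaussian(xs, var_floor=1e-6):
    """Maximum-likelihood (mean, variance) of a 1-D sample."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, max(v, var_floor)


def avg_llr(xs, model, ubm):
    """Average per-frame log-likelihood ratio of xs under model versus the UBM."""
    return sum(gauss_logpdf(x, *model) - gauss_logpdf(x, *ubm) for x in xs) / len(xs)


def clr(xi, xj, ubm):
    """Cross likelihood ratio of clusters xi and xj; higher means more similar."""
    mi, mj = fit_gaussian(xi), fit_gaussian(xj)
    return avg_llr(xi, mj, ubm) + avg_llr(xj, mi, ubm)
```

Normalizing each term by the UBM likelihood makes the measure less sensitive to how "easy" the data of a cluster are to model in general.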

9.
When performing speaker diarization on meeting recordings, multiple microphones of different qualities are usually available, distributed around the meeting room. Although several approaches have been proposed in recent years to take advantage of multiple microphones, they are either too computationally expensive and not easily scalable, or they cannot outperform the simpler case of using the best single microphone. In this paper, we propose the use of classic acoustic beamforming techniques, together with several novel algorithms, to create a complete front end for speaker diarization in the meeting room domain. The new techniques presented include blind reference-channel selection, two-step Viterbi post-processing of time delay of arrival (TDOA) estimates, and a dynamic output-signal weighting algorithm, as well as the use of the TDOA values in diarization to complement the acoustic information. Speaker diarization tests show a 25% relative improvement on the test set compared with using the single most centrally located microphone. Additional experiments show that these techniques also bring improvements in a speech recognition task.
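The basic delay-and-sum idea behind acoustic beamforming can be sketched as follows. As a simplifying assumption, plain time-domain cross-correlation stands in for the GCC-PHAT estimator commonly used for TDOA estimation, channel 0 is taken as the reference, all channels are weighted equally, and no Viterbi post-processing is applied; the function names and the `max_lag` search range are hypothetical.

```python
def estimate_delay(ref, sig, max_lag):
    """Lag (in samples) maximizing the cross-correlation of sig against ref.

    A positive lag means sig lags ref, i.e. sig[n] is approximately ref[n - lag].
    """
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        val = sum(ref[n - lag] * sig[n]
                  for n in range(max(0, lag), min(len(sig), len(ref) + lag)))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


def delay_and_sum(channels, ref_index=0, max_lag=20):
    """Align every channel to the reference channel and average them."""
    ref = channels[ref_index]
    delays = [estimate_delay(ref, ch, max_lag) for ch in channels]
    out = []
    for n in range(len(ref)):
        acc, cnt = 0.0, 0
        for ch, d in zip(channels, delays):
            m = n + d  # sample of ch that corresponds to ref[n]
            if 0 <= m < len(ch):
                acc += ch[m]
                cnt += 1
        out.append(acc / cnt if cnt else 0.0)
    return out, delays
```

The per-channel delays returned alongside the enhanced signal are the kind of TDOA values that, per the abstract, can also be fed to the diarization system as features.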

10.
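Automatically segmenting news broadcast programs by topic makes it possible to organize and use news broadcast data effectively. Current research on automatic story-unit segmentation focuses mainly on video data; however, the speech recognition transcripts of the audio contain rich semantic information, and transitions between sound events also provide much important information, so both can support effective semantics-based topic segmentation. Based on this information, this paper proposes a rule-based multi-information fusion method that uses the audio-type information in the neighborhood of each candidate boundary to correct the segmentation obtained from semantic information, thereby completing the topic segmentation. Experiments show that, after rule-based feature fusion, topic segmentation of news programs achieves an F-measure of 64.8%, with an error probability Pk of 18.3% and a WindowDiff of 24.5%.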
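Pk and WindowDiff both slide a window of $k$ units over the text. Pk counts windows in which reference and hypothesis disagree about whether the two ends of the window lie in the same segment; WindowDiff counts windows in which the number of boundaries differs. A minimal sketch, assuming segmentations are given as boundary vectors (`1` at position $i$ if a topic boundary falls between units $i$ and $i+1$) and that $k$ defaults to half the mean reference segment length:

```python
def default_k(ref):
    """Half the mean reference segment length, in units (at least 1)."""
    n_units = len(ref) + 1
    n_segments = sum(ref) + 1
    return max(1, round(n_units / n_segments / 2))


def pk(ref, hyp, k=None):
    """Fraction of windows whose two ends are inconsistently judged same/different segment."""
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must have the same length")
    if k is None:
        k = default_k(ref)
    n_windows = len(ref) + 1 - k
    errors = 0
    for i in range(n_windows):
        same_ref = sum(ref[i:i + k]) == 0  # no boundary between unit i and unit i+k
        same_hyp = sum(hyp[i:i + k]) == 0
        errors += same_ref != same_hyp
    return errors / n_windows


def window_diff(ref, hyp, k=None):
    """Fraction of windows whose reference and hypothesis boundary counts differ."""
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must have the same length")
    if k is None:
        k = default_k(ref)
    n_windows = len(ref) + 1 - k
    errors = sum(sum(ref[i:i + k]) != sum(hyp[i:i + k]) for i in range(n_windows))
    return errors / n_windows
```

Both metrics are error rates, so lower is better; unlike an exact-match F-measure, they give partial credit to boundaries that are close but not exactly right.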

13.
This paper presents integrated information mining techniques for broadcast TV news. It draws on techniques from acoustic, image, and video analysis to extract news story titles and to identify newscasters and scenes. The goal is to construct a compact yet meaningful abstraction of broadcast TV news that allows users to browse large amounts of data non-linearly, flexibly, and efficiently. With acoustic analysis, a news program can be partitioned into news and commercial clips with 90% accuracy on a data set of 400 h of TV news recorded off the air from July 2005 to August 2006. By applying speaker identification and/or image detection techniques, individual news stories can be segmented with a higher accuracy of 95.92%. On-screen captions or subtitles are recognized with OCR techniques to produce the text title of each news story. The extracted title words can be used to link to, or navigate through, related news content on the WWW. Combined with facial and scene analysis and recognition techniques, the OCR results allow users to issue multimodal queries for specific news stories. Experimental results on system reliability, performance evaluation, and comparison are presented and discussed.

14.
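Speaker clustering is an important step in speaker diarization; however, conventional hierarchical clustering that uses the Bayesian information criterion (BIC) as the distance measure suffers from clustering errors propagating upward through the hierarchy. This paper proposes a stage-by-stage algorithmic refinement mechanism. When the minimum BIC distance between segments exceeds a preset threshold, or when the number of clusters reaches a certain level, the current clustering result is taken as the initial set of cluster centers, and the segments in each cluster are refined by variational Bayesian iteration; finally, the number of speakers is determined by a probabilistic linear discriminant analysis (PLDA) score threshold. Experiments on the NIST 2008 summed test set show that the method improves both "cluster purity" and "speaker purity" over the conventional algorithm and improves overall diarization performance by 27.6% relative.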
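The first stage described above, agglomerative clustering with a BIC distance and two stopping rules, can be sketched in Python. Assumptions: each segment is a list of 1-D features modelled by one Gaussian, `threshold` and `min_clusters` are hypothetical parameters, and the later variational Bayesian refinement and PLDA scoring stages are not shown.

```python
import math
from itertools import combinations


def _logvar(xs):
    """Log of the maximum-likelihood variance of a 1-D sample (floored)."""
    m = sum(xs) / len(xs)
    return math.log(max(sum((x - m) ** 2 for x in xs) / len(xs), 1e-8))


def bic_distance(a, b, lam=1.0):
    """Delta-BIC between two clusters of 1-D features; larger means more dissimilar."""
    n = len(a) + len(b)
    return (0.5 * n * _logvar(a + b)
            - 0.5 * len(a) * _logvar(a)
            - 0.5 * len(b) * _logvar(b)
            - lam * math.log(n))


def bic_ahc(segments, threshold=0.0, min_clusters=1, lam=1.0):
    """Greedy agglomerative clustering; returns clusters as lists of segment indices.

    Stops when the smallest pairwise distance exceeds `threshold` or when only
    `min_clusters` clusters remain (the hand-off point for a refinement stage).
    """
    clusters = [[i] for i in range(len(segments))]
    data = [list(s) for s in segments]
    while len(clusters) > max(min_clusters, 1):
        (i, j), d = min(
            (((i, j), bic_distance(data[i], data[j], lam))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > threshold:
            break
        clusters[i] += clusters.pop(j)  # i < j, so index i is unaffected by the pop
        data[i] += data.pop(j)
    return clusters
```

Stopping early and handing the partial result to a second-pass refinement is what limits the upward propagation of early merge errors that the abstract describes.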

15.

TV news channels present a rich and complete picture of events through audio-visual content. This makes television news an influential medium for reaching the masses, and has persuaded social scientists and regulators to monitor and analyze the content of broadcast videos. An organized archive of newscasts is a prerequisite for any such analysis. Creating such an archive requires segmenting continuous news video into suitable logical units. Depending on the application, these logical units may be the channel content remaining after advertisement removal, individual shows, news stories, or video shots. In this work, we propose an end-to-end system, together with its software architecture, for segmenting TV broadcast video at all four of these granularities. The videos are first segmented into shots, which serve as the basic unit for all further processing. The shots are subjected to advertisement detection and removal to obtain the non-commercial channel content, which is then processed to identify program boundaries. We propose to identify three types of shows based on their presentation format, namely news bulletins, interviews, and debates. The news bulletins obtained in this way are further processed to obtain news stories. We propose a modular and scalable framework and software architecture for deploying the broadcast segmentation system on a computation cluster; it comprises a scheduler-based recording module and a broadcast segmentation module. We present the detailed software architecture of the individual modules and the automation of the entire processing pipeline, along with the resource and database management systems. We implemented and verified the architecture by deploying the proposed system on a cluster of nine desktops and one workstation, where it was used for round-the-clock processing of three Indian English-language news channels.
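The abstract does not state how shots are detected. A common baseline, shown here purely as a hypothetical stand-in, declares a cut when the L1 distance between the normalized grey-level histograms of consecutive frames exceeds a threshold. Frames are assumed to be flat lists of 8-bit grey values, and the threshold of 0.5 and the 16 bins are arbitrary choices.

```python
def gray_histogram(frame, bins=16):
    """Normalized histogram of 8-bit grey values."""
    hist = [0] * bins
    for p in frame:
        hist[min(p * bins // 256, bins - 1)] += 1
    total = len(frame)
    return [h / total for h in hist]


def shot_boundaries(frames, threshold=0.5, bins=16):
    """Indices i at which a new shot starts with frames[i]."""
    cuts = []
    prev = gray_histogram(frames[0], bins)
    for i in range(1, len(frames)):
        cur = gray_histogram(frames[i], bins)
        if sum(abs(a - b) for a, b in zip(prev, cur)) > threshold:
            cuts.append(i)
        prev = cur
    return cuts


def shots(frames, threshold=0.5, bins=16):
    """Split frame indices into (start, end) shot ranges, end exclusive."""
    starts = [0] + shot_boundaries(frames, threshold, bins)
    ends = starts[1:] + [len(frames)]
    return list(zip(starts, ends))
```

Histogram differences catch hard cuts cheaply but are known to miss gradual transitions such as fades and dissolves, which a production system would need to handle separately.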

16.
Speech and speaker recognition systems are rapidly being deployed in real-world applications. In this paper, we describe in detail a system and its components for indexing and retrieving multimedia content derived from broadcast news sources. The audio analysis component performs real-time speech recognition to convert the audio to text and, concurrently, speaker analysis, which consists of segmenting the audio into acoustically homogeneous sections followed by speaker identification. The outputs of these two simultaneous processes are used to extract statistics from which indexes for text-based and speaker-based retrieval are built automatically, without user intervention. The real power of multimedia document processing lies in the possibility of Boolean queries that combine text-based and speaker-based criteria. Retrieval for such queries entails combining the results of the individual text-based and speaker-based searches. The underlying techniques discussed here can easily be extended to other speech-centric applications and transactions.
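Combining text-based and speaker-based searches with a Boolean AND can be illustrated with two inverted indexes. A minimal sketch, assuming each segment already carries a transcript and a speaker label and that transcripts are tokenized on whitespace; the index layout is hypothetical and omits the statistics the described system extracts.

```python
from collections import defaultdict


def build_indexes(segments):
    """segments: iterable of (segment_id, speaker_label, transcript_text)."""
    text_index = defaultdict(set)
    speaker_index = defaultdict(set)
    for seg_id, speaker, text in segments:
        speaker_index[speaker].add(seg_id)
        for word in text.lower().split():
            text_index[word].add(seg_id)
    return text_index, speaker_index


def query(text_index, speaker_index, words=(), speaker=None):
    """AND together all query words and, optionally, a speaker constraint."""
    result = None
    for w in words:
        ids = text_index.get(w.lower(), set())
        result = ids if result is None else result & ids
    if speaker is not None:
        ids = speaker_index.get(speaker, set())
        result = ids if result is None else result & ids
    return set() if result is None else set(result)  # copy so callers cannot mutate the index
```

For example, with segments `(1, "anchor_A", "Election results announced")`, `(2, "reporter_B", "Election turnout high")`, and `(3, "anchor_A", "Weather forecast")`, the combined query for the word "election" spoken by `anchor_A` returns only segment 1.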

17.
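A method is proposed for segmenting news events in news video based on title captions and audio-visual features, and a prototype system for news event segmentation, browsing, and retrieval is implemented. The method combines title caption detection, anchorperson shot detection, and silence and speaker change detection to segment a complete news program into individual news events. Experimental results show that the method achieves significantly higher segmentation accuracy than a method that uses titles alone.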
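One plausible way to fuse these cues is sketched below; the rule itself is a hypothetical illustration, not the authors' method. A title-caption onset is kept as a story boundary only if it is supported by an anchorperson shot start or by an audio cue (a silence that co-occurs with a speaker change) within `tol` seconds, and it is snapped to the nearest audio cue when one exists. All times are in seconds.

```python
def fuse_boundaries(title_onsets, anchor_starts, silences, speaker_changes, tol=2.0):
    """Return story boundary times supported by title, anchor, and audio cues."""
    # Audio cue: a silence that co-occurs with a speaker change.
    audio_cues = [s for s in silences
                  if any(abs(s - c) <= tol for c in speaker_changes)]
    boundaries = []
    for t in sorted(title_onsets):
        near_audio = [a for a in audio_cues if abs(a - t) <= tol]
        near_anchor = any(abs(a - t) <= tol for a in anchor_starts)
        if not near_audio and not near_anchor:
            continue  # unsupported title change, e.g. a caption update within one story
        if near_audio:
            t = min(near_audio, key=lambda a: abs(a - t))  # snap to the closest audio cue
        if not boundaries or t - boundaries[-1] > tol:
            boundaries.append(t)
    return boundaries
```

Snapping to audio cues addresses the fact that captions often appear a little after the story actually starts, while the support test filters out caption changes that do not coincide with any other evidence of a new story.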

20.
This paper describes the SoVideo broadcast news retrieval system for Mandarin Chinese. The system is based on technologies such as large-vocabulary continuous speech recognition for Mandarin Chinese, automatic story segmentation, and information retrieval. The database currently contains 177 hours of broadcast news, which automatic story segmentation divided into 3,264 stories. We discuss the development and evaluation of each component of the retrieval system.

Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417