Similar Articles
20 similar articles retrieved.
1.
This paper designs and implements a fully automatic Chinese news caption generation system whose input is a news video and whose output is the caption text for that video. Using 《新闻联播》 as the corpus, the system implements audio extraction, audio classification and segmentation, speaker identification, large-vocabulary continuous speech recognition, video playback, and automatic generation of text captions. Automatic caption generation avoids the laborious and time-consuming process of adding captions by hand. Experiments show that the system achieves a high recognition rate and can meet the needs of the hearing-impaired and other...
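A minimal, hypothetical sketch of the pipeline shape this abstract describes (audio extraction → segmentation → LVCSR → caption text). Everything below is illustrative: the segmentation and recognition functions are stubs standing in for the real modules, and only the ffmpeg invocation is concrete.

```python
import subprocess
from typing import List, Tuple

def extract_audio(video_path: str, wav_path: str) -> str:
    """Extract a 16 kHz mono audio track from the news video via ffmpeg."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-ac", "1", "-ar", "16000", wav_path],
        check=True,
    )
    return wav_path

def segment_audio(wav_path: str) -> List[Tuple[float, float]]:
    """Split audio into speech segments (start, end) in seconds.
    Placeholder for the audio classification/segmentation module."""
    return [(0.0, 5.0), (5.0, 11.2)]

def recognize(wav_path: str, segment: Tuple[float, float]) -> str:
    """Placeholder for large-vocabulary continuous speech recognition."""
    return "<recognized text>"

def generate_captions(video_path: str) -> List[Tuple[float, float, str]]:
    """Return (start, end, caption) triples for the whole video."""
    wav = extract_audio(video_path, "news_audio.wav")
    return [(s, e, recognize(wav, (s, e))) for s, e in segment_audio(wav)]
```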

2.
Understanding the scene content of a video sequence is very important for content-based indexing and retrieval in multimedia databases. Research in this area in the past several years has focused on the use of speech recognition and image analysis techniques. As a complementary effort to the prior work, we have focused on using the associated audio information (mainly the nonspeech portion) for video scene analysis. As an example, we consider the problem of discriminating five types of TV programs, namely commercials, basketball games, football games, news reports, and weather forecasts. A set of low-level audio features is proposed for characterizing the semantic contents of short audio clips. The linear separability of different classes under the proposed feature space is examined using a clustering analysis. The effective features are identified by evaluating the intracluster and intercluster scattering matrices of the feature space. Using these features, a neural net classifier was successful in separating the above five types of TV programs. By evaluating the changes between the feature vectors of adjacent clips, we can also identify scene breaks in an audio sequence quite accurately. These results demonstrate the capability of the proposed audio features for characterizing the semantic content of an audio sequence.
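The intracluster/intercluster scatter analysis mentioned above has a standard textbook form; the sketch below computes both matrices and a common separability score on synthetic stand-in features (the feature values and five-class labels are invented for illustration).

```python
import numpy as np

def scatter_matrices(X: np.ndarray, y: np.ndarray):
    """Return the within-class (Sw) and between-class (Sb) scatter matrices."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        Sw += (Xc - mu_c).T @ (Xc - mu_c)
        diff = (mu_c - mu).reshape(-1, 1)
        Sb += len(Xc) * (diff @ diff.T)
    return Sw, Sb

# Synthetic 4-dim "audio features" for five TV-program classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(i, 1.0, size=(20, 4)) for i in range(5)])
y = np.repeat(np.arange(5), 20)
Sw, Sb = scatter_matrices(X, y)
# A common separability score: larger trace(Sw^-1 Sb) means the classes
# scatter apart relative to their internal spread.
print(np.trace(np.linalg.pinv(Sw) @ Sb))
```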

3.
4.
To perform audio-visual speech recognition while accurately segmenting the audio and video streams at the phoneme level, this paper proposes a new multi-stream asynchronous triphone dynamic Bayesian network (MM-ADBN-TRI) model, which describes the asynchrony between the audio and video streams at the word level. Both streams adopt a word-triphone-state-observation hierarchy; the recognition unit is the triphone, which captures coarticulation in continuous speech. Experimental results show that the model performs well in audio-visual speech recognition, in phoneme-level segmentation of the audio and video streams, and in determining the asynchrony between the two streams.

5.
Speech and language technologies for audio indexing and retrieval
With the advent of essentially unlimited data storage capabilities and with the proliferation of the use of the Internet, it becomes reasonable to imagine a world in which it would be possible to access any of the stored information at will with a few keystrokes or voice commands. Since much of this data will be in the form of speech from various sources, it becomes important to develop the technologies necessary for indexing and browsing such audio data. This paper describes some of the requisite speech and language technologies that would be required and introduces an effort aimed at integrating these technologies into a system, called Rough `n' Ready, which indexes speech data, creates a structural summarization, and provides tools for browsing the stored data. The technologies highlighted in the paper include speaker-independent continuous speech recognition, speaker segmentation and identification, name spotting, topic classification, story segmentation, and information retrieval. The system automatically segments the continuous audio input stream by speaker, clusters audio segments from the same speaker, identifies speakers known to the system, and transcribes the spoken words. It also segments the input stream into stories, based on their topic content, and locates the names of persons, places, and organizations. These structural features are stored in a database and are used to construct highly selective search queries for retrieving specific content from large audio archives.
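One building block the abstract names, clustering audio segments from the same speaker, can be sketched as agglomerative clustering over per-segment speaker embeddings. The embeddings below are random stand-ins, and the assumption that the number of speakers is known (three) is for illustration only; a real system would estimate it.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(1)
# Pretend each row is a speaker embedding of one audio segment;
# three synthetic "speakers" with ten segments each.
segments = np.vstack([rng.normal(m, 0.3, size=(10, 16)) for m in (0.0, 2.0, 4.0)])

Z = linkage(segments, method="ward")
speaker_ids = fcluster(Z, t=3, criterion="maxclust")  # assume 3 speakers
print(speaker_ids)  # segments sharing an id are attributed to one speaker
```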

6.
In stress speech recognition, a recognition model capable of processing speech under multiple stress conditions needs to be designed with both accuracy and add-ability in mind. This paper proposes addable stress speech recognition with a multiplexing hidden Markov model (HMM). To handle multiple stress conditions, we propose a multiplexing topology that combines multiple stress speech models. Since each stress affects speech in a different way, a speech recognition model specifically trained to recognize words affected by that stress helps improve recognition rates. However, since each stress speech model yields its own independently recognized word, an effective decision module is needed to choose the correct word. In each stress speech model, MFCCs are computed from the input speech and fed into an HMM that is segmented into N parts. Each segment provides its own tentative recognized word, which in turn is an input to the proposed non-training decision module. Based on the tentative recognized words from the segments of all stress speech models, the final word is decided coarse-to-fine by a majority vote, a segment-weighted difference-square score, and a next-best score, respectively. Besides neutral speech, the proposed method was verified on three stresses: angry, loud, and Lombard. The results showed that the proposed method achieved a 94.7% recognition rate, compared with 94.2% for the training-based decision method.
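A minimal sketch of just the coarse stage of that decision module, a majority vote over the tentative words from all segments of all stress models. The strict-majority threshold and the deferral behavior are assumptions; the finer stages (segment-weighted difference-square score, next-best score) are only referenced in comments.

```python
from collections import Counter
from typing import List, Optional

def majority_vote(tentative_words: List[str]) -> Optional[str]:
    """Return the word with a strict majority, or None to defer to the
    finer stages (segment-weighted difference-square score, next-best)."""
    word, count = Counter(tentative_words).most_common(1)[0]
    return word if count > len(tentative_words) / 2 else None

# Tentative words from, say, N=3 segments of four stress-specific models.
votes = ["left", "left", "lift", "left", "left", "lend",
         "left", "lift", "left", "left", "left", "left"]
print(majority_vote(votes))  # -> "left"
```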

7.
The rapid increase in the amount of digital audio collections, which present various formats, types, durations, and other parameters in the digital multimedia world, demands a generic framework for robust and efficient indexing and retrieval based on aural content. Moreover, from the content-based multimedia retrieval point of view, the audio information can be even more important than the visual part, as it is mostly unique and significantly stable over the entire duration of the content. A generic and robust audio-based multimedia indexing and retrieval framework, developed and tested under the MUVIS system, is presented. This framework supports the dynamic integration of audio feature extraction modules during the indexing and retrieval phases and therefore provides a test-bed platform for developing robust and efficient aural feature extraction techniques. Furthermore, the proposed framework is designed around high-level content classification and segmentation in order to improve the speed and accuracy of aural retrievals. Both theoretical and experimental results are presented, including comparative measures of retrieval performance with respect to the visual counterpart.

8.
To address the problem that radio signal reception still depends on manned monitoring and has not yet been automated, this paper proposes, on the theoretical basis of software-defined radio, an overall technical scheme for a radio audio acquisition and management system, and designs an external audio processing terminal for the radio. While the radio operates, the terminal automatically acquires, records, recognizes, and plays back the received audio signals, improving reception efficiency. The terminal consists of a general-purpose software-defined radio module, a Morse audio signal processing module, and a speech recognition module; the modules are mutually independent, giving the design a degree of generality and practicality.

9.
The primary advances in speech and audio signal processing that contributed to the maturing of multimedia applications are discussed in the areas of speech and audio signal compression, speech synthesis, acoustic processing and echo control, and network echo cancellation.

10.
Automatic segmentation and classification algorithms for broadcast news transcription
吕萍  颜永红 《电子与信息学报》2006,28(12):2292-2295
This paper introduces the automatic segmentation and classification algorithms used in a Chinese broadcast news transcription task. A three-stage automatic segmentation system is proposed, which splits the audio stream into easily recognizable segments through coarse segmentation, fine segmentation, and smoothing. For the fine-segmentation stage, two algorithms are proposed: a dynamic noise-tracking segmentation algorithm and a segmentation algorithm based on monophone decoding. Following methods used in speaker identification, a classification algorithm based on Gaussian mixture models is proposed, which handles the multi-class decision problem for audio segments well. Experimental results on 《新闻联播》 test data show that the proposed automatic segmentation and classification algorithms perform almost as well as manual segmentation and classification.
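A hedged sketch of the GMM classification idea: fit one Gaussian mixture per audio class and assign a segment to the class whose model scores its frames highest. The MFCC-like features, class names, and mixture size below are all invented for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
classes = ["speech", "music", "noise"]
train = {c: rng.normal(i * 3.0, 1.0, size=(300, 13)) for i, c in enumerate(classes)}

# One GMM per audio class, fit on that class's (fake) training frames.
models = {c: GaussianMixture(n_components=2, random_state=0).fit(X)
          for c, X in train.items()}

def classify(segment_frames: np.ndarray) -> str:
    """Assign the segment to the class whose GMM gives the highest
    average log-likelihood over the segment's frames."""
    return max(models, key=lambda c: models[c].score(segment_frames))

test_segment = rng.normal(3.0, 1.0, size=(50, 13))  # drawn like "music"
print(classify(test_segment))
```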

11.
12.
13.
Visual feature extraction is a hot topic in audio-visual speech recognition research. This paper introduces a robust dynamic mouth-shape feature based on visemic LDA, which fully accounts for the variation of mouth contours during articulation and for the partition into visual visemes. The paper also proposes a method that uses speech recognition results to automatically label the LDA training data, which removes the heavy manual annotation workload and avoids labeling errors. Experiments show that introducing the visemic LDA visual features into audio-visual speech recognition greatly improves recognition rates under noisy conditions; after combining these visual features with a multi-stream HMM, the recognition rate still exceeds 80% under strong noise at a signal-to-noise ratio of 10 dB.
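A minimal sketch, on synthetic data, of the LDA step implied above: projecting mouth-contour descriptors into a viseme-discriminative space. The descriptor dimensionality, viseme count, and labels are invented; per the paper's idea, the labels would come from ASR-based automatic annotation rather than manual labeling.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(3)
n_visemes = 6
# Fake mouth-contour descriptors (e.g. sampled lip-contour coordinates).
X = np.vstack([rng.normal(i, 1.0, size=(80, 20)) for i in range(n_visemes)])
y = np.repeat(np.arange(n_visemes), 80)

lda = LinearDiscriminantAnalysis(n_components=n_visemes - 1).fit(X, y)
visual_features = lda.transform(X)  # viseme-discriminative dynamic features
print(visual_features.shape)        # (480, 5)
```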

14.
This paper presents a novel audio and video information fusion approach that greatly improves automatic recognition of people in video sequences. To that end, audio and video information is first used independently to obtain confidence values that indicate the likelihood that a specific person appears in a video shot. Finally, a post-classifier is applied to fuse audio and visual confidence values. The system has been tested on several news sequences and the results indicate that a significant improvement in the recognition rate can be achieved when both modalities are used together.
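A hypothetical sketch of the late-fusion step: a post-classifier that maps per-shot (audio confidence, visual confidence) pairs to a final decision about whether the person appears. Logistic regression is one plausible choice of post-classifier, not necessarily the authors'; the confidence values are synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n = 300
labels = rng.integers(0, 2, size=n)  # 1 = target person appears in the shot
# Fake per-shot modality confidences that loosely track the label.
audio_conf = 0.6 * labels + 0.4 * rng.random(n)
video_conf = 0.7 * labels + 0.3 * rng.random(n)
X = np.column_stack([audio_conf, video_conf])

post_classifier = LogisticRegression().fit(X, labels)
# Fused probability that the person appears in a new shot:
print(post_classifier.predict_proba([[0.8, 0.7]])[0, 1])
```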

15.
马勇  鲍长春 《信号处理》2013,29(9):1190-1199
Speaker diarization (segmentation and clustering) is a speech signal processing research direction that has emerged in recent years. It studies how to determine the start and end times of each of multiple speakers in a continuous speech stream and how to label each speech segment with its speaker. This research is important for automatic speech recognition, multi-speaker recognition, and content-based audio analysis. According to how segmentation and clustering are carried out, this paper reviews, from the perspectives of asynchronous and synchronous strategies, the mainstream algorithms, techniques, and representative systems of the past decade at home and abroad, compares the results of representative systems in recent NIST Rich Transcription evaluations, and finally discusses remaining problems and future directions.
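One classic building block of the diarization systems such a survey covers is Bayesian information criterion (BIC) change-point detection, which asks whether a feature window is better modeled by one Gaussian or by two Gaussians split at a candidate point. The sketch below uses synthetic features and a standard ΔBIC formula; it is illustrative, not any particular surveyed system.

```python
import numpy as np

def delta_bic(X: np.ndarray, split: int, lam: float = 1.0) -> float:
    """Positive values favor a speaker change at frame `split`."""
    n, d = X.shape

    def logdet(A: np.ndarray) -> float:
        return np.linalg.slogdet(np.cov(A, rowvar=False))[1]

    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return (0.5 * n * logdet(X)
            - 0.5 * split * logdet(X[:split])
            - 0.5 * (n - split) * logdet(X[split:])
            - lam * penalty)

rng = np.random.default_rng(5)
two_speakers = np.vstack([rng.normal(0, 1, (200, 12)), rng.normal(2, 1, (200, 12))])
one_speaker = rng.normal(0, 1, (400, 12))
print(delta_bic(two_speakers, 200) > 0)  # True: change detected
print(delta_bic(one_speaker, 200) > 0)   # typically False: no change
```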

16.
Results from a series of experiments that use neural networks to process the visual speech signals of a male talker are presented. In these preliminary experiments, the results are limited to static images of vowels. It is demonstrated that these networks are able to extract speech information from the visual images and that this information can be used to improve automatic vowel recognition. The structure of speech and its corresponding acoustic and visual signals are reviewed. The specific data that was used in the experiments along with the network architectures and algorithms are described. The results of integrating the visual and auditory signals for vowel recognition in the presence of acoustic noise are presented.
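A toy version, on synthetic features, of the integration experiment the abstract describes: a small neural network classifying five vowels from noisy "acoustic" features alone versus acoustic plus "visual" features. Feature dimensions, noise levels, and network size are all assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(6)
n_per_vowel, n_vowels = 100, 5
y = np.repeat(np.arange(n_vowels), n_per_vowel)
# "Acoustic" features made heavily noisy, "visual" features cleaner.
audio = np.vstack([rng.normal(i, 4.0, (n_per_vowel, 10)) for i in range(n_vowels)])
visual = np.vstack([rng.normal(i, 1.0, (n_per_vowel, 8)) for i in range(n_vowels)])

for name, X in (("audio only", audio), ("audio+visual", np.hstack([audio, visual]))):
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
    net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                        random_state=0).fit(Xtr, ytr)
    print(name, round(net.score(Xte, yte), 3))  # fusion should score higher
```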

17.
18.
This paper describes a study on automatic music segmentation and summarization from audio signals. The paper inquires scientifically into the nature of human perception of music and offers a practical solution to difficult problems of machine intelligence for automated multimedia content analysis and information retrieval. Specifically, three problems are addressed: segmentation based on tonality analysis, segmentation based on recurrent structural analysis, and summarization. Experimental results are evaluated quantitatively, demonstrating the promise of the proposed methods.
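The recurrent-structure analysis mentioned above is often visualized with a self-similarity matrix, where off-diagonal stripes mark repeated sections. A minimal sketch on fake frame features (a real system might use chroma vectors) with an idealized A-B-A structure:

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.normal(size=(40, 12))   # pretend section A (e.g. a verse)
B = rng.normal(size=(30, 12))   # pretend section B (e.g. a chorus)
frames = np.vstack([A, B, A])   # a "song" with idealized A-B-A structure

# Cosine self-similarity matrix over frame features.
normed = frames / np.linalg.norm(frames, axis=1, keepdims=True)
ssm = normed @ normed.T

# The stripe starting at (70, 0) shows frames 70..109 repeating frames
# 0..39, i.e. the A section recurs; segmentation follows such stripes.
stripe = np.mean([ssm[70 + i, i] for i in range(40)])
print(f"repeat stripe: {stripe:.2f} vs background: {ssm.mean():.2f}")
```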

19.
This paper studies a new language identification algorithm that combines acoustic information with phonotactic information. First, the speech is automatically segmented during preprocessing. At the feature level, a segment-level feature parameter carrying long-term information, the segment-level shifted delta cepstrum, is introduced. At the model level, a Gaussian mixture model (GMM) automatically tokenizes the speech signal into a symbol sequence, and a multi-gram language model (MLM) is then introduced to model the phonotactic information. Finally, the GMM score and the MLM score are fed into a back-end multi-class support vector machine to obtain the final recognition result. Experiments show that the new system requires no manually labeled corpus, recognizes quickly, and achieves an open-set correct identification rate of 78.84% on five languages from the OGI standard corpus.
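A hedged sketch of the shifted delta cepstra (SDC) feature named above, using the common N-d-P-k parameterization; the parameter values (7-1-3-7) and the random stand-in cepstra are assumptions, not necessarily the paper's settings.

```python
import numpy as np

def sdc(cepstra: np.ndarray, d: int = 1, P: int = 3, k: int = 7) -> np.ndarray:
    """At each frame, stack k delta-cepstra blocks, each shifted by P frames
    (edge frames are clamped to index 0)."""
    T, N = cepstra.shape
    out = []
    for t in range(T - (k - 1) * P - d):
        blocks = [cepstra[t + i * P + d] - cepstra[max(t + i * P - d, 0)]
                  for i in range(k)]
        out.append(np.concatenate(blocks))
    return np.array(out)

frames = np.random.default_rng(8).normal(size=(200, 7))  # stand-in 7-dim cepstra
print(sdc(frames).shape)  # (181, 49): k*N-dimensional long-span features
```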

20.
于春雪 《电声技术》2012,36(1):55-59,73
An embedded system is built around the ARM processor S3C2440A, with the UDA1341TS audio chip encoding and decoding the speech signal, and speech recognition technology is applied to achieve voice control. The paper introduces the system's design principles and working mechanism, describes the hardware and software design of the control menu and the principle of the recognition algorithm, and gives the test method. Experimental results show that the system achieves voice control of specific commands with a high recognition rate and good real-time performance, and can adapt to complex working environments.
