Similar Literature
Found 19 similar articles; search took 171 ms.
1.
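Video Content Detection Method Based on Audio-Visual Features   Total citations: 1, self-citations: 1, citations by others: 1
蔡群, 陆松年, 杨树堂. 《计算机工程》 (Computer Engineering), 2007, 33(22): 240-242
A new method is proposed that combines audio and visual features to detect video content, with the aim of improving recognition accuracy. The method analyzes visual features and audio features separately, introduces a support vector machine (SVM) to classify audio segments, and combines the analysis results from the audio and visual domains to judge the video content. An analysis of special video clips shows that combining audio and visual features is feasible and effective, and that the method can be applied to video content monitoring and to the retrieval and segmentation of specific video clips.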

2.
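Audio Classification and Segmentation Based on Support Vector Machines   Total citations: 8, self-citations: 0, citations by others: 8
Audio classification and segmentation are important means of extracting audio structure and content semantics, and they underpin content-based audio and video retrieval and analysis. The support vector machine (SVM) is an effective statistical learning method. This paper proposes an SVM-based audio classification algorithm that divides audio into five classes: silence, noise, music, pure speech, and speech with background sound. On top of the classification, three smoothing rules are applied to smooth the classification results. The classification performance of the SVM classifier is analyzed, and the effectiveness of the new audio features proposed in the paper is also evaluated on the SVM classifier. Experimental results show that the SVM-based audio classification algorithm classifies well and that the audio segmentation obtained after smoothing is fairly accurate.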
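The abstract mentions three smoothing rules but does not state them. Purely as an illustration of what post-classification smoothing can look like, the sketch below applies one hypothetical rule (an assumption, not taken from the paper): a segment whose two neighbours agree with each other but not with it is relabelled to match them.

```python
def smooth_isolated(labels):
    """Relabel any segment whose two neighbours share a label it lacks.

    Hypothetical rule for illustration; the paper's own three rules
    are not given in the abstract.
    """
    smoothed = list(labels)
    for i in range(1, len(labels) - 1):
        # Neighbours are read from the original sequence, so one
        # correction never triggers another in the same pass.
        prev_label, next_label = labels[i - 1], labels[i + 1]
        if prev_label == next_label and labels[i] != prev_label:
            smoothed[i] = prev_label
    return smoothed
```

For example, a single "speech" label surrounded by "music" labels would be rewritten as "music", which removes spurious one-segment boundaries before segmentation.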

3.
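Violent shot detection has been an active research topic in recent years. Early violent shot detection relied mainly on video features; because audio information is stable and consistent across cultures and populations, attention is increasingly turning to audio. This work therefore uses audio features to detect violent audio events in movie shots and proposes a feature extraction method based on multi-scale durations. In addition to short-time features such as MFCC, LPC, and energy, long-time features are extracted, including the mean and variance of energy, the mean and variance of sub-band energy, and inter-frame differences. The audio events that occur frequently in violent shots and are representative of them are of three kinds: explosions, screams, and gunshots. Taking the movie shot as the recognition unit, a detection system is implemented with a support vector machine classifier. Experiments on 15 Hollywood movies show that the multi-scale-duration audio features achieve good results in violent audio event detection.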

4.
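The audio stream in video data carries rich semantic information, and analysis of audio information is an integral part of content-based video retrieval. This paper discusses content-based audio scene segmentation, analyzes various audio features and their extraction methods, and on that basis proposes a new audio stream segmentation method that divides the audio stream in video data into audio scenes according to the audio features of six audio types (speech, music, silence, environmental sound, pure speech, speech with background music, and speech with background environmental sound). Experiments show that the method is effective: while a certain segmentation accuracy is maintained, precision and recall are both substantially improved.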

5.
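Multiwavelet-Domain Watermarking Algorithm Based on Audio Features   Total citations: 3, self-citations: 0, citations by others: 3
Based on an analysis of audio features, a watermarking algorithm in the multiwavelet domain is proposed. Exploiting the time-frequency masking properties of the human auditory system, the algorithm analyzes the zero-crossing rate and time-domain energy of audio frames to determine the frames used for watermark embedding. Drawing on the subsampling characteristics of audio and the advantages of the multiwavelet transform in signal processing, each audio frame is subsampled into two sub-frames, each of which is transformed into the multiwavelet domain. The energies of the two sub-frames in the multiwavelet domain are used to estimate the capacity of the embedded watermark, and the watermark is embedded according to the relative magnitude of their energies. Watermark extraction is cast as a binary classification problem handled by a support vector machine. Experimental results verify that the proposed algorithm can find audio frames suitable for watermark embedding based on the characteristics of the audio itself and can dynamically adjust the embedding strength, improving watermark robustness while preserving auditory quality.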

6.
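With the development of multimedia technology, automatically detecting the commercials embedded in digital video programs is a challenging research problem. Because embedded commercials vary widely in production style and presentation, the experimental results of many automatic detection models are often unsatisfactory. To improve the robustness of the detection system, a three-stage commercial detection system is proposed. First, a shot detection algorithm based on region feature importance (RBFID, region-based feature importance detection) is proposed to detect abrupt-cut shots and fade shots during video playback, and statistical features are extracted from each shot to characterize it. Next, shots are classified using the strong classification properties of the SVM. Finally, to obtain accurate commercial segments, the continuity of commercial video in content and time is used to eliminate misclassified shots, and the commercial shots are then merged into commercial segments. The system is validated on clips from 30 television programs, and the experimental results show that the commercial detection system is practical.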

7.
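Feature Analysis and Extraction in Automatic Audio Classification   Total citations: 8, self-citations: 1, citations by others: 8
Audio feature analysis and extraction are the basis of automatic audio classification. This paper divides audio objects into five classes (silence, noise, pure speech, speech with background sound, and music) and analyzes in depth, at the frame level and the segment level, the features that discriminate between classes. Frame-level features include MFCC, frequency-domain energy, sub-band energy, zero-crossing rate, and spectral centroid. On this basis, basic segment-level audio features are computed, including the silence ratio and the mean sub-band energy ratio, and three audio "stream" features are proposed: High-ZCR ratio, Low-Frequency-Energy ratio, and spectral flux. An automatic classifier based on the support vector machine is designed and implemented, and the classification performance of the feature set formed from the above features is examined with it. Experiments show that the proposed features are effective and that classification performance is good.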
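To make the frame-level and segment-level distinction concrete, here is a minimal sketch of the zero-crossing rate (a frame-level feature) and one possible reading of a High-ZCR ratio (a segment-level feature). The abstract does not define the ratio, so the definition used here, the share of frames whose ZCR exceeds a multiple of the segment mean, and the `factor` value are assumptions.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def high_zcr_ratio(frames, factor=1.5):
    """Share of frames whose ZCR exceeds `factor` times the segment mean.

    `factor` is an assumed parameter; the abstract gives no threshold.
    """
    rates = [zero_crossing_rate(f) for f in frames]
    if not rates:
        return 0.0
    threshold = factor * sum(rates) / len(rates)
    return sum(1 for r in rates if r > threshold) / len(rates)
```

A segment that alternates between low-ZCR and high-ZCR frames yields a larger ratio than one whose frames all have a similar ZCR, which is the kind of contrast a segment-level feature is meant to capture.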

8.
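Audio Classification Based on Wavelet Transform and Support Vector Machines   Total citations: 2, self-citations: 0, citations by others: 2
Audio feature extraction is the basis of audio classification, and audio classification is in turn the key to content-based audio retrieval. After a comprehensive analysis of the features that distinguish speech from music, a method for audio feature extraction and classification based on the wavelet transform and support vector machines is proposed for classifying pure speech, music, speech with background music, and environmental sound, and the new feature set is evaluated on an SVM classifier. Experimental results show that the proposed audio features are effective and reasonable and that classification performance is good.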

9.
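Based on an analysis of shot boundary types and detection methods, and according to the continuity of shots, a two-stage cascaded classifier is applied to shot boundary detection. The first-stage classifier uses the gray-level variance of video frames to separate video sequences without obvious change from the original sequence, yielding a new video sequence. The second-stage classifier works on the new sequence: it extracts video features such as pixel-pair differences of the video images, per-component differences of HSV color histograms, and differences in the X and Y components of edge histograms, and uses a multi-class support vector machine strategy to detect the shot boundary type. Experimental results show that, compared with the accumulation (积聚) algorithm and the SVM-TMRA algorithm, the proposed algorithm has better overall performance and good real-time behavior.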

10.
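Design and Implementation of a Video Retrieval System Based on Semantic Concepts   Total citations: 2, self-citations: 0, citations by others: 2
A video retrieval system based on semantic concepts is designed and implemented. The system comprises three parts: video shot segmentation with keyframe extraction, semantic concept detection, and user retrieval. It segments the video hierarchically through shot segmentation and keyframe extraction, extracts effective low-level image features from the keyframe images, uses support vector machines (SVMs) for concept detection, and finally retrieves video by concept content. For concept detection, a linear weighting method based on validation average precision is proposed for late fusion of the SVM classification results. Experimental results show that the method achieves high retrieval accuracy.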
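The late-fusion idea in this abstract can be sketched as follows: each classifier's scores are weighted by its average precision on a validation set, and the weighted scores are summed. The abstract does not give the exact weighting formula, so normalising the weights to sum to one is an assumption, and all function names are hypothetical.

```python
def average_precision(scores, labels):
    """Average precision of the ranking induced by `scores` (labels 0/1)."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def fuse_scores(val_scores, val_labels, test_scores):
    """Weight each classifier by its validation AP, then sum test scores.

    val_scores and test_scores hold one score list per classifier.
    Normalising weights to sum to 1 is an assumption of this sketch.
    """
    aps = [average_precision(s, val_labels) for s in val_scores]
    total = sum(aps)
    if total == 0:
        weights = [1.0 / len(aps)] * len(aps)  # fall back to equal weights
    else:
        weights = [ap / total for ap in aps]
    n_items = len(test_scores[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, test_scores))
        for i in range(n_items)
    ]
```

Weighting by validation performance lets a classifier built on a more discriminative low-level feature contribute more to the final concept score without retraining anything.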

11.
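AVS is short for the 《信息技术先进音视频编码》 (Information Technology: Advanced Audio and Video Coding) series of standards. It is an audio and video coding standard developed independently in China, aimed mainly at applications such as high-definition television, high-density optical storage, and mobile media. It is a complete standards system covering systems, video, audio, and media rights management. The video standard has two parts: AVS-P2, for digital television, and AVS-P7, for mobile applications. The key technologies of the two AVS video standards are compared in the context of mobile video applications and analyzed with experimental data, and the application prospects of the two standards in mobile video are discussed.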

12.
A method that exploits an information theoretic framework to extract optimized audio features using video information is presented. A simple measure of mutual information (MI) between the resulting audio and video features allows the detection of the active speaker among different candidates. This method involves the optimization of an MI-based objective function. No approximation is needed to solve this optimization problem, neither for the estimation of the probability density functions (pdfs) of the features, nor for the cost function itself. The pdfs are estimated from the samples using a nonparametric approach. The challenging optimization problem is solved using a global method: the differential evolution algorithm. Two information theoretic optimization criteria are compared and their ability to extract audio features specific to speech production is discussed. Using these specific audio features, candidate video features are then classified as members of the "speaker" or "non-speaker" class, resulting in a speaker detection scheme. As a result, our method achieves a speaker detection rate of 100% on in-house test sequences, and of 85% on most commonly used sequences.
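The final comparison step described here, choosing the candidate whose video feature shares the most information with the audio feature, can be sketched as below. This is a simplified stand-in: it uses a plain histogram estimate of MI rather than the nonparametric pdf estimation in the abstract, and it omits the feature optimization by differential evolution. The bin count and function names are assumptions.

```python
import math
from collections import Counter


def mutual_information(xs, ys, bins=4):
    """Histogram estimate of I(X;Y) in bits from paired samples."""
    def quantize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # avoid zero width for constants
        return [min(int((v - lo) / width), bins - 1) for v in values]

    qx, qy = quantize(xs), quantize(ys)
    n = len(xs)
    joint = Counter(zip(qx, qy))
    px, py = Counter(qx), Counter(qy)
    # p(a,b) * log2( p(a,b) / (p(a) p(b)) ), written with raw counts.
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def pick_speaker(audio_feature, candidate_video_features):
    """Index of the candidate whose video feature has the highest MI."""
    scores = [
        mutual_information(audio_feature, v) for v in candidate_video_features
    ]
    return max(range(len(scores)), key=lambda i: scores[i])
```

A candidate whose mouth-region feature rises and falls with the audio feature scores high, while a candidate whose feature varies independently of the audio scores near zero.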

13.
In this paper, a novel audio-visual scene change detection algorithm is presented and evaluated experimentally. An enhanced set of eigen-audioframes is created that is related to an audio signal subspace, where audio background changes are easily discovered. An analysis is presented that justifies why this subspace favors scene change detection. Additionally, a novel process is developed in order to detect audio scene change candidates in this subspace. Visual information is used to align audio scene change indications with neighboring video shot changes and, accordingly, to reduce the false alarm rate of the audio-only scene change detection. Moreover, video fade effects are identified and used independently in order to track scene changes. The false alarm rate is reduced further by extracting acoustic features in order to verify that the scene change indications are valid. The detection methodology was tested on newscast videos provided by the TRECVID2003 video test set. The experimental results demonstrate that the proposed method achieves an F-measure exceeding 0.85. Accordingly, it effectively tackles the scene change detection problem.

14.
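Commercial detection means automatically detecting commercial sequences in television programs. Traditional methods apply a computer-vision-based framework to the video content and cannot meet commercial requirements in performance and efficiency. The proposed algorithm performs commercial detection using only the audio of the commercials. First, the raw audio of the commercial library is extracted and converted into spectrograms with the short-time Fourier transform; a pre-selected filter set is then applied and the responses are binarized to obtain local feature descriptors, which make up the commercial audio library. Then, during detection, feature descriptors are extracted in the same way and looked up in the commercial audio library to obtain the detection result. The audio-matching-based commercial detection algorithm has advantages including small storage, high accuracy, and strong real-time performance. Experiments show that the algorithm significantly improves the robustness and performance of a commercial detection system and can be applied in real-world scenarios.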
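The binarize-then-look-up step can be illustrated with a small sketch. It assumes the filter responses for one spectrogram patch are already available as a list of numbers; the zero threshold, the Hamming-distance lookup, the `max_distance` value, and all names are assumptions, since the abstract does not describe the retrieval scheme in detail.

```python
def binarize(responses, threshold=0.0):
    """Pack filter responses into an int, one bit per filter."""
    bits = 0
    for r in responses:
        bits = (bits << 1) | (1 if r > threshold else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two descriptors stored as ints."""
    return bin(a ^ b).count("1")


def best_match(query, library, max_distance=2):
    """Return the library key closest to `query`, or None if too far.

    `library` maps a hypothetical commercial id to its descriptor.
    """
    best_key, best_dist = None, max_distance + 1
    for key, descriptor in library.items():
        dist = hamming(query, descriptor)
        if dist < best_dist:
            best_key, best_dist = key, dist
    return best_key
```

Storing each descriptor as a handful of bits is what keeps the library small, and comparing bit strings with XOR is cheap enough for real-time matching; a production system would likely replace the linear scan with an index.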

15.
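To enable real-time content detection for P2P-TV applications, the fine-grained identification of P2P-TV platforms and channels by a P2P-TV monitoring system is briefly introduced. PPTV transmits data streams in the ASF streaming media format, and nodes obtain data from one another over UDP. Building on accurate identification of the platform and channel, the A/V packets in the data transmission are identified, and the sequence number of each A/V packet, the length of the A/V data, and its start and end positions are obtained. The A/V data is then extracted online and restored into a media file, on which content detection is performed.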

16.
The different aspects of sensor integration, and specifically that of a Mass Spectrometer (MS) with audio and video signals, are investigated for detecting and monitoring indoor fire events. The present study focuses on comparing the capabilities of a variety of chemical sensors, on answering technical challenges in regard to the integration of chemical, audio and video signals and on discussing integration issues for potential field applications. Controlled, small scale fire experiments were carried out in the laboratory. A commercial MS coupled with an in-house developed Pulsed Sampling System (PSS) was used for on-line sampling and near real-time monitoring of the evolved volatiles. The detection limit of PSS-MS was found to be 150 ppbv and its linearity was confirmed up to 10 ppmv using benzene gas standards. The profiles of ions with m/z 57, 78, 91 and 106, corresponding to indicative Volatile Organic Compounds (VOCs) of the fire event, were recorded and compared with the concentration profiles of CO2, CO, O2, NO and H/C (C3H8), acquired by the gas sensors of a commercial exhaust gas analyzer. Audio and video signals were recorded by a microphone and a visual camera simultaneously with the PSS-MS data. Two types of fire experiments were performed in order to simulate field conditions: (a) direct fire monitoring, in case of unobstructed direct fire view and (b) indirect fire monitoring through reflection of audio and video signals on metallic surfaces, for simulating obstacles preventing direct fire view. The information derived by audio and video signals reaffirmed the chemical detection inferences for both types of fire experiments, thus increasing the credibility of each individual method. Occasionally, video, audio and chemical information were complementary, thus counterbalancing the detection limitations of the individual methods. The integrated approach of combining MS data with audio and video signals appears to be a promising method in safety and security applications, where reliable, early detection and real-time monitoring is necessary.

17.
Pornographic video detection based on multimodal fusion is an effective approach for filtering pornography. However, existing methods lack accurate representation of audio semantics and pay little attention to the characteristics of pornographic audios. In this paper, we propose a novel framework of fusing audio vocabulary with visual features for pornographic video detection. The novelty of our approach lies in three aspects: an audio semantics representation method based on an energy envelope unit (EEU) and bag-of-words (BoW), a periodicity-based audio segmentation algorithm, and a periodicity-based video decision algorithm. The first one, named the EEU+BoW representation method, is proposed to describe the audio semantics via an audio vocabulary. The audio vocabulary is constructed by k-means clustering of EEUs. The latter two aspects complement each other to make full use of the periodicities in pornographic audios. Using the periodicity-based audio segmentation algorithm, audio streams are divided into EEU sequences. After these EEUs are classified, videos are judged to be pornographic or not by the periodicity-based video decision algorithm. Before fusion, two support vector machines are respectively applied for the audio-vocabulary-based and visual-features-based methods. To fuse their results, a keyframe is selected from each EEU in terms of the beginning and ending positions, and then an integrated weighted scheme and a periodicity-based video decision algorithm are adopted to yield final detection results. Experimental results show that our approach outperforms the traditional one which is only based on visual features, and achieves satisfactory performance. The true positive rate reaches 94.44% while the false positive rate is 9.76%.
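The bag-of-words step can be sketched in a few lines. The sketch assumes a codebook of cluster centres is already available (in the abstract it comes from k-means clustering of EEUs) and that each audio unit has been turned into a numeric feature vector; the vectors and function names here are hypothetical.

```python
def nearest_codeword(vector, codebook):
    """Index of the codebook entry with the smallest squared distance."""
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return min(range(len(codebook)), key=lambda k: sq_dist(vector, codebook[k]))


def bow_histogram(units, codebook):
    """Normalised count of how often each audio word occurs in `units`."""
    counts = [0] * len(codebook)
    for unit in units:
        counts[nearest_codeword(unit, codebook)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [c / total for c in counts]
```

The resulting fixed-length histogram is what makes variable-length audio streams usable as input to a classifier such as a support vector machine.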

18.
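Real-Time Explosion Scene Recognition Based on an Audio-Visual Hierarchical Model   Total citations: 1, self-citations: 0, citations by others: 1
An algorithm is proposed that, in a real-time setting, uses a hierarchical model based on hearing and vision to recognize "explosion" scenes in MPEG multimedia data streams in the compressed domain. First, a coarse support vector machine separates explosion and explosion-like audio from other audio; several fine support vector machines then distinguish explosion audio from explosion-like audio, yielding candidate audio explosion scenes. Because most explosion scenes are accompanied by drastic visual change, each candidate audio explosion scene is then checked for whether its corresponding visual features have changed, which gives the final recognition result.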

19.
While most existing sports video research focuses on detecting events in soccer, baseball, and similar sports, little work has addressed flexible content summarization of racquet sports video, e.g. tennis and table tennis. By taking advantage of the periodicity of video shot content and audio keywords in racquet sports video, we propose a novel flexible video content summarization framework. Our approach combines the structure event detection method with the highlight ranking algorithm. Firstly, unsupervised shot clustering and supervised audio classification are performed to obtain the visual and audio mid-level patterns respectively. Then, a temporal voting scheme for structure event detection is proposed by utilizing the correspondence between audio and video content. Finally, by using the affective features extracted from the detected events, a linear highlight model is adopted to rank the detected events by their degree of excitement. Experimental results show that the proposed approach is effective.
