Similar literature
 20 similar documents found (search time: 15 ms)
1.
In this paper, autoassociative neural network (AANN) models are explored for segmenting and indexing films (movies) using audio features. A two-stage method is proposed for segmenting a film into a sequence of scenes and then indexing them appropriately. In the first stage, the music segments and the speech-plus-music segments of the film are separated, and music segments are labelled as title or fighting scenes based on their position. In the second stage, speech-plus-music segments are classified into normal, emotional, comedy and song scenes. Mel-frequency cepstral coefficients (MFCCs), zero-crossing rate and intensity are used as the audio features for segmentation and indexing. The proposed method is evaluated on manually segmented Hindi films. The evaluation shows that title, fighting and song scenes are segmented and indexed without any errors, and that most errors occur in discriminating comedy scenes from normal scenes. The performance of the proposed AANN models is also compared with that of hidden Markov models, Gaussian mixture models and support vector machines.
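
Two of the audio features named in this abstract, zero-crossing rate and intensity, are simple enough to sketch directly. The following is a minimal illustration only, not the authors' implementation: the frame length, hop size, the use of root-mean-square amplitude as the intensity measure, and the toy signal are all assumptions made for the example. MFCC extraction is omitted.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a list of samples into frames (a trailing partial frame is dropped)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def intensity(frame):
    """Root-mean-square amplitude, used here as an assumed intensity measure."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

# Toy signal (hypothetical): a slow tone followed by a quiet, rapidly alternating buzz.
tone = [math.sin(2 * math.pi * 5 * n / 200) for n in range(200)]
buzz = [0.1 * (1 if n % 2 == 0 else -1) for n in range(200)]
frames = frame_signal(tone + buzz, frame_len=100, hop=100)
features = [(zero_crossing_rate(f), intensity(f)) for f in frames]
```

In this toy case the tone frames have a low zero-crossing rate and high intensity, while the buzz frames show the opposite pattern, which is the kind of contrast a classifier could exploit.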

2.
The demand for watching movies is steadily increasing owing to the spread of the Internet and the growing popularity of video-on-demand (VOD) services. The large mass of movies stored on the Internet or on VOD servers needs to be structured to speed up browsing. In this paper, we propose a new system called "The Scene Pathfinder" that segments movies into scenes, giving users non-sequential access so that they can watch particular scenes of a movie. This helps them judge the movie quickly and decide whether to buy or download it, avoiding wasted time and money. The proposed approach is multimodal: we use both visual and auditory information to accomplish the segmentation. We assume that every movie scene is either an action or a non-action scene. Non-action scenes are generally characterized by static backgrounds and occur in a single place. For this reason, we rely on content information and on the Kohonen map to extract these kinds of scenes (shot agglomerations). Action scenes are characterized by high tempo and motion. For this reason, we rely on tempo features and on fuzzy C-means to classify shots and localize the action zones. The two processes are complementary: the over-segmentation that may occur when action scenes are extracted from content information is repaired by the fuzzy clustering. Our system is tested on a varied database, and the results show the merit of our approach and that our assumptions are well founded.

3.
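倪宁 (Ni Ning), 卢刚 (Lu Gang), 卜佳俊 (Bu Jiajun). 《计算机仿真》 (Computer Simulation), 2006, 23(8): 184-187, 195
Current research on scene detection is based mainly on images and video. Audio, however, also carries rich scene information, audio analysis is computationally cheaper, and for automatic or semi-automatic scene detection, audio-based methods are also more readily accepted by users. Audio analysis can therefore serve as an auxiliary means for video scene detection to obtain more accurate scene detection and segmentation. This paper proposes a content-based audio analysis system that performs audio-based scene detection and segmentation on video sequences. The system effectively handles many cases in which the image changes but the actual scene does not. Its overall computational complexity is also lower than that of video/image-based scene detection and segmentation systems.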

4.
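This paper proposes a method for segmenting news events using headline caption information and audio-visual features in news video, and implements a prototype system for news event segmentation, browsing and retrieval. The method combines headline detection, anchorperson shot detection, and silence-segment and speaker-change detection to segment the news events in a complete news programme. Experimental results show that the method is significantly more accurate than news event segmentation that uses headlines alone.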

5.
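Audio classification and segmentation based on support vector machines   Total citations: 8, self-citations: 0, citations by others: 8
Audio classification and segmentation are important means of extracting audio structure and content semantics, and are the basis of content-based audio and video retrieval and analysis. The support vector machine (SVM) is an effective statistical learning method. This paper proposes an SVM-based audio classification algorithm that divides audio into five classes: silence, noise, music, pure speech, and speech with background sound. On top of the classification, three smoothing rules are applied to smooth the classification results. The classification performance of the SVM classifier is analysed, and the effectiveness of the new audio features proposed in this paper is also evaluated on the SVM classifier. Experimental results show that the SVM-based audio classification algorithm classifies well and that the audio segmentation obtained after smoothing is fairly accurate.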
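
The abstract does not state its three smoothing rules, so the sketch below illustrates only the general idea of smoothing per-window class labels before turning them into segments. It uses one assumed rule (a single window whose two neighbours agree with each other but not with it is relabelled) and an assumed one-second window. The function names and label sequence are hypothetical.

```python
def smooth_labels(labels):
    """Relabel any single window whose two neighbours agree with each other but not with it."""
    smoothed = list(labels)
    for i in range(1, len(labels) - 1):
        prev_label, next_label = labels[i - 1], labels[i + 1]
        if prev_label == next_label and labels[i] != prev_label:
            smoothed[i] = prev_label
    return smoothed

def labels_to_segments(labels, window_seconds=1.0):
    """Merge runs of identical labels into (start_time, end_time, label) segments."""
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((start * window_seconds, i * window_seconds, labels[start]))
            start = i
    return segments

# Hypothetical classifier output, one label per one-second window.
raw = ["music", "music", "speech", "music", "music",
       "speech", "speech", "silence", "speech"]
clean = smooth_labels(raw)
segments = labels_to_segments(clean)
```

Here the isolated "speech" and "silence" windows are absorbed by their neighbours, leaving one music segment followed by one speech segment.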

6.
Content-based audio classification and segmentation is a basis for further audio/video analysis. In this paper, we present our work on audio segmentation and classification using support vector machines (SVMs). Five audio classes are considered: silence, music, background sound, pure speech, and non-pure speech, which includes speech over music and speech over noise. A sound stream is segmented by classifying each sub-segment into one of these five classes. We evaluated the performance of SVMs on classifying different pairs of audio types with testing units of different lengths, and compared the performance of SVM, K-nearest neighbor (KNN), and Gaussian mixture model (GMM) classifiers. We also evaluated the effectiveness of some newly proposed features. Experiments on a database of about 4 hours of audio data show that the proposed classifier is very efficient for audio classification and segmentation, and that the accuracy of the SVM-based method is much better than that of the KNN- and GMM-based methods.

7.
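Research on audio content segmentation and clustering   Total citations: 1, self-citations: 0, citations by others: 1
This paper analyses the process of segmenting audio by detecting audio boundaries with audio features, gives a method for describing audio segments with a Gaussian mixture model (GMM), describes the implementation of audio segment clustering, and presents experimental results, which indicate that the segmentation and clustering work well.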

8.
Segmentation into scenes helps users browse movie archives and select the interesting ones. A given movie contains two kinds of scenes: action scenes and non-action scenes. To detect action scenes, we rely on tempo features such as motion and audio energy. To detect non-action scenes, however, we have to use content information. In this paper, we present a new approach to detecting non-action movie scenes. The main idea is to use a new dynamic variant of the self-organizing map, called MIGSOM (Multilevel Interior Growing Self-Organizing Maps), to detect agglomerations of shots in movie scenes. The originality of the MIGSOM model lies in its architecture for evolving the structure of the network. The MIGSOM algorithm is driven by a growth process that adds nodes where necessary, whether at the boundaries or in the interior of the map. In addition, the advantage of the proposed MIGSOM algorithm is its ability to find the best structure of the output space through the training process and to better represent the semantics of the data. Our system is tested on a varied database and compared with the classical SOM and other works. The results show the merit of our approach in terms of recall and precision rates and that our assumptions are well founded.

9.
Image segmentation is crucial for multimedia applications. Multimedia databases utilize segmentation for the storage and indexing of images and video. Image segmentation is used for object tracking in the new MPEG-7 video compression standard. It is also used in video conferencing for compression and coding purposes. These are only some of the multimedia applications of image segmentation. It is usually the first task of any image analysis process, and thus subsequent tasks rely heavily on the quality of segmentation. The proposed method of color image segmentation is very effective in segmenting a multimedia-type image into regions. Pixels are first classified as either chromatic or achromatic depending on their HSI color values. Next, a seed determination algorithm finds seed pixels that are in the center of regions. These seed pixels are used in the region growing step to grow regions by comparing the seed pixels to neighboring pixels using the cylindrical distance metric. Finally, regions that are similar in color are merged to obtain the final segmentation.
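
The region growing step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the cylindrical distance is written in its commonly used form (intensity difference combined with a law-of-cosines distance in the hue/saturation plane), pixels are assumed to be `(hue_in_radians, saturation, intensity)` tuples, and the 4-connectivity, threshold and toy image are hypothetical choices. Seed determination, the chromatic/achromatic split and region merging are omitted.

```python
import math

def cylindrical_distance(p, q):
    """Distance between two HSI pixels given as (hue_radians, saturation, intensity)."""
    h1, s1, i1 = p
    h2, s2, i2 = q
    d_intensity = abs(i1 - i2)
    hue_gap = abs(h1 - h2) % (2 * math.pi)
    # Hue is an angle, so take the shorter way around the circle.
    theta = hue_gap if hue_gap <= math.pi else 2 * math.pi - hue_gap
    # Law of cosines in the hue/saturation plane; max() guards against rounding below zero.
    d_chroma = math.sqrt(max(0.0, s1 * s1 + s2 * s2 - 2 * s1 * s2 * math.cos(theta)))
    return math.sqrt(d_intensity ** 2 + d_chroma ** 2)

def grow_region(image, seed, threshold):
    """4-connected region growing from `seed`, comparing each candidate to the seed pixel."""
    rows, cols = len(image), len(image[0])
    seed_pixel = image[seed[0]][seed[1]]
    region = {seed}
    frontier = [seed]
    while frontier:
        r, c = frontier.pop()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                if cylindrical_distance(seed_pixel, image[nr][nc]) <= threshold:
                    region.add((nr, nc))
                    frontier.append((nr, nc))
    return region

# Toy 2x3 image (hypothetical values): two reddish columns and one bluish column.
red, red2, blue = (0.0, 0.8, 0.5), (0.05, 0.8, 0.52), (4.0, 0.8, 0.5)
image = [[red, red2, blue],
         [red2, red, blue]]
region = grow_region(image, seed=(0, 0), threshold=0.1)
```

With these values the region grown from the top-left seed covers the four reddish pixels and stops at the bluish column.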

10.
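Audio segmentation is the basis of audio analysis and detection applications, and is also a very important and difficult problem in multimedia data analysis. Most traditional audio stream segmentation methods suffer from too many false segmentation points, excessive computation, and high false-detection and miss rates. To improve segmentation performance and effectively reduce the false-detection and miss rates, an audio segmentation algorithm based on hierarchical detection within a fixed-length window is proposed: a fixed-length window slides over the audio stream, change points are detected hierarchically from the top down within the window, and finally a local-extremum criterion is used to verify the detected candidate change points. Experimental results show that, compared with traditional hybrid segmentation algorithms, processing speed is greatly improved while the recall of change points increases by 7.1% and precision reaches 92%.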
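
The general pattern of sliding a fixed-length window over a feature stream and then verifying candidates with a local-extremum test can be sketched as below. This is a simplified illustration, not the algorithm of the paper: the top-down hierarchical step inside each window is omitted, the change score (difference of mean feature value before and after a position) is an assumption, and the one-dimensional feature stream, threshold and radius are hypothetical.

```python
def change_scores(features, half_window):
    """For each position, compare the mean feature value just before and just after it."""
    scores = {}
    for t in range(half_window, len(features) - half_window + 1):
        left = features[t - half_window:t]
        right = features[t:t + half_window]
        scores[t] = abs(sum(right) / half_window - sum(left) / half_window)
    return scores

def local_maxima(scores, threshold, radius):
    """Keep positions whose score reaches the threshold and is the largest within `radius`."""
    peaks = []
    for t, s in scores.items():
        neighbours = [scores[u] for u in range(t - radius, t + radius + 1) if u in scores]
        if s >= threshold and s == max(neighbours):
            peaks.append(t)
    return peaks

# Hypothetical 1-D feature stream with two abrupt changes, at positions 10 and 20.
stream = [0.1] * 10 + [0.9] * 10 + [0.3] * 10
scores = change_scores(stream, half_window=4)
change_points = local_maxima(scores, threshold=0.3, radius=3)
```

Positions next to a true change also score highly, so thresholding alone would report several false points. The local-maximum check keeps only the strongest candidate in each neighbourhood.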

11.
The study examines the effect of four important aspects of film skimming: the segmentation process, the proportion of total skimmed length (TSL), the multiple cues available, and the genre/domain of the film. We design three experiments to explore their effects on the representativeness of a video skim. The results of Experiment 1 show that a skimmed video combining a TSL of 10% with skimmed segments (SS) of 5 or 10 s is more efficient for representativeness. The results of Experiment 2 show that a skimmed video drawn mostly from the ending part and offering multiple cues can significantly improve representativeness. The results of Experiment 3 reveal that the representativeness of skimmed video differs significantly across types of movie. In our experiments, the proportion of TSL is set to three levels (5%, 10%, and 15%), and the size of SS is also set to three levels (2.5, 5, and 10 s) for the segmentation process. We observe that skimmed video with longer TSL and SS represents movie content better, but the four combinations of 10% and 15% with 5 s and 10 s do not differ significantly. This finding helps reduce the time cost of skimming video. Furthermore, we applied two important factors from media richness theory, personality focus of the medium and multiple cues, to our skimming method in order to raise the representativeness of video skims for different films. For the personality focus of the medium, we define a movie as having three parts: beginning, middle, and ending. For multiple cues, skimmed video with synchronized subtitles, audio, and video can assist comprehension and reduce uncertainty. We find that skimmed video drawn mostly from the ending part, with synchronized subtitles, audio, and video, can raise the representativeness of movie content.

12.
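News video story segmentation based on anchorperson identification   Total citations: 1, self-citations: 0, citations by others: 1
Semantic unit segmentation of news video is an important step in content-based news video retrieval and intelligence mining, and has attracted the attention of many researchers. A new method for news video story unit segmentation based on anchorperson identification is proposed. First, acoustic perceptual features of each anchorperson are extracted from news programmes as a voiceprint and used to train a corresponding Gaussian mixture model (GMM). The KL divergence method is then used to detect anchorperson and non-anchorperson audio shots among the video shots. Finally, caption frame events and knowledge of the particular structure of news programmes are combined to segment the programme into story units. Experiments on more than two hours of CCTV and CNN news video achieved a precision of 96.02% and a recall of 92.58%.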
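
The abstract uses a KL divergence test to decide whether an audio shot matches an anchorperson model. The KL divergence between two Gaussian mixtures has no closed form, so the sketch below illustrates the idea on single univariate Gaussians only, where the formula is exact. This simplification, the symmetrised form, the threshold and the function names are all assumptions for illustration, not the paper's procedure.

```python
import math

def kl_gaussian(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two univariate Gaussians with the given means and variances."""
    return 0.5 * (math.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def symmetric_kl(mu_p, var_p, mu_q, var_q):
    """Symmetrised divergence: KL(p || q) + KL(q || p)."""
    return kl_gaussian(mu_p, var_p, mu_q, var_q) + kl_gaussian(mu_q, var_q, mu_p, var_p)

def is_anchor_shot(shot_stats, anchor_stats, threshold):
    """Label a shot as anchorperson speech when its divergence from the anchor model is small.

    Both arguments are (mean, variance) pairs; the threshold is a hypothetical tuning value.
    """
    return symmetric_kl(*shot_stats, *anchor_stats) <= threshold

anchor_model = (0.0, 1.0)
close_shot = is_anchor_shot((0.1, 1.0), anchor_model, threshold=0.5)
far_shot = is_anchor_shot((3.0, 1.0), anchor_model, threshold=0.5)
```

A full system would compare multivariate GMMs, typically through an approximation such as sampling, but the decision rule has the same shape: a small divergence from the anchorperson model means an anchorperson shot.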

13.
This correspondence is concerned with a method for image segmentation based on visual principles. The inconsistency between the conventional discriminating criterion and the human vision mechanism in perceiving an object and its background is analyzed, and an improved discriminating criterion with visual nonlinearity is defined. A new model and an algorithm for image segmentation are proposed based on the spatially adaptive principle of human vision and the relevant hypotheses about object recognition. Image segmentation is a two-stage process. First, initial segmentation is realized with the bottom-up segmenting algorithm; the goal-driven segmenting algorithm then improves the segmentation results for certain regions of interest. Experimental results show that, compared with some conventional and gradient-based segmenting methods, the new method performs excellently at extracting small objects from images of natural scenes with a complicated background.

14.
15.
Detection and representation of scenes in videos   Total citations: 4, self-citations: 0, citations by others: 4
This paper presents a method to perform a high-level segmentation of videos into scenes. A scene can be defined as a subdivision of a play in which either the setting is fixed or continuous action is presented in one place. We exploit this fact and propose a novel approach for clustering shots into scenes by transforming this task into a graph partitioning problem. This is achieved by constructing a weighted undirected graph called a shot similarity graph (SSG), where each node represents a shot and the edges between the shots are weighted by their similarity based on color and motion information. The SSG is then split into subgraphs by applying normalized cuts for graph partitioning. The partitions so obtained represent individual scenes in the video. When clustering the shots, we consider the global similarities of shots rather than individual shot pairs. We also propose a method to describe the content of each scene by selecting one representative image from the video as a scene key-frame. Recently, DVDs have become available with a chapter selection option where each chapter is represented by one image. Our algorithm automates this task, which is useful for applications such as video-on-demand, digital libraries, and the Internet. Experiments are presented with promising results on several Hollywood movies and one sitcom.
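
The normalized-cut criterion applied to a shot similarity graph can be sketched on a toy example. This is an illustration only: the similarity values are made up, and instead of the eigenvector-based solver normally used for normalized cuts, the sketch simply evaluates the cost of every temporally contiguous two-way split. That restriction is an assumption for the example, not something stated in the abstract.

```python
def ncut_value(weights, group_a):
    """Normalized-cut cost of splitting the nodes into group_a and its complement."""
    nodes = range(len(weights))
    group_b = [n for n in nodes if n not in group_a]
    cut = sum(weights[i][j] for i in group_a for j in group_b)
    assoc_a = sum(weights[i][j] for i in group_a for j in nodes)
    assoc_b = sum(weights[i][j] for i in group_b for j in nodes)
    return cut / assoc_a + cut / assoc_b

def best_temporal_split(weights):
    """Try every boundary k (shots 0..k-1 versus k..n-1) and return the cheapest one."""
    n = len(weights)
    costs = {k: ncut_value(weights, set(range(k))) for k in range(1, n)}
    best = min(costs, key=costs.get)
    return best, costs[best]

# Hypothetical symmetric shot similarity graph for five shots:
# shots 0-2 resemble each other, as do shots 3-4.
ssg = [
    [0.0, 0.9, 0.8, 0.1, 0.1],
    [0.9, 0.0, 0.9, 0.1, 0.1],
    [0.8, 0.9, 0.0, 0.2, 0.1],
    [0.1, 0.1, 0.2, 0.0, 0.9],
    [0.1, 0.1, 0.1, 0.9, 0.0],
]
boundary, cost = best_temporal_split(ssg)
```

For this matrix the cheapest split falls between shot 2 and shot 3, giving two candidate scenes. Applying the same split recursively to each part would yield a finer partition.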

16.
RoleNet: Movie Analysis from the Perspective of Social Networks   Total citations: 1, self-citations: 0, citations by others: 1
With the idea of social network analysis, we propose a novel way to analyze movie videos from the perspective of social relationships rather than audiovisual features. To appropriately describe roles' relationships in movies, we devise a method to quantify relations and construct the roles' social network, called RoleNet. Based on RoleNet, we are able to perform semantic analysis that goes beyond conventional feature-based approaches. In this work, social relations between roles are used as the context information of video scenes, and leading roles and the corresponding communities can be determined automatically. The results of community identification provide new alternatives in media management and browsing. Moreover, by describing video scenes with the roles' context, a social-relation-based story segmentation method is developed to pave a new way for this widely studied topic. Experimental results show the effectiveness of leading role determination and community identification. We also demonstrate that the social-based story segmentation approach works much better than the conventional tempo-based method. Finally, we give extensive discussions and state that the proposed ideas provide insights into context-based video analysis.

17.
18.
Pornographic video detection based on multimodal fusion is an effective approach for filtering pornography. However, existing methods lack an accurate representation of audio semantics and pay little attention to the characteristics of pornographic audio. In this paper, we propose a novel framework that fuses an audio vocabulary with visual features for pornographic video detection. The novelty of our approach lies in three aspects: an audio semantics representation method based on an energy envelope unit (EEU) and bag-of-words (BoW), a periodicity-based audio segmentation algorithm, and a periodicity-based video decision algorithm. The first, named the EEU+BoW representation method, describes the audio semantics via an audio vocabulary, which is constructed by k-means clustering of EEUs. The latter two complement each other to make full use of the periodicities in pornographic audio. Using the periodicity-based audio segmentation algorithm, audio streams are divided into EEU sequences. After these EEUs are classified, videos are judged to be pornographic or not by the periodicity-based video decision algorithm. Before fusion, two support vector machines are applied, one for the audio-vocabulary-based method and one for the visual-features-based method. To fuse their results, a keyframe is selected from each EEU according to its beginning and ending positions, and then an integrated weighted scheme and the periodicity-based video decision algorithm are adopted to yield the final detection results. Experimental results show that our approach outperforms the traditional approach based only on visual features and achieves satisfactory performance. The true positive rate reaches 94.44% while the false positive rate is 9.76%.

19.
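Audio transmission in a distributed distance education platform   Total citations: 1, self-citations: 0, citations by others: 1
倪敏 (Ni Min). 《计算机工程》 (Computer Engineering), 2004, 30(4): 42-43, 60
With the spread and development of networks, distance education will play an increasingly important role. Distance education refers to education in which courses are delivered to remote locations through computer technologies such as audio and video. Building on existing research at Southeast University on high-performance networks and CORBA systems, a distributed distance education platform was developed. The CORBA A/V stream service is analysed and, according to the requirements of audio applications in online education, an audio transmission control interface, AudioCtrl, conforming to the CORBA A/V stream specification is designed and implemented.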

20.
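Research on audio-video synchronization technology for chime-bell (bianzhong) music and dance   Total citations: 2, self-citations: 2, citations by others: 2
As the link between music and motion, audio-video synchronization plays a key role in the music-driven motion editing pipeline for chime-bell music and dance. Starting from audio parsing, the paper proposes a correspondence between musical passages in the audio and motion sequences, discusses how audio and video streams can be presented in synchrony in a real-time environment, and gives a concrete implementation of the audio-video synchronization model.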
