Similar Documents
20 similar documents found.
1.
Stitching motions from multiple videos into a single video scene is a challenging task in current video fusion and mosaicing research and in film production. In this paper, we present a novel method of video motion stitching based on the similarities of trajectory and position of foreground objects. First, multiple video sequences are registered in a common reference frame, in which we estimate the static and dynamic backgrounds; the former distinguishes the foreground from the background and the static region from the dynamic region, and the latter serves to mosaic the warped input video sequences into a panoramic video. Motion similarity is then calculated from trajectory and position similarity, and the corresponding motion parts are extracted from the multiple video sequences. Finally, using the corresponding motion parts, the foregrounds of the different videos and the dynamic backgrounds are fused into a single video scene through Poisson editing, with the motions involved being stitched together. Our major contributions are a framework for multiple-video mosaicing based on motion similarity and a method for calculating motion similarity from trajectory similarity and position similarity. Experiments on everyday videos show that the agreement of trajectory and position similarities with the real motion similarity plays a decisive role in determining whether two motions can be stitched. We obtain satisfactory results for motion stitching and video mosaicing.
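The abstract does not spell out the similarity formulas, but the idea of scoring motion similarity from trajectory shape plus position can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the function names, the distance-based normalization, and the weight `w` are this sketch's inventions, not the authors' definitions.

```python
import numpy as np

def trajectory_similarity(a, b):
    """Shape similarity of two equal-length trajectories of shape (T, 2),
    compared translation-invariantly via centered coordinates."""
    a_c = a - a.mean(axis=0)          # remove absolute position
    b_c = b - b.mean(axis=0)
    d = np.linalg.norm(a_c - b_c, axis=1).mean()
    return 1.0 / (1.0 + d)            # map mean distance into (0, 1]

def position_similarity(a, b, frame_diag):
    """Closeness of the trajectories' mean positions, normalized by the
    frame diagonal so the score is resolution-independent."""
    d = np.linalg.norm(a.mean(axis=0) - b.mean(axis=0))
    return 1.0 - min(d / frame_diag, 1.0)

def motion_similarity(a, b, frame_diag, w=0.5):
    # Weighted combination; the paper's actual weighting is not given here.
    return w * trajectory_similarity(a, b) \
        + (1 - w) * position_similarity(a, b, frame_diag)
```

Two motions whose combined score exceeds a chosen threshold would then be candidates for stitching.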

2.
3.
An approach is presented to detect faces and facial features in a video segment based on multiple cues, including gray-level distribution, color, motion, templates, algebraic features, and so on. Faces are first detected across the frames by using color segmentation, template matching, and an artificial neural network. A PCA-based (Principal Component Analysis) feature detector for still images is then used to detect facial features on each single frame until the resulting features of three adjacent frames, named base frames, are consistent with each other. The features of frames neighboring the base frames are first detected by the still-image feature detector, then verified and corrected according to the smoothness constraint and the planar surface motion constraint. Experiments have been performed on video segments captured under different environments, and the presented method proves robust and accurate over variable poses, ages, and illumination conditions.
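The smoothness constraint used to verify features across neighboring frames can be illustrated with a simple second-difference test. This is a hedged sketch only: the threshold, array layout, and function name are assumptions, and the paper's actual verification also involves the planar surface motion constraint, which is not modeled here.

```python
import numpy as np

def smoothness_check(tracks, max_accel=3.0):
    """Flag frames whose feature motion is not smooth: tracks is a
    (T, K, 2) array of K facial-feature positions over T frames, and a
    large second difference (acceleration, in pixels) marks a frame
    whose detections should be re-verified and corrected."""
    accel = np.abs(np.diff(tracks, n=2, axis=0))   # (T-2, K, 2)
    bad = accel.max(axis=(1, 2)) > max_accel
    return np.flatnonzero(bad) + 1                 # offending frame indices
```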

4.
This paper addresses the problem of detecting objectionable videos, which has not been carefully studied before. Our method can be used to efficiently filter objectionable videos on the Internet. A tensor-based key-frame selection algorithm, a cube-based color model, and an objectionable-video estimation algorithm are presented. Key-frame selection is based on motion analysis using the three-dimensional structure tensor. The cube-based color model is then employed to detect skin color in each key frame. Finally, the video estimation algorithm is applied to estimate the degree to which a video is objectionable. Experimental results on a variety of real-world videos downloaded from the Internet show that this method is promising.
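A cube-based color model can be read as a coarse 3D histogram over quantized RGB space, looked up per pixel. The sketch below illustrates that reading; the bin count, threshold, and training procedure are assumptions of this illustration, not the paper's parameters.

```python
import numpy as np

def build_skin_cube(skin_pixels, bins=16):
    """Quantize RGB space into bins^3 cubes and count training skin
    samples per cube; skin_pixels is an (N, 3) uint8 array."""
    idx = (skin_pixels // (256 // bins)).astype(int)
    cube = np.zeros((bins, bins, bins))
    np.add.at(cube, (idx[:, 0], idx[:, 1], idx[:, 2]), 1)
    return cube / cube.sum()          # probability mass per color cube

def skin_mask(image, cube, bins=16, thresh=1e-4):
    """Mark pixels whose color falls in a cube with enough skin mass;
    image is an (H, W, 3) uint8 key frame."""
    idx = (image // (256 // bins)).astype(int)
    prob = cube[idx[..., 0], idx[..., 1], idx[..., 2]]
    return prob > thresh
```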

5.
In order to recognize and track all of the heads exactly in top-view images, a novel approach to 3D feature extraction of heads based on target region matching is presented. The main idea starts from the disparity of the head region, which is generally extracted from a global dense disparity image obtained by a block matching method. Different from block matching, the correspondence search in target region matching is performed not in the regions around every pixel of the image but only in candidate head regions extracted in advance by monocular image processing. As the number of candidate head regions is far smaller than the resolution of the image, the computational complexity and time consumption can be greatly reduced. After the disparities of the candidate head regions are obtained, the 3D features of the head, including the height feature and the perspective feature, can be extracted to greatly improve the accuracy of head recognition.
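Matching one candidate region instead of every pixel can be sketched as a normalized cross-correlation search along the epipolar line. This minimal version assumes rectified grayscale stereo images and a bounding box per candidate region; the scoring function and search range are illustrative, not the paper's.

```python
import numpy as np

def region_disparity(left, right, box, max_disp=64):
    """Slide one candidate head region's patch horizontally over the
    other rectified view and keep the disparity with the best NCC score.
    box = (x, y, w, h) in the left image."""
    x, y, w, h = box
    patch = left[y:y + h, x:x + w].astype(float)
    patch -= patch.mean()
    best, best_d = -np.inf, 0
    for d in range(max_disp):
        if x - d < 0:
            break
        cand = right[y:y + h, x - d:x - d + w].astype(float)
        cand -= cand.mean()
        denom = np.linalg.norm(patch) * np.linalg.norm(cand)
        score = (patch * cand).sum() / denom if denom else -np.inf
        if score > best:
            best, best_d = score, d
    return best_d   # larger disparity = closer head = taller person
```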

6.
Yudian JI  Chuang GAN 《软件》2013,(1):152-154
Gender recognition has gained increasing attention. However, most studies have focused on face images acquired under controlled conditions. In this paper, we investigate gender recognition on real-life faces. We propose a gender recognition scheme composed of four parts: face detection, median filtering, feature extraction, and a gender classifier. MULBP features, which are robust to noise and illumination variations, are adopted and combined with an SVM classifier for gender recognition. In the experiments, we obtain 98.32% accuracy on the LFW database and 97.30% on the Samsung Gender dataset, which shows superior performance in gender recognition compared with conventional operators.
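The general LBP-plus-SVM pipeline can be sketched with standard libraries. Note the hedge: the function below computes multi-radius uniform LBP histograms as a stand-in, since the exact MULBP definition is not given in this abstract.

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

def multiscale_lbp_features(gray, radii=(1, 2, 3)):
    """Concatenated uniform-LBP histograms at several radii over a
    grayscale face crop; an approximation of the MULBP descriptor."""
    feats = []
    for r in radii:
        p = 8 * r                                   # neighbors per radius
        lbp = local_binary_pattern(gray, p, r, method="uniform")
        hist, _ = np.histogram(lbp, bins=p + 2, range=(0, p + 2),
                               density=True)
        feats.append(hist)
    return np.concatenate(feats)

# X: stacked feature vectors of training faces, y: 0/1 gender labels.
# clf = SVC(kernel="rbf").fit(X, y); clf.predict(X_test)
```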

7.
Owing to its enormous information content and complex structure, video semantic processing has long been a difficult problem. Current research is restricted to recognizing relatively simple semantics in certain domains. This paper puts forward a novel method that transitions from low-level features to high-level semantics in stages using Markov chains, with object semantics at its core. The method is valid for recognizing complex event semantics. A semantic concept mapping mechanism based on semantic templates is presented to realize the automatic recognition of video semantics. In experiments contrasting the method with IBM's IMAT, our method shows a wider recognition range and higher accuracy. The experimental results are encouraging and indicate that the proposed approach is effective.
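One way to read "staged transition with Markov chains" is to score an observed sequence of object-semantic labels against per-event chain models and pick the likeliest event. That is only a plausible reading of this abstract; the model structure, vocabulary, and scoring below are assumptions of the sketch.

```python
import numpy as np

def event_log_likelihood(obs, start, trans):
    """obs: sequence of object-label indices over time; start: initial
    label probabilities; trans: row-stochastic transition matrix."""
    ll = np.log(start[obs[0]])
    for a, b in zip(obs, obs[1:]):
        ll += np.log(trans[a, b])
    return ll

def recognize_event(obs, event_models):
    """event_models: {event_name: (start, trans)}; returns the event
    semantic whose Markov chain best explains the label sequence."""
    return max(event_models,
               key=lambda e: event_log_likelihood(obs, *event_models[e]))
```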

8.
Automatic content analysis of sports videos is a valuable and challenging task. Motivated by analogies between a class of sports videos and languages, the authors propose a novel approach to sports video analysis based on compiler principles. It integrates both semantic analysis and syntactic analysis to automatically create an index and a table of contents for a sports video. Each shot of the video sequence is first annotated and indexed with semantic labels through detection of events using domain knowledge. A grammar-based parser is then constructed to identify the tree structure of the video content based on the labels. Meanwhile, the grammar can be used to detect and recover from errors during the analysis. As a case study, a sports video parsing system is presented in the particular domain of diving. Experimental results indicate the proposed approach is effective.
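The parsing idea can be illustrated with a toy grammar over per-shot labels for diving. The label set, the production rule, and the skip-one-shot error recovery are this sketch's assumptions; the paper's grammar is richer.

```python
# Hypothetical production: a dive unit is takeoff -> flight -> entry,
# optionally followed by replays attached to the same unit.
DIVE = ["takeoff", "flight", "entry"]

def parse_video(labels):
    """Greedy parse of per-shot semantic labels into dive units with
    one-shot error recovery, yielding a simple table of contents."""
    toc, i = [], 0
    while i < len(labels):
        if labels[i:i + len(DIVE)] == DIVE:
            toc.append({"dive_starts_at_shot": i})
            i += len(DIVE)
            while i < len(labels) and labels[i] == "replay":
                i += 1                  # replays attach to the dive unit
        else:
            i += 1                      # error recovery: skip the shot
    return toc

print(parse_video(["takeoff", "flight", "entry", "replay",
                   "crowd", "takeoff", "flight", "entry"]))
```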

9.
In video information retrieval, key frame extraction has been recognized as one of the important research issues. Although much progress has been made, the existing approaches are either computationally expensive or ineffective in capturing salient visual content. In this paper, we first discuss the importance of key frame extraction and then briefly review and evaluate the existing approaches. To overcome their shortcomings, we introduce a new algorithm for key frame extraction based on unsupervised clustering, together with a feedback chain to adjust the granularity of the extraction result. The proposed algorithm is both computationally simple and able to capture the visual content. Its efficiency and effectiveness are validated on a large number of real-world videos.
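A minimal clustering-based key-frame extractor, assuming OpenCV and scikit-learn, might look like the following; the HSV histogram feature and k-means are common stand-ins, not necessarily the paper's exact clustering, and `k` plays the role of the granularity the feedback chain would adjust.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def extract_key_frames(path, k=8):
    """Cluster frames by HSV color histogram and keep, per cluster, the
    frame nearest the centroid; returns sorted key-frame indices."""
    cap, hists = cv2.VideoCapture(path), []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        h = cv2.calcHist([hsv], [0, 1], None, [16, 8], [0, 180, 0, 256])
        hists.append(cv2.normalize(h, None).flatten())
    cap.release()
    X = np.array(hists)
    km = KMeans(n_clusters=k, n_init=10).fit(X)
    keys = [int(np.argmin(np.linalg.norm(X - c, axis=1)))
            for c in km.cluster_centers_]
    return sorted(set(keys))
```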

10.
In recent years, many image-based rendering techniques have advanced from static to dynamic scenes and have thus become video-based rendering (VBR) methods. In practice, however, only a few of them can render new views online. We present a new VBR system that creates new views of a live dynamic scene. The system provides high-quality images and does not require any background subtraction. Our method follows a plane-sweep approach and reaches real-time rendering using consumer graphics hardware, the graphics processing unit (GPU). Only one computer is used for both acquisition and rendering. The video streams are acquired by at least 3 webcams, and we propose an additional video-stream management scheme that extends the number of webcams to 10 or more. These considerations make our system low-cost and hence accessible to everyone. We also present an adaptation of our plane-sweep method that creates multiple views of the scene simultaneously in real time. Our system is especially designed for stereovision using autostereoscopic displays: the new views are computed from 4 webcams connected to a computer and are compressed in order to be transferred to a mobile phone. Using GPU programming, our method provides up to 16 images of the scene in real time. The use of both GPU and CPU makes this method work on only one consumer-grade computer.
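The plane-sweep core can be sketched on the CPU as follows; the homography supplier `H_of`, the variance score, and the depth sampling are assumptions of this illustration, and the real system runs the equivalent loop in GPU fragment shaders to reach real time.

```python
import cv2
import numpy as np

def plane_sweep(ref, others, H_of, depths):
    """For each candidate depth d, warp every other view onto the
    reference with the plane-induced homography H_of(i, d) and score
    per-pixel color variance across views; each pixel keeps the depth
    of its lowest-variance plane, from which a new view is synthesized."""
    h, w = ref.shape[:2]
    best_err = np.full((h, w), np.inf)
    best_depth = np.zeros((h, w))
    for d in depths:
        warped = [cv2.warpPerspective(im, H_of(i, d), (w, h))
                  for i, im in enumerate(others)]
        stack = np.stack([ref] + warped).astype(float)
        err = stack.var(axis=0).mean(axis=-1)      # variance across views
        better = err < best_err
        best_err[better], best_depth[better] = err[better], d
    return best_depth
```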

11.
Video mosaics for virtual environments
As computer-based video becomes ubiquitous with the expansion of transmission, storage, and manipulation capabilities, it will offer a rich source of imagery for computer graphics applications. This article looks at one way to use video as a new source of high-resolution, photorealistic imagery for these applications. If you walked through an environment, such as a building interior, and filmed a video sequence of what you saw, you could subsequently register and composite the video images together into large mosaics of the scene. In this way, you can achieve an essentially unlimited resolution. Furthermore, since you can acquire the images using any optical technology, you can reconstruct any scene regardless of its range or scale. Video mosaics can be used in many different applications, including the creation of virtual reality environments, computer-game settings, and movie special effects. I present algorithms that align images and composite scenes of increasing complexity, beginning with simple planar scenes and progressing to panoramic scenes and, finally, to scenes with depth variation. I begin with a review of basic imaging equations and conclude with some novel applications of the virtual environments created using the algorithms presented.
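The planar-scene case reduces to estimating a homography between frames and compositing. The article itself develops direct alignment algorithms; the sketch below uses modern OpenCV feature matching as a stand-in for the registration step, with a crude fixed-size canvas assumed for the composite.

```python
import cv2
import numpy as np

def mosaic_pair(base, new):
    """Register `new` onto `base` with a planar homography (RANSAC over
    ORB matches) and composite them on a widened canvas."""
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(base, None)
    k2, d2 = orb.detectAndCompute(new, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d2, d1)
    src = np.float32([k2[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k1[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = base.shape[:2]
    canvas = cv2.warpPerspective(new, H, (w * 2, h))   # crude canvas size
    canvas[:h, :w] = np.where(base > 0, base, canvas[:h, :w])
    return canvas
```

Chaining this step over consecutive frames yields the growing planar mosaic; panoramic and depth-varying scenes need the article's more general motion models.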

12.
A variety of research is in progress on detecting desired scenes in videos. Methods that detect the scenes a user wants by rearranging scenes according to computed visual similarity are limited in that they do not reflect the storyline elements through which a user remembers a movie. A movie's story is built up by its characters, and that build-up is closely related to the characters' emotions in the film. A storyline-based movie browsing system can therefore be realized using characters and their emotions. Thus, methods are suggested for extracting the key characters in each scene and for clustering scenes by extracting emotion vectors from the dialogue in each scene. This paper also proposes a movie browsing method and system based on the emotions of characters.
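The clustering step can be sketched once each scene's dialogue is reduced to an emotion vector. The tiny lexicon, the averaging, and k-means below are all assumptions of this illustration; the paper's emotion extraction is not specified in the abstract.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical lexicon: word -> distribution over three basic emotions
# (joy, sadness, anger); a real system would use a full emotion lexicon.
LEXICON = {"happy": [1, 0, 0], "cry": [0, 1, 0], "fight": [0, 0, 1]}

def scene_emotion_vector(dialogue):
    """Average the emotion distributions of lexicon words appearing in
    one scene's dialogue."""
    hits = [LEXICON[w] for w in dialogue.lower().split() if w in LEXICON]
    return np.mean(hits, axis=0) if hits else np.zeros(3)

def cluster_scenes(dialogues, k=4):
    """Group scenes with similar emotional tone for storyline browsing."""
    X = np.array([scene_emotion_vector(d) for d in dialogues])
    return KMeans(n_clusters=k, n_init=10).fit_predict(X)
```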

13.
We introduce a novel, efficient technique for automatically transforming a generic renderable 3D scene into a simple graph representation named ExploreMaps, where nodes are nicely placed points of view, called probes, and arcs are smooth paths between neighboring probes. Each probe is associated with a panoramic image enriched with preferred viewing orientations, and each path with a panoramic video. Our GPU-accelerated unattended construction pipeline distributes probes so as to guarantee coverage of the scene while accounting for perceptual criteria, before finding smooth, good-looking paths between neighboring probes. Images and videos are precomputed at construction time with off-line photorealistic rendering engines, providing a convincing 3D visualization beyond the limits of current real-time graphics techniques. At run time, the graph is exploited both for creating automatic scene indexes and movie previews of complex scenes and for supporting interactive exploration through a low-DOF assisted navigation interface and the visual indexing of the scene provided by the selected viewpoints. Due to negligible CPU overhead and very limited use of GPU functionality, real-time performance is achieved in emerging web-based environments based on WebGL, even on low-powered mobile devices.
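Probe distribution with a coverage guarantee can be read as a greedy set-cover over candidate viewpoints. That reading, and everything in the sketch (the data layout, the target fraction, the absence of perceptual criteria), is an assumption of this illustration rather than the paper's pipeline.

```python
def place_probes(candidates, coverage, target=0.95):
    """Greedy probe selection: repeatedly pick the candidate viewpoint
    seeing the most still-uncovered scene patches until the target
    fraction of all patches is covered. coverage[c] is the set of patch
    ids visible from candidate viewpoint c."""
    total = set().union(*coverage.values())
    covered, probes = set(), []
    while len(covered) < target * len(total):
        best = max(candidates, key=lambda c: len(coverage[c] - covered))
        if not coverage[best] - covered:
            break                       # no candidate adds coverage
        probes.append(best)
        covered |= coverage[best]
    return probes
```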

14.
In recent decades, aerial images and videos have been widely used in urban planning, coastal surveillance, military missions, and other applications, so understanding the content of aerial images and identifying the scene types captured in aerial videos is extremely important. Most popular scene classification algorithms target natural scenes; few address scene classification for high-resolution aerial imagery. This paper presents a hierarchical algorithm for high-resolution aerial scene classification. The algorithm first extracts robust local patch features with the scale-invariant feature transform (SIFT). Then, on top of a bag of visual words, a deep belief network (DBN) initialized by restricted Boltzmann machines (RBM) models the relationship between the low-level features and the high-level scene features; the deep belief network also serves as the classifier. Experimental results show that the algorithm slightly outperforms current mainstream algorithms on high-resolution aerial scene classification.
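The SIFT-to-bag-of-visual-words front end of such a pipeline can be sketched as follows, assuming OpenCV and scikit-learn; the vocabulary size is illustrative, and the downstream DBN (which would consume these histograms) is omitted.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def bovw_histograms(images, vocab_size=200):
    """SIFT descriptors -> k-means visual vocabulary -> one normalized
    bag-of-visual-words histogram per grayscale image."""
    sift = cv2.SIFT_create()
    per_image = []
    for img in images:
        _, desc = sift.detectAndCompute(img, None)
        per_image.append(desc if desc is not None
                         else np.zeros((0, 128), np.float32))
    vocab = KMeans(n_clusters=vocab_size, n_init=10).fit(np.vstack(per_image))
    hists = []
    for desc in per_image:
        words = vocab.predict(desc) if len(desc) else []
        h, _ = np.histogram(words, bins=vocab_size, range=(0, vocab_size))
        hists.append(h / max(h.sum(), 1))
    return np.array(hists)   # input features for the DBN classifier
```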

15.
16.
Objective: Deepfake is an emerging technique that uses deep learning to manipulate images and videos, and the manipulation of face videos in particular poses a serious threat to society and individuals. Detection methods that exploit temporal or multi-frame information are still at an early research stage, and existing work often overlooks how frames are extracted from a video, which affects both the usefulness and the efficiency of detection. For face-swap videos, this paper proposes an efficient detection framework that extracts per-frame features from multiple key frames and models interactions between frames. Method: A fixed number of key frames are extracted directly from the video stream, avoiding inter-frame decoding; a convolutional neural network maps each frame's face image into a common feature space; multiple self-attention-based encoder layers with linear and nonlinear transformations let each frame's features aggregate information from the other frames for learning and updating, exposing the anomalies that manipulated frames exhibit in the feature space; and an additional indicator aggregates global information to make the final detection decision. Results: The framework achieves detection accuracies above 96.79% on all three face-swap datasets of FaceForensics++ and 99.61% on the Celeb-DF dataset. Comparative timing experiments also confirm that using key frames as samples improves detection efficiency and that the proposed framework is efficient. Conclusion: The proposed framework for detecting face-swap videos reduces the computational cost and time of video-level detection by extracting key frames, maps each frame's face image into a feature space with a convolutional neural network, and uses a self-attention-based inter-frame interaction mechanism so that frames can attend to one another and learn discriminative information, making detection more accurate and the overall pipeline more efficient.
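The architecture described (per-frame CNN features, self-attention across frames, an extra global "indicator" for the decision) maps naturally onto a CLS-token transformer encoder. The PyTorch sketch below is a hedged reconstruction: the backbone, layer counts, and dimensions are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class KeyFrameDetector(nn.Module):
    """Per-frame CNN features, self-attention across key frames, and a
    learnable indicator token (CLS-style) for the real/fake decision."""
    def __init__(self, dim=512, heads=8, layers=4):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()           # 512-d feature per face crop
        self.backbone = backbone
        self.indicator = nn.Parameter(torch.zeros(1, 1, dim))
        enc = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, layers)
        self.head = nn.Linear(dim, 2)         # real / fake logits

    def forward(self, frames):                # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        f = self.backbone(frames.flatten(0, 1)).view(b, t, -1)
        x = torch.cat([self.indicator.expand(b, -1, -1), f], dim=1)
        x = self.encoder(x)                   # frames attend to each other
        return self.head(x[:, 0])             # classify from the indicator
```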

17.
阮晓钢  张家辉  黄静  柴洁  武悦 《控制与决策》2021,36(9):2211-2217
Because the shooting position and scene of a UAV video change continuously and the environmental parameters change with them, dehazing methods designed for fixed scenes do not achieve the best results. To make UAV-video dehazing adaptive, a video dehazing algorithm based on receding-horizon particle swarm optimization is proposed. A receding-horizon scheduling strategy driven jointly by period and events is combined with particle swarm optimization (PSO) to tune the adjustable dehazing parameters on a rolling basis: whenever the number of frames since the last optimization exceeds a threshold, or the environment and scene change, PSO is restarted to reselect the best dehazing parameters. Experiments on UAV videos comparing the proposed algorithm with a fixed-parameter dehazing algorithm show that, for videos whose environmental conditions change dynamically, the proposed algorithm achieves better contrast and visual quality than fixed dehazing parameters.
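The inner PSO step can be sketched as a plain particle swarm over one adjustable dehazing parameter, scored by a contrast metric on the dehazed frame. The `dehaze` callable, the contrast objective, and the PSO coefficients are assumptions of this sketch; the rolling-horizon logic would simply rerun `pso_best_param` when the frame gap or a scene change triggers it.

```python
import numpy as np

def contrast(img):
    """Objective stand-in: higher gray-level standard deviation after
    dehazing is taken as better contrast."""
    return img.std()

def pso_best_param(dehaze, frame, n=20, iters=30, lo=0.5, hi=0.99):
    """Plain PSO over one dehazing parameter (e.g. a haze-retention
    weight); dehaze(frame, x) must return the dehazed frame."""
    rng = np.random.default_rng(0)
    x = rng.uniform(lo, hi, n)                 # particle positions
    v = np.zeros(n)
    p_best = x.copy()
    p_val = np.array([contrast(dehaze(frame, xi)) for xi in x])
    for _ in range(iters):
        g = p_best[p_val.argmax()]             # global best position
        v = 0.7 * v + 1.5 * rng.random(n) * (p_best - x) \
                    + 1.5 * rng.random(n) * (g - x)
        x = np.clip(x + v, lo, hi)
        val = np.array([contrast(dehaze(frame, xi)) for xi in x])
        better = val > p_val
        p_best[better], p_val[better] = x[better], val[better]
    return p_best[p_val.argmax()]
```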

18.
19.
Traditional flame recognition methods based on physical signals are easily disturbed by the external environment, and existing flame image feature extraction methods discriminate poorly between flames and scenes, so recognition accuracy drops when the flame type or scene changes. To address this problem, a fast flame recognition method based on local feature filtering and an extreme learning machine is proposed, introducing color-space information into the scale-invariant feature transform (SIFT) algorithm. First, video files are converted into frame images and SIFT feature descriptors are extracted from all images. Next, local noise keypoints are filtered out using the color-space characteristics of flames, and the descriptors are converted into feature vectors with a bag-of-keypoints (BOK) method. Finally, the vectors are fed into an extreme learning machine for training, quickly producing a flame recognition model. Experiments on public flame datasets and real fire-scene images show that the method achieves high recognition rates and fast detection across different scenes and flame types, with an accuracy above 97%; on a test set of 4301 images, recognition takes the model only 2.19 s. Compared with three alternatives (an SVM model based on information entropy, texture features, and flame spread rate; an SVM model based on SIFT and flame color-space characteristics; and a SIFT-based extreme learning machine model), the proposed method is superior in both test-set accuracy and model construction time.
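The extreme learning machine that makes training fast is a single random hidden layer whose output weights are solved in closed form. A minimal sketch, with hidden-layer size and activation as assumptions:

```python
import numpy as np

class ELM:
    """Minimal extreme learning machine: random input weights, tanh
    hidden layer, output weights solved by least squares, which is what
    makes model (re)construction fast."""
    def __init__(self, n_hidden=500, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y_onehot):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ y_onehot   # closed-form solve
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta        # argmax gives the class
```

Here `X` would hold the BOK feature vectors and `y_onehot` the one-hot flame/non-flame labels.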

20.
王萍  庞文浩 《计算机应用》2019,39(7):2081-2086
To address the low action recognition rate of the original two-stream (spatial-temporal) convolutional neural network (CNN) on long, complex videos, an action recognition method based on video segmentation with a two-stream CNN is proposed. A video is first divided into multiple equal-length, non-overlapping segments; from each segment, a frame image representing static appearance and a stack of optical-flow images representing motion are randomly sampled. These two kinds of images are fed into the spatial and temporal CNNs, respectively, for feature extraction, and the segment features in each stream are then fused to obtain the spatial and temporal class prediction features. Finally, the predictions of the two streams are combined to produce the video-level action recognition result. Experiments examine several data augmentation methods and transfer learning schemes to counter the overfitting caused by limited training samples, and analyze how the number of segments, the pre-trained network, the segment-feature fusion scheme, and the two-stream ensemble strategy affect recognition performance. The proposed model reaches 91.80% action recognition accuracy on the UCF101 dataset, 3.8 percentage points higher than the original two-stream model, and also improves on the original model on HMDB51, reaching 61.39%, which shows that the model better learns and represents human action features in long, complex videos.
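The segment sampling and fusion steps can be sketched independently of the CNNs; mean fusion within each stream and a weighted sum across streams are one of the design choices the paper compares, and the weight below is an assumption.

```python
import numpy as np

def sample_segments(n_frames, n_seg=3):
    """Split a video into equal-length, non-overlapping segments and
    randomly pick one frame index per segment (the stacked optical-flow
    snippet would be sampled the same way)."""
    edges = np.linspace(0, n_frames, n_seg + 1, dtype=int)
    return [np.random.randint(a, b) for a, b in zip(edges, edges[1:])]

def two_stream_prediction(spatial_scores, temporal_scores, w=0.5):
    """Fuse per-segment class scores within each stream by averaging,
    then combine streams with a weighted sum; inputs are (n_seg, C)."""
    spatial = np.mean(spatial_scores, axis=0)
    temporal = np.mean(temporal_scores, axis=0)
    return w * spatial + (1 - w) * temporal   # argmax -> action label
```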
