1.
Philippe Muller 《Computational Intelligence》2002,18(3):420-450
We present here a theory of motion from a topological point of view, in a symbolic perspective. Taking space–time histories of objects as primitive entities, we introduce temporal and topological relations on the thus defined space–time to characterize classes of spatial changes. The theory thus accounts for qualitative spatial information, dealing with underspecified, symbolic information when accurate data are not available or unnecessary. We show that these structures give a basis for commonsense spatio–temporal reasoning by presenting a number of significant deductions in the theory. This can serve as a formal basis for languages describing motion events in a qualitative way.
2.
Gwo Giun Lee Ming-Jiun Wang He-Yuan Lin Drew Wei-Chi Su Bo-Yun Lin 《Multimedia, IEEE Transactions on》2007,9(3):455-465
This paper presents a new spatio-temporal motion estimation algorithm and its VLSI architecture for video coding, based on an algorithm and architecture co-design methodology. The algorithm combines new strategies for spatio-temporal motion vector prediction, a modified one-at-a-time search scheme, and multiple update paths derived from optimization theory. The hardware specification targets high-definition video coding. We applied the ME algorithm to the H.264 reference software. Our algorithm surpasses recently published research and achieves performance close to full search. The VLSI implementation demonstrates the low cost of our algorithm. The algorithm and architecture co-design concept is emphasized throughout the paper, and we provide quantitative examples to show why such co-design is necessary.
3.
Lathoud G. Odobez J.-M. 《IEEE transactions on audio, speech, and language processing》2007,15(5):1696-1710
Distant microphones make it possible to process spontaneous multiparty speech with very few constraints on speakers, as opposed to close-talking microphones. Minimizing the constraints on speakers permits a large diversity of applications, including meeting summarization and browsing, surveillance, hearing aids, and more natural human-machine interaction. Such applications of distant microphones require determining where and when the speakers are talking. This is inherently a multisource problem, because of background noise sources as well as the natural tendency of multiple speakers to talk over each other. Moreover, spontaneous speech utterances are highly discontinuous, which makes it difficult to track the multiple speakers with classical filtering approaches, such as Kalman filtering or particle filters. As an alternative, this paper proposes a probabilistic framework to determine the trajectories of multiple moving speakers in the short term only, i.e., only while they speak. Instantaneous location estimates that are close in space and time are grouped into "short-term clusters" in a principled manner. Each short-term cluster determines the precise start and end times of an utterance and a short-term spatial trajectory. Contrastive experiments clearly show the benefit of using short-term clustering, on real indoor recordings with seated speakers in meetings, as well as multiple moving speakers.
4.
Douglas S.C. Gupta M. Sawada H. Makino S. 《IEEE transactions on audio, speech, and language processing》2007,15(5):1511-1520
This paper derives two spatio-temporal extensions of the well-known FastICA algorithm of Hyvärinen and Oja that are applicable to the convolutive blind source separation task. Our time-domain algorithms combine multichannel spatio-temporal prewhitening via multistage least-squares linear prediction with novel adaptive procedures that impose paraunitary constraints on the multichannel separation filter. The techniques converge quickly to a separation solution without any step size selection or divergence difficulties, and unlike other methods, ours do not require special coefficient initialization procedures to obtain good separation performance. They also allow for the efficient reconstruction of individual signals as observed in the sensor measurements directly from the system parameters for single-input multiple-output blind source separation tasks. An analysis of one of the adaptive constraint procedures shows its fast convergence to a paraunitary filter bank solution. Numerical evaluations of the proposed algorithms and comparisons with several existing convolutive blind source separation techniques indicate the excellent relative performance of the proposed methods.
5.
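To address the detection of similar video content on the Internet, a robust video hashing algorithm based on short-term spatial variation is proposed. Feature extraction and feature quantization are its two key steps. In feature extraction, unlike existing methods based on fusing spatio-temporal information, the novelty of the algorithm is that it fully exploits the short-term variation of local spatial information between adjacent frames (abbreviated "short-term spatial variation") to extract features. The algorithm first constructs the inscribed sphere of the video and partitions it outward from the center of the sphere to obtain a series of inscribed spherical rings, thereby capturing the short-term variation of spatial information between adjacent frames; it then uses the non-negative matrix factorization coefficients of the rings as the feature representation of the video content. In feature quantization, the algorithm adopts an improved Manhattan quantization strategy to map the video features to a binary hash sequence, which better preserves the neighborhood relations of the original space and improves quantization accuracy. Experimental results show that the algorithm performs well.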
6.
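Image perceptual hashing is a relatively new technique. Feature extraction is the key step in hash extraction, and traditional DCT-based perceptual hashing resists geometric attacks poorly. This work attempts to remove the effect of geometric distortion on the image before the hash is generated: the image is first normalized so that it becomes geometrically invariant, DCT feature coefficients are then extracted, and the final hash is produced by quantization and encoding. The algorithm can resist arbitrary affine transformations.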
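The DCT-coefficient extraction and quantization steps mentioned in this abstract can be illustrated roughly as follows. This is only a minimal sketch, not the authors' algorithm: it assumes a grayscale image already resized to a small 2-D array, it omits the geometric normalization step entirely, and the median-threshold quantization is a common perceptual-hash convention that the abstract does not confirm. The names `dct_matrix`, `dct_hash`, and `hamming` are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)   # frequency index (rows)
    i = np.arange(n).reshape(1, -1)   # sample index (columns)
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] = np.sqrt(1.0 / n)        # the DC row uses a different scale
    return m

def dct_hash(gray, keep=8):
    """Hash a 2-D grayscale array into keep*keep bits (assumed scheme)."""
    gray = np.asarray(gray, dtype=np.float64)
    rows, cols = gray.shape
    # Separable 2-D DCT: transform the rows, then the columns.
    coeffs = dct_matrix(rows) @ gray @ dct_matrix(cols).T
    low = coeffs[:keep, :keep].ravel()      # low-frequency block
    # Threshold at the median of the AC terms so the DC term
    # (overall brightness) does not set the threshold.
    threshold = np.median(low[1:])
    return (low > threshold).astype(np.uint8)

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return int(np.count_nonzero(a != b))
```

Two images are then compared by the Hamming distance between their hashes; a uniform brightness shift changes only the DC coefficient, so the hash is largely insensitive to it. Resistance to affine transformations would have to come from the normalization step, which this sketch leaves out.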
7.
8.
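Video summarization is an essential task in computer vision. Its goal is to produce a concise yet complete summary of a video by selecting its most informative parts. The summary is usually a set of representative frames (such as key frames) or a shorter video formed by stitching key segments together in temporal order. Although research on video summarization has made considerable progress, existing methods lack temporal information and have incomplete feature representations, which easily harms the correctness and completeness of the summary. To address this, the paper proposes a spatio-temporal transformation network model with three modules: an embedding layer, a feature transformation and fusion layer, and an output layer. The embedding layer embeds spatial and temporal features simultaneously; the feature transformation and fusion layer transforms and fuses multimodal features; and the output layer generates the summary through segment prediction and key-shot selection. Embedding spatial and temporal features separately compensates for the weak representation of temporal information in existing models, and transforming and fusing multimodal features addresses the incomplete feature representation. Extensive experiments and analysis on two benchmark datasets verify the effectiveness of the model.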
9.
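Targeting the image data structure specific to the wavelet transform domain, several improvement strategies for video coding algorithms are proposed. Experiments show that the improved coding methods help raise the image compression ratio and improve visual quality.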
10.
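A video copy detection scheme based on robust hashing is proposed. Feature points are classified, stable points that persist in the spatio-temporal domain are selected, and local features are constructed by differential computation over neighboring points. The multidimensional feature data are Hilbert-encoded, and the significant bits are selected as the detection hash code. To locate suspicious content accurately in the target video, a hash matching scheme is proposed that uses sequence similarity as the matching criterion, which improves matching precision. Experimental results show that the scheme has good detection performance and is suitable for copy detection of video content.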
11.
Hashing is a common solution for content-based multimedia retrieval that encodes high-dimensional feature vectors into short binary codes. Previous work focuses mainly on the image hashing problem. However, these methods cannot be used directly for video hashing, because videos contain not only spatial structure within each frame but also temporal correlation between successive frames. Several researchers proposed to handle this by encoding extracted key frames, but these frame-based methods are time-consuming in real applications. Other researchers proposed to characterize the video by averaging the spatial features of its frames, so that existing hashing methods can be adopted. Unfortunately, this sort of "video" feature does not take the correlation between frames into consideration and may lose the temporal information. Therefore, in this paper, we propose a novel unsupervised video hashing framework based on deep neural networks, which performs video hashing by incorporating the temporal structure as well as the conventional spatial structure. Specifically, the spatial features of videos are obtained with a convolutional neural network, and the temporal features are established via long short-term memory. After that, a time series pooling strategy is employed to obtain a single feature vector for each video. The resulting spatio-temporal feature can be applied to many existing unsupervised hashing methods. Experimental results on two real datasets indicate that, by employing the spatio-temporal features, our hashing method significantly improves the performance of existing methods that use only the spatial features, and obtains higher mean average precision than state-of-the-art video hashing methods.
12.
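Designing a Video Encryption Transform Filter Based on DirectShow. Total citations: 4 (self-citations: 0; citations by others: 4)
A video encryption Transform Filter is developed using DirectShow technology. The paper first introduces the principle and scheme of video encryption and analyzes the architecture of DirectShow, then studies how to implement a video encryption Transform Filter based on DirectShow, and gives experimental results and conclusions. The experiments show that the filter can encrypt real-time video quickly and effectively.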
13.
14.
15.
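A video compression coding method based on the Surfacelet transform combined with the SPIHT algorithm is proposed. The method treats the video signal as a special three-dimensional signal and processes its spatial and temporal dimensions as a whole. The Surfacelet transform offers multidirectional decomposition, anisotropy, an efficient tree-structured filter bank, perfect reconstruction, and low redundancy. The SPIHT algorithm offers scalability in resolution and quality as well as progressive transmission. The method exploits the inter-layer correlation of the Surfacelet decomposition coefficients and the concentration of image energy, and combines them with SPIHT to compress and encode the video data. It can compensate for the shortcomings of the three-dimensional wavelet transform and achieves higher PSNR and better visual quality; it is especially suited to video with high texture complexity and small motion.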
16.
17.
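To meet the needs of digital video copyright protection and information hiding, this paper proposes a video watermarking algorithm based on the inter-frame wavelet transform. The algorithm first divides the video sequence into groups of equal length and then applies a one-dimensional inter-frame wavelet transform with the maximum number of levels to the frames in each group, yielding one low-frequency image and several high-frequency images; the low-frequency image concentrates most of the energy of the sequence, while the high-frequency images carry relatively little. Singular value decomposition is applied to the low-frequency image, and the watermark image is embedded in the singular value domain of the low-frequency image. Embedding and extraction apply the wavelet transform to only 4 video frames, so the memory requirement is small and hardware implementation is easy. Experimental results show that the proposed algorithm has good imperceptibility and robustness.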
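The pipeline outlined in this abstract (inter-frame wavelet transform over a group of 4 frames, SVD of the low-frequency image, embedding in the singular values) might look roughly like the sketch below. Several details are assumptions rather than the paper's method: the Haar wavelet is used because the abstract does not name one, the watermark is a short bit vector added to the leading singular values with a hypothetical strength `alpha`, and extraction is non-blind (it reuses the original SVD factors as a key). All function names are illustrative.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_forward(frames):
    """Two-level temporal Haar transform of a group of 4 frames."""
    f0, f1, f2, f3 = frames
    a0, d0 = (f0 + f1) / SQRT2, (f0 - f1) / SQRT2
    a1, d1 = (f2 + f3) / SQRT2, (f2 - f3) / SQRT2
    low, high = (a0 + a1) / SQRT2, (a0 - a1) / SQRT2
    return low, (high, d0, d1)   # one low-frequency image, three high

def haar_inverse(low, highs):
    """Invert haar_forward and return the 4 frames as one array."""
    high, d0, d1 = highs
    a0, a1 = (low + high) / SQRT2, (low - high) / SQRT2
    return np.stack([(a0 + d0) / SQRT2, (a0 - d0) / SQRT2,
                     (a1 + d1) / SQRT2, (a1 - d1) / SQRT2])

def embed(frames, bits, alpha=5.0):
    """Add alpha * bits to the leading singular values (assumed rule)."""
    low, highs = haar_forward(frames)
    u, s, vt = np.linalg.svd(low, full_matrices=False)
    s_marked = s.copy()
    s_marked[:len(bits)] += alpha * np.asarray(bits, dtype=np.float64)
    low_marked = (u * s_marked) @ vt          # u @ diag(s_marked) @ vt
    return haar_inverse(low_marked, highs), (u, s, vt)

def extract(marked_frames, key, n_bits, alpha=5.0):
    """Non-blind extraction using the original SVD factors as the key."""
    u, s, vt = key
    low_marked, _ = haar_forward(marked_frames)
    s_est = np.diag(u.T @ low_marked @ vt.T)  # recovers s_marked
    return ((s_est - s)[:n_bits] / alpha > 0.5).astype(int)
```

Because only the low-frequency image is modified, the change is spread evenly over the 4 frames of the group after the inverse transform. A real scheme would also need to handle attacks and quantization to 8-bit pixels, which this sketch ignores.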
18.
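A new method for video retrieval of human motion poses is proposed. The algorithm has two stages: typical pose learning and pose retrieval. First, the spatio-temporal motion feature points of the human poses in the sample library are extracted as low-level motion features, with each pose corresponding to a set of spatio-temporal feature points; the gradients of the pixels in the spatio-temporal three-dimensional neighborhood of each feature point are computed, and a gradient histogram is built for each pose. Second, the pose samples are grouped by unsupervised clustering, and several typical poses are extracted according to semantic requirements. Finally, the clustering results are modeled with an EM-based Gaussian mixture model to form a classifier for typical pose retrieval, which completes the learning stage. Retrieval matches poses in the input test video according to the maximum-probability matching criterion, achieving semantics-based pose retrieval. Extensive experiments on the Weizmann and KTH standard test video datasets show that the proposed method retrieves human motion poses accurately and effectively.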
19.
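A new multiple description coding method for video is proposed, based on the balanced multiwavelet image transform. The image is first multiwavelet-transformed; the multiwavelet coefficients are then regrouped according to which component of the subband images they belong to, and the coefficients of the same component form one description of the image. Processing every frame of the video in this way yields the multiple description coding of the video. A good video multiple description coding scheme should satisfy two basic conditions: the descriptions share the image information equally, and the descriptions can conceal each other's errors. Statistical analysis and mathematical derivation show that, among the multiwavelets in common use, only balanced multiwavelets satisfy both conditions. The paper gives the mathematical formulas for error concealment between descriptions and, based on them, proposes a video multiple description coding algorithm. Experimental results show that even when three quarters of the data are lost, the algorithm can still recover the original image with a PSNR close to 30 dB and good visual quality.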