Similar Literature
20 similar documents retrieved.
1.
Objective: Temporal action detection, a popular topic in computer vision, aims to detect the intervals in which actions occur in a video and to determine the action categories, a task of considerable practical significance. Quickly localizing actions in long videos, however, remains challenging. This paper therefore focuses on locating and refining the candidate temporal regions in which actions occur, and proposes TPO (temporal proposal optimization), a temporal action detection method based on temporal proposal optimization. Method: A convolutional neural network (CNN) and a bidirectional long short-term memory network (BLSTM) capture the local temporal correlations and the global temporal information of the video; a connectionist temporal classification (CTC) optimization is introduced to estimate a boundary probability and an actionness probability at every temporal position; finally, the two probability curves are fused to refine and rank the temporal proposals, yielding the final temporal action detections. Results: On the ActivityNet v1.3 dataset, TPO reaches 74.66 AR@100 (average recall at 100 proposals), 66.32 AUC (area under the curve), and 30.5 average mAP (mean average precision); mAP@IoU (mAP at a given intersection-over-union threshold) reaches 30.73 and 8.22 at thresholds of 0.75 and 0.95, respectively. Compared with SSN (structured segment network), TCN (temporal context network), Prop-SSAD (single shot action detector for proposal), CTAP (complementary temporal action proposal), and BSN (boundary sensitive network), TPO improves on every metric. Conclusion: The proposed model exploits both the global and the local temporal information of the video, which makes the predicted proposal boundaries more accurate and flexible, and confirms that more accurate proposals effectively improve the precision of temporal action detection.
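As a rough illustration of the fusion step described in this abstract (not the authors' implementation), the sketch below pairs high-scoring start and end positions from boundary-probability curves and ranks candidate intervals by combining boundary confidence with mean actionness inside the interval; the thresholds and the geometric-mean fusion rule are assumptions made for the example.

```python
import numpy as np

def proposals_from_curves(start_prob, end_prob, actionness, thr=0.5):
    """Pair high-probability start/end positions and score each candidate
    interval by fusing boundary confidence with mean actionness inside it."""
    starts = np.where(start_prob >= thr * start_prob.max())[0]
    ends = np.where(end_prob >= thr * end_prob.max())[0]
    proposals = []
    for s in starts:
        for e in ends:
            if e <= s:
                continue
            boundary_score = start_prob[s] * end_prob[e]
            inside_score = actionness[s:e + 1].mean()
            proposals.append((int(s), int(e), np.sqrt(boundary_score * inside_score)))
    # rank candidates by the fused score, best first
    return sorted(proposals, key=lambda p: p[2], reverse=True)

# toy example: 100 temporal positions with random probability curves
rng = np.random.default_rng(0)
start_prob, end_prob, actionness = rng.random((3, 100))
print(proposals_from_curves(start_prob, end_prob, actionness)[:5])
```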

2.
Li Tianyu, Bing Bing, Wu Xinxiao. Multimedia Tools and Applications, 2021, 80(2): 2123-2139

Temporal action proposal generation for temporal action localization aims to capture temporal intervals that are likely to contain actions in untrimmed videos. Prevailing bottom-up proposal generation methods locate action boundaries (the start and the end) at positions with high classification probabilities. But for many actions, motions at the boundaries are not discriminative, so both action segments and background segments get classified into the boundary classes, producing low-overlap proposals. In this work, we propose a novel method that generates proposals by evaluating the continuity of video frames and then locating the start and the end at positions of low continuity. Our method consists of two modules: boundary discrimination and proposal evaluation. The boundary discrimination module trains a model to understand the relationship between two frames and uses the continuity of frames to generate proposals. The proposal evaluation module removes background proposals via a classification network and evaluates the integrity of proposals from probability features with an integrity network. Extensive experiments on two challenging datasets, THUMOS14 and ActivityNet 1.3, demonstrate that our method outperforms state-of-the-art proposal generation methods.
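A minimal sketch of the continuity idea, under the assumption that a trained pairwise model has already produced one continuity score per pair of consecutive frames; the threshold value is illustrative.

```python
import numpy as np

def boundaries_from_continuity(continuity, low=0.3):
    """continuity[t] ~ how likely frame t and frame t+1 belong to the same
    segment; positions with low continuity are taken as candidate boundaries."""
    cand = np.where(continuity < low)[0] + 1            # boundary falls after frame t
    cuts = np.concatenate(([0], cand, [len(continuity) + 1]))
    # every pair of successive cuts delimits one candidate proposal
    return [(int(s), int(e)) for s, e in zip(cuts[:-1], cuts[1:]) if e - s > 1]

cont = np.array([0.9, 0.95, 0.2, 0.9, 0.85, 0.1, 0.9])  # scores for an 8-frame toy video
print(boundaries_from_continuity(cont))                 # [(0, 3), (3, 6), (6, 8)]
```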


3.
Temporal localization is crucial for action video recognition. Since manual annotation of videos is expensive and time-consuming, temporal localization with only weak video-level labels is challenging but indispensable. In this paper, we propose a weakly-supervised temporal action localization approach for untrimmed videos. To address this issue, we train the model with proxies of each action class; the proxies are used to measure the distances between action segments and the original action features. A proxy-based metric clusters segments of the same action together and separates actions from the background. Compared with state-of-the-art methods, our method achieves competitive results on the THUMOS14 and ActivityNet1.2 datasets.
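The abstract does not give the loss formula; below is a hedged Proxy-NCA-style sketch of what a proxy-based metric objective can look like. The feature dimensions, class count, and softmax form are chosen purely for illustration and are not taken from the paper.

```python
import numpy as np

def proxy_loss(segment_feats, labels, proxies):
    """Proxy-NCA-style objective: pull each segment feature toward the proxy
    of its (video-level) class and away from all other class proxies."""
    f = segment_feats / np.linalg.norm(segment_feats, axis=1, keepdims=True)
    p = proxies / np.linalg.norm(proxies, axis=1, keepdims=True)
    sim = f @ p.T                                   # cosine similarity to every proxy
    logits = np.exp(sim)
    prob = logits[np.arange(len(labels)), labels] / logits.sum(axis=1)
    return -np.log(prob).mean()

rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 128))                  # 16 action segments
proxies = rng.normal(size=(5, 128))                 # one learnable proxy per class
labels = rng.integers(0, 5, size=16)
print(proxy_loss(feats, labels=labels, proxies=proxies))
```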

4.
Temporal action detection aims to find, in a long untrimmed video, the start and end times and the categories of the action instances it contains. For this task, an action detection model based on a two-stream convolutional neural network is proposed. The two-stream network first extracts a feature sequence from the video, and TAG (temporal actionness grouping) is then used to generate action proposals. To obtain high-quality proposals, they are fed into a boundary regression network that refines their boundaries so that they fit the ground truth more closely; each proposal is then expanded into a three-segment feature design that carries contextual information, and finally a multilayer perceptron classifies the actions. Experimental results show that the algorithm achieves good detection accuracy on the THUMOS 2014 and ActivityNet v1.3 datasets.
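A toy sketch of actionness grouping in the spirit of TAG: consecutive snippets with high actionness are merged into one proposal while short low-score gaps are tolerated. The foreground threshold and gap tolerance are assumed values, and the actual TAG scheme sweeps multiple thresholds (watershed-style) rather than using a single one.

```python
import numpy as np

def temporal_actionness_grouping(actionness, fg_thr=0.7, tolerance=2):
    """Group consecutive snippets whose actionness exceeds fg_thr into one
    proposal, tolerating gaps of up to `tolerance` low-score snippets."""
    proposals, start, gap = [], None, 0
    for t, a in enumerate(actionness):
        if a >= fg_thr:
            if start is None:
                start = t
            gap = 0
        elif start is not None:
            gap += 1
            if gap > tolerance:                      # gap too long: close the group
                proposals.append((start, t - gap))
                start, gap = None, 0
    if start is not None:
        proposals.append((start, len(actionness) - 1 - gap))
    return proposals

scores = np.array([0.1, 0.8, 0.9, 0.2, 0.85, 0.9, 0.1, 0.1, 0.1, 0.8])
print(temporal_actionness_grouping(scores))          # [(1, 5), (9, 9)]
```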

5.
In human activity localization and recognition in video, existing temporal action proposal methods handle the long-range dependencies of action features poorly, which leads to low proposal recall. To address this problem, a temporal action proposal method that fuses contextual information is proposed. The method first extracts spatio-temporal features of video units with a 3D convolutional network, and then builds contextual relations with a bidirectional gated recurrent network to predict temporal action intervals. To address the issue that the gated recurrent unit (GRU) has a large number of parameters and gradient…

6.
Dense video captioning, which automatically generates sequences of textual descriptions from video content, combines computer vision and natural language processing. Existing dense captioning methods mostly emphasize the visual and motion information in a video while ignoring its audio, and they attend to local event information or simple event-level context while ignoring the temporal structure and semantic relations between events. This paper therefore proposes a dense video captioning method based on multimodal features. In the event proposal generation stage, the method uses Timeception layers as the basic module to better accommodate the diverse temporal extents of action segments; audio features are used in both the proposal generation and the caption generation stages to strengthen the results; finally, a temporal-semantic relation module models the temporal structure and semantic information between events to further improve captioning accuracy. In addition, a dense video captioning dataset of learning scenes, SDVC, is constructed to investigate the effectiveness of the proposed method in real educational applications. Experiments on the ActivityNet Captions and SDVC datasets show that the AUC of event proposal generation improves by 0.8% and 6.7%, respectively; with ground-truth proposals, BLEU_3 improves by 1.4% and 4.7% and BLEU_4 by 0.9% and 5.3%; with generated proposals, BLEU_3 and BLEU_4 on SDVC improve by 2.3% and 2.2%.

7.
尹丽华, 康亮, 朱文华. 《计算机应用》, 2022, 42(8): 2564-2570
To remove the interference of complex moving foregrounds with video stabilization accuracy, and to exploit the particular strength of spatio-temporal saliency in moving-object detection, a high-accuracy video stabilization algorithm that incorporates spatio-temporal saliency is proposed. On one hand, the algorithm identifies and removes moving objects through spatio-temporal saliency detection; on the other hand, it performs motion compensation with multi-grid motion paths. The pipeline consists of SURF feature-point extraction and matching, spatio-temporal salient object detection, grid partitioning and motion-vector computation, motion-trajectory generation, multi-path smoothing, and motion compensation. Experimental results show that the proposed algorithm clearly outperforms traditional stabilization algorithms on the stability metric: for videos with large-scale moving-foreground interference it improves stability by about 9.6% over RTVSM (robust traffic video stabilization method assisted by foreground feature trajectories), and for videos with multiple moving foregrounds it improves stability by about 5.8% over the Bundled-paths algorithm, which demonstrates its advantage in complex scenes.
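For the multi-path smoothing step, here is a minimal sketch (an assumed moving-average smoother, not the paper's method) that smooths each grid cell's accumulated motion path and returns the per-frame correction used for motion compensation.

```python
import numpy as np

def smooth_paths(paths, radius=15):
    """paths: (n_grids, n_frames, 2) accumulated x/y motion of each grid cell.
    Returns per-frame correction = smoothed path - original path."""
    n_grids, n_frames, _ = paths.shape
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    smoothed = np.empty_like(paths)
    for g in range(n_grids):
        for d in range(2):
            padded = np.pad(paths[g, :, d], radius, mode="edge")
            smoothed[g, :, d] = np.convolve(padded, kernel, mode="valid")
    return smoothed - paths                          # warp each frame by this offset

rng = np.random.default_rng(0)
jittery = np.cumsum(rng.normal(size=(4, 200, 2)), axis=1)  # 4 grid cells, 200 frames
print(smooth_paths(jittery).shape)                         # (4, 200, 2)
```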

8.
9.
In recent years, deep learning has shown excellent performance in artificial intelligence. Deep-learning-based face generation and manipulation techniques can now synthesize realistic forged face videos, also known as deepfakes, that are hard for the human eye to distinguish from real ones. Such forged face videos may pose serious threats to society, for example when used to fabricate political fake news that incites political violence or interferes with elections; detection methods that proactively discover forged face videos are therefore urgently needed. Existing forgery pipelines tend to leave subtle spatial and temporal artifacts, such as distortions of texture and color or flickering of the face. Mainstream detection methods also rely on deep learning and fall into two categories: frame-based methods, which use a convolutional neural network (CNN) to find spatial artifacts within individual frames, and clip-based methods, which additionally use a recurrent neural network (RNN) to capture temporal artifacts across frames. Both make decisions from global image information, whereas forgery artifacts usually reside in local facial regions. This paper therefore proposes a unified forged face video detection framework that exploits global temporal features and local spatial features; it consists of an image feature extraction module, a global temporal feature classification module, and a local spatial feature classification module. Experimental results on the FaceForensics++ dataset show that the proposed method detects forgeries better than previous methods.

10.
This article introduces a temporal deductive database system featuring a logic programming language and an algebraic front-end. The language, called Temporal DATALOG, is an extension of DATALOG based on a linear-time temporal logic in which the flow of time is modeled by the set of natural numbers. Programs of Temporal DATALOG are considered as temporal deductive databases, specifying temporal relationships among data and providing base relations to the algebraic front-end. The minimum model of a given Temporal DATALOG program is regarded as the temporal database the program models intensionally. The algebraic front-end, called TRA, is a point-wise extension of the relational algebra upon the set of natural numbers. When needed during the evaluation of TRA expressions, slices of temporal relations over intervals can be retrieved from a given temporal deductive database by bottom-up evaluation strategies.
A modular extension of Temporal DATALOG is also proposed, through which temporal relations created during the evaluation of TRA expressions may be fed back to the deductive part for further manipulation. Modules therefore enable the algebra to have full access to the deductive capabilities of Temporal DATALOG and to extend it with nonstandard algebraic operators. This article also shows that the temporal operators of TRA can be simulated in Temporal DATALOG by program clauses.

11.
Temporal Action Localization (TAL) aims to predict both the action category and the temporal boundary (start and end time) of action instances in untrimmed videos. Existing works usually adopt fully-supervised solutions, but a practical bottleneck is the large amount of labeled training data they require. To reduce the expensive labeling cost, this paper focuses on a rarely investigated yet practical task, semi-supervised TAL, and proposes an effective active learning method named AL-STAL. We iterate over four steps, Train, Query, Annotate, Append, to actively select video samples with high informativeness and train the localization model. AL-STAL is equipped with two scoring functions that consider the uncertainty of the localization model and thus drive sample ranking and selection. One takes the entropy of the predicted label distribution as the measure of uncertainty, named Temporal Proposal Entropy (TPE); the other introduces a new metric based on the mutual information between adjacent action proposals, named Temporal Context Inconsistency (TCI). To validate the effectiveness of the proposed method, we conduct extensive experiments on three benchmark datasets, THUMOS'14, ActivityNet 1.3 and ActivityNet 1.2. The results show that AL-STAL outperforms existing competitors and achieves satisfying performance compared with fully-supervised learning.
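A hedged sketch of TPE-style scoring: given per-proposal class distributions for each unlabeled video, videos are ranked by mean prediction entropy so the most uncertain ones are queried for annotation first. The data shapes and video names are placeholders, not the paper's setup.

```python
import numpy as np

def temporal_proposal_entropy(class_probs):
    """class_probs: (n_proposals, n_classes) softmax outputs for one video.
    Returns the mean entropy of the proposals' label distributions."""
    p = np.clip(class_probs, 1e-12, 1.0)
    entropy = -(p * np.log(p)).sum(axis=1)
    return entropy.mean()

rng = np.random.default_rng(0)
videos = {f"vid_{i}": rng.dirichlet(np.ones(20), size=30) for i in range(5)}
ranked = sorted(videos, key=lambda v: temporal_proposal_entropy(videos[v]), reverse=True)
print(ranked)  # most uncertain (most informative) videos first
```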

12.
Deep-learning-based action recognition algorithms often fail to deliver fast and accurate recognition in practice because of their complex network designs. To address this, a lightweight end-to-end two-stream neural network model based on temporal shift and patch-group attention fusion is proposed. In the RGB and optical-flow branches, the algorithm adopts a temporally sparse, grouped random sampling strategy for long-range temporal modeling, and uses a temporal shift module to displace part of the channels along the temporal dimension so that neighboring-frame information strengthens the temporal representation, and…
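The temporal shift operation mentioned above is, in its usual formulation, a zero-cost channel shift along time (TSM-style); the NumPy sketch below illustrates that standard formulation rather than this paper's exact code.

```python
import numpy as np

def temporal_shift(x, fold_div=8):
    """x: (n_frames, channels, h, w). Shift the first 1/fold_div of the channels
    one step back in time and the next 1/fold_div one step forward, so each
    frame mixes information from its neighbours at no extra computation."""
    t, c, h, w = x.shape
    fold = c // fold_div
    out = np.zeros_like(x)
    out[:-1, :fold] = x[1:, :fold]                  # these channels look one frame ahead
    out[1:, fold:2 * fold] = x[:-1, fold:2 * fold]  # these channels look one frame back
    out[:, 2 * fold:] = x[:, 2 * fold:]             # remaining channels unchanged
    return out

frames = np.random.rand(8, 64, 14, 14)              # 8 sampled frames of a clip
print(temporal_shift(frames).shape)                 # (8, 64, 14, 14)
```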

13.
Video steganalysis detects videos that carry hidden secret information and thus helps safeguard public security. Besides the spatial information within each frame, video also contains temporal information between adjacent frames. Exploiting this, a video steganalysis method that finely characterizes spatio-temporal features is proposed. The method models a video's feature quantities in the temporal and spatial dimensions in a fine-grained way: Markov modeling of the intra-block and inter-block processes at the spatial level yields the spatial features, and difference analysis of the changes at the temporal level yields the temporal features; both are fed into an SVM for training and detection. Test results show that the method effectively separates stego videos from cover videos, reaching a detection accuracy of 97.13% on 3,100 test video samples.
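A small sketch of one ingredient, assumed rather than taken from the paper: Markov transition-probability features computed over horizontally adjacent residual values, fed to an SVM classifier (here scikit-learn's SVC on random toy data, so the score is meaningless except as a usage example).

```python
import numpy as np
from sklearn.svm import SVC

def markov_features(residual, T=3):
    """Markov transition-probability features of a truncated residual plane:
    P(next = j | current = i) for horizontally adjacent values in [-T, T]."""
    r = np.clip(residual, -T, T)
    cur, nxt = r[:, :-1].ravel() + T, r[:, 1:].ravel() + T
    counts = np.zeros((2 * T + 1, 2 * T + 1))
    np.add.at(counts, (cur.astype(int), nxt.astype(int)), 1)
    probs = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
    return probs.ravel()                              # 49-dimensional feature vector

rng = np.random.default_rng(0)
X = np.array([markov_features(rng.integers(-4, 5, size=(64, 64))) for _ in range(40)])
y = np.array([0] * 20 + [1] * 20)                     # 0 = cover video, 1 = stego video
clf = SVC(kernel="rbf").fit(X, y)
print(clf.score(X, y))
```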

14.
In skeleton-based action recognition, many methods extract spatial information and motion information from the skeleton and then fuse them, without efficiently representing human actions that have complex spatio-temporal relationships. This paper proposes a graph convolutional network based on pose-motion spatio-temporal fusion (PM-STFGCN). Because the temporal domain contains a large amount of interference, a local pose-motion temporal attention module (LPM-TAM) is defined to suppress temporal interference and learn representations of motion poses, and a pose-motion spatio-temporal fusion module (PM-STF) is designed to fuse temporal motion and spatial pose features with adaptive feature enhancement. Experiments verify that the proposed method is effective and competitive with other methods in recognition performance. A human action interaction system built on it outperforms a voice interaction system in real-time performance and accuracy.

15.
Time-Constrained Keyframe Selection Technique
In accessing large collections of digitized videos, it is often difficult to find both the appropriate video file and the portion of the video that is of interest. This paper describes a novel technique for determining keyframes that are different from each other and provide a good representation of the whole video. We use keyframes to distinguish videos from each other, to summarize videos, and to provide access points into them. The technique can determine any number of keyframes by clustering the frames in a video and by selecting a representative frame from each cluster. Temporal constraints are used to filter out some clusters and to determine the representative frame for a cluster. Desirable visual features can be emphasized in the set of keyframes. An application for browsing a collection of videos makes use of the keyframes to support skimming and to provide visual summaries.
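A minimal sketch of the clustering-plus-constraint idea: frames are clustered on appearance features, clusters covering too few frames are filtered out (a simple stand-in for the paper's temporal constraints), and the frame closest to each remaining cluster centre becomes a keyframe. The feature choice and thresholds are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_keyframes(frame_feats, n_keyframes=5, min_cluster_frames=10):
    """Cluster frame features, drop clusters that cover too few frames,
    and return the frame nearest to each remaining cluster centre."""
    km = KMeans(n_clusters=n_keyframes, n_init=10, random_state=0).fit(frame_feats)
    keyframes = []
    for c in range(n_keyframes):
        members = np.where(km.labels_ == c)[0]
        if len(members) < min_cluster_frames:         # too short to be representative
            continue
        dists = np.linalg.norm(frame_feats[members] - km.cluster_centers_[c], axis=1)
        keyframes.append(int(members[np.argmin(dists)]))
    return sorted(keyframes)

feats = np.random.rand(500, 64)                        # e.g. colour histograms of 500 frames
print(select_keyframes(feats))
```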

16.
The ability to model the temporal dimension is essential to many applications. Furthermore, the rate of increase in database size and the stringency of response-time requirements have outpaced advancements in processor and mass storage technology, leading to the need for parallel temporal database management systems. In this paper, we introduce a variety of parallel temporal aggregation algorithms for the shared-nothing architecture; these algorithms are based on the sequential Aggregation Tree algorithm. We are particularly interested in developing parallel algorithms that can maximally exploit available memory to quickly compute large-scale temporal aggregates without intermediate disk writes and reads. An empirical study found that the number of processing nodes, the partitioning of the data, the placement of results, and the degree of data reduction effected by the aggregation all impact the performance of the algorithms. For distributed result placement, Greedy Time Division Merge was the obvious choice. For centralized results and high data reduction, Pairwise Merge was preferred for a large number of processing nodes; for low data reduction, it performed well only up to 32 nodes. This led us to a centralized variant of Greedy Time Division Merge, which was best for the remaining cases. We present a cost model that closely predicts the running time of Greedy Time Division Merge.
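As background for what a temporal aggregate computes (a simplified sequential sketch, not the parallel Aggregation Tree or Greedy Time Division Merge algorithms), the snippet below derives the constant-count pieces of a temporal COUNT over a set of tuple validity intervals.

```python
from collections import Counter

def temporal_count(intervals):
    """intervals: [(start, end)] tuple validity periods (end exclusive).
    Returns constant-count pieces [(start, end, count)] over the timeline,
    i.e. the result of a temporal COUNT aggregate."""
    delta = Counter()
    for s, e in intervals:
        delta[s] += 1
        delta[e] -= 1
    pieces, count, prev = [], 0, None
    for t in sorted(delta):
        if prev is not None and count > 0:
            pieces.append((prev, t, count))
        count += delta[t]
        prev = t
    return pieces

print(temporal_count([(1, 10), (3, 7), (5, 12)]))
# [(1, 3, 1), (3, 5, 2), (5, 7, 3), (7, 10, 2), (10, 12, 1)]
```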

17.
A serious bottleneck for multimedia e-learning is that the bandwidth required to view lecture videos at good resolution is often unavailable because of their large size. Content-based compression of video data can greatly enhance bandwidth utilization over scarce network resources. In this paper, an educational video compression technique is presented that dynamically allocates space according to the content importance of each video segment. We present a phase-correlation-based motion estimation and compensation algorithm that assists in compressing important moving objects efficiently. Temporal coherence is exploited in two phases: first, frames with high similarity are categorized and encoded efficiently; second, the compression ratio is adapted to the frame content. Shots of high importance are stored at a higher bit rate than frames of relatively low importance, and the importance and priority of the frames are computed automatically by our algorithm. Results over several hours of educational videos and comparison with state-of-the-art compression algorithms illustrate the high compression performance of our technique.
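A compact sketch of phase-correlation-based translation estimation between two frames, as used for motion estimation; this is the textbook cross-power-spectrum formulation, not necessarily the paper's exact variant.

```python
import numpy as np

def phase_correlation(frame_a, frame_b):
    """Estimate the integer (dy, dx) translation that maps frame_b onto frame_a
    from the peak of the normalised cross-power spectrum."""
    Fa, Fb = np.fft.fft2(frame_a), np.fft.fft2(frame_b)
    cross = Fa * np.conj(Fb)
    corr = np.fft.ifft2(cross / np.maximum(np.abs(cross), 1e-12)).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = frame_a.shape                              # map peaks past the centre to negative shifts
    return (dy - h if dy > h // 2 else dy, dx - w if dx > w // 2 else dx)

a = np.random.rand(64, 64)
b = np.roll(np.roll(a, 5, axis=0), -3, axis=1)        # b is a shifted by (5, -3)
print(phase_correlation(b, a))                        # recovers (5, -3)
```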

18.
Action recognition on large categories of unconstrained videos taken from the web is a very challenging problem compared to datasets like KTH (6 actions), IXMAS (13 actions), and Weizmann (10 actions). Challenges like camera motion, different viewpoints, large interclass variations, cluttered background, occlusions, bad illumination conditions, and poor quality of web videos cause the majority of the state-of-the-art action recognition approaches to fail. Also, an increased number of categories and the inclusion of actions with high confusion add to the challenges. In this paper, we propose using the scene context information obtained from moving and stationary pixels in the key frames, in conjunction with motion features, to solve the action recognition problem on a large (50 actions) dataset with videos from the web. We perform a combination of early and late fusion on multiple features to handle the very large number of categories. We demonstrate that scene context is a very important feature to perform action recognition on very large datasets. The proposed method does not require any kind of video stabilization, person detection, or tracking and pruning of features. Our approach gives good performance on a large number of action categories; it has been tested on the UCF50 dataset with 50 action categories, which is an extension of the UCF YouTube Action (UCF11) dataset containing 11 action categories. We also tested our approach on the KTH and HMDB51 datasets for comparison.

19.
Temporal segmentation of videos into meaningful image sequences containing some particular activities is an interesting problem in computer vision. We present a novel algorithm to achieve this semantic video segmentation. The segmentation task is accomplished through event detection in a frame-by-frame processing setup. We propose using one-class classification (OCC) techniques to detect events that indicate a new segment, since they have been proved to be successful in object classification and they allow for unsupervised event detection in a natural way. Various OCC schemes have been tested and compared, and additionally, an approach based on the temporal self-similarity maps (TSSMs) is also presented. The testing was done on a challenging publicly available thermal video dataset. The results are promising and show the suitability of our approaches for the task of temporal video segmentation.
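A hedged sketch of frame-by-frame event detection with a one-class classifier: an OCC model (here scikit-learn's OneClassSVM) is fitted on frames of the current segment, and the first incoming frame flagged as an outlier is treated as a candidate segment boundary. Features and parameters are illustrative, not the paper's configuration.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
current_segment = rng.normal(0.0, 1.0, size=(200, 32))   # features of frames seen so far
incoming_frames = np.vstack([rng.normal(0.0, 1.0, size=(20, 32)),
                             rng.normal(4.0, 1.0, size=(20, 32))])  # activity changes halfway

# Fit the one-class model on the current segment; frames predicted as outliers
# (-1) indicate an event, i.e. the start of a new segment.
occ = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(current_segment)
pred = occ.predict(incoming_frames)                        # +1 inlier, -1 outlier
boundary = np.argmax(pred == -1)                           # index of the first outlier frame
print(boundary)
```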

20.
汤娜, 叶小平, 汤庸, 彭鹏, 杜梦圆. 《软件学报》, 2016, 27(9): 2290-2302
Temporal data management deepens and extends conventional data management and is valuable both in theory and in practice; temporal indexing is a key supporting technique and an active research topic within it. This paper first proposes a temporal data structure in which a partial order between data nodes turns the conventional processing of two-dimensional time intervals into one-dimensional processing over partial-order-based temporal equivalence classes, so that temporal operations can be handled quickly and effectively. On top of this structure, a temporal XML index, TempPartialIndex, is studied; its basic idea is to integrate the temporal data structure into a non-temporal XML index at the semantic layer, so that structural joins are performed only after temporal filtering and semantic filtering have eliminated a large number of nodes. The paper also discusses set-at-a-time queries, temporal-variable queries, and an incremental dynamic update mechanism based on TempPartialIndex. Simulation results show that TempPartialIndex effectively supports various temporal XML queries and updates and is technically feasible and effective.
