Similar Documents
20 similar documents found (search time: 156 ms)
1.
Existing multiple-instance-learning-based horror video recognition algorithms assume that instances are mutually independent, ignoring the contextual information present in horror videos and the statistical properties of instance bags. This paper therefore proposes a multi-view fused sparse representation model. The model views a video clip from three different perspectives: a set view, a context view, and a statistical-property view, and uses a joint sparse representation framework to fuse the three views into a single classification framework for horror video recognition. Experimental results on a horror video corpus show that the algorithm achieves better performance and stability than existing algorithms for horror video recognition.
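As a rough illustration of the sparse-representation idea behind this model, the sketch below classifies a test clip by sparse-coding it against class-wise training dictionaries in each of several views and fusing the per-view reconstruction residuals. The three view names, the feature dimensions, and the residual-sum fusion rule are illustrative assumptions; the paper's actual joint sparse representation couples the views inside one optimization.

```python
# Sketch: per-view sparse-representation classification with residual fusion.
# Feature dimensions, class count, and the residual-sum fusion rule are
# illustrative assumptions, not the paper's exact joint formulation.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_classes, per_class, dims = 2, 20, {"set": 64, "context": 32, "stats": 16}

# Training dictionaries: one column per training clip, stacked per view.
train = {v: rng.normal(size=(d, n_classes * per_class)) for v, d in dims.items()}
labels = np.repeat(np.arange(n_classes), per_class)
test = {v: rng.normal(size=d) for v, d in dims.items()}

residuals = np.zeros(n_classes)
for view, D in train.items():
    lasso = Lasso(alpha=0.05, max_iter=5000)
    lasso.fit(D, test[view])                    # sparse code of the test clip
    code = lasso.coef_
    for c in range(n_classes):                  # class-wise reconstruction error
        part = np.where(labels == c, code, 0.0)
        residuals[c] += np.linalg.norm(test[view] - D @ part)

print("predicted class:", residuals.argmin())
```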

2.
To address the difficulty that common video tracking methods have in reliably tracking moving objects in complex scenes, a discriminative video tracking algorithm based on multi-feature fusion within a particle filter framework is studied. The relationship between feature extraction and the robustness and accuracy of tracking is first analyzed, showing that fusing multiple features can effectively improve tracking in complex scenes. HSV color features and HOG features are then extracted to describe the target appearance, and a logistic regression classifier is trained online to build a discriminative appearance model. Tests on public videos of complex scenes compare single-feature and multi-feature variants, and the proposed algorithm is compared with classic tracking algorithms. The results show that video tracking with fused multiple features is more robust and accurate.
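A minimal sketch of the appearance model described above is given below: an HSV color histogram and a HOG descriptor are concatenated and scored by a logistic regression classifier. The patch size, histogram bins, and toy training data are placeholders, and the surrounding particle filter is omitted.

```python
# Sketch: fused HSV-histogram + HOG appearance descriptor scored by logistic
# regression. Patch size, bin counts, and the toy training data are placeholders;
# the particle filter that proposes candidate patches is omitted.
import numpy as np
import cv2
from skimage.feature import hog
from sklearn.linear_model import LogisticRegression

def describe(patch_bgr):
    hsv = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [16, 16], [0, 180, 0, 256]).ravel()
    hist /= hist.sum() + 1e-8
    gray = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2GRAY)
    hog_vec = hog(gray, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    return np.concatenate([hist, hog_vec])

rng = np.random.default_rng(0)
patches = rng.integers(0, 256, size=(40, 64, 64, 3), dtype=np.uint8)  # toy target/background patches
y = np.array([1] * 20 + [0] * 20)                                     # 1 = target, 0 = background
X = np.stack([describe(p) for p in patches])

clf = LogisticRegression(max_iter=1000).fit(X, y)      # trained online, frame by frame, in the paper
print("target probability of first patch:", clf.predict_proba(X[:1])[0, 1])
```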

3.
To address the low accuracy and poor real-time performance of fire detection algorithms on multi-scale flames and smoke, a fire detection method based on a Transformer-improved YOLO v4 is proposed. First, the CSPDarknet53 backbone is improved with MHSA (Multi-Head Self-Attention) to model global dependencies and make full use of contextual information. In addition, the PANet module is improved with MHSA for multi-scale feature map fusion, capturing more detailed features. To verify the effectiveness of the improvements, the method is compared with YOLO v4, YOLO v3, and other algorithms. Experiments show that it not only detects multi-scale targets but also achieves real-time performance in video surveillance scenarios, offering high accuracy, a low false alarm rate, and real-time detection, and meeting the requirements of fire detection in surveillance video.
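The core change described above is inserting multi-head self-attention over the spatial positions of a convolutional feature map. The sketch below shows one way to build such a block with PyTorch's nn.MultiheadAttention; the channel count, head count, and where the block sits inside CSPDarknet53 or PANet are assumptions, not the paper's exact design.

```python
# Sketch: multi-head self-attention applied over the spatial positions of a CNN
# feature map, the kind of block inserted into the backbone and PANet above.
# Channel and head counts are illustrative assumptions.
import torch
import torch.nn as nn

class SpatialMHSA(nn.Module):
    def __init__(self, channels=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=channels, num_heads=heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x):                       # x: (B, C, H, W)
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)      # (B, H*W, C): one token per spatial position
        out, _ = self.attn(seq, seq, seq)       # global dependencies across positions
        seq = self.norm(seq + out)              # residual connection + layer norm
        return seq.transpose(1, 2).reshape(b, c, h, w)

feat = torch.randn(1, 256, 13, 13)              # a backbone feature map
print(SpatialMHSA()(feat).shape)                # torch.Size([1, 256, 13, 13])
```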

4.
With the continuous development of video capture devices and technology, the number of videos is growing rapidly, and accurately locating a target moment within massive video collections is a challenging task. Cross-modal video moment retrieval aims to find, given a query text, the video segment in a video corpus that matches the description. Most existing work focuses on matching the text with candidate video segments while ignoring the contextual information of the video, leading to insufficient modeling of feature relationships during video understanding. To address this, this paper proposes a cross-modal video moment retrieval method based on salient feature enhancement: a temporal adjacent network is constructed to learn the contextual information of the video, and lightweight residual channel attention is then used to highlight the salient features of video segments, improving the network's understanding of video semantics. Experimental results on the public TACoS and ActivityNet Captions datasets show that the proposed method performs video moment retrieval better than mainstream matching-based methods and methods based on video-text feature relations.
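The sketch below shows one simple form of the lightweight residual channel attention mentioned above: a squeeze-and-gate over the channel dimension of clip features with a residual re-weighting. The reduction ratio, channel count, and temporal layout are illustrative assumptions rather than the paper's exact module.

```python
# Sketch: a lightweight residual channel-attention block of the kind used above
# to highlight salient clip features. Reduction ratio and channel count are
# illustrative assumptions.
import torch
import torch.nn as nn

class ResidualChannelAttention(nn.Module):
    def __init__(self, channels=512, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool1d(1)                 # squeeze over the temporal axis
        self.gate = nn.Sequential(
            nn.Conv1d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv1d(channels // reduction, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):                                   # x: (B, C, T) clip features
        w = self.gate(self.pool(x))                         # (B, C, 1) per-channel weights
        return x + x * w                                    # residual re-weighting

clip = torch.randn(2, 512, 32)                              # 32 temporal steps of clip features
print(ResidualChannelAttention()(clip).shape)               # torch.Size([2, 512, 32])
```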

5.
Design of a feature fusion classifier for underwater target recognition (cited 4 times: 0 self-citations, 4 by others)
This paper studies feature fusion techniques for underwater target recognition, discusses the problems in feature fusion and their solutions, and characterizes feature fusion classifiers; a fuzzy fusion classifier is designed and a concrete algorithm is given. The classifier makes no assumptions about how samples are distributed in the pattern space; it emphasizes the mutual constraints between classes and the independent contribution of each pattern, and combines these contributions in a manner similar to a fuzzy union operation. In practical applications, comparison with existing classifiers shows that the fuzzy fusion classifier can combine multiple signal features and effectively improve classification performance.
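The abstract describes combining per-feature class evidence with a fuzzy-union-like operation. A minimal numpy sketch of that idea follows: class memberships obtained from each feature set are fused by the fuzzy OR (max) operator. The membership values and the choice of max as the union are illustrative assumptions; the paper's exact fusion rule is not reproduced here.

```python
# Sketch: fusing per-feature class memberships with a fuzzy-union (max) operator.
# The membership values and the use of max as the union are illustrative
# assumptions, not the paper's exact fuzzy fusion rule.
import numpy as np

# Rows: feature extractors (e.g. spectral, temporal, cepstral); columns: target classes.
memberships = np.array([
    [0.70, 0.20, 0.10],   # feature set 1 favors class 0
    [0.40, 0.35, 0.25],   # feature set 2 is ambiguous
    [0.60, 0.15, 0.25],   # feature set 3 favors class 0
])

fused = memberships.max(axis=0)        # fuzzy OR across feature sets
print("fused memberships:", fused)
print("decision: class", int(fused.argmax()))
```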

6.
To address the missed and false detections of existing smoke and fire detection algorithms, a new lightweight smoke and fire detection algorithm based on an Efficient Global Context Network (EGC-Net) is proposed. The algorithm takes the lightweight object detection network YOLOX as its base network and embeds an improved EGC-Net between YOLOX's backbone feature extraction network and its feature pyramid network. EGC-Net consists of three stages: context modeling, feature transformation, and feature fusion. It captures the global contextual information of the image, models the long-range dependencies between smoke/fire targets and their background, and combines a channel attention mechanism to learn more discriminative visual features for smoke and fire detection. Experimental results show that the proposed EGC-YOLOX smoke and fire detection algorithm achieves an image-level recall of 95.56% and an image-level false alarm rate of 4.75%, both better than the other typical lightweight algorithms compared, while its speed meets real-time detection requirements. The algorithm can be deployed in security and fire-protection applications for real-time fire monitoring and early-warning management.
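The three stages named above (context modeling, feature transformation, feature fusion) follow the general shape of global-context blocks. The sketch below is one such block in PyTorch; the channel count, reduction ratio, and placement between backbone and FPN are illustrative assumptions, not the exact EGC-Net design.

```python
# Sketch: a global-context block with the three stages named above (context
# modeling, feature transform, feature fusion), in the spirit of GCNet-style
# designs. Channel count and reduction ratio are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalContextBlock(nn.Module):
    def __init__(self, channels=256, reduction=8):
        super().__init__()
        self.mask = nn.Conv2d(channels, 1, kernel_size=1)          # context modeling
        self.transform = nn.Sequential(                            # feature transform
            nn.Conv2d(channels, channels // reduction, 1),
            nn.LayerNorm([channels // reduction, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
        )

    def forward(self, x):                                          # x: (B, C, H, W)
        b, c, h, w = x.shape
        attn = F.softmax(self.mask(x).view(b, 1, h * w), dim=-1)   # (B, 1, HW) attention over positions
        context = torch.bmm(attn, x.view(b, c, h * w).transpose(1, 2))  # (B, 1, C) global context
        context = context.transpose(1, 2).unsqueeze(-1)            # (B, C, 1, 1)
        return x + self.transform(context)                         # feature fusion (broadcast add)

print(GlobalContextBlock()(torch.randn(2, 256, 20, 20)).shape)     # torch.Size([2, 256, 20, 20])
```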

7.
Graph-based Co-Training for web page classification (cited 1 time: 0 self-citations, 1 by others)
侯翠琴, 焦李成. Acta Electronica Sinica (电子学报), 2009, 37(10): 2173-2180
Making full use of both the hyperlink relations and the text of web pages, this paper proposes an inductive semi-supervised learning algorithm for web page classification: a graph-based Co-training algorithm for web page classification (GCo-training), and proves its effectiveness theoretically. Within the Co-training framework, GCo-training iteratively learns a semi-supervised classifier over a graph constructed from hyperlink information and a Bayes classifier based on text features. The graph-based semi-supervised classifier needs only a small amount of labeled data; by mining the abundant relational information among the data it reaches relatively high prediction accuracy and can provide a large amount of label information to the Bayes classifier; in turn, the Bayes classifier, after learning from this label information, provides useful information back to the graph-based classifier. During the iterations the two help each other and continuously improve, after which the Bayes classifier can be used to predict the classes of large amounts of unseen data. Experimental results on the Web→KB dataset show that GCo-training outperforms the Co-training algorithm using text and anchor-text features and the EM-based Bayes algorithm.
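The sketch below shows a toy co-training loop in the spirit of the alternation described above: a graph-style semi-supervised learner (sklearn's LabelSpreading, standing in for the paper's hyperlink-graph classifier) and a naive Bayes classifier on a second view exchange their most confident predictions. The synthetic data, the confidence threshold, and the LabelSpreading stand-in are all assumptions.

```python
# Sketch: a co-training loop in the spirit of GCo-training. LabelSpreading stands
# in for the hyperlink-graph classifier and GaussianNB for the text Bayes
# classifier; data, threshold, and the stand-ins are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.semi_supervised import LabelSpreading

rng = np.random.default_rng(0)
X_link, y = make_classification(n_samples=300, n_features=10, random_state=0)   # "link view"
X_text = X_link + rng.normal(scale=0.5, size=X_link.shape)                      # "text view"

labels = np.full_like(y, -1)            # -1 marks unlabeled pages
labels[:20] = y[:20]                    # a small labeled seed set

for _ in range(5):
    graph_clf = LabelSpreading(kernel="knn", n_neighbors=7).fit(X_link, labels)
    text_clf = GaussianNB().fit(X_text[labels != -1], labels[labels != -1])
    # Each view labels the unlabeled pages it is most confident about.
    for clf, X in ((graph_clf, X_link), (text_clf, X_text)):
        proba = clf.predict_proba(X)
        confident = (proba.max(axis=1) > 0.95) & (labels == -1)
        labels[confident] = proba[confident].argmax(axis=1)

print("accuracy of the final text classifier:", (text_clf.predict(X_text) == y).mean())
```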

8.
张迎周, 符炜. Acta Electronica Sinica (电子学报), 2013, 41(8): 1457-1461
Building on the existing intraprocedural monadic slicing algorithm, an interprocedural monadic slicing algorithm based on back-filling pending labels is proposed: first, the slices of the parameter variables at the entry of a callee procedure are initialized with pending labels; intraprocedural monadic slicing analysis is then performed on the callee, from which the dependences among parameters are obtained; finally, the corresponding pending labels in the slice table are back-filled, yielding the desired interprocedural monadic slices. The algorithm makes full use of the intraprocedural monadic slicing results, largely avoiding repeated computation; it requires no further intermediate forms such as characteristic subgraphs or linkage grammars, and it avoids the calling-context problem through the parameter dependences. In addition, the algorithm retains the strong language adaptability and compositionality of the intraprocedural monadic slicing algorithm.

9.
Research on pointer analysis algorithms for program standardization transformation (cited 3 times: 0 self-citations, 3 by others)
王甜甜, 苏小红, 马培军. Acta Electronica Sinica (电子学报), 2009, 37(5): 1104-1108
Because the program intermediate representations used by existing pointer analysis algorithms cannot adequately express program syntax and semantics, they are ill-suited to program standardization transformation. To address this, a flow-sensitive and context-sensitive interprocedural pointer analysis algorithm based on control dependence trees is proposed. The program is represented as a control dependence tree, the points-to representation is improved to express pointer aliases, data-flow equations are defined on this basis, and flow-sensitive, context-sensitive pointer analysis is performed on the control dependence tree. Experimental results show that the algorithm is more accurate than the Emami pointer analysis algorithm and, when applied to program standardization, significantly improves the elimination rate of code diversification.

10.
徐堃, 徐佩霞. Electronic Technology (电子技术), 2009, 36(11): 69-71, 63
This paper uses an improved Adaboost algorithm to detect humans in static images, addressing the slow training and risk sensitivity of the traditional algorithm. A fast feature selection algorithm is proposed: a statistics table is built to store feature information so that the classification error does not have to be recomputed for every feature in each round of weak classifier training. Fisher discriminant analysis is then applied to the selected weak classifiers to learn a new linear discriminant function that maximizes the separability between classes, optimizing the strong classifier and reducing the influence of risk sensitivity. Experimental results show that, compared with the traditional Adaboost algorithm, the proposed method speeds up feature selection and achieves good detection performance.

11.
Video action recognition is an important topic in computer vision. Most existing methods use CNN-based models and capture multiple modalities of image features from the videos, such as static frames, dynamic images, and optical flow. However, these mainstream features contain a great deal of static information, including object and background information, in which the motion information of the action itself is neither distinguished nor strengthened. In this work, a new kind of motion feature without static information is proposed for video action recognition. We propose a quantization-of-motion network based on the bag-of-features method to learn significant and discriminative motion features. In the learned feature map, object and background information is filtered out, even if the background is moving in the video. The motion feature is therefore complementary to the static image feature and to the static information in dynamic images and optical flow. A multi-stream classifier is built with the proposed motion feature and other features, and action recognition performance is enhanced compared with other state-of-the-art methods.
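The bag-of-features quantization underlying the proposed motion feature can be illustrated as follows: local motion descriptors are quantized against a learned codebook and pooled into a per-video histogram. The descriptor dimensionality, codebook size, and random data below are assumptions; only the quantization step is sketched, not the full network.

```python
# Sketch: bag-of-features quantization of local motion descriptors into a
# per-video histogram, the building block behind the proposed motion feature.
# Descriptor size, codebook size, and the random data are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
train_descriptors = rng.normal(size=(5000, 64))       # local motion descriptors from training videos
codebook = KMeans(n_clusters=128, n_init=4, random_state=0).fit(train_descriptors)

def video_histogram(descriptors):
    """Quantize a video's motion descriptors and pool them into a normalized histogram."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=128).astype(float)
    return hist / (hist.sum() + 1e-8)

video_desc = rng.normal(size=(300, 64))               # descriptors from one video
print(video_histogram(video_desc).shape)              # (128,)
```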

12.
The analysis of moving objects in videos, especially the recognition of human motions and gestures, is attracting increasing attention in the computer vision area. However, most existing video analysis methods do not take the effect of video semantic information into account. The topological information of the video image plays an important role in describing the association relationships of the image content, which helps improve the discriminability of the video feature representation. Based on these considerations, we propose a video semantic feature learning method that integrates image topological sparse coding with a dynamic time warping algorithm to improve gesture recognition in videos. The method divides video feature learning into two phases: semi-supervised video image feature learning and supervised optimization of video sequence features. A distance-weighted dynamic time warping algorithm combined with a K-nearest neighbor classifier is then leveraged to recognize gestures. We conduct comparative experiments on a table tennis video dataset. The experimental results show that the proposed method yields more discriminative video feature representations and can effectively improve the recognition rate of gestures in sports video.
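The recognition step above (dynamic time warping plus nearest-neighbor classification) is sketched below in plain numpy. The paper's distance-weighting scheme is simplified to ordinary Euclidean frame distances, and the feature sequences are random placeholders.

```python
# Sketch: dynamic time warping between feature sequences plus a nearest-neighbor
# decision, the recognition step described above. The paper's distance weighting
# is simplified to plain Euclidean frame distances; data are placeholders.
import numpy as np

def dtw_distance(a, b):
    """DTW cost between sequences a (n, d) and b (m, d)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

rng = np.random.default_rng(0)
gallery = [(rng.normal(size=(rng.integers(20, 40), 16)), label) for label in [0, 1, 0, 1]]
query = rng.normal(size=(30, 16))                     # feature sequence of an unseen gesture

nearest = min(gallery, key=lambda item: dtw_distance(query, item[0]))
print("predicted gesture class:", nearest[1])
```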

13.
Bag-of-words models have been widely used to obtain global representations for action recognition. However, these models ignore structure information, such as the spatial and temporal contextual information, in the action representation. In this paper, we propose a novel structured codebook construction method that encodes spatial and temporal contextual information among local features for video representation. Given a set of training videos, our method first extracts local motion and appearance features. Next, we encode the spatial and temporal contextual information among local features by constructing correlation matrices for local spatio-temporal features. Then, we discover the common patterns of movements to construct the structured codebook. After that, actions can be represented by a set of sparse coefficients with respect to the structured codebook. Finally, a simple linear SVM classifier is applied to predict the action class based on the action representation. Our method has two main advantages over traditional methods. First, it automatically discovers mid-level common patterns of movements that capture rich spatial and temporal contextual information. Second, it is robust to unwanted background local features, mainly because most of them cannot be sparsely represented by the common patterns and are treated as residual errors that are not encoded into the action representation. We evaluate the proposed method on two popular benchmarks: the KTH action dataset and the UCF sports dataset. Experimental results demonstrate the advantages of our structured codebook construction.
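The pipeline above (codebook construction, sparse coefficients, linear SVM) can be illustrated with the sketch below. Ordinary dictionary learning stands in for the paper's structured, correlation-based codebook construction, and all sizes and data are illustrative assumptions.

```python
# Sketch: representing videos as sparse coefficients over a learned codebook and
# classifying them with a linear SVM, mirroring the pipeline above. Ordinary
# dictionary learning stands in for the structured codebook; data are placeholders.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
local_feats = rng.normal(size=(2000, 32))             # local spatio-temporal features from training videos
dico = MiniBatchDictionaryLearning(n_components=64, transform_algorithm="lasso_lars",
                                   transform_alpha=0.1, random_state=0).fit(local_feats)

def encode_video(features):
    """Sparse-code a video's local features and max-pool the coefficients."""
    codes = dico.transform(features)                   # (n_local, 64) sparse coefficients
    return np.abs(codes).max(axis=0)                   # (64,) video-level representation

videos = [rng.normal(size=(100, 32)) for _ in range(40)]
labels = np.array([i % 2 for i in range(40)])          # two toy action classes
X = np.stack([encode_video(v) for v in videos])

clf = LinearSVC().fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```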

14.
15.
王增强, 张文强, 张良. Journal of Signal Processing (信号处理), 2020, 36(8): 1272-1279
Existing video action recognition methods ignore the interactions among features during feature extraction and do not distinguish similar actions well. A human action recognition method that introduces a high-order attention mechanism is therefore proposed. A high-order attention module is embedded in a deep convolutional neural network; by modeling and exploiting complex, high-order statistics through the attention mechanism, the weights of different parts of the feature maps are redistributed during training, so that the network attends to local fine-grained information, produces discriminative attention proposals, and captures the subtle differences between actions. Experimental results on the UCF101 and HMDB51 human action datasets show that the recognition rate improves over existing methods, verifying the effectiveness and robustness of the proposed method and its improved ability to distinguish similar actions.
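One simple instance of the "high-order statistics" idea above is a second-order, covariance-based channel attention block, sketched below. This is not the paper's exact module; the channel count and the gating built from the channel covariance are illustrative assumptions.

```python
# Sketch: a second-order (covariance-based) channel attention block, one simple
# instance of the high-order statistics idea described above; not the paper's
# exact module. Channel count is an illustrative assumption.
import torch
import torch.nn as nn

class SecondOrderChannelAttention(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        self.fc = nn.Linear(channels, channels)

    def forward(self, x):                               # x: (B, C, H, W)
        b, c, h, w = x.shape
        feat = x.view(b, c, h * w)
        feat = feat - feat.mean(dim=2, keepdim=True)
        cov = torch.bmm(feat, feat.transpose(1, 2)) / (h * w - 1)   # (B, C, C) channel covariance
        weights = torch.sigmoid(self.fc(cov.mean(dim=2)))           # (B, C) gate from 2nd-order stats
        return x * weights.view(b, c, 1, 1)

print(SecondOrderChannelAttention()(torch.randn(2, 256, 14, 14)).shape)
```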

16.
We address the problem of learning representations from videos without manual annotation. Different video clips sampled from the same video usually have a similar background and consistent motion. A novel self-supervised task is designed to learn such temporal coherence, which is measured by mutual information in our work. First, we maximize the mutual information between features extracted from clips sampled from the same video. This encourages the network to learn the content shared by these clips. As a result, the network may focus on the background and ignore the motion in videos, because different clips from the same video normally share the same background. Second, to address this issue, we simultaneously maximize the mutual information between the feature of the video clip and the local regions where salient motion exists. Our approach, referred to as Deep Video Infomax (DVIM), strikes a balance between background and motion when learning temporal coherence. We conduct extensive experiments to test the performance of the proposed DVIM on various tasks. Experimental results of fine-tuning for high-level action recognition problems validate the effectiveness of the learned representations. Additional experiments on the task of action similarity labeling also demonstrate the generalization of the representations learned by DVIM.
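Mutual-information maximization between clip features of the kind described above is commonly estimated with an InfoNCE-style contrastive loss, where clips from the same video are positives and clips from other videos in the batch are negatives. The sketch below shows that generic estimator; it is not the exact DVIM objective, and the feature dimensions and temperature are assumptions.

```python
# Sketch: an InfoNCE-style estimator of the mutual information between features
# of two clips sampled from the same video (positives), with clips from other
# videos in the batch as negatives. A generic estimator, not the exact DVIM
# objective; dimensions and temperature are illustrative assumptions.
import torch
import torch.nn.functional as F

def clip_infonce(z_a, z_b, temperature=0.1):
    """z_a, z_b: (B, D) features of two clips per video; row i of each comes from the same video."""
    z_a, z_b = F.normalize(z_a, dim=1), F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature            # (B, B) similarity of every clip pair
    targets = torch.arange(z_a.size(0))             # the diagonal pairs are the positives
    return F.cross_entropy(logits, targets)

z_a, z_b = torch.randn(16, 128), torch.randn(16, 128)
print("InfoNCE loss:", clip_infonce(z_a, z_b).item())
```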

17.
Existing scene text recognition methods focus only on the classification of local character sequences and ignore the global information of the whole word. To address this, a multilevel feature selection scene text recognition (MFSSTR) algorithm is proposed. The algorithm uses a stacked-block architecture in which multilevel feature selection modules capture contextual features and semantic features from the visual features. For character prediction, a novel multilevel attention selection decoder (MASD) is proposed: visual, contextual, and semantic features are concatenated into a new feature space, which is re-weighted by a self-attention mechanism so that the internal relations of the feature sequence are attended to while more valuable features are selected for decoding. Intermediate supervision is also introduced during training to gradually refine the text predictions. Experimental results show that the algorithm achieves high recognition accuracy on several public scene text datasets; in particular, it reaches 87.1% on the irregular text dataset SVTP, about 2% higher than current popular algorithms.

18.
Semantic high-level event recognition in videos is one of the most interesting issues for multimedia searching and indexing. Since low-level features are semantically distinct from high-level events, a hierarchical video analysis framework is needed, i.e., using mid-level features to provide clear linkages between low-level audio-visual features and high-level semantics. Therefore, this paper presents a framework for video event classification using the temporal context of mid-level, interval-based multimodal features. In the framework, a co-occurrence symbol transformation method is proposed to explore full temporal relations among multiple modalities in probabilistic HMM event classification. The results of our experiments on baseball video event classification demonstrate the superiority of the proposed approach.
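The HMM event classification step above can be illustrated by scoring a sequence of mid-level symbols against per-event HMMs with the forward algorithm and picking the most likely event. The toy parameters for the two event models below are illustrative assumptions, not trained values from the paper.

```python
# Sketch: scoring a symbol sequence (e.g. the co-occurrence symbols described
# above) against per-event HMMs with the forward algorithm and choosing the most
# likely event. The toy parameters are illustrative assumptions, not trained values.
import numpy as np

def forward_loglik(obs, start, trans, emit):
    """Log-likelihood of an observation symbol sequence under one HMM."""
    alpha = start * emit[:, obs[0]]
    loglik = 0.0
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]
        scale = alpha.sum()                             # rescale to avoid underflow
        loglik += np.log(scale)
        alpha /= scale
    return loglik + np.log(alpha.sum())

start = np.array([0.6, 0.4])
trans = np.array([[0.7, 0.3], [0.2, 0.8]])
emissions = {                                           # one emission matrix per event class
    "hit":    np.array([[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]]),
    "no-hit": np.array([[0.3, 0.4, 0.3], [0.3, 0.4, 0.3]]),
}

sequence = [0, 0, 1, 2, 2, 2]                           # mid-level co-occurrence symbols of a clip
scores = {ev: forward_loglik(sequence, start, trans, emit) for ev, emit in emissions.items()}
print("predicted event:", max(scores, key=scores.get))
```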

19.
Video summarization aims at selecting valuable clips for browsing videos with high efficiency. Previous approaches typically focus on aggregating temporal features while ignoring the potential role of visual representations in summarizing videos. In this paper, we present a global difference-aware network (GDANet) that exploits the feature difference across frame and video as guidance to enhance visual features. Initially, a difference optimization module (DOM) is devised to enhance the discrimina...

20.
Dynamic hand gesture recognition remains an interesting topic for the computer vision community. Any hand gesture can be represented by a set of feature vectors, and a Recurrent Neural Network (RNN) can recognize these feature vectors as a hand gesture by analyzing the temporal and contextual information of the gesture sequence. We therefore propose a hybrid deep learning framework to recognize dynamic hand gestures. In the hybrid model, GoogleNet is pipelined with a bidirectional GRU unit to recognize the dynamic hand gesture. Dynamic hand gestures consist of many frames, and the features of each frame need to be extracted to capture the temporal and dynamic information of the performed gesture. Since an RNN takes a sequence of feature vectors as input, we extract features from videos using a pretrained GoogleNet. As the Gated Recurrent Unit is one of the variants of the RNN for classifying sequential data, we create a feature vector corresponding to each video and pass it to a bidirectional GRU (BGRU) network to classify the gestures. We evaluate our model on four publicly available hand gesture datasets. The proposed method performs well and is comparable with existing methods. For instance, we achieve 98.6% accuracy on the Northwestern University Hand Gesture (NWUHG) dataset, 99.6% on SKIG, and 99.4% on the Cambridge Hand Gesture (CHG) dataset. We also perform experiments on the DHG14/28 dataset and achieve an accuracy of 97.8% with 14 gesture classes and 92.1% with 28 gesture classes. DHG14/28 contains skeleton and depth data; our proposed model uses the depth data and achieves comparable accuracy.
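The pipeline above (pretrained GoogleNet frame features fed to a bidirectional GRU classifier) is sketched below with torchvision and PyTorch. The hidden size, number of gesture classes, frozen backbone, and random input are illustrative assumptions, not the authors' exact training setup.

```python
# Sketch: per-frame features from a pretrained GoogleNet fed to a bidirectional
# GRU that classifies the whole gesture sequence, mirroring the pipeline above.
# Hidden size, class count, and the random input are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

class GestureBGRU(nn.Module):
    def __init__(self, num_classes=14, hidden=256):
        super().__init__()
        backbone = models.googlenet(weights=models.GoogLeNet_Weights.DEFAULT)
        backbone.fc = nn.Identity()                        # keep the 1024-d pooled features
        self.backbone = backbone
        self.gru = nn.GRU(1024, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, frames):                             # frames: (B, T, 3, 224, 224)
        b, t = frames.shape[:2]
        with torch.no_grad():                              # frozen feature extractor in this sketch
            feats = self.backbone(frames.flatten(0, 1)).view(b, t, -1)
        _, h = self.gru(feats)                             # h: (2, B, hidden), last state per direction
        return self.head(torch.cat([h[0], h[1]], dim=1))   # logits over gesture classes

model = GestureBGRU().eval()
print(model(torch.randn(2, 8, 3, 224, 224)).shape)         # torch.Size([2, 14])
```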
