Human Action Recognition by Visual Word Based on Local and Global Features

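Cite this article:	XIE Fei, GONG Sheng-rong, LIU Chun-ping, JI Yi. Human Action Recognition by Visual Word Based on Local and Global Features[J]. Computer Science, 2015, 42(11): 293-298.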

Authors:	XIE Fei, GONG Sheng-rong, LIU Chun-ping, JI Yi

Affiliation:	School of Computer Science and Technology, Soochow University, Suzhou 215006 (all four authors)

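Funding:	Supported by the National Natural Science Foundation of China: Research on Multi-Camera Object Tracking Based on Type-2 Fuzzy Probabilistic Graphical Models (61170124), Dynamic Scene Topic Discovery Based on Saliency and Trust Propagation (61272258), Action Semantic Understanding of Temporal 3D Depth Maps Based on Deep Learning (61301299); and by the Jiangsu Province Industry-University-Research Joint Innovation Fund (Prospective Joint Research Project): Abnormal Behavior Analysis in Complex Scenes and Its Applications (BY2014059-14).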

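Abstract:	Human action recognition based on visual words improves recognition accuracy because it adds mid-level semantic information to the features. However, when visual words are extracted, mutual interference between foreground and background weakens their expressive power. This paper proposes a visual word generation method that combines local and global features. The method first uses a saliency map to detect the foreground human region, then applies the proposed dynamic threshold matrix so that spatio-temporal interest points are detected with different thresholds across the human region, and computes 3D-SIFT features around those points to describe local information. On this basis, histogram of optical flow features describe the global motion information of the action. Spectral clustering then fuses the local and global features into visual words. Experiments show that, relative to popular visual word generation methods based on local features, the proposed method's recognition rate exceeds their average recognition rate by 6.4% on the simple-background KTH dataset and by 6.5% on the complex-background UCF dataset.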
Keywords:	visual words; saliency map; 3D-SIFT; dynamic threshold matrix; histogram of optical flow
Received:	2014/7/21
Revised:	2014/8/29
Human Action Recognition by Visual Word Based on Local and Global Features

XIE Fei, GONG Sheng-rong, LIU Chun-ping, JI Yi. Human Action Recognition by Visual Word Based on Local and Global Features[J]. Computer Science, 2015, 42(11): 293-298.

Authors:	XIE Fei, GONG Sheng-rong, LIU Chun-ping and JI Yi

Affiliation:	School of Computer Science & Technology, Soochow University, Suzhou 215006, China (all four authors)

Abstract:	Unlike methods based on low-level features, human action recognition based on visual words adds mid-level semantic information to the features and thereby improves recognition accuracy. For complex backgrounds or dynamic scenes, however, the effectiveness of visual words may deteriorate. We propose a new method that combines local and global features to generate visual words. First, our approach uses a saliency map to detect the rectangles around humans. Inside these rectangles, 3D-SIFT is then computed around interest points detected with a dynamic threshold matrix to describe local features. We also add HOOF to describe the global motion information. These visual words provide important semantic information in the video, such as brightness contrast and motion information. Compared with state-of-the-art methods, the method improves action recognition performance by 6.4% on the KTH dataset and by 6.5% on the UCF dataset. The experimental results also indicate that our visual dictionary has advantages over others in both simple-background and dynamic scenes.

Keywords:	Visual words; Saliency map; 3D-SIFT; Dynamic threshold matrix; HOOF
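
The abstract describes global motion with a histogram of optical flow (HOOF) but gives no formula on this page. As a rough illustration only, the sketch below builds a magnitude-weighted histogram of flow directions and normalizes it to sum to 1. The function name `flow_orientation_histogram`, the `num_bins` parameter, and the plain binning of angles over [0, 2π) are assumptions made for this example; the paper's actual binning scheme may differ, and the flow vectors are assumed to come from some separate optical flow estimator.

```python
import math


def flow_orientation_histogram(flow, num_bins=8):
    """Magnitude-weighted histogram of flow directions, normalized to sum to 1.

    flow: iterable of (dx, dy) displacement vectors, one per pixel or point.
    The binning over [0, 2*pi) is an illustrative assumption, not the
    paper's confirmed definition of HOOF.
    """
    hist = [0.0] * num_bins
    bin_width = 2.0 * math.pi / num_bins
    for dx, dy in flow:
        magnitude = math.hypot(dx, dy)
        if magnitude == 0.0:
            continue  # a static point has no direction to vote for
        angle = math.atan2(dy, dx) % (2.0 * math.pi)  # map angle to [0, 2*pi)
        index = min(int(angle / bin_width), num_bins - 1)  # clamp rounding edge case
        hist[index] += magnitude  # larger motions contribute more
    total = sum(hist)
    # Normalizing makes the descriptor independent of how many points moved.
    return [h / total for h in hist] if total > 0 else hist


# Example: two vectors pointing up-right and one pointing up-left, 4 bins.
example = flow_orientation_histogram([(1, 1), (1, 1), (-1, 1)], num_bins=4)
```

Because the histogram is normalized, clips with different numbers of moving pixels yield descriptors on the same scale, which is convenient if the descriptor is later clustered together with local features.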
This article is indexed in Wanfang Data and other databases.