Visual word proximity and linguistics for semantic video indexing and near-duplicate retrieval期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

Visual word proximity and linguistics for semantic video indexing and near-duplicate retrieval

Affiliation:	1. INESC-ID Lisboa, Rua Alves Redol, 9, 1000-029 Lisboa, Portugal;2. Instituto Universitário de Lisboa (ISCTE-IUL), Av. das Forças Armadas, 1649-026 Lisboa, Portugal;3. Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais, 1, 1049-001 Lisboa, Portugal;4. Language Technologies Institute, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA, 15213-3891, USA;1. School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran;2. School of Computer Science, Institute for Research in Fundamental Sciences (IPM), Iran

Abstract:	Bag-of-visual-words (BoW) has recently become a popular representation to describe video and image content. Most existing approaches, nevertheless, neglect inter-word relatedness and measure similarity by bin-to-bin comparison of visual words in histograms. In this paper, we explore the linguistic and ontological aspects of visual words for video analysis. Two approaches, soft-weighting and constraint-based earth mover’s distance (CEMD), are proposed to model different aspects of visual word linguistics and proximity. In soft-weighting, visual words are cleverly weighted such that the linguistic meaning of words is taken into account for bin-to-bin histogram comparison. In CEMD, a cross-bin matching algorithm is formulated such that the ground distance measure considers the linguistic similarity of words. In particular, a BoW ontology which hierarchically specifies the hyponym relationship of words is constructed to assist the reasoning. We demonstrate soft-weighting and CEMD on two tasks: video semantic indexing and near-duplicate keyframe retrieval. Experimental results indicate that soft-weighting is superior to other popular weighting schemes such as term frequency (TF) weighting in large-scale video database. In addition, CEMD shows excellent performance compared to cosine similarity in near-duplicate retrieval.

Keywords:
本文献已被 ScienceDirect 等数据库收录！

设为首页 | 免责声明 | 关于勤云 | 加入收藏