Similar Documents
20 similar documents found (search time: 15 ms)
1.
In spite of significant improvements in video data retrieval, a system has not yet been developed that can adequately respond to a user’s query. Typically, the user has to refine the query many times and view query results until the expected videos are eventually retrieved from the database. The complexity of video data and questionable query structuring by the user aggravate the retrieval process. Most previous research in this area has focused on retrieval based on low-level features. Managing imprecise queries using semantic (high-level) content is no easier than managing queries based on low-level features, owing to the absence of a proper continuous distance function. We provide a method to help users search for clips and videos of interest in video databases. The video clips are classified as interesting and uninteresting based on user browsing. The attribute values of the clips are analyzed for commonality, presence, and frequency within each of the two groups, and these statistics are used in computing the relevance of each clip to the user’s query. In this paper, we provide an intelligent query structuring system, called I-Quest, that ranks clips based on user browsing feedback in situations where generating a template from the interesting and uninteresting sets is impossible or yields poor results.
Ramazan Savaş Aygün (Corresponding author)
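The abstract does not spell out the scoring function; below is a minimal Python sketch of the general idea, contrasting attribute-value frequencies between the interesting and uninteresting groups. The scoring rule and data layout are illustrative assumptions, not I-Quest's actual formula.

```python
from collections import Counter

def relevance(clip_attrs, interesting, uninteresting):
    """Score a clip by contrasting attribute-value frequencies in the
    two browsing-feedback groups (hypothetical scoring, for illustration)."""
    pos = Counter(v for clip in interesting for v in clip)
    neg = Counter(v for clip in uninteresting for v in clip)
    n_pos, n_neg = max(len(interesting), 1), max(len(uninteresting), 1)
    # Frequency of each attribute value within each group, then contrast.
    return sum(pos[v] / n_pos - neg[v] / n_neg for v in clip_attrs)

# Toy usage: clips are sets of attribute values.
interesting = [{"beach", "day"}, {"beach", "sunset"}]
uninteresting = [{"studio", "day"}]
print(relevance({"beach", "day"}, interesting, uninteresting))  # positive score
```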

2.
This paper addresses the problem of automatic semantic indexing of news videos by presenting a video annotation and retrieval system that performs automatic semantic annotation of news video archives and provides access to the archives via these annotations. The presented system relies on video text as the information source and exploits several information extraction techniques on this text to arrive at representative semantic information about the underlying videos. These techniques include named entity recognition, person entity extraction, coreference resolution, and semantic event extraction. Apart from the information extraction components, the proposed system also encompasses modules for news story segmentation, text extraction, and video retrieval, along with a news video database, making it a full-fledged system to be employed in practical settings. The proposed system is a generic one, employing a wide range of techniques to automate the semantic video indexing process and to bridge the semantic gap between what can be automatically extracted from videos and what people perceive as the video semantics. Based on the proposed framework, an automatic semantic annotation and retrieval system is built for Turkish and evaluated on a broadcast news video collection, providing evidence for its feasibility and convenience for news videos, with a satisfactory overall performance.
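As a minimal sketch of the entity-based indexing idea, the snippet below builds an inverted index from named entities in video text to news stories. It uses an off-the-shelf spaCy English model as a stand-in for the system's own (Turkish-specific) extraction components; the model name and data layout are assumptions.

```python
import spacy
from collections import defaultdict

nlp = spacy.load("en_core_web_sm")  # stand-in; the actual system targets Turkish

def build_semantic_index(stories):
    """stories: {story_id: extracted video text}. Returns an inverted
    index mapping named entities to the stories that mention them."""
    index = defaultdict(set)
    for story_id, text in stories.items():
        for ent in nlp(text).ents:  # detected entities depend on the model
            index[(ent.text.lower(), ent.label_)].add(story_id)
    return index

index = build_semantic_index({"s1": "President Obama visited Ankara."})
# Retrieval: look up a person entity to find the relevant news stories.
print(index[("obama", "PERSON")])
```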

3.
Video scene retrieval with interactive genetic algorithm
This paper proposes a video scene retrieval algorithm based on emotion. First, abrupt/gradual shot boundaries are detected in video clips representing a specific story. Then, five video features (“average color histogram,” “average brightness,” “average edge histogram,” “average shot duration,” and “gradual change rate”) are extracted from each of the videos, and a mapping is conducted through an interactive genetic algorithm between these features and the emotional space that the user has in mind. After the proposed algorithm selects the videos that contain the corresponding emotion from the initial population of videos, their feature vectors are regarded as chromosomes, and genetic crossover is applied to them. Next, the new chromosomes produced by crossover are compared with the feature vectors of the database videos using a similarity function, and the most similar videos are taken as solutions of the next generation. By iterating this process, a population of videos matching the emotion the user has in mind is retrieved. To show the validity of the proposed method, six example categories (“action,” “excitement,” “suspense,” “quietness,” “relaxation,” and “happiness”) are used as emotions in the experiments. The method achieves an average retrieval effectiveness of 70% over 300 commercial videos.
Sung-Bae Cho
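A minimal sketch of one generation of the interactive retrieval loop described above: feature-vector chromosomes are crossed over, and the database videos most similar to the offspring form the next population. The uniform crossover operator and Euclidean similarity are illustrative stand-ins for the paper's actual choices.

```python
import random

def crossover(parent_a, parent_b):
    """Uniform crossover of two feature-vector 'chromosomes'."""
    return [random.choice(pair) for pair in zip(parent_a, parent_b)]

def similarity(u, v):
    """Negative Euclidean distance as a simple similarity stand-in."""
    return -sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def next_generation(selected, database, size):
    """Breed the user-selected feature vectors, then return the database
    videos most similar to the offspring (one generation of the IGA loop)."""
    offspring = [crossover(*random.sample(selected, 2)) for _ in range(size)]
    return [max(database, key=lambda v: similarity(v, child)) for child in offspring]
```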

4.
In video processing, a common first step is to segment videos into physical units, generally called shots. A shot is a video segment that consists of one continuous action. In general, these physical units need to be clustered to form more semantically significant units, such as scenes, sequences, and programs. This is so-called story-based video structuring. Automatic video structuring is of great importance for video browsing and retrieval. Shots or scenes are usually described by one or several representative frames, called key frames. Viewed from a higher level, key frames of some shots might be redundant in terms of semantics. In this paper, we propose automatic solutions to the problems of (i) video partitioning, (ii) key frame computing, and (iii) key frame pruning. For the first problem, an algorithm called “net comparison” is devised. It is accurate and fast because it uses both statistical and spatial information in an image and does not have to process the entire image. For the last two problems, we develop an original image similarity criterion that considers both the spatial layout and the detail content of an image. For this purpose, coefficients of the wavelet decomposition are used to derive parameter vectors accounting for these two aspects. The parameters exhibit (quasi-)invariant properties, making the algorithm robust to many types of object/camera motion and scaling variations. The novel “seek and spread” strategy used in key frame computing allows us to obtain a large representative range for the key frames. Inter-shot redundancy of the key frames is suppressed using the same image similarity measure. Experimental results demonstrate the effectiveness and efficiency of our techniques.
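A minimal sketch of a wavelet-based similarity in the spirit described above, assuming PyWavelets (pywt): approximation coefficients stand in for spatial layout and detail coefficients for detail content. The weighting and single-level Haar decomposition are illustrative assumptions.

```python
import numpy as np
import pywt

def wavelet_descriptor(image):
    """Split a 2-D grayscale image into layout (approximation) and
    detail parts via one level of Haar wavelet decomposition."""
    approx, (horiz, vert, diag) = pywt.dwt2(image, "haar")
    detail = np.concatenate([horiz.ravel(), vert.ravel(), diag.ravel()])
    return approx.ravel(), detail

def frame_similarity(img_a, img_b, alpha=0.5):
    """Combine layout and detail distances (illustrative weighting)."""
    layout_a, detail_a = wavelet_descriptor(img_a)
    layout_b, detail_b = wavelet_descriptor(img_b)
    layout = np.linalg.norm(layout_a - layout_b)
    detail = np.linalg.norm(detail_a - detail_b)
    return -(alpha * layout + (1 - alpha) * detail)  # higher = more similar
```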

5.
In medical image registration and content-based image retrieval, the rigid transformation model is not adequate for anatomical structures that are elastic or deformable. For human structures such as the abdomen, registration involves global features, such as the abdominal wall, as well as local target organs, such as the liver or spleen. A general non-rigid registration may not be sufficient to produce image matching of both global and local structures. In this study, a warping-deformable model is proposed to register images of such structures. This model uses a two-stage strategy for image registration of the abdomen. In the first stage, a global-deformable transformation is used to register the abdominal wall. A warping transformation is used in the second stage to register the liver. The proposed method produces a good match of images (mean similarity index = 0.73545). The image matching correlation coefficients calculated from eight pairs of CT and MR images of the abdomen indicate that the warping-deformable transformation gives better matching than no transformation (p < 0.001, paired t-test). This study has established a model for image registration of deformable structures, which is particularly important for data mining and image content retrieval for non-rigid structures. The results obtained are very promising, but further clinical evaluation is needed.
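The similarity index reported above is commonly a Dice-style overlap between the registered structures; below is a minimal sketch under that assumption (binary masks as input).

```python
import numpy as np

def similarity_index(mask_a, mask_b):
    """Dice similarity index of two binary masks: 2|A∩B| / (|A| + |B|)."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0
```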

6.
Grouping video content into semantic segments and classifying semantic scenes into different types are crucial processes for content-based video organization, management, and retrieval. In this paper, a novel approach to automatically segmenting scenes and semantically representing them is proposed. First, video shots are detected using a rough-to-fine algorithm. Second, key frames within each shot are selected adaptively using hybrid features, and redundant key frames are removed by template matching. Third, spatio-temporally coherent shots are clustered into the same scene based on the temporal constraint of video content and the visual similarity between shot activities. Finally, based on a full analysis of the typical characteristics of continuously recorded videos, scene content is semantically represented to satisfy human needs in video retrieval. The proposed algorithm has been applied to various genres of films and TV programs. Promising experimental results show that the proposed method supports efficient retrieval of interesting video content.
Yuncai Liu
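A minimal sketch of temporally constrained shot-to-scene clustering in the spirit of the third step: a shot joins the current scene only if it is visually similar to a recent shot. The greedy rule, window, and threshold are illustrative assumptions, not the paper's algorithm.

```python
def cluster_scenes(shots, similar, window=3, threshold=0.8):
    """Greedily merge a shot into the current scene if it is visually
    similar (per `similar`) to one of the last `window` shots; otherwise
    start a new scene. The window enforces the temporal constraint."""
    scenes = [[shots[0]]]
    for shot in shots[1:]:
        recent = scenes[-1][-window:]
        if max(similar(shot, prev) for prev in recent) >= threshold:
            scenes[-1].append(shot)
        else:
            scenes.append([shot])
    return scenes
```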

7.
The dramatic growth of video content over modern media channels (such as the Internet and mobile phone platforms) directs the interest of media broadcasters towards the topics of video retrieval and content browsing. Several video retrieval systems benefit from the use of semantic indexing based on content, since it allows an intuitive categorization of videos. However, indexing is usually performed through manual annotation, thus introducing potential problems such as ambiguity, lack of information, and non-relevance of index terms. In this paper, we present SHIATSU, a complete system for video retrieval which is based on the (semi-)automatic hierarchical semantic annotation of videos exploiting the analysis of visual content; videos can then be searched by means of attached tags and/or visual features. We experimentally evaluate the performance of SHIATSU on two different real video benchmarks, proving its accuracy and efficiency.

8.
Discovery of a perceptual distance function for measuring image similarity
For more than a decade, researchers have actively explored the area of image/video analysis and retrieval. Yet one fundamental problem remains largely unsolved: how to measure perceptual similarity between two objects. For this purpose, most researchers employ a Minkowski-type metric. Unfortunately, the Minkowski metric does not reliably find similarities in objects that are obviously alike. Through mining a large set of visual data, our team has discovered a perceptual distance function. We call the discovered function the dynamic partial function (DPF). When we empirically compare DPF to Minkowski-type distance functions in image retrieval and in video shot-transition detection using our image features, DPF performs significantly better. The effectiveness of DPF can be explained by similarity theories in cognitive psychology.
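DPF activates only the smallest per-feature differences and discards the largest ones; below is a minimal sketch, with the number of activated dimensions m and the exponent r as tunable parameters (the values shown are illustrative).

```python
def dpf(x, y, m=10, r=2):
    """Dynamic partial function: a Minkowski-type distance computed over
    only the m smallest per-feature differences between x and y."""
    diffs = sorted(abs(a - b) for a, b in zip(x, y))[:m]
    return sum(d ** r for d in diffs) ** (1.0 / r)
```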

9.
To address the problem that video usage in a virtual news system exhibits a scale-free phenomenon, as described in complex network theory, which degrades the overall quality of the virtual news, a new video semantic similarity network is designed. The video semantic description model, the rules for constructing the network, the method for computing similarity, and a video retrieval algorithm built on top of the similarity network are presented in detail. Experiments on the video semantic similarity network show that it can effectively solve the problems arising in video usage.
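A minimal sketch of such a similarity network, assuming networkx and a user-supplied semantic similarity function; the threshold-based construction rule here is an illustrative assumption, not the paper's exact rule.

```python
import networkx as nx

def build_similarity_network(videos, semantic_similarity, threshold=0.6):
    """Nodes are videos; an edge links two videos whose semantic
    similarity exceeds the threshold (illustrative construction rule)."""
    g = nx.Graph()
    g.add_nodes_from(videos)
    for i, u in enumerate(videos):
        for v in videos[i + 1:]:
            s = semantic_similarity(u, v)
            if s >= threshold:
                g.add_edge(u, v, weight=s)
    return g

def retrieve(g, seed, k=5):
    """Retrieve the k neighbors most similar to a seed video."""
    ranked = sorted(g[seed].items(), key=lambda e: -e[1]["weight"])
    return [v for v, _ in ranked[:k]]
```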

10.
11.
In this work we perform a proof-theoretical investigation of some logical systems in the neighborhood of substructural, intermediate, and many-valued logics. The common feature of the logics we consider is that they satisfy some weak forms of the excluded-middle principle. We first propose a cut-free hypersequent calculus for the intermediate logic LQ, obtained by adding the axiom ¬A ∨ ¬¬A to intuitionistic logic. We then propose cut-free calculi for the systems W_n, obtained by adding the axioms ¬A ∨ (A ⊕ ⋯ ⊕ A) (n−1 times) to affine linear logic (without exponential connectives). For n = 3, the system W_n coincides with 3-valued Łukasiewicz logic. For n > 3, W_n is a proper subsystem of n-valued Łukasiewicz logic. Our calculi can be seen as a first step towards the development of uniform cut-free Gentzen calculi for the finite-valued Łukasiewicz logics.

12.
A fast simulation method is proposed for estimating the number of k-dimensional subspaces of weight w in an n-dimensional vector space over the Galois field with q elements. Unbiased estimates are constructed for the cases w = 1 and w = 2, and lower and upper estimates are proposed for the case w = 3. It is proved that the relative error remains bounded as q → ∞. The high accuracy of the proposed method is illustrated by numerical examples.
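For context, without the weight restriction the number of k-dimensional subspaces of an n-dimensional space over GF(q) is given by the Gaussian binomial coefficient; the quantity estimated above is the weight-constrained refinement of this count:

```latex
\binom{n}{k}_q \;=\; \prod_{i=0}^{k-1} \frac{q^{\,n-i}-1}{q^{\,k-i}-1}
```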

13.
Extraction and recognition of artificial text in multimedia documents
The systems currently available for content-based image and video retrieval work without semantic knowledge, i.e., they use image processing methods to extract low-level features of the data. The similarity obtained by these approaches does not always correspond to the similarity a human user would expect. A way to include more semantic knowledge in the indexing process is to use the text included in images and video sequences. It is rich in information but easy to use, e.g., through keyword-based queries. In this paper we present an algorithm to localise artificial text in images and videos using a measure of accumulated gradients and morphological processing. The quality of the localised text is improved by robust multiple-frame integration. A new technique for the binarisation of the text boxes, based on a criterion maximising local contrast, is proposed. Finally, detection and OCR results for a commercial OCR engine are presented, justifying the choice of the binarisation technique.
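A minimal OpenCV sketch of the localisation step (accumulated horizontal gradients followed by morphological closing); the kernel size and thresholding scheme are illustrative assumptions, and the paper's multiple-frame integration and binarisation steps are omitted.

```python
import cv2
import numpy as np

def localize_text(gray):
    """Detect candidate artificial-text boxes in a grayscale frame."""
    # Text regions accumulate strong horizontal intensity gradients.
    grad = np.abs(cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3))
    grad = cv2.normalize(grad, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    _, mask = cv2.threshold(grad, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    # Morphological closing merges character strokes into text lines.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 3))
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours]  # (x, y, w, h) boxes
```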

14.
Dai, Peng; Wang, Xue; Zhang, Weihang; Zhang, Pengbo; You, Wei. Multimedia Tools and Applications (2018) 77(18): 23547–23577

Face image-video retrieval refers to retrieving videos of a specific person with an image query, or searching face images of a person using a video clip query. It has attracted much attention for broad applications such as suspect tracking and identification. This paper proposes a novel implicit relative attribute enabled cross-modality hashing (IRAH) method for large-scale face image-video retrieval. To cope with large-scale data, the proposed IRAH method facilitates fast cross-modality retrieval by embedding two entirely heterogeneous spaces, i.e., face images in Euclidean space and face videos on a Riemannian manifold, into a unified compact Hamming space. To resolve the semantic gap, IRAH maps the original low-level kernelized features to discriminative high-level implicit relative attributes. The retrieval accuracy can therefore be improved by leveraging both the label information across different modalities and the semantic structure obtained from the implicit relative attributes in each modality. To evaluate the proposed method, we conduct extensive experiments on two publicly available databases, i.e., the Big Bang Theory (BBT) and Buffy the Vampire Slayer (BVS). The experimental results demonstrate the superiority of the proposed method over different state-of-the-art cross-modality hashing methods. The performance gains are especially significant when the hash code length is 8 bits, with up to 12% improvement over the second-best tested method.

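Once both modalities are hashed into a common Hamming space, retrieval reduces to Hamming-distance ranking; below is a minimal NumPy sketch with bit-packed codes (the IRAH learning stage itself is not reproduced).

```python
import numpy as np

def hamming_rank(query_code, database_codes):
    """Rank database items by Hamming distance to the query.
    Codes are bit-packed uint8 arrays (e.g. one byte for 8-bit codes)."""
    xor = np.bitwise_xor(database_codes, query_code)
    dists = np.unpackbits(xor, axis=1).sum(axis=1)  # popcount per item
    return np.argsort(dists)

db = np.packbits(np.random.randint(0, 2, (1000, 8)), axis=1)  # 8-bit codes
q = np.packbits(np.random.randint(0, 2, (1, 8)), axis=1)
print(hamming_rank(q, db)[:5])  # indices of the 5 nearest items
```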

15.
Recently, a new task called video corpus moment retrieval (VCMR) has been proposed, whose goal is to retrieve a short video segment corresponding to a query sentence from an unsegmented video corpus. The key to existing cross-modal video-text retrieval work lies in aligning and fusing features of different modalities; however, simply performing cross-modal alignment and fusion neither ensures that semantically similar data from the same modality stay close in the joint feature space nor takes the semantics of the query into account. To address these problems, this paper proposes a query-aware cross-modal dual contrastive learning network (QACLN) for multimodal video moment retrieval, which obtains unified semantic representations of data from different modalities by combining inter-modality and intra-modality dual contrastive learning. Specifically, a query-aware cross-modal semantic fusion strategy is proposed, which adaptively fuses multimodal features of the video, such as the visual-modality and caption-modality features, according to the perceived query semantics, yielding a query-aware multimodal joint representation of the video. Furthermore, an inter-modality and intra-modality dual contrastive learning mechanism for videos and query sentences is proposed to enhance semantic alignment and fusion across modalities, thereby improving the discriminability and semantic consistency of the data representations in different modalities. Finally, one-dimensional convolutional boundary regression and cross-modal semantic similarity computation are adopted to accomplish moment localization and video retrieval. Extensive experiments show that the proposed QACLN outperforms the baseline methods.
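A minimal PyTorch sketch of the inter-modality half of such a dual contrastive objective: a symmetric InfoNCE loss over matched video/query embeddings in a batch. The intra-modality term and the query-aware fusion are omitted, and the temperature value is an illustrative assumption.

```python
import torch
import torch.nn.functional as F

def inter_modal_contrastive_loss(video_emb, query_emb, temperature=0.07):
    """Symmetric InfoNCE: matched (video, query) pairs on the diagonal
    are pulled together; all other pairs in the batch are pushed apart."""
    v = F.normalize(video_emb, dim=1)
    q = F.normalize(query_emb, dim=1)
    logits = v @ q.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(v.size(0), device=v.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```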

16.
We present an unsupervised approach for learning a layered representation of a scene from a video for motion segmentation. Our method is applicable to any video containing piecewise parametric motion. The learnt model is a composition of layers, which consist of one or more segments. The shape of each segment is represented using a binary matte, and its appearance is given by the RGB value of each point belonging to the matte. Included in the model are the effects of image projection, lighting, and motion blur. Furthermore, spatial continuity is explicitly modeled, resulting in contiguous segments. Unlike previous approaches, our method does not use reference frame(s) for initialization. The two main contributions of our method are: (i) a novel algorithm for obtaining the initial estimate of the model by dividing the scene into rigidly moving components using efficient loopy belief propagation; and (ii) refining the initial estimate using αβ-swap and α-expansion algorithms, which guarantee a strong local minimum. Results are presented on several classes of objects with different types of camera motion, e.g., videos of a human walking shot with static or translating cameras. We compare our method with the state of the art and demonstrate significant improvements.

17.
Given m facilities, each with an opening cost, n demands, and the distance between every demand and every facility, the Facility Location problem asks for a solution that opens some facilities and connects every demand to an opened facility so that the total cost is minimized. The k-Facility Location problem further requires that at most k facilities be opened, where k is a parameter given in the problem instance. We consider Facility Location problems in which, for every demand, the ratio of the longest to the shortest distance to the facilities is at most ω, where ω is a predefined constant. Using the local search approach with scaling and error-control techniques, for any arbitrarily small constant ε > 0 we give a polynomial-time approximation algorithm for the ω-constrained Facility Location problem with approximation ratio 1 + √(ω+1) + ε, which significantly improves the previous best known ratio (ω+1)/α for some 1 ≤ α ≤ 2, and a polynomial-time approximation algorithm for the ω-constrained k-Facility Location problem with approximation ratio ω + 1 + ε. On the hardness side, we prove that unless NP ⊆ DTIME(n^O(log log n)), the ω-constrained Facility Location problem cannot be approximated within 1 + √(ln ω − 1), which slightly improves the previous best known hardness result of 1.243 + 0.316 ln(ω − 1). Experimental results on the standard test instances of the Facility Location problem show that our algorithm also performs well in practice.
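A minimal sketch of the plain local-search framework such algorithms build on (open, close, and swap moves driven by the total cost); the scaling and error-control techniques that yield the stated ratio are omitted.

```python
import itertools

def total_cost(open_set, opening_cost, dist, demands):
    """Opening costs plus each demand's distance to its nearest open facility."""
    return (sum(opening_cost[f] for f in open_set) +
            sum(min(dist[d][f] for f in open_set) for d in demands))

def local_search(facilities, demands, opening_cost, dist):
    open_set = {facilities[0]}  # any feasible starting solution
    improved = True
    while improved:
        improved = False
        # Candidate moves: open one facility, close one, or swap a pair.
        moves = ([open_set | {f} for f in facilities if f not in open_set] +
                 [open_set - {f} for f in open_set if len(open_set) > 1] +
                 [(open_set - {f}) | {g} for f, g in
                  itertools.product(open_set, facilities) if g not in open_set])
        for cand in moves:
            if (total_cost(cand, opening_cost, dist, demands) <
                    total_cost(open_set, opening_cost, dist, demands)):
                open_set, improved = cand, True
                break
    return open_set
```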

18.
The volume of surveillance video is increasing rapidly, and humans are the major objects of interest in it. Rapid human retrieval in surveillance videos is therefore desirable and applicable to a broad spectrum of applications. Existing big data processing tools, which mainly target textual data, cannot be applied directly to timely processing of large video data, due to three main challenges: videos are more data-intensive than textual data; visual operations have higher computational complexity than textual operations; and traditional segmentation may damage video data’s continuous semantics. In this paper, we design SurvSurf, a human retrieval system for large surveillance video data that exploits characteristics of these data and of big data processing tools. We propose using the motion information contained in videos for video data segmentation. The basic data unit after segmentation is called an M-clip. M-clips help remove redundant video content and reduce data volumes. We use the MapReduce framework to process M-clips in parallel for human detection and appearance/motion feature extraction. We further accelerate the vision algorithms by processing only sub-areas with significant motion vectors rather than entire frames. In addition, we design a distributed data store called V-BigTable to structure the semantic information of M-clips. V-BigTable enables efficient retrieval over a huge number of M-clips. We implement the system on Hadoop and HBase. Experimental results show that our system outperforms basic solutions by one order of magnitude in computational time, with satisfactory human retrieval accuracy.
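A minimal plain-Python sketch of the parallel detection stage in MapReduce style; detect_humans and extract_features are hypothetical stand-ins for the system's vision components, and the real system runs the equivalent jobs on Hadoop.

```python
def map_mclip(mclip_id, frames, detect_humans, extract_features):
    """Map: run detection over the frames of one M-clip, emitting
    (key, (mclip_id, features)) pairs for each detected person."""
    for frame in frames:
        for person in detect_humans(frame):          # hypothetical detector
            yield "person", (mclip_id, extract_features(person))

def reduce_person(key, values):
    """Reduce: collect detections so a person query can scan one list."""
    return key, list(values)

# A driver (e.g. Hadoop streaming) would shuffle map outputs by key and
# feed each key's values to the reducer; M-clips are processed in parallel.
```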

19.

As one of the key technologies in content-based near-duplicate detection and video retrieval, video sequence matching can be used to judge whether two videos contain duplicate or near-duplicate segments. Despite considerable research effort in recent years, precisely and efficiently matching sequences among videos (which may be subject to complex audio-visual transformations) from a large-scale database remains a challenging task. To address this problem, this paper proposes a multiscale video sequence matching (MS-VSM) method, which gradually detects and locates the similar segments between videos from coarse to fine scales. At the coarse scale, it uses the Maximum Weight Matching (MWM) algorithm to rapidly select several candidate reference videos from the database for a given query. For each candidate video, its most similar segment with respect to the given query is then obtained at the middle scale by the Constrained Longest Ascending Matching Subsequence (CLAMS) algorithm and used to judge whether that candidate contains a near-duplicate. If so, the precise locations of the near-duplicate segments in both the query and reference videos are determined at the fine scale by bi-directional scanning, which checks the matching similarity at the segments’ boundaries. As such, the MS-VSM method achieves excellent near-duplicate detection accuracy and localization precision with very high processing efficiency. Extensive experiments show that it outperforms several state-of-the-art methods remarkably on several benchmarks.

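The middle-scale step is built around finding a longest ascending subsequence among frame-level matches; below is a minimal sketch of the unconstrained core (standard O(n log n) patience sorting), without the paper's additional constraints.

```python
import bisect

def longest_ascending_subsequence_length(matched_indices):
    """Length of the longest strictly ascending subsequence, e.g. over
    the reference-frame indices matched by successive query frames."""
    tails = []  # tails[i] = smallest tail of an ascending subsequence of length i+1
    for x in matched_indices:
        pos = bisect.bisect_left(tails, x)
        if pos == len(tails):
            tails.append(x)
        else:
            tails[pos] = x
    return len(tails)

print(longest_ascending_subsequence_length([3, 1, 4, 2, 5, 7, 6]))  # 4, e.g. 1,2,5,6
```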

20.
Automatic video annotation aims to bridge the semantic gap and facilitate concept-based video retrieval by detecting high-level concepts in video data. Recently, utilizing context information has emerged as an important direction in this domain. In this paper, we present a novel video annotation refinement approach that utilizes extrinsic semantic context extracted from video subtitles and intrinsic context among candidate annotation concepts. The extrinsic semantic context is formed by identifying a set of key terms from video subtitles. The semantic similarity between those key terms and the candidate annotation concepts is then exploited to refine the initial annotation results, whereas most existing approaches utilize textual information only heuristically. Similarity measures including Google distance and WordNet distance have been investigated for this refinement purpose, which differs from approaches that derive semantic relationships among concepts from given training datasets. Visualness is also utilized to discriminate individual terms for further refinement. In addition, the Random Walk with Restarts (RWR) technique is employed to perform the final refinement of the annotation results by exploring the inter-relationships among annotation concepts. Comprehensive experiments on the TRECVID 2005 dataset demonstrate the effectiveness of the proposed annotation approach and investigate the impact of various factors.
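A minimal NumPy sketch of the RWR refinement step: power iteration of r = cWr + (1−c)e, where the restart vector e carries the initial annotation scores and W is a column-normalized concept-relation matrix (both assumptions about the data layout).

```python
import numpy as np

def random_walk_with_restarts(W, restart, c=0.85, tol=1e-8, max_iter=1000):
    """Iterate r = c * W @ r + (1 - c) * e until convergence.
    W: column-normalized concept-relation matrix; restart: initial
    annotation scores, reused as the restart distribution e."""
    e = restart / restart.sum()
    r = e.copy()
    for _ in range(max_iter):
        r_next = c * W @ r + (1 - c) * e
        if np.linalg.norm(r_next - r, 1) < tol:
            return r_next
        r = r_next
    return r
```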
