Similar Documents (20 results)
1.
Numerous social videos now pervade the web. Social web videos are characterized by accompanying rich contextual information that describes the video content and thus greatly facilitates video search and browsing. Generally, such contextual data, tags in particular, are provided at the whole-video level, without any temporal indication of when they actually appear in the video, let alone spatial annotation of object-related tags in the video frames. However, many tags only describe parts of the video content. Therefore, tag localization, the process of assigning tags to the relevant underlying video segments, frames, or even regions within frames, is attracting increasing research interest, and a benchmark dataset for the fair evaluation of tag localization algorithms is highly desirable. In this paper, we describe and release a dataset called DUT-WEBV, which contains about 4,000 videos collected from the YouTube portal by issuing 50 concepts as queries. These concepts cover a wide range of semantic aspects, including scenes like “mountain”, events like “flood”, objects like “cows”, sites like “gas station”, and activities like “handshaking”, offering great challenges to the tag (i.e., concept) localization task. For each video of a tag, we carefully annotate the time durations during which the tag appears and, for object-related tags, label the spatial location of the object with a mask in the frames. Besides the video itself, contextual information such as thumbnail images, titles, and YouTube categories is also provided. Together with this benchmark dataset, we present a baseline for tag localization using a multiple instance learning approach. Finally, we discuss some open research issues for tag localization in web videos.
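The multiple instance learning baseline is described only at a high level; the following sketch illustrates one common MIL-style formulation under assumed details (a per-tag segment scorer, a max-pooling bag rule, and a fixed threshold are all hypothetical choices, not the paper's exact baseline).

```python
import numpy as np

def localize_tag(segment_features, tag_classifier, threshold=0.5):
    """Toy MIL-style tag localization: a video-level tag is attached to the
    segments whose individual scores exceed a threshold (hypothetical setup)."""
    # Score every segment independently with a per-tag classifier.
    scores = np.array([tag_classifier(f) for f in segment_features])
    # Under the MIL assumption, the video carries the tag if at least one
    # segment (instance) is positive; those segments are the localization.
    video_has_tag = scores.max() > threshold
    localized = np.where(scores > threshold)[0] if video_has_tag else np.array([], dtype=int)
    return video_has_tag, localized

# Usage with a dummy linear scorer on random 128-d segment descriptors.
rng = np.random.default_rng(0)
segments = rng.normal(size=(20, 128))
w = rng.normal(size=128)
scorer = lambda f: 1.0 / (1.0 + np.exp(-f @ w / 10.0))
print(localize_tag(segments, scorer))
```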

2.
Massive numbers of web videos create an imperative demand for efficiently grasping the major events they cover. However, the distinct characteristics of web videos, such as the limited number of features, the noisy text information, and the unavoidable errors in near-duplicate keyframe (NDK) detection, make web video event mining a challenging task. In this paper, we propose a novel four-stage framework to improve the performance of web video event mining. Data preprocessing is the first stage. Multiple Correspondence Analysis (MCA) is then applied to explore the correlation between terms and classes, aiming to bridge the gap between NDKs and high-level semantic concepts. Next, co-occurrence information is used to measure the similarity between NDKs and classes using the NDK-within-video information. Finally, both sources of information are integrated for web video event mining through negative NDK pruning and positive NDK enhancement. Moreover, both NDKs and terms with relatively low frequencies are treated as useful information in our experiments. Experimental results on large-scale web videos from YouTube demonstrate that the proposed framework outperforms several existing mining methods and obtains good results for web video event mining.
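As a rough illustration of the co-occurrence stage, the sketch below estimates NDK–class affinities from NDK-within-video counts; it is a simplification that leaves out the MCA step, and the function names and normalization scheme are assumptions rather than the paper's formulation.

```python
import numpy as np

def ndk_class_similarity(video_ndks, video_labels, n_ndks, n_classes):
    """Hypothetical sketch: estimate how strongly each near-duplicate keyframe
    (NDK) is associated with each event class from co-occurrence counts of
    NDKs within labelled videos."""
    counts = np.zeros((n_ndks, n_classes))
    for ndks, label in zip(video_ndks, video_labels):
        for k in ndks:
            counts[k, label] += 1
    # Normalise rows to get a probability-like class affinity per NDK;
    # low-affinity (negative) NDKs could be pruned, high-affinity ones enhanced.
    totals = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

# Usage: 3 toy videos, 5 NDKs, 2 event classes.
print(ndk_class_similarity([[0, 1], [1, 2], [3, 4]], [0, 0, 1], 5, 2))
```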

3.
With the exponential growth of social media, there exist huge numbers of near-duplicate web videos, ranging from simple format changes to complex mixtures of different editing effects. In addition to the abundant video content, the social Web provides rich sets of contextual information associated with web videos, such as thumbnail images and time durations. At the same time, the popularity of Web 2.0 demands timely responses to user queries. To balance speed and accuracy, in this paper we combine the contextual information from time duration, number of views, and thumbnail images with content analysis derived from color and local points to achieve real-time near-duplicate elimination. The results of 24 popular queries retrieved from YouTube show that the proposed approach, integrating content and context, can achieve real-time novelty re-ranking of web videos with extremely high efficiency, where the majority of duplicates are rapidly detected and removed from the top rankings. The proposed approach can run up to 164 times faster than the effective hierarchical method proposed in prior work, with just a slight loss of performance.
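A minimal sketch of the context-then-content idea follows: cheap contextual cues (duration here; view counts could be added the same way) rule out most pairs before any content comparison, and only the remaining candidates pay for a thumbnail histogram check. The field names, tolerances, and histogram distance are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np

def likely_duplicate(a, b, dur_tol=2.0, hist_thresh=0.15):
    """Illustrative two-stage duplicate test (field names are hypothetical):
    cheap contextual cues first, content analysis only when context allows."""
    # Stage 1: context -- videos with very different durations are kept apart.
    if abs(a["duration"] - b["duration"]) > dur_tol:
        return False
    # Stage 2: content -- compare thumbnail colour histograms (L1 distance).
    d = np.abs(a["thumb_hist"] - b["thumb_hist"]).sum()
    return d < hist_thresh

h = np.ones(16) / 16
print(likely_duplicate({"duration": 120, "thumb_hist": h},
                       {"duration": 121, "thumb_hist": h}))
```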

4.
5.
Together with the explosive growth of web video on sharing sites like YouTube, automatic topic discovery and visualization have become increasingly important in helping to organize and navigate such large-scale video collections. Previous work dealt with topic discovery and visualization separately, and did not fully take into account the distinctive multi-modality and sparsity of web video features. This paper addresses web video topic discovery together with visualization in a single framework, and proposes a Star-structured K-partite Graph based co-clustering and ranking framework consisting of three stages: (1) represent the web videos and their multi-modal features (e.g., keywords, near-duplicate keyframes, near-duplicate aural frames, etc.) as a Star-structured K-partite Graph; (2) group videos and their features simultaneously into clusters (topics) and organize the generated clusters as a linked cluster network; (3) rank each type of node in the linked cluster network by “popularity” and visualize them in a novel interface that lets users interactively browse topics at multiple scales. Experiments on a YouTube benchmark dataset demonstrate the flexibility and effectiveness of our proposed framework.
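Assuming the networkx library, the sketch below builds a toy star-structured graph with videos at the centre and two feature types as outer partitions, then ranks feature nodes by degree as a crude stand-in for the popularity-based ranking stage (the co-clustering stage is omitted, and all data are invented).

```python
import networkx as nx

# Hypothetical toy data: each video is linked to its keywords and
# near-duplicate keyframes, forming a star-structured k-partite graph
# with videos at the centre.
videos = {"v1": {"keywords": ["flood", "news"], "ndks": ["k1"]},
          "v2": {"keywords": ["flood"],         "ndks": ["k1", "k2"]}}

G = nx.Graph()
for vid, feats in videos.items():
    G.add_node(vid, part="video")
    for kw in feats["keywords"]:
        G.add_node(kw, part="keyword")
        G.add_edge(vid, kw)
    for k in feats["ndks"]:
        G.add_node(k, part="ndk")
        G.add_edge(vid, k)

# Crude stand-in for the ranking stage: order nodes of each feature type
# by degree ("popularity") in the graph.
for part in ("keyword", "ndk"):
    nodes = [n for n, d in G.nodes(data=True) if d["part"] == part]
    print(part, sorted(nodes, key=G.degree, reverse=True))
```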

6.
Web video categorization is a fundamental task for web video search. In this paper, we explore web video categorization from a new perspective, integrating model-based and data-driven approaches to boost performance. The boost comes from two aspects. One is the performance improvement of text classifiers through query expansion from related videos and user videos. The model-based classifiers are built on text features extracted from titles and tags; related videos and user videos act as external resources that compensate for the shortcomings of these limited and noisy text features, and query expansion is adopted to reinforce the classification performance of the text features through them. The other improvement is derived from the integration of model-based classification with data-driven majority voting from related videos and user videos. From the data-driven viewpoint, related videos and user videos are treated as voting sources from the perspectives of video relevance and user interest, respectively. Semantic meaning from text, video relevance from related videos, and user interest induced from user videos are combined to robustly determine the video category. This combination of semantics, relevance, and interest further improves the performance of web video categorization. Experiments on YouTube videos demonstrate the significant improvement of the proposed approach over traditional text-based classifiers.
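The final combination step can be pictured as a weighted late fusion of the text classifier's output with two majority votes; the sketch below shows that idea with invented weights and category names, so it is only an illustration of the combination, not the authors' actual fusion rule.

```python
from collections import Counter

def categorize(text_probs, related_cats, user_cats, w=(0.5, 0.3, 0.2)):
    """Illustrative late fusion (weights and names are made up): combine the
    text classifier's probabilities with majority votes from related videos
    (relevance) and the uploader's other videos (user interest)."""
    votes_rel = Counter(related_cats)
    votes_usr = Counter(user_cats)
    cats = set(text_probs) | set(votes_rel) | set(votes_usr)
    def score(c):
        return (w[0] * text_probs.get(c, 0.0)
                + w[1] * votes_rel[c] / max(len(related_cats), 1)
                + w[2] * votes_usr[c] / max(len(user_cats), 1))
    return max(cats, key=score)

print(categorize({"Sports": 0.4, "Music": 0.6},
                 ["Sports", "Sports", "Music"],
                 ["Sports"]))
```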

7.
Web 2.0 tools in general, and Web video in particular, provide new ways for activists to express their viewpoints to a broad audience. In this paper we deployed tools that have been used to find subgroups automatically in social networks and applied them to the problem of distinguishing between two sides of a controversial issue based on patterns of online interaction. We explored the problem of distinguishing between anti- and pro-vaccination activists based on a social network of videos and associated comments posted on YouTube. Videos for the analysis were selected by submitting the term “vaccination” as a search on YouTube. A content analysis of the selected videos was then performed (Keelan et al., 2007) to classify videos as pro- or anti-vaccination. Then, a modified version of the SCAN method (Chin and Chignell, 2008) for identifying cohesive subgroups in social networks was applied to the social network inferred from the discussions about the videos. Results showed that a cohesive subgroup of anti-vaccination people existed in discussions around anti-vaccination videos, whereas discussions around pro-vaccination videos included both anti-vaccination and pro-vaccination people. Implications of the method and results for the more general delineation of types of medical activism and their opposing camps are discussed.

8.
We present a simple yet effective approach for human action recognition. Most existing solutions based on multi-class action classification aim to assign a single class label to the input video. However, the variety and complexity of real-life videos make it very challenging to achieve high classification accuracy this way. To address this problem, we propose to partition the input video into small clips and formulate action recognition as a joint decision-making task. First, we partition each video into two equal segments that are processed in the same manner. We repeat this procedure to obtain three layers of video subsegments, which are organized in a binary tree structure, and we train separate classifiers for each layer. By applying the corresponding classifiers to the video subsegments, we obtain a decision value matrix (DVM). Then, we construct an aggregated representation for the original full-length video by integrating the elements of the DVM. Finally, we train a new action recognition classifier on the DVM representation. Extensive experimental evaluations demonstrate that the proposed method achieves significant performance improvements over several compared methods on two benchmark datasets.
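The following sketch captures the layered-segmentation idea under assumed details: the video is split into 1, 2, and 4 equal segments (a three-layer binary tree), each segment is scored by its layer's classifier, and the scores are stacked into a small decision value matrix. The padding, mean-pooled segment descriptor, and dummy classifiers are all illustrative choices, not the paper's exact construction.

```python
import numpy as np

def decision_value_matrix(video, layer_classifiers):
    """Sketch with made-up helpers: cut the video into 1, 2 and 4 equal
    segments (a 3-layer binary tree), score each segment with the classifier
    trained for its layer, and stack the decision values into a matrix."""
    rows = []
    for layer, clf in enumerate(layer_classifiers):        # layers 0, 1, 2
        segments = np.array_split(video, 2 ** layer)       # 1, 2, 4 segments
        scores = [clf(seg.mean(axis=0)) for seg in segments]
        rows.append(np.pad(scores, (0, 4 - len(scores))))  # pad rows to equal width
    return np.vstack(rows)                                  # shape (3, 4)

rng = np.random.default_rng(1)
frames = rng.normal(size=(32, 64))        # 32 frames, 64-d frame features
clfs = [lambda x: float(x.sum())] * 3     # dummy per-layer classifiers
print(decision_value_matrix(frames, clfs).shape)
```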

9.
Video streaming over wireless networks is becoming increasingly important for a variety of applications. To accommodate the dynamic change of wireless network bandwidths, video streams with scalable Quality of Service (QoS) need to be provided. This paper presents a system for content-adaptive streaming of instructional (lecture) videos over wireless networks for E-learning applications. We first provide a real-time content analysis method to detect and extract content regions from instructional videos, and then apply a “leaking-video-buffer” model to adjust the QoS of video streams dynamically based on video content. An adaptive feedback control scheme is also developed to transmit properly compressed video streams to clients based not only on network bandwidth, but also on video content and user preferences. Finally, we demonstrate the scalability and content adaptiveness of the proposed video streaming system with experimental results on several instructional videos.
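A toy simulation of a leaky-buffer style rate adaptation is sketched below: encoded frames fill a sender-side buffer, the wireless link drains it at the available bandwidth, and the quality (compression) level is adjusted by simple feedback thresholds. All thresholds, step sizes, and parameter values are invented for illustration and are not the paper's model.

```python
def adapt_quality(frame_bits, bandwidth_bps, fps=25, buf_max=2_000_000):
    """Toy 'leaking video buffer' simulation (parameters invented): frames
    fill the sender buffer, the link drains it, and the quality level is
    reduced whenever the buffer approaches overflow."""
    buf, quality, trace = 0.0, 1.0, []
    drain_per_frame = bandwidth_bps / fps
    for bits in frame_bits:
        buf = max(0.0, buf + bits * quality - drain_per_frame)  # fill, then leak
        if buf > 0.8 * buf_max:       # close to overflow: compress harder
            quality = max(0.2, quality - 0.1)
        elif buf < 0.2 * buf_max:     # plenty of headroom: restore quality
            quality = min(1.0, quality + 0.05)
        trace.append(quality)
    return trace

print(adapt_quality([120_000] * 50, bandwidth_bps=2_000_000)[-5:])
```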

10.

11.
Video in digital format is now commonplace and widespread, both in professional use and in domestic consumer products from camcorders to mobile phones. Video content is growing in volume, and while we can capture, compress, store, transmit, and display video with great facility, editing videos and manipulating them based on their content is still a non-trivial activity. In this paper, we give a brief review of the state of the art in video analysis, indexing, and retrieval, and we point to research directions that we think are promising and could make searching and browsing video archives based on their content as easy as searching and browsing (text) web pages. We conclude the paper with a list of grand challenges for researchers working in the area.

12.
In this work we are concerned with detecting non-collaborative videos in video-sharing social networks. Specifically, we investigate how much visual content-based analysis can aid in detecting ballot stuffing and spam videos in threads of video responses. This is a very challenging task because of the high-level semantic concepts involved, the assorted nature of social networks (which prevents the use of constrained a priori information), and, above all, the context-dependent nature of non-collaborative videos. Content filtering for social networks is an increasingly demanded task: as their popularity grows, the number of abuses also tends to increase, annoying users and disrupting services. We propose two approaches, each better adapted to a specific non-collaborative action: ballot stuffing, which tries to inflate the popularity of a given video by posting “fake” responses to it, and spamming, which tries to insert an unrelated video as a response to popular videos. We endorse the use of low-level features combined into higher-level feature representations, such as bags of visual features and latent semantic analysis. Our experiments show the feasibility of the proposed approaches.
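Assuming scikit-learn and NumPy, the sketch below walks through the generic bag-of-visual-features plus latent semantic analysis pipeline the abstract refers to: local descriptors are quantized against a k-means codebook, each video becomes a visual-word histogram, and truncated SVD projects the histograms into a latent space. The descriptor dimensionality, codebook size, and random data are placeholders, not the paper's settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
# Hypothetical stand-ins for local descriptors (e.g., SIFT-like, 64-d)
# extracted from the keyframes of 10 videos.
video_descriptors = [rng.normal(size=(rng.integers(30, 60), 64)) for _ in range(10)]

# 1) Bag of visual features: quantise all descriptors into a small codebook.
codebook = KMeans(n_clusters=16, n_init=10, random_state=0).fit(np.vstack(video_descriptors))
bovf = np.array([np.bincount(codebook.predict(d), minlength=16) for d in video_descriptors])

# 2) Latent semantic analysis over the video-by-visual-word matrix.
latent = TruncatedSVD(n_components=4, random_state=0).fit_transform(bovf)
print(latent.shape)   # (10, 4) low-dimensional video representations
```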

13.
A vast amount of social feedback, expressed via ratings (i.e., likes and dislikes) and comments, is available for the multimedia content shared through Web 2.0 platforms. However, the potential of the social features associated with shared content remains unexplored in the context of information retrieval. In this paper, we first study the social features associated with the top-ranked videos retrieved from the YouTube video-sharing site for real user queries. Our analysis considers both raw and derived social features. Next, we investigate the effectiveness of each such feature for video retrieval and the correlations between the features. Finally, we investigate the impact of the social features on video retrieval effectiveness using state-of-the-art learning-to-rank approaches. In order to identify the most effective features, we adopt a new feature selection strategy based on the Maximal Marginal Relevance (MMR) method, in addition to utilizing an existing strategy. In our experiments, we treat popular and rare queries separately, annotating 4,969 and 4,949 query-video pairs for the two query types, respectively. Our findings reveal that incorporating social features is a promising approach for improving the retrieval performance for both types of queries.
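The MMR-based selection can be pictured as a greedy trade-off between a feature's relevance to the ranking target and its redundancy with features already picked; the sketch below implements that generic idea with invented relevance and similarity values, so the scores, trade-off weight, and result are illustrative only.

```python
import numpy as np

def mmr_select(relevance, similarity, k=3, lam=0.5):
    """Sketch of MMR-style feature selection (parameter names invented):
    greedily pick features that are relevant to the ranking target but not
    redundant with the features already selected."""
    selected, remaining = [], list(range(len(relevance)))
    while remaining and len(selected) < k:
        def mmr(i):
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected

rel = np.array([0.9, 0.85, 0.4, 0.3])
sim = np.array([[1, .95, .1, .2], [.95, 1, .1, .2], [.1, .1, 1, .3], [.2, .2, .3, 1]])
print(mmr_select(rel, sim))   # [0, 2, 3] -- feature 1 is skipped as redundant with 0
```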

14.
Using string matching to detect video transitions
The detection of shot boundaries in videos captures the structure of the image sequences through the identification of transitional effects. This task is important in the video indexing and retrieval domain. The video slice, or visual rhythm, is a single two-dimensional image sampling that has been used to detect several types of video events, including transitions. We use the longest common subsequence (LCS) between two strings to transform the video slice into one-dimensional signals, obtaining a highly simplified representation of the video content. We also develop a chain of mathematical morphology operations over these signals, leading to the detection of the most frequent video transitions, namely cuts, fades, and wipes. The algorithms are tested successfully on various genres of videos.
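The LCS computation itself is the standard dynamic-programming recurrence; the sketch below shows it on two made-up strings standing in for columns sampled from the video slice, where a sudden drop in LCS length along the resulting 1-D signal would hint at a transition. The example strings and the interpretation are illustrative, not taken from the paper.

```python
def lcs_length(a, b):
    """Standard dynamic-programming longest common subsequence; the paper
    applies it to strings sampled from the video slice (visual rhythm), so
    the inputs here are just illustrative stand-ins."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if a[i - 1] == b[j - 1] \
                       else max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

# Two consecutive slice columns: a high LCS suggests visual continuity,
# while a sudden drop in the 1-D LCS signal suggests a transition such as a cut.
print(lcs_length("abcaabbcc", "abcaabxyz"))   # 6
```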

15.
Liu Zihe, Hou Weiying, Zhang Jiayi, Cao Chenyu, Wu Bin. Multimedia Tools and Applications, 2022, 81(4): 4909-4934.

Automatically interpreting social relations, e.g., friendship or kinship, from visual scenes has huge potential application value in areas such as knowledge graph construction, person behavior and emotion analysis, and entertainment ecology. Great progress has been made in social analysis based on structured data. However, existing video-based methods treat social relationship extraction as a general classification task and categorize videos into only predefined types. Such methods are unable to recognize multiple relations in multi-person videos, which is inconsistent with actual application scenarios. At the same time, videos are inherently multimodal: subtitles also provide abundant cues for relationship recognition that are often ignored by researchers. In this paper, we introduce and define a new task named “Multiple-Relation Extraction in Videos (MREV)”. To solve the MREV task, we propose the Visual-Textual Fusion (VTF) framework for jointly modeling visual and textual information. For the spatial representation, we not only adopt a SlowFast network to learn global action and scene information, but also exploit the unique cues of face, body, and dialogue between characters. For the temporal domain, we propose a Temporal Feature Aggregation module to perform temporal reasoning, which adaptively assesses the quality of different frames. After that, we use a Multi-Conv Attention module to capture the inter-modal correlation and map the features of different modalities into a coordinated feature space. By this means, our VTF framework comprehensively exploits abundant multimodal cues for the MREV task and achieves 49.2% and 50.4% average accuracy on a self-constructed Video Multiple-Relation (VMR) dataset and the ViSR dataset, respectively. Extensive experiments on the VMR and ViSR datasets demonstrate the effectiveness of the proposed framework.
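One way to picture frame-quality-aware aggregation is a softmax-weighted average of per-frame features, with more informative frames receiving larger weights; the sketch below shows only that generic idea with random features and made-up quality scores, and is not the authors' Temporal Feature Aggregation module.

```python
import numpy as np

def aggregate_frames(frame_features, quality_scores):
    """Illustrative stand-in for quality-aware temporal aggregation: frames
    judged more informative get larger softmax weights, and the video
    representation is their weighted sum."""
    w = np.exp(quality_scores - np.max(quality_scores))
    w /= w.sum()
    return (w[:, None] * frame_features).sum(axis=0)

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 256))           # 8 frames, 256-d visual features
quality = np.array([0.1, 0.2, 2.0, 1.5, 0.1, 0.0, 0.3, 0.2])  # e.g. face visibility
print(aggregate_frames(feats, quality).shape)   # (256,)
```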


16.
SweetWiki: A semantic wiki
Everyone agrees that user interactions and social networks are among the cornerstones of “Web 2.0”. Web 2.0 applications generally run in a web browser, present dynamic content with rich user interfaces, offer means to easily add or edit the content of the web site they belong to, and have social networking aspects. Well-known applications that have helped spread Web 2.0 are blogs, wikis, and image/video sharing sites; they have dramatically increased sharing and participation among web users. It is possible to build knowledge using tools that can help analyze users’ behavior behind the scenes: what they do, what they know, what they want. Tools that help share this knowledge across a network, and that can reason on that knowledge, will lead to users who can better use the knowledge available, i.e., to smarter users. Wikipedia, a wildly successful example of web technology, has helped knowledge-sharing between people by letting individuals freely create and modify its content. But Wikipedia is designed for people: today’s software cannot understand and reason on Wikipedia’s content. In parallel, the “semantic web”, a set of technologies that help knowledge-sharing across the web between different applications, is starting to gain traction. Researchers have only recently started working on the concept of a “semantic wiki”, mixing the advantages of the wiki and the technologies of the semantic web. In this paper we present a state of the art of semantic wikis and introduce SweetWiki, an example of an application reconciling two trends of the future web: a semantically augmented web and a web of social applications where every user is an active provider as well as a consumer of information. SweetWiki makes heavy use of semantic web concepts and languages, and demonstrates how the use of such paradigms can improve navigation, search, and usability.

17.
Video recommendation is an important tool for helping people access interesting videos. In this paper, we propose a universal scheme to integrate rich information for personalized video recommendation. Our approach regards video recommendation as a ranking task. First, it generates multiple ranking lists by exploring different information sources; in particular, one novel source, the user’s relationship strength, is inferred from the online social network and applied to recommend videos. Second, a multi-task rank aggregation approach is proposed to integrate these ranking lists into a final result for video recommendation. The scheme is flexible in that it can easily incorporate other methods by adding their generated ranking lists to the multi-task rank aggregation. We conduct experiments on a large dataset with 76 users and more than 11,000 videos. The experimental results demonstrate the feasibility and effectiveness of our approach.
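As a stand-in for the rank aggregation step, the sketch below merges several ranking lists with a weighted Borda count: each source list awards points by position, and the final ranking sorts videos by their total. The lists, weights, and the Borda rule itself are illustrative assumptions; the paper's multi-task aggregation is more involved.

```python
from collections import defaultdict

def aggregate_rankings(ranking_lists, weights=None):
    """Simple weighted Borda-style aggregation as a stand-in for multi-task
    rank aggregation: each source list votes for videos by their position."""
    weights = weights or [1.0] * len(ranking_lists)
    score = defaultdict(float)
    for w, ranking in zip(weights, ranking_lists):
        n = len(ranking)
        for pos, video in enumerate(ranking):
            score[video] += w * (n - pos)      # higher positions earn more points
    return sorted(score, key=score.get, reverse=True)

content_based = ["v3", "v1", "v2"]
social_strength = ["v1", "v3", "v4"]           # e.g. inferred relationship strength
print(aggregate_rankings([content_based, social_strength], weights=[1.0, 1.5]))
```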

18.
This paper introduces a workload characterization study of the most popular short-video sharing service of Web 2.0, YouTube. Based on a vast amount of data gathered over a five-month period, we analyzed the characteristics of around 250,000 popular and regular YouTube videos. In particular, we collected the lists of related videos for each video clip recursively and analyzed their statistical behavior. Understanding the traffic of YouTube and similar Web 2.0 video sharing sites is crucial for developing synthetic workload generators, which are required for evaluating methods that address the high bandwidth usage and scalability problems of Web 2.0 sites such as YouTube. The distribution models, in particular the Zipf-like behavior of popular video files, suggest that proxy caching of popular YouTube videos can reduce network traffic and increase the scalability of the YouTube web site. The workload characteristics provided in this work enabled us to develop a workload generator to evaluate the effectiveness of this approach.
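A synthetic workload generator built on the reported Zipf-like popularity can be as simple as drawing request ranks from a Zipf distribution over the catalogue; the sketch below does that with NumPy, using an invented exponent and catalogue size rather than the parameters fitted in the paper.

```python
import numpy as np

def generate_requests(n_videos=1000, n_requests=100_000, a=1.2, seed=0):
    """Toy workload generator in the spirit of the paper's findings: video
    popularity follows a Zipf-like law, so request ranks are drawn from a
    Zipf distribution over the catalogue (exponent is illustrative)."""
    rng = np.random.default_rng(seed)
    ranks = rng.zipf(a, size=n_requests)
    return ranks[ranks <= n_videos]            # keep samples inside the catalogue

reqs = generate_requests()
top_share = (reqs <= 100).mean()               # requests for the 100 most popular videos
print(f"share of requests for the top 10% of the catalogue: {top_share:.2f}")
```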

19.
With the rapid development of WiFi and 3G/4G, people increasingly view videos on mobile devices. These devices are ubiquitous but have little memory for caching videos; as a result, in contrast to traditional computers, they aggravate the network pressure on content providers. Previous studies use CDNs to solve this problem, but the static leasing mechanism, in which the rented space cannot be adjusted dynamically, makes the operational cost soar and is ill-suited to dynamic video delivery. In our study, based on a thorough analysis of user behavior on Tencent Video, a popular Chinese online video sharing platform, we identify two key user behaviors. First, many users in the same region tend to watch the same videos. Second, the popularity distribution of videos conforms to the Pareto principle, i.e., the top 20% most popular videos account for 80% of all video traffic. To exploit these observations, we propose and implement a novel cloud- and peer-assisted video-on-demand system (CPA-VoD). In this system, we group users in the same region into a peer swarm, within which users can serve videos to one another from their caches. Besides, we cache the 10% most popular videos on cloud servers to further alleviate the network pressure; we choose cloud servers because the rented space can be adjusted dynamically. In an evaluation on a real dataset from Tencent Video, CPA-VoD alleviates the network pressure and operating cost excellently, with only 20.9% of the traffic served by the content provider.
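The cloud-caching decision can be pictured as ranking videos by observed popularity and keeping the most popular fraction in rented cloud storage; the toy sketch below does this over a made-up request log and reports how much traffic the cache would absorb. The 10% fraction follows the abstract, but the log, helper names, and coverage figure are purely illustrative.

```python
from collections import Counter

def plan_cloud_cache(request_log, cache_fraction=0.10):
    """Rough illustration of a CPA-VoD-style cloud-caching decision: rank
    videos by observed popularity and keep the top fraction in rented cloud
    servers; the rest would be served by peer swarms or the origin."""
    counts = Counter(request_log)
    n_cached = max(1, int(len(counts) * cache_fraction))
    cached = [v for v, _ in counts.most_common(n_cached)]
    covered = sum(counts[v] for v in cached) / len(request_log)
    return cached, covered

log = ["a"] * 60 + ["b"] * 25 + ["c"] * 10 + ["d"] * 3 + ["e"] * 2 + list("fghij")
cached, covered = plan_cloud_cache(log, cache_fraction=0.10)
print(cached, f"{covered:.0%} of traffic served from the cloud cache")
```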

20.
Existing video de-duplication techniques are diverse, but subtitles, an important source of information that closely matches the video content, have not been taken into account. This paper proposes a de-duplication method for videos with embedded subtitles and reports its de-duplication effect on three major video websites. First, the subtitles of the corresponding web videos are converted into text documents through OCR processing; the documents are then normalized; finally, a threshold value is set and used to filter duplicates among the pages. By analogy with de-duplication methods for web-page text, de-duplication based on textual content can greatly improve the results; given the uniqueness of the dialogue in a video, de-duplication can be performed on the subtitle content, yielding more accurate video de-duplication results.
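A minimal sketch of a subtitle-based duplicate test follows, assuming the subtitle text has already been extracted: the OCR'd documents of two videos are compared with Jaccard similarity over character n-grams and flagged as duplicates above a threshold. The similarity measure, the n-gram length, and the threshold value are assumptions; the paper's threshold is not specified in the abstract.

```python
def subtitle_similarity(text_a, text_b, n=3):
    """Hypothetical subtitle-based test: compare the OCR'd subtitle documents
    of two videos with Jaccard similarity over character n-grams."""
    shingles = lambda t: {t[i:i + n] for i in range(len(t) - n + 1)}
    a, b = shingles(text_a), shingles(text_b)
    return len(a & b) / len(a | b) if a | b else 0.0

DUPLICATE_THRESHOLD = 0.8     # illustrative; the paper's threshold is not given
s = subtitle_similarity("we meet again at last", "we meet again at least")
print(s, s > DUPLICATE_THRESHOLD)
```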
