Similar Articles
20 similar articles found (search time: 78 ms)
1.
Numerous social videos now pervade the web. Social web videos are characterized by accompanying rich contextual information that describes their content and thus greatly facilitates video search and browsing. Generally, contextual data such as tags are provided at the whole-video level, without temporal indication of when they actually appear in the video, let alone spatial annotation of object-related tags in the video frames. However, many tags describe only parts of the video content. Therefore, tag localization, the process of assigning tags to the relevant video segments, frames, or even regions within frames, is attracting increasing research interest, and a benchmark dataset for the fair evaluation of tag localization algorithms is highly desirable. In this paper, we describe and release a dataset called DUT-WEBV, which contains about 4,000 videos collected from the YouTube portal by issuing 50 concepts as queries. These concepts cover a wide range of semantic aspects, including scenes like “mountain”, events like “flood”, objects like “cows”, sites like “gas station”, and activities like “handshaking”, offering great challenges to the tag (i.e., concept) localization task. For each video of a tag, we carefully annotate the time durations when the tag appears in the video and also label the spatial locations of objects with masks in frames for object-related tags. Besides the video itself, contextual information such as thumbnail images, titles, and YouTube categories is also provided. Together with this benchmark dataset, we present a baseline for tag localization using a multiple-instance learning approach. Finally, we discuss some open research issues for tag localization in web videos.
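The multiple-instance-learning baseline is described only at a high level in the abstract. A minimal sketch of the MIL idea for tag localization (all function names and thresholds below are illustrative assumptions, not the paper's implementation): a video is a "bag" of frame-level scores, the video-level tag label is the maximum over frames, and the tag is localized to runs of frames whose score exceeds a threshold.

```python
# Hypothetical MIL-style tag-localization sketch. Frame scores would come
# from a per-tag concept classifier; here they are hard-coded toy values.

def bag_score(frame_scores):
    """Video-level (bag) confidence = max over frame (instance) scores."""
    return max(frame_scores)

def localize_tag(frame_scores, threshold=0.5):
    """Return (start, end) index ranges of contiguous frames above threshold."""
    segments, start = [], None
    for i, s in enumerate(frame_scores):
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(frame_scores) - 1))
    return segments

scores = [0.1, 0.2, 0.8, 0.9, 0.3, 0.7, 0.6, 0.1]
print(bag_score(scores))     # 0.9
print(localize_tag(scores))  # [(2, 3), (5, 6)]
```

The max-pooling bag score is what lets a whole-video tag supervise frame-level predictions when only video-level labels exist.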

2.
This paper presents a unified approach to analyzing and structuring the content of videotaped lectures for distance-learning applications. By structuring lecture videos, we can support topic indexing and semantic querying of multimedia documents captured in traditional classrooms. Our goal in this paper is to automatically construct cross references between lecture videos and textual documents so as to facilitate synchronized browsing and presentation of multimedia information. The major issues involved in our approach are topical event detection, video text analysis, and the matching of slide shots with external documents. In topical event detection, a novel transition detector is proposed to rapidly locate slide shot boundaries by computing the changes of text and background regions in videos. For each detected topical event, multiple keyframes are extracted for video text detection, super-resolution reconstruction, binarization, and recognition. A new approach to reconstructing high-resolution textboxes based on linear interpolation and multi-frame integration is also proposed for effective binarization and recognition. The recognized characters are used to match video slide shots with external documents based on our proposed title and content similarity measures.

4.
Videos have diverse content that can assist students in learning. However, because videos are linear media, video users may take longer than readers of text to evaluate the context. Therefore, the process of video search may vary from one user to another depending on the user's individual characteristics, and the effectiveness of video learning may also vary across individuals. This study evaluated 100 Taiwanese fifth graders searching for videos related to “understanding animals” on YouTube and examined the effects of the students' metacognitive strategies (planning, monitoring, and evaluating) and verbal-imagery cognitive style on their video searches. The observable indicators were quantitative (search behaviors, search performance, and learning performance) and qualitative (search process observations and interviews). The study concludes that metacognitive strategy is the primary factor influencing video search. Students with better metacognitive skills used fewer keywords, browsed fewer videos, and spent less time evaluating videos, but they achieved higher learning performance. They reviewed the video metadata information on the user interface and did not attempt to watch videos on the video recommendation lists, particularly videos that were irrelevant to the task requirements. During the course of the searches, keyword usage had a significant influence on the students' search performance and learning performance: the fewer keywords the students used, the better search and learning performance they were able to achieve. Our results differ from those of previous studies on text, image, and map searches. Accordingly, users must adopt different search strategies when using various types of search engines.

5.
In this paper, we propose a Web video retrieval method that uses the hierarchical structure of Web video groups. Existing retrieval systems require users to input suitable queries that identify the desired contents in order to accurately retrieve Web videos; the proposed method enables retrieval of the desired Web videos even if users cannot formulate such queries. Specifically, we first select representative Web videos from a target video dataset by using link relationships between Web videos, obtained via the “related videos” metadata, together with heterogeneous video features. Using the representative Web videos, we then construct a network whose nodes and edges correspond, respectively, to Web videos and the links between them. Web video groups, i.e., sets of Web videos with similar topics, are then hierarchically extracted based on strongly connected components, edge betweenness, and modularity. By presenting the obtained hierarchical structure of Web video groups, users can easily grasp an overview of many Web videos. Consequently, even if users cannot write suitable queries that identify the desired contents, they can still accurately retrieve the desired Web videos by selecting Web video groups according to the hierarchical structure. Experimental results on actual Web videos verify the effectiveness of our method.
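The first grouping step above relies on strongly connected components of the "related videos" link graph. A self-contained sketch of that step (graph contents and variable names are invented for illustration; the paper additionally applies edge betweenness and modularity, which are omitted here):

```python
# Kosaraju's algorithm for strongly connected components (SCCs) on a
# directed edge list, standing in for the "related videos" link graph.
from collections import defaultdict

def strongly_connected_components(edges):
    graph, rev = defaultdict(list), defaultdict(list)
    nodes = set()
    for u, v in edges:
        graph[u].append(v)
        rev[v].append(u)
        nodes.update((u, v))

    visited, order = set(), []
    def dfs_finish(u):
        """Iterative DFS recording nodes in order of finishing time."""
        stack = [(u, iter(graph[u]))]
        visited.add(u)
        while stack:
            node, it = stack[-1]
            for v in it:
                if v not in visited:
                    visited.add(v)
                    stack.append((v, iter(graph[v])))
                    break
            else:
                order.append(node)
                stack.pop()

    for u in nodes:
        if u not in visited:
            dfs_finish(u)

    # Second pass: flood-fill the transpose graph in reverse finish order.
    seen, comps = set(), []
    for u in reversed(order):
        if u in seen:
            continue
        comp, stack = set(), [u]
        seen.add(u)
        while stack:
            node = stack.pop()
            comp.add(node)
            for v in rev[node]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps

# Two mutually linked video clusters joined by a one-way "related" link.
edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "d"), ("d", "c")]
print(sorted(map(sorted, strongly_connected_components(edges))))
# [['a', 'b'], ['c', 'd']]
```

Each SCC is a candidate Web video group; the hierarchy would come from further splitting these components by betweenness and modularity.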

7.
This paper proposes unified video coding algorithms based on a one-dimensional (1D) representation of isometric feature mapping (Isomap). First, distance-preserving 1D Isomap representations are generated, which can achieve a very high compression ratio. Next, embedding and reconstruction algorithms for the 1D Isomap representation are presented that can transform samples from a high-dimensional space to a low-dimensional space and vice versa. Then, dictionary learning algorithms for training samples are proposed to compress the input samples. Finally, a unified coding framework for diverse videos based on the 1D Isomap representation is built. The proposed methods make full use of correlations between internal and external videos, which are not considered by classical methods. Simulation experiments show that the proposed methods obtain higher peak signal-to-noise ratios than standard High Efficiency Video Coding (HEVC) at similar bits-per-pixel levels in low-bit-rate situations.
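To make the "distance-preserving 1D representation" concrete, here is a toy sketch (emphatically not the paper's algorithm): for samples lying along a 1D manifold, a 1D Isomap-style coordinate can be approximated by cumulative geodesic distance measured neighbor-to-neighbor along the chain, so nearby samples get nearby scalar codes.

```python
# Toy 1D manifold coordinate: cumulative neighbor-to-neighbor distance.
# Assumes the points are already ordered along the manifold, which a real
# Isomap implementation would discover via a nearest-neighbor graph.
import math

def chain_coordinates(points):
    """Assign each 2D point its cumulative Euclidean distance along the chain."""
    coords = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        coords.append(coords[-1] + math.hypot(x1 - x0, y1 - y0))
    return coords

# Points on a unit quarter-circle: the scalar coordinate tracks arc length.
pts = [(math.cos(t), math.sin(t)) for t in (0.0, 0.5, 1.0, 1.5)]
coords = chain_coordinates(pts)
print(coords)
```

Each high-dimensional sample is thus summarized by one scalar, which is what makes the very high compression ratio mentioned above plausible; reconstruction requires the inverse mapping the paper's embedding/reconstruction algorithms provide.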

9.
Wireless video streaming on smartphones drains a significantly large fraction of battery energy, primarily consumed by wireless network interfaces downloading unused data and repeatedly switching the radio interface. In this paper, we propose an energy-efficient download scheduling algorithm for video streaming based on an aggregate model that uses the user's video viewing history to predict user behavior when watching a new video, thereby minimizing wasted energy when streaming over wireless network interfaces. The aggregate model combines a personal retention model built from the user's personal viewing history with the audience retention from crowd-sourced viewing history, and can accurately predict video-watching behavior by balancing “user interest” and “video attractiveness”. We evaluate different users streaming multiple videos in various wireless environments, and the results show that the aggregate model can help reduce energy waste by 20% on average. In addition, we discuss implementation details and extensions, such as dynamically updating personal retention, balancing audience and personal retention, and categorizing videos for a more accurate model.
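The balancing of personal and audience retention can be illustrated with a minimal sketch (the blend weight, threshold, and function names are assumptions for illustration, not the paper's formulation): blend the two retention curves, then prefetch only the leading video segments the user is likely to watch.

```python
# Blend a user's personal retention curve with the crowd-sourced audience
# retention curve, then decide how many segments are worth downloading.

def aggregate_retention(personal, audience, alpha=0.6):
    """Weighted blend of two retention curves (watch probability per segment)."""
    return [alpha * p + (1 - alpha) * a for p, a in zip(personal, audience)]

def segments_to_prefetch(retention, threshold=0.5):
    """Download only the leading segments likely to be watched."""
    n = 0
    for r in retention:
        if r < threshold:
            break
        n += 1
    return n

personal = [0.9, 0.8, 0.4, 0.2]   # this user tends to quit early
audience = [0.95, 0.9, 0.8, 0.7]  # the crowd tends to watch longer
blend = aggregate_retention(personal, audience)
print(segments_to_prefetch(blend))  # 3
```

Segments beyond the cutoff are fetched lazily only if the user keeps watching, which is where the energy saving comes from.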

10.
The sharing and re-sharing of videos on social sites, blogs, e-mail, and other channels has given rise to the phenomenon of viral videos: videos that become popular through internet sharing. In this paper we seek to better understand viral videos on YouTube by analyzing sharing and its relationship to video popularity using millions of YouTube videos. The socialness of a video is quantified by classifying the referrer sources for video views as social (e.g., an emailed link or a Facebook referral) or non-social (e.g., a link from related videos). We find that the viewership patterns of highly social videos are very different from those of less social videos. For example, highly social videos rise to, and fall from, their peak popularity more quickly than less social videos. We also find that not all highly social videos become popular, and not all popular videos are highly social. Using these insights on viral videos, we develop a method for ranking blogs and websites on their ability to spread viral videos.
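The socialness measure can be sketched as follows (the referrer categories and data shape here are assumptions, not the authors' exact taxonomy): classify each referrer as social or non-social and take the social share of total views.

```python
# Quantify a video's "socialness" as the fraction of views arriving
# through social referrers. Referrer names are illustrative.

SOCIAL_REFERRERS = {"email", "facebook", "twitter", "blog_embed"}

def socialness(view_log):
    """view_log: list of (referrer, view_count) pairs -> share of social views."""
    social = sum(c for r, c in view_log if r in SOCIAL_REFERRERS)
    total = sum(c for _, c in view_log)
    return social / total if total else 0.0

log = [("related_videos", 700), ("facebook", 200), ("email", 100)]
print(socialness(log))  # 0.3
```

Ranking a blog's ability to spread viral videos would then aggregate this score over the videos it refers traffic to.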

11.
The volume of surveillance videos is increasing rapidly, and humans are the major objects of interest in them. Rapid human retrieval in surveillance videos is therefore desirable and applicable to a broad spectrum of applications. Existing big data processing tools, which mainly target textual data, cannot be applied directly to the timely processing of large video data due to three main challenges: videos are more data-intensive than textual data; visual operations have higher computational complexity than textual operations; and traditional segmentation may damage video data's continuous semantics. In this paper, we design SurvSurf, a human retrieval system for large surveillance video data that exploits the characteristics of these data and of big data processing tools. We propose using the motion information contained in videos for video data segmentation. The basic data unit after segmentation is called an M-clip. M-clips help remove redundant video contents and reduce data volumes. We use the MapReduce framework to process M-clips in parallel for human detection and appearance/motion feature extraction. We further accelerate vision algorithms by processing only sub-areas with significant motion vectors rather than entire frames. In addition, we design a distributed data store called V-BigTable to structuralize M-clips' semantic information. V-BigTable enables efficient retrieval over a huge number of M-clips. We implement the system on Hadoop and HBase. Experimental results show that our system outperforms basic solutions by one order of magnitude in computational time with satisfactory human retrieval accuracy.
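The parallel processing step can be sketched in MapReduce style (the detector, clip representation, and all names below are stand-ins for illustration, not SurvSurf's actual code): map each M-clip independently to the people detected in it, then reduce by person to build a person-to-clips index of the kind V-BigTable stores.

```python
# Toy MapReduce-style pipeline over M-clips: emit (person, clip) pairs in
# the map phase, group clips by person in the reduce phase.
from collections import defaultdict

def map_phase(m_clips, detect):
    """Emit (person_id, clip_id) pairs from each clip independently."""
    pairs = []
    for clip_id, frames in m_clips.items():
        for person in detect(frames):
            pairs.append((person, clip_id))
    return pairs

def reduce_phase(pairs):
    """Group clip ids by person id."""
    index = defaultdict(set)
    for person, clip_id in pairs:
        index[person].add(clip_id)
    return dict(index)

# Stand-in detector: frames here are pre-labeled with the people they contain.
clips = {"c1": ["alice", "bob"], "c2": ["alice"], "c3": ["carol"]}
index = reduce_phase(map_phase(clips, detect=lambda frames: set(frames)))
print(sorted(index["alice"]))  # ['c1', 'c2']
```

Because each map call touches only one M-clip, the expensive vision work parallelizes cleanly across a Hadoop cluster.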

13.
Learning to predict future visual dynamics given input video sequences is a challenging but essential task. Although many stochastic video prediction models have been proposed, they still suffer from “multi-modal entanglement”, which refers to the ambiguity of learned representations in multi-modal dynamics modeling. While most existing video prediction models are label-free, we propose a self-supervised labeling strategy to improve spatiotemporal prediction networks without extra supervision. Starting from a set of clustered pseudo-labels, our framework alternates between model optimization and label updating. The key insight of our method is that we exploit the reconstruction error of the optimized model itself as an indicator to progressively refine the label assignment on the training set. The two steps are interdependent: the predictive model guides the direction of label updates, and in turn, effective pseudo-labels help the model learn better-disentangled multi-modal representations. Experiments on two different video prediction datasets demonstrate the effectiveness of the proposed method.
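The alternation between label updating and model optimization can be illustrated with a drastically reduced 1D analogue (each "mode" is just a scalar mean, and reconstruction error is squared distance; everything here is a sketch, not the paper's networks): assign each sample the pseudo-label of the mode that reconstructs it best, then refit each mode on its assigned samples.

```python
# 1D analogue of alternating pseudo-label refinement: label update by
# smallest reconstruction error, then model (mean) refitting per mode.

def refine_pseudo_labels(samples, modes, iterations=10):
    labels = [0] * len(samples)
    for _ in range(iterations):
        # Label update: pick the mode that reconstructs each sample best.
        labels = [min(range(len(modes)), key=lambda k: (s - modes[k]) ** 2)
                  for s in samples]
        # Model optimization: refit each mode on its assigned samples.
        for k in range(len(modes)):
            assigned = [s for s, l in zip(samples, labels) if l == k]
            if assigned:
                modes[k] = sum(assigned) / len(assigned)
    return labels, modes

samples = [0.1, 0.2, 0.15, 3.0, 3.2, 2.9]
labels, modes = refine_pseudo_labels(samples, modes=[0.0, 1.0])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In the paper, the scalar means are replaced by mode-conditioned prediction networks and the squared distance by the networks' reconstruction error, but the interdependence of the two steps is the same.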

14.
This paper presents a state-of-the-art review of feature extraction for soccer video summarization research. All existing approaches to event detection, video summarization based on the video stream, and the application of text sources in event detection are surveyed. Regarding the current challenges of automatic, real-time provision of summary videos, different computer vision approaches are discussed and compared. Audio and video feature extraction methods, and their combination with textual methods, are investigated. Available commercial products are presented to better clarify the boundaries of this domain, and future directions for improving existing systems are suggested.

16.
In this paper, we propose an approach that infers the labels of unlabeled consumer videos and at the same time recognizes the key segments of the videos by learning from Web image sets for video annotation. The key segments are recognized automatically by transferring the knowledge learned from related Web image sets to the videos. We introduce an adaptive latent structural SVM method that adapts classifiers pre-learned on Web image sets into an optimal target classifier, where the locations of the key segments are modeled as latent variables because ground-truth key segments are not available. We use a limited number of labeled videos and abundant labeled Web images to train the annotation models, which significantly alleviates the time-consuming and labor-intensive collection of large numbers of labeled training videos. Experiments on two challenging datasets, Columbia's Consumer Video (CCV) and TRECVID 2014 Multimedia Event Detection (MED2014), show that our method performs better than state-of-the-art methods.
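The latent-variable idea can be shown in miniature (the classifier is reduced to a dot product and all names are assumptions for illustration): since key-segment locations are unlabeled, score a video as the best classifier response over all candidate segments, and report the argmax segment as the inferred latent key segment.

```python
# Latent-variable scoring sketch: the video score is the max over candidate
# segments, and the maximizing segment is the inferred key segment.

def score_video(segment_features, w):
    """Return (best_score, latent_segment_index) for linear classifier w."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in segment_features]
    best = max(range(len(scores)), key=scores.__getitem__)
    return scores[best], best

w = [1.0, -0.5]  # stand-in for a classifier pre-learned from Web images
segments = [[0.1, 0.9], [0.8, 0.1], [0.4, 0.4]]
score, key_segment = score_video(segments, w)
print(key_segment)  # 1
```

During training, a latent structural SVM alternates between inferring these segment assignments and updating `w`, which is how the method learns without ground-truth key segments.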

17.
Temporal alignment of videos is an important requirement of tasks such as video comparison, analysis, and classification. Most approaches proposed to date for video alignment leverage dynamic programming algorithms whose parameters are manually tuned. In contrast, this paper proposes a model that learns its parameters automatically by minimizing a meaningful loss function over a given training set of videos and alignments. For learning, we exploit the effective framework of structural SVM and extend it with an original scoring function that suitably scores the alignment of two given videos, and a loss function that quantifies the accuracy of a predicted alignment. Experimental results on four video action datasets show that the proposed model outperforms both a baseline and a state-of-the-art algorithm by a large margin in terms of alignment accuracy.
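The manually tuned dynamic-programming baselines mentioned above are typically variants of dynamic time warping (DTW). A minimal DTW sketch (1D frame features and an absolute-difference cost, purely for illustration):

```python
# Classic DTW: minimal cumulative cost to align two sequences, allowing
# frames of either sequence to be repeated or skipped.

def dtw(a, b, dist=lambda x, y: abs(x - y)):
    """Return the minimal cumulative alignment cost between sequences a, b."""
    INF = float("inf")
    n, m = len(a), len(b)
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(a[i - 1], b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # advance in a only
                                    cost[i][j - 1],      # advance in b only
                                    cost[i - 1][j - 1])  # advance in both
    return cost[n][m]

fast = [0, 1, 2, 3]
slow = [0, 0, 1, 1, 2, 2, 3, 3]  # same action performed at half speed
print(dtw(fast, slow))  # 0.0
```

The paper's contribution is to replace the hand-tuned `dist` and step costs with parameters learned by a structural SVM from example alignments.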

18.
Multimedia event-based video indexing using time intervals (cited by: 6; self-citations: 0; cited by others: 6)
We propose the time interval multimedia event (TIME) framework as a robust approach for classifying semantic events in multimodal video documents. The representation used in TIME extends the Allen temporal interval relations and allows for proper inclusion of context and synchronization of the heterogeneous information sources involved in multimodal video analysis. To demonstrate the viability of our approach, it was evaluated on the domains of soccer and news broadcasts. For automatic classification of semantic events, we compare three different machine learning techniques, i.e., the C4.5 decision tree, maximum entropy, and the support vector machine. The results show that semantic video indexing significantly benefits from using the TIME framework.
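The Allen temporal interval relations that TIME extends can be sketched with a small classifier over interval pairs (only a handful of Allen's thirteen relations are shown; the interval examples are invented for illustration):

```python
# Classify a pair of time intervals into a subset of Allen's relations.
# Intervals are (start, end) tuples with start < end.

def allen_relation(a, b):
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:
        return "before"
    if e1 == s2:
        return "meets"
    if s1 == s2 and e1 == e2:
        return "equal"
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 < e1 < e2:
        return "overlaps"
    return "other"

# E.g., a goal event during a replay segment, then adjacent commentary:
print(allen_relation((2, 4), (1, 6)))  # during
print(allen_relation((0, 2), (2, 5)))  # meets
print(allen_relation((1, 3), (2, 6)))  # overlaps
```

Representing multimodal cues as intervals and reasoning over such relations is what lets TIME synchronize heterogeneous information sources.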

19.
Smart video surveillance (SVS) applications enhance situational awareness by allowing domain analysts to focus on the events of highest priority. SVS approaches operate by extracting and interpreting higher, “semantic”-level events that occur in video. One of the key challenges of SVS is person identification, where the task is to identify, for each subject that occurs in a video shot, the person it corresponds to. The problem of person identification is especially challenging in resource-constrained environments where transmission delay, bandwidth restriction, and packet loss may prevent the capture of high-quality data. Conventional person-identification approaches, which are primarily based on analyzing facial features, are often not sufficient to deal with poor-quality data. To address this challenge, we propose a framework that leverages heterogeneous contextual information together with facial features to handle person identification for low-quality data. We first investigate appropriate methods for utilizing heterogeneous context features, including clothing, activity, human attributes, gait, people co-occurrence, and so on. We then propose a unified approach for person identification that builds on top of our generic entity resolution framework, RelDC, which can integrate all of these context features to improve the quality of person identification. This work thus links a well-known problem from computer vision research, person identification in video and images, with a well-recognized challenge from the database and AI/ML areas, entity resolution over textual data. We apply the proposed solution to a real-world dataset consisting of several weeks of surveillance videos. The results demonstrate the effectiveness and efficiency of our approach even on low-quality video data.
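The fusion of facial and contextual evidence can be sketched as a weighted score combination (the feature names, weights, and scores below are invented for illustration and are not RelDC's actual model):

```python
# Fuse a weak face-match score with heterogeneous context scores
# (clothing, gait, co-occurrence) into one identification decision.

def fuse_scores(candidate_scores, weights):
    """candidate_scores: {person: {feature: score}} -> best-scoring person."""
    def total(person):
        return sum(weights[f] * s for f, s in candidate_scores[person].items())
    return max(candidate_scores, key=total)

weights = {"face": 0.4, "clothing": 0.3, "gait": 0.2, "cooccurrence": 0.1}
scores = {
    "alice": {"face": 0.3, "clothing": 0.9, "gait": 0.8, "cooccurrence": 0.9},
    "bob":   {"face": 0.5, "clothing": 0.2, "gait": 0.3, "cooccurrence": 0.1},
}
print(fuse_scores(scores, weights))  # alice
```

On low-quality footage the face score alone would pick the wrong person here; the contextual features tip the decision, which is the intuition behind the framework.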


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号