
1.

Creating video art with evolutionary algorithms

Teresa Chambel Luís Correia Jônatas Manzolli Gonçalo Dias Miguel Nuno A.C. Henriques Nuno Correia 《Computers & Graphics》2007,31(6):837-847

The boundaries of art are subjective, but the impetus for art is often associated with creativity, regarded with wonder and admiration throughout human history. Most interesting activities and their products are a result of creativity. The main goal of our approach is to explore new creative ways of editing and producing videos, using evolutionary algorithms. A creative evolutionary system makes use of evolutionary computation operators and properties and is designed to aid our own creative processes, and to generate solutions to problems that traditionally required creative people to solve. Our system is able to generate new videos or to help a user in doing so. New video sequences are combined and selected, based on their characteristics represented as video annotations, either by defining criteria or by interactively performing selections in the evolving population of video clips, in forms that can reflect editing styles. With evolving video, the clips can be explored through emergent narratives and aesthetics in ways that may reveal or inspire creativity in digital art.
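As a rough illustration of selecting clip sequences by criteria over annotations, the following is a minimal sketch of a generic evolutionary loop, not the authors' system. The clip names, the annotation sets, the `wanted` criteria, and the fitness function are all hypothetical.

```python
import random

def fitness(sequence, annotations, wanted):
    """Score a clip sequence by how many wanted annotations its clips carry."""
    return sum(len(annotations[clip] & wanted) for clip in sequence)

def evolve(annotations, wanted, length=3, population_size=8, generations=20, seed=0):
    """Evolve clip sequences with truncation selection, crossover and mutation."""
    rng = random.Random(seed)
    clips = sorted(annotations)
    population = [[rng.choice(clips) for _ in range(length)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # Keep the better half of the population as parents (elitism).
        population.sort(key=lambda s: fitness(s, annotations, wanted), reverse=True)
        parents = population[: population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, length)                    # one-point crossover
            child = mother[:cut] + father[cut:]
            child[rng.randrange(length)] = rng.choice(clips)  # point mutation
            children.append(child)
        population = parents + children
    return max(population, key=lambda s: fitness(s, annotations, wanted))

# Hypothetical annotated clips and selection criteria.
annotations = {
    "clip_a": {"sea", "calm"},
    "clip_b": {"city", "fast"},
    "clip_c": {"sea", "fast"},
    "clip_d": {"forest"},
}
wanted = {"sea", "fast"}
best = evolve(annotations, wanted)
print(best, fitness(best, annotations, wanted))
```

An interactive variant would replace the `fitness` call with a user's selections in each generation.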

2.

Weakly supervised temporal action localization with proxy metric modeling

Hongsheng XU Zihan CHEN Yu ZHANG Xin GENG Siya MI Zhihong YANG 《Frontiers of Computer Science》2023,17(2):172309

Temporal localization is crucial for action video recognition. Since manual annotation of videos is expensive and time-consuming, temporal localization with weak video-level labels is challenging but indispensable. In this paper, we propose a weakly-supervised temporal action localization approach for untrimmed videos. To address this issue, we train the model based on the proxies of each action class. The proxies are used to measure the distances between action segments and different original action features. We use a proxy-based metric to cluster the same actions together and separate actions from backgrounds. Compared with state-of-the-art methods, our method achieves competitive results on the THUMOS14 and ActivityNet1.2 datasets.
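The idea of comparing segments against per-class proxies can be sketched as a nearest-proxy assignment. This is an illustrative sketch only: the cosine similarity, the two-dimensional features, the class names, and the explicit background proxy are assumptions, and the training of the proxies is not shown.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def assign_segments(segments, proxies):
    """Label each segment feature with the name of its most similar proxy."""
    labels = []
    for feature in segments:
        best = max(proxies, key=lambda name: cosine_similarity(feature, proxies[name]))
        labels.append(best)
    return labels

# Hypothetical proxies: one per action class plus one for background.
proxies = {
    "jump": [1.0, 0.0],
    "throw": [0.0, 1.0],
    "background": [-1.0, -1.0],
}
# Hypothetical per-segment features from an untrimmed video.
segments = [[0.9, 0.1], [0.2, 0.8], [-0.5, -0.4]]
labels = assign_segments(segments, proxies)
print(labels)
```

Consecutive segments that share an action label could then be merged into one localized interval.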

3.

Integrated Video and Text for Content-based Access to Video Databases

Jiang Haitao Montesi Danilo Elmagarmid Ahmed K. 《Multimedia Tools and Applications》1999,9(3):227-249

This paper introduces a new approach to realize video databases. The approach consists of a VideoText data model based on free text annotations associated with logical video segments and a corresponding query language. Traditional database techniques are inadequate for exploiting queries on unstructured data such as video, supporting temporal queries, and ranking query results according to their relevance to the query. In this paper, we propose to use information retrieval techniques to provide such features and to extend the query language to accommodate interval queries that are particularly suited to video data. Algorithms are provided to show how user queries are evaluated. Finally, a generic and modular video database architecture which is based on VideoText data model is described.

4.

Learning automatic concept detectors from online video

《Computer Vision and Image Understanding》2010,114(4):429-438

Concept detection is targeted at automatically labeling video content with semantic concepts appearing in it, like objects, locations, or activities. While concept detectors have become key components in many research prototypes for content-based video retrieval, their practical use is limited by the need for large-scale annotated training sets. To overcome this problem, we propose to train concept detectors on material downloaded from web-based video sharing portals like YouTube, such that training is based on tags given by users during upload, no manual annotation is required, and concept detection can scale up to thousands of concepts. On the downside, web video as training material is a complex domain, and the tags associated with it are weak and unreliable. Consequently, performance loss is to be expected when replacing high-quality state-of-the-art training sets with web video content. This paper presents a concept detection prototype named TubeTagger that utilizes YouTube content for autonomous training. In quantitative experiments, we compare the performance when training on web video and on standard datasets from the literature. It is demonstrated that concept detection in web video is feasible, and that, when testing on YouTube videos, the YouTube-based detector outperforms the ones trained on standard training sets. By applying the YouTube-based prototype to datasets from the literature, we further demonstrate that: (1) If training annotations on the target domain are available, the resulting detectors significantly outperform the YouTube-based tagger. (2) If no annotations are available, the YouTube-based detector achieves comparable performance to the ones trained on standard datasets (a moderate relative performance loss of 11.4% is measured) while offering the advantage of fully automatic, scalable learning. (3) By enriching conventional training sets with online video material, performance improvements of 11.7% can be achieved when generalizing to domains unseen in training.

5.

Smooth Loops from Unconstrained Video

L. Sevilla‐Lara J. Wulff K. Sunkavalli E. Shechtman 《Computer Graphics Forum》2015,34(4):99-107

Converting unconstrained video sequences into videos that loop seamlessly is an extremely challenging problem. In this work, we take the first steps towards automating this process by focusing on an important subclass of videos containing a single dominant foreground object. Our technique makes two novel contributions over previous work: first, we propose a correspondence-based similarity metric to automatically identify a good transition point in the video where the appearance and dynamics of the foreground are most consistent. Second, we develop a technique that aligns both the foreground and background about this transition point using a combination of global camera path planning and patch-based video morphing. We demonstrate that this allows us to create natural, compelling, loopy videos from a wide range of videos collected from the internet.

6.

Behavior Discovery and Alignment of Articulated Object Classes from Unstructured Video

Luca Del Pero Susanna Ricco Rahul Sukthankar Vittorio Ferrari 《International Journal of Computer Vision》2017,121(2):303-325

7.

Modeling and Management of Fuzzy Information in Multimedia Database Applications (total citations: 1; self-citations: 0; citations by others: 1)

Ramazan Savaş Aygün Adnan Yazici 《Multimedia Tools and Applications》2004,24(1):29-56

In this paper, we first present a conceptual data model for multimedia database applications based on the ExIFO₂ model. The ExIFO₂ data model is chosen as the conceptual model since it handles complex objects along with their uncertain and imprecise properties. We enhanced this conceptual model in order to meet the multimedia data requirements. In addition to uncertain and imprecise information, we present a way of handling relationships among objects of multimedia database applications. Events that might be extracted from video or audio are also considered in this study. Secondly, the conceptual model is mapped to a logical model, for which the fuzzy object-oriented data (FOOD) model is chosen, for storing and manipulating the multimedia objects. This mapping is done in a way that preserves most of the information represented at the conceptual level. Finally, in this study videos of football (soccer) games are selected as the multimedia database application to show how we handle crisp and fuzzy querying and retrieval of fuzzy and crisp data from the database. A program has been developed to draw ExIFO₂ schemas and to map the schema to FOOD code automatically.

8.

Using objective ground-truth labels created by multiple annotators for improved video classification: A comparative study

Gaurav Srivastava Josiah A. Yoder Johnny Park Avinash C. Kak 《Computer Vision and Image Understanding》2013,117(10):1384-1399

We address the problem of predicting category labels for unlabeled videos in a large video dataset by using a ground-truth set of objectively labeled videos that we have created. Large video databases like YouTube require that a user uploading a new video assign to it a category label from a prescribed set of labels. Such category labeling is likely to be corrupted by the subjective biases of the uploader. Despite their noisy nature, these subjective labels are frequently used as the gold standard in algorithms for multimedia classification and retrieval. Our goal in this paper is NOT to propose yet another algorithm that predicts labels for unseen videos based on the subjective ground-truth. Rather, our goal is to demonstrate that video classification performance can be improved if, instead of using subjective labels, we first create an objectively labeled ground-truth set of videos and then train a classifier on that ground-truth to predict objective labels for the set of unlabeled videos.

9.

Online annotation of faces in personal videos by sequential learning (total citations: 1; self-citations: 1; citations by others: 0)

M. C. Yilmazturk I. Ulusoy N. K. Cicekli 《Multimedia Tools and Applications》2013,63(3):591-613

This paper addresses semi-automatic annotation of faces in personal videos. Unlike previous offline annotation systems, this paper studies online annotation of faces. During an annotation session, a few annotations are requested from the user online, for only part of the video. These annotations are used to train a system that will perform annotation automatically for the rest of the video. The automatic annotation results are presented to the user during the same session and the user is allowed to correct any automatic annotation mistakes. Thus, only fast and accurate face recognition methods are considered. Instead of batch learning, which has been used in the existing annotation systems, this paper proposes sequential learning methods to be used as face classifiers. Different classification methods are tested with various feature extraction methods using the same database so that a fair comparison is made among them. The results are evaluated in terms of recognition accuracies and execution time requirements.

10.

A Web video retrieval method using hierarchical structure of Web video groups

Ryosuke Harakawa Takahiro Ogawa Miki Haseyama 《Multimedia Tools and Applications》2016,75(24):17059-17079

In this paper, we propose a Web video retrieval method that uses the hierarchical structure of Web video groups. Existing retrieval systems require users to input suitable queries that identify the desired contents in order to accurately retrieve Web videos; however, the proposed method enables retrieval of the desired Web videos even if users cannot input suitable queries. Specifically, we first select representative Web videos from a target video dataset by using link relationships between Web videos obtained via the “related videos” metadata and heterogeneous video features. Furthermore, by using the representative Web videos, we construct a network whose nodes and edges respectively correspond to Web videos and links between these Web videos. Then Web video groups, i.e., Web video sets with similar topics, are hierarchically extracted based on strongly connected components, edge betweenness and modularity. By exhibiting the obtained hierarchical structure of Web video groups, users can easily grasp the overview of many Web videos. Consequently, even if users cannot write suitable queries that identify the desired contents, it becomes feasible to accurately retrieve the desired Web videos by selecting Web video groups according to the hierarchical structure. Experimental results on actual Web videos verify the effectiveness of our method.
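The first grouping criterion named above, strongly connected components of the "related videos" link network, can be sketched with Kosaraju's algorithm. This is only an illustration under stated assumptions: the video identifiers and links are hypothetical, and the edge betweenness and modularity steps of the method are not shown.

```python
def strongly_connected_components(graph):
    """Return the strongly connected components of a directed graph (Kosaraju)."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)

    # First pass: record nodes in order of DFS completion (iterative DFS).
    visited, order = set(), []
    for start in sorted(nodes):
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph.get(start, ())))]
        while stack:
            current, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph.get(nxt, ()))))
                    break
            else:
                order.append(current)
                stack.pop()

    # Second pass: explore the reversed graph in decreasing completion order.
    reversed_graph = {node: [] for node in nodes}
    for source, targets in graph.items():
        for target in targets:
            reversed_graph[target].append(source)

    assigned, groups = set(), []
    for node in reversed(order):
        if node in assigned:
            continue
        assigned.add(node)
        group, stack = [], [node]
        while stack:
            current = stack.pop()
            group.append(current)
            for previous in reversed_graph[current]:
                if previous not in assigned:
                    assigned.add(previous)
                    stack.append(previous)
        groups.append(sorted(group))
    return groups

# Hypothetical "related videos" links between five Web videos.
related = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["e"], "e": ["d"]}
groups = strongly_connected_components(related)
print(groups)
```

Videos that link to one another in a cycle end up in the same group, which is one plausible reading of why such components serve as candidate topic groups.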

11.

Video archaeology: understanding video manipulation history

Junge Shen Tao Mei Xinbo Gao 《Multimedia Tools and Applications》2013,63(2):461-483

Facing the explosive growth of near-duplicate videos, video archaeology is highly desirable for investigating the history of the manipulations on these videos. With the determination of derived videos according to the manipulations, a video migration map can be constructed with the pair-wise relationships in a set of near-duplicate videos. In this paper, we propose an improved video archaeology (I-VA) system by extending our previous work (Shen et al. 2010). The extensions include more comprehensive video manipulation detectors and improved techniques for these detectors. Specifically, the detectors are used for two categories of manipulations, i.e., semantic-based manipulations and non-semantic-based manipulations. Moreover, the improved detecting algorithms are more stable. The key of I-VA is the construction of a video migration map, which represents the history of how near-duplicate videos have been manipulated. There are various applications based on the proposed I-VA system, such as better understanding of the meaning and context conveyed by the manipulated videos, improving current video search engines by better presentation based on the migration map, and a better indexing scheme based on annotation propagation. The system is tested on a collection of 12,790 videos and 3,481 duplicates. The experimental results show that I-VA can discover the manipulation relation among the near-duplicate videos effectively.

12.

A Multicast-Enabled Delivery Framework for QoE Assurance of Over-The-Top Services in Multimedia Access Networks

Niels Bouten Steven Latré Wim Van de Meerssche Bart De Vleeschauwer Koen De Schepper Werner Van Leekwijck Filip De Turck 《Journal of Network and Systems Management》2013,21(4):677-706

Over-The-Top (OTT) video services are becoming more and more important in today’s broadband access networks. While original OTT services only offered short duration medium quality videos, more recently, premium content such as high definition full feature movies and live video are offered as well. For operators, who see the potential in providing Quality of Experience (QoE) assurance for an increased revenue, this introduces important new network management challenges. Traditional network management paradigms are often not suited for ensuring QoE guarantees as the provider does not have any control over the content’s origin. In this article, we focus on the management of an OTT-based video service. We present a loosely coupled architecture that can be seamlessly integrated into an existing OTT-based video delivery architecture. The framework has the goal of resolving the network bottleneck that might result from high peaks in the requests for OTT video services. The proposed approach groups the existing Hypertext Transfer Protocol (HTTP) based video connections to be multicast over an access network’s bottleneck and then splits them again to reconstruct the original HTTP connections. A prototype of this architecture is presented, which includes the caching of videos and incorporates retransmission schemes to ensure robust transmission. Furthermore, an autonomic algorithm is presented that intelligently selects which OTT videos need to be multicast by making a remote assessment of the cache state to predict the future availability of content. The approach was evaluated through both simulation and large scale emulation and shows a significant gain in scalability of the prototype compared to a traditional video delivery architecture.

13.

Multimodal detection of highlights for multimedia content

Serhan Dagtas Mohamed Abdel-Mottaleb 《Multimedia Systems》2004,9(6):586-593

14.

Exploiting information extraction techniques for automatic semantic video indexing with an application to Turkish news videos

Dilek Küçük Adnan Yazıcı 《Knowledge》2011,24(6):844-857

This paper targets the problem of automatic semantic indexing of news videos by presenting a video annotation and retrieval system which is able to perform automatic semantic annotation of news video archives and provide access to the archives via these annotations. The presented system relies on the video texts as the information source and exploits several information extraction techniques on these texts to arrive at representative semantic information regarding the underlying videos. These techniques include named entity recognition, person entity extraction, coreference resolution, and semantic event extraction. Apart from the information extraction components, the proposed system also encompasses modules for news story segmentation, text extraction, and video retrieval along with a news video database to make it a full-fledged system to be employed in practical settings. The proposed system is a generic one employing a wide range of techniques to automate the semantic video indexing process and to bridge the semantic gap between what can be automatically extracted from videos and what people perceive as the video semantics. Based on the proposed system, a novel automatic semantic annotation and retrieval system is built for Turkish and evaluated on a broadcast news video collection, providing evidence for its feasibility and convenience for news videos with a satisfactory overall performance.

15.

Search-based composition, streaming and playback of video archive content

Dag Johansen Pål Halvorsen Håvard Johansen Håkon Riiser Cathal Gurrin Bjørn Olstad Carsten Griwodz Åge Kvalnes Joseph Hurley Tomas Kupka 《Multimedia Tools and Applications》2012,61(2):419-445

Locating content in existing video archives is both a time- and bandwidth-consuming process since users might have to download and manually watch large portions of superfluous videos. In this paper, we present two novel prototypes using an Internet-based video composition and streaming system with a keyword-based search interface that collects, converts, analyses, indexes, and ranks video content. At user request, the system can automatically extract portions of single videos or aggregate content from multiple videos to produce a single, personalized video stream on-the-fly.

16.

Extracting viewer interests for automated bookmarking in video-on-demand services

Yang ZHAO Ye TIAN Yong LIU 《Frontiers of Computer Science》2015,9(3):415

Video-on-demand (VoD) services have become popular on the Internet in recent years. In VoD, it is challenging to support the VCR functionality, especially the jumps, while maintaining a smooth streaming quality. Previous studies propose to solve this problem by predicting the jump target locations and prefetching the contents. However, through our analysis of traces from a real-world VoD service, we find that it would be fundamentally difficult to improve a viewer’s VCR experience by simply predicting his future jumps, while ignoring the intentions behind these jumps. Instead of the prediction-based approach, in this paper, we seek to support the VCR functionality by bookmarking the videos. There are two key techniques in our proposed methodology. First, we infer and differentiate viewers’ intentions in VCR jumps by decomposing the inter-seek times, using an expectation-maximization (EM) algorithm, and combine the decomposed inter-seek times with the VCR jumps to compute a numerical interest score for each video segment. Second, based on the interest scores, we propose an automated video bookmarking algorithm. The algorithm employs the time-series change detection techniques of CUSUM and MB-GT, and bookmarks videos by detecting the abrupt changes in their interest score sequences. We evaluate our proposed techniques using real-world VoD traces from dozens of videos. Experimental results suggest that with our methods, viewers’ interests within a video can be precisely extracted, and we can position bookmarks on the video’s highlight events accurately. Our proposed video bookmarking methodology does not require any knowledge of video type, contents, and semantics, and can be applied to various types of videos.
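To make the change-detection step concrete, here is a minimal sketch of a textbook one-sided CUSUM detector applied to a sequence of per-segment interest scores. It is not the authors' algorithm: the `baseline`, `drift`, and `threshold` parameters and the example scores are assumptions, and the MB-GT technique and the EM-based score computation are not shown.

```python
def cusum_bookmarks(scores, baseline, drift, threshold):
    """Return segment indices where an upward shift in the scores is detected.

    One-sided CUSUM: accumulate the excess of each score over
    (baseline + drift), clamp the sum at zero, and raise an alarm
    (then reset) once the sum exceeds the threshold.
    """
    bookmarks = []
    cumulative = 0.0
    for index, score in enumerate(scores):
        cumulative = max(0.0, cumulative + (score - baseline - drift))
        if cumulative > threshold:
            bookmarks.append(index)
            cumulative = 0.0  # restart detection after an alarm
    return bookmarks

# Hypothetical interest scores for eight consecutive video segments.
scores = [0.1, 0.2, 0.1, 0.9, 1.0, 0.95, 0.2, 0.1]
bookmarks = cusum_bookmarks(scores, baseline=0.15, drift=0.1, threshold=1.0)
print(bookmarks)
```

A larger `threshold` makes the detector slower to react but less sensitive to isolated spikes in the score sequence.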

17.

A multi-level abstraction and modeling in video databases

Young Francis Day Ashfaq Khokhar Serhan Dagtas Arif Ghafoor 《Multimedia Systems》1999,7(5):409-423

In this paper, we propose a multi-level abstraction mechanism for capturing the spatial and temporal semantics associated with various objects in an input image or in a sequence of video frames. This abstraction can manifest itself effectively in conceptualizing events and views in multimedia data as perceived by individual users. The objective is to provide an efficient mechanism for handling content-based queries, with the minimum amount of processing performed on raw data during query evaluation. We introduce a multi-level architecture for video data management at different levels of abstraction. The architecture facilitates a multi-level indexing/searching mechanism. At the finest level of granularity, video data can be indexed based on mere appearance of objects and faces. For management of information at higher levels of abstractions, an object-oriented paradigm is proposed which is capable of supporting domain specific views.

18.

Smart VideoText: a video data model based on conceptual graphs (total citations: 2; self-citations: 0; citations by others: 2)

F. Kokkoras H. Jiang I. Vlahavas A.K. Elmagarmid E.N. Houstis W.G. Aref 《Multimedia Systems》2002,8(4):328-338

An intelligent annotation-based video data model called Smart VideoText is introduced. It utilizes the conceptual graph knowledge representation formalism to capture the semantic associations among the concepts described in text annotations of video data. The aim is to achieve more effective query, retrieval, and browsing capabilities based on the semantic content of video data. Finally, a generic and modular video database architecture based on the Smart VideoText data model is described.

19.

An original-stream based solution for smoothly replaying high-definition videos in desktop virtualization systems

《Journal of Visual Languages and Computing》2014,25(6):676-683

Existing desktop virtualization systems suffer from very limited performance in replaying high-definition videos remotely: intolerable CPU and bandwidth consumption, high response delay and poor video quality. In this paper, we propose an original-stream based solution that provides a good user experience for replaying high-definition videos in desktop virtualization systems without any modification to applications, while supporting most prevalent high-definition video formats. In our solution, the server's video content is not decoded on the server but intercepted and delivered to the client in its originally encoded state, so that the video content can be easily stored and transported in computer systems with high quality and low bandwidth. The encoded video content is intercepted in the server's display driver, which enables HDR to work seamlessly with existing applications. The extremely CPU-intensive video decoding tasks are executed on the client by using GPU-accelerated video decoding technology so that the CPU can concentrate on other tasks. The experimental results validate our method and show that the proposed approach measurably outperforms state-of-the-art solutions.

20.

The priority curve algorithm for video summarization

M. Albanese M. Fayzullin A. Picariello V.S. Subrahmanian 《Information Systems》2006

In this paper, we introduce the concept of a priority curve associated with a video. We then provide an algorithm that can use the priority curve to create a summary (of a desired length) of any video. The summary thus created exhibits nice continuity properties and also avoids repetition. We have implemented the priority curve algorithm (PriCA) and compared it with other summarization algorithms in the literature with respect to both performance and output quality. The quality of summaries was evaluated by a group of 200 students in Naples, Italy, who watched soccer videos. We show that PriCA is faster than existing algorithms and also produces better quality summaries. We also briefly describe a soccer video summarization system we have built using the PriCA architecture and various (classical) image processing algorithms.