Found 20 similar documents; search took 31 ms
1.
Video is an information-intensive medium with much redundancy. Therefore, it is desirable to be able to mine structure or semantics
of video data for efficient browsing, summarization and highlight extraction. In this paper, we propose a mosaic based approach
to key-event as well as structure mining, which is regarded as a complementary view for sports video analysis. A mosaic is generated
for each shot by a novel efficient mosaicing scheme, which constructs a global motion path and selects a best subset of frames
for mosaicing. These improved mosaics are then used as the representative image of shot content. Based on the mosaics, the structure
and events in sports video are mined by methods with and without prior knowledge. Without prior knowledge,
our system is able to locate global view shots taken by the dominant camera. If prior knowledge is available, the events in these
global view shots are detected using robust features extracted from mosaics. For global view mining, experiments comparing it
with a key-frame-based scheme have demonstrated that the mosaic-based scheme presents better results in several kinds of sports
videos; for event mining, the detection of key-plays and key-events in the specific domain of soccer videos has proved its
effectiveness.
2.
In this paper, we propose an innovative architecture to segment a news video into so-called “stories” using both the
included video and audio information. Segmentation of news into stories is one of the key issues for achieving efficient treatment
of news-based digital libraries. While the relevance of this research problem is widely recognized in the scientific community,
there are few established solutions in the field. In our approach, the segmentation is performed in two steps:
first, shots are classified by combining three different anchor shot detection algorithms using video information only. Then,
the shot classification is improved by using a novel anchor shot detection method based on features extracted from the audio
track. Tests on a large database confirm that the proposed system outperforms each single video-based method as well as their
combination.
3.
Detecting and tracking human faces in video sequences is useful in a number of applications such as gesture recognition and
human-machine interaction. In this paper, we show that online appearance models (holistic approaches) can be used for simultaneously
tracking the head, the lips, the eyebrows, and the eyelids in monocular video sequences. Unlike previous approaches to eyelid
tracking, we show that the online appearance models can be used for this purpose. Neither color information nor intensity
edges are used by our proposed approach. More precisely, we show how the classical appearance-based trackers can be upgraded
in order to deal with fast eyelid movements. The proposed eyelid tracking is made robust by avoiding eye feature extraction.
Experiments on real videos show the usefulness of the proposed tracking schemes as well as their enhancement to our previous
approach.
4.
Grouping video content into semantic segments and classifying semantic scenes into different types are crucial processes
for content-based video organization, management and retrieval. In this paper, a novel approach to automatically segment scenes
and semantically represent scenes is proposed. Firstly, video shots are detected using a rough-to-fine algorithm. Secondly,
key-frames within each shot are selected adaptively with hybrid features, and redundant key-frames are removed by template
matching. Thirdly, spatio-temporal coherent shots are clustered into the same scene based on the temporal constraint of video
content and visual similarity between shot activities. Finally, based on a full analysis of the typical characteristics of continuously
recorded videos, scene content is semantically represented to meet human demands in video retrieval. The proposed algorithm
has been tested on various genres of films and TV programs. Promising experimental results show that the proposed method
is useful for efficient retrieval of interesting video content.
5.
This paper presents a framework that explicitly detects events in broadcast baseball videos and facilitates the development
of many practical applications. Three phases of contributions are included in this work: reliable shot classification, explicit
event detection, and elaborate applications. At the shot classification stage, color and geometric information are utilized
to classify video shots into several canonical views. To explicitly detect semantic events, rule-based decision and model-based
decision methods are developed. We emphasize that this system efficiently and exactly identifies what happened in baseball
games rather than roughly finding some interesting parts. On the basis of explicit event detection, many accurate and practical
applications such as automatic box score generation and game summarization could be built. The reported results show the effectiveness
of the proposed framework and demonstrate some research opportunities about bridging the semantic gap for sports videos.
6.
This paper proposes a framework to aid video analysts in detecting suspicious activity within the tremendous amounts of video
data that exist in today’s world of omnipresent surveillance video. Ideas and techniques for closing the semantic gap between
low-level machine readable features of video data and high-level events seen by a human observer are discussed. An evaluation
of the event classification and detection technique is presented and a future experiment to refine this technique is proposed.
These experiments are used as a lead to a discussion on the most suitable machine learning algorithm to learn the event representation
scheme proposed in this paper.
8.
This paper focuses on the integration of multimodal features for sport video structure analysis. The method relies on a statistical model which takes into account both the shot content and the interleaving of shots. This stochastic modelling is performed in the global framework of Hidden Markov Models (HMMs) that can be efficiently applied to merge audio and visual cues. Our approach is validated in the particular domain of tennis videos. The model integrates prior information about tennis content and editing rules. The basic temporal unit is the video shot. Visual features are used to characterize the type of shot view. Audio features describe the audio events within a video shot. Two sets of audio features are used in this study: the first one is extracted from a manual segmentation of the soundtrack and is more reliable. The second one is provided by an automatic segmentation and classification process. As a result of the overall HMM process, typical tennis scenes are simultaneously segmented and identified. The experiments illustrate the improvement of HMM-based fusion over indexing using only the best single media, when both media are of similar quality.
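As a rough illustration of how an HMM can segment and label a sequence of shots at the same time, the sketch below runs standard Viterbi decoding over per-shot observations. It is not the authors' model: the state names, observation labels and all probabilities are hypothetical values chosen only for the example, and every probability is assumed to be non-zero.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for the observations.

    Works in log space; all probabilities are assumed to be non-zero.
    """
    if not observations:
        return []
    # Log-score of the best path ending in each state at the first shot.
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + math.log(trans_p[p][s]))
            scores[s] = (best[-1][prev] + math.log(trans_p[prev][s])
                         + math.log(emit_p[s][obs]))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Hypothetical two-state model: one observation (shot view type) per shot.
STATES = ["rally", "break"]
START = {"rally": 0.5, "break": 0.5}
TRANS = {"rally": {"rally": 0.7, "break": 0.3},
         "break": {"rally": 0.4, "break": 0.6}}
EMIT = {"rally": {"court_view": 0.9, "close_up": 0.1},
        "break": {"court_view": 0.2, "close_up": 0.8}}

shots = ["court_view", "court_view", "close_up", "close_up"]
labels = viterbi(shots, STATES, START, TRANS, EMIT)
```

Audio cues could be merged in this sketch by making each observation a combined (visual, audio) symbol, which is one simple fusion choice among several.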
9.
In the conventional motion-compensated temporal filtering based wavelet coding scheme, where the group of picture structure and
low-pass frame position are fixed, variations in the motion activity of video sequences are not considered. In this paper, we
propose an adaptive group of picture structure selection scheme, in which the group of picture size and low-pass frame position
are selected based on mutual information. Furthermore, the temporal decomposition process is determined adaptively according
to the selected group of picture structure. Extensive experiments are carried out to compare the compression
performance of the proposed method with the conventional motion-compensated temporal filtering encoding scheme and the adaptive group
of picture structure in the standard scalable video coding model. The proposed low-pass frame selection can improve the compression
quality by about 0.3–0.5 dB compared to the conventional scheme in video sequences with high motion activity. In scenes
with uneven variation of motion activity, e.g. frequent shot cuts, the proposed adaptive group of picture size can achieve
better compression than the conventional scheme. Compared to the adaptive group of picture structure in the standard scalable
video coding model, the proposed group of picture structure scheme can lead to improvements of about 0.2–0.8 dB in sequences
with high motion activity or shot cuts.
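The abstract does not say how mutual information between frames is estimated. A minimal sketch, assuming a plain histogram estimate over co-located pixel intensities (the function name and the flat-list frame format are illustrative choices, not taken from the paper):

```python
import math
from collections import Counter

def mutual_information(frame_a, frame_b):
    """Mutual information in bits between two equally sized frames.

    Frames are flat sequences of pixel intensities; probabilities are
    estimated from the joint histogram of co-located pixels.
    """
    if len(frame_a) != len(frame_b) or not frame_a:
        raise ValueError("frames must be non-empty and of equal size")
    n = len(frame_a)
    joint = Counter(zip(frame_a, frame_b))
    hist_a = Counter(frame_a)
    hist_b = Counter(frame_b)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts.
        mi += (count / n) * math.log2(count * n / (hist_a[a] * hist_b[b]))
    return mi
```

Under this estimate, a sharp drop in mutual information between neighbouring frames suggests a large content change such as a shot cut, which is the kind of signal a group of picture boundary decision could use.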
10.
This paper addresses the problem of ensuring the integrity of a digital video and presents a scalable signature scheme for
video authentication based on cryptographic secret sharing. The proposed method detects spatial cropping and temporal jittering
in a video, yet is robust against frame dropping in the streaming video scenario. In our scheme, the authentication signature
is compact and independent of the size of the video. Given a video, we identify the key frames based on differential energy
between the frames. Considering video frames as shares, we compute the corresponding secret at three hierarchical levels.
The master secret is used as the digital signature to authenticate the video. The proposed signature scheme is scalable to three
hierarchical levels of signature computation based on the needs of different scenarios. We provide extensive experimental
results to show the utility of our technique in three different scenarios—streaming video, video identification and face tampering.
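The key-frame step can be pictured with a small sketch. The energy measure (sum of squared pixel differences), the threshold, and the rule of comparing each frame against the last selected key frame are all assumptions made for illustration; the abstract only states that key frames are identified from differential energy between frames.

```python
def differential_energy(prev_frame, curr_frame):
    """Sum of squared pixel differences between two equally sized frames."""
    return sum((c - p) ** 2 for p, c in zip(prev_frame, curr_frame))

def select_key_frames(frames, threshold):
    """Return indices of frames whose energy relative to the last key frame
    exceeds the (hypothetical) threshold. The first frame is always kept."""
    if not frames:
        return []
    keys = [0]
    for i in range(1, len(frames)):
        if differential_energy(frames[keys[-1]], frames[i]) > threshold:
            keys.append(i)
    return keys

# Tiny synthetic example: each frame is a flat list of two pixel values.
frames = [[0, 0], [0, 1], [5, 5], [5, 5], [0, 0]]
key_indices = select_key_frames(frames, threshold=10)
```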
11.
We present a real-time implementation of 2D to 3D video conversion using compressed video. In our method, compressed 2D video
is analyzed by extracting motion vectors. Using the motion vector maps, depth maps are built for each frame and the frames
are segmented to provide object-wise depth ordering. These data are then used to synthesize stereo pairs. 3D video synthesized
in this fashion can be viewed using any stereoscopic display. In our implementation, anaglyph projection was selected as the
3D visualization method, because it is the most suited to standard displays.
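For readers unfamiliar with anaglyph output, the final compositing step can be sketched as follows. This assumes the common red-cyan convention (red channel from the left view, green and blue from the right view); the abstract does not state which anaglyph variant the authors used, and the pixel-list image format is an illustrative choice.

```python
def make_anaglyph(left, right):
    """Combine a stereo pair into a red-cyan anaglyph.

    Both images are lists of (r, g, b) tuples of equal length.
    """
    if len(left) != len(right):
        raise ValueError("stereo images must have the same number of pixels")
    # Red from the left view, green and blue from the right view.
    return [(l[0], r[1], r[2]) for l, r in zip(left, right)]

left_view = [(255, 0, 0), (10, 20, 30)]
right_view = [(0, 255, 255), (40, 50, 60)]
anaglyph = make_anaglyph(left_view, right_view)
```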
12.
We present a study of using camera-phones and visual-tags to access mobile services. Firstly, a user-experience study is described in which participants were both observed learning to interact with a prototype mobile service and interviewed
about their experiences. Secondly, a pointing-device task is presented in which quantitative data was gathered regarding the speed and accuracy with which participants aimed and clicked
on visual-tags using camera-phones. We found that participants’ attitudes to visual-tag-based applications were broadly positive,
although they had several important reservations about camera-phone technology more generally. Data from our pointing-device
task demonstrated that novice users were able to aim and click on visual-tags quickly (well under 3 s per pointing-device
trial on average) and accurately (almost all meeting our defined speed/accuracy tradeoff of 6% error-rate). Based on our findings,
design lessons for camera-phone and visual-tag applications are presented.
13.
Many environmental, scientific, technical or medical database applications require effective and efficient mining of time
series, sequences or trajectories of measurements taken at different time points and positions forming large temporal or spatial
databases. In particular, the analysis of concurrent and multidimensional sequences poses new challenges in finding clusters
of arbitrary length and varying number of attributes. We present a novel algorithm capable of finding parallel clusters in
different subspaces and demonstrate our results for temporal and spatial applications. Our analysis of structural quality
parameters in rivers is successfully used by hydrologists to develop measures for river quality improvements.
14.
There are only a few ethical regulations that deal explicitly with robots, in contrast to a vast number of regulations that
may be applied. We will focus on ethical issues with regard to “responsibility and autonomous robots”, “machines as a replacement
for humans”, and “tele-presence”. Furthermore, we will examine examples from special fields of application (medicine and healthcare,
armed forces, and entertainment). We do not claim to present a complete list of ethical issues or of regulations in the field
of robotics, but we will demonstrate that there are legal challenges with regard to these issues.
15.
In this paper, we aim to provide adaptive multimedia services, especially video services, to end-users in an efficient and secure
manner. Users moving outside the office should be able to maintain an office-like environment at their current locations.
First, the agents within our proposed architecture negotiate the different communication and interaction factors autonomously
and dynamically. Moreover, we needed to develop a user agent in addition to service and system agents that could negotiate
the requirements and capabilities at run time to furnish the best possible service results. Thus we designed and integrated a
video indexing and key framing service within our overall agent-based architecture. We integrated this video indexing and
content-based analysis service to adapt the video content according to run time conditions. We designed a video XML schema
to validate the media content out of this multimedia service according to specific requirements and features, as we will describe
later.
16.
The complexity of group dynamics occurring in small group interactions often hinders the performance of teams. The availability
of rich multimodal information about what is going on during the meeting makes it possible to explore the possibility of providing
support to dysfunctional teams from facilitation to training sessions addressing both the individuals and the group as a whole.
A necessary step in this direction is that of capturing and understanding group dynamics. In this paper, we discuss a particular
scenario, in which meeting participants receive multimedia feedback on their relational behaviour, as a first step towards
increasing self-awareness. We describe the background and the motivation for a coding scheme for annotating meeting recordings
partially inspired by Bales’ Interaction Process Analysis. This coding scheme was aimed at identifying suitable observable
behavioural sequences. The study is complemented with an experimental investigation on the acceptability of such a service.
17.
The paper presents a real-time algorithm that compensates for image distortions due to atmospheric turbulence in video sequences,
while keeping the real moving objects in the video unharmed. The algorithm involves (1) generation of a “reference” frame;
(2) estimation, for each incoming video frame, of a local image displacement map with respect to the reference frame; (3)
segmentation of the displacement map into two classes: stationary and moving objects; (4) turbulence compensation of stationary
objects. Experiments with both simulated and real-life sequences have shown that the restored videos, generated in real-time
using standard computer hardware, exhibit excellent stability for stationary objects while retaining real motion.
18.
This paper describes the simulated car racing competition that was arranged as part of the 2007 IEEE Congress on Evolutionary
Computation. The game that was used as the domain for the competition, the controllers submitted as entries to the competition,
and its results are presented. With this paper, we hope to provide some insight into the efficacy of various computational
intelligence methods on a well-defined game task, as well as an example of one way of running a competition. In the process,
we provide a set of reference results for those who wish to use the simplerace game to benchmark their own algorithms. The paper is co-authored by the organizers and participants of the competition.
19.
Awareness systems have attracted significant research interest for their potential to support interpersonal relationships.
Investigations of awareness systems for the domestic environment have suggested that such systems can help individuals stay
in touch with dear friends or family and provide affective benefits to their users. Our research provides empirical evidence
to refine and substantiate such suggestions. We report our experience with designing and evaluating the ASTRA awareness system,
for connecting households and mobile family members. We introduce the concept of connectedness and its measurement through
the Affective Benefits and Costs of communication questionnaire (ABC-Q). We report results that attest to the benefits of sharing
experiences at the moment they happen without interrupting potential receivers. Finally, we document the role that lightweight,
picture-based communication can play in the range of communication media available.
20.
In the age of speech and voice recognition technologies, sign language recognition is an essential part of ensuring equal
access for deaf people. To date, sign language recognition research has mostly ignored facial expressions that arise as part
of a natural sign language discourse, even though they carry important grammatical and prosodic information. One reason is
that tracking the motion and dynamics of expressions in human faces from video is a hard task, especially with the high number
of occlusions from the signers’ hands. This paper presents a 3D deformable model tracking system to address this problem,
and applies it to sequences of native signers, taken from the National Center of Sign Language and Gesture Resources (NCSLGR),
with a special emphasis on outlier rejection methods to handle occlusions. The experiments conducted in this paper validate
the output of the face tracker against expert human annotations of the NCSLGR corpus, demonstrate the promise of the proposed
face tracking framework for sign language data, and reveal that the tracking framework picks up properties that ideally complement
human annotations for linguistic research.