Similar Documents
20 similar documents found (search time: 46 ms)
1.
We present a two-dimensional (2-D) mesh-based mosaic representation, consisting of an object mesh and a mosaic mesh for each frame and a final mosaic image, for video objects with mildly deformable motion in the presence of self and/or object-to-object (external) occlusion. Unlike classical mosaic representations, where successive frames are registered using global motion models, we map the uncovered regions in the successive frames onto the mosaic reference frame using local affine models, i.e., those of the neighboring mesh patches. The proposed method to compute this mosaic representation is tightly coupled with an occlusion-adaptive 2-D mesh tracking procedure, which consists of propagating the object mesh from frame to frame and updating both the object and mosaic meshes to optimize texture mapping from the mosaic to each instance of the object. The proposed representation has been applied to video object rendering and editing, including self transfiguration, synthetic transfiguration, and 2-D augmented reality in the presence of self and/or external occlusion. We also provide an algorithm to determine the minimum number of still views needed to reconstruct a replacement mosaic, which is needed for synthetic transfiguration. Experimental results are provided to demonstrate both the 2-D mesh-based mosaic synthesis and two different video object editing applications on real video sequences.
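A minimal sketch of the local affine mapping behind this kind of mesh-based mosaicking: each triangular mesh patch carries its own affine transform, estimated from the three node correspondences between the current frame and the mosaic reference. The function name and the masking strategy are illustrative, not taken from the paper.

```python
import numpy as np
import cv2

def warp_patch_to_mosaic(frame, tri_frame, tri_mosaic, mosaic_shape):
    """Warp one triangular mesh patch from the frame into the mosaic reference."""
    # Affine transform defined by the three mesh-node correspondences.
    A = cv2.getAffineTransform(tri_frame.astype(np.float32),
                               tri_mosaic.astype(np.float32))
    warped = cv2.warpAffine(frame, A, (mosaic_shape[1], mosaic_shape[0]))
    # Restrict the contribution to the destination triangle.
    mask = np.zeros(mosaic_shape[:2], np.uint8)
    cv2.fillConvexPoly(mask, tri_mosaic.astype(np.int32), 255)
    return warped, mask
```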

2.
In this paper we address the problem of fusing images from many video cameras or a moving video camera. The captured images have obvious motion parallax, but they will be aligned and integrated into a few mosaics with a large field-of-view (FOV) that preserve 3D information. We have developed a compact geometric representation that can re-organize the original perspective images into a set of parallel projections with different oblique viewing angles. In addition to providing a wide field of view, mosaics with various oblique views capture occlusion regions that cannot be seen in a usual nadir view. Stereo mosaic pairs can be formed from mosaics with different oblique viewing angles. This representation can be used both as an advanced interface for interactive 3D video and as a pre-processing step for 3D reconstruction. A ray interpolation approach for generating the parallel-projection mosaics is presented, and efficient 3D scene/object rendering based on multiple parallel-projection mosaics is discussed. Several real-world examples are provided, with applications ranging from aerial video surveillance and environmental monitoring to ground mobile robot navigation and under-vehicle inspection.
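A simplified slit-scan sketch of a parallel-projection mosaic: for a camera translating sideways, taking a fixed vertical slit from every frame approximates a parallel projection; shifting the slit off-center yields an oblique viewing angle, and two symmetric slits give a stereo mosaic pair. The paper's ray interpolation is omitted here, and the slit positions are assumptions.

```python
import numpy as np

def slit_mosaic(frames, slit_offset=0, slit_width=1):
    """frames: iterable of HxWx3 arrays from a laterally moving camera."""
    columns = []
    for f in frames:
        c = f.shape[1] // 2 + slit_offset          # oblique view if offset != 0
        columns.append(f[:, c:c + slit_width])
    return np.hstack(columns)

# Stereo mosaic pair from symmetric oblique slits:
# left, right = slit_mosaic(frames, -40), slit_mosaic(frames, +40)
```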

3.
4.

Video compression is one of the pre-processing steps in video streaming. When moving objects are captured with moving cameras, a large amount of redundant data is recorded along with the dynamic changes. In this paper, these changes are identified using various geometric transformations. To register all these dynamic relations with minimal storage, a tensor representation is used. The similarity between frames is measured using canonical correlation analysis (CCA). Key frames are identified by comparing the canonical auto-correlation score of the candidate key frame with the CCA scores of the other frames. In this method, the coded video is represented as a tensor consisting of an intra-coded key frame, a vector of P-frame identifiers, the transformation of each variable-sized block, and an information fusion with three levels of abstraction (measurements, characteristics, and decisions) that combines all these factors into a single entity. Each dimension can have a variable size, which allows all characteristics to be stored without losing any information. The proposed video compression method is applied to underwater videos, which have high redundancy because both the camera and the underwater species are in motion. The method is compared with H.264, H.265, and some recent compression methods. Peak signal-to-noise ratio (PSNR) and compression ratio at various bit rates are used to evaluate the performance. The results show that the proposed method achieves a high compression ratio with comparatively low loss.
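A hedged sketch of CCA-based frame similarity for key-frame selection: each downsampled grayscale frame is treated as a data matrix whose rows are samples, and the first canonical correlation between two frames serves as the similarity score. The downsampling size and the use of scikit-learn's CCA are illustrative choices, not details taken from the paper.

```python
import numpy as np
import cv2
from sklearn.cross_decomposition import CCA

def cca_score(frame_a, frame_b, size=(32, 32)):
    X = cv2.resize(cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY), size).astype(float)
    Y = cv2.resize(cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY), size).astype(float)
    u, v = CCA(n_components=1).fit_transform(X, Y)
    return abs(np.corrcoef(u[:, 0], v[:, 0])[0, 1])  # near 1.0 = highly similar

# A frame whose score against the current key frame drops below a threshold
# becomes a candidate for the next key frame.
```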

5.
Color demosaicking is critical to the image quality of digital still and video cameras that use a single-sensor array. Limited by the mosaic sampling pattern of the color filter array (CFA), color artifacts may occur in a demosaicked image in areas of high-frequency and/or sharp color transition structures. However, a color digital video camera captures a sequence of mosaic images, and the temporal dimension of the color signals provides a rich source of information about the scene via camera and object motions. This paper proposes an inter-frame demosaicking approach that takes advantage of all three forms of pixel correlation: spatial, spectral, and temporal. By motion estimation and statistical data fusion between adjacent mosaic frames, the new approach can remove much of the color artifacts that survive intra-frame demosaicking and also improve tone reproduction accuracy. Empirical results show that the proposed inter-frame demosaicking approach consistently outperforms its intra-frame counterparts in both peak signal-to-noise ratio and subjective visual quality.
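A minimal sketch of the inter-frame idea: each Bayer frame is first demosaicked intra-frame, a neighboring frame is motion-compensated toward it, and the two estimates are fused. The Bayer pattern, Farneback flow, and plain averaging below are stand-ins for the paper's motion estimation and statistical fusion.

```python
import numpy as np
import cv2

def interframe_demosaic(bayer_t, bayer_prev):
    cur = cv2.cvtColor(bayer_t, cv2.COLOR_BAYER_BG2BGR)      # intra-frame pass
    prev = cv2.cvtColor(bayer_prev, cv2.COLOR_BAYER_BG2BGR)
    g0 = cv2.cvtColor(cur, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    # Dense flow from the current frame toward the previous one.
    flow = cv2.calcOpticalFlowFarneback(g0, g1, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = g0.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    warped = cv2.remap(prev, (xs + flow[..., 0]).astype(np.float32),
                       (ys + flow[..., 1]).astype(np.float32), cv2.INTER_LINEAR)
    # Temporal fusion of the two demosaicked estimates.
    return ((cur.astype(np.float32) + warped.astype(np.float32)) / 2).astype(np.uint8)
```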

6.
The exploitation of video data requires methods able to extract high-level information from the images. Video summarization, video retrieval, and video surveillance are examples of such applications. In this paper, we tackle the challenging problem of recognizing dynamic video content from low-level motion features. We adopt a statistical approach involving modeling, (supervised) learning, and classification issues. Because of the diversity of video content (even for a given class of events), we have to design appropriate models of visual motion and learn them from videos. We have defined original parsimonious global probabilistic motion models, both for the dominant image motion (assumed to be due to the camera motion) and the residual image motion (related to scene motion). Motion measurements include affine motion models to capture the camera motion and low-level local motion features to account for scene motion. Motion learning and recognition are solved using maximum likelihood criteria. To validate the interest of the proposed motion modeling and recognition framework, we report dynamic content recognition results on sports videos.
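A hedged sketch of fitting a 6-parameter affine model to image motion, the kind of dominant-motion model the abstract refers to; the flow source (good features plus Lucas-Kanade tracking) and the plain least-squares fit are illustrative, not the paper's robust ML estimator.

```python
import numpy as np
import cv2

def dominant_affine(gray0, gray1):
    pts0 = cv2.goodFeaturesToTrack(gray0, maxCorners=500, qualityLevel=0.01,
                                   minDistance=8)
    pts1, ok, _ = cv2.calcOpticalFlowPyrLK(gray0, gray1, pts0, None)
    p0 = pts0[ok.ravel() == 1].reshape(-1, 2)
    p1 = pts1[ok.ravel() == 1].reshape(-1, 2)
    # Solve [x y 1 0 0 0; 0 0 0 x y 1] * theta = [x'; y'] in least squares.
    n = len(p0)
    A = np.zeros((2 * n, 6)); b = np.empty(2 * n)
    A[0::2, 0:2], A[0::2, 2] = p0, 1; b[0::2] = p1[:, 0]
    A[1::2, 3:5], A[1::2, 5] = p0, 1; b[1::2] = p1[:, 1]
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return theta.reshape(2, 3)   # residual flow = scene (non-camera) motion
```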

7.
Very low bit-rate coding requires new paradigms that go well beyond pixel- and frame-based video representations. We introduce a novel content-based video representation using tridimensional entities: textured object models and pose estimates. The multiproperty object models carry stochastic information about the shape and texture of each object present in the scene. The pose estimates define the position and orientation of the objects for each frame. This representation is compact. It provides an alternative means of handling video by manipulating and compositing three-dimensional (3-D) entities. We call this representation tridimensional video compositing, or 3DVC for short. We present the 3DVC framework and describe the methods used to incrementally construct the object models and the pose estimates from unregistered noisy depth and texture measurements. We also describe a method for video frame reconstruction based on 3-D scene assembly, and discuss potential applications of 3DVC to video coding and content-based handling. 3DVC assumes that the objects in the scene are rigid and segmented. By assuming segmentation, we do not address the difficult questions of nonrigid segmentation and multiple object segmentation. In our experiments, segmentation is obtained via depth thresholding. It is important to note that 3DVC is independent of the segmentation technique adopted. Experimental results with synthetic and real video sequences, where compression ratios in the range of 1:150 to 1:2700 are achieved, demonstrate the applicability of the proposed representation to very low bit-rate coding.
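The experiments above segment objects by depth thresholding; this is a minimal sketch of that step on a depth map, with the thresholds and morphological cleanup as illustrative choices.

```python
import numpy as np
import cv2

def segment_by_depth(depth, near_mm=300, far_mm=1500):
    """depth: HxW array in millimetres; returns a binary foreground mask."""
    mask = ((depth > near_mm) & (depth < far_mm)).astype(np.uint8) * 255
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)   # drop speckles
    return cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)  # fill small holes
```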

8.
Temporal subsampling of video sequences is of great significance for video transmission over low bit-rate channels and for video storage under limited storage capacity. The commonly used uniform temporal subsampling methods can sometimes discard important motion-change information in a video sequence. To address this, this paper proposes a temporal subsampling technique based on the entropy of motion-change information. Through inter-frame motion field analysis, the overall energy of the motion field and the energy of the motion-compensated residual are combined to describe the motion-change information of the sequence, and the temporal subsampling instants are determined according to a maximum-entropy criterion. Extensive comparative experiments show that this method clearly outperforms uniform temporal subsampling: it reflects the motion changes of the image sequence more completely and is more conducive to understanding the video content.
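A hedged reconstruction of the selection rule: if each subsampling interval carries a share p_i of the sequence's total motion-change energy, the entropy -sum(p_i log p_i) is maximized when the shares are equal, so the sample instants can be placed at equal quantiles of cumulative motion energy. The energy definition below (flow energy plus residual energy) follows the abstract's description; everything else is an assumption.

```python
import numpy as np

def pick_sample_instants(motion_energy, k):
    """motion_energy: per-frame motion-change energy; returns k frame indices."""
    c = np.cumsum(motion_energy, dtype=float)
    c /= c[-1]
    targets = (np.arange(k) + 0.5) / k          # equal-energy quantiles
    return np.searchsorted(c, targets)

# energy = flow_energy + residual_energy       # per the abstract's description
```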

9.
Information about camera operations such as zoom, focus, pan, tilt, and dollying is significant not only for efficient video coding but also for content-based video representation. In this paper we describe a high-precision camera operation parameter measurement system and apply it to inferring image motion. First, we outline the implemented system, which is designed to provide camera operation parameters with the high precision required for image coding applications. Second, we calibrate the camera lens to determine its exact optical properties; a pin-hole camera model with second-order radial lens distortion and a two-image calibration technique are employed. Finally, we use the pan, tilt, and zoom parameters measured by the system to infer image motion. The experimental results show that the inferred motion coincides with the actual motion very closely. Compared to motion analysis techniques that estimate camera motion from video sequences, our approach does not suffer from ambiguity and thus can provide reliable and accurate global image motion. The obtained motion can be applied to image mosaicing, moving object segmentation, object-based image coding, etc.
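A minimal sketch of the pin-hole model with second-order radial distortion mentioned above: an ideal normalized image point (x, y) maps to the observed point through a 1 + k1*r^2 radial factor. The coefficient k1 is illustrative.

```python
import numpy as np

def apply_radial_distortion(xy, k1):
    """xy: Nx2 normalized image coordinates; returns distorted coordinates."""
    r2 = np.sum(xy ** 2, axis=1, keepdims=True)   # squared radial distance
    return xy * (1.0 + k1 * r2)

# Pixel coordinates then follow from the camera intrinsics:
#   u = fx * x_d + cx,  v = fy * y_d + cy
```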

10.
Aerial video surveillance and exploitation
There is growing interest in performing aerial surveillance using video cameras. Compared to traditional framing cameras, video cameras provide the capability to observe ongoing activity within a scene and to automatically control the camera to track the activity. However, the high data rates and relatively small field of view of video cameras present new technical challenges that must be overcome before such cameras can be widely used. In this paper, we present a framework and details of the key components for real-time, automatic exploitation of aerial video for surveillance applications. The framework involves separating an aerial video into the natural components corresponding to the scene. Three major components of the scene are the static background geometry, the moving objects, and the appearance of the static and dynamic components of the scene. In order to delineate videos into these scene components, we have developed real-time image-processing techniques for 2-D/3-D frame-to-frame alignment, change detection, camera control, and tracking of independently moving objects in cluttered scenes. The geo-location of video and tracked objects is estimated by registration of the video to controlled reference imagery, elevation maps, and site models. Finally, static, dynamic, and reprojected mosaics may be constructed for compression, enhanced visualization, and mapping applications.
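A hedged sketch of the 2-D frame-to-frame alignment and change-detection stage: a homography is estimated from ORB feature matches, the previous frame is warped onto the current one, and changes are whatever the warp cannot explain. The feature/matcher choices and threshold are illustrative.

```python
import numpy as np
import cv2

def align_and_detect_changes(prev_gray, cur_gray, diff_thresh=30):
    orb = cv2.ORB_create(1000)
    k0, d0 = orb.detectAndCompute(prev_gray, None)
    k1, d1 = orb.detectAndCompute(cur_gray, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d0, d1)
    p0 = np.float32([k0[m.queryIdx].pt for m in matches])
    p1 = np.float32([k1[m.trainIdx].pt for m in matches])
    H, _ = cv2.findHomography(p0, p1, cv2.RANSAC, 3.0)   # global alignment
    h, w = cur_gray.shape
    stabilized = cv2.warpPerspective(prev_gray, H, (w, h))
    change = cv2.absdiff(cur_gray, stabilized) > diff_thresh  # moving objects
    return H, change
```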

11.
The network camera, made possible by recent advances in the integration of sensing, compression, and communication hardware, is a new video source that can be easily deployed and remotely managed. Unobtrusively located along highways, at airports, or in office buildings, such cameras can form a visual sensor network, or camera web, an extremely rich source of visual information. In its infancy today, camera web deployment will likely accelerate in the future, and one can expect visual sensing devices to eventually become as ubiquitous as electric bulbs. While the capturing hardware has evolved tremendously, the hardware and algorithms necessary for effective analysis and efficient communication of multi-camera data clearly lag. In this article, I overview one particular aspect of visual data analysis, namely space-time video segmentation, which is often a prerequisite for motion estimation, video compression, event detection, scene understanding, etc. I introduce the concept of the object tunnel, a 3-D surface in space-time through which a video object travels, and the associated concept of the occlusion volume. I present examples of object tunnels and occlusion volumes on surveillance data that, upon further processing, may lead to automatic event detection or scene understanding. Finally, I describe challenges in extending video analysis algorithms to visual sensor networks, and I outline some possible approaches.

12.
In this paper we address the problem of mosaic construction from MPEG-1/2 compressed video for the purpose of video browsing. State-of-the-art mosaicing methods work on raw video, but most video content is available in compressed form such as MPEG-1/2. Applying these methods to compressed video requires full decoding, which is very costly. The resulting mosaic is in general too large to display on the screen and is thus inappropriate for the purpose of video browsing. Therefore, we directly extract very low-resolution frames from the MPEG-1/2 compressed video for the mosaic construction and then apply a super-resolution (SR) method based on iterative backprojection in order to increase the mosaic resolution and its visual quality. The global motion used in the SR method for aligning and warping the frames is estimated from motion information contained in the compressed stream. We also use the estimated global motion in the blur estimation and in the choice of the degradation model used for the restoration in the SR algorithm. The method for SR mosaic construction from MPEG-1/2 compressed video presented in this paper is less costly than mosaic construction from fully decoded video. Furthermore, the resulting mosaic size is more appropriate for the purpose of video browsing.
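A compact sketch of super-resolution by iterative backprojection, the method named above: the current high-resolution estimate is degraded (blur plus decimation), compared against the low-resolution input, and the error is projected back. The single-image form, fixed Gaussian blur, and iteration count are simplifying assumptions; the paper additionally warps each frame with global motion estimated from the MPEG stream.

```python
import numpy as np
import cv2

def ibp_superresolve(lr, scale=2, iters=20):
    h, w = lr.shape[:2]
    hr = cv2.resize(lr, (w * scale, h * scale),
                    interpolation=cv2.INTER_CUBIC).astype(np.float32)
    for _ in range(iters):
        # Simulate the degradation of the current estimate.
        simulated = cv2.resize(cv2.GaussianBlur(hr, (5, 5), 1.0), (w, h),
                               interpolation=cv2.INTER_AREA)
        err = lr.astype(np.float32) - simulated
        # Back-project the residual into the high-resolution estimate.
        hr += cv2.resize(err, (w * scale, h * scale),
                         interpolation=cv2.INTER_CUBIC)
    return np.clip(hr, 0, 255).astype(np.uint8)
```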

13.
The performance of motion-compensated discrete cosine transform (MC-DCT) video coding is improved by using region-adaptive subband image coding [18]. On the assumption that the video is acquired from a camera on a moving platform and the distance between the camera and the scene is large enough, both the motion of the camera and the motion of moving objects in a frame are compensated. For the compensation of camera motion, a feature matching algorithm is employed. Several feature points extracted using a Sobel operator are used to compensate the camera motion of translation, rotation, and zoom. The illumination change between frames is also compensated. Motion-compensated frame differences are divided into three arbitrarily shaped regions: stationary background, moving objects, and newly emerging areas. Different quantizers are used for different regions. Compared to conventional MC-DCT video coding using the block matching algorithm, our video coding scheme shows about a 1.0-dB improvement on average for the experimental video samples.
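A small sketch of the Sobel-based feature point extraction step: gradient magnitude is computed with Sobel operators and the strongest responses are kept as feature points for matching. The point count and grid-based spreading are illustrative choices.

```python
import numpy as np
import cv2

def sobel_feature_points(gray, n_points=50, step=16):
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    mag = cv2.magnitude(gx, gy)
    pts = []
    for y in range(0, gray.shape[0] - step, step):      # coarse grid keeps
        for x in range(0, gray.shape[1] - step, step):  # points spread out
            block = mag[y:y + step, x:x + step]
            dy, dx = np.unravel_index(np.argmax(block), block.shape)
            pts.append((x + dx, y + dy, block[dy, dx]))
    pts.sort(key=lambda p: -p[2])                       # strongest first
    return [(x, y) for x, y, _ in pts[:n_points]]
```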

14.
This paper addresses the problem of side information extraction for distributed coding of videos captured by a camera moving in a 3-D static environment. Examples of targeted applications are augmented reality, remote-controlled robots operating in hazardous environments, and remote exploration by drones. It explores the benefits of the structure-from-motion paradigm for distributed coding of this type of video content. Two interpolation methods constrained by the scene geometry, based either on block matching along epipolar lines or on 3-D mesh fitting, are first developed. These techniques are based on a robust algorithm for sub-pel matching of feature points, which leads to semi-dense correspondences between key frames. However, their rate-distortion (RD) performance is limited by misalignments between the side information and the actual Wyner-Ziv (WZ) frames due to the assumption of linear motion between key frames. To cope with this problem, two feature point tracking techniques are introduced, which recover the camera parameters of the WZ frames. The first technique, in which the frames remain encoded separately, performs tracking at the decoder and leads to significant RD performance gains. The second technique further improves the RD performance by allowing a limited amount of tracking at the encoder. As an additional benefit, statistics on the tracks allow the encoder to adapt the key frame frequency to the motion content of the video.
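A hedged sketch of block matching constrained to the epipolar line, the first interpolation method above: for a point in one key frame, candidates in the other key frame are sampled only along l' = F x, and the best SAD match is kept. The fundamental matrix F is assumed known (e.g., from structure-from-motion), and the point is assumed to lie away from the image border.

```python
import numpy as np

def match_along_epipolar(img0, img1, pt, F, half=8, n_samples=200):
    x = np.array([pt[0], pt[1], 1.0])
    a, b, c = F @ x                          # epipolar line a*u + b*v + c = 0
    h, w = img1.shape
    ref = img0[pt[1]-half:pt[1]+half, pt[0]-half:pt[0]+half].astype(float)
    best, best_pt = np.inf, None
    for u in np.linspace(half, w - half - 1, n_samples):
        v = -(a * u + c) / b                 # assumes the line is not vertical
        u_i, v_i = int(round(u)), int(round(v))
        if half <= v_i < h - half:
            cand = img1[v_i-half:v_i+half, u_i-half:u_i+half].astype(float)
            sad = np.abs(ref - cand).sum()   # sum of absolute differences
            if sad < best:
                best, best_pt = sad, (u_i, v_i)
    return best_pt
```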

15.
New methods for dynamic mosaicking
This paper presents a new technique for the creation of a sequence of mosaic images from an original video shot. A mosaic image represents, on a single image, the scene background seen over the whole sequence; its creation requires the estimation of warping parameters and the use of a blending technique. The warping parameters permit each original image to be represented in the mosaic reference frame. An estimation method based on a direct comparison between the current original image and the previously calculated mosaic is proposed. A new analytic minimization criterion is also designed to optimize the determination of the blending coefficient used to update the mosaic image with a new original image. This criterion is based on constraints related to the temporal variations of the background, the temporal delay, and the resolution of the created mosaic images, and its minimization can be performed analytically. Finally, the proposed method is applied to the creation of new video sequences in which the camera viewpoint, the camera focal length, or the image size is modified. This approach has been tested and validated on real video sequences with large camera motion.
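A minimal sketch of the recursive mosaic update described above: after warping the new frame into the mosaic reference, each covered pixel is blended with weight alpha. Here alpha is a constant; in the paper it is obtained by analytically minimizing a criterion balancing background variation, temporal delay, and mosaic resolution.

```python
import numpy as np

def update_mosaic(mosaic, warped_frame, coverage_mask, alpha=0.1):
    """coverage_mask: boolean HxW, True where the warped frame has pixels."""
    out = mosaic.astype(np.float32)
    m = coverage_mask
    out[m] = (1 - alpha) * out[m] + alpha * warped_frame.astype(np.float32)[m]
    return out.astype(mosaic.dtype)
```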

16.
Common image compression techniques suitable for general-purpose use may be less effective for specific applications such as video surveillance. Since a stationary surveillance camera always targets a fixed scene, its captured images exhibit high consistency in content and structure. In this paper, we propose a surveillance image compression technique based on dictionary learning that fully exploits the constant characteristics of a target scene. The method transforms images over sparsely tailored over-complete dictionaries learned directly from image samples rather than over a fixed dictionary, and thus can approximate an image with fewer coefficients. A set of dictionaries trained off-line is applied for sparse representation. An adaptive image blocking method is developed so that the encoder can represent an image in a texture-aware way. Experimental results show that the proposed algorithm significantly outperforms JPEG and JPEG 2000 in terms of both the quality of the reconstructed images and the compression ratio.
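A hedged sketch of the compression idea: a dictionary is learned offline from patches of the fixed surveillance scene, and new images are encoded as a few sparse coefficients per patch. The patch size, dictionary size, and sparsity are illustrative, and the paper's texture-aware adaptive blocking is omitted.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.feature_extraction.image import extract_patches_2d

def train_scene_dictionary(sample_images, patch=8, n_atoms=256):
    patches = np.vstack([
        extract_patches_2d(img, (patch, patch), max_patches=2000)
          .reshape(-1, patch * patch)
        for img in sample_images]).astype(float)
    patches -= patches.mean(axis=1, keepdims=True)   # remove per-patch DC
    dl = MiniBatchDictionaryLearning(n_components=n_atoms,
                                     transform_algorithm='omp',
                                     transform_n_nonzero_coefs=5)
    dl.fit(patches)
    return dl   # dl.transform(patches) yields the sparse code to entropy-code
```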

17.
To meet the requirements of high-speed, real-time transmission and storage of video data from an airborne large-area-array CCD camera, this paper designs a compression system based on the H.264 video coding algorithm. The system is divided into a CCD front end, a video compression unit, a video display unit, a compressed bitstream storage unit, and a compression analysis unit. The video compression unit adopts the high-performance video DSP TMS320DM642, and the H.264 compression algorithm is implemented in C on the CCS3.1 software platform. To make the compression algorithm run efficiently, DSP/BIOS is used to manage the hardware and software resources, and an EDMA-based high-speed data transfer strategy is adopted to guarantee real-time data transmission. Experimental results show that the proposed compression system works stably and has good compression performance: for compression ratios in the range 40:1 to 10:1, the average PSNR is above 35 dB, meeting the requirements of airborne CCD camera applications.
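The evaluation above reports average PSNR at given compression ratios; for reference, a standard 8-bit PSNR computation (a generic sketch, not code from the described DSP system):

```python
import numpy as np

def psnr(ref, rec):
    """Peak signal-to-noise ratio in dB for 8-bit images."""
    mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
    return float('inf') if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)
```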

18.
As the human visual system is highly sensitive to motion present in a scene, motion saliency forms an important feature of a video sequence. Motion information is used for video compression, object segmentation, object tracking, and many other applications. Though its applications are extensive, accurate detection of motion in a given video is complex and computationally expensive for the solutions reported in the literature. Decomposing a video into visually similar and residual videos is a robust way to detect motion-salient regions. The existing decomposition techniques require large execution times, as the standard form of the problem is NP-hard. We propose a novel algorithm which detects the motion-salient regions by decomposing the input video into background and residual videos in much less time without sacrificing the accuracy of the decomposition. In addition, the proposed algorithm is completely parallelizable, which ensures a further reduction in computation time on modern multicore processors.
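A hedged sketch of detecting motion-salient regions by decomposing video into background and residual parts: a temporal median stands in for the paper's fast decomposition, and the thresholding is trivially parallelizable across frames, matching the abstract's emphasis. The threshold value is an assumption.

```python
import numpy as np

def motion_saliency(frames, thresh=25):
    """frames: T x H x W grayscale array; returns T x H x W boolean saliency."""
    background = np.median(frames, axis=0)              # visually similar part
    residual = np.abs(frames.astype(np.float32) - background)
    return residual > thresh                            # motion-salient pixels
```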

19.
For shaky videos with scene changes and small depth-of-field variation, this paper proposes a video stabilization method based on panoramas. Unlike traditional stabilization methods, the proposed algorithm uses the original video sequence to generate a wide-angle view and outputs the stabilized video synthetically by extracting frames from it at a later stage. The camera motion trajectory is obtained through block-based motion estimation. On this basis, to eliminate the non-smoothness of the original motion trajectory caused by video jitter, a camera motion pattern decision method based on motion vector statistics is proposed, and the corresponding smoothing scheme is selected accordingly. Finally, frames are extracted from the panorama according to the smoothed motion trajectory to synthesize a smooth, stable video. Experimental results show that the proposed stabilization method can effectively remove large jitter while avoiding post-processing such as cropping the video width, and thus has good stabilization capability.
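A minimal sketch of the trajectory-smoothing step: the accumulated camera path is low-pass filtered, and frames are re-sampled along the smooth path. The moving-average filter is one plausible choice; the paper selects the smoothing mode from motion-vector statistics.

```python
import numpy as np

def smooth_trajectory(dx, dy, radius=15):
    """dx, dy: per-frame motion estimates; returns the smoothed cumulative path."""
    path = np.cumsum(np.stack([dx, dy], axis=1), axis=0)
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    pad = np.pad(path, ((radius, radius), (0, 0)), mode='edge')
    smooth = np.stack([np.convolve(pad[:, i], kernel, mode='valid')
                       for i in range(2)], axis=1)
    return smooth   # offsets = smooth - path give the stabilizing warp
```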

20.
Interactive 3-D Video Representation and Coding Technologies
Interactivity, in the sense of being able to explore and navigate audio-visual scenes by freely choosing the viewpoint and viewing direction, is a key feature of new and emerging audio-visual media. This paper gives an overview of suitable technology for such applications, with a focus on international standards, which are beneficial for consumers, service providers, and manufacturers. We first give a general classification and overview of interactive scene representation formats as commonly used in the computer graphics literature. Then, we describe popular standard formats for interactive three-dimensional (3-D) scene representation and the creation of virtual environments, the Virtual Reality Modeling Language (VRML) and the MPEG-4 BInary Format for Scenes (BIFS), with some examples. Recent extensions to MPEG-4 BIFS, the Animation Framework eXtension (AFX), providing advanced computer graphics tools, are explained and illustrated. New technologies mainly targeted at the reconstruction, modeling, and representation of dynamic real-world scenes are studied further. The user shall be able to navigate photorealistic scenes within certain restrictions, which can be roughly defined as 3-D video. Omnidirectional video is an extension of the planar two-dimensional (2-D) image plane to a spherical or cylindrical image plane; any 2-D view in any direction can be rendered from this overall recording to give the user the impression of looking around. In interactive stereo, two views, one for each eye, are synthesized to provide the user with an adequate depth cue of the observed scene. Head-motion parallax viewing can be supported in a certain operating range if sufficient depth or disparity data are delivered with the video data. In free-viewpoint video, a dynamic scene is captured by a number of cameras; the input data are transformed into a special data representation that enables interactive navigation through the dynamic scene environment.
