Similar Documents
20 similar documents found (search time: 31 ms)
1.
Interactive 3-D Video Representation and Coding Technologies
Interactivity, in the sense of being able to explore and navigate audio-visual scenes by freely choosing viewpoint and viewing direction, is a key feature of new and emerging audio-visual media. This paper gives an overview of suitable technology for such applications, with a focus on international standards, which are beneficial for consumers, service providers, and manufacturers. We first give a general classification and overview of interactive scene representation formats as commonly used in the computer graphics literature. Then, we describe popular standard formats for interactive three-dimensional (3-D) scene representation and creation of virtual environments, the virtual reality modeling language (VRML) and the MPEG-4 BInary Format for Scenes (BIFS), with some examples. Recent extensions to MPEG-4 BIFS, the Animation Framework eXtension (AFX), providing advanced computer graphics tools, are explained and illustrated. New technologies mainly targeted at reconstruction, modeling, and representation of dynamic real-world scenes are further studied. These let the user navigate photorealistic scenes within certain restrictions, and can be roughly defined as 3-D video. Omnidirectional video extends the planar two-dimensional (2-D) image plane to a spherical or cylindrical image plane; any 2-D view in any direction can be rendered from this overall recording to give the user the impression of looking around. In interactive stereo, two views, one for each eye, are synthesized to provide the user with an adequate depth cue of the observed scene. Head-motion-parallax viewing can be supported within a certain operating range if sufficient depth or disparity data are delivered with the video data. In free viewpoint video, a dynamic scene is captured by a number of cameras, and the input data are transformed into a special data representation that enables interactive navigation through the dynamic scene environment.
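The "look around" rendering described for omnidirectional video amounts to mapping a viewing direction to a pixel in the spherical recording. A minimal sketch for the common equirectangular layout (the mapping convention here is an assumption, not taken from the paper):

```python
import numpy as np

def direction_to_equirect(d, width, height):
    """Map a unit viewing direction to (col, row) in an equirectangular panorama.

    Longitude spans [-pi, pi] across the width, latitude [-pi/2, pi/2]
    down the height (one common convention; others differ).
    """
    x, y, z = d / np.linalg.norm(d)
    lon = np.arctan2(x, z)          # yaw around the vertical axis
    lat = np.arcsin(y)              # elevation above the horizon
    col = (lon / (2 * np.pi) + 0.5) * (width - 1)
    row = (0.5 - lat / np.pi) * (height - 1)
    return col, row

# Looking straight ahead (+z) lands at the centre of a 2048x1024 panorama.
col, row = direction_to_equirect(np.array([0.0, 0.0, 1.0]), 2048, 1024)
```

Rendering a perspective view then just evaluates this mapping for the direction of every output pixel and samples the panorama there.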

2.
Video object extraction is a key technology in content-based video coding. A novel video object extraction algorithm based on two-dimensional (2-D) mesh-based motion analysis is proposed in this paper. First, a 2-D mesh fitting the original frame image is obtained via a feature detection algorithm. Then, higher-order-statistics motion analysis is applied to the 2-D mesh representation to obtain an initial motion detection mask. After post-processing, the final segmentation mask is quickly obtained, and the video object is thus effectively extracted. Experimental results show that the proposed algorithm combines the merits of mesh-based and pixel-based segmentation algorithms, achieving satisfactory subjective and objective performance while dramatically increasing segmentation speed.
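The idea behind the motion detection step is that higher-order statistics of the inter-frame difference separate real motion from near-Gaussian noise. A simplified stand-in (a thresholded local fourth moment on the pixel grid rather than on mesh nodes; the threshold and window are assumptions):

```python
import numpy as np

def hos_motion_mask(f1, f2, thresh):
    """Moving-area detection from two frames via a local fourth-order moment.

    Noise (small, near-Gaussian) yields a small fourth moment, real motion
    a large one, so thresholding the local moment gives a motion mask.
    """
    d = f2.astype(float) - f1.astype(float)
    p = np.pad(d ** 4, 1, mode="edge")
    # 3x3 neighbourhood average of d^4
    h, w = d.shape
    m = sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
    return m > thresh

f1 = np.zeros((8, 8))
f2 = f1.copy()
f2[2:5, 2:5] = 10.0          # a patch that moved in
mask = hos_motion_mask(f1, f2, thresh=50.0)
```

Post-processing (morphological cleanup, hole filling) would then turn this raw mask into the final segmentation mask.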

3.
Three-dimensional (3-D) scene reconstruction from broadcast video is a challenging problem with many potential applications, such as 3-D TV, free-view TV, augmented reality, or three-dimensionalization of two-dimensional (2-D) media archives. In this paper, a flexible and effective system capable of efficiently reconstructing 3-D scenes from broadcast video is proposed, under the assumption that there is relative motion between camera and scene/objects. The system requires no a priori information or input other than the video sequence itself, and is capable of estimating the internal and external camera parameters, performing a 3-D motion-based segmentation, and computing a dense depth field. The system also serves as a showcase for some novel approaches to moving object segmentation and to sparse and dense reconstruction problems. According to simulations on both synthetic and real data, the system achieves promising performance for typical TV content, indicating that it is a significant step towards 3-D reconstruction of scenes from broadcast video.

4.
This paper integrates fully automatic video object segmentation and tracking, including detection and assignment of uncovered regions, in a 2-D mesh-based framework. Particular contributions of this work are (i) a novel video object segmentation method that is posed as a constrained maximum contrast path search problem along the edges of a 2-D triangular mesh, and (ii) a 2-D mesh-based uncovered region detection method along the object boundary as well as within the object. At the first frame, an optimal number of feature points are selected as nodes of a 2-D content-based mesh. These points are classified as moving (foreground) and stationary nodes based on multi-frame node motion analysis, yielding a coarse estimate of the foreground object boundary. Color differences across triangles near the coarse boundary are employed for a maximum contrast path search along the edges of the 2-D mesh to refine the boundary of the video object. Next, we propagate the refined boundary to the subsequent frame by using motion vectors of the node points to form the coarse boundary at the next frame. We detect occluded regions by using motion-compensated frame differences and range-filtered edge maps. The boundaries of detected uncovered regions are then refined by using the search procedure. These regions are either appended to the foreground object or tracked as new objects. The segmentation procedure is re-initialized when the number of unreliable motion vectors exceeds a threshold. The proposed scheme is demonstrated on several video sequences.
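A path search over mesh edges weighted by contrast can be sketched with a small graph algorithm. The variant below maximises the *minimum* contrast along the path (a "widest path" via modified Dijkstra), which is a simplified stand-in for the paper's constrained maximum-contrast search, not its exact formulation:

```python
import heapq

def max_contrast_path(edges, start, goal):
    """Widest-path search: among all start->goal paths, maximise the
    smallest edge contrast along the path.

    `edges` maps node -> list of (neighbour, contrast).
    Returns (bottleneck_contrast, path).
    """
    best = {start: float("inf")}
    heap = [(-float("inf"), start, [start])]
    while heap:
        neg_b, node, path = heapq.heappop(heap)
        b = -neg_b
        if node == goal:
            return b, path
        for nxt, c in edges.get(node, []):
            nb = min(b, c)                 # bottleneck so far
            if nb > best.get(nxt, -float("inf")):
                best[nxt] = nb
                heapq.heappush(heap, (-nb, nxt, path + [nxt]))
    return 0.0, []

# Toy mesh-edge graph: contrast values are hypothetical colour differences.
mesh_edges = {
    "a": [("b", 5.0), ("c", 2.0)],
    "b": [("d", 4.0)],
    "c": [("d", 9.0)],
}
bottleneck, path = max_contrast_path(mesh_edges, "a", "d")
```

In the paper's setting the nodes would be mesh vertices near the coarse boundary and the contrast would come from colour differences across adjacent triangles.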

5.
Accurate and fast localization of a predefined target region inside the patient is an important component of many image-guided therapy procedures. This problem is commonly solved by registration of intraoperative 2-D projection images to 3-D preoperative images. If the patient is not fixed during the intervention, the 2-D image acquisition is repeated several times during the procedure, and the registration problem can instead be cast as a 3-D tracking problem. To solve the 3-D problem, we propose in this paper to apply 2-D region tracking to first recover the components of the transformation that are in-plane to the projections. The 2-D motion estimates of all projections are backprojected into 3-D space, where they are combined into a consistent estimate of the 3-D motion. We compare this method to intensity-based 2-D to 3-D registration and to a combination of 2-D motion backprojection followed by a 2-D to 3-D registration stage. Using clinical data with a fiducial-marker-based gold-standard transformation, we show that our method is capable of accurately tracking vertebral targets in 3-D from 2-D motion measured in X-ray projection images. Using a standard tracking algorithm (hyperplane tracking), tracking is achieved at video frame rates but fails relatively often (32% of all frames tracked with target registration error (TRE) better than 1.2 mm, 82% with TRE better than 2.4 mm). With intensity-based 2-D to 2-D image registration using normalized mutual information (NMI) and pattern intensity (PI), accuracy and robustness are substantially improved: NMI tracked 82% of all frames in our data with TRE better than 1.2 mm and 96% with TRE better than 2.4 mm. This comes at the cost of a reduced frame rate, with 1.7 s average processing time per frame and projection device. Results using PI were slightly more accurate but required on average 5.4 s per frame. These results are still substantially faster than 2-D to 3-D registration. We conclude that motion backprojection from 2-D motion tracking is an accurate and efficient method for tracking 3-D target motion, but tracking 2-D motion accurately and robustly remains a challenge.
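Combining backprojected in-plane motions into one 3-D estimate can be illustrated for the simplest case, a pure 3-D translation: each projection observes the components of the unknown translation along its two in-plane axes, and stacking those scalar constraints gives an overdetermined least-squares problem. This is an illustrative sketch, not the paper's exact formulation:

```python
import numpy as np

# Unknown 3-D translation to recover (ground truth for the synthetic check).
t_true = np.array([1.0, -2.0, 0.5])

# Each view contributes its two in-plane unit axes (u, v).
views = [
    (np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])),  # frontal view
    (np.array([0.0, 0.0, 1.0]), np.array([0.0, 1.0, 0.0])),  # lateral view
]

rows, obs = [], []
for u, v in views:
    rows += [u, v]
    obs += [u @ t_true, v @ t_true]   # measured 2-D displacement components

A = np.vstack(rows)                   # stacked in-plane constraints
d = np.array(obs)
t_est, *_ = np.linalg.lstsq(A, d, rcond=None)
```

With two roughly orthogonal views the system is well conditioned; the full method additionally handles rotations and inconsistent 2-D estimates.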

6.
Intensity prediction along motion trajectories removes temporal redundancy considerably in video compression algorithms. In three-dimensional (3-D) object-based video coding, both 3-D motion and depth values are required for temporal prediction. The required 3-D motion parameters for each object are found by the correspondence-based E-matrix method. The estimation of the correspondences, i.e., the two-dimensional (2-D) motion field between the frames, and the segmentation of the scene into objects are achieved simultaneously by minimizing a Gibbs energy. The depth field is estimated by jointly minimizing a defined distortion and bit-rate criterion using the 3-D motion parameters; the resulting depth field is efficient in the rate-distortion sense. Bit-rate values corresponding to lossless encoding of the resulting depth fields are obtained using predictive coding, with the prediction errors encoded by a Lempel-Ziv algorithm. The results are satisfactory for real-life video scenes.
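The lossless depth-coding step (predict, then Lempel-Ziv-encode the residuals) can be sketched end to end. The horizontal previous-sample predictor and the use of zlib (DEFLATE, an LZ77 derivative) are assumptions for illustration; the paper does not specify this exact pipeline:

```python
import zlib
import numpy as np

def encode_depth(depth):
    """Lossless depth coding sketch: previous-sample prediction along rows,
    then a Lempel-Ziv-style compressor (zlib/DEFLATE) on the residuals."""
    d = depth.astype(np.int16)
    resid = d.copy()
    resid[:, 1:] = d[:, 1:] - d[:, :-1]   # horizontal prediction errors
    return zlib.compress(resid.tobytes())

def decode_depth(blob, shape):
    resid = np.frombuffer(zlib.decompress(blob), dtype=np.int16).reshape(shape)
    return np.cumsum(resid, axis=1, dtype=np.int16)  # undo the prediction

depth = np.tile(np.arange(16, dtype=np.int16), (8, 1))  # smooth depth ramp
blob = encode_depth(depth)
restored = decode_depth(blob, depth.shape)
```

Smooth depth fields make the residuals small and repetitive, which is exactly what the dictionary coder exploits.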

7.
We present a two-dimensional (2-D) mesh-based mosaic representation, consisting of an object mesh and a mosaic mesh for each frame and a final mosaic image, for video objects with mildly deformable motion in the presence of self and/or object-to-object (external) occlusion. Unlike classical mosaic representations, where successive frames are registered using global motion models, we map the uncovered regions in successive frames onto the mosaic reference frame using local affine models, i.e., those of the neighboring mesh patches. The proposed method to compute this mosaic representation is tightly coupled with an occlusion-adaptive 2-D mesh tracking procedure, which consists of propagating the object mesh frame to frame and updating both the object and mosaic meshes to optimize texture mapping from the mosaic to each instance of the object. The proposed representation has been applied to video object rendering and editing, including self transfiguration, synthetic transfiguration, and 2-D augmented reality in the presence of self and/or external occlusion. We also provide an algorithm to determine the minimum number of still views needed to reconstruct the replacement mosaic required for synthetic transfiguration. Experimental results demonstrate both the 2-D mesh-based mosaic synthesis and two different video object editing applications on real video sequences.

8.
This paper proposes a spatiotemporal video object segmentation algorithm that combines 2-D mesh-based motion analysis with an automatic spatial segmentation strategy based on improved morphological filtering. The algorithm first applies higher-order statistics to motion analysis of the 2-D mesh representation of the video frames, quickly locating the foreground object region, and obtains an effective foreground motion detection mask through post-processing. Then, an improved watershed segmentation strategy, combining an alternating sequential reconstruction filter with an adaptive threshold selection algorithm, extracts accurate edges of the foreground object. Finally, a region-based spatiotemporal fusion algorithm merges the temporal and spatial segmentation results to extract video objects with precise boundaries. Experimental results show that the algorithm combines the advantages of several approaches and achieves good subjective and objective segmentation quality.

9.
Motion-compensated frame interpolation is currently the main approach to frame-rate up-conversion. To reduce blocking artifacts in interpolated frames and lower the computational load for real-time high-definition video applications, this paper proposes a frame-rate up-conversion algorithm based on multi-level block-matching motion estimation with 3-D recursive search (3-D RS). The algorithm combines 3-D RS with bidirectional motion estimation: it first performs coarse-to-fine three-level motion estimation between adjacent frames of the sequence, then smooths the motion vector field with a simplified median filter, and finally obtains the interpolated frame by motion-compensated linear interpolation. Experimental results show that, compared with existing motion-compensated interpolation algorithms, the proposed algorithm improves both the subjective and objective quality of interpolated frames at low complexity, making it highly practical.
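The core of bidirectional motion-compensated interpolation is to find, for each block of the frame being created, a symmetric motion vector v such that the previous frame shifted by -v matches the next frame shifted by +v, then average the two matches. A single-level, exhaustive-search sketch (the paper's method uses three-level 3-D RS and median filtering instead):

```python
import numpy as np

def interpolate_frame(prev, nxt, block=4, search=2):
    """Bidirectional block-matching frame interpolation (single level,
    exhaustive search) -- a simplified sketch, not the 3-D RS scheme."""
    h, w = prev.shape
    out = np.zeros_like(prev, dtype=float)
    for by in range(0, h, block):
        for bx in range(0, w, block):
            best = None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y0, x0 = by - dy, bx - dx      # block in previous frame
                    y2, x2 = by + dy, bx + dx      # block in next frame
                    if (min(y0, y2, x0, x2) < 0
                            or max(y0, y2) + block > h
                            or max(x0, x2) + block > w):
                        continue
                    p = prev[y0:y0 + block, x0:x0 + block].astype(float)
                    n = nxt[y2:y2 + block, x2:x2 + block].astype(float)
                    sad = np.abs(p - n).sum()      # matching cost
                    if best is None or sad < best[0]:
                        best = (sad, p, n)
            out[by:by + block, bx:bx + block] = (best[1] + best[2]) / 2.0
    return out

rng = np.random.default_rng(0)
frame = rng.random((8, 8))
mid = interpolate_frame(frame, frame)   # identical frames -> identical output
```

Searching symmetrically around the interpolated block (rather than estimating forward motion and halving it) avoids holes and overlaps in the interpolated frame.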

10.
In this paper, we present a complete system for the recognition and localization of a three-dimensional (3-D) model from a sequence of monocular images with known motion. The originality of this system is twofold. First, it uses a purely 3-D approach, starting from the 3-D reconstruction of the scene and ending by the 3-D matching of the model. Second, unlike most monocular systems, we do not use token tracking to match successive images. Rather, subpixel contour matching is used to recover more precisely complete 3-D contours. In contrast with the token tracking approaches, which yield a representation of the 3-D scene based on disconnected segments or points, this approach provides us with a denser and higher level representation of the scene. The reconstructed contours are fused along successive images using a simple result derived from the Kalman filter theory. The fusion process increases the localization precision and the robustness of the 3-D reconstruction. Finally, corners are extracted from the 3-D contours. They are used to generate hypotheses of the model position, using a hypothesize-and-verify algorithm that is described in detail. This algorithm yields a robust recognition and precise localization of the model in the scene. Results are presented on infrared image sequences with different resolutions, demonstrating the precision of the localization as well as the robustness and the low computational complexity of the algorithms.
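The Kalman-derived fusion of contour reconstructions across frames reduces, in its scalar essence, to fusing two noisy estimates of the same quantity weighted by their inverse variances, which both refines the estimate and shrinks its uncertainty. A minimal illustration of that core identity (not the paper's full contour filter):

```python
def fuse(x1, var1, x2, var2):
    """Fuse two noisy estimates of the same quantity, weighting each by
    its inverse variance (the scalar Kalman update)."""
    k = var1 / (var1 + var2)          # Kalman gain
    x = x1 + k * (x2 - x1)            # fused estimate
    var = (1.0 - k) * var1            # fused variance <= min(var1, var2)
    return x, var

# A coarse estimate (10 +/- var 4) fused with a sharper one (12 +/- var 1)
# lands near the sharper estimate, with variance below either input.
x, var = fuse(10.0, 4.0, 12.0, 1.0)
```

Repeating this update as each new frame contributes a measurement is what steadily tightens the 3-D contour localization.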

11.
The purpose of this study is to investigate a variational method for joint multiregion three-dimensional (3-D) motion segmentation and 3-D interpretation of temporal sequences of monocular images. Interpretation consists of dense recovery of 3-D structure and motion from the image sequence spatiotemporal variations due to short-range image motion. The method is direct insomuch as it does not require prior computation of image motion. It allows movement of both viewing system and multiple independently moving objects. The problem is formulated following a variational statement with a functional containing three terms. One term measures the conformity of the interpretation within each region of 3-D motion segmentation to the image sequence spatiotemporal variations. The second term is of regularization of depth. The assumption that environmental objects are rigid accounts automatically for the regularity of 3-D motion within each region of segmentation. The third and last term is for the regularity of segmentation boundaries. Minimization of the functional follows the corresponding Euler-Lagrange equations. This results in iterated concurrent computation of 3-D motion segmentation by curve evolution, depth by gradient descent, and 3-D motion by least squares within each region of segmentation. Curve evolution is implemented via level sets for topology independence and numerical stability. This algorithm and its implementation are verified on synthetic and real image sequences. Viewers presented with anaglyphs of stereoscopic images constructed from the algorithm's output reported a strong perception of depth.

12.
This paper presents a model-based vision system for dentistry that will assist in diagnosis, treatment planning, and surgical simulation. Dentistry requires an accurate three-dimensional (3-D) representation of the teeth and jaws for diagnostic and treatment purposes. The proposed integrated computer vision system constructs a 3-D model of the patient's dental occlusion using an intraoral video camera. A modified shape-from-shading (SFS) technique, using perspective projection and camera calibration, extracts the 3-D information from a sequence of two-dimensional (2-D) images of the jaw. The complete jaw model is then developed by fusing range data and applying 3-D registration techniques. Triangulation is then performed, and a solid 3-D model is reconstructed. The system performance is investigated using ground truth data, and the results show acceptable reconstruction accuracy.

13.
A method has been developed to reconstruct three-dimensional (3-D) surfaces from two-dimensional (2-D) projection data. It is used to produce individualized boundary element models, consisting of thorax and lung surfaces, for electro- and magnetocardiographic inverse problems. Two orthogonal projections are utilized. A geometrical prior model, built using segmented magnetic resonance images, is deformed according to profiles segmented from the projection images. In the authors' method, virtual X-ray images of the prior model are first constructed by simulating real X-ray imaging. The 2-D profiles of the model are segmented from the projections and elastically matched with the profiles segmented from patient data. The displacement vectors produced by the elastic 2-D matching are back-projected onto the 3-D surface of the prior model. Finally, the model is deformed using the back-projected vectors; two different deformation methods are proposed. The accuracy of the method is validated by a simulation. The average reconstruction error of a thorax and lungs was 1.22 voxels, corresponding to about 5 mm.
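The "virtual X-ray" of the prior model is, at its simplest, a line integral of the volume along each ray. Under a parallel-beam assumption this reduces to summing the voxel volume along one axis (real systems use perspective/cone-beam geometry, so this is a deliberate simplification):

```python
import numpy as np

# A small voxel volume with a dense cube inside (a stand-in for an organ).
volume = np.zeros((4, 4, 4))
volume[1:3, 1:3, 1:3] = 1.0

# Parallel-beam virtual X-rays: each pixel sums attenuation along one ray,
# i.e. along one volume axis.  Two orthogonal projections, as in the paper.
frontal = volume.sum(axis=0)          # project along depth
lateral = volume.sum(axis=2)          # orthogonal projection
```

The 2-D profiles segmented from these synthetic projections are what gets elastically matched against the patient's real X-ray profiles.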

14.
陈坦, 赖建军, 赵悦. 《红外》 (Infrared), 2006, 27(9): 24-28
Grating-projection imaging is often used for non-contact shape and deformation measurement of objects. With the moiré phase-shifting method, contour lines of equal height on the object surface can be obtained in real time. When measuring the 3-D profile of a fast-moving object, however, the error is large, because the phase-shifting method requires capturing several phase-shifted deformed grating images. By adding a DMD chip, all of the phase-shifted deformed grating images can be captured within a single CCD frame time, effectively reducing the error of 3-D profile imaging of fast-moving objects.
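The reason several phase-shifted grating images are needed is that the surface phase has a closed-form solution from them; for the standard four-step method with shifts of pi/2, phi = atan2(I4 - I2, I1 - I3). A synthetic check of that relation (the four-step variant is an assumption; the abstract does not say how many shifts are used):

```python
import numpy as np

# Fringe model: I_k = A + B * cos(phi + k*pi/2), k = 0..3.
phi_true = np.linspace(-1.5, 1.5, 64)    # stay inside atan2's principal range
A, B = 2.0, 1.0                          # background and fringe amplitude
I = [A + B * np.cos(phi_true + k * np.pi / 2) for k in range(4)]

# Four-step phase recovery: I[3]-I[1] = 2B sin(phi), I[0]-I[2] = 2B cos(phi).
phi_rec = np.arctan2(I[3] - I[1], I[0] - I[2])
```

Capturing all four patterns within one CCD frame, as the DMD allows, means the object barely moves between shifts, which is exactly what suppresses the motion error.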

15.
Magnetic resonance (MR) tagging has shown great potential for noninvasive measurement of the motion of a beating heart. In MR tagged images, the heart appears with a spatially encoded pattern that moves with the tissue. The position of the tag pattern in each frame of the image sequence can be used to obtain a measurement of the 3-D displacement field of the myocardium. The measurements are sparse, however, and interpolation is required to reconstruct a dense displacement field from which measures of local contractile performance such as strain can be computed. Here, the authors propose a method for estimating a dense displacement field from sparse displacement measurements. Their approach is based on a multidimensional stochastic model for the smoothness and divergence of the displacement field and the Fisher estimation framework. The main feature of this method is that both the displacement field model and the resulting estimation equation are defined only on the irregular domain of the myocardium. The authors' methods are validated on both simulated and in vivo heart data.
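Sparse-to-dense displacement interpolation with a smoothness model can be caricatured in 1-D: minimise the measurement misfit plus a penalty on first differences, which yields a small linear system. This is a deliberately reduced sketch of the idea (the paper's model is multidimensional, stochastic, and defined on the myocardial domain):

```python
import numpy as np

def dense_from_sparse(n, idx, vals, lam=1.0):
    """Estimate a dense 1-D displacement field d from sparse measurements by
    minimising  sum_i (d[idx_i] - vals_i)^2 + lam * sum_j (d[j+1] - d[j])^2.
    """
    # Smoothness term: lam times the path-graph Laplacian.
    A = lam * (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
    A[0, 0] -= lam                      # free (Neumann) boundaries
    A[-1, -1] -= lam
    b = np.zeros(n)
    for i, v in zip(idx, vals):
        A[i, i] += 1.0                  # unit-weight data term
        b[i] += v
    return np.linalg.solve(A, b)

# Two tag measurements at the ends; the field in between is filled smoothly.
d = dense_from_sparse(5, [0, 4], [0.0, 4.0])
```

The regularisation weight plays the role of the prior's smoothness parameter: larger lam pulls the solution further from the noisy measurements toward a smooth field.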

16.
Very low bit-rate coding requires new paradigms that go well beyond pixel- and frame-based video representations. We introduce a novel content-based video representation using tridimensional entities: textured object models and pose estimates. The multiproperty object models carry stochastic information about the shape and texture of each object present in the scene, while the pose estimates define the position and orientation of the objects for each frame. This representation is compact and provides alternative means for handling video by manipulating and compositing three-dimensional (3-D) entities. We call this representation tridimensional video compositing, or 3DVC for short. We present the 3DVC framework and describe the methods used to incrementally construct the object models and the pose estimates from unregistered noisy depth and texture measurements. We also describe a method for video frame reconstruction based on 3-D scene assembly, and discuss potential applications of 3DVC to video coding and content-based handling. 3DVC assumes that the objects in the scene are rigid and segmented; by assuming segmentation, we do not address the difficult questions of nonrigid and multiple object segmentation. In our experiments, segmentation is obtained via depth thresholding, but it is important to note that 3DVC is independent of the segmentation technique adopted. Experimental results with synthetic and real video sequences, where compression ratios in the range of 1:150 to 1:2700 are achieved, demonstrate the applicability of the proposed representation to very low bit-rate coding.

17.
3-D object recognition using 2-D views
We consider the problem of recognizing 3-D objects from 2-D images using geometric models and assuming different viewing angles and positions. Our goal is to recognize and localize instances of specific objects (i.e., model-based recognition) in a scene. This is in contrast to category-based object recognition methods, where the goal is to search for instances of objects that belong to a certain visual category (e.g., faces or cars). The key contribution of our work is improving 3-D object recognition by integrating Algebraic Functions of Views (AFoVs), a powerful framework for predicting the geometric appearance of an object due to viewpoint changes, with indexing and learning. During training, we compute the space of views that groups of object features can produce under the assumption of 3-D linear transformations, by combining a small number of reference views that contain the object features using AFoVs. Unrealistic views (e.g., due to the assumption of 3-D linear transformations) are eliminated by imposing a pair of rigidity constraints based on knowledge of the transformation between the reference views of the object. To represent the space of views that an object can produce compactly while allowing efficient hypothesis generation during recognition, we propose combining indexing with learning in two stages. In the first stage, we sample the space of views of an object sparsely and represent information about the samples using indexing. In the second stage, we build probabilistic models of shape appearance by sampling the space of views of the object densely and learning the manifold formed by the samples. Learning employs the Expectation-Maximization (EM) algorithm and takes place in a "universal," lower-dimensional space computed through Random Projection (RP). During recognition, we extract groups of point features from the scene and use indexing to retrieve the most feasible model groups that might have produced them (i.e., hypothesis generation). The likelihood of each hypothesis is then computed using the probabilistic models of shape appearance. Only hypotheses ranked high enough are considered for further verification, with the most likely hypotheses verified first. The proposed approach has been evaluated using both artificial and real data, illustrating promising performance. We also present preliminary results illustrating extensions of the AFoVs framework to predict the intensity appearance of an object. In this context, we have built a hybrid recognition framework that exploits geometric knowledge to hypothesize the location of an object in the scene and both geometric and intensity information to verify the hypotheses.
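The core AFoVs identity is that, under 3-D linear transformations and orthographic projection, a point's coordinates in a novel view are a fixed affine combination of its coordinates in two reference views. A synthetic check of that identity (random cameras and points are assumptions for the demonstration):

```python
import numpy as np

rng = np.random.default_rng(1)

# 3-D points and three orthographic cameras (2x3 projection + translation).
X = rng.random((10, 3))
P = [rng.random((2, 3)) for _ in range(3)]
t = [rng.random(2) for _ in range(3)]
views = [X @ P[i].T + t[i] for i in range(3)]      # projected 2-D points

# Basis: (x1, y1) from reference view 0, x2 from reference view 1, plus 1.
F = np.column_stack([views[0], views[1][:, :1], np.ones(10)])

# Fit the affine combination from four correspondences only...
coef, *_ = np.linalg.lstsq(F[:4], views[2][:4], rcond=None)

# ...and predict the remaining points of the "novel" third view.
pred = F @ coef
```

This is why a small number of reference views suffices to sweep out the whole view space during training; the rigidity constraints then prune the combinations that no rigid motion could produce.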

18.
This paper describes augmented reality visualization for the guidance of breast-conservative cancer surgery using ultrasonic images acquired in the operating room just before surgical resection. By combining an optical three-dimensional (3-D) position sensor, the position and orientation of each ultrasonic cross section are precisely measured to reconstruct geometrically accurate 3-D tumor models from the acquired ultrasonic images. Similarly, the 3-D position and orientation of a video camera are obtained to integrate video and ultrasonic images in a geometrically accurate manner. Superimposing the 3-D tumor models onto live video images of the patient's breast enables the surgeon to perceive the exact 3-D position of the tumor, including irregular cancer invasions that cannot be perceived by touch, as if it were visible through the breast skin. Using the resulting visualization, the surgeon can determine the region for surgical resection in a more objective and accurate manner, thereby minimizing the risk of a relapse and maximizing breast conservation. The system was shown to be effective in experiments using phantom and clinical data.
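A geometrically accurate overlay of this kind rests on the calibrated pinhole model: a tracked 3-D point maps to the video frame via x ~ K(RX + t). A minimal sketch of that projection (the intrinsic values below are illustrative, not the system's actual calibration):

```python
import numpy as np

def project(points_3d, K, R, t):
    """Project 3-D points (e.g. a reconstructed tumour model) into a video
    frame with a calibrated pinhole camera: x ~ K (R X + t)."""
    cam = points_3d @ R.T + t           # world -> camera coordinates
    uv = cam @ K.T                      # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]       # perspective divide

# Hypothetical intrinsics: 800-px focal length, principal point (320, 240).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R, t = np.eye(3), np.array([0.0, 0.0, 5.0])   # camera 5 units back

# A tumour-model point at the world origin lands at the image centre.
pix = project(np.array([[0.0, 0.0, 0.0]]), K, R, t)
```

The optical tracker supplies R and t for both the ultrasound probe and the video camera, which is what keeps the superimposed tumour model registered to the live image.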

19.
Video summarization is an effective way to quickly obtain the key information in a video, but existing video summarization methods are usually computationally expensive and hard to apply in practice when computing resources are limited. This paper therefore proposes an efficient spatiotemporal surveillance-video summarization method that takes motion direction into account. The method first obtains target spatiotemporal motion trajectories from horizontal slices; it then removes the trajectory background, computes the slopes of the straight-line trajectories, and determines each target's motion direction from its trajectory slope. Next, it detects motion segments in the sampling domain to determine each target's temporal position in the video, and finally builds the video summary adaptively from the targets' temporal positions and motion directions. Experimental results show that the proposed method achieves an average frame processing time (AFPT) of 0.374 s, clearly outperforming the compared methods, and that the generated summaries are concise and efficient with a good user experience.
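The direction-judgment step reduces to fitting a line to a target's slice-image trajectory and reading the sign of the slope. A minimal sketch (the slice geometry, thresholds, and labels are simplified assumptions):

```python
import numpy as np

def motion_direction(times, xs):
    """Judge a target's horizontal motion direction from its slice-image
    trajectory: fit x(t) with a line and read the slope's sign."""
    slope = np.polyfit(times, xs, 1)[0]
    return "right" if slope > 0 else "left"

t = np.arange(10)
direction = motion_direction(t, 3 + 2 * t)   # x increases over time
```

Because only a fitted slope per trajectory is needed, the direction test adds almost nothing to the per-frame cost, consistent with the method's low AFPT.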

20.
In this paper, we derive a spatiotemporal extrapolation method for 3-D discrete signals. Extending a discrete signal beyond a limited number of known samples is commonly referred to as discrete signal extrapolation. Extrapolation problems arise in many applications in video communications: transmission errors may cause data losses, which are concealed by extrapolating the surrounding video signal into the missing area; the same principle is applied for TV logo removal; and prediction in hybrid video coding can also be interpreted as an extrapolation problem. Conventionally, the unknown areas in the video sequence are estimated from either the spatial or the temporal surrounding. Our approach considers the spatiotemporal signal, including the missing area, as a volume, and replaces the unknown samples by extrapolating the surrounding signal from the spatial as well as the temporal direction. By exploiting spatial and temporal correlations at the same time, it is possible to inherently compensate motion; deviations in luminance occurring from frame to frame can be compensated as well.
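The benefit of treating the missing region as part of a spatiotemporal volume can be illustrated with a crude concealment rule that blends a spatial estimate (the block's border in the same frame) with a temporal one (the co-located block in the neighbouring frames). This averaging rule is a stand-in for the paper's joint extrapolation, not its actual algorithm:

```python
import numpy as np

def conceal(frames, k, y0, y1, x0, x1):
    """Conceal a lost block [y0:y1, x0:x1] in frame k by blending the
    co-located block of the neighbouring frames with the mean of the
    block's spatial border in the same frame."""
    temporal = (frames[k - 1][y0:y1, x0:x1]
                + frames[k + 1][y0:y1, x0:x1]) / 2.0
    border = np.concatenate([
        frames[k][y0 - 1, x0:x1], frames[k][y1, x0:x1],
        frames[k][y0:y1, x0 - 1], frames[k][y0:y1, x1],
    ])
    spatial = np.full_like(temporal, border.mean())
    frames[k][y0:y1, x0:x1] = (temporal + spatial) / 2.0
    return frames[k]

# Three flat frames; a block of the middle one was lost in transmission.
frames = [np.full((8, 8), 10.0) for _ in range(3)]
frames[1][3:5, 3:5] = 0.0
restored = conceal(frames, 1, 3, 5, 3, 5)
```

The paper's method goes further by extrapolating the whole surrounding volume at once, which lets it follow motion and per-frame luminance changes instead of assuming the co-located blocks line up.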
