Similar literature
 20 similar documents found
1.
This paper presents an efficient image-based approach to navigate a scene based on only three wide-baseline uncalibrated images without the explicit use of a 3D model. After automatically recovering corresponding points between each pair of images, an accurate trifocal plane is extracted from the trifocal tensor of these three images. Next, based on a small number of feature marks entered through a user-friendly GUI, the correct dense disparity maps are obtained by using our trinocular-stereo algorithm. Employing the barycentric warping scheme with the computed disparity, we can generate an arbitrary novel view within the triangle spanned by the three camera centers. Furthermore, after self-calibration of the cameras, 3D objects can be correctly augmented into the virtual environment synthesized by the tri-view morphing algorithm. Three applications of the tri-view morphing algorithm are demonstrated. The first one is 4D video synthesis, which can be used to fill in the gap between a few sparsely located video cameras to synthetically generate a video from a virtual moving camera. This synthetic camera can be used to view the dynamic scene from a novel view instead of the original static camera views. The second application is multiple view morphing, where we can seamlessly fly through the scene over a 2D space constructed by more than three cameras. The last one is dynamic scene synthesis using three still images, where several rigid objects may move in any orientation or direction. After segmenting three reference frames into several layers, the novel views in the dynamic scene can be generated by applying our algorithm. Finally, experiments are presented to illustrate that a series of photo-realistic virtual views can be generated to fly through a virtual environment covered by several static cameras.
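The barycentric idea mentioned in this abstract can be illustrated with a minimal sketch. It only blends the positions of one corresponding point seen in the three reference images; the paper's actual warping operates on dense disparity maps, which the abstract does not detail. The function name `barycentric_blend` and its arguments are hypothetical and chosen for illustration only.

```python
def barycentric_blend(p0, p1, p2, weights):
    """Blend three corresponding 2D image points with barycentric weights.

    p0, p1, p2 are (x, y) positions of the same scene point in the three
    reference images; weights are (w0, w1, w2), non-negative, summing to 1.
    This is an illustrative assumption, not the paper's implementation.
    """
    w0, w1, w2 = weights
    if min(w0, w1, w2) < 0 or abs(w0 + w1 + w2 - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    # Linear combination of the three positions, coordinate by coordinate.
    x = w0 * p0[0] + w1 * p1[0] + w2 * p2[0]
    y = w0 * p0[1] + w1 * p1[1] + w2 * p2[1]
    return (x, y)


if __name__ == "__main__":
    # Equal weights place the virtual view at the centroid of the triangle.
    print(barycentric_blend((0.0, 0.0), (3.0, 0.0), (0.0, 3.0), (1 / 3, 1 / 3, 1 / 3)))
```

Weights of (1, 0, 0) reproduce the first reference view, so moving the weights inside the triangle moves the virtual camera between the three real ones.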

2.
Stereovision is an effective technique for determining the 3D position of a target object from two or more simultaneous views of the scene captured by CCD video cameras. Camera calibration is a central issue in finding the position of objects in a stereovision system. This is usually carried out by calibrating each camera independently, and then applying a geometric transformation of the external parameters to find the geometry of the stereo setting. After calibration, the distance of various target objects in the scene can be calculated with CCD video cameras, and recovering the 3D structure from 2D images becomes simpler. However, the process of camera calibration is complicated. Based on the ideal pinhole model of a camera, we describe formulas to calculate intrinsic parameters that specify the correct camera characteristics, and extrinsic parameters that describe the spatial relationship between the camera and the world coordinate system. A simple camera calibration method for our CCD video cameras and the corresponding experimental results are also given. This work was presented in part at the 7th International Symposium on Artificial Life and Robotics, Oita, Japan, January 16–18, 2002.
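As a rough illustration of how intrinsic and extrinsic parameters interact in the ideal pinhole model, the sketch below projects a world point into pixel coordinates. The abstract does not give the paper's formulas, so the parameter names (`fx`, `fy`, `cx`, `cy`, `R`, `t`) follow the common textbook convention and are assumptions, as is the function name `project_point`.

```python
def project_point(X_world, R, t, fx, fy, cx, cy):
    """Project a 3D world point to pixel coordinates with an ideal pinhole model.

    R (3x3 nested lists) and t (length 3) are the extrinsic parameters mapping
    world coordinates to camera coordinates; fx, fy, cx, cy are the intrinsic
    parameters (focal lengths and principal point, in pixels).
    Lens distortion and skew are ignored (assumption).
    """
    # Extrinsics: X_cam = R * X_world + t
    Xc = [sum(R[i][j] * X_world[j] for j in range(3)) + t[i] for i in range(3)]
    if Xc[2] <= 0:
        raise ValueError("point is behind the camera")
    # Intrinsics: perspective division followed by scaling and offset.
    u = fx * Xc[0] / Xc[2] + cx
    v = fy * Xc[1] / Xc[2] + cy
    return (u, v)


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(project_point([1.0, 2.0, 10.0], identity, [0.0, 0.0, 0.0],
                        fx=500.0, fy=500.0, cx=320.0, cy=240.0))
```

Calibration is the inverse task: estimating `fx`, `fy`, `cx`, `cy`, `R`, and `t` from known correspondences between world points and their pixel positions.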

3.
Towards video-based immersive environments   Cited by: 2 (self-citations: 0, citations by others: 2)
Video provides a comprehensive visual record of environment activity over time. Thus, video data is an attractive source of information for the creation of virtual worlds which require some real-world fidelity. This paper describes the use of multiple streams of video data for the creation of immersive virtual environments. We outline our multiple perspective interactive video (MPI-Video) architecture which provides the infrastructure for the processing and analysis of multiple streams of video data. Our MPI-Video system performs automated analysis of the raw video and constructs a model of the environment and object activity within this environment. This model provides a comprehensive representation of the world monitored by the cameras which, in turn, can be used in the construction of a virtual world. In addition, using the information produced and maintained by the MPI-Video system, our immersive video system generates virtual video sequences. These are sequences of the dynamic environment from an arbitrary viewpoint generated using the real camera data. Such sequences allow a user to navigate through the environment and provide a sense of immersion in the scene. We discuss results from our MPI-Video prototype, outline algorithms for the construction of virtual views and provide examples of a variety of such immersive video sequences.

4.
We report an autonomous surveillance system with multiple pan-tilt-zoom (PTZ) cameras assisted by a fixed wide-angle camera. The wide-angle camera provides large but low-resolution coverage and detects and tracks all moving objects in the scene. Based on the output of the wide-angle camera, the system generates spatiotemporal observation requests for each moving object, which are candidates for close-up views using PTZ cameras. Because there are usually many more objects than PTZ cameras, the system first assigns a subset of the requests/objects to each PTZ camera. The PTZ cameras then select the parameter settings that best satisfy the assigned competing requests to provide high-resolution views of the moving objects. We propose an approximation algorithm to solve the request assignment and the camera parameter selection problems in real time. The effectiveness of the proposed system is validated in both simulation and physical experiments. A simulation-based comparison with an existing method shows that, in heavy traffic scenarios, our algorithm increases the number of observed objects by over 210%.

5.
Images/videos captured by portable devices (e.g., cellphones, DV cameras) often have limited fields of view. Image stitching, also referred to as mosaics or panoramas, can produce a wide-angle image by compositing several photographs together. Although various methods have been developed for image stitching in recent years, few works address the video stitching problem. In this paper, we present the first system to stitch videos captured by hand-held cameras. We first recover the 3D camera paths and a sparse set of 3D scene points using the CoSLAM system, and densely reconstruct the 3D scene in the overlapping regions. Then, we generate a smooth virtual camera path, which stays in the middle of the original paths. Finally, the stitched video is synthesized along the virtual path as if it were taken from this new trajectory. The warping required for the stitching is obtained by optimizing over both temporal stability and alignment quality, while leveraging the 3D information at our disposal. The experiments show that our method can produce high-quality stitching results for various challenging scenarios.

6.
Automatic 3D animation generation techniques are becoming increasingly popular in different areas related to computer graphics such as video games and animated movies. They help automate the filmmaking process, even for non-professionals, with little or no intervention from animators and computer graphics programmers. Based on specified cinematographic principles and filming rules, they plan the sequence of virtual cameras that best render a 3D scene. In this paper, we present an approach for automatic movie generation using linear temporal logic to express these filming and cinematography rules. We consider the filming of a 3D scene as a sequence of shots satisfying given filming rules, conveying constraints on the desirable configuration (position, orientation, and zoom) of virtual cameras. The selection of camera configurations at different points in time is understood as a camera plan, which is computed using a temporal-logic based planning system (TLPlan) to obtain a 3D movie. The camera planner is used within an automated planning application for generating 3D task demonstrations involving a teleoperated robot arm on the International Space Station (ISS). A typical task demonstration involves moving the robot arm from one configuration to another. The main challenge is to automatically plan the configurations of virtual cameras to film the arm in a manner that conveys the best awareness of the robot trajectory to the user. The robot trajectory is generated using a path-planner. The camera planner is then invoked to find a sequence of configurations of virtual cameras to film the trajectory.

7.
Monitoring of large sites requires coordination between multiple cameras, which in turn requires methods for relating events between distributed cameras. This paper tackles the problem of automatic external calibration of multiple cameras in an extended scene, that is, full recovery of their 3D relative positions and orientations. Because the cameras are placed far apart, brightness or proximity constraints cannot be used to match static features, so we instead apply planar geometric constraints to moving objects tracked throughout the scene. By robustly matching and fitting tracked objects to a planar model, we align the scene's ground plane across multiple views and decompose the planar alignment matrix to recover the 3D relative camera and ground plane positions. We demonstrate this technique in both a controlled lab setting, where we test the effects of errors in the intrinsic camera parameters, and an uncontrolled outdoor setting. In the latter, we do not assume synchronized cameras and we show that enforcing geometric constraints enables us to align the tracking data in time. In spite of noise in the intrinsic camera parameters and in the image data, the system successfully transforms multiple views of the scene's ground plane to an overhead view and recovers the relative 3D camera and ground plane positions.

8.
This paper proposes a method to realize a 3D video system that can capture video data from multiple cameras, reconstruct 3D models, transmit 3D video streams via the network, and display them on remote PCs. All processes are done in real time. We represent a player with a simplified 3D model consisting of a single plane and a live video texture extracted from multiple cameras. This 3D model is simple enough to be transmitted via a network. A prototype system has been developed and tested at actual soccer stadiums. A 3D video of a typical soccer scene, which includes more than a dozen players, was processed at video rate and transmitted to remote PCs through the internet at 15–24 frames per second.

9.
A Video-Based 3D-Reconstruction of Soccer Games   Cited by: 1 (self-citations: 0, citations by others: 1)
In this paper we present SoccerMan, a reconstruction system designed to generate animated, virtual 3D views from two synchronous video sequences of a short part of a given soccer game. After the reconstruction process, which also needs some manual interaction, the virtual 3D scene can be examined and 'replayed' from any viewpoint. Players are modeled as so-called animated texture objects, i.e. 2D player shapes are extracted from video and texture-mapped onto rectangles in 3D space. Animated texture objects have proven very appropriate as a 3D representation of soccer players in motion, as the visual nature of the original human motion is preserved. The trajectories of the players and the ball in 3D space are reconstructed accurately. In order to create a 3D reconstruction of a given soccer scene, the following steps have to be executed: 1) Camera parameters of all frames of both sequences are computed (camera calibration). 2) The playground texture is extracted from the video sequences. 3) Trajectories of the ball and the players' heads are computed after manually specifying their image positions in a few key frames. 4) Player textures are extracted automatically from video. 5) The shapes of colliding or occluding players are separated automatically. 6) For visualization, player shapes are texture-mapped onto appropriately placed rectangles in virtual space. SoccerMan is a novel experimental sports analysis system with fairly ambitious objectives. Its design decisions, in particular to start from two synchronous video sequences and to model players by texture objects, have already proven promising.

10.
This paper addresses the synthesis of novel views of people from multiple view video. We consider the target area of the multiple camera 3D Virtual Studio for broadcast production with the requirement for free-viewpoint video synthesis for a virtual camera with the same quality as captured video. A framework is introduced for view-dependent optimisation of reconstructed surface shape to align multiple captured images with sub-pixel accuracy for rendering novel views. View-dependent shape optimisation combines multiple view stereo and silhouette constraints to robustly estimate correspondence between images in the presence of visual ambiguities such as uniform surface regions, self-occlusion, and camera calibration error. Free-viewpoint rendering of video sequences of people achieves a visual quality comparable to the captured video images. Experimental evaluation demonstrates that this approach overcomes limitations of previous stereo- and silhouette-based approaches to rendering novel views of moving people.

11.
We present a system for automatically extracting the region of interest (ROI) and controlling virtual cameras based on panoramic video. It targets applications such as classroom lectures and video conferencing. For capturing panoramic video, we use the FlyCam system that produces high-resolution, wide-angle video by stitching video images from multiple stationary cameras. To generate conventional video, a region of interest can be cropped from the panoramic video. We propose methods for ROI detection, tracking, and virtual camera control that work in both the uncompressed and compressed domains. The ROI is located from motion and color information in the uncompressed domain and macroblock information in the compressed domain, and tracked using a Kalman filter. This results in virtual camera control that simulates human-controlled video recording. The system has no physical camera motion and the virtual camera parameters are readily available for video indexing.
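To give a sense of how a Kalman filter smooths a tracked ROI position, here is a minimal sketch of a scalar filter applied to a single coordinate. The abstract does not specify the filter's state or noise model, so the random-walk model, the class name `ScalarKalman`, and the default noise values `q` and `r` are all assumptions made for illustration.

```python
class ScalarKalman:
    """Scalar Kalman filter with a random-walk state model (assumed here).

    x is the estimated ROI coordinate, p its variance, q the process noise
    variance, and r the measurement noise variance.
    """

    def __init__(self, x0, p0=1.0, q=0.01, r=1.0):
        self.x = x0
        self.p = p0
        self.q = q
        self.r = r

    def step(self, z):
        """Fold one noisy measurement z into the estimate and return it."""
        # Predict: the state is assumed constant, so only uncertainty grows.
        self.p += self.q
        # Update: the gain weighs the measurement against the prediction.
        k = self.p / (self.p + self.r)
        self.x += k * (z - self.x)
        self.p *= (1.0 - k)
        return self.x


if __name__ == "__main__":
    kf = ScalarKalman(x0=0.0)
    for measurement in [9.0, 11.0, 10.5, 9.5, 10.0]:
        print(round(kf.step(measurement), 3))
```

A larger `r` relative to `q` makes the virtual camera follow the detected ROI more sluggishly, which is one way to avoid jittery framing.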

12.
Cognitive visual tracking is the process of observing and understanding the behavior of a moving person. This paper presents an efficient solution to extract, in real time, high-level information from an observed scene, and generate the most appropriate commands for a set of pan-tilt-zoom (PTZ) cameras in a surveillance scenario. Such a high-level feedback control loop, which is the main novelty of our work, will serve to reduce uncertainties in the observed scene and to maximize the amount of information extracted from it. It is implemented with a distributed camera system using SQL tables as virtual communication channels, and Situation Graph Trees for knowledge representation, inference and high-level camera control. A set of experiments in a surveillance scenario shows the effectiveness of our approach and its potential for real applications of cognitive vision.

13.
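郭洋, 马翠霞, 滕东兴, 杨祎, 王宏安. Journal of Software (《软件学报》), 2016, 27(5): 1151-1162
With the spread of public-security surveillance systems, more and more surveillance cameras are being installed along roads and in public places, producing large volumes of surveillance video every day. Today, surveillance video is analyzed mainly by having people watch it to spot anomalies, which consumes a great deal of manpower and time. Current research on video analysis mostly targets the detection and tracking of abnormal behavior by individual targets and lacks analysis of the relationships between objects; there is still no effective way to represent and analyze the relationships among objects and scenes in video. To address this, we propose a correlated-video visual analysis method based on the 3D trajectories of moving targets to assist manual video analysis. The video material is first preprocessed to obtain the motion trajectory of each target object. Because 2D trajectories handle self-intersection, cyclic motion, and dwelling poorly, and because without time information it is hard to analyze the relationships among multiple object trajectories in the same space, the trajectories are extended into 3D by adding the time dimension. The method supports sketch-based interaction, and sketch annotations can be added during analysis to assist it. Trajectory correlations can be computed from the spatiotemporal relationships of scenes and objects to obtain a correlation model between objects and scenes. By collecting statistics on the appearance of objects in each scene and combining them with manually predefined rules, the method can raise alarms for abnormal behavior and support user decision-making.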

14.
In this paper, we describe a reconstruction method for multiple motion scenes, which are scenes containing multiple moving objects, from uncalibrated views. Assuming that the objects are moving with constant velocities, the method recovers the scene structure, the trajectories of the moving objects, the camera motion, and the camera intrinsic parameters (except skews) simultaneously. We focus on the case where the cameras have unknown and varying focal lengths while the other intrinsic parameters are known. The number of the moving objects is automatically detected without prior motion segmentation. The method is based on a unified geometrical representation of the static scene and the moving objects. It first performs a projective reconstruction using a bilinear factorization algorithm and, then, converts the projective solution to a Euclidean one by enforcing metric constraints. Experimental results on synthetic and real images are presented.

15.
3D reconstruction of a dynamic scene from features in two cameras usually requires synchronization and correspondences between the cameras. These may be hard to achieve due to occlusions, different orientations, different scales, etc. In this work we present an algorithm for reconstructing a dynamic scene from sequences acquired by two uncalibrated non-synchronized fixed affine cameras. It is assumed that (possibly) different points are tracked in the two sequences. The only constraint relating the two cameras is that every 3D point tracked in one sequence can be described as a linear combination of some of the 3D points tracked in the other sequence. Such a constraint is useful, for example, for articulated objects. We may track some points on an arm in the first sequence, and some other points on the same arm in the second sequence. At the other extreme, this model can be used for generally moving points tracked in both sequences without knowing the correct permutation. In between, this model can cover non-rigid bodies with local rigidity constraints. We present linear algorithms for synchronizing the two sequences and reconstructing the 3D points tracked in both views. Outlier points are automatically detected and discarded. The algorithm can handle both 3D objects and planar objects in a unified framework, therefore avoiding numerical problems existing in other methods. This work was done while the authors were PhD students in the School of Computer Science and Engineering, the Hebrew University of Jerusalem.

16.
This paper presents a symbolic formalism for modeling and retrieving video data via the moving objects contained in the video images. The model integrates the representations of individual moving objects in a scene with the time-varying relationships between them by incorporating both the notions of object tracks and temporal sequences of PIRs (projection interval relationships). The model is supported by a set of operations which form the basis of a moving object algebra. This algebra allows one to retrieve scenes and information from scenes by specifying both spatial and temporal properties of the objects involved. It also provides operations to create new scenes from existing ones. A prototype implementation is described which allows queries to be specified either via an animation sketch or using the moving object algebra.

17.
The view-independent visualization of 3D scenes is most often based on rendering accurate 3D models or utilizes image-based rendering techniques. To compute the 3D structure of a scene from a moving vision sensor or to use image-based rendering approaches, we need to be able to estimate the motion of the sensor from the recorded image information with high accuracy, a problem that has been well studied. In this work, we investigate the relationship between camera design and our ability to perform accurate 3D photography, by examining the influence of camera design on the estimation of the motion and structure of a scene from video data. By relating the differential structure of the time-varying plenoptic function to different known and new camera designs, we can establish a hierarchy of cameras based upon the stability and complexity of the computations necessary to estimate structure and motion. At the low end of this hierarchy is the standard planar pinhole camera, for which the structure from motion problem is non-linear and ill-posed. At the high end is a camera, which we call the full field of view polydioptric camera, for which the motion estimation problem can be solved independently of the depth of the scene, which leads to fast and robust algorithms for 3D photography. In between are multiple view cameras with a large field of view, which we have built, as well as omni-directional sensors.

18.
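A multi-sensor cooperative face detection system for visual surveillance applications   Cited by: 2 (self-citations: 0, citations by others: 2)
A novel multi-sensor visual surveillance system composed of two controllable cameras is proposed, aimed at real-time tracking and characterization of moving targets in outdoor environments. In particular, the system uses a moving camera operable at multiple zoom levels to automatically acquire and track faces across consecutive video frames. It is complemented by a fixed wide-area camera that performs automatic target tracking and classification.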

19.
When occlusion is minimal, a single camera is generally sufficient to detect and track objects. However, when the density of objects is high, the resulting occlusion and lack of visibility suggest the use of multiple cameras and collaboration between them so that an object is detected using information available from all the cameras in the scene. In this paper, we present a system that is capable of segmenting, detecting and tracking multiple people in a cluttered scene using multiple synchronized surveillance cameras located far from each other. The system is fully automatic, and takes decisions about object detection and tracking using evidence collected from many pairs of cameras. Innovations that help us tackle the problem include a region-based stereo algorithm capable of finding 3D points inside an object knowing only the projections of the object (as a whole) in two views, a segmentation algorithm using Bayesian classification, and the use of occlusion analysis to combine evidence from different camera pairs. The system has been tested using different densities of people in the scene. This helps us determine the number of cameras required for a particular density of people. Experiments have also been conducted to verify and quantify the efficacy of the occlusion analysis scheme.

20.
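Objective: Depth-image-based rendering (DIBR) is a new virtual view synthesis technique that has been widely applied in many areas. However, it cannot yet meet real-time rendering requirements. To increase rendering speed as much as possible without degrading rendering quality, an efficient 3D-warping (3D coordinate transformation) algorithm is proposed. Method: Improvements are made in three respects. 1) A depth-to-disparity lookup table is introduced to avoid repeated disparity computation. 2) Pixel blocks with flat depth are warped block by block, reducing the number of mappings, while pixels in blocks with non-flat depth are warped with conventional per-pixel 3D warping, preserving mapping accuracy. 3) A corresponding interpolation algorithm is proposed for each of the two 3D-warping modes. In the horizontal direction, the improved pixel interpolation algorithm is a compromise between nearest-neighbor interpolation and splatting interpolation: nearest-neighbor interpolation is used only when the mapped pixel is very close to the pixel to be interpolated, and splatting is used otherwise. In the depth direction, it improves on the Z-buffer (depth buffer) technique by discarding mapped pixels that lie too far from the foreground object and weighting the remaining mapped pixels by depth value. Results: Experiments show that, compared with the integer-pixel precision of the standard rendering scheme, rendering time is reduced by 72.05% on average; compared with the half-pixel precision of the standard rendering scheme, PSNR increases by 0.355 dB on average and SSIM by 0.00115 on average. Conclusion: The improved algorithm is well suited to integer-pixel-precision rendering in DIBR for horizontally arranged camera systems, and is particularly effective for video sequences containing large depth-flat regions, where it both speeds up rendering and effectively improves objective rendering quality.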
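The depth-to-disparity lookup table idea can be sketched as follows. The abstract does not state the conversion formula, so this sketch assumes the commonly used 8-bit depth quantization, in which inverse depth is interpolated linearly between a near and a far plane, and a horizontal camera arrangement where disparity equals focal length times baseline divided by depth. The function name `build_disparity_lut` and all parameter names are hypothetical.

```python
def build_disparity_lut(focal_px, baseline, z_near, z_far, levels=256):
    """Precompute the disparity for every quantized depth value.

    Assumes depth value v in [0, levels-1] encodes inverse depth linearly:
        1/Z = v/(levels-1) * (1/z_near - 1/z_far) + 1/z_far
    and that disparity = focal_px * baseline / Z (horizontal camera setup).
    """
    lut = []
    for v in range(levels):
        inv_z = (v / (levels - 1)) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
        lut.append(focal_px * baseline * inv_z)
    return lut


if __name__ == "__main__":
    lut = build_disparity_lut(focal_px=1000.0, baseline=0.05, z_near=1.0, z_far=10.0)
    depth_row = [0, 128, 255]  # hypothetical depth-map samples
    # Warping then needs only a table lookup per pixel, not a division.
    print([lut[v] for v in depth_row])
```

Because a depth map has only 256 possible values, the table is built once per camera configuration and reused for every pixel of every frame.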
