Similar Documents
20 similar documents retrieved (search time: 27 ms)
1.
In recent years, advances in GPU technology and the growing maturity of parallel algorithms have made real-time 3D reconstruction possible. This paper implements an interactive dense 3D reconstruction system for small scenes, which uses advanced motion-tracking techniques to accurately estimate the camera pose in real time. An improved multi-view depth generation algorithm is proposed that computes scene depth in real time under GPU acceleration. Sub-pixel semi-global matching cost aggregation in the improved algorithm raises the accuracy of multi-view stereo matching, and accurate scene depth is obtained by combining it with a global optimization method. Depth maps are converted into distance fields, and real-time depth fusion is achieved with a globally optimized histogram-compression fusion algorithm and a parallel primal-dual algorithm. Experimental results demonstrate the feasibility of the reconstruction system and the correctness of the reconstruction algorithm.
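A minimal sketch of the semi-global matching step this abstract builds on, using OpenCV's CPU StereoSGBM on a rectified stereo pair (the paper's GPU implementation, sub-pixel cost aggregation, and primal-dual fusion are not public; file names and calibration values are illustrative):

```python
# Sketch: semi-global matching disparity, then depth from Z = f * B / d.
import cv2
import numpy as np

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

sgbm = cv2.StereoSGBM_create(
    minDisparity=0, numDisparities=128, blockSize=5,
    P1=8 * 5 * 5, P2=32 * 5 * 5,        # smoothness penalties
    mode=cv2.STEREO_SGBM_MODE_HH)        # full 8-direction aggregation

disp = sgbm.compute(left, right).astype(np.float32) / 16.0  # fixed-point -> px

focal_px, baseline_m = 700.0, 0.12       # hypothetical calibration
depth = np.where(disp > 0, focal_px * baseline_m / disp, 0)
```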

2.
Extracting moving objects from dynamic scenes is a key problem in video analysis and a popular topic in computer vision and image processing. This paper proposes a new moving-object extraction algorithm for dynamic scenes. The algorithm first computes global motion parameters from the camera's global motion model, then obtains a foreground segmentation by three-frame differencing. Pixels classified as background are mapped into neighboring frames to estimate the mean and variance of each pixel's Gaussian background model. Finally, a particle filter predicts the foreground region of the next frame, and the probability of each pixel belonging to the foreground is computed to obtain the video segmentation of the moving object. Experiments show that the algorithm effectively suppresses the accumulated error caused by biased estimation of the global motion model parameters and segments objects in diving videos with higher accuracy.
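A minimal sketch of the three-frame differencing step described above (the global motion compensation and particle-filter prediction are omitted; the threshold is illustrative):

```python
# Sketch: three-frame differencing — a pixel is foreground only if it
# changed in both the previous->current and current->next intervals.
import cv2

def three_frame_diff(f_prev, f_curr, f_next, thresh=25):
    g = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in (f_prev, f_curr, f_next)]
    d1 = cv2.absdiff(g[1], g[0])
    d2 = cv2.absdiff(g[2], g[1])
    _, m1 = cv2.threshold(d1, thresh, 255, cv2.THRESH_BINARY)
    _, m2 = cv2.threshold(d2, thresh, 255, cv2.THRESH_BINARY)
    return cv2.bitwise_and(m1, m2)
```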

3.
We present a hybrid camera system for capturing video at high spatial and spectral resolutions. Composed of a red, green, and blue (RGB) video camera, a grayscale video camera and a few optical elements, the hybrid camera system simultaneously records two video streams: an RGB video with high spatial resolution, and a multispectral (MS) video with low spatial resolution. After registration of the two video streams, our system propagates the MS information into the RGB video to produce a video with both high spectral and spatial resolution. This propagation between videos is guided by color similarity of pixels in the spectral domain, proximity in the spatial domain, and the consistent color of each scene point in the temporal domain. The propagation algorithm, based on trilateral filtering, is designed to rapidly generate output video from the captured data at frame rates fast enough for real-time video analysis tasks such as tracking and surveillance. We evaluate the proposed system using both simulations with ground truth data and real-world scenes. The accuracy of spectral capture is examined through comparisons with ground truth and with a commercial spectrometer. The utility of this high resolution MS video data is demonstrated on the applications of dynamic white balance adjustment, object tracking, and separating the appearance contributions of different illumination sources. The various high resolution MS video datasets that we captured will be made publicly available to facilitate research on dynamic spectral data analysis.
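A rough sketch of the guided-propagation idea, here approximated with a joint bilateral filter from opencv-contrib (the paper's trilateral filter also weighs temporal consistency, which is omitted; file names and parameters are illustrative):

```python
# Sketch: propagate a low-resolution multispectral band into the
# high-resolution RGB frame via joint bilateral filtering.
# Requires opencv-contrib-python.
import cv2

rgb = cv2.imread("rgb_frame.png")                       # high-res guide
band = cv2.imread("ms_band.png", cv2.IMREAD_GRAYSCALE)  # low-res MS band

band_up = cv2.resize(band, (rgb.shape[1], rgb.shape[0]),
                     interpolation=cv2.INTER_LINEAR)
# Edges of the RGB guide steer the smoothing of the upsampled band.
band_hr = cv2.ximgproc.jointBilateralFilter(rgb, band_up, d=9,
                                            sigmaColor=25, sigmaSpace=9)
```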

4.
In this paper we propose a system for the analysis of user generated video (UGV). UGV often has a rich camera motion structure that is generated at the time the video is recorded by the person taking the video, i.e., the "camera person." We exploit this structure by defining a new concept known as camera view for temporal segmentation of UGV. The segmentation provides a video summary with unique properties that is useful in applications such as video annotation. Camera motion is also a powerful feature for identification of keyframes and regions of interest (ROIs), since it is an indicator of the camera person's interests in the scene and can also attract the viewers' attention. We propose a new location-based saliency map which is generated based on camera motion parameters. This map is combined with other saliency maps generated using features such as color contrast, object motion and face detection to determine the ROIs. In order to evaluate our methods we conducted several user studies. A subjective evaluation indicated that our system produces results that are consistent with viewers' preferences. We also examined the effect of camera motion on human visual attention through an eye tracking experiment. The results showed a high dependency between the distribution of fixation points of the viewers and the direction of camera movement, which is consistent with our location-based saliency map.
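A minimal sketch of fusing the per-cue saliency maps into a single map, as the combination step above describes (the construction of the camera-motion map itself is not reproduced; weights are illustrative):

```python
# Sketch: weighted fusion of saliency cues, then ROI by thresholding.
import numpy as np

def fuse_saliency(maps, weights):
    """maps: list of HxW float arrays in [0, 1]; weights: same length."""
    fused = sum(w * m for w, m in zip(weights, maps))
    return fused / max(sum(weights), 1e-6)

# e.g. location (camera motion), color contrast, object motion, faces:
# fused = fuse_saliency([loc, color, motion, face], [0.4, 0.2, 0.2, 0.2])
# roi_mask = fused > 0.5
```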

5.
We present a technique for coupling simulated fluid phenomena that interact with real dynamic scenes captured as a binocular video sequence. We first process the binocular video sequence to obtain a complete 3D reconstruction of the scene, including velocity information. We use stereo for the visible parts of 3D geometry and surface completion to fill the missing regions. We then perform fluid simulation within a 3D domain that contains the object, enabling one‐way coupling from the video to the fluid. In order to maintain temporal consistency of the reconstructed scene and the animated fluid across frames, we develop a geometry tracking algorithm that combines optic flow and depth information with a novel technique for “velocity completion”. The velocity completion technique uses local rigidity constraints to hypothesize a motion field for the entire 3D shape, which is then used to propagate and filter the reconstructed shape over time. This approach not only generates smoothly varying geometry across time, but also simultaneously provides the necessary boundary conditions for one‐way coupling between the dynamic geometry and the simulated fluid. Finally, we employ a GPU-based scheme for rendering the synthetic fluid in the real video, taking refraction and scene texture into account.

6.
We approach mosaicing as a camera tracking problem within a known parameterized surface. From a video of a camera moving within a surface, we compute a mosaic representing the texture of that surface, flattened onto a planar image. Our approach works by defining a warp between images as a function of surface geometry and camera pose. Globally optimizing this warp to maximize alignment across all frames determines the camera trajectory, and the corresponding flattened mosaic image. In contrast to previous mosaicing methods which assume planar or distant scenes, or controlled camera motion, our approach enables mosaicing in cases where the camera moves unpredictably through proximal surfaces, such as in medical endoscopy applications.

7.
This work presents an effective approach to visual tracking that uses a graphics processing unit (GPU) for computation. To obtain a performance improvement over other platforms, it is convenient to select suitable algorithms, such as population-based ones: their parallel-friendly nature requires many independent evaluations that map well to the parallel architecture of the GPU. To this end we propose a particle filter (PF) hybridized with a memetic algorithm (MA) to produce a MAPF tracking algorithm for single and multiple object tracking problems. Previous experimental results demonstrated that the MAPF algorithm gives more accurate tracking results than the standard PF, and we now extend those results with the first complete adaptation of the PF and the MAPF for visual tracking to the NVIDIA CUDA architecture. Results show a GPU speedup of 5×–16× for different configurations.
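A minimal NumPy sketch of one standard bootstrap particle-filter iteration, the baseline the MAPF extends (the memetic local refinement and the CUDA mapping are omitted; the likelihood function is a user-supplied placeholder):

```python
# Sketch: one predict-weight-resample step for 2D position tracking.
import numpy as np

def pf_step(particles, weights, likelihood, motion_std=5.0, rng=np.random):
    # Predict: diffuse particles with a random-walk motion model.
    particles = particles + rng.normal(0, motion_std, particles.shape)
    # Weight: score each particle against the current observation.
    weights = weights * np.array([likelihood(p) for p in particles])
    weights /= weights.sum() + 1e-12
    # Resample when the effective sample size drops below half.
    if 1.0 / (weights ** 2).sum() < 0.5 * len(weights):
        idx = rng.choice(len(particles), len(particles), p=weights)
        particles = particles[idx]
        weights = np.full(len(weights), 1.0 / len(weights))
    return particles, weights
```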

8.
Silk Road culture is an important bond of the Belt and Road initiative, and its preservation is of great significance; for historical and geographical reasons, however, the representative heritage of the Silk Road is scattered or damaged and difficult to present effectively. Targeting the virtual exhibition and digitization of Silk Road culture, this paper therefore proposes and implements a Silk Road cultural heritage platform based on virtual-reality technology. Through the restoration of historical sites and image-based 3D reconstruction, it recreates the historical sites, relics, and events associated with Guyuan, Ningxia, an important node of the Silk Road…

9.
Depth estimation in a scene using image pairs acquired by a stereo camera setup is one of the important tasks of stereo vision systems. The disparity between the stereo images allows for 3D information acquisition, which is indispensable in many machine vision applications. Practical stereo vision systems involve wide ranges of disparity levels. Since disparity map extraction is computationally demanding, practical real-time FPGA-based algorithms require increased device resource utilization, depending on the operational range of disparity levels, which leads to significant power consumption. In this paper a new hardware-efficient real-time disparity map computation module is developed. The module constantly estimates the precisely required range of disparity levels for a given stereo image set, keeping this range as low as possible by verging the axes of the stereo setup cameras. This enables a parallel-pipelined design for the overall module, realized on a single FPGA device of the Altera Stratix IV family. Accurate disparity maps are computed at a rate of more than 320 frames per second for a stereo image pair of 640 × 480 pixels spatial resolution with a disparity range of 80 pixels. The presented technique provides very good processing speed at the expense of accuracy, with very good scalability in terms of disparity levels. The proposed method delivers a module suitable for high-performance real-time stereo vision applications where space and power are significant concerns.

10.
Objective: More and more applications, such as robot navigation and the design and modeling of virtual scenes for film and games, depend on accurate and fast acquisition and analysis of scene depth images. Direct depth sensors such as time-of-flight cameras capture depth images in real time, but hardware limitations keep the captured depth images at low resolution, which cannot satisfy practical applications. Computing disparity, and hence depth, by stereo matching between left and right images is a classic computer-vision approach, but occlusions and textureless regions prevent stereo matching from recovering correct disparities there, which limits its practical use. Method: Combining the strengths of direct depth sensors and stereo matching algorithms, a new depth-image reconstruction method is proposed. First, the depth image captured by the direct sensor is used to construct adaptive local matching weights that constrain local-window stereo matching between the left and right images, yielding a stereo-matching depth image. The captured depth image and the matched depth image are then fused based on left-right consistency checking, and a local weighted filtering algorithm is proposed to further improve reconstruction quality. Results: Experiments show that the proposed method outperforms other stereo matching algorithms in both objective metrics and visual quality: the error-rate comparison shows an improvement of about 10% in depth reconstruction error over traditional stereo matching, and the peak signal-to-noise ratio improves by about 10 dB. Conclusion: By combining a high-resolution stereo pair with an initial low-resolution depth image, the proposed method effectively reconstructs high-quality, high-resolution depth images.
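A minimal sketch of the left-right consistency check used in the fusion step (disparities assumed in pixels; the adaptive matching weights and the local weighted filter are not reproduced):

```python
# Sketch: left-right consistency check on a disparity pair.
import numpy as np

def lr_consistent(disp_left, disp_right, tol=1.0):
    h, w = disp_left.shape
    xs = np.arange(w)[None, :].repeat(h, axis=0)
    ys = np.arange(h)[:, None].repeat(w, axis=1)
    # Where does each left pixel land in the right image?
    xr = np.clip((xs - disp_left).astype(int), 0, w - 1)
    # Consistent if the right disparity at that location agrees.
    return np.abs(disp_left - disp_right[ys, xr]) <= tol
```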

11.
Cinemagraphs are a popular new type of visual media that lie in‐between photos and video; some parts of the frame are animated and loop seamlessly, while other parts of the frame remain completely still. Cinemagraphs are especially effective for portraits because they capture the nuances of our dynamic facial expressions. We present a completely automatic algorithm for generating portrait cinemagraphs from a short video captured with a hand‐held camera. Our algorithm uses a combination of face tracking and point tracking to segment face motions into two classes: gross, large‐scale motions that should be removed from the video, and dynamic facial expressions that should be preserved. This segmentation informs a spatially‐varying warp that removes the large‐scale motion, and a graph‐cut segmentation of the frame into dynamic and still regions that preserves the finer‐scale facial expression motions. We demonstrate the success of our method with a variety of results and a comparison to previous work.

12.
Li Chao, Chen Zhihua, Sheng Bin, Li Ping, He Gaoqi. Multimedia Tools and Applications, 2020, 79(7-8): 4661-4679

In this paper, we introduce an approach to remove flickers in videos that are caused by applying image-based processing methods to the original video frame by frame. First, we propose a multi-frame based video flicker removal method that utilizes multiple temporally corresponding frames to reconstruct the flickering frame. Compared with traditional methods, which reconstruct the flickering frame from a single adjacent frame, reconstruction from multiple temporally corresponding frames reduces warp inaccuracy. We then optimize our flicker removal method in two ways. On the one hand, we detect the flickering frames in the video sequence with temporal consistency metrics; reconstructing only the flickering frames accelerates the algorithm greatly. On the other hand, we choose only the previous temporally corresponding frames to reconstruct the output frames. We also accelerate our video flicker removal with the GPU. Qualitative experimental results demonstrate the efficiency of the proposed method. With algorithmic optimization and GPU acceleration, the running time of our method also outperforms traditional video temporal coherence methods.
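A minimal sketch of a temporal-consistency test for flagging flickering frames, with a simple global-brightness criterion standing in for the paper's metric (the threshold is illustrative):

```python
# Sketch: flag frames whose mean brightness jumps against both neighbours.
import cv2

def flicker_frames(frames, thresh=8.0):
    means = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY).mean() for f in frames]
    flagged = []
    for i in range(1, len(means) - 1):
        expected = 0.5 * (means[i - 1] + means[i + 1])  # neighbour average
        if abs(means[i] - expected) > thresh:
            flagged.append(i)
    return flagged
```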


13.
To capture the full brightness range of natural scenes, cameras automatically adjust the exposure value, which causes the brightness of scene points to change from frame to frame. Given such a video sequence, we introduce a system for tracking features and simultaneously estimating the radiometric response function of the camera and the exposure difference between frames. We model the global and nonlinear process that is responsible for the changes in image brightness, rather than adapting to the changes locally and linearly, which makes our tracking more robust to brightness change. We apply our system to perform structure-from-motion and stereo to reconstruct a texture-mapped 3D surface from a video taken in a high dynamic range environment.
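A minimal sketch of estimating the exposure difference between two frames once an inverse response function has been recovered (here the inverse response is assumed given; a gamma curve stands in as an example):

```python
# Sketch: exposure ratio between frames from linearized intensities.
import numpy as np

def exposure_ratio(img1, img2, inv_response):
    """inv_response maps pixel values in [0,1] to linear irradiance*exposure;
    img1, img2 are registered uint8 frames of the same scene."""
    e1 = inv_response(img1 / 255.0)
    e2 = inv_response(img2 / 255.0)
    valid = (img1 > 10) & (img1 < 245) & (img2 > 10) & (img2 < 245)
    return np.median(e2[valid] / (e1[valid] + 1e-12))   # k = t2 / t1

# e.g. assuming a gamma-2.2 response: inv_response = lambda v: v ** 2.2
```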

14.
We present an approach that significantly enhances the capabilities of traditional image mosaicking. The key observation is that as a camera moves, it senses each scene point multiple times. We rigidly attach to the camera an optical filter with spatially varying properties, so that multiple measurements are obtained for each scene point under different optical settings. Fusing the data captured in the multiple images yields an image mosaic that includes additional information about the scene. We refer to this approach as generalized mosaicing. In this paper we show that this approach can significantly extend the optical dynamic range of any given imaging system by exploiting vignetting effects. We derive the optimal vignetting configuration and implement it using an external filter with spatially varying transmittance. We also derive efficient scene sampling conditions as well as ways to self calibrate the vignetting effects. Maximum likelihood is used for image registration and fusion. In an experiment we mounted such a filter on a standard 8-bit video camera, to obtain an image panorama with dynamic range comparable to imaging with a 16-bit camera.

15.
Real-Time Imaging, 1998, 4(1): 21-40
This paper describes how image sequences taken by a moving video camera may be processed to detect and track moving objects against a moving background in real-time. The motion segmentation and shape tracking system is known as a scene segmenter establishing tracking, version 2 (ASSET-2). Motion is found by tracking image features, and segmentation is based on first-order (i.e. six-parameter) flow fields. Shape tracking is performed using two-dimensional radial map representations. The system runs in real-time, and is accurate and reliable. It requires no camera calibration and no knowledge of the camera's motion.
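A minimal sketch of estimating a first-order (six-parameter) motion model from tracked feature correspondences, as the segmentation above relies on (using OpenCV's robust affine fit rather than ASSET-2's own estimator):

```python
# Sketch: six-parameter affine flow from matched feature positions.
import cv2
import numpy as np

def affine_flow(pts_prev, pts_curr):
    """pts_prev, pts_curr: Nx2 float32 arrays of matched feature positions."""
    A, inliers = cv2.estimateAffine2D(pts_prev, pts_curr,
                                      method=cv2.RANSAC,
                                      ransacReprojThreshold=2.0)
    # A is the 2x3 affine (six-parameter) model of the background motion;
    # features rejected as outliers are candidate moving objects.
    return A, inliers
```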

16.
In recent years, many image-based rendering techniques have advanced from static to dynamic scenes, becoming video-based rendering (VBR) methods. In practice, however, only a few of them can render new views online. We present a new VBR system that creates new views of a live dynamic scene. The system provides high quality images and does not require any background subtraction. Our method follows a plane-sweep approach and reaches real-time rendering using consumer graphics hardware (the graphics processing unit, GPU). Only one computer is used for both acquisition and rendering. The video streams are acquired by at least 3 webcams, and we propose an additional video stream management scheme that extends the number of webcams to 10 or more. These considerations make our system low-cost and hence accessible to everyone. We also present an adaptation of our plane-sweep method that creates multiple views of the scene simultaneously in real time. Our system is especially designed for stereovision using autostereoscopic displays: the new views are computed from 4 webcams connected to a computer and are compressed in order to be transferred to a mobile phone. Using GPU programming, our method provides up to 16 images of the scene in real time. The use of both GPU and CPU makes this method work on a single consumer-grade computer.
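A rough CPU sketch of the plane-sweep scoring loop (the per-plane homographies are assumed precomputed from camera calibration; the paper's GPU shader version is not reproduced):

```python
# Sketch: plane sweep — for each depth plane, warp the input views into the
# virtual view and score photo-consistency; keep the best plane per pixel.
import cv2
import numpy as np

def plane_sweep(out_size, views, homs_per_depth):
    """out_size: (h, w); views: list of BGR images; homs_per_depth: for each
    depth plane, one 3x3 homography per view into the virtual view."""
    h, w = out_size
    best_cost = np.full((h, w), np.inf, np.float32)
    best_plane = np.zeros((h, w), np.int32)
    for d, homs in enumerate(homs_per_depth):
        warped = [cv2.warpPerspective(v, H, (w, h)) for v, H in zip(views, homs)]
        stack = np.stack(warped).astype(np.float32)
        cost = stack.var(axis=0).mean(axis=-1)   # color variance across views
        better = cost < best_cost
        best_cost[better] = cost[better]
        best_plane[better] = d
    return best_plane
```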

17.
Segmenting and tracking people is an important aim of video analysis, with numerous applications. Scene calibration enables the system to process the input video differently depending on the camera position and the scene characteristics, leading to better results. Complex situations involve extended occlusions, shadows and/or reflections, so appropriate calibration is required to achieve robust people segmentation as well as a tracking algorithm. In the majority of cases, once the system has been installed in a certain scene, it is difficult to obtain the calibration information of the scene. In this paper, an automatic method to calibrate the scene for people detection and tracking systems is presented, based on measurements from video sequences captured with a stationary camera.

18.
In this paper, we present an algorithm to combine edge information from stereo-derived disparity maps with edges from the original intensity/color image to improve contour detection in images of natural scenes. After computing the disparity map, we generate a so-called “edge-combination image,” which relies on those edges of the original image that are also present in the stereo map. We describe an algorithm to identify corresponding intensity and disparity edges, which are usually not perfectly aligned due to errors in the stereo reconstruction. Our experiments show that the proposed edge-combination approach can significantly improve the segmentation results of an active contour algorithm. Danijela Markovic graduated from the Faculty of Electronic Engineering, University of Nis, Serbia, in 1997. She is currently a PhD student at the Institute for Software Technology and Interactive Systems, Vienna University of Technology. Her research interests are in computer vision and computer graphics, including stereo vision and curve/surface modeling. In particular, she is interested in object segmentation, feature extraction, and tracking. Margrit Gelautz received her PhD degree in computer science from Graz University of Technology, Austria. She worked on stereo and interferometric image processing for radar remote sensing applications during a postdoctoral stay at Stanford University. Her current research interests include image and video processing for multimedia applications, with a focus on 3D vision and rendering techniques.
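A minimal sketch of the edge-combination idea: keep intensity edges only where a disparity edge lies nearby, absorbing the misalignment the authors mention by dilating the disparity edges (both inputs assumed 8-bit; thresholds are illustrative):

```python
# Sketch: combine intensity edges with (dilated) disparity-map edges.
import cv2
import numpy as np

def edge_combination(gray, disparity, align_tol_px=3):
    """gray: 8-bit intensity image; disparity: disparity map scaled to 8-bit."""
    e_img = cv2.Canny(gray, 50, 150)
    e_disp = cv2.Canny(disparity, 20, 60)
    # Dilate disparity edges to tolerate small stereo misalignment.
    kernel = np.ones((2 * align_tol_px + 1,) * 2, np.uint8)
    support = cv2.dilate(e_disp, kernel)
    return cv2.bitwise_and(e_img, support)   # edges supported by both cues
```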

19.

This paper proposes real-time object depth estimation using only a monocular camera on an onboard computer with a low-cost GPU. Our algorithm estimates scene depth with a sparse feature-based visual odometry algorithm and, in parallel, detects and tracks object bounding boxes using an existing object detection algorithm. The two algorithms share their results, i.e., features, motion, and bounding boxes, to handle both static and dynamic objects in the scene. We validate the scene depth accuracy of sparse features quantitatively on KITTI against its ground-truth depth map made from LiDAR observations, and the depth of detected objects qualitatively with the Hyundai driving datasets and satellite maps. We compare the depth map of our algorithm with the results of (un-)supervised monocular depth estimation algorithms. The validation shows that our performance is comparable, in terms of error and accuracy, to monocular depth estimation algorithms that train depth indirectly (or directly) from stereo image pairs (or depth images), and better than algorithms trained with monocular images only. We also confirm that our computational load is much lighter than that of the learning-based methods, while showing comparable performance.
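A minimal sketch of assigning depth to a detected bounding box from the sparse visual-odometry feature depths the two threads share (a robust median standing in for the paper's exact rule; names are illustrative):

```python
# Sketch: object depth = robust statistic of sparse feature depths in its box.
import numpy as np

def box_depth(feat_uv, feat_depth, box):
    """feat_uv: Nx2 pixel positions; feat_depth: N depths; box: (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    inside = ((feat_uv[:, 0] >= x1) & (feat_uv[:, 0] <= x2) &
              (feat_uv[:, 1] >= y1) & (feat_uv[:, 1] <= y2))
    return np.median(feat_depth[inside]) if inside.any() else None
```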


20.
Objective: Real-time portrait matting with neural networks has become a research hotspot in computer vision in recent years, but existing networks cannot meet real-time requirements on high-resolution video. This paper proposes a background-image-assisted network for real-time matting of high-resolution portrait video. Method: A two-stage network consisting of a base network and a refinement network is presented. In the base network, an encoder module extracts multi-scale features from each video frame, and a pyramid pooling module fuses these features as input to a recurrent decoder. In the recurrent decoder, residual gated recurrent units aggregate temporal information across consecutive frames to generate the alpha matte, the foreground residual, and hidden feature maps; the residual structure reduces the number of model parameters and improves real-time performance. To improve real-time matting on high-resolution images, the refinement network contains a high-resolution guidance module in which high-resolution image information guides the low-resolution result to produce a high-quality matte. Results: In experimental comparisons with recent related networks, the proposed method outperforms existing methods on the high-resolution Human2K dataset, improving the evaluation metrics (absolute difference, mean squared error, gradient, connectivity) by 18.8%, 39.2%, 40.7%, and 20.9%, respectively. On an NVIDIA GTX 1080Ti GPU it runs at 26 frames/s on 4K video and 43 frames/s on HD (high definition) video. Conclusion: The proposed model handles real-time high-resolution portrait matting well and can better support advanced applications such as film, short-video social media, and video conferencing.
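A minimal sketch of how the two predicted outputs, an alpha matte and a foreground residual, combine at the output stage, as in background-matting-style networks (the recurrent network itself is not reproduced; array names are illustrative):

```python
# Sketch: composite a frame from predicted alpha and foreground residual.
import numpy as np

def composite(frame, fgr_residual, alpha, new_background):
    """frame, fgr_residual, new_background: HxWx3 floats in [0,1]; alpha: HxWx1."""
    foreground = np.clip(frame + fgr_residual, 0.0, 1.0)  # refined foreground
    return alpha * foreground + (1.0 - alpha) * new_background
```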
