Similar Literature
20 similar documents found
1.
From depth sensors to thermal cameras, the increased availability of camera sensors beyond the visible spectrum has created many exciting applications. Most of these applications require combining information from these hyperspectral cameras with a regular RGB camera. Fusing information from multiple heterogeneous cameras can be a very complex problem: data can be fused at different levels, from pixels to voxels or even semantic objects, with large variations in accuracy, communication, and computation costs. In this paper, we propose a system for robust segmentation of human figures in video sequences by fusing visible-light and thermal imagery. Our system focuses on the geometric transformation between visual blobs corresponding to human figures observed by both cameras. This approach provides the most reliable fusion at the expense of high computation and communication costs. To reduce the computational complexity of the geometric fusion, an efficient calibration procedure is first applied to rectify the two camera views without the complex procedure of estimating the intrinsic parameters of the cameras. To geometrically register different blobs at the pixel level, a blob-to-blob homography in the rectified domain is then computed in real time by estimating the disparity for each blob pair. Precise segmentation is finally achieved using a two-tier tracking algorithm and a unified background model. Our experimental results show that our proposed system provides significant improvements over existing schemes under various conditions.
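As a rough illustration of the blob-level registration this abstract describes, the sketch below assumes the two views are already rectified, so a blob-to-blob mapping reduces to a horizontal shift by the blob's disparity. The centroid-based disparity estimate and the function names are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def blob_centroid(mask):
    """Centroid (x, y) of a binary blob mask."""
    ys, xs = np.nonzero(mask)
    return xs.mean(), ys.mean()

def register_blob(thermal_mask, rgb_mask):
    """Map a thermal blob onto the RGB view by a horizontal shift.

    In rectified views, corresponding points share the same row, so a
    per-blob disparity (difference of centroid x-coordinates) acts as a
    very simple stand-in for the blob-to-blob homography.
    """
    tx, _ = blob_centroid(thermal_mask)
    rx, _ = blob_centroid(rgb_mask)
    disparity = rx - tx
    shift = int(round(disparity))
    shifted = np.zeros_like(thermal_mask)
    if shift >= 0:
        shifted[:, shift:] = thermal_mask[:, :thermal_mask.shape[1] - shift]
    else:
        shifted[:, :shift] = thermal_mask[:, -shift:]
    return shifted, disparity
```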

2.
In this paper we present an automatic method for calibrating a network of cameras that works by analyzing only the motion of silhouettes in the multiple video streams. This is particularly useful for automatic reconstruction of a dynamic event using a camera network in a situation where pre-calibration of the cameras is impractical or even impossible. The key contribution of this work is a RANSAC-based algorithm that simultaneously computes the epipolar geometry and synchronization of a pair of cameras only from the motion of silhouettes in video. Our approach involves first independently computing the fundamental matrix and synchronization for multiple pairs of cameras in the network. In the next stage the calibration and synchronization for the complete network is recovered from the pairwise information. Finally, a visual-hull algorithm is used to reconstruct the shape of the dynamic object from its silhouettes in video. For unsynchronized video streams with sub-frame temporal offsets, we interpolate silhouettes between successive frames to get more accurate visual hulls. We show the effectiveness of our method by remotely calibrating several different indoor camera networks from archived video streams.
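A minimal sketch of the joint search over temporal offset and epipolar geometry, assuming candidate silhouette-derived correspondences per frame are already available and matched in order; this is a simplified stand-in for the paper's silhouette-specific RANSAC, using OpenCV's fundamental-matrix RANSAC as the inner estimator.

```python
import cv2
import numpy as np

def calibrate_pair(points_a, points_b, max_offset=5):
    """Jointly pick a temporal offset and a fundamental matrix.

    points_a[t], points_b[t]: Nx2 arrays of candidate correspondences
    derived from the silhouettes of frame t in each video (assumed
    matched in order). For every candidate offset we run RANSAC on the
    pooled correspondences and keep the offset with the most inliers.
    """
    best = (None, None, -1)  # (F, offset, inlier count)
    for d in range(-max_offset, max_offset + 1):
        pa, pb = [], []
        for t in range(len(points_a)):
            if 0 <= t + d < len(points_b):
                pa.append(points_a[t])
                pb.append(points_b[t + d])
        if not pa:
            continue
        pa = np.vstack(pa).astype(np.float32)
        pb = np.vstack(pb).astype(np.float32)
        F, mask = cv2.findFundamentalMat(pa, pb, cv2.FM_RANSAC, 1.0, 0.99)
        if F is not None and int(mask.sum()) > best[2]:
            best = (F, d, int(mask.sum()))
    return best  # fundamental matrix, frame offset, inlier count
```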

3.
In video post-production applications, camera motion analysis and alignment are important to ensure geometric correctness and temporal consistency. In this paper, we trade some generality in estimating and aligning camera motion for reduced computational complexity and a more purely image-based formulation. The main contribution is to use fundamental ratios to synchronize video sequences of distinct scenes captured by cameras undergoing similar motions. We also present a simple method to align 3D camera trajectories when the fundamental ratios are not able to match the noisy trajectories. Experimental results show that our method can accurately synchronize sequences even when the scenes are totally different and have dense depths. An application to 3D object transfer is also demonstrated.

4.
Global motion generally describes the motion of a camera, although it may also comprise the motion of large objects. Global motion is often modeled by parametric transformations of two-dimensional images, and the process of estimating the motion parameters is called global motion estimation (GME). GME is widely employed in applications such as video coding, image stabilization and super-resolution. To estimate the global motion parameters, the Levenberg–Marquardt algorithm (LMA) is typically used to minimize an objective function iteratively. Since the region of support for the global motion representation consists of the entire image frame, the minimization tends to be computationally expensive because it involves all the pixels within a frame. To significantly reduce the computational complexity of the LMA, we propose to select only a small subset of the pixels for estimating the motion parameters, based on several subsampling patterns and their combinations. Simulation results demonstrate that the proposed method can speed up the conventional GME approach by over ten times, with only a very slight loss (less than 0.1 dB) in estimation accuracy. The proposed method also outperforms several state-of-the-art fast GME methods in terms of the speed/accuracy tradeoff.
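A hedged sketch of subsampled global motion estimation: only a regular subset of pixels contributes to the Levenberg–Marquardt residual. The regular-grid pattern and the two-parameter translational model are illustrative simplifications; the paper uses richer patterns, their combinations and a full parametric model.

```python
import numpy as np
from scipy.ndimage import map_coordinates
from scipy.optimize import least_squares

def gme_subsampled(prev, cur, step=8):
    """Estimate a global translation (dx, dy) with subsampled pixels.

    prev, cur: float grayscale frames. Only every `step`-th pixel in each
    direction enters the residual, cutting the cost of each
    Levenberg-Marquardt iteration roughly by a factor of step**2.
    """
    h, w = prev.shape
    ys, xs = np.mgrid[0:h:step, 0:w:step]
    ys, xs = ys.ravel(), xs.ravel()
    ref = prev[ys, xs]

    def residual(p):
        dx, dy = p
        # bilinear sampling of the current frame at shifted positions
        warped = map_coordinates(cur, [ys + dy, xs + dx],
                                 order=1, mode="nearest")
        return warped - ref

    res = least_squares(residual, x0=[0.0, 0.0], method="lm")
    return res.x  # estimated (dx, dy)
```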

5.
Image stabilization is the process of smoothing the unstable motion of a video sequence. Although classical image stabilization techniques are already quite mature, similar advances have not been extended to the quantum computing domain. In this study, we explore a novel quantum video framework and make a modest attempt to perform image stabilization within it by utilizing a quantum comparator and quantum image translation operations. The proposed method is capable of estimating the camera motion during exposure and compensating for the video jitter caused by that motion. In addition, the quantum properties of entanglement and parallelism ensure that quantum image stabilization is feasible and effective. Finally, a simple experiment that stabilizes a four-frame jittered quantum video is implemented in Matlab, using linear algebra with complex vectors as quantum states and unitary matrices as unitary transforms, to show the feasibility and merits of this proposal.

6.
We present a system for automatically extracting the region of interest (ROI) and controlling virtual cameras based on panoramic video. It targets applications such as classroom lectures and video conferencing. For capturing panoramic video, we use the FlyCam system, which produces high-resolution, wide-angle video by stitching video images from multiple stationary cameras. To generate conventional video, a region of interest can be cropped from the panoramic video. We propose methods for ROI detection, tracking, and virtual camera control that work in both the uncompressed and compressed domains. The ROI is located from motion and color information in the uncompressed domain and from macroblock information in the compressed domain, and is tracked using a Kalman filter. This results in virtual camera control that simulates human-controlled video recording. The system has no physical camera motion, and the virtual camera parameters are readily available for video indexing.
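For the tracking step, a constant-velocity Kalman filter over the ROI centre is a natural reading of the abstract. The sketch below is a generic filter of that kind; the state layout and noise magnitudes are assumptions, not the paper's settings.

```python
import numpy as np

class ROIKalman:
    """Constant-velocity Kalman filter for the ROI centre (x, y).

    State: [x, y, vx, vy]; measurement: detected ROI centre.
    Noise magnitudes q and r are illustrative and would need tuning.
    """
    def __init__(self, x0, y0, q=1.0, r=10.0):
        self.x = np.array([x0, y0, 0.0, 0.0])
        self.P = np.eye(4) * 100.0
        self.F = np.array([[1, 0, 1, 0],
                           [0, 1, 0, 1],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], float)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], float)
        self.Q = np.eye(4) * q
        self.R = np.eye(2) * r

    def step(self, z):
        # predict with the constant-velocity model
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # update with the measured ROI centre z = (x, y)
        innov = np.asarray(z, float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ innov
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]  # smoothed centre that drives the virtual camera
```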

7.
In this paper, we present a theory for combining the effects of motion, illumination, 3D structure, albedo, and camera parameters in a sequence of images obtained by a perspective camera. We show that the set of all Lambertian reflectance functions of a moving object, at any position, illuminated by arbitrarily distant light sources, lies "close" to a bilinear subspace consisting of nine illumination variables and six motion variables. This result implies that, given an arbitrary video sequence, it is possible to recover the 3D structure, motion, and illumination conditions simultaneously using the bilinear subspace formulation. The derivation builds upon existing work on linear subspace representations of reflectance by generalizing it to moving objects. Lighting can change slowly or suddenly, locally or globally, and can originate from a combination of point and extended sources. We experimentally compare the results of our theory with ground truth data and also provide results on real data by using video sequences of a 3D face and the entire human body with various combinations of motion and illumination directions. We also show results of our theory in estimating 3D motion and illumination model parameters from a video sequence.

8.
A calibrated camera is essential for computer vision systems, the prime reason being that such a camera acts as an angle-measuring device. Once the camera is calibrated, applications such as three-dimensional reconstruction, metrology, or other applications requiring real-world information from video sequences can be envisioned. Motivated by this, we address the problem of calibrating multiple cameras with overlapping fields of view that observe pedestrians walking on uneven terrain. This calibration problem on uneven terrain has so far not been addressed in the vision community. We automatically estimate vertical and horizontal vanishing points by observing pedestrians in each camera and use the corresponding vanishing points to estimate the infinite homography between the different cameras. This homography provides constraints on the intrinsic (or interior) camera parameters while also enabling us to estimate the extrinsic (or exterior) camera parameters. We test the proposed method on real as well as synthetic data, in addition to a motion-capture dataset, and compare our results with the state of the art.
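The vertical vanishing point can be recovered from pedestrian observations with a standard least-squares construction: each head-foot pair defines an image line, and the vanishing point is the point closest to all such lines. The sketch below shows only that construction; the paper's full pipeline (automatic detection, robust fitting, horizontal vanishing points) is not reproduced.

```python
import numpy as np

def vertical_vanishing_point(heads, feet):
    """Least-squares vertical vanishing point from pedestrian detections.

    heads, feet: Nx2 arrays of head and foot image points of pedestrians
    observed at different positions/times. Each pair defines a homogeneous
    line l = head x foot; the vanishing point v minimises sum (l_i . v)^2,
    i.e. it is the smallest right singular vector of the stacked lines.
    """
    h = np.hstack([heads, np.ones((len(heads), 1))])
    f = np.hstack([feet, np.ones((len(feet), 1))])
    lines = np.cross(h, f)                               # one line per detection
    lines /= np.linalg.norm(lines, axis=1, keepdims=True)
    _, _, vt = np.linalg.svd(lines)
    v = vt[-1]
    return v[:2] / v[2]                                  # inhomogeneous coordinates
```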

9.
In this paper, we propose a new method for estimating camera motion parameters based on optical flow models. Camera motion parameters are generated using linear combinations of optical flow models. The proposed method first creates these optical flow models; linear decompositions are then performed on the input optical flows, calculated from adjacent images in the video sequence, to estimate the coefficient of each optical flow model. These coefficients are then applied to the parameters used to create each optical flow model, and the camera motion parameters implied in the adjacent images can be estimated through a linear composition of the weighted parameters. We demonstrate that the proposed method estimates the camera motion parameters accurately, at a low computational cost, and robustly in the presence of noise in the video sequence being analyzed.
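The linear decomposition step can be read as an ordinary least-squares fit of the observed flow field onto the pre-built model flows. The sketch below shows that reading; how the models are built and how the coefficients are mapped back to motion parameters follows the paper and is not shown here.

```python
import numpy as np

def decompose_flow(input_flow, model_flows):
    """Express an observed optical-flow field as a linear combination
    of pre-built optical-flow models.

    input_flow:  (H, W, 2) flow computed between two adjacent frames.
    model_flows: list of (H, W, 2) flows, one per motion model
                 (e.g. pan, tilt, zoom, roll).
    Returns the least-squares coefficient of each model; applying the
    same weights to the parameters that generated each model yields the
    estimated camera motion parameters.
    """
    b = input_flow.reshape(-1)
    A = np.stack([m.reshape(-1) for m in model_flows], axis=1)
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coeffs
```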

10.
Motion Parameter Estimation and Dynamic Mosaicing for Video Image Sequences
This paper adopts a multi-level hierarchical iterative algorithm to estimate global motion parameters and proposes a new motion segmentation method for dynamic mosaicing, achieving automatic stitching of video sequences that contain both camera motion and object motion. The basic steps of our method are as follows. First, an initial estimate of the global motion parameters is obtained, and regions are classified during the hierarchical iteration to produce an initial motion mask. Next, the original image is spatially segmented: spatial regions are first merged hierarchically, bottom-up, according to the spatial properties of the image, and then merged further using the temporal properties of the video, yielding the spatial segmentation result. The initial motion mask and the spatial segmentation are then combined, and a new region-classification method re-classifies every region of the spatial segmentation. The global motion parameters are then progressively refined according to the classification results. Finally, the images are composited to produce a panoramic mosaic. Our method exploits the advantages of multi-level hierarchical iteration and fully considers the spatial and temporal properties of the video, achieving accurate segmentation of moving objects and the covered background and avoiding the impact of occlusion on the accuracy of global motion estimation. In addition, during image compositing we resolve problems such as blurring or discontinuities in certain regions of the mosaic. Experimental results show that our method produces high-quality panoramic mosaics from dynamic video sequences.

11.
The segmentation of objects, and of people in particular, is an important problem in computer vision. In this paper, we focus on automatically segmenting a person from challenging video sequences in which we place no constraint on camera viewpoint, camera motion or the movements of a person in the scene. Our approach uses the most confident predictions from a pose detector as a form of anchor or keyframe stick-figure prediction, which helps guide the segmentation of other more challenging frames in the video. Since even state-of-the-art pose detectors are unreliable on many frames (especially given that we are interested in segmentations with no camera or motion constraints), only the poses or stick-figure predictions for frames with the highest confidence in a localized temporal region anchor further processing. The stick-figure predictions within confident keyframes are used to extract color, position and optical flow features. Multiple conditional random fields (CRFs) are used to process blocks of video in batches, using a two-dimensional CRF for detailed keyframe segmentation as well as 3D CRFs for propagating segmentations to the entire sequence of frames belonging to the batches. Location information derived from the pose is also used to refine the results. Importantly, no hand-labeled training data is required by our method. We discuss the use of a continuity method that reuses learnt parameters between batches of frames and show how pose predictions can also be improved by our model. We provide an extensive evaluation of our approach, comparing it with a variety of alternative grab-cut based methods and a prior state-of-the-art method. We also release our evaluation data to the community to facilitate further experiments. We find that our approach yields state-of-the-art qualitative and quantitative performance compared to prior work and more heuristic alternative approaches.

12.
In this paper, we propose a novel motion-based video retrieval approach to find desired videos from video databases through trajectory matching. The main component of our approach is to extract representative motion features from the video, which can be broken down into the following three steps. First, we extract the motion vectors from each frame of the video and utilize Harris corner points to compensate for the effect of camera motion. Second, we find interesting motion flows from frames using a sliding-window mechanism and a clustering algorithm. Third, we merge the generated motion flows and select representative ones to capture the motion features of the video. Furthermore, we design a symbolic trajectory-matching method for effective video retrieval. The experimental results show that our algorithm is capable of effectively extracting motion flows with high accuracy and outperforms existing approaches for video retrieval.

13.
In this paper, we introduce a method to estimate the object’s pose from multiple cameras. We focus on direct estimation of the 3D object pose from 2D image sequences. Scale-Invariant Feature Transform (SIFT) is used to extract corresponding feature points from adjacent images in the video sequence. We first demonstrate that centralized pose estimation from the collection of corresponding feature points in the 2D images from all cameras can be obtained as a solution to a generalized Sylvester’s equation. We subsequently derive a distributed solution to pose estimation from multiple cameras and show that it is equivalent to the solution of the centralized pose estimation based on Sylvester’s equation. Specifically, we rely on collaboration among the multiple cameras to provide an iterative refinement of the independent solution to pose estimation obtained for each camera based on Sylvester’s equation. The proposed approach to pose estimation from multiple cameras relies on all of the information available from all cameras to obtain an estimate at each camera even when the image features are not visible to some of the cameras. The resulting pose estimation technique is therefore robust to occlusion and sensor errors from specific camera views. Moreover, the proposed approach does not require matching feature points among images from different camera views nor does it demand reconstruction of 3D points. Furthermore, the computational complexity of the proposed solution grows linearly with the number of cameras. Finally, computer simulation experiments demonstrate the accuracy and speed of our approach to pose estimation from multiple cameras.
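The core numerical step, solving a Sylvester-type matrix equation, has standard library support. The toy below only illustrates solving a standard Sylvester equation AX + XB = Q with SciPy; the paper's generalized equation is built from SIFT correspondences and differs in form, and the random matrices here are placeholders.

```python
import numpy as np
from scipy.linalg import solve_sylvester

# Placeholder matrices, not the paper's construction from image features.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
Q = rng.standard_normal((3, 3))

# Solve A X + X B = Q for X.
X = solve_sylvester(A, B, Q)
print(np.allclose(A @ X + X @ B, Q))  # True: X satisfies the equation
```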

14.
Capturing exposure sequences to compute high dynamic range (HDR) images causes motion blur in cases of camera movement. This also applies to light-field cameras: frames rendered from multiple blurred HDR light-field perspectives are also blurred. While the recording times of exposure sequences cannot be reduced for a single-sensor camera, we demonstrate how this can be achieved for a camera array. Thus, we decrease capturing time and reduce motion blur for HDR light-field video recording. Applying a spatio-temporal exposure pattern while capturing frames with a camera array reduces the overall recording time and enables the estimation of camera movement within one light-field video frame. By estimating depth maps and local point spread functions (PSFs) from multiple perspectives with the same exposure, regional motion deblurring can be supported. Missing exposures at various perspectives are then interpolated.

15.
Objective: To obtain satisfactory results, traditional video stabilization methods generally require considerable computation time and introduce long delays. To address this problem, we propose a low-latency video stabilization method based on online total variation optimization. Method: First, feature point detection and matching are used to compute the inter-frame homographies and obtain the motion path of the shaky video. The shaky path is then smoothed by online total variation optimization to obtain a stable motion path. Finally, motion compensation is applied to generate the stabilized video. Results: The stabilization performance was tested on shaky videos from public video datasets and compared, in both quality and runtime, with several state-of-the-art stabilization algorithms and commercial software. In terms of time, we measured the average per-frame processing time and the number of delayed frames for each method. Unlike post-processing methods, which must receive most of the video frames before computation can begin, our algorithm obtains the final stabilization result with a delay of only one frame and is about 15% faster than MeshFlow. In terms of stabilization quality, we computed the distortion and cropping ratios of the stabilized videos for each method and invited non-expert users to judge the stability subjectively; the results of our algorithm are not inferior to three widely recognized post-processing stabilization methods and are better than the Kalman filtering method. Conclusion: The proposed stabilization method balances speed and effectiveness and, compared with traditional methods, is better suited to application scenarios with low-latency requirements.
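To make the path-smoothing step concrete, the sketch below minimizes a total-variation-regularized objective over a 1-D camera path with iteratively reweighted least squares. It is an offline batch version under assumed parameters; the paper's online formulation processes the path incrementally so that only one frame of delay is incurred.

```python
import numpy as np

def tv_smooth_path(path, lam=10.0, iters=30, eps=1e-3):
    """Smooth a 1-D camera path (e.g. accumulated x-translation per frame)
    by approximately minimising
        0.5 * ||x - path||^2 + lam * sum_i |x[i+1] - x[i]|
    with iteratively reweighted least squares: each pass solves a linear
    system whose difference terms are weighted by 1 / |x[i+1] - x[i]|.
    """
    n = len(path)
    D = np.diff(np.eye(n), axis=0)            # (n-1, n) finite-difference operator
    x = path.astype(float).copy()
    for _ in range(iters):
        w = 1.0 / np.sqrt(np.diff(x) ** 2 + eps)
        A = np.eye(n) + lam * D.T @ (w[:, None] * D)
        x = np.linalg.solve(A, path)
    return x
```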

16.
This paper presents a novel approach for image-based visual servoing of a robot manipulator with an eye-in-hand camera when the camera parameters are not calibrated and the 3-D coordinates of the features are not known. Both point and line features are considered. This paper extends the concept of the depth-independent interaction (or image Jacobian) matrix, developed in earlier work for visual servoing using point features and fixed cameras, to the problem using eye-in-hand cameras and point and line features. By using the depth-independent interaction matrix, it is possible to linearly parameterize, by the unknown camera parameters and the unknown coordinates of the features, the closed-loop dynamics of the system. A new algorithm is developed to estimate the unknown parameters online by combining the Slotine–Li method with the idea of structure from motion in computer vision. By minimizing the errors between the real and estimated projections of the features on multiple images captured during motion of the robot, this new adaptive algorithm can guarantee the convergence of the estimated parameters to the real values up to a scale. On the basis of the nonlinear robot dynamics, we prove asymptotic convergence of the image errors using Lyapunov theory. Experiments have been conducted to demonstrate the performance of the proposed controller.

17.
An approach based on fuzzy logic for matching both articulated and non-articulated objects across multiple non-overlapping fields of view (FoVs) from multiple cameras is proposed. We call it the fuzzy logic matching algorithm (FLMA). The approach uses information about object motion, shape and camera topology to match objects across camera views. The motion and shape information of targets is obtained by tracking them using a combination of the ConDensation and CAMShift tracking algorithms. The camera topology information is obtained by calculating the projective transformation of each view onto a common ground plane. The algorithm is suitable for tracking non-rigid objects with both linear and non-linear motion. We show videos of tracking objects across multiple cameras based on FLMA. From our experiments, the system is able to correctly match the targets across views with high accuracy.
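One standard way to obtain the camera-topology information the abstract mentions is a homography from each image plane to the shared ground plane, estimated from a few landmark correspondences. The sketch below assumes such landmark correspondences are available; it is not the paper's calibration procedure.

```python
import cv2
import numpy as np

def ground_plane_mapper(image_pts, ground_pts):
    """Homography from one camera's image plane to the common ground plane.

    image_pts:  Nx2 pixel coordinates of landmarks visible in this view.
    ground_pts: Nx2 coordinates of the same landmarks on the ground plane
                (e.g. metres in a site map). Requires N >= 4.
    """
    H, _ = cv2.findHomography(np.float32(image_pts), np.float32(ground_pts))

    def to_ground(foot_xy):
        # project a target's foot point onto the ground plane
        p = np.array([foot_xy[0], foot_xy[1], 1.0])
        q = H @ p
        return q[:2] / q[2]

    return to_ground
```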

18.
In this work we propose methods that exploit context sensor data modalities for the task of detecting interesting events and extracting high-level contextual information about the recording activity in user-generated videos. Indeed, most camera-enabled electronic devices contain various auxiliary sensors such as accelerometers, compasses, GPS receivers, etc. Data captured by these sensors during media acquisition have already been used to limit camera degradations such as shake and also to provide some basic tagging information such as the location. However, exploiting the sensor-recordings modality for subsequent higher-level information extraction, such as interesting events, has been a subject of rather limited research, further constrained to specialized acquisition setups. In this work, we show how these sensor modalities allow inferring information (camera movements, content degradations) about each individual video recording. In addition, we consider a multi-camera scenario, where multiple user-generated recordings of a common scene (e.g., music concerts) are available. For such scenarios we jointly analyze these multiple video recordings and their associated sensor modalities in order to extract higher-level semantics of the recorded media: based on the orientation of cameras we identify the region of interest of the recorded scene, and by exploiting correlation in the motion of different cameras we detect generic interesting events and estimate their relative position. Furthermore, by also analyzing the audio content captured by multiple users we detect more specific interesting events. We show that the proposed multimodal analysis methods perform well on various recordings obtained in real live music performances.

19.
With the recent popularization of mobile video cameras, including camera phones, a new technology, mobile video surveillance, which uses mobile video cameras for video surveillance, has been emerging. Such videos, however, may infringe upon the privacy of others by disclosing privacy-sensitive information (PSI), i.e., their appearances. To prevent videos from infringing on the right to privacy, new techniques are required that automatically obscure PSI regions. The problem is how to determine the PSI regions to be obscured while maintaining enough video content to convey the camera persons' capture intentions, i.e., what they want to record in their videos to achieve their surveillance tasks. To this end, we introduce a new concept called intended human objects, which are defined as human objects essential to the capture intentions, and develop a new method called intended human object detection that automatically detects the intended human objects in videos taken by different camera persons. Through the process of intended human object detection, we develop a system for automatically obscuring PSI regions. We experimentally show the performance of intended human object detection and the contributions of the features used. Our user study shows the potential applicability of our proposed system.

20.
We present a distributed system for wide-area multi-object tracking across disjoint camera views. Every camera in the system performs multi-object tracking, and keeps its own trackers and trajectories. The data from multiple features are exchanged between adjacent cameras for object matching. We employ a probabilistic Petri Net-based approach to account for the uncertainties of the vision algorithms (such as unreliable background subtraction, and tracking failure) and to incorporate the available domain knowledge. We combine appearance features of objects as well as the travel-time evidence for target matching and consistent labeling across disjoint camera views. 3D color histogram, histogram of oriented gradients, local binary patterns, object size and aspect ratio are used as the appearance features. The distribution of the travel time is modeled by a Gaussian mixture model. Multiple features are combined by the weights, which are assigned based on the reliability of the features. By incorporating the domain knowledge about the camera configurations and the information about the received packets from other cameras, certain transitions are fired in the probabilistic Petri net. The system is trained to learn different parameters of the matching process, and updated online. We first present wide-area tracking of vehicles, where we used three non-overlapping cameras. The first and the third cameras are approximately 150 m apart from each other with two intersections in the blind region. We also present an example of applying our method to a people-tracking scenario. The results show the success of the proposed method. A comparison between our work and related work is also presented.
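A small sketch of two ingredients mentioned above: fitting a Gaussian mixture to inter-camera travel times and combining the travel-time evidence with an appearance score using reliability weights. The training data, weights and appearance similarity are placeholders, not the paper's learned values, and the likelihood would normally be calibrated before mixing with a similarity score.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Placeholder travel times (seconds) between two adjacent cameras.
travel_times = np.array([12.0, 13.5, 11.8, 14.2, 30.1, 29.5, 12.6]).reshape(-1, 1)
gmm = GaussianMixture(n_components=2).fit(travel_times)

def match_score(appearance_similarity, observed_travel_time,
                w_appearance=0.6, w_time=0.4):
    """Weighted combination of appearance and travel-time evidence.

    appearance_similarity: a score in [0, 1] from the appearance features.
    The travel-time term is the GMM density at the observed travel time
    (a density, so in practice it would be normalized or calibrated).
    """
    time_likelihood = np.exp(gmm.score_samples([[observed_travel_time]]))[0]
    return w_appearance * appearance_similarity + w_time * time_likelihood

print(match_score(0.8, 12.4))
```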
