Similar Documents
 20 similar documents found (search time: 755 ms)
1.
Vision-based road-traffic monitoring sensor (total citations: 9, self: 0, others: 9)
Current techniques for road-traffic monitoring rely on sensors that have limited capabilities and are often both costly and disruptive to install. The use of video cameras (many of which are already installed to survey road networks), coupled with computer vision techniques, offers an attractive alternative to current sensors. Vision-based sensors have the potential to measure a greater variety of traffic parameters (e.g. entry/exit statistics, journey times and incident detection), while installation and maintenance may be performed without disruption to traffic flow. Work on a model-based approach for locating vehicles in images of complex road scenes is presented. The location of the vehicle in the image is transformed to the vehicle's position and orientation in the real world, while the deformable vehicle model allows the vehicle's principal dimensions to be measured. These data may be passed to a high-level tracking algorithm to extract traffic parameters such as vehicle speed, vehicle count, and junction entry/exit statistics. The principal dimensions may be used to classify the vehicle within categories such as car, van or bus. The system could also be used as a bootstrap process for faster, but perhaps less robust, tracking algorithms. The key features of the system are described, and results from testing it on images from real traffic scenes are presented.
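
The image-to-world transformation at the core of this approach can be illustrated with a planar homography: image points on the road surface map to metric road coordinates. A minimal sketch, assuming a calibration from four surveyed ground points (all coordinates below are hypothetical placeholders, not the paper's data):

```python
import numpy as np
import cv2

# Hypothetical calibration: four image points (pixels) and their known
# positions on the road plane (metres), e.g. from surveyed lane markings.
img_pts = np.float32([[420, 710], [860, 705], [610, 390], [700, 388]])
road_pts = np.float32([[0.0, 0.0], [3.5, 0.0], [0.0, 40.0], [3.5, 40.0]])

# Homography mapping the image plane to the road (ground) plane.
H, _ = cv2.findHomography(img_pts, road_pts)

def to_road(u, v):
    """Project an image point (e.g. a vehicle's base) onto the road plane."""
    p = H @ np.array([u, v, 1.0])
    return p[:2] / p[2]  # normalise homogeneous coordinates

print(to_road(640, 540))  # approximate (x, y) position in metres
```

Vehicle speed then follows from differencing successive road-plane positions over the frame interval.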

2.
Stereoscopic and high dynamic range (HDR) imaging are two methods that enhance video content by improving depth perception and light representation, respectively. A large body of research has looked into each of these technologies independently, but very little work has attempted to combine them, due to limitations in capture and display; HDR video capture (for a wide range of exposure values, over 20 f-stops) is not yet commercially available, and few prototype HDR video cameras exist. In this work we propose techniques which facilitate stereoscopic high dynamic range (SHDR) video capture using an HDR and LDR camera pair. Three methods are proposed: one generates the missing HDR frame by warping the existing one using a disparity map; another increases the range of the LDR video using a novel expansion operator; and a hybrid of the two uses expansion for pixels within the LDR range and warping for the rest. Generated videos were compared to ground-truth SHDR video captured using two HDR video cameras. Results show little overall error and demonstrate that the hybrid method produces the least error of the presented methods.
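
A per-pixel sketch of the hybrid idea, assuming the disparity-warped HDR frame is already available and substituting a simple gamma curve for the paper's expansion operator (both the threshold and the curve are illustrative):

```python
import numpy as np

def hybrid_shdr(ldr, warped_hdr, sat_thresh=0.95, gamma=2.2):
    """ldr: (H, W, 3) float image in [0, 1]; warped_hdr: HDR frame
    warped into the LDR camera's viewpoint via the disparity map."""
    expanded = ldr ** gamma  # placeholder expansion operator, not the paper's
    saturated = ldr.max(axis=-1, keepdims=True) >= sat_thresh
    # Expansion where the LDR pixel is usable, warped HDR where it clips.
    return np.where(saturated, warped_hdr, expanded)
```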

3.
The need for better integration of the new generation of computer-assisted surgical systems has recently been emphasized. One prerequisite for achieving this objective is to retrieve data from the operating room (OR) with different sensors, then to derive models from these data. Recently, the use of videos from cameras in the OR has demonstrated its efficiency. In this paper, we propose a framework to assist in the development of systems for the automatic recognition of high-level surgical tasks using microscope video analysis. We validated its use on cataract procedures. The idea is to combine state-of-the-art computer vision techniques with time-series analysis. The first step of the framework consisted of defining several visual cues for extracting semantic information, thereby characterizing each frame of the video. Five image-based classifiers were implemented for this purpose. A pupil segmentation step was also applied for dedicated visual cue detection. Time-series classification algorithms were then applied to model the time-varying data; dynamic time warping and hidden Markov models were tested. This combination draws on the advantages of all methods for a better understanding of the problem. The framework was finally validated through various studies. Six binary visual cues were chosen along with 12 phases to detect, obtaining accuracies of 94%.
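
Dynamic time warping, one of the two sequence models tested, aligns two cue sequences of different lengths. A textbook implementation over per-frame visual-cue vectors (not the paper's code):

```python
import numpy as np

def dtw(a, b):
    """DTW distance between two sequences of per-frame cue vectors."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # local frame distance
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return D[n, m]
```

A query video can then be assigned to whichever phase template yields the smallest DTW distance.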

4.
Aerial video surveillance and exploitation (total citations: 8, self: 0, others: 8)
There is growing interest in performing aerial surveillance using video cameras. Compared to traditional framing cameras, video cameras provide the capability to observe ongoing activity within a scene and to automatically control the camera to track that activity. However, the high data rates and relatively small field of view of video cameras present new technical challenges that must be overcome before such cameras can be widely used. In this paper, we present a framework, and details of the key components, for real-time automatic exploitation of aerial video for surveillance applications. The framework involves separating an aerial video into the natural components corresponding to the scene: the static background geometry, the moving objects, and the appearance of the static and dynamic components of the scene. To delineate videos into these scene components, we have developed real-time image-processing techniques for 2-D/3-D frame-to-frame alignment, change detection, camera control, and tracking of independently moving objects in cluttered scenes. The geo-location of video and tracked objects is estimated by registration of the video to controlled reference imagery, elevation maps, and site models. Finally, static, dynamic, and reprojected mosaics may be constructed for compression, enhanced visualization, and mapping applications.
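
A minimal sketch of the 2-D frame-to-frame alignment plus change-detection stage, using OpenCV's ECC alignment with an affine motion model (the paper's full system additionally handles 3-D alignment and geo-registration):

```python
import cv2
import numpy as np

def aligned_change_mask(prev_gray, cur_gray, thresh=25):
    """Warp the previous frame onto the current one to cancel camera
    motion, then difference to expose independently moving objects."""
    warp = np.eye(2, 3, dtype=np.float32)  # initial affine estimate
    _, warp = cv2.findTransformECC(prev_gray, cur_gray, warp,
                                   cv2.MOTION_AFFINE)
    h, w = cur_gray.shape
    stabilized = cv2.warpAffine(prev_gray, warp, (w, h))
    diff = cv2.absdiff(stabilized, cur_gray)
    return (diff > thresh).astype(np.uint8) * 255  # 255 = change
```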

5.
An Introduction to Distributed Smart Cameras (total citations: 2, self: 0, others: 2)
Distributed smart cameras (DSCs) are real-time distributed embedded systems that perform computer vision using multiple cameras. This new approach has emerged thanks to a confluence of simultaneous advances in four key disciplines: computer vision, image sensors, embedded computing, and sensor networks. Processing images in a network of distributed smart cameras introduces several complications. However, we believe that the problems DSCs solve are much more important than the challenges of designing and building a distributed video system. We argue that distributed smart cameras represent key components for future embedded computer vision systems and that smart cameras will become an enabling technology for many new applications. We summarize smart camera technology and applications, discuss current trends, and identify important research challenges.

6.
To overcome the dynamic range limitations of images taken with regular consumer cameras, several methods exist for creating high dynamic range (HDR) content. Current low-budget solutions apply temporal exposure bracketing, which is not applicable to dynamic scenes or HDR video. In this article, a framework is presented that utilizes two cameras to realize spatial exposure bracketing, in which the different exposures are distributed among the cameras. Such a setup allows for HDR images of dynamic scenes and HDR video, due to its frame-by-frame operating principle, but faces challenges in the stereo matching and HDR generation steps. The modules in this framework are therefore selected to alleviate these challenges and to properly handle under- and oversaturated regions. In comparison to existing work, the camera response calculation is shifted to an offline process, and masking with a saturation map before the actual HDR generation is proposed. The first aspect enables the use of more complex camera setups with different sensors and provides robust camera responses. The second ensures that only necessary pixel values are used from the additional camera view, and thus reduces errors in the final HDR image. The resulting HDR images are compared with the quality metric HDR-VDP-2, and numerical results are given for the first time. For the Middlebury test images, an average gain of 52 points on a 0-100 mean opinion score is achieved in comparison to temporal exposure bracketing with camera motion. Finally, HDR video results are provided.
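
A sketch of the saturation-map masking step: pixels from the second camera view are admitted only where the reference exposure is unusable (thresholds are illustrative, not the article's values):

```python
import numpy as np

def saturation_map(img, lo=0.02, hi=0.98):
    """Per-pixel mask over a [0, 1] image: True where the reference
    view is under- or over-exposed and must be filled from elsewhere."""
    lum = img.mean(axis=-1)
    return (lum < lo) | (lum > hi)

def masked_merge(ref_radiance, other_radiance, mask):
    """Take values from the second (differently exposed) view only
    where the saturation map flags the reference view."""
    out = ref_radiance.copy()
    out[mask] = other_radiance[mask]
    return out
```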

7.
Pressure sensors play an integral role in a wide range of applications, such as soft robotics and health monitoring. To meet this demand, many groups microengineer the active layer—the layer that deforms under pressure and dictates changes in the output signal—of capacitive, resistive/piezoresistive, piezoelectric, and triboelectric pressure sensors in order to improve sensor performance. Geometric microengineering of the active layer has been shown to improve performance parameters such as sensitivity, dynamic range, limit of detection, and response and relaxation times. A wide range of designs has been implemented, including microdomes, micropyramids, lines or microridges, papillae, microspheres, micropores, and microcylinders, each offering different advantages for a particular application. It is important to compare the techniques by which the microengineered active layers are designed and fabricated, as they may provide additional insights into compatibility and sensing-range limits. To evaluate each fabrication method, it is critical to take into account the active layer's uniformity, ease of fabrication, shape and size versatility and tunability, and the scalability of both the device and the fabrication process. By better understanding how microengineering techniques and designs compare, pressure sensors can be designed and implemented in a targeted way.
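
To make the sensitivity notion concrete, a back-of-the-envelope model for the simplest capacitive case: a parallel-plate sensor whose active layer of thickness d compresses under pressure, with C = ε0·εr·A/d (an idealization; microengineered layers are used precisely because they outperform this uniform-film behaviour):

```python
EPS0 = 8.854e-12  # vacuum permittivity, F/m

def capacitance(eps_r, area_m2, gap_m):
    """Ideal parallel-plate capacitance of a uniform active layer."""
    return EPS0 * eps_r * area_m2 / gap_m

c0 = capacitance(eps_r=3.0, area_m2=1e-4, gap_m=100e-6)  # at rest
c1 = capacitance(eps_r=3.0, area_m2=1e-4, gap_m=90e-6)   # 10 % compression
print(f"relative capacitance change: {(c1 - c0) / c0:.1%}")  # ~ +11 %
```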

8.
9.
Based on the design requirements for a general-purpose active vision system, a design for a DSP-based embedded active vision system is proposed, and the system's hardware and software are implemented. The resulting system has two degrees of freedom (pitch and pan) and supports dual CCD vision sensors and dual tilt sensors. Its hardware uses a TMS320C6711 and a TMS320F2812 to implement video processing, motion control, sensor data acquisition, and communication between the system and a PC. Experimental results show that the system offers rich computing and interface resources and flexible operation, and meets the design requirements.

10.
A network of co-operative cameras for the visual surveillance of parking lots is presented. The network employs multiple subnets able to manage static and active cameras in a hierarchical framework. The system is able to track multiple targets simultaneously and in real time throughout the controlled areas. The positions of detected objects, computed from different sensors, are fused, with a dynamic reliability factor applied to each sensor reading. Close-up recordings of suspicious events are obtained by tasking the active camera systems (ACSs). Co-operation is performed through a multicast communication system designed to transmit useful data both within and between networks. In particular, information about the position of the object to track, sent by a static camera system (SCS), is used by an ACS to perform an initial repositioning. The ACS compensates for background changes caused by camera motion, detects mobile objects in the scene, and autonomously tracks the object of interest. Tracking results are presented in the context of a video surveillance application for a parking lot.
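
A sketch of the position fusion, realized as a reliability-weighted average over per-sensor estimates (the weighting scheme and values here are illustrative stand-ins for the system's dynamic reliability factors):

```python
import numpy as np

def fuse_positions(positions, reliabilities):
    """positions: (N, 2) per-sensor estimates of one target's location;
    reliabilities: (N,) non-negative weights, updated at each reading."""
    w = np.asarray(reliabilities, dtype=float)
    w = w / w.sum()  # normalize so the weights form a convex combination
    return (np.asarray(positions, dtype=float) * w[:, None]).sum(axis=0)

# Three sensors report slightly different positions; the third is unreliable.
print(fuse_positions([[10.2, 5.1], [9.8, 5.4], [12.0, 4.0]], [0.9, 0.8, 0.2]))
```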

11.
Image-based rendering and 3D modeling: A complete framework (total citations: 1, self: 0, others: 1)
Multi-viewpoint synthesis of video data is a key technology for the integration of video and 3D graphics, as required for telepresence and augmented-reality applications. This paper describes a number of important techniques which can be employed to accomplish that goal. The techniques presented are based on the analysis of 2D images acquired by two or more cameras. To determine depth information for individual objects present in the scene, it is necessary to perform segmentation and disparity estimation; it is shown how these analysis tools can benefit from each other. For viewpoint synthesis, techniques with different trade-offs between complexity and degrees of freedom are presented. The first approach is disparity-controlled view interpolation, which is capable of generating intermediate views along the interocular axis between two adjacent cameras. The second is the recently introduced incomplete-3D technique, which first extracts the texture of the visible surface of a video object acquired with multiple cameras, and then performs disparity-compensated projection from the surface onto a view plane. In the third and most complex approach, a 3D model of the object is generated, which can be represented by a 3D wire grid. For synthesis, this model can be rotated to arbitrary orientations, and the original texture is mapped onto the surface to obtain an arbitrary view of the processed object. The result of this rendering procedure is a virtual image with a very natural appearance.
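
A sketch of the first and simplest approach, disparity-controlled view interpolation: each left-image pixel is shifted by a fraction α of its disparity (a bare forward warp, without the occlusion and hole handling a real system needs):

```python
import numpy as np

def interpolate_view(left, disparity, alpha=0.5):
    """left: (H, W, 3) image; disparity: (H, W) left-to-right disparities.
    alpha=0 reproduces the left view; alpha=1 approximates the right."""
    h, w = disparity.shape
    out = np.zeros_like(left)
    xs = np.arange(w)
    for y in range(h):
        x_new = np.clip((xs - alpha * disparity[y]).astype(int), 0, w - 1)
        out[y, x_new] = left[y, xs]  # forward warp; holes stay unfilled
    return out
```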

12.
We present a two-dimensional (2-D) mesh-based mosaic representation, consisting of an object mesh and a mosaic mesh for each frame plus a final mosaic image, for video objects with mildly deformable motion in the presence of self and/or object-to-object (external) occlusion. Unlike classical mosaic representations, where successive frames are registered using global motion models, we map the uncovered regions in successive frames onto the mosaic reference frame using local affine models, i.e., those of the neighboring mesh patches. The proposed method for computing this mosaic representation is tightly coupled with an occlusion-adaptive 2-D mesh tracking procedure, which consists of propagating the object mesh from frame to frame and updating both the object and mosaic meshes to optimize texture mapping from the mosaic to each instance of the object. The proposed representation has been applied to video object rendering and editing, including self transfiguration, synthetic transfiguration, and 2-D augmented reality in the presence of self and/or external occlusion. We also provide an algorithm to determine the minimum number of still views needed to reconstruct a replacement mosaic, which is required for synthetic transfiguration. Experimental results demonstrate both the 2-D mesh-based mosaic synthesis and two different video object editing applications on real video sequences.
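
A sketch of the local affine mapping for one triangular mesh patch, warping it from the current frame into the mosaic reference frame with OpenCV (patch coordinates would come from the tracked meshes; everything here is a placeholder):

```python
import numpy as np
import cv2

def warp_patch(frame, tri_src, tri_dst, mosaic_shape):
    """Warp one triangular mesh patch with its own affine model.
    tri_src/tri_dst: three (x, y) corners in the frame / mosaic."""
    A = cv2.getAffineTransform(np.float32(tri_src), np.float32(tri_dst))
    warped = cv2.warpAffine(frame, A, (mosaic_shape[1], mosaic_shape[0]))
    mask = np.zeros(mosaic_shape[:2], np.uint8)
    cv2.fillConvexPoly(mask, np.int32(tri_dst), 255)
    return warped, mask  # composite into the mosaic where mask is set
```

Repeating this per patch, rather than applying one global model, is what lets the mosaic follow mildly deformable motion.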

13.
Digital Video Transcoding (total citations: 14, self: 0, others: 14)
Video transcoding has become an active research topic due to its high practical value for a wide range of networked video applications. We outline the technical issues and research results related to video transcoding. We also discuss techniques for reducing complexity and for improving video quality by exploiting information extracted from the input video bit stream.
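
For orientation, the baseline such techniques improve upon is the cascaded decode-re-encode transcoder, which can be scripted with ffmpeg; a minimal bit-rate and resolution reduction, invoked from Python (file names are placeholders):

```python
import subprocess

# Simple cascaded transcoder: fully decode, downscale, re-encode.
# (The surveyed techniques cut complexity by reusing motion vectors
# and mode decisions from the input bit stream instead.)
subprocess.run([
    "ffmpeg", "-i", "input.mp4",
    "-vf", "scale=640:360",  # reduce spatial resolution
    "-b:v", "500k",          # target video bitrate
    "output.mp4",
], check=True)
```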

14.
The multi-modal multi-sensor PROMETHEUS database was created in support of research and development activities [PROMETHEUS (FP7-ICT-214901): http://www.prometheus-FP7.eu] aiming at the creation of a framework for monitoring and interpretation of human behaviors in unrestricted indoor and outdoor environments. The distinctiveness of the PROMETHEUS database comes from the unique sensor sets used in the various recording scenarios, but also from the database design, which covers a range of real-world applications related to smart-home automation and indoor/outdoor surveillance of public areas. Numerous single-person and multi-person scenarios, as well as scenarios with interactions between groups of people, motivated by these applications, were implemented with the help of skilled actors and supernumerary personnel. In these scenarios, the actors and personnel were instructed to act out a range of typical and atypical behaviors, along with simulations of emergency and crisis situations. In summary, the database contains more than 4 h of synchronized recordings from heterogeneous sensors (an infrared motion detection sensor, thermal imaging cameras, overview/surveillance video cameras, close-view video cameras, a 3D camera, a stereoscopic camera, a general-purpose camcorder, microphone arrays, and motion capture equipment) collected in common setups simulating smart-home, airport, and ATM security environments. Selected scenes of the database were annotated for the needs of human detection and tracking. The entire audio part of the database was annotated for the needs of sound event detection, sound source enumeration, emotion recognition, etc.

15.
In recent years, interest in multiview video systems has increased. In these systems, a typical predictive coding approach exploits the inter-view correlation at a joint encoder, requiring the various cameras to communicate among themselves. However, many applications call for simple sensing systems that prevent the cameras from communicating with each other, and thus preclude the adoption of a predictive coding approach. Wyner–Ziv (WZ) video coding is a promising solution for those applications, since it is the WZ decoder's task to (fully or partly) exploit the video redundancy. The rate-distortion (RD) performance of WZ video coding strongly depends on the quality of the so-called side information (SI), which is a decoder estimate of the original frame to be coded. In multiview WZ (MV-WZ) video coding, the target is to best exploit the available correlation not only in time, as in the monoview case, but also between views. Thus, the multiview SI results from the fusion of a temporally created SI and an inter-view created SI. In this context, the main objective of this paper is to propose a classification taxonomy that organizes the many inter-view SI creation and SI fusion techniques available in the literature, and to review the most relevant techniques in each class. The inter-view SI creation techniques are classified into two classes, namely matching-based and scene-geometry-based, while the SI fusion techniques are classified into three classes, namely time-driven, view-driven, and time-view-driven. After reviewing the most relevant inter-view SI creation and SI fusion techniques guided by the proposed classification taxonomy, conclusions are drawn about the current status quo, making it possible to better identify the next research challenges in the multiview WZ video coding paradigm.
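
A pixel-level sketch of one plausible SI fusion rule of the kind the taxonomy covers: keep, per pixel, whichever of the temporal and inter-view estimates agrees better with a decoded reference (using such a reference as the error proxy is this sketch's assumption, not a technique from the paper):

```python
import numpy as np

def fuse_side_information(si_temporal, si_interview, reference):
    """Per-pixel fusion: keep the SI estimate closer to a decoded
    reference frame, used here as a hypothetical error proxy."""
    err_t = np.abs(si_temporal.astype(float) - reference)
    err_v = np.abs(si_interview.astype(float) - reference)
    return np.where(err_t <= err_v, si_temporal, si_interview)
```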

16.
The importance of video surveillance techniques has increased considerably since the latest terrorist incidents. Safety and security have become critical in many public areas, and there is a specific need to enable human operators to remotely monitor activity across large environments. For these reasons, multicamera systems are needed to provide surveillance coverage across a wide area, ensuring object visibility over a large range of depths. In the development of advanced vision-based surveillance systems, a number of key issues critical to successful operation must be addressed. This article describes the low-level image and video processing techniques needed to implement a modern surveillance system. In particular, change detection methods for both fixed and mobile (pan-tilt) cameras are introduced, and registration methods for multicamera systems with overlapping and nonoverlapping views are discussed.
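
For the fixed-camera case, a standard change-detection baseline is background subtraction; a minimal version using OpenCV's MOG2 subtractor (a common stand-in for the class of methods the article introduces; the input file is a placeholder):

```python
import cv2

cap = cv2.VideoCapture("surveillance.mp4")  # placeholder video source
subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=True)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)  # 255 = foreground, 127 = shadow
    cv2.imshow("changes", mask)
    if cv2.waitKey(1) == 27:        # press Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
```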

17.
Generating face models of humans from video sequences is an important problem in many multimedia applications, ranging from teleconferencing to virtual reality. Most practical approaches try to fit a generic face model to the two-dimensional image and adjust the model parameters to arrive at the final answer. These approaches require the identification of specific landmarks on the face, and this identification routine may or may not be automated. In this paper, we present a method for deriving a three-dimensional (3-D) face model from a monocular image sequence, using a few standard results from the affine camera geometry literature in computer vision and spline-fitting techniques adopted from the nonparametric regression literature in statistics. No prior knowledge of the camera calibration parameters or the shape of the face is required by the system, and the entire process requires no user intervention. The system has been successfully demonstrated to extract the 3-D face structure of humans in several image sequences.
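
The affine-camera results referred to here include rank-3 factorization of tracked feature trajectories (Tomasi–Kanade style); a sketch, assuming the landmark tracks are already given:

```python
import numpy as np

def affine_factorization(W):
    """W: (2F, P) measurement matrix stacking the (x, y) image
    coordinates of P tracked points over F frames.
    Returns affine motion (2F, 3) and 3-D shape (3, P),
    recovered up to an affine ambiguity."""
    W0 = W - W.mean(axis=1, keepdims=True)  # remove per-frame translation
    U, s, Vt = np.linalg.svd(W0, full_matrices=False)
    M = U[:, :3] * np.sqrt(s[:3])           # camera (motion) factor
    S = np.sqrt(s[:3])[:, None] * Vt[:3]    # shape factor
    return M, S
```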

18.
Interactive 3-D Video Representation and Coding Technologies (total citations: 5, self: 0, others: 5)
Interactivity, in the sense of being able to explore and navigate audio-visual scenes by freely choosing viewpoint and viewing direction, is a key feature of new and emerging audio-visual media. This paper gives an overview of suitable technology for such applications, with a focus on international standards, which benefit consumers, service providers, and manufacturers. We first give a general classification and overview of interactive scene representation formats as commonly used in the computer graphics literature. Then, we describe popular standard formats for interactive three-dimensional (3-D) scene representation and the creation of virtual environments: the virtual reality modeling language (VRML) and the MPEG-4 BInary Format for Scenes (BIFS), with some examples. Recent extensions to MPEG-4 BIFS, the Animation Framework eXtension (AFX), providing advanced computer graphics tools, are explained and illustrated. New technologies mainly targeted at the reconstruction, modeling, and representation of dynamic real-world scenes are studied further. The user shall be able to navigate photorealistic scenes within certain restrictions, which can be roughly defined as 3-D video. Omnidirectional video extends the planar two-dimensional (2-D) image plane to a spherical or cylindrical image plane; any 2-D view in any direction can be rendered from the overall recording to give the user the impression of looking around. In interactive stereo, two views, one for each eye, are synthesized to provide the user with an adequate depth cue of the observed scene. Head-motion-parallax viewing can be supported within a certain operating range if sufficient depth or disparity data are delivered with the video data. In free viewpoint video, a dynamic scene is captured by a number of cameras, and the input data are transformed into a special data representation that enables interactive navigation through the dynamic scene environment.
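
The omnidirectional case amounts to indexing a spherical image plane: a sketch mapping a viewing direction (yaw and pitch in radians) to the nearest pixel of an equirectangular panorama (nearest-neighbour lookup only; a real renderer would interpolate a full perspective view):

```python
import numpy as np

def sample_direction(equirect, yaw, pitch):
    """equirect: (H, W, 3) panorama; yaw in [-pi, pi], pitch in
    [-pi/2, pi/2]. Returns the pixel seen in that direction."""
    h, w = equirect.shape[:2]
    u = (yaw / (2 * np.pi) + 0.5) * (w - 1)  # longitude -> column
    v = (0.5 - pitch / np.pi) * (h - 1)      # latitude  -> row
    return equirect[int(round(v)), int(round(u))]
```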

19.
The Michelangelo Project at the University of Glasgow has developed an experimental three-dimensional television studio. It uses 24 video cameras and parallel computers to capture moving three-dimensional models of human actors, allowing the appearance and three-dimensional positions of a human actor to be captured in real time. It does this using stereo imaging techniques that have been under development at the University of Glasgow for several years. The development of the studio has thrown up many technical problems that are still to be fully resolved; nonetheless, it is already producing convincing animated sequences.

20.
Traditionally, video has been either part of the environment (such as video surveillance cameras mounted on or inside a building, or video conferencing systems based on fixed cameras within a special room) or the domain of large organizations such as broadcast television stations. However, a new field of research called "personal imaging" has emerged. Personal imaging systems are based on wireless video technology and are typically characterized by video from a first-person perspective, by way of a head-mounted camera and display together with an image-processing computer worn on the body of the user. The possibilities afforded by personal imaging include a personal safety device for crime reduction, a new kind of video conferencing system for computer-supported collaboration, and a new tool for photojournalism. This article describes work in personal imaging as it has evolved over the past 20 years, and then sets forth a future vision for wireless video in a head-mounted context. Most notably, the notion of computer-supported collaborative wireless video is presented.

