Similar documents: 20 results found (search time: 781 ms)
1.
Recognition of text captured in multiple frames by a hand-held video camera is an attractive yet challenging task: the redundancy of the overlapping areas between frames can be exploited both to recognize a longer line of text and to improve the quality of the text image. For this task, the video frames must be registered, i.e., mosaiced, after compensating for the distortions caused by camera shake. In this paper, a mosaicing-by-recognition technique is proposed in which video mosaicing and text recognition are formulated as a unified optimization problem and solved simultaneously and collaboratively by a dynamic-programming-based optimization algorithm. Experimental results indicate that, even when the frames undergo various distortions such as rotation, scaling, translation, and nonlinear fluctuation of camera speed, the proposed technique produces a fine mosaic image with accurate distortion estimation (around 90% perfect estimation) and high character recognition accuracy (over 95%).
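The paper's dynamic program jointly estimates frame distortions and character classes, which is far richer than any toy example. As a minimal, self-contained illustration of the underlying DP alignment machinery only, a classic dynamic-time-warping alignment of two 1-D intensity profiles from overlapping frames might look like this (function and variable names are illustrative assumptions, not the paper's):

```python
def dtw(a, b):
    # Dynamic-programming alignment cost between two 1-D profiles,
    # e.g. column-intensity profiles of two overlapping text frames.
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch a
                                 cost[i][j - 1],      # stretch b
                                 cost[i - 1][j - 1])  # match step
    return cost[n][m]
```

The same table-filling pattern, with a richer state (distortion parameters, character hypotheses), is what makes a unified mosaicing-and-recognition objective tractable.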

2.
Based on the OpenCV image-processing library, an infrared and visible-light image fusion system was developed on the VS2013 platform. The method overcomes the weakness that feature points are not salient in infrared images: through a special camera-calibration technique, the infrared and visible-light cameras are calibrated, and matching and fusion of the infrared and visible images are then achieved. Experiments show that the system attains good fusion quality while maintaining real-time performance.

3.
Face recognition application based on OpenCV (cited by 1: 1 self-citation, 0 external)
A face recognition system was developed on the Linux platform: the user interface is built with Qt, and the OpenCV image-processing library is called to capture and process camera images, implementing face detection, identity recognition, and simple facial-expression recognition.

4.
Multi-view image sensing is currently gaining momentum, fostered by new applications such as autonomous vehicles and self-propelled robots. In this paper, we prototype and evaluate a multi-view smart vision system for object recognition. The system exploits an optimized Multi-View Convolutional Neural Network (MVCNN) in which the processing is distributed among several sensors (heads) and a camera body. The main challenge in designing such a system comes from the computationally expensive workload of real-time MVCNNs, which is difficult to support with embedded processing at high frame rates. This paper focuses on the decisions to be taken when distributing an MVCNN over the camera heads, each head embedding a Field-Programmable Gate Array (FPGA) for processing images on the stream. In particular, we show that the first layer of the AlexNet network can be processed nearest to the sensors by performing a Direct Hardware Mapping (DHM) using a dataflow model of computation. The feature maps produced by the first layers are merged and processed by a central camera processing node that executes the remaining layers. The proposed system exploits state-of-the-art deep-learning optimization methods such as parameter removal and data quantization. We demonstrate that the accuracy drops caused by these optimizations can be compensated by the multi-view nature of the captured information. Experimental results conducted with the AlexNet CNN show that the proposed partitioning and resulting optimizations can fit the first layer of the multi-view network in low-end FPGAs. Among all the tested configurations, we propose two setups with accuracy equivalent to the original network on the ModelNet40 dataset: the first is composed of 4 cameras based on a Cyclone III E120 FPGA, the least expensive version in terms of logic resources, while the second requires 2 cameras based on a Cyclone 10 GX220 FPGA. This distributed computation with workload reduction is shown to be a practical solution for building a real-time multi-view smart camera that processes several frames per second.
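The head/body split described above can be sketched in miniature: a single-channel valid-mode convolution stands in for the FPGA-mapped first layer on each head, and an element-wise max stands in for MVCNN-style view pooling at the camera body. All function names are illustrative assumptions, not part of the paper's design:

```python
def head_conv(image, kernel):
    # First-layer convolution executed on a camera head (valid mode,
    # single channel) - a toy stand-in for the FPGA-mapped conv1.
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(len(image) - kh + 1):
        row = []
        for x in range(len(image[0]) - kw + 1):
            row.append(sum(image[y + i][x + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out

def body_merge(feature_maps):
    # The camera body merges per-view feature maps before running the
    # remaining layers (element-wise max view-pooling, as in MVCNN).
    return [[max(fm[y][x] for fm in feature_maps)
             for x in range(len(feature_maps[0][0]))]
            for y in range(len(feature_maps[0]))]
```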

5.
To address choppy video display on current embedded development platforms, this paper proposes a scheme for capturing face images with a USB camera. The scheme is based on the Samsung S5PV210 processor and embedded Linux, with Qt and OpenCV ported to the platform. The paper describes the process of capturing video images through OpenCV with the USB camera as the acquisition device, and of displaying face images in real time on an LCD screen through Qt. Tests show that the images display smoothly on the LCD, providing a practical scheme for smooth real-time video display on embedded platforms.

6.
A face recognition system for a robot is designed. The system first captures face images with a camera, performs face recognition on a computer, and finally sends commands to the robot. Recognition computes the Euclidean distance between the 128-dimensional feature vector extracted from the captured face and the feature vectors stored in the database, and makes its decision from that distance. The platform and hardware requirements are modest, the recognition works well, and the structure is simple, so the system has a wide range of applications.
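The Euclidean-distance matching rule described above can be sketched as follows. The 0.6 acceptance threshold is a common heuristic for 128-D face embeddings, not a value given in the abstract, and the function names are illustrative:

```python
import math

def euclidean(u, v):
    # Euclidean distance between two 128-D face embeddings
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def identify(query, gallery, threshold=0.6):
    # gallery: {name: embedding}. Return the closest identity if its
    # distance is under the threshold, else None (unknown face).
    name, emb = min(gallery.items(), key=lambda kv: euclidean(query, kv[1]))
    return name if euclidean(query, emb) < threshold else None
```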

7.
Emotion-aware mobile applications keep increasing thanks to their intelligence and user acceptance. Because the processing power of mobile devices is limited, emotion recognition algorithms on them must be real-time and efficient. This paper proposes an emotion recognition method for mobile applications with high accuracy and low computational complexity. In the method, face video is captured by a smartphone camera, representative frames are extracted from the video, and a face-detection module extracts the face region from these frames. The face region is processed by the Bandlet transform, and the resulting sub-bands are divided into non-overlapping sub-blocks. A local-binary-pattern histogram is computed for each block, and the histograms of all blocks are concatenated as the feature set describing the face image. The Kruskal-Wallis test selects the most discriminative features from this set, which are fed into a Gaussian mixture model classifier for emotion recognition. Experimental results show that the method achieves high recognition accuracy within a reasonable time.
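The block-wise LBP histogram step can be sketched as below: an 8-neighbour binary code per pixel, then a 256-bin histogram over a block. This is only the standard LBP building block, not the paper's full Bandlet + Kruskal-Wallis + GMM pipeline:

```python
def lbp_code(img, y, x):
    # 8-neighbour local binary pattern code for interior pixel (y, x):
    # each neighbour >= the centre contributes one bit.
    c = img[y][x]
    nbrs = [img[y - 1][x - 1], img[y - 1][x], img[y - 1][x + 1],
            img[y][x + 1], img[y + 1][x + 1], img[y + 1][x],
            img[y + 1][x - 1], img[y][x - 1]]
    return sum((1 << i) for i, p in enumerate(nbrs) if p >= c)

def lbp_histogram(block):
    # 256-bin histogram of LBP codes over the interior of one sub-block;
    # histograms of all sub-blocks are concatenated into the feature set.
    h = [0] * 256
    for y in range(1, len(block) - 1):
        for x in range(1, len(block[0]) - 1):
            h[lbp_code(block, y, x)] += 1
    return h
```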

8.
Image geo-tagging has drawn a great deal of attention in recent years. The geographic information associated with images can be used to promote potential applications such as location recognition or virtual navigation. In this paper, we propose a novel approach for accurate mobile image geo-tagging in urban areas. The approach is able to provide a comprehensive set of geo-context information based on the current image, including the real location of the camera and the viewing angle, as well as the location of the captured scene. Moreover, the parsed building facades and their geometric structures can also be estimated. First, for the image to be geo-tagged, we perform partial duplicate image retrieval to filter crowd-sourced images capturing the same scene. We then employ the structure-from-motion technique to reconstruct a sparse 3D point cloud of the scene. Meanwhile, the geometric structure of the query image is analyzed to extract building facades. Finally, by combining the reconstructed 3D scene model and the extracted structure information, we can register the camera location and viewing direction to a real-world map. The captured building location and facade orientation are also aligned. The effectiveness of the proposed system is demonstrated by experiment results.  相似文献   

9.
李知菲, 陈源. 《计算机应用》 (Journal of Computer Applications), 2014, 34(8): 2231-2234
Depth images captured by the Kinect sensor typically suffer from noise and black holes, which degrades systems such as human-motion tracking and recognition that use them directly. A depth-image filtering algorithm based on the joint bilateral filter is proposed. Following the joint bilateral filtering principle, the algorithm takes as input the depth image and the color image captured by the Kinect at the same instant. First, a Gaussian kernel computes the spatial-distance weight for the depth image and the intensity weight for the RGB color image; these two weights are multiplied to obtain the joint filtering weight, and a fast Gauss transform replaces the Gaussian kernel to build the joint bilateral filter. Finally, the filter's response is convolved with the noisy image to filter the Kinect depth image. Experiments show that, applied in a human-motion recognition and tracking system, the algorithm significantly improves noise robustness in scenes with complex backgrounds, raising recognition accuracy by 17.3%, while its average runtime of 371 ms is far lower than that of comparable algorithms. The algorithm retains the edge-preserving smoothing of joint bilateral filtering, and because the color image serves as the guide image, it repairs black holes while denoising; its denoising and inpainting of Kinect depth images therefore outperform the classical bilateral filter and the joint bilateral filter, with strong real-time performance.
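The joint weighting scheme, a spatial Gaussian on pixel distance multiplied by a range Gaussian on the registered color (guide) image, can be sketched for a single depth pixel. The sigma values are illustrative defaults; the paper's fast Gauss transform and convolution formulation are not reproduced here:

```python
import math

def joint_bilateral_pixel(depth, gray, y, x, radius=2,
                          sigma_s=2.0, sigma_r=10.0):
    # Filter depth[y][x]: spatial weight from pixel distance, range
    # weight from the registered grayscale guide image (joint bilateral).
    num = den = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ny, nx = y + dy, x + dx
            if not (0 <= ny < len(depth) and 0 <= nx < len(depth[0])):
                continue
            ws = math.exp(-(dy * dy + dx * dx) / (2 * sigma_s ** 2))
            wr = math.exp(-((gray[ny][nx] - gray[y][x]) ** 2)
                          / (2 * sigma_r ** 2))
            # Depth "holes" (value 0) contribute nothing, so valid
            # neighbours can fill them in - the inpainting effect.
            if depth[ny][nx] > 0:
                num += ws * wr * depth[ny][nx]
                den += ws * wr
    return num / den if den > 0 else 0.0
```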

10.
To address shaky video shot with hand-held mobile devices, a video stabilization algorithm based on feature tracking and mesh-path motion is proposed. SIFT extracts feature points from the video frames, the KLT algorithm tracks them, and RANSAC estimates the affine transformation between adjacent frames. Each frame is divided into a uniform mesh, the camera motion trajectories are computed, and multiple mesh paths are smoothed by minimizing an energy function. Finally, from the relation between the original and smoothed camera paths, a compensation matrix between adjacent frames is computed and applied as a geometric transform to each frame, yielding a stabilized video. Experiments show good results on shaky hand-held video: the average PSNR of the stabilized video is about 11.2 dB higher than that of the original shaky video, and about 2.3 dB higher than the bundled-camera-paths method; the average inter-frame structural similarity (SSIM) improves by about 59%, roughly 3.3% better than the bundled-camera-paths method.
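The path-smoothing and compensation idea can be sketched on a single 1-D trajectory (e.g. per-frame x-translation). The paper optimizes mesh-grid paths with an energy function; this moving-average filter is only a minimal stand-in for that smoothing step:

```python
def smooth_path(path, radius=2):
    # Moving-average smoothing of a 1-D camera trajectory.
    out = []
    for i in range(len(path)):
        lo, hi = max(0, i - radius), min(len(path), i + radius + 1)
        out.append(sum(path[lo:hi]) / (hi - lo))
    return out

def compensation(path, radius=2):
    # Per-frame offset that warps each frame from the shaky path
    # onto the smoothed one (the scalar analogue of the paper's
    # compensation matrix between adjacent frames).
    return [s - p for p, s in zip(path, smooth_path(path, radius))]
```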

11.
This paper deals with the problem of locating a rigid object and estimating its motion in three dimensions. This involves determining the position and orientation of the object at each instant when an image is captured by a camera, and recovering the motion of the object between consecutive frames. In the implementation scheme used here, a sequence of camera images, digitized at the sample instants, is used as the initial input data. Measurements are made of the locations of certain features (e.g., maximum-curvature points of an image contour, corners, edges, etc.) in the 2-D images. To measure the feature locations, a matching algorithm is used, which produces correspondences between the features in the image and the object. Using the measured feature locations in the image, an algorithm is developed to solve the location and motion problem. The algorithm is an extended Kalman filter modeled for this application. (Department of Electrical Engineering and Alberta Center for Machine Intelligence and Robotics, University of Alberta)
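The paper's filter is an *extended* Kalman filter over 3-D pose; as a minimal illustration of just the predict/update cycle at its core, a scalar Kalman filter tracking a constant value from noisy measurements (the noise parameters q and r are illustrative):

```python
def kalman_1d(measurements, q=1e-3, r=0.1):
    # Scalar Kalman filter: state x, variance p, process noise q,
    # measurement noise r. The EKF generalizes this by linearizing
    # nonlinear motion/measurement models at each step.
    x, p = measurements[0], 1.0
    estimates = [x]
    for z in measurements[1:]:
        p = p + q                 # predict: variance grows
        k = p / (p + r)           # Kalman gain
        x = x + k * (z - x)       # update with innovation (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return estimates
```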

12.
This paper describes a novel real-time multi-spectral imaging capability for surveillance applications. The capability combines a new high-performance multi-spectral camera system with a distributed algorithm that computes a spectral-screening principal component transform (PCT). The camera system uses a novel filter-wheel design together with a high-bandwidth CCD camera to allow image cubes to be delivered at 110 frames/s with a spectral coverage between 400 and 1000 nm. The filters used in a particular application are selected to highlight a particular object based on its spectral signature. The distributed algorithm allows image streams from a dispersed collection of cameras to be disseminated, viewed, and interpreted by a distributed group of analysts in real time. It operates on networks of commercial-off-the-shelf multiprocessors connected with high-performance (e.g. gigabit) networking, taking advantage of multi-threading where appropriate. The algorithm uses a concurrent formulation of the PCT to de-correlate and compress a multi-spectral image cube. Spectral screening is used to give features that occur infrequently (e.g. mechanized vehicles in a forest) equal importance to those that occur frequently (e.g. trees in the forest). A human-centered color-mapping scheme is used to maximize the impact of spectral contrast on the human visual system. To demonstrate the efficacy of the multi-spectral system, plant-life scenes with both real and artificial foliage are used. These scenes demonstrate the system's ability to distinguish elements of a scene that cannot be distinguished with the naked eye. The capability is evaluated in terms of visual performance, scalability, and real-time throughput. Our previous work on predictive analytical modeling is extended to answer practical design questions such as 'For a specified cost, what system can be constructed and what performance will it attain?' Copyright © 2001 John Wiley & Sons, Ltd.
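The decorrelation step of a PCT can be sketched on a 2-band "image cube" using the closed-form eigen-decomposition of a 2x2 covariance matrix. Real systems (and the paper) handle many bands with a numerical eigensolver and add spectral screening; this shows only the core transform:

```python
import math

def pct_2band(band_a, band_b):
    # Rotate two spectral bands onto their principal axes so that the
    # outputs are decorrelated (PC1 carries most of the variance).
    n = len(band_a)
    ma, mb = sum(band_a) / n, sum(band_b) / n
    caa = sum((a - ma) ** 2 for a in band_a) / n
    cbb = sum((b - mb) ** 2 for b in band_b) / n
    cab = sum((a - ma) * (b - mb) for a, b in zip(band_a, band_b)) / n
    # principal-direction angle of the 2x2 covariance matrix
    theta = 0.5 * math.atan2(2 * cab, caa - cbb)
    c, s = math.cos(theta), math.sin(theta)
    pc1 = [c * (a - ma) + s * (b - mb) for a, b in zip(band_a, band_b)]
    pc2 = [-s * (a - ma) + c * (b - mb) for a, b in zip(band_a, band_b)]
    return pc1, pc2
```

For perfectly correlated bands, all the signal collapses into PC1 and PC2 is (numerically) zero, which is what makes the transform useful for compression.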

13.
宋雅丽, 唐晓晟. 《计算机应用》 (Journal of Computer Applications), 2007, 27(6): 1542-1544
Home gateway technology based on the Open Service Gateway initiative (OSGi) can provide users with concrete services such as smart-home management and remote monitoring, while Web Service technology lets any application system dynamically access those services from anywhere. Most current smart-home solutions are based on home gateway technology alone and do not incorporate Web Services. This paper proposes the design and implementation of a smart-home system that combines OSGi home gateway technology with Web Service technology, taking full advantage of both. After a detailed design of each functional module, implementation results are presented in which a PDA invokes Web services on the OSGi home gateway to monitor the state of household appliances.

14.
3D surface reconstruction and motion modeling have been integrated into several industrial applications. Using a pan-tilt-zoom (PTZ) camera, we present an efficient method called dynamic 3D reconstruction (D3DR) for recovering the 3D motion and structure of a freely moving target. The proposed method estimates the PTZ measurements needed to keep the target at the center of the camera's field of view (FoV) at a constant size. A feature extraction and tracking approach is used in the imaging framework to estimate the target's translation, position, and distance. A selection strategy picks keyframes that show significant changes in target movement and directly updates the recovered 3D information. The proposed D3DR method is designed to work in a real-time environment, not requiring all captured frames to be used to update the recovered 3D motion and structure of the target; using fewer frames minimizes the time and space complexity required. Experimental results conducted on real-time video streams with different targets prove the efficiency of the proposed method. D3DR has been compared to existing offline and online 3D reconstruction methods, showing that it uses less execution time than the offline method while using on average 49.6% of the total number of frames captured.
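The keyframe-selection idea, updating the model only when the target has moved significantly, can be sketched on a 1-D position track. The threshold and function name are illustrative assumptions:

```python
def select_keyframes(positions, min_shift=1.0):
    # Keep a frame only when the tracked target has moved more than
    # min_shift since the last keyframe; all other frames are skipped,
    # reducing the time and space cost of updating the 3-D model.
    keys = [0]
    for i in range(1, len(positions)):
        if abs(positions[i] - positions[keys[-1]]) > min_shift:
            keys.append(i)
    return keys
```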

15.
This paper introduces a method, implemented in the VC++ 6.0 programming environment using GDI+ (the upgraded version of Microsoft's Graphics Device Interface), for converting instrument and meter images captured with a USB camera into BMP-format files. Experiments show that the method is efficient to program and convenient for conversion, laying the foundation for subsequent processing in an automatic meter-image recognition system.

16.
Intelligent emotion sensing for children with autism has become an important research direction. This paper studies an emotion recognition data-analysis system developed on OpenCV and Face++. The system extracts video frames at a specified frame interval, classifies the images with machine-learning algorithms, performs emotion recognition and classification statistics on the images, and finally outputs the analysis data as an Excel spreadsheet.

17.
In this paper, we present a novel approach for constructing a large-scale-range panoramic background model that provides fast registration of the observed frame and localizes foreground targets under arbitrary camera direction and scale in a pan-tilt-zoom (PTZ) camera-based surveillance system. Our method consists of three stages. (1) In the first stage, a panoramic Gaussian mixture model (PGMM) of the PTZ camera's field of view is generated off-line for later use in on-line foreground detection. (2) In the second stage, a multi-layered correspondence ensemble is generated off-line from frames captured at different scales, which the correspondence-propagation method uses to register observed frames online to the PGMM. (3) In the third stage, foreground is detected and the PGMM is updated. The proposed method can handle the PTZ camera's wide field of view (FOV) and large scale range. We demonstrate the advantages of the proposed PGMM background-subtraction method by incorporating it into a tracking system for surveillance applications.
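The per-pixel background update at the heart of such a model can be sketched with a single Gaussian mode (the paper maintains a full Gaussian *mixture* over a panorama; alpha and k are illustrative defaults):

```python
def update_background(mean, var, pixel, alpha=0.05, k=2.5):
    # One background mode for one pixel: flag the pixel as foreground
    # if it lies more than k standard deviations from the mean; only
    # background-matching pixels are absorbed into the model.
    is_fg = abs(pixel - mean) > k * (var ** 0.5)
    if not is_fg:
        mean = (1 - alpha) * mean + alpha * pixel
        var = (1 - alpha) * var + alpha * (pixel - mean) ** 2
    return mean, var, is_fg
```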

18.
In this study, the design and implementation of a multi-sensor brain-computer interface for disabled and/or elderly people is proposed. The developed system consists of a wheelchair, a high-power motor controller card, a Kinect camera, electromyogram (EMG) and electroencephalogram (EEG) sensors, and a computer. The Kinect sensor is installed to provide safe navigation: depth frames captured by the Kinect's infra-red (IR) camera are processed with a custom image-processing algorithm to detect obstacles around the wheelchair. A consumer-grade EMG device (Thalmic Labs) was used to obtain eight channels of EMG data. Four different hand movements (fist, release, waving hand left, and waving hand right) are used for EMG-based control of the robotic wheelchair. EMG data is first classified using artificial neural network (ANN), support vector machine, and random forest schemes; the class is then decided by a rule-based scheme constructed on the individual outputs of the three classifiers. EEG-based control is adopted as an alternative controller for the wheelchair. A wireless 14-channel EEG sensor (Emotiv Epoch) is used to acquire real-time EEG data. Three different cognitive tasks (relaxing, math problem solving, and text reading) are defined for EEG-based control, and subjects were asked to accomplish the relevant cognitive task to control the wheelchair. During experiments, all subjects were able to control the robotic wheelchair by hand movements and track a pre-determined route with reasonable accuracy. The results for EEG-based control of the robotic wheelchair are promising, though they vary depending on user experience.
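The abstract does not specify the fusion rules, so the sketch below assumes a majority vote over the three classifier outputs with ties broken by a fixed classifier priority; the priority order and function names are illustrative assumptions:

```python
from collections import Counter

def fuse(predictions, priority=("ann", "svm", "rf")):
    # predictions: {classifier_name: predicted_class}.
    # Majority vote; on a tie, trust classifiers in priority order.
    votes = Counter(predictions.values())
    top = votes.most_common()
    if len(top) == 1 or top[0][1] > top[1][1]:
        return top[0][0]          # clear majority
    for clf in priority:          # tie-break by classifier priority
        if votes[predictions[clf]] == top[0][1]:
            return predictions[clf]
```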

19.
Moving objects in a scene degrade the localization accuracy and robustness of visual simultaneous localization and mapping (SLAM) systems. To address this, a semantic-information-based visual SLAM algorithm for dynamic environments is proposed. First, the traditional visual-SLAM front end is combined with the YOLOv4 object detector: while ORB features are extracted from the input image, the image is also semantically segmented. Then, the object class is used to obtain the regions of dynamic targets in the image, and feature points lying on dynamic objects are removed. Finally, the remaining feature points are matched against adjacent frames to solve for the camera pose. Tests on the TUM dataset show that, compared with ORB-SLAM2, the proposed algorithm improves pose-estimation accuracy by 96.78% in highly dynamic environments, while its tracking thread takes 0.0655 s per frame on average, the shortest among the compared SLAM algorithms for dynamic environments. The results show that the algorithm achieves accurate real-time localization and mapping in dynamic environments.
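The key filtering step, discarding feature points that fall inside detected dynamic-object regions before pose estimation, can be sketched as below (axis-aligned boxes stand in for the detector output; names are illustrative):

```python
def filter_static_keypoints(keypoints, dynamic_boxes):
    # Keep only keypoints outside every dynamic-object bounding box
    # (x1, y1, x2, y2), so pose is solved from static-scene features.
    def inside(pt, box):
        x, y = pt
        x1, y1, x2, y2 = box
        return x1 <= x <= x2 and y1 <= y <= y2
    return [pt for pt in keypoints
            if not any(inside(pt, box) for box in dynamic_boxes)]
```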

20.
Conventional iris recognition requires a high-resolution camera equipped with a zoom lens and a near-infrared illuminator to observe iris patterns. Moreover, with a zoom lens the viewing angle is small, restricting the user's head movement. To address these limitations, periocular recognition has recently been studied as a biometric. Because the larger area surrounding the eye is used instead of the iris region alone, a camera with a high-resolution sensor and zoom lens is unnecessary; the eye image can be captured with a wide-viewing-angle camera, relaxing the constraints on the user's head movement during image acquisition. Previous periocular recognition methods extract features in Cartesian coordinates, which are sensitive to rotation (roll) of the eye region caused by in-plane head rotation, degrading matching accuracy. We therefore propose a novel periocular recognition method based on polar coordinates that is robust to eye rotation (roll). Experimental results on the open CASIA-Iris-Distance database (CASIA-IrisV4) show that the proposed method outperforms the others.
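The reason polar coordinates help can be shown in a few lines: under a Cartesian-to-polar mapping about the eye center, an in-plane rotation (roll) becomes a pure shift along the angle axis, which feature matching tolerates far better. A minimal sketch (the paper's actual feature extraction is not reproduced):

```python
import math

def to_polar(pt, center):
    # Map an image point to (radius, angle) about the eye center.
    # Rotating the eye by an angle leaves the radius unchanged and
    # only shifts the angle coordinate.
    x, y = pt[0] - center[0], pt[1] - center[1]
    return math.hypot(x, y), math.atan2(y, x)
```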
