Similar Articles

20 similar articles found.
1.

Underwater object detection is an essential step in image processing and plays a vital role in applications such as the repair and maintenance of sub-aquatic structures and the marine sciences. Many computer vision-based solutions have been proposed, but an optimal solution for underwater object detection and species classification does not yet exist, mainly because of the challenges presented by the underwater environment, chief among them light scattering and light absorption. The advent of deep learning has enabled researchers to tackle problems such as protection of the subaquatic ecological environment, emergency rescue, underwater disaster prevention, and underwater target detection, tracking, and recognition. However, the advantages and shortcomings of these deep learning algorithms remain unclear. Thus, to give a clearer view of underwater object detection algorithms and their pros and cons, we offer a state-of-the-art review of the computer vision-based approaches developed to date. In addition, various state-of-the-art schemes are compared on objective indices, and future research directions in underwater object detection are suggested.


2.
Objective: For the pose estimation of man-made underwater targets, a target localization algorithm based on image line features and point-cloud surface features is proposed. Method: Based on knowledge of the edge features of imaged man-made objects and of their surface geometry, the target is described as a combination of line features and surface features. First, line-feature detection is performed on the target image edges according to a specified line type, roughly locating the target in the image. Then, the RANSAC (random sample consensus) algorithm is applied to the point cloud projected into the target region to detect surface features, yielding approximate target parameters and extracting the target point cloud from the field-of-view point cloud. Finally, using a superquadric surface as a part-based model of the target and the detected parameters as initial values, a nonlinear objective function for 3D target size and pose estimation is built, and its optimization result is taken as the 3D localization result. Results: Underwater experiments validate the algorithm: the rotation-axis angular deviation after localization is within 2°, the relative position deviation is within 1%, and localizing a single target takes less than 5 s. Conclusion: The experimental results show that the localization accuracy and runtime meet application needs; the algorithm can effectively localize man-made targets of unknown size and adapts well to complex underwater environments.
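The RANSAC surface-detection step can be sketched in a few lines. The paper fits quadric surfaces to the projected point cloud; the toy below fits a plane instead, but the sample-score-keep-best loop is the same. All function names, thresholds, and data here are illustrative, not taken from the paper.

```python
import numpy as np

def ransac_plane(points, n_iters=200, threshold=0.02, rng=None):
    """Fit a plane n.x + d = 0 to a point cloud with RANSAC.

    Returns (normal, d, inlier_mask) of the highest-scoring model."""
    rng = np.random.default_rng(rng)
    best_inliers = np.zeros(len(points), dtype=bool)
    best_model = None
    for _ in range(n_iters):
        # 1. Sample a minimal set: three (hopefully non-collinear) points.
        sample = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-12:
            continue  # degenerate sample, skip
        normal /= norm
        d = -normal @ sample[0]
        # 2. Score: count points within the distance threshold.
        inliers = np.abs(points @ normal + d) < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (normal, d)
    return best_model[0], best_model[1], best_inliers

# Synthetic test: a noisy z = 0 plane plus uniform outliers.
rng = np.random.default_rng(0)
plane = np.column_stack([rng.uniform(-1, 1, (200, 2)),
                         rng.normal(0, 0.005, 200)])
outliers = rng.uniform(-1, 1, (40, 3))
pts = np.vstack([plane, outliers])
normal, d, mask = ransac_plane(pts, rng=1)
print(abs(normal[2]))  # close to 1: the recovered normal is ~(0, 0, ±1)
```

In the paper's setting the same loop runs with a quadric model (more sample points per hypothesis) and the winning inlier set becomes the extracted target point cloud.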

3.
Many generic position-estimation algorithms are vulnerable to ambiguity introduced by nonunique landmarks. Moreover, the available high-dimensional image data is not fully used when these techniques are extended to vision-based localization. This paper presents the landmark matching, triangulation, reconstruction, and comparison (LTRC) global localization algorithm, which is reasonably immune to ambiguous landmark matches. It extracts natural landmarks for the (rough) matching stage before generating the list of possible position estimates through triangulation. Reconstruction and comparison then rank the possible estimates. The LTRC algorithm has been implemented in an interpreted language on a robot equipped with a panoramic vision system. Empirical data shows a marked improvement in accuracy compared with the established random sample consensus method. LTRC is also robust against inaccurate map data.
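The triangulation stage, generating a position estimate from a pair of matched landmarks, reduces to intersecting two bearing rays. The sketch below (illustrative only; the names and the 2-D simplification are mine, not the paper's) solves the resulting 2x2 linear system:

```python
import numpy as np

def triangulate(l1, l2, b1, b2):
    """Robot position from absolute bearings b1, b2 (radians) to two
    known landmarks l1, l2. With d_i = [cos b_i, sin b_i], the robot p
    satisfies l_i = p + s_i * d_i (s_i = range to landmark i), so
    subtracting gives s1*d1 - s2*d2 = l1 - l2, a 2x2 system in (s1, s2)."""
    l1, l2 = np.asarray(l1, float), np.asarray(l2, float)
    d1 = np.array([np.cos(b1), np.sin(b1)])
    d2 = np.array([np.cos(b2), np.sin(b2)])
    s = np.linalg.solve(np.column_stack([d1, -d2]), l1 - l2)
    return l1 - s[0] * d1

# Landmark at (4, 2) seen due east, landmark at (1, 6) seen due north:
p = triangulate((4, 2), (1, 6), 0.0, np.pi / 2)
print(p)  # robot at (1, 2)
```

Parallel bearings make the system singular, and an ambiguous landmark match produces a consistent-looking but wrong intersection; filtering such estimates is exactly what the reconstruction-and-comparison ranking stage is for.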

4.
In the field of augmented reality (AR), many vision-based extrinsic camera parameter estimation methods have been proposed to achieve geometric registration between the real and virtual worlds. Previously, a feature landmark-based camera parameter estimation method was proposed. This is an effective approach for outdoor AR applications because a feature landmark database can be constructed automatically using the structure-from-motion (SfM) technique. However, the previous method cannot work in real time because of the high computational cost of matching landmarks in the database against image features in the input image. In addition, the accuracy of the estimated camera parameters is insufficient for applications that need to overlay CG objects at positions close to the user's viewpoint, because it is difficult to compensate for the visual pattern changes of close landmarks when only the sparse depth information obtained by SfM is available. In this paper, we achieve fast and accurate feature landmark-based camera parameter estimation through two approaches. First, the number of matching candidates is reduced, for faster estimation, by tentative camera parameter estimation and by assigning priorities to landmarks. Second, image templates of landmarks are adequately compensated for by considering the local 3-D structure of each landmark, using the dense depth information obtained by a laser range sensor. To demonstrate the effectiveness of the proposed method, we developed several AR applications using it.

5.
Machine vision-based fault detection and diagnosis technology for robotic manipulators is now widely used. However, traditional machine vision has difficulty identifying manipulator failure in complex environments with dim lighting and large texture differences, and under factors such as image motion blur caused by manipulator movement. This article discusses the failure factors of mechanical manipulators and systematically analyzes the links leading to failure and the limitations of current technology. First, a gradient-based semantic segmentation method is proposed to extract the grasped object and the complex surrounding environment quickly and accurately. Second, for cases where the camera and the grasped object move relative to each other in a dim environment, a multiframe image registration and fusion method is proposed to obtain high-quality, clear image data. Then, a machine learning-based method is adopted for fault detection and diagnosis, fusing internal and external sensors. Finally, a physical system is built to verify target-extraction quality, image clarity, fault-detection speed, and diagnosis accuracy, demonstrating the superiority of the algorithm.

6.
One of the basic processes of a vision-based target tracking system is detection, which separates an object from the background in a given image. A novel target detection technique for suppressing background clutter is presented that uses a predicted point estimated by a tracking filter. For every pixel, a three-dimensional feature composed of the x-position, the y-position, and the gray level is used to evaluate a membership value describing the probability that the pixel belongs to the target or to the background. These membership values are transformed into a membership-level histogram. We suggest an asymmetric Laplacian model for the membership distribution of the background pixels and determine the optimal membership value for detecting the target region using a likelihood criterion. The proposed technique is applied to several infrared and CCD image sequences to test segmentation and tracking. The feasibility of the proposed method is verified by comparing the experimental results with those of other techniques.
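For concreteness, an asymmetric Laplacian is a two-sided exponential with a different decay scale on each side of the mode; a generic form (a sketch of such a density, not necessarily the authors' exact parameterization) is:

```python
import numpy as np

def asym_laplace_pdf(x, mu, b_left, b_right):
    """Asymmetric Laplacian density: exponential decay with scale b_left
    below the mode mu and b_right above it. The two one-sided integrals
    are b_left and b_right, so 1/(b_left + b_right) normalizes the pdf."""
    b = np.where(x < mu, b_left, b_right)
    return np.exp(-np.abs(x - mu) / b) / (b_left + b_right)

# Sanity check: the density integrates to ~1 over a wide support.
x = np.linspace(-5, 5, 20001)
pdf = asym_laplace_pdf(x, 0.3, 0.05, 0.15)
area = float(((pdf[:-1] + pdf[1:]) / 2 * np.diff(x)).sum())  # trapezoid rule
print(round(area, 3))
```

Fitting (mu, b_left, b_right) to the background portion of the membership-level histogram then lets a likelihood threshold separate target from background memberships.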

7.
Distributed Cooperative Outdoor Multirobot Localization and Mapping
The subject of this article is a scheme for distributed outdoor localization of a team of robots and the use of the robot team for outdoor terrain mapping. Localization is accomplished via extended Kalman filtering (EKF). In the distributed EKF-based localization scheme, the heterogeneity of the available sensors is exploited in the absence or degradation of absolute sensors aboard the team members. The terrain mapping technique then uses the localization information to fuse vision-based range information of environmental features with changes in the elevation profile across the terrain. The result is a terrain matrix from which a metric map is generated. The proposed algorithms are implemented using field data obtained from a team of robots traversing an uneven outdoor terrain.
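As a minimal illustration of the EKF machinery (a toy, not the article's heterogeneous multi-robot filter), the sketch below runs one predict/update cycle for a planar robot measuring its range to a known beacon; all names and noise values are invented for the example.

```python
import numpy as np

def ekf_step(x, P, u, z, beacon, Q, R):
    """One EKF predict/update. Motion model x' = x + u (Jacobian F = I);
    measurement z = ||x - beacon|| (range to a known beacon), with
    measurement Jacobian H = (x - beacon)^T / range."""
    # Predict: with F = I, the covariance update is simply P + Q.
    x = x + u
    P = P + Q
    # Update: linearize the range measurement at the predicted state.
    diff = x - beacon
    r = float(np.linalg.norm(diff))
    H = (diff / r).reshape(1, 2)
    S = H @ P @ H.T + R              # innovation covariance
    K = P @ H.T / S                  # Kalman gain (S is 1x1)
    x = x + (K * (z - r)).ravel()
    P = (np.eye(2) - K @ H) @ P
    return x, P

x_est = np.array([1.5, 0.0])         # prior estimate; true robot at (2, 0)
P = np.eye(2)
Q, R = 0.01 * np.eye(2), np.array([[0.01]])
beacon = np.array([5.0, 0.0])
x_est, P = ekf_step(x_est, P, np.zeros(2), 3.0, beacon, Q, R)
print(np.round(x_est, 2))            # pulled toward the true position
```

In the article's scheme each robot runs such a filter, with innovations from whatever absolute or relative sensors its teammates can still provide.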

8.
Objective: The level-set model is an advanced image segmentation method that performs well on terrestrial images. Feature-fusion strategies are widely introduced into this framework to stretch the target-background contrast and thereby improve performance on noisy, cluttered, and otherwise complex images. In underwater environments, however, the combined effects of strong scattering and attenuation in the water make existing image features and level-set models poorly suited to underwater segmentation: the segmented regions deviate considerably from the true target shape. We therefore propose a region-edge level-set model for underwater image segmentation to improve its accuracy. Method: Region and edge features are used jointly to identify underwater targets. For the region feature, an underwater-image saliency feature is introduced; for the edge feature, a novel depth-information-based edge extraction method is proposed. On top of the fused region-level and edge-level features, a distance-regularization term constrains the level-set function to stabilize its evolution. Results: Experiments on underwater datasets drawn from YouTube and Bubblevision show that the method not only segments low-contrast underwater images with strong scattering and attenuation well, but is also robust to strong background noise. Segmentation accuracy improves by at least 11.5% over the level-set method LPF (local pre-fitting) and by about 6.7% over the saliency-detection method HCN (hierarchical co-salient detection via color names). Conclusion: The experiments show that region-edge feature fusion, and the level-set model built on it, overcome several difficulties of underwater image segmentation; the method segments the underwater target region and fits the target contour well, achieving better results than existing methods.

9.
We propose a new vision-based method for global robot localization using an omnidirectional camera. Topological and metric localization information are combined in an efficient, hierarchical process, with each step being more complex and accurate than the previous one but evaluating fewer images. This allows us to work with large reference image sets in a reasonable amount of time. Simultaneously, thanks to the use of 1D three-view geometry, accurate metric localization can be achieved based on just a small number of nearby reference images. Owing to the wide baseline features used, the method deals well with illumination changes and occlusions, while keeping the computational load small. The simplicity of the radial line features used speeds up the process without affecting the accuracy too much. We show experiments with two omnidirectional image data sets to evaluate the performance of the method and compare the results using the proposed radial lines with results from state-of-the-art wide-baseline matching techniques.

10.
In this paper a novel framework for the development of computer vision applications that exploit sensors available in mobile devices is presented. The framework is organized as a client–server application that combines mobile devices, network technologies and computer vision algorithms with the aim of performing object recognition starting from photos captured by a phone camera. The client module on the mobile device manages the image acquisition and the query formulation tasks, while the recognition module on the server executes the search on an existing database and sends back relevant information to the client. To show the effectiveness of the proposed solution, the implementation of two possible plug-ins for specific problems is described: landmark recognition and fashion shopping. Experiments on four different landmark datasets and one self-collected dataset of fashion accessories show that the system is efficient and robust in the presence of objects with different characteristics.

11.
In this paper we present a novel vision-based markerless hand pose estimation scheme with the input of depth image sequences. The proposed scheme exploits both temporal constraints and spatial features of the input sequence, and focuses on hand parsing and 3D fingertip localization for hand pose estimation. The hand parsing algorithm incorporates a novel spatial-temporal feature into a Bayesian inference framework to assign the correct label to each image pixel. The 3D fingertip localization algorithm adapts a recently developed geodesic extrema extraction method to fingertip detection with the hand parsing algorithm, a novel path-reweighting method and K-means clustering in metric space. The detected 3D fingertip locations are finally used for hand pose estimation with an inverse kinematics solver. Quantitative experiments on synthetic data show the proposed hand pose estimation scheme can accurately capture the natural hand motion. A simulated water-oscillator application is also built to demonstrate the effectiveness of the proposed method in human-computer interaction scenarios.

12.
In this paper, we address the problem of multiple transmitter localization (MTL): determining the locations of potentially multiple transmitters in a field based on readings from a distributed set of sensors. In contrast to the widely studied single-transmitter localization problem, the MTL problem has been studied only recently, in a few works. MTL is significant in many applications wherein intruders may be present. For example, in shared-spectrum systems, detecting unauthorized transmitters and estimating their power are imperative to efficient utilization of the shared spectrum. We present DeepMTL, a novel deep learning approach to the MTL problem. In particular, we frame MTL as a sequence of two steps, each a computer vision problem: image-to-image translation and object detection. The image-to-image translation step maps an input image representing sensor readings to an image representing the distribution of transmitter locations, and the object detection step derives precise transmitter locations from that image. For the first step, we design our learning model sen2peak; for the second, we customize a state-of-the-art object detection model, YOLOv3-cust. Using DeepMTL as a building block, we also develop techniques to estimate the transmit power of the localized transmitters. We demonstrate the effectiveness of our approach via extensive large-scale simulations and show that it outperforms previous approaches significantly (by 50% or more) on performance metrics including localization error, miss rate, and false-alarm rate, while incurring very small latency. We also evaluate our techniques over a small-scale area with real testbed data, and the testbed results align with the simulation results.

13.
On-road vehicle detection: a review
Developing on-board automotive driver assistance systems that alert drivers about the driving environment and possible collisions with other vehicles has attracted a lot of attention lately. In these systems, robust and reliable vehicle detection is a critical step. This paper presents a review of recent vision-based on-road vehicle detection systems, focusing on systems where the camera is mounted on the vehicle rather than fixed, as in traffic/driveway monitoring. First, we discuss the problem of on-road vehicle detection using optical sensors, followed by a brief review of intelligent-vehicle research worldwide. Then, we discuss active and passive sensors to set the stage for vision-based vehicle detection. Methods for quickly hypothesizing vehicle locations in an image, and for verifying the hypothesized locations, are reviewed next. Integrating detection with tracking is also reviewed to illustrate the benefits of exploiting temporal continuity. Finally, we present a critical overview of the methods discussed, assess their potential for future deployment, and present directions for future research.

14.
A novel simultaneous localization and mapping (SLAM) technique based on independent particle filters for landmark mapping and localization of a mobile robot using a high-frequency (HF)-band radio-frequency identification (RFID) system is proposed in this paper. SLAM is a technique for performing self-localization and map building simultaneously, and FastSLAM is a standard landmark-based SLAM method. RFID is a robust identification system with ID tags and readers communicating wirelessly; it is rarely affected by obstacles in the robot's area or by lighting conditions. RFID is therefore useful for self-localization and mapping for a mobile robot, with reasonable accuracy and sufficient robustness. In this study, multiple HF-band RFID readers are embedded in the bottom of an omnidirectional vehicle, and a large number of tags installed on the floor serve as the landmarks of the environment. We found that FastSLAM is not appropriate in this setting for two reasons. First, tag detection in the HF-band RFID system does not follow the Gaussian distribution that FastSLAM assumes. Second, FastSLAM lacks the scalability to handle a large number of landmarks. We therefore propose a novel SLAM method with two independent particle filters: the first performs self-localization based on Monte Carlo localization, and the second performs landmark mapping. The particle filters are nonparametric, so they can handle the non-Gaussian distribution of landmark detections, and separating localization from landmark mapping reduces the computational cost significantly. The proposed method is evaluated in simulated and real environments; the experimental results show more precise localization and mapping and a lower computational cost than FastSLAM.
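A toy version of the first (localization) filter: Monte Carlo localization over floor tags, with a flat "all read tags lie within the reader radius" likelihood standing in for the non-Gaussian HF-band detection model. Everything here (grid, radius, noise values) is invented for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

def mcl_step(particles, u, detected_tags, tag_map, detect_radius=0.25):
    """One Monte Carlo localization step.

    particles: (N, 2) candidate positions; u: commanded displacement;
    detected_tags: indices into tag_map of the currently read tags."""
    n = len(particles)
    # Motion update with additive noise.
    particles = particles + u + rng.normal(0, 0.05, particles.shape)
    # Non-Gaussian sensor model: a particle is likely iff every detected
    # tag lies within the reader radius (small floor avoids zero weights).
    w = np.full(n, 1e-6)
    for i, p in enumerate(particles):
        dists = np.linalg.norm(tag_map[detected_tags] - p, axis=1)
        if np.all(dists < detect_radius):
            w[i] += 1.0
    w /= w.sum()
    # Resample in proportion to the weights.
    return particles[rng.choice(n, n, p=w)]

# Tags on a 1 m grid; the robot truly sits at (1, 1), reading the tag there.
tag_map = np.array([[x, y] for x in range(3) for y in range(3)], float)
particles = rng.uniform(0, 2, (500, 2))
for _ in range(5):
    particles = mcl_step(particles, np.zeros(2), [4], tag_map)  # tag 4 = (1, 1)
print(np.round(particles.mean(axis=0), 1))
```

Because the weight function is an arbitrary indicator rather than a Gaussian, nothing in this loop needs the distributional assumption that FastSLAM's EKF landmark updates rely on.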

15.
This paper proposes a vision-based indoor localization service system that adopts affine scale-invariant features (ASIFT) in a MapReduce framework. Compared to prior vision-based localization methods that use scale-invariant features or bag-of-words to match database images, the proposed system with ASIFT achieves a better localization hit rate, especially when the query image has a large viewing-angle difference from the most similar database image. The heavy computation imposed by ASIFT feature detection and image registration is handled by processes designed in the MapReduce framework to speed up the localization service. Experiments using a Hadoop computation cluster show the performance of the localization system; the better localization hit rate is demonstrated by comparing the proposed approach to previous work based on scale-invariant feature matching and visual vocabularies.

16.
XIE Jinheng, ZHANG Yansheng. Journal of Computer Applications, 2019, 39(12): 3659-3664
Face landmark localization algorithms usually require two stages, face-region detection followed by per-face landmark localization, which multiplies the processing time. To address this, a one-step, real-time, accurate multi-face landmark localization algorithm is proposed. The algorithm converts face landmark coordinates into corresponding heatmaps used as data labels, uses a deep residual network for initial image feature extraction, and uses a feature pyramid network to fuse features representing receptive fields of different scales at different network depths. Applying intermediate supervision, several cascaded prediction networks regress the landmarks of all faces in an image at once, from coarse to fine, with no face-detection step. While maintaining high localization accuracy, the algorithm requires only about 0.0075 s per forward pass (about 133 frames per second), meeting the requirements of real-time face landmark localization, and achieves a mean error of 6.06% and a failure rate of 11.70% on the WFLW test set.
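The heatmap-label step is easy to sketch: each landmark coordinate is rendered as a small 2-D Gaussian in its own channel, and the network regresses these maps instead of raw coordinates. The map size and sigma below are illustrative, not taken from the paper.

```python
import numpy as np

def landmark_heatmap(h, w, points, sigma=2.0):
    """Render ground-truth heatmaps from landmark coordinates.

    One channel per landmark: a 2-D Gaussian bump centred on the point,
    with values in [0, 1] peaking at the landmark pixel."""
    ys, xs = np.mgrid[0:h, 0:w]
    maps = np.zeros((len(points), h, w), np.float32)
    for k, (x, y) in enumerate(points):
        maps[k] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return maps

hm = landmark_heatmap(64, 64, [(20, 30), (40, 10)])
peak = np.unravel_index(hm[0].argmax(), hm[0].shape)
print(peak)  # (row, col) of the first landmark's peak: (30, 20)
```

At inference the decoding runs in reverse: the argmax (or a local soft-argmax) of each predicted channel gives that landmark's coordinates, and multiple faces simply produce multiple peaks per channel.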

17.
Object detection and classification are trending research topics in computer vision because of applications such as visual surveillance. However, vision-based object detection and classification methods still struggle to detect small and densely packed objects in complex dynamic environments with high accuracy and precision. This paper proposes a novel enhanced method to detect and classify objects using a hyperbolic-tangent-based You Only Look Once V4 with a Modified Manta-Ray Foraging Optimization-based convolutional neural network. Initially, in pre-processing, the video data is converted into image sequences and a polynomial adaptive edge-preserving algorithm is applied for image resizing and noise removal. The contrast of the denoised, resized image sequences is then enhanced using a contrast-limited adaptive edge-preserving algorithm. With the contrast-enhanced image sequences, the hyperbolic-tangent-based You Only Look Once V4 is trained for object detection. Additionally, to detect smaller objects with high accuracy, a grasp configuration is observed for every detected object. Finally, the Modified Manta-Ray Foraging Optimization-based convolutional neural network performs the detection and classification of objects. Comparative experiments on various benchmark datasets and methods show improved detection and classification accuracy.

18.
In this paper a vision-based approach for guidance and safe landing of an Unmanned Aerial Vehicle (UAV) is proposed. The UAV is required to navigate from an initial to a final position in a partially known environment. The guidance system allows a remote user to define target areas from a high-resolution aerial or satellite image to determine either the waypoints of the navigation trajectory or the landing area. A feature-based image-matching algorithm finds the natural landmarks and gives feedback to an onboard, hierarchical, behaviour-based control system for autonomous navigation and landing. Two algorithms for safe landing-area detection are also proposed, based on a feature optical-flow analysis. The main novelty is in the vision-based architecture, extensively tested on a helicopter, which, in particular, does not require any artificial landmark (e.g., a helipad). Results show the appropriateness of the vision-based approach, which is robust to occlusions and light variations.

19.
A multi-information-fusion method for localizing facial landmarks on multi-pose 3D faces
To address the strong sensitivity of facial landmark localization on 3D face models to pose variation, a multi-information-fusion landmark localization method for multi-pose 3D faces is proposed. First, feature points are detected on the 2D face texture image with the affine-invariant Affine-SIFT method and projected into 3D space via the texture-to-mesh mapping; the facial landmarks are then localized precisely by combining a maximum-local-curvature-variation rule with iterative constrained optimization. Experiments on FRGC2.0 and a self-built NPU3D database show that the method requires no prior estimation or definition of pose or 3D data format, has low algorithmic complexity, and is robust to the pose of the face model, achieving higher localization accuracy than existing landmark localization methods.

20.
This paper presents a novel vision-based global localization method that uses hybrid maps of objects and spatial layouts. We model indoor environments with a stereo camera using the following visual cues: local invariant features for object recognition and their 3D positions for object pose estimation. We also use the depth information along the horizontal centerline of the image, through which the optical axis passes, which resembles the data from a 2D laser range finder. This allows us to build a topological node composed of a horizontal depth map and an object location map. The horizontal depth map describes the explicit spatial layout of each local space and provides metric information for computing the spatial relationships between adjacent spaces, while the object location map contains the pose information of objects found in each local space and the visual features for object recognition. Based on this map representation, we suggest a coarse-to-fine strategy for global localization: the coarse pose is estimated by means of object recognition and SVD-based point-cloud fitting, and then refined by stochastic scan matching. Experimental results show that our approach provides both an effective vision-based map representation and an effective global localization method.
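The "SVD-based point cloud fitting" used for the coarse pose is presumably the classic closed-form rigid alignment (Kabsch/Umeyama) between matched 3-D points; a minimal sketch under that assumption:

```python
import numpy as np

def kabsch(P, Q):
    """Rigid transform (R, t) minimizing sum ||R @ P_i + t - Q_i||^2.

    Closed form: centre both clouds, SVD the cross-covariance, and
    correct the sign so R is a proper rotation (det = +1)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t

# Recover a known 90-degree yaw plus translation from matched 3-D points.
rng = np.random.default_rng(0)
P = rng.normal(size=(10, 3))
R_true = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])
Q = P @ R_true.T + np.array([0.5, -0.2, 1.0])
R, t = kabsch(P, Q)
print(np.allclose(R, R_true), np.round(t, 2))
```

In the coarse stage the P/Q correspondences would come from recognized objects' model and observed feature positions; the stochastic scan matching then refines whatever residual error this closed-form fit leaves.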


Copyright©北京勤云科技发展有限公司  京ICP备09084417号