Similar Documents
20 similar documents found (search time: 31 ms).
1.
In the area of computer vision, deep learning has produced a variety of state-of-the-art models that rely on massive labeled data. However, collecting and annotating images from the real world is demanding in terms of labor and cost, and it is usually inflexible for building datasets with specific characteristics, such as small object areas and high occlusion levels. Under the framework of Parallel Vision, this paper presents a purposeful way to design artificial scenes and automatically generate virtual images with precise annotations. A virtual dataset named ParallelEye is built, which can be used for several computer vision tasks. Then, by training DPM (Deformable Parts Model) and Faster R-CNN detectors, we show that model performance can be significantly improved by combining ParallelEye with publicly available real-world datasets during the training phase. In addition, we investigate the potential of testing trained models on specific aspects using intentionally designed virtual datasets, in order to expose their flaws. From the experimental results, we conclude that our virtual dataset is viable for training and testing object detectors.
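
The training-set combination the abstract describes can be pictured with a short sketch. This is a hedged illustration, not the paper's pipeline: the `ImageBoxDataset` class and the toy tensors are placeholders, and a real setup would plug the combined loader into a Faster R-CNN training loop.

```python
# Hypothetical sketch: mixing a virtual dataset with a real one for detector
# training. Dataset contents here are toy stand-ins, not ParallelEye data.
import torch
from torch.utils.data import ConcatDataset, DataLoader, Dataset

class ImageBoxDataset(Dataset):
    """Minimal detection dataset: returns (image_tensor, boxes) pairs."""
    def __init__(self, samples):
        self.samples = samples  # list of (CxHxW tensor, Nx4 box tensor) pairs
    def __len__(self):
        return len(self.samples)
    def __getitem__(self, idx):
        return self.samples[idx]

# Toy stand-ins for the virtual (ParallelEye-like) and real-world datasets.
virtual = ImageBoxDataset([(torch.rand(3, 256, 256), torch.tensor([[10., 10., 50., 50.]]))])
real = ImageBoxDataset([(torch.rand(3, 256, 256), torch.tensor([[30., 40., 90., 120.]]))])

# Training on the union exposes the detector to both domains.
combined = ConcatDataset([virtual, real])
loader = DataLoader(combined, batch_size=1, shuffle=True)
for image, boxes in loader:
    pass  # a Faster R-CNN training step would consume each batch here
```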

2.
In moving-object detection against a dynamic background, the target and the background each move independently, so extracting foreground moving objects requires accounting for the background changes induced by the mobile robot's own motion. The affine transformation is widely used to estimate the background transformation between images. However, when an omnidirectional vision sensor (ODVS) is mounted on a mobile robot, the distortion of omnidirectional images makes the background motion inconsistent across the image, so a single affine transformation cannot describe the background motion of an omnidirectional image. This paper divides the image into grid windows, applies a separate affine transformation to each window, and obtains the regions of moving objects from the background-compensated frame difference. Finally, based on the imaging characteristics of the ODVS, the distance and bearing of moving obstacles are recovered visually. Experimental results show that the proposed method accurately detects moving obstacles within the robot's 360° field of view and localizes them precisely, effectively improving the mobile robot's real-time obstacle avoidance.
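
A minimal sketch of the per-window compensation idea, assuming grayscale uint8 frames and OpenCV feature tracking; the grid size, feature counts, and threshold are illustrative, not values from the paper.

```python
# Hedged sketch: estimate one affine transform per grid window, warp the
# previous frame into the current frame, and threshold the difference.
import cv2
import numpy as np

def compensated_frame_diff(prev, curr, grid=4, thresh=25):
    """prev, curr: grayscale uint8 frames. Returns a binary motion mask."""
    h, w = prev.shape[:2]
    diff = np.zeros((h, w), np.uint8)
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * h // grid, (gy + 1) * h // grid
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            p_win, c_win = prev[y0:y1, x0:x1], curr[y0:y1, x0:x1]
            pts = cv2.goodFeaturesToTrack(p_win, 50, 0.01, 5)
            if pts is None or len(pts) < 3:
                continue
            nxt, st, _ = cv2.calcOpticalFlowPyrLK(p_win, c_win, pts, None)
            good = st.ravel() == 1
            if good.sum() < 3:
                continue
            # Affine transform for this window only (background motion here
            # is approximately uniform even in a distorted ODVS image).
            M, _ = cv2.estimateAffine2D(pts[good], nxt[good])
            if M is None:
                continue
            warped = cv2.warpAffine(p_win, M, (x1 - x0, y1 - y0))
            d = cv2.absdiff(warped, c_win)
            diff[y0:y1, x0:x1] = (d > thresh).astype(np.uint8) * 255
    return diff  # nonzero regions approximate independently moving objects
```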

3.
LabelMe: A Database and Web-Based Tool for Image Annotation (cited 15 times: 0 self-citations, 15 by others)
We seek to build a large collection of images with ground-truth labels to be used for object detection and recognition research. Such data is useful for supervised learning and quantitative evaluation. To achieve this, we developed a web-based tool that allows easy image annotation and instant sharing of such annotations. Using this annotation tool, we have collected a large dataset that spans many object categories, often containing multiple instances over a wide variety of images. We quantify the contents of the dataset and compare it against existing state-of-the-art datasets used for object recognition and detection. We also show how to extend the dataset to automatically enhance object labels with WordNet, discover object parts, recover a depth ordering of objects in a scene, and increase the number of labels using minimal user supervision and images from the web. The first two authors (B.C. Russell and A. Torralba) contributed equally to this work.

4.
The explosion of the Internet provides us with a tremendous resource of images shared online. It also confronts vision researchers with the problem of finding effective methods to navigate this vast amount of visual information. Semantic image understanding plays a vital role in solving this problem. One important task in image understanding is object recognition, in particular generic object categorization. Critical to this problem are the issues of learning and data: abundant data helps to train a robust recognition system, while a good object classifier can help to collect a large number of images. This paper presents a novel object recognition algorithm, OPTIMOL, that performs automatic dataset collection and incremental model learning simultaneously. The goal of this work is to use the tremendous resources of the web to learn robust object category models for detecting and searching for objects in real-world cluttered scenes. Humans continuously update their knowledge of objects when new examples are observed; our framework emulates this learning process by iteratively accumulating model knowledge and image examples. We adapt a non-parametric latent topic model and propose an incremental learning framework. Our algorithm automatically collects much larger object category datasets for 22 randomly selected classes from the Caltech 101 dataset. Furthermore, our system offers not only more images in each object category but also a robust object category model and meaningful image annotation. Our experiments show that OPTIMOL collects image datasets that are superior to the well-known manually collected object datasets Caltech 101 and LabelMe.

5.
Executing complex robotic tasks such as dexterous grasping and manipulation requires a combination of dexterous robots, intelligent sensors, and adequate object information processing. In this paper, vision is integrated into a highly redundant robotic system consisting of a tiltable camera and a three-fingered dexterous gripper, both mounted on a PUMA-type robot arm. To condense the image data of the robot workspace acquired from the mobile camera, contour image processing is used for offline grasp and motion planning as well as for online supervision of manipulation tasks. The desired robot and object motions are controlled by a visual feedback system that coordinates the motions of hand, arm, and eye according to the specific requirements of the respective situation. Experiences and results from several experiments in the field of service robotics show the possibilities and limits of integrating vision and tactile sensors into a dexterous hand-arm-eye system able to assist humans in industrial or service environments.

6.
Robotic advances and developments in sensors and acquisition systems facilitate the collection of survey data in remote and challenging scenarios. Semantic segmentation, which attempts to provide per-pixel semantic labels, is an essential task when processing such data. Recent advances in deep learning have boosted this task's performance, but these methods need large amounts of labeled data, which is usually a challenge in many domains. In many environmental monitoring instances, such as the coral reef example studied here, data labeling demands expert knowledge and is costly. Therefore, many datasets often present scarce and sparse image annotations or remain untouched in image libraries. This study proposes and validates an effective approach for learning semantic segmentation models from sparsely labeled data. By augmenting sparse annotations with the proposed adaptive superpixel segmentation propagation, we obtain results similar to training with dense annotations, significantly reducing the labeling effort. We perform an in-depth analysis of our labeling augmentation method as well as of different neural network architectures and loss functions for semantic segmentation. We demonstrate the effectiveness of our approach on publicly available datasets of different real domains, with emphasis on underwater scenarios—specifically, coral reef semantic segmentation. We release new labeled data as well as an encoder trained on half a million coral reef images, which is shown to facilitate generalization to new coral scenarios.
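
A minimal sketch of label propagation with superpixels, using off-the-shelf SLIC as a stand-in for the paper's adaptive superpixel segmentation propagation; `n_segments` and `compactness` are illustrative values, not from the paper.

```python
# Hedged sketch: every pixel in a superpixel that contains a labeled point
# inherits that point's class, densifying sparse annotations.
import numpy as np
from skimage.segmentation import slic

def propagate_sparse_labels(image, points):
    """image: RGB array; points: list of (row, col, class_id) sparse labels.
    Returns a dense label map (0 = unlabeled)."""
    segments = slic(image, n_segments=500, compactness=10)
    dense = np.zeros(segments.shape, dtype=np.int32)
    for r, c, cls in points:
        dense[segments == segments[r, c]] = cls
    return dense
```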

7.
8.
Advanced Robotics, 2013, 27(8-9): 947-967
A wide field of view is required for many robotic vision tasks. Such an aperture may be obtained with a fisheye camera, which provides a full image (unlike catadioptric visual sensors) and does not increase the size or fragility of the imaging system relative to perspective cameras. While a unified model exists for all central catadioptric systems, many different models approximating the radial distortions exist for fisheye cameras. This paper shows that the unified projection model proposed for central catadioptric cameras is also valid for fisheye cameras in the context of robotic applications. The model consists of a projection onto a virtual unit sphere followed by a perspective projection onto an image plane, and is shown to be equivalent to almost all existing fisheye models. Calibration of four cameras and partial Euclidean reconstruction are carried out using this model and lead to convincing results. Finally, an application to a mobile robot navigation task is proposed and correctly executed along a 200-m trajectory.
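
The unified model described here is compact enough to write down directly. Below is a small numpy sketch of the two-step projection (onto a unit sphere, then a perspective projection from a point shifted by a mirror parameter); the values of `xi`, `fx`, `fy`, `cx`, and `cy` are illustrative, not calibration results from the paper.

```python
# Hedged sketch of the unified projection model: sphere, then perspective.
import numpy as np

def unified_project(X, xi=0.8, fx=300.0, fy=300.0, cx=320.0, cy=240.0):
    """Project a 3D point X (shape (3,)) to pixel coordinates."""
    Xs = X / np.linalg.norm(X)            # step 1: onto the unit sphere
    x = Xs[0] / (Xs[2] + xi)              # step 2: perspective projection
    y = Xs[1] / (Xs[2] + xi)              #         from a point shifted by xi
    return np.array([fx * x + cx, fy * y + cy])  # apply camera intrinsics

print(unified_project(np.array([0.5, 0.2, 2.0])))
```

Setting `xi = 0` recovers an ordinary pinhole camera, which is one way to see why the model subsumes both perspective and wide-angle geometries.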

9.
Object detection and location in remote sensing (RS) images is challenging, computationally expensive, and labor-intensive. Benefiting from research on convolutional neural networks (CNNs), performance in this field has improved in recent years. However, CNN-based object detection methods require a large number of images with annotation information for training, and for object location these annotations must contain bounding boxes. Furthermore, objects in RS images are usually small and densely co-located, leading to a high cost of manual annotation. We tackle the problem of weakly supervised object detection under such conditions, aiming to learn detectors with only image-level annotations, i.e., without bounding box annotations. Based on the fact that the feature maps of a CNN are localizable, we hierarchically fuse the location information from the shallow feature map with the class activation map to obtain accurate object locations. To mitigate the loss of small or densely distributed objects, we introduce a divergent activation module and a similarity module into the network. The divergent activation module improves the response strength of low-response areas in the shallow feature map. Densely distributed objects in RS images, such as aircraft at an airport, often exhibit a certain similarity; the similarity module improves the feature distribution of the shallow feature map and suppresses background noise. Comprehensive experiments on a public dataset and a self-assembled dataset (which we made publicly available) show the superior performance of our method compared to state-of-the-art object detectors.
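
For intuition, the class activation map that the method fuses with shallow features can be computed as below. The fusion shown is a naive upsample-and-gate stand-in, an assumption for illustration only; the paper's hierarchical fusion, divergent activation, and similarity modules are not reproduced here.

```python
# Sketch: standard CAM from last-conv features, then a naive fusion with a
# (hypothetical) shallow localization map.
import torch
import torch.nn.functional as F

def class_activation_map(features, fc_weight, class_idx):
    """features: (C, H, W) last-conv activations; fc_weight: (num_classes, C)."""
    cam = torch.einsum('c,chw->hw', fc_weight[class_idx], features)
    cam = F.relu(cam)
    return cam / (cam.max() + 1e-6)  # normalize to [0, 1]

def fuse_with_shallow(cam, shallow_map):
    """Naive fusion: upsample the CAM to the shallow map's resolution and
    gate it, so responses survive only where both maps agree."""
    cam_up = F.interpolate(cam[None, None], size=shallow_map.shape,
                           mode='bilinear', align_corners=False)[0, 0]
    return cam_up * shallow_map
```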

10.
Design of an autonomous agricultural robot (cited 5 times: 0 self-citations, 5 by others)
This paper presents a state-of-the-art review of the development of autonomous agricultural robots, including guidance systems, greenhouse autonomous systems, and fruit-harvesting robots. A general concept for a field-crop robotic machine to selectively harvest easily bruised fruit and vegetables is designed, and future trends that must be pursued to make robots a viable option for agricultural operations are discussed. A prototype machine implementing part of this design has been built for melon harvesting. The machine consists of a Cartesian manipulator mounted on a mobile chassis pulled by a tractor. Two vision sensors are used to locate the fruit and guide the robotic arm toward it; a gripper grasps the melon and detaches it from the vine. The real-time control hardware architecture consists of a blackboard system with autonomous modules for sensing, planning, and control connected through a PC bus. Approximately 85% of the fruit are successfully located and harvested.

11.

Occlusion-aware instance-sensitive segmentation is a complex task generally split into region-based segmentations by approximating instances with their bounding boxes. We address the showcase scenario of dense homogeneous layouts, in which this approximation does not hold. In this scenario, outlining unoccluded instances by decoding a deep encoder becomes difficult, due to the translation invariance of convolutional layers and the lack of complexity in the decoder. We therefore propose a multicameral design composed of subtask-specific lightweight decoder and encoder-decoder units, coupled in cascade to encourage subtask-specific feature reuse and enforce a learning path within the decoding process. Furthermore, the state-of-the-art datasets for occlusion-aware instance segmentation contain real images with few instances and occlusions mostly due to objects occluding the background, unlike dense object layouts. We therefore also introduce a synthetic dataset of dense homogeneous object layouts, namely Mikado, which extensibly contains more instances and inter-instance occlusions per image than these public datasets. Our extensive experiments on Mikado and public datasets show that ordinal multiscale units within the decoding process prove more effective than state-of-the-art design patterns for capturing position-sensitive representations. We also show that Mikado is plausible with respect to real-world problems, in the sense that it enables the learning of performance-enhancing representations that transfer to real images, while drastically reducing the need for hand-made annotations for fine-tuning. The proposed dataset will be made publicly available.


12.
13.
This paper presents an efficient metric for computing the similarity among omnidirectional images (image matching). The representation of image appearance is based on feature vectors that include both the chromatic attributes of color sets and their mutual spatial relationships. The proposed metric is well suited to robotic navigation using omnidirectional vision sensors because it has several important properties: it is reflexive, compositional, and invariant with respect to image scaling and rotation. The robustness of the metric was repeatedly tested using omnidirectional images for a robot localization task in a real indoor environment.

14.
Finding semantically similar images relies on image annotations assigned manually by amateurs or professionals, or computed automatically by an algorithm using low-level image features. These annotations create a keyword space in which a dissimilarity function quantifies the semantic relationship among images. In this setting, the objective of this paper is two-fold. First, we compare amateur with professional user annotations and propose a model of manual annotation errors, specifically an asymmetric binary model. Second, we examine different aspects of search by semantic similarity: the accuracy of manual versus automatic annotations, the influence of manual annotations of varying accuracy resulting from incorrect annotations, and the influence of the keyword-space dimensionality. To assess these aspects we conducted experiments on a professional image dataset (Corel) and two amateur image datasets (one with 25,000 Flickr images and a second with 269,648 Flickr images) with a large number of keywords, different similarity functions, and both manual and automatic annotation methods. We find that amateur-level manual annotations offer better performance for top-ranked results in all datasets (MP@20). However, for full-rank measures (MAP) on the real datasets (Flickr), retrieval by semantic similarity with automatic annotations is similar to or better than amateur-level manual annotations.
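
The asymmetric binary error model can be sketched as two unequal flip probabilities over a binary image-keyword matrix. The rates below are assumptions for illustration, not values fitted in the paper.

```python
# Hedged sketch: a present keyword is dropped far more often than an absent
# keyword is spuriously added -- hence an *asymmetric* binary error model.
import numpy as np

rng = np.random.default_rng(0)

def corrupt_annotations(y, p_miss=0.3, p_spurious=0.01):
    """y: binary keyword matrix (images x keywords)."""
    flip1 = (y == 1) & (rng.random(y.shape) < p_miss)      # missed keywords
    flip0 = (y == 0) & (rng.random(y.shape) < p_spurious)  # spurious keywords
    out = y.copy()
    out[flip1] = 0
    out[flip0] = 1
    return out

clean = rng.integers(0, 2, size=(5, 10))
noisy = corrupt_annotations(clean)
```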

15.
This article describes a three-dimensional artificial vision system for robotic applications using an ultrasonic sensor array. The array is placed on the robot gripper so that it is possible to detect the presence of an object, direct the robot tool toward it, and locate the object's position. The system provides visual information about the object's surface by means of surface scanning, permitting reconstruction of the object's shape. It uses an approximation of the ultrasonic emission and reception beam shape to calculate the first contact points with the object's surface. The positions of the array's sensors have been selected to give the sensor head additional useful capabilities, such as edge detection and edge tracking. Furthermore, the article describes the structure of the sensor head, designed to avoid successive rebounds between the head and the object surface and to eliminate mechanical vibrations among sensors.

16.
Decreasing costs of vision sensors and advances in embedded hardware have boosted lane-related research (detection, estimation, tracking, etc.) in the past two decades. Interest in this topic has increased even more with the demand for advanced driver assistance systems (ADAS) and self-driving cars. Although extensively studied independently, there is still a need for studies that propose a combined solution for the multiple problems related to the ego-lane, such as lane departure warning (LDW), lane change detection, lane marking type (LMT) classification, road marking detection and classification, and detection of adjacent lanes (i.e., the immediate left and right lanes). In this paper, we propose a real-time Ego-Lane Analysis System (ELAS) capable of estimating ego-lane position, classifying LMTs and road markings, performing LDW, and detecting lane change events. The proposed vision-based system works on a temporal sequence of images. Lane marking features are extracted from perspective and Inverse Perspective Mapping (IPM) images and combined to increase robustness. The final estimated lane is modeled as a spline using a combination of methods (Hough lines with a Kalman filter, and a spline with a particle filter); all other events are detected from the estimated lane. To validate ELAS and address the lack of lane datasets in the literature, a new dataset with more than 20 scenes (over 15,000 frames) covering a variety of scenarios (urban roads, highways, traffic, shadows, etc.) was created. The dataset was manually annotated and made publicly available to enable evaluation of several events of interest to the research community (lane estimation, change, and centering; road markings; intersections; LMTs; crosswalks; and adjacent lanes). The system was also validated quantitatively and qualitatively on other public datasets. ELAS achieved high detection rates for all real-world events and proved ready for real-time applications.
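
The IPM step mentioned above is a plain perspective warp once four road-plane points are known. A hedged OpenCV sketch, where the source trapezoid is a calibration-dependent placeholder:

```python
# Sketch of Inverse Perspective Mapping: warp the road region of a camera
# frame to a bird's-eye view. The source points are placeholders that would
# come from camera calibration, not values from the paper.
import cv2
import numpy as np

def inverse_perspective_map(frame, src_pts, out_size=(400, 600)):
    """src_pts: four image points on the road plane, ordered top-left,
    top-right, bottom-right, bottom-left."""
    w, h = out_size
    dst_pts = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    H = cv2.getPerspectiveTransform(np.float32(src_pts), dst_pts)
    return cv2.warpPerspective(frame, H, (w, h))

# Example placeholder trapezoid around the ego lane (calibration-dependent):
# ipm = inverse_perspective_map(frame, [(560, 440), (720, 440), (1100, 680), (180, 680)])
```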

17.
In recent years, with the emergence of ubiquitous computing technology, a new class of networked robots called ubiquitous robots has been introduced. The Ubiquitous Robotic Companion (URC) is our conceptual vision of ubiquitous service robots that provide users with the services they need, anytime and anywhere, in ubiquitous computing environments. Several requirements must be met to realize the URC vision. One essential requirement is that robotic systems must support ubiquity of services: a robot service must remain available even when the service environment changes. More specifically, a robotic system needs to interoperate automatically with the sensors and devices in its current service environment, rather than being statically pre-programmed for it. In this paper, the design and implementation of an infrastructure for URC, called the Ubiquitous Robotic Service Framework (URSF), is presented. URSF enables the automated integration of networked robots in a ubiquitous computing environment through Semantic Web Services technologies.

18.
Multispectral pedestrian detection is an important functionality in various computer vision applications such as robot sensing, security surveillance, and autonomous driving. In this paper, our motivation is to automatically adapt a generic pedestrian detector trained in a visible source domain to a new multispectral target domain without any manual annotation effort. For this purpose, we present an auto-annotation framework that iteratively labels pedestrian instances in the visible and thermal channels by leveraging the complementary information of multispectral data. A distinct target is temporally tracked through image sequences to generate more confident labels. The pedestrians predicted in the two individual channels are merged through a label fusion scheme to generate multispectral pedestrian annotations. The obtained annotations are then fed to a two-stream region proposal network (TS-RPN) to learn multispectral features on both visible and thermal images for robust pedestrian detection. Experimental results on the KAIST multispectral dataset show that our unsupervised approach using auto-annotated training data can achieve performance comparable to state-of-the-art deep neural network (DNN)-based pedestrian detectors trained using manual labels.
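
The label fusion scheme can be pictured as merging per-channel detections; the IoU-threshold rule below is a simplified stand-in for the paper's fusion, with an assumed threshold.

```python
# Simplified sketch: keep visible-channel boxes and merge in thermal boxes
# that match none of them, so pedestrians seen in only one channel still
# receive an annotation. The threshold is an assumption.
import numpy as np

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def fuse_labels(visible_boxes, thermal_boxes, iou_thresh=0.5):
    fused = list(visible_boxes)
    for tb in thermal_boxes:
        if all(iou(tb, vb) < iou_thresh for vb in visible_boxes):
            fused.append(tb)
    return fused
```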

19.
Objective: Object detection is an important research direction in the intelligent interpretation of remote sensing imagery, yet most detection algorithms struggle to detect densely packed rotated objects with high accuracy. We propose a detection algorithm based on keypoint and guide-vector prediction that achieves high-accuracy rotated object detection while also characterizing object orientation. Method: First, a new parameterization of rotated objects is proposed, decomposing detection into regression of a center point, a head vertex, a guide vector, and the object width, which fits the detected objects more closely. Second, a rotated elliptical Gaussian kernel is designed to better fit the shapes of remote sensing objects and thus improve keypoint prediction accuracy. Finally, by predicting a guide vector pointing from the center to the head vertex, the center point and head vertex of the same object are matched, producing an accurate oriented rotated bounding box. Results: On the HRSC (high-resolution ship collections) dataset of large-aspect-ratio ship targets, our method outperforms other mainstream detectors, reaching average precisions of 90.78% and 97.85% under the VOC 2007 (visual object classes) and VOC 2012 metrics, respectively. On the UCAS-AOD (UCAS high-resolution aerial object detection) dataset of small-aspect-ratio aircraft targets, it reaches 98.81% average precision. The experimental results demonstrate the feasibility and effectiveness of the algorithm. Conclusion: The algorithm computes the center point and head vertex with elliptical Gaussian kernels and constrains the point matching with a guide vector, achieving orientation-aware detection of rotated targets.
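
Decoding an oriented box from this parameterization (center, head vertex, width) is straightforward geometry. Below is a hedged numpy sketch in which the box length is taken to be twice the center-to-head distance, an assumption consistent with the center/head-vertex description but not confirmed by the paper.

```python
# Sketch: reconstruct the four corners of an oriented rectangle from a
# center point, a head vertex (reached via the guide vector), and a width.
import numpy as np

def decode_rotated_box(center, head, width):
    center, head = np.asarray(center, float), np.asarray(head, float)
    axis = head - center                      # guide-vector direction
    normal = np.array([-axis[1], axis[0]])    # perpendicular to the long axis
    normal *= (width / 2) / (np.linalg.norm(normal) + 1e-9)
    tail = center - axis                      # mirror of head through center
    return np.array([head + normal, head - normal, tail - normal, tail + normal])

print(decode_rotated_box(center=(50, 50), head=(80, 50), width=20))
```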

20.
Visual navigation is a challenging issue in automated robot control. In many robot applications, such as object manipulation in hazardous environments or autonomous locomotion, it is necessary to automatically detect and avoid obstacles while planning a safe trajectory. In this context, detecting corridors of free space along the robot trajectory is a very important capability that requires nontrivial visual processing. In most cases it is possible to take advantage of active control of the cameras. In this paper we propose a cooperative schema in which motion and stereo vision are used to infer scene structure and determine free-space areas. Binocular disparity, computed on several stereo images over time, is combined with optical flow from the same sequence to obtain a relative depth map of the scene. Both the time to impact and the depth scaled by the distance of the camera from the fixation point are treated as good relative measurements: viewer-based, but centered on the environment. The need for calibrated parameters is considerably reduced by using an active control strategy: the cameras track a point in space independently of the robot motion, and the full rotation of the head, which includes the unknown robot motion, is derived from binocular image data. The feasibility of the approach in real robotic applications is demonstrated by several experiments performed on real image data acquired from an autonomous vehicle and a prototype camera head.
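
The time to impact mentioned above can be estimated from optical flow alone. A back-of-the-envelope sketch, using the standard result that for pure translation toward a fronto-parallel surface the flow divergence equals 2/τ; this derivation is textbook material, not taken from the paper.

```python
# Sketch: estimate time to impact tau from the mean divergence of a dense
# optical flow field, assuming frontal approach to a fronto-parallel surface.
import numpy as np

def time_to_impact(flow_u, flow_v, dt=1.0):
    """flow_u, flow_v: dense flow components (pixels/frame).
    Returns tau in seconds for frame interval dt."""
    du_dx = np.gradient(flow_u, axis=1)   # d(u)/dx
    dv_dy = np.gradient(flow_v, axis=0)   # d(v)/dy
    div = np.mean(du_dx + dv_dy)          # mean flow divergence
    return np.inf if div <= 0 else 2.0 * dt / div  # div = 2/tau
```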
