Similar documents
20 similar documents found (search time: 27 ms)
1.
2.
Statistics of natural image categories   (Cited by: 9; self-citations: 0; by others: 9)
In this paper we study the statistical properties of natural images belonging to different categories and their relevance for scene and object categorization tasks. We discuss how second-order statistics are correlated with image categories, scene scale and objects. We propose how scene categorization could be computed in a feedforward manner in order to provide top-down and contextual information very early in the visual processing chain. Results show how visual categorization based directly on low-level features, without grouping or segmentation stages, can benefit object localization and identification. We show how simple image statistics can be used to predict the presence and absence of objects in the scene before exploring the image.
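The second-order statistics this abstract relies on can be summarized by a radially averaged power spectrum. A minimal sketch, assuming a grayscale image as a NumPy array; the number of frequency bands and the random test image are illustrative choices, not from the paper:

```python
import numpy as np

def power_spectrum_features(img, n_radial=8):
    """Radially averaged power spectrum of a grayscale image.

    Second-order statistics of this kind are what the abstract
    correlates with scene categories; n_radial is an illustrative
    choice of frequency-band count.
    """
    f = np.fft.fftshift(np.fft.fft2(img))
    power = np.abs(f) ** 2
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - h / 2, xx - w / 2)        # distance from DC
    bands = np.linspace(0, r.max(), n_radial + 1)
    feats = []
    for lo, hi in zip(bands[:-1], bands[1:]):
        mask = (r >= lo) & (r < hi)
        feats.append(power[mask].mean() if mask.any() else 0.0)
    return np.array(feats)

rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))             # stand-in for a real image
feats = power_spectrum_features(img)
print(feats.shape)  # (8,)
```

Such a low-dimensional spectral signature can feed a simple classifier before any segmentation or grouping, which is the feedforward use the abstract describes.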

3.
Objective: To address background interference in fine-grained image classification, we propose a classification model that segments images using top-down attention maps. Method: First, a convolutional neural network is trained on the fine-grained image dataset to obtain a base network model. Visualizing this network shows that only some image regions contribute to the target class. The trained base network is then used to compute the spatial support of image pixels for the relevant class, generating a top-down attention map that detects the key regions in the image. This attention map initializes the GraphCut algorithm, which segments out the key object regions and thereby improves the discriminability of the image. Finally, CNN features are extracted from the segmented images for fine-grained classification. Results: Using only image-level class labels, the model was evaluated on the public fine-grained datasets Cars196 and Aircrafts100, achieving average classification accuracies of 86.74% and 84.70%, respectively. This shows that adding attention information on top of a GoogLeNet base model further improves fine-grained classification accuracy. Conclusion: The semantic segmentation strategy based on top-down attention maps improves fine-grained classification performance. Because no object bounding-box or part annotations are required, the model is general and robust, and is applicable to salient object detection, foreground segmentation, and fine-grained image classification.

4.
5.
Scale-Invariant Visual Language Modeling for Object Categorization   (Cited by: 2; self-citations: 0; by others: 2)
In recent years, "bag-of-words" models, which treat an image as a collection of unordered visual words, have been widely applied in the multimedia and computer vision fields. However, their ignorance of the spatial structure among visual words makes them indiscriminative for objects with similar word frequencies but different word spatial distributions. In this paper, we propose a visual language modeling method (VLM), which incorporates the spatial context of the local appearance features into the statistical language model. To represent the object categories, models with different orders of statistical dependencies have been exploited. In addition, the multilayer extension to the VLM makes it more resistant to scale variations of objects. The model is effective and applicable to large scale image categorization. We train scale invariant visual language models based on the images which are grouped by Flickr tags, and use these models for object categorization. Experimental results show they achieve better performance than single layer visual language models and "bag-of-words" models. They also achieve comparable performance with 2-D MHMM and SVM-based methods, while costing much less computational time.
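The spatial dependencies that the VLM adds over bag-of-words can be illustrated by counting neighboring visual-word pairs on a grid. A toy sketch, where raw bigram counts stand in for the trained n-gram probabilities of the actual model:

```python
import numpy as np
from collections import Counter

def visual_bigrams(word_grid):
    """Count horizontal and vertical visual-word bigrams on a 2D grid.

    A toy stand-in for the spatial context the visual language model
    captures; the real VLM estimates conditional probabilities from
    these co-occurrences rather than using raw counts.
    """
    counts = Counter()
    h, w = word_grid.shape
    for i in range(h):
        for j in range(w):
            if j + 1 < w:                                   # horizontal neighbor
                counts[(word_grid[i, j], word_grid[i, j + 1])] += 1
            if i + 1 < h:                                   # vertical neighbor
                counts[(word_grid[i, j], word_grid[i + 1, j])] += 1
    return counts

grid = np.array([[0, 1],
                 [1, 0]])          # visual-word indices from a quantizer
bg = visual_bigrams(grid)
print(bg[(0, 1)], bg[(1, 0)])  # 2 2
```

Two images with identical word histograms but different layouts produce different bigram counts, which is exactly the discriminative signal bag-of-words discards.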

6.
7.
The goal of object categorization is to locate and identify instances of an object category within an image. Recognizing an object in an image is difficult when images include occlusion, poor quality, noise or background clutter, and this task becomes even more challenging when many objects are present in the same scene. Several models for object categorization use appearance and context information from objects to improve recognition accuracy. Appearance information, based on visual cues, can successfully identify object classes up to a certain extent. Context information, based on the interaction among objects in the scene or global scene statistics, can help successfully disambiguate appearance inputs in recognition tasks. In this work we address the problem of incorporating different types of contextual information for robust object categorization in computer vision. We review different ways of using contextual information in the field of object categorization, considering the most common levels of extraction of context and the different levels of contextual interactions. We also examine common machine learning models that integrate context information into object recognition frameworks and discuss scalability, optimizations and possible future approaches.

8.
9.
In recent years we have seen a great deal of strong work in visual recognition, dialogue interpretation and multi-modal learning aimed at providing the building blocks that enable intelligent robots to interact with humans in a meaningful way and even to evolve continuously during this process. Building systems that unify those components under a common architecture has turned out to be challenging, as each approach comes with its own set of assumptions, restrictions, and implications. For example, the impact of recent progress on visual category recognition has been limited from the perspective of interactive systems. The reasons for this are diverse. We identify and address two major challenges in integrating modern techniques for visual categorization into an interactive learning system: reducing the number of required labelled training examples and dealing with potentially erroneous input. Today's object categorization methods use either supervised or unsupervised training. While supervised methods tend to produce more accurate results, unsupervised methods are highly attractive due to their potential to use far larger amounts of unlabeled training data. We propose a novel method that uses unsupervised training to obtain visual groupings of objects and a cross-modal learning scheme to overcome the inherent limitations of purely unsupervised training. The method uses a unified and scale-invariant object representation that handles labeled as well as unlabeled information in a coherent way. First experiments demonstrate the ability of the system to learn object category models from many unlabeled observations and a few dialogue interactions that can be ambiguous or even erroneous.

10.
In this paper, a novel framework is developed for leveraging large-scale loosely tagged images for object classifier training by addressing three key issues jointly: (a) spam tags, e.g., some tags are more related to popular query terms than to the image semantics; (b) loose object tags, e.g., multiple object tags are loosely given at the image level without identifying the object locations in the images; (c) missing object tags, e.g., some object tags are missed and thus negative bags may contain positive instances. To address these three issues jointly, our framework consists of the following key components for leveraging large-scale loosely tagged images for object classifier training: (1) distributed image clustering and inter-cluster visual correlation analysis for handling the issue of spam tags by filtering out large amounts of junk images automatically; (2) multiple instance learning with missing tag prediction for dealing with the issues of loose object tags and missing object tags jointly; (3) structural learning for leveraging the inter-object visual correlations to train large numbers of inter-related object classifiers jointly. Our experiments on large-scale loosely tagged images have provided very positive results.
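The "loose object tags" setting above is the classic multiple-instance assumption: an image-level tag means at least one region in the image matches it. A minimal sketch of max-pooling instance scores to the bag (image) level; the scores are made up for illustration:

```python
import numpy as np

def bag_scores(instance_scores):
    """Max-pool per-instance (region) scores to the bag (image) level.

    The standard multiple-instance assumption: a bag is positive if
    its best instance is. A toy stand-in for the MIL component the
    abstract describes, which additionally predicts missing tags.
    """
    return np.array([s.max() for s in instance_scores])

bags = [np.array([0.1, 0.9]),   # image with one strongly matching region
        np.array([0.2, 0.3])]   # image with no convincing region
print(bag_scores(bags))  # [0.9 0.3]
```

Training then pushes the max-pooled score of tagged images up and that of untagged images down, without ever needing region-level annotations.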

11.
A new software technology for on‐line performance analysis and the visualization of complex parallel and distributed systems is presented. Often heterogeneous, these systems need capabilities for the flexible integration and configuration of performance analysis and visualization. Our technology is based on an object‐oriented framework for the rapid prototyping and development of distributable visual objects. The visual objects consist of two levels, a platform/device‐specific low level and an analysis‐ and visualization‐specific high level. We have developed a very high‐level markup language called VOML and a compiler for the component‐based development of high‐level visual objects. The VOML is based on a software architecture for on‐line event processing and performance visualization called EPIRA. The technology lends itself to constructing high‐level visual objects from globally distributed component definitions. Details of the technology and tools used, as well as how an example visual object can be rapidly prototyped from several reusable components, are presented. Copyright © 2003 John Wiley & Sons, Ltd.

12.
This paper introduces an adaptive visual tracking method that combines the adaptive appearance model and the optimization capability of the Markov decision process. Most tracking algorithms are limited due to variations in object appearance from changes in illumination, viewing angle, object scale, and object shape. This paper is motivated by the fact that tracking performance degradation is caused not only by changes in object appearance but also by the inflexible controls of tracker parameters. To the best of our knowledge, optimization of tracker parameters has not been thoroughly investigated, even though it critically influences tracking performance. The challenge is to equip an adaptive tracking algorithm with an optimization capability for a more flexible and robust appearance model. In this paper, the Markov decision process, which has been applied successfully in many dynamic systems, is employed to optimize an adaptive appearance model-based tracking algorithm. The adaptive visual tracking is formulated as a Markov decision process based dynamic parameter optimization problem with uncertain and incomplete information. The high computation requirements of the Markov decision process formulation are solved by the proposed prioritized Q-learning approach. We carried out extensive experiments using realistic video sets, and achieved very encouraging and competitive results.
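The optimization core the abstract builds on is the standard Q-learning update over tracker states and parameter actions. A minimal tabular sketch; the state/action sizes and the alpha/gamma values are illustrative, and this is plain Q-learning rather than the paper's prioritized variant:

```python
import numpy as np

def q_update(Q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """One temporal-difference update of a tabular Q function.

    States stand in for tracker situations and actions for parameter
    choices; alpha (learning rate) and gamma (discount) are
    illustrative values, not taken from the paper.
    """
    td_target = reward + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

Q = np.zeros((3, 2))   # 3 tracker states x 2 parameter actions
Q = q_update(Q, s=0, a=1, reward=1.0, s_next=2)
print(Q[0, 1])  # 0.1
```

Repeating this update as the tracker runs lets the parameter policy adapt to appearance changes instead of being fixed by hand, which is the flexibility the abstract argues for.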

13.
In this paper we investigate the fine-grained object categorization problem of determining fish species in low-quality visual data (images and videos) recorded in real-life settings. We first describe a new annotated dataset of about 35,000 fish images (MA-35K dataset), derived from the Fish4Knowledge project, covering 10 fish species from the Eastern Indo-Pacific bio-geographic zone. We then resort to a label propagation method able to transfer the labels from the MA-35K to a set of 20 million fish images in order to achieve variability in fish appearance. The resulting annotated dataset, containing over one million annotations (AA-1M), was then manually checked by removing false positives as well as images with occlusions between fish or showing fish only partially. Finally, we randomly picked more than 30,000 fish images distributed among ten fish species and extracted from about 400 10-minute videos, and used this data (both images and videos) for the fish task of the LifeCLEF 2014 contest. Together with the fine-grained visual dataset release, we also present two approaches for fish species classification in, respectively, still images and videos. Both approaches showed high performance (for some fish species the precision and recall were close to one) in object classification and outperformed state-of-the-art methods. In addition, despite the dataset being unbalanced in the number of images per species, both methods (especially the one operating on still images) appear to be rather robust against the long-tail curse of data, showing the best performance on the less populated object classes.

14.
In the face of current large-scale video libraries, the practical applicability of content-based indexing algorithms is constrained by their efficiency. This paper strives for efficient large-scale video indexing by comparing various visual-based concept categorization techniques. In visual categorization, the popular codebook model has shown excellent categorization performance. The codebook model represents continuous visual features by discrete prototypes predefined in a vocabulary. The vocabulary size has a major impact on categorization efficiency, where a more compact vocabulary is more efficient. However, smaller vocabularies typically score lower on classification performance than larger vocabularies. This paper compares four approaches to achieve a compact codebook vocabulary while retaining categorization performance. For these four methods, we investigate the trade-off between codebook compactness and categorization performance. We evaluate the methods on more than 200 h of challenging video data with as many as 101 semantic concepts. The results allow us to create a taxonomy of the four methods based on their efficiency and categorization performance.
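The codebook model that the comparison starts from can be sketched as nearest-codeword assignment followed by a normalized histogram. Here the vocabulary is random for illustration, whereas in practice it would be learned (e.g., by clustering) at several sizes to explore the compactness trade-off:

```python
import numpy as np

def codebook_histogram(features, vocabulary):
    """Hard-assign local features to their nearest codewords and
    return a normalized word histogram, the basic codebook
    representation discussed in the abstract."""
    # pairwise distances: (n_features, n_codewords)
    d = np.linalg.norm(features[:, None, :] - vocabulary[None, :, :], axis=2)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary))
    return hist / hist.sum()

rng = np.random.default_rng(1)
feats = rng.standard_normal((100, 8))   # stand-in local descriptors
vocab = rng.standard_normal((16, 8))    # a "compact" 16-word vocabulary
h = codebook_histogram(feats, vocab)
print(h.shape, round(float(h.sum()), 6))  # (16,) 1.0
```

Assignment cost grows with the vocabulary size, which is exactly why a compact vocabulary is more efficient to index with, at some cost in discriminative power.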

15.
Application of a deep-network-based learnable receptive field algorithm to image classification   (Cited by: 1; self-citations: 0; by others: 1)
As a basic task in image retrieval, image organization, and robot vision, image classification has received wide attention in computer vision and machine learning. Deep-learning-based models for object recognition and image classification have likewise attracted great interest in the field. This paper proposes an algorithm that replaces the scale-invariant feature transform (SIFT) and histogram of oriented gradients (HOG) descriptors: using a deep hierarchical structure, effective image representations are learned layer by layer, with features learned directly from raw pixels. The method uses K-singular value decomposition (K-SVD) for dictionary training and orthogonal matching pursuit (OMP) for encoding. In addition, a scheme is adopted that jointly learns the classifier and the receptive fields used for pooling. Experimental results show that the algorithm achieves better classification performance on object (Oxford Flowers) and event (UIUC-Sports) image classification benchmarks.
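The OMP encoding step mentioned above can be sketched in a few lines of NumPy. The dictionary here is random and the sparsity level illustrative; the paper trains the dictionary with K-SVD rather than sampling it:

```python
import numpy as np

def omp(D, x, n_nonzero=3):
    """Orthogonal matching pursuit: encode x as a sparse combination
    of dictionary atoms (columns of D). A minimal sketch of the OMP
    encoding paired with K-SVD in the abstract."""
    residual = x.copy()
    support = []
    for _ in range(n_nonzero):
        # pick the atom most correlated with the current residual
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx not in support:
            support.append(idx)
        # re-fit coefficients on the whole support by least squares
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    code = np.zeros(D.shape[1])
    code[support] = coef
    return code

rng = np.random.default_rng(2)
D = rng.standard_normal((16, 32))
D /= np.linalg.norm(D, axis=0)          # unit-norm atoms
x = 2.0 * D[:, 5] - 1.0 * D[:, 9]       # a genuinely 2-sparse signal
code = omp(D, x, n_nonzero=2)
print(np.count_nonzero(code))
```

The resulting sparse codes, pooled over learned receptive fields, replace hand-crafted SIFT/HOG descriptors in the pipeline the abstract describes.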

16.
Visual categorization problems, such as object classification or action recognition, are increasingly often approached using a detection strategy: a classifier function is first applied to candidate subwindows of the image or the video, and then the maximum classifier score is used for class decision. Traditionally, the subwindow classifiers are trained on a large collection of examples manually annotated with masks or bounding boxes. The reliance on time-consuming human labeling effectively limits the application of these methods to problems involving very few categories. Furthermore, the human selection of the masks introduces arbitrary biases (e.g., in terms of window size and location) which may be suboptimal for classification. We propose a novel method for learning a discriminative subwindow classifier from examples annotated with binary labels indicating the presence of an object or action of interest, but not its location. During training, our approach simultaneously localizes the instances of the positive class and learns a subwindow SVM to recognize them. We extend our method to classification of time series by presenting an algorithm that localizes the most discriminative set of temporal segments in the signal. We evaluate our approach on several datasets for object and action recognition and show that it achieves results similar and in many cases superior to those obtained with full supervision.
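The detection-style decision rule described above, scoring candidate subwindows and keeping the maximum, can be sketched as an exhaustive scan. The fixed window size and toy score map are illustrative; the paper learns the subwindow classifier itself, while this shows only the max-score decision:

```python
import numpy as np

def max_subwindow_score(score_map, win=3):
    """Slide a win x win window over a per-pixel score map and return
    the best window sum and its top-left position: the maximum-score
    decision rule used by detection-style categorization."""
    h, w = score_map.shape
    best, best_pos = -np.inf, None
    for i in range(h - win + 1):
        for j in range(w - win + 1):
            s = score_map[i:i + win, j:j + win].sum()
            if s > best:
                best, best_pos = s, (i, j)
    return best, best_pos

m = np.zeros((6, 6))
m[2:5, 2:5] = 1.0          # a 3x3 "object" of positive evidence
score, pos = max_subwindow_score(m)
print(score, pos)  # 9.0 (2, 2)
```

During training, the same maximization localizes the positive instances, so image-level labels suffice and no bounding boxes are needed.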

17.
The explosion of the Internet provides us with a tremendous resource of images shared online. It also confronts vision researchers with the problem of finding effective methods to navigate the vast amount of visual information. Semantic image understanding plays a vital role towards solving this problem. One important task in image understanding is object recognition, in particular, generic object categorization. Critical to this problem are the issues of learning and dataset. Abundant data helps to train a robust recognition system, while a good object classifier can help to collect a large amount of images. This paper presents a novel object recognition algorithm that performs automatic dataset collecting and incremental model learning simultaneously. The goal of this work is to use the tremendous resources of the web to learn robust object category models for detecting and searching for objects in real-world cluttered scenes. Humans continuously update their knowledge of objects when new examples are observed. Our framework emulates this human learning process by iteratively accumulating model knowledge and image examples. We adapt a non-parametric latent topic model and propose an incremental learning framework. Our algorithm is capable of automatically collecting much larger object category datasets for 22 randomly selected classes from the Caltech 101 dataset. Furthermore, our system offers not only more images in each object category but also a robust object category model and meaningful image annotation. Our experiments show that OPTIMOL is capable of collecting image datasets that are superior to the well known manually collected object datasets Caltech 101 and LabelMe.

18.
19.
Text categorization assigns predefined categories to either electronically available texts or those resulting from document image analysis. A generic system for text categorization is presented which is based on statistical analysis of representative text corpora. Significant features are automatically derived from training texts by selecting substrings from actual word forms and applying statistical information and general linguistic knowledge. The dimension of the feature vectors is then reduced by linear transformation, keeping the essential information. The classification is a minimum least-squares approach based on polynomials. The described system can be efficiently adapted to new domains or different languages. In application, the adapted text categorizers are reliable, fast, and completely automatic. Two example categorization tasks achieve recognition scores of approximately 80% and are very robust against recognition or typing errors.
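The pipeline above, substring features followed by a minimum least-squares classifier, can be sketched on a toy corpus. The character-trigram features, the tiny made-up corpus, and the plain linear fit are illustrative stand-ins for the paper's richer feature selection and polynomial classifier:

```python
import numpy as np

def trigrams(text):
    """Substring features: overlapping character trigrams."""
    return [text[i:i + 3] for i in range(len(text) - 2)]

# a tiny illustrative corpus with two categories (0 and 1)
train = [("stack overflow error", 0), ("kernel panic reboot", 1),
         ("null pointer error", 0), ("driver panic crash", 1)]
vocab = sorted({g for t, _ in train for g in trigrams(t)})
index = {g: i for i, g in enumerate(vocab)}

def featurize(text):
    v = np.zeros(len(vocab))
    for g in trigrams(text):
        if g in index:
            v[index[g]] += 1
    return v

X = np.array([featurize(t) for t, _ in train])
Y = np.eye(2)[[y for _, y in train]]        # one-hot category targets
W, *_ = np.linalg.lstsq(X, Y, rcond=None)   # minimum least-squares fit
pred = featurize("segfault panic") @ W
print(int(pred.argmax()))  # 1
```

Because the features are substrings rather than whole words, a misrecognized or mistyped character corrupts only a few trigrams, which is the source of the robustness the abstract reports.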

20.
We present an efficient method for estimating the pose of a three-dimensional object. Its implementation is embedded in a computer vision system which is motivated by and based on cognitive principles concerning the visual perception of three-dimensional objects. Viewpoint-invariant object recognition has been subject to controversial discussions for a long time. An important point of discussion is the nature of internal object representations. Behavioral studies with primates, which are summarized in this article, support the model of view-based object representations. We designed our computer vision system according to these findings and demonstrate that very precise estimations of the poses of real-world objects are possible even if only a small number of sample views of an object are available. The system can be used for a variety of applications.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号