Similar literature
1.
We present an efficient method for learning part-based object class models from unsegmented images represented as sets of salient features. A model includes parts’ appearance, as well as location and scale relations between parts. The object class is generatively modeled using a simple Bayesian network with a central hidden node containing location and scale information, and nodes describing object parts. The model’s parameters, however, are optimized to reduce a loss function of the training error, as in discriminative methods. We show how boosting techniques can be extended to optimize the proposed relational model, with complexity linear in the number of parts and the number of features per image. This efficiency allows our method to learn relational models with many parts and features. The method has an advantage over purely generative and purely discriminative approaches for learning from sets of salient features, since generative methods often use a small number of parts and features, while discriminative methods tend to ignore geometrical relations between parts. Experimental results are described, using several benchmark data sets and three sets of newly collected data, showing the relative merits of our method in recognition and localization tasks.

2.
The discriminative power of a feature affects both the convergence rate when training an object detector and its running speed at evaluation time. In this paper, a novel distribution-based discriminative feature is proposed to distinguish objects of rigid object categories from background. It combines the strength of the local binary pattern (LBP), which specializes in encoding local structures, with statistical information about the distribution of the training data, which is used to obtain an optimal separating hyperplane. The proposed feature remains simple to compute while offering strong discriminative ability to distinguish objects from background patches. Three LBP-based features are extended to adaptive projection versions, which are more discriminative than the original versions. The final detector is built with asymmetric Gentle AdaBoost organized in a nested cascade structure. The proposed features are evaluated on two different object categories: frontal human faces and side-view cars. Experimental results demonstrate that the proposed features are more discriminative than traditional Haar-like features and multi-block LBP (MBLBP) features. Furthermore, they are also robust to monotonic variations of illumination.
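As a rough illustration of the basic LBP operator that the abstract above builds on, the following sketch computes the standard 8-neighbour code for a 3x3 patch and a 256-bin histogram over an image. It is not the adaptive projection feature proposed in the paper; the function names, the clockwise neighbour order, and the `>=` comparison convention are assumptions made for this example.

```python
def lbp_code(patch):
    """Return the 8-bit basic LBP code for the centre pixel of a 3x3 patch."""
    center = patch[1][1]
    # Neighbours visited clockwise from the top-left corner
    # (an assumed, arbitrary but fixed, bit order).
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        # A neighbour at least as bright as the centre sets its bit.
        if patch[r][c] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Histogram of LBP codes over all interior pixels of a 2D list image."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            hist[lbp_code(patch)] += 1
    return hist
```

Because the code depends only on the ordering of each neighbour relative to the centre, any strictly increasing transformation of the intensities leaves it unchanged, which is the usual explanation for LBP's robustness to monotonic illumination changes.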

3.
This paper details a new approach for learning a discriminative model of object classes, incorporating texture, layout, and context information efficiently. The learned model is used for automatic visual understanding and semantic segmentation of photographs. Our discriminative model exploits texture-layout filters, novel features based on textons, which jointly model patterns of texture and their spatial layout. Unary classification and feature selection are achieved using shared boosting to give an efficient classifier which can be applied to a large number of classes. Accurate image segmentation is achieved by incorporating the unary classifier in a conditional random field, which (i) captures the spatial interactions between class labels of neighboring pixels, and (ii) improves the segmentation of specific object instances. Efficient training of the model on large datasets is achieved by exploiting both random feature selection and piecewise training methods. High classification and segmentation accuracy is demonstrated on four varied databases: (i) the MSRC 21-class database containing photographs of real objects viewed under general lighting conditions, poses and viewpoints, (ii) the 7-class Corel subset and (iii) the 7-class Sowerby database used in He et al. (Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 695–702, June 2004), and (iv) a set of video sequences of television shows. The proposed algorithm gives competitive and visually pleasing results for objects that are highly textured (grass, trees, etc.), highly structured (cars, faces, bicycles, airplanes, etc.), and even articulated (body, cow, etc.). J. Shotton is now working at Toshiba Corporate Research & Development Center, Kawasaki, Japan.

4.
For facial expression recognition, we selected three images: (i) just before speaking, (ii) speaking the first vowel, and (iii) speaking the last vowel in an utterance. In this study, as a pre-processing module, we added a judgment function to distinguish a front-view face for facial expression recognition. A frame of the front-view face in a dynamic image is selected by estimating the face direction. The judgment function measures four feature parameters using thermal image processing, and selects the thermal images that have all the values of the feature parameters within limited ranges which were decided on the basis of training thermal images of front-view faces. As an initial investigation, we adopted the utterance of the Japanese name “Taro,” which is semantically neutral. The mean judgment accuracy of the front-view face was 99.5% for six subjects who changed their face direction freely. Using the proposed method, the facial expressions of six subjects were distinguishable with 84.0% accuracy when they exhibited one of the intentional facial expressions of “angry,” “happy,” “neutral,” “sad,” and “surprised.” We expect the proposed method to be applicable for recognizing facial expressions in daily conversation.

5.
In this article, we propose a new video object retrieval system. Our approach is based on a Spatio-Temporal data representation, a dedicated kernel design and a statistical learning toolbox for video object recognition and retrieval. Using state-of-the-art video object detection algorithms (for faces or cars, for example), we segment video object tracks from video shots of real movies. We then extract, from these tracks, sets of spatio-temporally coherent features that we call Spatio-Temporal Tubes. To compare these complex tube objects, we design a Spatio-Temporal Tube Kernel (STTK) function. Based on this kernel similarity, we present both supervised and active learning strategies embedded in a Support Vector Machine framework. Additionally, we propose a multi-class classification framework dealing with unbalanced data. Our approach is successfully evaluated on two real movie databases, the French movie “L’esquive” and episodes from the “Buffy, the Vampire Slayer” TV series. Our method is also tested on a car database (from real movies) and shows promising results for the car identification task.

6.
In the process of extending the UML metamodel for a specific domain, the metamodel specifier frequently introduces some metaassociations at MOF level M2 with the aim that they induce some specific associations at MOF level M1. For instance, if a metamodel for software process modelling states that a “Role” is responsible for an “Artifact”, we can interpret that its specifier intended to model two aspects: (1) the implications of this metaassociation at level M1 (e.g., the specific instance of Role “TestEngineer” is responsible for the specific instance of Artifact “TestPlans”); and (2) the implications of this metaassociation at level M0 (e.g., “John Doe” is the responsible test engineer for elaborating the test plans for the package “Foo”). Unfortunately, the second aspect is often not enforced by the metamodel and, as a result, the models which are defined as its instances may not incorporate it. This problem, a consequence of the so-called “shallow instantiation” in Atkinson and Kühne (Procs. UML’01, LNCS 2185, Springer, 2001), prevents these models from being accurate enough in the sense that they do not express all the information intended by the metamodel specifier and consequently do not distinguish metaassociations that induce associations at M1 from those that do not. In this article we introduce the concept of induced association that may come up when an extension of the UML metamodel is developed. The implications that this concept has both in the extended metamodel and in its instances are discussed. We also present a methodology to enforce that M1 models incorporate the associations induced by the metamodel of which they are instances. Next, as an example of application, we present a quality metamodel for software artifacts which makes intensive use of induced associations. Finally, we introduce a software tool to assist the development of quality models as correct instantiations of the metamodel, assuring the proper application of the induced associations as required by the metamodel.

7.
Classifier-based acronym extraction for business documents   Cited by: 1 (self-citations: 1, citations by others: 0)
Acronym extraction for business documents has been neglected in favor of acronym extraction for biomedical documents. Although there are overlapping challenges, the semi-structured and non-predictive nature of business documents hinders the effectiveness of the extraction methods used on biomedical documents, which fail to deliver the expected performance. A classifier-based extraction subsystem is presented as part of the wider project, Binocle, for the analysis of French business corpora. Explicit and implicit acronym presentation cases are identified using textual and syntactical hints. Among the 7 features extracted from each candidate instance, we introduce “similarity” features, which compare a candidate’s characteristics with average length-related values calculated from a generic acronym repository. Commonly used rules for evaluating the candidate (matching first letters, ordered instances, etc.) are scored and aggregated in a single composite feature that permits a flexible classification. One hundred and thirty-eight French business documents from 14 public organizations were used for the training and evaluation corpora, yielding a recall of 90.9% at a precision level of 89.1% for a search space size of 3 sentences.

8.
This paper investigates the prospects of Rodney Brooks’ proposal for AI without representation. It turns out that the supposedly characteristic features of “new AI” (embodiment, situatedness, absence of reasoning, and absence of representation) are all present in conventional systems: “New AI” is just like old AI. Brooks’ proposal boils down to the architectural rejection of central control in intelligent agents, which, however, turns out to be crucial. Some more recent cognitive science suggests that we might do well to dispose of the image of intelligent agents as central representation processors. If this paradigm shift is achieved, Brooks’ proposal for cognition without representation appears promising for full-blown intelligent agents, though not for conscious agents.

9.
Erik Hollnagel’s body of work in the past three decades has molded much of the current research approach to system safety, particularly notions of “error”. Hollnagel regards “error” as a dead-end and avoids using the term. This position is consistent with Rasmussen’s claim that there is no scientifically stable category of human performance that can be described as “error”. While this systems view is undoubtedly correct, “error” persists. Organizations, especially formal business, political, and regulatory structures, use “error” as if it were a stable category of human performance. They apply the term to performances associated with undesired outcomes, tabulate occurrences of “error”, and justify control and sanctions through “error”. Although a compelling argument can be made for Hollnagel’s view, it is clear that notions of “error” are socially and organizationally productive. The persistence of “error” in management and regulatory circles reflects its value as a means for social control.

10.
Learning models for detecting and classifying object categories is a challenging problem in machine vision. While discriminative approaches to learning and classification have, in principle, superior performance, generative approaches provide many useful features, one of which is the ability to naturally establish explicit correspondence between model components and scene features; this, in turn, allows for the handling of missing data and unsupervised learning in clutter. We explore a hybrid generative/discriminative approach, using ‘Fisher Kernels’ (Jaakkola, T., et al. in Advances in neural information processing systems, Vol. 11, pp. 487–493, 1999), which retains most of the desirable properties of generative methods, while increasing the classification performance through a discriminative setting. Our experiments, conducted on a number of popular benchmarks, show strong performance improvements over the corresponding generative approach. In addition, we demonstrate how this hybrid learning paradigm can be extended to address several outstanding challenges within computer vision, including how to combine multiple object models and how to learn with unlabeled data.

11.
Low-level cues in an image not only allow one to infer higher-level information like the presence of an object, but the inverse is also true. Category-level object recognition has now reached a level of maturity and accuracy that allows its output to be successfully fed back to other processes. This is what we refer to as cognitive feedback. In this paper, we study one particular form of cognitive feedback, where the ability to recognize objects of a given category is exploited to infer different kinds of meta-data annotations for images of previously unseen object instances, in particular information on 3D shape. Meta-data can be discrete, real- or vector-valued. Our approach builds on the Implicit Shape Model of Leibe and Schiele [B. Leibe, A. Leonardis, B. Schiele, Robust object detection with interleaved categorization and segmentation, International Journal of Computer Vision 77 (1–3) (2008) 259–289], and extends it to transfer annotations from training images to test images. We focus on the inference of approximate 3D shape information about objects in a single 2D image. In experiments, we illustrate how our method can infer depth maps, surface normals and part labels for previously unseen object instances.

12.
In this paper, a visual object tracking method is proposed based on sparse 2-dimensional discrete cosine transform (2D DCT) coefficients as discriminative features. To select the discriminative DCT coefficients, we give two propositions. The propositions select the features based on the estimated mean of the feature distributions in each frame. Some intermediate tracking instances are obtained by (a) computing feature similarity using a kernel, (b) finding the maximum classifier score computed using a ratio classifier, and (c) combinations of both. Another intermediate tracking instance is obtained using an incremental subspace learning method. The final tracked instance is selected from among the intermediate instances by a discriminative linear classifier learned in each frame. The linear classifier is updated in each frame using some of the intermediate tracked instances. The proposed method achieves better tracking performance than state-of-the-art video trackers on a dataset of 50 challenging video sequences.

13.
Multi-Class Segmentation with Relative Location Prior   Cited by: 2 (self-citations: 0, citations by others: 2)
Multi-class image segmentation has made significant advances in recent years through the combination of local and global features. One important type of global feature is that of inter-class spatial relationships. For example, identifying “tree” pixels indicates that pixels above and to the sides are more likely to be “sky” whereas pixels below are more likely to be “grass.” Incorporating such global information across the entire image and between all classes is a computational challenge as it is image-dependent, and hence, cannot be precomputed. In this work we propose a method for capturing global information from inter-class spatial relationships and encoding it as a local feature. We employ a two-stage classification process to label all image pixels. First, we generate predictions which are used to compute a local relative location feature from learned relative location maps. In the second stage, we combine this with appearance-based features to provide a final segmentation. We compare our results to recent published results on several multi-class image segmentation databases and show that the incorporation of relative location information allows us to significantly outperform the current state-of-the-art.

14.
This article presents an object-oriented mechanism to achieve group communication in large scale grids. Group communication is a crucial feature for high-performance and grid computing. While previous work on collective communications imposed the use of dedicated interfaces, we propose a scheme where one can initiate group communications using the standard public methods of the class by instantiating objects through a special object factory. The object factory utilizes casting and introspection to construct a “parallel processing enhanced” implementation of the object which matches the original class’ interface. This mechanism is then extended in an evolution of the classical SPMD programming paradigm into the domain of clusters and grids named “Object-Oriented SPMD”. OOSPMD provides interprocess (inter-object) communications via transparent remote method invocations rather than custom interfaces. Such typed group communication constitutes a basis for improvement of component models allowing advanced composition of parallel building blocks. The typed group pattern leads to an interesting, uniform, and complete model for programming applications intended to be run on clusters and grids.

15.
We constructed a universal system for object recognition, which uses preliminary training based on sample images of “objects” and “non-objects.” The images are represented by separate points in a multi-dimensional feature space. A recognition system which uses the feature space obtained on the basis of lateral-inhibition type functions is described. The article is published in the original.

16.
We introduce a segmentation-based detection and top-down figure-ground delineation algorithm. Unlike common methods which use appearance for detection, our method relies primarily on the shape of objects as is reflected by their bottom-up segmentation. Our algorithm receives as input an image, along with its bottom-up hierarchical segmentation. The shape of each segment is then described both by its significant boundary sections and by regional, dense orientation information derived from the segment’s shape using the Poisson equation. Our method then examines multiple, overlapping segmentation hypotheses, using their shape and color, in an attempt to find a “coherent whole,” i.e., a collection of segments that consistently vote for an object at a single location in the image. Once an object is detected, we propose a novel pixel-level top-down figure-ground segmentation by a “competitive coverage” process to accurately delineate the boundaries of the object. In this process, given a particular detection hypothesis, we let the voting segments compete for interpreting (covering) each of the semantic parts of an object. Incorporating competition in the process allows us to resolve ambiguities that arise when two different regions are matched to the same object part and to discard nearby false regions that participated in the voting process. We provide quantitative and qualitative experimental results on challenging datasets. These experiments demonstrate that our method can accurately detect and segment objects with complex shapes, obtaining results comparable to those of existing state-of-the-art methods. Moreover, our method allows us to simultaneously detect multiple instances of class objects in images and to cope with challenging types of occlusions, such as occlusions by a bar of varying size or by another object of the same class, that are difficult to handle with other existing class-specific top-down segmentation methods.

17.
We have previously developed a method for the recognition of the facial expression of a speaker. For facial expression recognition, we previously selected three images: (i) just before speaking, (ii) speaking the first vowel, and (iii) speaking the last vowel in an utterance. By using the speech recognition system named Julius, thermal static images are saved at the timed positions of just before speaking, and when just speaking the phonemes of the first and last vowels. To implement our method, we recorded three subjects who spoke 25 Japanese first names which provided all combinations of the first and last vowels. These recordings were used to prepare first the training data and then the test data. Julius sometimes makes a mistake in recognizing the first and/or last vowel(s). For example, /a/ for the first vowel is sometimes misrecognized as /i/. In the training data, we corrected this misrecognition. However, the correction cannot be carried out in the test data. In the implementation of our method, the facial expressions of the three subjects were distinguished with a mean accuracy of 79.8% when they exhibited one of the intentional facial expressions of “angry,” “happy,” “neutral,” “sad,” and “surprised.” The mean accuracy of the speech recognition of vowels by Julius was 84.1%.

18.
In this paper we propose an object-triggered human memory augmentation system named “Ubiquitous Memories” that enables a user to directly associate his/her experience data with physical objects by using a “touching” operation. A user conceptually encloses his/her experiences gathered through sense organs into physical objects by simply touching an object. The user can also disclose and re-experience for himself/herself the experiences accumulated in an object by the same operation. We implemented a prototype system composed basically of a radio frequency identification (RFID) device. RFID tags are also attached to the physical objects. We conducted two experiments. The first experiment confirms that the “encoding specificity principle,” which is well known in the research field of psychology, carries over to the Ubiquitous Memories system. The second experiment aims to clarify the system’s characteristics by comparing the system with other memory externalization strategies. The results show the Ubiquitous Memories system is effective for supporting memorization and recollection of contextual events.

19.
20.
Edges are useful features for structural image analysis, but the output of standard edge detectors must be thresholded to remove the many spurious edges. This paper describes experiments with both new and old techniques for: 1. determining edge saliency (as alternatives to gradient magnitude) and 2. automatically determining appropriate edge threshold values. Some examples of edge saliency measures are lifetime, wiggliness, spatial width, and phase congruency. Examples of thresholding techniques use: the Rayleigh distribution to model the edge gradient magnitude histogram, relaxation labelling, and an edge curve “length”–“average gradient magnitude” feature space.
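One way the Rayleigh-based thresholding mentioned above could work is sketched below. It assumes, purely for illustration, that the gradient magnitudes of noise pixels follow a Rayleigh distribution, fits the scale parameter by maximum likelihood, and sets the threshold at a chosen quantile. The function names and the `noise_fraction` parameter are hypothetical and are not taken from the paper, which may fit the histogram differently.

```python
import math


def rayleigh_sigma(magnitudes):
    """Maximum-likelihood Rayleigh scale: sigma^2 = sum(x^2) / (2 n)."""
    n = len(magnitudes)
    return math.sqrt(sum(m * m for m in magnitudes) / (2 * n))


def rayleigh_threshold(magnitudes, noise_fraction=0.95):
    """Gradient magnitude below which `noise_fraction` of the fitted
    Rayleigh noise model lies (hypothetical parameter, 0 < value < 1)."""
    sigma = rayleigh_sigma(magnitudes)
    # Invert the Rayleigh CDF F(x) = 1 - exp(-x^2 / (2 sigma^2)).
    return sigma * math.sqrt(-2.0 * math.log(1.0 - noise_fraction))
```

Edges whose gradient magnitude exceeds the returned value would then be kept as unlikely to be noise under this model. In practice the fit is often restricted to the low end of the histogram, since true edges inflate the estimate of sigma.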
