Similar literature
1.
Mobile robotics has achieved notable progress; however, to increase the complexity of the tasks that mobile robots can perform in natural environments, we need to provide them with a greater semantic understanding of their surroundings. In particular, identifying indoor scenes, such as an office or a kitchen, is a highly valuable perceptual ability for an indoor mobile robot, and in this paper we propose a new technique to achieve this goal. As a distinguishing feature, we use common objects, such as doors or furniture, as a key intermediate representation to recognize indoor scenes. We frame our method as a generative probabilistic hierarchical model, where we use object category classifiers to associate low-level visual features with objects, and contextual relations to associate objects with scenes. The inherent semantic interpretation of common objects allows us to use rich sources of online data to populate the probabilistic terms of our model. In contrast to alternative computer vision based methods, we boost performance by exploiting the embedded and dynamic nature of a mobile robot. In particular, we increase detection accuracy and efficiency by using a 3D range sensor that allows us to implement a focus of attention mechanism based on geometric and structural information. Furthermore, we use concepts from information theory to propose an adaptive scheme that limits computational load by selectively guiding the search for informative objects. The operation of this scheme is facilitated by the dynamic nature of a mobile robot that is constantly changing its field of view. We test our approach using real data captured by a mobile robot navigating in office and home environments. Our results indicate that the proposed approach outperforms several state-of-the-art techniques for scene recognition.

2.
3.
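To meet the need for human action recognition in complex environments, a two-stream network architecture based on scene understanding is proposed. Scene information is added to the human action recognition network as auxiliary information to improve recognition accuracy. Different ways of fusing the scene recognition network with the human action recognition network are studied to determine the best-performing architecture. By analyzing the effect of different parameters on recognition accuracy, all structural parameters of the two-stream network are determined, and the network is designed and trained. Experiments on public datasets such as UCF50 and UCF101 achieve accuracies of 95% and 93%, respectively, higher than those of typical recognition networks. Adding the same scene information to several other typical recognition networks is also studied, and the experimental results show that this method effectively improves recognition accuracy.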

4.
5.
Spatial pyramids have been successfully applied to incorporating spatial information into bag-of-words based image representation. However, a major drawback is that they lead to high dimensional image representations. In this paper, we present a novel framework for obtaining a compact pyramid representation. First, we investigate the usage of the divisive information theoretic feature clustering (DITC) algorithm in creating a compact pyramid representation. In many cases this method allows us to reduce the size of a high dimensional pyramid representation by up to an order of magnitude with little or no loss in accuracy. Furthermore, comparison to clustering based on agglomerative information bottleneck (AIB) shows that our method obtains superior results at significantly lower computational costs. Moreover, we investigate the optimal combination of multiple features in the context of our compact pyramid representation. Finally, experiments show that the method can obtain state-of-the-art results on several challenging data sets.
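To illustrate why spatial pyramids lead to high dimensional representations, the following is a minimal sketch of the standard pyramid construction only; it does not implement the DITC-based compression described in the abstract. The function name, the input layout (normalized coordinates plus a visual-word index), and the grid scheme of 1x1, 2x2, 4x4 cells are assumptions made for illustration.

```python
def spatial_pyramid_histogram(points, vocab_size, levels=2):
    """Concatenate bag-of-words histograms over 1x1, 2x2, 4x4, ... grids.

    points: iterable of (x, y, word), with x and y normalized to [0, 1]
    and word in range(vocab_size). This layout is a hypothetical choice.
    """
    points = list(points)
    feature = []
    for level in range(levels + 1):
        cells = 2 ** level  # grid is cells x cells at this level
        hist = [0] * (cells * cells * vocab_size)
        for x, y, word in points:
            # Clamp so that a coordinate of exactly 1.0 falls in the last cell.
            col = min(int(x * cells), cells - 1)
            row = min(int(y * cells), cells - 1)
            hist[(row * cells + col) * vocab_size + word] += 1
        feature.extend(hist)
    return feature
```

With `levels=2` the output length is 21 times the vocabulary size (1 + 4 + 16 cells), which is the growth in dimensionality that a compact pyramid representation aims to reduce.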

6.
7.
A fast technique for recursive scene matching using pyramids   Total citations: 3, self-citations: 0, citations by others: 3
A fast correlation algorithm for scene matching using a pyramidal image representation is introduced. A mathematical model of the image registration process, based on the pyramidal representation of a separable Markov random field, is considered in order to evaluate the threshold sequence for the algorithm. Experimental results are presented for matching images, both free of noise and corrupted by noise. Theoretical and experimental results given in the paper show that computational efficiency in scene matching can be improved by three orders of magnitude compared with the traditional correlation technique.

8.
This paper proposes a novel method based on Spectral Regression (SR) for efficient scene recognition. First, a new SR approach, called Extended Spectral Regression (ESR), is proposed to perform manifold learning on a huge number of data samples. Then, an efficient Bag-of-Words (BOW) based method is developed which employs ESR to encapsulate local visual features with their semantic, spatial, scale, and orientation information for scene recognition. In many applications, such as image classification and multimedia analysis, there are a huge number of low-level feature samples in a training set. This prohibits direct application of SR to perform manifold learning on such a dataset. In ESR, we first group the samples into tiny clusters, and then devise an approach to reduce the size of the similarity matrix for graph learning. In this way, subspace learning on the graph Laplacian for a vast dataset becomes computationally feasible on a personal computer. In the ESR-based scene recognition, we first propose an enhanced low-level feature representation which combines the scale, orientation, spatial position, and local appearance of a local feature. Then, ESR is applied to embed the enhanced low-level image features. The ESR-based feature embedding not only generates a low-dimensional feature representation but also integrates various aspects of low-level features into the compact representation. The bag-of-words is then generated from the embedded features for image classification. Comparative experiments on open benchmark datasets for scene recognition demonstrate that the proposed method outperforms baseline approaches. It is suitable for real-time applications on mobile platforms, e.g. tablets and smartphones.

9.
10.
Dimensionality reduction with adaptive graph   Total citations: 1, self-citations: 1, citations by others: 0
Graph-based dimensionality reduction (DR) methods have been applied successfully in many practical problems, such as face recognition, where graphs play a crucial role in modeling the data distribution or structure. However, the ideal graph is, in practice, difficult to discover. Usually, one needs to construct the graph empirically according to various motivations, priors, or assumptions; this is independent of the subsequent DR mapping calculation. Unlike previous works, in this paper we attempt to learn a graph closely linked with the DR process, and propose an algorithm called dimensionality reduction with adaptive graph (DRAG), whose idea is to learn, while seeking the projection matrix, a graph in the neighborhood of a pre-specified one. Moreover, the pre-specified graph is treated as a noisy observation of the ideal one, and the squared Frobenius divergence is used to measure their difference in the objective function. As a result, we obtain an elegant graph update formula which naturally fuses the original and transformed data information. In particular, the optimal graph is shown to be a weighted sum of the pre-specified graph in the original space and a new graph depending on the transformed space. Empirical results on several face datasets demonstrate the effectiveness of the proposed algorithm.

11.
Detecting and recognizing text in natural images are quite challenging and have received much attention from the computer vision community in recent years. In this paper, we propose a robust end-to-end scene text recognition method, which utilizes tree-structured character models and normalized pictorial structured word models. For each category of characters, we build a part-based tree-structured model (TSM) so as to make use of the character-specific structure information as well as the local appearance information. The TSM can detect each part of the character and recognize its unique structure as well, seamlessly combining character detection and recognition. As the TSMs can accurately detect characters against complex backgrounds, for text localization we apply the TSMs for all characters to the coarse text detection regions to eliminate false positives and to search for possibly missing characters. For word recognition, we propose a normalized pictorial structure (PS) framework to deal with the bias caused by words of different lengths. Experimental results on a range of challenging public datasets (ICDAR 2003, ICDAR 2011, SVT) demonstrate that the proposed method outperforms state-of-the-art methods for both text localization and word recognition.

12.
A scene analysis system for the recognition and inspection of overlapping workpieces in visually noisy scenes is described. It consists of a preprocessing algorithm based on an edge-following operator and a model-based analysis algorithm. In the preprocessing stage, which is not described in detail, features such as corners, straight lines, circles and circular arcs are extracted and described by a few parameters. In the analysis stage, the pattern features extracted by the preprocessing algorithm are used to synthesize, in model-guided fashion, a prototype of the workpiece, which is continuously checked against the model. A similarity measure indicates the match between model and scene. Besides topographical features, the analysis makes use of grey levels, textural measures and values representing colors. Results obtained with different, partly occluded workpieces are given.

13.
Dimensionality reduction methods (DRs) have commonly been used as a principled way to understand high-dimensional data such as face images. In this paper, we propose a new unsupervised DR method called sparsity preserving projections (SPP). Unlike many existing techniques such as locality preserving projections (LPP) and neighborhood preserving embedding (NPE), where local neighborhood information is preserved during the DR procedure, SPP aims to preserve the sparse reconstructive relationship of the data, which is achieved by minimizing an L1 regularization-related objective function. The obtained projections are invariant to rotations, rescalings and translations of the data, and more importantly, they contain natural discriminating information even if no class labels are provided. Moreover, SPP chooses its neighborhood automatically and hence can be more conveniently used in practice compared to LPP and NPE. The feasibility and effectiveness of the proposed method are verified on three popular face databases (Yale, AR and Extended Yale B) with promising results.

14.
15.
A conventional automatic speech recognizer does not perform well in the presence of multiple sound sources, while human listeners are able to segregate and recognize a signal of interest through auditory scene analysis. We present a computational auditory scene analysis system for separating and recognizing target speech in the presence of competing speech or noise. We estimate, in two stages, the ideal binary time–frequency (T–F) mask which retains the mixture in a local T–F unit if and only if the target is stronger than the interference within the unit. In the first stage, we use harmonicity to segregate the voiced portions of individual sources in each time frame based on multipitch tracking. Additionally, unvoiced portions are segmented based on an onset/offset analysis. In the second stage, speaker characteristics are used to group the T–F units across time frames. The resulting masks are used in an uncertainty decoding framework for automatic speech recognition. We evaluate our system on a speech separation challenge and show that our system yields substantial improvement over the baseline performance.
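The definition of the ideal binary mask given above can be sketched in a few lines. This is only an illustration of the mask definition when the target and interference energies are known separately; the abstract's system has to estimate the mask from the mixture, which this sketch does not attempt. The function names and the nested-list layout (frames by frequency channels) are assumptions.

```python
def ideal_binary_mask(target_energy, interference_energy):
    """Return 1 for each T-F unit where the target is strictly stronger
    than the interference, and 0 otherwise.

    Both inputs are nested lists indexed as [frame][channel]; this layout
    is a hypothetical choice for the example.
    """
    return [
        [1 if t > i else 0 for t, i in zip(t_row, i_row)]
        for t_row, i_row in zip(target_energy, interference_energy)
    ]


def apply_mask(mixture, mask):
    """Retain the mixture in units where the mask is 1 and zero the rest."""
    return [
        [value * keep for value, keep in zip(v_row, k_row)]
        for v_row, k_row in zip(mixture, mask)
    ]
```

The strict comparison follows the "if and only if the target is stronger" wording, so units with equal energies are discarded.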

16.
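Locality preserving projections (LPP), a dimensionality reduction algorithm, is widely used in machine learning and pattern recognition. For recognition and classification, an improved algorithm called reformation locality preserve projections (RLPP) is proposed; it builds on LPP, incorporates ideas from manifold learning, and constructs attraction vectors. The aims are to make better use of class information while preserving the local features of the sample points, to extract low-dimensional face image information effectively from high-dimensional data, to improve the recognition rate and recognition speed for face images, and to optimize classification to a certain degree. The data sets are classified with an extreme learning machine classifier, and experiments are carried out on standard face databases. The results show that the improved algorithm achieves a higher recognition rate than LPP, the locality preserving average neighborhood margin maximization algorithm, and the robust linear dimensionality reduction algorithm, and that it has strong generalization ability.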

17.
In this paper, an efficient feature extraction method named locally linear discriminant embedding (LLDE) is proposed for face recognition. It is well known that in classical locally linear embedding (LLE) a point can be linearly reconstructed from its neighbors, with the reconstruction weights subject to a sum-to-one constraint. The constrained weights therefore obey an important symmetry: for any particular data point, they are invariant to rotations, rescalings and translations. The latter two are introduced into the proposed method to strengthen the classification ability of the original LLE. Data with different class labels are translated by different vectors, and data belonging to the same class are translated by the same vector. To cluster data with the same label more tightly, they are also rescaled to some extent. After translation and rescaling, the discriminability of the data is improved significantly. The proposed method is compared with some related feature extraction methods such as maximum margin criterion (MMC), as well as other supervised manifold learning-based approaches, for example ensemble unified LLE and linear discriminant analysis (En-ULLELDA) and locally linear discriminant analysis (LLDA). Experimental results on the Yale and CMU PIE face databases convince us that the proposed method provides a better representation of the class information and obtains much higher recognition accuracies.

18.
Face recognition technology is of great significance for applications involving national security and crime prevention. Despite enormous progress in this field, machine-based systems are still far from the goal of matching the versatility and reliability of human face recognition. In this paper, we show that a simple system designed by emulating biological strategies of the human visual system can largely surpass the state-of-the-art performance on uncontrolled face recognition. In particular, the proposed system integrates dual retinal texture and color features for face representation, an incremental robust discriminant model for high level face coding, and a hierarchical cue-fusion method for similarity qualification. We demonstrate the strength of the system on the large-scale face verification task following the evaluation protocol of the Face Recognition Grand Challenge (FRGC) version 2 Experiment 4. The results are surprisingly good: its modules significantly outperform their state-of-the-art counterparts, such as Gabor image representation, local binary patterns, and the enhanced Fisher linear discriminant model. Furthermore, applying the integrated system to the FRGC version 2 Experiment 4, the verification rate at a false acceptance rate of 0.1 percent reaches 93.12 percent.

19.
20.
We present a modular linear discriminant analysis (LDA) approach for face recognition. A set of observers is trained independently on different regions of frontal faces and each observer projects face images to a lower-dimensional subspace. These lower-dimensional subspaces are computed using LDA methods, including a new algorithm that we refer to as direct, weighted LDA or DW-LDA. DW-LDA combines the advantages of two recent LDA enhancements, namely direct LDA (D-LDA) and weighted pairwise Fisher criteria. Each observer performs recognition independently and the results are combined using a simple sum-rule. Experiments compare the proposed approach to other face recognition methods that employ linear dimensionality reduction. These experiments demonstrate that the modular LDA method performs significantly better than other linear subspace methods. The results also show that D-LDA does not necessarily perform better than the well-known approach of principal component analysis followed by LDA. This is an important counterpoint to previously published experiments that used smaller databases. Our experiments also indicate that the new DW-LDA algorithm is an improvement over D-LDA.
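The sum-rule combination of independent observers mentioned above can be sketched as follows. This is a generic illustration of sum-rule fusion, not the paper's implementation: the function name, the per-observer score dictionaries, and the assumption that scores are comparable across observers (for example, normalized similarities) are all hypothetical.

```python
def sum_rule_fusion(observer_scores):
    """Combine per-observer scores by summation and pick the best identity.

    observer_scores: list of dicts, one per observer, each mapping a
    candidate identity to that observer's score (higher is better).
    Returns (best_identity, totals).
    """
    totals = {}
    for scores in observer_scores:
        for identity, score in scores.items():
            totals[identity] = totals.get(identity, 0.0) + score
    best_identity = max(totals, key=totals.get)
    return best_identity, totals
```

For instance, three observers covering different face regions could each report a score per enrolled identity; the identity with the largest summed score is returned even if one observer alone disagrees.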
