Similar Literature (20 results)
1.
Recognizing scene information in images or videos, such as locating the objects and answering "Where am I?", has attracted much attention in the computer vision research field. Many existing scene recognition methods focus on static images and cannot achieve satisfactory results on videos, which contain more complex scene features than images. In this paper, we propose a robust movie scene recognition approach based on panoramic frames and representative feature patches. More specifically, the movie is first efficiently segmented into video shots and scenes. Secondly, we introduce a novel key-frame extraction method using panoramic frames, and a local feature extraction process is applied to obtain the representative feature patches (RFPs) in each video shot. Thirdly, a Latent Dirichlet Allocation (LDA) based recognition model is trained to recognize the scene within each individual video scene clip. The correlations between video clips are considered to enhance the recognition performance. When our proposed approach is applied to recognize scenes in realistic movies, the experimental results show that it achieves satisfactory performance.
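As a rough illustration of the LDA recognition stage described above (not the authors' code), the sketch below assumes each video shot has already been reduced to a histogram of quantized RFP "visual words"; names like shot_word_counts are placeholders and the data are synthetic.

    # Minimal sketch: LDA topic mixtures over visual-word histograms, then a
    # simple classifier over the topic mixtures.
    import numpy as np
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n_shots, vocab_size = 200, 500
    shot_word_counts = rng.integers(0, 5, size=(n_shots, vocab_size))  # RFP word histograms
    scene_labels = rng.integers(0, 4, size=n_shots)                    # 4 scene classes

    lda = LatentDirichletAllocation(n_components=20, random_state=0)
    topic_mix = lda.fit_transform(shot_word_counts)   # per-shot topic distribution

    clf = LogisticRegression(max_iter=1000).fit(topic_mix, scene_labels)
    print("train accuracy:", clf.score(topic_mix, scene_labels))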

2.
Progress in scene understanding requires reasoning about the rich and diverse visual environments that make up our daily experience. To this end, we propose the Scene Understanding database, a nearly exhaustive collection of scenes categorized at the same level of specificity as human discourse. The database contains 908 distinct scene categories and 131,072 images. Given this data with both scene and object labels available, we perform in-depth analysis of co-occurrence statistics and contextual relationships. To better understand this large-scale taxonomy of scene categories, we perform two human experiments: we quantify human scene recognition accuracy, and we measure how typical each image is of its assigned scene category. Next, we perform computational experiments: scene recognition with global image features, indoor versus outdoor classification, and “scene detection,” in which we relax the assumption that one image depicts only one scene category. Finally, we relate human experiments to machine performance and explore the relationship between human and machine recognition errors and the relationship between image “typicality” and machine recognition accuracy.

3.
There has been a growing interest in exploiting contextual information in addition to local features to detect and localize multiple object categories in an image. A context model can rule out some unlikely combinations or locations of objects and guide detectors to produce a semantically coherent interpretation of a scene. However, the performance benefit of context models has been limited because most of the previous methods were tested on data sets with only a few object categories, in which most images contain one or two object categories. In this paper, we introduce a new data set with images that contain many instances of different object categories, and propose an efficient model that captures the contextual information among more than a hundred object categories using a tree structure. Our model incorporates global image features, dependencies between object categories, and outputs of local detectors into one probabilistic framework. We demonstrate that our context model improves object recognition performance and provides a coherent interpretation of a scene, which enables a reliable image querying system by multiple object categories. In addition, our model can be applied to scene understanding tasks that local detectors alone cannot solve, such as detecting objects out of context or querying for the most typical and the least typical scenes in a data set.
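To make the tree-structured context idea concrete, here is a hedged sketch (not the paper's implementation) of learning a Chow-Liu-style dependency tree over object categories from binary presence labels: pairwise mutual information defines edge weights and a maximum spanning tree gives the context tree. The presence matrix is synthetic.

    import numpy as np
    from sklearn.metrics import mutual_info_score
    from scipy.sparse.csgraph import minimum_spanning_tree

    rng = np.random.default_rng(0)
    presence = (rng.random((1000, 10)) < 0.3).astype(int)  # images x object categories

    n = presence.shape[1]
    mi = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            mi[i, j] = mutual_info_score(presence[:, i], presence[:, j])

    # Maximum spanning tree = minimum spanning tree on negated weights.
    tree = minimum_spanning_tree(-mi)
    edges = np.transpose(np.nonzero(tree.toarray()))
    print("context-tree edges:", edges.tolist())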

4.
In the field of visual recognition such as scene categorization, representing an image based on local features (e.g., the bag-of-visual-words (BOVW) model and the bag-of-contextual-visual-words (BOCVW) model) has become one of the most popular and successful approaches. In this paper, we propose a method that uses localized maximum-margin learning to fuse different types of features during BOCVW modeling for eventual scene classification. The proposed method fuses multiple features at the stage when the best contextual visual word is selected to represent a local region (hard assignment) or when the probabilities of the candidate contextual visual words used to represent the unknown region are estimated (soft assignment). The merits of the proposed method are that (1) errors caused by the ambiguity of a single feature when assigning local regions to contextual visual words can be corrected, or the probabilities of the candidate contextual visual words used to represent the region can be estimated more accurately; and (2) it offers a more flexible way of fusing these features by determining the similarity metric locally through localized maximum-margin learning. The proposed method has been evaluated experimentally, and the results indicate its effectiveness.
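The toy sketch below shows the soft-assignment fusion mechanically: two feature types are combined with per-word fusion weights when estimating the probability of each candidate visual word. The fixed weights stand in for what the paper learns with localized maximum-margin learning; all names are illustrative.

    import numpy as np

    def soft_assign(region_feats, word_feats, weights, beta=5.0):
        """region_feats/word_feats: dict feature_name -> array; weights: per-word
        fusion weight for feature 'a' (1 - w goes to feature 'b')."""
        d = (weights * np.linalg.norm(word_feats["a"] - region_feats["a"], axis=1)
             + (1 - weights) * np.linalg.norm(word_feats["b"] - region_feats["b"], axis=1))
        p = np.exp(-beta * d)
        return p / p.sum()  # probability of each candidate visual word

    rng = np.random.default_rng(0)
    words = {"a": rng.random((50, 64)), "b": rng.random((50, 32))}
    region = {"a": rng.random(64), "b": rng.random(32)}
    w = rng.random(50)  # stands in for the locally learned similarity metrics
    print(soft_assign(region, words, w)[:5])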

5.
谢飞, 龚声蓉, 刘纯平, 季怡. 《计算机科学》, 2015, 42(11): 293-298
Visual-word-based human action recognition improves accuracy by adding mid-level semantic information to the features. However, mutual interference between foreground and background during visual word extraction weakens the descriptive power of the visual words. This paper proposes a visual word generation method that combines local and global features. The method first detects the foreground human region with a saliency map, applies the proposed dynamic threshold matrix so that different thresholds are used to detect spatio-temporal interest points within the human region, and computes surrounding 3D-SIFT features to describe local information. On this basis, a histogram-of-optical-flow feature describes the global motion of the action. Local and global features are then fused into visual words via spectral clustering. Experiments show that, compared with popular local-feature-based visual word generation methods, the proposed method improves the recognition rate over the average recognition rate by 6.4% on the simple-background KTH dataset and by 6.5% on the complex-background UCF dataset.
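A minimal sketch of the fusion step follows (not the authors' code): local 3D-SIFT-like descriptors and a global optical-flow histogram are concatenated per interest point and grouped into visual words by spectral clustering. The descriptors here are random placeholders.

    import numpy as np
    from sklearn.cluster import SpectralClustering

    rng = np.random.default_rng(0)
    local_3dsift = rng.random((300, 64))       # one descriptor per interest point
    global_hof = rng.random(9)                 # per-clip optical-flow histogram
    fused = np.hstack([local_3dsift, np.tile(global_hof, (300, 1))])

    words = SpectralClustering(n_clusters=20, affinity="nearest_neighbors",
                               n_neighbors=10, random_state=0).fit_predict(fused)
    print("visual-word ids of first 10 points:", words[:10])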

6.
Scene Recognition Combining Structure and Texture Features
Although scene recognition has made considerable progress in computer vision, it remains a highly challenging problem. Some previous scene recognition methods require manual semantic annotation of the training images in advance, and most are based on the "bag-of-features" model, which must cluster a large number of extracted features at heavy computational and memory cost; moreover, the choice of initial cluster centers and of the number of clusters strongly affects recognition performance. This paper therefore proposes an unsupervised scene recognition method that does not rely on the bag-of-features model. Multiple images at different resolutions are first constructed by subsampling. At each resolution level, structure and texture features are extracted separately: the proposed gradient-orientation histogram descriptor represents the structure of the image, while the responses of a Gabor filter bank and the Schmid filter set represent its texture. Structure and texture are treated as two independent feature channels, which are finally combined and classified with an SVM to recognize the scene automatically. Experiments on the 8-category (Oliva), 13-category (Li Fei-Fei), and 15-category (Lazebnik) scene image databases show that the proposed gradient-orientation histogram descriptor outperforms the classic SIFT descriptor for scene recognition, and that the method combining structure and texture features achieves good recognition results on all three common scene databases.
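A compact sketch of the two-channel idea (illustrative only; the Schmid filter set is omitted and the images are synthetic): a gradient-orientation histogram serves as the structure channel, Gabor filter-bank energies as the texture channel, and an SVM classifies the concatenation.

    import numpy as np
    from skimage.filters import gabor
    from sklearn.svm import SVC

    def structure_channel(img, bins=8):
        gy, gx = np.gradient(img.astype(float))
        ang = np.arctan2(gy, gx)
        hist, _ = np.histogram(ang, bins=bins, range=(-np.pi, np.pi),
                               weights=np.hypot(gx, gy))
        return hist / (hist.sum() + 1e-8)

    def texture_channel(img, freqs=(0.1, 0.2, 0.3)):
        vals = []
        for f in freqs:
            for t in (0, np.pi / 4, np.pi / 2):
                real, imag = gabor(img, frequency=f, theta=t)
                vals.append(np.hypot(real, imag).mean())  # mean Gabor magnitude
        return np.array(vals)

    rng = np.random.default_rng(0)
    imgs = rng.random((40, 32, 32))
    labels = rng.integers(0, 2, size=40)
    feats = np.array([np.hstack([structure_channel(im), texture_channel(im)])
                      for im in imgs])
    print("train acc:", SVC(kernel="rbf").fit(feats, labels).score(feats, labels))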

7.
Classifying Lidar point clouds of large scenes is a complex task that sometimes requires a combination of techniques to obtain the desired result. We propose a deep neural network model based on a multi-dimensional feature matrix and PointNet, and use it to classify Lidar point clouds of large scenes. The method first extracts 3D and 2D neighborhood features of the point cloud, fuses them into a feature matrix, and feeds this local feature matrix into the PointNet framework to extract global features. Finally, it returns a score for each category and outputs the point cloud classification result. We tested our large-scene point cloud classification framework on the public Oakland 3D dataset. Experimental results show an overall classification accuracy of 98.0%, which compares favorably with other point cloud classification frameworks.
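As a hedged sketch of the kind of hand-crafted neighborhood features that could precede the PointNet stage (the paper's exact feature matrix is not reproduced): per-point covariance eigenvalues of the k nearest neighbors yield linearity/planarity/scattering descriptors. The cloud below is random.

    import numpy as np
    from scipy.spatial import cKDTree

    rng = np.random.default_rng(0)
    cloud = rng.random((2000, 3))
    tree = cKDTree(cloud)
    _, idx = tree.query(cloud, k=16)  # 16-NN neighborhoods (includes the point itself)

    feats = np.zeros((len(cloud), 3))
    for i, nbrs in enumerate(idx):
        evals = np.linalg.eigvalsh(np.cov(cloud[nbrs].T))[::-1]  # l1 >= l2 >= l3
        l1, l2, l3 = np.maximum(evals, 1e-12)
        feats[i] = [(l1 - l2) / l1, (l2 - l3) / l1, l3 / l1]  # linearity, planarity, scatter

    print("feature matrix shape:", feats.shape)  # this kind of matrix feeds the network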

8.
The point cloud is a common 3D representation widely applied in CAX engineering due to its simple data representation and rich semantic information. However, its discrete and unordered structure makes it difficult to extract semantic information from point clouds and to apply standard operators to them. In this paper, to enhance machine perception of 3D semantic information, we propose a novel approach that can not only process point cloud data directly with a novel convolution-like operator but also dynamically attend to local semantic information. First, we design a novel dynamic local self-attention mechanism that can dynamically and flexibly focus on the top-level information of the receptive field to learn and understand subtle features. Second, we propose a dynamic self-attention learning block, which adopts the proposed dynamic local self-attention learning convolution operation to deal directly with disordered and irregular point clouds, learning global and local point features while dynamically attending to important local semantic information. Third, the proposed operation can be applied as an independent, compatible component in popular architectures to improve the perception of local semantic information. Numerous experiments demonstrate the advantage of our method on point cloud tasks, on datasets of both CAD data and scans of complex real-world scenes.
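The PyTorch sketch below illustrates only the generic mechanism the paper builds on — self-attention over local k-NN neighborhoods of a point cloud; the dynamic gating the authors propose is not reproduced here.

    import torch

    def local_self_attention(xyz, feats, k=8):
        # xyz: (N, 3) coordinates, feats: (N, C) per-point features
        d = torch.cdist(xyz, xyz)                      # (N, N) pairwise distances
        nbr = d.topk(k, largest=False).indices         # (N, k) neighbor indices
        q = feats.unsqueeze(1)                         # (N, 1, C) queries
        kv = feats[nbr]                                # (N, k, C) neighbor keys/values
        attn = torch.softmax((q * kv).sum(-1) / feats.shape[1] ** 0.5, dim=-1)
        return (attn.unsqueeze(-1) * kv).sum(1)        # (N, C) attended features

    xyz = torch.rand(1024, 3)
    feats = torch.rand(1024, 32)
    out = local_self_attention(xyz, feats)
    print(out.shape)  # torch.Size([1024, 32])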

9.
Mobile robotics has achieved notable progress; however, to increase the complexity of the tasks that mobile robots can perform in natural environments, we need to provide them with a greater semantic understanding of their surroundings. In particular, identifying indoor scenes, such as an Office or a Kitchen, is a highly valuable perceptual ability for an indoor mobile robot, and in this paper we propose a new technique to achieve this goal. As a distinguishing feature, we use common objects, such as Doors or furniture, as a key intermediate representation to recognize indoor scenes. We frame our method as a generative probabilistic hierarchical model, where we use object category classifiers to associate low-level visual features with objects, and contextual relations to associate objects with scenes. The inherent semantic interpretation of common objects allows us to use rich sources of online data to populate the probabilistic terms of our model. In contrast to alternative computer-vision-based methods, we boost performance by exploiting the embedded and dynamic nature of a mobile robot. In particular, we increase detection accuracy and efficiency by using a 3D range sensor that allows us to implement a focus-of-attention mechanism based on geometric and structural information. Furthermore, we use concepts from information theory to propose an adaptive scheme that limits computational load by selectively guiding the search for informative objects. The operation of this scheme is facilitated by the dynamic nature of a mobile robot, which is constantly changing its field of view. We test our approach using real data captured by a mobile robot navigating in Office and home environments. Our results indicate that the proposed approach outperforms several state-of-the-art techniques for scene recognition.
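A toy version of the objects-to-scenes link in such a hierarchical model: given detected objects, score each scene by a naive-Bayes product of object-given-scene likelihoods. The probability table below is made up for illustration, not taken from the paper.

    import numpy as np

    scenes = ["office", "kitchen"]
    p_obj_given_scene = {            # illustrative values only
        "office":  {"monitor": 0.8, "mug": 0.4, "stove": 0.01},
        "kitchen": {"monitor": 0.05, "mug": 0.6, "stove": 0.7},
    }

    def scene_posterior(detected, prior=0.5):
        logp = {}
        for s in scenes:
            lp = np.log(prior)
            for obj, present in detected.items():
                p = p_obj_given_scene[s][obj]
                lp += np.log(p if present else 1 - p)
            logp[s] = lp
        z = np.logaddexp(*logp.values())
        return {s: float(np.exp(lp - z)) for s, lp in logp.items()}

    print(scene_posterior({"monitor": True, "mug": True, "stove": False}))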

10.
Objective: In indoor point cloud scenes, the density and complexity of objects and frequent occlusions cause incomplete and noisy data, which greatly limits indoor scene reconstruction and its accuracy. To better recover a complete scene from unordered point clouds, an indoor scene reconstruction method based on semantic segmentation is proposed. Method: The raw data are downsampled by voxel filtering; 3D scale-invariant feature transform (3D SIFT) keypoints of the scene are computed and fused with the downsampled result to obtain an optimized downsampling of the scene. The random sample consensus (RANSAC) algorithm then extracts planar features from the fused, downsampled scene, and these features are fed into a PointNet network for training, ensuring that coplanar points share the same local features; this yields each point's confidence for every category in the dataset. On this basis, a projection-based region-growing refinement is proposed that aggregates points belonging to the same object in the segmentation result, producing a finer segmentation. The segmented objects are then divided into interior-environment and exterior-environment elements, which are reconstructed by model matching and plane fitting, respectively. Results: In experiments on the S3DIS (Stanford large-scale 3D indoor space) dataset, the fused sampling algorithm improves the efficiency and quality of the subsequent steps to varying degrees: after sampling, plane extraction runs in only 15% of its pre-sampling time, and the semantic segmentation improves on PointNet by 2.3% in overall accuracy (OA) and 4.2% in mean intersection over union (mIoU). Conclusion: The proposed method improves computational efficiency while preserving key points, clearly improves segmentation accuracy, and yields high-quality reconstruction results.
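A bare-bones RANSAC plane extraction like the step that feeds the network training here, as a minimal sketch; real pipelines would typically use a library routine (e.g. Open3D's segment_plane), and the cloud below is synthetic.

    import numpy as np

    def ransac_plane(pts, iters=200, thresh=0.02, seed=0):
        rng = np.random.default_rng(seed)
        best_inliers = np.zeros(len(pts), bool)
        for _ in range(iters):
            p = pts[rng.choice(len(pts), 3, replace=False)]
            n = np.cross(p[1] - p[0], p[2] - p[0])
            if np.linalg.norm(n) < 1e-9:
                continue  # degenerate (collinear) sample
            n = n / np.linalg.norm(n)
            dist = np.abs((pts - p[0]) @ n)
            inliers = dist < thresh
            if inliers.sum() > best_inliers.sum():
                best_inliers = inliers
        return best_inliers

    rng = np.random.default_rng(1)
    floor = np.c_[rng.random((500, 2)), 0.01 * rng.standard_normal(500)]  # z ~ 0 plane
    noise = rng.random((100, 3))
    mask = ransac_plane(np.vstack([floor, noise]))
    print("plane inliers found:", int(mask.sum()))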

11.
This paper proposes a novel method based on Spectral Regression (SR) for efficient scene recognition. First, a new SR approach, called Extended Spectral Regression (ESR), is proposed to perform manifold learning on a huge number of data samples. Then, an efficient Bag-of-Words (BOW) based method is developed which employs ESR to encapsulate local visual features with their semantic, spatial, scale, and orientation information for scene recognition. In many applications, such as image classification and multimedia analysis, there is a huge number of low-level feature samples in a training set, which prohibits direct application of SR to perform manifold learning on such a dataset. In ESR, we first group the samples into tiny clusters, and then devise an approach to reduce the size of the similarity matrix for graph learning. In this way, subspace learning on the graph Laplacian for a vast dataset becomes computationally feasible on a personal computer. In ESR-based scene recognition, we first propose an enhanced low-level feature representation which combines the scale, orientation, spatial position, and local appearance of a local feature. Then, ESR is applied to embed the enhanced low-level image features. The ESR-based feature embedding not only generates a low-dimensional feature representation but also integrates various aspects of the low-level features into the compact representation. The bag-of-words is then generated from the embedded features for image classification. Comparative experiments on open benchmark datasets for scene recognition demonstrate that the proposed method outperforms baseline approaches. It is suitable for real-time applications on mobile platforms, e.g., tablets and smartphones.
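A hedged sketch of the tractability trick behind ESR: group the raw samples into tiny clusters first, then run the spectral embedding on the much smaller set of cluster centers. Data and sizes are illustrative.

    import numpy as np
    from sklearn.cluster import MiniBatchKMeans
    from sklearn.manifold import SpectralEmbedding

    rng = np.random.default_rng(0)
    samples = rng.random((20000, 128))      # large low-level feature set

    centers = MiniBatchKMeans(n_clusters=500, random_state=0,
                              n_init=3).fit(samples).cluster_centers_
    embedded = SpectralEmbedding(n_components=16,
                                 n_neighbors=10).fit_transform(centers)
    print(embedded.shape)  # (500, 16): graph learning on 500 centers, not 20000 points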

12.
Most successful approaches to scene recognition tend to efficiently combine global image features with spatial local appearance and shape cues. On the other hand, less attention has been devoted to studying spatial texture features within scenes. Our method is based on the insight that scenes can be seen as a composition of micro-texture patterns. This paper analyzes the role of texture, along with its spatial layout, for scene recognition. However, one main drawback of the resulting spatial representation is its huge dimensionality. Hence, we propose a technique that addresses this problem by presenting a compact Spatial Pyramid (SP) representation. The basis of our compact representation, namely the Compact Adaptive Spatial Pyramid (CASP), is a two-stage compression strategy. This strategy is based on the Agglomerative Information Bottleneck (AIB) theory for (i) compressing the least informative SP features, and (ii) automatically learning the most appropriate shape for each category. Our method exceeds the state-of-the-art results on several challenging scene recognition data sets.
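The sketch below builds the spatial-pyramid representation whose size CASP then compresses; as a crude stand-in for the paper's agglomerative information bottleneck, it merely ranks SP features by mutual information with the class label and keeps the most informative ones. Synthetic data throughout.

    import numpy as np
    from sklearn.metrics import mutual_info_score

    def spatial_pyramid(word_map, levels=2, vocab=20):
        h, w = word_map.shape
        feats = []
        for lv in range(levels + 1):
            g = 2 ** lv  # g x g grid at this pyramid level
            for i in range(g):
                for j in range(g):
                    cell = word_map[i*h//g:(i+1)*h//g, j*w//g:(j+1)*w//g]
                    feats.append(np.bincount(cell.ravel(), minlength=vocab))
        return np.hstack(feats)

    rng = np.random.default_rng(0)
    maps = rng.integers(0, 20, size=(100, 32, 32))     # visual-word maps
    labels = rng.integers(0, 5, size=100)
    X = np.array([spatial_pyramid(m) for m in maps])   # 420-dim SP vectors

    mi = np.array([mutual_info_score(labels, np.digitize(c, np.quantile(c, [0.5])))
                   for c in X.T])
    keep = np.argsort(mi)[-64:]                        # compact 64-dim representation
    print(X[:, keep].shape)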

13.
Acquired 3D point clouds make possible quick modeling of virtual scenes from the real world. With modern 3D capture pipelines, each point sample often comes with additional attributes such as a normal vector and color response. Although rendering and processing such data has been extensively studied, little attention has been devoted to using the light transport hidden in the recorded per-sample color response to relight virtual objects in visual effects (VFX) look-dev or augmented reality (AR) scenarios. Typically, a standard relighting environment exploits global environment maps together with a collection of local light probes to reflect the light mood of the real scene on the virtual object. We propose instead a unified spatial approximation of the radiance and visibility relationships present in the scene, in the form of a colored point cloud. To do so, our method relies on two core components: High Dynamic Range (HDR) expansion and real-time Point-Based Global Illumination (PBGI). First, since an acquired color point cloud typically comes in Low Dynamic Range (LDR) format, we boost it using a single HDR photo exemplar of the captured scene that can cover part of it. We perform this expansion efficiently by first expanding the dynamic range of a set of renderings of the point cloud and then projecting these renderings onto the original cloud. At this stage, we propagate the expansion to the regions not covered by the renderings, or with low-quality dynamic range, by solving a Poisson system. Then, at rendering time, we use the resulting HDR point cloud to relight virtual objects, providing a diffuse model of the indirect illumination propagated by the environment. To do so, we design a PBGI algorithm that exploits the GPU's geometry shader stage as well as a new mipmapping operator, tailored for G-buffers, to achieve real-time performance. As a result, our method can effectively relight virtual objects exhibiting diffuse and glossy physically-based materials in real time. Furthermore, it accounts for the spatial embedding of the object within the 3D environment. We evaluate our approach on manufactured scenes to assess the error introduced at every step against a perfect ground truth. We also report experiments with real captured data, covering a range of capture technologies, from active scanning to multi-view stereo reconstruction.
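A drastically simplified, CPU-only flavor of the gathering step in point-based global illumination: each surfel contributes radiance to a receiver weighted by the two cosine terms and the squared distance. Visibility and the paper's mipmapped GPU traversal are omitted; all data are synthetic.

    import numpy as np

    rng = np.random.default_rng(0)
    surfel_pos = rng.random((5000, 3)) * 4.0
    surfel_nrm = rng.standard_normal((5000, 3))
    surfel_nrm /= np.linalg.norm(surfel_nrm, axis=1, keepdims=True)
    surfel_rad = rng.random((5000, 3))           # HDR RGB radiance per surfel
    surfel_area = np.full(5000, 1e-3)

    def gather_irradiance(p, n):
        d = surfel_pos - p
        r2 = (d * d).sum(1) + 1e-6
        w = d / np.sqrt(r2)[:, None]
        cos_r = np.clip((w * n).sum(1), 0, None)             # receiver cosine
        cos_s = np.clip((-w * surfel_nrm).sum(1), 0, None)   # surfel cosine
        return (surfel_rad * (surfel_area * cos_r * cos_s / r2)[:, None]).sum(0)

    print(gather_irradiance(np.array([2.0, 2.0, 2.0]), np.array([0.0, 0.0, 1.0])))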

14.
叶利华, 王磊, 赵利平. 《计算机应用》, 2017, 37(7): 2008-2013
To address scene recognition for autonomous landing of small, slow, low-altitude UAVs in complex outdoor flight environments, a scene recognition algorithm is proposed that fuses local pyramid features with features learned by a convolutional neural network. First, the scene is divided into 4×4 and 8×8 blocks of sub-scenes, histogram of oriented gradients (HOG) features are extracted for all blocks, and the features are concatenated end-to-end into a feature vector with spatial pyramid properties. Second, a deep convolutional neural network for scene classification is designed and fine-tuned to obtain the network model, from which deep learned features are extracted. Finally, the two features are concatenated into the final scene feature and classified with a support vector machine (SVM) classifier. On the Sports-8, Scene-15, and Indoor-67 datasets and a self-built dataset, the proposed algorithm improves recognition accuracy by more than 4 percentage points over traditional hand-crafted feature methods. Experimental results show that the algorithm effectively improves the accuracy of landing scene recognition.
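The sketch below shows the fusion idea under stated assumptions (not the authors' network): HOG over a grid of blocks supplies the pyramid-style hand-crafted part, a ResNet backbone stands in for the paper's CNN, and an SVM classifies the concatenation. Images and labels are placeholders; in practice pretrained weights would be loaded.

    import numpy as np
    import torch
    from torchvision.models import resnet18
    from skimage.feature import hog
    from sklearn.svm import LinearSVC

    def hog_blocks(img, grid=4):
        h, w = img.shape
        return np.hstack([hog(img[i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid],
                              pixels_per_cell=(8, 8), cells_per_block=(1, 1))
                          for i in range(grid) for j in range(grid)])

    cnn = resnet18(weights=None)          # weights="IMAGENET1K_V1" in practice
    cnn.fc = torch.nn.Identity()          # expose the 512-d penultimate features
    cnn.eval()

    rng = np.random.default_rng(0)
    imgs = rng.random((20, 64, 64)).astype(np.float32)
    labels = rng.integers(0, 3, size=20)

    with torch.no_grad():
        t = torch.from_numpy(imgs).unsqueeze(1).repeat(1, 3, 1, 1)  # gray -> 3ch
        deep = cnn(t).numpy()
    feats = np.hstack([np.array([hog_blocks(im) for im in imgs]), deep])
    print("train acc:", LinearSVC().fit(feats, labels).score(feats, labels))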

15.
赵新灿, 展鹏磊, 吴飞, 张大伟. 《机器人》, 2018, 40(4): 534-539
To enhance the telepresence of a teleoperation system and help the operator better immerse in the remote working environment, a domain-knowledge-based object recognition and registration algorithm for 3D dynamic scenes is proposed. First, a domain knowledge base containing multi-view point cloud features and assembly constraints is constructed by offline parsing and segmentation of virtual prototype CAD (computer-aided design) models. Second, scene point clouds are dynamically acquired and their CVFH (clustered viewpoint feature histogram) and FPFH (fast point feature histogram) features are computed; objects are recognized by matching the scene's CVFH features against the multi-view point cloud features in the knowledge base, and object pose is determined from the FPFH features through a two-step registration. Finally, the assembly-constraint knowledge base enables accurate registration and real-time pushing of guidance information, driven by changes in the working state of the teleoperated robot. Experimental results show that the system not only effectively guides the remote robot through maintenance operations but also improves the accuracy and efficiency of teleoperation.
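A small sketch of the feature-based registration half of such a pipeline, assuming a recent Open3D build: FPFH features plus RANSAC global registration (Open3D does not ship CVFH, so the recognition-by-CVFH step is not covered). The file names are hypothetical.

    import open3d as o3d

    def fpfh(pcd, voxel=0.01):
        down = pcd.voxel_down_sample(voxel)
        down.estimate_normals(
            o3d.geometry.KDTreeSearchParamHybrid(radius=voxel * 2, max_nn=30))
        feat = o3d.pipelines.registration.compute_fpfh_feature(
            down, o3d.geometry.KDTreeSearchParamHybrid(radius=voxel * 5, max_nn=100))
        return down, feat

    scene = o3d.io.read_point_cloud("scene.pcd")     # hypothetical file names
    model = o3d.io.read_point_cloud("cad_view.pcd")
    s_down, s_feat = fpfh(scene)
    m_down, m_feat = fpfh(model)

    result = o3d.pipelines.registration.registration_ransac_based_on_feature_matching(
        m_down, s_down, m_feat, s_feat, True, 0.03,
        o3d.pipelines.registration.TransformationEstimationPointToPoint(False), 3,
        [], o3d.pipelines.registration.RANSACConvergenceCriteria(100000, 0.999))
    print(result.transformation)  # coarse model-to-scene pose, refined by ICP in practice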

16.
Objective: Local feature descriptors can effectively overcome interference from noise, varying point cloud resolution, partial occlusion, and scattered point distributions in tasks such as 3D object recognition, but existing 3D descriptors struggle to balance performance against efficiency. A LoVPE (local multi-view projection, view-pair correlated encoding) feature descriptor is therefore proposed for 3D object recognition in complex scenes. Method: A local reference frame is first constructed, and the local surface is transformed from world coordinates into the keypoint's local reference frame. The local surface is then rotated around each axis of the reference frame by K angles to obtain multi-view local surfaces; the points of each local surface are projected onto the coordinate planes of the local frame, each projection plane is divided into N×N bins, and scatter statistics of the projected points in each bin generate a feature description vector. Finally, the per-view feature vectors are encoded pairwise (view-pair correlated encoding) into a low-dimensional feature vector, and ZCA (zero-phase component analysis) whitening reduces the correlation between its dimensions, yielding the LoVPE descriptor. Results: Feature matching experiments on public datasets testing robustness to noise, varying resolution, occlusion, and clutter show that the proposed descriptor matches the best existing descriptors in feature matching rate while keeping lower feature dimensionality and higher computational efficiency: its dimensionality is roughly halved, and feature construction and matching time drop to about a quarter of those of the best existing descriptor. Conclusion: The proposed 3D local feature descriptor is highly descriptive; strongly robust to noise, varying mesh resolution, occlusion, and clutter; memory-efficient; and computationally efficient. It applies to both model and real point cloud data and can be used for 3D object recognition in complex scenes.
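A sketch of the two LoVPE ingredients that are easy to show in isolation: binned projection statistics of a local surface onto the coordinate planes, and ZCA whitening to decorrelate the final descriptor. The rotated multi-view surfaces and the view-pair encoding are omitted; data are synthetic.

    import numpy as np

    def projection_stats(local_pts, n=4):
        desc = []
        for axes in [(0, 1), (1, 2), (0, 2)]:          # the three coordinate planes
            proj = local_pts[:, axes]
            hist, _, _ = np.histogram2d(proj[:, 0], proj[:, 1], bins=n,
                                        range=[[-1, 1], [-1, 1]])
            desc.append(hist.ravel() / max(len(local_pts), 1))
        return np.hstack(desc)

    def zca_whiten(D, eps=1e-5):
        Dc = D - D.mean(0)
        cov = np.cov(Dc.T)
        vals, vecs = np.linalg.eigh(cov)
        W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
        return Dc @ W

    rng = np.random.default_rng(0)
    descs = np.array([projection_stats(rng.uniform(-1, 1, (200, 3)))
                      for _ in range(500)])
    white = zca_whiten(descs)
    print(white.shape, np.abs(np.corrcoef(white.T)).mean().round(3))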

17.
18.
19.
Although modern path tracers are successfully being applied to many rendering applications, there is considerable interest in pushing them towards ever-decreasing sampling rates. As the sampling rate is substantially reduced, however, even Monte Carlo (MC) denoisers — which have been very successful at removing large amounts of noise — typically do not produce acceptable final results. As an orthogonal approach, we believe that good importance sampling of paths is critical for producing better-converged, path-traced images at low sample counts that can then, for example, be more effectively denoised. However, most recent importance-sampling techniques for guiding path tracing (an area known as “path guiding”) involve expensive online (per-scene) training and offer benefits only at high sample counts. In this paper, we propose an offline, scene-independent deep-learning approach that can importance sample first-bounce light paths for general scenes without the need for costly online training, and can start guiding path sampling with as little as 1 sample per pixel. Instead of learning to “overfit” to the sampling distribution of a specific scene like most previous work, our data-driven approach is trained a priori on a set of training scenes to use a local neighborhood of samples, with additional feature information, to reconstruct the full incident radiance at a point in the scene, which enables first-bounce importance sampling for new test scenes. Our solution is easy to integrate into existing rendering pipelines without the need for retraining, as we demonstrate by incorporating it into both the Blender/Cycles and Mitsuba path tracers. Finally, we show how our offline, deep importance sampler (ODIS) increases convergence at low sample counts and improves the results of an off-the-shelf denoiser relative to other state-of-the-art sampling techniques.
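A toy illustration of what "importance sampling the first bounce" means mechanically: given a table of incident radiance over direction bins — the quantity the paper's network reconstructs, here simply made up — draw directions by inverting the discrete CDF instead of sampling uniformly.

    import numpy as np

    rng = np.random.default_rng(0)
    radiance = rng.random((8, 16)) ** 4      # peaky 8x16 (theta, phi) radiance table

    pdf = radiance / radiance.sum()
    cdf = np.cumsum(pdf.ravel())

    def sample_direction():
        k = np.searchsorted(cdf, rng.random())           # pick a bin by inverse CDF
        ti, pi = np.unravel_index(k, radiance.shape)
        theta = (ti + rng.random()) / radiance.shape[0] * (np.pi / 2)   # hemisphere
        phi = (pi + rng.random()) / radiance.shape[1] * (2 * np.pi)
        weight = 1.0 / (pdf.ravel()[k] * pdf.size)       # ratio vs. uniform binning
        return theta, phi, weight

    print(sample_direction())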

20.
Understanding how images of objects and scenes behave in response to specific egomotions is a crucial aspect of proper visual development, yet existing visual learning methods are conspicuously disconnected from the physical source of their images. We propose a new “embodied” visual learning paradigm, exploiting proprioceptive motor signals to train visual representations from egocentric video with no manual supervision. Specifically, we enforce that our learned features exhibit equivariance, i.e., they respond predictably to transformations associated with distinct egomotions. With three datasets, we show that our unsupervised feature learning approach significantly outperforms previous approaches on visual recognition and next-best-view prediction tasks. In the most challenging test, we show that features learned from video captured on an autonomous driving platform improve large-scale scene recognition in static images from a disjoint domain.
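A minimal PyTorch sketch of the equivariance objective: features of a transformed frame should be a learned, motion-conditioned linear map of the original frame's features. The encoder, data, and the two egomotion classes are all placeholders.

    import torch
    import torch.nn as nn

    enc = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 64))   # toy encoder
    maps = nn.ModuleList([nn.Linear(64, 64, bias=False) for _ in range(2)])
    opt = torch.optim.Adam(list(enc.parameters()) + list(maps.parameters()), lr=1e-3)

    frames = torch.rand(16, 1, 32, 32)       # frame at time t
    moved = torch.rand(16, 1, 32, 32)        # frame after the egomotion
    motion = torch.randint(0, 2, (16,))      # discrete egomotion id (e.g. turn L/R)

    for _ in range(100):
        z, z_next = enc(frames), enc(moved)
        pred = torch.stack([maps[int(m)](z[i]) for i, m in enumerate(motion)])
        loss = ((pred - z_next) ** 2).mean()  # enforce f(g.x) ~ M_g f(x)
        opt.zero_grad(); loss.backward(); opt.step()
    print(float(loss))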
