Similar Documents
20 similar documents found.
1.
Scenes are closely related to the kinds of objects that may appear in them, and objects are widely used as features for scene categorization. On the other hand, landscapes, which carry more of the spatial structure of scenes, are representative of scene categories. In this paper, we propose a deep learning based algorithm for scene categorization. Specifically, we design two-pathway convolutional neural networks that exploit both the object attributes and the spatial structures of scene images. Unlike conventional deep learning methods, which usually focus on only one aspect of images, each pathway of the proposed architecture is tuned to capture a different aspect of images. As a result, complementary information about image content can be utilized effectively. In addition, to deal with the feature redundancy caused by combining features from different sources, we adopt the ℓ2,1 norm during classifier training to control the selectivity of each type of feature. Extensive experiments are conducted to evaluate the proposed method, and the results demonstrate that it achieves superior performance over conventional methods. Moreover, the proposed method is a general framework that can easily be extended to more pathways and applied to other problems.
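The ℓ2,1 regularizer mentioned above sums the ℓ2 norms of a weight matrix's rows, so penalizing it zeroes out whole rows (one row per feature type) and performs feature selection. A minimal numpy sketch of the norm and its proximal (row-shrinkage) operator, not the paper's training code:

```python
import numpy as np

def l21_norm(W):
    """ℓ2,1 norm of a weight matrix: the sum of the ℓ2 norms of its rows.

    Penalizing this norm drives entire rows toward zero, which is what
    gives the regularizer its feature-selection effect.
    """
    return float(np.sum(np.linalg.norm(W, axis=1)))

def l21_prox(W, t):
    """Proximal operator of t * ℓ2,1: shrink each row's ℓ2 norm by t."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return W * scale
```

Rows whose norm falls below the threshold `t` are set exactly to zero, deselecting the corresponding feature.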

2.
In this paper, we propose a novel scene categorization method based on contextual visual words. The proposed method extends the traditional 'bag of visual words' model by introducing, through unsupervised learning, contextual information from the coarser scale and from neighborhood regions into the local region of interest. The introduced context provides useful information or cues about the region of interest, reducing the ambiguity that arises when visual words are used to represent local regions. The improved visual-word representation of the scene image enhances categorization performance. The proposed method is evaluated over three scene classification datasets, with 8, 13 and 15 scene categories, respectively, using 10-fold cross-validation. The experimental results show that the proposed method achieves 90.30%, 87.63% and 85.16% recognition rates on Datasets 1, 2 and 3, respectively, significantly outperforming methods based on visual words that represent only local information in a statistical manner. We also compared the proposed method with three representative scene categorization methods; the results confirm its superiority.
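The core mechanics of a bag-of-visual-words pipeline with added context can be sketched in a few lines: quantize local descriptors against a codebook, then concatenate the local histogram with one built from context regions. This is an illustrative stand-in, not the paper's exact contextual-word construction:

```python
import numpy as np

def quantize(descriptors, codebook):
    """Assign each local descriptor to its nearest visual word (codebook row)."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def contextual_histogram(words, context_words, n_words):
    """Concatenate the local bag-of-words histogram with one built from
    context regions (e.g. the coarser scale or the neighborhood), a
    simplified stand-in for contextual visual words."""
    local = np.bincount(words, minlength=n_words).astype(float)
    ctx = np.bincount(context_words, minlength=n_words).astype(float)
    h = np.concatenate([local, ctx])
    return h / max(h.sum(), 1.0)
```

The doubled histogram lets a classifier distinguish regions that share a local word but occur in different contexts.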

3.
4.
A host of remote-sensing and mapping applications require both high spatial and high spectral resolutions. Availability of high spatial and spectral details at different resolutions from a suite of satellite sensors has necessitated the development of effective image fusion techniques that can effectively combine the information available from different sensors and take advantage of their varied capabilities. A common problem observed in the case of multi-sensor multi-temporal data fusion is spectral distortion of the fused images. Performance of a technique also varies with variation in scene characteristics. In this article, two sets of multi-temporal CARTOSAT-1 and Indian Remote Sensing satellite (IRS-P6) Linear Imaging and Self Scanning sensor (LISS-IV) image sub-scenes, with different urban landscape characteristics, are fused with an aim to evaluate the performance of five image fusion algorithms – high-pass filtering (HPF), Gram–Schmidt (GS), Ehlers, PANSHARP and colour-normalized Brovey (CN-Brovey). The resultant fused data sets are compared qualitatively and quantitatively with respect to spectral fidelity. Spatial enhancement is assessed visually. The difference in the performance of techniques with variation in scene characteristics is also examined. For both scenes, GS, HPF and PANSHARP fusion techniques produced comparable results with high spectral quality and spatial enhancement. For these three methods, the variation in performance over different scenes was not very significant. The Ehlers method resulted in spatially degraded images with a more or less constant negative offset in data values in all bands of one scene and in the first two bands in the other. The CN-Brovey method produced excellent spatial enhancement but highly distorted radiometry for both sub-scenes.
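Of the five algorithms compared, the colour-normalized Brovey transform is the simplest to state: each multispectral band is scaled by the ratio of the panchromatic band to the total multispectral intensity, which explains both its strong spatial enhancement and its radiometric distortion. A minimal numpy sketch (function name and `eps` guard are illustrative):

```python
import numpy as np

def brovey_fuse(ms, pan, eps=1e-6):
    """Colour-normalized (Brovey) fusion sketch.

    ms:  (H, W, B) multispectral image resampled to the pan grid
    pan: (H, W) high-resolution panchromatic band
    Each band is multiplied by pan / (sum of MS bands), so spatial detail
    comes from pan while band ratios are preserved.
    """
    intensity = ms.sum(axis=2) + eps   # eps avoids division by zero
    return ms * (pan / intensity)[..., None]
```

Because absolute band values are rescaled by the pan/intensity ratio, radiometry is not preserved, matching the spectral distortion reported above.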

5.
This paper proposes an efficient framework for scene categorization that combines a generative model and a discriminative model. A state-of-the-art approach for scene categorization is the Bag-of-Words (BoW) framework. However, scenes contain many categories, and in general, whenever a new category is considered, the codebook in the BoW framework must be re-generated, which involves exhaustive computation. In view of this, this paper addresses the issue by designing a new framework with good scalability: when an additional category is considered, much lower computational cost is needed while the resulting image signatures remain discriminative. The image signatures for training the discriminative model are carefully designed based on the generative model. The soft relevance values of the extracted image signatures are estimated by modeling the image signature space and are incorporated into a Fuzzy Support Vector Machine (FSVM). The effectiveness of the proposed method is validated on the UIUC Scene-15 dataset and the NTU-25 dataset, where it is shown to outperform other state-of-the-art approaches for scene categorization.

6.
Recently, various bag-of-features (BoF) methods have shown good resistance to within-class variations and occlusions in object categorization. In this paper, we present a novel approach for multi-object categorization within the BoF framework. The approach addresses two issues in BoF-related methods simultaneously: how to avoid scene modeling, and how to predict the labels of an image in which objects of multiple categories co-exist. We employ a biased sampling strategy that combines bottom-up, biologically inspired saliency information with loose, top-down class prior information for object class modeling. This biased sampling component is then integrated with a multi-instance multi-label learning and classification algorithm. With the proposed biased sampling strategy, we can perform multi-object categorization within an image without semantic segmentation. The experimental results on PASCAL VOC2007 and SUN09 show that the proposed method significantly improves the discriminative ability of BoF methods and achieves good performance in multi-object categorization tasks.

7.
The novel view synthesis task generates images of a scene from new viewpoints using multiple reference images. In multi-object scenes, however, occlusion between objects leaves object information incomplete, so the synthesized novel-view images suffer from artifacts and misalignment. To address this problem, we propose a novel view synthesis network guided by scene layout maps, and annotate a new multi-object scene dataset (multi-objects novel view synthesis, MONVS). First, multiple layout maps of the scene and the corresponding camera poses are fed into a layout prediction module, which computes the scene layout under the new viewpoint. Then, the object bounding-box annotations in the scene are used to build per-object sets, and a pixel prediction module generates the appearance of each object under the new viewpoint. Finally, the predicted novel-view layout and the per-object information are fed into a scene generator to construct the novel-view image. Comparisons with several recent methods on the MONVS and ShapeNet cars datasets show that the proposed method performs well on both datasets, effectively reducing the artifacts and the inaccurate object placement in the generated images.

8.
In recent years, the scale and complexity of 3D virtual scenes have grown steadily. Limited by hardware, very large scenes in some applications (e.g., building complexes or entire cities) are difficult to render, or to render interactively, on a single machine. To address this problem, we propose a distributed rendering framework that partitions a large scene by content into sub-scenes that a single node can render. These sub-scenes are distributed to different rendering nodes in a cluster, and their rendering results are composited according to depth information to obtain the final rendering of the whole scene. To reduce interaction latency, the sub-scene rendering results are compressed before transmission. Experiments show that the proposed distributed rendering system handles the rendering of and interaction with very large scenes efficiently, scales well, and can meet the demand for interactive rendering of large scenes in many application domains.
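The depth-based compositing step described above has a compact per-pixel formulation: for each pixel, keep the colour from the node whose fragment is nearest the camera. A minimal numpy sketch of that merge (array shapes and function name are illustrative, not the system's actual interface):

```python
import numpy as np

def depth_composite(colors, depths):
    """Composite per-node renderings: for each pixel, keep the colour from
    the node with the smallest depth value.

    colors: (N, H, W, 3) colour buffers from N rendering nodes
    depths: (N, H, W) matching depth buffers
    """
    nearest = depths.argmin(axis=0)                    # (H, W) winning node id
    idx = nearest[None, :, :, None]                    # shape for take_along_axis
    return np.take_along_axis(colors, idx, axis=0)[0]  # (H, W, 3)
```

In a real cluster this merge would run on compressed, streamed buffers; the per-pixel rule is the same.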

9.
Auditory scenes are temporal audio segments with coherent semantic content. Automatically classifying and grouping auditory scenes with similar semantics into categories is beneficial for many multimedia applications, such as semantic event detection and indexing. For such semantic categorization, auditory scenes are first characterized with either low-level acoustic features or some mid-level representations like audio effects, and then supervised classifiers or unsupervised clustering algorithms are employed to group scene segments into various semantic categories. In this paper, we focus on the problem of automatically categorizing audio scenes in an unsupervised manner. To achieve more reasonable clustering results, we introduce a co-clustering scheme to exploit potential grouping trends among different dimensions of the feature space (either low-level or mid-level), providing a more accurate similarity measure for comparing auditory scenes. Moreover, we extend the co-clustering scheme with a strategy based on the Bayesian information criterion (BIC) to automatically estimate the number of clusters. Evaluation performed on 272 auditory scenes extracted from 12 hours of audio data shows very encouraging categorization results: co-clustering achieved better performance than some traditional one-way clustering algorithms, based both on the low-level acoustic features and on the mid-level audio effect representations. Finally, we present our vision regarding the applicability of this approach to general multimedia data, and show some preliminary results on content-based image clustering.
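The BIC-based model-selection idea is independent of the co-clustering machinery: fit models with increasing cluster counts and keep the count that minimizes a score trading fit against model size. A simplified one-way k-means stand-in (the paper applies the criterion inside co-clustering; the init scheme and penalty form here are illustrative):

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Tiny k-means with deterministic first-k initialization."""
    centers = X[:k].astype(float).copy()
    for _ in range(n_iter):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    rss = ((X - centers[labels]) ** 2).sum()
    return labels, rss

def choose_k_by_bic(X, k_max=5):
    """Pick the cluster count minimizing a BIC-style score:
    goodness-of-fit term n*log(RSS/n) plus complexity penalty k*d*log(n)."""
    n, d = X.shape
    best_k, best_bic = 1, np.inf
    for k in range(1, k_max + 1):
        _, rss = kmeans(X, k)
        bic = n * np.log(max(rss, 1e-12) / n) + k * d * np.log(n)
        if bic < best_bic:
            best_k, best_bic = k, bic
    return best_k
```

On data with two well-separated groups, the penalty stops the score from improving past k = 2.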

10.
In the field of visual recognition such as scene categorization, representing an image based on local features (e.g., the bag-of-visual-words (BOVW) model and the bag-of-contextual-visual-words (BOCVW) model) has become one of the most popular and successful approaches. In this paper, we propose a method that uses localized maximum-margin learning to fuse different types of features during BOCVW modeling for eventual scene classification. The proposed method fuses multiple features at the stage when the best contextual visual word is selected to represent a local region (hard assignment) or when the probabilities of the candidate contextual visual words used to represent the unknown region are estimated (soft assignment). The merits of the proposed method are that (1) errors caused by the ambiguity of a single feature when assigning local regions to contextual visual words can be corrected, or the probabilities of the candidate contextual visual words used to represent the region can be estimated more accurately; and (2) it offers a more flexible way of fusing these features by determining the similarity metric locally through localized maximum-margin learning. The proposed method has been evaluated experimentally and the results indicate its effectiveness.

11.
Given a surveillance video of a moving person, we present a novel method for estimating the layout of a cluttered indoor scene. Our idea is that the trajectories of a moving person can be used to generate features that segment an indoor scene into different areas of interest. We assume a static, uncalibrated camera. Using pixel-level color and perspective cues of the scene, each pixel is assigned to a particular class: a sitting place, the ground floor, or static background areas such as walls and ceiling. The pixel-level cues are integrated locally, following the global topological order of the classes (sitting objects and background areas lie above the ground floor), into a conditional random field via an ordering constraint. The proposed method yields very accurate segmentation results on challenging real-world scenes. We focus on videos with people walking in the scene and show the effectiveness of our approach through quantitative and qualitative results. The proposed method gives better estimates than state-of-the-art scene layout estimation methods: we correctly segment 90.3% of the background, 89.4% of the sitting areas and 74.7% of the ground floor.

12.
In this article, a novel Scan-mode synthetic aperture radar (SAR) imaging method for maritime surveillance is presented. Conventional Scan SAR generally suffers severe azimuth resolution loss in order to cover a large area. The proposed imaging method changes the way Scan SAR illuminates sub-scenes, presenting a new radar illumination strategy based on the spatial distribution of ships in each sub-scene. To obtain this spatial distribution, a scene sensing algorithm based on radar range profiles, together with a peak-seeking and clustering algorithm, is introduced. A Markov transfer-probability matrix is then generated so that the radar illuminates each sub-scene randomly under the previously calculated probabilities. Finally, an imaging algorithm within the Lp (0 < p ≤ 1) regularization framework is used to reconstruct each sub-scene; the regularization problem is solved by an improved iterative thresholding algorithm. The whole wide-swath image is obtained by joining all the sub-scenes together. Experimental results show that the proposed method performs high-resolution wide-swath SAR imaging effectively and efficiently without reducing image resolution.
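Iterative thresholding for regularized reconstruction has a standard template. The p = 1 case (ISTA, with the soft-thresholding proximal step) is sketched below as a stand-in; the article's improved thresholding algorithm and the p < 1 cases are not reproduced here:

```python
import numpy as np

def soft_threshold(x, t):
    """Soft-thresholding: the proximal operator of the ℓ1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, y, lam, n_iter=200):
    """Iterative shrinkage-thresholding for
    min_x 0.5 * ||A x - y||^2 + lam * ||x||_1,
    i.e. the p = 1 instance of Lp-regularized reconstruction."""
    step = 1.0 / (np.linalg.norm(A, 2) ** 2)   # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)               # gradient of the data term
        x = soft_threshold(x - step * grad, lam * step)
    return x
```

Each iteration is a gradient step on the data-fit term followed by a shrinkage step that promotes sparsity in the reconstructed sub-scene.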

13.
In this paper, we propose a computational model of the recognition of real-world scenes that bypasses segmentation and the processing of individual objects or regions. The procedure is based on a very low dimensional representation of the scene that we term the Spatial Envelope. We propose a set of perceptual dimensions (naturalness, openness, roughness, expansion, ruggedness) that represent the dominant spatial structure of a scene, and show that these dimensions can be reliably estimated using spectral and coarsely localized information. The model generates a multidimensional space in which scenes sharing membership in semantic categories (e.g., streets, highways, coasts) are projected close together. The performance of the spatial envelope model shows that specific information about object shape or identity is not a requirement for scene categorization, and that modeling a holistic representation of the scene provides information about its probable semantic category.
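"Spectral and coarsely localized information" can be illustrated with a few lines of numpy: take the Fourier magnitude of the image and pool it over a coarse grid, giving a low-dimensional signature of the dominant spatial structure. This is a sketch in the spirit of the model, not the authors' feature pipeline:

```python
import numpy as np

def spectral_signature(img, grid=4):
    """Coarsely localized spectral energy: average the centred Fourier
    magnitude over a grid x grid partition of the spectrum.
    Assumes image sides are divisible by `grid`.
    """
    mag = np.abs(np.fft.fftshift(np.fft.fft2(img)))
    h, w = mag.shape
    blocks = mag.reshape(grid, h // grid, grid, w // grid).mean(axis=(1, 3))
    return blocks.ravel()   # grid*grid-dimensional scene descriptor
```

Regressing perceptual dimensions such as openness or roughness from descriptors like this is what lets the model categorize scenes without segmenting objects.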

14.
15.
16.
Objective: A good user experience is one of the main goals of 3D game scene design. Currently, 3D scenes are usually created by artists rather than by professionals in architectural design or landscape planning, so the spatial organization of scenes does not fully account for user experience; moreover, the production cycle of large 3D scenes is long and design efficiency is generally low. As a direct result, the interactive experience of players in 3D game scenes is poor, yet this problem has neither been well solved nor received much attention from researchers. This paper proposes a multi-technique collaborative method based on interactive genetic algorithms, aiming to generate large batches of scene units more efficiently and reasonably and to improve spatial organization for a better player experience. Method: The approach combines feature clustering, ant-colony-based spatial layout optimization, and interactive genetic algorithm evaluation to address the poor interactivity. Building layouts and facade hierarchies are clustered by feature through self-learning; scene layout is optimized with a bounding-box-based ant colony algorithm; finally, an interactive genetic algorithm incorporates user ratings as fitness evaluations to produce newly expanded scenes. The method achieves good user experience in the reconstructed scenes and reasonable spatial organization. Results: When expanding small scenes and re-laying-out individual buildings, the new scenes produced by the method exhibit good spatial organization. Driven by user-preference ratings through the interactive genetic algorithm, the expanded scenes reflect real users' subjective impressions and achieve satisfactory results, improving user-experience friendliness. Conclusion: We propose a scene reconstruction method based on interactive genetic algorithms and implement it on selected scene samples; the results show that the method is feasible and effective. It is of practical relevance for game scene design, restoration of cultural heritage, and system simulation.
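The evolutionary loop at the heart of the method above is standard; what makes it "interactive" is that the fitness callback is a user rating of a candidate scene. A generic numpy GA sketch with a programmatic fitness function standing in for the user (all parameters and the encoding are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

def evolve(fitness, n_genes=8, pop_size=20, n_gen=40, p_mut=0.1):
    """Generic genetic-algorithm loop. In an interactive GA, `fitness`
    would return a user's rating of the candidate (here: a scene layout
    encoded as a real vector); any scoring function works."""
    pop = rng.random((pop_size, n_genes))
    for _ in range(n_gen):
        scores = np.array([fitness(ind) for ind in pop])
        parents = pop[np.argsort(scores)[::-1][: pop_size // 2]]   # keep top half
        a = parents[rng.integers(len(parents), size=pop_size)]
        b = parents[rng.integers(len(parents), size=pop_size)]
        mask = rng.random((pop_size, n_genes)) < 0.5               # uniform crossover
        pop = np.where(mask, a, b)
        mutate = rng.random(pop.shape) < p_mut                     # gaussian mutation
        pop = pop + mutate * rng.normal(0.0, 0.1, pop.shape)
    scores = np.array([fitness(ind) for ind in pop])
    return pop[np.argmax(scores)]
```

Because every generation needs only relative rankings, sparse or noisy human ratings are enough to drive the search toward preferred layouts.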

17.
Spatial PACT is a recent feature representation for scene instance and category recognition. Building on PACT (principal component analysis of census transform histograms), it incorporates the spatial pyramid, a recent framework for semantic scene recognition, and achieves higher recognition rates than existing algorithms. To improve both the strength and the efficiency of semantic scene recognition, this paper proposes a new method that introduces latent step-edge templates into spatial PACT, improving efficiency with almost no loss of recognition accuracy. Color feature information is also incorporated to obtain a representation with stronger semantic discrimination. Experimental results show that the algorithm is computationally efficient, achieves high recognition rates, and has strong semantic recognition ability.
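The census transform underlying PACT replaces each pixel with a bit string recording which of its 8 neighbours are at least as bright, and histograms of those codes form the descriptor. A minimal numpy sketch of the 3x3 transform (bit order is an arbitrary choice here):

```python
import numpy as np

def census_transform(img):
    """3x3 census transform: for each interior pixel, encode as 8 bits
    which neighbours are >= the centre pixel. Histograms of these codes
    are the starting point of census-transform-histogram features."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    centre = img[1:-1, 1:-1]
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
            out = (out << 1) | (neigh >= centre).astype(np.uint8)
    return out
```

Because only brightness orderings are kept, the codes are invariant to monotonic illumination changes, which is part of what makes the representation robust.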

18.
张康, 安泊舟, 李捷, 袁夏, 赵春霞. 《软件学报》 (Journal of Software), 2023, 34(1): 444-462
In recent years, with the continuing development of computer vision, semantic segmentation and shape completion of 3D scenes have received wide attention from academia and industry. Semantic scene completion is an emerging line of research in this area: it aims to predict the spatial layout and the semantic labels of a 3D scene simultaneously, and has developed rapidly in recent years. This paper categorizes and summarizes the RGB-D-based methods proposed in this field. Semantic scene completion methods are divided into traditional methods and deep-learning-based methods according to whether deep learning is used. Deep-learning-based methods are further divided, by input data type, into methods based on a single depth image and methods based on color images combined with depth images. Building on this classification and overview of existing methods, the relevant datasets used for the semantic scene completion task are compiled and the experimental results of existing methods are analyzed. Finally, the challenges and prospects of the field are summarized.

19.
Recognizing scene information in images or videos, such as locating objects and answering "Where am I?", has attracted much attention in the computer vision research field. Many existing scene recognition methods focus on static images and cannot achieve satisfactory results on videos, which contain more complex scene features than images. In this paper, we propose a robust movie scene recognition approach based on panoramic frames and representative feature patches. More specifically, the movie is first efficiently segmented into video shots and scenes. Secondly, we introduce a novel key-frame extraction method using panoramic frames, and a local feature extraction process is applied to obtain the representative feature patches (RFPs) of each video shot. Thirdly, a Latent Dirichlet Allocation (LDA) based recognition model is trained to recognize the scene within each individual video scene clip. The correlations between video clips are considered to enhance recognition performance. When the proposed approach is applied to scene recognition in real movies, the experimental results show that it achieves satisfactory performance.

20.
Scene Recognition Combining Structural and Texture Features
Although scene recognition has made considerable progress, it remains a highly challenging problem for computer vision. Previous scene recognition methods often require training images to be manually annotated with semantic labels in advance, and most are based on the "bag of features" model, which requires clustering a large number of extracted features, consuming much computation and memory; moreover, the choice of initial cluster centers and of the number of clusters strongly affects recognition performance. This paper therefore proposes an unsupervised scene recognition method that does not rely on the bag-of-features model. Multiple images at different resolutions are first constructed by subsampling. At each resolution level, structural and texture features are extracted separately: the gradient orientation histogram descriptor proposed in this paper represents the structure of the image, while the responses of a Gabor filter bank and the Schmid filter set represent its texture. Structure and texture are treated as two independent feature channels, which are finally combined and classified with an SVM to recognize scenes automatically. Experiments on the 8-category (Oliva), 13-category (Li Fei-Fei) and 15-category (Lazebnik) scene datasets show that the gradient orientation histogram descriptor outperforms the classic SIFT descriptor for scene recognition, and that combining structural and texture features achieves good recognition results on all three common scene datasets.
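A gradient orientation histogram in its simplest form bins the gradient direction at every pixel, weighted by gradient magnitude. The sketch below is a minimal stand-in for the structure channel described above, not the paper's exact descriptor (which operates per resolution level):

```python
import numpy as np

def orientation_histogram(img, n_bins=8):
    """Magnitude-weighted histogram of gradient orientations over [0, 2*pi)."""
    gy, gx = np.gradient(img.astype(float))          # image gradients
    mag = np.hypot(gx, gy)                           # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), 2.0 * np.pi)    # orientation in [0, 2*pi)
    bins = np.minimum((ang / (2.0 * np.pi) * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    total = hist.sum()
    return hist / total if total > 0 else hist
```

Computing such histograms at several subsampled resolutions and concatenating them yields a multi-scale structure descriptor of the kind the abstract describes.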
