Similar Literature
20 similar documents retrieved.
1.
Railway inspection and monitoring generate massive volumes of image data, and classifying these images by scene is of great value for their subsequent analysis and management. This paper proposes a visual scene classification model that combines a deep convolutional neural network (DCNN) with gradient-weighted class activation mapping (Grad-CAM). The DCNN is adapted to a railway scene classification image dataset by transfer learning to extract features, while Grad-CAM computes class weights from the global average of the gradients to produce class-weighted heatmaps and activation scores, improving the interpretability of the classification model. Experiments compare how different DCNN architectures affect performance on railway image scene classification, provide visual explanations of the scene classification model, and, building on the visualization, propose an optimization workflow that improves classification performance by reducing internal dataset bias, verifying the effectiveness of deep learning for image scene classification.
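As an illustration of the Grad-CAM step described above (class weights from the global average of gradients, then a ReLU-ed weighted sum of feature maps), here is a minimal sketch assuming PyTorch with a torchvision ResNet-18 backbone; the hooked layer, input size, and pretrained weights are assumptions, not details from the paper.

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1").eval()
activations, gradients = {}, {}

def fwd_hook(module, inp, out):
    activations["value"] = out.detach()

def bwd_hook(module, grad_in, grad_out):
    gradients["value"] = grad_out[0].detach()

# Hook the last convolutional block (layer choice is an assumption).
model.layer4.register_forward_hook(fwd_hook)
model.layer4.register_full_backward_hook(bwd_hook)

def grad_cam(image, class_idx=None):
    """image: (1, 3, H, W) tensor normalized like ImageNet."""
    scores = model(image)
    if class_idx is None:
        class_idx = scores.argmax(dim=1).item()
    model.zero_grad()
    scores[0, class_idx].backward()
    # Weights = global average of the gradients over the spatial dimensions.
    weights = gradients["value"].mean(dim=(2, 3), keepdim=True)
    # Weighted sum of feature maps, then ReLU.
    cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))
    # Upsample to the input resolution and normalize to [0, 1].
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam.squeeze(), class_idx

dummy = torch.randn(1, 3, 224, 224)
heatmap, pred = grad_cam(dummy)
print(heatmap.shape, pred)
```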

2.
In indoor scene classification, the structural complexity and diversity of scenes introduce various interfering factors that reduce classification accuracy. To address this, the paper proposes an indoor scene classification algorithm based on information enhancement of visually sensitive regions: local features enhanced with visually sensitive region information are fused with global features into a multi-scale spatial-frequency feature, which is used to classify indoor scenes. Experiments on three standard test sets show that the algorithm achieves good results on several different scene classification datasets and generalizes well.

3.
4.
Acoustic scene classification infers the recording environment from audio recorded in public spaces and plays an important role in daily life. Unlike traditional classification problems, in which classes are unrelated, the categories in acoustic scene classification form a hierarchy (parent and child classes); for example, the parent class of both airport and shopping mall is indoor. Existing methods do not account for this property and ignore the dependency between parent and child classes. This paper therefore exploits the hierarchical relationship among acoustic scene categories and proposes an acoustic scene classification method based on hierarchical information fusion. The method trains separate classifiers for parent and child classes, fuses parent-class information into the child-class classification, and introduces a hierarchy-dependency loss that penalizes predictions in which the predicted parent and child classes do not match. Experiments on the TAU Urban Acoustic Scenes 2020 Mobile development dataset show that hierarchical information fusion effectively improves the performance of the acoustic scene classification model, raising classification accuracy by 1.1%.
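A minimal sketch of the two-head parent/child design and a hierarchy-dependency penalty described above, assuming a PyTorch classifier over precomputed audio embeddings; the parent-to-child mapping, embedding size, fusion form, and penalty weight are illustrative rather than taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical hierarchy: 3 parents (indoor, outdoor, transport), 6 children.
CHILD_TO_PARENT = torch.tensor([0, 0, 1, 1, 2, 2])  # child index -> parent index

class HierarchicalASC(nn.Module):
    def __init__(self, embed_dim=128, n_parents=3, n_children=6):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(embed_dim, 64), nn.ReLU())
        self.parent_head = nn.Linear(64, n_parents)
        # The child head also sees the parent logits (hierarchical information fusion).
        self.child_head = nn.Linear(64 + n_parents, n_children)

    def forward(self, x):
        h = self.backbone(x)
        parent_logits = self.parent_head(h)
        child_logits = self.child_head(torch.cat([h, parent_logits], dim=1))
        return parent_logits, child_logits

def hierarchy_loss(parent_logits, child_logits, parent_y, child_y, lam=0.5):
    ce = F.cross_entropy(parent_logits, parent_y) + F.cross_entropy(child_logits, child_y)
    # Dependency penalty: probability mass the child distribution assigns to
    # children whose parent disagrees with the predicted parent.
    child_prob = child_logits.softmax(dim=1)
    pred_parent = parent_logits.argmax(dim=1)
    mismatch = (CHILD_TO_PARENT.unsqueeze(0) != pred_parent.unsqueeze(1)).float()
    penalty = (child_prob * mismatch).sum(dim=1).mean()
    return ce + lam * penalty

model = HierarchicalASC()
x = torch.randn(8, 128)                 # 8 clips, 128-dim embeddings (placeholder)
child_y = torch.randint(0, 6, (8,))
parent_y = CHILD_TO_PARENT[child_y]
loss = hierarchy_loss(*model(x), parent_y, child_y)
loss.backward()
print(loss.item())
```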

5.
The goal of scene classification is to establish semantic context for various visual processing tasks, especially object recognition. Binocular vision systems are now widely installed on intelligent robots, yet scene classification has mostly used only monocular images, and because of the complexity of indoor scenes, classification performance with monocular images is low. This paper proposes an indoor scene classification method based on binocular vision that uses the parameters of several planes fitted within specific regions as scene features. A hierarchical classification strategy is adopted: based on the disparity map, scenes are first divided into open and enclosed spaces, and these two classes are then subdivided using the proposed scene features together with Gist features. A four-category image dataset was built to validate the method, and experimental results show that it achieves good classification performance.

6.
Remote sensing image scene classification is important for land resource management, but in high-resolution remote sensing images the distribution of ground objects is complex and the images contain redundant information unrelated to the current scene, which hurts accurate classification. To address this, a scene classification method based on the sparse representation of a spiking convolutional neural network (SCNN) is proposed. Starting from sparse representation, the sparse spiking output of spiking neurons is exploited to design an SCNN that removes scene-irrelevant redundancy from remote sensing images and produces a sparse representation of each image. A backpropagation algorithm based on a cross-entropy loss over the spiking outputs is then proposed, and gradient descent built on it is used to train the SCNN, optimize its parameters, and perform remote sensing scene classification. The method is validated on the Google and UCM remote sensing image datasets and compared with a conventional convolutional neural network (CNN). The results show that the proposed method yields sparse representations of remote sensing images, performs scene classification, and outperforms the CNN on remote sensing scene classification.

7.
Over the past few decades, aerial images and video have been widely used in urban planning, coastal surveillance, military missions, and other applications, so understanding the content of aerial images and identifying the scene types captured in aerial video has become very important. Most popular scene classification algorithms target natural scenes; few address high-resolution aerial scene classification. This paper presents a hierarchical algorithm for classifying high-resolution aerial images. The algorithm first extracts robust local patch features with the scale-invariant feature transform (SIFT); then, on top of a bag of visual words, a deep belief network (DBN) initialized with restricted Boltzmann machines (RBM) models the relationship between the low-level features and high-level scene features, with the DBN also serving as the classifier. Experimental results show that the algorithm slightly outperforms current mainstream methods on high-resolution aerial scene classification.
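As a rough sketch of the bag-of-visual-words stage described above (the RBM/DBN part is omitted), the following builds SIFT-based visual-word histograms with OpenCV and scikit-learn; the vocabulary size and file names are assumptions.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def sift_descriptors(image_paths):
    """Extract SIFT descriptors from every image in a list of file paths."""
    sift = cv2.SIFT_create()
    per_image = []
    for path in image_paths:
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = sift.detectAndCompute(gray, None)
        per_image.append(desc if desc is not None else np.empty((0, 128), np.float32))
    return per_image

def build_bovw(image_paths, n_words=200):
    """Cluster descriptors into a visual vocabulary and return per-image histograms."""
    per_image = sift_descriptors(image_paths)
    all_desc = np.vstack([d for d in per_image if len(d)])
    vocab = KMeans(n_clusters=n_words, n_init=4, random_state=0).fit(all_desc)
    hists = []
    for desc in per_image:
        hist = np.zeros(n_words, dtype=np.float32)
        if len(desc):
            words, counts = np.unique(vocab.predict(desc), return_counts=True)
            hist[words] = counts
        hists.append(hist / max(hist.sum(), 1.0))  # L1-normalize the histogram
    return np.array(hists), vocab

# Usage (paths are placeholders):
# hists, vocab = build_bovw(["aerial_001.jpg", "aerial_002.jpg"])
```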

8.
Prior research in scene classification has focused on mapping a set of classic low-level vision features to semantically meaningful categories using a classifier engine. In this paper, we propose improving the established paradigm by using a simplified low-level feature set to predict multiple semantic scene attributes that are integrated probabilistically to obtain a final indoor/outdoor scene classification. An initial indoor/outdoor prediction is obtained by classifying computationally efficient, low-dimensional color and wavelet texture features using support vector machines. Similar low-level features can also be used to explicitly predict the presence of semantic features including grass and sky. The semantic scene attributes are then integrated using a Bayesian network designed for improved indoor/outdoor scene classification.
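A toy sketch of the probabilistic integration step, using a naive-Bayes-style update as a simplification of the paper's Bayesian network; the conditional probabilities and attribute set are invented for illustration.

```python
import numpy as np

# Hypothetical conditional probabilities P(attribute detected | scene),
# chosen for illustration only (the paper learns its network from data).
P_ATTR_GIVEN_SCENE = {
    # attribute: (P(detected | indoor), P(detected | outdoor))
    "sky":   (0.05, 0.70),
    "grass": (0.03, 0.45),
}

def integrate(initial_outdoor_prob, detected):
    """Combine an initial SVM-based P(outdoor) with binary attribute detections
    using a naive-Bayes-style update (a simplification of a Bayesian network)."""
    log_post = {
        "indoor": np.log(1.0 - initial_outdoor_prob + 1e-9),
        "outdoor": np.log(initial_outdoor_prob + 1e-9),
    }
    for attr, present in detected.items():
        p_in, p_out = P_ATTR_GIVEN_SCENE[attr]
        log_post["indoor"] += np.log(p_in if present else 1.0 - p_in)
        log_post["outdoor"] += np.log(p_out if present else 1.0 - p_out)
    # Normalize the log-posteriors back to probabilities.
    m = max(log_post.values())
    z = sum(np.exp(v - m) for v in log_post.values())
    return {k: float(np.exp(v - m) / z) for k, v in log_post.items()}

print(integrate(0.55, {"sky": True, "grass": False}))
```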

9.
This paper presents a shift invariant scene classification method based on local autocorrelation of similarities with subspaces. Although conventional scene classification methods used bag-of-visual words for scene classification, superior accuracy of kernel principal component analysis (KPCA) of visual words to bag-of-visual words was reported. Here we also use KPCA of visual words to extract rich information for classification. In the original KPCA of visual words, all local parts mapped into subspace were integrated by summation to be robust to the order, the number, and the shift of local parts. This approach discarded the effective properties for scene classification such as the relation with neighboring regions. To use them, we use (normalized) local autocorrelation (LAC) feature of the similarities with subspaces (outputs of KPCA of visual words). The feature has both the relation with neighboring regions and the robustness to shift of objects in scenes. The proposed method is compared with conventional scene classification methods using the same database and protocol, and we demonstrate the effectiveness of the proposed method.

10.
A thousand words in a scene
This paper presents a novel approach for visual scene modeling and classification, investigating the combined use of text modeling methods and local invariant features. Our work attempts to elucidate (1) whether a text-like bag-of-visterms (BOV) representation (histogram of quantized local visual features) is suitable for scene (rather than object) classification, (2) whether some analogies between discrete scene representations and text documents exist, and (3) whether unsupervised, latent space models can be used both as feature extractors for the classification task and to discover patterns of visual co-occurrence. Using several data sets, we validate our approach, presenting and discussing experiments on each of these issues. We first show, with extensive experiments on binary and multiclass scene classification tasks using a 9,500-image data set, that the BOV representation consistently outperforms classical scene classification approaches. In other data sets, we show that our approach competes with or outperforms other recent, more complex methods. We also show that probabilistic latent semantic analysis (PLSA) generates a compact scene representation, is discriminative for accurate classification, and is more robust than the BOV representation when less labeled training data is available. Finally, through aspect-based image ranking experiments, we show the ability of PLSA to automatically extract visually meaningful scene patterns, making such representation useful for browsing image collections.

11.
In this paper, we present an effective and efficient framework for baseball video scene classification. The results of scene classification provide the basis for baseball video abstraction and high-level event extraction. Most conventional approaches are shot-based, in which shot change detection and key-frame extraction are necessary prerequisite procedures. In contrast, we propose a frame-based approach: our framework includes an efficient playfield segmentation technique, and the reduced field maps are then used as scene templates. Because shot change detection and key-frame extraction are not required, the framework is very simple and efficient. Experimental results demonstrate the effectiveness of the proposed framework for baseball video scene classification, and the template-based approach can easily be extended to other kinds of sports video.
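A rough sketch of what the playfield-segmentation and template-matching steps could look like with OpenCV; the HSV thresholds, field-map size, and template names are assumptions, not values from the paper.

```python
import cv2
import numpy as np

def playfield_map(frame_bgr, map_size=(16, 16)):
    """Segment grass-colored pixels and reduce the mask to a small field map
    usable as a frame-level scene template."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    # Rough green range for grass; tune per broadcast conditions.
    mask = cv2.inRange(hsv, (35, 60, 40), (85, 255, 255))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    # Downsample to a coarse field-coverage map (values in [0, 1]).
    reduced = cv2.resize(mask.astype(np.float32) / 255.0, map_size,
                         interpolation=cv2.INTER_AREA)
    return reduced

def classify_frame(frame_bgr, templates):
    """Assign a frame to the scene whose template field map is closest (L2 distance)."""
    fmap = playfield_map(frame_bgr)
    return min(templates, key=lambda name: np.linalg.norm(fmap - templates[name]))

# Usage (template files and frame are placeholders):
# templates = {"pitch_view": np.load("pitch.npy"), "outfield": np.load("outfield.npy")}
# label = classify_frame(cv2.imread("frame.jpg"), templates)
```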

12.
In the era of artificial intelligence, with massive data and strong computing power, audio scene classification has become an important part of scene understanding. To address the difficulty of modeling audio scenes and the limited accuracy of existing approaches, this paper proposes a model combining a convolutional neural network with the extreme gradient boosting (XGBoost) algorithm. The preprocessed audio signal is first converted into a mel spectrogram, which is fed to a convolutional neural network for abstract feature extraction, and the extracted features are then classified with XGBoost. Classification performance is evaluated on the UrbanSound8K urban audio scene dataset; the hybrid model reaches 89% classification accuracy, outperforming conventional neural network models and demonstrating its effectiveness for audio scene classification.
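A small sketch of the mel-spectrogram-plus-XGBoost pipeline using librosa and xgboost; for brevity, mean/std pooling of the log-mel spectrogram stands in for the paper's CNN feature extractor, and the file paths and hyperparameters are placeholders.

```python
import numpy as np
import librosa
import xgboost as xgb

def mel_features(wav_path, sr=22050, n_mels=64):
    """Load audio, compute a log-mel spectrogram, and pool it over time.
    (The paper uses a CNN for this step; mean/std pooling stands in here.)"""
    y, sr = librosa.load(wav_path, sr=sr, mono=True)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel)
    return np.concatenate([log_mel.mean(axis=1), log_mel.std(axis=1)])

def train(paths, labels):
    """Fit an XGBoost classifier on pooled log-mel features."""
    X = np.stack([mel_features(p) for p in paths])
    clf = xgb.XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.1)
    clf.fit(X, np.asarray(labels))
    return clf

# Usage (paths/labels are placeholders for UrbanSound8K files):
# clf = train(["dog_bark.wav", "siren.wav"], [0, 1])
# print(clf.predict([mel_features("street_music.wav")]))
```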

13.
In classic pattern recognition problems, classes are mutually exclusive by definition. Classification errors occur when the classes overlap in the feature space. We examine a different situation, occurring when the classes are, by definition, not mutually exclusive. Such problems arise in semantic scene and document classification and in medical diagnosis. We present a framework to handle such problems and apply it to the problem of semantic scene classification, where a natural scene may contain multiple objects such that the scene can be described by multiple class labels (e.g., a field scene with a mountain in the background). Such a problem poses challenges to the classic pattern recognition paradigm and demands a different treatment. We discuss approaches for training and testing in this scenario and introduce new metrics for evaluating individual examples, class recall and precision, and overall accuracy. Experiments show that our methods are suitable for scene classification; furthermore, our work appears to generalize to other classification problems of the same nature.
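The evaluation ideas mentioned above (per-class recall and precision, per-example scoring, overall accuracy for non-mutually-exclusive labels) correspond to standard multi-label metrics; here is a small scikit-learn sketch with made-up labels, noting that the paper's exact metric definitions may differ.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, accuracy_score, jaccard_score

# Made-up ground truth and predictions over 3 labels (e.g. field, mountain, beach).
y_true = np.array([[1, 1, 0],
                   [0, 1, 0],
                   [1, 0, 1]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0]])

# Per-class recall and precision (one value per label).
print("class recall:     ", recall_score(y_true, y_pred, average=None))
print("class precision:  ", precision_score(y_true, y_pred, average=None, zero_division=0))
# Exact-match accuracy: every label of the example must be correct.
print("exact match:      ", accuracy_score(y_true, y_pred))
# Example-based (Jaccard) score: partial credit for partially correct label sets.
print("jaccard (samples):", jaccard_score(y_true, y_pred, average="samples"))
```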

14.
王艳玲  张玘  罗诗途 《微计算机信息》2007,23(34):220-221,259
To solve the problem of classifying scenes in the vehicle-mounted image tracking system of an armored vehicle, a scene image classification method based on fuzzy pattern recognition is proposed. A feature vector is first constructed to describe the scene image, vector membership functions are designed for the selected feature parameters, and the scene image is then classified by evaluating these membership functions. Experiments show that the scene classification results provide useful guidance for choosing the target extraction method in the tracking system.
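A toy sketch of fuzzy-membership scene classification; the features, triangular membership functions, and class prototypes are invented for illustration and are not those of the paper.

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function peaking at b over the support [a, c]."""
    return np.clip(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0, 1.0)

# Hypothetical classes described by membership functions over two features:
# mean gray level and edge density (both scaled to [0, 1]).
CLASSES = {
    "sky_background":   lambda f: min(tri(f[0], 0.5, 0.8, 1.0), tri(f[1], 0.0, 0.1, 0.3)),
    "ground_clutter":   lambda f: min(tri(f[0], 0.2, 0.45, 0.7), tri(f[1], 0.3, 0.6, 0.9)),
    "urban_structures": lambda f: min(tri(f[0], 0.3, 0.55, 0.8), tri(f[1], 0.5, 0.8, 1.0)),
}

def classify(features):
    """Pick the class with the largest membership (max-membership principle)."""
    memberships = {name: fn(features) for name, fn in CLASSES.items()}
    label = max(memberships, key=memberships.get)
    return label, memberships

print(classify(np.array([0.75, 0.15])))   # expected label: sky_background
```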

15.
To address the limited accuracy and robustness of surveillance-video scene recognition under single-modality features, a semi-supervised scene recognition system based on feature fusion is proposed. The model first uses pretrained convolutional neural networks to extract scene descriptors from video frames and from audio, then fuses them at the video level in a way tailored to scene recognition. A deep belief network is trained in an unsupervised manner and fine-tuned with supervision using a cost function that adds a relative-entropy (KL-divergence) regularization term; finally, the classification performance of the model is analyzed by simulation. The results show that the model effectively improves surveillance scene classification accuracy and meets public-security needs for automated structured analysis of massive surveillance video.

16.
A classification algorithm for small moving objects in video surveillance systems
A classification algorithm for small moving objects in video surveillance is presented. First, maximum mutual information is used to obtain a reliable, independent, and discriminative set of object features. A multi-class support vector machine organized as a directed acyclic graph then performs classification. The classifier is trained in two steps: a baseline classifier is first trained on scene-independent features, and it is then further trained with both scene-dependent and scene-independent features to improve its accuracy. Experimental results show that the algorithm not only meets the required classification accuracy but also adapts well to new scenes.
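A small sketch of the feature-selection-plus-SVM pipeline; scikit-learn's mutual_info_classif and its (internally one-vs-one) SVC stand in for the paper's maximum-mutual-information selection and DAG multi-class SVM, and the data are random placeholders.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 40))            # 300 objects, 40 candidate features
y = rng.integers(0, 4, size=300)          # 4 object classes (person, vehicle, ...)

# Step 1: keep the 10 features carrying the most mutual information with the label.
# Step 2: multi-class SVM (SVC trains one-vs-one pairs internally; a DAG-SVM is similar in spirit).
clf = make_pipeline(
    SelectKBest(mutual_info_classif, k=10),
    StandardScaler(),
    SVC(kernel="rbf", C=1.0, gamma="scale"),
)
clf.fit(X[:250], y[:250])
print("held-out accuracy:", clf.score(X[250:], y[250:]))
```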

17.
This article presents a deep learning-based Multi-scale Bag-of-Visual Words (MBVW) representation for scene classification of high-resolution aerial imagery. Specifically, the convolutional neural network (CNN) is introduced to learn and characterize the complex local spatial patterns at different scales. Then, the learnt deep features are exploited in a novel way to generate visual words. Moreover, the MBVW representation is constructed using the statistics of the visual word co-occurrences at different scales, which are derived from a training data set. We apply our technique to the challenging aerial scene data set: the University of California (UC) Merced data set consisting of 21 different aerial scene categories with sub-metre resolution. The experimental results show that the statistics of deeply described visual words can characterize the scene well and improve classification accuracy. It demonstrates that the proposed method is highly effective in the scene classification of high-resolution remote-sensing imagery.

18.
Joint scene classification and segmentation based on hidden Markov model
Scene classification and segmentation are fundamental steps for efficiently accessing, retrieving and browsing large amounts of video data. We have developed a scene classification scheme using a Hidden Markov Model (HMM)-based classifier. By utilizing the temporal behaviors of different scene classes, the HMM classifier can effectively classify presegmented clips into one of the predefined scene classes. In this paper, we describe three approaches for joint classification and segmentation based on HMMs, which search for the most likely class transition path using the dynamic programming technique. All these approaches utilize audio and visual information simultaneously. The first two approaches search for the optimal scene class transition based on the likelihood values computed for short video segments belonging to a particular class, but with different search constraints. The third approach searches for the optimal path in a super HMM built by concatenating the HMMs for the different scene classes.
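The dynamic-programming search for the most likely class-transition path can be sketched as a Viterbi pass over per-segment class log-likelihoods; the likelihoods and transition probabilities below are made up, and the per-class HMMs that would produce the likelihoods are omitted.

```python
import numpy as np

def best_class_path(log_lik, log_trans):
    """Viterbi search for the most likely scene-class sequence.
    log_lik:   (T, K) log-likelihood of each of T segments under each of K classes.
    log_trans: (K, K) log-probability of moving from class i to class j."""
    T, K = log_lik.shape
    score = np.full((T, K), -np.inf)
    back = np.zeros((T, K), dtype=int)
    score[0] = log_lik[0]
    for t in range(1, T):
        cand = score[t - 1][:, None] + log_trans        # (K previous, K next)
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_lik[t]
    # Backtrack the optimal path from the best final class.
    path = [int(score[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy example: 3 classes (news, commercial, sports), 5 presegmented clips.
log_lik = np.log(np.array([[0.7, 0.2, 0.1],
                           [0.6, 0.3, 0.1],
                           [0.2, 0.7, 0.1],
                           [0.1, 0.8, 0.1],
                           [0.1, 0.2, 0.7]]))
log_trans = np.log(np.full((3, 3), 0.2) + np.eye(3) * 0.4)  # sticky class transitions
print(best_class_path(log_lik, log_trans))   # prints [0, 0, 1, 1, 2]
```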

19.
Scene classification is a complicated task because it involves a great deal of content whose distribution is difficult to capture. A novel hierarchical serial scene classification framework is presented in this paper. First, we use hierarchical features to represent both the global scene and local patches containing specific objects; the hierarchy is expressed through spatial pyramid matching, and our own codebook is built from two different types of words. Second, we train the visual words by generative and discriminative methods respectively, based on spatial pyramid matching, which yields the local patch labels efficiently. Then, we use a neural network to simulate the human decision process, which derives the final scene category from the local labels. Experiments show that the hierarchical serial scene image representation and classification model achieves superior accuracy.

20.
刘宏  普杰信 《计算机工程》2011,37(21):182-184
Scene classification methods based on the global semantic scene descriptor gist require heavy computation during feature extraction and achieve relatively low recognition accuracy on natural scenes. To address this, an improved feature extraction method is proposed that combines a three-scale gist feature with a histogram of oriented gradients feature to describe the scene and uses a support vector machine for classification. Experimental results show that the improved method speeds up feature extraction and raises classification accuracy.
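A small sketch of the combined-descriptor-plus-SVM idea; since a gist implementation is not part of scikit-image, a coarse downsampled-intensity layout stands in for the gist component, and the HOG parameters and data are placeholders.

```python
import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from sklearn.svm import SVC

def scene_descriptor(gray):
    """Concatenate a HOG descriptor with a coarse global-layout summary.
    (A true multi-scale gist descriptor would replace the coarse summary.)"""
    gray = resize(gray, (128, 128), anti_aliasing=True)
    hog_feat = hog(gray, orientations=8, pixels_per_cell=(16, 16),
                   cells_per_block=(2, 2), feature_vector=True)
    layout = resize(gray, (8, 8), anti_aliasing=True).ravel()  # crude gist stand-in
    return np.concatenate([hog_feat, layout])

def train_scene_svm(images, labels):
    """images: list of 2-D grayscale arrays; labels: list of scene class ids."""
    X = np.stack([scene_descriptor(img) for img in images])
    clf = SVC(kernel="rbf", C=10.0, gamma="scale")
    clf.fit(X, np.asarray(labels))
    return clf

# Usage with random stand-in data (replace with real grayscale scene images):
rng = np.random.default_rng(0)
imgs = [rng.random((240, 320)) for _ in range(20)]
labels = [i % 4 for i in range(20)]
clf = train_scene_svm(imgs, labels)
print(clf.predict([scene_descriptor(imgs[0])])[0])
```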
