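For dynamic environments containing moving targets, a salient-feature extraction method based on dynamic-target filtering is proposed, and scene recognition based on local image features is built on it. The paper first briefly reviews scene recognition based on locally salient image features, then presents a salient-feature extraction framework with dynamic-target filtering and discusses in detail how moving-target detection and extraction are implemented. Experimental results and analysis show that the method effectively filters out moving targets in the environment and improves scene-recognition accuracy.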

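For robot scene recognition, the correlation between consecutive scenes is described by a contextual model based on a hidden Markov model (HMM). Unlike the conventional approach of learning the contextual scene-recognition model with a generative model, a sparse Bayesian learning machine is first introduced to model the posterior probability of image features in the contextual model. The sparse Bayesian model is then combined with the HMM through Bayes' rule, yielding a discriminative learning method for contextual scene-recognition models. Experiments on a database of real scenes show that the resulting contextual scene-recognition system has good recognition ability and generalization.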

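To meet the need for human action recognition in complex environments, a two-stream network architecture based on scene understanding is proposed. Scene information is added to the human action recognition network as auxiliary information to improve recognition accuracy. Different ways of fusing the scene-recognition network with the action-recognition network are studied to determine the best architecture. By analyzing how different parameters affect recognition accuracy, all structural parameters of the two-stream network are determined, and the network is designed and trained. Experiments on public datasets such as UCF50 and UCF101 yield accuracies of 95% and 93%, respectively, higher than the results of typical recognition networks. Adding the same scene information to several other typical recognition networks is also studied, and the results confirm that the approach effectively improves recognition accuracy.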

We describe an automated scene modeling system that consists of two components operating in an interleaved fashion: an incremental modeler that builds solid models from range imagery, and a sensor planner that analyzes the resulting model and computes the next sensor position. This planning component is target-driven and computes sensor positions using model information about the imaged surfaces and the unexplored space in a scene. The method is shape-independent and uses a continuous-space representation that preserves the accuracy of sensed data. It is able to completely acquire a scene by repeatedly planning sensor positions, utilizing a partial model to determine volumes of visibility for contiguous areas of the unexplored scene. These visibility volumes are combined with sensor placement constraints to compute sets of occlusion-free sensor positions that are guaranteed to improve the quality of the model. We show results for the acquisition of a scene that includes multiple, distinct objects with high occlusion.

Three-dimensional (3D) surface models are vital for sustainable urban management studies, and there is a nearly unlimited range of possible applications. Along- or across-track pairs from the same set of sensor imagery may not always be available or economical for a certain study area. Therefore, a photogrammetric approach is proposed in which a digital surface model (DSM) is extracted from a stereo pair of satellite images acquired by different sensors. The results demonstrate that a mixed-sensor approach may offer a sound alternative to the more established along-track pairs. However, one should consider several criteria when selecting a suitable stereo pair. Two cloud-free acquisitions are selected from the IKONOS and QuickBird image archives, characterized by sufficient overlap and optimal stereo constellation in terms of complementarity of the azimuth and elevation angles. A densely built-up area in Istanbul, Turkey, covering 151 km² and with elevations ranging between sea level and approximately 160 m, is presented as the test site. In addition to the general complexity of modelling the surface and elevation of an urban environment, multi-sensor image fusion has other particular difficulties. As the images are acquired from a different orbital pass, at a different date or instant and by a different sensor system, radiometric and geometric dissimilarities can occur, which may hamper the image-matching process. Strategies are presented for radiometric and geometric normalization of the multi-temporal and multi-sensor imagery and for dealing with the differences in sensor characteristics. The accuracy of the generated surface model is assessed in comparison with 3D reference points, 3D rooftop vector data and surface models extracted from an along-track IKONOS stereo pair and an IKONOS triplet. When compared with a set of 35 reference GPS check points, the produced mixed-sensor model yields accuracies of 1.22, 1.53 and 2.96 m for the X, Y and Z coordinates, respectively, expressed in terms of root mean square errors (RMSEs). The results show that it is feasible to extract the DSM of a highly urbanized area from a mixed-sensor pair, with accuracies comparable with those observed from the DSM extracted from an along-track pair. Hence, the flexibility of reconstructing valuable elevation models is greatly increased by considering the mixed-sensor approach.
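
The per-axis accuracy figures above are root mean square errors over check points. As a minimal sketch of that computation (the function name and the sample coordinates are illustrative assumptions, not data from the study):

```python
import math

def rmse_per_axis(estimated, reference):
    """Return (RMSE_X, RMSE_Y, RMSE_Z) for paired lists of 3D points.

    `estimated` holds coordinates read from the surface model and
    `reference` holds the matching check-point coordinates.
    """
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("need two non-empty point lists of equal length")
    n = len(estimated)
    squared_sums = [0.0, 0.0, 0.0]
    for est, ref in zip(estimated, reference):
        for axis in range(3):
            diff = est[axis] - ref[axis]
            squared_sums[axis] += diff * diff
    # RMSE = sqrt(mean of squared differences), computed independently per axis.
    return tuple(math.sqrt(s / n) for s in squared_sums)

# Hypothetical check points, for illustration only.
model_points = [(1.0, 2.0, 3.0), (3.0, 2.0, 1.0)]
gps_points = [(0.0, 2.0, 3.0), (2.0, 2.0, 3.0)]
rmse_x, rmse_y, rmse_z = rmse_per_axis(model_points, gps_points)
```

Reporting the three axes separately, as the abstract does, keeps the typically larger vertical error from being averaged into the planimetric error.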

This paper concerns a combined modelling approach developed for evaluating the performability of large-scale computer systems. It presents how to combine queuing networks and Generalised Stochastic Petri Nets in a single model, taking advantage of common abstract features. The main objective of this paper is to present a solution concept involving simulation with active objects that clearly reflects the logic of the simulated objects. Therefore, an initial outline is given of the active objects and their properties, services and structures, which together support combined modelling.

We investigate deadlock detection for a modeling language based on active objects. To detect deadlock in an Actor-like subset of Creol we focus on the communication between the active objects. For the analysis of the model we translate a Creol configuration to a process algebra featuring the Linda coordination primitives. The translation preserves the deadlock behaviour of the model and allows us to apply a formalism introduced by Busi et al. (2000) [3] to detect global deadlocks in the process algebra.

We propose a topology-adaptive active membrane that can segment images of multiple objects present in a scene. The parametric active membrane evolves in image space and splits into multiple membranes. The shape of the membrane can be constrained according to the shape of the objects present in a scene. We show that this active membrane model is also suitable for segmenting images of touching objects. The proposed segmentation technique unifies the membrane evolution and membrane splitting processes. The methodology is tested on a number of real images from biomedical and machine vision domains that demonstrate the efficacy of the proposed scheme.

Wu and coworkers introduced an active basis model (ABM) for object recognition in 2010, in which the learning algorithm tends to sketch edges in textures. A grey-value local power spectrum was used to find a common template and deformable templates from a set of training images and to detect an object in new images by template matching. In this paper, we propose a color-based active basis model (color-based ABM for short), which incorporates color information. We adopt the framework of Wu et al. in the learning, detection, and classification of the color-based ABM. However, in order to improve the performance in object recognition, we modify the framework of Wu et al. by using different color-based features in both the learning and template matching algorithms. In this color-based ABM approach, two types of learning (i.e., supervised learning and unsupervised learning) are also explored. Moreover, the usefulness of the color-based ABM for practical object recognition in computer vision applications is demonstrated and its significant improvement in recognizing objects is reported.

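Objective: Current text detection methods based on convolutional neural networks (CNNs) have great difficulty localizing small-scale text in natural scenes. However, text in natural scene images is strongly associated with other objects: it usually appears together with specific objects such as billboards and road signs. Based on this observation, this paper proposes a cascaded CNN method for natural-scene text detection that takes object association into account. Method: First, a CNN detects text objects and associated objects that contain text, producing text candidate boxes and candidate boxes for the text-bearing associated objects. The candidate regions of the associated objects are then enlarged and cropped from the original image, and the cropped images are fed to a CNN to detect text candidate boxes more precisely. Finally, non-maximum suppression merges the text candidate boxes generated in the two steps to obtain the detection result. Result: The method detects small-scale text effectively; on the ICDAR-2013 dataset, recall, precision, and F-measure are 0.817, 0.880, and 0.847, respectively. Conclusion: By exploiting the strong association between text and text-bearing objects in natural scenes, the method improves recall for small-scale text detection in natural scene images.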
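
The final fusion step relies on non-maximum suppression (NMS), and the reported F-measure follows from precision and recall. The sketch below shows a standard greedy NMS over axis-aligned boxes and the F-measure formula; the box format `(x1, y1, x2, y2)`, the 0.5 IoU threshold, and the sample boxes are assumptions made for illustration and are not taken from the paper.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep

def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Hypothetical candidates pooled from the two detection passes.
candidates = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
confidences = [0.9, 0.8, 0.7]
kept = nms(candidates, confidences)
```

With the precision and recall stated in the abstract, `f_measure(0.880, 0.817)` evaluates to roughly 0.847, which agrees with the reported F-measure.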

In this paper, we present a simple yet very efficient global image representation for scene recognition. A scene image is represented by a histogram of local transforms, which is an extended version of the census transform histogram. The local transforms include local difference sign and magnitude information. Due to strong constraints between neighboring transformed values, global structure information can be captured through the histogram and spatial pyramid representation. Principal component analysis is used to reduce the dimensionality and obtain a compact feature vector. Experimental results on three widely used datasets demonstrate that the proposed method achieves competitive performance in terms of speed and accuracy.
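
For orientation, the baseline that this representation extends is the census transform histogram. The following is a minimal NumPy sketch of that baseline only; it encodes the sign of local differences and omits the magnitude information, spatial pyramid, and PCA steps described above. The `center >= neighbor` bit convention and the function names are assumptions.

```python
import numpy as np

def census_transform(image):
    """8-bit census code for every interior pixel of a 2D grayscale image.

    Each bit records whether the center pixel is >= one of its 8 neighbors,
    visiting neighbors in row-major order (top-left first).
    """
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    for dy, dx in offsets:
        neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Shift previous bits left and append the new comparison bit.
        codes = (codes << 1) | (center >= neighbor).astype(np.uint8)
    return codes

def census_histogram(image):
    """Normalized 256-bin histogram of census codes (a global descriptor)."""
    codes = census_transform(image)
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()

# Small synthetic image used only to exercise the functions.
demo = np.arange(25, dtype=np.float64).reshape(5, 5)
descriptor = census_histogram(demo)
```

Because each code depends only on the ordering of neighboring intensities, the histogram is unaffected by monotonic illumination changes, which is one reason census-style descriptors are popular for scene recognition.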

This paper proposes a novel method based on Spectral Regression (SR) for efficient scene recognition. First, a new SR approach, called Extended Spectral Regression (ESR), is proposed to perform manifold learning on a huge number of data samples. Then, an efficient Bag-of-Words (BOW) based method is developed which employs ESR to encapsulate local visual features with their semantic, spatial, scale, and orientation information for scene recognition. In many applications, such as image classification and multimedia analysis, there are a huge number of low-level feature samples in a training set. This prohibits direct application of SR to perform manifold learning on such a dataset. In ESR, we first group the samples into tiny clusters, and then devise an approach to reduce the size of the similarity matrix for graph learning. In this way, subspace learning on the graph Laplacian for a vast dataset becomes computationally feasible on a personal computer. In the ESR-based scene recognition, we first propose an enhanced low-level feature representation which combines the scale, orientation, spatial position, and local appearance of a local feature. Then, ESR is applied to embed the enhanced low-level image features. The ESR-based feature embedding not only generates a low-dimensional feature representation but also integrates various aspects of low-level features into the compact representation. The bag-of-words is then generated from the embedded features for image classification. Comparative experiments on open benchmark datasets for scene recognition demonstrate that the proposed method outperforms baseline approaches. It is suitable for real-time applications on mobile platforms, e.g. tablets and smartphones.

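Objective: Scene text recognition (STR) is an active research area in computer vision. Recently, vision Transformer (ViT) models based on multi-head self-attention have been proposed for STR to balance accuracy, speed, and computational load. However, no mechanism guarantees that different self-attention heads actually capture diverse features, which causes ViT models using multi-head self-attention to perform poorly on highly diverse scene text recognition tasks. To address this, a novel orthogonal constraint is proposed that explicitly increases diversity among the self-attention heads and improves the ability of multi-head self-attention to capture information from different subspaces, further raising accuracy while preserving speed and computational efficiency. Method: First, an orthogonal constraint is proposed on the Q (query), K (key), and V (value) features of different self-attention heads, so that different heads attend to features in different query, key, and value subspaces; attending to different subspaces explicitly drives the heads to capture more distinct features. An orthogonal constraint is also proposed on the weights of the linear transformations that produce the Q, K, and V features of different heads; this provides a solution in an orthogonal weight space for learning the Q, K, and V features and acts as an implicit regularizer during training. Result: Compared with the baseline method on seven datasets, accuracy improves by 0.5% on the regular dataset Street View Text (SVT), by 1.1% on the irregular dataset CUTE80 (CT), and by 0.5% overall across the seven public datasets. Conclusion: The proposed plug-and-play orthogonal constraint improves the feature-capturing ability of multi-head self-attention in STR and raises the recognition accuracy of ViT models on STR tasks. The code is publicly available at https://github.com/lexiaoyuan/XViTSTR.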
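
To make the idea of an orthogonal constraint across heads concrete, the sketch below computes a generic soft orthogonality penalty: each head's feature tensor is flattened and L2-normalized, and the squared Frobenius distance between the resulting Gram matrix and the identity is returned. This is an assumed, commonly used formulation written in NumPy for illustration; it is not taken from the paper or its repository, and the function name and tensor layout `(heads, tokens, dim)` are hypothetical.

```python
import numpy as np

def head_orthogonality_penalty(features, eps=1e-12):
    """Soft orthogonality penalty across attention heads.

    `features` has shape (heads, tokens, dim), e.g. the Q, K, or V features
    of one layer. Returns ||G - I||_F^2, where G is the Gram matrix of the
    flattened, L2-normalized per-head features. The value is 0 when all
    heads are mutually orthogonal and grows as heads become similar.
    """
    heads = features.shape[0]
    flat = features.reshape(heads, -1).astype(np.float64)
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    flat = flat / np.maximum(norms, eps)
    gram = flat @ flat.T
    return float(np.sum((gram - np.eye(heads)) ** 2))

# Two hypothetical heads, one token, two feature dimensions.
orthogonal_heads = np.array([[[1.0, 0.0]], [[0.0, 1.0]]])
identical_heads = np.array([[[1.0, 0.0]], [[1.0, 0.0]]])
low = head_orthogonality_penalty(orthogonal_heads)
high = head_orthogonality_penalty(identical_heads)
```

In a training setup, a term like this would typically be added to the task loss with a small weight, and the same function could be applied to the per-head projection weights instead of the features.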

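An improved hidden Markov model (HMM) algorithm is introduced. According to the complexity of moving-target trajectories in real scenes, a corresponding HMM is built for each trajectory pattern class, and training samples are used to obtain reliable model parameters. The maximum likelihood of each test sample under every model is computed, and the trajectory pattern class with the highest probability is taken as the recognition result. Clustered trajectories from two scenes are used for training and recognition. Experimental results show average recognition rates of 87.76% and 94.19%, respectively.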
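
The recognition rule described above, scoring a sequence under one HMM per class and picking the best, can be sketched with the standard scaled forward algorithm. The code assumes discrete observation symbols and already-trained parameters; it does not reproduce the paper's improved algorithm, and the class names and parameter values are hypothetical.

```python
import numpy as np

def log_likelihood(obs, start, trans, emit):
    """Log P(obs | model) for a discrete HMM via the scaled forward algorithm.

    start: (S,) initial state probabilities
    trans: (S, S) transition matrix, trans[i, j] = P(next=j | current=i)
    emit:  (S, M) emission matrix, emit[i, k] = P(symbol=k | state=i)
    """
    alpha = start * emit[:, obs[0]]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = (alpha @ trans) * emit[:, obs[t]]
        scale = alpha.sum()
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_prob += np.log(scale)
        alpha = alpha / scale  # rescale to avoid numerical underflow
    return float(log_prob)

def classify(obs, models):
    """Return the name of the model under which `obs` is most likely."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))

# Two hypothetical single-state trajectory models over two symbols.
models = {
    "pattern_a": (np.array([1.0]), np.array([[1.0]]), np.array([[0.9, 0.1]])),
    "pattern_b": (np.array([1.0]), np.array([[1.0]]), np.array([[0.1, 0.9]])),
}
label = classify([0, 0, 0], models)
```

Working in log space with per-step rescaling matters for trajectories, since raw forward probabilities shrink geometrically with sequence length.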

Object tracking is an important task in computer vision that is essential for higher level vision applications such as surveillance systems, human-computer interaction, industrial control, smart compression of video, and robotics. Tracking, however, cannot be easily accomplished due to challenges such as real-time processing, occlusions, changes in intensity, abrupt motions, variety of objects, and mobile platforms. In this paper, we propose a new method to estimate and eliminate the camera motion on mobile platforms, and accordingly, we propose a set of optimal feature points for accurate tracking. Experimental results on different videos show that the proposed method estimates camera motion very well and eliminates its effect on tracking moving objects, and that the use of optimal feature points yields promising tracking results. In terms of accuracy and processing time, the proposed method compares favorably with state-of-the-art methods.

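A projective-invariant recognition method for planar objects based on steerable filters is proposed. The method first uses steerable filters to detect the edge-orientation features of a planar object, extracts features of the object that are invariant under projective transformation from a single image, and builds a canonical frame; the object is then recognized using the moments of the image that fills the canonical frame. Because the method recognizes objects from local image information, it tolerates partial occlusion in the scene.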

