Similar Documents
 20 similar documents found (search time: 156 ms)
1.
Existing scene classification methods can only recognize the image classes present in the original training set; to recognize a newly added class, the model must be retrained on the merged set of old and new classes. Building on LDA (Latent Dirichlet Allocation), this paper proposes an improved way to train a generative model for natural-image scene classification. Exploiting the pseudo-count role of the Dirichlet parameters, the LDA learning procedure is modified: the generic topic prior estimated from the training images is used as the preset prior over topic distributions for each scene class, from which the per-class topic composition is derived. This also alleviates the slow convergence of EM parameter estimation, enables incremental learning of the model, and effectively improves generalization. The method is validated through a comparison of computational complexity and incremental-learning experiments, which show that it achieves high average classification accuracy at low time complexity.
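The Dirichlet pseudo-count idea this abstract builds on can be sketched in a few lines (illustrative values only, not the authors' code): a Dirichlet prior behaves like pseudo counts, so topic statistics learned from the original classes can be reused as the prior when a new scene class arrives, instead of retraining on old and new classes together.

```python
import numpy as np

def posterior_topic_dist(counts, alpha):
    """Mean of the Dirichlet posterior: (counts + alpha) / total."""
    post = counts + alpha
    return post / post.sum()

# generic prior estimated from the original training classes
alpha0 = np.array([2.0, 2.0, 2.0])
# topic counts observed for a newly added scene class
new_counts = np.array([30.0, 5.0, 1.0])

theta_new = posterior_topic_dist(new_counts, alpha0)
```

The new class's topic distribution is obtained without touching the previously trained classes, which is the incremental-learning property the abstract claims.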

2.
Scene image classification based on content correlation   Cited by: 4 (self-citations: 0, other citations: 4)
Scene image classification is a fundamental problem in computer vision. This paper proposes a classification method based on content correlation. First, visual words are extracted from the image, and the image is represented as a vector of visual-word frequencies. A generative model then learns the topics contained in the training set and the topics relevant to each image. Finally, a discriminative classifier performs multi-class learning. The proposed method models the correlations among topics with a logistic normal distribution, making the learned per-class topic distributions more accurate, and the learning process requires no manual annotation of image content. A new local-region descriptor combining gradient and color information is also proposed. Experiments on collections of natural and man-made scene images show better results than traditional methods.
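The visual-word frequency vector this abstract starts from can be sketched as follows (an illustration with a hand-made two-word codebook; a real system would learn the codebook by clustering local descriptors):

```python
import numpy as np

def bovw_histogram(descriptors, codebook):
    """Quantise local descriptors to the nearest visual word and
    return the normalised word-frequency vector."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

codebook = np.array([[0.0, 0.0], [10.0, 10.0]])   # two toy visual words
descs = np.array([[0.1, -0.2], [9.8, 10.1], [0.0, 0.3], [10.2, 9.9]])
hist = bovw_histogram(descs, codebook)            # -> [0.5, 0.5]
```

The resulting vector is what the generative topic model and the discriminative classifier in the abstract consume.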

3.
Objective: Existing visual question answering methods usually attend only to the visual objects in an image and neglect its key textual content, which limits the depth and precision of image understanding. Given the importance of the text embedded in images, researchers proposed the "scene text visual question answering" task to quantify a model's understanding of scene text, along with the benchmark datasets TextVQA (text visual question answering) and ST-VQA (scene text visual question answering). Focusing on this task, and addressing the performance bottleneck caused by the overfitting risk of existing self-attention-based methods, this paper proposes a multimodal Transformer method fusing knowledge representation for scene text VQA, which effectively improves robustness and accuracy. Method: The baseline model M4C (multimodal multi-copy mesh) is improved by modeling two complementary kinds of prior knowledge: "spatial relations" between visual objects and "semantic relations" between text tokens. On this basis, a general knowledge-representation-enhanced attention module is designed to encode both relations in a unified way, yielding the knowledge-representation-enhanced KR-M4C (knowledge-representation-enhan...

4.
Most research on deep learning is based on neural networks, i.e., multi-layer parameterized nonlinear differentiable modules trainable by back-propagation. In recent years, deep forest has been proposed as a non-neural deep model with far fewer hyperparameters than deep neural networks. It performs robustly across different hyperparameter settings and tasks, and can determine model complexity from the data. Research on deep forests, represented by gcForest, offers a feasible way to explore deep models built from non-differentiable modules. However, deep forest is currently a batch learning method, which limits its use in many practical tasks such as data-stream applications. This work therefore explores building deep forests in an incremental setting and proposes the Mondrian deep forest. It has a cascade forest structure for layer-by-layer processing, together with an adaptive mechanism that weighs the original features against the features transformed by the previous layer, further strengthening layer-by-layer processing and better overcoming Mondrian forests' weakness in handling irrelevant features. Experimental results show that the Mondrian deep forest inherits the incremental training ability of Mondrian forests while significantly improving predictive performance, and achieves good results on multiple datasets with the same hyperparameter settings. In the incremental setting, it attains prediction accuracy close to a periodically retrained gcForest while training an order of magnitude faster.

5.
This paper describes the basic idea of building virtual scenes with a hybrid of graphics and images, processing different objects in the scene in different ways. The overall panoramic view of the scene is generated with image-based techniques, mainly using high-quality photographs of the real scene to achieve the most realistic visual effect without complex modeling. The parts of the scene the user interacts with are handled with geometric models: objects the user must manipulate in the virtual environment are modeled geometrically, which improves the sense of immersion. The two modeling approaches, graphics-based and image-based, are compared, and a hybrid scene-construction method is studied that both improves realism and speeds up rendering.

6.
Scene classification based on block-level latent semantics   Cited by: 4 (self-citations: 0, other citations: 4)
曾璞  吴玲达  文军 《计算机应用》2008,28(6):1537-1539
This paper proposes a scene classification method based on block-level latent semantics. The image is first divided into uniform blocks, each described by the frequencies of the visual words occurring within it. Probabilistic latent semantic analysis (PLSA) is then applied to the set of blocks to discover a latent semantic model, and finally the occurrences of latent topics across the image's blocks are extracted from that model for scene classification. Experiments on 13 scene categories show higher classification accuracy than other methods.

7.
An image understanding method based on rough sets is proposed. An image is viewed as an information system, with each pixel treated as an entity in that system. Using the concepts of upper/lower approximation and core attributes from rough set theory, together with knowledge reduction under the tolerance extension model, the three stages of image processing, analysis, and interpretation are analyzed, and a rough-set-based segmentation algorithm and a knowledge-base rule-reduction inference method are proposed. Experimental comparisons with Ncuts segmentation and with statistical-learning-based understanding demonstrate the feasibility of the algorithm and the accuracy of the resulting interpretation.

8.
For fire detection in large, complex scenes such as forests, this paper proposes a smoke detection method operating on single frames of a video sequence, develops a new superpixel merging algorithm, and improves an existing skyline (sky-ground boundary) detection algorithm. The image is segmented into superpixels with SLIC (Simple Linear Iterative Clustering), and the new merging algorithm resolves the resulting over-segmentation. The improved skyline segmentation removes the interference of clouds in the sky with smoke detection, and a support vector machine (SVM) classifies superpixel blocks by their spectral features. Experimental results show that the merging algorithm is efficient, concise, and easy to implement; the segmentation-based smoke detector suppresses noise such as cloud and fog, achieves a 77% smoke-detection accuracy in forest scenes, and can serve as an aid to manual forest-fire monitoring.
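The abstract does not spell out the paper's merging algorithm, but the general idea of merging adjacent over-segmented regions can be sketched with union-find over a superpixel label map (an illustration only; initial mean colours are not recomputed after merges, a deliberate simplification):

```python
import numpy as np

def merge_superpixels(labels, image, thresh=20.0):
    """Greedily merge 4-adjacent superpixels whose initial mean
    colours differ by less than `thresh` per channel."""
    ids = np.unique(labels)
    mean = {i: image[labels == i].mean(axis=0) for i in ids}
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    h, w = labels.shape
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0)):           # right and down neighbours
                if y + dy < h and x + dx < w:
                    ra, rb = find(labels[y, x]), find(labels[y + dy, x + dx])
                    if ra != rb and np.abs(mean[ra] - mean[rb]).max() < thresh:
                        parent[rb] = ra               # merge the two regions
    return np.vectorize(find)(labels)

# toy 4x4 label map: top half dark, bottom half bright
labels = np.array([[0, 0, 1, 1], [0, 0, 1, 1], [2, 2, 3, 3], [2, 2, 3, 3]])
image = np.zeros((4, 4, 3)); image[:2] = 10.0; image[2:] = 200.0
merged = merge_superpixels(labels, image)
```

Regions 0/1 and 2/3 collapse into two regions, while the dark and bright halves stay separate.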

9.
Images and videos captured in adverse scenes suffer complex visual degradation, which both harms visual presentation and perceptual experience and greatly complicates visual analysis and understanding. This paper systematically reviews important recent international and domestic progress in visual perception and understanding under adverse scenes, covering image/video degradation modeling, visual enhancement for adverse scenes, and visual analysis and understanding under adverse scenes. The degradation-modeling part discusses methods for modeling images, videos, and degradation processes in different scenarios, covering noise modeling, downsampling modeling, illumination modeling, and rain/fog modeling. The part on traditional enhancement discusses early non-deep-learning algorithms, including histogram equalization, Retinex theory, and filtering methods. The part on deep-learning-based enhancement is organized from the perspective of architectural innovation, covering convolutional neural networks, Transformer models, and diffusion models. Unlike traditional enhancement, whose goal is to improve human visual perception of images and videos across the board, the new generation of enhancement and analysis methods targets machine-vision understanding performance under degraded conditions. The part on visual understanding in adverse scenes discusses datasets, deep-learning-based understanding, and the joint computation of enhancement and understanding in adverse scenes. The paper surveys the challenges of this research in detail, traces the development of the field at home and abroad, and, based on the above analysis, concludes with an outlook on future directions for visual perception and understanding in adverse scenes.

10.
Single-image depth estimation is an important technique for obtaining scene depth from images in 3D reconstruction and a classic problem in computer vision; in recent years, supervised-learning-based single-image depth estimation has developed rapidly. This paper introduces supervised single-image depth estimation together with its models and optimization methods; analyzes the three existing families of approaches (parametric learning, non-parametric learning, and deep learning), their state of research at home and abroad, and their strengths and weaknesses; and concludes that single-image depth estimation in the deep-learning framework is the trend and focus of future research.

11.
We present a method for reshuffle-based 3D interior scene synthesis guided by scene structures. Given several 3D scenes, we form each 3D scene as a structure graph associated with a relationship set. Considering both the object similarity and relation similarity, we then establish a furniture-object-based matching between scene pairs via graph matching. Such a matching allows us to merge the structure graphs into a unified structure, i.e., Augmented Graph (AG). Guided by the AG, we perform scene synthesis by reshuffling objects through three simple operations, i.e., replacing, growing and transfer. A synthesis compatibility measure considering the environment of the furniture objects is also introduced to filter out poor-quality results. We show that our method is able to generate high-quality scene variations and outperforms the state of the art.

12.
In this paper, we present an approach for consistently labeling people and for detecting human–object interactions using mono-camera surveillance video. The approach is based on a robust appearance-based correlogram model combined with histogram information to model color distributions of people and objects in the scene. The models are dynamically built from non-stationary objects, which are the outputs of background subtraction, and are used to identify objects on a frame-by-frame basis. We are able to detect when people merge into groups and to segment them even during partial occlusion. We can also detect when a person deposits or removes an object. The models persist when a person or object leaves the scene and are used to identify them when they reappear. Experiments show that the models are able to accommodate perspective foreshortening that occurs with overhead camera angles, as well as partial occlusion. The results show that this is an effective approach that is able to provide important information to algorithms performing higher-level analysis, such as activity recognition, where human–object interactions play an important role.
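The paper's correlogram model additionally captures spatial co-occurrence of colours; as a simpler stand-in, the histogram component of such an appearance model can be sketched with a quantised colour histogram and a Bhattacharyya similarity (illustrative code, not the authors' implementation):

```python
import numpy as np

def color_hist(pixels, bins=8):
    """Quantise RGB pixels (n, 3) into bins**3 cells; return a
    normalised colour histogram."""
    q = (pixels // (256 // bins)).astype(int)
    idx = (q[:, 0] * bins + q[:, 1]) * bins + q[:, 2]
    h = np.bincount(idx, minlength=bins ** 3).astype(float)
    return h / h.sum()

def bhattacharyya(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical colour distributions."""
    return float(np.sqrt(h1 * h2).sum())

red = np.tile([[200, 10, 10]], (50, 1))
blue = np.tile([[10, 10, 200]], (50, 1))
same = bhattacharyya(color_hist(red), color_hist(red))    # ~1.0
diff = bhattacharyya(color_hist(red), color_hist(blue))   # ~0.0
```

Re-identifying a person who reappears then amounts to matching the stored model against the histograms of new foreground blobs.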

13.
Depth estimation from image structure   Cited by: 4 (self-citations: 0, other citations: 4)
In the absence of cues for absolute depth measurement such as binocular disparity, motion, or defocus, the absolute distance between the observer and a scene cannot be measured. The interpretation of shading, edges, and junctions may provide a 3D model of the scene but it will not provide information about the actual "scale" of the space. One possible source of information for absolute depth estimation is the image size of known objects. However, object recognition, under unconstrained conditions, remains difficult and unreliable for current computational approaches. We propose a source of information for absolute depth estimation based on the whole scene structure that does not rely on specific objects. We demonstrate that, by recognizing the properties of the structures present in the image, we can infer the scale of the scene and, therefore, its absolute mean depth. We illustrate the value of computing the mean depth of the scene with applications to scene recognition and object detection.

14.
An executing object-oriented program has a complex structure consisting of numerous objects connected by inter-object references. This structure, called the program's object graph, is hard to understand, and this complicates learning, teaching, debugging and maintaining object-oriented programs. While visualization can be used to display object graphs, the size and complexity of typical object graphs also makes visualization difficult. We have developed ownership trees as a simple yet powerful method of extracting a program's implicit encapsulation structure from its object graph. We have developed a program visualization tool that makes use of ownership trees to display the structure of object-oriented programs. Because ownership trees are independent of scale (the relationship between a whole object-oriented system and its top-level components is the same as the relationship between a low-level data structure and the objects that implement it), our software visualization is applicable at all levels of abstraction within a program's design.

15.
16.
We present a data-driven method for synthesizing 3D indoor scenes by inserting objects progressively into an initial, possibly empty, scene. Instead of relying on few hundreds of hand-crafted 3D scenes, we take advantage of existing large-scale annotated RGB-D datasets, in particular, the SUN RGB-D database consisting of 10,000+ depth images of real scenes, to form the prior knowledge for our synthesis task. Our object insertion scheme follows a co-occurrence model and an arrangement model, both learned from the SUN dataset. The former elects a highly probable combination of object categories along with the number of instances per category while a plausible placement is defined by the latter model. Compared to previous works on probabilistic learning for object placement, we make two contributions. First, we learn various classes of higher-order object-object relations including symmetry, distinct orientation, and proximity from the database. These relations effectively enable considering objects in semantically formed groups rather than by individuals. Second, while our algorithm inserts objects one at a time, it attains holistic plausibility of the whole current scene while offering controllability through progressive synthesis. We conducted several user studies to compare our scene synthesis performance to results obtained by manual synthesis, state-of-the-art object placement schemes, and variations of parameter settings for the arrangement model.

17.
Line segments are basic elements of geometric shapes and carry very rich geometric information. Extracting complete, continuous, and semantically meaningful line segments from images is important for recovering the geometric structure of a scene. This paper proposes a multi-resolution line-segment extraction method and performs semantic analysis on the segments to distinguish contour segments from texture segments. The method first extracts segments using a multi-resolution strategy, then applies deep neural network techniques for semantic analysis of the segments, and finally clusters and merges the segments to obtain the final result. In terms of segment continuity and completeness, the proposed method shows clear advantages over commonly used extraction methods; in terms of semantic-analysis accuracy, it reaches a pixel accuracy of 97.82% on the test set.

18.
Automatic Creation of Object Hierarchies for Ray Tracing   Cited by: 7 (self-citations: 0, other citations: 7)
Intersection calculations dominate the run time of canonical ray tracers. A common algorithm to reduce the number of intersection tests required is the intersection of rays with a tree of extents, rather than the whole database of objects. A shortcoming of this method is that these trees are difficult to generate. Additionally, manually generated trees can be poor, greatly reducing the run-time improvement available. We present methods for evaluating these trees in terms of the approximate number of intersection calculations required and for automatically generating good trees. These methods run in O(nlogn) expected time where n is the number of objects in the scene. We report some examples of speedups.
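The automatic construction of an extent tree can be sketched with a median-split build over axis-aligned bounding boxes (an illustrative stand-in; the paper's generator is driven by its intersection-cost evaluation, which this sketch omits):

```python
import numpy as np

def build_tree(boxes, idx=None):
    """Median-split hierarchy over per-object AABBs.
    boxes: array of shape (n, 2, 3) holding (min corner, max corner)."""
    if idx is None:
        idx = np.arange(len(boxes))
    # node extent = union of the extents of the objects below it
    node = {"bbox": (boxes[idx, 0].min(0), boxes[idx, 1].max(0))}
    if len(idx) <= 2:                          # small leaves
        node["leaf"] = idx
        return node
    centers = boxes[idx].mean(1)               # box centres
    axis = int(np.ptp(centers, 0).argmax())    # split along the widest axis
    order = idx[np.argsort(centers[:, axis])]
    mid = len(order) // 2
    node["left"] = build_tree(boxes, order[:mid])
    node["right"] = build_tree(boxes, order[mid:])
    return node

rng = np.random.default_rng(1)
mins = rng.random((8, 3))
boxes = np.stack([mins, mins + 0.1], axis=1)   # eight toy unit-ish boxes
root = build_tree(boxes)
```

A ray is then tested against a node's extent first and descends only into children whose extents it hits, which is the pruning the abstract describes.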

19.
Segmentation and classification of range images   Cited by: 2 (self-citations: 0, other citations: 2)
The recognition of objects in three-dimensional space is a desirable capability of a computer vision system. Range images, which directly measure 3-D surface coordinates of a scene, are well suited for this task. In this paper we report a procedure to detect connected planar, convex, and concave surfaces of 3-D objects. This is accomplished in three stages. The first stage segments the range image into "surface patches" by a square error criterion clustering algorithm using surface points and associated surface normals. The second stage classifies these patches as planar, convex, or concave based on a non-parametric statistical test for trend, curvature values, and eigenvalue analysis. In the final stage, boundaries between adjacent surface patches are classified as crease or noncrease edges, and this information is used to merge compatible patches to produce reasonable faces of the object(s). This procedure has been successfully applied to a large number of real and synthetic images, four of which we present in this paper.
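The first stage, clustering surface points together with their normals, can be sketched with a plain k-means on concatenated point-and-normal features (a simplified stand-in for the paper's square-error clustering; the normal weight and toy data are invented):

```python
import numpy as np

def kmeans(X, k, iters=25):
    C = X[np.linspace(0, len(X) - 1, k).astype(int)]   # deterministic init
    for _ in range(iters):
        lab = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(1)
        C = np.array([X[lab == j].mean(0) if (lab == j).any() else C[j]
                      for j in range(k)])
    return lab

def segment(points, normals, k, w=2.0):
    # feature = 3-D surface point concatenated with its weighted unit normal
    return kmeans(np.hstack([points, w * normals]), k)

# two toy planar patches: z = 0 with normals +z, and x = 0 with normals +x
g = np.mgrid[0:3, 0:3].reshape(2, -1).T.astype(float)
points = np.vstack([np.c_[g, np.zeros(9)], np.c_[np.zeros(9), g]])
normals = np.vstack([np.tile([0.0, 0.0, 1.0], (9, 1)),
                     np.tile([1.0, 0.0, 0.0], (9, 1))])
labels = segment(points, normals, k=2, w=10.0)
```

With the normals weighted strongly, the two planes separate cleanly even where their points nearly coincide in space.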

20.
Computing Dynamic Changes to BSP Trees   Cited by: 3 (self-citations: 0, other citations: 3)
This paper investigates a new method for dynamically changing Binary Space Partition (BSP) trees. A BSP tree representation of a 3D polygonal scene provides an ideal data structure for rapidly performing the hidden surface computations involved in changing the viewpoint. However, BSP trees have generally been thought to be unsuitable for applications where the geometry of objects in the scene changes dynamically. The purpose of this paper is to introduce a dynamic BSP tree algorithm which does allow for such changes, and which maintains the simplicity and integrity of the BSP tree representation. The algorithm is extended to include dynamic changes to shadows. We calibrate the algorithms by transforming a range of objects in a scene, and reporting on the observed timing results.
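The hidden-surface use of BSP trees that the abstract relies on can be sketched in 2-D: each node's wall segment defines a splitting line, and visiting the far subtree, the node, then the near subtree yields back-to-front order for the painter's algorithm (illustrative only; a real BSP splits segments that span the partition line, which this sketch sends to the back child instead):

```python
def side(p, a, b):
    # > 0 if point p lies left of the directed line a -> b
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def build(segs):
    if not segs:
        return None
    (a, b), rest = segs[0], segs[1:]
    front = [s for s in rest if side(s[0], a, b) >= 0 and side(s[1], a, b) >= 0]
    back = [s for s in rest if s not in front]
    return {"seg": (a, b), "front": build(front), "back": build(back)}

def back_to_front(node, eye, out):
    if node is None:
        return out
    a, b = node["seg"]
    near, far = ("front", "back") if side(eye, a, b) >= 0 else ("back", "front")
    back_to_front(node[far], eye, out)    # paint the far side first
    out.append(node["seg"])
    back_to_front(node[near], eye, out)
    return out

# three parallel walls at x = -1, 0, 1; viewer stands at x = 2
walls = [((0, 0), (0, 1)), ((-1, 0), (-1, 1)), ((1, 0), (1, 1))]
order = back_to_front(build(walls), (2.0, 0.5), [])
```

The dynamic problem the paper addresses is exactly that inserting or moving a wall invalidates parts of this tree, which their algorithm repairs incrementally.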


Copyright©北京勤云科技发展有限公司  京ICP备09084417号