Found 20 similar documents; search took 125 ms
1.
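Deep learning is increasingly widely applied in computer vision and has achieved remarkable results in object classification and detection. In its early applications to object tracking, however, it performed poorly because the target is the only positive sample during tracking, training data are scarce, and the methods depend heavily on location information, so traditional methods remained dominant. In recent years, as the technology has matured, deep learning has made considerable progress in object tracking. This paper first introduces the basic concepts and main methods of object tracking. It then reviews the current state of deep learning in object tracking, focusing on two categories of approaches, tracking based on deep features and tracking based on deep networks, and gives a detailed account of the recently popular Siamese-network-based trackers. Finally, it summarizes recent achievements of deep learning in object tracking and discusses future directions.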
2.
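Deep-learning-based multi-object tracking is currently one of the most active research directions in dynamic vision and shows great advantages in solving problems such as dynamic multi-object recognition. Single-object tracking algorithms are by now relatively mature, and research interest is gradually shifting toward multi-object tracking, especially online multi-object tracking. This paper compares traditional object tracking algorithms with deep-learning-based multi-object tracking algorithms and reflects on future trends in multi-object tracking.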
3.
4.
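Object tracking is a technique that uses the contextual information of a video or image sequence to model the appearance and motion of a target, and thereby predicts the target's motion state and localizes it. It is an important fundamental problem in computer vision with significant theoretical and practical value, and it is widely used in intelligent video surveillance, intelligent human-computer interaction, intelligent transportation, and visual navigation systems. The arrival of the big-data era and the emergence of deep learning have opened new opportunities for object tracking research. This paper first describes the basic research framework of object tracking and reviews the history of existing trackers from the perspective of the observation model, pointing out that deep learning makes more robust observation models possible. It then introduces deep learning methods suited to object tracking, including deep discriminative models and deep generative models. Next, it classifies current deep trackers by network structure, functional role, and network training, and analyzes them in depth. It additionally covers other deep tracking methods, including those based on the fusion of classification and regression, reinforcement learning, ensemble learning, and meta-learning. It then presents the main datasets used for deep object tracking and their evaluation protocols. After that, it analyzes and summarizes recent concrete applications of object tracking, such as mobile tracking systems and systems based on detection and tracking. Finally, it analyzes open problems of deep learning in object tracking, namely insufficient training data, real-time tracking, and long-term tracking, and discusses future directions.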
5.
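Low-light image enhancement aims to reveal information hidden in the dark and thereby improve image quality; it is widely used in computer vision tasks such as nighttime object detection and action recognition. This paper first surveys representative deep-learning-based low-light image enhancement algorithms from the supervised and unsupervised perspectives and analyzes their strengths and weaknesses in light of their underlying principles. It then summarizes commonly used training and test datasets. Finally, it discusses the problems of existing algorithms and possible future trends.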
6.
7.
8.
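Multi-view stereo is widely used in autonomous driving, augmented reality, heritage conservation, and biomedicine. Deep-learning-based multi-view stereo methods emerged to address the shortcomings of traditional methods, such as insensitivity to low-texture regions and poor reconstruction completeness. This paper reviews the pioneering work and current state of deep-learning-based multi-view stereo, focusing on methods that improve individual components and methods that improve the overall architecture, and analyzes representative models in depth. It also describes the widely used datasets and evaluation metrics and compares the test performance of existing methods on these datasets. Finally, it discusses promising future research directions for multi-view stereo.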
9.
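Visual multi-object tracking is a hot topic in computer vision. However, difficulties such as the uncertain number of targets in a scene, mutual occlusion between targets, and weakly discriminative target features have slowed its progress in real-world applications. In recent years, as research on intelligent visual processing has deepened, a wide variety of deep-learning-based visual multi-object tracking algorithms have emerged. After analyzing the challenges and difficulties of visual multi-object tracking, the paper divides the algorithms into detection-based tracking (Det...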
10.
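Single-object tracking is a technique that uses a target's appearance and contextual information in a video to analyze the motion state of a single target and localize it. It has promising applications in intelligent surveillance, intelligent interaction, and navigation and guidance, but problems such as occlusion, background clutter, and target variation have slowed progress in practical use. With the rapid development of deep learning in recent years, using deep learning to improve single-object tracking algorithms has become a hot topic in computer vision. Focusing on deep-learning-based single-object tracking, and after analyzing its basic principles, the paper surveys and analyzes the algorithms in six groups according to their core technique: correlation filters, Siamese networks, meta-learning, attention, recurrent neural networks, and generative adversarial networks. It also summarizes the state of research and proposes development trends and ideas for optimization.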
11.
12.
Using computer vision and deep learning (e.g., Convolutional Neural Networks) to automatically recognise unsafe behaviour from digital images can help managers identify and respond quickly to such actions and mitigate an adverse event. However, there has been a tendency for computer vision studies in construction to focus solely on detecting unsafe behaviour (i.e., object detection) or the regions of interest with pre-defined labels. Moreover, such approaches have been unable to consider rich semantic information among multiple unsafe actions in a digital image. The research we present in this paper uses a safety rule query to determine and locate several unsafe behaviours in a digital image by employing a visual grounding approach. Our approach consists of: (1) visual and text feature extraction, (2) recursive sub-query, and (3) generation of the bounding box. We validate our approach by conducting an experiment to demonstrate its effectiveness. The results show that the average precision, recall, and F1-score were 0.55, 0.85, and 0.65, respectively, suggesting our approach can accurately identify and locate different types of unsafe behaviours from digital images acquired from a construction site.
13.
14.
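To address the large scale variation, extreme aspect ratios, and arbitrary orientations of scene text regions, this paper proposes a neural-network-based scene text detection model. The model is designed around direct regression, needs no predefined anchor boxes, builds features at multiple levels, and shares convolution kernels across multiple branches. Experiments on several datasets validate its effectiveness: compared with existing methods, the model consumes fewer computing resources, runs inference faster, and achieves better overall performance.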
15.
Analyzing the walking behavior of the public is vital for revealing the need for infrastructure design in a local neighborhood, supporting human-centric urban area development. Traditional walking behavior analysis practices relying on manual on-street surveys to collect pedestrian flow data are labor-intensive and tedious. In contrast, automated video analytics using surveillance cameras based on computer vision and deep learning techniques appears more effective in generating pedestrian flow statistics. Nevertheless, most existing methods of pedestrian tracking and attribute recognition suffer from several challenging conditions, such as inter-person occlusion and appearance variations, which lead to ambiguous identities and hence inaccurate pedestrian flow statistics. Therefore, this paper proposes a more robust methodology of pedestrian tracking and attribute recognition, facilitating the analysis of pedestrian walking behavior. Specific limitations of a current state-of-the-art method are inferred, based on which several improvement strategies are proposed: 1) incorporating high-level pedestrian attributes to enhance pedestrian tracking, 2) a similarity measure integrating multiple cues for identity matching, and 3) a probation mechanism for more robust identity matching. In our evaluation using two public benchmark datasets, the developed strategies notably enhance the robustness of pedestrian tracking against the challenging conditions mentioned above. Subsequently, the outputs of trajectories and attributes are aggregated into fine-grained pedestrian flow statistics among different pedestrian groups. Overall, our developed framework can support more comprehensive and reliable decision-making for human-centric planning and design in different urban areas. The framework is also applicable to exploiting pedestrian movement patterns in different scenes for analyses such as urban walkability evaluation. Moreover, the developed mechanisms are generalizable to future research as a baseline, providing generic insights into how to fundamentally enhance pedestrian tracking.
16.
Tosiyasu L. Kunii 《The Visual computer》2005,21(12):958-960
A brief history and the prospects of “The Visual Computer” and “The Visual Computer: An International Journal” are presented solely to foster future research on the visual computer. It is still in its infancy, and the author’s view is based on his own limited experiences, and hence is prone to mistakes.
17.
Although occupancy information is critical to the energy consumption of existing buildings, it remains a major source of uncertainty. For reliable and accurate occupant modeling with minimal uncertainties, capturing precise information on occupants is essential. This paper proposes a computer vision-based approach that utilizes deep learning architectures to estimate the number of people in large, crowded spaces using multiple cameras. Various vision techniques (head detection, background elimination, head tracking) are implemented in three methods: (i) a method that instantaneously counts people in a scene, (ii) a method that incrementally counts people entering/exiting a room and (iii) a combination of the first two methods. These methods were applied in a classroom with heavy occlusions, and resulted in a high prediction capacity when compared to ground truth measurements. Future work in video-analytical approaches can address lowering the computational cost of analysis, capturing occupancy data in complex room geometries, and privacy preservation.
18.
19.
《Displays》2021
3D object detection is a critical part of environmental perception systems and one of the most fundamental tasks in understanding the 3D visual world, which benefits a series of downstream real-world applications. RGB-D images include object texture and semantic information, as well as depth information describing spatial geometry. Recently, numerous 3D object detection models for RGB-D images have been proposed with excellent performance, but surveys of this area are still absent. To stimulate future research, this paper provides a detailed analysis of current developments in 3D object detection methods for RGB-D images. It covers three major parts, including background on 3D object detection, RGB-D data details, and comparative results of state-of-the-art methods on several publicly available datasets, with an emphasis on contributions, design ideas, and limitations, as well as insightful observations and inspiring future research directions.
20.
Image segmentation is an important issue in many industrial processes, with high potential to enhance the manufacturing process derived from raw material imaging. For example, metal phases contained in microstructures yield information on the physical properties of the steel. Prior literature has been devoted to developing specific computer vision techniques able to tackle a single problem involving a particular type of metallographic image. However, the field lacks a comprehensive tutorial on the different types of techniques, methodologies, their generalizations and the algorithms that can be applied in each scenario. This paper aims to fill this gap. First, the typologies of computer vision techniques to perform the segmentation of metallographic images are reviewed and categorized in a taxonomy. Second, the potential utilization of pixel similarity is discussed by introducing novel deep learning-based ensemble techniques that exploit this information. Third, a thorough comparison of the reviewed techniques is carried out on two openly available real-world datasets, one of them being a newly published dataset directly provided by ArcelorMittal, which opens up the discussion on the strengths and weaknesses of each technique and the appropriate application framework for each one. Finally, the open challenges in the topic are discussed, aiming to provide guidance for future research to cover the existing gaps.