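A Survey of Research Progress on Deep Learning in Speech Recognition   Total citations: 1 (self-citations: 0, citations by others: 1)
In the current era of big data, many traditional machine learning algorithms are no longer suitable for processing large volumes of unlabeled raw speech data. Deep learning models, by contrast, have strong modeling capacity for massive data and can process unlabeled data directly, which has made them a research hotspot in speech recognition. This survey first analyzes and summarizes several representative deep learning models; it then reviews their application to speech feature extraction and acoustic modeling in speech recognition; finally, it summarizes the open problems and future directions.

Similar literature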

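This article discusses deep learning algorithms for object recognition and analyzes their main applications. Neural networks are architectures modeled on the structure of the human brain, and the convolutional neural network is a special kind of deep feedforward network. To avoid redundant parameters in the connections between layers, it uses local connectivity and weight sharing. The resulting sparse connectivity is consistent with the sparse response characteristics of biological neurons and effectively reduces the number of network parameters, making the model simpler and easier to train.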
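
The parameter saving from local connectivity and weight sharing can be illustrated with a small count. This is a minimal sketch, not taken from the article: the 32x32x3 input, the 64 output units or channels, and the 3x3 kernel are all assumed values chosen only for illustration.

```python
def dense_params(in_h, in_w, in_c, out_units):
    """Weights plus biases of a fully connected layer on a flattened image."""
    return in_h * in_w * in_c * out_units + out_units


def conv_params(kernel_h, kernel_w, in_c, out_c):
    """Weights plus biases of a convolutional layer.

    Each output channel reuses one small kernel at every image position
    (weight sharing), so the count does not depend on the image size.
    """
    return kernel_h * kernel_w * in_c * out_c + out_c


# Assumed sizes, for illustration only.
dense = dense_params(32, 32, 3, 64)  # every pixel connected to every unit
conv = conv_params(3, 3, 3, 64)      # 3x3 local receptive field, shared weights

print(f"fully connected: {dense} parameters")
print(f"convolutional:   {conv} parameters")
print(f"reduction:       about {dense / conv:.0f}x")
```

Under these assumed sizes the convolutional layer needs roughly two orders of magnitude fewer parameters than the fully connected one.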

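To meet the need for video-based behavior recognition in paperless examinations, this article proposes a deep-learning-based method for recognizing suspicious behavior. First, the examination videos are analyzed and event information is extracted. Second, the event information is fed to deep learning models as training input for recognizing suspicious behavior. Finally, comparative experiments on an examination video dataset evaluate the accuracy of the proposed method. The results show that the proposed combined convolutional neural network-long short-term memory (CNN-LSTM) model suits scenarios with longer videos, while CNN-BiGRU suits scenarios with shorter videos.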

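Traditional systems have limited computing power and often misrecognize vehicle driving behavior in practice, resulting in low accuracy. A deep-learning-based vehicle driving behavior recognition system is therefore proposed. The hardware comprises three devices: a main controller, an inertial sensor, and an alarm. The software comprises two functional modules: data cleaning and deep-learning-based driving behavior recognition. The data cleaning module handles invalid values in the raw data and standardizes it; the processed data is then analyzed by a deep learning network model, which outputs the recognition result. Experimental results show that the system achieves higher accuracy than the traditional system and can accurately recognize vehicle driving behavior.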
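
The data cleaning step (invalid-value handling followed by standardization) might look like the sketch below. The abstract does not say how invalid values are treated or which standardization is used; dropping missing and non-finite readings and applying z-score standardization are assumptions, and the function name and sample readings are hypothetical.

```python
import math
from statistics import mean, pstdev


def clean_and_standardize(samples):
    """Drop invalid readings, then z-score the remaining values.

    Assumption: a reading is invalid if it is None, NaN, or infinite.
    """
    valid = [x for x in samples if x is not None and math.isfinite(x)]
    if not valid:
        return []
    mu = mean(valid)
    sigma = pstdev(valid)  # population standard deviation
    if sigma == 0:
        # All readings identical: standardized values are all zero.
        return [0.0 for _ in valid]
    return [(x - mu) / sigma for x in valid]


# Hypothetical inertial-sensor readings containing two invalid entries.
readings = [0.2, float("nan"), 0.4, None, 0.6]
cleaned = clean_and_standardize(readings)
print(cleaned)
```

The cleaned sequence has zero mean and unit variance, which is the form usually fed to a network; a real system would apply the same statistics per sensor channel.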

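Frequent fire accidents seriously threaten public safety and people's lives and property, and the unpredictability of fires makes fire prevention and control more difficult. Traditional heat and smoke detectors are effective at detecting fires in indoor spaces. Fire image recognition based on manually selected features is limited by the complex and variable characteristics of real fire scenes and produces false alarms. Deep learning, by training on massive numbers of fire scene images and optimizing network parameters, automatically extracts deep abstract features from fire images to achieve accurate fire recognition and early-warning decisions. This paper analyzes fire image recognition and the application of deep learning in this field, discusses the bottlenecks affecting the application of deep learning to fire image recognition, and looks ahead to the future development of the technology.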

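To improve welfare-oriented farming of Hu sheep and promote the healthy development of animal welfare, a method for recognizing the maintenance behaviors of Hu sheep based on a deep belief network (DBN) is proposed. Six Hu sheep were selected to wear collars fitted with posture sensors. After data collection and cleaning, a dataset of 58,680 samples was built, covering four maintenance behaviors: lying, feeding, drinking, and ruminating. Using error rate and reconstruction error as evaluation metrics, a DBN recognition model with a layer-wise greedy secondary partitioning algorithm was constructed. After training, it was compared on the test set with traditional BP neural network (BPNN), random forest (RF), and support vector machine (SVM) models; a grouped recognition comparison across the sheep was also performed. The results show that the proposed method clearly outperforms the other three, with average recognition accuracy and sensitivity of 0.9916 and 0.9915 over the four maintenance behaviors, which verifies its effectiveness for recognizing Hu sheep maintenance behaviors. The results can provide technical support for welfare-oriented farming of Hu sheep, behavioral research, abnormal behavior recognition, and disease early warning.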

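SAR image target recognition is mainly concerned with accurately recognizing, classifying, and localizing strategic military targets such as bridges and airports and tactical targets such as aircraft, tanks, and vehicles; it is an important part of SAR image interpretation. First, the main processing layers of a convolutional neural network are built on the C6678. Then, taking into account the processing and memory characteristics of the C6678, the convolutional layers and network scheduling are optimized, yielding a method for designing and implementing the YOLOv3-TINY object recognition network on the C6678. The method can restructure and modify commonly used convolutional neural network models, and it addresses the difficulty of running deep learning networks on multi-core DSP platforms such as the C6678. Experimental results show that the method matches the GPU in detection performance. Its real-time performance on the C6678 still lags well behind the GPU, but given the real-time image frame rate of airborne SAR, it can meet the requirements of airborne SAR real-time processing.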

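Human action recognition aims to retrieve and recognize human actions in surveillance video and is a research hotspot in artificial intelligence. Human action recognition algorithms based on traditional methods depend heavily on sample data and are susceptible to environmental noise. To address this, many deep-learning-based human action recognition algorithms for different application scenarios have been proposed. This paper introduces traditional and deep-learning-based feature extraction methods for human action recognition. It summarizes deep-learning-based human action recognition algorithms in terms of performance and application, focusing on methods based on 3D convolutional neural networks, hybrid networks, two-stream convolutional neural networks, and few-shot learning (FSL), and on their performance on the UCF101 and HMDB51 datasets. On the basis of deep learning, it summarizes the advantages, disadvantages, and effectiveness of mainstream model transfer methods. It then outlines the shortcomings of existing deep-learning-based human action recognition algorithms, discusses the possibility that FSL algorithms represented by meta-learning and transformers will become the mainstream of future models, and looks ahead to future directions for deep-learning-based human action recognition.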

Three-dimensional (3D) object recognition is widely used in automated driving, medical image analysis, virtual/augmented reality, artificial intelligence robots, and other areas. Deep learning is increasingly being used to solve 3D vision problems. Multi-view 3D object recognition based on deep learning has become an intensively researched topic because it can directly use successful pretrained classification networks as the backbone, and views from multiple viewpoints can complement each other's detailed features of the object. However, some challenges still exist in this area, and many methods have recently been proposed to address them. This paper presents a comprehensive review and classification of the latest developments in deep learning methods for multi-view 3D object recognition. It also summarizes the results of these methods on a few mainstream datasets, provides an insightful summary, and puts forward future research directions.

The view-based approach to learning and recognizing 3D objects and detecting their pose has proved effective and efficient, apart from its high learning cost. In this research, we propose a virtual learning approach that generates learning samples of views of an object from its 3D view model, obtained by the motion-stereo method. From the generated sample views, high-order autocorrelation features are extracted, and discriminant feature spaces for object recognition and pose detection are built. Recognition experiments on real objects are carried out to show the effectiveness of the proposed method. Caihua Wang, Ph.D.: He received his B.S. in mathematics and M.E. in electronic engineering from Renmin University of China, Beijing, China in 1983 and 1986, and his Ph.D. from Shizuoka University, Hamamatsu, Japan in 1996. He is a JST domestic fellow doing postdoctoral research at the Electrotechnical Laboratory. His research interests are computer vision and image processing. He is a member of IEICE and IPSJ. Katsuhiko Sakaue, Ph.D.: He received the B.E., M.E., and Ph.D. degrees, all in electronic engineering, from the University of Tokyo in 1976, 1978, and 1981, respectively. In 1981, he joined the Electrotechnical Laboratory, Ministry of International Trade and Industry, where he has been engaged in research in image processing and computer vision. He received the Encouragement Prize in 1979 from IEICE, and the Paper Award in 1985 from Information.

Monitoring and assessing awkward postures is a proactive approach to preventing Musculoskeletal Disorders (MSDs) in construction. Machine Learning models have shown promising results in recognizing workers’ postures from Wearable Sensors. However, there is a need to further investigate: i) how to enable Incremental Learning, where trained recognition models continuously learn new postures from incoming subjects while controlling the forgetting of learned postures; ii) the validity of ergonomics risk assessment with recognized postures. The research discussed in this paper seeks to address this need through an adaptive posture recognition model, the incremental Convolutional Long Short-Term Memory (CLN) model. The paper discusses the methodology used to develop and validate this model’s use as an effective Incremental Learning strategy. The evaluation was based on real construction workers’ natural postures during their daily tasks. The CLN model with “shallow” (up to two) convolutional layers achieved high recognition performance (Macro F1 Score) under personalized (0.87) and generalized (0.84) modeling. The generalized CLN model, with one convolutional layer, using the “Many-to-One” Incremental Learning scheme can potentially balance adaptation performance against controlling forgetting. Applying the ergonomics rules to recognized and ground truth postures yielded comparable risk assessment results. These findings support that the proposed incremental Deep Neural Networks model has a high potential for adaptive posture recognition. It can be deployed alongside ergonomics rules for effective MSDs risk assessment.
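
The Macro F1 Score reported above is the unweighted mean of the per-class F1 scores, so rare postures count as much as common ones. The sketch below shows the standard definition only; it is not the authors' evaluation code, and the posture labels and predictions are hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical ground-truth and recognized postures.
y_true = ["stand", "stand", "bend", "bend", "kneel", "kneel"]
y_pred = ["stand", "bend", "bend", "bend", "kneel", "stand"]
score = macro_f1(y_true, y_pred)
print(f"macro F1 = {score:.3f}")
```

Here the per-class F1 scores are 0.8 for "bend", 2/3 for "kneel", and 0.5 for "stand", and the macro score is their plain average.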

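Text recognition is a general-purpose image understanding technique of great significance for applications such as information retrieval, license plate recognition, and autonomous driving. With the resurgence of neural networks, scene text recognition has advanced considerably, and many deep-learning-based text recognition algorithms have emerged in recent years. This paper proposes an improved CRNN algorithm based on feature fusion and analyzes it on three commonly used text recognition datasets in terms of recognition accuracy, runtime efficiency, and model size. Experimental results show that the algorithm improves runtime efficiency as well as accuracy.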

Active learning has been shown to reduce labeling costs by selecting the most valuable data from the unlabeled pool. However, in almost all active learning methods the training data for the first epoch is selected at random, which makes the learning process unstable. In addition, current active learning methods, especially uncertainty-based ones, are prone to data bias because model learning inevitably favors part of the data. To address these issues, we propose the Weighting filter (W-filter), tailored for object detection: an image filtering algorithm that calculates the contribution of a single image to neural network training and removes similar images from the selected data to optimize the sampling results. We first use the W-filter to select the training data for the first epoch, which yields better performance and a more stable learning process. Then, we resample the uncertain data from the perspective of the frequency domain to alleviate data bias. Finally, we redesign several classical uncertainty methods originally intended for classification to make them more suitable for object detection. We conduct rigorous experiments on standard benchmark datasets to validate our work. Several classical detectors, such as Faster R-CNN, SSD, R-FCN, CenterNet, and EfficientDet, and effective networks including ResNet, DarkNet, and MobileNet are used in the experiments, which show that our framework is detector-agnostic and network-agnostic and can thus meet any detection scenario.

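Because gait is easily affected by covariates such as occlusion, clothing, viewing angle, and carried objects, gait recognition methods struggle to achieve strong recognition performance. Building on end-to-end learning and multi-layer feature extraction, deep learning has made a series of advances in gait recognition in recent years. This paper reviews the state of research, strengths, and weaknesses of deep learning in gait recognition and summarizes the key techniques and potential research directions.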

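Existing sketch recognition frameworks take the whole image as network input, so the recognition process has poor interpretability. This paper combines deep learning with a semantic tree and proposes the Sketch-Semantic Net. A sketch is first segmented into parts, so that a single complete sketch becomes several part images that each carry a semantic concept. The sketch parts are then recognized by deep transfer learning. Finally, the semantic concepts of the semantic tree link each part to the object category of the sketch it belongs to, which helps bridge the semantic gap between low-level and high-level semantics in sketch images. Experiments on a widely used sketch segmentation dataset verify the effectiveness of the method.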

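Deep learning has attracted growing attention from researchers in speech recognition, visual recognition, and other fields. In image processing, deep learning methods can achieve high recognition rates. This paper applies Boltzmann machines and convolutional neural networks as deep learning models to agriculture: for recognizing images of crops damaged by diseases and pests, it builds on these models and adapts them to different application scenarios. For image recognition of crops damaged by diseases and pests, it uses the Boltz...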

