Similar literature
Found 20 similar documents; search took 31 ms
1.
For the aging population, surveillance in household environments has become increasingly important. In this paper, we present a household robot that detects abnormal events using video and audio information. In our approach, the robot detects moving targets with a passive acoustic location device and then tracks them with a particle filter algorithm. To adapt to different lighting conditions, the target model is updated regularly through an update mechanism. To ensure robust tracking, the robot detects abnormal human behavior by tracking the upper body of a person. For audio surveillance, Mel frequency cepstral coefficients (MFCC) are used to extract features from the audio, and those features are input to a support vector machine classifier for analysis. Experimental results show that the robot can detect abnormal behavior such as “falling down” and “running”. In addition, an accuracy of 88.17% is achieved in detecting abnormal audio such as “crying”, “groan”, and “gun shooting”. To reduce false alarms from the abnormal sound detection system, the passive acoustic location device directs the robot to the scene where the abnormal event occurs, and the robot uses its camera to confirm the event. Finally, the robot sends the captured image to its owner's mobile phone.

2.
Audio streams, such as news broadcasts, meeting-room recordings, and special videos, comprise sound from an extensive variety of sources. Detecting audio events such as speech, coughing, and gunshots is the task of intelligent audio event detection (AED). With substantial attention given to AED for applications such as security, speech recognition, speaker recognition, home care, and health monitoring, scientists are now more motivated to perform extensive research on AED. Deploying AED is a more complicated task than merely highlighting audio events, because feature extraction and classification must select the best features to reach high detection accuracy. To date, a wide range of detection systems based on intelligent techniques have been used to create machine learning-based audio event detection schemes. Nevertheless, previous studies do not include a state-of-the-art review of the proficiency and significance of such methods for audio event detection problems. The major contribution of this work is to review and categorize existing AED schemes into preprocessing, feature extraction, and classification methods. The importance of the algorithms and methodologies, and their proficiency and limitations, are also analyzed in this study. The review is extended by critically comparing audio detection methods and algorithms according to accuracy and false alarms on different types of datasets.

3.
The content‐based classification and retrieval of real‐world audio clips is one of the challenging tasks in multimedia information retrieval. Although the problem has been well studied in the last two decades, most current retrieval systems cannot provide flexible querying of audio clips because of the mixed‐type form (e.g., speech over music and speech over environmental sound) of audio information in the real world. We present here a complete, scalable, and extensible content‐based classification and retrieval system for mixed‐type audio clips. The system lets users query audio data semantically in four alternative ways: querying by mixed‐type audio classes, querying by domain‐based fuzzy classes, querying by temporal information and temporal relationships, and querying by example (QBE). To reduce retrieval time, a hash‐based indexing technique is introduced. Two kinds of experiments were conducted on the audio tracks of the TRECVID news broadcasts to evaluate the performance of the proposed system. The results demonstrate that the Audio Spectrum Flatness feature in the MPEG‐7 standard performs better on music audio samples than on other kinds of audio samples, and that the system is robust under different conditions. © 2011 Wiley Periodicals, Inc.

4.
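Mobile robot localization has become an important task in robotics research. This paper proposes a Recurrent Convolutional Neural Networks-Based Mobile Robot Localization (RCNN-MRL) algorithm. A recurrent convolutional neural network (RCNN) combines the properties of convolutional neural networks (CNN) and recurrent neural networks (RNN). Using first-person-view images captured by a camera mounted on the robot, the RCNN-MRL algorithm uses the RCNN to achieve autonomous localization. Specifically, the RCNN first processes multiple consecutive images and is then used as a regression model to estimate the robot's position. A two-wheeled robot is driven to collect image information over multiple time steps. Finally, a simulation environment is built from random movements of the two-wheeled robot, and the localization performance is analyzed. Experimental data show that the proposed RCNN model achieves autonomous localization.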

5.
Traffic congestion in modern cities is a growing problem with significant consequences for our daily lives. This work proposes a non-intrusive, passive monitoring framework based on the acoustic modality, which can be used either autonomously or as part of a multimodal system and can provide valuable information to an intelligent transportation system. We consider a large number of audio classes that are typically encountered in urban areas. We introduce a combination of a powerful audio representation mechanism, based on time, frequency, and wavelet domain features, with universal background modeling, which leads to higher recognition accuracies and detection rates (in terms of false alarm and miss probability rates) than commonly employed methodologies. The basic advantage of a class-specific model derived using the universal background modeling logic is its tolerance to data that belong to other sound classes. Another important feature of the proposed system is its ability to detect crash incidents, which, apart from their catastrophic impact on human life and property, have negative consequences for traffic flow. Our experiments are based on the concurrent usage of professional sound effect collections, which include audio recordings of high quality. We thoroughly examine the performance of the proposed system on isolated sound events as well as continuous audio streams using confusion matrices and detection error trade-off curves.

6.
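Quarantine inspection of imported timber for wood-boring insects is an important task for customs, but existing insect-sound detection algorithms suffer from low accuracy and poor robustness. To address these problems, an insect-sound detection method based on a three-dimensional convolutional neural network (3D CNN) is proposed to recognize insect-sound features. First, the raw insect-sound audio is preprocessed by overlapping framing, and the short-time Fourier transform is used to obtain spectrograms of the audio. The spectrograms are then fed to a 3D CNN with three convolutional layers to decide whether insect-sound features are present in the audio. The network is trained and tested with inputs of different frame lengths. Finally, accuracy, F1 score, and the ROC curve are used as evaluation metrics for performance analysis. The results show that training and testing perform best when the overlapping frame length is 5 s. In this setting, the 3D CNN model reaches an accuracy of 96.0% and an F1 score of 0.96 on the test set, with accuracy nearly 18% higher than that of a two-dimensional convolutional neural network (2D CNN) model. This indicates that the proposed algorithm can accurately extract insect-sound features from audio signals and identify wood borers, providing strong support for customs inspection and quarantine.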
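The overlapping framing step in abstract 6 can be illustrated with a short sketch. It is not the authors' code: the function name `overlapping_frames` and the hop size are assumptions, since the abstract gives only the best frame length (5 s) and not the overlap.

```python
def overlapping_frames(samples, frame_len, hop):
    """Split a sample sequence into frames of frame_len samples.

    Consecutive frames start hop samples apart, so they overlap
    whenever hop < frame_len. A trailing partial frame is dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop
    return frames


# Hypothetical numbers: 5 s frames at 8 kHz with 50% overlap (the overlap is assumed).
sample_rate = 8000
signal = [0.0] * (sample_rate * 12)  # 12 s of silence as placeholder audio
frames = overlapping_frames(signal, frame_len=5 * sample_rate, hop=5 * sample_rate // 2)
print(len(frames), "frames of", len(frames[0]), "samples")
```

Each frame would then go through a short-time Fourier transform to produce the spectrogram that the network consumes; that step is omitted here.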

7.
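Anomaly detection in images is an important research topic in computer vision and can be formulated as a one-class classification problem. To handle the large scale and high dimensionality of image datasets, a new anomaly detection model, CAE-OCSVM, is proposed, which combines a deep convolutional autoencoder (CAE) with a kernel-approximated one-class support vector machine (OCSVM). The deep convolutional autoencoder learns an intrinsic feature representation of the images. Random Fourier features are then used to approximate the kernel on the learned features, and the result is fed to a linear one-class SVM for image anomaly detection. Kernel approximation overcomes the high time complexity of kernel learning, and the deep convolutional autoencoder and the kernel-approximated one-class SVM are trained end to end by gradient descent. The AUC performance of the model is evaluated on four public image benchmark datasets, and the model is compared with other commonly used anomaly detection models under different anomaly rates. The experimental results confirm that CAE-OCSVM outperforms the other anomaly detection models on all four public image datasets, indicating that it is better suited to anomaly detection on large-scale, high-dimensional datasets.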
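The kernel approximation step in abstract 7 can be sketched with random Fourier features for an RBF kernel k(x, y) = exp(-gamma * ||x - y||^2). This is a generic illustration, not the CAE-OCSVM implementation: the kernel choice, `gamma`, the feature count, and the function names are assumptions.

```python
import math
import random


def make_rff(dim, n_features, gamma, seed=0):
    """Return a feature map z(x) with z(x) . z(y) ~= exp(-gamma * ||x - y||^2).

    Frequencies are drawn from N(0, 2 * gamma * I), the spectral density
    of the RBF kernel. Paired cos/sin features are used.
    """
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * gamma)
    weights = [[rng.gauss(0.0, sigma) for _ in range(dim)] for _ in range(n_features)]
    scale = 1.0 / math.sqrt(n_features)

    def transform(x):
        proj = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
        return [scale * math.cos(p) for p in proj] + [scale * math.sin(p) for p in proj]

    return transform


def dot(a, b):
    """Plain inner product of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))


z = make_rff(dim=2, n_features=2000, gamma=0.5)
x, y = [0.0, 0.0], [1.0, 0.0]
approx = dot(z(x), z(y))
exact = math.exp(-0.5 * 1.0)  # gamma * squared distance = 0.5 * 1
print(f"approx={approx:.3f} exact={exact:.3f}")
```

Because the mapped features are explicit vectors, a linear one-class SVM can be trained on them, which is what lets the kernel step sit inside a gradient-based pipeline.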

8.
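Applying deep learning to intrusion detection requires large labeled datasets and makes real-time detection difficult. Exploiting the general observation that normal data outnumber anomalous data in network traffic, an intrusion detection method named EKM-AE (ensemble K-means and autoencoder) is proposed, which combines ensemble K-means clustering with an autoencoder. Ensemble K-means clustering first extracts normal samples from network traffic captured in real time; these samples are used to train the autoencoder, and the trained autoencoder then performs intrusion detection. Intrusion detection experiments were conducted on hosts in a virtual LAN. The results show that, in the great majority of practical scenarios (where normal traffic exceeds anomalous traffic), the method detects intrusions well, is unsupervised throughout, and supports real-time online detection, which improves host network security.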

9.
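Manually tuning the parameters of a convolutional neural network (CNN) is time-consuming and inaccurate, and the parameter settings affect algorithm performance. To address these problems, a variable convolutional autoencoder (CAE) algorithm based on teaching-learning-based optimization (TLBO) is proposed. The algorithm designs a variable-length individual encoding strategy to build CAE structures quickly and stacks the CAEs into a CNN. In addition, it makes full use of the structural information of elite individuals to guide the search toward more promising regions,...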

10.
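Illegal logging in forests is currently rampant, and no method exists for detecting it in real time. To address this, a forest illegal-logging detection method based on sound recognition is proposed. The method detects chainsaw logging by analyzing the spectral features of the sound signal and computing similarity values and the signal-to-noise ratio. Experimental results show that the proposed method effectively rejects interfering sounds and recognizes chainsaw logging sounds accurately and in real time.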

11.
刘太亨  何昭水 《计算机应用》2021,41(11):3200-3205
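Traditional surface defect detection methods can only detect obvious defect contours with high contrast or low noise. To address this, a surface defect detection method based on autoencoding and knowledge distillation is proposed to accurately locate and classify defects in input images captured in real industrial environments. First, a cascaded autoencoder (CAE) architecture is designed to segment and locate defects by converting the raw input image into a CAE-based prediction mask. Second, a threshold module binarizes the prediction to obtain accurate defect contours. Then, the defect regions extracted and cropped by a defect region detector are passed as input to the next module. Finally, the defect regions from the CAE segmentation are classified by knowledge distillation. Experimental results show that, compared with several other surface defect detection methods, the proposed method has the best overall performance, with an average defect detection accuracy of 97.00%. The method effectively segments small defects with unclear edges and meets the engineering requirement of real-time segmentation and detection of surface defects.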

12.
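How much attention a region of an audio signal attracts depends on its audio features. Most current bottom-up attention-region extraction algorithms convert the one-dimensional audio signal into a two-dimensional image and analyze it with image saliency algorithms, often ignoring the temporal persistence of attended events. To address this, the proposed method uses the information entropy of the audio signal together with statistical time-trend algorithms: the signal is split into frames, the information entropy of each frame is computed, and an attention value is obtained through exponential moving averaging and related calculations, from which high-attention regions are determined. Compared with current mainstream attention extraction algorithms, the method detects the start and end points of attended regions well, yields smoother attention values overall, and accounts for the persistence of the human auditory system's attention to an event. In an experiment on the audio of a talk show, the overall detection rate for applause and laughter segments is 81.6%.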
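The per-frame entropy and exponential moving average steps in abstract 12 can be sketched as below. This is an illustration under stated assumptions, not the paper's algorithm: the entropy is taken over an amplitude histogram, and the bin count, amplitude range, and smoothing factor `alpha` are hypothetical choices.

```python
import math
from collections import Counter


def frame_entropy(frame, n_bins=16, lo=-1.0, hi=1.0):
    """Shannon entropy (in bits) of the amplitude histogram of one frame."""
    counts = Counter()
    for s in frame:
        clipped = min(max(s, lo), hi)
        idx = min(int((clipped - lo) / (hi - lo) * n_bins), n_bins - 1)
        counts[idx] += 1
    total = len(frame)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def ema(values, alpha=0.3):
    """Exponential moving average; the first output equals the first input."""
    smoothed = []
    prev = None
    for v in values:
        prev = v if prev is None else alpha * v + (1.0 - alpha) * prev
        smoothed.append(prev)
    return smoothed


# Two toy frames: a constant one (zero entropy) and a two-level one (one bit).
frames = [[0.0] * 8, [-1.0, 1.0] * 4]
entropies = [frame_entropy(f, n_bins=2) for f in frames]
attention = ema(entropies, alpha=0.5)
print(entropies, attention)
```

Smoothing the entropy sequence is what gives the attention curve its persistence over time; a threshold on the smoothed values would then mark the start and end of high-attention regions.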

13.
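To achieve fast and accurate face detection, a multi-scale face detection method based on a fully convolutional neural network is proposed. The fully connected layers of the AlexNet convolutional neural network are replaced with convolutional layers, and the classification layer is changed to a binary face/non-face classifier; after training, the accuracy reaches 99.16%. When the trained classifier is used for face detection, the input image is rescaled to multiple scales and fed to the fully convolutional network to obtain a probability matrix over the feature map, and non-maximum suppression is applied to obtain the most accurate face boxes. The detection results show that the method is accurate and fast in face detection and performs well.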
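The non-maximum suppression step mentioned in abstract 13 can be sketched as the standard greedy procedure below. The box format `(x1, y1, x2, y2)`, the IoU threshold of 0.5, and the function names are assumptions; the abstract does not give the paper's settings.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes that overlap it too much.

    Returns the indices of the kept boxes, highest score first.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))
```

In a multi-scale detector, candidate boxes from every scale would first be mapped back to the original image coordinates and then pooled into one call like this.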

14.
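Voice over IP (VoIP) is widely used in IP telephony and other fields. It closely integrates voice communication with data communication and has advanced Internet application technology. In VoIP applications, applying silence detection, speech compression, and similar techniques before voice data are sent to the network effectively reduces the amount of data sent, lowers network usage, and improves transmission efficiency. Using silence detection, MPEG audio coding compression, and the Real-time Transport Protocol (RTP), a multi-channel voice transmission system was developed that handles multiple parallel real-time voice streams. The system has low CPU usage and serves as a reference for implementing small-scale, multi-channel, parallel real-time voice transmission.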

15.
Convolutional neural networks (CNNs) are typical structures for deep learning and are widely used in image recognition and classification. However, training with random initialization tends to become stuck at local plateaus or even diverge, which results in unstable and ineffective solutions in real applications. To address this limitation, we propose a hybrid deep learning CNN-AdapDAE model, which uses the features learned by the AdapDAE algorithm to initialize the CNN filters and then trains the improved CNN for classification tasks. In this model, AdapDAE is proposed as a CNN pre-training procedure that adaptively sets the noise level based on the principle of annealing, starting with a high level of noise and lowering it as training progresses. Thus, the features learned by AdapDAE combine features at different levels of granularity. Extensive experimental results on the STL-10, CIFAR-10, and MNIST datasets demonstrate that the proposed algorithm performs favorably compared to CNN (random filters), CNNAE (filters pre-trained by an autoencoder), and a few other unsupervised feature learning methods.
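The annealing idea in abstract 15, a high noise level at the start that is lowered as training progresses, can be sketched as follows. The exponential schedule, its parameters, and the masking-noise corruption are assumptions for illustration; the abstract does not state AdapDAE's actual adaptive rule.

```python
import random


def noise_level(epoch, start=0.5, end=0.05, decay=0.9):
    """Assumed schedule: exponential decay from start, floored at end."""
    return max(end, start * decay ** epoch)


def corrupt(x, level, rng):
    """Masking noise: zero each input component with probability level."""
    return [0.0 if rng.random() < level else v for v in x]


rng = random.Random(0)
x = [0.2, 0.9, 0.4, 0.7]
for epoch in (0, 5, 50):
    level = noise_level(epoch)
    # A denoising autoencoder would be trained to reconstruct x from noisy_x.
    noisy_x = corrupt(x, level, rng)
    print(epoch, round(level, 3), noisy_x)
```

High noise early on pushes the autoencoder toward coarse features, while low noise later allows finer ones, which matches the abstract's remark about features at different levels of granularity.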

16.
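Sound guidance is of practical importance in life detection and localization. This paper designs and builds a sound guidance system with a microcontroller at its core, based on wireless transmission. The system consists of a master unit built around a vehicle-mounted mobile sound source and a slave unit built around sound receivers. Both units use an STC89C52 microcontroller as the controller. The slave receives the sound signal from the master through a sound-activated circuit, determines the position of the sound source from the time difference between the signals arriving at two receivers, and sends this position to the master wirelessly; the master then drives its motors to move to the target position. Experiments show that the system is simple and reliable in structure, performs well, and combines high reliability, good cost-effectiveness, and low power consumption.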
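The time-difference idea in abstract 16 can be illustrated with the common far-field relation sin(theta) = c * dt / d for two receivers a distance d apart. This is a sketch under assumptions, not the system's firmware: the far-field model, the speed of sound, the receiver spacing, and the function name are all hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C (assumed)


def bearing_from_tdoa(delta_t, mic_spacing, c=SPEED_OF_SOUND):
    """Bearing of a distant source, in degrees from the array's broadside.

    delta_t is the arrival-time difference in seconds between the two
    receivers; its sign tells which receiver heard the sound first.
    """
    ratio = c * delta_t / mic_spacing
    ratio = max(-1.0, min(1.0, ratio))  # guard against measurement noise
    return math.degrees(math.asin(ratio))


spacing = 0.2  # metres between the two receivers (assumed)
dt = 0.5 * spacing / SPEED_OF_SOUND  # delay that corresponds to 30 degrees
print(round(bearing_from_tdoa(dt, spacing), 3))
```

Two receivers give only a bearing; estimating a full position, as the abstract describes, would need the source's motion or additional geometry that the abstract does not specify.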

17.
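A new concept is proposed: fusing different information from a distributed audio-visual processing system to improve the robustness of incident detection and to provide richer event descriptions. The performance of the system is validated with field trials in London and Paris. This work is based on the European Union's PRISMATICA project.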

18.
李娜  顾庆  姜枫  郝慧珍  于华  倪超 《软件学报》2020,31(11):3621-3639
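Sandstone microscopic image classification is a basic task in geological research and is important for applications such as oil and gas reservoir evaluation. For automatic classification, hand-crafted features have limited power to represent sandstone microscopic images because of their complex and variable microstructure. Moreover, labeled sandstone microscopic images are scarce because sample collection and labeling are expensive. A convolutional-neural-network-based feature representation method for small datasets, FeRNet, is proposed to capture the semantic information of sandstone microscopic images effectively and improve their feature representation. FeRNet has a simple structure, which reduces the amount of labeled image data the network requires and prevents parameter overfitting. To address the shortage of labeled sandstone microscopic images, an image augmentation preprocessing method and a weight initialization strategy based on a convolutional autoencoder network are proposed, which reduce the risk of overfitting caused by insufficient data. Experiments were designed and carried out on a sandstone microscopic image dataset collected in Tibet. The results show that, when labeled sandstone microscopic image data are insufficient, image augmentation and the convolutional autoencoder network effectively improve the training of FeRNet, and the features extracted by FeRNet represent sandstone microscopic images better than hand-crafted features.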

19.
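The main loop closure detection methods in visual SLAM today match images using hand-crafted feature point algorithms, and their accuracy drops sharply in complex environments. To address this, a deep-learning-based loop closure detection method is proposed that combines a convolutional neural network with locality-sensitive hashing. A set of image feature vectors is built according to the image similarity strategy used in loop closure detection, and cascaded cosine-distance hash functions are used to detect loop closures. Experimental results show that the method is more accurate and faster than traditional methods and better meets the visual SLAM system's requirements for eliminating accumulated error and for real-time operation.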
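One common cosine-distance hash family, random hyperplane hashing, gives a feel for the locality-sensitive hashing step in abstract 19. It is a sketch only: the abstract does not say which hash functions the paper cascades, so treating the cascade as a concatenation of sign bits is an assumption, as are all names and sizes below.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplane normals with Gaussian components."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def hamming(a, b):
    """Number of differing bits; small values suggest a small angle."""
    return sum(x != y for x, y in zip(a, b))


planes = make_hyperplanes(dim=4, n_bits=16, seed=1)
current = [0.5, -1.0, 2.0, 0.25]   # stand-in for a CNN feature vector
revisit = [1.0, -2.0, 4.0, 0.5]    # same direction, different magnitude
print(hamming(signature(current, planes), signature(revisit, planes)))
```

Vectors with the same signature fall into the same bucket, so a loop closure candidate can be found by a bucket lookup followed by an exact cosine check on the few vectors in that bucket.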

20.
王艳 《测控技术》2022,41(6):19-25
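Traditional belt conveyor deviation detection systems cannot accurately locate the deviation, which leads to slow and inaccurate recognition during detection. To address this, a belt conveyor fault detection system based on sound source localization and image processing is proposed. A microphone array picks up and preprocesses the belt's sound signal, and a frequency discriminator performs frequency identification; once a fault sound is identified, a convolutional neural network (CNN) algorithm is used to obtain its position. After receiving the position of the abnormal sound source, the host computer directs an industrial camera to photograph that position, and the photos are sent to the host computer for preprocessing. The images are then segmented with the gray-level averaging method and edge features are extracted; the deviation angle and offset are used to compute the threshold range for belt deviation, and a support vector machine (SVM) determines whether the belt has deviated and how severely, with the result sent to the alarm system. Simulation results show that the system identifies and localizes fault sound sources well.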
