Similar literature
 20 similar documents found.
1.
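Computing auditory attention saliency is a basic problem in the study of auditory attention models, and choosing suitable features is the key to saliency computation. From the perspective of feature selection, this paper proposes an auditory saliency computation model based on sparse dictionary learning. The model first learns the features of various acoustic signals with the K-SVD dictionary learning algorithm, then classifies and consolidates the resulting dictionaries. Using the selected feature dictionary as a basis, it represents the signal sparsely with the OMP algorithm and merges the sparse coefficients frame by frame to obtain the auditory saliency map of the acoustic signal. Simulation results show that the features selected by the model better match the natural properties of acoustic signals: a saliency map built on a basic feature dictionary highlights structured sound signals in noise, while a saliency map built on a signal-specific feature dictionary enables selective attention to a particular sound signal.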
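As a rough illustration of the last two steps above (OMP sparse coding over a fixed dictionary, then merging the coefficients frame by frame), the sketch below uses a tiny hand-made dictionary instead of one learned by K-SVD. The function names, the toy atoms, and the choice to merge coefficients by summing their absolute values per frame are assumptions made for illustration; the abstract does not say how the coefficients are combined.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(A, b):
    """Solve a small square system A x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / M[r][r]
    return x

def omp(atoms, signal, n_nonzero):
    """Orthogonal matching pursuit over a list of unit-norm atoms."""
    residual, support, coeffs = signal[:], [], []
    for _ in range(n_nonzero):
        # Greedy step: pick the atom most correlated with the residual.
        scores = [abs(dot(a, residual)) for a in atoms]
        best = max(range(len(atoms)), key=lambda j: scores[j])
        if scores[best] < 1e-12 or best in support:
            break
        support.append(best)
        # Orthogonal step: least-squares refit on the whole support
        # (normal equations G c = rhs).
        G = [[dot(atoms[i], atoms[j]) for j in support] for i in support]
        rhs = [dot(atoms[i], signal) for i in support]
        coeffs = solve(G, rhs)
        residual = [s - sum(c * atoms[j][t] for c, j in zip(coeffs, support))
                    for t, s in enumerate(signal)]
    x = [0.0] * len(atoms)
    for c, j in zip(coeffs, support):
        x[j] = c
    return x

def frame_saliency(atoms, frames, n_nonzero=2):
    """Assumed merge rule: saliency of a frame = sum of |sparse coefficients|."""
    return [sum(abs(c) for c in omp(atoms, f, n_nonzero)) for f in frames]

# Toy dictionary (hypothetical): two spikes and one flat atom, all unit norm.
atoms = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.5, 0.5, 0.5, 0.5]]
frames = [[0.0, 0.0, 0.0, 0.0], [2.0, 1.0, 0.0, 0.0]]
saliency = frame_saliency(atoms, frames)
```

With this merge rule, a silent frame scores zero and a frame that the dictionary explains with large coefficients scores high. A real system would use a learned dictionary with many more atoms.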

2.
吕菲, 夏秀渝. 《自动化学报》 (Acta Automatica Sinica), 2017, 43(4): 634-644
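Classical computational models of auditory attention mainly study primary auditory features such as sound intensity, frequency, and time. These features cannot adequately model the directivity of auditory attention, so higher-level auditory features are needed to distinguish different sounds. Following the mechanism of auditory perception, this paper proposes a bottom-up computational model of auditory selective attention with dual-pathway information processing, based on sound-source azimuth features and neural networks. The model first preprocesses the binaural signals and analyzes their spectra, then feeds them into a where pathway and a what pathway. The where pathway extracts azimuth feature parameters, uses a neural network to extract local azimuth features of the sound sources, and then obtains an azimuth saliency map through local feature aggregation and global optimization. Finally, the dominant azimuth is extracted from the azimuth saliency map and applied to the what pathway, where time-frequency masking separates the corresponding dominant sound. Simulation results show that the model, by introducing azimuth features as clustering cues and using a multi-level neural network to automatically select the sound objects worth attending to, extracts the dominant sound in complex acoustic environments in real time and models the azimuth classification, attentional selection, and attention shifting mechanisms of human hearing reasonably well.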

3.
Nonstationary acoustic features provide essential cues for many auditory tasks, including sound localization, auditory stream analysis, and speech recognition. These features can best be characterized relative to a precise point in time, such as the onset of a sound or the beginning of a harmonic periodicity. Extracting these types of features is a difficult problem. Part of the difficulty is that with standard block-based signal analysis methods, the representation is sensitive to the arbitrary alignment of the blocks with respect to the signal. Convolutional techniques such as shift-invariant transformations can reduce this sensitivity, but these do not yield a code that is efficient, that is, one that forms a nonredundant representation of the underlying structure. Here, we develop a non-block-based method for signal representation that is both time relative and efficient. Signals are represented using a linear superposition of time-shiftable kernel functions, each with an associated magnitude and temporal position. Signal decomposition in this method is a non-linear process that consists of optimizing the kernel function scaling coefficients and temporal positions to form an efficient, shift-invariant representation. We demonstrate the properties of this representation for the purpose of characterizing structure in various types of nonstationary acoustic signals. The computational problem investigated here has direct relevance to the neural coding at the auditory nerve and the more general issue of how to encode complex, time-varying signals with a population of spiking neurons.

4.
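Underwater acoustic signal recognition has attracted considerable attention in recent years. Because the ocean channel varies in time and space, signal propagation is subject to fading, and underwater target sound sources are complex and variable, the recognition task is highly challenging. Traditional recognition methods struggle to capture sufficient representational information about the target and lack good noise robustness, so their recognition performance leaves room for improvement. To address these problems, this paper proposes an underwater acoustic signal recognition method based on a multi-branch external attention network (MEANet), which can fully capture and recognize the features of underwater acoustic signals in complex marine environments. MEANet consists of a multi-branch backbone network, channel and spatial attention modules, and an external attention module. First, the input data pass through several parallel backbone branches that extract feature information at different levels of the underwater acoustic signal. Second, channel and spatial attention modules weight the channel and spatial dimensions of the signal separately, adjusting the importance of different channels and spatial positions to the feature representation. Finally, an external attention module is integrated, using external memory units and additional computation to guide the network's feature extraction and prediction, which markedly improves the model's recognition rate and robustness. Experimental results show that the proposed MEANet achieves a recognition rate of 98.84% on the ShipsEar dataset, significantly better than the other compared algorithms, confirming its effectiveness.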

5.
A saliency detection method based on lateral-inhibition spectrum tuning   Cited 1 time (self-citations: 0, citations by others: 1)
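Traditional frequency-domain saliency detection methods suffer from unstable results and a lack of biological motivation. Drawing on the lateral inhibition mechanism of human visual recognition, this paper proposes a frequency-domain tuning method for saliency detection. The method applies several nonlinear adaptive tunings to the Fourier spectrum of the image, suppressing redundant features and enhancing salient ones, and thereby detects salient image regions effectively. Comparative experiments and analysis on natural images and psychophysical pattern images show that the method considerably improves on existing saliency detection methods in salient-region detection rate, contour completeness, and contrast.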

6.
Auditory attention is a complex mechanism that involves the processing of low-level acoustic cues together with higher level cognitive cues. In this paper, a novel method is proposed that combines biologically inspired auditory attention cues with higher level lexical and syntactic information to model task-dependent influences on a given spoken language processing task. A set of low-level multiscale features (intensity, frequency contrast, temporal contrast, orientation, and pitch) is extracted in parallel from the auditory spectrum of the sound based on the processing stages in the central auditory system to create feature maps that are converted to auditory gist features that capture the essence of a sound scene. The auditory attention model biases the gist features in a task-dependent way to maximize target detection in a given scene. Furthermore, the top-down task-dependent influence of lexical and syntactic information is incorporated into the model using a probabilistic approach. The lexical information is incorporated by using a probabilistic language model, and the syntactic knowledge is modeled using part-of-speech (POS) tags. The combined model is tested on automatically detecting prominent syllables in speech using the BU Radio News Corpus. The model achieves 88.33% prominence detection accuracy at the syllable level and 85.71% accuracy at the word level. These results compare well with reported human performance on this task.

7.
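Objective: Stereoscopic video is increasingly popular because it offers an immersive sense of realism, while visual saliency detection can automatically predict, locate, and mine important visual information and help machines filter massive amounts of multimedia data effectively. To improve salient-region detection in stereoscopic video, a saliency detection model that fuses binocular multidimensional perceptual characteristics is proposed. Method: Saliency is computed along three dimensions of stereoscopic video: spatial, depth, and temporal. First, a Bayesian model computes a 2D image saliency map from the spatial features of the image. Next, a depth saliency map of the stereoscopic video frame is obtained from binocular perceptual features. Then, the Lucas-Kanade optical flow method computes the motion features of local regions between frames to produce a temporal saliency map. Finally, the three saliency maps are fused with a method based on the degree of global-regional difference, yielding the final salient-region distribution model for stereoscopic video. Results: Experiments on different types of stereoscopic video sequences show that the model achieves 80% precision and 72% recall while keeping computational complexity relatively low, outperforming existing saliency detection models. Conclusion: The model effectively extracts salient regions in stereoscopic video and can be applied to stereoscopic video/image coding, stereoscopic video/image quality assessment, and related fields.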

8.
This paper addresses the problem of speech recognition in reverberant multisource noise conditions using distant binaural microphones. Our scheme employs a two-stage fragment decoding approach inspired by Bregman's account of auditory scene analysis, in which innate primitive grouping ‘rules’ are balanced by the role of learnt schema-driven processes. First, the acoustic mixture is split into local time-frequency fragments of individual sound sources using signal-level primitive grouping cues. Second, statistical models are employed to select fragments belonging to the sound source of interest, and the hypothesis-driven stage simultaneously searches for the most probable speech/background segmentation and the corresponding acoustic model state sequence. The paper reports recent advances in combining adaptive noise floor modelling and binaural localisation cues within this framework. By integrating signal-level grouping cues with acoustic models of the target sound source in a probabilistic framework, the system is able to simultaneously separate and recognise the sound of interest from the mixture, and derive significant recognition performance benefits from different grouping cue estimates despite their inherent unreliability in noisy conditions. Finally, the paper will show that missing data imputation can be applied via fragment decoding to allow reconstruction of a clean spectrogram that can be further processed and used as input to conventional ASR systems. The best performing system achieves an average keyword recognition accuracy of 85.83% on the PASCAL CHiME Challenge task.

9.
Zhu Chunbiao, Li Ge. Multimedia Tools and Applications, 2018, 77(19): 25181-25197

Saliency detection is an active topic in the multimedia field. Most previous work on saliency detection focuses on 2D images, but these methods are not robust in complex scenes that contain multiple objects or complex backgrounds. Depth information has recently emerged as a powerful cue for saliency detection. In this paper, we propose a multilayer backpropagation saliency detection algorithm based on depth mining, in which we exploit depth cues from three different layers of the images. The proposed algorithm performs well and remains robust in complex situations. Experimental results show that the proposed framework is superior to other existing saliency approaches. In addition, we present two applications of the algorithm: scene reconstruction from multiple images and small target object detection in video.


10.
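Objective: Classical eye-fixation prediction models usually fuse high-level and low-level features through skip connections, which makes it hard to balance the importance of features across levels, and they do not account for the tendency of human observers to favor the central region of an image. To address this, this paper proposes an image feature extraction method that incorporates attention mechanisms and optimizes the extracted features with a Gaussian learning module, improving the accuracy of eye-fixation prediction. Method: A new eye-fixation prediction model based on a multiple attention mechanism (MAM) is proposed. It combines three different attention mechanisms to weight the feature information extracted by a ResNet-50 model with dilated convolutions along the spatial, channel, and level dimensions. The network consists mainly of a feature extraction module, a multiple attention module, and a Gaussian learning optimization module. Dilated convolution effectively captures receptive fields of different sizes while keeping the feature-map resolution unchanged. The multiple attention module automatically refines the rich low-level detail and the high-level global semantic information obtained, and fully extracts channel and spatial information from the feature maps, preventing over-reliance on the model's high-level features. The Gaussian learning module automatically selects a suitable Gaussian blur kernel to blur the saliency map, addressing the center bias of human observers. Results: Experiments on the public SALICON (saliency in context) dataset show that, compared with ...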

11.
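The accuracy of speech emotion recognition depends largely on how distinct the features of different emotions are. Starting from an analysis of the time-frequency characteristics of speech and drawing on the human mechanism of auditory selective attention, this paper proposes a speech emotion recognition algorithm based on spectrogram features. The algorithm first mimics the auditory selective attention of the human ear, segmenting and extracting the emotional spectrogram signal in the time and frequency domains to form a speech emotion saliency map. Then, based on the saliency map, Hu invariant moment features, texture features, and selected spectrogram features are adopted as the main features for emotion recognition. Finally, speech emotions are recognized with a support vector machine. Recognition experiments on a speech emotion database show that the proposed algorithm achieves a high recognition rate and good robustness, with the most pronounced gains for the practically relevant emotion of irritation. In addition, principal vector analysis across the emotion features shows that the selected features differ strongly from one another and are practical.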
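To make the Hu invariant moment features mentioned above more concrete, here is a minimal sketch that computes the first two Hu moments of a small 2-D intensity grid, such as a patch of a spectrogram-derived saliency map. The function name and the toy patches are assumptions for illustration; the abstract does not say which of the seven Hu moments the authors use or how patches are selected.

```python
def hu_first_two(img):
    """First two Hu invariant moments of a 2-D grid given as a list of rows."""
    m00 = sum(sum(row) for row in img)
    if m00 == 0:
        raise ValueError("image has zero total intensity")
    # Intensity-weighted centroid (x = column index, y = row index).
    xc = sum(x * v for row in img for x, v in enumerate(row)) / m00
    yc = sum(y * v for y, row in enumerate(img) for v in row) / m00

    def eta(p, q):
        # Central moment mu_pq, normalized by m00^(1 + (p+q)/2) for scale invariance.
        mu = sum((x - xc) ** p * (y - yc) ** q * v
                 for y, row in enumerate(img) for x, v in enumerate(row))
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    phi1 = n20 + n02
    phi2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    return phi1, phi2

# Toy patch (hypothetical values) and the same pattern shifted to a corner.
patch = [[0, 0, 0, 0],
         [0, 1, 2, 0],
         [0, 3, 4, 0],
         [0, 0, 0, 0]]
shifted = [[1, 2, 0, 0],
           [3, 4, 0, 0],
           [0, 0, 0, 0],
           [0, 0, 0, 0]]
same = all(abs(a - b) < 1e-9
           for a, b in zip(hu_first_two(patch), hu_first_two(shifted)))
```

Because the moments are taken about the centroid, the two patches give the same values, which is the property that makes such features attractive when the position of an emotional cue in the spectrogram varies.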

12.
Feature extraction methods for sound events have been traditionally based on parametric representations specifically developed for speech signals, such as the well-known Mel Frequency Cepstrum Coefficients (MFCC). However, the discrimination capabilities of these features for Acoustic Event Classification (AEC) tasks could be enhanced by taking into account the spectro-temporal structure of acoustic event signals. In this paper, a new front-end for AEC which incorporates this specific information is proposed. It consists of two different stages: short-time feature extraction and temporal feature integration. The first module aims at providing a better spectral representation of the different acoustic events on a frame-by-frame basis, by means of the automatic selection of the optimal set of frequency bands from which cepstral-like features are extracted. The second stage is designed for capturing the most relevant temporal information in the short-time features, through the application of Non-Negative Matrix Factorization (NMF) on their periodograms computed over long audio segments. The whole front-end has been evaluated in clean and noisy conditions. Experiments show that removing certain frequency bands (mainly located in the medium region of the spectrum for clean conditions and in low frequencies for noisy environments) in the short-time feature computation, in conjunction with the NMF technique for temporal feature integration, significantly improves the performance of a Support Vector Machine (SVM) based AEC system with respect to the use of conventional MFCCs.

13.
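Objective: Because random walks describe human visual attention well, a visual saliency detection algorithm based on lazy random walks is proposed. Method: First, background superpixels are assigned a large laziness factor, that is, they serve as lazy seed nodes, and a lazy random walk is evolved on the undirected graph formed by the image superpixels to obtain an initial saliency map. The initial map is then corrected using spatial-position and color-contrast priors. Finally, a foreground-based lazy random walk produces a robust saliency detection result. Results: To verify the algorithm, simulation experiments were run on the MSRA-1000 database, with qualitative and quantitative comparisons against mainstream related algorithms. The receiver operating characteristic (ROC) curve and F-measure of the proposed algorithm are both higher than those of the other algorithms. Conclusion: An ordinary random walk, as used in traditional saliency detection algorithms based on random processes, cannot guarantee convergence to a steady state; the proposed algorithm overcomes this problem in theory, which broadens its applicability. In addition, the algorithm characterizes saliency differences by the commute time (round-trip time) of visual transitions, which models biological vision more plausibly and gives more robust results than the one-way transition time used by ordinary random walks.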
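The convergence point in the conclusion above can be seen on a toy graph. The sketch below is not the paper's algorithm: it only builds a lazy transition matrix from an ordinary one and compares how the two walks evolve. The per-node `stay` probabilities stand in for the laziness factor, and the three-node path graph, function names, and parameter values are assumptions chosen for illustration.

```python
def lazy_transition(P, stay):
    """Lazy walk: at node i, stay with probability stay[i], else move according to P."""
    n = len(P)
    return [[(stay[i] if i == j else 0.0) + (1.0 - stay[i]) * P[i][j]
             for j in range(n)] for i in range(n)]

def step(dist, P):
    """One step of the walk: new_dist[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def run(dist, P, steps):
    for _ in range(steps):
        dist = step(dist, P)
    return dist

# Ordinary random walk on the path graph 0 - 1 - 2 (bipartite, hence periodic).
P = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
start = [1.0, 0.0, 0.0]

plain_even = run(start, P, 200)   # the plain walk keeps alternating between
plain_odd = run(start, P, 201)    # two different distributions
lazy = run(start, lazy_transition(P, [0.5, 0.5, 0.5]), 200)
```

On this graph the plain walk never settles, while the lazy walk approaches the degree-proportional distribution (0.25, 0.5, 0.25). Giving some nodes a larger `stay` value, as the abstract describes for background superpixels, changes where the walk spends its time; how that is turned into a saliency score via commute time is specific to the paper and not reproduced here.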

14.
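Audio-visual saliency detection methods typically use a two-stream network. When the audio and visual signals are inconsistent, the audio information in the two-stream network degrades the video information and weakens the visual features of objects; moreover, conventional fusion schemes ignore the relative importance of feature attributes. To address these problems, a multi-stream audio-visual saliency algorithm based on visual information compensation (MSAVIC) is proposed. First, a separate video encoding branch is added to the two-stream network to preserve the complete object appearance and motion information in the video signal. Second, a feature fusion strategy combines the video encoding features with the audio-visual saliency features, strengthening the expression of visual information and compensating for it when audio and video are inconsistent. Theoretical analysis and experimental results show that MSAVIC outperforms other methods by about 2% on four datasets and performs well in saliency detection.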

15.
Visual saliency is an important cue that the human visual system uses to detect salient objects in natural scenes. It has attracted a lot of research attention in computer vision and has been widely used in applications including image retrieval, object recognition, and image segmentation. However, the accuracy of salient object detection models remains a challenge. Accordingly, a hierarchical salient object detection model is presented in this paper. To interpret object saliency in an image accurately, we propose to investigate distinctive features from a global perspective. Image contrast and color distribution are each used to generate a saliency map, and the two maps are then fused using principal component analysis. Compared with state-of-the-art models, the proposed model accurately detects salient objects in a way that conforms to human visual principles. Experimental results on the MSRA database validate the effectiveness of the proposed model.

16.
周莺, 张基宏, 梁永生, 柳伟. 《计算机科学》 (Computer Science), 2015, 42(11): 118-122
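To extract the salient regions that people attend to when watching video more accurately and effectively, a spatiotemporal saliency extraction method based on visual motion characteristics is proposed. The method first obtains a spatial saliency map by analyzing the log spectrum of each video frame in the frequency domain and a temporal saliency map through global motion estimation and block matching. It then fuses the two maps dynamically, taking into account the characteristics of human vision when watching video and the subjective perception of videos with different motion characteristics. The method is evaluated both subjectively and objectively. Visual inspection and quantitative metrics both show that, compared with other classical methods, the extracted salient regions reflect human visual fixation regions more accurately.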

17.
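To reflect how the computational mechanisms of neural information processing in auditory attention support the automatic analysis and understanding of auditory scene content, this paper proposes a top-down auditory saliency attention extraction model. The model builds on the human ear's perceptual characteristics for frequency transformation and combines speaker identification based on a deep belief network with an auditory saliency model. Simulation results show that the model is feasible and that, within deep-belief-network speaker identification, it effectively highlights the saliency of the target speaker.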

18.
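Objective: Saliency detection, grounded in research on human vision, is an important means of helping computer sensors perceive the world. Most existing methods can only detect salient points or regions of interest to humans; they cannot highlight the saliency of an object as a whole or distinguish objects at different saliency levels. To address these problems, an object-level saliency detection method based on hierarchical information fusion is proposed. Method: Unlike most current methods, this paper uses structural information at two levels, mid-level superpixels and object-level regions, to obtain the saliency map. First, the image is segmented into mid-level superpixels and an initial saliency map is built bottom-up. Spectral clustering then groups the superpixels into object-level regions, and a top-down prior adjusts the initial map. Finally, a heat-kernel diffusion process propagates superpixel-level saliency to the object-level regions, yielding a consistent, uniform object-level saliency map. Results: On the MSRA1000 benchmark database, the method was compared quantitatively with 16 related algorithms in terms of precision-recall curves and F-measure; its average precision and F-measure score exceed those of the other algorithms by more than 5%. Conclusion: The saliency map generated through multi-level information fusion highlights whole-object saliency and distinguishes the saliency of different objects. The method also applies to multi-object saliency detection.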

19.
Image saliency detection with multiple prior features and combined contrast   Cited 1 time (self-citations: 0, citations by others: 1)
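Objective: Image saliency detection is widely used in computer vision, but existing methods often perform poorly on complex backgrounds, because low-level saliency features are unreliable and a single feature rarely yields a high-quality saliency map. A method that performs saliency detection by increasing feature diversity is proposed. Method: Building on high-level prior knowledge, the background prior and center prior features are redefined, and a color prior is added because human vision generally pays more attention to warm colors. For low-level image features, the widely used global contrast and local contrast features are adopted. During feature fusion, a new strategy applies linear or nonlinear fusion depending on the case, producing a high-quality saliency map. Results: Comparative validation on two public databases, MSRA-1000 and DUT-OMRON, shows that the algorithm attains high precision, recall, and F-measure, each more than 1.5% higher than the RBD algorithm, and its overall performance exceeds that of 10 current mainstream algorithms. Conclusion: Compared with algorithms based on low-level features or a single prior feature, the proposed algorithm makes full use of image information, preserves more local information while emphasizing global contrast, highlights salient regions uniformly, effectively suppresses complex background regions, and produces saliency maps that agree better with visual perception.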
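For readers unfamiliar with the metrics reported above, the following sketch computes precision, recall, and the F-measure F = (1 + β²)·P·R / (β²·P + R) for a thresholded saliency map against a binary ground-truth mask. The function name and toy masks are assumptions for illustration. Salient-object benchmarks often set β² = 0.3 to weight precision more heavily, but the abstract does not state which value was used, so the default here is the balanced β² = 1.

```python
def precision_recall_f(pred, truth, beta2=1.0):
    """Precision, recall, and F-measure for two flat binary masks of equal length."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)       # salient, correctly found
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)   # background marked salient
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)   # salient pixels missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta2 * precision + recall
    f = (1.0 + beta2) * precision * recall / denom if denom else 0.0
    return precision, recall, f

# Toy masks (hypothetical): 1 = salient pixel, 0 = background.
pred = [1, 1, 1, 0, 0, 0]
truth = [1, 1, 0, 1, 0, 0]
p, r, f = precision_recall_f(pred, truth)
```

In practice the saliency map is continuous, so the masks are produced by sweeping a threshold, which yields the precision-recall curves used in comparisons of this kind.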

20.
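To bridge the semantic gap between low-level image features and high-level semantics, and to reduce the dependence of top-down saliency detection on priors for specific objects, a saliency detection method based on high-level color semantic features is proposed. Structured color features are first extracted from the color image, and color naming is performed within a multiple kernel learning framework to obtain a color semantic name for each pixel. High-level color semantic features are then computed from the distribution of color names in the image and fused with low-level Gist features. A linear support vector machine is trained on them to produce a saliency classifier, giving pixel-level saliency detection. Experimental results show that the algorithm detects human visual fixation points more accurately and that the color semantic features yield better saliency detection results than traditional low-level color features.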
