Similar Literature
 20 similar documents found; search took 156 ms
1.
Visual speech information from the speaker's mouth region plays a significant role in improving speech recognition rates in noisy environments. This paper introduces the front-end design for visual information, a key component of audio-visual speech recognition (AVSR), and describes a machine-learning method for detecting the mouth in a face image that processes images quickly while achieving a high recognition rate. The method applies rotated Haar-like features to the integral image, uses single-valued classifiers as base feature classifiers under the AdaBoost learning algorithm, merges strong classifiers in a cascade, and finally partitions the detection region to locate the mouth. Applied in an AVSR system, the method achieves essentially real-time, accurate detection of the mouth region.
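The integral image that makes Haar-like features cheap to evaluate can be sketched in a few lines. This is a generic illustration of the data structure the abstract relies on, not the paper's implementation; the rotated-feature variant it cites additionally needs a diagonally summed table, which is omitted here.

```python
def integral_image(img):
    """ii[r][c] = sum of img[0..r][0..c], built in one pass."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]
            ii[r][c] = row_sum + (ii[r - 1][c] if r > 0 else 0)
    return ii

def region_sum(ii, top, left, bottom, right):
    """Sum over img[top..bottom][left..right] in O(1) via the integral image."""
    total = ii[bottom][right]
    if top > 0:
        total -= ii[top - 1][right]
    if left > 0:
        total -= ii[bottom][left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1][left - 1]
    return total

def haar_two_rect(ii, top, left, height, width):
    """A basic upright two-rectangle Haar-like feature: left half minus right half."""
    mid = left + width // 2 - 1
    return (region_sum(ii, top, left, top + height - 1, mid)
            - region_sum(ii, top, mid + 1, top + height - 1, left + width - 1))
```

On a uniform image the two-rectangle feature is zero, which is why such features respond to edges and bars (e.g., the dark lip line) rather than to absolute brightness.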

2.
To counter spoofing in face authentication, a face liveness detection algorithm based on an image diffusion-speed model and texture information is proposed. Genuine and fake face images differ in spatial structure; to capture this difference, the method uses anisotropic diffusion to enhance edge information. The difference between the original image and the diffused image is taken as the image's diffusion speed, from which a diffusion-speed model is built. Local binary patterns are then used to extract diffusion-speed features and train a classifier. Since genuine and fake face images differ in many other respects, the method also extracts blur and color-texture features to improve generalization, fuses the two by concatenating feature matrices, and trains a second classifier. The final decision is made by weighted fusion of the two classifiers' output probabilities. Experimental results show the algorithm detects fake face images quickly and effectively.
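Two of the building blocks above are easy to sketch: the diffusion-speed map (per-pixel difference between the image and its diffused copy) and the local binary pattern code at a pixel. The anisotropic diffusion itself is not reproduced here; `diffused` is assumed to be produced by any smoothing step.

```python
def lbp_code(img, r, c):
    """8-neighbour local binary pattern code for interior pixel (r, c)."""
    center = img[r][c]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]  # clockwise ring
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code

def diffusion_speed(original, diffused):
    """Per-pixel absolute difference between an image and its diffused copy."""
    return [[abs(o - d) for o, d in zip(ro, rd)]
            for ro, rd in zip(original, diffused)]
```

In the paper's pipeline the LBP histogram would be computed over the diffusion-speed map, not the raw image; edges that diffusion smooths away strongly (printed photos behave differently from live skin here) dominate that map.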

3.
For effective extraction of the spatio-temporal features of facial expressions, this paper proposes a facial expression recognition method combining CBP-TOP (Centralized Binary Patterns from Three Orthogonal Panels) features with an SVM classifier. The original image sequence is first preprocessed (face detection, cropping, and scale normalization); the CBP-TOP operator then extracts features block by block from the sequence; finally an SVM classifier performs expression recognition. Experimental results show the method extracts motion features and dynamic texture information more effectively and improves expression recognition accuracy. Compared with VLBP features, CBP-TOP achieves both a higher recognition rate and faster recognition.

4.
Facial Expression Recognition Based on D-S Evidence Theory   (cited 1 time; 0 self-citations, 1 by others)
王嵘, 马希荣. 《计算机科学》 2009, 36(1): 231-233
Building on affective computing theory, a facial expression recognition technique using D-S evidence-theory information fusion is proposed, and the system IFFER is designed and implemented. The classifier in the expression recognition module is trained on the JAFFE expression database. Recognition first segments the eye and mouth regions of the face image via chroma and luminance matching, then applies separately trained eye and mouth SVM classifiers, and fuses their results with D-S evidence theory. Experiments show that recognizing the two segmented regions greatly reduces data dimensionality in both training and recognition, improving efficiency, and that the fused result yields a significantly higher recognition rate than either classifier alone.
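Dempster's combination rule at the heart of this fusion step can be sketched directly. Mass functions are dicts mapping a `frozenset` of labels to belief mass; the emotion labels and mass values below are made-up examples, not data from the paper.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule of combination."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to disjoint hypotheses
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    norm = 1.0 - conflict
    return {s: w / norm for s, w in combined.items()}
```

When the eye and mouth classifiers agree, the combined mass on the shared hypothesis exceeds either input mass, which is the effect the abstract reports as the post-fusion accuracy gain.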

5.
Face recognition under varying illumination has long been a difficult and actively studied problem in image processing. To improve recognition rates, an algorithm for face recognition under non-uniform illumination is proposed. It extracts an illumination invariant of the face using the logarithm transform and the multiscale property of the 2D wavelet transform, then extracts face features with PCA+LDA, and classifies with a Euclidean-distance nearest-neighbor classifier. Matlab experiments achieve a high recognition rate on the Yale B face database.
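The core idea — take the logarithm so multiplicative illumination becomes additive, then keep only the high-frequency residue — can be sketched as follows. A simple box blur stands in for the wavelet low-pass band here; that substitution is an assumption for illustration, not the paper's method.

```python
import math

def log_transform(img):
    """log(1 + I): turns illumination's multiplicative effect into an additive one."""
    return [[math.log1p(p) for p in row] for row in img]

def box_blur(img, k=1):
    """Mean filter with radius k -- stands in for the wavelet low-pass band."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            vals = [img[rr][cc]
                    for rr in range(max(0, r - k), min(h, r + k + 1))
                    for cc in range(max(0, c - k), min(w, c + k + 1))]
            out[r][c] = sum(vals) / len(vals)
    return out

def illumination_invariant(img):
    """High-frequency residue of the log image: log(I) - lowpass(log(I))."""
    li = log_transform(img)
    lo = box_blur(li)
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(li, lo)]
```

A uniformly lit flat region maps to zero, so what survives is structure (edges, texture) rather than the lighting field; PCA+LDA then operates on this invariant instead of raw pixels.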

6.
To meet the need for filtering objectionable images on the Web, a fast SVM-based filtering method is proposed. A hybrid skin-color model detects exposed skin regions, and features such as face position, shape, and image background are extracted to form feature vectors. An SVM classifier is trained to obtain a detection model, and decisions made with this model effectively raise the average recognition rate. On normal and objectionable images drawn from real Web applications, the recognition rate was 83.9% for objectionable images and 93.4% for normal images, with a false-detection rate of 6.6% and an average recognition rate of 86.6%; the experiments show the method meets practical application requirements.

7.
《电子技术应用》2020,(1):34-38
Traditional face detection algorithms often cannot automatically extract useful detection features from raw images, whereas convolutional neural networks easily extract high-dimensional feature information and are widely used in image processing. Addressing this shortcoming, the simple and efficient Caffe deep learning framework is used to train an AlexNet network on the LFW face dataset, yielding a classification model. The original image undergoes an image-pyramid transform; forward propagation produces feature maps, from which face coordinates are recovered by the inverse transform; non-maximum suppression then selects the optimal positions, giving a binary face/non-face detection result. The method detects faces at different scales with high accuracy and can be used to build a face detection system.
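The non-maximum suppression step that picks the optimal position among overlapping candidates is a standard greedy procedure; a minimal sketch, with boxes as `(x1, y1, x2, y2)` tuples and an arbitrarily chosen overlap threshold:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep
```

With an image pyramid, detections from all scales are mapped back to original-image coordinates first and then suppressed together, so one face detected at several scales yields a single box.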

8.
This paper describes how to determine whether faces are present in images or video with complex backgrounds and, if so, count them. The approach is based on the AdaBoost algorithm, using Haar features and trained cascade classifiers for face recognition. The improvement is to dynamically adjust the weights of the cascade classifiers: cascades with high recognition rates (e.g., the frontal-face cascade) are given larger weights, and those with lower rates (e.g., the profile-face cascade) smaller ones. Experimental results show the method detects faces faster and more accurately, with good real-time performance.

9.
朱文球, 刘强. 《计算机工程》 2007, 33(2): 171-173
An AdaBoost-based face gender classification method is proposed that identifies a person's gender from a single low-resolution grayscale face image. A heuristic search algorithm is integrated into the AdaBoost framework to discover new features that classify better. In gender classification experiments, using fewer than 500 pixel comparisons, the method achieves over 93% accuracy, comparable to the best published classifier, the support vector machine (SVM), but much faster.
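The AdaBoost machinery the abstract builds on combines weak classifiers (here, pixel comparisons) by a weighted vote; the weight formula below is the standard AdaBoost one, while the specific comparisons and error values are illustrative, not taken from the paper.

```python
import math

def alpha_from_error(err):
    """Standard AdaBoost weight for a weak classifier with weighted error err."""
    return 0.5 * math.log((1 - err) / err)

def adaboost_predict(weak_classifiers, alphas, x):
    """Sign of the alpha-weighted vote; each weak classifier returns +1 or -1."""
    vote = sum(a * h(x) for h, a in zip(weak_classifiers, alphas))
    return 1 if vote >= 0 else -1
```

A weak classifier with 50% error gets weight zero, so only comparisons that beat chance contribute; the "fewer than 500 pixel comparisons" figure bounds how many such weak classifiers the strong vote needs.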

10.
A Face Recognition Method Using Fuzzy-Integral Information Fusion   (cited 2 times; 0 self-citations, 2 by others)
A face recognition method is proposed that fuses global and local features using the fuzzy integral. Existing face recognition systems lack the ability to adapt to changing external conditions; to address this, the method first segments the key facial features (eyes, nose, and mouth), then applies the Fisherface method to extract and compress face image features, building three classifiers based on local features and one based on the whole face. Finally, the fuzzy integral fuses these classifiers, and the fused result is used for recognition. Experiments show the method effectively combines complementary information in face images and improves the recognition rate.
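A Sugeno fuzzy integral over a handful of classifier outputs can be sketched as follows. The abstract does not say which fuzzy integral or measure the paper uses, so treat the Sugeno form and the importance values below as illustrative assumptions; the measure must be defined on each nested set of classifiers encountered.

```python
def sugeno_integral(scores, measure):
    """Sugeno fuzzy integral.

    scores:  dict classifier-name -> confidence in [0, 1]
    measure: dict frozenset of classifier names -> importance in [0, 1]
    Sort scores descending and take max over min(score, measure of the
    set of classifiers scoring at least that high).
    """
    items = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best, chosen = 0.0, set()
    for clf, h in items:
        chosen.add(clf)
        best = max(best, min(h, measure[frozenset(chosen)]))
    return best
```

Unlike a plain weighted mean, the measure can encode that two local classifiers *together* are more trustworthy than the sum of their individual importances, which is the adaptivity argument made for fuzzy-integral fusion.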

11.
Deep learning has attracted growing attention from researchers in speech recognition, visual recognition, and other fields; in image processing, deep learning methods can achieve high recognition rates. This paper applies the Boltzmann machine and the convolutional neural network as deep learning models to agriculture, from the perspective of recognizing images of crops damaged by pests, adapting each model to different application scenarios. For pest-damage image recognition, the Boltz…

12.
Multimedia event detection (MED) is a challenging problem because of the heterogeneous content and variable quality found in large collections of Internet videos. To study the value of multimedia features and fusion for representing and learning events from a set of example video clips, we created SESAME, a system for video SEarch with Speed and Accuracy for Multimedia Events. SESAME includes multiple bag-of-words event classifiers based on single data types: low-level visual, motion, and audio features; high-level semantic visual concepts; and automatic speech recognition. Event detection performance was evaluated for each event classifier. The performance of low-level visual and motion features was improved by the use of difference coding. The accuracy of the visual concepts was nearly as strong as that of the low-level visual features. Experiments with a number of fusion methods for combining the event detection scores from these classifiers revealed that simple fusion methods, such as arithmetic mean, perform as well as or better than other, more complex fusion methods. SESAME’s performance in the 2012 TRECVID MED evaluation was one of the best reported.  相似文献   
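The finding that arithmetic-mean fusion matches more complex schemes is easy to make concrete. Scores from different classifiers live on different scales, so a normalization step precedes the mean; min-max normalization is one common choice, assumed here for illustration.

```python
def minmax_normalize(scores):
    """Rescale one classifier's scores to [0, 1] so classifiers are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def mean_fusion(score_lists):
    """Arithmetic-mean late fusion of per-clip scores from several classifiers."""
    normalized = [minmax_normalize(s) for s in score_lists]
    return [sum(col) / len(col) for col in zip(*normalized)]
```

Each inner list holds one classifier's scores over the same ordered set of clips; the fused list ranks clips for the final event-detection decision.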

13.
Multiple Classifier Systems have found applications in many areas such as handwriting recognition, speaker recognition, medical diagnosis, fingerprint recognition, personal identification and others. However, there have been rare attempts to develop content-based image retrieval (CBIR) systems that use multiple classifiers to learn visual similarity. Texture as a primitive visual content is often used in many important applications (viz. medical image analysis and medical CBIR systems). In this paper, a texture image retrieval system is developed that learns visual similarity in terms of class membership using multiple classifiers. The way the proposed approach combines the decisions of multiple classifiers to obtain the query's final class membership for each output class is also a novel concept. A modified distance, weighted with the membership values obtained through similarity learning, is used for ranking. Three different algorithms are proposed for retrieving images against a query image, displaying the strength of the multiple classifier approach, the class membership score, and their interplay in achieving the objectives of simplicity, retrieval effectiveness, and speed. The proposed methods based on multiple classifiers achieve higher retrieval accuracy with lower standard deviation than all competing methods, irrespective of the texture database and feature set used. The multiple classifier retrieval schemes proposed here are tested for texture image retrieval, but they can be used for other challenging retrieval problems.
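The membership-weighted ranking idea can be sketched as below. The abstract does not give the exact weighting formula, so `d * (1 - membership)` — shrink the distance of images whose class the query likely belongs to — is an assumed form for illustration only.

```python
def membership_weighted_ranking(raw_dists, memberships):
    """Rank database images by distance discounted by learned class membership.

    raw_dists[i]:   feature-space distance from the query to image i
    memberships[i]: learned membership of the query in image i's class, in [0, 1]
    Assumed weighting: d * (1 - membership).
    """
    weighted = [(d * (1.0 - m), idx)
                for idx, (d, m) in enumerate(zip(raw_dists, memberships))]
    return [idx for _, idx in sorted(weighted)]
```

In the example below, image 1 is nearest in raw distance, but image 0's high membership pulls it to the top — exactly the correction that similarity learning is meant to supply.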

14.
This paper proposes a hybrid-boost learning algorithm for multi-pose face detection and facial expression recognition. To speed up the detection process, the system searches the entire frame for potential face regions using skin color detection and segmentation. It then scans the skin color segments of the image and applies the weak classifiers along with the strong classifier for face detection and expression classification. The system detects human faces at different scales and under various poses, different expressions, partial occlusion, and defocus. Our major contribution is the selection of weak hybrid classifiers based on Haar-like (local) features and Gabor (global) features. The multi-pose face detection algorithm can also be modified for facial expression recognition. Experimental results show that our face detection and facial expression recognition systems outperform the other classifiers.

15.
The viseme is a commonly used audio-visual model of mouth shape in speech-driven talking-head animation. This paper builds viseme hidden Markov models (HMMs) for a speech recognition system that drives a talking head, called the pre-mapping system. To obtain more accurate models and raise the recognition rate, triseme models, which account for the articulatory mouth-shape context, are introduced. However, the sharp increase in the number of models that trisemes bring causes a severe shortage of training data. Decision-tree state tying is used to mitigate this problem, together with a method for designing the tree's visual questions based on mouth-shape similarity. To compare the viseme system's performance, a speech recognition system using phonemes as the basic HMM unit is also built. As the evaluation criterion, a weighted recognition rate for objectively assessing talking-head animation is used. Experiments show that the viseme-based pre-mapping system produces more plausible mouth shapes for the talking head.

16.
Exploiting the multimodal nature of human speech perception, this paper attempts to build a continuous speech recognition system for noisy environments based on combined audio and visual features. For visual feature extraction, a method based on characteristic mouth shapes is introduced. Recognition experiments show that this visual feature extraction method yields higher recognition rates than traditional DCT and DWT methods, and that the audio-visual continuous speech recognition system based on characteristic mouth shapes is highly robust to noise.

17.
Audio-visual recognition systems are becoming popular because they overcome certain problems of traditional audio-only recognition systems. However, visual variations in a video sequence can significantly degrade a system's recognition performance, and the problem is further complicated when more than one visual variation happens at the same time. Although several databases have been created in this area, none of them includes realistic visual variations in video sequences. To facilitate the development of robust audio-visual recognition systems, the new audio-visual UNMC-VIER database was created. This database contains various visual variations, including illumination, facial expression, head pose, and image resolution variations; its most unique aspect is that it includes more than one visual variation in the same video recording. For the audio part, the utterances are spoken at slow and normal speech paces to improve the learning process of audio-visual speech recognition systems. The database is therefore useful for developing robust audio-visual person recognition, speech recognition, and face recognition systems.

18.
We propose a three-stage pixel-based visual front end for automatic speechreading (lipreading) that results in significantly improved recognition performance of spoken words or phonemes. The proposed algorithm is a cascade of three transforms applied on a three-dimensional video region-of-interest that contains the speaker's mouth area. The first stage is a typical image compression transform that achieves a high-energy, reduced-dimensionality representation of the video data. The second stage is a linear discriminant analysis-based data projection, which is applied on a concatenation of a small amount of consecutive image transformed video data. The third stage is a data rotation by means of a maximum likelihood linear transform that optimizes the likelihood of the observed data under the assumption of their class-conditional multivariate normal distribution with diagonal covariance. We applied the algorithm to visual-only 52-class phonetic and 27-class visemic classification on a 162-subject, 8-hour long, large vocabulary, continuous speech audio-visual database. We demonstrated significant classification accuracy gains by each added stage of the proposed algorithm which, when combined, can achieve up to 27% improvement. Overall, we achieved a 60% (49%) visual-only frame-level visemic classification accuracy with (without) use of test set viseme boundaries. In addition, we report improved audio-visual phonetic classification over the use of a single-stage image transform visual front end. Finally, we discuss preliminary speech recognition results.  相似文献   

19.
Eye-State Detection Based on the AdaBoost Algorithm   (cited 2 times; 0 self-citations, 2 by others)
许世峰, 曾义. 《计算机仿真》 2007, 24(7): 214-216, 341
Eye detection plays an important role in expression recognition and face recognition: as a preprocessing step, eye detection and localization can effectively raise the recognition rate of both. A real-time eye-state detection method based on the AdaBoost algorithm is proposed. AdaBoost is a learning method for constructing an accurate classifier: it combines a family of weak classifiers into a strong classifier according to fixed rules, then cascades the strong classifiers into a fast, accurate detector. The effect of different eye-feature choices during training on the final detection is analyzed and discussed, the detection rates of the various feature methods on the target are measured experimentally, and an ideal classifier is presented.

20.
Recognition of emotion in speech has recently matured into one of the key disciplines in speech analysis, serving next-generation human-machine interaction and communication. However, unlike automatic speech recognition, emotion recognition from an isolated word or phrase is inappropriate for conversation, because a complete emotional expression may stretch across several sentences and may end on any word in the dialogue. In this paper, we present a segment-based emotion recognition approach to continuous Mandarin Chinese speech. In this approach, the unit of recognition is not a phrase or a sentence but an emotional expression in dialogue. To that end, we first evaluate the performance of several classifiers in short-sentence speech emotion recognition architectures. The experiments show that the WD-KNN classifier achieves the best accuracy for 5-class emotion recognition among the five classification techniques. We then implement a WD-KNN-based continuous Mandarin Chinese speech emotion recognition system with an emotion radar chart that represents the intensity of each emotion component in speech. The approach shows how emotions can be recognized from speech signals and, in turn, how emotional states can be visualized.
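A distance-weighted KNN classifier of the kind WD-KNN belongs to can be sketched as follows. The abstract does not specify WD-KNN's exact weighting scheme, so the `1 / (1 + distance)` vote weight, the feature vectors, and the emotion labels here are illustrative assumptions.

```python
from collections import defaultdict

def weighted_knn(train, query, k=3):
    """Distance-weighted KNN: nearer neighbours cast heavier votes.

    train: list of (feature_vector, label) pairs (assumed interface)
    Returns the label with the largest total vote among the k nearest.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    scored = sorted((dist(vec, query), label) for vec, label in train)
    votes = defaultdict(float)
    for d, label in scored[:k]:
        votes[label] += 1.0 / (1.0 + d)  # assumed weight; plain KNN uses 1.0
    return max(votes, key=votes.get)
```

In a segment-based system each emotional expression (possibly spanning several sentences) is reduced to one feature vector before classification, so the classifier runs once per segment rather than once per word.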

