Similar literature
 20 similar documents found
1.
Lip-reading technologies have progressed rapidly since the breakthrough of deep learning. Lip reading plays a vital role in many applications, such as human-machine communication and security. In this paper, we propose an effective lip-reading model for Arabic visual speech recognition based on deep learning algorithms. The Arabic visual dataset that we collected contains 2400 records of Arabic digits and 960 records of Arabic phrases from 24 native speakers. The primary aim is a high-performance model obtained by enhancing the preprocessing phase. First, we extract keyframes from our dataset. Second, we produce Concatenated Frame Images (CFIs) that represent the utterance sequence in a single image. Finally, VGG-19 is employed for visual feature extraction in the proposed model. We examined 10, 15, and 20 keyframes to compare two approaches: (1) the VGG-19 base model and (2) the VGG-19 base model with batch normalization. The results show that the second approach achieves greater accuracy on the test dataset: 94% for digit recognition, 97% for phrase recognition, and 93% for combined digit and phrase recognition. Therefore, our proposed model is superior to other models based on CFI input.
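
The preprocessing steps in this abstract (keyframe selection, then tiling the keyframes into one Concatenated Frame Image) can be sketched roughly as follows. This is only an illustration: the abstract does not say how keyframes are chosen or how the tiles are arranged, so the uniform sampling, the row-by-row grid, and the names pick_keyframes and concatenate_frames are all assumptions. Frames are plain nested lists of grayscale pixels.

```python
def pick_keyframes(frames, k):
    """Return k frames at evenly spaced positions (uniform sampling is an assumption)."""
    if k <= 0 or not frames:
        return []
    if k >= len(frames):
        return list(frames)
    if k == 1:
        return [frames[0]]
    step = (len(frames) - 1) / (k - 1)
    return [frames[round(i * step)] for i in range(k)]


def concatenate_frames(frames, cols):
    """Tile equally sized grayscale frames into one image, row by row.

    Each frame is a list of pixel rows; missing tiles in the last grid row
    are filled with black (0) pixels.
    """
    if not frames:
        return []
    height, width = len(frames[0]), len(frames[0][0])
    blank = [[0] * width for _ in range(height)]
    grid_rows = -(-len(frames) // cols)  # ceiling division
    padded = list(frames) + [blank] * (grid_rows * cols - len(frames))
    image = []
    for r in range(grid_rows):
        tiles = padded[r * cols:(r + 1) * cols]
        for y in range(height):
            line = []
            for tile in tiles:
                line.extend(tile[y])
            image.append(line)
    return image


# Toy utterance: 20 frames of 2x2 pixels, frame i filled with the value i.
video = [[[i, i], [i, i]] for i in range(20)]
keyframes = pick_keyframes(video, 10)
cfi = concatenate_frames(keyframes, cols=5)  # 2 x 5 grid of tiles
print(len(cfi), "x", len(cfi[0]))
```

In a real pipeline the resulting single image would be resized to the input size expected by the feature extractor (VGG-19 in the abstract) before training.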

2.
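Based on the characteristics of human hearing, a new synchronous multi-band maximum likelihood linear regression algorithm is proposed for speech recognition in noisy environments. The algorithm uses maximum likelihood as the parameter estimation criterion and compensates the speech models under the assumptions that the signals in each frequency band are perceived synchronously and are contaminated by noise, which effectively improves the performance of the recognition system in noisy environments.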

3.
彭勇 《中国科技博览》2013,(32):247-249,251
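With the rapid development of computer and communication technology, speech recognition has been widely applied in many sectors of the national economy, and related products have come to market. To improve efficiency and reduce enterprise costs, however, many specific applications need to be integrated with speech recognition. Given the characteristics of an enterprise customs declaration system, a two-level single-character recognition method based on the HMM model is adopted. It resolves the problems of recognition efficiency and recognition stability, so that the speech recognition method ultimately meets the requirements of the customs declaration system. Vocabulary library maintenance, speech training for new users, and the process of building new speech models are also briefly described.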

4.
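Speech recognition technology can provide a convenient control method for operators whose work requires both hands and for people with disabilities. This paper proposes a method that enhances the robustness of speech recognition by combining FF2 (Second-order Frequency Filtering) and RASTA (RelAtive SpecTrAl) techniques, and applies it successfully to the control system of a robotic nursing bed, improving the robustness of recognition in non-stationary noise environments such as hospitals and factories. A recognition system that uses an HMM/GMM hybrid model with conventional Mel-frequency cepstral coefficients as features is compared with one that uses an HMM/GMM hybrid model with RASTA-FF2 (RelAtive SpectrAl-Second-order Frequency Filtering) features, and both are tested on clean and noisy speech. The results show that when the FF2 features produced by second-order frequency filtering are further filtered by the RASTA filter, the RASTA-FF2 system achieves a higher recognition rate than the conventional system, especially in non-stationary noise. This indicates that combining FF2 features with RASTA filtering, one acting in the frequency domain and the other in the time domain, can effectively remove different noise components from the speech signal.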
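
The two filtering stages named in this abstract can be sketched as below, under stated assumptions. The abstract does not give the filter definitions, so the sketch assumes that FF2 is the frequency filter H(z) = z - z^-1 applied across the log filter-bank energies of each frame, and that RASTA is the classic band-pass filter with numerator 0.1 * (2, 1, 0, -1, -2) and a pole at 0.98, applied in causal form along time in each band. The function names and the pole value are illustrative, not taken from the paper.

```python
def frequency_filter(log_energies):
    """Apply H(z) = z - z^-1 across the bands of one frame (zero-padded edges)."""
    padded = [0.0] + list(log_energies) + [0.0]
    # Output for band k is S(k+1) - S(k-1).
    return [padded[k + 2] - padded[k] for k in range(len(log_energies))]


def rasta_filter(track, pole=0.98):
    """Band-pass one band's trajectory over time with a causal RASTA-style IIR filter."""
    numer = (0.2, 0.1, 0.0, -0.1, -0.2)  # 0.1 * (2, 1, 0, -1, -2)
    out = []
    prev = 0.0
    for t in range(len(track)):
        fir = sum(c * track[t - i] for i, c in enumerate(numer) if t - i >= 0)
        prev = fir + pole * prev
        out.append(prev)
    return out


def rasta_ff2(frames):
    """Frequency-filter every frame, then RASTA-filter every band over time."""
    filtered = [frequency_filter(frame) for frame in frames]         # frequency domain
    tracks = [rasta_filter(list(band)) for band in zip(*filtered)]   # time domain
    return [list(frame) for frame in zip(*tracks)]


# Three identical frames with four log filter-bank energies each (toy values).
features = rasta_ff2([[1.0, 2.0, 4.0, 8.0]] * 3)
print(len(features), "frames x", len(features[0]), "bands")
```

Because the numerator coefficients sum to zero, a constant offset in any band decays away over time, which is the property that makes this kind of filter useful against slowly varying channel and noise effects.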

5.
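An experimental study of how negative facial expressions affect face identity recognition   Total citations: 1 (self-citations: 0; citations by others: 1)
To verify the interference effect of negative facial expressions on face identity recognition, two Garner-paradigm experiments were designed using pictures of Chinese facial expressions. Experiment 1 replicated earlier studies using pictures of angry and happy expressions; Experiment 2 used pictures of angry and sad expressions. The results showed that angry and happy expressions did not affect face identity recognition, whereas angry and sad expressions did. This indicates that negative expressions can affect face identity recognition and supports the view that expression and identity are not processed independently. The result also addresses the difficulty that earlier studies had in detecting an effect of expression on identity recognition.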

6.
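The latest progress in speech recognition research based on MCE/GPD is discussed first. On this basis, a robust speech recognition method based on discriminative learning of environment features is proposed; it learns the environment features iteratively by gradient descent under the minimum classification error criterion. Because gradient descent yields a locally optimal solution, finding good initial values for the environment features is very important. Finally, the selection of initial parameter values in this discriminative learning method is discussed.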

7.
The two-stream convolutional neural network performs very well in video action recognition. Its core idea is to train one model on frames clipped from the videos and another on optical flow images pre-extracted from those frames, and then to integrate the outputs of the two models. However, the reliance on pre-extracted optical flow reduces the efficiency of action recognition, and the temporal and spatial streams are simply fused at the end, so one stream may fail while the other succeeds. We propose a novel hidden two-stream collaborative (HTSC) learning network that hides the optical flow extraction step inside the network and greatly speeds up action recognition. Building on the two-stream method, the collaborative learning model captures the interaction of temporal and spatial features to greatly improve recognition accuracy. The proposed method balances efficiency and precision on large-scale video action recognition datasets.
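
For context, the simple end-of-network fusion that this abstract criticizes can be sketched as follows. This illustrates the conventional two-stream baseline only, not the HTSC network; the function names, the equal default weighting, and the toy scores are assumptions.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities (numerically stable form)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(spatial_scores, temporal_scores, temporal_weight=0.5):
    """Average the two streams' class probabilities and pick the best class."""
    p_spatial = softmax(spatial_scores)
    p_temporal = softmax(temporal_scores)
    fused = [
        (1.0 - temporal_weight) * a + temporal_weight * b
        for a, b in zip(p_spatial, p_temporal)
    ]
    return fused.index(max(fused)), fused


# The spatial stream prefers class 0, the temporal stream strongly prefers class 1.
label, probabilities = late_fusion([2.0, 1.0, 0.0], [0.0, 3.0, 0.0])
print(label, [round(p, 3) for p in probabilities])
```

Because the streams only meet at this final averaging step, neither can correct the other's intermediate features, which is the limitation that motivates learning the two streams collaboratively.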

8.
Human action recognition in complex environments is challenging. Recently, sparse representation has achieved excellent results on human action recognition under different conditions. The main idea of sparse representation classification is to construct a general classification scheme in which the training samples of each class serve as the dictionary for expressing the query sample, and the minimal reconstruction error indicates the corresponding class. However, learning a discriminative dictionary remains difficult. In this work, we make two contributions. First, we build a new and robust human action recognition framework by combining a modified sparse classification model with deep convolutional neural network (CNN) features. Second, we construct a novel classification model that consists of a representation-constrained term and a coefficient incoherence term. Experimental results on benchmark datasets show that our modified model obtains competitive results compared with other state-of-the-art models.

9.
By recognizing sensory information through touch, vision, or voice, a robot can interact with people more intelligently. In human–robot interaction (HRI), emotion recognition has been a popular research topic in recent years. This paper proposes a method for emotion recognition that uses the speech signal to recognize several basic human emotional states, for application in an entertainment robot. The proposed method uses voice signal processing and classification. First, end-point detection and frame setting are performed in the pre-processing stage. Then, the statistical features of the energy contour are computed. Fisher’s linear discriminant analysis (FLDA) is used to enhance the recognition rate. In the final stage, a support vector machine (SVM) completes the emotional state classification. To determine the effectiveness of emotional HRI, an embedded system was constructed and integrated with a self-built entertainment robot. The experimental results show that the robot interacts with a person in a responsive manner. The average recognition rate for five emotional states is 73.8% using the database constructed in the authors’ lab.
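
The feature-extraction step described here (framing the signal, then computing statistics of the energy contour) might look roughly like the sketch below. The abstract does not list which statistics are used, so the choice of mean, standard deviation, maximum, minimum, and range is an assumption, as are the function names and the frame settings.

```python
import statistics


def frame_energies(samples, frame_len, hop):
    """Short-time energy of each frame: the sum of squared samples."""
    return [
        sum(x * x for x in samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


def energy_contour_features(samples, frame_len=4, hop=2):
    """Summarize the energy contour with a few simple statistics (assumed set)."""
    contour = frame_energies(samples, frame_len, hop)
    if not contour:
        raise ValueError("signal is shorter than one frame")
    return {
        "mean": statistics.fmean(contour),
        "std": statistics.pstdev(contour),
        "max": max(contour),
        "min": min(contour),
        "range": max(contour) - min(contour),
    }


# Toy signal: silence followed by a constant-amplitude burst.
signal = [0, 0, 0, 0, 1, 1, 1, 1]
features = energy_contour_features(signal, frame_len=4, hop=4)
print(features)
```

A feature vector like this would then be passed through the discriminant projection and the classifier mentioned in the abstract; those stages are omitted here.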

10.
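A statistics-based corpus probability method is combined with a rule-based natural language understanding method to propose a new computational language model for Chinese. The model is applied in the post-processing module of a speech recognition system, with fairly satisfactory results.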

11.
林军 《包装工程》2017,38(8):34-37
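Objective: To explore the correlation between network application art design and national identity in the information age, clarify the relationship between national art and world art, and promote the development of national art. Methods: By analyzing the characteristics of network application art design in the new era and the factors of national identity, representative network applications such as web pages, multimedia, and apps are analyzed to demonstrate the relationship between national art and world art. Conclusion: Network application art design can gain public acceptance in fields such as web pages, multimedia, and apps, and its national identity is inherited, carried forward, and recognized.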

12.
This pilot study focuses on the use of a hybrid LMS-ICA system for in-vehicle background noise reduction. Modern vehicles increasingly support voice commands, which are one of the pillars of autonomous and SMART vehicles. Robust speaker recognition for context-aware in-vehicle applications is limited to a certain extent by in-vehicle background noise. This article presents a new concept of a hybrid system, implemented as a virtual instrument. The highly modular concept of the virtual car, used in combination with real recordings of various driving scenarios, enables effective testing of the investigated methods of in-vehicle background noise reduction. The study also presents a unique concept of an adaptive system using intelligent clusters of distributed next-generation 5G data networks, which allows individual vehicles to exchange interference information and/or optimal hybrid algorithm settings. On average, unfiltered voice commands were successfully recognized in 29.34% of all scenarios, while LMS reached up to 71.81%, and the LMS-ICA hybrid improved performance further to 73.03%.
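
The LMS stage of such a system can be illustrated with a textbook adaptive noise canceller. This is a generic sketch, not the authors' implementation: the two-microphone setup (a primary signal plus a separate noise reference), the tap count, the step size, and the name lms_cancel are assumptions, and the ICA stage is not shown.

```python
import math
import random


def lms_cancel(primary, reference, taps=4, mu=0.05):
    """Adaptive noise cancellation with the LMS algorithm.

    primary   -- speech plus noise picked up by the main microphone
    reference -- noise-only signal correlated with the noise in `primary`
    Returns the error signal (the cleaned estimate) and the final weights.
    """
    weights = [0.0] * taps
    buffer = [0.0] * taps  # most recent reference samples, newest first
    cleaned = []
    for d, x in zip(primary, reference):
        buffer = [x] + buffer[:-1]
        noise_estimate = sum(w * b for w, b in zip(weights, buffer))
        error = d - noise_estimate
        # Move the weights along the negative gradient of the squared error.
        weights = [w + mu * error * b for w, b in zip(weights, buffer)]
        cleaned.append(error)
    return cleaned, weights


# Toy scenario: a sine "voice command" buried in scaled white noise.
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
speech = [0.3 * math.sin(2 * math.pi * n / 50) for n in range(2000)]
noisy = [s + 0.5 * v for s, v in zip(speech, noise)]
cleaned, weights = lms_cancel(noisy, noise)
print([round(w, 2) for w in weights])
```

The step size mu trades convergence speed against steady-state error; it has to stay small relative to the reference signal power for the update to remain stable.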

13.
黄群 《影像技术》2010,22(1):50-51,56
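Combating crimes that damage wildlife resources is the sacred duty of the forest police. This paper focuses on how image recognition technology can be used to authenticate rhinoceros horn and leopard skin and to identify the species of lions, and on the relevant laws, in order to protect the nation's rare and endangered wildlife.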

14.
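Neural network methods and their application in speech recognition   Total citations: 2 (self-citations: 0; citations by others: 2)
Methods of applying neural network techniques to endpoint detection, separation of initials and finals, nonlinear feature extraction, and large-character-table recognition of Chinese speech signals are discussed. A block diagram of the system implementation is described, and application examples are given.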

15.
Flexible piezoelectric acoustic sensors have been developed to generate multiple sound signals with high sensitivity, shifting the paradigm of future voice technologies. Speech recognition based on advanced acoustic sensors and optimized machine learning software will serve as an innovative interface for artificial intelligence (AI) services. Collaboration and novel approaches spanning smart sensors and speech algorithms should be pursued to realize a hyperconnected society that can offer personalized services such as biometric authentication, AI secretaries, and home appliances. Here, representative developments in speech recognition are reviewed in terms of flexible piezoelectric materials, self-powered sensors, machine learning algorithms, and speaker recognition.

16.
黄小格  韩超 《工业工程设计》2022,4(3):33-38, 75
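The Baima Tibetans and the Jiarong Tibetans of northwestern Sichuan are both branches of the Tibetan people. They live at key nodes of the "Tibetan-Qiang-Yi Corridor", have diverse cultural components, and display a hybrid, plural, and blended cultural character. Although the two groups are geographically some distance apart, their costume cultures share certain related features, most clearly in the design of women's traditional dress. This paper compares the traditional women's costumes of the Baima and Jiarong Tibetans of northwestern Sichuan. Because the two groups differ in the cultural traits shaped by their geographic ties and in the sources and degree of influence from "other cultures", each presents costume forms and cultural connotations with its own ethnic character.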

17.
Previous studies have shown that there is a potential semantic dependency between part-of-speech and semantic roles. At the same time, the predicate-argument structure of a sentence is important information for the semantic role labeling task. In this work, we introduce an auxiliary deep neural network model that models the semantic dependency between part-of-speech and semantic roles and incorporates predicate-argument information into semantic role labeling. Within a joint learning framework, part-of-speech tagging is used as an auxiliary task to improve the result of semantic role labeling. In addition, we introduce an argument recognition layer into the training of the main task, semantic role labeling, so that the argument-related structural information selected by the predicate through the attention mechanism assists the main task. Because the model makes full use of the semantic dependency between part-of-speech and semantic roles and of the predicate-argument structural information, it achieves an F1 value of 89.0% on the WSJ test set of CoNLL2005, about 0.8% higher than the existing state-of-the-art model.

18.
康峥  黄志华  赖惠成 《声学技术》2022,41(6):862-870
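As research on compressed sensing has deepened, its application to speech enhancement has attracted considerable attention. To address the shortcomings of traditional compressed-sensing speech enhancement algorithms, compressed sensing is combined with deep learning to build a model named Speech Enhancement based on Deep Compressed Sensing (SEDCS). Following the principle of compressed sensing, an encoder-decoder model replaces the sparsification of the speech signal, a convolutional neural network replaces the measurement matrix to perform observation and dimensionality reduction of the speech signal, and speech enhancement is achieved through joint training. Experimental results show that the model can accomplish the speech enhancement task and achieves better enhancement than existing compressed-sensing speech enhancement algorithms. Compared with deep-learning speech enhancement algorithms, its performance is only average, but it offers some improvement in generalization and in enhancement time efficiency at the test stage.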

19.
One of the most commonly reported disabilities is vision loss, which an ophthalmologist can diagnose by assessing the patient's visual system. This procedure, however, usually requires an appointment with an ophthalmologist, which is both time-consuming and expensive. Other issues that can arise include a lack of appropriate equipment and trained practitioners, especially in rural areas. Centered on a cognitively motivated attribute extraction and speech recognition approach, this paper proposes a novel method that immediately determines eyesight deficiency. The proposed system uses an adaptive filter bank with weighted mel frequency cepstral coefficients for feature extraction. The adaptive filter bank implementation is inspired by the principle of spectrum sensing in cognitive radio, which is aware of its environment and adapts to statistical variations in the input stimuli by learning from the environment. Comparative performance evaluation demonstrates that our automated visual acuity test method can achieve results comparable to the clinical ground truth established by the expert ophthalmologist’s tests. The overall accuracy achieved by the proposed model when compared with the expert ophthalmologist's test is 91.875%. The proposed method potentially offers a second opinion to ophthalmologists and serves as a cost-effective pre-screening test to predict eyesight loss at an early stage.

20.
As the first barrier protecting cyberspace, the CAPTCHA has made significant contributions to maintaining Internet security and preventing malicious attacks. By studying the CAPTCHA, we can find its vulnerabilities and improve its security. Recently, many studies have shown that improving CAPTCHA image preprocessing yields a better recognition rate with state-of-the-art machine learning algorithms. The CAPTCHA images in this experiment contain many kinds of noise and distortion. In this paper, we propose an adaptive median filtering algorithm based on divide and conquer. First, the filtering window data are sorted quickly by exploiting data correlation, which greatly improves filtering efficiency. Second, the size of the filtering window is adjusted adaptively according to the noise density. The experimental results demonstrate that the proposed scheme achieves superior performance compared with the conventional median filter. The algorithm not only detects and removes noise effectively but also preserves details well. Therefore, this algorithm can be one of the strongest tools for CAPTCHA image recognition and related applications.
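
The idea of growing the filter window only where it is needed can be illustrated with the classic textbook adaptive median filter. This is a stand-in, not the algorithm from the abstract: it grows the window per pixel until the median is no longer an extreme value, rather than estimating noise density, and it uses an ordinary sort instead of the correlation-based divide-and-conquer sorting the authors describe. The function name and the maximum window size are assumptions.

```python
def adaptive_median(image, max_window=7):
    """Classic adaptive median filter on a grayscale image (list of pixel rows)."""
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            size = 3
            while True:
                r = size // 2
                # Collect the window, clipped at the image borders.
                window = sorted(
                    image[j][i]
                    for j in range(max(0, y - r), min(height, y + r + 1))
                    for i in range(max(0, x - r), min(width, x + r + 1))
                )
                low, median, high = window[0], window[len(window) // 2], window[-1]
                if low < median < high:
                    # The median is not an impulse: keep the pixel unless it is one.
                    pixel = image[y][x]
                    result[y][x] = pixel if low < pixel < high else median
                    break
                size += 2
                if size > max_window:
                    result[y][x] = median  # give up growing, fall back to the median
                    break
    return result


# Smooth 5x5 gradient with one "salt" impulse in the centre.
noisy = [[100 + 5 * y + x for x in range(5)] for y in range(5)]
noisy[2][2] = 255
restored = adaptive_median(noisy)
print(restored[2])
```

Pixels that are not extreme within their window pass through unchanged, which is where the detail preservation comes from; a fixed-size median filter would smooth every pixel regardless.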
