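期刊界 All Journals: search journals worldwide, disseminate academic results; professional journal search, journal informatization, academic search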

1.

Deep Learning-Based Approach for Arabic Visual Speech Recognition

Nadia H. Alsulami, Amani T. Jamal, Lamiaa A. Elrefaei 《Computers, Materials & Continua》2022,71(1):85-108

Lip-reading technologies are progressing rapidly following the breakthrough of deep learning. Lip reading plays a vital role in many applications, such as human-machine communication and security. In this paper, we propose an effective lip-reading model for Arabic visual speech recognition built on deep learning algorithms. The collected Arabic visual dataset contains 2400 records of Arabic digits and 960 records of Arabic phrases from 24 native speakers. The primary purpose is to provide a high-performance model by enhancing the preprocessing phase. First, we extract keyframes from our dataset. Second, we produce Concatenated Frame Images (CFIs), each of which represents the utterance sequence in a single image. Finally, VGG-19 is employed for visual feature extraction in our proposed model. We examined different numbers of keyframes (10, 15, and 20) to compare two approaches in the proposed model: (1) the VGG-19 base model and (2) the VGG-19 base model with batch normalization. The results show that the second approach achieves greater accuracy: 94% for digit recognition, 97% for phrase recognition, and 93% for combined digit and phrase recognition on the test dataset. Therefore, our proposed model is superior to models based on CFI input.
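The keyframe and CFI steps described in this abstract can be illustrated with a minimal sketch. It assumes evenly spaced keyframe selection and a row-major tiling grid, neither of which is confirmed by the abstract; the names `select_keyframes` and `concatenate_frames` are hypothetical, and frames are plain nested lists of grayscale values rather than real video data.

```python
from typing import List

Frame = List[List[int]]  # one grayscale frame: rows of pixel values


def select_keyframes(frames: List[Frame], k: int) -> List[Frame]:
    """Pick k frames at evenly spaced positions.

    Assumption: even spacing stands in for the paper's keyframe
    extraction, whose actual criterion is not given in the abstract.
    """
    if k <= 0 or not frames:
        return []
    if k >= len(frames):
        return list(frames)
    if k == 1:
        return [frames[0]]
    step = (len(frames) - 1) / (k - 1)
    return [frames[round(i * step)] for i in range(k)]


def concatenate_frames(frames: List[Frame], cols: int) -> Frame:
    """Tile equally sized frames row-major into one image, zero-padding the grid."""
    height, width = len(frames[0]), len(frames[0][0])
    grid_rows = -(-len(frames) // cols)  # ceiling division
    blank = [[0] * width for _ in range(height)]
    image: Frame = []
    for r in range(grid_rows):
        tiles = [
            frames[r * cols + c] if r * cols + c < len(frames) else blank
            for c in range(cols)
        ]
        for y in range(height):
            line: List[int] = []
            for tile in tiles:
                line.extend(tile[y])
            image.append(line)
    return image
```

With 10 keyframes and `cols=5`, for example, the result is a 2-by-5 grid of frames in a single image, which could then be resized to the input size expected by a network such as VGG-19.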

2.

A Multi-Band Synchronous Model for Speech Recognition in Noisy Environments

孙吴镇扬《中国工程科学》2006,8(3):31-34

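Based on the characteristics of human hearing, a new synchronous multi-band maximum likelihood linear regression algorithm is proposed for speech recognition in noisy environments. The algorithm uses maximum likelihood as the parameter estimation criterion and compensates the speech model under the assumptions that the signals in each frequency band are perceived synchronously and are corrupted by noise, which effectively improves the performance of the recognition system in noisy environments.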

3.

Research on an HMM-Based Real-Time Speech Recognition Method

彭勇《中国科技博览》2013,(32):247-249,251

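With the rapid development of computer and communication technology, speech recognition has been widely applied across all sectors of the national economy, and related products have come to market. To improve work efficiency and reduce enterprise costs, however, many specific applications need to be integrated with speech recognition. In view of the characteristics of an enterprise customs declaration system, a two-level single-character recognition method based on the HMM is adopted. It resolves the issues of recognition efficiency and recognition stability, so that the method ultimately meets the application requirements of the customs declaration system. The maintenance of the vocabulary library, speech training for new speakers, and the building of new speech models are also briefly described.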

4.

Speech Recognition Based on RASTA-FF2 Filtering for Noise Reduction

张东 谢存禧《测试技术学报》2006,20(6):549-553

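Speech recognition can provide a convenient control method for operators who must work with both hands and for people with disabilities. This paper proposes a method that combines FF2 (Second-order Frequency Filtering) with RASTA (RelAtive SpecTrAl) processing to improve the robustness of speech recognition. The method was successfully applied to the control system of a robotic nursing bed, improving recognition robustness in non-stationary noise environments such as hospitals and factories. An HMM/GMM recognition system that uses conventional Mel-frequency cepstral coefficients as features is compared with an HMM/GMM system that uses RASTA-FF2 (RelAtive SpectrAl-Second-order Frequency Filtering) features, under both clean and noisy speech conditions. The results show that when the FF2 features obtained by second-order frequency filtering are further passed through a RASTA filter, the RASTA-FF2 system achieves a higher recognition rate than the conventional system, particularly in non-stationary noise. This indicates that combining FF2 features with RASTA filtering, one acting in the frequency domain and the other in the time domain, can effectively remove different noise components from the speech signal.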

5.

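An Experimental Study of the Influence of Negative Facial Expressions on Face Identity Recognition (total citations: 1; self-citations: 0; citations by others: 1)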

汪亚珉 傅小兰《人类工效学》2007,13(3):1-3

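To verify the interference effect of negative facial expressions on face identity recognition, two Garner-paradigm experiments were designed using pictures of Chinese facial expressions. Experiment 1 replicated earlier studies by using angry and happy expression pictures; Experiment 2 used angry and sad expression pictures. The results show that angry and happy expressions did not affect face identity recognition, whereas angry and sad expressions did. This indicates that negative expressions can affect face identity recognition and supports the view that expression and identity are not processed independently. The result also addresses the difficulty that earlier studies had in detecting an effect of expression on identity recognition.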

6.

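MCE/GPD-Based Speech Recognition and the Selection of Initial Parameters in a Robust Application (total citations: 2; self-citations: 0; citations by others: 2)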

韩纪庆 高文 张磊 王承发《高技术通讯》2000,10(7):41-44

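The latest progress in MCE/GPD-based speech recognition research is discussed first. On this basis, a robust speech recognition method based on discriminative learning of environment features is proposed. The method learns the environment features iteratively by gradient descent under the minimum classification error criterion. Because gradient descent yields a locally optimal solution, finding good initial values for the environment features is very important. Finally, the selection of initial parameter values in this discriminative learning method is discussed.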

7.

Hidden Two-Stream Collaborative Learning Network for Action Recognition

Shuren Zhou, Le Chen, Vijayan Sugumaran 《Computers, Materials & Continua》2020,63(3):1545-1561

The two-stream convolutional neural network performs very well in video action recognition. Its core idea is to train one model on frames clipped from the videos and another on optical flow images pre-extracted from those frames, and then to integrate the outputs of the two models. However, the reliance on pre-extracted optical flow reduces the efficiency of action recognition, and the temporal and spatial streams are simply fused at the end, so that one stream may fail while the other succeeds. We propose a novel hidden two-stream collaborative (HTSC) learning network that hides the optical flow extraction step inside the network and greatly speeds up action recognition. Building on the two-stream method, the two-stream collaborative learning model captures the interaction between temporal and spatial features to greatly improve recognition accuracy. The proposed method achieves a good balance of efficiency and precision on large-scale video action recognition datasets.

8.

Human Action Recognition Based on Supervised Class-Specific Dictionary Learning with Deep Convolutional Neural Network Features

Binjie Gu, Weili Xiong, Zhonghu Bai 《Computers, Materials & Continua》2020,63(1):243-261

Human action recognition in complex environments is a challenging task. Recently, sparse representation has achieved excellent results on human action recognition under different conditions. The main idea of sparse representation classification is to construct a general classification scheme in which the training samples of each class serve as the dictionary used to express the query class, and the minimal reconstruction error indicates the corresponding class. However, learning a discriminative dictionary remains difficult. In this work, we make two contributions. First, we build a new and robust human action recognition framework by combining a modified sparse classification model with deep convolutional neural network (CNN) features. Second, we construct a novel classification model that consists of a representation-constrained term and a coefficient incoherence term. Experimental results on benchmark datasets show that our modified model obtains competitive results compared with other state-of-the-art models.

9.

Speech signal-based emotion recognition and its application to entertainment robots

Kai-Tai Song, Meng-Ju Han, Shih-Chieh Wang 《Journal of the Chinese Institute of Engineers》2014,37(1):14-25

By recognizing sensory information through touch, vision, or voice, a robot can interact with people in a more intelligent manner. In human–robot interaction (HRI), emotion recognition has been a popular research topic in recent years. This paper proposes a method for emotion recognition that uses a speech signal to recognize several basic human emotional states, for application in an entertainment robot. The proposed method uses voice signal processing and classification. First, end-point detection and frame setting are performed in the pre-processing stage. Then, the statistical features of the energy contour are computed. Fisher’s linear discriminant analysis (FLDA) is used to enhance the recognition rate. In the final stage, a support vector machine (SVM) is used to complete the emotional state classification. To determine the effectiveness of emotional HRI, an embedded system was constructed and integrated with a self-built entertainment robot. The experimental results show that the robot interacts with a person in a responsive manner. The average recognition rate for five emotional states is 73.8% using the database constructed in the authors’ lab.

10.

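A Chinese Computational Language Model Combining Statistics and Rules and Its Application in Speech Recognition (total citations: 1; self-citations: 0; citations by others: 1)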

关毅 王晓龙 张凯《高技术通讯》1998,8(4):16-20

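By combining statistics-based corpus probability methods with rule-based natural language understanding methods, a new Chinese computational language model is proposed. The model is applied in the post-processing module of a speech recognition system and achieves fairly satisfactory results.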

11.

National Identity in the Art Design of Network Applications

林军《包装工程》2017,38(8):34-37

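Objective: To explore the correlation between the art design of network applications and national identity in the information age, clarify the relationship between national art and world art, and promote the development of national art. Methods: By analyzing the characteristics of network application art design in the new era and the factors of national identity, representative network applications such as web pages, multimedia, and apps are examined to demonstrate the relationship between national art and world art. Conclusion: Network application art design can gain public acceptance in web pages, multimedia, apps, and other fields, and its national identity is inherited, carried forward, and recognized.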

12.

Hybrid In-Vehicle Background Noise Reduction for Robust Speech Recognition: The Possibilities of Next Generation 5G Data Networks

Radek Martinek, Jan Baros, Rene Jaros, Lukas Danys, Jan Nedoma 《Computers, Materials & Continua》2022,71(3):4659-4676

This pilot study focuses on the use of a hybrid LMS-ICA system for in-vehicle background noise reduction. Modern vehicles increasingly support voice commands, which are one of the pillars of autonomous and SMART vehicles. Robust speaker recognition for context-aware in-vehicle applications is limited to a certain extent by in-vehicle background noise. This article presents a new concept of a hybrid system, implemented as a virtual instrument. The highly modular concept of the virtual car, used in combination with real recordings of various driving scenarios, enables effective testing of the investigated noise reduction methods. The study also presents a unique concept of an adaptive system that uses intelligent clusters of distributed next-generation 5G data networks, allowing individual vehicles to exchange interference information and/or optimal hybrid algorithm settings. On average, unfiltered voice commands were successfully recognized in 29.34% of all scenarios, while LMS reached up to 71.81% and the LMS-ICA hybrid improved performance further to 73.03%.
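The LMS stage mentioned in this abstract can be illustrated with a minimal sketch of a textbook least-mean-squares noise canceller. It assumes a separate noise reference channel is available; it does not cover the ICA stage or the authors' virtual-instrument implementation, and the function name `lms_filter`, the tap count, and the step size `mu` are illustrative choices rather than values from the study.

```python
def lms_filter(reference, primary, num_taps=4, mu=0.05):
    """Textbook LMS adaptive noise canceller (illustrative, not the paper's setup).

    reference: samples of the noise reference channel
    primary:   samples of the noisy signal (speech plus filtered noise)
    Returns (errors, weights); the error signal is the cleaned output.
    """
    weights = [0.0] * num_taps
    buffer = [0.0] * num_taps  # most recent reference samples, newest first
    errors = []
    for x, d in zip(reference, primary):
        buffer = [x] + buffer[:-1]
        # Estimate of the noise component present in the primary channel.
        y = sum(w * b for w, b in zip(weights, buffer))
        e = d - y
        # Gradient-descent step on the instantaneous squared error.
        weights = [w + 2.0 * mu * e * b for w, b in zip(weights, buffer)]
        errors.append(e)
    return errors, weights
```

In this arrangement the error signal is the output of interest: once the weights have adapted, it approximates the primary signal with the correlated noise removed. The step size trades convergence speed against stability and would need tuning for real recordings.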

13.

Image Recognition of Wild Animals and Related Legal Research

黄群《影像技术》2010,22(1):50-51,56

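Combating crimes that damage wildlife resources is a sacred duty of the forest police. This paper focuses on how to use image recognition technology to authenticate rhinoceros horn and leopard skin and to identify the species of lions, together with the related laws, in order to protect the nation's rare and endangered wildlife.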

14.

Neural Network Methods and Their Application in Speech Recognition (total citations: 2; self-citations: 0; citations by others: 2)

胡瑞敏 薛东辉《高技术通讯》1995,5(6):11-15

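Neural network methods are discussed for endpoint detection of Chinese speech signals, separation of initials and finals, nonlinear feature extraction, and recognition over a large character set. A block diagram of the system implementation is described and application examples are given.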

15.

Flexible Piezoelectric Acoustic Sensors and Machine Learning for Speech Processing

Young Hoon Jung, Seong Kwang Hong, Hee Seung Wang, Jae Hyun Han, Trung Xuan Pham, Hyunsin Park, Junyeong Kim, Sunghun Kang, Chang D. Yoo, Keon Jae Lee 《Advanced materials (Deerfield Beach, Fla.)》2020,32(35):1904020

Flexible piezoelectric acoustic sensors have been developed to generate multiple sound signals with high sensitivity, shifting the paradigm of future voice technologies. Speech recognition based on advanced acoustic sensors and optimized machine learning software will serve as an innovative interface for artificial intelligence (AI) services. Collaboration and novel approaches that bring together smart sensors and speech algorithms should be pursued to realize a hyperconnected society, which can offer personalized services such as biometric authentication, AI secretaries, and home appliances. Here, representative developments in speech recognition are reviewed in terms of flexible piezoelectric materials, self-powered sensors, machine learning algorithms, and speaker recognition.

16.

A Comparison of the Characteristics of Traditional Women's Dress of the Baima Tibetans and the Jiarong Tibetans

黄小格 韩超《工业工程设计》2022,4(3):33-38, 75

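The Baima Tibetans and the Jiarong Tibetans of northwestern Sichuan are both branches of the Tibetan people. Located at a key node of the "Tibetan-Qiang-Yi Corridor", they have diverse cultural components and present a mixed, plural, and blended cultural profile. Although the two groups are some distance apart geographically, their dress cultures share certain related features, which are especially evident in the design of traditional women's dress. This paper compares and analyzes the traditional women's dress of the two groups. Because they differ in the cultural traits formed through their geographic relations and in the sources and degree of influence received from "other cultures", each presents forms of dress and cultural connotations characteristic of its own group.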

17.

Jointly Part-of-Speech Tagging and Semantic Role Labeling Using Auxiliary Deep Neural Network Model

Yatian Shen, Yubo Mai, Xiajiong Shen, Wenke Ding, Mengjiao Guo 《Computers, Materials & Continua》2020,65(1):529-541

Previous studies have shown that there is a potential semantic dependency between part-of-speech and semantic roles. At the same time, the predicate-argument structure of a sentence is important information for the semantic role labeling task. In this work, we introduce an auxiliary deep neural network model that models the semantic dependency between part-of-speech and semantic roles and incorporates predicate-argument information into semantic role labeling. Within a joint learning framework, part-of-speech tagging is used as an auxiliary task to improve the result of semantic role labeling. In addition, we introduce an argument recognition layer into the training process of the main task, semantic role labeling, so that the argument-related structural information selected by the predicate through the attention mechanism assists the main task. Because the model makes full use of the semantic dependency between part-of-speech and semantic roles and of the structural information of predicate-argument, it achieves an F1 value of 89.0% on the WSJ test set of CoNLL2005, about 0.8% higher than the existing state-of-the-art model.

18.

A Speech Enhancement Model Based on Deep Compressed Sensing

康峥 黄志华 赖惠成《声学技术》2022,41(6):862-870

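As research on compressed sensing has deepened, its application to speech enhancement has attracted considerable attention. To address the shortcomings of traditional compressed-sensing speech enhancement algorithms, compressed sensing is combined with deep learning to build a model named Speech Enhancement based on Deep Compressed Sensing (SEDCS). Following the principle of compressed sensing, an encoder-decoder model replaces the sparsification of the speech signal, a convolutional neural network replaces the measurement matrix to perform observation and dimensionality reduction of the speech signal, and speech enhancement is achieved through joint training. Experimental results show that the model can accomplish the speech enhancement task and achieves better enhancement than existing compressed-sensing speech enhancement algorithms. Compared with speech enhancement algorithms based on deep learning, its performance is only average, but it offers some improvement in generalization and in enhancement time efficiency at test time.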

19.

Speech Recognition-Based Automated Visual Acuity Testing with Adaptive Mel Filter Bank

Shibli Nisar, Muhammad Asghar Khan, Fahad Algarni, Abdul Wakeel, M. Irfan Uddin, Insaf Ullah 《Computers, Materials & Continua》2022,70(2):2991-3004

One of the most commonly reported disabilities is vision loss, which is diagnosed by an ophthalmologist who assesses the patient's visual system. This procedure, however, usually requires an appointment with an ophthalmologist, which is both time-consuming and expensive. Other issues that can arise include a lack of appropriate equipment and trained practitioners, especially in rural areas. Centered on a cognitively motivated attribute extraction and speech recognition approach, this paper proposes a novel idea that immediately determines the eyesight deficiency. The proposed system uses an adaptive filter bank with weighted mel frequency cepstral coefficients for feature extraction. The adaptive filter bank implementation is inspired by the principle of spectrum sensing in cognitive radio, which is aware of its environment and adapts to statistical variations in the input stimuli by learning from the environment. Comparative performance evaluation demonstrates the potential of our automated visual acuity test method to achieve results comparable to the clinical ground truth established by the expert ophthalmologist’s tests. The overall accuracy achieved by the proposed model when compared with the expert ophthalmologist test is 91.875%. The proposed method potentially offers a second opinion to ophthalmologists and serves as a cost-effective pre-screening test to predict eyesight loss at an early stage.

20.

Adaptive Median Filtering Algorithm Based on Divide and Conquer and Its Application in CAPTCHA Recognition

Wentao Ma, Jiaohua Qin, Xuyu Xiang, Yun Tan, Yuanjing Luo, Neal N. Xiong 《Computers, Materials & Continua》2019,58(3):665-677

As the first barrier protecting cyberspace, the CAPTCHA has made significant contributions to maintaining Internet security and preventing malicious attacks. By researching the CAPTCHA, we can find its vulnerabilities and improve its security. Recently, many studies have shown that improving the image preprocessing of CAPTCHAs allows state-of-the-art machine learning algorithms to achieve a better recognition rate. The CAPTCHA images in this experiment contain many kinds of noise and distortion. In this paper, we propose an adaptive median filtering algorithm based on divide and conquer. First, the filtering window data are quickly sorted by exploiting data correlation, which can greatly improve filtering efficiency. Second, the size of the filtering window is adaptively adjusted according to the noise density. As demonstrated in the experimental results, the proposed scheme achieves superior performance compared with the conventional median filter. The algorithm not only detects and removes noise effectively but also preserves details well. Therefore, this algorithm can be one of the most powerful tools for CAPTCHA image recognition and related applications.
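The idea of an adaptively sized filtering window can be illustrated with a minimal sketch of the classic adaptive median filter, which grows the window until its median is no longer an extreme value. This is an assumption-laden stand-in: the abstract adjusts the window by noise density and speeds up sorting with divide and conquer, and neither of those details is reproduced here. The name `adaptive_median_filter` and the `max_window` default are illustrative.

```python
def adaptive_median_filter(img, max_window=7):
    """Classic adaptive median filter on a grayscale image (list of rows).

    Not the paper's algorithm: the window grows until the median is not
    an extreme value, and each window is sorted with the built-in sort.
    """
    height, width = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(height):
        for x in range(width):
            size = 3
            while True:
                r = size // 2
                # Collect the window, clipped at the image borders.
                window = sorted(
                    img[j][i]
                    for j in range(max(0, y - r), min(height, y + r + 1))
                    for i in range(max(0, x - r), min(width, x + r + 1))
                )
                low, med, high = window[0], window[len(window) // 2], window[-1]
                if low < med < high:
                    # Median looks reliable: replace only suspected impulses.
                    if not (low < img[y][x] < high):
                        out[y][x] = med
                    break
                size += 2
                if size > max_window:
                    # Window limit reached: fall back to the last median.
                    out[y][x] = med
                    break
    return out
```

Pixels that lie strictly between the local minimum and maximum are left untouched, which is how this family of filters tends to preserve detail better than a fixed-window median filter.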