期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Arabic speech recognition using SPHINX engine

Hussein Hyassat Raed Abu Zitar 《International Journal of Speech Technology》2006,9(3-4):133-150

相似文献

2.

Arabic speech synthesis and diacritic recognition

Ilyes Rebai Yassine BenAyed 《International Journal of Speech Technology》2016,19(3):485-494

Text-to-speech system (TTS), known also as speech synthesizer, is one of the important technology in the last years due to the expanding field of applications. Several works on speech synthesizer have been made on English and French, whereas many other languages, including Arabic, have been recently taken into consideration. The area of Arabic speech synthesis has not sufficient progress and it is still in its first stage with a low speech quality. In fact, speech synthesis systems face several problems (e.g. speech quality, articulatory effect, etc.). Different methods were proposed to solve these issues, such as the use of large and different unit sizes. This method is mainly implemented with the concatenative approach to improve the speech quality and several works have proved its effectiveness. This paper presents an efficient Arabic TTS system based on statistical parametric approach and non-uniform units speech synthesis. Our system includes a diacritization engine. Modern Arabic text is written without mention the vowels, called also diacritic marks. Unfortunately, these marks are very important to define the right pronunciation of the text which explains the incorporation of the diacritization engine to our system. In this work, we propose a simple approach based on deep neural networks. Deep neural networks are trained to directly predict the diacritic marks and to predict the spectral and prosodic parameters. Furthermore, we propose a new simple stacked neural network approach to improve the accuracy of the acoustic models. Experimental results show that our diacritization system allows the generation of full diacritized text with high precision and our synthesis system produces high-quality speech. 相似文献

3.

Arabic phonemes recognition using hybrid LVQ/HMM model for continuous speech recognition

Khalid M. O. Nahar Mohammed Abu Shquier Wasfi G. Al-Khatib Husni Al-Muhtaseb Moustafa Elshafei 《International Journal of Speech Technology》2016,19(3):495-508

相似文献

4.

Robust Arabic speech recognition in noisy environments using prosodic features and formant

Anissa Imen Amrous Mohamed Debyeche Abderrahman Amrouche 《International Journal of Speech Technology》2011,14(4):351-359

This paper investigates the contribution of formants and prosodic features such as pitch and energy in Arabic speech recognition under real-life conditions. Our speech recognition system based on Hidden Markov Models (HMMs) is implemented using the HTK Toolkit. The front-end of the system combines features based on conventional Mel-Frequency Cepstral Coefficient (MFFC), prosodic information and formants. The experiments are performed on the ARADIGIT corpus which is a database of Arabic spoken words. The obtained results show that the resulting multivariate feature vectors, in noisy environment, lead to a significant improvement, up to 27%, in word accuracy relative the word accuracy obtained from the state-of-the-art MFCC-based system. 相似文献

5.

Cross-word Arabic pronunciation variation modeling for speech recognition

Dia AbuZeina Wasfi Al-Khatib Moustafa Elshafei Husni Al-Muhtaseb 《International Journal of Speech Technology》2011,14(3):227-236

相似文献

6.

基于N元模型的维吾尔语词性标注实验研究

尼加提·纳吉米买合木提·买买提吐尔根·依布拉音《计算机工程与应用》2012,48(25):137-140,173

词性标注有很多不同的研究方法,目前的维吾尔语词性标注方法都以基于规则的方法为主,其准确程度尚不能完全令人满意。在大规模人工标注的语料库的基础之上,研究了基于N元语言模型的维吾尔语词性自动标注的方法,分析了N元语言模型参数的选取以及数据平滑,比较了二元、三元文法模型对维吾尔语词性标注的效率;研究了标注集和训练语料规模对词性标注正确率的影响。实验结果表明,用该方法对维吾尔语进行词性标注有良好的效果。相似文献

7.

HMM/SVM segmentation and labelling of Arabic speech for speech recognition applications

Hamza Frihia Halima Bahi 《International Journal of Speech Technology》2017,20(3):563-573

Building a large vocabulary continuous speech recognition (LVCSR) system requires a lot of hours of segmented and labelled speech data. Arabic language, as many other low-resourced languages, lacks such data, but the use of automatic segmentation proved to be a good alternative to make these resources available. In this paper, we suggest the combination of hidden Markov models (HMMs) and support vector machines (SVMs) to segment and to label the speech waveform into phoneme units. HMMs generate the sequence of phonemes and their frontiers; the SVM refines the frontiers and corrects the labels. The obtained segmented and labelled units may serve as a training set for speech recognition applications. The HMM/SVM segmentation algorithm is assessed using both the hit rate and the word error rate (WER); the resulting scores were compared to those provided by the manual segmentation and to those provided by the well-known embedded learning algorithm. The results show that the speech recognizer built upon the HMM/SVM segmentation outperforms in terms of WER the one built upon the embedded learning segmentation of about 0.05%, even in noisy background. 相似文献

8.

Bidirectional HMM-based Arabic POS tagging

Ayoub Kadim Azzeddine Lazrek 《International Journal of Speech Technology》2016,19(2):303-312

In this work, we will present a new concept of POS tagging that will be implemented for the Arabic language. Indeed, we will see that in Arabic there are a numerous cases where the determination of the morpho-syntactic state of a word depends on the states of the subsequent words, which represents the theoretical foundation of the approach: how to consider, in addition of the past elements, the future ones. We will then demonstrate how the POS tagging in its statistical application: the HMM, is based mainly on the past elements, and how to combine both direct and reverse taggers to tag the same sequence of words in both senses. Thus, we will propose a hypothesis for the result selecting. In the practical part, we will present, in general, the used resource and the changes made on it. Then we will explain the experiment steps and the parameters collected and presented on graphics, that we will discuss later to lead to the final conclusion. 相似文献

9.

Self-learning speaker identification for enhanced speech recognition

Tobias Herbig Franz Gerl Wolfgang Minker 《Computer Speech and Language》2012,26(3):210-227

A novel approach for joint speaker identification and speech recognition is presented in this article. Unsupervised speaker tracking and automatic adaptation of the human-computer interface is achieved by the interaction of speaker identification, speech recognition and speaker adaptation for a limited number of recurring users. Together with a technique for efficient information retrieval a compact modeling of speech and speaker characteristics is presented. Applying speaker specific profiles allows speech recognition to take individual speech characteristics into consideration to achieve higher recognition rates. Speaker profiles are initialized and continuously adapted by a balanced strategy of short-term and long-term speaker adaptation combined with robust speaker identification. Different users can be tracked by the resulting self-learning speech controlled system. Only a very short enrollment of each speaker is required. Subsequent utterances are used for unsupervised adaptation resulting in continuously improved speech recognition rates. Additionally, the detection of unknown speakers is examined under the objective to avoid the requirement to train new speaker profiles explicitly. The speech controlled system presented here is suitable for in-car applications, e.g. speech controlled navigation, hands-free telephony or infotainment systems, on embedded devices. Results are presented for a subset of the SPEECON database. The results validate the benefit of the speaker adaptation scheme and the unified modeling in terms of speaker identification and speech recognition rates. 相似文献

10.

藏文词性自动标注中歧义问题处理方法研究

羊毛卓玛《计算机工程与应用》2013,(24):135-137,148

藏文词性自动标注是藏文信息处理后续句法分析、语义分析及篇章分析必不可少的基础工作。词性歧义问题的处理是藏文词性自动标注的关键所在,也是藏文信息处理的难点问题。对藏文词性标注中词性歧义问题进行了分析研究,提出了符合藏丈语法规则实用于藏文词性标注的解决词性排岐方法。实验证明：该处理方法在藏文词性自动标注中对词性排岐方面有较好的效果,使藏文词性标注正确率有了一定的提高。相似文献

11.

An experimental framework for Arabic digits speech recognition in noisy environments

Azzedine Touazi Mohamed Debyeche 《International Journal of Speech Technology》2017,20(2):205-224

相似文献

12.

数据库语义和词性双标注模型研究

李京辉邵温许向众《微计算机信息》2009,25(21)

本文主要分为三个方面:语义词典的构建,词语标注的数据结构和数据库语义的标注与排歧算法.其中词典用来存储数据库的语义信息,通过程序调用以标注分词后的词语;词语标注的数据结构采用了动态的方式存储数据库语义,可节省内存空间并增强程序的可读性;对于数据库的歧义问题提出了一种利用相关词的语义确定歧义词的语义的方法,充分利用了词语之问的相互关系. 相似文献

13.

Simulated annealing based classifier ensemble techniques: Application to part of speech tagging

Asif Ekbal Sriparna Saha 《Information Fusion》2013,14(3):288-300

Part-of-Speech (PoS) tagging is an important pipelined module for almost all Natural Language Processing (NLP) application areas. In this paper we formulate PoS tagging within the frameworks of single and multi-objective optimization techniques. At the very first step we propose a classifier ensemble technique for PoS tagging using the concept of single objective optimization (SOO) that exploits the search capability of simulated annealing (SA). Thereafter we devise a method based on multiobjective optimization (MOO) to solve the same problem, and for this a recently developed multiobjective simulated annealing based technique, AMOSA, is used. The characteristic features of AMOSA are its concepts of the amount of domination and archive in simulated annealing, and situation specific acceptance probabilities. We use Conditional Random Field (CRF) and Support Vector Machine (SVM) as the underlying classification methods that make use of a diverse set of features, mostly based on local contexts and orthographic constructs. We evaluate our proposed approaches for two Indian languages, namely Bengali and Hindi. Evaluation results of the single objective version shows the overall accuracy of 88.92% for Bengali and 87.67% for Hindi. The MOO based ensemble yields the overall accuracies of 90.45% and 89.88% for Bengali and Hindi, respectively. 相似文献

14.

Paralinguistic profiling using speech recognition

Swati Johar 《International Journal of Speech Technology》2014,17(3):205-209

This research explores the various indicators for non-verbal cues of speech and provides a method of building a paralinguistic profile of these speech characteristics which determines the emotional state of the speaker. Since a major part of human communication consists of vocalization, a robust approach that is capable of classifying and segmenting an audio stream into silent and voiced regions and developing a paralinguistic profile for the same is presented. The data consisting of disruptions is first segmented into frames and this data is analyzed by exploiting short term acoustic features, temporal characteristics of speech and measures of verbal productivity. A matrix is finally developed relating the paralinguistic properties of average pitch, energy, rate of speech, silence duration and loudness to their respective context. Happy and confident states possessed high values of energy and rate of speech and less silence duration whereas tense and sad states showed low values of energy and speech rate and high periods of silence. Paralanguage was found to be an important cue to decipher the implicit meaning in a speech sample. 相似文献

15.

Computer-assisted translation using speech recognition

Vidal E. Casacuberta F. Rodriguez L. Civera J. Hinarejos C.D.M. 《IEEE transactions on audio, speech, and language processing》2006,14(3):941-951

Current machine translation systems are far from being perfect. However, such systems can be used in computer-assisted translation to increase the productivity of the (human) translation process. The idea is to use a text-to-text translation system to produce portions of target language text that can be accepted or amended by a human translator using text or speech. These user-validated portions are then used by the text-to-text translation system to produce further, hopefully improved suggestions. There are different alternatives of using speech in a computer-assisted translation system: From pure dictated translation to simple determination of acceptable partial translations by reading parts of the suggestions made by the system. In all the cases, information from the text to be translated can be used to constrain the speech decoding search space. While pure dictation seems to be among the most attractive settings, unfortunately perfect speech decoding does not seem possible with the current speech processing technology and human error-correcting would still be required. Therefore, approaches that allow for higher speech recognition accuracy by using increasingly constrained models in the speech recognition process are explored here. All these approaches are presented under the statistical framework. Empirical results support the potential usefulness of using speech within the computer-assisted translation paradigm. 相似文献

16.

Audio-visual speech recognition using deep learning

Kuniaki Noda Yuki Yamaguchi Kazuhiro Nakadai Hiroshi G. Okuno Tetsuya Ogata 《Applied Intelligence》2015,42(4):722-737

相似文献

17.

Constructing accurate and robust HMM/GMM models for an Arabic speech recognition system

Mohamed O. M. Khelifa Yahya Mohamed Elhadj Yousfi Abdellah Mostafa Belkasmi 《International Journal of Speech Technology》2017,20(4):937-949

Conventional Hidden Markov Model (HMM) based Automatic Speech Recognition (ASR) systems generally utilize cepstral features as acoustic observation and phonemes as basic linguistic units. Some of the most powerful features currently used in ASR systems are Mel-Frequency Cepstral Coefficients (MFCCs). Speech recognition is inherently complicated due to the variability in the speech signal which includes within- and across-speaker variability. This leads to several kinds of mismatch between acoustic features and acoustic models and hence degrades the system performance. The sensitivity of MFCCs to speech signal variability motivates many researchers to investigate the use of a new set of speech feature parameters in order to make the acoustic models more robust to this variability and thus improve the system performance. The combination of diverse acoustic feature sets has great potential to enhance the performance of ASR systems. This paper is a part of ongoing research efforts aspiring to build an accurate Arabic ASR system for teaching and learning purposes. It addresses the integration of complementary features into standard HMMs for the purpose to make them more robust and thus improve their recognition accuracies. The complementary features which have been investigated in this work are voiced formants and Pitch in combination with conventional MFCC features. A series of experimentations under various combination strategies were performed to determine which of these integrated features can significantly improve systems performance. The Cambridge HTK tools were used as a development environment of the system and experimental results showed that the error rate was successfully decreased, the achieved results seem very promising, even without using language models. 相似文献

18.

Arabic handwritten digit recognition

Sherif Abdleazeem Ezzat El-Sherif 《International Journal on Document Analysis and Recognition》2008,11(3):127-141

In this paper, we fill a gap in the literature by studying the problem of Arabic handwritten digit recognition. The performances of different classification and feature extraction techniques on recognizing Arabic digits are going to be reported to serve as a benchmark for future work on the problem. The performance of well known classifiers and feature extraction techniques will be reported in addition to a novel feature extraction technique we present in this paper that gives a high accuracy and competes with the state-of-the-art techniques. A total of 54 different classifier/features combinations will be evaluated on Arabic digits in terms of accuracy and classification time. The results are analyzed and the problem of the digit ‘0’ is identified with a proposed method to solve it. Moreover, we propose a strategy to select and design an optimal two-stage system out of our study and, hence, we suggest a fast two-stage classification system for Arabic digits which achieves as high accuracy as the highest classifier/features combination but with much less recognition time. 相似文献

19.

Off-line Arabic character recognition

Goraine H. Usher M. Al-Emami S. 《Computer》1992,25(7):71-74

A personal computer-based Arabic character recognition system that performs three preprocessing stages sequentially, thinning, stroke segmentation, and sampling, is described. The eight-direction code used for stroke representation and classification, the character classification done at primary and secondary levels, and the contextual postprocessor used for error detection and correction are described. Experimental results obtained using samples of handwritten and typewritten Arabic words are presented 相似文献

20.

Investigation Amazigh speech recognition using CMU tools

Hassan Satori Fatima ElHaoussi 《International Journal of Speech Technology》2014,17(3):235-243

相似文献