In this paper we investigate Artificial Neural Network (ANN) based Automatic Speech Recognition (ASR) using limited Arabic vocabulary corpora. These limited Arabic vocabulary subsets are digits and vowels carried by specific carrier words. In addition, Hidden Markov Model (HMM) based ASR systems are designed and compared with two ANN based systems, namely Multilayer Perceptron (MLP) and recurrent architectures, using the same corpora. All systems are isolated word speech recognizers. The ANN based recognition system achieved 99.5% correct digit recognition, whereas the HMM based recognition system achieved 98.1%. With vowel carrier words, the MLP and recurrent ANN based recognition systems achieved 92.13% and 98.06% correct vowel recognition, respectively, while the HMM based recognition system achieved 91.6%.

In this paper we propose a novel character recognition method for Bangla compound characters. Accurate recognition of compound characters is a difficult problem due to their complex shapes. Our strategy is to decompose a compound character into skeletal segments. The compound character is then recognized by extracting the convex shape primitives and using a template matching scheme. The novelty of our approach lies in the formulation of appropriate rules of character decomposition for segmenting the character skeleton into stroke segments and then grouping them for extraction of meaningful shape components. Our technique is applicable to both printed and handwritten characters. The proposed method performs well for complex-shaped compound characters, which were confusing to the existing methods.

In this article we review several successful extensions to the standard hidden-Markov-model/artificial neural network (HMM/ANN) hybrid, which have recently made important contributions to the field of noise robust automatic speech recognition. The first extension to the standard hybrid was the “multi-band hybrid”, in which a separate ANN is trained on each frequency sub-band, followed by some form of weighted combination of ANN state posterior probability outputs prior to decoding. However, due to the inaccurate assumption of sub-band independence, this system usually gives degraded performance, except in the case of narrow-band noise. All of the systems which we review overcome this independence assumption and give improved performance in noise, while also improving or not significantly degrading performance with clean speech. The “all-combinations multi-band” hybrid trains a separate ANN for each sub-band combination. This, however, typically requires a large number of ANNs. The “all-combinations multi-stream” hybrid trains an ANN expert for every combination of just a small number of complementary data streams. Multiple ANN posteriors combination using maximum a-posteriori (MAP) weighting gives rise to the further successful strategy of hypothesis level combination by MAP selection. An alternative strategy for exploiting the classification capacity of ANNs is the “tandem hybrid” approach in which one or more ANN classifiers are trained with multi-condition data to generate discriminative and noise robust features for input to a standard ASR system. The “multi-stream tandem hybrid” trains an ANN for a number of complementary feature streams, permitting multi-stream data fusion. The “narrow-band tandem hybrid” trains an ANN for a number of particularly narrow frequency sub-bands. This gives improved robustness to noises not seen during training. Of the systems presented, all of the multi-stream systems provide generic models for multi-modal data fusion. 
Test results for each system are presented and discussed.
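The weighted combination of per-band ANN posteriors mentioned in this abstract can be illustrated with a small sketch. This is only an assumption-laden illustration: the abstract says "some form of weighted combination" without fixing the rule, so a plain weighted sum followed by renormalization is used here, and the function name `combine_posteriors`, the posterior values, and the weights are all hypothetical.

```python
def combine_posteriors(band_posteriors, weights):
    """Weighted linear combination of per-band state posteriors for one frame.

    band_posteriors: one list of state posteriors per sub-band ANN.
    weights: one non-negative weight per sub-band (hypothetical values).
    """
    if len(band_posteriors) != len(weights):
        raise ValueError("one weight per sub-band is required")
    n_states = len(band_posteriors[0])
    combined = [0.0] * n_states
    for posterior, weight in zip(band_posteriors, weights):
        for state, p in enumerate(posterior):
            combined[state] += weight * p
    # Renormalize so the result is again a probability distribution.
    total = sum(combined)
    return [p / total for p in combined]


# Hypothetical posteriors over 3 states from 2 sub-band ANNs.
bands = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]]
combined = combine_posteriors(bands, weights=[0.75, 0.25])
```

The combined vector would then be passed to the decoder in place of a single ANN's output. Other combination rules (for example a weighted product in the log domain) are equally plausible readings of the abstract.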

Printed Arabic character recognition using HMM
The Arabic language has a very rich vocabulary. More than 200 million people speak it as their native language, and over 1 billion people use it in religion-related activities. In this paper a new technique is presented for recognizing printed Arabic characters. After a word is segmented, each character/word is transformed in its entirety into a feature vector. The features of printed Arabic characters include strokes and bays in various directions, endpoints, intersection points, loops, dots and zigzags. The word skeleton is decomposed into a number of links in orthographic order and then converted into a sequence of symbols using vector quantization. A single hidden Markov model is used for recognizing the printed Arabic characters. Experimental results show that the high recognition rate depends on the number of states in each sample.
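The vector quantization step, which turns feature vectors into a symbol sequence for a discrete HMM, can be sketched as a nearest-codeword lookup. This is a minimal illustration, not the paper's method: the codebook, the two-dimensional features, and the squared Euclidean distance are assumptions, and the abstract does not say how the codebook is trained.

```python
def quantize(vectors, codebook):
    """Map each feature vector to the index of its nearest codeword."""
    symbols = []
    for v in vectors:
        # Squared Euclidean distance to every codeword (assumed metric).
        distances = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        symbols.append(distances.index(min(distances)))
    return symbols


# Hypothetical 3-entry codebook and 2-D feature vectors.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
features = [(0.1, 0.2), (0.9, 0.1), (0.2, 0.8), (0.8, -0.1)]
symbols = quantize(features, codebook)  # symbol sequence fed to the HMM
```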

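An improved HMM exploiting spatial correlation
The classical HMM used in speech recognition ignores the correlation information between speech signals. To address this problem, the spatial correlation of the speech signal is used to compensate the classical HMM, yielding an improved model. Through a spatial correlation transformation, the method describes the spatial correlation between the current speech feature and historical data, and thereby models the joint state output distribution. The decoding algorithm of the improved model is derived from the decoding algorithm of the classical HMM, using the parameter update algorithm of the spatial correlation transformation. Experimental results show that the method achieves a clear performance improvement on a speaker-independent continuous speech recognition system.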

Pattern Analysis and Applications - In this paper, we present a segmentation-free word spotting method based on Wave Kernel Signature (WKS) under the foundation of quantum mechanics. The query word...

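To improve the precision and traceability of software architecture refinement, and to establish a well-defined mapping between architectures at different levels of abstraction, formal methods are introduced. A formal refinement grammar based on context-sensitive grammar is defined and applied to architecture refinement, and a component-based formal refinement process for architectures is given. Finally, a corresponding model for guiding software development is built on the architecture refinement method.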

An interoperable context sensitive model of trust
Although the notion of trust is widely used in secure information systems, very few works attempt to formally define it or reason about it. Moreover, in most works, trust is defined as a binary concept—either an entity is completely trusted or not at all. Absolute trust on an entity requires one to have complete knowledge about the entity. This is rarely the case in real-world applications. Not trusting an entity, on the other hand, prohibits all communications with the entity rendering it useless. In short, treating trust as a binary concept is not acceptable in practice. Consequently, a model is needed that incorporates the notion of different degrees of trust. We propose a model that allows us to formalize trust relationships. The trust relationship between a truster and a trustee is associated with a context and depends on the experience, knowledge, and recommendation that the truster has with respect to the trustee in the given context. We show how our model can measure trust and compare two trust relationships in a given context. Sometimes enough information is not available about a given context to evaluate trust. Towards this end we show how the relationships between different contexts can be captured using a context graph. Formalizing the relationships between contexts allows us to extrapolate values from related contexts to approximate the trust of an entity even when all the information needed to calculate the trust is not available. Finally, we show how the semantic mismatch that arises because of different sources using different context graphs can be resolved and the trust of information obtained from these different sources compared.
Sudip Chakraborty

Visual voice activity detection (V-VAD) plays an important role in both HCI and HRI, affecting both the conversation strategy and synchronization between humans and robots/computers. The typical speakingness decision of V-VAD consists of post-processing for signal smoothing and classification using thresholding. Several parameters, which determine the trade-off between hit rate and false alarm rate, are usually defined heuristically. This makes V-VAD approaches vulnerable to noisy observations and changes in environmental conditions, resulting in poor performance and poor robustness against undesired frequent speaking-state changes. To overcome these difficulties, this paper proposes a new probabilistic approach, named bi-level HMM, which analyzes lip activity energy for V-VAD in HRI. The design is based on lip movement and speaking assumptions, and combines the two essential procedures in a single model. A bi-level HMM is an HMM with two state variables at different levels, where the state occurrence at the lower level depends conditionally on the state at the upper level. The approach works online with low-resolution images and in various lighting conditions, and has been successfully tested on 21 image sequences (22,927 frames). It achieved a probability of detection of over 90%, an improvement of almost 20% compared to four other V-VAD approaches.

Automatic speech recognition (ASR) systems follow a well established pattern recognition approach: signal processing based feature extraction at the front-end and likelihood evaluation of feature vectors at the back-end. Mel-frequency cepstral coefficients (MFCCs) are the features most widely used in state-of-the-art ASR systems; they are derived from the logarithmic spectral energies of the speech signal using a Mel-scale filterbank. In the filterbank analysis of MFCC there is no consensus on the spacing and number of filters to use in various noise conditions and applications. In this paper, we propose a novel approach that uses particle swarm optimization (PSO) and a genetic algorithm (GA) to optimize the parameters of the MFCC filterbank, such as the central and side frequencies. The experimental results show that the new front-end outperforms the conventional MFCC technique. All investigations are conducted using two separate classifiers, HMM and MLP, for Hindi vowel recognition in typical field conditions as well as in noisy environments.
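For orientation, the conventional filterbank that such an optimization would start from places the filter edges at equal spacing on the mel scale. The sketch below uses the common conversion mel = 2595 log10(1 + f/700); the function names, the choice of 20 filters, and the 0 to 4000 Hz range are assumptions for illustration and do not come from the abstract.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mel (common 2595 * log10 formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_edges(n_filters, f_low, f_high):
    """Return n_filters + 2 edge frequencies (Hz), equally spaced in mel.

    Triangular filter i uses edges[i], edges[i + 1], edges[i + 2] as its
    left side, centre, and right side frequencies.
    """
    mel_low, mel_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (mel_high - mel_low) / (n_filters + 1)
    return [mel_to_hz(mel_low + i * step) for i in range(n_filters + 2)]


# Hypothetical configuration: 20 filters between 0 and 4000 Hz.
edges = mel_filter_edges(n_filters=20, f_low=0.0, f_high=4000.0)
```

An optimizer such as PSO or GA would treat these centre and side frequencies as free parameters instead of fixing them by the equal-mel-spacing rule.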

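Current object tracking algorithms drift or even lose the target when the target region undergoes drastic illumination change, long-term occlusion, or in-plane rotation. To address this, a spatio-temporal context tracking algorithm based on locality sensitive histograms is proposed. Built on a Bayesian framework, the algorithm exploits characteristics of biological vision, combines low-level grayscale features, extracts illumination-invariant features based on locality sensitive histograms, and builds a statistical correlation model between the target and the background to perform tracking, so that drift stays small and the target is not lost. Experiments on different video sequences show that, compared with the multiple instance learning algorithm, the proposed algorithm tracks better under illumination change, in-plane rotation, and occlusion, with a smaller center error and strong robustness.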

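In the training stage of online Uyghur handwriting recognition, words are segmented into letters, and feature extraction and clustering produce feature vectors that serve as the model input. Character-level hidden Markov models (HMMs) are constructed and embedded into the recognition dictionary network. An HMM-based classifier then produces the final recognition result. For the first time, the method of removing delayed strokes and building dictionaries with and without delayed strokes is applied to Uyghur handwriting recognition, achieving a high recognition rate.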

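Segmentation and tagging disambiguation for Chinese using hidden Markov models
Segmenting and tagging Chinese text inevitably produces ambiguity. This paper uses the same model, the hidden Markov model (HMM), to resolve ambiguity in both the segmentation and tagging stages. In the segmentation stage an HMM-based segmentation score is used, and in the tagging stage an HMM-based lexical score is used. Lexical scoring experiments are carried out under the most-probable-result principle and the multiple-result output principle. The results show that using HMMs for tagging disambiguation of Chinese achieves high accuracy.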
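HMM-based tagging is commonly decoded with the Viterbi algorithm, which finds the most probable state (tag) sequence for an observation (word) sequence. The abstract does not say which decoding procedure the paper uses, so the following is only a generic sketch; the two tags, the two-word vocabulary, and all probabilities are hypothetical toy values, and every probability is assumed to be non-zero so that logarithms are defined.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for a discrete HMM."""
    # best[t][s] = (log-probability of the best path ending in s at t,
    #               predecessor state on that path)
    best = [{
        s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), None)
        for s in states
    }]
    for obs in observations[1:]:
        column = {}
        for s in states:
            score, prev = max(
                (best[-1][p][0] + math.log(trans_p[p][s]), p) for p in states
            )
            column[s] = (score + math.log(emit_p[s][obs]), prev)
        best.append(column)
    # Backtrack from the best final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


# Hypothetical toy model: tags N (noun) and V (verb).
states = ["N", "V"]
start_p = {"N": 0.7, "V": 0.3}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
emit_p = {"N": {"fish": 0.6, "swim": 0.4}, "V": {"fish": 0.3, "swim": 0.7}}
tags = viterbi(["fish", "swim"], states, start_p, trans_p, emit_p)
```

A multiple-result variant would keep the k best paths per state instead of only the single best one.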

Most traditional histogram-based thresholding techniques are effective for bi-level thresholding but cannot take the spatial contextual information of the image into account when selecting the optimal threshold. In this article a novel thresholding technique is presented that proposes an energy function to generate the energy curve of an image by taking into account the spatial contextual information of the image. The behavior of this energy curve is very similar to that of the image histogram. To incorporate spatial contextual information into the threshold selection process, this energy curve is used as the input to our technique instead of the histogram. Moreover, to handle the multilevel thresholding problem, the properties of the genetic algorithm are exploited. The proposed algorithm is evaluated on a number of different types of images using a validity measure. The results of the proposed technique are compared with those obtained using the image histogram and also with an existing genetic algorithm based context sensitive technique. The comparisons confirm the effectiveness of the proposed technique.

A program to recognize polyhedra by a context sensitive line finder is presented. The program is based on the strategy of recognizing objects step by step, at each time making use of the previous results. At each stage, the most obvious and simple assumption is made and the assumption is tested. To find a line segment, a range of search is proposed. Once a line segment is found, more of the line is determined by tracking along it. Whenever a new fact is found, the program tries to reinterpret the scene taking the obtained information into consideration. Results of the experiment using an image dissector are satisfactory for scenes containing a few blocks and wedges. Some limitations of the present program and proposals for future developments are described.

A novel neural network architecture suitable for image processing applications and comprising three interconnected fuzzy layers of neurons and devoid of any back-propagation algorithm for weight adjustment is proposed in this article. The fuzzy layers of neurons represent the fuzzy membership information of the image scene to be processed. One of the fuzzy layers of neurons acts as an input layer of the network. The two remaining layers viz. the intermediate layer and the output layer are counter-propagating fuzzy layers of neurons. These layers are meant for processing the input image information available from the input layer. The constituent neurons within each layer of the network architecture are fully connected to each other. The intermediate layer neurons are also connected to the corresponding neurons and to a set of neighbors in the input layer. The neurons at the intermediate layer and the output layer are also connected to each other and to the respective neighbors of the corresponding other layer following a neighborhood based connectivity. The proposed architecture uses fuzzy membership based weight assignment and subsequent updating procedure. Some fuzzy cardinality based image context sensitive information are used for deciding the thresholding capabilities of the network. The network self organizes the input image information by counter-propagation of the fuzzy network states between the intermediate and the output layers of the network. The attainment of stability of the fuzzy neighborhood hostility measures at the output layer of the network or the corresponding fuzzy entropy measures determine the convergence of the network operation. An application of the proposed architecture for the extraction of binary objects from various degrees of noisy backgrounds is demonstrated using a synthetic and a real life image.
Ujjwal Maulik

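The two speech enhancement algorithms based on the HMM (Hidden Markov Model), MAP and MMSE, are computationally expensive, and the former cannot handle non-stationary noise. Drawing on speech separation methods, a speech enhancement algorithm that combines speech separation with HMMs is proposed. The algorithm uses multi-state, multi-mixture HMMs, which are suited to non-stationary noise, decodes the mixture states of the noisy speech under the speech model and the noise model, and estimates the speech using the max-model theory from speech separation. This avoids iterative procedures and particularly expensive formula evaluations, reducing computational complexity. Experiments show that the algorithm effectively removes both stationary and non-stationary noise, clearly improves the PESQ perceptual evaluation score, and keeps the running time under control.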
