Similar literature
 Found 20 similar articles; search took 31 ms
1.
Handwritten text recognition is one of the most difficult problems in the field of pattern recognition. Recently, a number of classifier creation and combination methods, known as ensemble methods, have been proposed in the field of machine learning. They have shown improved recognition performance over single classifiers. In this paper the application of some of those ensemble methods in the domain of offline cursive handwritten word recognition is described. The basic word recognizers are given by hidden Markov models (HMMs). It is demonstrated through experiments that ensemble methods have the potential of improving recognition accuracy also in the domain of handwriting recognition. Received: 23 November 2001, Accepted: 19 September 2002, Published online: 6 June 2003

2.
Boosting-based ensemble learning can improve classification accuracy by using multiple classification models, each constructed to cope with the errors made in the preceding steps. This paper proposes a method to improve boosting-based ensemble learning with penalty profiles, applied to automatic unknown word recognition in the Thai language. Because a sequential problem is treated as a non-sequential one, unknown word recognition must include a process that ranks the set of candidates generated for a potential unknown word position. To strengthen the recognition process with ensemble classification, penalty profiles are defined so that each succeeding classification model can be constructed more efficiently and tends to re-rank the ranked candidates into a suitable order. As an evaluation, a number of alternative penalty profiles are introduced and their performance is compared on the task of extracting unknown words from a large Thai medical text. Using Naïve Bayes as the base classifier for ensemble learning, the proposed method with the best setting achieves an accuracy of 90.19%, an improvement of 12.88, 10.59, and 6.05 over conventional Naïve Bayes, the non-ensemble version, and the flat-penalty profile, respectively.

3.
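A Linearly Combined Dual-Granularity RNN Ensemble System   Cited by: 1 (self-citations: 0, citations by others: 1)
Zhang Liang, Huang Shuguang, Hu Ronggui. Acta Automatica Sinica (《自动化学报》), 2011, 37(11): 1402-1406
For offline text recognition, an ensemble system of dual-granularity recurrent neural networks (RNNs) based on linear combination is proposed. First, a word-level RNN recognizes the unknown image. Then, characters are segmented according to that recognition result, a character-level RNN recognizes the segmented characters, and the posterior probabilities of the characters are computed by table lookup. Finally, the recognition results of the two RNNs are combined to decide the final word output. Experimental results on CAPTCHA recognition and handwriting recognition demonstrate the effectiveness of the system.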
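
The final combination step described above can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function name `combine_scores`, the weight `alpha`, the use of the mean character posterior as the character-level score, and the rule of skipping candidates whose length does not match the segmentation are all assumptions made for the example.

```python
def combine_scores(word_posteriors, char_posteriors, alpha=0.5):
    """Pick the word with the best linear combination of two scores.

    word_posteriors: dict mapping candidate word -> word-level posterior.
    char_posteriors: list with one dict per segmented character position,
                     each mapping character -> character-level posterior.
    alpha: hypothetical weight of the word-level score (0..1).
    """
    best_word, best_score = None, float("-inf")
    for word, p_word in word_posteriors.items():
        # Assumption: only candidates matching the segmentation length are scored.
        if len(word) != len(char_posteriors):
            continue
        # Character-level score: mean posterior of the word's characters.
        p_chars = sum(pos.get(ch, 0.0) for ch, pos in zip(word, char_posteriors))
        p_chars /= len(word)
        # Linear combination of the two granularities.
        score = alpha * p_word + (1.0 - alpha) * p_chars
        if score > best_score:
            best_word, best_score = word, score
    return best_word, best_score


if __name__ == "__main__":
    words = {"cat": 0.5, "cot": 0.4}
    chars = [{"c": 0.9}, {"a": 0.2, "o": 0.8}, {"t": 0.9}]
    print(combine_scores(words, chars))
```

With `alpha=1.0` only the word-level score counts; lowering `alpha` lets strong character-level evidence override a slightly higher word-level posterior.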

4.
Feature selection for ensembles has been shown to be an effective strategy for ensemble creation due to its ability to produce good subsets of features, which make the classifiers of the ensemble disagree on difficult cases. In this paper we present an ensemble feature selection approach based on a hierarchical multi-objective genetic algorithm. The underpinning paradigm is "overproduce and choose". The algorithm operates at two levels. First, it performs feature selection in order to generate a set of classifiers, and then it chooses the best team of classifiers. In order to show its robustness, the method is evaluated in two different contexts: supervised and unsupervised feature selection. In the former, we considered the problem of handwritten digit recognition and used three different feature sets and multi-layer perceptron neural networks as classifiers. In the latter, we took into account the problem of handwritten month word recognition and used three different feature sets and hidden Markov models as classifiers. Experiments and comparisons with classical methods, such as Bagging and Boosting, demonstrated that the proposed methodology brings compelling improvements when classifiers have to work with very low error rates. Comparisons have been done by considering the recognition rates only.

5.
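Performance Comparison of Several Machine Learning Methods for Face Recognition   Cited by: 2 (self-citations: 1, citations by others: 2)
BP neural networks, RBF neural networks, support vector machines (SVMs), and ensemble learning are currently the four most widely used machine learning methods. These four methods are each applied to face recognition, and their performance is tested and evaluated on the ORL face image database. The test results show that SVM and ensemble learning performed better in the experiments and are the most suitable as feature classifiers for face recognition.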

6.
This paper develops word recognition methods for historical handwritten cursive and printed documents. It employs a powerful segmentation-free letter detection method based upon joint boosting with histograms of gradients as features. Efficient inference on an ensemble of hidden Markov models can select the most probable sequence of candidate character detections to recognize complete words in ambiguous handwritten text, drawing on character n-gram and physical separation models. Experiments with two corpora of handwritten historic documents show that this approach recognizes known words more accurately than previous efforts, and can also recognize out-of-vocabulary words.

7.
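To meet the real-time and portability requirements of speech recognition systems, an MFCC/SVM-based implementation on the DM6446 embedded development platform is proposed. It realizes a speaker-independent speech recognition system by porting the directed acyclic graph (DAG) multi-class support vector machine algorithm to the platform. On this platform, the DAG method is used for speaker-independent isolated-word and connected-word speech recognition and shows a clear advantage over hidden Markov models. A sample pre-selection algorithm pre-selects the training samples and is applied in the embedded speech recognition system, greatly reducing training and testing time.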

8.
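Automatic Tibetan word segmentation is a fundamental, key problem in Tibetan information processing, and recognizing abbreviated words is both a focus and a difficulty of Tibetan segmentation. The abbreviated-word recognition methods published so far are all rule-based and require lexicon support. This paper proposes an abbreviated-word recognition method based on conditional random fields (CRFs) and, on that basis, implements a CRF-based automatic Tibetan word segmentation system. Experimental results show that the CRF-based method is fast and effective, combines easily with the segmentation module, and significantly improves Tibetan word segmentation.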

9.
10.
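Performance Study and Improvement of Isolated-Word Speech Recognition Algorithms   Cited by: 1 (self-citations: 0, citations by others: 1)
For speaker-dependent isolated-word speech recognition with small to medium vocabularies, and with the aim of improving practicality, this article analyzes and explores the effectiveness of two common recognition methods (VQ and DTW) and ways to improve their performance (recognition speed and recognition rate). Based on a discussion of the experimental data, it proposes several effective improvements.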

11.
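Feng Yanhong, Yu Hong, Sun Geng, Zhao Yujin. Journal of Computer Applications (《计算机应用》), 2016, 36(11): 3146-3151
Domain term recognition methods based on statistical features ignore the semantics and domain characteristics of terms, which degrades recognition results. To address this, a domain term recognition method based on word vectors and conditional random fields (CRFs) is proposed. The method exploits the strong semantic expressiveness of word vectors and the strong domain expressiveness of the similarity between words and domain terms: on top of the statistical features, it adds the similarity between a word's vector and the vectors of domain terms to form a word-vector-based feature vector, and uses a CRF to combine these features for domain term recognition. In experiments on a domain corpus and the SogouCA corpus, precision, recall, and F-measure reached 0.9855, 0.9439, and 0.9643, respectively, indicating that the proposed method performs well.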

12.
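Du Fei, Yang Yun, Hu Yuanyuan, Cao Lijuan. Journal of Software (《软件学报》), 2020, 31(7): 2157-2168
Through multi-layer feature extraction, deep learning can automatically represent raw, complex data as high-level abstract features. Such models have strong modeling capacity and are widely applied to highly complex problems such as image recognition, speech recognition, and natural language processing. However, because the networks are deep and have very large numbers of parameters, training often suffers from vanishing gradients, convergence to local optima, and overfitting. Drawing on the idea of ensemble learning, a novel deep shared ensemble network is proposed. By attaching multiple independent output layers to the hidden layers of the deep network and training them jointly, the network injects gradients at every layer, replenishing the gradients of the lower hidden layers and thereby mitigating the vanishing gradient problem. Integrating the multiple output layers also gives the whole network stronger generalization performance.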

13.
14.
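Faced with today's rapidly changing and highly varied tokens, earlier spam recognition methods struggle to identify the key tokens in an email accurately, and their practical applicability needs further improvement. To this end, a spam recognition method based on a cluster analysis algorithm is proposed. First, the email samples are preprocessed to obtain the key tokens of the email text, stop words are removed, and each token's weight is computed from its frequency in the email text. Then, combined with the email feature attributes, an email feature space is constructed and the email features are quantified. Finally, the email features are extracted and reduced in dimension, fed to the clustering algorithm as input, and the result is output after iterative computation, completing spam recognition. Experimental results show that the designed method is more precise in keyword extraction and word segmentation and identifies spam accurately, indicating that its practical applicability has improved.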
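
The first preprocessing step (removing stop words and weighting tokens by frequency) might look roughly like the sketch below. The abstract does not give the weighting formula, so relative frequency is assumed here; the `STOP_WORDS` set and the function name `term_weights` are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = {"a", "an", "the", "to", "and", "of"}


def term_weights(tokens, stop_words=STOP_WORDS):
    """Return token -> weight, where weight is the token's relative frequency
    among the tokens that remain after stop-word removal (an assumption)."""
    kept = [t.lower() for t in tokens if t.lower() not in stop_words]
    counts = Counter(kept)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {token: count / total for token, count in counts.items()}


if __name__ == "__main__":
    print(term_weights("win a prize win the prize now".split()))
```

The resulting weights could then serve as one part of the feature vector that is reduced in dimension and passed to the clustering step.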

15.
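This paper proposes a method for articulation detection and command-word recognition based on Doppler microwave radar. The method uses the Doppler characteristics of microwave radar to detect small changes in the facial muscles during speech, enabling command-word recognition that does not rely on the acoustic speech signal. The paper first designs and implements an articulation detection system based on Doppler microwave radar and uses it to build a command-word recognition database covering two speakers. It then studies radar data classification methods based on support vector machines and convolutional neural networks, and compares the command-word recognition performance of different model and feature combinations under single-speaker and multi-speaker modeling. Experimental results show that the data acquisition system detects articulatory movements effectively and that the convolutional neural network classifier achieves a command-word recognition accuracy above 90%.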

16.
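Isolated-word speech recognition, which uses pattern matching, is one of the core technologies of speech recognition. First, the user speaks each word in the vocabulary once, and the feature vector of each word is stored as a template in the template library. Then, the feature vector of the input speech is compared for similarity with each template in the library in turn, and the template with the highest similarity is output as the recognition result. This paper introduces the research status of isolated-word speech recognition and several common technical approaches, and analyzes its applications and development prospects.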
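
The enroll-then-match procedure described above can be sketched as follows. This is a minimal sketch under simplifying assumptions: each utterance is reduced to one fixed-length feature vector, and cosine similarity is used as the similarity measure (the abstract does not name one). The class name `TemplateRecognizer` and its methods are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class TemplateRecognizer:
    """Stores one feature-vector template per word and matches by similarity."""

    def __init__(self):
        self.templates = {}  # word -> feature vector

    def enroll(self, word, features):
        # Training: the user speaks each word once; its features become the template.
        self.templates[word] = list(features)

    def recognize(self, features):
        # Recognition: compare against every template, return the most similar word.
        return max(self.templates,
                   key=lambda w: cosine_similarity(self.templates[w], features))


if __name__ == "__main__":
    rec = TemplateRecognizer()
    rec.enroll("yes", [1.0, 0.0, 0.2])
    rec.enroll("no", [0.0, 1.0, 0.1])
    print(rec.recognize([0.9, 0.1, 0.2]))
```

Real utterances vary in length, so practical systems usually compare sequences of feature vectors rather than single vectors; the fixed-length vector here only keeps the sketch short.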

17.
Despite several decades of research in document analysis, recognition of unconstrained handwritten documents is still considered a challenging task. Previous research in this area has shown that word recognizers perform adequately on constrained handwritten documents which typically use a restricted vocabulary (lexicon). But in the case of unconstrained handwritten documents, state-of-the-art word recognition accuracy is still below the acceptable limits. The objective of this research is to improve word recognition accuracy on unconstrained handwritten documents by applying a post-processing or OCR correction technique to the word recognition output. In this paper, we present two different methods for this purpose. First, we describe a lexicon reduction-based method by topic categorization of handwritten documents which is used to generate smaller topic-specific lexicons for improving the recognition accuracy. Second, we describe a method which uses topic-specific language models and a maximum-entropy based topic categorization model to refine the recognition output. We present the relative merits of each of these methods and report results on the publicly available IAM database.

18.
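Two-dimensional shape classification and recognition is an important problem in computer vision, pattern recognition, and related fields, and plays an important role in applications such as object recognition and image understanding. This survey reviews recent work on 2D shape classification and recognition from three perspectives: feature description, shape classification and recognition, and standard shape databases. It comprehensively analyzes and briefly comments on 2D shape feature representation methods, mainly contour-based, region-based, skeleton-based, and multi-feature-fusion methods. It introduces and analyzes 2D shape classification and recognition methods, mainly traditional machine learning classifiers, ensemble classifiers, and deep learning methods. It also summarizes the standard databases commonly used in 2D shape recognition and discusses future trends in the field.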

19.
20.
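To handle the diversity of character recognition targets, a character recognition model based on Bagging ensembles is proposed, addressing the tendency of recognition models to favor certain characters. A Bagging sampling strategy forms different data subsets, on which a decision tree algorithm trains multiple base classifiers; the predictions of the base classifiers are then combined by majority voting. Theoretical analysis and simulation results show that the proposed model has better classification ability than other classification methods.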
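
The three stages described above (bootstrap sampling, training base classifiers, majority voting) can be sketched as follows. This is an illustrative sketch, not the model from the paper: to stay short it uses a one-level decision tree (a decision stump) as the base classifier instead of a full decision tree, and the names `train_stump`, `bagging_fit`, and `bagging_predict` are hypothetical.

```python
import random
from collections import Counter


def train_stump(samples):
    """Fit a one-level decision tree on (features, label) pairs by choosing
    the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in range(len(samples[0][0])):
        for x, _ in samples:
            t = x[f]
            left = [y for xs, y in samples if xs[f] <= t]   # never empty
            right = [y for xs, y in samples if xs[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    _, f, t, left_label, right_label = best
    return lambda x: left_label if x[f] <= t else right_label


def bagging_fit(samples, n_estimators=11, seed=0):
    """Train one stump per bootstrap sample (sampling with replacement)."""
    rng = random.Random(seed)
    return [train_stump([rng.choice(samples) for _ in samples])
            for _ in range(n_estimators)]


def bagging_predict(models, x):
    """Combine the base classifiers' predictions by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    data = [((0.0,), "a"), ((1.0,), "a"), ((2.0,), "a"),
            ((8.0,), "b"), ((9.0,), "b"), ((10.0,), "b")]
    models = bagging_fit(data)
    print(bagging_predict(models, (-1.0,)), bagging_predict(models, (11.0,)))
```

An odd number of estimators is used so that a two-class vote cannot end in a tie.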

