This paper presents an effective approach for unsupervised language model adaptation (LMA) using multiple models in offline recognition of unconstrained handwritten Chinese texts. The domain of the document to be recognized is variable and usually unknown a priori, so we use a two-pass recognition strategy with a pre-defined multi-domain language model set. We propose three methods to dynamically generate an adaptive language model that matches the text output by first-pass recognition: model selection, model combination and model reconstruction. In model selection, we use the language model with minimum perplexity on the first-pass recognized text. In model combination, we learn the combination weights by minimizing the sum of squared errors with both L2-norm and L1-norm regularization. In model reconstruction, we use a group of orthogonal bases to reconstruct a language model, with the coefficients learned to match the document to be recognized. Moreover, we reduce the storage size of the multiple language models using two compression methods: split vector quantization (SVQ) and principal component analysis (PCA). Comprehensive experiments on two public Chinese handwriting databases, CASIA-HWDB and HIT-MW, show that the proposed unsupervised LMA approach substantially improves recognition performance, particularly for ancient-domain documents, where recognition accuracy improves by 7 percent. Meanwhile, the combination of the two compression methods greatly reduces the storage size of the language models with little loss of recognition accuracy.
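The model-selection step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes add-one smoothed unigram models as a stand-in for the real language models, and the domain names, toy corpora, and function names (`train_unigram`, `perplexity`, `select_model`) are all hypothetical.

```python
import math
from collections import Counter


def train_unigram(tokens, vocab):
    """Add-one smoothed unigram model (assumed stand-in): token -> probability."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def perplexity(model, tokens):
    """Perplexity of a model on a token sequence; all tokens must be in the vocabulary."""
    log_prob = sum(math.log(model[t]) for t in tokens)
    return math.exp(-log_prob / len(tokens))


def select_model(models, first_pass_tokens):
    """Return the name of the model with minimum perplexity on the first-pass text."""
    return min(models, key=lambda name: perplexity(models[name], first_pass_tokens))


# Hypothetical two-domain model set and first-pass recognition output.
vocab = {"the", "court", "ruled", "team", "scored", "goal"}
models = {
    "law": train_unigram("the court ruled the court ruled".split(), vocab),
    "sports": train_unigram("the team scored the goal".split(), vocab),
}
first_pass = "the team scored".split()
best = select_model(models, first_pass)
print(best)
```

In a two-pass setup, the selected model would then be used to rescore or re-decode the document in the second pass.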

This paper proposes the use of Maximum A Posteriori Linear Regression (MAPLR) transforms as features for language recognition. Rather than estimating the transforms using maximum likelihood linear regression (MLLR), MAPLR incorporates prior information about the transforms into the estimation process, using maximum a posteriori (MAP) as the estimation criterion to derive the transforms. Through multiple MAPLR adaptation, each spoken utterance is converted to one discriminative transform supervector consisting of one target-language transform vector and the other non-target transform vectors. SVM classifiers are employed to model the discriminative MAPLR transform supervectors. This system achieves performance comparable to that obtained with state-of-the-art approaches and better than MLLR. Experimental results on the 2007 NIST Language Recognition Evaluation (LRE) databases show relative reductions of 4% in EER and 9% in mincost when the language recognition system uses MAPLR instead of MLLR in the 30-s tasks, and further improvement is gained by combining it with state-of-the-art systems. This leads to gains of 6% in EER and 11% in minDCF compared with the combination of the MMI system and the GMM-SVM system alone.

Automatic understanding and recognition of human shopping behavior has many potential applications, attracting an increasing interest in the marketing domain. The reliability and performance of the automatic recognition system is highly influenced by the adopted theoretical model of behavior. In this work, we address the analogy between human shopping behavior and a natural language. The adopted methodology associates low-level information extracted from video data with semantic information using the proposed behavior language model. Our contribution on the action recognition level consists of proposing a new feature set which fuses Histograms of Optical Flow (HOF) with directional features. On the behavior level we propose combining smoothed bi-grams with the maximum dependency in a chain of conditional probabilities. The experiments are performed on both laboratory and real-life datasets. The introduced behavior language model achieves an accuracy of 87% on the laboratory data and 76% on the real-life dataset, an improvement of 11% and 8% respectively over the baseline model, by incorporating semantic knowledge and capturing correlations between the basic actions.
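The idea of a smoothed bi-gram over basic actions can be sketched as follows. This is only an illustration under stated assumptions: additive (add-alpha) smoothing is assumed because the abstract does not name the smoothing scheme, the action labels and sequences are invented, and the maximum-dependency component of the proposed model is not covered.

```python
from collections import Counter


def train_bigram(sequences, alpha=1.0):
    """Estimate add-alpha smoothed bigram probabilities P(cur | prev) over actions."""
    actions = sorted({a for seq in sequences for a in seq})
    pair_counts = Counter()
    prev_counts = Counter()
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            pair_counts[(prev, cur)] += 1
            prev_counts[prev] += 1

    def prob(prev, cur):
        # Smoothing keeps unseen transitions at a small non-zero probability.
        return (pair_counts[(prev, cur)] + alpha) / (
            prev_counts[prev] + alpha * len(actions)
        )

    return prob, actions


# Hypothetical action sequences standing in for recognized shopper actions.
sequences = [
    ["walk", "pick", "inspect", "basket"],
    ["walk", "pick", "inspect", "return"],
    ["walk", "browse", "walk"],
]
prob, actions = train_bigram(sequences)
print(prob("walk", "pick"), prob("walk", "return"))
```

A frequent transition such as walk followed by pick receives a higher probability than one never observed, while the probabilities for each preceding action still sum to one.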

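To accurately recognize sign language letters in sign language video, an algorithm based on DI_CamShift and SLVW is proposed. The method uses Kinect as the sign language video capture device, obtaining depth information along with the color video; it computes the principal-axis orientation angle and centroid position of the sign language gesture in the depth image and tracks the gesture accurately by adjusting the search window; it segments the gesture using the Otsu algorithm based on depth integral images and extracts its SIFT features; it builds an SLVW bag of words as the sign language feature and performs recognition with an SVM. Experimental validation of the algorithm shows a best recognition rate of 99.87% for a single sign language letter and an average recognition rate of 96.21%.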

Software tools are fundamental to the comprehension, analysis, testing and debugging of application systems. A necessary first step in the development of many tools is the construction of a parser front‐end that can recognize the implementation language of the system under development. In this paper, we describe our use of token decoration to facilitate recognition of ambiguous language constructs. We apply our approach to the C++ language since its grammar is replete with ambiguous derivations such as the declaration/expression and template‐declaration/expression ambiguity. We describe our implementation of a parser front‐end for C++, keystone, and we describe our results in decorating tokens for our test suite including the examples from Clause Three of the C++ standard. We are currently exploiting the keystone front‐end to develop a taxonomy for implementation‐based class testing and to reverse‐engineer Unified Modeling Language (UML) class diagrams. Copyright © 2002 John Wiley & Sons, Ltd.

This paper proposes an efficient method for on-line recognition of cursive Korean characters. The recognition of cursive strokes and the representation of a large character set are important determinants in the recognition rate of Korean characters. To deal with cursive strokes, we classify them automatically by using an ART-2 neural network. This neural network has the advantage of assembling similar patterns together to form classes in a self-organized manner. To deal with the large character set, we construct a character recognition model by using the hidden Markov model (HMM), which has the advantages of providing an explicit representation of time-varying vector sequence and probabilistic interpretation. Probabilistic parameters of the HMM are initialized using the combination rule for Korean characters and a set of primitive strokes that are classified by the ART stroke classifier, and trained with sample data. This is an efficient means of representing all the 11,172 possible Korean characters. We tested the model on 7500 on-line cursive Korean characters and it proved to perform well in recognition rate and speed.

Reading text in natural images has again attracted the attention of many researchers in recent years, owing to the increasing availability of cheap image-capturing devices in low-cost products such as mobile phones. Because text can be found in any environment, text-reading systems are widely applicable. In this paper, we present a robust method to read text in natural images. It is composed of two separate stages. First, text is located in the image using a set of simple, fast-to-compute features, based on geometric and gradient properties, that are highly discriminative between character and non-character objects. The second stage recognizes the previously detected text. It uses gradient features to recognize single characters and Dynamic Programming (DP) to correct misspelled words. Experimental results obtained with several challenging datasets show that the proposed system exceeds state-of-the-art performance in terms of both localization and recognition.

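宋建炜, 邓逸川, 苏成. 《图学学报》 (Journal of Graphics), 2021, 42(2): 307-315
Accident analysis is an important part of construction safety management, but the construction safety knowledge scattered across accident reports is not well reused and cannot adequately inform construction safety management. A knowledge graph is a tool for storing and reusing knowledge in structured form; it can be used for fast retrieval of accident cases, analysis of accident correlation paths, statistical analysis and so on, thereby raising the level of construction safety management. Named entity recognition (NER) is a key task in automatically building a knowledge graph; at present, the main...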

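Using pre-trained language models (PLMs) to extract sentence feature representations has achieved remarkable results in downstream natural language understanding tasks on written text. However, when PLMs are applied to spoken language understanding (SLU) tasks, errors from the front-end automatic speech recognition (ASR) reduce SLU accuracy. This paper therefore studies how to enhance PLMs to improve the robustness of SLU models to ASR errors. Specifically, by comparing the differences between ASR output and manual transcriptions, text chunks that were run together or deleted are identified, and the PLM is fine-tuned with a new pre-training task so that text chunks with similar pronunciation produce similar feature embeddings, thereby reducing the impact of ASR errors on the PLM. Experiments on three benchmark datasets show that the proposed method substantially improves accuracy over previous methods, verifying its effectiveness.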

International Journal on Document Analysis and Recognition (IJDAR) - In handwritten text recognition, compared to humans, computers are far short of linguistic context knowledge, especially...

In knowledge discovery in a text database, extracting and returning a subset of information highly relevant to a user's query is a critical task. In a broader sense, this is essentially the identification of certain personalized patterns that drive such applications as Web search engine construction, customized text summarization and automated question answering. A related problem of text snippet extraction has been previously studied in information retrieval. In these studies, common strategies for extracting and presenting text snippets to meet user needs either process document fragments that have been delimited a priori or use a sliding window of a fixed size to highlight the results. In this work, we argue that text snippet extraction can be generalized if the user's intention is better utilized. Our approach overcomes the rigidity of existing approaches by dynamically returning more flexible start-end positions of text snippets, which are also semantically more coherent. This is achieved by constructing and using statistical language models which effectively capture the commonalities between a document and the user intention. Experiments indicate that our proposed solutions provide effective personalized information extraction services.

This paper investigates the automatic reading of unconstrained omni-writer handwritten texts. It shows how to endow the reading system with the learning faculties necessary to adapt the recognition to each writer's handwriting. In the first part of this paper, we explain how the recognition system can be adapted to the current handwriting by exploiting the graphical context defined by the writer's invariants. This adaptation is guaranteed by activating interaction links over the whole text between the recognition procedures for word entities and those for letter entities. In the second part, we justify the need for an open multiple-agent architecture to support the implementation of such a principle of adaptation. The proposed platform allows expert treatments dedicated to handwriting analysis to be plugged in. We show that this platform helps to implement specific collaboration or cooperation schemes between agents, which bring out new trends in the automatic reading of handwritten texts.

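杨全, 彭进业. 《计算机应用》 (Journal of Computer Applications), 2013, 33(10): 2882-2885
To accurately recognize sign language letters in sign language video, a sign language recognition algorithm based on DI_CamShift and sign language visual words (SLVW) is proposed. First, Kinect is used to capture video of sign language letter gestures together with its depth information. Next, the principal-axis orientation angle and centroid position of the gesture in the depth image are computed, and the search window is calculated to track the gesture. The gesture is then segmented with the Otsu algorithm based on depth integral images, and its scale-invariant feature transform (SIFT) features are extracted. Finally, an SLVW bag of words is built and recognition is performed with a support vector machine (SVM). The best recognition rate for a single sign language letter is 99.67%, and the average recognition rate is 96.47%.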

Optical character recognition with smart cameras is becoming increasingly popular in industrial applications. Because these applications mostly pose challenging environments for the systems, highly reliable character segmentation and classification algorithms are essential. Investigations of several algorithms have shown that character segmentation is one of the main bottlenecks of character recognition, if not the main one. Furthermore, the demands for robust and fast algorithms for skew angle estimation and line segmentation, as well as tilt angle estimation and character segmentation, are high. This motivates the introduction of algorithms that are specifically adapted to industrial applications. Additionally, a method based on Bayes' theorem is proposed to take prior knowledge into account for line and character segmentation. The investigation of the character recognition system focuses on recognition performance and speed, since real-time constraints are very strict in industrial applications. Both requirements are evaluated on an image series captured with a smart camera in an industrial application.

Shape representation and recognition is an important topic in many applications of computer vision and artificial intelligence, including character recognition, pattern recognition, machine monitoring, robot manipulation and production part recognition. In this paper, a structural model based on boundary information is proposed to describe the silhouette of planar objects (especially machined parts). The structural model describes objects by a set of primitives, each of which is represented by three geometric features: its length, curvature, and relative orientation. This representation scheme not only compresses the data, but also provides a compact and meaningful form to facilitate further recognition operations. Based on this model, the object recognition is accomplished by using a multilayered feedforward neural network. The proposed model is transformation invariant, which offers the necessary flexibility for real-time implementation in automated manufacturing systems. In addition, the numerical results for a set of ten reference shapes indicate that the matching engine can achieve very high success rates using short recognition times.

Despite the success of license plate recognition (LPR) methods in the past decades, few of them can process multi-style license plates (LPs), especially LPs from different nations, effectively. In this paper, we propose a new method for multi-style LP recognition by representing the styles with quantitative parameters, i.e., plate rotation angle, plate line number, character type and format. In the recognition procedure these four parameters are managed by relevant algorithms, i.e., plate rotation, plate line segmentation, character recognition and format matching algorithm, respectively. To recognize special style LPs, users can configure the method by defining corresponding parameter values, which will be processed by the relevant algorithms. In addition, the probabilities of the occurrence of every LP style are calculated based on the previous LPR results, which will result in a faster and more precise recognition. Various LP images were used to test the proposed method and the results proved its effectiveness.

This paper presents an end-to-end system for reading handwritten page images. Five functional modules included in the system are introduced in this paper: (i) pre-processing, which concerns introducing an image representation for easy manipulation of large page images and image handling procedures using the image representation; (ii) line separation, concerning text line detection and extracting images of lines of text from a page image; (iii) word segmentation, which concerns locating word gaps and isolating words from a line of text image obtained efficiently and in an intelligent manner; (iv) word recognition, concerning handwritten word recognition algorithms; and (v) linguistic post-processing, which concerns the use of linguistic constraints to intelligently parse and recognize text. Key ideas employed in each functional module, which have been developed for dealing with the diversity of handwriting in its various aspects with a goal of system reliability and robustness, are described in this paper. Preliminary experiments show promising results in terms of speed and accuracy. Received October 30, 1998 / Revised January 15, 1999

Stop word location and identification for adaptive text recognition (cited 2 times: 0 self-citations, 2 by others)
Abstract. We propose a new adaptive strategy for text recognition that attempts to derive knowledge about the dominant font on a given page. The strategy uses a linguistic observation that over half of all words in a typical English passage are contained in a small set of less than 150 stop words. A small dictionary of such words is compiled from the Brown corpus. An arbitrary text page first goes through layout analysis that produces word segmentation. A fast procedure is then applied to locate the most likely candidates for those words, using only widths of the word images. The identity of each word is determined using a word shape classifier. Using the word images together with their identities, character prototypes can be extracted using a previously proposed method. We describe experiments using simulated and real images. In an experiment using 400 real page images, we show that on average, eight distinct characters can be learned from each page, and the method is successful on 90% of all the pages. These can serve as useful seeds to bootstrap font learning. Received October 8, 1999 / Revised March 29, 2000
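The linguistic observation this abstract relies on, that a small stop-word set covers a large share of running text, can be checked on any passage with a sketch like the one below. The ten-word `STOP_WORDS` set is a hypothetical stand-in, not the dictionary compiled from the Brown corpus, and the function name and tokenization rule are assumptions made for illustration.

```python
def stop_word_coverage(text, stop_words):
    """Fraction of word tokens in `text` that belong to `stop_words`."""
    # Simple whitespace tokenization with surrounding punctuation stripped.
    tokens = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in stop_words)
    return hits / len(tokens)


# Hypothetical miniature stop-word list; the paper's list has under 150 entries.
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "is", "that", "on", "for"}
sample = "The identity of each word is determined using a word shape classifier."
coverage = stop_word_coverage(sample, STOP_WORDS)
print(f"{coverage:.2f}")
```

With a fuller stop-word list, the same function could be run over a longer passage to see how closely its coverage approaches the figure quoted in the abstract.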

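李伟, 吴及, 吕萍. 《计算机应用》 (Journal of Computer Applications), 2010, 30(10): 2563-2566
To overcome the slow speed of word lattice generation with single-pass decoding in speech recognition, a fast two-pass decoding algorithm based on forward and backward language models is proposed. The two decoding passes use the forward and the backward language model respectively, and optimization is applied to reduce the impact on recognition results of the mismatch between the two models. Experiments show that the algorithm effectively increases decoding speed while maintaining recognition accuracy.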
