Similar literature
 Found 20 similar documents; search took 31 ms
1.
International Journal on Document Analysis and Recognition (IJDAR) - Handwritten Text Recognition (HTR) in free-layout pages is a challenging image understanding task that can provide a relevant...

2.
On-line handwritten text recognition (HTR) could be used as a more natural way of interacting in many interactive applications. However, current HTR technology is far from error-free and, consequently, its use in many applications is limited. Despite this, there are many scenarios, such as correcting the errors of fully automatic systems with HTR in a post-editing step, in which information from the specific task makes it possible to constrain the search and therefore improve HTR accuracy. For example, in machine translation (MT), the on-line HTR system can also be used to correct translation errors. The HTR system can take advantage of information from the translation problem, such as the source sentence being translated, the portion of the translated sentence that has been supervised by the human, or the translation error to be amended. Empirical experimentation suggests that this information is valuable for improving the robustness of the on-line HTR system, achieving remarkable results.

3.

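Particle fuel is a novel fuel configuration in which nuclear fuel is fabricated into particles and dispersed in a matrix; it is widely used in advanced reactor designs such as high-temperature gas-cooled reactors (HTGRs), space reactors, and fluoride-salt-cooled high-temperature reactors. Taking an HTGR and a space reactor as examples, this work uses the open-source Monte Carlo code OpenMC to study a virtual-lattice acceleration method suited to criticality calculations for particle fuel, and runs large-scale parallel tests on more than 100,000 cores of the Shanhe supercomputing platform. The results show that the effective multiplication factor computed for the HTGR model agrees well with experimental data from the Shidaowan nuclear power plant, which verifies the accuracy of the code and the model. In terms of performance, the virtual-lattice method clearly improves on OpenMC's earlier real-lattice method in both storage and computing speed: the memory and run time of the HTGR virtual-lattice model are 0.2% and 82% of those of the real-lattice model, respectively. In addition, because the virtual-lattice method simplifies the model geometry, it indirectly achieves better load balancing and gives the code higher parallel efficiency. For strong scalability, in a test on 10,752 cores, the parallel efficiency is 83.4% for the virtual lattice and 63.6% for the real lattice; for weak scalability, the virtual-lattice model reaches a parallel efficiency of 83.1% on 131,600 cores, compared with 66.1% for the real lattice.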


4.
Automatic text segmentation and text recognition for video indexing   Total citations: 13, self-citations: 0, citations by others: 13
Efficient indexing and retrieval of digital video is an important function of video databases. One powerful index for retrieval is the text appearing in them. It enables content-based browsing. We present our new methods for automatic segmentation of text in digital videos. The algorithms we propose make use of typical characteristics of text in videos in order to enable and enhance segmentation performance. The unique features of our approach are the tracking of characters and words over their complete duration of occurrence in a video and the integration of the multiple bitmaps of a character over time into a single bitmap. The output of the text segmentation step is then directly passed to a standard OCR software package in order to translate the segmented text into ASCII. Also, a straightforward indexing and retrieval scheme is introduced. It is used in the experiments to demonstrate that the proposed text segmentation algorithms together with existing text recognition algorithms are suitable for indexing and retrieval of relevant video sequences in and from a video database. Our experimental results are very encouraging and suggest that these algorithms can be used in video retrieval applications as well as to recognize higher level semantics in videos.

5.
Automatic text categorization and its application to text retrieval   Total citations: 4, self-citations: 0, citations by others: 4
We develop an automatic text categorization approach and investigate its application to text retrieval. The categorization approach is derived from a combination of a learning paradigm known as instance-based learning and an advanced document retrieval technique known as retrieval feedback. We demonstrate the effectiveness of our categorization approach using two real-world document collections from the MEDLINE database. Next, we investigate the application of automatic categorization to text retrieval. Our experiments clearly indicate that automatic categorization improves the retrieval performance compared with no categorization. We also demonstrate that the retrieval performance using automatic categorization achieves the same retrieval quality as the performance using manual categorization. Furthermore, detailed analysis of the retrieval performance on each individual test query is provided.

6.
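A fast text classification method based on boundary-confidence similarity   Total citations: 2, self-citations: 0, citations by others: 2
The center and the boundary of a category are important category features. Using the centers and boundaries of the training samples as the classification criterion, this paper proposes a fast text classification algorithm based on boundary-confidence similarity. By adjusting the similarity between a text and a category according to the category's boundary confidence, the method overcomes the drawbacks of unbalanced sample distribution across categories and uneven sample density within a category, and so improves classification performance. Experimental results show that the algorithm improves classification effectiveness, shows good robustness, and markedly improves classification efficiency.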

7.
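Research on a text mining algorithm based on inverse document frequency and mutual information   Total citations: 1, self-citations: 0, citations by others: 1
Traditional text classification algorithms give every feature term the same influence on the classification result, which lowers classification accuracy and also increases time complexity. After analyzing the general model of a text classification system and applying a mutual-information feature extraction method to select feature terms, this paper proposes a text classification algorithm based on inverse document frequency and mutual-information entropy. The algorithm first extracts features from the text sample vectors using the vector space model (VSM); it then extracts a keyword set from the text, filters the keywords, and uses mutual information to represent and compute the relevance between terms and document categories; finally, it computes the weight of each keyword in the document. Experimental results show that, compared with traditional classification algorithms, the improved algorithm runs faster and has stronger nonlinear mapping ability, and it also classifies better in terms of convergence speed and accuracy.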
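The abstract above does not state which mutual-information formula it uses. As a hedged illustration only, the sketch below scores the relevance between a term and a category with the standard pointwise mutual information estimated from document counts. The function name `pointwise_mi` and the four toy documents are hypothetical and do not come from the paper.

```python
import math


def pointwise_mi(docs, labels, term, category):
    """Pointwise mutual information (in bits) between a term and a category.

    docs is a list of sets of words, labels the category of each document.
    Probabilities are estimated as simple document-count ratios.
    """
    n = len(docs)
    n_term = sum(1 for d in docs if term in d)
    n_cat = sum(1 for y in labels if y == category)
    n_both = sum(1 for d, y in zip(docs, labels) if term in d and y == category)
    if n_both == 0:
        # The term never occurs in this category: log(0) is minus infinity.
        return float("-inf")
    # log2( P(term, cat) / (P(term) * P(cat)) ), with the 1/n factors simplified.
    return math.log2(n_both * n / (n_term * n_cat))


# Hypothetical toy corpus, for illustration only.
docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "party"}, {"vote", "match"}]
labels = ["sport", "sport", "politics", "politics"]

print(pointwise_mi(docs, labels, "goal", "sport"))   # term concentrated in the category
print(pointwise_mi(docs, labels, "match", "sport"))  # term spread evenly over categories
```

A term that appears only in one category gets a positive score, while a term spread evenly across categories scores zero, which is the behaviour a keyword filter of this kind would rely on.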

8.
It is well known that the classification effectiveness of a text categorization system is not simply a matter of learning algorithms. Text representation factors are also at work. This paper considers the ways in which the effectiveness of text classifiers is linked to five text representation factors: “stop words removal”, “word stemming”, “indexing”, “weighting”, and “normalization”. Statistical analyses of experimental results show that performing “normalization” always improves the effectiveness of text classifiers significantly. The effects of the other factors are not as great as expected. Contrary to common sense, a simple binary indexing method can sometimes be helpful for text categorization.
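To make three of the representation factors concrete, here is a minimal sketch, assuming a bag-of-words vector, a tiny hand-picked stop-word list, and L2 length normalization. None of these specifics are taken from the paper; the function `represent` and its parameters are hypothetical names used for illustration.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the paper's list).
STOP_WORDS = {"the", "a", "of", "is"}


def represent(tokens, vocabulary, binary=False, normalize=True):
    """Turn a token list into a vector over a fixed vocabulary.

    binary=True    -> binary indexing (1.0 if the term occurs, else 0.0)
    binary=False   -> raw term-frequency weighting
    normalize=True -> scale the vector to unit Euclidean length
    """
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    if binary:
        vec = [1.0 if counts[t] > 0 else 0.0 for t in vocabulary]
    else:
        vec = [float(counts[t]) for t in vocabulary]
    if normalize:
        norm = math.sqrt(sum(x * x for x in vec))
        if norm > 0:
            vec = [x / norm for x in vec]
    return vec


tokens = ["the", "cat", "cat", "dog"]
vocab = ["cat", "dog", "fish"]
print(represent(tokens, vocab, binary=True, normalize=False))
print(represent(tokens, vocab, binary=False, normalize=False))
print(represent(tokens, vocab, binary=False, normalize=True))
```

Switching `binary` and `normalize` independently is what lets an experiment of the kind the abstract describes isolate the effect of each factor.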

9.
Alistair Moffat 《Software》1989,19(2):185-198
The development of efficient algorithms to support arithmetic coding has meant that powerful models of text can now be used for data compression. Here the implementation of models based on recognizing and recording words is considered. Move-to-the-front and several variable-order Markov models have been tested with a number of different data structures. The decisions that went into the implementations are discussed first, and experimental results are then given that show English text being represented in under 2.2 bits per character. Moreover, the programs run at speeds comparable to other compression techniques and are suited for practical use.
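As a rough illustration of the move-to-the-front idea applied to words, the sketch below keeps a recency list and emits each word's position in it, using an escape code for unseen words. It is a simplified sketch and not the paper's implementation: the list-based data structure, the escape convention, and the names `mtf_encode` and `mtf_decode` are all assumptions, and a real compressor would pass the emitted indices to an arithmetic coder.

```python
def mtf_encode(words):
    """Encode words as (index, literal) pairs with a move-to-front list.

    A known word is sent as its current position in the recency list.
    A new word is sent as an escape (index == list length) plus the literal.
    """
    recent = []  # most recently used word first
    codes = []
    for w in words:
        if w in recent:
            i = recent.index(w)
            codes.append((i, None))
            recent.pop(i)
        else:
            codes.append((len(recent), w))
        recent.insert(0, w)  # move (or add) the word to the front
    return codes


def mtf_decode(codes):
    """Invert mtf_encode by replaying the same list updates."""
    recent = []
    words = []
    for i, literal in codes:
        w = recent.pop(i) if literal is None else literal
        recent.insert(0, w)
        words.append(w)
    return words


sample = "the cat saw the cat and the dog".split()
codes = mtf_encode(sample)
print(codes)
print(mtf_decode(codes) == sample)
```

Frequently repeated words stay near the front of the list, so they map to small indices, which is what makes the output cheap to encode.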

10.
Noisy text categorization   Total citations: 1, self-citations: 0, citations by others: 1

11.
The problems involved in accessing the meaning of information in information systems are explored through a contrast between two information technologies, the book and the computer. A distinction between format-structured information and semantically-structured information makes it necessary to distinguish further between information and knowledge structures, with implications for the information technologies to seek (through access devices, text differentiation, and user-control) an adaptive match between information presentation and the user's own knowledge structures. The evolution of CAI systems toward intelligent tutoring systems is seen as the direction through which text access problems may be alleviated.

12.
Digital modes of editing ask us to re-examine the past century of editorial theory and to situate emerging editorial approaches within this history. Using the computer as a new textual medium has brought about a renewed interest in the conditions for representation. This article concerns itself with how books and computers, respectively, represent texts, and how critical editing mediates or organizes those representations. It was written in 1997 as a critical response to J.J. McGann's essay ‘The Rationale of Hypertext’.

13.
14.
15.
Tables in text   Total citations: 1, self-citations: 0, citations by others: 1
Tables were inserted into a four-page article, and subjects were asked to scan the text, which was printed in a two-column or a single-column format. The single-column format was scanned significantly faster than the double-column layout, and there were marked reader preferences for the single-column layout.

16.
17.
A text is a triple τ = (λ, ρ₁, ρ₂) such that λ is a labeling function, and ρ₁ and ρ₂ are linear orders on the domain of λ; hence τ may be seen as a word (λ, ρ₁) together with an additional linear order ρ₂ on the domain of λ. The order ρ₂ is used to give to the word (λ, ρ₁) its individual hierarchical representation (syntactic structure), which may be a tree but may also be more general than a tree. In this paper we introduce context-free grammars for texts and investigate their basic properties. Since each text has its own individual structure, the role of such a grammar should be that of a definition of a pattern common to all individual texts. This leads to the notion of a shapely context-free text grammar, also investigated in this paper.
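The definition of a text as a labeling function with two linear orders can be sketched as a small data structure. This is an illustrative encoding only: the class name `Text`, the field names, and the choice to store each linear order as a tuple listing the domain from smallest to largest are assumptions, not notation from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Text:
    labels: dict   # labeling function: domain element -> symbol
    order1: tuple  # first linear order, listed from smallest to largest
    order2: tuple  # second linear order over the same domain

    def __post_init__(self):
        # Both orders must be permutations of the labeling function's domain.
        domain = set(self.labels)
        for order in (self.order1, self.order2):
            if len(order) != len(domain) or set(order) != domain:
                raise ValueError("each order must list every domain element exactly once")

    def word(self):
        """The word obtained by reading the labels along the first order."""
        return "".join(self.labels[x] for x in self.order1)


t = Text(labels={1: "a", 2: "b", 3: "c"}, order1=(1, 2, 3), order2=(2, 1, 3))
print(t.word())
```

Two texts with the same labels and first order spell the same word, and only the second order distinguishes their individual structure.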

18.
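Given the importance of text in video for video sequences and video indexing, this paper proposes a text localization algorithm based on hybrid character features. The algorithm first applies edge detection and projection to a single frame sampled every 25 frames of the video sequence in order to extract text blocks, then filters them with a support vector machine to exclude non-text blocks, and finally uses the correlation between adjacent frames to search for text blocks in the remaining frames. The algorithm improves detection speed while maintaining high detection accuracy.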

19.
20.
Self-organizing maps (SOM) have been applied to numerous data clustering and visualization tasks and have received much attention for their success. One major shortcoming of the classical SOM learning algorithm is the need for a predefined map topology. Furthermore, hierarchical relationships among data are difficult to find. Several approaches have been devised to overcome these deficiencies. In this work, we propose a novel SOM learning algorithm which incorporates several text mining techniques in expanding the map both laterally and hierarchically. When training on a set of text documents, the proposed algorithm first clusters them using the classical SOM algorithm. We then identify the topics of each cluster. These topics are then used to evaluate the criteria for expanding the map. The major characteristic of the proposed approach is that it combines the learning process with the text mining process, which makes it suitable for automatic organization of text documents. We applied the algorithm to the Reuters-21578 dataset in text clustering and categorization tasks. Our method outperforms two comparison models in hierarchy quality according to users’ evaluation. It also achieves better F1-scores than two other models in the text categorization task.
