Similar literature
 Found 20 similar documents; search took 46 ms
1.
This note illustrates the power of software interrupts (traps) and text diversions in document preparation languages by considering three difficult problems in typesetting: figure placement, setting text in arbitrary shapes, and balancing final columns in multi-column layout. The solution of each of these in the nroff document preparation system is discussed, and an appendix gives the nroff macros that accomplish the tasks.

2.
3.
This paper presents a new knowledge-based system for extracting and identifying text-lines from various real-life mixed text/graphics compound document images. The proposed system first decomposes the document image into distinct object planes to separate homogeneous objects, including textual regions of interest, non-text objects such as graphics and pictures, and background textures. A knowledge-based text extraction and identification method then obtains the text-lines with different characteristics in each plane. The system is flexible and extensible: it can cope with new types of real-life complex document images merely by having its rules updated. Experimental and comparative results demonstrate the effectiveness of the proposed system and its advantages in extracting text-lines with a wide variety of illumination levels, sizes, and font styles from various types of mixed and overlapping text/graphics compound document images.

4.
马智亮  李勇鹤  李恒 《计算机工程》2006,32(13):254-256,268
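In practice, it is common for several people to co-author the same document. In the construction sector, for example, project proposals, feasibility study reports, and similar documents all have to be completed collaboratively by several people. However, the document editing software in common use today provides no support for multi-person collaborative authoring. Based on an analysis of related research, this paper therefore proposes a model for developing a collaborative document authoring system on top of the network and general-purpose word processing software, and develops a prototype system, offering a new approach to building collaborative document authoring systems.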

5.
6.
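Latent Dirichlet Allocation (LDA) has been widely applied in recent years to text clustering, classification, passage segmentation, and similar tasks, and has also been applied to query-focused unsupervised multi-document summarization. The method is considered to model the shallow semantics of text well. Building on previous work, this paper proposes an LDA-based Conditional Random Field (CRF) automatic summarization method (LCAS), studies the role of LDA in supervised single-document summarization, proposes adding the topics extracted by LDA as features to the CRF model for training, and analyzes how LDA affects the summarization results under different topic settings. Experimental results show that adding LDA features effectively improves the quality of a CRF summarization system that takes traditional features as input.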

7.
This paper documents the motivation, method and results of seven experiments conducted to investigate the properties of automatic document analysis (for the purpose of automatic vocabulary expansion of a personalized language model in a speech dictation system). The results indicated that automatic document analysis of corrected text should improve the accuracy of text dictated in the future, as long as the future text is similar to the analyzed text. None of the manipulations had a measurable effect (either good or bad) when the analyzed text was uncorrected dictation or future text that was not similar to analyzed text. These results were the same for both trained and untrained acoustic models.

8.
于晨斐 《计算机仿真》2007,24(11):324-326
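Electronic documents, Word documents in particular, are increasingly widely used. This paper therefore proposes a digital watermarking algorithm for Word documents based on quadratic residues. Because of the characteristics of Chinese text, character-shift coding is not well suited to Chinese text watermarking. The algorithm embeds watermark information adaptively according to quadratic residue theory, implements character-shift coding in Chinese text, and offers a large information-hiding capacity. Extraction does not require comparison with the original text, so it is blind. The original watermark image is scrambled with an improved Rubik's cube (magic cube) transform combined with a chaotic sequence generated by the Logistic map of nonlinear dynamics, which scrambles better and further improves robustness. Experiments show that the algorithm has good imperceptibility and that common formatting adjustments to Word documents do not destroy the watermark.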
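
The abstract above does not give the details of its improved Rubik's cube transform, so the following is only a minimal sketch of a simpler, commonly used stand-in: scrambling watermark bits with a permutation derived from a Logistic-map chaotic sequence. The function names, the parameter `mu=3.99`, the burn-in length, and the use of `x0` as a secret key are all assumptions made for illustration, not the paper's method.

```python
def logistic_sequence(x0, mu, n, burn_in=100):
    """Iterate x -> mu * x * (1 - x) and return n values after a burn-in."""
    x = x0
    for _ in range(burn_in):
        x = mu * x * (1 - x)
    seq = []
    for _ in range(n):
        x = mu * x * (1 - x)
        seq.append(x)
    return seq


def chaotic_permutation(x0, mu, n):
    """Rank the chaotic values; the sorted index order is the permutation."""
    seq = logistic_sequence(x0, mu, n)
    return sorted(range(n), key=seq.__getitem__)


def scramble(bits, x0, mu=3.99):
    """Reorder the watermark bits with the key-dependent permutation."""
    perm = chaotic_permutation(x0, mu, len(bits))
    return [bits[i] for i in perm]


def unscramble(scrambled, x0, mu=3.99):
    """Invert scramble() by regenerating the same permutation from the key."""
    perm = chaotic_permutation(x0, mu, len(scrambled))
    restored = [0] * len(scrambled)
    for position, source_index in enumerate(perm):
        restored[source_index] = scrambled[position]
    return restored
```

Calling `unscramble(scramble(bits, x0), x0)` with the same `x0` returns the original bit list; without the initial value the permutation is hard to reproduce, which is the property such schemes rely on.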

9.
10.
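This paper introduces basic security management methods for official document systems, analyzes traditional text digital watermarking techniques, and proposes a text digital watermarking algorithm based on Unicode encoding. The watermark information is encrypted with a key and then embedded into the Unicode encoding of the text by an XOR operation; a security-level code, a parity check code, and an error-correcting code are appended to build the watermarked text encoding. This enables security management of official documents by classification level and across their whole life cycle. Qualitative analysis finds that the algorithm is widely applicable and reasonably secure.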
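
The abstract above names the ingredients (key-based encryption, XOR, a security-level code, a parity check) but not their layout, so the sketch below is a hypothetical illustration only. It builds a watermark payload as `[level byte][XOR-encrypted watermark][parity byte]`; this layout, the function names, and the repeating-key XOR are assumptions, and both the error-correcting code and the actual embedding into the carrier text are omitted.

```python
from itertools import cycle


def xor_bytes(data, key):
    """XOR each data byte with the repeating key (the operation is its own inverse)."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def parity_bit(data):
    """Even parity: 1 if the number of set bits in data is odd, else 0."""
    return sum(bin(b).count("1") for b in data) % 2


def build_payload(watermark, key, level):
    """Assumed layout: [security level][encrypted UTF-16 watermark][parity]."""
    if not 0 <= level <= 255:
        raise ValueError("level must fit in one byte")
    body = bytes([level]) + xor_bytes(watermark.encode("utf-16-be"), key)
    return body + bytes([parity_bit(body)])


def read_payload(payload, key):
    """Check parity, then return (security level, decrypted watermark)."""
    body, parity = payload[:-1], payload[-1]
    if parity_bit(body) != parity:
        raise ValueError("parity check failed")
    return body[0], xor_bytes(body[1:], key).decode("utf-16-be")
```

A single parity bit only detects an odd number of flipped bits, which is presumably why the paper pairs it with an error-correcting code.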

11.
Document images often suffer from different types of degradation that render document image binarization a challenging task. This paper presents a document image binarization technique that accurately segments the text from badly degraded document images. The proposed technique is based on the observations that text documents usually have a background of uniform color and texture, and that the document text within it has a different intensity level from the surrounding background. Given a document image, the proposed technique first estimates a document background surface through an iterative polynomial smoothing procedure. Different types of document degradation are then compensated by using the estimated background surface. The text stroke edges are further detected from the compensated document image by using the L1-norm image gradient. Finally, the document text is segmented by a local threshold that is estimated from the detected text stroke edges. The proposed technique was submitted to the document image binarization contest (DIBCO) held under the framework of ICDAR 2009 and achieved the top performance among 43 algorithms submitted by 35 international research groups.

12.
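Character recognition software produces errors such as wrong or missing characters, so its output files have to be corrected, which raises three core problems: text display, text correction, and correct data output. To preserve data integrity during input, display, correction, and output, four data structures are designed to store information about text, tables, images, and the erroneous characters to be corrected, so that errors in the original file are fixed without losing the original information. Using an object-oriented approach, the system is divided into a text view object, a text editing object, and a text document object, which conveniently implement the transfer of text document data, the display of text content, and the editing of text content.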

13.
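CCHMDBS: A Distributed Collaborative Hypermedia Chinese Document Base Authoring System   Total citations: 7, self-citations: 0, citations by others: 7
This paper describes the main design ideas and implementation of a collaborative authoring system for large-capacity hypermedia Chinese documents. It focuses on the system's new-generation hypermedia features and core technologies, such as automatic hyperlink generation, Chinese processing in hypermedia systems (Chinese retrieval in particular), distributed and collaborative authoring, and visual organization and management of document directories.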

14.
《Information Systems》2000,25(6-7):453-463
The paper discusses how recurrent organizational activities such as document preparation can be supported by a knowledge-based document preparation tool. REGENT (REport GENeration Tool) is a software environment that generates documents from reusable document pieces by planning, executing and monitoring the document preparation process in an organizational setting. The documents are constructed from stored document pieces using artificial intelligence methods. A system architecture is developed to enable the document generation process within a broader office automation setting. The report preparation process knowledge is captured in a knowledge representation scheme. A two-phased artificial intelligence problem solving strategy is developed to carry out the reasoning steps when configuring reports from document pieces. The REGENT environment is especially effective for recurrent report types such as annual reports. The approach is illustrated with examples gathered during the partial implementation of REGENT at FAW (Artificial Intelligence Research Institute).

15.
The user interface for a document workstation is developed. The station is used for the editing and preview of layout, text, graphics, images, tables, logos and patterns. It is part of a document processing system, including main processor, central document store and typesetting facilities. The main part of the article is concerned with the conversion of these requirements to the workstation's architecture.

16.
周强  李宇  许雁冬 《微机发展》2010,(1):43-45,49
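The SRU interface of a cross-database retrieval system returns search results as an XML stream. The IE browser can parse this stream and, using an XSLT file, automatically convert it into an XHTML stream to display the results. Firefox and Google Chrome, however, cannot parse this XML stream; they display non-standard plain text, so users cannot view the search results. To make these browsers display the results correctly, the dom4j API is used to apply the XSLT file and convert the XML stream into an XHTML stream, so that the search results display correctly in Firefox and Google Chrome.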

17.
We seek to leverage an expert user's knowledge about how information is organized in a domain and how information is presented in typical documents within a particular domain-specific collection, to effectively and efficiently meet the expert's targeted information needs. We have developed the semantic components model to describe important semantic content within documents. The semantic components model for a given collection (based on a general understanding of the type of information needs expected) consists of a set of document classes, where each class has an associated set of semantic components. Each semantic component instance consists of segments of text about a particular aspect of the main topic of the document and may not correspond to structural elements in the document. The semantic components model represents document content in a manner that is complementary to full text and keyword indexing. This paper describes how the semantic components model can be used to improve an information retrieval system. We present experimental evidence from a large interactive searching study that compared the use of semantic components in a system with full text and keyword indexing, where we extended the query language to allow users to search using semantic components, to a base system that did not have semantic components. We evaluate the systems from a system perspective, where semantic components were shown to improve document ranking for precision-oriented searches, and from a user perspective. We also evaluate the systems from a session-based perspective, evaluating not only the results of individual queries but also the results of multiple queries during a single interactive query session.

18.
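A Sense Matrix Model (SMM)   Total citations: 3, self-citations: 0, citations by others: 3
This paper presents an information retrieval model, the Sense Matrix Model (SMM), that uses both terms and senses to index and retrieve documents. Using the association between terms and senses, it proposes a new document representation in which a document is represented as a term × sense matrix. This introduces or establishes several effective data analysis techniques, including document similarity computation based on matrix norms, the discrete cosine transform (DCT) of document vectors and matrices, and orthogonal decomposition of multidimensional data (MAD), and provides a new technique for cross-language retrieval and multilingual text classification that requires neither translation nor a model training set. Some experimental results of applying the DCT to documents are also discussed.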
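
To make the term × sense representation above concrete, here is a minimal sketch under stated assumptions. The abstract does not specify which matrix norm or similarity formula SMM uses, so this example assumes a sparse count matrix keyed by `(term, sense)` pairs and a cosine similarity built on the Frobenius inner product and norm. All function names and the sample sense labels are hypothetical.

```python
import math
from collections import Counter


def term_sense_matrix(pairs):
    """Sparse term x sense matrix: maps (term, sense) to its occurrence count."""
    return Counter(pairs)


def frobenius_norm(matrix):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for v in matrix.values()))


def similarity(a, b):
    """Frobenius inner product of a and b divided by the product of their norms."""
    norm_a, norm_b = frobenius_norm(a), frobenius_norm(b)
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    dot = sum(v * b.get(key, 0) for key, v in a.items())
    return dot / (norm_a * norm_b)


# Two documents that share the term "bank" but in different senses.
finance_doc = term_sense_matrix([("bank", "finance"), ("loan", "finance")])
river_doc = term_sense_matrix([("bank", "river"), ("water", "river")])
```

Because entries are keyed by the pair rather than by the term alone, the two sample documents share no entries even though both contain "bank", which a plain term-vector model would count as overlap.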

19.
刘颖  胡明涵 《计算机应用》2008,28(5):1359-1361
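A classification system for government official documents that carry a subject-heading structure is designed and implemented. During classification preprocessing, the system makes full use of the category information carried by the subject headings, and applies a random keyword generation technique and the Bootstrapping learning method to transform the feature space of the document text and reduce its dimensionality. The result is a preprocessing procedure that differs from traditional text classification and improves the performance of the classification system. The classification system based on random keyword generation and Bootstrapping learning outperforms ordinary classifiers.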

20.
A new technology for intelligent full text document retrieval is presented. The retrieval of a document is treated as an expert system problem, recognizing that human document retrieval is expert behavior. The technology is semantic measurement. A working prototype system, LIBRARY, has been built based on the technology. Input is a request for information, in unrestricted technical English; output is all documents with measured content similar to that of the request, ranked in order of relevance. Retrieval is unaffected by similarity or dissimilarity of terms between request and document. LIBRARY's performance is comparable to that of an expert human librarian, representing a significant improvement over traditional document retrieval systems.
