19 similar documents found (search time: 171 ms)
1.
2.
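Working in the specific setting of an Internet Chinese information retrieval system (WAIS), this paper explores a method for automatic indexing of Chinese text based on frequency statistics. Earlier automatic indexing methods relied on dictionary matching and were limited by problems such as incomplete dictionaries. The method presented here uses no dictionary at all and performs automatic word-extraction indexing over the true full text of a document. It goes beyond the limits of existing experience and knowledge and can automatically discover and learn new words, so it can be described as an intelligent, self-learning method for automatic indexing of Chinese documents. A free-term subject indexing system based on this method has been implemented. It has been applied on the Internet to the retrieval and querying of Chinese information with the WAIS tool, opening the way to building Chinese information bases and query services on the Internet.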
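The abstract above does not spell out its algorithm. As a rough illustration only, the following is a minimal sketch of one common dictionary-free approach: count character n-grams, keep the frequent ones, and drop any string that only ever occurs inside a longer frequent string. The function name `extract_terms`, its parameters, and the thresholds are hypothetical choices for this sketch, not the paper's method.

```python
import re
from collections import Counter


def extract_terms(text, min_len=2, max_len=4, min_freq=2):
    """Return candidate index terms as (term, frequency) pairs, without a dictionary."""
    # Split on anything that is not a CJK ideograph so n-grams never span punctuation.
    segments = re.findall(r"[\u4e00-\u9fff]+", text)

    # Count every character n-gram of length min_len..max_len.
    counts = Counter()
    for seg in segments:
        for n in range(min_len, max_len + 1):
            for i in range(len(seg) - n + 1):
                counts[seg[i:i + n]] += 1

    # Keep only n-grams that repeat often enough to be plausible terms.
    frequent = {g: c for g, c in counts.items() if c >= min_freq}

    # Drop a string when a longer frequent string contains it with the same count:
    # in that case the shorter string never occurs independently.
    kept = {}
    for g, c in frequent.items():
        subsumed = any(
            len(h) > len(g) and g in h and frequent[h] == c for h in frequent
        )
        if not subsumed:
            kept[g] = c

    # Most frequent first, longer strings before shorter ones on ties.
    return sorted(kept.items(), key=lambda kv: (-kv[1], -len(kv[0]), kv[0]))


if __name__ == "__main__":
    sample = "自动标引方法。自动标引系统。中文信息检索,中文信息查询。"
    for term, freq in extract_terms(sample):
        print(term, freq)
```

On the short sample string, the repeated strings 自动标引 and 中文信息 survive while their fragments (such as 自动 or 文信) are discarded. A real system would need a larger corpus and more careful boundary statistics.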
3.
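Automatic Enrichment of Thesauri: Extracting Keywords from Metadata and Locating Them Total citations: 10, self-citations: 2, citations by others: 10
Thesauri and classification schemes are the most important knowledge organization tools in the traditional environment of printed documents. Their updating and maintenance have always been done by hand, which limits their use in digital libraries and networked information environments. This paper presents a statistics-based method that extracts keywords from the titles in metadata and locates them in a thesaurus. Placement is based on the convergence properties of the set of indexing terms associated with each extracted keyword. Indexing terms are controlled vocabulary taken from the thesaurus and used to index the subject of a document, that is, subject headings. Experiments on the Chinese Classified Thesaurus (《中国分类主题词表》) and on more than 5,000 bibliographic records in computer science and technology provided by Peking University Library show that the method is feasible and effective. It can be applied directly to automatic cataloguing and automatic metadata generation based on an already indexed corpus.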
4.
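This paper presents the design ideas and functional features of DZQJ, a microcomputer system for Chinese and foreign-language information retrieval with automatic indexing of multiple document types. Starting from the idea of the "component dictionary method", the authors carried out further experiments and improvements. The system performs reasonably well at automatic indexing: it automatically extracts keywords from the titles proper and parallel titles of documents in Chinese, English, and other languages and builds indexes from them. Input and output formats for each document type were designed in accordance with the relevant national standards and rules for bibliographic description. DZQJ offers a range of search paths, including the four major retrieval routes. Issues such as convenience for manual retrieval were also considered during development.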
5.
6.
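Computer-Aided Indexing Based on Chinese Titles Total citations: 1, self-citations: 0, citations by others: 1
This paper describes the structure of a computer-aided indexing system based on the titles of Chinese documents and discusses some of its key technical issues. From the standpoint of system architecture, it covers the table-building module, the catalogue module, the word segmentation and indexing module, the proofreading module, the number selection and printing module, and the system management module, with particular attention to segmentation-based indexing techniques.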
7.
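Automatic indexing is one of the key technologies of content-based retrieval. Research on automatic indexing of Chinese text in China currently concentrates on automatic word segmentation, which is a preprocessing problem. This paper proposes a Chinese character encoding method based on a word platform and establishes a new representation format for Chinese computer documents in which the word is the smallest unit of information. Chinese text analysis then no longer requires automatic word segmentation and automatic indexing can be performed directly, which improves its efficiency and quality.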
8.
9.
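UCL-Based Automatic Indexing of Web Pages Total citations: 5, self-citations: 0, citations by others: 5
UCL (Uniform Content Locator) is a tool for semantic communication among authors, editors, and readers, and a basis for fast information selection, intelligent agents, and proactive information services. Addressing the automatic indexing problem in web information retrieval, this paper proposes a UCL-based technique for automatic indexing of web pages. It studies the process of mapping web pages written in HTML to XML documents and extracts from them the UCL fields that match a user interest model, thereby indexing the pages automatically. Experiments verify the correctness and effectiveness of the proposed scheme.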
10.
11.
12.
R Rada B Blum E Calhoun H Mili H Orthner S Singer 《Computers and biomedical research》1987,20(3):244-263
The terminology in medical informatics is evolving rapidly. The organizers of MEDINFO and SCAMC have used different sets of keywords to index their documents. Recognizing the limitations of this approach, members of those organizations joined with the National Library of Medicine in the creation of a better terminology for medical informatics. A hierarchical structure was placed on the terms to produce a thesaurus typical of the sort often used in the indexing and retrieving of documents. The building of this thesaurus began with an automatic merging of the thesaurus used by the Association for Computing Machinery and the Information Sciences component of the "Medical Subject Headings." This product was pruned by eliminating terms not related to those in the MEDINFO keyword list or not in the medical informatics literature. Further refinement of the thesaurus resulted from extensive discussions among the authors of this paper. The first major application of this terminology has been to the indexing of the articles in "MEDINFO-86 Proceedings." Major components of this medical informatics thesaurus also have been incorporated into the "Medical Subject Headings." This paper describes the process of preparing the thesaurus and presents an evaluation of its coverage of the "MEDINFO-86 Proceedings."
13.
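Automatic semantic annotation of images is an important and challenging task in content-based image retrieval. This paper proposes an automatic semantic image annotation method based on Boosting and builds an image annotation system called BLIR (boosting for linguistic indexing image retrieval system). The underlying assumption is that a group of images sharing the same semantics can be represented by a visual model composed of a set of features. A 2D-MHMM (two-dimensional multiresolution hidden Markov model) is in effect a template for a particular combination of color and texture. BLIR first generates a large number of 2D-MHMM models and then uses the Boosting algorithm to associate keywords with those models. The system was implemented and tested on a library of 60,000 images. The results show that, on these test images, BLIR achieves higher retrieval accuracy than other methods.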
14.
Multimodal Video Indexing: A Review of the State-of-the-art Total citations: 5, self-citations: 7, citations by others: 5
Efficient and effective handling of video documents depends on the availability of indexes. Manual indexing is unfeasible for large video collections. In this paper we survey several methods aiming at automating this time- and resource-consuming process. Good reviews on single modality based video indexing have appeared in the literature. Effective indexing, however, requires a multimodal approach in which either the most appropriate modality is selected or the different modalities are used in collaborative fashion. Therefore, instead of separately treating the different information sources involved, and their specific algorithms, we focus on the similarities and differences between the modalities. To that end we put forward a unifying and multimodal framework, which views a video document from the perspective of its author. This framework forms the guiding principle for identifying index types, for which automatic methods are found in the literature. It furthermore forms the basis for categorizing these different methods.
15.
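Design and Implementation of an Automatic Information Classification Subsystem in a Domestic Information Navigation System Total citations: 3, self-citations: 1, citations by others: 3
Classified information retrieval is an important service commonly offered by information navigation systems. This paper introduces an automatic information classification subsystem used in a domestic information navigation system and describes how it is implemented. It explains the composition and implementation of the subsystem's classification subject dictionary and, finally, gives the method for retrieving data that the subsystem has processed and stored in the database.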
16.
The development of video applications for digital multimedia has highlighted the need for indexing tools, enabling the access to meaningful segments of video. The high cost of manual indexing creates a demand for the development of automatic algorithms, able to extract such indices with little intervention. In this paper we present new editing model-based algorithms that automatically extract low-level features in a movie: camera shots and camera motion. Rules of film making are used to derive higher-level elements, such as shot-reverse shot sequences. The algorithms have been tested on 20 h of movies and comparison with techniques in the literature is provided.
17.
18.
This paper describes MetaIndex, an automatic indexing program that creates symbolic representations of documents for the purpose of document retrieval. MetaIndex uses a simple transition network parser to recognize a language that is derived from the set of main concepts in the Unified Medical Language System Metathesaurus (Meta-1). MetaIndex uses a hierarchy of medical concepts, also derived from Meta-1, to represent the content of documents. The goal of this approach is to improve document retrieval performance by better representation of documents. An evaluation method is described, and the performance of MetaIndex on the task of indexing the Slice of Life medical image collection is reported.
19.
E. Appiani F. Cesarini A.M. Colla M. Diligenti M. Gori S. Marinai G. Soda 《International Journal on Document Analysis and Recognition》2001,4(2):69-83
In this paper a system for analysis and automatic indexing of imaged documents for high-volume applications is described. This system, named STRETCH (STorage and RETrieval by Content of imaged documents), is based on an Archiving and Retrieval Engine, which overcomes the bottleneck of document profiling bypassing some limitations of existing pre-defined indexing schemes. The engine exploits a structured document representation and can activate appropriate methods to characterise and automatically index heterogeneous documents with variable layout. The originality of STRETCH lies principally in the possibility for unskilled users to define the indexes relevant to the document domains of their interest by simply presenting visual examples and applying reliable automatic information extraction methods (document classification, flexible reading strategies) to index the documents automatically, thus creating archives as desired. STRETCH offers ease of use and application programming and the ability to dynamically adapt to new types of documents. The system has been tested in two applications in particular, one concerning passive invoices and the other bank documents. In these applications, several classes of documents are involved. The indexing strategy first automatically classifies the document, thus avoiding pre-sorting, then locates and reads the information pertaining to the specific document class. Experimental results are encouraging overall; in particular, document classification results fulfill the requirements of high-volume application. Integration into production lines is under execution.
Received March 30, 2000 / Revised June 26, 2001