Similar Literature
Found 19 similar documents (search time: 171 ms)
1.
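A Feedback-Based Rule-Learning Method for Automatic Subject Indexing of Medical Literature   Total citations: 3, self-citations: 0, citations by others: 3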
梁红兵  杨铭魁  黄晓 《计算机工程》2003,29(11):174-176
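This paper addresses automatic indexing of traditional Chinese medicine (TCM) literature and introduces a rule-learning method for automatic subject indexing. Compared with earlier automatic indexing methods based on term-frequency statistics and weighting, the feedback-based rule-learning method can effectively extract a document's subheadings and combine main headings with subheadings, and it is readily extensible and adaptable. A system built on this method was tested on a large body of TCM literature and produced very good indexing results.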

2.
陆小华 《办公自动化》2002,(G00):206-212
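In the specific setting of an Internet Chinese information retrieval system (WAIS), this paper explores a frequency-statistics-based method for automatic indexing of Chinese text. Earlier automatic indexing methods relied on dictionary matching and were limited by, among other things, incomplete dictionaries. The method described here uses no dictionary at all and performs automatic term-extraction indexing over the full text of each document. It is not bound by existing experience and knowledge and can discover and learn new words automatically, so it can be regarded as an intelligent, self-learning method for automatic indexing of Chinese documents. A free-term subject indexing system based on the method has been implemented and is used on the Internet with the WAIS tool for Chinese information retrieval and querying, opening the way to building Chinese information bases and query services on the Internet.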

3.
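Automatic Enrichment of Thesauri: Extracting Keywords from Metadata and Locating Them   Total citations: 10, self-citations: 2, citations by others: 10
Thesauri and classification schemes are the most important knowledge organization tools in the traditional paper-document environment. Their updating and maintenance have always been done by hand, which limits their use in digital libraries and networked information environments. This paper presents a statistical method that extracts keywords from metadata titles and locates them in a thesaurus. Location is based on the convergence property of the set of indexing terms associated with each extracted keyword. Indexing terms are controlled vocabulary drawn from a thesaurus and used to index the subject of a document, i.e., subject headings. Experiments on the 《中国分类主题词表》 (Chinese Classified Thesaurus) and more than 5,000 bibliographic records in computer science and technology provided by the Peking University Library show that the method is feasible and effective. It can be applied directly to automatic cataloguing and automatic metadata generation based on an already-indexed corpus.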

4.
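This paper describes the design rationale and functional characteristics of a microcomputer system for Chinese and foreign-language information retrieval with automatic indexing of multiple document types (DZQJ for short). Starting from the idea of the "component dictionary method" (部件词典法), the authors carried out further experiments and improvements and achieved reasonably good automatic indexing: keywords are extracted automatically from the titles proper and parallel titles of documents in Chinese, English, and other languages to build the index. Input and output formats for each document type were designed in accordance with the relevant national standards and rules for bibliographic description. DZQJ offers a range of search access points, including the four main retrieval approaches. Convenience for manual searching was also considered during development.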

5.
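The system combines word segmentation with a stop-word list and performs automatic term-extraction indexing on the titles of more than a hundred Chinese scientific and technical documents, achieving very good indexing results.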

6.
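Computer-Aided Indexing Based on Chinese Titles   Total citations: 1, self-citations: 0, citations by others: 1
This paper describes the structure of a computer-aided indexing system based on the titles of Chinese documents and discusses several key technical issues. From the standpoint of system design, it covers the table-building module, catalogue module, word-segmentation and indexing module, proofreading module, number-selection and printing module, and system management module, with particular emphasis on word-segmentation indexing technology.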

7.
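Automatic indexing is one of the key technologies for content-based retrieval. Research in China on automatic indexing of Chinese text currently concentrates on the preprocessing problem of automatic word segmentation. This paper proposes a Chinese character encoding method built on a word platform and establishes a new representation format for Chinese computer documents in which the word is the smallest unit of information. Chinese text analysis then no longer needs automatic segmentation and can proceed directly to automatic indexing, improving both the efficiency and the quality of indexing.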

8.
Segmentation of Chinese Personal Names in Automatic Indexing   Total citations: 2, self-citations: 2, citations by others: 2
靳从  唐振民  杨静宇 《计算机工程》2003,29(22):153-154
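Segmenting subject terms is the first step in automatic indexing by computer. Chinese personal names, unlike names in English and other European languages, cannot be recognized by capital letters, which makes name recognition difficult. Based on the requirements of an automatic indexing system, this paper makes full use of the characteristics of names and related information and gives a segmentation method based on the basic structure of names. Indexing results from the system demonstrate the feasibility of the method.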

9.
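UCL-Based Automatic Indexing of Web Pages   Total citations: 5, self-citations: 0, citations by others: 5
UCL (Uniform Content Locator) is a tool for semantic communication among authors, editors, and readers, and a basis for rapid information selection, intelligent agents, and proactive information services. Addressing the automatic indexing problem in web information retrieval, this paper proposes a UCL-based technique for automatic indexing of web pages. It studies the mapping from web pages written in HTML to XML documents and extracts from them the UCL fields that match a user interest model, thereby indexing the pages automatically. Experiments confirm the correctness and effectiveness of the approach.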

10.
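Retrieval of traditional Chinese medicine (TCM) literature is a prerequisite for computer-based knowledge management. To address the various problems in TCM information retrieval, this paper studies and proposes an Agent-based intelligent retrieval system for prescriptions. The system uses Agents as its basic architectural components, builds a terminology lexicon of TCM prescriptions, and exploits the cooperation and autonomy of Agent technology to personalize retrieval for each user. It implements association analysis of TCM prescription data and improves similarity search over TCM literature. Experimental results show that the system can effectively improve the retrieval efficiency of TCM literature and offers a route to building new TCM information retrieval systems.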

11.
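To improve the machining efficiency of feed ring dies and reduce production cost, an automatic control system based on an AVR microcontroller was designed for a drilling machine that machines feed ring dies. Experimental results show that the system feeds the tool, clears chips, and indexes automatically. It feeds at a constant, preset feed rate. All indexing control is performed in software, giving a low failure rate and high indexing repeatability. An automatic correction function allows accurate indexed drilling of ring dies with any number of holes. The system can repeat operations such as positioning, stepped-hole drilling, through-hole drilling, hole enlarging, and reaming, which makes it very convenient to use.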

12.
The terminology in medical informatics is evolving rapidly. The organizers of MEDINFO and SCAMC have used different sets of keywords to index their documents. Recognizing the limitations of this approach, members of those organizations joined with the National Library of Medicine in the creation of a better terminology for medical informatics. A hierarchical structure was placed on the terms to produce a thesaurus typical of the sort often used in the indexing and retrieving of documents. The building of this thesaurus began with an automatic merging of the thesaurus used by the Association for Computing Machinery and the Information Sciences component of the "Medical Subject Headings." This product was pruned by eliminating terms not related to those in the MEDINFO keyword list or not in the medical informatics literature. Further refinement of the thesaurus resulted from extensive discussions among the authors of this paper. The first major application of this terminology has been to the indexing of the articles in "MEDINFO-86 Proceedings." Major components of this medical informatics thesaurus also have been incorporated into the "Medical Subject Headings." This paper describes the process of preparing the thesaurus and presents an evaluation of its coverage of the "MEDINFO-86 Proceedings."

13.
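Automatic Semantic Annotation of Images Based on Boosting Learning   Total citations: 1, self-citations: 0, citations by others: 1
Automatic semantic annotation of images is an important and challenging task in content-based image retrieval. This paper proposes an automatic semantic annotation method for images based on Boosting learning and builds an image semantic annotation system, BLIR (boosting for linguistic indexing image retrieval system). It is assumed that a group of images sharing the same semantics can be represented by a visual model composed of a set of features. A 2D-MHMM (two-dimensional multiresolution hidden Markov model) is in effect a template for a particular combination of color and texture. BLIR first generates a large number of 2D-MHMM models and then uses the Boosting algorithm to associate keywords with them. The system was implemented and tested on a library of 60,000 images. The results show that, on these test images, BLIR achieves higher retrieval accuracy than other methods.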

14.
Multimodal Video Indexing: A Review of the State-of-the-art   Total citations: 5, self-citations: 7, citations by others: 5
Efficient and effective handling of video documents depends on the availability of indexes. Manual indexing is unfeasible for large video collections. In this paper we survey several methods aiming at automating this time and resource consuming process. Good reviews on single modality based video indexing have appeared in literature. Effective indexing, however, requires a multimodal approach in which either the most appropriate modality is selected or the different modalities are used in collaborative fashion. Therefore, instead of separately treating the different information sources involved, and their specific algorithms, we focus on the similarities and differences between the modalities. To that end we put forward a unifying and multimodal framework, which views a video document from the perspective of its author. This framework forms the guiding principle for identifying index types, for which automatic methods are found in literature. It furthermore forms the basis for categorizing these different methods.

15.
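Design and Implementation of an Automatic Information Classification Subsystem in a Domestic Information Navigation System   Total citations: 3, self-citations: 1, citations by others: 3
Classified information retrieval is an important service commonly provided by information navigation systems. This paper describes an automatic information classification subsystem used in a domestic information navigation system and how it is implemented, explains the composition and implementation of its classification subject dictionary, and finally gives the method for retrieving the data stored after processing by the subsystem.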

16.
The development of video applications for digital multimedia has highlighted the need for indexing tools, enabling the access to meaningful segments of video. The high cost of manual indexing creates a demand for the development of automatic algorithms, able to extract such indices with little intervention. In this paper we present new editing model-based algorithms that automatically extract low-level features in a movie: camera shots and camera motion. Rules of film making are used to derive higher-level elements, such as shot-reverse shot sequences. The algorithms have been tested on 20 h of movies and comparison with techniques in the literature is provided.

17.
18.
This paper describes MetaIndex, an automatic indexing program that creates symbolic representations of documents for the purpose of document retrieval. MetaIndex uses a simple transition network parser to recognize a language that is derived from the set of main concepts in the Unified Medical Language System Metathesaurus (Meta-1). MetaIndex uses a hierarchy of medical concepts, also derived from Meta-1, to represent the content of documents. The goal of this approach is to improve document retrieval performance by better representation of documents. An evaluation method is described, and the performance of MetaIndex on the task of indexing the Slice of Life medical image collection is reported.

19.
In this paper a system for analysis and automatic indexing of imaged documents for high-volume applications is described. This system, named STRETCH (STorage and RETrieval by Content of imaged documents), is based on an Archiving and Retrieval Engine, which overcomes the bottleneck of document profiling bypassing some limitations of existing pre-defined indexing schemes. The engine exploits a structured document representation and can activate appropriate methods to characterise and automatically index heterogeneous documents with variable layout. The originality of STRETCH lies principally in the possibility for unskilled users to define the indexes relevant to the document domains of their interest by simply presenting visual examples and applying reliable automatic information extraction methods (document classification, flexible reading strategies) to index the documents automatically, thus creating archives as desired. STRETCH offers ease of use and application programming and the ability to dynamically adapt to new types of documents. The system has been tested in two applications in particular, one concerning passive invoices and the other bank documents. In these applications, several classes of documents are involved. The indexing strategy first automatically classifies the document, thus avoiding pre-sorting, then locates and reads the information pertaining to the specific document class. Experimental results are encouraging overall; in particular, document classification results fulfill the requirements of high-volume application. Integration into production lines is under execution. Received March 30, 2000 / Revised June 26, 2001
