20 similar documents found (search time: 171 ms)
1.
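Research on Query Expansion Methods in Digital Libraries* Total citations: 3, self-citations: 0, citations by others: 3
This paper examines several basic methods of automatic query expansion, including pseudo-relevance feedback, text clustering, and term association, and briefly analyzes how feasible each is in a digital library environment. On that basis, it introduces concept-based query expansion and describes a method that adapts the knowledge organization tools of traditional libraries (classification schemes and thesauri) to build a knowledge network that supports concept-based query expansion.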
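As a rough illustration of thesaurus-driven query expansion, the sketch below adds narrower and related terms from a small thesaurus to a query. It is a minimal sketch under stated assumptions, not the method of the paper: the `THESAURUS` entries, the relation names, and the `expand_query` helper are all hypothetical.

```python
# Hypothetical toy thesaurus: each term maps relation names to linked terms.
THESAURUS = {
    "information retrieval": {
        "broader": ["information science"],
        "narrower": ["query expansion"],
        "related": ["search engines"],
    },
    "metadata": {
        "broader": ["data"],
        "narrower": ["descriptive metadata"],
        "related": ["cataloging"],
    },
}


def expand_query(terms, thesaurus, relations=("narrower", "related")):
    """Return the original terms followed by linked thesaurus terms.

    Order is preserved and duplicates are skipped, so the original
    query terms always come first.
    """
    expanded = list(terms)
    for term in terms:
        entry = thesaurus.get(term, {})
        for relation in relations:
            for linked in entry.get(relation, []):
                if linked not in expanded:
                    expanded.append(linked)
    return expanded


expanded = expand_query(["information retrieval"], THESAURUS)
print(expanded)
```

Restricting expansion to narrower and related terms is only one possible choice; following broader terms as well usually raises recall at some cost in precision.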
2.
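A Concept Retrieval Method for Metadata Description Documents Total citations: 2, self-citations: 0, citations by others: 2
Retrieval over metadata description documents still suffers from mismatches between query terms and descriptor terms. Supported by a concept network that accurately describes the relationships between domain concepts, the paper gives a formula for the conceptual relatedness of query terms and descriptor terms, and proposes a new method that uses concept expansion to improve retrieval quality. Experiments and analysis on a test system consisting of a domain concept network and metadata-described scientific and technical documents demonstrate the effectiveness of the retrieval method.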
3.
4.
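Constructing RDF Schemas by Integrating Ontologies and Thesauri Total citations: 2, self-citations: 0, citations by others: 2
To reduce the difficulties that semantic heterogeneity causes for information discovery, integration, and access, the paper discusses the construction of semantic metadata and proposes a new method for building metadata schemas by integrating existing ontologies and thesauri. The method has two steps: specifying the linking relationships between a thesaurus T and an ontology O, and automatically constructing a concept thesaurus. The integration is based on specifying the implication relationships between thesaurus terms and ontology concepts, and it yields an application-specific metadata schema. The paper also gives the procedure for constructing an RDF schema from the resulting metadata schema.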
5.
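With the rapid growth of the Internet and digital libraries as two basic kinds of information resources, an important task has emerged: before searching, how should a user choose suitable target sites to submit a query to, so as to lower query cost and raise query efficiency? More generally, this problem is known as "data source location" or "database discovery". Metadata is data about data. In a digital library, each data document is described by its metadata, and metadata is an important means of managing and retrieving data and of achieving interoperability at every level. The paper proposes a metadata-based data source discovery algorithm and evaluates it in terms of recall, retrieval precision, and other measures.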
6.
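Zhang Zhe, 《计算机技术与发展》 (Computer Technology and Development), 2004, 14(3)
To reduce the difficulties that semantic heterogeneity causes for information discovery, integration, and access, the paper discusses the construction of semantic metadata and proposes a new method for building metadata schemas by integrating existing ontologies and thesauri. The method has two steps: specifying the linking relationships between a thesaurus T and an ontology O, and automatically constructing a concept thesaurus. The integration is based on specifying the implication relationships between thesaurus terms and ontology concepts, and it yields an application-specific metadata schema. The paper also gives the procedure for constructing an RDF schema from the resulting metadata schema.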
7.
8.
9.
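Intelligent Retrieval of E-Government Documents Based on the Semantic Web Total citations: 7, self-citations: 0, citations by others: 7
Based on the characteristics of e-government documents, feature values of the document collection and of the retrieval request are computed using an e-government thesaurus. The similarity computation between the document collection and the retrieval request is discussed, so that documents matching the request can be found. The retrieval of e-government document metadata is studied in light of the semantic organization of that metadata. The metadata of the retrieved documents is organized semantically, so that intelligent retrieval is achieved on the basis of semantic inference.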
10.
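Retrieval of government information resources is an important function of government information resource sharing systems. Based on the XML metadata specification in the national standard 《政务信息资源目录体系》 (Government Information Resource Catalog System), the paper proposes a retrieval algorithm for government information resources that supports keyword search. The algorithm ranks the result set by semantic relevance using TF*IDF and keyword dependency computed over the XML metadata of government information resources, and it improves retrieval efficiency through an improved keyword inverted index. Experiments show that the algorithm considerably improves both ranking accuracy and time efficiency, and can effectively enhance the data-sharing service capability for the use of government information resources.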
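To make the TF*IDF and inverted-index ideas concrete, here is a minimal sketch of keyword search over a plain inverted index with TF*IDF scoring. It is an assumption-laden illustration, not the paper's algorithm: the abstract does not define its keyword dependency measure or its index improvements, so both are omitted, and the sample records and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def search(index, n_docs, query):
    """Rank documents by the sum of tf * idf over the query terms."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term occurs in no document
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties are broken by document id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical metadata text for three resources.
docs = {
    "r1": "population census statistics by province",
    "r2": "road traffic statistics",
    "r3": "census of agriculture",
}
index = build_index(docs)
results = search(index, len(docs), "census statistics")
print([doc_id for doc_id, _ in results])
```

Only the postings lists of the query terms are visited, which is why an inverted index keeps query time low as the collection grows.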
11.
HIRMA provides an integrated environment for querying any full-text document base system with natural-language sentences, returning a document set relevant to the query. Moreover, it supports hypertextual navigation of the document base. The system uses content-based document representation and retrieval methods.
In this paper, the representation framework as well as the retrieval and navigation algorithms used by HIRMA are described. Coverage and portability across application domains are supported by the lexical acquisition system ARIOSTO, which provides the lexical knowledge and processing methods needed to extract the semantic representation of document content from raw text.
12.
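Design and Implementation of Literature Retrieval Based on Berkeley DB Total citations: 1, self-citations: 0, citations by others: 1
Based on the open-source embedded database Berkeley DB, and using the non-transactional Berkeley DB Concurrent Data Store configuration, the paper implements full-text retrieval and combined-field retrieval for scientific literature. The retrieval system has low overhead and high efficiency. For comparison, the paper also designs and implements a retrieval scheme based on an Oracle database. According to the experimental results, the former far outperforms the latter in both overhead and retrieval efficiency, and is fully suitable for a variety of medium- and large-scale retrieval applications.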
13.
We present a generic and flexible framework for building geoscientific metadata portals independent of content standards for metadata and protocols. Data can be harvested with commonly used protocols (e.g., Open Archives Initiative Protocol for Metadata Harvesting) and metadata standards like DIF or ISO 19115. The new Java-based portal software supports any XML encoding and makes metadata searchable through Apache Lucene. Software administrators are free to define searchable fields independent of their type using XPath. In addition, by extending the full-text search engine (FTS) Apache Lucene, we have significantly improved queries for numerical and date/time ranges by supplying a new trie-based algorithm, thus enabling high-performance space/time retrievals in FTS-based geo portals. The harvested metadata are stored in separate indexes, which makes it possible to combine these into different portals. The portal-specific Java API and web service interface are highly flexible, support custom front-ends for users, and provide automatic query completion (AJAX) and dynamic visualization with conventional mapping tools. The software has been made freely available through the open source concept.
14.
15.
Personalization is increasingly vital, especially for enterprises that need to reach their customers. The key challenge in supporting personalization is the need for rich metadata, such as metadata about structural relationships, subject/concept relations between documents, and cognitive metadata about documents (e.g. the difficulty of a document). Manual annotation of large knowledge bases with such rich metadata is not scalable. Automatic mining of cognitive metadata is also challenging, since it is very difficult to understand the underlying intellectual knowledge about a document automatically. Moreover, Web content is increasingly becoming multilingual, since a growing amount of the data generated on the Web is non-English. Current metadata extraction systems are generally based on English content, and this needs to change in order to adapt to the changing dynamics of the Web. To alleviate these problems, we introduce a novel automatic metadata extraction framework, which is based on a novel fuzzy-based method for automatic cognitive metadata generation and uses different document parsing algorithms to extract rich metadata from multilingual enterprise content using the newly developed DocBook, Resource Type and Topic ontologies. Since the metadata generation process is based upon DocBook-structured enterprise content, our framework is focused on enterprise documents and content that is loosely based on the DocBook type of formatting. DocBook is a common documentation format for formally producing corporate data, and it is adopted by many enterprises. The proposed framework is illustrated and evaluated on English, German and French versions of the Symantec Norton 360 knowledge bases. The user study showed that the proposed fuzzy-based method generates reasonably accurate values, with an average precision of 89.39% on the metadata values of document difficulty, document interactivity level and document interactivity type.
The proposed fuzzy inference system achieves improved results compared to a rule-based reasoner for difficulty metadata extraction (~11% enhancement). In addition, user-perceived metadata quality scores (mean of 5.57 out of 6) were found to be high, and automated metadata analysis showed that the extracted metadata is of high quality and can be suitable for personalized information retrieval.
16.
17.
18.
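The paper proposes a scalable cluster file system architecture that distributes metadata and data objects using a deterministic random distribution algorithm. Its metadata management method, which separates directory path attributes from directory objects, has clear advantages in improving system performance, balancing metadata distribution, and reducing metadata migration. The proposed data object placement algorithm, based on dynamic interval mapping, supports weighted distribution and replicas, is statistically optimal both in balancing data distribution and in minimizing data migration, and effectively solves the problems of balanced data distribution and scalability in dynamic storage systems.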
19.
A new approach is described for the fusion of multimedia information based on the concept of active documents advertising on the Internet, whereby the metadata of a document travels in the network to seek out documents of interest to the parent document and, at the same time, advertises its parent document to other interested documents. This abstraction of metadata is called an adlet, which is the core of our approach. Two important features make this approach applicable to multimedia information fusion, information retrieval, data mining, geographic information systems, and medical information systems: 1) any document, including a Web page, database record, video file, audio file, image and even paper documents, can be enhanced by an adlet and become an active document; and 2) any node in a nonactive network can be enhanced by adlet-savvy software and the adlet-enhanced node can coexist with other nonenhanced nodes. An experimental prototype provides a testbed for feasibility studies in a hybrid active network.