Similar Documents
19 similar documents found
1.
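Intelligent Retrieval of E-Government Documents Based on the Semantic Web   Total citations: 7, self-citations: 0, cited by others: 7
Yang Fang, Yang Zhenshan. Journal of Computer Applications, 2005, 25(10): 2434-2435
Based on the characteristics of e-government documents, feature values of the document collection and of the retrieval request are computed with an e-government subject thesaurus. The similarity between the document collection and the retrieval request is then computed to find documents that match the request. Based on the semantic organization of e-government document metadata, the paper studies how to retrieve that metadata. The retrieved documents are organized semantically by their metadata, so that intelligent retrieval is achieved on the basis of semantic reasoning.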
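The matching step (thesaurus-based feature values, then document-request similarity) can be sketched as follows. This is a minimal sketch, assuming that a feature value is simply the count of each thesaurus term and that similarity is cosine similarity; the abstract does not give the actual weighting, and `THESAURUS`, `feature_vector` and `rank_documents` are hypothetical names.

```python
import math
from collections import Counter

# Hypothetical thesaurus terms; the real e-government thesaurus is not reproduced here.
THESAURUS = ["budget", "procurement", "tax", "policy", "permit"]

def feature_vector(tokens):
    """Assumed feature value: how often each thesaurus term occurs in the text."""
    counts = Counter(tokens)
    return [counts[term] for term in THESAURUS]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_documents(query_tokens, docs):
    """Return (doc_id, score) pairs sorted by descending similarity to the request."""
    q = feature_vector(query_tokens)
    scored = [(doc_id, cosine(q, feature_vector(toks))) for doc_id, toks in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

docs = {
    "d1": ["tax", "policy", "tax", "notice"],
    "d2": ["procurement", "budget", "report"],
}
print(rank_documents(["tax", "policy"], docs))
```

Terms outside the thesaurus (such as "notice") are ignored entirely, which is the point of using a controlled vocabulary for the feature space.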

2.
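A Web Document Retrieval Method Based on Semantic Features   Total citations: 2, self-citations: 0, cited by others: 2
Web document clustering plays an important role in Web information retrieval. This paper proposes a new Web document clustering and retrieval algorithm. The algorithm uses ordered clustering to group a Web document's physical structure into semantic paragraphs and extract the corresponding semantic features, which serve as the basis for document retrieval. Similarity to the user's query is then computed directly at the level of these semantic paragraphs, which greatly improves retrieval precision and efficiency. Experimental results show that the algorithm has practical value.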
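The idea of scoring a query against semantic paragraphs rather than the whole document can be sketched like this. It is a minimal sketch, assuming that "ordered clustering" means greedily merging adjacent paragraphs whose word sets overlap (Jaccard similarity above a threshold) and that a document's score is its best single segment; the threshold and all function names are hypothetical, not the paper's algorithm.

```python
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def ordered_segments(paragraphs, threshold=0.2):
    """Ordered (contiguous) clustering: a paragraph joins the current segment when it is
    similar enough to it, otherwise it starts a new one; document order is preserved."""
    segments = []
    for words in paragraphs:
        if segments and jaccard(segments[-1], words) >= threshold:
            segments[-1] = segments[-1] + list(words)
        else:
            segments.append(list(words))
    return segments

def segment_score(query, segments):
    """Best fraction of query terms covered by a single semantic segment."""
    q = set(query)
    if not q:
        return 0.0
    return max((len(q & set(seg)) / len(q) for seg in segments), default=0.0)

paragraphs = [["xml", "schema", "parser"], ["xml", "parser", "api"], ["football", "score"]]
segments = ordered_segments(paragraphs)
print(len(segments), segment_score(["xml", "api"], segments))
```

The first two paragraphs merge into one segment. A query whose terms are split across unrelated segments (for example `["xml", "score"]`) scores only 0.5, unlike a bag-of-words match over the whole page.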

3.
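Semantic Web Document Retrieval Based on a Semantic Query Ontology   Total citations: 1, self-citations: 0, cited by others: 1
The growth of the Semantic Web creates a need to retrieve Semantic Web documents. So that users can form semantic queries without specialized knowledge or skills, an ontology-based algorithm is proposed for retrieving Semantic Web documents from a structured knowledge base. A query ontology is constructed by mapping natural-language query terms to word senses, and the Semantic Web documents most similar to the query ontology are retrieved. This improves the precision of Semantic Web document retrieval and lets users make better use of semantic retrieval services.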

4.
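The metadata of HTML documents can support a variety of Web retrieval methods. This paper proposes a method for automatically extracting document metadata from HTML documents, focusing on the design of the extraction rules, the rule-reduction algorithm, and its complexity analysis. The extraction rules are syntactically close to document fragments, which makes them well suited to automatic generation. Because rules are produced by automatic reduction, no manual analysis is needed, which fits the characteristics of Web documents. Experimental results are presented and analyzed at the end of the paper.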
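Extraction rules that "look like document fragments" can be illustrated with templates containing a value slot. This is a minimal sketch, assuming each rule is an HTML fragment with a `{value}` placeholder that is compiled into a regular expression. The paper's rule-generation and reduction algorithm is not shown, and `RULES`, `compile_rule` and `extract_metadata` are hypothetical names.

```python
import re

# Hypothetical extraction rules: each is an HTML fragment with a {value} slot.
RULES = {
    "title": "<title>{value}</title>",
    "author": '<meta name="author" content="{value}">',
}

def compile_rule(fragment):
    """Turn a fragment template into a regex: literal text escaped, {value} captured."""
    before, after = fragment.split("{value}")
    return re.compile(re.escape(before) + r"(.*?)" + re.escape(after),
                      re.IGNORECASE | re.DOTALL)

def extract_metadata(html, rules=RULES):
    result = {}
    for field, fragment in rules.items():
        match = compile_rule(fragment).search(html)
        if match:
            result[field] = match.group(1).strip()
    return result

page = '<html><head><TITLE> Annual Report </TITLE><meta name="author" content="Li Lei"></head></html>'
print(extract_metadata(page))
```

Writing a rule as a fragment rather than as a hand-built regex is what makes it easy for a program to generate one from an example page.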

5.
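Top-N Retrieval of Digital Resources Based on a Metadata Semantic Model   Total citations: 1, self-citations: 0, cited by others: 1
Xu Hexiang, Zhang Shiming. Computer Engineering, 2010, 36(22): 272-273
A user query model that uses metadata as its semantic basis is proposed for retrieving digital resources. The Top-N algorithm of traditional relational databases is improved, and a new semantics-based similarity measure is given that takes different data types and metadata as its semantic basis. An intelligent retrieval system built on this measure has been applied to the Shanghai educational resource repository. Application results show that the system effectively improves retrieval accuracy.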
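A type-aware Top-N query over metadata records can be sketched as a weighted sum of per-field similarities followed by a partial sort. This is a minimal sketch: the field names, per-type similarity functions and weights in `SCHEMA` are hypothetical assumptions, not the measure defined in the paper.

```python
import heapq

def numeric_sim(a, b, value_range):
    """Numeric fields: 1 for equal values, falling linearly to 0 over value_range."""
    return 1.0 - min(abs(a - b) / value_range, 1.0)

def categorical_sim(a, b):
    return 1.0 if a == b else 0.0

def set_sim(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical metadata schema: field -> (similarity function, weight).
SCHEMA = {
    "year": (lambda a, b: numeric_sim(a, b, value_range=20), 0.2),
    "subject": (categorical_sim, 0.3),
    "keywords": (set_sim, 0.5),
}

def score(query, record):
    total = 0.0
    for field, (sim, weight) in SCHEMA.items():
        if field in query and field in record:
            total += weight * sim(query[field], record[field])
    return total

def top_n(query, records, n):
    """Keep only the n best records instead of sorting the whole collection."""
    return heapq.nlargest(n, records, key=lambda r: score(query, r))

records = [
    {"id": "r1", "year": 2008, "subject": "math", "keywords": ["algebra", "geometry"]},
    {"id": "r2", "year": 2010, "subject": "physics", "keywords": ["mechanics"]},
    {"id": "r3", "year": 2009, "subject": "math", "keywords": ["algebra"]},
]
query = {"year": 2010, "subject": "math", "keywords": ["algebra"]}
print([r["id"] for r in top_n(query, records, 2)])
```

`r2` matches the year exactly but loses to records that agree on subject and keywords, because those fields carry more weight in this assumed schema.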

6.
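Traditional paper retrieval methods lack semantic understanding, and their results are often of low relevance. To address this, a retrieval method based on a domain ontology is proposed that uses a semantic-network-based model to represent document semantics. First, a domain ontology is built from the subject classification system, and papers are semantically indexed. Then a semantic-network-based document representation model is built from the ontology knowledge and the index information. Finally, the algorithm for computing relevance between user queries and the semantic network is improved, and results are ranked by combining keyword-based and semantic methods. Experimental results show that the method effectively improves the precision and recall of paper retrieval.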
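The final ranking step, which combines keyword and semantic relevance, can be sketched as a linear mix. This is a minimal sketch, assuming semantic relevance is `1 / (1 + d)` for the shortest ontology path length `d` between a query concept and a document concept, keyword relevance is term overlap, and `alpha` mixes the two. The ontology fragment, the formula and all names are hypothetical; the paper's improved relevance algorithm is not given in the abstract.

```python
from collections import deque

# Hypothetical fragment of a domain ontology, as an undirected concept graph.
ONTOLOGY = {
    "machine learning": ["neural network", "artificial intelligence"],
    "neural network": ["machine learning", "deep learning"],
    "deep learning": ["neural network"],
    "artificial intelligence": ["machine learning"],
}

def distance(src, dst):
    """Breadth-first shortest-path length between two concepts (None if unreachable)."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, d = queue.popleft()
        if node == dst:
            return d
        for nxt in ONTOLOGY.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None

def semantic_score(query_concepts, doc_concepts):
    best = 0.0
    for q in query_concepts:
        for c in doc_concepts:
            d = distance(q, c)
            if d is not None:
                best = max(best, 1.0 / (1 + d))
    return best

def keyword_score(query_terms, doc_terms):
    q = set(query_terms)
    return len(q & set(doc_terms)) / len(q) if q else 0.0

def combined_score(query, doc, alpha=0.5):
    return (alpha * keyword_score(query["terms"], doc["terms"])
            + (1 - alpha) * semantic_score(query["concepts"], doc["concepts"]))

query = {"terms": ["deep", "learning"], "concepts": ["deep learning"]}
doc_a = {"terms": ["neural", "network"], "concepts": ["neural network"]}
print(combined_score(query, doc_a))
```

`doc_a` shares no keyword with the query, yet it still receives a nonzero score through the ontology link between "deep learning" and "neural network". That is the gain that combining the two signals is meant to provide.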

7.
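Automatic Summarization Based on User Query Expansion   Total citations: 1, self-citations: 0, cited by others: 1
A new automatic document summarization method is proposed. Non-negative matrix factorization (NMF) represents the original document as a linear combination of several semantic feature vectors. Similarity computation then identifies the semantic feature vectors most relevant to the user query, and sentences with large projection coefficients on those vectors are extracted as the summary. Throughout this process, relevance feedback is applied repeatedly to expand and refine the user query. Experiments show that the resulting summaries highlight the document's topic while reflecting the user's needs and interests, which effectively improves the efficiency of information retrieval.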
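The NMF step can be sketched in plain Python. This is a minimal sketch, assuming a term-by-sentence matrix `A`, Lee-Seung multiplicative updates for `A ≈ W H` (the columns of `W` are the semantic features and `H[a][s]` is sentence `s`'s coefficient on feature `a`), and cosine similarity for choosing the feature closest to the query. The relevance-feedback query expansion is not shown, and `nmf` and `summarize` are hypothetical names.

```python
import math
import random

def nmf(A, k, iters=500, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates for A (m x n) ~= W (m x k) @ H (k x n)."""
    m, n = len(A), len(A[0])
    rng = random.Random(seed)
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T A) / (W^T W H)
        WtA = [[sum(W[i][a] * A[i][j] for i in range(m)) for j in range(n)] for a in range(k)]
        WtW = [[sum(W[i][a] * W[i][b] for i in range(m)) for b in range(k)] for a in range(k)]
        WtWH = [[sum(WtW[a][b] * H[b][j] for b in range(k)) for j in range(n)] for a in range(k)]
        H = [[H[a][j] * WtA[a][j] / (WtWH[a][j] + eps) for j in range(n)] for a in range(k)]
        # W <- W * (A H^T) / (W H H^T)
        AHt = [[sum(A[i][j] * H[a][j] for j in range(n)) for a in range(k)] for i in range(m)]
        HHt = [[sum(H[a][j] * H[b][j] for j in range(n)) for b in range(k)] for a in range(k)]
        WHHt = [[sum(W[i][b] * HHt[b][a] for b in range(k)) for a in range(k)] for i in range(m)]
        W = [[W[i][a] * AHt[i][a] / (WHHt[i][a] + eps) for a in range(k)] for i in range(m)]
    return W, H

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu, nv = math.sqrt(sum(x * x for x in u)), math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0

def summarize(A, query, k, n_sentences):
    """Pick the feature most similar to the query, then the sentences weighted most on it."""
    W, H = nmf(A, k)
    features = [[W[i][a] for i in range(len(A))] for a in range(k)]
    best = max(range(k), key=lambda a: cosine(query, features[a]))
    ranked = sorted(range(len(A[0])), key=lambda s: H[best][s], reverse=True)
    return ranked[:n_sentences]

# Rows = terms t0..t3, columns = sentences s0..s2 (s0, s1 use t0/t1; s2 uses t2/t3).
A = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1]]
print(summarize(A, [0, 0, 1, 1], k=2, n_sentences=1))
```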

8.
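A Relevance Feedback Algorithm Based on Locality Preserving Projections   Total citations: 1, self-citations: 0, cited by others: 1
User feedback is introduced into the original locality preserving projection (LPP) algorithm and used to update the eigenvectors that define the dimensionality-reducing mapping. The result is an image representation subspace that better reflects semantic attributes. The algorithm uses user feedback to refine the image representation quickly, which gives it a long-term learning capability. Experimental results show that the algorithm improves retrieval accuracy and, after long-term learning, reaches a near-optimal reduced-dimensional image subspace.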

9.
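Latent semantic analysis (LSA) is computationally inefficient and has a large storage overhead in large-scale semantic retrieval. To address this, a clustering-based latent semantic retrieval algorithm is proposed. Documents are clustered according to the structural relationships between them, and the clusters, rather than the individual documents, are used to analyze latent semantics, which reduces the number of documents to be processed. Experimental results show that the algorithm reduces query time while keeping retrieval precision high.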
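The document-reduction idea (cluster first, then work with clusters instead of documents) can be sketched as follows. This is a minimal sketch under stated assumptions: plain Euclidean k-means with deterministic initialization stands in for the paper's structure-based clustering; the SVD that would then run on the `k` centroids instead of all `n` documents is omitted; at query time only the nearest cluster's members are ranked. All names are hypothetical.

```python
import math

def kmeans(vectors, k, iters=20):
    """Plain Euclidean k-means, initialised with the first k vectors for determinism."""
    centroids = [list(v) for v in vectors[:k]]
    assign = [0] * len(vectors)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign

def cluster_then_search(vectors, query, k):
    centroids, assign = kmeans(vectors, k)
    # In the clustering-based LSA idea, the SVD would now be computed on the k centroids
    # rather than on all n document vectors; that step is omitted in this sketch.
    best = min(range(k), key=lambda c: math.dist(query, centroids[c]))
    candidates = [i for i, a in enumerate(assign) if a == best]
    return sorted(candidates, key=lambda i: math.dist(query, vectors[i]))

vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
print(cluster_then_search(vectors, [1.0, 0.0], k=2))
```

With `n` documents and `k` clusters, the expensive decomposition sees a `k`-column matrix instead of an `n`-column one. This is where the reduction in query time and storage claimed above would come from.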

10.
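Computer Engineering, 2018, (3): 189-194
Traditional search engines return only the documents that contain the query keywords and ignore the information need that actually lies behind the query. To address this, document retrieval is treated as a personalized recommendation problem, and a personalized retrieval algorithm is proposed that uses a topic model to recognize query intent. Latent Dirichlet allocation (LDA) is used to build a topic model of the user's search history, and this model is used to identify the latent intent of the user's query. Documents are recommended by topic relevance and ranked by the KL divergence from the query to each document, and a personalized list of retrieved documents is returned to the user. Experimental results show that, compared with recommendation algorithms based on collaborative similarity computation and on user clustering, the algorithm provides personalized retrieval more accurately and effectively.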
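Ranking by KL divergence can be sketched once topic distributions are available. This is a minimal sketch, assuming that the LDA step has already produced per-document topic distributions and a topic profile of the user's history, and that query intent is a hypothetical linear interpolation of the query's topics with that profile (weight `lam`). LDA inference itself is not shown, and all names are hypothetical.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) over topics; eps guards against log of zero when q has empty topics."""
    return sum(pi * math.log(pi / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def intent_distribution(query_topics, history_topics, lam=0.5):
    """Hypothetical intent model: interpolate query topics with the user's history profile."""
    mixed = [lam * q + (1 - lam) * h for q, h in zip(query_topics, history_topics)]
    total = sum(mixed)
    return [x / total for x in mixed]

def rank_by_kl(intent, documents):
    """Documents whose topic mix is closest to the intent (smallest KL) come first."""
    return sorted(documents, key=lambda d: kl_divergence(intent, documents[d]))

# Topic 0 ~ finance, topic 1 ~ sports, topic 2 ~ other (assumed LDA output).
documents = {"sports": [0.1, 0.8, 0.1], "finance": [0.8, 0.1, 0.1]}
intent = intent_distribution([0.5, 0.5, 0.0], [0.9, 0.05, 0.05])
print(rank_by_kl(intent, documents))
```

The query on its own is split evenly between finance and sports, so both documents would be equally far from it. The finance-heavy search history breaks the tie, which is the behavior the personalized ranking relies on.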

11.
A new approach is described for the fusion of multimedia information based on the concept of active documents advertising on the Internet, whereby the metadata of a document travels in the network to seek out documents of interest to the parent document and, at the same time, advertises its parent document to other interested documents. This abstraction of metadata is called an adlet, which is the core of our approach. Two important features make this approach applicable to multimedia information fusion, information retrieval, data mining, geographic information systems, and medical information systems: 1) any document, including a Web page, database record, video file, audio file, image and even paper documents, can be enhanced by an adlet and become an active document; and 2) any node in a nonactive network can be enhanced by adlet-savvy software, and the adlet-enhanced node can coexist with other nonenhanced nodes. An experimental prototype provides a testbed for feasibility studies in a hybrid active network.

12.
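Personal computers hold large numbers of unstructured documents, and extracting useful information from them is both a focus and a difficulty of semantic desktop management. Entity recognition and extraction, in turn, is an important prerequisite and key step in information extraction. This paper first proposes a method that uses textual cues and ontology metadata to recognize entities in unstructured documents. It then builds a document collection manually and uses it to evaluate how well the new method recognizes entities in a specific domain.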

13.
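Li Yong, Xiang Zhongqi. Journal of Computer Applications, 2019, 39(1): 245-250
Existing ciphertext retrieval schemes for cloud computing do not support semantic expansion of search keywords, lack precision, and cannot rank their results. To address these problems, a rankable ciphertext retrieval scheme that supports semantic expansion of search keywords is proposed. First, term frequency-inverse document frequency (TF-IDF) is used to compute a relevance score between each keyword and each document. Keywords in different fields of a document are given different position weights, and a field-weighted scoring method computes a position-weight score. The product of the relevance score and the position-weight score becomes the keyword's value at the corresponding position of the document index vector. Second, the search keywords entered by an authorized user are semantically expanded with the WordNet lexical database to obtain an expanded keyword set. The edit distance formula is used to compute the similarity between keywords in this set, and each similarity value becomes the keyword's value at the corresponding position of the query vector. Finally, encryption produces a secure index and a search trapdoor, and an inner product is computed under the vector space model (VSM); the results of this inner product are used to rank the retrieved ciphertext documents. Theoretical analysis and simulation experiments show that the scheme is secure under the known ciphertext model and the known background model and can rank search results. Compared with the multi-keyword ranked search over encrypted data (MRSE) scheme, the proposed scheme supports semantic keyword expansion and gives more accurate and reliable query results, while its search time is close to that of MRSE.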
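The plaintext scoring that the scheme encrypts can be sketched as follows: a field-weighted TF-IDF index vector, a query vector filled with edit-distance similarities for the expanded keywords, and an inner product for ranking. This is a minimal sketch. The secure index, the trapdoor and the encryption are omitted. `EXPANSIONS` is a hypothetical stand-in for a WordNet lookup, and the field weights and all names are assumptions.

```python
import math

FIELD_WEIGHTS = {"title": 2.0, "body": 1.0}   # hypothetical position weights
EXPANSIONS = {"car": ["cars", "automobile"]}  # hypothetical stand-in for a WordNet lookup

def edit_distance(a, b):
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def similarity(a, b):
    return 1.0 - edit_distance(a, b) / max(len(a), len(b), 1)

def index_vector(doc, vocab, corpus):
    """Per term: sum over fields of TF * IDF * field weight."""
    vec = []
    for term in vocab:
        df = sum(1 for d in corpus if any(term in words for words in d.values()))
        idf = math.log(len(corpus) / df) if df else 0.0
        vec.append(sum(words.count(term) / len(words) * idf * FIELD_WEIGHTS[field]
                       for field, words in doc.items() if words))
    return vec

def query_vector(keyword, vocab):
    """Expanded keywords get their edit-distance similarity to the original keyword."""
    expanded = {keyword, *EXPANSIONS.get(keyword, [])}
    return [similarity(keyword, term) if term in expanded else 0.0 for term in vocab]

def rank(keyword, docs, vocab):
    # In the scheme both vectors are encrypted before the inner product; omitted here.
    q = query_vector(keyword, vocab)
    corpus = list(docs.values())
    scores = {doc_id: sum(a * b for a, b in zip(index_vector(d, vocab, corpus), q))
              for doc_id, d in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)

VOCAB = ["car", "cars", "engine", "bike"]
DOCS = {
    "d1": {"title": ["car", "review"], "body": ["cars", "engine", "car"]},
    "d2": {"title": ["bike", "review"], "body": ["bike", "engine"]},
}
print(query_vector("car", VOCAB), rank("car", DOCS, VOCAB))
```

"engine" occurs in every document, so its IDF is zero and it contributes nothing to the ranking. The expanded form "cars" contributes with weight 0.75 because it is one edit away from "car".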

14.
Personalization is increasingly vital, especially for enterprises that need to reach their customers. The key challenge in supporting personalization is the need for rich metadata, such as metadata about structural relationships, subject/concept relations between documents, and cognitive metadata about documents (e.g. the difficulty of a document). Manually annotating large knowledge bases with such rich metadata does not scale. In addition, automatically mining cognitive metadata is challenging, since it is very difficult to capture the intellectual knowledge underlying a document automatically. At the same time, Web content is becoming increasingly multilingual, as a growing share of the data generated on the Web is non-English. Current metadata extraction systems are generally built for English content and need to change fundamentally to adapt to the changing dynamics of the Web. To alleviate these problems, we introduce a novel automatic metadata extraction framework. It is based on a novel fuzzy-based method for automatic cognitive metadata generation and uses different document parsing algorithms to extract rich metadata from multilingual enterprise content, using the newly developed DocBook, Resource Type and Topic ontologies. Since the metadata generation process is based on DocBook-structured enterprise content, our framework focuses on enterprise documents and content that loosely follow DocBook-style formatting. DocBook is a common documentation format for formally producing corporate data and has been adopted by many enterprises. The proposed framework is illustrated and evaluated on the English, German and French versions of the Symantec Norton 360 knowledge bases. The user study showed that the proposed fuzzy-based method generates reasonably accurate values, with an average precision of 89.39% on the metadata values of document difficulty, document interactivity level and document interactivity type.
The proposed fuzzy inference system achieves better results than a rule-based reasoner for difficulty metadata extraction (∼11% improvement). In addition, user-perceived metadata quality scores were high (mean of 5.57 out of 6), and automated metadata analysis showed that the extracted metadata is of high quality and suitable for personalized information retrieval.
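A fuzzy inference step for a difficulty value can be illustrated with a tiny rule base. This is a minimal sketch, not the framework's system: the two inputs (normalized technical-term density and sentence length), the triangular membership functions, the three rules and their output values are all hypothetical. It uses zero-order Sugeno-style inference with min for AND, max for OR, and a weighted average to defuzzify.

```python
def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical fuzzy sets on inputs normalised to [0, 1].
SETS = {"low": (-0.5, 0.0, 0.5), "medium": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.5)}

def fuzzify(x):
    return {name: tri(x, *params) for name, params in SETS.items()}

def difficulty(term_density, sentence_length):
    t, s = fuzzify(term_density), fuzzify(sentence_length)
    rules = [  # (firing strength, crisp output) -- hypothetical rule base
        (min(t["low"], s["low"]), 0.1),    # few terms AND short sentences -> easy
        (max(t["high"], s["high"]), 0.9),  # many terms OR long sentences -> hard
        (t["medium"], 0.5),                # medium term density -> intermediate
    ]
    total = sum(w for w, _ in rules)
    return sum(w * z for w, z in rules) / total if total else 0.5

print(difficulty(0.25, 0.25))
```

Unlike a crisp rule-based reasoner, inputs that fall between the sets fire several rules partially, so the output varies smoothly (0.3 for the example above) instead of jumping between fixed labels.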

15.
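Latent Semantic Indexing and Its Application to Chinese Text Processing   Total citations: 33, self-citations: 0, cited by others: 33
Information retrieval is in essence semantic retrieval, yet traditional information retrieval systems index independent words, so their retrieval effectiveness is unsatisfactory. Latent semantic indexing (LSI) is a newer information retrieval model. It uses singular value decomposition (SVD) to project term vectors and document vectors into a low-dimensional space, which reduces the semantic ambiguity between terms and documents and makes the semantic relationships between documents clearer. Experimental and theoretical results confirm that LSI achieves better retrieval effectiveness. This paper discusses the theoretical basis of LSI and studies its application to Chinese text processing, including Chinese text retrieval, classification and clustering.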
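The projection into a low-dimensional space and the query fold-in can be sketched without external libraries. This is a minimal sketch, assuming a terms-by-documents matrix `A` of full enough rank for the chosen `k`, a truncated SVD computed by power iteration on `AᵀA` with re-orthogonalization, a query folded in as `q̂ = Σₖ⁻¹ Uₖᵀ q`, and cosine similarity against the rows of `Vₖ`. The function names are hypothetical.

```python
import math
import random

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def truncated_svd(A, k, iters=300, seed=0):
    """Top-k singular triplets of A (terms x docs) via power iteration on A^T A,
    re-orthogonalising against vectors already found (deflation)."""
    At = [list(col) for col in zip(*A)]
    rng = random.Random(seed)
    us, sigmas, vs = [], [], []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(len(At))]
        for _ in range(iters):
            v = matvec(At, matvec(A, v))              # v <- A^T A v
            for p in vs:                              # remove components along earlier v's
                dot = sum(a * b for a, b in zip(v, p))
                v = [a - dot * b for a, b in zip(v, p)]
            norm = math.sqrt(sum(a * a for a in v))
            v = [a / norm for a in v]
        av = matvec(A, v)
        sigma = math.sqrt(sum(a * a for a in av))
        us.append([a / sigma for a in av])
        sigmas.append(sigma)
        vs.append(v)
    return us, sigmas, vs

def lsi_scores(A, query, k):
    us, sigmas, vs = truncated_svd(A, k)
    # Fold the query into the latent space: q_hat = Sigma_k^-1 U_k^T q
    q_hat = [sum(ui * qi for ui, qi in zip(u, query)) / s for u, s in zip(us, sigmas)]
    scores = []
    for j in range(len(A[0])):
        d = [v[j] for v in vs]                        # document j's row of V_k
        dot = sum(a * b for a, b in zip(q_hat, d))
        norm = math.sqrt(sum(a * a for a in q_hat)) * math.sqrt(sum(b * b for b in d))
        scores.append(dot / norm if norm else 0.0)
    return scores

# Terms: car, automobile, engine, flower.  Docs: "car engine", "automobile engine", "flower flower".
A = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 2]]
print([round(s, 3) for s in lsi_scores(A, [1, 0, 0, 0], k=2)])
```

The query "car" scores about 1.0 against "automobile engine" even though that document never contains "car". The shared term "engine" places both vehicle documents on the same latent dimension, which illustrates the reduced semantic ambiguity described above.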

16.
Semantic web and grid technologies offer a promising approach to facilitate semantic information retrieval based on heterogeneous document repositories. In this paper the authors describe the design and implementation of an Ontology Server (OS) component to be used in a distributed contents management grid system. Such a system could be used to build collections of document repositories that are mutually interoperable at the semantic level. From the contents point of view, the distributed system is built as a collection of multimedia document repository nodes glued together by an OS. A set of methodologies and tools is developed to organize the knowledge space around the notion of a contents community, in which each content provider publishes a set of ontologies to collect metadata that is organized and published through a knowledge community built on top of the OS. These methodologies were deployed while setting up a prototype to connect about 20 museums in the city of Naples (Italy).

17.
A new technology for intelligent full text document retrieval is presented. The retrieval of a document is treated as an expert system problem, recognizing that human document retrieval is expert behavior. The technology is semantic measurement. A working prototype system, LIBRARY, has been built based on the technology. Input is a request for information, in unrestricted technical English; output is all documents with measured content similar to that of the request, ranked in order of relevance. Retrieval is unaffected by similarity or dissimilarity of terms between request and document. LIBRARY's performance is comparable to that of an expert human librarian, representing a significant improvement over traditional document retrieval systems.

18.
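XML Metadata Retrieval Based on an Approximate Matching Model   Total citations: 4, self-citations: 0, cited by others: 4
Unordered labeled-tree matching is decomposed into tree-structure matching and label-semantic matching. By combining structural and semantic matching, the traditional tree matching algorithm is improved and the concept of approximate matching is introduced. For the structured nature of XML metadata descriptions, a metadata retrieval method based on a three-layer approximate matching model is designed; it can effectively adjust the precision and recall of metadata retrieval to suit different user needs. Finally, a prototype metadata query system based on the approximate matching model is built, and experiments show that the model is feasible and efficient for metadata retrieval.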
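Combining structural matching with label-semantic matching on unordered trees can be sketched recursively. This is a minimal sketch, not the paper's three-layer model. It assumes trees given as `(label, [children])`, a hypothetical synonym table for label semantics, greedy pairing of children regardless of order, and a mixing weight `alpha` between a node's own label similarity and the similarity of its children.

```python
# Hypothetical label-semantics table: synonym pairs score 0.8, identical labels 1.0.
SYNONYMS = {frozenset({"author", "creator"}), frozenset({"title", "name"})}

def label_sim(a, b):
    if a == b:
        return 1.0
    return 0.8 if frozenset({a, b}) in SYNONYMS else 0.0

def tree_sim(t1, t2, alpha=0.5):
    """Approximate similarity of unordered labelled trees given as (label, [children])."""
    (l1, c1), (l2, c2) = t1, t2
    own = label_sim(l1, l2)
    if not c1 and not c2:
        return own
    # Greedy unordered matching: repeatedly pair the most similar unused children.
    pairs = sorted(((tree_sim(a, b, alpha), i, j)
                    for i, a in enumerate(c1) for j, b in enumerate(c2)), reverse=True)
    used1, used2, total = set(), set(), 0.0
    for s, i, j in pairs:
        if i not in used1 and j not in used2:
            used1.add(i)
            used2.add(j)
            total += s
    structure = total / max(len(c1), len(c2))
    return alpha * own + (1 - alpha) * structure

query = ("record", [("title", []), ("author", [])])
renamed = ("record", [("creator", []), ("name", [])])
print(tree_sim(query, renamed))
```

Child order is ignored, and renamed but synonymous elements still match partially (0.9 here). Raising or lowering `alpha` is one way such a model could trade precision against recall, as the abstract describes.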

19.
This paper proposes a non-domain-specific metadata ontology as a core component in a semantic model-based document management system (DMS), a potential contender among next-generation enterprise information systems. What we developed is the core semantic component of an ontology-driven DMS, providing a robust semantic base for describing documents' metadata. We also enabled semantic services such as automated semantic translation of metadata from one domain to another. The core semantic base consists of three semantic layers, each serving a different view of documents' metadata. The core semantic component's base layer represents a non-domain-specific metadata ontology founded on the ebRIM specification. The main purpose of this ontology is to serve as a meta-metadata ontology for other domain-specific metadata ontologies. The base semantic layer provides a generic metadata view. To enable domain-specific views of documents' metadata, we implemented two domain-specific metadata ontologies, semantically layered on top of ebRIM, serving domain-specific views of the metadata. To enable semantic translation of metadata from one domain to another, we established model-to-model mappings between these semantic layers by introducing SWRL rules. Automating the semantic translation of metadata not only allows effortless switching between different metadata views, but also opens the door to automating the long-term archiving of documents. For the case study, we chose the judicial domain as a promising ground for improving the efficiency of the judiciary by introducing semantics in this field.
