Similar Literature
 Found 20 similar documents; search took 15 ms
1.
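A search engine does not actually search the Internet itself; it searches a pre-built index database of web pages. The quality of that index therefore directly affects both the accuracy of query results and the speed of user queries. This paper proposes an algorithm for building an inverted index and analyzes it.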
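
The abstract does not give the algorithm itself, so the following is only a minimal sketch of what building and querying an inverted index generally looks like, not the paper's method. The function names `build_inverted_index` and `search`, the whitespace tokenizer, and the AND query semantics are all assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a sorted list of IDs of the documents containing it.

    `docs` is a dict of {doc_id: text}. Tokenization is a plain
    whitespace split (an assumption; real engines use proper analyzers).
    """
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, query):
    """Return IDs of documents that contain every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)
```

A query touches only the postings lists of its own terms, which is why lookups against the prepared index are fast compared with scanning the pages themselves.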

2.
吴文娟  车明 《微处理机》2006,27(6):83-85
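Inverted files are the indexing technique most widely used in search engine retrieval systems. Based on experiments, and addressing both the timeliness requirements of indexes in Chinese search engines and the shortcomings of traditional inverted indexes during updates, this paper proposes a grouped index technique and an append-based index update algorithm. These improve the retrieval efficiency of the search engine without degrading retrieval effectiveness.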

3.
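An Instantly Updatable Inverted Index Method Supporting Efficient Retrieval   Cited by: 8 (self-citations: 1, citations by others: 8)
With the rapid growth of the World Wide Web, a new class of efficient document indexing techniques has emerged. This article implements an inverted index that supports efficient retrieval and immediate updates, as part of the WebME (WebMiningEnvironment) prototype system. The component performs efficient retrieval for specific queries and supports immediate incremental indexing: newly added documents can be indexed at once without re-indexing existing content, and index updates do not interrupt query processing.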
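
As a rough illustration of incremental indexing in general (not the WebME implementation, whose details the abstract does not give), the sketch below appends a new document's terms to the existing postings lists instead of rebuilding them. The class name `IncrementalIndex` and its methods are hypothetical, and concurrent querying during updates is not modeled.

```python
from collections import defaultdict


class IncrementalIndex:
    """Toy in-memory inverted index that accepts new documents at any time."""

    def __init__(self):
        self.postings = defaultdict(list)  # term -> list of doc IDs
        self.next_id = 0

    def add_document(self, text):
        """Index one new document without touching earlier documents."""
        doc_id = self.next_id
        self.next_id += 1
        # IDs only grow, so appending keeps every postings list sorted.
        for term in set(text.lower().split()):
            self.postings[term].append(doc_id)
        return doc_id

    def lookup(self, term):
        """Return a copy of the postings list for `term` (empty if absent)."""
        return list(self.postings.get(term.lower(), []))
```

Because each document receives a larger ID than all earlier ones, an update is a series of appends and existing postings never need to be rewritten.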

4.
While small-scale search engines in specific domains and languages are increasingly used by Web users, most existing search engine development tools do not support the development of search engines in languages other than English, cannot be integrated with other applications, or rely on proprietary software. A tool that supports search engine creation in multiple languages is thus highly desired. To study the research issues involved, we review related literature and suggest the criteria for an ideal search tool. We present the design of a toolkit, called SpidersRUs, developed for multilingual search engine creation. The design and implementation of the tool, consisting of a Spider module, an Indexer module, an Index Structure, a Search module, and a Graphical User Interface module, are discussed in detail. A sample user session and a case study on using the tool to develop a medical search engine in Chinese are also presented. The technical issues involved and the lessons learned in the project are then discussed. This study demonstrates that the proposed architecture is feasible in developing search engines easily in different languages such as Chinese, Spanish, Japanese, and Arabic.

5.
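As Internet applications mature, more and more users want to use search engines to find information about a specific industry, a need that general-purpose search engines cannot meet effectively. This paper presents the design and implementation of a vertical search engine for the pharmaceutical industry. The design builds on an intelligent search engine architecture and uses task-driven focused search and hidden search techniques, together with a hybrid character-word inverted index and optimized character-level inverted indexing and retrieval. It provides policy-controlled crawling during resource collection and efficient indexing and retrieval, yielding a vertical search service for the pharmaceutical industry with high specialization, accuracy, and efficiency.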

7.
An Inverted Index Update Strategy Based on an Extendible Hash Table   Cited by: 5 (self-citations: 0, citations by others: 5)
吴恒山  刘兴字  左琼 《计算机工程》2004,30(8):83-84,F003
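This paper proposes a new inverted index update strategy based on an extendible hash table, which gives the inverted index good scalability. It supports document insertion and deletion while maintaining high query efficiency and space utilization. On this basis, incremental and real-time updates of the inverted index are implemented.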

8.
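The volume of text on the Internet continues to grow explosively, making it harder for users to find information online and leaving response-time expectations unmet. Taking the linguistic characteristics of Tibetan into account, this paper explores an index construction strategy for Tibetan text aimed at information search, building an efficient Tibetan text index to speed up Tibetan information retrieval.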

9.
An Extended Inverted Index Technique for XML Documents Using an RDBMS   Cited by: 1 (self-citations: 0, citations by others: 1)
胡光 《计算机工程》2005,31(3):99-101
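Inverted indexes are widely used in retrieval, but they need improvement to support containment queries over XML documents. This paper proposes an extended inverted index technique for processing containment queries and demonstrates its effectiveness through experimental comparison with earlier methods. The method requires no changes to the RDBMS, and processing containment queries inside the RDBMS achieves results consistent with an IR implementation.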

10.
Semplore: A scalable IR approach to search the Web of Data   Cited by: 1 (self-citations: 0, citations by others: 1)
The Web of Data keeps growing rapidly. However, the full exploitation of this large amount of structured data faces numerous challenges like usability, scalability, imprecise information needs and data change. We present Semplore, an IR-based system that aims at addressing these issues. Semplore supports intuitive faceted search and complex queries both on text and structured data. It combines imprecise keyword search and precise structured query in a unified ranking scheme. Scalable query processing is supported by leveraging inverted indexes traditionally used in IR systems. This is combined with a novel block-based index structure to support efficient index update when data changes. The experimental results show that Semplore is an efficient and effective system for searching the Web of Data and can be used as a basic infrastructure for Web-scale Semantic Web search engines.

11.
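As information search becomes a primary application of the Internet, search engine technology is becoming a focus of research and development in both the computer industry and academia. This paper introduces the basic principles, working process, and technology trends of search engines.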

13.
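Dynamic Update Cost Analysis of Vector Space Partitioning Indexes   Cited by: 1 (self-citations: 0, citations by others: 1)
Cost analysis, which uses a cost model to predict and evaluate spatial index structures, is an effective evaluation method. Considering the two index strategies of space partitioning and data partitioning, the paper builds a cost model for vector space partitioning indexes on top of existing index structures; the model supports performance evaluation of both queries and dynamic updates. Taking the KDB-tree family as the evaluation target, it derives an estimate of the number of page accesses (PA) from the number of node accesses (NA) and verifies the relative error rate of the estimate on standard data distributions. The results show that the model's average relative error rate is low, not exceeding 12%. The cost analysis helps estimate the dynamic update cost of index structures and optimize queries.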

14.
Nowadays, mashup services and especially metasearch engines play an increasingly important role on the Web. Most users use them directly or indirectly to access and aggregate information from more than one data source. Similarly to the rest of the search systems, the effectiveness of a metasearch engine is mainly determined by the quality of the results it returns in response to user queries. Since these services do not maintain their own document index, they exploit multiple search engines using a rank aggregation method in order to classify the collected results. However, the rank aggregation methods which have been proposed until now utilize a very limited set of parameters regarding these results, such as the total number of the exploited resources and the rankings they receive from each individual resource. In this paper we present QuadRank, a new rank aggregation method, which takes into consideration additional information regarding the query terms, the collected results and the data correlated to each of these results (title, textual snippet, URL, individual ranking and others). We have implemented and tested QuadRank in a real-world metasearch engine, QuadSearch, a system developed as a testbed for algorithms related to the wide problem of metasearching. The name QuadSearch is related to the current number of the exploited engines (four). We have exhaustively tested QuadRank for both effectiveness and efficiency in the real-world search environment of QuadSearch and also, using a task from the recent TREC-2009 conference. The results we present in our experiments reveal that in most cases QuadRank outperformed all component engines, another metasearch engine (Dogpile) and two successful rank aggregation methods, Borda Count and the Outranking Approach.
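
For context, the Borda Count baseline named in this abstract can be sketched in a few lines. This is one common variant of the baseline, not QuadRank, whose scoring the abstract does not specify. The function name `borda_count`, the choice to give zero points to a result an engine did not return, and the alphabetical tie-break are assumptions.

```python
from collections import defaultdict


def borda_count(rankings):
    """Merge several ranked result lists into one using Borda Count.

    Each engine awards n points to its top result, n - 1 to the next,
    and so on, where n is the number of distinct results overall.
    Results an engine did not return get 0 points from that engine.
    """
    candidates = set()
    for ranking in rankings:
        candidates.update(ranking)
    n = len(candidates)

    scores = defaultdict(int)
    for ranking in rankings:
        for position, item in enumerate(ranking):
            scores[item] += n - position

    # Highest total first; ties broken alphabetically for determinism.
    return sorted(candidates, key=lambda c: (-scores[c], c))
```

Borda Count uses only the position of each result in each list, which is the kind of limited parameter set the abstract says earlier rank aggregation methods rely on.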

15.
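Hybrid Index Technology for Search Engines   Cited by: 5 (self-citations: 0, citations by others: 5)
Inverted files are the indexing technique most widely used in search engine retrieval systems. In Chinese search engines that rely on automatic word segmentation for full-text retrieval, a small segmentation dictionary lowers retrieval efficiency, whereas a larger dictionary lowers retrieval effectiveness. Drawing on practice with the Tianwang (天网) search engine, this paper proposes a hybrid index method implemented on inverted files that improves the efficiency of phrase queries without degrading retrieval effectiveness.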

16.
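The paper first introduces the basic principles and structure of traditional search engines and points out their shortcomings, then describes the definition, operating mechanism, and development direction of metasearch engines. On this theoretical basis, it proposes a user-based scheduling improvement for next-generation metasearch engines. Experiments show that the improvement raises users' retrieval efficiency and quality.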

17.
We seek to leverage an expert user's knowledge about how information is organized in a domain and how information is presented in typical documents within a particular domain-specific collection, to effectively and efficiently meet the expert's targeted information needs. We have developed the semantic components model to describe important semantic content within documents. The semantic components model for a given collection (based on a general understanding of the type of information needs expected) consists of a set of document classes, where each class has an associated set of semantic components. Each semantic component instance consists of segments of text about a particular aspect of the main topic of the document and may not correspond to structural elements in the document. The semantic components model represents document content in a manner that is complementary to full text and keyword indexing. This paper describes how the semantic components model can be used to improve an information retrieval system. We present experimental evidence from a large interactive searching study that compared the use of semantic components in a system with full text and keyword indexing, where we extended the query language to allow users to search using semantic components, to a base system that did not have semantic components. We evaluate the systems from a system perspective, where semantic components were shown to improve document ranking for precision-oriented searches, and from a user perspective. We also evaluate the systems from a session-based perspective, evaluating not only the results of individual queries but also the results of multiple queries during a single interactive query session.

18.
With the rising adoption of web services, effective management of web services becomes a critical issue in making the paradigm of service-oriented computing more practical. In this paper, a novel structure, called Vector-based service Lattice (VsLattice), is devised to index web services in a semantic way. Each web service is modeled as a group of Service Operation Vectors (SOVs) in the vector space, and each SOV represents an operation provided by the service. The web services, SOVs and the relationship between web services and SOVs form the Conceptual Indexing Context (CIC) of a given service collection. In the CIC, web services that provide similar operations (functions) are conceptually indexed by the same Operation Vector Concepts (OVCs). The underlying relationships among the OVCs are captured with the VsLattice, which is constructed by adopting the traditional concept lattice in a CIC. By taking advantage of the information obtained from the VsLattice, a new representation of SOV is devised. Based on this representation, a novel service retrieval model and the implemental system are developed to retrieve web services efficiently. The performance and retrieving quality of the proposed approach have been evaluated through a series of experiments.
Aoying Zhou (Corresponding author)

19.
Doing exhaustive relevance judgments is one of the most challenging tasks in the construction process of an IR test collection, especially when the collection is composed of millions of documents. Pooling (or system pooling), which is basically a method for selecting documents to assess, is a solution to overcome this challenge. In this paper, to form such an assessment pool, a new, rank-based document selection criterion, called the expected level of importance (ELI), is introduced. The results of the experiments performed, using TREC 5, 6, 7, and 8 data, showed that by using a pool in which the documents are sorted in the decreasing order of their calculated ELI scores, relevance judgments can efficiently be made by minimal human effort, while maintaining the size and the effectiveness of the resulting test collection. The criterion we propose can directly be adapted to the traditional TREC pooling practice in favor of efficiency, with no additional cost.
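
The traditional pooling practice that this abstract builds on can be sketched as depth-k pooling: the assessment pool is the union of the top k documents from each participating system's ranked run. The sketch below shows only that baseline; the ELI scoring itself is not defined in the abstract and is not reproduced here. The function name `depth_k_pool` and the run format are assumptions.

```python
def depth_k_pool(runs, k):
    """Build an assessment pool from the top-k documents of every run.

    `runs` maps a system name to its ranked list of document IDs
    (best first). Documents outside every top-k are left unjudged.
    """
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:k])
    return pool
```

Assessors then judge only the pooled documents, so the human effort depends on the pool size and on the order in which pooled documents are presented, not on the size of the whole collection.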

20.
郑晓健 《软件》2014,(3):4-5,8
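This paper extends concept retrieval to domain-topic-oriented retrieval and proposes a domain-topic-oriented intelligent retrieval model. It gives formal descriptions of the concept semantic network and of domain topics, uses the concept semantic network to expand domain topics with synonyms and semantic entailments, and implements a domain-topic-oriented intelligent search engine for the construction industry.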
