Similar Literature
1.
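The ever-growing volume of online information calls for suitable retrieval tools, and retrieving specialized information in particular requires a search engine that reflects the characteristics of domain vocabulary. Building on a study of core search engine technologies, this paper proposes a design for a petrochemical information search engine. A web robot (crawler) module was developed to fetch large numbers of web pages automatically; Chinese word segmentation was implemented with an algorithm that combines shortest-path segmentation with forward maximum matching; an indexing module was developed that supports both batch and incremental indexing of web pages; and a retrieval module was developed that provides Boolean queries and generates abstracts automatically. Through system integration, a preliminary search engine reflecting the characteristics of the petrochemical domain was built.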
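
The forward-maximum-matching (FMM) half of the segmenter can be sketched as follows. This is a minimal illustration, assuming a toy lexicon and a maximum word length of 4 characters (both hypothetical, not taken from the paper); the shortest-path segmentation step and the way the paper combines the two algorithms are not shown.

```python
def forward_max_match(text: str, lexicon: set, max_len: int = 4) -> list:
    """Segment text left to right, always taking the longest lexicon word available.

    lexicon and max_len are illustrative assumptions; characters not covered
    by the lexicon fall back to single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink until a dictionary word (or one char) matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical petrochemical lexicon for illustration only.
lexicon = {"石油", "化工", "石油化工", "信息", "搜索", "引擎", "搜索引擎"}
print(forward_max_match("石油化工信息搜索引擎", lexicon))
# ['石油化工', '信息', '搜索引擎']
```

FMM is greedy, so it can mis-split ambiguous character sequences; pairing it with a shortest-path segmenter, as the abstract describes, is one way to catch such cases.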

2.
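To address the slow query speed and poor independence of metasearch engines, this paper designs a result-processing model for metasearch engines. Tailored to the characteristics of metasearch, the model uses a four-level result-set structure that makes result processing more efficient. For result extraction, it proposes a feedback-based weight-adjustment algorithm (FBWM) that, without manual intervention, automatically monitors performance changes in each independent search engine and adjusts that engine's weight dynamically. For result ranking, it proposes an improved position/full-text sorting method (IPFTS) that introduces the concept of term-matching levels; this not only improves the precision of the relevance between results and the query string but also ensures that the URLs of top-ranked results are valid.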
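
The abstract does not give the FBWM update rule. As a hypothetical illustration of feedback-driven weighting only, the sketch below blends each engine's current weight with a feedback score (assumed here to be the share of that engine's results that users clicked) by exponential smoothing, then renormalizes. The engine names, the feedback signal, and the smoothing rate of 0.2 are all assumptions; IPFTS is not shown.

```python
def update_weights(weights: dict, feedback: dict, rate: float = 0.2) -> dict:
    """Blend each engine's weight with its latest feedback score, then renormalize.

    feedback[engine] is a hypothetical quality signal in [0, 1] (e.g. click share
    in the last observation window); rate is an assumed smoothing factor.
    """
    blended = {
        engine: (1 - rate) * w + rate * feedback.get(engine, 0.0)
        for engine, w in weights.items()
    }
    total = sum(blended.values())
    if total == 0:
        # No usable signal: fall back to uniform weights.
        return {engine: 1 / len(blended) for engine in blended}
    return {engine: v / total for engine, v in blended.items()}


weights = {"engine_a": 0.5, "engine_b": 0.5}
for _ in range(5):  # five hypothetical feedback windows
    weights = update_weights(weights, {"engine_a": 0.9, "engine_b": 0.3})
print(weights)
```

Repeated feedback gradually shifts weight toward the engine whose results are clicked more, with no manual intervention, which is the behavior the abstract attributes to FBWM.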

3.
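This paper introduces an intelligent Internet information-gathering tool. It accepts natural-language queries, uses HowNet to extract keywords from the query sentence, classifies the query automatically, and then submits it to metasearch engines to produce web search results. With accuracy close to that of manual collection, the system greatly shortens the time needed to obtain information and saves labor.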

4.
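The Excite Search Engine   Total citations: 2, self-citations: 0, citations by others: 2  
With the rapid growth of the Web on the Internet, many Web information retrieval tools have appeared one after another, and the number and variety of information query services keep growing, including search engines, Web directories, and Yellow Pages and White Pages database services. Among the familiar search engines, Excite stands out for its user-friendly, approachable design. Overview: the URL of the Excite search engine's Web server is http://www.excite.com. Excite Inc., headquartered in Redwood City, California, has long focused on developing retrieval software products, services, and features for automatic hypertext linking, subject classification, and automatic summarization. Launched in October 1995, the Excite search engine is the company's best-known complete suite of Web information services, mainly pro…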

5.
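Automatic Performance Evaluation of Search Engines Based on User Behavior Analysis   Total citations: 6, self-citations: 2, citations by others: 4  
Liu Yiqun, Cen Rongwei, Zhang Min, Ru Liyun, Ma Shaoping. Journal of Software, 2008, 19(11): 3023-3032
Based on the analysis of user behavior, this paper proposes a method for evaluating search engine performance automatically. By analyzing users' query and click behavior, the method automatically generates a test set of navigational queries and automatically annotates the standard answer for each query. Experiments on logs from a Chinese commercial search engine show that the method yields evaluation results largely consistent with those based on manual annotation, while greatly reducing the human effort required and shortening the evaluation feedback cycle.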
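
One way to turn click logs into a navigational test set, sketched under stated assumptions: a query is treated as navigational when at least `min_clicks` clicks were logged and one URL receives at least `focus_threshold` of them, and that URL becomes the query's standard answer. Both thresholds, the (query, url) log format, and the use of mean reciprocal rank as the score are hypothetical choices, not the paper's published criteria.

```python
from collections import Counter, defaultdict


def build_navigational_testset(click_log, min_clicks=10, focus_threshold=0.8):
    """Return {query: answer_url} for queries whose clicks concentrate on a single URL.

    click_log: iterable of (query, clicked_url) pairs. min_clicks and
    focus_threshold are illustrative thresholds, not values from the paper.
    """
    clicks = defaultdict(Counter)
    for query, url in click_log:
        clicks[query][url] += 1
    testset = {}
    for query, counter in clicks.items():
        total = sum(counter.values())
        url, top = counter.most_common(1)[0]
        if total >= min_clicks and top / total >= focus_threshold:
            testset[query] = url  # the dominant click target becomes the "standard answer"
    return testset


def mean_reciprocal_rank(testset, results):
    """results: {query: ranked URL list} from the engine under evaluation."""
    if not testset:
        return 0.0
    total = 0.0
    for query, answer in testset.items():
        ranked = results.get(query, [])
        if answer in ranked:
            total += 1 / (ranked.index(answer) + 1)
    return total / len(testset)


log = [("example university", "www.university.example")] * 9 + [("example university", "news.example/a")]
log += [("cheap phones", f"shop{i}.example") for i in range(10)]
testset = build_navigational_testset(log)
print(testset)  # {'example university': 'www.university.example'}
print(mean_reciprocal_rank(testset, {"example university": ["news.example/a", "www.university.example"]}))  # 0.5
```

The informational query "cheap phones" spreads its clicks evenly, so it is excluded; only queries with a clear single target are annotated automatically.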

6.
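Performance Optimization Strategies and Related Techniques for Web Search Engines   Total citations: 5, self-citations: 0, citations by others: 5  
Because their results are often inaccurate, Web search engines sometimes fail to meet users' information needs. Search engine performance can therefore be optimized by building on traditional search engine technology and applying other theories and techniques to improve precision. This paper proposes several strategies for optimizing Web search engine performance and discusses the related implementation techniques. Ranking results by the authority of Web resources and their relevance to the user's query can effectively improve result accuracy; concept-based information retrieval and automatic information classification can effectively expand and interpret user queries semantically, better meeting user needs; and personalized and domain-specific querying are also important ways to improve search engine performance.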
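
Ranking by authority and relevance together can be sketched as a linear blend of a query-relevance score with a link-based authority score. PageRank is used here as one common authority measure; the abstract does not name a specific one. The link graph, relevance values, and the mixing weight `alpha = 0.7` are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping page -> list of linked pages."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank


def rank_results(relevance, authority, alpha=0.7):
    """Order documents by alpha * relevance + (1 - alpha) * authority scaled to [0, 1]."""
    top = max(authority.values(), default=0.0) or 1.0
    score = {d: alpha * r + (1 - alpha) * authority.get(d, 0.0) / top for d, r in relevance.items()}
    return sorted(score, key=lambda d: (-score[d], d))


links = {"a": ["c"], "b": ["c"], "c": ["a"]}  # hypothetical link graph
authority = pagerank(links)
print(rank_results({"a": 0.6, "b": 0.6, "c": 0.5}, authority))  # ['a', 'c', 'b']
```

Pages "a" and "b" are equally relevant, but "a" is linked from the well-connected page "c", so authority breaks the tie in its favor.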

7.
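This paper analyzes the weaknesses of metasearch engines that classify results automatically by statistical methods, proposes a system model for a metasearch engine that classifies results automatically based on an ontology, describes how the main steps are implemented, and analyzes the role of the ontology in automatic classification for metasearch. Using the semantic understanding provided by a domain ontology, the system offers semantic expansion of query concepts, giving the metasearch engine a clear, logical, and systematic category structure and more accurate classification.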
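
A minimal sketch of ontology-driven query expansion and result classification, assuming a hypothetical two-concept ontology stored as a dictionary (concept mapped to synonyms and narrower terms). A real system would load a curated domain ontology with richer relations; this only illustrates the two roles the abstract assigns to it.

```python
from typing import Optional

# Hypothetical toy domain ontology: concept -> terms that express it
# (synonyms and narrower concepts). A real system would load a curated ontology.
ONTOLOGY = {
    "database": {"sql", "query optimization", "index", "transaction"},
    "networking": {"network", "tcp", "routing", "protocol"},
}


def expand_query(query: str) -> set:
    """Add the concept and its related terms when the query names a known concept or term."""
    q = query.lower()
    terms = {q}
    for concept, related in ONTOLOGY.items():
        if q == concept or q in related:
            terms |= related | {concept}
    return terms


def classify(snippet: str) -> Optional[str]:
    """Assign a result snippet to the concept whose terms it mentions most often."""
    text = snippet.lower()
    scores = {
        concept: sum(term in text for term in related | {concept})
        for concept, related in ONTOLOGY.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(sorted(expand_query("SQL")))
print(classify("TCP routing in data center networks"))  # networking
```

Snippets that match no concept return None and could be placed in an "other" category.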

8.
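Liu Peng, Zou Hua. Software, 2012, (11): 214-217
Web service search engines, designed on the principles of vertical search engines, meet users' needs for Web service queries better than the traditional UDDI-based service discovery approach. As service search engine technology develops, evaluating retrieval effectiveness has become a core problem in improving the quality of service search. This paper proposes a method for automatically evaluating the performance of Web service search engines based on user behavior analysis and, taking the characteristics of Web services into account, a method for partitioning the sample set based on QoS data. By analyzing users' query and click behavior, the method derives the set of retrieval results for a specific set of queries and automatically establishes a mapping between the two sets. Comparing the proposed method with manual annotation in evaluating the search effectiveness of Web service search engines shows that the user-behavior-based evaluation algorithm can evaluate service search engines fairly objectively.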
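
The abstract does not say which QoS attributes or boundaries are used to partition the sample set. As a hypothetical sketch only, each service sample below gets an equal-weight score from response time and availability and is bucketed into low, medium, and high tiers; the attributes, the 2000 ms ceiling, and the 0.4/0.7 thresholds are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class ServiceSample:
    name: str
    response_ms: float   # hypothetical QoS attribute: average response time
    availability: float  # hypothetical QoS attribute: fraction of successful calls, in [0, 1]


def qos_score(s: ServiceSample, max_response_ms: float = 2000.0) -> float:
    """Hypothetical equal-weight QoS score in [0, 1]; faster and more available is better."""
    speed = max(0.0, 1 - s.response_ms / max_response_ms)
    return 0.5 * speed + 0.5 * s.availability


def partition_by_qos(samples, low=0.4, high=0.7):
    """Split samples into low / medium / high QoS tiers by score thresholds."""
    tiers = {"low": [], "medium": [], "high": []}
    for s in samples:
        score = qos_score(s)
        tier = "high" if score >= high else "medium" if score >= low else "low"
        tiers[tier].append(s.name)
    return tiers


samples = [
    ServiceSample("fast", 200, 0.99),
    ServiceSample("mid", 1000, 0.8),
    ServiceSample("slow", 1800, 0.6),
]
print(partition_by_qos(samples))  # {'low': ['slow'], 'medium': ['mid'], 'high': ['fast']}
```

Evaluating each tier separately keeps a handful of high-QoS services from dominating the evaluation of the whole result set.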

9.
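Automatic Identification of Deep Web Query Interfaces   Total citations: 5, self-citations: 1, citations by others: 5  
Traditional search engines can index only surface Web pages. Yet the depths of the Web hold a large amount of high-quality information, and for technical reasons traditional search engines cannot index these pages, known as the Deep Web. Because query interfaces are the only entry point to the Deep Web, obtaining Deep Web information requires determining which Web forms are Deep Web query interfaces. This paper presents a method that uses a naive Bayes classifier to determine automatically whether a Web form is a Deep Web query interface, and experimentally validates its effectiveness.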
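
A naive Bayes form classifier can be sketched as a Bernoulli model over binary form traits with Laplace smoothing. The feature names (text input present, password field, a "search"-like keyword, and so on) and the six training forms are hypothetical; the paper's actual feature set is not given in the abstract.

```python
import math
from collections import defaultdict


def train_nb(examples):
    """examples: list of (feature_set, label); features are binary form traits."""
    label_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: defaultdict(int))
    vocabulary = set()
    for features, label in examples:
        label_counts[label] += 1
        for f in features:
            feature_counts[label][f] += 1
            vocabulary.add(f)
    return label_counts, feature_counts, vocabulary


def predict_nb(model, features):
    """Return the label with the highest Bernoulli naive Bayes log-posterior."""
    label_counts, feature_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)
        for f in vocabulary:
            # Laplace-smoothed P(feature present | label); absent features also contribute.
            p = (feature_counts[label][f] + 1) / (count + 2)
            score += math.log(p if f in features else 1 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training forms described by hand-picked traits.
train = [
    ({"text_input", "submit", "kw_search", "select"}, "query_interface"),
    ({"text_input", "submit", "kw_search"}, "query_interface"),
    ({"text_input", "select", "submit"}, "query_interface"),
    ({"password", "text_input", "submit"}, "other"),
    ({"password", "text_input", "kw_login", "submit"}, "other"),
    ({"text_input", "kw_subscribe", "submit"}, "other"),
]
model = train_nb(train)
print(predict_nb(model, {"text_input", "kw_search", "submit"}))  # query_interface
```

A login form with a password field is rejected, even though it shares a text input and submit button with real query interfaces.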

10.
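This paper analyzes the shortcomings of traditional search engines and proposes a new model for a category-based search engine built on automatic web page classification. It focuses on text classification using rough sets, proposes a decision-table reduction algorithm based on a feature matrix, and uses this algorithm to implement an automatic web page classifier.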
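
The abstract does not describe the feature-matrix reduction itself. For orientation, the sketch below shows a standard rough-set alternative: build the discernibility entries of a decision table (for each pair of objects with different classes, the attributes on which they differ), then greedily pick attributes until every entry is covered (a Johnson-style heuristic). This is not the paper's algorithm, and the greedy result is a reduct candidate that is not guaranteed to be minimal. The term-presence table is hypothetical.

```python
from itertools import combinations


def discernibility_entries(rows, decisions):
    """For each pair of objects with different decisions, the attributes on which they differ."""
    entries = []
    for i, j in combinations(range(len(rows)), 2):
        if decisions[i] != decisions[j]:
            diff = frozenset(a for a in rows[i] if rows[i][a] != rows[j][a])
            if diff:  # an empty set would mean an inconsistent pair; skip it
                entries.append(diff)
    return entries


def greedy_reduct(rows, decisions):
    """Johnson-style heuristic: repeatedly pick the attribute that resolves the most pairs."""
    remaining = discernibility_entries(rows, decisions)
    reduct = set()
    while remaining:
        counts = {}
        for entry in remaining:
            for a in entry:
                counts[a] = counts.get(a, 0) + 1
        # Ties are broken alphabetically so the result is deterministic.
        best = max(sorted(counts), key=counts.get)
        reduct.add(best)
        remaining = [e for e in remaining if best not in e]
    return reduct


# Hypothetical decision table: term presence (1/0) in four pages and their categories.
rows = [
    {"sports": 1, "score": 1, "election": 0},
    {"sports": 1, "score": 0, "election": 0},
    {"sports": 0, "score": 0, "election": 1},
    {"sports": 0, "score": 1, "election": 1},
]
decisions = ["sports", "sports", "politics", "politics"]
print(greedy_reduct(rows, decisions))  # {'election'}
```

The reduced attribute set still separates every pair of differently labeled pages, so the classifier can ignore the other terms.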

11.
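Wang Xingyong, Dai Li, Yu Jianhua. Computer Engineering, 2002, 28(12): 134-135, 265
Although general-purpose search engines are widely used, their ability to filter irrelevant results out of query results has never been satisfactory, so some searches call for topic-specific search engines. This paper proposes a query-routing architecture for Chinese topic-specific search engines that finds suitable topic-specific engines for a user's query and identifies the best one. In addition to describing the architecture, the paper presents the query expansion and clustering algorithms it uses.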
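
A query router of this kind can be sketched as: expand the query, then rank topic-specific engines by cosine similarity between the expanded query and each engine's term profile. The engine profiles and the expansion table below are hypothetical stand-ins; the paper's own expansion and clustering algorithms are not shown.

```python
import math
from collections import Counter

# Hypothetical term profiles of topic-specific engines (term -> frequency in each engine's index).
ENGINE_PROFILES = {
    "medical_search": Counter({"disease": 50, "treatment": 40, "drug": 30, "symptom": 25}),
    "legal_search": Counter({"law": 60, "contract": 35, "court": 30, "regulation": 20}),
}
# Hypothetical expansion table standing in for the paper's query-expansion step.
EXPANSIONS = {"medicine": ["drug", "treatment"], "lawsuit": ["court", "law"]}


def cosine(a, b):
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def route(query, top_k=1):
    """Expand the query, then rank engines by cosine similarity to their term profiles."""
    terms = query.lower().split()
    expanded = Counter(terms)
    for t in terms:
        expanded.update(EXPANSIONS.get(t, []))
    ranked = sorted(ENGINE_PROFILES, key=lambda e: cosine(expanded, ENGINE_PROFILES[e]), reverse=True)
    return ranked[:top_k]


print(route("medicine side effects"))  # ['medical_search']
```

Without expansion, "medicine side effects" shares no terms with either profile; the expansion step is what makes routing possible for such queries.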

12.
This study analyzes users' queries submitted to different search engines in order to investigate trends and suggest better search engine capabilities. The distribution of queries among search engines, including query spawning, the number of terms per query, and query lengths, is discussed to highlight the principal factors affecting a user's choice of search engine and to evaluate the reasons for variation in query length. The results could help search engine service providers develop short- to long-term business plans and decide whether to offer more focused, topic-specific search services to gain a larger market share.
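
The per-engine measures named above (share of queries, terms per query, query length) are simple to compute from a log. The sketch assumes a hypothetical log of (engine, query) pairs and whitespace tokenization; query spawning is not covered.

```python
from collections import Counter, defaultdict
from statistics import mean


def query_stats(log):
    """log: iterable of (engine, query) pairs; returns per-engine query statistics."""
    log = list(log)
    by_engine = defaultdict(list)
    for engine, query in log:
        by_engine[engine].append(query)
    stats = {}
    for engine, queries in by_engine.items():
        term_counts = [len(q.split()) for q in queries]
        stats[engine] = {
            "share": len(queries) / len(log),              # fraction of all logged queries
            "mean_terms": mean(term_counts),                # average terms per query
            "mean_chars": mean(len(q) for q in queries),    # average query length in characters
            "terms_histogram": Counter(term_counts),        # distribution of terms per query
        }
    return stats


log = [("A", "cheap flights"), ("A", "weather"), ("B", "python list sort")]
stats = query_stats(log)
print(stats["A"]["mean_terms"], stats["B"]["mean_terms"])  # 1.5 3
```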

13.
This paper presents a simple and intuitive method for mining search engine query logs for fast social filtering, in which searchers are given dynamic query recommendations on a large-scale, industrial-strength search engine. We adopt a dynamic approach that absorbs new and recent trends in web usage on search engines while forgetting outdated ones, thus adapting to changes in web users' interests. To obtain well-rounded recommendations, we combine two methods. First, we model search engine users' sequential search behavior and interpret consecutive searches as client-side query refinement, which should form the basis of the search engine's own query refinement process; this process is exploited to learn information that helps generate related queries. Second, we combine this method with a traditional text- or content-based similarity method to compensate for the shortness of query sessions and the sparsity of real query log data.
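
The two-part recommender can be sketched as follows, under hypothetical choices: consecutive query pairs within a session count as refinements; all stored evidence is multiplied by a decay factor on each update, so old trends fade; and the normalized refinement score is blended with Jaccard term overlap to cover sparse sessions. The decay of 0.9 and the mixing weight of 0.7 are assumptions, not values from the paper.

```python
from collections import defaultdict


class QueryRecommender:
    """Blend decayed session-refinement evidence with term overlap (illustrative sketch)."""

    def __init__(self, decay=0.9, alpha=0.7):
        self.decay = decay  # forgetting factor applied to old evidence on every update
        self.alpha = alpha  # weight of session evidence vs. text similarity
        self.pair_scores = defaultdict(float)  # (query, next_query) -> decayed count
        self.known_queries = set()

    def absorb_sessions(self, sessions):
        """Fade existing evidence, then count consecutive query pairs as refinements."""
        for key in list(self.pair_scores):
            self.pair_scores[key] *= self.decay
        for session in sessions:
            self.known_queries.update(session)
            for prev, nxt in zip(session, session[1:]):
                if prev != nxt:
                    self.pair_scores[(prev, nxt)] += 1.0

    @staticmethod
    def jaccard(a, b):
        ta, tb = set(a.split()), set(b.split())
        union = ta | tb
        return len(ta & tb) / len(union) if union else 0.0

    def recommend(self, query, k=3):
        follow_ups = [s for (p, _), s in self.pair_scores.items() if p == query]
        max_pair = max(follow_ups, default=0.0)
        scores = {}
        for cand in self.known_queries - {query}:
            session = self.pair_scores.get((query, cand), 0.0)
            session = session / max_pair if max_pair else 0.0
            scores[cand] = self.alpha * session + (1 - self.alpha) * self.jaccard(query, cand)
        ranked = sorted(scores, key=lambda c: (-scores[c], c))
        return [c for c in ranked[:k] if scores[c] > 0]


rec = QueryRecommender()
rec.absorb_sessions([["jaguar", "jaguar car price"], ["jaguar", "jaguar car price"], ["jaguar", "jaguar animal"]])
print(rec.recommend("jaguar"))  # ['jaguar car price', 'jaguar animal']
```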

14.
The general public is increasingly using search engines to seek information on risks and threats. Based on a three-month search log from a large search engine, this study explores user patterns of query submission and subsequent clicks in sessions for two important risk-related topics, healthcare and information security, and compares them with other randomly sampled sessions. We investigate two session-level metrics reflecting users' interactivity with a search engine: session length and query click rate. Drawing on information foraging theory, we find that session length can be characterized well by the inverse Gaussian distribution. Among the three types of sessions (healthcare, information security, and other randomly sampled sessions), healthcare sessions have the most queries and the highest query click rate, and information security sessions have the lowest query click rate. In addition, sessions initiated by users with a higher search engine activity level tend to have more queries and higher query click rates. Among the three types of sessions, search engine activity level has the strongest effect on query click rate for information security sessions and the weakest for healthcare sessions. We discuss the theoretical and practical implications of the study.
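
Both session-level metrics can be computed directly. The inverse Gaussian has closed-form maximum-likelihood estimates, $\hat{\mu} = \bar{x}$ and $1/\hat{\lambda} = \frac{1}{n}\sum_i \left(1/x_i - 1/\bar{x}\right)$, which the sketch implements. The session data and the (query, click_count) format are hypothetical.

```python
from statistics import mean


def fit_inverse_gaussian(lengths):
    """Closed-form maximum-likelihood estimates (mu, lambda) for an inverse Gaussian.

    Requires positive values that are not all identical (otherwise lambda is unbounded).
    """
    mu = mean(lengths)
    inv_lambda = mean(1 / x - 1 / mu for x in lengths)
    return mu, 1 / inv_lambda


def query_click_rate(session):
    """session: list of (query, click_count); share of queries that received at least one click."""
    return sum(1 for _, clicks in session if clicks > 0) / len(session)


# Hypothetical session lengths (queries per session).
mu, lam = fit_inverse_gaussian([1, 1, 2, 3, 5])
print(round(mu, 2), round(lam, 2))  # 2.4 5.26
print(query_click_rate([("q1", 1), ("q2", 0), ("q3", 2), ("q4", 0)]))  # 0.5
```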

15.
This paper describes and evaluates a unified approach to phrasal query suggestions in the context of a high-precision search engine. The search engine performs ranked extended-Boolean searches, with the proximity operator near as the default operation. Suggestions are offered to the searcher when the length of the result list falls outside predefined bounds. If the list is too long, the engine specializes the query through the use of super phrases; if the list is too short, the engine generalizes the query through the use of proximal subphrases. We describe methods for generating both types of suggestions and present algorithms for ranking them. Specifically, we present the problem of counting proximal subphrases for generalization and the problem of counting unordered super phrases for specialization. The uptake of our approach was evaluated by analyzing search log data from before and after the suggestion feature was added to a commercial version of the search engine. We examined approximately 1.5 million queries and found that, once added, suggestions accounted for nearly 30% of all queries. Efficacy was evaluated through a controlled study in which 24 participants performed nine searches using three different search engines. The engine with phrasal query suggestions achieved better high-precision recall than both the same engine without suggestions and a search engine with a similar interface that used the Okapi BM25 ranking algorithm.
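
The bounds-driven suggestion logic can be sketched as follows: too many results trigger specialization with super phrases, too few trigger generalization with subphrases. Here "subphrase" is simplified to dropping one term, and super phrases come from a hypothetical phrase index; proximity handling and the paper's ranking algorithms are not shown, and the bounds of 5 and 500 are assumptions.

```python
def subphrases(query):
    """Generalize: phrases formed by dropping one term, keeping the remaining order."""
    terms = query.split()
    if len(terms) < 2:
        return []
    return [" ".join(terms[:i] + terms[i + 1:]) for i in range(len(terms))]


def super_phrases(query, phrase_index):
    """Specialize: known phrases that contain every query term plus at least one more (any order)."""
    terms = set(query.split())
    return [p for p in phrase_index if terms < set(p.split())]


def suggest(query, result_count, phrase_index, lower=5, upper=500):
    """Offer suggestions only when the result count falls outside [lower, upper]."""
    if result_count > upper:
        return super_phrases(query, phrase_index)
    if result_count < lower:
        return subphrases(query)
    return []


phrases = ["solar panel efficiency", "panel solar cost", "wind turbine"]  # hypothetical phrase index
print(suggest("solar panel", 10_000, phrases))  # ['solar panel efficiency', 'panel solar cost']
print(suggest("cheap solar panel", 2, phrases))  # ['solar panel', 'cheap panel', 'cheap solar']
```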

16.
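Identifying the query intent of search engine users is a research topic that has attracted much attention in information retrieval. This paper proposes a method for identifying Web query intent by fusing multiple types of features. Web query intent identification is treated as a classification problem, and effective classification features are extracted from different types of resources, including the query text, the content returned by the search engine, and Web query logs. In query intent identification experiments on a manually annotated corpus of real Web queries, each type of feature helped improve identification; when all the features were used together, the intent of 88.5% of the test queries was identified correctly.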
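
The abstract does not name the classifier or list the features, so the sketch swaps in binary logistic regression (navigational vs. informational) trained by batch gradient descent. It concatenates one hypothetical feature from each resource type mentioned: the query text (mentions a site name), the returned results (share of top results from one domain), and the query log (click entropy). The six training examples are invented for illustration.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


def train_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient-descent logistic regression on fused feature vectors."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w, b, x):
    """Probability that the query is navigational."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical fused features per query:
# [mentions a site name (0/1), share of top results from one domain, click entropy in the log]
X = [[1, 0.9, 0.2], [1, 0.8, 0.3], [0, 0.7, 0.4], [0, 0.2, 2.5], [0, 0.3, 2.0], [1, 0.1, 2.8]]
y = [1, 1, 1, 0, 0, 0]  # 1 = navigational, 0 = informational
w, b = train_logistic(X, y)
print(predict(w, b, [1, 0.85, 0.25]) > 0.5)
```

Feature fusion here is just concatenation; dropping one group of columns is a quick way to measure how much each resource type contributes.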

17.
18.
19.
In this paper, we present LESSON, a system for searching and sharing lecture notes that is designed for both instructors and students to support their Web-based teaching and learning activities effectively. LESSON employs a metasearch engine to search the Web for lecture notes and a peer-to-peer (P2P) overlay network to share lecture notes among users. A metasearch engine provides unified access to multiple existing component search engines and performs better than general-purpose search engines. With a P2P overlay network, all computers used by instructors and students can be connected into a virtual community over the Internet and communicate directly with each other to share lecture notes, without any centralized server or control. To merge results from multiple component search engines into a single ranked list, we design the RSF strategy, which takes the rank, similarity, and features of lecture notes into account. To make query-routing decisions that effectively support lecture-note sharing, we propose a novel query routing mechanism. Experimental results indicate that LESSON searches the Web for lecture notes better than some popular general-purpose search engines and some existing metasearch schemes, and that, for queries processed within the system, it outperforms some typical routing methods. Specifically, it achieves a relatively high query hit rate with low bandwidth consumption across different types of network topologies.
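
The abstract names the three inputs to RSF (rank, similarity, features) but not how they are combined. As a hypothetical sketch, each result below scores `w_rank / position + w_sim * similarity + w_feat * feature_score`, and a URL returned by several engines keeps its best score. The weights and the per-result feature score are assumptions.

```python
def merge_results(result_lists, w_rank=0.4, w_sim=0.4, w_feat=0.2):
    """Merge per-engine result lists into one ranking.

    result_lists: {engine: [(url, similarity, feature_score), ...]} in each engine's rank order.
    Weights are illustrative; a URL returned by several engines keeps its best score.
    """
    merged = {}
    for results in result_lists.values():
        for position, (url, sim, feat) in enumerate(results, start=1):
            score = w_rank / position + w_sim * sim + w_feat * feat
            merged[url] = max(score, merged.get(url, 0.0))
    return sorted(merged, key=lambda u: (-merged[u], u))


lists = {
    "engine_a": [("u1", 0.9, 0.5), ("u2", 0.4, 0.9)],
    "engine_b": [("u2", 0.8, 0.9), ("u3", 0.3, 0.1)],
}
print(merge_results(lists))  # ['u2', 'u1', 'u3']
```

Keeping the maximum (rather than summing) avoids rewarding a URL merely for appearing in many engines; summing is a reasonable alternative if consensus should count.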

20.
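Liu Denghong, Xu Xian. Computer Science, 2017, 44(10): 234-236, 258
With the spread of the Internet, online search has become people's main way of obtaining information. Current search engines operate relatively independently and have limited coverage; by comparison, metasearch can better satisfy users' retrieval needs. When a user enters a query in the unified interface provided by a metasearch engine, the metasearch engine sends the processed request to the relevant member search engines. An important problem, however, is how to identify the potentially useful search engines so that the user's request is handled better. To address this, the paper proposes a selection mechanism based on a genetic algorithm that takes the weight of each member search engine into account. Experimental results show that the method improves both the efficiency and the precision of engine selection.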
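
A genetic-algorithm engine selector can be sketched with each genome as a 0/1 vector (1 means the query is sent to that member engine). The fitness function is hypothetical: the sum of engine weight times estimated relevance for the selected engines, set to 0 when a cost budget is exceeded. Tournament selection, one-point crossover, bit-flip mutation, and two-genome elitism are standard GA choices, not details confirmed by the abstract.

```python
import random


def fitness(genome, weights, relevance, costs, budget):
    """Weighted expected relevance of the selected engines; 0 if the cost budget is exceeded."""
    if sum(c for g, c in zip(genome, costs) if g) > budget:
        return 0.0
    return sum(w * r for g, w, r in zip(genome, weights, relevance) if g)


def select_engines(weights, relevance, costs, budget,
                   pop_size=20, generations=40, mutation=0.1, seed=0):
    """Genetic search over 0/1 genomes (1 = send the query to that engine); needs >= 2 engines."""
    rng = random.Random(seed)
    n = len(weights)

    def score(genome):
        return fitness(genome, weights, relevance, costs, budget)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        next_gen = sorted(population, key=score, reverse=True)[:2]  # elitism
        while len(next_gen) < pop_size:
            # Tournament selection: best of three random genomes, twice.
            p1 = max(rng.sample(population, 3), key=score)
            p2 = max(rng.sample(population, 3), key=score)
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = [1 - g if rng.random() < mutation else g for g in p1[:cut] + p2[cut:]]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=score)
    return best, score(best)


# Hypothetical member engines: weights, estimated relevant results, and query costs.
weights, relevance, costs, budget = [0.9, 0.5, 0.8, 0.3], [10, 8, 6, 9], [3, 2, 2, 1], 5
best, value = select_engines(weights, relevance, costs, budget)
print(best, value)
```

With only four engines, exhaustive search would be cheaper; a GA becomes worthwhile when the number of member engines makes the 2^n subsets impractical to enumerate.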


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417