Found 20 similar documents; search took 15 ms
1.
2.
3.
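It is hard to find accurate information in the vast amount of Internet/Web information. Search engine technology addresses the difficulty users face in retrieving Web information, yet the results returned by existing search engines do not always satisfy users. Based on a survey of multi-agent system (MAS) theory, this paper proposes a MAS-based search engine model and compares and analyzes it against the well-known Google search engine.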
4.
5.
《Expert systems with applications》2014,41(2):331-341
Time plays an important role in Web search, because most Web pages contain temporal information and many Web queries are time-related. How to integrate temporal information into Web search engines has been a research focus in recent years. However, traditional search engines offer little support for processing temporal-textual Web queries. To solve this problem, in this paper we concentrate on extracting the focused time of Web pages, which refers to the most appropriate time associated with a Web page, and then use the focused time to improve search efficiency for time-sensitive queries. In particular, three critical issues are studied in depth. The first is to extract implicit temporal expressions from Web pages. The second is to determine the focused time among all the extracted temporal information, and the last is to integrate focused time into a search engine. For the first issue, we propose a new dynamic approach to resolve the implicit temporal expressions in Web pages. For the second issue, we present a score model to determine the focused time for Web pages. Our score model takes into account both the frequency of temporal information in Web pages and the containment relationship among temporal information. For the third issue, we combine the textual similarity and the temporal similarity between queries and documents in the ranking process. To evaluate the effectiveness and efficiency of the proposed approaches, we build a prototype system called Time-Aware Search Engine (TASE). TASE is able to extract both the explicit and implicit temporal expressions of Web pages, calculate the relevance score between Web pages and each temporal expression, and re-rank search results based on the temporal-textual relevance between Web pages and queries. Finally, we conduct experiments on real data sets. The results show that our approach has high accuracy in resolving implicit temporal expressions and extracting focused time, and has better ranking effectiveness for time-sensitive Web queries than its competitor algorithms.
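The abstract above names two ingredients: a score that combines frequency with containment among temporal expressions, and a ranking that mixes textual and temporal similarity. The following is a minimal sketch of that idea only, not the TASE implementation. The interval representation, the scoring formula, the weights `beta` and `alpha`, and all function names are assumptions made for illustration.

```python
from collections import Counter
from datetime import date

# Assumption: a temporal expression is modelled as a closed interval of dates.
Interval = tuple[date, date]


def contains(outer: Interval, inner: Interval) -> bool:
    """True if `inner` lies inside `outer` and the two are not identical."""
    return outer != inner and outer[0] <= inner[0] and inner[1] <= outer[1]


def focused_time(expressions: list[Interval], beta: float = 0.5) -> Interval:
    """Return the interval with the highest illustrative score.

    score(t) = freq(t) + beta * sum(freq(u) for every u contained in t)
    The formula and `beta` are hypothetical, not taken from the paper.
    """
    freq = Counter(expressions)

    def score(t: Interval) -> float:
        return freq[t] + beta * sum(f for u, f in freq.items() if contains(t, u))

    return max(freq, key=score)


def temporal_similarity(a: Interval, b: Interval) -> float:
    """Overlap of two intervals divided by their joint span, in days."""
    overlap = (min(a[1], b[1]) - max(a[0], b[0])).days + 1
    if overlap <= 0:
        return 0.0
    span = (max(a[1], b[1]) - min(a[0], b[0])).days + 1
    return overlap / span


def combined_score(text_sim: float, page_time: Interval,
                   query_time: Interval, alpha: float = 0.6) -> float:
    """Linear mix of textual and temporal similarity (alpha is an assumption)."""
    return alpha * text_sim + (1 - alpha) * temporal_similarity(page_time, query_time)


# Usage: a page that mentions "May 2010" three times, "2010" once, "June 2011" once.
year_2010 = (date(2010, 1, 1), date(2010, 12, 31))
may_2010 = (date(2010, 5, 1), date(2010, 5, 31))
june_2011 = (date(2011, 6, 1), date(2011, 6, 30))
page_focus = focused_time([year_2010, may_2010, may_2010, may_2010, june_2011])
```

In this toy page the most frequent fine-grained expression wins even though the coarser year contains it; a larger `beta` would shift the choice toward the containing interval.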
6.
Juanzi LI Jie TANG Jing ZHANG Qiong LUO Yunhao LIU Mingcai HONG 《Frontiers of Computer Science in China》2008,2(1):94-105
Expertise Oriented Search (EOS) aims at providing comprehensive expertise analysis on data from distributed sources. It is useful in many application domains, for example, finding experts on a given topic, detecting conflicts of interest between researchers, and assigning reviewers to proposals. In this paper, we present the design and implementation of our expertise oriented search system, Arnetminer. Arnetminer has gathered and integrated information about half a million computer science researchers from the Web, including their profiles and publications. Moreover, Arnetminer constructs a social network among these researchers through their co-authorship, and uses this network information as well as the individual profiles to facilitate expertise oriented search tasks. In particular, the co-authorship information is used both in ranking the expertise of individual researchers for a given topic and in searching for associations between researchers. We have conducted initial experiments on Arnetminer. Our results demonstrate that the proposed relevancy propagation expert finding method outperforms the method that uses only a person's local information, and that the proposed two-stage association search on a large-scale social network is an order of magnitude faster than the baseline method.
7.
8.
Modeling multiple interactions with a Markov random field in query expansion for session search
How to automatically understand and answer users' questions (e.g., queries issued to a search engine) expressed in natural language has become an important yet difficult problem across the research fields of information retrieval and artificial intelligence. In a typical interactive Web search scenario, namely session search, the user usually interacts with the search engine for several rounds to obtain relevant information, in the form of, e.g., query reformulations, clicks, and skips. These interactions are usually mixed and intertwined with each other in a complex way. Ideally, an intelligent search engine can be seen as an artificial intelligence agent that is able to infer what information the user needs from these interactions. However, there is still a big gap between the current state of the art and this goal. In this paper, in order to bridge the gap, we propose a Markov random field–based approach to capture dependence relations among interactions, queries, and clicked documents for automatic query expansion (as a way of inferring the information needs of the user). An extensive empirical evaluation is conducted on large‐scale web search data sets, and the results demonstrate the effectiveness of our proposed models.
9.
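Web search is currently the main means of obtaining information from the Internet, and Web spiders are the main way most Web search tools collect that information; the topic-focused search strategy is the core technology of specialized search engines. By studying how Web spiders work, the paper analyzes their search strategies and search optimizations, and designs a Web spider that combines a search-depth limit, multithreading, and regular-expression matching. Experimental results show that the method can find the relevant topical information quickly and accurately.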
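The three ingredients named in this abstract (a depth limit, multithreading, and regular-expression matching) can be sketched as follows. This is an illustration under stated assumptions, not the paper's spider: the `crawl` function, the injected `fetch` callable, the link pattern, and the in-memory `FAKE_WEB` are all hypothetical, and no network access is performed.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Simplified link pattern (assumption): only double-quoted href attributes.
LINK_RE = re.compile(r'href="([^"]+)"')


def crawl(seed: str, fetch: Callable[[str], str], topic: re.Pattern,
          max_depth: int = 2, workers: int = 4) -> list[str]:
    """Breadth-first crawl; return the sorted URLs whose text matches `topic`."""
    seen = {seed}
    frontier = [seed]
    hits = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for depth in range(max_depth + 1):
            # Fetch all pages of the current level concurrently.
            pages = list(pool.map(fetch, frontier))
            next_frontier = []
            for url, html in zip(frontier, pages):
                if topic.search(html):
                    hits.append(url)
                if depth == max_depth:
                    continue  # depth limit reached: do not follow links
                for link in LINK_RE.findall(html):
                    if link not in seen:
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
            if not frontier:
                break
    return sorted(hits)


# Usage with a tiny in-memory "Web" standing in for real HTTP requests.
FAKE_WEB = {
    "a": '<a href="b">b</a> <a href="c">c</a> search engine',
    "b": '<a href="d">d</a> cooking',
    "c": "web spider and search engine",
    "d": "search engine deep page",
}


def fake_fetch(url: str) -> str:
    return FAKE_WEB.get(url, "")


topic = re.compile(r"search\s+engine")
shallow = crawl("a", fake_fetch, topic, max_depth=1)
deep = crawl("a", fake_fetch, topic, max_depth=2)
```

Passing `fetch` as a parameter keeps the sketch testable; a real spider would plug in an HTTP client and add politeness rules, which are outside what the abstract describes.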
10.
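Practice has shown that clustering is an effective way to improve how search results are presented. However, current clustering methods do not take user interests into account: for the same query, every user receives the same clustering result. The paper therefore proposes a personalized clustering retrieval method. The method improves the k-means algorithm and uses it to cluster the results returned by a traditional search engine in light of user interests, returning page clusters tailored to the specific user. Experiments show that the method provides personalized service, improves clustering quality, and raises users' retrieval efficiency.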
11.
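P2P-based Web search technology  Total citations: 4, self-citations: 0, citations by others: 4
Web search engines have become an important tool for quickly finding needed information in the mass of Web information. With the explosive growth of Web data, traditional centralized search engines are increasingly unable to meet people's growing information needs. With the rapid development of peer-to-peer (P2P) technology, P2P-based Web search has been proposed and has quickly become a research hotspot. This paper summarizes existing P2P-based Web search techniques in order to indicate directions for further research. It first analyzes the many challenges facing P2P-based Web search; it then summarizes and analyzes the state of research on its key techniques, including system topology, data placement strategy, query routing mechanism, index partitioning strategy, collection selection, relevance ranking, and Web page gathering methods; finally, it introduces three distinctive existing prototype systems for P2P-based Web search.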
12.
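With the rapid development of the Internet, traditional search engines show many shortcomings in coverage, query precision, scalability, and meeting users' diverse needs. This paper describes multi-search-engine technology in detail, together with a multi-search-engine system built on it. By integrating several currently popular search sites, the system provides more powerful search functionality and helps users obtain the information they need more quickly and effectively.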
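The abstract does not say how results from the integrated search sites are combined. As one possible illustration, the sketch below merges ranked lists with reciprocal rank fusion, a common rank-aggregation technique that is swapped in here as an assumption; the function name `merge_results`, the constant `k`, and the sample URLs are hypothetical.

```python
from collections import defaultdict


def merge_results(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked URL lists using reciprocal rank fusion.

    Each URL receives 1 / (k + rank) from every list it appears in;
    ties are broken alphabetically so the output is deterministic.
    """
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    return sorted(scores, key=lambda url: (-scores[url], url))


# Usage: two hypothetical engines return overlapping result lists.
engine_a = ["u1", "u2", "u3"]
engine_b = ["u2", "u4"]
merged = merge_results([engine_a, engine_b])
```

A URL returned by several engines accumulates score from each list, so agreement between engines pushes it toward the top, which is one way a multi-search-engine system could improve on any single source.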
13.
14.
15.
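This paper proposes an overall system design for building a topic-specific search engine for digital libraries. A preprocessing system selects high-quality seed sites wherever possible, producing the Web topic definition data; coordinated by the system controller, the topic crawlers synchronously collect the Web resources recommended by the crawlers and perform text classification and topic identification on the downloaded resources; the downloaded Web resources are stored by subject category in a Web topic resource repository, indexed through a global information repository, and exposed through a common interface for topic-based retrieval. Drawing on the characteristics of digital libraries, the paper proposes a design that supports multithreaded topic crawlers and a novel URL topic-relevance pruning algorithm, EPR, providing an important design basis for a prototype digital library topic search engine. Based on the open-source Lucene platform ...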
16.
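Online information search technology and search engines  Total citations: 6, self-citations: 1, citations by others: 6
With the rapid rise of the Internet worldwide, how to quickly find and obtain the needed information in the vast and complex Web space has become a major concern. The emergence of search engines has greatly helped Internet users by making fast and effective access to information possible. There are currently dozens of search engines on the Web, including Yahoo!, Excite, AltaVista, Lycos, Infoseek, OpenText, WebCrawler, and WWW Worm.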
17.
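Advances in Web information retrieval research  Total citations: 90, self-citations: 3, citations by others: 90
The large volume of distributed, dynamic information on the Web has caused "information overload", and how to carry out Web-oriented retrieval on the basis of traditional information retrieval techniques has become an important research topic. However, the multitude of Web information retrieval systems and a variety of vague concepts make it inconvenient for users to choose and for researchers to discuss. At the same time, reasonably complete analyses of the latest Web information retrieval techniques are scarce. This paper surveys Web information retrieval techniques, covering the hierarchical classification of Web information retrieval systems (search engines and directories, meta-search engines, information retrieval agents), their general mechanisms, and key new techniques (hyperlink-based relevance ranking, online clustering of search results, concept-based retrieval, relevance feedback), in the hope of serving as a reference for interested colleagues.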
18.
While small-scale search engines in specific domains and languages are increasingly used by Web users, most existing search engine development tools do not support the development of search engines in languages other than English, cannot be integrated with other applications, or rely on proprietary software. A tool that supports search engine creation in multiple languages is thus highly desired. To study the research issues involved, we review related literature and suggest the criteria for an ideal search tool. We present the design of a toolkit, called SpidersRUs, developed for multilingual search engine creation. The design and implementation of the tool, consisting of a Spider module, an Indexer module, an Index Structure, a Search module, and a Graphical User Interface module, are discussed in detail. A sample user session and a case study on using the tool to develop a medical search engine in Chinese are also presented. The technical issues involved and the lessons learned in the project are then discussed. This study demonstrates that the proposed architecture is feasible for easily developing search engines in different languages such as Chinese, Spanish, Japanese, and Arabic.
19.
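This paper combines deep Web discovery with topic-focused crawling and studies in depth the key techniques of a deep Web vertical search engine system. It first designs a deep Web topic crawling framework, which extends the traditional topic crawling framework with a front-end classifier that executes the crawling strategy and is incrementally updated at regular intervals. It then uses topic crawling to guide deep Web discovery and uses the open-source component Lucene to organize the information found by the topic crawler so that the retrieval interface can serve queries. When a user submits query terms to the search engine, Lucene by default ranks the results with its own relevance algorithm. Through the design of the crawler, the indexer, and the query interface, a prototype deep Web vertical search engine was implemented.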