20 similar documents found (search time: 15 ms)
1.
Web hyperlink structure analysis algorithms play a significant role in improving the precision of Web information retrieval. Current link analysis algorithms use an iterative function to compute the weight of each Web resource. The major drawback of this approach is that every Web document receives a fixed rank that is independent of the query. This paper proposes an improved algorithm that dynamically ranks the quality and relevance of a page according to the user's query. Experiments show that the proposed algorithm improves on the current link analysis algorithm.
2.
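With the rapid development and growing adoption of Web service applications, finding the Web services a user needs quickly and accurately has become one of the key problems constraining the development of Web services. Current Web service search techniques fall into four categories: UDDI registries, Web service portal sites, dedicated search engines, and general-purpose search engines. This paper reviews the main existing Web service search techniques in detail. Based on an analysis and comparison of typical techniques, it points out the need for a dedicated Web service search engine and the problems and challenges involved in building one.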
3.
The phenomenal growth of online Flash movies in recent years has made Flash one of the most prevalent media formats on the
Web. The retrieval and management issues of Flash, vital to the utilization of the enormous Flash resource, are unfortunately
overlooked by the research community. This paper presents the first piece of work (to the best of our knowledge) in this domain
by suggesting an integrated framework for the retrieval of Flash movies based on their content characteristics as well as
contextual information. The proposed approach consists of two major components: (1) a content-based retrieval component, which
explores the characteristics of Flash movie content at compositional and semantic levels; and (2) a context-based retrieval
component, which explores the contextual information including the texts and hyperlinks surrounding the movies. An experimental
Flash search engine system has been implemented to demonstrate the feasibility of the suggested framework.
The work described in this paper was supported substantially by a grant (Project No. 7001457), and partially by another grant
(Project No. 7001564), both from CityU of Hong Kong.
4.
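Search engine results presented as a flat list often prevent users from quickly finding the information they actually need. Clustering the results and presenting them in a structured form can overcome this problem. This paper describes the four parts of a clustering engine and the related techniques, and gives a preliminary analysis of system performance and open problems, providing a basis for implementing such systems.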
5.
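To obtain topic-relevant resources efficiently, this paper studies vertical search engines. First, building on the existing PageRank algorithm, an improved PageRank algorithm is proposed to measure the link similarity of Web pages. Second, at the level of a single page, a method for computing content-based similarity is given that uses each page's URL, title, and body text. Finally, content similarity and link similarity are combined into BLCT, a topic-focused crawling algorithm based on links and content. Experimental results show that the algorithm significantly improves average harvest rate and target recall, and that the topical relevance of the crawled pages also improves.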
6.
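Combining Web link analysis with content relevance analysis, this paper proposes EPR (Extended PageRank), an improved PageRank algorithm. It addresses relevance by analyzing the content similarity of pages and addresses authority by analyzing links. The algorithm leaves ample room for extending PageRank, and experiments show that with suitably chosen parameters EPR produces better rankings than the traditional PageRank algorithm.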
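The abstract does not give the EPR formula, so the following is only a minimal sketch of the general idea of blending link authority with content similarity. The linear blend, the `alpha` parameter, and all function names are assumptions made for illustration, not the authors' method.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Plain power-iteration PageRank.

    `links` maps each page to the list of pages it links to.
    Assumption: every link target also appears as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new_rank[q] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for q in pages:
                    new_rank[q] += damping * rank[p] / n
        rank = new_rank
    return rank


def combined_score(rank, similarity, alpha=0.5):
    """Hypothetical blend: alpha weighs authority, (1 - alpha) weighs relevance.

    `similarity` maps a page to a query-content similarity in [0, 1].
    """
    max_rank = max(rank.values())
    return {
        p: alpha * rank[p] / max_rank + (1.0 - alpha) * similarity.get(p, 0.0)
        for p in rank
    }
```

With `alpha=1.0` the score reduces to normalized PageRank, and with `alpha=0.0` it reduces to pure content similarity. This is one simple reading of the abstract's remark that results depend on choosing suitable parameters.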
7.
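To offer constructive suggestions to developers of metasearch engines and to guide ordinary users in choosing among them, this paper compares the search features of 20 typical metasearch engines from outside China and proposes the features a good metasearch engine should have. These cover the independent search engines it should call, the information elements the results page should contain, the processing modes that query submission should support, the options that personalized search should offer, and support for multilingual search.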
8.
George Chang, Gunjan Samtani, Marcus Healey, Franz Kurfess, Jason Wang 《Journal of Systems Integration》2001,10(3):253-267
Information retrieval has evolved from searches of references, to abstracts, to documents. Search on the Web involves search engines that promise to parse full text and other files: audio, video, and multimedia. With the indexable Web at 320 million pages and growing, difficulties with locating relevant information have become apparent. The most prevalent means of information retrieval relies on syntax-based methods: keywords or strings of characters are presented to a search engine, and it returns all the matches in the available documents. This method is satisfactory and easy to implement, but it has some inherent limitations that make it unsuitable for many tasks. Instead of looking for syntactic patterns, the user is often interested in the meaning of a keyword or the location of a particular word in a title or header. This paper describes some precise search approaches in the environmental domain that locate information according to syntactic criteria, augmented by the use of information in a certain context. The main emphasis of this paper lies in the treatment of structured knowledge, where essential aspects of the topic of interest are encoded not only by the individual items, but also by their relationships to one another. Examples of such structured knowledge are hypertext documents, diagrams, and logical and chemical formulae. The benefits of this approach are enhanced precision and approximate search in an already focused, context-specific search engine for the environment: EnviroDaemon.
9.
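Accelerated Evaluation Algorithm: A New Method for Improving the Quality of Web Structure Mining (total citations: 13, self-citations: 1, citations by others: 13)
Web structure mining can identify high-quality pages on the Web and greatly improves the retrieval precision of search engines. Current Web structure mining algorithms evaluate a page by counting the hyperlinks that point to it and weighing the quality of the source nodes. Algorithms based on counting links have a serious flaw: page evaluations become polarized. Established high-quality pages repeatedly appear at the top of Web search results, while high-quality pages newly added to the Web are hard for users to find. This paper proposes an accelerated evaluation algorithm to overcome this shortcoming of existing hyperlink analysis, and tests and validates it on a search engine platform.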
10.
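Design and Implementation of a Topic-Oriented Web Information Collection System (total citations: 7, self-citations: 0, citations by others: 7)
As information on the Internet continues to grow explosively, both the coverage and the retrieval precision of general-purpose search engines keep declining, and developing dedicated, topic-oriented retrieval tools has become a trend. The topic-oriented Web information collection system proposed here is the core component of such tools. It computes document relevance with the document vector model and filters pages using the context of page links; it adopts a modified version of the Shark heuristic search algorithm to find relevant pages; it can download in parallel on multiple machines to improve collection efficiency; and it updates dynamically according to site importance. The system was implemented in an Internet-facing search engine for computer teaching resources. The whole system runs on a low-end desktop PC and achieves high collection precision and efficiency for pages on the specified topic.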
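The relevance computation with a document vector model mentioned in this abstract can be illustrated with cosine similarity over term-frequency vectors. This is a generic sketch. Whitespace tokenization, raw term counts, the `threshold` value, and the function names are assumptions, and the abstract does not say which weighting the system actually uses.

```python
import math
from collections import Counter


def term_vector(text):
    """Turn text into a term-frequency vector (naive whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_on_topic(page_text, topic_text, threshold=0.3):
    """Hypothetical filter: keep a page if it is close enough to the topic."""
    score = cosine_similarity(term_vector(page_text), term_vector(topic_text))
    return score >= threshold
```

A crawler could call `is_on_topic` on each downloaded page and follow links only from pages that pass. A real system would more likely use TF-IDF weights and a proper tokenizer.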
11.
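This paper introduces the basic idea of relevance feedback, designs the functions and structure of a relevance feedback system for Web information retrieval, and explores how to use Java to implement a search engine interface with relevance feedback on top of existing public Web search engines.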
12.
The mass of heterogeneous, distributed, and dynamic information on the World Wide Web (the Web) has resulted in "information overload". Providing users with effective information retrieval services on the Web is an important and urgent research issue. Web search engines attempt to solve this problem, yet their effectiveness is far from satisfactory. This paper proposes a distributed and cooperative strategy for information retrieval on the Web to replace the centralized mode adopted by current search engines. It then presents a new information retrieval system model, IRSM, which supports the retrieval of metadata about Web documents and uses the Z39.50 standard protocol to unify the heterogeneous interfaces of different systems. On this basis, a distributed and cooperative information retrieval framework, called DCIRF, is designed to help users retrieve information on the Web quickly and effectively.
13.
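Chen Hao (陈浩) 《计算技术与自动化》 (Computing Technology and Automation) 2012,(3):120-123
The inverted file is a core technology of modern large-scale search engines. Its principle is simple, and it is flexible and efficient in that it can be adapted as needed. By studying throughput under given internal parameters of a search engine system, this paper builds a performance model for inverted files, and the model effectively improves their operating efficiency.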
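For readers unfamiliar with the inverted file discussed in this abstract, the structure itself can be sketched in a few lines. This illustrates only the basic data structure, not the performance model the paper builds. The function names and the whitespace tokenization are assumptions.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it.

    `docs` maps a document id to its text.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)
```

Query cost depends on the lengths of the posting sets rather than on the number of documents, which is why the structure suits large collections.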
14.
Background and objective: Medical social networking platforms provide virtual spaces for interaction between different healthcare participants. As part of this exchange, subscribers can upload medical images describing different medical cases for analysis or a proposed interpretation. Given the large volume of images expected to be uploaded daily, new mechanisms are needed to improve the search function for medical images based on what is uploaded. To address this issue, we set up visual image search based on a content-based medical image retrieval scheme. Such a mechanism helps and motivates subscribers of medical social networks to find visually similar stored images. Methods: The mechanism is based mainly on a fusion of three visual features, which offers flexibility and higher precision. It is reinforced by a weighted distance approach that assigns weights to the feature vectors to improve performance. In addition, the displayed results can be updated according to the user's intention through an interactive feedback mechanism that indicates the truly relevant images. Results: We provide the theoretical performance of our scheme. Extensive experiments were conducted on a categorically classified collection of 500 images. We carry out a practical evaluation on the classes of this dataset and compare the returned results with those of other models in the literature. Conclusions: The proposed scheme preserves the efficiency of the search task. As established theoretically and experimentally, it offers an effective image retrieval model that can support the expectations of different subscribers. The relevance feedback mechanism keeps the system dynamic, so search results continue to evolve.
Experimental outcomes indicate better results than the other models.
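The weighted distance over fused feature vectors described in the Methods part can be sketched as a weighted sum of per-feature distances. The abstract does not name the three features or the distance used, so the feature names (`color`, `texture`, `shape`), the Euclidean distance, and the function names below are all assumptions.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def fused_distance(query, image, weights):
    """Weighted sum of per-feature distances.

    `query` and `image` map a feature name to its vector;
    `weights` maps the same feature names to non-negative weights.
    """
    return sum(
        weights[name] * euclidean(query[name], image[name]) for name in weights
    )


def rank_images(query, database, weights):
    """Return image ids ordered from most to least similar to the query."""
    return sorted(
        database,
        key=lambda image_id: fused_distance(query, database[image_id], weights),
    )
```

A relevance feedback loop could then raise the weight of features on which user-approved images lie close to the query and call `rank_images` again. That update rule is likewise an assumption and is not taken from the paper.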
15.
16.
17.
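Distributed and Cooperative Web Information Retrieval Based on Metadata and Z39.50 (total citations: 21, self-citations: 0, citations by others: 21)
The large amount of heterogeneous, distributed, and dynamic information on the Web has caused "information overload". How to provide users with effective Web information retrieval has become an important research topic. Web search engines partly solve the retrieval problem, but their effectiveness is far from satisfactory. This paper proposes a distributed and cooperative strategy for Web information retrieval to replace the traditional centralized approach. It presents a new Web information retrieval system model that supports retrieval over the metadata of Web documents and adopts the Z39.50 protocol as the interface standard to overcome the access heterogeneity between different information retrieval systems. On this basis, a distributed and cooperative Web information retrieval framework is designed to help users retrieve Web information effectively.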
18.
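This paper first introduces the basic principles and structure of traditional search engines and points out their shortcomings, then introduces the definition, operating mechanism, and development direction of metasearch engines. On this theoretical basis, it proposes a user-based scheduling improvement for a new generation of metasearch engines. Experiments show that the improvement raises users' retrieval efficiency and quality.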
19.
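The Current State and Development of Metasearch Engines (total citations: 7, self-citations: 1, citations by others: 7)
Metasearch engines draw on the query capabilities of existing independent search engines, treating those engines as a whole and giving users a unified query interface and unified results. This paper introduces some of the better-known and popular metasearch engines currently on the Web and analyzes and summarizes recent research on metasearch engines, with the aim of providing a reference for further research.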
20.
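This paper analyzes the access behavior of Web robots and finds that a robot's access sequence generally does not form a path connected by links. After defining the concept of a user transaction, it proposes a detection algorithm based on transaction analysis. Experiments confirm that the algorithm can effectively detect unknown robots and robots that do not comply with the Robots Exclusion Standard.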
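The observation that robot access sequences rarely follow hyperlinks suggests a simple heuristic, sketched below. This is not the paper's transaction-based algorithm, which the abstract does not spell out. The linked-fraction measure, the `threshold` value, and the function names are assumptions.

```python
def linked_fraction(session, site_links):
    """Fraction of consecutive requests where the next page is linked from the previous one.

    `session` is the ordered list of pages one client requested;
    `site_links` maps a page to the collection of pages it links to.
    """
    if len(session) < 2:
        return 1.0  # Too short to judge; treat as consistent with browsing.
    linked = sum(
        1
        for prev, cur in zip(session, session[1:])
        if cur in site_links.get(prev, ())
    )
    return linked / (len(session) - 1)


def looks_like_robot(session, site_links, threshold=0.5):
    """Hypothetical rule: flag sessions whose requests mostly ignore the link structure."""
    return linked_fraction(session, site_links) < threshold
```

Because the check uses only request order and the site's link graph, it does not rely on the client identifying itself or honoring `robots.txt`, which fits the abstract's goal of catching unknown and non-compliant robots.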