1.
2.
3.
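Web service search engines designed on vertical-search-engine principles meet users' Web service query needs better than the traditional UDDI service discovery method. As service search engine technology develops, how to evaluate retrieval effectiveness has become a core problem in improving service search quality. This paper proposes a method for automatically evaluating the performance of Web service search engines based on user behavior analysis and, in light of the characteristics of Web services, a method for partitioning the sample set based on QoS data. By analyzing users' query and click behavior, it derives the result set that corresponds to a given query set and automatically establishes a mapping between the two sets. A comparison with manual annotation on the search results of a Web service search engine shows that the user-behavior-based evaluation algorithm can evaluate service search engines fairly objectively.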
4.
5.
6.
7.
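For keyword-based image retrieval, learning a binary classifier from the visual similarity of the retrieved results is expected to be one of the most effective ways to improve those results. To improve search engine results, this paper proposes an algorithmic framework and, within it, focuses on the key problem of training data selection. Training data selection consists of two stages: 1) initializing the training data to start classifier learning; and 2) dynamic data selection during iterative classifier learning. For initial training data selection, we examine a clustering-based method and a ranking-based method, and compare automatic training data selection with manual annotation. For dynamic data selection, we compare the classification performance of a support vector machine and a Bayesian classifier based on max-min posterior pseudo-probabilities. Combining the different methods of the two stages yields 8 algorithms, which we apply to keyword-based image retrieval with the Google search engine. The experimental results confirm that how training data is selected from noisy search results is the key to improving those results. The experiments show that our method effectively improves Google's results, especially the top-ranked ones. Presenting more relevant results earlier reduces the effort users spend paging through results one by one. How to make automatic training data selection match manual annotation remains a question for further study.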
8.
9.
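Search engines return too many results and cannot tailor them to the user's interests, a problem that currently attracts considerable attention. Combining a user interest model with the STC clustering algorithm, this paper proposes an improved STC algorithm, together with a personalized recommendation strategy and a method for updating interest descriptions, and implements a search-result-based personalized recommendation system (SRPRS). SRPRS uses the improved STC algorithm to organize search results automatically, helping users find the resources they need by topic. Experiments analyze the clustering and timing characteristics of SRPRS. Compared with the list display of a search engine, SRPRS performs well at quickly locating documents of interest to the user.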
10.
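A Hierarchical Method for Clustering Search Results  Total citations: 2, self-citations: 1, citations by others: 2
Clustering search results helps users browse the results returned by a search engine quickly. Traditional clustering methods are unsuitable because they cannot generate meaningful cluster labels. To improve hierarchical clustering of search results, this paper adopts a label-based clustering algorithm and proposes a cluster label extraction algorithm that fuses DF, query logs, and query-term context features. The extracted labels are used to build a base category graph, from which the GBCA algorithm constructs the hierarchical clustering. Experiments demonstrate the effectiveness of the multi-feature fusion model; GBCA improves substantially on the STC and Snaket algorithms in both cluster label extraction and F-Measure.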
11.
System performance assessment and comparison are fundamental for large-scale image search engine development. This article documents a set of comprehensive empirical studies to explore the effects of multiple query evidences on large-scale social image search. The search performance based on the social tags, different kinds of visual features and their combinations are systematically studied and analyzed. To quantify the visual query complexity, a novel quantitative metric is proposed and applied to assess the influences of different visual queries based on their complexity levels. Besides, we also study the effects of automatic text query expansion with social tags using a pseudo relevance feedback method on the retrieval performance. Our analysis of experimental results shows a few key research findings: (1) social tag-based retrieval methods can achieve much better results than content-based retrieval methods; (2) a combination of textual and visual features can significantly and consistently improve the search performance; (3) the complexity of image queries has a strong correlation with retrieval results’ quality; more complex queries lead to poorer search effectiveness; and (4) query expansion based on social tags frequently causes search topic drift and consequently leads to performance degradation.
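As a rough illustration of the pseudo relevance feedback idea mentioned in this abstract, the sketch below assumes that the top-ranked results are relevant and adds their most frequent social tags to the query. The function name `expand_query`, the parameters `top_k` and `n_new_terms`, and the sample tags are all hypothetical; the article's actual expansion method is not described in the abstract.

```python
from collections import Counter

def expand_query(query_terms, ranked_results, top_k=3, n_new_terms=2):
    """Pseudo relevance feedback (illustrative): treat the top_k results as
    relevant and append their most frequent tags that are not in the query."""
    counts = Counter()
    for tags in ranked_results[:top_k]:
        # Count each tag once per result, ignoring the original query terms.
        counts.update(set(tags) - set(query_terms))
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked_tags = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return list(query_terms) + [tag for tag, _ in ranked_tags[:n_new_terms]]

# Hypothetical tag lists for the results of the query "sunset", in rank order.
results = [
    ["sunset", "beach", "sea"],
    ["sunset", "sea", "sky"],
    ["sunset", "city"],
    ["cat"],
]
expanded = expand_query(["sunset"], results)
print(expanded)
```

In this toy data the expansion pulls in "beach", which appears in only one of the top results; terms like this are one way the topic drift reported in finding (4) could arise.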
12.
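Multi-Keyword Retrieval Based on a Meta Search Engine  Total citations: 7, self-citations: 1, citations by others: 7
Addressing shortcomings in how the main Chinese-language search engines, Google and Baidu, handle multi-keyword queries, this article proposes the concept of a "core keyword" and a "graded weight" calculation method. It further proposes a relevance algorithm that merges the results of the underlying search engines with web page content analysis, and an accuracy-deviation evaluation formula for the meta search engine. The study shows that the meta search engine not only eliminates dead links and duplicate links but also reduces accuracy deviation by 7.26% and 12.47% relative to Google and Baidu respectively, a modest gain in accuracy.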
13.
14.
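Clustering Web Search Results Using Link Analysis  Total citations: 3, self-citations: 0, citations by others: 3
With the rapid growth of information on the web, how to obtain high-quality web information effectively has become a popular research topic in many fields, such as databases and information retrieval. In information retrieval, web search engines are the most commonly used tools, yet current search engines are still far from satisfactory. Using link analysis, this paper proposes a new method for clustering web search results. Unlike clustering algorithms in information retrieval that rely on keywords or terms shared between texts, the method applies citation and matching analysis, based on the common links that two web pages share and match. It also extends the standard K-means clustering algorithm to make it better suited to handling noisy pages, and applies it to clustering web result pages. Preliminary experiments to validate the method show that clustering web search results through link analysis achieves the expected results.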
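A minimal sketch of link-based page similarity, assuming that "common links" covers both shared out-links and shared in-links and that the two are combined with a cosine measure and a fixed weight. The names `link_similarity`, `page_similarity`, and `weight_out`, and the sample pages, are hypothetical; the paper's actual similarity function and its K-means extension are not given in the abstract.

```python
import math

def link_similarity(links_a, links_b):
    """Cosine similarity between two pages represented as sets of links."""
    if not links_a or not links_b:
        return 0.0
    shared = len(links_a & links_b)
    return shared / math.sqrt(len(links_a) * len(links_b))

def page_similarity(page_a, page_b, weight_out=0.5):
    """Weighted mix of out-link and in-link similarity (weight is an assumption)."""
    out_sim = link_similarity(page_a["out"], page_b["out"])
    in_sim = link_similarity(page_a["in"], page_b["in"])
    return weight_out * out_sim + (1 - weight_out) * in_sim

# Hypothetical pages: "out" holds URLs the page links to, "in" holds URLs linking to it.
p1 = {"out": {"a", "b", "c"}, "in": {"x", "y"}}
p2 = {"out": {"b", "c", "d"}, "in": {"x"}}
p3 = {"out": {"e"}, "in": {"z"}}

sim_12 = page_similarity(p1, p2)  # pages sharing several links
sim_13 = page_similarity(p1, p3)  # pages sharing no links
print(round(sim_12, 4), sim_13)
```

A similarity of this kind could stand in for the text-based distance in a K-means-style loop; pages whose similarity to every centroid stays low would be natural candidates for the noise handling the abstract mentions.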
15.
We define similar video content as video sequences with almost identical content but possibly compressed at different qualities, reformatted to different sizes and frame-rates, undergone minor editing in either spatial or temporal domain, or summarized into keyframe sequences. Building a search engine to identify such similar content in the World-Wide Web requires: 1) robust video similarity measurements; 2) fast similarity search techniques on large databases; and 3) intuitive organization of search results. In a previous paper, we proposed a randomized technique called the video signature (ViSig) method for video similarity measurement. In this paper, we focus on the remaining two issues by proposing a feature extraction scheme for fast similarity search, and a clustering algorithm for identification of similar clusters. Similar to many other content-based methods, the ViSig method uses high-dimensional feature vectors to represent video. To warrant a fast response time for similarity searches on high dimensional vectors, we propose a novel nonlinear feature extraction scheme on arbitrary metric spaces that combines the triangle inequality with the classical Principal Component Analysis (PCA). We show experimentally that the proposed technique outperforms PCA, Fastmap, Triangle-Inequality Pruning, and Haar wavelet on signature data. To further improve retrieval performance, and provide better organization of similarity search results, we introduce a new graph-theoretical clustering algorithm on large databases of signatures. This algorithm treats all signatures as an abstract threshold graph, where the distance threshold is determined based on local data statistics. Similar clusters are then identified as highly connected regions in the graph. By measuring the retrieval performance against a ground-truth set, we show that our proposed algorithm outperforms simple thresholding, single-link and complete-link hierarchical clustering techniques.
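The threshold-graph view of clustering can be sketched in a much simplified form: connect two items when their distance is at most a threshold, then read clusters off the graph. The sketch below uses a single global threshold and plain connected components, which amounts to cutting a single-link hierarchy at that threshold. It is therefore not the paper's algorithm, which sets the threshold from local data statistics and looks for highly connected regions. The name `threshold_clusters` and the scalar "signatures" are hypothetical.

```python
def threshold_clusters(points, distance, threshold):
    """Group items into connected components of the graph whose edges join
    pairs with distance <= threshold (union-find with path compression)."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if distance(points[i], points[j]) <= threshold:
                parent[find(i)] = find(j)  # merge the two components

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

# Toy one-dimensional "signatures"; real signatures would be high-dimensional vectors.
signatures = [0.0, 0.1, 0.2, 5.0, 5.1, 9.0]
clusters = threshold_clusters(signatures, lambda a, b: abs(a - b), threshold=0.15)
print(clusters)
```

Note that items 0 and 2 end up in the same cluster even though they are farther apart than the threshold, because they are chained through item 1. Requiring high connectivity, as the abstract describes, is one way to avoid such chaining.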
16.
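Tian Tian, 《数字社区&智能家居》 (Digital Community & Smart Home), 2009, 5(3): 1683-1685
Evaluating retrieval systems has long been one of the core problems in the information field. Open and effective evaluation of search engine quality can guide users to the most useful means of obtaining information, encourage search service providers and researchers to keep trying new techniques, and help search advertisers choose the most effective advertising channels. Traditional search engine evaluation systems no longer meet practical needs, so raising the degree of automation of evaluation methods is imperative. This paper proposes automatic search engine performance evaluation based on analysis of collective user behavior: the information contained in users' click behavior is used to judge the relevance of search results, and ultimately to evaluate the performance of different search engines.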
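One simple way to turn click behavior into an automatic evaluation is sketched below, under two loudly stated assumptions: the most-clicked URL for each query is treated as its relevant answer, and engines are compared by mean reciprocal rank (MRR) of that URL. Neither choice comes from the paper, whose relevance model is not described in the abstract. The function names and the sample log are hypothetical.

```python
from collections import Counter

def inferred_relevant(click_log):
    """Map each query to its most-clicked URL (assumed to be the relevant one)."""
    per_query = {}
    for query, url in click_log:
        per_query.setdefault(query, Counter())[url] += 1
    # Break ties alphabetically so the result is deterministic.
    return {q: min(c, key=lambda u: (-c[u], u)) for q, c in per_query.items()}

def mean_reciprocal_rank(engine_results, relevant):
    """Average of 1/rank of the inferred answer; 0 when the engine misses it."""
    total = 0.0
    for query, url in relevant.items():
        ranking = engine_results.get(query, [])
        if url in ranking:
            total += 1.0 / (ranking.index(url) + 1)
    return total / len(relevant) if relevant else 0.0

# Hypothetical click log of (query, clicked URL) pairs pooled across users.
click_log = [("python", "a.com"), ("python", "a.com"),
             ("python", "b.com"), ("java", "c.com")]
relevant = inferred_relevant(click_log)

engine_a = {"python": ["a.com", "b.com"], "java": ["d.com", "c.com"]}
engine_b = {"python": ["b.com", "x.com", "a.com"], "java": ["c.com"]}
score_a = mean_reciprocal_rank(engine_a, relevant)
score_b = mean_reciprocal_rank(engine_b, relevant)
print(score_a, round(score_b, 4))
```

Because the relevance labels come from logs rather than annotators, a scheme like this can be rerun as new queries arrive, which is the automation benefit the abstract argues for. Click position bias is ignored here.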
17.
18.
We develop a new algorithm for clustering search results. Unlike many other clustering systems that have been recently proposed as a post-processing step for Web search engines, our system is not based on phrase analysis inside snippets, but instead uses latent semantic indexing on the whole document content. A main contribution of the paper is a novel strategy, called dynamic SVD clustering, to discover the optimal number of singular values to be used for clustering purposes. Moreover, the algorithm is such that the SVD computation step has in practice good performance, which makes it feasible to perform clustering when term vectors are available. We show that the algorithm has very good classification performance, and that it can be effectively used to cluster results of a search engine to make them easier to browse by users. The algorithm has been integrated into the Noodles search engine, a tool for searching and clustering Web and desktop documents.