Similar Literature
18 similar documents found; search time 125 ms
1.
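This paper proposes a two-layer model for identifying domain-specific queries that uses both the content of the query strings in search logs and users' click information. The first layer identifies queries by combining Bayes' rule with a lexicon. Because queries in search logs are short and carry little information, a second-layer model based on domain-name credibility is proposed. In an open test on the Sogou 2012 user query log, the two-layer model achieved a recall of 85.2% and a precision of 94.6%, demonstrating the effectiveness of the method.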
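The two-layer scheme above can be sketched as follows. This is a minimal illustration under stated assumptions: queries are already tokenized, the "Bayes' rule" layer is a naive Bayes classifier with add-one smoothing plus a lexicon shortcut, and the second layer scores a query by the share of its clicks that land on domain names known to belong to the target domain. The class and function names, the 0.5 threshold, and the toy data are hypothetical; the paper's actual credibility formula is not reproduced here.

```python
import math
from collections import Counter


class NaiveBayesQueryClassifier:
    """Layer 1 (hypothetical): naive Bayes over query tokens plus a domain-lexicon shortcut."""

    def __init__(self, lexicon):
        self.lexicon = set(lexicon)  # terms that mark a query as in-domain outright
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = Counter()

    def fit(self, queries, labels):
        for tokens, label in zip(queries, labels):
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)

    def log_posterior(self, tokens, label):
        # Unnormalized log P(label) + sum of log P(token | label), with add-one smoothing.
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        total = sum(self.word_counts[label].values())
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for t in tokens:
            score += math.log((self.word_counts[label][t] + 1) / (total + len(vocab)))
        return score

    def predict(self, tokens):
        if any(t in self.lexicon for t in tokens):
            return True
        return self.log_posterior(tokens, True) > self.log_posterior(tokens, False)


def domain_credibility(clicked_domains, in_domain_sites):
    """Layer 2 (hypothetical): share of clicks that land on known in-domain sites."""
    if not clicked_domains:
        return 0.0
    hits = sum(1 for d in clicked_domains if d in in_domain_sites)
    return hits / len(clicked_domains)


def classify(tokens, clicked_domains, clf, in_domain_sites, threshold=0.5):
    # Accept on layer 1; otherwise let the click-based credibility decide.
    if clf.predict(tokens):
        return True
    return domain_credibility(clicked_domains, in_domain_sites) >= threshold
```

The second layer only runs when the text alone is inconclusive, which matches the motivation that short queries often carry too little information for the first layer.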

2.
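Query suggestion can effectively reduce user input and resolve query ambiguity, making information retrieval more convenient and more accurate. With the growth of e-commerce, query suggestion is increasingly used in product search on e-commerce websites. However, traditional query suggestion methods designed for Web search are not fully suited to e-commerce. Focusing on this domain, this paper compares different query suggestion techniques and proposes a method that considers both users' search behavior and their shopping behavior. MapReduce is used to mine user logs and build a lexicon of search terms, and real-time suggestions are produced by combining online and offline computation. Experimental results show that the proposed log-mining-based e-commerce query suggestion method effectively improves suggestion accuracy and offers good processing performance.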
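A minimal sketch of the offline/online split described above, assuming each log record is a `(query, purchased)` pair and that a purchase is weighted more heavily than a plain search (the weight of 3.0 is a hypothetical choice). The MapReduce job is simulated in-process with a mapper, a sort-based shuffle, and a reducer; a real deployment would run the same mapper/reducer logic on a Hadoop-style cluster.

```python
from collections import defaultdict
from itertools import groupby

PURCHASE_WEIGHT = 3.0  # hypothetical: a purchase counts more than a plain search


def mapper(record):
    """Emit (query, score) for one log record of the assumed form (query, purchased)."""
    query, purchased = record
    yield query, PURCHASE_WEIGHT if purchased else 1.0


def reducer(query, scores):
    return query, sum(scores)


def run_mapreduce(records):
    # Simulate the shuffle phase: sort mapper output by key, then group by key.
    pairs = sorted(kv for r in records for kv in mapper(r))
    return dict(reducer(q, (s for _, s in grp)) for q, grp in groupby(pairs, key=lambda kv: kv[0]))


def build_prefix_index(term_scores, k=3):
    """Offline step: precompute the top-k suggestions for every prefix of every term."""
    index = defaultdict(list)
    for term, score in term_scores.items():
        for i in range(1, len(term) + 1):
            index[term[:i]].append((score, term))
    return {p: [t for _, t in sorted(c, key=lambda x: (-x[0], x[1]))[:k]] for p, c in index.items()}


def suggest(index, prefix):
    """Online step: a dictionary lookup, cheap enough for real-time use."""
    return index.get(prefix, [])
```

Pushing the ranking into the offline index keeps the online path to a single lookup; the trade-off is that the index must be rebuilt as new logs arrive.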

3.
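Mining and analyzing large-scale query logs to improve retrieval accuracy has long been a hot topic in information retrieval. This paper proposes a method for building an association word list based on PMI-IR (pointwise mutual information). The method scans large-scale user query logs with a sequential pattern mining algorithm to find word combinations whose co-occurrence frequency exceeds a threshold, clusters them into candidate synonym sets, computes the similarity between a word wordA and each candidate in turn, selects the candidates whose similarity exceeds a threshold as the association words of wordA, and finally assembles the association word list. Experiments show that query expansion using the resulting word list improves retrieval accuracy.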
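The PMI step can be illustrated with a short sketch. It assumes each query is a list of words, estimates probabilities as document frequencies over queries, and keeps only pairs above a co-occurrence threshold before computing PMI(a, b) = log2(P(a, b) / (P(a) P(b))). The sequential-pattern-mining and clustering stages of the paper are not reproduced; the function names and thresholds are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(queries, min_cooc=2):
    """PMI for word pairs co-occurring in at least `min_cooc` queries.

    Each query is treated as a set of words; P(.) is estimated as the share of
    the N queries that contain the word (or pair).
    """
    n = len(queries)
    word_df, pair_df = Counter(), Counter()
    for q in queries:
        words = sorted(set(q))
        word_df.update(words)
        pair_df.update(combinations(words, 2))
    table = {}
    for (a, b), c in pair_df.items():
        if c >= min_cooc:
            # PMI(a, b) = log2( (c/N) / ((df_a/N) * (df_b/N)) ) = log2( c*N / (df_a*df_b) )
            table[(a, b)] = math.log2(c * n / (word_df[a] * word_df[b]))
    return table


def association_words(word, table, min_pmi=0.0):
    """Candidates whose PMI with `word` exceeds `min_pmi`, highest first."""
    scored = []
    for (a, b), s in table.items():
        if s > min_pmi and word in (a, b):
            scored.append((s, b if a == word else a))
    return [w for _, w in sorted(scored, reverse=True)]
```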

4.
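Query recommendation has become an important means of improving users' search experience and the service quality of search engines, so improving the quality of recommended queries and user satisfaction is increasingly urgent. In computing similarity, existing methods ignore the importance of named entities and the information carried by the search log as a whole. This paper clusters queries and evaluates the popularity of each cluster, extracts the named entities in the queries, and then incorporates query popularity and named-entity features into the similarity formula, yielding a new query recommendation method. The mean satisfaction score of its results is higher than that of three recent methods, demonstrating its effectiveness. The method uses the recognized named entities when computing similarity and accounts for the popularity of recommended queries across the whole log, improving the overall quality of the recommendations; however, it is limited by the accuracy of feature extraction and depends on further enriching and optimizing the features.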
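One way to picture "fusing popularity and named-entity features into the similarity formula" is a weighted sum of three terms: word-level cosine similarity, named-entity overlap (Jaccard), and the candidate's popularity. The sketch below assumes entity sets and popularity scores in [0, 1] are already available per query string; the weights alpha, beta, gamma and the function names are hypothetical, not the paper's formula.

```python
import math
from collections import Counter


def cosine(a, b):
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def score(query, candidate, entities, hotness, alpha=0.5, beta=0.3, gamma=0.2):
    """Hypothetical fused similarity.

    query, candidate: token lists; entities: query string -> set of named entities;
    hotness: query string -> popularity in [0, 1] (e.g. a normalized cluster heat).
    """
    q, c = " ".join(query), " ".join(candidate)
    return (alpha * cosine(query, candidate)
            + beta * jaccard(entities.get(q, ()), entities.get(c, ()))
            + gamma * hotness.get(c, 0.0))


def recommend(query, candidates, entities, hotness, k=2):
    ranked = sorted(candidates, key=lambda c: score(query, c, entities, hotness), reverse=True)
    return [" ".join(c) for c in ranked[:k]]
```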

5.
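Search logs contain a vast amount of information, and mining them to analyze hot query content is of great value for improving the quality of search services. Combining the centroid-iteration strength of K-means with query-vector length information, this paper proposes SKHC (a K-means-like hierarchical clustering method) and uses it to cluster search logs. It then analyzes how the number of querying users, query frequency, cumulative query time, number of queries, and other statistical features of each cluster relate to hot queries, and proposes a method for extracting hot query content based on per-cluster heat values that combines log heat values with an inverse log frequency statistic. Statistical analysis of the extracted results and a correlation comparison with the hot events in the months covered by the logs show that the Sichuan earthquake and the Beijing Olympics reached the highest monthly average heat values, 0.89 and 0.81 respectively, demonstrating the effectiveness of the method.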
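The heat-scoring step can be sketched under explicit assumptions: clustering (SKHC) has already produced per-cluster statistics; each statistic is min-max normalized and averaged into a heat value; and the "inverse log frequency" is modeled by analogy with IDF as log(total months / months in which the cluster is active), so that topics present every month score lower than event-like bursts. Feature names, equal weights, and the IDF-style formula are hypothetical stand-ins, not the paper's definitions.

```python
import math

FEATURES = ("users", "frequency", "dwell_time", "distinct_queries")  # hypothetical names


def heat_scores(clusters, weights=None):
    """Min-max normalize each per-cluster statistic, then take a weighted sum.

    clusters: dict cluster_id -> dict holding the FEATURES above.
    """
    weights = weights or {f: 1 / len(FEATURES) for f in FEATURES}
    heat = {c: 0.0 for c in clusters}
    for f in FEATURES:
        vals = [s[f] for s in clusters.values()]
        lo, hi = min(vals), max(vals)
        for c, s in clusters.items():
            norm = (s[f] - lo) / (hi - lo) if hi > lo else 0.0
            heat[c] += weights[f] * norm
    return heat


def inverse_log_frequency(cluster_id, months_seen, total_months):
    """IDF-style factor (assumed): clusters active in few months look more event-like."""
    return math.log(total_months / months_seen[cluster_id])


def hot_clusters(clusters, months_seen, total_months, top=1):
    heat = heat_scores(clusters)
    final = {c: h * inverse_log_frequency(c, months_seen, total_months) for c, h in heat.items()}
    return sorted(final, key=final.get, reverse=True)[:top]
```

A cluster active in every month gets a factor of log(1) = 0, which is how this sketch separates steady high-volume topics from bursty events.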

6.
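Journal of Frontiers of Computer Science and Technology, 2016, (9): 1290-1298
Traditional query recommendation algorithms recommend queries by mining query logs. Existing models usually consider only the relationship between the original query and the recommended queries (e.g., semantic similarity or relatedness) and ignore how satisfied the user is during the search. Based on the different satisfaction states users exhibit while searching, this paper puts forward a basic hypothesis about query recommendation and verifies it with an online user questionnaire. Building on this hypothesis, it proposes an adaptive query recommendation model based on the user's search satisfaction state, which can intelligently recommend different kinds of queries: when the user is satisfied with the search results, the model offers more novel queries; when the user is dissatisfied, it offers queries that better express the information need. Experiments on large-scale logs show that the proposed model significantly outperforms the traditional query-flow-graph model, demonstrating its effectiveness.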
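A minimal sketch of satisfaction-adaptive re-ranking on top of a query-flow graph. It assumes the satisfaction state is already known (the paper infers it; that part is not shown), uses word-level Jaccard overlap as a hypothetical proxy for "novel" versus "refining" suggestions, and weights each successor by how often users moved to it.

```python
from collections import Counter, defaultdict


def build_query_flow(sessions):
    """Query-flow graph: count how often query b directly follows query a in a session."""
    graph = defaultdict(Counter)
    for s in sessions:
        for a, b in zip(s, s[1:]):
            graph[a][b] += 1
    return graph


def overlap(a, b):
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)


def recommend(query, graph, satisfied, k=2):
    """Re-rank query-flow successors by the user's (assumed known) satisfaction state.

    satisfied   -> prefer novel suggestions (low word overlap with the query);
    unsatisfied -> prefer refinements that keep the query's words.
    """
    cands = graph.get(query, Counter())

    def key(c):
        o = overlap(query, c)
        return (1 - o if satisfied else o) * cands[c]

    return sorted(cands, key=key, reverse=True)[:k]
```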

7.
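Xu Jingfang, Li Xing, Li Yue. Computer Engineering, 2005, 31(21): 143-145
This paper proposes a method for building a topic-oriented dictionary from user query logs for word segmentation in Chinese information retrieval. Mutual information is used to extract phrases from user query logs, and these are combined with a general-purpose dictionary to build the topic dictionary. The dictionary improves both the accuracy and the efficiency of information retrieval and helps address the out-of-vocabulary word problem.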
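The mutual-information step can be sketched at the character level, which is the usual setting for Chinese segmentation. This assumes queries are raw strings, considers only adjacent character pairs, and uses MI(x, y) = log2(p(xy) / (p(x) p(y))) with unigram and bigram probabilities estimated from counts over the log. The thresholds and function names are hypothetical and would need tuning on real data.

```python
import math
from collections import Counter


def mi_phrases(queries, min_count=2, min_mi=1.0):
    """Extract adjacent character pairs with high mutual information.

    MI(x, y) = log2( p(xy) / (p(x) * p(y)) ), with unigram and bigram
    probabilities estimated separately from counts over the query log.
    """
    uni, bi = Counter(), Counter()
    for q in queries:
        uni.update(q)
        bi.update(zip(q, q[1:]))
    n_uni, n_bi = sum(uni.values()), sum(bi.values())
    phrases = set()
    for (x, y), c in bi.items():
        if c < min_count:
            continue
        mi = math.log2((c / n_bi) / ((uni[x] / n_uni) * (uni[y] / n_uni)))
        if mi >= min_mi:
            phrases.add(x + y)
    return phrases


def build_topic_dictionary(general_words, queries, **kw):
    # Topic dictionary = general dictionary plus phrases mined from the log.
    return set(general_words) | mi_phrases(queries, **kw)
```

For example, with the queries "北京天气", "北京天气预报", "上海天气" and "北京地图", the pairs 北京 and 天气 score higher than the cross-word pair 京天, so a threshold between them keeps only the real words.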

8.
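Jiao Jian, Zhang Yangsen. Computer Science, 2014, 41(12): 168-171, 188
Query expansion aims to uncover the latent semantics of a query and determine the user's intent, so as to construct queries better suited to search-engine retrieval and thereby improve retrieval accuracy. This paper proposes predicting the latent semantics of a query with a hidden Markov model trained on large-scale user query logs. Queries expanded with the model's predictions are clearly more accurate than those produced by word co-occurrence expansion, synonym expansion, and similar schemes.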
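Decoding the latent semantics of a query with an HMM comes down to the Viterbi algorithm: the query words are observations and the semantic labels are hidden states. The sketch below assumes the probabilities have already been trained from logs; the labels ("BRAND", "PRODUCT"), the unknown-word floor `unk`, and the label-to-expansion mapping are hypothetical illustrations, not the paper's model.

```python
import math


def _log(p):
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, states, start_p, trans_p, emit_p, unk=1e-6):
    """Most likely hidden label sequence for the observed query words (log space)."""
    V = [{s: _log(start_p[s]) + _log(emit_p[s].get(obs[0], unk)) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            # Best predecessor state r for reaching s at step t.
            prev = max(states, key=lambda r: V[t - 1][r] + _log(trans_p[r][s]))
            V[t][s] = V[t - 1][prev] + _log(trans_p[prev][s]) + _log(emit_p[s].get(obs[t], unk))
            back[t][s] = prev
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


def expand_query(words, labels, expansions):
    """Append hypothetical expansion terms associated with each predicted label."""
    extra = [e for lab in labels for e in expansions.get(lab, []) if e not in words]
    return words + list(dict.fromkeys(extra))
```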

9.
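Analyzing user behavior in large-scale search logs helps improve a range of search-engine performance metrics. This paper analyzes Baidu's open logs in detail from three angles. First, statistics on query length and frequency reveal a long-tail effect: the 10% most frequent queries account for 70.8% of all query submissions. Second, an analysis of URL click depth and frequency shows that 73% of web pages are clicked only once, indicating that the Web contains a large number of rarely visited pages. Finally, an analysis of advanced-search usage shows that fewer than 0.12% of users use advanced search, indicating that users prefer simple, convenient operations.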
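The two headline statistics above (the share of submissions taken by the top 10% of queries, and the fraction of URLs clicked exactly once) are simple to compute from a log. This sketch assumes the log has already been reduced to a flat list of query strings and a flat list of clicked URLs; the function names are hypothetical.

```python
from collections import Counter


def head_share(query_log, head_fraction=0.10):
    """Share of all submissions accounted for by the most frequent `head_fraction` of distinct queries."""
    counts = sorted(Counter(query_log).values(), reverse=True)
    k = max(1, int(len(counts) * head_fraction))
    return sum(counts[:k]) / sum(counts)


def singleton_click_ratio(clicked_urls):
    """Fraction of distinct URLs that were clicked exactly once."""
    counts = Counter(clicked_urls)
    return sum(1 for c in counts.values() if c == 1) / len(counts)
```

For instance, a log with 10 distinct queries where one query appears 21 times and the other nine appear once gives a head share of 21 / 30 = 0.7.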

10.
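Query expansion can effectively resolve query ambiguity and improve both the precision and the recall of information retrieval. By mining the links between queries and relevant documents in user logs, this paper constructs associated queries and, on that basis, proposes a query expansion method that extracts expansion terms from the associated queries. It also proposes a method for judging query ambiguity that effectively measures how vague the search intent expressed by a query is and can also estimate a query's retrieval performance in advance. The ambiguity measure is used to adjust the length of the expansion dynamically, making the query expansion model more flexible and adaptable.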
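A sketch of the click-graph idea, under clearly labeled assumptions: two queries are "associated" when they share clicked documents, and their shared click mass is the association weight. The ambiguity measure used here is the normalized entropy of a query's click distribution, which is a common stand-in rather than the paper's measure, and the rule that more ambiguous queries receive more expansion words is likewise an assumption (`base` and `extra` are hypothetical parameters).

```python
import math
from collections import Counter, defaultdict


def click_graph(log):
    """log: iterable of (query, clicked_doc) pairs -> query -> Counter of doc clicks."""
    g = defaultdict(Counter)
    for q, d in log:
        g[q][d] += 1
    return g


def related_queries(query, g):
    """Queries sharing clicked documents with `query`, weighted by shared click mass."""
    docs = g[query]
    scores = Counter()
    for other, odocs in g.items():
        if other != query:
            shared = sum(min(docs[d], odocs[d]) for d in docs if d in odocs)
            if shared:
                scores[other] = shared
    return scores


def ambiguity(query, g):
    """Normalized entropy of the click distribution (hypothetical ambiguity measure)."""
    counts = list(g[query].values())
    if len(counts) < 2:
        return 0.0
    total = sum(counts)
    h = -sum(c / total * math.log(c / total) for c in counts)
    return h / math.log(len(counts))


def expand(query, g, base=1, extra=3):
    """Pick expansion words from related queries; assumed: more ambiguous -> more words."""
    n = base + round(extra * ambiguity(query, g))
    words = Counter()
    for other, w in related_queries(query, g).items():
        for t in other.split():
            if t not in query.split():
                words[t] += w
    return [t for t, _ in words.most_common(n)]
```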

11.
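Web query classification is important for improving the search quality of search engines. By analyzing and annotating real user query logs, this paper finds that four kinds of feature words (called "VASE" feature words) play a decisive role in query classification. We extract these feature words and build an inverted index over them for topical classification of queries. On this basis, we propose a method based on Web expansion and weighted feature words to improve classification. Experimental results show that this classification method achieves a precision of 78.2% and a recall of 77.3%.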
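Classifying queries through an inverted index of weighted feature words can be sketched as follows. The index maps each feature word to the categories it signals and a weight; a query's category scores are the summed weights of its matched feature words. The categories, weights, and function names are hypothetical, and the sketch does not attempt to reproduce the paper's four "VASE" feature-word types.

```python
from collections import Counter, defaultdict


def build_inverted_index(feature_words):
    """feature_words: iterable of (word, category, weight) -> word -> [(category, weight)]."""
    index = defaultdict(list)
    for word, category, weight in feature_words:
        index[word].append((category, weight))
    return index


def classify(query_tokens, index, top=1):
    """Sum the weights of matched feature words per category; return the best categories."""
    scores = Counter()
    for t in query_tokens:
        for category, weight in index.get(t, ()):
            scores[category] += weight
    return [c for c, _ in scores.most_common(top)]
```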

12.
Identifying and interpreting user intent are fundamental to semantic search. In this paper, we investigate the association of intent with individual words of a search query. We propose that words in queries can be classified as either content or intent, where content words represent the central topic of the query, while users add intent words to make their requirements more explicit. We argue that intelligent processing of intent words can be vital to improving result quality, and in this work we focus on intent word discovery and understanding. Our approach to intent word detection is motivated by the hypothesis that query intent words satisfy certain distributional properties in large query logs, similar to function words in natural language corpora. Following this idea, we first prove the effectiveness of our corpus distributional features, namely word co-occurrence counts and entropies, for function word detection in five natural languages. Next, we show that reliable detection of intent words in queries is possible using these same features computed from query logs. To make the distinction between content and intent words more tangible, we additionally provide operational definitions of content and intent words as those words that should match, and those that need not match, respectively, in the text of relevant documents. In addition to a standard evaluation against human annotations, we also provide an alternative validation of our ideas using clickthrough data. The concordance of the two orthogonal evaluation approaches provides further support for our original hypothesis that two distinct word classes exist in search queries. Finally, we provide a taxonomy of intent words derived through rigorous manual analysis of large query logs.
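The two distributional features named above, co-occurrence counts and co-occurrence entropy, can be computed from a query log as follows. This sketch treats each query as a bag of words, counts for every word how many distinct words it co-occurs with, and computes the entropy (in bits) of that co-occurrence distribution; ranking by entropy alone is a simplification of the paper's detector, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_features(queries):
    """Per word: (number of distinct co-occurring words, entropy of its co-occurrence distribution).

    Words that combine with many different words fairly evenly (high entropy)
    behave like function words; intent words are expected to do the same in query logs.
    """
    co = defaultdict(Counter)
    for q in queries:
        words = set(q)
        for w in words:
            co[w].update(words - {w})
    feats = {}
    for w, c in co.items():
        total = sum(c.values())
        h = -sum(n / total * math.log2(n / total) for n in c.values()) if total else 0.0
        feats[w] = (len(c), h)
    return feats


def rank_intent_candidates(queries, top=3):
    feats = cooccurrence_features(queries)
    return sorted(feats, key=lambda w: feats[w][1], reverse=True)[:top]
```

In a toy log where "recipe" attaches to "pizza", "cake", "soup" and "bread", it gets 4 distinct neighbours and an entropy of 2 bits, more than any content word in that log.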

13.
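Code search is frequently needed during software development and maintenance. Keyword-based code search faces the same problem as traditional information retrieval: the words in a user's query do not match the words used in the code. To improve code search precision, semantically related words in software need to be mined for query expansion. This paper designs a Word Embedding-based method for mining semantically related words in the software engineering domain and, using documents from the IT Q&A site Stack Overflow as the corpus, trains a table of semantically related words covering 19,332 words. Comparative experiments against prior work confirm that the related words mined by this method effectively improve code search precision.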
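Once word embeddings have been trained (for example, with a word2vec-style model on the Stack Overflow corpus; training is not shown), finding semantically related words is a nearest-neighbour search by cosine similarity. The sketch uses a small hand-written vector table as a stand-in for trained embeddings; the vectors, `min_sim` threshold, and function names are hypothetical.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def related_words(word, vectors, topn=2, min_sim=0.5):
    """Nearest neighbours of `word` by cosine similarity in an embedding table."""
    target = vectors[word]
    sims = [(cosine(target, v), w) for w, v in vectors.items() if w != word]
    return [w for s, w in sorted(sims, reverse=True) if s >= min_sim][:topn]


def expand_code_query(query, vectors, **kw):
    """Append related words to a code-search query; words missing from the table are kept as-is."""
    extra = [r for w in query if w in vectors for r in related_words(w, vectors, **kw)]
    return query + [e for e in dict.fromkeys(extra) if e not in query]
```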

14.
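The effectiveness of traditional local context analysis depends heavily on the results of the initial retrieval. To address this limitation, statistical analysis and filtering of user query logs are used to find the documents users are most likely to be interested in; these replace the N documents returned by the initial retrieval as the source set for expansion terms, and local context analysis is then used to compute term relatedness. Experimental results show that the method substantially improves query precision.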
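The substitution described above can be sketched with a simplified local-context-analysis score computed over log-derived documents instead of the top-N retrieved ones. The constants (delta = 0.1, idf capped at 1 and divided by 5) follow the commonly cited Xu and Croft formulation, but treat them and the helper names as assumptions; the paper's exact variant is not given. At least two documents are needed because the co-degree is normalized by log10(n).

```python
import math
from collections import Counter


def lca_scores(query_terms, docs, collection_size, doc_freq, delta=0.1):
    """Rank candidate expansion terms by a simplified LCA score over `docs` (token lists).

    `docs` stands for the log-derived documents (e.g. most-clicked results)
    rather than the top-N of an initial retrieval; requires len(docs) >= 2.
    doc_freq: term -> number of documents containing it in the whole collection.
    """
    n = len(docs)
    tfs = [Counter(d) for d in docs]

    def idf(t):
        return min(1.0, math.log10(collection_size / doc_freq.get(t, 1)) / 5.0)

    candidates = {t for tf in tfs for t in tf} - set(query_terms)
    scores = {}
    for c in candidates:
        score = 1.0
        for w in query_terms:
            co = sum(tf[c] * tf[w] for tf in tfs)              # co-occurrence across docs
            codegree = math.log10(co + 1) * idf(c) / math.log10(n)
            score *= (delta + codegree) ** idf(w)
        scores[c] = score
    return sorted(scores, key=scores.get, reverse=True)
```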

15.
Query expansion by mining user logs (total citations: 9; self-citations: 0; citations by others: 9)
Queries to search engines on the Web are usually short. They do not provide sufficient information for an effective selection of relevant documents. Previous research has proposed the utilization of query expansion to deal with this problem. However, expansion terms are usually determined from term co-occurrences within documents. In this study, we propose a new method for query expansion based on user interactions recorded in user logs. The central idea is to extract correlations between query terms and document terms by analyzing user logs. These correlations are then used to select high-quality expansion terms for new queries. Compared to previous query expansion methods, ours takes advantage of the user judgments implied in user logs. The experimental results show that the log-based query expansion method can produce much better results than both the classical search method and the other query expansion methods.
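The correlation idea can be written as P(wd | wq) = sum over clicked documents D of P(wd | D) * P(D | wq), with a query-level cohesion weight ln of the product over query terms of (P(wd | wq) + 1). The sketch below follows that general shape but makes simplifying assumptions: P(D | wq) is the share of clicks from sessions containing wq that went to D, and P(wd | D) is plain relative term frequency rather than a tuned term weight. Function names and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def term_correlations(sessions, doc_terms):
    """P(wd | wq) = sum_D P(wd | D) * P(D | wq), estimated from query-click sessions.

    sessions: list of (query_terms, clicked_doc_ids); doc_terms: doc_id -> token list.
    """
    clicks = defaultdict(Counter)  # wq -> Counter of clicked docs
    for q_terms, docs in sessions:
        for wq in set(q_terms):
            clicks[wq].update(docs)
    p_w_given_d = {d: {w: c / len(t) for w, c in Counter(t).items()} for d, t in doc_terms.items()}
    corr = defaultdict(Counter)
    for wq, dc in clicks.items():
        total = sum(dc.values())
        for d, n in dc.items():
            for wd, p in p_w_given_d[d].items():
                corr[wq][wd] += p * n / total
    return corr


def expansion_terms(query_terms, corr, k=2):
    """Cohesion weight ln prod_{wq in Q} (P(wd | wq) + 1), computed as a sum of logs."""
    cands = {wd for wq in query_terms for wd in corr.get(wq, {})} - set(query_terms)
    weight = {wd: sum(math.log(corr.get(wq, {}).get(wd, 0.0) + 1) for wq in query_terms) for wd in cands}
    return sorted(weight, key=lambda w: (-weight[w], w))[:k]
```

Summing logs instead of multiplying keeps the weight numerically stable for long queries while ranking terms identically.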

16.
In a commodity search system built on Solr full-text search, a third-party Chinese word segmenter is applied to the Chinese search text entered by the user, and the commodity index is queried with the segmentation results; however, this ignores the case in which the user searches by typing pinyin. By analyzing the grammatical structure of Chinese pinyin, this paper designs pinyin word segmentation methods, uses a dedicated e-commerce lexicon to build a Chinese pinyin library, and implements a commodity search system with Ajax-based pinyin input prompts to remedy the shortcomings of the search input method.
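A common baseline for pinyin segmentation is forward maximum matching over the set of valid syllables, after which the syllable sequence is looked up in a pinyin-to-word lexicon. The sketch below assumes a tiny illustrative syllable set and lexicon (a real system needs the full syllable inventory and an e-commerce lexicon); all names are hypothetical. Greedy matching cannot resolve ambiguous strings such as "xian" (xian vs. xi'an), which is one reason dedicated segmentation rules are needed.

```python
# Illustrative subset of Mandarin syllables (hypothetical; a real table has roughly 400).
SYLLABLES = {"shou", "ji", "lian", "yi", "qun", "xi", "an", "xian"}
MAX_LEN = max(map(len, SYLLABLES))

# Hypothetical pinyin-to-Chinese lexicon for product terms.
PINYIN_LEXICON = {("shou", "ji"): "手机", ("lian", "yi", "qun"): "连衣裙"}


def segment_pinyin(text):
    """Forward maximum matching: at each position take the longest valid syllable."""
    out, i = [], 0
    while i < len(text):
        for size in range(min(MAX_LEN, len(text) - i), 0, -1):
            if text[i:i + size] in SYLLABLES:
                out.append(text[i:i + size])
                i += size
                break
        else:
            return None  # no valid syllable starts at position i
    return out


def to_hanzi(syllables):
    """Map a syllable sequence to a lexicon word, or None if it is not in the lexicon."""
    return PINYIN_LEXICON.get(tuple(syllables)) if syllables else None
```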

17.
This study proposes using genetic algorithms to define topic boundaries in search engine transaction logs. Users are interested in multiple topics during a search session, and genetic algorithms are used in this study to determine whether a search engine user has changed topics during a session. Sample logs from the FAST and Excite search engines were analyzed. The findings show that genetic algorithms are fairly successful in identifying topic continuations and shifts in search engine transaction logs.
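To make the genetic-algorithm idea concrete, the sketch evolves the two thresholds of a deliberately simple, hypothetical rule: a topic shift occurs when the pause between consecutive queries exceeds a time threshold or their word overlap falls below an overlap threshold. Fitness is accuracy on labeled query pairs; the GA uses truncation selection, uniform crossover, and Gaussian mutation. The features, rule, and GA settings are assumptions, not the study's encoding.

```python
import random


def predict(params, gap, overlap):
    """Hypothetical rule: a topic shift if the pause is long OR the queries share few terms."""
    g, o = params
    return gap > g or overlap < o


def fitness(params, data):
    """Accuracy over (gap_seconds, term_overlap, is_shift) examples."""
    return sum(predict(params, gap, ov) == label for gap, ov, label in data) / len(data)


def evolve(data, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [(rng.uniform(0, 600), rng.uniform(0, 1)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: fitness(p, data), reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection (also keeps the elite)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = (rng.choice((a[0], b[0])), rng.choice((a[1], b[1])))  # uniform crossover
            child = (max(0.0, child[0] + rng.gauss(0, 30)),                # Gaussian mutation
                     min(1.0, max(0.0, child[1] + rng.gauss(0, 0.05))))
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda p: fitness(p, data))
```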

18.
Web search users complain of the inaccurate results produced by current search engines. Most of these inaccurate results are due to a failure to understand the user's search goal. This paper proposes a method to extract users' intentions and to build an intention map representing the extracted intentions. The proposed method builds intention vectors from the pages clicked for a given query in previous search logs. The components of an intention vector are the weights of the keywords in a document. The method extracts users' intentions by clustering the intention vectors and extracting intention keywords from each cluster. The intentions extracted for a query are represented in an intention map. To analyze the effectiveness of the intention map, we extracted users' intentions from 2,600 search log records of a current domestic commercial search engine. Experiments with a search engine using the intention maps show statistically significant improvements in user satisfaction scores.
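The pipeline above (keyword-weight vectors from clicked pages, clustering, then top keywords per cluster) can be sketched with k-means using cosine similarity for assignment. The sketch assumes sparse `{keyword: weight}` dicts, a naive initialization from the first k pages, and a fixed iteration count; k, `top`, and the function names are hypothetical, and the paper's clustering method may differ.

```python
import math
from collections import defaultdict


def cos(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vecs):
    c = defaultdict(float)
    for v in vecs:
        for t, w in v.items():
            c[t] += w / len(vecs)
    return dict(c)


def intention_map(vectors, k, iters=10, top=2):
    """Cluster clicked-page keyword vectors and label each cluster with its top keywords.

    vectors: list of {keyword: weight} dicts, one per clicked page for a query.
    Returns one keyword list per non-empty cluster (the "intention map").
    """
    cents = [dict(v) for v in vectors[:k]]  # naive init: first k pages
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            best = max(range(k), key=lambda i: cos(v, cents[i]))
            groups[best].append(v)
        cents = [centroid(g) if g else cents[i] for i, g in enumerate(groups)]
    return [sorted(c, key=lambda t: (-c[t], t))[:top] for c, g in zip(cents, groups) if g]
```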

Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417