Found 20 similar documents (search time: 0 ms)
1.
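Research on Query Expansion Based on Ontology and User Relevance Feedback Total citations: 1, self-citations: 1, citations by others: 1
Describes a new query expansion (QE) method: a hybrid technique that combines user relevance feedback with an ontology. It makes two contributions: it couples user relevance feedback with ontology techniques, and it uses FirteX as the experimental platform. Compared with the widely used cosine-similarity-based query expansion technique, experiments show that the method reaches an average precision of 15%, above the 13% of the cosine-similarity-based technique, and raises the average feedback rate to 16%.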
2.
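A Query Expansion Method Based on Ontology and User Logs Total citations: 1, self-citations: 0, citations by others: 1
To address word-sense ambiguity in information retrieval, a query expansion method based on an ontology and user logs is proposed. A domain ontology is used to expand the user query at the semantic level into an initial set of expansion concepts; the user query log is then used, via co-occurrence analysis, to filter that initial set a second time. Experimental results show that, compared with the traditional local co-occurrence-based expansion method and the ontology-based expansion method, the proposed method effectively improves retrieval precision while maintaining good robustness.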
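The two-stage idea in this abstract (ontology candidates first, then a co-occurrence filter over the query log) can be illustrated with a minimal sketch. The toy ontology, the toy log, the conditional co-occurrence measure and the threshold are all assumptions made for illustration; the abstract does not specify them.

```python
# Hypothetical toy ontology: query term -> semantically related concepts.
ONTOLOGY = {
    "java": ["programming", "coffee", "island"],
}

# Hypothetical query log: each entry is the set of terms of one past query.
QUERY_LOG = [
    {"java", "programming", "tutorial"},
    {"java", "programming"},
    {"java", "coffee"},
    {"tea", "coffee"},
]


def cooccurrence(term_a, term_b, log):
    """Fraction of logged queries containing term_a that also contain term_b."""
    with_a = [q for q in log if term_a in q]
    if not with_a:
        return 0.0
    return sum(term_b in q for q in with_a) / len(with_a)


def expand_query(query_terms, ontology, log, threshold=0.5):
    """Stage 1: take ontology concepts as candidates.
    Stage 2: keep only candidates that co-occur often enough in the log."""
    expanded = list(query_terms)
    for term in query_terms:
        for concept in ontology.get(term, []):
            if concept in expanded:
                continue
            if cooccurrence(term, concept, log) >= threshold:
                expanded.append(concept)
    return expanded


print(expand_query(["java"], ONTOLOGY, QUERY_LOG))
```

With this toy log, only "programming" passes the default threshold; lowering the threshold admits more of the ontology candidates, which is the precision/coverage trade-off the filter controls.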
3.
4.
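Discovering a Personalized Search Engine Model Based on Web Logs* Total citations: 1, self-citations: 0, citations by others: 1
Personalized search means returning, for the same keyword, the results that interest each individual user. The same keyword can mean different things to different users: for example, the keyword "apple" is understood by music enthusiasts as Apple iPod, but by health-conscious eaters as apple fruit. Every keyword search a user performs is recorded in the back-end logs of the website server. Using several mining algorithms, the raw Web log data undergo user identification and session grouping, after which association rules among the search keywords across a single user's multiple sessions are extracted, providing a reference for implementing a personalized search engine.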
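As a rough illustration of the last step, the sketch below mines pairwise association rules between the keywords of one user's sessions using plain support and confidence. The session data and both thresholds are hypothetical, and the abstract does not say which mining algorithm the authors actually use.

```python
from itertools import permutations

# Hypothetical sessions of one user, each reduced to its set of search keywords.
SESSIONS = [
    {"apple", "ipod"},
    {"apple", "ipod", "headphones"},
    {"apple", "juice"},
    {"ipod", "headphones"},
]


def keyword_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Return rules (a, b, support, confidence) meaning 'a implies b'."""
    n = len(sessions)
    keywords = sorted(set().union(*sessions))
    rules = []
    for a, b in permutations(keywords, 2):
        both = sum(1 for s in sessions if a in s and b in s)
        has_a = sum(1 for s in sessions if a in s)
        support = both / n                      # share of sessions with a and b
        confidence = both / has_a if has_a else 0.0  # share of a-sessions with b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


for a, b, sup, conf in keyword_rules(SESSIONS):
    print(f"{a} -> {b}  support={sup:.2f} confidence={conf:.2f}")
```

A rule such as "apple -> ipod" for this user is the kind of signal that could bias ranking toward the music sense of "apple".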
5.
Search engines are among the most popular as well as useful services on the web. There is a need, however, to cater to the preferences of the users when supplying the search results to them. We propose to maintain the search profile of each user, on the basis of which the search results would be determined. This requires the integration of techniques for measuring search quality, learning from the user feedback and biased rank aggregation, etc. For the purpose of measuring web search quality, the “user satisfaction” is gauged by the sequence in which he picks up the results, the time he spends at those documents and whether or not he prints, saves, bookmarks, e-mails to someone or copies-and-pastes a portion of that document. For rank aggregation, we adopt and evaluate the classical fuzzy rank ordering techniques for web applications, and also propose a few novel techniques that outshine the existing techniques. A “user satisfaction” guided web search procedure is also put forward. Learning from the user feedback proceeds in such a way that there is an improvement in the ranking of the documents that are consistently preferred by the users. As an integration of our work, we propose a personalized web search system.
6.
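Web search analysis plays a pivotal role in optimizing search engines, and analyzing the search characteristics of individual users can improve search engine precision. At present, most existing models (such as click graph models and their variants) focus on the common characteristics of user groups. However, very little research addresses how to capture both the common characteristics of user groups and the characteristics of individual users. This paper studies a new problem of web search analysis for individual users: obtaining the topic distribution of an individual user's search queries by studying the burstiness of user searches. Two search topic models are proposed, the Search Burstiness Model (SBM) and the Coupling-Sensitive Search Burstiness Model (CS-SBM). SBM assumes that the topics of query terms and URLs are independent, while CS-SBM assumes that query terms and URLs are topically related. The resulting topic distribution information is stored in skewed Dirichlet priors, and a Beta distribution is used to characterize the temporal properties of user searches. Experimental results show that each user's web search trail has multiple user-specific characteristics. Moreover, on a large volume of real user query log data, the proposed models show a clear advantage in generalization performance over LDA, DCMLDA and TOT, and effectively depict how the topics of user search queries change over time.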
7.
This article proposes a methodology to mine valuable information about the usage of a facility (e.g. building, open public spaces, etc.), based only on Wi-Fi network connection history. Data are collected at Concordia University in Montréal, Canada. Using the Wi-Fi access log data, we characterize activities taking place within a building without any additional knowledge of the building itself. The methodology is based on identification and generation of pertinent variables derived by Principal Component Analysis (PCA) for clustering (i.e. PCA-guided clustering) and time-space activity identification. K-means clustering algorithm is then used to identify 7 activity types associated with buildings in the context of a campus. Based on the activity clusters' centroids, a search algorithm is proposed to associate activities of the same types over multiple days. The spatial distribution of the computed activities and building plans are then compared, which shows a more than 85% match for the weekdays.
8.
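Workflow mining techniques can construct a process from a system's execution logs. Most process mining methods represent the model graphically, as a control-flow graph. This paper discusses workflow pattern graph mining, which is in effect an extension of workflow mining, analyzes the problems involved, and introduces a pattern graph mining algorithm.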
9.
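Query expansion is one of the core techniques for improving information retrieval performance; its key problem is how to obtain expansion terms related to the original query. Obtaining expansion terms through association rule mining is an effective approach. To obtain high-quality expansion terms, a query-expansion-oriented algorithm is proposed for mining positive and negative association rules between terms in a text database. The algorithm evaluates association rules within a support-confidence-correlation framework, which avoids generating self-contradictory positive and negative rules. Combined with the query terms, it introduces a new pruning strategy and mines only the positive and negative rules that contain query terms, which improves mining efficiency. Experimental results show that, compared with traditional mining algorithms, the proposed algorithm is more effective and reasonable, and can detect and delete mutually contradictory rules.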
10.
Adaptive applications may benefit from having models of users' personality to adapt their behavior accordingly. There is a wide variety of domains in which this can be useful, e.g., assistive technologies, e-learning, e-commerce, health care or recommender systems, among others. The most commonly used procedure to obtain the user personality consists of asking the user to fill in questionnaires. However, on one hand, it would be desirable to obtain the user personality as unobtrusively as possible, yet without compromising the reliability of the model built. On the other hand, our hypothesis is that users with similar personality are expected to show common behavioral patterns when interacting through virtual social networks, and that these patterns can be mined in order to predict the tendency of a user personality. With the goal of inferring personality from the analysis of user interactions within social networks, we have developed TP2010, a Facebook application. It has been used to collect information about the personality traits of more than 20,000 users, along with their interactions within Facebook. Based on all the collected data, automatic classifiers were trained by using different machine-learning techniques, with the purpose of looking for interaction patterns that provide information about the users' personality traits. These classifiers are able to predict user personality starting from parameters related to user interactions, such as the number of friends or the number of wall posts. The results show that the classifiers have a high level of accuracy, making the proposed approach a reliable method for predicting the user personality.
11.
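To address the lack of semantic association between expansion terms and the original query when traditional query expansion methods are applied in specialized domains, a query expansion method based on semantic vector representations is proposed. First, a semantic vector representation model is built that learns the contextual semantics of words in a corpus to obtain their semantic vector representations. Second, the semantic similarity between words is computed from these vectors. Then, the words most semantically similar to the query's terms are selected as expansion terms and used to expand the original query. Finally, a biomedical document retrieval system is built on the proposed method, and comparative experiments and significance analysis are carried out against traditional query expansion methods based on Wikipedia or WordNet and against the systems that participated in the BioASQ 2014-2015 challenges. The results show that retrieval with semantic-vector-based query expansion outperforms the traditional query expansion methods, improving average precision by at least 1 percentage point, and that it achieves significant improvements over the challenge systems.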
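The similarity and selection steps described above can be sketched as follows. The three-dimensional vectors are hypothetical toy values standing in for vectors learned from a corpus, cosine similarity is assumed as the similarity measure, and `k` is an illustrative parameter; none of these details come from the abstract.

```python
import math

# Hypothetical toy word vectors; a real system would learn these from a corpus.
EMBEDDINGS = {
    "tumor": [0.9, 0.1, 0.0],
    "neoplasm": [0.85, 0.15, 0.05],
    "cancer": [0.8, 0.3, 0.1],
    "protein": [0.1, 0.9, 0.2],
    "keyboard": [0.0, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand(query_terms, embeddings, k=2):
    """Append, for each query term, its k most similar vocabulary words."""
    expanded = list(query_terms)
    for term in query_terms:
        if term not in embeddings:
            continue  # no vector available, so the term is left unexpanded
        scored = sorted(
            ((cosine(embeddings[term], vec), word)
             for word, vec in embeddings.items()
             if word not in expanded),
            reverse=True,
        )
        expanded.extend(word for _, word in scored[:k])
    return expanded


print(expand(["tumor"], EMBEDDINGS))
```

In this toy vocabulary the nearest neighbours of "tumor" are "neoplasm" and "cancer", so the expanded query stays within the same semantic field, which is the property the abstract is aiming for.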
12.
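Han Shixiong 《Computer Engineering and Design》2011,32(11):3716-3721
To identify the execution flow of workflows in a distributed environment, distributed workflow management systems are studied. By analyzing the XML-format system run logs at the distributed workflow execution sites, an incremental workflow mining algorithm is proposed. The algorithm analyzes and merges the activity execution time sequences from a large number of workflow execution sites to reconstruct the workflow model in the distributed environment. It consists of two main parts: a time sequence mining algorithm, which mines the execution time sequences between activities from the workflow execution logs, and a workflow identification algorithm, which identifies a structured workflow model from the activity execution time sequences produced by the time sequence mining algorithm. A worked example demonstrates the effectiveness of the algorithm.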
13.
Michał Walicki, Diogo R. Ferreira 《Data & Knowledge Engineering》2011,70(10):821-841
Finding the case id in unlabeled event logs is arguably one of the hardest challenges in process mining research. While this problem has been addressed with greedy approaches, these usually converge to sub-optimal solutions. In this work, we describe an approach to perform complete search over the search space. We formulate the problem as a matter of finding the minimal set of patterns contained in a sequence, where patterns can be interleaved but do not have repeating symbols. This represents a new problem that has not been previously addressed in the literature, with NP-hard variants and conjectured NP-completeness. We solve it in a stepwise manner, by generating and verifying a list of candidate solutions. The techniques, introduced to address various subtasks, can be applied independently for solving more specific problems. The approach has been implemented and applied in a case study with real data from a business process supported in a software application.
14.
Liu Q Chen E Xiong H Ding CH Chen J 《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》2012,42(1):218-233
Recommender systems suggest a few items from many possible choices to the users by understanding their past behaviors. In these systems, the user behaviors are influenced by the hidden interests of the users. Learning to leverage the information about user interests is often critical for making better recommendations. However, existing collaborative-filtering-based recommender systems are usually focused on exploiting the information about the user's interaction with the systems; the information about latent user interests is largely underexplored. To that end, inspired by the topic models, in this paper, we propose a novel collaborative-filtering-based recommender system by user interest expansion via personalized ranking, named iExpand. The goal is to build an item-oriented model-based collaborative-filtering framework. The iExpand method introduces a three-layer, user-interests-item, representation scheme, which leads to more accurate ranking recommendation results with less computation cost and helps the understanding of the interactions among users, items, and user interests. Moreover, iExpand strategically deals with many issues that exist in traditional collaborative-filtering approaches, such as the overspecialization problem and the cold-start problem. Finally, we evaluate iExpand on three benchmark data sets, and experimental results show that iExpand can lead to better ranking performance than state-of-the-art methods with a significant margin.
15.
Workflow mining: discovering process models from event logs Total citations: 17, self-citations: 0, citations by others: 17
van der Aalst W. Weijters T. Maruster L. 《Knowledge and Data Engineering, IEEE Transactions on》2004,16(9):1128-1142
Contemporary workflow management systems are driven by explicit process models, i.e., a completely specified workflow design is required in order to enact a given workflow process. Creating a workflow design is a complicated time-consuming process and, typically, there are discrepancies between the actual workflow processes and the processes as perceived by the management. Therefore, we have developed techniques for discovering workflow models. The starting point for such techniques is a so-called "workflow log" containing information about the workflow process as it is actually being executed. We present a new algorithm to extract a process model from such a log and represent it in terms of a Petri net. However, we also demonstrate that it is not possible to discover arbitrary workflow processes. We explore a class of workflow processes that can be discovered. We show that the α-algorithm can successfully mine any workflow represented by a so-called SWF-net.
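The abstract does not spell out how the log is analyzed. As background, the α-algorithm is usually presented as starting from ordering relations read off the log: "directly follows", "causal" and "parallel". The sketch below computes only those relations for a hypothetical two-trace log; it is not the full algorithm and does not construct the Petri net.

```python
# Hypothetical workflow log: each trace is the ordered activities of one case.
LOG = [
    ("a", "b", "c", "d"),
    ("a", "c", "b", "d"),
]


def ordering_relations(log):
    """Derive the basic log-based ordering relations between activities."""
    follows = set()
    for trace in log:
        # x is directly followed by y somewhere in some trace
        follows.update(zip(trace, trace[1:]))
    # causal: x directly precedes y, but y never directly precedes x
    causal = {(x, y) for (x, y) in follows if (y, x) not in follows}
    # parallel: x and y directly follow each other in both orders
    parallel = {(x, y) for (x, y) in follows if (y, x) in follows}
    return follows, causal, parallel


follows, causal, parallel = ordering_relations(LOG)
print(sorted(causal))
print(sorted(parallel))
```

In this toy log, "b" and "c" appear in both orders and are therefore classified as parallel, while "a" causally precedes both and both causally precede "d".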
16.
17.
18.
19.
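Query Expansion Based on Local Category Analysis Total citations: 1, self-citations: 0, citations by others: 1
To address the low precision of local analysis methods in query expansion, a new algorithm is proposed. It analyzes the documents closely related to the user query to obtain the document categories related to it, and then expands the query according to the co-occurrence between the terms used in documents of the relevant categories and the terms of the user query. Experimental comparison with traditional local analysis and global analysis methods shows that the new algorithm achieves faster retrieval and higher precision.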
20.
Pablo A. D. de Castro Fabrício O. de França Hamilton M. Ferreira Guilherme Palermo Coelho Fernando J. Von Zuben 《Natural computing》2010,9(3):579-602
Query expansion is a technique utilized to improve the performance of information retrieval systems by automatically adding related terms to the initial query. These additional terms can be obtained from documents stored in a database. Usually, this task is performed by clustering the documents and then extracting representative terms from the clusters. Afterwards, a new search is performed in the whole database using the expanded set of terms. Recently, the authors have proposed an immune-inspired algorithm, namely BIC-aiNet, to perform biclustering of texts. Biclustering differs from standard clustering algorithms in the sense that the former can detect partial similarities in the attributes. The preliminary results indicated that our proposal is able to group similar texts effectively and the generated biclusters consistently presented relevant words to represent a category of texts. Motivated by this promising scenario, this paper better formalizes the proposal and investigates the usefulness of the whole methodology on larger datasets. The BIC-aiNet was applied to a set of documents aiming at identifying the set of relevant terms associated with each bicluster, giving rise to a query expansion tool. The obtained results were compared with those produced by two alternative proposals in the literature, and they indicate that these techniques tend to generate complementary results, as a consequence of the use of distinct similarity metrics.