20 similar documents found (search time: 78 ms)
1.
Miao Wan, Arne Jönsson, Cong Wang, Lixiang Li, Yixian Yang. Knowledge and Information Systems, 2011, 33(1): 89-115
Users of a Web site usually perform their interest-oriented actions by clicking or visiting Web pages, which are traced in access log files. Clustering Web user access patterns may capture common user interests to a Web site, and in turn, build user profiles for advanced Web applications, such as Web caching and prefetching. The conventional Web usage mining techniques for clustering Web user sessions can discover usage patterns directly, but cannot identify the latent factors or hidden relationships among users' navigational behaviour. In this paper, we propose an approach based on a vector space model, called Random Indexing, to discover such intrinsic characteristics of Web users' activities. The underlying factors are then utilised for clustering individual user navigational patterns and creating common user profiles. The clustering results will be used to predict and prefetch Web requests for grouped users. We demonstrate the usability and superiority of the proposed Web user clustering approach through experiments on a real Web log file. The clustering and prefetching tasks are evaluated by comparison with previous studies demonstrating better clustering performance and higher prefetching accuracy.
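As a rough illustration of the Random Indexing idea, the sketch below assumes that every page receives a sparse random "index vector" and that a user's vector is the sum of the index vectors of the pages in that user's sessions. The dimensionality, number of non-zero entries, and the example page paths are arbitrary assumptions; the paper's weighting functions and exact construction are not reproduced here.

```python
import math
import random

def index_vector(dim, nonzeros, rng):
    """Sparse ternary random vector: `nonzeros` entries set to +1/-1, the rest 0."""
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzeros):
        vec[pos] = rng.choice((1, -1))
    return vec

def random_indexing(sessions, dim=300, nonzeros=6, seed=0):
    """Return a context vector per user: the sum of the index vectors of the
    pages in that user's sessions (a page visited twice is added twice)."""
    rng = random.Random(seed)
    index = {}    # page id -> random index vector, created on first sight
    context = {}  # user id -> accumulated context vector
    for user, pages in sessions.items():
        acc = [0] * dim
        for page in pages:
            if page not in index:
                index[page] = index_vector(dim, nonzeros, rng)
            acc = [a + b for a, b in zip(acc, index[page])]
        context[user] = acc
    return context

def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical sessions keyed by user id.
sessions = {
    "u1": ["/index", "/products", "/cart"],
    "u2": ["/index", "/products", "/cart"],
    "u3": ["/blog", "/about"],
}
ctx = random_indexing(sessions)
print(round(cosine(ctx["u1"], ctx["u2"]), 3))  # 1.0
```

Users whose vectors are close in cosine terms could then be grouped with any vector-based clustering algorithm, such as k-means.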
4.
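Mining Algorithm for Users' Browsing-Interest Migration Integrating Web Usage Mining and Content Mining  Total citations: 2, self-citations: 0, cited by others: 2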
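A model and algorithm for mining users' browsing-interest migration patterns, integrating Web usage mining and Web content mining, are proposed. Web pages and their clustering are introduced. Users' browsing-interest sequences are obtained by replacing each page in a user transaction with its corresponding cluster, and browsing-interest migration patterns are then derived from these sequences. The model helps Web administrators understand the characteristics of user behavior and organize the structure of a Web site.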
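The page-to-cluster substitution described above can be sketched as follows. Collapsing consecutive pages from the same cluster into one interest, and measuring the support of a migration as the number of sequences containing it, are assumptions made for illustration; the cluster labels and page names are hypothetical.

```python
from collections import Counter

def interest_sequences(transactions, page_cluster):
    """Replace each page with its cluster id, collapsing consecutive repeats."""
    seqs = []
    for pages in transactions:
        seq = []
        for page in pages:
            cluster = page_cluster[page]
            if not seq or seq[-1] != cluster:
                seq.append(cluster)
        seqs.append(seq)
    return seqs

def interest_migrations(seqs, min_support):
    """Count cluster-to-cluster transitions (once per sequence) and keep frequent ones."""
    counts = Counter()
    for seq in seqs:
        counts.update(set(zip(seq, seq[1:])))
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Hypothetical page clusters (e.g. produced by content clustering).
page_cluster = {"a": "sports", "b": "sports", "c": "news", "d": "finance"}
transactions = [["a", "b", "c"], ["a", "c", "d"], ["b", "c"]]
seqs = interest_sequences(transactions, page_cluster)
print(interest_migrations(seqs, min_support=2))  # {('sports', 'news'): 3}
```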
5.
The World Wide Web's constant growth and transformation offers great opportunities for exploitation but also poses many challenges. The semantic Web has emerged as a possible answer to these challenges by offering an approach to organizing and sharing the Web, but it doesn't explicitly consider user needs and requirements. Heraclitus is a framework for semantic Web adaptation in which user needs and requirements are the driving force, using text and Web usage mining to support Web site ontology and topology evolution. The result is improved user interaction with the semantic Web.
7.
Magpie has been one of the first truly effective approaches to bringing semantics into the web browsing experience. The key innovation brought by Magpie was the replacement of a manual annotation process by an automatically associated ontology-based semantic layer over web resources, which ensured added value at no cost for the user. Magpie also differs from older open hypermedia systems: its associations between entities in a web page and semantic concepts from an ontology enable link typing and subsequent interpretation of the resource. The semantic layer in Magpie also facilitates locating semantic services and making them available to the user, so that they can be manually activated by a user or opportunistically triggered when appropriate patterns are encountered during browsing. In this paper we track the evolution of Magpie as a technology for developing open and flexible Semantic Web applications. Magpie emerged from our research into user-accessible Semantic Web, and we use this viewpoint to assess the role of tools like Magpie in making semantic content useful for ordinary users. We see such tools as crucial in bootstrapping the Semantic Web through the automation of the knowledge generation process.
8.
Sequence-based clustering for Web usage mining: A new experimental framework and ANN-enhanced K-means algorithm  Total citations: 1, self-citations: 0, cited by others: 1
We develop a general sequence-based clustering method by proposing new sequence representation schemes in association with Markov models. The resulting sequence representations allow for calculation of vector-based distances (dissimilarities) between Web user sessions and thus can be used as inputs of various clustering algorithms. We develop an evaluation framework in which the performances of the algorithms are compared in terms of whether the clusters (groups of Web users who follow the same Markov process) are correctly identified using a replicated clustering approach. A series of experiments is conducted to investigate whether clustering performance is affected by different sequence representations and different distance measures as well as by other factors such as number of actual Web user clusters, number of Web pages, similarity between clusters, minimum session length, number of user sessions, and number of clusters to form. A new, fuzzy ART-enhanced K-means algorithm is also developed and its superior performance is demonstrated.
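One simple Markov-model-based representation, shown here as an assumption rather than the paper's exact scheme, flattens a session's empirical first-order transition probabilities into a fixed-length vector over a known page list. Any vector distance (Euclidean below) can then be fed to a clustering algorithm.

```python
import math

def transition_vector(session, pages):
    """Flatten the session's first-order transition probabilities into a vector.

    Entry (i, j) is P(next = pages[j] | current = pages[i]) estimated from the
    session; rows for pages never left in the session are all zeros.
    """
    idx = {p: i for i, p in enumerate(pages)}
    n = len(pages)
    counts = [[0] * n for _ in range(n)]
    for a, b in zip(session, session[1:]):
        counts[idx[a]][idx[b]] += 1
    vec = []
    for row in counts:
        total = sum(row)
        vec.extend((c / total if total else 0.0) for c in row)
    return vec

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

pages = ["A", "B", "C"]
v1 = transition_vector(["A", "B", "A", "B"], pages)
v2 = transition_vector(["B", "A", "B", "A"], pages)
v3 = transition_vector(["A", "C"], pages)
print(euclidean(v1, v2))  # 0.0: both sessions follow the same transition pattern
```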
9.
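Today's Internet provides ever more functions and services, Web content is increasingly rich, and mobile applications are increasingly popular. However, complex Web service applications place higher demands on users and cause many problems when browsing; users often feel at a loss. This paper proposes a method for extracting and analyzing user behavior based on users' browsing sequential patterns. The method consists of two parts: browsing-pattern analysis and user clustering. In browsing-pattern analysis, browsing sequences are first derived from user behavior data; the PrefixSpan sequential-pattern mining algorithm is then used to obtain users' habitual browsing patterns; finally, the mined patterns are applied to Web browsing to provide personalized services for different user needs. In user clustering, hierarchical clustering groups users by the similarity of their browsing patterns, in order to analyze how different user attributes (such as age, occupation, and education) affect browsing patterns. Experimental results show that the PrefixSpan algorithm and hierarchical clustering method used in this paper are feasible and effective for analyzing user browsing patterns.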
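A minimal PrefixSpan sketch for sequences of single items (for example page IDs) is given below. It recursively builds prefix-projected databases and counts support as the number of sequences containing a pattern. It omits itemset elements and the pseudo-projection optimizations of full implementations, and the page names are hypothetical.

```python
def prefixspan(sequences, min_support):
    """Return {pattern_tuple: support} for all frequent sequential patterns."""
    results = {}

    def mine(prefix, projected):
        # projected: suffixes of the sequences that contain `prefix`
        counts = {}
        for suffix in projected:
            for item in set(suffix):          # count each item once per sequence
                counts[item] = counts.get(item, 0) + 1
        for item, support in counts.items():
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = support
            # Project: keep what follows the first occurrence of `item`.
            new_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            mine(pattern, new_projected)

    mine((), [list(s) for s in sequences])
    return results

sequences = [["home", "news", "sports"], ["home", "sports"], ["home", "news"]]
patterns = prefixspan(sequences, min_support=2)
print(sorted(p for p in patterns if len(p) == 2))  # [('home', 'news'), ('home', 'sports')]
```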
10.
Interest Clustering of Microblog User Tags Based on Feature Mapping  Total citations: 1, self-citations: 1, cited by others: 0
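Existing user-interest clustering methods do not consider the semantic correlation between user tags. To address this, a feature-mapping-based method for clustering the tag interests of microblog users is proposed. First, the tags of the target users and of the users they follow are collected, and tags whose frequency exceeds a set threshold are selected as the feature dimensions of a fuzzy matrix. Then, taking the semantic correlation between tags into account, each user tag is mapped onto every feature dimension according to its semantic similarity to that dimension's tag, and the value of each feature dimension is computed. Finally, fuzzy clustering yields user-interest clustering results under different thresholds. Experimental results show that the proposed feature-mapping-based method effectively improves user-interest clustering.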
11.
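This paper presents a method for extending UDDI to support semantic information: semantic information is added when a Web service is registered, and semantics-based queries are supported. First, a domain ontology library is added to the UDDI system; semantic information is then attached to each service registered in that UDDI, and the correspondence between services and the ontology library is stored in the UDDI database. When a service requester searches for a Web service, the user supplies a semantic query template; a list of candidate services is obtained from the ontology semantics the user describes; the match degree of each candidate is then computed from the user's quality-of-service (QoS) requirements, and the candidates are returned to the user in descending order of match degree.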
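The query flow can be illustrated with a small in-memory sketch. It is not a UDDI API: the `Service` record, the child-to-parent ontology dictionary, the subsumption test, and the weighted-sum "match degree" over normalized QoS attributes are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Service:
    name: str
    concept: str  # ontology concept the service was annotated with at registration
    qos: dict     # QoS attribute -> value normalized to [0, 1], higher is better

def is_subconcept(concept, target, parent):
    """True if `concept` equals `target` or descends from it in the ontology."""
    while concept is not None:
        if concept == target:
            return True
        concept = parent.get(concept)
    return False

def semantic_query(services, target, parent, weights):
    """Filter by ontology concept, then rank candidates by weighted QoS match degree."""
    candidates = [s for s in services if is_subconcept(s.concept, target, parent)]

    def match_degree(s):
        return sum(w * s.qos.get(attr, 0.0) for attr, w in weights.items())

    return sorted(candidates, key=match_degree, reverse=True)

parent = {"HotelBooking": "Booking", "FlightBooking": "Booking"}  # child -> parent
services = [
    Service("hotelSvc", "HotelBooking", {"availability": 0.9, "speed": 0.5}),
    Service("flightSvc", "FlightBooking", {"availability": 0.6, "speed": 0.9}),
    Service("weatherSvc", "Weather", {"availability": 1.0, "speed": 1.0}),
]
ranked = semantic_query(services, "Booking", parent, {"availability": 0.7, "speed": 0.3})
print([s.name for s in ranked])  # ['hotelSvc', 'flightSvc']
```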
13.
The degree of personalization that a Web site offers in presenting its services to users is an important attribute contributing to the site's popularity. Web server access logs contain substantial data about user access patterns. One way to exploit this data for personalization is to group users on the basis of their Web interests and then organize the site's structure according to the needs of different groups. Two main difficulties inhibit this approach: the essentially infinite diversity of user interests and the change in these interests with time. We have developed a clustering algorithm that groups users according to their Web access patterns. The algorithm is based on the ART1 version of adaptive resonance theory. In our ART1-based algorithm, a prototype vector represents each user cluster by generalizing the URLs most frequently accessed by all cluster members. We have compared our algorithm's performance with the traditional k-means clustering algorithm. Results showed that the ART1-based technique performed better in terms of intracluster distances. We also applied the technique in a prefetching scheme that predicts future user requests.
14.
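Session identification is the foundation of, and a key step in, the analysis of user access behavior, and its quality has a decisive influence on identifying and discovering users' information needs. The most common approach at present splits sessions by a time threshold, but its main problem is that the threshold is difficult to determine accurately for different users. A new clustering-based optimization method for session identification is proposed: a clustering-based session-identification optimization model is first established, and an improved K-means algorithm is then used to identify sessions. Experimental results show that the method performs better than the traditional method.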
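The paper's improved K-means is not described here, so the sketch below shows one plausible reading of clustering-based session identification: for each user, the gaps between consecutive requests are split into "short" and "long" groups by one-dimensional 2-means, and the midpoint between the two centroids serves as that user's own time threshold. It will always split when gaps differ, so a real system would need a guard for users whose gaps are all short.

```python
def two_means_threshold(gaps, iters=100):
    """1-D k-means (k=2) on inter-request gaps; return the midpoint of the centroids."""
    lo, hi = min(gaps), max(gaps)
    if lo == hi:
        return None  # no separation: treat everything as one session
    c1, c2 = float(lo), float(hi)
    for _ in range(iters):
        mid = (c1 + c2) / 2
        short = [g for g in gaps if g <= mid]  # never empty: contains lo
        long_ = [g for g in gaps if g > mid]   # never empty: contains hi
        new1, new2 = sum(short) / len(short), sum(long_) / len(long_)
        if (new1, new2) == (c1, c2):
            break
        c1, c2 = new1, new2
    return (c1 + c2) / 2

def split_sessions(timestamps):
    """Split one user's sorted request timestamps (seconds) into sessions."""
    if len(timestamps) < 2:
        return [list(timestamps)]
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    threshold = two_means_threshold(gaps)
    sessions = [[timestamps[0]]]
    for gap, t in zip(gaps, timestamps[1:]):
        if threshold is not None and gap > threshold:
            sessions.append([t])  # long gap: start a new session
        else:
            sessions[-1].append(t)
    return sessions

print(split_sessions([0, 10, 25, 2000, 2015, 2030, 5000]))
# [[0, 10, 25], [2000, 2015, 2030], [5000]]
```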
16.
Rules are increasingly becoming an important form of knowledge representation on the Semantic Web. There are currently few methods that can ensure that the acquisition and management of rules can scale to the size of the Web. We previously developed methods to help manage large rule bases using syntactical analyses of rules. This approach did not incorporate semantics. As a result, rule categorization based on syntactic features may not be effective. In this paper, we present a novel approach for grouping rules based on whether the rule elements share relationships within a domain ontology. We have developed our method for rules specified in the Semantic Web Rule Language (SWRL), which is based on the Web Ontology Language (OWL) and shares its formal underpinnings. Our method uses vector space modeling of rule atoms and an ontology-based semantic similarity measure. We apply a clustering method to detect rule relatedness, and we use a statistical model selection method to find the optimal number of clusters within a rule base. Using three different SWRL rule bases, we evaluated the results of our semantic clustering method against those of our syntactic approach. We have found that our new approach creates clusters that better match the rule bases’ logical structures. Semantic clustering of rule bases may help users to more rapidly comprehend, acquire, and manage the growing numbers of rules on the Semantic Web.
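The abstract does not name its ontology-based similarity measure, so the sketch below swaps in the well-known Wu-Palmer measure over a class hierarchy, sim(c1, c2) = 2·depth(lcs) / (depth(c1) + depth(c2)), where the root has depth 1. It then compares two rules by a symmetric average best-match over the classes their atoms use. The toy hierarchy and the rule-level aggregation are assumptions, not the authors' method.

```python
def ancestors(concept, parent):
    """Path from `concept` up to the root, `concept` first."""
    path = [concept]
    while concept in parent:
        concept = parent[concept]
        path.append(concept)
    return path

def depth(concept, parent):
    return len(ancestors(concept, parent))  # the root has depth 1

def wu_palmer(c1, c2, parent):
    """Wu-Palmer similarity via the lowest common ancestor in a tree hierarchy."""
    a2 = set(ancestors(c2, parent))
    common = next((c for c in ancestors(c1, parent) if c in a2), None)
    if common is None:
        return 0.0
    return 2 * depth(common, parent) / (depth(c1, parent) + depth(c2, parent))

def rule_similarity(concepts1, concepts2, parent):
    """Symmetric average best-match similarity between the classes used by two rules."""
    def best(src, dst):
        return sum(max(wu_palmer(a, b, parent) for b in dst) for a in src) / len(src)
    return (best(concepts1, concepts2) + best(concepts2, concepts1)) / 2

# Hypothetical OWL-like class hierarchy: child -> parent.
parent = {"Person": "Thing", "Student": "Person", "Teacher": "Person", "Course": "Thing"}
print(round(wu_palmer("Student", "Teacher", parent), 3))  # 0.667
```

A pairwise matrix of such rule similarities could then be passed to a clustering method, with the number of clusters chosen by a model selection criterion as the abstract describes.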
17.
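Web user session clustering is an NP-hard problem in e-commerce whose goal is to discover similar user access-behavior patterns. The difficulty lies in clustering large numbers of Web sessions, each represented as a high-dimensional vector. An optimization algorithm combining the bacterial foraging algorithm with K-means is proposed, and its effectiveness is tested on well-known datasets. When clustering Web sessions and comparing against popular clustering algorithms, the experimental results show that the algorithm is efficient and performs better.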
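A heavily simplified sketch of a bacterial-foraging plus K-means hybrid is shown below; it is not the paper's algorithm. Each bacterium encodes k centroids. Chemotaxis (a random tumble direction followed by swimming while the sum of squared errors improves) lowers its cost, and reproduction keeps and duplicates the healthier half. Elimination-dispersal and swarming are omitted, and the best bacterium seeds ordinary Lloyd K-means. All parameter values are arbitrary.

```python
import math
import random

def sse(centroids, points):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sum((p - c) ** 2 for p, c in zip(pt, cen)) for cen in centroids)
               for pt in points)

def kmeans(points, centroids, iters=50):
    """Lloyd iterations from the given centroids; empty clusters keep their centroid."""
    centroids = [list(c) for c in centroids]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for pt in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: sum((p - c) ** 2 for p, c in zip(pt, centroids[i])))
            groups[nearest].append(pt)
        new = [[sum(col) / len(g) for col in zip(*g)] if g else centroids[i]
               for i, g in enumerate(groups)]
        if new == centroids:
            break
        centroids = new
    return centroids

def bfo_kmeans(points, k, n_bacteria=10, n_chem=20, n_swim=4, n_repro=4, step=0.1, seed=0):
    """Simplified bacterial-foraging search over centroid sets, refined by k-means."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Each bacterium is a candidate solution: k centroids started at random data points.
    bacteria = [[list(p) for p in rng.sample(points, k)] for _ in range(n_bacteria)]
    for _ in range(n_repro):
        health = []
        for b in range(len(bacteria)):
            cost = sse(bacteria[b], points)
            total = 0.0
            for _ in range(n_chem):
                # Tumble: pick a random unit direction in the k*dim centroid space.
                d = [rng.gauss(0.0, 1.0) for _ in range(k * dim)]
                norm = math.sqrt(sum(x * x for x in d)) or 1.0
                # Swim: keep stepping in that direction while the SSE improves.
                for _ in range(n_swim):
                    trial = [[c + step * d[i * dim + j] / norm for j, c in enumerate(cen)]
                             for i, cen in enumerate(bacteria[b])]
                    trial_cost = sse(trial, points)
                    if trial_cost >= cost:
                        break
                    bacteria[b], cost = trial, trial_cost
                total += cost  # health = accumulated cost (lower is healthier)
            health.append(total)
        # Reproduction: the healthier half survives and each survivor splits in two.
        order = sorted(range(len(bacteria)), key=lambda b: health[b])
        survivors = [bacteria[b] for b in order[: max(1, len(bacteria) // 2)]]
        bacteria = [[list(c) for c in s] for s in survivors for _ in range(2)]
    best = min(bacteria, key=lambda cents: sse(cents, points))
    return kmeans(points, best)

# Toy 2-D "session vectors" forming two well-separated groups.
pts = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
print(sorted(bfo_kmeans(pts, 2)))
```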
20.
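Mining user behavior patterns from Web log files is an urgent need for every Web site administrator. However, Web logs are large and contain a great deal of noisy and incomplete data, so user behavior patterns cannot be extracted accurately. The Unsupervised Niche Clustering (UNC) algorithm is suited to mining large datasets with noisy and incomplete data, but it is a two-dimensional model based on Euclidean space, and its data representation is not intuitive. We improve UNC and propose a UNC with a hierarchical structure (LUNC). Performance tests show that the model has good overall performance.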