首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到10条相似文献,搜索用时 171 毫秒
1.
《Information Systems》2006,31(4-5):247-265
As more information becomes available on the Web, there has been a crescent interest in effective personalization techniques. Personal agents providing assistance based on the content of Web documents and the user interests emerged as a viable alternative to this problem. Provided that these agents rely on having knowledge about users contained into user profiles, i.e., models of user preferences and interests gathered by observation of user behavior, the capacity of acquiring and modeling user interest categories has become a critical component in personal agent design. User profiles have to summarize categories corresponding to diverse user information interests at different levels of abstraction in order to allow agents to decide on the relevance of new pieces of information. In accomplishing this goal, document clustering offers the advantage that an a priori knowledge of categories is not needed, therefore the categorization is completely unsupervised. In this paper we present a document clustering algorithm, named WebDCC (Web Document Conceptual Clustering), that carries out incremental, unsupervised concept learning over Web documents in order to acquire user profiles. Unlike most user profiling approaches, this algorithm offers comprehensible clustering solutions that can be easily interpreted and explored by both users and other agents. By extracting semantics from Web pages, this algorithm also produces intermediate results that can be finally integrated in a machine-understandable format such as an ontology. Empirical results of using this algorithm in the context of an intelligent Web search agent proved it can reach high levels of accuracy in suggesting Web pages.  相似文献   

2.
为对异构数据库中的大量孤立、没有语义描述的数据进行自动归类及本体建模,实现异构数据库数据的知识获取,提出了一个基于本体与Web服务的异构数据库知识获取框架,给出了通过Web服务包装异构数据库的访问机制,设计出贝叶斯分类器并应用该分类器对获取的异构数据自动映射到相关的本体.该方法能够通过贝叶斯分类器自动对异构数据归类,实现了异构数据库的交互知识获取.  相似文献   

3.
Twitter and Reddit are two of the most popular social media sites used today. In this paper, we study the use of machine learning and WordNet-based classifiers to generate an interest profile from a user’s tweets and use this to recommend loosely related Reddit threads which the reader is most likely to be interested in. We introduce a genre classification algorithm using a similarity measure derived from WordNet lexical database for English to label genres for nouns in tweets. The proposed algorithm generates a user’s interest profile from their tweets based on a referencing taxonomy of genres derived from the genre-tagged Brown Corpus augmented with a technology genre. The top K genres of a user’s interest profile can be used for recommending subreddit articles in those genres. Experiments using real life test cases collected from Twitter have been done to compare the performance on genre classification by using the WordNet classifier and machine learning classifiers such as SVM, Random Forests, and an ensemble of Bayesian classifiers. Empirically, we have obtained similar results from the two different approaches with a sufficient number of tweets. It seems that machine learning algorithms as well as the WordNet ontology are viable tools for developing recommendation engine based on genre classification. One advantage of the WordNet approach is simplicity and no learning is required. However, the WordNet classifier tends to have poor precision on users with very few tweets.  相似文献   

4.
从Web日志文件中挖掘出用户行为模式,是所有Web站点管理者的迫切需要,但由于web日志数据量大,存有大量的干扰和不完整的数据,导致无法准确的抽取出用户行为的模式。小环境无监督聚类算法适合挖掘具有噪音和不完整数据的大量数据集,但它是基于欧几里德空间的二维模型,数据表示不直观。我们对UNC进行改进,提出了具有层次结构的UNC(简称LUNC)。性能测试实验证明,该模型具有较好的整体性能。  相似文献   

5.
Abstract: User profiling on the Web is a topic that has attracted a great number of technological approaches and applications. In most user profiling approaches the website learns profiles from data implicitly acquired from user behaviours, i.e. observing the behaviours of users with a statistically significant number of accesses. This paper presents an alternative approach. In this approach the website explicitly acquires data from users, user interests are represented in a Bayesian network, and user profiles are enriched and refined over time. The profile enrichment is achieved through a sequential asking algorithm based on the value-of-information theory using the Shannon entropy concept. However, what mostly characterizes the approach is the fact that the user is involved in a collaborative process of profile building. The approach has been tried out for over a year in a real application. On the basis of the experimental results the approach turns out to be particularly suitable for applications where the website is strongly based on deep domain knowledge (as for example is the case for scientific websites) and has a community of users that share the same domain knowledge of the website and produce a 'low' number of accesses ('low' compared to the high number of accesses of a typical commercial website). After presenting the technical aspects of the approach, we discuss the underlying ideas in the light of the experimental results and the literature on human–computer interaction and user profiling.  相似文献   

6.
Concern for privacy when users are surfing on the Web has increased recently. Nowadays, many users are aware that when they are accessing Web sites, these Web sites can track them and create profiles on the elements they access, the advertisements they see, the different links they visit, from which Web sites they come from and to which sites they exit, and so on. In order to maintain user privacy, several techniques, methods and solutions have appeared. In this paper we present an analysis of both these solutions and the main tools that are freely distributed or can be used freely and that implement some of these techniques and methods to preserve privacy when users and surfing on the Internet. This work, unlike previous reviews, shows in a comprehensive way, all the different risks when a user navigates on the Web, the different solutions proposed that finally have being implemented and being used to achieve Web privacy goal. Thus, users can decide which tools to use when they want navigate privately and what kind of risks they are assuming.  相似文献   

7.
目的 视频精彩片段提取是视频内容标注、基于内容的视频检索等领域的热点研究问题。视频精彩片段提取主要根据视频底层特征进行精彩片段的提取,忽略了用户兴趣对于提取结果的影响,导致提取结果可能与用户期望不相符。另一方面,基于用户兴趣的语义建模需要大量的标注视频训练样本才能获得较为鲁棒的语义分类器,而对于大量训练样本的标注费时费力。考虑到互联网中包含内容丰富且易于获取的图像,将互联网图像中的知识迁移到视频片段的语义模型中可以减少大量的视频数据标注工作。因此,提出利用互联网图像的用户兴趣的视频精彩片段提取框架。方法 利用大量互联网图像对用户兴趣语义进行建模,考虑到从互联网中获取的知识变化多样且有噪声,如果不加选择盲目地使用会影响视频片段提取效果,因此,将图像根据语义近似性进行分组,将语义相似但使用不同关键词检索得到的图像称为近义图像组。在此基础上,提出使用近义语义联合组权重模型权衡,根据图像组与视频的语义相关性为不同图像组分配不同的权重。首先,根据用户兴趣从互联网图像搜索引擎中检索与该兴趣语义相关的图像集,作为用户兴趣精彩片段提取的知识来源;然后,通过对近义语义图像组的联合组权重学习,将图像中习得的知识迁移到视频中;最后,使用图像集中习得的语义模型对待提取片段进行精彩片段提取。结果 本文使用CCV数据库中的视频对本文提出的方法进行验证,同时与多种已有的视频关键帧提取算法进行比较,实验结果显示本文算法的平均准确率达到46.54,较其他算法相比提高了21.6%,同时算法耗时并无增加。此外,为探究优化过程中不同平衡参数对最终结果的影响,进一步验证本文方法的有效性,本文在实验过程中通过移除算法中的正则项来验证每一项对于算法框架的影响。实验结果显示,在移除任何一项后算法的准确率明显降低,这表明本文方法所提出的联合组权重模型对提取用户感兴趣视频片段的有效性。结论 本文提出了一种针对用户兴趣语义的视频精彩片段提取方法,根据用户关注点的不同,为不同用户提取其感兴趣的视频片段。  相似文献   

8.
 We present a study of the role of user profiles using fuzzy logic in web retrieval processes. Flexibility for user interaction and for adaptation in profile construction becomes an important issue. We focus our study on user profiles, including creation, modification, storage, clustering and interpretation. We also consider the role of fuzzy logic and other soft computing techniques to improve user profiles. Extended profiles contain additional information related to the user that can be used to personalize and customize the retrieval process as well as the web site. Web mining processes can be carried out by means of fuzzy clustering of these extended profiles and fuzzy rule construction. Fuzzy inference can be used in order to modify queries and extract knowledge from profiles with marketing purposes within a web framework. An architecture of a portal that could support web mining technology is also presented.  相似文献   

9.
In order to adapt functionality to their individual users, systems need information about these users. The Social Web provides opportunities to gather user data from outside the system itself. Aggregated user data may be useful to address cold-start problems as well as sparse user profiles, but this depends on the nature of individual user profiles distributed on the Social Web. For example, does it make sense to re-use Flickr profiles to recommend bookmarks in Delicious? In this article, we study distributed form-based and tag-based user profiles, based on a large dataset aggregated from the Social Web. We analyze the completeness, consistency and replication of form-based profiles, which users explicitly create by filling out forms at Social Web systems such as Twitter, Facebook and LinkedIn. We also investigate tag-based profiles, which result from social tagging activities in systems such as Flickr, Delicious and StumbleUpon: to what extent do tag-based profiles overlap between different systems, what are the benefits of aggregating tag-based profiles. Based on these insights, we developed and evaluated the performance of several cross-system user modeling strategies in the context of recommender systems. The evaluation results show that the proposed methods solve the cold-start problem and improve recommendation quality significantly, even beyond the cold-start.  相似文献   

10.
In this paper, we present a complete framework and findings in mining Web usage patterns from Web log files of a real Web site that has all the challenging aspects of real-life Web usage mining, including evolving user profiles and external data describing an ontology of the Web content. Even though the Web site under study is part of a nonprofit organization that does not "sell" any products, it was crucial to understand "who" the users were, "what" they looked at, and "how their interests changed with time," all of which are important questions in Customer Relationship Management (CRM). Hence, we present an approach for discovering and tracking evolving user profiles. We also describe how the discovered user profiles can be enriched with explicit information need that is inferred from search queries extracted from Web log data. Profiles are also enriched with other domain-specific information facets that give a panoramic view of the discovered mass usage modes. An objective validation strategy is also used to assess the quality of the mined profiles, in particular their adaptability in the face of evolving user behavior.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号