Similar literature (20 results)
1.
Information Systems, 2006, 31(4-5): 247-265
As more information becomes available on the Web, there has been growing interest in effective personalization techniques. Personal agents that provide assistance based on the content of Web documents and on user interests have emerged as a viable approach to this problem. Because these agents rely on knowledge about users contained in user profiles, i.e., models of user preferences and interests gathered by observing user behavior, the capacity to acquire and model user interest categories has become a critical component of personal agent design. User profiles have to summarize categories corresponding to diverse user information interests at different levels of abstraction in order to allow agents to decide on the relevance of new pieces of information. In accomplishing this goal, document clustering offers the advantage that a priori knowledge of categories is not needed, so the categorization is completely unsupervised. In this paper we present a document clustering algorithm, named WebDCC (Web Document Conceptual Clustering), that carries out incremental, unsupervised concept learning over Web documents in order to acquire user profiles. Unlike most user profiling approaches, this algorithm offers comprehensible clustering solutions that can be easily interpreted and explored by both users and other agents. By extracting semantics from Web pages, this algorithm also produces intermediate results that can ultimately be integrated into a machine-understandable format such as an ontology. Empirical results of using this algorithm in the context of an intelligent Web search agent showed that it can reach high levels of accuracy in suggesting Web pages.

2.
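To automatically classify and build ontology models for the large volume of isolated data in heterogeneous databases that lacks semantic description, and thereby acquire knowledge from such data, a knowledge acquisition framework for heterogeneous databases based on ontologies and Web services is proposed. An access mechanism that wraps heterogeneous databases as Web services is given, and a Bayesian classifier is designed and applied to map the acquired heterogeneous data automatically to the relevant ontologies. The method classifies heterogeneous data automatically with the Bayesian classifier and achieves interactive knowledge acquisition from heterogeneous databases.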

3.
Twitter and Reddit are two of the most popular social media sites used today. In this paper, we study the use of machine learning and WordNet-based classifiers to generate an interest profile from a user's tweets and use this to recommend loosely related Reddit threads which the reader is most likely to be interested in. We introduce a genre classification algorithm that uses a similarity measure derived from the WordNet lexical database for English to label genres for nouns in tweets. The proposed algorithm generates a user's interest profile from their tweets based on a reference taxonomy of genres derived from the genre-tagged Brown Corpus augmented with a technology genre. The top K genres of a user's interest profile can be used for recommending subreddit articles in those genres. Experiments using real-life test cases collected from Twitter were carried out to compare the genre classification performance of the WordNet classifier with that of machine learning classifiers such as SVM, Random Forests, and an ensemble of Bayesian classifiers. Empirically, we obtained similar results from the two approaches given a sufficient number of tweets. It seems that machine learning algorithms as well as the WordNet ontology are viable tools for developing a recommendation engine based on genre classification. One advantage of the WordNet approach is its simplicity: no learning is required. However, the WordNet classifier tends to have poor precision on users with very few tweets.

4.
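Mining user behavior patterns from Web log files is an urgent need for all Web site administrators, but because Web logs are large and contain much noisy and incomplete data, user behavior patterns cannot be extracted accurately. The Unsupervised Niche Clustering (UNC) algorithm is suited to mining large data sets that contain noise and incomplete data, but it is a two-dimensional model based on Euclidean space, and its data representation is not intuitive. We improve UNC and propose a hierarchically structured UNC (LUNC for short). Performance tests show that the model has good overall performance.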

5.
Abstract: User profiling on the Web is a topic that has attracted a great number of technological approaches and applications. In most user profiling approaches the website learns profiles from data implicitly acquired from user behaviours, i.e. by observing the behaviours of users with a statistically significant number of accesses. This paper presents an alternative approach. In this approach the website explicitly acquires data from users, user interests are represented in a Bayesian network, and user profiles are enriched and refined over time. The profile enrichment is achieved through a sequential asking algorithm based on value-of-information theory using the Shannon entropy concept. However, what mostly characterizes the approach is the fact that the user is involved in a collaborative process of profile building. The approach has been tried out for over a year in a real application. On the basis of the experimental results the approach turns out to be particularly suitable for applications where the website is strongly based on deep domain knowledge (as is the case, for example, for scientific websites) and has a community of users that share the same domain knowledge of the website and produce a 'low' number of accesses ('low' compared to the high number of accesses of a typical commercial website). After presenting the technical aspects of the approach, we discuss the underlying ideas in the light of the experimental results and the literature on human–computer interaction and user profiling.
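
The entropy-based question selection mentioned in this abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a single discrete interest variable instead of a full Bayesian network, and the function names and the `likelihood` table layout are hypothetical. It picks the question whose answer is expected to reduce the Shannon entropy of the interest distribution the most.

```python
import math

def entropy(p):
    """Shannon entropy in bits of a discrete distribution (list of probabilities)."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def expected_information_gain(prior, likelihood):
    """Expected entropy reduction from asking one question.

    prior[i]         = P(interest = i)
    likelihood[i][a] = P(answer = a | interest = i)   (assumed table layout)
    """
    gain = entropy(prior)
    for a in range(len(likelihood[0])):
        # Marginal probability of observing answer a.
        p_a = sum(prior[i] * likelihood[i][a] for i in range(len(prior)))
        if p_a == 0:
            continue
        # Bayes update of the interest distribution given answer a.
        posterior = [prior[i] * likelihood[i][a] / p_a for i in range(len(prior))]
        gain -= p_a * entropy(posterior)
    return gain

def next_question(prior, questions):
    """Pick the question (dict key) with the largest expected information gain."""
    return max(questions, key=lambda q: expected_information_gain(prior, questions[q]))
```

In a sequential setting, the posterior computed for the answer actually given would become the prior for the next round.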

6.
Concern for privacy when users are surfing the Web has increased recently. Nowadays, many users are aware that when they access Web sites, these sites can track them and create profiles of the elements they access, the advertisements they see, the links they visit, which Web sites they come from and to which sites they exit, and so on. In order to maintain user privacy, several techniques, methods and solutions have appeared. In this paper we present an analysis of both these solutions and the main tools that are freely distributed or can be used freely and that implement some of these techniques and methods to preserve privacy when users are surfing the Internet. This work, unlike previous reviews, shows comprehensively the different risks a user faces when navigating the Web and the proposed solutions that have actually been implemented and are being used to achieve Web privacy. Thus, users can decide which tools to use when they want to navigate privately and what kind of risks they are assuming.

7.
In order to adapt functionality to their individual users, systems need information about these users. The Social Web provides opportunities to gather user data from outside the system itself. Aggregated user data may be useful to address cold-start problems as well as sparse user profiles, but this depends on the nature of individual user profiles distributed on the Social Web. For example, does it make sense to re-use Flickr profiles to recommend bookmarks in Delicious? In this article, we study distributed form-based and tag-based user profiles, based on a large dataset aggregated from the Social Web. We analyze the completeness, consistency and replication of form-based profiles, which users explicitly create by filling out forms at Social Web systems such as Twitter, Facebook and LinkedIn. We also investigate tag-based profiles, which result from social tagging activities in systems such as Flickr, Delicious and StumbleUpon: to what extent do tag-based profiles overlap between different systems, and what are the benefits of aggregating tag-based profiles? Based on these insights, we developed and evaluated the performance of several cross-system user modeling strategies in the context of recommender systems. The evaluation results show that the proposed methods solve the cold-start problem and improve recommendation quality significantly, even beyond the cold-start.

8.
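Objective: Video highlight extraction is an active research problem in fields such as video content annotation and content-based video retrieval. Highlight extraction relies mainly on low-level video features and ignores the influence of user interest on the result, so the extracted highlights may not match user expectations. Moreover, semantic modeling based on user interest requires a large number of annotated training videos to obtain a reasonably robust semantic classifier, and annotating that many training samples is time-consuming and laborious. Because the Internet contains abundant and easily obtained images, transferring knowledge from Internet images to the semantic model of video segments can greatly reduce the video annotation effort. A framework for extracting video highlights according to user interest using Internet images is therefore proposed. Methods: User interest semantics are modeled with a large number of Internet images. Knowledge obtained from the Internet is diverse and noisy, and using it indiscriminately would degrade extraction, so images are grouped by semantic similarity; images that are semantically similar but retrieved with different keywords are called a near-synonym image group. On this basis, a joint group weight model over near-synonym groups is proposed, which assigns different weights to image groups according to their semantic relevance to the video. First, an image set semantically related to the user's interest is retrieved from an Internet image search engine as the knowledge source for interest-based highlight extraction; then, through joint group weight learning over the near-synonym image groups, the knowledge learned from images is transferred to video; finally, the semantic model learned from the image set is used to extract highlights from the candidate segments. Results: The method is validated on videos from the CCV database and compared with several existing video key-frame extraction algorithms. The experiments show an average accuracy of 46.54, an improvement of 21.6% over the other algorithms, with no increase in running time. In addition, to examine the influence of the trade-off parameters in the optimization and further validate the method, regularization terms were removed from the algorithm to test the effect of each term on the framework. Accuracy drops markedly when any term is removed, which indicates that the proposed joint group weight model is effective for extracting video segments of interest to users. Conclusion: A video highlight extraction method targeting user interest semantics is proposed, which extracts the video segments of interest to different users according to what each user pays attention to.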

9.
We present a study of the role of user profiles using fuzzy logic in web retrieval processes. Flexibility for user interaction and for adaptation in profile construction becomes an important issue. We focus our study on user profiles, including creation, modification, storage, clustering and interpretation. We also consider the role of fuzzy logic and other soft computing techniques in improving user profiles. Extended profiles contain additional information related to the user that can be used to personalize and customize the retrieval process as well as the web site. Web mining processes can be carried out by means of fuzzy clustering of these extended profiles and fuzzy rule construction. Fuzzy inference can be used to modify queries and extract knowledge from profiles for marketing purposes within a web framework. An architecture of a portal that could support web mining technology is also presented.

10.
In this paper, we present a complete framework and findings in mining Web usage patterns from Web log files of a real Web site that has all the challenging aspects of real-life Web usage mining, including evolving user profiles and external data describing an ontology of the Web content. Even though the Web site under study is part of a nonprofit organization that does not "sell" any products, it was crucial to understand "who" the users were, "what" they looked at, and "how their interests changed with time," all of which are important questions in Customer Relationship Management (CRM). Hence, we present an approach for discovering and tracking evolving user profiles. We also describe how the discovered user profiles can be enriched with explicit information need that is inferred from search queries extracted from Web log data. Profiles are also enriched with other domain-specific information facets that give a panoramic view of the discovered mass usage modes. An objective validation strategy is also used to assess the quality of the mined profiles, in particular their adaptability in the face of evolving user behavior.

11.
Improving the Quality of the Personalized Electronic Program Guide   (Total citations: 4; self-citations: 0; citations by others: 4)
As Digital TV subscribers are offered more and more channels, it is becoming increasingly difficult for them to locate the right programme information at the right time. The personalized Electronic Programme Guide (pEPG) is one solution to this problem; it leverages artificial intelligence and user profiling techniques to learn about the viewing preferences of individual users in order to compile personalized viewing guides that fit their individual preferences. Very often the limited availability of profiling information is a key limiting factor in such personalized recommender systems. For example, it is well known that collaborative filtering approaches suffer significantly from the sparsity problem, which exists because the expected item-overlap between profiles is usually very low. In this article we address the sparsity problem in the Digital TV domain. We propose the use of data mining techniques as a way of supplementing meagre ratings-based profile knowledge with additional item-similarity knowledge that can be automatically discovered by mining user profiles. We argue that this new similarity knowledge can significantly enhance the performance of a recommender system in even the sparsest of profile spaces. Moreover, we provide an extensive evaluation of our approach using two large-scale, state-of-the-art online systems: PTVPlus, a personalized TV listings portal, and Físchlár, an online digital video library system.

12.
For educators, the World Wide Web offers a valuable technology for knowledge sharing. It can complement more traditional approaches to knowledge sharing such as books and lectures. Here, we identify and differentiate three major approaches for Web-based knowledge sharing: course-centered sites, subject-centered sites, and book-centered sites. A rationale for book-centered sites, those developed to facilitate students' and instructors' efforts in courses that use the book, is advanced. We introduce an architecture of features that can guide developers of such sites. This is illustrated by a book-centered site implemented according to the architecture. Several sites for introductory business computing books are compared and contrasted in terms of the architecture, suggesting ways in which each can be extended. This revised version was published online in July 2006 with corrections to the Cover Date.

13.
Magpie has been one of the first truly effective approaches to bringing semantics into the web browsing experience. The key innovation brought by Magpie was the replacement of a manual annotation process by an automatically associated ontology-based semantic layer over web resources, which ensured added value at no cost for the user. Magpie also differs from older open hypermedia systems: its associations between entities in a web page and semantic concepts from an ontology enable link typing and subsequent interpretation of the resource. The semantic layer in Magpie also facilitates locating semantic services and making them available to the user, so that they can be manually activated by a user or opportunistically triggered when appropriate patterns are encountered during browsing. In this paper we track the evolution of Magpie as a technology for developing open and flexible Semantic Web applications. Magpie emerged from our research into user-accessible Semantic Web, and we use this viewpoint to assess the role of tools like Magpie in making semantic content useful for ordinary users. We see such tools as crucial in bootstrapping the Semantic Web through the automation of the knowledge generation process.

14.
Discovery and Evaluation of Aggregate Usage Profiles for Web Personalization   (Total citations: 21; self-citations: 1; citations by others: 20)
Web usage mining, possibly used in conjunction with standard approaches to personalization such as collaborative filtering, can help address some of the shortcomings of these techniques, including reliance on subjective user ratings, lack of scalability, and poor performance in the face of high-dimensional and sparse data. However, the discovery of patterns from usage data by itself is not sufficient for performing the personalization tasks. The critical step is the effective derivation of good quality and useful (i.e., actionable) aggregate usage profiles from these patterns. In this paper we present and experimentally evaluate two techniques, based on clustering of user transactions and clustering of pageviews, in order to discover overlapping aggregate profiles that can be effectively used by recommender systems for real-time Web personalization. We evaluate these techniques both in terms of the quality of the individual profiles generated, as well as in the context of providing recommendations as an integrated part of a personalization engine. In particular, our results indicate that using the generated aggregate profiles, we can achieve effective personalization at early stages of users' visits to a site, based only on anonymous clickstream data and without the benefit of explicit input by these users or deeper knowledge about them.

15.
Bayesian Network Classifiers   (Total citations: 154; self-citations: 0; citations by others: 154)
Friedman, Nir; Geiger, Dan; Goldszmidt, Moises. Machine Learning, 1997, 29(2-3): 131-163
Recent work in supervised learning has shown that a surprisingly simple Bayesian classifier with strong assumptions of independence among features, called naive Bayes, is competitive with state-of-the-art classifiers such as C4.5. This fact raises the question of whether a classifier with less restrictive assumptions can perform even better. In this paper we evaluate approaches for inducing classifiers from data, based on the theory of learning Bayesian networks. These networks are factored representations of probability distributions that generalize the naive Bayesian classifier and explicitly represent statements about independence. Among these approaches we single out a method we call Tree Augmented Naive Bayes (TAN), which outperforms naive Bayes, yet at the same time maintains the computational simplicity (no search involved) and robustness that characterize naive Bayes. We experimentally tested these approaches, using problems from the University of California at Irvine repository, and compared them to C4.5, naive Bayes, and wrapper methods for feature selection.
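
For readers unfamiliar with the baseline discussed in this abstract, the following is a minimal sketch of a naive Bayes classifier over categorical features. It is the plain naive Bayes baseline, not TAN, and the class name, the Laplace smoothing parameter `alpha`, and the toy data layout are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Categorical naive Bayes with Laplace smoothing (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength; an assumed default

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.n = len(labels)
        # (feature index, class) -> counts of each observed feature value
        self.counts = defaultdict(Counter)
        # distinct values seen per feature, used in the smoothing denominator
        self.values = [set() for _ in rows[0]]
        for row, y in zip(rows, labels):
            for j, v in enumerate(row):
                self.counts[(j, y)][v] += 1
                self.values[j].add(v)
        return self

    def log_posterior(self, row, y):
        # log P(y) + sum_j log P(x_j | y): features are assumed independent given y
        lp = math.log(self.class_counts[y] / self.n)
        for j, v in enumerate(row):
            num = self.counts[(j, y)][v] + self.alpha
            den = self.class_counts[y] + self.alpha * len(self.values[j])
            lp += math.log(num / den)
        return lp

    def predict(self, row):
        return max(self.class_counts, key=lambda y: self.log_posterior(row, y))
```

The independence assumption is visible in `log_posterior`: each feature contributes its own term conditioned only on the class. TAN relaxes this by letting each feature also depend on one other feature.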

16.
To learn a Bayesian network classifier, continuous attributes usually need to be discretized. But discretizing continuous attributes may cause information loss, introduce noise, and make the class variable less sensitive to changes in the attributes. In this paper, we use the Gaussian kernel function with a smoothing parameter to estimate the density of attributes. A Bayesian network classifier with continuous attributes is established by the dependency extension of Naive Bayes classifiers. We also analyze the information that each attribute provides about the class as a basis for the dependency extension of Naive Bayes classifiers. Experimental studies on UCI data sets show that Bayesian network classifiers using the Gaussian kernel function provide good classification accuracy compared with other approaches when dealing with continuous attributes.
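
The kernel density idea in this abstract can be sketched as follows. This is an illustrative one-attribute example, not the paper's classifier: the function names, the fixed bandwidth `h` (standing in for the smoothing parameter) and the sample data are assumptions. Each class-conditional density is estimated by averaging Gaussian kernels centred on the training values, and the class with the largest prior-weighted density wins.

```python
import math

def gaussian_kde(samples, h):
    """Return x -> density estimate using Gaussian kernels of bandwidth h."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

    return density

def classify(x, samples_by_class, h=0.5):
    """Pick the class maximizing P(class) * p(x | class) for one continuous attribute."""
    total = sum(len(s) for s in samples_by_class.values())

    def score(c):
        s = samples_by_class[c]
        return (len(s) / total) * gaussian_kde(s, h)(x)

    return max(samples_by_class, key=score)
```

A smaller `h` follows the training values more closely, while a larger `h` gives a smoother estimate; no discretization step is needed either way.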

17.
Users of a Web site usually perform their interest-oriented actions by clicking or visiting Web pages, which are traced in access log files. Clustering Web user access patterns may capture common user interests in a Web site and, in turn, build user profiles for advanced Web applications, such as Web caching and prefetching. The conventional Web usage mining techniques for clustering Web user sessions can discover usage patterns directly, but cannot identify the latent factors or hidden relationships among users' navigational behaviour. In this paper, we propose an approach based on a vector space model, called Random Indexing, to discover such intrinsic characteristics of Web users' activities. The underlying factors are then utilised for clustering individual user navigational patterns and creating common user profiles. The clustering results will be used to predict and prefetch Web requests for grouped users. We demonstrate the usability and superiority of the proposed Web user clustering approach through experiments on a real Web log file. The clustering and prefetching tasks are evaluated by comparison with previous studies, demonstrating better clustering performance and higher prefetching accuracy.
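
A rough sketch of the Random Indexing idea may help here. It is a simplified illustration and not the authors' pipeline: the dimensionality, the number of nonzero entries, the seed and all function names are assumptions. Each page gets a sparse random vector of +1/-1 entries, a user's vector is the sum of the vectors of the pages visited, and users can then be compared (and clustered) by cosine similarity in this reduced space.

```python
import math
import random

def index_vector(dim, nonzeros, rng):
    """Sparse ternary random vector: a few +1/-1 entries, zeros elsewhere."""
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzeros):
        vec[pos] = rng.choice((-1, 1))
    return vec

def user_vectors(sessions, dim=64, nonzeros=4, seed=0):
    """Map {user: [pages visited]} to {user: accumulated context vector}."""
    rng = random.Random(seed)
    page_index = {}  # page -> its fixed random index vector
    users = {}
    for user, pages in sessions.items():
        acc = [0] * dim
        for page in pages:
            if page not in page_index:
                page_index[page] = index_vector(dim, nonzeros, rng)
            for k, v in enumerate(page_index[page]):
                acc[k] += v
        users[user] = acc
    return users

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0
```

Because the index vectors are sparse and nearly orthogonal, users who visit the same pages end up with similar vectors without a full user-by-page matrix ever being built.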

18.
19.
Our social media experience is no longer limited to a single site. We use different social media sites for different purposes and our information on each site is often partial. By collecting complementary information for the same individual across sites, one can better profile users. These profiles can help improve online services such as advertising or recommendation across sites. To combine complementary information across sites, it is critical to understand how information for the same individual varies across sites. In this study, we aim to understand how two fundamental properties of users vary across social media sites. First, we study how user friendship behavior varies across sites. Our findings show how friend distributions for individuals change as they join new sites. Next, we analyze how user popularity changes across sites as individuals join different sites. We evaluate our findings and demonstrate how our findings can be employed to predict how popular users are likely to be on new sites they join.

20.
We show that it is possible to collect data that are useful for collaborative filtering (CF) using an autonomous Web spider. In CF, entities are recommended to a new user based on the stated preferences of other, similar users. We describe a CF spider that collects lists of semantically related entities from the Web. These lists can then be used by existing CF algorithms by encoding them as 'pseudo-users'. Importantly, the spider can collect useful data without pre-programmed knowledge about the format of particular pages or particular sites. Instead, the CF spider uses commercial Web-search engines to find pages likely to contain lists in the domain of interest, and then applies previously proposed heuristics to extract lists from these pages. We show that data collected by this spider are nearly as effective for CF as data collected from real users, and more effective than data collected by two plausible hand-programmed spiders. In some cases, autonomously spidered data can also be combined with actual user data to improve performance.
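
The 'pseudo-user' encoding described in this abstract can be illustrated with a small sketch. It is a hypothetical, overlap-counting recommender and not the CF algorithms evaluated in the paper: each spidered list is treated as if it were the set of items liked by one user, and unseen items are scored by how strongly the lists containing them overlap with the real user's preferences. The function name and sample lists are invented for illustration.

```python
from collections import Counter

def recommend(liked, pseudo_users, k=3):
    """Rank items the user has not seen by overlap-weighted votes from pseudo-users."""
    liked = set(liked)
    scores = Counter()
    for items in pseudo_users:
        items = set(items)
        overlap = len(liked & items)  # similarity between user and pseudo-user
        if overlap == 0:
            continue
        for item in items - liked:
            scores[item] += overlap
    return [item for item, _ in scores.most_common(k)]
```

Lists that share nothing with the user's stated preferences contribute no votes, so unrelated lists collected by a spider do not affect the ranking.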
