Similar Documents
20 similar documents found.
1.
Over the years, online social networks have evolved from profile and communication websites into online portals where people interact with each other, share and consume multimedia-enriched data, and play different types of games. Owing to the immense popularity of these online games and their huge revenue potential, their number grows every day, resulting in a current offering of thousands of online social games. In this paper, the applicability of neighborhood-based collaborative filtering (CF) algorithms to the recommendation of online social games is evaluated. The evaluation is based on a large dataset from an online social gaming platform containing game ratings (explicit data) and online gaming behavior (implicit data) of millions of active users. Several similarity metrics were implemented and evaluated on the explicit data, the implicit data, and a combination thereof. It is shown that the neighborhood-based CF algorithms greatly outperform the content-based algorithm currently in common use on online social gaming websites. The results also show that a combined approach, i.e., taking both implicit and explicit data into account at the same time, yields good results on all evaluation metrics for all scenarios, while only slightly underperforming the explicit-only or implicit-only approaches in their respective areas of strength. The best-performing algorithms have been implemented in a live setup of the online game platform.
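A minimal sketch (in Python) of how an item-based neighborhood CF recommender might blend explicit ratings with implicit play counts; the blending weight, the log transform, and all variable names are illustrative assumptions, not the paper's actual formulation.

```python
# Minimal sketch of item-based neighborhood CF blending explicit ratings with
# implicit play counts. Blend weight, log transform, and names are assumptions.
import numpy as np

def blended_matrix(ratings, play_counts, blend_weight=0.5):
    """Combine normalized explicit ratings and normalized implicit play counts."""
    r = ratings / ratings.max() if ratings.max() > 0 else ratings
    p = np.log1p(play_counts)
    p = p / p.max() if p.max() > 0 else p
    return blend_weight * r + (1.0 - blend_weight) * p

def item_cosine_similarity(prefs):
    """Item-item cosine similarity over the columns of the preference matrix."""
    norms = np.linalg.norm(prefs, axis=0)
    norms[norms == 0] = 1.0
    normalized = prefs / norms
    return normalized.T @ normalized

def recommend(prefs, sim, user, k_neighbors=10, top_n=5):
    """Score unseen items for a user from their k most similar items."""
    scores = np.zeros(prefs.shape[1])
    for item in range(prefs.shape[1]):
        if prefs[user, item] > 0:          # already played/rated: skip
            continue
        neighbors = np.argsort(sim[item])[::-1][:k_neighbors]
        weights = sim[item, neighbors]
        scores[item] = (weights @ prefs[user, neighbors]) / (weights.sum() or 1.0)
    return np.argsort(scores)[::-1][:top_n]

# Example: 3 users x 4 games, explicit ratings (0 = unrated) and play counts.
ratings = np.array([[5, 0, 3, 0], [4, 2, 0, 0], [0, 5, 4, 1]], dtype=float)
plays   = np.array([[20, 0, 5, 0], [15, 3, 0, 0], [0, 30, 10, 2]], dtype=float)
prefs = blended_matrix(ratings, plays)
print(recommend(prefs, item_cosine_similarity(prefs), user=0))
```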

2.
As an important branch of knowledge discovery, the task of data classification is to determine which pre-defined class each object belongs to. Because evolutionary computation requires no prior assumptions, it shows great vitality in dealing with the imprecise, incomplete, and uncertain information for which traditional statistical classification methods are ill-suited. This paper presents a classification algorithm based on the cloud model and a genetic algorithm. Experiments show that the algorithm classifies data sets with continuous attributes efficiently.

3.
In many sensor network applications, it is essential to obtain the distribution of an attribute value over the network. Such a data distribution can be obtained through clustering, which partitions the network into contiguous regions, each containing sensor nodes with a range of similar readings. This paper proposes a method named Distributed, Hierarchical Clustering (DHC) for online data analysis and mining in sensor networks. Unlike the acquisition and aggregation of raw sensory data, DHC clusters sensor nodes based on their current data values as well as their geographical proximity, and computes a summary for each cluster. Furthermore, these clusters, together with their summaries, are produced in a distributed, bottom-up manner. The resulting hierarchy of clusters and their summaries facilitates interactive data exploration at multiple resolutions. It can also be used to improve the efficiency of data-centric routing and query processing in sensor networks. We also design and evaluate maintenance mechanisms that allow DHC to work on evolving data. Our simulation results on real-world datasets as well as synthetic datasets show the effectiveness and efficiency of our approach.
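An illustrative sketch of the bottom-up idea described above: geographically adjacent sensor nodes are merged while the cluster's reading range stays within a tolerance, and each cluster keeps a small summary. The distance radius, the value tolerance, and the greedy merge order are assumptions of this sketch, not the paper's DHC algorithm.

```python
# Illustrative bottom-up merge of nearby sensors with similar readings;
# parameters and the greedy order are assumptions, not the paper's DHC.
from itertools import combinations

def cluster_sensors(nodes, radius=1.5, value_tol=2.0):
    """nodes: {node_id: (x, y, reading)}. Returns a list of cluster summaries."""
    clusters = {i: {"members": {i}, "lo": v[2], "hi": v[2]} for i, v in nodes.items()}

    def adjacent(a, b):
        return any((nodes[p][0] - nodes[q][0]) ** 2 + (nodes[p][1] - nodes[q][1]) ** 2
                   <= radius ** 2
                   for p in clusters[a]["members"] for q in clusters[b]["members"])

    merged = True
    while merged:
        merged = False
        for a, b in combinations(list(clusters), 2):
            if a in clusters and b in clusters and adjacent(a, b):
                lo = min(clusters[a]["lo"], clusters[b]["lo"])
                hi = max(clusters[a]["hi"], clusters[b]["hi"])
                if hi - lo <= value_tol:           # readings stay similar enough
                    clusters[a]["members"] |= clusters[b]["members"]
                    clusters[a]["lo"], clusters[a]["hi"] = lo, hi
                    del clusters[b]
                    merged = True
                    break
    return [{"size": len(c["members"]), "range": (c["lo"], c["hi"]),
             "mean": sum(nodes[m][2] for m in c["members"]) / len(c["members"])}
            for c in clusters.values()]

# Example: five sensors on a line; the last has a very different reading.
readings = {0: (0, 0, 20.1), 1: (1, 0, 20.4), 2: (2, 0, 20.2),
            3: (3, 0, 20.5), 4: (4, 0, 35.0)}
print(cluster_sensors(readings))
```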

4.
Mining frequent itemsets has emerged as a fundamental problem in data mining and plays an essential role in many important data mining tasks. In this paper, we propose a novel vertical data representation called the N-list, which originates from an FP-tree-like coding prefix tree called the PPC-tree that stores crucial information about frequent itemsets. Based on the N-list data structure, we develop an efficient mining algorithm, PrePost, for mining all frequent itemsets. The efficiency of PrePost comes from three sources. First, N-lists are compact, since transactions with common prefixes share the same nodes of the PPC-tree. Second, counting the supports of itemsets is transformed into intersecting N-lists, and the complexity of intersecting two N-lists can be reduced to O(m + n) by an efficient strategy, where m and n are the cardinalities of the two N-lists. Third, PrePost can in some cases find frequent itemsets directly, without generating candidate itemsets, by exploiting the single-path property of N-lists. We have experimentally evaluated PrePost against four state-of-the-art algorithms for mining frequent itemsets on a variety of real and synthetic datasets. The experimental results show that PrePost is the fastest in most cases. Although it consumes more memory when the datasets are sparse, it is still the fastest.
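The O(m + n) intersection mentioned above can be pictured as a linear merge of two pre-order-sorted lists of (pre-order, post-order, count) triples, where a node of one list contributes when the other list holds one of its ancestors. The sketch below follows that general idea with made-up pre/post numbers; it is not the authors' code.

```python
def intersect_nlists(nl1, nl2):
    """Linear-time merge of two N-lists, each a list of (pre, post, count)
    tuples sorted by pre-order. A node of nl2 contributes when some node of
    nl1 is its ancestor (smaller pre-order, larger post-order)."""
    result, i, j = [], 0, 0
    while i < len(nl1) and j < len(nl2):
        pre1, post1, _ = nl1[i]
        pre2, post2, cnt2 = nl2[j]
        if pre1 < pre2:
            if post1 > post2:          # nl1[i] is an ancestor of nl2[j]
                result.append((pre1, post1, cnt2))
                j += 1
            else:
                i += 1                 # nl1[i]'s subtree lies before nl2[j]
        else:
            j += 1                     # no later nl1 node can be an ancestor
    # combine entries that share the same ancestor node by summing counts
    merged = {}
    for pre, post, cnt in result:
        merged[(pre, post)] = merged.get((pre, post), 0) + cnt
    return sorted((pre, post, cnt) for (pre, post), cnt in merged.items())

# Toy example: pre/post/count triples as might be read off a small PPC-tree.
nl_a = [(1, 9, 4), (12, 15, 2)]
nl_b = [(2, 3, 2), (5, 8, 1), (13, 14, 2)]
nl_ab = intersect_nlists(nl_a, nl_b)
print(nl_ab, "support:", sum(c for _, _, c in nl_ab))
```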

5.
In this paper, we study the problem of efficiently computing k-medians over high-dimensional, high-speed data streams. The focus is on minimizing CPU time while handling high-speed data streams, on top of the requirements of high accuracy and small memory. Our work is motivated by the following observation: existing algorithms show similar approximation behavior in practice, even though they make noticeably different worst-case theoretical guarantees. The underlying reason is that, in order to achieve a high approximation level with the smallest possible memory, they need rather complex techniques to maintain a sketch along the time dimension using existing off-line clustering algorithms. Those clustering algorithms cannot guarantee the optimal clustering result over the data segments in a data stream and accumulate errors across segments, which makes most algorithms behave the same in terms of approximation level in practice. We propose a new grid-based approach which divides the entire data set into cells (not along the time dimension). We can achieve a high approximation level based on a novel concept called (1-ε)-dominant. We further extend the method to the data stream context by leveraging a density-based heuristic and frequent-item mining techniques over data streams. We only need to apply an existing clustering algorithm once to compute k-medians, on demand, which reduces CPU time significantly. We conducted extensive experimental studies and show that our approaches outperform other well-known approaches.
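A rough sketch of the pipeline shape only: points are hashed to grid cells, a frequent-items summary keeps the heaviest cells over the stream, and a single weighted clustering pass is run on demand. The (1-ε)-dominant criterion and the paper's exact heuristics are not reproduced; every parameter below is an assumption for illustration.

```python
# Sketch of grid cells + a frequent-items summary + one on-demand clustering
# call; this is not the paper's algorithm, just the overall pipeline shape.
import random
from collections import Counter

def cell_of(point, width=1.0):
    """Map a d-dimensional point to its grid cell id."""
    return tuple(int(x // width) for x in point)

class FrequentCells:
    """Misra-Gries style summary of the most frequent grid cells in a stream."""
    def __init__(self, capacity=100):
        self.capacity, self.counts = capacity, Counter()

    def add(self, cell):
        if cell in self.counts or len(self.counts) < self.capacity:
            self.counts[cell] += 1
        else:                                   # decrement all, drop zeros
            self.counts -= Counter(dict.fromkeys(self.counts, 1))

    def k_medians(self, k, width=1.0):
        """One weighted clustering pass over cell centers, run only on demand."""
        centers = [tuple((c + 0.5) * width for c in cell) for cell in self.counts]
        weights = list(self.counts.values())
        medians = random.sample(centers, min(k, len(centers)))
        for _ in range(10):                     # weighted medoid-style refinement
            groups = {m: [] for m in medians}
            for p, w in zip(centers, weights):
                nearest = min(medians, key=lambda m: sum((a - b) ** 2 for a, b in zip(p, m)))
                groups[nearest].append((p, w))
            medians = [min((q for q, _ in pts),
                           key=lambda q: sum(w * sum(abs(a - b) for a, b in zip(q, p))
                                             for p, w in pts))
                       for m, pts in groups.items() if pts]
        return medians

# Example stream: two dense regions in 2-D.
summary = FrequentCells(capacity=50)
for _ in range(5000):
    center = (0.0, 0.0) if random.random() < 0.5 else (8.0, 8.0)
    summary.add(cell_of((center[0] + random.gauss(0, 1), center[1] + random.gauss(0, 1))))
print(summary.k_medians(k=2))
```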

6.
Spatio-temporal clustering has been a hot topic in the field of spatio-temporal data mining and knowledge discovery. It can be employed to uncover and interpret developmental trends of geographic phenomena in the real world. However, existing spatio-temporal clustering methods seldom consider both spatio-temporal autocorrelation and heterogeneity among spatio-temporal entities, and the coupling of space and time has not been well highlighted. In this paper, a unified framework for the clustering analysis of spatio-temporal data is proposed, and a novel spatio-temporal clustering algorithm is developed by means of a spatio-temporal statistics methodology and intelligent computation technology. Our method is applied successfully to finding spatio-temporal clusters in China's annual temperature database for the period 1951-1992.

7.
Knowledge Representation in KDD Based on Linguistic Atoms

8.
A rapidly increasing number of Web databases have become accessible via their HTML form-based query interfaces. Query result pages are dynamically generated in response to user queries; they encode structured data and are displayed for human use. Query result pages usually contain other types of information in addition to the query results, e.g., advertisements and navigation bars. The problem of extracting structured data from query result pages is critical for web data integration applications, such as comparison shopping and meta-search engines, and has been studied intensively. A number of approaches have been proposed. As the structures of Web pages become more and more complex, the existing approaches start to fail, and most of them do not remove irrelevant content that may affect the accuracy of data record extraction. We propose an automated approach for Web data extraction. First, it makes use of visual features and query terms to identify data sections and extracts the data records in these sections. We also represent several content and visual features of visual blocks in a data section and use them to filter out noisy blocks. Second, it measures the similarity between data items in different data records based on their visual and content features and aligns them into groups so that data in the same group have the same semantics. The results of our experiments with a large set of Web query result pages in different domains show that our proposed approaches are highly effective.

9.
There are large and growing textual corpora in which people express contrastive opinions about the same topic. This has led to an increasing number of studies on contrastive opinion mining. However, there are several notable issues with the existing studies. They mostly focus on mining contrastive opinions from multiple data collections, which must be separated into their respective collections beforehand. In addition, existing models are opaque about the relationship between the topics that are extracted and the sentences in the corpus which express those topics; this opacity does not help us understand the opinions expressed in the corpus. Finally, contrastive opinion is mostly analysed qualitatively rather than quantitatively. This paper addresses these matters and proposes a novel unified latent variable model (contraLDA), which mines contrastive opinions from both single and multiple data collections, extracts the sentences that express the contrastive opinion, and measures the strength of opinion contrastiveness towards the extracted topics. Experimental results show the effectiveness of our model in mining contrastive opinions; it outperformed our baselines in extracting coherent and informative sentiment-bearing topics. We further show the accuracy of our model in classifying the topics and sentiments of textual data, comparing our results against five strong baselines.

10.
How does a social network evolve? Sociologists have studied this question for many years. According to some famous sociologists, social links are driven by social intersections: actors who affiliate with shared intersections tend to become interpersonally linked and form a cluster. In a social network, an actor cluster can be a clique or a group of several smaller cliques. Thus we can conclude that a social network is composed of superposed cliques of different sizes. However, sociologists have not verified this theory on large-scale data due to a lack of computing ability. Motivated by this challenge, and building on the theory, we use data mining technologies to study the evolution patterns of large-scale real-world social networks. We then propose a novel clique-superposition generative model, which generates undirected weighted networks. Through extensive experiments, we demonstrate that our model can generate networks with static and time-evolving patterns observed not only in earlier literature but also in our work.
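A toy generative sketch of the "superposed cliques" idea: repeatedly sample a clique of actors and increment the weight of every edge inside it, so overlapping cliques accumulate edge weight. The size distribution and all parameters are illustrative assumptions, not the paper's model.

```python
# Toy clique-superposition generator; sizes, parameters, and the Pareto choice
# are illustrative assumptions, not the paper's actual model.
import random
from collections import Counter

def clique_superposition(n_actors=100, n_cliques=200, max_size=8, seed=42):
    """Return an undirected weighted network as {(u, v): weight} with u < v."""
    rng = random.Random(seed)
    weights = Counter()
    for _ in range(n_cliques):
        # Skewed clique sizes: small cliques are much more common than large ones.
        size = min(max_size, 2 + int(rng.paretovariate(2.0)))
        members = rng.sample(range(n_actors), size)
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                u, v = sorted((members[i], members[j]))
                weights[(u, v)] += 1          # superposition: overlapping cliques add up
    return weights

net = clique_superposition()
print("edges:", len(net), "heaviest ties:", net.most_common(3))
```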

11.
Text Knowledge Discovery: Text Mining Based on Information Extraction
1. Introduction. As is well known, the current situation of being "rich in data but poor in knowledge" has driven the rise of data mining research. Data mining, also known as Knowledge Discovery in Databases (KDD), is an important method and approach for extracting or mining implicit information and knowledge from massive amounts of structured information, and data mining technology is already quite mature. Beyond structured data, however, digital information contains an even larger volume of free, unstructured, or semi-structured text, such as news articles, electronic books, digital library collections, Web page content, Email, and document databases; manual processing of such text clearly requires enormous human and material resources and is fraught with uncertainty. Hence the emergence of discovering knowledge from text…

12.
Knowledge Discovery in Databases (KDD) focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured textual form. This paper describes the KDT system for Knowledge Discovery in Text, in which documents are labeled by keywords, and knowledge discovery is performed by analyzing the co-occurrence frequencies of the various keywords labeling the documents. We show how this keyword-frequency approach supports a range of KDD operations, providing a suitable foundation for knowledge discovery and exploration for collections of unstructured text.
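A minimal sketch of the keyword co-occurrence counting that such a keyword-frequency approach builds on: count how often keyword pairs label the same document and flag pairs that co-occur far more often than independence predicts. The lift measure and thresholds are illustrative choices, not KDT's actual rules.

```python
# Count keyword-pair co-occurrence across keyword-labeled documents and flag
# pairs with high lift; thresholds and the lift measure are illustrative.
from collections import Counter
from itertools import combinations

def cooccurrence_patterns(doc_keywords, min_lift=2.0, min_count=2):
    """doc_keywords: list of keyword sets, one per document."""
    n_docs = len(doc_keywords)
    kw_freq, pair_freq = Counter(), Counter()
    for kws in doc_keywords:
        kw_freq.update(kws)
        pair_freq.update(frozenset(p) for p in combinations(sorted(kws), 2))
    patterns = []
    for pair, count in pair_freq.items():
        a, b = tuple(pair)
        lift = (count / n_docs) / ((kw_freq[a] / n_docs) * (kw_freq[b] / n_docs))
        if count >= min_count and lift >= min_lift:
            patterns.append((a, b, count, round(lift, 2)))
    return sorted(patterns, key=lambda x: -x[3])

docs = [{"gold", "inflation", "interest"}, {"gold", "inflation"},
        {"wheat", "export"}, {"gold", "interest"}, {"wheat", "export", "drought"}]
print(cooccurrence_patterns(docs))
```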

13.
Web-Based Data Mining
This paper discusses the concept of data mining for knowledge discovery and the techniques it uses, and analyzes the characteristics and difficulties of data mining on the Internet.

14.
Structured data mining and complex-type data mining are related yet distinct. How to unify the two under a single theoretical framework that can guide research on data mining and knowledge discovery has become a pressing problem. This paper proposes UMKDSS, a unified state-space model for knowledge discovery that links structured data mining with complex-type data mining and provides theoretical guidance for mining complex data types. Finally, an example application of UMKDSS to Web text mining is given.

15.
This paper surveys recent theoretical and applied advances in data mining and knowledge discovery methods, summarizes in detail several key techniques in research and application, and concludes with an outlook on future theoretical developments and application trends in data mining and knowledge discovery.

16.
Research on Pattern Discovery Algorithms for Time-Series Data
Knowledge discovery in databases is an important topic in the field of artificial intelligence. Addressing the problem of complex patterns in time-series data, this paper proposes a new logical representation for time-series patterns and designs a new algorithm for modeling time-series sequences.

17.
Research on Contradictory Knowledge in Dynamic Data Mining
One of the frontier problems in knowledge discovery, thorny yet urgently in need of a solution, is the problem of contradictory knowledge. Building on years of research into the internal mechanisms of knowledge discovery, this paper further explores the conceptual model of contradictory knowledge and the laws governing its abrupt changes during knowledge discovery in large dynamic systems. This has theoretical and practical significance for advancing mainstream development and for tackling several difficult problems and challenges facing KDD.

18.
This research adopts a framework that synthesizes Knowledge Discovery in Databases (KDD), the Cross Industry Standard Process for Data Mining (CRISP-DM), and agile practices. The application of this framework is demonstrated through an institutional case study of three knowledge discovery projects: the Persistence, Retention, and Donor projects. Results from the case study suggest that (a) interaction and iteration are foundations for the success of a knowledge discovery project, especially one with a strong business focus; (b) agile practices facilitate the interactive and iterative nature of a knowledge discovery project; (c) adding the business understanding and deployment steps from CRISP-DM to KDD explicitly helps data miners stay focused on the ultimate goals of the project: the needs of the business and the users.

19.
This paper explores the application of rough set theory to maintenance prediction problems. It analyzes the method and feasibility of using rough set theory for knowledge discovery in databases to provide data storage for intelligent equipment diagnosis. The paper focuses on the main points of rough set theory and uses the theory to analyze an example from practical work.

20.
Building on an analysis and study of existing research results, this paper proposes UMKDSS, a unified state-space model for knowledge discovery, which links structured data mining with complex-type data mining, serves as a unified framework theory for the field of knowledge discovery, and provides theoretical guidance for mining complex data types.
