Similar Literature
 20 similar documents found (search time: 15 ms)
1.
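Attribute-Oriented Induction and Conceptual Clustering   Total citations: 3, self-citations: 1, citations by others: 3
伍小荣, 谢立宏. 《计算机工程》 (Computer Engineering), 2003, 29(5): 92-93, 123
Attribute-oriented induction is a recently proposed method that is widely used for knowledge discovery in databases. This article points out the close connection between this method and a machine learning method, conceptual clustering, and describes how a conceptual clustering algorithm can be used to perform attribute-oriented induction.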
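
As a rough illustration of what attribute-oriented induction does, the sketch below generalizes attribute values by climbing a concept hierarchy and then merges identical tuples while counting them. It is a minimal sketch under stated assumptions, not the algorithm from the article above: the function name `generalize`, the sample rows, the hierarchy, and the distinct-value threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical concept hierarchies: each maps a value to its parent concept.
HIERARCHY = {
    "city": {"Beijing": "China", "Shanghai": "China",
             "Paris": "France", "Lyon": "France"},
    "major": {"physics": "science", "chemistry": "science",
              "history": "arts"},
}

# Hypothetical task-relevant tuples.
ROWS = [
    {"city": "Beijing", "major": "physics"},
    {"city": "Shanghai", "major": "chemistry"},
    {"city": "Paris", "major": "physics"},
    {"city": "Lyon", "major": "history"},
]


def generalize(rows, hierarchy, threshold):
    """Climb each attribute's hierarchy until it has at most `threshold`
    distinct values, then merge identical tuples and count them."""
    rows = [dict(r) for r in rows]  # work on a copy
    for attr, parents in hierarchy.items():
        while len({r[attr] for r in rows}) > threshold:
            changed = False
            for r in rows:
                if r[attr] in parents:  # replace value by its parent concept
                    r[attr] = parents[r[attr]]
                    changed = True
            if not changed:  # top of the hierarchy reached
                break
    attrs = sorted(rows[0]) if rows else []
    return Counter(tuple(r[a] for a in attrs) for r in rows)


summary = generalize(ROWS, HIERARCHY, threshold=2)
for values, count in sorted(summary.items()):
    print(values, count)
```

Each key of `summary` is a generalized tuple (attributes in alphabetical order) and each value is the number of original tuples it covers.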

2.
Expert systems are built from knowledge traditionally elicited from the human expert. It is precisely knowledge elicitation from the expert that is the bottleneck in expert system construction. On the other hand, a data mining system, which automatically extracts knowledge, needs expert guidance on the successive decisions to be made in each of the system phases. In this context, expert knowledge and data mining discovered knowledge can cooperate, maximizing their individual capabilities: data mining discovered knowledge can be used as a complementary source of knowledge for the expert system, whereas expert knowledge can be used to guide the data mining process. This article summarizes different examples of systems where there is cooperation between expert knowledge and data mining discovered knowledge and reports our experience of such cooperation gathered from a medical diagnosis project called Intelligent Interpretation of Isokinetics Data, which we developed. From that experience, a series of lessons were learned throughout project development. Some of these lessons are generally applicable and others pertain exclusively to certain project types.

3.
Twitter data has recently been used to perform a large variety of advanced analyses. Analysis of Twitter data poses new challenges because the data distribution is intrinsically sparse, due to the large number of messages posted every day using a wide vocabulary. To address this issue, generalized itemsets (sets of items at different abstraction levels) can be effectively mined and used to discover interesting multiple-level correlations among data supplied with taxonomies. Each generalized itemset is characterized by a correlation type (positive, negative, or null) according to the strength of the correlation among its items. This paper presents a novel data mining approach that supports different and interesting targeted analyses (topic trend analysis, context-aware service profiling) by analyzing Twitter posts. We aim at discovering contrasting situations by means of generalized itemsets. Specifically, we focus on comparing itemsets discovered at different abstraction levels and we select large subsets of specific (descendant) itemsets that show correlation type changes with respect to their common ancestor. To this end, a novel kind of pattern, namely the Strong Flipping Generalized Itemset (SFGI), is extracted from Twitter messages and contextual information supplied with taxonomy hierarchies. Each SFGI consists of a frequent generalized itemset X and the set of its descendants showing a correlation type change with respect to X. Experiments performed on both real and synthetic datasets demonstrate the effectiveness of the proposed approach in discovering interesting and hidden knowledge from Twitter data.

4.
The medical diagnosis system described here uses underlying knowledge in the isokinetic domain, obtained by combining the expertise of a physician specialised in isokinetic techniques and data mining techniques applied to a set of existing data. An isokinetic machine is basically a physical support on which patients exercise one of their joints, in this case the knee, according to different ranges of movement and at a constant speed. The data on muscle strength supplied by the machine are processed by an expert system that has built-in knowledge elicited from an expert in isokinetics. It cleans and pre-processes the data and conducts an intelligent analysis of the parameters and morphology of the isokinetic curves. Data mining methods based on the discovery of sequential patterns in time series and the fast Fourier transform, which identifies similarities and differences among exercises, were applied to the processed information to characterise injuries and discover reference patterns specific to populations. The results obtained were applied in two environments: one for the blind and another for elite athletes.

5.
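An AOI-Based Method for Customer Behavior Analysis   Total citations: 1, self-citations: 0, citations by others: 1
Combining data cube technology with concept-hierarchy analysis, the attribute-oriented induction (AOI) method is integrated with the K-means clustering algorithm and applied to the cluster analysis of customer time-series data, so that the customers in each cluster share similar time-series characteristics. Experiments show that the method (AOIGen) can meet the requirements of customer behavior analysis on large volumes of data and is more intuitive and efficient than other methods.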

6.
Mining association rules and mining sequential patterns both aim to discover customer purchasing behaviors from a transaction database, so that the quality of business decisions can be improved. However, the transaction database can be very large. Finding all the association rules and sequential patterns in a large database is very time consuming, and users may be interested in only some of the information.

Moreover, the criteria that discovered association rules and sequential patterns must satisfy may differ from one user requirement to another. Much information that is of no interest to the user can be generated when traditional mining methods are applied. Hence, a data mining language is needed so that users can query only the knowledge that interests them from a large database of customer transactions. In this paper, such a data mining language is presented. With this language, users can specify the items of interest and the criteria for the association rules or sequential patterns to be discovered. Efficient data mining techniques are also proposed to extract the association rules and sequential patterns according to the user requirements.


7.
The state-owned monopoly of the telecommunications industry in Taiwan was lifted in 1997. In choosing an ISP, pricing was and still is a main differentiating factor in the minds of customers; however, service quality has lately emerged as a major concern among users. ISP management has discovered that service quality is important not only for attracting new customers but, more importantly, for retaining existing customers who may otherwise be lured away by lower fees. Hence, it is essential to develop a CRM system that helps keep existing customers and explore further business opportunities at the same time. In this study, based on IP traffic data, we developed a systematic CRM approach for a major ISP in Taiwan to enhance customers' longer-term loyalty. This approach employs the CRISP-DM methodology and applies Attribute-Oriented Induction as the mining technique to discover the network usage behaviors of customers, which helps management identify usage patterns and pinpoint the times when usage is excessively heavy. The former allows management to make effective personal calls for services or maintenance, and the latter presents opportunities for management to offer personalized care and advanced products. Pixel-oriented visualization is applied to improve the understanding of mining results.

8.
Updating generalized association rules with evolving taxonomies   Total citations: 2, self-citations: 1, citations by others: 1
Mining generalized association rules among items in the presence of taxonomies has been recognized as an important model for data mining. Earlier work on mining generalized association rules, however, required the taxonomies to be static, ignoring the fact that the taxonomies of items cannot necessarily be kept unchanged. For instance, some items may be reclassified from one hierarchy tree to another for more suitable classification, removed from the taxonomies if they will no longer be produced, or added to the taxonomies as new items. Additionally, analysts might have to dynamically adjust the taxonomies from different viewpoints so as to discover more informative rules. Under these circumstances, effectively updating the discovered generalized association rules is a crucial task. In this paper, we examine this problem and propose two novel algorithms, called Diff_ET and Diff_ET2, to update the discovered frequent itemsets. Empirical evaluation shows that the proposed algorithms are very effective and have good linear scale-up characteristics.

9.
Recently the coupling of proton transfer reaction ionization with a time-of-flight mass analyser (PTR-TOF-MS) has been proposed to realise a volatile organic compound (VOC) detector that overcomes the limitations in terms of time and mass resolution of the previous instrument based on a quadrupole mass analyser (PTR-Quad-MS). This opens new horizons for research and allows for new applications in fields where the rapid and sensitive monitoring and quantification of VOCs is crucial, for instance environmental sciences, food sciences and medicine. In particular, if coupled with appropriate data mining methods, it can provide a fast MS-nose system with rich analytical information. The main, perhaps even the only, drawback of this new technique in comparison to its precursor is related to the increased size and complexity of the data sets obtained. It appears that this is the main limitation to its full use and widespread application. Here we present and discuss a complete computer-based strategy for the analysis of PTR-TOF-MS data, from basic mass spectra handling to the application of up-to-date data mining methods. As a case study we apply the whole procedure to the classification of apple cultivars and clones, which was based on the distinctive profiles of volatile organic compound emissions.

10.
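Wide-area networks hold massive amounts of geographically distributed data of many kinds. Analyzing and processing these data requires high-performance distributed parallel processing systems, and the grid can meet this requirement. A knowledge grid uses basic grid services (communication, information, authorization, and resource management services) to build specific distributed parallel knowledge discovery tools and services. Based on the characteristics of the knowledge grid, the paper discusses its architecture and the set of services that support knowledge mining applications. Using the meta-learning model of distributed data mining, it presents the process of implementing distributed data mining with the knowledge mining services that the knowledge grid provides.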

11.
In contrast to routine knowledge and skills, flexible problem solving is associated with the ability to apply one’s knowledge structures in relatively new situations. In the absence of specific knowledge-based guidance, such processes could be very cognitively demanding. This paper suggests that learning flexible problem solving skills could be enhanced by explicitly instructing learners in generalized forms of schematic knowledge structures that are applicable to a greater variety of problems. The paper presents results of an experimental study that has investigated this approach in learning the operation of a technical device, and discusses implications of these results for the design of computer-based instruction.

12.
Profiling is not about data but about knowledge. It provides a crucial technology in a society that is flooded with noise and information. Profiling is another term for sophisticated pattern recognition, and the enabling technology for Ambient Intelligence. It confronts us with a new type of inductive knowledge, inferred by means of automated algorithms. To the extent that decisions that impact our lives are based on such knowledge, we need to develop the means to make this knowledge accessible for individual citizens and provide them with the legal and technological tools to anticipate and contest such knowledge or challenge its application.

13.
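With rapid economic development, our society has entered the age of networked information and produces enormous volumes of data. Behind these diverse data lies a great deal of hidden information. How to find regularities in these data and discover information that is useful in our lives is a question that attracts more and more attention. Data mining is the process of extracting, from large amounts of incomplete, noisy, fuzzy, and random real-world application data, the information and knowledge that is implicit, previously unknown, yet potentially useful. The purpose of data mining technology is to cope with today's information explosion by providing scientific and effective means of processing large amounts of information.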

14.
This paper argues for a return to fundamentals and for a balanced assessment of the contribution that Information Technology can make as we enter the new millennium. It argues that the field of Information Systems should no longer be distracted from its natural locus of concern and competence, or claim more than it can actually achieve. More specifically, and as a case in point, we eschew IT-enabled Knowledge Management, both in theory and in practice. We view Knowledge Management as the most recent in a long line of fads and fashions embraced by the Information Systems community that have little to offer. Rather, we argue for a refocusing of our attention back on the management of data, since IT processes data, not information and certainly not knowledge. In so doing, we develop a model that provides a tentative means of distinguishing between the terms. This model also forms the basis for on-going empirical research designed to test the efficacy of our argument in a number of case companies currently implementing ERP and Knowledge Management Systems.

15.
Gamma distributions are some of the most popular models for hydrological processes. In this paper, a very flexible family which contains the gamma distribution as a particular case is introduced. Evidence of flexibility is shown by examining the shape of its probability density function (pdf). A treatment of the mathematical properties is provided by deriving expressions for the nth moment. Estimation and simulation issues are also considered. Finally, a detailed application to drought data from the State of Nebraska is illustrated.
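
For orientation only, the nth moment in the plain gamma special case can be written out as below. The abstract does not give the generalized family or its moment expression, so this covers only the standard gamma distribution; the shape and scale symbols $\alpha$ and $\beta$ are notation assumed here, not taken from the paper.

```latex
% Gamma pdf with shape \alpha > 0 and scale \beta > 0 (assumed notation)
f(x) = \frac{x^{\alpha-1} e^{-x/\beta}}{\Gamma(\alpha)\,\beta^{\alpha}}, \qquad x > 0

% nth moment, obtained from \int_0^\infty x^{\alpha+n-1} e^{-x/\beta}\,dx = \Gamma(\alpha+n)\,\beta^{\alpha+n}
E[X^{n}] = \int_0^\infty x^{n} f(x)\,dx
         = \frac{\Gamma(\alpha+n)\,\beta^{\alpha+n}}{\Gamma(\alpha)\,\beta^{\alpha}}
         = \beta^{n}\,\frac{\Gamma(\alpha+n)}{\Gamma(\alpha)}
```

Setting $n = 1$ recovers the familiar mean $\alpha\beta$.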

16.
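Data mining is the nontrivial process of revealing implicit, previously unknown, and potentially useful information from large amounts of data in databases. Visual data mining techniques are used to find patterns in a data set of football matches. These patterns can directly or indirectly provide useful insights into football matches and can be applied during matches through a decision support system.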

17.
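Knowledge Reduction in Generalized Incomplete Systems   Total citations: 1, self-citations: 0, citations by others: 1
The object of study is the generalized incomplete system, which contains both lost-type and omitted-type unknown attribute values. Based on the characteristic relation, methods of knowledge reduction in generalized incomplete information systems are discussed. For generalized incomplete target information systems, the concepts of lower and upper approximation distribution reducts are introduced, and the corresponding judgment theorems and discernibility formulas are given. Finally, an example illustrates the effectiveness of the method.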

18.
Semantic Web technologies are increasingly being used to exploit relations between data. In addition, new trends in real-time systems, such as social networks, sensors, cameras or weather information, are continuously generating data. This implies that data and the links between them are becoming extremely vast. Such a huge quantity of data needs to be analyzed, processed, and stored if necessary. In this position paper, we introduce recent work on Real-Time Business Intelligence combined with semantic data stream management. We present underlying approaches such as continuous queries, data summarization and matching, and stream reasoning.

19.
Non-Gaussian spatial data are common in many sciences such as environmental sciences, biology and epidemiology. Spatial generalized linear mixed models (SGLMMs) are flexible models for modeling these types of data. Maximum likelihood estimation in SGLMMs is usually made cumbersome due to the high-dimensional intractable integrals involved in the likelihood function and therefore the most commonly used approach for estimating SGLMMs is based on the Bayesian approach. This paper proposes a computationally efficient strategy to fit SGLMMs based on the data cloning (DC) method suggested by Lele et al. (2007). This method uses Markov chain Monte Carlo simulations from an artificially constructed distribution to calculate the maximum likelihood estimates and their standard errors. In this paper, the DC method is adapted and generalized to estimate SGLMMs and some of its asymptotic properties are explored. Performance of the method is illustrated by a set of simulated binary and Poisson count data and also data about car accidents in Mashhad, Iran. The focus is inference in SGLMMs for small and medium data sets.

20.
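A data mining model and a knowledge representation method for acquiring fishing-ground knowledge are proposed. First, an SVM (support vector machine) and a fuzzy classifier are used to extract from the database the static knowledge that is useful for fishery forecasting. The static knowledge is then converted into dynamic knowledge by an extension mining method. Finally, ontology construction techniques are used to express the static and dynamic knowledge of fishing grounds and to build an ontology knowledge base. On the basis of these methods, a prototype fishery forecasting system is built, taking bigeye tuna in the Indian Ocean as an example. The results of running the system show that the proposed data mining model and knowledge representation method for acquiring fishing-ground knowledge are effective and feasible.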
