Similar literature
Found 20 similar documents (search time: 41 ms)
1.
2.

The discovery of multi-level knowledge is important to allow queries at and across different levels of abstraction. While there are some similarities between our research and that of others in this area, the work reported in this paper does not directly involve databases and is differently motivated. Our research is concerned with taking data in the form of rule bases and finding multi-level knowledge. This paper describes our motivation; our preferred technique for acquiring the initial knowledge, known as Ripple-Down Rules; the use of Formal Concept Analysis to develop an abstraction hierarchy; and our application of these ideas to knowledge bases from the domain of chemical pathology. We also provide an example of how the approach can be applied to other propositional knowledge bases and suggest that it can be used as an additional phase in many existing data mining approaches.

3.
Data mining extracts implicit, previously unknown, and potentially useful information from databases. Many approaches have been proposed to extract information, and one of the most important is finding association rules. Although a large amount of research has been devoted to this subject, none of it finds association rules from directed acyclic graph (DAG) data. Without such a mining method, the hidden knowledge, if any, cannot be discovered from databases storing DAG data such as family genealogy profiles, product structures, XML documents, task precedence relations, and course structures. In this article, we define a new kind of association rule in DAG databases called the predecessor–successor rule, where a node x is a predecessor of another node y if we can find a path in the DAG where x appears before y. The predecessor–successor rules enable us to observe how the characteristics of the predecessors influence the successors. A four-stage approach is proposed to discover the predecessor–successor rules. © 2006 Wiley Periodicals, Inc. Int J Int Syst 21: 621–637, 2006.
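The predecessor relation defined in this abstract (x precedes y if some path in the DAG visits x before y) can be illustrated with a small reachability sketch. This is only an illustration of the definition, not the four-stage mining approach of the article; the function name `predecessors` and the course-prerequisite edges are hypothetical.

```python
from collections import defaultdict

def predecessors(edges):
    """Map each node to the set of nodes that appear before it on some path."""
    children = defaultdict(set)
    nodes = set()
    for parent, child in edges:
        children[parent].add(child)
        nodes.update((parent, child))

    preds = {node: set() for node in nodes}
    for start in nodes:
        # Depth-first walk: every node reachable from `start`
        # has `start` as one of its predecessors.
        stack = list(children[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            preds[node].add(start)
            stack.extend(children[node])
    return preds

# Hypothetical course-structure DAG (edges point from prerequisite to course).
edges = [
    ("Calculus I", "Calculus II"),
    ("Calculus II", "Statistics"),
    ("Programming", "Statistics"),
]
preds = predecessors(edges)
```

With the predecessor sets in hand, one could then count how often an attribute of a predecessor co-occurs with an attribute of its successor, which is the kind of regularity a predecessor–successor rule is meant to capture.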

4.
A value of a game, v, is a function that assigns to each coalition S the value v(S) of this coalition, meaning the expected pay-off for players in that coalition. The classical approach of von Neumann and Morgenstern [6] set some formal requirements on v to which contemporary theories of value adhere. A Shapley value of the game with a value v [14] is a functional Φ giving, for each player p, the value Φ_p(v) estimating the expected pay-off of player p in the game. Game theory and conflict theory have recently received much attention from the rough and fuzzy set communities [11,8,1,4,7,2]. In particular, problems of plausible strategies [1] in conflicts as well as problems related to Shapley's value [3,2] have been addressed. We confront here the problem of estimating a value, as well as the Shapley value, of a game from partial data about the game. To this end we apply the rough set ideas of approximations, defining the lower and the upper value of the game and, respectively, the lower and upper Shapley value. We also define the notion of an exact coalition, on which both values coincide, giving the true value of the game; we investigate the structure of the family of exact sets, showing its closure under complements, disjoint sums, and intersections of coalitions covering the set of players. This work opens a new area of rough set applications in mining constructs from data. The constructs mined in this case are values as well as Shapley values of games.
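For readers unfamiliar with the Shapley value Φ_p(v) mentioned above, the following is a minimal sketch of the standard exact formula, in which each player's marginal contribution v(S ∪ {p}) − v(S) is weighted by |S|!(n − |S| − 1)!/n!. It assumes v is fully known, so it does not implement the lower and upper (rough set) approximations proposed in the paper; the three-player game is a made-up example.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley value of each player for a characteristic function v."""
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size that does not contain p.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of p when joining coalition s.
                total += weight * (v(s | {p}) - v(s))
        values[p] = total
    return values

# Hypothetical game: a coalition earns 1 if it contains player "a"
# together with at least one other player, and 0 otherwise.
def v(coalition):
    return 1.0 if "a" in coalition and len(coalition) >= 2 else 0.0

phi = shapley_values(["a", "b", "c"], v)
```

In this toy game player "a" receives 2/3 and the other two players 1/6 each, and the values sum to v of the grand coalition, as the efficiency property requires.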

5.
Geo-spatial data mining in the analysis of a demographic database   Cited by: 2 (self-citations: 0, citations by others: 2)
Spatial data mining refers to the extraction of knowledge, spatial relationships, or other interesting patterns not explicitly stored in spatial databases. The approaches usually followed in the analysis of geo-spatial data with the aim of knowledge discovery are essentially characterised by the development of new algorithms, which treat the position and extension of objects mainly through the manipulation of their co-ordinates. In this paper a new approach to this process is presented, where geographic identifiers give the positional aspects of geographic data. These identifiers are manipulated using qualitative reasoning principles, which allow for the inference of new spatial relations required for the data mining step of the knowledge discovery process. The analysis of a demographic database with the proposed principles enabled the discovery of patterns that are hidden in the explored geo-spatial and demographic data. Acknowledgements: We thank NEPS (Núcleo de Estudos da População e Sociedade) of the University of Minho for making the demographic data available.

6.
Although knowledge discovery from large relational databases has gained popularity and its significance is well recognized, the prohibitive cost of extracting such knowledge, as well as the lack of suitable declarative query language support, act as limiting factors. Surprisingly, little or no relational technology has yet been significantly exploited in data mining even though data often reside in relational tables. Consequently, no relational optimization has yet been possible for data mining. We exploit the transitive nature of large item sets and the so-called anti-monotonicity property of support thresholds of large item sets to develop a natural least fixpoint operator for set-oriented data mining from relational databases. The proposed operator has several advantages, including optimization opportunities and large item set generation without traditional candidate sets. We present an SQL3 expression for association rule mining and discuss its mapping to the least fixpoint operator developed in this paper.
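The anti-monotonicity property referred to above says that an item set can be large (frequent) only if all of its subsets are large. The sketch below illustrates that property with the classical level-wise, candidate-based loop, iterated until no new large item sets appear. It is explicitly not the paper's least fixpoint operator or its SQL3 expression, which the abstract describes as avoiding traditional candidate sets; the function name and the transactions are hypothetical.

```python
from itertools import combinations

def large_itemsets(transactions, min_support):
    """Return {itemset: support count} for all item sets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items
             if support(frozenset([i])) >= min_support}
    result = {}
    k = 1
    while level:  # stop when a level adds nothing new (a fixpoint)
        for itemset in level:
            result[itemset] = support(itemset)
        # Join step: combine large k-item sets into (k+1)-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step (anti-monotonicity): keep a candidate only if every
        # k-item subset is large, then check its own support.
        level = {c for c in candidates
                 if all(frozenset(s) in level for s in combinations(c, k))
                 and support(c) >= min_support}
        k += 1
    return result

# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
large = large_itemsets(transactions, min_support=2)
```

Here {milk, butter} has support 1, so the three-item candidate {bread, milk, butter} is pruned without counting its support.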

7.
Sequential Pattern Mining in Multi-Databases via Multiple Alignment   Cited by: 2 (self-citations: 0, citations by others: 2)
To efficiently find global patterns from a multi-database, information in each local database must first be mined and summarized at the local level. Then only the summarized information is forwarded to the global mining process. However, conventional sequential pattern mining methods based on support cannot summarize the local information and are ineffective for global pattern mining from multiple data sources. In this paper, we present an alternative local mining approach for finding sequential patterns in the local databases of a multi-database. We propose the theme of approximate sequential pattern mining, roughly defined as identifying patterns approximately shared by many sequences. Approximate sequential patterns can effectively summarize and represent the local databases by identifying the underlying trends in the data. We present a novel algorithm, ApproxMAP, to mine approximate sequential patterns, called consensus patterns, from large sequence databases in two steps. First, sequences are clustered by similarity. Then, consensus patterns are mined directly from each cluster through multiple alignment. We conduct an extensive and systematic performance study over synthetic and real data. The results demonstrate that ApproxMAP is effective and scalable in mining large sequence databases with long patterns. Hence, ApproxMAP can efficiently summarize a local database and reduce the cost of global mining. Furthermore, we present an elegant and uniform model to identify both high-vote sequential patterns and exceptional sequential patterns from the collection of consensus patterns from each local database.

8.
Batch sequencing and cooperation   Cited by: 1 (self-citations: 0, citations by others: 1)
Game theoretic analysis of sequencing situations has been restricted to manufacturing systems which consist of machines that can process only one job at a time. However, in many manufacturing systems, operations are carried out by batch machines which can simultaneously process multiple jobs. This paper aims to extend the game theoretical approach to the cost allocation problems arising from sequencing situations on systems that consist of batch machines. To analyze the allocation problem at hand, it focusses on the existence of core elements, convexity, and the Shapley value.

9.

Knowledge discovery in databases (KDD) is defined as an iterative sequence of the following steps: data pre-processing, data mining, and post data mining. A significant amount of research in data mining has been done, resulting in a variety of algorithms and techniques for each step. However, no single data-mining technique has been proven appropriate for every domain and data set. Instead, several techniques may need to be integrated into hybrid systems and used cooperatively during a particular data-mining operation. That is, hybrid solutions are crucial for the success of data mining. This paper presents a hybrid framework for identifying patterns from databases or multi-databases. The framework integrates these techniques for mining tasks from an agent point of view. Based on the experiments conducted, putting different KDD techniques together into the agent-based architecture enables them to be used cooperatively when needed. The proposed framework provides a highly flexible and robust data-mining platform, and the resulting systems demonstrate emergent behaviors, although the framework does not improve the performance of individual KDD techniques.

10.
Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been studied extensively in data mining research. Most of the previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there exist a large number of patterns and/or long patterns. In this study, we propose a novel frequent-pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develop an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth. Efficiency of mining is achieved with three techniques: (1) a large database is compressed into a condensed, smaller data structure, the FP-tree, which avoids costly, repeated database scans, (2) our FP-tree-based mining adopts a pattern-fragment growth method to avoid the costly generation of a large number of candidate sets, and (3) a partitioning-based, divide-and-conquer method is used to decompose the mining task into a set of smaller tasks for mining confined patterns in conditional databases, which dramatically reduces the search space. Our performance study shows that the FP-growth method is efficient and scalable for mining both long and short frequent patterns, and is about an order of magnitude faster than the Apriori algorithm and also faster than some recently reported new frequent-pattern mining methods.
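Technique (1) above, compressing the database into a prefix tree of frequent items, can be sketched as follows. This is a minimal illustration of FP-tree construction only: the header table, node links, and the FP-growth mining step itself are omitted, and the class name `FPNode`, the tie-breaking rule, and the sample transactions are assumptions made for the example.

```python
from collections import Counter

class FPNode:
    """One node of the prefix tree: an item, its count, and its children."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_support):
    # First scan: count how many transactions contain each item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item: c for item, c in counts.items() if c >= min_support}

    root = FPNode(None)
    # Second scan: insert each transaction's frequent items, ordered by
    # descending frequency (ties broken alphabetically, an assumption here),
    # so that transactions sharing frequent items share a prefix path.
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent

# Hypothetical transaction database.
transactions = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["b"]]
root, frequent = build_fp_tree(transactions, min_support=2)
```

All four transactions share the prefix "b", so the tree stores that item once with a count of 4 instead of four separate times, which is where the compression comes from.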

11.
Online mining of fuzzy multidimensional weighted association rules   Cited by: 1 (self-citations: 1, citations by others: 0)
This paper addresses the integration of fuzziness with On-Line Analytical Processing (OLAP) based association rules mining. It contributes to the ongoing research on multidimensional online association rules mining by proposing a general architecture that utilizes a fuzzy data cube for knowledge discovery. A data cube is mainly constructed to provide users with the flexibility to view data from different perspectives, as some dimensions of the cube contain multiple levels of abstraction. The first step of the process described in this paper involves introducing a fuzzy data cube as a remedy to the problem of handling quantitative values of dimensional attributes in a cube. This facilitates the online mining of fuzzy association rules at different levels within the constructed fuzzy data cube. Then, we investigate combining the concepts of weight and multiple-level to mine fuzzy weighted multi-cross-level association rules from the constructed fuzzy data cube. For this purpose, three different methods are introduced for single-dimension, multidimensional, and hybrid (integrating the other two methods) fuzzy weighted association rules mining. Each of the three methods utilizes a fuzzy data cube constructed to suit the particular method. To the best of our knowledge, this is the first effort in this direction. We compared the proposed approach to an existing approach that does not utilize fuzziness. Experimental results obtained for each of the three methods on a synthetic dataset and on the adult data of the United States census in year 2000 demonstrate the effectiveness and applicability of the proposed fuzzy OLAP based mining approach. OLAP is one of the most popular tools for on-line, fast and effective multidimensional data analysis. In the OLAP framework, data is mainly stored in data hypercubes (simply called cubes).

12.
Building on the promise shown in game-based learning research, this paper explores methods for Game-Based Learning Assessments (GBLA) using a variety of educational data mining techniques (EDM). GBLA research examines patterns of behaviors evident in game data logs for the measurement of implicit learning, that is, the development of unarticulated knowledge that is not yet expressible on a test or formal assessment. This paper reports on the study of two digital games showing how the combination of human coding with EDM has enabled researchers to measure implicit learning of Physics. In the game Impulse, researchers combined human coding of video with educational data mining to create a set of automated detectors of students' implicit understanding of Newtonian mechanics. For Quantum Spectre, an optics puzzle game, human coding of Interaction Networks was used to identify common student errors. Findings show that several of our measures of student implicit learning within these games were significantly correlated with improvements in external postassessments. Methods and detailed findings were different for each type of game. These results suggest GBLA shows promise for future work such as adaptive games and in-class, data-driven formative assessments, but design of the assessment mechanics must be carefully crafted for each game.

13.
Academics and practitioners have a common interest in the continuing development of methods and computer applications that support or perform knowledge-intensive engineering tasks. Operations management dysfunctions and lost production time are problems of enormous magnitude that impact the performance and quality of industrial systems as well as their cost of production. Association rule mining is a data mining technique used to discover useful and valuable information in huge databases. This work develops a better conceptual base for improving the application of association rule mining methods to extract knowledge on operations and information management. The emphasis of the paper is on the improvement of operations processes. The application example details an industrial experiment in which association rule mining is used to analyze the manufacturing process of a fully integrated provider of drilling products. The study reports some new and interesting results with data mining and knowledge discovery techniques applied to a drill production process. Experimental results on real-life data sets show that the proposed approach is useful in finding effective knowledge associated with the causes of dysfunctions.

14.
The paper presents a variation of the EMAIL Game, originally proposed by Rubinstein (American Economic Review, 1989), in which coordination of the more rewarding-risky joint course of actions is shown to obtain, even when the relevant game is, at most, "mutual knowledge." In the example proposed, a mediator is introduced in such a way that two individuals are symmetrically informed, rather than asymmetrically as in Rubinstein, about the game chosen by nature. As long as the message failure probability is sufficiently low, with the upper bound being a function of the game payoffs, conditional beliefs in the opponent's actions can allow players to choose a more rewarding-risky action. The result suggests that, for efficient coordination to obtain, the length of interactive knowledge on the game, possibly up to "almost common knowledge," does not seem to be a major conceptual issue and that emphasis should be focused instead on the communication protocol and an appropriate relationship between the reliability of communication channels and the payoffs at stake.

15.
Building fast and accurate classifiers for large-scale databases is an important task in data mining. There is growing evidence that integrating classification and association rule mining can produce more efficient and accurate classifiers than traditional techniques. In this paper, the problem of producing rules with multiple labels is investigated, and we propose a multi-class, multi-label associative classification approach (MMAC). In addition, four measures are presented for evaluating the accuracy of classification approaches on a wide range of traditional and multi-label classification problems. Results for 19 different data sets from the UCI data collection and nine hyperheuristic scheduling runs show that the proposed approach is an accurate and effective classification technique, highly competitive and scalable when compared with other traditional and associative classification approaches.

Fadi Abdeljaber Thabtah received a B.S. degree in Computer Science from Philadelphia University, Jordan, in 1997 and an M.S. degree in Computer Science from California State University, USA in 2001. From 1996 to 2001, he worked as a professional in database programming and administration at United Insurance Ltd. in Amman. In 2002, he started his academic career and joined Philadelphia University as a lecturer. He is currently a final-year graduate student at the Department of Computer Science, Bradford University, UK. He has published about seven scientific papers in the areas of data mining and machine learning. His research interests include machine learning, data mining, artificial intelligence and object-oriented databases. Peter Cowling is a Professor of Computing at the University of Bradford. He obtained M.A. and D.Phil. degrees from the University of Oxford. He leads the Modelling Optimisation Scheduling And Intelligent Control (MOSAIC) research centre (http://mosaic.ac), whose main research interests lie in the investigation and development of new modelling, optimisation, control and decision support technologies, which bridge the gap between theory and practice. Applications include production and personnel scheduling, intelligent game agents and data mining. He has published over 40 scientific papers in these areas and is active as a consultant to industry. Yonghong Peng's research areas include machine learning and data mining, and bioinformatics. He has published more than 35 scientific papers in related areas. Dr. Peng is a member of the IEEE and Computer Society, and has been a member of the programme committee of several conferences and workshops. Dr. Peng referees papers for several journals including the IEEE Trans. on Systems, Man and Cybernetics (part C), IEEE Trans. on Evolutionary Computation, Journal of Fuzzy Sets and Systems, Journal of Bioinformatics, and Journal of Data Mining and Knowledge Discovery, and is refereeing papers for several conferences.

16.
This paper is concerned with the investigation of the relevance and suitability of the data mining approach to serial documents. Conceptually the paper is divided into three parts. The first part presents the salient features of data mining and its symbiotic relationship to data warehousing. In the second part of the paper, historical serial documents are introduced, and the Ottoman Tax Registers (Defters) are taken as a case study. Their conformance to the data mining approach is established in terms of structure, analysis and results. A high-level conceptual model for the Defters is also presented. The final part concludes with a brief consideration of the implication of data mining for historical research.

17.
Large databases are becoming increasingly common in civil infrastructure applications. Although it is relatively simple to specifically query these databases at a low level, more abstract questions like ‘How does the environment affect pavement cracking?’ are difficult to answer with traditional methods. Data mining techniques can provide a solution for learning abstract knowledge from civil infrastructure databases. However, data mining needs to be performed within a systematic process to ensure correct and reproducible results. Many decisions must be made during this process, making it difficult for novice analysts to apply data mining techniques thoroughly. This paper presents an application of a knowledge discovery process to data collected for an ‘intelligent’ building. The knowledge discovery process is illustrated and explained through this case study. Additionally, we discuss the importance of this case study in the context of a research effort to develop an interactive guide for the knowledge discovery process.

18.
This article reports a study exploring motivations of Pokémon Game use, individual differences related to personality traits, and game habits. First, it analyzed Pokémon GO motivations through exploratory factor analysis (EFA) by administering the Pokémon GO Motivational Scale online to a group of Italian gamers (N = 560). Subsequently, a Confirmatory Factor Analysis (CFA) was conducted, testing three factorial models of Pokémon Game motivations on a selected random sample (N = 310). Results showed that a three-factor model of Pokémon GO Game motivations (i.e. Personal Needs, Social Needs and Recreation), accounting for 68.9% of total variance, plus a general higher-order factor best fits the data. Individual differences in Pokémon GO motivations and personality traits were explored, showing that highly involved Pokémon GO players are introverted, low agreeableness, and conscientiousness people, driven by personal, social, and recreational needs. Reciprocal influences on motivational involvement, personality, and game habits are discussed.

19.
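Clustering methods for spatial databases   Cited by: 4 (self-citations: 0, citations by others: 4)
1 Introduction. In recent years, both the number of databases and the size of individual databases have grown greatly. For example, databases of space objects contain billions of telescope images, and the NASA Earth Observing System generates 50 GB of data every hour. Data volumes this large are far beyond the capacity of human analysis and interpretation. Knowledge discovery in databases (KDD) identifies valuable, novel, potentially useful, and understandable patterns in data, and is a …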

20.
Today's database systems must deal with uncertainty in the data they store. Consequently, there is a strong need for mining probabilistic databases. Because probabilistic data in first normal form relations is redundant, existing mining techniques are inadequate for discovering patterns in probabilistic databases. This paper designs a new strategy for identifying potentially useful patterns in probabilistic databases. A dependent rule is thus identified in a probabilistic database, represented in the form X → Y with conditional probability matrix $M_{Y|X}$. The method uses instance selection to increase efficiency by reducing the search space. We evaluated the proposed technique, and our experimental results demonstrate that the approach is effective and efficient.
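To make the conditional probability matrix $M_{Y|X}$ of a rule X → Y concrete, the sketch below estimates it from weighted records, where entry (i, j) is P(Y = y_j | X = x_i). The record layout (x, y, weight), with the weight read as the probability attached to a tuple, is an assumption for illustration; the abstract does not specify the paper's data model, and the instance selection step is not shown.

```python
from collections import defaultdict

def conditional_probability_matrix(records):
    """Estimate M[i][j] = P(Y = ys[j] | X = xs[i]) from (x, y, weight) records.

    Assumes every x value has a positive total weight.
    """
    joint = defaultdict(float)     # total weight of each (x, y) pair
    marginal = defaultdict(float)  # total weight of each x value
    for x, y, weight in records:
        joint[(x, y)] += weight
        marginal[x] += weight

    xs = sorted(marginal)
    ys = sorted({y for _, y in joint})
    # Each row is normalised by the weight of its x value, so rows sum to 1.
    matrix = [[joint[(x, y)] / marginal[x] for y in ys] for x in xs]
    return xs, ys, matrix

# Hypothetical probabilistic tuples: (X value, Y value, tuple probability).
records = [
    ("rain", "wet", 0.9),
    ("rain", "dry", 0.1),
    ("sun", "wet", 0.2),
    ("sun", "dry", 0.6),
]
xs, ys, matrix = conditional_probability_matrix(records)
```

A row that is close to uniform suggests Y depends little on that value of X, while a row concentrated on one column is the kind of dependency a rule X → Y would report.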

