
1.

Mining association rules from quantitative data

《Intelligent Data Analysis》1999,3(5):363-376

Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. Most conventional data-mining algorithms identify the relationships among transactions using binary values; however, transactions with quantitative values are commonly seen in real-world applications. This paper thus proposes a new data-mining algorithm for extracting interesting knowledge from transactions stored as quantitative values. The proposed algorithm integrates fuzzy set concepts and the apriori mining algorithm to find interesting fuzzy association rules in given transaction data sets. Experiments with student grades at I-Shou University were also made to verify the performance of the proposed algorithm.
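As a rough illustration of how fuzzy sets and apriori-style counting can be combined, the sketch below maps quantitative values to fuzzy regions and sums the memberships to obtain a fuzzy support. The triangular membership shapes, the region names and boundaries, the sample grades, and the use of the minimum as the intersection operator are all assumptions made for illustration; they are not taken from the paper.

```python
def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical fuzzy regions on a 0-100 grade scale (illustrative only).
REGIONS = {
    "low": (0, 50, 70),
    "middle": (50, 70, 90),
    "high": (70, 90, 110),
}

def membership(value, region):
    """Degree to which a quantitative value belongs to a named fuzzy region."""
    return triangular(value, *REGIONS[region])

def fuzzy_support(transactions, fuzzy_items):
    """Average, over all transactions, of the minimum membership across the fuzzy items."""
    total = 0.0
    for t in transactions:
        total += min(membership(t[attr], region) for attr, region in fuzzy_items)
    return total / len(transactions)

def fuzzy_confidence(transactions, antecedent, consequent):
    """Fuzzy support of the whole rule divided by the fuzzy support of its antecedent."""
    return (fuzzy_support(transactions, antecedent + consequent)
            / fuzzy_support(transactions, antecedent))

# Made-up grade records, one dictionary per student.
GRADES = [
    {"math": 80, "physics": 90},
    {"math": 90, "physics": 80},
    {"math": 60, "physics": 60},
    {"math": 90, "physics": 90},
]
```

A call such as `fuzzy_confidence(GRADES, [("math", "high")], [("physics", "high")])` scores the candidate rule "math is high, so physics is high". An apriori-style miner would keep only the fuzzy itemsets whose support clears a user-chosen threshold before generating rules.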

2.

Mining fuzzy association rules from uncertain data

Cheng-Hsiung Weng Yen-Liang Chen 《Knowledge and Information Systems》2010,23(2):129-152

Association rule mining is an important data analysis method that can discover associations within data. There are numerous previous studies that focus on finding fuzzy association rules from precise and certain data. Unfortunately, real-world data tends to be uncertain due to human errors, instrument errors, recording errors, and so on. Therefore, a question arising immediately is how we can mine fuzzy association rules from uncertain data. To this end, this paper proposes a representation scheme to represent uncertain data. This representation is based on possibility distributions because the possibility theory establishes a close connection between the concepts of similarity and uncertainty, providing an excellent framework for handling uncertain data. Then, we develop an algorithm to mine fuzzy association rules from uncertain data represented by possibility distributions. Experimental results from the survey data show that the proposed approach can discover interesting and valuable patterns with high certainty.

3.

Mining fuzzy association rules from questionnaire data

Yen-Liang Chen Cheng-Hsiung Weng 《Knowledge》2009,22(1):46-56

Association rule mining is one of the most popular data analysis methods that can discover associations within data. Association rule mining algorithms have been applied to various datasets, due to their practical usefulness. Little attention has been paid, however, to how association mining techniques can be applied to analyze questionnaire data. Therefore, this paper first identifies the various data types that may appear in a questionnaire. Then, we introduce the questionnaire data mining problem and define the rule patterns that can be mined from questionnaire data. A unified approach is developed based on fuzzy techniques so that all different data types can be handled in a uniform manner. After that, an algorithm is developed to discover fuzzy association rules from the questionnaire dataset. Finally, we evaluate the performance of the proposed algorithm, and the results indicate that our method is capable of finding interesting association rules that would never have been found by previous mining algorithms.

4.

Mining significant association rules from uncertain data

Anshu Zhang Wenzhong Shi Geoffrey I. Webb 《Data mining and knowledge discovery》2016,30(4):928-963

In association rule mining, the trade-off between avoiding harmful spurious rules and preserving authentic ones is an ever critical barrier to obtaining reliable and useful results. The statistically sound technique for evaluating the statistical significance of association rules is superior in preventing spurious rules, yet can also cause severe loss of true rules in the presence of data error. This study presents a new and improved method for statistical testing of association rules with uncertain, erroneous data. An original mathematical model was established to describe data error propagation through the computational procedures of the statistical test. Based on the error model, a scheme combining analytic and simulative processes was designed to correct the statistical test for distortions caused by data error. Experiments on both synthetic and real-world data show that the method significantly recovers the loss in true rules (reduces type-2 error) that data error causes in the original statistically sound method. Meanwhile, the new method maintains effective control over the familywise error rate, which is the distinctive advantage of the original statistically sound technique. Furthermore, the method is robust against inaccurate data error probability information and situations not fulfilling the commonly accepted assumption of independent error probabilities of different data items. The method is particularly effective for rules that are most practically meaningful yet sensitive to data error. The method proves promising in enhancing the value of association rule mining results and helping users make correct decisions.

5.

Mining fuzzy association rules from low-quality data

A. M. Palacios M. J. Gacto J. Alcalá-Fdez 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2012,16(5):883-901

Data mining is most commonly used in attempts to induce association rules from databases which can help decision-makers easily analyze the data and make good decisions regarding the domains concerned. Different studies have proposed methods for mining association rules from databases with crisp values. However, the data in many real-world applications have a certain degree of imprecision. In this paper we address this problem, and propose a new data-mining algorithm for extracting interesting knowledge from databases with imprecise data. The proposed algorithm integrates imprecise data concepts and the fuzzy apriori mining algorithm to find interesting fuzzy association rules in given databases. Experiments for diagnosing dyslexia in early childhood were made to verify the performance of the proposed algorithm.

6.

Mining temporal interval relational rules from temporal data

Yong Joon Lee 《Journal of Systems and Software》2009,82(1):155-167

Temporal data mining remains an important research topic because many application areas need knowledge from temporal data, such as sequential patterns, similar time sequences, and cyclic and temporal association rules. Although there are many studies on temporal data mining, they do not deal with discovering knowledge from temporal interval data such as patient histories, purchaser histories, and web logs. We propose a new temporal data mining technique that extracts temporal interval relation rules from temporal interval data by using Allen’s theory. It consists of a preprocessing algorithm designed for the generalization of temporal interval data and a temporal relation algorithm for mining temporal relation rules from the generalized temporal interval data. This technique can provide more useful knowledge than conventional data mining techniques.
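Allen’s theory distinguishes thirteen possible relations between two time intervals. The sketch below classifies a pair of intervals into one of them; it only illustrates the interval algebra, not the paper’s preprocessing or rule-mining algorithms. The function name, the `(start, end)` tuple representation, and the assumption that every interval satisfies `start < end` are choices made here for illustration.

```python
def allen_relation(a, b):
    """Return the Allen relation of interval a to interval b.

    Each interval is a (start, end) pair and is assumed to satisfy start < end.
    """
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:
        return "before"
    if e2 < s1:
        return "after"
    if e1 == s2:
        return "meets"
    if e2 == s1:
        return "met-by"
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 == s2:
        return "starts" if e1 < e2 else "started-by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished-by"
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 and e2 < e1:
        return "contains"
    # Only proper partial overlaps remain at this point.
    return "overlaps" if s1 < s2 else "overlapped-by"
```

For example, with two hypothetical events from a patient history, a fever lasting from day 1 to day 5 and a rash lasting from day 3 to day 8, `allen_relation((1, 5), (3, 8))` yields `"overlaps"`.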

7.

Mining SOTs and dispatching rules from RFID-enabled real-time shopfloor production data

Ray Y. Zhong George Q. Huang Q. Y. Dai T. Zhang 《Journal of Intelligent Manufacturing》2014,25(4):825-843

Radio frequency identification (RFID) has been widely used in the manufacturing field and has created a ubiquitous production environment in which advanced production planning and scheduling (APS) might be enabled. Within such an environment, APS usually requires standard operation times (SOTs) and dispatching rules that have been obtained from time studies or based on past experience. Wide variations exist and frequently cause serious discrepancies in executing plans and schedules. This paper proposes a data mining approach to estimate realistic SOTs and unknown dispatching rules from RFID-enabled shopfloor production data. The approach is evaluated with real-world data from a collaborating company that has used RFID technology to support its shopfloor production for over seven years. The key impact factors on SOTs are quantitatively examined. A reference table with the mined precise and practical SOTs is established for typical operations, and suitable dispatching rules are labeled as managerial implications, aiming at improving the quality and stability of production plans and schedules.

8.

Mining class association rules from Web logs Total citations: 1 (self-citations: 0, citations by others: 1)

林文龙 刘业政 姜元春 焦宁 (Lin Wenlong, Liu Yezheng, Jiang Yuanchun, Jiao Ning) 《计算机工程与应用》 (Computer Engineering and Applications) 2007,43(31):123-125

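User classification is an important task in research on Web access pattern mining. This paper proposes a method that applies associative classification to classify Web users: the Web log files are first preprocessed to obtain a training transaction data set, class association rules are then mined from that transaction set, and a classifier is built from the mined rule set, so that users can be classified according to their access history.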

9.

Mining purchasing decision rules from service encounter data of retail chain stores

Fu-Ren Lin Rung-Wei Po Claudia Valeria Cruz Orellan 《Information Systems and E-Business Management》2011,9(2):193-221

In this explorative research, we aim to find the most important service experience variables that determine customers’ purchasing decisions and the clerks’ influence on customers’ purchases. This study was conducted as a case study of a children’s apparel company, denoted Company L, which has 243 retail stores. Company L has implemented Point of Sale (POS) systems in its retail stores, and would like to know what functions could be added to induce storefront employees to deliver better customer service. We, therefore, focus on observing the services provided by storefront employees and their influence on a customer’s purchasing decision in a retail store. The study generated decision trees via Weka, an open-source data mining software platform, to analyze multiple data sources to (1) understand what makes a good service experience for a customer, (2) get explicit knowledge from service encounter information, and (3) externalize the tacit knowledge of storefront service experiences. These findings can be used to improve Company L’s POS system to guide storefront employees to learn from trained decision rules. Moreover, the company can internalize service experience knowledge by aggregating learned rules from the company’s retail stores.

10.

Learning classification rules from data

《Computers & Mathematics with Applications》2003,45(4-5):737-748

We present ELEM2, a machine learning system that induces classification rules from a set of data based on a heuristic search over a hypothesis space. ELEM2 is distinguished from other rule induction systems in three aspects. First, it uses a new heuristic function to guide the heuristic search. The function reflects the degree of relevance of an attribute-value pair to a target concept and leads to selection of the most relevant pairs for formulating rules. Second, ELEM2 handles inconsistent training examples by defining an unlearnable region of a concept based on the probability distribution of that concept in the training data. The unlearnable region is used as a stopping criterion for the concept learning process, which resolves conflicts without removing inconsistent examples. Third, ELEM2 employs a new rule quality measure in its post-pruning process to prevent rules from overfitting the data. The rule quality formula measures the extent to which a rule can discriminate between the positive and negative examples of a class. We describe features of ELEM2, its rule induction algorithm and its classification procedure. We report experimental results that compare ELEM2 with C4.5 and CN2 on a number of datasets.

11.

Discovering validation rules from microbiological data

Evelina Lamma Fabrizio Riguzzi Sergio Storari Paola Mello Anna Nanetti 《New Generation Computing》2003,21(2):123-133

A huge amount of data is collected daily from clinical microbiology laboratories. These data concern the resistance or susceptibility of bacteria to tested antibiotics. Almost all microbiology laboratories follow standard antibiotic testing guidelines which suggest antibiotic test execution methods and result interpretation and validation (among them, those annually published by NCCLS). Guidelines basically specify, for each species, the antibiotics to be tested, how to interpret the results of tests and a list of exceptions regarding particular antibiotic test results. Even if these standards are well established, they do not consider peculiar features of a given hospital laboratory, which possibly influence the antimicrobial test results, and the further validation process. In order to improve and better tailor the validation process, we have applied knowledge discovery techniques, and data mining in particular, to microbiological data with the purpose of discovering new validation rules, not yet included in NCCLS guidelines, but considered plausible and correct by interviewed experts. In particular, we applied the knowledge discovery process in order to find (association) rules that relate the susceptibility or resistance of a bacterium to different antibiotics. This approach is not antithetic, but complementary to that based on NCCLS rules: it proved very effective in validating some of them, and also in extending that compendium. In this respect, the newly discovered knowledge has led microbiologists to be aware of new correlations among some antimicrobial test results, which were previously unnoticed. Last but not least, the newly discovered rules, taking into account the history of the considered laboratory, are better tailored to the hospital situation, and this is very important since some resistances to antibiotics are specific to particular, local hospital environments.

Evelina Lamma, Ph.D.: She got her degree in Electrical Engineering at the University of Bologna in 1985, and her Ph.D. in Computer Science in 1990. Her research activity centers on logic programming languages, artificial intelligence and agent-based programming. She was a co-organizer of the 3rd International Workshop on Extensions of Logic Programming ELP92, held in Bologna in February 1992, and of the 6th Italian Congress on Artificial Intelligence, held in Bologna in September 1999. She is a member of the Italian Association for Artificial Intelligence (AI*IA), associated with ECCAI. Currently, she is Full Professor at the University of Ferrara, where she teaches Artificial Intelligence and Foundations of Computer Science.
Fabrizio Riguzzi, Ph.D.: He is Assistant Professor at the Department of Engineering of the University of Ferrara, Italy. He received his Laurea from the University of Bologna in 1999. He joined the Department of Engineering of the University of Ferrara in 1999. He has been a visiting researcher at the University of Cyprus and at the New University of Lisbon. His research interests include data mining (in particular methods for learning from multirelational data), machine learning, belief revision, genetic algorithms and software engineering.
Sergio Storari: He got his degree in Electrical Engineering at the University of Ferrara in 1998. His research activity centers on artificial intelligence, knowledge-based systems, data mining and multi-agent systems. He is a member of the Italian Association for Artificial Intelligence (AI*IA), associated with ECCAI. Currently, he is attending the third year of a Ph.D. course on “Study and application of Artificial Intelligence techniques for medical data analysis” at DEIS, University of Bologna.
Paola Mello, Ph.D.: She got her degree in Electrical Engineering at the University of Bologna in 1982, and her Ph.D. in Computer Science in 1988. Her research activity centers on knowledge representation, logic programming, artificial intelligence and knowledge-based systems. She was a co-organizer of the 3rd International Workshop on Extensions of Logic Programming ELP92, held in Bologna in February 1992, and of the 6th Italian Congress on Artificial Intelligence, held in Bologna in September 1999. She is a member of the Italian Association for Artificial Intelligence (AI*IA), associated with ECCAI. Currently, she is Full Professor at the University of Bologna, where she teaches Artificial Intelligence and Foundations of Computer Science.
Anna Nanetti: She got a degree in biological sciences at the University of Bologna in 1974. Currently, she is an Academic Researcher in the Microbiology section of the Clinical, Specialist and Experimental Medicine Department of the Faculty of Medicine and Surgery, University of Bologna.

12.

Mining emotion-aware sequential rules at user-level from micro-blogs

Marjana Prifti Skenduli Marenglen Biba Corrado Loglisci Michelangelo Ceci Donato Malerba 《Journal of Intelligent Information Systems》2021,57(2):369-394

Social media have enabled users to keep inter-personal relationships, but also to voice personal sensations, emotions and feelings. The recent...

13.

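Research on an algorithm for mining association rules in vector spatial databases Total citations: 2 (self-citations: 0, citations by others: 2)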

厍向阳 许五弟 薛惠锋 (She Xiangyang, Xu Wudi, Xue Huifeng) 《计算机应用》 (Journal of Computer Applications) 2004,24(8):47-49

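Based on the characteristics of vector spatial data and the requirements of spatial data mining, and using GIS spatial analysis and spatial data processing as tools, this paper discusses data processing methods for mining association rules in vector spatial databases and proposes an algorithm for mining the association rules. Finally, the algorithm is verified with an example.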

14.

Mining functional dependencies from data

Hong Yao Howard J. Hamilton 《Data mining and knowledge discovery》2008,16(2):197-219

In this paper, we propose an efficient rule discovery algorithm, called FD_Mine, for mining functional dependencies from data. By exploiting Armstrong’s Axioms for functional dependencies, we identify equivalences among attributes, which can be used to reduce both the size of the dataset and the number of functional dependencies to be checked. We first describe four effective pruning rules that reduce the size of the search space. In particular, the number of functional dependencies to be checked is reduced by skipping the search for FDs that are logically implied by already discovered FDs. Then, we present the FD_Mine algorithm, which incorporates the four pruning rules into the mining process. We prove the correctness of FD_Mine, that is, we show that the pruning does not lead to the loss of useful information. We report the results of a series of experiments. These experiments show that the proposed algorithm is effective on 15 UCI datasets and synthetic data.
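The basic test that any functional dependency miner has to perform can be sketched as follows: a dependency X → A holds if no two rows agree on X but differ on A. This is not FD_Mine itself and includes none of its pruning rules; the function names, the row representation (a list of dictionaries), and the sample table are assumptions made for illustration.

```python
def fd_holds(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds in the given rows."""
    seen = {}
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        value = row[rhs]
        # setdefault stores the first value seen for this key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True

def equivalent(rows, a, b):
    """Two attributes are equivalent when each functionally determines the other."""
    return fd_holds(rows, [a], b) and fd_holds(rows, [b], a)

# A small made-up relation.
TABLE = [
    {"A": 1, "B": "x", "C": 10},
    {"A": 1, "B": "x", "C": 20},
    {"A": 2, "B": "y", "C": 10},
]
```

In this table `A` and `B` determine each other, so a miner that detects the equivalence only needs to keep one of them when searching for further dependencies, which is the kind of saving the abstract attributes to attribute equivalences.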

15.

Identification of fuzzy rules from learning data

《Annual Review in Automatic Programming》1994


16.

Synthesizing high-frequency rules from different data sources Total citations: 10 (self-citations: 0, citations by others: 10)

Xindong Wu Shichao Zhang 《Knowledge and Data Engineering, IEEE Transactions on》2003,15(2):353-367

Many large organizations have multiple data sources, such as different branches of an interstate company. While putting all data together from different sources might amass a huge database for centralized processing, mining association rules at different data sources and forwarding the rules (rather than the original raw data) to the centralized company headquarters provides a feasible way to deal with multiple data source problems. Meanwhile, the association rules at each data source may be required for that data source in the first instance, so association analysis at each data source is also important and useful. However, the forwarded rules from different data sources may be too many for the centralized company headquarters to use. This paper presents a weighting model for synthesizing high-frequency association rules from different data sources. There are two reasons to focus on high-frequency rules. First, a centralized company headquarters is interested in high-frequency rules because they are supported by most of its branches for corporate profitability. Second, high-frequency rules have a greater chance of becoming valid rules in the union of all data sources. In order to extract high-frequency rules efficiently, a procedure of rule selection is also constructed to enhance the weighting model by coping with low-frequency rules. Experimental results show that our proposed weighting model is efficient and effective.

17.

Mining fuzzy β-certain and β-possible rules from quantitative data based on the variable precision rough-set model

《Expert systems with applications》2007,32(1):223-232

The rough-set theory proposed by Pawlak has been widely used in dealing with data classification problems. The original rough-set model is, however, quite sensitive to noisy data. Ziarko thus proposed the variable precision rough-set model to deal with noisy data and uncertain information. This model allowed for some degree of uncertainty and misclassification in the mining process. Conventionally, the mining algorithms based on the rough-set theory identify the relationships among data using crisp attribute values; however, data with quantitative values are commonly seen in real-world applications. This paper thus deals with the problem of producing a set of fuzzy certain and fuzzy possible rules from quantitative data with a predefined tolerance degree of uncertainty and misclassification. A new method, which combines the variable precision rough-set model and the fuzzy set theory, is thus proposed to solve this problem. It first transforms each quantitative value into a fuzzy set of linguistic terms using membership functions and then calculates the fuzzy β-lower and the fuzzy β-upper approximations. The certain and possible rules are then generated based on these fuzzy approximations. These rules can then be used to classify unknown objects. The paper thus extends the existing rough-set mining approaches to process quantitative data with tolerance of noise and uncertainty.
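To make the β-lower and β-upper approximations more concrete, the sketch below computes their crisp (non-fuzzy) counterparts. It assumes the common inclusion-degree convention with β in (0.5, 1]: an equivalence class joins the β-lower approximation when at least a fraction β of its members belong to the target set, and joins the β-upper approximation when more than a fraction 1 − β do. The paper’s fuzzy extension and its exact parameterisation may differ; the function name and the sample objects are hypothetical.

```python
from collections import defaultdict

def beta_approximations(objects, condition, target, beta):
    """Return the crisp beta-lower and beta-upper approximations of target.

    objects:   dict mapping object name -> dict of attribute values
    condition: list of attribute names that define the equivalence classes
    target:    set of object names forming the concept to approximate
    beta:      inclusion threshold, assumed to lie in (0.5, 1]
    """
    classes = defaultdict(set)
    for name, attrs in objects.items():
        classes[tuple(attrs[c] for c in condition)].add(name)

    lower, upper = set(), set()
    for members in classes.values():
        ratio = len(members & target) / len(members)
        if ratio >= beta:
            lower |= members
        if ratio > 1 - beta:
            upper |= members
    return lower, upper

# Made-up objects described by a single condition attribute.
OBJECTS = {
    "o1": {"grade": "high"}, "o2": {"grade": "high"},
    "o3": {"grade": "high"}, "o4": {"grade": "high"},
    "o5": {"grade": "low"}, "o6": {"grade": "low"},
}
TARGET = {"o1", "o2", "o3", "o5"}
```

With β = 0.7, the `high` class (three of four members in the target) enters the lower approximation despite one misclassified object, whereas with β = 1.0 the model reduces to the original rough-set definition and the lower approximation here is empty.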

18.

Mining interesting association rules from customer databases and transaction databases Total citations: 1 (self-citations: 0, citations by others: 1)

Pauray S. M. Tsai Chien-Ming Chen 《Information Systems》2004,29(8):139-696

In this paper, we examine a new data mining issue of mining association rules from customer databases and transaction databases. The problem is decomposed into two subproblems: identifying all the large itemsets from the transaction database and mining association rules from the customer database and the large itemsets identified. For the first subproblem, we propose an efficient algorithm to discover all the large itemsets from the transaction database. Experimental results show that by our approach, the total execution time can be reduced significantly. For the second subproblem, a relationship graph is constructed according to the identified large itemsets from the transaction database and the priorities of condition attributes from the customer database. Based on the relationship graph, we present an efficient graph-based algorithm to discover interesting association rules embedded in the transaction database and the customer database.

19.

Mining interesting imperfectly sporadic rules Total citations: 1 (self-citations: 0, citations by others: 1)

Yun Sing Koh Nathan Rountree Richard A. O’Keefe 《Knowledge and Information Systems》2008,14(2):179-196

Detecting association rules with low support but high confidence is a difficult data mining problem. To find such rules using approaches like the Apriori algorithm, minimum support must be set very low, which results in a large number of redundant rules. We are interested in sporadic rules; i.e. those that fall below a maximum support level but above the level of support expected from random coincidence. There are two types of sporadic rules: perfectly sporadic and imperfectly sporadic. Here we are more concerned about finding imperfectly sporadic rules, where the support of the antecedent as a whole falls below maximum support, but where items may have quite high support individually. In this paper, we introduce an algorithm called Mining Interesting Imperfectly Sporadic Rules (MIISR) to find imperfectly sporadic rules efficiently, e.g. fever, headache, stiff neck → meningitis. Our proposed method uses item constraints and coincidence pruning to discover these rules in reasonable time. This paper is an expanded version of Koh et al. [Advances in knowledge discovery and data mining: 10th Pacific-Asia Conference (PAKDD 2006), Singapore. Lecture Notes in Computer Science 3918, Springer, Berlin, pp 473–482].

Yun Sing Koh is currently a Ph.D. student at the Department of Computer Science, University of Otago, New Zealand. Her main research interest is in association rule mining with particular interest in generating hard-to-find association rules and interestingness measures. She holds a B.Sc. (Honours) degree in computer science and a Master’s degree in software engineering, both from the University of Malaya, Malaysia.
Nathan Rountree has been a faculty member of the Department of Computer Science at the University of Otago, Dunedin, since 1999. His research interests are in the fields of data mining, artificial neural networks, and computer science education. He is also a consulting software engineer for Profiler Corporation, a Dunedin-based company specialising in data mining and knowledge discovery.
Richard A. O’Keefe holds a B.Sc. (Honours) degree in mathematics and physics, majoring in statistics, and an M.Sc. degree in physics (underwater acoustics), both obtained from the University of Auckland, New Zealand. He received his Ph.D. degree in artificial intelligence from the University of Edinburgh. He is the author of “The Craft of Prolog” (MIT Press). Dr. O’Keefe is now a lecturer at the University of Otago, New Zealand. His computing interests include declarative programming languages, especially Prolog and Erlang; statistical applications, including data mining and information retrieval; and applications of logic. He is also a member of the editorial board of Theory and Practice of Logic Programming.
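The support and confidence conditions that characterise an imperfectly sporadic rule in the abstract above can be checked with a few lines of code. The sketch below is not MIISR and performs no item-constraint or coincidence pruning; the function names and the ten toy transactions are made up for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in items."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def rule_stats(transactions, antecedent, consequent):
    """Return (antecedent support, rule support, confidence) for antecedent -> consequent."""
    supp_a = support(transactions, antecedent)
    supp_rule = support(transactions, set(antecedent) | set(consequent))
    confidence = supp_rule / supp_a if supp_a else 0.0
    return supp_a, supp_rule, confidence

# Made-up transactions; each one is a set of observed items.
RECORDS = [
    {"fever", "headache", "stiff neck", "meningitis"},
    {"fever", "headache"},
    {"fever", "headache"},
    {"fever"},
    {"fever", "cough"},
    {"fever", "cough"},
    {"headache"},
    {"headache", "cough"},
    {"stiff neck"},
    {"cough"},
]
```

With a hypothetical maximum support of 0.2, the antecedent {fever, headache, stiff neck} falls below the threshold as a whole while `fever` alone lies well above it, and the rule still reaches full confidence. That combination is what the abstract describes as imperfectly sporadic.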

20.

Mining fuzzy temporal association rules

崔晓军 薛永生 (Cui Xiaojun, Xue Yongsheng) 《计算机应用》 (Journal of Computer Applications) 2007,27(3):561-564

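With the help of fuzzy concepts and fuzzy operations, time intervals are easy to describe. For a specified calendar pattern, different time intervals can carry different weights according to their membership degrees. Building on fuzzy calendar algebra and combining the ideas of incremental mining and progressive counting, this paper proposes a fuzzy-calendar-based method for mining fuzzy temporal association rules. Both theoretical analysis and experimental results show that the algorithm is efficient and feasible.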