Found 20 similar documents; search took 15 ms
1.
《Intelligent Data Analysis》1999,3(5):363-376
Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. Most conventional data-mining algorithms identify the relationships among transactions using binary values; however, transactions with quantitative values are commonly seen in real-world applications. This paper thus proposes a new data-mining algorithm for extracting interesting knowledge from transactions stored as quantitative values. The proposed algorithm integrates fuzzy set concepts and the Apriori mining algorithm to find interesting fuzzy association rules in given transaction data sets. Experiments with student grades at I-Shou University were also conducted to verify the performance of the proposed algorithm.
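The idea of combining fuzzy sets with Apriori-style counting can be illustrated with a minimal sketch. It is not the paper's exact procedure: the membership functions, the attribute names and the grade values below are all hypothetical, and the sketch assumes the common convention of using the minimum as fuzzy intersection and the average membership as fuzzy support.

```python
def clamp(value):
    """Restrict a value to the interval [0, 1]."""
    return max(0.0, min(1.0, value))


# Hypothetical membership functions for a 0-100 grade scale (assumption).
MEMBERSHIP = {
    "low": lambda x: clamp((70 - x) / 20),   # 1 at or below 50, 0 at or above 70
    "high": lambda x: clamp((x - 60) / 20),  # 0 at or below 60, 1 at or above 80
}


def fuzzy_support(transactions, fuzzy_items):
    """Average, over all transactions, of the minimum membership degree
    of the given (attribute, region) pairs."""
    total = 0.0
    for row in transactions:
        total += min(MEMBERSHIP[region](row[attr]) for attr, region in fuzzy_items)
    return total / len(transactions)


def fuzzy_confidence(transactions, antecedent, consequent):
    """Fuzzy support of the whole rule divided by that of the antecedent."""
    antecedent_support = fuzzy_support(transactions, antecedent)
    if antecedent_support == 0:
        return 0.0
    return fuzzy_support(transactions, antecedent + consequent) / antecedent_support


# Made-up quantitative transactions (student grades).
grades = [
    {"math": 85, "physics": 90},
    {"math": 65, "physics": 70},
    {"math": 40, "physics": 55},
]

rule_support = fuzzy_support(grades, [("math", "high"), ("physics", "high")])
rule_confidence = fuzzy_confidence(grades, [("math", "high")], [("physics", "high")])
print(f"support={rule_support:.3f}, confidence={rule_confidence:.3f}")
```

A full Apriori-style miner would repeat this support computation level by level, keeping only fuzzy itemsets whose support reaches a user-chosen minimum.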
2.
Association rule mining is an important data analysis method that can discover associations within data. There are numerous
previous studies that focus on finding fuzzy association rules from precise and certain data. Unfortunately, real-world data
tends to be uncertain due to human errors, instrument errors, recording errors, and so on. Therefore, a question arising immediately
is how we can mine fuzzy association rules from uncertain data. To this end, this paper proposes a representation scheme to
represent uncertain data. This representation is based on possibility distributions because the possibility theory establishes
a close connection between the concepts of similarity and uncertainty, providing an excellent framework for handling uncertain
data. Then, we develop an algorithm to mine fuzzy association rules from uncertain data represented by possibility distributions.
Experimental results from the survey data show that the proposed approach can discover interesting and valuable patterns with
high certainty.
3.
Association rule mining is one of the most popular data analysis methods for discovering associations within data. Association rule mining algorithms have been applied to various datasets because of their practical usefulness. Little attention has been paid, however, to how association mining techniques can be applied to questionnaire data. Therefore, this paper first identifies the various data types that may appear in a questionnaire. Then, we introduce the questionnaire data mining problem and define the rule patterns that can be mined from questionnaire data. A unified approach is developed based on fuzzy techniques so that all the different data types can be handled in a uniform manner. After that, an algorithm is developed to discover fuzzy association rules from the questionnaire dataset. Finally, we evaluate the performance of the proposed algorithm, and the results indicate that our method is capable of finding interesting association rules that would never have been found by previous mining algorithms.
4.
In association rule mining, the trade-off between avoiding harmful spurious rules and preserving authentic ones is an ever-critical barrier to obtaining reliable and useful results. The statistically sound technique for evaluating the statistical significance of association rules is superior in preventing spurious rules, yet it can also cause a severe loss of true rules in the presence of data error. This study presents a new and improved method for statistical testing of association rules on uncertain, erroneous data. An original mathematical model was established to describe the propagation of data error through the computational procedures of the statistical test. Based on the error model, a scheme combining analytic and simulative processes was designed to correct the statistical test for distortions caused by data error. Experiments on both synthetic and real-world data show that the method significantly recovers the loss of true rules (reduces type-2 error) that data error causes in the original statistically sound method. Meanwhile, the new method maintains effective control over the familywise error rate, which is the distinctive advantage of the original statistically sound technique. Furthermore, the method is robust against inaccurate data error probability information and against situations that do not fulfil the commonly accepted assumption of independent error probabilities for different data items. The method is particularly effective for rules that are the most practically meaningful yet sensitive to data error. The method proves promising in enhancing the value of association rule mining results and helping users make correct decisions.
5.
A. M. Palacios M. J. Gacto J. Alcalá-Fdez 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2012,16(5):883-901
Data mining is most commonly used in attempts to induce association rules from databases which can help decision-makers easily
analyze the data and make good decisions regarding the domains concerned. Different studies have proposed methods for mining
association rules from databases with crisp values. However, the data in many real-world applications have a certain degree
of imprecision. In this paper we address this problem, and propose a new data-mining algorithm for extracting interesting
knowledge from databases with imprecise data. The proposed algorithm integrates imprecise data concepts and the fuzzy apriori
mining algorithm to find interesting fuzzy association rules in given databases. Experiments for diagnosing dyslexia in early
childhood were made to verify the performance of the proposed algorithm.
6.
Yong Joon Lee 《Journal of Systems and Software》2009,82(1):155-167
Temporal data mining is still an important research topic, since there are application areas that need knowledge from temporal data, such as sequential patterns, similar time sequences, and cyclic and temporal association rules. Although there are many studies on temporal data mining, they do not deal with discovering knowledge from temporal interval data such as patient histories, purchaser histories, and web logs. We propose a new temporal data mining technique that can extract temporal interval relation rules from temporal interval data by using Allen’s theory. It consists of a preprocessing algorithm designed for the generalization of temporal interval data and a temporal relation algorithm for mining temporal relation rules from the generalized temporal interval data. This technique can provide more useful knowledge than conventional data mining techniques.
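Allen's theory distinguishes thirteen possible relations between two time intervals. As a rough illustration of the building block such a technique relies on (not the authors' algorithm), the sketch below classifies the relation between two closed intervals given as (start, end) pairs with start < end. The function name and the relation labels are assumptions chosen for this example.

```python
def allen_relation(a, b):
    """Return the name of the Allen relation of interval a relative to b.

    Each interval is a (start, end) pair with start < end.
    """
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:
        return "before"
    if e1 == s2:
        return "meets"
    if s1 > e2:
        return "after"
    if s1 == e2:
        return "met-by"
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 == s2:
        return "starts" if e1 < e2 else "started-by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished-by"
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 and e2 < e1:
        return "contains"
    # The intervals partially overlap; the earlier start decides the direction.
    return "overlaps" if s1 < s2 else "overlapped-by"


# Hypothetical example: two hospital stays of one patient, as day numbers.
print(allen_relation((1, 5), (3, 8)))
```

A mining step could then count how often each relation occurs between generalized intervals of two event types and report the frequent ones as temporal relation rules.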
7.
Ray Y. Zhong George Q. Huang Q. Y. Dai T. Zhang 《Journal of Intelligent Manufacturing》2014,25(4):825-843
Radio frequency identification (RFID) has been widely used in manufacturing and has created a ubiquitous production environment in which advanced production planning and scheduling (APS) might be enabled. Within such an environment, APS usually requires standard operation times (SOTs) and dispatching rules, which have been obtained from time studies or based on past experience. Wide variations exist and frequently cause serious discrepancies in executing plans and schedules. This paper proposes a data mining approach to estimate realistic SOTs and unknown dispatching rules from RFID-enabled shopfloor production data. The approach is evaluated with real-world data from a collaborating company that has used RFID technology to support its shopfloor production for over seven years. The key impact factors on SOTs are quantitatively examined. A reference table with the mined precise and practical SOTs is established for typical operations, and suitable dispatching rules are labeled as managerial implications, aiming at improving the quality and stability of production plans and schedules.
8.
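User classification is an important task in research on Web access pattern mining. This paper proposes a method that applies associative classification to classify Web users: first, a training transaction dataset is obtained by preprocessing Web log files; then class association rules are mined from this transaction set, and a classifier is built from the mined rule set, so that users can be classified according to their access history.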
9.
Fu-Ren Lin Rung-Wei Po Claudia Valeria Cruz Orellan 《Information Systems and E-Business Management》2011,9(2):193-221
In this explorative research, we aim to find the most important service experience variables that determine customers’ purchasing decisions and the clerks’ influence on customers’ purchases. This study was conducted as a case study of a children’s apparel company, denoted Company L, which has 243 retail stores. Company L has implemented Point of Sale (POS) systems in its retail stores, and would like to know what functions could be added to induce storefront employees to deliver better customer service. We therefore focus on observing the services provided by storefront employees and their reflection on a customer’s purchasing decision in a retail store. The study generated decision trees via Weka, an open-source data mining software platform, to analyze multiple data sources to (1) understand what makes a good service experience for a customer, (2) get explicit knowledge from service encounter information, and (3) externalize the tacit knowledge of storefront service experiences. These findings can be used to improve Company L’s POS system to guide storefront employees to learn from trained decision rules. Moreover, the company can internalize service experience knowledge by aggregating learned rules from the company’s retail stores.
10.
《Computers & Mathematics with Applications》2003,45(4-5):737-748
We present ELEM2, a machine learning system that induces classification rules from a set of data based on a heuristic search over a hypothesis space. ELEM2 is distinguished from other rule induction systems in three aspects. First, it uses a new heuristic function to guide the heuristic search. The function reflects the degree of relevance of an attribute-value pair to a target concept and leads to selection of the most relevant pairs for formulating rules. Second, ELEM2 handles inconsistent training examples by defining an unlearnable region of a concept based on the probability distribution of that concept in the training data. The unlearnable region is used as a stopping criterion for the concept learning process, which resolves conflicts without removing inconsistent examples. Third, ELEM2 employs a new rule quality measure in its post-pruning process to prevent rules from overfitting the data. The rule quality formula measures the extent to which a rule can discriminate between the positive and negative examples of a class. We describe the features of ELEM2, its rule induction algorithm and its classification procedure. We report experimental results that compare ELEM2 with C4.5 and CN2 on a number of datasets.
11.
Evelina Lamma Fabrizio Riguzzi Sergio Storari Paola Mello Anna Nanetti 《New Generation Computing》2003,21(2):123-133
A huge amount of data is collected daily from clinical microbiology laboratories. These data concern the resistance or susceptibility
of bacteria to tested antibiotics. Almost all microbiology laboratories follow standard antibiotic testing guidelines which
suggest antibiotic test execution methods and result interpretation and validation (among them, those annually published by
NCCLS [2,3]). Guidelines basically specify, for each species, the antibiotics to be tested, how to interpret the results of tests and
a list of exceptions regarding particular antibiotic test results. Even if these standards are well established, they do not
consider peculiar features of a given hospital laboratory, which possibly influence the antimicrobial test results, and the
further validation process.
In order to improve and better tailor the validation process, we have applied knowledge discovery techniques, and data mining
in particular, to microbiological data with the purpose of discovering new validation rules, not yet included in NCCLS guidelines,
but considered plausible and correct by interviewed experts. In particular, we applied the knowledge discovery process in
order to find (association) rules relating to each other the susceptibility or resistance of a bacterium to different antibiotics.
This approach is not antithetic, but complementary to that based on NCCLS rules: it proved very effective in validating some
of them, and also in extending that compendium. In this respect, the newly discovered knowledge has led microbiologists to
be aware of new correlations among some antimicrobial test results, which were previously unnoticed. Last but not least, the
newly discovered rules, taking into account the history of the considered laboratory, are better tailored to the hospital situation,
and this is very important since some resistances to antibiotics are specific to particular, local hospital environments.
Evelina Lamma, Ph.D.: She received her degree in Electrical Engineering from the University of Bologna in 1985, and her Ph.D. in Computer Science in 1990.
Her research activity centers on logic programming languages, artificial intelligence and agent-based programming. She was
a co-organizer of the 3rd International Workshop on Extensions of Logic Programming ELP92, held in Bologna in February 1992,
and of the 6th Italian Congress on Artificial Intelligence, held in Bologna in September 1999. She is a member of the Italian
Association for Artificial Intelligence (AI*IA), associated with ECCAI. Currently, she is Full Professor at the University of Ferrara, where she teaches Artificial Intelligence
and Foundations of Computer Science.
Fabrizio Riguzzi, Ph.D.: He is Assistant Professor at the Department of Engineering of the University of Ferrara, Italy. He received his Laurea from
the University of Bologna in 1999. He joined the Department of Engineering of the University of Ferrara in 1999. He has been
a visiting researcher at the University of Cyprus and at the New University of Lisbon. His research interests include: data
mining (and in particular methods for learning from multirelational data), machine learning, belief revision, genetic algorithms
and software engineering.
Sergio Storari: He received his degree in Electrical Engineering from the University of Ferrara in 1998. His research activity centers on artificial
intelligence, knowledge-based systems, data mining and multi-agent systems. He is a member of the Italian Association for
Artificial Intelligence (AI*IA), associated with ECCAI. Currently, he is in the third year of a Ph.D. course on “Study and application of Artificial
Intelligence techniques for medical data analysis” at DEIS, University of Bologna.
Paola Mello, Ph.D.: She received her degree in Electrical Engineering from the University of Bologna in 1982, and her Ph.D. in Computer Science in 1988.
Her research activity centers on knowledge representation, logic programming, artificial intelligence and knowledge-based
systems. She was a co-organizer of the 3rd International Workshop on Extensions of Logic Programming ELP92, held in Bologna
in February 1992, and of the 6th Italian Congress on Artificial Intelligence, held in Bologna in September 1999. She is a
member of the Italian Association for Artificial Intelligence (AI*IA), associated with ECCAI. Currently, she is Full Professor at the University of Bologna, where she teaches Artificial Intelligence
and Foundations of Computer Science.
Anna Nanetti: She received a degree in biological sciences from the University of Bologna in 1974. Currently, she is an Academic Researcher in
the Microbiology section of the Clinical, Specialist and Experimental Medicine Department of the Faculty of Medicine and Surgery,
University of Bologna.
12.
Skenduli Marjana Prifti Biba Marenglen Loglisci Corrado Ceci Michelangelo Malerba Donato 《Journal of Intelligent Information Systems》2021,57(2):369-394
Social Media have enabled users to keep inter-personal relationships, but also to voice personal sensations, emotions and feelings. The recent...
13.
14.
In this paper, we propose an efficient rule discovery algorithm, called FD_Mine, for mining functional dependencies from data.
By exploiting Armstrong’s Axioms for functional dependencies, we identify equivalences among attributes, which can be used
to reduce both the size of the dataset and the number of functional dependencies to be checked. We first describe four effective
pruning rules that reduce the size of the search space. In particular, the number of functional dependencies to be checked
is reduced by skipping the search for FDs that are logically implied by already discovered FDs. Then, we present the FD_Mine
algorithm, which incorporates the four pruning rules into the mining process. We prove the correctness of FD_Mine, that is,
we show that the pruning does not lead to the loss of useful information. We report the results of a series of experiments.
These experiments show that the proposed algorithm is effective on 15 UCI datasets and synthetic data.
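The basic test underlying functional dependency discovery, and the attribute equivalence used for pruning, can be sketched as follows. This is only an illustrative check on a toy table, not FD_Mine itself; the function names, the attribute names and the rows are hypothetical.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows,
    i.e. rows that agree on all lhs attributes also agree on all rhs attributes."""
    seen = {}
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        value = tuple(row[attr] for attr in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


def equivalent(rows, x, y):
    """Two attribute sets are equivalent when each determines the other.
    One of them can then stand in for the other during the search."""
    return holds(rows, x, y) and holds(rows, y, x)


# Made-up relation with three attributes.
rows = [
    {"id": 1, "code": "a", "city": "X"},
    {"id": 2, "code": "b", "city": "X"},
    {"id": 3, "code": "c", "city": "Y"},
]

print(holds(rows, ["id"], ["city"]))       # id determines city in this table
print(equivalent(rows, ["id"], ["code"]))  # id and code determine each other
```

In this toy table, once `id` and `code` are found to be equivalent, any dependency checked for `id` need not be checked again for `code`, which is the kind of search-space reduction the abstract describes.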
15.
16.
Synthesizing high-frequency rules from different data sources (total citations: 10; self-citations: 0; citations by others: 10)
Many large organizations have multiple data sources, such as the different branches of an interstate company. While putting all data from different sources together might amass a huge database for centralized processing, mining association rules at the individual data sources and forwarding the rules (rather than the original raw data) to the centralized company headquarters provides a feasible way to deal with multiple data source problems. Meanwhile, the association rules at each data source may be required for that data source in the first instance, so association analysis at each data source is also important and useful. However, the rules forwarded from different data sources may be too many for the centralized company headquarters to use. This paper presents a weighting model for synthesizing high-frequency association rules from different data sources. There are two reasons to focus on high-frequency rules. First, a centralized company headquarters is interested in high-frequency rules because they are supported by most of its branches for corporate profitability. Second, high-frequency rules have a greater chance of becoming valid rules in the union of all data sources. In order to extract high-frequency rules efficiently, a rule selection procedure is also constructed to enhance the weighting model by coping with low-frequency rules. Experimental results show that our proposed weighting model is efficient and effective.
17.
《Expert systems with applications》2007,32(1):223-232
The rough-set theory proposed by Pawlak has been widely used in dealing with data classification problems. The original rough-set model is, however, quite sensitive to noisy data. Ziarko thus proposed the variable precision rough-set model to deal with noisy data and uncertain information. This model allows for some degree of uncertainty and misclassification in the mining process. Conventionally, the mining algorithms based on the rough-set theory identify the relationships among data using crisp attribute values; however, data with quantitative values are commonly seen in real-world applications. This paper thus deals with the problem of producing a set of fuzzy certain and fuzzy possible rules from quantitative data with a predefined tolerance degree of uncertainty and misclassification. A new method, which combines the variable precision rough-set model and the fuzzy set theory, is thus proposed to solve this problem. It first transforms each quantitative value into a fuzzy set of linguistic terms using membership functions and then calculates the fuzzy β-lower and the fuzzy β-upper approximations. The certain and possible rules are then generated based on these fuzzy approximations. These rules can then be used to classify unknown objects. The paper thus extends the existing rough-set mining approaches to process quantitative data with tolerance of noise and uncertainty.
18.
Mining interesting association rules from customer databases and transaction databases (total citations: 1; self-citations: 0; citations by others: 1)
In this paper, we examine a new data mining issue of mining association rules from customer databases and transaction databases. The problem is decomposed into two subproblems: identifying all the large itemsets from the transaction database, and mining association rules from the customer database and the large itemsets identified. For the first subproblem, we propose an efficient algorithm to discover all the large itemsets from the transaction database. Experimental results show that our approach reduces the total execution time significantly. For the second subproblem, a relationship graph is constructed according to the large itemsets identified from the transaction database and the priorities of condition attributes from the customer database. Based on the relationship graph, we present an efficient graph-based algorithm to discover interesting association rules embedded in the transaction database and the customer database.
19.
Mining interesting imperfectly sporadic rules (total citations: 1; self-citations: 0; citations by others: 1)
Yun Sing Koh Nathan Rountree Richard A. O’Keefe 《Knowledge and Information Systems》2008,14(2):179-196
Detecting association rules with low support but high confidence is a difficult data mining problem. To find such rules using
approaches like the Apriori algorithm, minimum support must be set very low, which results in a large number of redundant rules. We are interested in sporadic rules; i.e. those that fall below a maximum support level but above the level of support expected from random coincidence. There are two types of sporadic rules: perfectly
sporadic and imperfectly sporadic. Here we are more concerned about finding imperfectly sporadic rules, where the support
of the antecedent as a whole falls below maximum support, but where items may have quite high support individually. In this
paper, we introduce an algorithm called Mining Interesting Imperfectly Sporadic Rules (MIISR) to find imperfectly sporadic
rules efficiently, e.g. fever, headache, stiff neck → meningitis. Our proposed method uses item constraints and coincidence pruning to discover these rules in reasonable time. This paper
is an expanded version of Koh et al. [Advances in knowledge discovery and data mining: 10th Pacific-Asia Conference (PAKDD
2006), Singapore. Lecture Notes in Computer Science 3918, Springer, Berlin, pp 473–482].
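The notion of an imperfectly sporadic rule can be made concrete with a small support and confidence calculation. The sketch below is not MIISR and leaves out its item constraints and coincidence pruning; it only checks, on made-up transactions and with hypothetical thresholds, that a rule's antecedent as a whole is rare while the rule is confident and some individual items are frequent.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of the whole rule divided by the support of its antecedent."""
    antecedent_support = support(transactions, antecedent)
    if antecedent_support == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / antecedent_support


def is_sporadic_candidate(transactions, antecedent, consequent,
                          max_support, min_confidence):
    """True if the antecedent as a whole falls below max_support while the
    rule still reaches min_confidence (both thresholds are assumptions)."""
    return (support(transactions, antecedent) < max_support
            and confidence(transactions, antecedent, consequent) >= min_confidence)


# Ten made-up patient records.
records = (
    [{"fever", "headache", "stiff neck", "meningitis"}]
    + [{"fever", "headache"}] * 4
    + [{"fever"}] * 3
    + [{"headache"}] * 2
)

symptoms = {"fever", "headache", "stiff neck"}
print(support(records, {"fever"}))     # an individual item can be frequent
print(support(records, symptoms))      # the antecedent as a whole is rare
print(is_sporadic_candidate(records, symptoms, {"meningitis"},
                            max_support=0.2, min_confidence=0.9))
```

Finding such rules by simply lowering the minimum support in an Apriori-style search would generate many redundant candidates, which is the problem the abstract sets out to avoid.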
Yun Sing Koh is currently a Ph.D. student at the Department of Computer Science, University of Otago, New Zealand. Her main research interest
is in association rule mining with particular interest in generating hard-to-find association rules and interestingness measures.
She holds a B.Sc. (Honours) degree in computer science and a Master’s degree in software engineering, both from the University
of Malaya, Malaysia.
Nathan Rountree has been a faculty member of the Department of Computer Science at the University of Otago, Dunedin, since 1999. His research
interests are in the fields of data mining, artificial neural networks, and computer science education. He is also a consulting
software engineer for Profiler Corporation, a Dunedin-based company specialising in data mining and knowledge discovery.
Richard A. O’Keefe holds a B.Sc. (Honours) degree in mathematics and physics, majoring in statistics, and an M.Sc. degree in physics (underwater
acoustics), both obtained from the University of Auckland, New Zealand. He received his Ph.D. degree in artificial intelligence
from the University of Edinburgh. He is the author of “The Craft of Prolog’’ (MIT Press). Dr. O’Keefe is now a lecturer at
the University of Otago, New Zealand. His computing interests include declarative programming languages, especially Prolog
and Erlang; statistical applications, including data mining and information retrieval; and applications of logic. He is also
a member of the editorial board of Theory and Practice of Logic Programming.
20.
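With the help of fuzzy concepts and fuzzy operations, time intervals can be described easily. For a specified calendar pattern, different time intervals can carry different weights according to their membership degrees. Building on fuzzy calendar algebra and combining the ideas of incremental mining and progressive counting, this paper proposes a fuzzy-calendar-based method for mining fuzzy temporal association rules. Both theoretical analysis and experimental results show that the algorithm is efficient and feasible.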