Quantifiable data mining using ratio rules |
| |
Authors: | Flip Korn, Alexandros Labrinidis, Yannis Kotidis, Christos Faloutsos |
| |
Affiliation: | (1) AT&T Labs - Research, Florham Park, NJ 07932, USA; E-mail: flip@research.att.com; (2) University of Maryland, College Park, MD 20742, USA; E-mail: {labrinid,kotidis}@cs.umd.edu; (3) Carnegie Mellon University, Pittsburgh, PA 15213, USA; E-mail: christos@cs.cmu.edu |
| |
Abstract: | Association Rule Mining algorithms operate on a data matrix (e.g., customers × products) to derive association rules [AIS93b, SA96]. We propose a new paradigm, namely, Ratio Rules, which are quantifiable in that we can measure the “goodness” of a set of discovered rules. We also propose the “guessing
error” as a measure of the “goodness”, that is, the root-mean-square error of the reconstructed values of the cells of the
given matrix, when we pretend that they are unknown. Another contribution is a novel method to guess missing/hidden values
from the Ratio Rules that our method derives. For example, if somebody bought $10 of milk and $3 of bread, our rules can “guess”
the amount spent on butter. Thus, unlike association rules, Ratio Rules can perform a variety of important tasks such as forecasting,
answering “what-if” scenarios, detecting outliers, and visualizing the data. Moreover, we show that we can compute Ratio Rules
in a single pass over the data set with small memory requirements (a few small matrices), in contrast to association rule mining methods
which require multiple passes and/or large memory. Experiments on several real data sets (e.g., basketball and baseball statistics,
biological data) demonstrate that the proposed method: (a) leads to rules that make sense; (b) can find large itemsets in
binary matrices, even in the presence of noise; and (c) consistently achieves a “guessing error” up to 5 times lower than
that of straightforward column averages.
Received: March 15, 1999 / Accepted: November 1, 1999 |
| |
Keywords: | Data mining – Forecasting – Knowledge discovery – Guessing error |
This article is indexed in SpringerLink and other databases. |
|
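As an illustration of the “guessing error” described in the abstract, the following is a minimal sketch, not the authors' implementation. The abstract does not state how Ratio Rules are derived, so this sketch assumes a rank-k principal-direction fit obtained from an SVD of the mean-centered training matrix; the function names (`fit_low_rank`, `guess_low_rank`, `guessing_error`), the train/test split, and the synthetic milk/bread/butter data in a 10:3:2 ratio are all hypothetical choices made for the example. Each cell of a held-out row is hidden in turn, guessed from the remaining cells, and the root-mean-square error is compared with the column-average baseline.

```python
import numpy as np

def fit_low_rank(train, k=1):
    """Return column means and the top-k principal directions (assumed rule model)."""
    mean = train.mean(axis=0)
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:k].T  # shapes: (d,), (d, k)

def guess_low_rank(mean, V, row, hidden):
    """Guess row[hidden] from the other cells via least squares on the known entries."""
    known = np.arange(row.size) != hidden
    coeffs, *_ = np.linalg.lstsq(V[known], row[known] - mean[known], rcond=None)
    return mean[hidden] + V[hidden] @ coeffs

def guessing_error(test, guess):
    """RMS error over all cells, hiding each cell in turn and guessing it."""
    sq = [(guess(row, j) - row[j]) ** 2
          for row in test for j in range(test.shape[1])]
    return float(np.sqrt(np.mean(sq)))

# Synthetic spending data (hypothetical): milk:bread:butter close to 10:3:2.
rng = np.random.default_rng(0)
scale = rng.uniform(1.0, 10.0, size=200)
X = scale[:, None] * np.array([10.0, 3.0, 2.0]) + rng.normal(0.0, 0.1, size=(200, 3))
train, test = X[:150], X[150:]

mean, V = fit_low_rank(train, k=1)
err_rule = guessing_error(test, lambda row, j: guess_low_rank(mean, V, row, j))
err_avg = guessing_error(test, lambda row, j: mean[j])  # column-average baseline
print(f"low-rank guess: {err_rule:.3f}  column average: {err_avg:.3f}")
```

On data with a strong underlying ratio, as constructed here, the low-rank guess should give a much smaller error than the column averages; on real data the gap depends on how well a few directions capture the matrix.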