Constraining and summarizing association rules in medical data
Authors: Carlos Ordonez (1), Norberto Ezquerra (2), Cesar A. Santana (3)
Affiliation: (1) Teradata, NCR, 17095 Via del Campo, San Diego, CA 92127, USA; (2) Georgia Institute of Technology, Atlanta, GA, USA; (3) Emory University Hospital, GA, USA
Abstract: Association rules are a data mining technique used to discover frequent patterns in a data set. In this work, association rules are used in the medical domain, where data sets are generally high dimensional and small. The chief disadvantage of mining association rules in a high dimensional data set is the huge number of patterns that are discovered, most of which are irrelevant or redundant. Several constraints are proposed for filtering purposes, since our aim is to discover only significant association rules and accelerate the search process. A greedy algorithm is introduced to compute rule covers in order to summarize rules having the same consequent. The significance of association rules is evaluated using three metrics: support, confidence and lift. Experiments focus on discovering association rules on a real data set to predict the absence or presence of heart disease. Constraints are shown to significantly reduce the number of discovered rules and improve running time. Rule covers summarize a large number of rules by producing a succinct set of rules with high-quality metrics.

Carlos Ordonez received a degree in applied mathematics (actuarial sciences) and an MS degree in computer science, both from the UNAM University, Mexico, in 1992 and 1996, respectively. He received a PhD degree in computer science from the Georgia Institute of Technology, USA, in 2000. Dr. Ordonez currently works for Teradata (NCR), conducting research on database and data mining technology. He has published more than 20 research articles and holds three patents.

Norberto Ezquerra obtained his undergraduate degree in mathematics and physics from the University of South Florida, and his doctoral degree from Florida State University, USA. He is an associate professor at the College of Computing at the Georgia Institute of Technology and an adjunct faculty member in the Emory University School of Medicine. His research interests include computer graphics, computer vision in medicine, AI in medicine, modeling of physically based systems, medical informatics and telemedicine. He is an associate editor of the journal IEEE Transactions on Medical Imaging, and a member of the American Medical Informatics Association and the IEEE Engineering in Medicine and Biology Society.

Cesar A. Santana received his MD degree in 1984 from the Institute of Medical Science in Havana, Cuba. In 1988, he finished his residency training in internal medicine, and in 1991 he completed a fellowship in nuclear medicine in Havana, Cuba. Dr. Santana received a PhD in nuclear cardiology in 1996 from the Department of Cardiology of the Vall d'Hebron University Hospital in Barcelona, Spain. Dr. Santana is an assistant professor at the Emory University School of Medicine and conducts research in the Radiology Department at the Emory University Hospital.
Keywords: Association rules; Search constraint; Cover; Lift
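The abstract names three metrics for judging a rule (support, confidence and lift) without defining them. The sketch below uses the standard textbook definitions for a rule X => Y over a list of transactions; it is not taken from the paper. The item names and the four toy transactions are hypothetical and are not data from the study.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]


def support(transactions: List[Itemset], itemset: Itemset) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions: List[Itemset], x: Itemset, y: Itemset) -> float:
    """support(X union Y) / support(X): how often Y holds when X holds."""
    s_x = support(transactions, x)
    if s_x == 0:
        return 0.0
    return support(transactions, x | y) / s_x


def lift(transactions: List[Itemset], x: Itemset, y: Itemset) -> float:
    """confidence(X => Y) / support(Y); values above 1 suggest positive association."""
    s_y = support(transactions, y)
    if s_y == 0:
        return 0.0
    return confidence(transactions, x, y) / s_y


# Hypothetical toy records (assumption: item names are illustrative only).
transactions = [
    frozenset({"age>=60", "chol_high", "disease"}),
    frozenset({"age>=60", "disease"}),
    frozenset({"chol_high"}),
    frozenset({"chol_high", "disease"}),
]

x = frozenset({"age>=60"})
y = frozenset({"disease"})

rule_support = support(transactions, x | y)
rule_confidence = confidence(transactions, x, y)
rule_lift = lift(transactions, x, y)
print(rule_support, rule_confidence, rule_lift)
```

A minimum threshold on each of these values is one common way to filter rules; the specific constraints and thresholds used by the authors are not stated in the abstract.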
This article is indexed in SpringerLink and other databases.