首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Mining interesting imperfectly sporadic rules   总被引:1,自引:0,他引:1  
Detecting association rules with low support but high confidence is a difficult data mining problem. To find such rules using approaches like the Apriori algorithm, minimum support must be set very low, which results in a large number of redundant rules. We are interested in sporadic rules; i.e. those that fall below a maximum support level but above the level of support expected from random coincidence. There are two types of sporadic rules: perfectly sporadic and imperfectly sporadic. Here we are more concerned about finding imperfectly sporadic rules, where the support of the antecedent as a whole falls below maximum support, but where items may have quite high support individually. In this paper, we introduce an algorithm called Mining Interesting Imperfectly Sporadic Rules (MIISR) to find imperfectly sporadic rules efficiently, e.g. fever, headache, stiff neckmeningitis. Our proposed method uses item constraints and coincidence pruning to discover these rules in reasonable time. This paper is an expanded version of Koh et al. [Advances in knowledge discovery and data mining: 10th Pacific-Asia Conference (PAKDD 2006), Singapore. Lecture Notes in Computer Science 3918, Springer, Berlin, pp 473–482]. Yun Sing Koh is currently a Ph.D. student at the Department of Computer Science, University of Otago, New Zealand. Her main research interest is in association rule mining with particular interest in generating hard-to-find association rules and interestingness measures. She holds a B.Sc. (Honours) degree in computer science and a Master’s degree in software engineering, both from the University of Malaya, Malaysia. Nathan Rountree has been a faculty member of the Department of Computer Science at the University of Otago, Dunedin, since 1999. His research interests are in the fields of data mining, artificial neural networks, and computer science education. He is also a consulting software engineer for Profiler Corporation, a Dunedin-based company specialising in data mining and knowledge discovery. Richard A. O’Keefe holds a B.Sc. (Honours) degree in mathematics and physics, majoring in statistics, and an M.Sc. degree in physics (underwater acoustics), both obtained from the University of Auckland, New Zealand. He received his Ph.D. degree in artificial intelligence from the University of Edinburgh. He is the author of “The Craft of Prolog’’ (MIT Press). Dr. O’Keefe is now a lecturer at the University of Otago, New Zealand. His computing interests include declarative programming languages, especially Prolog and Erlang; statistical applications, including data mining and information retrieval; and applications of logic. He is also a member of the editorial board of theory and practice of logic programming.  相似文献   

2.
Constraining and summarizing association rules in medical data   总被引:4,自引:4,他引:0  
Association rules are a data mining technique used to discover frequent patterns in a data set. In this work, association rules are used in the medical domain, where data sets are generally high dimensional and small. The chief disadvantage about mining association rules in a high dimensional data set is the huge number of patterns that are discovered, most of which are irrelevant or redundant. Several constraints are proposed for filtering purposes, since our aim is to discover only significant association rules and accelerate the search process. A greedy algorithm is introduced to compute rule covers in order to summarize rules having the same consequent. The significance of association rules is evaluated using three metrics: support, confidence and lift. Experiments focus on discovering association rules on a real data set to predict absence or existence of heart disease. Constraints are shown to significantly reduce the number of discovered rules and improve running time. Rule covers summarize a large number of rules by producing a succinct set of rules with high-quality metrics. Carlos Ordonez received a degree in applied mathematics (actuarial sciences) and an MS degree in computer science, both from the UNAM University, Mexico, in 1992 and 1996, respectively. He got a PhD degree in computer science from the Georgia Institute of Technology, USA, in 2000. Dr. Ordonez currently works for Teradata (NCR) conducting research on database and data mining technology. He has published more than 20 research articles and holds three patents. Norberto Ezquerra obtained his undergraduate degree in mathematics and physics from the University of South Florida, and his doctoral degree from Florida State University, USA. He is an associate professor at the College of Computing at the Georgia Institute of Technology and an adjunct faculty member in the Emory University School of Medicine. His research interests include computer graphics, computer vision in medicine, AI in medicine, modeling of physically based systems, medical informatics and telemedicine. He is associate editor of the IEEE Transactions on Medical Imaging Journal, and a member of the American Medical Informatics Association and the IEEE Engineering in Medicine Biology Society. Cesar A. Santana received his MD degree in 1984 from the Institute of Medical Science, in Havana, Cuba. In 1988, he finished his residency training in internal medicine, and in 1991, completed a fellowship in nuclear medicine in Havana, Cuba. Dr. Santana received a PhD in nuclear cardiology in 1996 from the Department of Cardiology of the Vall d' Hebron University Hospital in Barcelona, Spain. Dr. Santana is an assistant professor at the Emory University School of Medicine and conducts research in the Radiology Department at the Emory University Hospital.  相似文献   

3.
Data mining can dig out valuable information from databases to assist a business in approaching knowledge discovery and improving business intelligence. Database stores large structured data. The amount of data increases due to the advanced database technology and extensive use of information systems. Despite the price drop of storage devices, it is still important to develop efficient techniques for database compression. This paper develops a database compression method by eliminating redundant data, which often exist in transaction database. The proposed approach uses a data mining structure to extract association rules from a database. Redundant data will then be replaced by means of compression rules. A heuristic method is designed to resolve the conflicts of the compression rules. To prove its efficiency and effectiveness, the proposed approach is compared with two other database compression methods. Chin-Feng Lee is an associate professor with the Department of Information Management at Chaoyang University of Technology, Taiwan, R.O.C. She received her M.S. and Ph.D. degrees in 1994 and 1998, respectively, from the Department of Computer Science and Information Engineering at National Chung Cheng University. Her current research interests include database design, image processing and data mining techniques. S. Wesley Changchien is a professor with the Institute of Electronic Commerce at National Chung-Hsing University, Taiwan, R.O.C. He received a BS degree in Mechanical Engineering (1989) and completed his MS (1993) and Ph.D. (1996) degrees in Industrial Engineering at State University of New York at Buffalo, USA. His current research interests include electronic commerce, internet/database marketing, knowledge management, data mining, and decision support systems. Jau-Ji Shen received his Ph.D. degree in Information Engineering and Computer Science from National Taiwan University at Taipei, Taiwan in 1988. From 1988 to 1994, he was the leader of the software group in Institute of Aeronautic, Chung-Sung Institute of Science and Technology. He is currently an associate professor of information management department in the National Chung Hsing University at Taichung. His research areas focus on the digital multimedia, database and information security. His current research areas focus on data engineering, database techniques and information security. Wei-Tse Wang received the B.A. (2001) and M.B.A (2003) degrees in Information Management at Chaoyang University of Technology, Taiwan, R.O.C. His research interests include data mining, XML, and database compression.  相似文献   

4.
The concept of Privacy-Preserving has recently been proposed in response to the concerns of preserving personal or sensible information derived from data mining algorithms. For example, through data mining, sensible information such as private information or patterns may be inferred from non-sensible information or unclassified data. There have been two types of privacy concerning data mining. Output privacy tries to hide the mining results by minimally altering the data. Input privacy tries to manipulate the data so that the mining result is not affected or minimally affected. For output privacy in hiding association rules, current approaches require hidden rules or patterns to be given in advance [10, 18–21, 24, 27]. This selection of rules would require data mining process to be executed first. Based on the discovered rules and privacy requirements, hidden rules or patterns are then selected manually. However, for some applications, we are interested in hiding certain constrained classes of association rules such as collaborative recommendation association rules [15, 22]. To hide such rules, the pre-process of finding these hidden rules can be integrated into the hiding process as long as the recommended items are given. In this work, we propose two algorithms, DCIS (Decrease Confidence by Increase Support) and DCDS (Decrease Confidence by Decrease Support), to automatically hiding collaborative recommendation association rules without pre-mining and selection of hidden rules. Examples illustrating the proposed algorithms are given. Numerical simulations are performed to show the various effects of the algorithms. Recommendations of appropriate usage of the proposed algorithms based on the characteristics of databases are reported. Leon Wang received his Ph.D. in Applied Mathematics from State University of New York at Stony Brook in 1984. From 1984 to 1987, he was an assistant professor in mathematics at University of New Haven, Connecticut, USA. From 1987 to 1994, he joined New York Institute of Technology as a research associate in the Electromagnetic Lab and assistant/associate professor in the Department of Computer Science. From 1994 to 2001, he joined I-Shou University in Taiwan as associate professor in the Department of Information Management. In 1996, he was the Director of Computing Center. From 1997 to 2000, he was the Chairman of Department of Information Management. In 2001, he was Professor and director of Library, all in I-Shou University. In 2002, he was Associate Professor and Chairman in Information Management at National University of Kaohsiung, Taiwan. In 2003, he rejoined New York Institute of Technology. Dr.Wang has published 33 journal papers, 102 conference papers, and 5 book chapters, in the areas of data mining, machine learning, expert systems, and fuzzy databases, etc. Dr. Wang is a member of IEEE, Chinese Fuzzy System Association Taiwan, Chinese Computer Association, and Chinese Information Management Association. Ayat Jafari received the Ph.D. degree from City University of New York. He has conducted considerable research in the areas of Computer Communication Networks, Local Area Networks, and Computer Network Security, and published many technical articles. His interests and expertise are in the area of Computer Networks, Signal Processing, and Digital Communications. He is currently the Chairman of the Computer Science and Electrical Engineering Department of New York Institute of Technology. Tzung-Pei Hong received his B.S. degree in chemical engineering from National Taiwan University in 1985, and his Ph.D. degree in computer science and information engineering from National Chiao-Tung University in 1992. He was a faculty at the Department of Computer Science in Chung-Hua Polytechnic Institute from 1992 to 1994, and at the Department of Information Management in I-Shou University from 1994 to 2001. He was in charge of the whole computerization and library planning for National University of Kaohsiung in Preparation from 1997 to 2000, and served as the first director of the library and computer center in National University of Kaohsiung from 2000 to 2001 and as the Dean of Academic Affairs from 2003 to 2006. He is currently a professor at the Department of Electrical Engineering and at the Department of Computer Science and Information Engineering. His current research interests include machine learning, data mining, soft computing, management information systems, and www applications. Springer  相似文献   

5.
This paper introduces a new algorithm of mining association rules.The algorithm RP counts the itemsets with different sizes in the same pass of scanning over the database by dividing the database into m partitions.The total number of pa sses over the database is only(k 2m-2)/m,where k is the longest size in the itemsets.It is much less than k .  相似文献   

6.
Graphs are increasingly becoming a vital source of information within which a great deal of semantics is embedded. As the size of available graphs increases, our ability to arrive at the embedded semantics grows into a much more complicated task. One form of important hidden semantics is that which is embedded in the edges of directed graphs. Citation graphs serve as a good example in this context. This paper attempts to understand temporal aspects in publication trends through citation graphs, by identifying patterns in the subject matters of scientific publications using an efficient, vertical association rule mining model. Such patterns can (a) indicate subject-matter evolutionary history, (b) highlight subject-matter future extensions, and (c) give insights on the potential effects of current research on future research. We highlight our major differences with previous work in the areas of graph mining, citation mining, and Web-structure mining, propose an efficient vertical data representation model, introduce a new subjective interestingness measure for evaluating patterns with a special focus on those patterns that signify strong associations between properties of cited papers and citing papers, and present an efficient algorithm for the purpose of discovering rules of interest followed by a detailed experimental analysis. Imad Rahal is a newly appointed assistant professor in the Department of Computer Science at the College of Saint Benedict ∣ Saint John's University, Collegeville, MN, and a Ph.D. candidate at North Dakota State University, Fargo, ND. In August 2003, he earned his master's degree in computer science from North Dakota State University. Prior to that, he graduated summa cum laude from the Lebanese American University, Beirut, Lebanon, in February 2001 with a bachelor's degree in computer science. Currently, he is completing the final requirements for his Ph.D. degree in computer science on an NSF ND-EPSCoR doctoral dissertation assistantship with August of 2005 as a projected completion date. He is very active in research, proposal writing, and publications; his research interests are largely in the broad areas of data mining, machine learning, databases, artificial intelligence, and bioinformatics. Dongmei Ren is working for the Database Technology Institute for z/OS, IBM Silicon Valley Lab, San Jose, CA, as a staff software engineer. She holds a Ph.D. degree from North Dakota State University, Fargo, ND, and master's and bachelor's degrees from TianJin University, TianJin, China. She has been a software engineer at DaTang Telecommunications, Beijing, China. Her areas of expertise are outlier analysis, data mining and knowledge discovery, database systems, machine learning, intelligent systems, wireless networks and bioinformatics. She has been awarded the Siemens Scholarship research enhancement for excellent performance in study and research. She is a member of ACM, IEEE. Weihua Wu is a network monitoring & managed services analyst at Hewlett-Packard Co. in Canada. He holds a master's degree from North Dakota State University and a bachelor's degree from Nanjing University, both in computer science. His research areas of interest include data mining, knowledge discovery, data warehousing, information technology, network security, and bioinformatics. He has participated in various projects supported by NSF, DARPA, NASA, USDA, and GSA grants. Anne Denton is an assistant professor in computer science at North Dakota State University. Her research interests are in data mining, knowledge discovery in scientific data, and bioinformatics. Specific interests include data mining of diverse data, in which objects are characterized by a variety of properties such as numerical and categorical attributes, graphs, sequences, time-dependent attributes, and others. She received her Ph.D. in physics from the University of Mainz, Germany, and her M.S. in computer science from North Dakota State University, Fargo, ND. Christopher Besemann received his M.Sc. in computer science from North Dakota State University in Fargo, ND, 2005. Currently, he works in data mining research topics including association mining and relational data mining with recent work in model integration as a research assistant. He is accepted under a fellowship program for Ph.D. study at North Dakota State University. William Perrizo is a professor of computer science at North Dakota State University. He holds a Ph.D. degree from the University of Minnesota, a master's degree from the University of Wisconsin and a bachelor's degree from St. John's University. He has been a research scientist at the IBM Advanced Business Systems Division and the U.S. Air Force Electronic Systems Division. His areas of expertise are data mining, knowledge discovery, database systems, distributed database systems, high speed computer and communications networks, precision agriculture and bioinformatics. He is a member of ISCA, ACM, IEEE, IAAA, and AAAS.  相似文献   

7.
The recent increase in HyperText Transfer Protocol (HTTP) traffic on the World Wide Web (WWW) has generated an enormous amount of log records on Web server databases. Applying Web mining techniques on these server log records can discover potentially useful patterns and reveal user access behaviors on the Web site. In this paper, we propose a new approach for mining user access patterns for predicting Web page requests, which consists of two steps. First, the Minimum Reaching Distance (MRD) algorithm is applied to find the distances between the Web pages. Second, the association rule mining technique is applied to form a set of predictive rules, and the MRD information is used to prune the results from the association rule mining process. Experimental results from a real Web data set show that our approach improved the performance over the existing Markov-model approach in precision, recall, and the reduction of user browsing time. Mei-Ling Shyu received her Ph.D. degree from the School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN in 1999, and three Master's degrees from Computer Science, Electrical Engineering, and Restaurant, Hotel, Institutional, and Tourism Management from Purdue University. She has been an Associate Professor in the Department of Electrical and Computer Engineering (ECE) at the University of Miami (UM), Coral Gables, FL, since June 2005, Prior to that, she was an Assistant Professor in ECE at UM dating from January 2000. Her research interests include data mining, multimedia database systems, multimedia networking, database systems, and security. She has authored and co-authored more than 120 technical papers published in various prestigious journals, refereed conference/symposium/workshop proceedings, and book chapters. She is/was the guest editor of several journal special issues. Choochart Haruechaiyasak received his Ph.D. degree from the Department of Electrical and Computer Engineering, University of Miami, in 2003 with the Outstanding Departmental Graduating Student award from the College of Engineering. After receiving his degree, he has joined the National Electronics and Computer Technology Center (NECTEC), located in Thailand Science Park, as a researcher in Information Research and Development Division (RDI). His current research interests include data/ text/ Web mining, Natural Language Processing, Information Retrieval, Search Engines, and Recommender Systems. He is currently leading a small group of researchers and programmer to develop an open-source search engine for Thai language. One of his objectives is to promote the use of data mining technology and other advanced applications in Information Technology in Thailand. He is also a visiting lecturer for Data Mining, Artificial Intelligence and Decision Support Systems courses in many universities in Thailand. Shu-Ching Chen received his Ph.D. from the School of Electrical and Computer Engineering at Purdue University, West Lafayette, IN, USA in December, 1998. He also received Master's degrees in Computer Science, Electrical Engineering, and Civil Engineering from Purdue University. He has been an Associate Professor in the School of Computing and Information Sciences (SCIS), Florida International University (FIU) since August, 2004. Prior to that, he was an Assistant Professor in SCIS at FIU dating from August, 1999. His main research interests include distributed multimedia database systems and multimedia data mining. Dr. Chen has authored and co-authored more than 140 research papers in journals, refereed conference/symposium/workshop proceedings, and book chapters. In 2005, he was awarded the IEEE Systems, Man, and Cybernetics Society's Outstanding Contribution Award. He was also awarded a University Outstanding Faculty Research Award from FIU in 2004, Outstanding Faculty Service Award from SCIS in 2004 and Outstanding Faculty Research Award from SCIS in 2002.  相似文献   

8.
In this paper, we explore extending association analysis to non-traditional types of patterns and non-binary data by generalizing the notion of confidence. We begin by describing a general framework that measures the strength of the connection between two association patterns by the extent to which the strength of one association pattern provides information about the strength of another. Although this framework can serve as the basis for designing or analyzing measures of association, the focus in this paper is to use the framework as the basis for extending the traditional concept of confidence to error-tolerant itemsets (ETIs) and continuous data. To that end, we provide two examples. First, we (1) describe an approach to defining confidence for ETIs that preserves the interpretation of confidence as an estimate of a conditional probability, and (2) show how association rules based on ETIs can have better coverage (at an equivalent confidence level) than rules based on traditional itemsets. Next, we derive a confidence measure for continuous data that agrees with the standard confidence measure when applied to binary transaction data. Further analysis of this result exposes some of the important issues involved in constructing a confidence measure for continuous data. Michael Steinbach earned the B.S. degree in mathematics, the M.S. degree in statistics, and the M.S. and Ph.D. degrees in computer science, all from the University of Minnesota. He also has held a variety of software engineering, analysis, and design positions in industry at Silicon Biology, Racotek, and NCR. Steinbach is currently a research associate in the Department of Computer Science and Engineering at the University of Minnesota, Twin Cities. He is a co-author of the textbook,Introduction to Data Mining and has published numerous technical papers in peer-reviewed journals and conference proceedings. His research interests include data mining, statistics, and bioinformatics. He is a member of the IEEE and the ACM. Vipin Kumar is currently William Norris Professor and Head of the Computer Science and Engineering Department at the University of Minnesota. He received the B.E. degree in electronics and communication engineering from the University of Roorkee, India, in 1977, the M.E. degree in electronics engineering from Philips International Institute, Eindhoven, The Netherlands, in 1979, and the Ph.D. degree in computer science from the University of Maryland, College Park, in 1982. Kumar’s current research interests include high-performance computing and data mining. His research has resulted in the development of the concept of isoefficiency metric for evaluating the scalability of parallel algorithms, as well as highly efficient parallel algorithms and software for sparse matrix factorization (PSPASES), graph partitioning (METIS, ParMetis, hMetis), and dense hierarchical solvers. He has authored over 200 research articles, and has coedited or coauthored 9 books including the widely used text booksIntroduction to Parallel Computing andIntroduction to Data Mining, both published by Addison Wesley. Kumar has served as chair/co-chair for many conferences/workshops in the area of data mining and parallel computing, including theIEEE International Conference on Data Mining (2002) and the 15th International Parallel and Distributed Processing Symposium (2001). Currently, Kumar is the Chair of the steering committee of theSIAM International Conference on Data Mining, and a member of the steering committee of theIEEE International Conference on Data Mining. Kumar serves or has served on the editorial boards ofData Mining and Knowledge Discovery,Knowledge and Information Systems,IEEE Computational Intelligence Bulletin,Annual Review of Intelligent Informatics, Parallel Computing,Journal of Parallel and Distributed Computing,IEEE Transactions of Data and Knowledge Engineering (1993–1997),IEEE Concurrency (1997–2000), andIEEE Parallel and Distributed Technology (1995–1997). He is a Fellow of the ACM and IEEE and a member of SIAM.  相似文献   

9.
Efficient Incremental Maintenance of Frequent Patterns with FP-Tree   总被引:3,自引:0,他引:3       下载免费PDF全文
Mining frequent patterns has been studied popularly in data mining area. However, little work has been done on mining patterns when the database has an influx of fresh data constantly. In these dynamic scenarios, efficient maintenance of the discovered patterns is crucial. Most existing methods need to scan the entire database repeatedly, which is an obvious disadvantage. In this paper, an efficient incremental mining algorithm, Incremental-Mining (IM), is proposed for maintenance of the frequent patterns when new incremental data come. Based on the frequent pattern tree (FP-tree) structure, IM gives a way to make the most of the things from the previous mining process, and requires scanning the original data once at most. Furthermore, IM can identify directly the differential set of frequent patterns, which may be more informative to users. Moreover, IM can deal with changing thresholds as well as changing data, thus provide a full maintenance scheme. IM has been implemented and the performance study shows it outperforms three other incremental algorithms: FUP, DB-tree and re-running frequent pattern growth (FP-growth).  相似文献   

10.
ARMiner: A Data Mining Tool Based on Association Rules   总被引:3,自引:0,他引:3       下载免费PDF全文
In this paper,ARM iner,a data mining tool based on association rules,is introduced.Beginning with the system architecture,the characteristics and functions are discussed in details,including data transfer,concept hierarchy generalization,mining rules with negative items and the re-development of the system.An example of the tool‘s application is also shown.Finally,Some issues for future research are presented.  相似文献   

11.
Peptide binding to Major Histocompatibility Complex (MHC) is a prerequisite for any T cell-mediated immune response. Predicting which peptides can bind to a specific MHC molecule is indispensable to minimizing the number of peptides required to synthesize, to the development of vaccines and immunotherapy of cancer, and to aiding to understand the specificity of T-cell mediated immunity. At present, although predictions based on machine learning methods have good prediction performance, they cannot acquire understandable knowledge and prediction performance can be further improved. Thereupon, the Rule Sets ENsemble (RSEN) algorithm, which takes advantage of diverse attribute and attribute value reduction algorithms based on rough set (RS) theory, is proposed as the initial trial to acquire understandable rules along with enhancement of prediction performance. Finally, the RSEN is applied to predict the peptides that bind to HLA-DR4(B1* 0401). Experimentation results show: (1) prepositional rules for predicting the peptides that bind to HLA-DR4 (B1* 0401) are obtained; (2) compared with individual RS-based algorithms, the RSEN has a significant decrease (13%–38%) in prediction error rate; (3) compared with the Back-Propagation Neural Networks (BPNN), prediction error rate of the RSEN decreases by 4%–16%. The acquired rules have been applied to help experts make molecules modeling. An Zeng received the Ph.D. degree in computer applications technology from South China University of Technology in 2005. Nowadays she is a lecturer at the Faculty of Computer of Guangdong University of Technology. Her research interests are data mining, bioinformatics, neural networks, artificial intelligence, and computational immunology. In these areas she has published over 20 technical papers in various prestigious journals or conference proceedings. She is a member of the IEEE. Contact her at the Faculty of Computer, Guangdong Univ. of Technology, University Town, PanYu District, Guangzhou, 510006, P.R. China. Dan Pan received the Ph.D. degree in circuits and systems from South China University of Technology in 2001. He is a senior engineer in Guangdong Mobile Communication Co. Ltd at present. His research interests are data mining, machine learning, bioinformatics, and data warehousing, and applications of business modeling and software engineering to computer-aided business operations systems, especially in the telecom industry. In these areas he has published over 30 technical papers in refereed journals or conference proceedings. As a member of the International Association of Science and Technology for Development (IASTED) technical committee on artificial intelligence and expert systems, he served a number of conferences and publications. He is a member of the IEEE. Contact him at Guangdong Mobile Communication Co. Ltd., 208 Yuexiu South Rd., Guangzhou, 510100, P.R. China. Jian-bin He received the M.E. in computer science from South China University of Technology in 2002. He now is a data mining consultant at Teradata division of NCR (China), supporting telecom carriers to do data mining in data warehouses for market research. His research interests include statistical learning, semi-supervised learning, spectral clustering, multi-relational data mining and their application to social science. Contact him at NCR(China) Co. Ltd., Unit 2306, Tower B, Center Plaza, 161 Linhexi Road, Guangzhou, 510620, P.R. China.  相似文献   

12.
A fast joint probabilistic data association (FJPDA) algorithm is proposed in tiffs paper. Cluster probability matrix is approximately calculated by a new method, whose elements βi^t(K) can be taken as evaluation functions. According to values of βi^t(K), N events with larger joint probabilities can be searched out as the events with guiding joint probabilities, tiros, the number of searching nodes will be greatly reduced. As a result, this method effectively reduces the calculation load and nnkes it possible to be realized on real-thne, Theoretical ,analysis and Monte Carlo simulation results show that this method is efficient.  相似文献   

13.
STAMP: A Model for Generating Adaptable Multimedia Presentations   总被引:1,自引:1,他引:0  
The STAMP model addresses the dynamic generation of multimedia presentations in the domain of Multimedia Web-based Information Systems. STAMP allows the presentation of multimedia data obtained from XML compatible data sources by means of query. Assuming that the size and the nature of the elements of information provided by a data source is not known a priori, STAMP proposes templates which describe the spatial, temporal, navigational structuration of multimedia presentations whose content varies. The instantiation of a template is done with respect to the set of spatial and temporal constraints associated with the delivery context. A set of adaptations preserving the initial intention of the presentation is proposed.Ioan Marius Bilasco is a Ph.D. student at the University Joseph Fourier in Grenoble, France, since 2003. He received his BS degree in Computer Science form the University Babes Bolyai in Cluj-Napoca, Romania and his MS degree in Computer Science from the University Joseph Fourier in Grenoble, France. He joined the LSR-IMAG Laboratory in Grenoble in 2001. His research interests include adaptability in Web-based Information Systems, 3D multimedia data modelling and mobile communications.Jérôme Gensel is an Assistant Professor at the University Pierre Mendès France in Grenoble, France, since 1996. He received his Ph.D. in 1995 from the University of Grenoble for his work on Constraint Programming and Knowledge Representation in the Sherpa project at the French National Institute of Computer Sciences and Automatics (INRIA). He joined the LSR-IMAG Laboratory in Grenoble in 2001. His research interests include adaptability and cooperation in Web-based Information Systems, multimedia data (especially video) modeling, semi-structured and object-based knowledge representation and constraint programming.Marlène Villanova-Oliver is an Assistant Professor at the University Pierre Mendès France in Grenoble, France, since 2003. In 1999, she received her MS degree in Computer Science from the University Joseph Fourier of Grenoble and the European Diploma of 3rd cycle in Management and Technology of Information Systems (MATIS). She received her Ph.D. in 2002 from the National Polytechnic Institute of Grenoble (INPG). She is a member of the LSR-IMAG Laboratory in Grenoble since 1998. Her research interests include adaptability in Web-based Information Systems, user modeling, adaptable Web Services.  相似文献   

14.
Mining frequent patterns with a frequent pattern tree (FP-tree in short) avoids costly candidate generation and repeatedly occurrence frequency checking against the support threshold. It therefore achieves much better performance and efficiency than Apriori-like algorithms. However, the database still needs to be scanned twice to get the FP-tree. This can be very time-consuming when new data is added to an existing database because two scans may be needed for not only the new data but also the existing data. In this research we propose a new data structure, the pattern tree (P-tree in short), and a new technique, which can get the P-tree through only one scan of the database and can obtain the corresponding FP-tree with a specified support threshold. Updating a P-tree with new data needs one scan of the new data only, and the existing data does not need to be re-scanned. Our experiments show that the P-tree method outperforms the FP-tree method by a factor up to an order of magnitude in large datasets. A preliminary version of this paper has been published in theProceedings of the 2002 IEEE International Conference on Data Mining (ICDM ’02), 629–632. Hao Huang: He is pursuing his Ph.D. degree in the Department of Computer Science at the University of Virginia. His research interests are Gird Computing, Data Mining and their applications in Bioinformatics. He received his M.S. in Computer Science from Colorado School of Mines in 2001. Xindong Wu, Ph.D.: He is Professor and Chair of the Department of Computer Science at the University of Vermont, USA. He holds a Ph.D. in Artificial Intelligence from the University of Edinburgh, Britain. His research interests include data mining, knowledge-based systems, and Web information exploration. He has published extensively in these areas in various journals and conferences, including IEEE TKDE, TPAMI, ACM TOIS, IJCAI, AAAI, ICML, KDD, ICDM, and WWW. Dr. Wu is the Executive Editor (January 1, 1999-December 31, 2004) and an Honorary Editor-in-Chief (starting January 1, 2005) of Knowledge and Information Systems (a peer-reviewed archival journal published by Springer), the founder and current Steering Committee Chair of the IEEE International Conference on Data Mining (ICDM), a Series Editor of the Springer Book Series on Advanced Information and Knowledge Processing (AI&KP), and the Chair of the IEEE Computer Society Technical Committee on Computational Intelligence (TCCI). He served as an Associate Editor for the IEEE Transactions on Knowledge and Data Engineering (TKDE) between January 1, 2000 and December 31, 2003, and is the Editor-in-Chief of TKDE since January 1, 2005. He is the winner of the 2004 ACM SIGKDD Service Award. Richard Relue, Ph.D.: He received his Ph.D. in Computer Science from the Colorado School of Mines in 2003. His research interests include association rules in data mining, neural networks for automated classification, and artificial intelligence for robot navigation. He has been an Information Technology consultant since 1992, working with Ball Aerospace and Technology, Rational Software, Natural Fuels Corporation, and Western Interstate Commission for Higher Education (WICHE).  相似文献   

15.
Constrained frequent patterns and closed frequent patterns are two paradigms aimed at reducing the set of extracted patterns to a smaller, more interesting, subset. Although a lot of work has been done with both these paradigms, there is still confusion around the mining problem obtained by joining closed and constrained frequent patterns in a unique framework. In this paper, we shed light on this problem by providing a formal definition and a thorough characterisation. We also study computational issues and show how to combine the most recent results in both paradigms, providing a very efficient algorithm that exploits the two requirements (satisfying constraints and being closed) together at mining time in order to reduce the computation as much as possible. Francesco Bonchi received his Ph.D. in computer science from the University of Pisa in December 2003, with the thesis “Frequent Pattern Queries: Language and Optimizations”. Currently, he is a postdoc at the Institute of Information Science and Technologies (ISTI) of the Italian National Research Council in Pisa, where he is a member of the Knowledge Discovery and Delivery Laboratory. He has been a visiting fellow at the Kanwal Rekhi School of Information Technology, Indian Institute of Technology, Bombay (2000, 2001). His current research interests are data mining query language and Optimization, frequent pattern mining, privacy-preserving data mining, bioinformatics. He is one of the teachers of a course on data mining held at the faculty of Economics at the University of Pisa. He served as a referee at various national and international conferences on databases, data mining, logic programming and artificial intelligence. Claudio Lucchese received the Master Degree in Computer Science summa cum laude from Ca' Foscari University of Venice in October 2003. He is currently a Ph.D. student at the same university and Research Associate at the Institute of Information Science and Technologies (ISTI) of the Italian National Research Council in Pisa, where he is a member of the High Performance Computing Laboratory. He is mainly interested in frequent pattern mining, privacy-preserving data mining, and data mining techniques for information retrieval.  相似文献   

16.
Recently, mining from data streams has become an important and challenging task for many real-world applications such as credit card fraud protection and sensor networking. One popular solution is to separate stream data into chunks, learn a base classifier from each chunk, and then integrate all base classifiers for effective classification. In this paper, we propose a new dynamic classifier selection (DCS) mechanism to integrate base classifiers for effective mining from data streams. The proposed algorithm dynamically selects a single “best” classifier to classify each test instance at run time. Our scheme uses statistical information from attribute values, and uses each attribute to partition the evaluation set into disjoint subsets, followed by a procedure that evaluates the classification accuracy of each base classifier on these subsets. Given a test instance, its attribute values determine the subsets that the similar instances in the evaluation set have constructed, and the classifier with the highest classification accuracy on those subsets is selected to classify the test instance. Experimental results and comparative studies demonstrate the efficiency and efficacy of our method. Such a DCS scheme appears to be promising in mining data streams with dramatic concept drifting or with a significant amount of noise, where the base classifiers are likely conflictive or have low confidence. A preliminary version of this paper was published in the Proceedings of the 4th IEEE International Conference on Data Mining, pp 305–312, Brighton, UK Xingquan Zhu received his Ph.D. degree in Computer Science from Fudan University, Shanghai, China, in 2001. He spent four months with Microsoft Research Asia, Beijing, China, where he was working on content-based image retrieval with relevance feedback. From 2001 to 2002, he was a Postdoctoral Associate in the Department of Computer Science, Purdue University, West Lafayette, IN. He is currently a Research Assistant Professor in the Department of Computer Science, University of Vermont, Burlington, VT. His research interests include Data mining, machine learning, data quality, multimedia computing, and information retrieval. Since 2000, Dr. Zhu has published extensively, including over 40 refereed papers in various journals and conference proceedings. Xindong Wu is a Professor and the Chair of the Department of Computer Science at the University of Vermont. He holds a Ph.D. in Artificial Intelligence from the University of Edinburgh, Britain. His research interests include data mining, knowledge-based systems, and Web information exploration. He has published extensively in these areas in various journals and conferences, including IEEE TKDE, TPAMI, ACM TOIS, IJCAI, ICML, KDD, ICDM, and WWW, as well as 11 books and conference proceedings. Dr. Wu is the Editor-in-Chief of the IEEE Transactions on Knowledge and Data Engineering (by the IEEE Computer Society), the founder and current Steering Committee Chair of the IEEE International Conference on Data Mining (ICDM), an Honorary Editor-in-Chief of Knowledge and Information Systems (by Springer), and a Series Editor of the Springer Book Series on Advanced Information and Knowledge Processing (AI&KP). He is the 2004 ACM SIGKDD Service Award winner. Ying Yang received her Ph.D. in Computer Science from Monash University, Australia in 2003. Following academic appointments at the University of Vermont, USA, she currently holds a Research Fellow at Monash University, Australia. Dr. Yang is recognized for contributions in the fields of machine learning and data mining. She has published many scientific papers and book chapters on adaptive learning, proactive mining, noise cleansing and discretization. Contact her at yyang@mail.csse.monash.edu.au.  相似文献   

17.
Classification is an important technique in data mining.The decision trees builty by most of the existing classification algorithms commonly feature over-branching,which will lead to poor efficiency in the subsequent classification period.In this paper,we present a new value-oriented classification method,which aims at building accurately proper-sized decision trees while reducing over-branching as much as possible,based on the concepts of frequent-pattern-node and exceptive-child-node.The experiments show that while using relevant anal-ysis as pre-processing ,our classification method,without loss of accuracy,can eliminate the over-branching greatly in decision trees more effectively and efficiently than other algorithms do.  相似文献   

18.
Extensive studies have shown that mining microarray data sets is important in bioinformatics research and biomedical applications. In this paper, we explore a novel type of gene–sample–time microarray data sets that records the expression levels of various genes under a set of samples during a series of time points. In particular, we propose the mining of coherent gene clusters from such data sets. Each cluster contains a subset of genes and a subset of samples such that the genes are coherent on the samples along the time series. The coherent gene clusters may identify the samples corresponding to some phenotypes (e.g., diseases), and suggest the candidate genes correlated to the phenotypes. We present two efficient algorithms, namely the Sample-Gene Search and the GeneSample Search, to mine the complete set of coherent gene clusters. We empirically evaluate the performance of our approaches on both a real microarray data set and synthetic data sets. The test results have shown that our approaches are both efficient and effective to find meaningful coherent gene clusters. Daxin Jiang received the Ph.D. degree in computer science and engineering from the State University of New York at Buffalo in 2005. He received the B.S. degree in computer science from the University of Science and Technology of China. From 1998 to 2000, he was a M.S. student in Software Institute, Chinese Academy of Sciences. He is currently an assistant professor at the School of Computer Engineering, Nanyang Technology University, Singapore. His research interests include data mining, bioinformatics, machine learning, and information retrieval. Jian Pei received the Ph.D. degree in computing science from Simon Fraser University, Canada, in 2002, under Dr. Jiawei Han's supervision. He also received the B.Eng. and the M.Eng. degrees from Shanghai Jiao Tong University, China, in 1991 and 1993, respectively, both in Computer Science. He is currently an assistant professor of computing science at Simon Fraser University. His research interests include developing effective and efficient data analysis techniques for novel data intensive applications. He is currently interested in various techniques of data mining, data warehousing, online analytical processing, and database systems, as well as their applications in bioinformatics. His current research is supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the National Science Foundation (NSF) of the United States. Since 2000, he has published over 70 research papers in refereed journals, conferences, and workshops, has served in the organization committees and the program committees of over 60 international conferences and workshops, and has been a reviewer for some leading academic journals. He is a member of the ACM, the ACM SIGMOD, and the ACM SIGKDD. Murali Ramanathan is an associate professor of pharmaceutical sciences and neurology. He received the B.Tech. (Honors) in chemical engineering from the Indian Institute of Technology, India, in 1983. After a 4-year stint in the chemical industry, he obtained the M.S. degree in chemical engineering from Iowa State University, Ames, IA, in 1987, and the Ph.D. degree in bioengineering from the University of California-San Francisco and University of California-Berkeley Joint Program in Bioengineering in 1994. Dr. Ramanathan research interests are primarily focused on the treatment of multiple sclerosis (MS), an inflammatory-demyelinating disease of the central nervous system that affects over 1 million patients worldwide. MS is a complex, variable disease that causes physical and cognitive disability and nearly 50% of patients diagnosed with MS are unable to walk after 15 years. The etiology and pathogenesis of MS remains poorly understood. Dr. Ramanathan's research interests include stochastic modeling of pharmaceutical systems and novel approaches to analyzing and using genetic and genomic data for improving patient care and optimizing therapy. Chuan Lin is currently a Ph.D. student in the Department of Computer Science and Engineering, State University of New York at Buffalo. She received the B.E. and the M.S. degrees in computer science and technology from Tsinghua University in China. Her research interests include bioinformatics, data mining, and machine learning. Chun Tang received the B.S. and M.S. degrees from Peking University, China, in 1996 and 1999, respectively, and the Ph.D. degree from State University of New York at Buffalo, USA, in 2005, all in computer science. Currently, she is a postdoctoral associate of Center for Medical Informatics, Yale University. Her research interests include bioinformatics, data mining, machine learning, database, and information retrieval. Aidong Zhang received the Ph.D. degree in computer science from Purdue University, West Lafayette, Indiana, in 1994. She was an assistant professor from 1994 to 1999, an associate professor from 1999 to 2002, and has been a professor since 2002 in the Department of Computer Science and Engineering at State University of New York at Buffalo. Her research interests include multimedia systems, content-based image retrieval, bioinformatics, and data mining. She is an author of over 140 research publications in these areas. Dr. Zhang's research has been funded by NSF, NIH, NIMA, and Xerox. Zhang serves on the editorial boards of International Journal of Bioinformatics Research and Applications (IJBRA), ACM Multimedia Systems, International Journal of Multimedia Tools and Applications, and International Journal of Distributed and Parallel Databases. She was the editor for ACM SIGMOD DiSC (Digital Symposium Collection) from 2001 to 2003. She was co-chair of the technical program committee for ACM Multimedia in 2001. She has also served on various conference program committees. Dr. Zhang is a recipient of the National Science Foundation CAREER award and SUNY Chancellor's Research Recognition award.  相似文献   

19.
As the amount of multimedia data is increasing day-by-day thanks to cheaper storage devices and increasing number of information sources, the machine learning algorithms are faced with large-sized datasets. When original data is huge in size small sample sizes are preferred for various applications. This is typically the case for multimedia applications. But using a simple random sample may not obtain satisfactory results because such a sample may not adequately represent the entire data set due to random fluctuations in the sampling process. The difficulty is particularly apparent when small sample sizes are needed. Fortunately the use of a good sampling set for training can improve the final results significantly. In KDD’03 we proposed EASE that outputs a sample based on its ‘closeness’ to the original sample. Reported results show that EASE outperforms simple random sampling (SRS). In this paper we propose EASIER that extends EASE in two ways. (1) EASE is a halving algorithm, i.e., to achieve the required sample ratio it starts from a suitable initial large sample and iteratively halves. EASIER, on the other hand, does away with the repeated halving by directly obtaining the required sample ratio in one iteration. (2) EASE was shown to work on IBM QUEST dataset which is a categorical count data set. EASIER, in addition, is shown to work on continuous data of images and audio features. We have successfully applied EASIER to image classification and audio event identification applications. Experimental results show that EASIER outperforms SRS significantly. Surong Wang received the B.E. and M.E. degree from the School of Information Engineering, University of Science and Technology Beijing, China, in 1999 and 2002 respectively. She is currently studying toward for the Ph.D. degree at the School of Computer Engineering, Nanyang Technological University, Singapore. Her research interests include multimedia data processing, image processing and content-based image retrieval. Manoranjan Dash obtained Ph.D. and M. Sc. (Computer Science) degrees from School of Computing, National University of Singapore. He has worked in academic and research institutes extensively and has published more than 30 research papers (mostly refereed) in various reputable machine learning and data mining journals, conference proceedings, and books. His research interests include machine learning and data mining, and their applications in bioinformatics, image processing, and GPU programming. Before joining School of Computer Engineering (SCE), Nanyang Technological University, Singapore, as Assistant Professor, he worked as a postdoctoral fellow in Northwestern University. He is a member of IEEE and ACM. He has served as program committee member of many conferences and he is in the editorial board of “International journal of Theoretical and Applied Computer Science.” Liang-Tien Chia received the B.S. and Ph.D. degrees from Loughborough University, in 1990 and 1994, respectively. He is an Associate Professor in the School of Computer Engineering, Nanyang Technological University, Singapore. He has recently been appointed as Head, Division of Computer Communications and he also holds the position of Director, Centre for Multimedia and Network Technology. His research interests include image/video processing & coding, multimodal data fusion, multimedia adaptation/transmission and multimedia over the Semantic Web. He has published over 80 research papers.  相似文献   

20.
Many supervised machine learning tasks can be cast as multi-class classification problems. Support vector machines (SVMs) excel at binary classification problems, but the elegant theory behind large-margin hyperplane cannot be easily extended to their multi-class counterparts. On the other hand, it was shown that the decision hyperplanes for binary classification obtained by SVMs are equivalent to the solutions obtained by Fisher's linear discriminant on the set of support vectors. Discriminant analysis approaches are well known to learn discriminative feature transformations in the statistical pattern recognition literature and can be easily extend to multi-class cases. The use of discriminant analysis, however, has not been fully experimented in the data mining literature. In this paper, we explore the use of discriminant analysis for multi-class classification problems. We evaluate the performance of discriminant analysis on a large collection of benchmark datasets and investigate its usage in text categorization. Our experiments suggest that discriminant analysis provides a fast, efficient yet accurate alternative for general multi-class classification problems. Tao Li is currently an assistant professor in the School of Computer Science at Florida International University. He received his Ph.D. degree in Computer Science from University of Rochester in 2004. His primary research interests are: data mining, machine learning, bioinformatics, and music information retrieval. Shenghuo Zhu is currently a researcher in NEC Laboratories America, Inc. He received his B.E. from Zhejiang University in 1994, B.E. from Tsinghua University in 1997, and Ph.D degree in Computer Science from University of Rochester in 2003. His primary research interests include information retrieval, machine learning, and data mining. Mitsunori Ogihara received a Ph.D. in Information Sciences at Tokyo Institute of Technology in 1993. He is currently Professor and Chair of the Department of Computer Science at the University of Rochester. His primary research interests are data mining, computational complexity, and molecular computation.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号