Many continual range queries can be issued against data streams. To efficiently evaluate continual queries against a stream, a main-memory-based query index with a small storage cost and a fast search time is needed, especially if the stream is rapid. In this paper, we study a CEI-based query index that meets both criteria for efficient processing of continual interval queries. This new query index is an indirect indexing approach. It centres around a set of predefined virtual containment-encoded intervals, or CEIs. The CEIs are used to first decompose query intervals and then perform efficient search operations. The CEIs are defined and labeled such that containment relationships among them are encoded in their IDs. The containment encoding makes decomposition and search operations efficient; from the encoding of the smallest CEI containing a data point, the encodings of other containing CEIs can be easily derived. Closed-form formulae for the bounds of the average index storage cost are derived. Simulations are conducted to evaluate the effectiveness of the CEI-based query index and to compare it with alternative approaches. The results show that the CEI-based query index significantly outperforms existing approaches in terms of both storage cost and search time.

Kun-Lung Wu received the B.S. degree in electrical engineering from the National Taiwan University, Taipei, Taiwan, and the M.S. and Ph.D. degrees in computer science from the University of Illinois at Urbana–Champaign. He is with the IBM Thomas J. Watson Research Center, currently a member of the Software Tools and Techniques Group. His current research interests include data streams, continual queries, mobile computing, Internet technologies and applications, database systems and distributed and parallel computing. He has published extensively and holds various patents in these areas. Dr. Wu is a Senior Member of the IEEE Computer Society and a member of the ACM.
He was an Associate Editor for the IEEE Transactions on Knowledge and Data Engineering, 2000–2004. He was the general chair for the 3rd International Workshop on e-Commerce and Web-Based Information Systems (WECWIS 2001). He has served as an organising and program committee member for various conferences. He has received various IBM awards, including the IBM Corporate Environmental Affair Excellence Award, a Research Division Award and Invention Achievement Awards. He received a best paper award from IEEE EEE 2004. He is an IBM Master Inventor.

Shyh-Kwei Chen received the B.S. degree in computer science and information engineering from National Taiwan University, Taipei, Taiwan, in 1983, the M.S. degree in computer science from the University of Minnesota, Minneapolis, in 1987, and the Ph.D. degree in computer science from the University of Illinois at Urbana–Champaign, in 1994. Dr. Chen has been with the IBM Thomas J. Watson Research Center, Yorktown Heights, New York since October 1994, where he is currently a research staff member. His current research interests include XML, electronic commerce, business performance management, data engineering and compilers. He is a member of the ACM, the IEEE and the IEEE Computer Society.

Philip S. Yu received the B.S. degree in electrical engineering from National Taiwan University, the M.S. and Ph.D. degrees in electrical engineering from Stanford University, and the M.B.A. degree from New York University. He is with the IBM Thomas J. Watson Research Center and is currently manager of the Software Tools and Techniques group. His research interests include data mining, Internet applications and technologies, database systems, multimedia systems, parallel and distributed processing and performance modelling. Dr. Yu has published more than 400 papers in refereed journals and conferences. He holds or has applied for more than 250 US patents. Dr. Yu is a Fellow of the ACM and a Fellow of the IEEE.
He is an associate editor of ACM Transactions on Internet Technology. He is a member of the IEEE Data Engineering steering committee and is also on the steering committee of the IEEE Conference on Data Mining. He was the Editor-in-Chief of IEEE Transactions on Knowledge and Data Engineering (2001–2004), an editor and advisory board member of IEEE Transactions on Knowledge and Data Engineering and also a guest coeditor of the special issue on mining of databases. He had also served as an associate editor of Knowledge and Information Systems. In addition to serving as a program committee member for various conferences, he was the program cochair of the 11th International Conference on Data Engineering, the 6th Pacific Area Conference on Knowledge Discovery and Data Mining, and the 9th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, and the program chair of the 2nd International Workshop on Research Issues on Data Engineering: Transaction and Query Processing, the PAKDD Workshop on Knowledge Discovery from Advanced Databases and the 2nd International Workshop on Advanced Issues of E-Commerce and Web-based Information Systems. He served as the general chair of the 14th International Conference on Data Engineering and the general cochair of the 2nd IEEE International Conference on Data Mining. He has received several IBM honours, including two IBM Outstanding Innovation Awards, an Outstanding Technical Achievement Award, two Research Division Awards and the 81st Plateau of Invention Achievement Awards. He received an Outstanding Contributions Award from the IEEE International Conference on Data Mining in 2003 and also an IEEE Region 1 Award for “promoting and perpetuating numerous new electrical engineering concepts” in 1999. Dr. Yu is an IBM Master Inventor and was recognised as one of IBM's top 10 leading inventors in 1999.
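The containment encoding in the CEI-based query index described above can be illustrated with a minimal Python sketch. It assumes integer attribute values, a power-of-two segment length, and heap-style labels inside each segment, so that the parent of CEI `k` is `k // 2`. These are illustrative choices rather than necessarily the exact labeling used in the paper. The names `CEIIndex`, `decompose`, `insert` and `search` are hypothetical. Decomposition covers a query interval with maximal CEIs. Search starts from the unit-length CEI that contains a data point and derives every other containing CEI by repeatedly halving its ID.

```python
from collections import defaultdict


class CEIIndex:
    """Sketch of a containment-encoded interval (CEI) query index.

    Assumptions made for illustration: integer values in
    [0, num_segments * seg_len), seg_len is a power of two, and CEIs inside
    a segment carry heap-style local IDs (ID 1 spans the whole segment and
    ID k has children 2k and 2k + 1), so the parent of CEI k is k // 2.
    """

    def __init__(self, num_segments: int, seg_len: int):
        assert seg_len > 0 and seg_len & (seg_len - 1) == 0, "seg_len must be a power of two"
        self.num_segments = num_segments
        self.seg_len = seg_len
        # (segment, local CEI ID) -> IDs of queries whose decomposition uses that CEI
        self.lists = defaultdict(list)

    def _cover(self, seg, lo, hi, node, node_lo, node_hi, out):
        # Cover [lo, hi) with maximal CEIs below `node`, which spans [node_lo, node_hi).
        if hi <= node_lo or node_hi <= lo:
            return
        if lo <= node_lo and node_hi <= hi:
            out.append((seg, node))
            return
        mid = (node_lo + node_hi) // 2
        self._cover(seg, lo, hi, 2 * node, node_lo, mid, out)
        self._cover(seg, lo, hi, 2 * node + 1, mid, node_hi, out)

    def decompose(self, a: int, b: int):
        """Decompose the half-open query interval [a, b) into CEIs."""
        assert 0 <= a < b <= self.num_segments * self.seg_len
        L, out = self.seg_len, []
        for seg in range(a // L, (b - 1) // L + 1):
            base = seg * L
            self._cover(seg, max(a, base) - base, min(b, base + L) - base, 1, 0, L, out)
        return out

    def insert(self, qid, a: int, b: int):
        for cei in self.decompose(a, b):
            self.lists[cei].append(qid)

    def search(self, x: int):
        """Return the IDs of all queries whose interval contains x."""
        assert 0 <= x < self.num_segments * self.seg_len
        seg, offset = divmod(x, self.seg_len)
        node = self.seg_len + offset       # smallest (unit-length) CEI containing x
        hits = []
        while node >= 1:                   # every containing CEI is found by halving the ID
            hits.extend(self.lists.get((seg, node), ()))
            node //= 2
        return hits
```

Take segments of four values. Here `decompose(2, 11)` yields `[(0, 3), (1, 1), (2, 2), (2, 6)]`: the right half of segment 0, all of segment 1, and two CEIs in segment 2. A search touches at most `log2(seg_len) + 1` ID lists per data point.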

We present an adaptive load shedding approach for windowed stream joins. In contrast to the conventional approach of dropping tuples from the input streams, we explore the concept of selective processing for load shedding. We allow stream tuples to be stored in the windows and shed excess CPU load by performing the join operations, not on the entire set of tuples within the windows, but on a dynamically changing subset of tuples that are learned to be highly beneficial. We support such dynamic selective processing through three forms of runtime adaptations: adaptation to input stream rates, adaptation to time correlation between the streams and adaptation to join directions. Our load shedding approach enables us to integrate utility-based load shedding with time correlation-based load shedding. Indexes are used to further speed up the execution of stream joins. Experiments are conducted to evaluate our adaptive load shedding in terms of output rate and utility. The results show that our selective processing approach to load shedding is very effective and significantly outperforms the approach that drops tuples from the input streams.

Bugra Gedik received the B.S. degree in C.S. from Bilkent University, Ankara, Turkey, and the Ph.D. degree in C.S. from the College of Computing at the Georgia Institute of Technology, Atlanta, GA, USA. He is with the IBM Thomas J. Watson Research Center, currently a member of the Software Tools and Techniques Group. Dr. Gedik's research interests lie in data intensive distributed computing systems, spanning data-centric peer-to-peer overlay networks, mobile and sensor-based distributed data management systems, and distributed data stream processing systems. His research focus is on developing system-level architectures and techniques to address scalability problems in distributed continual query systems and applications. He is the recipient of the ICDCS 2003 best paper award.
He has served on the program committees of several international conferences, such as ICDE, MDM, and CollaborateCom.

Kun-Lung Wu received the B.S. degree in E.E. from the National Taiwan University, Taipei, Taiwan, and the M.S. and Ph.D. degrees in C.S., both from the University of Illinois at Urbana-Champaign. He is with the IBM Thomas J. Watson Research Center, currently a member of the Software Tools and Techniques Group. His recent research interests include data streams, continual queries, mobile computing, Internet technologies and applications, database systems and distributed computing. He has published extensively and holds many patents in these areas. Dr. Wu is a Senior Member of the IEEE Computer Society and a member of the ACM. He is the Program Co-Chair for the IEEE Joint Conference on e-Commerce Technology (CEC 2007) and Enterprise Computing, e-Commerce and e-Services (EEE 2007). He was an Associate Editor for the IEEE Trans. on Knowledge and Data Engineering, 2000–2004. He was the general chair for the 3rd International Workshop on E-Commerce and Web-Based Information Systems (WECWIS 2001). He has served as an organizing and program committee member for various conferences. He has received various IBM awards, including the IBM Corporate Environmental Affair Excellence Award, a Research Division Award, and several Invention Achievement Awards. He received a best paper award from IEEE EEE 2004. He is an IBM Master Inventor.

Philip S. Yu received the B.S. degree in E.E. from National Taiwan University, the M.S. and Ph.D. degrees in E.E. from Stanford University, and the M.B.A. degree from New York University. He is with the IBM Thomas J. Watson Research Center and currently manager of the Software Tools and Techniques group. His research interests include data mining, Internet applications and technologies, database systems, multimedia systems, parallel and distributed processing, and performance modeling. Dr. Yu has published more than 430 papers in refereed journals and conferences. He holds or has applied for more than 250 US patents. Dr. Yu is a Fellow of the ACM and a Fellow of the IEEE. He is an associate editor of ACM Transactions on Internet Technology and ACM Transactions on Knowledge Discovery from Data. He is a member of the IEEE Data Engineering steering committee and is also on the steering committee of the IEEE Conference on Data Mining. He was the Editor-in-Chief of IEEE Transactions on Knowledge and Data Engineering (2001–2004), an editor, advisory board member and also a guest co-editor of the special issue on mining of databases. He had also served as an associate editor of Knowledge and Information Systems. In addition to serving as a program committee member for various conferences, he will be serving as the general chair of the 2006 ACM Conference on Information and Knowledge Management and the program chair of the 2006 joint conferences of the 8th IEEE Conference on E-Commerce Technology (CEC' 06) and the 3rd IEEE Conference on Enterprise Computing, E-Commerce and E-Services (EEE' 06). He was the program chair or co-chair of the 11th IEEE Intl. Conference on Data Engineering, the 6th Pacific Area Conference on Knowledge Discovery and Data Mining, the 9th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, the 2nd IEEE Intl. Workshop on Research Issues on Data Engineering: Transaction and Query Processing, the PAKDD Workshop on Knowledge Discovery from Advanced Databases, and the 2nd IEEE Intl. Workshop on Advanced Issues of E-Commerce and Web-based Information Systems. He served as the general chair of the 14th IEEE Intl. Conference on Data Engineering and the general co-chair of the 2nd IEEE Intl. Conference on Data Mining. He has received several IBM honors including 2 IBM Outstanding Innovation Awards, an Outstanding Technical Achievement Award, 2 Research Division Awards and the 84th plateau of Invention Achievement Awards.
He received an Outstanding Contributions Award from the IEEE Intl. Conference on Data Mining in 2003 and also an IEEE Region 1 Award for “promoting and perpetuating numerous new electrical engineering concepts” in 1999. Dr. Yu is an IBM Master Inventor.

Ling Liu is an associate professor at the College of Computing at Georgia Tech. There, she directs the research programs in the Distributed Data Intensive Systems Lab (DiSL), examining research issues and technical challenges in building large-scale distributed computing systems that can grow without limits. Dr. Liu and the DiSL research group have been working on various aspects of distributed data intensive systems, ranging from decentralized overlay networks, exemplified by peer-to-peer computing and data grid computing, to mobile computing systems and location-based services, sensor network computing, and enterprise computing systems. She has published over 150 international journal and conference articles. Her research group has produced a number of software systems that are either open source or directly accessible online, among which the most popular ones are WebCQ and XWRAPElite. Dr. Liu is currently on the editorial boards of several international journals, including IEEE Transactions on Knowledge and Data Engineering, the International Journal on Very Large Data Bases (VLDBJ), and the International Journal of Web Services Research. She has chaired a number of conferences as a PC chair, a vice PC chair, or a general chair, including the IEEE International Conference on Data Engineering (ICDE 2004, ICDE 2006, ICDE 2007), the IEEE International Conference on Distributed Computing Systems (ICDCS 2006), and the IEEE International Conference on Web Services (ICWS 2004). She is a recipient of the IBM Faculty Award (2003, 2006). Dr. Liu's current research is partly sponsored by grants from NSF CISE CSR, ITR, CyberTrust, a grant from AFOSR, an IBM SUR grant, and an IBM faculty award.
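The selective-processing idea is to keep all tuples in the window but join a new tuple against only a beneficial subset. It can be sketched as follows. The partitioning of a window into "basic windows", the per-tuple comparison budget, and the running match-rate estimate are all assumptions made for illustration. `BasicWindow` and `selective_join` are hypothetical names. The paper's adaptations to stream rates, time correlation and join direction are richer than this greedy ranking.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BasicWindow:
    """A sub-window of a join window (hypothetical partitioning for this sketch)."""
    tuples: list
    match_rate: float = 0.0   # learned estimate: join outputs per comparison
    comparisons: int = 0
    matches: int = 0

    def update_rate(self) -> None:
        if self.comparisons:
            self.match_rate = self.matches / self.comparisons


def selective_join(new_tuple, window: List[BasicWindow], budget: int,
                   pred: Callable) -> list:
    """Join new_tuple against only the most beneficial basic windows.

    Every tuple stays stored in the window; excess CPU load is shed by
    skipping basic windows once the per-tuple comparison budget runs out.
    """
    results = []
    # Most beneficial sub-windows first; sorted() is stable, so ties keep their order.
    for bw in sorted(window, key=lambda b: b.match_rate, reverse=True):
        if len(bw.tuples) > budget:
            break                       # shed the remaining, less beneficial sub-windows
        budget -= len(bw.tuples)
        for t in bw.tuples:
            bw.comparisons += 1
            if pred(new_tuple, t):
                bw.matches += 1
                results.append((new_tuple, t))
        bw.update_rate()
    return results
```

Sub-windows that are skipped never refresh their estimates. A practical version would therefore occasionally sample them, so the learned subset can track changes in time correlation.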

This paper considers the problem of mining closed frequent itemsets over a data stream sliding window using limited memory space. We design a synopsis data structure to monitor transactions in the sliding window so that we can output the current closed frequent itemsets at any time. Due to time and memory constraints, the synopsis data structure cannot monitor all possible itemsets. However, monitoring only frequent itemsets will make it impossible to detect new itemsets when they become frequent. In this paper, we introduce a compact data structure, the closed enumeration tree (CET), to maintain a dynamically selected set of itemsets over a sliding window. The selected itemsets contain a boundary between closed frequent itemsets and the rest of the itemsets. Concept drifts in a data stream are reflected by boundary movements in the CET. In other words, a status change of any itemset (e.g., from non-frequent to frequent) must occur through the boundary. Because the boundary is relatively stable, the cost of mining closed frequent itemsets over a sliding window is dramatically reduced to that of mining transactions that can possibly cause boundary movements in the CET. Our experiments show that our algorithm performs much better than representative state-of-the-art algorithms.

Yun Chi is currently a Ph.D. student at the Department of Computer Science, UCLA. His main areas of research include database systems, data mining, and bioinformatics. For data mining, he is interested in mining labeled trees and graphs, mining data streams, and mining data with uncertainty.

Haixun Wang is currently a research staff member at IBM T. J. Watson Research Center. He received the B.S. and the M.S. degrees, both in computer science, from Shanghai Jiao Tong University in 1994 and 1996, respectively. He received the Ph.D. degree in computer science from the University of California, Los Angeles in 2000.
He has published more than 60 research papers in refereed international journals and conference proceedings. He is a member of the ACM, the ACM SIGMOD, the ACM SIGKDD, and the IEEE Computer Society. He has served on the program committees of international conferences and workshops, and has been a reviewer for some leading academic journals in the database field.

Philip S. Yu received the B.S. degree in electrical engineering from National Taiwan University, the M.S. and Ph.D. degrees in electrical engineering from Stanford University, and the M.B.A. degree from New York University. He is with the IBM Thomas J. Watson Research Center and currently manager of the Software Tools and Techniques group. His research interests include data mining, Internet applications and technologies, database systems, multimedia systems, parallel and distributed processing, and performance modeling. Dr. Yu has published more than 430 papers in refereed journals and conferences. He holds or has applied for more than 250 US patents. Dr. Yu is a Fellow of the ACM and a Fellow of the IEEE. He is an associate editor of ACM Transactions on Internet Technology and ACM Transactions on Knowledge Discovery from Data. He is a member of the IEEE Data Engineering steering committee and is also on the steering committee of the IEEE Conference on Data Mining. He was the Editor-in-Chief of IEEE Transactions on Knowledge and Data Engineering (2001–2004), an editor, advisory board member and also a guest co-editor of the special issue on mining of databases. He had also served as an associate editor of Knowledge and Information Systems.
In addition to serving as a program committee member for various conferences, he will be serving as the general chairman of the 2006 ACM Conference on Information and Knowledge Management and the program chairman of the 2006 joint conferences of the 8th IEEE Conference on E-Commerce Technology (CEC' 06) and the 3rd IEEE Conference on Enterprise Computing, E-Commerce and E-Services (EEE' 06). He was the program chairman or co-chairman of the 11th IEEE International Conference on Data Engineering, the 6th Pacific Area Conference on Knowledge Discovery and Data Mining, the 9th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, the 2nd IEEE International Workshop on Research Issues on Data Engineering: Transaction and Query Processing, the PAKDD Workshop on Knowledge Discovery from Advanced Databases, and the 2nd IEEE International Workshop on Advanced Issues of E-Commerce and Web-based Information Systems. He served as the general chairman of the 14th IEEE International Conference on Data Engineering and the general co-chairman of the 2nd IEEE International Conference on Data Mining. He has received several IBM honors including 2 IBM Outstanding Innovation Awards, an Outstanding Technical Achievement Award, 2 Research Division Awards and the 84th plateau of Invention Achievement Awards. He received an Outstanding Contributions Award from the IEEE International Conference on Data Mining in 2003 and also an IEEE Region 1 Award for “promoting and perpetuating numerous new electrical engineering concepts” in 1999. Dr. Yu is an IBM Master Inventor.

Richard R. Muntz is a Professor and past chairman of the Computer Science Department, School of Engineering and Applied Science, UCLA. His current research interests are sensor rich environments, multimedia storage servers and database systems, distributed and parallel database systems, spatial and scientific database systems, data mining, and computer performance evaluation.
He is the author of over one hundred and fifty research papers. Dr. Muntz received the BEE from Pratt Institute in 1963, the MEE from New York University in 1966, and the Ph.D. in Electrical Engineering from Princeton University in 1969. He is a member of the Board of Directors for SIGMETRICS and past chairman of IFIP WG7.3 on performance evaluation. He was a member of the Corporate Technology Advisory Board at NCR/Teradata, a member of the Science Advisory Board of NASA's Center of Excellence in Space Data Information Systems, and a member of the Goddard Space Flight Center Visiting Committee on Information Technology. He recently chaired a National Research Council study on “The Intersection of Geospatial Information and IT”, which was published in 2003. He was an associate editor for the Journal of the ACM from 1975 to 1980 and the Editor-in-Chief of ACM Computing Surveys from 1992 to 1995. He is a Fellow of the ACM and a Fellow of the IEEE.
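What the CET has to report at any moment can be made concrete with a brute-force reference miner over a sliding window. This is deliberately NOT the CET algorithm. It recounts every itemset on each call, which is exactly the cost that tracking the boundary is meant to avoid. It is offered only as a correctness baseline for tiny windows, and `NaiveClosedMiner` is a hypothetical name.

```python
from collections import Counter, deque
from itertools import combinations
from typing import Dict, FrozenSet, Iterable


class NaiveClosedMiner:
    """Brute-force reference for closed frequent itemsets over a sliding window.

    Not the CET algorithm: it recounts every itemset on each call and is only
    practical for tiny windows and short transactions.
    """

    def __init__(self, window_size: int, min_support: int):
        self.window = deque(maxlen=window_size)   # oldest transaction slides out
        self.min_support = min_support

    def add(self, transaction: Iterable[str]) -> None:
        self.window.append(frozenset(transaction))

    def closed_frequent(self) -> Dict[FrozenSet[str], int]:
        counts = Counter()
        for t in self.window:
            items = sorted(t)
            for r in range(1, len(items) + 1):
                for subset in combinations(items, r):
                    counts[frozenset(subset)] += 1
        frequent = {s: c for s, c in counts.items() if c >= self.min_support}
        # Closed: no proper superset has the same support. Such a superset would
        # itself be frequent, so checking frequent itemsets is sufficient.
        return {s: c for s, c in frequent.items()
                if not any(s < u and c == cu for u, cu in frequent.items())}
```

Take a window of three transactions `{a,b,c}`, `{a,c}`, `{a,b}` with a minimum support of 2. Here `{b}` is frequent but not closed, because `{a,b}` has the same support. An itemset moving across that closed/non-closed or frequent/non-frequent line is the kind of boundary movement the abstract describes.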

On High Dimensional Projected Clustering of Data Streams (total citations: 3; self-citations: 0; citations by others: 3)
The data stream problem has been studied extensively in recent years because of the ease with which stream data can now be collected. The nature of stream data makes it essential to use algorithms that require only one pass over the data. Recently, single-scan stream analysis methods have been proposed in this context. However, much stream data is high-dimensional in nature, and high-dimensional data is inherently more difficult to handle in clustering, classification, and similarity search. Recent research discusses methods for projected clustering over high-dimensional data sets. These methods are, however, difficult to generalize to data streams because of their complexity and the large volume of the data streams. In this paper, we propose a new, high-dimensional, projected data stream clustering method, called HPStream. The method incorporates a fading cluster structure and the projection-based clustering methodology. It is incrementally updatable and highly scalable in both the number of dimensions and the size of the data streams, and it achieves better clustering quality than previous stream clustering methods. Our performance study with both real and synthetic data sets demonstrates the efficiency and effectiveness of our proposed framework and implementation methods.

Charu C. Aggarwal received his B.Tech. degree in Computer Science from the Indian Institute of Technology (1993) and his Ph.D. degree in Operations Research from the Massachusetts Institute of Technology (1996). He has been a Research Staff Member at the IBM T. J. Watson Research Center since June 1996. He has applied for or been granted over 50 US patents, and has published over 75 papers in numerous international conferences and journals. He has twice been designated Master Inventor at IBM Research, in 2000 and 2003, for the commercial value of his patents.
His contributions to the Epispire project on real-time attack detection were awarded the IBM Corporate Award for Environmental Excellence in 2003. He has been a program chair of DMKD 2003 and chair for all workshops organized in conjunction with ACM KDD 2003, and is also an associate editor of the IEEE Transactions on Knowledge and Data Engineering Journal. His current research interests include algorithms, data mining, privacy, and information retrieval.

Jiawei Han is a Professor in the Department of Computer Science at the University of Illinois at Urbana–Champaign. He has been working on research into data mining, data warehousing, stream and RFID data mining, spatiotemporal and multimedia data mining, biological data mining, social network analysis, text and Web mining, and software bug mining, with over 300 conference and journal publications. He has chaired or served in many program committees of international conferences and workshops, including ACM SIGKDD Conferences (2001 best paper award chair, 1996 PC co-chair), SIAM-Data Mining Conferences (2001 and 2002 PC co-chair), ACM SIGMOD Conferences (2000 exhibit program chair), International Conferences on Data Engineering (2004 and 2002 PC vice-chair), and International Conferences on Data Mining (2005 PC co-chair). He also served or is serving on the editorial boards for Data Mining and Knowledge Discovery, IEEE Transactions on Knowledge and Data Engineering, Journal of Computer Science and Technology, and Journal of Intelligent Information Systems. He is currently serving on the Board of Directors for the Executive Committee of ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD). Jiawei has received three IBM Faculty Awards, the Outstanding Contribution Award at the 2002 International Conference on Data Mining, ACM Service Award (1999) and ACM SIGKDD Innovation Award (2004). He is an ACM Fellow (since 2003).
He is the first author of the textbook “Data Mining: Concepts and Techniques” (Morgan Kaufmann, 2001).

Jianyong Wang received the Ph.D. degree in computer science in 1999 from the Institute of Computing Technology, the Chinese Academy of Sciences. He then worked as an assistant professor in the Department of Computer Science and Technology, Peking (Beijing) University, in the areas of distributed systems and Web search engines (May 1999–May 2001), and visited the School of Computing Science at Simon Fraser University (June 2001–December 2001), the Department of Computer Science at the University of Illinois at Urbana-Champaign (December 2001–July 2003), and the Digital Technology Center and Department of Computer Science and Engineering at the University of Minnesota (July 2003–November 2004), mainly working in the area of data mining. He is currently an associate professor in the Department of Computer Science and Technology, Tsinghua University, Beijing, China.

Philip S. Yu is the manager of the Software Tools and Techniques group at the IBM Thomas J. Watson Research Center. The project's current focus areas include the development of advanced algorithms and optimization techniques for data mining, anomaly detection and personalization, and the enabling of Web technologies to facilitate E-commerce and pervasive computing. Dr. Yu's research interests include data mining, Internet applications and technologies, database systems, multimedia systems, parallel and distributed processing, disk arrays, computer architecture, performance modeling and workload analysis. Dr. Yu has published more than 340 papers in refereed journals and conferences. He holds or has applied for more than 200 US patents. Dr. Yu is an IBM Master Inventor. Dr. Yu is a Fellow of the ACM and a Fellow of the IEEE. He will become the Editor-in-Chief of IEEE Transactions on Knowledge and Data Engineering in January 2001.
He is an associate editor of ACM Transactions on Internet Technology and also of the Knowledge and Information Systems Journal. He is a member of the IEEE Data Engineering steering committee. He also serves on the steering committee of the IEEE Intl. Conference on Data Mining. He received an IEEE Region 1 Award for “promoting and perpetuating numerous new electrical engineering concepts”. Philip S. Yu received the B.S. degree in E.E. from National Taiwan University, Taipei, Taiwan, the M.S. and Ph.D. degrees in E.E. from Stanford University, and the M.B.A. degree from New York University.
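The fading cluster structure mentioned in the HPStream abstract can be sketched as a summary that is decayed lazily on each update. The sketch assumes a fading function `f(dt) = 2 ** (-lam * dt)` and per-dimension faded sums of `x` and `x**2`. These let the cluster's centroid and per-dimension spread be maintained in one pass without storing points. `FadingCluster` and its methods are hypothetical names. The projected-dimension choice here simply ranks raw per-dimension radii, which is a simplification of whatever normalization HPStream actually applies.

```python
import math
from typing import List, Sequence


class FadingCluster:
    """Fading cluster summary in the spirit of HPStream (a sketch).

    Assumptions: fading function f(dt) = 2 ** (-lam * dt); the summary stores
    the faded weight and, per dimension, the faded sums of x and x**2.
    """

    def __init__(self, dims: int, lam: float, t0: float = 0.0):
        self.lam = lam
        self.t = t0                 # time of the last update
        self.w = 0.0                # faded number of points
        self.s1 = [0.0] * dims      # faded sum of values, per dimension
        self.s2 = [0.0] * dims      # faded sum of squared values, per dimension

    def _fade_to(self, t: float) -> None:
        # Decay all statistics by the elapsed time since the last update.
        factor = 2.0 ** (-self.lam * (t - self.t))
        self.w *= factor
        self.s1 = [v * factor for v in self.s1]
        self.s2 = [v * factor for v in self.s2]
        self.t = t

    def add(self, x: Sequence[float], t: float) -> None:
        self._fade_to(t)
        self.w += 1.0
        for d, v in enumerate(x):
            self.s1[d] += v
            self.s2[d] += v * v

    def centroid(self) -> List[float]:
        return [v / self.w for v in self.s1]

    def radii(self) -> List[float]:
        # Per-dimension standard deviation of the faded points (clamped at 0
        # to absorb floating-point round-off).
        return [math.sqrt(max(s2 / self.w - (s1 / self.w) ** 2, 0.0))
                for s1, s2 in zip(self.s1, self.s2)]

    def projected_dims(self, num_dims: int) -> List[int]:
        """Indices of the num_dims dimensions along which the cluster is tightest."""
        r = self.radii()
        return sorted(sorted(range(len(r)), key=lambda d: r[d])[:num_dims])
```

With `lam = 1.0`, a point added at time 0 carries weight 0.5 by time 1. Two points that agree on dimensions 0 and 2 but differ on dimension 1 therefore give a cluster whose two tightest dimensions are 0 and 2.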

In some business applications, such as trading management in financial institutions, ad hoc aggregate queries over data streams must be answered accurately. Materializing and incrementally maintaining a full data cube, or even its compression or approximation, over a data stream is often computationally prohibitive. On the other hand, although previous studies proposed approximate methods for continuous aggregate queries, they cannot provide accurate answers. In this paper, we develop a novel prefix aggregate tree (PAT) structure for online warehousing of data streams and answering ad hoc aggregate queries. Often, a data stream can be partitioned into a historical segment, which is stored in a traditional data warehouse, and a transient segment, which can be stored in a PAT to answer ad hoc aggregate queries. The size of a PAT is linear in the size of the transient segment, and only one scan of the data stream is needed to create and incrementally maintain a PAT. Although answering queries with a PAT costs more than with a fully materialized data cube, the query answering time is still kept linear in the size of the transient segment. Our extensive experimental results on both synthetic and real data sets illustrate the efficiency and the scalability of our design.

Moonjung Cho is a Ph.D. candidate in the Department of Computer Science and Engineering at the State University of New York at Buffalo. She obtained her Master's degree from the same university in 2003. She has 4 years of industry experience as an associate researcher. Her research interests are in the area of data mining, data warehousing and data cubing. She has received a full scholarship from the Institute of Information Technology Assessment in Korea.

Jian Pei received the Ph.D. degree in Computing Science from Simon Fraser University, Canada, in 2002. He is currently an Assistant Professor of Computing Science at Simon Fraser University, Canada.
In 2002–2004, he was an Assistant Professor of Computer Science and Engineering at the State University of New York at Buffalo, USA. His research interests can be summarized as developing advanced data analysis techniques for emerging applications. Particularly, he is currently interested in various techniques of data mining, data warehousing, online analytical processing, and database systems, as well as their applications in bioinformatics. His current research is supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the National Science Foundation (NSF). He has published over 70 papers in refereed journals, conferences, and workshops, has served in the program committees of over 60 international conferences and workshops, and has been a reviewer for some leading academic journals. He is a member of the ACM, the ACM SIGMOD, and the ACM SIGKDD.

Ke Wang received his Ph.D. from Georgia Institute of Technology. He is currently a professor at School of Computing Science, Simon Fraser University. Before joining Simon Fraser, he was an associate professor at National University of Singapore. He has taught in the areas of databases and data mining. Ke Wang's research interests include database technology, data mining and knowledge discovery, machine learning, and emerging applications, with recent interests focusing on the end use of data mining. This includes explicitly modeling the business goal (such as profit mining, bio-mining and web mining) and exploiting user prior knowledge (such as extracting unexpected patterns and actionable knowledge). He is interested in combining the strengths of various fields such as database, statistics, machine learning and optimization to provide actionable solutions to real life problems. Ke Wang has published in database, information retrieval, and data mining conferences, including SIGMOD, SIGIR, PODS, VLDB, ICDE, EDBT, SIGKDD, SDM and ICDM.
He is an associate editor of the IEEE TKDE journal and has served on the program committees of international conferences including DASFAA, ICDE, ICDM, PAKDD, PKDD, SIGKDD and VLDB.
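The prefix-aggregate idea behind the PAT can be illustrated with a simplified prefix tree over the transient segment. Every node stores SUM and COUNT aggregates for the tuples whose dimension-value path passes through it. An ad hoc query fixes some dimensions and leaves the rest as wildcards (`None`). Once the traversal passes the last constrained dimension, a node's prefix aggregate answers the query directly. This is a sketch under those assumptions and not the paper's exact PAT layout. `PrefixAggregateTree`, `insert` and `query` are hypothetical names.

```python
from typing import Dict, Optional, Sequence, Tuple


class PATNode:
    __slots__ = ("children", "total", "count")

    def __init__(self):
        self.children: Dict[object, "PATNode"] = {}
        self.total = 0.0    # SUM of the measures of tuples whose path passes through here
        self.count = 0      # COUNT of those tuples


class PrefixAggregateTree:
    """Prefix tree with SUM/COUNT aggregates at every node (simplified sketch).

    Tuples of the transient segment are inserted with their dimension values
    in a fixed order; the tree grows at most linearly with the number of tuples.
    """

    def __init__(self, num_dims: int):
        self.num_dims = num_dims
        self.root = PATNode()

    def insert(self, dims: Sequence, measure: float) -> None:
        assert len(dims) == self.num_dims
        node = self.root
        node.total += measure
        node.count += 1
        for v in dims:                       # one pass per tuple
            node = node.children.setdefault(v, PATNode())
            node.total += measure
            node.count += 1

    def query(self, pattern: Sequence[Optional[object]]) -> Tuple[float, int]:
        """SUM and COUNT over tuples matching pattern; None is a wildcard."""
        assert len(pattern) == self.num_dims
        # Below the last constrained dimension, a node's prefix aggregate
        # already answers the query, so the traversal can stop there.
        last = max((i for i, p in enumerate(pattern) if p is not None), default=-1)
        total, count = 0.0, 0
        stack = [(self.root, 0)]
        while stack:
            node, depth = stack.pop()
            if depth > last:
                total += node.total
                count += node.count
                continue
            p = pattern[depth]
            if p is None:
                stack.extend((child, depth + 1) for child in node.children.values())
            elif p in node.children:
                stack.append((node.children[p], depth + 1))
        return total, count
```

In the worst case, with wildcards on leading dimensions, a query visits every node. That matches the abstract's point that query time stays linear in the size of the transient segment, at a higher cost than a fully materialized cube.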

Mining frequent patterns with a frequent pattern tree (FP-tree in short) avoids costly candidate generation and repeatedly occurrence frequency checking against the support threshold. It therefore achieves much better performance and efficiency than Apriori-like algorithms. However, the database still needs to be scanned twice to get the FP-tree. This can be very time-consuming when new data is added to an existing database because two scans may be needed for not only the new data but also the existing data. In this research we propose a new data structure, the pattern tree (P-tree in short), and a new technique, which can get the P-tree through only one scan of the database and can obtain the corresponding FP-tree with a specified support threshold. Updating a P-tree with new data needs one scan of the new data only, and the existing data does not need to be re-scanned. Our experiments show that the P-tree method outperforms the FP-tree method by a factor up to an order of magnitude in large datasets. A preliminary version of this paper has been published in theProceedings of the 2002 IEEE International Conference on Data Mining (ICDM ’02), 629–632. Hao Huang: He is pursuing his Ph.D. degree in the Department of Computer Science at the University of Virginia. His research interests are Gird Computing, Data Mining and their applications in Bioinformatics. He received his M.S. in Computer Science from Colorado School of Mines in 2001. Xindong Wu, Ph.D.: He is Professor and Chair of the Department of Computer Science at the University of Vermont, USA. He holds a Ph.D. in Artificial Intelligence from the University of Edinburgh, Britain. His research interests include data mining, knowledge-based systems, and Web information exploration. He has published extensively in these areas in various journals and conferences, including IEEE TKDE, TPAMI, ACM TOIS, IJCAI, AAAI, ICML, KDD, ICDM, and WWW. Dr. 
Wu was the Executive Editor (January 1, 1999 to December 31, 2004) and is an Honorary Editor-in-Chief (starting January 1, 2005) of Knowledge and Information Systems (a peer-reviewed archival journal published by Springer), the founder and current Steering Committee Chair of the IEEE International Conference on Data Mining (ICDM), a Series Editor of the Springer Book Series on Advanced Information and Knowledge Processing (AI&KP), and the Chair of the IEEE Computer Society Technical Committee on Computational Intelligence (TCCI). He served as an Associate Editor for the IEEE Transactions on Knowledge and Data Engineering (TKDE) from January 1, 2000 to December 31, 2003, and has been the Editor-in-Chief of TKDE since January 1, 2005. He is the winner of the 2004 ACM SIGKDD Service Award. Richard Relue, Ph.D.: He received his Ph.D. in Computer Science from the Colorado School of Mines in 2003. His research interests include association rules in data mining, neural networks for automated classification, and artificial intelligence for robot navigation. He has been an Information Technology consultant since 1992, working with Ball Aerospace and Technology, Rational Software, Natural Fuels Corporation, and the Western Interstate Commission for Higher Education (WICHE).
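The one-scan idea in the P-tree abstract can be illustrated with a small sketch. It assumes that each transaction is inserted into a prefix tree in a fixed (sorted) item order while global item counts are tallied in the same pass, and that an FP-tree for a given threshold is then derived by re-inserting the stored transactions with infrequent items dropped and the remaining items reordered by descending frequency. The names `Node`, `build_ptree`, `update_ptree` and `to_fptree` are hypothetical, and this is not the authors' exact construction or restructuring procedure.

```python
from collections import Counter


class Node:
    """Prefix-tree node: an item label, a pass-through count and child nodes."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def insert(root, items, count=1):
    """Insert one ordered itemset, adding `count` to every node on its path."""
    node = root
    for item in items:
        child = node.children.get(item)
        if child is None:
            child = Node(item)
            node.children[item] = child
        child.count += count
        node = child


def update_ptree(root, freq, new_transactions):
    """Incremental update: only the new transactions are read (assumed sorted item order)."""
    for t in new_transactions:
        items = sorted(set(t))
        freq.update(items)
        insert(root, items)


def build_ptree(transactions):
    """Single scan: build the tree and the global item counts together."""
    root, freq = Node(), Counter()
    update_ptree(root, freq, transactions)
    return root, freq


def stored_paths(node, prefix=()):
    """Yield (itemset, count) for the transactions that end exactly at each node."""
    for child in node.children.values():
        path = prefix + (child.item,)
        ending_here = child.count - sum(c.count for c in child.children.values())
        if ending_here > 0:
            yield path, ending_here
        yield from stored_paths(child, path)


def to_fptree(root, freq, min_support):
    """Derive an FP-tree for `min_support` from the P-tree without rescanning raw data."""
    frequent = {i for i, c in freq.items() if c >= min_support}
    fp_root = Node()
    for itemset, count in stored_paths(root):
        # FP-tree order: descending global frequency, ties broken by item name
        kept = sorted((i for i in itemset if i in frequent), key=lambda i: (-freq[i], i))
        if kept:
            insert(fp_root, kept, count)
    return fp_root
```

Because the P-tree keeps every transaction regardless of support, a new threshold or a batch of new transactions only requires work on the tree and the new data, which mirrors the benefit claimed in the abstract.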

The pairwise attribute noise detection algorithm
Analyzing the quality of data prior to constructing data mining models is emerging as an important issue. Algorithms for identifying noise in a given data set can provide a good measure of data quality. Considerable attention has been devoted to detecting class noise or labeling errors. In contrast, limited research has been devoted to detecting instances with attribute noise, in part due to the difficulty of the problem. We present a novel approach for detecting instances with attribute noise and demonstrate its usefulness with case studies using two different real-world software measurement data sets. Our approach, called the Pairwise Attribute Noise Detection Algorithm (PANDA), is compared with a nearest-neighbor, distance-based outlier detection technique (denoted DM) investigated in the related literature. Since what constitutes noise is domain specific, our case studies use a software engineering expert to inspect the instances identified by the two approaches and determine whether they actually contain noise. It is shown that PANDA provides better noise detection performance than the DM algorithm. Jason Van Hulse is a Ph.D. candidate in the Department of Computer Science and Engineering at Florida Atlantic University. His research interests include data mining and knowledge discovery, machine learning, computational intelligence and statistics. He is a student member of the IEEE and IEEE Computer Society. He received the M.A. degree in mathematics from Stony Brook University in 2000, and is currently Director, Decision Science at First Data Corporation. Taghi M. Khoshgoftaar is a professor in the Department of Computer Science and Engineering, Florida Atlantic University, and the director of the Empirical Software Engineering and Data Mining and Machine Learning Laboratories.
His research interests are in software engineering, software metrics, software reliability and quality engineering, computational intelligence, computer performance evaluation, data mining, machine learning, and statistical modeling. He has published more than 300 refereed papers in these areas. He has been a principal investigator and project leader in a number of projects with industry, government, and other research-sponsoring agencies. He is a member of the IEEE, the IEEE Computer Society, and the IEEE Reliability Society. He served as the program chair and general chair of the IEEE International Conference on Tools with Artificial Intelligence in 2004 and 2005, respectively. He has also served on the technical program committees of various international conferences, symposia, and workshops. He has served as North American editor of the Software Quality Journal, and is on the editorial boards of the journals Empirical Software Engineering, Software Quality, and Fuzzy Systems. Haiying Huang received the M.S. degree in computer engineering from Florida Atlantic University, Boca Raton, Florida, USA, in 2002. She is currently a Ph.D. candidate in the Department of Computer Science and Engineering at Florida Atlantic University. Her research interests include software engineering, computational intelligence, data mining, software measurement, software reliability, and quality engineering.
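To make the pairwise idea concrete, here is a simplified sketch that scores each instance by examining every ordered pair of attributes: instances are grouped into equal-frequency bins of one attribute, and the instance is penalized when its value of the other attribute deviates strongly from that bin's mean. The bin count, the standardized-deviation measure and the averaging over pairs are all assumptions made for illustration; this is not the published PANDA definition, and `pairwise_noise_scores` is a hypothetical name.

```python
import statistics


def pairwise_noise_scores(rows, n_bins=3):
    """Average, over attribute pairs (j, k), of an instance's standardized
    deviation in attribute k within its equal-frequency bin of attribute j."""
    n, m = len(rows), len(rows[0])
    scores = [0.0] * n
    for j in range(m):
        # Equal-frequency bins on attribute j (instances sorted by their j-th value)
        order = sorted(range(n), key=lambda i: rows[i][j])
        bins = [order[b * n // n_bins:(b + 1) * n // n_bins] for b in range(n_bins)]
        for k in range(m):
            if k == j:
                continue
            for members in bins:
                values = [rows[i][k] for i in members]
                if len(values) < 2:
                    continue
                mu, sd = statistics.mean(values), statistics.pstdev(values)
                if sd == 0:
                    continue
                for i in members:
                    scores[i] += abs(rows[i][k] - mu) / sd
    pairs = m * (m - 1)
    return [s / pairs for s in scores]
```

Instances with the highest scores would be the candidates handed to a domain expert for inspection, as in the case studies described above.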

Recently, mining data streams has become an important and challenging task for many real-world applications such as credit card fraud protection and sensor networking. One popular solution is to separate stream data into chunks, learn a base classifier from each chunk, and then integrate all base classifiers for effective classification. In this paper, we propose a new dynamic classifier selection (DCS) mechanism to integrate base classifiers for effective mining from data streams. The proposed algorithm dynamically selects a single “best” classifier to classify each test instance at run time. Our scheme uses statistical information from attribute values: each attribute is used to partition the evaluation set into disjoint subsets, and the classification accuracy of each base classifier is then evaluated on these subsets. Given a test instance, its attribute values determine which subsets of the evaluation set contain similar instances, and the classifier with the highest classification accuracy on those subsets is selected to classify the test instance. Experimental results and comparative studies demonstrate the efficiency and efficacy of our method. Such a DCS scheme appears promising for mining data streams with dramatic concept drift or a significant amount of noise, where the base classifiers are likely to conflict or have low confidence. A preliminary version of this paper was published in the Proceedings of the 4th IEEE International Conference on Data Mining, pp. 305–312, Brighton, UK. Xingquan Zhu received his Ph.D. degree in Computer Science from Fudan University, Shanghai, China, in 2001. He spent four months with Microsoft Research Asia, Beijing, China, where he worked on content-based image retrieval with relevance feedback. From 2001 to 2002, he was a Postdoctoral Associate in the Department of Computer Science, Purdue University, West Lafayette, IN.
He is currently a Research Assistant Professor in the Department of Computer Science, University of Vermont, Burlington, VT. His research interests include data mining, machine learning, data quality, multimedia computing, and information retrieval. Since 2000, Dr. Zhu has published extensively, including over 40 refereed papers in various journals and conference proceedings. Xindong Wu is a Professor and the Chair of the Department of Computer Science at the University of Vermont. He holds a Ph.D. in Artificial Intelligence from the University of Edinburgh, Britain. His research interests include data mining, knowledge-based systems, and Web information exploration. He has published extensively in these areas in various journals and conferences, including IEEE TKDE, TPAMI, ACM TOIS, IJCAI, ICML, KDD, ICDM, and WWW, as well as 11 books and conference proceedings. Dr. Wu is the Editor-in-Chief of the IEEE Transactions on Knowledge and Data Engineering (published by the IEEE Computer Society), the founder and current Steering Committee Chair of the IEEE International Conference on Data Mining (ICDM), an Honorary Editor-in-Chief of Knowledge and Information Systems (published by Springer), and a Series Editor of the Springer Book Series on Advanced Information and Knowledge Processing (AI&KP). He is the 2004 ACM SIGKDD Service Award winner. Ying Yang received her Ph.D. in Computer Science from Monash University, Australia, in 2003. Following academic appointments at the University of Vermont, USA, she is currently a Research Fellow at Monash University, Australia. Dr. Yang is recognized for contributions in the fields of machine learning and data mining. She has published many scientific papers and book chapters on adaptive learning, proactive mining, noise cleansing and discretization. Contact her at yyang@mail.csse.monash.edu.au.
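The selection step described in the DCS abstract can be sketched as follows, assuming categorical attributes, base classifiers given as plain callables, and pooling (summing) correct and total counts across all attribute subsets that match the test instance. The pooling rule and the names `build_stats` and `select_and_classify` are our assumptions, not the paper's exact statistics.

```python
from collections import defaultdict


def build_stats(eval_set, classifiers):
    """stats[(attr, value)][c] = [correct, total] for classifier c on the
    evaluation instances whose attribute `attr` equals `value`."""
    stats = defaultdict(lambda: [[0, 0] for _ in classifiers])
    for x, y in eval_set:
        for a, v in enumerate(x):
            cell = stats[(a, v)]
            for c, clf in enumerate(classifiers):
                cell[c][0] += clf(x) == y
                cell[c][1] += 1
    return stats


def select_and_classify(x, classifiers, stats):
    """Pick the classifier with the best pooled accuracy on the subsets that
    match x's attribute values; return (prediction, chosen classifier index)."""
    best, best_acc = 0, -1.0
    for c in range(len(classifiers)):
        correct = total = 0
        for a, v in enumerate(x):
            if (a, v) in stats:  # membership test does not create a new entry
                correct += stats[(a, v)][c][0]
                total += stats[(a, v)][c][1]
        acc = correct / total if total else 0.0
        if acc > best_acc:
            best, best_acc = c, acc
    return classifiers[best](x), best
```

For example, with two classifiers that disagree depending on the second attribute, a test instance `(0, 1)` is routed to whichever classifier has been more accurate on evaluation instances sharing those attribute values.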

A motion-compensated lifting (MCLIFT) framework for 3D wavelet video coding is proposed in this paper. By using bi-directional motion compensation in each lifting step of the temporal direction, the video frames are effectively de-correlated. With proper entropy coding and bit-stream packaging schemes, the MCLIFT wavelet video coder is scalable in frame rate and quality level. Experimental results show that the MCLIFT video coder outperforms the 3D wavelet video coder without motion compensation by an average of 0.9–1.3 dB, and outperforms the MPEG-4 coder by an average of 0.2–0.6 dB.
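A toy sketch of one level of motion-compensated temporal lifting may help show why the scheme stays invertible. Frames are 1-D lists, and motion is assumed to be a single global integer shift `v` per frame with circular boundaries; real coders use block-based motion fields. The 5/3-style weights (1/2 for prediction, 1/4 for update) and all function names are hypothetical choices for illustration.

```python
def shift(frame, d):
    """Circularly move a 1-D frame by d samples: a stand-in for motion compensation."""
    n = len(frame)
    return [frame[(i - d) % n] for i in range(n)]


def predict(evens, k, v):
    """Bi-directional prediction of odd frame 2k+1 from its even neighbours."""
    fwd = shift(evens[k], v)  # frame 2k moved forward by one frame of motion
    if k + 1 < len(evens):
        bwd = shift(evens[k + 1], -v)  # frame 2k+2 moved back by one frame of motion
        return [0.5 * (a + b) for a, b in zip(fwd, bwd)]
    return fwd  # no later even frame: predict from one side only


def update(highs, k, v, n):
    """Update term for even frame 2k from the motion-aligned neighbouring high-pass frames."""
    terms = [0.0] * n
    if k > 0:
        terms = [t + 0.25 * h for t, h in zip(terms, shift(highs[k - 1], v))]
    if k < len(highs):
        terms = [t + 0.25 * h for t, h in zip(terms, shift(highs[k], -v))]
    return terms


def mc_lift_forward(frames, v):
    """One temporal lifting level: returns (low-pass frames, high-pass frames)."""
    evens, odds = frames[0::2], frames[1::2]
    n = len(frames[0])
    highs = [[o - p for o, p in zip(odds[k], predict(evens, k, v))] for k in range(len(odds))]
    lows = [[e + u for e, u in zip(evens[k], update(highs, k, v, n))] for k in range(len(evens))]
    return lows, highs


def mc_lift_inverse(lows, highs, v):
    """Undo the update step, then the predict step, and re-interleave the frames."""
    n = len(lows[0])
    evens = [[lo - u for lo, u in zip(lows[k], update(highs, k, v, n))] for k in range(len(lows))]
    odds = [[h + p for h, p in zip(highs[k], predict(evens, k, v))] for k in range(len(highs))]
    frames = []
    for k, even in enumerate(evens):
        frames.append(even)
        if k < len(odds):
            frames.append(odds[k])
    return frames
```

When the assumed motion matches the content, the high-pass frames are zero and the energy concentrates in the low-pass frames. Because the inverse recomputes exactly the same predict and update terms, reconstruction is perfect even when the motion estimate is wrong; only coding efficiency suffers.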

In this paper, we propose an agent architecture that improves the flexibility of a videoconference system through a strategy-centric adaptive QoS (Quality of Service) control mechanism. The proposed architecture achieves greater flexibility by changing its QoS control strategies dynamically. To switch strategies, the system considers the properties of the QoS problems that occur and the status of the problem-solving process. This architecture is introduced as part of the knowledge base of the agent that handles cooperation between the software modules of videoconference systems. We have implemented the mechanism, and our prototype system shows its capability for flexible problem solving against QoS degradation, along with other possible problems, within the given time limit. We thus confirmed that the proposed architecture can improve the flexibility of a videoconference system compared to traditional systems. Takuo Suganuma, Dr.Eng.: He is a research associate at the Research Institute of Electrical Communication of Tohoku University. He received a Dr.Eng. degree from Chiba Institute of Technology in 1997. His research interests include agent-based computing and design methodology for distributed systems. He is a member of IPSJ, IEICE and IEEE. SungDoke Lee: He is a Ph.D. student in the Graduate School of Information Sciences at Tohoku University. He received his M.Eng. degree from Chonbuk National University, Korea, in 1991. His research interests include flexible networks and agent knowledge. Tetsuo Kinoshita, Dr.Eng.: He is an associate professor at the Research Institute of Electrical Communication of Tohoku University. He received a Dr.Eng. degree in information engineering from Tohoku University, Japan. His research interests include knowledge engineering, cooperative distributed processing and agent-based computing. He received the IPSJ Best Paper Award in 1997, among others. He is a member of IPSJ, IEICE, JSAI, AAAI, ACM and IEEE.
Norio Shiratori, Dr.Eng.: After receiving his Dr.Eng. degree at Tohoku University, he joined the Research Institute of Electrical Communication of Tohoku University in 1977, and is now a professor at the same university. He has been engaged in research on distributed processing systems and flexible intelligent networks. He received the 25th Anniversary of IPSJ Memorial Prize-Winning Paper Award in 1985, the 6th Telecommunications Advancement Foundation Incorporation Award in 1991, the Best Paper Award of ICOIN-9 in 1994, and the IPSJ Best Paper Award in 1997, among others. He has been named a Fellow of the IEEE for his contributions to the field of computer communication networks.
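The strategy switching described in the videoconference abstract can be pictured as a small decision rule that looks at the kind of QoS problem and the time left for problem solving. The sketch below is entirely hypothetical: the `Strategy` fields, the problem kinds and the rule "use the most time-consuming applicable strategy that still meets the deadline, otherwise the quickest one" are our assumptions, not the paper's knowledge base.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    name: str
    handles: frozenset  # kinds of QoS problem this strategy can address
    est_seconds: float  # expected time needed to apply the strategy


def choose_strategy(strategies, problem_kind, time_left):
    """Pick a QoS control strategy for the current problem and time budget."""
    candidates = [s for s in strategies if problem_kind in s.handles]
    in_time = [s for s in candidates if s.est_seconds <= time_left]
    if in_time:
        # Assumption: a longer-running strategy is treated as more thorough
        return max(in_time, key=lambda s: s.est_seconds)
    # Nothing fits the deadline: fall back to the quickest applicable strategy, if any
    return min(candidates, key=lambda s: s.est_seconds) if candidates else None
```

Re-running the rule whenever the problem type or the remaining time changes is what lets the agent switch strategies during problem solving rather than committing to one in advance.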

The study of database technologies, or more generally, the technologies of data and information management, is an important and active research field. Recently, many exciting results have been reported. In this fast-growing field, Chinese researchers play increasingly active roles. Research papers from Chinese scholars, both in China and abroad, appear in prestigious academic forums. In this paper, we, nine young Chinese researchers working in the United States, present concise surveys and report our recent progress on the selected fields that we are working on. Although the paper covers only a small number of topics and the selection of topics is far from balanced, we hope that such an effort will attract more researchers, especially those in China, to enter the frontiers of database research and promote collaboration. For obvious reasons, the authors are listed alphabetically, and the sections are arranged in the order of the author list.

Frequent itemset mining was initially proposed and has been studied extensively in the context of association rule mining. In recent years, several studies have also extended its application to transaction or document clustering. However, most frequent itemset based clustering algorithms first need to mine a large intermediate set of frequent itemsets in order to identify a subset of the most promising ones that can be used for clustering. In this paper, we study how to directly find a subset of high-quality frequent itemsets that can be used as a concise summary of the transaction database and to cluster categorical data. By exploring key properties of the subset of itemsets that we are interested in, we propose several search space pruning methods and design an efficient algorithm called SUMMARY. Our empirical results show that SUMMARY runs very fast even when the minimum support is extremely low and scales very well with respect to database size. Surprisingly, as a pure frequent itemset mining algorithm, it is also very effective in clustering categorical data and summarizing dense transaction databases. Jianyong Wang received the Ph.D. degree in computer science in 1999 from the Institute of Computing Technology, the Chinese Academy of Sciences. After that, he worked as an assistant professor in the Department of Computer Science and Technology, Peking (Beijing) University, in the areas of distributed systems and Web search engines, and visited the School of Computing Science at Simon Fraser University, the Department of Computer Science at the University of Illinois at Urbana-Champaign, and the Digital Technology Center and the Department of Computer Science at the University of Minnesota, mainly working in the area of data mining. He is currently an associate professor in the Department of Computer Science and Technology at Tsinghua University, P.R. China. George Karypis received his Ph.D.
degree in computer science at the University of Minnesota and is currently an associate professor in the Department of Computer Science and Engineering at the University of Minnesota. His research interests span the areas of parallel algorithm design, data mining, bioinformatics, information retrieval, applications of parallel processing in scientific computing and optimization, sparse matrix computations, parallel preconditioners, and parallel programming languages and libraries. His research has resulted in the development of software libraries for serial and parallel graph partitioning (METIS and ParMETIS), hypergraph partitioning (hMETIS), parallel Cholesky factorization (PSPASES), collaborative filtering-based recommendation algorithms (SUGGEST), clustering high-dimensional datasets (CLUTO), and finding frequent patterns in diverse datasets (PAFI). He has coauthored over ninety journal and conference papers on these topics and a book titled “Introduction to Parallel Computing” (Addison Wesley, 2003, 2nd edition). In addition, he serves on the program committees of many conferences and workshops on these topics and is an associate editor of the IEEE Transactions on Parallel and Distributed Systems.
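To illustrate what a per-transaction "summary" itemset and the resulting clustering look like, the brute-force sketch below takes, as an assumption, a summary itemset to be a longest frequent itemset contained in the transaction, with ties broken by higher support. It enumerates subsets exhaustively and is only usable on tiny examples; it does not reproduce the pruning methods that make SUMMARY efficient. Function names are hypothetical.

```python
from itertools import combinations


def support(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def summary_itemsets(transactions, min_sup):
    """For each transaction, a longest frequent itemset it contains (ties: higher support)."""
    db = [frozenset(t) for t in transactions]
    result = []
    for t in db:
        best, best_sup = frozenset(), 0
        for size in range(len(t), 0, -1):
            for combo in combinations(sorted(t), size):
                s = frozenset(combo)
                sup = support(s, db)
                if sup >= min_sup and sup > best_sup:
                    best, best_sup = s, sup
            if best:
                break  # longest length with a frequent itemset has been found
        result.append(best)
    return result


def cluster_by_summary(transactions, min_sup):
    """Group transaction indices that share the same summary itemset."""
    groups = {}
    for i, s in enumerate(summary_itemsets(transactions, min_sup)):
        groups.setdefault(s, []).append(i)
    return groups
```

Grouping transactions by their summary itemset gives a simple categorical clustering directly from the mined itemsets, without first materializing the full set of frequent itemsets.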

Extensive studies have shown that mining microarray data sets is important in bioinformatics research and biomedical applications. In this paper, we explore a novel type of gene–sample–time microarray data set that records the expression levels of various genes under a set of samples over a series of time points. In particular, we propose mining coherent gene clusters from such data sets. Each cluster contains a subset of genes and a subset of samples such that the genes are coherent on the samples along the time series. The coherent gene clusters may identify the samples corresponding to some phenotypes (e.g., diseases) and suggest candidate genes correlated with the phenotypes. We present two efficient algorithms, namely the Sample-Gene Search and the Gene-Sample Search, to mine the complete set of coherent gene clusters. We empirically evaluate the performance of our approaches on both a real microarray data set and synthetic data sets. The test results show that our approaches are both efficient and effective at finding meaningful coherent gene clusters. Daxin Jiang received the Ph.D. degree in computer science and engineering from the State University of New York at Buffalo in 2005. He received the B.S. degree in computer science from the University of Science and Technology of China. From 1998 to 2000, he was an M.S. student at the Software Institute, Chinese Academy of Sciences. He is currently an assistant professor at the School of Computer Engineering, Nanyang Technological University, Singapore. His research interests include data mining, bioinformatics, machine learning, and information retrieval. Jian Pei received the Ph.D. degree in computing science from Simon Fraser University, Canada, in 2002, under Dr. Jiawei Han's supervision. He also received the B.Eng. and the M.Eng. degrees from Shanghai Jiao Tong University, China, in 1991 and 1993, respectively, both in Computer Science.
He is currently an assistant professor of computing science at Simon Fraser University. His research interests include developing effective and efficient data analysis techniques for novel data-intensive applications. He is currently interested in various techniques of data mining, data warehousing, online analytical processing, and database systems, as well as their applications in bioinformatics. His current research is supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the National Science Foundation (NSF) of the United States. Since 2000, he has published over 70 research papers in refereed journals, conferences, and workshops, has served on the organization committees and program committees of over 60 international conferences and workshops, and has been a reviewer for several leading academic journals. He is a member of the ACM, the ACM SIGMOD, and the ACM SIGKDD. Murali Ramanathan is an associate professor of pharmaceutical sciences and neurology. He received the B.Tech. (Honors) in chemical engineering from the Indian Institute of Technology, India, in 1983. After a 4-year stint in the chemical industry, he obtained the M.S. degree in chemical engineering from Iowa State University, Ames, IA, in 1987, and the Ph.D. degree in bioengineering from the University of California-San Francisco and University of California-Berkeley Joint Program in Bioengineering in 1994. Dr. Ramanathan's research interests are primarily focused on the treatment of multiple sclerosis (MS), an inflammatory-demyelinating disease of the central nervous system that affects over 1 million patients worldwide. MS is a complex, variable disease that causes physical and cognitive disability, and nearly 50% of patients diagnosed with MS are unable to walk after 15 years. The etiology and pathogenesis of MS remain poorly understood. Dr.
Ramanathan's research interests include stochastic modeling of pharmaceutical systems and novel approaches to analyzing and using genetic and genomic data for improving patient care and optimizing therapy. Chuan Lin is currently a Ph.D. student in the Department of Computer Science and Engineering, State University of New York at Buffalo. She received the B.E. and the M.S. degrees in computer science and technology from Tsinghua University in China. Her research interests include bioinformatics, data mining, and machine learning. Chun Tang received the B.S. and M.S. degrees from Peking University, China, in 1996 and 1999, respectively, and the Ph.D. degree from State University of New York at Buffalo, USA, in 2005, all in computer science. Currently, she is a postdoctoral associate at the Center for Medical Informatics, Yale University. Her research interests include bioinformatics, data mining, machine learning, databases, and information retrieval. Aidong Zhang received the Ph.D. degree in computer science from Purdue University, West Lafayette, Indiana, in 1994. She was an assistant professor from 1994 to 1999, an associate professor from 1999 to 2002, and has been a professor since 2002 in the Department of Computer Science and Engineering at State University of New York at Buffalo. Her research interests include multimedia systems, content-based image retrieval, bioinformatics, and data mining. She is an author of over 140 research publications in these areas. Dr. Zhang's research has been funded by NSF, NIH, NIMA, and Xerox. Dr. Zhang serves on the editorial boards of the International Journal of Bioinformatics Research and Applications (IJBRA), ACM Multimedia Systems, the International Journal of Multimedia Tools and Applications, and the International Journal of Distributed and Parallel Databases. She was the editor for ACM SIGMOD DiSC (Digital Symposium Collection) from 2001 to 2003. She was co-chair of the technical program committee for ACM Multimedia in 2001.
She has also served on various conference program committees. Dr. Zhang is a recipient of the National Science Foundation CAREER award and the SUNY Chancellor's Research Recognition award.
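A naive version of the sample-first search for coherent gene clusters can be sketched as follows. It assumes that a gene is coherent on a set of samples when its time series on every pair of those samples has Pearson correlation at least `delta`; this coherence test, the thresholds and the exhaustive enumeration of sample subsets are illustrative assumptions, not the paper's definitions or its efficient pruning.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length time series (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def gene_coherent(series_by_sample, samples, delta):
    """A gene is coherent on `samples` if all pairwise time-series correlations reach delta."""
    return all(pearson(series_by_sample[s], series_by_sample[t]) >= delta
               for s, t in combinations(samples, 2))


def sample_gene_search(data, min_samples, min_genes, delta):
    """data: gene -> list (indexed by sample) of time series.
    Enumerate sample subsets, largest first, and keep those with enough coherent genes."""
    n_samples = len(next(iter(data.values())))
    clusters = []
    for size in range(n_samples, min_samples - 1, -1):
        for samples in combinations(range(n_samples), size):
            genes = [g for g, series in data.items() if gene_coherent(series, samples, delta)]
            if len(genes) >= min_genes:
                clusters.append((samples, genes))
    return clusters
```

The enumeration is exponential in the number of samples, which is exactly why the published algorithms need pruning; the sketch only fixes what a coherent (samples, genes) pair means.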

Privacy preservation is a major concern in the application of data mining techniques to datasets containing personal, sensitive, or confidential information. Data distortion is a critical component of privacy preservation in security-related data mining applications, such as data mining-based terrorist analysis systems. We propose a sparsified Singular Value Decomposition (SVD) method for data distortion. We also put forth a few metrics to measure the difference between the distorted dataset and the original dataset, and the degree of privacy protection. Our experimental results using synthetic and real-world datasets show that the sparsified SVD method works well in preserving privacy as well as maintaining the utility of the datasets. Shuting Xu received her PhD in Computer Science from the University of Kentucky in 2005. Dr. Xu is presently an Assistant Professor in the Department of Computer Information Systems at Virginia State University. Her research interests include data mining and information retrieval, database systems, and parallel and distributed computing. Jun Zhang received a PhD from The George Washington University in 1997. He is an Associate Professor of Computer Science and Director of the Laboratory for High Performance Scientific Computing & Computer Simulation and the Laboratory for Computational Medical Imaging & Data Analysis at the University of Kentucky. His research interests include computational neuroinformatics, data mining and information retrieval, large-scale parallel and scientific computing, numerical simulation, and iterative and preconditioning techniques for large-scale matrix computation. Dr. Zhang is an associate editor and on the editorial boards of four international journals in computer simulation and computational mathematics, and is on the program committees of a few international conferences. His research work has been funded by the U.S. National Science Foundation and the Department of Energy. He is a recipient of the U.S.
National Science Foundation CAREER Award and several other awards. Dianwei Han received an M.E. degree from Beijing Institute of Technology, Beijing, China, in 1995. From 1995 to 1998, he worked for a Hitachi company (BHH) in Beijing, China. He received an MS degree from Lamar University, USA, in 2003. He is currently a PhD student in the Department of Computer Science, University of Kentucky, USA. His research interests include data mining and information retrieval, computational medical imaging analysis, and artificial intelligence. Jie Wang received a master's degree in Industrial Automation from Beijing University of Chemical Technology in 1996. She is currently a PhD student and a member of the Laboratory for High Performance Computing and Computer Simulation in the Department of Computer Science at the University of Kentucky, USA. Her research interests include data mining and knowledge discovery, information filtering and retrieval, inter-organizational collaboration mechanisms, and intelligent e-Technology.
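A minimal sketch of SVD-based distortion with sparsification, using NumPy: keep the top `k` singular triplets, zero out entries of the truncated singular-vector matrices whose magnitude falls below a threshold `eps`, and rebuild the matrix. The relative Frobenius-norm difference shown is one plausible "value difference" style metric; the parameter names and this exact metric are assumptions, not necessarily the paper's definitions.

```python
import numpy as np


def sparsified_svd(A, k, eps):
    """Rank-k SVD reconstruction of A after dropping small entries of U_k and V_k^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]
    # Sparsify: entries with magnitude below eps are set to zero
    Uk = np.where(np.abs(Uk) < eps, 0.0, Uk)
    Vtk = np.where(np.abs(Vtk) < eps, 0.0, Vtk)
    return Uk @ np.diag(sk) @ Vtk


def value_difference(A, A_bar):
    """Relative Frobenius-norm difference between original and distorted data."""
    return np.linalg.norm(A - A_bar) / np.linalg.norm(A)
```

Smaller `k` and larger `eps` distort the data more (stronger protection) at the cost of utility, so the two parameters trade privacy against accuracy of later mining.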

Building fast and accurate classifiers for large-scale databases is an important task in data mining. There is growing evidence that integrating classification and association rule mining can produce more efficient and accurate classifiers than traditional techniques. In this paper, we investigate the problem of producing rules with multiple labels and propose a multi-class, multi-label associative classification approach (MMAC). In addition, four measures are presented for evaluating the accuracy of classification approaches on a wide range of traditional and multi-label classification problems. Results for 19 different data sets from the UCI data collection and nine hyperheuristic scheduling runs show that the proposed approach is an accurate and effective classification technique, highly competitive and scalable when compared with other traditional and associative classification approaches. Fadi Abdeljaber Thabtah received a B.S. degree in Computer Science from Philadelphia University, Jordan, in 1997 and an M.S. degree in Computer Science from California State University, USA, in 2001. From 1996 to 2001, he worked as a database programming and administration professional at United Insurance Ltd. in Amman. In 2002, he started his academic career and joined Philadelphia University as a lecturer. He is currently a final-year graduate student in the Department of Computer Science, Bradford University, UK. He has published about seven scientific papers in the areas of data mining and machine learning. His research interests include machine learning, data mining, artificial intelligence and object-oriented databases. Peter Cowling is a Professor of Computing at the University of Bradford. He obtained M.A. and D.Phil. degrees from the University of Oxford.
He leads the Modelling Optimisation Scheduling And Intelligent Control (MOSAIC) research centre (http://mosaic.ac), whose main research interests lie in the investigation and development of new modelling, optimisation, control and decision support technologies that bridge the gap between theory and practice. Applications include production and personnel scheduling, intelligent game agents and data mining. He has published over 40 scientific papers in these areas and is active as a consultant to industry. Yonghong Peng's research areas include machine learning, data mining, and bioinformatics. He has published more than 35 scientific papers in related areas. Dr. Peng is a member of the IEEE and the IEEE Computer Society, and has been a member of the programme committees of several conferences and workshops. Dr. Peng referees papers for several journals, including the IEEE Trans. on Systems, Man and Cybernetics (Part C), IEEE Trans. on Evolutionary Computation, Journal of Fuzzy Sets and Systems, Journal of Bioinformatics, and Journal of Data Mining and Knowledge Discovery, and for several conferences.
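The notion of a rule that predicts several labels can be illustrated with a small brute-force sketch: for every short attribute-value body, all labels that pass minimum support and confidence are kept, ranked by frequency, and a test instance receives the ranked label list of the first matching rule. The rule-ranking order (more specific bodies first, then higher top-label confidence), the absolute-count support and all names are assumptions for illustration, not MMAC's actual procedure.

```python
from collections import Counter, defaultdict
from itertools import combinations


def mine_multilabel_rules(data, min_sup, min_conf, max_len=2):
    """data: list of (set_of_attribute_values, label).
    Returns rules (body, [(label, confidence), ...]) with labels ranked by frequency."""
    body_count = Counter()
    body_labels = defaultdict(Counter)
    for items, label in data:
        for r in range(1, max_len + 1):
            for body in combinations(sorted(items), r):
                body_count[body] += 1
                body_labels[body][label] += 1
    rules = []
    for body, n in body_count.items():
        labels = [(lab, c / n) for lab, c in body_labels[body].most_common()
                  if c >= min_sup and c / n >= min_conf]
        if labels:
            rules.append((frozenset(body), labels))
    # Hypothetical ranking: more specific bodies first, then higher top-label confidence
    rules.sort(key=lambda rule: (-len(rule[0]), -rule[1][0][1]))
    return rules


def classify(rules, items):
    """Ranked label list of the first rule whose body the instance satisfies."""
    for body, labels in rules:
        if body <= items:
            return [lab for lab, _ in labels]
    return []
```

A body that is followed by two labels with similar confidence thus yields a two-label prediction instead of discarding the weaker label, which is the situation the multi-label approach targets.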

The complete behavior of a communication protocol can be very large. It is worth investigating whether partial exploration of the behavior generates reasonable results. We present such a procedure, which performs partial exploration using most-probable-first search. Some of the ideas used in this procedure are based on a convolutional decoding procedure due to Jelinek and a performance evaluation procedure due to Rudin. Multiple trees of protocol behavior are constructed. Some results on estimating the probability of encountering an unexplored state in a finite run of a protocol are also presented. Nicholas F. Maxemchuk received the B.S.E.E. degree from the City College of New York, NY, and the M.S.E.E. and Ph.D. degrees from the University of Pennsylvania, Philadelphia. He is the Head of the Distributed Systems Research Department at AT&T Bell Laboratories, Murray Hill, NJ, and has been at AT&T Bell Laboratories since 1976. Prior to joining Bell Laboratories he was at the RCA David Sarnoff Research Center in Princeton, NJ, for eight years. Dr. Maxemchuk has been on the adjunct faculties of Columbia University and the University of Pennsylvania. He has been an advisor to the United Nations on data networking and has served on networking panels for the US Air Force and DARPA. He has served as the Editor for Data Communications for the IEEE Transactions on Communications, as a Guest Editor for the IEEE Journal on Selected Areas in Communications, and on the program committees of numerous conferences and workshops. He was awarded the RCA Laboratories Outstanding Achievement Award, the Bell Laboratories Distinguished Technical Staff Award, and the IEEE's 1985 and 1987 Leonard G. Abraham Prize Paper Awards. Krishan Sabnani received a BSEE degree from the Indian Institute of Technology, New Delhi, India, and a PhD degree from Columbia University, New York, NY. In 1981, he joined AT&T Bell Laboratories after graduating from Columbia University.
He is currently working in the Distributed Systems Research Department of AT&T Bell Laboratories. His major area of interest is communication protocols. Dr. Sabnani was a co-chairman of the Eighth International Symposium on Protocol Specification, Testing, and Verification, held in Atlantic City, NJ, in June 1988. He is currently an editor of the IEEE Transactions on Communications and of the IEEE Transactions on Computers. He has served on the program committees of several conferences. He is also a guest editor of two special issues, one of the Journal on Selected Areas in Communications (JSAC) and one of the Computer Networks and ISDN Systems Journal.
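A generic most-probable-first exploration can be sketched with a priority queue keyed on path probability: the frontier state reached by the most probable path is always expanded next, and exploration stops after a fixed budget of states. Because probabilities never exceed 1, path probability can only shrink along a path, so the first time a state is popped its most probable path has been found (the max-product analogue of Dijkstra's algorithm). The transition-table format and names are hypothetical, and this is not the Jelinek-style procedure or the multiple-tree construction of the paper.

```python
import heapq
import itertools


def most_probable_first(transitions, start, max_states):
    """transitions: dict state -> list of (next_state, probability).
    Returns dict state -> probability of the most probable path found to it,
    in exploration order, visiting at most `max_states` states."""
    tie = itertools.count()  # tie-breaker so states themselves never need comparing
    heap = [(-1.0, next(tie), start)]
    explored = {}
    while heap and len(explored) < max_states:
        neg_p, _, state = heapq.heappop(heap)
        if state in explored:
            continue  # already reached by a more probable path
        explored[state] = -neg_p
        for nxt, p in transitions.get(state, []):
            if nxt not in explored:
                heapq.heappush(heap, (neg_p * p, next(tie), nxt))
    return explored
```

With a small budget, only the likeliest part of the behavior is visited, which is the trade-off the abstract proposes for protocols whose complete behavior is too large to enumerate.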

Advances in wireless and mobile computing environments allow a mobile user to access a wide range of applications. For example, mobile users may want to retrieve data about unfamiliar places or local lifestyles related to their location. These queries are called location-dependent queries. Furthermore, a mobile user may be interested in getting the query results repeatedly, which is called location-dependent continuous querying. A continuous query issued by a mobile user may retrieve information from a single zone (single-ZQ) or from multiple neighbouring zones (multiple-ZQ). We consider the problem of handling location-dependent continuous queries, with the main emphasis on reducing communication costs and ensuring that the user gets the correct current query result. The key contributions of this paper include: (1) proposing a hierarchical database framework (a tree architecture and a supporting continuous query algorithm) for handling location-dependent continuous queries; (2) analysing the flexibility of this framework for handling single-ZQ and multiple-ZQ queries, and proposing intelligent selective placement of location-dependent databases; (3) proposing an intelligent selective replication algorithm to facilitate time- and space-efficient processing of location-dependent continuous queries that retrieve single-ZQ information; and (4) demonstrating, using simulation, the significance of our intelligent selective placement and selective replication model in terms of communication cost and storage constraints, considering various types of queries. Manish Gupta received his B.E. degree in Electrical Engineering from Govindram Sakseria Institute of Technology & Sciences, India, in 1997 and his M.S. degree in Computer Science from the University of Texas at Dallas in 2002. He is currently working toward his Ph.D. degree in the Department of Computer Science at the University of Texas at Dallas. His current research focuses on AI-based software synthesis and testing.
His other research interests include mobile computing, aspect-oriented programming and model checking. Manghui Tu received a Bachelor of Science degree from Wuhan University, P.R. China, in 1996, and a Master's degree in Computer Science from the University of Texas at Dallas in 2001. He is currently working toward the Ph.D. degree in the Department of Computer Science at the University of Texas at Dallas. Mr. Tu's research interests include distributed systems, wireless communications, mobile computing, and reliability and performance analysis. His Ph.D. research focuses on dependable and secure data replication and placement in network-centric systems. Latifur R. Khan has been an Assistant Professor in the Computer Science Department at the University of Texas at Dallas since September 2000. He received his Ph.D. and M.S. degrees in Computer Science from the University of Southern California (USC) in August 2000 and December 1996, respectively. He obtained his B.Sc. degree in Computer Science and Engineering from Bangladesh University of Engineering and Technology, Dhaka, Bangladesh, in November 1993. Professor Khan is currently supported by grants from the National Science Foundation (NSF), Texas Instruments and Alcatel USA, and has been awarded the Sun Equipment Grant. Dr. Khan has published more than 50 articles, book chapters and conference papers in the areas of database systems, multimedia information management, and data mining in bioinformatics and intrusion detection. Professor Khan has also served as a referee for database journals and conferences (e.g.
IEEE TKDE, KAIS, ADL, VLDB). He is currently a program committee member for the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD 2005), the 14th ACM Conference on Information and Knowledge Management (CIKM 2005), the International Conference on Database and Expert Systems Applications (DEXA 2005) and the International Conference on Cooperative Information Systems (CoopIS 2005), and he served as program chair of the ACM SIGKDD International Workshop on Multimedia Data Mining, 2004. Farokh Bastani received the B.Tech. degree in Electrical Engineering from the Indian Institute of Technology, Bombay, and the M.S. and Ph.D. degrees in Computer Science from the University of California, Berkeley. He is currently a Professor of Computer Science at the University of Texas at Dallas. Dr. Bastani's research interests include various aspects of ultrahigh-dependability systems, especially automated software synthesis and testing, embedded real-time process-control and telecommunications systems, and high-assurance systems engineering. Dr. Bastani was the Editor-in-Chief of the IEEE Transactions on Knowledge and Data Engineering (IEEE-TKDE). He is currently an emeritus EIC of IEEE-TKDE and is on the editorial boards of the International Journal of Artificial Intelligence Tools, the International Journal of Knowledge and Information Systems and the Springer-Verlag series on Knowledge and Information Management. He was the program co-chair of the 1997 IEEE Symposium on Reliable Distributed Systems, the 1998 IEEE International Symposium on Software Reliability Engineering, the 1999 IEEE Knowledge and Data Engineering Workshop and the 1999 International Symposium on Autonomous Decentralised Systems, and the program chair of the 1995 IEEE International Conference on Tools with Artificial Intelligence.
He has been on the program and steering committees of several conferences and workshops and on the editorial boards of the IEEE Transactions on Software Engineering, the IEEE Transactions on Knowledge and Data Engineering and the Oxford University Press High Integrity Systems Journal. I-Ling Yen received her B.S. degree from Tsing-Hua University, Taiwan, and her M.S. and Ph.D. degrees in Computer Science from the University of Houston. She is currently an Associate Professor of Computer Science at the University of Texas at Dallas. Dr. Yen's research interests include fault-tolerant computing, security systems and algorithms, distributed systems, Internet technologies, e-commerce and self-stabilising systems. She has published over 100 technical papers in these research areas and has received many research awards from NSF, DOD, NASA and several companies. She has served as a program committee member for many conferences and as Program Chair/Co-Chair for the IEEE Symposium on Application-Specific Software and System Engineering & Technology, the IEEE High Assurance Systems Engineering Symposium, the IEEE International Computer Software and Applications Conference, and the IEEE International Symposium on Autonomous Decentralized Systems. She has also served as a guest editor for a theme issue of IEEE Computer devoted to high-assurance systems.
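The location-dependent continuous query abstract above describes a tree-structured hierarchy of location-dependent databases but gives no algorithmic detail. The following is a minimal sketch under stated assumptions. Zone servers are assumed to be the leaves of a tree stored as a child-to-parent dict, and communication cost is assumed to be the number of hops up the tree. Under those assumptions, each query is routed to the lowest node whose subtree covers all requested zones. `ancestors`, `route_query` and the zone names are hypothetical; this is not the authors' algorithm.

```python
def ancestors(node, parent):
    """Return [node, parent, grandparent, ..., root] for a tree stored as a child->parent dict."""
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain


def route_query(user_zone, target_zones, parent):
    """Climb from the user's zone server to the lowest ancestor whose subtree
    covers every target zone (it always covers the user's own zone).

    Returns (serving_node, hops). A single-zone query on the user's own zone
    is answered locally with 0 hops; a multiple-zone query climbs further.
    """
    target_chains = [set(ancestors(z, parent)) for z in target_zones]
    for hops, node in enumerate(ancestors(user_zone, parent)):
        if all(node in chain for chain in target_chains):
            return node, hops
    raise ValueError("target zones are not in the same tree as the user's zone")


# Hypothetical three-level hierarchy: city -> regions -> zone servers
parent = {"z1": "north", "z2": "north", "z3": "south",
          "north": "city", "south": "city"}
print(route_query("z1", ["z1"], parent))        # single-ZQ: ('z1', 0)
print(route_query("z1", ["z1", "z2"], parent))  # neighbouring zones: ('north', 1)
print(route_query("z1", ["z3"], parent))        # distant zone: ('city', 2)
```

In this toy model, a query confined to the user's own zone never leaves the leaf server, while queries spanning more distant zones must be served higher in the tree. That is the communication-cost trade-off the abstract's selective placement and replication aim to reduce.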

This paper introduces new genetic algorithm operators and demonstrates their effectiveness on the traveling salesman problem (TSP) and on microarray gene ordering. The new operators are a nearest fragment operator, based on the nearest-neighbor heuristic, and a modified version of the order crossover operator. These operators make genetic algorithms (GAs) converge faster when searching for the optimal order of genes in a microarray or of cities in the TSP. In addition, the nearest fragment operator can augment the search space quickly and thus obtain much better results than other heuristics. The appropriate number of fragments for the nearest fragment operator and the appropriate substring length, in terms of the number of cities/genes, for the modified order crossover operator are determined systematically. In terms of biological scores computed from gene categorization, the gene order produced by the proposed method is found to be superior to that of other related methods based on GAs, neural networks and clustering. Shubhra Sankar Ray is a Visiting Research Fellow at the Center for Soft Computing Research: A National Facility, Indian Statistical Institute, Kolkata, India. He received the M.Sc. degree in Electronic Science and the M.Tech. degree in Radiophysics & Electronics from the University of Calcutta, Kolkata, India, in 2000 and 2002, respectively. Until March 2006, he was a Senior Research Fellow of the Council of Scientific and Industrial Research (CSIR), New Delhi, India, working at the Machine Intelligence Unit, Indian Statistical Institute, India. His research interests include bioinformatics, evolutionary computation, neural networks, and data mining. Sanghamitra Bandyopadhyay is an Associate Professor at the Indian Statistical Institute, Calcutta, India. She received her Bachelor's degrees in Physics and Computer Science in 1988 and 1992, respectively.
Subsequently, she received her Master's degree in Computer Science from the Indian Institute of Technology (IIT), Kharagpur, in 1994 and her Ph.D. in Computer Science from the Indian Statistical Institute, Calcutta, in 1998. She has worked at Los Alamos National Laboratory, Los Alamos, USA, in 1997 as a graduate research assistant; at the University of New South Wales, Sydney, Australia, in 1999 as a postdoctoral fellow; in the Department of Computer Science and Engineering, University of Texas at Arlington, USA, in 2001 as a faculty member and researcher; and in the Department of Computer Science and Engineering, University of Maryland Baltimore County, USA, in 2004 as a visiting research faculty member. Dr. Bandyopadhyay is the first recipient of the Dr. Shanker Dayal Sharma Gold Medal and the Institute Silver Medal for being adjudged the best all-round postgraduate performer at IIT Kharagpur in 1994. She received the Indian National Science Academy (INSA) and the Indian Science Congress Association (ISCA) Young Scientist Awards in 2000, as well as the Indian National Academy of Engineering (INAE) Young Engineers' Award in 2002. She has published over ninety articles in international journals, conference and workshop proceedings, edited books and journal special issues. She served as the Program Co-Chair of the 1st International Conference on Pattern Recognition and Machine Intelligence, 2005, Kolkata, India, and as the Tutorial Co-Chair of the World Congress on Lateral Computing, 2004, Bangalore, India. She is on the editorial board of the International Journal on Computational Intelligence. Her research interests include Evolutionary and Soft Computation, Pattern Recognition, Data Mining, Bioinformatics, Parallel & Distributed Systems and VLSI. Sankar K. Pal (www.isical.ac.in/∼sankar) is the Director and Distinguished Scientist of the Indian Statistical Institute. He founded the Machine Intelligence Unit and the Center for Soft Computing Research: A National Facility at the Institute in Calcutta.
He received a Ph.D. in Radio Physics and Electronics from the University of Calcutta in 1979, and another Ph.D. in Electrical Engineering along with the DIC from Imperial College, University of London, in 1982. He worked at the University of California, Berkeley and the University of Maryland, College Park in 1986-87; at the NASA Johnson Space Center, Houston, Texas in 1990-92 and 1994; and at the US Naval Research Laboratory, Washington DC in 2004. Since 1997 he has served as a Distinguished Visitor of the IEEE Computer Society (USA) for the Asia-Pacific Region, and he has held several visiting positions in Hong Kong and Australian universities. Prof. Pal is a Fellow of the IEEE, USA; the Third World Academy of Sciences, Italy; the International Association for Pattern Recognition, USA; and all four National Academies for Science/Engineering in India. He is a co-author of thirteen books and about three hundred research publications in the areas of Pattern Recognition and Machine Learning, Image Processing, Data Mining and Web Intelligence, Soft Computing, Neural Nets, Genetic Algorithms, Fuzzy Sets, Rough Sets, and Bioinformatics. He has received the 1990 S.S. Bhatnagar Prize (the most coveted award for a scientist in India). He has also received many prestigious awards in India and abroad, including the 1999 G.D. Birla Award, 1998 Om Bhasin Award, 1993 Jawaharlal Nehru Fellowship, 2000 Khwarizmi International Award from the Islamic Republic of Iran, 2000–2001 FICCI Award, 1993 Vikram Sarabhai Research Award, 1993 NASA Tech Brief Award (USA), 1994 IEEE Trans. Neural Networks Outstanding Paper Award (USA), 1995 NASA Patent Application Award (USA), 1997 IETE-R.L. Wadhwa Gold Medal, the 2001 INSA-S.H. Zaheer Medal, and the 2005-06 P.C. Mahalanobis Birth Centenary Award (Gold Medal) for Lifetime Achievement. Prof. Pal is an Associate Editor of IEEE Trans. Pattern Analysis and Machine Intelligence, IEEE Trans.
Neural Networks (1994–98, 2003–06), Pattern Recognition Letters, Neurocomputing (1995–2005), Applied Intelligence, Information Sciences, Fuzzy Sets and Systems, Fundamenta Informaticae, Int. J. Computational Intelligence and Applications, and Proc. INSA-A. He is also a Member of the Executive Advisory Editorial Board of IEEE Trans. Fuzzy Systems, Int. Journal on Image and Graphics, and Int. Journal of Approximate Reasoning, and a Guest Editor of IEEE Computer.
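The genetic algorithm abstract above names two operators without defining them precisely. The following Python sketch shows two things. The first is the standard order crossover (OX) that the paper's operator modifies; the paper's specific modification is not described in the abstract and is not reproduced. The second is one plausible reading of a nearest fragment operator, assumed here to cut a tour into k fragments and reconnect them with the nearest-neighbor heuristic. The function names, the explicit cut points `i`/`j` and the fragment-joining rule are assumptions, not the authors' code.

```python
def order_crossover(p1, p2, i, j):
    """Standard order crossover (OX): keep p1[i:j], then fill the remaining
    positions with p2's genes in p2's order, starting after cut point j and wrapping."""
    n = len(p1)
    child = [None] * n
    child[i:j] = p1[i:j]
    kept = set(p1[i:j])
    fill = [p2[(j + k) % n] for k in range(n) if p2[(j + k) % n] not in kept]
    for k, gene in enumerate(fill):
        child[(j + k) % n] = gene
    return child


def nearest_fragment(tour, k, dist):
    """Assumed nearest fragment operator: cut `tour` into at most k fragments and
    reconnect them greedily. Starting from the first fragment, repeatedly append
    the remaining fragment whose nearer endpoint is closest to the current tail,
    reversing the fragment when its last city is the nearer endpoint."""
    size = -(-len(tour) // k)  # ceiling division, so there are at most k fragments
    frags = [tour[s:s + size] for s in range(0, len(tour), size)]
    result = list(frags.pop(0))
    while frags:
        tail = result[-1]
        best = None  # (distance, fragment index, reverse?)
        for idx, frag in enumerate(frags):
            for reverse, endpoint in ((False, frag[0]), (True, frag[-1])):
                d = dist(tail, endpoint)
                if best is None or d < best[0]:
                    best = (d, idx, reverse)
        _, idx, reverse = best
        frag = frags.pop(idx)
        result.extend(reversed(frag) if reverse else frag)
    return result


p1 = [1, 2, 3, 4, 5, 6, 7, 8]
p2 = [8, 7, 6, 5, 4, 3, 2, 1]
print(order_crossover(p1, p2, 2, 5))  # [7, 6, 3, 4, 5, 2, 1, 8]

# Cities on a line, so distance is |a - b|. With k equal to the number of
# cities, the operator reduces to a nearest-neighbor tour from the first city.
print(nearest_fragment([0, 3, 1, 2], 4, lambda a, b: abs(a - b)))  # [0, 1, 2, 3]
```

Both functions always return a permutation of their input, which is what makes them usable as GA operators on tour-encoded chromosomes for either cities or genes. The number of fragments `k` plays the role of the tunable "number of fragments" parameter mentioned in the abstract.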

Classification is an important technique in data mining. The decision trees built by most existing classification algorithms commonly suffer from over-branching, which leads to poor efficiency in the subsequent classification phase. In this paper, we present a new value-oriented classification method based on the concepts of frequent-pattern-node and exceptive-child-node. The method aims to build accurate, properly sized decision trees while reducing over-branching as much as possible. The experiments show that, with relevance analysis as pre-processing, our classification method largely eliminates over-branching in decision trees, more effectively and efficiently than other algorithms and without loss of accuracy.
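The abstract above does not define frequent-pattern-node or exceptive-child-node. The following is therefore only a hypothetical Python illustration of how value-oriented splitting can cut branching. Instead of one branch per attribute value, values whose majority class matches the node's majority class are merged into one shared branch (standing in for a frequent-pattern node). Only values with a different majority class get their own branch (standing in for exceptive child nodes). The function `value_oriented_split` and this grouping rule are assumptions, not the paper's method.

```python
from collections import Counter, defaultdict


def value_oriented_split(rows, attr, label):
    """Hypothetical value-oriented split of `rows` on attribute `attr`.

    Returns (node_majority, shared_values, exceptions):
    - shared_values: values whose majority class equals the node's majority
      class, merged into a single branch (stand-in for a frequent-pattern node);
    - exceptions: {value: majority class} for values that disagree, each kept
      as its own branch (stand-in for exceptive child nodes).
    """
    node_major = Counter(r[label] for r in rows).most_common(1)[0][0]
    by_value = defaultdict(list)
    for r in rows:
        by_value[r[attr]].append(r[label])
    shared, exceptions = [], {}
    for value, labels in by_value.items():
        major = Counter(labels).most_common(1)[0][0]
        if major == node_major:
            shared.append(value)
        else:
            exceptions[value] = major
    return node_major, sorted(shared), exceptions


rows = [
    {"color": "red", "y": "yes"}, {"color": "red", "y": "yes"},
    {"color": "blue", "y": "yes"}, {"color": "green", "y": "yes"},
    {"color": "black", "y": "no"}, {"color": "black", "y": "no"},
]
print(value_oriented_split(rows, "color", "y"))
# ('yes', ['blue', 'green', 'red'], {'black': 'no'})
```

A conventional multiway split on `color` would create four branches here; the grouped split needs only two (one shared branch plus one exception), which is the kind of over-branching reduction the abstract targets.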

