Similar literature
1.
The study of database technologies, or more generally, the technologies of data and information management, is an important and active research field. Recently, many exciting results have been reported. In this fast-growing field, Chinese researchers play increasingly active roles. Research papers from Chinese scholars, both in China and abroad, appear in prestigious academic forums. In this paper, we, nine young Chinese researchers working in the United States, present concise surveys and report our recent progress on the selected fields that we are working on. Although the paper covers only a small number of topics and the selection of the topics is far from balanced, we hope that such an effort will attract more researchers, especially those in China, to enter the frontiers of database research and promote collaborations. For the obvious reason, the authors are listed alphabetically, while the sections are arranged in the order of the author list.

2.
Extensive studies have shown that mining microarray data sets is important in bioinformatics research and biomedical applications. In this paper, we explore a novel type of gene–sample–time microarray data sets that records the expression levels of various genes under a set of samples during a series of time points. In particular, we propose the mining of coherent gene clusters from such data sets. Each cluster contains a subset of genes and a subset of samples such that the genes are coherent on the samples along the time series. The coherent gene clusters may identify the samples corresponding to some phenotypes (e.g., diseases), and suggest the candidate genes correlated to the phenotypes. We present two efficient algorithms, namely the Sample-Gene Search and the Gene-Sample Search, to mine the complete set of coherent gene clusters. We empirically evaluate the performance of our approaches on both a real microarray data set and synthetic data sets. The test results show that our approaches are both efficient and effective in finding meaningful coherent gene clusters. Daxin Jiang received the Ph.D. degree in computer science and engineering from the State University of New York at Buffalo in 2005. He received the B.S. degree in computer science from the University of Science and Technology of China. From 1998 to 2000, he was an M.S. student in the Software Institute, Chinese Academy of Sciences. He is currently an assistant professor at the School of Computer Engineering, Nanyang Technological University, Singapore. His research interests include data mining, bioinformatics, machine learning, and information retrieval. Jian Pei received the Ph.D. degree in computing science from Simon Fraser University, Canada, in 2002, under Dr. Jiawei Han's supervision. He also received the B.Eng. and the M.Eng. degrees from Shanghai Jiao Tong University, China, in 1991 and 1993, respectively, both in Computer Science. 
He is currently an assistant professor of computing science at Simon Fraser University. His research interests include developing effective and efficient data analysis techniques for novel data intensive applications. He is currently interested in various techniques of data mining, data warehousing, online analytical processing, and database systems, as well as their applications in bioinformatics. His current research is supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the National Science Foundation (NSF) of the United States. Since 2000, he has published over 70 research papers in refereed journals, conferences, and workshops, has served on the organization committees and the program committees of over 60 international conferences and workshops, and has been a reviewer for some leading academic journals. He is a member of the ACM, the ACM SIGMOD, and the ACM SIGKDD. Murali Ramanathan is an associate professor of pharmaceutical sciences and neurology. He received the B.Tech. (Honors) in chemical engineering from the Indian Institute of Technology, India, in 1983. After a 4-year stint in the chemical industry, he obtained the M.S. degree in chemical engineering from Iowa State University, Ames, IA, in 1987, and the Ph.D. degree in bioengineering from the University of California-San Francisco and University of California-Berkeley Joint Program in Bioengineering in 1994. Dr. Ramanathan's research interests are primarily focused on the treatment of multiple sclerosis (MS), an inflammatory-demyelinating disease of the central nervous system that affects over 1 million patients worldwide. MS is a complex, variable disease that causes physical and cognitive disability, and nearly 50% of patients diagnosed with MS are unable to walk after 15 years. The etiology and pathogenesis of MS remain poorly understood. Dr. 
Ramanathan's research interests include stochastic modeling of pharmaceutical systems and novel approaches to analyzing and using genetic and genomic data for improving patient care and optimizing therapy. Chuan Lin is currently a Ph.D. student in the Department of Computer Science and Engineering, State University of New York at Buffalo. She received the B.E. and the M.S. degrees in computer science and technology from Tsinghua University in China. Her research interests include bioinformatics, data mining, and machine learning. Chun Tang received the B.S. and M.S. degrees from Peking University, China, in 1996 and 1999, respectively, and the Ph.D. degree from State University of New York at Buffalo, USA, in 2005, all in computer science. Currently, she is a postdoctoral associate of Center for Medical Informatics, Yale University. Her research interests include bioinformatics, data mining, machine learning, database, and information retrieval. Aidong Zhang received the Ph.D. degree in computer science from Purdue University, West Lafayette, Indiana, in 1994. She was an assistant professor from 1994 to 1999, an associate professor from 1999 to 2002, and has been a professor since 2002 in the Department of Computer Science and Engineering at State University of New York at Buffalo. Her research interests include multimedia systems, content-based image retrieval, bioinformatics, and data mining. She is an author of over 140 research publications in these areas. Dr. Zhang's research has been funded by NSF, NIH, NIMA, and Xerox. Zhang serves on the editorial boards of International Journal of Bioinformatics Research and Applications (IJBRA), ACM Multimedia Systems, International Journal of Multimedia Tools and Applications, and International Journal of Distributed and Parallel Databases. She was the editor for ACM SIGMOD DiSC (Digital Symposium Collection) from 2001 to 2003. She was co-chair of the technical program committee for ACM Multimedia in 2001. 
She has also served on various conference program committees. Dr. Zhang is a recipient of the National Science Foundation CAREER award and SUNY Chancellor's Research Recognition award.

3.
In some business applications such as trading management in financial institutions, it is required to accurately answer ad hoc aggregate queries over data streams. Materializing and incrementally maintaining a full data cube or even its compression or approximation over a data stream is often computationally prohibitive. On the other hand, although previous studies proposed approximate methods for continuous aggregate queries, they cannot provide accurate answers. In this paper, we develop a novel prefix aggregate tree (PAT) structure for online warehousing of data streams and answering ad hoc aggregate queries. Often, a data stream can be partitioned into the historical segment, which is stored in a traditional data warehouse, and the transient segment, which can be stored in a PAT to answer ad hoc aggregate queries. The size of a PAT is linear in the size of the transient segment, and only one scan of the data stream is needed to create and incrementally maintain a PAT. Although query answering using a PAT costs more than with a fully materialized data cube, the query answering time is still kept linear in the size of the transient segment. Our extensive experimental results on both synthetic and real data sets illustrate the efficiency and the scalability of our design. Moonjung Cho is a Ph.D. candidate in the Department of Computer Science and Engineering at State University of New York at Buffalo. She obtained her Master's degree from the same university in 2003. She has four years of industry experience as an associate researcher. Her research interests are in the area of data mining, data warehousing and data cubing. She has received a full scholarship from the Institute of Information Technology Assessment in Korea. Jian Pei received the Ph.D. degree in Computing Science from Simon Fraser University, Canada, in 2002. He is currently an Assistant Professor of Computing Science at Simon Fraser University, Canada. 
In 2002–2004, he was an Assistant Professor of Computer Science and Engineering at the State University of New York at Buffalo, USA. His research interests can be summarized as developing advanced data analysis techniques for emerging applications. In particular, he is currently interested in various techniques of data mining, data warehousing, online analytical processing, and database systems, as well as their applications in bioinformatics. His current research is supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the National Science Foundation (NSF). He has published over 70 papers in refereed journals, conferences, and workshops, has served on the program committees of over 60 international conferences and workshops, and has been a reviewer for some leading academic journals. He is a member of the ACM, the ACM SIGMOD, and the ACM SIGKDD. Ke Wang received his Ph.D. from the Georgia Institute of Technology. He is currently a professor at the School of Computing Science, Simon Fraser University. Before joining Simon Fraser, he was an associate professor at the National University of Singapore. He has taught in the areas of database and data mining. Ke Wang's research interests include database technology, data mining and knowledge discovery, machine learning, and emerging applications, with recent interests focusing on the end use of data mining. This includes explicitly modeling the business goal (such as profit mining, bio-mining and web mining) and exploiting user prior knowledge (such as extracting unexpected patterns and actionable knowledge). He is interested in combining the strengths of various fields such as database, statistics, machine learning and optimization to provide actionable solutions to real life problems. Ke Wang has published in database, information retrieval, and data mining conferences, including SIGMOD, SIGIR, PODS, VLDB, ICDE, EDBT, SIGKDD, SDM and ICDM. 
He is an associate editor of the IEEE TKDE journal and has served on program committees for international conferences including DASFAA, ICDE, ICDM, PAKDD, PKDD, SIGKDD and VLDB.
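
The idea of keeping aggregates on shared prefixes, as in the PAT abstract above, can be illustrated with a small sketch. This is not the authors' PAT: the class `PrefixAggregateTrie`, its fixed dimension order, and the wildcard convention (`None`) are assumptions made only for illustration. Each node stores the SUM and COUNT of all tuples that share its prefix, one pass over the stream builds the structure, and a query walks only the matching branches.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class Node:
    total: float = 0.0   # SUM of the measure over tuples sharing this prefix
    count: int = 0       # COUNT of tuples sharing this prefix
    children: Dict[str, "Node"] = field(default_factory=dict)


class PrefixAggregateTrie:
    """Hypothetical prefix-aggregate structure (illustrative, not the paper's PAT)."""

    def __init__(self) -> None:
        self.root = Node()

    def insert(self, dims: Sequence[str], measure: float) -> None:
        # One pass per tuple: update the aggregate on every prefix of the tuple.
        node = self.root
        node.total += measure
        node.count += 1
        for value in dims:
            node = node.children.setdefault(value, Node())
            node.total += measure
            node.count += 1

    def query_sum(self, pattern: Sequence[Optional[str]]) -> float:
        # None means "any value" in that dimension. Trailing wildcards are
        # dropped so the stored prefix aggregate answers the query directly.
        trimmed: List[Optional[str]] = list(pattern)
        while trimmed and trimmed[-1] is None:
            trimmed.pop()
        return self._sum(self.root, trimmed, 0)

    def _sum(self, node: Node, pattern: List[Optional[str]], depth: int) -> float:
        if depth == len(pattern):
            return node.total
        wanted = pattern[depth]
        if wanted is None:
            # Wildcard in the middle: combine all children at this level.
            return sum(self._sum(c, pattern, depth + 1) for c in node.children.values())
        child = node.children.get(wanted)
        return 0.0 if child is None else self._sum(child, pattern, depth + 1)


trie = PrefixAggregateTrie()
trie.insert(("NYSE", "IBM", "buy"), 100.0)
trie.insert(("NYSE", "IBM", "sell"), 40.0)
trie.insert(("NASDAQ", "MSFT", "buy"), 25.0)
print(trie.query_sum(["NYSE", None, None]))   # total traded on NYSE
print(trie.query_sum([None, None, "buy"]))    # total of all buy orders
```

Queries that fix a leading prefix are answered from a single node, while wildcards in earlier dimensions force a wider traversal, which is one way to see why query time can stay bounded by the size of the stored segment.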

4.
In e-commerce application systems, customers are likely to issue requests based on out-of-date information, which can greatly increase transaction failure rates. In this paper, we present a preference update model to address this problem. A preference update is an extended SQL update statement where a user can request the desired number of target data items by specifying multiple preferences. Moreover, the preference update allows easy extraction of criteria from a set of concurrent requests and, hence, optimal decisions for the data assignments can be made. We propose a group evaluation strategy for preference update processing in a multidatabase environment. The experimental results show that the group evaluation can effectively increase the customer satisfaction level with acceptable cost. Peng Li is the Chief Software Architect of didiom LLC. Before that, he was a visiting assistant professor in the computer science department at Western Kentucky University. He received his Ph.D. degree in computer science from the University of Texas at Dallas. He also holds a B.Sc. and M.S. in Computer Science from the Renmin University of China. His research interests include database systems, database security, transaction processing, distributed and Internet computing, and E-commerce. Manghui Tu received a Bachelor of Science degree from Wuhan University, P.R. China in 1996, and a Master's degree in Computer Science from the University of Texas at Dallas in 2001. He is currently working toward the PhD degree in the Department of Computer Science at the University of Texas at Dallas. Mr. Tu’s research interests include distributed systems, grid computing, information security, mobile computing, and scientific computing. His PhD research work focuses on data management in secure and high performance data grids. He is a student member of the IEEE. 
I-Ling Yen received her BS degree from Tsing-Hua University, Taiwan, and her MS and PhD degrees in Computer Science from the University of Houston. She is currently an Associate Professor of Computer Science at the University of Texas at Dallas. Dr. Yen’s research interests include fault-tolerant computing, security systems and algorithms, distributed systems, Internet technologies, E-commerce, and self-stabilizing systems. She has published over 100 technical papers in these research areas and received many research awards from NSF, DOD, NASA, and several industry companies. She has served as Program Committee member for many conferences and Program Chair/Co-Chair for the IEEE Symposium on Application-Specific Software and System Engineering & Technology, IEEE High Assurance Systems Engineering Symposium, IEEE International Computer Software and Applications Conference, and IEEE International Symposium on Autonomous Decentralized Systems. She is a member of the IEEE. Zhonghang Xia received the B.S. degree in applied mathematics from Dalian University of Technology in 1990, the M.S. degree in Operations Research from Qufu Normal University in 1993, and the Ph.D. degree in computer science from the University of Texas at Dallas in 2004. He is now an assistant professor in the Department of Computer Science, Western Kentucky University, Bowling Green, KY. His research interests are in the area of multimedia computing and networking, distributed systems, and data mining.

5.
Efficient Incremental Maintenance of Frequent Patterns with FP-Tree
Mining frequent patterns has been widely studied in the data mining area. However, little work has been done on mining patterns when the database constantly receives an influx of fresh data. In these dynamic scenarios, efficient maintenance of the discovered patterns is crucial. Most existing methods need to scan the entire database repeatedly, which is an obvious disadvantage. In this paper, an efficient incremental mining algorithm, Incremental-Mining (IM), is proposed for maintaining the frequent patterns when new incremental data arrive. Based on the frequent pattern tree (FP-tree) structure, IM reuses as much as possible of the results of the previous mining process, and requires scanning the original data at most once. Furthermore, IM can directly identify the differential set of frequent patterns, which may be more informative to users. Moreover, IM can deal with changing thresholds as well as changing data, thus providing a full maintenance scheme. IM has been implemented and the performance study shows it outperforms three other incremental algorithms: FUP, DB-tree and re-running frequent pattern growth (FP-growth).
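
To make the FP-tree setting in the abstract above concrete, the following is a minimal sketch of a prefix tree of transactions that accepts new transactions without rescanning the old ones. It is not the IM algorithm: the classes `FPNode` and `FPTree` are hypothetical, and items are ordered lexicographically (an assumption) instead of by descending frequency, so that the order stays stable when new data arrive.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


class FPNode:
    def __init__(self, item: Optional[str], parent: Optional["FPNode"]) -> None:
        self.item = item                 # None for the root
        self.parent = parent
        self.count = 0                   # transactions passing through this node
        self.children: Dict[str, "FPNode"] = {}


class FPTree:
    """Illustrative FP-tree with a header table; assumes lexicographic item order."""

    def __init__(self) -> None:
        self.root = FPNode(None, None)
        # Header table: item -> all tree nodes that carry that item.
        self.header: Dict[str, List[FPNode]] = defaultdict(list)

    def insert(self, transaction: Iterable[str]) -> None:
        # Shared prefixes are merged; only counts along one path change,
        # so an incremental batch touches nothing outside its own paths.
        node = self.root
        for item in sorted(set(transaction)):
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                self.header[item].append(child)
            child.count += 1
            node = child

    def support(self, item: str) -> int:
        # Support of a single item is the sum of counts over its header list.
        return sum(n.count for n in self.header.get(item, []))


tree = FPTree()
for t in (["a", "b", "c"], ["a", "b"], ["b", "c"]):
    tree.insert(t)
tree.insert(["a", "c"])      # incremental data: no rescan of earlier transactions
print(tree.support("a"), tree.support("b"), tree.support("c"))
```

A fixed item order trades some compactness for stability under updates; a frequency-ordered tree would typically be smaller but may need restructuring when item frequencies change.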

6.
Classification is an important technique in data mining. The decision trees built by most of the existing classification algorithms commonly feature over-branching, which leads to poor efficiency in the subsequent classification period. In this paper, we present a new value-oriented classification method, which aims at building accurate, properly sized decision trees while reducing over-branching as much as possible, based on the concepts of frequent-pattern-node and exceptive-child-node. The experiments show that, when using relevance analysis as pre-processing, our classification method can, without loss of accuracy, eliminate over-branching in decision trees more effectively and efficiently than other algorithms do.

7.
Recently, periodic pattern mining from time series data has been studied extensively. However, an interesting type of periodic pattern, called partial periodic (PP) correlation in this paper, has not been investigated. An example of PP correlation is that power consumption is high either on Monday or Tuesday but not on both days. In general, a PP correlation is a set of offsets within a particular period such that the data at these offsets are correlated with a certain user-desired strength. In the above example, the period is a week (7 days), and each day of the week is an offset of the period. PP correlations can provide insightful knowledge about the time series and can be used for predicting future values. This paper introduces an algorithm to mine time series for PP correlations based on the principal component analysis (PCA) method. Specifically, given a period, the algorithm maps the time series data to data points in a multidimensional space, where the dimensions correspond to the offsets within the period. A PP correlation is then equivalent to correlation of data when projected to a subset of the dimensions. The algorithm discovers, with one sequential scan of data, all those PP correlations (called minimum PP correlations) that are not unions of some other PP correlations. Experiments using both real and synthetic data sets show that the PCA-based algorithm is highly efficient and effective in finding the minimum PP correlations. Zhen He is a lecturer in the Department of Computer Science at La Trobe University. His main research areas are database systems optimization, time series mining, wireless sensor networks, and XML information retrieval. Prior to joining La Trobe University, he worked as a postdoctoral research associate in the University of Vermont. He holds Bachelors, Honors and Ph.D degrees in Computer Science from the Australian National University. X. 
Sean Wang received his Ph.D. degree in Computer Science from the University of Southern California in 1992. He is currently the Dorothean Chair Professor in Computer Science at the University of Vermont. He has published widely in the general area of databases and information security, and was a recipient of the US National Science Foundation Research Initiation and CAREER awards. His research interests include database systems, information security, data mining, and sensor data processing. Byung Suk Lee is an associate professor of Computer Science at the University of Vermont. His main research areas are database systems, data modeling, and information retrieval. He has held positions in industry and academia: Gold Star Electric, Bell Communications Research, Datacom Global Communications, University of St. Thomas, and currently University of Vermont. He was also a visiting professor at Dartmouth College and a participating guest at Lawrence Livermore National Laboratory. He has served international conferences as a program committee member, a publicity chair, and a special session organizer, and has also served on a US federal funding proposal review panel. He holds a BS degree from Seoul National University, an MS from Korea Advanced Institute of Science and Technology, and a Ph.D. from Stanford University. Alan C. H. Ling is an assistant professor in the Department of Computer Science at the University of Vermont. His research interests include combinatorial design theory, coding theory, sequence designs, and applications of design theory.
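
The mapping step described in the abstract above (a period turns the time series into points whose dimensions are the offsets within the period) can be sketched as follows. This covers only the mapping and a plain pairwise Pearson correlation between two offsets, used here as a simple stand-in; the paper's PCA-based discovery of minimum PP correlations is not reproduced. The function names `fold_by_period` and `offset_correlation` and the sample data are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def fold_by_period(series: Sequence[float], period: int) -> List[List[float]]:
    """Cut the series into consecutive full periods.

    Each period becomes one point; coordinate j holds the value at offset j.
    A trailing incomplete period is dropped.
    """
    full = len(series) // period
    return [list(series[i * period:(i + 1) * period]) for i in range(full)]


def offset_correlation(points: Sequence[Sequence[float]], a: int, b: int) -> float:
    """Pearson correlation between the values at offsets a and b."""
    xs = [p[a] for p in points]
    ys = [p[b] for p in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant offset")
    return cov / sqrt(var_x * var_y)


# Four weeks of made-up daily power readings (period 7, offset 0 = Monday).
# Consumption is high on Monday or on Tuesday, but never on both.
weeks = [
    [9, 1, 5, 5, 5, 5, 5],
    [1, 9, 5, 5, 5, 5, 5],
    [9, 1, 5, 5, 5, 5, 5],
    [1, 9, 5, 5, 5, 5, 5],
]
series = [value for week in weeks for value in week]
points = fold_by_period(series, 7)
print(offset_correlation(points, 0, 1))   # strongly negative for this data
```

In this toy data the Monday and Tuesday offsets are perfectly anti-correlated, which is the "either Monday or Tuesday but not both" situation from the abstract expressed in the folded space.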

8.
The research presented in this paper approaches the issue of robot team navigation using relative positioning. With this approach each robot is equipped with sensors that allow it to independently estimate the relative direction of an assigned leader. Acoustic sensor systems are used and were seen to work very effectively in environments where datum relative positioning systems (such as GPS or acoustic transponders) are typically ineffective. While acoustic sensors provide distinct advantages, the variability of the acoustic environment presents significant control challenges. To address this challenge, directional control of the robot was accomplished with a feed forward neural network trained using a genetic algorithm, and a new approach to training using recent memories was successfully implemented. The design of this controller is presented and its performance is compared with more traditional classic logic and behavior controllers. Patrick McDowell received his bachelor's degree in Computer Science in 1984 from the University of Idaho. He spent the next 15 years working as a computer scientist for a small defense contractor where he specialized in real time data acquisition, application development, and image processing. In 1999 he received his master's degree in computer science from the University of Southern Mississippi. In 2000 he began work at the Naval Research Lab where he has focused on application of machine learning techniques to autonomous underwater navigation. In 2005 he received his Ph.D. in Computer Science from Louisiana State University. His research interests include legged robotics, machine learning, and artificial intelligence. In Fall of 2006 he joined Southeastern Louisiana University as an assistant professor of Computer Science. Brian S. Bourgeois received his Ph.D. in Electrical Engineering from Tulane University located in New Orleans, LA in 1991. 
Since then he has worked at the Stennis Space Center, MS detachment of the Naval Research Laboratory. He has worked on research projects spanning an array of technologies including airborne survey systems, acoustic backscattering, bathymetry and imaging sonar systems, the ORCA unmanned underwater vehicle and the development of an autonomous survey system for hydrographic survey ships. He is presently the head of the Position, Navigation and Timing team at NRL with research interests including underwater positioning and communications and autonomous navigation. Ms. McDowell received her M.S. in Applied Physics in 2002 from the University of New Orleans. She is presently a candidate for a Ph.D. in Engineering and Applied Science. She joined the Naval Research Laboratory in 1991 as a research engineer and has spent most of that time working in experimental and theoretical acoustic modeling. Ms. McDowell's specific research interests lie in the areas of sonar performance analysis. Dr. S. S. Iyengar is the Chairman and Roy Paul Daniels Chaired Professor of Computer Science at Louisiana State University and is also Satish Dhawan Chaired Professor at the Indian Institute of Science. He has been involved with research in high-performance algorithms, data structures, sensor fusion, data mining, and intelligent systems since receiving his Ph.D. degree (1974) and his M.S. from the Indian Institute of Science (1970). He has been a consultant to several industrial and government organizations (JPL, NASA etc.). In 1999, Professor Iyengar won the most prestigious research award, titled the Distinguished Research Award, and a university medal for his research contributions in optimal algorithms for sensor fusion/image processing. Dr. Jianhua Chen received her Ph.D. in computer science in 1988 from Jilin University, Chang Chun, China. In August 1988, she joined the Computer Science Department of Louisiana State University, Baton Rouge, USA, where she is currently an associate professor. Dr. 
Chen's research interests include Machine Learning and Data Mining, Fuzzy Sets and Systems, Knowledge Representation and Reasoning.

9.
Advances in wireless and mobile computing environments allow a mobile user to access a wide range of applications. For example, mobile users may want to retrieve data about unfamiliar places or local life styles related to their location. These queries are called location-dependent queries. Furthermore, a mobile user may be interested in getting the query results repeatedly, which is called location-dependent continuous querying. This continuous query emanating from a mobile user may retrieve information from a single zone (single-ZQ) or from multiple neighbouring zones (multiple-ZQ). We consider the problem of handling location-dependent continuous queries with the main emphasis on reducing communication costs and making sure that the user gets the correct current query result. The key contributions of this paper include: (1) Proposing a hierarchical database framework (tree architecture and supporting continuous query algorithm) for handling location-dependent continuous queries. (2) Analysing the flexibility of this framework for handling queries related to single-ZQ or multiple-ZQ and proposing intelligent selective placement of location-dependent databases. (3) Proposing an intelligent selective replication algorithm to facilitate time- and space-efficient processing of location-dependent continuous queries retrieving single-ZQ information. (4) Demonstrating, using simulation, the significance of our intelligent selective placement and selective replication model in terms of communication cost and storage constraints, considering various types of queries. Manish Gupta received his B.E. degree in Electrical Engineering from Govindram Sakseria Institute of Technology & Sciences, India, in 1997 and his M.S. degree in Computer Science from University of Texas at Dallas in 2002. He is currently working toward his Ph.D. degree in the Department of Computer Science at University of Texas at Dallas. His current research focuses on AI-based software synthesis and testing. 
His other research interests include mobile computing, aspect-oriented programming and model checking. Manghui Tu received a Bachelor degree of Science from Wuhan University, P.R. China, in 1996, and a Master's Degree in Computer Science from the University of Texas at Dallas 2001. He is currently working toward the Ph.D. degree in the Department of Computer Science at the University of Texas at Dallas. Mr. Tu's research interests include distributed systems, wireless communications, mobile computing, and reliability and performance analysis. His Ph.D. research work focuses on the dependent and secure data replication and placement issues in network-centric systems. Latifur R. Khan has been an Assistant Professor of Computer Science department at University of Texas at Dallas since September 2000. He received his Ph.D. and M.S. degrees in Computer Science from University of Southern California (USC) in August 2000 and December 1996, respectively. He obtained his B.Sc. degree in Computer Science and Engineering from Bangladesh University of Engineering and Technology, Dhaka, Bangladesh, in November of 1993. Professor Khan is currently supported by grants from the National Science Foundation (NSF), Texas Instruments, Alcatel, USA, and has been awarded the Sun Equipment Grant. Dr. Khan has more than 50 articles, book chapters and conference papers focusing in the areas of database systems, multimedia information management and data mining in bio-informatics and intrusion detection. Professor Khan has also served as a referee for database journals, conferences (e.g. 
IEEE TKDE, KAIS, ADL, VLDB) and he is currently serving as a program committee member for the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD2005), ACM 14th Conference on Information and Knowledge Management (CIKM 2005), International Conference on Database and Expert Systems Applications DEXA 2005 and International Conference on Cooperative Information Systems (CoopIS 2005), and is program chair of ACM SIGKDD International Workshop on Multimedia Data Mining, 2004. Farokh Bastani received the B.Tech. degree in Electrical Engineering from the Indian Institute of Technology, Bombay, and the M.S. and Ph.D. degrees in Computer Science from the University of California, Berkeley. He is currently a Professor of Computer Science at the University of Texas at Dallas. Dr. Bastani's research interests include various aspects of the ultrahigh dependable systems, especially automated software synthesis and testing, embedded real-time process-control and telecommunications systems and high-assurance systems engineering. Dr. Bastani was the Editor-in-Chief of the IEEE Transactions on Knowledge and Data Engineering (IEEE-TKDE). He is currently an emeritus EIC of IEEE-TKDE and is on the editorial board of the International Journal of Artificial Intelligence Tools, the International Journal of Knowledge and Information Systems and the Springer-Verlag series on Knowledge and Information Management. He was the program cochair of the 1997 IEEE Symposium on Reliable Distributed Systems, 1998 IEEE International Symposium on Software Reliability Engineering, 1999 IEEE Knowledge and Data Engineering Workshop, 1999 International Symposium on Autonomous Decentralised Systems, and the program chair of the 1995 IEEE International Conference on Tools with Artificial Intelligence. 
He has been on the program and steering committees of several conferences and workshops and on the editorial boards of the IEEE Transactions on Software Engineering, IEEE Transactions on Knowledge and Data Engineering and the Oxford University Press High Integrity Systems Journal. I-Ling Yen received her B.S. degree from Tsing-Hua University, Taiwan, and her M.S. and Ph.D. degrees in Computer Science from the University of Houston. She is currently an Associate Professor of Computer Science at University of Texas at Dallas. Dr. Yen's research interests include fault-tolerant computing, security systems and algorithms, distributed systems, Internet technologies, E-commerce and self-stabilising systems. She has published over 100 technical papers in these research areas and received many research awards from NSF, DOD, NASA and several industry companies. She has served as Program Committee member for many conferences and Program Chair/Cochair for the IEEE Symposium on Application-Specific Software and System Engineering & Technology, IEEE High Assurance Systems Engineering Symposium, IEEE International Computer Software and Applications Conference, and IEEE International Symposium on Autonomous Decentralized Systems. She has also served as a guest editor for a theme issue of IEEE Computer devoted to high-assurance systems.

10.
Efficient string matching with wildcards and length constraints
This paper defines a challenging problem of pattern matching between a pattern P and a text T, with wildcards and length constraints, and designs an efficient algorithm to return each pattern occurrence in an online manner. In this pattern matching problem, the user can specify the constraints on the number of wildcards between each two consecutive letters of P and the constraints on the length of each matching substring in T. We design a complete algorithm, SAIL, that returns each matching substring of P in T as soon as it appears in T, in O(n + klmg) time with O(lm) space overhead, where n is the length of T, k is the frequency of P's last letter occurring in T, l is the user-specified maximum length for each matching substring, m is the length of P, and g is the maximum difference between the user-specified maximum and minimum numbers of wildcards allowed between two consecutive letters in P. SAIL stands for string matching with wildcards and length constraints. Gong Chen received the B.Eng. degree from the Beijing University of Technology, China, and the M.Sc. degree from the University of Vermont, USA, both in computer science. He is currently a graduate student in the Department of Statistics at the University of California, Los Angeles, USA. His research interests include data mining, statistical learning, machine learning, algorithm analysis and design, and database management. Xindong Wu is a professor and the chair of the Department of Computer Science at the University of Vermont. He holds a Ph.D. in Artificial Intelligence from the University of Edinburgh, Britain. His research interests include data mining, knowledge-based systems, and Web information exploration. He has published extensively in these areas in various journals and conferences, including IEEE TKDE, TPAMI, ACM TOIS, IJCAI, AAAI, ICML, KDD, ICDM and WWW, as well as 12 books and conference proceedings. Dr. 
Wu is the Editor-in-Chief of the IEEE Transactions on Knowledge and Data Engineering (by the IEEE Computer Society), the founder and current Steering Committee Chair of the IEEE International Conference on Data Mining (ICDM), an Honorary Editor-in-Chief of Knowledge and Information Systems (by Springer), and a Series Editor of the Springer Book Series on Advanced Information and Knowledge Processing (AI&KP). He is the 2004 ACM SIGKDD Service Award winner. Xingquan Zhu received his Ph.D. degree in Computer Science from Fudan University, Shanghai, China, in 2001. He spent 4 months with Microsoft Research Asia, Beijing, China, where he worked on content-based image retrieval with relevance feedback. From 2001 to 2002, he was a postdoctoral associate in the Department of Computer Science at Purdue University, West Lafayette, IN. He is currently a research assistant professor in the Department of Computer Science, the University of Vermont, Burlington, VT. His research interests include data mining, machine learning, data quality, multimedia computing, and information retrieval. Since 2000, Dr. Zhu has published extensively, including over 50 refereed papers in various journals and conference proceedings. Abdullah N. Arslan received his Ph.D. degree in Computer Science in 2002 from the University of California at Santa Barbara. Upon his graduation he joined the Department of Computer Science at the University of Vermont as an assistant professor. He has been with the computer science faculty there since then. Dr. Arslan's main research interests are in algorithms on strings, computational biology and bioinformatics. Dr. Arslan earned his Master's degree in Computer Science in 1996 from the University of North Texas, Denton, Texas and his Bachelor's degree in Computer Engineering in 1990 from the Middle East Technical University, Ankara, Turkey. He worked as a programmer for the Central Bank of Turkey between 1991 and 1994. Yu He received her B.E.
degree in Information Engineering from Zhejiang University, China, in 2001. She is currently a graduate student in the Department of Computer Science at the University of Vermont. Her research interests include data mining, bioinformatics and pattern recognition.
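The matching problem described in this abstract can be illustrated with a small brute-force sketch. This is not SAIL and does not have its complexity bounds; it only enumerates occurrences under the two kinds of constraints. The function name and the parameters `gap_min`, `gap_max`, `len_min` and `len_max` are hypothetical, and a single gap range shared by all consecutive letter pairs is a simplifying assumption.

```python
def find_occurrences(text, pattern, gap_min, gap_max, len_min, len_max):
    """Enumerate occurrences of `pattern` in `text` as tuples of 0-based positions.

    Assumptions (illustrative, not taken from the paper):
      - between two consecutive pattern letters there are between
        gap_min and gap_max wildcard characters (same range for every pair);
      - the matching substring, from the first to the last matched letter,
        has a length between len_min and len_max.
    """
    if not pattern:
        return []
    results = []

    def extend(positions):
        matched = len(positions)
        if matched == len(pattern):
            span = positions[-1] - positions[0] + 1
            if len_min <= span <= len_max:
                results.append(tuple(positions))
            return
        prev = positions[-1]
        # Candidate positions for the next letter, limited by the gap range.
        last = min(prev + gap_max + 1, len(text) - 1)
        for nxt in range(prev + gap_min + 1, last + 1):
            if nxt - positions[0] + 1 > len_max:
                break  # any later position would also exceed the length limit
            if text[nxt] == pattern[matched]:
                extend(positions + [nxt])

    for start, ch in enumerate(text):
        if ch == pattern[0]:
            extend([start])
    return results
```

For example, `find_occurrences("aXbYYc", "abc", 0, 2, 3, 6)` reports the single occurrence at positions 0, 2 and 5; tightening `len_max` to 5 or `gap_max` to 1 rules it out.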

11.
The paper is about some families of rewriting P systems, where the application of evolution rules is extended from the classical sequential rewriting to the parallel one (as, for instance, in Lindenmayer systems). As a result, consistency problems for the communication of strings may arise. Three variants of parallel rewriting P systems (already present in the literature) are considered here, together with the strategies they use to face the communication problem, and some parallelism methods for string rewriting are defined. We give a survey of all known results about each variant and we state some relations among the three variants, thus establishing hierarchies of parallel rewriting P systems. Various open problems related to the subject are also presented. Daniela Besozzi: She is an assistant professor at the University of Milano. She received her M.S. in Mathematics (2000) from the University of Como and her Ph.D. in Computer Science (2004) from the University of Milano. Her research interests cover topics in Formal Language Theory, Molecular Computing, and Systems Biology. She is a member of EATCS (European Association for Theoretical Computer Science) and EMCC (European Molecular Computing Consortium). Giancarlo Mauri: He is a full professor of Computer Science at the University of Milano-Bicocca. His research interests are mainly in the area of theoretical computer science, and include: formal languages and automata, computational complexity, computational learning theory, soft computing techniques, cellular automata, bioinformatics and molecular computing. On these subjects, he has published more than 150 scientific papers in international journals, contributed volumes and conference proceedings. Claudio Zandron: He received his Ph.D. in Computer Science at the University of Milan, Italy, in 2001. Since 2002 he has been an assistant professor at the University of Milano-Bicocca, Italy.
He is a member of the EATCS (European Association for Theoretical Computer Science) and of the EMCC (European Molecular Computing Consortium). His research interests are Molecular Computing (DNA and Membrane Computing) and Formal Languages.

12.
We suggest the use of ranking-based evaluation measures for regression models, as a complement to the commonly used residual-based evaluation. We argue that in some cases, such as the case study we present, ranking can be the main underlying goal in building a regression model, and ranking performance is the correct evaluation metric. However, even when ranking is not the contextually correct performance metric, the measures we explore still have significant advantages: they are robust against extreme outliers in the evaluation set, and they are interpretable. The two measures we consider correspond closely to non-parametric correlation coefficients commonly used in data analysis (Spearman's ρ and Kendall's τ), and they both have interesting graphical representations, which, similarly to ROC curves, offer various useful views of model performance, in addition to a one-number summary in the area under the curve. An interesting extension which we explore is to evaluate models on their performance in “partially” ranking the data, which we argue can better represent the utility of the model in many cases. We illustrate our methods on a case study of evaluating IT Wallet size estimation models for IBM's customers. Saharon Rosset is a Research Staff Member in the Data Analytics Research Group at IBM's T. J. Watson Research Center. He received his B.S. in Mathematics and M.Sc. in Statistics from Tel Aviv University in Israel, and his Ph.D. in Statistics from Stanford University in 2003. In his research, he aspires to develop practically useful predictive modeling methodologies and tools, and apply them to solve problems in business and scientific domains. Currently, his major projects include work on customer wallet estimation and analysis of genetic data. Claudia Perlich received an M.Sc. in Computer Science from Colorado University at Boulder, a Diploma in Computer Science from Technische Universitaet in Darmstadt, and her Ph.D.
in Information Systems from Stern School of Business, New York University. Her Ph.D. thesis concentrated on probability estimation in multi-relational domains that capture information of multiple entity types and relationships between them. Her dissertation was recognized as an additional winner of the International SAP Doctoral Support Award Competition. Claudia joined the Data Analytics Research group at IBM's T.J. Watson Research Center as a Research Staff Member in October 2004. Her research interests are in statistical machine learning for complex real-world domains and business applications. Bianca Zadrozny is currently an associate professor at the Computer Science Department of Federal Fluminense University in Brazil. Her research interests are in the areas of applied machine learning and data mining. She received her B.Sc. in Computer Engineering from the Pontifical Catholic University in Rio de Janeiro, Brazil, and her M.Sc. and Ph.D. in Computer Science from the University of California at San Diego. She has also worked as a research staff member in the data analytics research group at IBM T.J. Watson Research Center.
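The idea of scoring a regression model by how well it ranks the data can be sketched with the plain Kendall's τ coefficient between actual and predicted values. This is only the standard coefficient (the tau-a form, with tied pairs counted as neither concordant nor discordant), not the specific measures or curves defined in the paper; the function name is hypothetical.

```python
from itertools import combinations


def kendall_tau(actual, predicted):
    """Kendall's tau-a between two equally long numeric sequences.

    A pair of observations is concordant when actual and predicted values
    order it the same way, discordant when they order it oppositely.
    Tied pairs count as neither (a simplifying choice for this sketch).
    """
    n = len(actual)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (actual[i] - actual[j]) * (predicted[i] - predicted[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

A prediction vector such as `[10, 20, 30, 4000]` for actual values `[1, 2, 3, 4]` has a very large residual on the last point, yet its ranking is perfect and the coefficient is 1.0, which illustrates the robustness to extreme outliers mentioned in the abstract.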

13.
Data extraction from the web based on pre-defined schema   Total citations: 7 (self-citations: 1, citations by others: 7)
With the development of the Internet, the World Wide Web has become an invaluable information source for most organizations. However, most documents available from the Web are in HTML form, which was originally designed for document formatting with little consideration of its contents. Effectively extracting data from such documents remains a non-trivial task. In this paper, we present a schema-guided approach to extracting data from HTML pages. Under the approach, the user defines a schema specifying what is to be extracted and provides sample mappings between the schema and the HTML page. The system will induce the mapping rules and generate a wrapper that takes the HTML page as input and produces the required data in the form of XML conforming to the user-defined schema. A prototype system implementing the approach has been developed. The preliminary experiments indicate that the proposed semi-automatic approach is not only easy to use but also able to produce a wrapper that extracts required data from input pages with high accuracy.

14.
Many supervised machine learning tasks can be cast as multi-class classification problems. Support vector machines (SVMs) excel at binary classification problems, but the elegant theory behind large-margin hyperplanes cannot be easily extended to their multi-class counterparts. On the other hand, it was shown that the decision hyperplanes for binary classification obtained by SVMs are equivalent to the solutions obtained by Fisher's linear discriminant on the set of support vectors. Discriminant analysis approaches are well known to learn discriminative feature transformations in the statistical pattern recognition literature and can be easily extended to multi-class cases. The use of discriminant analysis, however, has not been fully explored in the data mining literature. In this paper, we explore the use of discriminant analysis for multi-class classification problems. We evaluate the performance of discriminant analysis on a large collection of benchmark datasets and investigate its usage in text categorization. Our experiments suggest that discriminant analysis provides a fast, efficient yet accurate alternative for general multi-class classification problems. Tao Li is currently an assistant professor in the School of Computer Science at Florida International University. He received his Ph.D. degree in Computer Science from the University of Rochester in 2004. His primary research interests are data mining, machine learning, bioinformatics, and music information retrieval. Shenghuo Zhu is currently a researcher at NEC Laboratories America, Inc. He received his B.E. from Zhejiang University in 1994, B.E. from Tsinghua University in 1997, and Ph.D. degree in Computer Science from the University of Rochester in 2003. His primary research interests include information retrieval, machine learning, and data mining. Mitsunori Ogihara received a Ph.D. in Information Sciences at Tokyo Institute of Technology in 1993.
He is currently Professor and Chair of the Department of Computer Science at the University of Rochester. His primary research interests are data mining, computational complexity, and molecular computation.

15.
A logic-based approach to the specification of active database functionality is presented which not only endows active databases with a well-defined and well-understood formal semantics, but also tightly integrates them with deductive databases. The problem of endowing deductive databases with rule-based active behaviour has been addressed in different ways. Typical approaches include accounting for active behaviour by extending the operational semantics of deductive databases, or, conversely, accounting for deductive capabilities by constraining the operational semantics of active databases. The main contribution of the paper is an alternative approach in which a class of active databases is defined whose operational semantics is naturally integrated with the operational semantics of deductive databases without either of them strictly subsuming the other. The approach is demonstrated via the formalization of the syntax and semantics of an active-rule language that can be smoothly incorporated into existing deductive databases, due to the fact that the standard formalization of deductive databases is reused, rather than altered or extended. One distinctive feature of the paper is its use of a history, as defined in the Kowalski-Sergot event-calculus, to define event occurrences, database states and actions on these. This has proved to be a suitable foundation for a comprehensive logical account of the concept set underpinning active databases. The paper thus contributes a logical perspective to the ongoing task of developing a formal theory of active databases. Alvaro Adolfo Antunes Fernandes, Ph.D.: He received a B.Sc. in Economics (Rio de Janeiro, 1984), an M.Sc. in Knowledge-Based Systems (Edinburgh, 1990) and a Ph.D. in Computer Science (Heriot-Watt, 1995). He worked as a Research Associate at Heriot-Watt University from December 1990 until December 1995.
In January 1996 he joined the Department of Mathematical and Computing Sciences at Goldsmiths College, University of London, as a Lecturer. His current research interests include advanced data- and knowledge-base technology, logic programming, and software engineering. M. Howard Williams, Ph.D., D.Sc.: He obtained his Ph.D. in ionospheric physics and recently a D.Sc. in Computer Science. He was appointed as the first lecturer in Computer Science at Rhodes University in 1970. During the following decade he rose to Professor of Computer Science and in 1980 was appointed as Professor of Computer Science at Heriot-Watt University. From 1980 to 1988 he served as Head of Department and then as director of research until 1992. He is now head of the Database Research Group at Heriot-Watt University. His current research interests include active databases, deductive object-oriented databases, spatial databases, parallel databases and telemedicine. Norman W. Paton, Ph.D.: He received a B.Sc. in Computing Science from the University of Aberdeen in 1986. From 1986 to 1989 he worked as a Research Assistant at the University of Aberdeen, receiving a Ph.D. in 1989. From 1989 to 1995 he was a Lecturer in Computer Science at Heriot-Watt University. Since July 1995, he has been a Senior Lecturer in the Department of Computer Science at the University of Manchester. His current research interests include active databases, deductive object-oriented databases, spatial databases and database interfaces.

16.
We present a system for performing belief revision in a multi-agent environment. The system is called GBR (Genetic Belief Revisor) and it is based on a genetic algorithm. In this setting, different individuals are exposed to different experiences. This may happen because the world surrounding an agent changes over time or because we allow agents to explore different parts of the world. The algorithm permits the exchange of chromosomes from different agents and combines two different evolution strategies, one based on Darwin’s and the other on Lamarck’s evolutionary theory. The algorithm therefore also includes a Lamarckian operator that changes the memes of an agent in order to improve their fitness. The operator is implemented by means of a belief revision procedure that, by tracing logical derivations, identifies the memes leading to contradiction. Moreover, the algorithm comprises a special crossover mechanism for memes in which a meme can be acquired from another agent only if the other agent has “accessed” the meme, i.e. if an application of the Lamarckian operator has read or modified the meme. Experiments have been performed on the n-queen problem and on a problem of digital circuit diagnosis. In the case of the n-queen problem, the addition of the Lamarckian operator in the single agent case improves the fitness of the best solution. In both cases the experiments show that the distribution of constraints, even if it may lead to a reduction of the fitness of the best solution, does not produce a significant reduction. Evelina Lamma, Ph.D.: She is Full Professor at the University of Ferrara. She received her degree in Electrical Engineering at the University of Bologna in 1985, and her Ph.D. in Computer Science in 1990. Her research activity centers on extensions of logic programming languages and artificial intelligence.
She was a co-organizer of the 3rd International Workshop on Extensions of Logic Programming ELP92, held in Bologna in February 1992, and of the 6th Italian Congress on Artificial Intelligence, held in Bologna in September 1999. Currently, she teaches Artificial Intelligence and Foundations of Computer Science. Fabrizio Riguzzi, Ph.D.: He is Assistant Professor at the Department of Engineering of the University of Ferrara, Italy. He received his Laurea from the University of Bologna in 1995 and his Ph.D. from the University of Bologna in 1999. He joined the Department of Engineering of the University of Ferrara in 1999. He has been a visiting researcher at the University of Cyprus and at the New University of Lisbon. His research interests include: data mining (and in particular methods for learning from multirelational data), machine learning, belief revision, genetic algorithms and software engineering. Luís Moniz Pereira, Ph.D.: He is Full Professor of Computer Science at Departamento de Informática, Universidade Nova de Lisboa, Portugal. He received his Ph.D. in Artificial Intelligence from Brunel University in 1974. He is the director of the Artificial Intelligence Centre (CENTRIA) at Universidade Nova de Lisboa. He was elected Fellow of the European Coordinating Committee for Artificial Intelligence in 2001. He has been a visiting Professor at the U. California at Riverside, USA, the State U. NY at Stony Brook, USA and the U. Bologna, Italy. His research interests include: knowledge representation, reasoning, learning, rational agents and logic programming.

17.
Published scientific articles are linked together into a graph, the citation graph, through their citations. This paper explores the notion of similarity based on connectivity alone, and proposes several algorithms to quantify it. Our metrics take advantage of the local neighborhoods of the nodes in the citation graph. Two variants of link-based similarity estimation between two nodes are described, one based on the separate local neighborhoods of the nodes, and another based on the joint local neighborhood expanded from both nodes at the same time. The algorithms are implemented and evaluated on a subgraph of the citation graph of computer science in a retrieval context. The results are compared with text-based similarity, and demonstrate the complementarity of link-based and text-based retrieval. Wangzhong Lu holds a Bachelor's degree from Hefei University of Technology (1993), and a Master's degree from Dalhousie University (2001), both in computer science. From 1993 to 1999 he worked as a developer with China National Computer Software and Technical Service Corp. in Beijing. From 2001 to 2005 he held industrial positions as a senior software architect in Atlantic Canada. He is currently with DST Systems, Charlotte, NC, as a senior data architect. Jeannette Janssen's research area is applied graph theory. She has worked on the problem of frequency assignment in cellular and digital broadcasting networks. Her current interest is in graph theory applied to the World Wide Web and other networked information spaces. Dr. Janssen did her Master's studies at Eindhoven University of Technology in the Netherlands, and her doctorate at Lehigh University, USA. She is currently an associate professor at Dalhousie University, Canada. Evangelos Milios received a diploma in electrical engineering from the National Technical University of Athens, and Master's and Ph.D. degrees in electrical engineering and computer science from the Massachusetts Institute of Technology. 
He held faculty positions at the University of Toronto and York University. He is currently a professor of computer science at Dalhousie University, Canada, where he was Director of the Graduate Program. He has served on the committees of the ACM Dissertation Award, and the AAAI/SIGART Doctoral Consortium. He has worked on the interpretation of visual and range signals for landmark-based positioning, navigation and map construction in single- and multi-agent robotics. His current research activity is centered on Networked Information Spaces, Web information retrieval, and aquatic robotics. He is a senior member of the IEEE. Nathalie Japkowicz is an associate professor at the School of Information Technology and Engineering of the University of Ottawa. She obtained her Ph.D. from Rutgers University, her M.Sc. from the University of Toronto, and her B.Sc. from McGill University. Prior to joining the University of Ottawa, she taught at Ohio State University and Dalhousie University. Her area of specialization is Machine Learning and her most recent research interests have focused on the class imbalance problem. She has made over 50 contributions in the form of journal articles, conference articles, workshop articles, magazine articles, technical reports or edited volumes. Yongzheng Zhang obtained a B.E. in computer applications from Southeast University, China, in 1997 and an M.S. in computer science from Dalhousie University in 2002. From 1997 to 1999 he was an instructor and undergraduate advisor at Southeast University. He also worked as a software engineer in Ricom Information and Telecommunications Co. Ltd., China. He is currently a Ph.D. candidate at Dalhousie University. His research interests are in the areas of Information Retrieval, Machine Learning, Natural Language Processing, and Web Mining, particularly centered on Web Document Summarization. A paper based on his Master's thesis received the best paper award at the 2003 Canadian Artificial Intelligence conference.
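Similarity based on connectivity alone can be illustrated by comparing the local neighborhoods of two papers in the citation graph. The abstract does not give the paper's formulas, so the sketch below swaps in a simple stand-in: the Jaccard overlap of the two separate neighborhoods, with citation direction ignored. All function names, the `radius` parameter and the toy graph in the usage note are hypothetical.

```python
from collections import deque


def undirected(citations):
    """Build an undirected adjacency map from {paper: [cited papers]}."""
    graph = {}
    for paper, cited in citations.items():
        graph.setdefault(paper, set())
        for other in cited:
            graph[paper].add(other)
            graph.setdefault(other, set()).add(paper)
    return graph


def neighborhood(graph, node, radius):
    """Return the set of nodes within `radius` hops of `node` (node included)."""
    seen = {node}
    frontier = deque([(node, 0)])
    while frontier:
        current, dist = frontier.popleft()
        if dist == radius:
            continue  # do not expand beyond the requested radius
        for nxt in graph.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return seen


def neighborhood_similarity(graph, a, b, radius=1):
    """Jaccard overlap of the separate local neighborhoods of a and b."""
    na = neighborhood(graph, a, radius)
    nb = neighborhood(graph, b, radius)
    return len(na & nb) / len(na | nb)
```

With `graph = undirected({"A": ["C", "D"], "B": ["C", "D"], "E": ["F"]})`, papers A and B share both of their references and score 0.5 at radius 1, while A and E share nothing and score 0.0.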

18.
Partial evaluation is a semantics-based program optimization technique which has been investigated within different programming paradigms and applied to a wide variety of languages. Recently, a partial evaluation framework for functional logic programs has been proposed. In this framework, narrowing (the standard operational semantics of integrated languages) is used to drive the partial evaluation process. This paper surveys the essentials of narrowing-driven partial evaluation. Elvira Albert, Ph.D.: She is an associate professor in Computer Science at the Technical University of Valencia, Spain. She received her bachelor's degree in computer science in 1998 and her Ph.D. in computer science in 2001, both from the Technical University of Valencia. She has investigated program optimization and partial evaluation for declarative multi-paradigm programming languages. Her current research interests include term rewriting, multi-paradigm declarative programming, and formal methods, in particular semantics-based program analysis, transformation, specification, verification, and debugging. Germán Vidal, Ph.D.: He is an associate professor in Computer Science at the Technical University of Valencia, Spain. He obtained his bachelor's degree in computer science in 1992 and his Ph.D. in computer science in 1996, both from the Technical University of Valencia. He is active in several research topics in Functional Logic Programming. He has worked on compositionality, on abstract interpretation, and on program transformation techniques for functional logic programs. Currently, his research interests include declarative multi-paradigm programming languages, term rewriting, and semantics-based program manipulation, in particular partial evaluation.

19.
Inductive logic programming (ILP) is concerned with the induction of logic programs from examples and background knowledge. In ILP, the shift of attention from program synthesis to knowledge discovery resulted in advanced techniques that are practically applicable for discovering knowledge in relational databases. This paper gives a brief introduction to ILP, presents selected ILP techniques for relational knowledge discovery and reviews selected ILP applications. Nada Lavrač, Ph.D.: She is a senior research associate at the Department of Intelligent Systems, J. Stefan Institute, Ljubljana, Slovenia (since 1978) and a visiting professor at the Klagenfurt University, Austria (since 1987). Her main research interest is in machine learning, in particular inductive logic programming and intelligent data analysis in medicine. She received a BSc in Technical Mathematics and MSc in Computer Science from Ljubljana University, and a PhD in Technical Sciences from Maribor University, Slovenia. She is coauthor of KARDIO: A Study in Deep and Qualitative Knowledge for Expert Systems, The MIT Press 1989, and Inductive Logic Programming: Techniques and Applications, Ellis Horwood 1994, and coeditor of Intelligent Data Analysis in Medicine and Pharmacology, Kluwer 1997. She was the coordinator of the European Scientific Network in Inductive Logic Programming ILPNET (1993–1996) and program cochair of the 8th European Machine Learning Conference ECML’95, and 7th International Workshop on Inductive Logic Programming ILP’97. Sašo Džeroski, Ph.D.: He is a research associate at the Department of Intelligent Systems, J. Stefan Institute, Ljubljana, Slovenia (since 1989). He has held visiting researcher positions at the Turing Institute, Glasgow (UK), Katholieke Universiteit Leuven (Belgium), German National Research Center for Computer Science (GMD), Sankt Augustin (Germany) and the Foundation for Research and Technology-Hellas (FORTH), Heraklion (Greece). 
His research interest is in machine learning and knowledge discovery in databases, in particular inductive logic programming and its applications and knowledge discovery in environmental databases. He is co-author of Inductive Logic Programming: Techniques and Applications, Ellis Horwood 1994. He is the scientific coordinator of ILPnet2, The Network of Excellence in Inductive Logic Programming. He was program co-chair of the 7th International Workshop on Inductive Logic Programming ILP’97 and will be program co-chair of the 16th International Conference on Machine Learning ICML’99. Masayuki Numao, Ph.D.: He is an associate professor at the Department of Computer Science, Tokyo Institute of Technology. He received a bachelor of engineering in electrical and electronics engineering in 1982 and his Ph.D. in computer science in 1987 from Tokyo Institute of Technology. He was a visiting scholar at CSLI, Stanford University from 1989 to 1990. His research interests include Artificial Intelligence, Global Intelligence and Machine Learning. Numao is a member of the Information Processing Society of Japan, the Japanese Society for Artificial Intelligence, the Japanese Cognitive Science Society, the Japan Society for Software Science and Technology and AAAI.

20.
In this paper, we propose an efficient scalable algorithm for mining Maximal Sequential Patterns using Sampling (MSPS). The MSPS algorithm reduces much more search space than other algorithms because both the subsequence infrequency-based pruning and the supersequence frequency-based pruning are applied. In MSPS, a sampling technique is used to identify long frequent sequences earlier, instead of enumerating all their subsequences. We propose a method for adjusting the user-specified minimum support level for mining a sample of the database to achieve better overall performance. This method makes sampling more efficient when the minimum support is small. A signature-based method and a hash-based method are developed for the subsequence infrequency-based pruning when the seed set of frequent sequences for the candidate generation is too big to be loaded into memory. A prefix tree structure is developed to count the candidate sequences of different sizes during the database scanning, and it also facilitates the customer sequence trimming. Our experiments showed that MSPS has very good performance and better scalability than other algorithms. Congnan Luo received the B.E. degree in Computer Science from Tsinghua University, Beijing, P.R. China, in 1997, the M.S. degree in Computer Science from the Institute of Software, Chinese Academy of Sciences, Beijing, P.R. China, in 2000, and the Ph.D. degree in Computer Science and Engineering from Wright State University, Dayton, OH, in 2006. Currently he is a technical staff member at the Teradata division of NCR in San Diego, CA, and his research interests include data mining, machine learning, and databases. Soon M. Chung received the B.S. degree in Electronic Engineering from Seoul National University, Korea, in 1979, the M.S. degree in Electrical Engineering from Korea Advanced Institute of Science and Technology, Korea, in 1981, and the Ph.D. degree in Computer Engineering from Syracuse University, Syracuse, New York, in 1990.
He is currently a Professor in the Department of Computer Science and Engineering at Wright State University, Dayton, OH. His research interests include database, data mining, Grid computing, text mining, XML, and parallel and distributed processing.
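The notion of a maximal sequential pattern can be made concrete with a small sketch: given a set of frequent sequences, keep only those that are not a subsequence of another frequent sequence. This is not the MSPS algorithm (no sampling, pruning or prefix tree); it only shows the maximality test. Sequences are simplified to tuples of single items rather than itemsets, which is an assumption of this sketch, and the function names are hypothetical.

```python
def is_subsequence(short, long):
    """True if `short` can be obtained from `long` by deleting elements."""
    remaining = iter(long)
    # `item in remaining` consumes the iterator up to the first match,
    # so the items of `short` must appear in `long` in the same order.
    return all(item in remaining for item in short)


def maximal_sequences(frequent):
    """Return the frequent sequences that have no frequent proper supersequence."""
    unique = {tuple(seq) for seq in frequent}
    return sorted(
        seq
        for seq in unique
        if not any(seq != other and is_subsequence(seq, other) for other in unique)
    )
```

Given the frequent sequences `("a",)`, `("a", "b")`, `("a", "c")`, `("a", "b", "c")` and `("d",)`, only `("a", "b", "c")` and `("d",)` are maximal, since the first three are all subsequences of `("a", "b", "c")`.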


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号