Similar documents
 Found 20 similar documents (search time: 31 ms)
1.
Data extraction from the web based on pre-defined schema   Total citations: 8 (self-citations: 1, citations by others: 7)
With the development of the Internet, the World Wide Web has become an invaluable information source for most organizations. However, most documents available from the Web are in HTML form, which was originally designed for document formatting with little consideration of content. Effectively extracting data from such documents remains a non-trivial task. In this paper, we present a schema-guided approach to extracting data from HTML pages. Under this approach, the user defines a schema specifying what is to be extracted and provides sample mappings between the schema and the HTML page. The system then induces the mapping rules and generates a wrapper that takes the HTML page as input and produces the required data in the form of XML conforming to the user-defined schema. A prototype system implementing the approach has been developed. Preliminary experiments indicate that the proposed semi-automatic approach is not only easy to use but also able to produce a wrapper that extracts the required data from input pages with high accuracy.
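The workflow in this abstract (a user-defined schema, mapping rules, and a wrapper that emits XML) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' system: the names `SCHEMA`, `RULES`, `FieldCollector`, and `wrap` are hypothetical, the rules match on tag name and `class` attribute only, and they are written by hand here rather than induced from sample mappings.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical user-defined schema: root element, record element, and fields.
SCHEMA = {"root": "books", "record": "book", "fields": ["title", "price"]}

# Hypothetical mapping rules: schema field -> (HTML tag, class attribute).
# The abstract says such rules are induced from samples; here they are hand-written.
RULES = {"title": ("span", "t"), "price": ("span", "p")}


class FieldCollector(HTMLParser):
    """Collects the text of elements that match one of the mapping rules."""

    def __init__(self, rules):
        super().__init__()
        self.lookup = {target: name for name, target in rules.items()}
        self.current = None   # schema field of the element being read, if any
        self.values = []      # (field, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        self.current = self.lookup.get((tag, dict(attrs).get("class")))

    def handle_endtag(self, tag):
        self.current = None

    def handle_data(self, data):
        if self.current and data.strip():
            self.values.append((self.current, data.strip()))


def wrap(html):
    """Wrapper: takes an HTML page and returns XML shaped by SCHEMA."""
    collector = FieldCollector(RULES)
    collector.feed(html)
    root = ET.Element(SCHEMA["root"])
    record = None
    for name, text in collector.values:
        # A new record starts whenever the first schema field appears.
        if record is None or name == SCHEMA["fields"][0]:
            record = ET.SubElement(root, SCHEMA["record"])
        ET.SubElement(record, name).text = text
    return ET.tostring(root, encoding="unicode")
```

Calling `wrap` on a page with two list items, each holding a `span` of class `t` and one of class `p`, yields one `book` element per item. A real wrapper would need more robust rules (paths, sibling positions) than the tag-and-class pairs assumed here.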

2.
A Horn definition is a set of Horn clauses with the same predicate in all head literals. In this paper, we consider learning non-recursive, first-order Horn definitions from entailment. We show that this class is exactly learnable from equivalence and membership queries. It follows then that this class is PAC learnable using examples and membership queries. Finally, we apply our results to learning control knowledge for efficient planning in the form of goal-decomposition rules. Chandra Reddy, Ph.D.: He is currently a doctoral student in the Department of Computer Science at Oregon State University. He is completing his Ph.D. on June 30, 1998. His dissertation is entitled “Learning Hierarchical Decomposition Rules for Planning: An Inductive Logic Programming Approach.” Earlier, he received an M.Tech. in Artificial Intelligence and Robotics from the University of Hyderabad, India, and an M.Sc. (Tech.) in Computer Science from Birla Institute of Technology and Science, India. His current research interests broadly fall under machine learning and planning/scheduling, more specifically inductive logic programming, speedup learning, data mining, and hierarchical planning and optimization. Prasad Tadepalli, Ph.D.: He has an M.Tech. in Computer Science from the Indian Institute of Technology, Madras, India, and a Ph.D. from Rutgers University, New Brunswick, USA. He joined Oregon State University, Corvallis, as an assistant professor in 1989. He is now an associate professor in the Department of Computer Science of Oregon State University. His main area of research is machine learning, including reinforcement learning, inductive logic programming, and computational learning theory, with applications to classification, planning, scheduling, manufacturing, and information retrieval.

3.
Event-based systems are seen as good candidates for supporting distributed applications in dynamic and ubiquitous environments because they support decoupled and asynchronous one-to-many and many-to-many information dissemination. Event systems are widely used because asynchronous messaging provides a flexible alternative to RPC. They are typically implemented using an overlay network of routers. A content-based router forwards event messages based on filters that are installed by subscribers and other routers. This paper addresses the optimization of content-based routing tables organized using the covering relation and presents novel configurations for improving local and distributed operation. We present the poset-derived forest data structure and variants that perform considerably better under frequent filter additions and removals than existing data structures. The results offer a significant performance increase over currently known covering-based routing mechanisms. Sasu Tarkoma received his M.Sc. and Ph.Lic. degrees in Computer Science from the University of Helsinki, Department of Computer Science. He has over 20 scientific publications and has also contributed to several books on mobile middleware. His research interests include distributed computing and middleware. Jaakko Kangasharju is a PhD student at the University of Helsinki and works as a researcher at the Helsinki Institute for Information Technology. His research concentrates on XML messaging and processing in the mobile wireless environment. He has participated in related standardization efforts at the Object Management Group and the World Wide Web Consortium.
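The covering relation mentioned in this abstract can be illustrated with a short sketch: a filter covers another if every event matching the second also matches the first. The code below assumes a filter is a conjunction of closed, non-empty numeric ranges keyed by attribute name, which is an assumption made for illustration. The `insert` routine is a simplified forest insertion, not the paper's poset-derived forest, and the names `covers`, `Node`, and `insert` are hypothetical.

```python
from dataclasses import dataclass, field


def covers(f1, f2):
    """True if every event matching f2 also matches f1.

    Assumption: a filter maps attribute -> (low, high) with low <= high;
    an attribute missing from a filter is unconstrained.
    """
    for attr, (lo1, hi1) in f1.items():
        if attr not in f2:
            return False  # f2 accepts any value here, f1 does not
        lo2, hi2 = f2[attr]
        if lo2 < lo1 or hi2 > hi1:
            return False  # f2's range sticks out of f1's range
    return True


@dataclass
class Node:
    filt: dict
    children: list = field(default_factory=list)


def insert(roots, filt):
    """Place filt under the first node that covers it, else make it a root."""
    for node in roots:
        if covers(node.filt, filt):
            insert(node.children, filt)
            return
    new = Node(filt)
    # Existing siblings that the new filter covers move underneath it.
    new.children = [n for n in roots if covers(filt, n.filt)]
    roots[:] = [n for n in roots if not covers(filt, n.filt)]
    roots.append(new)
```

In covering-based routing, only the root filters of such a forest generally need to be forwarded to neighbouring routers, because every covered filter is implied by an ancestor. That is why the cost of additions and removals in this structure matters.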

4.
With the growing popularity of the World Wide Web, a large volume of user access data has been gathered automatically by Web servers and stored in Web logs. Discovering and understanding user behavior patterns from log files can support personalized Web recommendation services. In this paper, a novel clustering method for log files is presented, called Clustering large Weblog based on Key Path Model (CWKPM), which is based on a user browsing key path model, to obtain user behavior profiles. Compared with the previous Boolean model, the key path model considers the major features of users' access to the Web: ordinal, contiguous, and duplicate. Moreover, for clustering, it has fewer dimensions. The analysis and experiments show that CWKPM is an efficient and effective approach for clustering large and high-dimensional Web logs.

5.
In this paper we describe the deployment of the most important life sciences applications on the grid. The grid we built is heterogeneous and consists of systems with different architectures, operating systems, and middleware. We have used the UNICORE infrastructure as a framework for developing a dedicated user interface to a number of existing computational chemistry codes and molecular biology databases. The solution gives access to resources provided through UNICORE as well as Globus via exactly the same interface, which exposes general grid functionality such as single login, job submission, and control mechanisms. Jarosław Wypychowski: He is a student at the Faculty of Mathematics and Computer Science, Warsaw University, Poland. He is involved in the development of grid tools. He has worked as a programmer in a private company. Jarosław Pytliński, M.Sc.: He received his M.Sc. in 2002 from the Department of Mathematics and Computer Science of Nicolaus Copernicus University in Torun. His thesis on “Quantum Chemistry Computations in Grid Environment” received a distinction in the XIX Polish Contest for the Best M.Sc. Thesis in Computer Science. He also worked in the Laboratory of High Performance Systems at UCI, Torun. His interests are Artificial Intelligence and GRID technology. Łukasz Skorwider, M.Sc.: He is a programmer in a private pharmaceutical company. He obtained his M.Sc. degree from the Faculty of Mathematics and Computer Science, N. Copernicus University. As a graduate student he was involved in the development of grid tools for drug design. His private and professional interest is Internet technology. Mirosław Nazaruk, M.Sc.: He is a senior computer and network administrator at ICM Warsaw University. He provides professional support for the users of the high performance facilities located at the ICM. He obtained his M.Sc. in Computer Science from Warsaw University in 1991. 
Before joining ICM, he was a member of technical staff at the Institute of Applied Mathematics, Warsaw University. Krzysztof Benedyczak: He is a student at the Faculty of Mathematics and Computer Science, N. Copernicus University, Torun, Poland. He is involved in the development of grid tools. Michał Wroński: He is a student at the Faculty of Mathematics and Computer Science, N. Copernicus University, Torun, Poland. He is involved in the development of grid tools. Piotr Bała, Ph.D.: He is an adiunkt at the Faculty of Mathematics and Computer Science, N. Copernicus University, Torun, Poland, and cooperates closely with ICM, Warsaw University. He obtained his Ph.D. in Physics in 1993 at the Institute of Physics, N. Copernicus University, and his habilitation in physics in 2000. In 2001 he was appointed director of the Laboratory of Parallel and Distributed Processing at the Faculty of Mathematics, N. Copernicus University. His main research interest is the development and application of Quantum-Classical Molecular Dynamics and the Approximated Valence Bond method to the study of enzymatic reactions in biological systems. In the last few years, he has been involved in the development of parallel and grid tools for large scale scientific applications.

6.
Web image indexing by using associated texts   Total citations: 1 (self-citations: 0, citations by others: 1)
In order to index Web images, the whole associated text is partitioned into a sequence of text blocks, then the local relevance of a term to the corresponding image is calculated with respect to both its local occurrence in the block and the distance of the block to the image. Thus, the overall relevance of a term is determined as the sum of all its local weight values multiplied by the corresponding distance factors of the text blocks. In the present approach, the associated text of a Web image is first partitioned into three parts: a page-oriented text (TM), a link-oriented text (LT), and a caption-oriented text (BT). Because of its large size and semantic divergence, the caption-oriented text is further partitioned into finer blocks based on the tree structure of the tag elements within the BT text. During the processing, all heading nodes are pulled up in order to correlate with their semantic scopes, and a collapse algorithm is also exploited to remove the empty blocks. In our system, the relevance factors of the text blocks are determined by using a greedy Two-Way-Merging algorithm. Zhiguo Gong is an Associate Professor in the Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macao, China. He received his BS, MS, and PhD from the Hebei Normal University, Peking University, and the Chinese Academy of Science in 1983, 1988, and 1998, respectively. His research interests include Distributed Database, Multimedia Database, Digital Library, Web Information Retrieval, and Web Mining. Leong Hou U is currently a Master Candidate in the Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macao, China. He received his BS from National Chi Nan University, Taiwan in 2003. His research interests include Web Information Retrieval and Web Mining. 
Chan Wa Cheang is currently a Master Candidate in the Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macao, China. He received his BS from the National Taiwan University, Taiwan in 2003. His research interests include Web Information Retrieval and Web Mining.
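The weighting scheme stated in this abstract, where a term's overall relevance is the sum of its local weights multiplied by the distance factors of the text blocks, can be sketched in a few lines. Two things below are assumptions rather than details from the source: the local weight is taken to be the term's frequency normalized by block length, and the distance factors are hand-picked numbers, whereas the abstract says they are determined by a greedy Two-Way-Merging algorithm. The name `term_relevance` is hypothetical.

```python
from collections import Counter


def term_relevance(blocks, factors):
    """Overall relevance per term: sum over blocks of local weight * distance factor.

    blocks  -- list of text blocks, each a list of terms
    factors -- one distance factor per block (larger = closer to the image)
    """
    scores = Counter()
    for words, factor in zip(blocks, factors):
        counts = Counter(words)
        for term, n in counts.items():
            local_weight = n / len(words)  # assumption: normalized term frequency
            scores[term] += local_weight * factor
    return dict(scores)
```

For example, with blocks `[["red", "car"], ["car", "car", "road", "trip"]]` and factors `[1.0, 0.5]`, the term `car` collects weight from both blocks and therefore ranks above `red`, which occurs only in the block nearest the image.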

7.
Merging uncertain information with semantic heterogeneity in XML   Total citations: 1 (self-citations: 1, citations by others: 0)
Semistructured information can be merged in a logic-based framework [6, 7]. This framework has been extended to deal with uncertainty, in the form of probability values, degrees of beliefs, or necessity measures, associated with leaves (i.e. textentries) in the XML documents [3]. In this paper we further extend this approach to modelling and merging uncertain information that is defined at different levels of granularity of XML textentries, and to modelling and reasoning with XML documents that contain semantically heterogeneous uncertain information on more complex elements in XML subtrees. We present the formal definitions for modelling, propagating and merging semantically heterogeneous uncertain information and explain how they can be handled using logic-based fusion techniques. Anthony Hunter received a B.Sc. (1984) from the University of Bristol and an M.Sc. (1987) and Ph.D. (1992) from Imperial College, London. He is currently a reader in the Department of Computer Science at University College London. His main research interests are: Knowledge representation and reasoning, Analysing inconsistency, Argumentation, Default reasoning and Knowledge Fusion. Weiru Liu is a senior lecturer at the School of Computer Science, Queen's University Belfast. She received her B.Sc. and M.Sc. degrees in Computer Science from Jilin University, P.R. China, and her Ph.D. degree in Artificial Intelligence from the University of Edinburgh. Her main research interests include reasoning under uncertainty, knowledge representation and reasoning, uncertain knowledge and information fusion, and knowledge discovery in databases. She has published over 50 journal and conference papers in these areas.

8.
Classification is an important technique in data mining. The decision trees built by most existing classification algorithms commonly feature over-branching, which leads to poor efficiency in the subsequent classification period. In this paper, we present a new value-oriented classification method, which aims at building accurate, properly sized decision trees while reducing over-branching as much as possible, based on the concepts of frequent-pattern-node and exceptive-child-node. The experiments show that, with relevance analysis as pre-processing, our classification method can, without loss of accuracy, eliminate over-branching in decision trees more effectively and efficiently than other algorithms.

9.
The study of database technologies, or more generally, the technologies of data and information management, is an important and active research field. Recently, many exciting results have been reported. In this fast growing field, Chinese researchers play increasingly active roles. Research papers from Chinese scholars, both in China and abroad, appear in prestigious academic forums. In this paper, we, nine young Chinese researchers working in the United States, present concise surveys and report our recent progress in the selected fields that we are working on. Although the paper covers only a small number of topics and the selection of the topics is far from balanced, we hope that such an effort will attract more researchers, especially those in China, to enter the frontiers of database research and promote collaborations. For obvious reasons, the authors are listed alphabetically, while the sections are arranged in the order of the author list.

10.
In the field of computer vision and pattern recognition, data processing and data analysis tasks are often implemented as a consecutive or parallel application of more-or-less complex operations. In the following we will present DocXS, a computing environment for the design and the distributed and parallel execution of such tasks. Algorithms can be programmed using an Eclipse-based user interface, and the resulting Matlab and Java operators can be visually connected to graphs representing complex data processing workflows. DocXS is platform independent due to its implementation in Java, is freely available for noncommercial research, and can be installed on standard office computers. One advantage of DocXS is that it automatically takes care about the task execution and does not require its users to care about code distribution or parallelization. Experiments with DocXS show that it scales very well with only a small overhead. The text was submitted by the authors in English. Steffen Wachenfeld received B.Sc. and M.Sc. (honors) degrees in Information Systems in 2003 and 2005 from the University of Muenster, Germany, and an M.Sc. (honors) degree in Computer Science in 2003 from the University of Muenster. He is currently a research fellow and PhD student in the Computer Science at the Dept. of Computer Science, University of Muenster. His research interests include low resolution text recognition, computer vision on mobile devices, and systems/system architectures for computer vision and image analysis. He is author or coauthor of more than ten scientific papers and a member of IAPR. Tobias Lohe, M.Sc. degree in Computer Science in 2007 from the University of Muenster, Germany, is currently a research associate and PhD student in Computer Science at the Institute for Robotics and Cognitive Systems, University of Luebeck, Germany. His research interests include medical imaging, signal processing, and robotics for minimally invasive surgery. 
Michael Fieseler is currently a student of Computer Science at the University of Muenster, Germany. He has participated in research in the field of computer vision and medical imaging. Currently he is working on his Master's thesis on depth-based image rendering (DBIR). Xiaoyi Jiang studied Computer Science at Peking University, China, and received his PhD and Venia Docendi (Habilitation) degree in Computer Science from the University of Bern, Switzerland. In 2002 he became an associate professor at the Technical University of Berlin, Germany. Since October 2002 he has been a full professor at the University of Münster, Germany. He has coauthored and coedited two books published by Springer and has served as the co-guest-editor of two special issues in international journals. Currently, he is the Coeditor-in-Chief of the International Journal of Pattern Recognition and Artificial Intelligence. In addition he also serves on the editorial advisory board of the International Journal of Neural Systems and the editorial board of IEEE Transactions on Systems, Man, and Cybernetics, Part B, the International Journal of Image and Graphics, Electronic Letters on Computer Vision and Image Analysis, and Pattern Recognition. His research interests include medical image analysis, vision-based man-machine interface, 3D image analysis, structural pattern recognition, and mobile multimedia. He is a member of IEEE and a Fellow of IAPR.

11.
Data mining can dig out valuable information from databases to assist a business in approaching knowledge discovery and improving business intelligence. Database stores large structured data. The amount of data increases due to the advanced database technology and extensive use of information systems. Despite the price drop of storage devices, it is still important to develop efficient techniques for database compression. This paper develops a database compression method by eliminating redundant data, which often exist in transaction database. The proposed approach uses a data mining structure to extract association rules from a database. Redundant data will then be replaced by means of compression rules. A heuristic method is designed to resolve the conflicts of the compression rules. To prove its efficiency and effectiveness, the proposed approach is compared with two other database compression methods. Chin-Feng Lee is an associate professor with the Department of Information Management at Chaoyang University of Technology, Taiwan, R.O.C. She received her M.S. and Ph.D. degrees in 1994 and 1998, respectively, from the Department of Computer Science and Information Engineering at National Chung Cheng University. Her current research interests include database design, image processing and data mining techniques. S. Wesley Changchien is a professor with the Institute of Electronic Commerce at National Chung-Hsing University, Taiwan, R.O.C. He received a BS degree in Mechanical Engineering (1989) and completed his MS (1993) and Ph.D. (1996) degrees in Industrial Engineering at State University of New York at Buffalo, USA. His current research interests include electronic commerce, internet/database marketing, knowledge management, data mining, and decision support systems. Jau-Ji Shen received his Ph.D. degree in Information Engineering and Computer Science from National Taiwan University at Taipei, Taiwan in 1988. 
From 1988 to 1994, he was the leader of the software group in Institute of Aeronautic, Chung-Sung Institute of Science and Technology. He is currently an associate professor in the information management department at National Chung Hsing University, Taichung. His research areas focus on digital multimedia, databases, and information security. His current research areas focus on data engineering, database techniques, and information security. Wei-Tse Wang received the B.A. (2001) and M.B.A. (2003) degrees in Information Management at Chaoyang University of Technology, Taiwan, R.O.C. His research interests include data mining, XML, and database compression.
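The idea in this abstract of replacing redundant data by means of compression rules can be illustrated with a small sketch. It is not the authors' method: the rule format, the sample rule, and the names `RULES`, `compress`, and `decompress` are hypothetical, the rule is given by hand rather than mined as an association rule, and every row is assumed to carry every column. The sketch also assumes the rules do not conflict (no two rules share a consequent column, and no rule's antecedent column is another rule's consequent), so the heuristic conflict resolution the abstract mentions is out of scope.

```python
# Hypothetical compression rule: (if_col, if_val, then_col, then_val) means
# "when row[if_col] == if_val, row[then_col] is usually then_val".
RULES = [("item", "bread", "aisle", "bakery")]


def compress(rows, rules):
    """Drop every value that a rule can restore; keep exceptions untouched."""
    out = []
    for row in rows:
        row = dict(row)  # work on a copy
        for if_col, if_val, then_col, then_val in rules:
            if row.get(if_col) == if_val and row.get(then_col) == then_val:
                del row[then_col]  # implied by the rule, so not stored
        out.append(row)
    return out


def decompress(rows, rules):
    """Restore the values that compress() removed."""
    out = []
    for row in rows:
        row = dict(row)
        for if_col, if_val, then_col, then_val in rules:
            if row.get(if_col) == if_val and then_col not in row:
                row[then_col] = then_val
        out.append(row)
    return out
```

A row that matches the rule's antecedent but has a different consequent value (an exception) keeps its value, so the round trip through `compress` and `decompress` is lossless under the assumptions above.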

12.
Summary: Algorithms for mutual exclusion that adapt to the current degree of contention are developed. A filter and a leader election algorithm form the basic building blocks. The algorithms achieve system response times that are independent of the total number of processes and governed instead by the current degree of contention. The final algorithm achieves a constant amortized system response time. Manhoi Choy was born in 1967 in Hong Kong. He received his B.Sc. in Electrical and Electronic Engineering from the University of Hong Kong in 1989, and his M.Sc. in Computer Science from the University of California at Santa Barbara in 1991. Currently, he is working on his Ph.D. in Computer Science at the University of California at Santa Barbara. His research interests are in the areas of parallel and distributed systems, and distributed algorithms. Ambuj K. Singh is an Assistant Professor in the Department of Computer Science at the University of California, Santa Barbara. He received a Ph.D. in Computer Science from the University of Texas at Austin in 1989, an M.S. in Computer Science from Iowa State University in 1984, and a B.Tech. from the Indian Institute of Technology at Kharagpur in 1982. His research interests are in the areas of adaptive resource allocation, concurrent program development, and distributed shared memory. A preliminary version of the paper appeared in the 12th Annual ACM Symposium on Principles of Distributed Computing. Work supported in part by NSF grants CCR-9008628 and CCR-9223094.

13.
Tracking clusters in evolving data streams over sliding windows   Total citations: 6 (self-citations: 4, citations by others: 2)
Mining data streams poses great challenges due to the limited memory availability and real-time query response requirement. Clustering an evolving data stream is especially interesting because it captures not only the changing distribution of clusters but also the evolving behaviors of individual clusters. In this paper, we present a novel method for tracking the evolution of clusters over sliding windows. In our SWClustering algorithm, we combine the exponential histogram with the temporal cluster features, propose a novel data structure, the Exponential Histogram of Cluster Features (EHCF). The exponential histogram is used to handle the in-cluster evolution, and the temporal cluster features represent the change of the cluster distribution. Our approach has several advantages over existing methods: (1) the quality of the clusters is improved because the EHCF captures the distribution of recent records precisely; (2) compared with previous methods, the mechanism employed to adaptively maintain the in-cluster synopsis can track the cluster evolution better, while consuming much less memory; (3) the EHCF provides a flexible framework for analyzing the cluster evolution and tracking a specific cluster efficiently without interfering with other clusters, thus reducing the consumption of computing resources for data stream clustering. Both the theoretical analysis and extensive experiments show the effectiveness and efficiency of the proposed method. Aoying Zhou is currently a Professor in Computer Science at Fudan University, Shanghai, P.R. China. He won his Bachelor and Master degrees in Computer Science from Sichuan University in Chengdu, Sichuan, P.R. China in 1985 and 1988, respectively, and Ph.D. degree from Fudan University in 1993. He served as the member or chair of program committee for many international conferences such as WWW, SIGMOD, VLDB, EDBT, ICDCS, ER, DASFAA, PAKDD, WAIM, and etc. 
His papers have been published in ACM SIGMOD, VLDB, ICDE, and several other international journals. His research interests include Data mining and knowledge discovery, XML data management, Web mining and searching, data stream analysis and processing, peer-to-peer computing. Feng Cao is currently an R&D engineer in IBM China Research Laboratories. He received a B.E. degree from Xi'an Jiao Tong University, Xi'an, P.R. China, in 2000 and an M.E. degree from Huazhong University of Science and Technology, Wuhan, P.R. China, in 2003. From October 2004 to March 2005, he worked in Fudan-NUS Competency Center for Peer-to-Peer Computing, Singapore. In 2006, he received his Ph.D. degree from Fudan University, Shanghai, P.R. China. His current research interests include data mining and data stream. Weining Qian is currently an Assistant Professor in computer science at Fudan University, Shanghai, P.R. China. He received his M.S. and Ph.D. degree in computer science from Fudan University in 2001 and 2004, respectively. He is supported by Shanghai Rising-Star Program under Grant No. 04QMX1404 and National Natural Science Foundation of China (NSFC) under Grant No. 60673134. He served as the program committee member of several international conferences, including DASFAA 2006, 2007 and 2008, APWeb/WAIM 2007, INFOSCALE 2007, and ECDM 2007. His papers have been published in ICDE, SIAM DM, and CIKM. His research interests include data stream query processing and mining, and large-scale distributed computing for database applications. Cheqing Jin is currently an Assistant Professor in Computer Science at East China University of Science and Technology. He received his Bachelor and Master degrees in Computer Science from Zhejiang University in Hangzhou, P.R. China in 1999 and 2002, respectively, and the Ph.D. degree from Fudan University, Shanghai, P.R. China. He worked as a Research Assistant at E-business Technology Institute, the Hong Kong University from December 2003 to May 2004. 
His current research interests include data mining and data streams.

14.
The present contribution describes a potential application of Grid Computing in Bioinformatics. High resolution structure determination of biological specimens is critical in BioSciences to understanding biological function. The problem is computationally intensive. Distributed and Grid Computing are thus becoming essential. This contribution analyzes the use of Grid Computing and its potential benefits in the field of electron microscope tomography of biological specimens. Jose-Jesus Fernandez, Ph.D.: He received his M.Sc. and Ph.D. degrees in Computer Science from the University of Granada, Spain, in 1992 and 1997, respectively. He was a Ph.D. student at the Bio-Computing unit of the National Center for BioTechnology (CNB) from the Spanish National Council of Scientific Research (CSIC), Madrid, Spain. He became an Assistant Professor in 1997 and, subsequently, Associate Professor in 2000 in Computer Architecture at the University of Almeria, Spain. He is a member of the supercomputing-algorithms research group. His research interests include high performance computing (HPC), image processing and tomography. Jose-Roman Bilbao-Castro: He received his M.Sc. degree in Computer Science from the University of Almeria in 2001. He is currently a Ph.D. student at the BioComputing unit of the CNB (CSIC) through a Ph.D. CSIC-grant in conjunction with the Dept. of Computer Architecture at the University of Malaga (Spain). His current research interests include tomography, HPC and distributed and grid computing. Roberto Marabini, Ph.D.: He received the M.Sc. (1989) and Ph.D. (1995) degrees in Physics from the University Autonoma de Madrid (UAM) and University of Santiago de Compostela, respectively. He was a Ph.D. student at the BioComputing Unit at the CNB (CSIC). He worked at the University of Pennsylvania and the City University of New York from 1998 to 2002. At present he is an Associate Professor at the UAM. 
His current research interests include inverse problems, image processing and HPC. Jose-Maria Carazo, Ph.D.: He received the M.Sc. degree from the Granada University, Spain, in 1981, and got his Ph.D. in Molecular Biology at the UAM in 1984. He left for Albany, NY, in 1986, coming back to Madrid in 1989 to set up the BioComputing Unit of the CNB (CSIC). He was involved in the Spanish Ministry of Science and Technology as Deputy General Director for Research Planning. Currently, he remains engaged in his activities at the CNB, the Scientific Park of Madrid and Integromics S.L. Immaculada Garcia, Ph.D.: She received her B.Sc. (1977) and Ph.D. (1986) degrees in Physics from the Complutense University of Madrid and University of Santiago de Compostela, respectively. From 1977 to 1987 she was an Assistant Professor at the University of Granada, from 1987 to 1996 an Associate Professor at the University of Almeria, and since 1997 she has been a Full Professor and head of the Dept. of Computer Architecture. She is head of the supercomputing-algorithms research group. Her research interest lies in HPC for irregular problems related to image processing, global optimization and matrix computation.

15.
A range query finds the aggregated values over all selected cells of an online analytical processing (OLAP) data cube where the selection is specified by the ranges of contiguous values for each dimension. An important issue in reality is how to preserve the confidential information in individual data cells while still providing an accurate estimation of the original aggregated values for range queries. In this paper, we propose an effective solution, called the zero-sum method, to this problem. We derive theoretical formulas to analyse the performance of our method. Empirical experiments are also carried out by using analytical processing benchmark (APB) dataset from the OLAP Council. Various parameters, such as the privacy factor and the accuracy factor, have been considered and tested in the experiments. Finally, our experimental results show that there is a trade-off between privacy preservation and range query accuracy, and the zero-sum method has fulfilled three design goals: security, accuracy, and accessibility. Sam Y. Sung is an Associate Professor in the Department of Computer Science, School of Computing, National University of Singapore. He received a B.Sc. from the National Taiwan University in 1973, the M.Sc. and Ph.D. in computer science from the University of Minnesota in 1977 and 1983, respectively. He was with the University of Oklahoma and University of Memphis in the United States before joining the National University of Singapore. His research interests include information retrieval, data mining, pictorial databases and mobile computing. He has published more than 80 papers in various conferences and journals, including IEEE Transaction on Software Engineering, IEEE Transaction on Knowledge & Data Engineering, etc. Yao Liu received the B.E. degree in computer science and technology from Peking University in 1996 and the MS. degree from the Software Institute of the Chinese Science Academy in 1999. Currently, she is a Ph.D. 
candidate in the Department of Computer Science at the National University of Singapore. Her research interests include data warehousing, database security, data mining and high-speed networking. Hui Xiong received the B.E. degree in Automation from the University of Science and Technology of China, Hefei, China, in 1995, the M.S. degree in Computer Science from the National University of Singapore, Singapore, in 2000, and the Ph.D. degree in Computer Science from the University of Minnesota, Minneapolis, MN, USA, in 2005. He is currently an Assistant Professor of Computer Information Systems in the Management Science & Information Systems Department at Rutgers University, NJ, USA. His research interests include data mining, databases, and statistical computing with applications in bioinformatics, database security, and self-managing systems. He is a member of the IEEE Computer Society and the ACM. Peter A. Ng is currently the Chairperson and Professor of Computer Science at the University of Texas-Pan American. He received his Ph.D. from the University of Texas at Austin in 1974. Previously, he served as the Vice President at the Fudan International Institute for Information Science and Technology, Shanghai, China, from 1999 to 2002, and the Executive Director for the Global e-Learning Project at the University of Nebraska at Omaha, 2000–2003. He was appointed as an Advisory Professor of Computer Science at Fudan University, Shanghai, China in 1999. His recent research focuses on document and information-based processing, retrieval and management. He has published many journal and conference articles in this area. He served as the Editor-in-Chief for the Journal on Systems Integration (1991–2001) and has served as Advisory Editor for the Data and Knowledge Engineering Journal since 1989.

16.
Many algorithms in distributed systems assume that the size of a single message depends on the number of processors. In this paper, we assume in contrast that messages consist of a single bit. Our main goal is to explore how the one-bit translation of unbounded message algorithms can be sped up by pipelining. We consider two problems. The first is routing between two processors in an arbitrary network and in some special networks (ring, grid, hypercube). The second problem is coloring a synchronous ring with three colors. The routing problem is a very basic subroutine in many distributed algorithms; the three-coloring problem demonstrates that pipelining is not always useful.
Amotz Bar-Noy received his B.Sc. degree in Mathematics and Computer Science in 1981, and his Ph.D. degree in Computer Science in 1987, both from the Hebrew University of Jerusalem, Israel. Between 1987 and 1989 he was a post-doctoral fellow in the Department of Computer Science at Stanford University. He is currently a visiting scientist at the IBM Thomas J. Watson Research Center. His current research interests include the theoretical aspects of distributed and parallel computing, computational complexity and combinatorial optimization.
Joseph (Seffi) Naor received his B.A. degree in Computer Science in 1981 from the Technion, Israel Institute of Technology. He received his M.Sc. in 1983 and Ph.D. in 1987 in Computer Science, both from the Hebrew University of Jerusalem, Israel. Between 1987 and 1988 he was a post-doctoral fellow at the University of Southern California, Los Angeles, CA. Since 1988 he has been a post-doctoral fellow in the Department of Computer Science at Stanford University. His research interests include combinatorial optimization, randomized algorithms, computational complexity and the theoretical aspects of parallel and distributed computing.
Moni Naor received his B.A. in Computer Science from the Technion, Israel Institute of Technology, in 1985, and his Ph.D. in Computer Science from the University of California at Berkeley in 1989. He is currently a visiting scientist at the IBM Almaden Research Center. His research interests include computational complexity, data structures, cryptography, and parallel and distributed computation.
Supported in part by a Weizmann fellowship and by contract ONR N00014-85-C-0731. Supported by contract ONR N00014-88-K-0166 and by a grant from Stanford's Center for Integrated Systems. This work was done while the author was a post-doctoral fellow at the University of Southern California, Los Angeles, CA. This work was done while the author was with the Computer Science Division, University of California at Berkeley, and supported by NSF grant DCR 85-13926.
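
To make the pipelining idea in the abstract above concrete, here is a minimal sketch under a simplified model that is an assumption, not taken from the paper: a multi-bit message travels along a simple path, and each link carries one bit per synchronous round. The function names are hypothetical. Without pipelining, each node waits for the whole message before forwarding it; with pipelining, each node forwards a bit as soon as it has one.

```python
def rounds_store_and_forward(path_length, message_bits):
    """Each hop waits for the full message, then resends it bit by bit."""
    return path_length * message_bits


def rounds_pipelined(path_length, message_bits):
    """Simulate forwarding where every node relays bits as they arrive."""
    received = [0] * (path_length + 1)  # bits that have reached each node
    received[0] = message_bits          # the source holds the whole message
    sent = [0] * path_length            # bits already sent on link i -> i+1
    rounds = 0
    while received[path_length] < message_bits:
        rounds += 1
        # Decide who can send using the state at the start of the round,
        # so a bit moves at most one hop per round.
        can_send = [received[i] > sent[i] for i in range(path_length)]
        for i in range(path_length):
            if can_send[i]:
                sent[i] += 1
                received[i + 1] += 1
    return rounds


# A 4-bit message over a path of 3 links.
print(rounds_store_and_forward(3, 4), rounds_pipelined(3, 4))
```

In this toy model the pipelined count grows with the sum of the path length and the message length rather than with their product, which is the kind of speed-up the abstract refers to for routing; the abstract also notes that pipelining does not help for every problem.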

17.
STAMP: A Model for Generating Adaptable Multimedia Presentations   Total citations: 1, self-citations: 1, citations by others: 0
The STAMP model addresses the dynamic generation of multimedia presentations in the domain of Multimedia Web-based Information Systems. STAMP allows the presentation of multimedia data obtained from XML-compatible data sources by means of queries. Assuming that the size and the nature of the elements of information provided by a data source are not known a priori, STAMP proposes templates which describe the spatial, temporal, and navigational structure of multimedia presentations whose content varies. The instantiation of a template is done with respect to the set of spatial and temporal constraints associated with the delivery context. A set of adaptations preserving the initial intention of the presentation is proposed.
Ioan Marius Bilasco has been a Ph.D. student at the University Joseph Fourier in Grenoble, France, since 2003. He received his BS degree in Computer Science from the University Babes Bolyai in Cluj-Napoca, Romania and his MS degree in Computer Science from the University Joseph Fourier in Grenoble, France. He joined the LSR-IMAG Laboratory in Grenoble in 2001. His research interests include adaptability in Web-based Information Systems, 3D multimedia data modelling and mobile communications.
Jérôme Gensel has been an Assistant Professor at the University Pierre Mendès France in Grenoble, France, since 1996. He received his Ph.D. in 1995 from the University of Grenoble for his work on Constraint Programming and Knowledge Representation in the Sherpa project at the French National Institute of Computer Sciences and Automatics (INRIA). He joined the LSR-IMAG Laboratory in Grenoble in 2001. His research interests include adaptability and cooperation in Web-based Information Systems, multimedia data (especially video) modeling, semi-structured and object-based knowledge representation and constraint programming.
Marlène Villanova-Oliver has been an Assistant Professor at the University Pierre Mendès France in Grenoble, France, since 2003. In 1999, she received her MS degree in Computer Science from the University Joseph Fourier of Grenoble and the European Diploma of 3rd cycle in Management and Technology of Information Systems (MATIS). She received her Ph.D. in 2002 from the National Polytechnic Institute of Grenoble (INPG). She has been a member of the LSR-IMAG Laboratory in Grenoble since 1998. Her research interests include adaptability in Web-based Information Systems, user modeling, and adaptable Web Services.

18.
A logic-based approach to the specification of active database functionality is presented which not only endows active databases with a well-defined and well-understood formal semantics, but also tightly integrates them with deductive databases. The problem of endowing deductive databases with rule-based active behaviour has been addressed in different ways. Typical approaches include accounting for active behaviour by extending the operational semantics of deductive databases, or, conversely, accounting for deductive capabilities by constraining the operational semantics of active databases. The main contribution of the paper is an alternative approach in which a class of active databases is defined whose operational semantics is naturally integrated with the operational semantics of deductive databases without either of them strictly subsuming the other. The approach is demonstrated via the formalization of the syntax and semantics of an active-rule language that can be smoothly incorporated into existing deductive databases, because the standard formalization of deductive databases is reused rather than altered or extended. One distinctive feature of the paper is its use of a history, as defined in the Kowalski-Sergot event calculus, to define event occurrences, database states and actions on these. This has proved to be a suitable foundation for a comprehensive logical account of the concept set underpinning active databases. The paper thus contributes a logical perspective to the ongoing task of developing a formal theory of active databases.
Alvaro Adolfo Antunes Fernandes, Ph.D.: He received a B.Sc. in Economics (Rio de Janeiro, 1984), an M.Sc. in Knowledge-Based Systems (Edinburgh, 1990) and a Ph.D. in Computer Science (Heriot-Watt, 1995). He worked as a Research Associate at Heriot-Watt University from December 1990 until December 1995. In January 1996 he joined the Department of Mathematical and Computing Sciences at Goldsmiths College, University of London, as a Lecturer. His current research interests include advanced data- and knowledge-base technology, logic programming, and software engineering.
M. Howard Williams, Ph.D., D.Sc.: He obtained his Ph.D. in ionospheric physics and recently a D.Sc. in Computer Science. He was appointed as the first lecturer in Computer Science at Rhodes University in 1970. During the following decade he rose to Professor of Computer Science and in 1980 was appointed as Professor of Computer Science at Heriot-Watt University. From 1980 to 1988 he served as Head of Department and then as director of research until 1992. He is now head of the Database Research Group at Heriot-Watt University. His current research interests include active databases, deductive object-oriented databases, spatial databases, parallel databases and telemedicine.
Norman W. Paton, Ph.D.: He received a B.Sc. in Computing Science from the University of Aberdeen in 1986. From 1986 to 1989 he worked as a Research Assistant at the University of Aberdeen, receiving a Ph.D. in 1989. From 1989 to 1995 he was a Lecturer in Computer Science at Heriot-Watt University. Since July 1995, he has been a Senior Lecturer in the Department of Computer Science at the University of Manchester. His current research interests include active databases, deductive object-oriented databases, spatial databases and database interfaces.

19.
In some business applications, such as trading management in financial institutions, ad hoc aggregate queries over data streams must be answered accurately. Materializing and incrementally maintaining a full data cube, or even its compression or approximation, over a data stream is often computationally prohibitive. On the other hand, although previous studies proposed approximate methods for continuous aggregate queries, they cannot provide accurate answers. In this paper, we develop a novel prefix aggregate tree (PAT) structure for online warehousing of data streams and answering ad hoc aggregate queries. Often, a data stream can be partitioned into the historical segment, which is stored in a traditional data warehouse, and the transient segment, which can be stored in a PAT to answer ad hoc aggregate queries. The size of a PAT is linear in the size of the transient segment, and only one scan of the data stream is needed to create and incrementally maintain a PAT. Although answering queries using a PAT costs more than using a fully materialized data cube, the query answering time is still linear in the size of the transient segment. Our extensive experimental results on both synthetic and real data sets illustrate the efficiency and the scalability of our design.
Moonjung Cho is a Ph.D. candidate in the Department of Computer Science and Engineering at the State University of New York at Buffalo. She obtained her Master's degree from the same university in 2003. She has four years of industry experience as an associate researcher. Her research interests are in the area of data mining, data warehousing and data cubing. She has received a full scholarship from the Institute of Information Technology Assessment in Korea.
Jian Pei received the Ph.D. degree in Computing Science from Simon Fraser University, Canada, in 2002. He is currently an Assistant Professor of Computing Science at Simon Fraser University, Canada. In 2002–2004, he was an Assistant Professor of Computer Science and Engineering at the State University of New York at Buffalo, USA. His research interests can be summarized as developing advanced data analysis techniques for emerging applications. In particular, he is currently interested in various techniques of data mining, data warehousing, online analytical processing, and database systems, as well as their applications in bioinformatics. His current research is supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the National Science Foundation (NSF). He has published over 70 papers in refereed journals, conferences, and workshops, has served on the program committees of over 60 international conferences and workshops, and has been a reviewer for some leading academic journals. He is a member of the ACM, the ACM SIGMOD, and the ACM SIGKDD.
Ke Wang received his Ph.D. from the Georgia Institute of Technology. He is currently a professor at the School of Computing Science, Simon Fraser University. Before joining Simon Fraser, he was an associate professor at the National University of Singapore. He has taught in the areas of database and data mining. Ke Wang's research interests include database technology, data mining and knowledge discovery, machine learning, and emerging applications, with recent interests focusing on the end use of data mining. This includes explicitly modeling the business goal (such as profit mining, bio-mining and web mining) and exploiting user prior knowledge (such as extracting unexpected patterns and actionable knowledge). He is interested in combining the strengths of various fields such as database, statistics, machine learning and optimization to provide actionable solutions to real-life problems. Ke Wang has published in database, information retrieval, and data mining conferences, including SIGMOD, SIGIR, PODS, VLDB, ICDE, EDBT, SIGKDD, SDM and ICDM. He is an associate editor of the IEEE TKDE journal and has served on program committees for international conferences including DASFAA, ICDE, ICDM, PAKDD, PKDD, SIGKDD and VLDB.

20.
Outlier detection is concerned with discovering exceptional behaviors of objects. Its theoretical principle and practical implementation lay a foundation for some important applications such as credit card fraud detection, discovering criminal behaviors in e-commerce, and discovering computer intrusion. In this paper, we first present a unified model for several existing outlier detection schemes, and propose a compatibility theory, which establishes a framework for describing the capabilities of various outlier formulation schemes in terms of matching users' intuitions. Under this framework, we show that the density-based scheme is more powerful than the distance-based scheme when a dataset contains patterns with diverse characteristics. The density-based scheme, however, is less effective when the patterns are of comparable densities with the outliers. We then introduce a connectivity-based scheme that improves the effectiveness of the density-based scheme when a pattern itself is of similar density to an outlier. We compare density-based and connectivity-based schemes in terms of their strengths and weaknesses, and demonstrate applications with different features where each of them is more effective than the other. Finally, connectivity-based and density-based schemes are comparatively evaluated on both real-life and synthetic datasets in terms of recall, precision, rank power and implementation-free metrics.
Jian Tang received an MS degree from the University of Iowa in 1983, and a PhD from the Pennsylvania State University in 1988, both from the Department of Computer Science. He joined the Department of Computer Science, Memorial University of Newfoundland, Canada, in 1988, where he is currently a professor. He has visited a number of research institutions to conduct research on a variety of topics relating to theories and practices for database management and systems. His current research interests include data mining, e-commerce, XML and bioinformatics.
Zhixiang Chen is an associate professor in the Computer Science Department, University of Texas-Pan American. He received his PhD in computer science from Boston University in January 1996, and BS and MS degrees in software engineering from Huazhong University of Science and Technology. He also studied at the University of Illinois at Chicago. He taught at Southwest State University from Fall 1995 to September 1997, and at Huazhong University of Science and Technology from 1982 to 1990. His research interests include computational learning theory, algorithms and complexity, intelligent Web search, information retrieval, and data mining.
Ada Waichee Fu received her BSc degree in computer science from the Chinese University of Hong Kong in 1983, and both MSc and PhD degrees in computer science from Simon Fraser University, Canada, in 1986 and 1990, respectively. She worked at Bell Northern Research in Ottawa, Canada, from 1989 to 1993 on a wide-area distributed database project, and joined the Chinese University of Hong Kong in 1993. Her research interests are XML data, time series databases, data mining, content-based retrieval in multimedia databases, and parallel and distributed systems.
David Wai-lok Cheung received the MSc and PhD degrees in computer science from Simon Fraser University, Canada, in 1985 and 1989, respectively. He also received the BSc degree in mathematics from the Chinese University of Hong Kong. From 1989 to 1993, he was a member of Scientific Staff at Bell Northern Research, Canada. Since 1994, he has been a faculty member of the Department of Computer Science at the University of Hong Kong. He is also the Director of the Center for E-Commerce Infrastructure Development. His research interests include data mining, data warehouse, XML technology for e-commerce and bioinformatics. Dr. Cheung was the Program Committee Chairman of the Fifth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2001) and Program Co-Chair of the Ninth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2005). Dr. Cheung is a member of the ACM and the IEEE Computer Society.
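
As a rough illustration of what a distance-based outlier score looks like, the sketch below ranks points by the distance to their k-th nearest neighbour. This is one common distance-based formulation chosen here as an assumption; the abstract above does not say which formulations the paper's unified model covers, and the function name and sample points are hypothetical.

```python
import math


def kth_neighbor_distance(points, k):
    """Score each point by the distance to its k-th nearest neighbour.

    Larger scores suggest more isolated points. This brute-force version
    compares every pair of points and assumes k < len(points).
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Four points in a tight square plus one far-away point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = kth_neighbor_distance(points, k=2)
print(scores.index(max(scores)))  # index of the most isolated point
```

A single global score of this kind treats all regions of the data alike, which is consistent with the abstract's remark that density-based schemes cope better when a dataset contains patterns with diverse characteristics.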
