1.

Extracting frequent connected subgraphs from large graph sets

Wei Wang Qing-Qing Yuan Hao-Feng Zhou Ming-Sheng Hong Bai-Le Shi 《Journal of Computer Science and Technology》2004,19(6):0-0

Mining frequent patterns from datasets is one of the key successes of data mining research. Currently, most studies focus on datasets whose elements are independent, such as the items in a market basket. However, objects in the real world are often closely related to one another. How to extract frequent patterns from these relations is the objective of this paper. The authors use graphs to model the relations and select a simple type of graph for analysis. Combining graph theory with algorithms for generating frequent patterns, they propose a new algorithm called Topology, which can mine these graphs efficiently. The performance of the algorithm is evaluated through experiments on synthetic datasets and real data. The experimental results show that Topology performs well. At the end of the paper, potential improvements are discussed.
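To make the notion of a frequent pattern in a graph set concrete, the following is a minimal sketch that counts the support of the smallest connected subgraphs, single labeled edges, across a set of graphs. It is not the Topology algorithm, whose details the abstract does not give; the function name `frequent_edges`, the edge-list representation, and the sample data are all assumptions made for illustration.

```python
from collections import Counter


def frequent_edges(graphs, min_support):
    """Return labeled edges that occur in at least `min_support` graphs.

    Hypothetical helper: each graph is an iterable of (label_u, label_v)
    edges. Edges are treated as undirected, so (a, b) and (b, a) are the
    same edge.
    """
    support = Counter()
    for edges in graphs:
        # Normalize direction and de-duplicate so each graph counts once.
        distinct = {tuple(sorted(edge)) for edge in edges}
        support.update(distinct)
    return {edge: n for edge, n in support.items() if n >= min_support}


# Illustrative data only: three small graphs over vertex labels A-D.
graphs = [
    [("A", "B"), ("B", "C")],
    [("B", "A"), ("C", "D")],
    [("A", "B"), ("B", "C"), ("C", "D")],
]
result = frequent_edges(graphs, min_support=2)
```

Larger frequent connected subgraphs would have to be grown from such frequent edges and checked by subgraph isomorphism, which this sketch deliberately leaves out.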

2.

Efficient Computation of k-Medians over Data Streams Under Memory Constraints

Zhi-Hong Chong Jeffrey Xu Yu Zhen-Jie Zhang Xue-Min Lin Wei Wang and Ao-Ying Zhou 《Journal of Computer Science and Technology》2006,21(2):284-296

In this paper, we study the problem of efficiently computing k-medians over high-dimensional, high-speed data streams. The focus is on minimizing the CPU time needed to handle high-speed data streams, on top of the requirements of high accuracy and small memory. Our work is motivated by the following observation: the existing algorithms have similar approximation behavior in practice, even though they offer noticeably different worst-case theoretical guarantees. The underlying reason is that, in order to achieve a high approximation level with the smallest possible memory, they need rather complex techniques to maintain a sketch along the time dimension, using existing off-line clustering algorithms. Those clustering algorithms cannot guarantee the optimal clustering result over the data segments of a data stream but accumulate errors over segments, which makes most algorithms behave the same in terms of approximation level in practice. We propose a new grid-based approach that divides the entire data set into cells (not along the time dimension). We can achieve a high approximation level based on a novel concept called (1 - ε)-dominant. We further extend the method to the data stream context by leveraging a density-based heuristic and frequent-item mining techniques over data streams. We only need to apply an existing clustering algorithm once, on demand, to compute the k-medians, which reduces CPU time significantly. We conducted extensive experimental studies and show that our approaches outperform other well-known approaches.
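As a rough illustration of two ideas in this abstract, the sketch below buckets points into grid cells and evaluates the standard k-median objective (the sum of distances from each point to its nearest median). It is not the authors' method: the (1 - ε)-dominant concept and the stream extension are not specified in the abstract, and the names `grid_cell_counts`, `k_median_cost`, and the `cell_width` parameter are assumptions.

```python
import math
from collections import Counter


def grid_cell_counts(points, cell_width):
    """Count how many points fall into each axis-aligned grid cell.

    Hypothetical helper: a cell is identified by the tuple of its integer
    coordinates, floor(x / cell_width) along each dimension.
    """
    counts = Counter()
    for point in points:
        cell = tuple(math.floor(x / cell_width) for x in point)
        counts[cell] += 1
    return counts


def k_median_cost(points, medians):
    """Sum of Euclidean distances from each point to its nearest median."""
    return sum(min(math.dist(p, m) for m in medians) for p in points)


# Illustrative data only.
points = [(0.1, 0.2), (0.4, 0.9), (1.5, 0.5), (-0.2, 0.3)]
cells = grid_cell_counts(points, cell_width=1.0)
cost = k_median_cost([(0, 0), (3, 4), (10, 0)], medians=[(0, 0), (10, 0)])
```

Dense cells in such a grid can act as compact summaries of the data, so a clustering algorithm only needs to run on the summaries rather than on every point.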

3.

Finding centric local outliers in categorical/numerical spaces (cited by: 2; self-citations: 0; cited by others: 2)

Jeffrey Xu Yu Weining Qian Hongjun Lu Aoying Zhou 《Knowledge and Information Systems》2006,9(3):309-338

Outlier detection techniques are widely used in applications such as credit-card fraud detection and the monitoring of criminal activities in electronic commerce. These applications attempt to identify outliers as noise, exceptions, or objects around the border. Existing density-based local outlier detection assigns each object a degree to which it is an outlier in a numerical space. In this paper, we propose a novel mutual-reinforcement-based local outlier detection approach. Instead of detecting local outliers as noise, we attempt to identify local outliers in the center, where they are similar to some clusters of objects on the one hand and unique on the other. Our technique can be used in bank investment to identify a unique body, similar to many good competitors, in which to invest. We attempt to detect local outliers in categorical, ordinal, and numerical data. In categorical data, the challenge is that there are many similar but different ways to specify relationships among the data items. Our mutual-reinforcement-based approach is stable under similar but different user-defined relationships. Our technique can reduce the burden on users of determining the relationships among data items and can provide explanations of why the outliers are found. We conducted extensive experimental studies using real datasets.

Jeffrey Xu Yu received his B.E., M.E. and Ph.D. in computer science from the University of Tsukuba, Japan, in 1985, 1987 and 1990, respectively. He was a research fellow in the Institute of Information Sciences and Electronics, University of Tsukuba (Apr. 1990–Mar. 1991), and held teaching positions in the Institute of Information Sciences and Electronics, University of Tsukuba (Apr. 1991–July 1992) and in the Department of Computer Science, Australian National University (July 1992–June 2000). Currently he is an Associate Professor in the Department of Systems Engineering and Engineering Management, Chinese University of Hong Kong. His major research interests include data mining, data stream mining/processing, XML query processing and optimization, data warehousing, on-line analytical processing, and the design and implementation of database management systems.

Weining Qian is currently an assistant professor of computer science at Fudan University, Shanghai, China. He received his M.S. and Ph.D. degrees in computer science from Fudan University in 2001 and 2004, respectively. He was supported by a Microsoft Research Fellowship while doing the research presented in this paper, and he is supported by the Shanghai Rising Star Program. His research interests include data mining for very large databases, data stream query processing and mining, and peer-to-peer computing.

Hongjun Lu received his B.Sc. from Tsinghua University, China, and his M.Sc. and Ph.D. from the Department of Computer Science, University of Wisconsin–Madison. He worked as an engineer in the Chinese Academy of Space Technology, as a principal research scientist in the Computer Science Center of Honeywell Inc., Minnesota, USA (1985–1987), and as a professor at the School of Computing of the National University of Singapore (1987–2000), and is a full professor at the Hong Kong University of Science and Technology. His research interests are in data/knowledge-base management systems, with an emphasis on query processing and optimization, physical database design, and database performance. Hongjun Lu is currently a trustee of the VLDB Endowment, an associate editor of the IEEE Transactions on Knowledge and Data Engineering (TKDE), and a member of the review board of the Journal of Database Management. He served as a member of the ACM SIGMOD Advisory Board in 1998–2002.

Aoying Zhou, born in 1965, is currently a professor of computer science at Fudan University, Shanghai, China. He received his Bachelor's and Master's degrees in Computer Science from Sichuan University in Chengdu, Sichuan, China, in 1985 and 1988, respectively, and a Ph.D. degree from Fudan University in 1993. He has served as a member or chair of the program committees of many international conferences, such as VLDB, ER, DASFAA, and WAIM. His papers have been published in ACM SIGMOD, VLDB, ICDE and some international journals. His research interests include data mining and knowledge discovery, XML data management, web query and searching, data stream analysis and processing, and peer-to-peer computing.

4.

A New Classification Method to Overcome Over-Branching

Aoying Zhou Weining Qian Hailei Qian Wen Jin 《Journal of Computer Science and Technology》2002,17(1):18-27

Classification is an important technique in data mining. The decision trees built by most existing classification algorithms commonly feature over-branching, which leads to poor efficiency in the subsequent classification phase. In this paper, we present a new value-oriented classification method, based on the concepts of frequent-pattern-node and exceptive-child-node, which aims to build accurate, properly sized decision trees while reducing over-branching as much as possible. The experiments show that, with relevance analysis as pre-processing, our classification method can greatly reduce over-branching in decision trees without loss of accuracy, and does so more effectively and efficiently than other algorithms.

5.

ARMiner: A Data Mining Tool Based on Association Rules (cited by: 3; self-citations: 0; cited by others: 3)

Hao-Feng Zhou Jianqiu Zhu Yangyong Zhu Bai-Le Shi 《Journal of Computer Science and Technology》2002,17(5):0-0

In this paper, ARMiner, a data mining tool based on association rules, is introduced. Beginning with the system architecture, its characteristics and functions are discussed in detail, including data transfer, concept hierarchy generalization, mining rules with negative items, and the re-development of the system. An example of the tool's application is also shown. Finally, some issues for future research are presented.

6.

Privacy preservation for data cubes

Sam Y. Sung Yao Liu Hui Xiong Peter A. Ng 《Knowledge and Information Systems》2006,9(1):38-61

A range query finds the aggregated values over all selected cells of an online analytical processing (OLAP) data cube, where the selection is specified by ranges of contiguous values for each dimension. An important practical issue is how to preserve the confidential information in individual data cells while still providing an accurate estimate of the original aggregated values for range queries. In this paper, we propose an effective solution to this problem, called the zero-sum method. We derive theoretical formulas to analyse the performance of our method. Empirical experiments are also carried out using the analytical processing benchmark (APB) dataset from the OLAP Council. Various parameters, such as the privacy factor and the accuracy factor, have been considered and tested in the experiments. Finally, our experimental results show that there is a trade-off between privacy preservation and range query accuracy, and that the zero-sum method fulfils three design goals: security, accuracy, and accessibility.

Sam Y. Sung is an Associate Professor in the Department of Computer Science, School of Computing, National University of Singapore. He received a B.Sc. from the National Taiwan University in 1973, and the M.Sc. and Ph.D. in computer science from the University of Minnesota in 1977 and 1983, respectively. He was with the University of Oklahoma and the University of Memphis in the United States before joining the National University of Singapore. His research interests include information retrieval, data mining, pictorial databases and mobile computing. He has published more than 80 papers in various conferences and journals, including IEEE Transactions on Software Engineering and IEEE Transactions on Knowledge and Data Engineering.

Yao Liu received the B.E. degree in computer science and technology from Peking University in 1996 and the M.S. degree from the Software Institute of the Chinese Science Academy in 1999. Currently, she is a Ph.D. candidate in the Department of Computer Science at the National University of Singapore. Her research interests include data warehousing, database security, data mining and high-speed networking.

Hui Xiong received the B.E. degree in Automation from the University of Science and Technology of China, Hefei, China, in 1995, the M.S. degree in Computer Science from the National University of Singapore, Singapore, in 2000, and the Ph.D. degree in Computer Science from the University of Minnesota, Minneapolis, MN, USA, in 2005. He is currently an Assistant Professor of Computer Information Systems in the Management Science & Information Systems Department at Rutgers University, NJ, USA. His research interests include data mining, databases, and statistical computing with applications in bioinformatics, database security, and self-managing systems. He is a member of the IEEE Computer Society and the ACM.

Peter A. Ng is currently the Chairperson and Professor of Computer Science at the University of Texas–Pan American. He received his Ph.D. from the University of Texas–Austin in 1974. Previously, he served as the Vice President of the Fudan International Institute for Information Science and Technology, Shanghai, China, from 1999 to 2002, and as the Executive Director of the Global e-Learning Project at the University of Nebraska at Omaha, 2000–2003. He was appointed an Advisory Professor of Computer Science at Fudan University, Shanghai, China, in 1999. His recent research focuses on document and information-based processing, retrieval and management. He has published many journal and conference articles in this area. He served as the Editor-in-Chief of the Journal on Systems Integration (1991–2001) and has served as Advisory Editor of the Data and Knowledge Engineering Journal since 1989.

7.

Privacy-preserving SVM classification (cited by: 2; self-citations: 2; cited by others: 0)

Jaideep Vaidya Hwanjo Yu Xiaoqian Jiang 《Knowledge and Information Systems》2008,14(2):161-178

Traditional Data Mining and Knowledge Discovery algorithms assume free access to data, either at a centralized location or in federated form. Increasingly, privacy and security concerns restrict this access, thus derailing data mining projects. What is required is distributed knowledge discovery that is sensitive to this problem. The key is to obtain valid results while providing guarantees on the nondisclosure of data. Support vector machine classification is one of the most widely used classification methodologies in data mining and machine learning. It is based on solid theoretical foundations and has wide practical application. This paper proposes a privacy-preserving solution for support vector machine (SVM) classification, PP-SVM for short. Our solution constructs the global SVM classification model from data distributed at multiple parties, without disclosing the data of each party to the others. Solutions are sketched out for data that is vertically, horizontally, or even arbitrarily partitioned. We quantify the security and efficiency of the proposed method, and highlight future challenges.

Jaideep Vaidya received the Bachelor's degree in Computer Engineering from the University of Mumbai. He received the Master's and the Ph.D. degrees in Computer Science from Purdue University. He is an Assistant Professor in the Management Science and Information Systems Department at Rutgers University. His research interests include data mining and analysis, information security, and privacy. He has received best paper awards for papers in ICDE and SIGKDD. He is a Member of the IEEE Computer Society and the ACM.

Hwanjo Yu received the Ph.D. degree in Computer Science in 2004 from the University of Illinois at Urbana-Champaign. He is an Assistant Professor in the Department of Computer Science at the University of Iowa. His research interests include data mining, machine learning, database, and information systems. He is an Associate Editor of Neurocomputing and served on the NSF Panel in 2006. He has served on the program committees of the 2005 ACM SAC Data Mining track, 2005 and 2006 IEEE ICDM, 2006 ACM CIKM, and 2006 SIAM Data Mining.

Xiaoqian Jiang received the B.S. degree in Computer Science from Shanghai Maritime University, Shanghai, in 2003. He received the M.C.S. degree in Computer Science from the University of Iowa, Iowa City, in 2005. Currently, he is pursuing a Ph.D. degree at the School of Computer Science, Carnegie Mellon University. His research interests are computer vision, machine learning, data mining, and privacy protection technologies.

8.

Recent Progress on Selected Topics in Database Research: A Report by Nine Young Chinese Researchers Working in the United States

Zhiyuan Chen Chen Li Jian Pei Yufei Tao Haixun Wang Wei Wang Jiong Yang Jun Yang Donghui Zhang 《Journal of Computer Science and Technology》2003,18(5):0-0

The study of database technologies, or more generally, the technologies of data and information management, is an important and active research field. Recently, many exciting results have been reported. In this fast-growing field, Chinese researchers play increasingly active roles. Research papers from Chinese scholars, both in China and abroad, appear in prestigious academic forums. In this paper, we, nine young Chinese researchers working in the United States, present concise surveys and report our recent progress in the selected fields that we are working on. Although the paper covers only a small number of topics and the selection of topics is far from balanced, we hope that this effort will attract more researchers, especially those in China, to enter the frontiers of database research, and will promote collaboration. For obvious reasons, the authors are listed alphabetically, and the sections are arranged in the order of the author list.

9.

Effective Discovery of Exception Class Association Rules

Aoying Zhou Li Wei Fang Yu 《Journal of Computer Science and Technology》2002,17(3):0-0

In this paper, a new effective method is proposed to find class association rules (CARs), to obtain useful class association rules (UCARs) by removing the spurious class association rules (SCARs), and to generate exception class association rules (ECARs) for each UCAR. CAR mining, which integrates the techniques of classification and association, has attracted great interest recently. However, it has two drawbacks: one is that a large proportion of CARs are spurious and may mislead users; the other is that some important ECARs are difficult to find using traditional data mining techniques. The method introduced in this paper aims to overcome these flaws. With our approach, a user can retrieve correct information from UCARs and learn the influence of different conditions by checking the corresponding ECARs. Experimental results demonstrate the effectiveness of the proposed approach.
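For readers unfamiliar with class association rules, the following is a minimal sketch of how the support and confidence of a single rule "condition => class label" are usually computed. It shows only these standard measures, not the paper's procedure for separating useful, spurious, and exception rules, which the abstract does not detail; the function name `rule_stats` and the sample records are assumptions.

```python
def rule_stats(records, condition, label):
    """Return (support, confidence) of the rule `condition => label`.

    Hypothetical helper: `records` is a list of (item_set, class_label)
    pairs and `condition` is a set of items that must all be present.
    """
    n_total = len(records)
    # Records whose items contain every item of the condition.
    n_cond = sum(1 for items, _ in records if condition <= items)
    # Of those, the records that also carry the predicted class label.
    n_both = sum(
        1 for items, cls in records if condition <= items and cls == label
    )
    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_cond if n_cond else 0.0
    return support, confidence


# Illustrative data only.
records = [
    ({"a", "b"}, "yes"),
    ({"a"}, "yes"),
    ({"a", "c"}, "no"),
    ({"b"}, "no"),
]
support, confidence = rule_stats(records, condition={"a"}, label="yes")
```

In this toy data the record `({"a", "c"}, "no")` contradicts the rule `{"a"} => "yes"`, which hints at the kind of case an exception rule would describe.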

10.

Binary-coding-based ant colony optimization and its convergence

Tian-Ming Bu Song-Nian Yu Hui-Wei Guan 《Journal of Computer Science and Technology》2004,19(4):0-0

Ant colony optimization (ACO for short) is a metaheuristic for hard combinatorial optimization problems. It is a population-based approach that exploits positive feedback as well as greedy search. In this paper, ideas from genetic algorithms (GA for short) are introduced into ACO to present a new binary-coding-based ant colony optimization. Compared with typical ACO, the algorithm replaces the problem's parameter space with a coding space, which links ACO with GA so that the results of GA research can be applied to ACO directly. Furthermore, it can solve not only general combinatorial optimization problems but also other problems, such as function optimization. Based on the algorithm, it is proved that if the pheromone remainder factor ρ satisfies ρ ≥ 1, the algorithm is guaranteed to converge to the optimum, whereas if 0 < ρ < 1, it is not. This work is supported by the Science Foundation of Shanghai Municipal Commission of Science and Technology under Grant No. 00JC14052.

Tian-Ming Bu received the M.S. degree in computer software and theory from Shanghai University, China, in 2003. He is now a Ph.D. candidate at Fudan University in the same area of theoretical computer science. His research interests include algorithms, especially heuristic algorithms and parallel algorithms, quantum computing, and computational complexity.

Song-Nian Yu received the B.S. degree in mathematics from Xi'an University of Science and Technology, Xi'an, China, in 1981, and the Ph.D. degree, under the guidance of Prof. L. Lovasz, from Lorand University, Budapest, Hungary, in 1990. Dr. Yu is a professor in the School of Computer Engineering and Science at Shanghai University. He was a visiting professor and faculty member in the Department of Computer Science at Nelson College of Engineering, West Virginia University, from 1998 to 1999. His current research interests include the design and analysis of parallel algorithms, graph theory, combinatorial optimization, wavelet analysis, and grid computing.

Hui-Wei Guan received the B.S. degree in electronic engineering from Shanghai University, China, in 1982, the M.S. degree in computer engineering from China Textile University, China, in 1989, and the Ph.D. degree in computer science and engineering from Shanghai Jiaotong University, China, in 1993. He is an associate professor in the Department of Computer Science at North Shore Community College, USA. He is a member of IEEE. His current research interests are parallel and distributed computing, high performance computing, distributed databases, massively parallel processing systems, and intelligent control.