Record-level peculiarity-based data analysis and classifications |
| |
Authors: | Jian Yang, Ning Zhong, Yiyu Yao, Jue Wang |
| |
Affiliation: | 1. International WIC Institute, Beijing University of Technology, Beijing, China; 2. The Key Laboratory of Complex Systems and Intelligence Science, Institute of Automation, Chinese Academy of Sciences, Beijing, China; 3. Department of Life Science and Informatics, Maebashi Institute of Technology, Maebashi, Japan; 4. Department of Computer Science, University of Regina, Regina, Canada |
| |
Abstract: | Peculiarity-oriented mining is a data mining method consisting of peculiar data identification and peculiar data analysis.
Peculiarity factor and local peculiarity factor are important concepts employed to describe the peculiarity of a data point
in the identification step. One can study the notions at both attribute and record levels. In this paper, a new record LPF
called distance-based record LPF (D-record LPF) is proposed, which is defined as the sum of distances between a point and
its nearest neighbors. The authors prove that D-record LPF can characterize the probability density of a continuous
m-dimensional distribution accurately. This provides a theoretical basis for some existing distance-based anomaly detection
techniques. More importantly, it also provides an effective method for describing the class-conditional probabilities in a
Bayesian classifier. The result enables us to apply D-record LPF to solve classification problems. A novel algorithm called
LPF-Bayes classifier and its kernelized implementation are proposed, which have some connection to the Bayesian classifier.
Experimental results on several benchmark datasets demonstrate that the proposed classifiers are competitive with established
classifiers such as AdaBoost, support vector machines and kernel Fisher discriminant. |
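
The abstract defines D-record LPF as the sum of distances between a point and its nearest neighbors. A minimal sketch of that quantity follows, assuming Euclidean distance and a neighbor count `k`, neither of which the abstract specifies. The function names `d_record_lpf` and `classify_by_lpf` are hypothetical, and the decision rule (choose the class with the smallest score) is only a simplified illustration of using the score as a stand-in for class-conditional density. It is not the LPF-Bayes classifier from the paper.

```python
import numpy as np


def d_record_lpf(x, data, k=3):
    """Sum of distances from query point x to its k nearest points in data.

    Assumptions (not from the abstract): Euclidean distance, and x is a
    query point that is not itself a row of data.
    """
    data = np.asarray(data, dtype=float)
    x = np.asarray(x, dtype=float)
    dists = np.linalg.norm(data - x, axis=1)  # distance to every record
    k = min(k, len(dists))                    # guard against tiny datasets
    return float(np.sort(dists)[:k].sum())


def classify_by_lpf(x, class_samples, k=3):
    """Hypothetical rule: pick the class whose samples give x the smallest score.

    A small score means x lies in a dense region of that class, so the
    score acts here as a rough inverse proxy for class-conditional density.
    """
    scores = {label: d_record_lpf(x, pts, k) for label, pts in class_samples.items()}
    return min(scores, key=scores.get)


if __name__ == "__main__":
    samples = {
        "a": [[0, 0], [1, 0], [0, 1]],
        "b": [[5, 5], [6, 5], [5, 6]],
    }
    print(d_record_lpf([0.2, 0.1], samples["a"], k=2))
    print(classify_by_lpf([5.5, 5.2], samples, k=2))
```

A point far from all records receives a large score, which is how such a quantity relates to the distance-based anomaly detection techniques the abstract mentions.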
| |
Keywords: | |
This article has been indexed by SpringerLink and other databases. |
|