Data-dependent dissimilarity measure: an effective alternative to geometric distance measures
Authors: Sunil Aryal, Kai Ming Ting, Takashi Washio, Gholamreza Haffari
Affiliation: 1. School of Engineering and Information Technology, Faculty of Science and Technology, Federation University, Mount Helen, Australia; 2. Clayton School of Information Technology, Monash University, Clayton, Australia; 3. School of Engineering and Information Technology, Federation University, Churchill, Australia; 4. The Institute of Scientific and Industrial Research, Osaka University, Ibaraki, Japan
Abstract: Nearest neighbor search is a core process in many data mining algorithms. Finding reliable closest matches of a test instance remains challenging because the effectiveness of many general-purpose distance measures, such as the $\ell_p$-norm, decreases as the number of dimensions increases. Their performance also varies significantly across data distributions. This is mainly because they compute the distance between two instances solely from their geometric positions in the feature space, so the data distribution has no influence on the distance measure. This paper presents a simple data-dependent general-purpose dissimilarity measure called '$m_p$-dissimilarity'. Rather than relying on geometric distance, it measures the dissimilarity between two instances as the probability mass in a region that encloses the two instances in every dimension. It deems two instances in a sparse region to be more similar than two instances of equal inter-point geometric distance in a dense region. Our empirical results in k-NN classification and content-based multimedia information retrieval tasks show that the proposed $m_p$-dissimilarity measure produces better task-specific performance than existing widely used general-purpose distance measures, such as the $\ell_p$-norm and cosine distance, across a wide range of moderate- to high-dimensional data sets with continuous-only, discrete-only, and mixed attributes.
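The idea of measuring dissimilarity by probability mass rather than by geometric distance can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the version below is an assumption: for each dimension it takes the fraction of data points whose value lies between the two instances (an empirical estimate of the mass of the enclosing region) and combines these fractions with a power mean of order p. The function name `mass_dissimilarity` and the toy data are hypothetical and are not taken from the paper.

```python
def mass_dissimilarity(x, y, data, p=2.0):
    """Illustrative mass-based dissimilarity (assumed form, not the paper's code).

    For each dimension i, count the data points whose i-th value falls in
    the closed interval spanned by x[i] and y[i], convert the count to a
    fraction of the data set, and combine the fractions with a power mean.
    """
    n = len(data)
    d = len(x)
    total = 0.0
    for i in range(d):
        lo, hi = min(x[i], y[i]), max(x[i], y[i])
        # Empirical probability mass of the region enclosing x and y in dimension i.
        count = sum(1 for row in data if lo <= row[i] <= hi)
        total += (count / n) ** p
    return (total / d) ** (1.0 / p)


# Toy one-dimensional data: a dense cluster in [0, 1] and a sparse pair at 10 and 11.
data = [[0.0], [0.25], [0.5], [0.75], [1.0], [10.0], [11.0]]

# Both pairs are exactly 1.0 apart geometrically.
dense = mass_dissimilarity([0.0], [1.0], data)     # region covers 5 of 7 points
sparse = mass_dissimilarity([10.0], [11.0], data)  # region covers 2 of 7 points

print(dense > sparse)
```

In this sketch the pair in the sparse region receives the smaller dissimilarity even though both pairs have the same geometric distance, which matches the behavior described in the abstract.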
This article is indexed in SpringerLink and other databases.