A Lagrangian-based score for assessing the quality of pairwise constraints in semi-supervised clustering
Authors: Rodrigo Randel, Daniel Aloise, Simon J. Blanchard, Alain Hertz
Affiliation: 1. Département de Génie Informatique et Génie Logiciel, Polytechnique Montréal and GERAD, Montréal, QC, Canada;
2. McDonough School of Business, Georgetown University, Washington, DC, USA;
3. Département de Mathématiques et de Génie Industriel, Polytechnique Montréal and GERAD, Montréal, QC, Canada
Abstract:

Clustering algorithms help identify homogeneous subgroups from data. In some cases, additional information about the relationship among some subsets of the data exists. When using a semi-supervised clustering algorithm, an expert may provide additional information to constrain the solution based on that knowledge and, in doing so, guide the algorithm to a more useful and meaningful solution. Such additional information often takes the form of a cannot-link constraint (i.e., two data points cannot be part of the same cluster) or a must-link constraint (i.e., two data points must be part of the same cluster). A key challenge for users of such constraints in semi-supervised learning algorithms, however, is that the addition of inaccurate or conflicting constraints can decrease accuracy, and little is known about how to detect whether expert-imposed constraints are likely incorrect. In the present work, we introduce a method that assigns each must-link and cannot-link pairwise constraint a score indicating how likely it is to be incorrect. Using synthetic experimental examples and real data, we show that the resulting impact score can successfully identify individual constraints that should be removed or revised.
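
To make the two constraint types concrete, here is a minimal Python sketch that lists which must-link and cannot-link pairs a given cluster assignment breaks. It is only an illustration of the constraint definitions, not the Lagrangian-based score proposed in the paper; the function name `violated_constraints` and the toy data are assumptions made for this example.

```python
def violated_constraints(labels, must_link, cannot_link):
    """Return the pairwise constraints that a cluster assignment breaks.

    labels[i] is the cluster of point i; constraints are (i, j) index pairs.
    Hypothetical helper for illustration only.
    """
    # A must-link pair is broken when its two points are in different clusters.
    broken_ml = [(i, j) for i, j in must_link if labels[i] != labels[j]]
    # A cannot-link pair is broken when its two points share a cluster.
    broken_cl = [(i, j) for i, j in cannot_link if labels[i] == labels[j]]
    return broken_ml, broken_cl


# Toy example: four points assigned to two clusters.
labels = [0, 0, 1, 1]
must_link = [(0, 1), (1, 2)]
cannot_link = [(0, 3), (2, 3)]

broken_ml, broken_cl = violated_constraints(labels, must_link, cannot_link)
print(broken_ml)  # [(1, 2)]
print(broken_cl)  # [(2, 3)]
```

A broken constraint in this sense only shows disagreement between the clustering and the expert; deciding whether the constraint itself is wrong is the question the scoring method addresses.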

This article is indexed in SpringerLink and other databases.