The aspect Bernoulli model: multiple causes of presences and absences
Authors: Ella Bingham (1), Ata Kabán (2), Mikael Fortelius (3)
Affiliation: (1) Helsinki Institute for Information Technology, University of Helsinki and Helsinki University of Technology, P.O. Box 68, 00014 Helsinki, Finland; (2) School of Computer Science, University of Birmingham, Birmingham, UK; (3) Division of Palaeontology, University of Helsinki, Helsinki, Finland
Abstract: We present a probabilistic multiple cause model for the analysis of binary (0–1) data. A distinctive feature of the aspect Bernoulli (AB) model is its ability to automatically detect and distinguish between “true absences” and “false absences” (both of which are coded as 0 in the data), and similarly, between “true presences” and “false presences” (both of which are coded as 1). This is accomplished by specific additive noise components which explicitly account for such non-content-bearing causes. The AB model is thus suitable for noise removal and data explanatory purposes, including omission/addition detection. An important application of AB that we demonstrate is data-driven reasoning about palaeontological recordings. Results on recovering corrupted handwritten digit images and expanding short text documents are also given, and comparisons to other methods are demonstrated and discussed.
Contact Information: Mikael Fortelius

Ella Bingham received her M.Sc. degree in Engineering Physics and Mathematics at Helsinki University of Technology in 1998, and her Dr.Sc. degree in Computer Science at Helsinki University of Technology in 2003. She is currently at Helsinki Institute for Information Technology, located at the University of Helsinki. Her research interests include statistical data analysis and machine learning.

Ata Kabán has been a lecturer in the School of Computer Science of the University of Birmingham since 2003. She holds a B.Sc. degree in computer science (1999) from the “Babeş-Bolyai” University of Cluj-Napoca, Romania, and a Ph.D. in computer science (2001) from the University of Paisley, UK. Her current research interests concern statistical machine learning and data mining. Prior to her career in computer science, she obtained a B.A. degree in musical composition (1994) and the M.A. (1995) and Ph.D. (1999) degrees in musicology from the Music Academy “Gh. Dima” of Cluj-Napoca, Romania.

Mikael Fortelius is a palaeontologist with special interest in plant-eating mammals of the Cenozoic, especially ungulates and their relationship with habitat and climate change (the Ungulate Condition). Mikael is Professor of Evolutionary Palaeontology in the Department of Geology and Group Leader in the Institute of Biotechnology (BI), University of Helsinki. Since 1992, he has been engaged in developing a database of Neogene Old World Mammals. The NOW database is maintained at the Finnish Museum of Natural History and developed in collaboration with an extensive Advisory Board; data access and downloading are entirely public.
Keywords: Data mining; Probabilistic latent variable models; Multiple cause models; 0–1 data
This article is indexed in SpringerLink and other databases.