The aspect Bernoulli model: multiple causes of presences and absences |
| |
Authors: | Ella Bingham, Ata Kabán, Mikael Fortelius |
| |
Affiliation: | (1) Helsinki Institute for Information Technology, University of Helsinki and Helsinki University of Technology, P.O. Box 68, 00014 Helsinki, Finland; (2) School of Computer Science, University of Birmingham, Birmingham, UK; (3) Division of Palaeontology, University of Helsinki, Helsinki, Finland |
| |
Abstract: | We present a probabilistic multiple cause model for the analysis of binary (0–1) data. A distinctive feature of the aspect
Bernoulli (AB) model is its ability to automatically detect and distinguish between “true absences” and “false absences” (both
of which are coded as 0 in the data), and similarly, between “true presences” and “false presences” (both of which are coded
as 1). This is accomplished by specific additive noise components that explicitly account for such non-content-bearing causes.
The AB model is thus suitable for noise removal and data-explanatory purposes, including omission/addition detection. An important
application of AB that we demonstrate is data-driven reasoning about palaeontological recordings. Results on
recovering corrupted handwritten digit images and expanding short text documents are also given, and comparisons to other
methods are demonstrated and discussed.
Ella Bingham
received her M.Sc. degree in Engineering Physics and Mathematics at Helsinki University of Technology in 1998, and her Dr.Sc.
degree in Computer Science at Helsinki University of Technology in 2003. She is currently at Helsinki Institute for Information
Technology, located at the University of Helsinki. Her research interests include statistical data analysis and machine learning.
Ata Kabán
has been a lecturer in the School of Computer Science at the University of Birmingham since 2003. She holds a B.Sc. degree in
computer science (1999) from the University “Babeş-Bolyai” of Cluj-Napoca, Romania, and a Ph.D. in computer science (2001)
from the University of Paisley, UK. Her current research interests concern statistical machine learning and data mining. Prior
to her career in computer science, she obtained a B.A. degree in musical composition (1994) and the M.A. (1995) and Ph.D.
(1999) degrees in musicology from the Music Academy “Gh. Dima” of Cluj-Napoca, Romania.
Mikael Fortelius
is a palaeontologist with special interest in plant-eating mammals of the Cenozoic, especially ungulates and their relationship
with habitat and climate change (the Ungulate Condition). Mikael is Professor of Evolutionary Palaeontology in the Department
of Geology and Group Leader in the Institute of Biotechnology (BI), University of Helsinki. Since 1992, he has been engaged
in developing a database of Neogene Old World Mammals. The NOW database is maintained at the Finnish Museum of Natural History and developed in collaboration with an extensive
Advisory Board; data access and downloading are entirely public. |
| |
Keywords: | Data mining; Probabilistic latent variable models; Multiple cause models; 0–1 data |
This article is indexed in SpringerLink and other databases. |
|