Mining the Semantic Web |
| |
Authors: | Achim Rettinger, Uta Lösch, Volker Tresp, Claudia d’Amato, Nicola Fanizzi |
| |
Affiliation: | 1. Institute AIFB, Karlsruhe Institute of Technology, Karlsruhe, Germany; 2. Siemens Corporate Technologies, Munich, Germany; 3. Dipartimento di Informatica, Università degli Studi di Bari “Aldo Moro”, Bari, Italy |
| |
Abstract: | In the Semantic Web vision of the World Wide Web, content will not only be accessible to humans but will also be available
in machine interpretable form as ontological knowledge bases. Ontological knowledge bases enable formal querying and reasoning
and, consequently, a main research focus has been the investigation of how deductive reasoning can be utilized in ontological
representations to enable more advanced applications. However, purely logical methods have not yet proven very effective
for several reasons: First, scaling reasoning to Web scale remains an unsolved problem. Second, logical
reasoning has problems with uncertain information, which is abundant in Semantic Web data due to its distributed and heterogeneous
nature. Third, the construction of ontological knowledge bases suitable for advanced reasoning techniques is complex, which
ultimately results in a lack of such expressive real-world data sets with large amounts of instance data. From another perspective,
the more expressive structured representations open up new opportunities for data mining, knowledge extraction and machine
learning techniques. If one accepts that part of the knowledge already lies in the data, inductive methods appear
promising, in particular because they can inherently handle noisy, inconsistent, uncertain and missing data. While
inducing concept structures from less structured sources (text, Web pages), as in ontology
learning, has been covered broadly, we focus, given the problems mentioned above, on new methods for dealing with Semantic Web knowledge bases that rely
on statistical inference on their standard representations. We argue that machine learning research offers a wide variety
of methods applicable to different expressivity levels of Semantic Web knowledge bases. This paper surveys statistical approaches
to mining the Semantic Web across that range, from weakly expressive but widely available knowledge bases in RDF to highly expressive
first-order knowledge bases. We specifically cover similarity and distance-based methods, kernel machines, multivariate prediction
models, relational graphical models and first-order probabilistic learning approaches and discuss their applicability to Semantic
Web representations. Finally, we present selected experiments conducted on Semantic Web mining tasks for some of
the algorithms presented. These are intended to show the breadth and general potential of this exciting new research and
application area for data mining. |
| |
Keywords: | |
This article is indexed in SpringerLink and other databases. |
|