Opinion mining from noisy text data |
| |
Authors: | Lipika Dey, Sk Mirajul Haque |
| |
Affiliation: | 1. Innovation Labs, Tata Consultancy Services, Phase 4, Udyog Vihar, Gurgaon, India
|
| |
Abstract: | The proliferation of the Internet has led not only to huge volumes of unstructured information in the form of
web documents, but also to a large amount of text in the form of emails, blogs, and feedback. The data generated
from online communication are a potential gold mine for discovering knowledge, particularly for market researchers. Text
analytics has matured and is being successfully employed to mine important information from unstructured text documents. The
chief bottleneck in designing text mining systems for handling blogs arises from the fact that online communication text
is often noisy. These texts are informally written and suffer from spelling mistakes, grammatical errors, improper punctuation,
and irregular capitalization. This paper focuses on opinion extraction from noisy text data. It aims at extracting and
consolidating the opinions of customers from blogs and feedback at multiple levels of granularity. We have proposed a framework
in which these texts are first cleaned using domain knowledge and then subjected to mining. Ours is a semi-automated approach,
in which the system aids in knowledge assimilation for knowledge-base building and also performs the analytics.
Domain experts ratify the knowledge base and provide training samples from which the system automatically gathers more instances
for ratification. The system identifies opinion expressions as phrases containing opinion words, opinionated features, and
opinion modifiers. These expressions are categorized as positive or negative, with membership values varying from zero
to one. Opinion expressions are identified and categorized using localized linguistic techniques. Opinions can be aggregated
at any desired level of specificity, e.g., feature level or product level, user level or site level. We have developed
a system based on this approach, which provides the user with a platform to analyze opinion expressions crawled from a set
of pre-defined blogs. |
| |
Keywords: | |
This article is indexed in SpringerLink and other databases. |
|
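The pipeline outlined in the abstract (cleaning with domain knowledge, identifying opinion expressions made of opinion words, features, and modifiers, assigning a polarity with a membership value between zero and one, and aggregating at feature level) can be illustrated with a minimal sketch. This is not the authors' system: the lexicons, the correction table, the token window, and the function names `clean`, `extract_opinions`, and `aggregate` are all hypothetical, and the scoring rule is a simple assumption chosen for illustration.

```python
from collections import defaultdict

# Hypothetical toy knowledge base; in the paper this is built semi-automatically
# and ratified by domain experts.
OPINION_WORDS = {"good": 0.6, "great": 0.8, "bad": -0.6, "poor": -0.7}
MODIFIERS = {"very": 1.3, "slightly": 0.5}
NEGATORS = {"not", "never"}
FEATURES = {"battery", "screen", "camera"}
CORRECTIONS = {"gr8": "great", "gud": "good", "batery": "battery"}


def clean(text):
    """Lowercase, strip edge punctuation, and apply dictionary corrections."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(".,!?;:")
        if word:
            tokens.append(CORRECTIONS.get(word, word))
    return tokens


def extract_opinions(tokens, window=4):
    """Return (feature, polarity, membership) for each opinion word found."""
    results = []
    for i, tok in enumerate(tokens):
        if tok not in OPINION_WORDS:
            continue
        score = OPINION_WORDS[tok]
        # Walk back over the modifiers and negators directly before the word.
        j = i - 1
        while j >= 0 and (tokens[j] in MODIFIERS or tokens[j] in NEGATORS):
            if tokens[j] in MODIFIERS:
                score *= MODIFIERS[tokens[j]]
            else:
                score = -score
            j -= 1
        score = max(-1.0, min(1.0, score))  # keep membership within [0, 1]
        # Attach the opinion to the nearest known feature inside the window.
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        nearby = [(abs(k - i), tokens[k]) for k in range(lo, hi)
                  if tokens[k] in FEATURES]
        feature = min(nearby)[1] if nearby else None
        polarity = "positive" if score >= 0 else "negative"
        results.append((feature, polarity, abs(score)))
    return results


def aggregate(opinions):
    """Average the signed membership values per feature."""
    grouped = defaultdict(list)
    for feature, polarity, membership in opinions:
        grouped[feature].append(membership if polarity == "positive" else -membership)
    return {feature: sum(vals) / len(vals) for feature, vals in grouped.items()}


posts = ["The batery is gr8!", "Screen is not very gud."]
opinions = [op for post in posts for op in extract_opinions(clean(post))]
summary = aggregate(opinions)
```

Aggregating at product, user, or site level would follow the same pattern with a different grouping key; the feature-level key is used here only because it is the finest granularity the abstract mentions.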