A novel framework for semantic entity identification and relationship integration in large scale text data

Affiliation:	1. Shanghai Key Laboratory of Trustworthy Computing, East China Normal University, Shanghai, China; 2. Northwest University of China, Xi'an, China; 1. National Laboratory of Radar Signal Processing, Xidian University, Xi'an, Shaanxi 710071, China; 2. College of Information and Technology, Northwest University of China, Xi'an, Shaanxi 710127, China; 3. The State Key Lab of CAD & CG, Zhejiang University, No. 388 Yu Hang Tang Road, Hangzhou 310058, China; 1. Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an 710119, China; 2. Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an 710049, China; 3. School of Information Science and Technology, Northwest University, Xi'an 710069, China

Abstract:	Semantic entities carry the most important semantics of text data, so identifying them and integrating the relationships among them is essential for applications that depend on text semantics. However, current strategies still struggle with three problems: semantic entity identification, new word identification, and relationship integration among semantic entities. To address these problems, this paper proposes a two-phase framework for semantic entity identification with relationship integration in large-scale text data. In the first phase, semantic entity identification, we propose a novel strategy that extracts unknown semantic entities from text by integrating statistical features with Decision Tree (DT) and Support Vector Machine (SVM) algorithms. Compared with traditional approaches, our strategy is more effective at detecting semantic entities and more sensitive to new entities that have only just appeared in fresh data. The second phase integrates Semantic Entities Relationships (SER), which helps to cluster the extracted semantic entities. A novel classification method using features such as similarity measures and co-occurrence probabilities is applied to tackle the clustering problem and discover the relationships among semantic entities. Comprehensive experimental results show that our framework outperforms state-of-the-art strategies in semantic entity identification and discovers over 80% of the relationship pairs among related semantic entities in large-scale text data.
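
As a rough illustration of the second phase, the sketch below computes two pairwise features of the kind the abstract mentions: a co-occurrence probability and a similarity measure. The abstract does not say which similarity measure or which classifier the framework uses, so the Jaccard similarity, the threshold rule standing in for the classifier, and all names here (`pair_features`, `related_pairs`, `EXAMPLE_DOCS`) are assumptions made for illustration only, not the authors' method.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy input: the set of entities already extracted from each document.
EXAMPLE_DOCS = [
    {"SVM", "decision tree"},
    {"SVM", "decision tree", "radar"},
    {"radar"},
    {"SVM", "decision tree"},
]


def pair_features(docs):
    """Map each co-occurring entity pair to (co-occurrence probability, Jaccard similarity)."""
    n_docs = len(docs)
    entity_count = Counter()  # number of documents containing each entity
    pair_count = Counter()    # number of documents containing both entities of a pair
    for entities in docs:
        entity_count.update(entities)
        # Sort so that each pair always appears in the same order.
        pair_count.update(combinations(sorted(entities), 2))

    features = {}
    for (a, b), both in pair_count.items():
        cooccurrence = both / n_docs
        # Jaccard similarity over the sets of documents containing a and b.
        jaccard = both / (entity_count[a] + entity_count[b] - both)
        features[(a, b)] = (cooccurrence, jaccard)
    return features


def related_pairs(docs, min_cooccurrence=0.5, min_jaccard=0.5):
    """Threshold rule used here as a simple stand-in for a trained classifier."""
    return sorted(
        pair
        for pair, (cooccurrence, jaccard) in pair_features(docs).items()
        if cooccurrence >= min_cooccurrence and jaccard >= min_jaccard
    )
```

On `EXAMPLE_DOCS`, only the pair `("SVM", "decision tree")` passes both thresholds, since the two entities appear together in three of the four documents and never apart. In a fuller pipeline, the feature tuples would presumably be fed to a trained classifier instead of fixed thresholds.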

Keywords:	Semantic entity identification; New word identification; Decision tree; SVM; Semantic Entities Relationships
This article is indexed in ScienceDirect and other databases.
