Chinese text semantic representation for text classification |
| |
Authors: | SONG Shengli, WANG Shaolong, CHEN Ping |
| |
Affiliation: | (Research Inst. of Software Engineering, Xidian Univ., Xi'an 710071, China) |
| |
Abstract: | Text representation based on word frequency statistics is often unsatisfactory because it treats words as independent features and ignores the semantic relationships between them. This paper proposes a new Chinese text semantic representation model that takes into account both the contextual semantics and the background information of the words in a text. The method captures the semantic relationships between words using Wikipedia as a knowledge base. Words with strong semantic relationships are combined into a word-package, represented by a graph node that is weighted with the sum of the number and frequency of the words it contains. The contextual relationship between words in different word-packages is represented by a directed edge, which is weighted with the maximum weight of its adjacent nodes. The model retains the contextual information of each word to a large extent while strengthening the semantic relationships between words. Experimental results on Chinese text classification show that the proposed model expresses the content of a text accurately and improves classification performance. Compared with Support Vector Machines, Text Semantic Graph-based Classification improves efficiency by 7.8%, reduces the error rate by one third, and is more stable. |
| |
Keywords: | classification; knowledge representation; similarity; text semantic graph |
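
The graph construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several points are assumptions made only for illustration: the function name `build_text_semantic_graph`, the `related` predicate (a hypothetical stand-in for the Wikipedia-based relatedness measure, which the abstract does not specify), the greedy grouping of words into packages, the reading of the node weight as "number of distinct words plus their total frequency", the reading of "adjacent nodes" as the two endpoints of an edge, and the use of consecutive words in the text as the contextual relationship.

```python
from collections import Counter


def build_text_semantic_graph(tokens, related):
    """Build word-packages (nodes) and directed edges from a token sequence.

    tokens:  words of the text, in reading order.
    related: hypothetical predicate related(a, b) -> bool standing in for
             a Wikipedia-based semantic relatedness test (assumption).
    """
    packages = []      # each package is a list of distinct words
    package_of = {}    # word -> index of its package

    # Greedy grouping (assumption): a word joins the first package that
    # already holds a related word, otherwise it starts a new package.
    for word in dict.fromkeys(tokens):  # distinct words, first-seen order
        for idx, pkg in enumerate(packages):
            if any(related(word, other) for other in pkg):
                pkg.append(word)
                package_of[word] = idx
                break
        else:
            package_of[word] = len(packages)
            packages.append([word])

    freq = Counter(tokens)
    # Node weight (one reading of the abstract): number of distinct words
    # in the package plus the total frequency of those words.
    node_weight = [len(pkg) + sum(freq[w] for w in pkg) for pkg in packages]

    # Directed edge between the packages of consecutive words (assumption),
    # weighted with the larger of the two endpoint node weights.
    edges = {}
    for a, b in zip(tokens, tokens[1:]):
        u, v = package_of[a], package_of[b]
        if u != v:
            edges[(u, v)] = max(node_weight[u], node_weight[v])

    return packages, node_weight, edges


# Toy example with a hand-written relatedness table (illustrative only).
RELATED_PAIRS = {frozenset(("stock", "shares"))}


def toy_related(a, b):
    return frozenset((a, b)) in RELATED_PAIRS


tokens = ["stock", "market", "rises", "stock", "shares"]
packages, node_weight, edges = build_text_semantic_graph(tokens, toy_related)
```

In this toy run, "stock" and "shares" fall into one word-package, so the step from "stock" to "shares" produces no edge, while steps between different packages produce directed edges weighted by the heavier endpoint.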
|
Source: | Journal of Xidian University (Natural Science Edition) (《西安电子科技大学学报(自然科学版)》) |