Chinese text semantic representation for text classification
Authors: SONG Shengli, WANG Shaolong, CHEN Ping
Affiliation: Research Inst. of Software Engineering, Xidian Univ., Xi'an 710071, China
Abstract: Text representations based on word-frequency statistics are often unsatisfactory because they treat words as independent features and ignore the semantic relationships between them. This paper proposes a new semantic representation model for Chinese text that takes into account both the contextual semantics of the words in a text and background information about them. The method captures semantic relationships between words using Wikipedia as a knowledge base. Words with strong semantic relationships are combined into a word-package, represented by a graph node whose weight is the sum of the number and frequency of the words it contains. The contextual relationship between words in different word-packages is represented by a directed edge whose weight is the maximum of the weights of its adjacent nodes. The model retains the contextual information of each word to a large extent while strengthening the semantic relationships between words. Experimental results on Chinese text classification show that the proposed model expresses the content of a text accurately and improves classification performance. Compared with Support Vector Machines, Text Semantic Graph-based Classification improves efficiency by 7.8%, reduces the error rate by one third, and is more stable.
Keywords: classification, knowledge representation, similarity, text semantic graph
Journal: 《西安电子科技大学学报(自然科学版)》 (Journal of Xidian University, Natural Science Edition)
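
The graph structure described in the abstract can be illustrated with a small sketch. It is not the authors' implementation and rests on several assumptions: the word-to-package mapping `package_of` is supplied by hand here (the paper derives it from Wikipedia-based semantic relatedness), a "contextual relationship" is taken to mean two words appearing consecutively in the text, and a node's weight is read as the number of distinct words in the package plus their total occurrence count. The function name `build_semantic_graph` and the English sample tokens are hypothetical and used only for illustration.

```python
from collections import Counter


def build_semantic_graph(tokens, package_of):
    """Return (node_weight, edge_weight) for a toy text semantic graph.

    tokens     -- the segmented words of one text, in order
    package_of -- assumed mapping from word to word-package label
    """
    counts = Counter(tokens)

    # Group the distinct words of the text by their word-package.
    # Words without a mapping form a singleton package named after themselves.
    members = {}
    for word in counts:
        pkg = package_of.get(word, word)
        members.setdefault(pkg, set()).add(word)

    # Assumed reading of the node weight: number of distinct words in the
    # package plus the total number of occurrences of those words.
    node_weight = {
        pkg: len(words) + sum(counts[w] for w in words)
        for pkg, words in members.items()
    }

    # Add a directed edge whenever consecutive words belong to different
    # packages; its weight is the larger of the two node weights.
    edge_weight = {}
    for prev, cur in zip(tokens, tokens[1:]):
        src = package_of.get(prev, prev)
        dst = package_of.get(cur, cur)
        if src != dst:
            edge_weight[(src, dst)] = max(node_weight[src], node_weight[dst])

    return node_weight, edge_weight


# Hand-made example data (not from the paper).
tokens = ["bank", "loan", "interest", "river", "bank", "water"]
package_of = {
    "bank": "finance", "loan": "finance", "interest": "finance",
    "river": "nature", "water": "nature",
}

node_weight, edge_weight = build_semantic_graph(tokens, package_of)
print(node_weight)
print(edge_weight)
```

In this toy input the three finance-related words collapse into one node and the two nature-related words into another, so the graph has two nodes joined by an edge in each direction. Grouping related words this way is what lets the representation treat them as one feature instead of several independent ones.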