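Text Feature Representation Based on a Domain Dictionary
Cite this article: Chen Wenliang, Zhu Jingbo, Zhu Muhua, Yao Tianshun. Text Feature Representation Based on a Domain Dictionary[J]. Journal of Computer Research and Development, 2005, 42(12): 2155-2160
Authors: Chen Wenliang  Zhu Jingbo  Zhu Muhua  Yao Tianshun
Affiliation: Natural Language Processing Laboratory, Northeastern University, Shenyang 110004 (all four authors)
Funding: Joint project of the National Natural Science Foundation of China and Microsoft Research Asia (60203019); National Natural Science Foundation of China (60473140); Key Project of Science and Technology Research of the Ministry of Education of China (104065).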
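Abstract: To improve text categorization performance, this paper proposes a text feature representation method that combines machine learning with a domain dictionary. Representing text with domain-dictionary features strengthens the expressive power of the features and reduces the dimensionality of the feature space, but a domain dictionary suffers from insufficient coverage. To address this coverage problem, a learning model called the self-partition model is proposed. Experimental results show that using domain feature attributes obtained from the self-partition model as text features improves categorization performance, and the method performs especially well when the number of features is small. Compared with the traditional word-based text feature method, the F1 value improves by 6.58% when the number of features is 500.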

Keywords: text categorization   knowledge acquisition   domain knowledge   text representation
Received: 2004-06-29
Revised: 2004-10-29

Text Representation Using Domain Dictionary
Chen Wenliang, Zhu Jingbo, Zhu Muhua, Yao Tianshun. Text Representation Using Domain Dictionary[J]. Journal of Computer Research and Development, 2005, 42(12): 2155-2160
Authors:Chen Wenliang  Zhu Jingbo  Zhu Muhua  Yao Tianshun
Affiliation:Natural Language Processing Laboratory, Northeastern University, Shenyang 110004
Abstract: This paper presents an approach to improving text categorization performance that combines a machine learning technique with a domain dictionary. Domain-dictionary-based text representation strengthens the expressive power of text features and reduces the feature dimensionality. However, the size of a domain dictionary is limited, so some words are not covered by it. To resolve this, a machine learning technique named the self-partition model is proposed, which automatically maps words to domain features. A text categorization system is then developed that uses these learned domain features as text features. The experimental results show that the proposed approach improves text categorization performance and achieves high accuracy when the feature set is small. When the number of features is 500, it improves F1 by 6.58% over the system based on bag-of-words (BOW) features.
Keywords:text categorization   knowledge acquisition   domain knowledge   text representation
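
The idea of mapping words to domain features, and the coverage gap that motivates the self-partition model, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy dictionary, the domain labels, and the function names `domain_features` and `coverage` are hypothetical and do not come from the paper, and the self-partition model itself is not implemented here because the abstract does not describe its details.

```python
from collections import Counter

# Hypothetical toy domain dictionary (word -> domain feature); not from the paper.
DOMAIN_DICT = {
    "goalkeeper": "sports",
    "striker": "sports",
    "dividend": "finance",
    "shareholder": "finance",
}


def domain_features(tokens, domain_dict):
    """Count domain features for a tokenized text.

    Each word found in the dictionary contributes to its domain feature,
    so the feature space has one dimension per domain rather than per word.
    Words missing from the dictionary are simply dropped, which is the
    coverage problem the abstract refers to.
    """
    counts = Counter()
    for tok in tokens:
        feat = domain_dict.get(tok)
        if feat is not None:
            counts[feat] += 1
    return counts


def coverage(tokens, domain_dict):
    """Fraction of tokens that the dictionary can map to a domain feature."""
    if not tokens:
        return 0.0
    return sum(tok in domain_dict for tok in tokens) / len(tokens)


tokens = ["striker", "goalkeeper", "dividend", "referee"]
features = domain_features(tokens, DOMAIN_DICT)
covered = coverage(tokens, DOMAIN_DICT)
```

Here four distinct words collapse into two domain features, while "referee" is lost because the dictionary does not list it. A learned word-to-domain mapping, such as the one the abstract attributes to the self-partition model, would be the place to recover such words.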
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.