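A Text Feature Selection Algorithm Based on Analysis of the Relationships Between Words
Cite this article: Wu Shuang, Zhang Wensheng, Xu Hairui. A text feature selection algorithm based on analysis of the relationships between words [J]. Computer Engineering & Science, 2012, 34(6): 140-145.
Authors: Wu Shuang  Zhang Wensheng  Xu Hairui
Affiliation: Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Funding: Supported by the National Natural Science Foundation of China
Abstract: Traditional feature selection methods typically use a feature evaluation function to pick, from the original word set, the features with the greatest power to discriminate between categories. These methods rest on a vector space model that treats independent words as semantic units. They ignore the associations between words and therefore struggle to bring out the key features of the text content. To address these shortcomings, this paper proposes a new text feature selection algorithm based on the relationships between words. The method considers the words that play a key role in representing the text content, uses an association rule mining algorithm to discover associations between words, and filters the strong association rules through correlation analysis, finally generating a feature space closely related to the category attribute. Experimental results show that the method represents the semantic content of texts better and that its classification performance exceeds that of traditional algorithms.

Keywords: relationships between words  feature selection  association rules  text classification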

A Text Feature Selection Algorithm Based on Analysing the Relationship Between Words
WU Shuang, ZHANG Wen-sheng, XU Hai-rui. A Text Feature Selection Algorithm Based on Analysing the Relationship Between Words[J]. Computer Engineering & Science, 2012, 34(6): 140-145.
Authors:WU Shuang  ZHANG Wen-sheng  XU Hai-rui
Affiliation:(Institute of Automation,Chinese Academy of Sciences,Beijing 100190,China)
Abstract: Traditional feature selection algorithms usually use an evaluation function to select the features that best distinguish between document categories. However, these methods build a vector space model that takes the isolated word as its unit, so neither the important words in a document nor the relationships between words are captured. To address these shortcomings, a new feature selection algorithm based on the relationships between words is presented. The algorithm considers the words that play a key role in representing document content, mines the associations between words, and screens the resulting association rules by correlation analysis to produce a feature space closely related to the category attribute. Experiments indicate that the method expresses the semantic content of documents better and achieves better categorization results than traditional algorithms.
Keywords:relationship between words  feature selection  association rule  text categorization
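
The abstract describes the pipeline only at a high level: mine associations between words, keep the strong rules, then screen them by correlation analysis against the category. The Python sketch below is one possible reading of those steps, not the authors' implementation. Everything specific in it is an assumption made for illustration: the restriction to word pairs, the use of support and confidence to define a "strong" rule, the use of lift against the class label as the correlation measure, the function name `select_pair_features`, and all threshold values.

```python
from collections import Counter
from itertools import combinations


def select_pair_features(docs, labels, min_support=0.2,
                         min_confidence=0.6, min_lift=1.0):
    """Return word pairs that are strongly associated and class-correlated.

    docs   : list of token lists (one per document)
    labels : list of class labels, aligned with docs
    All thresholds are illustrative assumptions, not values from the paper.
    """
    n = len(docs)
    doc_sets = [set(d) for d in docs]

    # Document frequencies of single words, word pairs, and classes.
    word_count = Counter(w for d in doc_sets for w in d)
    pair_count = Counter(p for d in doc_sets
                         for p in combinations(sorted(d), 2))
    class_count = Counter(labels)
    pair_class_count = Counter((p, c) for d, c in zip(doc_sets, labels)
                               for p in combinations(sorted(d), 2))

    features = {}
    for (a, b), n_ab in pair_count.items():
        # Step 1: strong association rule between the two words.
        if n_ab / n < min_support:
            continue
        confidence = max(n_ab / word_count[a], n_ab / word_count[b])
        if confidence < min_confidence:
            continue

        # Step 2: correlation check (assumed here to be lift) between
        # the pair's co-occurrence and each class; keep the best class.
        best_class, best_lift = max(
            ((c, (pair_class_count[((a, b), c)] / n_ab) / (class_count[c] / n))
             for c in class_count),
            key=lambda item: item[1],
        )
        if best_lift > min_lift:
            features[(a, b)] = (best_class, best_lift)
    return features


# Toy corpus, invented for this example.
docs = [
    ["stock", "market", "price"],
    ["stock", "market", "bank"],
    ["match", "goal", "team"],
    ["match", "team", "price"],
]
labels = ["finance", "finance", "sport", "sport"]

features = select_pair_features(docs, labels, min_support=0.5)
print(features)
```

On this toy corpus only the pairs `("market", "stock")` and `("match", "team")` reach the support threshold, and each co-occurs exclusively with one class, so both are kept with a lift of 2.0. A word such as `price`, which appears in both classes, forms no surviving pair. This sketch is meant to illustrate that effect only.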
This article is indexed in CNKI, Wanfang Data, and other databases.