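A Chinese text subject extraction method based on word clustering
Cite this article: CHEN Jiong, ZHANG Yong-kui. A Chinese text subject extraction method based on word clustering[J]. Journal of Computer Applications, 2005, 25(4): 754-756.
Authors: CHEN Jiong  ZHANG Yong-kui
Affiliations: School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi 030006; Electronics Branch, Shanxi Comprehensive Vocational and Technical College (山西综合职业技术学院), Taiyuan, Shanxi 030006; School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi 030006
Funding: National Natural Science Foundation of China (60475022), Natural Science Foundation of Shanxi Province (20041041), Shanxi Province Foundation for Returned Overseas Scholars (2002004)
Abstract: A Chinese text subject extraction method based on word clustering is proposed. The method uses a relatedness measure to analyze word co-occurrence, establishes semantic associations between words, and generates word clusters, each representing one subject concept and denoted by seed words. For a given document, feature words are extracted first, the document's subject genes are then generated with the help of the word clusters, and finally the subject genes are output by weight as the subject of the text. Experimental results show that the method achieves relatively high extraction precision.

Keywords: subject extraction  word clustering  seed words  subject gene  information theory  word co-occurrence  CHI statistics
Article ID: 1001-9081(2005)04-0754-03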

Novel Chinese text subject extraction method based on word clustering
CHEN Jiong, ZHANG Yong-kui. Novel Chinese text subject extraction method based on word clustering[J]. Journal of Computer Applications, 2005, 25(4): 754-756.
Authors:CHEN Jiong  ZHANG Yong-kui
Affiliation:CHEN Jiong 1,2,ZHANG Yong-kui 1
Abstract: A novel Chinese text subject extraction method based on word clustering is presented. The method analyses word co-occurrence using a relatedness calculation to establish semantic relations between words, and generates word clusters, each of which represents a subject concept and is denoted by seed words. For a given text, its feature words are extracted first. Its subject genes are then produced by means of the word clusters. Finally, the subject genes are sorted in descending order of weight and the top ones are selected as the subject of the text. The experimental results indicate that the method achieves relatively high precision.
Keywords:subject extraction  word clustering  seed words  subject gene  information theory  word co-occurrence  CHI statistics
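
The abstract names word co-occurrence analysis and CHI statistics but does not state the formula used. As a minimal sketch only, assuming document-level co-occurrence and the standard 2x2 chi-square statistic, word-pair relatedness could be scored as below. The function name `chi_square_relatedness`, the toy data, and the choice to keep only positively associated pairs are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter
from itertools import combinations


def chi_square_relatedness(docs):
    """Score word pairs by the 2x2 chi-square statistic over documents.

    docs: list of token lists (already segmented).
    Returns {(w1, w2): score} with w1 < w2, for pairs that co-occur
    more often than independence would predict.
    """
    n = len(docs)
    doc_sets = [set(d) for d in docs]

    # Document frequency of each word and of each co-occurring pair.
    df = Counter(w for s in doc_sets for w in s)
    pair_df = Counter()
    for s in doc_sets:
        for a, b in combinations(sorted(s), 2):
            pair_df[(a, b)] += 1

    scores = {}
    for (a, b), n11 in pair_df.items():
        n10 = df[a] - n11            # documents with a but not b
        n01 = df[b] - n11            # documents with b but not a
        n00 = n - n11 - n10 - n01    # documents with neither word
        # Chi-square ignores direction, so skip pairs that are
        # negatively associated or independent (an assumption here).
        if n11 * n00 <= n10 * n01:
            continue
        denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
        scores[(a, b)] = n * (n11 * n00 - n10 * n01) ** 2 / denom
    return scores


if __name__ == "__main__":
    toy_docs = [
        ["stock", "market"],
        ["stock", "market"],
        ["rain", "cloud"],
        ["rain", "cloud"],
    ]
    for pair, score in sorted(chi_square_relatedness(toy_docs).items()):
        print(pair, score)
```

Pairs with high scores could then be grouped into word clusters; how the paper forms clusters and selects seed words is not described in the abstract, so that step is left out here.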
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.