
A Chinese Text Subject Extraction Method Based on Word Clustering

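Cite this article:	CHEN Jiong, ZHANG Yong-kui. A Chinese text subject extraction method based on word clustering[J]. Journal of Computer Applications (计算机应用), 2005, 25(4): 754-756.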

Authors:	CHEN Jiong (陈炯), ZHANG Yong-kui (张永奎)

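Affiliations:	School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi 030006; Electronics Branch, Shanxi Comprehensive Vocational and Technical College, Taiyuan, Shanxi 030006; School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi 030006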

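Funding:	National Natural Science Foundation of China (60475022); Natural Science Foundation of Shanxi Province (20041041); Shanxi Province Foundation for Returned Overseas Scholars (2002004)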

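Abstract:	A Chinese text subject extraction method based on word clustering is proposed. The method uses a relatedness measure to analyse word co-occurrence, establishes semantic associations between words, and generates word clusters, each of which stands for one subject concept and is represented by seed words. For a given document, feature words are extracted first; the word clusters are then used to generate the document's subject genes (主题因子); finally, the subject genes are output in order of weight as the subject of the text. Experimental results show that the method achieves relatively high precision.
Keywords:	subject extraction; word clustering; seed words; subject gene; information theory; word co-occurrence; CHI statistic
Article ID:	1001-9081(2005)04-0754-03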
Novel Chinese text subject extraction method based on word clustering

CHEN Jiong, ZHANG Yong-kui. Novel Chinese text subject extraction method based on word clustering[J]. Journal of Computer Applications, 2005, 25(4): 754-756.

Authors:	CHEN Jiong, ZHANG Yong-kui

Affiliation:	CHEN Jiong (1, 2), ZHANG Yong-kui (1)

Abstract:	A novel Chinese text subject extraction method based on word clustering is presented. The method analyses word co-occurrence using a relatedness measure to establish semantic associations between words, and generates word clusters, each of which represents a subject concept and is denoted by seed words. For a given text, feature words are extracted first. Subject genes are then produced with the help of the word clusters. Finally, the subject genes are sorted in descending order of weight and the top ones are selected as the subject of the text. The experimental results indicate that the method achieves relatively high precision.

Keywords:	subject extraction; word clustering; seed words; subject gene; information theory; word co-occurrence; CHI statistic
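
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the relatedness formula, the feature-word selection rule, or the weighting scheme, so everything below is an assumption made for illustration. Relatedness is approximated by a document-level 2x2 chi-square (CHI) statistic, every token of a document is treated as a feature word, and the weight of a subject gene is taken to be the summed term frequency of its cluster members. The names `chi_square`, `build_clusters`, `extract_subjects`, and the `threshold` and `top_k` parameters are hypothetical. Documents are assumed to be already segmented into word lists; English tokens stand in for Chinese words.

```python
from collections import Counter
from itertools import combinations


def chi_square(n11, n_a, n_b, n):
    """2x2 chi-square statistic for words a and b over n documents.

    n11 = documents containing both words, n_a / n_b = documents
    containing a / b. (Assumed relatedness measure, not the paper's.)
    """
    n10 = n_a - n11                      # a without b
    n01 = n_b - n11                      # b without a
    n00 = n - n11 - n10 - n01            # neither word
    denom = n_a * n_b * (n - n_a) * (n - n_b)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def build_clusters(docs, seeds, threshold):
    """Attach each word to the seed word it is most strongly related to."""
    n = len(docs)
    df, co = Counter(), Counter()
    for doc in docs:
        words = set(doc)
        df.update(words)
        co.update(frozenset(p) for p in combinations(sorted(words), 2))
    clusters = {seed: {seed} for seed in seeds}
    for word in df:
        if word in clusters:
            continue
        best_seed, best_score = None, threshold
        for seed in seeds:
            if seed not in df:
                continue
            n11 = co[frozenset((word, seed))]
            # Keep positive association only: observed > expected co-occurrence.
            if n11 * n <= df[word] * df[seed]:
                continue
            score = chi_square(n11, df[word], df[seed], n)
            if score > best_score:
                best_seed, best_score = seed, score
        if best_seed is not None:
            clusters[best_seed].add(word)
    return clusters


def extract_subjects(doc, clusters, top_k=3):
    """Rank seed words by the summed term frequency of their cluster members."""
    tf = Counter(doc)
    weights = {s: sum(tf[w] for w in members) for s, members in clusters.items()}
    ranked = sorted(((s, w) for s, w in weights.items() if w > 0),
                    key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    # Toy, pre-segmented corpus (hypothetical data).
    docs = [
        ["bank", "loan", "interest"],
        ["bank", "loan", "credit"],
        ["football", "goal", "match"],
        ["football", "match", "coach"],
        ["bank", "interest", "credit"],
        ["football", "goal", "coach"],
    ]
    clusters = build_clusters(docs, seeds=["bank", "football"], threshold=2.0)
    print(extract_subjects(["bank", "loan", "loan", "goal"], clusters, top_k=2))
    # prints [('bank', 3), ('football', 1)]
```

In this sketch the seed words are supplied by hand and each non-seed word joins at most one cluster; how the paper selects seed words and whether clusters may overlap is not stated in the abstract.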
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
