Study on Chinese keyword extraction algorithm based on Naïve Bayes model

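Cite this article:	CHENG Lan-lan, HE Pi-lian, SUN Yue-heng. Study on Chinese keyword extraction algorithm based on Naïve Bayes model [J]. Journal of Computer Applications, 2005, 25(12): 2780-2782.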

Authors:	CHENG Lan-lan, HE Pi-lian, SUN Yue-heng

Affiliation:	Department of Computer Science and Technology, Tianjin University, Tianjin 300072, China (all three authors)

Funding:	Tianjin Science and Technology Development Program (04310941R); Tianjin Applied Basic Research Program (05YFJMJC11700)

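Abstract:	A Chinese keyword extraction algorithm based on the Naïve Bayes model is proposed. The algorithm first obtains the parameters of the Naïve Bayes model through a training process and then uses them to extract keywords during the testing process. Experiments show that, compared with the traditional tf*idf method, the algorithm extracts more accurate keywords from a small document collection. Feature items that characterize the importance of a word can also be added flexibly, so the algorithm is more extensible.
Keywords:	keyword extraction; Naïve Bayes model; feature items
Article ID:	1001-9081(2005)12-2780-03
Received:	2005-06-19
Revised:	2005-06-19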
Study on Chinese keyword extraction algorithm based on Naïve Bayes model

CHENG Lan-lan, HE Pi-lian, SUN Yue-heng. Study on Chinese keyword extraction algorithm based on Naïve Bayes model [J]. Journal of Computer Applications, 2005, 25(12): 2780-2782.

Authors:	CHENG Lan-lan, HE Pi-lian, SUN Yue-heng

Affiliation:	Department of Computer Science and Technology, Tianjin University, Tianjin 300072, China

Abstract:	A keyword extraction algorithm for Chinese documents based on the Naïve Bayes model is proposed. It consists of a training process and a testing process. The model parameters are first estimated during training, and the probability of a word being a keyword is then computed from the model during testing. Experimental results show that, compared with the traditional tf*idf approach, the algorithm extracts more accurate keywords from a small-scale document collection. Moreover, feature items that indicate the importance of words can be added flexibly, so the algorithm has good extensibility.

Keywords:	keyword extraction; Naïve Bayes model; feature items
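
The two-stage procedure described in the abstract (estimate the model parameters during training, then score each candidate word during testing) might look roughly like the sketch below. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which feature items the paper uses or how they are discretized, so the two integer-coded features (imagined here as a tf*idf bucket and a first-position bucket), the Laplace smoothing, and the names `train` and `keyword_probability` are all hypothetical.

```python
from collections import defaultdict


def train(examples, n_values):
    """Estimate Naive Bayes counts from labeled candidate words.

    examples: list of (features, is_keyword), where features is a tuple of
              discretized feature values (hypothetical encoding).
    n_values: number of possible values per feature, used for smoothing.
    """
    class_counts = {True: 0, False: 0}
    # feat_counts[label][i][value] = training words with that label whose
    # i-th feature takes that value
    feat_counts = {
        label: [defaultdict(int) for _ in n_values] for label in (True, False)
    }
    for features, label in examples:
        class_counts[label] += 1
        for i, value in enumerate(features):
            feat_counts[label][i][value] += 1
    return class_counts, feat_counts, list(n_values)


def keyword_probability(model, features):
    """Return P(keyword | features), assuming the features are
    conditionally independent given the class."""
    class_counts, feat_counts, n_values = model
    total = class_counts[True] + class_counts[False]
    joint = {}
    for label in (True, False):
        # Laplace-smoothed class prior
        p = (class_counts[label] + 1) / (total + 2)
        for i, value in enumerate(features):
            # Laplace-smoothed P(feature_i = value | label)
            p *= (feat_counts[label][i][value] + 1) / (
                class_counts[label] + n_values[i]
            )
        joint[label] = p
    return joint[True] / (joint[True] + joint[False])


# Toy training data, invented purely for illustration:
# (tf*idf bucket, first-position bucket), both in {0, 1, 2}.
training = [
    ((2, 0), True), ((2, 1), True), ((1, 0), True),
    ((0, 2), False), ((0, 1), False), ((1, 2), False), ((0, 2), False),
]
model = train(training, n_values=[3, 3])

# Testing stage: rank candidate words by their keyword probability.
candidates = {"word_a": (2, 0), "word_b": (0, 2)}
ranked = sorted(
    candidates,
    key=lambda w: keyword_probability(model, candidates[w]),
    reverse=True,
)
```

Under this layout, adding another feature item only means appending one more value to each feature tuple and one more entry to `n_values`, which is one way to read the extensibility claim in the abstract.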
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.