
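Study on a Chinese keyword extraction algorithm based on the Naive Bayes model
Cite this article: CHENG Lan-lan, HE Pi-lian, SUN Yue-heng. Study on Chinese keyword extraction algorithm based on Naive Bayes model[J]. Journal of Computer Applications (计算机应用), 2005, 25(12): 2780-2782.
Authors: CHENG Lan-lan (程岚岚)  HE Pi-lian (何丕廉)  SUN Yue-heng (孙越恒)
Affiliation: Department of Computer Science and Technology, Tianjin University, Tianjin 300072, China (all three authors)
Funding: Tianjin Science and Technology Development Program (04310941R); Tianjin Applied Basic Research Program (05YFJMJC11700)
Abstract: This paper proposes a Chinese keyword extraction algorithm based on the Naive Bayes model. The algorithm first estimates the parameters of the Naive Bayes model in a training phase and then uses them to extract keywords in a testing phase. Experiments show that, compared with the traditional tf*idf method, the algorithm extracts more accurate keywords from a small document collection. Feature items that characterize the importance of a word can also be added flexibly, which makes the algorithm more extensible.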

Keywords: keyword extraction  Naive Bayes model  feature items
Article ID: 1001-9081(2005)12-2780-03
Received: 2005-06-19
Revised: 2005-06-19
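
The abstract describes a two-phase procedure: estimate the Naive Bayes parameters from labeled words, then score each candidate word by its probability of being a keyword. The following is a minimal sketch of that idea, not the authors' implementation. The abstract does not name the feature items, so the two discretized features used in the usage note (a frequency bin and a first-position bin) are purely hypothetical, as are the class name `NaiveBayesKeywordModel`, the function `extract_keywords`, and the choice of Laplace smoothing.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

# A candidate word is represented by a tuple of discrete feature values.
# Which features to use is an assumption; the abstract does not list them.
Features = Tuple[Hashable, ...]


class NaiveBayesKeywordModel:
    """Two-class Naive Bayes (keyword / non-keyword) over discrete features."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha                          # Laplace smoothing constant (assumed)
        self.class_counts: Counter = Counter()      # label -> number of training words
        self.feature_counts = defaultdict(Counter)  # (label, feature index) -> value counts
        self.feature_values = defaultdict(set)      # feature index -> values seen in training

    def train(self, samples: Sequence[Tuple[Features, bool]]) -> None:
        """Training phase: count class and per-feature value frequencies."""
        for features, is_keyword in samples:
            self.class_counts[is_keyword] += 1
            for i, value in enumerate(features):
                self.feature_counts[(is_keyword, i)][value] += 1
                self.feature_values[i].add(value)

    def _joint(self, features: Features, label: bool) -> float:
        """Smoothed estimate of P(label) * prod_i P(feature_i | label)."""
        total = sum(self.class_counts.values())
        p = (self.class_counts[label] + self.alpha) / (total + 2 * self.alpha)
        for i, value in enumerate(features):
            # One extra slot is reserved for values never seen in training.
            n_values = len(self.feature_values[i]) + 1
            count = self.feature_counts[(label, i)][value]
            p *= (count + self.alpha) / (self.class_counts[label] + self.alpha * n_values)
        return p

    def keyword_probability(self, features: Features) -> float:
        """Testing phase: posterior probability that the word is a keyword."""
        pos = self._joint(features, True)
        neg = self._joint(features, False)
        return pos / (pos + neg)


def extract_keywords(model: NaiveBayesKeywordModel,
                     candidates: Dict[str, Features],
                     top_k: int = 3) -> List[str]:
    """Rank candidate words by keyword probability and keep the top_k."""
    ranked = sorted(candidates,
                    key=lambda w: model.keyword_probability(candidates[w]),
                    reverse=True)
    return ranked[:top_k]
```

As a usage illustration, a model could be trained on tuples such as `(("high", "early"), True)` and then queried with `extract_keywords(model, {"bayes": ("high", "early"), "the": ("low", "late")}, top_k=1)`. Under this design, adding a feature item only means appending one more value to each tuple, which is one way to read the extensibility claim in the abstract.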

Study on Chinese keyword extraction algorithm based on Naive Bayes model
CHENG Lan-lan, HE Pi-lian, SUN Yue-heng. Study on Chinese keyword extraction algorithm based on Naive Bayes model[J]. Journal of Computer Applications, 2005, 25(12): 2780-2782.
Authors:CHENG Lan-lan  HE Pi-lian  SUN Yue-heng
Affiliation: Department of Computer Science and Technology, Tianjin University, Tianjin 300072, China
Abstract: A keyword extraction algorithm for Chinese documents based on the Naive Bayes model is proposed. It consists of a training process and a testing process. The parameters of the model are first obtained during training, and the probability of a word being a keyword is then computed from the model during testing. Experimental results show that, compared with the traditional tf*idf approach, the algorithm extracts more accurate keywords from a small document collection. Moreover, feature items that indicate the importance of words can be added flexibly, so the algorithm has good extensibility.
Keywords:keyword extraction  Naive Bayes model  feature items
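
For context, the tf*idf baseline that the abstract compares against can be sketched as follows. This is a generic illustration and not the exact variant evaluated in the paper: the smoothed IDF formula, the length-normalized term frequency, and the function names `tf_idf_scores` and `top_keywords` are assumptions. Documents are assumed to be already segmented into word lists.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def tf_idf_scores(doc: Sequence[str],
                  corpus: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Score each word in doc by term frequency times inverse document frequency."""
    tf = Counter(doc)
    n_docs = len(corpus)
    scores: Dict[str, float] = {}
    for term, count in tf.items():
        # Document frequency: number of corpus documents containing the term.
        df = sum(1 for other in corpus if term in other)
        # Smoothed IDF (an assumed variant) avoids division by zero when df == 0.
        idf = math.log((1 + n_docs) / (1 + df))
        scores[term] = (count / len(doc)) * idf
    return scores


def top_keywords(doc: Sequence[str],
                 corpus: Sequence[Sequence[str]],
                 k: int = 3) -> List[str]:
    """Return the k highest-scoring words; ties are broken alphabetically."""
    scores = tf_idf_scores(doc, corpus)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

A word that appears in every document of the collection receives an IDF of zero under this formula, so only words concentrated in a few documents can rank highly.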
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.