
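Comparative Study of Biomedical Text Classification Methods
Cite this article: NI Mao-shu, ZHAO Jing, LIN Hong-fei. Comparison study on categorization algorithms for biomedical literatures [J]. Computer Engineering and Applications, 2007, 43(12): 147-149, 172.
Authors: NI Mao-shu  ZHAO Jing  LIN Hong-fei
Affiliation (all three authors): Department of Computer Science and Engineering, Dalian University of Technology, Dalian, Liaoning 116024, China
Abstract: Text classification plays an important role in processing the vast biomedical literature. The evaluation results of the TREC (Text REtrieval Conference) 2005 Genomics Track show that the Support Vector Machine (SVM) has a clear advantage over other models on biomedical text classification. On the TREC evaluation corpus, this paper compares the simple vector distance classification method with SVM, and also discusses how preprocessing with named entity recognition affects the different algorithms. The conclusion is that simple vector distance classification performs on par with SVM in this domain, and that named entity recognition yields some improvement in the results.

Keywords: text classification  support vector machine  simple vector distance classification  named entity recognition
Article ID: 1002-8331(2007)12-0147-03
Revised: 2006-11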

Comparison study on categorization algorithms for biomedical literatures
NI Mao-shu,ZHAO Jing,LIN Hong-fei.Comparison study on categorization algorithms for biomedical literatures[J].Computer Engineering and Applications,2007,43(12):147-149,172.
Authors:NI Mao-shu  ZHAO Jing  LIN Hong-fei
Affiliation: Department of Computer Science and Engineering, Dalian University of Technology, Dalian, Liaoning 116024, China
Abstract: Automatic text classification can greatly help people analyze the mass of biomedical literature. The results of the TREC 2005 Genomics Track showed that the Support Vector Machine (SVM) has obvious advantages over other models. This paper compares the performance of classification based on simple vector distance with that of SVM-based classification on the TREC data sets. The results show that classification based on simple vector distance is no worse than SVM-based classification in this domain, and that preprocessing with named entity recognition can improve performance.
Keywords: automatic text classification  Support Vector Machine (SVM)  simple vector distance classification  named entity recognition
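
The abstract names simple vector distance classification but does not describe its details. As a rough illustration only, the idea can be sketched as a centroid classifier: average the term vectors of each class, then assign a new document to the class whose centroid is closest. The raw term-frequency weighting, the cosine similarity measure, the whitespace tokenizer, the function names, and the toy documents and labels below are all assumptions made for this sketch, not the authors' setup or the TREC data.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Bag-of-words term-frequency vector (assumed: lowercase, whitespace split)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def train_centroids(labeled_docs):
    """Average the term vectors of each class into one centroid per class."""
    sums = defaultdict(Counter)
    counts = Counter()
    for text, label in labeled_docs:
        sums[label].update(vectorize(text))  # adds term counts
        counts[label] += 1
    return {
        label: {t: w / counts[label] for t, w in vec.items()}
        for label, vec in sums.items()
    }


def classify(text, centroids):
    """Return the label whose centroid is most similar to the document."""
    vec = vectorize(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy, invented training data (hypothetical labels).
TRAIN = [
    ("gene expression mouse embryo", "relevant"),
    ("mouse gene mutation phenotype", "relevant"),
    ("clinical trial patient outcome", "irrelevant"),
    ("patient hospital survey outcome", "irrelevant"),
]
centroids = train_centroids(TRAIN)
prediction = classify("mutation in mouse gene", centroids)
```

Training reduces to one averaging pass over the labeled documents, and classification costs one similarity computation per class, which is part of why such a method is a natural low-cost baseline to compare against SVM.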
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.