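Application of machine learning algorithms to pharmaceutical patent classification
Cite this article:Liang Jing, Xu Liang, Cheng Wentang. Application of machine learning algorithms to pharmaceutical patent classification[J]. Computers and Applied Chemistry, 2007, 24(10): 1341-1344.
Authors:Liang Jing  Xu Liang  Cheng Wentang
Affiliation:Laboratory of Chemical Structure Information Processing, College of Chemical Engineering, Dalian University of Technology, P.O. Box 288, Dalian, Liaoning 116024 (all three authors)
Funding:National Natural Science Foundation of China; National High Technology Research and Development Program of China (863 Program)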
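Abstract:To enable automatic classification of pharmaceutical patents, this paper studies how machine learning algorithms can be applied to the task, taking the characteristics of pharmaceutical patents into account. More than 2000 pharmaceutical patents were classified by therapeutic effect, and five of the resulting classes were selected as training samples. Feature text was extracted for each class, and a vector space model was used to convert the unstructured text into numerical form. Three machine learning algorithms, support vector machine (SVM), Naive Bayes and RBFNetwork, were each tested on the patent samples, and their precision and recall were compared using 5-fold cross-validation. The results show that the support vector machine gave the best classification performance. Applying machine learning algorithms to the classification of pharmaceutical chemistry patents helps improve the efficiency of information retrieval for such patents.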
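
The vector space step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which term weighting or tokenization the authors used, so the TF-IDF weighting, the whitespace tokenizer, the helper names (`tokenize`, `build_tfidf`, `cosine`) and the three toy texts below are all assumptions made for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokens. Real patent text would need
    # stop-word removal and special handling of chemical names.
    return text.lower().split()


def build_tfidf(docs):
    """Map raw texts to TF-IDF vectors over a shared, sorted vocabulary."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(docs)
    # Document frequency: number of texts in which each term occurs.
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(n_docs / df[t]) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append([tf[t] * idf[t] for t in vocab])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy texts, not taken from the paper's data set.
docs = [
    "compound for treating hypertension",
    "compound for lowering blood pressure hypertension",
    "antibacterial agent against infection",
]
vocab, vectors = build_tfidf(docs)
```

Each text becomes a fixed-length numeric vector, so texts that share terms (the first two) have a positive cosine similarity, while texts with no terms in common (the first and third) have a similarity of zero. Vectors of this kind are what a classifier such as an SVM would take as input.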

Keywords:pharmaceutical patent  text classification  support vector machine  information retrieval
Article ID:1001-4160(2007)10-1341-1344
Revised manuscript received:2007-02-28

Application of machine learning algorithms to pharmaceutical patent categorization
Liang Jing, Xu Liang, Cheng Wentang. Application of machine learning algorithms to pharmaceutical patent categorization[J]. Computers and Applied Chemistry, 2007, 24(10): 1341-1344.
Authors:Liang Jing  Xu Liang  Cheng Wentang
Affiliation:College of Chemical Engineering, Dalian University of Technology, Dalian, 116024, Liaoning, China
Abstract:This paper presents the application of several machine learning methods to the categorization of pharmaceutical patents. About 2000 pharmaceutical patents were categorized into five classes according to their curative effects and used as training instances. Features in text form were first extracted from each class and then expressed as numerical vectors. Three machine learning algorithms, i.e., SVM, Naive Bayes and RBFNetwork, were evaluated in a series of experiments using the two most commonly used performance measures, precision and recall. The results of five-fold cross-validation show that SVM outperforms the other two algorithms. The methods proposed in this paper may be helpful for pharmaceutical patent categorization.
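
The evaluation protocol described in the abstract (per-class precision and recall under five-fold cross-validation) can be sketched as follows. This is not the authors' implementation: the abstract gives no details of their tooling, so the small multinomial Naive Bayes classifier with add-one smoothing, the fold assignment (every k-th sample), the function names, the two class labels and the ten toy texts are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Multinomial Naive Bayes counts over whitespace tokens (assumed setup)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, y in zip(docs, labels):
        toks = text.lower().split()
        word_counts[y].update(toks)
        vocab.update(toks)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for y in sorted(class_counts):
        denom = sum(word_counts[y].values()) + len(vocab)
        score = math.log(class_counts[y] / total)
        for t in text.lower().split():
            if t in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[y][t] + 1) / denom)  # add-one smoothing
        if score > best_score:
            best, best_score = y, score
    return best


def cross_validate(docs, labels, k=5):
    """Return true and predicted labels pooled over k folds."""
    y_true, y_pred = [], []
    for fold in range(k):
        test_idx = set(range(fold, len(docs), k))  # assumption: every k-th sample
        train_docs = [d for i, d in enumerate(docs) if i not in test_idx]
        train_labels = [y for i, y in enumerate(labels) if i not in test_idx]
        model = train_nb(train_docs, train_labels)
        for i in sorted(test_idx):
            y_true.append(labels[i])
            y_pred.append(predict_nb(model, docs[i]))
    return y_true, y_pred


def precision_recall(y_true, y_pred, cls):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN) for one class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: two invented classes, not the paper's five.
docs = [
    "compound lowers blood pressure in hypertension",
    "hypertension treatment with blood pressure inhibitor",
    "inhibitor compound for hypertension and blood pressure",
    "blood pressure control in hypertension patients",
    "novel hypertension compound reduces blood pressure",
    "antibacterial agent against bacterial infection",
    "bacterial infection treated with antibacterial compound",
    "antibacterial composition for infection control",
    "agent for bacterial infection with antibacterial activity",
    "novel antibacterial agent treats infection",
]
labels = ["cardiovascular"] * 5 + ["antibacterial"] * 5

y_true, y_pred = cross_validate(docs, labels, k=5)
for cls in sorted(set(labels)):
    print(cls, precision_recall(y_true, y_pred, cls))
```

Pooling the predictions from all folds before computing the metrics means every sample is scored exactly once, by a model that did not see it during training. Comparing several algorithms, as the paper does, would amount to swapping `train_nb` and `predict_nb` for another classifier while keeping the folds fixed.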
Keywords:pharmaceutical patent  text categorization  SVM  information retrieval
This article is indexed in CNKI, VIP, Wanfang Data and other databases.