
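Protein Name Recognition Based on a Combination of Dictionary and Machine Learning Methods
Cite this article: Li Gang, Guo Chonghui, Lin Hongfei, Yang Zhihao, Tang Huanwen. Protein name recognition based on a combination of dictionary and machine learning methods[J]. Computers and Applied Chemistry, 2006, 23(5): 395-398
Authors: Li Gang  Guo Chonghui  Lin Hongfei  Yang Zhihao  Tang Huanwen
Affiliation: Department of Applied Mathematics, Dalian University of Technology, Dalian 116024, Liaoning, China; Department of Computer Science and Engineering, Dalian University of Technology, Dalian 116024, Liaoning, China
Abstract: Biomedical named entity recognition is important for extracting information from biomedical literature. This paper explores how to recognize protein names. It mainly adopts a dictionary-based approach, using an approximate matching algorithm and a first-word lookup method to recognize protein names, and combines it with a machine learning method: a classifier is trained to filter the candidate terms and thereby improve recognition precision.

Keywords: candidate terms  edit distance  classifier
Article ID: 1001-4160(2006)05-395-398
Received: 2005-11-10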
Revised: 2006-02-25
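The approximate matching and first-word lookup named in the abstract could look roughly like the following Python sketch. It assumes that dictionary entries are indexed by their first word and that a text span matches an entry when their Levenshtein (edit) distance is at most 20% of the entry's length. The function names `edit_distance`, `build_first_word_index` and `find_candidates`, the `max_ratio` threshold and the sample dictionary are all hypothetical, not taken from the paper.

```python
from collections import defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def build_first_word_index(dictionary):
    """Group dictionary names by their lower-cased first word so that only
    entries sharing a token's first word are compared (hypothetical
    reading of the paper's first-word lookup)."""
    index = defaultdict(list)
    for name in dictionary:
        words = name.lower().split()
        index[words[0]].append(words)
    return index


def find_candidates(tokens, index, max_ratio=0.2):
    """Return (start, end, dictionary_name) spans whose text lies within
    max_ratio * len(name) edits of a dictionary name with the same first
    word. max_ratio = 0.2 is an assumed threshold, not the paper's value."""
    lowered = [t.lower() for t in tokens]
    candidates = []
    for start, word in enumerate(lowered):
        for entry in index.get(word, []):
            end = start + len(entry)
            if end > len(lowered):
                continue  # entry is longer than the rest of the sentence
            span = " ".join(lowered[start:end])
            target = " ".join(entry)
            if edit_distance(span, target) <= max_ratio * len(target):
                candidates.append((start, end, target))
    return candidates


if __name__ == "__main__":
    dictionary = ["tumor necrosis factor", "interleukin 2 receptor", "p53"]
    index = build_first_word_index(dictionary)
    tokens = "Tumor necrosis factor alpha activates interleukin 2 receptors".split()
    print(find_candidates(tokens, index))
    # [(0, 3, 'tumor necrosis factor'), (5, 8, 'interleukin 2 receptor')]
```

Here "interleukin 2 receptors" is accepted as a variant of the dictionary entry because it differs by a single edit. Restricting comparisons to entries that share the first word keeps the number of edit-distance computations small, at the cost of missing variants whose first word itself is spelled differently.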

Protein name recognition based on dictionary and machine learning method
Li Gang,Guo Chonghui,Lin Hongfei,Yang Zhihao,Tang Huanwen. Protein name recognition based on dictionary and machine learning method[J]. Computers and Applied Chemistry, 2006, 23(5): 395-398
Authors:Li Gang  Guo Chonghui  Lin Hongfei  Yang Zhihao  Tang Huanwen
Affiliation:1. Department of Applied Mathematics, Dalian University of Technology, Dalian, 116024, Liaoning, China; 2. Department of Computer Science and Engineering, Dalian University of Technology, Dalian, 116024, Liaoning, China
Abstract: Identification of biomedical entities is one of the important techniques for extracting information from biomedical documents. This paper proposes an effective dictionary-based model for identifying protein names. Approximate string searching and first-word searching are used to identify candidate protein names, and a Naive Bayes classifier is applied to filter the candidates and improve accuracy.
Keywords:candidates  edit-distance  classifier
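The candidate-filtering step can be sketched as a multinomial Naive Bayes classifier with add-one (Laplace) smoothing that labels each candidate as `protein` or `other`. This is an illustrative sketch only: the feature function `features`, the class `NaiveBayesFilter`, the labels and the four training examples are hypothetical, since the abstract does not specify the features or training data used.

```python
import math
from collections import Counter, defaultdict


def features(candidate, context):
    """Hypothetical features for a candidate name: a 3-character suffix,
    digit/upper-case flags, and surrounding context words. The abstract
    does not describe the paper's real feature set."""
    feats = ["suffix3=" + candidate.split()[-1].lower()[-3:]]
    if any(ch.isdigit() for ch in candidate):
        feats.append("has_digit")
    if any(ch.isupper() for ch in candidate):
        feats.append("has_upper")
    feats.extend("ctx=" + w.lower() for w in context)
    return feats


class NaiveBayesFilter:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, samples, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, label in zip(samples, labels):
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        self.totals = {c: sum(self.feat_counts[c].values()) for c in self.class_counts}
        return self

    def log_posterior(self, feats, label):
        """Unnormalized log P(label) + sum of log P(feature | label)."""
        n = sum(self.class_counts.values())
        v = len(self.vocab)
        score = math.log(self.class_counts[label] / n)
        for f in feats:
            score += math.log((self.feat_counts[label][f] + 1) / (self.totals[label] + v))
        return score

    def predict(self, feats):
        return max(self.class_counts, key=lambda c: self.log_posterior(feats, c))


if __name__ == "__main__":
    # Toy, made-up training data for illustration only.
    train = [(("IL-2", ["expression"]), "protein"),
             (("p53", ["expression"]), "protein"),
             (("cell", ["the"]), "other"),
             (("assay", ["the"]), "other")]
    clf = NaiveBayesFilter().fit([features(*x) for x, _ in train], [y for _, y in train])
    for cand, ctx in [("CD4", ["expression"]), ("tissue", ["the"])]:
        print(cand, clf.predict(features(cand, ctx)))  # keep only "protein" candidates
```

Candidates predicted as `other` would be discarded, which is how a filter of this kind is meant to raise precision after the dictionary lookup has produced a generous candidate list. Working in log space avoids numerical underflow when many features are multiplied together.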
This article is indexed in CNKI, VIP (Weipu), Wanfang Data and other databases.