A Protein Secondary Structure Prediction Method Based on Hash-Dictionary

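Cite this article:	NAN Yu-hong, CHEN Qi. A Protein Secondary Structure Prediction Method Based on Hash-Dictionary[J]. Microcomputer Development, 2011(10): 168-170, 175.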

Authors:	NAN Yu-hong, CHEN Qi

Affiliation:	College of Information Science and Technology, Hainan University, Haikou, Hainan 570228

Funding:	Natural Science Foundation of Hainan Province (609003)

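Abstract:	This paper proposes a protein secondary structure prediction algorithm that is easy to modify. PDB text files from the Protein Data Bank serve as the data source. All protein amino acid sequences are extracted from them and used to build a sample database. For α-helices and β-sheets, different improved methods based on a hash dictionary are then implemented to predict protein secondary-structure sequence fragments. During prediction, a random subset of the 68,421 proteins is drawn as the test set. Each unknown sequence is segmented and compared against the fragments stored in the hash dictionary using the forward maximum matching word-segmentation method. The experimental results show that prediction accuracy on unknown sequence fragments reaches 83.9%. The method also reflects the connection order between fragments well.
Keywords:	protein secondary structure; sequence fragment; hash dictionary; α-helix; β-sheet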
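The abstract does not describe how the dictionary is built. The following is a minimal sketch under stated assumptions. Each training protein is assumed to be available as an amino-acid string paired with a per-residue secondary-structure string (H = helix, E = strand, C = other), which could, for example, be derived from the HELIX/SHEET records of PDB files. The hypothetical function `build_fragment_dictionary` stores every sufficiently long helix or strand run as a key in a Python dict (a hash table) and counts how often each fragment carries each label. The parameter `min_len` is illustrative and is not a value from the paper.

```python
from collections import defaultdict


def build_fragment_dictionary(samples, min_len=3):
    """Hypothetical sketch: map amino-acid fragments to structure-label counts.

    `samples` is an assumed list of (sequence, structure) pairs of equal
    length; the structure string uses H (helix), E (strand), C (other).
    Every maximal run of H or E of at least `min_len` residues becomes a
    dictionary key: fragment -> {label: count}.
    """
    dictionary = defaultdict(lambda: defaultdict(int))
    for seq, ss in samples:
        if len(seq) != len(ss):
            raise ValueError("sequence and structure must have equal length")
        start = 0
        # Walk the structure string, cutting it into maximal same-label runs.
        for i in range(1, len(ss) + 1):
            if i == len(ss) or ss[i] != ss[start]:
                label = ss[start]
                if label in "HE" and i - start >= min_len:
                    dictionary[seq[start:i]][label] += 1
                start = i
    # Convert the nested defaultdicts into plain dicts for the caller.
    return {frag: dict(counts) for frag, counts in dictionary.items()}


samples = [("MKTAYIAKQR", "CHHHHHHCCC"), ("GSVEVKVTLE", "CCEEEEECCC")]
print(build_fragment_dictionary(samples))
# {'KTAYIA': {'H': 1}, 'VEVKV': {'E': 1}}
```

Keeping per-label counts rather than a single label lets a fragment that occurs as both helix and strand in the training data be resolved later, for example by majority vote. This is an illustrative choice, not one stated in the abstract.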
A Protein Secondary Structure Prediction Method Based on Hash-Dictionary

NAN Yu-hong, CHEN Qi. A Protein Secondary Structure Prediction Method Based on Hash-Dictionary[J]. Microcomputer Development, 2011(10): 168-170, 175.

Authors:	NAN Yu-hong, CHEN Qi

Affiliation:	College of Information Science and Technology, Hainan University, Haikou 570228, China

Abstract:	This paper proposes an easily modifiable algorithm for protein secondary structure prediction. PDB files from the Protein Data Bank are used as the data source. All protein amino acid sequences are extracted from them to build a database. Different improved methods based on a hash dictionary are then applied to α-helices and β-sheets to predict fragments of protein secondary structure. During prediction, a random sample drawn from the 68,421 proteins serves as the test set. Each unknown sequence is segmented against the fragments in the hash dictionary using forward maximum matching. The results show that fragment prediction reaches 83.9% accuracy and also reflects the connection order of the sequence fragments well.

Keywords:	protein secondary structure; sequence fragments; hash dictionary; α-helix; β-sheet
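Forward maximum matching, the word-segmentation strategy named in the abstract, scans an unknown sequence from left to right. At each position it takes the longest fragment found in the dictionary. The following is a minimal sketch that assumes a plain fragment-to-label dict. The function name `forward_max_match`, the fallback label `C` for unmatched residues, and the one-residue advance on a miss are illustrative choices, not details taken from the paper.

```python
def forward_max_match(sequence, dictionary, max_len=None, default="C"):
    """Hypothetical sketch of forward maximum matching against a fragment dict.

    `dictionary` is assumed to map amino-acid fragments to a structure label
    ('H' or 'E'). At each position the longest matching fragment is taken;
    if nothing matches, one residue is emitted with the `default` label.
    Returns (segments, per-residue prediction string).
    """
    if max_len is None:
        max_len = max((len(f) for f in dictionary), default=1)
    segments, prediction = [], []
    i = 0
    while i < len(sequence):
        # Try the longest candidate first, then shrink it one residue at a time.
        for length in range(min(max_len, len(sequence) - i), 0, -1):
            fragment = sequence[i:i + length]
            if fragment in dictionary:
                label = dictionary[fragment]
                break
        else:
            # No dictionary fragment starts here: emit a single residue.
            fragment, label = sequence[i], default
        segments.append((fragment, label))
        prediction.append(label * len(fragment))
        i += len(fragment)
    return segments, "".join(prediction)


dictionary = {"KTAYIA": "H", "KTAY": "E", "VEVKV": "E"}
segments, prediction = forward_max_match("MKTAYIAGVEVKVR", dictionary)
print(prediction)  # CHHHHHHCEEEEEC
```

Because the longest match wins, `KTAYIA` is chosen over the shorter entry `KTAY`. The segments are emitted in sequence order, so the order in which matched fragments connect is kept in the output.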
This article is indexed in VIP and other databases.