Compression of nucleic acid and protein sequence data
Authors: J R Walker, P Willett
Affiliation: Department of Information Studies, University of Sheffield, Western Bank, UK.
Abstract: This paper describes the application of text compression methods to machine-readable files of nucleic acid and protein sequence data. Two main methods are used to reduce the storage requirements of such files: n-gram coding and run-length coding. A Pascal program combining both techniques achieved a compression figure of 74.6% for the GenBank database, and a program that used only n-gram coding achieved a compression figure of 42.8% for the Protein Identification Resource database.
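The abstract names the two techniques but does not give the coding scheme, so the following is only a minimal sketch of the general ideas, not the authors' Pascal program. It assumes a nucleic acid sequence restricted to the four letters A, C, G and T, treats each group of four bases as one n-gram stored in a single byte, and shows run-length coding as a separate step. All function names are hypothetical.

```python
# Illustrative sketch only: the alphabet, the n-gram length of 4 and the
# function names are assumptions, not details taken from the paper.
BASES = "ACGT"
CODE = {base: index for index, base in enumerate(BASES)}


def pack_4grams(seq):
    """n-gram coding: store each group of 4 bases in one byte (2 bits per base)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        value = 0
        for base in chunk:
            value = (value << 2) | CODE[base]
        # Left-align a short final chunk so every byte unpacks the same way.
        value <<= 2 * (4 - len(chunk))
        out.append(value)
    return bytes(out)


def unpack_4grams(data, length):
    """Invert pack_4grams; `length` is the original number of bases."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:length])


def run_length_encode(seq):
    """Run-length coding: collapse repeated symbols into (symbol, count) pairs."""
    runs = []
    for symbol in seq:
        if runs and runs[-1][0] == symbol:
            runs[-1] = (symbol, runs[-1][1] + 1)
        else:
            runs.append((symbol, 1))
    return runs


def run_length_decode(runs):
    """Invert run_length_encode."""
    return "".join(symbol * count for symbol, count in runs)


if __name__ == "__main__":
    seq = "ACGTACGTAC"
    packed = pack_4grams(seq)
    print(len(seq), "bases packed into", len(packed), "bytes")
    print(unpack_4grams(packed, len(seq)) == seq)
    print(run_length_encode("AAAACCG"))
```

Under these assumptions each base takes 2 bits instead of a full 8-bit character. Real database files also contain ambiguity codes and annotation text, which this sketch does not handle.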