Data management and analysis for high-throughput DNA sequencing projects
Authors: Kerlavage, A.R.; FitzHugh, W.; Gladek, A.; Kelley, J.; Scott, J.; Shirley, R.; Sutton, G.; Man Wai-Chiu; White, O.; Adams, D.
Affiliation: Department of Bioinformatics, Institute for Genomic Research, Gaithersburg, MD, USA
Abstract: Rapid advances in molecular biology have begun to shift many of the bottlenecks in genome research from the laboratory to the data analysis facility. This shift has occurred so quickly that software development is always catching up with the flow of data. Because such large-scale processes were not anticipated, the analysis infrastructure has not been fully established. Furthermore, most systems that have been built were designed by the biologists who collected the data. More recently, computer scientists, mathematicians, and engineers have taken an interest in this problem, which has had a positive effect by creating a tight synergy between informatics and biology. Several principles affected the design of the system developed at The Institute for Genomic Research (TIGR). Each of the sample preparation, sequencing, and analysis steps had to be managed, scheduled, and tracked. This information had to be made readily available to those who needed it to carry out their tasks. The different skill levels of the users had to be taken into account. The degree of human intervention at each step had to be evaluated and built into the design. A mixed processing environment of Macintosh and Unix platforms had to be integrated. Most importantly, the system had to save time, reduce error, and ensure uniformity of the analysis and quality of the results. In the authors' experience, the tools they have built work well because of their early decisions about which systems to use for development. The authors settled on a robust relational database management system (Sybase) and a portable development environment (C, C++).