
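ReDE: A Regular Expression-Based Method for Extracting Biological Data
Cite this article: Deng Xubin, Zhu Yangyong. ReDE: A Regular Expression-Based Method for Extracting Biological Data [J]. Journal of Computer Research and Development, 2005, 42(12): 2184-2191.
Authors: Deng Xubin  Zhu Yangyong
Affiliations: 1. School of Information, Zhejiang University of Finance and Economics, Hangzhou 310018
2. Department of Computing and Information Technology, Fudan University, Shanghai 200433; Shanghai Center for Bioinformation Technology, Shanghai 201203
Funding: National High Technology Research and Development Program of China (863 Program) (2002AA231011); Shanghai Major Science and Technology Fund Project (02DJ14013)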
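Abstract (translated from the Chinese): Extracting data from heterogeneous biological data sources to build query and analysis platforms is a current research focus, and the extraction process involves large amounts of interdependent metadata. Making full use of these dependencies can reduce the maintenance workload. Based on regular expressions (REs), the ReDE extraction method is proposed: by building a parse tree around RE groups, an RE-based algorithm for generating relational database schemas and a general extraction and assembly algorithm are designed. Its distinguishing feature is that the RE is the only metadata, which makes it easy to manage and maintain. The method lays the foundation for computer-aided design tools for biological databases and for highly automated extraction tools, and has been used to build China's first integrated online biological information data warehouse.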

Keywords: biological data source  data extraction  metadata  regular expression  extraction algorithm
Received: 2004-06-14
Revised: 2005-08-01

ReDE: A Regular Expression-Based Method for Extracting Biological Data
Deng Xubin,Zhu Yangyong.ReDE: A Regular Expression-Based Method for Extracting Biological Data[J].Journal of Computer Research and Development,2005,42(12):2184-2191.
Authors:Deng Xubin  Zhu Yangyong
Abstract: Extracting data from heterogeneous biological data sources to build query and analysis platforms for biologists is currently an active research topic. In general, the data extraction process involves many interdependent pieces of metadata. Exploiting the dependencies among them, so that one piece of metadata can be generated from another, reduces the metadata maintenance overhead. However, many data extraction methods overlook these dependencies and require considerable effort to construct and maintain the metadata. In this paper, a regular expression (RE) based method named ReDE is proposed to avoid this drawback: by building a parse tree for RE groups, an RE-based algorithm for generating a relational database schema and a general data extraction and assembly algorithm are designed. The novelty is that the RE is the only metadata required, and it is relatively easy to manage and maintain. The method can serve as the basis for a design-aiding tool for biological databases and for a highly automated data extraction tool, and has been applied to extract data for China's first online integrated biological data warehouse.
Keywords:biological data source  data extraction  metadata  regular expression  extraction algorithm
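
The central idea in the abstract, that a regular expression with groups can act as the only metadata from which both a relational schema and the extraction step are derived, can be illustrated with a minimal Python sketch. This is not the ReDE algorithm from the paper: it handles only a flat list of named groups rather than a parse tree of nested groups, and the record layout, the `RECORD_RE` pattern, the table name `entry`, and the helper functions are all hypothetical choices made for illustration.

```python
import re
import sqlite3

# Hypothetical flat-file record layout (assumption, not taken from the paper).
# Each named group becomes one column of the generated table.
RECORD_RE = re.compile(
    r"^ID\s+(?P<entry_id>\S+)\n"
    r"DE\s+(?P<description>.+)\n"
    r"OS\s+(?P<organism>.+)$",
    re.MULTILINE,
)


def schema_from_regex(pattern, table):
    """Derive column names and a CREATE TABLE statement from the named groups."""
    # groupindex maps group name -> group number; sort by number to keep RE order.
    columns = sorted(pattern.groupindex, key=pattern.groupindex.get)
    cols_sql = ", ".join(f"{name} TEXT" for name in columns)
    return columns, f"CREATE TABLE {table} ({cols_sql})"


def extract_rows(pattern, text, columns):
    """Apply the same RE to the source text and return one tuple per match."""
    return [tuple(m.group(name) for name in columns) for m in pattern.finditer(text)]


SAMPLE = """ID   P12345
DE   Example protein one
OS   Homo sapiens
//
ID   Q67890
DE   Example protein two
OS   Mus musculus
//
"""

columns, ddl = schema_from_regex(RECORD_RE, "entry")
conn = sqlite3.connect(":memory:")
conn.execute(ddl)

rows = extract_rows(RECORD_RE, SAMPLE, columns)
placeholders = ", ".join("?" for _ in columns)
conn.executemany(f"INSERT INTO entry VALUES ({placeholders})", rows)

result = conn.execute(
    "SELECT entry_id, organism FROM entry ORDER BY entry_id"
).fetchall()
print(ddl)
print(result)
```

In this sketch, changing the source format means editing only the regular expression; the table definition and the loading code follow from it, which is the maintenance benefit the abstract attributes to treating the RE as the sole metadata.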
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.