
Design and Implementation of an XML-Based Information Extraction Middleware for Government Documents

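Cite this article:	NIE Zhe, GU Ming. Design and implementation of information extracting middleware for government archives based on XML technology[J]. Computer Engineering and Design, 2007, 28(5): 1158-1160.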

Authors:	NIE Zhe, GU Ming

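Affiliations:	Department of Software Engineering, Shenzhen Polytechnic, Shenzhen 518055, Guangdong; Department of Software Engineering, Shenzhen Polytechnic, Shenzhen 518055, Guangdong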

Funding:	Science and Technology Fund of the Shenzhen Municipal Bureau of Science and Information

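Abstract:	Hypertext information extraction is an important means of reorganizing information on the Internet. Based on a study of the format of government documents, this paper proposes an XML-based information extraction middleware model. The model combines three steps: word segmentation that pairs string matching with string-frequency statistics, part-of-speech tagging using a genetic algorithm, and automatic XML template filling based on an improved hidden Markov model. With these steps, government document information on the Internet can be reorganized quickly for use by related application systems.
Keywords:	government documents; information extraction; middleware; word segmentation; part-of-speech tagging; template filling
Article ID:	1000-7024(2007)05-1158-03
Revised:	2006-02-16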
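The segmentation step named in the abstract, string matching combined with string-frequency statistics, can be illustrated with a minimal sketch. The record does not give the authors' algorithm, so everything here is an assumption: forward maximum matching stands in for the string-matching part, and a plain character n-gram count stands in for the frequency statistics. The function names `forward_max_match` and `frequent_strings`, and the parameters `max_len`, `n`, and `min_count`, are hypothetical.

```python
from collections import Counter


def forward_max_match(text, lexicon, max_len=4):
    """Greedy left-to-right segmentation (assumed stand-in for string matching).

    At each position, take the longest lexicon entry of at most max_len
    characters; fall back to a single character when nothing matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def frequent_strings(texts, n=2, min_count=2):
    """Return character n-grams that occur at least min_count times in texts.

    This is an assumed, simplified form of string-frequency statistics:
    recurring strings are treated as candidate words missing from the lexicon.
    """
    counts = Counter()
    for text in texts:
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return {gram for gram, count in counts.items() if count >= min_count}
```

In this sketch, a lexicon of {"政府", "公文", "信息"} splits "政府公文信息抽取" into known words followed by the single characters "抽" and "取". Adding the frequent bigrams found in a small corpus lets "抽取" be matched as one token. Frequency alone also admits noise strings such as "息抽", so a practical system would need an additional filter.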
Design and implementation of information extracting middleware for government archives based on XML technology

NIE Zhe, GU Ming. Design and implementation of information extracting middleware for government archives based on XML technology[J]. Computer Engineering and Design, 2007, 28(5): 1158-1160.

Authors:	NIE Zhe, GU Ming

Affiliation:	Department of Computer Software, Shenzhen Polytechnic, Shenzhen 518055, China

Abstract:	Information extraction is one of the most important ways to reorganize HTML text. Based on a study of the format of government archives, an information extracting middleware model based on XML technology is presented. It works in three steps: word segmentation based on string matching and string-frequency statistics, POS tagging based on a genetic algorithm, and XML template filling based on an HMM. In this way the middleware can rapidly reorganize government archives into XML format.

Keywords:	government archives; information extraction; middleware; Chinese word segmentation; POS tagging; template filling
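The HMM-based template filling mentioned in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the record does not describe the authors' improved HMM, so the code uses standard Viterbi decoding over a plain HMM. The slot names (`title`, `issuer`, `date`), the coarse token classes (`WORD`, `ORG`, `NUM`), every probability in the toy model, and the helper `fill_template` are hypothetical.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical toy model: hidden states are template slots,
# observations are coarse token classes.
STATES = ["title", "issuer", "date"]
START_P = {"title": 0.8, "issuer": 0.1, "date": 0.1}
TRANS_P = {
    "title": {"title": 0.6, "issuer": 0.3, "date": 0.1},
    "issuer": {"title": 0.1, "issuer": 0.5, "date": 0.4},
    "date": {"title": 0.1, "issuer": 0.1, "date": 0.8},
}
EMIT_P = {
    "title": {"WORD": 0.8, "ORG": 0.1, "NUM": 0.1},
    "issuer": {"WORD": 0.2, "ORG": 0.7, "NUM": 0.1},
    "date": {"WORD": 0.1, "ORG": 0.1, "NUM": 0.8},
}


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for obs (standard Viterbi, log-space)."""
    def lg(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s] = (log-probability of the best path ending in s at time t, previous state)
    best = [{s: (lg(start_p[s]) + lg(emit_p[s].get(obs[0], 0.0)), None) for s in states}]
    for t in range(1, len(obs)):
        row = {}
        for s in states:
            score, prev = max(
                (best[t - 1][p][0] + lg(trans_p[p].get(s, 0.0)), p) for p in states
            )
            row[s] = (score + lg(emit_p[s].get(obs[t], 0.0)), prev)
        best.append(row)

    # Backtrack from the best final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for t in range(len(obs) - 1, 0, -1):
        state = best[t][state][1]
        path.append(state)
    return path[::-1]


def fill_template(tokens, labels):
    """Group tokens by decoded slot label into a flat XML document (hypothetical layout)."""
    root = ET.Element("document")
    for token, label in zip(tokens, labels):
        slot = root.find(label)
        if slot is None:
            slot = ET.SubElement(root, label)
            slot.text = token
        else:
            slot.text += " " + token
    return ET.tostring(root, encoding="unicode")
```

With the toy model above, the observation sequence `["WORD", "WORD", "ORG", "NUM"]` decodes to the slots title, title, issuer, date, and `fill_template` then writes each token into the matching XML element. A real system would estimate the probabilities from labeled documents rather than set them by hand.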
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
