
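An XML-Based Method for Converting Unstructured Data (一种基于XML的非结构化数据转换方法)
Cite this article: YANG Jing (杨晶), ZHOU Shuang-e (周双娥). An XML-Based Method for Converting Unstructured Data [J]. Computer Science (计算机科学), 2017, 44(Z11): 414-417.
Authors: YANG Jing (杨晶), ZHOU Shuang-e (周双娥)
Affiliation: College of Computer and Information Engineering, Hubei University, Wuhan 430062 (both authors)
Funding: Supported by the Key Project of the Hubei Provincial Statistical Research Program (HB131-32)
Abstract: As a semi-structured language, XML is widely used to convert unstructured information into structured information, owing to advantages such as the ability to predefine tags. Using POI, the complex unstructured data found on the network is converted into semi-structured XML data, and that semi-structured data is then converted into structured data so that users can easily query the information they need. Experiments compared the parsing efficiency of SAX and DOM. The results show that, for XML files of the same size, SAX is more efficient than DOM, and that the gap widens as the XML file grows.

Keywords: Big data; Unstructured data; Extensible Markup Language (XML); Document parsing technology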
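
The second step described in the abstract, turning semi-structured XML into structured data, can be illustrated with a minimal SAX sketch. This is not the authors' code, which is not shown on this page. It assumes a hypothetical layout in which each `<record>` element holds flat child elements, and it collects every record as one row (a map from field name to text). The class name `SaxRecords`, the handler `RowHandler`, and the element names are all illustrative assumptions. Only JDK classes are used.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: element names <records>/<record> are assumptions.
final class SaxRecords {

    /** Collects each <record> element as one row: child element name -> text. */
    static final class RowHandler extends DefaultHandler {
        final List<Map<String, String>> rows = new ArrayList<>();
        private Map<String, String> current;              // row being built, or null
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (qName.equals("record")) {
                current = new LinkedHashMap<>();          // a new row starts here
            }
            text.setLength(0);                            // reset the text buffer for this element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);               // may be called several times per element
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("record")) {
                rows.add(current);                        // row complete
                current = null;
            } else if (current != null) {
                current.put(qName, text.toString().trim());
            }
        }
    }

    /** Parses the XML string in a single streaming pass and returns the rows. */
    static List<Map<String, String>> parse(String xml) throws Exception {
        RowHandler handler = new RowHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.rows;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<records>"
                   + "<record><name>A</name><value>1</value></record>"
                   + "<record><name>B</name><value>2</value></record>"
                   + "</records>";
        System.out.println(parse(xml));
    }
}
```

Each resulting row could then be written to a relational table or searched directly. Because SAX delivers events as it reads, the handler never holds the whole document in memory, only the rows it chooses to keep.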

Method for Unstructured Data Transformation Based on XML Technology
Citation: YANG Jing and ZHOU Shuang-e. Method for Unstructured Data Transformation Based on XML Technology [J]. Computer Science, 2017, 44(Z11): 414-417.
Authors: YANG Jing and ZHOU Shuang-e
Affiliation: College of Computer and Information Engineering, Hubei University, Wuhan 430062, China (both authors)
Abstract: XML, as a semi-structured language, is widely used to convert unstructured information into structured information because it allows tags to be predefined. In this work, complex unstructured data from the network was converted to semi-structured XML data using POI, and the semi-structured data was then converted to structured data by parsing the XML file with SAX, making it convenient for users to search for information. In addition, the efficiency of parsing XML files with SAX and with DOM was compared, which the authors describe as a first. The comparison shows that SAX parses more efficiently than DOM when both are applied to the same file, and that the gap widens as the XML file grows.
Keywords: Big data; Unstructured data; Extensible Markup Language (XML); Document parsing technology
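
The SAX versus DOM comparison reported in the abstract can be tried with a rough sketch like the one below. It is an assumption-laden illustration, not the authors' experiment: the synthetic document layout, the record counts, and the names `SaxDomTiming`, `buildXml`, `countWithSax`, and `countWithDom` are all hypothetical, and single-run wall-clock timings are affected by JIT warm-up, so the numbers should be read only as an indication. SAX streams events without building a tree, while DOM materializes the entire document in memory before it can be queried.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical micro-benchmark: document shape and sizes are assumptions.
final class SaxDomTiming {

    /** Builds a synthetic document with n <record> elements. */
    static String buildXml(int n) {
        StringBuilder sb = new StringBuilder("<records>");
        for (int i = 0; i < n; i++) {
            sb.append("<record><id>").append(i).append("</id></record>");
        }
        return sb.append("</records>").toString();
    }

    /** Counts <record> elements in one streaming pass (no tree is built). */
    static int countWithSax(byte[] xml) throws Exception {
        final int[] count = {0};
        SAXParserFactory.newInstance().newSAXParser().parse(
            new ByteArrayInputStream(xml),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    if (qName.equals("record")) {
                        count[0]++;
                    }
                }
            });
        return count[0];
    }

    /** Counts <record> elements after loading the whole document as a DOM tree. */
    static int countWithDom(byte[] xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml));
        return doc.getElementsByTagName("record").getLength();
    }

    public static void main(String[] args) throws Exception {
        for (int n : new int[] {1_000, 10_000, 100_000}) {
            byte[] xml = buildXml(n).getBytes(StandardCharsets.UTF_8);
            long t0 = System.nanoTime();
            int sax = countWithSax(xml);
            long t1 = System.nanoTime();
            int dom = countWithDom(xml);
            long t2 = System.nanoTime();
            System.out.printf("records=%d sax=%d (%.1f ms) dom=%d (%.1f ms)%n",
                n, sax, (t1 - t0) / 1e6, dom, (t2 - t1) / 1e6);
        }
    }
}
```

For a more trustworthy measurement, each parser would be run several times per size after a warm-up pass, and memory use would be recorded alongside elapsed time.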