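Schema Discovery of Semi-structured Data Based on the OEM Model
Cite this article: LV Cheng, WEI Chu-yuan, ZHANG Han-tao. Schema Discovery of Semi-structured Data Based on the OEM Model[J]. Computer Engineering and Applications, 2006, 42(34): 162-165, 181.
Authors: LV Cheng  WEI Chu-yuan  ZHANG Han-tao
Affiliation: Department of Computer Science, Beijing Institute of Civil Engineering and Architecture, Beijing 100044
Funding: Science and Technology Development Program of the Beijing Municipal Education Commission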
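Abstract: With the rapid development of Web data and data integration technology, semi-structured data are attracting growing attention. Semi-structured data are self-describing data whose structure is implicit or only loosely defined. Unlike traditional data, the data exist first and the schema comes afterwards, and the schema serves to describe the structural information of the data rather than to impose mandatory constraints on it. Schema discovery for semi-structured data is therefore the first step of knowledge discovery. This paper adopts the concept of hierarchical data, proposes a hierarchical transaction database and a counting principle called the "accumulating transform", and on that basis presents the SHDP-mine algorithm, built on the SHDP-tree structure, for mining the basic schema of semi-structured hierarchical data. Finally, the validity and efficiency of the approach are analyzed and verified both theoretically and experimentally.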

Keywords: semi-structured hierarchical data  OEM model  hierarchical transaction database  SHDP-tree structure
Article ID: 1002-8331(2006)34-0162-04
Received: 2006-08
Revised: 2006-08

Schema Discovery of Semi-structured Data Based on OEM Model
LV Cheng,WEI Chu-yuan,ZHANG Han-tao.Schema Discovery of Semi-structured Data Based on OEM Model[J].Computer Engineering and Applications,2006,42(34):162-165,181.
Authors:LV Cheng  WEI Chu-yuan  ZHANG Han-tao
Abstract: With the rapid development of Web data and data integration technology, semi-structured data have attracted increasing attention. Semi-structured data are self-describing data whose structure is implicit or loosely defined. Unlike conventional data, the data come first and the schema later, and the schema describes the structural information of the data rather than imposing mandatory constraints on it. Schema discovery for semi-structured data has therefore become the first step of knowledge discovery. This paper adopts the concept of hierarchical data and proposes a hierarchical transaction database together with an "accumulating transform" counting principle. On this basis, it presents SHDP-mine, a new algorithm built on the SHDP-tree structure, which mines the basic schema of semi-structured hierarchical data. Finally, the validity and efficiency of the approach are analyzed and verified both theoretically and experimentally.
Keywords:semi-structured hierarchical data  OEM model  hierarchical transaction database  SHDP-tree structure
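
The abstract does not spell out the SHDP-tree structure or the SHDP-mine algorithm, so the following is only a rough illustration of the general idea of deriving a schema from data after the fact. It is not the authors' method. It assumes a simplified OEM-style encoding in which every object is a `(label, value)` pair and `value` is either an atomic value or a list of child objects; the helper names `label_paths` and `frequent_paths`, the `min_support` threshold, and the sample records are all hypothetical. Each record is treated as one transaction, and a label path is kept in the schema when enough records contain it.

```python
from collections import Counter

# Assumed encoding (not from the paper): an OEM-style object is a
# (label, value) pair; value is an atomic value or a list of child objects.

def label_paths(obj, prefix=()):
    """Return the set of root-to-node label paths found in one object."""
    label, value = obj
    path = prefix + (label,)
    paths = {path}
    if isinstance(value, list):  # complex object: recurse into children
        for child in value:
            paths |= label_paths(child, path)
    return paths

def frequent_paths(records, min_support):
    """Count, per label path, how many records contain it; keep frequent ones."""
    counts = Counter()
    for record in records:
        # label_paths returns a set, so each path counts once per record
        counts.update(label_paths(record))
    return {p: c for p, c in counts.items() if c >= min_support}

# Hypothetical sample data with irregular structure.
records = [
    ("person", [("name", "A"), ("email", "a@example.org")]),
    ("person", [("name", "B"), ("address", [("city", "Beijing")])]),
    ("person", [("name", "C"), ("email", "c@example.org"),
                ("address", [("city", "Shanghai")])]),
]

schema = frequent_paths(records, min_support=3)
for path, support in sorted(schema.items()):
    print("/".join(path), support)
```

With `min_support=3`, only the paths shared by all three records survive; lowering the threshold admits optional parts of the structure such as `person/email`. A tree-based miner would avoid rescanning and re-enumerating paths for every record, which this naive sketch does not attempt.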
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.