1.

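Research on a Heterogeneous Data Source Integration Model Based on an XML Virtual Database Total citations: 4, self-citations: 2, citations by others: 2

周运 牟占生 徐久成 《计算机技术与发展》2008,18(4):84-88

Resolving heterogeneity among enterprise data sources requires a common data source model that presents users with a unified view. XML, with its self-describing nature, flexibility, and strong data exchange capability, overcomes the shortcomings of other data models. Drawing on current data integration techniques, the paper proposes a heterogeneous data source integration model based on an XML virtual database. The model addresses a series of problems in heterogeneous data source integration, and its feasibility is argued from the perspectives of the data model and data exchange. The paper analyzes schema integration, the integrated view over heterogeneous data, and global queries within the model, and describes its successful application in the QHSE information system of 中国石油 (China Petroleum).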

2.

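Research on a Heterogeneous Data Source Integration Model Based on an XML Virtual Database

周运 牟占生 徐久成 《微机发展》2008,18(4):84-87

Resolving heterogeneity among enterprise data sources requires a common data source model that presents users with a unified view. XML, with its self-describing nature, flexibility, and strong data exchange capability, overcomes the shortcomings of other data models. Drawing on current data integration techniques, the paper proposes a heterogeneous data source integration model based on an XML virtual database. The model addresses a series of problems in heterogeneous data source integration, and its feasibility is argued from the perspectives of the data model and data exchange. The paper analyzes schema integration, the integrated view over heterogeneous data, and global queries within the model, and describes its successful application in the QHSE information system of 中国石油 (China Petroleum).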

3.

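Research on Query Processing in a Heterogeneous Patent Data Source Integration System

孙涌 王志 张书奎 凌兴宏 王永山 《计算机应用与软件》2010,27(8)

To query heterogeneous patent data sources effectively and uniformly, the paper proposes an ontology-based integration system for heterogeneous patent data sources. The system introduces ontologies to resolve the semantic heterogeneity that arises in data source integration, offers users a unified query interface through a global data schema, and rewrites each user query posed against the global schema into sub-queries against the individual local data sources. With this system, users can obtain correct query results from heterogeneous patent sources.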

4.

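A Subspace Clustering Algorithm for Categorical Data Based on Multi-Attribute Weights

庞宁 张继福 秦啸 《自动化学报》2018,44(3):517-532

Using multi-attribute frequency weights and a multi-objective cluster-set quality criterion, the paper proposes a subspace clustering algorithm for categorical data. The algorithm uses equivalence classes from rough set theory to define a multi-attribute weighting method, which improves the attributes' ability to discriminate clusters. Building on a multi-objective cluster-set quality function, it adopts a hierarchical agglomerative strategy that iteratively merges sub-clusters and thereby measures clusters at various scales. Interval dispersion removes the parameter problem introduced by threshold-based deletion of noise points, and the degree to which attributes adhere to a cluster determines each cluster's attribute-relevant subspace, improving cluster interpretability. Experiments on synthetic, UCI, and stellar spectra datasets verify the feasibility and effectiveness of the algorithm.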

5.

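A Multi-Schema Matching Method Based on SimHash and Hybrid Similarity

曹卫东 胡炜 王家亮 王静 《计算机应用研究》2020,37(1):198-202

To address the low efficiency, insufficient accuracy, and difficulty of obtaining complete schema information in multi-schema matching during the integration of multi-source heterogeneous civil aviation passenger service data, the paper proposes a multi-schema matching method based on SimHash and hybrid similarity. The method first computes feature-unit weights using PMI and uses the SimHash algorithm to build a signature for each attribute column as a representation of attribute features, reducing feature dimensionality. It then applies the K-means++ algorithm to cluster the attributes and generate candidate match sets. Finally, it builds an attribute mapping graph from the hybrid similarity of attributes, which shows the matching relationships between attributes intuitively and improves multi-schema matching efficiency. Experimental results indicate that the method is feasible and offers a new solution for efficiently resolving schema conflicts in the integration of multi-source heterogeneous civil aviation passenger service data.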
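
As a rough illustration of the SimHash step named in this abstract, the sketch below builds a 64-bit signature from weighted features and compares signatures by Hamming distance. It is a generic SimHash, not the authors' implementation: the feature names and weights are made up (the paper derives weights from PMI), and the use of MD5 as the per-feature hash is an assumption.

```python
import hashlib


def simhash(weighted_features, bits=64):
    """Build a SimHash signature from a {feature: weight} mapping.

    Assumes bits is a multiple of 8 and at most 128 (the MD5 digest size).
    """
    totals = [0.0] * bits
    for feature, weight in weighted_features.items():
        digest = hashlib.md5(feature.encode("utf-8")).digest()
        value = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            # Each feature votes +weight or -weight on every bit position.
            if (value >> i) & 1:
                totals[i] += weight
            else:
                totals[i] -= weight
    signature = 0
    for i, total in enumerate(totals):
        if total > 0:
            signature |= 1 << i
    return signature


def hamming_distance(a, b):
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


# Hypothetical attribute columns with hand-picked weights (not PMI values).
column_a = {"passenger": 2.0, "name": 1.5, "id": 0.5}
column_b = {"passenger": 2.0, "name": 1.5, "no": 0.5}

print(hamming_distance(simhash(column_a), simhash(column_b)))
```

Columns that share their heavily weighted features tend to receive signatures that differ in few bits, which is what makes the signatures usable as compact inputs to a later clustering step.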

6.

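Database Anomaly Detection Combining Density Clustering and Ensemble Learning

李勃 寿增 刘昕禹 高明慧 马力 徐剑 《小型微型计算机系统》2021,(3):666-672

Few methods currently exist for detecting attacks and threats from inside a database system, and existing database anomaly detection schemes suffer from high overhead and low detection accuracy. The paper therefore combines density clustering with ensemble learning and proposes a database anomaly detection method based on both. The OPTICS (Ordering Points To Identify the Clustering Structure) density clustering algorithm clusters the SQL operation logs generated by users; analyzing the attributes of the SQL statements extracts anomalous user behavior and forms prior knowledge. Bagging, Boosting, and Stacking are combined into an ensemble learning model which, starting from the prior knowledge produced by OPTICS clustering, further analyzes user behavior and builds a user behavior feature library. User behavior is then checked against this feature library. The paper details the construction of the scheme, including data preprocessing, training, building the learning model, and anomaly detection. Tests on relevant experimental data show that the scheme detects anomalous database behavior with relatively high efficiency and outperforms comparable schemes in accuracy.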

7.

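A Cluster Ensemble Algorithm Combining K-Means and the Laplacian

徐森 周天 李先锋 曹瑞 《计算机应用与软件》2012,(10):69-70,140

Cluster ensembles can effectively improve the accuracy of traditional clustering algorithms; the key problem is how to obtain a better clustering result from the information supplied by the ensemble members. The paper designs a cluster ensemble algorithm that combines the K-means algorithm with spectral clustering based on the Laplacian matrix, making full use of the attribute information and relational information provided by the ensemble members. To reduce computational complexity, an algebraic transformation avoids the eigendecomposition of large matrices. Experiments on several real datasets show that the proposed algorithm outperforms other cluster ensemble algorithms.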

8.

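A Cluster Ensemble Method Based on Bipartite Graph Spectral Partitioning

徐森 皋军 徐秀芳 花小朋 徐静 安晶 《控制与决策》2018,33(12):2208-2212

The paper introduces a bipartite graph model into the cluster ensemble problem, using it to model the object set and the hyperedge set simultaneously and to fully exploit both the similarity information hidden among objects and the attribute information provided by hyperedges. A regularized spectral clustering algorithm solves the bipartite graph partitioning problem, and the K-means++ algorithm is run in the low-dimensional embedding space to partition the object set and obtain the final clustering. Experiments on several benchmark datasets show that the proposed method achieves superior results and also runs efficiently.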

9.

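Application of Frequent Itemsets to Deep Web Data Source Clustering

张蓬飞 朱群雄 《计算机工程与应用》2012,48(14):152-157

Behind Deep Web pages lie massive numbers of data sources that can be accessed through structured query interfaces. Organizing these data sources by domain is a key step in Deep Web data integration. Existing partitioning methods rely mainly on query interface schemas and on the results returned by submitted queries; they suffer from the difficulty of fully extracting query interface features and from the inefficiency of submitting database queries. The paper proposes a clustering method based on frequent itemsets that incorporates web page text: it clusters data sources by domain using the title, keywords, and hint text of the page hosting each data source's query interface, which resolves the traditional methods' dependence on query interface features and the high dimensionality of text models. Experimental results show that the method is feasible and efficient.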

10.

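Research and Application of Ontology-Based Patent Data Source Integration

王志 孙涌 张书奎 王永山 《计算机技术与发展》2009,19(7)

The paper analyzes the difficulties of integrating heterogeneous patent data sources and, to address the distribution, autonomy, and heterogeneity of different patent data sources, proposes a patent data source integration solution based on hybrid ontologies. The solution uses a local ontology to describe the semantics of each individual patent data source, builds a global ontology through ontology merging to achieve semantic integration across multiple patent data sources, and defines mappings between the global ontology and the local ontologies to resolve the semantic heterogeneity in integrating multiple heterogeneous data sources. With this solution, users can obtain correct query results from the integrated patent data sources, which effectively addresses the "information island" problem.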

11.

A Query Interface Matching Approach Based on Extended Evidence Theory for Deep Web Total citations: 1, self-citations: 0, citations by others: 1

董永权 李庆忠 丁艳辉 彭朝晖 《计算机科学技术学报》2010,25(3):537-547

Matching query interfaces is a crucial step in data integration across multiple Web databases. Different types of information about query interface schemas have been used to match attributes between schemas. Relying on a single aspect of information is not sufficient and the matching results of individual matchers are often inaccurate and uncertain. The evidence theory is the state-of-the-art approach for combining multiple sources of uncertain information. However, traditional evidence theory has the limita...

12.

Semantic integration of heterogeneous information sources Total citations: 15, self-citations: 0, citations by others: 15

Sonia Bergamaschi Silvana Castano Maurizio Vincini Domenico Beneventano 《Data & Knowledge Engineering》2001,36(3):215-249

13.

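Research on Aggregation Models and Algorithms for Autonomous Heterogeneous Data Sources Total citations: 1, self-citations: 0, citations by others: 1

王博 郭波 《计算机研究与发展》2008,45(9)

The main problem in sharing information among autonomous heterogeneous data sources is how to provide uniform access to the information held by autonomous data nodes in a P2P environment. Organizing data source nodes in a layered structure improves query efficiency and reduces computational overhead, but requires nodes to cluster locally according to their mutual similarity. The paper gives a formal description of how data source nodes publish their information, proposes a multiple-aggregation model for autonomous heterogeneous data sources based on schema element matching together with the process for building the cluster organization, uses the TA algorithm to solve the top-K cluster node search problem, and on that basis proposes the TAL algorithm. Experimental results show that TA and TAL solve the node cluster ranking problem efficiently, and that TAL in particular outperforms TA computationally when the range of cluster nodes is large.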
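
The abstract does not define TA. Assuming it refers to Fagin's Threshold Algorithm for top-K search, a minimal sketch looks like the following. The node names, the two similarity lists, the integer scores, and the choice of sum as the aggregate are all illustrative assumptions; the authors' TAL variant is not reproduced here.

```python
import heapq


def threshold_top_k(sorted_lists, k):
    """Threshold Algorithm sketch with sum as the aggregate function.

    sorted_lists: lists of (item, score) pairs, each sorted by score in
    descending order. Scores are assumed non-negative; an item missing
    from a list is treated as scoring 0 there.
    """
    lookup = [dict(lst) for lst in sorted_lists]  # random access by item
    seen = {}
    for depth in range(max(len(lst) for lst in sorted_lists)):
        threshold = 0
        for lst in sorted_lists:
            if depth >= len(lst):
                continue
            item, score = lst[depth]
            threshold += score  # best total any unseen item could still reach
            if item not in seen:
                seen[item] = sum(d.get(item, 0) for d in lookup)
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        # Stop once the k-th best seen total is at least the threshold.
        if len(top) == k and top[-1][1] >= threshold:
            return top
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])


# Hypothetical per-criterion similarity scores (0-100) for candidate nodes.
name_similarity = [("n1", 90), ("n2", 80), ("n3", 30), ("n4", 10)]
type_similarity = [("n2", 90), ("n3", 70), ("n1", 60), ("n4", 20)]

print(threshold_top_k([name_similarity, type_similarity], 2))
# → [('n2', 170), ('n1', 150)]
```

In this example the search stops after reading two rows of each list, because no unseen node can then exceed a total of 150.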

14.

Rewriting of visibly pushdown languages for XML data integration

A. Thomo S. Venkatesh 《Theoretical computer science》2011,412(39):5285-5297

In this work, we focus on XML data integration by studying rewritings of XML target schemas in terms of source schemas. Rewriting is very important in data integration systems where the system is asked to find and assemble XML documents from the data sources and produce documents that satisfy a target schema. As schema representation, we consider Visibly Pushdown Automata (VPAs), which accept Visibly Pushdown Languages (VPLs). The latter have been shown to coincide with the family of (word-encoded) regular tree languages, which are the basis of formalisms for specifying XML schemas. Furthermore, practical semi-formal XML schema specifications (defined by simple pattern conditions on XML) compile into VPAs that are exponentially more concise than other representations based on tree automata. Notably, VPLs enjoy a “well-behavedness” that facilitates us in addressing rewriting problems for XML data integration. Based on VPAs, we positively solve these problems, and present detailed complexity analyses.

15.

Double-layered schema integration of heterogeneous XML sources

Hong-Quang Nguyen David Taniar 《Journal of Systems and Software》2011,84(1):63-76

Schema integration aims to create a mediated schema as a unified representation of existing heterogeneous sources sharing a common application domain. These sources have been increasingly written in XML due to its versatility and expressive power. Unfortunately, these sources often use different elements and structures to express the same concepts and relations, thus causing substantial semantic and structural conflicts. Such a challenge impedes the creation of high-quality mediated schemas and has not been adequately addressed by existing integration methods. In this paper, we propose a novel method, named XINTOR, for automating the integration of heterogeneous schemas. Given a set of XML sources and a set of correspondences between the source schemas, our method aims to create a complete and minimal mediated schema: it completely captures all of the concepts and relations in the sources without duplication, provided that the concepts do not overlap. Our contributions are fourfold. First, we resolve structural conflicts inherent in the source schemas. Second, we introduce a new statistics-based measure, called path cohesion, for selecting concepts and relations to be a part of the mediated schema. The path cohesion is statistically computed based on multiple path quality dimensions such as average path length and path frequency. Third, we resolve semantic conflicts by augmenting the semantics of similar concepts with context-dependent information. Finally, we propose a novel double-layered mediated schema to retain a wider range of concepts and relations than existing mediated schemas, which are at best either complete or minimal, but not both. Performed on both real and synthetic datasets, our experimental results show that XINTOR outperforms existing methods with respect to (i) the mediated-schema quality using precision, recall, F-measure, and schema minimality; and (ii) the execution performance based on execution time and scale-up performance.

16.

XML schema mappings for heterogeneous database access

《Information and Software Technology》2002,44(4):251-257

The unprecedented increase in the availability of information, due to the success of the World Wide Web, has generated an urgent need for new and robust methods that simplify the querying and integration of data. In this research, we investigate a practical framework for data access to heterogeneous data sources. The framework utilizes the extensible markup language (XML) Schema as the canonical data model for the querying and integration of data from heterogeneous data sources. We present algorithms for mapping relational and network schemas into XML schemas using the relational mapping algorithm. We also present library system of databases (libSyD), a prototype of a system for heterogeneous database access.

17.

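Clustering Structured Deep Web Sources by Query Schema

陈娟 王贤 黄青松 《现代计算机》2006,(9):19-21,62

In recent years the Web has been rapidly deepened by online databases. On the deep Web, a large number of sources provide rich data schemas that specify their target domains and query capabilities, so integrating data at this scale is a current challenge. Cluster analysis is an important method in data mining. This paper discusses clustering structured Web sources through their query interfaces using agglomerative hierarchical clustering, with a slight refinement that clusters first and then classifies. Experiments show that, for clustering Web query schemas, agglomerative hierarchical clustering organizes the sources correctly.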
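
To make the agglomerative hierarchical clustering mentioned in this abstract concrete, here is a small sketch that merges query interface schemas bottom-up. The Jaccard similarity over attribute names, the average linkage, the stopping threshold, and the sample schemas are all assumptions chosen for illustration; the paper's own similarity measure and its cluster-then-classify refinement are not specified in the abstract.

```python
def jaccard(a, b):
    """Jaccard similarity of two non-empty attribute sets."""
    return len(a & b) / len(a | b)


def agglomerative(schemas, min_similarity):
    """Average-linkage agglomerative clustering over {name: attribute_set}.

    Repeatedly merges the two most similar clusters until the best
    available similarity falls below min_similarity.
    """
    clusters = [[name] for name in schemas]

    def linkage(c1, c2):
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(jaccard(schemas[x], schemas[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        best, best_pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                similarity = linkage(clusters[i], clusters[j])
                if similarity > best:
                    best, best_pair = similarity, (i, j)
        if best < min_similarity:
            break
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical query interfaces described by their attribute names.
interfaces = {
    "books_a": {"title", "author", "isbn"},
    "books_b": {"title", "author", "publisher"},
    "flights_a": {"from", "to", "date"},
    "flights_b": {"from", "to", "date", "class"},
}

print(agglomerative(interfaces, min_similarity=0.3))
# → [['books_a', 'books_b'], ['flights_a', 'flights_b']]
```

The threshold plays the role of a cut through the dendrogram: lowering it merges more aggressively, raising it leaves more sources in singleton clusters.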

18.

Mashroom+: An Interactive Data Mashup Approach with Uncertainty Handling

Chen Liu Jianwu Wang Yanbo Han 《Journal of Grid Computing》2014,12(2):221-244

To integrate data on the Internet, we often have to deal with uncertainties when matching data schemas from different sources. The paper proposes an approach called Mashroom+ to support human-machine interactive data mashup, which can better handle uncertainties during the semantic matching process. To improve the correctness of matching results, an interactive matching algorithm is proposed to synthesize the matching results from multiple automatic matchers based on user feedback. Meanwhile, to avoid placing too much burden on users, we utilize the entropy in information theory to measure and quantify the ambiguities of different matchers and calculate the best times for users to participate. An interactive integration environment is developed based on our approach with operator recommendation capability to support on-demand data integration. Experiments show that the Mashroom+ approach can achieve a good balance between high correctness of matching results and low user burden with real data.
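
The idea of using entropy to quantify matching ambiguity can be sketched as follows. This is only an illustration of Shannon entropy applied to candidate-match scores, not the Mashroom+ algorithm: the score table, the normalization of scores into a distribution, and the rule of asking the user about the highest-entropy attribute are assumptions.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def normalized(scores):
    """Turn raw non-negative match scores into a probability distribution."""
    total = sum(scores.values())
    return [score / total for score in scores.values()]


def most_ambiguous(match_scores):
    """Return the source attribute whose candidate matches are least certain."""
    return max(match_scores, key=lambda attr: entropy(normalized(match_scores[attr])))


# Hypothetical matcher output: source attribute -> {candidate target: score}.
match_scores = {
    "price": {"cost": 9, "fee": 1},
    "title": {"name": 5, "label": 5},
}

print(most_ambiguous(match_scores))
# → title
```

An attribute with one dominant candidate has low entropy and can be matched automatically, while an evenly split one is a natural point at which to request user feedback.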

19.

Multi-data source fusion

Gilles Nachouki Mohamed Quafafou 《Information Fusion》2008,9(4):523-537

This paper describes a new approach to heterogeneous data source fusion. Data sources are either static or active: static data sources can be structured or semi-structured, whereas active sources are services. In order to develop data source fusion systems in dynamic contexts, we need to study all issues raised by the matching paradigms. This challenging problem becomes crucial with the dominating role of the internet. Classical approaches to data integration, based on schema mediation, are not suitable for the World Wide Web (WWW) environment where data is frequently modified or deleted. Therefore, we develop a loosely integrated approach that takes into consideration both conflict management and semantic rules which must be enriched in order to integrate new data sources. Moreover, we introduce an XML-based Multi-data source Fusion Language (MFL) that aims to define and retrieve conflicting data from multiple data sources. The system, which is developed according to this approach, is called MDSManager (Multi-Data Source Manager). The benefit of the proposed framework is shown through a real-world application based on web data source fusion which is dedicated to tracking online market indices. Finally, we give an evaluation of our MFL language. The results show that our language significantly improves on the XQuery language, especially in expressive power and performance.

20.

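Research on Detecting Data Schema Mapping Failures in Deep Web Integration Total citations: 1, self-citations: 1, citations by others: 0

缪嘉嘉 李爱平 贾焰 吴泉源 《计算机研究与发展》2008,45(Z1):222-227

Query interface integration is the key to Deep Web data integration. In a dynamic environment, changes to Web data sources can invalidate data schema mappings and make integrated query interfaces harder to maintain, so detecting schema mapping failures is an active topic in Deep Web data integration research. To address the limitations of current detection methods, and building on research into fuzzy aggregation operators, the paper proposes a result fusion algorithm suited to schema mapping failure detection. Comparative experiments analyze the performance and efficiency of the detection method, and the results show that the proposed method is effective at detecting mapping failures.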