1.

An evidential approach to query interface matching on the deep Web

Jun Hong Zhongtian He David A. Bell 《Information Systems》2010

Matching query interfaces is a crucial step in data integration across multiple Web databases. The problem is closely related to schema matching, which typically exploits different features of schemas. Relying on a single feature of schemas is not sufficient. We propose an evidential approach to combining multiple matchers using Dempster–Shafer theory of evidence. First, our approach views the match results of an individual matcher as a source of evidence that provides a level of confidence in the validity of each candidate attribute correspondence. Second, it combines multiple sources of evidence to get a combined mass function that represents the overall level of confidence, taking into account the match results of different matchers. Our combination mechanism does not require weighting parameters, so none need to be set or tuned. Third, it selects the top k attribute correspondences of each source attribute from the target schema based on the combined mass function. Finally, it uses some heuristics to resolve any conflicts between the attribute correspondences of different source attributes. Our experimental results show that our approach is highly accurate and effective.
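
The combination step in this abstract can be illustrated with Dempster's rule of combination. The following is a minimal sketch, not the authors' implementation: the function names `combine` and `top_k`, the two matchers, and all mass values are assumptions made up for illustration.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    A mass function maps a frozenset of hypotheses to a mass in [0, 1];
    the masses of each function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + mass_a * mass_b
        else:
            # Mass that falls on the empty set measures the conflict.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    # Normalise so that the remaining masses sum to 1 again.
    return {h: m / (1.0 - conflict) for h, m in combined.items()}


def top_k(mass, k):
    """Rank single target attributes by combined mass and keep the best k."""
    singles = [(next(iter(h)), m) for h, m in mass.items() if len(h) == 1]
    return sorted(singles, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical evidence for the source attribute "author".
frame = frozenset({"writer", "title", "publisher"})
name_matcher = {frozenset({"writer"}): 0.6, frame: 0.4}
type_matcher = {frozenset({"writer", "publisher"}): 0.7, frame: 0.3}

combined = combine(name_matcher, type_matcher)
best = top_k(combined, 1)
```

In this made-up example the two matchers do not conflict, so the combined mass concentrates on "writer" without any weighting parameter being supplied.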

2.

Clustering Structured Deep Web Sources by Query Schema

陈娟 王贤 黄青松 《现代计算机》2006,(9):19-21,62

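In recent years, the Web has been rapidly deepened by online databases. On the deep Web, a large number of sources expose rich data schemas that describe their target domains and query capabilities, so integrating data at this scale is a current challenge. Cluster analysis is an important method in data mining. This paper discusses clustering structured Web sources through their query interfaces using agglomerative hierarchical clustering, with a slight refinement that clusters first and then classifies. Experiments show that, for clustering Web query schemas, agglomerative hierarchical clustering organizes the sources correctly.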
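
Agglomerative hierarchical clustering of query schemas can be sketched as follows. This is an illustration under stated assumptions rather than the paper's method: each schema is reduced to a non-empty set of attribute names, similarity is Jaccard overlap with average linkage, and the stopping threshold and sample schemas are invented.

```python
def jaccard(a, b):
    """Jaccard similarity of two non-empty attribute sets."""
    return len(a & b) / len(a | b)


def cluster_similarity(c1, c2, schemas):
    """Average-link similarity between two clusters of schema indices."""
    pairs = [(i, j) for i in c1 for j in c2]
    return sum(jaccard(schemas[i], schemas[j]) for i, j in pairs) / len(pairs)


def agglomerative(schemas, threshold):
    """Merge the two most similar clusters until none reaches the threshold."""
    clusters = [[i] for i in range(len(schemas))]
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = cluster_similarity(clusters[x], clusters[y], schemas)
                if best is None or s > best[0]:
                    best = (s, x, y)
        if best[0] < threshold:
            break
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical query schemas from two domains (books and cars).
schemas = [
    {"title", "author", "isbn"},
    {"title", "author", "publisher"},
    {"make", "model", "price"},
    {"make", "model", "year"},
]
clusters = agglomerative(schemas, threshold=0.3)
```

With these sample schemas the two book interfaces end up in one cluster and the two car interfaces in another.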

3.

Clustering Structured Deep Web Sources Based on the Model-Differentiation Method

陈娟 王贤 黄青松 《微机发展》2007,17(11):107-109

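In recent years, the Web has been rapidly deepened by online databases. On the deep Web, a large number of sources expose rich data schemas. These schemas describe their target domains and query capabilities. Integrating data at this scale is therefore a current challenge. Cluster analysis is an important method in data mining; to discover clusters governed by such statistical distributions, a new objective function, model-differentiation, is proposed. Experiments show that, for clustering Web query schemas, agglomerative hierarchical clustering organizes the sources correctly, and that the model-differentiation function outperforms existing agglomerative hierarchical clustering.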

4.

Web Structure Mining Based on Rough Sets

周勇 刘锋 《微机发展》2008,18(3):151-153

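A Web site is an information system composed of many Web pages, and with the rapid growth of the Internet, Web mining has attracted increasing research attention. Finding authoritative pages relevant to a user's query topic is an important research direction in Web structure mining. Rough set theory, an effective mathematical tool for handling vague and uncertain information, has been widely applied in data mining because it requires no prior knowledge. This paper outlines the concepts of Web structure mining, defines a data model for Web structure mining based on rough set theory, presents an implementation workflow for rough-set-based Web structure mining, and analyzes the performance of the method.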

5.

Complex Schema Matching for the Deep Web Based on Boolean Matrices

龚桂芬 伏玉璨 程远虎 《计算机工程》2011,37(12):47-50

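The concept of a Boolean matrix is introduced into positive and negative association rules, and a complex schema matching algorithm for the Deep Web is proposed on the basis of the dual correlation mining algorithm. Attribute items in query interface schemas are converted into a Boolean matrix; grouping attributes are mined by applying positive association rule operations to the matrix, and synonym attributes are mined by applying negative association rule operations. Experimental results show that the algorithm executes efficiently.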
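
The Boolean-matrix encoding can be illustrated with a deliberately simplified sketch. It is not the algorithm from the paper: it only checks pairwise co-occurrence, treating frequently co-occurring attributes as grouping candidates and attributes that are each frequent but never co-occur as synonym candidates. The function names, both thresholds, and the sample matrix are assumptions.

```python
from itertools import combinations


def support(matrix, cols):
    """Fraction of interfaces (rows) in which every listed column is 1."""
    hits = sum(1 for row in matrix if all(row[c] for c in cols))
    return hits / len(matrix)


def mine_pairs(matrix, names, min_group_sup, min_item_sup):
    """Return (grouping candidates, synonym candidates) as attribute pairs."""
    groups, synonyms = [], []
    for i, j in combinations(range(len(names)), 2):
        together = support(matrix, [i, j])
        if together >= min_group_sup:
            # Positive correlation: the attributes tend to appear together.
            groups.append((names[i], names[j]))
        elif (together == 0
              and support(matrix, [i]) >= min_item_sup
              and support(matrix, [j]) >= min_item_sup):
            # Negative correlation: both are common but never co-occur.
            synonyms.append((names[i], names[j]))
    return groups, synonyms


# Rows are hypothetical query interfaces, columns are attributes.
names = ["author", "writer", "title", "isbn"]
matrix = [
    [1, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
]
groups, synonyms = mine_pairs(matrix, names, min_group_sup=0.75, min_item_sup=0.5)
```

On this toy matrix, "author" and "writer" never share an interface and come out as a synonym candidate, while "title" and "isbn" co-occur often enough to be a grouping candidate.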

6.

A Deep Web Query Interface Matching Approach Based on Evidence Theory and Task Assignment

董永权 李庆忠 丁艳辉 张永新 《模式识别与人工智能》2011,24(2):262-271

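To address the limitations of existing query interface matching approaches, namely that matcher weights are hard to set and match decisions lack effective handling, a Deep Web query interface matching approach based on evidence theory and task assignment is proposed. The approach uses an improved Dempster-Shafer (D-S) evidence theory to fuse the results of multiple matchers automatically, which avoids setting matcher weights by hand and reduces manual intervention. By extending the task assignment problem, it turns the one-to-one match decision for query interfaces into an extended task assignment problem and selects a suitable match for each attribute of the source query interface; on this basis, it applies tree-structure heuristic rules to make one-to-many match decisions. Experimental results show that the ETTA-IM approach achieves high precision and recall.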
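
Casting one-to-one match decisions as an assignment problem can be sketched as below. This is a brute-force illustration of the plain assignment problem, not the extended formulation of ETTA-IM; the function name `best_assignment` and the similarity values are assumptions, and the search is only practical for a handful of attributes.

```python
from itertools import permutations


def best_assignment(sim):
    """Pick one distinct target per source so that total similarity is maximal.

    sim[i][j] is the similarity of source attribute i and target attribute j;
    the number of sources must not exceed the number of targets.
    """
    n_src, n_tgt = len(sim), len(sim[0])
    best_perm, best_score = None, float("-inf")
    for perm in permutations(range(n_tgt), n_src):
        score = sum(sim[i][perm[i]] for i in range(n_src))
        if score > best_score:
            best_perm, best_score = perm, score
    return best_perm, best_score


# Hypothetical combined similarities for three source and three target attributes.
sim = [
    [0.90, 0.80, 0.10],
    [0.85, 0.20, 0.10],
    [0.10, 0.30, 0.70],
]
assignment, total = best_assignment(sim)
```

Here the globally best assignment pairs source 0 with target 1 and source 1 with target 0, even though the single highest score is at (0, 0); picking locally best pairs first would give a lower total.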

7.

Application of Frequent Itemsets to Deep Web Data Source Clustering

张蓬飞 朱群雄 《计算机工程与应用》2012,48(14):152-157

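Behind Deep Web pages lie massive numbers of data sources that can be accessed through structured query interfaces. Organizing these data sources by domain is a key step in Deep Web data integration. Existing methods rely mainly on query interface schemas and on the results returned by submitted queries; they suffer from problems such as the difficulty of fully extracting query interface features and the low efficiency of submitting database queries. This paper proposes a clustering method based on frequent itemsets that incorporates Web page text: data sources are clustered by domain according to the title, keywords, and prompt text of the page that hosts the query interface, which effectively addresses the dependence of traditional methods on query interface features and the high dimensionality of text models. Experimental results show that the method is feasible and efficient.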
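
Frequent itemsets over page terms can be computed, for small inputs, by direct enumeration. The sketch below is an assumption-laden illustration, not the paper's procedure: it counts every term combination up to a fixed size without Apriori-style pruning, and the function name, the support threshold, and the sample pages are invented.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(pages, min_count, max_size=2):
    """Count term combinations per page and keep those seen often enough."""
    counts = Counter()
    for terms in pages:
        items = sorted(set(terms))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_count}


# Hypothetical terms taken from page titles, keywords and prompt text.
pages = [
    ["book", "author", "title"],
    ["book", "title", "isbn"],
    ["car", "make", "model"],
    ["car", "model", "price"],
]
frequent = frequent_itemsets(pages, min_count=2)
```

The surviving itemsets, such as ("book", "title") and ("car", "model"), could then serve as low-dimensional descriptors for grouping sources by domain.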

8.

An Effective Greedy Schema Matching Algorithm (total citations: 2; self-citations: 0; citations by others: 2)

张治 施鹏飞 《计算机研究与发展》2007,44(11):1903-1911

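Schema matching aims to obtain the semantic matches and mappings between the individual objects contained in two schemas; its result indicates that specific semantic relationships hold between individual objects of the source schema and those of the target schema. It plays a key role in database applications such as data integration, e-commerce, data warehousing, and XML message exchange, and in particular has become a fundamental problem in metadata management. However, schema matching depends heavily on manual work and is time-consuming and laborious. The schema matching problem can be reduced to a combinatorial optimization problem: multi-labeled graph matching. First, schemas are represented as multi-labeled graphs, which turns schema matching into a multi-labeled graph matching problem. Second, a similarity measure for multi-labeled graphs is proposed, and from it an objective function for schema matching based on multi-labeled graph similarity. Finally, a greedy matching algorithm is designed and implemented on top of this objective function; its most notable feature is that it combines the various kinds of available label information to obtain the optimal matching result flexibly and accurately.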
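
The greedy selection idea can be shown on a plain similarity matrix. This sketch swaps the paper's multi-labeled graph similarity for precomputed pairwise scores, so it illustrates only the greedy step; the function name `greedy_match`, the threshold, and the scores are assumptions.

```python
def greedy_match(sim, threshold=0.0):
    """Repeatedly accept the highest-scoring pair whose two ends are still free."""
    pairs = sorted(
        ((score, i, j) for i, row in enumerate(sim) for j, score in enumerate(row)),
        reverse=True,
    )
    used_src, used_tgt, matches = set(), set(), []
    for score, i, j in pairs:
        if score < threshold:
            break  # every remaining pair scores lower still
        if i not in used_src and j not in used_tgt:
            matches.append((i, j, score))
            used_src.add(i)
            used_tgt.add(j)
    return matches


# Hypothetical scores: sources ["name", "addr"], targets ["full_name", "address", "phone"].
sim = [
    [0.9, 0.2, 0.1],
    [0.3, 0.8, 0.2],
]
matches = greedy_match(sim, threshold=0.5)
```

A greedy pass like this is fast but does not guarantee the best total score in general, which is why the quality of the underlying objective function matters.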

9.

Query optimization in multidatabase systems considering schema conflicts

Chiang Lee Chia-Jung Chen 《Knowledge and Data Engineering, IEEE Transactions on》1997,9(6):941-955

In a multidatabase system, the participating databases are autonomous. The schemas of these databases may differ in various ways even though they represent the same information. A global query issued against the global database needs to be translated to a proper form before it can be executed in a local database. Since data requested by a query (or a part of a query) is sometimes available at multiple sites, the site (database) that processes the query with the least cost is the desired query processing site. The authors study the effect of differences in schemas on the cost of query processing in a multidatabase environment. They first classify schema conflicts into different types. For each type of conflict, they show how much more or less complex a translated query can become in comparison with the global query originally issued by the user. Based on this observation, they propose an analytical method that considers the conflicts between local databases and finds the database(s) that render the least execution cost in processing a global query. This research introduces a new level of query optimization (termed schema-level optimization) in multidatabase environments. The results provide a new dimension of enhancement for the capability of a query optimizer in multidatabase systems.

10.

Learning to Match the Schemas of Data Sources: A Multistrategy Approach (total citations: 5; self-citations: 0; citations by others: 5)

Doan AnHai Domingos Pedro Halevy Alon 《Machine Learning》2003,50(3):279-301

The problem of integrating data from multiple data sources, either on the Internet or within enterprises, has received much attention in the database and AI communities. The focus has been on building data integration systems that provide a uniform query interface to the sources. A key bottleneck in building such systems has been the laborious manual construction of semantic mappings between the query interface and the source schemas. Examples of mappings are that the element location maps to address and that price maps to listed-price. We propose a multistrategy learning approach to automatically find such mappings. The approach applies multiple learner modules, where each module exploits a different type of information either in the schemas of the sources or in their data, then combines the predictions of the modules using a meta-learner. Learner modules employ a variety of techniques, ranging from Naive Bayes and nearest-neighbor classification to entity recognition and information retrieval. We describe the LSD system, which employs this approach to find semantic mappings. To further improve matching accuracy, LSD exploits domain integrity constraints, user feedback, and nested structures in XML data. We test LSD experimentally on several real-world domains. The experiments validate the utility of multistrategy learning for data integration and show that LSD proposes semantic mappings with a high degree of accuracy.
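
Combining the predictions of several learner modules can be sketched as a weighted sum of their confidence scores. This is not LSD's code: the abstract says a meta-learner does the combining, whereas here the per-learner weights are simply given, and the learner names, weights, and scores are assumptions.

```python
def combine_predictions(predictions, weights):
    """Weighted sum of per-learner confidence scores for each candidate label."""
    combined = {}
    for learner, scores in predictions.items():
        for label, score in scores.items():
            combined[label] = combined.get(label, 0.0) + weights[learner] * score
    return combined


# Hypothetical scores for the source element "location".
predictions = {
    "name_learner": {"address": 0.7, "listed-price": 0.3},
    "naive_bayes": {"address": 0.4, "listed-price": 0.6},
}
# Assumed weights; in a stacking setup these would be learned from training data.
weights = {"name_learner": 0.6, "naive_bayes": 0.4}

combined = combine_predictions(predictions, weights)
best_label = max(combined, key=combined.get)
```

With these invented numbers the two learners disagree, and the weighted combination settles on "address" as the mapping for "location".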