
Non-Cooperative Deep Web Data Source Selection Based on Subject and Probability Model

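Cite this article:	DENG Song, WAN Chang-Xuan. Non-Cooperative Deep Web Data Source Selection Based on Subject and Probability Model[J]. Journal of Software, 2017, 28(12): 3241-3256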

Authors:	DENG Song, WAN Chang-Xuan

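Affiliations:	School of Software and Communication Engineering, Jiangxi University of Finance and Economics, Nanchang, Jiangxi 330013; Jiangxi Key Laboratory of Data and Knowledge Engineering (Jiangxi University of Finance and Economics), Nanchang, Jiangxi 330013 and School of Information and Technology, Jiangxi University of Finance and Economics, Nanchang, Jiangxi 330013; Jiangxi Key Laboratory of Data and Knowledge Engineering (Jiangxi University of Finance and Economics), Nanchang, Jiangxi 330013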

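Funding:	National Natural Science Foundation of China (61462037, 61562032, 61173146, 61363039, 61363010); Natural Science Foundation of Jiangxi Province (20152ACB20003); Science and Technology Landing Project of Higher Education of Jiangxi Province (KJLD12022, KJLD14035)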

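Abstract:	In deep Web data integration, users want to obtain high-quality results by querying only a small number of data sources, which makes data source selection a core technology. To meet the need for integrated retrieval based on relevance and diversity, this paper proposes a deep Web data source selection method suited to summaries built from small-scale document samples. During selection, the method first measures the relevance of each data source to the user query and then considers the diversity of the data offered by the candidate sources. To improve the accuracy of relevance judgments, a data source summary based on hierarchical subjects is constructed, and a deviation probability model of subject content relevance is introduced into it; a method for building the deviation probability model from manual feedback and a probability-analysis-based measure of data source relevance are also given. To increase the diversity of the selection results, diversity-link directed edges are built into the hierarchical subject summary, and a method for evaluating data source diversity is given. Finally, data source selection based on relevance and diversity is cast as a combinatorial optimization problem, and a selection strategy based on an optimization function is proposed. Experimental results show that the method achieves high selection accuracy when data sources are selected from a small number of sampled documents.
Keywords:	deep Web; data source selection; subject; probability model; TextRank
Received:	2016-10-12
Revised:	2017-03-21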
Non-Cooperative Deep Web Data Source Selection Based on Subject and Probability Model

DENG Song and WAN Chang-Xuan. Non-Cooperative Deep Web Data Source Selection Based on Subject and Probability Model[J]. Journal of Software, 2017, 28(12): 3241-3256

Authors:	DENG Song and WAN Chang-Xuan

Affiliation:	School of Software & Communication Engineering, Jiangxi University of Finance and Economics, Nanchang 330013, China; Jiangxi Key Laboratory of Data and Knowledge Engineering (Jiangxi University of Finance and Economics), Nanchang 330013, China and School of Information and Technology, Jiangxi University of Finance and Economics, Nanchang 330013, China; Jiangxi Key Laboratory of Data and Knowledge Engineering (Jiangxi University of Finance and Economics), Nanchang 330013, China

Abstract:	In deep Web data integration systems, users want high-quality query results from only a few data sources, so data source selection is one of the core technologies of such systems. This paper proposes a relevance- and diversity-based method for selecting deep Web data sources that is suited to summaries built from small-scale document samples. First, to judge how relevant each data source is to the query, a hierarchical subject summary of the data sources is constructed, and a deviation probability model of subject content relevance is introduced into it. A method is then given for building the deviation probability model from manual feedback, together with a probability-based measure of data source relevance. To account for the diversity of the data sources, diversity-oriented directed edges are built into the hierarchical subject summary, and an evaluation metric is proposed to measure data source diversity. Finally, data source selection based on relevance and diversity is cast as a combinatorial optimization problem, and a selection strategy based on an optimization function is proposed. Experimental results show that the proposed method achieves high selection accuracy when data sources are selected from a small number of sampled documents.

Keywords:	deep Web; data source selection; subject; probability model; TextRank
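
The abstract frames selection as a trade-off between relevance and diversity that is solved as a combinatorial optimization problem, but it does not state the objective function. As a rough illustration only, the sketch below uses a generic greedy heuristic that scores each candidate by a weighted sum of its relevance and the share of its topics not yet covered. This is a swapped-in stand-in, not the authors' method: the function name `select_sources`, the `weight` parameter, the relevance scores, and the flat topic sets are all hypothetical, and the sketch does not model the hierarchical subject summary, the deviation probability model, or the diversity-link directed edges.

```python
from typing import Dict, List, Set


def select_sources(
    relevance: Dict[str, float],
    topics: Dict[str, Set[str]],
    k: int,
    weight: float = 0.5,
) -> List[str]:
    """Greedily pick up to k sources, trading relevance against topic novelty.

    All inputs are hypothetical stand-ins: `relevance` maps a source name to a
    query-relevance score in [0, 1], and `topics` maps it to a flat topic set.
    """
    selected: List[str] = []
    covered: Set[str] = set()
    candidates = set(relevance)

    while candidates and len(selected) < k:

        def gain(source: str) -> float:
            source_topics = topics.get(source, set())
            # Novelty: fraction of this source's topics not covered so far.
            if source_topics:
                novelty = len(source_topics - covered) / len(source_topics)
            else:
                novelty = 0.0
            return weight * relevance[source] + (1.0 - weight) * novelty

        # Sorting first makes ties resolve deterministically by source name.
        best = max(sorted(candidates), key=gain)
        selected.append(best)
        covered |= topics.get(best, set())
        candidates.remove(best)

    return selected


# Made-up example data: A and B overlap completely, C covers a new topic.
relevance = {"A": 0.9, "B": 0.85, "C": 0.4}
topics = {"A": {"db", "ir"}, "B": {"db", "ir"}, "C": {"ml"}}

print(select_sources(relevance, topics, k=2))              # ['A', 'C']
print(select_sources(relevance, topics, k=2, weight=1.0))  # ['A', 'B']
```

With equal weighting the second pick is the less relevant but non-overlapping source C; with `weight=1.0` the heuristic reduces to ranking by relevance alone and returns the two overlapping sources.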
