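The Top-K Closest Pairs of Data Search Scheme Based on Serial Computation (一种基于序列计算的最近似K对数据搜索方案)

Cite this article:	Liu Yi. The Top-K Closest Pairs of Data Search Scheme Based on Serial Computation [J]. Microcomputer Applications (微型电脑应用), 2014(8): 37-41.

Author:	Liu Yi (刘义)

Affiliation:	Dalian Vocational & Technical College (大连职业技术学院), Dalian 116035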

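Abstract (translated from the Chinese):	Many applications need to find the k most similar pairs of records in a given database. However, because the volume of data that applications must handle keeps growing, computing these top-k most similar pairs is very difficult. This paper proposes a top-k most similar pairs search scheme based on serial computation. First, all data pairs are split into multiple groups. Then an all-pair grouping algorithm and an essential-pair grouping algorithm are proposed. The top-k closest pairs in each group are computed separately, and the k pairs with the highest similarity are selected from the per-group results, which correctly determines the overall top-k closest pairs. Finally, experiments are run on synthetic data, and the performance evaluation confirms the effectiveness and scalability of the proposed algorithms.
Keywords (translated from the Chinese):	database; similarity; serial computation; data search; grouping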
The Top-K Closest Pairs of Data Search Scheme Based on Serial Computation

Liu Yi. The Top-K Closest Pairs of Data Search Scheme Based on Serial Computation [J]. Microcomputer Applications, 2014(8): 37-41.

Authors:	Liu Yi

Affiliation:	Liu Yi (Dalian Vocational & Technical College, Dalian 116035, China)

Abstract:	Many applications require finding the top-k most similar pairs of records in a given database. However, computing such top-k similarity joins is challenging, because applications increasingly have to deal with vast amounts of data. This paper proposes a top-k closest pairs data search scheme based on serial computation. The scheme first conceptually splits all pairs of points into partitions, and two partitioning methods are proposed: all pair partitioning and essential pair partitioning. The top-k closest pairs are then found correctly by computing the top-k closest pairs in each partition separately and selecting the top-k closest pairs among the per-partition results. Finally, experiments are performed on synthetic datasets, and the performance study confirms the effectiveness and scalability of the proposed algorithms.
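The partition-then-merge idea in the abstract can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the abstract does not give the details of all pair partitioning or essential pair partitioning, so the sketch assumes Euclidean distance, points given as coordinate tuples, and a simple split of the point indices into contiguous blocks, with one group per unordered pair of blocks. The names `block_pairs`, `local_top_k`, `top_k_closest_pairs`, and the parameter `n_blocks` are hypothetical.

```python
import heapq
from itertools import combinations, product
from math import dist  # Euclidean distance, Python 3.8+


def block_pairs(n_points, n_blocks):
    """Yield groups of point pairs as (block_a, block_b) index ranges.

    Assumption for illustration: indices are cut into contiguous blocks,
    and every unordered pair of blocks forms one group, so each unordered
    pair of points belongs to exactly one group.
    """
    size = max(1, -(-n_points // n_blocks))  # ceiling division, at least 1
    blocks = [range(s, min(s + size, n_points)) for s in range(0, n_points, size)]
    for a in range(len(blocks)):
        for b in range(a, len(blocks)):
            yield blocks[a], blocks[b]


def local_top_k(points, block_a, block_b, k):
    """Return the k closest pairs inside one group as (distance, i, j)."""
    if block_a == block_b:
        candidates = combinations(block_a, 2)  # pairs within one block
    else:
        candidates = product(block_a, block_b)  # pairs across two blocks
    scored = ((dist(points[i], points[j]), i, j) for i, j in candidates)
    return heapq.nsmallest(k, scored)


def top_k_closest_pairs(points, k, n_blocks=3):
    """Compute each group's local top-k one group at a time, then merge."""
    partial = []
    for block_a, block_b in block_pairs(len(points), n_blocks):
        partial.extend(local_top_k(points, block_a, block_b, k))
    # Any global top-k pair is also in the local top-k of its own group,
    # so selecting the k smallest among the partial results is sufficient.
    return heapq.nsmallest(k, partial)
```

Calling `top_k_closest_pairs(points, k)` returns up to `k` tuples `(distance, i, j)` with `i < j`, ordered by increasing distance. Because each group is processed independently, only one group's candidates plus at most `k` results per finished group need to be held at a time, which is the property the abstract relies on.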

Keywords:	Database; Similarity; Serial Computation; Data Search; Partitioning
This article is indexed in VIP (维普) and other databases.