The Top-K Closest Pairs of Data Search Scheme Based on Serial Computation
Original title:一种基于序列计算的最近似K对数据搜索方案
Cite as:Liu Yi. The Top-K Closest Pairs of Data Search Scheme Based on Serial Computation [J]. Microcomputer Applications (微型电脑应用), 2014(8): 37-41.
Author:Liu Yi (刘义)
Affiliation:Dalian Vocational & Technical College (大连职业技术学院), Dalian 116035, China
Abstract:Many applications need to find the top-k most similar pairs of records in a given database. Computing such top-k similarity joins is difficult, however, because the volume of data these applications must handle keeps growing. This paper proposes a top-k closest pairs search scheme based on serial computation. The scheme conceptually splits the set of all pairs of points into partitions, using two proposed methods: all pair partitioning and essential pair partitioning. It computes the top-k closest pairs within each partition separately and then selects the k closest pairs among those per-partition results, which yields the correct global top-k closest pairs. Experiments on synthetic datasets confirm the effectiveness and scalability of the proposed algorithms.

Keywords:Database  Similarity  Serial Computation  Data Search  Partitioning
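
The partition-then-merge idea in the abstract can be illustrated with a minimal sketch. This is not the paper's all pair partitioning or essential pair partitioning algorithm, whose details the abstract does not give. It assumes Euclidean distance as the (dis)similarity measure, points given as tuples, and a simple round-robin split into chunks. The function names `top_k_in_partition` and `top_k_closest_pairs` and the parameter `num_chunks` are hypothetical. Every pair of points falls into exactly one partition, so any pair in the global top-k must also be in the top-k of its own partition, which is why merging the per-partition results is correct.

```python
import heapq
import math
from itertools import combinations, product


def top_k_in_partition(left, right, k):
    """Return the k closest (distance, p, q) triples in one partition of pairs.

    If right is None, the partition holds all pairs inside `left`;
    otherwise it holds all pairs with one point from each chunk.
    """
    pairs = combinations(left, 2) if right is None else product(left, right)
    return heapq.nsmallest(k, ((math.dist(p, q), p, q) for p, q in pairs))


def top_k_closest_pairs(points, k, num_chunks=3):
    """Find the k closest pairs by processing partitions one after another."""
    # Assumption: round-robin chunking; the paper may split differently.
    chunks = [points[i::num_chunks] for i in range(num_chunks)]
    candidates = []
    for i in range(num_chunks):
        for j in range(i, num_chunks):
            right = None if i == j else chunks[j]
            # Keep only the local top-k of each partition.
            candidates.extend(top_k_in_partition(chunks[i], right, k))
    # The global top-k is the top-k among all local top-k results.
    return heapq.nsmallest(k, candidates)


if __name__ == "__main__":
    sample = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (10.0, 10.0), (0.0, 0.5)]
    for dist, p, q in top_k_closest_pairs(sample, k=3):
        print(f"{dist:.3f}  {p}  {q}")
```

Because each partition contributes at most k candidates, the merge step handles at most k times the number of partitions, regardless of how many pairs each partition contains.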
This article is indexed in the VIP (维普) database, among others.