A space efficient solution to the frequent string mining problem for many databases期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

A space efficient solution to the frequent string mining problem for many databases

Authors:	Adrian Kügel Enno Ohlebusch

Affiliation:	(1) Faculty of Engineering and Computer Sciences, University of Ulm, 89069 Ulm, Germany

Abstract:	The frequent string mining problem is to find all substrings of a collection of string databases which satisfy database specific minimum and maximum frequency constraints. Our contribution improves the existing linear-time algorithm for this problem in such a way that the peak memory consumption is a constant factor of the size of the largest database of strings. We show how the results for each database can be stored implicitly in space proportional to the size of the database, making it possible to traverse the results in lexicographical order. Furthermore, we present a linear-time algorithm which calculates the intersection of the results of different databases. This algorithm is based on an algorithm to merge two suffix arrays, and our modification allows us to also calculate the LCP table of the resulting suffix array during the merging.

Keywords:	String mining Enhanced suffix array
本文献已被 SpringerLink 等数据库收录！