首页 | 本学科首页   官方微博 | 高级检索  
     


A space efficient solution to the frequent string mining problem for many databases
Authors:Adrian Kügel  Enno Ohlebusch
Affiliation:(1) Faculty of Engineering and Computer Sciences, University of Ulm, 89069 Ulm, Germany
Abstract:The frequent string mining problem is to find all substrings of a collection of string databases which satisfy database specific minimum and maximum frequency constraints. Our contribution improves the existing linear-time algorithm for this problem in such a way that the peak memory consumption is a constant factor of the size of the largest database of strings. We show how the results for each database can be stored implicitly in space proportional to the size of the database, making it possible to traverse the results in lexicographical order. Furthermore, we present a linear-time algorithm which calculates the intersection of the results of different databases. This algorithm is based on an algorithm to merge two suffix arrays, and our modification allows us to also calculate the LCP table of the resulting suffix array during the merging.
Keywords:String mining  Enhanced suffix array
本文献已被 SpringerLink 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号