Affiliation: | Department of Computer Science, Faculty of Mathematics and Computing Science, Radboud University of Nijmegen, Toernooiveld 1, 6525 ED Nijmegen, Netherlands |
Abstract: | Word frequencies in text documents can be reasonably described by the Mandelbrot distribution, which has Zipf's Law as a special case. Furthermore, the growth of vocabulary size as a function of text size (the number of words in the text) is described by Heaps' Law. It has been shown that these two empirical laws are related. In this paper we go a step further and provide a formal derivation of Heaps' Law from the Mandelbrot distribution. We also specify the range of validity within which Heaps' Law can be applied. |
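To make the link between the two laws concrete, the following is a minimal numerical sketch, not the derivation from the paper. It assumes (1) a Mandelbrot distribution p_r = C / (r + b)^a over a finite vocabulary of `num_types` ranks, normalized so the probabilities sum to 1 (Zipf's Law is the case a = 1, b = 0), and (2) a text of n words modeled as n independent draws from that distribution. Under these assumptions the expected vocabulary size is E[V(n)] = sum_r (1 - (1 - p_r)^n), and Heaps' Law V(n) ≈ K n^beta can be checked by measuring the local slope of log E[V(n)] against log n. The function names and parameter values are hypothetical and chosen only for illustration.

```python
import math


def mandelbrot_probs(num_types, a, b):
    """Normalized Mandelbrot probabilities p_r = C / (r + b)**a for ranks r = 1..num_types."""
    weights = [1.0 / (r + b) ** a for r in range(1, num_types + 1)]
    total = sum(weights)  # C = 1 / total normalizes over the finite vocabulary
    return [w / total for w in weights]


def expected_vocabulary(n, probs):
    """Expected number of distinct word types after n independent draws (assumed model).

    Type r is seen at least once with probability 1 - (1 - p_r)**n;
    expm1/log1p keep this accurate when p_r is tiny and n is large.
    """
    return sum(-math.expm1(n * math.log1p(-p)) for p in probs)


def heaps_exponent(probs, n1, n2):
    """Local Heaps exponent: slope of log E[V(n)] against log n between n1 and n2."""
    v1 = expected_vocabulary(n1, probs)
    v2 = expected_vocabulary(n2, probs)
    return math.log(v2 / v1) / math.log(n2 / n1)


if __name__ == "__main__":
    # Illustrative parameters: a = 2, b = 0, vocabulary of 100,000 types.
    probs = mandelbrot_probs(100_000, a=2.0, b=0.0)
    beta = heaps_exponent(probs, 10_000, 1_000_000)
    print(f"local Heaps exponent: {beta:.3f} (1/a = {1 / 2.0:.3f})")
```

With these illustrative parameters the measured exponent comes out close to 1/a = 0.5. If n is increased until it is large relative to `num_types`, E[V(n)] saturates and the local exponent falls, a simple way to see that a power law of this kind holds only over a limited range of text sizes in this toy model.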