Lower Bounds for Estimating Frequency in a Data Stream

Abstract

In the data stream model, an n-dimensional integer vector f, called the frequency vector (that is initially 0) is subjected to a sequence of arbitrary, coordinate-wise increments or decrements. The data stream model allows processing of the sequence of updates in an online manner, using sub-linear space. A total algorithm is one that works correctly for all data streams over the domain {1..n}, a partial algorithm works correctly for only a subset We consider the problem ApproxFreq(p,epsilon) of finding an estimate vector g to f that is epsilon close with respect to the L_p norm of f in the following sense:

Max_{i} |g_i ^-f_i| <= epsilon ||f||_p.

For p in [1,2), we show an Omega(n^{2-2/p}/epsilon^2) space lower bound for any total deterministic algorithm for this problem, for epsilon at least n^{-1/p}. For p at least 2, we show an Omega(n) space lower bound. The problem is known to have O*(n^{1-1/p}/epsilon) space randomized total algorithm, for p in [1,2), and O*(n^{1-2/p}/epsilon^2) space randomized total algorithm, for p at least 2.