# Stochastic Histograms

Histograms are useful, but their storage requirements can be large: if each bin is N bytes, then the storage is (number of bins  ×  N) bytes.

For many applications (imaging, profiling), the exact count is not important.

Here's a way of reducing the storage required for each bin. Instead of holding a simple count, the bin holds an approximation to the logarithm to base b of the count. This logarithm is a much smaller number than the count, so it can be stored in less space.
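
As a rough back-of-envelope sketch of the saving (the million-event count matches the demonstration further down): a count of one million needs about 20 bits stored exactly, whereas the bin only has to hold roughly log_b(count), a far smaller number:

```python
import math

count = 1_000_000

# Exact storage: enough bits to hold the count itself.
print("exact:", math.ceil(math.log2(count + 1)), "bits")

# Logarithmic storage: the bin holds roughly log_b(count).
for base in [1.1, 2, 10]:
    print("base %g: bin value ~%.0f" % (base, math.log(count, base)))
```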

In this scheme, when incrementing a bin with value n, which represents a count of b^n, the probability that the stored value should increment is:

P(increment) = 1 / (b^(n+1) − b^n) = 1 / (b^n (b − 1))

For b = 2, the probability is:

P(increment) = 1 / 2^n
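
This probability is chosen so that each event raises the expected represented count by exactly one: the success probability times the jump in represented count is (1 / (b^n (b − 1))) × (b^(n+1) − b^n) = 1. A quick simulation sketch of this for a single bin (the variable names here are illustrative):

```python
import random

# Increment a single stored value n with probability 1 / (b^n (b - 1)).
# Since each event adds 1 to the expected represented count b^n,
# the mean of b^n over many trials should land near the event count.
random.seed(1)
base, events, trials = 2.0, 10000, 200
total = 0.0
for _ in range(trials):
    n = 0
    for _ in range(events):
        if random.random() < 1.0 / ((base ** n) * (base - 1)):
            n += 1
    total += base ** n
print(total / trials)  # roughly `events`, with considerable variance
```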

A simple demonstration using bases of 1.1, 2 and 10:

```python
import random

class LinearHistogram:

    def __init__(self):
        self.bins = [0 for i in range(10)]

    def increment(self, i):
        self.bins[i] += 1

    def counts(self):
        return self.bins

class StochasticHistogram:

    def __init__(self, base):
        self.bins = [0 for i in range(10)]
        self.base = base

    def increment(self, i):
        # Increment the stored value with probability 1 / (b^n (b - 1)).
        prob = 1.0 / ((self.base ** self.bins[i]) * (self.base - 1))
        if random.random() < prob:
            self.bins[i] += 1

    def counts(self):
        # A stored value of n represents a count of b^n.
        return [int(self.base ** n) for n in self.bins]

data = [random.randrange(10) for i in range(1000000)]

lh = LinearHistogram()
for d in data:
    lh.increment(d)
print('linear      ', lh.counts())

for base in [1.1, 2, 10]:
    sh = StochasticHistogram(base)
    for d in data:
        sh.increment(d)
    print('stochastic %g:' % base, sh.counts())
```

Gives:

```
linear       [100398, 99812, 99592, 100007, 100130, 100025, 99895, 100272, 99970, 99899]
stochastic 1.1: [101979, 76619, 84280, 84280, 84280, 101979, 135735, 84280, 92709, 123395]
stochastic 2: [131072, 65536, 65536, 262144, 131072, 65536, 131072, 131072, 131072, 131072]
stochastic 10: [100000, 100000, 100000, 100000, 10000, 100000, 100000, 10000, 100000, 100000]
```