More
This website only touches on the details of the original paper. Since that paper was released, several improvements have been made, such as HyperLogLog++, which:
- Uses a 64-bit hash function rather than a 32-bit one
- Introduces sparse representation for the registers to save memory (rather than having one huge array)
- Introduces a further set of bias corrections to improve the count at lower cardinalities
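The 64-bit hash and sparse register points above can be sketched roughly as follows. This is a simplified illustration, not the actual HyperLogLog++ encoding: the real sparse format is more compact than a plain dictionary, and the names `hash64`, `index_and_rank`, and `SparseRegisters`, the precision `P = 14`, and the use of SHA-1 as the hash are all assumptions made for this example.

```python
import hashlib

P = 14        # precision (assumed for illustration): 2**P registers
M = 1 << P    # number of registers in the dense representation


def hash64(item):
    """Return a 64-bit hash of a string (first 8 bytes of SHA-1, an illustrative choice)."""
    return int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")


def index_and_rank(h):
    """Split a 64-bit hash into a register index and the rank of the remaining bits."""
    idx = h >> (64 - P)                        # top P bits select the register
    rest = h & ((1 << (64 - P)) - 1)           # remaining 64 - P bits
    rank = (64 - P) - rest.bit_length() + 1    # 1-based position of the leftmost 1-bit
    return idx, rank


class SparseRegisters:
    """Keeps only non-zero registers in a dict instead of allocating all M up front."""

    def __init__(self):
        self.registers = {}

    def add(self, item):
        idx, rank = index_and_rank(hash64(item))
        # Each register remembers the largest rank seen for its index.
        if rank > self.registers.get(idx, 0):
            self.registers[idx] = rank

    def to_dense(self):
        """Expand to the usual array of M registers, e.g. once the dict grows large."""
        dense = [0] * M
        for idx, rank in self.registers.items():
            dense[idx] = rank
        return dense
```

With only a handful of items added, the dictionary holds a handful of entries, whereas the dense array always holds `M` of them. That gap is the memory saving the sparse representation is after at low cardinalities.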
Seen out in the wild
- Redis PF* commands use HyperLogLog
- PrestoDB approx_distinct SQL function uses HyperLogLog
- GitHub topic page for some implementations of HyperLogLog
Command line tool
I wrote a tool called card that you can use to determine the approximate cardinality of an input (stdin or file). It makes use of the HyperLogLog++ library written by Clark Duvall.
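To give a feel for what a tool like this computes, here is a minimal sketch of estimating the number of distinct lines in an input. It is not the source of card or of the library it uses. It implements the plain HyperLogLog estimator with a 64-bit hash and the small-range linear counting fallback, without the extra HyperLogLog++ bias corrections. The function name `estimate_cardinality`, the default precision `p=14`, and the use of SHA-1 are assumptions made for this example.

```python
import hashlib
import math


def estimate_cardinality(lines, p=14):
    """Estimate the number of distinct lines using 2**p HyperLogLog registers."""
    m = 1 << p
    registers = [0] * m
    for line in lines:
        data = line.rstrip("\n").encode("utf-8")
        # 64-bit hash from the first 8 bytes of SHA-1 (an illustrative choice).
        h = int.from_bytes(hashlib.sha1(data).digest()[:8], "big")
        idx = h >> (64 - p)                        # top p bits pick a register
        rest = h & ((1 << (64 - p)) - 1)           # remaining bits
        rank = (64 - p) - rest.bit_length() + 1    # leftmost 1-bit position
        if rank > registers[idx]:
            registers[idx] = rank

    alpha = 0.7213 / (1 + 1.079 / m)               # constant for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)

    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros > 0:
        # Small-range correction: fall back to linear counting.
        return m * math.log(m / zeros)
    return raw
```

Passing an open file or `sys.stdin` as `lines` gives an estimate for that input, since both yield one line at a time. Memory use stays fixed at `2**p` registers however many lines are read.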
Background reading
- Using Linear Counting, LogLog, and HyperLogLog to Estimate Cardinality
- Damn Cool Algorithms: Cardinality estimation
Previous research
HyperLogLog stands on the shoulders of:
- LogLog
- LinearCount