How Big Tech Scales View Counts: The Power of HyperLogLog and Harmonic Means



Source: DEV Community

TL;DR: Scaling unique view counts for millions of posts requires more than a COUNT(DISTINCT) query. Modern platforms use HyperLogLog, a probabilistic data structure that estimates cardinality using hashing and bucketing. By applying a harmonic mean across thousands of independent buckets, engineers can maintain high accuracy with a tiny memory footprint.

When we talk about scale, we often focus on throughput or latency, but memory consumption for unique metrics is a silent killer. If you are building a platform with millions of users and millions of posts, your first instinct might be to store a set of user IDs for every post, so you never count the same person twice.

The arithmetic falls apart quickly. If a post has 10 million unique visitors, and each user ID is a 64-bit (8-byte) integer, you are burning roughly 80 MB of memory for a single post's view count. Multiply that by a million posts, and you are looking at an 80 TB memory requirement just for view counts. That is simply not feasible.

To solve this, we use HyperLogLog.
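To make the hashing, bucketing, and harmonic-mean steps concrete, here is a minimal, illustrative HyperLogLog sketch in Python. The bucket count, the choice of SHA-1 as the hash, and the bias-correction constant are assumptions for this example, not the exact parameters any particular platform uses:

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch (illustrative, not production-ready).

    m = 2**p buckets; each bucket stores the largest "rank" (position of
    the leftmost 1-bit) seen among the hash suffixes routed to it.
    """

    def __init__(self, p: int = 14):
        self.p = p
        self.m = 1 << p                      # number of buckets (16384 here)
        self.buckets = [0] * self.m
        # Bias-correction constant for m >= 128, from the HLL paper.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        # Hash the item to 64 bits; any good uniform hash works.
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)             # top p bits pick the bucket
        rest = h & ((1 << (64 - self.p)) - 1)
        # Rank = number of leading zeros in the remaining bits, plus one.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.buckets[idx] = max(self.buckets[idx], rank)

    def count(self) -> float:
        # Harmonic mean across buckets keeps a single unlucky bucket
        # (one very long run of zeros) from dominating the estimate.
        harmonic = self.m / sum(2.0 ** -b for b in self.buckets)
        estimate = self.alpha * self.m * harmonic
        # Small-range correction: fall back to linear counting when the
        # estimate is low and some buckets are still empty.
        zeros = self.buckets.count(0)
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)
        return estimate


hll = HyperLogLog()
for user_id in range(100_000):
    hll.add(f"user-{user_id}")
print(f"estimate: {hll.count():,.0f}")       # lands near the true 100,000
```

The entire sketch is just m small registers, a few kilobytes, versus the roughly 80 MB an explicit set of 10 million 64-bit IDs would need. Two sketches for the same post can also be merged by taking the bucket-wise maximum, which is what makes the structure so convenient for distributed counting.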