Leaderboards look trivial: sort by score, show the top N. They are not. Behind every live ranking is a data-structure question (how do I keep millions of rows sorted while they change thousands of times a second?) and a systems question (how do I serve "what's my rank?" in a millisecond without melting the database?). This page builds the answer from the ground up.
Imagine a game (or a stock-trading app ranking strategies by Sharpe ratio, or a social feed ranking posts by engagement). You have 1,000,000 players. Scores change ~10,000 times per second. You must answer three questions instantly, on demand:
1. Who are the top 100?
2. What is player X's rank?
3. Who is ranked around player X (a slice of the board)?
The naïve answer (keep an array, sort it whenever someone asks) is a trap. Let's see why.
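Sorting 1,000,000 rows costs O(n log n), roughly n·log₂n ≈ 20 million comparisons, and the naïve design pays that on every read.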
At 10,000 score updates per second you'd be re-sorting constantly. You need a structure that stays
sorted as you write, so reads never trigger a sort.
A leaderboard has three hard requirements that pull in different directions: score updates must cost O(log n), not O(n); ranked reads (top-N, "what's my rank?", slices) must also cost O(log n); and the structure has to stay sorted as it is written, so no read ever pays for a sort.

"O(log n) vs O(n)" sounds abstract until you put numbers on it. At n = 1,000,000, a logarithmic operation needs about 20 steps and a linear one about 1,000,000. That gap between a logarithmic and a linear algorithm is the whole reason this topic exists.
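A few lines of Python put concrete numbers on these complexity classes. This is a minimal sketch: the counts are idealized step counts with constants ignored, not measured timings, and the helper name `op_counts` is hypothetical.

```python
import math


def op_counts(n: int) -> dict:
    """Idealized step counts for three complexity classes (constants ignored)."""
    log_n = math.log2(n)
    return {"O(log n)": log_n, "O(n)": n, "O(n log n)": n * log_n}


for n in (1_000, 1_000_000, 100_000_000):
    row = ", ".join(f"{name} ~ {steps:,.0f}" for name, steps in op_counts(n).items())
    print(f"n = {n:,}: {row}")
```

At n = 1,000,000 the logarithmic column stays near 20 while the linear column is a million.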
| Operation | Sorted Array | Hash Table | B-Tree / Skip List |
|---|---|---|---|
| Read at known rank | O(1) | O(n log n) ⚠️ | O(log n) |
| Insert / update score | O(n) ❌ | O(1) | O(log n) |
| Range query (top 100) | O(100) | O(n log n) ⚠️ | O(log n + 100) |
| Stays sorted on write? | only if you pay O(n) | no | ✅ yes |
A sorted array reads beautifully, but every insert shifts elements: O(n).
A hash table writes beautifully but has no order, so any ranked read means sorting
everything: O(n log n). Only the balanced tree and the skip
list give you O(log n) on both sides while staying sorted. Those two
are the rest of this page.
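The first two columns of the table can be seen side by side in a minimal Python sketch. Assumptions: a plain list kept sorted with the standard `bisect` module stands in for the sorted array, a `dict` stands in for the hash table, and all function names are hypothetical. Updating an existing player in the array would also need an O(n) removal, which is left out here.

```python
import bisect

# Sorted array of (-score, user_id): ascending order puts the highest score first.
# Reading by rank is plain indexing, but insort shifts every later element: O(n).
sorted_board: list = []


def array_insert(user_id: str, score: int) -> None:
    bisect.insort(sorted_board, (-score, user_id))


def array_top(k: int) -> list:
    return [(user_id, -neg_score) for neg_score, user_id in sorted_board[:k]]


# Hash table: O(1) average-case writes, but no order, so every ranked read
# must sort all entries first: O(n log n).
hash_board: dict = {}


def hash_update(user_id: str, score: int) -> None:
    hash_board[user_id] = score


def hash_top(k: int) -> list:
    ranked = sorted(hash_board.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


for user_id, score in [("alice", 120), ("bob", 95), ("carol", 150)]:
    array_insert(user_id, score)
    hash_update(user_id, score)

print(array_top(2))  # [('carol', 150), ('alice', 120)]
print(hash_top(2))   # [('carol', 150), ('alice', 120)]
```

Both give the same answer; they just pay for it at opposite ends (the array on every write, the dict on every read).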
A B-tree is how relational databases keep an index sorted on disk. When you write
CREATE INDEX ... ON players(score) in PostgreSQL, MySQL, SQL Server, or SQLite, you are
almost always creating a B-tree (technically a B⁺-tree). It is the workhorse behind essentially every
ordered query you've ever run.
A binary tree stores one key per node. A B-tree stores many keys per node: often 32 to 64, sometimes hundreds. Why? Because disks and memory pages are read in big blocks. If one node fills a 4 KB page and holds 100 keys, then a tree of just 3 to 4 levels can index millions of rows. Fewer levels means fewer disk reads to find anything.
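The height claim is simple arithmetic. A minimal sketch, under a simplifying assumption of ours rather than the page's: every node is completely full, so a node with k keys has k + 1 children and a full tree of h levels holds (k + 1)^h - 1 keys. Real B-trees keep nodes only partly full, so actual trees can be somewhat taller. The function name is hypothetical.

```python
def btree_levels(n_keys: int, keys_per_node: int) -> int:
    """Smallest height at which a completely full tree can hold n_keys.

    Assumes every node holds exactly keys_per_node keys and has
    keys_per_node + 1 children, so h levels hold (keys_per_node + 1)**h - 1 keys.
    """
    fanout = keys_per_node + 1
    levels, capacity = 0, 0
    while capacity < n_keys:
        levels += 1
        capacity = fanout ** levels - 1
    return levels


for n in (1_000_000, 10_000_000):
    print(f"{n:,} keys: B-tree (100 keys/node) = {btree_levels(n, 100)} levels, "
          f"binary tree = {btree_levels(n, 1)} levels")
```

With 100 keys per node, 1,000,000 keys fit in 3 levels and 10,000,000 in 4, versus 20 and 24 levels for a perfectly balanced binary tree.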
The danger with any tree is degeneration. If you naïvely insert sorted data (1, 2, 3, 4, …) into
a plain binary search tree, you don't get a tree; you get a linked list, and search collapses to
O(n). A B-tree defends against this: when a node gets too full, it splits,
pushing its middle key up to the parent. This keeps every leaf at the same depth, so height stays
O(log n) no matter what order data arrives in.
1
\
2
\
3
\
4
\
5
Height grows with n, so search becomes O(n).
[3]
/ \
[1,2] [4,5]
Every leaf at equal depth, so search stays O(log n).
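The split-and-promote rule fits in a short Python sketch. It follows the classic textbook formulation (as in CLRS) that splits any full node on the way down, with a minimum degree `t` so each node holds at most 2t - 1 keys. It is in-memory and insert-only (no deletion, merging, or disk pages), and the class and method names are hypothetical; real database B⁺-trees differ in detail.

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list = field(default_factory=list)
    children: list = field(default_factory=list)  # empty list means "leaf"

    @property
    def leaf(self) -> bool:
        return not self.children


class BTree:
    """Insert-only B-tree with minimum degree t (at most 2t - 1 keys per node)."""

    def __init__(self, t: int = 2) -> None:
        self.t = t
        self.root = Node()

    def _full(self, node: Node) -> bool:
        return len(node.keys) == 2 * self.t - 1

    def insert(self, key: int) -> None:
        if self._full(self.root):
            # Splitting the root is the only way the tree grows taller,
            # so every leaf stays at the same depth.
            new_root = Node(children=[self.root])
            self._split_child(new_root, 0)
            self.root = new_root
        node = self.root
        while not node.leaf:
            i = bisect.bisect_right(node.keys, key)
            if self._full(node.children[i]):
                self._split_child(node, i)
                if key > node.keys[i]:  # the promoted middle key decides the side
                    i += 1
            node = node.children[i]
        bisect.insort(node.keys, key)

    def _split_child(self, parent: Node, i: int) -> None:
        """Split parent.children[i] in two and push its middle key up into parent."""
        t, full = self.t, parent.children[i]
        middle = full.keys[t - 1]
        right = Node(keys=full.keys[t:], children=full.children[t:])
        full.keys, full.children = full.keys[: t - 1], full.children[:t]
        parent.keys.insert(i, middle)
        parent.children.insert(i + 1, right)

    def inorder(self) -> list:
        out: list = []

        def walk(node: Node) -> None:
            for i, key in enumerate(node.keys):
                if not node.leaf:
                    walk(node.children[i])
                out.append(key)
            if not node.leaf:
                walk(node.children[-1])

        walk(self.root)
        return out

    def leaf_depths(self) -> set:
        depths: set = set()

        def walk(node: Node, depth: int) -> None:
            if node.leaf:
                depths.add(depth)
            for child in node.children:
                walk(child, depth + 1)

        walk(self.root, 1)
        return depths


tree = BTree(t=2)
for k in range(1, 21):  # sorted input: the case that degrades a plain BST
    tree.insert(k)
print(tree.inorder() == list(range(1, 21)), tree.leaf_depths())
```

Inserting 1 to 20 in order leaves a single value in `leaf_depths()`: every leaf sits at the same depth, which is exactly the property the plain BST above loses.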
For a leaderboard, index on (score DESC, user_id ASC): the B-tree then maintains exactly the order you rank by, so reading the top 100 costs O(log n + 100) and each score update costs O(log n).
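A minimal sketch of that compound index using Python's built-in `sqlite3` module and an in-memory SQLite database; the table layout and the index name `idx_rank` are hypothetical. `EXPLAIN QUERY PLAN` is SQLite's way of showing whether a query reads an index in order or sorts rows in a temporary B-tree; the exact wording of its output varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (user_id INTEGER PRIMARY KEY, score INTEGER NOT NULL)")
# Compound index in exactly the order we rank by.
conn.execute("CREATE INDEX idx_rank ON players (score DESC, user_id ASC)")
conn.executemany(
    "INSERT INTO players (user_id, score) VALUES (?, ?)",
    [(1, 120), (2, 150), (3, 120), (4, 90)],
)

TOP_N = "SELECT user_id, score FROM players ORDER BY score DESC, user_id ASC LIMIT ?"
print(conn.execute(TOP_N, (3,)).fetchall())  # [(2, 150), (1, 120), (3, 120)]

# The last column of each EXPLAIN QUERY PLAN row is a human-readable step.
# Expected: a scan that uses idx_rank, and no "USE TEMP B-TREE FOR ORDER BY" step.
for step in conn.execute("EXPLAIN QUERY PLAN " + TOP_N, (3,)):
    print(step[-1])
```

Because the index already stores rows in (score DESC, user_id ASC) order, the planner can stop after three index entries instead of sorting the table.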
Skip lists reach the same O(log n) performance as a balanced tree (in expectation rather than as a guarantee), but they're far easier to
reason about and implement: no rotation rules, no split-and-promote bookkeeping. Redis
uses a skip list internally for its sorted sets (ZSET), arguably the most popular
production leaderboard primitive in the world.
Start with an ordinary sorted linked list β searching it is O(n) because you must walk every
node. Now add "express lanes" on top: a sparser list that links roughly every other node, then an even
sparser one above that, and so on. To search, you ride the highest express lane as far as you can, drop
down a level when you'd overshoot, and repeat. Each level halves the remaining distance; that's where the
log n comes from.
How does a node decide how tall to be? A coin flip. Insert a node at level 1 and flip a coin: on heads, promote it to level 2; flip again, and on heads promote it to level 3; stop at the first tails. On average, ½ of the nodes reach level 2, ¼ reach level 3, ⅛ reach level 4, and so on: a perfectly balanced pyramid in expectation, with zero rebalancing logic. That probabilistic simplicity is the whole appeal.
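The coin-flip rule is only a few lines of code. A minimal sketch, assuming promotion probability p = 0.5 and a cap of 32 levels (Redis itself uses p = 0.25, so its towers are shorter); the function name is hypothetical, and the seeded generator only makes the demo reproducible.

```python
import random


def random_level(rng: random.Random, p: float = 0.5, max_level: int = 32) -> int:
    """Height for a new node: start at level 1 and keep promoting while the coin says heads."""
    level = 1
    while level < max_level and rng.random() < p:
        level += 1
    return level


rng = random.Random(42)  # fixed seed so the demo is reproducible
levels = [random_level(rng) for _ in range(100_000)]
for lvl in range(1, 6):
    share = sum(1 for h in levels if h >= lvl) / len(levels)
    print(f"reach level {lvl}: {share:.3f}")  # close to 1, 1/2, 1/4, 1/8, 1/16
```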
| | B-Tree (B⁺-tree) | Skip List |
|---|---|---|
| Where it lives | Mostly on disk (databases) | Mostly in memory (Redis, caches) |
| Balancing | Deterministic (split/merge) | Probabilistic (coin flips) |
| Implementation | Fiddly (split, promote, merge, borrow) | Simple, ~100 lines |
| Cache/disk locality | Excellent (fat nodes = whole pages) | Pointer-chasing, weaker locality |
| Worst case | Guaranteed O(log n) | O(n) (astronomically unlikely) |
| Used by | PostgreSQL, MySQL, SQLite indexes | Redis ZSET, LevelDB memtable |
"Top 100" is easy: it's just the first 100 nodes. The genuinely hard question is the second one from our
motivating problem: "What is player X's rank?" Naïvely, you'd count how many players score
higher β but counting is O(n), and at 1M players that's exactly the scan we're trying to avoid.
Augment each pointer with the number of nodes it skips over (Redis calls this the
span; in an augmented tree it's the subtree size). To compute a rank, walk the search
path and add up the spans of every forward hop you take. You never visit the skipped nodes; you just add
their count. Rank in O(log n).
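Here is a minimal Python sketch of a span-augmented skip list, modeled on the approach Redis takes for sorted sets but not copied from it. Assumptions: insert-only (no updates or deletes), p = 0.5, keys ordered by (-score, member) so rank 1 is the highest score with ties broken by member name, and the caller supplies the member's score (Redis looks it up in a hash table first). All names are hypothetical.

```python
from __future__ import annotations

import random

MAX_LEVEL = 32


class SkipNode:
    __slots__ = ("key", "forward", "span")

    def __init__(self, key, level: int) -> None:
        self.key = key                 # (-score, member); None for the head sentinel
        self.forward = [None] * level  # next node on each level
        self.span = [0] * level        # how many positions each forward hop advances


class RankedSkipList:
    def __init__(self, seed: int | None = None) -> None:
        self.head = SkipNode(None, MAX_LEVEL)
        self.level = 1
        self.length = 0
        self.rng = random.Random(seed)

    def _random_level(self) -> int:
        lvl = 1
        while lvl < MAX_LEVEL and self.rng.random() < 0.5:
            lvl += 1
        return lvl

    def insert(self, member: str, score: float) -> None:
        key = (-score, member)
        update = [self.head] * MAX_LEVEL  # last node before the insert point, per level
        rank = [0] * MAX_LEVEL            # position of update[i] (head = 0)
        node = self.head
        for i in range(self.level - 1, -1, -1):
            rank[i] = 0 if i == self.level - 1 else rank[i + 1]
            while node.forward[i] is not None and node.forward[i].key < key:
                rank[i] += node.span[i]
                node = node.forward[i]
            update[i] = node
        lvl = self._random_level()
        if lvl > self.level:
            for i in range(self.level, lvl):
                self.head.span[i] = self.length  # new head lanes span the whole list
            self.level = lvl
        new = SkipNode(key, lvl)
        for i in range(lvl):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new
            # Split the old hop's span between update[i] -> new and new -> next.
            new.span[i] = update[i].span[i] - (rank[0] - rank[i])
            update[i].span[i] = rank[0] - rank[i] + 1
        for i in range(lvl, self.level):
            update[i].span[i] += 1  # taller hops now jump over one more node
        self.length += 1

    def rank(self, member: str, score: float) -> int | None:
        """1-based rank: add up the spans of every forward hop on the search path."""
        key = (-score, member)
        traversed, node = 0, self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key <= key:
                traversed += node.span[i]
                node = node.forward[i]
            if node.key == key:
                return traversed
        return None


board = RankedSkipList(seed=7)
players = {"alice": 120, "bob": 95, "carol": 150, "dave": 120, "erin": 60}
for name, points in players.items():
    board.insert(name, points)
print(board.rank("alice", 120))  # 2: carol's 150 is first; alice wins the 120 tie by name
```

`rank` never visits the nodes it jumps over; it only adds the span stored on each hop, so its cost follows the O(log n) expected search path.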
This is why ZRANK / ZREVRANK in Redis are O(log n) and not
O(n): the skip list carries span counts. In SQL the analog is a B-tree that stores subtree
row-counts, or you approximate it with ROW_NUMBER() OVER (ORDER BY score DESC) and let the
planner range-scan the index.
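The ROW_NUMBER() approach can be tried with Python's built-in `sqlite3` module (window functions need SQLite 3.25 or newer, which current Python builds normally bundle). A minimal sketch with a hypothetical table; `user_id` is added as a tiebreaker so the numbering is deterministic. It returns an exact rank, but the database numbers rows in ranking order to get there, so the work grows with the number of rows rather than with log n.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (user_id INTEGER PRIMARY KEY, score INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO players (user_id, score) VALUES (?, ?)",
    [(1, 120), (2, 150), (3, 120), (4, 90)],
)

# Number every row in ranking order (user_id breaks ties), then keep one player.
RANK_OF = """
    SELECT pos FROM (
        SELECT user_id,
               ROW_NUMBER() OVER (ORDER BY score DESC, user_id ASC) AS pos
        FROM players
    )
    WHERE user_id = ?
"""


def rank_of(user_id: int) -> int:
    return conn.execute(RANK_OF, (user_id,)).fetchone()[0]


print([rank_of(uid) for uid in (1, 2, 3, 4)])  # [2, 1, 3, 4]
```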
Now that you know the data structures, here are the three ways teams actually wire a leaderboard together.
Idea: store finished rankings in a separate table; recompute on a schedule or trigger.
Best for: tournament final standings, daily/weekly boards, anything where a few minutes of staleness is fine.
Idea: keep a B-tree index and compute the slice you need at query time.
The catch: deep pages (for example OFFSET 900000) get slow, because the database still steps over every skipped row.
Best for: general-purpose boards, APIs, anything read-moderate with a strong correctness requirement.
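A minimal sketch of a query-time slice with Python's built-in `sqlite3` module, using a hypothetical table, index name, and synthetic scores. It fetches a run of consecutive ranks with LIMIT/OFFSET; the rows before the offset are still walked one by one through the index and thrown away, which is where the deep-page slowdown comes from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (user_id INTEGER PRIMARY KEY, score INTEGER NOT NULL)")
conn.execute("CREATE INDEX idx_rank ON players (score DESC, user_id ASC)")
# Synthetic scores for 10,000 hypothetical players.
conn.executemany(
    "INSERT INTO players (user_id, score) VALUES (?, ?)",
    [(uid, (uid * 37) % 1000) for uid in range(1, 10_001)],
)


def slice_at(first_rank: int, count: int) -> list:
    """Return (rank, user_id, score) for ranks first_rank .. first_rank + count - 1."""
    rows = conn.execute(
        "SELECT user_id, score FROM players "
        "ORDER BY score DESC, user_id ASC LIMIT ? OFFSET ?",
        (count, first_rank - 1),  # OFFSET skips first_rank - 1 rows before returning any
    ).fetchall()
    return [(first_rank + i, uid, score) for i, (uid, score) in enumerate(rows)]


print(slice_at(1, 3))
print(slice_at(9_000, 3))  # deep page: 8,999 rows are stepped over first
```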
Idea: keep the hot ranking in Redis. ZADD on every score change; serve reads from the sorted set.
Best for: real-time games, live sports, trading dashboards, and anything else latency-critical. This is what most teams ship.
Two players, same score. Who's ranked higher? If you don't decide, the order is undefined and ranks flicker between requests.
Fix: sort by a compound key so ordering is total and deterministic.
In SQL, build the index on (score, tiebreaker), not on
score alone. Redis trick: pack the timestamp into the float score (e.g. score - achieved_at·1e-9)
so a single ZSET float encodes both; among equal scores, the earlier achievement then sorts higher.
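The packing trick, sketched in Python. Assumptions (ours, not the page's): points are integers and `achieved_at` is a Unix timestamp in seconds, so `achieved_at * 1e-9` is currently about 1.7. Ordering survives because only the difference between two timestamps matters, and that stays below 1 point for timestamps less than about 31 years apart. It also needs point totals small enough (roughly below 2^23, about 8.4 million) that a 64-bit float can still resolve a one-second gap. The function name and sample entries are hypothetical.

```python
def packed_score(points: int, achieved_at: int) -> float:
    """Fold an earlier-is-better tiebreak into one float, as in the example above.

    A later achieved_at subtracts slightly more, so among equal points the
    earliest achievement has the largest packed score.
    """
    return points - achieved_at * 1e-9


entries = [
    ("bob", 500, 1_700_000_500),
    ("alice", 500, 1_700_000_100),  # same points as bob, but earlier
    ("carol", 499, 1_600_000_000),  # fewer points, even though much earlier
]
ranked = sorted(entries, key=lambda e: packed_score(e[1], e[2]), reverse=True)
print([name for name, _, _ in ranked])  # ['alice', 'bob', 'carol']
```

Sorting by the packed value in descending order mirrors reading a ZSET highest-first.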
This is where most interview answers fall apart. Scores don't update one at a time β thousands of writes race each other. There are two different failure modes here, and they need different fixes. Conflating them is the classic mistake.
Consider two concurrent +10 updates to Alice's score. Naïvely, both read 100 and both write 110, so one increment is lost. An atomic increment cannot lose either.
An increment like "+10" is commutative: A-then-B and B-then-A both
reach 120, so the database can fold both into the value without anyone needing to have seen the latest number.
ZINCRBY / UPDATE … SET score = score + 10 do exactly that, indivisibly.
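The lost update can be replayed deterministically with Python's built-in `sqlite3` module. A minimal sketch: one connection plays both clients and the bad interleaving is written out by hand (in production the two clients would be separate connections racing each other); the table and helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT PRIMARY KEY, score INTEGER NOT NULL)")


def reset() -> None:
    conn.execute("DELETE FROM players")
    conn.execute("INSERT INTO players (name, score) VALUES ('alice', 100)")


def current() -> int:
    return conn.execute("SELECT score FROM players WHERE name = 'alice'").fetchone()[0]


# Read-modify-write, replayed in the bad order: both clients read before either writes.
reset()
seen_by_a = current()
seen_by_b = current()
conn.execute("UPDATE players SET score = ? WHERE name = 'alice'", (seen_by_a + 10,))
conn.execute("UPDATE players SET score = ? WHERE name = 'alice'", (seen_by_b + 10,))
racy_result = current()

# Atomic increment: each client sends only the delta and the database applies it in place.
reset()
conn.execute("UPDATE players SET score = score + 10 WHERE name = 'alice'")
conn.execute("UPDATE players SET score = score + 10 WHERE name = 'alice'")
atomic_result = current()

print(racy_result, atomic_result)  # 110 120
```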
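Atomic increment is the wrong tool the moment a write isn't a relative delta but an absolute value the user computed from what they read. On a leaderboard, for example, an admin corrects a player's score to an exact number, typed against the score they were looking at.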
If two such writes overlap, the second one commits a number derived from a value that's already obsolete: it silently overwrites the first. No atomic counter can save you, because the operation isn't "add"; it's "set to this, which I worked out from that." The fix is to make the write conditional on the state not having changed since you read it.
The record carries a version number. You read it, and your write commits only if the version is still what you saw. Without the guard, the stale write silently wins (last-write-wins); with it, the stale write is rejected, so the user must re-read and reconcile.
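A minimal compare-and-set sketch with Python's built-in `sqlite3` module, assuming a hypothetical `version` column: the UPDATE matches only while the version is still the one that was read, and the affected-row count tells the caller whether the write committed. The example plays out two admins saving different scores over the same stale read; table and helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE players (name TEXT PRIMARY KEY, score INTEGER NOT NULL, "
    "version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO players (name, score, version) VALUES ('alice', 100, 1)")


def read(name: str) -> tuple:
    return conn.execute(
        "SELECT score, version FROM players WHERE name = ?", (name,)
    ).fetchone()


def save(name: str, new_score: int, seen_version: int) -> bool:
    """Compare-and-set: commit only if nobody bumped the version since our read."""
    cur = conn.execute(
        "UPDATE players SET score = ?, version = version + 1 "
        "WHERE name = ? AND version = ?",
        (new_score, name, seen_version),
    )
    return cur.rowcount == 1


# Both admins open the profile at score 100, version 1.
_, v_admin1 = read("alice")
_, v_admin2 = read("alice")
first_ok = save("alice", 150, v_admin1)   # True: version becomes 2
second_ok = save("alice", 120, v_admin2)  # False: stale version, nothing written
print(first_ok, second_ok, read("alice"))  # True False (150, 2)
```

On a `False` return the caller re-reads (now 150 at version 2) and decides how to reconcile before trying again.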
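Assume conflicts are rare. Don't lock; just check on write that nothing changed, and retry if it did. The same idea appears on several surfaces, for example a version column checked in an SQL WHERE clause, or an ETag checked through an HTTP If-Match header.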
Assume conflicts are likely. Take a lock up front so no one else can read-to-write the row until you're done.
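A minimal in-process sketch of the pessimistic approach, using one `threading.Lock` per row as a stand-in for a database row lock (in PostgreSQL or MySQL the usual tool is `SELECT ... FOR UPDATE` inside a transaction). The `Row` class and `set_from_read` helper are hypothetical, and the `compute` callback stands in for any "set to X" value worked out from what was read.

```python
import threading
from typing import Callable


class Row:
    """Hypothetical in-memory row guarded by its own lock."""

    def __init__(self, score: int) -> None:
        self.score = score
        self.lock = threading.Lock()


def set_from_read(row: Row, compute: Callable[[int], int]) -> None:
    # Take the lock before reading and hold it through the write, so no other
    # writer can read the same value in between.
    with row.lock:
        current = row.score
        row.score = compute(current)


row = Row(100)
workers = [
    threading.Thread(target=set_from_read, args=(row, lambda s: s + 1))
    for _ in range(50)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(row.score)  # 150: every read-compute-write ran alone
```

The price shows up under contention: all fifty writers queue on one lock, which is the "writers serialize" row in the table.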
| | Optimistic (version / CAS) | Pessimistic (locks) |
|---|---|---|
| Assumes | conflicts are rare | conflicts are common |
| Cost in the happy path | ~zero (one extra column check) | holds a lock, blocks others |
| Cost on conflict | retry (re-read & recompute) | waiting / contention, deadlock risk |
| Scales with readers | excellent | poorly (writers serialize) |
| Best for | web edits, APIs, most leaderboards | hot rows, bank-balance-style invariants |
Default to optimistic checks (a version column, or an HTTP If-Match header) for
human-driven "set to X" edits from a UI; that's exactly where stale-state clobbering happens. Reserve
locks for the rare hot row where retries would thrash. And accept eventual consistency for
the displayed rank. A rank that's a second stale is invisible; what must never be stale is the
write decision itself.
One Redis instance or one Postgres table takes you remarkably far. When it doesn't, here's the toolkit.
Nobody designs the hybrid Redis-and-Postgres system on day one, and they shouldn't. The right design is the simplest one that survives your current scale. This section walks the journey both stacks take: the naïve first attempt, the specialized fix it forces, and how both roads converge on the same ideal. Everything here reuses the building blocks from the earlier sections (B-trees §2, skip lists §3, rank-by-span §4, patterns §5).
The SQL track and the NoSQL track each start naïve, hit a wall, specialize, and then merge into the hybrid ideal.
Naïve SQL: one players table, queried directly. Top-N is
ORDER BY score DESC LIMIT 100, and rank is COUNT(*) WHERE score > mine. It is correct and needs
zero infrastructure, but it falls over the moment the table is big and busy, because nothing is sorted in advance.
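The naïve SQL stage, sketched with Python's built-in `sqlite3` module and a hypothetical table with no index on score. The rank query follows the COUNT(*) definition above, so players with equal scores share a rank; without an index on score, both queries have to look at every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (user_id INTEGER PRIMARY KEY, score INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO players (user_id, score) VALUES (?, ?)",
    [(1, 120), (2, 150), (3, 120), (4, 90)],
)


def top_n(n: int) -> list:
    # No index on score: the engine reads every row and sorts before applying LIMIT.
    return conn.execute(
        "SELECT user_id, score FROM players ORDER BY score DESC LIMIT ?", (n,)
    ).fetchall()


def my_rank(user_id: int) -> int:
    # Rank = 1 + number of players with a strictly higher score (ties share a rank).
    return conn.execute(
        "SELECT 1 + COUNT(*) FROM players "
        "WHERE score > (SELECT score FROM players WHERE user_id = ?)",
        (user_id,),
    ).fetchone()[0]


print(top_n(1), [my_rank(uid) for uid in (1, 2, 3, 4)])  # [(2, 150)] [2, 1, 2, 4]
```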
Naïve NoSQL: scores in a key-value store. Writes are O(1) and scale horizontally, but there is no global order, so any ranked
query degrades to a full scan plus a client-side sort. It's the hash-table problem from §1, in production form.
Indexed SQL: add a compound index on (score DESC, id ASC). Now top-N and slices
ride the index in O(log n + k) and stay perfectly fresh. The lingering thorn: "what's my
rank?" is still a COUNT(*) scan, because a vanilla B-tree has no order statistics (§4).
Redis ZSET: keep the ranking in a sorted set. ZINCRBY and ZREVRANK are O(log n) and ZREVRANGE is O(log n + k),
so even the rank query that indexed SQL couldn't do cheaply is fast. The catch:
it lives in RAM and isn't durable on its own.
| Capability | Naïve SQL | Indexed SQL | Naïve NoSQL | Redis ZSET | Hybrid (ideal) |
|---|---|---|---|---|---|
| Top-N | O(n log n) | ✅ O(log n+k) | O(n log n) | ✅ O(log n+k) | ✅ O(log n+k) |
| "My rank" | O(n) | ⚠️ O(n) | O(n) | ✅ O(log n) | ✅ O(log n) |
| Score update | O(1)* | O(log n) | ✅ O(1) | O(log n) | O(log n) atomic |
| Always fresh | ✅ | ✅ | ✅ | ✅ | ⚠️ eventual |
| Durable | ✅ | ✅ | ✅ | ⚠️ needs AOF/RDB | ✅ (DB is truth) |
| Scale ceiling | ~10k rows | ~1M, read-bound | huge writes, no ranking | RAM / single node | sharded, very high |
* O(1) write, but every read pays the full sort, so it isn't really a win.
Back to the opening problem: 1M players, 10k updates/sec, instant rank/top-N/slice queries.
Two admins both open Alice's profile (score 100). One saves 150, the other saves 120 β each value typed against the 100 they saw. How do you stop the second save from silently clobbering the first?
Answer: optimistic concurrency. A version check (or HTTP If-Match) fixes stale-state "set to X" writes. Accept eventual consistency for displayed ranks, never for the write decision.
Key Redis commands: ZADD, ZREVRANGE, ZREVRANK, ZINCRBY.