How Discord Scaled to Billions of Messages: Architecture Deep Dive

Discord went from a gaming chat app to infrastructure handling 4 billion messages per day, 19 million active servers, and millions of concurrent WebSocket connections — all with delivery latency low enough that it feels like the other person is in the same room. This kind of scale breaks every assumption you carry over from building normal web apps.

The engineering decisions Discord made — and unmade — along the way are a masterclass in distributed systems under real production pressure.

Where Discord Started: MongoDB

In 2015, Discord chose MongoDB. It was the right call for a startup: flexible schema, easy horizontal scaling, fast iteration. A single messages collection, indexed by channel.

// The original simple schema
{
  _id: ObjectId,
  channel_id: String,
  author_id: String,
  content: String,
  timestamp: Date,
  attachments: Array,
  reactions: Object
}
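
Reads of recent history were served by a single compound index on channel and time. A plausible shape, using the schema's own field names (a sketch consistent with Discord's description, not a verbatim definition):

// Compound index: partition reads by channel, newest messages first
db.messages.createIndex({ channel_id: 1, timestamp: -1 })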

For the first year, this worked fine. But as Discord grew, a fundamental mismatch emerged between how MongoDB stores data and how Discord accesses it.

The Problem: Random I/O at Scale

MongoDB stores documents and indexes on disk in B-trees. As the messages collection grew to hundreds of millions of records, the data and its index no longer fit in RAM: recent messages in active channels stayed on hot pages, but any read of older history touched documents scattered across cold disk pages, causing massive random I/O.

Query: "Get last 50 messages in channel X"
MongoDB: Traverse B-tree index → random seek across disk pages → 
         evict hot data from RAM to load cold pages
Result:  p99 latency spikes during traffic bursts

Worse, MongoDB's RAM was consumed by the working set of active channels. When traffic spiked (game releases, esports events), the working set exploded, RAM filled up, and the database started hitting disk constantly. The p99 latency went from milliseconds to seconds.

By 2017, Discord was storing 100 million messages per day and MongoDB was visibly buckling.

The Migration to Cassandra

Cassandra's data model is built for time-series append workloads — exactly what chat is. Messages are almost always written once, read in time order, and never updated (except reactions and edits, which are a small fraction).

Cassandra's storage engine (LSM tree) is optimized for sequential writes and range reads by partition key. Discord's access pattern — "get the last N messages in channel C" — maps perfectly.

Schema Design

CREATE TABLE messages (
  channel_id    bigint,
  bucket        int,        -- time bucket (10-day windows)
  message_id    bigint,     -- Snowflake ID (time-ordered)
  author_id     bigint,
  content       text,
  attachments   list<text>,
  reactions     map<text, int>,
  PRIMARY KEY   ((channel_id, bucket), message_id)
) WITH CLUSTERING ORDER BY (message_id DESC)
  AND compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': 10
  };

The composite partition key (channel_id, bucket) is the most important design decision here. Bucketing by time window means:

  1. Hot data (recent messages) stays on a small set of nodes — no scatter-gather across the ring for most queries
  2. Cold data (old buckets) can be archived or tiered to cheaper storage
  3. Compaction is bounded — Cassandra only compacts within the time window, not the entire table

Reading a page of history walks backward through buckets until the limit is filled:

const BUCKET_SIZE_MS = 10 * 24 * 60 * 60 * 1000; // 10 days

function getBucket(messageId: bigint): number {
  // Snowflake IDs embed the timestamp in the top 42 bits.
  // DISCORD_EPOCH (a bigint) is defined in the Snowflake section below.
  const timestamp = Number((messageId >> 22n) + DISCORD_EPOCH);
  return Math.floor(timestamp / BUCKET_SIZE_MS);
}

async function getMessages(
  channelId: bigint,
  beforeMessageId: bigint,
  limit: number
): Promise<Message[]> {
  const results: Message[] = [];
  let bucket = getBucket(beforeMessageId);

  // In production the loop would stop at the channel's oldest bucket;
  // bucket >= 0 keeps the sketch simple.
  while (results.length < limit && bucket >= 0) {
    const page = await cassandra.execute(
      `SELECT * FROM messages
       WHERE channel_id = ? AND bucket = ? AND message_id < ?
       ORDER BY message_id DESC
       LIMIT ?`,
      [channelId, bucket, beforeMessageId, limit - results.length],
      { prepare: true }
    );

    results.push(...page.rows);
    bucket--;
    // Upper bound for the next (older) bucket: the Snowflake at the
    // first millisecond after that bucket ends.
    beforeMessageId =
      (BigInt((bucket + 1) * BUCKET_SIZE_MS) - DISCORD_EPOCH) << 22n;
  }

  return results.slice(0, limit);
}

Write Path: Idempotent Appends

async function insertMessage(message: Message): Promise<void> {
  const id = BigInt(message.id);
  const bucket = getBucket(id);

  await cassandra.execute(
    `INSERT INTO messages
       (channel_id, bucket, message_id, author_id, content, attachments)
     VALUES (?, ?, ?, ?, ?, ?)
     USING TIMESTAMP ?`,
    [
      message.channelId,
      bucket,
      id,
      message.authorId,
      message.content,
      message.attachments,
      id >> 22n, // use the Snowflake's embedded timestamp as the write timestamp
    ],
    { prepare: true }
  );
}

Using the Snowflake timestamp as the Cassandra write timestamp makes inserts idempotent. Retries from network failures or message broker redelivery produce the same result — no duplicates in the database.

Snowflake IDs: Time-Ordered at Scale

Discord's message IDs are Snowflakes — 64-bit integers encoding time, worker ID, and sequence number. This is the same scheme Twitter introduced.

63     22 21     12 11      0
[timestamp][worker_id][sequence]
    42 bits    10 bits   12 bits

const DISCORD_EPOCH = 1420070400000n; // Jan 1, 2015
 
function generateSnowflake(workerId: bigint, sequence: bigint): bigint {
  const timestamp = BigInt(Date.now()) - DISCORD_EPOCH;
  return (timestamp << 22n) | (workerId << 12n) | sequence;
}
 
function snowflakeToTimestamp(id: bigint): Date {
  const ms = (id >> 22n) + DISCORD_EPOCH;
  return new Date(Number(ms));
}

Snowflakes give Discord:

  1. Globally unique IDs minted on any worker, with no central coordinator
  2. Chronological ordering by plain integer comparison, which is exactly what the clustering key needs
  3. A timestamp embedded in every ID, reused above for time bucketing and for idempotent write timestamps

Real-Time Delivery: WebSockets at Scale

Storing messages in Cassandra is the persistence layer. Getting messages to users in milliseconds requires a completely different system.

The Gateway: Millions of Persistent Connections

Discord's Gateway service maintains the WebSocket connection for every connected client. At peak, this is millions of simultaneous connections across thousands of Gateway instances.

class GatewaySession {
  private ws: WebSocket;
  private userId: bigint;
  private subscribedGuilds: Set<bigint>;
  private heartbeatInterval!: NodeJS.Timeout;
  private sequence = 0;

  constructor(ws: WebSocket, userId: bigint) {
    this.ws = ws;
    this.userId = userId;
    this.subscribedGuilds = new Set();
    this.startHeartbeat();
  }

  dispatch(event: GatewayEvent) {
    if (this.ws.readyState === WebSocket.OPEN) {
      this.ws.send(JSON.stringify({
        op: GatewayOpcode.DISPATCH,
        t: event.type,
        d: event.data,
        s: this.nextSequence(),
      }));
    }
  }

  private nextSequence(): number {
    return ++this.sequence;
  }

  private startHeartbeat() {
    this.heartbeatInterval = setInterval(() => {
      this.ws.send(JSON.stringify({ op: GatewayOpcode.HEARTBEAT }));
    }, 41250); // 41.25 s, matching the heartbeat_interval the gateway advertises
  }
}

Pub/Sub Fan-Out

When a message is sent to a channel, Discord needs to deliver it to every user currently viewing that channel — potentially across thousands of Gateway instances.

The fan-out pipeline:

Client sends message
        │
        ▼
  API Server
  ├── Write to Cassandra (persistence)
  └── Publish to Pub/Sub topic: "channel:{channel_id}"
        │
    ┌───┴──────┬─── ... ───┐
    ▼          ▼           ▼
Gateway A  Gateway B    Gateway N
(fans out  (fans out    (fans out
to local   to local     to local
sessions)  sessions)    sessions)

// API server — after writing to Cassandra
async function handleSendMessage(req: Request): Promise<Message> {
  const message = await persistMessage(req.body);
 
  // Fan out to all gateways subscribed to this channel
  await pubsub.publish(`channel:${message.channelId}`, {
    type: "MESSAGE_CREATE",
    data: message,
  });
 
  return message;
}
 
// Gateway — subscribes to channels where users are connected
class ChannelSubscriptionManager {
  private subscriptions = new Map<bigint, Set<GatewaySession>>();
 
  subscribe(channelId: bigint, session: GatewaySession) {
    if (!this.subscriptions.has(channelId)) {
      this.subscriptions.set(channelId, new Set());
      // Subscribe this gateway instance to the channel topic
      pubsub.subscribe(`channel:${channelId}`, (event) => {
        this.broadcast(channelId, event);
      });
    }
    this.subscriptions.get(channelId)!.add(session);
  }
 
  private broadcast(channelId: bigint, event: GatewayEvent) {
    const sessions = this.subscriptions.get(channelId);
    if (!sessions) return;
 
    for (const session of sessions) {
      session.dispatch(event);
    }
  }
}

Read States: The Hidden Scaling Nightmare

Read states — tracking which messages each user has read in each channel — turned out to be one of Discord's hardest engineering problems. It sounds simple: store {user_id, channel_id, last_read_message_id}. In practice, this table receives billions of writes per day.

Every time a user views a channel, their read state updates. With millions of active users across tens of millions of channels, this was the hottest write path in all of Discord.
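
The table itself is tiny, which is what makes the volume deceptive. A plausible schema (my sketch; the column names match the Rust code later in this post, not a published Discord definition):

CREATE TABLE read_states (
  user_id       bigint,
  channel_id    bigint,
  last_read_id  bigint,
  PRIMARY KEY   (user_id, channel_id)
);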

Why It Needed a Rewrite

Discord originally implemented read states in a separate service backed by Cassandra. The volume was manageable — until server boosts, threads, and forum channels multiplied the cardinality of (user_id, channel_id) combinations tenfold.

At peak, the read states service was handling write latency spikes that cascaded into visible unread badge lag — users would navigate to a channel and still see it marked unread seconds later.

The Rust Rewrite

Discord rewrote the read states service in Rust, replacing the original Go implementation. The key reasons:

  1. Memory efficiency: Rust's ownership model eliminates the garbage collector entirely. Go's GC had to scan the service's huge in-memory cache, forcing latency spikes every couple of minutes.
  2. Predictable latency: No stop-the-world GC events means consistent p99 latency.
  3. Cache density: Rust structs are tightly packed, so more read states fit in the same RAM.

The Rust service batches writes aggressively before flushing to Cassandra:

use std::collections::HashMap;
use tokio::sync::RwLock;
use std::time::{Duration, Instant};
 
struct ReadStateCache {
    // In-memory: user_id → channel_id → last_read_message_id
    states: RwLock<HashMap<u64, HashMap<u64, u64>>>,
    dirty: RwLock<HashMap<(u64, u64), u64>>,
    last_flush: RwLock<Instant>,
    flush_interval: Duration,
}
 
impl ReadStateCache {
    async fn mark_read(&self, user_id: u64, channel_id: u64, message_id: u64) {
        let mut states = self.states.write().await;
        states
            .entry(user_id)
            .or_default()
            .insert(channel_id, message_id);
 
        let mut dirty = self.dirty.write().await;
        dirty.insert((user_id, channel_id), message_id);
    }
 
    async fn flush_to_cassandra(&self, cassandra: &CassandraClient) {
        let mut dirty = self.dirty.write().await;
        if dirty.is_empty() {
            return;
        }
 
        let batch: Vec<_> = dirty.drain().collect();
        drop(dirty); // release lock before async I/O
 
        // Write in parallel batches
        for chunk in batch.chunks(500) {
            let futures: Vec<_> = chunk
                .iter()
                .map(|((user_id, channel_id), message_id)| {
                    cassandra.execute(
                        "UPDATE read_states SET last_read_id = ? \
                         WHERE user_id = ? AND channel_id = ?",
                        (*message_id, *user_id, *channel_id),
                    )
                })
                .collect();
 
            futures::future::join_all(futures).await;
        }
    }
}
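
A small background task can drive the flush on a fixed cadence. A sketch (mine, not from Discord's writeup) using the flush_interval field above:

use std::sync::Arc;

// Periodically drain the dirty set to Cassandra.
async fn run_flush_loop(cache: Arc<ReadStateCache>, cassandra: Arc<CassandraClient>) {
    let mut ticker = tokio::time::interval(cache.flush_interval);
    loop {
        ticker.tick().await;
        cache.flush_to_cassandra(&cassandra).await;
    }
}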

The result: read state write latency dropped by 99%, the unread badge lag disappeared, and the service used a fraction of the previous memory footprint.

Guaranteed Delivery and Resumption

Discord guarantees at-least-once delivery — no message disappears because of a network blip.

Sequence Numbers and Resumption

Every Gateway session maintains a monotonically increasing sequence number. If a client reconnects, it sends its last received sequence number and the Gateway replays any missed events.

class EventBuffer {
  private events: Map<number, GatewayEvent> = new Map();
  private maxSequence = 0;
 
  push(event: GatewayEvent): number {
    const seq = ++this.maxSequence;
    this.events.set(seq, event);
 
    // Keep only last 250 events in memory
    if (this.events.size > 250) {
      const oldest = this.maxSequence - 250;
      this.events.delete(oldest);
    }
 
    return seq;
  }
 
  replay(fromSequence: number, session: GatewaySession) {
    for (let seq = fromSequence + 1; seq <= this.maxSequence; seq++) {
      const event = this.events.get(seq);
      if (event) {
        session.dispatch(event);
      } else {
        // Gap in buffer — client needs full reconnect
        session.requestFullResync();
        return;
      }
    }
  }
}
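
On the client side, resumption is a single frame sent right after reconnecting. A minimal sketch (GatewayOpcode.RESUME is assumed to exist alongside DISPATCH and HEARTBEAT; Discord's documented RESUME payload also carries the auth token, omitted here):

// Ask the gateway to replay everything after the last sequence we processed
function sendResume(ws: WebSocket, sessionId: string, lastSeq: number): void {
  ws.send(JSON.stringify({
    op: GatewayOpcode.RESUME,
    d: { session_id: sessionId, seq: lastSeq },
  }));
}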

Exponential Backoff on Reconnect

The Discord client implements exponential backoff with jitter to prevent thundering herd when a Gateway instance restarts:

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function connectWithRetry(url: string): Promise<WebSocket> {
  let attempt = 0;
  const maxDelay = 32000;
 
  while (true) {
    try {
      return await connectWebSocket(url);
    } catch {
      const baseDelay = Math.min(1000 * 2 ** attempt, maxDelay);
      const jitter = Math.random() * 0.3 * baseDelay;
      const delay = baseDelay + jitter;
 
      console.log(`Reconnecting in ${delay.toFixed(0)}ms (attempt ${++attempt})`);
      await sleep(delay);
    }
  }
}

Without the jitter, millions of clients simultaneously reconnecting to the same Gateway instances after a restart would overwhelm them. The random jitter spreads reconnections over time.

Architecture Overview

                        ┌──────────────────┐
                        │  Discord Client  │
                        │ (Web/Mobile/App) │
                        └────────┬─────────┘
               WebSocket         │         REST
          ┌──────────────────────┼─────────────────────┐
          ▼                      ▼                     ▼
   ┌─────────────┐       ┌──────────────┐       ┌────────────┐
   │   Gateway   │       │  API Server  │       │ Read State │
   │  (Elixir)   │       │   (Python)   │       │  Service   │
   │  WebSocket  │       │  REST/gRPC   │       │   (Rust)   │
   └──────┬──────┘       └──────┬───────┘       └─────┬──────┘
          │                     │                     │
          │◄───── Pub/Sub ──────┤                     │
          │    (Redis/NATS)     │                     │
          │                     ▼                     ▼
          │             ┌──────────────────────────────┐
          └────────────►│      Cassandra Cluster       │
                        │   messages · read_states ·   │
                        │   channels · guild_members   │
                        └──────────────────────────────┘

Performance at Scale

Metric                              Value
──────────────────────────────────  ──────────
Daily messages                      4+ billion
Peak messages/second                ~100,000
Concurrent WebSocket connections    Millions
Message delivery p99 latency        <50 ms
Read state writes/day               Billions
Cassandra nodes                     Hundreds
Gateway instances                   Thousands

What You Can Take From Discord's Architecture

You do not need Discord's scale to apply these patterns — they solve problems that appear far earlier than billions of messages a day.

  1. Model your access patterns before choosing a database. Cassandra was right for Discord because chat is a time-series append workload. MongoDB failed not because it is a bad database, but because its B-tree storage model is a mismatch for sequential time-range reads.

  2. Snowflake IDs give you time-ordering for free. Any distributed system that needs ordered IDs without coordination should consider them.

  3. Batch writes aggressively for high-cardinality update workloads. The read states service was fixed by batching before Cassandra, not by making Cassandra faster.

  4. Rust's predictable latency matters when GC pauses cascade. For latency-sensitive hot paths, GC-managed runtimes introduce tail latency that compounds across service calls.

  5. Jitter in retry logic is not optional at scale. Thundering herd on reconnect is real and expensive.

Key Takeaways

Distributed systems at scale are full of decisions that look wrong until you understand the specific constraints. Discord's architecture is a case study in matching your data model to your access patterns — and being willing to rewrite when the mismatch becomes a production problem.


Found this useful? Follow on Twitter/X for more deep dives!