Designing a Gmail-Scale Email System: My High-Level System Design Study

System design interviews get much more interesting when the problem is not just "store messages" but "build a mailbox product that behaves like Gmail at massive scale." That was the focus of this study.

I wanted to design a Gmail-like email system that supports user registration, login with 2FA, profile creation, preferences, contacts and groups, sending emails with attachments, mailbox views like inbox/sent/spam/trash, tagging and labels, search, spam detection, and virus detection.

The more I worked through the problem, the clearer one thing became: this is not a single-service CRUD application. It is a collection of very different systems stitched together carefully, each with different performance, storage, and consistency needs.

The Scope I Designed For

I modeled the system around these assumptions:

  • 2 billion users
  • 50 emails per user per day
  • 5 percent of emails include a 1 MB attachment
  • 1 percent of users are active at a given time
  • 10 percent of users opt into two-factor authentication

That immediately changes the architecture. At this scale, raw email text is not the main problem. Attachments dominate storage. Search needs its own index. Spam and virus processing cannot live fully on the synchronous path. Hot data has to be cached based on active users, not total users.

Capacity Estimates That Shaped the Design

Here are the rough numbers I used during the study:

  • Email body storage per day: about 20 TB
  • Attachment storage per day: about 5 PB
  • With 3x replication, total daily storage quickly moves into the 15 PB range before deduplication or compression
  • Virus scanning is far more compute-heavy than spam classification because attachments are much larger than message bodies
  • Contact caching should be based on the hot active set, not the entire 2 billion-user base
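The estimates above can be verified with quick back-of-the-envelope arithmetic. This sketch uses only the stated assumptions; the 3x replication factor is a common default and an assumption here:

```python
# Back-of-the-envelope check of the capacity estimates above.
USERS = 2_000_000_000
EMAILS_PER_USER_PER_DAY = 50
ATTACHMENT_RATE = 0.05           # 5% of emails carry an attachment
ATTACHMENT_SIZE_BYTES = 1_000_000  # 1 MB
REPLICATION = 3                  # assumed replication factor

emails_per_day = USERS * EMAILS_PER_USER_PER_DAY              # 100 billion
attachments_per_day = int(emails_per_day * ATTACHMENT_RATE)   # 5 billion
attachment_bytes = attachments_per_day * ATTACHMENT_SIZE_BYTES

print(f"emails/day:           {emails_per_day:,}")
print(f"attachment bytes/day: {attachment_bytes / 1e15:.1f} PB")
print(f"with replication:     {attachment_bytes * REPLICATION / 1e15:.1f} PB")
print(f"active users (1%):    {int(USERS * 0.01):,}")
```

Attachments alone come out to 5 PB per day, which is why they dominate every storage and scanning decision that follows.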

Those estimates led to a few strong conclusions:

  • Message metadata and mailbox state belong in a fast distributed data store
  • Attachments must live in object storage
  • Search needs a dedicated inverted index
  • Spam and virus processing need asynchronous pipelines
  • Cache sizing must follow access patterns, not total dataset size

The Core Design Principle

The most important architectural decision in the whole system is this:

Keep the synchronous write path small, durable, and authoritative. Everything else should happen asynchronously off events.

That means the send-email path should do only a few things:

  1. Validate the request
  2. Persist the canonical message and per-user mailbox entries durably
  3. Emit an event for downstream systems
  4. Return success only after the durable write succeeds

This prevents the mail send path from being slowed down or broken by search indexing, spam classification, notifications, analytics, or contact ranking.
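The four-step path above can be sketched as a single handler. The in-memory "database" and its all-or-nothing commit are stand-ins for a real distributed store, not a specific product:

```python
# Minimal, self-contained sketch of the four-step synchronous send path.
class InMemoryDB:
    def __init__(self):
        self.tables = {"messages": [], "mailbox_entries": [], "outbox": []}

    def commit(self, staged):
        # All-or-nothing apply, imitating a transactional durable write.
        for table, rows in staged.items():
            self.tables[table].extend(rows)

def send_email(db, sender, recipients, subject, body):
    # Step 1: validate the request.
    if not recipients:
        raise ValueError("at least one recipient required")
    message_id = f"msg-{len(db.tables['messages']) + 1}"
    # Step 2: the canonical message plus per-user mailbox entries,
    # Step 3: and the event record, staged for one atomic commit.
    staged = {
        "messages": [{"id": message_id, "from": sender,
                      "subject": subject, "body": body}],
        "mailbox_entries": (
            [{"user": sender, "message_id": message_id, "folder": "SENT"}] +
            [{"user": r, "message_id": message_id, "folder": "INBOX"}
             for r in recipients]
        ),
        "outbox": [{"type": "MessageCreated", "message_id": message_id}],
    }
    db.commit(staged)
    # Step 4: acknowledge only after the durable write succeeded.
    return {"status": "sent", "message_id": message_id}
```

Note that nothing on this path touches search, spam, or notifications; those consume the emitted event later.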

The High-Level Architecture

Here is the architecture I converged on:

[Figure: High-level architecture for a Gmail-like email system, showing the edge plane, identity plane, mail write path, mailbox state, event bus, search, and attachment processing.]

I split the system into eight logical planes:

  • Edge plane: global load balancer, API gateway, WAF, auth middleware, and rate limiting
  • Identity plane: auth service, MFA service, token/session service, credential store, OTP/session cache
  • User metadata plane: profile service, preference service, contacts service, and contact group service
  • Mail write plane: compose/send API, draft service, recipient resolution, attachment validation
  • Mailbox plane: inbox/sent/spam/trash state, labels, threads, read/unread, archive
  • Attachment plane: upload service, object storage, metadata store, virus scanning, signed downloads
  • Async enrichment plane: event bus, search indexer, spam classifier, category classifier, notifications, analytics
  • Query plane: inbox queries, search API, autocomplete, thread fetch, message fetch, cache

This decomposition matters because the workload classes are completely different. Auth data is security-sensitive and low-latency. Mailbox metadata is heavily queried and updated. Attachments are large and mostly immutable. Search and spam state are derived systems that can lag slightly without breaking correctness.

Why the Mailbox Model Is the Most Important Data Modeling Decision

The biggest modeling insight in this design is that a message is not the same thing as mailbox state.

A naive approach says:

  • message -> labels
  • message -> read/unread
  • message -> spam

That falls apart immediately in a real email product.

The same message can be:

  • unread for one user
  • archived by another
  • labeled "Work" by a third
  • moved to spam by a fourth

So the canonical message must be separated from the per-user mailbox view.

That is why I introduced three distinct concepts:

  • Message: immutable shared content like subject, body, sender, headers, and thread ID
  • MessageRecipient: the recipient mapping for TO, CC, and BCC
  • MailboxEntry: the per-user mailbox projection containing inbox state, read state, category, star, archive, spam, and timestamps

This is the single most important correctness point in a Gmail-like design.
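The three-way split can be made concrete with a few dataclasses. Field names here are illustrative, not a production schema; the point is that Message is immutable and shared while MailboxEntry is mutable and per-user:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Message:                   # immutable shared content
    message_id: str
    thread_id: str
    sender: str
    subject: str
    body: str

@dataclass(frozen=True)
class MessageRecipient:          # TO / CC / BCC mapping
    message_id: str
    user_id: str
    kind: str                    # "TO", "CC", or "BCC"

@dataclass
class MailboxEntry:              # mutable per-user projection
    user_id: str
    message_id: str
    folder: str = "INBOX"        # INBOX, SENT, SPAM, TRASH
    read: bool = False
    starred: bool = False
    labels: set = field(default_factory=set)
```

Two users holding MailboxEntry rows for the same message_id can diverge freely: one marks it read, the other labels it "Work", and the shared Message never changes.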

Core Data Model

The main entities in my design are:

  • User
  • UserProfile
  • UserPreference
  • Contact
  • ContactGroup
  • ContactGroupMember
  • Message
  • MessageRecipient
  • Attachment
  • MailboxEntry
  • Label
  • LabelAssignment
  • Draft
  • Thread

Two details matter a lot here:

  • MailboxEntry is partitioned by user_id because inbox reads and mailbox actions are user-centric
  • Message can be partitioned by message_id because it is immutable shared content

This split makes reads and writes much easier to scale.
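The two partitioning choices can be sketched as routing functions; the shard count and hash are illustrative, but the property they show is the real one: all of a user's mailbox rows co-locate, while immutable messages spread uniformly.

```python
import hashlib

NUM_SHARDS = 1024  # illustrative shard count

def shard_for(key: str) -> int:
    # Stable hash so the same key always routes to the same shard.
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def mailbox_shard(user_id: str) -> int:
    return shard_for(user_id)      # inbox reads hit exactly one shard

def message_shard(message_id: str) -> int:
    return shard_for(message_id)   # shared content spreads across shards
```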

The End-to-End Send Email Flow

The send flow is the heart of the system.

1. Compose submission

The client calls POST /emails/send with:

  • recipients
  • subject
  • body
  • attachment IDs
  • an idempotency key

2. Validation

The Mail Write Service:

  • authenticates the sender
  • validates recipients and groups
  • checks attachment scan status
  • applies quota and rate-limit checks

3. Canonical durable write

The system persists:

  • the immutable Message
  • the MessageRecipient rows
  • the sender's MailboxEntry in SENT
  • recipient MailboxEntry rows in INBOX or an initial classified state

4. Success acknowledgement

The API returns success only after the durable write succeeds. This protects against acknowledging a send and then losing the message.

5. Event publication

The system emits a MessageCreated or MailboxEntryCreated event to a durable bus.

6. Async enrichment

Downstream consumers then handle:

  • search indexing
  • spam classification
  • promotions/social categorization
  • notification fanout
  • contact interaction ranking
  • analytics

This is the right tradeoff because it keeps the critical path small while still enabling rich product behavior.

Attachments Need a Separate Architecture

Attachments dominate storage and scanning cost, so they cannot be treated like regular email metadata.

The upload flow I designed looks like this:

  1. Client requests a signed upload URL
  2. Client uploads the file directly to object storage
  3. Upload service writes attachment metadata with status PENDING_SCAN
  4. An AttachmentUploaded event triggers the virus scanning pipeline
  5. The attachment becomes SAFE, INFECTED, QUARANTINED, or FAILED

The important design choice here is to avoid proxying large uploads through the app servers whenever possible. Direct-to-object-storage uploads keep the application tier lighter and cheaper.

The second important decision is quarantine: an unscanned or unsafe file should not be downloadable just because the upload succeeded.
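The lifecycle and the quarantine rule together form a small state machine. This sketch mirrors the statuses named above; the retry transition from FAILED is an assumption, and the key invariant is that only SAFE files are downloadable:

```python
# Attachment lifecycle with quarantine-by-default.
TRANSITIONS = {
    "PENDING_SCAN": {"SAFE", "INFECTED", "QUARANTINED", "FAILED"},
    "FAILED": {"PENDING_SCAN"},   # assumed: a failed scan may be retried
}

class Attachment:
    def __init__(self, object_key: str):
        self.object_key = object_key
        self.status = "PENDING_SCAN"   # set when metadata is written

    def transition(self, new_status: str):
        if new_status not in TRANSITIONS.get(self.status, set()):
            raise ValueError(f"illegal transition {self.status} -> {new_status}")
        self.status = new_status

    def downloadable(self) -> bool:
        # An upload succeeding is never enough; only a clean scan unlocks it.
        return self.status == "SAFE"
```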

Search, Spam, and Virus Processing Belong Off the Critical Path

Search is not a source of truth. It is a retrieval accelerator.

My search design uses a dedicated inverted index that stores:

  • message_id
  • user_id
  • sender and recipient fields
  • subject tokens
  • body tokens
  • attachment names
  • labels and categories
  • timestamps

The query flow is:

  1. User hits the search API
  2. Search service returns candidate IDs from the index
  3. Mailbox service fetches canonical metadata from the source-of-truth store
  4. Response is assembled and returned

This means search can be eventually consistent. A just-sent email may take a short time to appear in search, which is acceptable.
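The four-step query flow can be sketched with in-memory stand-ins for the index and the source-of-truth store. The filter on the hydration step is where eventual consistency shows up: stale index entries are silently dropped rather than breaking the response.

```python
# Index returns candidate IDs; source of truth hydrates them.
def search(index: dict, mailbox_store: dict, user_id: str, term: str):
    # Step 2: candidate message IDs from the inverted index.
    candidates = index.get((user_id, term.lower()), [])
    # Step 3: hydrate from the canonical store, dropping IDs the index
    # still holds but the store no longer has (eventual consistency).
    results = [mailbox_store[mid] for mid in candidates if mid in mailbox_store]
    # Step 4: assemble the response.
    return {"query": term, "results": results}
```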

Spam handling follows a similar hybrid model:

  • lightweight checks inline for sender reputation, blocklists, and policy validation
  • heavier content and behavior analysis asynchronously

Virus detection is even more expensive, so it is even more clearly an asynchronous pipeline with quarantine enforcement.
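The inline half of the hybrid spam model stays deliberately cheap. This sketch uses an illustrative blocklist, reputation table, and threshold; real systems would back these with dedicated services:

```python
# Synchronous, lightweight checks only; heavy analysis runs async.
BLOCKLIST = {"spammer@bad.example"}
REPUTATION = {
    "friend@good.example": 0.9,
    "shady@z.example": 0.1,
}
MIN_REPUTATION = 0.2   # assumed policy threshold
DEFAULT_REPUTATION = 0.5

def inline_spam_verdict(sender: str) -> str:
    if sender in BLOCKLIST:
        return "REJECT"                    # hard policy failure
    if REPUTATION.get(sender, DEFAULT_REPUTATION) < MIN_REPUTATION:
        return "SPAM_FOLDER"               # deliver, but flagged
    return "ACCEPT"                        # full content analysis happens async
```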

Storage Strategy

I used three major storage layers:

1. Distributed metadata store

For:

  • users
  • profiles
  • preferences
  • contacts
  • message metadata
  • recipients
  • mailbox entries
  • labels
  • drafts

2. Object storage

For:

  • attachments
  • large message bodies if needed
  • avatars
  • large draft bodies

3. Cache and search systems

For:

  • sessions
  • OTPs
  • rate limits
  • hot mailbox summaries
  • autocomplete hotsets
  • search indexes

The reason this split works is simple: structured OLTP workloads and large immutable blobs have completely different economics and performance characteristics.

Partitioning Strategy

Partitioning is easiest when it follows the access pattern.

  • Mailbox tables by user_id because inbox loads, archive, read/unread, labels, and spam actions are all user-centric
  • Message table by message_id or time-based shard because messages are immutable shared objects
  • Search by user_id ownership range because access control and query scoping become easier
  • Attachments by object key because object storage already scales this naturally

This is one of those design choices that looks small on paper but determines whether the system remains operational at scale.

Consistency Tradeoffs

A strong system design answer always draws the line between what must be strongly consistent and what can be eventually consistent.

Strong consistency required for

  • registration and account verification
  • password and session correctness
  • durable message write before send acknowledgement
  • mailbox entry creation tied to send success
  • label updates on source-of-truth mailbox records

Eventual consistency acceptable for

  • search indexing
  • spam and category updates
  • autocomplete freshness
  • contact ranking
  • notifications
  • analytics

That boundary is what keeps the system reliable without making every subsystem part of the critical path.

Failure Modes I Explicitly Designed Around

The architecture is only credible if it handles partial failure well.

Message written but event never published

Fix: use the transactional outbox pattern so the message write and outbox record happen in the same transaction.
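A minimal sketch of the outbox pattern, with an in-memory store standing in for the database: the message row and the outbox row commit together, and a separate relay publishes outbox rows, marking them only after a successful publish.

```python
class Store:
    def __init__(self):
        self.messages, self.outbox = [], []

    def write_message_with_outbox(self, message: dict):
        # Same "transaction": either both rows exist or neither does.
        self.messages.append(message)
        self.outbox.append({"event": "MessageCreated",
                            "message_id": message["id"],
                            "published": False})

def relay(store: Store, bus: list):
    # At-least-once delivery: publish first, mark second, so a crash
    # between the two steps causes a duplicate, never a loss.
    for row in store.outbox:
        if not row["published"]:
            bus.append(row["event"] + ":" + row["message_id"])
            row["published"] = True
```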

Event delivered more than once

Fix: make all consumers idempotent using event IDs, upserts, and dedupe checkpoints.
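A consumer dedupe check is small enough to sketch directly; in production the processed-ID set would live in a durable store rather than memory, but the invariant is the same: a redelivered event becomes a no-op.

```python
class IdempotentConsumer:
    def __init__(self):
        self.processed = set()   # in production: a durable dedupe store
        self.applied = []

    def handle(self, event: dict) -> str:
        event_id = event["event_id"]
        if event_id in self.processed:
            return "SKIPPED"     # duplicate delivery, already applied
        self.applied.append(event)
        self.processed.add(event_id)
        return "APPLIED"
```

This is what makes the at-least-once delivery of the outbox relay safe end to end.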

Attachment uploaded but never scanned

Fix: retries, timeout handling, dead-letter queues, and reconciliation jobs for stuck PENDING_SCAN files.

Search indexing lag grows too much

Fix: monitor queue lag, autoscale indexers, and optionally fall back to recent-message DB reads for fresh content.

Spam classifier goes down

Fix: keep lightweight inline defenses active and reclassify asynchronously when the classifier recovers.

These are the kinds of operational details that move a design from "diagram-level" to "senior-level."

Security Architecture

Because this is email, security is not a side note. It is a first-class subsystem.

I included:

  • password hashing with Argon2 or bcrypt
  • MFA support
  • refresh token revocation
  • brute-force and resend rate limiting
  • suspicious login detection
  • TLS in transit
  • encryption at rest
  • least-privilege service authentication
  • send-rate abuse controls
  • malware and phishing defenses

For a Gmail-like product, trust and abuse prevention are inseparable from the core architecture.

The Optional Internet Email Extension

If the interviewer means "Gmail" not just as a mailbox product but as a full internet email provider, I would extend the design with:

  • SMTP ingress and outbound relay
  • MX records
  • bounce processing
  • SPF, DKIM, and DMARC validation/signing
  • retry queues for outbound delivery

I think this is an important clarification in interviews because "email app" and "global email provider" are related but meaningfully different scopes.

What I Would Say In the Interview

If I had to summarize the design in a short, interview-ready answer, I would say:

I would design the system around a small, durable mail write path and a set of asynchronous enrichment pipelines. Identity is handled by a dedicated auth subsystem with Redis-backed OTP and session state. Mail stores immutable message content separately from per-user mailbox entries because labels, read state, spam state, and archive state are user-specific. Attachments go to object storage and are scanned asynchronously with quarantine until safe. After a message is durably written, the system emits events that drive search indexing, spam and category classification, notifications, contact-ranking updates, and analytics. The source-of-truth mailbox store is strongly consistent, while search and classification systems are eventually consistent.

That captures the core idea while still showing that the design is grounded in scale, correctness, and operational realism.

Final Takeaways

This study pushed me toward three big lessons:

  1. The mailbox model matters more than most people expect. Shared messages and per-user mailbox state must be separated.
  2. Attachments change everything. They dominate both storage and scanning cost.
  3. Asynchronous design is not optional at this scale. Search, spam, virus scanning, notifications, and analytics all need to scale independently of the send path.

What I like most about this problem is that it looks familiar on the surface, but underneath it forces you to think clearly about data modeling, consistency, queues, storage economics, and failure handling all at once.

If you are preparing for system design interviews, this is one of the best problems to study because it tests both breadth and architectural judgment.

Ayush Jaipuriar

Full Stack Software Engineer at TransUnion, specializing in modern web technologies and cloud solutions.
