Designing a Gmail-Scale Email System: My High-Level System Design Study
System design interviews get much more interesting when the problem is not just "store messages" but "build a mailbox product that behaves like Gmail at massive scale." That was the focus of this study.
I wanted to design a Gmail-like email system that supports user registration, login with 2FA, profile creation, preferences, contacts and groups, sending emails with attachments, mailbox views like inbox/sent/spam/trash, tagging and labels, search, spam detection, and virus detection.
The more I worked through the problem, the clearer one thing became: this is not a single-service CRUD application. It is a collection of very different systems stitched together carefully, each with different performance, storage, and consistency needs.
The Scope I Designed For
I modeled the system around these assumptions:
- 2 billion users
- 50 emails per user per day
- 5 percent of emails include a 1 MB attachment
- 1 percent of users are active at a given time
- 10 percent of users opt into two-factor authentication
That immediately changes the architecture. At this scale, raw email text is not the main problem. Attachments dominate storage. Search needs its own index. Spam and virus processing cannot live fully on the synchronous path. Hot data has to be cached based on active users, not total users.
Capacity Estimates That Shaped the Design
Here are the rough numbers I used during the study:
- Email body storage per day: about 20 TB
- Attachment storage per day: about 5 PB
- With 3x replication, attachment storage alone pushes total daily storage toward 15 PB before optimization
- Virus scanning is far more compute-heavy than spam classification because attachments are much larger than message bodies
- Contact caching should be based on the hot active set, not the entire 2 billion-user base
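The arithmetic behind these estimates is worth making explicit. Here is a quick back-of-envelope check in Python; the average body size of 200 bytes is my assumption, while the other inputs come from the stated scope:

```python
# Back-of-envelope capacity arithmetic for the estimates above.
USERS = 2_000_000_000
EMAILS_PER_USER_PER_DAY = 50
ATTACH_RATE = 0.05           # 5 percent of emails carry an attachment
ATTACH_SIZE_MB = 1
AVG_BODY_BYTES = 200         # assumption: ~200 bytes of body + metadata per email
REPLICATION = 3

emails_per_day = USERS * EMAILS_PER_USER_PER_DAY                          # 100 billion
body_tb_per_day = emails_per_day * AVG_BODY_BYTES / 1e12                  # ~20 TB
attach_pb_per_day = emails_per_day * ATTACH_RATE * ATTACH_SIZE_MB / 1e9   # ~5 PB
replicated_pb_per_day = attach_pb_per_day * REPLICATION                   # ~15 PB

print(emails_per_day, body_tb_per_day, attach_pb_per_day, replicated_pb_per_day)
```

Attachments are roughly 250x the body volume, which is why they dominate every downstream decision.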
Those estimates led to a few strong conclusions:
- Message metadata and mailbox state belong in a fast distributed data store
- Attachments must live in object storage
- Search needs a dedicated inverted index
- Spam and virus processing need asynchronous pipelines
- Cache sizing must follow access patterns, not total dataset size
The Core Design Principle
The most important architectural decision in the whole system is this:
Keep the synchronous write path small, durable, and authoritative. Everything else should happen asynchronously off events.
That means the send-email path should do only a few things:
- Validate the request
- Persist the canonical message and per-user mailbox entries durably
- Emit an event for downstream systems
- Return success only after the durable write succeeds
This prevents the mail send path from being slowed down or broken by search indexing, spam classification, notifications, analytics, or contact ranking.
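The shape of that critical path can be sketched in a few lines. `MailStore` and the dict-based storage here are illustrative stand-ins, not a real storage API:

```python
import uuid

class MailStore:
    """Stand-in for the durable, authoritative metadata store."""
    def __init__(self):
        self.messages = {}
        self.events = []

    def write(self, message, event):
        # In a real system the message write and the event record would be
        # committed atomically (the outbox pattern covered under failure modes).
        self.messages[message["id"]] = message
        self.events.append(event)

def send_email(store, sender, recipients, subject, body):
    # 1. Validate the request.
    if not recipients:
        raise ValueError("at least one recipient required")
    # 2. Persist the canonical message durably.
    msg = {"id": str(uuid.uuid4()), "from": sender, "to": list(recipients),
           "subject": subject, "body": body}
    # 3. Emit an event for downstream systems.
    store.write(msg, {"type": "MessageCreated", "message_id": msg["id"]})
    # 4. Return success only after the durable write succeeded.
    return msg["id"]

store = MailStore()
msg_id = send_email(store, "a@example.com", ["b@example.com"], "hi", "hello")
print(msg_id in store.messages, len(store.events))
```

Everything else in the system hangs off that single emitted event.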
The High-Level Architecture
The architecture I converged on splits the system into eight logical planes:
- Edge plane: global load balancer, API gateway, WAF, auth middleware, and rate limiting
- Identity plane: auth service, MFA service, token/session service, credential store, OTP/session cache
- User metadata plane: profile service, preference service, contacts service, and contact group service
- Mail write plane: compose/send API, draft service, recipient resolution, attachment validation
- Mailbox plane: inbox/sent/spam/trash state, labels, threads, read/unread, archive
- Attachment plane: upload service, object storage, metadata store, virus scanning, signed downloads
- Async enrichment plane: event bus, search indexer, spam classifier, category classifier, notifications, analytics
- Query plane: inbox queries, search API, autocomplete, thread fetch, message fetch, cache
This decomposition matters because the workload classes are completely different. Auth data is security-sensitive and low-latency. Mailbox metadata is heavily queried and updated. Attachments are large and mostly immutable. Search and spam state are derived systems that can lag slightly without breaking correctness.
Why the Mailbox Model Is the Most Important Data Modeling Decision
The biggest modeling insight in this design is that a message is not the same thing as mailbox state.
A naive approach says:
- message -> labels
- message -> read/unread
- message -> spam
That falls apart immediately in a real email product.
The same message can be:
- unread for one user
- archived by another
- labeled "Work" by a third
- moved to spam by a fourth
So the canonical message must be separated from the per-user mailbox view.
That is why I introduced three distinct concepts:
- Message: immutable shared content like subject, body, sender, headers, and thread ID
- MessageRecipient: the recipient mapping for TO, CC, and BCC
- MailboxEntry: the per-user mailbox projection containing inbox state, read state, category, star, archive, spam, and timestamps
This is the single most important correctness point in a Gmail-like design.
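The split is easy to see in code. This is an illustrative sketch, not a full schema; field names are my own:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Message:
    """Immutable shared content: one row per message."""
    message_id: str
    sender: str
    subject: str
    body: str

@dataclass
class MailboxEntry:
    """Per-user mailbox projection: one row per (user, message)."""
    user_id: str
    message_id: str
    folder: str = "INBOX"
    read: bool = False
    labels: set = field(default_factory=set)

msg = Message("m1", "a@example.com", "hi", "hello")
alice = MailboxEntry("alice", "m1")
bob = MailboxEntry("bob", "m1", folder="SPAM", read=True)
alice.labels.add("Work")
# The same message diverges per user without touching the shared row:
print(alice.folder, bob.folder, sorted(alice.labels))
```

Marking `Message` frozen mirrors the design rule that shared content is immutable, while all mutable state lives on `MailboxEntry`.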
Core Data Model
The main entities in my design are:
- User
- UserProfile
- UserPreference
- Contact
- ContactGroup
- ContactGroupMember
- Message
- MessageRecipient
- Attachment
- MailboxEntry
- Label
- LabelAssignment
- Draft
- Thread
Two details matter a lot here:
- MailboxEntry is partitioned by user_id because inbox reads and mailbox actions are user-centric
- Message can be partitioned by message_id because it is immutable shared content
This split makes reads and writes much easier to scale.
The End-to-End Send Email Flow
The send flow is the heart of the system.
1. Compose submission
The client calls POST /emails/send with:
- recipients
- subject
- body
- attachment IDs
- an idempotency key
2. Validation
The Mail Write Service:
- authenticates the sender
- validates recipients and groups
- checks attachment scan status
- applies quota and rate-limit checks
3. Canonical durable write
The system persists:
- the immutable Message
- the MessageRecipient rows
- the sender's MailboxEntry in SENT
- recipient MailboxEntry rows in INBOX or an initial classified state
4. Success acknowledgement
The API returns success only after the durable write succeeds. This protects against acknowledging a send and then losing the message.
5. Event publication
The system emits a MessageCreated or MailboxEntryCreated event to a durable bus.
6. Async enrichment
Downstream consumers then handle:
- search indexing
- spam classification
- promotions/social categorization
- notification fanout
- contact interaction ranking
- analytics
This is the right tradeoff because it keeps the critical path small while still enabling rich product behavior.
Attachments Need a Separate Architecture
Attachments dominate storage and scanning cost, so they cannot be treated like regular email metadata.
The upload flow I designed looks like this:
- Client requests a signed upload URL
- Client uploads the file directly to object storage
- Upload service writes attachment metadata with status PENDING_SCAN
- An AttachmentUploaded event triggers the virus scanning pipeline
- The attachment becomes SAFE, INFECTED, QUARANTINED, or FAILED
The important design choice here is to avoid proxying large uploads through the app servers whenever possible. Direct-to-object-storage uploads keep the application tier lighter and cheaper.
The second important decision is quarantine: an unscanned or unsafe file should not be downloadable just because the upload succeeded.
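The quarantine rule reduces to a small gate on the download path. A minimal sketch, using the status names from the pipeline above (the signed-URL machinery itself is out of scope):

```python
# Only fully scanned, clean files are downloadable.
DOWNLOADABLE = {"SAFE"}

def can_download(scan_status):
    # A file that is unscanned (PENDING_SCAN) or unsafe is never downloadable,
    # even though the upload itself succeeded.
    return scan_status in DOWNLOADABLE

print(can_download("SAFE"), can_download("PENDING_SCAN"), can_download("INFECTED"))
```

The signed-download service would consult this check before minting any URL, so safety is enforced at read time rather than trusted from upload time.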
Search, Spam, and Virus Processing Belong Off the Critical Path
Search is not a source of truth. It is a retrieval accelerator.
My search design uses a dedicated inverted index that stores:
- message_id
- user_id
- sender and recipient fields
- subject tokens
- body tokens
- attachment names
- labels and categories
- timestamps
The query flow is:
- User hits the search API
- Search service returns candidate IDs from the index
- Mailbox service fetches canonical metadata from the source-of-truth store
- Response is assembled and returned
This means search can be eventually consistent. A just-sent email may take a short time to appear in search, which is acceptable.
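A toy version of that query flow makes the "index for candidates, source of truth for data" split concrete. The data here is illustrative:

```python
# Inverted index: (user, token) -> candidate message IDs.
index = {("alice", "report"): ["m1", "m3", "m9"]}
# Source-of-truth mailbox store: canonical metadata.
mailbox = {
    "m1": {"subject": "Q3 report"},
    "m3": {"subject": "report draft"},
}

def search(user_id, token):
    candidates = index.get((user_id, token), [])
    # Hydrate from the canonical store; stale index entries ("m9") drop out here.
    return [mailbox[m] for m in candidates if m in mailbox]

print([hit["subject"] for hit in search("alice", "report")])
```

Because the index only supplies IDs, a lagging or slightly stale index degrades recall, never correctness.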
Spam handling follows a similar hybrid model:
- lightweight checks inline for sender reputation, blocklists, and policy validation
- heavier content and behavior analysis asynchronously
Virus detection is even more expensive, so it is even more clearly an asynchronous pipeline with quarantine enforcement.
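The inline spam layer is intentionally cheap. A sketch of what "lightweight checks inline" might look like, with a hypothetical blocklist; real reputation systems are far richer:

```python
# Cheap synchronous sender checks; heavy content analysis runs async later.
BLOCKLIST = {"spam.example"}

def passes_inline_checks(sender):
    domain = sender.rsplit("@", 1)[-1].lower()
    return domain not in BLOCKLIST

print(passes_inline_checks("a@good.example"), passes_inline_checks("x@SPAM.example"))
```

The point of the hybrid model is that this check costs microseconds on the send path, while the asynchronous classifier can take seconds without anyone noticing.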
Storage Strategy
I used three major storage layers:
1. Distributed metadata store
For:
- users
- profiles
- preferences
- contacts
- messages metadata
- recipients
- mailbox entries
- labels
- drafts
2. Object storage
For:
- attachments
- large message bodies if needed
- avatars
- large draft bodies
3. Cache and search systems
For:
- sessions
- OTPs
- rate limits
- hot mailbox summaries
- autocomplete hotsets
- search indexes
The reason this split works is simple: structured OLTP workloads and large immutable blobs have completely different economics and performance characteristics.
Partitioning Strategy
Partitioning is easiest when it follows the access pattern.
- Mailbox tables by user_id because inbox loads, archive, read/unread, labels, and spam actions are all user-centric
- Message table by message_id or time-based shard because messages are immutable shared objects
- Search by user_id ownership range because access control and query scoping become easier
- Attachments by object key because object storage already scales this naturally
This is one of those design choices that looks small on paper but determines whether the system remains operational at scale.
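The user_id partitioning itself is simple: a stable hash so every mailbox row for a user lands on the same shard. A sketch, with an arbitrary shard count:

```python
import hashlib

NUM_SHARDS = 64  # illustrative; real systems pick this from capacity planning

def shard_for(user_id):
    # Stable hash: the same user always maps to the same shard, so a full
    # inbox load touches exactly one shard.
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

print(shard_for("alice") == shard_for("alice"), 0 <= shard_for("bob") < NUM_SHARDS)
```

In production this would typically be consistent hashing or range-based ownership so shards can be split without remapping every user, but the locality property is the same.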
Consistency Tradeoffs
A strong system design answer always draws the line between what must be strongly consistent and what can be eventually consistent.
Strong consistency required for
- registration and account verification
- password and session correctness
- durable message write before send acknowledgement
- mailbox entry creation tied to send success
- label updates on source-of-truth mailbox records
Eventual consistency acceptable for
- search indexing
- spam and category updates
- autocomplete freshness
- contact ranking
- notifications
- analytics
That boundary is what keeps the system reliable without making every subsystem part of the critical path.
Failure Modes I Explicitly Designed Around
The architecture is only credible if it handles partial failure well.
Message written but event never published
Fix: use the transactional outbox pattern so the message write and outbox record happen in the same transaction.
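The outbox fix can be sketched with SQLite standing in for the real store: the message row and the outbox row commit or roll back together, so an event can never be lost between the write and the publish:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE messages (id TEXT PRIMARY KEY, body TEXT);
CREATE TABLE outbox (event_id INTEGER PRIMARY KEY AUTOINCREMENT,
                     message_id TEXT, type TEXT, published INTEGER DEFAULT 0);
""")

def write_message_with_outbox(msg_id, body):
    with db:  # one transaction: both rows commit, or neither does
        db.execute("INSERT INTO messages VALUES (?, ?)", (msg_id, body))
        db.execute("INSERT INTO outbox (message_id, type) VALUES (?, ?)",
                   (msg_id, "MessageCreated"))

write_message_with_outbox("m1", "hello")
# A separate relay process polls unpublished outbox rows and publishes them.
rows = db.execute("SELECT message_id, type FROM outbox WHERE published = 0").fetchall()
print(rows)
```

The relay marks rows published only after the bus acknowledges, which shifts the failure mode from "lost event" to "duplicate event", handled next.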
Event delivered more than once
Fix: make all consumers idempotent using event IDs, upserts, and dedupe checkpoints.
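An idempotent consumer is the mirror image of the outbox: dedupe on event ID so at-least-once delivery never double-applies an effect. A minimal sketch:

```python
# Dedupe checkpoint; in production this would be a durable store, not a set.
processed_event_ids = set()
indexed_messages = []

def handle_message_created(event):
    if event["event_id"] in processed_event_ids:
        return  # duplicate delivery: drop silently
    processed_event_ids.add(event["event_id"])
    indexed_messages.append(event["message_id"])  # e.g. index the message

event = {"event_id": "ev-1", "message_id": "m1"}
handle_message_created(event)
handle_message_created(event)  # redelivered by the bus
print(indexed_messages)
```

The same pattern also works with upserts keyed on message_id, which makes the side effect itself idempotent instead of guarding it.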
Attachment uploaded but never scanned
Fix: retries, timeout handling, dead-letter queues, and reconciliation jobs for stuck PENDING_SCAN files.
Search indexing lag grows too much
Fix: monitor queue lag, autoscale indexers, and optionally fall back to recent-message DB reads for fresh content.
Spam classifier goes down
Fix: keep lightweight inline defenses active and reclassify asynchronously when the classifier recovers.
These are the kinds of operational details that move a design from "diagram-level" to "senior-level."
Security Architecture
Because this is email, security is not a side note. It is a first-class subsystem.
I included:
- password hashing with Argon2 or bcrypt
- MFA support
- refresh token revocation
- brute-force and resend rate limiting
- suspicious login detection
- TLS in transit
- encryption at rest
- least-privilege service authentication
- send-rate abuse controls
- malware and phishing defenses
For a Gmail-like product, trust and abuse prevention are inseparable from the core architecture.
The Optional Internet Email Extension
If the interviewer means "Gmail" not just as a mailbox product but as a full internet email provider, I would extend the design with:
- SMTP ingress and outbound relay
- MX records
- bounce processing
- SPF, DKIM, and DMARC validation/signing
- retry queues for outbound delivery
I think this is an important clarification in interviews because "email app" and "global email provider" are related but meaningfully different scopes.
What I Would Say In the Interview
If I had to summarize the design in a short, interview-ready answer, I would say:
I would design the system around a small, durable mail write path and a set of asynchronous enrichment pipelines. Identity is handled by a dedicated auth subsystem with Redis-backed OTP and session state. Mail stores immutable message content separately from per-user mailbox entries because labels, read state, spam state, and archive state are user-specific. Attachments go to object storage and are scanned asynchronously with quarantine until safe. After a message is durably written, the system emits events that drive search indexing, spam and category classification, notifications, contact-ranking updates, and analytics. The source-of-truth mailbox store is strongly consistent, while search and classification systems are eventually consistent.
That captures the core idea while still showing that the design is grounded in scale, correctness, and operational realism.
Final Takeaways
This study pushed me toward three big lessons:
- The mailbox model matters more than most people expect. Shared messages and per-user mailbox state must be separated.
- Attachments change everything. They dominate both storage and scanning cost.
- Asynchronous design is not optional at this scale. Search, spam, virus scanning, notifications, and analytics all need to scale independently of the send path.
What I like most about this problem is that it looks familiar on the surface, but underneath it forces you to think clearly about data modeling, consistency, queues, storage economics, and failure handling all at once.
If you are preparing for system design interviews, this is one of the best problems to study because it tests both breadth and architectural judgment.
Ayush Jaipuriar
Full Stack Software Engineer at TransUnion, specializing in modern web technologies and cloud solutions.