Designing a Gmail-Scale Email System: My High-Level System Design Study

System design interviews get much more interesting when the problem is not just "store messages" but "build a mailbox product that behaves like Gmail at massive scale." That was the focus of this study.

I wanted to design a Gmail-like email system that supports user registration, login with 2FA, profile creation, preferences, contacts and groups, sending emails with attachments, mailbox views like inbox/sent/spam/trash, tagging and labels, search, spam detection, and virus detection.

The more I worked through the problem, the clearer one thing became: this is not a single-service CRUD application. It is a collection of very different systems stitched together carefully, each with different performance, storage, and consistency needs.

The Scope I Designed For

I modeled the system around these assumptions:

  • 2 billion users
  • 50 emails per user per day
  • 5 percent of emails include a 1 MB attachment
  • 1 percent of users are active at a given time
  • 10 percent of users opt into two-factor authentication

That immediately changes the architecture. At this scale, raw email text is not the main problem. Attachments dominate storage. Search needs its own index. Spam and virus processing cannot live fully on the synchronous path. Hot data has to be cached based on active users, not total users.

Capacity Estimates That Shaped the Design

Here are the rough numbers I used during the study:

  • Email body storage per day: about 20 TB
  • Attachment storage per day: about 5 PB
  • With 3x replication, total daily storage quickly moves into the 15 PB range before deduplication or compression
  • Virus scanning is far more compute-heavy than spam classification because attachments are much larger than message bodies
  • Contact caching should be based on the hot active set, not the entire 2 billion-user base
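The estimates above can be verified with quick back-of-the-envelope arithmetic. This sketch uses only the stated assumptions; the 3x replication factor is a common default and an assumption here:

```python
# Back-of-the-envelope check of the capacity estimates above.
USERS = 2_000_000_000
EMAILS_PER_USER_PER_DAY = 50
ATTACHMENT_RATE = 0.05           # 5% of emails carry an attachment
ATTACHMENT_SIZE_BYTES = 1_000_000  # 1 MB
REPLICATION = 3                  # assumed replication factor

emails_per_day = USERS * EMAILS_PER_USER_PER_DAY              # 100 billion
attachments_per_day = int(emails_per_day * ATTACHMENT_RATE)   # 5 billion
attachment_bytes = attachments_per_day * ATTACHMENT_SIZE_BYTES

print(f"emails/day:           {emails_per_day:,}")
print(f"attachment bytes/day: {attachment_bytes / 1e15:.1f} PB")
print(f"with replication:     {attachment_bytes * REPLICATION / 1e15:.1f} PB")
print(f"active users (1%):    {int(USERS * 0.01):,}")
```

Attachments alone come out to 5 PB per day, which is why they dominate every storage and scanning decision that follows.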

Those estimates led to a few strong conclusions:

  • Message metadata and mailbox state belong in a fast distributed data store
  • Attachments must live in object storage
  • Search needs a dedicated inverted index
  • Spam and virus processing need asynchronous pipelines
  • Cache sizing must follow access patterns, not total dataset size

The Core Design Principle

The most important architectural decision in the whole system is this:

Keep the synchronous write path small, durable, and authoritative. Everything else should happen asynchronously off events.

That means the send-email path should do only a few things:

  1. Validate the request
  2. Persist the canonical message and per-user mailbox entries durably
  3. Emit an event for downstream systems
  4. Return success only after the durable write succeeds

This prevents the mail send path from being slowed down or broken by search indexing, spam classification, notifications, analytics, or contact ranking.
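The four-step path above can be sketched as a single handler. The in-memory "database" and its all-or-nothing commit are stand-ins for a real distributed store, not a specific product:

```python
# Minimal, self-contained sketch of the four-step synchronous send path.
class InMemoryDB:
    def __init__(self):
        self.tables = {"messages": [], "mailbox_entries": [], "outbox": []}

    def commit(self, staged):
        # All-or-nothing apply, imitating a transactional durable write.
        for table, rows in staged.items():
            self.tables[table].extend(rows)

def send_email(db, sender, recipients, subject, body):
    # Step 1: validate the request.
    if not recipients:
        raise ValueError("at least one recipient required")
    message_id = f"msg-{len(db.tables['messages']) + 1}"
    # Step 2: the canonical message plus per-user mailbox entries,
    # Step 3: and the event record, staged for one atomic commit.
    staged = {
        "messages": [{"id": message_id, "from": sender,
                      "subject": subject, "body": body}],
        "mailbox_entries": (
            [{"user": sender, "message_id": message_id, "folder": "SENT"}] +
            [{"user": r, "message_id": message_id, "folder": "INBOX"}
             for r in recipients]
        ),
        "outbox": [{"type": "MessageCreated", "message_id": message_id}],
    }
    db.commit(staged)
    # Step 4: acknowledge only after the durable write succeeded.
    return {"status": "sent", "message_id": message_id}
```

Note that nothing on this path touches search, spam, or notifications; those consume the emitted event later.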

The High-Level Architecture

Here is the architecture I converged on:

[Figure: High-level architecture for a Gmail-like email system, showing the edge plane, identity plane, mail write path, mailbox state, event bus, search, and attachment processing.]

I split the system into eight logical planes:

  • Edge plane: global load balancer, API gateway, WAF, auth middleware, and rate limiting
  • Identity plane: auth service, MFA service, token/session service, credential store, OTP/session cache
  • User metadata plane: profile service, preference service, contacts service, and contact group service
  • Mail write plane: compose/send API, draft service, recipient resolution, attachment validation
  • Mailbox plane: inbox/sent/spam/trash state, labels, threads, read/unread, archive
  • Attachment plane: upload service, object storage, metadata store, virus scanning, signed downloads
  • Async enrichment plane: event bus, search indexer, spam classifier, category classifier, notifications, analytics
  • Query plane: inbox queries, search API, autocomplete, thread fetch, message fetch, cache

This decomposition matters because the workload classes are completely different. Auth data is security-sensitive and low-latency. Mailbox metadata is heavily queried and updated. Attachments are large and mostly immutable. Search and spam state are derived systems that can lag slightly without breaking correctness.

Why the Mailbox Model Is the Most Important Data Modeling Decision

The biggest modeling insight in this design is that a message is not the same thing as mailbox state.

A naive approach says:

  • message -> labels
  • message -> read/unread
  • message -> spam

That falls apart immediately in a real email product.

The same message can be:

  • unread for one user
  • archived by another
  • labeled "Work" by a third
  • moved to spam by a fourth

So the canonical message must be separated from the per-user mailbox view.

That is why I introduced three distinct concepts:

  • Message: immutable shared content like subject, body, sender, headers, and thread ID
  • MessageRecipient: the recipient mapping for TO, CC, and BCC
  • MailboxEntry: the per-user mailbox projection containing inbox state, read state, category, star, archive, spam, and timestamps

This is the single most important correctness point in a Gmail-like design.
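The three-way split can be made concrete with a few dataclasses. Field names here are illustrative, not a production schema; the point is that Message is immutable and shared while MailboxEntry is mutable and per-user:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Message:                   # immutable shared content
    message_id: str
    thread_id: str
    sender: str
    subject: str
    body: str

@dataclass(frozen=True)
class MessageRecipient:          # TO / CC / BCC mapping
    message_id: str
    user_id: str
    kind: str                    # "TO", "CC", or "BCC"

@dataclass
class MailboxEntry:              # mutable per-user projection
    user_id: str
    message_id: str
    folder: str = "INBOX"        # INBOX, SENT, SPAM, TRASH
    read: bool = False
    starred: bool = False
    labels: set = field(default_factory=set)
```

Two users holding MailboxEntry rows for the same message_id can diverge freely: one marks it read, the other labels it "Work", and the shared Message never changes.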

Core Data Model

The main entities in my design are:

  • User
  • UserProfile
  • UserPreference
  • Contact
  • ContactGroup
  • ContactGroupMember
  • Message
  • MessageRecipient
  • Attachment
  • MailboxEntry
  • Label
  • LabelAssignment
  • Draft
  • Thread

Two details matter a lot here:

  • MailboxEntry is partitioned by user_id because inbox reads and mailbox actions are user-centric
  • Message can be partitioned by message_id because it is immutable shared content

This split makes reads and writes much easier to scale.
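The two partitioning choices can be sketched as routing functions; the shard count and hash are illustrative, but the property they show is the real one: all of a user's mailbox rows co-locate, while immutable messages spread uniformly.

```python
import hashlib

NUM_SHARDS = 1024  # illustrative shard count

def shard_for(key: str) -> int:
    # Stable hash so the same key always routes to the same shard.
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def mailbox_shard(user_id: str) -> int:
    return shard_for(user_id)      # inbox reads hit exactly one shard

def message_shard(message_id: str) -> int:
    return shard_for(message_id)   # shared content spreads across shards
```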

The End-to-End Send Email Flow

The send flow is the heart of the system.

1. Compose submission

The client calls POST /emails/send with:

  • recipients
  • subject
  • body
  • attachment IDs
  • an idempotency key

2. Validation

The Mail Write Service:

  • authenticates the sender
  • validates recipients and groups
  • checks attachment scan status
  • applies quota and rate-limit checks

3. Canonical durable write

The system persists:

  • the immutable Message
  • the MessageRecipient rows
  • the sender's MailboxEntry in SENT
  • recipient MailboxEntry rows in INBOX or an initial classified state

4. Success acknowledgement

The API returns success only after the durable write succeeds. This protects against acknowledging a send and then losing the message.

5. Event publication

The system emits a MessageCreated or MailboxEntryCreated event to a durable bus.

6. Async enrichment

Downstream consumers then handle:

  • search indexing
  • spam classification
  • promotions/social categorization
  • notification fanout
  • contact interaction ranking
  • analytics

This is the right tradeoff because it keeps the critical path small while still enabling rich product behavior.

Attachments Need a Separate Architecture

Attachments dominate storage and scanning cost, so they cannot be treated like regular email metadata.

The upload flow I designed looks like this:

  1. Client requests a signed upload URL
  2. Client uploads the file directly to object storage
  3. Upload service writes attachment metadata with status PENDING_SCAN
  4. An AttachmentUploaded event triggers the virus scanning pipeline
  5. The attachment becomes SAFE, INFECTED, QUARANTINED, or FAILED

The important design choice here is to avoid proxying large uploads through the app servers whenever possible. Direct-to-object-storage uploads keep the application tier lighter and cheaper.

The second important decision is quarantine: an unscanned or unsafe file should not be downloadable just because the upload succeeded.
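The lifecycle and the quarantine rule together form a small state machine. This sketch mirrors the statuses named above; the retry transition from FAILED is an assumption, and the key invariant is that only SAFE files are downloadable:

```python
# Attachment lifecycle with quarantine-by-default.
TRANSITIONS = {
    "PENDING_SCAN": {"SAFE", "INFECTED", "QUARANTINED", "FAILED"},
    "FAILED": {"PENDING_SCAN"},   # assumed: a failed scan may be retried
}

class Attachment:
    def __init__(self, object_key: str):
        self.object_key = object_key
        self.status = "PENDING_SCAN"   # set when metadata is written

    def transition(self, new_status: str):
        if new_status not in TRANSITIONS.get(self.status, set()):
            raise ValueError(f"illegal transition {self.status} -> {new_status}")
        self.status = new_status

    def downloadable(self) -> bool:
        # An upload succeeding is never enough; only a clean scan unlocks it.
        return self.status == "SAFE"
```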

Search, Spam, and Virus Processing Belong Off the Critical Path

Search is not a source of truth. It is a retrieval accelerator.

My search design uses a dedicated inverted index that stores:

  • message_id
  • user_id
  • sender and recipient fields
  • subject tokens
  • body tokens
  • attachment names
  • labels and categories
  • timestamps

The query flow is:

  1. User hits the search API
  2. Search service returns candidate IDs from the index
  3. Mailbox service fetches canonical metadata from the source-of-truth store
  4. Response is assembled and returned

This means search can be eventually consistent. A just-sent email may take a short time to appear in search, which is acceptable.
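The four-step query flow can be sketched with in-memory stand-ins for the index and the source-of-truth store. The filter on the hydration step is where eventual consistency shows up: stale index entries are silently dropped rather than breaking the response.

```python
# Index returns candidate IDs; source of truth hydrates them.
def search(index: dict, mailbox_store: dict, user_id: str, term: str):
    # Step 2: candidate message IDs from the inverted index.
    candidates = index.get((user_id, term.lower()), [])
    # Step 3: hydrate from the canonical store, dropping IDs the index
    # still holds but the store no longer has (eventual consistency).
    results = [mailbox_store[mid] for mid in candidates if mid in mailbox_store]
    # Step 4: assemble the response.
    return {"query": term, "results": results}
```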

Spam handling follows a similar hybrid model:

  • lightweight checks inline for sender reputation, blocklists, and policy validation
  • heavier content and behavior analysis asynchronously

Virus detection is even more expensive, so it is even more clearly an asynchronous pipeline with quarantine enforcement.
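The inline half of the hybrid spam model stays deliberately cheap. This sketch uses an illustrative blocklist, reputation table, and threshold; real systems would back these with dedicated services:

```python
# Synchronous, lightweight checks only; heavy analysis runs async.
BLOCKLIST = {"spammer@bad.example"}
REPUTATION = {
    "friend@good.example": 0.9,
    "shady@z.example": 0.1,
}
MIN_REPUTATION = 0.2   # assumed policy threshold
DEFAULT_REPUTATION = 0.5

def inline_spam_verdict(sender: str) -> str:
    if sender in BLOCKLIST:
        return "REJECT"                    # hard policy failure
    if REPUTATION.get(sender, DEFAULT_REPUTATION) < MIN_REPUTATION:
        return "SPAM_FOLDER"               # deliver, but flagged
    return "ACCEPT"                        # full content analysis happens async
```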

Storage Strategy

I used three major storage layers:

1. Distributed metadata store

For:

  • users
  • profiles
  • preferences
  • contacts
  • message metadata
  • recipients
  • mailbox entries
  • labels
  • drafts

2. Object storage

For:

  • attachments
  • large message bodies if needed
  • avatars
  • large draft bodies

3. Cache and search systems

For:

  • sessions
  • OTPs
  • rate limits
  • hot mailbox summaries
  • autocomplete hotsets
  • search indexes

The reason this split works is simple: structured OLTP workloads and large immutable blobs have completely different economics and performance characteristics.

Partitioning Strategy

Partitioning is easiest when it follows the access pattern.

  • Mailbox tables by user_id because inbox loads, archive, read/unread, labels, and spam actions are all user-centric
  • Message table by message_id or time-based shard because messages are immutable shared objects
  • Search by user_id ownership range because access control and query scoping become easier
  • Attachments by object key because object storage already scales this naturally

This is one of those design choices that looks small on paper but determines whether the system remains operational at scale.

Consistency Tradeoffs

A strong system design answer always draws the line between what must be strongly consistent and what can be eventually consistent.

Strong consistency required for

  • registration and account verification
  • password and session correctness
  • durable message write before send acknowledgement
  • mailbox entry creation tied to send success
  • label updates on source-of-truth mailbox records

Eventual consistency acceptable for

  • search indexing
  • spam and category updates
  • autocomplete freshness
  • contact ranking
  • notifications
  • analytics

That boundary is what keeps the system reliable without making every subsystem part of the critical path.

Failure Modes I Explicitly Designed Around

The architecture is only credible if it handles partial failure well.

Message written but event never published

Fix: use the transactional outbox pattern so the message write and outbox record happen in the same transaction.
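A minimal sketch of the outbox pattern, with an in-memory store standing in for the database: the message row and the outbox row commit together, and a separate relay publishes outbox rows, marking them only after a successful publish.

```python
class Store:
    def __init__(self):
        self.messages, self.outbox = [], []

    def write_message_with_outbox(self, message: dict):
        # Same "transaction": either both rows exist or neither does.
        self.messages.append(message)
        self.outbox.append({"event": "MessageCreated",
                            "message_id": message["id"],
                            "published": False})

def relay(store: Store, bus: list):
    # At-least-once delivery: publish first, mark second, so a crash
    # between the two steps causes a duplicate, never a loss.
    for row in store.outbox:
        if not row["published"]:
            bus.append(row["event"] + ":" + row["message_id"])
            row["published"] = True
```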

Event delivered more than once

Fix: make all consumers idempotent using event IDs, upserts, and dedupe checkpoints.
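A consumer dedupe check is small enough to sketch directly; in production the processed-ID set would live in a durable store rather than memory, but the invariant is the same: a redelivered event becomes a no-op.

```python
class IdempotentConsumer:
    def __init__(self):
        self.processed = set()   # in production: a durable dedupe store
        self.applied = []

    def handle(self, event: dict) -> str:
        event_id = event["event_id"]
        if event_id in self.processed:
            return "SKIPPED"     # duplicate delivery, already applied
        self.applied.append(event)
        self.processed.add(event_id)
        return "APPLIED"
```

This is what makes the at-least-once delivery of the outbox relay safe end to end.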

Attachment uploaded but never scanned

Fix: retries, timeout handling, dead-letter queues, and reconciliation jobs for stuck PENDING_SCAN files.

Search indexing lag grows too much

Fix: monitor queue lag, autoscale indexers, and optionally fall back to recent-message DB reads for fresh content.

Spam classifier goes down

Fix: keep lightweight inline defenses active and reclassify asynchronously when the classifier recovers.

These are the kinds of operational details that move a design from "diagram-level" to "senior-level."

Security Architecture

Because this is email, security is not a side note. It is a first-class subsystem.

I included:

  • password hashing with Argon2 or bcrypt
  • MFA support
  • refresh token revocation
  • brute-force and resend rate limiting
  • suspicious login detection
  • TLS in transit
  • encryption at rest
  • least-privilege service authentication
  • send-rate abuse controls
  • malware and phishing defenses

For a Gmail-like product, trust and abuse prevention are inseparable from the core architecture.

The Optional Internet Email Extension

If the interviewer means "Gmail" not just as a mailbox product but as a full internet email provider, I would extend the design with:

  • SMTP ingress and outbound relay
  • MX records
  • bounce processing
  • SPF, DKIM, and DMARC validation/signing
  • retry queues for outbound delivery

I think this is an important clarification in interviews because "email app" and "global email provider" are related but meaningfully different scopes.

What I Would Say In the Interview

If I had to summarize the design in a short, interview-ready answer, I would say:

I would design the system around a small, durable mail write path and a set of asynchronous enrichment pipelines. Identity is handled by a dedicated auth subsystem with Redis-backed OTP and session state. Mail stores immutable message content separately from per-user mailbox entries because labels, read state, spam state, and archive state are user-specific. Attachments go to object storage and are scanned asynchronously with quarantine until safe. After a message is durably written, the system emits events that drive search indexing, spam and category classification, notifications, contact-ranking updates, and analytics. The source-of-truth mailbox store is strongly consistent, while search and classification systems are eventually consistent.

That captures the core idea while still showing that the design is grounded in scale, correctness, and operational realism.

Final Takeaways

This study pushed me toward three big lessons:

  1. The mailbox model matters more than most people expect. Shared messages and per-user mailbox state must be separated.
  2. Attachments change everything. They dominate both storage and scanning cost.
  3. Asynchronous design is not optional at this scale. Search, spam, virus scanning, notifications, and analytics all need to scale independently of the send path.

What I like most about this problem is that it looks familiar on the surface, but underneath it forces you to think clearly about data modeling, consistency, queues, storage economics, and failure handling all at once.

If you are preparing for system design interviews, this is one of the best problems to study because it tests both breadth and architectural judgment.

Ayush Jaipuriar

Full Stack Software Engineer at TransUnion, specializing in modern web technologies and cloud solutions.
