Designing Twitter - A System Design Interview Question
Last Updated : 31 Oct, 2025
Designing Twitter (or Facebook feed or Facebook search..) is a quite common question that interviewers ask candidates. A lot of candidates get afraid of this round more than the coding round because they don't get an idea of what topics and tradeoffs they should cover within this limited timeframe.
Don't jump into the technical details immediately when you are asked this question in your interviews. Do not run in one direction, it will just create confusion between you and the interviewer. Most of the candidates make mistakes here and immediately they start listing out some bunch of tools or frameworks like MongoDB, Bootstrap, MapReduce, etc.
You can put yourself in a situation where you're working on real-life projects. Firstly, define the problem and clarify the problem statement.
2. Requirements for Twitter System Design
2.1 Functional Requirements:
Should be able to post new tweets (can be text, image, video etc).
Should be able to follow other users.
Should have a newsfeed feature consisting of tweets from the people the user is following.
Home timeline (“following” feed) and user profile timeline.
High availability for read‑heavy surfaces (home, profiles, search): p99 < 300–500 ms.
Write availability for tweet/engagement: p99 < 150–250 ms end‑to‑end.
Durability: no lost acknowledged tweets.
Scalability: horizontal scale across regions.
Consistency: timeline eventually consistent (< few seconds), strong for identity/handles.
Cost: leverage CDN for media, caching for hot keys, tiered storage for cold data.
2.3 Explicit Non‑Goals (MVP)
Direct Messages, Audio Spaces, Ads, Recommendations beyond basic ranking (note as future work).
3. Capacity Estimation for Twitter System Design
To estimate the system's capacity, we need to analyze the expected daily click rate.
3.1 Traffic Estimation:
Let us assume we have 1 billion total users with 200 million daily active users (DAU), and on average each user tweets 5 times a day. This give us 1 billion tweets per day.
200 million * 5 tweets = 1 billion/day
Tweets can also contains media such as images, or videos. We can assume that 10 percent of tweets are media files shared by users, which gives us additional 100 million files we would need to store.
10 percent * 1 billion = 100 million/day
For our System Request per Second (RPS) will be:
1 billion requests per day translate into 12K requests per second.
Lets assume each message on average is 100 bytes, we will require about 100 GB of database storage every day.
1 billion * 100 bytes = 100 GB/day
10 percent of our daily messages (100 million) are media files per our requirements. Let's assume each file is 50KB on average, we will require 5 TB of storage everyday.
A 64-bit integer that encodes time + machine identity + per-millisecond sequence, so IDs are roughly increasing by time and unique without round-trips to a DB.
41 bits — milliseconds since a chosen epoch (e.g., 2010-11-04). Range ≈ 69 years from the epoch.
10 bits — node identity (often split as 5 bits datacenter + 5 bits worker). Up to 1024 workers globally.
12 bits — sequence within the same millisecond. Up to 4096 IDs/ms/worker (~4.1M/sec across 1000 workers).
2. Where Snowflake IDs are used
Tweet IDs, Retweet IDs, Notification IDs, Timeline Entry IDs → fast range scans by time, stable pagination cursors.
Shard routing: For high write locality you’ll usually shard tweets by authorId hash, not by tweetId. Use tweetId for lookups and secondary indexes.
Search & analytics: The time component helps build time-windowed indexes and stream processors (late data detection).
3. Assigning worker IDs (10-bit space)
Static mapping on service boot via a small control plane (e.g., “ID Authority” service) that leases a (datacenterId, workerId) pair with TTL; renew on heartbeat.
Protect against double assignment (e.g., VM reimage): lease IDs, don’t hardcode; on lease loss, stop issuing IDs.
4. Failure & edge cases to plan for
Sequence rollover within 1 ms: wait next ms (most common), or route a portion of traffic to a shadow generator.
Clock regression: refuse to generate (fail fast) vs. “regression sequence” mode; always page oncall it’s a production smell.
Exhaustion horizon: 41-bit time runs ~69 years from your epoch; document the epoch and set a calendar alert long in advance.
5. User IDs vs. Tweet IDs
You can also issue userId via Snowflake (good to avoid DB sequences).
Keep handle and userId separate: handle is mutable (rename), userId is immutable.
A dedicated “handles” table maps handle → userId with a strict unique constraint; a secondary mapping userId → currentHandle helps joins and backfills.
4. Use Case Design for Twitter System Design
In the above Diagram,
User will click on Twitter Page they will get the main page inside main page, there will be Home Page, Search Page, Notification Page.
Inside Home Page there will be new Tweet page as well as Post Image or Videos.
In new Tweet there we will be like, dislike, comments as well as follow / unfollow button.
Guest User will have only access to view any tweet.
Registered use can view and post tweets. Can follow and unfollow other users.
Registered User will able to create new tweets.
5. Low Level Design for Twitter System Design
A low-level design of Twitter dives into the details of individual components and functionalities. Here's a breakdown of some key aspects:
User accounts: Store user data like username, email, password (hashed), profile picture, bio, etc. in a relational database like MySQL.
Tweets: Store tweets in a separate table within the same database, including tweet content, author ID, timestamp, hashtags, mentions, retweets, replies, etc.
Follow relationships: Use a separate table to map followers and followees, allowing efficient retrieval of user feeds.
Additional data: Store media assets like images or videos in a dedicated storage system like S3 and reference them in the tweet table.
5.2 Core functionalities:
Posting a Tweet: User submits content -> validated (length, media, etc.) ->stored with hashtags/mentions -> followers & mentions notified.
Timeline Generation: Fetch followed users/hashtags -> retrieve & rank tweets (relevance, recency, engagement) -> cache in Redis for speed.
For twitter we are using microservices architecture since it will make it easier to horizontally scale and decouple our services. Each service will have ownership of its own data model. We will divide our system into some cores services.
6.2 User Services
This service handles user related concern such as authentication and user information. Login Page, Sign Up page, Profile Page and Home page will be handle into User services.
6.3 Newsfeed Service:
This service will handle the generation and publishing of user newsfeed. We will discuss about newsfeed in more details. When it comes to the newsfeed, it seems easy enough to implement, but there are a lot of things that can make or break this features. So, let's divide our problem into two parts:
6.3.1 Generation:
Let's assume we want to generate the feed for user A, we will perform the following steps:
Retrieve the ID's of all the users and the enitities( hashtags, topics, etc.)
Fetch the relevant algorithm to rank the tweets on paramaters such as relevance, time management, etc.
Use a ranking algorithm to rank the tweets based on parameters such as relevance, time, engagement, etc.
Return the ranked tweets data to the client in a paginated manner.
Feed geneartion is an intensive process and can take quite a lot of time, especially for users following a lot of people. To imporve the performance, the feed can be pre-generated and stored in the cache, then we can have a mechanism to periodically update the feed and apply or ranking algorithm to the new tweets.
6.3.2 Publishing
Publishing is the step where the feed data us pushed according to each specify user. This can be a quite heavy operation, as a user may have million of friend or followers. To deal with this, we have three different approcahes:
Pull Model (Fan-out on Load): Feeds are generated only when a user requests them, reducing database writes. Drawback: Recent tweets appear only on reload, increasing read operations.
Push Model (Fan-out on Write): Tweets are immediately pushed to all followers’ feeds, reducing read-time computation. Drawback: Increases write operations on the database.
Hybrid Model: Combines push and pull strategies—users with few followers use push, while high-follower users (e.g., celebrities) use pull, balancing read and write operations.
6.4 Tweet service:
The tweet service handle tweet-related use case such as posting a tweet, favorites, etc.
6.5 Retweets :
Retweets are one of our extended requirements. To implement this feature, we can simply create a new tweet with the user id of the user retweeting the original tweet and then modify the type enum and content property of the new tweet to link it with the original tweet.
6.6 Search Service:
This service is responsible for handling search related functionality. In search service we get the Top post, latest post etc. These things we get because of ranking.
6.7 Media Service:
This service will handle the media(images, videos, files etc.) uploads.
6.8 Analytics Service:
This service will be use for metrics and analytics use cases.
6.9 Ranking Algorithm:
We will need a ranking algorithm to rank each tweet according to its relevance to each specific user.
Example: Facebook used to utilize an EdgeRank algorithm. Here, the rank of each feed item is described by:
Rank = Affinity * Weight * Decay
Where,
Affinity : is the "closeness" of the user to the creator of the edge. If a user frequently likes, comments, or messages the edge creator, then the value of affinity will be higher, resulting in a higher rank for the post.
Weight : is the value assigned according to each edge. A comment can have a higher weightage than likes, and thus a post with more comments is more likely to get a higher rank.
Decay : is the measure of the creation of the edge. The older the edge, the lesser will be the value of decay and eventually the rank.
Now a days, algorithms are much more complex and ranking is done using machine learning models which can take thousands of factors into consideration.
6.10 Search Service
Sometimes traditional DBMS are not performant enough, we need something which allows us to store, search, and analyze huge volumes of data quickly and in near real-time and give results within milliseconds. Elasticsearch can help us with this use case.
Elasticsearch is a distributed, free and open search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured. It is built on top of Apache Lucene.
6.11 How do we identify trending topics?
Trending functionality will be based on top of the search functionality.
We can cache the most frequently searched queries, hashtags, and topics in the last N seconds and update them every M seconds using some sort of batch job mechanism.
Our ranking algorithm can also be applied to the trending topics to give them more weight and personalize them for the user.
6.12 Notifications Service:
Push notifications are an integral part of any social media platform.
We can use a message queue or a message broker such as Apache Kafka with the notification service to dispatch requests to Firebase Cloud Messaging (FCM) or Apple Push Notification Service (APNS) which will handle the delivery of the push notifications to user devices.
6.13 Architecture Overview
[Mobile/Web] │ HTTPS / HTTP2 + TLS ▼ [API Gateway / Edge] ──> [Auth] ──> [Rate Limiter] ──> [Service Mesh] │ │ │ ├── Tweet Service (write path) │ ├── Engagement Service (likes/retweets/replies) │ ├── Social Graph Service (follow edges) │ ├── Timeline Service (home/profile) │ ├── Search Service (query/index) │ ├── Media Service (upload/origin) │ ├── Notification Service (push/email) │ └── User Service (profiles, identity) │ ├─► CDN (images/video, thumbnails) └─► WebSocket/SSE fanout (live updates)
Core Data Plane:
- Caches (Redis/Memcache clusters) in front of stores
While our data model quite relational, we don't necessarily need to store everything in a single database, as this can limit our scalability and quickly become a bottleneck.
We will split the data between different services each having ownership over a particular table. The we can use a realtional database such as PostgreSQL or a distributed NoSQL database such as Apache Cassandra PostgreSQL for our usecase.
Media URL ( string ) : URL of the attached media (optional) .
Result ( boolean ) : Represents whether the operation was successful or not.
8.2 Follow or unfollow a user
This API will allow the user to follow or unfollow another user.
Parameters
Follower ID (UUID): ID of the current user.
Followee ID (UUID): ID of the user we want to follow or unfollow.
Media URL (string): URL of the attached media (optional).
Result (boolean) : Represents whether the operation was successful or not.
8.3 Get NewsFeed
This API will return all the tweets to be shown within a given newsfeed.
Parameters
User ID (UUID): ID of the user.
Tweets (Tweet[]): All the tweets to be shown within a given newsfeed.
9. Microservices Used for Twitter System Design
9.1 Data Partitioning
To scale out our databases we will need to partition our data. Horizontal partitioning (aka Sharding) can be a good first step. We can use partitions schemes such as:
Hash-Based Partitioning
List-Based Partitioning
Range Based Partitioning
Composite Partitioning
The above approaches can still cause uneven data and load distribution, we can solve this using Consistent hashing.
9.2 Mutual friends
For mutual friends, we can build a social graph for every user. Each node in the graph will represent a user and a directional edge will represent followers and followees.
After that, we can traverse the followers of a user to find and suggest a mutual friend. This would require a graph database such as Neo4j and ArangoDB.
This is a pretty simple algorithm, to improve our suggestion accuracy, we will need to incorporate a recommendation model which uses machine learning as part of our algorithm.
9.3 Metrics and Analytics
Recording analytics and metrics is one of our extended requirements.
As we will be using Apache Kafka to publish all sorts of events, we can process these events and run analytics on the data using Apache Spark which is an open-source unified analytics engine for large-scale data processing.
9.4 Caching
In a social media application, we have to be careful about using cache as our users expect the latest data. So, to prevent usage spikes from our resources we can cache the top 20% of the tweets.
To further improve efficiency we can add pagination to our system APIs. This decision will be helpful for users with limited network bandwidth as they won't have to retrieve old messages unless requested.
9.5 Media access and storage
As we know, most of our storage space will be used for storing media files such as images, videos, or other files. Our media service will be handling both access and storage of the user media files.
But where can we store files at scale? Well, object storage is what we're looking for. Object stores break data files up into pieces called objects.
It then stores those objects in a single repository, which can be spread out across multiple networked systems. We can also use distributed file storage such as HDFS or GlusterFS.
9.6 Content Delivery Network (CDN)
Content Delivery Network (CDN) increases content availability and redundancy while reducing bandwidth costs.
Generally, static files such as images, and videos are served from CDN. We can use services like Amazon CloudFront or Cloudflare CDN for this use case.
10. Scalability for Twitter System Design
Let us identify and resolve Scalability such as single points of failure in our design:
"What if one of our services crashes?"
"How will we distribute our traffic between our components?"
"How can we reduce the load on our database?"
"How to improve the availability of our cache?"
"How can we make our notification system more robust?"
"How can we reduce media storage costs"?
To make our system more resilient we can do the following:
Running multiple instances of each of our services.
Introducing load balancers between clients, servers, databases, and cache servers.
Using multiple read replicas for our databases.
Multiple instances and replicas for our distributed cache.
Exactly once delivery and message ordering is challenging in a distributed system, we can use a dedicated message broker such as Apache Kafka or NATS to make our notification system more robust.
We can add media processing and compression capabilities to the media service to compress large files which will save a lot of storage space and reduce cost.
11. Caching Strategy
Frontline: CDN for media & static assets; HTTP cache headers.
Data: Redis/Memcache for tweet bodies, user profiles, timeline pages; use write-through for counters; read-through with TTL + jitter; negative-cache 404s briefly.
Key invalidation: on tweet edit/delete, on follow changes, on engagement thresholds.
12. Partitioning & Hot Keys
Use per-author queues to smooth spikes; backpressure and QoS.
13. Consistency Model
Strong: identity (handles/emails), payments.
Eventual: home timeline materialization, counts, trends.
Read-your-writes: route user to home region; session stickiness or client-side merge to show newly sent tweet immediately.
Rate limiting per IP/user/app; dynamic shields during events.
Abuse/Spam: ML signals (account age, device fingerprints, graph patterns), risk scoring on write; shadow-ban/quarantine queues; user reporting workflows.
Privacy: blocks/mutes filtering in all reads; Right-to-Erasure: tombstones + GC in OLTP, deindex in search, purge from caches/CDN; data retention policy by region (GDPR/CCPA).
Content safety: media hashing (perceptual), NSFW classifiers; age gates.
16. Cost Controls
Push more through CDN (thumbs/variants, far-future TTL).
Compute autoscaling on QPS/backlog; spot/preemptible for batch.
Defer heavy recompute (e.g., rank refresh) for low-value cohorts.
17. Failure Modes & Mitigations (Interview-Gold)
Hot celebrity tweet: push for normal users, pull for celebrity; cap fanout concurrency, enqueue per-follower buckets; protect caches from thundering herds with request collapse.
Search indexing lag: serve latest via realtime tier; degrade to “latest only” if top tier fails.
Cache outage: hit stores with read fences + partial pagination; reduce page size; disable ranked sort.
Kafka partition loss: dual-write to mirror; consumer offsets backed by transactional store.