System Design - News Feed

1. Understand the problem and establish design scope

Is this a mobile app or a web app: Both
What are the important features: A user can publish a post and see her friends' posts on the news feed page
Is the news feed sorted by reverse chronological order or any particular order: the feed is sorted by reverse chronological order
How many friends can a user have: 5000
What is the traffic volume: 10 million DAU
Can feed contain images, videos, or just text: It can contain media files, including both images and videos.

The design is divided into two flows:

Feed publishing: when a user publishes a post, the data is written into the cache and database, and the post is added to her friends' news feeds.
Newsfeed building: the news feed is built by aggregating friends' posts in reverse chronological order

Feed publishing API: the client sends an HTTP POST request to the server to publish a post. For example: POST /v1/me/feed, params:
- content: the text of the post
- auth_token: used to authenticate API requests
Newsfeed retrieval API: the client sends an HTTP GET request to the server to retrieve the news feed. For example: GET /v1/me/feed, params:
- auth_token: it is used to authenticate API requests
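The two calls above could be built roughly as follows. This is only a sketch: the host name is a placeholder, and sending the params as a JSON body (POST) and a query string (GET) is an assumption, since the notes do not say how the params are encoded. The requests are constructed but not sent.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host, not part of the original design


def build_publish_request(content: str, auth_token: str) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/me/feed request."""
    # Assumption: params travel as a JSON body.
    body = json.dumps({"content": content, "auth_token": auth_token}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/me/feed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_retrieval_request(auth_token: str) -> urllib.request.Request:
    """Build (but do not send) the GET /v1/me/feed request."""
    # Assumption: auth_token travels as a query parameter.
    query = urllib.parse.urlencode({"auth_token": auth_token})
    return urllib.request.Request(url=f"{BASE_URL}/v1/me/feed?{query}", method="GET")
```

In practice the token would more likely go in an Authorization header; the params are kept as listed above to match the notes.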

User: user can view news feeds on a browser or mobile app
Load balancer: distribute traffic to web servers
Web servers: redirect traffic to different internal services
Post service: persist post in the database and cache
Fanout service: push new content to friends' news feed. Newsfeed data is stored in the cache for fast retrieval
Notification service: inform friends that new content is available and send out push notifications

Newsfeed Building

The high-level design briefly covered two flows: feed publishing and news feed building.

We will focus on two components:

Web servers: only users signed in with a valid auth_token are allowed to make posts. The system also limits the number of posts a user can make within a certain period, to prevent spam and abusive content
Fanout service: fanout is the process of delivering a post to all friends. There are two fanout models: fanout on write (also called the push model) and fanout on read (also called the pull model)
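The post limit enforced on the web servers could be sketched as a sliding-window limiter. The notes do not name an algorithm or give limits, so the class name, the window approach, and the numbers in the usage example are all assumptions.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class PostRateLimiter:
    """Allow at most max_posts per user within any window_seconds span."""

    def __init__(self, max_posts: int, window_seconds: float):
        self.max_posts = max_posts
        self.window_seconds = window_seconds
        # user_id -> timestamps of that user's recently accepted posts, oldest first
        self._timestamps = defaultdict(deque)

    def allow(self, user_id: str, now: Optional[float] = None) -> bool:
        """Return True and record the post if the user is under the limit."""
        now = time.monotonic() if now is None else now
        stamps = self._timestamps[user_id]
        # Drop timestamps that have fallen out of the window.
        while stamps and now - stamps[0] >= self.window_seconds:
            stamps.popleft()
        if len(stamps) >= self.max_posts:
            return False
        stamps.append(now)
        return True
```

For example, `PostRateLimiter(max_posts=2, window_seconds=60)` accepts two posts from a user and rejects a third until the first one is at least 60 seconds old. A real deployment would keep this state in a shared store rather than in one web server's memory.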

Fanout on write: the news feed is pre-computed at write time. A new post is delivered to friends' caches immediately after it is published.

Pros:
- The news feed can be pushed to friends immediately
- Fetching news feed is fast because the news feed is pre-computed
Cons:
- If a user has many friends, fetching the friend list and generating news feeds for all of them is slow. This is the hotkey problem
- For inactive users or those who rarely log in, pre-computing news feeds wastes computing resources
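A minimal in-memory sketch of fanout on write, assuming plain dicts stand in for the friend list and the per-user feed cache. The names `friends`, `feed_cache`, `FEED_LIMIT`, and the sample users are illustrative, not part of the original design.

```python
from collections import defaultdict, deque

FEED_LIMIT = 3  # assumed cap on cached entries per feed; kept tiny for the demo

# Stand-in for the friend list: author_id -> friend IDs.
friends = {"alice": ["bob", "carol"], "bob": ["alice"]}

# Stand-in for the news feed cache: user_id -> post IDs, newest first.
feed_cache = defaultdict(lambda: deque(maxlen=FEED_LIMIT))


def publish(author_id: str, post_id: str) -> None:
    """Write path: push the new post ID into every friend's feed."""
    for friend_id in friends.get(author_id, []):
        # appendleft keeps newest first; a full deque drops its oldest entry.
        feed_cache[friend_id].appendleft(post_id)


def read_feed(user_id: str) -> list:
    """Read path: the feed is already built, so this is a plain lookup."""
    return list(feed_cache[user_id])
```

The cost is visible in `publish`: the loop runs once per friend, which is what becomes slow for a user with many friends.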

Fanout on read: the news feed is generated at read time. Recent posts are pulled when a user loads her home page.

Pros:
- For inactive users or those who rarely log in, fanout on read works better
- Data is not pushed to friends so there is no hotkey problem
Cons:
- Fetching the news feed is slow because it is not pre-computed
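A minimal in-memory sketch of fanout on read. It assumes each author's posts are kept newest first and that timestamps increase with every publish; `posts_by_author`, `friends`, and the sample users are illustrative names.

```python
import heapq
from itertools import islice

# Stand-in for the friend list: user_id -> friend IDs.
friends = {"bob": ["alice", "carol"]}

# author_id -> [(timestamp, post_id), ...], newest first.
posts_by_author = {}


def publish(author_id: str, post_id: str, timestamp: int) -> None:
    """Write path: store the post once, under its author. No fanout."""
    # Assumes timestamps increase, so inserting at the front keeps newest first.
    posts_by_author.setdefault(author_id, []).insert(0, (timestamp, post_id))


def read_feed(user_id: str, limit: int = 10) -> list:
    """Read path: merge all friends' timelines in reverse chronological order."""
    timelines = [posts_by_author.get(f, []) for f in friends.get(user_id, [])]
    # reverse=True expects every input sorted from largest to smallest.
    merged = heapq.merge(*timelines, reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]
```

Here the write is cheap and the read does the work: every feed load touches each friend's timeline, which is why fetching is slower than in the push model.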

We adopt a hybrid approach to get the benefits of both models and avoid their pitfalls.

For the majority of users: use a push model
For users who have many friends/followers: let friends/followers pull news content to avoid system overload
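The hybrid rule could be sketched as below. The cutoff `PUSH_THRESHOLD` is an assumed value (the notes give no number), and the dicts are in-memory stand-ins for the real stores.

```python
from collections import defaultdict

PUSH_THRESHOLD = 2  # assumed cutoff on audience size; tiny for the demo

# Stand-in for the social graph: author_id -> follower/friend IDs.
followers = {"celebrity": ["u1", "u2", "u3"], "alice": ["u1"]}

feed_cache = defaultdict(list)  # user_id -> [(timestamp, post_id)] pushed at write time
own_posts = defaultdict(list)   # author_id -> [(timestamp, post_id)]


def is_pull_author(author_id: str) -> bool:
    """Authors with a large audience are pulled at read time, not pushed."""
    return len(followers.get(author_id, [])) > PUSH_THRESHOLD


def publish(author_id: str, post_id: str, timestamp: int) -> None:
    own_posts[author_id].append((timestamp, post_id))
    if not is_pull_author(author_id):
        # Push model for the majority of users.
        for user_id in followers.get(author_id, []):
            feed_cache[user_id].append((timestamp, post_id))


def read_feed(user_id: str, following: list) -> list:
    """Combine pushed entries with posts pulled from large-audience authors."""
    entries = list(feed_cache[user_id])
    for author_id in following:
        if is_pull_author(author_id):
            entries.extend(own_posts[author_id])
    entries.sort(reverse=True)  # newest first
    return [post_id for _, post_id in entries]
```

With this data, a post by `alice` is written into `u1`'s cached feed, while a post by `celebrity` is only merged in when `u1` reads her feed.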

Fanout Service Workflow

Fetch friend IDs from the graph database. Graph databases are well suited to managing friend relationships and friend recommendations.
Get friends info from the user cache: filter out friends based on user settings
- If a user mutes someone, that person's posts will not show up on the user's news feed
- A user can choose to share information only with specific friends
Send friends list and new post ID to the message queue
Fanout workers fetch data from the message queue and store news feed data in the news feed cache
Store <post_id, user_id> in news feed cache
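The workflow above can be sketched end to end with in-memory stand-ins: a dict for the graph database, a dict for the user cache, and `queue.Queue` for the message queue. All names and sample data are illustrative, and the `user_id` stored next to each `post_id` is taken to be the author, which is an interpretation of the notes.

```python
import queue
from collections import defaultdict

friend_graph = {"alice": ["bob", "carol", "dave"]}  # stand-in for the graph database
user_cache = {                                      # stand-in for the user cache
    "bob": {"muted": set()},
    "carol": {"muted": {"alice"}},  # carol has muted alice
    "dave": {"muted": set()},
}
message_queue = queue.Queue()
news_feed_cache = defaultdict(list)  # user_id -> [(post_id, author_id)], newest first


def fanout(author_id: str, post_id: str) -> None:
    """Fetch friend IDs, filter by user settings, then enqueue the job."""
    friend_ids = friend_graph.get(author_id, [])
    recipients = [
        friend_id
        for friend_id in friend_ids
        if author_id not in user_cache.get(friend_id, {}).get("muted", set())
    ]
    message_queue.put((post_id, author_id, recipients))


def run_worker() -> None:
    """Fanout worker: drain the queue and write entries into the feed cache."""
    while True:
        try:
            post_id, author_id, recipients = message_queue.get_nowait()
        except queue.Empty:
            return
        for user_id in recipients:
            news_feed_cache[user_id].insert(0, (post_id, author_id))
        message_queue.task_done()
```

Only IDs are stored in the feed cache, which keeps each entry small; the full post and user objects live elsewhere.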

Newsfeed Retrieval Deep Dive

Media content (images, videos, etc.) is stored in a CDN for fast retrieval. Below is how a client retrieves the news feed:

A user sends a request to retrieve her news feed
The load balancer distributes requests to web servers
Web servers call the news feed service to fetch news feeds
News feed service gets a list of post IDs from the news feed cache
A user's news feed is more than just a list of post IDs. It contains username, profile picture, post content, post image, etc., so the news feed service fetches the full user and post objects from the caches to build it
The fully hydrated news feed is returned in JSON format back to the client for rendering
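The hydration step could look roughly like this, with dicts standing in for the news feed, post, and user caches. The field names and the CDN URL are hypothetical.

```python
import json

news_feed_cache = {"bob": ["p2", "p1"]}  # user_id -> post IDs, newest first
post_cache = {
    "p1": {"author_id": "alice", "content": "hello", "image_url": None},
    "p2": {
        "author_id": "alice",
        "content": "second post",
        "image_url": "https://cdn.example.com/p2.jpg",  # hypothetical CDN URL
    },
}
user_cache = {
    "alice": {
        "username": "alice",
        "profile_picture": "https://cdn.example.com/alice.png",  # hypothetical
    }
}


def get_news_feed(user_id: str, limit: int = 20) -> str:
    """Turn cached post IDs into a fully hydrated feed, returned as JSON."""
    items = []
    for post_id in news_feed_cache.get(user_id, [])[:limit]:
        post = post_cache.get(post_id)
        if post is None:
            continue  # post missing from the cache; a real service would fall back to the database
        author = user_cache.get(post["author_id"], {})
        items.append({
            "post_id": post_id,
            "username": author.get("username"),
            "profile_picture": author.get("profile_picture"),
            "content": post["content"],
            "image_url": post["image_url"],
        })
    return json.dumps({"feed": items})
```

The response carries only media URLs; the client fetches the images and videos themselves from the CDN.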

Cache Architecture

We divide the cache tier into 5 layers:

News Feed: store IDs of news feeds
Content: store the data of every post
Social Graph: store user relationship data
Action: store info about whether a user liked a post, replied to a post, or took other actions on a post
Counters: store counters for likes, replies, followers, following, etc.
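One way to keep the five layers apart is to give each its own key namespace. The prefixes and the `cache_key` helper below are assumptions for illustration; the notes do not specify a key scheme.

```python
from enum import Enum


class CacheLayer(Enum):
    """The five cache layers, each mapped to an assumed key prefix."""
    NEWS_FEED = "feed"
    CONTENT = "content"
    SOCIAL_GRAPH = "graph"
    ACTION = "action"
    COUNTERS = "counter"


def cache_key(layer: CacheLayer, *parts) -> str:
    """Build a namespaced key such as 'counter:p1:likes'."""
    return ":".join([layer.value, *map(str, parts)])
```

For example, `cache_key(CacheLayer.NEWS_FEED, "bob")` gives `"feed:bob"`, so a user's feed IDs and her follower counter never collide even if they share one cache cluster.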

Scaling the database:

Other talking points: