Design Questions
- Required features: upload and download files, file sync, notification
- Supported platforms: Web app, Mobile app
- Calculation:
- Storage: 10 million DAU * 10 GB = 100 PB
- QPS: 10 million DAU * 2 uploads / day / 86,400 seconds ≈ 231 QPS, rounded up to about 240 QPS
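The estimates can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the inputs stated in the notes (10 million DAU, 10 GB per user, 2 uploads per user per day); the variable names and the use of decimal units (1 PB = 1,000,000 GB) are assumptions made for illustration.

```python
# Back-of-envelope estimates; all inputs come from the notes.
DAU = 10_000_000               # daily active users
QUOTA_GB = 10                  # storage per user, in GB
UPLOADS_PER_DAY = 2            # uploads per user per day
SECONDS_PER_DAY = 24 * 60 * 60

# Assumes decimal units: 1 PB = 1,000,000 GB.
storage_pb = DAU * QUOTA_GB / 1_000_000
upload_qps = DAU * UPLOADS_PER_DAY / SECONDS_PER_DAY

print(f"storage: {storage_pb:.0f} PB")
print(f"upload QPS: {upload_qps:.0f}")
```

The division by 86,400 is the step that is easy to drop: uploads per day only become queries per second after dividing by the number of seconds in a day.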
High Level Design
- Components
- API servers: serve upload and download requests
- Metadata cache and metadata database
- Block servers: split files into blocks, save blocks on cloud storage, assemble files from blocks
- Notification servers: notify users about the file status
- CUJ: upload files
- User sends a request to the API servers with the file metadata, e.g. file name, type and size
- API servers update the metadata cache and database
- API servers send a request to the notification servers to announce that a new file is being added
- User uploads file content to block servers
- Block servers chunk the file into blocks, compress and encrypt each block, and upload the blocks to cloud storage
- Block servers update file metadata in cache and database
- Block servers send a request to the notification servers to announce that the file is uploaded
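The block server's part of the upload flow could be sketched roughly as follows. Everything here is an illustrative assumption rather than something the notes specify: the 4 MiB block size, SHA-256 as the block identifier, zlib for compression, and a plain dict standing in for cloud storage. Encryption is deliberately left out because the Python standard library has no symmetric cipher.

```python
import hashlib
import zlib

BLOCK_SIZE = 4 * 1024 * 1024  # assumed block size; the notes do not give one


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split file content into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def prepare_block(block: bytes) -> tuple[str, bytes]:
    """Return (content hash, compressed payload) for one block.

    Encryption is omitted in this sketch; a real block server would
    encrypt the compressed payload before uploading it.
    """
    digest = hashlib.sha256(block).hexdigest()
    return digest, zlib.compress(block)


def upload_file(data: bytes, storage: dict[str, bytes],
                block_size: int = BLOCK_SIZE) -> list[str]:
    """Store every block in `storage` (a stand-in for cloud storage) and
    return the ordered block hashes to record in the file metadata."""
    block_ids = []
    for block in split_into_blocks(data, block_size):
        digest, payload = prepare_block(block)
        storage[digest] = payload
        block_ids.append(digest)
    return block_ids


# Usage: a 10-byte "file" with a tiny block size yields three blocks.
cloud = {}
ids = upload_file(b"abcdefghij", cloud, block_size=4)
```

The ordered list of block hashes is what the block server would write back to the metadata cache and database, since it is enough to locate and reorder the blocks later.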
- CUJ: download files
- User sends a request to the API servers
- API servers fetch and return file metadata to user
- User sends requests to the block servers to download the blocks
- Block servers download blocks from cloud storage and send to user
- User reconstructs the file from all the blocks received from the block servers
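Reassembling a file from its blocks might look like the sketch below. It assumes the file metadata holds an ordered list of block IDs, that each ID is the SHA-256 hash of the uncompressed block, and that blocks are stored zlib-compressed; none of these details come from the notes, and the integrity check is an added safeguard rather than a stated requirement.

```python
import hashlib
import zlib


def download_file(block_ids: list[str], storage: dict[str, bytes]) -> bytes:
    """Fetch blocks in metadata order, verify each one, and join them."""
    parts = []
    for block_id in block_ids:
        block = zlib.decompress(storage[block_id])
        # Assumed convention: the block ID is the SHA-256 of the block content.
        if hashlib.sha256(block).hexdigest() != block_id:
            raise ValueError(f"block {block_id} failed its integrity check")
        parts.append(block)
    return b"".join(parts)


# Usage: build a tiny two-block "cloud storage" and reassemble the file.
blocks = [b"hello ", b"world"]
block_ids = [hashlib.sha256(b).hexdigest() for b in blocks]
storage = {bid: zlib.compress(b) for bid, b in zip(block_ids, blocks)}
restored = download_file(block_ids, storage)
```

The order of `block_ids` matters: the blocks themselves carry no position, so the metadata is the only record of how they fit together.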
- CUJ: file status notification
- User connects to the notification server using long polling
- When a file status is updated, the notification server closes the connection, and the user needs to reconnect to get the latest file status
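The long polling behavior can be imitated in a single process to show the control flow. This is only a sketch under stated assumptions: `NotificationChannel`, `publish` and `poll` are hypothetical names, a `threading.Condition` stands in for the held-open HTTP connection, and the response that ends a poll carries the new status directly instead of requiring a second request.

```python
import threading


class NotificationChannel:
    """In-process stand-in for one user's long-poll connection state."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._version = 0      # bumped on every file status change
        self._status = None

    def publish(self, status: str) -> None:
        """Called when a file status changes; wakes any waiting poll."""
        with self._cond:
            self._version += 1
            self._status = status
            self._cond.notify_all()

    def poll(self, last_seen: int, timeout: float):
        """Block until a version newer than `last_seen` exists or the timeout expires.

        Returns (version, status). status is None when the poll timed out,
        in which case the client reconnects with the same version.
        """
        with self._cond:
            changed = self._cond.wait_for(
                lambda: self._version > last_seen, timeout
            )
            if not changed:
                return last_seen, None
            return self._version, self._status


# Usage: a status change arrives while the client is waiting.
channel = NotificationChannel()
threading.Timer(0.1, channel.publish, args=("report.pdf uploaded",)).start()
version, status = channel.poll(last_seen=0, timeout=5.0)  # returns once publish runs
_, idle = channel.poll(last_seen=version, timeout=0.05)   # nothing new: times out
```

Passing `last_seen` on every reconnect is what keeps the client from missing an update that lands between two polls.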
System Optimization
- Network: The block server can skip uploading a block whose hash matches the existing block, so only modified blocks are transferred.
- Storage:
- For the same user, deduplicate the blocks saved in cloud storage so that identical blocks are stored once and shared across files.
- Set a limit and an expiration strategy for file revisions.
- Use cold storage for inactive files.
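Both the network and the storage optimization come down to addressing blocks by their content hash. The sketch below assumes SHA-256 as the hash and one `BlockStore` instance per user; `BlockStore` and `save_revision` are hypothetical names used only for illustration.

```python
import hashlib


class BlockStore:
    """Content-addressed store: identical blocks are kept once and shared."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}
        self.uploads = 0   # number of blocks actually written

    def put(self, block: bytes) -> str:
        digest = hashlib.sha256(block).hexdigest()
        if digest not in self.blocks:   # skip the upload when the block already exists
            self.blocks[digest] = block
            self.uploads += 1
        return digest


def save_revision(store: BlockStore, blocks: list[bytes]) -> list[str]:
    """Store one file revision and return its ordered block hashes."""
    return [store.put(b) for b in blocks]


store = BlockStore()
v1 = save_revision(store, [b"intro", b"body", b"end"])
v2 = save_revision(store, [b"intro", b"body v2", b"end"])  # only the middle block changed
print(store.uploads)  # 4: three blocks for v1, one new block for v2
```

Because each revision is just a list of hashes, expiring an old revision means dropping its list and then deleting any block that no remaining revision references.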
Failure Handling
- API servers, notification servers and block servers should all be stateless to increase availability
- On a metadata cache failure, fall back to the metadata database
- Metadata database should be replicated for availability.
- Cloud storage should be configured with an appropriate in-region or cross-region replication policy.
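The cache fallback can be expressed as a read-through lookup that tolerates a cache outage. In this sketch `MetadataCache`, `CacheUnavailable` and `get_metadata` are hypothetical names, and a plain dict stands in for the metadata database; a real deployment would use its own cache and database clients.

```python
class CacheUnavailable(Exception):
    """Raised by the hypothetical cache client when the cache is down."""


class MetadataCache:
    """Toy cache with a switch to simulate an outage."""

    def __init__(self) -> None:
        self.data: dict[str, dict] = {}
        self.healthy = True

    def get(self, file_id: str):
        if not self.healthy:
            raise CacheUnavailable()
        return self.data.get(file_id)

    def set(self, file_id: str, metadata: dict) -> None:
        if not self.healthy:
            raise CacheUnavailable()
        self.data[file_id] = metadata


def get_metadata(file_id: str, cache: MetadataCache, database: dict[str, dict]):
    """Read-through lookup that still works when the cache is down."""
    try:
        cached = cache.get(file_id)
        if cached is not None:
            return cached
    except CacheUnavailable:
        return database.get(file_id)   # cache outage: serve straight from the database
    metadata = database.get(file_id)   # cache miss: read the database
    if metadata is not None:
        try:
            cache.set(file_id, metadata)   # repopulate the cache for later reads
        except CacheUnavailable:
            pass                           # cache failed mid-request; the read still succeeds
    return metadata


# Usage: the same lookup succeeds before and during a cache outage.
db = {"f1": {"name": "a.txt", "size": 12}}
cache = MetadataCache()
first = get_metadata("f1", cache, db)    # miss, then cached
cache.healthy = False
during_outage = get_metadata("f1", cache, db)
```

The trade-off is load: during an outage every read reaches the database, which is one reason the database needs replicas.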