[SysDes1][Chap. 15] Design Google Drive

Design Questions

  • Required features: file upload and download, file sync, notifications
  • Supported platforms: Web app, Mobile app
  • Calculation:
    • Storage: 10 million DAU * 10 GB per user = 100 PB
    • Upload QPS: 10 million DAU * 2 uploads / day / 86,400 s per day ≈ 230 QPS
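
The estimate above can be checked with a few lines of arithmetic. This is only a sketch: the constant names are made up for illustration, decimal units (1 PB = 1,000,000 GB) are assumed, and uploads are assumed to be spread evenly over the day.

```python
# Back-of-envelope estimate from the inputs listed above.
DAU = 10_000_000                 # daily active users
QUOTA_GB = 10                    # storage allotted per user, in GB
UPLOADS_PER_USER_PER_DAY = 2
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400

# Assumption: decimal units, 1 PB = 1,000,000 GB.
storage_pb = DAU * QUOTA_GB / 1_000_000

# Assumption: uploads are spread evenly across the day (no peak factor).
upload_qps = DAU * UPLOADS_PER_USER_PER_DAY / SECONDS_PER_DAY

print(f"storage: {storage_pb:.0f} PB")     # storage: 100 PB
print(f"upload QPS: {upload_qps:.0f}")     # upload QPS: 231
```

Peak traffic is usually higher than this average, so the QPS figure is a lower bound for capacity planning.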

High Level Design

  • Components
    • API servers: serve upload and download requests
    • Metadata cache and metadata database
    • Block servers: split files into blocks, save the blocks to cloud storage, and reassemble files from blocks
    • Notification servers: notify users when a file's status changes
  • CUJ (critical user journey): upload files
    • The user sends a request with the file metadata (e.g. file name, type, size) to the API servers
    • The API servers write the metadata to the metadata cache and database
    • The API servers ask the notification servers to announce that a new file is being added
    • The user uploads the file content to the block servers
    • The block servers split the file into blocks, compress and encrypt each block, and upload the blocks to cloud storage
    • The block servers update the file metadata in the cache and database
    • The block servers ask the notification servers to announce that the upload is complete
  • CUJ: download files
    • The user sends a request to the API servers
    • The API servers fetch the file metadata and return it to the user
    • The user sends requests to the block servers to download the blocks
    • The block servers fetch the blocks from cloud storage and send them to the user
    • The user reconstructs the file from all the blocks received from the block servers
  • CUJ: file status notification
    • The user connects to a notification server using long polling
    • When a file's status changes, the notification server responds and closes the connection; the user then reconnects to fetch the latest file status
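
The long-polling journey above can be sketched in memory as follows. This is a minimal illustration, not a real server: FileStatusChannel, publish_update and long_poll are hypothetical names, a thread stands in for the client's open HTTP request, and a version counter stands in for the file status.

```python
import threading


class FileStatusChannel:
    """In-memory stand-in for one notification server (hypothetical)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._version = 0  # bumped each time the file status changes

    def publish_update(self):
        """Called when a file changes; wakes every waiting long poll."""
        with self._cond:
            self._version += 1
            self._cond.notify_all()

    def long_poll(self, last_seen, timeout):
        """Hold the request until the version passes last_seen or time runs out.

        Returns the new version, or None on timeout. Either way the
        request ends and the client is expected to connect again.
        """
        with self._cond:
            changed = self._cond.wait_for(
                lambda: self._version > last_seen, timeout=timeout
            )
            return self._version if changed else None


channel = FileStatusChannel()
results = []

# The client thread plays the role of a pending long-poll request.
poller = threading.Thread(
    target=lambda: results.append(channel.long_poll(0, timeout=5.0))
)
poller.start()
channel.publish_update()   # e.g. a block server reports a finished upload
poller.join()

print(results)                               # [1]
print(channel.long_poll(1, timeout=0.01))    # None (nothing new, timed out)
```

Passing the last seen version with each request means an update that lands between two polls is not lost: the next poll returns immediately.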

System Optimization

  • Network: the block servers can skip uploading a block whose hash matches the block already stored, so only modified blocks are transferred
  • Storage:
    • For the same user, deduplicate blocks in cloud storage: blocks with identical content are stored once and shared across files
    • Set a limit on the number of file revisions and an expiration policy for old revisions
    • Move inactive files to cold storage
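
Both the network and the deduplication optimizations follow from addressing blocks by their hash. The sketch below assumes one store per user, SHA-256 as the block hash, and zlib for compression; BlockStore, put_file and get_file are hypothetical names, encryption is left out, and the block size is tiny so the demo is easy to follow.

```python
import hashlib
import zlib

BLOCK_SIZE = 4  # bytes; deliberately tiny for the demo, real blocks are far larger


class BlockStore:
    """Content-addressed stand-in for one user's cloud storage (hypothetical)."""

    def __init__(self):
        self.blocks = {}   # sha256 hex digest -> compressed block bytes
        self.uploads = 0   # number of blocks actually written

    def put_file(self, data):
        """Split data into blocks, store unseen ones, return the hash list."""
        hashes = []
        for start in range(0, len(data), BLOCK_SIZE):
            block = data[start:start + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:   # identical block already stored: skip
                self.blocks[digest] = zlib.compress(block)
                self.uploads += 1
            hashes.append(digest)
        return hashes

    def get_file(self, hashes):
        """Reassemble a file from its ordered list of block hashes."""
        return b"".join(zlib.decompress(self.blocks[h]) for h in hashes)


store = BlockStore()
v1 = store.put_file(b"AAAABBBBCCCC")   # three new blocks
v2 = store.put_file(b"AAAAXXXXCCCC")   # new revision: only the middle block changed
other = store.put_file(b"CCCCAAAA")    # different file made of known blocks

print(store.uploads)        # 4
print(store.get_file(v2))   # b'AAAAXXXXCCCC'
```

The ordered hash list returned by put_file is what the metadata database would keep per file revision, so old revisions stay readable as long as their blocks are retained.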

Failure Handling

  • API servers, notification servers and block servers should all be stateless, so that traffic from a failed server can be redirected to another instance
  • If the metadata cache fails, reads fall back to the metadata database
  • The metadata database should be replicated for availability
  • Cloud storage should be configured with an appropriate in-region or cross-region replication policy
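
The metadata read path under failure might look like the sketch below. Everything here is an assumption made for illustration: DictStore is an in-memory stand-in for both the cache and a database replica, MetadataReader is a hypothetical wrapper, and an unreachable store is modeled by raising ConnectionError.

```python
class DictStore:
    """In-memory stand-in for a cache or a database replica (hypothetical)."""

    def __init__(self, data=None, up=True):
        self.data = dict(data or {})
        self.up = up   # set to False to simulate an outage

    def get(self, key):
        if not self.up:
            raise ConnectionError("store unreachable")
        return self.data.get(key)

    def set(self, key, value):
        if not self.up:
            raise ConnectionError("store unreachable")
        self.data[key] = value


class MetadataReader:
    """Read path: cache first, then each database replica in turn."""

    def __init__(self, cache, replicas):
        self.cache = cache
        self.replicas = replicas

    def get(self, file_id):
        try:
            value = self.cache.get(file_id)
            if value is not None:
                return value
        except ConnectionError:
            pass   # cache is down: fall through to the database
        last_error = None
        for replica in self.replicas:
            try:
                value = replica.get(file_id)
            except ConnectionError as exc:
                last_error = exc
                continue   # this replica is down: try the next one
            try:
                self.cache.set(file_id, value)   # best-effort repopulation
            except ConnectionError:
                pass
            return value
        raise RuntimeError("no metadata replica reachable") from last_error


record = {"name": "a.txt", "size": 12}
reader = MetadataReader(
    cache=DictStore(up=False),                                # cache outage
    replicas=[DictStore(up=False), DictStore({"f1": record})],  # first replica down
)
print(reader.get("f1"))   # {'name': 'a.txt', 'size': 12}
```

Writes are not covered here; they would need to go to the primary database, and how replicas catch up depends on the replication mode chosen.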
