Have you ever wondered how to store petabytes and petabytes of data without even owning a single hard drive???
No, no, no, it's not science fiction, it's the reality of Zero Disk architecture. You'll see, it's fun.
Picture a Zero Disk world where your databases are no longer limited by the capacity of your local disks, and where you can grow or shrink your infrastructure in a few clicks without worrying about hardware constraints. Well, this world already exists, thanks to Zero Disk architecture and S3 cloud storage!
Traditional databases like PostgreSQL or MySQL have an architecture that tightly couples query processing (the frontend) with data storage (the backend) on the same machine. This approach chains us to our hard drives like a ball and chain: if we need more space, we have to buy new drives, and when the hardware fails, it's a mess. Not to mention horizontal scaling, which quickly becomes a real headache!
This is where Zero Disk architecture enters the scene. The idea is disarmingly simple: instead of storing your data on your own physical disks, you send it straight to the cloud, or more precisely to someone else's hard disks, namely Amazon S3. This service is one of the most reliable digital safes in the world, designed for eleven nines of durability (99.999999999%). Basically, this means that if you stored 10 million objects today, you could expect to lose a single one, on average, once every 10,000 years!
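The 10,000-year figure is simple arithmetic. This short Python check assumes that eleven nines means each object independently has a 10^-11 probability of being lost in a given year, which is a simplified reading of AWS's design target rather than a guarantee:

```python
# Back-of-the-envelope check of the "eleven nines" example.
# Assumption: durability is read as a per-object, per-year survival
# probability, so each object has a 1e-11 chance of being lost in a year.
from fractions import Fraction

DURABILITY = Fraction(99_999_999_999, 100_000_000_000)  # 99.999999999 %
ANNUAL_LOSS_RATE = 1 - DURABILITY                       # 1e-11 per object per year


def expected_losses_per_year(num_objects: int) -> Fraction:
    """Expected number of objects lost per year (linearity of expectation)."""
    return num_objects * ANNUAL_LOSS_RATE


def years_per_lost_object(num_objects: int) -> Fraction:
    """Average number of years between two lost objects."""
    return 1 / expected_losses_per_year(num_objects)


if __name__ == "__main__":
    n = 10_000_000
    print(expected_losses_per_year(n))  # 1/10000
    print(years_per_lost_object(n))     # 10000
```

Ten million objects times 10^-11 gives an expected 10^-4 losses per year, that is, one object every 10,000 years on average.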
Concretely, instead of writing directly to a local disk, your application sends its data to S3 in the form of objects. S3 then automatically replicates these objects across several Availability Zones, so they survive the loss of an entire zone. The architecture also provides remarkable elasticity: since the compute instances hold no data of their own, you can start or stop them instantly without worrying about data migration.
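To make the elasticity argument concrete, here is a minimal Python sketch in which two compute nodes share one object store. `InMemoryObjectStore` and `StatelessNode` are hypothetical names, and the store is a plain in-memory stand-in for S3, not a real client:

```python
# Minimal sketch of a stateless compute node in front of an object store.
# Assumption: InMemoryObjectStore is a hypothetical stand-in for S3; a real
# deployment would send PutObject/GetObject requests over the network instead.
import json


class InMemoryObjectStore:
    """Hypothetical S3 stand-in: a flat map from object key to bytes."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, body: bytes) -> None:
        self._objects[key] = body

    def get(self, key: str) -> bytes:
        return self._objects[key]


class StatelessNode:
    """A compute instance that keeps no durable state of its own."""

    def __init__(self, name: str, store: InMemoryObjectStore) -> None:
        self.name = name
        self.store = store

    def save_user(self, user_id: int, record: dict) -> None:
        # Every write goes straight to the shared store as one object.
        self.store.put(f"users/{user_id}.json", json.dumps(record).encode())

    def load_user(self, user_id: int) -> dict:
        return json.loads(self.store.get(f"users/{user_id}.json"))


if __name__ == "__main__":
    s3 = InMemoryObjectStore()
    node_a = StatelessNode("a", s3)
    node_a.save_user(42, {"name": "Lulu"})
    del node_a                       # "stop" the instance: nothing to migrate
    node_b = StatelessNode("b", s3)  # "start" a fresh one
    print(node_b.load_user(42))      # {'name': 'Lulu'}
```

Because the node never owned the data, replacing it is just a matter of starting another one pointed at the same bucket.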
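Of course, this approach has its challenges, the main one being latency: a network round trip to S3 takes far longer than a local disk access (milliseconds instead of the microseconds of a local NVMe SSD). To work around this problem, several strategies are possible:
- Batching: instead of writing each 4KB page individually, we group pages into larger objects (for example 512KB, i.e. 128 pages) before sending them to S3, so the latency and per-request cost of each PUT are shared by many pages.
- Intelligent caching: we keep frequently accessed data in RAM, so hot reads never have to go over the network.
- S3 Express One Zone: a high-performance S3 storage class that AWS advertises as offering up to 10x lower latency than S3 Standard, with the trade-off that the data lives in a single Availability Zone.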
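Batching and caching can be sketched together in a few lines of Python. Everything here is an assumption for illustration: a dict plays the role of S3, pages are 4 KB, each uploaded object (a "segment") holds 128 pages, and an in-memory index remembers where each page landed. `BatchingPageStore` is a hypothetical name:

```python
# Batching on write plus an LRU cache on read, in front of an object store.
# Hypothetical assumptions: a dict stands in for S3, pages are 4 KB, each
# uploaded object ("segment") holds 128 pages, i.e. 512 KB, and an in-memory
# index records where each page was stored.
from collections import OrderedDict

PAGE_SIZE = 4 * 1024
PAGES_PER_SEGMENT = 128  # 128 x 4 KB = 512 KB per PUT


class BatchingPageStore:
    def __init__(self, object_store: dict, cache_pages: int = 1024) -> None:
        self.objects = object_store  # stand-in for S3: key -> bytes
        self.buffer: list[tuple[int, bytes]] = []  # written, not yet uploaded
        self.index: dict[int, tuple[str, int]] = {}  # page_id -> (key, offset)
        self.cache: OrderedDict = OrderedDict()  # page_id -> bytes, LRU order
        self.cache_pages = cache_pages
        self.next_segment = 0
        self.puts = 0  # PUT requests issued so far

    def write_page(self, page_id: int, data: bytes) -> None:
        if len(data) != PAGE_SIZE:
            raise ValueError("pages must be exactly PAGE_SIZE bytes")
        self.buffer.append((page_id, data))
        self._remember(page_id, data)
        if len(self.buffer) == PAGES_PER_SEGMENT:
            self.flush()

    def flush(self) -> None:
        """Upload every buffered page as one object (a single PUT)."""
        if not self.buffer:
            return
        key = f"segments/{self.next_segment:08d}"
        self.next_segment += 1
        self.objects[key] = b"".join(data for _, data in self.buffer)
        self.puts += 1
        for slot, (page_id, _) in enumerate(self.buffer):
            self.index[page_id] = (key, slot * PAGE_SIZE)
        self.buffer.clear()

    def read_page(self, page_id: int) -> bytes:
        if page_id in self.cache:  # hot path: served from RAM
            self.cache.move_to_end(page_id)
            return self.cache[page_id]
        for pid, data in reversed(self.buffer):  # evicted but not yet uploaded
            if pid == page_id:
                return data
        key, offset = self.index[page_id]  # cold path: go to the store
        # A real client would issue a ranged GET rather than slicing locally.
        data = self.objects[key][offset:offset + PAGE_SIZE]
        self._remember(page_id, data)
        return data

    def _remember(self, page_id: int, data: bytes) -> None:
        self.cache[page_id] = data
        self.cache.move_to_end(page_id)
        if len(self.cache) > self.cache_pages:
            self.cache.popitem(last=False)  # evict the least recently used page
```

With this layout, writing 256 pages costs 2 PUT requests instead of 256, and rereading a recently written page never leaves RAM. The trade-off is that pages still sitting in the buffer are lost if the process crashes before `flush()`.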
I can sense some skepticism, especially among self-hosting aficionados, but this architecture has proven itself! Companies like Snowflake, WarpStream and ClickHouse have already adopted it successfully. Some have even developed hybrid approaches, for example using a Raft cluster as a write cache before persisting data to S3, which optimizes both performance and cost.
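The hybrid idea can be illustrated with a deliberately simplified Python sketch. It is not Raft: there is no leader election, no terms and no log repair. It keeps only the rule that a write is acknowledged once a majority of replicas hold it, and then persists acknowledged writes to the object store in batches. `Replica`, `QuorumWriteCache` and the dict standing in for S3 are all hypothetical:

```python
# Simplified replicated write cache in front of object storage.
# NOT Raft: no leader election, no terms, no log repair. The replicas, the
# dict standing in for S3 and the flush threshold are hypothetical.


class Replica:
    def __init__(self) -> None:
        self.up = True  # simulated availability of this cache node
        self.log: list[tuple[str, bytes]] = []


class QuorumWriteCache:
    def __init__(self, replicas: list[Replica], object_store: dict,
                 flush_every: int = 3) -> None:
        self.replicas = replicas
        self.objects = object_store  # stand-in for S3: key -> bytes
        self.flush_every = flush_every
        self.pending: list[tuple[str, bytes]] = []
        self.batches = 0

    def write(self, key: str, value: bytes) -> bool:
        """Acknowledge only if a majority of replicas can store the entry."""
        live = [r for r in self.replicas if r.up]
        if len(live) <= len(self.replicas) // 2:
            return False  # no majority: reject, the caller must retry later
        for r in live:
            r.log.append((key, value))
        self.pending.append((key, value))
        if len(self.pending) >= self.flush_every:
            self.flush()
        return True

    def flush(self) -> None:
        """Persist acknowledged entries to the object store as one object."""
        if not self.pending:
            return
        batch_key = f"batches/{self.batches:08d}"
        self.objects[batch_key] = b"\n".join(
            k.encode() + b"=" + v for k, v in self.pending)
        self.batches += 1
        self.pending.clear()
        # The data is now in the object store, so reachable replicas can drop
        # their copies; a real system would also trim lagging replicas later.
        for r in self.replicas:
            if r.up:
                r.log.clear()
```

With three replicas, this cache keeps accepting writes while one of them is down, and the object store only sees one PUT per batch rather than one per write.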
Be careful though: this is not a universal solution. If you only manage a few small databases, the added complexity may not be worth it. It's like buying a tank to go grocery shopping: fun if your name is Lulu, but not exactly practical!
The real magic of Zero Disk architecture lies in its ability to turn complex problems into elegant solutions. No more juggling disks and RAID arrays, and far less backup plumbing: you can focus on what really matters, developing your app.
And the prospects are exciting! As S3's performance keeps improving, its costs keep falling, and new features such as conditional writes arrive, we can imagine a future where local storage becomes as obsolete as 3.5-inch floppy disks: a future where developers can deploy massive applications without worrying about storage constraints. And one where the Americans would have even more data about us!! (What? Just saying.)
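Conditional writes are worth a closer look, because they let independent nodes coordinate through the object store itself. S3 supports an `If-None-Match` precondition (create the object only if it does not exist yet) and an `If-Match` precondition (overwrite only if the ETag is unchanged), and rejects the write when the condition fails. The Python sketch below mimics those semantics in memory; `ConditionalStore`, `PreconditionFailed` and `append_to_manifest` are hypothetical names, and the simulated ETag is a SHA-256 digest, not S3's real ETag format:

```python
# Sketch of how conditional writes let stateless nodes coordinate.
# Assumptions: ConditionalStore is a hypothetical in-memory stand-in that mimics
# S3's If-None-Match / If-Match preconditions; the ETag is simulated with a
# SHA-256 digest rather than S3's real ETag format.
import hashlib
from typing import Optional


class PreconditionFailed(Exception):
    """Stand-in for the error a real store returns when a precondition fails."""


class ConditionalStore:
    def __init__(self) -> None:
        self._objects: dict[str, tuple[bytes, str]] = {}  # key -> (body, etag)

    def get(self, key: str) -> tuple[bytes, str]:
        return self._objects[key]

    def put(self, key: str, body: bytes, if_none_match: bool = False,
            if_match: Optional[str] = None) -> str:
        current = self._objects.get(key)
        if if_none_match and current is not None:
            raise PreconditionFailed(f"{key} already exists")
        if if_match is not None and (current is None or current[1] != if_match):
            raise PreconditionFailed(f"{key} was modified concurrently")
        etag = hashlib.sha256(body).hexdigest()
        self._objects[key] = (body, etag)
        return etag


def append_to_manifest(store: ConditionalStore, key: str, entry: str) -> None:
    """Optimistic read-modify-write loop on a shared manifest object."""
    while True:
        body, etag = store.get(key)
        new_body = body + entry.encode() + b"\n"
        try:
            store.put(key, new_body, if_match=etag)
            return
        except PreconditionFailed:
            continue  # another writer won the race: reread and retry
```

The first caller to create an object with `if_none_match=True` wins and every later attempt fails, which is a building block for locks or leader election; the retry loop in `append_to_manifest` is the usual optimistic-concurrency pattern for updating a shared metadata object.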
And maybe in a few years, we will look at our old servers equipped with hard drives and smile, just as we look at old 56k modems today!
Source