It’s surprising how the volume of data on the Internet is changing around the world. Who would have thought ten years ago that a physics experiment would generate 25 petabytes (26 214 400 GB) of data yearly? Yes, I’m looking at you, LHC. Times are changing, and companies are changing. Everyone designs for scale a tad differently. And that’s good, because it’s important to design for the right scale.

Please note: the views I express are mine alone and they do not necessarily reflect the views of

Over-scaling

Let's assume you are involved in a startup building a brand new, unique CMS (really unique, unlike the 2000 other ones). Is it worth thinking about NoSQL/Cassandra/DynamoDB/Azure Blob Storage/etc.? Probably not. Unless you are very certain you will quickly reach massive scale, it's safe to assume that most of the data will fit into one small or medium SQL database. When performance problems start to appear, that's good news. It means your startup is working (or you are just terrible at SQL...). It also means you have clients, hopefully paying ones. By that point you will probably have a completely different idea about the system - you went from the "no clients, imagination only" state to the "working product with customers and a viable business" state. You can now iterate on your architecture according to real customer requirements. Hopefully you have more funds now, too. Unfortunately, I've heard multiple times about a startup running out of cash because someone built a complicated, scalable system for 321 000 clients. All the money was spent on technology, none on the business. Failure.

No need for scaling

Now, some systems don't have to scale, or their scale requirements grow more slowly than new hardware develops (so effectively they fall into the first category). Typical ERP systems at medium-sized companies are probably a good example. In such cases the amount of data in the system grows, but you don't have to worry about scaling the solution.

Some scaling needed

Sometimes throwing a NoSQL database into the solution solves the problem. It's a tempting idea for us geeks, and purpose-built databases usually do perform better for the particular cases they were built for. However, one can also consider other solutions, like sharding SQL databases, optimizing the SQL, or just optimizing the application.
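As a minimal illustration of the sharding alternative, here is a sketch of hash-based routing of a customer key to one of several SQL shards. The shard count and connection strings are made-up placeholders, not a recommendation for any particular setup:

```python
import hashlib

# Hypothetical shard connection strings - illustration only.
SHARDS = [
    "Server=sql-shard-0;Database=cms",
    "Server=sql-shard-1;Database=cms",
    "Server=sql-shard-2;Database=cms",
    "Server=sql-shard-3;Database=cms",
]

def shard_for(customer_id: str) -> str:
    """Route a customer to a shard by hashing its id.

    A stable hash (not Python's built-in hash(), which is
    randomized per process) keeps the mapping consistent
    across application restarts.
    """
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(SHARDS)
    return SHARDS[index]
```

The catch, of course, is that a plain modulo mapping reshuffles most keys whenever the shard count changes, so growing the cluster means a data migration - one of the trade-offs to weigh against adopting a NoSQL store.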

Scaling definitely needed

Let's get into processing tens of thousands of (heavy) requests per second. People sometimes say "Azure Blob is infinitely scalable". Well, that statement is not true. First, Azure Storage isn't all that scalable: its 20 000 requests-per-second limit might be only a tiny fraction of what you need. Second, there is no such thing as "infinitely scalable". Furthermore, there are other hard limits, documented in Azure Storage Scalability and Performance Targets. To be fair, DynamoDB has limits too; however, its soft limits can be raised - you can contact support and request as much throughput as you need. There is one more catch: pricing. In Azure you pay for the number of requests (not throughput) plus storage; in DynamoDB you pay for provisioned throughput plus storage. Depending on your use case, one might be much cheaper than the other. Getting back to scale: it's logical that a database designed for one thing can be more efficient than a "generic" SQL database. The question is - do you really need it? Is it cheaper and faster than the alternatives? Can it handle traffic peaks better? Do you have the team and resources to adopt a potentially unfamiliar technology?
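The pricing difference between pay-per-request and provisioned-throughput billing can be made concrete with a back-of-the-envelope model. All prices below are invented placeholders (check the current price lists); the point is the shape of the two formulas, not the numbers:

```python
def pay_per_request_cost(requests_per_month: int, gb_stored: float,
                         price_per_10k_requests: float,
                         price_per_gb: float) -> float:
    """Azure-style billing: pay for requests actually made, plus storage."""
    return (requests_per_month / 10_000) * price_per_10k_requests \
        + gb_stored * price_per_gb

def provisioned_cost(provisioned_rps: int, gb_stored: float,
                     price_per_rps_month: float,
                     price_per_gb: float) -> float:
    """DynamoDB-style billing: pay for the capacity you provision
    (whether you use it or not), plus storage."""
    return provisioned_rps * price_per_rps_month + gb_stored * price_per_gb
```

With steady traffic that fully uses the provisioned capacity, the provisioned model tends to win; with spiky traffic, capacity must be provisioned for the peak while pay-per-request billing only charges for the requests that actually happen - which is why the same workload shape can flip the comparison.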

Unique scale

Finally, there are times when you need to build a new solution (or even a database) from scratch, and may even have a dedicated team (or department) for the challenge. Let's imagine you work at a big company that has hundreds of thousands of services, each service is called many thousands of times per second, and each call generates logs. You want a solution that stores the logs for months and lets you search them. The scale is unusual, and the number of logs is expected to grow 300% year over year, with 4x throughput peaks on some days. This time, you can probably start thinking about your own, new storage engine, tightly coupled to your needs. Existing solutions might be too expensive at your scale, offering features that are not critical for you while missing much-needed ones. By building a solution tightly coupled to your requirements, you will be able to optimize it for your needs.
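The capacity arithmetic above compounds quickly. Interpreting "grows 300% year over year" as the volume quadrupling each year (an increase of 300%), a rough planning sketch - the baseline throughput is an assumed figure, not from any real system:

```python
def required_peak_throughput(baseline_rps: float, years: int,
                             yearly_multiplier: float = 4.0,
                             peak_factor: float = 4.0) -> float:
    """Peak log throughput the storage engine must sustain after
    `years` of compounding growth.

    yearly_multiplier=4.0 reads "300% year-over-year growth" as a
    4x multiplier per year; peak_factor=4.0 covers the days with
    4x throughput spikes.
    """
    return baseline_rps * (yearly_multiplier ** years) * peak_factor

# Assumed baseline of 1000 log writes/s: two years out, the engine
# must already handle 1000 * 4^2 * 4 = 64 000 writes/s at peak.
```

Exponential growth like this is exactly why "buy more headroom on the existing solution" stops being an answer at some point - the capacity target a design must hit moves by an order of magnitude every couple of years.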

TL;DR: there are different levels of scalability; everyone can have a different scale in mind and therefore approach scaling differently.