System Design — Availability

Moath Obeidat
2 min read · May 20, 2023

Let’s talk about what availability means from a system design perspective. The term may seem simple, but it carries many important details for any system in the design stage.

What is availability?

Availability is the probability that a particular server or service is up and running at any given point in time.

How is availability measured?

Availability is measured as the percentage of time a system is up over a given period, usually a year.
For example, a service that is available 90% of the time during a year was up for 328.5 out of 365 days, which means it was unavailable for 36.5 days.

Being unavailable for 36.5 days a year would be a disaster for systems such as AWS, YouTube, or Uber.

Therefore, let us agree that an availability of 90% is unacceptable. In fact, availability should be at least 99%.

You might think that 99% is a good percentage, but for many systems it is not: 99% availability means 1% unavailability, which equals 3.65 days a year. For the systems mentioned above, that much downtime is unacceptable.

From the above, we conclude that availability should not be less than 99%, and that the higher this percentage, the less time the system spends down.

Some systems provide an availability of 99.999%, which corresponds to roughly 5 minutes of downtime a year. Systems that provide this percentage or more are called high-availability systems.
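The arithmetic behind these percentages is easy to check with a few lines of code. The following is a minimal sketch, not a monitoring tool: the function name `downtime_days_per_year` is made up for illustration, and it assumes a 365-day year, as the examples above do.

```python
DAYS_PER_YEAR = 365  # assumption: non-leap year, matching the examples in the text


def downtime_days_per_year(availability_percent: float) -> float:
    """Return the downtime, in days per year, implied by an availability percentage."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    unavailable_fraction = (100 - availability_percent) / 100
    return unavailable_fraction * DAYS_PER_YEAR


if __name__ == "__main__":
    for percent in (90, 99, 99.9, 99.99, 99.999):
        days = downtime_days_per_year(percent)
        minutes = days * 24 * 60  # convert days to minutes for the small values
        print(f"{percent}% available: {days:.4f} days ({minutes:.1f} minutes) down per year")
```

Running it reproduces the 36.5-day and 3.65-day figures for 90% and 99%, and shows how quickly the allowed downtime shrinks as the percentage grows.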

How can system availability be increased?

Avoid having a single point of failure in your system.

For example, running your entire system on a single server means the system will be unavailable whenever that server is down.

So, how do you avoid a single point of failure?
Through redundancy: run your system on more than one server, so that if one server goes down, another can keep serving requests. This makes your availability higher than that of a system running on a single server.
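To get a feel for how much redundancy can help, here is a rough sketch of the standard back-of-the-envelope estimate. It rests on two assumptions that are not from the text above: the system counts as up when at least one server is up, and servers fail independently of each other. Real failures are often correlated (shared network, shared data center, the same bad deploy), so treat the result as an upper bound. The function name `combined_availability` is hypothetical.

```python
def combined_availability(server_availability: float, replicas: int) -> float:
    """Estimate the availability of `replicas` redundant servers.

    Assumes independent failures and that one healthy server is enough
    to keep the system available. `server_availability` is a fraction
    between 0 and 1 (for example 0.99 for 99%).
    """
    if not 0.0 <= server_availability <= 1.0:
        raise ValueError("server_availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The system is down only when every replica is down at the same time.
    all_down = (1 - server_availability) ** replicas
    return 1 - all_down


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} server(s) at 99% each: {combined_availability(0.99, n):.6%}")
```

Under these assumptions, two servers at 99% each come out at about 99.99%, which is why removing a single point of failure pays off so quickly.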

Should all parts of the system have the same availability?

The answer to this question depends on the business the system serves.

For example, in a system like Stripe, the payment part should provide high availability, while the dashboard is less important and can tolerate lower availability.

Another example is YouTube: the part responsible for serving videos should provide high availability, while the part responsible for view counts is less important and can tolerate lower availability.
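One way to make this idea concrete is to give each part of the system its own availability target and compare the downtime it actually had against the downtime that target allows. The sketch below is purely illustrative: the component names and target percentages are hypothetical values chosen for the example, not figures published by Stripe or YouTube.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: 365-day year

# Hypothetical targets, chosen only for illustration.
AVAILABILITY_TARGETS = {
    "payments": 99.999,  # critical part: very little downtime allowed
    "dashboard": 99.9,   # less critical part: more downtime tolerated
}


def allowed_downtime_minutes(target_percent: float) -> float:
    """Return the yearly downtime, in minutes, that a target percentage allows."""
    return (100 - target_percent) / 100 * MINUTES_PER_YEAR


def meets_target(component: str, observed_downtime_minutes: float) -> bool:
    """Check whether a component's observed yearly downtime fits within its target."""
    budget = allowed_downtime_minutes(AVAILABILITY_TARGETS[component])
    return observed_downtime_minutes <= budget


if __name__ == "__main__":
    # The same one-hour outage is acceptable for one part and not for the other.
    for name in AVAILABILITY_TARGETS:
        print(name, "within target:", meets_target(name, observed_downtime_minutes=60))
```

With these example targets, an hour of downtime in a year fits the dashboard's budget but far exceeds what the payment part is allowed.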
