System Design — API Rate Limiting

Moath Obeidat
4 min read · Jun 2, 2023

Opening an API to third parties is a bit like an open-world game (GTA V, for example): a huge number of scenarios can occur. One of them is an attempt to overload the server with DoS or DDoS attacks on your API endpoints, which can bring the server down. A nightmare, right?

What Is API Rate Limiting?

API rate limiting is the process of limiting the number of API requests to or from a system.

The main reason to implement an API rate limit is security: it protects your system from an unbounded number of API calls that could take it down and make it unavailable. Even without an attack, a huge volume of calls can overload the system and degrade its performance. So whether the flood comes from attackers or from users misusing the API, you have to protect your system by limiting API calls.

How does API Rate Limit work?

Currently, most modern frameworks ship with an API rate limit implementation, which means you only have to take care of the setup, or the customization if needed.

But let’s assume you ended up using a framework that doesn’t offer an API rate limit. What are the main steps to implement one?

  • Your implementation should live in the middleware layer, so that every request goes through it. Let’s call this middleware “Throttling”.
  • Throttling should accept a few parameters, such as $maxAttempts, $perSeconds, and $uniqueKey.
  • You need a cache (Redis or anything else) to count the attempts for each key and to set a TTL on the key.

The “Throttling” middleware receives each request and reads its $uniqueKey (this might be a token; it depends on the information coming in the request). It stores the $uniqueKey in the cache and counts how many requests carry that key. If the count stays within $maxAttempts during the period defined by $perSeconds, all good. Otherwise, you throw an exception with status code 429 (too many attempts) and a message telling the user that they exceeded the allowed calls for that period.

You might also include extra info in each response telling the user how many calls remain and when they can retry. Let’s assume each response carries these keys in the header:
“X-RateLimit-Limit”, which is the $maxAttempts,
“X-RateLimit-Remaining”, which is the $maxAttempts minus the user’s attempts,
“Retry-After”, which is the time in seconds the user has to wait once they reach the $maxAttempts.
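
The flow above can be sketched in a few lines. This is only an illustration under some assumptions: it is written in Python rather than in any particular framework, a plain dict stands in for Redis, the window is a fixed window that starts at the first request for a key (the steps above don't prescribe a specific algorithm), and the `clock` parameter and the `handle` method are hypothetical names added to keep the example testable.

```python
import math
import time


class Throttling:
    """Fixed-window rate limiter; a plain dict stands in for Redis here."""

    def __init__(self, max_attempts, per_seconds, clock=time.monotonic):
        self.max_attempts = max_attempts
        self.per_seconds = per_seconds
        self.clock = clock  # hypothetical hook so tests can control time
        self.cache = {}     # unique_key -> (attempts, window_expires_at)

    def handle(self, unique_key):
        """Return (status_code, headers) for one incoming request."""
        now = self.clock()
        attempts, expires_at = self.cache.get(unique_key, (0, 0.0))
        if now >= expires_at:
            # First request for this key, or the TTL elapsed: new window.
            attempts, expires_at = 0, now + self.per_seconds

        if attempts >= self.max_attempts:
            # Limit reached: reject and say how long is left in the window.
            return 429, {
                "X-RateLimit-Limit": self.max_attempts,
                "X-RateLimit-Remaining": 0,
                "Retry-After": math.ceil(expires_at - now),
            }

        attempts += 1
        self.cache[unique_key] = (attempts, expires_at)
        # 200 is a placeholder; the real status comes from the endpoint.
        return 200, {
            "X-RateLimit-Limit": self.max_attempts,
            "X-RateLimit-Remaining": self.max_attempts - attempts,
        }
```

With a real cache you would typically replace the dict with an atomic increment plus an expiry on the key, so that concurrent requests for the same key are counted correctly.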

For a real-world reference, check the Laravel 10 rate limit implementation.

Example:

Say you have an API service for a social media system. All APIs in the service require authentication and can’t be accessed without a token, so after authenticating, users are granted a token and have to include it in each API call (the token here is a perfect $uniqueKey since it’s sent with every call).

In the Throttling middleware that you built before, you set:
$maxAttempts to 30
$perSeconds to 10
$uniqueKey to the user’s token

Case 1:
User A calls the endpoint “POST v1/posts” 20 times within 10 seconds. Nothing special happens here and all posts are created, since the user didn’t exceed the allowed attempts within 10 seconds. The response header of the last call should have these values:
X-RateLimit-Limit=30
X-RateLimit-Remaining=10
Note: the “Retry-After” key is not included in the header if the user did not exceed the limit.

201 created

Case 2:
User B calls the endpoint “POST v1/posts” 31 times within 5 seconds. On hit number 31, a response with a 429 status code and the message “too many attempts” is returned, and the response header should have these values:
X-RateLimit-Limit=30
X-RateLimit-Remaining=0
Retry-After=5
The user has to wait 5 seconds, until the 10-second window ends, before they can perform a new batch of 30 requests within 10 seconds.

429 too many attempts
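
The header values in both cases are simple arithmetic, which the following Python sketch spells out. It assumes a fixed 10-second window that starts at the user's first request, and that User B's 31st call arrives 5 seconds into that window; the function name `rate_limit_headers` and its parameters are hypothetical, chosen only for this illustration.

```python
import math


def rate_limit_headers(max_attempts, per_seconds, attempts, seconds_into_window):
    """Headers for the response to request number `attempts` in the window."""
    headers = {
        "X-RateLimit-Limit": max_attempts,
        # Never report a negative number of remaining calls.
        "X-RateLimit-Remaining": max(max_attempts - attempts, 0),
    }
    if attempts > max_attempts:
        # Only rejected requests carry Retry-After: time left in the window.
        headers["Retry-After"] = math.ceil(per_seconds - seconds_into_window)
    return headers


# Case 1: 20th call, still inside the window (9 s is an assumed timing).
case_1 = rate_limit_headers(30, 10, attempts=20, seconds_into_window=9)
# Case 2: 31st call, 5 seconds into the window.
case_2 = rate_limit_headers(30, 10, attempts=31, seconds_into_window=5)

print(case_1)  # {'X-RateLimit-Limit': 30, 'X-RateLimit-Remaining': 10}
print(case_2)  # {'X-RateLimit-Limit': 30, 'X-RateLimit-Remaining': 0, 'Retry-After': 5}
```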

Can I make the API rate limit more complex?

Yes, you can build tiers of limits. For example, if a user exceeds 30 requests within 10 seconds the first time, you can restrict them to 15 requests within 5 seconds the next time, and so on.
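
One possible way to express such tiers, as a rough Python sketch: keep a count of how many times each key has been rejected and use it to pick the limit. The two tiers come from the example above; the names `TIERS`, `record_violation`, and `limits_for` are hypothetical, and staying on the strictest tier once the list runs out is an assumption, not something the example dictates.

```python
# (max_attempts, per_seconds) per tier, taken from the example in the text.
TIERS = [(30, 10), (15, 5)]

# unique_key -> how many times this key has exceeded its limit so far.
# In a real setup this counter would live in the cache, likely with a TTL.
violations = {}


def record_violation(unique_key):
    """Call this whenever a request for `unique_key` is rejected with 429."""
    violations[unique_key] = violations.get(unique_key, 0) + 1


def limits_for(unique_key):
    """Return the (max_attempts, per_seconds) pair that applies to this key."""
    count = violations.get(unique_key, 0)
    # Assumption: once the tiers run out, stay on the strictest one.
    return TIERS[min(count, len(TIERS) - 1)]
```

The middleware would call `limits_for` before counting a request and `record_violation` each time it returns a 429.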
