Rate Limiting: Leaky Bucket vs Token Bucket

Been looking into rate limiting for calling third-party APIs. Two main algorithms keep coming up: Leaky Bucket and Token Bucket. Here’s what I learned.

The Problem

Calling external APIs without rate limiting = bad time:

  • Hit rate limits → errors
  • Waste quota → costs money
  • Bursts → overwhelm services

Need to smooth out requests.

Leaky Bucket

Imagine a bucket with a hole:

  • Requests go in at any rate
  • Leak out at constant rate
  • Bucket full? Drop requests
import (
    "sync"
    "time"
)

type LeakyBucket struct {
    capacity  int           // max requests the bucket can hold
    remaining int           // free slots left; leaking frees them back up
    leakRate  time.Duration // one slot leaks free per interval
    lastLeak  time.Time
    mu        sync.Mutex
}

func (lb *LeakyBucket) Allow() bool {
    lb.mu.Lock()
    defer lb.mu.Unlock()

    // Leak based on time elapsed
    now := time.Now()
    elapsed := now.Sub(lb.lastLeak)
    leaked := int(elapsed / lb.leakRate)

    if leaked > 0 {
        lb.remaining += leaked
        if lb.remaining > lb.capacity {
            lb.remaining = lb.capacity
        }
        lb.lastLeak = now
    }

    if lb.remaining > 0 {
        lb.remaining--
        return true
    }
    return false
}

Good for: Smooth, constant output rate. Like streaming data or strict API rate limits.

Token Bucket

Tokens refill at constant rate:

  • Bucket starts with tokens
  • Request consumes token
  • Out of tokens? Wait or reject
  • Allows bursts (if tokens available)
type TokenBucket struct {
    capacity   int
    tokens     float64
    refillRate float64  // tokens per second
    lastRefill time.Time
    mu         sync.Mutex
}

func (tb *TokenBucket) Allow() bool {
    tb.mu.Lock()
    defer tb.mu.Unlock()

    now := time.Now()
    elapsed := now.Sub(tb.lastRefill).Seconds()

    // Add tokens
    tb.tokens += elapsed * tb.refillRate
    if tb.tokens > float64(tb.capacity) {
        tb.tokens = float64(tb.capacity)
    }
    tb.lastRefill = now

    if tb.tokens >= 1 {
        tb.tokens -= 1
        return true
    }
    return false
}

Good for: User-facing APIs. Allows occasional bursts for better UX.

Key Difference

  • Leaky Bucket: Constant output, smooths bursts
  • Token Bucket: Allows bursts, maintains average rate

What I’m Using

For calling third-party APIs: Leaky Bucket. I want predictable, steady rate.

For my own API: Token Bucket. Users can burst occasionally, better experience.

Or just use golang.org/x/time/rate - it’s basically a token bucket:

import "golang.org/x/time/rate"

limiter := rate.NewLimiter(10, 100) // 10 req/s, burst of 100

if limiter.Allow() {
    // make request
}

// Or block until a token is available (Wait returns an error
// if the context is cancelled or its deadline would be exceeded)
if err := limiter.Wait(ctx); err != nil {
    return err
}

Redis Implementation

For distributed systems, you need shared state. I found a Redis rate limiting article (in Chinese) that shows a Lua script approach.

Basic idea:

-- Token bucket in Redis (simplified)
local key      = KEYS[1]
local capacity = tonumber(ARGV[1])
local rate     = tonumber(ARGV[2]) -- tokens per second

local tokens = tonumber(redis.call('GET', key)) or capacity
local now    = tonumber(redis.call('TIME')[1])
local last   = tonumber(redis.call('GET', key .. ':last')) or now

-- Refill tokens based on elapsed time
local elapsed = now - last
tokens = math.min(capacity, tokens + elapsed * rate)

-- Consume a token if one is available; either way, persist
-- the refilled count so the next call starts from fresh state
if tokens >= 1 then
    redis.call('SET', key, tokens - 1)
    redis.call('SET', key .. ':last', now)
    return 1
else
    redis.call('SET', key, tokens)
    redis.call('SET', key .. ':last', now)
    return 0
end

The script runs atomically inside Redis, so it works correctly across multiple app servers.

That’s my understanding so far. Both algorithms work, just pick based on whether you need smoothing (leaky) or bursts (token).

This post is licensed under CC BY 4.0 by the author.