Affinity

Client affinity (sticky sessions) ensures that repeated connections from the same client IP are routed to the same backend. This is critical for stateful protocols and session-based applications.

Overview

edgeProxy maintains a binding table that maps client IPs to backend IDs. The following diagram shows the binding lifecycle:

[Diagram: client affinity binding lifecycle]

Configuration

Variable                             Default   Description
EDGEPROXY_BINDING_TTL_SECS           600       Binding lifetime (10 minutes)
EDGEPROXY_BINDING_GC_INTERVAL_SECS   60        Cleanup interval

Hexagonal Architecture

Client affinity is managed through ports and adapters:

domain/entities.rs           → ClientKey, Binding (entities)
domain/ports/binding_repository.rs → BindingRepository trait (port)
adapters/outbound/dashmap_binding_repo.rs → DashMapBindingRepository (adapter)

The domain defines WHAT we need (the trait); the adapter provides HOW it is done (DashMap).

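Because the application layer depends only on the trait, the concrete store is chosen once at composition time. A minimal wiring sketch (the ProxyService::new signature and the Default constructor are assumptions for illustration, not taken from the source):

// Hypothetical composition root: build the adapter, then hand it to the
// application service behind the BindingRepository port.
use std::sync::Arc;

let binding_repo: Arc<dyn BindingRepository> =
    Arc::new(DashMapBindingRepository::default());

// ProxyService only sees the trait; swapping DashMap for another store
// would not touch the application layer.
let proxy_service = ProxyService::new(binding_repo);
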
Data Structures

ClientKey (domain/entities.rs)

#[derive(Clone, Debug, Eq, Hash, PartialEq)]
pub struct ClientKey {
    pub client_ip: IpAddr,
}

impl ClientKey {
    pub fn new(client_ip: IpAddr) -> Self {
        Self { client_ip }
    }
}

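Because the key is only the client IP, every connection from the same address maps to the same entry regardless of source port. A small usage sketch (stream is assumed to be an accepted tokio TcpStream):

// Illustrative: derive the affinity key from the peer address of an accepted connection.
let peer = stream.peer_addr()?;          // std::net::SocketAddr
let key = ClientKey::new(peer.ip());     // the source port is intentionally ignored
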
Binding (domain/entities.rs)

#[derive(Clone, Debug)]
pub struct Binding {
    pub backend_id: String,
    pub created_at: Instant,
    pub last_seen: Instant,
}

impl Binding {
    pub fn new(backend_id: String) -> Self {
        let now = Instant::now();
        Self { backend_id, created_at: now, last_seen: now }
    }
}

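Expiration is driven by last_seen rather than created_at, so a binding survives as long as the client keeps reconnecting within the TTL. A hedged sketch of that check (not a method from the source, but it mirrors the GC rule shown later):

// A binding is stale once its last activity is older than the configured TTL.
fn is_expired(binding: &Binding, ttl: Duration) -> bool {
    binding.last_seen.elapsed() > ttl
}
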
Port (Interface) - domain/ports/binding_repository.rs

#[async_trait]
pub trait BindingRepository: Send + Sync {
    async fn get(&self, key: &ClientKey) -> Option<Binding>;
    async fn set(&self, key: ClientKey, binding: Binding);
    async fn remove(&self, key: &ClientKey);
    async fn touch(&self, key: &ClientKey); // Update last_seen
    async fn cleanup_expired(&self, ttl: Duration) -> usize;
}

Adapter (Implementation) - adapters/outbound/dashmap_binding_repo.rs

pub struct DashMapBindingRepository {
    bindings: Arc<DashMap<ClientKey, Binding>>,
}

#[async_trait]
impl BindingRepository for DashMapBindingRepository {
    async fn get(&self, key: &ClientKey) -> Option<Binding> {
        self.bindings.get(key).map(|e| e.value().clone())
    }

    async fn set(&self, key: ClientKey, binding: Binding) {
        self.bindings.insert(key, binding);
    }

    async fn touch(&self, key: &ClientKey) {
        if let Some(mut entry) = self.bindings.get_mut(key) {
            entry.last_seen = Instant::now();
        }
    }
    // ...
}

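The elided methods follow the same pattern; for example, cleanup_expired can reuse the retain logic that the GC task uses below. A plausible sketch (not verbatim from the source):

// Sketch: drop entries whose last_seen is older than the TTL and report how
// many were removed (the count is approximate under concurrent inserts).
async fn cleanup_expired(&self, ttl: Duration) -> usize {
    let before = self.bindings.len();
    let now = Instant::now();
    self.bindings
        .retain(|_, binding| now.duration_since(binding.last_seen) <= ttl);
    before.saturating_sub(self.bindings.len())
}
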
Lifecycle

All lifecycle operations go through the ProxyService (application layer), which uses the BindingRepository trait.

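A sketch of the shape this implies for the service (field names and the BackendRepository port are inferred from the snippets below, not quoted from the source):

// application/proxy_service.rs (illustrative)
pub struct ProxyService {
    binding_repo: Arc<dyn BindingRepository>, // port, never a concrete adapter
    backend_repo: Arc<dyn BackendRepository>, // assumed analogous port for backends
    // local_region, metrics, ... as used by resolve_backend below
}
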
1. New Connection

When a client connects for the first time:

// application/proxy_service.rs
pub async fn resolve_backend(&self, client_ip: IpAddr) -> Option<Backend> {
    let client_key = ClientKey::new(client_ip);

    // 1. Check for existing binding via repository trait
    if let Some(binding) = self.binding_repo.get(&client_key).await {
        // ... use existing binding
    }

    // 2. No binding - use LoadBalancer (pure domain logic)
    let backend = LoadBalancer::pick_backend(
        &backends,
        &self.local_region,
        client_geo.as_ref(),
        |id| self.metrics.get_connection_count(id),
    )?;

    // 3. Create new binding via repository trait
    self.binding_repo.set(
        client_key,
        Binding::new(backend.id.clone()),
    ).await;

    Some(backend)
}

2. Subsequent Connections

When the same client reconnects:

// application/proxy_service.rs
if let Some(binding) = self.binding_repo.get(&client_key).await {
    // Update last_seen via repository
    self.binding_repo.touch(&client_key).await;

    // Verify backend is still healthy
    if let Some(backend) = self.backend_repo.get_by_id(&binding.backend_id).await {
        if backend.healthy {
            return Some(backend);
        }
    }

    // Backend unhealthy - remove stale binding
    self.binding_repo.remove(&client_key).await;
}

3. Binding Expiration (GC)

The adapter handles garbage collection:

// adapters/outbound/dashmap_binding_repo.rs
impl DashMapBindingRepository {
    pub fn start_gc(&self, ttl: Duration, interval: Duration) {
        let bindings = self.bindings.clone();
        tokio::spawn(async move {
            loop {
                let now = Instant::now();
                bindings.retain(|_, binding| {
                    now.duration_since(binding.last_seen) <= ttl
                });
                tokio::time::sleep(interval).await;
            }
        });
    }
}

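At startup, the GC task is driven by the configuration shown earlier; a hedged wiring example (the surrounding startup code is assumed):

// Illustrative: read the documented environment variables (falling back to
// the defaults) and launch the background GC task.
let ttl_secs: u64 = std::env::var("EDGEPROXY_BINDING_TTL_SECS")
    .ok().and_then(|v| v.parse().ok()).unwrap_or(600);
let gc_secs: u64 = std::env::var("EDGEPROXY_BINDING_GC_INTERVAL_SECS")
    .ok().and_then(|v| v.parse().ok()).unwrap_or(60);

binding_repo.start_gc(
    Duration::from_secs(ttl_secs),
    Duration::from_secs(gc_secs),
);
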
4. Backend Failure

If the bound backend becomes unhealthy:

// application/proxy_service.rs
if let Some(binding) = self.binding_repo.get(&client_key).await {
    // Check backend health via repository
    if let Some(backend) = self.backend_repo.get_by_id(&binding.backend_id).await {
        if backend.healthy {
            return Some(backend);
        }
    }

    // Backend unhealthy or gone - remove binding
    self.binding_repo.remove(&client_key).await;
    // Fall through to LoadBalancer...
}

Flow Diagram

[Diagram: client affinity flow]

Use Cases

1. Stateful Applications

Games, chat servers, or any application maintaining connection state:

Client A ──▶ game-server-1 (player state)
Client A ──▶ game-server-1 (same server, state preserved)

2. Session-Based Protocols

Applications using session cookies or tokens:

Client B ──▶ web-server-2 (session created)
Client B ──▶ web-server-2 (session retrieved)

3. Connection Pooling

Database connections or persistent HTTP connections:

Client C ──▶ db-replica-1 (connection 1)
Client C ──▶ db-replica-1 (connection 2, same replica)

Performance

Memory Usage

Each binding uses approximately:

  • ClientKey: 16 bytes (IPv4) or 40 bytes (IPv6)
  • Binding: ~80 bytes (String + 2 Instants)
  • DashMap overhead: ~64 bytes per entry

Total: ~160 bytes per client

For 1 million clients: ~160 MB

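As a back-of-the-envelope check of that figure:

// Rough capacity planning: ~160 bytes per entry (key + binding + map overhead).
const APPROX_BYTES_PER_BINDING: usize = 160;

fn approx_binding_memory(active_clients: usize) -> usize {
    active_clients * APPROX_BYTES_PER_BINDING // 1_000_000 clients ≈ 160 MB
}
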
Garbage Collection

GC runs every EDGEPROXY_BINDING_GC_INTERVAL_SECS seconds:

// Iterate all bindings
bindings.retain(|_, binding| {
    now.duration_since(binding.last_seen) <= ttl
});

Time complexity: O(n) where n = total bindings

Concurrency

DashMap shards its entries internally, so locking stays localized:

  • Read (binding lookup): per-shard read lock, so lookups rarely contend
  • Write (binding create/update): per-shard write lock
  • GC (retain): brief per-shard locks

Tuning

High-Frequency Connections

For clients making many short connections:

# Shorter TTL to free memory faster
export EDGEPROXY_BINDING_TTL_SECS=60

# More frequent GC
export EDGEPROXY_BINDING_GC_INTERVAL_SECS=10

Long-Lived Sessions

For persistent connections or infrequent reconnects:

# Longer TTL to maintain affinity
export EDGEPROXY_BINDING_TTL_SECS=3600 # 1 hour

# Less frequent GC (lower CPU)
export EDGEPROXY_BINDING_GC_INTERVAL_SECS=300

High Client Volume

For millions of unique clients:

# Aggressive TTL to bound memory
export EDGEPROXY_BINDING_TTL_SECS=300

# Frequent GC
export EDGEPROXY_BINDING_GC_INTERVAL_SECS=30

Limitations

1. IP-Based Only

Affinity is based on client IP, not:

  • HTTP cookies
  • TLS session tickets
  • Application tokens

Implication: Clients behind the same NAT (sharing one public IP) share a single binding.

2. No Cross-POP Sync

Bindings are local to each POP instance:

Client → POP-SA → sa-node-1 (binding created)
Client → POP-US → us-node-1 (different binding!)

Solution: Use DNS geo-routing so that a given client consistently reaches the same POP.

3. Backend Changes

If a backend is removed from routing.db:

  1. Existing bindings remain until TTL
  2. Next connection fails backend health check
  3. Binding removed, new backend selected

Monitoring

Binding Count

# Check active bindings (requires debug endpoint)
curl http://localhost:8080/debug/bindings/count

GC Activity

With DEBUG=1:

DEBUG edge_proxy::state: binding GC removed 150 expired entries

Memory Usage

Monitor process RSS to track binding memory:

ps -o rss= -p $(pgrep edge-proxy)

Future Improvements

  1. Distributed bindings: Sync across POPs via built-in replication
  2. Configurable keys: Support for headers, cookies
  3. Weighted affinity: Probability-based stickiness
  4. Metrics export: Prometheus counters for bindings

Next Steps