Performance
edgeProxy is designed to handle thousands of concurrent connections with minimal overhead. This page explains the internal architecture that makes this possible.
High-Performance Request Flow
When a client connects to edgeProxy, the request flows through several optimized stages:
| Stage | Latency | Description |
|---|---|---|
| TCP Accept | ~1μs | Kernel hands off connection to userspace |
| GeoIP Lookup | ~100ns | In-memory MaxMind database query |
| Backend Selection | ~10μs | DashMap lookup + scoring algorithm |
| WireGuard Tunnel | ~0.5ms | Encryption overhead (ChaCha20-Poly1305) |
| Total Proxy Overhead | <1ms | End-to-end proxy latency |
Tokio Async Runtime
edgeProxy uses the Tokio async runtime to handle thousands of connections with minimal threads:
How It Works
- Thread Pool = CPU Cores
  - By default, Tokio creates one worker thread per CPU core
  - A 4-core server runs 4 threads, handling 10,000+ connections
- Lightweight Tasks (~200 bytes each)
  - Each connection is a Tokio "task", not a thread
  - Tasks are multiplexed onto the thread pool
  - No context-switching overhead between connections
- Non-Blocking I/O
  - Uses `epoll` (Linux) or `kqueue` (macOS) for efficient polling
  - A task waiting for I/O doesn't block its thread
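The sketch below shows this model in miniature, assuming a plain TCP accept loop: the default multi-threaded runtime starts one worker per core, and each accepted connection becomes a spawned task. The listen and backend addresses are placeholders for illustration, not edgeProxy configuration.

```rust
use tokio::io::copy_bidirectional;
use tokio::net::{TcpListener, TcpStream};

#[tokio::main] // multi-threaded runtime: one worker thread per CPU core by default
async fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("0.0.0.0:8443").await?; // placeholder listen address

    loop {
        let (mut inbound, _peer) = listener.accept().await?;
        // One lightweight task per connection, multiplexed onto the worker threads
        // instead of getting its own OS thread.
        tokio::spawn(async move {
            // Placeholder backend; the real proxy would run backend selection here.
            if let Ok(mut outbound) = TcpStream::connect("10.0.0.2:443").await {
                // A task parked on I/O yields its thread back to the runtime
                // (epoll on Linux, kqueue on macOS), so other tasks keep running.
                let _ = copy_bidirectional(&mut inbound, &mut outbound).await;
            }
        });
    }
}
```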
Memory Efficiency
| Connections | Memory (Tasks Only) | Total Memory (Realistic) |
|---|---|---|
| 1,000 | ~200KB | ~10MB |
| 10,000 | ~2MB | ~100MB |
| 100,000 | ~20MB | ~1GB |
The "realistic" memory includes socket buffers, DashMap entries, and routing data. The proxy itself remains very efficient.
Operation Costs
Understanding the cost of each operation helps identify bottlenecks:
| Operation | Time | Notes |
|---|---|---|
| DashMap read | ~50ns | Lock-free concurrent hashmap |
| DashMap write | ~100ns | Atomic updates |
| GeoIP lookup | ~100ns | In-memory MMDB |
| Backend scoring | ~1μs | Iterate and score backends |
| SQLite read | ~10μs | Hot reload from routing.db |
| WireGuard encrypt | ~500ns | Per-packet overhead |
| TCP connect | ~1ms | Depends on network distance |
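To sanity-check these figures on your own hardware, a rough microbenchmark along the following lines is enough; it assumes the `dashmap` crate, and absolute timings will vary with CPU, map size, and compiler optimizations.

```rust
use std::time::Instant;
use dashmap::DashMap;

fn main() {
    // Populate a map roughly the size of a busy binding table.
    let map: DashMap<u64, u64> = DashMap::new();
    for i in 0..100_000u64 {
        map.insert(i, i);
    }

    // Time one million reads. This is a rough estimate, not a rigorous
    // benchmark (no warm-up, no black_box), but it lands in the tens of ns.
    let start = Instant::now();
    let mut hits = 0u64;
    for i in 0..1_000_000u64 {
        if map.get(&(i % 100_000)).is_some() {
            hits += 1;
        }
    }
    let per_read = start.elapsed().as_nanos() as f64 / hits as f64;
    println!("~{per_read:.0} ns per DashMap read");
}
```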
Concurrency Model
```rust
use std::sync::atomic::AtomicUsize;
use dashmap::DashMap;

// Client bindings: lock-free reads
let bindings: DashMap<ClientKey, Binding> = DashMap::new();

// Backend pool: read-heavy, write-rare
let backends: DashMap<String, Backend> = DashMap::new();

// Connection counts: atomic updates
let conn_count: AtomicUsize = AtomicUsize::new(0);
```
The use of DashMap allows:
- Concurrent reads without blocking
- Fine-grained locking on writes (per-shard)
- No global lock that would serialize requests
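A small, self-contained sketch of that behavior follows; the `ClientKey` and `Binding` aliases are stand-ins for edgeProxy's real types. Several threads read the same entry while an atomic counter tracks connections, and none of them contend on a global lock.

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering};
use dashmap::DashMap;

// Stand-in types for illustration only.
type ClientKey = String;
type Binding = String;

fn main() {
    let bindings: Arc<DashMap<ClientKey, Binding>> = Arc::new(DashMap::new());
    let conn_count = Arc::new(AtomicUsize::new(0));

    bindings.insert("203.0.113.7".into(), "us-node-1".into());

    // Readers on other threads take no global lock; writes only lock one shard.
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let bindings = Arc::clone(&bindings);
            let conn_count = Arc::clone(&conn_count);
            std::thread::spawn(move || {
                if let Some(binding) = bindings.get("203.0.113.7") {
                    // Atomic increment: no mutex needed for the connection counter.
                    conn_count.fetch_add(1, Ordering::Relaxed);
                    assert_eq!(binding.value(), "us-node-1");
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    println!("connections observed: {}", conn_count.load(Ordering::Relaxed));
}
```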
System Bottlenecks
The proxy itself is rarely the bottleneck. These are the real limits:
Network Layer (1-10 Gbps)
| NIC Speed | Throughput | Typical Limit |
|---|---|---|
| 1 Gbps | ~125 MB/s | Most cloud VMs |
| 10 Gbps | ~1.25 GB/s | Premium instances |
| 25 Gbps | ~3.1 GB/s | Bare metal |
Solution: Deploy multiple POPs to distribute load geographically.
Kernel Layer (File Descriptors)
Each TCP connection consumes one file descriptor. Default limits are often too low:
```bash
# Check current limit
ulimit -n
# Typical default: 1024
# Recommended for production: 1,000,000+
```
Solution: Increase the `nofile` limit, either via `LimitNOFILE=` in the systemd service or in /etc/security/limits.conf:
```
# /etc/security/limits.conf
* soft nofile 1048576
* hard nofile 1048576
```
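A process can also raise its own soft limit up to the configured hard limit at startup. This is a sketch using the `libc` crate on a Unix-like OS, shown for illustration rather than as something edgeProxy itself does:

```rust
use std::io;

/// Raise this process's soft fd limit toward `target`, capped at the hard limit.
fn raise_nofile_limit(target: libc::rlim_t) -> io::Result<libc::rlim_t> {
    unsafe {
        let mut lim = libc::rlimit { rlim_cur: 0, rlim_max: 0 };
        if libc::getrlimit(libc::RLIMIT_NOFILE, &mut lim) != 0 {
            return Err(io::Error::last_os_error());
        }
        // The soft limit can never exceed the hard limit set by limits.conf/systemd.
        lim.rlim_cur = target.min(lim.rlim_max);
        if libc::setrlimit(libc::RLIMIT_NOFILE, &lim) != 0 {
            return Err(io::Error::last_os_error());
        }
        Ok(lim.rlim_cur)
    }
}

fn main() -> io::Result<()> {
    let effective = raise_nofile_limit(1_048_576)?;
    println!("fd soft limit is now {effective}");
    Ok(())
}
```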
Backend Layer (Connection Limits)
Each backend has a `soft_limit` and a `hard_limit` in routing.db:
| Limit | Purpose |
|---|---|
| `soft_limit` | Comfortable connection count, used for scoring |
| `hard_limit` | Maximum connections; new connections are rejected when reached |
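To make the distinction concrete, here is an illustrative scoring sketch; the struct fields and weighting are assumptions for this page, not edgeProxy's actual algorithm. Backends at their `hard_limit` are excluded outright, while load relative to `soft_limit` only worsens the score.

```rust
// Illustrative only: one plausible way soft_limit/hard_limit could drive selection.
struct Backend {
    id: String,
    active_conns: usize,
    soft_limit: usize,  // assumed > 0
    hard_limit: usize,
    latency_ms: f64,
}

fn score(b: &Backend) -> Option<f64> {
    if b.active_conns >= b.hard_limit {
        return None; // hard_limit reached: reject this backend outright
    }
    // Load relative to soft_limit; lower score = more attractive.
    let load = b.active_conns as f64 / b.soft_limit as f64;
    Some(b.latency_ms * (1.0 + load))
}

fn pick(backends: &[Backend]) -> Option<&Backend> {
    backends
        .iter()
        .filter_map(|b| score(b).map(|s| (b, s)))
        .min_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
        .map(|(b, _)| b)
}

fn main() {
    let backends = vec![
        Backend { id: "us-node-1".into(), active_conns: 40, soft_limit: 100, hard_limit: 200, latency_ms: 12.0 },
        Backend { id: "us-node-2".into(), active_conns: 199, soft_limit: 100, hard_limit: 200, latency_ms: 8.0 },
    ];
    if let Some(best) = pick(&backends) {
        println!("selected {}", best.id);
    }
}
```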
Tuning: Adjust based on backend capacity:
```sql
-- Increase limits for high-capacity backends
UPDATE backends SET soft_limit = 100, hard_limit = 200
WHERE id = 'us-node-1';
```
Kernel Tuning
For high-performance deployments, tune these kernel parameters:
```
# /etc/sysctl.conf

# Maximum connections queued for accept
net.core.somaxconn = 65535

# Maximum socket receive/send buffers
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216

# TCP buffer sizes (min, default, max)
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

# Enable TCP Fast Open
net.ipv4.tcp_fastopen = 3

# Increase port range for outbound connections
net.ipv4.ip_local_port_range = 1024 65535

# Reduce TIME_WAIT sockets
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_reuse = 1
```
Apply with:
```bash
sudo sysctl -p
```
Performance Metrics
Based on benchmarks with a 4-core VM:
| Metric | Value |
|---|---|
| Connections/second | 50,000+ |
| Concurrent connections | 10,000+ |
| Proxy latency | <1ms |
| Memory per 1K connections | ~10MB |
| WireGuard CPU overhead | ~3% |
| Cold start time | ~50ms |
| Binary size | ~5MB |
These numbers are conservative. Real-world performance depends on network conditions, backend response times, and workload characteristics.
Comparison with Other Proxies
| Feature | edgeProxy | HAProxy | Nginx | Envoy |
|---|---|---|---|---|
| Language | Rust | C | C | C++ |
| Async Model | Tokio (event loop) | Event loop (multi-threaded) | Event loop (multi-process) | Event loop (multi-threaded) |
| Memory per 10K conn | ~100MB | ~50MB | ~30MB | ~200MB |
| Geo-routing | Built-in | Plugin | Module | Plugin |
| WireGuard | Native | External | External | External |
| Config reload | Hot | Hot | Hot | Hot |
edgeProxy trades some raw throughput for:
- Built-in geo-routing without external dependencies
- WireGuard integration for secure backhaul
- Rust safety with predictable latency (no GC)
Monitoring Performance
Track these metrics in production:
```bash
# Connection rate
curl localhost:9090/metrics | grep edge_connections_total

# Current connections
curl localhost:9090/metrics | grep edge_connections_current

# Backend latency
curl localhost:9090/metrics | grep edge_backend_latency_ms
```
Prometheus metrics export is planned for a future release; the metric names above reflect the intended interface. See the Roadmap for details.
Best Practices
- Deploy close to users: Use POPs in each major region
- Size your backends: Set `soft_limit` to 70% of true capacity
- Monitor file descriptors: Alert when usage approaches the `ulimit`
- Use WireGuard: The 0.5ms overhead is worth the security
- Enable TCP Fast Open: Reduces connection latency by 1 RTT
- Scale horizontally: Add more POPs, not bigger VMs