Skip to content

Understanding Performance Design

This document explains the performance philosophy behind AirsSys RT, design decisions that impact performance, and how to reason about actor system scalability.

Table of Contents


Performance Philosophy

Guiding Principles

1. "Fast Enough" is Not Enough

Philosophy: AirsSys RT aims for predictable, measurable performance with clear baselines and characteristics.

Why: "Fast enough" is subjective and changes with scale. Concrete baselines enable capacity planning and performance regression detection.

Approach:

  • Establish baseline measurements (actor spawn: 624ns, messaging: 737ns)
  • Document scaling characteristics (linear O(n), sub-linear, super-linear)
  • Provide performance reference for capacity planning

2. Zero-Cost Abstractions (Rust Philosophy)

Philosophy: Abstractions should not cost performance. Pay only for what you use.

Why: Actor model provides high-level abstractions (message passing, supervision). These must not impose prohibitive overhead.

Approach:

  • Generic traits compile to concrete types (no virtual dispatch)
  • Inline hot paths (message handling, mailbox operations)
  • Benchmark abstractions against hand-coded alternatives

3. Performance is a Feature

Philosophy: Performance is not an afterthought - it's designed into the architecture from the start.

Why: Retrofitting performance into a slow system is harder than building it in from day one.

Approach:

  • Benchmarks from RT-TASK-008 (baseline measurement)
  • Performance regression detection in CI (future)
  • Documentation of performance characteristics

Zero-Cost Abstractions

What are Zero-Cost Abstractions?

Definition: High-level abstractions that compile down to the same code a programmer would write by hand.

Rust's Promise: "What you don't use, you don't pay for. What you do use, you couldn't hand code any better."

Example: Generic Message Handlers

High-Level Code (What You Write):

#[async_trait]
impl Handler<MyMessage> for MyActor {
    async fn handle(&mut self, msg: MyMessage, ctx: &mut ActorContext<Self>) -> String {
        format!("Processed: {:?}", msg)
    }
}

Compiled Code (What Runs):

// Compiler monomorphizes generics into concrete types
impl MyActor {
    async fn handle_MyMessage(&mut self, msg: MyMessage, ctx: &mut ActorContext<MyActor>) -> String {
        format!("Processed: {:?}", msg)
    }
}

// Direct function call (no virtual dispatch, no runtime overhead)
actor.handle_MyMessage(msg, ctx).await

Cost: Zero runtime overhead compared to writing the concrete function directly.

Benefit: Type-safe, generic API without performance penalty.

Example: Mailbox Abstraction

High-Level API:

// User-friendly API
let mailbox = Mailbox::bounded(100);
mailbox.enqueue(msg).await?;
let msg = mailbox.dequeue().await?;

Implementation (Under the Hood):

// Compiles to efficient tokio::sync::mpsc
struct BoundedMailbox<M> {
    tx: mpsc::Sender<MessageEnvelope<M>>,  // Tokio channel
    rx: mpsc::Receiver<MessageEnvelope<M>>,
}

// Zero abstraction overhead - direct channel operations

Cost: 181ns enqueue, 150ns dequeue (same as raw tokio::mpsc)

Benefit: Ergonomic API without sacrificing performance.


Performance by Design

Design Decision: Lightweight Actor Spawn

Goal: Enable millions of actors without excessive memory overhead.

Design:

// Minimal actor footprint
struct ActorRuntime<A> {
    actor: A,               // User's actor struct
    mailbox: Mailbox<A>,    // Message queue
    context: ActorContext<A>, // Lifecycle management
}

Memory Footprint: ~1KB per actor (struct size + mailbox buffer)

Spawn Performance:

  • Single spawn: 624.74ns (P50)
  • Batch spawn (10 actors): 681.40ns per actor (P50)

Capacity: 1.6M actors/second spawn rate

Rationale: Lightweight actors enable actor-per-entity patterns (one actor per user session, device, game entity).

Design Decision: Direct Mailbox Access

Goal: Minimize message latency for high-frequency messaging.

Design:

// Direct enqueue without routing overhead
impl<A: Actor> ActorRef<A> {
    pub async fn send<M>(&self, msg: M) -> Result<M::Result, SendError>
    where
        A: Handler<M>,
        M: Message,
    {
        self.mailbox.enqueue(msg).await?;  // Direct access, no indirection
    }
}

Performance: 737ns roundtrip (enqueue + process + reply)

Rationale: Hot paths (request-reply within service) benefit from direct access. Message broker reserved for pub-sub.

Design Decision: Bounded and Unbounded Mailboxes

Goal: Support both high-throughput (unbounded) and backpressure (bounded) use cases.

Design:

pub enum Mailbox<A> {
    Bounded(BoundedMailbox<A>),    // Fixed capacity, backpressure
    Unbounded(UnboundedMailbox<A>), // Unlimited capacity
}

Performance:

Mailbox Type Enqueue Dequeue Memory
Bounded 181ns 150ns capacity * msg_size
Unbounded 181ns 150ns queue_depth * msg_size

Tradeoff:

  • Bounded: Prevents memory exhaustion, applies backpressure (may block sender)
  • Unbounded: Never blocks sender, but may exhaust memory

Guideline: Use bounded for production, unbounded for development/testing.

Design Decision: Inline Message Processing

Goal: Minimize virtual dispatch overhead in message handling hot path.

Design:

// Generic trait compiles to concrete implementations (monomorphization)
#[async_trait]
pub trait Handler<M: Message>: Actor {
    async fn handle(&mut self, msg: M, ctx: &mut ActorContext<Self>) -> M::Result;
}

// Compiler generates:
impl MyActor {
    #[inline]  // Inlined for zero overhead
    async fn handle_MyMessage(&mut self, msg: MyMessage, ctx: &mut ActorContext<Self>) -> String {
        // User implementation
    }
}

Performance: 31.55ns per message processing (after dequeue, before enqueue reply)

Rationale: Inlining eliminates function call overhead. Generic monomorphization enables compiler optimizations.


Scalability Characteristics

Linear Scaling (O(n))

Characteristic: Performance scales linearly with number of actors.

Evidence:

Actors Total Spawn Time Per-Actor Cost
1 624ns 624ns
10 6.81µs 681ns
100 68.1µs 681ns
1,000 681µs 681ns

Slope: ~681ns per actor (constant)

Implication: Doubling actors doubles total time, but per-actor cost remains constant.

Why Linear:

  • Independent actor spawning (no shared locks)
  • Per-actor mailboxes (no contention)
  • Isolated state (no synchronization overhead)

Message Broker Overhead

Characteristic: Broker routing adds constant overhead per message.

Measurement:

Path Latency Overhead
Direct (no broker) 737ns Baseline
Via Broker 917ns +180ns

Overhead Factor: 6.7x slower than direct messaging

Implication: Broker suitable for infrequent messaging (pub-sub, discovery), not hot paths.

Why Constant:

  • Hash map lookup for topic → subscribers (O(1))
  • Fixed routing logic (no dynamic dispatch)
  • Parallel delivery to subscribers

Broadcast Scaling

Characteristic: Broadcast latency scales linearly with subscriber count.

Measurement:

Subscribers Total Latency Per-Subscriber
1 395ns 395ns
10 3.95µs 395ns
100 39.5µs 395ns

Slope: 395ns per subscriber (constant)

Implication: Each subscriber adds fixed cost. 1,000 subscribers = ~400µs total.

Why Linear:

  • Independent message delivery (parallel sends)
  • No synchronization between subscribers
  • Each subscriber has own mailbox

Supervision Overhead

Characteristic: Supervision adds constant overhead per child operation.

Measurement:

Operation Without Supervisor With Supervisor Overhead
Spawn Actor 624ns 1.28µs +656ns (105%)
Restart Actor - 1.28µs -

Overhead Factor: ~2x slower with supervision

Implication: Supervision worthwhile for fault tolerance, but not free.

Why Constant:

  • Fixed supervisor bookkeeping (child registration)
  • One-time restart policy evaluation
  • No scaling with child count (per-child overhead)

Performance Tradeoffs

Tradeoff 1: Type Safety vs. Runtime Flexibility

Type-Safe Approach (AirsSys RT):

// Compiler ensures MyActor implements Handler<MyMessage>
impl Handler<MyMessage> for MyActor { ... }

// Compile-time error if not implemented
actor_ref.send(MyMessage).await?;  // Type-checked!

Pros:

  • Catch errors at compile time
  • Zero runtime type checking overhead
  • Self-documenting (handler existence proven by types)

Cons:

  • Cannot send arbitrary messages at runtime
  • Requires implementing trait for each message type

Dynamic Approach (Erlang, Akka):

% Erlang: Send any message to any actor (runtime checked)
Actor ! {my_message, Data}.

Pros:

  • Maximum flexibility (send any message anytime)
  • Rapid prototyping (no trait implementations)

Cons:

  • Runtime errors (message not handled crashes actor)
  • No compile-time verification
  • Runtime type matching overhead

AirsSys RT Choice: Prioritize type safety and compile-time guarantees over runtime flexibility.

Tradeoff 2: Message Passing vs. Shared Memory

Message Passing:

// Actor approach: Copy message to mailbox
actor_ref.send(LargeData { vec: large_vec }).await?;
// Message copied to mailbox (memory overhead)

Pros:

  • Isolation (no data races)
  • Type-safe (compiler-checked message types)
  • Location transparent (can be made distributed)

Cons:

  • Memory copying overhead (message size dependent)
  • Latency overhead (~737ns per message)

Shared Memory:

// Shared memory: Reference shared data
let data = Arc::new(Mutex::new(large_vec));
let data_clone = data.clone();  // Cheap Arc clone
// No copying, just reference counting

Pros:

  • No memory copying (shared reference)
  • Minimal overhead (Arc increment/decrement)

Cons:

  • Potential data races (if locking incorrect)
  • Deadlock risk (complex lock orderings)
  • Not location transparent (cannot distribute)

AirsSys RT Choice: Message passing by default, but users can use Arc<T> for large shared data:

// Hybrid: Message passing with shared data
#[derive(Clone)]
struct EfficientMessage {
    data: Arc<Vec<u8>>,  // Shared via Arc, cheap clone
}

Tradeoff 3: Bounded vs. Unbounded Mailboxes

Bounded Mailbox:

let mailbox = Mailbox::bounded(100);  // Max 100 messages

Pros:

  • Prevents memory exhaustion (finite memory use)
  • Applies backpressure (slows down fast producers)
  • Predictable memory footprint

Cons:

  • May block senders (if mailbox full)
  • May drop messages (if Drop strategy)
  • Requires capacity tuning

Unbounded Mailbox:

let mailbox = Mailbox::unbounded();  // No limit

Pros:

  • Never blocks senders (always accepts messages)
  • Simple (no capacity configuration)
  • Matches Erlang semantics

Cons:

  • Risk of memory exhaustion (queue grows unbounded)
  • No backpressure (fast producer can overwhelm slow consumer)
  • Unpredictable memory use

AirsSys RT Choice: Provide both, recommend bounded for production with appropriate capacity.


When Performance Matters

High-Frequency Messaging

Scenario: Actors exchange messages at >100K msgs/sec.

Optimization:

// Use direct references (avoid broker overhead)
let worker_ref = system.spawn(Worker::new()).await?;

for i in 0..100_000 {
    worker_ref.send(HighFrequency { data: i }).await?;  // Direct path
}

Avoid: Routing through message broker (+180ns per message)

Expected: ~1M msgs/sec throughput via direct references

Large-Scale Actor Systems

Scenario: System with millions of concurrent actors.

Optimization:

// Batch actor spawning
let actors: Vec<_> = (0..1_000_000)
    .map(|i| system.spawn(Worker::new()))
    .collect();
futures::future::join_all(actors).await;

// Expected: 1.6M actors/sec spawn rate

Memory Planning: 1M actors × 1KB = ~1GB minimum

Avoid: Spawning actors synchronously in loop (slower)

Low-Latency Request-Reply

Scenario: Client requires <1ms response time.

Optimization:

// Minimize message processing time
#[async_trait]
impl Handler<Query> for FastActor {
    async fn handle(&mut self, msg: Query, ctx: &mut ActorContext<Self>) -> Data {
        // Keep handler simple and fast
        self.cache.get(&msg.id).cloned().unwrap_or_default()
        // Avoid: External I/O, complex computation, blocking operations
    }
}

Expected: 737ns messaging + handler time

Avoid: Blocking operations in handler (use spawn_blocking)

Pub-Sub with Many Subscribers

Scenario: Broadcasting events to 100+ subscribers.

Optimization:

// Consider batching events to reduce broadcast frequency
let mut batch = Vec::new();
for event in events {
    batch.push(event);
    if batch.len() >= 100 {
        broker.publish("topic", EventBatch { events: batch.clone() }).await?;
        batch.clear();
    }
}

Expected: 395ns × 100 subscribers = ~40µs per broadcast

Avoid: Individual events if batch semantics acceptable


Performance Monitoring

Metrics to Track

1. Message Latency (P50, P95, P99)

// Measure end-to-end message latency
let start = Instant::now();
let result = actor_ref.send(msg).await?;
let latency = start.elapsed();

// Compare against baseline: 737ns for simple messages
if latency > Duration::from_micros(10) {
    log::warn!("High latency: {:?}", latency);
}

2. Mailbox Queue Depth

// Monitor mailbox backlog
let queue_depth = ctx.mailbox_size();

if queue_depth > 1000 {
    log::warn!("Mailbox backlog: {} messages", queue_depth);
    // Consider: Add more workers, increase capacity, apply backpressure
}

3. Actor Spawn Rate

// Track actor creation throughput
let start = Instant::now();
let actors: Vec<_> = (0..1000)
    .map(|_| system.spawn(Worker::new()))
    .collect();
futures::future::join_all(actors).await;
let elapsed = start.elapsed();
let rate = 1000.0 / elapsed.as_secs_f64();

// Expected: ~1.6M actors/sec
if rate < 1_000_000.0 {
    log::warn!("Low spawn rate: {:.0} actors/sec", rate);
}

4. Supervisor Restart Rate

// Monitor fault tolerance overhead
// High restart rate may indicate:
// - Buggy actors (frequent crashes)
// - Configuration issues (max_restarts too high)
// - System overload (resource exhaustion)

Benchmarking Workflow

1. Establish Baseline:

# Run benchmarks to establish baseline
cargo bench --bench actor_benchmarks
cargo bench --bench message_benchmarks

2. Compare Against Baseline:

// In your application
assert!(latency < BASELINE_LATENCY * 2.0, "Performance regression detected");

3. Profile Hot Paths:

# CPU profiling
cargo flamegraph --bin my_app

# Memory profiling
heaptrack target/release/my_app

Design Guidelines for Performance

1. Design for Hot and Cold Paths

Hot Path (High-Frequency):

  • Use direct actor references
  • Minimize message size (use Arc for large data)
  • Keep handlers simple and fast
  • Avoid broker overhead

Cold Path (Infrequent):

  • Message broker acceptable
  • Pub-sub for events
  • Complex processing ok
  • External I/O acceptable

2. Size Mailboxes Appropriately

Formula:

mailbox_capacity = peak_msg_rate × burst_duration

Example:
- Peak rate: 1,000 msgs/sec
- Burst duration: 5 seconds
- Capacity: 1,000 × 5 = 5,000 messages

3. Use Batching When Appropriate

Instead of:

for event in events {
    broker.publish("topic", event).await?;  // N broadcasts
}

Prefer:

broker.publish("topic", EventBatch { events }).await?;  // 1 broadcast

Tradeoff: Latency (batching delays) vs. Throughput (fewer broadcasts)

4. Profile Before Optimizing

Measure, don't guess:

// Add timing to suspected hot paths
let start = Instant::now();
// ... suspected slow code ...
let elapsed = start.elapsed();
if elapsed > Duration::from_millis(1) {
    log::warn!("Slow operation: {:?}", elapsed);
}

Use profiling tools:

  • cargo flamegraph for CPU profiling
  • heaptrack for memory profiling
  • criterion for micro-benchmarking

Further Reading

AirsSys RT Documentation

External Resources

  • Zero-Cost Abstractions in Rust: Rust language philosophy
  • Performance Matters (Emery Berger): Academic perspective on performance
  • The Art of Performance Engineering: Systematic performance optimization

Last Updated: 2025-01-18 (RT-TASK-011 Phase 4 Day 7)