Theta-lang: Core Design Decisions

Design Choices

These fundamental design decisions have been confirmed and should guide all implementation work.


1. Memory Model: Arena Allocation

Decision: Use arena-based memory allocation for FlatBuffer regions.

Rationale

  • Perfect fit with FlatBuffers: Append-only FlatBuffer building aligns naturally with arena allocation
  • Predictable lifecycle: Entire arena can be freed at once when a computation completes
  • Zero fragmentation: Sequential allocation eliminates memory fragmentation
  • Fast allocation: Bump-pointer allocation is O(1)
  • Immutability enforcement: Once built, FlatBuffer regions are immutable - arenas reinforce this

Implementation Implications

  • VM maintains multiple arenas for different buffer generations
  • Each major computation phase (loading module, executing function, materializing dataflow result) gets its own arena
  • Host can retain references to old arenas even after VM creates new ones
  • Cleanup is simple: drop entire arena when no references remain
  • No need for garbage collection or reference counting within an arena

Memory Layout Strategy

Arena 1: Program module (immutable, lives for VM lifetime)
Arena 2: Execution state generation N (replaced on each instruction batch)
Arena 3: Dataflow intermediate results (replaced on materialization)
Arena 4: Final results (retained by host, freed when host releases)
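The bump-pointer scheme behind these arenas can be sketched in a few lines of C. This is an illustrative sketch, not the final Theta-lang API; the names `Arena`, `arena_create`, `arena_alloc`, and `arena_destroy` are assumptions:

```c
#include <stdint.h>
#include <stdlib.h>
#include <stddef.h>
#include <assert.h>

// Minimal bump-pointer arena sketch (illustrative names, not the final API).
typedef struct Arena {
    uint8_t* base;
    size_t   capacity;
    size_t   offset;   // Bump pointer: next free byte
} Arena;

Arena arena_create(size_t capacity) {
    Arena a = { malloc(capacity), capacity, 0 };
    return a;
}

// O(1) allocation: align, bump, return. No per-object free.
void* arena_alloc(Arena* a, size_t size, size_t align) {
    size_t aligned = (a->offset + (align - 1)) & ~(align - 1);
    if (aligned + size > a->capacity) return NULL;  // Arena exhausted
    void* p = a->base + aligned;
    a->offset = aligned + size;
    return p;
}

// Dropping a generation frees everything at once; no per-object bookkeeping.
void arena_destroy(Arena* a) {
    free(a->base);
    a->base = NULL;
    a->offset = a->capacity = 0;
}
```

Because allocation only moves a pointer forward, freeing individual FlatBuffer objects is neither possible nor needed; the whole generation is released with one `arena_destroy`.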

2. Register Architecture: Typed Registers

Decision: Registers are statically typed; type checking occurs at module load time.

Rationale

  • Performance: No runtime type checking overhead in hot loop
  • Safety: Catch type errors before execution
  • C-like semantics: Matches procedural C language model
  • Optimization: Typed registers enable better code generation and JIT opportunities
  • Columnar alignment: Typed registers map naturally to typed columns in dataflow

Register File Design

  • Fixed-size register file: 256 registers per function (r0-r255)
  • Type tags: Each register has a static type (i32, i64, f32, f64, bool, ptr, table_ref, node_ref)
  • Type validation: Instruction decoder verifies operand types match instruction requirements
  • Special registers:
    • r0: Always contains integer zero (read-only)
    • r1: Function return value
    • r2-r15: Argument passing registers
    • r16-r255: General purpose

Type System

Scalar types: i8, i16, i32, i64, f32, f64, bool
Reference types: ptr (raw pointer), string (UTF-8)
Table types: table_ref (reference to FlatBuffer table)
Dataflow types: node_ref (reference to dataflow node)
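A register file along these lines could be laid out as a parallel array of static type tags and value slots. The sketch below is illustrative (struct and function names are assumptions); the key point is that type tags are fixed at load time, so validation like `check_binary_i32` runs once in the decoder rather than in the interpreter's hot loop:

```c
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

// Sketch of a typed register file (names illustrative, not the final schema).
typedef enum RegType {
    T_I8, T_I16, T_I32, T_I64, T_F32, T_F64, T_BOOL,
    T_PTR, T_STRING, T_TABLE_REF, T_NODE_REF
} RegType;

typedef union RegValue {
    int32_t  i32;
    int64_t  i64;
    float    f32;
    double   f64;
    bool     b;
    void*    ptr;
    uint32_t table_ref;
    uint32_t node_ref;
} RegValue;

typedef struct RegisterFile {
    RegType  types[256];   // Static: fixed when the module loads
    RegValue values[256];  // Dynamic: mutated during execution
} RegisterFile;

// Load-time check for an ADD_I32-style instruction: all operands must be i32.
bool check_binary_i32(const RegisterFile* rf, uint8_t dst, uint8_t a, uint8_t b) {
    return rf->types[dst] == T_I32 &&
           rf->types[a]   == T_I32 &&
           rf->types[b]   == T_I32;
}
```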

3. Procedural Language: Typed, C-like

Decision: Procedural layer uses a typed, C-like imperative language.

Rationale

  • Familiarity: C-like syntax reduces learning curve
  • Explicit control: Imperative style gives developers clear execution model
  • Type safety: Static typing catches errors early
  • Performance: Predictable code generation without dynamic dispatch
  • Interop: Natural mapping to C host integration

Language Characteristics

  • Static typing: All variables and function signatures are typed
  • Imperative control flow: if/else, while, for, break, continue, return
  • Functions: First-class functions with typed signatures
  • No implicit conversions: Explicit casts required (safety over convenience)
  • No garbage collection: memory is host-managed via arenas
  • No classes/OOP: Procedural only, structs for data aggregation

Syntax Example (Illustrative)

fn calculate_revenue(orders: table_ref) -> f64 {
    i64 count = 0;
    f64 total = 0.0;

    // Create dataflow pipeline
    node_ref filtered = filter(orders, "status == 'completed'");
    node_ref projected = project(filtered, ["amount"]);

    // Execute and materialize
    table_ref results = materialize(projected);

    // Procedural aggregation (could also use dataflow aggregate)
    for (i64 i = 0; i < table_row_count(results); i++) {
        f64 amount = table_get_f64(results, i, 0);
        total = total + amount;
        count = count + 1;
    }

    return total;
}

4. Table Storage: Columnar Layout

Decision: Tables use columnar storage with typed columns.

Rationale

  • Analytical performance: Columnar storage is optimal for OLAP-style queries
  • Compression: Column-wise compression is more effective (similar values)
  • Cache efficiency: Scanning a single column has better cache locality
  • SIMD-friendly: Contiguous typed arrays enable vectorization
  • Projection efficiency: Selecting subset of columns avoids reading unused data
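The cache and SIMD benefits come from the fact that an aggregate over one column touches only that column's contiguous buffer. A minimal sketch (function name is an assumption):

```c
#include <stddef.h>
#include <assert.h>

// Cache-friendly column scan: summing an f64 column reads one contiguous
// buffer sequentially, never touching other columns. The simple loop is
// also a shape auto-vectorizers handle well. (Illustrative sketch.)
double sum_f64_column(const double* data, size_t row_count) {
    double total = 0.0;
    for (size_t i = 0; i < row_count; i++) {
        total += data[i];
    }
    return total;
}
```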

Column Layout in FlatBuffers

Table {
    schema: Schema;           // Column names and types
    row_count: u64;
    columns: [Column];        // Array of column buffers
}

Column {
    name: string;
    type: ColumnType;         // i32, i64, f32, f64, bool, string
    nullable: bool;
    null_bitmap: [u8];        // Bit-packed null flags
    data: [u8];               // Typed data buffer
}

ColumnType enum {
    Int8, Int16, Int32, Int64,
    Float32, Float64,
    Boolean,
    String,                   // Variable-length, stored as offsets + data
}

String Column Representation

Strings require special handling in columnar format:

StringColumn {
    offsets: [u32];           // Length = row_count + 1
    data: [u8];               // Concatenated UTF-8 bytes
}
// String at row i: data[offsets[i]..offsets[i+1]]
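The access rule above translates directly into a small non-owning-view helper. The `StrView`/`string_at` names are illustrative, not part of the schema:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <assert.h>

// Non-owning view into a string column's concatenated data buffer.
typedef struct StrView {
    const uint8_t* bytes;
    uint32_t       len;
} StrView;

// String at row i occupies data[offsets[i]..offsets[i+1]]; with
// row_count + 1 offsets, this is valid for every row including the last.
StrView string_at(const uint32_t* offsets, const uint8_t* data, size_t row) {
    StrView v = { data + offsets[row], offsets[row + 1] - offsets[row] };
    return v;
}
```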

Null Handling

  • Null bitmap: 1 bit per row (0 = null, 1 = valid)
  • Packed into byte array: null_bitmap[(row / 8)] & (1 << (row % 8))
  • Non-nullable columns omit null_bitmap
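The bitmap test above, wrapped as a helper (name is illustrative):

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>
#include <assert.h>

// Bit-packed validity check matching the layout above: bit set = valid.
bool row_is_valid(const uint8_t* null_bitmap, size_t row) {
    return (null_bitmap[row / 8] & (uint8_t)(1u << (row % 8))) != 0;
}
```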

5. Dataflow Execution: Pull-Based Lazy Evaluation

Decision: Dataflow nodes execute lazily using pull-based evaluation on a DAG.

Rationale

  • Memory efficiency: Only materialize results when needed
  • Composition: Build complex pipelines without intermediate materialization
  • Optimization opportunities: Entire DAG visible before execution, enables fusion
  • Natural fit with DAG: Pull model traverses DAG from sinks to sources
  • Host control: Host decides when to pull results (explicit materialize())

Execution Model

Build Phase (Lazy)

// These operations only construct the DAG, no execution yet
node_ref scan = create_scan_node("sales_data");
node_ref filter = create_filter_node(scan, "revenue > 1000");
node_ref project = create_project_node(filter, ["customer", "revenue"]);
// DAG exists, but no data has been processed

Execution Phase (Pull)

// Host explicitly requests materialization
table_ref result = materialize(project);
// Now the executor walks the DAG backwards:
// 1. project pulls from filter
// 2. filter pulls from scan
// 3. scan reads source table
// 4. filter evaluates predicate on each row
// 5. project selects columns
// 6. Result materialized to new FlatBuffer

DAG Interpreter Design

Node Interface

Every dataflow node implements:

typedef struct DataflowNode {
    NodeType type;
    NodeID id;
    InputRefs inputs[];       // Parent nodes in DAG
    Schema output_schema;
    void* operator_state;     // Operator-specific params
} DataflowNode;

// Pull interface (iterator pattern)
typedef struct NodeIterator {
    DataflowNode* node;
    void* state;              // Iteration state
    bool (*next)(NodeIterator* self, Row* out);
    void (*reset)(NodeIterator* self);
} NodeIterator;

Pull Execution

// Pseudocode for pull-based execution
TableRef materialize(DataflowNode* sink) {
    NodeIterator* it = create_iterator(sink);
    FlatBufferBuilder builder = create_builder(arena);

    Row row;
    while (it->next(it, &row)) {
        append_row_to_builder(builder, row);
    }

    return finalize_builder(builder);
}


// Each operator implements next() by pulling from inputs
bool filter_next(FilterIterator* self, Row* out) {
    while (self->input.next(&self->input, out)) {
        if (evaluate_predicate(self->predicate, out)) {
            return true;  // Found matching row
        }
    }
    return false;  // No more rows
}

DAG Representation in FlatBuffers

DataflowGraph {
    nodes: [DataflowNode];
    topological_order: [NodeID];  // Pre-computed for validation
}

DataflowNode {
    id: NodeID;
    operator: Operator;
    inputs: [NodeID];             // References to parent nodes
    parameters: OperatorParams;   // Operator-specific config
}

Operator union {
    ScanOp,
    FilterOp,
    ProjectOp,
    MapOp,
    JoinOp,
    AggregateOp,
    SortOp,
}

FilterOp {
    predicate: Expression;        // Expression tree
}

ProjectOp {
    columns: [u32];               // Column indices to keep
}

Optimization Opportunities

Lazy evaluation enables optimizations before execution:

  • Predicate pushdown: Move filters closer to scans
  • Projection pushdown: Only read needed columns from source
  • Operator fusion: Combine filter→project into single pass
  • Common subexpression elimination: Reuse shared subtrees
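Predicate pushdown, for instance, is a purely structural rewrite on the DAG before execution. The toy sketch below moves a Filter that sits directly above a Project beneath it; column-index remapping inside the predicate is elided for brevity, and the node shape and names are assumptions rather than the real schema:

```c
#include <stddef.h>
#include <assert.h>

// Toy predicate-pushdown rewrite on a single-input operator chain.
typedef enum OpKind { OP_SCAN, OP_FILTER, OP_PROJECT } OpKind;

typedef struct Node {
    OpKind kind;
    struct Node* input;  // Single parent; real nodes may have several
} Node;

// If the root is a Filter reading from a Project, swap them so the
// predicate runs before projection. Returns the (possibly new) root.
Node* push_filter_down(Node* root) {
    if (root && root->kind == OP_FILTER &&
        root->input && root->input->kind == OP_PROJECT) {
        Node* project = root->input;
        root->input = project->input;  // Filter now reads Project's source
        project->input = root;         // Project now reads from Filter
        return project;                // Project becomes the new root
    }
    return root;
}
```

Because the whole DAG exists before `materialize()` is called, rewrites like this can run as a cheap pass over the graph with no data in flight.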

Design Synergies

These five decisions reinforce each other:

Arena + FlatBuffers

  • Arenas hold immutable FlatBuffer regions
  • Building a FlatBuffer = appending to arena
  • Completed computation = drop arena

Typed Registers + Columnar Storage

  • Register type table_ref holds reference to columnar table
  • Extracting column = typed array access (no type checking)
  • Pushing value to column = typed append

C-like Language + Static Types

  • Function signatures map to register types
  • Type checker validates before execution
  • No runtime surprises

Lazy Evaluation + DAG

  • DAG structure enables analysis before execution
  • Pull model = explicit materialization points
  • Host controls when computation happens

Columnar + Pull Execution

  • Pull one row at a time across columns
  • SIMD processes column chunks when beneficial
  • Cache-friendly sequential access patterns

Implementation Priorities

Given these decisions, the implementation should proceed as follows:

  1. Phase 1A: Type system and arena allocator

    • Define type enum and type descriptors
    • Implement arena allocator (bump pointer)
    • Create FlatBuffer schemas for types
  2. Phase 1B: Columnar table schema

    • Define Column and Table FlatBuffer schemas
    • Implement column builders (for each type)
    • Add null bitmap handling
  3. Phase 2A: Typed register file

    • Define RegisterFile FlatBuffer schema
    • Implement typed register access functions
    • Create register type validator
  4. Phase 2B: Basic instruction set

    • Define typed instructions (ADD_I32, LOAD_F64, etc.)
    • Implement instruction decoder
    • Create simple interpreter loop
  5. Phase 3A: Dataflow DAG representation

    • Define DataflowNode schema
    • Create DAG builder API
    • Implement topological sort validation
  6. Phase 3B: Pull-based iterator interface

    • Define iterator protocol
    • Implement basic operators (Scan, Filter, Project)
    • Create materialization function
  7. Phase 4: Language frontend (optional, can use direct bytecode generation initially)

    • Design C-like syntax
    • Implement parser
    • Create type checker
    • Generate typed bytecode

Open Questions (Lower Priority)

With core decisions made, these can be deferred:

  • SIMD strategy: Which intrinsics? Portable fallback?
  • String interning: Should duplicate strings be deduplicated?
  • Operator fusion: Compile-time or runtime?
  • Expression JIT: Should hot expressions be compiled?
  • Parallelization: Can operators run on multiple threads?

These can be addressed during implementation or optimization phases.
