Theta-lang: Core Design Decisions

Design Choices

These fundamental design decisions have been confirmed and should guide all implementation work.


1. Memory Model: Arena Allocation

Decision: Use arena-based memory allocation for FlatBuffer regions.

Rationale

  • Perfect fit with FlatBuffers: Append-only FlatBuffer building aligns naturally with arena allocation
  • Predictable lifecycle: Entire arena can be freed at once when a computation completes
  • Zero fragmentation: Sequential allocation eliminates memory fragmentation
  • Fast allocation: Bump-pointer allocation is O(1)
  • Immutability enforcement: Once built, FlatBuffer regions are immutable - arenas reinforce this

Implementation Implications

  • VM maintains multiple arenas for different buffer generations
  • Each major computation phase (loading module, executing function, materializing dataflow result) gets its own arena
  • Host can retain references to old arenas even after VM creates new ones
  • Cleanup is simple: drop entire arena when no references remain
  • No need for garbage collection or reference counting within an arena

Memory Layout Strategy

Arena 1: Program module (immutable, lives for VM lifetime)
Arena 2: Execution state generation N (replaced on each instruction batch)
Arena 3: Dataflow intermediate results (replaced on materialization)
Arena 4: Final results (retained by host, freed when host releases)
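The bump-pointer scheme behind these arenas can be sketched in a few lines of C. This is an illustrative sketch, not the final Theta-lang API; the names `Arena`, `arena_create`, `arena_alloc`, and `arena_destroy` are assumptions:

```c
#include <stdint.h>
#include <stdlib.h>
#include <stddef.h>
#include <assert.h>

// Minimal bump-pointer arena sketch (illustrative names, not the final API).
typedef struct Arena {
    uint8_t* base;
    size_t   capacity;
    size_t   offset;   // Bump pointer: next free byte
} Arena;

Arena arena_create(size_t capacity) {
    Arena a = { malloc(capacity), capacity, 0 };
    return a;
}

// O(1) allocation: align, bump, return. No per-object free.
void* arena_alloc(Arena* a, size_t size, size_t align) {
    size_t aligned = (a->offset + (align - 1)) & ~(align - 1);
    if (aligned + size > a->capacity) return NULL;  // Arena exhausted
    void* p = a->base + aligned;
    a->offset = aligned + size;
    return p;
}

// Dropping a generation frees everything at once; no per-object bookkeeping.
void arena_destroy(Arena* a) {
    free(a->base);
    a->base = NULL;
    a->offset = a->capacity = 0;
}
```

Because allocation only moves a pointer forward, freeing individual FlatBuffer objects is neither possible nor needed; the whole generation is released with one `arena_destroy`.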

2. Register Architecture: Typed Registers

Decision: Registers are statically typed; type checking occurs at module load time.

Rationale

  • Performance: No runtime type checking overhead in hot loop
  • Safety: Catch type errors before execution
  • C-like semantics: Matches procedural C language model
  • Optimization: Typed registers enable better code generation and JIT opportunities
  • Columnar alignment: Typed registers map naturally to typed columns in dataflow

Register File Design

  • Fixed-size register file: 256 registers per function (r0-r255)
  • Type tags: Each register has a static type (i32, i64, f32, f64, bool, ptr, table_ref, node_ref)
  • Type validation: Instruction decoder verifies operand types match instruction requirements
  • Special registers:
    • r0: Always contains integer zero (read-only)
    • r1: Function return value
    • r2-r15: Argument passing registers
    • r16-r255: General purpose

Type System

Scalar types: i8, i16, i32, i64, f32, f64, bool
Reference types: ptr (raw pointer), string (UTF-8)
Table types: table_ref (reference to FlatBuffer table)
Dataflow types: node_ref (reference to dataflow node)
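A register file along these lines could be laid out as a parallel array of static type tags and value slots. The sketch below is illustrative (struct and function names are assumptions); the key point is that type tags are fixed at load time, so validation like `check_binary_i32` runs once in the decoder rather than in the interpreter's hot loop:

```c
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

// Sketch of a typed register file (names illustrative, not the final schema).
typedef enum RegType {
    T_I8, T_I16, T_I32, T_I64, T_F32, T_F64, T_BOOL,
    T_PTR, T_STRING, T_TABLE_REF, T_NODE_REF
} RegType;

typedef union RegValue {
    int32_t  i32;
    int64_t  i64;
    float    f32;
    double   f64;
    bool     b;
    void*    ptr;
    uint32_t table_ref;
    uint32_t node_ref;
} RegValue;

typedef struct RegisterFile {
    RegType  types[256];   // Static: fixed when the module loads
    RegValue values[256];  // Dynamic: mutated during execution
} RegisterFile;

// Load-time check for an ADD_I32-style instruction: all operands must be i32.
bool check_binary_i32(const RegisterFile* rf, uint8_t dst, uint8_t a, uint8_t b) {
    return rf->types[dst] == T_I32 &&
           rf->types[a]   == T_I32 &&
           rf->types[b]   == T_I32;
}
```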

3. Procedural Language: Typed, C-like

Decision: Procedural layer uses a typed, C-like imperative language.

Rationale

  • Familiarity: C-like syntax reduces learning curve
  • Explicit control: Imperative style gives developers clear execution model
  • Type safety: Static typing catches errors early
  • Performance: Predictable code generation without dynamic dispatch
  • Interop: Natural mapping to C host integration

Language Characteristics

  • Static typing: All variables and function signatures are typed
  • Imperative control flow: if/else, while, for, break, continue, return
  • Functions: First-class functions with typed signatures
  • No implicit conversions: Explicit casts required (safety over convenience)
  • No garbage collection: memory is host-managed via arenas
  • No classes/OOP: Procedural only, structs for data aggregation

Syntax Example (Illustrative)

fn calculate_revenue(orders: table_ref) -> f64 {
    i64 count = 0;
    f64 total = 0.0;

    // Create dataflow pipeline
    node_ref filtered = filter(orders, "status == 'completed'");
    node_ref projected = project(filtered, ["amount"]);

    // Execute and materialize
    table_ref results = materialize(projected);

    // Procedural aggregation (could also use dataflow aggregate)
    for (i64 i = 0; i < table_row_count(results); i++) {
        f64 amount = table_get_f64(results, i, 0);
        total = total + amount;
        count = count + 1;
    }

    return total;
}

4. Table Storage: Columnar Layout

Decision: Tables use columnar storage with typed columns.

Rationale

  • Analytical performance: Columnar storage is optimal for OLAP-style queries
  • Compression: Column-wise compression is more effective (similar values)
  • Cache efficiency: Scanning a single column has better cache locality
  • SIMD-friendly: Contiguous typed arrays enable vectorization
  • Projection efficiency: Selecting subset of columns avoids reading unused data
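The cache and SIMD benefits come from the fact that an aggregate over one column touches only that column's contiguous buffer. A minimal sketch (function name is an assumption):

```c
#include <stddef.h>
#include <assert.h>

// Cache-friendly column scan: summing an f64 column reads one contiguous
// buffer sequentially, never touching other columns. The simple loop is
// also a shape auto-vectorizers handle well. (Illustrative sketch.)
double sum_f64_column(const double* data, size_t row_count) {
    double total = 0.0;
    for (size_t i = 0; i < row_count; i++) {
        total += data[i];
    }
    return total;
}
```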

Column Layout in FlatBuffers

Table {
    schema: Schema;           // Column names and types
    row_count: u64;
    columns: [Column];        // Array of column buffers
}

Column {
    name: string;
    type: ColumnType;         // i32, i64, f32, f64, bool, string
    nullable: bool;
    null_bitmap: [u8];        // Bit-packed null flags
    data: [u8];               // Typed data buffer
}

ColumnType enum {
    Int8, Int16, Int32, Int64,
    Float32, Float64,
    Boolean,
    String,                   // Variable-length, stored as offsets + data
}

String Column Representation

Strings require special handling in columnar format:

StringColumn {
    offsets: [u32];           // Length = row_count + 1
    data: [u8];               // Concatenated UTF-8 bytes
}
// String at row i: data[offsets[i]..offsets[i+1]]
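The access rule above translates directly into a small non-owning-view helper. The `StrView`/`string_at` names are illustrative, not part of the schema:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <assert.h>

// Non-owning view into a string column's concatenated data buffer.
typedef struct StrView {
    const uint8_t* bytes;
    uint32_t       len;
} StrView;

// String at row i occupies data[offsets[i]..offsets[i+1]]; with
// row_count + 1 offsets, this is valid for every row including the last.
StrView string_at(const uint32_t* offsets, const uint8_t* data, size_t row) {
    StrView v = { data + offsets[row], offsets[row + 1] - offsets[row] };
    return v;
}
```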

Null Handling

  • Null bitmap: 1 bit per row (0 = null, 1 = valid)
  • Packed into byte array: null_bitmap[(row / 8)] & (1 << (row % 8))
  • Non-nullable columns omit null_bitmap
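The bitmap test above, wrapped as a helper (name is illustrative):

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>
#include <assert.h>

// Bit-packed validity check matching the layout above: bit set = valid.
bool row_is_valid(const uint8_t* null_bitmap, size_t row) {
    return (null_bitmap[row / 8] & (uint8_t)(1u << (row % 8))) != 0;
}
```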

5. Dataflow Execution: Pull-Based Lazy Evaluation

Decision: Dataflow nodes execute lazily using pull-based evaluation on a DAG.

Rationale

  • Memory efficiency: Only materialize results when needed
  • Composition: Build complex pipelines without intermediate materialization
  • Optimization opportunities: Entire DAG visible before execution, enables fusion
  • Natural fit with DAG: Pull model traverses DAG from sinks to sources
  • Host control: Host decides when to pull results (explicit materialize())

Execution Model

Build Phase (Lazy)

// These operations only construct the DAG, no execution yet
node_ref scan = create_scan_node("sales_data");
node_ref filter = create_filter_node(scan, "revenue > 1000");
node_ref project = create_project_node(filter, ["customer", "revenue"]);
// DAG exists, but no data has been processed

Execution Phase (Pull)

// Host explicitly requests materialization
table_ref result = materialize(project);
// Now the executor walks the DAG backwards:
// 1. project pulls from filter
// 2. filter pulls from scan
// 3. scan reads source table
// 4. filter evaluates predicate on each row
// 5. project selects columns
// 6. Result materialized to new FlatBuffer

DAG Interpreter Design

Node Interface

Every dataflow node implements:

typedef struct DataflowNode {
    NodeType type;
    NodeID id;
    InputRefs inputs[];       // Parent nodes in DAG
    Schema output_schema;
    void* operator_state;     // Operator-specific params
} DataflowNode;

// Pull interface (iterator pattern)
typedef struct NodeIterator {
    DataflowNode* node;
    void* state;              // Iteration state
    bool (*next)(NodeIterator* self, Row* out);
    void (*reset)(NodeIterator* self);
} NodeIterator;

Pull Execution

// Pseudocode for pull-based execution
TableRef materialize(DataflowNode* sink) {
    NodeIterator* it = create_iterator(sink);
    FlatBufferBuilder builder = create_builder(arena);

    Row row;
    while (it->next(it, &row)) {
        append_row_to_builder(builder, row);
    }

    return finalize_builder(builder);
}


// Each operator implements next() by pulling from inputs
bool filter_next(FilterIterator* self, Row* out) {
    while (self->input.next(&self->input, out)) {
        if (evaluate_predicate(self->predicate, out)) {
            return true;  // Found matching row
        }
    }
    return false;  // No more rows
}

DAG Representation in FlatBuffers

DataflowGraph {
    nodes: [DataflowNode];
    topological_order: [NodeID];  // Pre-computed for validation
}

DataflowNode {
    id: NodeID;
    operator: Operator;
    inputs: [NodeID];             // References to parent nodes
    parameters: OperatorParams;   // Operator-specific config
}

Operator union {
    ScanOp,
    FilterOp,
    ProjectOp,
    MapOp,
    JoinOp,
    AggregateOp,
    SortOp,
}

FilterOp {
    predicate: Expression;        // Expression tree
}

ProjectOp {
    columns: [u32];               // Column indices to keep
}

Optimization Opportunities

Lazy evaluation enables optimizations before execution:

  • Predicate pushdown: Move filters closer to scans
  • Projection pushdown: Only read needed columns from source
  • Operator fusion: Combine filter→project into single pass
  • Common subexpression elimination: Reuse shared subtrees
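Predicate pushdown, for instance, is a purely structural rewrite on the DAG before execution. The toy sketch below moves a Filter that sits directly above a Project beneath it; column-index remapping inside the predicate is elided for brevity, and the node shape and names are assumptions rather than the real schema:

```c
#include <stddef.h>
#include <assert.h>

// Toy predicate-pushdown rewrite on a single-input operator chain.
typedef enum OpKind { OP_SCAN, OP_FILTER, OP_PROJECT } OpKind;

typedef struct Node {
    OpKind kind;
    struct Node* input;  // Single parent; real nodes may have several
} Node;

// If the root is a Filter reading from a Project, swap them so the
// predicate runs before projection. Returns the (possibly new) root.
Node* push_filter_down(Node* root) {
    if (root && root->kind == OP_FILTER &&
        root->input && root->input->kind == OP_PROJECT) {
        Node* project = root->input;
        root->input = project->input;  // Filter now reads Project's source
        project->input = root;         // Project now reads from Filter
        return project;                // Project becomes the new root
    }
    return root;
}
```

Because the whole DAG exists before `materialize()` is called, rewrites like this can run as a cheap pass over the graph with no data in flight.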

Design Synergies

These five decisions reinforce each other:

Arena + FlatBuffers

  • Arenas hold immutable FlatBuffer regions
  • Building a FlatBuffer = appending to arena
  • Completed computation = drop arena

Typed Registers + Columnar Storage

  • Register type table_ref holds reference to columnar table
  • Extracting column = typed array access (no type checking)
  • Pushing value to column = typed append

C-like Language + Static Types

  • Function signatures map to register types
  • Type checker validates before execution
  • No runtime surprises

Lazy Evaluation + DAG

  • DAG structure enables analysis before execution
  • Pull model = explicit materialization points
  • Host controls when computation happens

Columnar + Pull Execution

  • Pull one row at a time across columns
  • SIMD processes column chunks when beneficial
  • Cache-friendly sequential access patterns

Implementation Priorities

Given these decisions, the implementation should proceed as follows:

  1. Phase 1A: Type system and arena allocator

    • Define type enum and type descriptors
    • Implement arena allocator (bump pointer)
    • Create FlatBuffer schemas for types
  2. Phase 1B: Columnar table schema

    • Define Column and Table FlatBuffer schemas
    • Implement column builders (for each type)
    • Add null bitmap handling
  3. Phase 2A: Typed register file

    • Define RegisterFile FlatBuffer schema
    • Implement typed register access functions
    • Create register type validator
  4. Phase 2B: Basic instruction set

    • Define typed instructions (ADD_I32, LOAD_F64, etc.)
    • Implement instruction decoder
    • Create simple interpreter loop
  5. Phase 3A: Dataflow DAG representation

    • Define DataflowNode schema
    • Create DAG builder API
    • Implement topological sort validation
  6. Phase 3B: Pull-based iterator interface

    • Define iterator protocol
    • Implement basic operators (Scan, Filter, Project)
    • Create materialization function
  7. Phase 4: Language frontend (optional, can use direct bytecode generation initially)

    • Design C-like syntax
    • Implement parser
    • Create type checker
    • Generate typed bytecode

Open Questions (Lower Priority)

With core decisions made, these can be deferred:

  • SIMD strategy: Which intrinsics? Portable fallback?
  • String interning: Should duplicate strings be deduplicated?
  • Operator fusion: Compile-time or runtime?
  • Expression JIT: Should hot expressions be compiled?
  • Parallelization: Can operators run on multiple threads?

These can be addressed during implementation or optimization phases.
