Theta-lang: Core Design Decisions
Design Choices
These fundamental design decisions have been confirmed and should guide all implementation work.
1. Memory Model: Arena Allocation
Decision: Use arena-based memory allocation for FlatBuffer regions.
Rationale
- Perfect fit with FlatBuffers: Append-only FlatBuffer building aligns naturally with arena allocation
- Predictable lifecycle: Entire arena can be freed at once when a computation completes
- Zero fragmentation: Sequential allocation eliminates memory fragmentation
- Fast allocation: Bump-pointer allocation is O(1)
- Immutability enforcement: Once built, FlatBuffer regions are immutable - arenas reinforce this
Implementation Implications
- VM maintains multiple arenas for different buffer generations
- Each major computation phase (loading module, executing function, materializing dataflow result) gets its own arena
- Host can retain references to old arenas even after VM creates new ones
- Cleanup is simple: drop entire arena when no references remain
- No need for garbage collection or reference counting within an arena
Memory Layout Strategy
Arena 1: Program module (immutable, lives for VM lifetime)
Arena 2: Execution state generation N (replaced on each instruction batch)
Arena 3: Dataflow intermediate results (replaced on materialization)
Arena 4: Final results (retained by host, freed when host releases)
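The bump-pointer scheme described above can be sketched in a few lines of C. This is an illustrative sketch, not a confirmed API; the names `Arena`, `arena_alloc`, and `arena_destroy` are invented here:

```c
#include <stdlib.h>
#include <stdint.h>
#include <stddef.h>

// Hypothetical bump-pointer arena: one malloc'd block, O(1) allocation,
// freed all at once when the computation that owns it completes.
typedef struct Arena {
    uint8_t *base;
    size_t   used;
    size_t   capacity;
} Arena;

static Arena arena_create(size_t capacity) {
    Arena a = { malloc(capacity), 0, capacity };
    return a;
}

// Bump the pointer, aligning each allocation to 8 bytes.
static void *arena_alloc(Arena *a, size_t size) {
    size_t aligned = (a->used + 7) & ~(size_t)7;
    if (aligned + size > a->capacity) return NULL;  // arena exhausted
    void *p = a->base + aligned;
    a->used = aligned + size;
    return p;
}

// Dropping the arena releases every allocation at once:
// no per-object free, no reference counting, no GC.
static void arena_destroy(Arena *a) {
    free(a->base);
    a->base = NULL;
    a->used = a->capacity = 0;
}
```

Because allocation never reuses freed space, there is no fragmentation, and `arena_destroy` is the single cleanup point for an entire buffer generation.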
2. Register Architecture: Typed Registers
Decision: Registers are statically typed; type checking occurs at module load time.
Rationale
- Performance: No runtime type checking overhead in hot loop
- Safety: Catch type errors before execution
- C-like semantics: Matches procedural C language model
- Optimization: Typed registers enable better code generation and JIT opportunities
- Columnar alignment: Typed registers map naturally to typed columns in dataflow
Register File Design
- Fixed-size register file: 256 registers per function (r0-r255)
- Type tags: Each register has a static type (i32, i64, f32, f64, bool, ptr, table_ref, node_ref)
- Type validation: Instruction decoder verifies operand types match instruction requirements
- Special registers:
  - r0: Always contains integer zero (read-only)
  - r1: Function return value
  - r2-r15: Argument passing registers
  - r16-r255: General purpose
Type System
Scalar types: i8, i16, i32, i64, f32, f64, bool
Reference types: ptr (raw pointer), string (UTF-8)
Table types: table_ref (reference to FlatBuffer table)
Dataflow types: node_ref (reference to dataflow node)
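The load-time type validation described above can be sketched as follows. The `RegType` enum and `check_operands` helper are hypothetical names for illustration, not a confirmed API:

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical register type tags mirroring the type system above.
typedef enum RegType {
    T_I32, T_I64, T_F32, T_F64, T_BOOL, T_PTR, T_TABLE_REF, T_NODE_REF
} RegType;

// Each of the 256 registers carries a static type assigned at load time.
typedef struct RegisterFile {
    RegType types[256];
} RegisterFile;

// Load-time check: an instruction's operand registers must match the
// types the instruction requires. Mismatches are caught here, before
// execution, so the interpreter hot loop never checks types.
static bool check_operands(const RegisterFile *rf,
                           const uint8_t *operands,
                           const RegType *required,
                           int n) {
    for (int i = 0; i < n; i++) {
        if (rf->types[operands[i]] != required[i]) return false;
    }
    return true;
}
```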
3. Procedural Language: Typed, C-like
Decision: Procedural layer uses a typed, C-like imperative language.
Rationale
- Familiarity: C-like syntax reduces learning curve
- Explicit control: Imperative style gives developers clear execution model
- Type safety: Static typing catches errors early
- Performance: Predictable code generation without dynamic dispatch
- Interop: Natural mapping to C host integration
Language Characteristics
- Static typing: All variables and function signatures are typed
- Imperative control flow: if/else, while, for, break, continue, return
- Functions: First-class functions with typed signatures
- No implicit conversions: Explicit casts required (safety over convenience)
- No garbage collection: Memory is managed manually by the host via arenas
- No classes/OOP: Procedural only, structs for data aggregation
Syntax Example (Illustrative)
fn calculate_revenue(orders: table_ref) -> f64 {
    i64 count = 0;
    f64 total = 0.0;

    // Create dataflow pipeline
    node_ref filtered = filter(orders, "status == 'completed'");
    node_ref projected = project(filtered, ["amount"]);

    // Execute and materialize
    table_ref results = materialize(projected);

    // Procedural aggregation (could also use dataflow aggregate)
    for (i64 i = 0; i < table_row_count(results); i++) {
        f64 amount = table_get_f64(results, i, 0);
        total = total + amount;
        count = count + 1;
    }
    return total;
}
4. Table Storage: Columnar Layout
Decision: Tables use columnar storage with typed columns.
Rationale
- Analytical performance: Columnar storage is optimal for OLAP-style queries
- Compression: Column-wise compression is more effective (similar values)
- Cache efficiency: Scanning a single column has better cache locality
- SIMD-friendly: Contiguous typed arrays enable vectorization
- Projection efficiency: Selecting subset of columns avoids reading unused data
Column Layout in FlatBuffers
Table {
    schema: Schema;        // Column names and types
    row_count: u64;
    columns: [Column];     // Array of column buffers
}

Column {
    name: string;
    type: ColumnType;      // i32, i64, f32, f64, bool, string
    nullable: bool;
    null_bitmap: [u8];     // Bit-packed null flags
    data: [u8];            // Typed data buffer
}

ColumnType enum {
    Int8, Int16, Int32, Int64,
    Float32, Float64,
    Boolean,
    String,                // Variable-length, stored as offsets + data
}
String Column Representation
Strings require special handling in columnar format:
StringColumn {
    offsets: [u32];        // Length = row_count + 1
    data: [u8];            // Concatenated UTF-8 bytes
}
// String at row i: data[offsets[i]..offsets[i+1]]
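The offsets-plus-data access rule can be sketched as a zero-copy C helper; `StrView` and `string_column_get` are illustrative names, not a confirmed API:

```c
#include <stdint.h>
#include <stddef.h>

// String at row i as a (pointer, length) pair; the bytes are a view
// into the concatenated data buffer, not a copy.
typedef struct StrView {
    const uint8_t *ptr;
    size_t len;
} StrView;

static StrView string_column_get(const uint32_t *offsets,
                                 const uint8_t *data, size_t row) {
    // offsets has row_count + 1 entries, so offsets[row + 1] is
    // always valid for any in-range row.
    StrView v = { data + offsets[row], offsets[row + 1] - offsets[row] };
    return v;
}
```

An empty string is represented by two equal consecutive offsets, so no sentinel value is needed.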
Null Handling
- Null bitmap: 1 bit per row (0 = null, 1 = valid)
- Packed into byte array: validity of a row is tested as null_bitmap[row / 8] & (1 << (row % 8))
- Non-nullable columns omit null_bitmap
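The bit test above can be wrapped in a small C helper; `column_is_valid` is an illustrative name, not a confirmed API:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

// Bit-packed validity check: a set bit means the row holds a valid
// value, a clear bit means null. Passing NULL for the bitmap models a
// non-nullable column, where every row is valid.
static bool column_is_valid(const uint8_t *null_bitmap, size_t row) {
    if (null_bitmap == NULL) return true;  // non-nullable column
    return (null_bitmap[row / 8] & (1u << (row % 8))) != 0;
}
```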
5. Dataflow Execution: Pull-Based Lazy Evaluation
Decision: Dataflow nodes execute lazily using pull-based evaluation on a DAG.
Rationale
- Memory efficiency: Only materialize results when needed
- Composition: Build complex pipelines without intermediate materialization
- Optimization opportunities: Entire DAG visible before execution, enables fusion
- Natural fit with DAG: Pull model traverses DAG from sinks to sources
- Host control: Host decides when to pull results (explicit materialize())
Execution Model
Build Phase (Lazy)
// These operations only construct the DAG, no execution yet
node_ref scan = create_scan_node("sales_data");
node_ref filter = create_filter_node(scan, "revenue > 1000");
node_ref project = create_project_node(filter, ["customer", "revenue"]);
// DAG exists, but no data has been processed
Execution Phase (Pull)
// Host explicitly requests materialization
table_ref result = materialize(project);
// Now the executor walks the DAG backwards:
// 1. project pulls from filter
// 2. filter pulls from scan
// 3. scan reads source table
// 4. filter evaluates predicate on each row
// 5. project selects columns
// 6. Result materialized to new FlatBuffer
DAG Interpreter Design
Node Interface
Every dataflow node implements:
typedef struct DataflowNode {
    NodeType type;
    NodeID id;
    InputRefs* inputs;     // Parent nodes in DAG
    Schema output_schema;
    void* operator_state;  // Operator-specific params
} DataflowNode;

// Pull interface (iterator pattern)
typedef struct NodeIterator {
    DataflowNode* node;
    void* state;           // Iteration state
    bool (*next)(struct NodeIterator* self, Row* out);
    void (*reset)(struct NodeIterator** self);
} NodeIterator;
Pull Execution
// Pseudocode for pull-based execution
table_ref materialize(DataflowNode* sink) {
    NodeIterator* it = create_iterator(sink);
    FlatBufferBuilder builder = create_builder(arena);
    Row row;
    while (it->next(it, &row)) {
        append_row_to_builder(builder, row);
    }
    return finalize_builder(builder);
}
// Each operator implements next() by pulling from inputs
bool filter_next(FilterIterator* self, Row* out) {
    while (self->input->next(self->input, out)) {
        if (evaluate_predicate(self->predicate, out)) {
            return true;   // Found matching row
        }
    }
    return false;          // No more rows
}
DAG Representation in FlatBuffers
DataflowGraph {
    nodes: [DataflowNode];
    topological_order: [NodeID];  // Pre-computed for validation
}

DataflowNode {
    id: NodeID;
    operator: Operator;
    inputs: [NodeID];             // References to parent nodes
    parameters: OperatorParams;   // Operator-specific config
}

Operator union {
    ScanOp,
    FilterOp,
    ProjectOp,
    MapOp,
    JoinOp,
    AggregateOp,
    SortOp,
}

FilterOp {
    predicate: Expression;        // Expression tree
}

ProjectOp {
    columns: [u32];               // Column indices to keep
}
Optimization Opportunities
Lazy evaluation enables optimizations before execution:
- Predicate pushdown: Move filters closer to scans
- Projection pushdown: Only read needed columns from source
- Operator fusion: Combine filter→project into single pass
- Common subexpression elimination: Reuse shared subtrees
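Operator fusion, for instance, collapses filter→project into one loop over the input with no intermediate buffer. A minimal C sketch of the fused form; the two-column `Row2` layout and `filter_project_fused` are invented here for illustration:

```c
#include <stdint.h>
#include <stddef.h>

// Two-column row: "project" below keeps only the amount column.
typedef struct Row2 {
    int64_t amount;
    int64_t status;
} Row2;

// Fused filter + project: one pass, no materialized intermediate.
// An unfused plan would first materialize all matching rows, then
// copy the amount column in a second pass.
static size_t filter_project_fused(const Row2 *rows, size_t n,
                                   int64_t wanted_status,
                                   int64_t *amounts_out) {
    size_t m = 0;
    for (size_t i = 0; i < n; i++) {
        if (rows[i].status == wanted_status) {  // filter
            amounts_out[m++] = rows[i].amount;  // project
        }
    }
    return m;
}
```

Because the whole DAG is visible before materialization, the executor can decide to emit this fused loop instead of chaining two iterators.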
Design Synergies
These five decisions reinforce each other:
Arena + FlatBuffers
- Arenas hold immutable FlatBuffer regions
- Building a FlatBuffer = appending to arena
- Completed computation = drop arena
Typed Registers + Columnar Storage
- Register type table_ref holds reference to columnar table
- Extracting column = typed array access (no type checking)
- Pushing value to column = typed append
C-like Language + Static Types
- Function signatures map to register types
- Type checker validates before execution
- No runtime surprises
Lazy Evaluation + DAG
- DAG structure enables analysis before execution
- Pull model = explicit materialization points
- Host controls when computation happens
Columnar + Pull Execution
- Pull one row at a time across columns
- SIMD processes column chunks when beneficial
- Cache-friendly sequential access patterns
Implementation Priorities
Given these decisions, the implementation should proceed as follows:
Phase 1A: Type system and arena allocator
- Define type enum and type descriptors
- Implement arena allocator (bump pointer)
- Create FlatBuffer schemas for types
Phase 1B: Columnar table schema
- Define Column and Table FlatBuffer schemas
- Implement column builders (for each type)
- Add null bitmap handling
Phase 2A: Typed register file
- Define RegisterFile FlatBuffer schema
- Implement typed register access functions
- Create register type validator
Phase 2B: Basic instruction set
- Define typed instructions (ADD_I32, LOAD_F64, etc.)
- Implement instruction decoder
- Create simple interpreter loop
Phase 3A: Dataflow DAG representation
- Define DataflowNode schema
- Create DAG builder API
- Implement topological sort validation
Phase 3B: Pull-based iterator interface
- Define iterator protocol
- Implement basic operators (Scan, Filter, Project)
- Create materialization function
Phase 4: Language frontend (optional, can use direct bytecode generation initially)
- Design C-like syntax
- Implement parser
- Create type checker
- Generate typed bytecode
Open Questions (Lower Priority)
With core decisions made, these can be deferred:
- SIMD strategy: Which intrinsics? Portable fallback?
- String interning: Should duplicate strings be deduplicated?
- Operator fusion: Compile-time or runtime?
- Expression JIT: Should hot expressions be compiled?
- Parallelization: Can operators run on multiple threads?
These can be addressed during implementation or optimization phases.