Theta-lang: VM Design and Implementation


This page consolidates all technical documentation, design decisions, implementation plans, developer feedback, and recommendations for the theta-lang virtual machine project. The content below is organized into major sections for clarity. Subpages may be created for deeper dives if needed.

1. Overview

Theta-lang is a new virtual machine (VM) for high-performance procedural and dataflow computation, designed to be embedded in host applications. All runtime state and data are represented as FlatBuffers. The VM is intended to be host-friendly, introspectable, and suitable for analytics, scripting, and integration with C, Rust, Python, and Go.

2. Core Design Decisions

Memory Model: Arena Allocation

  • Arena-based allocation for FlatBuffer regions
  • Predictable lifecycle, O(1) bump-pointer allocation, and no fragmentation from freeing individual objects
  • Immutability: arenas are dropped as a unit, so neither garbage collection nor reference counting is needed
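A minimal sketch of the arena model follows. It assumes a single contiguous buffer with a bump pointer. Allocation rounds the pointer up to the requested alignment and advances it. The only way to release memory is to drop everything at once. `Arena`, `alloc`, `view`, and `reset` are hypothetical names, not an existing Theta-lang API. Offsets are returned instead of pointers because FlatBuffers addresses data by offset.

```python
class Arena:
    """Bump-pointer arena over one contiguous buffer (illustrative sketch only)."""

    def __init__(self, capacity: int) -> None:
        self._buf = bytearray(capacity)
        self._top = 0  # offset of the first free byte

    @property
    def used(self) -> int:
        return self._top

    def alloc(self, size: int, align: int = 8) -> int:
        """Reserve `size` bytes and return their offset; O(1), no per-object header."""
        if size < 0:
            raise ValueError("size must be non-negative")
        if align <= 0 or align & (align - 1):
            raise ValueError("align must be a power of two")
        start = (self._top + align - 1) & ~(align - 1)  # round up to the alignment
        end = start + size
        if end > len(self._buf):
            raise MemoryError("arena exhausted")
        self._top = end
        return start

    def view(self, offset: int, size: int) -> memoryview:
        """Zero-copy window onto previously allocated bytes."""
        return memoryview(self._buf)[offset:offset + size]

    def reset(self) -> None:
        """Drop every allocation at once; views taken earlier become stale."""
        self._top = 0
```

Take a fresh `Arena(64)` and allocate 3 bytes, then 8 bytes. The calls return offsets 0 and 8. `used` then reports 16, because the second allocation is padded to an 8-byte boundary.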

Register Architecture: Typed Registers

  • 256 statically typed registers per function (r0–r255)
  • Type tags: i8, i16, i32, i64, f32, f64, bool, ptr, string, table_ref, node_ref
  • r0: constant zero, r1: return value, r2–r15: arguments, r16–r255: general purpose
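The register layout can be modeled together with the recommended calling convention (arguments in r2–r15 with a stack fallback). This is a Python sketch under stated assumptions:

  • The names `Ty`, `place_args`, and `RegisterFile` are hypothetical.
  • Arguments beyond the fourteenth spill to a stack in call order.
  • Writes to r0 are discarded.
  • Type checks run at write time purely for illustration. A statically typed VM would more likely verify them once, at load time.

```python
from enum import Enum


class Ty(Enum):
    """Register type tags listed in the design."""
    I8 = "i8"
    I16 = "i16"
    I32 = "i32"
    I64 = "i64"
    F32 = "f32"
    F64 = "f64"
    BOOL = "bool"
    PTR = "ptr"
    STRING = "string"
    TABLE_REF = "table_ref"
    NODE_REF = "node_ref"


NUM_REGS = 256
ZERO_REG, RET_REG = 0, 1
ARG_REGS = range(2, 16)   # r2-r15: 14 argument registers
GP_REGS = range(16, 256)  # r16-r255: general purpose


def place_args(args: list) -> tuple[dict, list]:
    """First 14 arguments go to r2-r15; the rest spill to a stack (assumed order)."""
    in_regs = dict(zip(ARG_REGS, args))
    on_stack = list(args[len(ARG_REGS):])
    return in_regs, on_stack


class RegisterFile:
    """One function's frame: each register has a single declared type."""

    def __init__(self, decl: dict[int, Ty]) -> None:
        for r in decl:
            if not 0 < r < NUM_REGS:
                raise ValueError(f"cannot declare r{r}")
        self._decl = dict(decl)
        self._vals: list = [None] * NUM_REGS

    def write(self, r: int, ty: Ty, value) -> None:
        if r == ZERO_REG:
            return  # r0 is hardwired to zero; writes are discarded (assumption)
        if self._decl.get(r) is not ty:
            raise TypeError(f"r{r} is declared {self._decl.get(r)}, got {ty}")
        self._vals[r] = value

    def read(self, r: int):
        return 0 if r == ZERO_REG else self._vals[r]
```

Keeping 14 arguments in registers means most calls never touch the stack. The spill path only matters for unusually wide signatures.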

Procedural Language: Typed, C-like

  • Statically typed, imperative, C-like syntax
  • No implicit conversions, no GC, no OOP
  • Control flow: if/else, while, for, break, continue, return
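A tiny type checker can illustrate the "no implicit conversions" rule. Theta-lang's surface syntax is not specified on this page, so the sketch works on a hypothetical AST (`Lit`, `Cast`, `Add`). It assumes casts are allowed only between numeric types. It shows that adding an i32 to an i64 is rejected until the program writes an explicit cast.

```python
from dataclasses import dataclass
from typing import Union

NUMERIC = {"i8", "i16", "i32", "i64", "f32", "f64"}


@dataclass(frozen=True)
class Lit:
    ty: str
    value: object


@dataclass(frozen=True)
class Cast:
    ty: str
    expr: "Expr"


@dataclass(frozen=True)
class Add:
    lhs: "Expr"
    rhs: "Expr"


Expr = Union[Lit, Cast, Add]


def type_of(e: Expr) -> str:
    """Return the static type of `e`, or raise TypeError."""
    if isinstance(e, Lit):
        return e.ty
    if isinstance(e, Cast):
        src = type_of(e.expr)
        if src not in NUMERIC or e.ty not in NUMERIC:
            raise TypeError(f"cannot cast {src} to {e.ty}")
        return e.ty  # conversion happens only where the program writes it
    if isinstance(e, Add):
        lt, rt = type_of(e.lhs), type_of(e.rhs)
        if lt != rt:
            raise TypeError(f"no implicit conversion between {lt} and {rt}")
        if lt not in NUMERIC:
            raise TypeError(f"'+' is not defined for {lt}")
        return lt
    raise TypeError(f"unknown expression {e!r}")
```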

Table Storage: Columnar Layout

  • Tables use columnar storage with typed columns
  • Null bitmap for nullable columns, special handling for strings
  • Optimized for OLAP-style scans: contiguous typed columns are cache-friendly and amenable to SIMD
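One way to realize the columnar layout is sketched below under assumptions. The page does not say how nulls or strings are encoded, so this follows an Apache Arrow-style convention:

  • Nullable columns use a validity bitmap with one bit per row (set means non-null) next to a fixed-width value array.
  • String columns are stored as one UTF-8 byte buffer plus n+1 offsets.

The class names are hypothetical.

```python
from array import array


class NullableI64Column:
    """Fixed-width i64 values plus a validity bitmap (bit set = non-null, LSB first)."""

    def __init__(self, values: list) -> None:
        self.length = len(values)
        self.validity = bytearray((self.length + 7) // 8)  # one bit per row
        self.data = array("q", [0] * self.length)           # null slots hold 0
        for i, v in enumerate(values):
            if v is not None:
                self.validity[i >> 3] |= 1 << (i & 7)
                self.data[i] = v

    def is_valid(self, i: int) -> bool:
        return bool((self.validity[i >> 3] >> (i & 7)) & 1)

    def get(self, i: int):
        return self.data[i] if self.is_valid(i) else None

    def total(self) -> int:
        """Null-aware aggregate: skip rows whose validity bit is clear."""
        return sum(self.data[i] for i in range(self.length) if self.is_valid(i))


class StringColumn:
    """UTF-8 bytes for all rows in one buffer; row i spans offsets[i]:offsets[i + 1]."""

    def __init__(self, values: list) -> None:
        self.offsets = array("q", [0])
        buf = bytearray()
        for s in values:
            buf += s.encode("utf-8")
            self.offsets.append(len(buf))
        self.data = bytes(buf)

    def get(self, i: int) -> str:
        return self.data[self.offsets[i]:self.offsets[i + 1]].decode("utf-8")
```

Each column is a few flat buffers rather than per-row objects. That layout is what makes SIMD-style scans and FlatBuffer storage straightforward.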

Dataflow Execution: Pull-Based Lazy Evaluation

  • Dataflow nodes form a DAG, executed lazily via pull-based iterators
  • Nothing runs until the host triggers materialization, so the engine can optimize the whole DAG first and avoid buffering intermediate results
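Pull-based lazy evaluation maps naturally onto Python generators, which do no work until a consumer asks for the next item. This sketch makes two simplifications: the node names are hypothetical, and the graph is a simple chain rather than a general DAG. The host's `materialize` call is what starts data flowing. The `limit` node stops pulling once it has enough rows, so the source reads only the rows it must.

```python
from typing import Callable, Iterable, Iterator


class CountingSource:
    """Scan node over in-memory rows; counts how many rows were actually pulled."""

    def __init__(self, rows: Iterable[dict]) -> None:
        self.rows = rows
        self.pulled = 0

    def __iter__(self) -> Iterator[dict]:
        for row in self.rows:
            self.pulled += 1
            yield row


def filter_node(pred: Callable[[dict], bool], upstream: Iterable[dict]) -> Iterator[dict]:
    for row in upstream:
        if pred(row):
            yield row


def project(cols: list, upstream: Iterable[dict]) -> Iterator[dict]:
    for row in upstream:
        yield {c: row[c] for c in cols}


def limit(n: int, upstream: Iterable[dict]) -> Iterator[dict]:
    if n <= 0:
        return
    for i, row in enumerate(upstream, start=1):
        yield row
        if i >= n:
            return  # stop pulling: rows past the limit are never read


def materialize(node: Iterable[dict]) -> list:
    """The host-side trigger: only now do rows flow through the graph."""
    return list(node)


if __name__ == "__main__":
    src = CountingSource([{"id": i, "v": i * 10} for i in range(1000)])
    plan = limit(2, project(["id"], filter_node(lambda r: r["v"] > 50, src)))
    print(src.pulled)         # 0: building the plan executes nothing
    print(materialize(plan))  # [{'id': 6}, {'id': 7}]
    print(src.pulled)         # 8: rows 0..7 were pulled, the other 992 never were
```

In a real DAG, several consumers may share one node's output. That would need a fan-out buffer (for example `itertools.tee`), which is one point where streaming and batching trade off.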

3. Implementation Plan

Phased approach:

  • Phase 1: FlatBuffer schema design (types, memory, values, instructions, program, execution, operators, dataflow, tables, integration, stdlib)
  • Phase 2: Procedural interpreter core (typed instructions, register file, call stack)
  • Phase 3: Dataflow engine (operators, DAG, iterators, materialization)
  • Phase 4: Integration layer (procedural-dataflow bridge, buffer management)
  • Phase 5: Host API and bindings (C, Python, Rust, Go)
  • Phase 6+: Advanced operators, safety, tooling, standard library, optimization

4. Developer Feedback & Open Questions

A comprehensive set of clarifying questions and feedback from developers is maintained to guide implementation and avoid costly refactoring. Topics include:

  • FlatBuffer schema evolution and organization
  • Buffer ownership and lifecycle
  • Calling conventions and control flow encoding
  • Instruction encoding density
  • Streaming vs. batch processing
  • Thread safety and host API abstraction
  • Error handling, debugging, and profiling
  • Security, testing, and deployment

5. Recommendations

  • Use multiple .fbs files for schema organization
  • Store strings as UTF-8, consider interning for repeated values
  • Batched updates for ExecutionState, not per instruction
  • Argument registers (r2–r15) with stack fallback
  • Absolute offsets for branch targets
  • Fixed-size 32-bit instructions for the core ISA
  • Support both streaming and batching in dataflow
  • Start with single-threaded VM core, allow concurrent host access to FlatBuffers
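The encoding recommendations (fixed 32-bit instructions, absolute branch targets) can be sketched as a word layout plus a small dispatch loop. The layout here is an assumption, not a decided format:

  • An 8-bit opcode occupies the top byte.
  • The rest holds either three 8-bit register fields, which address all 256 registers, or one register and a 16-bit immediate.
  • With 16-bit absolute targets, a function would be limited to 65,536 instructions.

The opcodes are hypothetical. Registers are plain Python integers, without the VM's typing or i64 wraparound.

```python
from enum import IntEnum


class Op(IntEnum):
    HALT = 0   # stop; result is in r1
    LOADI = 1  # ra <- sign-extended imm16
    ADD = 2    # ra <- rb + rc
    SUB = 3    # ra <- rb - rc
    JNZ = 4    # if ra != 0: pc <- imm16 (absolute instruction index)
    JMP = 5    # pc <- imm16 (absolute instruction index)


def enc_abc(op: Op, a: int, b: int, c: int) -> int:
    """Pack an opcode and three 8-bit register numbers into one 32-bit word."""
    for reg in (a, b, c):
        if not 0 <= reg <= 0xFF:
            raise ValueError(f"register out of range: {reg}")
    return (op << 24) | (a << 16) | (b << 8) | c


def enc_ai(op: Op, a: int, imm: int) -> int:
    """Pack an opcode, one register and a 16-bit immediate or branch target."""
    if not 0 <= a <= 0xFF or not -0x8000 <= imm <= 0xFFFF:
        raise ValueError("operand out of range")
    return (op << 24) | (a << 16) | (imm & 0xFFFF)


def run(code: list[int], max_steps: int = 100_000) -> int:
    """Decode and execute fixed-width words; returns r1 on HALT."""
    regs = [0] * 256

    def write(r: int, value: int) -> None:
        if r != 0:  # r0 stays the constant zero; writes are discarded
            regs[r] = value

    pc = 0
    for _ in range(max_steps):
        word = code[pc]
        op = Op(word >> 24)
        a, b, c = (word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF
        imm = word & 0xFFFF
        pc += 1
        if op == Op.HALT:
            return regs[1]
        if op == Op.LOADI:
            write(a, imm - 0x10000 if imm & 0x8000 else imm)
        elif op == Op.ADD:
            write(a, regs[b] + regs[c])
        elif op == Op.SUB:
            write(a, regs[b] - regs[c])
        elif op == Op.JNZ:
            if regs[a] != 0:
                pc = imm
        elif op == Op.JMP:
            pc = imm
    raise RuntimeError("step limit exceeded")


# Sums 5 + 4 + 3 + 2 + 1 into r1; the JNZ target 3 is an absolute index.
SUM_TO_5 = [
    enc_ai(Op.LOADI, 16, 5),      # 0: r16 <- 5 (counter)
    enc_ai(Op.LOADI, 17, 1),      # 1: r17 <- 1
    enc_ai(Op.LOADI, 1, 0),       # 2: r1 <- 0 (accumulator and return value)
    enc_abc(Op.ADD, 1, 1, 16),    # 3: r1 <- r1 + r16
    enc_abc(Op.SUB, 16, 16, 17),  # 4: r16 <- r16 - 1
    enc_ai(Op.JNZ, 16, 3),        # 5: if r16 != 0 goto 3
    enc_abc(Op.HALT, 0, 0, 0),    # 6
]

if __name__ == "__main__":
    print(run(SUM_TO_5))  # 15
```

Absolute targets keep the decoder trivial (`pc = imm`), at the cost of re-patching targets whenever code is moved or instructions are inserted. Relative offsets make the opposite trade.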

6. Detailed Design Documents

6.1 Design Decisions

See Design Decisions for rationale and technical implications.

6.2 Implementation Plan

See Implementation Plan for phased deliverables and timeline.

6.3 Developer Questions

See Developer Feedback & Open Questions for all clarifications and priorities.

6.4 Recommendations

See Recommendations for architectural choices and next steps.

7. Next Steps

  • Review and validate plan with stakeholders
  • Begin schema design and initial interpreter prototype
  • Iterate based on feedback and implementation learnings

This page is a living document. For deeper technical details, see the subpages or contact the development team.
