Theta-lang: Implementation Plan: FlatBuffer-Backed Procedural Dataflow VM
Project: Theta Virtual Machine
Target Language: C11
Architecture: Three-layer VM with procedural execution, dataflow engine, and FlatBuffer-backed shared memory
Core Design Decisions:
- Memory Model: Arena allocation with bump-pointer allocators
- Register Architecture: 256 typed registers (r0-r255)
- Language: Statically typed, C-like procedural language
- Storage: Columnar table layout with typed columns
- Execution: Pull-based lazy evaluation with DAG interpreter
Phase 1: Foundation - FlatBuffers Schema Definition
1.1 Core Type System Schema
- Define primitive type enumeration (i8, i16, i32, i64, f32, f64, bool, string)
- Define reference types (ptr, table_ref, node_ref)
- Create type descriptor tables with nullability, stride, and alignment metadata
- Design schema versioning strategy
1.2 Memory Layout Schema (Arena-Based)
- Define Arena table (base pointer, current offset, capacity, generation ID)
- Define ArenaPool table for managing multiple arena generations
- Create buffer metadata (arena reference, offset, size)
- Design arena lifecycle tracking (creation, active, retired, freed)
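The arena lifecycle above can be sketched in C11 as follows. This is a minimal illustration, not the project's final API; the struct layout, names, and alignment strategy are assumptions.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bump-pointer arena matching the Arena table fields:
   base pointer, current offset, capacity, generation ID. */
typedef struct {
    uint8_t *base;       /* start of the backing buffer            */
    size_t   offset;     /* current bump pointer                   */
    size_t   capacity;   /* total bytes available                  */
    uint32_t generation; /* incremented on each retire/reset cycle */
} Arena;

static int arena_init(Arena *a, size_t capacity) {
    a->base = malloc(capacity);
    if (!a->base) return -1;
    a->offset = 0;
    a->capacity = capacity;
    a->generation = 0;
    return 0;
}

/* Align the bump pointer, then hand out a slice; NULL on exhaustion. */
static void *arena_alloc(Arena *a, size_t size, size_t align) {
    size_t aligned = (a->offset + align - 1) & ~(align - 1);
    if (aligned + size > a->capacity) return NULL;
    void *p = a->base + aligned;
    a->offset = aligned + size;
    return p;
}

/* Retiring an arena just rewinds the pointer and bumps the generation,
   invalidating all outstanding references at once. */
static void arena_reset(Arena *a) {
    a->offset = 0;
    a->generation++;
}
```

The generation counter is what makes stale references detectable: any cached (arena, generation, offset) triple can be validated before use.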
1.3 Value Representation Schema
- Define Value union type (scalar, table reference, dataflow node reference)
- Create ScalarValue table (type tag, raw bytes)
- Design TableValue table (schema reference, row count, column references)
- Define DataflowNodeReference (node ID, output port)
1.4 Schema Evolution Strategy
- Implement schema version numbering
- Design backward compatibility layer for schema changes
- Create validation utilities for schema conformance
Deliverables:
- schemas/types.fbs - Type system definitions
- schemas/memory.fbs - Memory management structures
- schemas/values.fbs - Runtime value representations
- schemas/versioning.fbs - Schema evolution support
Phase 2: Procedural Layer Foundation
2.1 Instruction Set Architecture (Typed Instructions)
- Define typed instruction enumeration (LOAD_I32, STORE_F64, ADD_I64, etc.)
- Create Instruction table (opcode, typed operands)
- Design register addressing (r0-r255 with type tags)
- Define control flow instructions (BR, BR_IF, CALL, RET)
- Implement type validation for instruction operands
2.2 Program Representation
- Define Function table (name, typed parameters, typed locals, typed return, instruction sequence)
- Create Module table (function list with signatures, typed constant pool, import/export tables)
- Design ConstantPool table (typed constants: scalar, string, table_ref)
- Implement function signature validation (type checking at load time)
2.3 Execution State Schema (Typed Register File)
- Define TypedRegister table (register number, type tag, value union)
- Create RegisterFile table (256 registers: r0-r255)
- r0: constant zero (read-only)
- r1: function return value
- r2-r15: argument passing
- r16-r255: general purpose
- Design CallFrame table (return address, base register, saved register snapshot)
- Define CallStack table (array of call frames, frame pointer)
- Create ExecutionState table (program counter, typed register file, call stack, current arena)
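The register conventions above can be sketched as a C struct. The type tags, union members, and write-protection behavior shown here are illustrative assumptions, not the final schema.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the 256-entry typed register file. */
typedef enum { T_I32, T_I64, T_F64, T_BOOL, T_TABLE_REF } TypeTag;

typedef struct {
    TypeTag tag;
    union {
        int32_t  i32;
        int64_t  i64;
        double   f64;
        bool     b;
        uint32_t table_ref; /* index into the buffer pool */
    } v;
} TypedRegister;

typedef struct {
    TypedRegister r[256]; /* r0..r255 */
} RegisterFile;

/* Calling-convention register roles from the plan. */
enum { REG_ZERO = 0, REG_RET = 1, REG_ARG0 = 2, REG_ARG_LAST = 15 };

/* Writes to r0 are rejected, preserving the constant-zero convention;
   a production VM might raise a trap instead of returning false. */
static bool reg_write_i64(RegisterFile *rf, uint8_t n, int64_t val) {
    if (n == REG_ZERO) return false;
    rf->r[n].tag = T_I64;
    rf->r[n].v.i64 = val;
    return true;
}

static void regfile_init(RegisterFile *rf) {
    memset(rf, 0, sizeof *rf);
    rf->r[REG_ZERO].tag = T_I64; /* constant zero, read-only */
}
```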
2.4 C11 Interpreter Core
- Implement instruction decoder (FlatBuffer accessor → operation)
- Create register file manipulator (load/store operations)
- Build call stack manager (push/pop frames)
- Implement control flow executor (branches, loops)
- Design error handling model (trap codes, error propagation)
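The decode-dispatch-execute cycle can be sketched as a switch-based loop over a toy three-operand ISA. The opcodes, instruction encoding, and trap codes here are placeholders; the real decoder would read operands through FlatBuffer accessors.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy instruction encoding for illustration only. */
typedef enum { OP_LOAD_IMM, OP_ADD_I64, OP_BR_IF, OP_RET } OpCode;

typedef struct {
    uint8_t op, dst, a, b;
    int64_t imm; /* immediate value or branch target */
} Instr;

typedef enum { TRAP_NONE, TRAP_BAD_OPCODE } TrapCode;

/* Decode and execute until RET; errors surface as trap codes rather
   than aborting, matching the trap-and-propagate error model. */
static TrapCode run(const Instr *code, size_t len, int64_t regs[256]) {
    size_t pc = 0;
    while (pc < len) {
        const Instr *in = &code[pc];
        switch ((OpCode)in->op) {
        case OP_LOAD_IMM: regs[in->dst] = in->imm; pc++; break;
        case OP_ADD_I64:  regs[in->dst] = regs[in->a] + regs[in->b]; pc++; break;
        case OP_BR_IF:    pc = regs[in->a] ? (size_t)in->imm : pc + 1; break;
        case OP_RET:      return TRAP_NONE; /* result lives in r1 */
        default:          return TRAP_BAD_OPCODE;
        }
    }
    return TRAP_NONE;
}
```

A small program that adds two constants into the return register r1 exercises the whole loop: two LOAD_IMMs into general-purpose registers, one ADD_I64 targeting r1, then RET.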
Deliverables:
- schemas/instructions.fbs - Instruction set schema
- schemas/program.fbs - Program representation
- schemas/execution.fbs - Runtime execution state
- src/interpreter/decoder.c - Instruction decoder
- src/interpreter/executor.c - Execution engine
- src/interpreter/callstack.c - Call stack management
Phase 3: Dataflow Layer Foundation
3.1 Operator Schema Definition
- Define operator enumeration (SCAN, FILTER, PROJECT, MAP, REDUCE, JOIN, AGGREGATE, SORT)
- Create Operator table (operator type, parameters, metadata)
- Design operator parameter encoding (predicate expressions, projection lists, join conditions)
3.2 Dataflow Graph Schema (DAG)
- Define DataflowNode table (node ID, operator, input node references, output schema)
- Create DataflowGraph table (node list, topological ordering for validation)
- Design node dependency tracking (enforce DAG constraint, detect cycles)
- Implement graph validation (no cycles, valid input references)
3.3 Table Schema Definition (Columnar Layout)
- Define Column table:
- name: string
- type: ColumnType (i8, i16, i32, i64, f32, f64, bool, string)
- nullable: bool
- null_bitmap: [u8] (bit-packed, 1 bit per row)
- data: [u8] (typed data buffer or string offsets+data)
- Create Table table (schema, column array, row count)
- Design StringColumn (offsets array + concatenated UTF-8 data)
- Implement schema inference utilities
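The column layout above can be sketched for a single i32 column. The struct names and the bitmap convention (bit set means non-null) are assumptions for illustration; the real layout lives in the FlatBuffer schema.

```c
#include <stdbool.h>
#include <stdint.h>

/* One typed data buffer plus a bit-packed null bitmap, 1 bit per row. */
typedef struct {
    const char    *name;
    const int32_t *data;        /* i32 column shown for simplicity     */
    const uint8_t *null_bitmap; /* bit i set => row i holds a value    */
    uint64_t       row_count;
} ColumnI32;

static bool col_is_null(const ColumnI32 *c, uint64_t row) {
    if (!c->null_bitmap) return false; /* non-nullable column */
    return (c->null_bitmap[row >> 3] & (1u << (row & 7))) == 0;
}

/* Sum of non-null values: the kind of tight sequential loop the
   columnar layout makes cache-friendly and SIMD-amenable. */
static int64_t col_sum(const ColumnI32 *c) {
    int64_t sum = 0;
    for (uint64_t i = 0; i < c->row_count; i++)
        if (!col_is_null(c, i)) sum += c->data[i];
    return sum;
}
```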
3.4 Dataflow Executor (Pull-Based Iterator Model)
- Define NodeIterator interface (next(), reset() operations)
- Implement operator iterator dispatch (node type → iterator constructor)
- Create SCAN iterator (read from columnar FlatBuffer table)
- Implement FILTER iterator (pull from input, evaluate predicate, pass matching rows)
- Build PROJECT iterator (pull from input, select/reorder columns)
- Design MAP iterator (pull from input, apply transformation per row)
- Implement materialize() function (pull all rows, build new FlatBuffer in arena)
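The pull-based iterator model can be sketched with a vtable-style NodeIterator and a FILTER that pulls from its input. Names, the single-int64 "row" simplification, and the predicate signature are all illustrative assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Each node exposes next()/reset() through a small vtable. */
typedef struct NodeIterator NodeIterator;
struct NodeIterator {
    bool (*next)(NodeIterator *self, int64_t *row_out);
    void (*reset)(NodeIterator *self);
};

/* SCAN over an in-memory array (stand-in for a columnar table read). */
typedef struct {
    NodeIterator base;
    const int64_t *rows;
    size_t len, pos;
} ScanIter;

static bool scan_next(NodeIterator *it, int64_t *out) {
    ScanIter *s = (ScanIter *)it;
    if (s->pos >= s->len) return false;
    *out = s->rows[s->pos++];
    return true;
}
static void scan_reset(NodeIterator *it) { ((ScanIter *)it)->pos = 0; }

/* FILTER pulls from its input and forwards only matching rows. */
typedef struct {
    NodeIterator base;
    NodeIterator *input;
    bool (*pred)(int64_t);
} FilterIter;

static bool filter_next(NodeIterator *it, int64_t *out) {
    FilterIter *f = (FilterIter *)it;
    int64_t v;
    while (f->input->next(f->input, &v))
        if (f->pred(v)) { *out = v; return true; }
    return false;
}
static void filter_reset(NodeIterator *it) {
    FilterIter *f = (FilterIter *)it;
    f->input->reset(f->input);
}

/* Example predicate for demonstration. */
static bool pred_even(int64_t v) { return (v & 1) == 0; }
```

Because FILTER only calls next() on demand, no intermediate result is ever materialized until a sink (such as materialize()) drains the pipeline.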
3.5 Expression Evaluation
- Define expression tree schema (binary ops, unary ops, literals, column references)
- Implement expression evaluator (recursive descent on expression tree)
- Create predicate evaluator for FILTER operations
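Recursive descent over the expression tree can be sketched as below. The node kinds and the flat int64 row representation are assumptions; the real tree would be read through FlatBuffer accessors and support the full type system.

```c
#include <stdint.h>

/* Illustrative expression-tree nodes: literals, column references,
   two binary arithmetic ops, and one comparison for predicates. */
typedef enum { EXPR_LIT, EXPR_COL, EXPR_ADD, EXPR_MUL, EXPR_LT } ExprKind;

typedef struct Expr {
    ExprKind kind;
    int64_t  lit;                 /* EXPR_LIT                         */
    int      col;                 /* EXPR_COL: column index into row  */
    const struct Expr *lhs, *rhs; /* binary operands                  */
} Expr;

/* Evaluate one expression against one row; comparisons yield 0/1,
   which is what the FILTER predicate evaluator consumes. */
static int64_t expr_eval(const Expr *e, const int64_t *row) {
    switch (e->kind) {
    case EXPR_LIT: return e->lit;
    case EXPR_COL: return row[e->col];
    case EXPR_ADD: return expr_eval(e->lhs, row) + expr_eval(e->rhs, row);
    case EXPR_MUL: return expr_eval(e->lhs, row) * expr_eval(e->rhs, row);
    case EXPR_LT:  return expr_eval(e->lhs, row) <  expr_eval(e->rhs, row);
    }
    return 0;
}
```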
Deliverables:
- schemas/operators.fbs - Dataflow operator definitions
- schemas/dataflow.fbs - Dataflow graph structure
- schemas/tables.fbs - Table and column schemas
- schemas/expressions.fbs - Expression trees
- src/dataflow/executor.c - Operator executor
- src/dataflow/operators/ - Individual operator implementations
- src/dataflow/expressions.c - Expression evaluator
Phase 4: Integration Layer
4.1 Procedural-Dataflow Bridge
- Design instruction set for dataflow operations (CREATE_NODE, EXECUTE_NODE, MATERIALIZE)
- Implement dataflow node creation from procedural code
- Create mechanism to pass dataflow results to registers
- Build reference counting for dataflow node lifecycle
4.2 FlatBuffer Builder Integration
- Create wrapper for FlatBuffers builder API
- Implement result materialization (dataflow output → new FlatBuffer)
- Design buffer pool management (allocation, reuse, freeing)
- Build buffer versioning layer (maintain multiple generations)
4.3 Immutability Enforcement
- Implement copy-on-write semantics for modified data
- Design buffer snapshot mechanism
- Create buffer lineage tracking (derivation chains)
Deliverables:
- src/integration/bridge.c - Procedural-dataflow bridge
- src/integration/builder.c - FlatBuffer builder wrapper
- src/integration/buffers.c - Buffer pool manager
- schemas/integration.fbs - Integration layer schemas
Phase 5: Host Interoperability
5.1 Host API Definition
- Define VM initialization API (vm_create, vm_destroy)
- Create program loading API (vm_load_module, vm_unload_module)
- Design execution API (vm_execute_function, vm_step, vm_resume)
- Implement state inspection API (vm_get_state, vm_get_register, vm_get_dataflow_graph)
- Build result extraction API (vm_get_result_table, vm_materialize_node)
5.2 Zero-Copy Access Guarantees
- Document buffer lifetime semantics
- Implement buffer pinning mechanism (prevent premature freeing)
- Create buffer accessor utilities for host languages
- Design error propagation to host (error codes, structured error objects)
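Buffer pinning can be sketched as a simple pin count with deferred freeing. The struct and function names are illustrative assumptions; the point is that a buffer with outstanding host pins is retired, not freed, so zero-copy views stay valid.

```c
#include <stdbool.h>
#include <stdint.h>

/* A buffer with pins > 0 cannot be freed; "retire" defers the free
   until the last host pin is released. */
typedef struct {
    void    *data;
    uint32_t pin_count;
    bool     retired; /* owner asked to free; waiting for pins to drop */
} PinnedBuffer;

static void buf_pin(PinnedBuffer *b) { b->pin_count++; }

/* Returns true when the buffer just became actually freeable. */
static bool buf_unpin(PinnedBuffer *b) {
    if (b->pin_count > 0) b->pin_count--;
    return b->retired && b->pin_count == 0;
}

/* Returns true if the buffer can be freed immediately. */
static bool buf_retire(PinnedBuffer *b) {
    b->retired = true;
    return b->pin_count == 0;
}
```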
5.3 Host Language Bindings
- Generate C header files from FlatBuffer schemas
- Create Python binding layer (using FlatBuffers Python library)
- Implement Rust binding layer (using FlatBuffers Rust library)
- Design Go binding layer (using FlatBuffers Go library)
Deliverables:
- include/theta_vm.h - Public C API
- bindings/python/ - Python host bindings
- bindings/rust/ - Rust host bindings
- bindings/go/ - Go host bindings
- docs/host_integration.md - Integration guide
Phase 6: Advanced Dataflow Operations
6.1 Join Operators
- Implement hash join (build hash table, probe)
- Create nested loop join (fallback for non-equi joins)
- Design sort-merge join (for sorted inputs)
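The hash join's build/probe structure can be sketched with a fixed-size open-addressing table. The table size, hash function, and single-value payload are simplifying assumptions; a real implementation would grow the table and handle duplicate keys.

```c
#include <stdint.h>
#include <string.h>

#define HJ_SLOTS 64 /* power of two, assumed larger than the build side */

typedef struct { int64_t key; int64_t val; int used; } HjSlot;

/* Build phase: insert every row of the smaller input, linear probing
   on collision. */
static void hj_build(HjSlot table[HJ_SLOTS],
                     const int64_t *keys, const int64_t *vals, size_t n) {
    memset(table, 0, sizeof(HjSlot) * HJ_SLOTS);
    for (size_t i = 0; i < n; i++) {
        size_t h = (size_t)(keys[i] * 2654435761u) & (HJ_SLOTS - 1);
        while (table[h].used) h = (h + 1) & (HJ_SLOTS - 1);
        table[h] = (HjSlot){ keys[i], vals[i], 1 };
    }
}

/* Probe phase: returns 1 and fills *val_out on a key match. */
static int hj_probe(const HjSlot table[HJ_SLOTS], int64_t key,
                    int64_t *val_out) {
    size_t h = (size_t)(key * 2654435761u) & (HJ_SLOTS - 1);
    while (table[h].used) {
        if (table[h].key == key) { *val_out = table[h].val; return 1; }
        h = (h + 1) & (HJ_SLOTS - 1);
    }
    return 0;
}
```

In the pull-based model, the build phase runs once on the first next() call of the join iterator; each subsequent next() probes with one row from the other input.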
6.2 Aggregation Operators
- Implement GROUP BY execution (hash-based grouping)
- Create aggregate functions (SUM, COUNT, AVG, MIN, MAX)
- Design streaming aggregation for memory efficiency
6.3 Sort Operator
- Implement multi-column sort with collation
- Create external sort for large datasets (spill to disk)
6.4 Query Optimization (Pre-Execution)
- Implement predicate pushdown (move filters closer to scans)
- Create projection pushdown (only read needed columns from source)
- Design operator fusion (combine filter→project→map into single pass)
- Implement common subexpression elimination (reuse shared DAG subtrees)
- Note: Lazy pull-based evaluation is already implemented in Phase 3.4
Deliverables:
- src/dataflow/operators/join.c - Join implementations
- src/dataflow/operators/aggregate.c - Aggregation operators
- src/dataflow/operators/sort.c - Sort operator
- src/dataflow/optimizer.c - Query optimization
Phase 7: Safety and Correctness
7.1 Memory Safety
- Implement bounds checking for all buffer accesses
- Create assertion framework for invariant checking
- Design buffer validity verification (detect corruption)
- Build memory leak detection utilities
7.2 Determinism Validation
- Create reproducibility test suite (same input → same output)
- Implement instruction execution tracing
- Design regression testing framework
7.3 Platform Portability
- Test on x86-64, ARM64, RISC-V architectures
- Validate endianness handling
- Test alignment requirement compliance
- Verify float/double precision consistency
7.4 Error Handling
- Define error code enumeration (type errors, bounds errors, execution errors)
- Create structured error reporting (error message, location, stack trace)
- Implement graceful degradation (trap and unwind vs. abort)
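The trap-and-unwind model can be sketched as a status enum plus a small error value that carries location context. Codes, field names, and the static-string message convention are illustrative assumptions.

```c
#include <stdint.h>

/* Illustrative error codes covering the categories in the plan. */
typedef enum {
    THETA_OK = 0,
    THETA_ERR_TYPE,   /* operand type mismatch            */
    THETA_ERR_BOUNDS, /* buffer or register out of range  */
    THETA_ERR_TRAP,   /* explicit trap instruction        */
    THETA_ERR_OOM     /* arena exhausted                  */
} ThetaStatus;

typedef struct {
    ThetaStatus status;
    uint32_t    pc;      /* instruction index where the trap fired      */
    const char *message; /* static string: no allocation on error paths */
} ThetaError;

/* Constructor callers use to propagate the first failure upward
   (trap and unwind) instead of aborting the process. */
static ThetaError theta_trap(ThetaStatus s, uint32_t pc, const char *msg) {
    ThetaError e = { s, pc, msg };
    return e;
}

static int theta_is_ok(ThetaError e) { return e.status == THETA_OK; }
```

Keeping the message as a static string means error construction itself can never fail, which matters when the trap being reported is arena exhaustion.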
Deliverables:
- src/safety/bounds_check.c - Bounds checking utilities
- src/safety/validator.c - Buffer validation
- tests/determinism/ - Determinism test suite
- tests/portability/ - Cross-platform tests
- src/errors/ - Error handling infrastructure
Phase 8: Tooling and Developer Experience
8.1 VM Introspection Tools
- Build execution state dumper (print current VM state)
- Create dataflow graph visualizer (DOT format export)
- Implement instruction disassembler (bytecode → human-readable)
- Design execution tracer (log each instruction)
8.2 Debugging Support
- Implement breakpoint mechanism
- Create single-step execution mode
- Build register watch facility
- Design dataflow node inspection utilities
8.3 Performance Analysis
- Create execution profiler (instruction counts, operator timings)
- Implement memory usage analyzer
- Build dataflow bottleneck detector
8.4 Documentation
- Write architecture overview documentation
- Create FlatBuffer schema documentation
- Document instruction set reference
- Write operator semantics guide
- Create host integration tutorial
Deliverables:
- tools/vm_inspect - State inspection tool
- tools/df_visualizer - Dataflow graph visualizer
- tools/disassembler - Bytecode disassembler
- tools/profiler - Performance profiler
- docs/architecture.md - Architecture documentation
- docs/instruction_set.md - ISA reference
- docs/operators.md - Dataflow operator reference
Phase 9: Standard Library and Built-ins
9.1 Built-in Functions
- Implement string manipulation functions
- Create mathematical functions (sin, cos, sqrt, log, exp)
- Design date/time manipulation functions
- Build type conversion functions
9.2 I/O Operations
- Design file reading operators (CSV, JSON, Parquet)
- Implement table serialization (export to formats)
- Create streaming I/O for large datasets
9.3 Standard Dataflow Patterns
- Create common query templates (group-aggregate, window functions)
- Implement reusable transformation pipelines
Deliverables:
- src/stdlib/ - Standard library implementations
- src/io/ - I/O operation implementations
- schemas/stdlib.fbs - Standard library schemas
- docs/stdlib.md - Standard library reference
Phase 10: Optimization and Production Readiness
10.1 Performance Optimization
- Implement SIMD vectorization for hot paths (columnar operations, filters)
- Create instruction dispatch optimization (computed goto, threaded code)
- Optimize FlatBuffer access patterns (cache locality, prefetching)
- Tune arena allocation parameters (arena size, growth strategy)
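Threaded-code dispatch can be sketched with the GCC/Clang labels-as-values extension (computed goto), which removes the per-instruction bounds check and branch misprediction cost of a switch. This is a non-standard extension, so a portable build would keep the switch loop as a fallback; the opcodes below are toy placeholders.

```c
#include <stddef.h>
#include <stdint.h>

enum { TOP_PUSH, TOP_ADD, TOP_HALT };

/* Each handler ends by jumping directly to the next opcode's handler,
   so the dispatch branch is spread across handlers instead of being a
   single highly unpredictable jump. */
static int64_t threaded_run(const int64_t *code) {
    static const void *dispatch[] = { &&op_push, &&op_add, &&op_halt };
    int64_t stack[64];
    size_t sp = 0;
    const int64_t *ip = code;
    goto *dispatch[*ip++];
op_push:
    stack[sp++] = *ip++;
    goto *dispatch[*ip++];
op_add:
    sp--;
    stack[sp - 1] += stack[sp];
    goto *dispatch[*ip++];
op_halt:
    return stack[sp - 1];
}
```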
10.2 Benchmarking
- Create benchmark suite (micro-benchmarks, end-to-end scenarios)
- Compare against Lua (procedural execution speed)
- Compare against DuckDB (dataflow operation speed)
- Measure memory overhead
10.3 Production Hardening
- Implement comprehensive error recovery
- Create resource limitation enforcement (memory caps, execution timeouts)
- Design security sandboxing (restrict host access)
- Build graceful shutdown mechanism
10.4 Release Engineering
- Create build system (CMake or Meson)
- Set up continuous integration
- Implement versioning and release process
- Design upgrade path for schema evolution
Deliverables:
- Optimized production build
- Benchmark results and analysis
- Security audit report
- Release artifacts (binaries, headers, documentation)
Dependencies and Prerequisites
Required Tools
- C11-compliant compiler (GCC 4.9+, Clang 3.4+, or MSVC 2019 16.8+, which added C11 support)
- FlatBuffers compiler (flatc)
- CMake 3.15+ or Meson build system
- Python 3.8+ (for tooling and bindings)
External Libraries
- FlatBuffers C library
- Testing framework (e.g., Check, Unity, or cmocka)
Risk Assessment
Technical Risks
FlatBuffer performance overhead - Accessor indirection may slow hot paths
- Mitigation: Benchmark early, optimize critical sections with SIMD, cache hot paths
Arena memory overhead - Long-running computations may accumulate large arenas
- Mitigation: Implement arena compaction, allow arena reuse for similar-sized allocations
Determinism on edge cases - Floating-point operations may vary across platforms
- Mitigation: Use strict FP modes, document platform requirements
Schema evolution complexity - Breaking changes may complicate upgrades
- Mitigation: Version all schemas, maintain compatibility layer
Project Risks
Scope creep - Feature additions may delay core implementation
- Mitigation: Strict adherence to phased plan, defer non-essential features
Documentation debt - Complex design may be hard to document
- Mitigation: Document schemas and APIs during development, not after
Success Criteria
- VM can execute procedural scripts with Lua-like semantics
- Dataflow operations match DuckDB’s expressiveness for core operators
- Host languages can access VM state without copying memory
- All FlatBuffer regions remain valid and accessible to host
- Execution is deterministic across supported platforms
- Memory safety guaranteed (no UB, no leaks in normal operation)
- Performance within 2x of native C for procedural code
- Performance within 3x of DuckDB for dataflow operations
Timeline Estimation (Rough Order)
- Phase 1: 2-3 weeks (schema design is iterative)
- Phase 2: 4-6 weeks (interpreter core is complex)
- Phase 3: 4-6 weeks (dataflow engine requires careful design)
- Phase 4: 2-3 weeks (integration layer)
- Phase 5: 3-4 weeks (host bindings for multiple languages)
- Phase 6: 4-5 weeks (advanced operators are intricate)
- Phase 7: 3-4 weeks (safety testing is thorough)
- Phase 8: 2-3 weeks (tooling development)
- Phase 9: 2-3 weeks (standard library)
- Phase 10: 3-4 weeks (optimization and hardening)
Total estimated effort: 29-41 weeks (single developer, full-time)
For a team of 2-3 developers, overlapping phases could reduce calendar time to 6-9 months.
Next Steps
- Review and validate this plan with stakeholders
- Set up project repository and build infrastructure
- Begin Phase 1: Design and iterate on FlatBuffer schemas
- Create minimal working prototype (simple procedural interpreter)
- Iterate on design based on early implementation learnings