Theta-lang: Feedback Recommendations

This document summarizes recommendations for several foundational design questions, with pros and cons for each option. These are based on the design and implementation plan for the Theta VM.


1. Schema Organization: One Big .fbs File vs. Multiple Files

Recommendation: Use multiple .fbs files, organized by subsystem (types, memory, values, instructions, program, execution, operators, dataflow, tables, integration, stdlib).

Pros:

  • Easier to maintain and evolve schemas independently
  • Clear separation of concerns; reduces merge conflicts
  • Enables targeted regeneration of bindings
  • Facilitates schema versioning and compatibility

Cons:

  • Slightly more complex build process (need to include multiple files)
  • Cross-file references require careful management

2. String Handling: Interning vs. Direct Storage; UTF-8 Enforcement

Recommendation: Store strings as UTF-8 in FlatBuffers and enforce UTF-8 validity at schema boundaries. Consider optional string interning for frequently repeated values (e.g., column names, identifiers).

Pros:

  • UTF-8 is FlatBuffers’ default and widely supported
  • Direct storage is simple and fast for most cases
  • Interning can reduce memory for repeated strings

Cons:

  • Interning adds complexity (need a global pool, lifetime management)
  • Enforcing UTF-8 may require validation utilities
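
To make the trade-off above concrete, here is a minimal sketch of a validation helper and an interning pool. The names StringPool, intern, lookup, and validate_utf8 are hypothetical and are not part of the Theta plan or of the FlatBuffers API. The pool is a plain in-memory table with no eviction, so lifetime management is left out.

```python
class StringPool:
    """Hypothetical interning pool: each distinct string gets one small integer id."""

    def __init__(self):
        self._ids = {}      # str -> id
        self._strings = []  # id -> str

    def intern(self, s: str) -> int:
        """Return the existing id for s, or assign the next free id."""
        existing = self._ids.get(s)
        if existing is not None:
            return existing
        new_id = len(self._strings)
        self._ids[s] = new_id
        self._strings.append(s)
        return new_id

    def lookup(self, string_id: int) -> str:
        """Map an id back to its string; raises IndexError for unknown ids."""
        return self._strings[string_id]


def validate_utf8(raw: bytes) -> str:
    """Strictly decode bytes arriving at a schema boundary.

    Raises UnicodeDecodeError (a ValueError subclass) on invalid UTF-8.
    """
    return raw.decode("utf-8", errors="strict")
```

With a pool like this, a column name repeated across many rows is stored once and referenced by id everywhere else. Strings that rarely repeat can still be stored directly.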

3. Buffer Replacement: Rebuild ExecutionState Every Instruction vs. Batched Updates

Recommendation: Use batched updates. Rebuild ExecutionState only at control boundaries (function call/return, materialize, etc.), not after every instruction.

Pros:

  • Reduces FlatBuffer builder overhead
  • Improves performance for tight loops
  • Allows host inspection at meaningful points

Cons:

  • VM state may be transiently out of sync with host view
  • Requires careful definition of update boundaries
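
The batching idea can be sketched as follows. This is an illustration under assumptions, not the Theta VM: the opcode names, the boundary set, and the publish method are hypothetical, and a tuple snapshot stands in for rebuilding the ExecutionState FlatBuffer.

```python
from dataclasses import dataclass, field

# Assumed set of control boundaries at which the host-visible state is rebuilt.
BOUNDARY_OPS = {"CALL", "RET", "MATERIALIZE"}


@dataclass
class Vm:
    registers: list = field(default_factory=lambda: [0] * 16)
    pc: int = 0
    snapshots: list = field(default_factory=list)

    def publish(self):
        # Stand-in for serializing a fresh ExecutionState buffer.
        self.snapshots.append((self.pc, tuple(self.registers)))

    def step(self, op, dst=0, value=0):
        # Mutate cheap in-memory state on every instruction.
        if op == "ADDI":
            self.registers[dst] += value
        self.pc += 1
        # Pay the rebuild cost only at a control boundary.
        if op in BOUNDARY_OPS:
            self.publish()
```

Running 100 ADDI steps followed by one CALL produces a single snapshot rather than 101. Between boundaries the host sees the previous snapshot, which is the transient mismatch noted under Cons.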

4. Calling Convention: Argument Registers and Stack Fallback

Recommendation: Use a fixed set of argument registers (e.g., r2–r15 for up to 14 args), with stack fallback for additional arguments.

Pros:

  • Fast access for common cases (few arguments)
  • Stack fallback supports arbitrarily large signatures
  • Resembles common C ABIs, which pass the first few arguments in registers and the rest on the stack

Cons:

  • Stack management adds complexity
  • Need to define register/stack mapping clearly
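
A minimal sketch of the register/stack mapping, assuming the r2–r15 range from the recommendation. The function name and the left-to-right stack order are assumptions; the plan does not fix them here.

```python
FIRST_ARG_REG = 2
LAST_ARG_REG = 15
NUM_ARG_REGS = LAST_ARG_REG - FIRST_ARG_REG + 1  # 14 argument registers


def assign_arguments(args):
    """Split a call's arguments into register and stack locations.

    Returns (in_regs, on_stack): a dict from register name to value for the
    first 14 arguments, and a list of the remaining arguments in
    left-to-right order (an assumed order, chosen for illustration).
    """
    in_regs = {
        f"r{FIRST_ARG_REG + i}": value
        for i, value in enumerate(args[:NUM_ARG_REGS])
    }
    on_stack = list(args[NUM_ARG_REGS:])
    return in_regs, on_stack
```

For a 16-argument call, arguments 0 through 13 land in r2 through r15 and the last two go to the stack. Calls with 14 or fewer arguments never touch the stack.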

5. Branch Encoding: Absolute Offsets vs. Relative Jumps

Recommendation: Use absolute offsets for branch targets in instruction encoding.

Pros:

  • Easier to decode and validate
  • A branch instruction can be moved or reordered without re-encoding its own target; only moving the target requires a fix-up
  • Simplifies disassembly and debugging

Cons:

  • Slightly larger encoding for each branch, since the target field must be able to address the whole code segment
  • Code relocation requires offset adjustment
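
The validation benefit and the relocation cost can both be shown in a few lines. This sketch assumes a program is a list of (opcode, operand) pairs and that the hypothetical opcodes JMP and JZ carry an absolute instruction index. None of these names come from the Theta ISA.

```python
BRANCH_OPS = {"JMP", "JZ"}  # assumed branch opcodes


def validate_branch_targets(program):
    """Return the indices of branches whose absolute target is out of range."""
    errors = []
    for index, (opcode, operand) in enumerate(program):
        if opcode in BRANCH_OPS and not (0 <= operand < len(program)):
            errors.append(index)
    return errors


def relocate(program, base):
    """Shift every absolute branch target by base.

    This is the adjustment needed when the code is placed at a new offset;
    non-branch operands are left unchanged.
    """
    return [
        (opcode, operand + base) if opcode in BRANCH_OPS else (opcode, operand)
        for opcode, operand in program
    ]
```

Validation is a single bounds check per branch with no dependence on the branch's own position. Relocation is the price: every branch operand has to be rewritten when the code moves.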

6. Instruction Size: Fixed 32-bit vs. Variable-Length

Recommendation: Use fixed-size (e.g., 32-bit) instructions for the core ISA.

Pros:

  • Fast decoding and predictable memory access
  • Uniform, aligned words are cache-efficient and easy to scan in bulk (e.g., with SIMD)
  • Simplifies instruction fetch and dispatch

Cons:

  • May waste space for simple instructions
  • Complex instructions may need multiple slots or extension
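
As an illustration of fixed-width decoding, the sketch below packs an instruction into one 32-bit little-endian word. The field layout (8-bit opcode, 4-bit destination, 4-bit source, 16-bit immediate) is an assumption made for this example only and is not the Theta instruction format.

```python
import struct


def encode(opcode, dst, src, imm):
    """Pack one instruction into 4 bytes using the assumed 8/4/4/16 layout."""
    if not (0 <= opcode < 256 and 0 <= dst < 16 and 0 <= src < 16
            and 0 <= imm < 65536):
        raise ValueError("field out of range")
    word = (opcode << 24) | (dst << 20) | (src << 16) | imm
    return struct.pack("<I", word)


def decode(raw):
    """Unpack 4 bytes into (opcode, dst, src, imm)."""
    (word,) = struct.unpack("<I", raw)
    return (word >> 24) & 0xFF, (word >> 20) & 0xF, (word >> 16) & 0xF, word & 0xFFFF


def fetch(code, pc):
    """Fetch instruction number pc; with fixed width its byte offset is pc * 4."""
    return decode(code[pc * 4: pc * 4 + 4])
```

Because every instruction is 4 bytes, fetch needs no length prefix or scan from the start of the code. The 16-bit immediate also shows the cost: larger constants would need a second slot or an extension mechanism.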

7. Processing Model: Pure Streaming vs. Batching

Recommendation: Support both streaming (pull-based iterators) and batching (materialize entire table), with streaming as the default for dataflow.

Pros:

  • Streaming enables low-latency, memory-efficient execution
  • Batching is useful for host extraction and bulk operations
  • Flexible for different workloads

Cons:

  • Dual model adds implementation complexity
  • Need to define clear API for switching modes
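
The two modes can share one operator implementation if batching is defined as draining a stream. The sketch below uses Python generators as stand-ins for pull-based iterators. The function names scan, filter_rows, and materialize are hypothetical.

```python
def scan(rows):
    """Source operator: yields rows one at a time from any iterable."""
    for row in rows:
        yield row


def filter_rows(source, predicate):
    """Streaming operator: pulls from source only when its consumer pulls."""
    for row in source:
        if predicate(row):
            yield row


def materialize(source):
    """Batching boundary: drains the stream into an in-memory table."""
    return list(source)
```

A consumer that calls next() on filter_rows(scan(rows), pred) pulls only as many input rows as it takes to find the first match. Calling materialize on the same pipeline yields the whole result for host extraction. Under this design, switching modes is a question of where materialize is placed, not a separate code path.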

8. Thread Safety: Single-Threaded vs. Concurrent Access

Recommendation: Start with a single-threaded VM core; design FlatBuffer regions and arenas to allow concurrent host access (read-only) and future multi-threaded extensions.

Pros:

  • Simpler initial implementation
  • Finished FlatBuffers are read without mutation, so concurrent reads are safe as long as no thread modifies a buffer in place
  • Can evolve to multi-threaded execution later

Cons:

  • No parallel execution in initial version
  • Must document thread safety guarantees for host integration
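
One way to give the host a simple guarantee is to publish each finished buffer as an immutable value and swap a single reference. The sketch below assumes that model. SnapshotCell and read_concurrently are hypothetical names, and an immutable bytes object stands in for a finished FlatBuffer.

```python
import threading


class SnapshotCell:
    """Holds the most recently published immutable buffer."""

    def __init__(self, initial: bytes):
        self._lock = threading.Lock()
        self._current = initial

    def publish(self, finished: bytes):
        # Called by the single VM thread; replaces the reference, never the bytes.
        with self._lock:
            self._current = finished

    def acquire(self) -> bytes:
        # Called by host threads; the returned buffer never changes afterwards.
        with self._lock:
            return self._current


def read_concurrently(cell, reader_count):
    """Start reader_count host threads that each read the current snapshot."""
    results = [None] * reader_count

    def reader(slot):
        snapshot = cell.acquire()
        results[slot] = sum(snapshot)  # any read-only traversal

    threads = [threading.Thread(target=reader, args=(i,)) for i in range(reader_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

A reader that acquired a snapshot before a publish keeps a consistent, possibly stale, view. That is the kind of guarantee the host API documentation would need to spell out.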

Next Steps

  • Begin schema design using multiple .fbs files as outlined in the plan
  • Implement UTF-8 validation utilities and consider string interning for identifiers
  • Define update boundaries for ExecutionState
  • Specify calling convention and register/stack mapping
  • Document branch encoding and instruction format
  • Design streaming/batching API for dataflow
  • Clarify thread safety in host API documentation