Theta-lang: Feedback Recommendations
This document summarizes recommendations for several foundational design questions, with pros and cons for each recommendation. The recommendations are based on the design and implementation plan for the Theta VM.
1. Schema Organization: One Big .fbs File vs. Multiple Files
Recommendation: Use multiple .fbs files, organized by subsystem (types, memory, values, instructions, program, execution, operators, dataflow, tables, integration, stdlib).
Pros:
- Easier to maintain and evolve schemas independently
- Clear separation of concerns; reduces merge conflicts
- Enables targeted regeneration of bindings
- Facilitates schema versioning and compatibility
Cons:
- Slightly more complex build process (need to include multiple files)
- Cross-file references require careful management
2. String Handling: Interning vs. Direct Storage; UTF-8 Enforcement
Recommendation: Store strings as UTF-8 in FlatBuffers and enforce UTF-8 validity at schema boundaries. Consider optional string interning for frequently repeated values (e.g., column names, identifiers).
Pros:
- UTF-8 is FlatBuffers’ default and widely supported
- Direct storage is simple and fast for most cases
- Interning can reduce memory for repeated strings
Cons:
- Interning adds complexity (need a global pool, lifetime management)
- Enforcing UTF-8 may require validation utilities
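
The two ideas above can be sketched as follows, in Python for brevity. The names `validate_utf8` and `StringPool` are hypothetical and not part of any Theta or FlatBuffers API. The pool never evicts entries, so it sidesteps the lifetime management listed under the cons rather than solving it.

```python
class StringPool:
    """Hypothetical interning pool: maps each distinct string to a small integer id."""

    def __init__(self):
        self._ids = {}      # str -> id
        self._strings = []  # id -> str

    def intern(self, s: str) -> int:
        # Return the existing id if this string was interned before.
        existing = self._ids.get(s)
        if existing is not None:
            return existing
        new_id = len(self._strings)
        self._ids[s] = new_id
        self._strings.append(s)
        return new_id

    def lookup(self, string_id: int) -> str:
        return self._strings[string_id]


def validate_utf8(raw: bytes) -> str:
    """Decode bytes at a schema boundary, rejecting invalid UTF-8."""
    try:
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError(f"invalid UTF-8 at byte {exc.start}") from exc
```

Repeated column names such as "user_id" would then be stored once and referenced by id, while one-off values can stay as directly stored strings.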
3. Buffer Replacement: Rebuild ExecutionState Every Instruction vs. Batched Updates
Recommendation: Use batched updates. Rebuild ExecutionState only at control boundaries (function call/return, materialize, etc.), not after every instruction.
Pros:
- Reduces FlatBuffer builder overhead
- Improves performance for tight loops
- Allows host inspection at meaningful points
Cons:
- VM state may be transiently out of sync with host view
- Requires careful definition of update boundaries
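
A minimal sketch of boundary-only publishing, assuming a toy instruction set. The opcodes, the `Vm` class, and the tuple snapshot (a stand-in for a rebuilt ExecutionState buffer) are all hypothetical. The point is only that ordinary instructions mutate live state, and a snapshot is produced when a boundary opcode is reached.

```python
from dataclasses import dataclass, field

# Hypothetical opcodes; only these count as control boundaries in this sketch.
BOUNDARY_OPS = {"CALL", "RET", "MATERIALIZE"}


@dataclass
class Vm:
    registers: list = field(default_factory=lambda: [0] * 16)
    snapshots: list = field(default_factory=list)  # stand-in for rebuilt buffers

    def _publish(self):
        # Stand-in for a FlatBuffer rebuild: copy live state into an immutable tuple.
        self.snapshots.append(tuple(self.registers))

    def run(self, program):
        for op, dst, value in program:
            if op == "ADDI":
                self.registers[dst] += value
            if op in BOUNDARY_OPS:
                self._publish()
```

Between boundaries the host sees the previous snapshot, which is the transient mismatch noted in the cons.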
4. Calling Convention: Argument Registers and Stack Fallback
Recommendation: Use a fixed set of argument registers (e.g., r2–r15 for up to 14 args), with stack fallback for additional arguments.
Pros:
- Fast access for common cases (few arguments)
- Stack fallback supports arbitrarily large signatures
- Mirrors common C ABIs, which pass the first arguments in registers and the rest on the stack
Cons:
- Stack management adds complexity
- Need to define register/stack mapping clearly
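
One way to pin down the register/stack mapping is a single function from argument index to location. The sketch below uses the r2–r15 range from the recommendation. The function name and the choice to number overflow stack slots from 0 are assumptions for illustration.

```python
FIRST_ARG_REG = 2   # r2, from the recommendation above
LAST_ARG_REG = 15   # r15
NUM_ARG_REGS = LAST_ARG_REG - FIRST_ARG_REG + 1  # 14 argument registers


def arg_location(index: int):
    """Return ("reg", n) or ("stack", slot) for a zero-based argument index."""
    if index < 0:
        raise ValueError("argument index must be non-negative")
    if index < NUM_ARG_REGS:
        return ("reg", FIRST_ARG_REG + index)
    # Assumption: overflow arguments occupy consecutive stack slots starting at 0.
    return ("stack", index - NUM_ARG_REGS)
```

With this mapping, argument 0 lands in r2, argument 13 in r15, and argument 14 in the first stack slot.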
5. Branch Encoding: Absolute Offsets vs. Relative Jumps
Recommendation: Use absolute offsets for branch targets in instruction encoding.
Pros:
- Easier to decode and validate
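- A branch's encoding does not depend on its own position, so moving the branch instruction alone needs no re-encoding (moving its target still does)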
- Simplifies disassembly and debugging
Cons:
- Slightly larger encoding for each branch
- Code relocation requires offset adjustment
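
The validation benefit and the relocation cost can both be shown in a few lines. This sketch assumes branch targets are absolute instruction indices. The function names are hypothetical.

```python
def validate_branches(branch_targets, program_length):
    """Check that every absolute target is a valid instruction index."""
    return all(0 <= t < program_length for t in branch_targets)


def relocate(branch_targets, insert_at, count):
    """Adjust absolute targets after inserting `count` instructions at `insert_at`.

    Targets at or beyond the insertion point shift by `count`; earlier ones are unchanged.
    """
    return [t + count if t >= insert_at else t for t in branch_targets]
```

Validation needs only the program length, not the address of each branch. Relocation touches every target past the insertion point.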
6. Instruction Size: Fixed 32-bit vs. Variable-Length
Recommendation: Use fixed-size (e.g., 32-bit) instructions for the core ISA.
Pros:
- Fast decoding and predictable memory access
- Uniform, aligned words suit batch (SIMD-style) decoding and give predictable cache behavior
- Simplifies instruction fetch and dispatch
Cons:
- May waste space for simple instructions
- Complex instructions may need multiple slots or extension
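
To make the decoding argument concrete, here is a sketch of one possible 32-bit layout: an 8-bit opcode followed by three 8-bit register fields, stored little-endian. This layout is an assumption for illustration only, and the document does not specify the actual field widths.

```python
import struct

# Assumed layout (hypothetical): [31:24] opcode, [23:16] rd, [15:8] rs1, [7:0] rs2.


def encode(opcode, rd, rs1, rs2):
    """Pack four 8-bit fields into one 32-bit instruction word."""
    for name, v in (("opcode", opcode), ("rd", rd), ("rs1", rs1), ("rs2", rs2)):
        if not 0 <= v <= 0xFF:
            raise ValueError(f"{name} out of range: {v}")
    return (opcode << 24) | (rd << 16) | (rs1 << 8) | rs2


def decode(word):
    """Unpack a 32-bit word into (opcode, rd, rs1, rs2) with shifts and masks."""
    return ((word >> 24) & 0xFF, (word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF)


def fetch(code: bytes, pc: int):
    """Fetch instruction number `pc`; fixed size means its byte offset is pc * 4."""
    (word,) = struct.unpack_from("<I", code, pc * 4)
    return decode(word)
```

Because every instruction is four bytes, fetch is a single multiply and load with no length prefix to parse.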
7. Processing Model: Pure Streaming vs. Batching
Recommendation: Support both streaming (pull-based iterators) and batching (materializing an entire table), with streaming as the default for dataflow.
Pros:
- Streaming enables low-latency, memory-efficient execution
- Batching is useful for host extraction and bulk operations
- Flexible for different workloads
Cons:
- Dual model adds implementation complexity
- Need to define clear API for switching modes
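
A minimal sketch of the dual model using Python generators as pull-based iterators. The operator names (`scan`, `filter_rows`, `materialize`) and the dict-per-row representation are hypothetical. Switching modes here is just a matter of whether the consumer pulls rows one at a time or drains the stream into a list.

```python
from typing import Callable, Iterable, Iterator, List


def scan(rows: Iterable[dict]) -> Iterator[dict]:
    """Streaming source: yields one row each time the consumer pulls."""
    for row in rows:
        yield row


def filter_rows(source: Iterator[dict], predicate: Callable[[dict], bool]) -> Iterator[dict]:
    """Streaming operator: pulls from upstream only as rows are requested."""
    for row in source:
        if predicate(row):
            yield row


def materialize(source: Iterator[dict]) -> List[dict]:
    """Batching: drain the remaining stream into a complete in-memory table."""
    return list(source)
```

A host that wants the first matching row calls `next()` on the pipeline and pays only for the rows pulled so far, while a host that wants the whole result calls `materialize`.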
8. Thread Safety: Single-Threaded vs. Concurrent Access
Recommendation: Start with a single-threaded VM core, but design FlatBuffer regions and arenas to allow concurrent read-only host access and future multi-threaded extensions.
Pros:
- Simpler initial implementation
- Finished FlatBuffers are read in place without mutation, so concurrent reads are safe as long as no writer modifies the buffer
- Can evolve to multi-threaded execution later
Cons:
- No parallel execution in initial version
- Must document thread safety guarantees for host integration
Next Steps
- Begin schema design using multiple .fbs files as outlined in the plan
- Implement UTF-8 validation utilities and consider string interning for identifiers
- Define update boundaries for ExecutionState
- Specify calling convention and register/stack mapping
- Document branch encoding and instruction format
- Design streaming/batching API for dataflow
- Clarify thread safety in host API documentation