Flash Assembler Architecture

Interactive reference for the S3 flow processor assembler. A classical three-stage compiler—parser, semantic analyzer, code generator—targeting the S3 DPU's instruction set across 8 hardware flavours.

Rust · S3 DPU · 3-Stage Compiler
Pipeline: Source (input) → Scanner (lexer) → Parser (syntax) → AST (syntax) → Analyzer (semantics) → Scheduler (semantics) → Encoder (codegen) → Artifact (output)
📄

Source Code

.flash files · entry point
Flash programs use a hardware-oriented assembly language with struct definitions, memory-mapped variables, conditional execution, and instruction handlers.
Language Constructs
Construct | Syntax | Purpose
import | import "file.flash" | Transitive module inclusion (deduplicated)
const | const X = 8'd42 | Named immediate with explicit bit width
var | var foo = meta[7:0] | Named memory-mapped variable
struct | struct S { f: 8; } | Bit-precise structure (no implicit padding)
union | union U { a: 8; b: 16; } | Overlapping fields at the same offset
enum | enum E { A, B=3 } | Named constants with auto-increment
cond | cond c = f & 0xF == 0x1 | 4-bit condition for conditional execution
handler | handler H { mov … } | Block of grouped instructions
flow | flow { mask … match … action H } | Packet classification → handler mapping
target | target S3 | Hardware target declaration
flavour | flavour PTA_ACTION | Hardware sub-configuration
Immediate Formats (Verilog-style)
  • 8'd42 — 8-bit decimal
  • 16'hFFFF — 16-bit hex
  • 4'b1010 — 4-bit binary
  • 0xFF — auto-sized hex
  • Underscores for readability: 0xFF_FF
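The immediate forms above can be read with a few string splits. The sketch below is a hypothetical `parse_sized_imm` helper, not the project's `parse_immediate()`; it assumes that auto-sized hex takes 4 bits per written digit and that only the `d`, `h`, and `b` radix letters listed above are accepted.

```rust
/// A sized immediate: `value` stored in `width` bits (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct SizedImm {
    pub width: u32,
    pub value: u64,
}

/// Parses `8'd42`, `16'hFFFF`, `4'b1010`, and auto-sized `0xFF`, ignoring `_`.
pub fn parse_sized_imm(text: &str) -> Result<SizedImm, String> {
    let cleaned: String = text.chars().filter(|&c| c != '_').collect();
    if let Some((size, rest)) = cleaned.split_once('\'') {
        let width: u32 = size.parse().map_err(|_| format!("bad width in {text}"))?;
        let mut chars = rest.chars();
        let radix = match chars.next() {
            Some('d') => 10,
            Some('h') => 16,
            Some('b') => 2,
            _ => return Err(format!("bad radix in {text}")),
        };
        let value = u64::from_str_radix(chars.as_str(), radix)
            .map_err(|_| format!("bad digits in {text}"))?;
        // Reject zero or oversized widths, and values needing more than `width` bits.
        if width == 0 || width > 64 || (width < 64 && value >> width != 0) {
            return Err(format!("{text} does not fit in {width} bits"));
        }
        Ok(SizedImm { width, value })
    } else if let Some(hex) = cleaned.strip_prefix("0x") {
        let value = u64::from_str_radix(hex, 16).map_err(|_| format!("bad hex in {text}"))?;
        // Assumption: auto-sized hex gets 4 bits per written digit.
        Ok(SizedImm { width: 4 * hex.len() as u32, value })
    } else {
        Err(format!("unsupported immediate {text}"))
    }
}
```

Checking the value against the declared width at parse time means an immediate like `4'd16` is rejected before any later stage sees it.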
Memory Selectors
msg · cs · meta · out

Three addressing modes: range meta[7:0], base+count cs[0,16], typed meta @ 0 : MyStruct.

📁

Workspace

mod.rs · multi-file management
Collects all source files via breadth-first import traversal, deduplicating by canonical path, to produce a sequence of CompilationUnits.
Key Structs
Struct | Fields | Role
Workspace | units, known_paths, report | Root container for all parsed files
CompilationUnit | path, rep_path, program, report | A single parsed file
WorkItem | canonical_path, rep_path | BFS queue entry
Import Resolution
  • Start with seed file path from CLI
  • Canonicalize path, parse into AST
  • Extract import declarations
  • Resolve relative to importing file's directory
  • Skip already-processed paths (dedup by canonical path)
  • BFS until all transitive imports collected

Diamond imports are handled correctly: if A imports both B and C, and each of them imports D, file D is parsed only once.
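A minimal sketch of the breadth-first collection, assuming an in-memory map stands in for "parse the file and extract its import declarations", and that lexical normalization (`src/../lib` becomes `lib`) stands in for canonicalization. `collect_units` and `normalize` are hypothetical names; the real `Workspace` uses `WorkItem` queue entries and on-disk canonical paths.

```rust
use std::collections::{HashMap, HashSet, VecDeque};
use std::path::{Component, Path, PathBuf};

/// Lexically resolves `.` and `..` (a stand-in for fs::canonicalize,
/// which requires the files to exist on disk).
fn normalize(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for comp in path.components() {
        match comp {
            Component::ParentDir => {
                out.pop();
            }
            Component::CurDir => {}
            other => out.push(other.as_os_str()),
        }
    }
    out
}

/// BFS over imports, visiting each normalized path exactly once.
fn collect_units(seed: &str, imports_of: &HashMap<PathBuf, Vec<String>>) -> Vec<PathBuf> {
    let mut order = Vec::new();
    let mut known: HashSet<PathBuf> = HashSet::new();
    let mut queue = VecDeque::new();
    let start = normalize(Path::new(seed));
    known.insert(start.clone());
    queue.push_back(start);
    while let Some(path) = queue.pop_front() {
        // Imports resolve relative to the importing file's directory.
        let dir = path.parent().map(Path::to_path_buf).unwrap_or_default();
        for imp in imports_of.get(&path).into_iter().flatten() {
            let resolved = normalize(&dir.join(imp));
            // `insert` returns false for an already-known path: that is the dedup.
            if known.insert(resolved.clone()) {
                queue.push_back(resolved);
            }
        }
        order.push(path);
    }
    order
}
```

Deduplicating at enqueue time (rather than at dequeue time) keeps each file out of the queue more than once, so D in the diamond case is neither queued nor parsed twice.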

🔍

Scanner

scanner.rs · tokenization
Character-by-character lexer producing typed tokens with source spans. Recognizes 16 keywords, multi-radix numbers, strings, comments, and operators.
Token Categories (47 types)

Keywords

cond, const, handler, if, next, import, struct, union, var, flavour, target, flow, mask, match, action, enum

Operators & Punctuation

& @ :: : , -- . = == # - ; / ' ( ) { } [ ]

Values

Word, Number, String, Glyph, Comment, Newline, EOF
Design Decisions
  • Newline-sensitive — newlines are tokens, significant in handler bodies
  • Multi-radix — 0x hex, 0o octal, 0b binary, decimal; underscore separators
  • SourceReader — tracks line/col with single-char lookahead
  • Error recovery — skips to newline or } on error
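The multi-radix rule can be illustrated with a hypothetical `scan_number` that uses `Peekable` for the single-character lookahead. It is a sketch only: it returns just the value, whereas the real scanner also records spans and token kinds through its `SourceReader`.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Scans one numeric literal: `0x` hex, `0o` octal, `0b` binary, or decimal,
/// skipping `_` separators. Stops at the first non-alphanumeric character.
fn scan_number(src: &mut Peekable<Chars<'_>>) -> Result<u64, String> {
    let mut radix = 10;
    let mut digits = String::new();
    if src.peek() == Some(&'0') {
        src.next();
        match src.peek().copied() {
            Some('x') => radix = 16,
            Some('o') => radix = 8,
            Some('b') => radix = 2,
            _ => digits.push('0'), // a plain decimal that starts with 0
        }
        if radix != 10 {
            src.next(); // consume the radix letter
        }
    }
    while let Some(&c) = src.peek() {
        if c == '_' {
            src.next(); // separators carry no value
        } else if c.is_ascii_alphanumeric() {
            digits.push(c);
            src.next();
        } else {
            break; // leave the delimiter for the next token
        }
    }
    u64::from_str_radix(&digits, radix).map_err(|e| format!("bad number {digits:?}: {e}"))
}
```

Collecting all alphanumerics before validating means an input like `0b12` produces one clear error instead of silently splitting into two tokens.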
🌳

Parser

parser.rs · recursive descent
Hand-written recursive descent parser with single-token lookahead, newline sensitivity, and error recovery. Transforms the token stream into a typed AST.
Parser State
Field | Type | Purpose
scanner | Scanner | Token source
ignore_newlines | bool | On in structs, off in handlers (where newlines are significant)
buffered_token | Option<Token> | Single-token lookahead
report | Vec<ReportItem> | Parse errors and warnings
Key Grammar Rules
Function | Grammar
parse_program() | program_item*
parse_handler_def() | handler IDENT { handler_item* }
parse_instruction() | OPCODE operand* (if condition)?
parse_struct_def() | struct IDENT? { member* }
parse_cond_expr() | base & mask == value
parse_immediate() | size'radix_digits
parse_flow_decl() | flow { mask M match V action A }
parse_enum_def() | enum IDENT { variant* }
Error Recovery
  • recover_to_next_program_item() — skip to next keyword
  • recover_past_current_instruction() — skip to newline or brace
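To show how single-token lookahead, newline sensitivity, and recovery fit together, here is a reduced sketch of `OPCODE operand* (if condition)?` over a toy token set (`Tok` with only words, newlines, and `}`). The `Parser` shape and method bodies are assumptions; only the grammar rule and the recovery behaviour (skip to newline or brace) come from the text above.

```rust
/// Toy token set for the sketch; the real scanner has many more kinds.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Word(String),
    Newline,
    RBrace,
}

#[derive(Debug, PartialEq)]
struct Instr {
    opcode: String,
    operands: Vec<String>,
    cond: Option<String>,
}

struct Parser {
    toks: Vec<Tok>,
    pos: usize,
    errors: Vec<String>,
}

impl Parser {
    /// Consumes the next token only if it is a word (one-token lookahead).
    fn word(&mut self) -> Option<String> {
        let w = match self.toks.get(self.pos) {
            Some(Tok::Word(w)) => w.clone(),
            _ => return None,
        };
        self.pos += 1;
        Some(w)
    }

    /// instruction := OPCODE operand* ("if" COND)?, ended by a newline or `}`.
    fn parse_instruction(&mut self) -> Option<Instr> {
        let opcode = self.word()?;
        let (mut operands, mut cond, mut saw_if) = (Vec::new(), None, false);
        while let Some(w) = self.word() {
            if w == "if" {
                saw_if = true;
                cond = self.word();
                break;
            }
            operands.push(w);
        }
        let at_end = matches!(self.toks.get(self.pos), None | Some(Tok::Newline | Tok::RBrace));
        if (saw_if && cond.is_none()) || !at_end {
            self.errors.push(format!("malformed `{opcode}` instruction"));
            self.recover_past_current_instruction();
            return None;
        }
        Some(Instr { opcode, operands, cond })
    }

    /// Skips past the newline, or stops before `}` so the block parser can close.
    fn recover_past_current_instruction(&mut self) {
        while self.pos < self.toks.len() {
            match self.toks[self.pos] {
                Tok::Newline => {
                    self.pos += 1;
                    return;
                }
                Tok::RBrace => return,
                Tok::Word(_) => self.pos += 1,
            }
        }
    }

    /// handler_item* up to `}`; blank lines are skipped.
    fn parse_body(&mut self) -> Vec<Instr> {
        let mut out = Vec::new();
        loop {
            match self.toks.get(self.pos).cloned() {
                Some(Tok::Newline) => self.pos += 1,
                None | Some(Tok::RBrace) => return out,
                Some(Tok::Word(_)) => {
                    if let Some(instr) = self.parse_instruction() {
                        out.push(instr);
                    }
                }
            }
        }
    }
}
```

A malformed line costs one error and one skipped instruction; parsing resumes on the next line, so later errors in the same handler are still reported.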
📐

Abstract Syntax Tree

ast.rs · 11 node types
Strongly-typed tree representing the full program. Every node carries a source Span for precise error messages. Supports 11 top-level declaration types.
ProgramItem Variants
Variant | Fields | Description
ConstDef | name, value | Named constant with explicit bit width
EnumDef | name, variants[] | Enum with auto-incrementing values
CondDef | name, value | Symbolic condition expression
ObjectDef | name, value | Memory-mapped variable
StructDef | name?, members[] | Named or anonymous struct
UnionDef | name?, members[] | Named or anonymous union
HandlerDef | name, items[] | Instruction handler block
FlowDecl | mask, match, action | Packet classification rule
ImportDecl | path | File import
TargetDecl | kind | Hardware target (S3)
FlavourDecl | flavour | Hardware sub-configuration
Value Types
Ident, Int, Immediate, Path, AddrExpr, CondExpr

Instruction = opcode + operands (Op::Addr, Op::Ref, Op::Imm) + optional condition. Handler items include GroupBreak (--) and Next boundaries.
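A sketch of how these nodes might look as Rust enums, showing three of the eleven `ProgramItem` variants. Field names are illustrative assumptions, and `group_sizes` is a hypothetical helper that counts instructions between `--` markers; `Next` is deliberately left out of the counting because its semantics are not spelled out here.

```rust
/// Byte-offset span (simplified; the real Span holds start/end Locations).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

/// Operand kinds named in the text; the fields inside them are illustrative.
#[derive(Debug, PartialEq)]
enum Op {
    Addr { mem: String, high: u32, low: u32 }, // e.g. meta[7:0]
    Ref(String),                               // a named object or constant
    Imm { width: u32, value: u64 },            // e.g. 8'd42
}

#[derive(Debug, PartialEq)]
struct Instruction {
    opcode: String,
    operands: Vec<Op>,
    condition: Option<String>, // name of a `cond`, if predicated
    span: Span,
}

#[derive(Debug, PartialEq)]
enum HandlerItem {
    Instr(Instruction),
    GroupBreak(Span), // `--`
    Next(Span),
}

/// Three of the eleven variants; the other eight follow the same pattern.
#[derive(Debug, PartialEq)]
enum ProgramItem {
    ConstDef { name: String, value: Op, span: Span },
    HandlerDef { name: String, items: Vec<HandlerItem>, span: Span },
    ImportDecl { path: String, span: Span },
}

/// Counts instructions between `--` markers (Next items are not counted).
fn group_sizes(items: &[HandlerItem]) -> Vec<usize> {
    let mut sizes = vec![0];
    for item in items {
        match item {
            HandlerItem::Instr(_) => *sizes.last_mut().unwrap() += 1,
            HandlerItem::GroupBreak(_) => sizes.push(0),
            HandlerItem::Next(_) => {}
        }
    }
    sizes
}
```

Putting a `Span` on every node is what lets later stages report errors at the exact source position without keeping the token stream around.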

🧠

Semantic Analyzer

model.rs · 7-stage pipeline
Builds the intermediate representation through 7 sequential analysis stages. Validates correctness at each step and builds a symbol table of constants, types, objects, conditions, and handlers.
Model Fields
Field | Type | Purpose
constants | HashMap<String, Definition<Imm>> | Named constants & enum values
layouts | HashMap<String, ObjectLayout> | Struct/union memory layouts
objects | HashMap<String, Definition<Object>> | Memory-mapped variables
conditions | HashMap<String, Definition<Condition>> | Symbolic conditions
handlers | HashMap<String, Definition<Handler>> | Compiled handlers with groups
flows | Vec<Flow> | Packet classification rules
target | &'static Target | Hardware configuration
flavour | Option<FlavourDecl> | Selected hardware flavour
7 Analysis Stages
# | Stage | Action
1 | process_constant_definitions | Evaluate const declarations, check duplicates
2 | process_enum_definitions | Expand enums to qualified constants (E::V)
3 | process_type_definitions | Compute recursive struct/union layouts, detect cycles
4 | process_object_definitions | Create objects with layout, check memory overlap
5 | process_condition_definitions | Validate: 4-bit aligned, 4-bit wide, meta-only
6 | process_handlers | Validate instructions, resolve operands, group & schedule
7 | process_flows | Validate handler refs, resolve mask/match values

Every model artifact is wrapped in Definition<T> adding name, source_file, and span for precise error messages.
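A sketch of the `Definition<T>` wrapper and a stage-1 style duplicate check. `check_constants`, the `u64` payload, and the first-definition-wins policy are assumptions for illustration; the message follows the `file:line:col: error: message` format described under Common Infrastructure.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    line: u32,
    col: u32,
}

/// Wraps any model artifact with where it came from, for diagnostics.
#[derive(Debug)]
struct Definition<T> {
    name: String,
    source_file: String,
    span: Span,
    value: T,
}

/// Hypothetical stage-1 pass: the first definition of a name wins, and every
/// later one is reported with a pointer back to the original.
fn check_constants(
    decls: Vec<Definition<u64>>,
    constants: &mut HashMap<String, Definition<u64>>,
    report: &mut Vec<String>,
) {
    for d in decls {
        if let Some(prev) = constants.get(&d.name) {
            report.push(format!(
                "{}:{}:{}: error: duplicate constant `{}` (first defined at {}:{}:{})",
                d.source_file, d.span.line, d.span.col, d.name,
                prev.source_file, prev.span.line, prev.span.col
            ));
        } else {
            constants.insert(d.name.clone(), d);
        }
    }
}
```

Because both definitions carry `source_file` and `span`, a duplicate spread across two imported files can still point at both locations.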

📋

Instruction Scheduler

model.rs · grouping & dependencies
Organizes instructions into execution groups respecting hardware constraints. Builds a dependency graph, performs list scheduling, and pads groups with NOPs.
Grouping Modes
Mode | Trigger | Mechanism
Manual | -- separators in source | User places group boundaries explicitly
Automatic | No separators | Compiler builds a dependency graph and schedules
Dependency Graph

Nodes = instructions. Edges = RAW (Read-After-Write) data dependencies detected by checking if source operands overlap destinations via bit-range intersection.
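The RAW test reduces to interval intersection within one memory region. In this sketch, `Mem`, `Chunk`, and `raw_edges` are simplified stand-ins with inclusive `low..=high` bit ranges; only read-after-write edges are built, matching the description above, so WAR and WAW hazards are not considered.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mem {
    Msg,
    Cs,
    Meta,
    Out,
}

/// Inclusive bit range in one memory region.
#[derive(Debug, Clone, Copy)]
struct Chunk {
    mem: Mem,
    low: u32,
    high: u32,
}

impl Chunk {
    /// Same region and overlapping bit ranges.
    fn overlaps(&self, other: &Chunk) -> bool {
        self.mem == other.mem && self.low <= other.high && other.low <= self.high
    }
}

struct Instr {
    dest: Chunk,
    srcs: Vec<Chunk>,
}

/// Edge (i, j): instruction j reads bits that an earlier instruction i writes.
fn raw_edges(instrs: &[Instr]) -> Vec<(usize, usize)> {
    let mut edges = Vec::new();
    for (j, consumer) in instrs.iter().enumerate() {
        for (i, producer) in instrs[..j].iter().enumerate() {
            if consumer.srcs.iter().any(|s| s.overlaps(&producer.dest)) {
                edges.push((i, j));
            }
        }
    }
    edges
}
```

Comparing at bit granularity matters here: a read of `meta[11:4]` depends on writes to both `meta[7:0]` and `meta[15:8]`, even though it matches neither exactly.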

Constraints
  • Max 8 ALU instructions per group
  • Max 7 custom instructions per group
  • Max 1 load/store per group
  • Load latency: 5 groups between load and consumer
  • NOP padding: groups padded to exactly 8 ALU slots
  • CMP destinations must be exactly 3 bits
  • All operand widths ≤ 64 bits
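A simplified list scheduler for the ALU-slot and NOP-padding rules. Assumptions: instructions arrive in topological order (source order works, since RAW edges always point forward), `latency` is the minimum group distance from producer to consumer, and the custom and load/store slot limits are omitted. `None` slots represent NOPs; `Node` and `schedule` are hypothetical names.

```rust
const MAX_ALU_PER_GROUP: usize = 8;

struct Node {
    deps: Vec<usize>, // indices of earlier producers
    latency: usize,   // groups a consumer must wait after this node
}

/// Places each instruction in the earliest group that satisfies its producers'
/// latencies and still has a free ALU slot, then pads every group with NOPs.
fn schedule(nodes: &[Node]) -> Vec<Vec<Option<usize>>> {
    let mut group_of = vec![0usize; nodes.len()];
    let mut groups: Vec<Vec<Option<usize>>> = Vec::new();
    for (i, n) in nodes.iter().enumerate() {
        // Earliest legal group given the dependencies.
        let mut g = n.deps.iter().map(|&p| group_of[p] + nodes[p].latency).max().unwrap_or(0);
        // Skip groups whose ALU slots are already full.
        while groups.get(g).map_or(false, |grp| grp.len() == MAX_ALU_PER_GROUP) {
            g += 1;
        }
        while groups.len() <= g {
            groups.push(Vec::new());
        }
        groups[g].push(Some(i));
        group_of[i] = g;
    }
    for grp in &mut groups {
        grp.resize(MAX_ALU_PER_GROUP, None); // None = NOP
    }
    groups
}
```

A load could be modeled by giving it a larger `latency`, though how the "5 groups between load and consumer" rule maps onto that number is left open here.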
💾

Memory Model

memory.rs · types, chunks, overlap
Defines four memory regions, enforces alignment constraints, and uses an IntervalMap for efficient overlap detection between objects.
Memory Regions
Type | Selector | Align | Purpose
Message | msg | 8-bit | Input packet buffer
CS | cs | 8-bit | Context store
Meta | meta | 4-bit | HW + SW metadata
Output | out | 4-bit | Output vector
Key Structures
Struct | Purpose
Chunk | Contiguous bit range: mem_type + low_bit + high_bit
Memory | Full descriptor: type + bit_size + access_align
IntervalMap<T> | Sorted intervals with binary-search overlap detection
ObjectLayout | Recursive struct layout with drill-down into nested fields
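One way to get binary-search overlap detection from sorted intervals, assuming inclusive bit ranges that never overlap once inserted. The `IntervalMap` below is a hypothetical sketch built on `slice::partition_point`, not the project's API.

```rust
/// Sorted, non-overlapping inclusive intervals [low, high] with a payload.
struct IntervalMap<T> {
    entries: Vec<(u32, u32, T)>, // kept sorted by `low`
}

impl<T> IntervalMap<T> {
    fn new() -> Self {
        IntervalMap { entries: Vec::new() }
    }

    /// Index of an interval overlapping [low, high], if any. Since stored
    /// intervals never overlap, they are also sorted by `high`, so only the last
    /// entry starting at or before `high` can reach back to `low`.
    fn overlap_index(&self, low: u32, high: u32) -> Option<usize> {
        let idx = self.entries.partition_point(|e| e.0 <= high);
        if idx > 0 && self.entries[idx - 1].1 >= low {
            Some(idx - 1)
        } else {
            None
        }
    }

    /// Inserts [low, high], or returns the payload of an interval it would overlap.
    fn insert(&mut self, low: u32, high: u32, value: T) -> Result<(), &T> {
        if let Some(i) = self.overlap_index(low, high) {
            return Err(&self.entries[i].2);
        }
        let pos = self.entries.partition_point(|e| e.0 < low);
        self.entries.insert(pos, (low, high, value));
        Ok(())
    }
}
```

The lookup is O(log n); insertion into a `Vec` is O(n) in the worst case, which is usually acceptable for the number of objects in a program.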
📝

Code IR

code.rs · instruction model
Intermediate representation: Handler → InstructionGroup → Instruction → Arg, ready for binary encoding by the code generator.
Hierarchy
Handler
├── name: String
└── instr_groups: Vec<InstructionGroup>
    ├── alu_instrs: Vec<Definition<Instruction>>
    ├── custom_instrs: Vec<…>
    └── load_instrs: Vec<…>
Instruction
├── operation: &'static target::Instruction
├── args: Vec<Definition<Arg>>
├── mem_refs: Vec<Definition<Arg>>
└── condition: Option<Condition>
Operand Types
Variant | Content | Example
Arg::Addr(Chunk) | Memory range | meta[7:0]
Arg::Imm(Imm) | Sized immediate | 8'd42

Condition { base: Chunk, mask: u8, match_value: u8 } — 4-bit condition from meta memory for conditional execution.
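A sketch of how such a condition could be evaluated against meta memory. Assumptions: bit 0 is the least significant bit of byte 0, and `base_bit` (a hypothetical stand-in for the `Chunk`) is 4-bit aligned, as stage 5 requires.

```rust
/// Simplified Condition: `base_bit` marks the low bit of a 4-bit meta field.
struct Condition {
    base_bit: u32,
    mask: u8,
    match_value: u8,
}

impl Condition {
    /// True when (selected meta nibble & mask) == match_value.
    fn holds(&self, meta: &[u8]) -> bool {
        assert!(self.base_bit % 4 == 0, "condition must be 4-bit aligned");
        let byte = meta[(self.base_bit / 8) as usize];
        // With LSB-first numbering, a nibble at bit 4 (mod 8) is the high half.
        let nibble = if self.base_bit % 8 == 0 { byte & 0x0F } else { byte >> 4 };
        (nibble & self.mask) == self.match_value
    }
}
```

The 4-bit alignment rule is what makes this cheap: a condition never straddles a byte, so evaluation is one load, one shift or mask, and one compare.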

⚙️

S3 Encoder

encoder.rs · binary encoding
Transforms the IR into binary S3 machine code. Encodes each instruction group as a 2-byte descriptor header followed by packed ALU and load/store instruction bytes.
Encoding Pipeline
  • encode(handler) — iterate instruction groups
  • encode_instr_group() — write 2B descriptor + instruction bytes
  • alu_instruction_encoding() → 14-byte encoding
  • load/store_instruction_encoding() → 10-byte encoding
  • condition_encoding() → 14-bit condition
  • process_flows() → TCAM entries
  • write_combined_json() → JSON output with metadata
ALU Instruction (14 bytes = 112 bits)
Type (4b) | Func (6b) | Src0 (22b) | Src1 (22b) | Dest (22b) | Src2 (22b) | Cond (14b)
Load / Store (10 bytes = 80 bits)
Type (4b) | Rsvd (6b) | Base (16b) | Offset (16b) | Data (16b) | Cond (14b) | Pad (8b)
Composer Utility

Composer::paste(bits, count, low_bit) — places arbitrary bit-width fields at arbitrary offsets into a byte array, handling cross-byte alignment. The core primitive for all encoding.
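A minimal bit-at-a-time version of the paste primitive, assuming bit 0 is the least significant bit of byte 0. `BitComposer` is a hypothetical name; the real `Composer` may order bits differently and pack whole bytes at a time.

```rust
/// Hypothetical byte buffer that fields are pasted into.
struct BitComposer {
    bytes: Vec<u8>,
}

impl BitComposer {
    fn new(len: usize) -> Self {
        BitComposer { bytes: vec![0; len] }
    }

    /// Writes the low `count` bits of `bits` at absolute bit offset `low_bit`,
    /// overwriting what was there and crossing byte boundaries as needed.
    fn paste(&mut self, bits: u64, count: u32, low_bit: u32) {
        assert!(count <= 64, "at most 64 bits per paste");
        assert!((low_bit + count) as usize <= 8 * self.bytes.len(), "field out of range");
        for i in 0..count {
            let pos = low_bit + i;
            let byte = &mut self.bytes[(pos / 8) as usize];
            let mask = 1u8 << (pos % 8);
            if (bits >> i) & 1 == 1 {
                *byte |= mask;
            } else {
                *byte &= !mask;
            }
        }
    }
}
```

With this one primitive, each instruction format becomes a list of (value, width, offset) triples; for example, a 14-byte ALU buffer takes a 4-bit Type at one offset and a 22-bit operand at another without any manual shifting across bytes.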

🔢

Operand Encoding

encoder.rs · SrcOpEncoding
Each ALU instruction has 3 source operand slots (22 bits each). A slot encodes either a memory address or an immediate; immediates wider than 12 bits span multiple slots.
Operand Slot (22 bits)
DataType (4b) | Size (6b) | Addr/Val (12b)
Data Type Codes
Code | Type
0b0000 | Message
0b0001 | Context Store
0b0010 | Metadata
0b0100 | Immediate
0b0111 | Output
0b1000 | Sequence (extended immediate)
Immediate Extension
  • ≤12 bits — 1 slot
  • 13–24 bits — 2 slots (src0 = low 12 + type, src1 = high 12)
  • 25–36 bits — 3 slots (all three used)
  • >36 bits — error, exceeds capacity
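The slot counts above are ceil(width / 12), capped at three slots. Below is a sketch that splits an immediate low bits first (src0 receives the low 12 bits), assuming the 4-bit type and 6-bit size fields are filled in separately; `imm_slots` is a hypothetical helper.

```rust
/// Payload bits available in one 22-bit operand slot (the other 10 are type + size).
const SLOT_BITS: u32 = 12;

/// Splits a `width`-bit immediate into 12-bit slot payloads, low bits first:
/// index 0 goes to src0, 1 to src1, 2 to src2.
fn imm_slots(value: u64, width: u32) -> Result<Vec<u16>, String> {
    if width == 0 || width > 3 * SLOT_BITS {
        return Err(format!("cannot encode a {width}-bit immediate in 1 to 3 slots"));
    }
    let slots = (width + SLOT_BITS - 1) / SLOT_BITS; // ceil(width / 12)
    Ok((0..slots)
        .map(|k| ((value >> (k * SLOT_BITS)) & 0xFFF) as u16)
        .collect())
}
```

For example, a 24-bit `0xABCDEF` yields `[0xDEF, 0xABC]`: src0 carries the low 12 bits and src1 the high 12, as in the 13 to 24 bit case above.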
📦

Artifact & Output

artifact.rs · compiled output
Artifact is a dictionary of compiled handler bytecodes. The encoder also produces a JSON file with instruction memory, TCAM configs, and metadata.
Artifact Structure
Artifact
└── handlers: HashMap<String, Vec<u8>>
    ├── "handler_A" → [0x80, 0x01, …]
    └── "handler_B" → [0x80, 0x03, …]
JSON Output
{
  "metadata": { "version", "timestamp", "signature" },
  "target": {
    "flavour": {
      "instructions": [{ flow_name, index, mem_group, alu_instructions, ls_instruction }],
      "tcam": [{ index, mask, match, num_inst_grp, inst_mem_addr }]
    }
  }
}
TCAM Entry (CSR)

Packs to 45-byte (353-bit) register: valid[352] | mask[351:184] | match[183:16] | num_groups[15:8] | mem_addr[7:0].
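The field widths add up: 1 + 168 + 168 + 8 + 8 = 353 bits, which fits in 45 bytes (360 bits). Because every field except `valid` starts on a byte boundary, packing reduces to byte copies. The sketch assumes register bit k lives in byte k / 8 (least significant byte first); `pack_tcam` and the `Key` alias are hypothetical names.

```rust
/// 168-bit mask or match key, least significant byte first (an assumption).
type Key = [u8; 21];

/// Packs one TCAM entry into 45 bytes; register bit k is bit k % 8 of byte k / 8.
fn pack_tcam(valid: bool, mask: &Key, key_match: &Key, num_groups: u8, mem_addr: u8) -> [u8; 45] {
    let mut reg = [0u8; 45];
    reg[0] = mem_addr;                     // bits [7:0]
    reg[1] = num_groups;                   // bits [15:8]
    reg[2..23].copy_from_slice(key_match); // bits [183:16]
    reg[23..44].copy_from_slice(mask);     // bits [351:184]
    reg[44] = valid as u8;                 // bit 352; bits 353..359 stay zero
    reg
}
```

If the hardware instead expects the most significant byte first, the same layout applies with the output array reversed.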

🔧

Encoding Constants

constants.rs · bit fields
All magic numbers for S3 encoding: instruction type codes, field widths, bit positions, group descriptor flags, and ALU function codes.
Instruction Types
Constant | Value | Meaning
INSTR_TYPE_ALU | 0b0000 | ALU instruction
INSTR_TYPE_LOAD | 0b0010 | Memory load
INSTR_TYPE_STORE | 0b0011 | Memory store
INSTR_TYPE_CUSTOM | 0b0100 | Custom instruction
Group Descriptor (16 bits)
V (1) | L (1) | H (1) | Rsvd (5) | ALU# (4) | Cust# (3) | LS (1)

V = Valid, L = Last group, H = Halt. ALU# = count, Cust# = custom count, LS = load/store present.
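A sketch of packing the descriptor, assuming the fields occupy bits 15 down to 0 in the order drawn above (V in bit 15, LS in bit 0); `group_descriptor` is a hypothetical name. The 4-bit ALU# and 3-bit Cust# fields are each the narrowest width that holds the per-group limits of 8 and 7.

```rust
/// Builds the 16-bit group descriptor under the MSB-first layout assumption:
/// V | L | H | Rsvd(5) | ALU#(4) | Cust#(3) | LS.
fn group_descriptor(last: bool, halt: bool, alu: u16, custom: u16, has_ls: bool) -> u16 {
    assert!(alu <= 8 && custom <= 7, "exceeds per-group limits");
    (1u16 << 15)                  // V: group is valid
        | (u16::from(last) << 14) // L: last group of the handler
        | (u16::from(halt) << 13) // H: halt
        | (alu << 4)              // ALU#: bits 7..4 (bits 12..8 reserved, zero)
        | (custom << 1)           // Cust#: bits 3..1
        | u16::from(has_ls)       // LS: bit 0
}
```

For example, a final group with a full 8-slot ALU bundle and one load/store encodes as 0xC081 under this layout.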

Key Sizes
Constant | Value
ALU_INSTRUCTION_SIZE | 14 bytes
LS_INSTRUCTION_SIZE | 10 bytes
ALU_OPERAND_FIELD_WIDTH | 22 bits
LOAD_LATENCY_GROUPS | 5 groups
🎯

S3 Target

target.rs · hardware ISA
Defines the S3 flow processor's instruction set (19 ALU + 2 LS), memory layout, constraints, and 8 hardware flavours from JSON configuration.
Instruction Set (21 instructions)
Category | Opcodes
ALU (data) | mov, movm, add, adds, sub, subs, nop
ALU (logic) | or, and, xor, not
ALU (shift) | sll, srl
ALU (compare) | cmp, cmpm, rngcmp, wrngcmp, drngcmp, sumcmp
Load / Store | load, store
Hardware Constraints
Constraint | Value
Max ALU / group | 8
Max custom / group | 7
Max load-store / group | 1
Max operand width | 64 bits
Condition width | 4 bits
Issue slots | 8
8 Flavours
Flavour | Cores | TCAM | Instr Mem | Key Feature
SGE_KEYGEN | 4 | 32 | 128 | 1 KB HW metadata
SGE_ACTION | 4 | 96 | 256 | 848-bit output
PTA_KEYGEN | 4 | 16 | 96 | Key-as-output
PTA_ACTION | 4 | 32 | 128 | Data memory access
PTA_ACU_KEYGEN | 2 | 16 | 96 | Reduced cores
PTA_ACU_ACTION | 2 | 24 | 128 | 720-bit output
RNIC_KEYGEN | 1 | 32 | 32 | Custom instructions
RNIC_ACTION | 1 | 64 | 64 | TCAM index output
🧰

Common Infrastructure

common.rs · spans, errors
Foundation types shared across all compiler phases: source locations, spans, diagnostic notes, and the three-level reporting system.
Core Types
Type | Fields | Purpose
Location | line, col (u32) | 1-based source position
Span | start, end | Range of source characters
Spanned | trait | Anything with a source location
Note | span, message | Diagnostic with position
ReportItem | Error / Warning / Info | Classified diagnostic
Either<T,U> | Left, Right | Sum type

Each stage collects ReportItems. After each stage, the report is printed. Any Error halts compilation. Format: file:line:col: error: message.
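A sketch of the reporting types and the halt-on-error rule. The variant layout (file name and message stored inline) is a simplifying assumption, as is the use of the same `file:line:col: level: message` shape for warnings and infos; only the error format comes from the text.

```rust
use std::fmt;

#[derive(Debug, Clone, Copy)]
struct Location {
    line: u32, // 1-based
    col: u32,  // 1-based
}

#[derive(Debug, Clone, Copy)]
struct Span {
    start: Location,
    end: Location,
}

/// Three-level diagnostic (simplified layout).
enum ReportItem {
    Error { file: String, span: Span, msg: String },
    Warning { file: String, span: Span, msg: String },
    Info { file: String, span: Span, msg: String },
}

impl fmt::Display for ReportItem {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let (level, file, span, msg) = match self {
            ReportItem::Error { file, span, msg } => ("error", file, span, msg),
            ReportItem::Warning { file, span, msg } => ("warning", file, span, msg),
            ReportItem::Info { file, span, msg } => ("info", file, span, msg),
        };
        // Report the start of the span, in file:line:col: level: message form.
        write!(f, "{}:{}:{}: {}: {}", file, span.start.line, span.start.col, level, msg)
    }
}

/// A stage's report halts compilation if it contains any Error.
fn has_errors(report: &[ReportItem]) -> bool {
    report.iter().any(|r| matches!(r, ReportItem::Error { .. }))
}
```

Using the conventional `file:line:col:` prefix lets editors and terminals turn each diagnostic into a clickable jump to the source.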

🚀

Driver

main.rs + lib.rs · orchestration
Entry point creates a Workspace from the CLI argument. lib.rs orchestrates the full pipeline: analyze() builds the Model, then encode() produces the Artifact.
Execution Flow
main.rs
  1. Parse CLI args → filename
  2. Workspace::new(filename) → parse all files
  3. Check workspace errors → exit(1) if any
  4. process_workspace(workspace)
lib.rs: process_workspace()
  1. Print per-unit parse reports
  2. analyze(workspace) → Model (7 stages)
  3. encode(model) → Artifact + JSON
lib.rs: analyze()
  Model::build_model(), then per unit:
  → process_constants → process_enums → process_types → process_objects → process_conditions → process_handlers → process_flows
lib.rs: encode()
  S3Encoder::new() → encode each handler → process_flows → write_combined_json

The run_analysis_stage! macro wraps each stage: iterates units, calls the stage, checks for errors, prints reports, aborts early if needed.
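A hypothetical `macro_rules!` version of that wrapper. `Model`, `Unit`, the `Vec<String>` reports, and the `check_*` stage functions are stand-ins; returning `Err` with the stage name is one possible way to abort early, not necessarily what the real macro does.

```rust
/// Hypothetical stand-ins for the real Model and CompilationUnit.
#[derive(Default)]
struct Model {
    constants: Vec<String>,
}

struct Unit {
    path: String,
    consts: Vec<String>,
}

/// Runs one stage over every unit, prints each unit's report, and returns
/// early from the enclosing function if any unit reported an error.
macro_rules! run_analysis_stage {
    ($model:ident, $units:ident, $stage:ident) => {{
        let mut failed = false;
        for unit in $units.iter() {
            let report: Vec<String> = $stage(&mut $model, unit);
            for line in &report {
                eprintln!("{}", line);
            }
            failed |= report.iter().any(|l| l.contains(": error: "));
        }
        if failed {
            // Later stages never run on a model that failed this one.
            return Err(stringify!($stage).to_string());
        }
    }};
}

fn check_constants(model: &mut Model, unit: &Unit) -> Vec<String> {
    let mut report = Vec::new();
    for name in &unit.consts {
        if model.constants.contains(name) {
            report.push(format!("{}: error: duplicate constant `{}`", unit.path, name));
        } else {
            model.constants.push(name.clone());
        }
    }
    report
}

fn check_handlers(_model: &mut Model, _unit: &Unit) -> Vec<String> {
    Vec::new() // placeholder stage that always succeeds
}

fn analyze(units: &[Unit]) -> Result<Model, String> {
    let mut model = Model::default();
    run_analysis_stage!(model, units, check_constants);
    run_analysis_stage!(model, units, check_handlers);
    Ok(model)
}
```

Running each stage across all units before checking for errors means every unit's diagnostics for that stage are printed together, rather than stopping at the first failing file.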

🔨

Tooling & Build

scripts/ · Makefile · tests/
Supporting ecosystem: C/C++ header migration scripts, Makefile for FunSDK deployment, integration tests, and hardware configuration files.
Migration Scripts
Script | Purpose
c_header_to_flash.py | C/C++ structs → Flash (auto union detection, nested deps)
flash_to_c_header.py | Flash structs → C/C++ headers
Build (Makefile)
Target | Action
make | cargo build --release
make test | cargo test
make install | Deploy binary + config to $SDKDIR/bin/
make clean | cargo clean
Integration Tests
  • Diamond import — A→B→D, A→C→D: D parsed once
  • Broken import — missing file error handling
  • Cross-directory — relative path resolution with ../

configs/s3_flavours.json defines all 8 flavour specs: memory widths, TCAM depth, instruction memory, core counts.