NerdSembler
What is an assembler?
An assembler is a tool that translates source files written in assembly language into machine code. Assembly language is close to machine code: each assembly instruction maps essentially one-to-one to a machine instruction.
Why use an assembler?
Assemblers make programming easier by providing features such as labels for branch and jump instructions. Because the assembler resolves each label to an address at assembly time, the code needs no hard-coded jump targets, which would shift whenever an instruction is inserted.
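To make the benefit of labels concrete, here is a minimal Python sketch of two-pass label resolution. It is not NerdSembler's implementation: the `resolve_labels` function, the `name:` label syntax, and the assumption that every instruction occupies exactly one address slot are all made up for illustration.

```python
# Toy two-pass label resolution (hypothetical format, not NerdSembler's).
# Assumption: every instruction occupies exactly one address slot.

def resolve_labels(lines):
    """Return (address, mnemonic, operand) tuples with label operands replaced by addresses."""
    labels = {}
    instructions = []
    # Pass 1: record the address that each label refers to.
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            labels[line[:-1]] = len(instructions)
        else:
            instructions.append(line.split())
    # Pass 2: replace operands that name a label with the recorded address.
    resolved = []
    for address, parts in enumerate(instructions):
        mnemonic = parts[0]
        operand = parts[1] if len(parts) > 1 else None
        if operand in labels:
            operand = labels[operand]
        resolved.append((address, mnemonic, operand))
    return resolved
```

With the input `["start:", "jmp done", "nop", "done:", "rts"]`, the `jmp` operand resolves to address 2. Inserting a second `nop` before `done:` changes it to 3 without any edit to the `jmp` line, which is exactly the bookkeeping a hard-coded target would force on the programmer.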
How does NerdSembler work?
NerdSembler assembles code in phases. Each phase takes data in one form and runs it through an algorithm that converts it into another form. After several phases, the source code written by the developer has been converted all the way to machine code. The phases and the data passed between them are:
- [Data] Assembly Language (source code written by the developer)
- [Algorithm] Lexer
  - Made up of a collection of Tokenizers that convert the characters in the source code into tokens.
  - Also handles the .include directive to bring in additional source code files for lexing.
- [Data] Token Stream (a collection of tokens identified in the code)
  - Represents the source code as a collection of tokens, each often spanning several characters.
  - Can be used to generate syntax-highlighted output of the source code as seen by the Lexer.
- [Algorithm] Parser
  - Walks through the tokens in the token stream and validates the syntax of the code using a recursive descent approach.
  - Creates an abstract syntax tree and keeps track of syntax errors.
- [Data] Abstract Syntax Tree (AST)
  - Represents the structure of the entire program in an abstract way.
- [Algorithm] Emitter
  - Uses the AST to generate name tables for labels, constants, and variables.
  - Calculates mathematical values, performs indexing, and identifies label addresses.
  - Combines the AST with Names and Values to produce machine code.
  - Aligns the code onto chips and the overlapping banks they represent.
- [Data] Machine Code
  - Made up of multiple binary output files that represent the data put on the ROM chips that go into the cartridge.
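The phased design above can be sketched in a few dozen lines of Python. This is an illustrative toy, not NerdSembler's code: the two-statement language (`NAME = expr` and `byte expr`), the token kinds, and every function and class name are assumptions made for the example. It also skips the .include directive, labels, and chip/bank alignment, resolves names in a single pass (so a name must be defined before it is used), and returns one byte string instead of multiple output files.

```python
import re
from collections import namedtuple

Token = namedtuple("Token", "kind text")

TOKENIZERS = [  # the lexer's tokenizers: (kind, regex) pairs, tried in order
    ("SPACE", re.compile(r"[ \t]+")),
    ("NEWLINE", re.compile(r"\n")),
    ("NUMBER", re.compile(r"[0-9]+")),
    ("IDENT", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("OP", re.compile(r"[+\-=]")),
]

def lex(source):
    tokens, pos = [], 0
    while pos < len(source):
        for kind, pattern in TOKENIZERS:
            match = pattern.match(source, pos)
            if match:
                if kind != "SPACE":  # whitespace separates tokens but is not kept
                    tokens.append(Token(kind, match.group()))
                pos = match.end()
                break
        else:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos, self.errors = tokens, 0, []

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else Token("EOF", "")

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse_program(self):
        statements = []
        while self.peek().kind != "EOF":
            if self.peek().kind == "NEWLINE":
                self.take()
                continue
            try:
                statements.append(self.parse_statement())
            except SyntaxError as error:
                self.errors.append(str(error))
                while self.peek().kind not in ("NEWLINE", "EOF"):
                    self.take()  # skip the rest of the bad line and keep going
        return statements

    def parse_statement(self):  # statement := 'byte' expr | IDENT '=' expr
        first = self.take()
        if first.text == "byte":
            return ("byte", self.parse_expr())
        if first.kind != "IDENT" or self.peek().text != "=":
            raise SyntaxError(f"expected 'byte' or 'NAME = expr' near {first.text!r}")
        self.take()  # consume '='
        return ("const", first.text, self.parse_expr())

    def parse_expr(self):  # expr := operand (('+' | '-') operand)*
        node = self.parse_operand()
        while self.peek().text in ("+", "-"):
            node = (self.take().text, node, self.parse_operand())
        return node

    def parse_operand(self):  # operand := NUMBER | IDENT
        token = self.take()
        if token.kind == "NUMBER":
            return ("num", int(token.text))
        if token.kind == "IDENT":
            return ("name", token.text)
        raise SyntaxError(f"unexpected token {token.text!r}")

def evaluate(node, names):
    if node[0] in ("num", "name"):  # leaf: a literal, or a lookup in the name table
        return node[1] if node[0] == "num" else names[node[1]]
    left, right = evaluate(node[1], names), evaluate(node[2], names)
    return left + right if node[0] == "+" else left - right

def emit(statements):
    names, output = {}, bytearray()
    for statement in statements:
        if statement[0] == "const":  # record the value in the name table
            names[statement[1]] = evaluate(statement[2], names)
        else:  # "byte": append the low 8 bits of the expression
            output.append(evaluate(statement[1], names) & 0xFF)
    return bytes(output)

def assemble(source):
    parser = Parser(lex(source))
    statements = parser.parse_program()
    if parser.errors:
        raise SyntaxError("; ".join(parser.errors))
    return emit(statements)
```

For example, `assemble("base = 16\nbyte base + 2\n")` returns the single byte `0x12`. Each stage only consumes the output of the one before it, so the token list from `lex` could also drive a syntax highlighter, and the parser can report several syntax errors in one run because it skips to the next line after each failure.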