Making a Programming Language
Est. Read Time: ~5 MinutesPosted: March 26, 2022
In the last year or so, I’ve been going through a few books on compilers and programming languages. I highly recommend checking these out:
So, after going through those books. I’ve built up a simple base codebase to expand upon for my own language. I’ve been building it in Rust. I had an early version in Zig, but was running into (user) issues with the Zig language that I didn’t want to fight. The compilation+tooling in Rust makes it pretty clear if your code will work as you expect (at least, as well as you programmed it). Where in Zig I was having values disappear on me. This is likely just me not understanding Zig, but I felt I’d rather focus on the project rather than learning Zig. So I jumped back to Rust.
Another reason I enjoy Rust as my host language is because Rust does enums really well. My entire Abstract Syntax Tree (AST) is enums. I’ll give an example. I have an infix expression: 10*2
. We read that string as some tokens: Number, Asterisk, Number. Then convert those tokens into an AST to represent the expression. This is converting the tokens into something a little bit more contextually meaningful. The AST in this case looks like this:
Expression::Infix(
Infix::Multiply,
Box::new(Expression::Literal(Literal::Number(10.0))),
Box::new(Expression::Literal(Literal::Number(2.0))),
)
Then we read this expression and would convert it into byte code. Which will look something like this:
Constant 0
Constant 1
Add
And the virtual machine understands to do an add operation on those two constants. It’s fun. It’s a challenge to get all the different parts to work together. Definitely recommend all the books above even if you just do the projects in the book and never touch it again. There’s so many cool ideas to gain from working through a project like this.
For my toy language I’ve been playing with a few ideas. Let’s take a look at this bit of code.
a <- 1 + 2 - 4 * 5 / 2;
a
It does some arithmetic and then stores the value into a
and then a is placed on the next line to output the value. But what is the value in this case? Many languages try to respect typical expectations of PEMDAS. This is good language design to lean on knowledge and intuitions the users already have. I’m throwing that out the window. My first experiment was to just simply do left to right evaluation for arithmetic. Is this a good idea? Probably not great for people learning the language. Probably would introduce a bug. But in many programming languages as is, there’s weird ambiguous points and you end up with a bunch of parenthesis to override precedence anyways. So our code just goes left to right and produces a -2.5.
I’ve been learning a bit about array programming languages such as J, K, APL or BQN. They’re rather fascinating. I think the piece of code that got me interested was doing something like this.
+/1 2 3 4 5 6 7 8 9 10
55
We get an array [ 1,2,3,4... ]
and then there’s a +/
which means to do an add reduce
on the array. For reference to do something similar in javascript would be something like this.
const result = [1,2,3,4,5,6,7,8,9,10].reduce(
(accumulator, currentValue) => accumulator + currentValue,
initialValue
);
I was much more familiar with the javascript version. In fact, when learning reduce in javascript I really had a hard time understanding what it did. But for some reason the APL version made more sense to me. It has an intuitiveness to it. Because of that I wanted to try and explore ideas in APL languages in my own language. Here’s how an add reduce
looks in my language.
a <- [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]\+
a // outputs 55
But there’s more fun that’s hidden in this. What about nested arrays?
a <- [ [1, 2, 3], [4, 5, 6], [7, 8, 9]]\+
a // outputs [12, 15, 18]
adding arrays in the language behaves like this:
[1, 2] + [3, 4] => [4, 6]
In fact this is how add, subtract, multiply, divide all work between two arrays.
Neat. APL also tends to use unicode symbols such as ≢
or ⍨
in the language itself. It requires either an editor setup or a special keyboard. It’s super annoying, but I keep going down the APL rabbit hole because it’s proving interesting so far. And I decided to implement both ≢
and ⍨
.
≢
is called tally. Could be looked at as a sort of length function. An array [1,2,3]
has 3 elements so [1,2,3]≢
will return 3. Was rather simple to implement this one. And it allows us to do really cool things like this:
a <- [1, 3, 5, 7, 9]
a\+/a≢
Now we’re getting weird. What does this do? a\+
is an add reduce like mentioned above. Cool, then we divide /
and then do a≢
which is the length of a
. So sum up all the values, divide them by the length of a
. Neat, we can calculate the average (our output is 5).
I’ll probably do more write ups on this as I go along, but it’s been a fun side project for me to play with. Once I have a handful of functions that can move around numbers in a way that I’m happy with, then I might look into implementing something to render stuff. I spent a bit of time looking into how people implement gameboy emulators. Which happens to look a lot like writing a programming language VM. I want to see what kind of concepts can I smash together in a fun way.