Tushar Raturi

May 12 · 7 min read

How to create a Programming Language: A wizard's guide to spell creation.

PART 1

The art of writing code is considered mundane by some and extraordinary by others. Then what about those who create the languages in which code is written? They are wizards. These "wiz-ards" (wise men and women) control how most programmers the world over instruct their machines to do magic. Their craft is programming language creation.

Then how do these wizards create the "magic spells" that every "magician" in the world uses? Well, they use other (previously created) spells to create new ones. As simple as that. In this document and the ones that follow, we will get an overview of how to use one language to create another.

Even the first programmer, Ada Lovelace, had a bug in her program (the world's first program). That doesn't mean she was not a proficient programmer, of course not! As you know, bugs and glitches are par for the course in the digital world. However, if these bugs/glitches are not caught in time, who knows what could happen? Lovelace had written the program for a machine called the Analytical Engine (the target platform). The Analytical Engine was a theoretical machine designed by Charles Babbage (after his previous prototype of the Difference Engine). The program was an algorithm to output Bernoulli numbers. You might ask what language she used to write the program.

Well, the language used was a combination of English and mathematics. It was devised by Lovelace herself. Now, don't be surprised! This combination of "English + Mathematics" is still the main form in which modern programming languages are designed! However, we cannot call Lovelace's "English + Mathematics" a programming language. Now don't get me wrong! It is a language, definitely, but it has no formal definition provided by the designer (Lovelace). She just wrote the program in a format that could easily have been fed to the Analytical Engine (the target platform), had the hardware ever been built.

Now other "wizards" created new techniques like punch cards. Some even used lambda calculus.

Then in comes Konrad Zuse, with the first high-level language: Plankalkül (awesome name, I know). In the combination of "Mathematics + English", the mathematics is strong with this one! (Check it out in your own time.)

Requirements for Language Creation

First, we need target hardware. We'll design a theoretical machine, use English as the base language, and formalize a new language for this new theoretical machine!

Let us call the new machine: Tutorial Engine.

Now we will design the tutorial engine together.

Tutorial Engine Design

ARCHITECTURE:-

Memory Unit (MU) - <Let us add a memory unit to the tutorial engine. It will be responsible for remembering all the numbers that we want it to remember!> The tutorial engine has a Memory Unit, MU, which contains 1000 memory slots (we will count (address) these slots from 0 to 999). Each memory slot of the machine can remember (and forget) a list of one or more numbers of up to 3 digits [-999 to 999]. As an example, let us say that memory slot 112 holds the number (84). We can then call 84 the "content(s) of the memory slot". Another example: memory slot 114 contains the list of 3 numbers (50, 60, 70). The only operation the MU offers is SET. A note on lists of numbers (also called vectors): as we have designed it, a memory slot can store a single number or a list of numbers. Here are some examples: i. 52 ii. (256, 125) iii. (512, 555, 125, (125, 256), 112). The last one might be a bit surprising. However, its 4th element just happens to be a list instead of a single number.
Calculation Unit (CU) - <Let us add a calculation unit to the tutorial engine. It will be responsible for performing calculations between any numbers/combination of numbers!> The tutorial engine has a Calculation Unit, CU. The CU can take any N memory slots and calculate a result using the content(s) of these N slots. The operations the CU offers are PLUS, MINUS, DIVIDE, and MULTIPLY. E.g. if M5 (memory slot 5) contains 72 and M6 contains 12, we can get the CU to PLUS the contents of memory slots 5 and 6. The result would be 84. Another example: if M7 contains (5,6,7), M8 contains (12, 13) and M9 contains (4,5), then if we PLUS M7, M8 and M9, the result would be (21, 24, 7). The lists are added position by position (5+12+4 and 6+13+5), and the 7 passes through unchanged because the shorter lists have no third number.
Reasoning Unit (RU) - <Let us add a reasoning unit to the tutorial engine. It will be responsible for performing reasoning deductions based on rules (e.g. the positivity of numbers, i.e. whether a number is positive)> The tutorial engine has a Reasoning Unit, RU. The RU can take N memory slots and perform reasoning operations on the contents of these slots. The operations the RU offers are ALL, SOME, and CPOS. (A word on the operations: ALL checks if all the given memory slot contents are positive numbers. SOME checks if one or more of the memory slots contain a positive number. CPOS (check positive) checks if the result of the last operation, whichever it was, was positive.)
Execution Unit (EU) - <Let us add an execution unit to the tutorial engine. This will be the one responsible for actually running a program on the computer (among other things like jumping to different parts of the program)> The Execution Unit, EU, can read a memory slot and check if the slot contains an operation (like PLUS, ALL, etc.). If it is an operation, the execution unit performs that operation; otherwise, it skips that memory slot. After performing or skipping the previous slot, it moves on to the next slot (previous slot + 1). If the previous memory slot was slot 999, it wraps around to slot 0 again. If the EU finds the STOP operation, it halts the execution. Now you might ask "A memory slot only contains numbers, so how the hell does the execution unit check if a memory slot contains an operation like PLUS?". Well, the answer is simple. We just assign a number (code) to each operation. Let's call this special number the operation code (OpCode for short). If an OpCode is found, the slot holds an operation; otherwise, skip! As a memory slot can contain a list of numbers (as opposed to a single number), the EU checks only the first number in the list for an OpCode. The EU also checks if the subsequent numbers match the operand requirements of that operation. The EU has the operations START, STOP, NEXT, and GOTO.
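The PLUS example from the CU description can be sketched in a few lines of Python. This is only a sketch under assumptions: the function name `plus` is made up for illustration, a bare number is treated as a one-element list, and a position missing from a shorter list is treated as contributing 0 (which is what the (21, 24, 7) example suggests). Nested lists are not handled here.

```python
from itertools import zip_longest


def plus(*contents):
    """Add slot contents position by position (hypothetical helper).

    Assumption: a shorter list simply has nothing to add at the
    positions it lacks, so we fill those positions with 0.
    """
    # Treat a single number as a list with one element.
    lists = [c if isinstance(c, (list, tuple)) else [c] for c in contents]
    # Walk the lists column by column, padding the short ones with 0.
    sums = [sum(column) for column in zip_longest(*lists, fillvalue=0)]
    # A one-element result is returned as a plain number.
    return sums[0] if len(sums) == 1 else tuple(sums)


print(plus(72, 12))                       # 84
print(plus((5, 6, 7), (12, 13), (4, 5)))  # (21, 24, 7)
```

MINUS, DIVIDE and MULTIPLY could follow the same pattern, although the article has not yet said how they should treat lists of different lengths.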

USAGE:-

The operations available to us: SET, PLUS, MINUS, DIVIDE, MULTIPLY, ALL, SOME, CPOS, START, STOP, NEXT, GOTO. Let's fix their OpCodes to be 0 to 11 respectively (The meaning of OpCodes was described in the EU section).
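To see the numbering at a glance, here is a minimal Python sketch that assigns the OpCodes 0 to 11 in the order listed above. The names `OPERATION_NAMES` and `OpCode` are chosen only for this illustration; they are not part of the Tutorial Engine itself.

```python
from enum import IntEnum

# The twelve operations, in the order given in the text.
OPERATION_NAMES = [
    "SET", "PLUS", "MINUS", "DIVIDE", "MULTIPLY",
    "ALL", "SOME", "CPOS",
    "START", "STOP", "NEXT", "GOTO",
]

# start=0 makes the first name OpCode 0, the second OpCode 1, and so on.
OpCode = IntEnum("OpCode", OPERATION_NAMES, start=0)

for op in OpCode:
    print(f"{op.value:2d}  {op.name}")
```

With this numbering, SET is 0, PLUS is 1 and STOP is 9, which are the three OpCodes the program later in this part relies on.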

To use an operation, we use it like this: <OperationName><Hashtag>

Example: SET#

Think of the hashtag as a way to pass electricity to the "SET" wire or activate the signal. When we pass current to the "SET" wire, the memory unit will automatically receive the signal (because the machine knows that SET is an MU operation, i.e. the SET wire is connected to the memory unit). One thing is missing, however. Can you guess? The machine doesn't know what to "SET"!

Here is the second part of the usage:

Example: SET# 5

This operation sets the contents of memory slot 5. You must be wondering: if we had to send the SET signal to the machine, how are we sending the 5? Well, let's just say it is black magic (we don't need to elaborate on everything now, do we?). Now the last thing missing is what to set memory slot 5 to.

Example: SET# 5,(1,2)

This operation sets the memory slot 5's content to the list (1,2).

A new Example: SET# (5,6,7),(1,1,1)

This sets the contents of memory slots 5, 6 and 7 to the list (1,1,1). That is, each of the memory slots 5, 6 and 7 will hold the value (1,1,1).
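The two forms of SET shown above (one slot, or a list of slots) can be mimicked in Python. This is a sketch, not the machine's definition: the function `set_slots`, the use of `None` for an empty slot, and the range check are assumptions added for illustration.

```python
# 1000 memory slots, addressed 0 to 999. None marks an empty slot (assumption).
memory = [None] * 1000


def set_slots(memory, target, value):
    """Mimic SET#: store value in one slot, or in every slot of a list."""
    # A single address is treated as a list with one address.
    addresses = target if isinstance(target, (list, tuple)) else [target]
    for address in addresses:
        if not 0 <= address <= 999:
            raise ValueError(f"no such memory slot: {address}")
        memory[address] = value


set_slots(memory, 5, (1, 2))              # SET# 5,(1,2)
print(memory[5])                          # (1, 2)

set_slots(memory, (5, 6, 7), (1, 1, 1))   # SET# (5,6,7),(1,1,1)
print(memory[5], memory[6], memory[7])    # (1, 1, 1) (1, 1, 1) (1, 1, 1)
```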

Now let's use the machine to do something! Let's use it to add two numbers 5 and 6. But, how will we do this? We know that the execution unit is the one which is responsible for checking a memory slot and seeing if it contains an operation. This means, we first need to fill the memory slots with operations. This list of operations can then be executed by the execution unit! By the way, this list of operations is usually called a "Program".

One thing before we start: you might ask, how does the EU "execute" an operation? The EU executes an operation the same way we do. For example, we execute SET by passing current to the SET wire (when we write SET#). Similarly, the EU internally executes an instruction (e.g. SET) by passing current to the SET wire, automatically. It is a black box that does all this.

Filling the memory slots (Writing the program):

We will use the SET operation (wire) to fill the memory slots (that is, to write the program to memory). Note: we aren't executing the program yet; we are only writing it to memory, where the EU will later find and execute it.

SET# 0, (0, 256, 5)

SET# 1, (0, 257, 6)

SET# 2, (1, (256, 257), 258)

SET# 3, 9

The above will do the following to the memory:-

Memory Slot 0: (0, 256, 5)

Memory Slot 1: (0, 257, 6)

Memory Slot 2: (1, (256, 257), 258)

Memory Slot 3: 9

Now, the memory contains the above data. However, when we execute this (by sending START# to the execution unit), the EU will try to read the above not as data, but as a "program".

Let's extract some sense out of this "program". Memory Slot 0 contains (0, 256, 5). Now, we know 0 (which is the first number of the list) is the opcode for SET (as we have defined above). We know that SET needs 2 things (as we have described above). Those two are "What to set" and "The Value to set to". So, if we put all this together, the first memory slot does the following: SET the memory slot 256 to value 5

Next slot (Slot 1):

SET the memory slot 257 to value 6

Next, what about memory slot 2?

Well, it contains the OpCode for PLUS. However, we haven't yet defined PLUS. Let's define it now: Let's say PLUS requires 2 things: "What to add" and "Where to put the result". Hence we can say slot 2 contains the following:

PLUS the contents of slots (256, 257) and store result to 258

Lastly, slot 3 contains:

STOP the machine
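The slot-by-slot reading above can be sketched as a tiny decoder in Python. It only knows the three OpCodes this program uses (0 for SET, 1 for PLUS, 9 for STOP), and the names `program` and `describe` are made up for this illustration.

```python
# The four program slots exactly as written to memory above.
program = {
    0: (0, 256, 5),
    1: (0, 257, 6),
    2: (1, (256, 257), 258),
    3: 9,
}


def describe(content):
    """Turn one slot's content into the English reading used in the text."""
    # A bare number is treated as a list with one element.
    fields = content if isinstance(content, tuple) else (content,)
    opcode, operands = fields[0], fields[1:]
    if opcode == 0 and len(operands) == 2:    # SET: what to set, value
        target, value = operands
        return f"SET the memory slot {target} to value {value}"
    if opcode == 1 and len(operands) == 2:    # PLUS: what to add, where to store
        sources, destination = operands
        return f"PLUS the contents of slots {sources} and store result to {destination}"
    if opcode == 9:                           # STOP takes no operands
        return "STOP the machine"
    return "skip (not an operation known to this sketch)"


for slot, content in program.items():
    print(f"Slot {slot}: {describe(content)}")
```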

Now that we have stored the program in memory, we can execute it. But how? We simply pass the START signal to the EU. The EU will automatically start executing the program, beginning at memory slot 0.

START#

After the execution, the result of 5 + 6 (i.e. 11) will be stored in memory slot 258.
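To check this result, here is a minimal Python simulation of the Tutorial Engine running the program. It is a sketch under assumptions, not the machine's formal definition: only SET, PLUS and STOP are implemented, every other slot is skipped, an empty slot is modelled as `None`, and the `max_steps` guard is added only so the sketch cannot loop forever on a program without STOP.

```python
from itertools import zip_longest

SET, PLUS, STOP = 0, 1, 9  # OpCodes fixed in the USAGE section


def as_list(content):
    """Treat a bare number as a list with one element."""
    return list(content) if isinstance(content, (list, tuple)) else [content]


def run(memory, start=0, max_steps=10_000):
    """Mimic START#: execute from slot `start` until STOP is found."""
    slot = start
    for _ in range(max_steps):
        fields = as_list(memory[slot])
        opcode, operands = fields[0], fields[1:]
        if opcode == STOP:
            return slot                      # halt; report where we stopped
        if opcode == SET and len(operands) == 2:
            target, value = operands
            for address in as_list(target):
                memory[address] = value
        elif opcode == PLUS and len(operands) == 2:
            sources, destination = operands
            contents = [as_list(memory[a]) for a in as_list(sources)]
            sums = [sum(col) for col in zip_longest(*contents, fillvalue=0)]
            memory[destination] = sums[0] if len(sums) == 1 else tuple(sums)
        # Anything else is skipped. Move to the next slot, wrapping 999 -> 0.
        slot = (slot + 1) % len(memory)
    raise RuntimeError("no STOP found within max_steps")


memory = [None] * 1000               # None marks an empty slot (assumption)
memory[0] = (0, 256, 5)              # SET# 0, (0, 256, 5)
memory[1] = (0, 257, 6)              # SET# 1, (0, 257, 6)
memory[2] = (1, (256, 257), 258)     # SET# 2, (1, (256, 257), 258)
memory[3] = 9                        # SET# 3, 9

halted_at = run(memory)              # START#
print(memory[258])                   # 11
```

The EU here stops at slot 3, so the data stored in slots 256 to 258 is never read as a program, even though a number such as 5 also happens to be an OpCode.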

CONCLUSION:-

In the first part of this series, we have designed a theoretical machine, the Tutorial Engine. In the next part, we will formalize all the operations of this machine in a table. After that, we can move on to talking about the assembly language for this machine.
