
Architecture of Thought, Part 1: Tanusha — how we taught Python to read runes

This is the first article in a new technical series where we look "under the hood" of the Tengri-Lang language. Now that the project is open on GitHub, we can show in detail how its practical implementation began. In this part, we examine the very first "master" of our compiler: a lexer implemented in Python.

In previous articles, we talked a lot about philosophy. Now let's talk about the code.

Before building a "production" compiler in Go, we needed to test the idea itself quickly. Could a machine recognize our runic syntax at all? Would the grammar logic hold up? There is no better tool for such tasks than Python. It allowed us, without getting bogged down in details, to create the first "master": "Tanusha" (the Discerner).

Its work resembles that of a reader who slides along a line and recognizes individual words without delving into the meaning of the whole sentence.

First of all, we determined what a "recognized word" (a token) would look like in our system. We created a simple Token class, a container for information about each element of the code.

  • token.py:

Python

# token.py
class Token:
    """Describes one 'recognized' code element."""
    def __init__(self, type, value, line=1, column=1):
        self.type = type      # Token type (for example, 'Runa_Const', 'Identifier')
        self.value = value    # Token value (for example, 'Λ', 'san')
        self.line = line      # Line number
        self.column = column  # Column number

    def __repr__(self):
        """Method for beautiful token output when printing."""
        return f"Token({self.type}, '{self.value}')"

Each token has a type and a value. Simple and effective.
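As a quick illustration, here is a tiny usage example (a hypothetical snippet, not from the repository) showing how a token prints thanks to __repr__:

Python

# Hypothetical usage example: constructing and printing a token.
from token import Token

t = Token('Runa_Const', 'Λ', line=1, column=1)
print(t)  # -> Token(Runa_Const, 'Λ')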

Now it's "Tanusha" itself. In Python, it is very convenient to implement it as a Lexer class. The heart of this class is the token_map dictionary. It allows you to instantly, without complex checks, match a character from the code with its type.

  • lexer.py (key fragments):

Python

# lexer.py
from token import Token

class Lexer:
    """'Tanusha', the recognizer. The final version of the prototype."""
    def __init__(self, source_code):
        self.code = source_code
        self.position = 0
        # Full dictionary for instant recognition of single characters
        self.token_map = {
            'Π': 'Runa_Func_Def', '—': 'Runa_Var',
            'Λ': 'Runa_Const',    'Y': 'Runa_If',
            'Q': 'Runa_True',     'I': 'Runa_False',
            '↻': 'Runa_Loop',     '→': 'Runa_Return',
            # ... and so on for all runes and operators.
        }

The main get_next_token method looks at the current character and first checks if it is in our dictionary. If yes, the token is created instantly.

Python

# fragment of the get_next_token method in lexer.py
def get_next_token(self):
    # ... (skipping whitespace and comments)

    current_char = self.code[self.position]

    # Check the dictionary of single characters
    if current_char in self.token_map:
        token_type = self.token_map[current_char]
        token = Token(token_type, current_char)
        self.position += 1
        return token

    # If the character is not in the dictionary, check whether it is a number.
    if current_char.isdigit():
        return self._read_number()

    # Or maybe it is a name (an identifier)
    if current_char.isalpha():
        identifier = self._read_identifier()
        return Token('Identifier', identifier)

    # ...

The _read_identifier() method simply reads characters in a row as long as they are letters or digits, and returns the resulting word. This way, tengri is recognized as a single token rather than six separate letters. A sketch of these helper methods follows below.
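The article omits the bodies of these helpers, so here is a minimal sketch of how _read_number() and _read_identifier() could look, reconstructed from the description above (an assumption, not the exact repository code):

Python

# Hypothetical helper methods of the Lexer class, reconstructed from the
# description above; the real repository code may differ.
def _read_number(self):
    start = self.position
    # Read characters in a row as long as they are digits.
    while self.position < len(self.code) and self.code[self.position].isdigit():
        self.position += 1
    return Token('Number', self.code[start:self.position])

def _read_identifier(self):
    start = self.position
    # Read characters in a row as long as they are letters or digits.
    while self.position < len(self.code) and self.code[self.position].isalnum():
        self.position += 1
    return self.code[start:self.position]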

This simple Python lexer fulfilled its main task: it proved that our system of runes and syntax is viable. It correctly read the code we wrote for it and broke it down into meaningful parts.
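To make that concrete, here is an end-to-end example of driving such a lexer (the loop and the end-of-input behavior are assumptions for illustration; only Lexer and the runes in token_map come from the code above):

Python

# Hypothetical driver: tokenize one line of runic code.
# Assumes get_next_token() returns None when the input is exhausted.
from lexer import Lexer

lexer = Lexer('Λ san 42')
token = lexer.get_next_token()
while token is not None:
    print(token)
    token = lexer.get_next_token()

# Expected output (given the sketches above):
# Token(Runa_Const, 'Λ')
# Token(Identifier, 'san')
# Token(Number, '42')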

Having received this confirmation, we knew it was time to move to the next level. To create a real, fast, and reliable compiler, a more suitable tool was needed. That is why the next step in our story was porting this proven logic to a "production" language: Go.

In the next article, we will meet the heart of our compiler, "Kurastyrusha" (the Parser), which knows all the laws of the language. It is the parser that takes the stream of tokens and builds from it the majestic "Oh Baytergi", a Tree of Thought.

Stay tuned and check out our GitHub repository!
