Architecture of Thought, Part 1: Tanusha — how we taught Python to read runes

This is the first article in a new technical series, where we look "under the hood" of the Tengri-Lang language. Now that the project is open on GitHub, we can show in detail how its practical implementation began. In this part, we will examine the very first "master" of our compiler: a lexer implemented in Python.

In previous articles, we talked a lot about philosophy. Now let's talk about the code.

Before building a "production" compiler in Go, we needed to quickly test the idea itself. Could a machine recognize our runic syntax at all? Would the grammar logic work? There is no better tool for such tasks than Python. It let us create the first "master", "Tanusha" (the Discerner), without getting bogged down in details.

Its work resembles that of a reader who slides along a line of text, recognizing individual words without delving into the meaning of the whole sentence.

First of all, we decided what a "recognized word" (a token) would look like in our system. We created a simple Token class, a container for information about each element of the code.

  • token.py:

Python

# token.py
class Token:
    """Describes one 'recognized' code element."""
    def __init__(self, type, value, line=1, column=1):
        self.type = type      # Token type (for example, 'Runa_Const', 'Identifier')
        self.value = value    # Token value (for example, 'Λ', 'san')
        self.line = line      # Line number
        self.column = column  # Column number

    def __repr__(self):
        """Method for pretty-printing a token."""
        return f"Token({self.type}, '{self.value}')"

Each token has a type and a value. Simple and effective.
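
For instance, creating a token by hand and printing it shows the constructor and __repr__ at work (the values are taken from the comments in the class above):

Python

# Quick sanity check of the Token class
t = Token('Runa_Const', 'Λ', line=1, column=1)
print(t)  # -> Token(Runa_Const, 'Λ')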

Now it's "Tanusha" itself. In Python, it is very convenient to implement it as a Lexer class. The heart of this class is the token_map dictionary. It allows you to instantly, without complex checks, match a character from the code with its type.

  • lexer.py (key fragments):

Python

# lexer.py
from token import Token

class Lexer:
    """'Tanusha', the recognizer. The final version of the prototype."""
    def __init__(self, source_code):
        self.code = source_code
        self.position = 0
        # Full dictionary for instant recognition of single characters
        self.token_map = {
            'Π': 'Runa_Func_Def', '—': 'Runa_Var',
            'Λ': 'Runa_Const',    'Y': 'Runa_If',
            'Q': 'Runa_True',     'I': 'Runa_False',
            '↻': 'Runa_Loop',     '→': 'Runa_Return',
            # ... and so on for all runes and operators.
        }
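
That is the whole trick: recognizing a rune costs a single hash lookup instead of a chain of if/elif comparisons. A quick illustration (the source line here is an arbitrary example):

Python

# One dictionary lookup replaces a chain of comparisons:
lexer = Lexer('Λ san = 9')
print('Λ' in lexer.token_map)  # -> True
print(lexer.token_map['Λ'])    # -> 'Runa_Const'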

The main get_next_token method looks at the current character and first checks whether it is in our dictionary. If it is, the token is created instantly.

Python

# fragment of the get_next_token method in lexer.py
def get_next_token(self):
    # ... (skipping whitespace and comments)

    current_char = self.code[self.position]

    # Check the dictionary of single characters
    if current_char in self.token_map:
        token_type = self.token_map[current_char]
        token = Token(token_type, current_char)
        self.position += 1
        return token

    # If the character is not in the dictionary, check whether it is a number.
    if current_char.isdigit():
        return self._read_number()

    # Or maybe it is a name (an identifier)
    if current_char.isalpha():
        identifier = self._read_identifier()
        return Token('Identifier', identifier)

    # ...
The _read_identifier() method simply reads characters one after another for as long as they are letters or digits, and returns the resulting word (a sketch of this helper is shown below). This way, tengri is recognized as a single token rather than six separate letters.
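
The article does not show these helpers, so here is a minimal sketch of how _read_identifier and its numeric twin _read_number could look, based on the description above. Treat it as an illustration, not the exact prototype code; the 'Number' token type is our assumption:

Python

# Hypothetical sketch of the Lexer helper methods described above.
def _read_identifier(self):
    """Reads characters while they are letters or digits; returns the word."""
    start = self.position
    while self.position < len(self.code) and self.code[self.position].isalnum():
        self.position += 1
    return self.code[start:self.position]

def _read_number(self):
    """Reads a run of digits and returns a ready-made token."""
    start = self.position
    while self.position < len(self.code) and self.code[self.position].isdigit():
        self.position += 1
    return Token('Number', self.code[start:self.position])  # 'Number' type is assumed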

This simple Python lexer fulfilled its main task: it proved that our system of runes and syntax is viable. It correctly read the code we wrote for it and broke it into meaningful parts.
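
To make that proof tangible, here is roughly how the prototype can be exercised end to end. The source line, the EOF convention, and the 'Assign' and 'Number' type names are illustrative assumptions, not literal output from the repository:

Python

# Illustrative driver. Assumes the full get_next_token() returns
# a Token('EOF', None) once the input is exhausted.
lexer = Lexer('Λ san = 9')
while True:
    token = lexer.get_next_token()
    if token.type == 'EOF':
        break
    print(token)

# Schematic output:
#   Token(Runa_Const, 'Λ')
#   Token(Identifier, 'san')
#   Token(Assign, '=')   # assumed operator type
#   Token(Number, '9')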

After receiving this confirmation, we realized it was time to move to the next level. To build a real, fast, and reliable compiler, we needed a more suitable tool. That is why the next step in our story was porting this verified logic to a "production" language: Go.

In the next article, we will meet the heart of our compiler: "Kurastyrusha" (the Parser), which knows all the laws of the language. It is the parser that takes the stream of tokens and builds from it the majestic "Oh Baytergi", a Tree of Thought.

Stay tuned and check out our GitHub repository!
