a call to return to gpu-primitivism
A spiral of computation evolution:
1. The beginning — pure math, a loop on the CPU: take coordinates → evaluate a function → draw a pixel. (Early demos, raymarch experiments; see the sketch after this list.)
2. Optimization for weak hardware — the graphics pipeline appeared (vertices → rasterization → fragments). It was a workaround to make real-time graphics possible on small buses and limited hardware.
3. Expansion to compute — GPUs became more general-purpose: CUDA, OpenCL, compute shaders. But under the hood the ancient “optimization” baggage remains — warps, local shared memory, pipeline logic that was built around the old pipeline and is kept around for compatibility.
4. Return to the root — Tensor Cores, TPUs, NPUs. In essence, back to the “pure loop”: billions of identical operations, without the legacy of the pipeline.
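To make step 1 concrete, here is a minimal sketch of the “pure loop” on the CPU: every pixel is just coordinates fed into a formula, with no pipeline and no graphics API in between. The shade() function, the 256x256 resolution, and the coordinate range are arbitrary choices for illustration, not anything from a specific demo.

```cpp
// A toy "pure loop" renderer: for every pixel, evaluate a function of its
// coordinates and store the result. No pipeline, no API, just math.
// (shade() and the 256x256 resolution are arbitrary choices for illustration.)
#include <cmath>
#include <cstdint>
#include <vector>

static uint8_t shade(float x, float y) {
    // Any formula of the coordinates works here; this one just makes rings.
    float v = 0.5f + 0.5f * std::cos(x * x + y * y);
    return static_cast<uint8_t>(v * 255.0f);
}

int main() {
    const int W = 256, H = 256;
    std::vector<uint8_t> image(W * H);
    for (int py = 0; py < H; ++py) {
        for (int px = 0; px < W; ++px) {
            // Map pixel indices to coordinates in roughly [-4, 4].
            float x = (px / float(W) - 0.5f) * 8.0f;
            float y = (py / float(H) - 0.5f) * 8.0f;
            image[py * W + px] = shade(x, y);   // take coords -> evaluate -> draw
        }
    }
    // image[] now holds the frame; writing it to disk or screen is omitted.
    return 0;
}
```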
So the question arises in the end: why does anyone need that layered, overcomplicated architecture when, in a nutshell, all you need is pure math and none of the "optimisation" of the past? Why does anyone need all those CUDA cores, for example, when today's bus width lets you compute something as simple as cos(x*x+y*y)*w, sin(x*x+z*z)*w and project reality onto the monitor with a raymarching algorithm in shaders (yet still with a lot of computational overhead, well, with a bit of optimization, xd)?
All that's needed is a massively parallel pure-math engine.
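As a rough sketch of that "massively parallel pure math engine" idea, the same per-pixel formula can be written as a single CUDA kernel: one thread per pixel, each evaluating cos(x*x+y*y)*w and nothing else. The kernel name, the 1024x1024 resolution, the coordinate mapping, and w = 1.0 are assumptions made up for the example.

```cuda
// One thread per pixel, each evaluating the same closed-form expression:
// the "massively parallel pure math engine" with nothing else in the way.
// (The kernel name, 1024x1024 grid, and w are illustrative assumptions.)
#include <cuda_runtime.h>
#include <math.h>

__global__ void pureMathFrame(float* out, int width, int height, float w) {
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    int py = blockIdx.y * blockDim.y + threadIdx.y;
    if (px >= width || py >= height) return;

    // Map pixel indices to coordinates in roughly [-4, 4].
    float x = (px / (float)width  - 0.5f) * 8.0f;
    float y = (py / (float)height - 0.5f) * 8.0f;

    // The formula from the text: cos(x*x + y*y) * w.
    out[py * width + px] = cosf(x * x + y * y) * w;
}

int main() {
    const int W = 1024, H = 1024;
    float* d_out = nullptr;
    cudaMalloc(&d_out, W * H * sizeof(float));

    dim3 block(16, 16);
    dim3 grid((W + block.x - 1) / block.x, (H + block.y - 1) / block.y);
    pureMathFrame<<<grid, block>>>(d_out, W, H, /*w=*/1.0f);
    cudaDeviceSynchronize();

    cudaFree(d_out);  // copying the frame back / displaying it is omitted
    return 0;
}
```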
In the end, the choice is binary. Either keep “living in the ancient ways” — dragging along pipelines and abstractions for the sake of universality and backward compatibility.
Or accept the “return” — that modern silicon allows the direct path again (pure matrix loops, formulas, raymarching).
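And as a sketch of what that direct path of formulas plus raymarching can look like, here is a minimal sphere-tracing kernel: per pixel, march a ray forward by the signed distance to a unit sphere and shade by depth. The scene (one sphere at z = -3), the camera, the 64-step limit, and the kernel name are all illustrative assumptions, not a reference implementation.

```cuda
// A minimal raymarcher as a single kernel: per pixel, march a ray against a
// sphere distance function and shade by hit depth. No vertices, no
// rasterizer, just the loop. (Scene, camera, and constants are assumptions.)
__global__ void raymarchFrame(float* out, int width, int height) {
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    int py = blockIdx.y * blockDim.y + threadIdx.y;
    if (px >= width || py >= height) return;

    // Pixel -> normalized ray direction (camera at origin, looking down -z).
    float u = (px / (float)width  - 0.5f) * 2.0f;
    float v = (py / (float)height - 0.5f) * 2.0f;
    float inv = rsqrtf(u * u + v * v + 1.0f);
    float dx = u * inv, dy = v * inv, dz = -inv;

    // Sphere-tracing loop: step by the distance to a unit sphere at z = -3.
    float t = 0.0f, shade = 0.0f;
    for (int i = 0; i < 64; ++i) {
        float x = dx * t, y = dy * t, z = dz * t + 3.0f;    // point minus sphere center
        float d = sqrtf(x * x + y * y + z * z) - 1.0f;      // signed distance to surface
        if (d < 1e-3f) { shade = 1.0f - t / 6.0f; break; }  // hit: shade by depth
        t += d;
        if (t > 10.0f) break;                               // ray escaped: stay black
    }
    out[py * width + px] = shade;
}
```

Launching it with the same 2D grid and block configuration as the previous sketch fills out[] with one grayscale frame.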