How the swarm maps to the GPU
Before you boot the reactor, one idea unlocks everything: your code runs on thousands of cores at once. You write the orders for a single core, and the whole swarm carries them out in parallel. Here's how a core knows which job is its own.
1 · THE CHAIN OF COMMAND
Cores are organized in a strict hierarchy. The whole grid is split into blocks, and every block holds the same number of threads (the individual cores).
2 · WHAT EACH NAME MEANS
block_dim.x: the squad size (threads per block)
block_idx.x: which squad you're in
thread_idx.x: your seat within that squad
A core needs its global index, its position across the whole fleet, not just its seat in one squad. So it combines the two:
i = block_dim.x * block_idx.x + thread_idx.x
In words: (squad size × which squad) gets you to the start of your squad, then + your seat lands on you.
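To see that the formula gives every core a distinct slot, here is a minimal CPU-side sketch in plain Python. It is not GPU code: the helper names `global_index` and `all_indices` are hypothetical, and the nested loops only stand in for cores that would really run in parallel.

```python
def global_index(block_dim_x: int, block_idx_x: int, thread_idx_x: int) -> int:
    """Global index of one core: skip the squads before it, then add its seat."""
    return block_dim_x * block_idx_x + thread_idx_x


def all_indices(num_blocks: int, block_dim_x: int) -> list[int]:
    """Enumerate the index every core would compute (loops stand in for the GPU)."""
    return [
        global_index(block_dim_x, block_idx_x, thread_idx_x)
        for block_idx_x in range(num_blocks)
        for thread_idx_x in range(block_dim_x)
    ]


if __name__ == "__main__":
    # 3 squads of 4 seats cover the indices 0..11, each exactly once.
    print(all_indices(num_blocks=3, block_dim_x=4))
```

No two (squad, seat) pairs produce the same index, which is what lets each core own one element.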
Example: block_dim.x = 4, you're in block_idx.x = 2, seat thread_idx.x = 3 → i = 4·2 + 3 = 11. You handle element 11.
3 · BUFFERS · WHERE THE DATA LIVES
The arrays you see in every kernel, inp and out, are buffers: blocks of numbers sitting in GPU memory.
inp is the input buffer: the readings streamed in (an image, a signal, a tensor).
out is the output buffer: where your results go.
Your kernel runs once per core, and each core touches just one slot, its own index i. Think one mailbox per core: read inp[i], write out[i].
# every core runs this, only i differs
i = block_dim.x * block_idx.x + thread_idx.x
out[i] = inp[i] * 2.0  # this core's slot only
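The one-mailbox-per-core picture can be tried out on the CPU. The sketch below is plain Python, not GPU code: the names `double_kernel` and `launch` are hypothetical, and the nested loops stand in for cores that would really run at the same time. It assumes the buffer length is exactly the number of squads times the squad size, so every core has a slot.

```python
def double_kernel(inp, out, block_dim_x, block_idx_x, thread_idx_x):
    """Body run by one core: compute its own index, touch only that slot."""
    i = block_dim_x * block_idx_x + thread_idx_x
    out[i] = inp[i] * 2.0


def launch(kernel, num_blocks, block_dim_x, inp, out):
    """Stand-in for a GPU launch: call the kernel once per (block, thread) pair."""
    for block_idx_x in range(num_blocks):
        for thread_idx_x in range(block_dim_x):
            kernel(inp, out, block_dim_x, block_idx_x, thread_idx_x)


if __name__ == "__main__":
    inp = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]  # 8 readings
    out = [0.0] * len(inp)
    # 2 squads of 4 seats: exactly one core per reading.
    launch(double_kernel, num_blocks=2, block_dim_x=4, inp=inp, out=out)
    print(out)
```

Because each call writes only out[i], the order in which the cores run does not change the result.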
4 · THE GUARD · WHY if i < size
Squads come in fixed sizes, so whenever the data size isn't an exact multiple of the squad size, you launch more cores than there is data. The extra cores at the end would read and write past the end of the buffer, so one check keeps them idle:
if i < size:
    out[i] = inp[i] * 2.0  # only cores in range run
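Here is a minimal CPU-side sketch of why the guard matters, again in plain Python rather than GPU code. The helper names `blocks_needed` and `run_guarded` are hypothetical, and the round-up formula for the squad count is a common convention assumed here, not something this page specifies. One difference from real hardware: without the guard, Python would raise an IndexError, whereas a GPU may read or write out-of-range memory without any error.

```python
def blocks_needed(size: int, block_dim_x: int) -> int:
    """Smallest number of fixed-size squads that covers `size` elements."""
    return (size + block_dim_x - 1) // block_dim_x  # integer round-up


def run_guarded(inp: list[float], block_dim_x: int) -> tuple[list[float], int]:
    """Double every element; return the result and how many cores stayed idle."""
    size = len(inp)
    out = [0.0] * size
    idle = 0
    for block_idx_x in range(blocks_needed(size, block_dim_x)):
        for thread_idx_x in range(block_dim_x):
            i = block_dim_x * block_idx_x + thread_idx_x
            if i < size:  # the guard: only cores in range run
                out[i] = inp[i] * 2.0
            else:
                idle += 1  # an extra core past the end of the buffer
    return out, idle


if __name__ == "__main__":
    # 10 elements with squads of 4 means 3 squads, 12 cores, 2 of them idle.
    print(run_guarded([float(n) for n in range(10)], block_dim_x=4))
```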
You'll meet this guard in Puzzle 3 (“Guard the hull”). Everything else builds on these four ideas.