MISSION BRIEFING · 60 SECONDS

How the swarm maps to the GPU

Before you boot the reactor, one idea unlocks everything: your code runs on thousands of cores at once. You write the orders for a single core, and the whole swarm carries them out in parallel. Here's how a core knows which job is its own.

1 · THE CHAIN OF COMMAND

Cores are organized in a strict hierarchy. The whole grid is split into blocks, and every block holds the same number of threads (the individual cores).

  • GRID · the whole fleet: every core in the launch.
  • BLOCK · one squad: a fixed-size group of cores.
  • THREAD · one core / bot: runs your kernel once.

Example with two squads of four cores each (each box is one core; the number is its global index i):

squad 0 (block 0): [0] [1] [2] [3]
squad 1 (block 1): [4] [5] [6] [7]

2 · WHAT EACH NAME MEANS

  • block_dim.x · squad size. How many cores are in each block.
  • block_idx.x · which squad. This core's block number: 0, 1, 2, …
  • thread_idx.x · seat in the squad (a.k.a. tid). This core's position inside its block: 0 … block_dim.x - 1.

THE ONE FORMULA TO REMEMBER

A core needs its global index, its position across the whole fleet, not just its seat in one squad. So it combines the two:

i = block_dim.x * block_idx.x + thread_idx.x

In words: (squad size × which squad) gets you to the start of your squad, then + your seat lands on you.

Worked example: with squad size block_dim.x = 4, squad block_idx.x = 2, and seat thread_idx.x = 3, you get i = 4·2 + 3 = 11. You handle element 11.
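To check the formula against the whole fleet, here is a minimal plain-Python sketch that runs on the CPU, not the GPU. The helper name global_index and the loop over squads and seats are illustrative assumptions, not part of any GPU API; only the arithmetic comes from the formula above.

```python
def global_index(block_dim_x: int, block_idx_x: int, thread_idx_x: int) -> int:
    """Hypothetical helper: the briefing's formula as a plain function."""
    return block_dim_x * block_idx_x + thread_idx_x


# Enumerate every (squad, seat) pair for 3 squads of 4 cores each.
block_dim_x = 4
num_blocks = 3
indices = [
    global_index(block_dim_x, block, seat)
    for block in range(num_blocks)
    for seat in range(block_dim_x)
]

print(indices)                # every core gets its own index, in fleet order
print(global_index(4, 2, 3))  # the worked example: squad 2, seat 3
```

Every (squad, seat) pair lands on a different index, which is why no two cores ever claim the same slot.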

3 · BUFFERS · WHERE THE DATA LIVES

The arrays you see in every kernel, inp and out, are buffers: blocks of numbers sitting in GPU memory.

  • inp is the input buffer: the readings streamed in (an image, a signal, a tensor).
  • out is the output buffer: where your results go.

Your kernel runs once per core, and each core touches just one slot, its own index i. Think one mailbox per core: read inp[i], write out[i].

# every core runs this, only i differs
i = block_dim.x * block_idx.x + thread_idx.x
out[i] = inp[i] * 2.0  # this core's slot only
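
One way to picture "every core runs this" is to simulate the launch on the CPU. The sketch below is plain Python; the names kernel and launch are hypothetical stand-ins for what a real GPU runtime does, and the loops visit one core at a time rather than in parallel. It assumes the data size is an exact multiple of the squad size, so no core falls outside the buffers.

```python
def kernel(inp, out, block_dim_x, block_idx_x, thread_idx_x):
    """The orders for a single core: double this core's own slot."""
    i = block_dim_x * block_idx_x + thread_idx_x
    out[i] = inp[i] * 2.0  # this core's slot only


def launch(kernel_fn, num_blocks, block_dim_x, inp, out):
    """Hypothetical CPU stand-in for a GPU launch: visit every core in turn."""
    for block_idx_x in range(num_blocks):
        for thread_idx_x in range(block_dim_x):
            kernel_fn(inp, out, block_dim_x, block_idx_x, thread_idx_x)


inp = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
out = [0.0] * len(inp)
launch(kernel, num_blocks=2, block_dim_x=4, inp=inp, out=out)
print(out)  # each slot was written exactly once, by the core that owns it
```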

4 · THE GUARD · WHY if i < size

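Squads come in fixed sizes, so unless size (the number of elements) is an exact multiple of block_dim.x, the launch rounds up to whole squads and you end up with more cores than data. The extra cores at the end would read and write past the end of the buffers, so one check keeps them idle: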

if i < size:
    out[i] = inp[i] * 2.0  # only cores in range run
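
To see the guard at work, the plain-Python sketch below simulates a launch where the data does not fill the last squad. It runs on the CPU one core at a time; guarded_kernel, the ceiling-division launch size, and the idle counter are illustrative assumptions, not a real GPU API.

```python
def guarded_kernel(inp, out, size, block_dim_x, block_idx_x, thread_idx_x):
    """One core's orders, with the guard. Returns True if the core did work."""
    i = block_dim_x * block_idx_x + thread_idx_x
    if i < size:
        out[i] = inp[i] * 2.0  # only cores in range run
        return True
    return False  # out-of-range core stays idle


size = 10        # elements of data
block_dim_x = 4  # squad size
# Round up so every element gets a core: ceil(10 / 4) = 3 squads.
num_blocks = (size + block_dim_x - 1) // block_dim_x

inp = [float(n) for n in range(size)]
out = [0.0] * size
idle = 0
for block_idx_x in range(num_blocks):
    for thread_idx_x in range(block_dim_x):
        worked = guarded_kernel(inp, out, size, block_dim_x, block_idx_x, thread_idx_x)
        if not worked:
            idle += 1

print(num_blocks * block_dim_x, "cores launched,", idle, "idle")
```

With 10 elements and squads of 4, the launch rounds up to 3 squads (12 cores), so the last 2 cores fail the check and do nothing. Without the guard, those cores would index slots 10 and 11, which raises an IndexError in this Python sketch and is an out-of-bounds access on a GPU.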

You'll meet this in Puzzle 3 (“Guard the hull”). Everything else builds on these four ideas.
