Act 0 · Pre-Flight
Chapter 1 / 3

Load the cargo into the bays

Before a single core can fire, the data has to be ON the ship. Your cargo (the input array) sits on the dock: host memory, the CPU's RAM. The cores can only reach the ship's bays: device memory, the GPU's own RAM. Move the cargo across.

This is the step the rest of the course handles for you, and it's where most real GPU performance is won or lost. The dock→ship crossing runs over PCIe at roughly 16–32 GB/s, one to two orders of magnitude slower than the GPU's own ~1–3 TB/s memory. It's routine to leave the majority of a GPU's real-world speed on the floor (a figure of ~80% is often cited) before a single multiply happens, purely because of how data is moved. Master this and you're already ahead of people who can write a correct kernel but not a fast one.
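
To put numbers on that gap, here is a small back-of-the-envelope sketch. The 4 GB cargo size and the two bandwidth figures are assumptions, picked from inside the ranges quoted above; they are not measurements of any particular machine.

```mojo
def main():
    # Assumed figures for illustration only (not measured).
    var cargo_gb = 4.0           # size of the cargo, in GB
    var pcie_gb_per_s = 32.0     # dock -> ship crossing (top of the PCIe range)
    var bay_gb_per_s = 2000.0    # GPU reading its own memory (~2 TB/s)

    # time = size / bandwidth, converted to milliseconds
    var crossing_ms = cargo_gb / pcie_gb_per_s * 1000.0
    var onboard_ms = cargo_gb / bay_gb_per_s * 1000.0

    print("Dock -> ship crossing (ms):", crossing_ms)
    print("One pass over the bays (ms):", onboard_ms)
    print("Crossing is slower by a factor of:", crossing_ms / onboard_ms)
```

With these assumed figures the crossing takes about 125 ms while one full pass over the same data on the ship takes about 2 ms, a gap of roughly 60x. A kernel that touches the cargo only once can easily spend most of its wall-clock time waiting at the dock.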

↳ Recall: host = the dock (CPU RAM), device = the ship's bays (GPU RAM). Cores read only the bays.

YOUR TASK
  1. The device bay is already allocated for you as `bay`.
  2. Copy the host cargo into the bay with `ctx.enqueue_copy(dst, src)`.
💡 Destination first, source second: `ctx.enqueue_copy(bay, host_cargo)`
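
For context, here is a minimal sketch of how the copy might sit inside a complete program. Only `ctx.enqueue_copy(dst, src)`, `bay`, and `host_cargo` come from the task; the import path, the buffer-creation calls, the element type, and the cargo size are assumptions based on Mojo's `DeviceContext` API as commonly used, and exact signatures may differ between Mojo versions. It needs a supported GPU to run.

```mojo
from gpu.host import DeviceContext  # assumed import path


def main():
    var size = 1024  # hypothetical cargo length

    with DeviceContext() as ctx:
        # The dock: a buffer in host (CPU) memory.
        var host_cargo = ctx.enqueue_create_host_buffer[DType.float32](size)
        # The bay: a buffer in device (GPU) memory.
        # In the puzzle this allocation is already done for you.
        var bay = ctx.enqueue_create_buffer[DType.float32](size)
        ctx.synchronize()  # wait until both allocations have completed

        # Fill the cargo on the host side.
        for i in range(size):
            host_cargo[i] = Float32(i)

        # Destination first, source second: dock -> ship.
        ctx.enqueue_copy(bay, host_cargo)
        ctx.synchronize()  # the copy is only queued until we wait for it
```

Note that the `enqueue_` prefix means the call only queues the work. The sketch calls `ctx.synchronize()` afterwards so the cargo is known to be in the bay before anything depends on it.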
preflight/load_holds.mojo
PUZZLE 0a · LOAD THE HOLDS
Clear all 3 chapters to forge 🎖️ Flight Certification