Parallel CPU work: rayon::par_iter vs goroutines + WaitGroup

Both implementations score each candidate by running the same FNV-style hash over its description (1500 iterations × ~200 bytes), so the workload is identical. The comparison is about the parallelism scaffolding, not the math.

Go: one goroutine per candidate. Each goroutine writes to results[idx], its own pre-allocated slot. There is no mutex because no two goroutines write the same index, and distinct slice elements are distinct memory locations. The "no shared mutation" property is a code-review invariant. Get it wrong (e.g. append to the shared slice instead of assigning by index) and go test -race catches it only if a test actually triggers the race; otherwise production catches it. A sync.WaitGroup provides the join. Idiomatic, ~30 lines including the helpers.
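A minimal sketch of that Go pattern follows. It assumes a hypothetical Candidate type with a Description field and a simplified stand-in for score; both are illustrations, not the article's code. Each goroutine owns exactly one index of a pre-sized slice, and a sync.WaitGroup is the join point.

```go
package main

import (
	"fmt"
	"sync"
)

// Candidate is hypothetical; the article only says each candidate has a description.
type Candidate struct {
	Name        string
	Description string
}

// score is a simplified, hypothetical FNV-1a-style scorer so the example runs standalone.
func score(desc string) uint64 {
	h := uint64(14695981039346656037)
	for i := 0; i < 1500; i++ {
		for j := 0; j < len(desc); j++ {
			h ^= uint64(desc[j])
			h *= 1099511628211
		}
	}
	return h
}

// scoreAll starts one goroutine per candidate. Each goroutine writes only to
// results[idx], a slot allocated up front, so no two goroutines write the same
// memory and no mutex is needed.
func scoreAll(cands []Candidate) []uint64 {
	results := make([]uint64, len(cands))
	var wg sync.WaitGroup
	for idx, c := range cands {
		wg.Add(1)
		// Passing idx and c as arguments avoids the pre-Go 1.22 loop-variable
		// capture bug (harmless on newer versions).
		go func(idx int, c Candidate) {
			defer wg.Done()
			// The tempting mistake, results = append(results, ...), would race on
			// the shared slice header; -race reports it only if a run hits the race.
			results[idx] = score(c.Description)
		}(idx, c)
	}
	wg.Wait() // join: blocks until every goroutine has called Done
	return results
}

func main() {
	cands := []Candidate{
		{"a", "first candidate description"},
		{"b", "second candidate description"},
		{"c", "third candidate description"},
	}
	for i, s := range scoreAll(cands) {
		fmt.Printf("%s: %d\n", cands[i].Name, s)
	}
}
```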

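Rust: candidates.into_par_iter().map(score).collect(). That's the entire parallelism. Rayon owns the work-stealing thread pool. We supply a pure function, and because map is generic over it, the compiler monomorphizes the call and can inline score. Unsynchronized shared mutation inside the closure is rejected at compile time. Rayon requires the closure to be Fn + Send + Sync, so writing to captured state without a Mutex or an atomic does not build. The "no mutex needed" invariant is therefore not a code-review property but a build-time guarantee.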

The wrinkle on the Rust side is the runtime boundary. Rayon is synchronous; the handler is async. The fix is one line: wrap the rayon call in tokio::task::spawn_blocking(move || …).await?. The blocking wait for rayon then runs on Tokio's blocking thread pool. Otherwise it would stall an async worker thread and every task scheduled on it.

Practical note: for ~30 candidates and a tight CPU loop, both versions finish well under 50 ms on a modest machine. Crank the iterations up 10× and the parallel speedup becomes clearly visible on either side. The difference in the code, though, is permanent, and that is what this page wants you to see.
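To check the speedup on your own hardware, a rough Go timing harness like the one below works. It runs the same synthetic workload twice, once serially and once with one goroutine per item, at 1500 and 15000 (10×) iterations. The 30 generated ~200-byte descriptions, the scoreN helper and the iteration parameter are assumptions for illustration. Wall-clock numbers depend on the machine and its core count, so none are quoted here. The harness also checks that both paths produce identical scores.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
	"time"
)

// scoreN is a hypothetical FNV-1a-style scorer with the iteration count as a
// parameter, so the same code can run at 1500 or 15000 iterations.
func scoreN(desc string, iters int) uint64 {
	h := uint64(14695981039346656037)
	for i := 0; i < iters; i++ {
		for j := 0; j < len(desc); j++ {
			h ^= uint64(desc[j])
			h *= 1099511628211
		}
	}
	return h
}

// serial scores every description on the calling goroutine.
func serial(descs []string, iters int) []uint64 {
	out := make([]uint64, len(descs))
	for i, d := range descs {
		out[i] = scoreN(d, iters)
	}
	return out
}

// parallel uses one goroutine per description, each writing only its own slot.
func parallel(descs []string, iters int) []uint64 {
	out := make([]uint64, len(descs))
	var wg sync.WaitGroup
	for i, d := range descs {
		wg.Add(1)
		go func(i int, d string) {
			defer wg.Done()
			out[i] = scoreN(d, iters)
		}(i, d)
	}
	wg.Wait()
	return out
}

func main() {
	// 30 synthetic descriptions of 200 bytes each (not the article's data).
	descs := make([]string, 30)
	for i := range descs {
		descs[i] = fmt.Sprintf("candidate %02d ", i) + strings.Repeat("x", 187)
	}
	for _, iters := range []int{1500, 15000} {
		t0 := time.Now()
		s := serial(descs, iters)
		ts := time.Since(t0)

		t0 = time.Now()
		p := parallel(descs, iters)
		tp := time.Since(t0)

		same := true
		for i := range s {
			if s[i] != p[i] {
				same = false
			}
		}
		fmt.Printf("iters=%d serial=%v parallel=%v same=%v\n", iters, ts, tp, same)
	}
}
```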

What to take away