OPENING NOTE
How Rust is Closing the Gap to the GPU
In this issue, we explore how Rust is moving closer to the GPU layer, the foundation where modern AI performance is forged. From portable technologies like Vulkan and Metal to low-level CUDA acceleration, Rust reaches the hardware depths that power today's most demanding workloads.
We dig into the technologies making Rust's GPU integration possible, and how they are shaping the next generation of AI and machine learning.
Let's dive into the current issue.
EDITORIAL INSIGHT
Rust’s Quiet Revolution in GPU Computing
GPUs have evolved from fixed-function graphics accelerators into massively parallel compute engines that now underpin much of modern AI and scientific computing. Unlike CPUs, which optimize for sequential logic and branch-heavy workloads, GPUs are designed to execute thousands of lightweight threads in parallel, making them ideal for matrix multiplication, convolution, and other high-throughput numerical operations. Major vendors such as NVIDIA, AMD, Intel, and Apple each maintain their own hardware ecosystems and software stacks. NVIDIA dominates AI workloads through its CUDA platform, while AMD’s ROCm and Apple’s Metal Performance Shaders (MPS) are closing the gap with increasingly mature driver and compiler support.
At the software level, frameworks like PyTorch and TensorFlow abstract this complexity by dispatching tensor operations to backend kernels implemented in CUDA, HIP, or MPS. These kernels are compiled into low-level GPU code and executed by vendor drivers on the hardware’s compute cores. Although GPUs share architectural roots with the graphics pipeline, AI frameworks do not rely on vertex or fragment shaders; instead, they use compute kernels that directly exploit the GPU’s general-purpose SIMD (Single Instruction, Multiple Data) capabilities. This decoupling from the graphics stack has allowed modern GPUs to transcend their gaming origins, becoming the de facto engines for large-scale AI and data-parallel computation.
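To make that data-parallel model concrete, here is a small CPU-side analogy in Rust (a minimal sketch assuming the rayon crate): the same operation is applied to every element independently, which is exactly the shape of work a GPU compute kernel spreads across thousands of threads.

```rust
// CPU-side analogy of the data-parallel model, assuming the rayon crate.
// A GPU compute kernel applies one operation to each element across
// thousands of hardware threads; rayon expresses the same shape of work
// across CPU cores.
use rayon::prelude::*;

fn main() {
    let input: Vec<f32> = (0..1_000_000).map(|i| i as f32).collect();

    // Each element is transformed independently (no ordering, no shared
    // mutable state), so the work parallelizes trivially.
    let output: Vec<f32> = input.par_iter().map(|x| x * 2.0 + 1.0).collect();

    assert_eq!(output[3], 7.0);
}
```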
In reality, most Rust developers working in AI and machine learning will not need to delve into the intricate details of GPU acceleration. The frameworks and libraries they use for training, inference, or data processing manage GPU acceleration behind the scenes. However, it is important to recognize that Rust is making progress in this area. A more robust and unified GPU foundation allows higher-level AI frameworks to be developed entirely in Rust, without the need for C++ bindings or vendor-specific glue code.
Rust’s emergence in GPU computing isn’t about replacing CUDA or rewriting established APIs. On one side, some projects aim to unify those APIs: wgpu, Rust’s implementation of the WebGPU standard, provides portable access to Vulkan, Metal, and DirectX 12, and runs on the web via WebAssembly. It gives developers a portable foundation for GPU compute across platforms and architectures. Another project, Rust-GPU, aims to make Rust a first-class language and ecosystem for GPU graphics and compute shaders.
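To get a feel for that portable foundation, here is a hedged sketch of a wgpu compute dispatch that doubles every element of a buffer with a WGSL kernel. It is written against a wgpu v22-era API, with pollster and bytemuck assumed as dependencies; field names shift slightly between wgpu releases.

```rust
// A minimal wgpu compute dispatch: double every element of a buffer.
// Sketch assumes a wgpu v22-era API plus the pollster and bytemuck crates.
use wgpu::util::DeviceExt;

const SHADER: &str = r#"
@group(0) @binding(0) var<storage, read_write> data: array<f32>;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) id: vec3<u32>) {
    if (id.x < arrayLength(&data)) {
        data[id.x] = data[id.x] * 2.0;
    }
}
"#;

fn main() {
    pollster::block_on(run());
}

async fn run() {
    // Instance -> adapter -> device is the same handshake on Vulkan,
    // Metal, DirectX 12, and the browser.
    let instance = wgpu::Instance::default();
    let adapter = instance
        .request_adapter(&wgpu::RequestAdapterOptions::default())
        .await
        .expect("no GPU adapter found");
    let (device, queue) = adapter
        .request_device(&wgpu::DeviceDescriptor::default(), None)
        .await
        .expect("failed to open device");

    let input: Vec<f32> = (0..1024).map(|i| i as f32).collect();
    let buffer = device.create_buffer_init(&wgpu::util::BufferInitDescriptor {
        label: Some("data"),
        contents: bytemuck::cast_slice(&input),
        usage: wgpu::BufferUsages::STORAGE | wgpu::BufferUsages::COPY_SRC,
    });

    // The WGSL kernel above is compiled by wgpu for whichever backend it
    // selected at runtime.
    let shader = device.create_shader_module(wgpu::ShaderModuleDescriptor {
        label: Some("doubler"),
        source: wgpu::ShaderSource::Wgsl(SHADER.into()),
    });
    let pipeline = device.create_compute_pipeline(&wgpu::ComputePipelineDescriptor {
        label: None,
        layout: None, // infer the bind group layout from the shader
        module: &shader,
        entry_point: "main",
        compilation_options: Default::default(),
        cache: None,
    });
    let bind_group = device.create_bind_group(&wgpu::BindGroupDescriptor {
        label: None,
        layout: &pipeline.get_bind_group_layout(0),
        entries: &[wgpu::BindGroupEntry {
            binding: 0,
            resource: buffer.as_entire_binding(),
        }],
    });

    let mut encoder = device.create_command_encoder(&Default::default());
    {
        let mut pass = encoder.begin_compute_pass(&Default::default());
        pass.set_pipeline(&pipeline);
        pass.set_bind_group(0, &bind_group, &[]);
        pass.dispatch_workgroups(1024 / 64, 1, 1); // 16 workgroups of 64 threads
    }
    queue.submit([encoder.finish()]);
    // Reading results back needs a COPY_DST staging buffer and map_async;
    // omitted here to keep the sketch short.
}
```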
On the other hand, cudarc and Rust-CUDA extend Rust’s reach directly into NVIDIA’s ecosystem, while ZLUDA (itself written in Rust) allows running unmodified CUDA applications on non-NVIDIA GPUs. There’s also Vulkano, a safe, higher-level Rust wrapper around the Vulkan API.
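For contrast with the portable path, here is a hedged host-side sketch using cudarc to JIT-compile and launch a CUDA C kernel. It follows the pattern of older cudarc releases (around 0.11); newer versions have reorganized the API around contexts and streams.

```rust
// Host-side CUDA from Rust with cudarc: JIT-compile a CUDA C kernel with
// NVRTC and launch it. Follows the ~0.11-era cudarc API; recent releases
// restructured this around CudaContext and explicit streams.
use cudarc::driver::{CudaDevice, LaunchAsync, LaunchConfig};
use cudarc::nvrtc::compile_ptx;

const KERNEL: &str = r#"
extern "C" __global__ void scale(float *out, const float *inp, const size_t n) {
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = inp[i] * 2.0f;
    }
}
"#;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let dev = CudaDevice::new(0)?; // first CUDA device

    // Compile the kernel source to PTX at runtime and load it.
    let ptx = compile_ptx(KERNEL)?;
    dev.load_ptx(ptx, "module", &["scale"])?;
    let f = dev.get_func("module", "scale").unwrap();

    // Copy input to the device and allocate space for the output.
    let inp = dev.htod_copy(vec![1.0f32; 1024])?;
    let mut out = dev.alloc_zeros::<f32>(1024)?;

    // Launch: grid/block dimensions are derived from the element count.
    unsafe { f.launch(LaunchConfig::for_num_elems(1024), (&mut out, &inp, 1024usize)) }?;

    let host: Vec<f32> = dev.dtoh_sync_copy(&out)?;
    assert_eq!(host[0], 2.0);
    Ok(())
}
```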
Bridging these layers are new compute-language projects like CubeCL and Krnl, which allow developers to write GPU kernels directly in Rust syntax.
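What “kernels in Rust syntax” means is easiest to show. The sketch below is modeled on CubeCL’s published examples; treat names like ABSOLUTE_POS and the launch API as indicative, since they vary across versions.

```rust
// A GPU kernel written as ordinary Rust, in CubeCL's style (modeled on the
// project's published examples; exact names vary by version). The #[cube]
// macro traces the function body and JIT-compiles it for CubeCL's wgpu,
// CUDA, or ROCm runtimes.
use cubecl::prelude::*;

#[cube(launch)]
fn double<F: Float>(input: &Array<F>, output: &mut Array<F>) {
    // ABSOLUTE_POS is this thread's global index across the whole dispatch.
    if ABSOLUTE_POS < input.len() {
        output[ABSOLUTE_POS] = input[ABSOLUTE_POS] + input[ABSOLUTE_POS];
    }
}

// Launching goes through a runtime client, e.g. on the wgpu backend:
//   double::launch::<f32, WgpuRuntime>(&client, cube_count, cube_dim,
//                                      input_arg, output_arg);
// (buffer setup elided; see CubeCL's examples for the full plumbing)
```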
While it's still early days, the direction is becoming clear. The Rust ecosystem is developing beyond simply wrapping existing GPU APIs. This is a subtle yet significant shift, positioning Rust not just as a newcomer to GPU programming, but as a language poised to redefine how AI workloads interact with hardware.
FRAMEWORK OF THE WEEK
Each week, we spotlight a project pushing Rust's AI frontier forward.
Rust-CUDA - Fast GPU computing using the CUDA Toolkit
If Rust has a direct path to NVIDIA GPUs, it's Rust-CUDA.
Recently rebooted as part of the broader Rust-GPU initiative, Rust-CUDA is an ongoing effort to bring native CUDA kernel development into the Rust ecosystem, letting developers write, compile, and launch GPU kernels directly in Rust without relying on C++ bindings.
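Concretely, a GPU-side kernel in Rust-CUDA looks roughly like this. The sketch is adapted from the project's published examples; the cuda_std API is still being reworked in the reboot, so treat the names as indicative.

```rust
// A GPU-side kernel in Rust, adapted from Rust-CUDA's published examples.
// The crate's rustc_codegen_nvvm backend compiles this to PTX, and the host
// launches it through the companion cust crate.
use cuda_std::prelude::*;

#[kernel]
pub unsafe fn add(a: &[f32], b: &[f32], c: *mut f32) {
    // Global thread index, the Rust equivalent of CUDA C++'s
    // blockIdx.x * blockDim.x + threadIdx.x.
    let idx = thread::index_1d() as usize;
    if idx < a.len() {
        // Each thread writes exactly one output slot, so the raw pointer
        // write cannot race.
        let elem = &mut *c.add(idx);
        *elem = a[idx] + b[idx];
    }
}
```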
Its design allows Rust code to integrate more naturally with the CUDA toolchain while exploring how Rust's modern tooling can improve GPU programming workflows.
Although still in early development, with active refactoring and incomplete features, the reboot marks an essential step toward unifying Rust's GPU ecosystem.
Its long-term vision is to align with the Rust-GPU project, potentially allowing the same Rust codebase to target multiple GPU architectures, from Vulkan to CUDA.
That vision was illustrated earlier this year in the blog post "Rust on Every GPU", which demonstrated a single Rust codebase running seamlessly across CUDA, SPIR-V, Metal, and DirectX backends. The accompanying demo project, rust-gpu-chimera, showcased this cross-GPU capability in action, a glimpse of a future where developers write GPU code once in Rust and deploy it anywhere.
🔗 Explore ➡ Rust-CUDA
FROM THE COMMUNITY
Highlights from across the Rust + AI ecosystem
Videos & Talks 📺
Rust 2025: $400K Salaries, AI, Defense & Borrow Checker — Jon Gjengset on Rust & Future of Coding
In this in-depth interview, Jon Gjengset (Rust educator, MIT PhD, and author of Rust for Rustaceans) discusses how Rust salaries are reaching $400,000, how AI is changing the way developers write code, and why the borrow checker still defines Rust’s identity.
Watch ➡
Blog Posts ✏
Rust-Python FFI - This post from Dora-rs explores the challenges of building multi-language Rust libraries, focusing on Rust-Python integration with pyo3. It compares data-sharing methods like PyBytes and Apache Arrow for faster, zero-copy performance, and discusses error handling, the GIL, memory behavior, and tracing with OpenTelemetry. These insights underpin Dora-rs, a framework for real-time, multi-AI, multi-hardware applications.
Read ➡
GitHub Highlights 🧑‍💻
goose v0.10.0 Released - This release adds new recipes and prompts, enriches the CLI with debugging and session-management features, improves the user interface and documentation, and resolves many bugs and technical debt.
GitHub ➡
lance v0.38.3-beta.3 Released - This release introduces compression support, new dataset APIs, and row-tracking features, while also fixing several indexing and compatibility bugs and updating documentation.
GitHub ➡
ndarray v0.17.0 Released - This release introduces new array reference types and delivers a host of additional methods and math functions, while cleaning up feature flags and improving documentation.
GitHub ➡
daft v0.6.6 Released - This release introduces explicit AWS vs HTTP modes for common-crawl datasets, pydantic model conversions, new UDF decorators, and Flotilla enhancements.
GitHub ➡
SUBSCRIBE
We’d love to hear from you!
Subscribe to receive the latest insights and updates delivered directly to your inbox, and if you know someone who’d enjoy this, feel free to forward it their way!
If you have any feedback or ideas you’d like to see in future issues, drop us a line at [email protected]
📬 Published at Rustacean.ai
🐦 Follow us @RustaceanAI
✍️ Mascot @KaiTheKrab
📌 Curated by @andynog



