Built MicroPython bindings for PyTorch's C++ tensor kernels, targeting the ESP32, which has just 520 KB of SRAM.
The core idea: strip PyTorch down to its bare tensor operations, cross-compile for Xtensa, and expose them through MicroPython's C API. No autograd, no JIT, no CUDA -- just the math.
import torch
x = torch.tensor([1.0, 2.0, 3.0])
y = x * 2 + 1
print(y) # tensor([3.0, 5.0, 7.0])
This runs on hardware that costs $4.
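For a sense of scale, here is a back-of-the-envelope sketch of how many float32 elements could fit in that 520 KB. The helper name `max_float32_elements` and the `live_tensors` parameter are made up for illustration, and the numbers are an upper bound only: they ignore the MicroPython runtime, the kernel code, and anything else sharing SRAM. It also assumes an expression like `x * 2 + 1` keeps up to three same-sized buffers alive at once (input, intermediate, result), which may not match what the bindings actually do.

```python
# Rough upper bound on tensor size given the ESP32's SRAM.
# Assumption: every byte of SRAM is available for tensor data,
# which is not true in practice (runtime, stacks, and code also live there).
SRAM_BYTES = 520 * 1024   # 520 KB, the figure quoted above
FLOAT32_BYTES = 4         # size of one float32 element


def max_float32_elements(budget_bytes, live_tensors=1):
    """Largest element count such that `live_tensors` equal-sized
    float32 buffers fit within `budget_bytes`."""
    return budget_bytes // (FLOAT32_BYTES * live_tensors)


# One tensor on its own.
single = max_float32_elements(SRAM_BYTES)

# Assumed worst case for x * 2 + 1: x, the x * 2 intermediate, and y.
chained = max_float32_elements(SRAM_BYTES, live_tensors=3)

print(single, chained)
```

So even in the most optimistic case, tensors are limited to tens of thousands of elements, which is why dropping everything except the raw tensor math matters here.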
github.com/ljk53/upytorch