182 points by ashvardanian 18 hours ago | 33 comments
rfoo 7 hours ago
[0] For example, a GEMM where the lhs is in fp8 e4m3, the rhs is in bf16, accumulation is in fp32, and the output is written as bf16 after applying GELU.
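To make the footnote concrete, here is a slow pure-Python reference sketch of that computation. It is not a GPU kernel and not CubeCL code. It assumes the OCP "E4M3FN" flavor of fp8 (bias 7, no infinities, one NaN pattern per sign), the erf form of GELU, and row-major flat lists. All function names are made up for illustration.

```python
import math
import struct

def decode_e4m3(byte):
    """Decode one fp8 E4M3FN byte: 1 sign, 4 exponent, 3 mantissa bits, bias 7."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0xF and man == 0x7:
        return math.nan                      # the only NaN encoding; no infinities
    if exp == 0:
        return sign * (man / 8.0) * 2.0 ** -6  # subnormal
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)

def decode_bf16(bits):
    """bf16 is the upper 16 bits of an IEEE float32."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]

def encode_bf16(x):
    """Round a value to bf16 (round-to-nearest-even). NaN inputs are not handled."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)
    return (bits >> 16) & 0xFFFF

def to_f32(x):
    """Round a Python float (double) to float32 precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def gelu(x):
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gemm_gelu(a_fp8, b_bf16, m, k, n):
    """C = GELU(A @ B): A is m*k fp8 bytes, B is k*n bf16 bit patterns,
    the accumulator is rounded to fp32 after every add, C is bf16 bit patterns."""
    out = []
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                prod = decode_e4m3(a_fp8[i * k + p]) * decode_bf16(b_bf16[p * n + j])
                acc = to_f32(acc + prod)
            # GELU is evaluated in double here, then rounded once to bf16.
            out.append(encode_bf16(gelu(acc)))
    return out
```

For instance, `gemm_gelu([0x38, 0x40], [0x3F80, 0x3F00], 1, 2, 1)` multiplies the 1x2 row (1.0, 2.0) by the 2x1 column (1.0, 0.5) and applies GELU to the sum. A fused kernel would do the same steps per tile without ever materializing the fp32 intermediate in global memory.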
ashvardanian 6 hours ago
The project feels very nice. It would be great to have more notes in the README on the excluded functionality, to better scope its applicability in more advanced GPGPU scenarios.
nathanielsimard 5 hours ago
0x7cfe 5 hours ago
wingertge 4 hours ago
nathanielsimard 5 hours ago
kookamamie 9 hours ago
In Halide, the concept was great, yet the problems in kernel development were just moved over to the "scheduling" side, i.e. determining tiling/vectorization/parallelization for the kernel runs.
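A rough illustration of that algorithm/schedule split, in plain Python rather than Halide's actual API: the result of the matrix multiply is fixed by the math, while the hypothetical `tile` parameter only changes the order in which the loops visit the data. Picking a good value for it (and for vectorization and parallelism, which this sketch leaves out) is the scheduling problem.

```python
def matmul(a, b, tile):
    """Tiled matrix multiply over lists of lists.

    `tile` is the only "schedule" knob: it changes loop order and locality,
    never the result.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for i0 in range(0, m, tile):            # tile over rows of C
        for j0 in range(0, n, tile):        # tile over columns of C
            for p0 in range(0, k, tile):    # tile over the reduction axis
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c

a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
# Same answer for every schedule; only the traversal order differs.
results = [matmul(a, b, tile) for tile in (1, 2, 3)]
```

With integer inputs every schedule gives bit-identical output. With floats, changing the tile size can change the summation order and therefore the rounding.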
the__alchemist 14 hours ago
gitroom 9 hours ago
nathanielsimard 5 hours ago
Since we don't want to rewrite everything multiple times, it also has to be multi-platform and optimal, so the feature set must be per-device, not per-language. I'm not aware of a tool that does that, especially in Rust (which Burn is written in).