42 points by muragekibicho 6 months ago | 32 comments
muragekibicho 6 months ago
It's a CUDA alternative that uses finite field theory to convert GPU kernels to prime number fields.
Finite fields are the primary data structure: FF-asm is a CUDA alternative designed for computations over finite fields.
Recursive computing support: not cache-aware vectorization, not parallelization, but performing a calculation inside a calculation inside another calculation.
Extension of C89: runs everywhere gcc is available. Context: I'm getting my math PhD, and I built this language around my area of expertise, number theory and finite fields.
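A minimal sketch of the core idea as I understand it from the post (illustrative only, not the actual FF-asm implementation): pack several small values into one integer via the Chinese Remainder Theorem over coprime prime moduli, so that a single machine addition updates every residue "lane" at once.

    /* Illustrative only: three lanes packed into one integer with CRT over
       the primes 5, 7, 11, then one addition that acts on all lanes. */
    #include <stdio.h>

    static const unsigned long M[3] = {5UL, 7UL, 11UL}; /* pairwise coprime */
    static const unsigned long PROD = 385UL;            /* 5 * 7 * 11 */

    /* Find x with x = a[i] (mod M[i]); brute force is fine for tiny moduli. */
    static unsigned long crt_pack(const unsigned long a[3])
    {
        unsigned long x;
        for (x = 0; x < PROD; x++)
            if (x % M[0] == a[0] && x % M[1] == a[1] && x % M[2] == a[2])
                return x;
        return 0; /* unreachable when each a[i] < M[i] */
    }

    int main(void)
    {
        unsigned long a[3] = {1, 2, 3};
        unsigned long b[3] = {3, 4, 5};
        unsigned long x = crt_pack(a);
        unsigned long y = crt_pack(b);
        unsigned long z = (x + y) % PROD; /* one add, three lane-wise sums */
        int i;
        for (i = 0; i < 3; i++)
            printf("lane %d: %lu\n", i, z % M[i]); /* prints 4, 6, 8 */
        return 0;
    }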
almostgotcaught 6 months ago
Your LinkedIn says you're an undergrad that took a gap year 10 months ago (before completing your senior year) to do sales for a real estate company.
pizza 6 months ago
almostgotcaught 6 months ago
pizza 6 months ago
But suppose I did actually hold that belief for some reason; then it would seem fairly intellectually dishonest to withhold relevant info in my pointed inquisition, wherein I just characterize them as someone lacking mathematical experience at all, let alone from a world-class university. But maybe that's just me!
foota 6 months ago
It's unclear whether this page is something that could be useful, and deserves attention. The fact that the author is at best making misleading statements is useful in determining whether you should take their claims at face value.
They claim "Finite Field Assembly is a programming language that lets you emulate GPUs on CPUs".
It's not a programming language, it's a handful of C macros, and it doesn't in any way emulate a GPU on the CPU. I'll be honest: I think the author is trying to fake it till they make it. They seem interested in mathematics, but their claims are far beyond what they've demonstrated, and their post history reveals a series of similar submissions. Insofar as they're curious and want to experiment, I think it's reasonable to encourage them, but they're also asking for money and don't seem to be delivering much.
Why would they post the 4th article in a series where the previous ones require you to pay?
almostgotcaught 6 months ago
Am I taking crazy pills? I didn't bring it up; the guy himself, at the top of this very thread branch, wrote explicitly that he's a PhD student working on number theory.
> Wouldn't it be more interesting to discuss the merits of the post?
There is no merit, nothing to discuss. I linked the corresponding GitHub below so you can judge for yourself.
saghm 6 months ago
saagarjha 6 months ago
almostgotcaught 6 months ago
zeroq 6 months ago
Additionally, I've tried the earlier chapters, and they are behind a paywall.
You need a better introduction.
pizza 6 months ago
Conscat 6 months ago
I for one have no clue what anything I read in there is supposed to mean. Emulating a GPU's semantics on a CPU is a topic which I thought I had a decent grasp on, but everything from the stated goals at the top of this article to the example code makes no sense to me.
pizza 6 months ago
adamvenis 6 months ago
markisus 6 months ago
almostgotcaught 6 months ago
vimarsh6739 6 months ago
Interestingly, in the same work, and contrary to what you'd expect, transpiling GPU code to run on the CPU gives ~76% speedups in HPC workloads compared to a hand-optimized multi-core CPU implementation on Fugaku (a CPU-only supercomputer), after accounting for these differences in synchronization.
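For readers unfamiliar with what such a transpiler does, a rough sketch (my own illustration, not taken from the paper): a data-parallel GPU kernel over N threads collapses into a parallel CPU loop over the same index space, with OpenMP supplying the multicore parallelism.

    /* GPU-style kernel, for reference:
       __global__ void saxpy(int n, float a, const float *x, float *y) {
           int i = blockIdx.x * blockDim.x + threadIdx.x;
           if (i < n) y[i] = a * x[i] + y[i];
       }
    */

    /* Sketch of the transpiled CPU version: the grid/block index space
       becomes an ordinary loop, parallelized across cores with OpenMP. */
    void saxpy_cpu(int n, float a, const float *x, float *y)
    {
        int i;
    #pragma omp parallel for
        for (i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }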
petermcneeley 6 months ago
Looks like this entire paper is just about how to move/remove these barriers.
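A minimal illustration of why the barriers are the crux (mine, not necessarily what the paper does): when one CPU thread runs all the logical threads of a block serially, a __syncthreads() in the middle of the kernel body is typically handled by splitting the per-thread loop at the barrier (loop fission), so every logical thread finishes phase 1 before any starts phase 2.

    #define BLOCK 256

    /* GPU-style body, for reference:
       shared[tid] = in[tid];
       __syncthreads();
       out[tid] = shared[(tid + 1) % BLOCK];
    */

    void block_on_cpu(const float *in, float *out)
    {
        float shared[BLOCK];
        int tid;

        /* phase 1: everything before the barrier, for all logical threads */
        for (tid = 0; tid < BLOCK; tid++)
            shared[tid] = in[tid];

        /* the barrier becomes the boundary between the two loops */

        /* phase 2: everything after the barrier */
        for (tid = 0; tid < BLOCK; tid++)
            out[tid] = shared[(tid + 1) % BLOCK];
    }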
vimarsh6739 6 months ago
An interesting add-on to me would be the handling of conditionals. Because newer GPUs have independent thread scheduling, which is not present in the older ones, you have to wonder what the desired behaviour is if you are using CPU execution as a debugger of sorts (or are just GPU poor). It'd be super cool to expose those semantics as a compiler flag for your transpiler, allowing me to debug some code as if it ran on an ancient GPU like a K80 for fast local debugging.
But the ambitious question here is this: if you take existing GPU code, run it through a transpiler, and generate better code than handwritten OpenMP, do you need to maintain an OpenMP backend for the CPU in the first place? It'd be better to express everything in a richer parallel model with support for nested synchronization, right? And let the compiler handle the job of inter-converting between parallelism models. It's like saying that if PyTorch 2.0 generates good Triton code, we could just transpile that to CPUs and get rid of the CPU backend. (Of course, Triton doesn't support all patterns, so you would fall back to ATen, and this kind of goes for a toss.)
petermcneeley 6 months ago
I agree that statically proving that something like the syncing is unnecessary can only be a good thing.
The question of why not simply take your GPU code and transpile it to CPU code is really the question of what you originally lost in writing the GPU code to begin with. If you are talking about ML work, most of that is expressed as a bunch of matrix operations that translate naturally to GPUs with little impedance. But other kinds of operations (anything serial) might be better expressed directly as CPU code. And going from CPU to GPU, the loss, as you have pointed out, is probably in the synchronization.
hashxyz 6 months ago
Conscat 6 months ago
foota 6 months ago
Edit: this tickles my brain about some similar-seeming programming language experiment, where they were also trying to express concurrency (not inherently the same as parallelism) using some fancy math. I can't remember what it was, though.
foota 6 months ago
imbusy111 6 months ago
catapart 6 months ago
I know that's pretty abstract, but without that kind of "apples to apples" comparison, I have trouble contextualizing what kind of output is being targeted with this kind of work.
pwdisswordfishz 6 months ago
I thought a finite field's order has to be a prime power.
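For reference, the standard facts behind this objection (my summary, not from the thread):

    a finite field of order q exists  <=>  q = p^k for a prime p and k >= 1
    Z/6Z is not a field: 2 * 3 ≡ 0 (mod 6), i.e. it has zero divisors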
markisus 6 months ago
I’m dubious of this project.
tooltechgeek 6 months ago
almostgotcaught 6 months ago
This was discussed on Reddit; this is not actually finite field arithmetic.
Also, you can go to this dude's GitHub and see exactly how serious this project is.
https://github.com/LeetArxiv/Finite-Field-Assembly
Lol
Retr0id 6 months ago