33 points by rbanffy 4 days ago | 18 comments
meindnoch 5 hours ago
Strilanc 1 hour ago
#include <stdio.h>
#include <time.h>
int fib_mod_9837(int n) {
int a = 0;
int b = 1;
int k = 1;
while (k < n) {
k <<= 1;
}
while (k) {
int a2 = a*a;
int b2 = b*b;
int ab2 = a*b << 1;
a = a2 + ab2;
b = a2 + b2;
if (n & k) {
int t = a;
a += b;
b = t;
}
a %= 9837;
b %= 9837;
k >>= 1;
}
return a;
}
int main() {
struct timespec begin, end;
clock_gettime(CLOCK_MONOTONIC_RAW, &begin);
int result = fib_mod_9837(99999999);
clock_gettime(CLOCK_MONOTONIC_RAW, &end);
printf("%lld\n", result);
printf("elapsed nanos: %lld\n", (end.tv_nsec - begin.tv_nsec) + (end.tv_sec - begin.tv_sec)*1000000000);
}
lb-guilherme 8 hours ago
meindnoch 5 hours ago
Unintentionally, the article is a good cautionary tale of how algorithmic knowledge can beat raw hardware power, if you know what you're doing.
Someone 7 hours ago
It wouldn’t surprise me at all if doing those 5 matrix multiplications on a CPU were faster than doing 31 on the GPU.
Also, FTA: “To calculate extremly large fibonacci numbers we need to avoid integer overflow. A simple way to avoid that is to just take everything modulo a given number in the matrix multiplication“
⇒ This also is NOT calculating the Fibonacci numbers, copping out when things get difficult on a GPU.
If you want to compute Fn mod some constant for 1 ≤ n ≤ some large constant fast on a GPU, I think I would compute quite a few of the intermediate matrices using the fast way, then spin up some ‘threads’ to compute the intermediate versions by copying those with the ‘standard’ matrix.
KeplerBoy 7 hours ago
It only gets interesting if you need a large matmuls or millions of small matmuls in parallel.
CBLT 2 hours ago
In my experience where I didn't modulo the numbers, the compute time was dominated by the last couple of multiplications with Megabyte-sized integers.
Strilanc 1 hour ago
My favorite of the genre so far is https://www.youtube.com/watch?v=KzT9I1d-LlQ , which tries to compute a big Fibonacci number in under a second. It doesn't do the modulus, so it has to do big integer arithmetic, and so is really about doing big multiplications with FFTs. But then it funnily spins off into a series of videos as the author keeps dryly realizing their code is extremely wasteful (ending in finally just using GMP).
0xf00ff00f 16 minutes ago
eesmith 8 hours ago
FWIW, on my 2020 era laptop, the following:
#include <stdio.h>
int main(void) {
int a = 1, b = 1, c, n = 99999999;
while (--n) {
c = (a+b) % 9837;
a = b;
b = c;
}
printf("%d\n", a);
return 0;
}
takes 330 ms measured as "time a.out".The problem with Fibonacci is there are algorithmic ways to speed things up. The following (using the method at https://en.wikipedia.org/wiki/Fibonacci_sequence to compute Fibonacci numbers recursively in O(log n) arithmetic operations) takes Python 0.1 ms to compute the same value:
import functools
@functools.cache
def F(n):
if n <= 2:
return 1
m = n // 2 # rounds down
if n % 2:
# odd
return (F(m+1)**2 + F(m)**2) % 9837
else:
return ((2*F(m+1) - F(m))*F(m)) % 9837
print(F(99999999))
qsort 5 hours ago
seanhunter 3 hours ago
qsort 3 hours ago
Probably Claude can do it? Maybe I'll give it a try later :)
amelius 6 hours ago
Or recursive hashing?
staunton 6 hours ago
amelius 6 hours ago
eesmith 5 hours ago
As mentioned, the topic comes from an end-of-chapter problem set on prefix sums, at https://www.cs.cmu.edu/~guyb/papers/Ble93.pdf .
A prefix sum cam be implemented using a GPU, as demonstrated.
However, using a prefix sum may not be the best way to compute Fibonacci numbers. As demonstrated.
simon_vtr 3 hours ago