154 points by hardmaru 4 days ago | 50 comments
RevEng 2 days ago
I'm always happy to see negative results published, but it seems like they're selling what are really negative results as positive ones.
verdverm 3 days ago
If they can test against Llama 70B and Mistral 7B, they ought to compare against Mixtral 8x7B imho
verdverm 3 days ago
I have yet to see anything that dissuades me from agreeing with Yann LeCun when he says Transformers are fundamentally limited. We won't get creativity or reasoning, or move past hallucinations, without a major breakthrough.
verdverm 3 days ago
For example, a small child is completely capable of being told "get in the car" and can understand, navigate, open the door, and get in, with incredibly little energy usage (maybe about the energy in a single potato chip/crisp).
Now consider what I have been working on recently: (1) evaluating secops tools from both a technical and business perspective, and (2) prototyping and writing an RFC for the next version of our DX at the org. LLMs are very far from this capability because it involves so many competing incentives and trade-offs, and not just the context of the current state of the code, but also its history and vision. Crafting that vision is especially beyond what a foundation in transformers can offer. They are, in essence, an averaging and sequence-prediction algorithm.
These tools are useful, and even provide an ROI, but they are by no means anywhere close to what I would call intelligent.
monophonica 11 hours ago
Faith and Fate: Limits of Transformers on Compositionality https://arxiv.org/abs/2305.18654
Maybe the analogy is something like gold mining. We could pretend that the machines that mine gold are actually creating gold, as if the entire gold mining sector were instead a discovery of alchemy.
Maybe the way alchemy kind of leads to chemistry is the analogy that applies?
I don't even know if that is right though.
The intelligence is in the training data; the model is just extracting that intelligence.
We can't forget Feynman's point here: we aren't going to make a robot cheetah that runs fast; we will make a machine that uses wheels. Viewing things through the lens of a cheetah is a category error.
While I agree completely with you, we both might very well be utterly wrong: a category error about what intelligence "is".
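(For context: the Faith and Fate paper linked above makes this concrete with compositional tasks like multi-digit multiplication, where accuracy collapses as the computation graph deepens. A minimal sketch of that style of probe; ask_model is a hypothetical placeholder for whichever LLM you're testing:)

    import random

    def ask_model(prompt: str) -> str:
        # Hypothetical placeholder: route to whichever LLM you're probing.
        raise NotImplementedError

    def multiplication_accuracy(n_digits: int, trials: int = 100) -> float:
        # The paper's finding: accuracy on tasks like this collapses as
        # n_digits (the depth of the computation graph) grows, even though
        # each single-digit step is individually easy for the model.
        correct = 0
        for _ in range(trials):
            a = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
            b = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
            reply = ask_model(f"What is {a} * {b}? Reply with only the number.")
            correct += reply.strip() == str(a * b)
        return correct / trials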
mtts 3 days ago
Z-vectors are of course nothing like the subsystems in your brain, but the general approach is certainly similar to how the brain works.
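(For anyone who hasn't read the paper: as I understand it, a z-vector is a learned vector that rescales the singular values of a frozen weight matrix, the "SVF" trick. A toy sketch of that idea, with illustrative shapes only:)

    import torch

    def svf_adapt(W: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        # Decompose the frozen weight; training only ever touches z.
        U, S, Vh = torch.linalg.svd(W, full_matrices=False)
        # The learned z-vector gates each singular value up or down,
        # yielding a task-adapted version of the same matrix.
        return U @ torch.diag(S * z) @ Vh

    W = torch.randn(512, 512)  # one projection matrix
    z = torch.ones(512)        # all-ones = unchanged; training nudges entries
    W_adapted = svf_adapt(W, z)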
dleeftink 3 days ago
Senses?
bugglebeetle 3 days ago
They now have an API that allows for dynamic exploration and manipulation of the latent space for Llama 8B to 70B models (think Golden Gate Claude). They also open-sourced the sparse autoencoders that (in part) make this possible:
https://huggingface.co/Goodfire/Llama-3.3-70B-Instruct-SAE-l...
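(This is not the actual Goodfire API, which I haven't used; the names and shapes below are made up. But the underlying steering trick, the Golden Gate Claude effect, is roughly: encode a residual-stream activation with the SAE, boost one interpretable latent, and decode back:)

    import torch
    import torch.nn as nn

    class SparseAutoencoder(nn.Module):
        # Toy SAE over a residual-stream activation; released SAEs have
        # the same encode/decode structure, just with trained weights.
        def __init__(self, d_model: int = 4096, d_latent: int = 65536):
            super().__init__()
            self.enc = nn.Linear(d_model, d_latent)
            self.dec = nn.Linear(d_latent, d_model)

        @torch.no_grad()
        def steer(self, h: torch.Tensor, feature_idx: int, strength: float) -> torch.Tensor:
            z = torch.relu(self.enc(h))      # sparse feature activations
            z[..., feature_idx] += strength  # turn the chosen concept up
            return self.dec(z)               # back to model space

    # h would be hooked out of a transformer layer at inference time;
    # feature_idx is whatever latent the SAE analysis mapped to a concept.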
logicchains 3 days ago
It's already been invented: https://arxiv.org/abs/2202.05780 . That design is just very inefficient to scale up / use as a transformer backbone.
Jerrrry 3 days ago
Remove the bottom weights dynamically, based on the local gradient in varentropy, so that internal dissonance ("doubt") can be selected against.
"Preference Optimization" but with more opportunities for meta-optimization.