483 points by ingve 6 days ago | 67 comments
ofou 6 days ago
[1]: https://www.bishopbook.com
[2]: https://www.oreilly.com/library/view/ai-engineering/97810981...
[3]: https://d2l.ai
[4]: https://udlbook.github.io/udlbook/
Tsarp 5 days ago
swyx and team's podcast, newsletter, and Discord have had the highest signal-to-noise ratio for keeping up and learning.
swyx 5 days ago
ofou 3 days ago
I read your book, The Coding Career Handbook, and we need something similar for AI Engineering! I really enjoyed it. Thank you for creating and sharing such high-quality multimodal content :)
swyx 3 days ago
rahimnathwani 5 days ago
But the other books (#1, #3, #4) seem like they're intended for those who want to understand all the math. Many people don't want (or need) a full understanding of how all this works. They can provide significant value to their employers with some knowledge of how machine learning works (e.g. the basics of CNNs and RNNs), and some intuitions/vibes about SOTA LLMs, even if they don't understand transformers or other modern innovations.
ewuhic 5 days ago
ofou 5 days ago
Here’s an example: https://d2l.ai/chapter_natural-language-processing-pretraini...
ewuhic 5 days ago
drdude 5 days ago
I read Deep Learning by Goodfellow and Deep Learning with TensorFlow 2 and Keras for practical stuff. I'm still debating whether to do D2L for additional practice in my free time, though.
kamikazeturtles 6 days ago
I have a feeling that, unless you're dabbling at the cutting edge of AI, there's no point in reading research papers. Just get a feel for how these LLMs respond, then build a pretty, user-friendly app on top of them. Knowing the difference between "multi-head attention" and "single-head attention" isn't very useful if you're just using OpenAI's or Groq's API.
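(If you're curious, that difference is small enough to sketch in toy numpy; the shapes here are illustrative, and real implementations add learned Q/K/V and output projections:)

    import numpy as np

    def attention(q, k, v):
        # scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
        d = q.shape[-1]
        scores = q @ k.swapaxes(-2, -1) / np.sqrt(d)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        return w @ v

    seq_len, d_model, n_heads = 4, 8, 2
    x = np.random.randn(seq_len, d_model)

    # single head: one attention pattern over the full d_model width
    single = attention(x, x, x)

    # multi-head: split d_model into subspaces and attend in each independently
    h = x.reshape(seq_len, n_heads, d_model // n_heads).swapaxes(0, 1)
    multi = attention(h, h, h).swapaxes(0, 1).reshape(seq_len, d_model)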
Am I missing something here? I'd love to know where I'm wrong
lolinder 6 days ago
Let's put it this way: if even half the people who call themselves "AI Engineers" would read the research in the field, we'd have a lot less hype and a lot more success in finding the actual useful applications of this technology. As is, most "AI Engineers" assume the same thing you do and consider "AI Engineering" to be "I know how to plug this black box into this other black box and return the result as JSON! Pay me!". Meanwhile most AI startups are doomed from the start because what they set out to do is known to be a bad fit.
wnmurphy 6 days ago
To be fair, most of software engineering is this.
bumby 5 days ago
torginus 5 days ago
bumby 5 days ago
I would disagree that most engineering is not involved in building something... whether most engineers understand the math/science behind it is debatable.
otteromkram 6 days ago
dietr1ch 5 days ago
sanderjd 6 days ago
Or rather, I guess I feel like it's a sign of the immaturity of the space that it is still kind of unclear (at least it is to me) how to build useful things without reading all the research papers.
To me, it seems like there is an uncanny valley between "people who are up on all the papers in this reading list" and "people who are just getting a feel for how these LLMs respond and slapping a UI on top".
Maybe it kind of reminds me of the CGI period of the web. The "research papers" side is maybe akin to all the people working on the networking protocols and servers necessary to run the web, and the "slap a UI over the LLM APIs" side is akin to those of us slinging HTML and Perl scripts.
You could make ok stuff that way, without needing to understand anything about TCP. But it still took a little while for a more professionalized layer to mature between those two extremes.
I feel like maybe generative AI is in the early days of that middle layer developing?
hintymad 6 days ago
HPsquared 5 days ago
crystal_revenge 5 days ago
As someone working in the area for a few years now (both on the product and research side), I strongly disagree. A shocking number of papers in this area are just flat out wrong. Universities/research teams are churning out garbage with catchy titles at such a tremendous rate that reading all of these papers will likely leave you understanding less than if you had read none.
The papers in this list are decent, but I wouldn't be shocked if the conclusions of a good number of them were ultimately either radically altered or outright inverted as we learn more about what's actually happening in LLMs.
The best AI engineers I've worked with are just out there experimenting and building stuff. A good AI engineer definitely has to be working close to the model; if you're just calling an API, you're not really an "AI Engineer" in my book. While most good AI engineers have likely incidentally read most of these papers through the course of their day job, they tend to read them with skepticism.
A great demonstration of this is the Stable Diffusion community. Hardly any of the innovation in that space is even properly documented (this, of course, is not ideal), much less used for flag planting on arXiv. But nonetheless the generative image AI scene is exploding in creativity, novel applications, and shocking improvements, all with far fewer engineering/research resources devoted to the task than their peers in the LLM world.
serjester 5 days ago
swyx 6 days ago
> Just get a feel for how these LLMs respond then build a pretty and user friendly app on top of them.
as you know, "just" is a very loaded word in software engineering. The entire thesis of AI Eng is that this attitude of "just slap a UI on an LLM bro whats so hard" misses a rapidly deepening field, with its own stack and specialization (which, yes, some if not much of which is unnecessary, vc funded hypey complexity merchantism, but some of which is also valid), and if you do not take it seriously, others will, and do so running rings around those who have decided to not even try to push this frontier, passively waiting for model progress to solve everything.
i've seen this play out before in underappreciated subfields of engineering that became their own thing, with their own language, standard stack, influencers, debates, controversies, IPOs, whole 9 yards... frontend eng, mobile eng, SRE, data eng, you name it. you just have to see the level and quality of work these people are doing that is sufficiently distinct from MLE and product/fullstack webdev to appreciate that it probably deserves its own field of study, and while it will NEVER be as prestigious as AI research, there will be a ton more people employed in these roles than there can be in research, and that's a perfectly fine occupation too.
I'm even helping instruct a course about it this week, as it happens, if you want to see what a practical syllabus for it looks like: https://maven.com/noah-hein/ai-engineering-intro
thecupisblue 6 days ago
1. The actual deep ML researchers who work on models

2. The "AI engineer" who creates products based on LLMs

3. The "AI researchers" who basically just stack LLMs together and call it something like Meta-Cognitive Chain-of-Thought Advanced Reasoning Intelligence or whatever it is.
jhanschoo 5 days ago
> 3. The "AI researchers" who basically just stack LLM's together and call it something like Meta-Cognitive Chain-of-Thought Advanced Reasoning Inteligence or whatever it is.
I actually think that working purely within the traditional neural-net paradigm is starting to hit its limits, and the most fruitful directions for research are systems that incorporate and modify LLMs online, among other systems, despite your unserious characterization of this class of research.
otteromkram 6 days ago
Seems like there's only one AI engineer here, which is #2. The other two are researchers, and one doesn't even focus on AI specifically, since ML covers a broader swath of disciplines.
eKIK 6 days ago
Similarly, I know how to call cryptography libraries to get my passwords hashed with a suitable algorithm before storing them. I don't understand the deep math behind why a given algorithm is secure, but that's fine. I can still make good use of cryptographic functions. I'm not a cryptography engineer either :).
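Something like this is all I need (a minimal Python sketch using the standard library; the scrypt parameters are illustrative, not a recommendation):

    import hashlib, os

    def hash_password(password: str) -> tuple[bytes, bytes]:
        # store salt + digest; recompute and compare on login
        salt = os.urandom(16)
        digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
        return salt, digest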
My take on it is that if you should call yourself any kind of "XYZ Engineer", you should be able to understand the inner workings of XYZ.
This reading list is most likely (mostly) for those who want to get a really deep understanding and eventually work on contributing to the "foundational systems" (for lack of a better word) one day.
Hope that helps.
swyx 6 days ago
consider:
- does a React/frontend engineer need to know everything about React internals to be good at their job?
- does a commercial airline pilot need to know every single subsystem in order to do their job?
- do you, a sophisticated hackernewsian, really know how your computer works?
more knowledge is always (usually) better, but as a thing diffuses into practice and industry there's a natural stopping point that "technician" level people reach that is still valuable to society bc of relative talent supply and demand.
torginus 5 days ago
Yes? Well, not everything (which I define as being able to implement React from scratch). But if you want to do good work, and be able to fix those pesky bugs which result from the arcane behavior of the framework itself, then you better know your stuff.
Besides, in practice very few people understand the most basic stuff about React. Just recently I had to explain to a veteran frontend dev what list virtualization was and why it's not a good idea to display a list of 100k items directly.
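(The core windowing arithmetic is tiny; an illustrative Python sketch of the idea, leaving out what real libraries like react-window handle, such as overscan buffers and variable row heights:)

    def visible_range(scroll_top, viewport_height, row_height, total_rows):
        # only these rows need real DOM nodes; the rest is empty spacer
        first = scroll_top // row_height
        count = -(-viewport_height // row_height) + 1  # ceil, plus a partial row
        return first, min(first + count, total_rows)

    # render ~26 rows instead of 100,000
    start, end = visible_range(12_000, 800, 32, 100_000)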
mettamage 5 days ago
vunderba 5 days ago
Not a great comparison. First off, nobody is suggesting that a self-purported "AI Engineer" has to understand EVERY SINGLE SUBSYSTEM, but they should still have a strong command of the internal workings of the modern foundational material (transformers, neural networks, latent space, etc.) to style themselves as such.
The better question is "does an aviation mechanic need to understand the internal systems of an airplane?" and the answer is a resounding yes.
swyx 6 days ago
we went thru this specific reading list in our paper club: https://www.youtube.com/watch?v=hnIMY9pLPdg
if you are interested in a narrative version.
Flux159 5 days ago
- Actual examples of fine-tuning LLMs or making merges - usually discussed on r/LocalLLaMA for specific use cases like role-playing or other scenarios that instruction-tuned LLMs aren't good at. A Jupyter notebook or blog post would be great here (see the LoRA sketch after this list).
- Specifically around Agents & Code generation - Anthropic's post about SWE-bench verified gives a very practical look at writing a coding agent https://www.anthropic.com/research/swe-bench-sonnet with prompts, tool schema and metrics.
- The wide range of LoRAs and fine-tunes available on Civitai for image models - a guide on making a custom one that you can use in ComfyUI.
- State of the art in audio models in production - ElevenLabs still seems to be the best among closed platforms, but there are open-source options for voice cloning and TTS, even with very small models (Kokoro, 82M parameters).
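On the first point, a minimal LoRA fine-tuning sketch (assuming the Hugging Face transformers + peft stack; the model name and hyperparameters are placeholders, not recommendations):

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    base = "meta-llama/Llama-3.1-8B"  # any causal LM checkpoint
    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForCausalLM.from_pretrained(base)

    # wrap the base model so only small low-rank adapter matrices train
    config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
                        task_type="CAUSAL_LM")
    model = get_peft_model(model, config)

    # ...run your usual Trainer / training loop on a chat or roleplay dataset...
    model.save_pretrained("my-lora-adapter")  # a few MB, shareable like on civitai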
swyx 5 days ago
adamgordonbell 6 days ago
I am out of my depth when it comes to reading papers, but I second 'The Prompt Report' from your list.
It gives a great taxonomy that helped me understand the space of prompting techniques better.
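For instance, the most basic split in that taxonomy, zero-shot vs. few-shot prompting, comes down to something like this (illustrative Python, with made-up example strings):

    # zero-shot: just ask
    zero_shot = "Classify the sentiment of: 'The battery died in an hour.'"

    # few-shot: demonstrate the task and format with worked examples first
    few_shot = (
        "Classify the sentiment.\n"
        "Review: 'Loved it, works perfectly.' -> positive\n"
        "Review: 'Arrived broken, waste of money.' -> negative\n"
        "Review: 'The battery died in an hour.' ->"
    )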
swyx 6 days ago
fancyfredbot 5 days ago
swyx 5 days ago
andrekorol 6 days ago
I’m curious, is there also some specific existing “AI Researcher Reading List” you would personally recommend? Or do you plan on making and maintaining one?
swyx 6 days ago
jamalaramala 5 days ago
> 1. GPT1, GPT2, GPT3, Codex, InstructGPT, GPT4 papers. Self explanatory. (...)
> 2. Claude 3 and Gemini 1 papers to understand the competition. (...)
> 3. LLaMA 1, Llama 2, Llama 3 papers to understand the leading open models. (...)
I agree that you should have read most of these papers at the time they were released, but I wonder whether it would be that useful to read them now.
Perhaps it would be better to highlight one or two important papers from this section?