remix logo

Hacker Remix

Show HN: I built an AI that turns GitHub codebases into easy tutorials

911 points by zh2408 1 week ago | 171 comments

bilalq 1 week ago

This is actually really cool. I just tried it out using an AI studio API key and was pretty impressed. One issue I noticed was that the output was a little too much "for dummies". Spending paragraphs to explain what an API is through restaurant analogies is a little unnecessary. And then followed up with more paragraphs on what GraphQL is. Every chapter seems to suffer from this. The generated documentation seems more suited for a slightly technical PM moreso than a software engineer. This can probably be mitigated by refining the prompt.

The prompt would also maybe be better if it encouraged variety in diagrams. For somethings, a flow chart would fit better than a sequence diagram (e.g., a durable state machine workflow written using AWS Step Functions).

cushychicken 1 week ago

Answers like this are sort of what makes me wonder what most engineers are smoking when they think AI isn’t valuable.

I don’t think the outright dismissal of AI is smart. (And, OP, I don’t mean to imply that you are doing that. I mean this generally.)

I also suspect people who level these criticisms have never really used a frontier LLM.

Feeding in a whole codebase that I’m familiar with, and hearing the LLM give good answers about its purpose and implementation from a completely cold read is very impressive.

Even if the LLM never writes a line of code - this is still valuable, because helping humans understand software faster means you can help humans write software faster.

linotype 1 week ago

Many devs still think their job is to write code not build products their business needs. I use LLMs extensively and it’s helped me work better faster.

grugagag 1 week ago

LLMs excel at some things and work very poorly at others. People working on different problems have had different experiences, sometimes opposite ends of the spectrum.

danieldk 1 week ago

I think the people who claim 10x-100x productivity improvements are working on tasks where LLMs work really well. There is a lot of development work out there that is relatively simple CRUD and LLMs are very good at it. On the complete opposite end we have designing new algorithms/data structures or extending them in a novel way. Or implementing drivers for new hardware from incomplete specs. LLMs do not do well on these tasks or even slow down developers 10x.

So, I think the claims of improvement in productivity and regression in productivity can be true at the same time (and it's not just that people who don't find using LLMs productive are just prompting them wrong).

I think most can be gained by learning in which areas LLMs can give large productivity boosts and where it's better to avoid using them. Of course, this is a continuous process, given that LLMs are still getting better.

Personally, I am quite happy with LLMs. They cannot replace me, but they can do a chunk of the boring/repetitive work (e.g. boilerplate), so as a result I can focus on the interesting problems. As long as we don't have human-like performance (and I don't feel like we are close yet), LLMs make programming more interesting.

They are also a great learning aid. E.g., this morning I wanted to make a 3D model for something I needed, but I don't know OpenSCAD. I iteratively made the design with Claude. At some point the problem becomes too difficult for Claude, but with the code generated at that point, I have learned enough about OpenSCAD that I can fix the more difficult parts of the project. The project would have taken me a few hours (to learn the language, etc.), but now I was done in 30 minutes and learned some OpenSCAD in a pleasant way.

kaycebasques 1 week ago

Your OpenSCAD experience is an important point in the productivity debates that is often not discussed. A lot of projects that were previously impossible are now feasible. 10 years ago, you might have searched the OpenSCAD docs, watched videos, felt like it was impossible to find the info you needed, and given up. Claude and similar tools have gotten me past that initial blocker many times. Finding a way to unblock 0 to 1 productivity is perhaps as important (or maybe even more important than) as enabling 1 to 10 or 1 to 100.

iteria 1 week ago

You don't even need such fancy examples. There are plenty of codebases where people are working with code that is over a decade old and has several paradigms all intermixed with a lot of tribal knowledge that isn't documented in code or wiki. That is where AI sucks. It will not be able to make meaningfully change in that environment.

There is also the frontend and tnpse code bases don't need to be very old at all before AI falls down. NPM packages and clashing styles in a codebase and AI has been not very helpful to me at all.

Generally speaking, which AI is a fine enhancement to autocomplete, I haven't seen it be able to do anything more serious in a mature codebase. The moment business rules and tech debt sneak in in any capacity, AI becomes so unreliable that it's faster to just write it yourself. If I can't trust the AI to automatically generate a list of exports in an index.ts file. What can I trust it for?

simonw 1 week ago

When is the last time you tried using LLMs against a large, old, crufty undocumented codebase?

Things have changed a lot in the past six weeks.

Gemini 2.5 Pro accepts a million tokens and can "reason" with them, which means you can feed it hundreds of thousands of lines of code and it has a surprisingly good chance of figuring things out.

OpenAI released their first million token models with the GPT 4.1 series.

OpenAI o3 and o4-mini are both very strong reasoning code models with 200,000 token input limits.

These models are all new within the last six weeks. They're very, very good at working with large amounts of crufty undocumented code.

grugagag 7 days ago

Ultimately LLMs don’t really understand what the code does at runtime. Sure, just parsing out the codebase can help make a good guess but in some cases it’s hard to trust LLMs with changes because the consequences are unknown in complex codebases that have weird warts nobody documented.

Maybe in a generation or two codebases will become more uniform and predictible if fewer humans do it by hand. Same with self driving cars, if there were no human drivers out there the problem would become trivial to conquer.

simonw 7 days ago

That's a lot less true today than it was six weeks ago. The "reasoning" models are spookily good at answering questions about how code runs, and identifying the source of bugs.

They still make mistakes, and yeah they're still (mostly) next token predicting machines under the hood, but if your mental model is "they can't actually predict through how some code will execute" you may need to update that.

LunaSea 7 days ago

Gemini 2.5 Pro crashes with a 50) status code every 5 requests. Not great for a model you're supposed to rely on.

simonw 6 days ago

Yeah, there's a reason it still has "preview" and "experimental" in the model names.

kaycebasques 1 week ago

> hearing the LLM give good answers about its purpose and implementation from a completely cold read

Cold read ability for this particular tool is still an open question. As others have mentioned, a lot of the example tutorials are for very popular codebases that are probably well-represented in the language model's training data. I'm personally going to test it on my private, undocumented repos.

tossandthrow 1 week ago

> Even if the LLM never writes a line of code - this is still valuable, because helping humans understand software faster means you can help humans write software faster.

IMHO, Ai text additions are generally not valuable and I assume, until proven wrong, that Ai text provides little to no value.

I have seen so many startups fold after they made some ai product that on the surface level appeared impressive but provided no substantial value.

Now, I will be impressed by the ai that can remove code without affecting the product.

jonahx 1 week ago

> Now, I will be impressed by the ai that can remove code without affecting the product.

Current AIs can already do this decently. With the usual caveats about possible mistakes/oversight.

otabdeveloper4 6 days ago

Summarization is one thing LLM's can do well, yes. (That's not what this current hype cycle is selling though.)

kaycebasques 1 week ago

> Spending paragraphs to explain what an API is through restaurant analogies is a little unnecessary. And then followed up with more paragraphs on what GraphQL is.

It sounds like the tool (as it's currently set up) may not actually be that effective at writing tutorial-style content in particular. Tutorials [1] are usually heavily action-oriented and take you from a specific start point to a specific end point to help you get hands-on experience in some skill. Some technical writers argue that there should be no theory whatsoever in tutorials. However, it's probably easy to tweak the prompts to get more action-oriented content with less conceptual explanation (and exclamation marks).

[1] https://diataxis.fr/tutorials/

neop1x 6 days ago

>> This can probably be mitigated by refining the prompt

Sometimes it explains things like I am a child and sometimes it doesn't explain things well enough. I think fixing this just by a simple prompt change won't work - it may fix it in one part and make things worse in the other part. This is a problem which I have with LLM: you can fine-tune the prompt for a specific case but I find it difficult to write a universally-working prompt. The problem seems to be LLM "does not understand my intents", like it can't deduce what I need and "proactively" help. It follows requirements from the prompt but the prompt has to (and can't) handle all situations. I am getting tired of LLM.

hackernewds 1 week ago

exactly it is. I'd rather impressive but at the same time the audience is always going to be engineers, so perhaps it can be curated to still be technical to a degree? I can't imagine a scenario where I have to explain to the VP my ETL pipeline

trcf21 1 week ago

From flow.py

Ensure the tone is welcoming and easy for a newcomer to understand{tone_note}.

- Output only the Markdown content for this chapter.

Now, directly provide a super beginner-friendly Markdown output (DON'T need ```markdown``` tags)

So just a change here might do the trick if you’re interested.

But I wonder how Gemini would manage different levels. From my take (mostly edtech and not in English) it’s really hard to tone the answer properly and not just have a black and white (5 year old vs expert talk) answer. Anyone has advice on that?

porridgeraisin 1 week ago

This has given me decent success:

"Write simple, rigorous statements, starting from first principles, and making sure to take things to their logical conclusion. Write in straightforward prose, no bullet points and summaries. Avoid truisms and overly high-level statements. (Optionally) Assume that the reader {now put your original prompt whatever you had e.g 5 yo}"

Sometimes I write a few more lines with the same meaning as above, and sometimes less, they all work more or less OK. Randomly I get better results sometimes with small tweaks but nothing to make a pattern out of -- a useless endeavour anyway since these models change in minute ways every release, and in neural nets the blast radius of a small change is huge.

trcf21 1 week ago

Thanks I’ll try that!

swashbuck1r 1 week ago

While the doc generator is a useful example app, the really interesting part is how you used Cursor to start a PocketFlow design doc for you, then you fine-tuned the details of the design doc to describe the PocketFlow execution graph and utilities you wanted the design of the doc-generator to follow…and then you used used Cursor to generate all the code for the doc-generator application.

This really shows off that the simple node graph, shared storage and utilities patterns you have defined in your PocketFlow framework are useful for helping the AI translate your documented design into (mostly) working code.

Impressive project!

See design doc https://github.com/The-Pocket/Tutorial-Codebase-Knowledge/bl...

And video https://m.youtube.com/watch?v=AFY67zOpbSo

mooreds 1 week ago

I had not used gemini before, so spent a fair bit of time yak shaving to get access to the right APIs and set up my Google project. (I have an OpenAPI key but it wasn't clear how to use that service.)

I changed it to use this line:

   api_key=os.getenv("GEMINI_API_KEY", "your-api_key")
instead of the default project/location option.

and I changed it to use a different model:

    model = os.getenv("GEMINI_MODEL", "gemini-2.5-pro-preview-03-25")
I used the preview model because I got rate limited and the error message suggested it.

I used this on a few projects from my employer:

- https://github.com/prime-framework/prime-mvc a largish open source MVC java framework my company uses. I'm not overly familiar with this, though I've read a lot of code written in this framework.

- https://github.com/FusionAuth/fusionauth-quickstart-ruby-on-... a smaller example application I reviewed and am quite familiar with.

- https://github.com/fusionauth/fusionauth-jwt a JWT java library that I've used but not contributed to.

Overall thoughts:

Lots of exclamation points.

Thorough overview, including of some things that were not application specific (rails routing).

Great analogies. Seems to lean on them pretty heavily.

Didn't see any inaccuracies in the tutorials I reviewed.

Pretty amazing overall!

mooreds 1 week ago

If you want to see what output looks like (for smaller projects--the OP shared some for other, more popular projects), I posted a few of the tutorials to my GitHub:

https://github.com/mooreds/prime-mvc-tutorial

https://github.com/mooreds/railsquickstart-tutorial

https://github.com/mooreds/fusionauth-jwt-tutorial/

Other than renaming the index.md file to README.md and modifying it slightly, I made no changes.

Edit: added note that there are examples in the original link.

mooreds 1 week ago

Update, billing was delayed, but for 4 tutorials it cost about $5.

manofmanysmiles 1 week ago

I love it! I effectively achieve similar results by asking Cursor lots of questions!

Like at least one other person in the comments mentioned, I would like a slightly different tone.

Perhaps good feature would be a "style template", that can be chosen to match your preferred writing style.

I may submit a PR though not if it takes a lot of time.

zh2408 1 week ago

Thanks—would really appreciate your PR!