Hacker News

Launch HN: Sweep (YC S23) – A bot to create simple PRs in your codebase

198 points by williamzeng0 2 years ago | 114 comments

Hi HN! We’re William and Kevin, cofounders of Sweep (https://sweep.dev/). Sweep is an open-source AI-powered junior developer. You describe a feature or bugfix in a GitHub issue and Sweep writes a pull request with code. You can see some examples here: https://docs.sweep.dev/examples.

Kevin and I met while working at Roblox. We talked to our friends who were junior developers and noticed a lot of them doing grunt work. We wanted to let them focus on important work. Copilot is great, but we realized some tasks could be completely offloaded to an AI (e.g. adding a banner to your webpage https://github.com/sweepai/landing-page/issues/225).

Sweep does this with a code search engine. We use code chunking, ranking, and formatting tricks to represent your codebase in a token-efficient manner for LLMs. You might have seen our blog on code chunking here: https://news.ycombinator.com/item?id=36948403.
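The chunking idea can be sketched roughly like this. This is a naive line-based illustration only; the real chunker described in the linked blog works on syntax trees, and `chunk_by_toplevel` and its line budget are hypothetical:

```python
def chunk_by_toplevel(source: str, max_lines: int = 40) -> list[str]:
    """Split source code into chunks that respect top-level
    definition boundaries instead of cutting functions in half."""
    chunks: list[str] = []
    current: list[str] = []
    for line in source.splitlines():
        # A non-indented, non-empty line starts a new top-level block
        starts_block = bool(line) and not line[0].isspace()
        if current and starts_block and len(current) >= max_lines:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks
```

The point is that a chunk boundary only falls where a new top-level definition begins, so no function body is split across chunks.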

We take these fetched code snippets and come up with a plan to write the PR. We found that having the LLM provide structured information using XML tags is very robust, as it’s easy for us to parse with regex, has good support for multi-line answers, and is hard for the LLM to mess up.

This is because XML is common in the LLM’s training data (the internet / HTML), and the opening and closing tags rarely appear naturally in text and code, unlike the quotations, brackets, backticks and newlines used by JSON’s and markdown’s delimiters. Further, XML lets you skip the preamble (“This question has to do with xyz. Here is my answer:”) and handles multi-line answers like PR plans and code really well. For example, we ask the LLM for the new code in <new_code> tags and a final boolean answer by writing <answer>True</answer>.

We use this XML format to get the LLM to create a plan, generating a list of files to create and modify from the retrieved relevant files. We iterate through the file changes and edit/create the necessary files. Finally, we push the commits to GitHub and create the PR.

We’ve been using Sweep to handle small issues in Sweep’s own repo (it recently passed 100 commits). We’ve become well acquainted with its limitations. For example, Sweep sometimes leaves unimplemented functions with just “# rest of code”, since it runs on GPT-4, a model tuned for chatting. Other times, there are minor syntax errors or undefined variables. This is why we spend the other half of our time building self-recovery methods for Sweep to fix and test its PRs.

First, we invite the developer to review and add comments to Sweep’s pull request. This helps to a point, but Sweep’s code sometimes wouldn’t lint. This is table stakes. It’s frustrating to have to tell the bot to “add an import here” or “this variable is undefined”. To make this better, we used GitHub Actions, which automatically runs the flow of “check the code → tell sweep → sweep fixes the code → check the code again”. We like this flow because you might already have GitHub Actions, and it’s fully configurable. Check out this blog to learn more https://docs.sweep.dev/blogs/giving-dev-tools.
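The shape of that loop, stripped to its essentials (the `check` and `fix_with_llm` callbacks here are hypothetical stand-ins for the GitHub Actions run and Sweep's fix step):

```python
def self_recovery_loop(check, fix_with_llm, max_attempts: int = 3) -> bool:
    """Run 'check the code -> tell the bot -> bot fixes -> check again'
    until the checks pass or we give up."""
    for _ in range(max_attempts):
        ok, log = check()          # e.g. a lint/test job in CI
        if ok:
            return True            # CI is green, nothing left to fix
        fix_with_llm(log)          # feed the failure log back to the bot
    return False

# Toy usage: each "fix" clears one lint error.
state = {"errors": 2}

def check():
    return state["errors"] == 0, f"{state['errors']} lint errors"

def fix_with_llm(log):
    state["errors"] -= 1
```

Bounding the attempts matters: without a cap, a bot that keeps failing the same check would loop forever.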

So far, Sweep isn’t that fast, can’t handle massive problems yet, and doesn’t write hundreds of lines of code. We’re excited to work towards that. In the meantime, a lot of our users have been able to get useful results. For example, a user reported that an app was not working correctly on Windows, and Sweep wrote the PR at https://github.com/sweepai/sweep/pull/368/files, replacing all occurrences of "/tmp" with "tempfile.gettempdir()". Other examples include adding a validation function for GitHub branch names (https://github.com/sweepai/sweep/pull/461) and adding dynamically generated initials in the testimonials on our landing page (https://github.com/wwzeng1/landing-page/issues/28). For more examples, check out https://docs.sweep.dev/examples.
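The Windows fix in that PR boils down to one portable call (the `sweep_cache.json` filename below is just an example, not taken from the PR):

```python
import os
import tempfile

# Hard-coding "/tmp" fails on Windows, where that directory does not
# exist. tempfile.gettempdir() resolves the right location on every
# platform (/tmp on most Unixes, %TEMP% on Windows).
cache_path = os.path.join(tempfile.gettempdir(), "sweep_cache.json")
```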

Our focus is on finding ways that an AI dev can actually help and not just be a novelty. I think of my daily capacity to write good code as a stamina bar. There’s a fixed cost to opening an IDE, finding the right lines of code, and making changes. If you’re working on a big feature and have to context switch, the cost is higher. I’ve been leaving the small changes to Sweep, and my stamina bar stays full for longer.

Our repo is at https://github.com/sweepai/sweep, there’s a demo video at https://www.youtube.com/watch?v=WBVna_ow8vo, and you can install Sweep here: https://github.com/apps/sweep-ai. We currently have a freemium model, with 5 GPT-4 PRs at the free tier, 120 GPT-4 PRs at the paid tier and unlimited at the enterprise tier.

We’re far from our vision of a full AI software engineer, but we’re excited to work on it with the community feedback :). Looking forward to hearing any of your thoughts!

mellosouls 2 years ago

Sweep is an open-source AI-powered junior developer.

I think you should be less bullish on the "open source". As you kindly clarified for me the other day when I asked [1], only the client is open source. The back end is closed (that's Sweep back end, not the LLM which is obvious) and the product as a whole cannot be self-hosted by third parties.

That's fine (though not clear what the benefit is to developers who might want to contribute), but at the moment the impression being given is that this is an open source product.

Of course, if I have misunderstood I'll be happy to be corrected.

I wish you the best with it as this seems like a very cool product even if it's closed at core - but a lack of clarity now may undermine reception and goodwill later.

[1] https://news.ycombinator.com/item?id=36953720

mellosouls 2 years ago

Update (I'm unable to edit my comment); it appears I was wrong and Sweep is fully open-source. My apologies.

See the comment below:

https://news.ycombinator.com/item?id=37002341

williamzeng0 2 years ago

Appreciate the correction, and apologies for the lack of clarity!

moneywoes 2 years ago

Sounds like the backend is GPT-4?

mellosouls 2 years ago

No, the backend (as I understand it, please check my linked question) is closed source Sweep-core plus GPTx.

If this was open source, Sweep-core should not be closed source; the whole thing (minus GPT3+ obviously) should be self-hostable.

williamzeng0 2 years ago

Sorry I think you misunderstood.

Sweep’s logic is fully open and it’s self-hostable, but we’ve been focusing on Sweep’s capabilities (not on self-hosting), so we haven’t provided docs.

Because Sweep runs entirely in GitHub, it’s easy to install but annoying to self-host. You’d need to set up Modal and create a new GitHub app.

Definitely doable, some of our community members were able to do it.

mellosouls 2 years ago

In that case, my apologies for misrepresenting the product - but I think the clarity was lacking rather than me misunderstanding. Here's my question, which specifically asked about self-hosting for that reason, and your answer (from the link), which seemed to imply no, only the GitHub hook part:

Q: Can you run it fully self-hosted (apart from the GPT4 engine obv), or is the repo essentially a client to a Sweep API/binary?

A: The repo is just the backend that runs the GitHub webhooks. [...clip...] Now it's only the GitHub interface with creating tickets and comments.

Anyway, if it is fully self-hostable (minus the LLM endpoint) that's terrific, and I will have a go at it.

williamzeng0 2 years ago

Definitely our fault on this one. We also deprecated the client recently, because we want to focus entirely on GitHub issues.

Check out our discord for help! There's a couple of people trying it now :D. Happy to answer questions when we have the time, and I'll point you to the person who set it up themselves.

gcanyon 2 years ago

Your demo video https://www.youtube.com/watch?v=WBVna_ow8vo is ridiculously compelling. You need to make a better version of the video, and maybe a few more of them.

cdcarter 2 years ago

I do think it's a pretty cool demo, but I have to say I didn't love that the PR claims Sweep did "manual testing" of the fix. Additionally, Sweep reviews the PR and claims that the function is correctly implemented. A sibling points out that there's actually no testing added or done, and there are also issues with the implementation itself. This appears to be a general issue with GPT-4-based products: they are extremely self-confident in their language. Presumably this stems from the overall training to work well as a chatbot.

It's very cool that it inferred the right place to make the change and the steps of finding relevant code, making a plan, then doing it are things I wish all my junior developers would do! This is certainly moving in the right direction.

kevinlu1248 2 years ago

Yup, it's a bit frustrating, since it's a problem with LLMs, RLHF, and fine-tuning for chat. In fact, we also added an instruction to the prompt not to say that it did testing. I find that in general it seems really difficult to tell a language model (especially 3.5) not to do something.

The self-review generally catches stuff like this since we tell it that this code is written by an inexperienced developer, so that Sweep becomes more critical.

williamzeng0 2 years ago

Much appreciated! I just got a new mic so the audio won't be so bad.

What kinds of videos would you like? We can make anything; the two repos we use the most are Sweep itself and our landing page.

csmpltn 2 years ago

The code produced in the "getInitials" function handles absolutely no corner cases whatsoever. It also didn't add any tests to the PR.
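For comparison, a defensive version is only a few lines. This Python sketch is hypothetical (the function in the PR is JavaScript on the landing page), but it shows the corner cases being handled: empty input, repeated whitespace, and long names.

```python
def get_initials(name: str, max_letters: int = 2) -> str:
    """Hardened take on the PR's getInitials: tolerates empty
    input and extra whitespace, and caps the output length."""
    parts = [p for p in name.split() if p]
    return "".join(p[0].upper() for p in parts[:max_letters])
```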

All this does is make sure your website will crap all over itself 2 weeks into using this tool (death-by-a-thousand-cuts style), and you'll need to hire more people to fix whatever this thing fucks up. Just about the opposite of what automation is supposed to help with.

Good luck!

eldavido 2 years ago

I think we're going to see a lot of this. I worked in self-driving and the stuff always 95% worked. Never 100.

This is useful in some ways. Thinking about situations like pre-release software testing, there are exploratory test cases that are simply too numerous for a human to ever perform economically. A lot of AI is going to do this kind of low-value grunt work, where it doesn't matter whether it's 90% or 99% correct; what matters is that it can get done at all. A lot of this work is "additive" in the sense that it's just too expensive to do today (with a human).

The work product of these systems is best seen as a "rough draft" or "suggestion". It's a first cut, not the last word.

On the other hand, a lot (most?) of the meat-and-potatoes coding done today, is situations where things have to WORK. Stuff where correctness absolutely matters--billing/money/settlement (calculating tax, handling returns, moving money between accounts), a lot of OS code for things like memory management / locking / resource management, drug dosing, reservation management, etc.

Granted, this stuff is a lot more complex and nuanced than the code of an average CRUD app, but then, I also don't spend my days implementing bcrypt, quicksort, or self-synchronizing Unicode parsing. We have libraries for that. The question is whether we're better off relying on agents to write a bunch of grunty code, or coming up with better top-level organization / code structures so that doing it "by hand" is the better approach.

I'm actually optimistic that we can do better code-wise. But I'd love to see how things develop. Maybe we wouldn't need AI if we just had better programming languages.

kevinlu1248 2 years ago

I think for teams that want to move fast in a non-critical environment (health, finance etc.) something that works 90% of the time is fine. Getting to 95% takes twice the amount of time but does not provide twice the value. When the 5% difference becomes the difference maker we can fix it later.

Further, we're adding better test systems to Sweep. For now, you can just comment to get Sweep to cover the edge cases and write tests. Happy to take any other feedback.

andrejcasey 2 years ago

Sorry but unless the core business is making statistical predictions, then you're wrong. Other industries (like health and finance) still need robust applications with like 99.99% uptime.

twelve40 2 years ago

Ok, I get the skepticism, but what I liked about their description is that it's not the overblown hysterical "AI superhuman programmer" pitch, but a more modest "junior" angle. If they keep looking for something that clicks, there are lots of "junior" niches that could be filled - for example, I can see that thing automatically working to beef up the test coverage. It's kind of difficult to screw up tests, the potential fallout is low, and there is an unambiguous number (coverage) as the success criterion. If we look around our daily developer lives, there might be more cases that could be automated with this, even if it doesn't ever become good enough for general programming.

williamzeng0 2 years ago

Something I'm a big fan of is making a small successful ticket (for example migrate the functions in one file) and then applying it map-reduce style across the entire codebase. This could help a lot, and by definition addresses repetitive work.

We have this (to some degree) with issue templates, where you can pre-populate some text and fill in the rest. We're also thinking about good ways to offload that work to Sweep.

williamzeng0 2 years ago

That's completely right, the testimonials will look really strange if the names have 3+ words in them. That's why we're targeting really strong developers to review Sweep's PRs. An experienced dev (like you) will be able to read the code, think "hey, this needs tests and edge cases" and then request changes instead of merging it.

marktani 2 years ago

Thanks for staying constructive and on topic. Super interesting tool and amazing video!

Does Sweep also take in suggestions and then incorporate them with follow-up commits to the PR?

williamzeng0 2 years ago

Yes Sweep does! It's through file comments and PR comments. We also handle failing GitHub actions.

gcanyon 2 years ago

Hence why:

1. They refer to it as an "AI junior developer"

2. It creates pull requests, not commits

From (2), your problem is with the person who commits this code without modification, not with the AI.

williamzeng0 2 years ago

That would be me, completely happy to take the blame here :) We manually update the content here so the code works just fine for now.

huijzer 2 years ago

I think this makes sense. I've seen many situations in large software projects where some bug is open for months or even years and is actually very easy to fix. In hindsight, it was a lot of missed value if the bug just lingered around for no good reason. If there was some tool that could just run in the background and randomly pop up a PR from time to time, that would be cool.

Good luck!

williamzeng0 2 years ago

Yep, these bugs can be trivial, but that initial context switch, creating a branch, etc. tends to drain your energy.

Sweep can do this right now, you just have to label the issue yourself. We require the label so you don't get flooded with PRs if you have a lot of open issues.

victorantos 2 years ago

The risk of introducing new bugs while fixing old ones should not be underestimated. Software development is a delicate process, and even seemingly minor bug fixes can have unintended consequences. Striking a balance between bug fixing and feature development is crucial to maintain a stable and reliable codebase.

dottedmag 2 years ago

CC-NC-SA is not an open-source license. Please do not use "open source" to describe your software in your marketing materials.

novawhisper23 2 years ago

These AI startups need to resort to slimy measures to attract people to their useless product. What else is new?