198 points by williamzeng0 2 years ago | 114 comments
Kevin and I met while working at Roblox. We talked to our friends who were junior developers and noticed a lot of them doing grunt work. We wanted to let them focus on important work. Copilot is great, but we realized some tasks could be completely offloaded to an AI (e.g. adding a banner to your webpage https://github.com/sweepai/landing-page/issues/225).
Sweep does this with a code search engine. We use code chunking, ranking, and formatting tricks to represent your codebase in a token-efficient manner for LLMs. You might have seen our blog on code chunking here: https://news.ycombinator.com/item?id=36948403.
We take these fetched code snippets and come up with a plan to write the PR. We found that having the LLM provide structured information using XML tags is very robust, as it’s easy for us to parse with regex, has good support for multi-line answers and is hard for the LLM to mess up.
This is because XML is common in the LLM’s training data (the internet / HTML), and the opening and closing tags rarely appear naturally in text and code, unlike the quotations, brackets, backticks and newlines used by JSON’s and markdown’s delimiters. Further, XML lets you skip the preamble (“This question has to do with xyz. Here is my answer:”) and handles multi-line answers like PR plans and code really well. For example, we ask the LLM for the new code in <new_code> tags and a final boolean answer by writing <answer>True</answer>.
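As a rough illustration of the regex parsing described above, here is a minimal Python sketch. The `extract_tag` helper and the sample response are hypothetical and are not taken from Sweep's codebase; only the `<new_code>` and `<answer>` tag names come from the post.

```python
import re
from typing import Optional


def extract_tag(response: str, tag: str) -> Optional[str]:
    """Return the text inside the first <tag>...</tag> pair, or None if absent."""
    # re.DOTALL lets "." match newlines, so multi-line answers are captured.
    pattern = rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>"
    match = re.search(pattern, response, re.DOTALL)
    return match.group(1).strip() if match else None


# Hypothetical LLM output: the preamble outside the tags is simply ignored.
response = """This question has to do with greetings. Here is my answer:
<new_code>
def greet(name):
    return f"Hello, {name}!"
</new_code>
<answer>True</answer>"""

new_code = extract_tag(response, "new_code")
answer = extract_tag(response, "answer") == "True"
```

Because the parser only looks between the tags, any chatty preamble or trailing commentary from the model does not affect the result.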
We use this XML format to get the LLM to create a plan, generating a list of files to create and modify from the retrieved relevant files. We iterate through the file changes and edit/create the necessary files. Finally, we push the commits to GitHub and create the PR.
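One way such a plan could be turned into a list of file changes is sketched below. The `<create file="...">` and `<modify file="...">` tags, the `FileChange` class, and the file paths are assumptions made for illustration; the post does not specify Sweep's actual plan schema.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class FileChange:
    action: str        # "create" or "modify"
    path: str          # file the change applies to
    instructions: str  # free-text instructions for that file


# Hypothetical plan format: one <create> or <modify> element per file.
PLAN_PATTERN = re.compile(
    r'<(create|modify) file="([^"]+)">(.*?)</\1>', re.DOTALL
)


def parse_plan(plan_text: str) -> List[FileChange]:
    """Parse every create/modify element in the plan, in document order."""
    return [
        FileChange(action, path, body.strip())
        for action, path, body in PLAN_PATTERN.findall(plan_text)
    ]


plan = """
<create file="utils/banner.py">
Add a render_banner() helper.
</create>
<modify file="pages/index.py">
Call render_banner() at the top of the page.
</modify>
"""

changes = parse_plan(plan)
for change in changes:
    print(change.action, change.path)
```

Iterating over `changes` then corresponds to the "iterate through the file changes" step, with each entry handled by a separate edit or create call.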
We’ve been using Sweep to handle small issues in Sweep’s own repo (it recently passed 100 commits). We’ve become well acquainted with its limitations. For example, Sweep sometimes leaves unimplemented functions with just “# rest of code” since it runs on GPT-4, a model tuned for chatting. Other times, there are minor syntax errors or undefined variables. This is why we spend the other half of our time building self-recovery methods for Sweep to fix and test its PRs.
First, we invite the developer to review and add comments to Sweep’s pull request. This helps to a point, but Sweep’s code sometimes wouldn’t lint. This is table stakes. It’s frustrating to have to tell the bot to “add an import here” or “this variable is undefined”. To make this better, we used GitHub Actions, which automatically runs the flow of “check the code → tell sweep → sweep fixes the code → check the code again”. We like this flow because you might already have GitHub Actions, and it’s fully configurable. Check out this blog to learn more https://docs.sweep.dev/blogs/giving-dev-tools.
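The “check the code → tell sweep → sweep fixes the code → check the code again” flow can be sketched locally as a bounded retry loop. This is only an illustration under stated assumptions: the real flow runs through GitHub Actions, whereas here the check is Python's built-in `compile()` and `fake_fix` is a hard-coded stand-in for the LLM. All function names are hypothetical.

```python
from typing import Callable, Optional, Tuple


def check_syntax(source: str) -> Optional[str]:
    """Return an error message if the source fails to compile, else None."""
    try:
        compile(source, "<pr>", "exec")
        return None
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"


def fix_until_clean(
    source: str,
    check: Callable[[str], Optional[str]],
    fix: Callable[[str, str], str],
    max_rounds: int = 3,
) -> Tuple[str, bool]:
    """Run check -> fix -> check again, giving up after max_rounds fixes."""
    for _ in range(max_rounds):
        error = check(source)
        if error is None:
            return source, True
        source = fix(source, error)  # "tell sweep" and take its revised code
    return source, check(source) is None


def fake_fix(source: str, error: str) -> str:
    # Stand-in for the LLM: it only knows how to add one missing colon.
    return source.replace("def add(a, b)\n", "def add(a, b):\n")


broken = "def add(a, b)\n    return a + b\n"
fixed, ok = fix_until_clean(broken, check_syntax, fake_fix)
```

Capping the number of rounds keeps the loop from running indefinitely when the fixer cannot resolve an error.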
So far, Sweep isn’t that fast, can’t handle massive problems yet, and doesn’t write hundreds of lines of code. We’re excited to work towards that. In the meantime, a lot of our users have been able to get useful results. For example, a user reported that an app was not working correctly on Windows, and Sweep wrote the PR at https://github.com/sweepai/sweep/pull/368/files, replacing all occurrences of "/tmp" with "tempfile.gettempdir()". Other examples include adding a validation function for GitHub branch names (https://github.com/sweepai/sweep/pull/461) and adding dynamically generated initials in the testimonials on our landing page (https://github.com/wwzeng1/landing-page/issues/28). For more examples, check out https://docs.sweep.dev/examples.
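For readers unfamiliar with the Windows fix mentioned above, this minimal sketch shows the general pattern of replacing a hard-coded "/tmp" with `tempfile.gettempdir()`. The `cache_path` helper and the file name are hypothetical and are not taken from the linked PR.

```python
import os
import tempfile


def cache_path(filename: str) -> str:
    """Build a path inside the platform's temp directory.

    A hard-coded "/tmp/..." path assumes a Unix-style filesystem;
    tempfile.gettempdir() returns the temp directory for the current OS.
    """
    return os.path.join(tempfile.gettempdir(), filename)


path = cache_path("repo.zip")
print(path)
```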
Our focus is on finding ways that an AI dev can actually help and not just be a novelty. I think of my daily capacity to write good code as a stamina bar. There’s a fixed cost to opening an IDE, finding the right lines of code, and making changes. If you’re working on a big feature and have to context switch, the cost is higher. I’ve been leaving the small changes to Sweep, and my stamina bar stays full for longer.
Our repo is at https://github.com/sweepai/sweep, there’s a demo video at https://www.youtube.com/watch?v=WBVna_ow8vo, and you can install Sweep here: https://github.com/apps/sweep-ai. We currently have a freemium model, with 5 GPT-4 PRs at the free tier, 120 GPT-4 PRs at the paid tier and unlimited at the enterprise tier.
We’re far from our vision of a full AI software engineer, but we’re excited to work on it with the community feedback :). Looking forward to hearing any of your thoughts!
mellosouls 2 years ago
I think you should be less bullish on the "open source". As you kindly clarified for me the other day when I asked [1], only the client is open source. The back end is closed (that's Sweep back end, not the LLM which is obvious) and the product as a whole cannot be self-hosted by third parties.
That's fine (though not clear what the benefit is to developers who might want to contribute), but at the moment the impression being given is that this is an open source product.
Of course, if I have misunderstood I'll be happy to be corrected.
I wish you the best with it as this seems like a very cool product even if it's closed at core - but a lack of clarity now may undermine reception and goodwill later.
mellosouls 2 years ago
See the comment below:
williamzeng0 2 years ago
moneywoes 2 years ago
mellosouls 2 years ago
If this was open source, Sweep-core should not be closed source; the whole thing (minus GPT3+ obviously) should be self-hostable.
williamzeng0 2 years ago
Sweep’s logic is fully open and it’s self-hostable, but we’ve been focusing on the capabilities of Sweep (not on self-hosting), so we haven’t provided docs.
Because Sweep runs entirely in GitHub, it’s easy to install but annoying to self-host. You’d need to set up Modal and create a new GitHub App.
Definitely doable, some of our community members were able to do it.
mellosouls 2 years ago
Q: Can you run it fully self-hosted (apart from the GPT4 engine obv), or is the repo essentially a client to a Sweep API/binary?
A: The repo is just the backend that runs the GitHub webhooks. [...clip...] Now it's only the GitHub interface with creating tickets and comments.
Anyway, if it is fully self-hostable (minus the LLM endpoint) that's terrific, and I will have a go at it.
williamzeng0 2 years ago
Check out our discord for help! There's a couple of people trying it now :D. Happy to answer questions when we have the time, and I'll point you to the person who set it up themselves.
gcanyon 2 years ago
cdcarter 2 years ago
It's very cool that it inferred the right place to make the change and the steps of finding relevant code, making a plan, then doing it are things I wish all my junior developers would do! This is certainly moving in the right direction.
kevinlu1248 2 years ago
The self-review generally catches stuff like this since we tell it that this code is written by an inexperienced developer, so that Sweep becomes more critical.
williamzeng0 2 years ago
What kinds of videos would you like? We can make anything, the two repos we use the most are Sweep itself and our landing page
csmpltn 2 years ago
All this does is make sure your website will crap all over itself 2 weeks into using this tool (death by a thousand cuts style) and you'll need to hire more people to fix whatever this thing fucks up. Just about the opposite of what automation is supposed to help with.
Good luck!
eldavido 2 years ago
This is useful in some ways. Thinking about situations like pre-release software testing, there are exploratory test cases that are simply too numerous to ever have a human perform economically. A lot of AI is going to do this kind of very low-valued grunt work where it doesn't matter if it's 90% or 99% correct, it's the fact that it can get done at all. A lot of this work is "additive" in the sense that, it's just too expensive to do today (with a human).
The work product of these systems is best seen as a "rough draft" or "suggestion". It's a first cut, not the last word.
On the other hand, a lot (most?) of the meat-and-potatoes coding done today, is situations where things have to WORK. Stuff where correctness absolutely matters--billing/money/settlement (calculating tax, handling returns, moving money between accounts), a lot of OS code for things like memory management / locking / resource management, drug dosing, reservation management, etc.
Granted, this stuff is a lot more complex and nuanced than the code of an average CRUD app, but then, I also don't spend my days implementing bcrypt, quicksort, or self-synchronizing Unicode parsing. We have libraries for that. The question is whether we're better off relying on agents to write a bunch of grunty code, or coming up with better top-level organization / code structures, such that doing it "by hand" is the better approach.
I'm actually optimistic that we can do better code-wise. But I'd love to see how things develop. Maybe we wouldn't need AI if we just had better programming languages.
kevinlu1248 2 years ago
Further, we're adding better test systems to Sweep. For now, you can just comment to get Sweep to cover the edge cases and write tests. Happy to take any other feedback.
andrejcasey 2 years ago
twelve40 2 years ago
williamzeng0 2 years ago
We have this (to some degree) with issue templates, where you can pre-populate some text and fill in the rest. We're also thinking about good ways to offload that work to Sweep.
williamzeng0 2 years ago
marktani 2 years ago
Does Sweep also take in suggestions and then incorporate them with follow-up commits to the PR?
williamzeng0 2 years ago
gcanyon 2 years ago
1. They refer to it as an "AI junior developer" 2. It creates pull requests, not commits
From (2), your problem is with the person who commits this code without modification, not with the AI.
williamzeng0 2 years ago
huijzer 2 years ago
Good luck!
williamzeng0 2 years ago
Sweep can do this right now, you just have to label it yourself. We're doing this right now so you don't get flooded with PRs if you have a lot of open issues.
victorantos 2 years ago
dottedmag 2 years ago
novawhisper23 2 years ago