Will AI agents be able to regularly code small features for us in a year?
💎
Premium
333
Ṁ410k
Jul 2
98%
chance

I'm thinking of something like https://mentat.ai/, but that actually works.

I will provide a paragraph or so describing the change I want made. Then it should create a GitHub PR, which I will review and leave only a few comments before merging. The whole process should take less than 30 minutes. This should work fairly reliably.

I tried this yesterday and it failed haha:
https://github.com/manifoldmarkets/manifold/pull/2694

See more discussion in my post:

https://jamesgrugett.com/p/software-automation-will-make-us


Here's another good one-shot PR from Cursor's background agent. It adds the ability for admins/mods to 'delete' spam comments so that they aren't rendered at all, unlike the 'hide' feature, which still renders the hidden comments: https://github.com/manifoldmarkets/manifold/pull/3600

This took a minute to prompt, 5m for cursor to come up with a solution, and 5-10m to test to make sure it worked.

This was a really good experience! I used Cursor's background agent to add a minimum bet filter to the trades tab, and it produced a good start in 5 minutes. I then tested it and prompted it to get rid of pagination and use infinite scroll instead. Done in less than 20 minutes! https://github.com/manifoldmarkets/manifold/pull/3599

bought Ṁ5,000 YES

This looks good to me, stephen gave it two prompts to create this and I think it took less than 10 mins https://github.com/manifoldmarkets/manifold/pull/3588

@ian Looks like we need another prompt to fix the type error, should come in well under 30 mins still, though

bought Ṁ50 NO

@ian Initial comment was more than 30 minutes ago, so this is a failure

bought Ṁ1,000 NO
bought Ṁ2,500 YES at 93%

@CalibratedNeutral oh we stopped paying attention

@CalibratedNeutral I don't know if stephen told it to fix the type error

@ian the key to vibe-coding is to stay just the right amount drunk and not to overdo it

bought Ṁ50 YES

Claude 4 with github I think does what the mentat.ai thing you linked does

bought Ṁ250 NO

@ian do you have access to chatgpt plus or pro and would be willing to see how codex-1 fares? it's currently only accessible on pro and teams iirc but will be accessible to plus probably before the market closes

bought Ṁ5,000 YES

GPT 4.1 is awesome for coding.

It's genuinely really good. (mini is ok, nano is dogwater). I have been using it off azure with cursor both as assist and tedious implementation speedrunner - it's one-shot so many instructions that 4o would have a bad time with, and that claude would overthink.

Not tab complete, mostly just asking stuff. Really has come a long way with code

Crazy how ai agents are regularly building small features for me almost daily and this market is still at 80%

@DarklyMade is this code peer reviewed?

@Kire_ of course! The peer review AI looks at it!

I'd like to conduct some tests using codebuff/cursor. What are acceptable small features in your mind? I have a couple ideas:
- add a button to the comments bottom row that allows users to tip the commenter. Denormalize the tip amount onto the comment and display the total tipped amount on the button.
- Add a delete button for admins/mods that marks a comment as deleted (don't actually delete the comment, just set the deleted flag and hidden flags both) that hides the comment completely from the market.
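
For a sense of scale, the two ideas above might come down to something like the sketch below. This is only an illustration: the `MarketComment` shape, the `Role` type, and every function name here are hypothetical and not taken from Manifold's actual codebase.

```typescript
// Hypothetical comment shape; field names are assumptions, not Manifold's real schema.
interface MarketComment {
  id: string
  userId: string
  text: string
  hidden: boolean
  deleted: boolean
  tipTotal: number // denormalized running total of tips on this comment
}

// Hypothetical role type used only for this sketch.
type Role = 'user' | 'mod' | 'admin'

// Soft delete: the comment is kept, but both the deleted and hidden flags are set.
function softDeleteComment(comment: MarketComment, role: Role): MarketComment {
  if (role !== 'admin' && role !== 'mod') {
    throw new Error('Only admins and mods can delete comments')
  }
  return { ...comment, hidden: true, deleted: true }
}

// Tip: add the amount to the denormalized total stored on the comment.
function addTip(comment: MarketComment, amount: number): MarketComment {
  if (!(amount > 0)) {
    throw new Error('Tip amount must be positive')
  }
  return { ...comment, tipTotal: comment.tipTotal + amount }
}

// Deleted comments are dropped entirely before rendering.
function visibleComments(comments: MarketComment[]): MarketComment[] {
  return comments.filter((c) => !c.deleted)
}
```

The soft-delete approach keeps the comment around for auditing while `visibleComments` keeps it out of the market entirely, and the tip button would read `tipTotal` directly instead of summing tips on each render.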

@JamesGrugett said the delete comment button for spam fit the bill, I'll try using codebuff to do this soon

@ian a "view results" button on polls?

@cthor Also seems reasonable!

@ian I am aware that you work on Manifold, but since you are also the largest YES holder can we maybe agree to let @JamesGrugett do these kinds of evaluations once time comes.

@CalibratedNeutral That sounds reasonable, although he doesn't work at manifold anymore so I'm not sure if he'll want to put 30 mins in to do this. I was going to film my attempt from scratch

@CalibratedNeutral I was not aware of that. Then maybe a third party (another developer working on Manifold)? The stakes are reasonably high for me, so I really would strongly prefer to have everything as unbiased as possible.

@CalibratedNeutral We might be able to get @SG or @SirSalty to do it

@CalibratedNeutral Alternatively, @JamesGrugett could test this question on his new startup, codebuff. He uses codebuff to help develop codebuff

@ian Either option sounds good to me as long as the resolution criteria are followed according to @JamesGrugett's judgement

@ian how tf did you get the dead head badge?
