These AI Agents Ain't It. Our Developer Jobs Are Safe (for now)!

• 293 views

After spending a bit more time with Cursor trying to refactor code with agent mode, I can say with a high degree of confidence:

Our jobs are safe.

For now, at least. Here's what my AI agent refactoring experience was like.

📄 Auto-Generated Transcript

Transcript is auto-generated and may contain errors.

Hey folks, I'm just headed to CrossFit and I'm starting to run late. I got a new phone holder, and why are these things always such a pain? I got one that's supposed to clip onto another part of the car, because the leather surface inside this car has exactly one spot where I can mount anything, and nothing sticks to it. I ran out of that clear, gel-like tape I was using. I don't know what you'd call it; it's a weird texture. It just doesn't have any stick left, and I don't have any more of it. I've tried double-sided tape too. Nothing works.

So I got a new holder to clip onto this other part. It doesn't clip. I'd have to go through like $300 worth of different phone clips just to find something that mounts in this car. The windshield isn't an option either; if I put something there, it'd take up too much of my field of view. Anyway, I'm flustered now. But I'm going to talk a bit more about AI tools this morning, since it's a short drive to CrossFit and that's what I've been doing a lot of this week, because I have the week off from work. I've just been building in Brand Ghost, trying to get a lot of progress done, and taking breaks in between to do YouTube videos. So, about the last video I shared on this... well, I created it yesterday.

It hasn't been shared yet, which makes it a bit weird to talk about when I'm making videos I haven't posted. The situation for me is that I'm doing a really big refactor of a bunch of authentication stuff for different social media platforms, and it's very repetitive. It's not identical, so I can't just do a search and replace; it needs a little more than that. But the shape of everything is very similar, and there are about 20 instances of this that need to happen. What I shared yesterday was my approach: I did the first one myself, then tried using Cursor as an agent to go do the refactoring of the rest. And I would say by the end of the day yesterday I was feeling like... actually, let me rephrase that.

At some point yesterday, I was feeling okay-ish about Cursor: yeah, maybe this is a value add, maybe this is saving me some time. To explain what I mean by that: I could go look at other stuff. For example, there were a couple of log messages I didn't think I should be seeing, so I could go clean those up or check whether something was broken while Cursor did the refactor in the background. Periodically it asks, "Hey, do you want me to keep going?" and I check in. So it's letting me do things in parallel, which is interesting, because if you're constantly context switching in your own work, you're likely going to be much less efficient than if you just focused on one thing and got it done.

The back and forth, the mental strain it adds, generally makes you less effective. But in my head, the reason something like agents would be so helpful is that I wouldn't have to context switch. I could say, "Go do this stuff and let me know when you're done," because I don't want to think about it. What's actually happening, though, is that I have to keep context switching back and forth. It gets a little further, then it prompts me. And I don't think that alone is the end of the world; if it's just "Do you want me to continue?", I can sit there typing yes, go ahead, go ahead. The big issue is that I can't trust it.

When it gets to a certain point, I'm like, okay: there are 20 social media platforms, so before doing the next one, let me review what you've done. Then I have to do a big context switch and get my head into the space of, did it get all the right spots? Do the tests run? Do the tests pass? And every single time, without exaggeration: I did one of the platforms myself, which left 19 for the agent, and not once did its code compile. It had 19 attempts, and not once did the code compile, and not once did it get all of the spots. In the best-case scenario it might have addressed all of the spots, but since the code doesn't compile, I still have to go across every spot it touched and correct it just to get it compiling.

So now I'm doing the work again. And not once did it get the tests right. In some cases it wrote the tests I wanted to see, because I have the same type of coverage in other social media platforms; that was the idea, right? If you have something to go by, go repeat it. Obviously each test is slightly different, because how you authenticate and the data you need differ slightly, but the actual steps and the framing are all so similar that maybe I should just build a test harness and treat it a little differently. Anyway, it should be able to get this. It absolutely doesn't. And every time it writes a test, for some reason it decides to use a different pattern, which is interesting.
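The "test harness" idea mentioned above can be sketched roughly like this. Everything here is a hypothetical stand-in, not anything from the actual codebase (which is C#/.NET, while this sketch uses Python): one generic scenario run against every platform's auth client, instead of 20 hand-copied near-identical tests.

```python
# A minimal sketch of a shared test harness: one generic check applied to
# every platform, so each platform's test differs only in its client.
# FakePlatformAuth is an illustrative stand-in for the real plugins.

class FakePlatformAuth:
    def __init__(self, name):
        self.name = name

    def refresh(self, token):
        # Each real platform refreshes differently; the *shape* is the same.
        return f"{self.name}:{token}:refreshed"

def run_refresh_harness(platforms):
    """Run the same refresh scenario on every platform; return the failures."""
    failures = []
    for p in platforms:
        result = p.refresh("old-token")
        if not result.endswith(":refreshed"):
            failures.append(p.name)
    return failures

platforms = [FakePlatformAuth(n) for n in ("twitter", "linkedin", "mastodon")]
print(run_refresh_harness(platforms))  # → [] (no failures)
```

The point of the harness shape is that adding a 21st platform means registering one more client, not writing another near-duplicate test file.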

To give you an idea, when it's updating code (and maybe this is just a Cursor issue), it's not putting empty lines anywhere. When it goes to write a test, it just puts in one blob of code, which is wrong, by the way. As a human, I need a little spacing between logically grouped things, or readability drops off really fast. It seems especially prone to this in tests; it never wrote a single test with any empty lines. And I suspect that when it was writing big blocks of code in other parts of the application, it was doing the same thing, whereas if it was just touching up existing code, it would leave the empty lines alone.

So even the tests were just not right. That meant every time it finished a social media platform, I had to go touch basically the entire thing: not only fix everything it touched, but also spot-check whether it missed this or that. It was a huge distraction, because the whole point was that I didn't want to have to do this. The refactor itself was boring as hell; that's why I thought it would be great for Cursor to do. But no, I ended up having to do it again. And to make it even worse, this isn't where it stops.

I got through all 20 of them by mid-afternoon yesterday, and I thought, great, I still have the whole evening to finally get this functionality wrapped up. I did get it wrapped up, by the way, but I still need more time today, and I'll explain why. As I went through and, now that everything was refactored, started adding the small bit of functionality I wanted, I just needed everything to conform to a new pattern. And I noticed something in a handful of spots in Visual Studio. Let me explain what the code does. In the authentication paths, a lot of social media platforms use OAuth for authenticating, which means you'll have access tokens and refresh tokens.

So if I go to use the auth and I know it's expired, I should try to refresh it inline: someone wants to do a social media action, I look up the auth for it, and it's not valid anymore. Great, there's a refresh token, so I refresh on behalf of the user and then continue on. On the platforms that supported inline OAuth refreshing, I was finding all of these blocks of code just grayed out in Visual Studio. If you have a private method with no callers, Visual Studio shows it grayed out. And I'm going, that's weird. That's a really critical part of the code, because if I don't have it, people just stop being able to post on Brand Ghost and we won't know why.

Not only would we not know why, but the feature I'm building, which notifies people and gives our UI the information about it, would never trigger, because that code path is grayed out. I realized Cursor had just decided to completely remove the calls to that stuff, and now I had to go across every single social media platform once again and add it back in. That meant all of the test scenarios I had patched up, which were happy path to start with and so never exercised auth refreshing, were broken too, because on a lot of these platforms, checking whether the auth is still valid means doing something like looking up the profile: if you can see the profile with the auth, you're good to go.
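For illustration, here's a minimal sketch of the inline OAuth refresh flow described above. It's in Python rather than the actual C#/.NET codebase, and every name in it is made up for the example; the real token endpoint call is replaced by a stub.

```python
import time

# Sketch of inline OAuth refresh: before performing a social media action,
# check whether the access token has expired and, if so, use the refresh
# token to obtain a new one, then carry on with the action.

class AuthRecord:
    def __init__(self, access_token, refresh_token, expires_at):
        self.access_token = access_token
        self.refresh_token = refresh_token
        self.expires_at = expires_at  # unix timestamp

def refresh_tokens(refresh_token):
    # Stand-in for the platform's real token endpoint.
    return AuthRecord("new-access", refresh_token, time.time() + 3600)

def get_valid_auth(record):
    """Return a usable auth record, refreshing inline if it has expired."""
    if time.time() < record.expires_at:
        return record
    # Expired: refresh on behalf of the user so the action can continue.
    return refresh_tokens(record.refresh_token)

expired = AuthRecord("stale-access", "refresh-123", time.time() - 10)
print(get_valid_auth(expired).access_token)  # → new-access
```

This is the code path that went dead when the agent removed the callers: if `get_valid_auth` (or its real-world equivalent) is never invoked, expired users silently fail instead of being refreshed or notified.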

So I had to redo everything again. Well, I shouldn't say redo; I had to go across all 20 social media platforms again, including their tests, just to be able to say the refactor is finished. This was absolutely not a time saver. It's Thursday now, and it's taken me the weekend plus all of Monday, Tuesday, and Wednesday to do this refactor. It's a big refactor, but honestly, if I had just put headphones on, sat there, picked one, done it, picked the next one, done it, I could have cranked it out in a day. I want to use this as an anecdote, because I'm not trying to dismiss the concept of agents. I think there's a ton of potential here, because if this had worked the way I wanted it to, it would have saved me such a headache of working on boring code.

I want to go build the feature. I don't want to refactor all of these things first; that's not exciting, that's burnout city. And the patterns are established, right? Just repeat them. But right now, it's absolutely not ready to do that kind of thing. Not even close. (Is this person flashing... oh, they're not flashing their lights. That's a bump in the road. I have to move over anyway.) Like I've said in previous videos, I use ChatGPT a lot for going back and forth on designing stuff and prototyping bits of code. And honestly, I'm not exaggerating, I find it does a tremendous job at this point.

When I'm working with it on design, I feel good now. In the very beginning it was like, no, why am I even asking this thing anything? Or if I got it to generate some code, it was completely wrong. But the back and forth with ChatGPT feels pretty good now. What's different, I think, is that it's describing high-level systems. It has the context for that, and when I ask it to write code, it's only specific functions or small classes. With the agent stuff, I feel like where it falls apart is that the context is so much bigger. There are probably over 100 projects in my solution, though not all of them mattered for the refactor that was happening yesterday.

Every social media platform it had to touch is a plugin. And by the definition of how I have them set up (not universally), each of those plugins is an assembly, which means it's a project. So that's 20 projects just for the social media platforms, and each has a test project, so basically 40 projects for social media and their tests. It really makes me think the agent stuff is completely ineffective at going broad. Even when I tell it to do one social media platform at a time with one other as a reference, that's a total of four projects. It's not like they each have hundreds and hundreds of files, and they all follow the exact same shape, like I was saying.
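As a rough illustration of the plugin layout described here (one plugin per platform behind a common interface), with entirely hypothetical names. In the real solution each plugin is its own assembly/project; in this Python sketch they're just classes in one file.

```python
# Each social media platform implements the same small interface, and the
# host app only talks to that interface. Names are illustrative only.

class SocialPlugin:
    name = "base"

    def post(self, message):
        raise NotImplementedError

class TwitterPlugin(SocialPlugin):
    name = "twitter"

    def post(self, message):
        return f"twitter: {message}"

class LinkedInPlugin(SocialPlugin):
    name = "linkedin"

    def post(self, message):
        return f"linkedin: {message}"

# The host discovers plugins and keys them by name; in .NET this would be
# one assembly per plugin rather than classes in a single module.
REGISTRY = {p.name: p for p in (TwitterPlugin(), LinkedInPlugin())}

print(REGISTRY["twitter"].post("hello"))  # → twitter: hello
```

The upside of this shape is isolation per platform; the downside, as described above, is that a broad refactor has to walk every plugin project plus its test project.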

So I'm trying to connect the dots. I can't understand exactly where and how it's falling apart, but it seems to be the amount of context it's trying to cover; it just cannot do it. My codebase is completely indexed by Cursor, so I don't think that's an excuse. But I feel like if I just told Cursor, "Hey, in this file, go make these changes," it could probably do it. The reason I say that is that ChatGPT is able to, and if I use GitHub Copilot in non-agent mode, like GitHub Copilot in Visual Studio, where agent mode hasn't been publicly released at the time of recording, it does well. So I don't think it's the LLM; I don't think it's specifically a model issue.

I feel like it has to do with the amount of context. I'm sharing all of this because that's been my experience. It makes it even more laughable when people say, "We can get rid of developers because we can just do this." No, you can't. It can't even repeat essentially the same change 19 times without completely screwing the whole thing up. You cannot yet replace developers. I'm so sorry, but anything you're building purely with AI is an absolute facade of what software engineering is. That's not to say, and I want to be very clear about this, that this type of tooling won't be able to do it more effectively someday. I think it absolutely will, and it will be amazing when it can. I'm very excited for it.

But right now? Absolutely not. It's laughable. Or is that called Lovable? Get it? Good joke. So even though it's a little painful, I'm still going to keep using agent mode where I can, because I want to learn and understand it better. That means it's going to be painful for certain things for a while, and that's how it is for now. But I'm going to keep learning, so that as this stuff gets better, I'm already leveraging it. By the time it's really awesome, I don't want to be using it for the first time. That's where I'm at. I'm going to be coding a lot more today. I'll see you later.

Frequently Asked Questions

These Q&A summaries are AI-generated from the video transcript and may not reflect my exact wording. Watch the video for the full context.

What challenges did you face using AI agents like Cursor for refactoring repetitive code?
I found that Cursor often failed to produce compiling code and missed spots in the refactor, requiring me to manually fix everything it touched. It also removed critical calls in the code, causing important features to break. Overall, it was a huge distraction and not a time saver because I had to redo much of the work.
How does the context size of a codebase affect the effectiveness of AI agents in coding tasks?
The large context size of my codebase, with over 100 projects and many plugins, seems to overwhelm the AI agents like Cursor. Even when focusing on a few projects at a time, the agent struggles to handle the broad context effectively. This limitation appears to be a key reason why the agent mode is currently ineffective for large-scale refactoring.
What is your current stance on AI replacing software developers based on your experience?
Based on my experience, AI agents are not yet capable of replacing developers, especially for complex or repetitive tasks. They often produce incorrect or incomplete code that requires significant human intervention. While I believe AI will improve and become more useful in the future, right now it's laughable to think developers can be replaced by these tools.