Coding With Claude and AI Tools: What REALLY Gets Me Frustrated

It's not so much that AI makes mistakes that frustrates me... it's that it can seem so smart and so stupid at the exact same time. Let's dive into it.

📄 Auto-Generated Transcript

Transcript is auto-generated and may contain errors.

Hey folks, I'm just headed to CrossFit. We're going to talk a little bit about AI, because that's all we talk about now, right? I want to talk about a couple of things I was trying over the weekend, how things are shifting for me, and just kind of see where things go. I've tried to make it pretty clear before that I want to spend more time exploring AI tools, even when they feel like they're not really being helpful, because I need to spend more time with them to understand them better, so that as the tools evolve I'm not one day trying to catch up on everything. And honestly, it's already a matter of everything being catch-up.

So I think I kind of need to, especially because there are some places where, if you're being forced to use AI tools at work, at least you're along for the ride, and I don't have that since I'm not coding at work. Yes, I'm at Microsoft, and yes, of course we have AI tools. Yes, we talk about using AI tools, and yes, I do use them, but as a developer it's not quite the same. Even something like Visual Studio's autocomplete with Copilot, which we've had for seemingly ages now: I don't use Visual Studio Copilot at work, so I wouldn't even have that exposure. So I've been using a lot more Claude on the command line, and in particular this sort of layer on top of it.

It's called Claude Flow, and I think there are a few of these coming out now. Someone said there's a Gemini flavor of this, and I think even Claude Flow itself (well, it's going to be kind of ironic) is going to support other LLMs. I think that's the goal: it's not just going to be Claude; you'll be able to pick whatever underlying command-line tool has your LLM attached to it. Basically, Claude Flow gives you multi-agent capabilities, so you can run what are called swarms, or a hive mind. And it has been pretty good, but on Windows, apparently (I haven't hit it myself yet) there's been this bug that's been basically defeating the entire value prop, which

is, well, the value prop I'm after: it has this shared memory system so that, between running agent swarms and things like that, you can effectively get it to learn things, and that way you're not constantly trying to teach it the same things over and over. Because this is the biggest problem I have with using agents right now, and I want to pause to reflect on the fact that this wouldn't be a problem with humans. To me it's kind of fascinating. I'm sure there are obviously very good technical reasons this is the case, but think about comparing agents to human developers.

The really wild part is this: if you had a brand new developer, put them on a project, and basically prompted them, "go refactor this code," it would probably be pretty disastrous, right? And it would take a long time, because rightfully it should. They'd have to go exploring; they'd have to go looking. But agents are able to, I don't want to say pinpoint, but they can get to the code pretty quickly. They can start changing it pretty quickly. And there are some things where it's like, holy crap, compared to a person this is so much snappier.

But there are other parts that are so ridiculous the other way that, and I don't know what the right word is, it just feels so silly, and I think that's one of the reasons I get very frustrated with them sometimes. How could you have just written all of this code or done this thing, yet we come back to something so trivial and you simply cannot do it? So my favorite, and when I say favorite I mean the one that literally makes my blood boil, is with Claude Code through Claude Flow. It's supposed to have this memory system, but: building things.

So, pathing issues. Two things: MSBuild, and then I spent a few days over the past week with benchmarking and tracing running on some software so it could optimize it for me, and getting it to figure out where MSBuild is when running from a WSL environment. It kept saying, "oh, there's no MSBuild on this system," because I'd go to use the code it gave me and nothing compiles, and then it's like, "oh yeah, we don't have it on this computer." And I'm like, yes, yes you do. How many times do I need to say it? And then it goes in the CLAUDE.md file, and that's still not enough. With a human, right, you could absolutely envision this conversation happening, where they're like, "oh man, I don't know," and even then it would be a pretty hard stretch.
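For reference, a CLAUDE.md pin for this kind of issue might look like the following. This is a hypothetical sketch, not my actual file, and the MSBuild path varies by Visual Studio edition and version:

```markdown
## Build environment (read before claiming a tool is missing)

- This repo builds with MSBuild, and MSBuild IS installed on this machine.
- From WSL, MSBuild lives on the Windows mount, for example:
  /mnt/c/Program Files/Microsoft Visual Studio/2022/Community/MSBuild/Current/Bin/MSBuild.exe
- If a build command fails, re-check this path before concluding that
  MSBuild does not exist.
```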

Let me on the highway, buddy. I'm going to catch up to my wife; she's in front of me. Imagine a conversation with a human where you told them to go refactor some code (you're building in C#; change it to your favorite language, whatever), and then they're done, and you're like, "well, none of this code works." And they're like, "well, I tried my best, but you don't have the tools installed on this computer to do it." It would feel kind of silly, right? You'd be like, "well, yeah, I do, man." And then you'd explain it, and they'd be like, "oh, sorry, I didn't realize that. Now I know." And maybe next time they'd be like, "wait, where were the tools again?" But they wouldn't think, "oh, there are no tools." They might question it.

So right, it might take a couple of times, but each time it's like, okay, I know there's something here, until maybe after two times they never think about it or question it again. They're just like: yeah, MSBuild is here, here's where the dotnet executable is, I don't have to think about this. But my biggest frustration right now is that this kind of stuff is just not obvious to the agent, even when it's in the CLAUDE.md file and in the shared memory for these agents, and it's crazy to me. It's so frustrating because it does all of these advanced things. So I just want to give you an example. It's not MSBuild this time; it's dotnet-trace, the tool for tracing your code for performance profiling. So I'm doing this iterative approach.

This is actually a lot of fun to do. Very, very frustrating until I got a bit of a positive feedback loop going, but I was asking Claude to basically go run benchmarks for my code, and we have dotnet-trace hooked up, so it can run the benchmarks and get tracing information. Then I would say: okay, based on the benchmarks and the tracing information, where do you suggest optimizing? Perform an analysis; give me your top five. And it would do it, and it would say, hey, I see that we're spending time here, we have a lot of memory allocations here, here's five things. Cool. Then what would happen is this feedback loop would be going well, and then I'd get too comfortable, which is very, very easy to do, because you're the one directing this thing, right? It's doing all of the heavy lifting for you, which is the point.

So it's giving me the suggestions, I'm picking which ones and explaining why, and it goes and does it, then performs the benchmarks and tracing again to see the improvements. But what was happening was that periodically it would confidently tell me something like, "oh, we achieved a 20% performance boost." And there'd be times where I'm like, huh, that's interesting; that's not what I was expecting. And I would look through what it was doing and think, wait a second, something seems like it's missing here. So I'd say, "show me the trace data that backs this up." And then it would be like, "oh, well, dotnet-trace isn't installed on this machine." And I'm like, "Dude, what do you mean? We've just spent three hours going back and forth on this, and you've been using dotnet-trace the whole time."
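For what it's worth, a small preflight check at the start of a session can surface this kind of contradiction before hours go by. A minimal sketch, assuming a POSIX shell; `dotnet` and `dotnet-trace` are the real CLI names, but whether they're on your PATH depends on your machine:

```shell
#!/bin/sh
# Preflight: confirm a CLI tool actually exists before trusting any claim
# (from an agent or otherwise) that it ran, or could not run, that tool.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found at $(command -v "$1")"
  else
    echo "$1: NOT FOUND on PATH"
    return 1
  fi
}

# Tools this benchmarking loop depends on; add your own as needed.
check_tool dotnet || true
check_tool dotnet-trace || true
```

If the agent later insists a tool is missing, the preflight output is a concrete artifact you can paste straight back at it.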

Now, out of nowhere, it's just decided dotnet-trace doesn't exist on the system. This is the kind of thing that just does not happen with humans, and it should not happen with the AI tools. It feels so silly that it's happening. Another really awesome one, and I feel like sure, people could make this mistake, but it seems like a classic LLM kind of error: same sort of situation, when I was having it benchmark and profile some code. This is actually how I spent the first 24 hours of this feedback loop thing. Give me one sec; I just want to switch lanes. I was like, same process: run the benchmarks, do the tracing, and I want you to make a suggestion, and we're going to go do that improvement. So, it did.

Here's the results. Awesome. Okay. And I kept going down this path, and this feedback loop seemed like it was working pretty well. Sorry, there's a driver in front of me that's going to cause an accident; you've got to figure it out, buddy. So, the feedback loop seems like it's going well, right? It's benchmarking, it's tracing, it's giving me real data. I'm like, heck yeah, this is cool. I spent roughly a 24-hour period on this, and this is the nice thing, I guess: I'm not sitting there only doing this. This is literally the kind of thing where I can peek over every 30 minutes or so, be like, "oh yeah, let me check this, cool, next step," fire it off, and walk away. It's really nice that you don't have to be actively working on it.

But again, it's too easy to get comfortable. I started to realize, after this 24-hour period, that it had been optimizing this method in part of a processing engine I have. It's been optimizing it, but it's a brand new method that it added, and nothing's actually calling it. It's running all of the tests, but all of the tests are testing code that I've written, so the tests always pass. And the optimizations it's doing? Yes, it's improving the thing it has, but it's not actually comparing it to the real code. It has never actually done a baseline against the real code. So it keeps optimizing this thing over and over and making it better, which is cool, but nothing's actually using that code except, literally, the benchmark that it made for it. So it's completely untested. It's completely unused.
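A cheap guard against exactly this failure mode: before accepting an "optimized" method, check that it's referenced anywhere outside the benchmarks. A rough sketch, assuming a POSIX shell and that benchmark files are identifiable by name; the method name below is a made-up placeholder, not from my project:

```shell
#!/bin/sh
# count_callers NAME DIR
# Counts the files under DIR that mention NAME, excluding anything with
# "benchmark" in its path. A result of 0 means nothing in the real
# codebase uses the method the agent has been "optimizing".
count_callers() {
  grep -rl "$1" "$2" 2>/dev/null | grep -vi benchmark | wc -l
}

# Hypothetical usage (placeholder method and path):
# count_callers "ProcessChunkOptimized" src/
```

If this prints 0, the "20% improvement" exists only inside the benchmark harness the agent wrote for itself.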

And at the end, when I had it go benchmark, I was like, "oh my god, what are we doing here?" I asked it to compare against the original thing, and it wasn't actually better. It was just very different. So it's like, well, I just wasted all this time, and this is my fault, because I just trusted it. These were vibe optimizations, if you will. But it's the kind of thing I feel like I wouldn't have to tell a human. If I said, "I want you to go optimize this," it's totally fine to go make a separate implementation and run them both in a benchmark to compare them. But if I was like, "great, that's the one, we're going to use that now,"

I'm assuming (I don't know, it feels kind of crazy to have to say) you'd be like, "great, okay, this is going to be the one we use going forward," and then it would become the one you use going forward, not just some second copy that never gets used. So, my whole point in this conversation is, and I still don't know the right word for it, this dichotomy where we have AI doing what seem like really impressive tasks, especially these multi-agent systems where they put together a to-do list, they have researchers and stuff, they organize the work really well, they chip away at it, and this is pretty awesome, but

it's still making ridiculously stupid, simple mistakes. And I honestly think that's one of my biggest sources of frustration, because generally, if a human were making mistakes like "I can't find where the build is," it would probably be an entire skill-level thing, and I'd be like, okay, let me change my expectations; this person is clearly very early on in development, and that's totally fine, but my expectations would change and we'd go work on it together. But when I have something that's cranking out really difficult things and then says it doesn't know where anything is, it's like: what do you mean, man? It's really frustrating. I'm just trying to imagine one of the senior engineers on my team saying, "I don't know how to build code."

I would be like: dude, what have you been doing for the past 15 years? It would just feel ridiculous. So anyway, I think that's a big source of my frustration, but I'm sticking with it and using the tools. I have the new version of Claude Flow downloaded, which is supposed to have fixed the memory issue. We'll see. When I say memory, I don't mean RAM; I mean the shared memories between agent swarms. So we'll try that out. I also tried GitHub Spark over the weekend, and I have two videos that I'm going to put out on my main channel, Dev Leader, with two clips I hope my editor uses where I have very honest reactions to what was produced. It's super cool, both Claude Flow and GitHub Spark. So if you're interested in programming tutorials and some AI tooling, head over to Dev Leader.

That's my main channel. And I have my two other channels launched now: Dev Leader Path to Tech has the resume reviews, and I'll do some interview guidance there as well, and the Dev Leader Podcast has interviews with software engineers and the live stream. So, I will see you folks next time. Take care.

Frequently Asked Questions

These Q&A summaries are AI-generated from the video transcript and may not reflect my exact wording. Watch the video for the full context.

What is my biggest frustration when using AI agents like Claude Flow for coding tasks?
My biggest frustration is that AI agents can perform advanced tasks but still make ridiculously simple mistakes, like not recognizing that MSBuild is installed on the system. This inconsistency feels silly and frustrating because a human developer wouldn't repeatedly fail to internalize such a basic fact after being told multiple times.
How do AI agents compare to human developers when asked to refactor code?
AI agents can quickly locate and start changing code, which is much faster than a new human developer who would need time to explore and understand the project. However, AI agents sometimes fail at trivial tasks that humans would handle easily, leading to a frustrating dichotomy between impressive capabilities and simple errors.
What issues did I encounter when using AI tools for benchmarking and optimizing code?
I found that the AI optimized a brand new method that wasn't actually called anywhere in the real code, so the improvements were meaningless. It kept improving this unused code because the tests passed, but it never compared the optimizations against the original working code, which led to wasted time and misplaced trust in the tool's suggestions.