Making Myself Obsolete By Vibe Coding Ralph Loops For Agents



AI hasn't replaced us yet, so here are a few steps I am taking to make myself obsolete.

📄 Auto-Generated Transcript

Transcript is auto-generated and may contain errors.

Hey folks, we're going to talk about AI today cuz that's what we do. And uh I figured I'd talk a little bit about some of the things I'm kind of I guess like moving towards but not I don't want to say like doing with any amount of success at all. Um, so kind of things I'm hoping to see come together and I don't really have a timeline, I guess, but I'm I'm kind of speculating. It's not going to be on the order of like days, probably not even weeks, but I suspect like by the end of the year I'll be I'm I'm hoping that I'll be kind of working more like this. And so I want to talk a little bit about um Ralph Loops.

I want to talk about uh how I've been like again for more context I guess, especially if you're new to the channel, I've been using Copilot CLI a lot, not because I'm against Claude Code or anything. Um, it just happens to be what I've gravitated towards, and then I have better token usage uh from being a Microsoft employee. And um the setup for me is usually a PowerShell window with a bunch of tabs, um on the order now of like 10 to 15 tabs. Not all of them always running. There's probably, I don't know, like four to eight where I've actively kind of got work going on, and I I go between them like as as things are clearing up, you know, something will stop and then I can go check in and, you know, either go into planning, review things, whatever. Um I I definitely have like limits where like it's simply I I cannot do more in parallel.

Not because the agents can't, but because like the the context switching overhead is just so extreme at that point. Um but that seems to be a pretty good balance for me. I think that one of the things I'm realizing is that especially because I I I guess like historically I like being able to, when I'm building stuff, factor out common pieces where I'm like, man, I do this pattern over and over and over. Um, so like, you know, uh, I've talked before about making this, uh, this dependency injection scanning sort of setup called Needler. Um, again, I can't really take credit. I'm like not inventing dependency injection or anything like that. It's just patterns that I use. So I pull that into a library because I do the same thing every time I make a a new C# project. It's just how I how I develop.

So like to me it makes sense to pull that out. Now, one of the things that happens is that if you're making libraries like this and you are a consumer of the library that you make, you're inevitably going to either find bugs and/or have like sort of features that you're that you're looking for because you're like, "Okay, I'm using the library and now I want to go do this." And like it feels like the library should be able to support that, but like, you know, I guess I hadn't I hadn't made the feature. I only have myself to blame. So, this pattern that I regularly find myself in now is like I'm working in some project that's not the library and so I'm realizing I'm saying the library kind of funny and it's because my my jaw's really sore from I guess like the last couple of weeks when I was on call.

I didn't realize this until I stopped. I noticed on Sunday night my jaw is like incredibly sore, I think from being stressed. That's never happened to me in my life. Um I think I was holding my jaw in a funny position, but like it it hurts to open my mouth and I think when I say library something's not feeling good. Like it's very sore. Um, that's not a good sign. But so I end up doing some work and then it's a matter of, okay, well, I I should go basically make this in my in my library project. So like Needler is a good example in this case where I'm like, oh, there's something else for dependency injection I want to do, or I've been using Microsoft Agent Framework a lot more and I have some some Needler support for doing that, for putting pipelines and stuff together when you're running uh a combination of like agentic uh flows and programmatic steps and things like that.

So anyway, um I'll be in a Copilot terminal and I'm like, okay, well, hey Copilot, like you you can, uh, way faster than I can and more effectively, like you can explain sort of the roadblock that we're hitting in terms of either a bug or a feature request. So if there's something that's not working, a good example recently for me is like uh metrics and diagnostics. Um, there's been some cases where my pipeline will run and I'm like, "Cool." I get to completion and then I'm checking some Grafana dashboards and I'm like, "Well, what the heck?" Like, I I see like there's some token usage. I see there's some tool calls, but like some things are just, some are there and some aren't. And so then I'm working with Copilot and then we, we as in Copilot, realizes that there's just like something's missing from one of the, you know, the stages, and it's not uh it's not a stage that I have created that's the problem.

It's like this stage that I have created is uh not emitting metrics because part of the framework uh that would be found in Needler is not kind of doing what we expect. So um then it's like, okay, well, Copilot found this. Okay, great. So Copilot, you can explain it very well with code examples, with a full path to where the issue is, like you can do all of this way faster than me. Write it up in like a spec or a feature request or whatever, uh with, you know, proposals. Like you can do all of this so much faster than I can. And that doesn't mean that it's better, by the way. Um it doesn't mean that what it's going to suggest is necessarily what I would want or that it's going to suggest a better design than I could. Uh maybe. I'm I'm not saying that it can't and I'm not saying that it always does.

Uh point is that uh without a doubt it can write that all faster than me. That part is for sure. So then what I end up doing is I have it draft this request and then I I take the file path and I go to my other Copilot terminal and I'm like, hey, you know, the other developer is complaining about you, or uh you know, has a feature request or a bug report, and basically here's here's their proposal. And so generally what I do, depending on the complexity, is like if it's uh if it's pretty trivial then I won't even go into plan mode. I'll just say like, here's the doc, like, you know, I don't I don't even know what the other Copilot was suggesting but it can't possibly be too crazy, like just go do what it says and it's probably good. Uh and then in other cases uh I will say like, we're in plan mode, here's what they proposed, and like we should discuss what's going on here.

And the reason that I'm doing that, yeah, especially when it's more complicated, is that what I really don't want to have happen is that like for something like Needler as an example, right? I have a library. I am opinionated about the library. It's mine. I'm not making it for everyone else to go use. If you like it, that's cool. If not, sorry. Like, uh, use AI and make your own. I don't know. Um so I have an opinionated framework, but what I don't want to have happen is that it becomes opinionated uh to one consumer of it, right? So yes, I am a consumer of it, but I don't want, say like I build BrandGhost, which is uh sort of my my side business, I don't want Needler to be like hyper specific to BrandGhost.

Like, yeah, I want it to solve the problem that BrandGhost has, but not in a way that's like the next time I go make an application, I'm like, okay, well, I'm not BrandGhost, and this is very clearly like a BrandGhost concept. Like, I need it to be generic. So, I'm not yet confident that if I just hand off a plan that um that I have either the right instructions in the repository or uh the right prompts in place, uh all this kind of stuff. I'm not sure that I I have confidence that enough of that is there and and as a result, I don't trust that for something more complicated Copilot won't just kind of blindly say, "Well, here's what I was given. Let me go do it." Um, there is a rubber duck agent that now ships with Copilot CLI. And I would say more than half the time it's it's effective.

I don't have like a a stat. Uh, but if I had to guess, somewhere around like 75 to 80% of the time it's helpful. And other times, uh, it's not. Like it will suggest things that are uh that are not accurate and then the the agent kind of listens to the rubber duck agent. And if you're not familiar with what that is, like I think the rubber duck agent, I don't know if the intention was supposed to be purely for like rubber ducking and having a conversation back and forth, uh cuz usually we talk about like rubber duck debugging. So I don't use it that way, but I think when I have agents that are consulting with the rubber duck agent, uh, another way that you could label it is basically like a devil's advocate agent. So, it's um it kind of it's not super agreeable.

It's actually quite the opposite where it's like going to push back on some things and and challenge ideas. So there is that as a, I don't know, it's not quite a stopgap but as a as an intermediate that might offer some some interesting pushback to the agent to not make uh something so opinionated for a de uh like a developer request coming in. Oh my god, these wipers are terrible. Um, so it's it's just like it's not it doesn't feel like I have a ton of confidence in that yet. So for more complex things, I'll go into plan mode with that feature request or bug report uh and then discuss it with Copilot before kind of signing off on it. Okay. So everything I'm saying so far, I'm sure uh for some of you, you might be like, "Oh, cool. That's neat. I don't do that.

Maybe I'll try it." And others are you're probably light years ahead of me. And that's cool. There's a a really interesting spectrum of uh AI usage and experience. So that's a a pattern that I've been having a lot. And it's not just with Needler, right? I use that as one example. And it's not just with uh with Brand Ghost and Needler as a as a combo. Um I have this other repository and I'm I'm so behind on YouTube videos uh on my main channel. I'm really sorry for that. It's just like I I don't I just have not had capacity. Uh I've been very overwhelmed and uh these videos are are nice for code commute because I get in a car and I can record and I don't edit them. So um you know on a day like today where it says how much longer do I have in traffic?

I have 51 more minutes in traffic. Uh I can just talk to a camera and hopefully there's something valuable. But yeah, I I really want to make some YouTube videos on some of this stuff. Uh to to show you and not just talk about it, but like to show you kind of what I'm playing around with. Again, not because I I don't like coming across this way, so I hope I don't uh I'm not trying to tell you my way is the right way or the best way or anything like that, but if I'm talking about this stuff and you're like, "Hey, that sounds kind of neat." Even if you don't agree or want to do it that way, you're but it might be interesting to compare like I would like to show you that or give you the opportunity where you can see and then you can make a decision and be like that's either cool or that sucks or whatever, right?

But it's just perspective. So um I have this repository that I made, um I called it, uh it's like just for myself. It's called Genesis, for the idea that it's like for the beginning, and the idea behind this repository is that I can put templates in place. Um and that involves, sorry, that includes things like, you know, project templates for the actual code but also like um basic uh GitHub workflows that include uh linting or Roslyn analyzers. Just like basically if I said, or someone came to me today, right, if you walked up to me and said, Nick, I need a, I don't know, like I need a web app that does, or web API that does whatever, or I need a desktop app that does whatever. I know, yes, desktop apps do exist still, I promise.

Um or uh like I do a lot of .NET development and for for me doing like pure web development is like not a thing I do, but I don't know, I maybe I can't say that anymore because I do have a few websites and and I, you know, recently uh, you know, whipped up a website for someone, just uh someone that's non-technical, and was like, hey, like let me let me get this together for you, send it over, and like I'll have instructions so that if you want to like tweak and tune things, you can basically talk to Copilot and you don't have to know HTML or anything else. Like just talk to Copilot this way and it will kind of take care of things for you. And so like when I put that together, I was like, this should be a template, right?

If I ever have to do this again, like what are the the very basic fundamental pieces that I can just pull out, have a template for? Um, you know, what kind of uh, I talked about like agent instructions and stuff like that. Um in Copilot you can do uh like glob-based file path matching. Uh Claude has the same kind of concept too. And it's really cool because instead of blasting like tons and tons of context into your AGENTS.md file, you can match on uh specific file paths. So that way if you're like working on a file that's like SomethingTests.cs, right, for for some C# tests on some class, uh you can have some test-specific instructions, right? Or if you're working on a particular project in your solution or you have a naming convention for your your files, then you can have specific instructions.
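To make the glob-based idea concrete, here is a minimal Python sketch. It is not how Copilot or Claude actually load instructions; the patterns, the instruction text, and the `instructions_for` helper are all made up for illustration. It also assumes `fnmatchcase` semantics, where `*` can cross directory separators, which real tools may treat differently.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping of glob patterns to instruction snippets (illustrative only).
INSTRUCTIONS = {
    "*Tests.cs": "Prefer real instances over mocks; follow the existing test pattern.",
    "src/Web/*": "Keep controllers thin; put logic in services.",
    "*.cs": "Follow the repository's C# style rules.",
}


def instructions_for(path: str) -> list[str]:
    """Return every instruction whose glob pattern matches the given file path."""
    normalized = path.replace("\\", "/")  # treat Windows and POSIX paths the same
    return [
        text
        for pattern, text in INSTRUCTIONS.items()
        if fnmatchcase(normalized, pattern)
    ]


# A test file picks up the test-specific rule plus the general C# rule.
for line in instructions_for("src/Core/WidgetTests.cs"):
    print(line)
```

The point of the design is that only the matching snippets get loaded for the file being edited, rather than everything living in one large instruction file.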

So I have a bunch of those pulled out so that again I go to start something from scratch. I can just say like here's hopefully everything as a baseline that if I wanted something new I have my basic code my basic CI/CD I have uh and like my agent instructions that uh help guide an agent to develop things how how I would want. So this is another sort of library style repository that I have. And so I will often if I'm doing something new and I go, "Oh, like that should go into this Genesis repo." I will do the same kind of feature request thing again, right? I'll say, "Hey, make a feature request for the for the Genesis developer and I'll go pass that over." Um, so yeah, like that that exists sort of in in many different shapes and forms. It's not just uh, you know, between this pair.

And what was the other thing I wanted to say about that? Oh, I I mean bug fixes work the same way too, right? If I use a template and then it doesn't work perfectly out of the box, uh the the sort of the push back that goes to the Genesis developer ends up being something like, hey, we need to to fix it, of course. And then like why did that gap exist, right? So now the Genesis developer also has to think about not just fixing, but how do we like categorically prevent this kind of thing? I shouldn't be finding out a template doesn't work at the time of wanting to use the template, right? It should it should always be working. The only time it should ever break is like while I'm iterating on uh a new template or some template changes. Otherwise, it should be perfect.

Okay. So, where where I'm going with all of this is that this is a uh sort of this this workflow that I have. Again, nothing I feel like is groundbreaking here. It's just that I have uh some dependencies between these work streams and I have, you know, uh multiple agent sessions kind of going at the same time, either um on some different projects that are like products, let's say products or services, and/or um these other ones working on libraries to go uh fix things or add functionality. So, one of the things that I've been interested in and haven't spent much time on, uh, and I've been noticing I I kind of do this with some like AI things as they come out. I'm like, well, that seems dumb or silly. Like, no. Uh, like for example, I haven't touched uh like what is it? Clawdbot or whatever it's called now.

The thing that you just like connect and it goes and runs your entire life. Like I I'm not touching that. Uh that's going to take a lot of convincing before uh before I even do anything with that. But one of the things that I thought was like kind of silly was like this Ralph loop thing. And uh uh for those of you that don't know, like my sort of interpretation of what a Ralph loop is is essentially like giving the agent some instruction to go build something and like letting it vibe code in a loop. Um, and there there's more to it. Um, I feel like I'm I wanted to oversimplify just to get the concept across, but I think there's some it's interesting. There's like more uh more forming around this, like how well, how do you do that effectively, right?

Cuz for me, I would think, well, that sounds dumb because aren't you just going to basically have like this like infinite slop production where it's like sitting in a loop just making shittier and shittier code? Like uh probably, right? Like if you if you did this without any type of guidance or structure, like probably it's just going to keep making a bigger and bigger mess uh until it's just, you know, cranking through tokens doing who knows what. But the idea is supposed to be that you can basically give uh an agent in a loop some work to do and some some goal to meet, and essentially if it doesn't meet it within a certain number of iterations, you you toss it, right? It's very wasteful for tokens, right? Especially if you have like shitty prompts and bad guidance and all this other stuff, it's just going to waste them, right?
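The loop described here, work plus a goal plus an iteration cap, can be sketched in a few lines of Python. This is only an illustration of the idea as described in the video, not a real Ralph loop implementation: `ralph_loop`, `LoopResult`, and the stand-in `fake_agent` are hypothetical names, and in practice `agent_step` would invoke an actual agent while `verify` would run tests or other checks.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopResult:
    succeeded: bool
    iterations: int
    log: list[str] = field(default_factory=list)


def ralph_loop(
    agent_step: Callable[[int], str],
    verify: Callable[[], bool],
    max_iterations: int = 5,
) -> LoopResult:
    """Run agent_step until verify() passes or the iteration budget runs out."""
    log: list[str] = []
    for iteration in range(1, max_iterations + 1):
        log.append(agent_step(iteration))  # one unit of agent work
        if verify():                       # e.g. tests pass, goal met
            return LoopResult(True, iteration, log)
    # Budget exhausted without meeting the goal: the caller tosses the attempt.
    return LoopResult(False, max_iterations, log)


# Stand-in agent: each iteration gets one more check passing.
state = {"passing": 0}


def fake_agent(iteration: int) -> str:
    state["passing"] += 1
    return f"iteration {iteration}: {state['passing']} checks passing"


result = ralph_loop(fake_agent, lambda: state["passing"] >= 3, max_iterations=5)
print(result.succeeded, result.iterations)  # True 3
```

The cap is what bounds the token waste: an impossible task burns at most `max_iterations` steps before the result comes back as failed.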

Like if you gave it if you said do 50 iterations of this and you gave it some impossible task like it might try to for 50 iterations go do something that it will never be able to do and just completely be a waste. But instead like what if we can give it good guidance? What if we can um you know step like give it the plan the guidance and step away and just let it go build things. And again, if it if it doesn't work, okay, like sucks, toss it, right? Like ideally, it took no uh aside from like the planning and upfront like handoff, ideally it takes you no time. And that's because along the way, what it should be doing is writing tests, doing verification, like meeting the expectations along the way. Right? It's kind of like think forget AI, think about software developers, right?

Think about how we would imagine everything I was just saying was me talking about like giving a a developer or a team of developers some work to do and fully trusting them to do it right. Like I want to be in that position with teams I manage. Of course, right? if if it's a brand new team and I'm a brand new manager on the team as well, like we don't have a lot of trust together yet, right? And so by default, my managing style is that I want to lean into trusting more. I would rather trust and then like figure out where that boundary is. If there's people that, you know, maybe need more support, um versus others that are like, you know, they'll they'll be very accountable, right? If something messes up, they're they're on it. They're accountable. they'll help others to to avoid the mistake, that kind of thing.

Um, but we kind of have to figure that stuff out. And ideally, like in a perfect world, think about this. In a perfect world, you don't need me as a manager for a team on the technical side or the project side of things. Ideally, I'm just helping people with career growth and anything else that's going on. Um because there was so much trust and autonomy with the team and they're so skilled and so good at what they do. We're all aligned on all of the things and they can make progress, right? That's like that's not real. It's not real, but like that's kind of what I imagine. You know, I I could basically walk away from a team and from an operational perspective, they're like, "We don't we don't rely on you. like you're not gating us or blocking us.

And so if I take that concept and I think about working with agents cuz I'm just talking about like the, you know, the technical deliverables, it's the same thing I would expect with a team, right? Like I would trust my team to be writing tests for things. I would trust them to be writing different kinds of tests at different uh like sort of that exercise things at different points in time. I would expect that they have uh you know metrics, telemetry, diagnostics, monitoring, alerting especially for like live services and stuff, right? Like I would trust that they do that. Again, if we go back to a brand new team, never worked together, whatever, that might have to be some, you know, upfront conversation. and hey like you know this is how we should be operating. These are things that give us confidence. How do we agree that we should do these things?

But you know you you build that up. So with my agents I would I would want to set that expectation for them. Right? This is where um like some of those glob-based instructions come into play. This is where um some of the the Genesis templates I was talking about would come into play, where I could say, "Hey, go use this as a starting point, right?" So, you have something basic to to go off of and that meets, from my perspective, um an initial starting point of like how I want to see things structured. Plus, it has some of the guard rails in place that are things that I would always be complaining about. You know, I might, there's going to be some stylistic things that like if it's my code base, like I want it done a certain way. So, instead of me always complaining to Copilot like, "Oh, go change this, like why do you keep doing this?" Just build it in a way that like it's forced to do it that way.

Whether that's through uh prompts, instructions, a combination of the two, uh linting tools, Roslyn analyzers, all the above, right? Make it so that it's not allowed to screw up. Uh, easier said than done, but point being that if I can give it those things from the start, then I'm kind of like, you know, putting some guard rails up, giving it a paved path to to develop on. And of course from there it's more about, where it gets tricky is like what's, where is the the boundary between like this is code that is technically going to execute and perform some action, right? And there's tests and there's whatever to to prove it, versus, you know, this is genuinely solving a problem that we that we set out to go solve. That becomes a little bit trickier. And when I say a little bit, I mean a lot of bit trickier.

It's a it's not it's a non-trivial thing, right? Just to give you an example, um I could say I want a I want a dashboard that has, you know, visuals for this stuff. I should be able to, you know, toggle certain things, blah blah blah. And then like if if I'm not clear, if I'm not clear or honest about what the goal is, maybe the goal is supposed to be for this type of user, you know, it should be so extremely obvious to them when things are healthy versus not, and when things are not healthy, how to drill into anything they could possibly need to go understand where the problem is and then follow up with a point of contact to go fix it. Okay, if I'm not clear about that and I just lay out technical specs, even if the thing gets built and there's tests and there's whatever else, that doesn't mean that I've solved the problem, it means that I got code produced.

I have working, like air quotes, working code that's produced. That does not guarantee the problem itself is solved. And so, you know, maybe one of the obvious follow-up questions to that or statements is kind of like, so just put that in the prompt, right? Like, tell it that's what the goal is. Okay. Um, I think that that's part of it. But if we think about that for a second, like depending on, you know, how much time you spent in software engineering, you may or may not have encountered something like this. So I think for for people that have spent, you know, some time in software engineering, you've experienced this where someone could give you a feature request and they could spell things out for for you like that and then you feel like you're aligned on it. And then when things come together, it's like the thing that is about to be delivered or the the direction that you were moving, or you or someone else, right?

um there actually isn't alignment or you discover something along the way that's not quite right or or whatever else, right? Like there are assumptions. We always make assumptions. We we don't know everything. We can't predict everything. So we have to make assumptions about things. And so if you don't have a feedback loop at all, that's a problem. And even if you do have a feedback loop, like how often is that feedback loop there? When you have the feedback loop about is this thing solving the problem or not, like at that point, do you have enough context to know or are you still making more and more assumptions? I feel like along the way, you know, there's almost always going to be assumptions about things. And even, you know, when you're shipping stuff to customers, it's like, can you conclusively say 100% of users, you know, this is exactly how they would expect to use it?

Like things don't, they just don't work that way. So, I think that where I'm heading with some of this stuff is that I'm trying to do as much as possible around um getting some of the infra set up so that when agents are doing work, I'm not, ideally I'm not spending time trying to be like, oh my god, like, you know, why are you letting flaky tests through, or like why did you write flaky tests in the first place, or why did you design code to be so ridiculous uh with, you know, 15 different patterns. Like I want to do as much work as I can upfront to to structure things in a way that that just is so minimized that that's like uh, you know, if in a year from now I never have to have a conversation like I'm having right now about, you know, agents doing stupid stuff, I'd be very happy.

I would much rather focus the time and effort that I'm spending when working with agents on like, let me see what was produced. Cool. Um, is this genuinely solving the problem or not? And now that I get to see it and feel it and use it, like I have feedback as, you know, sort of as the expert as I might expect here. And in cases where I don't know that kind of thing, then I can have conversations with agents to be like, okay, like how do we how do we get a better understanding of like of what this problem is and and use cases and ways to solve it, right? Um because there's there's stuff I'm not going to be an expert on um until I've spent, you know, tons and tons of time doing it. But to me, those are more interesting, meaningful conversations. In my opinion, that's where there's a lot more value in my time.

Like, I don't feel like it's a good use of my time to be trying to see why, you know, out of nowhere Copilot is now writing tests the wrong way. Um, I don't know, like mocking everything instead of using like real instances of things uh in situations where, you know, I I don't want mocks. I don't know, like just doing dumb stuff. That is genuinely not a good use of my time to be uh, you know, reviewing. Again, going back to a team example, right? Um let's let's imagine a situation where I have a brand new team, right? I I am new to the team. The team itself is brand new, forming, and it's all junior developers. By the way, I'm using this as an analogy, not to compare junior developers to AI. I don't like doing things like that.

Um I'm using junior developers here to say like we're we're all starting from, you know, scratch kind of in in this space as a team and sort of as as developers, and I'm coming in with management experience at least. So inevitably with anyone who doesn't have experience doing something there's going to be mistakes, right? Like I I would expect that. We are human, like we have to learn. You know, that's going to mean doing something, trying it, and maybe it works and it's good, maybe it doesn't work, cool, we learn.

And so if I had developers on a team that were brand new, very junior, and they were doing something and we went, you know, whether it's me, team members, or as a team we agree like, hey, you know what, that pattern, like that seems like an anti-pattern to us because uh it's either risky, it's brittle, uh causes flaky tests, we lose confidence, whatever it is, right? And we have this discussion, we go, you know what, like, you know, we should basically try to avoid that for these reasons, right? Doesn't mean there's no place for it ever. But that's not that's not the thing we should be leaning into. What should happen is that, you know, over time people learn that, right? It's like maybe you have to have the example brought up a couple of times, but you would hope that once that's established, it's not like, you know, 3 months from that point in time, you're like, "Hey, Billy, like we had this conversation last week and like you're still doing the thing." And then Billy's like, "Oh, sorry. It won't happen again." And then two weeks from that point, you're like, "Hey, Billy." Like, you know, this is the the fifth time we've talked about this.

And Billy's like, "Oh, sorry. It won't happen again." Um, that would be ridiculous, right? It would that would just be completely silly. So, when we have teams of developers, like teams learn and grow together, and that's awesome. And so it's not that like agents can't have memory or can't have instructions and things like that, but I feel like when you don't have the proper guard rails in place or you're starting a new, like, you know, brand new context, right, new session, it's like it's a clean slate. So you need to make sure you're loading into context the the relevant stuff. So you do have to be capturing things in some kind of memory, instructions, blah blah blah, or else you should be expecting the same kind of mistakes.

I have many prompts and many instructions that that will say like, you know, uh either follow this pattern or, you know, in other cases, like if there aren't, if there isn't an established uh set of instructions, like look for this pattern in, you know, nearby files and and use that so it's consistent. I have this kind of thing all over the place and still I see all sorts of random uh stuff get introduced by Copilot, and it's not all the time, right? Things are are certainly improving but it still happens. And it's not like with humans that would never happen, right? So imagine the example I gave and it's like, okay, well, two years down the line a new employee joins the team, you might expect that that kind of thing could happen, right? Probably, you know, new employees would do things like, um they're making a new change and they're like, "Hm, I wonder like how we do this in the codebase." Or they look around, they see what's going on, they go, "Okay, I see the pattern." Maybe they see a couple variations of it.

Maybe they get confused by that. uh either they do or they don't ask someone or they they see the the most recent usage of it, go with that, whatever it is, right? They might be introducing drift and not realize it and then so someone on a review catches it. So, it's not like it's zero, but it seems I don't know like to me it seems like that happens less with people or it's like the situations are more like I gave you a very specific one about like a new team member joining or maybe it's someone working in a new code area they haven't touched before. But sometimes it feels like with these agents like out of nowhere like everything will be going good.

It like does the feature great and then I look at the test and I'm like, no, literally you made up a brand new pattern here, like there's no excuse. I've had uh, you know, conversations with Copilot after and, you know, instruct it like don't be defensive, don't apologize, just like be unbiased, explain, you know, why these decisions were made. And then I will say like, can you tell me which instruction files talk about this? And it will list like 10 files. And then I'm like, well, why didn't you do that? And then it will say things like, you know, just like I, you know, I can't tell you why I just like didn't use the information and confirmed it was loaded into my context. Like I, okay, I don't know what to say then.

Um, so yeah, like I feel like these things won't be perfect, but putting as many guard rails in place as possible, I think, is what I have been trying to do and will continue to do. Now, if I'm able to get something uh I don't know that feels a little bit more repeatable with like this Ralph loop kind of setup, I'm very curious about sort of these two things. So, long-winded way to get to this part of the conversation, but I knew it was a long drive. So, um can I use Ralph loops? And then the other thing is, can I have agents that communicate with each other more out of band than me asking for a markdown file to go copy and paste over to another terminal?

So without knowing exactly how uh these things come together, my kind of thought would be something like, is there a way that I can have the equivalent of, I don't know what the name would be, like something like uh, you know, Ralph loop orchestrators for particular projects? And this way when agents run into an issue or whatever else, they can basically like open the feature request or the bug report with the other agents for other Ralph loops. Uh again, I don't know the organization of it necessarily, but can they basically go open those things up? The other agents are kind of uh waiting, listening, or the orchestrator's waiting and listening to go file that work with them. And then the Ralph loop goes and runs. And whether or not this is like an automatic committed thing or pushed thing, a completely different story.

But um I think it would be really cool to be able to have agents that can communicate more directly with each other, which I don't think is that crazy. Um, I know like the tech is obviously possible. Like you can literally have agents talk to each other through a file if you needed to. So, it's not that that technology doesn't exist. Um, but like I don't know, I'm saying more of a formalized structured way to do it or expected way to do it. Can I have that? And then the other thing is um going from I have a request, an idea, whatever it is, maybe a full spec, whatever, and I have it. I just want to put it somewhere and then just have agents chip away at all the work fully autonomously. And that way, you know, if it gets to the end of it and it has something that based on the spec definition meets the requirements, then great.

And otherwise, it's throwaway work, right? Unfortunately, wasted time and tokens, but it's not my time. And so, some of like these things are are certainly possible. And I'm sure I'm sure there's already people doing all sorts of variations of this. I'm not talking about something that like I've invented or anything like that. What I am saying is like I don't currently operate that way with my my agentic development and I would like to move in that direction and see how that comes together. So for me a big part of that is uh that repository I was talking about that I made for myself called Genesis. That one's private by the way. So, uh, it's it's very opinionated to be how I want to build things. And if you're like, "Hey, that sounds interesting." I would just recommend you make one for yourself, right? I'm just using AI to go, you know, create templates and stuff, uh, skills.

They can go set those up, but it's not rocket surgery. It's not special sauce. Like, you can have your own special sauce. And I think if you like the idea, then you should go do that. So I think that's a big part for me in how I see um, you know, such an orchestration coming together, because that's not the spec that you give the Ralph loop to say, hey, go build me an X, right? It's here's the foundation for when you're building with this technology. And here are all of the guard rails I could possibly put in place so that I can hopefully spend as little time as possible talking about silly stylistic or like, I don't know, like even some of the architecture structural things. I don't have to spend time on that because the agent will be building in that direction and validating it no matter what.

The um the cross agent communication part, like I'm not exactly sure how that fits in, but I do know that that's a common pattern that I'm finding myself in right now. So, uh Oh, this is super convenient. This guy's leaving and I need to get in. Two people are leaving. Wow. I don't think that's ever happened. So, for context, anytime I'm here on the highway and I'm trying to merge, it's always me trying to get in, and I had someone in front of me and behind me trying to get onto the highway the other way. Wow. That made it way easier. Um, yeah, I find that the cross agent stuff is like it's a real pattern I have.

But I think the way I see it is like, imagine I'm talking or like working with Copilot in one of my terminals and we're hitting an issue and it's like, "Hey, we should go file a feature request or a bug report, whatever, for their agent to go take care of." What I would want to have happen is like it goes and makes the feature request or the bug report and just fires it off to the other one and the other one starts on it right away. Like that's what I want to have happen. Like that's how lazy things are getting, because I don't want to sit there and go, let me wait for the markdown file. Let me go copy and paste the file path over. Let me go to the other tab and paste it over there. Uh especially if it's already working on something, right? I want to queue it up and say like, whenever you're ready, dude.

Like chip away at this. Um I did say too, like I don't have a ton of trust in, you know, having my libraries become too opinionated by the requester. Uh so that would need to be addressed as well. I don't have a perfect solution for that. Maybe that's in prompts and stuff to say like, hey, like, you know, try to solve the user's problem but remember that we're a framework or a platform, whatever, I don't know. But the idea being that I would want this stuff fired off and then agents pick it up whenever they can. You could argue maybe that's just like GitHub Copilot already, like you file an issue in GitHub, assign it to Copilot and it goes and does it. Like maybe um maybe that's all that is. I feel like over the past, I don't know, maybe like 6 months now, like now that I've been using terminals, I feel like I have less confidence in the cloud runners uh to go do what I expect.
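
As a rough illustration of the "fire it off and the other agent picks it up whenever it can" idea, here is a minimal sketch that uses a shared directory as the queue. It is not the setup described in the video: the function names `file_request` and `claim_next`, the JSON fields, and the inbox/claimed directory layout are all hypothetical, and how an actual agent would be told to poll the inbox is left out.

```python
import json
import time
import uuid
from pathlib import Path
from typing import Optional


def file_request(inbox: Path, kind: str, title: str, body: str) -> Path:
    """Drop a feature request or bug report into another agent's inbox (hypothetical layout)."""
    inbox.mkdir(parents=True, exist_ok=True)
    request = {"kind": kind, "title": title, "body": body}
    # Timestamp prefix keeps files in roughly first-in, first-out order by name.
    name = f"{time.time_ns():020d}-{uuid.uuid4().hex}.json"
    tmp = inbox / (name + ".tmp")
    tmp.write_text(json.dumps(request), encoding="utf-8")
    # Write to a temp name, then rename, so a reader never sees a half-written file.
    return tmp.rename(inbox / name)


def claim_next(inbox: Path, claimed: Path) -> Optional[dict]:
    """Claim one pending request (roughly oldest first), or return None if there is none."""
    inbox.mkdir(parents=True, exist_ok=True)
    claimed.mkdir(parents=True, exist_ok=True)
    for path in sorted(inbox.glob("*.json")):
        target = claimed / path.name
        try:
            path.rename(target)  # moving the file out of the inbox is the "claim"
        except FileNotFoundError:
            continue  # another worker claimed it first
        return json.loads(target.read_text(encoding="utf-8"))
    return None
```

The requesting side would call `file_request(...)` instead of producing a markdown file to copy and paste, and the receiving side would call `claim_next(...)` whenever it finishes its current work. Keeping the request as data (kind, title, body) also leaves room for the receiving agent to push back on requests that are too specific to the requester.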

I don't know. I relied on them a lot. Like last year I did tons and tons of development with uh the Copilot agent uh in GitHub, like just running in the cloud. Did so much development that way and loved it. But this is a whole different level. And I actually feel kind of nervous to go back the other way. I guess it almost feels like my GitHub Copilot experience with the agent in the cloud feels like it's got its hands tied. That's probably the best way I could explain it. And so maybe that's just a matter of the tooling and stuff getting better over time. And I haven't used it in quite a while, so maybe it's even better. Um, but yeah, maybe that's all I'm kind of getting at. Um, to be honest, I don't know about doing that in combination with a Ralph loop.

I don't know if um if these are entirely separate concepts, but part of me like wants to see them working together because I can see both of these having a lot of value in my workflow. Oh, and I guess just cuz I'm actually finally getting to work here, um to touch on like why Ralph loops for me, because I hinted at it earlier that like that's one of the things I'm like, that sounds really dumb. Um, I've noticed that there's a couple of, you know, scenarios that I've had where I've just said like, you know, I don't know exactly how I want this built necessarily, but like sort of just vibe it, and then I keep just telling it to keep going and, you know, do some small checks here and there and uh, and I've had some decent results.

Not everything like works perfectly or, you know, is going to be a success and I get that. But this idea that I could just like have something go crank out code and hopefully follow a spec through to completion. That feels that feels pretty good. Like especially if I could fire something off overnight. Like ju just to give you an example for the couple things I was trying I was saying like go make a dashboard for something in uh like React TypeScript like stuff I don't personally use that often and so was letting it run overnight and I had an an agent babysitting the other agents basically and said if you notice that this thing starts to get stuck or fall apart Like that's work for us. Like we have to go correct our infrastructure. You do that and you restart the entire thing from scratch. And so I think it spent like 7 and 1/2 hours last night doing it.
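
For readers who haven't seen the pattern, here is a minimal sketch of a loop in this spirit, assuming "Ralph loop" means re-running an agent against the same spec until some check passes or a budget runs out. Everything here is hypothetical: `run_agent` stands in for however you invoke your agent, `spec_is_met` stands in for a build-and-test check, and `max_iterations` is an arbitrary cap. It is not the setup used in the video.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    done: bool                 # True if the spec check passed within the budget
    iterations: int            # how many agent passes were run
    log: List[str] = field(default_factory=list)


def ralph_loop(
    spec: str,
    run_agent: Callable[[str], str],      # hypothetical: one agent session, returns a summary
    spec_is_met: Callable[[], bool],      # hypothetical: e.g. "does main build and pass tests?"
    max_iterations: int = 50,
) -> LoopResult:
    """Re-run the agent on the same spec until the check passes or the budget is spent."""
    log: List[str] = []
    for i in range(1, max_iterations + 1):
        log.append(run_agent(spec))       # same spec every pass; state lives in the repo
        if spec_is_met():
            return LoopResult(True, i, log)
    return LoopResult(False, max_iterations, log)
```

A babysitting agent like the one described above would sit outside this function: if the result comes back with `done` set to `False`, it would fix the infrastructure and call `ralph_loop` again from scratch.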

And you know, like the whole thing wasn't a success. It never finished end to end the whole spec. But it was pretty cool in that like it could queue up a bunch of work that could be parallelized by agents. Each of those agents would do Ralph loops. Those uh those Ralph loops, when they complete the work they're doing, they like put things back on main, and it was really just like having a bunch of developers work on things on their own branches and then merging them back in. So I'm like, that concept, that whole flow feels pretty cool. Um you know, the agent that was babysitting confirmed that like on the main branch like it legitimately was building and passing like over, I don't know, like I think it had like a hundred and something tests. So like that's coming together and working. But uh I think one of the things it realized, there's obviously some structural things that needed to be improved.

But it was saying how it structured the plan was actually wrong because it basically, without realizing it, gave an impossible task. And so when that impossible task was provided, uh one of the agents couldn't deliver on it uh because it was impossible. Chewed up a ton of time doing it, and then subsequent agents were like, "Well, I can't build the thing I'm supposed to because this impossible task couldn't get done." Right there, there was no um, the thing that was, this is my take on it, the thing that was orchestrating all the work was too programmatic. As in I bet you, not that it would make it uh the right thing, I guess, but I bet you if there was an LLM that was coordinating all of that, if it saw along the way, oh, this agent got screwed up because it couldn't deliver on something that's truly impossible, maybe we need to tweak that, right?

As a human, that's what I would do is go, oh, like if we can't do that, we have to find an alternative path. And so, I don't think there's any reason an LLM could not do the same type of thing. So anyway, it's kind of a follow-up for me, but the point is that that to me seems like a really cool way to be able to queue up a bunch of work, either have it run overnight or at any point in time if I'm like, I have an idea, I want to fire it off, I just create the idea and can kind of trust that my little orchestration will take care of it. So yeah, again I'm not saying this because I think I'm inventing it or I'm the only person or whatever. I am guaranteed I'm behind on this, but that's what I've been playing around with.
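
One small piece of that coordination can be sketched without an LLM at all: when a task fails, work out which queued tasks now depend on something that can never be delivered, so the coordinator (human or LLM) can re-plan them instead of dispatching them. This is only an illustration; the function name `blocked_by_failures`, the task names, and the dependency-map shape are made up.

```python
from typing import Dict, Set


def blocked_by_failures(deps: Dict[str, Set[str]], failed: Set[str]) -> Set[str]:
    """Return every task that directly or transitively depends on a failed task.

    deps maps each task name to the set of task names it needs finished first.
    """
    blocked: Set[str] = set()
    changed = True
    while changed:  # repeat until no new blocked task is found
        changed = False
        for task, needs in deps.items():
            if task in failed or task in blocked:
                continue
            if needs & (failed | blocked):
                blocked.add(task)
                changed = True
    return blocked


# Hypothetical plan: "api" needs "schema", "ui" needs "api", "docs" needs nothing.
plan = {"schema": set(), "api": {"schema"}, "ui": {"api"}, "docs": set()}
needs_replanning = blocked_by_failures(plan, failed={"schema"})
```

In this made-up plan, a failure in `schema` flags `api` and `ui` for re-planning while `docs` can still run. Deciding what the alternative path should be is the part that would still need judgment.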

Right. For me, it's there's a lot of fun in like exploring and experimenting. And so there's people that literally their jobs are doing this kind of stuff. And I'm sure I'm sure they've already done all of this months and months ago. Uh and I'm just not caught up. But uh part of learning is kind of going through it, trying it out. So that's what I've been doing. Um yeah, it's a long way off, I think, but I I suspect by the end of the year that's how I'll be coding. I I don't see why not. The template stuff has been really a game changer. I don't It's not like I'm starting projects all of the time, but it's nice when I want to go do one. It's like I I know I have a a set of ones that I can just pull from and it's super handy.

This model do cool. But I think that's it. If you guys are using cool agent orchestrations and workflows and stuff, I'd love to hear about it. Leave it in the comments. Um, and then of course obviously if you have questions about career development, software engineering, leave those in the comments or you can go to codecommute.com, submit stuff anonymously. Happy to try my best and help make a video for you. Um, hope it helps. See you in the next one.

Frequently Asked Questions

These Q&A summaries are AI-generated from the video transcript and may not reflect my exact wording. Watch the video for the full context.

How do you manage Copilot CLI workflow across multiple terminals and tabs?
I usually have a PowerShell window with a bunch of tabs, on the order of 10 to 15 tabs. Not all of them are always running, but I have about four to eight that I'm actively working in. I also have limits on how much I can run in parallel because the context switching overhead is so high.

How do you handle feature requests or bug reports when using Copilot and Needler?
I have Copilot draft the request or bug report with a full file path, then I switch to another Copilot terminal to share the proposal with another developer. Depending on the complexity, I either let it go directly without plan mode or I go into plan mode to discuss before signing off. I’m careful not to let Needler become hyper-specific to Brand Ghost; I want it generic and to include guard rails and templates.

Why do you want guard rails and verification when using AI agents, and how do you view feedback loops?
I want guard rails from the start, including prompts, instructions, linting, and Roslyn analyzers, so the agent is not allowed to screw up. I also want to ensure that what’s produced actually solves the problem rather than just generating code, so I rely on testing, metrics, and feedback loops to validate usefulness. I know there will be assumptions and misalignment, so I value conversations with agents to better understand the problem and improve how we use these tools.