It was only a matter of time before we could have AI pick up our GitHub issues... I know that before I made this video, Microsoft had put out an uncomfortable demo video. I still hadn't seen it at the time of recording, but here's MY experience working with GitHub Copilot agents on pull requests!
📄 Auto-Generated Transcript ▾
Transcript is auto-generated and may contain errors.
Hey folks, I am leaving the office. It's Monday night. Um, busy day doing, uh, performance... I guess they're called Connects. It's kind of like a performance review, but it's not a rewards conversation. So, um, I'm most of the way through that with my team. Got a little bit more to do tomorrow. But today I'm going to talk about my experience using GitHub Copilot agents with pull requests. So, um, if you're new here, this channel works by people submitting questions and stuff in the comments, and I prioritize answering those. Uh, if they're on software engineering or career development, please leave them below. Otherwise, you can message Dev Leader on social media or Nick Cosentino on LinkedIn. I'm happy to try my best to answer. If you message me, I will keep you anonymous. If it's a comment, it's literally public. It's the internet. Okay.
Um, so I still haven't gone and watched or read about the launch of this stuff. If you're watching this, you probably know better than I do, but I guess when Microsoft was launching or demonstrating the GitHub Copilot stuff that has the pull request feature, it was doing ridiculous stuff. Like, it was pretty embarrassing. I'm just trying to be transparent — I literally still have not seen it, I don't know the details, but everything I've seen peripheral to it is people saying, "Yeah, it's pretty terrible." Okay. Um, at the end of last week — it was Friday morning, I think, and it's now Monday evening — I started trying it out. Basically, it was my first attempt at, okay, I'm going to try using GitHub Copilot with agents.
I'm going to assign it tasks, and it's going to do pull requests for me. And from my other videos, you may have seen me comment that my agent experience has not been good. I've used VS Code, I've used Visual Studio, I've used Cursor. For the models, I've tried Claude, I've tried different GPT models. I've just not had good success. Um, the way I end up summarizing that experience: when I'm chatting with ChatGPT in the browser, I get pretty good results when it's building a class for me, making a method, refactoring stuff, whatever. It does a good job and I'm happy with that. So in my mind I'm like, okay, if I have that experience in my IDE, and then I can have an agent with the same quality, let's say, but more capabilities...
This should be kickass. But my experience has been that either the agent in any of those applications I mentioned, regardless of the model, just does stupid stuff — ridiculous, completely useless — and then I have to spend more time fixing it. Or — there's an Aston Martin behind me — or I have to break things down into such small pieces that I might as well just be in chat mode. Which would be fine, because it would probably work, but then it's not an agent. So, I had not yet had a good agent experience until I started using GitHub Copilot with pull requests, which is so funny, because everyone was saying it was so terrible and laughing at it. And I remember on Friday morning, I created two issues for my codebase. Said, I'm going to have Copilot work on this stuff. I'm driving to work. Let's see.
And, um, to summarize this whole video: I've had a really awesome experience, and I'm super stoked about it. It's performing in the way that I always would have expected agents to perform. That's probably the best way I can summarize it. So instead of being disappointed, handholding agents like crazy and going, "No, I just told you how to do this pattern, I gave you four files that have the same pattern, and you're still making stuff up" — instead of having that experience on repeat, I can actually give it tasks, and it actually comes back with reasonable results. So I wanted to walk through a few different things. Um, I'm going to try to recall them, but honestly, Friday evening is when I started firing off more pull requests.
I've probably fired off like 30 things over the weekend. And this weekend, because it's Connect period for me at work and I was on vacation, I basically had to work through it. Not that I'm complaining, it just meant that I wasn't coding. So, I was doing Connects. Um, when I do them for my team, they take me a lot longer because I do a talent guide review at the same time as the Connect document. It's a lengthy process. Again, not complaining, just explaining what was going on this past weekend. And what I was able to do — I remember Friday evening, it was a bit after work.
Um, my wife came home and I was laying on the couch, hanging out with her, and I was just creating issues in GitHub and assigning them to Copilot, sitting there going, hm, what's all the stuff that I've just been putting off? And specifically, I'm talking about BrandGhost, which is the social media posting and scheduling application that I'm building. It has other features for things like social media feeds; really, it's to help content creators automate all the boring parts. And I was like, what are all the things that I don't want to do that I've been saying I need to go do? And this covers an interesting spectrum of things, cuz some of them are probably pretty trivial, but I shouldn't waste my time on them because I should be spending my time on important things, right?
We have customers; we're trying to serve them. I should be focusing on the highest-value things, right? Either for our current customers or to get new customers. So I have a list of things I've just been putting off. There's some stuff users have asked for, but it's on the more complex end of what we should do, and it's not urgent — like, hey, this would be a good thing to incorporate. There's some stuff that's refactoring. There's an entire spectrum of things. So, the results were a mix of everything, but overwhelmingly more positive than a typical agent experience that I've had so far. So — one second, I'm switching lanes — I want to recall what I was asking it to build. So, we have a front-end application and a backend application.
The front end is in Next.js and TypeScript, and the back end is in ASP.NET Core. It uses Aspire as the sort of entry point and configuration. Sorry, got to do one more lane. I got to focus. People are going at all sorts of weird speeds here. I don't understand. Why are you going so slow? What is going on? Um, okay. I'm in the fast lane, at least. We're just not going fast. So, um, we have those two apps. I'm trying to think of the first things I was asking it to build. Um, now that I'm talking about this and I'm super excited, I can't remember the details. Um, okay. Uh, I'll walk through some parts here. One of them was that we have a scheduler that runs right now. It's a background service, and I don't really like using it.
Something about it feels off. I don't like using a background service in ASP.NET Core, especially because we have Quartz as a job scheduler, and with Quartz I can persist the job information — it's literally persistent, versus the background service, which has no state. It's kind of weird. It basically works if the service is running, but if I run multiple instances — by the way, it's a monolith, but multiple instances happen during deployment — I had to go do some fancy stuff to make sure those background services weren't colliding with each other. It was never designed to be run in multiple instances. If you watch early videos in Code Commute, I was in Hawaii when I discovered I was having issues like this and had to fix it from Hawaii.
So, I've been kind of burnt by background services. But with Quartz, because of the way you can run jobs in clustered mode, it can take care of only one instance running the job. I could spin up 50 of these, and Quartz will make sure only one of those instances runs the job. So, I said, "Okay, I want to go port this. Instead of being a background service, go make it a Quartz job." I want to do that; I just don't want to spend the time doing it, because it's not really broken. So why spend time fixing it? So: new issue. And then I said, "Okay, well, if I do that, there's a consuming side as well." The way this works is the scheduler fires, and I wanted the scheduler to be very dedicated: it basically queues up the set of work that has to happen.
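As an aside for anyone curious, a clustered Quartz.NET job registration looks roughly like this. This is a hedged sketch, not the actual BrandGhost code — the job name, interval, and connection string placeholder are all my own illustration:

```csharp
// Sketch: registering a Quartz.NET job with a persistent store and
// clustering enabled, so only one node in the cluster fires the job
// even when multiple app instances are running during a deployment.
using Microsoft.Extensions.DependencyInjection;
using Quartz;

[DisallowConcurrentExecution]
public sealed class PostSchedulerJob : IJob
{
    public Task Execute(IJobExecutionContext context)
    {
        // ... queue up the pending social media posts here ...
        return Task.CompletedTask;
    }
}

public static class QuartzSetup
{
    public static IServiceCollection AddPostScheduler(this IServiceCollection services)
    {
        services.AddQuartz(q =>
        {
            // Persistent store + clustering: Quartz coordinates through
            // the database so each trigger fires on exactly one node.
            q.UsePersistentStore(store =>
            {
                store.UseProperties = true;
                store.UseSqlServer("<connection string>");
                store.UseClustering();
                store.UseNewtonsoftJsonSerializer();
            });

            var jobKey = new JobKey(nameof(PostSchedulerJob));
            q.AddJob<PostSchedulerJob>(opts => opts.WithIdentity(jobKey));
            q.AddTrigger(t => t
                .ForJob(jobKey)
                .WithSimpleSchedule(s => s.WithIntervalInMinutes(1).RepeatForever()));
        });
        services.AddQuartzHostedService(o => o.WaitForJobsToComplete = true);
        return services;
    }
}
```

With `UseClustering`, spinning up 50 instances is fine: the database row locks decide which node runs each trigger, which is exactly the property a plain `BackgroundService` doesn't give you.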
But there is a consumer of that. Okay. And the consumer of that then needs to go execute the actual posting to social media. Okay. So, not that complicated. But one of the tricky things is that generally the stuff for posting to social media is relatively quick. It kind of isn't if you have to post videos and the API requires that you stream the file. A lot of the APIs let you provide a URL instead, and they will go stream it. But if you have to stream, now I have to download the file on the server and go upload it. So that can take longer. Now, with the way it's currently set up, there's a situation that can occur — especially if I want to get into any video processing — where, if I have to do that inline, I'm going to start delaying people's posts.
Don't want to do that. If I have two different people posting back to back and that scheduling consumer has picked up the first person's work, it's not going to be okay — it's going to slow things down for the second person. So if I use, uh, MassTransit and Quartz, I can essentially get around this. I can schedule multiple jobs, and then I could actually spin up multiple instances if I needed to. Suddenly our monolith is becoming a little bit more microservice-y. We don't need to do that, by the way, but I want to make sure that I can decouple the posting and not block other people from posting. Okay. Conceptually, I know how to do it. I don't want to spend time doing it because it's not broken — why should I fix it? So, I made another issue. I haven't merged the code for it, but Copilot wrote code that I'm very content with for both of those things.
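For the consuming side, a MassTransit consumer that decouples the actual posting might be sketched like this — the message and consumer names are my own illustration, not the real BrandGhost types:

```csharp
// Sketch: putting post execution behind a MassTransit consumer so that
// one slow video upload doesn't block the next person's post.
using MassTransit;
using Microsoft.Extensions.DependencyInjection;

public sealed record PublishScheduledPost(Guid PostId);

public sealed class PublishScheduledPostConsumer : IConsumer<PublishScheduledPost>
{
    public async Task Consume(ConsumeContext<PublishScheduledPost> context)
    {
        // ... download/stream media and call the social platform API here.
        // Long-running work only ties up this one message; other messages
        // are handled concurrently or by other instances.
        await Task.CompletedTask;
    }
}

public static class BusSetup
{
    public static IServiceCollection AddPostingBus(this IServiceCollection services)
    {
        services.AddMassTransit(bus =>
        {
            bus.AddConsumer<PublishScheduledPostConsumer>();
            // In-memory transport for the sketch; a real broker like
            // RabbitMQ would be configured here in production.
            bus.UsingInMemory((ctx, cfg) => cfg.ConfigureEndpoints(ctx));
        });
        return services;
    }
}
```

The scheduler's only job then becomes publishing `PublishScheduledPost` messages; the consumers can scale out independently, which is the "little bit more microservice-y" bit.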
It's very straightforward. I can see that it got rid of the background service. It basically lifted the code from one and put it in the Quartz job, and it took the scheduling patterns that I already have for Quartz jobs. It's lovely. So it just did two of those things while I was hanging around, not worrying about anything. Um, I'll give you another good refactoring one. So, uh, recently we had some examples of — well, two things. We have this code that will resize and reformat images. What's a good example? TikTok is picky about image types — you can't post a PNG through the TikTok API. So, I have logic that will handle this. But we also started to run into issues with not only the image dimensions — the width and height — and the format, but also the actual file size of the media.
We had a user message support saying, "Hey, my stuff's not posting." And when I looked into it, they were uploading 10-megabyte pictures. So, I said, "Okay, there's a solution here." I looked at the resolution of the picture, and it was like 8,000 x 5,000. I said, "No social media platform is even going to support that, and I already have media conversion logic." So I made an issue for Copilot and said we need to go scale the image size down. Cool. PR for that. Looks good. I pulled this one down, had to massage it a little bit, which was fine, and I was content with it. Tested it out. Awesome. Worked. So Copilot did like 85% of the work on that. Super, super cool. Then — this is the refactoring part — when I was looking at the code later, I was like, you know what really sucks?
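To make the downscaling concrete, here's the kind of fix I'm describing, sketched with SixLabors.ImageSharp — that library choice is my assumption, since I don't name the actual imaging code in the video:

```csharp
// Sketch: cap the longest edge of an oversized upload (e.g. 8000x5000)
// while preserving aspect ratio, before sending it to a platform API.
using SixLabors.ImageSharp;
using SixLabors.ImageSharp.Processing;

public static class ImageDownscaler
{
    public static void ClampLongestEdge(Image image, int maxEdge)
    {
        if (Math.Max(image.Width, image.Height) <= maxEdge)
            return; // already small enough, nothing to do

        image.Mutate(ctx => ctx.Resize(new ResizeOptions
        {
            // ResizeMode.Max fits the image inside a maxEdge x maxEdge
            // box without distorting the aspect ratio.
            Mode = ResizeMode.Max,
            Size = new Size(maxEdge, maxEdge),
        }));
    }
}
```

An 8,000 x 5,000 picture clamped to a 2,048 longest edge comes out at 2,048 x 1,280, which also slashes the file size well under what tripped up that user's posts.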
I have some parameters that are passed around: basically, I make an object, put some data onto it, and pass it into a function, but it was designed poorly. The input parameter was also, like, a newly formed output parameter. It's kind of weird to explain, but one clear example: my image size calculator needed to take in a background color. That makes no sense — to calculate an image size, you don't need a background color. But the result object was passed into something that would do the actual canvas drawing, and that needed the background color. The parameters were just in the wrong spot. So I said, "Hey, Copilot, I have these types of converters, right? One's an image size calculator; one is the actual thing doing the rendering.
They're currently using the same objects, but those don't really make sense based on the task at hand." So I told it: I need you to go refactor this code and introduce new data transfer objects — DTOs as records — where we're only passing the relevant information around. The work I had done previously was putting a byte-size limitation in some spots where it doesn't make sense — for the size calculation for the dimensions, you don't need it there. So, it did this refactoring, and it did it perfectly. It worked the first time. This is the exact kind of thing I don't want to waste any time doing. It was something where I'm like, man, I see it in the code — not today, man, I've got more important things to do. But all I had to do was type a little note, a GitHub issue, and it did it.
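The shape of that record-based DTO split looks something like this — every name here is illustrative, not the actual BrandGhost types:

```csharp
// Sketch: one record per responsibility, so each component only receives
// the data it actually needs.

// The size calculator only needs dimensions and limits...
public sealed record ImageSizeRequest(int Width, int Height, int MaxEdge);

// ...while only the renderer needs things like a background color.
public sealed record ImageRenderRequest(
    int TargetWidth,
    int TargetHeight,
    string BackgroundColor);

public interface IImageSizeCalculator
{
    (int Width, int Height) Calculate(ImageSizeRequest request);
}

public interface IImageRenderer
{
    byte[] Render(ImageRenderRequest request);
}
```

Records work nicely here because these are immutable, pass-through bags of data: value equality, positional construction, no accidental mutation halfway down the pipeline.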
So, it's been good at some of the refactoring tasks. It's been good at some of the features. Um, I did some front-end features last night. When users lose their authentication to social media platforms, we track that now, but our front end doesn't show it — and I don't like working in the front end, cuz I like C#, and the front end's not in C#. So I said, "New issue. This is the information that's on the API. I want you to go change the UI to basically highlight these rows that need attention when authentication is lost." Worked perfectly the first time. Perfectly. And then I said, hey, if it could do that perfectly, here's this other fun thing. Apparently, for some reason, when you authenticate with YouTube, if you ever try to reconnect, it just doesn't work. Google's like, "You've already attached this application to your account." So, you need to literally go into your Google account settings, remove our application, and then you can reconnect.
It's really dumb, and it's confusing as hell, because in our app, if you try to refresh, it just fails. So I basically made this issue and said, "Hey, when someone's trying to refresh their YouTube connection, you need to put up a modal dialog that tells them they have to do this step first, manually." And all I told it was that for Google specifically, the app is basically associated with their account and they need to remove it — something super high level. I'm not doing a good job of explaining it, but when it made the UI, it made it perfectly. It put the steps in there, and it actually knew what steps to take without me telling it. It figured out that it had to say, like, click here in your account settings.
I never told it where to go, and it explained that you need to go do it, and I was like, I can't believe this. So those two front-end features worked perfectly. And those have been things where, you know, we should have done them months ago, but they're just not a priority. So, what's really cool is that when something comes to mind now, I can just go make a GitHub issue — which is one of the last things I want to talk about here, which is super cool. So, this is the part where it's not doing well — it's not giving me a perfect solution — but it's at least letting me evaluate some of my thoughts. And I just want to give you a couple examples. The one from this morning: I said it would be really cool if we had some type of feedback system on some of the data we have.
So when posts fail to go out, basically every week or on a regular basis, I should be reviewing that: what are the patterns we see? One of the patterns I told you about was image formats — or in this case, image sizes — but what other patterns come up? Is that something we should be building for? Is that something we should be telling users to avoid? What are the patterns? The other part of that is our notification system. It's not live right now, because we need to do a better job of making the notifications actionable. Right? "You lost authentication on your account" — that's pretty actionable. But if a post failed to go out on TikTok because of, uh, error code 37, what should the user do, right? I don't know what error 37 is unless you play Diablo 3, but we need some of that insight.
So, I made an issue and said, "I need you to go build a Quartz job that, on a weekly basis, pulls this information and emails it to me with a prompt for an LLM, so that when I feed it all of this data, it can go suggest the right issues to make in GitHub." The next step, if that works, is hooking it up so that it can actually publish the issues. But maybe that's for a later day. So, um, that one it built while I was at work today. I've been super busy, so I haven't checked it out. I scanned through it a little bit, but I haven't pulled down the code to try it. Um, what's another one? I wanted to do automated social media posts when BrandGhost hits milestones. So, for example, say we get a thousand users — I want to do a social media post automatically from BrandGhost.
When we hit 1,500 or 2,000 — these different milestones — I want to be able to have celebration posts. So, this is something I've wanted to do for a while, but it's really low priority. No user is asking for that; I just think it's a cool thing we should do, right? Let's use our own tool to go post stuff when there's an interesting milestone. So when I was writing the GitHub issue, I was like, "Okay, well, there's total users. What about the total number of posts? What about a new peak of weekly posts? That would be kind of cool — we're now reaching 100 posts per week, or 200 posts per week, or 1,000 posts per week." So, I just wrote all of these random could-be-cool things in a list, and it made a pull request.
This is one that it absolutely didn't get right. I scanned through it — I haven't opened it in my own IDE, I haven't pulled it down — but when I was looking through it, I was like, this is definitely wrong. But I actually didn't expect this one to work. What was really helpful is that I got to see how it would structure some of the code. And really, the biggest thing that's wrong is how it's trying to calculate this stuff. For example, the total number of users: the way it calculated this was like, hm, I see a few tables in the schema that have a user ID; I should count the distinct user IDs across those tables. And I'm like, that probably gives you close to the right answer, but that's not how you would do it. That's not how I would go do it.
So, it tried its best to go do all these metrics. And I actually think if I go through that, and maybe refine it down to just the total number of posts or something, that might be a super easy one. Maybe it'll blow away the rest of the code, and maybe total number of posts just works, right? So, it was a really cool experience that I could just say: I've had this idea for a while; instead of it never getting done, let me just see what it can do. Maybe it'll get most of the way there, and then I can spend a few minutes and try it out. Um, so that was really cool. I'm going to keep doing that when I have some inspiration, instead of being like, "Oh, we don't have time for that." I don't know.
Maybe I'll just ask Copilot to go check it out. Um, here's another one — I'm not going to explain what it is, but I'm vibe coding it. This is actual vibe coding through pull requests. I'm vibe coding an entire application, and I'm not going to even try it out until I feel like it has the MVP of features, as an experiment. It's done like five pull requests. I've never checked out the code. No idea. Literally no clue. I'm going to keep going until it has the minimum set of features, and then I'm going to pull it down and try it out. So, that's a cool one. Um, one where it really didn't work — well, two, actually. I'll give you two examples. One was from yesterday. There's a bug I want to fix where we have an invalid state showing up somewhere.
And it's coming from — I'm pretty confident — a SQL query. It's just mislabeling something. But this is a SQL query I really don't like working with. I know what it's doing, but it's very finicky. If I touch it, it seems like either it takes a performance hit or the records are just wrong. So I'm like, I know I've got to touch this query; I know it's got to happen. So I said, okay, here's what the issue is, it's coming from this query, you've got to figure it out. And it confidently told me: here's the fix. So I said, okay, I'm not pushing that one live until I try it. Cloned down the code, ran it: absolutely not right. Okay. Give it some feedback. And this is cool, cuz you just go onto the pull request, leave a comment, submit the review, and then it just goes off.
So, this is kind of cool, cuz I was doing this while I was working. When it had updates, I could take a break and just go check. So, it gets to the point where I'm like, okay — it was not right, and then it was slow, and then it optimized it and it was lightning fast, and I was like, "Holy, this is actually faster than it was when I started." It was getting some states back that were right, so I'm like, "Okay, it's at least including them now." And it's faster. So I'm like, "Hell yeah. This is so good." But it took a bit of a turn, cuz when I was looking at the results, I was like, "Wait a second — these are just tons of results that are not right."
So, this is an example of letting it go down a path where maybe I should have checked more carefully early on. I might have sent it down the wrong path, and I spent more time trying to optimize the query versus actually getting correct results. So, that didn't work out so well. And then there was another one with our tests. I wanted to get our xUnit tests — I've had this stupid problem where I need to run the tests in parallel, but the runner treats each assembly as its own process when it runs. And I just want to start up one Docker container before the tests, run the tests, and tear down the Docker container. But there is no global setup point when you're running across these different assemblies. It doesn't work that way.
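For context, the conventional workaround here is per-assembly, not global: an xUnit collection fixture that starts one container per test assembly. This sketch uses Testcontainers and PostgreSQL as assumptions — the video doesn't say which container or libraries were actually in play — and it deliberately does not solve the cross-assembly problem described above:

```csharp
// Sketch: one Docker container per test assembly via an xUnit
// collection fixture. Every test class in the "database" collection
// shares the same container; a second assembly still gets its own.
using Testcontainers.PostgreSql;
using Xunit;

public sealed class DatabaseFixture : IAsyncLifetime
{
    public PostgreSqlContainer Container { get; } =
        new PostgreSqlBuilder().Build();

    public Task InitializeAsync() => Container.StartAsync();
    public Task DisposeAsync() => Container.DisposeAsync().AsTask();
}

[CollectionDefinition("database")]
public sealed class DatabaseCollection : ICollectionFixture<DatabaseFixture>
{
    // Marker class only: ties the collection name to the fixture.
}

[Collection("database")]
public sealed class ExampleTests
{
    private readonly DatabaseFixture _fixture;
    public ExampleTests(DatabaseFixture fixture) => _fixture = fixture;

    [Fact]
    public void CanGetConnectionString() =>
        Assert.False(string.IsNullOrEmpty(_fixture.Container.GetConnectionString()));
}
```

Because each assembly runs as its own process, the truly global "one container for everything" setup has to live outside xUnit — e.g., started by the CI script or an orchestrating runner — which is the gap Copilot couldn't bridge.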
So I tried to get it to go down this path where it would basically create a test runner, but no matter what I was doing, it was just giving me really silly results — code that didn't exist. It was such a waste of time. So, this one: complete failure. Um, it was a really open-ended thing to go solve. I said, here's the problem, go solve it, and didn't tell it how to do it. And it failed miserably. So, you know, the other examples I was telling you about that were open-ended — yeah, those also didn't work, but I didn't expect them to. And on this one, I was let down, because I was like, it should really be able to get this. And it didn't. So, there are two examples where it didn't work. Oh, and I forgot — before telling it to do that with xUnit,
I had told it to go do something else. Because when I'm making videos for Dev Leader, one of the next videos I want to do is xUnit with Microsoft Testing Platform. So I said, "I need you to go upgrade all of the xUnit packages to the version compatible with Microsoft Testing Platform, which is v3." And then I said, "Then we don't need to include the xUnit runner anymore." So I made that issue and assigned it. It did it perfectly the first time, right? So, uh, NuGet package migration — it just worked perfectly. So, all of this to say, I'm just trying to share my experiences, but I've been very, very, very impressed. Um, I think it's super exciting. I love the idea that I can just go make an issue from my phone if I have an idea, assign it to Copilot, and go check it out later.
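For anyone wanting to try that migration themselves, the end state of a test project on xUnit v3 with Microsoft Testing Platform looks roughly like this csproj — versions and target framework are illustrative, so check the current docs rather than copying blindly:

```xml
<!-- Sketch of a test project migrated to xUnit v3 + Microsoft Testing
     Platform. Note there's no xunit.runner.visualstudio package anymore,
     and the project builds as an executable. -->
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>net8.0</TargetFramework>
    <OutputType>Exe</OutputType>
    <UseMicrosoftTestingPlatformRunner>true</UseMicrosoftTestingPlatformRunner>
    <TestingPlatformDotnetTestSupport>true</TestingPlatformDotnetTestSupport>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="xunit.v3" Version="1.*" />
  </ItemGroup>
</Project>
```

The `OutputType` of `Exe` is the visible difference from v2: with Microsoft Testing Platform, each test assembly is its own runnable program, which is also why the separate runner package drops out.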
It's super cool. Um, that doesn't change my actual development experience, right? Like, when I'm working in the IDE, that's not helping me at all. But from a project maintenance perspective, super cool. I forgot another one: like I said, I don't have any documentation internally for development. So I said, okay, I want you to go audit the codebase — here are some things I want you to focus on — and I want you to go write documentation around the architectural patterns that are established. And then I said, I want you to talk about the drift in architectural patterns that you see and put that in another document, because we're going to use that as a living document to go address drifting architectural patterns.
To give you an example: some of the API endpoints throw exceptions, some of them return, like, an IResult in C#, and some of the common checks that we do are done one way in some places and a different way in others. So it audited that and wrote documentation on it. And it's like — this is pretty cool; it all makes sense. Uh, another thing I had it do: I said, in the list of drifting patterns, I want you to go address the first one — which is the exception-throwing versus IResult patterns that we see in the web APIs. I said, I want you to go fix that across the codebase.
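To make that drift concrete, the two endpoint styles look something like this — with hypothetical names, since the real endpoints aren't shown in the video:

```csharp
// Sketch of the two drifting styles in an ASP.NET Core minimal API.
using Microsoft.AspNetCore.Http;

public sealed record Post(Guid Id, string Content);
public interface IPostStore { Post? Find(Guid id); }

public static class PostEndpoints
{
    // Style 1: throw and rely on exception-handling middleware to
    // translate into an HTTP response.
    public static Post GetPostThrowing(Guid id, IPostStore store) =>
        store.Find(id) ?? throw new KeyNotFoundException($"Post {id} not found");

    // Style 2: express the failure in the return value as an IResult.
    public static IResult GetPost(Guid id, IPostStore store)
    {
        var post = store.Find(id);
        return post is null ? Results.NotFound() : Results.Ok(post);
    }
}
```

Neither style is wrong on its own; the drift is having both in one codebase, which is exactly the kind of thing an audit document can surface.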
Then I want you to remove that from the drifting patterns, and then I want you to go update the architectural document to state that this is a pattern we have in the code. And it did it. The only thing it did that I didn't like — sorry, it did this extra thing: it wrote an extra document about the migration, which was actually kind of cool in itself. It went and calculated — I don't know exactly how — that, like, 97% of the APIs were already following the expected pattern, and said it would address the other 3%. And I'm like, that's super cool, but I don't want that as a document that lives in my codebase, because once this is committed and checked in, I don't need a document that says 97% used to be following the pattern and the rest weren't.
So anyway, I just said, get rid of that document, and then everything was great. So, I had updated API endpoints that follow the pattern I want, and I had updated documentation. By the way, a lot of this was just done from my phone, hanging out with my wife — just super cool. And if it wasn't from my phone, it was in between me doing these performance review documents: when I take a break, I can go check what Copilot output, and I just go through a bunch of pull requests, reviewing code and giving it feedback. Uh, so I've just been super impressed with it. So I really have to follow up on what was so bad with the Microsoft demo. It just hasn't been my experience — it's been the complete opposite. Anyway, I don't know if that was all over the place, but it was super fun to talk about, cuz I love talking about code.
And now I have to go do a live stream. So, if you made it this far, a friendly reminder: Mondays at 7:00 p.m. Pacific on my main YouTube channel, which is Dev Leader, I do a live stream. Come check it out. There are other people from Code Commute that hang out. See you next time. Take care.
Frequently Asked Questions
These Q&A summaries are AI-generated from the video transcript and may not reflect my exact wording. Watch the video for the full context.
- How has your experience been using GitHub Copilot agents with pull requests compared to other agent tools?
- My experience with GitHub Copilot agents using pull requests has been surprisingly positive compared to other agent tools. While other agents often did stupid or unhelpful things that required more time to fix, Copilot has delivered reasonable results and performed at the level I always expected agents to perform at. It has helped me complete various tasks efficiently, from refactoring to feature development, which was a refreshing change.
- What types of tasks have you successfully assigned to GitHub Copilot agents in your projects?
- I've assigned a wide range of tasks to GitHub Copilot agents, including porting background services to Quartz jobs, refactoring code with new data transfer objects, adding front-end UI features, and writing documentation for architectural patterns. It has also helped with automating social media posts and upgrading NuGet packages. Most of these tasks were completed well, often perfectly on the first try, saving me significant time.
- Are there any limitations or failures you've encountered when using GitHub Copilot agents?
- Yes, there have been some limitations and failures. For example, Copilot struggled with fixing a complex SQL query and failed to create a proper test runner for running xUnit tests in parallel across assemblies. Some open-ended or complex problems did not work out well, and I had to spend time reviewing and correcting the results. However, these failures were mostly expected given the complexity, and overall, the positive experiences outweighed the negatives.