GitHub Copilot CLI vs Claude vs Others - Is It All In The Tool?


A few thoughts on AI tools, how I'm using them, and how I am hearing about others using them. I think this will be a fun one to reflect on months or years after it was recorded!

📄 Auto-Generated Transcript

Transcript is auto-generated and may contain errors.

Hey folks, I'm just leaving CrossFit, and I figured we'd do a little AI talk. I think it's important that we do these pretty regularly on this channel so that we're talking about how things are moving along in AI, and it's good for me too: I made it a goal this year to make sure I'm playing around with different tools, and this gives me some accountability to talk through the things I'm learning. So, with that said, I've been playing around over the past few weeks with the Copilot CLI. Depending on when you're watching this, and I actually don't know when that will be, it could be far in the future, but around this time there is a ton of hype around Claude Code, and rightfully so.

It's a great tool. But I mentioned this to someone the other day: I'm seeing a lot of people who are super hyped about Claude Code, but the examples of things they're talking about make me go, man, that's not a Claude-specific thing. I don't know what the right word is; it's a little frustrating, a little confusing, when I see people go, "Oh my god, Claude is the best thing," and then they give me a scenario and I'm like, "But Claude could do that a year ago, and Copilot could do that a year ago, and Cursor could do that a year ago." I think it's really cool that people are finding a tool or a workflow, having an experience, and having an aha moment.

Like, don't get me wrong, I think that part is awesome. Love to see it. And that's part of the frustration: I'm excited for people to have that, but they're talking about it in a way I don't appreciate, because it feels misleading. A lot of the time it's framed as "it's because of Claude" or "it's because of this specific tool," and I'm like, it's really not. I could be very wrong about this, but I feel like some people are having these experiences because a tool happened to enable them to do or discover something in a certain way. (That was the dumbest pull-out of a car I've ever seen. You almost got t-boned. You couldn't see that.) So they're attributing the success of what they're doing specifically to the tool.

And I think that's a dangerous path, because the things I'm seeing get a ton of "oh my god" around them are, most of the time, not tool-specific. Don't get me wrong, and I have to be clear about this because I'm obviously not giving you tons of real examples here: there are absolutely things that I think Claude can do tremendously well. There are some really cool experiences and workflows where people are orchestrating multiple agents on complex tasks and getting real efficiency out of it. That kind of stuff, I'm like, hell yeah, super neat. Maybe the way Claude is set up around agent orchestration is ahead. Or, and maybe I'm just making this up, maybe how skills are done in Cursor versus Copilot means one of the tools is ahead and has some breakthrough functionality.

Yes, I think we will continue to see these tools each take a step: one does something and it's "oh my god, look at this," and then the next one does another thing. We'll see that for a while still. But I'm seeing a lot of people talk about these kind of generic scenarios and attribute them specifically to the tool, and that part caught me a little off guard, and I'm not sure how to feel about it. Like I said, it's almost frustration, but I'm happy people are having these aha moments. To give you a very specific, minor example: someone the other day was talking about a special feature that's supposedly exclusive to Claude Code. They were talking about context management, and they said Claude does this and other tools don't, and that, again, it's a minor thing, but it makes Claude stand out.

They said if you type /context, you get this special readout that only Claude Code does, with a breakdown of your context and usage and so on. And immediately, and I realize, yes, I work at Microsoft, so you might say, "Oh, you're going to be biased toward Copilot," but I use all of these tools, I went to the Copilot CLI, typed /context, and it has an identical readout: same format, same information. So I think there's a danger here, at this point in time, with how fast these tools are moving.

I think there's a danger in trying to declare that this one tool rules them all, because it's all changing so fast that if you start to have that mindset, you're going to miss out on some of the other things going on. That's one of the primary reasons I'm very motivated to try different things out. So the Copilot CLI has been a thing I've been trying out. One of the reasons why is that, if you haven't watched my other videos or seen my other content online, I am a C# developer. I live in Visual Studio. I've been using an IDE like Visual Studio for years and years and years. I am not a terminal-based developer; I don't like being at a terminal or command prompt. Just never have. It doesn't jive with how I like to build stuff.

Now, that made Claude a good thing to try out last year, and I still do use Claude, because it forced me to try something different. That's another reason I've been trying out the Copilot CLI: hey, here's another tool. And admittedly, I've been really enjoying using the Copilot CLI. I don't know exactly why; I've been trying to think about it over the past few weeks. What I've arrived at is that I think some of the tool calls are different. Structurally, between Copilot in Visual Studio or VS Code and the Copilot CLI, I don't know if the orchestration is different. In my mind I'm like, doesn't it make sense to make it the same? But I'm imagining there are differences, and that means I'm imagining the tool calls are probably different.

What I imagine, again, is that in Visual Studio everything is built around a solution. I'm not even talking about VS Code: in Visual Studio, you open a solution, and a solution has projects. I'm making assumptions here, but I feel like a lot of the context is probably managed around your solution contents. When we're using these tools on the command line, I feel like we're thinking about it more from a file-system perspective. So I don't know why, but I would imagine that an IDE integration would do a better job with context management, because it has richer concepts than just raw text on disk, right?

Can't it infer smarter things, because it can do more creative, higher-level querying? "Thinking" is maybe a bad word for it. But seemingly, in the terminal, I'm just getting better results with context management, and I think that's probably one of the reasons I'm having a better experience that way. The other thing I would add is the loop. We're seeing things like Ralph loops, but I mean the loop where the agent tries to do work and goes, "Oh, that's not right. Let me adjust. Let me adjust." That loop seems better to me in the Copilot CLI than in Copilot in Visual Studio. And I don't have data to back this; I'm one person, right? This is just my experience.

I'm wondering if part of that is because the Copilot CLI is shipping faster, so it genuinely might be much more improved compared to the last time I updated Visual Studio. I don't know, but that's one thing I speculate about. Anyway, I've been having a lot of fun with it. I've mentioned in the past few videos that I've been working on my dependency injection library. Well, it's a scanning library: I didn't reinvent dependency injection, but I'm scanning for plugins and types and automatically registering them using source generators, which is a new thing for me. So it's been really exciting to learn and try these things out. And then the last few days I spent trying to build out a couple more features that I think are interesting, and those came from Copilot: I asked it to compare what we have going on in Needler to some other languages, right?
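To make that concrete, the general idea I'm describing, mark types with an attribute, scan for them, and register them, looks roughly like this. To be clear, this is a made-up sketch, not Needler's actual API: the attribute and interface names are invented, and a real source generator would emit the registrations as code at compile time instead of reflecting at runtime.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Made-up marker attribute and plugin interface; not Needler's real API.
[AttributeUsage(AttributeTargets.Class)]
public sealed class AutoRegisterAttribute : Attribute { }

public interface IPlugin { string Name { get; } }

[AutoRegister]
public sealed class HelloPlugin : IPlugin { public string Name => "hello"; }

public static class Demo
{
    // Reflection-based stand-in for the scan. A source generator would find
    // these types at compile time and emit the registrations as real code,
    // so there would be no runtime reflection cost.
    public static List<string> ScanPluginNames() =>
        typeof(Demo).Assembly.GetTypes()
            .Where(t => t.GetCustomAttribute<AutoRegisterAttribute>() != null
                        && typeof(IPlugin).IsAssignableFrom(t))
            .Select(t => t.Name)
            .ToList();
}
```

Here the scan finds `HelloPlugin` because it's marked with the attribute and implements the interface; anything that fails either check is skipped.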

There's dependency injection in other languages; what are they doing? Is there something unique there that we could bring to Needler and do with source generators? So it's been making some proposals around that, and that's been kind of cool. I built a couple of new features with Copilot, and it's doing all of the work; I'm guiding it. Then I had it build a GitHub Pages site so that I have better docu... well, is it better documentation? I don't know; I guess people will be the judge of that. I feel like it's better-organized documentation. And I did benchmarks: once a week (we'll see if this actually works), if the benchmarks haven't been run on the latest code, it will go run the benchmarks and publish them, and then there's a page where you can go look at the benchmarks.
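For anyone curious what that kind of scheduled benchmark publishing can look like, here's a rough GitHub Actions sketch. The project path and publish script are assumptions for illustration, not the repo's actual workflow, and the "skip if the latest code was already benchmarked" check would need an extra conditional step on top of this.

```yaml
# Rough sketch of a weekly benchmark workflow; paths and the publish
# script are hypothetical, not the repository's real setup.
name: weekly-benchmarks

on:
  schedule:
    - cron: "0 6 * * 1"   # every Monday at 06:00 UTC
  workflow_dispatch:       # allow kicking it off manually too

jobs:
  benchmarks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run benchmarks
        run: dotnet run -c Release --project benchmarks/Needler.Benchmarks

      - name: Publish results to the benchmarks page
        run: ./scripts/publish-benchmarks.sh   # hypothetical publish step
```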

And so I built that with Copilot, right? It did all of the work; I just guided it through what I wanted. It's been a really good experience. Now, one of the conversations I had online recently: I asked my audience, when you're getting AI-generated code, how much are you scrutinizing it compared to, say, a peer? I just started this survey, so it runs for a week and I'll see. LinkedIn usually has the best volume for the surveys I run; you usually get somewhere between 500 and a thousand people responding. So we'll see what people have to say.

But one of the comments that came in aligns with my perspective, which is: it depends. Typical software engineering answer. In some codebases, when I'm using AI, I almost expect it to mess up a little bit more, so I have to be more clear; I have to watch it closer; I have to build more guardrails. For example, Brand Ghost: I feel that way there. If I ask it to do something, it might have good ideas, and it might get me 50 to 70% of the way there on some more complex things, but I should expect that I'm doing work, because the level of investment in prompting it or putting it back on track hits diminishing returns.

At some point it's: let me just start driving from here. And it's the opposite in Needler, a very different codebase (I use Needler in Brand Ghost). Just to give you an example, I've had situations where I'm building features with Needler, I've talked about this before, and it's writing tests, I'm reviewing things, I'm going, "This looks good." We get to a point, we start doing another feature or trying something else out, and then I realize: wait a second, the way it built the last feature, or the feature two features ago, is actually just wrong. And I'm like, but it had tests. Then I go back and look at the tests and I'm like, man, I should have looked closer, because it fooled me.

So instead of unwinding all of it and going, "Okay, screw AI, I'm going to build it from scratch," I just keep pushing forward with it, which is not normally how I would develop things. Usually I would go, "This is wrong. Let's revert that stuff; let's pull it out because it's wrong," and then go back to the drawing board. With Needler, I'm leaning the other way: "Hey, that was wrong. Now that you see the patterns you have, and I'm telling you what's wrong and redirecting you, will you go do the right thing with more guidance?" And then I'll tell it things like: you have a bunch of tests already, and you can't touch those tests; those tests still need to pass, because they were right. The logic in the tests was right, but you cheated in the test setup, so change your test setup, and all your assertions need to remain the same.

An example of this: I had 19 test files that were running some source-generation code. Each of those 19 test files, because it's source-generation code, has to write its own C# code into a string that it will then run a generator on. (Let's go, folks. Come on.) And because it's in a string, there are no checks being enforced; you don't have type safety on code written into a string, which doesn't make sense. So across 19 files there was a whole bunch of duplicate code setting all of this up: well, in Needler we have these attributes, so let me define the attribute in this string that is code. And it was duplicated. I basically had it refactor all of this stuff so that it's in one spot.

And even then, I'm not in love with it, because it's still writing code into a string. That kind of needs to happen for a source generator, but I want it to use the real definitions of things. To give you an example: we use some attributes to mark types, and then it will source-generate based on those. If I change the name of the attribute, say I do a symbol rename, the rename would update all of the code, but not the code inside of strings. So you might say, "Well, Nick, don't be stupid. Do a search and replace on the actual thing." And I get it; that makes sense. But the point is that it's brittle. So now that it's all in one spot, I'm curious: can I say, be more intelligent about this? I actually want you to look at the source files and pull the real source into this string.
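Both halves of that idea can be sketched. To be clear, this is a made-up illustration; the class name, the constant, and the file path are all hypothetical, not Needler's actual test code.

```csharp
using System;
using System.IO;

// Hypothetical helper, not Needler's real test code. Every generator test
// needs the same attribute definitions as input source text, so keep one
// copy instead of duplicating it across 19 test files.
public static class TestSources
{
    // Step 1: a single shared constant. Still code-in-a-string, but at
    // least there's only one string to fix when something changes.
    public const string AttributeDefinitions = """
        using System;

        [AttributeUsage(AttributeTargets.Class)]
        public sealed class AutoRegisterAttribute : Attribute { }
        """;

    // Step 2: pull the real source file into the string, so a symbol
    // rename in production code can't silently drift away from the test
    // inputs. The path here is an assumption.
    public static string LoadRealAttributeSource(string repoRoot) =>
        File.ReadAllText(Path.Combine(
            repoRoot, "src", "Needler", "AutoRegisterAttribute.cs"));
}
```

Step 1 is what the refactor did; step 2 is the "be more intelligent about this" direction, where the string is fed from the real definitions on disk.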

Like, we'll be more creative about it. Because I have seen these patterns creep in where it's like: I'm inventing a feature, so I'm going to define this new type inside of this string, run a source generator on it, and it works. And it's like, yeah, but that code doesn't exist in real life. So when that happens, I've been pushing through with the LLMs and just having them build on top of it, right? Here's the issue you created, here's why it's an issue, go address it. And back to the level-of-scrutiny thing: it's making me realize there are different things I need to scrutinize. Like, yes, test coverage, good job, you tested that. But you only wrote source generator tests. I need to see that feature working for real, right? I need to see real tests, not just that the source generator output the right code.

And then sometimes I'm reflecting on a test and I'm like: your test name and scenario are really good, but what you're asserting is not sufficient. It's like, "make sure that this feature works," and the assertion is, well, at least the list isn't empty. And I'm like, dude, yeah, the list shouldn't be empty, but it should also contain x, y, and z, and this other thing needs to be configured, and you just checked that the list isn't empty. The setup was good, the test name is good, the scenario is good, and your assertions are garbage. So as I'm moving very quickly building with it, I'm starting to realize there are certain things I need to be scrutinizing a little bit more, and that brings me back to this other pattern in Brand Ghost, where I said, okay, I need these analyzers to make sure it's doing the right thing.
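To make the weak-versus-strong assertion point concrete, here's an illustrative sketch. The `Registration` type and the values are made up, not from any real codebase:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up model of what a scanning DI library might report after it
// registers everything it found.
public sealed record Registration(string Service, string Implementation, string Lifetime);

public static class AssertionDemo
{
    public static readonly List<Registration> Registrations = new()
    {
        new("IPlugin", "HelloPlugin", "Singleton"),
    };

    public static void RunChecks()
    {
        // Weak assertion: passes even if entirely the wrong thing was registered.
        if (Registrations.Count == 0)
            throw new Exception("expected at least one registration");

        // Stronger assertions: pin down what the feature actually promises.
        var plugin = Registrations.Single(r => r.Service == "IPlugin");
        if (plugin.Implementation != "HelloPlugin" || plugin.Lifetime != "Singleton")
            throw new Exception("IPlugin should be registered as a HelloPlugin singleton");
    }
}
```

The weak check alone would still pass if, say, the wrong implementation or lifetime were registered; the stronger checks are what actually pin the feature down.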

I think there's stuff I built specifically into Brand Ghost for analyzers that I'll probably turn into something more common: "when Nick is building software, run these analyzers," kind of like common prompts and things like that. I'm sure I'll build out a better library for myself. I'm doing this with skills, which is the last thing I wanted to talk about. So, in Brand Ghost, we're doing more with other types of content creation, like blogs. For those of you who don't know my backstory, my main channel and social media brand is called Dev Leader, and Code Commute is the spin-off where I vlog about similar topics. (But this guy is not going fast enough, man. This is insane.)

Dev Leader started in 2013, and I started it as a blog. I was writing about transitioning from a software engineer role into a manager role, kind of documenting what I was learning and reading about, blah blah blah. It started off as a blog, and then I gave up on it. When I revived it a decade later, I revived it with YouTube, but I was also writing technical blogs. In 2024, I stopped writing blogs, because the amount of time that was going into blogging versus making a YouTube video didn't make sense anymore: too much time invested, and it wasn't getting the same amount of visibility. So I stopped. Now that I use Brand Ghost, it's at a spot.

(This car is driving me nuts.) It's at a spot where a lot of my social media is on autopilot. I can create content and purely focus on that; everything else about my social media is on autopilot. I do still, as a human, obviously engage in comments and things like that, but right now that's my big gap: I do not make enough time to go respond to comments and grow on social media properly, and I acknowledge that. I have a job; I simply don't have the time to do that the way I'd like. But for Brand Ghost, we've been trying to write more blogs. We need to make sure we're showing up in search results, so we need to think about SEO.

So we're doing a lot more inbound marketing this way. And as we're working on that, I realized: hey, wait, LLMs have advanced a tremendous amount since I was doing technical article writing. Can I revisit this and do some work around it? Okay, if I need help writing articles, what are some things I normally do? Well, with most of my technical articles, especially because I make YouTube videos, I was always trying to link a technical article with a video. It used to be that I would write the blog article and then make the video, because the article gave me almost a script or a format. It's the exact opposite way now: make the video first, because it's way more natural; I can talk naturally, whatever.

And then I'd love an article on that. So what would I normally do? Well, I'd want to go to the video, get a transcript, run that through AI, and get some ideas, right? So that's one thing: getting transcripts. There's another thing: I am not a thumbnail expert. I have tried to make my own thumbnails and get better at it, but it's just another thing to go spend more time scaling up on. And honestly, AI image generation is so much further ahead than when we started with the initial DALL·E models. So, okay, I need to generate blog images; that's another sort of task I repeat. And I realized that there's a bunch of these smaller individual tasks. Like, I need to take the image and upload it to blob storage for my blog, right?

I need to be able to do an analysis of my article to make sure it's SEO-optimized. I need to make sure I'm not missing anything in the structure; I need a structure analysis. These are all things that I have to constantly repeat, right? And if I'm asking an LLM for help, I'm going to ask it to go do those things too. So these are all examples of things that are not coding-related specifically, but they are skills that I vibe-coded. I worked with Copilot and vibe-coded a lot of these skills. Then I noticed that once I was stringing these all together, Copilot on the CLI seemed like it was being super inefficient with things.

It would get into these funny loops where it's trying to optimize for SEO and going, "Oh, you need keywords here." Then it would run the structure analysis and go, "Oh, but now the structure is broken." Then it would try to fix the structure: "Yeah, but now the SEO is not scoring well." And I'm like, okay, this seems like it's just being stupid. So then I'm like, let me try Claude. And then I realized: oh crap, the way some of the skills are written, because they were vibe-coded by Copilot, is very specific to it. So I worked with Claude and said, "Run these skills, do this work, and we're going to supervise; every time you need to fix something, ask for help, and we'll go patch these skills up." So now I have two different agent orchestration systems to navigate.
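For context on what these skill files even look like: the format differs between tools, which is part of why skills written for one agent needed patching for another. The general shape in Claude Code is a folder containing a SKILL.md with YAML frontmatter plus instructions. Here's a rough sketch; the skill name and the steps are made up for illustration:

```markdown
---
name: seo-check
description: Analyze a draft blog article for SEO issues and report fixes.
---

# SEO check

1. Read the article file the user points you at.
2. Check the title length, heading structure, and keyword usage.
3. Report issues as a checklist. Do not rewrite the article yourself;
   suggest edits and wait for approval.
```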

And then, of course, because I decided not to pay for Claude Max anymore, I run out of tokens almost instantly. Ridiculous. So I said, okay, what's another agent orchestration system? I have Cursor. So we jumped over to Cursor and I gave it the same task: you're in this repository; look at this article; I need you to SEO-optimize it; I need you to do structure analysis. And honestly, out of all three of these tools, the Copilot CLI, Claude, and Cursor, and I rarely use Cursor (it's not never), Cursor did way better. The model itself goes so fast for writing things, it's insane. I don't even know what model it used; it was on auto, but whatever it picked was super fast. And it didn't get stuck in loops.

It didn't need help with the tools. I was very, very impressed. Now, does that mean I'm like, well, screw the Copilot CLI, screw Claude, Cursor is now the best? No. It's just a reminder that I need to keep trying these other tools out, right? Usually, with a model taking longer to think and reason and go write code and design things, pardon my language, I don't give a damn, take your time, go build things; I much prefer the deeper reasoning. But for the stuff I was doing in Cursor, that feeling of watching it just move through things quicker, I'm like: that is the right experience I want in that moment. Absolutely. Now, after it's done, I'm reading through some of the output and going, "Oh, that's actually inaccurate."

And I'm going: okay, do I need a skill to give the LLM so it validates things? Like, if it's trying to give a stat, I'm like, you need to go cite that. If you can't cite it with a data source, take it out; it doesn't make sense; you're making it up. So that is a skill that I needed and did write. And if it's giving a code example, right, for SEO I need some structure to the article, and it goes, "Oh, this would be a good spot for a code example," and drops one in. If that code example is just wrong, it might not actually know that. Could I build a skill for that? Maybe: if you're going to put code in here, make sure it compiles. But I'm like, oh, that's going to be a can of worms.

So there are some things I'm not going to over-invest in, and I will, as a human, make sure I'm reading my own stuff before I post it. But all of that to say: the more recent things I've been doing have me shifting again between Copilot, Claude, and Cursor. I do want to try Codex; I still have not tried Codex. And I'm working on reusable skills. I have something I want to try at work, a little experiment, so that I can contribute back to some AI stuff, and based on how that goes, I'll see if I can report back here with some general learnings. But this stuff is fun. So anyway, that's my AI update for Monday. See you in the next video. Take care.

Frequently Asked Questions

These Q&A summaries are AI-generated from the video transcript and may not reflect my exact wording. Watch the video for the full context.

How do I manage context differently when using Copilot CLI compared to Visual Studio?
I imagine that in Visual Studio, context is managed around a solution and its projects, which provides richer concepts than just raw text on disk, so I'd expect an IDE integration to be able to infer smarter things. In the terminal with Copilot CLI, context management feels more file-system-based. Despite that expectation, I'm getting better context-management results in the CLI, and I don't have a confirmed explanation for why.
What is my approach to scrutinizing AI-generated code in different projects?
I treat AI-generated code differently depending on the codebase. For example, in Brand Ghost, I expect the AI to mess up more, so I watch it closely and build more guard rails. In Needler, I lean into the AI's suggestions more, guiding it to fix issues while preserving existing tests. I realize I need to scrutinize different aspects like test coverage and real feature functionality to ensure quality.
How do I use multiple AI tools like Copilot CLI, Claude, and Cursor for content creation and coding?
I experiment with different tools because each has strengths. For example, I use Copilot CLI for coding in C# and source generators, Claude for agent orchestration, and Cursor for fast content creation without getting stuck in loops. I don’t rely on one tool exclusively; instead, I keep trying new tools to find the best fit for specific tasks and workflows.