The Token Measurement Mistake Hurting Your Productivity

From the comments this time, this viewer wanted some perspectives on how to navigate token usage and measuring productivity.

📄 Auto-Generated Transcript

Transcript is auto-generated and may contain errors.

Hey folks, we're going to the comments today. This one's from Epic Technav. Thanks for another question. I'm going to talk about token usage. In particular, I think the question is really about when you're being measured in terms of token usage tied to your productivity or effectiveness, that kind of thing. And I've got to be careful, because as I say this, the current state of this question is that if you are spending more tokens, you are doing a better job. I wanted to highlight that because, depending on when you watch this video, that may or may not be a good assumption. So what do I mean by that? Well, it seems like for the past while, and I don't know how long, but for the past while, right? Of course, everyone at every company: AI, AI, AI. You know, we're adopting AI at our company.

Therefore, we need to see that we're getting the productivity boost that we expected. Like, how do you measure that? Well, we don't know. Unfortunately, it seems like measuring productivity is not a very agreed-upon thing in software engineering. So we don't know, but we know that we've got to see people adopting it, right? So what better way to measure your productivity than by this raw metric that just tells us how much you're sending to an LLM, so the tokens, right? The idea being that there is a correlation between how many tokens you're sending and your productivity. And I'm stating this as an assumption, by the way. I'm not saying this is my belief.

But if the assumption is that by using AI in general your productivity should go up, then the idea is that the more tokens you spend, the more you're using AI, and therefore the more your productivity should be going up, which I think is kind of ass backwards. Don't get me wrong. I think if you were going to say there is zero token usage at all, then that should imply that there is no AI usage. And then you might say, well, then there's no opportunity to even get that performance increase. But I feel like this is a typical human thing where you get what you measure, right? So if you're measuring token usage, you're going to get token usage. That's the thing that you're optimizing. And when I say optimizing, I don't mean from an efficiency perspective.

I guess it's quite the opposite. You're maximizing that metric. I think it's the wrong metric, because if you just wanted to maximize token usage, can't you just put something in a loop and make it send messages to an LLM? Fill the context, blast it away. Fill the context, blast it away, do it on repeat. There you go. Productivity, right? Obviously, when I exaggerate like that, I hope it seems pretty apparent that's not the right thing to do, but I get it, right? There are a lot of people at places where it's token usage, right? Like, if you're not blasting through tokens, then you're not really using AI. I think this is going to flip back and forth like crazy. At the time I'm recording this, I think a lot of different model providers have now changed their token billing.

People are freaking out saying, "Hey, I have a subscription or whatever, and I'm an hour into my monthly subscription and I've already blown through my budget, which is crazy." But I think we're going to see this bounce back and forth, where this isn't going to work well. So we might get different pricing models again, and then it'll flip back. But I think ultimately the prices of tokens will come down in the long run. And these are just predictions, by the way. I have literally no crystal ball or scientific evidence to prove it. But my suspicion is that long-term token prices come down. I think we'll see combinations of things like we do in computing over many years, where we're going to see models get pushed to the edge, right?

So it's like bring your own model. Everyone's going to have pretty powerful models, especially compared to what we have today, at home, right? On your laptop, your workstation, your phone. And then that'll get pushed back to data centers again. I think we're going to go through these different transitions like we've seen for a lot of different things in computing. And I think a lot of that comes down to how we are optimizing things and where there's opportunity to do so. So anyway, the point is that right now that might not be the case, and it might be that you're working somewhere where your token usage is what dictates your effectiveness, your success, how productive you are. And if that's the case, what do you do about that?

Because Epic Technav's question was, if you're using something like Copilot, and there's obviously limitations to all these tools. I don't know if he was specifically asking about Copilot having limitations, because then I would say, well, compared to what? But if you have to step in and you're coding stuff yourself, isn't that not using tokens now? And so, yeah, I think ultimately, if the goal is just to put the token usage number up higher, which sounds like what you're being asked for, I mean, you cheat. Have Copilot sessions doing whatever you want all day, right? That's going to make the token usage go up. But I think it's the wrong metric and I think it's dumb. And I'm not saying Epic Technav suggested this. I'm just saying, hopefully it seems obvious that's a dumb thing to have to do.

So if we take a step back, I think that anyone who is telling you in this moment, "Oh, we want to measure how productive you are based on you using a lot of tokens," I anticipate they're going to change their story very soon to, "Please stop spending money on tokens," right? It just won't be scalable that way. Now, I want to talk about this with a bit of balance, because I feel like if I'm bouncing around, it's just two extremes and then we can't get anything done. So I think ultimately, when we're looking at this kind of stuff as software engineers, we need to think about AI as a tool. Where can I use the tool effectively? How do I get the most out of it for me?

And so that's going to look different for everyone, but I think the more time you spend doing these different things and practicing with AI, trying it out, trying new things with it, the more you'll learn and understand for yourself where it's valuable, right? So, I've been seeing people talk about, you know, "I need a full spec, we're going to use spec-driven development," which, by the way, I think is a great concept. I'm about to bash it with a contrived example, but I'm not against it. So I could use spec-driven development, and that's because I need to build a new app and I'm going to have Copilot go crank out this huge spec for me. And look, in no time at all, it wrote out 50 pages of this spec.

And that's way more than I could ever do. Look how much it did. I didn't even have to edit it. And so for that person, it's very much, "Look how amazing it is at replacing this part of my engineering life cycle." Now, I think there are a lot of traps with this kind of thing, right? So do you actually need a 50-page spec for what you're doing? You're saying you don't have to edit it, but is that just because it looks good? Is it overengineered? Is it underengineered? Did it actually do it perfectly? What does perfect even mean, right? I think it's a really interesting thing that a lot of people are talking about spec-driven development, and it's like, do you actually know what software you need to build?

I actually have not seen many times where the requirements for the system you have to build are so clear that you can just crank out a 50-page spec and assume it's going to be the perfect software, right? You're going to need to iterate. You probably don't know how the features all need to work perfectly up front. And so I think there's a balance, right? So anyone who's cranking out specs saying, "Look, it's perfect. I didn't have to edit it or tweak it or whatever," I'm like, I don't know, maybe that's true. Maybe you're being lazy. Maybe the reality is you actually don't know what needs to be built yet and you're making assumptions. So "looks good to you" doesn't mean the software is going to be perfect.

You'll know how good it is once there's some software and there's usability and you're seeing if the use case is being solved. But you're always going to need to iterate. So I think some people are getting a lot out of it in terms of the upfront spec writing. I think for some people there is a ton of value in debugging. I think this was in a video I had to scrap because the audio got screwed up, but I was talking about using Copilot with MCP servers for Grafana and it helping me debug things way more effectively than I would have before. These guys gonna let me in? Yeah, obviously there's scaffolding things too. So I was talking with some people even last night about people kind of complaining that, "Oh, AI is only getting me 80% of the way there.

It's never going to get me 100%." And it's like, yeah, maybe it gets you very effectively through the first 80% of what you have to do. And while the remaining 20% is still a lot of work, I mean, that's still 80%. That's still pretty incredible. But I think it's about finding the different ways to use it. And when you are hitting limitations, the question that I would start asking is, are these novel things? Right. So for me at least, when I approach software development, and I think this would apply to other things in life, but more obviously to software development, if I'm doing things and there are patterns that come up, that's often something that stands out to me. Not because I think I'm special. It either bothers me or whatever.

I notice it, but once I start seeing patterns, I don't know why, but I automatically jump to, well, if there's a pattern, then there must be something I can do to reduce the pattern. That's just something about how my brain works; at least, that's how I think about things. So if I'm working with AI and there's something where it's having a hard time or can't seem to solve a particular type of problem, a lot of the time for me it's like, is it novel? Have I seen this kind of thing before? Is it a one-off, or is this something that I've been seeing repeatedly?

And I think if the pattern keeps showing up, then for me there's an opportunity to optimize, right? So for example, I'll give you one that's not just solving it through building more automation. We were talking about the sweet spot for filling a context, and people were suggesting to me that even on million-token context windows, "After 30% it goes into stupid mode for me." And I would say in my recent experience, I'm having a lot of success even up to around 70% and not noticing consistent issues. But before, for sure, with smaller context windows, it was like, "Oh man, this is terrible." So if that's a repeated pattern, okay, well, what's the optimization for it? Maybe you're going to compact the conversation.

Maybe you're going to start a new session, right? So save out the current state, update some markdown files to track things, whatever it needs to be, and then start a new session, right? So that's a non-automated way to optimize this repeated thing. There are other cases where I'm like, okay, I'm seeing AI constantly run into these types of problems, right? The example I'll give is writing tests. I have a particular project where I do like to write my code to be unit testable, and that means that I can separate things out with mocked interfaces if I need to. I write the code that way, but I try to write my tests so that everything is being resolved the same way that I construct the dependency container for my production application.
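As a rough illustration of the context handling described above (compact the conversation, or save state to markdown and start a new session), here is a minimal Python sketch. The function names, the `SessionState` fields, and the 0.5 and 0.7 thresholds are all assumptions made for the example; the transcript only mentions 30% and 70% as anecdotal experiences, not as recommended settings.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SessionState:
    """Hypothetical snapshot of what a fresh session needs to pick up the work."""
    goal: str
    done: list[str]
    next_steps: list[str]


def context_action(used_tokens: int, window_tokens: int,
                   compact_at: float = 0.5, restart_at: float = 0.7) -> str:
    """Return 'continue', 'compact', or 'new_session' from how full the context is.

    The default thresholds are placeholders to tune from your own experience.
    """
    fill = used_tokens / window_tokens
    if fill >= restart_at:
        return "new_session"
    if fill >= compact_at:
        return "compact"
    return "continue"


def save_state(state: SessionState, path: Path) -> None:
    """Write handoff notes as markdown so a new session can read them."""
    lines = ["# Goal", state.goal, "", "## Done"]
    lines += [f"- {item}" for item in state.done]
    lines += ["", "## Next"]
    lines += [f"- {item}" for item in state.next_steps]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

The point of the sketch is only that the decision is cheap to make explicit: once the same "context is too full" pattern keeps recurring, a fixed rule plus a handoff file replaces guessing each time.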

I resolve my dependencies that way in my tests, and that way I'm getting the most real configuration that I possibly can, and then I can selectively override things that I might need to mock, like a third-party API call. Okay, so the problem is that I routinely keep seeing Copilot write tests that violate my testing standards, right? And so I have tried many times to update the prompts or whatever else. Then I moved over to not only having that in my AGENTS.md file but doing glob-based instructions, so if you are touching a test file, this is what I expect. All the way to, okay, the pattern still happens. Maybe it's reducing, but it still happens. I need a Roslyn analyzer. And if you're not a .NET developer, that's just like a linter on steroids, where you can say that when you compile, it's going to run some static analysis, and then it can basically look for patterns and then fail the compilation.
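The testing standard described above (resolve everything from the same container the production app uses, and override only what must be mocked) can be sketched in a few lines. This is a Python stand-in rather than the .NET setup from the video, and every name here (`Container`, `OrderService`, `RealPaymentApi`, `FakePaymentApi`, `build_test_container`) is hypothetical.

```python
from typing import Any, Callable, Dict, Optional


class Container:
    """Tiny stand-in for a dependency container (not a real library)."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[["Container"], Any]] = {}

    def register(self, name: str, factory: Callable[["Container"], Any]) -> None:
        # Registering a name again replaces the earlier factory.
        self._factories[name] = factory

    def resolve(self, name: str) -> Any:
        return self._factories[name](self)


class RealPaymentApi:
    def charge(self, cents: int) -> str:
        raise RuntimeError("would call a third-party API")


class FakePaymentApi:
    def charge(self, cents: int) -> str:
        return f"fake-charge:{cents}"


class OrderService:
    def __init__(self, payments: Any) -> None:
        self._payments = payments

    def checkout(self, cents: int) -> str:
        return self._payments.charge(cents)


def build_production_container() -> Container:
    """The single place where the application wires its dependencies."""
    container = Container()
    container.register("payments", lambda c: RealPaymentApi())
    container.register("orders", lambda c: OrderService(c.resolve("payments")))
    return container


def build_test_container(
    overrides: Optional[Dict[str, Callable[[Container], Any]]] = None,
) -> Container:
    """Start from production wiring, then swap only the named registrations."""
    container = build_production_container()
    for name, factory in (overrides or {}).items():
        container.register(name, factory)
    return container


def test_checkout_uses_production_wiring() -> None:
    container = build_test_container({"payments": lambda c: FakePaymentApi()})
    orders = container.resolve("orders")  # resolved, not constructed by hand
    assert orders.checkout(500) == "fake-charge:500"
```

The test never calls `OrderService(...)` directly, so any change to how production wires `orders` is exercised by the test automatically; only the third-party call is replaced.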

Right? So if I see that the system under test is getting newly instantiated and it's not getting resolved from the dependency container, I could fail the build entirely, and then the AI literally needs to work around it. So I guess what I'm saying in a general sense is, for me at least, I am looking for patterns. So if it's novel, cool, I spend my time on it. And once I see the pattern come up and it's no longer a novel thing, it's like, this keeps happening. Then I'm going, how do I optimize for this? No good parking spots. But yeah, that's my thought process. If you want more specific things, because I realize I kind of bounced around there, I've got a couple of minutes before I go into CrossFit. So, I think balancing how much spec writing you're doing up front.

So for me, it's figuring out whether I'm doing a small bug fix or a small feature, versus a huge feature that has a really big surface area, versus a refactor, versus architecture. I have varying degrees of how much spec I write. I've been talking about this. I haven't proven that I have a working example of it, but I'm very interested in trying to create really big detailed specs and then piecing out the work to agents and having it orchestrated and seeing if it can do it end to end. That's a very interesting, fascinating thing for me. In practice, I've never seen that work effectively. I'm not saying it's impossible. I want it to be possible and I'm trying it, but more as an experiment. That means that in practice I am doing much smaller specs. A lot of it is just more interactive planning mode.

So I spend a lot more time on that, and it's less about crafting the perfect document. I don't know, it just feels like the focus is on the wrong thing for me. It's a lot more about the conversation and making sure that I'm aligned and having some of those things documented at a high level. But some people are going to the point where they have the spec and they're carving out what parts are going to be done by which agents, with certain models, again for cost optimization. So I think there's that whole space. When I am running into limitations of things like Copilot or Claude or whatever it is, I'm trying to challenge myself. So if I'm going to go do it myself, which is normal, right? There are going to be things where I'm like, okay, I need to do it myself.

I'm trying to ask myself, once I do it, how would AI do this? Did I need to step in because I had to go log into some system that has some data? One good example is that I still run database queries on my production data, and I'm like, you know what, I should probably get an MCP server with read-only access and just let Copilot do it, because that's hitting limitations where it's trying to debug things or design things and it's like, "I don't have good examples of data." Okay, I step in. I do it. That's a limitation. But is it? It's a current limitation. It's only a limit because I don't have an MCP server for it. I don't know, there are probably many more examples.
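The read-only access idea can be sketched without any MCP tooling at all. The example below is not an MCP server; it only shows the kind of query function such a tool might wrap, using SQLite from the Python standard library as a stand-in for a production database. The function name `run_readonly_query` is hypothetical. The read-only guarantee comes from SQLite's `mode=ro` URI parameter, so writes are rejected by the database itself rather than by inspecting the SQL text.

```python
import sqlite3
from pathlib import Path


def run_readonly_query(db_path: Path, sql: str, params: tuple = ()) -> list[tuple]:
    """Run a query on a connection opened read-only; any write raises an error."""
    # as_uri() yields a file: URI; mode=ro asks SQLite to open the file read-only.
    uri = f"{db_path.resolve().as_uri()}?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return conn.execute(sql, params).fetchall()
    finally:
        conn.close()
```

With a real production database the equivalent would typically be a dedicated account that only has read permissions; the design choice is the same either way, which is to enforce the limit at the database layer so the agent cannot talk its way past it.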

So, I apologize for kind of rushing here at the end, but for other folks or Epic Technav or anyone else, if you have specific examples where you're like, "I do this kind of thing and it's a limitation," I would love to talk more about that. So, thanks for watching. See you in the next one.

Frequently Asked Questions

These Q&A summaries are AI-generated from the video transcript and may not reflect my exact wording. Watch the video for the full context.

How should you think about token usage as a measure of productivity?
I think using token usage as a measure of productivity is the wrong metric to optimize. I point out that if you measure token usage you’re going to optimize for token usage, not for actual productivity. I also note that zero token usage would imply no AI usage, which misses opportunities for performance gains.
What is your general approach to using AI as a productivity tool rather than chasing tokens?
I view AI as a tool and try to figure out where I can use it effectively to get the most out of it. The approach will differ for everyone, and the more time you spend practicing with AI and trying new things with it, the more you'll learn where it is valuable for you.
How do you identify patterns and decide when to optimize AI usage in your workflow?
I look for patterns; if something is novel, I spend time on it, and if a pattern keeps showing up I try to optimize. For example, I consider context window management and may compact the conversation or start a new session, and I also use tools like a Roslyn analyzer to enforce testing patterns.