In this video, I share some of my thoughts on developer tooling for making on-call experiences better for software engineers managing live services.
Auto-Generated Transcript
Transcript is auto-generated and may contain errors.
Hey folks, I'm just leaving the office here. As predicted for fancy Wednesday, no one else dressed up for Fancy Wednesday. Doesn't feel very good, you know. That's okay. I felt good. Felt good to to dress in my fancy clothes. No one said, "Hey, where's the job interview?" Which is a uh it's a nice change from previous attempts in the past at wearing nicer clothes. Um, for today I actually don't have a topic from experienced devs. I kind of looked. We've uh we've exhausted Reddit. I'm just kidding. But like on experienced devs, there was nothing good. There was one topic that I like I thought would be good, but it's not good if I'm driving because we'd have to do an analysis of this person's uh interactions with their manager. And I was like, "Oh, this will be good." But as I'm scrolling, I'm like, man, there's there's too many specific things and I would need to have the list in front of me and we could talk through it.
But, um, interesting read if you're, uh, if you want to go check it out. Um, maybe after some code commute episodes, you can go do your own analysis. And really, this person's talking about a manager that uh, that's kind of new to their team. Seems like a pretty pretty small team, right? like we're talking it's like a handful of people and uh managers just doing all of the things that I would say are like not not good as a manager. Like there wasn't one thing I read where I was like, "Oh, at least they're doing that." Everything was like pretty Um so I think it's an interesting read. uh I have seen this kind of thing just happen with you know IC's as well but I think it's even more uh negatively impactful when you stack on some of the management expectations there as well. So just you know little bit of preview into that.
um was like, you know, they're they're staying the manager is like staying up late to go code stuff and I'm like, hey, they're they're contributing, but it's like they approve their own pull requests that are like hundreds of files. They don't get reviews from anyone. They don't write documentation on anything. They're refactoring the code base for like stuff that the team has agreed is like not not a valuable use of time. They don't show up to meetings like just It seems like a bit of a a cluster. So, um yeah, I don't know. I think even if you you go read through it on experienced devs, uh probably something interesting for you to take away from that. I again, you might not have I've never had a manager that was quite that uh uh ridiculous. But uh I I've definitely seen people on different teams and stuff where I'm like some of those some of those boxes are being checked.
But anyway, um yeah, we're not going to go to experienced devs. I don't have any pending questions. I figured do another kind of like I don't know, just a blabfest video about like things I got going on, things that are top of mind for me. Uh by the way, some people don't know this. I have another YouTube channel. Uh, I actually had another vlog channel where um I was doing like and I stopped for a while, but I was doing like weekly check-in videos, you know, like 8 to 12 minutes kind of thing mostly and just doing like a recap of the week. And I was doing that mostly for myself. There's almost like no subscribers to that channel, which is totally fine. But kind of like I'm hoping that one day like I don't I've never had a journal or anything like that.
I'm hoping one day that uh I don't know either from content creation or building stuff on the side that I can look at that one day and be like, "Hey, remember remember when I was going through some of that and like we had failures, we had ups, we had downs, like all that kind of stuff." Um, so it's a little uh digital journal entry. So, um, those are kept really high level and I figured maybe if I start blabbing here, there'll be something interesting where I'm like, hey, we can talk more about some software development stuff in particular. So, I think something that's top of mind for me, and I was talking about this a little bit earlier in I think I guess it was a video this morning. I think I did three videos this morning. So, this is a fourth video today, which is nuts.
But, uh, what the videos talked about this morning was around like AI tools. And, um, we talked primarily about the usage of AI tools like especially as developers, you know, you got your Cursor, you got your Claude, you got your Copilot, you got your this, you got your that. And um the thing that we just started to touch on was like around creation of tools like you know you're a development team. How are you using AI in your development aside from the tools that come off the shelf? Like how are you incorporating that into different parts of your workflow? And so obviously like I work at Microsoft so there's some things we kind of just get. So like Copilot is everywhere, right? So we have our uh Copilot in our IDEs. We have Copilot in Teams. We got Copilot in all of our Office stuff.
And then um we have like we don't have I guess it I guess they did switch it over to GitHub Copilot but um like our our teams are using Azure DevOps and so originally when GitHub Copilot was uh was getting really going we didn't have that. So, like we had um we had a different agent in DevOps and like I'm just going to be totally honest like from using Git I love using GitHub Copilot. I think it's I think it's great for my use case obviously like with all these AI tools like it definitely screws up. It definitely gets me frustrated. But um from using that and my experience like in my own development I got to pass this person. It's crazy going crazy under the speed limit. I'm like are we stopped here? Um but yeah, for my own development using GitHub Copilot and then using what we had uh initially with Azure DevOps, it just felt like what are we doing here, man?
Like this is crazy. And I I I feel like it's cuz we didn't have reasoning models hooked up or something like that was a big part of it. So I feel like they've they've transitioned over. I actually see that it says GitHub Copilot um on the pull requests and stuff. So I think they're using the same agent now. But anyway, we got all this kind of stuff internally, but like we got a lot of live services going, right? So, one of the things I talked about this morning was that um when it comes to building like AI tools for internal development, there's a lot of people gravitating towards like, well, if we have an agent to do whatever, an agent to do this, an agent to do that, but what seems to happen, and I talked about growing pains and stuff this morning, I think like what's seemingly happening is that we got people that are like, I'm going to build an agent, and then everyone has a custom one-off agent.
for everything because like the agent is seemingly the vehicle for the task, but then you end up in this spot where you're like, why do I have to go talk to like a hundred different agents individually? I'm obviously exaggerating, but like maybe in some places it is like that. So really like we're I'm trying to at least in our part of the org do a bit of a push for like how do we how do we get on the same page here folks? Like it's really cool that everyone has been um trying to find opportunities that like we can leverage AI, we can make certain processes easier, especially for like when you're running a live service and just trying to correlate information. Oh my goodness, there's like even having we have like a you know a chatbot that can do like some like minimal correlation of incidents and stuff.
So we have I'm saying incident that's a very loaded word like a monitor fires right so incident um so when we have some monitors firing it can do like very very simple correlation to be like hey last time this fired like here's what people looked at right like just to get the on call engineer kind of a you know a step in the right direction is it always right like absolutely not like the last time that thing fired could have been for a completely different reason. But you know what? I found that even when it's like, "Hey, you might want to look at this as a as an example, there's some things that fire when I'm on call cuz our surface area is so big." I'm like, "I've never seen that before. I don't know.
I don't know what that monitor is." And so having a couple of of examples to go look that are similar, I'm like, "Hey, that's actually I appreciate that." If I go looking and I investigate a little bit, sometimes I'm like, "Ah, that wasn't actually helpful." But I can certainly appreciate being pointed in a potential direction. Uh, so that's like a a very very simple example. So, it's cool that people have been starting to put these things together, but like we start to hit limits. And so I I'm trying to move us in the direction of like, cool, if we want to go extend that, if we want to go build more, I don't personally I don't want the answer to be like, well, let's just go make one more agent to go do X. We'll have another agent to go do Y.
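The "last time this fired, here's what people looked at" correlation described above can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the `PastFiring` record, the monitor names, and the idea that earlier firings are stored with notes on what the on-call engineer opened. It is not how the internal chatbot actually works, and like the real thing it can point you at a firing that happened for a completely different reason.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PastFiring:
    """One earlier firing of a monitor (hypothetical schema, for illustration only)."""
    monitor: str
    fired_at: datetime
    looked_at: tuple[str, ...]  # dashboards, queries, or docs someone opened


def similar_firings(monitor: str, history: list[PastFiring], limit: int = 3) -> list[PastFiring]:
    """Return the most recent earlier firings of the same monitor, newest first."""
    matches = [f for f in history if f.monitor == monitor]
    matches.sort(key=lambda f: f.fired_at, reverse=True)
    return matches[:limit]


def suggest_starting_points(monitor: str, history: list[PastFiring], limit: int = 3) -> list[str]:
    """Collect de-duplicated hints from recent firings, newest first."""
    hints: list[str] = []
    for firing in similar_firings(monitor, history, limit):
        for item in firing.looked_at:
            if item not in hints:
                hints.append(item)
    return hints


# Made-up history; none of these monitors or dashboards are real.
history = [
    PastFiring("routing-5xx-rate", datetime(2024, 1, 3), ("dashboard: routing errors",)),
    PastFiring("routing-5xx-rate", datetime(2024, 2, 9), ("dashboard: routing errors", "doc: 5xx triage")),
    PastFiring("disk-usage", datetime(2024, 2, 1), ("dashboard: storage",)),
]
print(suggest_starting_points("routing-5xx-rate", history))
```

Matching on the monitor name alone is deliberately naive. It only gives the on-call engineer a couple of examples to go look at, which is the "step in the right direction" being described, not a diagnosis.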
like um the the example I have that's painful that's like a parallel to this and it might not seem obvious is like um we have a lot of dashboards right so there's we have huge live service lots of like metrics so therefore lots of dashboards for us to go check on things lots of monitors hooked up to those but um when you have dashboards that are across a couple of different systems If it's like if you know the area that you're working in and you're like I know the dashboards we use are like here then fine like no no no worries like it's your domain you got it down awesome but um especially when you're on call and you're like like I was giving you the example something fires and you're like I don't I don't know what this alert's for like oh that's like one of our other sort of like sub teams.
Okay. Well, I'm on call. I have to go figure it out. And so you start looking around. Oh, bad move. Bad move. Okay. Well, let's not do that ever again. Crazy. Um, yeah, I'm in the fast lane. Someone decided they also wanted to be in one of the fast lanes, but uh decided they were not going to go even the speed limit. So, that was cool. Um, what was I saying before everyone almost died? Alert. Oh, dashboards. Yeah. So, we got these awesome dashboards, but if you're like, I don't know this space, like, okay, well, there's like three or four spots we could have dashboards, and you don't even know what to look for in the first place. You're like, you know, you start looking and you're checking out one one spot, and you're like, I don't know, man. Like, there's nothing good here. But you don't realize the other sub team like has really good charts and dashboards in this other spot.
And then you kind of do some of your on call and you're getting frustrated, and then someone's like, "Oh, why didn't you just look at this dashboard?" and you're like, "Oh my god, I wish I knew." So we've been doing a lot of work to like document this stuff, make it more obvious for on-call engineers, but the surface area is huge, so discoverability is a challenge. And in my mind, the more agents we have that are like single purpose, we're going to run into a discovery challenge where it's like, well, if I'm already talking with this agent, why do I have to go talk to this one or this one or this one? Like, now my time is being split across all these different conversations. Like, we're just going to have disparate data challenges in a weird format.
So, um I I want to make a bit more of a push for like, you know, build your tool and then if it's a you know, whether that's an MCP server, whatever else. Build things in a way that we can have agents hook up to them. So, we don't have to go add one more agent every time. We already got agents. The agent should, in my opinion, should not be the vehicle for every single thing. So, um trying to do a bit of a push on that. But um you know one of the challenges with this kind of thing is like is uh is like the the scale of the organization, right? So I'm not the person like in charge of that for what it's worth. Um I'm not I'd love to be able to sit here and say like I'm making this decision for like thousands of people.
I'm not. Um, and the way that this is going right now is like on purpose. We've kind of been put in this position where it's like there isn't just one right way because we are going through some growth, right? There's a lot of stuff that's rapidly evolving. So, we've been kind of given some freedom to be like, hey, like find things that are working, experiment, explore. So, that's really cool. Um and I feel that we're reaching this point where it's like okay like we have been doing a lot of experimenting. How do we start unifying a little bit more, right? We should continue to experiment, explore, learn lessons, but I think that there are lots of lessons being learned that we have to start pulling together. So when you think about the size of the org that we're working in and like again for transparency I work at Microsoft.
I don't work in Azure. I don't work in Xbox. I don't work at LinkedIn. I don't work in one of the million different business areas of Microsoft. I mean sorry I don't work in all of those. I I work in one of them, right? Like I work in Office 365 or Microsoft 365, right? So, and then within that I I work on one team within Microsoft 365. It's a big org and um I'm just trying to to I guess like get on the same page as some of the other engineering managers that are also kind of observing this and being like how how are you looking at solving this problem because we should start to partner on this to make sure that like let's start sharing our lessons and sharing our our common approach because some teams are having a lot of success with it because they have started to find an approach where they're like, "Oh, hell yeah." Like, "This is working for us." And that's super cool, super awesome.
And instead of having every other team, like I know there's multiple teams like this, but instead of having all these other teams like start from scratch, these other teams have already been kicking ass at it. So like, you know, give them some more clear direction, right? So as an example, it's like you want to have an MCP server. Okay. What like where does that go? Like can we just check that in to which repository? Any repository. Um or like in some cases it's like we need to run a service for something. It's like okay well what's the expectation for that? Do we do we do it the same way that we do all of our other services? So, like, do I go through an onboarding process that's the same as the other services that we run? Um, if so, like great. Like, I used to work on the I used to manage one of the sub teams that did deployment of those services.
Like, cool. If that's the strategy, no worries. Like, I know who to go talk to. We'll get it going. But I I actually don't know if if that's been established yet or if um if some of the teams having success with this stuff have started leaning into some of those patterns. So, I'm just trying to go through some discovery with that. And um yeah, kind of excited because I think that that's one of the limiting factors right now is that when people start experimenting, they're hitting some roadblocks and they're going, "Cool, if I want to make this the real deal, like I actually don't know what I'm allowed to go do because there's a ton of security stuff. There's a ton of privacy stuff. There's a there's a ton of what feels like red tape if you're someone like me that comes from, you know, startup world.
I'm always like that's where my heart is, right? Like I'm always going to think of myself as someone who operates in a in a tiny company. It's just like that's a very natural way for me to go look at things. And so in comparison, there's tons and tons of red tape. Is it warranted? Like absolutely yes. because I work in a space now where like literally security is the thing that is top of mind. So I I have a lot different perspective that I'm trying to to balance out because previously I'm like oh like all this red tape like this sucks like everything like it makes it feel so slow. It's such a pain in the butt. But like I have more and more perspective as to why there's certain types of red tape. Obviously there's some things where I'm like this just feels like red tape cuz we're a big company and things move slower.
But uh in other cases I'm like wait like if we don't have um you know I'm just imagining this right like what we haven't established security guidelines for that. Okay. Well, what like what are you expecting is going to happen? Cuz we have crazy crazy amounts of security guidelines for everything else. So, like we just don't for this. Like that doesn't seem right. So, I'm trying to to figure out like where this stuff is so that I can talk to my team about it. I can say, "Hey, you want to go build something like this? No problem. I can put you in the right direction." Um, so I was talking with some people from my previous team. They've had they've been having a lot of success with um with building some tooling out where um you know partner teams can do even more like self-service. You want answers to things, you know, you can use some of what this other team is building.
And I'm like, hey, that's so cool because we've been having this really big push to make our on call lives better. team has done such a tremendous job on this kind of stuff like just you know I can't go into like obviously specific details but there's been different points in time where on call rotations were so hectic um and like not because everything's actually on fire but because like the way that alerts are coming in or the way that people can't find information or um just like this compounding effect of like you know you have a lot of noisy alerts so these things keep firing you can't even start investigating because by the time you start there's another one going like just nuts and the team has done such a good job to improve this kind of thing that like it shifts where your um it shifts where it feels like you have a pain point.
So instead of being like, "Oh my god, my phone won't stop buzzing because I have, you know, a million alerts." It's like, "No, I don't I don't have a million alerts." Like, I've been on call. This is Wednesday now. I think I've had two two paging alerts and I think that both of them were literally just partners that were like, I have a question and like I need to talk to an on call engineer to get some information. So even that's a really good example of this pattern shifting where it's like it's not oh my goodness, everything's on fire. It's like there's people that have questions they need answered. So what I experienced on my previous team was and they don't they already been doing this before I joined, but it's like you got questions that need answered. Cool. Let us put that information in front of you.
Here's where you go find the answers to that stuff. Right? They got dashboards. They got spots where you can go look. So as a deployment team, you want to know how long it's going to take to have something deployed, go look here. you want to know uh what version of the software has, you know, your your fix in it or your feature change, go look here, right? Like they have all of these things for you to go answer those questions for yourself. And they they were expanding on them, building on them because there's always going to be questions that people have. So great, go build tools to to give them the answers. And I feel like for us that's a very natural progression because now that's starting to surface. It's like I don't I'm not getting paged every second. But I am getting lots of people asking me questions and rightfully so, right?
They they have services. If something's not working from their perspective and we're the routing plane, they're like, "I have a request that's being sent. It doesn't seem to work." Cool. That doesn't mean it's our fault. That doesn't mean it's the routing plane's fault, but it does mean that we're in a position where we can try and help them, right? And when you're like I just to give you some perspective, if you're an on call engineer and you're completely overloaded with like, you know, I'm being paged non-stop. It seems like everything's on fire. I have to jump from thing to thing just to to see if something's on fire. Like when someone reaches out and asks you a question, you're kind of like, "Dude, I'm I'm busy putting out fires." Like, I I can't help you. Go away. Because you literally don't have time to do it.
Why is it so packaged up here? You guys got to let me in. You got to let me in. This is nuts. Um, so like that's that's not happening now. are not drowning in paging alerts, but people still have questions. So, that's why I think it's a really good natural progression for us where it's like, okay, we want to invest more into AI tooling. People have questions like this seems like a perfect fit for us to go, cool, how do we make tools for you to go answer your questions? And like, we are the first guinea pigs of that, right? We're on call. We have to go answer these questions. There's parts of our service that like I don't work with all the time. I need to go check dashboards and go ask other people on the team what something means and look around and blah blah blah.
But if we have some better AI tooling, instead of me having to go find the right like TSG, go find the right dashboards, go run these commands, like I cannot wait to go ask AI and not bother someone. Like that's going to be awesome. And then from there, we can evolve that into like, you know, partners asking us for information or how to fix something or what's going on. And then as an on call we're like cool let me go talk to you know Copilot over here. Once that's working well we can just go hey like you know we have that functionality built into Copilot tools. Go ask Copilot. And uh I feel like over time just building out more tooling like that so that you can get questions answered. I can't wait for that. I think that's going to be so helpful for everyone, right?
Like it's consistent ways to get information. Um you free up people's time, right? On-call engineers can really focus on things that are on fire that we don't have answers for instead of like let me go repeat the steps for something where, you know, it's like here's a step-by-step guide, go follow it to get the answer. Like why should I have to do that? If it's a step-by-step guide to go investigate it, like Copilot can go do that. Like that's how I see it. So yeah, I think that's like that's a really big top of mind thing for me like at work. What is going on, buddy? That is not how that works. That person just made a single right turn lane that's already doubled up into three lanes. Interesting. Hello, Mercedes. I see you. Um, yeah, that's a big that's a big focus at work.
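The idea running through this part of the video is to build tools that existing agents can hook into, and to let a step-by-step guide be something the assistant runs for you. A rough sketch of that shape follows. All names here (`Tool`, `ToolRegistry`, `runbook_tool`, the `deployment-status` example) are hypothetical and made up for illustration. This is a plain in-process registry, not an MCP SDK or anything the team actually runs, and the runbook steps are placeholder strings rather than real lookups.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    """A capability any agent can call (hypothetical shape, not an MCP API)."""
    name: str
    description: str
    run: Callable[[str], str]


class ToolRegistry:
    """One shared place to register and discover tools, instead of one agent per task."""

    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        if tool.name in self._tools:
            raise ValueError(f"tool already registered: {tool.name}")
        self._tools[tool.name] = tool

    def describe(self) -> list[str]:
        """List what is available, so discovery happens in one spot."""
        return [f"{t.name}: {t.description}" for t in self._tools.values()]

    def call(self, name: str, argument: str) -> str:
        return self._tools[name].run(argument)


def runbook_tool(name: str, description: str, steps: list[Callable[[str], str]]) -> Tool:
    """Wrap a step-by-step guide as one tool that runs each step in order."""
    def run(argument: str) -> str:
        return "\n".join(f"{i}. {step(argument)}" for i, step in enumerate(steps, start=1))
    return Tool(name, description, run)


registry = ToolRegistry()
registry.register(runbook_tool(
    "deployment-status",
    "Answers 'which version has my fix?' (placeholder steps)",
    [
        lambda service: f"look up current version of {service}",
        lambda service: f"check whether the fix is in that version of {service}",
    ],
))
print(registry.describe())
print(registry.call("deployment-status", "example-service"))
```

The design choice the sketch tries to show is that adding a new capability means registering one more tool, not standing up one more agent that people have to remember to go talk to.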
Uh, you know, outside of work brand ghost, we are this week especially just like really trying to double down into uh, you know, reaching out to to creators, people that do social media work and and do like cold outreach, right? We know we have to do it. I think can't remember when I put the video out for it. Maybe it's not even posted. I don't know. I lose track of time. But, um, I put a I did talk about it in a video. I think it was my work in progress video. I think that's actually one that went out today or yesterday, something. And uh was talking about, you know, what we need to be focusing on is more outreach. So, you know, we we talked about it and like we're doing it.
So, we just got to keep that momentum cuz like I the thing I get concerned about is like we feel that pain, we do it and then we're like, "Okay, we did it." Like, "That sucked, but we did it." Like, let's go back to let's go back to other things. And I I think we just have to stay in the habit of doing it. But yeah, there's a couple people we're talking with um to get some partnerships and stuff together that we're pretty excited about. So things are moving, right? Like it's um I don't know. It's like it's one of those weird things like I I'll I'll give you an example, right? So, it's going to sound completely random, but you know when you like see on social media, if you're someone who's like not super active or whatever, you don't go to the gym
a lot and maybe that's been on your mind where you're like, "Oh, like I should get in better shape." You like see like fitness influencers online and they're like, "Oh, like in one week you could have like shredded abs." And like, so I've been going to the gym for a long long time, like as long as I've been programming, so like over 20 years. And since I've been doing that, like I've done I've, you know, I've bulked up. I've been I'm 5'4. I've been 200 lb. I've been, you know, uh, on a bodybuilding stage. I've cut back down. So, like, I have a general idea what it's like to kind of change weight and diet and all this kind of stuff. Uh, both directions for getting heavier and getting lighter. So, I see this kind of stuff and I'm like it doesn't faze me personally, but I'm like this is it's Like I know it's but whatever.
I'm not like it's not affecting me personally. I just I see it and I'm like, okay, like someone's selling something. But the comparison is like when I see business like people talking about business and they're like we got you know we had this idea and like overnight we we had like 10,000 customers. I'm like what how did 10,000 people even find out that you have a thing? Like that's not like a repeatable process. I don't if if 10,000 people found out you had a thing. because you happen to have some weird viral thing and like I I don't actually believe you. But part of me it's like because I haven't actually lived through that where I'm like, "Hey, I had, you know, I've had a business, it's had 10,000 users and blah blah blah." And like I know what it takes to get there. I see these things and I'm like, "Well, why not us?" Like why isn't that happening to us?
So, I have this weird comparison where like for for gym and fitness stuff, I'm like, "Oh, whatever. I know that's snake oil, blah, blah, blah." But for for stuff with business, I'm like, "Where's the where's the get-rich quick scheme?" Like, how do where's that path? Like, why don't where's the button to press for that? Um, you know, where where's the the pill to to get a thousand users? Um, it just it's kind of weird. You go ahead, buddy. I'll let you through. Um, I'll let you through. Oh my goodness. You're going to have an accident. That's what happens when you help too many people, right? Amazing. We almost got to see it, folks. And by we, I mean me. Um, you couldn't see it. Someone was trying to pull in and someone was trying to pull out and they went into the same lane and they had to, you know how it is.
Yeah. So, it's it's kind of interesting because it's it's a slow journey, but like I feel like it's one of those things like the reason a lot of people just aren't super successful doing this kind of stuff is cuz they're like, "Wow, that's hard. That takes a long time. Screw that." Right? But if you just don't stop doing it and you keep putting in the effort, it's like it's kind of like hard to it would be really difficult to not have some kind of success with it if you just kept putting in effort on like the right things, I guess. But I feel like you could still get a bunch of those things wrong. And if you put in effort for a prolonged period of time despite doing a lot of wrong things, you can still have success. So, I got to keep reminding myself like it's not going to happen overnight.
Keep chipping away at it. Lots to learn. So, I think that's it though, folks. Thanks for hanging out. I'm just about to pull onto my street here. A friendly reminder though, this channel is driven by your questions. So, if there's no questions, I go to the Experienced Devs subreddit. And if the Experienced Devs subreddit sucks, you hear me blab. So, uh, send in your questions in the comments. And if you want to submit them anonymously, just go to codecommute.com. I will try my best to answer your software engineering and career development questions as best I can. You got to wait for me to back into my spot here on Fancy Wednesday. We did it. Yay. Cool. Okay, thanks folks. I will see you in the next video. Take care.
Frequently Asked Questions
These Q&A summaries are AI-generated from the video transcript and may not reflect my exact wording. Watch the video for the full context.
- How can AI tools improve the experience of on-call engineers in large organizations?
- I believe AI tools can help on-call engineers by providing consistent ways to get information and freeing up their time. Instead of being overwhelmed by alerts, AI can help answer common questions or guide investigations, reducing the need to bother others. This allows engineers to focus on critical issues rather than repetitive tasks.
- What challenges arise from having multiple AI agents for different tasks in a development environment?
- From my experience, having many single-purpose AI agents can cause discovery challenges because engineers have to interact with multiple agents individually. This disperses data and complicates workflows. I advocate for building tools that allow agents to hook into a unified system rather than creating a new agent for every task.
- How do large organizations balance the need for innovation with security and privacy requirements when building new tools?
- In my work, I see that while experimentation is encouraged, there is significant red tape related to security and privacy that can slow down development. This is warranted because of the sensitive nature of the systems we work on. I try to understand the security guidelines so I can guide my team on what is allowed and help them navigate these requirements effectively.