How Do Software Engineers Have ANY IDEA What's Going On?!

From the ExperiencedDevs subreddit, this developer wanted to understand how other developers navigate unknowns.

📄 Auto-Generated Transcript

Transcript is auto-generated and may contain errors.

Hey folks, we're going to the ExperiencedDevs subreddit for this one. This topic is about what to do when you have no idea what's actually going on. It sounds kind of ridiculous, but the framing was someone saying, "Hey, it feels basically shameful as a software engineer when there's a problem or something happening that you don't understand, and you genuinely don't know what's going on or how to make progress on it." The framing isn't really about how to build a feature. It's more like this: you have a live service running, something unexpected is happening, and now you have to get to the bottom of it.

That could be a live site issue where something seems like it's on fire. Or things could be operating at steady state, but you notice something in your metrics and think, "Wait a second, why is that happening, and how do I get to the bottom of it?" It doesn't have to be a live service, either. It could be a product that's downloaded and distributed, like a mobile app, where something unexpected is going on and you're stuck thinking, "I don't know what the hell is going on or where to start."

So I wanted to talk a little bit about how I approach problems like this. I've shared some of these thoughts before, and I've definitely talked about them in person with people, but I don't know if I've done a good job, because it's not formalized into a framework or anything like that. I'm hoping that talking through this will help me put my thoughts together more clearly. Before I jump into that, though, I want to talk about the framing this person had around the word shame. I get it. I feel like there are two parts to this post. The first is: how do we navigate situations like this, and is there any guidance on how people approach them?

The other part that's important here is that this person says it feels shameful to be, say, a software engineer with five years of experience and not actually know what's up. So I want to take a moment to say that it's totally okay and normal to have situations where you just don't know. No one in the world should expect you to know everything or to have a solution to everything. But part of being an engineer is being curious and trying to make progress: working toward a solution, working toward a better understanding. And I think there's something to be said about software engineers, largely as a group, being proud.

I don't have the data to prove it, but I'd say software engineers are statistically intelligent people and statistically really good problem solvers. So not having a solution or knowing what to do right away, or running into a challenge where you just don't know, is a wildly uncomfortable thing for software engineers. And I want to say that it's okay for that to happen. That's one of the reasons we build software in teams: you are not alone. It's also why diversity in skill sets, experiences, and perspectives can be tremendous. If everyone on your team had exactly the same outlook as you and you got stuck, your whole team would be screwed, because they'd all think the exact same way you do.

I get this expectation, but I do think a lot of it is self-inflicted, and probably a lot of it is just software engineering culture. I wanted to say out loud, for anyone who has felt this kind of shame of "I should know this, I'm going to look stupid": it's okay not to know. I won't explain the details of the scenario, but I was talking with one of my employees the other day who was doing some live site investigation. For context, in Microsoft 365, the service area my team is responsible for is a bit outrageous: there's so much traffic flowing through our system that it's impossible for any single person to know all of the details about everything.

Should the people on the team become skilled over time at investigating, finding things out, and understanding the system? Absolutely, and that's what it's all about. But no single person is going to know everything all the time; the surface area is too big. So we were talking about something they were investigating, and they were laying out the different things they were looking at and how they were approaching it, and I also didn't know the answer. We're in different roles: I'm multiple levels above this person. They're a very intelligent, very good engineer, and I like to think I'm quite qualified. Between the two of us, we didn't have the answer. There was no obvious answer.

In moments like that, as an engineering manager, I feel kind of bad. I wish I could have the answer in my head and help steer my employee in a direction where they do a bit more investigation and learning, and then they come back with "Oh, I figured it out," and I can be there to say, "Hell yeah, I'm glad we could give some guidance and that ultimately you learned." That's my ideal situation when I'm helping people out. But I don't know everything. My boss doesn't expect me to know everything, and my skip level doesn't expect me to know everything. I think it's important that we make space for that. The point is, I also get that feeling.

I do have to remind myself that it's not the expectation. At the end of the day, if your goal is to never be in that situation, I think you'll ultimately fail at that goal. You will find yourself in situations where you do not know the obvious next step, and that is okay, because you will try to make progress, and ultimately that's all anyone can ask for: that you put effort into trying to make progress. Okay, with that out of the way, what is my framework for this kind of thing? It actually goes back to when I was doing digital forensics work, where one of our challenges was handling customer-reported issues.

The digital forensics software we worked on could scan devices (at the time, primarily hard drives), go through the data on them, and present it in a structured way, so that someone looking for digital evidence had, for lack of a better word, a report to go through to understand what happened. So it was basically two parts: a big search engine plus an interactive dashboard. The challenge is that because it's forensics, the data is highly private; we're literally talking about criminal cases. There could be material on those devices that can't be replicated, even if we were given consent to see it. This could be material related to child sexual abuse, that kind of thing.

There could be all sorts of things. So if there's an issue, it's really difficult to say, "Just give me some data to reproduce it." A lot of the time, that's not going to happen. Another thing is that, because of how these workstations are set up for our users, they're sometimes air-gapped; given the forensic nature of the work, they need to be separated from everything else. With that framing in mind, debugging problems could sometimes be an absolute cluster. Someone says, "Hey, it doesn't work. Maybe here's a log file," and you think, "Okay, I hope there's something obvious in the stack trace." A lot of the time, that's just not the case.

Problems like this combine limited information with an inability to reproduce (repro) the problem, and debugging them to find the root cause is extremely painful. So I think there's a lot to be said for some reflection afterward: if this were to happen again, could we add better logging, better telemetry, better whatever? That's always a nice follow-up, and I think you can and should do it. It just doesn't help you in the moment, unless you're in a situation where you can make logging, telemetry, or monitoring changes, roll them out, and then use them for better observation. You might have to go that route if you truly have no other signal and you need it.
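As an illustration of that follow-up (an assumption about how you might do it, not the setup described in the video), one option is to attach environment context such as app version, OS, and hardware SKU to every error log as structured JSON, so a future investigation can filter and correlate instead of guessing. This is a minimal Python sketch using only the standard library; `build_context`, `JsonFormatter`, and the field names are hypothetical.

```python
import json
import logging
import platform


def build_context(app_version: str, hardware_sku: str) -> dict:
    """Hypothetical context fields; a real product would choose its own."""
    return {
        "app_version": app_version,
        "os": platform.system(),          # e.g. "Windows", "Linux", "Darwin"
        "os_release": platform.release(),
        "hardware_sku": hardware_sku,     # supplied by the caller, not auto-detected
    }


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON line, including any attached context."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            "context": getattr(record, "context", {}),
        }
        if record.exc_info:
            # Keep the stack trace, but as a field rather than loose text.
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload, sort_keys=True)


if __name__ == "__main__":
    logger = logging.getLogger("investigation-demo")
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)

    ctx = build_context(app_version="4.2.0", hardware_sku="SKU-A")
    try:
        raise ValueError("report generation failed")
    except ValueError:
        # `extra` copies "context" onto the LogRecord as an attribute.
        logger.exception("Unhandled error while building report", extra={"context": ctx})
```

Because each line is JSON, logs collected from many machines can later be grouped by OS release or SKU, which is exactly the kind of signal that is missing when all you get is a bare stack trace.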

But in my experience, that's usually more of an after-the-fact step. So how do we navigate stuff like this? My general framework is that it's not just about proving something is right. That's what I notice a lot of people trying to do when they're problem solving: "I have a hypothesis, so I'm going to go see if it's right." They go down a path, and if it happens to be right, cool, they keep progressing and repeating. But if it happens to be wrong, a lot of the time it's almost dismissed. That can be frustrating, of course, because what you usually hear is, "We thought it might be this, we started investigating, it's not that, so we got nowhere."

I think the key is stepping back and realizing that disproving something is just as valuable as proving something is right. In my opinion, these types of investigations are always about building a matrix of what is known fact and what is still assumption. What's cool about this is how it changes hypothesis testing. Say you see something in a log and you have a hypothesis about it. You try to validate it, whether that's trying to reproduce something, going through code to prove whether it's even feasible, or finding another example log that disproves the assumption. If you can prove a statement true or false, I'd highly recommend putting it all down on a whiteboard or in a shared digital space.

I loved using whiteboards for this back when we worked in the office more. I'd have a whiteboard filled with assumptions, each one marked as it was proven true or false. What was cool about this was that there wasn't actually any wasted effort in the investigation, because the whole point is to get each assumption either proven or disproven; while it remains an assumption, there's an unknown. I realize this probably sounds abstract or not very helpful yet, but I'm getting there. As you continue to iterate, one of the things we often mess up is going down a path to prove a hypothesis and then realizing we're basically re-proving something.

We go down a path and then realize, "Oh yeah, this actually isn't possible, because we've already proven that X and Y aren't real." You can short-circuit a lot of wasted effort by keeping a matrix of hypotheses that are proven or disproven. It also means there are no dumb ideas, and I think that's really helpful in situations where no one knows what to do. Someone says, "This is going to sound kind of crazy, but I'm going to mention it: is there maybe something going on with the operating system?" Maybe. And people say, "Oh, come on, man, the operating system? That's the last thing that's going to have an issue." And it's like, well, maybe it is. I don't know.
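The proven/disproven matrix can be kept as a simple ledger. This Python sketch is an illustration, not a tool from the video: every idea goes in as an assumption (no dumb ideas), settling it records the evidence whichever way it went, and `needs_testing` is the short-circuit that stops you from re-proving something already settled. The `Ledger`, `Hypothesis`, and `Status` names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    ASSUMPTION = "assumption"
    PROVEN = "proven"
    DISPROVEN = "disproven"


@dataclass
class Hypothesis:
    statement: str
    status: Status = Status.ASSUMPTION
    evidence: list[str] = field(default_factory=list)


class Ledger:
    """A digital whiteboard: every idea starts life as an assumption."""

    def __init__(self) -> None:
        self._items: dict[str, Hypothesis] = {}

    def add(self, statement: str) -> Hypothesis:
        # No dumb ideas: anything not yet settled is recorded as an assumption.
        return self._items.setdefault(statement, Hypothesis(statement))

    def settle(self, statement: str, holds: bool, evidence: str) -> None:
        # Proving and disproving are both progress; record which, and why.
        hypothesis = self.add(statement)
        hypothesis.status = Status.PROVEN if holds else Status.DISPROVEN
        hypothesis.evidence.append(evidence)

    def needs_testing(self, statement: str) -> bool:
        # Short-circuit: don't spend effort re-proving something already settled.
        hypothesis = self._items.get(statement)
        return hypothesis is None or hypothesis.status is Status.ASSUMPTION

    def open_assumptions(self) -> list[str]:
        return [h.statement for h in self._items.values() if h.status is Status.ASSUMPTION]

    def facts(self) -> dict[str, bool]:
        """Settled statements mapped to True (proven) or False (disproven)."""
        return {
            h.statement: h.status is Status.PROVEN
            for h in self._items.values()
            if h.status is not Status.ASSUMPTION
        }


if __name__ == "__main__":
    ledger = Ledger()
    ledger.add("Only happens on one OS version")
    ledger.add("Only happens on one hardware SKU")
    ledger.settle("Only happens on one OS version", False, "reproduced on two OS versions")
    for idea in ("Only happens on one OS version", "Only happens on one hardware SKU"):
        print(idea, "-> test it" if ledger.needs_testing(idea) else "-> already settled")
```

The design choice worth noting is that a disproven entry is kept, not deleted: it is a fact, and it is what lets the next person skip that path.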

So is it that crazy? I don't know. Write it down. If you can't say as a matter of fact that it isn't, it's an assumption that's still totally valid to go prove. Especially for us, across tons and tons of machines, could there be something wrong with the operating system in a particular scenario? 100%. It's a totally valid thing to consider. Is it super common? No. Is it valid? 100%. So if someone in one of our investigations said, "Hmm, I wonder if there's something weird about the operating system," okay: which machines are reproducing this issue, and what operating system are they on? Building on this, you might notice correlations between things you didn't see before. Or you might go through it and say, "Nope, we see it across different operating system versions."
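The operating system question comes down to a counterexample check: the hypothesis "this only happens on one OS version" is disproven as soon as two reproducing machines report different versions, while agreement only means it hasn't been disproven yet. A minimal Python sketch; `single_value_check` and the machine records are hypothetical.

```python
from typing import Iterable, Mapping


def single_value_check(
    repro_machines: Iterable[Mapping[str, str]], attribute: str
) -> tuple[bool, set[str]]:
    """Test the hypothesis 'the issue only occurs with one value of `attribute`'.

    Returns (disproven, values_seen). One counterexample disproves the
    hypothesis; seeing a single value does NOT prove it, it only leaves it open.
    """
    values = {machine[attribute] for machine in repro_machines}
    return len(values) > 1, values


if __name__ == "__main__":
    repros = [
        {"host": "m1", "os_build": "19045", "sku": "A"},
        {"host": "m2", "os_build": "22631", "sku": "B"},
        {"host": "m3", "os_build": "22631", "sku": "A"},
    ]
    for attr in ("os_build", "sku"):
        disproven, seen = single_value_check(repros, attr)
        status = "disproven" if disproven else "still open"
        print(f"only one {attr}? {status} (seen: {sorted(seen)})")
```

Note that this only looks at machines that reproduce the issue; it says nothing about machines where the issue was never observed, which stay in the "still open" column.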

So it's not specific to one operating system version. Okay, what about the hardware configuration? We have different SKUs of hardware. Is it maybe a specific SKU? Nope, we end up seeing it on variations A and B. We don't seem to see it on C, but we don't know; that's still open. Just because we haven't seen it doesn't mean it's not possible, but it's definitely on A and B. When you start drawing out these truths, they're no longer assumptions you have to hold in your head and keep guessing at. You can start building what might look like a correlation. You might say, "We noticed it on operating system versions X and Y and on hardware configurations A and B." And as you look through the data, you might say, "Wait a second, we only see it on these combinations."

Or, "Wait a second, we only ever see it with this version of the software, or in this particular location in the world, or with this particular kind of traffic." You keep building out a matrix of truth, along with its inverse: the things that are not true. For me, at least, it's a visual thing. The patterns start to emerge because you're no longer just listing a bunch of assumptions and guessing at what to do. So my framework is essentially: there are no stupid questions, and everything is on the table. If you can prove or disprove it, write it down.
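Spotting that "only on these combinations" pattern can be made mechanical with a small tally. In this Python sketch (an assumed approach, not the author's tooling), each observation is a machine's attributes plus whether it reproduced the issue, and `suspicious_combos` returns the value combinations where every observed machine reproduced it. That result is a correlation worth turning into the next hypothesis, not proof of cause, and small samples can easily mislead. All function names and data here are hypothetical.

```python
from itertools import combinations


def combo_table(observations, attributes):
    """Map each combination of attribute values to (machines_seen, machines_reproducing)."""
    table = {}
    for attrs, reproduced in observations:
        key = tuple(attrs[a] for a in attributes)
        seen, repro = table.get(key, (0, 0))
        table[key] = (seen + 1, repro + int(reproduced))
    return table


def suspicious_combos(observations, attributes):
    """Combinations where every observed machine reproduced the issue."""
    table = combo_table(observations, attributes)
    return sorted(
        key for key, (seen, repro) in table.items() if repro > 0 and repro == seen
    )


def scan_pairs(observations, attribute_names):
    """Run suspicious_combos over every pair of attributes."""
    return {
        pair: suspicious_combos(observations, list(pair))
        for pair in combinations(attribute_names, 2)
    }


if __name__ == "__main__":
    observations = [
        ({"os": "X", "sku": "A", "region": "eu"}, True),
        ({"os": "Y", "sku": "B", "region": "eu"}, True),
        ({"os": "X", "sku": "B", "region": "us"}, False),
        ({"os": "Y", "sku": "A", "region": "us"}, False),
        ({"os": "X", "sku": "C", "region": "eu"}, False),
    ]
    for pair, combos in scan_pairs(observations, ["os", "sku", "region"]).items():
        print(pair, combos)
```

Here no single attribute explains the failures on its own, but pairs such as (OS, SKU) isolate exactly the reproducing machines, which is the kind of pattern that jumps out once the facts are written down.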

The reason I think this framework works really well is that when you have no idea what to do, what's going on, or which direction to take your thinking, why would sitting back and picking only the most obvious things help? Having questions to ask or hypotheses to test that seem unusual can be super helpful. If someone says, "Maybe it's the operating system version," and you think, "That sounds kind of crazy, why would it ever be that?", maybe that's literally the best thing to check, because it's the last thing you would have checked, and you already said you don't know what to do. So go prove or disprove their hypothesis.

I really like this approach because it makes the investigation data driven. It goes from vibes and "I don't know what's going on," which means either inaction or guessing, to action and experimentation. It starts producing data, so you can piece together what to tackle next. A lot of the time, as you piece it together, you start seeing the next level of correlation between the signals you're observing and the facts you've established by proving or disproving your hypotheses. Hopefully that makes sense and doesn't sound totally crazy, but that's my recommendation for when you just don't know what to do: instead of doing nothing, start writing down hypotheses and proving or disproving them.

I should have mentioned this: sometimes you'll think, "Why would I check that? That's obviously not a factor." But that's an assumption. Did you actually prove it? Sometimes the answer is hiding in exactly those things. It's like searching the whole house for your keys while they're in your hand. I used to do this a lot as a kid building Lego: "They didn't pack the one part I need, the next piece in the instructions," and I'd be holding it the whole time. It's a similar kind of experience. So I hope that helps.

It might be interesting to go through an example like that, but that's how my brain likes to tackle these types of problems. You are not a bad engineer because you don't know the next step to take. Don't feel ashamed; it's normal. The point is that you start trying, start investigating, and get curious. If you're curious and exploring the problem space, you'll continue to make progress. So don't beat yourself up over it or get paralyzed by it. Accept that it's totally normal not to know everything. If people expect that of you, I don't know what they're thinking, because you can't have that expectation of anyone; it's impossible. I hope that helps. If you have questions about software engineering or career development, leave them below in the comments, or go to codecommute.com and write in anonymously.

I am happy to make a video response for you and try my best to share some perspective and hopefully it helps. See you in the next one.

Frequently Asked Questions

These Q&A summaries are AI-generated from the video transcript and may not reflect my exact wording. Watch the video for the full context.

How do you approach a problem when you have no idea what's going on with a live service?
I don't approach these problems only by trying to prove something is right; I build a matrix of what is known fact and what is still an assumption. I test hypotheses by trying to prove or disprove them, and I write it all down on a whiteboard or in a shared space. Disproving a hypothesis is just as valuable as proving it.
What framework do you use to investigate difficult problems when you can't reproduce the issue or have limited signals?
I recognize that sometimes you can't reproduce the issue or you have limited information, and you may need better signals later. My framework is to write down hypotheses and prove or disprove them, using a whiteboard or shared space to track what's known and what's still an assumption. This turns the investigation from vibes into data-driven actions.
Why is it okay to not know the answer, and how do you cope with the expectation that you should know everything?
I remind myself that it's totally okay to not know and it's normal in engineering to have situations where you don't know what's going on. I believe the goal is to keep investigating and to be curious, focusing on making progress rather than feeling ashamed.