From the ExperiencedDevs subreddit, this topic is all about failures in software engineering.
Transcript is auto-generated and may contain errors.
Hey folks, I'm just stuck in traffic, so I figured we'd do a video. There's a topic on the ExperiencedDevs subreddit that was, "Hey experienced devs, can you share stories about complete failures?" I didn't even read the whole post, because in the snippet it was like, you know, this could be bad managers and whatever else. And I was like, you know what? If I can help it, I don't want to talk about other things or other people being failures. That seems kind of shitty. But I figured maybe I can reflect on some stuff that I was involved in where there was a failure, especially if I'm responsible for it. The first example I was thinking of goes back, I don't know, maybe almost 10 years now, probably something like that. So, prior to Microsoft, the last place I was working.
So, this one is not my failure. I'm going to try not to point fingers or anything, just explain the situation. I'll talk about what happened and how we kind of got out of it, and then I'll try to share some others that I was thinking about, though I don't know if I remember the specifics. So in this particular case, and this is at a digital forensics company, we had tried to expand our focus in terms of our product offering. There was a project that was created, and there was a decision made to outsource this project. And before anyone jumps in and goes, "Oh, overseas work, it's terrible," or whatever, right? It wasn't even overseas or anything like that. It was in North America.
So there was work outsourced, and they were given a project. I think there were a lot of missteps with this project, which is very unfortunate. It was outsourced, and there was a dollar-amount budget associated with it. What would happen is there'd be a check-in, I don't know if it was monthly, but say roughly monthly, and basically no progress was getting made every single month. It went on for like 6 months. When I reflect on what was going wrong, I think there were a lot of things that just really stacked up to make this not successful, unfortunately. One of them was that the actual project scope was made very much in isolation.
So when I have talked about this before, this idea of teams and domain knowledge, there's a lot to be said aside from just software engineering in general, right? You're building software for a particular domain. It could be healthcare software. In this case it was forensic software. It could be people doing, I don't know, CMS. It could be a sports platform. Anything. There's domain knowledge associated with that, right? And if I'm saying this and you're like, "Well, what do you mean? I just kind of focus on writing code," my suggestion to you is to try to get more in tune with the domain so that you write better software for that domain.
And so the project scope that was put together was done in isolation by someone who, in my opinion, and this doesn't make them a bad person, simply didn't have the domain knowledge. It doesn't mean they didn't try their best or didn't do it with the best intentions, nothing like that. But in my opinion, they simply did not have the domain knowledge to define the scope of a project, hand it off like this for someone to go do, and expect success. Right? This is maybe a silly comparison, but I think it's kind of accurate. Think about the AI tools that we have now. Models are pretty damn good. Now, if you were to pick some language, some tech stack that you've never built in, and you went to, you know, Copilot CLI, Claude Code, whatever you want,
and you said, "Hey, make me this thing," right? And you try to explain it and almost do it like a one-shot, or maybe with a little bit of iteration. Odds are it's not going to be perfect, right? There are just gaps. There are things that you don't know to tell it. There are going to be decisions it makes where you don't know how to bounce back and forth, make more informed decisions, course correct it, whatever, because you don't even know what to do. And again, that's not an insult or a knock on someone. Quite literally, if you don't know, it's very difficult to try counteracting a lot of that stuff. So I think the project scope was done in this way. So that wasn't really fair.
Fair in terms of expecting success, I mean. I think there was also some mishandling of who was responsible for the project. I think it shifted hands in a way that the, air quotes, "new owner" didn't realize they were supposed to be the accountable party in terms of managing stakeholder interaction. I think that was a really big misstep, because maybe if they had been more clearly informed earlier on, it could have been course corrected right away. It would be like, "Hold on, what the hell are we even trying to build? What is the expectation here? Let's get this thing back under control." Maybe that could have saved it, right? But ultimately, I think even where it was outsourced to, the team that was building it probably just wasn't the right team to be building such a set of capabilities, to be totally honest.
I think there was just misunderstanding in terms of what needed to be built. So, in my opinion, this was on its way to being a complete and total failure, because there was simply no progress. I'll talk now about what was done, and I just want to repeat myself: I wanted to talk about this as a failure example without attaching it to "this is one person's fault," because I don't think that's a fair thing to do, basically ever. Even if someone pressed the big red button that said "take down production," I would say, why is someone allowed to press a big red button that takes down production? Who put the big red button there without the controls and the access rights or the double approval system? Who was part of that?
Right? I think we can always go back and say there's more than one person involved when we have things that don't go right. So in this particular case, there was a bit of moonlighting going on to try and get this thing back on track. What we had done was go back to the drawing board and say, what is it that we actually need? Let's talk about that, because we're seeing what this team is being told to do, and we're seeing that they're not making progress on it. There's literally no tangible progress after months. So what do we actually need? And we just started with the very, very basics. And this part, I mean, I've told it in more detail before.
It's one of my favorite stories of my career. I was working with a colleague, like I said, kind of moonlighting, to try and get something put together where we could say, "Hey look, we have a working prototype." And we were literally taking off in an airplane when we had our first success with it. It was just super cool timing to be taking off from the runway on this plane, running this thing that we built for the first time, seeing it work, and then having this holy moment of, okay, at this moment we have literally proven that we can do the complicated part. Everything else from here, we know how to do the rest.
We know that we're going to have to execute on putting a good UI and things together around it. We need to make sure that we're delivering a really good UX so that it's obvious for people to use. We know we can iterate on that and do it, but we've proven out this technical uncertainty. So we could then take that and go back and say, "Hey look, we're pretty sure that we should can this other project." Instead of the sunk cost fallacy, where it's like, "Oh, just ride it out and see what we get," it was like, don't. Cut our losses. Let's get out of that, because it's just not going to be a viable path. And it was really cool. That ended up turning into something that I managed a team for, for years after that.
It was a really transformative opportunity that came out of something that was a failure, unfortunately. So that was one example. More recently... it's always hard to talk about failures. Not because, oh, I've never failed, nothing like that, but I feel like any time there's a failure, there's course correction, there are lessons learned. So I'm trying to reflect and ask, was there something that failed absolutely? Even with the one that I was just telling you, I'm like, hey look, there's a happy ending. So it's hard for me to think about failures in that sense, because I think, hopefully, there's always something that comes after to make things better. I was trying to think through this, and I think there are some general examples. It's happened to me more than once, which is why I think it's worth talking about.
But, so, I build something on the side called Brand Ghost. If you've been around my content for a while, you'll know I talk about Brand Ghost because I think it's kind of helpful to hear the software engineering side of some of the stuff that I also build on the side. I use Brand Ghost to publish a lot of my content on social media platforms. So if you read posts on LinkedIn or Twitter or Threads or Facebook, Instagram, wherever, that's all posted with Brand Ghost. And so, knock on wood, there have been a few instances where I'm either doing some refactoring or some really big feature and I'm like, I really need to make sure that I test the absolute heck out of this.
It's a live service, you know. There are paying users. Selfishly, I use it. I need it to work, right? There have been a few instances where I'm going through as much rigor as I possibly can. In hindsight, I guess it's truly not enough, but I have my unit tests on the very narrow scope of things, where I'm like, I really have to drill into these details and make sure they work. Functional tests on top of that, so I can test the overall behavior without the specifics, the implementation details. There are different layers of these things, right? Running code coverage to sanity check, because I'm like, wait a second, I actually don't have tests on this other area. I actually don't feel confident. I really don't think it should touch this stuff, but I don't have proof.
So I run code coverage, making sure that at least the areas I expect to be covered are properly covered. Like I said, kind of doing everything I think I should, and then being like, okay, I feel good about this, right? Getting all my tests in place. And by the way, I talk about those two types of tests because they're just completely different angles to build confidence in the code that's being written. And then I go to deploy, and the server is just down. The server doesn't come up. It's dead on arrival. And I'm kind of there freaking out because it took out... and obviously, don't get me wrong, there are simple mitigations for this kind of stuff. Put the last replica back up, it's fine, right?
So it's not truly catastrophic. But the panic sets in because I'm sitting there going, I just did all of this work to build this stuff out. I had so much confidence by the end of doing all this rigorous testing. And a lot of those tests, by the way, actually start up an entire server and do API calls to go end to end with a database on the other end of it. Part of me is like, I don't know what the hell else I could have done here, right? And it's not that it's impossible to mitigate and fix those scenarios, thankfully. But you have this moment... well, there are two feelings that come up.
It's panic, because, oh, I have to get things back on track, and then this really sinking feeling of, how is it possible that I could have put so much effort into this and it doesn't even start? That makes no sense to me, right? It's really, really frustrating. And what it turns out to be a lot of the time is that it truly is edge cases that have to do with configuration. When I'm running in a test environment, I don't have a real resource hooked up, and I'm missing a configuration value or something like that. And the real resource... sorry, the resource might be running and it's all good, but how my configuration looks in production
is not doing the right thing, and it happens so early in startup that it gets missed. So the tests are actually exercising code paths like that, but the configuration was wrong, so it ends up blowing up on startup. And like I said, there are lots of different things that could be done, right? You can have a canary kind of thing. There are so many different approaches that could be taken. But to me it's a good example of failure because, number one, it's happened a couple of times, and number two, I get this embarrassing feeling. As I'm explaining it to you, I feel embarrassed, right? I've been doing this for 20-plus years, still trying to do all the things that I'm supposed to, and still making stupid mistakes. It's unfortunately the life of a software engineer.
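One way to catch the kind of startup configuration failure described here is to validate every required setting up front and run that same check against the production values before deploying. What follows is only a minimal sketch under assumptions: the setting names (`DATABASE_URL`, `QUEUE_URL`, `API_KEY`) and the helpers `load_config` and `check_config` are hypothetical and chosen for illustration. They are not how Brand Ghost is actually built.

```python
from dataclasses import dataclass


class ConfigError(Exception):
    """Raised when required startup configuration is missing or blank."""


# Hypothetical setting names, used only for illustration.
REQUIRED_SETTINGS = ("DATABASE_URL", "QUEUE_URL", "API_KEY")


@dataclass(frozen=True)
class AppConfig:
    database_url: str
    queue_url: str
    api_key: str


def load_config(env):
    """Build an AppConfig from a mapping, reporting every missing value at once."""
    missing = [
        name for name in REQUIRED_SETTINGS
        if not (env.get(name) or "").strip()
    ]
    if missing:
        raise ConfigError("missing required settings: " + ", ".join(missing))
    return AppConfig(
        database_url=env["DATABASE_URL"],
        queue_url=env["QUEUE_URL"],
        api_key=env["API_KEY"],
    )


def check_config(env):
    """Return a list of problems instead of raising, for use as a pre-deploy gate."""
    try:
        load_config(env)
    except ConfigError as exc:
        return [str(exc)]
    return []
```

The server would call `load_config(os.environ)` as the first thing it does, and a deploy pipeline could call `check_config` with the values intended for production and refuse to roll out if the list is not empty. It only proves the values are present, not that they point at a working resource, so it complements a canary rather than replacing one.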
So there's always going to be mistakes. In my opinion, we shouldn't approach things like, how do we eliminate all possible mistakes? You should strive towards that, but don't make that your only goal, because there will be stuff that breaks. There will be. And sometimes there's stuff that breaks completely outside of your control. So it's not about how to eliminate every possible failure. It's how do you manage when there is failure, right? That's one of the takeaways that I want to provide for this video. Whether it's a process, something in code, how you architected something, or how you set up a project to get delivered for a business, this can look so many different ways. It could be, you know, managing people, right?
You're trying something out and you're realizing your coaching strategy, or whatever else you're doing, is not the right thing. You fail at it. It's not the end of the world. I think the most important part is that you try to learn and improve. So that's a takeaway that I'd like you to have. If you're able to think back on things that you failed at, how did you move forward from it? What did you learn from it? There's a thing that I post semi-regularly, and it makes more sense when it's written, because when I go to say it, it might sound kind of silly, but I write "failure" out in two different ways. I say capital-F Failures are ones where you truly screw up.
They're ones that you truly screw up. There's no lesson learned, and sometimes they are capital-F Failures because you repeat them. They are truly something that gets screwed up, and there's no action ever taken to get better and learn from it. And then I would say little-f failures are all of the times in between where something doesn't go as planned. And that's okay, right? We don't want them to happen. Of course, no one wants to have something not go as planned, but it's going to. It's always going to happen, right? We'd like to reduce them, but we need to accept it's always going to happen. And if you can get better and better at learning from these types of things, then a little-f failure, something going wrong, is not the end of the world. It's a learning opportunity. I know that probably sounds super cheesy, but our entire lives are made up of little-f failures.
And the cool thing about little-f failures and big-F Failures is this: if you're thinking back and you're like, oh no, here's an example of a big-F Failure, right? Something went totally south. Maybe it's happened a second time, a third time, whatever. You haven't learned from it. No one's learned from it. I don't know. You're doing the reflection now. Can you transform your big-F Failure into a little-f failure? It's just a stepping stone. I think we talked about psychological safety in some recent videos. I think it's really important to create environments where there's psychological safety, especially around failure, so that people are able to take calculated risks. The goal is not to be negligent and ridiculous, but if every decision you make is like, "I must have 100% coverage and understanding of every possibility," you'll be paralyzed.
You'll never do anything, because it's impossible to know everything. What you don't want to have happen is someone tries something, it doesn't go completely as expected, and then they're basically shutting down, because they're like, "Well, I don't want to do anything unless someone tells me exactly what to go do, because then it's not my fault, right? I don't want to be the one who's always blamed. I'm on the chopping block. I'm fearing for my career or my progression." You can't have environments like that, because your engagement and your productivity as a team will plummet, and people won't want to be on the team. It just creates a really terrible environment. So instead, I think it's important to not blame, and to make sure that when things don't go right,
it's like, cool, let's mitigate. Let's get back on track. And then, what can we all learn from it, right? And being vulnerable, too. It really takes people being able to go out of their way and say, "Hey, here's an example of me screwing up, right? I'm still here. I have learned from it. I am learning from it. I'm hoping that I can help others learn from it." That's how we all get better. But yeah, I think sometimes we forget the vulnerability part and we all kind of sit around going, "As long as I'm not the person to screw up, it's going to be okay." And, I don't know, I hope people can shift their perspective on that. I'm curious to hear from you folks in the comments. Are there any failures you want to share?
Anything that you learned from that? And if not, if maybe up until this point the answer is no, then maybe after doing some reflection you can say, "Hey, here is something I would do differently." I think that would be super cool to hear. So, that said, leave that kind of stuff in the comments if you've got it. Otherwise, if you've got software engineering or career questions, I'm happy to try my best to answer. And if you're not comfortable leaving public comments and you'd still like a video on the topic, go to codemute.com. I'm happy to try doing a video just like this on your topic. You'll be kept totally anonymous. Even if you include things like the company or names or whatever, I just won't say them, because it doesn't make sense to. Unless you specifically say that you want me to say your name or something like that, then that's fine.
But I'm not going to go out of my way to share details. The details help me understand the scenario, but there's no benefit to me going out of my way sharing that stuff. So I hope to try and help. But yeah, I think that's it for this video. So thanks, folks, for being here, and I hope to see you in the next one. Take care.