How My Trip To Hawaii Helped With Idempotent Consumers!

Is it still a commute if it's a one-off trip that isn't work-related? What if the purpose is vacation and the destination is Hawaii?

Maybe not.

But here I am to talk to you about some automatic Azure-scaling mishaps and how I needed to navigate an architectural fix while on vacation.

📄 Auto-Generated Transcript

Transcript is auto-generated and may contain errors.

All right, so this is a special edition of Code Commute, because I'm not actually commuting at all — I'm sitting here in Hawaii. You can probably hear some of the ocean. Wait for it... it's there. My wife has gone to bed; she usually goes to bed before me, so I figured I might as well get a little episode in, because it's kind of fun. I don't do this that often. She's the vacation planner and I'm, unfortunately, the workaholic — I'm always working and never really think about vacation, but fortunately she does that part for us. But I had some stuff going on that I figured would be good to talk about, because it's software engineering related.

I wanted to talk about Brand Ghost — pretty common for me to make that a focal point. One thing I wanted to start with, from a time management perspective: I do a lot of stuff on social media. Right now I'm doing this before I go to sleep, on vacation, and it's enjoyable for me. It's not really work; I didn't have to do this, and nothing is forcing me to. I'm doing it because I enjoy it, and I think finding stuff like that is important, even if it might look like work or a chore to other people. I'm just sitting here talking to a camera in Hawaii, and I'm not going to edit this or spend much time on it, so it's not a big deal.

In general, though, keeping up with social media and content posting is a pain in the butt, so I am thankful that I'm putting together Brand Ghost — it saves me. And spoiler alert (there's a point to me explaining all this, by the way): I'm going to make a post on Friday explaining that every single post people have been seeing from me across every social media platform — I can count them all up through Brand Ghost — has been scheduled completely automatically. Stuff that I've written, but put up through Brand Ghost, and I didn't have to do a single thing while on vacation. I got to sit in the sun — I have sleeves on right now, but I got a crazy burn on the right side of my body from being a dummy — and all of it just gets done automatically for me. That Friday post I'll also schedule through Brand Ghost, with our one-off posting feature, since most of Brand Ghost is topic streams of recurring content. So there you have it: I'm able to do all of my social media content while on vacation.

And to be transparent, commenting and engaging on stuff is important. You'll hear people talk about "post and ghost" — if you do that and never engage, your content will feel like it's dead. So you do need to be doing it. But what's cool is I can make a post, I have notifications on my phone, and if I've got a couple of minutes here and there I can go engage, instead of carving out X amount of time for each post like you'll hear really big creators talk about doing. That makes sense if it's your job; it's not my job and I don't want it to be, so Brand Ghost has really saved me there. But Brand Ghost was also a pain in the butt this week, and I want to explain why.

This is going to be the next newsletter article, because I think it's kind of interesting. (I feel like I'm talking like I'm drunk — I'm not intoxicated at all, just tired, I guess. Been in the sun a lot.) The idea I'm going to cover in the newsletter is how code becomes complex. We start off with things that are greenfield — Brand Ghost is something I wrote from scratch, so how could I be introducing all this complexity and tech debt? Why is the code difficult to navigate in certain parts even though it's so new? It's because of things like building in edge-case handling. You have a bug, you go "I have to make sure I can prevent this," and what ends up happening is that we don't truly solve root causes — we put in special-case handling. Then later on we solve a root cause and go, "Well, what's all this extra code in here for?" "Oh, there was this problem." Do you still need that code, or can you get rid of it? You don't really know. So I want to talk about that in the newsletter.

But this situation we had — I want to talk about it because I was trying to describe it to my wife, trying to separate things that feel frustrating from things that feel embarrassing for me. She's got a psychology background, and it's really interesting to analyze this stuff with her. I had a difficult time explaining which things felt frustrating versus embarrassing and why, but to me they each stuck out — they felt a particular way — and I said this example from today was embarrassing. Frustrating, yes, but there was more embarrassment associated with it.

What happened was that I had actually fixed an issue we encountered over the weekend where our scheduler was able to post multiple times. Some background: for me, that's a huge, huge problem. I have had open tickets with Zapier — an automation platform — for issues like this for 10 months. In some of their automations, when they make web requests, they have a retry feature that I do not have enabled, and when they get anything except a 200 back they automatically retry — but they don't explain it, and their tech support doesn't understand why. I know exactly why: they have retry code in there and it's turned on. Literally for 10 months I've had support tickets open and they've never addressed it. So over the weekend we had this situation where posts had been scheduled multiple times, and I'm going, "This is impossible — I have written the code such that it cannot happen." I'm going to explain how it was possible, why it's so frustrating, and then the embarrassing part.

The embarrassing part — I already touched on the psychology — is that there's an expectation of me, and I have it of myself, that I shouldn't have issues like this; the software shouldn't have issues like this. "I am better than that." It feels embarrassing because I feel like I'm letting others down, given the expectation they have of me and that I have of myself. It's frustrating because it's happening in the first place. I just wanted to touch on that.

So: I'm confident I have code that prevents this, but there's a critical assumption it makes, and I'll explain that in a second. When we do scheduling, we're essentially looking at something like a cron job. There are entries in a database that indicate the days and the times — weekdays, and times associated within those days — to make a post for a particular topic stream. It's not rocket surgery. A background service wakes up, periodically checks "hey, do I need to schedule something?", and it works. It's fine. It's tested; it's got good functional tests that tweak a lot of variables, including edge cases that shouldn't even be possible in production but that it would handle accordingly if they happened. I'd written those and felt very confident about them. So over the weekend, when someone said "hey, this is happening," I'm going, "No — something's up here," because I was so confident. But they weren't wrong: I had screenshots from them.

I'm shaking my head going, "Okay, what am I missing here?" So I went looking through logs in Azure, and I could see from our service: "Yep, I'm scheduling this thing," and then, right after, "I'm scheduling this thing" again. And I'm going, "No — I know this can't happen." Then it dawned on me: I'm not wrong, it can't happen — but there's an assumption. The assumption is that there's one instance running, and the reality is there was more than one instance running.

This happened before, in a different way. Back then, within a single instance, the dependency injection container had started two background services — hosted services, as ASP.NET Core calls them — in the same process. One process spawned two hosted services internally, and that's originally why I put this protection in. But it doesn't work cross-process. And I didn't realize — I had no idea — that we had automatic scaling, because we never set it up. It just does it. This entire time I was operating under the assumption that we don't have automatic scaling and that, currently (I know this isn't a long-term thing), we only ever have one instance of our backend running. One, ever. I wrote the scheduler such that that's the assumption, but it's not the reality. So the way I fixed this over the weekend: I went "oh crap," looked at the Azure settings, saw the max number of replicas it could have was three or four or something, and said no,

the max you can have is one — because I had proven at that point that there were multiple replicas running at the same time, and like I said, I had written the code in a way that assumed that was impossible. So that's the first part.

Now, this morning I woke up and we had messages saying, hey, not only did posts not go out — which is not good — but there were also situations where things had posted up to four times, which is nuts. My personal opinion: if this stuff is going to fail, I would much rather have a post not go out than have it cross-post — in some cases across 10 social media platforms. If you triple-post a video across 10 social media platforms: number one, that looks ridiculously embarrassing as a user; number two, it's a pain in the ass to go delete all of that off your profiles — you might as well have just done it manually in the first place. If it doesn't post at all? Okay, whatever, go post it; it's not the end of the world. But it's significantly more negatively impactful when you multi-post the same thing. Even if it's different content, it's still pretty gross because it's spammy, but when it's the exact same thing — when I see it happen (because this was happening with Zapier), I feel sick to my stomach. It feels so embarrassing to have a social profile blasting out the same thing three times. So these screenshots are

coming in, and I'm going, "No way, man — what else could it possibly be? I just addressed this." So I'm looking at everything, and then going, okay, I have to go back to what I checked before. I know this pattern when I look at the logs and see things at literally the same timestamps. To explain: within a given process, the way I have this code set up, it literally checks a timed cache. There's a least-recently-used eviction strategy with a time on it, and if it sees you're trying to post the same topic stream within a given period, it will just stop — it logs it, but it stops it from happening. So when it happens anyway, I know it's impossible — unless there are multiple processes running.
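
To make that in-process guard concrete, here's a minimal sketch of the idea in Python (the real Brand Ghost code is C#; the class and method names here are mine, not the actual implementation):

```python
import time

class TimedDedupeCache:
    """In-process duplicate guard: remembers keys for a fixed window.

    Note this only protects within ONE process — each replica gets its
    own cache, which is exactly the assumption that broke.
    """

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._seen = {}  # key -> timestamp when it was last accepted

    def try_acquire(self, key) -> bool:
        """Return True if `key` hasn't been seen within the TTL window."""
        now = self.clock()
        # Evict expired entries so the cache doesn't grow without bound.
        self._seen = {k: t for k, t in self._seen.items() if now - t < self.ttl}
        if key in self._seen:
            return False  # duplicate within the window: caller logs and skips
        self._seen[key] = now
        return True
```

A caller would do something like `guard.try_acquire(("stream-42", "2024-06-01T10:00"))` before posting: the first call in a window wins, repeats get blocked.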

So I went back to the replication settings, and now it was allowed to scale up to 10. I'm going, "This is impossible." I messaged the other guys and said, "Guys, I don't know who's touching this, but we cannot have more than a single instance. It was never designed that way; no one planned for it yet. We know scaling is a future thing, but we don't need to scale up right now." Looking closer, what I had failed to realize in Azure is that there's a default HTTP scale rule, and I misunderstood it, I guess. You have to have at least one rule there — you can't get rid of it. You can change it, but you can't get rid of it.

And I thought, when I was changing the settings — you can set the min and max number of replicas — I thought that screen was saying, "We will automatically add more replicas, up to your max, if this rule triggers." So if I set the max to one, the max should be one. But I think what's actually happening with this rule is: it hits the HTTP concurrency limit — I think it's only 10 — and goes, "No problem, we'll scale you up," regardless of the limit I set. (A bug just flew into my nose.) It had put it up to 10, because no one else on the team touched it. So I changed the HTTP scale rule to some absurd number of simultaneous requests before it scales us up. That's a total hack, by the way — obviously.

But the problem is, I'm in Hawaii. I don't want to be coding. I told the guys working on this with me: if I'm here coding in Hawaii to solve this, this will be the last vacation I take — it's completely unfair to my wife. That said, in times like this I actually have downtime. I don't even know what time it is right now — it doesn't matter, it's Hawaii time — and I know I have a little time before going to sleep, so I could do this. I'll get to what I was coding in a bit, because I did have time to code.
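
For reference, the two changes described — capping replicas and raising the default HTTP rule's concurrency trigger — can both be made through the Azure CLI. This is a hedged sketch assuming an Azure Container Apps setup; the app and resource-group names are placeholders, and I'm inferring the exact configuration from the description above, not from the actual deployment:

```shell
# Pin the container app to a single replica AND raise the built-in HTTP
# scale rule's concurrency threshold so it won't trigger in practice.
# (Names below are placeholders, not the real resources.)
az containerapp update \
  --name brandghost-backend \
  --resource-group brandghost-rg \
  --min-replicas 1 \
  --max-replicas 1 \
  --scale-rule-name http-rule \
  --scale-rule-type http \
  --scale-rule-http-concurrency 1000
```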

But the point is, if I had to find some other way to work around this, I can't be watching logs or dealing with this kind of stuff while I'm on vacation. And I'm not blaming them, for the record — I'm just letting them know we need to come up with workarounds for while I'm gone. The first one, like I mentioned, was changing the scale rules in Azure. I thought I had done that over the weekend; apparently not the way it needed to be done. I've checked multiple times today and the settings are still intact — it hasn't gone up to 10 again, so that's good.

The second part: the only other time we can have weird instance issues is when we're deploying. When we deploy, the instance — now it's capped at one — has to come down, which is fine; that's what I thought was happening the entire time we've been working on Brand Ghost, and it turned out not to be the case. The instance comes down when we deploy, no worries. But our front-end code — which is what they're working on while I'm gone — had its build and deploy coupled to our back-end build and deploy. So I said we need to split this, or else you guys are going to be stuck, unable to iterate. They were like, "Oh, it's okay, we'll just develop locally and wait to push," and I'm like, "No, that's not fair." So we split our build. That's what they did today, which is awesome. This is why working with these guys is so great: we put our heads together.

They know I'm gone — they were gone last week, I'm gone this week — and they just tackled it. They're not guys who love working on build stuff (I don't know many people who do; I know a few), but they split our front end from our back end in terms of build and got it all working. Totally awesome. Now they can iterate — they can push anything they want for front-end code and it's not going to disrupt the back end. And not only did we restore our agility to what we had before this weird issue started happening more frequently, they actually improved on it, because before, it would build and deploy both; if they're just making front-end changes, it's now a fraction of the time. Not that it was bad — it was like 5 minutes before, now it'll be 2 minutes or something.

So, super awesome — but we're still working around issues, right? We've got our agility back for front-end development, and we have a solution to prevent the scaling. But if you think about it, we actually want that scaling to happen. In an ideal world we should be able to scale up, but that's an architectural decision I didn't want to jump into right away, because of complexity. Fortunately, I designed stuff pretty modularly. I haven't yet talked to the guy I really want to spend time with to architect a queuing system — he's very good at it.

But I want to walk through what I'm envisioning, and then the shortcut I took today after realizing some fun stuff.

To start things off, I already talked about how this scheduler works. It's like a cron job: a background service kicks off periodically and looks at "hey, do I need to schedule things from the last period?" Then it adds those to a queue — but it's not cross-process; it's not a distributed queue, because it didn't need to be, because there was only ever supposed to be one instance. Then there's a processor — another background service, a hosted service — that operates on that queue: for every topic stream that's been scheduled, it goes and processes it to make the post, one at a time. That's been working really well until this multi-process thing started coming up. But like I'm saying, this was going to happen at some point anyway — planned, not erroneously. We'd need to scale up this way, because when we have enough users, we'll want to queue everything up and go post it; we will need to horizontally scale. Totally makes sense.

There are a couple of challenges with this. The first is that we have a publisher and a consumer on either side of this in-memory queue — and if we think about it, that in-memory queue is probably something we want to be distributed.
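
The scheduler/processor shape described above can be sketched minimally like so — two "hosted services" sharing a queue. Python here, illustrative only (the real system is ASP.NET Core background services); note the queue is in-memory, which is exactly why none of this survives a second process — each replica gets its own copy:

```python
import queue
import threading

work: "queue.Queue[str]" = queue.Queue()  # the in-memory queue between the two services
posted: list[str] = []

def schedule_due(due_topic_streams: list[str]) -> None:
    """One pass of the scheduler service: enqueue everything due this period."""
    for stream_id in due_topic_streams:
        work.put(stream_id)

def process(count: int) -> None:
    """The processor service: takes scheduled streams off the queue one at a time."""
    for _ in range(count):
        stream_id = work.get()
        posted.append(stream_id)  # stand-in for actually making the social post
        work.task_done()
```

Running `process` on its own thread mirrors the "second hosted service" in the same process — and makes it obvious that a second *process* would have its own `work` queue, with no shared view at all.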

That part's easy, because there are a million queue technologies out there. We're on Azure, so it makes sense to use Azure Service Bus, and we like MassTransit as an API, so we'll probably have MassTransit on top of Azure Service Bus. So far so good.

What probably needs to happen — and this is my thought process without having talked to the other guy who'll likely architect all of it — is that for the hosted service looking at the schedule, we won't cap the number of instances. We don't want it to be just one; it could be any number, and ideally at least two. Because if it's just one and that process goes down — say, for a deployment — during a window when it should be checking, how do we make sure we don't miss a period when a post is supposed to go out? So we either build better tracking — enough robustness to make sure we look at the right interval — or we run two of them with mutual exclusion over their operation. They can kind of round-robin, or one can just run until the other dies off, but the point is we always have a backup ready to go: as soon as one isn't running, the other steps in. On the scheduling part it's a redundancy thing, not processing throughput. And I have to be careful when I say "scheduling" — I mean the thing that looks at the schedule to activate a schedule.

Okay. So when that happens, we need to queue up these topic streams to be posted. But I want to make sure we have protection if, for some reason, the concurrency limits of this scheduler get broken. Say we have two running with mutual exclusion, so only one ever goes — what if that mutual exclusion breaks? Say we had up to five; now we have five things looking at the time intervals and sending up duplicate topic streams to be scheduled. So what I wanted to put in is duplicate detection, and Azure Service Bus has this — MassTransit exposes it via the message ID when you're publishing a message. I can basically de-dupe on an ID plus the timestamp it's supposed to be scheduled for, and that gives it uniqueness. So we can use the queue technology to try to guarantee uniqueness for what's being published. But that's insufficient to guarantee uniqueness — de-duplication — throughout the whole system. We need idempotent consumers (and I hate saying that word because it's super awkward). The idea is that you could blast any number of things into the queue and still be safe.
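
The "de-dupe on an ID plus the scheduled timestamp" idea boils down to deriving the broker message ID deterministically, so that any number of rogue schedulers publishing "the same" message all produce the same ID, and a broker with duplicate detection enabled (like Azure Service Bus) discards the repeats. A sketch — the function name and key format here are illustrative, not Brand Ghost's actual code:

```python
import hashlib
import uuid

def dedupe_message_id(topic_stream_id: str, scheduled_for: str) -> str:
    """Derive a stable, UUID-shaped message ID from stream + scheduled slot.

    Same inputs always yield the same ID, so duplicate publishes collide
    at the broker instead of fanning out into duplicate posts.
    """
    raw = f"{topic_stream_id}|{scheduled_for}".encode("utf-8")
    digest = hashlib.sha256(raw).digest()[:16]  # 16 bytes -> valid UUID
    return str(uuid.UUID(bytes=digest))
```

In MassTransit terms this would be set as the message's `MessageId` at publish time; the broker's duplicate-detection window then handles the rest.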

If you're not familiar with what "idempotent" means: if you take an action, there's no side effect of that action that compounds when you repeat it. For example, if you called a method that adds one to a variable, calling it a second time puts the number up by two, a third time up by three — that is not idempotent; it has a compounding side effect on subsequent calls. But if you called delete on something and it deleted it, and then you called delete again and it's already deleted, so it just does nothing — that would be idempotent. You can get away with calling it multiple times. So we need our consumers of the queue to be idempotent. That way, all the other stuff upstream could totally get bricked — go haywire, say "schedule a million things a second" — and we could have N consumers reading off the queue going, "Nope, already seen that" — or even better, "Someone else has already handled this one."
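
The two examples above, written out as code — the increment compounds with every call (not idempotent), while deleting an already-absent key is a harmless no-op on repeat calls (idempotent):

```python
counter = {"value": 0}

def increment() -> None:
    # NOT idempotent: every extra call changes the final state again.
    counter["value"] += 1

records = {"post-1": "draft"}

def delete_record(record_id: str) -> None:
    # Idempotent: second and third calls leave the state exactly as-is.
    records.pop(record_id, None)  # no error if already gone
```

That's the property the queue consumers need: however many duplicate messages arrive, the end state is the same as if exactly one had.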

I drew all this out in a diagram — I don't have it here with me, but it's on my computer — so that when the other guy is back from vacation (he's the one I'm hoping can architect the queue stuff), I have it ready to talk through with him. But then I had this realization. Between everything we were doing today, I kept thinking: how am I going to solve this problem? What can I do? And every solution I came up with was "you've got to rewrite this scheduler," or "you're going to have to wait to have a conversation with this guy to re-architect it for queues." And something was bothering me. To be clear, the scaling issue is addressed so far — unless Azure goes nuts again — so there is no issue currently. It's just that I want to address the code in a way that works around this regardless of whatever Azure decides to do. So we realized: we want everything I just talked about — queuing tech in the middle so we can be distributed — but the part that's missing, that we'll have to build anyway, is the idempotent consumer part. And it's actually a very simple pattern to implement.

I love shouting out CodeOpinion, because Derek Comartin is so awesome. It's funny — I was like, okay, I don't want to reinvent the wheel here.

I'm just going to search online — surely someone's built something I can use out of the box for this. This is one of those things I don't want to reinvent. I love reinventing the wheel to learn about stuff, but not this: not for consumers, not when people are paying money, not for what I want to ship. So I search, and of course CodeOpinion — Derek Comartin — has an article right up at the top, and I'm like, perfect. I know Derek knows a ton about this stuff, and he explains things very clearly. And the pattern is super simple: you're effectively utilizing database locks and transactions. The idea is that you insert a record into the database that's the marker for your queue item — however you want to identify it, a unique identifier, whatever. If that has a unique key constraint, then when you enter your method and try to say, "Hey, let me mark that my method is going to take care of this," it won't be able to do it if someone already has, because the unique key constraint fails. And databases generally have locking on the table, so inside a transaction you say, "I want to insert this record into the table," and then you can carry out whatever actions you want to do.

But when you go to commit that transaction, if someone else already came through and finished before you, you don't actually get to own it: the commit throws a duplicate-key error, you roll back the transaction, and you leave your method. It's like having a guard clause at the beginning of your method that helps guarantee mutual exclusion. The pattern was so simple, in fact, that when I was looking at Derek's article I was like, "I don't understand at all how this guarantees anything" — until I realized that the transaction combined with the unique key constraint is the piece of the puzzle. Then I kept reading.
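
Here's a minimal sketch of that idempotent-consumer pattern (after Derek Comartin's write-up), using SQLite for illustration — the real implementation is C# against a different database, and the table and function names here are mine: insert a marker row keyed by the message ID in the same transaction as the work, and let the primary-key constraint reject duplicates:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_messages (message_id TEXT PRIMARY KEY)")

def handle_once(message_id: str, do_work) -> bool:
    """Run do_work at most once per message_id; return False for duplicates."""
    try:
        with conn:  # transaction: commits on success, rolls back on error
            conn.execute(
                "INSERT INTO processed_messages (message_id) VALUES (?)",
                (message_id,),
            )
            do_work()  # the actual posting happens inside the transaction
        return True
    except sqlite3.IntegrityError:
        return False  # another consumer already owns this message: bail out
```

The constraint violation is the "guard clause": whichever consumer inserts the marker first wins; everyone else rolls back and skips the work.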

The author of MassTransit — I'm drawing a blank on his name; I think it's Chris — has a really funny GitHub handle, "phatboyg" or something like that. Anyway, I saw him responding to someone (on a Stack Overflow post, I believe), saying you basically need an idempotent consumer, following the exact same pattern Derek explained. So I'm like, you know what, it's simple enough to do. (A black cat just ran by — supposed to be bad luck or something. And there are all these little lizards.) Anyway, it was enough for me to go: look, that's simple enough. I just need a new table — let me try it, and I can try it all locally in testing infrastructure. So I coded it up before we went to dinner.

I added the new code in and called it from the spot where it needed to be called — it's just one spot. I ran the tests, nothing broke: good sign. Then I added tests for all the new code, and I did that right before I got on here. The part that's still missing is testing it more in isolation. I have an end-to-end test that exercises this, but it doesn't exercise end to end what happens when I spam stuff, so I want to cover that in a more targeted way — just a different granularity of tests. Like I said, I have an end-to-end one that's kind of over this queue lock, this new database table I'm adding. But now I want to make sure the part that uses it, uses it properly. The existing tests aren't breaking, but that doesn't prove it handles the scenario I need handled. So that'll probably be tomorrow, when there's some downtime — maybe at night.

And then I'll write about how your codebases can really get messy over time just from trying to handle stuff like this. Because when I go to add the code I'm talking about, and then build out the queuing thing, there's probably a lot of code we'll want to consciously delete — not just go, "Oh, I put that in to handle this scenario." That scenario might not even be physically possible anymore, so we should talk about it. If it is still possible, sure, leave it — but I'm confident there's going to be a lot that's just, "Dude, you don't need that." The thing I talked about, the least-recently-used timed cache checking whether a topic stream was already added? I can already tell you it doesn't need to exist once we do the queuing part, especially because it only works within a single process.

So anyway, I think that's it from me. I'm sorry you can't see the beach and stuff — it's too dark, but it is Hawaii. Let's see, one sec... yeah, you probably can't see much of anything. There are big bugs landing on me now, so I've got to get inside. I will catch you guys later — take care!

Frequently Asked Questions

These Q&A summaries are AI-generated from the video transcript and may not reflect my exact wording. Watch the video for the full context.

How does Brand Ghost help with managing social media posts while on vacation?
I use Brand Ghost to completely automate all of my social media posts across every platform. While on vacation, I didn't have to do a single thing because all posts were scheduled automatically through Brand Ghost, allowing me to relax and enjoy my time without worrying about content posting.
What caused the issue of multiple posts being scheduled simultaneously in Brand Ghost?
The issue was caused by an incorrect assumption that only one instance of our backend was running. In reality, Azure was automatically scaling up to multiple instances, which broke the code's protection against duplicate scheduling because it was designed to work with a single instance only.
What is the solution to handle scaling and ensure idempotent consumers in the queuing system?
I plan to implement a distributed queue using Azure Service Bus with MassTransit and ensure the consumers are idempotent. This means consumers can safely process messages multiple times without side effects, using database locks and unique key constraints within transactions to guarantee mutual exclusion and prevent duplicate processing.