SH*T HIT THE FAN! - How Azure Almost Ended My Hawaiian Vacation

Disaster struck Friday night when I was writing my @DevLeader newsletter.

I received an email from Azure that said our subscription had been disabled.

With 6 hours of time zone difference between me and the team, how would we get this back online and save our @BrandGhostAI customers from impact?

📄 Auto-Generated Transcript

Transcript is auto-generated and may contain errors.

hey folks it's Saturday night this is the last Saturday here in Hawaii we have a full day tomorrow and then we're uh we're headed home Monday morning early afternoon or something on Monday but full day of traveling on Monday um made another one of these videos couple days ago um talking about some interesting kind of challenges that came up with BrandGhost and I wanted to to talk through another one and uh I think the the reason I want to make this is so that I can kind of bring visibility into I don't know like uh building stuff outside of work right I think um I think a lot of people well I shouldn't say a lot I don't know I don't have stats on it but I assume many people are eager to have something they can build on the side something they can

productize right something they can monetize and uh that comes at a cost of course so uh it's because there's a lot of work involved and I I feel like I've done a really good job this vacation not working on my 9-to-5 specifically um and that's good you know I made arrangements with my team ahead of time and stuff and uh I'll have a lot of catch up and stuff to do when I'm back but that's okay but with BrandGhost uh we had the hiccup earlier in the week basically like as soon as I got to Hawaii um realized that our auto scaling stuff that we don't want on had turned back on um which I mean easy fix once you know what's happening like turn it back off but that kind of led us to realize like obviously auto scaling is

something we want we want to make sure that we can run multiple instances at some point but the scheduler that we use the scheduler itself is not intended to be multi-instance it was never designed that way never needed to be um until later on when we want to go have support for it we can but this kind of made us realize like should we be thinking about this sooner rather than later but obviously while I'm on vacation I'm not going to be coding up a whole new sort of uh queuing system for scheduling so anyway um I did have some downtime this week between like you know my wife's getting ready to do stuff or she wants to take a break before we go back out explore things or head to the beach uh so I had my head down trying to figure some stuff out

I talked about this in the last video but uh essentially uh the focus for us was kind of two two things one is that I was going to write logic for idempotent consumers um this way if we have anything upstream going hectic so if we have auto scaling happen again our scheduler is firing multiple things the consumers will basically handle erratic behavior and we don't have to worry which is super cool um we're going to need this in the longer term solution anyway and I was able to chat with one of the guys back home um kind of put the problem space in front of him he's really talented with working uh with with systems like this that have queuing and um and such involved so when he gets some time he's going to have a look at it but basically I I wrote a solution

for this um I had enough time to go write it uh which was pretty simple um go write a whole bunch of tests but it was cool because I put it in place ran all the tests nothing broke and then um and then added tests specifically around this behavior so um that's one thing uh but we don't want to go push that live I still want to make sure that I can be at home and and kind of do it safely so the problem that exists in the current state is like with the auto scaling and stuff off if uh if anyone wants to go push frontend code the frontend code is forcing frontend and backend builds and deploys all coupled together and that's something we don't want to have happen so if someone's trying to iterate on the front end and they push up

something to production what will happen is it will take down the back end as well and we only have one instance now because there's no scaling so we don't want to we want to basically schedule any downtime on the back end intelligently until we have the scaling back in place so kind of like these are just real real world things we're trying to work around so uh the guys back home split the front end build and deploy away from the back end which is great then they get their agility back um and I talked about this I think last video where I said technically that gives them more agility than they've ever had in the front end because it will just go faster overall so great okay so ran into a really big issue last night um it was about 10:00 here in Hawaii and uh

I sat down to write my newsletter as I like to do um give myself you know about two hours to work through it I've got it pretty optimized now where I have a template and everything but um I just have to write the article so as long as I know what I'm talking about it's not it's not bad I give myself two hours so it's less than two hours to write it and then say like another 30 minutes to kind of um cross post it and schedule and stuff so anyway at 10:00 I guess it would have been my timing's off would have been 9:00 here sorry my apologies would have been 9:00 here because as soon as it was midnight PST which would have been 9:00 here um I got an email and it said uh that the Azure subscription had been disabled so

I said pardon me and I tried hitting our web API and it gave me a not found uh could not connect to our database could not uh connect to the front end so our entire Azure subscription got disabled at 9:00 last night and uh so the first thing going through my head is panic right so what had happened was it was due to billing and the credit card that's on file uh so to explain a little bit more I have some I have credits and then I have a credit card on file we had just touched over the credits and that meant that it had to go to the credit card problem was the credit card just expired now this credit card I need to offer a little bit more background here I'm originally from Canada and that's a Canadian credit card on file and

the reason it's a Canadian credit card on file is because an Azure subscription was created in Canada it will only take a Canadian credit card now you might be able to see where this is going but I don't live in Canada anymore and I have no more Canadian credit cards that I actively use and all of my Canadian credit cards that still exist have a US address attached to them so I have this credit card that expires and I go great let me go add another credit card and I go to add my business credit card onto there which should have been there in the first place will not work because it's not in Canada it's US-based and then I go oh no because if I go to add any of my Canadian credit cards they still have a US address so I'm basically in this

position where our Azure subscription's disabled and the only way to turn it back on is with a Canadian credit card which I do not have any credit cards with a Canadian address uh-oh so if we think about the timing here the team is uh 6 hours off from me uh where I am in Hawaii so usually three hours difference now at 6 so 9:00 it's 3:00 a.m. for them so 3:00 a.m. for them which means every single other person on the team is not accessible I'm the only person awake and I'm the only person without a Canadian credit card so I try all my options there and then I say okay well I have to contact let me open up an Azure support ticket so I do that and I see that it's marked as a severity 3 I think is how they classify it

sorry if these waves are super loud it's pretty crazy the water's like 30 ft behind me um so it's like a severity C severity C means that they will uh address it during business hours this is a Friday at midnight and my entire business is stopped so that's not okay so I get on to Twitter and I engage with Azure support and I say hi big problem I need help uh and they weren't super helpful in that they couldn't do anything but I think that they did manage to get it escalated because I I did get a response on my uh support ticket which was ultimately not helpful and they just said hey uh like there's nothing that we can do like you have a subscription created in Canada you need a Canadian credit card the advice was ask a friend or a family member

for a credit card so I'm sitting on the couch in the condo here and I had just started my newsletter right and I started going through all of this stuff and I'm sitting there and I'm like I'm so frustrated that I'm almost in tears because there's absolutely nothing I can do um at all right uh I was online trying to see if I could find Azure credits somehow like I need to do something to get this account unblocked and it felt uh felt pretty hopeless and like I said I was kind of just sitting there like holding my head and I'm like I I have I have no idea what to do and I wanted to share this because it's like because it's real right um we have paying customers and I'm sitting there going the solution's simple I need a credit card added

to the file like there's no there's no uh there's no technical challenge that can't be solved here I'm just isolated and don't I'm completely unable to make the change that we need um so yeah I ended up uh finishing my newsletter and stuff uh I had I contacted our customers and said basically hi we're experiencing an outage right now um which you know so was crappy to write um I don't have a better way to explain that it was like it felt terrible um it's one of those things where it'd be like I would love to just be quiet and hope that everything comes back online but I'm like our everything is turned off and I would I would just much rather be proactive and let people know we're looking at it um but yeah embarrassing frustrated I felt helpless um like I'm on

I'm on vacation I'm on vacation and I'm like about to cry because because this whole thing is like collapsing around me and uh I got I got lucky though um so it would have been about what time would that be was 1:30 in the morning here so or 1:00 in the morning something I don't know it was like 7:00 a.m. or something on Saturday for back on East Coast or Eastern time and one of the other uh engineers was up and uh you know again like I've talked about this before but like I'm I'm working with like a dream team in my opinion um it's uh it's amazing so you know didn't even ask questions or anything it was immediately screenshot of them trying to enter their credit card information to get the subscription back online like that's what what I mean like by

like by dream team was just like you know had read through everything I was posting into chat didn't ask questions and was immediately just like I'm trying to get my credit card information into here so it's kind of funny it's uh uh how how things got solved here so the only person on the subscription that can actually update the credit card information is me so if I would have gone to bed like left them all this information gone to bed they couldn't have fixed it on their own so I had to get personal credit card information sent over to me so I could go add it while that was happening I was like oh like let me just change the role permissions and stuff couldn't do it I couldn't give this other engineer more like higher level privileges to try and update the credit card

because the whole subscription is disabled and by trying to change the user settings I need to use the subscription so it's like a crazy chicken and the egg problem um so anyway I add the credit card information subscription becomes active uh turns out even when I give them the proper role and stuff they don't have access to change the billing stuff which is ridiculous I don't know something's wrong there um so then the subscription comes back online and we're sitting there going what happens now um so for those of you that don't know um because I had never had to go through this before uh the MySQL database does come back online automatically which is great so I just had to wait there and it was kind of spinning waiting to come up no data loss nothing right um worked perfectly so the very first thing

that I was able to have happen was that I could like locally connect out to the database that's running in the cloud to prove it's working run queries against the database everything's great good that's critical because that's where user information is saved and there are backups um there's a a running uh set of backups so we always have something to go back to if we need to but it was nice that we didn't have to worry about any data loss um but then we have the situation where our our container apps aren't running and I'm going okay well the database came back online automatically like what happens with the container apps so uh and the reason why this is important is because if they come back automatically and I start setting them up manually I don't want to run into this multi-instance bull crap again right

that was the whole reason we had issues earlier in the week uh I told you I haven't pushed up the code changes for the uh idempotent consumer so um yeah it's just like what do we do uh turns out uh Azure support on Twitter said oh don't worry everything comes back within 24 hours you don't have to touch anything but I'm like I'm not sitting here for 24 hours it's like 1:30 in the morning already like I'm not sitting here and waiting so uh we had to go recreate the replicas and stuff um I had to just you know triple check that I set the scaling stuff and everything properly we start up the replicas was funny as soon as I started up the uh the replica for the back end um immediately the old one I think this is like a UI

glitch or something but it showed the old one also started up at the same time so it was like as soon as I pressed start on mine the other one also said that it was starting up and I'm like no give me a break man it's not going to happen sure enough after like I don't know a couple minutes it uh I'm I'm pretty sure it was a UI bug because we only had the one backend instance running uh logs prove it which is good the front end instance was back up um and we were fully recovered so all of that like once we had the subscription back was probably like 15 to 20 minutes um so that was kind of like a good little practice for disaster recovery but man um I haven't felt so terrible in a long time uh it was bad

and yeah I I just like I said I wanted to share with you cuz it's like uh I haven't felt so helpless in I couldn't even tell you like it was enough like oh I'm like what I'm going to be how old am I now I'm 35 I'm going to be 36 in a few months like in April I'll be 36 and uh so a 36-year-old man sitting on a couch at you know uh whatever at the time around midnight or something around close to that and being like I just want to cry I have I have no other options I feel so frustrated um yeah I felt bad so um obviously lesson there is make sure that your billing information is up to date I I'm completely dumbfounded like how how complicated it is to try and give someone money like I don't care

where the subscription is charge my credit card it's I'm sure there's a good reason for it but like I I don't know what it is and if I'm trying to throw money at you like catch it so yeah was bad but anyway fully recovered which is good um no none of the users said anything about posts missing and all that uh and I think we were only down for what that was like four and a half hours so we do have people that are overseas in um in uh in Europe in India so if they had stuff that was scheduled kind of like for more closer to their time then perhaps some stuff got missed I didn't I didn't do a check um mainly because I don't think we have if there was stuff that was missed we're talking about like a couple of posts not

like not like we had like a thousand users that are going to be in that window of time that would have got screwed over um but yeah and then you know since then like everything everything that I needed to post today was going up everything's kind of running totally fine so no more weird scaling issues but yeah we had uh set up some billing alerts and stuff like that and uh just trying to move forward but not a good experience um yeah anyway I wanted to share that like I said because I think it was unique maybe some some insight into kind of what happens behind the scenes when you're trying to run your own stuff but um it's uh probably it for me I'm going to head to bed and we'll have one big beach day tomorrow before we head out so looking forward

to that but thanks for tuning in and I'll see you next time
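
The transcript mentions writing idempotent consumers so that duplicate jobs fired by more than one scheduler instance do no harm. The actual BrandGhost code is not shown in the video, so the following is only a minimal sketch of the general idea. `PublishJob`, `IdempotentConsumer`, and the key format are hypothetical names chosen for illustration, and the in-memory set only works inside a single process; with multiple instances the "seen" record would need to live in shared storage with an atomic insert.

```python
from dataclasses import dataclass, field
from typing import Callable, Set


@dataclass(frozen=True)
class PublishJob:
    """Hypothetical message shape: one scheduled post to publish."""
    post_id: str
    scheduled_for: str  # ISO timestamp of the slot the post was scheduled for

    @property
    def idempotency_key(self) -> str:
        # The same post in the same slot yields the same key,
        # no matter how many times the scheduler fires it.
        return f"{self.post_id}:{self.scheduled_for}"


@dataclass
class IdempotentConsumer:
    """Runs the handler at most once per idempotency key (single process only)."""
    handler: Callable[[PublishJob], None]
    _seen: Set[str] = field(default_factory=set)

    def consume(self, job: PublishJob) -> bool:
        key = job.idempotency_key
        if key in self._seen:
            return False  # duplicate delivery: skip it
        # Record the key only after the handler succeeds,
        # so a job whose handler raised can still be retried.
        self.handler(job)
        self._seen.add(key)
        return True


published = []
consumer = IdempotentConsumer(handler=lambda job: published.append(job.post_id))
job = PublishJob("post-1", "2024-01-01T09:00:00Z")
first = consumer.consume(job)   # handled
second = consumer.consume(job)  # same job fired again, e.g. by a second scheduler instance
```

In this sketch the second delivery is skipped, so the post is published once even though the job arrived twice.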

Frequently Asked Questions

These Q&A summaries are AI-generated from the video transcript and may not reflect my exact wording. Watch the video for the full context.

What caused my Azure subscription to be disabled during my vacation?
My Azure subscription was disabled because the credit card on file expired. Since the subscription was created in Canada, it only accepts Canadian credit cards, and I no longer have a valid Canadian credit card with a Canadian address, which caused the billing issue and subsequent disabling of the subscription.
How did I resolve the Azure subscription billing issue while on vacation?
I contacted Azure support and tried to escalate the issue, but the only solution was to add a valid Canadian credit card. Since I was the only person with permission to update the credit card and I didn't have a Canadian card, I had to get a team member to send me their Canadian credit card information so I could add it and reactivate the subscription.
What were the technical challenges faced after the Azure subscription was reactivated?
After reactivating the subscription, the MySQL database came back online automatically without data loss, which was great. However, the container apps did not restart automatically, so I had to manually recreate the replicas and ensure scaling was set up properly to avoid the multi-instance issues we had encountered earlier in the week.
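
The billing lesson from the video (credits running out just as the card on file expired) can be turned into a small periodic check. This is a hedged sketch, not anything Azure provides: `billing_warnings`, its parameters, and the 60-day threshold are hypothetical, the numbers would have to be entered by hand or pulled from your own records, and it assumes a card stays valid through the last day of its expiry month.

```python
import calendar
from datetime import date
from typing import List


def card_last_valid_day(exp_year: int, exp_month: int) -> date:
    """Assumption: a card is usable through the last day of its expiry month."""
    last_day = calendar.monthrange(exp_year, exp_month)[1]
    return date(exp_year, exp_month, last_day)


def billing_warnings(
    today: date,
    exp_year: int,
    exp_month: int,
    credits_left: float,
    monthly_spend: float,
    warn_days: int = 60,  # hypothetical threshold, tune to taste
) -> List[str]:
    """Return human-readable warnings about the payment method on file."""
    warnings = []
    days_left = (card_last_valid_day(exp_year, exp_month) - today).days
    if days_left < 0:
        warnings.append("card on file has expired")
    elif days_left <= warn_days:
        warnings.append(f"card on file expires in {days_left} days")
    # Once credits run out, charges fall through to the card on file.
    if monthly_spend > 0 and credits_left < monthly_spend:
        warnings.append("credits cover less than one month of spend")
    return warnings


example = billing_warnings(date(2024, 1, 15), 2024, 1, credits_left=10.0, monthly_spend=100.0)
```

Running something like this on a schedule, alongside the billing alerts mentioned in the video, would flag the expiring card and the shrinking credits before the two coincide.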