177 points by MrBuddyCasino 5 days ago | 21 comments
JohnMakin 4 days ago
https://octopus.com/devops/reading-list/
Google's SRE book changed my career. I know it's a little out of date now, but it's well worth reading for the concepts involved, IMO.
This though:
> With DevOps, if you can automate it, you should automate it.
Everyone at some level in this field understands this. I would even go further and say that almost everything is automatable depending on how much effort you're willing to put into it. However, lots of bad or overwhelmed devops shops I've consulted for seem to be stuck in this insane hell-loop where manual processes never leave them the "time" or "priority" to automate some of those processes and get off the treadmill. Usually it takes a fair amount of heroics to get out of that, but I have a specific approach to such situations that I've been using successfully for a few years now.
It's always important to remember that "devops" is a completely loaded term that can mean drastically different things depending on the organization.
bittermandel 4 days ago
I couldn't agree less with this. At this point the whole "DevOps" industry is fueled by consultancies who make a great living from convincing business leaders that this is true. Focusing on defining clear processes for recurring events and building the fundamental building blocks that allow you to automate when it's absolutely needed should be the method, not spending more time writing Terraform.
JohnMakin 4 days ago
For one, I don't really consider terraform "automation," and more IaC, but I digress - this is all well and good in mature organizations with robust processes and aligned leadership. In practice, however, what I find most often is very small "devops" shops in companies that aren't necessarily "tech": 50-300 people, with a devops team of 3-5 people (if they're lucky) that the organization, or sometimes even the team itself, sees as glorified IT sysadmins. They're always seen as an expense and are usually critically understaffed. If you leave teams like this to decide on their own what is "necessary" to automate, you're going to get weird/misaligned/dysfunctional results, and even more so if you let the business decide, which is what usually happens. The business doesn't really give a crap if some poor former-sysadmin has to spend 12 hours a day clicking buttons in the aws console as long as they get what they needed (I've actually seen a guy making 150k to basically do just this).
So what happens, like I said, is teams get into this hell-loop of manual task after manual task. Not only does that require large amounts of mental bandwidth to keep track of the tasks and keep all the documentation or playbooks surrounding them up to date (if you're lucky to get even that), you also have to deal with the inevitable mistakes and errors that are common when doing things strictly manually, which eats up a ton of unnecessary time and thus $$.
I agree, though, that most devops consultants are terrible and that the industry is driven by this. However, this is the specific niche I've carved out for myself: coming in after a big, terrible, crappy consultant that basically just pitches a brittle jenkins CI setup and some basic terraform and charges you $250k for their time. I actually really enjoy doing it too, and the challenges and issues are almost always unique to the org, even if the patterns are similar - so it's always interesting.
So, long story short, unless you have a super robust process and mature system, it's usually just a lot easier to default to "automate" and come up with reasonable exceptions when it doesn't make sense to do so, rather than the other way around.
hadlock 4 days ago
I worked at a traditional finance company and we had a team of 8 people in traditional operations and another 30 people doing manual testing around the clock to support about 20 developers, 10 network staff, plus another 20-30 managers or leads and security. We could only deploy once a week and there were always issues with "final check out" on Sunday morning when hotfixes had to go in or config was modified.
JohnMakin 4 days ago
The funny thing is, the fintech company in the example you gave likely sees nothing wrong with this. I’ve seen cases where the release cycle is once a month or longer, with similar team sizes, and they don’t think they have an issue and would probably laugh at you or look at you weird if you mentioned CI/CD.
anshulbhide 3 days ago
mdaniel 3 days ago
Also, there's another bit of nuance to that, as well as your overarching point about "automation isn't free," in that writing Terraform/Tofu isn't usually the long pole in that tent: debugging the raging PoS most certainly is (along with its associated https://xkcd.com/303/ of waiting for the "plan, attempt apply, puke, goto 1" loop)
And, in almost the exact same vein: writing any automation carries with it two downstream bits of work: monitoring the automation and having enough context to debug it when (WHEN) it falls over
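To make the "monitoring the automation" and "enough context to debug it" points concrete, here is a minimal Python sketch, not anything from a specific tool. The `monitored` decorator, the `automation` logger name, and the `rotate_credentials` task are all hypothetical names made up for illustration. The idea is just that every run records its outcome and duration, and a failure records its inputs and traceback before re-raising.

```python
import functools
import logging
import time

# Hypothetical logger name; point it at whatever log pipeline you already have.
logger = logging.getLogger("automation")


def monitored(task_name):
    """Wrap an automated task so every run logs its outcome and duration."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            started = time.monotonic()
            try:
                result = func(*args, **kwargs)
            except Exception:
                # logger.exception adds the traceback; the inputs are logged
                # too, since they are usually what you need when debugging.
                logger.exception(
                    "task=%s status=failed args=%r kwargs=%r duration=%.2fs",
                    task_name, args, kwargs, time.monotonic() - started,
                )
                raise  # do not swallow the failure
            logger.info(
                "task=%s status=ok duration=%.2fs",
                task_name, time.monotonic() - started,
            )
            return result
        return wrapper
    return decorator


@monitored("rotate-credentials")  # hypothetical task, for illustration only
def rotate_credentials(service):
    if not service:
        raise ValueError("service name is required")
    return f"rotated:{service}"
```

The wrapper re-raises instead of swallowing the error, so whatever scheduler or CI job runs the task still sees it fail; alerting on the `status=failed` lines is a separate piece of work.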
MrBuddyCasino 3 days ago
mdaniel 3 days ago
There are so many great tools that solve so many problems but life is filled with trade-offs and many people don't value the same trade-offs that I do, so they just bash their head against Terraform (or $other_legacy_tool) because "it's what we use"
I was really hoping that Earthly or Dagger were going to catch on due to the enormous number of folks that complain about not being able to run GitHub Actions (or GLCI) locally, on top of bitching about yaml alllllllllll the fucking time. But, same problem, IMHO: inertia is so strong
MrBuddyCasino 3 days ago
JohnMakin 3 days ago
mdaniel 3 days ago
phaedrix 2 days ago
mdaniel 2 days ago
I would guess dev tooling usually also falls into the "nice to have," or as my former CEO used to say "vitamins vs painkillers"
MrBuddyCasino 3 days ago
roblh 4 days ago
JohnMakin 4 days ago
The times I've been stuck in these situations, it's mostly been about inflicting or bringing notice to enough pain that the business backs off on some tighter deadlines to give more time to automate the tasks that will free up the most bandwidth or time - and it's a lot of small bites. I typically will start (if I can) by insisting that anything new that makes sense to automate gets automated, and stick rigidly to that until it starts taking over other legacy processes in sort of a slow strangler pattern. To pitch that to management, I make a basic estimate of time spent per week and effort to automate, and factor in long-term maintenance as well. This can look wildly different depending on the infrastructure setup and needs and how much firefighting you're doing - which there is usually a lot of. So sometimes that first step is just putting out all the fires or gaining enough visibility/monitoring to ensure you know which fires need to be jumped on immediately and which ones don't, and most importantly, proving that to the business.
Unfortunately, though, leadership buy-in is (to me) almost always the hardest part, which is why I say "inflicting pain" the way I do. It sounds bad, but if they do not feel pain you will never, ever get priority to do anything, because IME, like I said, "devops" guys are, to most businesses (and even a lot of technical people), glorified sysadmins that cost and demand way too much.
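The estimate described above (time per week, effort to automate, long-term maintenance) can be sketched as a simple break-even calculation. This is only a rough illustration in Python: the `AutomationCandidate` class, the `rank` helper, and every number below are made-up assumptions, not figures from a real engagement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AutomationCandidate:
    """Rough numbers for one manual task; every value is an estimate."""
    name: str
    manual_hours_per_week: float       # time spent doing the task by hand
    build_hours: float                 # one-off effort to automate it
    maintenance_hours_per_week: float  # ongoing upkeep of the automation

    def weekly_savings(self) -> float:
        return self.manual_hours_per_week - self.maintenance_hours_per_week

    def break_even_weeks(self) -> Optional[float]:
        """Weeks until the build effort is paid back, or None if it never is."""
        savings = self.weekly_savings()
        if savings <= 0:
            return None
        return self.build_hours / savings


def rank(candidates):
    """Order candidates by fastest payback; ones that never pay back go last."""
    return sorted(
        candidates,
        key=lambda c: (c.break_even_weeks() is None, c.break_even_weeks() or 0.0),
    )


# Hypothetical tasks and numbers, purely for illustration.
candidates = [
    AutomationCandidate("rotate credentials", 4.0, 20.0, 0.5),
    AutomationCandidate("quarterly report", 0.5, 40.0, 1.0),
    AutomationCandidate("provision environment", 6.0, 30.0, 1.0),
]

for c in rank(candidates):
    weeks = c.break_even_weeks()
    payback = "never pays back" if weeks is None else f"{weeks:.1f} weeks"
    print(f"{c.name}: {payback}")
```

Ranking by payback time is one arbitrary choice; it ignores error rates and the cost of mistakes made by hand, which in practice can matter more than the raw hours.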
readthenotes1 4 days ago
drewcoo 4 days ago
It's as if when cooking there is only one good time to taste the soup.
popalchemist 3 days ago
ChoHag 4 days ago