I’m an operations guy. I have been my whole professional career. I was an operations officer in SOCOM for the better part of a decade and for most of the last decade I’ve been doing some version of the same thing in the consumer software industry.
It’s hard for me to see problems through any other lens.
At its core, the art of operations is the activity of matching capacity with demand. There’s X amount of a particular thing needed because there are Y people/places/enemies who need it.
Figuring out exactly what the X and Y are isn’t really the art of it. That’s just math. The art of operations is understanding the levers that get pulled on either side of the equation to change the math. And understanding what forces move those levers and ensuring that, when the barbarian hordes are amassing at the gates, you’ve got the right levers in hand to compensate for the ones you don’t.
At the heart of the COVID-19 epidemic is an incredibly dire capacity problem. And the most common misconception that I’ve seen in the discussion about how to approach the pandemic is a failure to understand that capacity problems, at some point, cease to be a symptom of the problem and actually become the problem itself.
The impact from a mismatch of capacity to demand eventually consumes an entire operation. And once that happens, the train is off the track and it doesn’t get back on until demand subsides.
The only variable to control at that point is how many people who need something won’t get it. The grim assumption in a pandemic is that some higher-than-acceptable proportion of that group ends up dying.
Spending some time in the math of capacity planning, we can quickly see how shortages reach a tipping point and why, in a pandemic, once that point has been passed it’s so hard to get things back under control.
The entire acute health care capacity plan in America is based on one baseline assumption:
That an extremely small percentage of Americans will need acute care.
According to the Society of Critical Care Medicine (SCCM) there are about 97,000 ICU beds in America. On any given night, they are about 2/3 full. Which means the working forecast for ICU beds in America is grounded in the assumption that, at any given point in time, 99.98% of Americans will not need an ICU bed.
We have an excess capacity of about 32K beds. Which means that we can stretch our assumption to 99.97% of Americans not needing an ICU bed, without a problem.
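The percentages above can be checked with a few lines of arithmetic. This is a minimal sketch using only the figures quoted in the text (97,000 beds, about 2/3 occupancy, and the 330M population used later in the piece); the variable and function names are illustrative.

```python
# Figures quoted in the text; names are illustrative.
POPULATION = 330_000_000
ICU_BEDS = 97_000
OCCUPANCY = 2 / 3  # "about 2/3 full" on any given night

occupied = ICU_BEDS * OCCUPANCY
spare = ICU_BEDS - occupied


def share_not_needing(beds_in_use: float) -> float:
    """Percent of the population that does NOT need an ICU bed."""
    return 100 * (1 - beds_in_use / POPULATION)


baseline_pct = share_not_needing(occupied)   # typical night
ceiling_pct = share_not_needing(ICU_BEDS)    # every bed full

print(f"occupied beds:    {occupied:,.0f}")
print(f"spare beds:       {spare:,.0f}")
print(f"typical night:    {baseline_pct:.2f}% do not need a bed")
print(f"at full capacity: {ceiling_pct:.2f}% do not need a bed")
```

The two percentages differ only in the second decimal place, which is the whole margin the system has to work with.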
Now here’s where the law of large numbers gets problematic for us in a hurry. Let’s say, for the sake of estimation, a hypothetical epidemic hits us and we can only really assume that, at any point in time, 99.9 percent of Americans can be depended on to not need an ICU bed.
To the untrained, non-operations eye, this 0.07% difference sounds like an extremely small number. In reality, it represents a 400% spike in demand. And it results in a shortfall of 130 thousand ICU beds on any given night.
The clock never stops though. And now time is the enemy.
Say the hypothetical pandemic causing that spike lasts for 60 days. Then over that time, 7.8 million Americans who need acute care will not have access to ICU beds. This shows how extremely small changes at the top of the demand funnel make for crushing outcomes at the bottom when you have a population of 330M. Based on this estimation, I can accurately describe this problem two ways:
This hypothetical virus will result in an increase of less than one tenth of one percent in the need for acute care.
Over the next two months, about 8 million Americans infected with the hypothetical virus who need acute care will not get it.
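Here is a rough sketch of the surge arithmetic, using only the figures quoted above; the variable names are illustrative. One caveat worth flagging: read literally, 0.1% of 330M against 97,000 beds gives a nightly shortfall nearer 233 thousand than 130 thousand, so the quoted figure presumably rests on assumptions not spelled out here. The 7.8 million total is the quoted nightly shortfall multiplied by 60 days, which treats each night’s unmet need as different people.

```python
# Figures quoted in the text; names are illustrative.
POPULATION = 330_000_000
ICU_BEDS = 97_000
BASELINE_OCCUPIED = ICU_BEDS * 2 / 3   # about 2/3 full on a normal night
SURGE_NEED_RATE = 0.001                # 99.9% do not need a bed, so 0.1% do
QUOTED_SHORTFALL = 130_000             # nightly shortfall as stated in the text
SURGE_DAYS = 60

# Nightly demand for ICU beds during the hypothetical epidemic.
surge_demand = POPULATION * SURGE_NEED_RATE

# Percentage increase over a normal night's demand.
increase_pct = 100 * (surge_demand - BASELINE_OCCUPIED) / BASELINE_OCCUPIED

# Shortfall from the stated inputs alone (demand minus total beds).
literal_shortfall = surge_demand - ICU_BEDS

# Total over the surge, using the quoted nightly shortfall.
# Assumption: each night's unmet need is counted as distinct people.
quoted_total = QUOTED_SHORTFALL * SURGE_DAYS

print(f"surge demand per night:   {surge_demand:,.0f}")
print(f"increase over baseline:   {increase_pct:.0f}%")
print(f"literal nightly shortfall: {literal_shortfall:,.0f}")
print(f"quoted shortfall x days:  {quoted_total:,}")
```

Whichever shortfall figure is used, the point of the passage holds: a shift in the first decimal place of a percentage turns into millions of unmet bed-nights.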
The ops guy in me sees the problem the second way immediately and knows that I’ve got a problem. And my only hope is to address both sides of the equation. I need more capacity AND I need less demand. It will be insufficient to try to simply increase capacity. Because the good guys are playing the game with resource-dependent arithmetic growth. The bad guys are playing with resource-independent exponential growth.
Let’s unpack that.
When I make a new ventilator or a test or an ICU bed, the existence of that ventilator or test or ICU bed does nothing to increase my capability to make more. On the contrary, it diminishes my capacity to make more because each one uses some amount of finite resources.
Right now the nation that put a man on the moon with slide rules and pencils doesn’t have enough long Q-tips to make COVID19 tests. This is the reality of where we are on increasing capacity.
On the other side of the fence, every person who catches the hypothetical virus can effortlessly spread the virus and infect countless others. The only resource the virus requires is an uninfected human. This is exponential growth. And it gets out of hand in a hurry. A penny that doubles in value every day is worth $5 million at the end of a month.
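The penny figure is easy to verify. A minimal sketch, assuming the penny is worth $0.01 on day 1 and doubles once per day after that, so day 30 reflects 29 doublings (the function name is illustrative):

```python
def penny_value(day: int) -> float:
    """Dollar value on a given day, starting at $0.01 on day 1 and doubling daily."""
    return 0.01 * 2 ** (day - 1)


for day in (1, 10, 20, 30):
    print(f"day {day:>2}: ${penny_value(day):,.2f}")
```

Under that assumption the penny is worth about $5.4 million on day 30, and two thirds of the way through the month it is still only worth a few thousand dollars. Nearly all of the growth arrives at the end.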
The hard part about handling exponential demand growth with arithmetic supply growth is that once you’ve fallen behind, it is impossible to catch up. In fact it’s impossible not to rapidly fall further and further behind. If you don’t stop the growth of demand, it doesn’t really matter what you do.
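To make the catch-up problem concrete, here is a small sketch that pits exponential demand against arithmetic supply. Every number in it is a made-up assumption chosen for illustration (1,000 patients doubling every 5 days, 32,000 spare beds, 500 new beds added per day); none of them come from real COVID-19 data.

```python
def demand_on(day: int, initial: float, doubling_days: float) -> float:
    """Exponential demand: doubles every `doubling_days` days."""
    return initial * 2 ** (day / doubling_days)


def capacity_on(day: int, initial: float, added_per_day: float) -> float:
    """Arithmetic supply: a fixed number of units added each day."""
    return initial + added_per_day * day


def first_day_behind(initial_demand, doubling_days, initial_capacity,
                     added_per_day, horizon=365):
    """First day demand exceeds capacity, or None if it never does within the horizon."""
    for day in range(horizon + 1):
        if demand_on(day, initial_demand, doubling_days) > capacity_on(
                day, initial_capacity, added_per_day):
            return day
    return None


# Hypothetical inputs, for illustration only.
INITIAL_DEMAND = 1_000
DOUBLING_DAYS = 5
SPARE_BEDS = 32_000
BEDS_ADDED_PER_DAY = 500

crossover = first_day_behind(INITIAL_DEMAND, DOUBLING_DAYS,
                             SPARE_BEDS, BEDS_ADDED_PER_DAY)
print(f"demand first exceeds capacity on day {crossover}")

# Once behind, the gap keeps widening even though capacity is still growing.
for extra in (0, 10, 20):
    day = crossover + extra
    gap = (demand_on(day, INITIAL_DEMAND, DOUBLING_DAYS)
           - capacity_on(day, SPARE_BEDS, BEDS_ADDED_PER_DAY))
    print(f"day {day}: short by {gap:,.0f} beds")
```

Changing the build rate moves the crossover day by a little. Changing the doubling time moves it by a lot, which is the argument for working the demand side of the equation.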
Social media platforms illustrate this gap effectively. Their customer growth is exponential. The more they have the more they get. Their ability to provide human support to those customers is arithmetic.
Ever try to talk to a person, in real time, for help with a problem on Facebook? You can’t. They know they can’t provide it. So they don’t try. It’s a luxury social media platforms have that the medical community doesn’t.
This is where the truly brutal nature of an epidemic starts to materialize. My capacity is behind. I’m falling farther behind by the second. And then the virus actually starts to diminish my capacity to help the people I actually can. Because besides beds, ventilators and tests, the capacity to provide care depends on humans to provide it. Now the virus and my operation are fighting over the same resource. And the virus is better…exponentially so. Eventually, the caregivers get infected. The gap widens even further until eventually capacity reaches near zero while demand accelerates.
Now the virus is no longer the problem. My lack of medical infrastructure is. I no longer have capacity to deliver services for other issues. And problems that are entirely unrelated to the epidemic can’t be addressed.
Now, I’m on the road to operational collapse.
It starts with eliminating elective procedures. And then I start to pull DEFCON levers that increase in severity until I’m making decisions on who lives and who dies. And then simply how long to shut down the whole operation because the declaration of no capacity is better than the unfulfilled promise of some.
All because of an increase of less than one tenth of one percent in demand for acute care.
In the operations world, nothing about that scenario is controversial. It’s the pattern we understand that defines why we do what we do. Where the controversy lies in every operations problem is the math that gets you to solving the mystery of demand.
For COVID19, we don’t know what the baseline infection rate is. And we don’t know what the impact of efforts to limit it will be.
These are all common forecasting problems. The way we get around them is through the collection of historical arrival patterns and the collection of massive data sets that allow us to use machine learning algorithms to enable more effective predictions.
We don’t have historical COVID19 data. And we don’t have nearly enough real time data to start to feed ML platforms that can help. And we won’t until it’s too late. So we’re left with the mother of all bad operations problems.
We have a huge area to cover. We have an unknown incidence of events to respond to. And we have extremely limited resources to respond.
As an Ops guy, I’m out of precise, limited measures. The only actions I can take are: 1-Broad measures I know inherently decrease demand. 2-Rapid production of capacity.
Right now the bets we’re making are on the forecast. And I can tell you from experience that we don’t have enough data to know that it’s accurate. People telling you X people will die don’t know. People telling you Y people won’t die don’t know.
So we’re in a place where we have to decide something undecidable. Which means we really have to fall back on principles that allow us to answer the following question:
If we’re wrong, and we most certainly are to some degree because we don’t have enough information to be right, what’s the acceptable damage we’re willing to incur?
Is it worse to have to simply shut the operation down?
Or is it worse to do the things it requires to keep it running?
This is the question every ops person answers on their way out the door when it’s gotten too bad to stay. And it’s the one on the table for world leaders now.