I learned the joy of data and statistics from sports. I have an early memory of arguing with my mom’s boyfriend when he would make his claim that running the football was the only way to win an NFL football game. He liked to cite that when Emmitt Smith carried the ball 25 times or more the Cowboys were a hundred million and 0. Of course Smith carried the ball that much because they were winning, not the other way around. I didn’t have a name for selection bias yet. I just knew his way was the wrong way to think about the problem. And I couldn’t let it go.
In 1985 the Nintendo Entertainment System came out. I was eight. One of the games the system came with was Nintendo Baseball. By modern standards it was the digital stone age. But at the time it was the first thing that resembled an actual sports game on a home video game console. So my brother and I played it non-stop.
The players didn’t have names and neither did the teams. But you could get 9 innings in…27 outs. And you could get hits and strikeouts and home runs. So you could do the math of baseball.
My brother came up with the idea of creating a lineup and naming the players and keeping tick-sheet statistics. I thought this was normal, and this explains a lot about me. What we realized, over time, was that if we played enough games, all the players’ statistics evened out. The batting averages were the same. The pitchers’ earned run averages were the same. The only differences were in whether they were playing on his team or mine. What an 8-year-old and a 13-year-old didn’t get of course was that all the characteristics of the players were coded identically. And all the variables were rule-based. There was no machine learning that adapted to the input. So the only variable was the human. And this was no fun. So we did the only logical thing. We assigned characteristics to each player. Some had to swing at every pitch. Or always with two strikes. Or never on the first pitch. Some pitchers threw only fastballs. Or only curveballs. Over time it got sophisticated.
We learned that slow pitchers were better than fast ones. And batters that had to swing at everything did worse. Slow pitchers could be controlled more easily by the human. And batters that could rely on human judgement to swing did better. And so you can infer that humans added effectiveness to the system. Which makes sense based on the relative sophistication of humans and Nintendo. And this was how I learned about data, statistics and insights you can draw from them. And also that if I spend enough time with numbers, I’ll never forget them.
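The "evened out" effect from those identical players is easy to reproduce. Here is a minimal sketch in Python, not anything taken from the game itself: it assumes every batter shares one hidden chance of getting a hit, and the 0.270 value, the function names and the seed are all made up for illustration. With few at-bats the averages look different; with many, the gap between the best and worst hitter shrinks toward zero.

```python
import random


def simulate_averages(num_players=9, at_bats=50, hit_prob=0.270, seed=1985):
    """Batting averages for players who all share the same hit probability.

    hit_prob is a hypothetical value chosen for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same numbers
    averages = []
    for _ in range(num_players):
        hits = sum(1 for _ in range(at_bats) if rng.random() < hit_prob)
        averages.append(hits / at_bats)
    return averages


def spread(averages):
    """Gap between the best and worst batting average in the lineup."""
    return max(averages) - min(averages)


if __name__ == "__main__":
    for at_bats in (50, 500, 50_000):
        avgs = simulate_averages(at_bats=at_bats)
        print(f"{at_bats:>6} at-bats: spread = {spread(avgs):.3f}")
```

Giving each player a different hit_prob, which is roughly what the house rules did, is what keeps the averages apart no matter how many games get played.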
When I was 9 Roger Clemens struck out 20 Mariners in a baseball game. By then there had been about 200,000 baseball games played, and no one had gotten to 20 strikeouts. And I couldn’t let go of the fact that there just aren’t that many variables in the number for 20 strikeouts to be that rare. There are only 27 potential outcomes. And in 200,000 games, it never happened once. So there was something around the limitation of the humans involved that stopped around 17. The list of games with 18 strikeouts or more today, after 150 years and about a quarter million games, is 27. And on that list the same people show up multiple times. I don’t know why the limit is 17, but one day, I’m going to quit everything and find out. Because I’ve been asking myself the same question since I was nine…What’s the limit? And baseball is such a magical place to ask it.
Clemens would do it again 10 years later, which is too much for me to think about.
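One rough way to put numbers on the "only 27 outcomes" question is a coin-flip model. The sketch below is an illustration under loud assumptions, not a claim about real pitchers: it assumes one pitcher records all 27 outs, that each out is independently a strikeout with the same probability p, and the default p of 0.2 is a hypothetical placeholder rather than a measured rate.

```python
from math import comb


def prob_at_least(k, n=27, p=0.2):
    """Chance of at least k strikeouts among n outs.

    Assumes each out is an independent strikeout with probability p.
    The default p is a hypothetical value, not a real strikeout rate.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    for k in (15, 17, 18, 20):
        print(f"P(at least {k} of 27) = {prob_at_least(k):.2e}")
```

A model this crude can't say why the ceiling sits where it does. It only shows how quickly the odds collapse as the count climbs toward 27, and it leaves room for the human limits the simple math doesn't capture.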
Yesterday Hank Aaron passed away at the age of 86. When I was growing up he was the all-time home run leader. I never watched him play. People told me he was one of the all-time greats. A day after his passing, I’d like to say something about that categorization. It’s complete bullshit. Hank Aaron was not one of the all-time greats. He was the greatest hitter of all time. And I’ll take all arguments to the contrary under one condition. They’re data-based.
One doesn’t measure the greatness of hitting by mystique or by anecdotes like Ted Williams being able to read the label on the ball when it came out of the pitcher’s hand. One measures a hitter’s greatness by production. And by that measure Hank Aaron, Babe Ruth and Barry Bonds are measurably above anyone else. Ruth played at the dawn of the modern era and faced nowhere near the level of competition Aaron did. And Bonds (put him in the Hall of Fame you cowards) clearly was aided by performance-enhancing drugs during the most productive years of his career. Meanwhile, Hank Aaron would still have 3,000 hits, a Hall of Fame line of demarcation in its own right, if you took away his 755 home runs.
He hit over .320 eight times in his career.
When he was 37 years old, he hit .327 with 47 home runs.
When he was 39…he hit .301 with 40 home runs.
In 1957, he struck out 13 more times than he hit a home run…and he hit 44 of them. He only walked 57 times that year. Which means he put the ball in play 550 times and still batted .322. For a frame of reference, Barry Bonds, in his most productive year, put the ball in play 397 times.
In 1959, Aaron had 223 hits. Bonds never had more than 181 in a year. Aaron had fewer hits than Bonds’ best year once before he was 34.
And Aaron still holds the record for most RBIs.
It’s not too hard to understand why Hank Aaron was not considered to be the greatest hitter of all time. Why Babe Ruth was more famous than him. Why people talk about Ted Williams like he’s some sort of genetic oddity, a Mozart of hitting if you will. But not Aaron. There’s a box full of death threats Aaron kept until the day he died that he received because he was a black man chasing the most hallowed record in American sports in Georgia. But numbers don’t lie. They tell you what the limit is. And it appears to be somewhere less than Henry Louis Aaron. That we don’t think of it that way is just another way for data to tell us the truth about ourselves.