
Spreadsheets: Letters from a Quant

I'm Bryan Caplan, Professor of Economics at George Mason University and New York Times Bestselling author.

8 days ago


Last month, I lamented that “No one cared about my spreadsheets”:

The most painful part of writing The Case Against Education was calculating the return to education.  I spent fifteen months working on the spreadsheets… When the book finally came out, I published final versions of all the spreadsheets underlying the book’s return to education calculations.  A one-to-one correspondence between what’s in the book and what I shared with the world.  Full transparency.

Now guess what?  Since the 2018 publication of The Case Against Education, precisely zero people have emailed me about those spreadsheets.  The book enjoyed massive media attention.  My results were ultra-contrarian: my preferred estimate of the Social Return to Education is negative for almost every demographic.  I loudly used these results to call for massive cuts in education spending.  Yet since the book’s publication, no one has bothered to challenge my math.  Not publicly.  Not privately.  No one cared about my spreadsheets.

A while after, I got two thoughtful emails from the blogger behind Applied Divinity Studies. Shared with his permission.

_________________________________________________________

Email #1:

I didn’t quite waste a year of my life, but in a much more minute way I share your pain. I write data-heavy blog posts, often involving original data collection and analysis, which to my knowledge, no one has ever even glanced at.

For example:
– To write my piece on Lambda School’s poor incentives, I had to produce this income tax calculator
– To estimate the effects of “golden handcuffs” on Google employees, I estimated the total number of people who have ever worked there from historical headcount data, using average tenure to infer average churn and then using that to infer annual hiring
– For this piece updating the data from Bloom’s “Ideas” paper, I had to get TSMC’s latest SEC filings and then convert each year’s stated numbers into USD using the historical exchange rate
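
The headcount estimate described in the list above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the calculation ADS actually ran: the function name, the steady-state rule that annual churn is about 1 / average tenure, and all of the numbers are hypothetical.

```python
def estimate_total_ever_employed(headcounts, avg_tenure_years):
    """Rough estimate of everyone who has ever worked at a firm.

    headcounts: year-end headcounts for consecutive years (hypothetical data).
    avg_tenure_years: average tenure; assumed to imply churn of 1 / tenure.
    """
    churn_rate = 1.0 / avg_tenure_years  # assumption: steady-state churn
    total = headcounts[0]                # everyone present in the first year
    hires_by_year = []
    for prev, curr in zip(headcounts, headcounts[1:]):
        departures = prev * churn_rate       # people inferred to have left
        hires = (curr - prev) + departures   # net growth plus replacements
        hires_by_year.append(hires)
        total += hires                       # each hire is a new person
    return total, hires_by_year


# Made-up figures, purely for illustration.
headcounts = [1000, 1500, 2000]
total, hires = estimate_total_ever_employed(headcounts, avg_tenure_years=4.0)
print(total, hires)
```

With these invented inputs, the firm never has more than 2,000 people on staff at once, yet the estimate of people who have ever worked there is noticeably higher, because every departure has to be replaced by a new hire.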

Like I said, nothing compared to the year you spent, but thankless work all the same.

But I do think your conclusion, that quantitative social science is “barely relevant” is totally wrong! Sometimes you do the math and change your view. Sometimes you end up with a new conclusion. Sometimes you realize that you were right but for the wrong reasons.

Case in point: I have an upcoming post about San Francisco crime data. I went in thinking it would be about how bad shoplifting is compared to the rest of the country, but it is instead about how that narrative is total BS and unsupported by the data. So there was a real purpose to my analysis, if only to inform my views which I will soon impress upon others.

Anyway, that’s all to say, I don’t really care about your spreadsheets either, but I do care about what they have to teach us.
– ADS

P.S. I did really enjoy your book about parenting, and it did have a big impact on me. Given how consequential the decision is, easily a bigger real impact than anything else I’ve ever read. I still don’t have kids yet, but in expectation would say I plan to have 0.5 more than I did previously. I’ve also recommended the book to several other friends who I expect will take it seriously as well.

Email #2:

Sure, one person took the numbers seriously, but no one else did…

Ah yes, but that person was you, and then you changed many other people’s minds.

Here’s possibly a better example:
– GiveWell spends many researcher hours putting together their annual Cost-Effectiveness Analysis. This is a pretty massive spreadsheet which, like yours, has to consider plenty of weird second-order effects like “what if our funding crowds out other funding” and complex estimates of the effects of deworming on future income. There are cells in this already massive sheet that just link to entire other massive spreadsheets trying to pin down a certain parameter value, alongside massive Google Docs explaining the reasoning.
– As far as I know, no one outside of GiveWell looks at these or understands them. I tried once, and nearly lost my mind just trying to grapple with everything that was happening and diving down every rabbit hole each parameter value leads to.
– Instead, every year, GiveWell is subjected to really stupid criticisms (similar to the ones you get) like “but how can you just put a value on human life???” or “did you consider that people in other cultures might not have the same values as you???” (spoiler: they did, and in much much more depth than the critics have).

As some people have mentioned in replies to your piece, the benefit of doing the quantitative analysis is roughly that people now trust you, even if they haven’t independently verified your numbers. Which they’re trying to say is good, but it actually seems quite bad! I don’t want to be blindly trusted, neither does GiveWell, and I’m guessing neither do you.

But here’s what I think better approximates the social trust system:
– By putting together these analyses, making them public, stating them with confidence, receiving trust from a large community and so on, you are putting a bounty on your head.
– If someone were to look into the GiveWell spreadsheets, discover that they were basing massive grant decisions (with massive human consequence) on faulty math, it would cause a huge scandal.
– I’m very confident that I would hear about it. Even if you conspiratorially believe that GiveWell staff has some kind of grip on EA Forum, I know enough contrarians with an axe to grind that I would definitely find out.
– There is a big incentive to do this kind of investigative work. Any nobody could instantly become an internet star by debunking GiveWell. There are also lots of people who just straight up dislike GiveWell and would love to see them burn. There are also lots of people who would love to be able to say “I’m better at cost-effectiveness analysis than GiveWell, so give me money for my own foundation”.

That’s the actual reason to trust their analysis. Not because you glance at it and become intimidated, but because there is a functioning adversarial system. And that system works better the larger the reputational/financial/epistemic stakes are.

Alexey’s Why We Sleep criticism is a great example of this. Every time Matt Walker sold another book, gave a TED Talk, received an endorsement from Bill Gates, etc., he was raising the bounty on his own head. If it was just a stupid pop-science book with misleading claims, no one would have cared that it got debunked. But instead it was a best-seller lauded by smart people, written by a Berkeley professor, so the reputational stakes were huge, and writing this criticism helped propel Alexey into meta-science stardom.

But there’s still a hurdle, which is in getting people to listen to you as a critic. Most books about pedagogy can’t be debunked because they aren’t even making specific quantitative claims. If a book’s thesis is “the best way to educate kids is to listen to them”, and backs this up with a bunch of anecdotes, someone could write a very well reasoned criticism, but it would still come down to issues like how much I trust the critic, how well they write, if they’re able to publish in a major venue, etc.

In contrast, if you publish the spreadsheets, anyone can come along, say “here’s a specific math error you made which invalidates your results”, and it’s *instantly* scandalous. Someone reading the critic doesn’t have to engage with messy philosophical arguments or read dense prose, it’s just “you know that Bryan Caplan book? It’s based on a cell which uses SUMXMY2 instead of SUMX2MY2”. That’s immediately great, immediately viral on Twitter, it’s so juicy, so scandalous, so easy to consume.
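
To see why swapping those two functions would matter, here is a small sketch of what each one computes. Excel's SUMXMY2 sums the squared differences between paired values, while SUMX2MY2 sums the differences of their squares. The Python function names and the sample arrays are hypothetical stand-ins for illustration, and nothing here is taken from the actual spreadsheets.

```python
def sumxmy2(xs, ys):
    """Mirror of Excel's SUMXMY2: sum of (x - y)^2 over paired values."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys))


def sumx2my2(xs, ys):
    """Mirror of Excel's SUMX2MY2: sum of (x^2 - y^2) over paired values."""
    return sum(x ** 2 - y ** 2 for x, y in zip(xs, ys))


# Made-up inputs, purely for illustration.
xs = [3, 4, 5]
ys = [1, 2, 3]

print(sumxmy2(xs, ys))   # squared differences: 4 + 4 + 4 = 12
print(sumx2my2(xs, ys))  # differences of squares: 8 + 12 + 16 = 36
```

The same inputs give answers that differ by a factor of three, and the mistake shows up in a single cell formula, which is what makes this kind of error so easy for a critic to point at.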

To summarize:
– Spreadsheets are inherently silly
– Writing a serious text based on silly spreadsheet errors is ridiculously embarrassing.
– This creates an adversarial epistemic context in which a potential debunking is A) highly motivated, B) easily consumed and distributed, and C) easy to verify if true.
– So I trust The Case Against Education even though I haven’t seen your spreadsheets.

Of course, we can’t all defer judgement to someone else, so there’s a danger in relying too much on this kind of social proof. But just as I don’t independently verify the claims of every paper I read, or independently verify nutrition labels or most other things in life, this kind of mechanism is all that makes complex civilization possible. So social proofs have limits, but having adversarial incentives is about the best we can do.

One more analogy:
– Say Alexey Guzey, either out of carelessness or perhaps deliberately, left some broken links on his Why We Sleep criticism
– Although the post has been read hundreds of thousands of times, no one has pointed this out to him, ironically implying that they didn’t actually check the citations on a post about the importance of checking citations
– Would that imply that citations in science are “barely relevant in the real world”? Or that “even scientists who use citations don’t care about what citations really have to teach”? Not at all!
– I don’t have to check Alexey’s citations, they’re just there so that if someone were to check them, and if they did turn out to be wrong, Alexey would have violated a strong community norm and accordingly be exiled from polite society. So the incentives are very high for Alexey to get this right, and very high for critics to call him on it if he’s wrong.

Critically, citations are a costly signal. And spreadsheets are an even more costly signal. Imagine if some hack published a best-selling book about education, based on a spreadsheet they had made public, and which, upon examination, just said “Caplan’s estimated value of education: ~$0. The true intrinsic and immeasurable value of education: $1,000,000. The total value of education with everything added up: $1,000,000”. They would be torn apart, right?

Okay, forgive the overly long analogies, but I think they make the point. No one cares about your spreadsheets, but the intellectual community collectively cares that they exist.



