A critique of Poor Economics by Banerjee & Duflo (Book Review)
Throughout most of human history poverty has been the rule rather than the exception. Even as of 1980, close to 90% of humanity lived in extreme poverty (less than $1.90 a day). The last forty years of globalization and development have led to an amazing reversal in world history, with only 10% of the population living in extreme poverty as of 2015 (an amazing 80 percentage point reduction!). The fact that most citizens of the rich world are unaware of this spectacular economic progress is unfortunate. The data shows that the bulk of this decline has been driven by Asia (especially China). But there are still 700 million individuals living in extreme poverty, mainly concentrated in Sub-Saharan Africa, and their destitution remains a significant concern to world leaders and policy makers.
The economic challenges and incentives that the extremely poor of humanity face are the topic of Poor Economics by Esther Duflo and Abhijit Banerjee. These two Nobel Prize-winning economists from MIT helped to pioneer the use of randomized controlled trials (RCTs) to evaluate policy interventions in poor countries. The use of RCTs in empirical economics is part of the broader “credibility revolution” that focuses on using identification strategies that yield unbiased estimates of a treatment effect. Randomization is one of the oldest and most trusted methods for obtaining unbiased effects. The book’s ten chapters outline various economic and health challenges the extremely poor face, from education (Top of the Class) and family size (Pak Sudarno’s Big Family) to financial risk (Barefoot Hedge-Fund Managers) and forced entrepreneurship (Reluctant Entrepreneurs). Using almost two decades of development research, much of it coming from Duflo and Banerjee’s own work, the authors attempt to use empirical results to explain why it is often so hard for the poor to lift themselves out of poverty.
I have serious concerns with the approach that Banerjee and Duflo espouse, and I outline these below. To be clear, I do not mean to pick on the authors of Poor Economics exclusively; when I use their names I do so as a synecdoche for the entire discipline.
One economics to rule them all
In addition to providing a summary of their academic findings, Poor Economics champions itself as the sound voice of reason in an existing debate between economic theories. Traditionally development economics has been split into two camps: foreign aid sceptics, led by William Easterly, and poverty-trap theorists, led by Jeffrey Sachs. While both sides have good arguments, Banerjee and Duflo believe that the rival camps have failed to provide credible micro-experiments to settle their disagreements and that their exchanges amount to little more than hot air. Aid sceptics are correct that foreign aid money often ends up in the pockets of corrupt government officials or “helps” risk-averse farmers take up business models they are uninterested in. But there is also good empirical data that future wealth, as a function of current wealth, shows a “trap”-like property. Experiments which demonstrate the challenges the poor have in procuring insurance, savings accounts, or labour market stability add further credibility to the poverty-trap theory.
By presenting Easterly and Sachs as bickering children whose dispute needs to be adjudicated by the wise RCT-economists, Poor Economics takes on a certain smugness. The implication of the book seems to be that were Easterly and Sachs able to remove their ideological blinders and invest in “credible” research of the type that Banerjee and Duflo have pioneered, then economists as a profession could finally determine which policies actually work in development and which do not. Since economists are known for their preoccupation with department rankings and a haughty attitude towards other disciplines, it is no surprise that they are prone to creating internal hierarchies of whose knowledge is superior.
The studies we use have in common a high level of scientific rigor, openness to accepting the verdict of the data, and a focus on specific concrete questions of relevance to the lives of the poor.
There has, however, been significant pushback from other economists against the notion that RCTs provide the “credibility” that researchers in this field would wish for. Even if a study’s design provides good internal validity, its results may be difficult to generalize. Most leading empirical economists believe that in addition to having unbiased statistical estimates, they also are “unbiased” researchers: “Just the facts ma’am”. The fact that the research group the authors lead is called the Poverty Action Lab reinforces my view that there is a whiff of scientism in the whole endeavour. In a scientific field like cell biology, discoveries made about the structure of bacterial genomes have the attractive property that they are likely to be replicated in any other set of identical bacteria. But a pilot project which examines how truancy rates can be reduced for students in rural Tanzania may or may not work equally well in urban Lagos.
The institutionalist critique
Another view of economic development that comes from the new institutionalists holds that political structures are the ultimate determinant of economic outcomes. In the pure institutionalist view, there is no uncertainty around what are good or bad policies in terms of aggregate economic welfare. Bad policies are not chosen because of a mistake in analysis or a mechanism design flaw but simply because those bad policies happen to be in the interest of policy makers. Take North Korea as an extreme case. Does the Kim dynasty continue to choose policies that lead to starvation and economic destitution because of ideology or a lack of economic understanding? No. They choose policies that allow them to maintain their power, and these happen to involve the immiseration of their citizens. To be clear, Banerjee and Duflo are careful to acknowledge and accept many of the findings of the new institutionalists, but they also believe that good policies are themselves important for making good institutions. They cite examples of simple voting reforms in flawed democracies that nevertheless increased the power of ordinary citizens.
Now that decades of carefully gathered evidence have been collected, should we expect policy making to be better in the developing world? If RCTs in medicine help patients know what drugs to take, are policy makers swallowing better policy pills? This is something that Poor Economics does not address. I do not actually recall the authors citing any examples of policy changes that have been explicitly implemented because of a finding of an RCT. Of course the relationship between policy adoption and research is much more subtle than drug approvals, but it would at least be valuable to speculate on the mechanism of influence. There are examples of governments implementing policies, and then ex post facto research validating their success, but this is different. For example, Brazil’s successful Bolsa Família program was modelled on Mexico’s Oportunidades, which had associated RCTs. But how important were the research trials of Oportunidades in informing Bolsa Família? If the institutionalists are correct, then mountains of credible research will not be sufficient to change policy if such changes are not in the interest of policy makers.
I find myself on the side of the institutionalists. There may be some politically costless policies that governments would be willing to implement, but I expect this would be the exception rather than the rule. In my own (rich) country of Canada, the most extreme pockets of poverty are concentrated in isolated First Nations communities. For at least the last ten years the Attawapiskat First Nation has suffered acute infrastructure problems ranging from undrinkable water to housing shortages. Every single year politically embarrassing headline stories run in the major news outlets ranging from “suicide crisis” to “boil-water advisory” at Attawapiskat. And every year the prime minister of the day and his cabinet promise “action” (2011), “long-term solutions” (2016), and a “full partnership” (2019). Can the residents of Attawapiskat drink tap water in 2020? Of course not. The failure of economic development for First Nations is not a problem of technical knowledge; it is a failure of political systems. No amount of RCT findings will fix this problem. If a developed country like Canada with high state capacity fails to deliver public services to 2% of its most marginalized population for political reasons, should I expect the government of Kenya to do any better? Without a theoretical framework for the political causes of institutional poverty I do not see how Poor Economics could answer such a question.
Trouble in paradise
On several dimensions Poor Economics has aged poorly despite being published in 2012. There are numerous references to psychological studies showing how nudges can influence the poor’s decision making. Unfortunately the reproducibility crisis, which started getting traction in the early 2010s, has shown that most of these studies are complete noise. Awareness of the crisis peaked in 2015 when the Open Science Collaboration published research showing that only 36% of psychology studies from top-tier journals could be replicated with pre-registered and high-powered study designs. Even Daniel Kahneman, a founder of behavioural economics and a Nobel Prize winner, has admitted he “placed too much faith in underpowered studies”. There are at least eight references to “nudges” in Poor Economics and the authors make reference to studies that amount to priming (e.g. reminding students they are from a low caste). Priming studies have one of the worst replication rates even within the field of psychology.
The authors also speak positively about charter schools in the US and reference the studies that show that students who do not get in through a lottery system have worse outcomes. However, it is now clear that while these charter schools cannot select their students, they do select the parents by having rigorous and costly procedures to ensure that only a fraction of lottery winners ever enrol in the school. To be clear, I am not against charter schools. What I am against is bad statistical evidence in their favour. Lastly there is deworming, which the authors cite as a perfect example of a cheap intervention that has been demonstrated to be effective through solid research.
In Kenya, children who were given deworming pills in school for two years went to school longer and earned, as young adults, 20 percent more than children in comparable schools who received deworming for just one year.
Deworming has gone from “one of the most potent anti-poverty interventions of our time” to being highly contested. The original deworming paper by Baird et al. cited in the book has been shown to have methodological issues including serious missing data problems. The current consensus put together by the Cochrane Review is that deworming helps to improve childhood nutrition and increase bodyweight but does not lead to significant labour market returns. Ironically, the enthusiasm for deworming and the subsequent disillusionment with it closely parallel the story of microfinance that is discussed in Poor Economics.
The Credibility Revolution revisited
By focusing solely on having clean statistical identification, empirical economists help to perpetuate the hype cycle of research by neglecting the power of their statistical tests. All statistically significant findings are biased because an observation needs to be a certain number of standard deviations away from zero to be considered significant. This is also referred to as the winner’s curse in statistics. The magnitude of this bias can be explicitly expressed in terms of power (the lower the power, the higher the bias). Low-powered tests help to explain the Proteus phenomenon, in which the measured effect size of the first paper in a literature tends to be higher than subsequent estimates. Work by Ioannidis et al. shows that the median power of empirical economics papers could be as low as 18%. In such papers the estimated statistically significant effect size would easily be inflated by a factor of two.
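The link between power and inflation can be sketched numerically. The snippet below is a minimal illustration, not the calculation used in any cited paper: it assumes a normally distributed estimate and a two-sided 5% z-test, and the function name `power_and_exaggeration` is made up for this example. It reports the power of the test and the average size of a significant estimate relative to the true effect.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z_CRIT = STD_NORMAL.inv_cdf(0.975)  # two-sided 5% test, about 1.96


def power_and_exaggeration(true_effect, std_error):
    """Return (power, exaggeration ratio) for a two-sided z-test.

    Assumption: the estimate is normal with mean `true_effect` and
    standard deviation `std_error`. The exaggeration ratio is
    E[|estimate| given significance] / |true effect|.
    """
    shift = true_effect / std_error
    upper = Z_CRIT - shift    # standardized cutoff, significant and positive
    lower = -Z_CRIT - shift   # standardized cutoff, significant and negative
    p_upper = 1 - STD_NORMAL.cdf(upper)
    p_lower = STD_NORMAL.cdf(lower)
    power = p_upper + p_lower
    # Expected |estimate| accumulated over both rejection regions.
    mean_abs = (
        true_effect * (p_upper - p_lower)
        + std_error * (STD_NORMAL.pdf(upper) + STD_NORMAL.pdf(lower))
    )
    return power, mean_abs / power / abs(true_effect)


# True effects expressed in units of the standard error (illustrative values).
for shift in (0.5, 1.0, 2.0, 2.8, 4.0):
    power, ratio = power_and_exaggeration(true_effect=shift, std_error=1.0)
    print(f"effect = {shift:.1f} SE  power = {power:.2f}  exaggeration = {ratio:.2f}")
```

Under these assumptions the exaggeration is modest when power is around 80% and exceeds a factor of two once power falls below roughly 20%, which is in line with the claim above.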
The contested deworming paper referenced above provides a useful cautionary tale. The authors estimate that the deworming policy increases the average number of hours worked by 3.5 hours per week with a standard error of 1.42 (Table III, Panel A). Even taking this result at face value (and ignoring the methodology issues), the minimum observable statistically significant result would have to be roughly a 2.8 hour increase. In other words the observed effect is not much higher than the one they would have had to observe for it to be considered significant! In biomedicine, research grants are only provided to studies that have at least 80% power. Suppose that the true effect size of deworming is a 5% increase in hours worked (i.e. 1 extra hour per week), then the Baird et al. paper would have a paltry 10% power and an effect size bias ratio of 3.5 (i.e. the average statistically significant effect size would be a 3 hour increase)!
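The arithmetic behind these figures can be reproduced approximately. The sketch below takes the 3.5 and 1.42 quoted above as the estimate and its standard error, treats the one-hour true effect as the hypothetical it is, and assumes a normal estimate with a two-sided 5% test. The variable names are illustrative, and the inflation figure here conditions on a significant result with the correct (positive) sign, which is one of several reasonable definitions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z_CRIT = STD_NORMAL.inv_cdf(0.975)  # two-sided 5% test, about 1.96

STD_ERROR = 1.42    # uncertainty of the hours-worked estimate quoted in the text
TRUE_EFFECT = 1.0   # hypothetical true effect: 5% of a 20 hour week

# Smallest estimate that would be declared statistically significant.
threshold = Z_CRIT * STD_ERROR

# Power: probability that the estimate clears the threshold in either direction.
upper = (threshold - TRUE_EFFECT) / STD_ERROR
lower = (-threshold - TRUE_EFFECT) / STD_ERROR
power_at_true_effect = (1 - STD_NORMAL.cdf(upper)) + STD_NORMAL.cdf(lower)

# Mean of the estimate given that it is significant and positive
# (mean of a normal distribution truncated from below at the threshold).
mean_if_significant = TRUE_EFFECT + STD_ERROR * (
    STD_NORMAL.pdf(upper) / (1 - STD_NORMAL.cdf(upper))
)

print(f"significance threshold: {threshold:.2f} hours")
print(f"power at a 1 hour true effect: {power_at_true_effect:.1%}")
print(f"average significant estimate: {mean_if_significant:.2f} hours")
print(f"inflation ratio: {mean_if_significant / TRUE_EFFECT:.2f}")
```

Under these assumptions the threshold lands near 2.8 hours, power near 11%, and the inflation ratio near 3.5, close to the figures quoted above.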
The statistical biases that occur at the level of an individual test lead to an entire literature that has inflated effect sizes. The file drawer problem amounts to the winner’s curse on a systematic scale. The problem of inflated effect sizes is likely to be further exacerbated by researcher degrees of freedom, which lead to an underestimate of the measurement error and a further reduction in power. Consider this comment from Banerjee and Duflo’s famous microfinance paper:
This also affects power: the initial power calculations were performed when Spandana thought that 80 percent of eligible households would become clients very rapidly after the launch. In fact, the data shows that the proportion reached only 18 percent in 18 months (and stayed at just below 18 percent after two and a half years).
If your sample size is 80% smaller than expected then your power is going to be more than halved and any significant results will be massively inflated! It is not responsible to highlight any significant results from such a low-powered study. Researchers who align themselves with nudge economics, which includes Banerjee and Duflo, believe we live in a world of large effects waiting to be unlocked by small interventions. This view is epistemologically unsound: if such a world did exist, nudges could never be measured, because there would be too much background noise to observe anything!
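The "more than halved" claim can be checked with a short sketch. It follows the framing above and treats the shortfall as an 80% smaller sample. Two assumptions are not in the source: the study is taken to have been planned for 80% power (a hypothetical borrowed from the biomedical convention mentioned earlier), and the standard error is taken to scale with one over the square root of the sample size. The function name `power_after_shrink` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z_CRIT = STD_NORMAL.inv_cdf(0.975)  # two-sided 5% test, about 1.96


def power_after_shrink(planned_power, fraction_of_sample_kept):
    """Power of a two-sided z-test after the sample shrinks.

    Assumptions: the standard error scales with 1/sqrt(n), and the planned
    power comes almost entirely from the correct-sign tail of the test.
    """
    # Effect size, in standard-error units, implied by the planned power.
    planned_shift = Z_CRIT + STD_NORMAL.inv_cdf(planned_power)
    # A smaller sample inflates the standard error, shrinking that ratio.
    actual_shift = planned_shift * sqrt(fraction_of_sample_kept)
    return (1 - STD_NORMAL.cdf(Z_CRIT - actual_shift)) + STD_NORMAL.cdf(
        -Z_CRIT - actual_shift
    )


# Hypothetical plan: 80% power, but only 20% of the expected sample materializes.
realized_power = power_after_shrink(planned_power=0.80, fraction_of_sample_kept=0.20)
print(f"realized power: {realized_power:.1%}")
```

Under these assumptions the realized power falls to roughly a quarter, well under half of the planned 80%.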
Why aren’t we poor any more?
For ye have the poor always with you [Matthew 26:11]
We are all descendants of poverty. In 1750 the citizens of almost every region in the world had a per capita income of $1,400 a year, which is equivalent to contemporary Zambia, one of the poorest countries in the world. What has changed? Technology and science of course. But countries like Zambia remain poor even though they have access to technologies like telecommunications and international finance. In 1960 Botswana had only one-fourth of the per capita income of its landlocked neighbour Zambia. Today Botswana is more than six times richer than Zambia. The only way to go from having a per capita income of $60 to $8,000 is to have decades of high and sustained economic growth (which Botswana did).
I will not adjudicate the various competing theories about what leads to sustained rates of economic growth. Countries like South Korea, Botswana, and Taiwan were all able to escape poverty despite having incomes lower than the world average at the time of their national founding. The only commonality between these countries is that they were not “nudged” to prosperity. There may be many roads to riches but none of them are paved by RCTs.
The population that lives on $1.90-5.00 a day hardly has an easy existence either. Even if there were no measurable levels of “extreme” poverty, poverty itself would still be an issue. ↩
For example the effectiveness of a school voucher program in Nairobi is unlikely to produce the same results in Lima due to drastically different cultural and institutional forces. ↩
The history of Canada’s tragic relationship with First Nations is too long and complicated to go into here, but it shares a similar story to the way other aboriginal groups were marginalized and killed in the other settler colonies. ↩
Robert Pondiscio discusses this in his book How the Other Half Learns. ↩
Since the average number of hours worked in a week is 20, this amounts to a difference between a 17% and 14% increase, respectively. ↩
In other words, it was not a “statistically significant and large effect size”; it was a “large effect size because it was statistically significant”. ↩
Individual researchers often forget that their research is part of a system. There is the old riddle of the email you get sent every Friday night which accurately predicts the results of the weekend football game. How can this mysterious email predict the results of the game week after week? Simple: it sends different people different predictions every week, and by random chance you happen to be one of the few who see the string of successes. By not appreciating that you were part of a system, you did not “deflate” the impressiveness of the results by the overall number of predictions being made. Statistically significant findings in scientific research are equivalent to such miraculous predictions because results that are not significant are often either not discussed or not published. ↩