Thinking, Fast and Slow by Daniel Kahneman (Book Review)
Models are tools that help us better understand and predict some aspect of the world. There is a natural appreciation that a model must be a simplification of reality in the sense that extraneous details should be excluded. Failing to impose some level of parsimony on a model leads to an outcome like Borges’ map: an object so complicated and intractable that by the time it has solved a problem, one could simply have waited for the world to play out in real time. Assuming that a model receives some inputs and provides some useful outputs, there are two ways in which it could be labeled as “wrong”. First, a model may use irrelevant inputs or fail to use relevant ones. Second, even if a model has access to the necessary information, the way in which it obtains its outputs may depart from reality.
In the field of economics, how decisions are made is viewed through the lens of an agent solving an optimization problem; this could be some complex integral or a set of possible actions to pick from in a repeated game. Because the only humans who have ever solved a lifecycle income/saving problem in the form of a Bellman equation are economics graduate students, this model of the world clearly obtains outputs in a way different from the real world. However, as long as the outputs are fairly accurate, that is, they align with what is actually observed in the world, this dual formulation of the problem seems acceptable. Milton Friedman famously defended this position with the “as if” argument.
Consider the problem of predicting the shots made by an expert billiard player. It seems not at all unreasonable that excellent predictions would be yielded by the hypothesis that the billiard player made his shots as if he knew the complicated mathematical formulas that would give the optimum directions of travel, could estimate accurately by eye the angles, etc., describing the location of the balls, could make lightning calculations from the formulas, and could then make the balls travel in the direction indicated by the formulas. Our confidence in this hypothesis is not based on the belief that billiard players, even expert ones, can or do go through the process described; it derives rather from the belief that, unless in some way or other they were capable of reaching essentially the same result, they would not in fact be expert billiard players.
The “as if” argument also presents itself in areas of machine learning. For example, the massive VGG19 convolutional neural network has around 140 million parameters that were learned by training on over 14 million images. However, if I show a six-year-old a picture of a zebra for the first time, they will both learn the class permanently from a single image and draw analogies to other animals. Clearly the human brain is more efficient, at least at a certain stage of development. It is also debatable whether the process of learning via backpropagation has any analogs in the human brain. While this is an interesting philosophical debate, the “as if” argument actually requires the dual formulation to describe with reasonable fidelity the phenomena that are observed in the real world. However, thanks to the lifetime work of Daniel Kahneman and Amos Tversky, it has been thoroughly established that traditional economic models do not even attain an “as if” status, as they fail to describe persistent behavioral tendencies that human beings actually have in the real world.
Thinking, Fast and Slow gives a tour of the entire intellectual career of Kahneman, his collaborators (such as Tversky), and the field of behavioral economics. This subfield of economics sits at the intersection of psychology and decision theory, the latter being a highly technical method of modeling human behavior. As a quick background, modern economic theory began in the late 19th century with the marginal revolution, in which human decisions were framed in the context of marginal utility. Alfred Marshall’s Principles of Economics (1890) represents one of the foundational textbooks of this approach. However, the microfoundations of a decision theory framework were laid down shortly after WWII in the von Neumann-Morgenstern utility theorem, which provided an elegant way of deriving rational behavior from a set of simple axioms. As an example of one such axiom, consider transitivity, which allows one to infer new preferences from previously observed preferences (if you tell me that apples are better than bananas, and bananas are better than oranges, then you must believe that apples are better than oranges). Critically, these axioms also provided a framework for understanding preferences under uncertainty. Instead of having an apple or a banana, there might be a lottery in which there is a 50% chance of having an apple and a 50% chance of having a banana. This lottery should still be preferred over one with a 50% chance of having a banana and a 50% chance of having an orange.
One of the most seemingly well understood principles of human decision making under uncertainty is risk aversion. This stems from the fact that the utility of money is concave, such that the next dollar is worth less to an individual than the current one. Imagine you can have one of two lotteries: (Lottery A.1) Heads you get \$100, tails you get \$100 or (Lottery A.2) Heads you get \$210, tails you get \$0. While the second lottery has a higher average return (\$105), most people prefer the sure thing in Lottery A.1 because \$210 is not more than twice as valuable to an individual as \$100. That humans will not accept a bet based solely on the expected payoff was first explicitly modeled almost 300 years ago by Daniel Bernoulli, who developed the expected utility hypothesis. Modern decision theory is essentially built on the expected utility hypothesis, and this idea has formed the bedrock of risk management, especially for insurance markets.
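As a minimal sketch of how concavity produces this preference (the square-root utility function here is my assumption, one common concave choice, not anything specified in the book), expected utility rationalizes the sure thing even though Lottery A.2 has the higher expected payoff:

```python
import math

def expected_value(lottery):
    """Expected dollar payoff of a lottery given as (probability, payoff) pairs."""
    return sum(p * x for p, x in lottery)

def expected_utility(lottery, u=math.sqrt):
    """Expected utility under a concave utility function (sqrt is an assumption)."""
    return sum(p * u(x) for p, x in lottery)

lottery_a1 = [(0.5, 100), (0.5, 100)]  # sure $100
lottery_a2 = [(0.5, 210), (0.5, 0)]    # coin flip for $210 or nothing

print(expected_value(lottery_a1), expected_value(lottery_a2))      # 100.0 vs 105.0
print(expected_utility(lottery_a1), expected_utility(lottery_a2))  # 10.0 vs ~7.25
```

Despite the \$5 edge in expected value, the concave utility of the gamble falls well short of the sure thing, which is exactly the risk-averse choice most people make.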
Unfortunately, expected utility theory fails in the real world because individuals think in terms of gains and losses rather than net changes in wealth under different lotteries. For example, consider the following lottery: (Lottery B.1) Heads you lose \$100, tails you lose \$100 or (Lottery B.2) Heads you lose \$200, tails you lose \$0. Whereas most people opt for the sure thing when a “gain” is at stake, most people prefer Lottery B.2 in the case of a “loss” and go for broke. Can risk aversion explain both behaviors? No! What’s worse, the two choices are actually inconsistent. Because risk aversion stems from the concave value of money (the next dollar is worth less than the current one), losing money should make an individual even more risk averse, because the previous dollar is worth even more than the current one! In other words, losing \$100 would be bad, but losing \$200 would be even worse because the marginal utility of money at that lower level of wealth is even higher.
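The inconsistency can be made concrete with a small sketch (the \$1,000 starting wealth and square-root utility are both my assumptions): any concave utility over final wealth prefers the sure loss B.1, the opposite of what most people actually choose.

```python
import math

wealth = 1000.0  # assumed starting wealth (illustrative only)
u = math.sqrt    # assumed concave utility over final wealth

# Lottery B.1: lose $100 for sure; Lottery B.2: coin flip, lose $200 or $0.
eu_b1 = u(wealth - 100)
eu_b2 = 0.5 * u(wealth - 200) + 0.5 * u(wealth)

# Concavity makes the sure loss the "rational" expected-utility choice...
print(eu_b1 > eu_b2)  # True
# ...yet most people pick the gamble B.2, which expected utility cannot explain.
```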
Consider another lottery in which expected utility fails to conform to actual human choice. This is known as the Allais paradox, and it stems from the fact that humans factor the concept of regret into their decision making. (Lottery C.1) 100% chance to win \$1 million or (Lottery C.2) 89% chance to win \$1 million, 10% chance to win \$5 million, and a 1% chance to win \$0. Most people prefer Lottery C.1, because the extra chance to win a larger payout does not seem worth the small risk that something goes wrong. (Lottery D.1) 11% chance to win \$1 million, 89% chance to win nothing or (Lottery D.2) 10% chance to win \$5 million and a 90% chance to win \$0. Now most people prefer D.2, because the small decrease in the chance of winning is more than compensated by the increase in the size of the win. However, one of the axioms of decision theory is the independence axiom, which states that adding an additional outcome to both lotteries with equal probability should not change the preference order. In other words, if apples are better than bananas, then adding an equal probability of winning an orange to either lottery should not change my preference ranking. The orange in this case is the 89% chance to win \$1 million, which if added to lotteries D.1 and D.2 returns lotteries C.1 and C.2, but this time with the opposite choice! Again, psychologically what is going on here is that lotteries D.1 and D.2 do not have a regret factor, whereas lottery C.1 establishes a reference point of \$1 million which anchors subsequent decisions.
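The violation can be checked mechanically. This sketch (the normalization and grid are my choices) searches over candidate utility values for the \$1 million prize and finds no assignment under which expected utility prefers both C.1 over C.2 and D.2 over D.1:

```python
# Normalize u($0) = 0 and u($5M) = 1; search over u($1M) on a fine grid.
found = []
for u1m in [i / 1000 for i in range(1, 1000)]:  # 0 < u($1M) < 1
    u = {0: 0.0, 1: u1m, 5: 1.0}  # payoffs keyed in $ millions
    eu_c1 = u[1]
    eu_c2 = 0.89 * u[1] + 0.10 * u[5] + 0.01 * u[0]
    eu_d1 = 0.11 * u[1] + 0.89 * u[0]
    eu_d2 = 0.10 * u[5] + 0.90 * u[0]
    if eu_c1 > eu_c2 and eu_d2 > eu_d1:
        found.append(u1m)

print(found)  # [] -- no utility function rationalizes both choices
```

The algebra behind the empty result: preferring C.1 requires 0.11·u(\$1M) > 0.10·u(\$5M), while preferring D.2 requires exactly the reverse inequality, so no expected-utility maximizer can make both choices.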
Throughout the book Kahneman describes two forces competing in our brain for decision making: System 1 and System 2. System 1 is the fast thinker whose intuitions and heuristics have been crafted by thousands of years of evolution. Because it is much more prudent to have a bird in the hand than two in the bush when you might starve the next day, evolution makes sure we are not too greedy, and hence our preference for avoiding situations of regret. System 2 is the slow thinker that can be brought to bear when making decisions, but it is fundamentally lazy and avoids being employed. It prefers to rely on System 1 and will usually take its input as given. However, with enough energy, System 2 will acknowledge that there is something problematic about violating the independence axiom and would prefer to frame the problem in a way that maintains consistent choices across a range of situations.
For many statistical problems it is the leadership of System 1 that determines how humans behave and understand the stochastic world around them. One of the most important principles is that of What You See Is All There Is (WYSIATI). This mechanism explains why humans ignore a given question and replace it with one answerable from the data at hand. For example, the question “will the current political party be re-elected?” is reinterpreted as “is the current political party popular now?” WYSIATI also ensures that prior information is almost totally disregarded. When people are given a description of a person and then asked what profession they belong to, information about the base rates of employment in different professions has no impact on how likely someone is to assign them to a given category. That is, if someone sounds like they might be a farmer, it doesn’t matter whether they live in Iowa or New York State; they are judged just as likely to be a farmer. The fact that humans can acknowledge that farming is more common in Iowa than in New York State but then let the current evidence totally override this prior information may actually be a beneficial psychological phenomenon, as it leads to more entrepreneurship and risk taking. Sure, most businesses fail, and sure, the divorce rate is quite high, but my business/marriage is not affected by these prior rates.
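What the base rates ought to do is captured by Bayes’ rule. In this sketch the numbers are entirely hypothetical (they do not come from the book or from census data): suppose 5% of Iowans and 0.2% of New Yorkers work in farming, and the “sounds like a farmer” description is ten times more likely for actual farmers.

```python
def posterior_farmer(base_rate, likelihood_ratio=10.0):
    """P(farmer | description) via Bayes' rule, stated in odds form."""
    odds = likelihood_ratio * base_rate / (1 - base_rate)
    return odds / (1 + odds)

# Same evidence, very different conclusions once priors are respected:
print(round(posterior_farmer(0.05), 3))   # Iowa: ~0.345
print(round(posterior_farmer(0.002), 3))  # New York: ~0.02
```

Even strong evidence cannot close a 25-fold gap in base rates, which is exactly the prior information System 1 throws away.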
Thinking, Fast and Slow provides the reader with a bevy of delightful quirks in the way humans understand statistics and probability. Consider the Law of Small Numbers, which says that humans act as though small samples have the same properties as large samples. This is particularly problematic in academic research, whereby (most) professors falsely assume that they are powered to detect reasonable effect sizes. Suppose there is an urn with 100 balls, some of them red and some of them white. Most people believe that drawing four balls with replacement and getting 3 red and 1 white provides more information for an unbalanced urn (i.e. not a 50-50 colour split) than drawing 24 balls and getting 14 red and 10 white. Instead of grappling with sampling variability, System 1 is able to conjure up elaborate stories to fit the data. Consider the following statement: the States with the highest rates of mental health problems are likely to be conservative, more rural States that mainly vote Republican. Is this true? Almost certainly, but so is the converse: the States with the lowest rates of mental health problems are also likely to be in rural areas. It has nothing to do with their voting patterns, just their size: smaller samples yield higher variability.
The theoretical approach that Kahneman and Tversky used to understand how humans behave under uncertainty is prospect theory. Instead of net changes in wealth there are only gains and losses (because humans benchmark against some status quo). Losses are more painful than gains, leading to risk aversion in the domain of gains and risk-seeking behavior in the domain of losses. People would rather face a doubling of their loss if it means the possibility of recovering their losses, because the latter option is extremely ameliorating to the psyche. Note that this does not square with the theory of marginal utility.
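Tversky and Kahneman’s later (1992) calibration of the prospect-theory value function gives a feel for this. The parameters below (curvature 0.88, loss aversion 2.25) come from that paper and are just one common calibration, not the only one:

```python
def value(x, alpha=0.88, lam=2.25):
    """Prospect-theory value function: concave for gains, convex and steeper for losses."""
    return x ** alpha if x >= 0 else -lam * ((-x) ** alpha)

# A sure $100 loss versus a 50/50 chance of losing $200 or nothing:
sure_loss = value(-100)
gamble = 0.5 * value(-200) + 0.5 * value(0)

print(gamble > sure_loss)  # True: the gamble hurts less, so people go for broke
```

Because the value function is convex over losses, the gamble's pain is less than the sure loss's, which is precisely the risk-seeking-in-losses behavior that marginal utility alone cannot produce.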
My only complaint with this book is that the first part, which is more an overview of the field of psychology than of behavioral economics, is in need of a serious revision. Many of the studies which Kahneman cites have either been debunked or are failing to replicate on a systemic scale – in part because of the very Law of Small Numbers the book talks about! Since the book was published, Kahneman has himself acknowledged that many of these areas of psychology need to do serious robustness checks on their research methodology. Given that the book is quite long (400+ pages, along with two seminal papers in the appendix), I think Part I could be folded into the other sections and about 100 pages could be taken out in total.
Overall, Thinking, Fast and Slow is a great read, though it can at times be a bit repetitive in highlighting similarly idiosyncratic human behaviors. It is a great summary of the field of behavioral economics and of the work of an impressive academic career. While I have not been in academic economics research for many years now, I would be curious to hear the most up-to-date analysis of how these insights are being built into modern theory. With most of the growth in applied consumer behavior research being carried out in e-commerce, I wonder how prospect theory is being factored into the customer analytics and marketing strategies enabled by big data too.
If one draws four balls from an urn that is truly 50-50, 31.25% of the time you’ll draw 0 or 1 white balls. Whereas if you draw 24 balls, only about 27% of the time will there be 10 or fewer white balls (i.e. it is the more unlikely event if the urn is truly 50-50). ↩
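These numbers can be verified exactly with a quick binomial computation (no simulation needed):

```python
from math import comb

def prob_at_most(n, k, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p): at most k white balls in n draws."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

print(prob_at_most(4, 1))              # 0.3125  -> 31.25% chance of 0 or 1 white in 4 draws
print(round(prob_at_most(24, 10), 3))  # 0.271   -> ~27% chance of 10 or fewer white in 24 draws
```

So the "3 of 4" result is actually the more probable fluke under a balanced urn, which is why the larger sample carries more evidence against the 50-50 hypothesis.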