Gelman and Imbens: “Forward causal inference and reverse causal questions”

One of the traumas of graduate school for me was that my training in statistics re-oriented me away from big Why questions (“Why is there disorder/poverty/dictatorship in some places and not in others?”) toward smaller questions like “Does winning office enrich politicians?” The most influential advice I received (and which I now give) was to focus on the effects of causes rather than the causes of effects, or put differently to ask “What if” rather than “Why”. (I recall Jim Snyder saying in his seminar that your paper title really shouldn’t begin with the word “Why”; I am glad to say that I managed to point out one of his best-known papers is called “Why is there so little money in U.S. politics?”) There are important “what if” questions, but when we think big we usually start with a puzzling phenomenon and ask why it occurs: Why are there lots of political parties/coups/civil wars in some countries and not in others? In fact, before I fell under the influence of people like Don Rubin, Guido Imbens, and Jim Snyder, the main advice I had received for empirical work was to identify a puzzle (an anomaly in a dependent variable) and then explain it in my paper, and many people continue to adhere to that approach. What I and others struggle with is the question of how the “what if” approach relates to the “why” approach. If we want to explain the “why” of a phenomenon (e.g. polarization, conflict, trust), do we do it by cobbling together a bunch of results from “what if” studies? Or should we stay away from “why” questions altogether?

Gelman and Imbens have taken on these issues in a short paper that puts “Why” questions in a common statistical framework with “What if” questions; in the title, the “Why” questions are “reverse causal questions” while the “What if” approach is covered by “forward causal inference”. I think their main contribution is to clarify what we mean when we ask a “Why” question: we mean that there is a relationship between two variables that we would not expect given our (perhaps implicit) statistical model of a particular phenomenon. Thus when we ask a “Why” question, we are pointing out a problem with our statistical model, which should motivate us to improve the model. Using the example of cancer clusters, in which the anomaly is that some geographic areas have unusually high cancer rates, Gelman and Imbens highlight one way to improve the model: add a variable. When we add that variable we might think of it as a cause that we could potentially manipulate (e.g. the presence of a carcinogenic agent) or as a predictor (e.g. the genetic background of the people in the area), but the idea is that we have explained the anomaly (and thus provided an answer to the “why?” question) when the data stops telling us that there is an association we don’t expect.
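To make the "add a variable" move concrete, here is a minimal sketch with made-up numbers. The regions, the exposure indicator, and all counts are hypothetical and are not from Gelman and Imbens. Region A looks like a cancer cluster in the raw comparison, but the regional gap vanishes once we compare people with the same exposure status.

```python
# Hypothetical counts: (cancer cases, residents) by region and exposure status.
COUNTS = {
    ("A", True): (16, 80),
    ("A", False): (1, 20),
    ("B", True): (4, 20),
    ("B", False): (4, 80),
}


def cancer_rate(region, exposed=None):
    """Cases per resident in a region, optionally within one exposure stratum."""
    cells = [
        cell
        for (r, e), cell in COUNTS.items()
        if r == region and (exposed is None or e == exposed)
    ]
    cases = sum(c for c, _ in cells)
    residents = sum(n for _, n in cells)
    return cases / residents


# The anomaly: region A has a higher overall rate than region B.
raw_gap = cancer_rate("A") - cancer_rate("B")

# After conditioning on exposure, the regional gap is zero in both strata.
gap_among_exposed = cancer_rate("A", True) - cancer_rate("B", True)
gap_among_unexposed = cancer_rate("A", False) - cancer_rate("B", False)

print(f"raw gap between regions: {raw_gap:.2f}")
print(f"gap among the exposed:   {gap_among_exposed:.2f}")
print(f"gap among the unexposed: {gap_among_unexposed:.2f}")
```

In this toy setup the anomaly is entirely a matter of region A having more exposed residents; in real data one would ask whether the remaining gap is statistically distinguishable from zero rather than exactly zero.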

One of the key points the authors make is that there may be multiple answers to the same “why?” question. What do they mean? My reading was: Continuing with the cancer cluster example, the puzzling association might go away both when we control for the presence of a carcinogenic agent and when we control for genetic background; this particular indeterminacy is an issue with statistical power, because the anomaly “goes away” when the data no longer reject the null hypothesis for the particular variable under consideration. There are thus two explanations for the cancer clusters, which may be unsatisfactory but is correct under their interpretation of “Why” questions and how they are resolved.

A related point is that there are multiple ways to improve the model. The authors emphasize the addition of a variable, I think because they want to relate to the causal inference literature (and so the question is whether the variable you add to explain an anomaly can be thought of as a “cause”), but elsewhere in the paper they mention statistical corrections for multiple comparisons (particularly relevant for the cancer cluster example) and the introduction of a new paradigm. I wondered why they don’t discuss the option of accepting that the anomalous variable is a cause (or at least a predictor) of the outcome. Using an example from the paper, this would be like looking at the height and earnings data and concluding that height actually does influence (or at least predict) earnings, which means changing the model to include height (in which case there is no longer an anomaly). I guess the attractiveness of this solution depends on the context, and particularly on how strong one’s a priori scientific reasons for ruling out the explanation are; in the case of cancer clusters, you might be correct in saying that there is no good reason to think that fine-grained location on the earth’s surface actually would affect cancer, and thus that there must be some other environmental cause, even if that cause is something highly related to position on the earth’s surface, such as magnetic fields or soil deposits.

A lingering question here is about the distinction between a predictor and a cause.

Friedman: Capitalism and Freedom

I finally got around to reading Milton Friedman’s classic Capitalism and Freedom. After spending time this summer reading Adam Swift’s Political Philosophy: A Beginner’s Guide, which IIRC recommends Friedman’s discussion of equality, I find Capitalism and Freedom unimpressive as a work of political philosophy, but this is partly for two reasons that are not Friedman’s fault.

First, part of why I found the book unimpressive is that the ideas are so familiar to me. When I first heard the Beatles I thought they sounded just like everyone else; it took me a while to figure out that this was partly because so many others had copied them. Similarly, Friedman’s ideas have been recycled so much — not just in economics departments but in political discourse, mainly from the right — that they hardly seem revolutionary anymore.

Relatedly, what he is doing is applying basic economic analysis to questions of political philosophy. Until recently this was the only kind of political philosophy I had ever really engaged in: explaining the efficiency losses associated with government intervention, identifying market failures that justify government intervention, etc. The core ideas about the proper role of government in this book are applications of standard economic theory, with a healthy portion of enthusiasm about freedom thrown in.

Although Friedman is of course a strident free markets guy and the prefatory material introduces the book as a political tract, I was surprised by how modest Friedman is about the extent to which his philosophy can provide answers to tough political questions. He states this clearly on page 32:

Our principles offer no hard and fast line how far it is appropriate to use government to accomplish jointly what it is difficult or impossible for us to accomplish separately through strictly voluntary exchange. In any particular case of proposed intervention, we must make up a balance sheet, listing separately the advantages and disadvantages. Our principles tell us what items to put on one side and what items on the other and they give us some basis for attaching importance to the different items.

Thus in discussing natural monopolies, he admits we are “choosing among evils” (public monopoly, public regulation, or private monopoly) and provides some thoughts on which might be less evil (hint: often it’s private monopoly); in discussing paternalism, he recognizes that the state must restrict freedom to provide for children and the insane, but that after that “there is no formula to tell us where to stop.” This is I think an accurate view of what a commitment to “freedom,” combined with the tools of welfare analysis from microeconomics, yields in terms of policy proposals: not much. That’s not to say that this book stops short of providing lots of policy proposals. In fact, Capitalism and Freedom is much more interesting as a set of provocative policy proposals than a statement of political philosophy. But the key point is that to arrive at these policy proposals you need more than Friedman’s stated commitment to freedom plus standard ideas from microeconomics about the tradeoffs involved in government intervention in markets. Mostly, you need a lot of beliefs about the nature of the social world, e.g. the degree to which high marginal tax rates encourage tax avoidance and evasion. On a superficial reading one can fail to recognize the importance of these beliefs on empirical matters and read this as a coherent work of philosophy in which the policy prescriptions follow from a commitment to freedom and some basic ideas about how markets work. In fact, the interesting ideas in the book (like the claims about how markets tend to disadvantage those who discriminate) are commitments to contestable causal claims just as much as they are embodiments of a high value placed on freedom, or more so.

Another way to put this is that policy proposals from left, right, and center (in liberal democracies like the US, UK, France) could be justified on the basis of principles in the first two chapters of Capitalism and Freedom. The same of course can be said for other influential groundings of political philosophy, such as the Rawlsian thought experiment about the original position. Clarifying normative values and even proposing ways for prioritizing among them seems to fail to get us very far toward policy recommendations, because in all important cases there is a large set of empirical facts that stand between principles and policy outcomes.

A few notes on things I found interesting:

  • Friedman argues that political freedom requires a market economy because dissent requires resources; in a “socialist” economy (by which he means one in which private property does not exist, or at least where the state controls the means of production), how could one finance a program of political dissent? Where would Marx find his Engels?
  • Like Buchanan and Tullock in The Calculus of Consent (published in the same year — 1962), Friedman has some nice insights into how voluntary economic exchange and government intervention relate. One reason to prefer market activity is that you get “unanimity without conformity,” in the sense that everyone agrees to the outcomes (excluding market failures of course) and you still get a variety of outcomes. Again putting market exchanges in political terms, Friedman portrays market exchange as analogous to proportional representation, in the sense that everyone gets what she votes for, without having to submit to the will of the majority.
  • The chapter on education is a strident case for revising the way in which government supports education. With respect to higher education I find him particularly convincing. The analogy that was relevant when he was writing was the GI Bill, a key feature of which was that the government supported veterans’ education wherever they chose to get it (within an approved list of schools); by contrast, at the university level the individual states support education (particularly of their own residents) only at the public universities in that state. I agree that this does not make a lot of sense, and would favor reform in this area if I didn’t think it would lead to a large reduction in support for education overall. It also made me wonder how much the move toward government loans and grants for education was in response to arguments like these, and to what extent this has replaced public funding for public universities.
  • Friedman makes the case that the voucher system would tend to help economically disadvantaged minorities, in part by unbundling schooling from residential location decisions: a poor single mother who wants to invest in her child’s education may have a better chance under a voucher system, where she could save money and purchase that good like any other, than she does under the current system, in which (assuming that private school is prohibitively expensive) she would have to move the family to an expensive town to benefit from better schools — in other words, buy a whole package of goods in order to get one thing she wants.
  • In the chapter on discrimination, Friedman follows up this discussion of segregation and schooling by highlighting the importance of attitudes of tolerance: In addition to getting the government out of schooling, “we should all of us, insofar as we possibly can, try by behavior and speech to foster the growth of attitudes and opinions that would lead mixed schools to become the rule and segregated schools the rare exception.” In the margin here I wrote “this has happened”: not the part about privatization, but rather that public attitudes have shifted (at least where I live) to where a classroom of white faces is a problem. The avidity with which elite private schools and universities pursue diversity suggests that a school system with more choice and competition would not have whiter schools. I somehow doubt however that it would have fewer schools in which almost all students are poor minorities. It makes me want to know more about experiments with school choice. For most of the claims he makes about the virtues of school choice, it would seem that almost everything depends on the way in which you deal with undesirable schools and pupils, and I don’t recall reading anything about that here.

Covariate balance, Mill’s methods, and falsificationism

I presented my paper on polarization and corruption at the recent EPSA conference and encountered what was to me a surprising criticism. Having thought and read about the issues being raised, I want to jot down some ideas that I wish I had been able to say at the time.

First, some background: In the paper, I use variation in political polarization across English constituencies to try to measure the effect of polarization on legislative corruption (in the form of implication in the 2009 expenses scandal). One of the points I make in the paper is that although others have looked at this relationship in cross-country studies, my paper had the advantage that the units being compared were more similar on other dimensions than in the case of the cross-country studies, which means that my study should yield more credible causal inferences.

The criticism I encountered was that in seeking out comparisons where the units are as similar as possible, I was doing something like Mill’s Method of Differences, which had been shown to be valid only under a long list of unattractive assumptions, including that the process being considered be deterministic, monocausal, and without interactions.

Now, in seeking out a setting where the units being compared are as similar as possible in dimensions other than the “treatment,” I thought I was following very standard and basic practice. No one wants omitted variable bias, and it seems very straightforward to me that the way to reduce the possibility of omitted variable bias when you can’t run an experiment is to seek out a setting where covariate balance is higher before any adjustment is done. I think of the search for a setting with reasonable covariate balance as a very intuitive and basic part of the “design-based” approach to causal inference I learned about from Don Rubin and Guido Imbens, but also a key part of scientific inference in all fields for a long time. In response to the criticism I received, I said something like this — pointing out that the critic had also raised the possibility of omitted variable bias and thus should agree with me about the importance of restricting the scope for confounding.
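One common way to put a number on covariate balance is the standardized mean difference: the gap in a covariate's mean between "treated" and "control" units, divided by a pooled standard deviation. The sketch below is illustrative only. The covariate values are invented, and this is not the balance check used in the paper.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """Difference in covariate means, in units of the pooled standard deviation."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


# Hypothetical covariate values for two comparisons of high- vs low-polarization units.
less_similar_units = standardized_mean_difference([2, 4, 6], [1, 3, 5])
more_similar_units = standardized_mean_difference([2, 4, 6], [1.5, 3.5, 5.5])

print(f"less similar units: SMD = {less_similar_units:.2f}")
print(f"more similar units: SMD = {more_similar_units:.2f}")
```

A smaller value before any adjustment means the comparison leans less on modeling to remove differences in that covariate, which is the sense in which one setting can be better balanced than another. It says nothing, of course, about covariates that were never measured.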

I didn’t know at the time how to respond directly to the claim that I had sinned by partaking of Mill’s methods, but in the course of reviewing a comparative politics textbook (Principles of Comparative Politics, 1st edition (2009), by Clark, Golder, and Golder) I have reacquainted myself with Mill’s methods and I think I see where my critic was coming from — although I still think his criticism was off the mark.

What would it mean to use Mill’s method of differences in my setting? I would start with the observation that MPs in some constituencies were punished for being implicated in the scandal more heavily than others. I would then seek to locate the unique feature that is true of all of the constituencies where MPs were heavily punished and not true of the constituencies where they were not heavily punished. To arrive at the conclusion of my paper (which is that greater ideological distance between the locally-competitive candidates, i.e. (platform) polarization, reduces the degree to which voters punish incumbents for corruption), I would have to establish that all of the places where MPs were heavily punished were less polarized than the places where MPs were lightly punished, and that there was no other factor that systematically varied between the two types of constituencies.

This would clearly be kind of nuts. Electoral punishment is not deterministically affected by polarization, and it is certainly affected by other factors, so we don’t expect all of the more-polarized places to see less punishment than all of the less-polarized places. Also, given the countless things you can measure about an electoral constituency, there is probably some other difference that seems to be related to electoral punishment, but Mill’s method doesn’t tell you what features to focus on and what to ignore. Mill’s method is essentially inductive: you start with the difference you want to explain, and then you consider all of the possible (deterministic, monocausal) explanations until you’re left with just one. This process seems likely to yield an answer only when you have binary outcomes and causes, a small dataset, and the willingness to greatly constrain the possible causes you’re willing to consider. The answer that the method yields would be suspect for all of the reasons rehearsed in the Clark, Golder and Golder book and the sources they cite.

I am not using Mill’s method of differences. I have a postulated relationship between polarization and electoral punishment, and I am attempting to measure that relationship using observational data. I am choosing to focus on units that are similar in other respects, but I am not doing this in order to inductively arrive at the one difference that must explain a given difference in outcomes; rather, I am focusing on these units because by doing so I reduce the scope for unmeasured confounding.

Clark, Golder, and Golder contrast Mill’s methods with the “scientific method” (a great example of a mainstream political science textbook extolling falsificationism and what Clarke and Primo criticize as the “hypothetico-deductive model”), which they argue is the right way to proceed. The virtue of the scientific method in their presentation is that you can make statements of the kind, “If my model/theory/explanation relating X and Y is correct, we will observe a correlation between X and Y” and then, if we don’t observe a correlation between X and Y, we know we have falsified the model/theory/explanation. The point of limiting the possibility of unobserved confounding is that the true logical statement we want to evaluate is “If my model/theory/explanation is correct and I have correctly controlled for all other factors affecting X and Y, we will observe a correlation between X and Y.” To the extent that we remain unsure about the second part of that antecedent, i.e. to the extent that there remains the possibility for unmeasured confounding, we are unable to falsify the theoretical claim: when we don’t observe the predicted correlation between X and Y we are unsure whether the model is falsified or the researcher has not correctly controlled for other factors. By seeking out settings in which the possibility for unmeasured confounding is restricted, we thus try to render our test as powerful as possible with respect to our theoretical claim.
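The logic of that last step can be checked by brute force. The sketch below enumerates the four possible "worlds" (theory true or false, confounding handled or not) and keeps those that are compatible with what we observe, under the conditional "if the theory is correct and confounding is handled, we observe the correlation." The function and argument names are mine, chosen for illustration.

```python
from itertools import product


def surviving_worlds(observed_correlation, controls_known_good=False):
    """Return the (theory_true, controls_good) pairs consistent with the observation."""
    worlds = []
    for theory_true, controls_good in product([True, False], repeat=2):
        if controls_known_good and not controls_good:
            continue  # a credible design rules these worlds out in advance
        # The conditional only promises a correlation when both parts hold.
        prediction_applies = theory_true and controls_good
        if prediction_applies and not observed_correlation:
            continue  # this world is contradicted by the data
        worlds.append((theory_true, controls_good))
    return worlds


# No correlation observed, design of unknown quality: the theory can survive.
print(surviving_worlds(observed_correlation=False))

# No correlation observed, confounding credibly ruled out: the theory is falsified.
print(surviving_worlds(observed_correlation=False, controls_known_good=True))
```

In the first case the theory is still true in one of the surviving worlds (the one where confounding was not handled), so nothing has been falsified. In the second case the only surviving world is one in which the theory is false. Real designs fall between these extremes, which is why reducing the scope for confounding makes the test more informative by degrees.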

I think this is an important point with respect to two important audiences.

First, I think it is important with respect to the comparative politics mainstream, or more broadly the part of social science that is not too concerned with causal inference. Clark, Golder and Golder is a very impressive book in many respects but it does not trouble its undergraduate audience much with the kind of hyper-sensitivity to identification that we see in recent work in comparative politics and elsewhere in social sciences. The falsificationist approach they take emphasizes the implications we should observe from theories without emphasizing that these implications should be observed if the theory is correct _and_ the setting matches the assumptions underlying the theory, at least after the researcher is done torturing the data. The scientific method they extol is weak indeed unless we take these assumptions seriously, because no theory will be falsified if we can so easily imagine that the consequent has been denied due to confounding rather than the shortcomings of the theory.

Second, I think it is important with respect to Clarke and Primo’s critique of falsificationism and the role of empirical work in their suggested mode of research. I agree with much of their critique of the way political scientists talk about falsifiable theories and hypothesis tests, and especially with their bottom-line message that models can be useful without being tested and empirical work can be useful without testing models. But their critique of falsificationism as practiced in political science (if I recall correctly – I don’t have the book with me) rests largely on the argument that you can’t test an implication of a model with another model, i.e. that the modeling choices we make in empirical analysis are so extensive that if we deny the consequent we don’t know whether to reject the theoretical model or the empirical model. My point is that the credibility of empirical work varies, and this affects how much you can learn from a hypothesis test. If someone has a model that predicts an effect of X on Y, we learn more about the usefulness of the model if someone does a high-quality RCT measuring the effect of X on Y (and everyone agrees that X and Y have been operationalized as the theory specified, etc) than we do if someone conducts an observational study; similarly, we learn more if someone does an observational study with high covariate balance than we do if someone does an observational study with low covariate balance. In short, I suspect Clarke and Primo insufficiently consider the extent to which the nature of the empirical model affects how much we can learn about the usefulness of a theoretical model by testing an implication of it. This suggests a more substantial role for empirical work than Clarke and Primo seem to envision, but also a continued emphasis on credibility through e.g. designing observational studies to reduce the possibility of unmeasured confounding.

Elster (2000) on institutional solutions for procrastination-prone political institutions

Elster does some interesting speculating in Ulysses Unbound (p. 143) about how delay in the implementation of rules may help political institutions to overcome time inconsistency. In this post I flesh out and extend his ideas a bit.

Suppose that a legislature’s preferences were such that it wanted balanced budgets in the future but not now. From the perspective of year 0, the legislature would like to be able to run a deficit in year 0 and year 1 but not after that; from the perspective of year 1, however, it would like to run a deficit in year 1 and year 2 but not after that. Like Augustine, the legislature wants chastity — but not yet. (One reason for this tendency to procrastinate might be that legislators benefit while in office from kickbacks on government contracts, and the median legislator expects to be in office for two more years.)

The solution Elster suggests (based apparently on work by Tabellini and Alesina from 1994) is to have a rule-making process by which any balanced-budget rule (or modification to that rule) can only take effect after two years. Under that process, the current legislature would pass a balanced budget rule, knowing that it will not constrain the legislature in the near future; a year later, the legislature would want to pass a new law delaying the original law by one year (because legislators expect the rule to hurt them when it goes into effect in the next year), but given that any such change would take effect after _two_ years, the legislature chooses to stick with the balanced budget rule.

Elster mentions two kinds of institutional delay that might make this possible: one is delay required to pass an amendment after it is first proposed (delay between proposal and adoption), which is a feature of a number of constitutions (Elster mentions Norway, Sweden, France, Bulgaria, and Finland); the other is delay to implement an amendment once passed. He does not make very clear how these interact, but after thinking about it for a few minutes here’s how it seems to me. In the case of a legislature that can’t get around to balancing its budget (like the one above), two years of the second kind of delay (implementation delay) is necessary to induce the legislature to pass the balanced budget rule; two years of any combination of the first and second kinds of delay (adoption and implementation delay) is sufficient to prevent the legislature from revoking the rule. For example, if the constitution requires two time periods between proposal and passage, with no delay for implementation, then the legislature would never propose an amendment revoking the balanced budget rule, knowing that by doing so it would only be making deficits possible in the future, when it actually wants a balanced budget. But without at least two years of delay for implementation, the rule would not be passed in the first place: without implementation delay, the legislature would choose not to pass the balanced budget rule no matter when it finally comes up to a vote, no matter how long the adoption delay. The most straightforward solution is thus to have two years of delay between passage of any rule (or amendment) and implementation.
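Here is a minimal sketch of that reasoning, under simplifying assumptions of my own rather than Elster's: the sitting legislature wants deficits in the current year and the next (a two-year horizon), it votes for the rule only if the rule first binds outside that horizon, and it proposes a repeal only if the repeal would free up a year inside its horizon.

```python
HORIZON = 2  # years (this year and next) in which the sitting legislature wants a deficit


def wants_deficit(years_from_now):
    """True if the legislature deciding today wants a deficit that many years ahead."""
    return 0 <= years_from_now < HORIZON


def passes_rule(implementation_delay):
    """At the moment of the vote, pass the rule only if it first binds beyond the horizon."""
    return not wants_deficit(implementation_delay)


def proposes_repeal(adoption_delay, implementation_delay):
    """Propose a repeal only if it would permit a deficit in a year the proposers care about."""
    first_year_freed = adoption_delay + implementation_delay
    return wants_deficit(first_year_freed)


def rule_is_stable(adoption_delay, implementation_delay):
    """The rule must both get passed and never be worth repealing."""
    return passes_rule(implementation_delay) and not proposes_repeal(
        adoption_delay, implementation_delay
    )


for adoption, implementation in [(0, 0), (2, 0), (1, 1), (0, 2)]:
    print(
        f"adoption delay {adoption}, implementation delay {implementation}: "
        f"passed={passes_rule(implementation)}, "
        f"repeal proposed={proposes_repeal(adoption, implementation)}, "
        f"stable={rule_is_stable(adoption, implementation)}"
    )
```

Note that the adoption delay never enters `passes_rule`: the vote is evaluated by whoever is sitting when it happens, so only the implementation delay can make passage attractive. Any combination of delays summing to two years blocks repeal, but only two years of implementation delay gets the rule passed in the first place.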

Elster (2000) on reasons for self-binding

In Ulysses Unbound (2000), Elster considers situations where an actor would benefit from “self-binding” (constraining one’s own behavior) and devices that are used to accomplish this. In other words, the topic is commitment problems and commitment devices — an important theme in political science research over the past couple of decades.

Before I get to the more political aspects of Elster’s work, I want to explicate his discussion of reasons for self-binding, which helped me to see political commitment problems in a somewhat broader perspective.

In another blog post, I’ve talked about the idea that emotions can provide the corrective to rational self-interest: they impose costs and benefits that make otherwise non-credible threats and promises credible. In most cases, however, the passions are the enemy of self-interest, or at least one conception of self-interest. By passions, Elster refers to “emotions proper” (like anger, shame, fear) but also “states” such as drunkenness, sexual desire, or cravings for addictive drugs. The idea here is that these passions can take over and dominate our behavior in self-destructive ways. The clearest example is “blind anger” that leads someone to lash out in ways that he or she will certainly later regret. The discussion here focuses on clarifying the different ways in which passions can lead to self-destructive behavior, and corresponding attempts to “pre-commit” i.e. take actions that will minimize the self-destructive behavior. For example, if the passion is not too strong, it may be sufficient to take measures that will make the self-destructive behavior more costly, such as bringing one’s wife to a party to prevent oneself from getting too drunk or flirting with coworkers. If the passion is so strong that one practically ignores all other considerations and will act self-destructively no matter the cost, then one may need to take steps to avoid the passion entirely, such as not going to the office party. In I.7 Elster addresses these issues in the context of addiction, which is a particular form of passion (leading to self-destruction), in response to which addicts have developed various commitment strategies, with varying success.

Another key commitment problem discussed in Ulysses Unbound is the time inconsistency produced by hyperbolic discounting. The basic idea here is that actors may discount future payoffs in a way that leads to inconsistent action over time: given the choice between a big payoff in two years and an even bigger one in three years, I may prefer to wait longer for the bigger payoff when I think about it today, but not when I reconsider in a year. (This kind of inconsistency, which apparently helps to explain procrastination and suboptimal saving behavior, is ruled out by the standard exponential discounting but is consistent with hyperbolic discounting.) This creates a conflict within the self: today’s self wants to constrain tomorrow’s self. Although Elster does not emphasize this point, the intertemporal conflict created by hyperbolic discounting is clearly analogous to the conflict caused by passions: discounting-based time inconsistency can be thought of, it seems, as a kind of predictable passion that strikes when payoffs become more immediate.
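The reversal is easy to reproduce numerically. The sketch below uses the textbook functional forms, exponential discounting with weight delta^t and hyperbolic discounting with weight 1/(1 + k*t). The payoff sizes and the parameter values are arbitrary choices of mine, not anything from Elster.

```python
def exponential(delay, delta=0.9):
    """Exponential discount weight for a payoff `delay` years away."""
    return delta ** delay


def hyperbolic(delay, k=1.0):
    """Hyperbolic discount weight for a payoff `delay` years away."""
    return 1.0 / (1.0 + k * delay)


def prefers_later(discount, small, small_delay, large, large_delay):
    """True if the larger, later payoff has the higher discounted value."""
    return large * discount(large_delay) > small * discount(small_delay)


SMALL, LARGE = 10, 14  # hypothetical payoffs: SMALL arrives one year before LARGE

for name, discount in [("exponential", exponential), ("hyperbolic", hyperbolic)]:
    today = prefers_later(discount, SMALL, 2, LARGE, 3)      # payoffs 2 and 3 years away
    next_year = prefers_later(discount, SMALL, 1, LARGE, 2)  # same payoffs, a year closer
    print(f"{name}: waits for the larger payoff today={today}, a year from now={next_year}")
```

Under exponential discounting the comparison always reduces to whether LARGE times delta exceeds SMALL, whatever the starting date, so the ranking cannot flip. Under hyperbolic discounting the relative weight on the nearer payoff grows as both payoffs approach, which is what produces the change of mind.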

The last reason for pre-commitment Elster considers is anticipated preference change. The idea is that one can anticipate that one’s preferences will change with time, and that one may want to guard oneself against this happening. In the Russian nobleman example provided (and drawn from Derek Parfit), this happens because the current self feels at odds with the anticipated future self: the politically radical young self anticipates that he will become more conservative in the future, so he may want to fight the future self by e.g. giving his resources to radical political causes before his future self can give those resources to conservative political causes. A slightly different phenomenon is highlighted by the Amish and other cultural groups (Islamists, Confucians) that take steps to prevent preference change by shielding themselves from information about competing lifestyles — what Elster calls “self-paternalism.” These examples differ somewhat in that the current “self” does not seek to undermine the future self, with whom it feels in conflict, but rather the current self and the future self have the same interest in preventing preference changes that presumably would lead to the future self being less happy.

Elster (2000) on emotions as credibility enhancers

As part of my summer reading program, I recently read Jon Elster’s Ulysses Unbound (2000) and will be posting some thoughts on it here. In this first installment I’ll discuss the idea that emotions may provide a form of self-binding that can help to overcome self-interest.

In section I.5, Elster considers provocative work by Frank and Hirshleifer that claims (separately) that emotions like envy, anger, guilt, or honesty “could have evolved because they enhance our ability to make credible threats.” The basic idea here is that in some situations an actor would benefit from being able to make threats, such as the threat to refuse a small offer in an ultimatum game, but that those threats are not credible without the actor feeling anger or another “irrational” emotion. The purpose of some emotions, in this view, is to produce privately-experienced costs and benefits that can allow players to make threats and promises that are otherwise non-credible. As Elster points out, it is not the emotions per se that can help actors overcome commitment problems; rather, it is the reputation for being emotional that does it (i.e. other actors’ knowledge of one’s privately-experienced emotional costs and benefits), and actually experiencing these emotions could be a good way to develop that reputation.
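A toy ultimatum game shows the mechanism. Everything in this sketch is an assumption made for illustration: the pie size, the "insult threshold," the size of the anger cost, and the rule that the responder accepts only when accepting leaves her strictly better off than rejecting. The proposer is assumed to know the responder's anger cost, which is exactly the reputation Elster says is doing the work.

```python
PIE = 10  # units to be divided; the proposer keeps PIE minus the offer


def responder_accepts(offer, anger_cost, insult_threshold=4):
    """Accept only if the offer, net of any anger at an insulting offer, beats rejecting (0)."""
    felt_cost = anger_cost if offer < insult_threshold else 0
    return offer - felt_cost > 0


def proposer_offer(anger_cost):
    """The proposer makes the smallest offer that the responder is known to accept."""
    acceptable = [o for o in range(PIE + 1) if responder_accepts(o, anger_cost)]
    return min(acceptable)


print("offer to a purely self-interested responder:", proposer_offer(anger_cost=0))
print("offer to a responder known to get angry:    ", proposer_offer(anger_cost=3))
```

The angry responder ends up with a larger share without ever having to reject anything, because the threat to reject low offers is now believable. If the proposer did not know about the anger cost, the emotion would produce costly rejections rather than better offers.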

On page 51 Elster makes a nice move in linking ideas about self-interest and morality to Frank and Hirshleifer’s ideas on the evolutionary advantages of the moral emotions. First he clarifies that the emotions Frank and Hirshleifer are inserting into behavior are really standing in for side benefits and side penalties that make a given behavior sustainable in a repeated game with a given payoff structure and discount rate. He then goes on to point out how this is “essentially turning an old argument on its head”:

From Descartes onward it has often been argued that prudence or long-term self-interest can mimic morality. Because morality was thought to be more fragile than prudence, many welcomed the idea that the latter was sufficient for social order. By contrast, if one believes that self-interest is likely to be shortsighted rather than farsighted, the moral emotions might be needed to mimic prudence.

To restate the point somewhat, if we can define a type of behavior that is the “moral course of action” (e.g. to give generously in a dictator game), and we can identify the purely self-interested course of action (e.g. give nothing), then any discrepancy between the two can be bridged by “moral emotions” that the players experience (e.g. a warm glow from giving, or guilt from not giving). This clarification highlights what might be dissatisfying about this work (as reported by Elster), in common with e.g. the classic work on the paradox of voting or even Levi’s invocation of normative values in explaining tax compliance: any apparently paradoxical behavior can be explained by saying that the payoffs have been misjudged. But this is not what Frank and Hirshleifer are doing, presumably: they want to explain the existence of emotions, which are privately experienced costs and benefits provoked by interactions with others, not the paradox of cooperation; their interesting point is that these emotions may serve at least in part to help us develop reputations that make our (self-serving) threats and promises credible.

Levi: “Quasi-Voluntary Compliance”

A key term in Margaret Levi’s Of Rule and Revenue (1988) is “quasi-voluntary compliance.” As far as I can tell, Levi uses this term to refer to a situation in which citizens comply with the state’s demands (e.g. to pay taxes) out of a combination of strategic and normative considerations. The strategic considerations involve the calculation of the probability of being caught and the punishment that would be exacted; the normative considerations include the sense that the bargain between the citizen and state is “fair”: that the citizen believes that the state is providing sufficient public goods in return for his tax payments, and that the burden for the state’s activities falls on the citizenry in an equitable way. Thus Levi wants to emphasize (e.g. on pages 53-54) that citizens will base their compliance decisions on their understanding of the government’s enforcement procedures not just because those enforcement procedures affect the citizen’s own probability of being caught, but also because they affect the citizen’s view of whether others will be paying their fair share. (And citizens care about whether others are paying their fair share not just because this affects the probability that they will be caught but because they place a normative value on fairness and will refuse to pay taxes if they perceive the system to be unfair.)

If this is indeed what Levi means by “quasi-voluntary compliance,” then her explication of the concept is disappointingly convoluted. At points Levi presents the idea of quasi-voluntary compliance (QVC) as if it were a fundamentally different conception of compliance from what would come out of a Becker-style costs-and-benefits analysis. Unless I’m missing something, it’s not: it merely adds normative concerns to the citizens’ calculations. This does contribute something important to the analysis of rulers’ strategies: if citizens care about fairness or legitimacy, then rulers must take care, in choosing tax rates and enforcement systems, that they do not violate citizens’ sense of fairness and thus undermine their revenue goals; rulers trying to maximize revenue should also think about the public goods they are providing (and citizens’ perceptions of those goods) because, even though these goods are non-excludable and thus would not be part of a purely materially-motivated citizen’s compliance decision, the government’s output matters to fairness-minded taxpayers. But this contribution would have been a lot clearer to me if it were introduced as an extension and modification of a common costs-and-benefits calculus rather than a wholly different concept of compliance.
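
Here is roughly what I have in mind by "an extension of a costs-and-benefits calculus," as a minimal sketch. The function name, the parameters, and especially the additive form of the normative term are my own assumptions for illustration; Levi does not specify a utility function.

```python
def complies(tax, p_caught, penalty, fairness_weight=0.0, perceived_fairness=0.0):
    # Becker-style part: paying costs `tax`; evading costs the expected penalty.
    cost_of_paying = tax
    expected_cost_of_evading = p_caught * penalty
    # Normative extension (assumed functional form): evading an arrangement
    # seen as fair carries an extra, privately felt cost.
    normative_cost_of_evading = fairness_weight * perceived_fairness
    return cost_of_paying <= expected_cost_of_evading + normative_cost_of_evading

# Purely material citizen: the expected penalty (50) is below the tax (100).
print(complies(tax=100, p_caught=0.1, penalty=500))
# Fairness-minded citizen who sees the system as fair.
print(complies(tax=100, p_caught=0.1, penalty=500,
               fairness_weight=80, perceived_fairness=1.0))
# Same citizen once the system looks unfair.
print(complies(tax=100, p_caught=0.1, penalty=500,
               fairness_weight=80, perceived_fairness=0.3))
```

Setting `fairness_weight` to zero recovers the purely material calculation, which is the sense in which QVC looks like a modification of the standard account rather than a different concept.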

Given that the definition of quasi-voluntary compliance (QVC) has to do with the utility function of citizens, in practice it should be hard to tell the difference between QVC and mere compliance (the situation in which people pay their taxes because of coercion, pg. 64). The main point Levi makes is that citizens will comply less when they perceive that others are not complying and the state is not delivering promised goods (e.g. pg. 68). This could be because of fairness, but it could also be mere compliance: if others are not complying, I may be able to get away with not complying too because others’ non-compliance tells me that the enforcement system is not working well (and that my own non-compliance may be overlooked because so many others are also not complying). (Also, if the state is not providing public goods, maybe it’s because no one else is paying.) The question is how one would know how much (if at all) these normative considerations really matter. At some points QVC seems to be in effect when coercion is not observed, but of course the most successful coercion is the kind that is implicit. In the introductory chapters Levi does offer one phenomenon that would indeed be predicted by “quasi-voluntary compliance” but not by a non-normative account: “rulers invest in deterrence that constituents perceive as being directed toward others” (pg. 67). The challenge would be to distinguish a) deterrence that is carried out in order to boost tax revenues from those being targeted from b) deterrence that is carried out in order to boost tax revenues from others (and not because those others think “I could be next!”).

On to cases. How does QVC enter into Levi’s analysis of states’ revenue strategies? In discussing the Roman Republic, Levi refers to QVC in her explanation for the use of “tax farming” in remote provinces; her argument seems to be that without the social rewards of high income and citizenship, and without detailed information on citizens, collecting taxes in the usual way would have been too expensive. “Without quasi-voluntary compliance, the agency costs of the census rose significantly” (pg. 80). In early parliaments in France and England, QVC helps us understand the role of parliaments in providing a venue in which the ruler could justify his taxes (“parliamentary consent shrouds a ruler’s policies in legitimacy”, pg. 118). In the introduction of the income tax in 18th-century Britain, Levi sees QVC at work in the attention that was paid to the justifications for the tax and the equitability of its administration. In 20th-century Australia, Levi sees QVC as helping to explain individual taxpayers’ objections to their assessments, as well as tax revolts and labor union campaigns against tax avoidance. I find these claims to be more convincing as we move ahead in time, in part because in the latter cases Levi brings to bear more evidence of debates about justification and enforcement (which would be hard to explain in another way), whereas in e.g. the Roman case one could easily explain the use of tax farming in the provinces without recourse to QVC.

I think the best evidence that citizens’ sense of fairness is an important constraint on taxation is that public debates surrounding taxation focus on both government waste and evasion/progressivity/fairness; if citizens were merely being held up by a predatory state and paying just because they don’t want to be arrested, then they should not care about what the government does with their tax revenues or whether anyone else is similarly being held up. On the other hand, for both kinds of debates one can argue that pure economic explanations are sufficient. When citizens complain about government waste or otherwise complain that they are paying too much for the government services they receive, it could simply be that the citizens legitimately want lower taxes for all of society (see e.g. Meltzer-Richard) or simply for themselves (the standard free-rider prediction). Likewise, when there are debates about how the tax burden is to be shared, it need not be that citizens actually care about fairness: it could be instead that they simply want to pass more of the burden on to others. I am intrigued by the idea of building on Levi’s QVC idea to investigate patterns in parliamentary debate over time, but this indeterminacy gives me pause: what speech would persuade a skeptic that the speaker is concerned about fairness and not simply about reducing his own tax bill?

In conclusion, I come away from this book intrigued by the contention that fairness helps to explain the history of state revenue extraction but disappointed in the somewhat muddled way in which the ideas are presented and the lack of evidence for the importance of fairness. I think (and I suspect Levi might agree) that this is a book that would have benefited from an explicit application of game theory, or at least a bit more rigor in specifying e.g. the citizens’ outside option, what non-compliance would mean, etc. The idea that normative considerations matter to citizens in determining whether they will pay their taxes is intriguing and believable, but I don’t see a way to convincingly demonstrate to a skeptic that considerations of fairness are fundamental and not epiphenomenal.


Here is what Peter Schweizer says about Gabe Lenz and Kevin Lim’s paper on the wealth of members of Congress:

One study used a statistical estimator to determine that members of Congress were ‘accumulating wealth about 50% faster than expected’ compared with other Americans.

Is that a fair summary of their research? Here is a quote from the abstract of the paper:

We thus conclude that representatives report accumulating wealth at a rate consistent with similar non-representatives, potentially suggesting that corruption in Congress is not widespread.

Schweizer’s claim is strictly true, in that Gabe and Kevin did report results using a “statistical estimator” that suggested faster-than-expected wealth accumulation. But they also reported that, based on their analysis, it was the wrong estimator; using a better estimator reversed the findings.

I guess Schweizer stopped reading after the fourth sentence of the abstract, so he simply didn’t realize that by the ninth sentence the paper was completely contradicting the argument of his book.

Political investing in the news

There is suddenly a lot of attention being paid to investing behavior of members of Congress. As Jens and I work on finishing up our two papers on the topic, I am trying to keep up with the public discussion.

Larry Lessig alerted me to this Newsweek/Daily Beast article about a new book by Peter Schweizer called Throw Them All Out (subtitle: “how politicians and their friends get rich off insider stock tips, cronyism, and land deals that would send the rest of us to prison”). I had given a talk at Larry’s weekly seminar in the spring about our (Jens’s and my) work on this issue, in which the general message was that members of Congress overall are not very good investors, and that existing investigations of “insider trading” in Congress (and ethics issues more generally) suffer from a general bias toward finding wrongdoing even when the evidence is more ambiguous. So, given that I was saying that things weren’t so bad and Peter Schweizer is now publishing this book saying that things are very bad, Larry asks me, “Is he wrong?”

I have not read the book (it just went on sale today I believe), but here are some thoughts on what I could learn about it from the article, and how it relates to our work on the investments of members of Congress:

a) Our work so far is about average behavior, and not isolated instances of wrongdoing. If there is wrongdoing, it probably is at the level of isolated instances — not everyone in Congress, not all the time. It is perfectly consistent for Congress as a whole to do poorly and for improper trading to be going on, and even for an individual to do poorly overall and to be doing some improper trading. Our work responds in part to an earlier study that showed extremely good average performance, which would really only be possible with widespread wrongdoing; showing poor average performance (as we do) does not prove the absence of wrongdoing.

b) Also, some of the behavior Schweizer is talking about is outside of what we analyze, e.g. options on index funds — our analysis is about equity holdings.

That said:

c) John Kerry may have had some conveniently timed trades, but our analysis suggests he would have done better overall had he invested in an index fund. That doesn’t mean he acted ethically, but it does take some of the edge off of this “politicians get rich while we suffer” narrative.

d) There is deep cherry-picking going on here. You could write a book of completely bone-headed investments that come from the same data. If well-timed trades prove corruption, what do poorly-timed trades prove?

e) The current discussion talks a lot about how Congress has exempted itself from insider trading laws, but I think (not being a securities law expert) that is kind of bogus. They are just as exempt from insider trading laws as I am. It’s simply that the SEC regulations on insider trading apply to information held by corporate insiders, but don’t address other types of information that might be gathered by politicians, academics, journalists, bankers, bloggers, hedge fund managers, and others who are in a position to learn about market developments. It seems like an exaggeration to say (as Schweizer does here) that members of Congress “have legislated themselves as untouchable as a political class.” Also, there are ethical restrictions in both houses of Congress against profiting from your political position. Perhaps these should be enforced more strictly, but this places members of Congress roughly in the same category as journalists, who learn a lot of stuff about the market but are prohibited by self-regulation from profiting from it — except that members of Congress are required to disclose their investments while journalists are not.

Overall, I think the whole story lends itself to the kind of argument Larry makes in his book Republic, Lost — it’s hard to tell whether corruption is going on, but why bother allowing it to seem as if it is? If I were in Congress, I would not be trading stocks: I would own broad index funds of U.S. equities and bonds, and/or have my money in a qualified blind trust. I would also probably vote to require other members of Congress to do the same. I would do these things because I would not trust the public to really figure out whether corruption is occurring or not, and because I don’t think losing the flexibility to play around in the stock market is much of a cost to pay at all. (In fact, overall it would have helped members of Congress, according to our study!) I wish we could count on the public to accurately identify instances of corruption, but I think the rewards to “finding” wrongdoing (and reporting on it) are large enough, and the rewards to arguing otherwise small enough, that the public will generally conclude the worst whether or not there is legitimate cause for concern.

What time is lunch?

When people in London suggest a time for lunch, they suggest 1pm. In the US it would be noon, right? I find that curious.

I suspect this is a case where it’s kind of arbitrary what time you go to lunch, but people have just converged on a standard practice, and that practice is different in the US and the UK. (You would think there would be an incentive to go a little earlier to avoid the crowds, but on the other hand it’s probably useful for remembering lunch dates to just go with the standard time.) This is therefore a case of what social science types call a coordination game, in which there are “multiple equilibria.” If everyone else is going to lunch at 1, you go at 1; if everyone else is going at noon, you go at noon; so once a society has converged on an equilibrium lunchtime, it is hard to shake (even if you started down that route for random reasons).
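
For the curious, the "multiple equilibria" point can be checked in a few lines. This is only a toy sketch with made-up payoffs (1 if you and your lunch partner pick the same time, 0 otherwise); it simply tests each pair of choices to see whether either person would gain by switching on their own.

```python
from itertools import product

TIMES = ["noon", "1pm"]

def payoff(mine, theirs):
    # Hypothetical payoffs: lunch only "works" if both pick the same time.
    return 1 if mine == theirs else 0

def is_nash(a, b):
    # Neither player can gain by unilaterally switching times.
    a_ok = all(payoff(a, b) >= payoff(alt, b) for alt in TIMES)
    b_ok = all(payoff(b, a) >= payoff(alt, a) for alt in TIMES)
    return a_ok and b_ok

equilibria = [(a, b) for a, b in product(TIMES, TIMES) if is_nash(a, b)]
print(equilibria)
```

Both "everyone at noon" and "everyone at 1pm" survive the check, and nothing in the payoffs favors one over the other, which is why history or accident can end up deciding.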

I have not yet determined whether the workplace calendar is generally shifted back an hour or not. I walked to work at around 8:45 this morning and it seemed like rush hour to me.

Also, I checked, and sunrise and sunset are not generally later here in London than in NYC.

I wonder if there’s an interesting story explaining why London started down the 1pm path and e.g. NYC went with noon. Also, is it the same in other cities in the UK? In Europe?