Covariate balance, Mill’s methods, and falsificationism

I presented my paper on polarization and corruption at the recent EPSA conference and encountered what was to me a surprising criticism. Having thought and read about the issues being raised, I want to jot down some ideas that I wish I had been able to say at the time.

First, some background: In the paper, I use variation in political polarization across English constituencies to try to measure the effect of polarization on legislative corruption (in the form of implication in the 2009 expenses scandal). One of the points I make in the paper is that although others have looked at this relationship in cross-country studies, my paper had the advantage that the units being compared were more similar on other dimensions than in the case of the cross-country studies, which means that my study should yield more credible causal inferences.

The criticism I encountered was that in seeking out comparisons where the units are as similar as possible, I was doing something like Mill’s Method of Differences, which had been shown to be valid only under a long list of unattractive assumptions, including that the process being considered be deterministic, monocausal, and without interactions.

Now, in seeking out a setting where the units being compared are as similar as possible in dimensions other than the “treatment,” I thought I was following very standard and basic practice. No one wants omitted variable bias, and it seems very straightforward to me that the way to reduce the possibility of omitted variable bias when you can’t run an experiment is to seek out a setting where covariate balance is higher before any adjustment is done. I think of the search for a setting with reasonable covariate balance as a very intuitive and basic part of the “design-based” approach to causal inference I learned about from Don Rubin and Guido Imbens, but also a key part of scientific inference in all fields for a long time. In response to the criticism I received, I said something like this — pointing out that the critic had also raised the possibility of omitted variable bias and thus should agree with me about the importance of restricting the scope for confounding.

I didn’t know at the time how to respond directly to the claim that I had sinned by partaking of Mill’s methods, but in the course of reviewing a comparative politics textbook (Principles of Comparative Politics, 1st edition (2009), by Clark, Golder, and Golder) I have reacquainted myself with Mill’s methods and I think I see where my critic was coming from — although I still think his criticism was off the mark.

What would it mean to use Mill’s method of differences in my setting? I would start with the observation that MPs in some constituencies were punished for being implicated in the scandal more heavily than others. I would then seek to locate the unique feature that is true of all of the constituencies where MPs were heavily punished and not true of the constituencies where they were not heavily punished. To arrive at the conclusion of my paper (which is that greater ideological distance between the locally-competitive candidates, i.e. (platform) polarization, reduces the degree to which voters punish incumbents for corruption), I would have to establish that all of the places where MPs were heavily punished were less polarized than the places where MPs were lightly punished, and that there was no other factor that systematically varied between the two types of constituencies.

This would clearly be kind of nuts. Electoral punishment is not deterministically affected by polarization, and it is certainly affected by other factors, so we don’t expect all of the more-polarized places to see less punishment than all of the less-polarized places. Also, given the countless things you can measure about an electoral constituency, there is probably some other difference that seems to be related to electoral punishment, but Mill’s method doesn’t tell you what features to focus on and what to ignore. Mill’s method is essentially inductive: you start with the difference you want to explain, and then you consider all of the possible (deterministic, monocausal) explanations until you’re left with just one. This process seems likely to yield an answer only when you have binary outcomes and causes, a small dataset, and the willingness to greatly constrain the possible causes you’re willing to consider. The answer that the method yields would be suspect for all of the reasons rehearsed in the Clark, Golder and Golder book and the sources they cite.

I am not using Mill’s method of differences. I have a postulated relationship between polarization and electoral punishment, and I am attempting to measure that relationship using observational data. I am choosing to focus on units that are similar in other respects, but I am not doing this in order to inductively arrive at the one difference that must explain a given difference in outcomes; rather, I am focusing on these units because by doing so I reduce the scope for unmeasured confounding.

Clark, Golder, and Golder contrast Mill’s methods with the “scientific method” (a great example of a mainstream political science textbook extolling falsificationism and what Clarke and Primo criticize as the “hypothetico-deductive model”), which they argue is the right way to proceed. The virtue of the scientific method in their presentation is that you can make statements of the kind, “If my model/theory/explanation relating X and Y is correct, we will observe a correlation between X and Y” and then, if we don’t observe a correlation between X and Y, we know we have falsified the model/theory/explanation. The point of limiting the possibility of unobserved confounding is that the true logical statement we want to evaluate is “If my model/theory/explanation is correct and I have correctly controlled for all other factors affecting X and Y, we will observe a correlation between X and Y.” To the extent that we remain unsure about the second part of that antecedent, i.e. to the extent that there remains the possibility for unmeasured confounding, we are unable to falsify the theoretical claim: when we don’t observe the predicted correlation between X and Y we are unsure whether the model is falsified or the researcher has not correctly controlled for other factors. By seeking out settings in which the possibility for unmeasured confounding is restricted, we thus try to render our test as powerful as possible with respect to our theoretical claim.
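To make the logic concrete, here is a minimal sketch with made-up numbers (nothing here comes from my paper). It assumes the theory is right, so that X raises Y by exactly one unit everywhere, but a hypothetical confounder Z raises X and lowers Y. Pooling all units, the predicted relationship vanishes, so a naive test would "falsify" a correct theory; comparing only units that share the same Z recovers it.

```python
# Hypothetical toy data: Y = X - 2*Z, so the true effect of X on Y is +1,
# while the confounder Z pushes X up and Y down.
units = [
    {"z": 0, "x": 0.0, "y": 0.0},
    {"z": 0, "x": 1.0, "y": 1.0},
    {"z": 1, "x": 1.0, "y": -1.0},
    {"z": 1, "x": 2.0, "y": 0.0},
]


def slope(rows):
    """Least-squares slope of y on x: cov(x, y) / var(x)."""
    n = len(rows)
    mean_x = sum(r["x"] for r in rows) / n
    mean_y = sum(r["y"] for r in rows) / n
    cov = sum((r["x"] - mean_x) * (r["y"] - mean_y) for r in rows)
    var = sum((r["x"] - mean_x) ** 2 for r in rows)
    return cov / var


# Comparing all units at once: the confounder masks the true effect.
pooled = slope(units)

# Comparing only units that are alike on Z: the true effect reappears.
within = {z: slope([r for r in units if r["z"] == z]) for z in (0, 1)}

print("pooled slope:", pooled)
print("slope within each Z group:", within)
```

The example is rigged so that the masking is exact, which real data would never be; the point is only that the same null result is uninformative about the theory in the pooled comparison and informative in the balanced one.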

I think this is an important point with respect to two important audiences.

First, I think it is important with respect to the comparative politics mainstream, or more broadly the part of social science that is not too concerned with causal inference. Clark, Golder and Golder is a very impressive book in many respects but it does not trouble its undergraduate audience much with the kind of hyper-sensitivity to identification that we see in recent work in comparative politics and elsewhere in social sciences. The falsificationist approach they take emphasizes the implications we should observe from theories without emphasizing that these implications should be observed if the theory is correct _and_ the setting matches the assumptions underlying the theory, at least after the researcher is done torturing the data. The scientific method they extol is weak indeed unless we take these assumptions seriously, because no theory will be falsified if we can so easily imagine that the consequent has been denied due to confounding rather than the shortcomings of the theory.

Second, I think it is important with respect to Clarke and Primo’s critique of falsificationism and the role of empirical work in their suggested mode of research. I agree with much of their critique of the way political scientists talk about falsifiable theories and hypothesis tests, and especially with their bottom-line message that models can be useful without being tested and empirical work can be useful without testing models. But their critique of falsificationism as practiced in political science (if I recall correctly – I don’t have the book with me) rests largely on the argument that you can’t test an implication of a model with another model, i.e. that the modeling choices we make in empirical analysis are so extensive that if we deny the consequent we don’t know whether to reject the theoretical model or the empirical model. My point is that the credibility of empirical work varies, and this affects how much you can learn from a hypothesis test. If someone has a model that predicts an effect of X on Y, we learn more about the usefulness of the model if someone does a high-quality RCT measuring the effect of X on Y (and everyone agrees that X and Y have been operationalized as the theory specified, etc) than we do if someone conducts an observational study; similarly, we learn more if someone does an observational study with high covariate balance than we do if someone does an observational study with low covariate balance. In short, I suspect Clarke and Primo insufficiently consider the extent to which the nature of the empirical model affects how much we can learn about the usefulness of a theoretical model by testing an implication of it. This suggests a more substantial role for empirical work than Clarke and Primo seem to envision, but also a continued emphasis on credibility through e.g. designing observational studies to reduce the possibility of unmeasured confounding.

Elster (2000) on reasons for self-binding

In Ulysses Unbound (2000), Elster considers situations where an actor would benefit from “self-binding” (constraining one’s own behavior) and devices that are used to accomplish this. In other words, the topic is commitment problems and commitment devices — an important theme in political science research over the past couple of decades.

Before I get to the more political aspects of Elster’s work, I want to explicate his discussion of reasons for self-binding, which helped me to see political commitment problems in a somewhat broader perspective.

In another blog post, I’ve talked about the idea that emotions can provide the corrective to rational self-interest: they impose costs and benefits that make otherwise non-credible threats and promises credible. In most cases, however, the passions are the enemy of self-interest, or at least one conception of self-interest. By passions, Elster refers to “emotions proper” (like anger, shame, fear) but also “states” such as drunkenness, sexual desire, or cravings for addictive drugs. The idea here is that these passions can take over and dominate our behavior in self-destructive ways. The clearest example is “blind anger” that leads someone to lash out in ways that he or she will certainly later regret. The discussion here focuses on clarifying the different ways in which passions can lead to self-destructive behavior, and corresponding attempts to “pre-commit” i.e. take actions that will minimize the self-destructive behavior. For example, if the passion is not too strong, it may be sufficient to take measures that will make the self-destructive behavior more costly, such as bringing one’s wife to a party to prevent oneself from getting too drunk or flirting with coworkers. If the passion is so strong that one practically ignores all other considerations and will act self-destructively no matter the cost, then one may need to take steps to avoid the passion entirely, such as not going to the office party. In I.7 Elster addresses these issues in the context of addiction, which is a particular form of passion (leading to self-destruction), in response to which addicts have developed various commitment strategies, with varying success.

Another key commitment problem discussed in Ulysses Unbound is the time inconsistency produced by hyperbolic discounting. The basic idea here is that actors may discount future payoffs in a way that leads to inconsistent action over time: given the choice between a big payoff in two years and an even bigger one in three years, I may prefer to wait longer for the bigger payoff when I think about it today, but not when I reconsider in a year. (This kind of inconsistency, which apparently helps to explain procrastination and suboptimal saving behavior, is ruled out by the standard exponential discounting but is consistent with hyperbolic discounting.) This creates a conflict within the self: today’s self wants to constrain tomorrow’s self. Although Elster does not emphasize this point, the intertemporal conflict created by hyperbolic discounting is clearly analogous to the conflict caused by passions: discounting-based time inconsistency can be thought of, it seems, as a kind of predictable passion that strikes when payoffs become more immediate.
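A small numerical sketch of this preference reversal, using assumed numbers that are mine rather than Elster’s: a payoff of 100 in two years versus 130 in three years, a hyperbolic discount function amount / (1 + k * delay) with k = 0.5, and an exponential one with an annual factor of 0.9.

```python
def hyperbolic(amount, delay, k=0.5):
    """Present value under hyperbolic discounting (k is an assumed parameter)."""
    return amount / (1 + k * delay)


def exponential(amount, delay, delta=0.9):
    """Present value under exponential discounting (delta is an assumed parameter)."""
    return amount * delta ** delay


def choice(discount, years_elapsed):
    """Which payoff looks better after `years_elapsed` years have passed?"""
    sooner = discount(100, 2 - years_elapsed)  # hypothetical smaller payoff
    later = discount(130, 3 - years_elapsed)   # hypothetical larger payoff
    return "later" if later > sooner else "sooner"


for name, discount in [("hyperbolic", hyperbolic), ("exponential", exponential)]:
    print(name, "today:", choice(discount, 0), "| in a year:", choice(discount, 1))
```

Under these assumptions the hyperbolic discounter plans today to wait for the larger payoff and then switches to the smaller one a year on, while the exponential discounter makes the same choice at both dates, because the ratio of the two discounted values does not depend on when the comparison is made.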

The last reason for pre-commitment Elster considers is anticipated preference change. The idea is that one can anticipate that one’s preferences will change with time, and that one may want to guard oneself against this happening. In the Russian nobleman example provided (and drawn from Derek Parfit), this happens because the current self feels at odds with the anticipated future self: the politically radical young self anticipates that he will become more conservative in the future, so he may want to fight the future self by e.g. giving his resources to radical political causes before his future self can give those resources to conservative political causes. A slightly different phenomenon is highlighted by the Amish and other cultural groups (Islamists, Confucians) that take steps to prevent preference change by shielding themselves from information about competing lifestyles — what Elster calls “self-paternalism.” These examples differ somewhat in that the current “self” does not seek to undermine the future self, with whom it feels in conflict, but rather the current self and the future self have the same interest in preventing preference changes that presumably would lead to the future self being less happy.

Elster (2000) on emotions as credibility enhancers

As part of my summer reading program, I recently read Jon Elster’s Ulysses Unbound (2000) and will be posting some thoughts on it here. In this first installment I’ll discuss the idea that emotions may provide a form of self-binding that can help to overcome self-interest.

In section I.5, Elster considers provocative work by Frank and Hirshleifer that claims (separately) that emotions like envy, anger, guilt, or honesty “could have evolved because they enhance our ability to make credible threats.” The basic idea here is that in some situations an actor would benefit from being able to make threats, such as the threat to refuse a small offer in an ultimatum game, but that those threats are not credible without the actor feeling anger or another “irrational” emotion. The purpose of some emotions, in this view, is to produce privately-experienced costs and benefits that can allow players to make threats and promises that are otherwise non-credible. As Elster points out, it is not the emotions per se that can help actors overcome commitment problems; rather, it is the reputation for being emotional that does it (i.e. other actors’ knowledge of one’s privately-experienced emotional costs and benefits), and actually experiencing these emotions could be a good way to develop that reputation.

On page 51 Elster makes a nice move in linking ideas about self-interest and morality to Frank and Hirshleifer’s ideas on the evolutionary advantages of the moral emotions. First he clarifies that the emotions Frank and Hirshleifer are inserting into behavior are really standing in for side benefits and side penalties that make a given behavior sustainable in a repeated game with a given payoff structure and discount rate. He then goes on to point out how this is “essentially turning an old argument on its head”:

From Descartes onward it has often been argued that prudence or long-term self-interest can mimic morality. Because morality was thought to be more fragile than prudence, many welcomed the idea that the latter was sufficient for social order. By contrast, if one believes that self-interest is likely to be shortsighted rather than farsighted, the moral emotions might be needed to mimic prudence.

To restate the point somewhat, if we can define a type of behavior that is the “moral course of action” (e.g. to give generously in a dictator game), and we can identify the purely self-interested course of action (e.g. give nothing), then any discrepancy between the two can be bridged by “moral emotions” that the players experience (e.g. a warm glow from giving, or guilt from not giving). This clarification highlights what might be dissatisfying about this work (as reported by Elster), in common with e.g. the classic work on the paradox of voting or even Levi’s invocation of normative values in explaining tax compliance: any apparently paradoxical behavior can be explained by saying that the payoffs have been misjudged. But this is not what Frank and Hirshleifer are doing, presumably: they want to explain the existence of emotions, which are privately experienced costs and benefits provoked by interactions with others, not the paradox of cooperation; their interesting point is that these emotions may serve at least in part to help us develop reputations that make our (self-serving) threats and promises credible.


Here is what Peter Schweizer says about Gabe Lenz and Kevin Lim’s paper on the wealth of members of Congress:

One study used a statistical estimator to determine that members of Congress were ‘accumulating wealth about 50% faster than expected’ compared with other Americans.

Is that a fair summary of their research? Here is a quote from the abstract of the paper:

We thus conclude that representatives report accumulating wealth at a rate consistent with similar non-representatives, potentially suggesting that corruption in Congress is not widespread.

Schweizer’s claim is strictly true, in that Gabe and Kevin did report using a “statistical estimator” that suggested faster-than-expected wealth accumulation. But they also reported that, based on their analysis, it was the wrong estimator; using a better estimator reversed the findings.

I guess Schweizer stopped reading after the fourth sentence of the abstract, so he simply didn’t realize that by the ninth sentence the paper was completely contradicting the argument of his book.

Political investing in the news

There is suddenly a lot of attention being paid to investing behavior of members of Congress. As Jens and I work on finishing up our two papers on the topic, I am trying to keep up with the public discussion.

Larry Lessig alerted me to this Newsweek/Daily Beast article about a new book by Peter Schweizer called Throw Them All Out (subtitle: “how politicians and their friends get rich off insider stock tips, cronyism, and land deals that would send the rest of us to prison”). I had given a talk at Larry’s weekly seminar in the spring about our (Jens’s and my) work on this issue, in which the general message was that members of Congress overall are not very good investors, and that existing investigations of “insider trading” in Congress (and ethics issues more generally) suffer from a general bias toward finding wrongdoing even when the evidence is more ambiguous. So, given that I was saying that things weren’t so bad and Peter Schweizer is now publishing this book saying that things are very bad, Larry asks me, “Is he wrong?”

I have not read the book (it just went on sale today I believe), but here are some thoughts on what I could learn about it from the article, and how it relates to our work on the investments of members of Congress:

a) Our work so far is about average behavior, and not isolated instances of wrongdoing. If there is wrongdoing, it probably is at the level of isolated instances — not everyone in Congress, not all the time. It is perfectly consistent for Congress as a whole to do poorly and for improper trading to be going on, and even for an individual to do poorly overall and to be doing some improper trading. Our work responds in part to an earlier study that showed extremely good average performance, which would really only be possible with widespread wrongdoing; showing poor average performance (as we do) does not prove the absence of wrongdoing.

b) Also, some of the behavior Schweizer is talking about is outside of what we analyze, e.g. options on index funds — our analysis is about equity holdings.

That said:

c) John Kerry may have had some conveniently timed trades, but our analysis suggests he would have done better overall had he invested in an index fund. That doesn’t mean he acted ethically, but it does take some of the edge off of this “politicians get rich while we suffer” narrative.

d) There is deep cherry-picking going on here. You could write a book of completely bone-headed investments that come from the same data. If well-timed trades prove corruption, what do poorly-timed trades prove?

e) The current discussion talks a lot about how Congress has exempted itself from insider trading laws, but I think (not being a securities law expert) that is kind of bogus. They are just as exempt from insider trading laws as I am. It’s simply that the SEC regulations on insider trading apply to information held by corporate insiders, but don’t address other types of information that might be gathered by politicians, academics, journalists, bankers, bloggers, hedge fund managers, and others who are in a position to learn about market developments. It seems like an exaggeration to say (as Schweizer does here) that members of Congress “have legislated themselves as untouchable as a political class.” Also, there are ethical restrictions in both houses of Congress against profiting from your political position. Perhaps these should be enforced more strictly, but this places members of Congress roughly in the same category as journalists, who learn a lot of stuff about the market but are prohibited by self-regulation from profiting from it — except that members of Congress are required to disclose their investments while journalists are not.

Overall, I think the whole story lends itself to the kind of argument Larry makes in his book Republic, Lost — it’s hard to tell whether corruption is going on, but why bother allowing it to seem as if it is? If I were in Congress, I would not be trading stocks: I would own broad index funds of U.S. equities and bonds, and/or have my money in a qualified blind trust. I would also probably vote to require other members of Congress to do the same. I would do these things because I would not trust the public to really figure out whether corruption is occurring or not, and because I don’t think losing the flexibility to play around in the stock market is much of a cost to pay at all. (In fact, overall it would have helped members of Congress, according to our study!) I wish we could count on the public to accurately identify instances of corruption, but I think the rewards to “finding” wrongdoing (and reporting on it) are large enough, and the rewards to arguing otherwise small enough, that the public will generally conclude the worst whether or not there is legitimate cause for concern.

What time is lunch?

When people in London suggest a time for lunch, they suggest 1pm. In the US it would be noon, right? I find that curious.

I suspect this is a case where it’s kind of arbitrary what time you go to lunch, but people have just converged on a standard practice, and that practice is different in the US and the UK. (You would think there would be an incentive to go a little earlier to avoid the crowds, but on the other hand it’s probably useful for remembering lunch dates to just go with the standard time.) This is therefore a case of what social science types call a coordination game, in which there are “multiple equilibria.” If everyone else is going to lunch at 1, you go at 1; if everyone else is going at noon, you go at noon; so once a society has converged on an equilibrium lunchtime, it is hard to shake (even if you started down that route for random reasons).
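For the curious, here is a minimal sketch of the lunch game as a two-player coordination game. The payoffs are an assumption for illustration: each person gets 1 for picking the same time as the other and 0 otherwise. The code checks every pair of choices and keeps those where neither person could do better by switching alone.

```python
from itertools import product

TIMES = ("noon", "1pm")


def payoff(mine, other):
    """Assumed payoff: lunch is only worth something if we show up together."""
    return 1 if mine == other else 0


def is_nash(profile):
    """True if neither player gains by unilaterally choosing another time."""
    a, b = profile
    a_ok = all(payoff(a, b) >= payoff(alt, b) for alt in TIMES)
    b_ok = all(payoff(b, a) >= payoff(alt, a) for alt in TIMES)
    return a_ok and b_ok


equilibria = [p for p in product(TIMES, repeat=2) if is_nash(p)]
print(equilibria)
```

Both "everyone at noon" and "everyone at 1pm" survive, and nothing in the payoffs says which one a city ends up in.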

I have not yet determined whether the workplace calendar is generally shifted back an hour or not. I walked to work at around 8:45 this morning and it seemed like rush hour to me.

Also, I checked, and sunrise and sunset are not generally later here in London than in NYC.

I wonder if there’s an interesting story explaining why London started down the 1pm path and e.g. NYC went with noon. Also, is it the same in other cities in the UK? In Europe?

Kahn and Kotchen on unemployment and environmental concern

Matthew Kahn (a teacher of mine during my MA at Tufts) and Matthew Kotchen have an interesting-sounding paper showing that people appear to be less concerned about the environment when the economy is doing worse. Fewer people search Google for “global warming” and fewer survey respondents say they think global warming is occurring when their state’s unemployment rate is higher. (This is with state and month-year fixed effects, meaning that the difference is not just capturing over-time changes in attitudes or stable geographical differences between people in richer and poorer states.)
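To illustrate what the fixed effects are doing, here is a rough sketch on invented data; it is not Kahn and Kotchen’s data or code. It assumes a balanced panel of three states over four periods, with concern equal to a state effect plus a period effect plus a made-up coefficient of -2 on unemployment. In a balanced panel, including state and period fixed effects is equivalent to subtracting state means and period means (and adding back the grand mean) before regressing, which is what the code does.

```python
def two_way_demean(panel):
    """Remove state (row) and period (column) means from a balanced panel."""
    n_states, n_periods = len(panel), len(panel[0])
    grand = sum(sum(row) for row in panel) / (n_states * n_periods)
    row_means = [sum(row) / n_periods for row in panel]
    col_means = [sum(panel[s][t] for s in range(n_states)) / n_states
                 for t in range(n_periods)]
    return [[panel[s][t] - row_means[s] - col_means[t] + grand
             for t in range(n_periods)] for s in range(n_states)]


def fixed_effects_slope(y, x):
    """Slope of y on x after absorbing state and period fixed effects."""
    y_dm, x_dm = two_way_demean(y), two_way_demean(x)
    num = sum(a * b for ry, rx in zip(y_dm, x_dm) for a, b in zip(ry, rx))
    den = sum(b * b for rx in x_dm for b in rx)
    return num / den


# Invented inputs: rows are states, columns are periods.
unemployment = [[4.0, 5.0, 7.0, 6.0],
                [6.0, 6.5, 9.0, 9.5],
                [3.0, 3.5, 4.0, 6.0]]
state_effect = [50.0, 60.0, 70.0]      # stable differences between states
period_effect = [0.0, 1.0, 2.0, 3.0]   # nationwide drift in attitudes
TRUE_BETA = -2.0                       # assumed effect of unemployment

concern = [[state_effect[s] + period_effect[t] + TRUE_BETA * unemployment[s][t]
            for t in range(4)] for s in range(3)]

estimate = fixed_effects_slope(concern, unemployment)
print(estimate)
```

Because the invented data contain no noise, the estimate matches the assumed coefficient up to floating-point rounding; what identifies it is only the variation in unemployment that is left over within a state after nationwide shifts are removed.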

Some of the results, like the one about Google search terms or another finding about people’s responses to a “most important problem” question, are consistent with the idea that economic concerns crowd out environmental concerns. But the fact that survey respondents say global warming is not happening when their local economy is doing poorly says something different: it suggests that economic problems do not simply change people’s priorities, they also change their views. (Or that, when someone’s priorities are changed, his or her views adjust to become consistent with those priorities: if I don’t spend much time worrying about the environment, the problem must not be happening.) (Sorry: or that it takes time to learn that global warming is happening, and people don’t have that time when they are worried about the economy.)

Started at LSE

After last year’s very enjoyable post-doc at Yale’s Leitner Center, I have shipped off to the LSE to start as a Lecturer (asst. prof, in US terms) teaching in the MPA program. I am still settling into my office in Connaught House, and still working out housing for next month, but so far I am really enjoying both the city and the school. I met some of the students last week at the introductory session for MPA first-years and I was extremely impressed with their sharpness and the variety of interesting experiences they bring to London.

More soon — I’m going to try to do some more writing here.

Stock trading project written up in Bloomberg/BusinessWeek

The project on the stock portfolios of members of Congress that Jens and I have been working on for over two years is finally almost done, and now it has been written up by Bloomberg. The paper is tentatively titled “Political Capital: The (Mostly) Mediocre Performance of Congressional Stock Portfolios, 2004-2008”. The writeup looks pretty accurate to me. I was curious what aspect the media would end up focusing on, given that the story is kind of subtle and doesn’t play directly into a corruption narrative. In this case the writer chose to focus on the local premium while telling the rest of the story further down.

(Finally) set up on new MacBook Air

I am fully up to speed now with my new MacBook Air.

I’ve had it now for a little over a week, and yesterday finished installing my Rails environment and downloading the databases I have been working with. (For setting up the Rails environment on Snow Leopard, I recommend this guide.)

I love this computer. Above all I love how light and sleek it is: this weekend I went to NYC for a bachelor party and a baby shower and brought only my violin case — with my MacBook Air, a change of clothes, and a toothbrush slipped in the space where you can store sheet music. I just love that efficiency.

I also really like the screen resolution, the way the computer starts up and shuts down very quickly, the way it makes basically no noise (no moving parts!) and does not get hot. I still have my early 2008 15″ MacBook Pro around, and it’s funny how it feels so big and clunky. The old screen does look massive now that I’m accustomed to the 13″ MBA, but somehow I don’t seem to miss the extra space, and I certainly love how portable the new machine is.

So — now that I’m really set up it’s back to work.