Day 1 on the Nakasendo

Takasaki post station, on a construction barrier.

Takasaki is at the northwestern edge of the Kanto plain, where the Tokyo megalopolis gives way to the mountains. I arrived by Shinkansen the evening before my walk would begin. It only takes an hour from Tokyo Station, so I could easily have gotten up early in Tokyo and arrived in Takasaki in time for a full day of walking, but I decided to stay over because, with jetlag, it seemed I would get more sleep this way. I checked into my hotel (the New Akagi business hotel; a shabby, small room that smelled of cigarette smoke; cheap but otherwise not recommended) and looked for a place for dinner. The place that had been recommended to me was closed that night, but I ended up finding a restaurant called IZANAI that served me an excellent dinner. (Not cheap, and probably not easy for a non-Japanese speaker to order from, but an excellent meal, pictured below.)

My dinner at IZANAI.

The next morning I set out early-ish along the route. My guide for the route, by the way, was this book, which I bought from Amazon. There may be a KML or GPX file somewhere that captures the whole route, but I wasn’t able to find one. (My Japanese Google skills may not have been up to it.) I used the book to create a partial route on Google Maps on my laptop, which I could then view on my phone along the way. (I’ll see about making that publicly available.) It would be challenging for someone who can’t read Japanese to use the book alone, as none of the place names are written in Roman characters, but I suppose if you were in a place and could find the corresponding map, you could look back and forth between Google Maps and the book to locate the route. Obviously someone should just make the whole route available as a GPX file; it wouldn’t be that hard.

Screenshot of my Google Map.

Anyway, my first day of walking was roughly 20 miles from Takasaki to Sakamoto, the last post town before the path heads into the mountains and over the Usui Pass. This was more walking than I had originally intended: the plan had been to take the Shin’etsu line from Takasaki to Annaka and walk from there, but I felt optimistic and wanted to start walking right away. (And once underway, how could I bring myself to get on a train halfway?)


On the road out of Takasaki

The walk out of Takasaki was mostly pleasant. There is one stretch where the route runs along Route 18 that looked unappealing, so I walked instead on an asphalt path closer to the river (see photo below); I don’t think I missed much.

River to the left, Route 18 (and old Nakasendo, apparently) to the right. Mountains ahead, Takasaki behind.

As I walked along, the landscape became greener and the mountains loomed larger; the setting grew progressively less urban and more rural as the day went on. As I got further from Takasaki the towns also became quieter and more run-down, though not in an unpleasant way. I saw a lot of old people and a lot of shops that seemed unlikely to sell much; there were many formerly grand-looking buildings standing deserted, decaying and falling apart. These towns are quite proud of their heritage as Nakasendo post towns: I frequently saw information billboards explaining the role of the post town or describing events that happened nearby in connection with the road. This heritage seemed to be their strongest selling point to outsiders. There are also quite a few temples and shrines along the way, which I suppose catered in part to passing travelers.


Mountains near Yokokawa.
An abandoned house near Yokokawa.

Around mid-morning I found myself walking behind a man in his sixties who was clearly doing the same walk I was. I eventually caught up to him and struck up a conversation, and we ended up having a nice time walking together. Fujii-san is a retired chemical engineer who lives in Chiba and is walking the Nakasendo in stages; this was a day walk for him, with about three hours of train travel on either side. (From this point onward he planned to do the walk in two- or three-day trips, with his wife alongside; the walk from Nihonbashi in Tokyo to the mountains did not interest her, for understandable reasons!) We talked about a range of things but eventually got around to American politics; as we walked, Donald Trump and Hillary Clinton were concluding their first debate, and during our lunch break on the steps of a temple we checked his tablet to see how it had gone. Later in the day Fujii-san started to tire, and he sent me ahead so that he could continue at a slower pace to his train. We took the selfie below (and I learned the Japanese word for it: jidori) just before parting.

Me and Fujii-san.

As I got further into the mountains and entered the post town of Sakamoto, I was surprised to see some monkeys in a garden. I leaned in to take the picture below (note the monkey on the edge of the roof), after which the homeowner came out of her back door with some firecrackers to scare away the monkeys, which seemed to be having a meal in her fruit trees.

Monkey on the roof.

In planning this trip I had some trouble figuring out where to stay at the end of the first day. There are some nice places in the mountains that are not near the route, and I thought about staying at one of those and arranging a pickup in Sakamoto. There are a bunch of business hotels around Annaka, but that is not really far enough into the mountains to be interesting. (You could take the train back from Yokokawa to Annaka to stay overnight, but I don’t like to backtrack or deal with the hassle of a train.) I ended up staying in a “cottage” at an establishment called Kutsurogi no Sato. At the bottom left of the picture below you can see the reception building; there are six or seven cottages scattered around behind it. It didn’t make much sense to stay there by myself, because the smallest cottage has four beds and they are meant to be self-catering, but I was making arrangements rather last-minute and didn’t want to deal with arranging a pickup. It would make a lot of sense for a group of people doing this walk as a weekend outing from Tokyo to stay here. You do need to bring your own food (there does not seem to be a restaurant within walking distance); with advance notice they can provide the materials for a BBQ. I brought some ready-made things from 7-Eleven, which were not gourmet but got the job done. The best part is that it’s close to the walk, next door to an onsen (though the onsen was closed the day I stayed), and has a beautiful view.

The view from Kutsurogi no Sato (in Sakamoto). The building on the left is the reception; behind that are the cottages.
The door of my cottage at Kutsurogi no Sato.

A few days on the Nakasendo, Part 1

I recently spent 10 days in Japan, mostly to visit old friends in Tokyo and Kanazawa. (I also gave a talk at the University of Tokyo.) I took the opportunity to do some walking on the Nakasendo, one of the two roads established in the early 17th century to connect Edo (now Tokyo) and Kyoto. I spent a lot of time before my trip figuring out how to fit this in, and I want to share some of my experiences here partly as an aid to anyone else thinking about a short trip on the Nakasendo. I found some useful blogs describing journeys from Tokyo to Kyoto (or vice versa): this guy did the whole thing in two weeks, this guy from New Zealand did it over a more reasonable three weeks, and this guy has great GPS data on the various stages. The insane do-it-in-two-weeks guy also has a short but useful post for people considering a shorter trip, basically saying that the Kiso Valley is the best part. In addition to telling my friends about my trip, I hope these posts might help someone planning a short excursion on the Nakasendo around Karuizawa, which I think is another excellent trip for someone interested in walking and history who has a few days.

A lovely wooded section of the path in the Usui Pass.

Initially I had thought about spending four days walking: I would arrive somewhere from Tokyo, walk for several days, and then continue on to Kanazawa. I developed a plan for the Kiso Valley, which everyone seems to agree is the most scenic part of the Nakasendo. (In fact, in some places I found the section around Tsumago and Magome described as the Nakasendo, ignoring the rest of the route.) My rough plan was to take a train to Kani one evening (via Nagoya), stay there overnight, walk to the Daikokuya Ryokan (described in the Walking Fool blog), then to Nakatsugawa, then to Minshuku Koiji in Okuwa (described favorably in the same blog), and then maybe as far as Kiso-Fukushima before taking a train to Shiojiri and onward through Nagano to Kanazawa. As it turned out, I had really only two and a half days for walking, so I started to explore an alternative that would require less transit time.

The plan I actually implemented was to take the train from Tokyo to Takasaki and walk from there to Sakudaira, stopping overnight in Sakamoto and west Karuizawa. This minimized my transit hassle: basically, it was Shinkansen to Takasaki and Shinkansen from Sakudaira to Kanazawa (with a change at Nagano), so it was perfect for my particular constraints. It also makes a nice itinerary for an outing from Tokyo: you could walk from Annaka to Karuizawa in a very pleasant two days, with one overnight in Sakamoto, or add another overnight and take the train back from Sakudaira.

An abandoned house near Yokokawa.

In the next posts I’ll describe my walk in more detail, but let me quickly say a bit more about what I think would be a good weekend outing from Tokyo. Leave Tokyo by train early on day 1 to arrive in Takasaki; from Tokyo Station it takes about an hour. If you are up for 20 miles of walking in a day, walk from there; for an easier day, transfer to the Shin’etsu line to Annaka and walk from there. Stay at Kutsurogi no Sato in Sakamoto if you have a group of three or four people; otherwise you might want to arrange a pickup near Yokokawa station or take the train back to Annaka to stay overnight there. On day 2, walk through the mountains over the Usui Pass, arriving at Karuizawa Shinkansen station in the early afternoon for a train back to Tokyo (with time to do touristy things in Karuizawa or climb Hanareyama if you’re up for it). Or, for a three-day journey, continue on and stay somewhere between Naka-Karuizawa and Oiwake (I stayed at Yuusuge Onsen Ryokan, but I wouldn’t recommend it) and walk to Sakudaira Shinkansen station the next day for a train back to Tokyo. This itinerary does involve some ordinary roadside walking; it is not unremittingly picturesque, and I prioritized transit simplicity over other goals. But it also involves some lovely paths in the mountains, lots of historical interest, and plenty of places you won’t see on the usual tourist itinerary.

Mountains near Yokokawa.

Trust in the US federal government

Trust in the federal government over time (ANES).

The figure documents changes over time in ANES respondents’ answers to a question about how often they “can trust the government in Washington to do what is right”. The basic story seems to be that there are two periods: a high-trust period before the Vietnam War and Watergate, and a low-trust period afterward. Before the late 1960s almost two-thirds of respondents said they trust the federal government “most of the time” (and about one in six said “just about always”); since the mid-1970s about two-thirds say they trust the government “some of the time” and one-third say “most of the time”. Recent peaks and troughs (judging by the proportion saying “most of the time”) roughly track the business cycle, though the Iraq War probably contributed to the large drop between 2002 and 2006.

Why only one treatment at a time?

I remember learning in Gary King’s Gov 2001 class back in the fall of 2004 (!) that we should really only ask about the effect of one cause at a time. That is, having run our analysis we should not expect to be able to infer the effect of more than one variable from it. I remember not really seeing why this was the case: why can’t we run a regression and learn about the effect of more than one of the variables? I thought I would record here some of my current thoughts on this question.

Let me start by noting that we can infer the effects of more than one cause at a time given the right dataset. Suppose we have run a conjoint experiment, i.e. we have conducted a survey in which respondents are asked to compare or evaluate items (political candidates, products, reform proposals) whose attributes are jointly randomly varied. (So, one respondent is asked to choose her preferred candidate between a young female banker and a middle-aged female teacher; another is asked to choose between a young female lawyer and an old male banker; etc.) Given this data, we can run a regression of the outcome (the respondents’ choices) on the attributes they were shown, and the coefficients on the attributes will tell us the effect of each treatment averaged over the joint distribution of the other treatments. (We can also estimate interaction effects, but that is more in the spirit of “subgroup analysis” in designs that focus on a single treatment.) The randomization of a given attribute makes it straightforward to measure the effect of that attribute; it shouldn’t be surprising that we can randomize more than one attribute at a time and thus measure more than one effect simultaneously.
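
To make this concrete, here is a minimal simulation sketch in Python; the attributes and effect sizes are invented for illustration. Because the two attributes are independently randomized, a single linear probability regression recovers both average effects at once.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Independently randomize two candidate attributes (hypothetical example).
female = rng.integers(0, 2, n)  # candidate gender
lawyer = rng.integers(0, 2, n)  # candidate profession

# Invented "true" effects on the probability the candidate is chosen.
p_chosen = 0.5 + 0.10 * female - 0.05 * lawyer
chosen = rng.binomial(1, p_chosen)

# One regression of the choice on both attributes (linear probability model).
X = np.column_stack([np.ones(n), female, lawyer])
coef, *_ = np.linalg.lstsq(X, chosen, rcond=None)
print(coef)  # roughly [0.50, 0.10, -0.05]: both effects recovered at once
```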

In some very lucky circumstances we could also use standard methods of design-based inference to measure the effects of more than one treatment simultaneously in observational data. You could have a regression discontinuity design (RDD) in which treatment A depends on whether the forcing variable is above threshold Z1 and treatment B depends on whether the same forcing variable (or another one) is above threshold Z2; one could use this situation to measure the effects of both A and B, where each effect is conditional on the value of the other treatment at the relevant cutoff. As with the conjoint experiment, you could of course study these effects entirely separately, though combining the analyses may be more efficient.
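
Here is a rough sketch of that lucky situation, again with invented cutoffs and effect sizes: one forcing variable, two thresholds, and a simple local difference in means at each cutoff.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

r = rng.uniform(-1, 1, n)      # the forcing variable
a = (r > -0.3).astype(float)   # treatment A assigned above cutoff Z1 = -0.3
b = (r > 0.4).astype(float)    # treatment B assigned above cutoff Z2 = 0.4
y = 2.0 * a + 1.0 * b + r + rng.normal(scale=0.5, size=n)

def jump_at(cutoff, h=0.05):
    """Difference in mean outcomes just above vs. just below the cutoff."""
    below = y[(r > cutoff - h) & (r <= cutoff)].mean()
    above = y[(r > cutoff) & (r <= cutoff + h)].mean()
    return above - below

print(jump_at(-0.3))  # ~2.0: effect of A (with B fixed at 0 near this cutoff)
print(jump_at(0.4))   # ~1.0: effect of B (with A fixed at 1 near this cutoff)
```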

Actually, we can simultaneously get unbiased estimates of the effects of multiple treatments in any situation where each treatment is a function of a set of covariates X (but not the other treatments) and we can do covariate adjustment for X. This could mean a regression that includes all of the treatments and X (with the proper functional form assumptions); we could also create matched groups for each combination of treatments and measure effects by averaging across groups. Where the treatments are conditionally independent of each other and determined only by observable covariates, then for each of the treatments we can say that the value of the treatment is independent of the potential outcomes (which Angrist & Pischke call the conditional independence assumption, or CIA) and thus the effects can all be estimated in one regression.
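
A small simulated illustration of this point (the data-generating process is invented for the example): two treatments each depend on an observed covariate x but not on each other, so a single regression that adjusts for x recovers both effects, while a regression that omits x gets both wrong.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

x = rng.normal(size=n)                      # observed confounder
t1 = rng.binomial(1, 1 / (1 + np.exp(-x)))  # each treatment depends on x...
t2 = rng.binomial(1, 1 / (1 + np.exp(-x)))  # ...but not on the other treatment
y = 1.0 * t1 + 2.0 * t2 + 3.0 * x + rng.normal(size=n)

def ols(*regressors):
    X = np.column_stack([np.ones(n), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0]

print(ols(t1, t2))     # both coefficients biased upward: x is omitted
print(ols(t1, t2, x))  # ~[0, 1, 2, 3]: both effects recovered in one regression
```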

So clearly “one treatment at a time” is not an epistemological law. Why might it still be a good guide?

The key point is that although there are circumstances where the CIA is met for more than one treatment at a time (as illustrated above), these circumstances are not common. The more typical situation in observational studies is that the treatments affect each other in some way. When that is the case (for example, when treatment T1 affects treatment T2), we must choose a single treatment to study. Why? To estimate the effect of T2, we need to include T1 in the regression, unless we have reason to think that T1 has no effect on the outcome (in which case we don’t really care about estimating both treatment effects anyway). To estimate the effect of T1, however, we don’t want to include T2 in the regression, because it is “post-treatment” and thus a “bad control” in Angrist & Pischke’s terms. Because T1 affects T2, estimating the effect of T1 precludes estimating the effect of T2; we must choose one of the two effects to estimate.
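
A quick simulation may help fix ideas (the coefficients and the unobserved factor u are invented): T1 is randomized, T1 affects T2, and T2 shares an unobserved cause with the outcome. Regressing the outcome on T1 alone recovers T1’s total effect, but adding the post-treatment variable T2 as a control produces a coefficient on T1 that is neither the direct effect nor the total effect.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

u = rng.normal(size=n)                  # unobserved factor affecting t2 and y
t1 = rng.binomial(1, 0.5, n)            # randomized upstream treatment
t2 = 0.8 * t1 + u + rng.normal(size=n)  # t2 is post-treatment relative to t1
y = 1.0 * t1 + 1.0 * t2 + 2.0 * u + rng.normal(size=n)

def ols(*regressors):
    X = np.column_stack([np.ones(n), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]  # drop the intercept

print(ols(t1))      # ~1.8: total effect of t1 (direct 1.0 plus 0.8 via t2)
print(ols(t1, t2))  # t1 coefficient ~0.2: the "bad control" distorts the estimate
```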

I think post-treatment bias is probably the context in which Gary King was talking about the “one treatment at a time” rule in 2004, because he was thinking about post-treatment bias around that time. (I’d like to go back and look at that work.) It occurs to me now that the causal ordering rationale for the “one treatment at a time” rule also has a useful connection to the literature on mediation effects by e.g. Kosuke Imai that I would like to flesh out at some point.

Let me restate all of this a bit differently (and a bit more succinctly): We can estimate the effects of a treatment whenever the conditional independence assumption (CIA) is met, but outside of conjoint experiments it is unusual for the CIA to be simultaneously met for more than one treatment. Fundamentally this is because the world is complicated and the treatments we’re interested in typically affect each other. In these situations we might be able to estimate the effects of the various treatments of interest. (It depends on the plausibility of the conditional independence assumption in each case.) But the design will need to be different for each treatment.

But experiments are expensive!

I’ve recently been talking about experiments in a graduate research design course at Oxford. I was a little surprised when students responded by saying, “Right, I understand that experiments are great, but how do I pay for one? I’m a graduate student.”

My first response to this question is that I’m telling you about experiments not just because it would be nice if you ran one, but also because thinking about experiments helps you understand the challenges we face in observational studies. For any causal research question, it’s always good to ask what the ideal experiment would be; in design-based observational studies, you follow that up by explaining why the ideal experiment is impossible and why the design you’ve chosen is a good approximation.

But I also want to address the practical question: how can you do an experiment when you don’t have a huge grant?

The answer I gave in last week’s lecture is that you can cultivate a relationship with a party organization, or NGO, or government agency, and offer a free evaluation if they are willing to work with you on design. Of course this depends on some luck and possibly some connections, but it’s important to remember that you might be offering something that they would otherwise have to pay a lot of money for.

Afterward I asked Florian Foos, who ran a lot of interesting experiments as a graduate student at Oxford, and he offered these observations:

  • Graduate students are usually poor in money, but rich in another resource: time. How much of what a senior researcher would usually pay for can you do yourself, or in collaboration with other students?
  • Not all field experiments need to be expensive: Think for instance of the many audit experiments conducted with legislators by Butler/Broockman etc.
  • If you have a great idea, but lack funding (and a bit of know-how), it might be worth asking a faculty member (this can, but need not be your supervisor) if they would be interested in collaborating on an experiment.
  • Think about what type of outcome you are interested in: some behavioural outcomes can be collected without expensive post-treatment surveys, and administrative data is often free or available from government agencies for a small fee.

Also, you could always fake the outcome data.

Gelman and Imbens: “Forward causal inference and reverse causal questions”

One of the traumas of graduate school for me was that my training in statistics re-oriented me away from big Why questions (“Why is there disorder/poverty/dictatorship in some places and not in others?”) toward smaller questions like “Does winning office enrich politicians?” The most influential advice I received (and which I now give) was to focus on the effects of causes rather than the causes of effects, or, put differently, to ask “What if” rather than “Why”. (I recall Jim Snyder saying in his seminar that your paper title really shouldn’t begin with the word “Why”; I am glad to say that I managed to point out that one of his best-known papers is called “Why is there so little money in U.S. politics?”) There are important “what if” questions, but when we think big we usually start with a puzzling phenomenon and ask why it occurs: Why are there lots of political parties/coups/civil wars in some countries and not in others? In fact, before I fell under the influence of people like Don Rubin, Guido Imbens, and Jim Snyder, the main advice I had received for empirical work was to identify a puzzle (an anomaly in a dependent variable) and then explain it in my paper, and many people continue to adhere to that approach. What I and others struggle with is how the “what if” approach relates to the “why” approach. If we want to explain the “why” of a phenomenon (e.g. polarization, conflict, trust), do we do it by cobbling together a bunch of results from “what if” studies? Or should we stay away from “why” questions altogether?

Gelman and Imbens have taken on these issues in a short paper that puts “Why” questions in a common statistical framework with “What if” questions; in the title, the “Why” questions are “reverse causal questions” while the “What if” approach is covered by “forward causal inference”. I think their main contribution is to clarify what we mean when we ask a “Why” question: we mean that there is a relationship between two variables that we would not expect given our (perhaps implicit) statistical model of a particular phenomenon. Thus when we ask a “Why” question, we are pointing out a problem with our statistical model, which should motivate us to improve the model. Using the example of cancer clusters, in which the anomaly is that some geographic areas have unusually high cancer rates, Gelman and Imbens highlight one way to improve the model: add a variable. When we add that variable we might think of it as a cause that we could potentially manipulate (e.g. the presence of a carcinogenic agent) or as a predictor (e.g. the genetic background of the people in the area), but the idea is that we have explained the anomaly (and thus provided an answer to the “why?” question) when the data stops telling us that there is an association we don’t expect.

One of the key points the authors make is that there may be multiple answers to the same “why?” question. What do they mean? My reading was this: continuing with the cancer cluster example, the puzzling association might go away both when we control for the presence of a carcinogenic agent and when we control for genetic background; this particular indeterminacy is an issue of statistical power, because the anomaly “goes away” when the data no longer reject the null hypothesis for the particular variable under consideration. There are thus two explanations for the cancer clusters, which may be unsatisfying but is correct under their interpretation of “Why” questions and how they are resolved.

A related point is that there are multiple ways to improve the model. The authors emphasize the addition of a variable, I think because they want to relate to the causal inference literature (where the question is whether the variable you add to explain an anomaly can be thought of as a “cause”), but elsewhere in the paper they mention statistical corrections for multiple comparisons (particularly relevant for the cancer cluster example) and the introduction of a new paradigm. I wondered why they don’t discuss the option of accepting that the anomalous variable is a cause (or at least a predictor) of the outcome. Using an example from the paper, this would be like looking at the height and earnings data and concluding that height actually does influence (or at least predict) earnings, which means changing the model to include height (in which case there is no longer an anomaly). I suppose the attractiveness of this solution depends on the context, and particularly on how strong one’s science-based a priori reasons are for ruling out the explanation; in the case of cancer clusters, you might be correct in saying that there is no good reason to think that fine-grained location on the earth’s surface would actually affect cancer, and thus that there must be some other environmental cause, even if that cause is something highly correlated with position on the earth’s surface, such as magnetic fields or soil deposits.

A lingering question here is about the distinction between a predictor and a cause.

Friedman: Capitalism and Freedom

I finally got around to reading Milton Friedman’s classic Capitalism and Freedom. After spending time this summer reading Adam Swift’s Political Philosophy: A Beginner’s Guide, which (if I recall correctly) recommends Friedman’s discussion of equality, I found Capitalism and Freedom unimpressive as a work of political philosophy, though partly for two reasons that are not Friedman’s fault.

First, part of why I found the book unimpressive is that the ideas are so familiar to me. When I first heard the Beatles I thought they sounded just like everyone else; it took me a while to figure out that this was partly because so many others had copied them. Similarly, Friedman’s ideas have been recycled so much — not just in economics departments but in political discourse, mainly from the right — that they hardly seem revolutionary anymore.

Second, and relatedly, what Friedman is doing is applying basic economic analysis to questions of political philosophy. Until recently this was the only kind of political philosophy I had ever really engaged in: explaining the efficiency losses associated with government intervention, identifying market failures that justify government intervention, etc. The core ideas about the proper role of government in this book are applications of standard economic theory, with a healthy portion of enthusiasm about freedom thrown in.

Although Friedman is of course a strident free markets guy and the prefatory material introduces the book as a political tract, I was surprised by how modest Friedman is about the extent to which his philosophy can provide answers to tough political questions. He states this clearly on page 32:

Our principles offer no hard and fast line how far it is appropriate to use government to accomplish jointly what it is difficult or impossible for us to accomplish separately through strictly voluntary exchange. In any particular case of proposed intervention, we must make up a balance sheet, listing separately the advantages and disadvantages. Our principles tell us what items to put on one side and what items on the other and they give us some basis for attaching importance to the different items.

Thus in discussing natural monopolies, he admits we are “choosing among evils” (public monopoly, public regulation, or private monopoly) and offers some thoughts on which might be less evil (hint: often it’s private monopoly); in discussing paternalism, he recognizes that the state must restrict freedom to provide for children and the insane, but that beyond that “there is no formula to tell us where to stop.” This is, I think, an accurate view of what a commitment to “freedom,” combined with the tools of welfare analysis from microeconomics, yields in terms of policy proposals: not much. That’s not to say that the book is short on policy proposals; in fact, Capitalism and Freedom is much more interesting as a set of provocative policy proposals than as a statement of political philosophy. But the key point is that to arrive at these policy proposals you need more than Friedman’s stated commitment to freedom plus standard ideas from microeconomics about the tradeoffs involved in government intervention in markets. Mostly, you need a lot of beliefs about the nature of the social world, e.g. the degree to which high marginal tax rates encourage tax avoidance and evasion. On a superficial reading one can fail to recognize the importance of these empirical beliefs and read this as a coherent work of philosophy in which the policy prescriptions follow from a commitment to freedom and some basic ideas about how markets work. In fact, the interesting ideas in the book (like the claims about how markets tend to disadvantage those who discriminate) are commitments to contestable causal claims just as much as they are embodiments of a high value placed on freedom, or more so.

Another way to put this is that policy proposals from left, right, and center (in liberal democracies like the US, UK, France) could be justified on the basis of principles in the first two chapters of Capitalism and Freedom. The same of course can be said for other influential groundings of political philosophy, such as the Rawlsian thought experiment about the original position. Clarifying normative values and even proposing ways for prioritizing among them seems to fail to get us very far toward policy recommendations, because in all important cases there is a large set of empirical facts that stand between principles and policy outcomes.

A few notes on things I found interesting:

  • Friedman argues that political freedom requires a market economy because dissent requires resources; in a “socialist” economy (by which he means one in which private property does not exist, or at least where the state controls the means of production), how could one finance a program of political dissent? Where would Marx find his Engels?
  • Like Buchanan and Tullock in The Calculus of Consent (published in the same year — 1962), Friedman has some nice insights into how voluntary economic exchange and government intervention relate. One reason to prefer market activity is that you get “unanimity without conformity,” in the sense that everyone agrees to the outcomes (excluding market failures of course) and you still get a variety of outcomes. Again putting market exchanges in political terms, Friedman portrays market exchange as analogous to proportional representation, in the sense that everyone gets what she votes for, without having to submit to the will of the majority.
  • The chapter on education is a strident case for revising the way in which government supports education. With respect to higher education I find him particularly convincing. The analogy that was relevant when he was writing was the GI Bill, a key feature of which was that the government supported veterans’ education wherever they chose to get it (within an approved list of schools); by contrast, at the university level the individual states support education (particularly of their own residents) only at the public universities in that state. I agree that this does not make a lot of sense, and would favor reform in this area if I didn’t think it would lead to a large reduction in support for education overall. It also made me wonder how much the move toward government loans and grants for education was in response to arguments like these, and to what extent this has replaced public funding for public universities.
  • Friedman makes the case that the voucher system would tend to help economically disadvantaged minorities, in part by unbundling schooling from residential location decisions: a poor single mother who wants to invest in her child’s education may have a better chance under a voucher system, where she could save money and purchase that good like any other, than she does under the current system, in which (assuming that private school is prohibitively expensive) she would have to move the family to an expensive town to benefit from better schools — in other words, buy a whole package of goods in order to get one thing she wants.
  • In the chapter on discrimination, Friedman follows up this discussion of segregation and schooling by highlighting the importance of attitudes of tolerance: in addition to getting the government out of schooling, “we should all of us, insofar as we possibly can, try by behavior and speech to foster the growth of attitudes and opinions that would lead mixed schools to become the rule and segregated schools the rare exception.” In the margin here I wrote “this has happened” — not the part about privatization, but rather that public attitudes have shifted (at least where I live) to the point where a classroom of all white faces is seen as a problem. The avidity with which elite private schools and universities pursue diversity suggests that a school system with more choice and competition would not have whiter schools. I somehow doubt, however, that it would have fewer schools in which almost all students are poor minorities. It makes me want to know more about experiments with school choice. For most of the claims he makes about the virtues of school choice, almost everything seems to depend on the way you deal with undesirable schools and pupils, and I don’t recall reading anything about that here.

Covariate balance, Mill’s methods, and falsificationism

I presented my paper on polarization and corruption at the recent EPSA conference and encountered what was to me a surprising criticism. Having thought and read about the issues being raised, I want to jot down some ideas that I wish I had been able to say at the time.

First, some background: In the paper, I use variation in political polarization across English constituencies to try to measure the effect of polarization on legislative corruption (in the form of implication in the 2009 expenses scandal). One of the points I make in the paper is that although others have looked at this relationship in cross-country studies, my paper had the advantage that the units being compared were more similar on other dimensions than in the case of the cross-country studies, which means that my study should yield more credible causal inferences.

The criticism I encountered was that in seeking out comparisons where the units are as similar as possible, I was doing something like Mill’s Method of Differences, which had been shown to be valid only under a long list of unattractive assumptions, including that the process being considered be deterministic, monocausal, and without interactions.

Now, in seeking out a setting where the units being compared are as similar as possible on dimensions other than the “treatment,” I thought I was following very standard and basic practice. No one wants omitted variable bias, and it seems straightforward to me that the way to reduce the possibility of omitted variable bias when you can’t run an experiment is to seek out a setting where covariate balance is high before any adjustment is done. I think of the search for a setting with reasonable covariate balance as an intuitive and basic part of the “design-based” approach to causal inference I learned about from Don Rubin and Guido Imbens, and also as a longstanding part of scientific inference in all fields. In response to the criticism, I said something like this, pointing out that the critic had also raised the possibility of omitted variable bias and thus should agree with me about the importance of restricting the scope for confounding.

I didn’t know at the time how to respond directly to the claim that I had sinned by partaking of Mill’s methods, but in the course of reviewing a comparative politics textbook (Principles of Comparative Politics, 1st edition (2009), by Clark, Golder, and Golder) I have reacquainted myself with Mill’s methods and I think I see where my critic was coming from — although I still think his criticism was off the mark.

What would it mean to use Mill’s method of differences in my setting? I would start with the observation that MPs in some constituencies were punished for being implicated in the scandal more heavily than others. I would then seek to locate the unique feature that is true of all of the constituencies where MPs were heavily punished and not true of the constituencies where they were not heavily punished. To arrive at the conclusion of my paper (which is that greater ideological distance between the locally-competitive candidates, i.e. (platform) polarization, reduces the degree to which voters punish incumbents for corruption), I would have to establish that all of the places where MPs were heavily punished were less polarized than the places where MPs were lightly punished, and that there was no other factor that systematically varied between the two types of constituencies.

This would clearly be kind of nuts. Electoral punishment is not deterministically affected by polarization, and it is certainly affected by other factors, so we don’t expect all of the more-polarized places to see less punishment than all of the less-polarized places. Also, given the countless things you can measure about an electoral constituency, there is probably some other difference that seems related to electoral punishment, and Mill’s method doesn’t tell you which features to focus on and which to ignore. Mill’s method is essentially inductive: you start with the difference you want to explain, and then you consider all of the possible (deterministic, monocausal) explanations until you’re left with just one. This process seems likely to yield an answer only when you have binary outcomes and causes, a small dataset, and a willingness to greatly constrain the set of possible causes you consider. The answer the method yields would be suspect for all of the reasons rehearsed in the Clark, Golder and Golder book and the sources they cite.

I am not using Mill’s method of differences. I have a postulated relationship between polarization and electoral punishment, and I am attempting to measure that relationship using observational data. I am choosing to focus on units that are similar in other respects, but I am not doing this in order to inductively arrive at the one difference that must explain a given difference in outcomes; rather, I am focusing on these units because by doing so I reduce the scope for unmeasured confounding.

Clark, Golder, and Golder contrast Mill’s methods with the “scientific method” (a great example of a mainstream political science textbook extolling falsificationism and what Clarke and Primo criticize as the “hypothetico-deductive model”), which they argue is the right way to proceed. The virtue of the scientific method in their presentation is that you can make statements of the kind, “If my model/theory/explanation relating X and Y is correct, we will observe a correlation between X and Y” and then, if we don’t observe a correlation between X and Y, we know we have falsified the model/theory/explanation. The point of limiting the possibility of unobserved confounding is that the true logical statement we want to evaluate is “If my model/theory/explanation is correct and I have correctly controlled for all other factors affecting X and Y, we will observe a correlation between X and Y.” To the extent that we remain unsure about the second part of that antecedent, i.e. to the extent that there remains the possibility for unmeasured confounding, we are unable to falsify the theoretical claim: when we don’t observe the predicted correlation between X and Y we are unsure whether the model is falsified or the researcher has not correctly controlled for other factors. By seeking out settings in which the possibility for unmeasured confounding is restricted, we thus try to render our test as powerful as possible with respect to our theoretical claim.

I think this is an important point with respect to two important audiences.

First, I think it is important with respect to the comparative politics mainstream, or more broadly the part of social science that is not too concerned with causal inference. Clark, Golder and Golder is a very impressive book in many respects, but it does not trouble its undergraduate audience much with the kind of hyper-sensitivity to identification that we see in recent work in comparative politics and elsewhere in the social sciences. The falsificationist approach they take emphasizes the implications we should observe from theories without emphasizing that these implications should be observed if the theory is correct and (crucially) the setting matches the assumptions underlying the theory, at least once the researcher has finished torturing the data. The scientific method they extol is weak indeed unless we take these assumptions seriously, because no theory will ever be falsified if we can so easily imagine that the consequent was denied due to confounding rather than the shortcomings of the theory.

Second, I think it is important with respect to Clarke and Primo’s critique of falsificationism and the role of empirical work in their suggested mode of research. I agree with much of their critique of the way political scientists talk about falsifiable theories and hypothesis tests, and especially with their bottom-line message that models can be useful without being tested and empirical work can be useful without testing models. But their critique of falsificationism as practiced in political science (if I recall correctly; I don’t have the book with me) rests largely on the argument that you can’t test an implication of a model with another model, i.e. that the modeling choices we make in empirical analysis are so extensive that if we deny the consequent we don’t know whether to reject the theoretical model or the empirical model. My point is that the credibility of empirical work varies, and this affects how much you can learn from a hypothesis test. If someone has a model that predicts an effect of X on Y, we learn more about the usefulness of the model from a high-quality RCT measuring the effect of X on Y (assuming everyone agrees that X and Y have been operationalized as the theory specifies, etc.) than from an observational study; similarly, we learn more from an observational study with high covariate balance than from one with low covariate balance. In short, I suspect Clarke and Primo insufficiently consider the extent to which the nature of the empirical model affects how much we can learn about the usefulness of a theoretical model by testing an implication of it. This suggests a more substantial role for empirical work than Clarke and Primo seem to envision, but also a continued emphasis on credibility through e.g. designing observational studies to reduce the possibility of unmeasured confounding.

Elster (2000) on reasons for self-binding

In Ulysses Unbound (2000), Elster considers situations where an actor would benefit from “self-binding” (constraining one’s own behavior) and devices that are used to accomplish this. In other words, the topic is commitment problems and commitment devices — an important theme in political science research over the past couple of decades.

Before I get to the more political aspects of Elster’s work, I want to explicate his discussion of reasons for self-binding, which helped me to see political commitment problems in a somewhat broader perspective.

In another blog post, I’ve talked about the idea that emotions can provide a corrective to rational self-interest: they impose costs and benefits that make otherwise non-credible threats and promises credible. In most cases, however, the passions are the enemy of self-interest, or at least of one conception of self-interest. By passions, Elster means “emotions proper” (like anger, shame, and fear) but also “states” such as drunkenness, sexual desire, and cravings for addictive drugs. The idea is that these passions can take over and dominate our behavior in self-destructive ways; the clearest example is “blind anger” that leads someone to lash out in ways that he or she will certainly later regret. Elster’s discussion focuses on clarifying the different ways in which passions can lead to self-destructive behavior, and the corresponding attempts to “pre-commit,” i.e. to take actions that will minimize the self-destructive behavior. For example, if the passion is not too strong, it may be sufficient to make the self-destructive behavior more costly, such as bringing one’s wife to a party to keep oneself from getting too drunk or flirting with coworkers. If the passion is so strong that one practically ignores all other considerations and will act self-destructively no matter the cost, then one may need to avoid the passion entirely, for example by not going to the office party. In section I.7 Elster addresses these issues in the context of addiction, a particular form of passion, in response to which addicts have developed various commitment strategies with varying success.

Another key commitment problem discussed in Ulysses Unbound is the time inconsistency produced by hyperbolic discounting. The basic idea is that actors may discount future payoffs in a way that leads to inconsistent choices over time: given the choice between a big payoff in two years and an even bigger one in three years, I may prefer to wait for the bigger payoff when I think about it today, but not when I reconsider in a year. (This kind of inconsistency, which apparently helps to explain procrastination and suboptimal saving behavior, is ruled out by standard exponential discounting but is consistent with hyperbolic discounting.) This creates a conflict within the self: today’s self wants to constrain tomorrow’s self. Although Elster does not emphasize this point, the intertemporal conflict created by hyperbolic discounting is clearly analogous to the conflict caused by passions: discounting-based time inconsistency can be thought of, it seems, as a kind of predictable passion that strikes when payoffs become more immediate.
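
A tiny worked example (with invented payoffs and discount parameters) shows the reversal. Under exponential discounting the ranking of two dated payoffs never changes as time passes; under hyperbolic discounting the larger, later payoff can win while both are distant and lose once the smaller one draws near.

```python
# Choice: 100 in two years vs. 150 in three years (invented numbers).
def exponential(v, t, delta=0.8):
    return v * delta ** t

def hyperbolic(v, t, k=1.5):
    return v / (1 + k * t)

for discount in (exponential, hyperbolic):
    today = (discount(100, 2), discount(150, 3))      # evaluated now
    in_a_year = (discount(100, 1), discount(150, 2))  # re-evaluated a year later
    print(discount.__name__, today, in_a_year)

# exponential: (64.0, 76.8) then (80.0, 96.0) -- the later payoff wins both times.
# hyperbolic: (25.0, ~27.3) then (40.0, 37.5) -- a preference reversal.
```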

The last reason for pre-commitment Elster considers is anticipated preference change. The idea is that one can anticipate that one’s preferences will change with time, and one may want to guard against this happening. In the Russian nobleman example (drawn from Derek Parfit), this happens because the current self feels at odds with the anticipated future self: the politically radical young self anticipates that he will become more conservative, so he may want to fight the future self by e.g. giving his resources to radical political causes before his future self can give them to conservative ones. A slightly different phenomenon is exemplified by the Amish and other cultural groups (Islamists, Confucians) that take steps to prevent preference change by shielding themselves from information about competing lifestyles — what Elster calls “self-paternalism.” These cases differ from the nobleman’s in that the current self does not seek to undermine a future self with whom it feels in conflict; rather, the current self and the future self share an interest in preventing preference changes that presumably would leave the future self less happy.

Elster (2000) on emotions as credibility enhancers

As part of my summer reading program, I recently read Jon Elster’s Ulysses Unbound (2000) and will be posting some thoughts on it here. In this first installment I’ll discuss the idea that emotions may provide a form of self-binding that can help to overcome self-interest.

In section I.5, Elster considers provocative work by Frank and Hirshleifer, who claim (separately) that emotions like envy, anger, guilt, or honesty “could have evolved because they enhance our ability to make credible threats.” The basic idea is that in some situations an actor would benefit from being able to make threats, such as the threat to refuse a small offer in an ultimatum game, but those threats are not credible unless the actor feels anger or another “irrational” emotion. The purpose of some emotions, in this view, is to produce privately experienced costs and benefits that allow players to make threats and promises that would otherwise be non-credible. As Elster points out, it is not the emotions per se that help actors overcome commitment problems; rather, it is the reputation for being emotional that does it (i.e. other actors’ knowledge of one’s privately experienced emotional costs and benefits), and actually experiencing these emotions could be a good way to develop that reputation.
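
As a toy numerical illustration of the credibility point (my own sketch, not from Elster, Frank, or Hirshleifer): in an ultimatum game, a purely self-interested responder accepts any positive offer, so her threat to reject stingy offers is empty; give her an anger payoff from rejecting unfair offers, and a proposer who knows her temperament must offer more.

```python
# Ultimatum game over a pie of 10 (all numbers invented for illustration).
PIE, FAIR = 10, 5

def accepts(offer, anger):
    # Accepting yields the offer; rejecting yields an emotional payoff
    # proportional to the offer's shortfall from an even split.
    return offer >= anger * (FAIR - offer)

def best_offer(anger):
    # The proposer makes the smallest offer the responder will accept.
    return min(s for s in range(PIE + 1) if accepts(s, anger))

for anger in (0.0, 0.5, 2.0):
    print(anger, best_offer(anger))
# anger 0.0 -> offer 0; anger 0.5 -> offer 2; anger 2.0 -> offer 4:
# the angrier the responder is known to be, the more she extracts.
```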

On page 51 Elster makes a nice move in linking ideas about self-interest and morality to Frank and Hirshleifer’s ideas on the evolutionary advantages of the moral emotions. First he clarifies that the emotions Frank and Hirshleifer are inserting into behavior are really standing in for side benefits and side penalties that make a given behavior sustainable in a repeated game with a given payoff structure and discount rate. He then goes on to point out how this is “essentially turning an old argument on its head”:

From Descartes onward it has often been argued that prudence or long-term self-interest can mimic morality. Because morality was thought to be more fragile than prudence, many welcomed the idea that the latter was sufficient for social order. By contrast, if one believes that self-interest is likely to be shortsighted rather than farsighted, the moral emotions might be needed to mimic prudence.

To restate the point somewhat, if we can define a type of behavior that is the “moral course of action” (e.g. to give generously in a dictator game), and we can identify the purely self-interested course of action (e.g. give nothing), then any discrepancy between the two can be bridged by “moral emotions” that the players experience (e.g. a warm glow from giving, or guilt from not giving). This clarification highlights what might be dissatisfying about this work (as reported by Elster), in common with e.g. the classic work on the paradox of voting or even Levi’s invocation of normative values in explaining tax compliance: any apparently paradoxical behavior can be explained by saying that the payoffs have been misjudged. But this is not what Frank and Hirshleifer are doing, presumably: they want to explain the existence of emotions, which are privately experienced costs and benefits provoked by interactions with others, not the paradox of cooperation; their interesting point is that these emotions may serve at least in part to help us develop reputations that make our (self-serving) threats and promises credible.