Bringing Experimentation Into The Fold: Making Randomized Controlled Trials A Part Of The Broader Development Project

I recently wrote up a paper on the need for integration of evidence from experiments and RCTs into the broader developmental literature. After spending an entire quarter reading the purportedly large-scale literature on growth, development and institutions, I am more convinced of this need than ever. The following is the text from the paper sans the brief mathematical appendix in which I explicated the simple math of RCTs. Put this omission down to formatting and integration issues between Word and WordPress: I’m working on correcting that as soon as possible, although I still have one final to go. As a final note, understand that this paper is far from my best effort (I was a bit rushed due to my MA requirements) and I’ll probably spend some more time on it later in the summer. Feedback, as always, is greatly appreciated.

BRINGING EXPERIMENTATION INTO THE FOLD: MAKING RANDOMIZED CONTROLLED TRIALS A PART OF THE BROADER DEVELOPMENT PROJECT

Macro-developmental economists have long relegated the experimentalist school to the sphere of micro-level questions. Randomized controlled trials (RCTs) are presumed to hold little value for the big questions of growth and development, due to the narrowness of scope putatively inherent to experiments. I examine the tension between the experimentalist school and its critics, with specific reference to concerns of external validity. In addressing some of the concerns over RCTs, I also make the case that field experiments could yield results of considerable value for macro policy-makers. While RCTs remain unable to address certain issues in the development sphere, they are able to shed light on informal institutions, incentive structures and institutional change in the developing world.

KEYWORDS: Randomized controlled trials, field experiments, economic development, growth.

1. Introduction

Development economics has traditionally, and particularly over the last thirty years or so, been divided into two broad camps. Rodrik (2008) labels these camps the macro-development and the micro-development economists, a distinction that Banerjee (2008) also makes as one between growth/macro policy and development economists. While there are a number of registers along which the two groups can be differentiated (such as in policy focus and policy prescriptions), the most fundamental difference seems to be a methodological one. Where growth policy economists have largely focused on observational data and post hoc analysis, micro-development ones are distinguished by their use of randomized controlled trials, or field experiments (see Banerjee 2002, Duflo and Kremer 2004, Duflo 2006 and List 2007 for some overviews of the literature in this regard). With a focus on empirical testing of policy ideas, field experiments allow development economists to isolate the effects of specific policy interventions. While the methodology is not a novel one (List 2007), the apparent disjunction between those whom Angus Deaton disparagingly calls the “randomistas” (Deaton 2009) and more traditional macro-development economists raises the question: what lessons do field experiments hold for broader observational studies, and vice versa?

In this essay, I focus on one half of the question posed above: specifically, I examine the lessons that field experiments could hold for macro policymakers. The other half of the puzzle (lessons for micro-development economists from the macro literature) is left unexamined for a number of reasons. To a certain extent, practical constraints dictate this more focused approach. More substantively, as Banerjee and He (2008) detail, the prevailing paradigm amongst macro policy prescriptions is largely divorced from the results obtained via field experiments. Indeed, the external validity issues associated with field experiments have seen them relegated to the role of micro studies that presumably have little value for macro policies that need to generalize beyond context-bound trials (Deaton 2009, Heckman and Urzua 2009, Sims 2010). I make the argument that despite their limitations, randomized controlled trials (RCTs) have generated a number of valuable insights for macro policy makers that go some way towards bridging the divide between micro-development and macro-development economists. These insights are examined in the context of informal institution-building, incentive structures in developing countries and policy implementation.[1] Ultimately, evidence from RCTs needs to be incorporated in the macro policymaker’s toolkit, in order to employ policy interventions that have been proven to work.

The rest of the paper is organized as follows: section 2 contains a conceptual discussion of RCTs and the experimental methodology. Section 3 details the debate between macro and micro-development economists and more generally, between proponents and skeptics of the experimental methodology. I also examine the skepticism regarding the experimental methodology and detail a provisional approach whereby experimental results can be incorporated into macro policy toolkits. Finally, Section 4 details some RCTs, and links them to questions that have been raised in the macro-development literature.

2. Randomization as Methodology

The motivation behind conducting a randomized controlled trial is a simple one: to isolate the treatment effect of a particular policy intervention. The scope of these interventions is wide-ranging, from the provision of a commitment contract to help smokers quit smoking in the Philippines (Gine et al. 2008) to the provision of microcredit to low income families in Hyderabad, India (Banerjee et al. 2009). Each of these interventions seeks to find a causal explanation for the effect of a certain treatment i.e. to answer the question “what is the difference in outcomes for a subject that receives a certain treatment X as opposed to the outcome for the same subject who does not receive said treatment X”?[2] Angrist and Pischke (2010) argue that ‘the key findings from a randomized experiment are typically differences in means between treatment and controls, reported before treatment (to show balance) and after treatment (to estimate causal effects)’ (17). The standard approach involving observational data typically involves a multiple regression, in which various controls are included along with the independent variable of interest: thus, to find the impact of the stimulus package on total employment, we might run a regression with employment as the dependent variable and the stimulus amount as the key independent variable, along with controls for say cyclical employment fluctuations, other exogenous shocks, etc. One problem that becomes readily apparent is: Are we controlling for all the unobservables here? Despite including a whole host of controls, researchers may miss certain controls that have potentially significant confounding effects on the causal relationship between the stimulus package and employment. Given the sheer multitude of unobservables that may be present (along with data limitations), it is entirely plausible that certain controls will be left out. 
This worry compromises the internal validity of a non-experimental study, casting some doubt on the causal relationships identified. Duflo et al. (2006) describe this problem as the selection bias problem, i.e. treatment and control groups may exhibit different outcomes by virtue of factors that are unrelated to the actual treatment being imposed. In the context of the stimulus example, researchers need to ask if the effect on employment would have been observed in the absence of a stimulus package as well, a question that calls for a comparison control group. In order to study the effects of a policy intervention/treatment against the counterfactual, experimentalists advocate conducting targeted policy experiments. Such an experiment would usually involve the identification of a target population, followed by the selection of a representative sample within said population. The researcher then randomizes the treatment within the sample, with the non-treated sample subjects forming the control group. Randomization ensures that unobservables are controlled for ex ante, since a treatment subject i is the same as a control subject j in expectation. The difference between the average outcome of interest in the treatment group and the average outcome of interest in the control group is the parameter of interest for the researcher: an estimate of the average treatment effect. Where not all subjects comply with their assigned treatment, the corresponding quantity is the Local Average Treatment Effect (LATE) following Angrist and Imbens (1994).
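
The selection problem can be stated compactly in potential-outcomes notation. The symbols below are introduced purely for illustration and are not drawn from the cited papers' text: Y_i(1) and Y_i(0) denote subject i's outcome with and without treatment, and D_i equals 1 if subject i is treated. A simple comparison of group means then decomposes as follows:

```latex
\begin{align*}
\underbrace{E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]}_{\text{observed difference in means}}
  &= \underbrace{E[Y_i(1) - Y_i(0) \mid D_i = 1]}_{\text{average effect on the treated}} \\
  &\quad + \underbrace{E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]}_{\text{selection bias}}
\end{align*}
```

When D_i is assigned at random, it is independent of the potential outcomes, so the selection bias term is zero and the first term equals the population average effect E[Y_i(1) - Y_i(0)]. In an observational study there is no such guarantee, which is the worry about omitted controls in different words.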

A simple example can be used to illustrate the experimental approach. Karlan and Zinman (2010) use a randomized treatment approach to estimate the impact of expanded microenterprise credit access in Manila. In partnership with a lender in Manila, the researchers identified a pool of marginally creditworthy applicants; due to the lender’s constraints, only a limited number of these applicants could be provided with a loan.[3] Assignment to the “treatment” group (those micro-entrepreneurs provided with loans) was randomized, and the average outcomes for this group were compared with the average outcomes for the “control” group (those micro-entrepreneurs that were not provided with loans). The results were vastly important: the researchers note that ‘the canonical case for microcredit—that access increases profits, business scale, and household consumption—is not supported on average…in all, [their] results suggest that microcredit may work broadly through risk management and investment at the household level, rather than directly through the targeted businesses’ (Karlan and Zinman, 2010). Banerjee et al. (2009) follow a similar methodology in estimating the efficacy of expanded credit access on marginal loan applicants in Hyderabad, India.
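
A small simulation can make the logic of random assignment concrete. The sketch below uses entirely synthetic data and does not reproduce the data or results of any study cited here. The names `ability`, `true_effect` and `simulate` are hypothetical: `ability` stands in for an unobserved trait that raises outcomes and also makes subjects more likely to take up treatment when they choose for themselves.

```python
import random
import statistics


def simulate(n=20000, true_effect=2.0, seed=0):
    """Return difference-in-means estimates under self-selection and randomization.

    All quantities are synthetic; `ability` is an assumed unobserved confounder.
    """
    rng = random.Random(seed)
    ability = [rng.gauss(0.0, 1.0) for _ in range(n)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]

    def outcome(i, treated):
        # Outcome depends on the unobservable, the treatment, and noise.
        return 10.0 + 3.0 * ability[i] + true_effect * treated + noise[i]

    def diff_in_means(assignment):
        treated = [outcome(i, 1) for i in range(n) if assignment[i]]
        control = [outcome(i, 0) for i in range(n) if not assignment[i]]
        return statistics.fmean(treated) - statistics.fmean(control)

    # Self-selection: higher-ability subjects are more likely to be treated.
    self_selected = [ability[i] + rng.gauss(0.0, 1.0) > 0 for i in range(n)]
    # Randomization: a coin flip that is independent of ability.
    randomized = [rng.random() < 0.5 for _ in range(n)]

    return diff_in_means(self_selected), diff_in_means(randomized)


if __name__ == "__main__":
    naive, experimental = simulate()
    print(f"self-selected estimate: {naive:.2f}")
    print(f"randomized estimate:    {experimental:.2f}")
```

Under these assumptions the self-selected comparison overstates the effect, because treated subjects would have done better even without treatment, while the randomized comparison should land close to `true_effect`.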

While RCTs gain their primary methodological advantage through heightened internal validity, two other facets of the experimental methodology have also gained attention: the cost-efficiency of RCTs in the developing world and the strong relationship between policy questions and experimental results. Duflo (2006) highlights both aspects of field experiments in developing countries:

While the cost of a good randomized policy evaluation in the U.S. easily reaches millions of dollars, both program costs and data collection costs are much lower in developing countries. This has allowed the practice to generalize beyond a few very well-crafted, major projects to a multiplicity of programs, countries, and contexts. In addition, while some of the well-known randomized evaluations are just that—rigorous evaluations of a particular policy intervention—the tradition of posing the question first and then finding the data to answer it has continued with randomized evaluations (2-3).

The direct advantage of cost-effectiveness is apparent: a multitude of experiments are now conducted in developing countries, spanning topics as diverse as microcredit (Karlan and Zinman 2010), improving teacher attendance (Duflo and Hanna 2005) and deworming programs (Miguel and Kremer 2003). Each experiment targets a different policy intervention, thereby broadening the scope of questions for which experiments can generate insights. In addition, the cost-effectiveness of experiments in the developing world poses an additional advantage: replicability becomes a more viable option. While RCTs gain considerable leverage internally, they are open to criticisms of external validity (see section 3). Experimental results can only truly be generalized by replicating successful studies in a wider range of settings; lowered constraints on replication improve the prospects for RCTs to generate external validity. This is one reason for optimism with regard to the contributions that RCTs can make to macro policy makers. I address this topic in greater detail in the following sections.

3. Experimentalism and its Discontents[6]

During the course of giving the British Academy’s Keynes Lecture in 2008, Angus Deaton described the value of field experiments in the following terms:

In ideal circumstances, randomized evaluations of projects are useful for obtaining a convincing estimate of the average effect of a program or project. The price for this success is a focus that is too narrow to tell us “what works” in development, to design policy, or to advance scientific knowledge about development processes. Project evaluation using randomized controlled trials is unlikely to discover the elusive keys to development, nor to be the basis for a cumulative research program that might progressively lead to a better understanding of development (Deaton 2009, emphases added).

Deaton’s remarks accurately summarize the widely held view of RCTs as far too narrow in scope to contribute substantively towards a broader program of economic development. Banerjee (2008) perceives a similar bias in development circles, with the focus tending to be on the factors that contribute towards growth (such as macro policy and encouragement of the right institutional environment) as opposed to micro evidence on targeted policy interventions. In a recent debate in Enterprise Development and Microfinance, James Copestake lists four concerns with RCTs in microfinance:[7]

1. Problem selection bias, understood as a narrowing of the research agenda in order to fit the preferred methodology of the researcher.

2. External validity: how generalizable is the evidence from RCTs?

3. Are RCTs the most cost-effective way by which to evaluate programs? This question can be generalized from microfinance to development programs on the whole.

4. Other technical problems with RCTs, such as spillover effects.

Karlan and Goldberg address each of Copestake’s concerns separately. On the issue of problem selection bias, they point out that experimentalists do not advocate only using RCTs to answer questions in development. Indeed, a number of experimentalists acknowledge some areas in which it is impossible to carry out experiments (Imbens (2009), Banerjee and Duflo (2008)). Angrist and Pischke (2010) further caution that narrowness of scope should not be confused with triviality: seemingly narrow research questions can yield vastly important results. A famous example is provided by Banerjee et al. (2009), the first experiment of its kind to rigorously examine the impact of microcredit on household finances. Such a study holds significant value, given the funds being poured into the microfinance industry today.[8] Furthermore, Kremer and Holla (2008) review 16 RCTs on price elasticity in health and education, revealing that isolated experiments, while narrow in scope, can cumulatively add up to a substantial corpus of evidence. Problem selection bias is less a problem with the methodology of RCTs and more an issue of understanding the scope of RCTs.[9]

Two points are worth making with regard to the cost-effectiveness of randomized controlled trials: the first, made in the previous section, is that experiments are far cheaper to carry out in the developing world (where development economists conduct the majority of their RCTs) than in countries like the U.S. The second point, made by Karlan and Goldberg, concerns the cost-effectiveness of alternatives: a number of observational studies can be expensive, particularly due to the data collection process. Experimentalists are willing to trade away some measure of cost control in order to generate more valid results, a move that complicates the issue of cost-effectiveness.

With regard to spillovers, a distinction needs to be made between natural spillovers (where non-experimental members of the community may be affected by the treatment) and research spillovers (where members of the control group are affected by the treatment). The first form of spillover poses little trouble to the experimentalist when it comes to causal inference, since it does not interfere with the quantities being measured. The other form of spillover is essentially a violation of the stable unit treatment value assumption (SUTVA), and consequently a more serious problem, one that experimentalists take great pains to address. Karlan and Goldberg point out that while research spillovers remain a valid concern, given the biasing effects they might have on the measured treatment effect, researchers are constantly finding innovative ways by which to measure these spillovers and/or dispense with them. Where research spillovers are immeasurable, experiments might even be precluded.
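
The biasing effect of research spillovers can be illustrated with another synthetic sketch. It assumes, purely for illustration, that every control subject receives a fixed fraction of the treatment's benefit; the parameter `spillover_share` and the function name are hypothetical and do not describe any study discussed here.

```python
import random
import statistics


def estimate_with_spillover(spillover_share, n=20000, true_effect=2.0, seed=0):
    """Difference in means when control subjects receive part of the benefit.

    `spillover_share` is an assumed fraction of the true effect that leaks to
    each control subject; 0.0 corresponds to no research spillover.
    """
    rng = random.Random(seed)
    treated_outcomes, control_outcomes = [], []
    for _ in range(n):
        baseline = rng.gauss(10.0, 1.0)
        if rng.random() < 0.5:  # randomized assignment
            treated_outcomes.append(baseline + true_effect)
        else:
            control_outcomes.append(baseline + spillover_share * true_effect)
    return statistics.fmean(treated_outcomes) - statistics.fmean(control_outcomes)


if __name__ == "__main__":
    for share in (0.0, 0.25, 0.5):
        print(f"spillover share {share:.2f}: estimate {estimate_with_spillover(share):.2f}")
```

In this stylized setting randomization is intact, yet the estimate shrinks towards zero as the spillover grows, because the control group no longer represents the untreated counterfactual. Real spillovers need not be uniform or positive, so the direction and size of the bias depend on the setting.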

The external validity concern is perhaps the most serious one in the context of the discussion at hand. In essence, critics such as Heckman and Urzua (2009), Rodrik (2008) and Deaton (2009) have pointed out that in their quest for internal validity, RCTs sacrifice too much external validity to be truly useful in a broader policymaking context. Rodrik (2008) identifies the problem in the context of a malaria study conducted in Kenya, where he identifies a number of interactive variables specific to the Kenyan context. The experiment in question involved the disbursement of mosquito nets for free, as a means by which to reduce the rate of malaria contraction:

Randomized evaluations are strong on internal validity, but produce results that can be contested on external validity grounds—as I illustrated with the malaria experiment. By contrast, the standard econometric and qualitative approaches I described above are weaker on internal validity—but conditional on credible identification, they have fewer problems of external validity. (In the malaria illustration above, they cover all or most of Africa as a whole and they may also have a temporal dimension.) (16)

The external validity critique is of especial concern for advocates of the incorporation of RCTs into the macro policy-maker’s toolkit. Reduction of malaria-related deaths is one of the Millennium Development Goals (MDGs), which in turn are a broader set of macro developmental goals. If one were to use the results of the malaria experiment in Kenya as evidence supporting a large-scale distribution of free mosquito nets, it is critical that one be able to generalize the findings of the study.

Even as he advocates an increased role for RCTs in macro literature, Banerjee (2008) is wary of the generalizability issue. He identifies a subtle problem involving the scaling up of results, which goes hand-in-hand with the external validity critique. In the context of a program that succeeded in increasing fertility levels by improving female literacy, Banerjee notes that ‘when we scale up the program to the national level, two challenges arise: one is that there will be crowding in the private schools and the other is that the returns to education will fall because of increased supply. For both reasons the experimental evidence would over-state the returns to the voucher program’. This is a problem distinct from the one identified by Rodrik: where Rodrik questions the generalizability of a study from Kenya to, say, Sudan, Banerjee cautions against the blind scaling-up of a project that was only tested within a few Kenyan villages. Both should be of concern to advocates of RCTs.

The most common rebuttal to the external validity critique identifies external validity as a general problem common to most social science research, including observational studies. Karlan and Goldberg (Karlan-Goldberg v. Copestake 2009) present an argument along these lines. Banerjee (2008) notes that one claimed advantage of cross-country and macro empirical research is that the average treatment effect is purportedly identified over a larger number of settings. This expanded sample space is supposed to generate more external validity than the context-bound results of RCTs. Banerjee counters this view:

However…a part of the problem comes down to what it means to be generalizable: it means that if you take the same action in a different location you would get the same result. But what action? When we talk about comparing educational investment or road construction or labor laws across large jurisdictions, what makes us believe that we are comparing then [sic] same action…In other words, most large area studies end up having to trust that what the data gatherers chose to put under the same label (miles of road constructed, number of teachers hired, etc.) indeed actually represent reasonable alternative implementations of the same “treatment”. (8)

Thus, the leverage on generalizability generated by large-scale observational studies might oftentimes be misleading, since the various observations in the sample may not have been subjected to the same treatment and level of control.

Similarly, Angrist and Pischke (2010) point out that while economic theory often suggests general principles, extrapolation of causal effects to new settings is always speculative. Indeed, the historical failure of one-size-fits-all policy prescriptions should be evidence enough of the problems with generalizing any perceived causal effect. Alternatives to the Washington Consensus, as detailed in Williamson (1994), identify multiple strategies for reform and growth (see Haggard (1990) and Toye (1994) for instance) that defy the notion of a single causal effect identifiable in all settings. Rodrik (2000) emphasizes the importance of local knowledge and calls for experimentation, even if it means sacrificing blueprints for institution building at times. He is particularly chary of homogenous solutions proposed by International Financial Institution (IFI) strictures that disregard the importance of local knowledge and participatory democracy; experiments conceivably yield the sort of micro-level local knowledge that Rodrik calls on policy-makers to take seriously.

While such a defense against the external validity critique—that it is applicable to all developmental studies to a certain extent—is well-founded, it does not dispense with the particularly acute forms of external validity problems faced by experimentalists. The solution proposed by experimentalists like Banerjee and Duflo (2008) is to replicate experiments as much as possible. Duflo (2006) argues ‘that we need to both continue testing existing theories and to start thinking of how the theories may be adapted to make sense of the field experiment results, many of which are starting to challenge them’. The mantra of replication is proposed as the means by which policy-makers can gradually work their way towards generalizable evidence on certain policy measures.[10] Banerjee and Duflo note that ‘to address concerns about generalization, actual replication studies need to be carried out. Additional experiments have to be conducted in different locations, with different teams. If we have a theory that tells us where the effects are likely to be different, we focus the extra experiments there. If not, we should ideally choose random locations within the relevant domain’ (12). Karlan and Goldberg (Karlan-Goldberg v. Copestake 2009) advocate a similar strategy, with replication of successful studies being the most viable route by which to pursue generalizable results.

Rodrik (2008) cautions that replication is a goal that works better in theory than in practice, owing to disciplinary incentives against the mere replication of studies. Academic journals rarely publish studies that replicate an earlier finding, even if the replication is carried out in a different setting. However, this does not stop other actors—namely NGOs and the state—from stepping in to fill the void. Given a successful study in a different setting, states would be well advised to attempt replications in order to determine what policy interventions work. Indeed, this is precisely the vision that experimentalists have for micro-results in the macro arena (see Banerjee 2007). Rodrik’s argument against such a replicatory mechanism—that other actors would have their own interests and stakes in the outcome, making the results problematic—appears considerably weaker than his argument on institutional incentives against replication within the academy. After all, researchers and policy-makers always have some stake in the outcomes of policy interventions, representing one constraint on any form of economic development.

Despite the greater external validity claimed by Rodrik (2008) for large-scale observational studies, he acknowledges that policymaking itself can oftentimes be an experimental process. The example provided by Rodrik is that of China’s economic reforms, described as “experimental gradualism”: local experiments in de-collectivization of farming were initially trialed in a few locations. A central research group studied the effects of these reforms and as successes materialized, the measures were expanded nationwide. Rodrik argues that this form of experimentation/gradualism is not experimentation of the form conducted in randomized controlled trials, yet represents an alternative means of effecting reform. However, there is no reliable evidence to indicate that the Chinese case was not a one-off success, success here being defined in the narrowest of terms as poverty reduction. Without the replication of such experimental gradualism elsewhere, there is no guarantee that Chinese-style reform measures will work elsewhere. Indeed, the need for gradualism stems in large part from this uncertainty. Thus, we are confronted with the generalizability problem here as well. Furthermore, it is difficult to know what specific aspects of the reform effort truly contributed to the poverty reduction; more formally, it is difficult to disaggregate the treatment effect. If experimental gradualism is called for, there is much to be said for RCTs and the strong claims to internal validity that they make. This is not to say that all experimental results would ultimately be generalizable. However, in so far as RCTs lay out a clear process by way of which a certain policy intervention can be carried out, they present a blueprint for working towards generalizability. 
Banerjee (2008) lays out a vision whereby replicability can gradually lead to external validity: this calls for diligent communication and measurement of local and context-specific factors in an experiment, along with a theory of where an intervention can be replicated. I end this section with a quote from Banerjee and Duflo (2008) on the scope and future of randomized evaluations:

If randomized evaluations can only be carried out in very specific locations or with specific partners, precisely because they are randomized and not every partner agrees to the randomization, replication in many sites does not get rid of this problem [of generalizability]. This is a serious objection…and one that is difficult to refute, since no amount of data could completely reassure us that this is not an issue. Our experience is that, in the context of developing countries, this is becoming less and less of an issue as randomized evaluations gain wider acceptability: evaluation projects have been completed with international NGOs, local governments, and an array of local NGOs. This will only improve if randomized evaluation comes to be recommended by most donors, as it will mean that the willingness to comply with randomization does not set organizations apart any more (16).

Linking the RCT methodology with Rodrik’s prescription of localized gradualism in reforms presents a workable framework within which micro-level studies can be incorporated into the broader literature of development economics. Ultimately, the dichotomy between micro and macro policy prescriptions can be broken down in the interests of a more holistic approach to development. Thus, while Block (1994) debates the role of the state in the economy and broadly prescribes a key role for the state in even a market economy, RCTs shed some light on the ways in which the state can most effectively intervene in the economy. Similarly, Banerjee (2008) argues that while macro-level studies can tell policy-makers to reduce corruption (Shleifer and Vishny 1993), we need experimental data in order to tell us how to reduce corruption.[11] Macro studies oftentimes deal with aggregates, but policy-makers are required to look at detailed distortions and experiments/quasi-experiments can help in figuring out how to deal with these distortions. In the next section, I describe some empirical insights in the RCT literature that can be brought to bear upon broader developmental concerns, with a focus on institution-building.

4. Bridging the Divide: Experimental Results for Macro Policy-Makers

It is not clear to us that the best way to get growth is to do growth policy of any form. Perhaps making growth happen is ultimately beyond our control. Maybe all that happens is that something goes right for once (privatized agriculture raises incomes in rural China) and then that sparks growth somewhere else in economy [sic], and so on. Perhaps, we will never learn where it will start or what will make it continue. The best we can do in that world is to hold the fort till that initial spark arrives: make sure that there is not too much human misery, maintain the social equilibrium, try to make sure that there is enough human capital around to take advantage of the spark when it arrives. Social policy may be the best thing that we can do for growth to happen and micro-evidence on how to do it well, may turn out to be the key to growth success (Banerjee 2008, 17-8).[12]

The quote above represents a view that even Banerjee acknowledges as being a radical one. However, it is clear that a broad-based strategy for growth eludes policymakers; Angus Deaton’s (2009) comment that RCTs do not yield the ‘elusive keys to economic development’ could conceivably be leveled at the development profession as a whole. Regardless of whether or not one agrees that making growth happen may ultimately be beyond our control, it is clear that social policy is an area of great interest for policymakers. As noted earlier, targeted experiments allow policymakers to test the efficacy of specific interventions; in the previous section, I briefly discussed the validity of generalizing from these findings. In this section, I move beyond the largely theoretical discussion employed in the previous sections, and explicate some examples of RCTs that potentially have broader macro implications.

Karlan and Zinman (2010) use a ‘replicable experimental design that randomly assigns credit, through credit scoring, to identify impacts of a credit expansion for marginal microentrepreneurial borrowers in Manila’ (1). The authors find that access to microcredit does not, on average, increase profits, business scale or household consumption, with any increases in profit being driven mostly by the shedding of unproductive workers. The authors conclude that microcredit may work better at the household level rather than directly through targeted businesses. This result can be compared to Banerjee et al. (2009), which finds that ‘while microcredit “succeeds” in affecting household expenditure and creating and expanding businesses, it appears to have no discernible effect on education, health or women’s empowerment’ (21). The two studies are valuable in so far as they identify some effects of expanding microcredit provision. However, the studies could also be extended towards broader policymaking in the realm of financial services. Karlan and Zinman note that ‘business outcomes are not a sufficient statistic for household welfare, nor even necessarily the locus of the biggest impacts of changing access to financial services’ (17). This result complicates discussions over capital markets, since a number of studies assume ex ante that enhanced business access to financing is in some way a reflection of underlying welfare improvements (see Rajan and Zingales (2003) and Pistor and Xu (2005) for instance). The Manila study hints at a potential disjunction between credit access for businesses and household welfare; in addition, the replicable experimental design allows policymakers to replicate the study in other settings. Ultimately, the results from many such studies, conducted in a variety of settings, could contribute towards a broader vision of how policy-makers can subsidize credit in the most efficacious way possible.
In talking about similar programs conducted in the Philippines, List (2007) notes that ‘these types of results [are] fundamental in learning about how to deepen participation in formal financial institutions in country like the Philippines’ (19). This gets to a crucial point: once one has a set of institutions in place, how does one encourage participation in them? As noted below, North (1990) views institutional change as an incremental process that requires changes at the margins: experimental studies provide clues as to how participation in formal institutions can be encouraged.

With regard to institutions and institutional change in general, North (1990) provides two key insights:

1. Informal constraints on behavior—such as social norms and shared cultural understandings—are a critical component of any institutional framework. ‘The long run implication of the cultural processing of information that underlies informal constraints is that it plays an important role in the incremental way by which institutions evolve’ (44).

2. The process of change is overwhelmingly an incremental one: ‘change typically consists of marginal adjustments to the complex of rules, norms, and enforcement that constitute the institutional framework’ (83).

Thus, any process of institutional change must be an incremental one that attempts to shift individual behavior. However, Banerjee (2008) observes that the institutionalist literature is still unclear as to what sort of institutions need to be encouraged, with few reliable policy prescriptions having emerged. Duflo (2006) describes a number of experiments that have been set up to study the manner in which the incentives faced by individuals affect their behavior. While many of these experiments have been conducted in schools, the answers provided are still important in helping us determine ‘whether efforts to reform institutions to provide stronger incentives can have a chance to improve performance’ (4).

One such experiment, described in Duflo (2006), is Duflo and Hanna (2005). Partnering with the NGO Seva Mandir in the Udaipur district of the Indian state of Rajasthan, the researchers sought to lower the absentee rates of teachers in schools. Efforts to regularly monitor teacher attendance had hitherto proven ineffective due to the terrain in Udaipur. Duflo’s (2006) description of the experiment provides a snapshot of how a typical field experiment might work:

Seva Mandir selected 120 schools to participate in the experiment. In 60 randomly selected schools (the “treatment” group), they gave the teacher a camera with a tamper-proof date and time function and instructed him to take a picture of himself and his students every day at opening time and at closing time. Teachers received a bonus as a function of the number of “valid” days they actually came to school. A “valid” day was defined as a day where the opening and closing pictures were separated by at least 5 hours and a minimum number of children were present in both pictures. The bonus was set up in such a way that a teacher’s salary could range from 500 rupees to 1,300 rupees, and each additional valid day carried a bonus of 50 rupees (6 US dollars, valued at PPP). In the remaining 60 schools (the “comparison” group), teachers were paid 1,000 rupees and they were told (as usual) that they could be dismissed for poor performance. One unannounced visit every month was used to measure teacher absence as well as teachers’ activities when in school (5).
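The design quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' actual procedure: the function names are my own, the minimum number of children (`min_children`) and the attendance threshold after which the bonus starts (`threshold_days`) are hypothetical parameters that the quoted passage does not specify, and the random seed is arbitrary.

```python
import random

MIN_HOURS_BETWEEN_PHOTOS = 5   # "separated by at least 5 hours" (from the quote)
BASE_PAY_RUPEES = 500          # lowest salary in the quoted range
MAX_PAY_RUPEES = 1300          # highest salary in the quoted range
BONUS_PER_DAY_RUPEES = 50      # bonus for each additional valid day


def assign_schools(school_ids, n_treatment, seed=0):
    """Randomly split schools into a treatment and a comparison group."""
    rng = random.Random(seed)  # arbitrary fixed seed so the draw is reproducible
    shuffled = list(school_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_treatment], shuffled[n_treatment:]


def is_valid_day(hours_between_photos, children_open, children_close, min_children):
    """A day is valid if the photos are far enough apart and enough children appear in both.

    min_children is hypothetical: the quote only says "a minimum number".
    """
    return (
        hours_between_photos >= MIN_HOURS_BETWEEN_PHOTOS
        and children_open >= min_children
        and children_close >= min_children
    )


def monthly_salary(valid_days, threshold_days):
    """Base pay plus 50 rupees per valid day beyond threshold_days, capped at 1,300.

    threshold_days is an assumption; the quoted passage does not state it.
    """
    extra_days = max(0, valid_days - threshold_days)
    return min(MAX_PAY_RUPEES, BASE_PAY_RUPEES + BONUS_PER_DAY_RUPEES * extra_days)


# 120 schools, 60 of them randomly selected for treatment, as in the quote.
treatment, comparison = assign_schools(range(1, 121), n_treatment=60)

# The quoted salary range implies this many bonus-earning days between floor and cap.
bonus_days_to_reach_cap = (MAX_PAY_RUPEES - BASE_PAY_RUPEES) // BONUS_PER_DAY_RUPEES
```

The one figure that follows directly from the quote is the last one: at 50 rupees per day, moving from 500 to 1,300 rupees takes 16 bonus-earning days. Everything else depends on the assumed parameters.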

The researchers found that over a period of 18 months, the absence rate was 22 percent in the treatment schools, compared with an average of 42 percent in the comparison schools (measured at 43 percent before the experiment), a significant drop in delinquencies. As an incentive structure to improve individual behavior, the experiment appears to have been a success on most counts, indicating one way in which policy-makers can work towards incrementally improving individual behavior. Observational studies have to a large extent focused on the impacts of institutional change and enforcement on business and aggregate indicators (Boycko, Shleifer and Vishny 1995, Partnoy 2009); in keeping with the view that marginal changes to norms and enforcement may also prove effective, experiments like the Seva Mandir one in Udaipur serve to guide policymakers in structuring incentives.[13]
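Because the cameras were assigned at random, the simple difference in mean absence rates between the two groups is the natural estimate of the program's average effect. The sketch below works through that arithmetic using only the two rates reported above. The function name is my own, and no standard error is computed, since the school-level data are not reproduced here.

```python
def absence_effect(comparison_rate, treatment_rate):
    """Return the difference in means (percentage points) and the relative reduction."""
    difference_pp = comparison_rate - treatment_rate
    relative_reduction = difference_pp / comparison_rate
    return difference_pp, relative_reduction


# Absence rates reported in the text, in percent.
diff_pp, rel = absence_effect(comparison_rate=42, treatment_rate=22)
print(f"Estimated effect: {diff_pp} percentage points ({rel:.0%} relative reduction)")
```

On these figures the incentive scheme lowered absence by 20 percentage points, or a little under half of the comparison-group rate.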

The final empirical example involves a study conducted by Beaman, Chattopadhyay, Duflo, Pande and Topalova (2008). The researchers estimated the impact of mandated political representation of women in village councils on citizens’ attitudes towards women leaders in general. The study found that while such a quota system did not alter the preference of villagers for male leaders, it weakened gender stereotypes towards women’s roles in public and lessened the negative bias that was attached to assessments of the effectiveness of female leaders. Most importantly, the study found that such changes are meaningful to the extent that they have lasting impacts in the long run: ‘after 10 years of the quota policy, women are more likely to stand for and win free seats in villages that have been continuously required to have a female chief councilor’ (1). The experimental intervention in question thus succeeded in triggering a change in widely shared cultural norms at the village level, a result that should be of interest to institutionalists and proponents of participatory democracy as a mechanism for growth (Rodrik 2000). In addition, the intervention sheds light on the efficacy of a certain form of affirmative action (see Levitt and List 2008 for evidence from other experiments on discrimination and corrective quota measures).

North (1990) stresses the role of path-dependence in institutional change and cautions that in the long run, countries may be doomed to continue on a set path of institutional evolution (or non-evolution). Against that background, it is appropriate to end this section with Duflo’s (2006) rationale for a greater focus on experimental results, one that is premised on a very North-ian vision of a world composed of formal and informal institutions:

Development economists have always stressed the importance of institutions, and recently, the study of institutions has re-emerged as one of the central questions in development…a central reason for underdevelopment is the lack of institutions that favor cooperation and social behavior. The central practical question then becomes: what to do about poor institutions? Should we just write off those countries which are plagued with (often historically inherited) poor institutions, or should we instead work on ways to get things done in these environments (with a view, perhaps, to arrive at institution change eventually). Most countries with very poor institutions function to some extent…understanding how to harness people’s intrinsic motivation and social preferences may help to improve the day-to-day functioning of countries where institutions are in disarray…development economics is in large part the study of indigenous, informal institutions that emerge to palliate the absence of well-functioning formal institutions (27).

5. Conclusion

Angrist and Pischke (2010) present an optimistic picture of the contributions that experimental and quasi-experimental studies can make to macro policy-makers. While they provide some promising evidence, there is little to indicate that macro policy-makers will move wholly away from large-scale observational studies, despite the problems associated with establishing causal inference in such studies. Leamer (2010), Keane (2010) and Sims (2010) all adopt the view that Angrist and Pischke remain overly optimistic in their vision for the experimentalist’s role in drawing macro implications.

However, what I have attempted to show in this paper is that RCTs and micro-level field experiments still have considerable value for economists interested in the institutions that lead to growth and development. While each experiment may yield results that are narrow in scope, the cumulative results often yield important conclusions (Banerjee and Duflo, 2008). Rodrik (2008) sees the development profession as being on the cusp of a great re-unification between the long-divided macro and micro branches. In addition, there is the prospect of ‘a progression from presumptive approaches with ready-made universal recipes to diagnostic, contextual approaches based on experimentation and policy innovation. If carried to fruition, this transformation would represent an important advance in how development policy is carried out’ (32). Randomization represents no panacea in the field of economic development; however, contrary to some beliefs, it has considerable value for the field at large.

References

ANGRIST, J., AND G. IMBENS., (1994), “Identification and Estimation of Local Average Treatment Effects,” Econometrica, Vol. 62(2): 467-475.

ANGRIST, J., AND PISCHKE, J-S., (2010), “The Credibility Revolution in Empirical Economics: How Better Research Design is Taking the Con out of Econometrics,” Journal of Economic Perspectives, Vol. 24(2): 3-30.

BANERJEE, A., (2002), “The Uses of Economic Theory: Against a Purely Positive Interpretation of Theoretical Results,” MIMEO, MIT.

BANERJEE, A., (2008), “Big Answers for big questions: the presumption of growth policy,” Unpublished Transcript of Speech Prepared for the Brookings Conference on “What Works in Development? Thinking Big and Thinking Small”.

BANERJEE, A., AND E. DUFLO., (2008), “The Experimental Approach to Development Economics,” J-PAL, MIT.

BANERJEE, A., AND R. HE, (2008), “Making Aid Work”, in Reinventing Foreign Aid, Cambridge, MA. MIT Press.

BANERJEE, A., E. DUFLO, R. GLENNERSTER AND C. KINNAN., (2009), “The Miracle of Microfinance? Evidence from a Randomized Evaluation,” Innovations for Poverty Action, MIT.

BEAMAN, L., R. CHATTOPADHYAY, E. DUFLO, R. PANDE AND P. TOPALOVA., (2008), “Powerful Women: Does Exposure Reduce Bias?” BREAD Working Paper #181, NBER Working Paper #14198.

BLOCK, F., (1994), “The Roles of the State in the Economy,” in The Handbook of Economic Sociology, ed. Neil J. Smelser and Richard Swedberg (Princeton, Princeton University Press, 1994): 691-710.

BOYCKO, M., A. SHLEIFER AND R. VISHNY., (1995), Privatizing Russia (Cambridge, MA: MIT Press, 1995).

COPESTAKE, J., N. GOLDBERG., AND D. KARLAN., (2009), “Crossfire: Randomized control trials are the best way to measure impact of microfinance programmes and improve microfinance product designs,” in Enterprise Development & Microfinance, Vol. 20(3): 167-176. 

DEATON, A., (2009), “Instruments of Development: Randomization in the Tropics, and the Search for the Elusive Keys to Economic Development,” NBER Working Paper #14690.

DUFLO, E., AND M. KREMER., (2004), “Use of Randomization in the Evaluation of Development Effectiveness,” in Evaluating Development Effectiveness (World Bank Series on Evaluation and Development, Volume 7), edited by Osvaldo Feinstein, Gregory K. Ingram, and George K. Pitman. New Brunswick, NJ: Transaction Publishers, pp. 205-232.

DUFLO, E., AND R. HANNA., (2005), “Monitoring Works: Getting Teachers to Come to School,” NBER Working Paper #11880.

DUFLO, E. (2006), “Field Experiments in Development Economics,” BREAD, CEPR, NBER.

DUFLO, E., R. GLENNERSTER., AND M. KREMER., (2008), “Using Randomization in Development Economics Research: A Toolkit,” Handbook of Development Economics, (T. P. Schultz and J. Strauss eds.), 3895-3962.

GINE, X., D. KARLAN AND J. ZINMAN., (2008), “Put Your Money Where Your Butt Is: A Commitment Savings Account for Smoking Cessation,” MIMEO, Yale University.

HAGGARD, S., (1990), “Explaining Development Strategies,” in Pathways from the Periphery: The Politics of Growth in the Newly Industrializing Countries (Ithaca: Cornell University Press, 1990): 23-48.

HECKMAN, J., AND S. URZUA., (2009), “Comparing IV With Structural Models: What Simple Can and Cannot Identify,” NBER Working Paper #14706.

IMBENS, G., (2009), “Better LATE Than Nothing: Some Comments on Deaton (2009) and Heckman and Urzua (2009),” Unpublished Manuscript, Kennedy School, Harvard University.

KARLAN, D., AND J. ZINMAN., (2010), “Expanding Microenterprise Credit Access: Using Randomized Supply Decisions to Estimate the Impacts in Manila,” Unpublished Manuscript, Department of Political Science, Yale University.

KEANE, M., (2010), “A Structural Perspective on the Experimentalist School,” Journal of Economic Perspectives, Vol. 24(2): 47-58.

KREMER, M., AND E. MIGUEL., (2003), “Worms: Identifying Impacts on Education and Health in the Presence of Treatment Externalities,” forthcoming, Econometrica.

KREMER, M., AND A. HOLLA., (2008), “Pricing and Access: Lessons from Randomized Evaluations in Education and Health,” forthcoming, Journal of Economic Literature.

LEAMER, E., (2010), “Tantalus on the Road to Asymptopia,” Journal of Economic Perspectives, Vol. 24(2): 31-46.

LEVITT, S., AND J. LIST., (2008), “Field Experiments in Economics: The Past, the Present and the Future,” NBER Working Paper #14356.

LIST, J., (2007), “Field Experiments: A Bridge Between Lab and Naturally Occurring Data,” NBER Working Paper #12992.

NORTH, D., (1990), Institutions, Institutional Change and Economic Performance (Cambridge: Cambridge University Press, 1990).

PARTNOY, F., (2009), “Historical Perspectives on the Financial Crisis: Ivar Kreuger, the Credit-Rating Agencies, and Two Theories about the Function, and Dysfunction, of Markets,” Yale Journal on Regulation, Vol. 26: 431-43.

PISTOR, K., AND C. XU., (2005), “Governing Stock Markets in Transition Economies: Lessons from China,” American Law and Economics Review, Vol. 7(1): 184-210.

RAJAN, R., AND L. ZINGALES., (2003), “The Great Reversals: The Politics of Financial Development in the Twentieth Century,” Journal of Financial Economics, Vol. 69: 5-50.

RODRIK, D., (2000), “Institutions for high-quality growth: What they are and how to acquire them,” Studies in Comparative International Development, Vol. 35(3): 3-31.

RODRIK, D., (2008), “The New Development Economics: We Shall Experiment, But How Shall We Learn?,” Unpublished Manuscript, Kennedy School, Harvard University.

SHLEIFER, A., AND R. VISHNY., (1993), “Corruption,” The Quarterly Journal of Economics, Vol. 108(3): 599-617.

SIMS, C., (2010), “But Economics is not an Experimental Science,” Journal of Economic Perspectives, Vol. 24(2): 59-68.

WILLIAMSON, J., (1994), “In Search of a Manual for Technopols,” in The Political Economy of Policy Reform, ed. (Institute for International Economics, Washington D.C., 1994): 9-48.


[1] My arguments are closely related to Banerjee and He (2008) and Rodrik (2008), although I differ from both on certain points: I am more skeptical than Banerjee and He of cost-efficiency in RCTs, once we entertain replication as the means by which to generate external validity, while I disagree with Rodrik on the incentives to replicate and the non-replicatory nature of macro policies.

[2] Although field experiments cannot be conducted in order to answer all questions (one could not, for instance, conduct a field experiment in order to gauge the impact of a war), the form of the causal question being asked is similar across the social sciences.

[3] Critics have pointed to RCTs as being potentially unethical, largely by virtue of picking winners and losers through experiments. Karlan-Goldberg v. Copestake (2009) addresses these ethical questions effectively; however, it is apparent here that limited resources would have dictated limited treatment in any case. In so far as the experiment randomizes treatment, it keeps the total number of loans the same, while possibly making loan provisions more ethical by divorcing them from constraints like cronyism.

[4] This is adapted from Duflo, Glennerster and Kremer (2006).

[5] For more on the mathematical underpinnings of RCTs, see Duflo, Glennerster and Kremer (2006), Imbens (2009) and Angrist and Pischke (2010).

[6] This overview does not claim to be exhaustive—there are a number of other criticisms leveled at RCTs that I do not cover here. However, I try to touch upon the essential parts of the debate.

[7] The debate is henceforth referred to as Karlan-Goldberg v. Copestake, 2009.

[8] Total assets for microfinance firms in Mexico are currently estimated at roughly $60 billion (see Neil MacFarquhar, “Banks Making Big Profits From Tiny Loans,” New York Times, April 13, 2010: http://www.nytimes.com/2010/04/14/world/14microfinance.html?pagewanted=1&hp).

[9] That having been said, Karlan and Goldberg also argue that RCTs need to be given more attention by the microfinance industry.

[10] Replication does not address the scaling-up problem identified by Banerjee (2008); in such cases, it simply might not be possible to scale-up an experimental intervention, thereby diminishing the broader importance of the study results. However, issues of scale do not apply to all experiments.

[11] Experiments conducted on a seemingly unrelated topic like improving teacher attendance in India can often yield results on how to reduce corruption; see section 4.

[12] It may seem problematic that Banerjee conflates growth with economic development and poverty reduction. However, he points to empirical evidence that economic growth has characteristically been associated with poverty reduction and overall economic development; to that effect, Banerjee shuttles freely between growth and development. The validity of such a move is a debate that lies beyond the scope of this paper.

[13] The remarkable cost-effectiveness of the Seva Mandir program is heartening, allowing for substantial replicability in other areas where teacher absenteeism poses a major problem.
