Three Essays on Political Economy and the Methods 
 
 
A DISSERTATION 
SUBMITTED TO THE FACULTY OF THE 
UNIVERSITY OF MINNESOTA 
BY 
 
 
Yu Wang 
 
 
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS 
FOR THE DEGREE OF 
DOCTOR OF PHILOSOPHY 
 
 
Advisor: Marc F. Bellemare 
 
 
April 2020 
 
  
 
 
© Yu Wang 2020 
 
ALL RIGHTS RESERVED 
 
 
Acknowledgements 
 
I would like to thank my Ph.D. advisor, Prof. Marc F. Bellemare, whose advising and
instruction have influenced and benefited me tremendously. He offered numerous
invaluable suggestions and a great deal of help on my job market paper, which was also my
second-year paper. In co-authoring the second essay of my dissertation with me, he
offered insightful and critical ideas on how commonly used empirical strategies address
identification problems such as endogeneity. He also provided invaluable academic
support that gave me a good start in econometric theory. I am also deeply grateful for the
summer research assistantship he offered, for his recommendations for the job market and
for my graduate assistantship applications, and for his job market advice, which allowed
me to prepare as a job market candidate very early.
    I’m deeply grateful to my Ph.D. dissertation committee members: Prof. Jay Coggins,
Prof. John Freeman and Prof. Paul Glewwe. My heartfelt appreciation goes to them for
their invaluable academic guidance on the three essays in my dissertation, their
recommendations for the job market and for my graduate assistantship applications, and
their advice on the job market. I also thank Prof. Steve Miller, Prof. Elton Mykerezi,
Prof. Terry Roe and Prof. Sean Sylvia for invaluable comments and suggestions on my
essays.
    I’m also deeply grateful to my master’s advisor, Prof. Renfu Luo from Peking University.
My sincere appreciation goes to him for his advising and guidance on my publication, my
job market paper, and other papers in progress or planned. I sincerely thank him for his
insightful ideas on the Chinese economy. His ideas have given me strong confidence in
China’s economic development, and I am devoted to telling a good story about China in
my research.
    I would like to thank my supervisors at MPC, Dr. Tracy Kugler, Dr. David Van Riper
and Dr. Jonathan Schroeder, for their instruction on data management. I also thank the
other professors who supervised my graduate research and teaching assistantships:
Prof. Ragui Assaad, Prof. Chengyan Yue and Prof. Jeff Apland.
    I would like to thank our Directors of Graduate Studies (DGSs), Prof. Joe Ritter and
Prof. Rodney Smith, and our graduate coordinators, Jenna Mead and Gary Cooper, for
their academic support and suggestions. I thank Elaine Reber for uploading
recommendation letters for me and Linda Eells for helping me find literature.
    I would like to thank all the professors in our department who taught my courses, as
well as professors in other departments, especially Colleen Meyers for Practicum in
University Teaching for Nonnative English Speakers (Grad 5105), Prof. Tim Kehoe for
International Trade (Econ 8401) and Prof. Mikhail Safonov for Math Analysis (Math
5615H & 5616H).
    I would like to thank Sebastian Anti, Haseed Ali, Yuan Chai, Xiangwen Kong, 
Qingxiao Li, Bixuan Sun, Huichun Sun, Berenger Djoumessi Tiague, Khoa Vu, Jingjing 
Wang, Yanghao Wang, Zhiyu Wang, Shuoli Zhao, and all other friends in our department. 
I also thank my friends from other universities who have helped me: Yongdong Liu
(Assistant Professor, UCL) and Liangjie Wu (Ph.D. Candidate, U Chicago).
    I would like to thank Prof. Chris Blattman, Prof. Dave Donaldson, Prof. Christopher 
Neilson, Prof. Nathan Nunn, Prof. David Zilberman, and all other professors who have 
provided me with invaluable academic suggestions for my research.  
    My deepest appreciation and love go to my wife Fan, my daughter Grace, my son 
Vincent, and my parents. 
   
 
Dedication 
 
To my wife, Fan, and my two kids, Grace and Vincent.
 
Abstract 
 
This dissertation consists of three essays on political economy and a theoretical
discussion of two empirical methods. Chapter 2 (Essay 1) discusses how the introduction
of local direct elections, by providing local information, facilitates the meritocratic
selection of local leaders. Using a Bayesian framework, this essay finds that because
local residents, the voters, communicate with local leader candidates more often than
upper-level officials do, local residents infer each local leader candidate’s virtue and
capacity more accurately and precisely. This essay then shows that, due to the higher
accuracy, the expected competence (a weighted average of virtue and capacity) of the
elected local leader is higher than that of the appointed leader; due to the higher
precision, the variance of the competence of the elected local leader is lower than that of
the appointed leader.
    Chapter 3 (Essay 2) discusses the lagged IV method, namely using the lagged
endogenous explanatory variable as its own instrumental variable (IV). This essay starts
with a conceptual framework and then conducts a numerical analysis. It shows that
when the lagged IV violates only the independence assumption, the lagged IV estimate
is consistent and has lower bias than the OLS estimate; however, when the lagged IV
violates both the independence assumption and the exclusion restriction, the lagged IV
estimate is inconsistent and has much higher bias than the OLS estimate. The
simulation results support the numerical analysis.
    Chapter 4 (Essay 3) discusses the spatially lagged IV method, namely using the
spatially lagged endogenous explanatory variable, constructed with a spatial weighting
matrix, as its instrumental variable (IV). This essay introduces the spatially local average
treatment effect (SLATE) theorem, which rests on two key conditions: the spatial
independence assumption and the spatial exclusion restriction. This essay
demonstrates that when the spatially lagged IV satisfies the spatial independence
assumption and the spatial exclusion restriction, its estimate is unbiased and consistent.
Even if the treatment is implemented in multiple waves, the spatially lagged IV remains
valid.
 
 
Contents 
Acknowledgements
Dedication
Abstract
Contents
List of Tables
List of Figures
1. Introduction
2. Local Direct Elections, Local Information, and Meritocratic Selection
2.1. Introduction
2.2. Local Governance in Rural China
2.3. Meritocratic Selection with the Improved Inference Effectiveness
2.4. Meritocratic Selection with the Improved Candidate Pool
2.5. Concluding Remarks
3. Lagged Variables as Instruments
3.1. Introduction
3.2. Theoretical Framework
3.3. Numerical Analysis
3.4. Simulation Analysis
3.5. Conclusion
4. Spatially Lagged Variables as Instruments: The Spatially Local Average Treatment Effect (SLATE) in Estimation
4.1. Introduction
4.2. Theoretical Framework
4.3. The Numerical Spatially Local Average Treatment Effects (SLATE)
4.4. The Dynamic Spatially Local Average Treatment Effects (SLATE)
4.5. Conclusion
References
Appendices for Local Direct Elections, Local Information, and Meritocratic Selection
Appendices of Lagged Variables as Instruments
Appendices of Spatially Lagged Variables as Instruments: Spatially Local Average Treatment Effect (SLATE) in Estimation
 
List of Tables 
Table 3.1. Reviewed Journals Published in 2013-2018, Using Lagged IV Methods
Table 3.2. Simulation Parameters
 
List of Figures 
Figure 2.1. Inference Accuracy and Precision with Natural Communication Times
Figure 3.1. Representation of Monte Carlo Simulation Setup
Figure 3.2. Monte Carlo Results: $\kappa = 0.5$ and 2, $\phi$ ranges from 0 to 1, $\rho = 0.5$
Figure 3.3. Monte Carlo Results: $\kappa = 0.5$ and 2, $\rho$ ranges from 0 to 1, $\phi = 0.5$
Figure 3.4. Representation of Monte Carlo Simulation Setup: $X_{t-1}$ Also Has Causal Effects on $Y_t$
Figure 3.5. Monte Carlo Results: $\kappa = 0.5$ and 2, $\phi$ ranges from 0 to 1; Lagged Causality on Explained Variable
Figure 3.6. Monte Carlo Results: $\kappa = 0.5$ and 2, $\rho$ ranges from 0 to 1; Lagged Causality on Explained Variable
Figure 3.7. Representation of Monte Carlo Simulation Setup: $X_{t-1}$ Also Has Causal Effects on $U_t$
Figure 3.8. Monte Carlo Results: $\kappa = 0.5$ and 2, $\phi$ ranges from 0 to 1; Lagged Causality on Unobserved Confounder
Figure 3.9. Monte Carlo Results: $\kappa = 0.5$ and 2, $\rho$ ranges from 0 to 1; Lagged Causality on Unobserved Confounder
Figure 3.A1. Monte Carlo Results: $\kappa = 0.5$ and 2, $\phi$ ranges from 0 to 1, $\rho = 0.5$, Sample Size = 50,000
Figure 3.A2. Monte Carlo Results: $\kappa = 0.5$ and 2, $\rho$ ranges from 0 to 1, $\phi = 0.5$, Sample Size = 50,000
Figure 3.A3. Monte Carlo Results: $\kappa = 0.5$ and 2, $\phi$ ranges from 0 to 1, Sample Size = 50,000; Lagged Causality on Explained Variable
Figure 3.A4. Monte Carlo Results: $\kappa = 0.5$ and 2, $\rho$ ranges from 0 to 1, Sample Size = 50,000; Lagged Causality on Explained Variable
Figure 3.A5. Monte Carlo Results: $\kappa = 0.5$ and 2, $\phi$ ranges from 0 to 1, Sample Size = 50,000; Lagged Causality on Unobserved Confounder
Figure 3.A6. Monte Carlo Results: $\kappa = 0.5$ and 2, $\rho$ ranges from 0 to 1, Sample Size = 50,000; Lagged Causality on Unobserved Confounder
 
1. Introduction 
 
Discussing political issues is usually indispensable in economic research because, as the
scale of production and social complexity rise, the role played by the government and
other authorities can be neither ignored nor underestimated. One of the core issues in
political economy is political selection, yet most previous studies emphasize
accountability, including political incentives and monitoring. Although ex post
accountability is of great importance, ex ante political selection, which aims at selecting
the most competent politicians, also deserves sufficient academic attention. Therefore, to
gain greater insight into achieving better governance, I study political selection. As for
methodology, the instrumental variable (IV) method is popular in applied social science.
However, because valid IVs are difficult to find, many researchers turn to alternatives
such as the lagged IV or the spatially lagged IV. Few theoretical studies, however, have
discussed the validity of these IVs in detail. Chapters 3 and 4 therefore discuss these two
methods theoretically and in detail.
    Chapter 2: This paper studies the relationship between local direct elections and
meritocratic selection through the mechanism of local information. The paper’s
theoretical model, based on the Bayesian inference framework, shows that local direct
elections in rural China facilitate the meritocratic selection of both village committee
members and village party secretaries. Local direct elections transfer the authority to
select village committee members from township officials to village residents. Because
village residents, compared with township officials, have an advantage in local
information about village committee candidates, they infer those candidates’ virtue and
capacity more accurately and precisely. With this improved inference effectiveness, the
introduction of local direct elections raises the expected competence of village committee
members while reducing the variance of that competence. Further, some or all village
committee members are also candidates for village party secretary. With these improved
candidate pools, the expected competence of village party secretaries also rises, although
the effect on its variance is ambiguous.
    Chapter 3: Lagged explanatory variables remain commonly used as instrumental
variables (IVs) to address endogeneity concerns in empirical studies with observational
data. Few theoretical studies, however, address whether such “lagged IVs” mitigate
endogeneity. We develop a structural model in which dynamics among the endogenous
explanatory variable and the unobserved confounders cannot be ruled out, and we
examine the endogeneity of lagged IV estimates. We then use Monte Carlo simulations to
illustrate our analytical findings. Within the Local Average Treatment Effect framework,
we show that when the lagged IVs violate only the independence assumption, the lagged
IV method mitigates the endogeneity problem, yielding consistent estimates whose biases
are smaller than those of the OLS estimates. However, when the lagged IVs violate both
the independence assumption and the exclusion restriction, the lagged IV method does
not mitigate but instead aggravates the endogeneity problem, yielding inconsistent
estimates whose biases are much greater than those of the OLS estimates.
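The OLS versus lagged-IV comparison can be illustrated with a small Monte Carlo sketch. The data-generating process below is a hypothetical one of our own (an AR(1) unobserved confounder u, an AR(1) regressor x driven partly by u, and an optional direct effect of $x_{t-1}$ on y); its parameter names (`x_persistence`, `u_persistence`, `confounding`, `lag_effect`) are illustrative assumptions and do not correspond to the $\kappa$, $\phi$ and $\rho$ used in Chapter 3. The function returns the bias of OLS and of the just-identified lagged IV estimator.

```python
import numpy as np


def lagged_iv_vs_ols(beta=1.0, lag_effect=0.0, x_persistence=0.5,
                     u_persistence=0.5, confounding=1.0,
                     n_units=200_000, burn_in=50, seed=0):
    """Return (OLS bias, lagged-IV bias) for the coefficient on x_t.

    Hypothetical data-generating process, simulated independently per unit:
        u_t = u_persistence * u_{t-1} + eta_t            (unobserved confounder)
        x_t = x_persistence * x_{t-1} + confounding * u_t + nu_t
        y_t = beta * x_t + lag_effect * x_{t-1} + u_t + eps_t
    with eta, nu, eps i.i.d. standard normal.
    """
    rng = np.random.default_rng(seed)
    u = np.zeros(n_units)
    x = np.zeros(n_units)
    x_prev = x
    # Iterate long enough that (x_{t-1}, x_t, u_t) are close to stationary.
    for _ in range(burn_in):
        x_prev = x
        u = u_persistence * u + rng.standard_normal(n_units)
        x = x_persistence * x_prev + confounding * u + rng.standard_normal(n_units)
    y = beta * x + lag_effect * x_prev + u + rng.standard_normal(n_units)

    # OLS slope of y_t on x_t (np.cov demeans, so an intercept is implied).
    ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    # Just-identified IV slope with x_{t-1} instrumenting x_t.
    iv = np.cov(x_prev, y)[0, 1] / np.cov(x_prev, x)[0, 1]
    return ols - beta, iv - beta
```

Under the default parameters the population biases are 48/116 ≈ 0.41 (OLS) and 24/82 ≈ 0.29 (lagged IV) when `lag_effect=0`, and 130/116 ≈ 1.12 (OLS) and 140/82 ≈ 1.71 (lagged IV) when `lag_effect=1`. In this particular setup the lagged IV therefore shrinks the OLS bias when $x_{t-1}$ is excluded from the outcome equation but inflates it once $x_{t-1}$ enters y directly. Because the confounder is persistent here, the lagged IV is not unbiased even in the first case; setting `u_persistence=0` removes its bias.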
    Chapter 4: Spatially lagged variables, constructed with spatial weighting matrices, are
commonly used as instrumental variables to address endogeneity in estimation, yet a
theoretical discussion of whether spatially lagged variables are valid instruments is
lacking. Building on the Local Average Treatment Effect Theorem, this paper introduces
the Spatially Local Average Treatment Effect (SLATE) Theorem to examine such validity.
This theorem establishes two conditions: (1) the spatial independence assumption, that
there is no inter-regional correlation between the endogenous explanatory variables and
the disturbances in the spatial autocorrelation of either the unobserved confounders or
the endogenous explanatory variables; and (2) the spatial exclusion restriction, that the
spatially lagged IV has neither a direct nor an indirect causal impact on the explained
variable. The paper’s theory shows that typical spatial weighting matrices serving as
spatially lagged IVs satisfy the spatial independence assumption, yielding unbiased and
consistent estimates; however, if those matrices violate the spatial exclusion restriction,
the estimates are biased and inconsistent. The paper also discusses the dynamic SLATE
and shows that the spatially lagged IV method remains valid even if the treatment
involves multiple waves of implementation.
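A comparable sketch for the spatially lagged IV places hypothetical units on a ring, uses a row-standardized first-order contiguity matrix W as the spatial weighting matrix, and instruments x with its spatial lag Wx. In this illustrative setup (ours, not the model of Chapter 4), x depends on a spatially correlated exogenous shock and on a confounder u that is independent across units, so the spatial independence assumption holds by construction; the hypothetical parameter `spillover` adds a direct effect of Wx on y, which violates the spatial exclusion restriction.

```python
import numpy as np


def ring_weights(n):
    """Row-standardized first-order contiguity matrix for n >= 3 units on a ring
    (a hypothetical geography): each unit's two neighbours get weight 1/2."""
    w = np.zeros((n, n))
    idx = np.arange(n)
    w[idx, (idx - 1) % n] = 0.5
    w[idx, (idx + 1) % n] = 0.5
    return w


def spatial_lag(x):
    """Compute ring_weights(len(x)) @ x without building the n-by-n matrix."""
    return 0.5 * (np.roll(x, 1) + np.roll(x, -1))


def spatial_iv_vs_ols(beta=1.0, spillover=0.0, confounding=1.0,
                      n=200_000, seed=0):
    """Return (OLS bias, spatially-lagged-IV bias) for the coefficient on x."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(n)
    # Spatially correlated exogenous shock: v_i = e_{i-1} + e_i + e_{i+1}.
    v = np.roll(e, 1) + e + np.roll(e, -1)
    # Confounder independent across units, so (W x)_i is uncorrelated with u_i.
    u = rng.standard_normal(n)
    x = v + confounding * u + rng.standard_normal(n)
    wx = spatial_lag(x)
    # spillover != 0 lets W x affect y directly (exclusion restriction fails).
    y = beta * x + spillover * wx + u + rng.standard_normal(n)

    ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    iv = np.cov(wx, y)[0, 1] / np.cov(wx, x)[0, 1]
    return ols - beta, iv - beta
```

With the defaults, the population biases are 0.2 (OLS) and 0 (spatially lagged IV); with `spillover=1` they become 0.6 and 1.5. The helper `spatial_lag` applies W through `np.roll` so the n-by-n matrix never has to be stored, while `ring_weights` builds the same W explicitly for small n.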
 
 
 
2. Local Direct Elections, Local Information, and 
Meritocratic Selection1 
 
YU WANG2 AND RENFU LUO3 
 
“When the Grand course was pursued, a public and common spirit ruled 
all under the sky; they chose men of talents, virtue, and ability; their 
words were sincere, and what they cultivated was harmony.” – 
Confucius (450 BC, translated by James Legge [1885]), Li Chi: Book of 
Rites. 
“The aim of every political constitution is, or ought to be, first to obtain
for rulers men who possess most wisdom to discern, and most virtue to
pursue, the common good of the society.” – Hamilton, Madison and Jay
(1788 [2008]), The Federalist Papers. 
 
2.1. Introduction 
    Meritocratic selection has been pursued around the world since ancient times. Since 
around 500 BC, Chinese politicians and philosophers have argued that those who 
govern should be selected by merit rather than inherited status (Sienkewicz, 2003). 
When the concept of meritocracy spread to Europe and the U.S., it was favored by 
philosophers (Kazin et al., 2009) and advocated in political statements (Hamilton, 
Madison and Jay, 2008). 
                                                          
1 This research has been supported by funding from the National Natural Science Foundation of China (Grant 
No.71873008). The authors declare that they have no relevant or material financial interests that relate to the 
research described in this paper. We would like to thank Marc Bellemare, Loren Brandt, Jay Coggins, Paul Glewwe, 
John Freeman, Elton Mykerezi, Terry Roe, and Sean Sylvia for their valuable comments and suggestions. All errors 
are ours. 
2 Wang: Department of Applied Economics, University of Minnesota (email: wang5979@umn.edu). 
3 Luo: China Center for Agricultural Policies, School of Advanced Agricultural Sciences, Peking University
(email: luorf.ccap@pku.edu.cn).
    China has developed a series of top-down political selection schemes that emphasize
the assessment, recommendation, and promotion of politicians based on their virtue
and capacity, aiming at ex-ante meritocratic selection4. However, these top-down
political selection schemes may suffer from adverse selection, so their effects on the
meritocratic selection of politicians could be limited.
    In contrast to China’s top-down schemes, elections, which are bottom-up political
selection schemes, were established in ancient Western regimes and gradually
prevailed5. However, most previous studies emphasize elections’ role in addressing moral
hazard, that is, in facilitating the ex-post accountability of politicians (Laffont, 2001;
Besley, 2005).
    This paper studies how local direct elections address adverse selection and facilitate
the meritocratic selection of politicians by providing local information. Chinese local
governance is a typical political selection context, characterized by a stratified
governance structure with several small-scale organizations at the grassroots level. The
introduction of local direct elections to Chinese local governance enables an 
institutional comparison to identify the relationship between, and the mechanism of, 
local direct elections and meritocratic selection. Specifically, after local direct 
elections were introduced, village leaders (the small organizations’ executive leaders) 
and other village committee members6 were no longer appointed by township officials, 
but directly elected by village residents. We build a theoretical model showing that 
local direct elections facilitate the meritocratic selection of all village committee 
members by providing more local information on the virtue and capacity of village 
committee candidates. Village party secretaries (the small organizations’ highest 
leaders), who supervise the village committees, are still appointed by township 
officials. Even so, our theoretical model shows that local direct elections of village 
committee members facilitate the meritocratic selection of village party secretaries 
                                                          
4 Around 134 BC, an assessment and recommendation system for noble families was established (Qian, 2012). 
Following the expansion of enfranchisement, the civil examination system of scholars was developed around 605 
AD. This system prevailed for more than 1,000 years and greatly influenced the political selection schemes of 
China and other countries (Elman, 2013; Bai and Jia, 2016; Bell, 2016). 
5 Around 508 BC, Athenian democracy was established. Through the expansion of enfranchisement, this electoral 
system has evolved into the modern representative democracy (Loeper, 2017). 
6 In the following, village committee members include village leaders, who are chairs of village committees, and 
other village committee members. 
because some or all village committee members are candidates for village party secretary
and are thus likely to be appointed to that position by township officials.
    The essential mechanism through which local direct elections work in the
meritocratic selection of politicians is local voters’ advantage in local information about
political candidates. Our model captures this mechanism with a Bayesian inference
framework. This approach differs from studies that focus on strategic interactions
between politicians and voters7 and from studies that use game-theoretic frameworks.
The introduction of local direct elections allows village residents rather than township
officials to select village committee members. Village residents naturally communicate
with village committee candidates more often than township officials do, implying that
village residents have an advantage in obtaining local information about these candidates
(Ghatak, 1999; Bell, 2016). Therefore, village residents can infer the virtue and capacity of
village committee candidates more accurately and precisely than township officials can.8
In other words, local direct elections improve the inference effectiveness.
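The distinction between "more accurately" and "more precisely" drawn in footnote 8 can be made concrete with a standard normal-normal updating sketch. The distributional assumptions and symbols here are ours, for illustration only, and need not match the model developed in Section 2.3: a prior $\theta \sim \mathcal{N}(\mu_0, \sigma_0^2)$ on a candidate's virtue and $n$ conditionally independent signals $s_k = \theta + \varepsilon_k$, $\varepsilon_k \sim \mathcal{N}(0, \sigma^2)$, one per communication.

```latex
\begin{aligned}
\theta \mid s_1,\dots,s_n &\sim \mathcal{N}\!\left(\mu_n,\ \sigma_n^2\right), \\
\sigma_n^2 &= \left(\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}\right)^{-1}, \\
\mu_n &= \sigma_n^2\left(\frac{\mu_0}{\sigma_0^2} + \frac{1}{\sigma^2}\sum_{k=1}^{n} s_k\right), \\
\mathbb{E}\left[\mu_n \mid \theta\right] - \theta &= \frac{\sigma_n^2}{\sigma_0^2}\left(\mu_0 - \theta\right).
\end{aligned}
```

Under these assumptions, both the posterior variance $\sigma_n^2$ (precision) and the expected gap between the posterior mean and the true value (accuracy) shrink as $n$ grows, which is the sense in which more frequent communication improves inference effectiveness.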
    Because of village residents’ advantage in local information, local direct elections
that empower them address adverse selection by facilitating the meritocratic selection of
village committee members. With more accurate inferences, our theoretical model proves
that in a representative village, the expected competence (a weighted average of virtue
and capacity) of each elected village committee member is greater than that of each
appointed village committee member. With more precise inferences, our model also
proves that the variance of the competence of each elected village committee member is
smaller than that of each appointed village committee member. These theoretical findings
on improved inference effectiveness are in line with Hayek (1945) and Chan (2013), who
argued that assessments and decisions should be left to people with local information
advantages.
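The two comparative statics above can be illustrated with a small Monte Carlo sketch. It is not the model of Section 2.3: we assume, hypothetically, standard normal virtue and capacity, competence as an equally weighted average of the two, one noisy reading of each per communication, and selection of the candidate with the highest posterior mean competence. Fewer communications stand in for appointment by township officials and more communications for election by village residents; the values 1 and 20 used below are arbitrary.

```python
import numpy as np


def simulate_selection(n_signals, n_candidates=5, weight=0.5, noise_sd=1.0,
                       reps=20_000, seed=0):
    """Mean and variance of the true competence of the selected candidate.

    Hypothetical setup: virtue and capacity ~ N(0, 1) independently;
    competence = weight * virtue + (1 - weight) * capacity; each of the
    n_signals communications yields one N(0, noise_sd**2)-noisy reading of
    virtue and one of capacity; the selector picks the candidate with the
    highest posterior mean competence.
    """
    rng = np.random.default_rng(seed)
    shape = (reps, n_candidates)
    virtue = rng.standard_normal(shape)
    capacity = rng.standard_normal(shape)
    competence = weight * virtue + (1 - weight) * capacity

    # The average of n_signals readings has noise variance noise_sd**2 / n_signals.
    se = noise_sd / np.sqrt(n_signals)
    mean_v = virtue + se * rng.standard_normal(shape)
    mean_c = capacity + se * rng.standard_normal(shape)

    # Posterior means under the N(0, 1) priors shrink each average toward 0.
    shrink = 1.0 / (1.0 + se**2)
    post_competence = shrink * (weight * mean_v + (1 - weight) * mean_c)

    chosen = np.argmax(post_competence, axis=1)
    selected = competence[np.arange(reps), chosen]
    return selected.mean(), selected.var()


appointed_mean, appointed_var = simulate_selection(n_signals=1)   # few communications
elected_mean, elected_var = simulate_selection(n_signals=20)      # many communications
```

Because both calls share a seed, they draw the same true virtue and capacity, so the comparison isolates the effect of information. With these parameters the population values are roughly a mean of 0.58 and a variance of 0.36 with one communication, versus roughly 0.80 and 0.24 with twenty.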
    Aggregating the local information on political candidates, local direct elections 
further facilitate the meritocratic selection of superior politicians, who are promoted 
by appointment based on their performance, in a stratified governance structure. More 
specifically, our model shows that by improving the candidate pools of village party 
                                                          
7 See Laffont (2001) and Besley (2006) for theoretical demonstrations, Ferraz and Finan (2011) and De Janvry et 
al. (2012) for empirical evidence, and Bell (2016) for demonstrations in political science. 
8 In the Bayesian inference, “more accurately” means that the posterior mean of virtue (or capacity) is closer to the 
real value of virtue (or capacity), whereas “more precisely” means that the posterior variance of virtue (or capacity) 
becomes smaller. These are discussed in detail in Section 2.3. 
secretaries, local direct elections facilitate the meritocratic selection of village party
secretaries, who are the highest village officials and chair the village party branches. Some
or all village committee members, including village leaders, are also village party 
branch members and are therefore candidates to become village party secretaries 
(O’Brien and Li, 2000). Upon observing their performance in village affairs, village 
party branch members are likely to be appointed village party secretaries by township 
officials. In a representative village, local direct elections enhance the expected
competence of each village committee member; in other words, the candidate pool for
the village party secretary is improved. Our model shows that the expected competence
of the village party secretary also increases, whereas the change in the variance of the
village party secretary’s competence is ambiguous. These theoretical findings on improved
candidate pools imply that the local information provided by local direct elections
benefits bottom-up local political selection directly, whereas it benefits top-down
political selection at higher levels of the governance hierarchy only indirectly and to a
limited extent.
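The candidate-pool argument can be sketched in the same spirit, again under hypothetical assumptions of our own: candidates' true competence is standard normal, the committee consists of the candidates with the highest noisy competence estimates (with the illustrative parameter `n_signals` controlling how informative those estimates are), and township officials then appoint as party secretary the committee member whose observed performance, equal to competence plus noise, is highest. The sketch reports both the mean and the variance of the secretary's competence, but only the mean is expected to move in a definite direction, consistent with the ambiguity about the variance stated in the text.

```python
import numpy as np


def simulate_secretary(n_signals, n_candidates=10, committee_size=3,
                       perf_noise_sd=1.0, reps=20_000, seed=0):
    """Hypothetical two-stage illustration.

    Stage 1: the n_candidates committee candidates are ranked by a noisy
    estimate of competence (averaging n_signals unit-noise communications),
    and the top committee_size are selected.
    Stage 2: township officials observe each committee member's performance
    with noise and appoint the best-looking one as party secretary.
    Returns the mean and variance of the secretary's true competence.
    """
    rng = np.random.default_rng(seed)
    competence = rng.standard_normal((reps, n_candidates))

    # Stage 1: select the committee from noisy competence estimates.
    estimate = competence + rng.standard_normal((reps, n_candidates)) / np.sqrt(n_signals)
    committee_idx = np.argsort(-estimate, axis=1)[:, :committee_size]
    committee = np.take_along_axis(competence, committee_idx, axis=1)

    # Stage 2: appoint the committee member with the best observed performance.
    performance = committee + perf_noise_sd * rng.standard_normal((reps, committee_size))
    chosen = np.argmax(performance, axis=1)
    secretary = committee[np.arange(reps), chosen]
    return secretary.mean(), secretary.var()
```

For example, `simulate_secretary(n_signals=1)` and `simulate_secretary(n_signals=20)` compare a committee chosen with little local information against one chosen with much more, while holding the township officials' second-stage information fixed.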
    The remainder of this paper is organized as follows. Section 2.2 introduces the 
institutional background. Section 2.3 develops the theory of meritocratic selection with 
improved inference effectiveness. Section 2.4 develops the theory of meritocratic 
selection with improved candidate pools. Section 2.5 concludes the paper.
 
2.2. Local Governance in Rural China 
    The administrative organizations of Chinese villages consist of two committees. 
Village committees, which are de facto government entities at the village level, are 
chaired by village leaders and composed of other members. Village party branches, 
which represent the village-level leadership of the Chinese Communist Party (CCP), 
are chaired by village party secretaries and composed of other members. In practice, 
some or all village committee members, especially village leaders, are also members 
of village party branches. Likewise, some or all village party branch members are also 
members of village committees.  
    Village party branches oversee village committees; thus, village party secretaries 
are superior to village leaders in the governance hierarchy, as stipulated by the Organic 
Law of Village Committees (OLVC) (National People’s Congress of China, 1998) and 
the Working Regulation for Rural Grassroots Organizations of the Chinese 
Communist Party (Central Committee of the Chinese Communist Party, 1999). Village 
committees are responsible for providing village infrastructure and public services, 
developing the local economy, and improving village residents’ income (National 
People’s Congress of China, 1998; Martinez-Bravo et al., 2011). The role of village 
party branches in the development of the local economy is to approve village 
committees’ plans and to monitor their implementation (Oi and Rozelle, 2000). 
    The likelihood of being selected as either a village committee member or a village 
party branch member is positively associated with both the candidate’s virtue and her 
capacity (Bell, 2016; Tang, 2016); this is rooted in the concept and practice of 
meritocratic selection in Chinese history (Zhang, 2012). The selection of village 
committee members, including village leaders, requires candidates to be law-abiding, 
have moral integrity, be intrinsically motivated to serve village residents, and have a 
diploma and administrative capacity (National People’s Congress of China, 1998). The 
selection of village party branch members, including village party secretaries, requires 
candidates to have professional knowledge and skills, to be responsive to the needs 
and demands of the village residents, and to be intrinsically motivated to serve them 
(Central Committee of the Chinese Communist Party, 1999).  
    The selection scheme for village leaders and other village committee members has 
changed: previously, they were appointed by township officials, but they are now 
directly elected by village residents. The establishment of local direct elections in rural 
China has been a gradual process (Martinez-Bravo et al., 2014).9 By 2010, most
villages had introduced local direct elections for village leaders and other village 
committee members (Padró i Miquel et al., 2015; Wong et al., 2017).  
    In contrast, village party secretaries and other village party branch members are still
appointed by township officials (Central Committee of the Chinese Communist Party,
1999). The affairs of village party branches are comprehensively supervised by the
township officials of the township party branches.¹⁰ By observing the performance of
each village party branch candidate (village committee members and other village
residents, both party members and non-party members), township officials decide
whether to appoint them as village party branch members.¹¹ If a village party branch
candidate is not yet a party member, township officials can decide to make her a party
member first and then appoint her as a village party branch member.

9 In 1987, the National People’s Congress of China passed the OLVC, which stipulated that village leaders and
other village committee members were to be elected. In 1998, the National People’s Congress of China passed a
revised OLVC that introduced local direct elections for village leaders and other village committee members in
rural China, resulting in the election of village leaders and other village committee members through open
nomination and competitive elections (O’Brien and Han, 2009). After the national legislation in 1998, each
province in China introduced its own Provincial Measures for Implementing the Organic Law of Village
Committees to provide additional instructions on the implementation of local direct elections (O’Brien and Zhao,
2014). Counties and townships followed (Wong et al., 2017).
10 Township officials’ supervision duties include, but are not limited to, deciding whether to set up a village party
branch, how to select party members in each village, how to appoint village party branch members, and how to
appoint one village party branch member as village party secretary. Although elections are conducted among all
village party members to select village party branch members, the election procedures, including the nomination
process and the election standards, are directly led by township officials (Central Committee of the Chinese
Communist Party, 1999).
11 Although village party branch members are stipulated to be elected by village party members, the whole
election procedure is directly led by township officials (Central Committee of the Chinese Communist Party, 1999).
Therefore, village party branch members are regarded as appointed by township officials (O’Brien and Li, 2000).

2.3. Meritocratic Selection with the Improved Inference Effectiveness
    Our theoretical model, based on the Bayesian inference framework, investigates how
the introduction of local direct elections to a representative village facilitates the
meritocratic selection of village committee members. The mechanism is that, because local
direct elections draw on more local information about each village committee candidate,
the virtue and capacity of each candidate are inferred more effectively.

A. Inferences on Village Committee Candidates
    In this section, we use the Bayesian inference framework to model inferences about
the virtue and capacity of village committee candidates in a representative
village. Because the village leader is the chair of the village committee, we only
discuss the village leader and consider her the representative of all village committee
members. Thus, our theoretical findings in this section also apply to other village
committee members.
    We find that because the representative village resident has an advantage in terms
of local information about village leader candidates (that is, she naturally
communicates with the village leader candidates more often than the representative
township official does), her inferences of the virtue and capacity of these
candidates are more accurate and precise. By more accurate we mean that the posterior
mean of virtue or capacity gets closer to its real value, and by more precise we
mean that the posterior variance of virtue or capacity gets smaller. In short, the
inference effectiveness is improved.
    1. Setup: We consider a representative village in which all adult village residents
are potential village leader candidates. Each resident has two personal characteristics,
virtue and capacity,¹² both of which are assumed to be independent and identically
distributed on [0, 1] with a mean of 0.5.¹³
    After the introduction of local direct elections in this representative village, a pool
of village leader candidates, a subset of all potential candidates, competes to be elected
village leader by the village residents. Before this introduction, the pool of village
leader candidates competed to be appointed village leader by the township officials.
The virtue of village leader candidate $i$ is denoted by $\alpha_i \in [0,1]$, and her
capacity by $\theta_i \in [0,1]$, where $i = 1, 2, \ldots$. In the following
analysis, we discuss one representative village resident or township official instead of
all village residents or township officials, assuming that village residents or township
officials have homogeneous inferences of each candidate for the village committee,
including the village leader.¹⁴

12 Virtue refers to characteristics including, but not limited to, being law-abiding, having moral integrity, and being
intrinsically motivated to serve the village residents (Central Committee of the Chinese Communist Party, 1999;
Dal Bó et al., 2017; National People’s Congress of China, 1998). Capacity refers to characteristics including, but
not limited to, having professional knowledge and administrative skills (Alesina and Tabellini, 2007; Central
Committee of the Chinese Communist Party, 1999; Dal Bó et al., 2017; National People’s Congress of China, 1998).
13 The virtue and capacity of village residents are both assumed to be bounded because (1) the number of residents
in a village, a local area, is usually small (Liu et al., 2009; Martinez-Bravo et al., 2014) and (2) their socio-economic
characteristics, thinking patterns, and behaviors tend to be homogeneous due to homogeneous socio-economic,
cultural, and institutional constraints and, in the long term, generation-by-generation interactions in local areas
(Bell, 2016). For simplicity, we assume that their virtue and capacity are both bounded on [0, 1].
14 The perceptions of village residents and of township officials are each assumed to be homogeneous (1) because
of the small number of village residents and township officials and (2) because village residents and township
officials face homogeneous socio-economic, cultural, and institutional constraints.

    2. Bayesian Inferences: The representative village resident or township official
cannot directly observe each village leader candidate’s virtue or capacity. Thus, natural
communication is necessary. Natural communication is defined as daily
communication at work, in everyday life, or in other circumstances in which
communicators behave naturally and artlessly (Bell, 2016). Importantly, it does not
lead to illegal outcomes in the management of village affairs, such as conspiracy or
rent-seeking (Baker and Faulkner, 1993). The representative village resident or
township official obtains $\Omega^{\alpha}_{iX}$, a series of observations of the virtue of village leader
candidate $i$, through natural communication at the $X$-th occurrence,¹⁵ where $X = 1, \ldots, N$ and
$N = N_{Ele}$ or $N_{App}$. $N_{Ele}$ and $N_{App}$ represent the total numbers of occurrences¹⁶
of natural communication between, respectively, the representative village resident and the representative
township official and each village leader candidate before a candidate is selected as village
leader. $\Omega^{\alpha}_{iX}$ is given by
(2.1)    $$\Omega^{\alpha}_{iX} = \alpha_i + \upsilon_{iX},$$
where $\upsilon_{iX}$ is a series of random shocks when observing virtue. Similarly, the
representative village resident or township official, by naturally communicating with
each village leader candidate, obtains $\Omega^{\theta}_{iX}$, a series of observations of the capacity of
village leader candidate $i$ at times $X = 1, \ldots, N_{Ele}$ or $X = 1, \ldots, N_{App}$. $\Omega^{\theta}_{iX}$ is given by
(2.2)    $$\Omega^{\theta}_{iX} = \theta_i + \omega_{iX},$$
where $\omega_{iX}$ is a series of random shocks when observing capacity.
    As natural communication happens often and in various situations, communicators
have little opportunity (and are thus unwilling) to behave strategically to hide their
true personal characteristics. Therefore, it is reasonable to assume, first, that the number of
times natural communication occurs is sufficiently large, that is, $N_{Ele\,(App)} \to +\infty$, and
second, that the series of observations of virtue or capacity is normally distributed.

ASSUMPTION 1 (Natural communication and observations on virtue and capacity):
The representative village resident or township official communicates with the village
leader candidates naturally, which means that $\upsilon_{iX} \sim \mathcal{N}(0, \sigma^2_{\upsilon\alpha})$ and
$\omega_{iX} \sim \mathcal{N}(0, \sigma^2_{\omega\theta})$, and often, which means that $N_{Ele}$ and $N_{App}$ are sufficiently large.

    Based on Assumption 1, we have $\Omega^{\alpha}_{iX} \sim \mathcal{N}(\alpha_i, \sigma^2_{\upsilon\alpha})$ and $\Omega^{\theta}_{iX} \sim \mathcal{N}(\theta_i, \sigma^2_{\omega\theta})$.
    Prior to natural communication, the representative village resident and the township
official have their own prior perceptions of the virtue and capacity of each village
leader candidate, as discussed below.

15 Natural communication at “the $X$-th occurrence” is abbreviated as “time $X$” in the following context.
16 “Numbers of occurrences” is abbreviated as “times” in the following context.
 
 
ASSUMPTION 2 (Prior distribution of the virtue and capacity of village leader
candidates): The representative village resident’s or township official’s prior
perceptions of the virtue of village leader candidate $i$ are distributed as $\mathcal{N}(\alpha_i^E, \sigma^2_{\alpha e})$,
truncated at $[0, 1]$, where $\alpha_i^E \in [0, 1]$. Their prior perceptions of the capacity of village
leader candidate $i$ are distributed as $\mathcal{N}(\theta_i^E, \sigma^2_{\theta e})$, truncated at $[0, 1]$, where
$\theta_i^E \in [0, 1]$. The prior means $\alpha_i^E$ and $\theta_i^E$ and the prior variances $\sigma^2_{\alpha e}$ and
$\sigma^2_{\theta e}$ are known to the representative village resident or township official.
 
    As virtue and capacity are assumed to be bounded at [0, 1] and the representative 
village resident or township official engages in long-term natural communication with 
each village leader candidate, their prior perceptions of the virtue or capacity of each 
village leader candidate are truncated at [0,1]. 
    Based on their prior perceptions and observations in natural communication, the
representative village resident or township official obtains inferred perceptions of the
virtue of the village leader candidates. According to Bayes’ rule, these posterior
perceptions are obtained by iteration, such that the inferred perceptions of virtue at
time $X$ depend on the inferred perceptions of virtue at time $X - 1$ and the observations
of virtue at time $X$.
    We now derive the density kernel of the posterior distribution
of the virtue of village leader candidate $i$. The representative village resident and the
township official have identical prior perceptions of the virtue of village leader
candidate $i$, whose density kernel is $\gamma(\alpha_i)$. After naturally communicating with village
leader candidate $i$ for the first time, the representative village resident or township
official updates the density kernel of the posterior distribution of her virtue as
(2.3)    $$p(\alpha_i \mid \Omega^{\alpha}_{i1}) = \gamma(\alpha_i) \cdot L(\alpha_i; \Omega^{\alpha}_{i1}),$$
where $L(\alpha_i; \Omega^{\alpha}_{i1})$ is the likelihood of the first observation.
This posterior distribution at time $X = 1$ is also the prior distribution at time $X = 2$.
After natural communication at time $X = 2$, the updated density kernel of the posterior
distribution is given by
(2.4)    $$p(\alpha_i \mid \Omega^{\alpha}_{i1}, \Omega^{\alpha}_{i2}) = [\gamma(\alpha_i) \cdot L(\alpha_i; \Omega^{\alpha}_{i1})] \cdot L(\alpha_i; \Omega^{\alpha}_{i2}) = \gamma(\alpha_i)[L(\alpha_i; \Omega^{\alpha}_{i1}) \cdot L(\alpha_i; \Omega^{\alpha}_{i2})].$$
    Repeating this iteration, after $N$ times of natural communication between the
representative village resident or township official and village leader candidate $i$, the
density kernel of the posterior distribution of the virtue of village leader candidate $i$
becomes (see the Appendix)
(2.5)    $$p(\alpha_i \mid \Omega^{\alpha}_{i1}, \ldots, \Omega^{\alpha}_{iN}) = \gamma(\alpha_i) \cdot [L(\alpha_i; \Omega^{\alpha}_{i1}) \cdots L(\alpha_i; \Omega^{\alpha}_{iN})] \propto \exp\left\{-\frac{1}{2}\left[\frac{1}{\Sigma(\alpha_i)}\left(\alpha_i - S(\alpha_i)\right)^2\right]\right\},$$
where the posterior mean of the virtue of village leader candidate $i$ is
(2.6)    $$S(\alpha_i) = \frac{\sigma^2_{\upsilon\alpha}}{\sigma^2_{\upsilon\alpha} + N\sigma^2_{\alpha e}}\alpha_i^E + \frac{N\sigma^2_{\alpha e}}{\sigma^2_{\upsilon\alpha} + N\sigma^2_{\alpha e}}\alpha_i,$$
and the posterior variance of the virtue of village leader candidate $i$ is
(2.7)    $$\Sigma(\alpha_i) = \frac{\sigma^2_{\alpha e}\sigma^2_{\upsilon\alpha}}{\sigma^2_{\upsilon\alpha} + N\sigma^2_{\alpha e}}.$$
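The iteration in (2.3)-(2.5) can be checked numerically against the closed forms (2.6)-(2.7). The Python sketch below is illustrative only: it applies the standard normal-normal conjugate update one observation at a time, and all function names and parameter values are hypothetical. As in (2.6)-(2.7), it ignores the truncation of the prior at [0, 1]; feeding the real value $\alpha_i$ as every observation reproduces the closed forms exactly, while with noisy observations generated from (2.1) the sample mean takes the place of $\alpha_i$.

```python
import random


def sequential_update(prior_mean, prior_var, noise_var, observations):
    """Normal-normal conjugate update applied one observation at a time:
    the posterior after time X serves as the prior for time X + 1."""
    mean, var = prior_mean, prior_var
    for y in observations:
        precision = 1.0 / var + 1.0 / noise_var          # precisions add
        mean = (mean / var + y / noise_var) / precision  # precision-weighted mean
        var = 1.0 / precision
    return mean, var


def closed_form(true_value, prior_mean, prior_var, noise_var, n_obs):
    """Posterior mean and variance as written in (2.6)-(2.7)."""
    denom = noise_var + n_obs * prior_var
    mean = (noise_var * prior_mean + n_obs * prior_var * true_value) / denom
    var = prior_var * noise_var / denom
    return mean, var


# Hypothetical values: real virtue 0.7, prior mean 0.3.
alpha_true, alpha_prior_mean = 0.7, 0.3
prior_var, noise_var, n_obs = 0.01, 1.0, 200

# Noise-free observations reproduce (2.6)-(2.7).
seq_mean, seq_var = sequential_update(alpha_prior_mean, prior_var, noise_var,
                                      [alpha_true] * n_obs)
cf_mean, cf_var = closed_form(alpha_true, alpha_prior_mean, prior_var, noise_var, n_obs)

# Noisy observations drawn from (2.1) with a fixed seed.
rng = random.Random(0)
noisy = [alpha_true + rng.gauss(0.0, noise_var ** 0.5) for _ in range(10_000)]
noisy_mean, noisy_var = sequential_update(alpha_prior_mean, prior_var, noise_var, noisy)

print(f"closed form: mean={cf_mean:.4f}, var={cf_var:.6f}")
print(f"sequential:  mean={seq_mean:.4f}, var={seq_var:.6f}")
print(f"noisy data:  mean={noisy_mean:.4f}, var={noisy_var:.6f}")
```

The first two lines agree to floating-point precision; with many noisy observations the posterior mean lands near the real value and the posterior variance is far below the prior variance.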
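    The posterior mean and the posterior variance are dynamic across the times of
natural communication. That is, at times $X = 1, \ldots, N$, the posterior means of virtue are
$$\left\{\frac{\sigma^2_{\upsilon\alpha}}{\sigma^2_{\upsilon\alpha} + \sigma^2_{\alpha e}}\alpha_i^E + \frac{\sigma^2_{\alpha e}}{\sigma^2_{\upsilon\alpha} + \sigma^2_{\alpha e}}\alpha_i, \ \ldots, \ \frac{\sigma^2_{\upsilon\alpha}}{\sigma^2_{\upsilon\alpha} + N\sigma^2_{\alpha e}}\alpha_i^E + \frac{N\sigma^2_{\alpha e}}{\sigma^2_{\upsilon\alpha} + N\sigma^2_{\alpha e}}\alpha_i\right\},$$
and the posterior variances of virtue are
$$\left\{\frac{\sigma^2_{\alpha e}\sigma^2_{\upsilon\alpha}}{\sigma^2_{\upsilon\alpha} + \sigma^2_{\alpha e}}, \ \ldots, \ \frac{\sigma^2_{\alpha e}\sigma^2_{\upsilon\alpha}}{\sigma^2_{\upsilon\alpha} + N\sigma^2_{\alpha e}}\right\}.$$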
    Given the different total occurrences of natural communication under election
and appointment, the representative village resident (by election) or the representative
township official (by appointment) obtains the posterior mean of the virtue of village
leader candidate $i$ as
(2.8)    $$S_{Ele\,(App)}(\alpha_i) = \frac{\sigma^2_{\upsilon\alpha}}{\sigma^2_{\upsilon\alpha} + N_{Ele\,(App)}\sigma^2_{\alpha e}}\alpha_i^E + \frac{N_{Ele\,(App)}\sigma^2_{\alpha e}}{\sigma^2_{\upsilon\alpha} + N_{Ele\,(App)}\sigma^2_{\alpha e}}\alpha_i,$$
and obtains the posterior variance of the virtue of village leader candidate $i$ as
(2.9)    $$\Sigma_{Ele\,(App)}(\alpha_i) = \frac{\sigma^2_{\alpha e}\sigma^2_{\upsilon\alpha}}{\sigma^2_{\upsilon\alpha} + N_{Ele\,(App)}\sigma^2_{\alpha e}}.$$
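    Following a similar Bayesian inference, the density kernel of the posterior
distribution of the capacity of village leader candidate $i$ is given by (see the Appendix)
(2.10)    $$p(\theta_i \mid \Omega^{\theta}_{i1}, \ldots, \Omega^{\theta}_{iN}) = \gamma(\theta_i) \cdot [L(\theta_i; \Omega^{\theta}_{i1}) \cdots L(\theta_i; \Omega^{\theta}_{iN})] \propto \exp\left\{-\frac{1}{2}\left[\frac{1}{\Sigma(\theta_i)}\left(\theta_i - S(\theta_i)\right)^2\right]\right\},$$
where the posterior mean of the capacity of village leader candidate $i$ is
(2.11)    $$S(\theta_i) = \frac{\sigma^2_{\omega\theta}}{\sigma^2_{\omega\theta} + N\sigma^2_{\theta e}}\theta_i^E + \frac{N\sigma^2_{\theta e}}{\sigma^2_{\omega\theta} + N\sigma^2_{\theta e}}\theta_i,$$
and the posterior variance of the capacity of village leader candidate $i$ is
(2.12)    $$\Sigma(\theta_i) = \frac{\sigma^2_{\theta e}\sigma^2_{\omega\theta}}{\sigma^2_{\omega\theta} + N\sigma^2_{\theta e}}.$$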
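    The posterior mean and the posterior variance are dynamic across occurrences of
natural communication. That is, at times $X = 1, \ldots, N$, the posterior means of capacity
are
$$\left\{\frac{\sigma^2_{\omega\theta}}{\sigma^2_{\omega\theta} + \sigma^2_{\theta e}}\theta_i^E + \frac{\sigma^2_{\theta e}}{\sigma^2_{\omega\theta} + \sigma^2_{\theta e}}\theta_i, \ \ldots, \ \frac{\sigma^2_{\omega\theta}}{\sigma^2_{\omega\theta} + N\sigma^2_{\theta e}}\theta_i^E + \frac{N\sigma^2_{\theta e}}{\sigma^2_{\omega\theta} + N\sigma^2_{\theta e}}\theta_i\right\},$$
and the posterior variances of capacity are
$$\left\{\frac{\sigma^2_{\theta e}\sigma^2_{\omega\theta}}{\sigma^2_{\omega\theta} + \sigma^2_{\theta e}}, \ \ldots, \ \frac{\sigma^2_{\theta e}\sigma^2_{\omega\theta}}{\sigma^2_{\omega\theta} + N\sigma^2_{\theta e}}\right\}.$$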
    Given the different total occurrences of natural communication under
election and appointment, the representative village resident (by election) or the
representative township official (by appointment) obtains the posterior mean of the
capacity of village leader candidate $i$ as
(2.13)    $$S_{Ele\,(App)}(\theta_i) = \frac{\sigma^2_{\omega\theta}}{\sigma^2_{\omega\theta} + N_{Ele\,(App)}\sigma^2_{\theta e}}\theta_i^E + \frac{N_{Ele\,(App)}\sigma^2_{\theta e}}{\sigma^2_{\omega\theta} + N_{Ele\,(App)}\sigma^2_{\theta e}}\theta_i,$$
and obtains the posterior variance of the capacity of village leader candidate $i$ as
(2.14)    $$\Sigma_{Ele\,(App)}(\theta_i) = \frac{\sigma^2_{\theta e}\sigma^2_{\omega\theta}}{\sigma^2_{\omega\theta} + N_{Ele\,(App)}\sigma^2_{\theta e}}.$$
    Proposition 1 explains how the accumulation of natural communication improves
inferences of the virtue and capacity of village leader candidates.

PROPOSITION 1: As the number of times that the representative village resident or township
official communicates naturally with the village leader candidates increases, their
inference of each village leader candidate’s virtue and capacity improves in the
following respects:
(a) Inference Precision increases with occurrences of natural communication, as
evidenced by the decrease in the posterior variance of virtue (or capacity) with the
total occurrences of natural communication.
(b) Inference Accuracy increases with occurrences of natural communication, as
evidenced by the decrease in the difference between the posterior mean of virtue (or
capacity) and the real value of virtue (or capacity) with the total occurrences of
natural communication.
(c) Marginal Inference Accuracy decreases with occurrences of natural
communication, as evidenced by the positive second-order derivative of the
difference between the posterior mean of virtue (or capacity) and the real value of
virtue (or capacity) with respect to the total occurrences of natural communication.
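Proof: (a) Treating $N$ as a continuous variable, the first-order derivative of $\Sigma(\alpha_i)$ with respect to $N$ is
(2.15)    $$\frac{\partial \Sigma(\alpha_i)}{\partial N} = -\frac{\sigma^4_{\alpha e}\sigma^2_{\upsilon\alpha}}{(\sigma^2_{\upsilon\alpha} + N\sigma^2_{\alpha e})^2} < 0.$$
    Similarly, the first-order derivative of $\Sigma(\theta_i)$ with respect to $N$ is
(2.16)    $$\frac{\partial \Sigma(\theta_i)}{\partial N} = -\frac{\sigma^4_{\theta e}\sigma^2_{\omega\theta}}{(\sigma^2_{\omega\theta} + N\sigma^2_{\theta e})^2} < 0.$$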
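    (b) The difference between the posterior mean of virtue and the real value of virtue
is
(2.17)    $$|S(\alpha_i) - \alpha_i| = \frac{\sigma^2_{\upsilon\alpha}}{\sigma^2_{\upsilon\alpha} + N\sigma^2_{\alpha e}}|\alpha_i^E - \alpha_i| = \frac{\Sigma(\alpha_i)}{\sigma^2_{\alpha e}}|\alpha_i^E - \alpha_i|.$$
    Therefore, whenever $\alpha_i^E \neq \alpha_i$, we obtain
(2.18)    $$\frac{\partial |S(\alpha_i) - \alpha_i|}{\partial N} = \frac{|\alpha_i^E - \alpha_i|}{\sigma^2_{\alpha e}}\frac{\partial \Sigma(\alpha_i)}{\partial N} < 0.$$
    Similarly, the difference between the posterior mean of capacity and the real value
of capacity is
(2.19)    $$|S(\theta_i) - \theta_i| = \frac{\sigma^2_{\omega\theta}}{\sigma^2_{\omega\theta} + N\sigma^2_{\theta e}}|\theta_i^E - \theta_i| = \frac{\Sigma(\theta_i)}{\sigma^2_{\theta e}}|\theta_i^E - \theta_i|.$$
    Therefore, whenever $\theta_i^E \neq \theta_i$, we obtain
(2.20)    $$\frac{\partial |S(\theta_i) - \theta_i|}{\partial N} = \frac{|\theta_i^E - \theta_i|}{\sigma^2_{\theta e}}\frac{\partial \Sigma(\theta_i)}{\partial N} < 0.$$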
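    (c) The second-order derivative of $\Sigma(\alpha_i)$ with respect to $N$ is
(2.21)    $$\frac{\partial^2 \Sigma(\alpha_i)}{\partial N^2} = \frac{2\sigma^6_{\alpha e}\sigma^2_{\upsilon\alpha}}{(\sigma^2_{\upsilon\alpha} + N\sigma^2_{\alpha e})^3} > 0.$$
    Therefore, whenever $\alpha_i^E \neq \alpha_i$, the second-order derivative of $|S(\alpha_i) - \alpha_i|$ with respect to $N$ is
(2.22)    $$\frac{\partial^2 |S(\alpha_i) - \alpha_i|}{\partial N^2} = \frac{|\alpha_i^E - \alpha_i|}{\sigma^2_{\alpha e}}\frac{\partial^2 \Sigma(\alpha_i)}{\partial N^2} > 0.$$
    Similarly, the second-order derivative of $\Sigma(\theta_i)$ with respect to $N$ is
(2.23)    $$\frac{\partial^2 \Sigma(\theta_i)}{\partial N^2} = \frac{2\sigma^6_{\theta e}\sigma^2_{\omega\theta}}{(\sigma^2_{\omega\theta} + N\sigma^2_{\theta e})^3} > 0.$$
    Therefore, whenever $\theta_i^E \neq \theta_i$, the second-order derivative of $|S(\theta_i) - \theta_i|$ with respect to $N$ is
(2.24)    $$\frac{\partial^2 |S(\theta_i) - \theta_i|}{\partial N^2} = \frac{|\theta_i^E - \theta_i|}{\sigma^2_{\theta e}}\frac{\partial^2 \Sigma(\theta_i)}{\partial N^2} > 0. \quad \blacksquare$$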
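The signs in (2.15), (2.18), and (2.21) can be spot-checked numerically by treating $N$ as continuous. The sketch below, with hypothetical parameter values and function names, compares central finite differences of the posterior variance with the analytic derivatives for the virtue case (the capacity case is identical after relabeling) and checks that each additional occurrence of natural communication shrinks the accuracy gap $|S(\alpha_i) - \alpha_i|$ by a smaller amount. It is a sanity check at a few points, not a substitute for the proof.

```python
PRIOR_VAR = 0.01                    # sigma_{alpha e}^2 (hypothetical)
NOISE_VAR = 1.0                     # sigma_{upsilon alpha}^2 (hypothetical)
PRIOR_MEAN, TRUE_VALUE = 0.3, 0.7   # alpha_i^E and alpha_i (hypothetical)


def posterior_var(n):
    """Eq. (2.7)."""
    return PRIOR_VAR * NOISE_VAR / (NOISE_VAR + n * PRIOR_VAR)


def accuracy_gap(n):
    """Eq. (2.17): Sigma(alpha_i) / sigma_{alpha e}^2 * |alpha_i^E - alpha_i|."""
    return posterior_var(n) / PRIOR_VAR * abs(PRIOR_MEAN - TRUE_VALUE)


def d_posterior_var(n):
    """Eq. (2.15)."""
    return -PRIOR_VAR**2 * NOISE_VAR / (NOISE_VAR + n * PRIOR_VAR) ** 2


def d2_posterior_var(n):
    """Eq. (2.21)."""
    return 2 * PRIOR_VAR**3 * NOISE_VAR / (NOISE_VAR + n * PRIOR_VAR) ** 3


H = 0.01  # finite-difference step in N
results = []
for n in (1, 10, 100, 1000):
    fd1 = (posterior_var(n + H) - posterior_var(n - H)) / (2 * H)
    fd2 = (posterior_var(n + H) - 2 * posterior_var(n) + posterior_var(n - H)) / H**2
    results.append((n, fd1, d_posterior_var(n), fd2, d2_posterior_var(n)))
    print(f"N={n:5d}  dSigma/dN={fd1:.3e} (analytic {d_posterior_var(n):.3e})  "
          f"d2Sigma/dN2={fd2:.3e} (analytic {d2_posterior_var(n):.3e})")

# Reduction in the accuracy gap from each additional occurrence, N = 1, ..., 49.
gap_steps = [accuracy_gap(n) - accuracy_gap(n + 1) for n in range(1, 50)]
```

The reductions in `gap_steps` are all positive (accuracy improves) and shrink as $N$ grows (diminishing marginal inference accuracy).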
 
 
    The implication of the improved precision and accuracy of inferences is that each
occurrence of natural communication brings local information on the village leader candidates,
leading to more precise and accurate inferences of their virtue and capacity. For
marginal inference accuracy, the implication is that as the number of times of natural
communication increases, the additional local information that each further occurrence
provides about the virtue and capacity of the village leader candidates decreases.
    3. Institutional Comparison (Inference Accuracy and Precision): To compare
inference precision and inference accuracy before and after the introduction of local
direct elections, we make the following assumption about $N_{Ele}$ and $N_{App}$, the total
occurrences of natural communication of the representative village resident and of the
representative township official, respectively, with each village leader candidate.
 
ASSUMPTION 3: $N_{Ele} > N_{App}$.
 
    Assumption 3 indicates that the representative village resident naturally 
communicates with the village leader candidates more often than the representative 
township official, implying that the representative village resident has an advantage in 
terms of local information about the village leader candidates. The reason is that the 
village leader candidates are also residents of the representative village, and have long-
term and frequent natural communication with other village residents in various 
situations (Bell, 2016). For instance, the village leader candidates and other village 
residents have usually known each other since childhood. As they grow up together in 
the village, they communicate frequently at school, in production or commercial 
activities, and in everyday life. In contrast, residents have fewer opportunities to 
communicate naturally with township officials. The reasons may be that the village 
leader candidates usually communicate with township officials when dealing with the 
public affairs of the village or private affairs related to township administration, and 
that township officials are often posted across different towns. 
    Given that $N_{Ele} > N_{App}$, we compare the precision and accuracy of the inferences
of the virtue and capacity of each village leader candidate before and after the
introduction of local direct elections.
    (a) The representative village resident’s inferences regarding the virtue and capacity
of the village leader candidates before electing one as village leader are more precise
than those of the representative township official before appointing one as
village leader.
    (b) The representative village resident’s inferences regarding the virtue and capacity
of the village leader candidates before electing one as village leader are more accurate
than those of the representative township official before appointing one as
village leader.
    Both statements follow from Proposition 1 and Assumption 3: because the posterior variance
and the gap between the posterior mean and the real value decrease in $N$, we have
$\Sigma_{Ele}(\alpha_i) < \Sigma_{App}(\alpha_i)$ and $|S_{Ele}(\alpha_i) - \alpha_i| < |S_{App}(\alpha_i) - \alpha_i|$ (the latter
whenever $\alpha_i^E \neq \alpha_i$), and likewise for capacity.
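Statements (a) and (b) can be illustrated with hypothetical numbers. In the sketch below, the counts $N_{Ele} = 200$ and $N_{App} = 20$, the prior variance 0.01, the noise variance 1, and the candidate's real values and prior means are all assumptions chosen only to satisfy Assumption 3; the helper evaluates (2.8)-(2.9), or equivalently (2.13)-(2.14), for one characteristic at a time.

```python
def posterior(true_value, prior_mean, prior_var, noise_var, n_obs):
    """Posterior mean and variance as in (2.8)-(2.9) or (2.13)-(2.14)."""
    denom = noise_var + n_obs * prior_var
    mean = (noise_var * prior_mean + n_obs * prior_var * true_value) / denom
    var = prior_var * noise_var / denom
    return mean, var


N_ELE, N_APP = 200, 20   # hypothetical; Assumption 3 only requires N_ELE > N_APP
candidate = {"virtue": (0.7, 0.3), "capacity": (0.4, 0.6)}  # (real value, prior mean)

comparison = {}
for trait, (true_value, prior_mean) in candidate.items():
    ele_mean, ele_var = posterior(true_value, prior_mean, 0.01, 1.0, N_ELE)
    app_mean, app_var = posterior(true_value, prior_mean, 0.01, 1.0, N_APP)
    comparison[trait] = {
        "var": (ele_var, app_var),                                        # precision
        "gap": (abs(ele_mean - true_value), abs(app_mean - true_value)),  # accuracy
    }
    print(trait, comparison[trait])
```

For both characteristics, the election entries (first in each pair) have the smaller posterior variance and the smaller gap to the real value.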
 
 
 
  
    Figure 1 shows that as the number of times the representative village resident or township
official communicates naturally with the village leader candidates increases, (a) the
band representing the square root of the posterior variance (the posterior standard
deviation) narrows, reflecting greater inference precision; (b) the difference between
the posterior mean of virtue (or capacity) and the real value of virtue (or capacity)
decreases, reflecting greater inference accuracy; and (c) the posterior mean follows a
concave path when it rises toward the real value and a convex path when it falls toward
it, reflecting decreasing marginal inference accuracy.
    As shown in Figure 1, the band representing the square root of the posterior
variance of the virtue or capacity of the village leader candidates is narrower under
election than under appointment, implying more precise inference by the representative
village resident. The difference between the posterior mean of the virtue or capacity
of the village leader candidates and its real value is smaller under election than under
appointment, implying greater inference accuracy by the representative village
resident.
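The series that a figure like Figure 1 plots can be tabulated directly from (2.6)-(2.7). The sketch below is not the code behind Figure 1; it uses hypothetical parameters (real virtue 0.7, prior mean 0.3, prior variance 0.01, noise variance 1, $N_{App} = 20$ versus $N_{Ele} = 200$) and only computes the posterior mean and the posterior standard deviation at each time $X$.

```python
import math


def trajectory(true_value, prior_mean, prior_var, noise_var, n_total):
    """(X, posterior mean, posterior standard deviation) for X = 1, ..., n_total,
    using (2.6)-(2.7) with X in place of N."""
    rows = []
    for x in range(1, n_total + 1):
        denom = noise_var + x * prior_var
        mean = (noise_var * prior_mean + x * prior_var * true_value) / denom
        sd = math.sqrt(prior_var * noise_var / denom)
        rows.append((x, mean, sd))
    return rows


TRUE, PRIOR = 0.7, 0.3
ele = trajectory(TRUE, PRIOR, 0.01, 1.0, n_total=200)   # election: N_Ele = 200
app = trajectory(TRUE, PRIOR, 0.01, 1.0, n_total=20)    # appointment: N_App = 20

for label, rows in (("appointment", app), ("election", ele)):
    x, mean, sd = rows[-1]
    print(f"{label:12s} X={x:3d}  mean={mean:.3f}  band=[{mean - sd:.3f}, {mean + sd:.3f}]")
```

Because the prior mean here lies below the real value, the posterior mean rises along a concave path; with a prior mean above the real value it would fall along a convex path.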
    In summary, the representative village resident, by naturally communicating more
often with the village leader candidates, has more local information about their virtue
and capacity than the representative township official does. Therefore, because local direct
elections allow the representative village resident, rather than the representative township
official, to select the village leader, the virtue and capacity of the village leader
candidates are inferred with greater precision and accuracy.
    This theoretical demonstration applies to all village committee members. The
inferences for each village committee member, with the village leader as
representative, are homogeneous. Accordingly, the virtue and capacity of each village
committee member are inferred more precisely and accurately.
[Insert Figure 1 here]
 
B. Selection of Village Committee Members 
    In this section, we discuss how local direct elections facilitate the meritocratic 
selection of village committee members through the improved inference effectiveness. 
We discuss the village leader as the representative of all village committee members, 
and our theoretical findings also apply to other village committee members. We find 
that by providing more accurate inferences about the virtue and capacity of village 
leader candidates, local direct election in a representative village improves the 
expected competence of the village leader. In addition, by providing more precise 
inferences about the virtue and capacity of village leader candidates, local direct 
election reduces the variance in the competence of the village leader.  
    1. Setup: The representative village resident and township official both select the
village leader candidate with the highest competence as the village leader. Our theory
defines $\pi_i$, the competence of village leader candidate $i$, as a weighted average of the
virtue and capacity of village leader candidate $i$, such that $\pi_i \equiv \mu\alpha_i + (1 - \mu)\theta_i$.
    $\mu$ represents the weight assigned by the representative village resident or township
official to the virtue of the village leader candidates, and $\mu \in [0, 1]$;¹⁷ $\mu$ is therefore also
called the village leader’s virtue-capacity spectrum. We make the following assumption
about $\mu$:

ASSUMPTION 4 (Village leader’s virtue-capacity spectrum):
$\mu = \mu_{VL,TO} = \mu_{VL,VR}$.

    Assumption 4 states that $\mu_{VL,VR}$, the representative village resident’s preference for
the village leader’s virtue-capacity spectrum, is equal to $\mu_{VL,TO}$, the representative
township official’s preference for the village leader’s virtue-capacity spectrum, and that
both take the value $\mu$.

17 The value that $\mu$ takes has general implications. In public sectors, we assume politicians’ virtue-capacity
spectrum to be $\mu \in (0.5, 1]$. In contrast, in private sectors, we assume the leaders’ virtue-capacity spectrum to be
$\mu \in [0, 0.5)$ because private sectors have less public purpose, instead tending to emphasize making profits.

    $\mu_{VL,VR}$ is contingent on the village leader’s responsibility in managing village affairs
and serving village residents. In other words, the representative village resident will
select a village leader whose virtue-capacity spectrum is $\mu_{VL,VR}$ in order to safeguard
village residents’ rights and protect village residents’ interests. The OLVC stipulates that the
village leader must not only be capable of developing the local economy, but also abide
by the laws, be intrinsically motivated to serve the people, and protect people’s rights
and interests (National People’s Congress of China, 1998). Therefore, the assumption
underlying $\mu_{VL,VR}$ is that there exists a virtue-capacity spectrum of the village leader,
and that the village leader makes full use of her virtue and capacity to safeguard and
protect the representative village resident’s rights and interests.
    $\mu_{VL,TO} = \mu_{VL,VR}$ implies that the representative township official would select a
village leader whose virtue-capacity spectrum is $\mu_{VL,TO}$, which is equal to $\mu_{VL,VR}$. This
is because the representative township official’s preference for the village leader’s
virtue-capacity spectrum is stipulated to be consistent with the representative village
resident’s preference. In other words, the representative township official is required
to safeguard and protect the representative village resident’s rights and interests in
each village (National People’s Congress of China, 2004).
    Expected Competence. To calculate $\pi_{VL,Ele\,(App)}$, the expected competence of the
elected (or appointed) village leader in a representative village, we calculate
$\mathbb{E}_{Ele\,(App)}(\pi_i)$, the weighted average of the competence of all village leader candidates
in a representative village that has already (or has not) introduced local direct elections,
with weight $A_i$, the probability that village leader candidate $i$ will be elected (or
appointed). $A_i$ has the following properties:
(a) $A_i$ is contingent on the competence of village leader candidate $i$.
(b) $A_i \in [0, 1]$; thus, its value represents the probability of electing or appointing
village leader candidate $i$ as village leader.
(c) $A_i$ is positively associated with the virtue and capacity of village leader candidate
$i$, which reflects a positive screening of the election and appointment of village leaders
(Dal Bó et al., 2017) based on the candidates’ virtue and capacity. Specifically,
$\partial A_i / \partial \alpha_i > 0$ and $\partial A_i / \partial \theta_i > 0$.
    To satisfy these three properties, for simplicity and without loss of generality, we
assume that village leader candidate $i$’s probability of being elected or appointed as
village leader increases linearly with the weighted average of her posterior mean of
virtue and her posterior mean of capacity, with weight $\mu$, the virtue-capacity spectrum.
Following Alesina and Tabellini (2007), the probability of being elected or appointed
can be considered a reward. Therefore, the probability that village leader candidate $i$
will be elected or appointed is given by
(2.25)    $$A_i = l[\mu S(\alpha_i) + (1 - \mu)S(\theta_i)] = \mu l\left[\frac{\Sigma(\alpha_i)}{\sigma^2_{\alpha e}}\alpha_i^E + \left(1 - \frac{\Sigma(\alpha_i)}{\sigma^2_{\alpha e}}\right)\alpha_i\right] + (1 - \mu) l\left[\frac{\Sigma(\theta_i)}{\sigma^2_{\theta e}}\theta_i^E + \left(1 - \frac{\Sigma(\theta_i)}{\sigma^2_{\theta e}}\right)\theta_i\right],$$
where $\Sigma(\alpha_i) \equiv \frac{\sigma^2_{\alpha e}\sigma^2_{\upsilon\alpha}}{\sigma^2_{\upsilon\alpha} + N\sigma^2_{\alpha e}}$, $\Sigma(\theta_i) \equiv \frac{\sigma^2_{\theta e}\sigma^2_{\omega\theta}}{\sigma^2_{\omega\theta} + N\sigma^2_{\theta e}}$, and $l \in [0, 1]$.
    As discussed in Section 2.3.A, $\Sigma(\alpha_i)$ and $\Sigma(\theta_i)$ measure inference precision. As
the number of occurrences of natural communication increases, $\Sigma(\alpha_i)$ and $\Sigma(\theta_i)$
decrease; thus, $A_i$ approaches $l\pi_i$, the product of $l$ and the candidate’s competence. When
$N = N_{Ele}$, so that $\Sigma(\alpha_i) = \Sigma_{Ele}(\alpha_i)$ and $\Sigma(\theta_i) = \Sigma_{Ele}(\theta_i)$, we have $A_i = A_i^{Ele}$, the
probability that village leader candidate $i$ will be elected. When $N = N_{App}$, so that
$\Sigma(\alpha_i) = \Sigma_{App}(\alpha_i)$ and $\Sigma(\theta_i) = \Sigma_{App}(\theta_i)$, we have $A_i = A_i^{App}$, the probability that village
leader candidate $i$ will be appointed.
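Equation (2.25) can be written as a small function. In the sketch below, the prior means, prior variances, noise variances, $l$, and the function names are hypothetical; the weight on the prior mean is $\Sigma(\cdot)/\sigma^2_{\cdot e}$, as in (2.17). It also illustrates the limiting behavior described above: for very large $N$, $A_i$ is numerically indistinguishable from $l\pi_i$.

```python
def posterior_mean(true_value, prior_mean, prior_var, noise_var, n_obs):
    """S(.) from (2.6)/(2.11): weight Sigma/sigma_e^2 on the prior mean, the rest on the real value."""
    weight_on_prior = noise_var / (noise_var + n_obs * prior_var)
    return weight_on_prior * prior_mean + (1.0 - weight_on_prior) * true_value


def selection_probability(alpha, theta, mu, l, n_obs,
                          alpha_prior=(0.3, 0.01), theta_prior=(0.3, 0.01),
                          noise_var=(1.0, 1.0)):
    """Eq. (2.25): A_i = l [mu S(alpha_i) + (1 - mu) S(theta_i)].

    alpha_prior and theta_prior are hypothetical (prior mean, prior variance)
    pairs; noise_var holds hypothetical (sigma_{upsilon alpha}^2, sigma_{omega theta}^2).
    """
    s_alpha = posterior_mean(alpha, *alpha_prior, noise_var[0], n_obs)
    s_theta = posterior_mean(theta, *theta_prior, noise_var[1], n_obs)
    return l * (mu * s_alpha + (1.0 - mu) * s_theta)


a_i = selection_probability(alpha=0.6, theta=0.4, mu=0.5, l=0.8, n_obs=50)
a_limit = selection_probability(alpha=0.6, theta=0.4, mu=0.5, l=0.8, n_obs=10**9)
print(f"A_i at N=50: {a_i:.4f}; at very large N: {a_limit:.4f}")
print(f"l * pi_i:    {0.8 * (0.5 * 0.6 + 0.5 * 0.4):.4f}")
```

Because each posterior mean is a convex combination of values in [0, 1] and $l \in [0, 1]$, the function stays in [0, 1], and it increases in both $\alpha_i$ and $\theta_i$, matching properties (b) and (c).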
    When calculating the weighted average of the competence of all village leader
candidates with weight $A_i$, because $\int_0^1\int_0^1 A_i \, d\alpha_i \, d\theta_i < 1$ (that is, the weights sum to
less than 1), we divide $\int_0^1\int_0^1 [\mu\alpha_i + (1 - \mu)\theta_i] A_i \, d\alpha_i \, d\theta_i$, the weighted sum
of the competence of all village leader candidates, by $\int_0^1\int_0^1 A_i \, d\alpha_i \, d\theta_i$ to
standardize the weights. As a result, the expected competence of the elected (or
appointed) village leader is given by
(2.26)    $$\pi_{VL,Ele\,(App)} = \mathbb{E}_{Ele\,(App)}(\pi_i) = \frac{\int_0^1\int_0^1 \pi_i A_i^{Ele\,(App)} \, d\alpha_i \, d\theta_i}{\int_0^1\int_0^1 A_i^{Ele\,(App)} \, d\alpha_i \, d\theta_i},$$
where the probability that village leader candidate $i$ will be elected (or appointed) is
(2.27)    $$A_i^{Ele\,(App)} = \mu l S_{Ele\,(App)}(\alpha_i) + (1 - \mu) l S_{Ele\,(App)}(\theta_i) = \mu l\left[\frac{\Sigma_{Ele\,(App)}(\alpha_i)}{\sigma^2_{\alpha e}}\alpha_i^E + \left(1 - \frac{\Sigma_{Ele\,(App)}(\alpha_i)}{\sigma^2_{\alpha e}}\right)\alpha_i\right] + (1 - \mu) l\left[\frac{\Sigma_{Ele\,(App)}(\theta_i)}{\sigma^2_{\theta e}}\theta_i^E + \left(1 - \frac{\Sigma_{Ele\,(App)}(\theta_i)}{\sigma^2_{\theta e}}\right)\theta_i\right],$$
where $\Sigma_{Ele\,(App)}(\alpha_i) \equiv \frac{\sigma^2_{\alpha e}\sigma^2_{\upsilon\alpha}}{\sigma^2_{\upsilon\alpha} + N_{Ele\,(App)}\sigma^2_{\alpha e}}$, $\Sigma_{Ele\,(App)}(\theta_i) \equiv \frac{\sigma^2_{\theta e}\sigma^2_{\omega\theta}}{\sigma^2_{\omega\theta} + N_{Ele\,(App)}\sigma^2_{\theta e}}$, and $l \in [0, 1]$.
This shows that what distinguishes the expected competence of the elected village leader from
that of the appointed village leader in a representative village is the number of times
that the representative village resident and the township official, respectively, communicate
naturally with the village leader candidates.
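Because (2.26) is a ratio of two integrals over $[0, 1]^2$, it can be approximated with a midpoint-rule grid. The sketch below does this for Case 3 ($\mu = 1/2$) under hypothetical parameter values ($N_{Ele} = 200$, $N_{App} = 20$, prior means 0.3, prior variances 0.01, noise variances 1, identical for virtue and capacity). The factor $l$ cancels in the ratio and is set to 1; the grid size and function names are illustrative choices.

```python
def a_weight(alpha, theta, mu, n_obs, prior_mean=0.3, prior_var=0.01, noise_var=1.0):
    """A_i from (2.27) with l = 1 and identical hypothetical parameters for virtue and capacity."""
    w = noise_var / (noise_var + n_obs * prior_var)      # Sigma / sigma_e^2
    s_alpha = w * prior_mean + (1 - w) * alpha
    s_theta = w * prior_mean + (1 - w) * theta
    return mu * s_alpha + (1 - mu) * s_theta


def expected_competence(mu, n_obs, grid=200):
    """Midpoint-rule approximation of (2.26) over alpha_i, theta_i in [0, 1]."""
    h = 1.0 / grid
    points = [(k + 0.5) * h for k in range(grid)]
    num = den = 0.0
    for alpha in points:
        for theta in points:
            weight = a_weight(alpha, theta, mu, n_obs)
            num += (mu * alpha + (1 - mu) * theta) * weight   # pi_i * A_i
            den += weight
    return num / den   # the h^2 cell areas cancel in the ratio


pi_ele = expected_competence(mu=0.5, n_obs=200)
pi_app = expected_competence(mu=0.5, n_obs=20)
print(f"expected competence, elected:   {pi_ele:.4f}")
print(f"expected competence, appointed: {pi_app:.4f}")
```

With these values the elected leader's expected competence is higher, and both exceed 0.5, the unweighted mean competence, because $A_i$ puts more weight on more competent candidates.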
    Variance of Competence. We can also calculate $\mathrm{Var}_{VL,Ele\,(App)}(\pi)$, the variance of
the competence of the elected (or appointed) village leader in a representative village.
To this end, we calculate $\mathrm{Var}_{Ele\,(App)}(\pi_i)$, the variance of the competence of all
village leader candidates in a representative village that has already (or has not)
introduced local direct elections. By definition, the variance of the competence of the
elected (or appointed) village leader in a representative village is
(2.28)    $$\mathrm{Var}_{VL,Ele\,(App)}(\pi) = \mathrm{Var}_{Ele\,(App)}(\pi_i) = \mathbb{E}_{Ele\,(App)}[(\pi_i)^2] - \left(\mathbb{E}_{Ele\,(App)}[\pi_i]\right)^2.$$
This measures the extent to which the competence of the elected (or appointed) village
leader varies. As with the expectation, what distinguishes the variance of the
competence of the elected village leader from that of the appointed village leader in a
representative village is the number of times that the representative village resident and
the township official, respectively, communicate naturally with the village leader candidates.
    2. Institutional Comparison (Meritocratic Selection of Village Leaders):
Proposition 2 compares $[\pi]_{VL,Ele}$, the expected competence of the elected village leader, with $[\pi]_{VL,App}$, the expected competence of the appointed village leader.
 
PROPOSITION 2: The expected competence of the elected village leader is greater than that of the appointed village leader in a representative village. Specifically, we have
$$[\pi]_{VL,Ele} > [\pi]_{VL,App}, \tag{2.29}$$
where $[\pi]_{VL,Ele} \equiv \mathbb{E}_{Ele}(\pi_i)$ and $[\pi]_{VL,App} \equiv \mathbb{E}_{App}(\pi_i)$. Specific cases include the following:
    Case 1: $\mu = 1$; that is, the representative village resident or township official considers only the virtue of the village leader candidates.
    Case 2: $\mu = 0$; that is, the representative village resident or township official considers only the capacity of the village leader candidates.
    Case 3: $\mu = \tfrac{1}{2}$; that is, the representative village resident or township official weights the virtue and capacity of the village leader candidates equally.
    Case 4: $\mu \in (\tfrac{1}{2}, 1)$; that is, the representative village resident or township official puts more emphasis on virtue. This case requires $\theta_i^E \in [0.5, 1]$; that is, the prior mean of the capacity of each village leader candidate is greater than or equal to the mean of the true capacity of all potential village leader candidates.
    Case 5: $\mu \in (0, \tfrac{1}{2})$; that is, the representative village resident or township official puts more emphasis on capacity. This case requires $\alpha_i^E \in [0.5, 1]$; that is, the prior mean of the virtue of each village leader candidate is greater than or equal to the mean of the true virtue of all potential village leader candidates.
Proof: See the Appendix. ∎ 
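The role of the cases can be seen from a closed form of (2.26). The derivation below is a sketch, not the proof in the Appendix; it additionally assumes that virtue and capacity are uniformly distributed on $[0, 1]$ (consistent with the population mean of 0.5 used in the next paragraph), takes $[\pi_i] = \mu\alpha_i + (1-\mu)\theta_i$, and treats the prior means $\alpha_i^E, \theta_i^E$ as constants. The shorthand $a, b, c, w_\alpha, w_\theta$ is introduced here only for the derivation.

```latex
\begin{aligned}
& w_\alpha \equiv \frac{\Sigma^{Ele\,(App)}(\alpha_i)}{\sigma_{\alpha e}^2}, \quad
  w_\theta \equiv \frac{\Sigma^{Ele\,(App)}(\theta_i)}{\sigma_{\theta e}^2}, \qquad
  A_i^{Ele\,(App)} = c + a\,\alpha_i + b\,\theta_i, \\
& a = \mu l\,(1 - w_\alpha), \quad b = (1-\mu)\,l\,(1 - w_\theta), \quad
  c = l\left[\mu w_\alpha \alpha_i^E + (1-\mu)\,w_\theta \theta_i^E\right], \\
& \int_0^1\!\!\int_0^1 A_i\,d\alpha_i\,d\theta_i = c + \tfrac{a+b}{2}, \quad
  \int_0^1\!\!\int_0^1 \alpha_i A_i\,d\alpha_i\,d\theta_i = \tfrac{c}{2} + \tfrac{a}{3} + \tfrac{b}{4}, \quad
  \int_0^1\!\!\int_0^1 \theta_i A_i\,d\alpha_i\,d\theta_i = \tfrac{c}{2} + \tfrac{a}{4} + \tfrac{b}{3}, \\
& [\pi]_{VL,Ele\,(App)} =
  \frac{\mu\left(\tfrac{c}{2} + \tfrac{a}{3} + \tfrac{b}{4}\right) + (1-\mu)\left(\tfrac{c}{2} + \tfrac{a}{4} + \tfrac{b}{3}\right)}{c + \tfrac{a+b}{2}}.
\end{aligned}
```

The constant $l$ cancels. For example, with $\mu = \tfrac{1}{2}$, $\alpha_i^E = \theta_i^E = 0.5$ and $w_\alpha = w_\theta = w$ (Case 3), the expression reduces to $\tfrac{w}{2} + \tfrac{7}{12}(1-w)$, which rises as $w$ falls, that is, as the number of natural communications grows.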
 
    In practice, the conditions of Case 4 and Case 5 both hold at all times. According to the requirements of the OLVC, village leader candidates must satisfy certain personal characteristics in terms of capacity, such as having a diploma or management experience (National People’s Congress of China, 1998). In this sense, the mean of the prior perceptions of the representative village resident or township official regarding the capacity of each village leader candidate is greater than or equal to 0.5, the mean of the capacity of all village residents; in other words, $\theta_i^E \in [0.5, 1]$. The OLVC also requires that village leader candidates satisfy certain personal characteristics in terms of virtue, such as having no criminal record or affiliation with the Communist Party of China (National People’s Congress of China, 1998). In this sense, the mean of the prior perceptions of the representative village resident or township official regarding the virtue of the village leader candidates is greater than or equal to 0.5, the mean of the virtue of all village residents; in other words, $\alpha_i^E \in [0.5, 1]$.
    Proposition 3 compares $Var_{VL,Ele}(\pi)$, the variance of the competence of the elected village leader, with $Var_{VL,App}(\pi)$, the variance of the competence of the appointed village leader.
 
PROPOSITION 3: The variance of the competence of the elected village leader is smaller than that of the appointed village leader in a representative village. Specifically, we have
$$Var_{VL,Ele}(\pi) < Var_{VL,App}(\pi) \tag{2.30}$$
regardless of the value of $\mu$ in $[0, 1]$.

    As discussed in (2.28), $Var_{VL,Ele}(\pi) = Var_{Ele}(\pi_i)$ and $Var_{VL,App}(\pi) = Var_{App}(\pi_i)$.
Proof: See the Appendix. ∎ 
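Propositions 2 and 3 can be illustrated numerically by approximating the double integrals in (2.26) and (2.28) on a grid. The sketch below is an illustration under assumptions, not the proof: it treats virtue and capacity as uniformly distributed on $[0, 1]$, assumes the expectation operator in (2.28) uses the same selection weights $A_i$ as (2.26), uses equal prior and noise variances for virtue and capacity, and uses a hypothetical count `n` of natural communications (1 for appointment, 10 for election).

```python
import numpy as np


def selection_moments(n, mu=0.5, l=1.0, alpha_prior=0.5, theta_prior=0.5,
                      var_prior=1.0, var_noise=1.0, grid=400):
    """Return (mean, variance) of competence weighted by the selection weight A_i.

    Midpoint-rule approximation of (2.26) and (2.28) on [0, 1]^2, assuming uniformly
    distributed virtue and capacity; cell areas cancel in the ratios.
    """
    pts = (np.arange(grid) + 0.5) / grid
    alpha, theta = np.meshgrid(pts, pts, indexing="ij")
    w = var_noise / (var_noise + n * var_prior)            # weight on the prior mean
    post_alpha = w * alpha_prior + (1.0 - w) * alpha
    post_theta = w * theta_prior + (1.0 - w) * theta
    A = mu * l * post_alpha + (1.0 - mu) * l * post_theta  # selection weight, (2.27)
    pi = mu * alpha + (1.0 - mu) * theta                   # competence
    mean = np.sum(pi * A) / np.sum(A)                      # (2.26)
    second = np.sum(pi**2 * A) / np.sum(A)
    return mean, second - mean**2                          # (2.28)


if __name__ == "__main__":
    for label, n in (("appointed", 1), ("elected", 10)):
        mean, var = selection_moments(n)
        print(f"{label}: mean {mean:.4f}, variance {var:.4f}")
```

Under these illustrative settings (Case 3 with prior means of 0.5), the weighted mean rises from about 0.542 to about 0.576 and the weighted variance falls from about 0.0399 to about 0.0359, matching the directions stated in Propositions 2 and 3 for this particular parameterization only.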
 
    In summary, after the introduction of local direct elections in a representative village, 
the expected competence of the village leader increases, while its variance decreases. 
This implies that by providing local information on the virtue and capacity of village 
leader candidates, local direct elections facilitate the meritocratic selection of village 
leaders. 
    This theoretical result applies to all village committee members. The selection process is the same (homogeneous) for every village committee member, with the village leader serving as the representative case; hence the comparison of expected competence carries over to each member. Accordingly, by providing local information on the virtue and capacity of village committee candidates, local direct elections increase the expected competence of each village committee member while reducing its variance. In other words, local direct elections facilitate the meritocratic selection of all village committee members.
 
C. Summary 
    This section demonstrates that local direct elections facilitate the meritocratic selection of village committee members. By providing local information on each village committee candidate, local direct elections improve the effectiveness of inferences about the virtue and capacity of each candidate. Once local direct elections transfer the authority to select village committee members from township officials to village residents, who have an advantage in local information about these candidates, the virtue and capacity of the candidates are inferred with greater precision and accuracy. As a result, the expected competence of village committee members increases and its variance decreases; that is, village committee members are selected more meritocratically. Because our model places competence on a virtue-capacity spectrum, our results apply to a continuum of selection scenarios, from purely virtue-based to purely capacity-based.
 
2.4. Meritocratic Selection with the Improved Candidate Pool 
    Our theoretical model, based on the Bayesian inference framework, investigates how the introduction of local direct elections in a representative village facilitates the meritocratic selection of the village party secretary. The mechanism is that, because local direct elections facilitate the meritocratic selection of village committee members, they improve the candidate pool for the village party secretary.
 
A. Performance-based Promotion of Village Party Secretaries  
    The inference about village party secretary candidates and the selection of the village party secretary constitute performance-based promotion, which parallels the processes discussed in Sections 2.3.A and 2.3.B. Village party branch members are the candidates for village party secretary. The village party branch has two types of members: (I) individuals who are members of both the village party branch and the village committee, denoted by village party branch member $j$, $j \in \{1, 2, \ldots\}$; and (II) individuals who are members of the village party branch only, denoted by village party branch member $\tilde{j}$, $\tilde{j} \in \{1, 2, \ldots\}$. The village leader is a Type I village party branch member.
    The candidate pool for the village party secretary is partially improved by local direct elections. After the introduction of local direct elections, the expected competence of Type I village party branch members increases, while that of Type II village party branch members remains unchanged. With local direct elections, Type I village party branch members are no longer appointed by the representative township official but elected by the representative village resident; thus, their expected competence increases, as discussed in Section 2.3.B. In contrast, because Type II village party branch members are still appointed by the representative township official, their expected competence remains unchanged. Before the introduction of local direct elections, by comparison, the expected competence of Type I and Type II village party branch members is the same, because both types are appointed by the representative township official.
    In a representative village, the representative township official cannot observe each village party branch member’s virtue or capacity. To infer their virtue, the representative township official observes the performance of each village party branch member and obtains $P_{jm}^{\alpha}$ or $P_{\tilde{j}m}^{\alpha}$, the performance on virtue of village party branch member $j$ or $\tilde{j}$ in task $m = 1, 2, \ldots, M$, $M$ being sufficiently large. $P_{jm}^{\alpha}$ is given by $P_{jm}^{\alpha} = \alpha_j + \varepsilon_{jm}$, and $P_{\tilde{j}m}^{\alpha}$ is given by $P_{\tilde{j}m}^{\alpha} = \alpha_{\tilde{j}} + \varepsilon_{\tilde{j}m}$, where $\varepsilon_{jm}$ and $\varepsilon_{\tilde{j}m}$ represent two series of random shocks, $\varepsilon_{jm}, \varepsilon_{\tilde{j}m} \sim N(0, \sigma_\varepsilon^2)$.
    Similarly, the representative township official observes the performance of each village party branch member and obtains $P_{jm}^{\theta}$ or $P_{\tilde{j}m}^{\theta}$, the performance on capacity of village party branch member $j$ or $\tilde{j}$ in task $m = 1, 2, \ldots, M$, $M$ being sufficiently large. $P_{jm}^{\theta}$ is given by $P_{jm}^{\theta} = \theta_j + \eta_{jm}$, and $P_{\tilde{j}m}^{\theta}$ is given by $P_{\tilde{j}m}^{\theta} = \theta_{\tilde{j}} + \eta_{\tilde{j}m}$, where $\eta_{jm}$ and $\eta_{\tilde{j}m}$ represent two series of random shocks,¹⁸ $\eta_{jm}, \eta_{\tilde{j}m} \sim N(0, \sigma_\eta^2)$.
    Prior to the performance assessment, the representative township official holds her own prior perceptions of the virtue and capacity of each village party branch member; the distribution of these perceptions is specified below.
 
ASSUMPTION 5 (Prior distribution of the virtue and capacity of village party branch members): The representative township official’s prior perceptions of the virtue of village party branch member $j$ or $\tilde{j}$ are distributed as $N(\alpha_j^u, \sigma_{\alpha u}^2)$ or $N(\alpha_{\tilde{j}}^u, \sigma_{\alpha u}^2)$, truncated at $[0, 1]$, where $\alpha_j^u, \alpha_{\tilde{j}}^u \in [0, 1]$. Her prior perceptions of the capacity of village party branch member $j$ or $\tilde{j}$ are distributed as $N(\theta_j^u, \sigma_{\theta u}^2)$ or $N(\theta_{\tilde{j}}^u, \sigma_{\theta u}^2)$, truncated at $[0, 1]$, where $\theta_j^u, \theta_{\tilde{j}}^u \in [0, 1]$. The prior means and the prior variances are known to the representative township official.
 
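    Therefore, as in Section 2.3.A, the posterior mean of the virtue of village party branch member $j$ is
$$S_{VPS}(\alpha_j) = \frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + M\sigma_{\alpha u}^2}\,\alpha_j^u + \frac{M\sigma_{\alpha u}^2}{\sigma_\varepsilon^2 + M\sigma_{\alpha u}^2}\,\alpha_j, \tag{2.31}$$
and the posterior mean of the virtue of village party branch member $\tilde{j}$ is
$$S_{VPS}(\alpha_{\tilde{j}}) = \frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + M\sigma_{\alpha u}^2}\,\alpha_{\tilde{j}}^u + \frac{M\sigma_{\alpha u}^2}{\sigma_\varepsilon^2 + M\sigma_{\alpha u}^2}\,\alpha_{\tilde{j}}. \tag{2.32}$$
    The posterior mean of the capacity of village party branch member $j$ is
$$S_{VPS}(\theta_j) = \frac{\sigma_\eta^2}{\sigma_\eta^2 + M\sigma_{\theta u}^2}\,\theta_j^u + \frac{M\sigma_{\theta u}^2}{\sigma_\eta^2 + M\sigma_{\theta u}^2}\,\theta_j, \tag{2.33}$$
and the posterior mean of the capacity of village party branch member $\tilde{j}$ is
$$S_{VPS}(\theta_{\tilde{j}}) = \frac{\sigma_\eta^2}{\sigma_\eta^2 + M\sigma_{\theta u}^2}\,\theta_{\tilde{j}}^u + \frac{M\sigma_{\theta u}^2}{\sigma_\eta^2 + M\sigma_{\theta u}^2}\,\theta_{\tilde{j}}. \tag{2.34}$$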
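The posterior means in (2.31)-(2.34) follow the standard normal-normal updating rule, with the average of the $M$ performance signals replaced by the true value because $M$ is large. The sketch below checks that logic by simulating signals $P_{jm}^{\alpha} = \alpha_j + \varepsilon_{jm}$; the parameter values are hypothetical, and, like (2.31), it ignores the truncation of the prior to $[0, 1]$.

```python
import numpy as np


def posterior_mean(prior_mean, prior_var, noise_var, signals):
    """Normal-normal update from M signals, each equal to the true value plus N(0, noise_var).

    Puts weight noise_var / (noise_var + M * prior_var) on the prior mean and the rest
    on the average signal. Truncation of the prior to [0, 1] is ignored, as in (2.31).
    """
    M = len(signals)
    w_prior = noise_var / (noise_var + M * prior_var)
    return w_prior * prior_mean + (1.0 - w_prior) * float(np.mean(signals))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha_true, alpha_prior = 0.8, 0.5   # hypothetical alpha_j and alpha_j^u
    prior_var, noise_var = 0.05, 0.5     # hypothetical sigma_{alpha u}^2 and sigma_eps^2
    for M in (5, 50, 5000):
        signals = alpha_true + rng.normal(0.0, np.sqrt(noise_var), size=M)  # P_jm
        print(M, round(posterior_mean(alpha_prior, prior_var, noise_var, signals), 3))
```

As $M$ grows, the weight on the prior mean vanishes and the posterior mean approaches the true virtue, which is the large-$M$ approximation that (2.31)-(2.34) rely on.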
                                                          
18 This follows Jones and Olken (2005), Besley et al. (2011), Yao and Zhang (2015), and Bloom et al. (2015). 
    As in Section 2.3.B, the representative township official selects the village party branch member with the highest competence as the village party secretary. Our theory defines $\pi_j$, the competence of Type I village party branch member $j$, as a weighted average of the virtue and capacity of village party branch member $j$, such that $\pi_j \equiv \mu\alpha_j + (1-\mu)\theta_j$, and defines $\pi_{\tilde{j}}$, the competence of Type II village party branch member $\tilde{j}$, as a weighted average of the virtue and capacity of village party branch member $\tilde{j}$, such that $\pi_{\tilde{j}} \equiv \mu\alpha_{\tilde{j}} + (1-\mu)\theta_{\tilde{j}}$. $[\pi_j]^{Ele}$ is the competence of Type I village party branch member $j$ when elected, and $[\pi_j]^{App}$ is the competence of Type I village party branch member $j$ when appointed. Because Type II village party branch members are always appointed, $\pi_{\tilde{j}}$ does not depend on the introduction of local direct elections.
    $\mu$ represents the weight on virtue along the village party secretary’s virtue-capacity spectrum, which is assumed to be the same as that of the village leader. This is because (1) both the village leader and the village party secretary deal with village affairs and (2) the representative township official, to protect village residents’ rights and interests, is required to select the most competent village leader and the most competent village party secretary (National People’s Congress of China, 2004).
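    As in Section 2.3.B, the likelihood of appointing village party branch member $j$ as the village party secretary increases linearly with the posterior mean of her competence. Specifically,
$$R_j = \mu l\left(\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + M\sigma_{\alpha u}^2}\,\alpha_j^u + \frac{M\sigma_{\alpha u}^2}{\sigma_\varepsilon^2 + M\sigma_{\alpha u}^2}\,\alpha_j\right) + (1-\mu)\,l\left(\frac{\sigma_\eta^2}{\sigma_\eta^2 + M\sigma_{\theta u}^2}\,\theta_j^u + \frac{M\sigma_{\theta u}^2}{\sigma_\eta^2 + M\sigma_{\theta u}^2}\,\theta_j\right), \tag{2.35}$$
and the likelihood of appointing village party branch member $\tilde{j}$ as the village party secretary likewise increases linearly with the posterior mean of her competence. Specifically,
$$R_{\tilde{j}} = \mu l\left(\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + M\sigma_{\alpha u}^2}\,\alpha_{\tilde{j}}^u + \frac{M\sigma_{\alpha u}^2}{\sigma_\varepsilon^2 + M\sigma_{\alpha u}^2}\,\alpha_{\tilde{j}}\right) + (1-\mu)\,l\left(\frac{\sigma_\eta^2}{\sigma_\eta^2 + M\sigma_{\theta u}^2}\,\theta_{\tilde{j}}^u + \frac{M\sigma_{\theta u}^2}{\sigma_\eta^2 + M\sigma_{\theta u}^2}\,\theta_{\tilde{j}}\right), \tag{2.36}$$
where $\mu, l \in [0, 1]$.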
 
B. Selection of Village Party Secretaries  
    In this section, we discuss how local direct elections facilitate the meritocratic selection of the village party secretary through the improved candidate pool. We find that because local direct elections in a representative village improve the expected competence of village committee members, who are candidates for village party secretary, the expected competence of the village party secretary also improves. However, the effect on the variance of the competence of the village party secretary is ambiguous.
    Expected Competence. This section calculates $[\pi]_{VPS}$, the expected competence of the village party secretary, and then compares $[\pi]_{VPS,Ele}$ and $[\pi]_{VPS,App}$. $[\pi]_{VPS,Ele}$ refers to the expected competence of the village party secretary when some of the candidates, namely the Type I village party branch members, are directly elected by local village residents. $[\pi]_{VPS,App}$ refers to the expected competence of the village party secretary when those candidates, namely the Type I village party branch members, are appointed by township officials.
    To calculate $[\pi]_{VPS}$, we take the weighted average of the competence of all elected (or appointed) Type I village party branch members and that of all Type II village party branch members. As discussed above, $[\pi_j]^{Ele\,(App)}$ is the competence of Type I village party branch member $j$, who was elected (or appointed), and $R_j^{Ele\,(App)}$ is village party branch member $j$’s likelihood of being appointed village party secretary; $R_{\tilde{j}}$ is the corresponding likelihood for village party branch member $\tilde{j}$. As discussed earlier, $\pi_{\tilde{j}}$ does not depend on the introduction of local direct elections.
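    As a result, the expected competence of the village party secretary is
$$[\pi]_{VPS} \equiv \mathbb{E}(\pi_{j,\tilde{j}}) = \frac{\int_0^1 \pi_j R_j\,d\pi_j + \int_0^1 \pi_{\tilde{j}} R_{\tilde{j}}\,d\pi_{\tilde{j}}}{\int_0^1 R_j\,d\pi_j + \int_0^1 R_{\tilde{j}}\,d\pi_{\tilde{j}}}. \tag{2.37}$$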
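    Specifically, we compare
$$[\pi]_{VPS,Ele} \equiv \mathbb{E}_{Ele}(\pi_{j,\tilde{j}}) = \frac{\int_0^1 [\pi_j]^{Ele} R_j^{Ele}\,d\pi_j + \int_0^1 \pi_{\tilde{j}} R_{\tilde{j}}\,d\pi_{\tilde{j}}}{\int_0^1 R_j^{Ele}\,d\pi_j + \int_0^1 R_{\tilde{j}}\,d\pi_{\tilde{j}}} \tag{2.38}$$
and
$$[\pi]_{VPS,App} \equiv \mathbb{E}_{App}(\pi_{j,\tilde{j}}) = \frac{\int_0^1 [\pi_j]^{App} R_j^{App}\,d\pi_j + \int_0^1 \pi_{\tilde{j}} R_{\tilde{j}}\,d\pi_{\tilde{j}}}{\int_0^1 R_j^{App}\,d\pi_j + \int_0^1 R_{\tilde{j}}\,d\pi_{\tilde{j}}}. \tag{2.39}$$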
    Proposition 4 explains how local direct elections improve the competence of the village party secretary in a representative village through the improved candidate pool.

PROPOSITION 4: After the introduction of local direct elections in a representative village, the expected competence of the village party secretary increases due to the increased expected competence of Type I village party branch members. Specifically, we have
$$[\pi]_{VPS,Ele} > [\pi]_{VPS,App}. \tag{2.40}$$
That is, the expected competence of a village party secretary who has been directly elected as a village committee member is greater than that of a village party secretary who has never been directly elected as a village committee member.
Proof: See the Appendix. ∎ 
 
    Variance of Competence. This section calculates $Var_{VPS}(\pi)$, the variance of the competence of the village party secretary, and then compares $Var_{VPS,Ele}(\pi)$ and $Var_{VPS,App}(\pi)$. $Var_{VPS,Ele}(\pi)$ refers to the variance of the competence of the village party secretary when some of the candidates, namely the Type I village party branch members, are directly elected by local village residents. $Var_{VPS,App}(\pi)$ refers to the variance of the competence of the village party secretary when those candidates, namely the Type I village party branch members, are appointed by township officials.
    By definition, the variance of the competence of the village party secretary in a representative village is
$$Var_{VPS}(\pi) = \mathbb{E}\left[(\pi_{j,\tilde{j}})^2\right] - \mathbb{E}^2\left[\pi_{j,\tilde{j}}\right]. \tag{2.41}$$
    Specifically, we compare
$$Var_{VPS,Ele}(\pi) = \mathbb{E}_{Ele}\left[(\pi_{j,\tilde{j}})^2\right] - \mathbb{E}^2_{Ele}\left[\pi_{j,\tilde{j}}\right] \tag{2.42}$$
and
$$Var_{VPS,App}(\pi) = \mathbb{E}_{App}\left[(\pi_{j,\tilde{j}})^2\right] - \mathbb{E}^2_{App}\left[\pi_{j,\tilde{j}}\right]. \tag{2.43}$$
    Proposition 5 describes how local direct elections, through the improved candidate pool, affect the variance of the competence of the village party secretary.
 
PROPOSITION 5: After the introduction of local direct elections in a representative village, the change in the variance of the competence of the village party secretary is ambiguous; that is, the variance may either increase or decrease.

Proof: See the Appendix. ∎ 
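Propositions 4 and 5 can be illustrated with a discrete analogue of (2.37) and (2.41)-(2.43). The sketch below is not the authors' computation: it replaces the integrals with sums over a small hypothetical pool, takes the large-$M$ limit of (2.35)-(2.36) so that each candidate's weight is $R = l\,\pi$ (the constant $l$ then cancels), and uses made-up competence values in which only the Type I members change when elections are introduced.

```python
import numpy as np


def pool_moments(type1, type2, l=1.0):
    """Weighted mean and variance of competence across a hypothetical VPS candidate pool.

    Discrete analogue of (2.37) and (2.41): each candidate's weight is R = l * pi,
    the large-M limit of (2.35)-(2.36), so l cancels out of both moments.
    """
    pi = np.concatenate([np.asarray(type1, float), np.asarray(type2, float)])
    R = l * pi
    mean = np.sum(pi * R) / np.sum(R)
    var = np.sum(pi**2 * R) / np.sum(R) - mean**2
    return mean, var


if __name__ == "__main__":
    # (Type I appointed, Type I elected, Type II) competences; all values hypothetical.
    examples = {
        "village A": ([0.5, 0.6], [0.7, 0.8], [0.5, 0.55]),
        "village B": ([0.3, 0.8], [0.7, 0.75], [0.7, 0.72]),
    }
    for label, (type1_app, type1_ele, type2) in examples.items():
        m_app, v_app = pool_moments(type1_app, type2)
        m_ele, v_ele = pool_moments(type1_ele, type2)
        print(f"{label}: mean {m_app:.3f} -> {m_ele:.3f}, variance {v_app:.4f} -> {v_ele:.4f}")
```

In both hypothetical villages the weighted mean rises, but the weighted variance rises in village A and falls in village B, which is one way to see why the sign of the variance change in Proposition 5 is ambiguous.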
 
C. Summary  
    This section demonstrates that local direct elections facilitate the meritocratic selection of village party secretaries indirectly, and only to a limited extent, by improving the candidate pool. The introduction of local direct elections turns some village party branch members, namely those who are also village committee members, from appointed to elected, and thus raises the expected competence of those members. As a result, the expected competence of the pool of village party branch members also rises. In this sense, because the candidate pool for village party secretaries is partially improved, the expected competence of village party secretaries also increases. However, the effect on the variance of their competence is ambiguous.
 
2.5. Concluding Remarks 
    To the best of our knowledge, this paper is the first to use local information to address adverse selection in political selection. Two major problems in political economy, both related to information asymmetry, are political selection, which suffers from adverse selection, and political incentives, which suffer from moral hazard. Many studies address moral hazard, either by explaining current institutional arrangements or by designing new mechanisms, to discuss the incentives of politicians (Laffont, 2001; Besley, 2006). However, few studies discuss adverse selection in the selection of politicians (Besley, 2005). By contrast, adverse selection is commonly discussed elsewhere, for example, in the literature on job market signaling (Spence, 1973) and product advertising (Milgrom and Roberts, 1986).
    This paper’s theory emphasizes the role of local information in addressing adverse selection in political selection. We show that accumulated local information, quantified by the number of natural communications between political candidates and political decision makers, improves the accuracy of inferences about each political candidate’s virtue and capacity. Specifically, as the number of natural communications increases, political decision makers infer each political candidate’s virtue and capacity more accurately and precisely. To frame the theory, this paper uses the Bayesian inference framework rather than a game-theoretic framework, because it focuses on how accumulating local information affects the effectiveness of inference rather than on the strategic behavior of political candidates and decision makers.
    The essential mechanism through which local direct elections promote the meritocratic selection of politicians is the local information they provide about political candidates. In the context of Chinese local governance, by providing local information on each village committee candidate’s virtue and capacity, the introduction of local direct elections raises the expected competence of each village committee member and reduces its variance. Further, the expected competence of the village party secretary is also raised indirectly. In this sense, one criterion for evaluating the effect of an institutional arrangement on meritocratic selection is whether it provides sufficient local information to infer political candidates’ virtue and capacity.
    This paper’s theory, although discussed in the context of Chinese local governance, has general implications. The essential scenario it describes consists of small grassroots-level groups within a stratified governance structure, whether in a rural or urban area and in the public or private sector. In this scenario, the leader of each small group is locally and directly elected and then has the opportunity to be promoted based on her performance. A typical example is an organization with multiple departments, in which the head of each department is directly elected within the department, and those heads are likely to be promoted to the board of the organization based on their performance. In this process, local information on each head candidate’s virtue and capacity is aggregated, facilitating the meritocratic selection of both the department heads and the board directors.
    Several limitations and extensions should be noted. (1) This paper discusses only the unilateral inference of local leader candidates by decision makers (local voters or upper-level officials), and it finds that local direct elections promote meritocratic selection. Under bilateral inference, however, local direct elections may have side effects. For instance, because of their more frequent natural communication, it could be easier for local leader candidates to bribe or collude with local residents than with upper-level officials. (2) The perceptions of local voters or upper-level officials regarding each local leader candidate are assumed to be homogeneous. A future study could examine how different distributions of these perceptions influence political selection. (3) Our theory does not consider whether the meritocratic selection facilitated by local direct elections yields an equilibrium in which village committee members, village party secretaries, village residents, and township officials all obtain optimal payoffs. Future studies could examine the existence of such an equilibrium under either unilateral or bilateral inference.
Figure 2.1. Inference Accuracy and Precision with Natural Communication Times 
3. Lagged Variables as Instruments¹
 
Yu Wang² and Marc Bellemare³
 
3.1. Introduction 
    To address endogeneity concerns in empirical studies with observational data, it is common to use a lagged endogenous variable as the instrumental variable (IV) in estimation. This strategy, the “lagged IV”, is popular among applied researchers because it requires no other variables as IVs, which are usually difficult to find. Admittedly, lagged variables may not be proper IVs because they are not exogenous, but it is often argued that they could at least alleviate endogeneity to some extent (Anderson and Hsiao, 1981; Todd and Wolpin, 2003). However, few formal theoretical analyses have examined whether the lagged IV method reduces the threat of endogeneity. Applied researchers therefore have little theoretical guidance on the conditions under which the lagged IV method alleviates the endogeneity problem, and on those under which it even aggravates it.
    In this paper, we provide a theoretical argument on the validity of the lagged IV strategy as a response to endogeneity concerns, together with simulation results that support our findings. We find that when the lagged IV has no direct causal impact on the explained variable or on the unobserved confounder, it violates only the independence assumption, but not the exclusion restriction, as stated in the Local Average Treatment Effect (LATE) Theorem. In this case, the LATE in the lagged IV estimation consists only of the “restricted local Average Treatment Effect on the Treated (restricted local ATT)”. By comparison, the Average Treatment Effect (ATE) in the OLS estimation consists of both the “Average Treatment Effect on the Treated (ATT)” and the selection bias. As a result,
                                                          
1 We thank Jay Coggins, John Freeman, Paul Glewwe and Steve Miller for valuable comments and suggestions. All errors are the authors’ own.
2  Wang: Corresponding Author, Department of Applied Economics, University of Minnesota, email: 
wang5979@umn.edu. 
3 Bellemare: Department of Applied Economics, University of Minnesota, email: mbellema@umn.edu. 
when the lagged IV violates only the independence assumption, its estimate could suffer from less endogeneity bias than the OLS estimate; in other words, the lagged IV method could mitigate endogeneity.
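For reference, the textbook decompositions behind these terms can be written as follows for a binary treatment $D$, a binary instrument $Z$, and potential outcomes $Y_1, Y_0$. This notation is assumed here for illustration and is not the paper's structural model; the paper's own "restricted" and "relaxed" local ATT terms are not reproduced.

```latex
\begin{aligned}
\underbrace{\mathbb{E}[Y \mid D=1]-\mathbb{E}[Y \mid D=0]}_{\text{OLS estimand}}
&= \underbrace{\mathbb{E}[Y_1-Y_0 \mid D=1]}_{\text{ATT}}
 + \underbrace{\mathbb{E}[Y_0 \mid D=1]-\mathbb{E}[Y_0 \mid D=0]}_{\text{selection bias}}, \\
\text{LATE} \equiv \frac{\mathbb{E}[Y \mid Z=1]-\mathbb{E}[Y \mid Z=0]}{\mathbb{E}[D \mid Z=1]-\mathbb{E}[D \mid Z=0]}
&= \mathbb{E}[Y_1-Y_0 \mid \text{compliers}].
\end{aligned}
```

The second equality holds under independence, exclusion, monotonicity, and relevance (Angrist et al., 1996). When independence or exclusion fails, the Wald ratio generally differs from the complier average effect, which is the gap that the selection-bias terms discussed here are meant to capture.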
    However, when the lagged IV has a direct causal impact on the explained variable, on the unobserved confounder, or on both, it violates not only the independence assumption but also the exclusion restriction. In these three cases, the LATE in the lagged IV estimation consists of both the “relaxed local Average Treatment Effect on the Treated (relaxed local ATT)” and the “local selection bias”. By comparison, the ATE in the OLS estimation consists of both the ATT and the selection bias. As a result, when the lagged IV violates both the independence assumption and the exclusion restriction, its estimate could suffer from more endogeneity bias than the OLS estimate, and the lagged IV method may then fail to mitigate, and may even aggravate, endogeneity.
    We set up a structural model to compare the ATE in the OLS estimate and the LATE in the lagged IV estimate both qualitatively and quantitatively. In this model, the explained variable is determined by an explanatory variable, an unobserved confounder, and possibly the lagged explanatory variable. The explanatory variable is determined by its first-order lag and by the unobserved confounder; in addition, the unobserved confounder is positively serially correlated and may also be influenced by the lagged explanatory variable. With this model, we discuss four scenarios of causal relationships. In Scenario 1, the lagged explanatory variable has no direct causal effect on either the explained variable or the unobserved confounder; therefore, Scenario 1 violates only the independence assumption. In Scenario 2, the lagged explanatory variable has a direct causal effect on the explained variable; in Scenario 3, it has a direct causal effect on the unobserved confounder; and in Scenario 4, it has direct causal effects on both. Therefore, Scenarios 2, 3 and 4 violate both the independence assumption and the exclusion restriction.
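A minimal simulation sketch of this kind of structural model follows. It is not the authors' specification: the linear functional forms, the parameter names (`rho`, `phi`, `gamma`, `delta`, `lam`, `kappa`), and their values are assumptions made for illustration. `lam` switches on a direct effect of the lagged explanatory variable on the explained variable (Scenario 2), `kappa` a direct effect on the confounder (Scenario 3), both together give Scenario 4, and setting both to zero gives Scenario 1.

```python
import numpy as np


def simulate(T=200_000, beta=1.0, delta=1.0, lam=0.0, kappa=0.0,
             rho=0.3, phi=0.8, gamma=1.0, seed=0):
    """Return (OLS slope, lagged-IV slope) from one long simulated series.

    Hypothetical data-generating process (keep kappa small so it stays stationary):
      u_t = rho * u_{t-1} + kappa * x_{t-1} + e_t   (serially correlated confounder)
      x_t = phi * x_{t-1} + gamma * u_t + v_t       (explanatory variable)
      y_t = beta * x_t + lam * x_{t-1} + delta * u_t + eps_t
    """
    rng = np.random.default_rng(seed)
    e, v, eps = rng.standard_normal((3, T))
    u = np.zeros(T)
    x = np.zeros(T)
    for t in range(1, T):
        u[t] = rho * u[t - 1] + kappa * x[t - 1] + e[t]
        x[t] = phi * x[t - 1] + gamma * u[t] + v[t]
    x_now, x_lag = x[1:], x[:-1]
    y = beta * x_now + lam * x_lag + delta * u[1:] + eps[1:]
    xc, zc, yc = x_now - x_now.mean(), x_lag - x_lag.mean(), y - y.mean()
    ols = xc @ yc / (xc @ xc)   # slope from regressing y_t on x_t
    iv = zc @ yc / (zc @ xc)    # just-identified IV: x_t instrumented by x_{t-1}
    return ols, iv


if __name__ == "__main__":
    for label, lam in (("Scenario 1", 0.0), ("Scenario 2", 0.5)):
        ols, iv = simulate(lam=lam)
        print(f"{label}: OLS bias {ols - 1.0:.3f}, lagged-IV bias {iv - 1.0:.3f}")
```

For these particular hypothetical values, stationary-moment algebra puts the large-sample biases at roughly 0.19 (OLS) versus 0.07 (lagged IV) in Scenario 1, and roughly 0.61 versus 0.65 in Scenario 2, so the simulated biases should land near those numbers. Other parameter choices can change the magnitudes.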
    In line with our theoretical framework, our numerical analysis and simulation results show that in Scenario 1, (1) both the OLS estimate and the lagged IV estimate are biased, and the bias of the lagged IV estimate is smaller than that of the OLS estimate; (2) both the OLS estimate and the lagged IV estimate are consistent; (3) the more severely the independence assumption is violated, the larger the bias of the lagged IV estimate; (4) the root mean squared errors (RMSEs) show patterns similar to those of the biases; and (5) the probability that the lagged IV estimate leads to a Type I error is very high, close to 1. In short, when it violates only the independence assumption, the lagged IV method is acceptable in that its estimate is consistent and less biased than the OLS estimate, yet it remains problematic because of its high probability of Type I error.
    In Scenarios 2, 3 and 4, our numerical analysis and simulation results show that (1) both the OLS estimate and the lagged IV estimate are biased, and the bias of the lagged IV estimate is smaller than that of the OLS estimate; (2) both the OLS estimate and the lagged IV estimate are inconsistent: in Scenarios 2 and 4, the lagged IV estimate is more inconsistent than the OLS estimate, while in Scenario 3 it is ambiguous whether it is more or less inconsistent; (3) the more severely the exclusion restriction is violated, the larger the bias of the lagged IV estimate; (4) the RMSEs show patterns similar to those of the biases; and (5) the probability that the lagged IV estimate leads to a Type I error is very high, close to 1. In short, when it violates both the independence assumption and the exclusion restriction, the lagged IV method is unacceptable: its estimate is inconsistent, it could even aggravate endogeneity by enlarging the bias of the estimate, and it suffers from a high probability of Type I error.
    Blundell and Bond (1998, 2000) argue that because lagged explanatory variables are only weakly correlated with the first difference of the endogenous explanatory variable, the GMM method using lagged explanatory variables as instruments may not solve endogeneity. Rather than discussing the GMM method only, our analysis considers more general types of endogeneity, focusing on the use of the first-order lagged explanatory variable as the single IV in estimation, as is common in empirical studies. Rossi (2014) also argues against using the lagged explanatory variable as the IV. Our results, based on mathematical proof and simulation, are consistent with this literature. For applied researchers in the social sciences, our findings imply that the lagged IV method may fail to mitigate, and may even aggravate, endogeneity. Our analysis also contributes to the literature on credible causal inference based on the LATE Theorem in instrumental variable estimation (Angrist et al., 1996; Imbens, 2014).
    To see how common the lagged IV method is, we examine all articles published in the top general-interest journals in economics and political science. We identify articles using the lagged IV method by searching the text of each paper for keywords such as “lag”, “lagged” or “lagging”, and then determining whether any of those papers instrument an endogenous variable with its own lag. In these papers, lagged explanatory variables are used as instrumental variables either as the main method to address endogeneity or as a robustness check for the baseline estimation result.
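The first, automated step of this screen can be sketched as a whole-word, case-insensitive keyword match. This is a hypothetical helper, not the authors' search procedure; it assumes the paper texts are already available as plain strings and matches only the three keywords named above.

```python
import re

# Whole-word, case-insensitive match for "lag", "lagged" and "lagging" only.
LAG_KEYWORDS = re.compile(r"\blag(?:ged|ging)?\b", re.IGNORECASE)


def flag_for_review(papers: dict[str, str]) -> list[str]:
    """Return the titles whose full text mentions one of the lag keywords.

    A hit only marks a paper for manual reading; whether a lagged endogenous
    variable is actually used as an instrument still has to be checked by hand.
    """
    return [title for title, text in papers.items() if LAG_KEYWORDS.search(text)]


if __name__ == "__main__":
    demo = {
        "Paper A": "We instrument x with its lagged value.",
        "Paper B": "Flagged observations were dropped.",   # 'lag' inside a word: no match
    }
    print(flag_for_review(demo))
```

The word boundaries keep words such as "flagged" from producing false hits, while the manual second step remains necessary because "lag" also appears in papers that never use a lagged IV.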
    Table 1 shows the number of papers using the lagged IV method published between 2013 and 2018 in economics journals (American Economic Review, Econometrica, Journal of Political Economy, Quarterly Journal of Economics, Review of Economic Studies and Review of Economics and Statistics) and political science journals (American Political Science Review, American Journal of Political Science, British Journal of Political Science, Comparative Political Studies and Journal of Politics). In total, we find 31 papers in 2013-2018 using the lagged IV method, of which 19 are in economics and 12 are in political science. Narrowing the window to 2015-2018, 15 papers use the lagged IV method, of which 9 are in economics and 6 are in political science.
    These papers use the first-order lag, or the first-order lag together with higher-order lags, of the explanatory variable as instrumental variables to alleviate endogeneity concerns, which in most papers are attributed to unobserved confounders. Most papers cite the availability of data on the lagged explanatory variable as one of the key reasons for using it as the IV; however, they do not discuss in detail how the bias of the lagged IV method differs from that of OLS. More importantly, few of them discuss whether the lagged explanatory variable has a direct causal effect on the explained variable, namely whether the lagged IV violates the exclusion restriction, which makes the bias of the lagged IV estimates even more questionable.
    This literature review shows that the lagged IV method is commonly used in economics and political science research, and that the authors of those papers believe that, although the lagged IV method may only mitigate endogeneity, the lagged endogenous variable is a somewhat valid instrumental variable, because it is at least partly exogenous and satisfies the relevance condition. Our analysis examines, by comparison with OLS, whether the lagged IV method lowers the estimation bias under specific parameter values, or instead exaggerates it.
    The rest of this paper is organized as follows. Section 3.2 discusses the theoretical framework. Section 3.3 presents a numerical analysis based on the theoretical framework. Section 3.4 presents simulation results that support the numerical analysis. Section 3.5 summarizes.
 
3.2. Theoretical Framework 
    This section compares the endogeneity in lagged IV and that in OLS qualitatively, by 
deriving the local average treatment effects (LATE) in lagged IV estimation and the 
average treatment effects (ATE) in OLS estimation. In light of the LATE theorem (Angrist 
and Pischke, 2009), this section finds that due to the synchronous relationship between 
the lagged IV and the unobserved confounder, the lagged IV estimation violates the 
independence assumption. As a result, the lagged IV estimate suffers from endogeneity, 
and it is ambiguous whether the lagged IV estimation has less endogeneity than the OLS 
estimate. This section also finds that if the lagged IV influences the explained variable not 
only through the explanatory variable causally, but also through the unobserved 
confounders causally, the lagged IV violates the exclusion restriction in addition to the 
independence assumption. As a result, the lagged IV estimate suffers an explicitly greater 
extent of endogeneity than the OLS estimate. 
 
A. Setup 
    There are three sources of endogeneity in identification: unobserved confounders that influence both the explanatory variable and the explained variable, measurement error in the explanatory variable, and reverse causality between the explanatory variable and the explained variable. In empirical studies, the first source is the most common: in the absence of randomized treatment, unobserved factors usually influence both the explained variable and the explanatory variable (Stock and Trebbi, 2003; Angrist and Krueger, 2001).
    Empirically, there are three reasons why the lagged explanatory variable may serve as a valid IV. For the relevance condition, autocorrelation in the explanatory variable implies that the endogenous variable is, to some extent, correlated with its lag. For the exclusion restriction, if theoretically no causal relationship exists between the lagged explanatory variable and the explained variable, the lagged explanatory variable is held to be very likely an exogenous IV. For data availability, the lagged IV method requires no additional data and is convenient in panel data sets, especially given the increasing availability of long panels.
    However, in the presence of unobserved confounders, if autocorrelation exists both in the explanatory variable and in the unobserved confounders, the lagged explanatory variable could be correlated, through the lagged unobserved confounders, with the unobserved confounders in the current period, leading to biased estimates. To see this, suppose the structural model is
$$ Y_{it} = \beta X_{it} + \xi X_{i,t-1} + \delta U_{it} + \epsilon_{it} \tag{3.1} $$
where $Y_{it}$, $X_{it}$, $X_{i,t-1}$ and $U_{it}$ represent the explained variable, the explanatory variable, the lagged explanatory variable and the unobserved confounder, respectively, and $\mathrm{Cov}(X_{it}, U_{it}) \neq 0$, so that there is indeed an identification problem. If $\xi \neq 0$, the lagged explanatory variable has a direct impact on the explained variable; otherwise, no such lagged impact exists.
    The autoregressive process of the explanatory variable is
$$ X_{it} = \rho X_{i,t-1} + \kappa U_{it} + \eta_{it} \tag{3.2} $$
    The autoregressive process of the unobserved confounder is
$$ U_{it} = \phi U_{i,t-1} + \psi X_{i,t-1} + \nu_{it} \tag{3.3} $$
If $\psi \neq 0$, the lagged explanatory variable has a direct impact on the unobserved confounder; otherwise, no such impact exists.
    Therefore, we have four scenarios of endogeneity in the dynamic causal framework:
    Scenario 1: $\xi = 0$ and $\psi = 0$. In this scenario, the lagged explanatory variable has no explicit impact on the explained variable, nor does it have any explicit impact on the unobserved confounder.
    Scenario 2: $\xi \neq 0$, while $\psi = 0$. In this scenario, the lagged explanatory variable has a direct impact on the explained variable, but has no explicit impact on the unobserved confounder.
    Scenario 3: $\xi = 0$, while $\psi \neq 0$. In this scenario, the lagged explanatory variable has no explicit impact on the explained variable, but has a direct impact on the unobserved confounder.
    Scenario 4: $\xi \neq 0$ and $\psi \neq 0$. In this scenario, the lagged explanatory variable has a direct impact on the explained variable, and also has a direct impact on the unobserved confounder.
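To make the data generating process in (3.1)-(3.3) concrete, the following Python sketch (using NumPy) draws a balanced panel for given parameter values and classifies $(\xi, \psi)$ into the four scenarios above. It is a minimal illustration under our own assumptions, not the code behind the simulations in Section 3.4. The helper names `scenario` and `simulate_panel`, the standard normal shocks, the zero starting values and the 100-period burn-in are all hypothetical choices.

```python
import numpy as np


def scenario(xi, psi):
    """Classify (xi, psi) into Scenarios 1-4 of the dynamic framework."""
    if xi == 0 and psi == 0:
        return 1
    if psi == 0:
        return 2          # xi != 0, psi == 0
    if xi == 0:
        return 3          # xi == 0, psi != 0
    return 4              # xi != 0, psi != 0


def simulate_panel(n_units, n_periods, beta, xi, delta, rho, kappa, phi, psi,
                   burn_in=100, seed=0):
    """Draw (Y_t, X_t, X_{t-1}, U_t) from (3.1)-(3.3) for a balanced panel.

    Assumes standard normal shocks, zero initial values and a burn-in so
    that the kept periods are approximately stationary.
    """
    rng = np.random.default_rng(seed)
    total = burn_in + n_periods + 1
    x = np.zeros((n_units, total))
    u = np.zeros((n_units, total))
    for t in range(1, total):
        # (3.3): confounder follows its own lag and the lagged treatment
        u[:, t] = phi * u[:, t - 1] + psi * x[:, t - 1] + rng.standard_normal(n_units)
        # (3.2): treatment follows its own lag and the current confounder
        x[:, t] = rho * x[:, t - 1] + kappa * u[:, t] + rng.standard_normal(n_units)
    # Keep n_periods columns; x_lag is x_now shifted back by one period
    x_now, x_lag, u_now = x[:, burn_in + 1:], x[:, burn_in:-1], u[:, burn_in + 1:]
    eps = rng.standard_normal((n_units, n_periods))
    # (3.1): outcome with a possible direct effect xi of the lagged treatment
    y = beta * x_now + xi * x_lag + delta * u_now + eps
    return y, x_now, x_lag, u_now
```

For example, `simulate_panel(100, 50, beta=2.0, xi=0.0, delta=1.0, rho=0.5, kappa=0.5, phi=0.5, psi=0.0)` draws one Scenario 1 panel with 100 units and 50 periods.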
    In light of the LATE theorem of Angrist and Pischke (2009), we discuss the local average treatment effect of the lagged IV method. For simplicity and without loss of generality, we assume a binary explanatory variable taking the value 1 or 0. Denote by $Y_{it}(d, \tilde{d})$ individual $i$’s latent outcome when its treatment is $X_{it} = d$ and its lagged treatment, the lagged IV, is $X_{i,t-1} = \tilde{d}$. To specify the heterogeneous causal effect of the lagged IV, we denote by $X_{1it}$ individual $i$’s latent treatment state when $X_{i,t-1} = 1$, and by $X_{0it}$ individual $i$’s latent treatment state when $X_{i,t-1} = 0$. Therefore, the observed treatment state can be expressed in terms of the latent treatment states as
$$ X_{it} = X_{0it} + (X_{1it} - X_{0it}) X_{i,t-1} \tag{3.4} $$
in which only one of $X_{1it}$ and $X_{0it}$ is observed, and $(X_{1it} - X_{0it})$ represents the heterogeneous causal effect of $X_{i,t-1}$. With this notation, we state the independence assumption and the exclusion restriction of the lagged IV as follows:
    1. The independence assumption requires that the instrumental variable have no association with the latent outcomes, nor with the latent treatment states. Specifically,
$$ \left[ \{ Y_{it}(d, \tilde{d}); \forall d, \tilde{d} \}, X_{1it}, X_{0it} \right] \perp\!\!\!\perp X_{i,t-1} \tag{3.5} $$
This implies that the lagged IV should work like a random assignment. In other words, the lagged IV should be unrelated to the latent outcomes and to the latent treatment states of the explanatory variable.
    Scenario 1 violates the independence assumption, because the lagged IV is synchronously correlated with the unobserved confounder. Specifically, because $U_{i,t-1}$ causally influences $U_{it}$ with marginal effect $\phi$ and causally influences $X_{i,t-1}$ with marginal effect $\kappa$, $X_{i,t-1}$ and $U_{it}$ move together. In other words, as $X_{i,t-1}$ changes, $U_{it}$ changes not causally but synchronously, which in turn changes $Y_{it}$: when $X_{i,t-1}$ changes by 1 unit, $U_{it}$ changes synchronously by $\phi/\kappa$ units. As a result, $X_{i,t-1}$ violates the independence assumption because it does not serve as a random exogenous shock. The independence assumption can be satisfied only if the unobserved confounders have no dynamics, which is almost never the case. This implies that the lagged IV is almost unavoidably problematic.
    2. The exclusion restriction requires that $Y_{it}(d, \tilde{d})$ be a function of $d$ only; in other words, the lagged IV influences the explained variable only through the explanatory variable. This is denoted as
$$ Y_{it}(d, 0) = Y_{it}(d, 1), \quad d = 0, 1 \tag{3.6} $$
    In Scenario 2, since $\xi \neq 0$, $X_{i,t-1}$ has a direct causal influence on $Y_{it}$ with marginal effect $\xi$. In Scenario 3, although $\xi = 0$, since $\psi \neq 0$, $X_{i,t-1}$ has a causal influence on $Y_{it}$ through $U_{it}$ with marginal effect $\delta\psi$, derived from (3.1) and (3.3). As a result, both Scenarios 2 and 3 violate not only the independence assumption, but also the exclusion restriction. The same is true for Scenario 4, which combines Scenarios 2 and 3. Moreover, because, as in Scenario 3, $X_{i,t-1}$ has a direct impact on $U_{it}$, which could include more than one unobserved covariate, $X_{i,t-1}$ could have more than one causal path to $Y_{it}$. Accordingly, it is difficult to rule out Scenario 3, so a violation of the exclusion restriction is almost inevitable.
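The independence violation in Scenario 1 can be illustrated numerically. The sketch below assumes standard normal shocks and the illustrative values $\rho = \kappa = 0.5$ (the function name is hypothetical). It simulates (3.16)-(3.17) and reports the pooled sample correlation between the lagged IV $X_{t-1}$ and the current confounder $U_t$. With $\phi > 0$ the correlation is clearly positive. With $\phi = 0$, that is, with no dynamics in the confounder, it is close to zero, in line with the discussion of the independence assumption above.

```python
import numpy as np


def lagged_iv_confounder_corr(phi, rho=0.5, kappa=0.5, n_units=200,
                              n_periods=100, burn_in=100, seed=1):
    """Pooled sample correlation of X_{t-1} and U_t in Scenario 1 (xi = psi = 0)."""
    rng = np.random.default_rng(seed)
    total = burn_in + n_periods + 1
    x = np.zeros((n_units, total))
    u = np.zeros((n_units, total))
    for t in range(1, total):
        u[:, t] = phi * u[:, t - 1] + rng.standard_normal(n_units)                    # (3.17)
        x[:, t] = rho * x[:, t - 1] + kappa * u[:, t] + rng.standard_normal(n_units)  # (3.16)
    x_lag = x[:, burn_in:-1].ravel()      # the lagged IV, X_{t-1}
    u_now = u[:, burn_in + 1:].ravel()    # the current confounder, U_t
    return np.corrcoef(x_lag, u_now)[0, 1]


print(lagged_iv_confounder_corr(phi=0.5), lagged_iv_confounder_corr(phi=0.0))
```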
 
B. The LATE in Lagged IV and The ATE in OLS 
    To compare the endogeneity in lagged IV and that in OLS, we first discuss the average treatment effect (ATE) in OLS. Writing $Y_{1it}$ and $Y_{0it}$ for individual $i$’s latent outcomes with and without treatment, we have
$$ \begin{aligned} \mathbb{E}[Y_{it} \mid X_{it}=1] - \mathbb{E}[Y_{it} \mid X_{it}=0] &= \mathbb{E}[Y_{1it} \mid X_{it}=1] - \mathbb{E}[Y_{0it} \mid X_{it}=1] + \mathbb{E}[Y_{0it} \mid X_{it}=1] - \mathbb{E}[Y_{0it} \mid X_{it}=0] \\ &= \mathbb{E}[Y_{1it} - Y_{0it} \mid X_{it}=1] + \mathbb{E}[Y_{0it} \mid X_{it}=1] - \mathbb{E}[Y_{0it} \mid X_{it}=0] \end{aligned} \tag{3.7} $$
where $\mathbb{E}[Y_{1it} - Y_{0it} \mid X_{it}=1]$ is the average treatment effect on the treated (ATT), the causal effect that we are interested in, and $\mathbb{E}[Y_{0it} \mid X_{it}=1] - \mathbb{E}[Y_{0it} \mid X_{it}=0]$ is the selection bias, the source of the endogeneity from which the OLS estimate suffers.
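The decomposition in (3.7) is an identity that also holds exactly for sample means. The sketch below checks it on hypothetical potential-outcome data. In these data a confounder `u` raises both the untreated outcome and the probability of treatment, so the selection bias is positive by construction. All data-generating choices are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
u = rng.standard_normal(n)                        # hypothetical confounder
y0 = u + rng.standard_normal(n)                   # latent outcome without treatment
y1 = y0 + 1.0 + 0.5 * u                           # latent outcome with treatment
x = (u + rng.standard_normal(n) > 0).astype(int)  # treatment more likely when u is high
y = np.where(x == 1, y1, y0)                      # observed outcome

treated, control = x == 1, x == 0
naive = y[treated].mean() - y[control].mean()             # left-hand side of (3.7)
att = (y1[treated] - y0[treated]).mean()                  # ATT
selection_bias = y0[treated].mean() - y0[control].mean()  # selection bias
print(f"naive={naive:.4f}  ATT={att:.4f}  selection bias={selection_bias:.4f}")
```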
    In lagged IV estimation, the local average treatment effect (LATE), following Angrist and Pischke (2009), is
$$ \frac{\mathbb{E}[Y_{it} \mid X_{i,t-1}=1] - \mathbb{E}[Y_{it} \mid X_{i,t-1}=0]}{\mathbb{E}[X_{it} \mid X_{i,t-1}=1] - \mathbb{E}[X_{it} \mid X_{i,t-1}=0]} = \mathbb{E}[Y_{1it} - Y_{0it} \mid X_{1it} > X_{0it}] $$
when both the exclusion restriction and the independence assumption are satisfied. Here the LATE is the causal effect that we are interested in.
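The sample analogue of the ratio above is the Wald estimator. For a binary instrument it coincides numerically with the just-identified IV estimate $\mathrm{Cov}(Z, Y)/\mathrm{Cov}(Z, X)$. The reason is that the numerator and the denominator of the IV ratio equal the corresponding differences in group means multiplied by the same factor $n_1 n_0 / n$, under the same degrees-of-freedom correction. In the lagged IV case, $Z$ is $X_{i,t-1}$. Below is a minimal NumPy check on simulated data, where the instrument `z` and the data-generating choices are hypothetical.

```python
import numpy as np


def wald_estimate(y, x, z):
    """Reduced-form mean difference divided by first-stage mean difference."""
    return (y[z == 1].mean() - y[z == 0].mean()) / (x[z == 1].mean() - x[z == 0].mean())


def iv_estimate(y, x, z):
    """Just-identified IV slope Cov(z, y) / Cov(z, x)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]


rng = np.random.default_rng(3)
n = 5_000
z = rng.integers(0, 2, n)                                   # hypothetical binary instrument
x = (0.8 * z + rng.standard_normal(n) > 0.4).astype(float)  # binary treatment
y = 2.0 * x + rng.standard_normal(n)                        # outcome
print(wald_estimate(y, x, z), iv_estimate(y, x, z))
```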
    1. The LATE in Scenario 1: In Scenario 1, where only the independence assumption is violated, we have
$$ \mathbb{E}[Y_{it} \mid X_{i,t-1}=1] = \mathbb{E}\left[ Y_{it}(0, X_{i,t-1}) + \left( Y_{it}(1, X_{i,t-1}) - Y_{it}(0, X_{i,t-1}) \right) X_{it} \mid X_{i,t-1}=1 \right] \tag{3.8} $$
Because the exclusion restriction is satisfied, we have $Y_{it}(0, X_{i,t-1}) = Y_{0it}$ and $Y_{it}(1, X_{i,t-1}) = Y_{1it}$.
    Therefore,
$$ \mathbb{E}[Y_{it} \mid X_{i,t-1}=1] = \mathbb{E}[Y_{0it} + (Y_{1it} - Y_{0it}) X_{1it} \mid X_{i,t-1}=1] = \mathbb{E}[Y_{0it} \mid X_{i,t-1}=1] + \mathbb{E}[(Y_{1it} - Y_{0it}) X_{1it} \mid X_{i,t-1}=1] \tag{3.9} $$
    Similarly, we have
$$ \begin{aligned} \mathbb{E}[Y_{it} \mid X_{i,t-1}=0] &= \mathbb{E}\left[ Y_{it}(0, X_{i,t-1}) + \left( Y_{it}(1, X_{i,t-1}) - Y_{it}(0, X_{i,t-1}) \right) X_{it} \mid X_{i,t-1}=0 \right] \\ &= \mathbb{E}[Y_{0it} + (Y_{1it} - Y_{0it}) X_{0it} \mid X_{i,t-1}=0] \\ &= \mathbb{E}[Y_{0it} \mid X_{i,t-1}=0] + \mathbb{E}[(Y_{1it} - Y_{0it}) X_{0it} \mid X_{i,t-1}=0] \end{aligned} \tag{3.10} $$
    As the exclusion restriction is satisfied, we have
$$ \mathbb{E}[Y_{0it} \mid X_{i,t-1}=1] = \mathbb{E}[Y_{0it} \mid X_{i,t-1}=0] \tag{3.11} $$
    Therefore, the LATE becomes
$$ \frac{\mathbb{E}[Y_{it} \mid X_{i,t-1}=1] - \mathbb{E}[Y_{it} \mid X_{i,t-1}=0]}{\mathbb{E}[X_{it} \mid X_{i,t-1}=1] - \mathbb{E}[X_{it} \mid X_{i,t-1}=0]} = \frac{\mathbb{E}[(Y_{1it} - Y_{0it}) X_{1it} \mid X_{i,t-1}=1] - \mathbb{E}[(Y_{1it} - Y_{0it}) X_{0it} \mid X_{i,t-1}=0]}{\mathbb{E}[X_{1it} \mid X_{i,t-1}=1] - \mathbb{E}[X_{0it} \mid X_{i,t-1}=0]} \tag{3.12} $$
which we call the “restricted local ATT”. As a result, in Scenario 1, the bias of the lagged IV estimates is due to the “restricted local ATT” in the lagged IV.
    Compared with the ATE in the OLS estimation, the LATE in Scenario 1 of the lagged IV estimation does not include a selection bias. This implies that the extent of endogeneity in Scenario 1 of the lagged IV estimation is smaller than that of the OLS estimation. However, the LATE in Scenario 1 of the lagged IV estimation still involves some endogeneity. This is because of (1) the lagged IV’s dependence on the latent treatment, scaled by $\rho$, the marginal causal effect of the lagged IV on the treatment variable, and (2) the lagged IV’s dependence on the latent outcome, scaled by $\phi/\kappa$, the synchronous relationship between the lagged IV and the unobserved confounder. Because the unobserved confounder’s marginal causal effect on the outcome variable is $\delta$, we can initially predict that the key parameters for the extent of endogeneity in Scenario 1 of the lagged IV estimation are $\rho$, $\phi$, $\kappa$ and $\delta$.
    2. The LATE in Scenarios 2, 3 and 4: In Scenarios 2, 3 and 4, both the exclusion restriction and the independence assumption are violated. To derive the LATE,
$$ \frac{\mathbb{E}[Y_{it} \mid X_{i,t-1}=1] - \mathbb{E}[Y_{it} \mid X_{i,t-1}=0]}{\mathbb{E}[X_{it} \mid X_{i,t-1}=1] - \mathbb{E}[X_{it} \mid X_{i,t-1}=0]}, $$
we have
$$ \begin{aligned} \mathbb{E}[Y_{it} \mid X_{i,t-1}=1] &= \mathbb{E}\left[ Y_{it}(0, X_{i,t-1}) + \left( Y_{it}(1, X_{i,t-1}) - Y_{it}(0, X_{i,t-1}) \right) X_{it} \mid X_{i,t-1}=1 \right] \\ &= \mathbb{E}[Y_{it}(0, X_{i,t-1}) \mid X_{i,t-1}=1] + \mathbb{E}\left[ \left( Y_{it}(1, X_{i,t-1}) - Y_{it}(0, X_{i,t-1}) \right) X_{it} \mid X_{i,t-1}=1 \right] \end{aligned} \tag{3.13} $$
and similarly,
$$ \begin{aligned} \mathbb{E}[Y_{it} \mid X_{i,t-1}=0] &= \mathbb{E}\left[ Y_{it}(0, X_{i,t-1}) + \left( Y_{it}(1, X_{i,t-1}) - Y_{it}(0, X_{i,t-1}) \right) X_{it} \mid X_{i,t-1}=0 \right] \\ &= \mathbb{E}[Y_{it}(0, X_{i,t-1}) \mid X_{i,t-1}=0] + \mathbb{E}\left[ \left( Y_{it}(1, X_{i,t-1}) - Y_{it}(0, X_{i,t-1}) \right) X_{it} \mid X_{i,t-1}=0 \right] \end{aligned} \tag{3.14} $$
    Therefore, the LATE becomes the sum of the “relaxed local ATT” in the lagged IV, that is,
$$ \frac{\mathbb{E}\left[ \left( Y_{it}(1, X_{i,t-1}) - Y_{it}(0, X_{i,t-1}) \right) X_{it} \mid X_{i,t-1}=1 \right] - \mathbb{E}\left[ \left( Y_{it}(1, X_{i,t-1}) - Y_{it}(0, X_{i,t-1}) \right) X_{it} \mid X_{i,t-1}=0 \right]}{\mathbb{E}[X_{it} \mid X_{i,t-1}=1] - \mathbb{E}[X_{it} \mid X_{i,t-1}=0]} $$
and the “local selection bias” in the lagged IV, that is,
$$ \frac{\mathbb{E}[Y_{it}(0, X_{i,t-1}) \mid X_{i,t-1}=1] - \mathbb{E}[Y_{it}(0, X_{i,t-1}) \mid X_{i,t-1}=0]}{\mathbb{E}[X_{it} \mid X_{i,t-1}=1] - \mathbb{E}[X_{it} \mid X_{i,t-1}=0]}. $$
    As a result, in Scenarios 2, 3 and 4, the inconsistency of the lagged IV estimates is due to the “local selection bias” and the “relaxed local ATT” in the lagged IV.
    Compared with the ATE in the OLS estimation, the LATEs in Scenarios 2, 3 and 4 of the lagged IV estimation include a “local selection bias”, which could be greater than the selection bias in the OLS estimation. Moreover, the LATEs in Scenarios 2, 3 and 4 also include the “relaxed local ATTs”, which differ from the “restricted local ATT” in Scenario 1. These imply that the extent of inconsistency of the estimates in Scenarios 2, 3 and 4 is greater than that in Scenario 1, and could be greater than that in OLS.
    To sum up, the OLS estimate suffers from endogeneity because its ATE contains a selection bias. When the lagged IV estimate violates only the independence assumption, it suffers from endogeneity because the “restricted local ATT” in its LATE differs from the ATT in the OLS estimate’s ATE. When the lagged IV estimate violates both the exclusion restriction and the independence assumption, it suffers from endogeneity for two reasons. On the one hand, its LATE contains a “local selection bias”; on the other hand, the “relaxed local ATT” in its LATE differs from the ATT in the OLS estimate’s ATE.
 
3.3. Numerical Analysis 
    The theoretical framework demonstrates why using the lagged explanatory variable as the IV in instrumental variable estimation is unlikely to mitigate the endogeneity problem. In this section, we characterize the LATE of lagged IV estimates quantitatively, and compare it with the ATE of OLS estimates. The results of the numerical analysis are consistent with the theoretical framework. For simplicity, we set up a bivariate regression and assume an $AR(1)$ data generating process both for the endogenous explanatory variable and for the unobserved confounder.
 
A. LATE and ATE 
    Scenario 1: We first quantitatively discuss the LATE in Scenario 1, which violates only the independence assumption but not the exclusion restriction. Following Bellemare et al. (2017), we consider the following model
$$ Y_{it} = \beta X_{it} + \delta U_{it} + \epsilon_{it} \tag{3.15} $$
$$ X_{it} = \rho X_{i,t-1} + \kappa U_{it} + \eta_{it} \tag{3.16} $$
$$ U_{it} = \phi U_{i,t-1} + \nu_{it} \tag{3.17} $$
where $i$ and $t$ are the unit and time indices, respectively, with $i = 1, 2, \ldots, N$ and $t = 1, 2, \ldots, T$. For simplicity, we drop $i$ for the remainder of this section. $Y_t$ is the explained variable, and $X_t$ represents the explanatory variable. Since $U_t$, the unobserved confounder, is omitted from the OLS estimation, the OLS estimate suffers from endogeneity. The $AR(1)$ process implies that $X_t$ is determined both by its lagged value and by the unobserved confounder, and that $U_t$ is determined by its first-order lagged value. For the coefficients we assume that $\rho, \phi \in (0,1)$; for the random errors we assume that $\eta_t \sim N(0, \sigma_\eta^2)$, $\epsilon_t \sim N(0, \sigma_\epsilon^2)$, and $\nu_t \sim N(0, \sigma_\nu^2)$.
    It is well known that without an unobserved confounder, OLS yields consistent estimates. However, given that the unobserved confounder exists and is omitted from the regression, OLS yields biased estimates, such that
$$ \hat{\beta}_{OLS} = \frac{\mathrm{Cov}(X_t, Y_t)}{\mathrm{Var}(X_t)} = \frac{\mathrm{Cov}(X_t, \beta X_t + \delta U_t + \epsilon_t)}{\mathrm{Var}(X_t)} = \beta + \delta \frac{\mathrm{Cov}(X_t, U_t)}{\mathrm{Var}(X_t)} \tag{3.18} $$
    Therefore, (3.18) implies that in Scenario 1 the OLS estimate is biased, and the bias, $\delta \, \mathrm{Cov}(X_t, U_t) / \mathrm{Var}(X_t)$, corresponds to the selection bias in the ATE.
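Equation (3.18) also holds exactly in any sample once the omitted confounder is known, because the sample covariance is linear in each argument. The OLS slope equals $\beta$ plus the confounder term plus a term in $\widehat{\mathrm{Cov}}(X_t, \epsilon_t)$, and that last term is zero only in expectation. The sketch below verifies this on one simulated series from (3.15)-(3.17). The parameter values, the series length and the unit-variance shocks are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 5_000
beta, delta, rho, kappa, phi = 2.0, 1.0, 0.5, 0.5, 0.5   # illustrative values
u, x = np.zeros(T), np.zeros(T)
for t in range(1, T):
    u[t] = phi * u[t - 1] + rng.standard_normal()                 # (3.17)
    x[t] = rho * x[t - 1] + kappa * u[t] + rng.standard_normal()  # (3.16)
eps = rng.standard_normal(T)
y = beta * x + delta * u + eps                                    # (3.15)

var_x = np.var(x, ddof=1)
b_ols = np.cov(x, y)[0, 1] / var_x                    # OLS slope with U omitted
confounder_term = delta * np.cov(x, u)[0, 1] / var_x  # bias term in (3.18)
noise_term = np.cov(x, eps)[0, 1] / var_x             # zero only in expectation
print(b_ols, beta + confounder_term + noise_term)
```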
    To discuss the consistency of the OLS estimate, we use equation (A.3) in the Online Appendix to derive
$$ \mathrm{Cov}(X_t, U_t) = \frac{\kappa \, \mathrm{Var}(U_t)}{1 - \phi\rho} \tag{3.19} $$
    Therefore, plugging equation (3.19) into (3.18), we have
$$ \hat{\beta}_{OLS} = \beta + \frac{\delta\kappa \, \mathrm{Var}(U_t)}{(1 - \phi\rho)\mathrm{Var}(X_t)} = \beta + \frac{\delta\kappa \sum_{t=1}^{T} U_t^2}{(1 - \phi\rho)\sum_{t=1}^{T} X_t^2} \tag{3.20} $$
    Using the Slutsky theorem, (3.20) becomes
$$ \operatorname{plim} \hat{\beta}_{OLS} = \beta + \frac{\delta\kappa \left[ \operatorname{plim} \frac{1}{T} \sum_{t=1}^{T} U_t^2 \right]}{(1 - \phi\rho) \left[ \operatorname{plim} \frac{1}{T} \sum_{t=1}^{T} X_t^2 \right]} \tag{3.21} $$
where $\beta$ corresponds to the ATT, and the second term corresponds to the selection bias in the ATE. Because $\phi \in (0,1)$, (3.16) and (3.17) imply that as $T \to \infty$, $\operatorname{plim} \frac{1}{T}\sum_{t=1}^{T} U_t^2 \ll \operatorname{plim} \frac{1}{T}\sum_{t=1}^{T} X_t^2$. As a result, $\operatorname{plim}_{T \to \infty} \hat{\beta}_{OLS} \to \beta$; in other words, the OLS estimate in Scenario 1 is consistent.
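The covariance result (3.19) can be checked without simulation by computing the population moments of the $(X_t, U_t)$ system directly. The sketch below is an illustration, not the derivation in the Online Appendix. It rewrites (3.2)-(3.3) as a VAR(1) and solves the discrete Lyapunov equation with NumPy. The helper name `stationary_moments`, the unit shock variances and the parameter values are assumptions.

```python
import numpy as np


def stationary_moments(rho, kappa, phi, psi=0.0, sigma_eta=1.0, sigma_nu=1.0):
    """Population covariances of s_t = (X_t, U_t) under (3.2)-(3.3).

    Substituting (3.3) into (3.2) gives the VAR(1) s_t = A s_{t-1} + e_t.
    The stationary covariance G0 solves G0 = A G0 A' + S, and the first
    autocovariance is G1 = Cov(s_t, s_{t-1}) = A G0.
    """
    A = np.array([[rho + kappa * psi, kappa * phi],
                  [psi, phi]])
    # Covariance of e_t = (kappa * nu_t + eta_t, nu_t)
    S = np.array([[kappa**2 * sigma_nu**2 + sigma_eta**2, kappa * sigma_nu**2],
                  [kappa * sigma_nu**2, sigma_nu**2]])
    if np.max(np.abs(np.linalg.eigvals(A))) >= 1:
        raise ValueError("parameters imply a non-stationary process")
    # Row-major vec: vec(A G A') = kron(A, A) vec(G)
    g0 = np.linalg.solve(np.eye(4) - np.kron(A, A), S.ravel()).reshape(2, 2)
    return g0, A @ g0


rho, kappa, phi = 0.5, 0.5, 0.5             # Scenario 1 values (psi = 0)
g0, g1 = stationary_moments(rho, kappa, phi)
var_x, var_u, cov_xu = g0[0, 0], g0[1, 1], g0[0, 1]
rhs_319 = kappa * var_u / (1 - phi * rho)   # right-hand side of (3.19)
print(cov_xu, rhs_319)
```

Setting `psi` to a non-zero value gives the moments of Scenarios 3 and 4 with the same helper. The function raises an error if the parameters imply a non-stationary process.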
    Now consider an IV estimation using $X_{t-1}$ as the instrumental variable for $X_t$. The IV estimate is
$$ \hat{\beta}_{IV,1} = \frac{\mathrm{Cov}(X_{t-1}, Y_t)}{\mathrm{Cov}(X_{t-1}, X_t)} \tag{3.22} $$
    Plugging equation (3.15) into (3.22), we have
$$ \hat{\beta}_{IV,1} = \frac{\mathrm{Cov}(X_{t-1}, \beta X_t + \delta U_t + \epsilon_t)}{\mathrm{Cov}(X_{t-1}, X_t)} \tag{3.23} $$
and then
$$ \hat{\beta}_{IV,1} = \beta + \delta \frac{\mathrm{Cov}(X_{t-1}, U_t)}{\mathrm{Cov}(X_{t-1}, X_t)} = \beta + \delta \frac{ \mathrm{Cov}(X_{t-1}, U_t) / \mathrm{Var}(X_{t-1}) }{ \rho + \kappa \, \mathrm{Cov}(X_{t-1}, U_t) / \mathrm{Var}(X_{t-1}) } \tag{3.24} $$
    Therefore, (3.24) implies that in Scenario 1 the lagged IV estimate is biased. The bias,
$$ \delta \frac{ \mathrm{Cov}(X_{t-1}, U_t) / \mathrm{Var}(X_{t-1}) }{ \rho + \kappa \, \mathrm{Cov}(X_{t-1}, U_t) / \mathrm{Var}(X_{t-1}) }, $$
corresponds to the “restricted local ATT” in the LATE, and $\kappa$ is the key parameter determining the extent to which the lagged IV estimate is biased. This is because $\kappa$ measures the extent to which the lagged IV violates the independence assumption.
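    To discuss the consistency of the lagged IV estimate in Scenario 1, we again use equation (A.3) in the Online Appendix to derive
$$ \frac{\mathrm{Cov}(X_{t-1}, U_t)}{\mathrm{Var}(X_{t-1})} = \frac{\mathrm{Cov}(X_{t-1}, \phi U_{t-1} + \nu_t)}{\mathrm{Var}(X_{t-1})} = \frac{\phi \, \mathrm{Cov}(X_{t-1}, U_{t-1})}{\mathrm{Var}(X_{t-1})} = \frac{\phi\kappa \, \mathrm{Var}(U_t)}{(1 - \phi\rho)\mathrm{Var}(X_t)} \tag{3.25} $$
    Therefore, we have
$$ \hat{\beta}_{IV,1} = \beta + \frac{\delta\phi\kappa \, \mathrm{Var}(U_t)}{\rho(1 - \phi\rho)\mathrm{Var}(X_t) + \phi\kappa^2 \mathrm{Var}(U_t)} = \beta + \frac{\delta\kappa \, \mathrm{Var}(U_t)}{\frac{\rho}{\phi}(1 - \phi\rho)\mathrm{Var}(X_t) + \kappa^2 \mathrm{Var}(U_t)} \tag{3.26} $$
    Using the Slutsky theorem, (3.26) becomes
$$ \operatorname{plim} \hat{\beta}_{IV,1} = \beta + \frac{\delta\kappa \left[ \operatorname{plim} \frac{1}{T}\sum_{t=1}^{T} U_t^2 \right]}{\frac{\rho}{\phi}(1 - \phi\rho)\left[ \operatorname{plim} \frac{1}{T}\sum_{t=1}^{T} X_t^2 \right] + \kappa^2 \left[ \operatorname{plim} \frac{1}{T}\sum_{t=1}^{T} U_t^2 \right]} \tag{3.27} $$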
where the right-hand side of (3.27) corresponds to the “restricted local ATT” in the LATE. Because $\phi \in (0,1)$, (3.16) and (3.17) imply that as $T \to \infty$, $\operatorname{plim} \frac{1}{T}\sum_{t=1}^{T} U_t^2 \ll \operatorname{plim} \frac{1}{T}\sum_{t=1}^{T} X_t^2$. As a result, $\operatorname{plim}_{T \to \infty} \hat{\beta}_{IV,1} \to \beta$; in other words, the lagged IV estimate in Scenario 1 is consistent.
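The step from (3.24) to (3.26) substitutes (3.25) for the ratio $\mathrm{Cov}(X_{t-1}, U_t)/\mathrm{Var}(X_{t-1})$. The following plain-Python sketch checks only this algebra, for a few arbitrary parameter values chosen for illustration. The function names are hypothetical, and the check says nothing about the size of the moments involved.

```python
def iv_bias_324(delta, rho, kappa, r):
    """Bias term of (3.24), where r = Cov(X_{t-1}, U_t) / Var(X_{t-1})."""
    return delta * r / (rho + kappa * r)


def r_325(rho, kappa, phi, var_u, var_x):
    """Right-hand side of (3.25)."""
    return phi * kappa * var_u / ((1 - phi * rho) * var_x)


def iv_bias_326(delta, rho, kappa, phi, var_u, var_x):
    """Bias term of the second expression in (3.26)."""
    return delta * kappa * var_u / ((rho / phi) * (1 - phi * rho) * var_x + kappa**2 * var_u)


for delta, rho, kappa, phi, var_u, var_x in [(1.0, 0.5, 0.5, 0.5, 4 / 3, 2.0),
                                             (2.0, 0.9, 2.0, 0.3, 0.7, 5.0)]:
    r = r_325(rho, kappa, phi, var_u, var_x)
    print(iv_bias_324(delta, rho, kappa, r),
          iv_bias_326(delta, rho, kappa, phi, var_u, var_x))
```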
    Scenario 2: We then discuss the ATE and the LATE in Scenario 2, which violates not only the independence assumption but also the exclusion restriction. We consider the following model
$$ Y_{it} = \beta X_{it} + \xi X_{i,t-1} + \delta U_{it} + \epsilon_{it} \tag{3.28} $$
$$ X_{it} = \rho X_{i,t-1} + \kappa U_{it} + \eta_{it} \tag{3.29} $$
$$ U_{it} = \phi U_{i,t-1} + \nu_{it} \tag{3.30} $$
For simplicity, we again drop $i$ for the remainder of this section; all other assumptions are the same as in Scenario 1 above.
    Consider the OLS estimate in Scenario 2, such that
$$ \hat{\beta}_{OLS} = \frac{\mathrm{Cov}(X_t, Y_t)}{\mathrm{Var}(X_t)} = \frac{\mathrm{Cov}(X_t, \beta X_t + \xi X_{t-1} + \delta U_t + \epsilon_t)}{\mathrm{Var}(X_t)} = \beta + \delta \frac{\mathrm{Cov}(X_t, U_t)}{\mathrm{Var}(X_t)} + \xi \frac{\mathrm{Cov}(X_t, X_{t-1})}{\mathrm{Var}(X_t)} \tag{3.31} $$
    Therefore, (3.31) implies that in Scenario 2 the OLS estimate is biased, and the bias, $\delta \, \mathrm{Cov}(X_t, U_t)/\mathrm{Var}(X_t) + \xi \, \mathrm{Cov}(X_t, X_{t-1})/\mathrm{Var}(X_t)$, corresponds to the selection bias in the ATE.
    To discuss the consistency of the OLS estimate, we use equation (A.3) in the Online Appendix to derive
$$ \mathrm{Cov}(X_t, U_t) = \frac{\kappa \, \mathrm{Var}(U_t)}{1 - \phi\rho} \tag{3.32} $$
and
$$ \frac{\mathrm{Cov}(X_t, X_{t-1})}{\mathrm{Var}(X_t)} = \rho + \frac{\phi\kappa^2 \mathrm{Var}(U_t)}{(1 - \phi\rho)\mathrm{Var}(X_t)} \tag{3.33} $$
    Therefore, we have
$$ \hat{\beta}_{OLS} = \beta + \frac{\delta\kappa \, \mathrm{Var}(U_t)}{(1 - \phi\rho)\mathrm{Var}(X_t)} + \xi\rho + \frac{\phi\xi\kappa^2 \mathrm{Var}(U_t)}{(1 - \phi\rho)\mathrm{Var}(X_t)} = \beta + \xi\rho + \frac{\delta\kappa \sum_{t=1}^{T} U_t^2}{(1 - \phi\rho)\sum_{t=1}^{T} X_t^2} + \frac{\phi\xi\kappa^2 \sum_{t=1}^{T} U_t^2}{(1 - \phi\rho)\sum_{t=1}^{T} X_t^2} \tag{3.34} $$
    Using the Slutsky theorem, (3.34) becomes
$$ \operatorname{plim} \hat{\beta}_{OLS} = \beta + \xi\rho + \frac{(\delta\kappa + \phi\xi\kappa^2)\left[ \operatorname{plim} \frac{1}{T}\sum_{t=1}^{T} U_t^2 \right]}{(1 - \phi\rho)\left[ \operatorname{plim} \frac{1}{T}\sum_{t=1}^{T} X_t^2 \right]} \tag{3.35} $$
where $\beta$ corresponds to the ATT, and the remaining terms correspond to the selection bias in the ATE. Because $\phi \in (0,1)$, (3.29) and (3.30) imply that as $T \to \infty$, $\operatorname{plim} \frac{1}{T}\sum_{t=1}^{T} U_t^2 \ll \operatorname{plim} \frac{1}{T}\sum_{t=1}^{T} X_t^2$. As a result, $\operatorname{plim}_{T \to \infty} \hat{\beta}_{OLS} \to \beta + \xi\rho$; in other words, the OLS estimate in Scenario 2 is inconsistent.
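Because the processes for $X_t$ and $U_t$ are the same in Scenarios 1 and 2, results (3.25) and (3.33) can be checked by computing exact population moments. Writing (3.29)-(3.30) as a VAR(1) and solving the discrete Lyapunov equation gives the stationary covariances. The helper name `stationary_moments`, the unit shock variances and the parameter values below are illustrative assumptions.

```python
import numpy as np


def stationary_moments(rho, kappa, phi, psi=0.0, sigma_eta=1.0, sigma_nu=1.0):
    """Exact Var(s_t) and Cov(s_t, s_{t-1}) for s_t = (X_t, U_t)."""
    # VAR(1) form of the X and U equations: s_t = A s_{t-1} + e_t
    A = np.array([[rho + kappa * psi, kappa * phi],
                  [psi, phi]])
    S = np.array([[kappa**2 * sigma_nu**2 + sigma_eta**2, kappa * sigma_nu**2],
                  [kappa * sigma_nu**2, sigma_nu**2]])
    if np.max(np.abs(np.linalg.eigvals(A))) >= 1:
        raise ValueError("parameters imply a non-stationary process")
    g0 = np.linalg.solve(np.eye(4) - np.kron(A, A), S.ravel()).reshape(2, 2)
    return g0, A @ g0


rho, kappa, phi = 0.7, 2.0, 0.3             # illustrative values, psi = 0
g0, g1 = stationary_moments(rho, kappa, phi)
var_x, var_u = g0[0, 0], g0[1, 1]

# (3.25): Cov(X_{t-1}, U_t) / Var(X_{t-1})
lhs_325 = g1[1, 0] / var_x
rhs_325 = phi * kappa * var_u / ((1 - phi * rho) * var_x)

# (3.33): Cov(X_t, X_{t-1}) / Var(X_t)
lhs_333 = g1[0, 0] / var_x
rhs_333 = rho + phi * kappa**2 * var_u / ((1 - phi * rho) * var_x)
print(lhs_325, rhs_325, lhs_333, rhs_333)
```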
    Consider an IV estimation using $X_{t-1}$ as the instrumental variable for $X_t$. The IV estimate is
$$ \hat{\beta}_{IV,2} = \frac{\mathrm{Cov}(X_{t-1}, Y_t)}{\mathrm{Cov}(X_{t-1}, X_t)} \tag{3.36} $$
    Plugging equation (3.28) into (3.36), we have
$$ \hat{\beta}_{IV,2} = \frac{\mathrm{Cov}(X_{t-1}, \beta X_t + \xi X_{t-1} + \delta U_t + \epsilon_t)}{\mathrm{Cov}(X_{t-1}, X_t)} \tag{3.37} $$
and then
$$ \hat{\beta}_{IV,2} = \beta + \xi \frac{\mathrm{Var}(X_{t-1})}{\mathrm{Cov}(X_{t-1}, X_t)} + \delta \frac{\mathrm{Cov}(X_{t-1}, U_t)}{\mathrm{Cov}(X_{t-1}, X_t)} = \beta + \xi \frac{1}{\rho + \kappa \frac{\mathrm{Cov}(X_{t-1}, U_t)}{\mathrm{Var}(X_{t-1})}} + \delta \frac{\frac{\mathrm{Cov}(X_{t-1}, U_t)}{\mathrm{Var}(X_{t-1})}}{\rho + \kappa \frac{\mathrm{Cov}(X_{t-1}, U_t)}{\mathrm{Var}(X_{t-1})}} \tag{3.38} $$
    Therefore, (3.38) implies that in Scenario 2 the lagged IV estimate is biased. The term $\xi / \left( \rho + \kappa \, \mathrm{Cov}(X_{t-1}, U_t)/\mathrm{Var}(X_{t-1}) \right)$ corresponds to the “local selection bias”, and $\beta + \delta \, \frac{\mathrm{Cov}(X_{t-1}, U_t)/\mathrm{Var}(X_{t-1})}{\rho + \kappa \, \mathrm{Cov}(X_{t-1}, U_t)/\mathrm{Var}(X_{t-1})}$ corresponds to the “relaxed local ATT” in the LATE in Scenario 2. Not only $\kappa$, but also $\xi$, are the key parameters determining the extent to which the lagged IV estimate is biased, because $\xi$ measures the extent to which the lagged IV violates the exclusion restriction.
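    We now discuss the consistency of the lagged IV estimate in Scenario 2. From the Online Appendix, we know that
$$ \operatorname{plim}_{T \to \infty} \frac{\mathrm{Cov}(X_{t-1}, U_t)}{\mathrm{Var}(X_{t-1})} = \frac{\phi\kappa \, \mathrm{Var}(U_t)}{(1 - \phi\rho)\mathrm{Var}(X_t)} \tag{3.39} $$
    Therefore, we have
$$ \hat{\beta}_{IV,2} = \beta + \frac{\xi(1 - \phi\rho)\mathrm{Var}(X_t) + \delta\phi\kappa \, \mathrm{Var}(U_t)}{\rho(1 - \phi\rho)\mathrm{Var}(X_t) + \phi\kappa^2 \mathrm{Var}(U_t)} = \beta + \frac{\xi\left(\frac{1}{\phi} - \rho\right)\mathrm{Var}(X_t) + \delta\kappa \, \mathrm{Var}(U_t)}{\rho\left(\frac{1}{\phi} - \rho\right)\mathrm{Var}(X_t) + \kappa^2 \mathrm{Var}(U_t)} \tag{3.40} $$
    Using the Slutsky theorem, we have
$$ \operatorname{plim} \hat{\beta}_{IV,2} = \beta + \frac{\xi\left(\frac{1}{\phi} - \rho\right)\left[ \operatorname{plim} \frac{1}{T}\sum_{t=1}^{T} X_t^2 \right] + \delta\kappa\left[ \operatorname{plim} \frac{1}{T}\sum_{t=1}^{T} U_t^2 \right]}{\rho\left(\frac{1}{\phi} - \rho\right)\left[ \operatorname{plim} \frac{1}{T}\sum_{t=1}^{T} X_t^2 \right] + \kappa^2\left[ \operatorname{plim} \frac{1}{T}\sum_{t=1}^{T} U_t^2 \right]} \tag{3.41} $$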
where, writing $D$ for the denominator of (3.41), the term $\xi\left(\frac{1}{\phi} - \rho\right)\left[ \operatorname{plim} \frac{1}{T}\sum_{t=1}^{T} X_t^2 \right] / D$ corresponds to the “local selection bias”, and $\beta + \delta\kappa\left[ \operatorname{plim} \frac{1}{T}\sum_{t=1}^{T} U_t^2 \right] / D$ corresponds to the “relaxed local ATT” in the LATE in Scenario 2.
    Because $\phi \in (0,1)$, (3.29) and (3.30) imply that as $T \to \infty$, $\operatorname{plim} \frac{1}{T}\sum_{t=1}^{T} U_t^2 \ll \operatorname{plim} \frac{1}{T}\sum_{t=1}^{T} X_t^2$. As a result, $\operatorname{plim}_{T \to \infty} \hat{\beta}_{IV,2} \to \beta + \xi/\rho$; in other words, the lagged IV estimate in Scenario 2 is inconsistent. We also derived above that in Scenario 2, $\operatorname{plim}_{T \to \infty} \hat{\beta}_{OLS} \to \beta + \xi\rho$; in other words, the OLS estimate in Scenario 2 is also inconsistent. Since $\rho \in (0,1)$ implies $1/\rho > 1 > \rho$, we have $|\xi|/\rho > |\xi|\rho$ for any $\xi \neq 0$, so the lagged IV estimate has a larger extent of inconsistency than the OLS estimate.
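The comparison $|\xi|/\rho > |\xi|\rho$ can be seen in a small Monte Carlo sketch of Scenario 2. Setting $\kappa = 0$ switches off the confounder channel, so that $\mathrm{Cov}(X_t, U_t) = 0$ and the limits in (3.35) and (3.41) reduce exactly to $\beta + \xi\rho$ and $\beta + \xi/\rho$. The remaining bias then comes only from the violation of the exclusion restriction. The parameter values ($\beta = 2$, $\xi = 1$, $\rho = 0.5$), the unit-variance shocks and the function name are illustrative assumptions, not the design of Section 3.4.

```python
import numpy as np


def ols_and_lagged_iv(beta, xi, delta, rho, kappa, phi, n_units=200,
                      n_periods=100, burn_in=100, seed=5):
    """Pooled OLS and lagged-IV slopes for data drawn from (3.28)-(3.30)."""
    rng = np.random.default_rng(seed)
    total = burn_in + n_periods + 1
    x = np.zeros((n_units, total))
    u = np.zeros((n_units, total))
    for t in range(1, total):
        u[:, t] = phi * u[:, t - 1] + rng.standard_normal(n_units)                    # (3.30)
        x[:, t] = rho * x[:, t - 1] + kappa * u[:, t] + rng.standard_normal(n_units)  # (3.29)
    x_now = x[:, burn_in + 1:].ravel()
    x_lag = x[:, burn_in:-1].ravel()
    u_now = u[:, burn_in + 1:].ravel()
    y = beta * x_now + xi * x_lag + delta * u_now + rng.standard_normal(x_now.size)  # (3.28)
    b_ols = np.cov(x_now, y)[0, 1] / np.var(x_now, ddof=1)       # (3.31)
    b_iv = np.cov(x_lag, y)[0, 1] / np.cov(x_lag, x_now)[0, 1]   # (3.36)
    return b_ols, b_iv


b_ols, b_iv = ols_and_lagged_iv(beta=2.0, xi=1.0, delta=1.0, rho=0.5, kappa=0.0, phi=0.5)
print(b_ols, b_iv)
```

With these values the limits are 2.5 for OLS and 4 for the lagged IV, so the lagged IV estimate should lie noticeably farther from $\beta = 2$.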
    Scenario 3: We then discuss the ATE and the LATE in Scenario 3, which violates both the independence assumption and the exclusion restriction. We consider the following model
$$ Y_{it} = \beta X_{it} + \delta U_{it} + \epsilon_{it} \tag{3.42} $$
$$ X_{it} = \rho X_{i,t-1} + \kappa U_{it} + \eta_{it} \tag{3.43} $$
$$ U_{it} = \phi U_{i,t-1} + \psi X_{i,t-1} + \nu_{it} \tag{3.44} $$
For simplicity, we again drop $i$ for the remainder of this section; all other assumptions are the same as in Scenario 1 above.
    Consider the OLS estimate in Scenario 3, such that
$$ \hat{\beta}_{OLS} = \frac{\mathrm{Cov}(X_t, Y_t)}{\mathrm{Var}(X_t)} = \frac{\mathrm{Cov}(X_t, \beta X_t + \delta U_t + \epsilon_t)}{\mathrm{Var}(X_t)} = \beta + \delta \frac{\mathrm{Cov}(X_t, U_t)}{\mathrm{Var}(X_t)} \tag{3.45} $$
    Therefore, (3.45) implies that in Scenario 3 the OLS estimate is biased, and the bias, $\delta \, \mathrm{Cov}(X_t, U_t)/\mathrm{Var}(X_t)$, corresponds to the selection bias in the ATE.
    To discuss the consistency of the OLS estimate, we use equation (A.11) in the Online Appendix. Using the Slutsky theorem, we then have
$$ \operatorname{plim} \hat{\beta}_{OLS} = \beta + \frac{\delta\psi\rho}{1 - \phi\rho} + \frac{\delta\kappa\left[ \operatorname{plim} \frac{1}{T}\sum_{t=1}^{T} U_t^2 \right]}{(1 - \phi\rho)\left[ \operatorname{plim} \frac{1}{T}\sum_{t=1}^{T} X_t^2 \right]} \tag{3.46} $$
where $\beta$ corresponds to the ATT, and the remaining terms correspond to the selection bias in the ATE. Therefore, in Scenario 3, the OLS estimate is inconsistent.
    Consider an IV estimation using $X_{t-1}$ as the instrumental variable for $X_t$. The lagged IV estimate is
$$ \hat{\beta}_{IV,3} = \frac{\mathrm{Cov}(X_{t-1}, Y_t)}{\mathrm{Cov}(X_{t-1}, X_t)} \tag{3.47} $$
    Plugging equation (3.42) into (3.47), we have
$$ \hat{\beta}_{IV,3} = \frac{\mathrm{Cov}(X_{t-1}, \beta X_t + \delta U_t + \epsilon_t)}{\mathrm{Cov}(X_{t-1}, X_t)} \tag{3.48} $$
and then
$$ \hat{\beta}_{IV,3} = \beta + \delta \frac{\mathrm{Cov}(X_{t-1}, U_t)}{\mathrm{Cov}(X_{t-1}, X_t)} = \beta + \delta \frac{ \mathrm{Cov}(X_{t-1}, U_t) / \mathrm{Var}(X_{t-1}) }{ \rho + \kappa \, \mathrm{Cov}(X_{t-1}, U_t) / \mathrm{Var}(X_{t-1}) } \tag{3.49} $$
    Therefore, (3.49) implies that in Scenario 3 the lagged IV estimate is biased, and $\beta + \delta \frac{\mathrm{Cov}(X_{t-1}, U_t)/\mathrm{Var}(X_{t-1})}{\rho + \kappa \, \mathrm{Cov}(X_{t-1}, U_t)/\mathrm{Var}(X_{t-1})}$ corresponds to the “local selection bias” and the “relaxed local ATT” in the LATE in Scenario 3. Not only $\kappa$, but also $\psi$, are the key parameters determining the extent to which the lagged IV estimate is biased, because $\psi$ measures the extent to which the lagged IV violates the exclusion restriction.
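    We now discuss the consistency of the lagged IV estimate in Scenario 3. From the Online Appendix, we know that
$$ \operatorname{plim}_{T \to \infty} \frac{\mathrm{Cov}(X_{t-1}, U_t)}{\mathrm{Var}(X_{t-1})} = \frac{\phi\kappa \, \mathrm{Var}(U_t)}{(1 - \phi\rho)\mathrm{Var}(X_t)} + \frac{\psi}{1 - \phi\rho} $$
    Therefore, we have
$$ \operatorname{plim} \hat{\beta}_{IV,3} = \beta + \frac{\delta\kappa \, \mathrm{Var}(U_t) + \frac{\delta\psi}{\phi}\mathrm{Var}(X_t)}{\left[ \frac{\rho}{\phi}(1 - \phi\rho) + \frac{\psi\kappa}{\phi} \right]\mathrm{Var}(X_t) + \kappa^2 \mathrm{Var}(U_t)} \tag{3.50} $$
    Using the Slutsky theorem, we have
$$ \operatorname{plim} \hat{\beta}_{IV,3} = \beta + \frac{\delta\kappa\left[ \operatorname{plim} \frac{1}{T}\sum_{t=1}^{T} U_t^2 \right] + \frac{\delta\psi}{\phi}\left[ \operatorname{plim} \frac{1}{T}\sum_{t=1}^{T} X_t^2 \right]}{\left[ \frac{\rho}{\phi}(1 - \phi\rho) + \frac{\psi\kappa}{\phi} \right]\left[ \operatorname{plim} \frac{1}{T}\sum_{t=1}^{T} X_t^2 \right] + \kappa^2\left[ \operatorname{plim} \frac{1}{T}\sum_{t=1}^{T} U_t^2 \right]} \tag{3.51} $$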
where, writing $D$ for the denominator of (3.51), the term $\frac{\delta\psi}{\phi}\left[ \operatorname{plim} \frac{1}{T}\sum_{t=1}^{T} X_t^2 \right] / D$ corresponds to the “local selection bias”, and $\beta + \delta\kappa\left[ \operatorname{plim} \frac{1}{T}\sum_{t=1}^{T} U_t^2 \right] / D$ corresponds to the “relaxed local ATT” in the LATE in Scenario 3.
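The Scenario 3 expressions can be checked through exact population moments. In the sketch below, the VAR(1) form of (3.43)-(3.44) is solved exactly. The helper name, the unit shock variances and the parameter values, including $\psi = 0.2$, are illustrative assumptions. The code compares three pairs of quantities:

- the ratio $\mathrm{Cov}(X_{t-1}, U_t)/\mathrm{Var}(X_{t-1})$ stated above, computed directly and from its closed form;
- the ratio $\mathrm{Cov}(X_t, U_t)/\mathrm{Var}(X_t) = \frac{\rho\psi}{1 - \phi\rho} + \frac{\kappa \mathrm{Var}(U_t)}{(1 - \phi\rho)\mathrm{Var}(X_t)}$ that enters the OLS limit, computed both ways;
- the lagged IV bias $\delta \, \mathrm{Cov}(X_{t-1}, U_t)/\mathrm{Cov}(X_{t-1}, X_t)$ from (3.49), against the closed form $\delta\left(\kappa\mathrm{Var}(U_t) + \frac{\psi}{\phi}\mathrm{Var}(X_t)\right) / \left(\left[\frac{\rho}{\phi}(1-\phi\rho) + \frac{\psi\kappa}{\phi}\right]\mathrm{Var}(X_t) + \kappa^2\mathrm{Var}(U_t)\right)$.

```python
import numpy as np


def stationary_moments(rho, kappa, phi, psi=0.0, sigma_eta=1.0, sigma_nu=1.0):
    """Exact Var(s_t) and Cov(s_t, s_{t-1}) for s_t = (X_t, U_t)."""
    # X_t = (rho + kappa*psi) X_{t-1} + kappa*phi U_{t-1} + kappa*nu_t + eta_t
    A = np.array([[rho + kappa * psi, kappa * phi],
                  [psi, phi]])
    S = np.array([[kappa**2 * sigma_nu**2 + sigma_eta**2, kappa * sigma_nu**2],
                  [kappa * sigma_nu**2, sigma_nu**2]])
    if np.max(np.abs(np.linalg.eigvals(A))) >= 1:
        raise ValueError("parameters imply a non-stationary process")
    g0 = np.linalg.solve(np.eye(4) - np.kron(A, A), S.ravel()).reshape(2, 2)
    return g0, A @ g0


rho, kappa, phi, psi, delta = 0.5, 0.5, 0.5, 0.2, 1.0
g0, g1 = stationary_moments(rho, kappa, phi, psi)
var_x, var_u, cov_xu = g0[0, 0], g0[1, 1], g0[0, 1]
cov_lag_u, cov_lag_x = g1[1, 0], g1[0, 0]   # Cov(U_t, X_{t-1}), Cov(X_t, X_{t-1})

# Cov(X_{t-1}, U_t) / Var(X_{t-1}): direct value versus closed form
lag_ratio = cov_lag_u / var_x
lag_ratio_formula = phi * kappa * var_u / ((1 - phi * rho) * var_x) + psi / (1 - phi * rho)

# Cov(X_t, U_t) / Var(X_t): direct value versus closed form
ols_ratio = cov_xu / var_x
ols_ratio_formula = rho * psi / (1 - phi * rho) + kappa * var_u / ((1 - phi * rho) * var_x)

# Lagged IV bias: first expression of (3.49) versus the closed form above
iv_bias_direct = delta * cov_lag_u / cov_lag_x
iv_bias_closed = delta * (kappa * var_u + psi / phi * var_x) / (
    ((rho / phi) * (1 - phi * rho) + psi * kappa / phi) * var_x + kappa**2 * var_u)
print(lag_ratio, lag_ratio_formula, ols_ratio, ols_ratio_formula, iv_bias_direct, iv_bias_closed)
```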
    It is easy to see that the “relaxed local ATT” in Scenario 3 is smaller than the “restricted local ATT” in Scenario 1; however, due to the “local selection bias” in Scenario 3, $\hat{\beta}_{IV,3}$ in Scenario 3 has a greater extent of inconsistency than $\hat{\beta}_{IV,1}$ in Scenario 1.
    Turning to OLS, we know that in Scenario 3, $\operatorname{plim} \hat{\beta}_{OLS} = \beta + \frac{\delta\psi\rho}{1 - \phi\rho} + \frac{\delta\kappa\left[ \operatorname{plim} \frac{1}{T}\sum_{t=1}^{T} U_t^2 \right]}{(1 - \phi\rho)\left[ \operatorname{plim} \frac{1}{T}\sum_{t=1}^{T} X_t^2 \right]}$. Therefore, in Scenario 3, it is ambiguous whether the lagged IV estimate has a larger extent of inconsistency than the OLS estimate.
    Scenario 4: Combining the discussion of Scenarios 2 and 3, the lagged IV estimate in Scenario 4 could have a greater extent of inconsistency than the OLS estimate.
 
B. Implications 
    Our implications for empirical research are:
    (1) If both $\xi = 0$ and $\psi = 0$, the lagged IV satisfies the exclusion restriction, but violates the independence assumption. In this scenario, the lagged IV is safe, because its estimate is consistent.
    (2) If $\xi \neq 0$ but $\psi = 0$, the lagged IV violates both the exclusion restriction and the independence assumption. In this scenario, the lagged IV is unambiguously unsafe, because its estimate is inconsistent and has a larger extent of inconsistency than the OLS estimate.
    (3) If $\xi = 0$ but $\psi \neq 0$, the lagged IV violates both the exclusion restriction and the independence assumption. In this scenario, the lagged IV is unambiguously unsafe, because its estimate is inconsistent. In addition, it is ambiguous whether the lagged IV estimate has a larger extent of inconsistency than the OLS estimate.
    (4) If $\xi \neq 0$ and $\psi \neq 0$, the lagged IV violates both the exclusion restriction and the independence assumption. In this scenario, the lagged IV is unambiguously unsafe, because its estimate is inconsistent and has a larger extent of inconsistency than the OLS estimate.
 
3.4. Simulation Analysis 
    So far, we have shown with mathematical arguments that using the lagged explanatory variable as the instrumental variable can either alleviate or aggravate endogeneity. By analytically characterizing the source and magnitude of the bias in the lagged IV and in OLS in a simple $AR(1)$ setup, we have found that whether the bias in OLS is larger than that in the lagged IV method depends on whether, and how, the independence assumption and/or the exclusion restriction are violated.
    In this section, we use Monte Carlo methods to simulate the setups of the four scenarios discussed in the theoretical framework and the numerical analysis. We quantitatively compare the bias of the lagged IV estimates and the OLS estimates, together with the root mean squared errors (RMSE) and the likelihood of type-I errors of the lagged IV and OLS estimations.
 
A. Setup 
    We start with Scenario 1, which violates the independence assumption but not the exclusion restriction. Figure 3.1 parameterizes the relations among the explained variable, the explanatory variable and the unobserved confounder in Scenario 1. As shown, the unobserved confounder, a general representation of the source of endogeneity, is correlated with both $Y_i$ and $X_i$. $\delta$, the direct marginal effect of $U_i$ on $Y_i$, is normalized to 1. $\beta$, the direct marginal effect of $X_i$ on $Y_i$, takes the values 0 and 2. 
    The first key parameter in our simulation is $\kappa$, the marginal effect of $U_i$ on $X_i$, which measures the magnitude of the endogeneity arising from the violation of the independence assumption. $\kappa$ takes the values 0.5 and 2, representing an attenuated and an amplified marginal effect of $U$ on $X$, respectively. The second and third key parameters are the autocorrelation parameters $\rho$ and $\phi$. Alternately, one of them is held at 0.5 while the other takes values in $\{0, 0.1, 0.2, \ldots, 0.9\}$, to represent the relevance of $X_{i-1}$, the lagged IV, to $X_i$, the endogenous variable, relative to the correlation between the current and the lagged unobserved confounder. In each simulation, we generate a panel with $T = 50$ periods and $N = 100$ cross-sectional units, for a total of 5,000 observations. 
    Our simulation follows the same data-generating processes (DGPs) as in Section 3.3. Each set of parameter values, shown in Table 3.2, is simulated 100 times. We then compare three estimators of $\beta$: (1) the “naïve” or OLS estimator, $\hat{\beta}_{NAIVE}$, which regresses $Y_i$ on $X_i$ and ignores the unobserved confounder; (2) the “lagged IV” estimator, $\hat{\beta}_{LAGIV}$, which regresses $Y_i$ on $X_i$ using $X_{i-1}$ as the IV for $X_i$; and (3) the “correct” estimator, $\hat{\beta}_{CORRECT}$, which regresses $Y_i$ on $X_i$ and also on the unobserved confounder. The “correct” estimator is a counterfactual benchmark: researchers cannot observe the confounder in applied work, but our DGPs let us evaluate both the OLS and the lagged IV estimates by comparing each of their biases with that of the “correct” estimator, whose bias is zero. For simplicity, we consider only one-period autocorrelation. 
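The three estimators can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code behind the reported results. Because the Section 3.3 equations are not reproduced in this section, the DGP below is one plausible reading of the causal pathways in Table 3.2: $U_i = \phi U_{i-1} + e_i$, $X_i = \rho X_{i-1} + \kappa U_i + v_i$, and $Y_i = \beta X_i + \delta U_i + \xi X_{i-1} + \varepsilon_i$, with standard normal shocks, zero starting values, a burn-in period, and no intercepts. The function names `simulate_panel` and `three_estimators` are hypothetical. The sketch also adds $X_{i-1}$ as a control in the “correct” regression so that it stays unbiased when $\xi \neq 0$. The $\psi$ pathway of Scenario 3 is left out because, in this particular specification, feeding $X_{i-1}$ back into $U_i$ can make the process explosive for some values in Table 3.2.

```python
import numpy as np


def simulate_panel(beta, delta, kappa, rho, phi, xi=0.0,
                   n_units=100, n_periods=50, burn_in=100, rng=None):
    """Simulate one panel under an assumed AR(1) DGP (hypothetical reading of Table 3.2).

    Returns Y, X, U and the one-period lag of X, each of shape (n_units, n_periods).
    """
    if burn_in < 1:
        raise ValueError("burn_in must be at least 1 so that a lag exists")
    rng = np.random.default_rng() if rng is None else rng
    total = burn_in + n_periods
    u = np.zeros((n_units, total))
    x = np.zeros((n_units, total))
    y = np.zeros((n_units, total))
    for t in range(1, total):
        # Unobserved confounder: AR(1) with persistence phi
        u[:, t] = phi * u[:, t - 1] + rng.standard_normal(n_units)
        # Explanatory variable: own lag (rho) plus the contemporaneous confounder (kappa)
        x[:, t] = rho * x[:, t - 1] + kappa * u[:, t] + rng.standard_normal(n_units)
        # Outcome: a nonzero xi adds the direct lagged effect of Scenario 2
        y[:, t] = (beta * x[:, t] + delta * u[:, t] + xi * x[:, t - 1]
                   + rng.standard_normal(n_units))
    keep = slice(burn_in, total)          # periods retained after the burn-in
    lag = slice(burn_in - 1, total - 1)   # the same periods shifted back by one
    return y[:, keep], x[:, keep], u[:, keep], x[:, lag]


def three_estimators(y, x, u, x_lag):
    """Naive OLS, lagged IV and (infeasible) 'correct' estimates of beta, no intercepts."""
    y, x, u, x_lag = (a.ravel() for a in (y, x, u, x_lag))
    naive = (x @ y) / (x @ x)            # OLS of Y on X, ignoring U
    lag_iv = (x_lag @ y) / (x_lag @ x)   # just-identified IV: X_{i-1} instruments X_i
    # 'Correct' regression: controls for U (observable only in a simulation) and X_{i-1}
    design = np.column_stack([x, u, x_lag])
    correct = np.linalg.lstsq(design, y, rcond=None)[0][0]
    return naive, lag_iv, correct


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    draws = np.array([
        three_estimators(*simulate_panel(beta=2, delta=1, kappa=0.5, rho=0.5, phi=0.5, rng=rng))
        for _ in range(100)
    ])
    # Mean deviation from the true beta = 2 for (naive, lagged IV, correct)
    print("mean bias:", draws.mean(axis=0) - 2)
```

Under this assumed specification with $\beta = 2$, $\delta = 1$ and $\kappa = \rho = \phi = 0.5$, the population biases implied by the DGP are $3/7 \approx 0.43$ for the naive estimator and $6/17 \approx 0.35$ for the lagged IV estimator. The sketch therefore reproduces a Scenario 1 pattern in which both estimators are biased and the lagged IV bias is somewhat smaller; these numbers are properties of the sketch, not results from the dissertation.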
    Three criteria are used to evaluate the performance of the lagged IV estimates: (1) bias, (2) root mean squared error (RMSE), and (3) the likelihood of a type-I error, which tells researchers how likely they are to draw a false inference by rejecting the true null hypothesis $\beta = 0$.
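The three criteria can be computed from replicate-level output as in the following Python sketch. It assumes, for illustration rather than as the authors' implementation, that each replicate supplies a point estimate and a standard error, and it approximates the two-sided t-test at the 5% level with the standard normal critical value (about 1.96), which is reasonable with 5,000 observations per panel. The function name `evaluate` is hypothetical.

```python
import math
from statistics import NormalDist


def evaluate(estimates, std_errors, beta_true, alpha=0.05):
    """Bias, RMSE and rejection rate of H0: beta = 0 across Monte Carlo replicates."""
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("estimates and std_errors must be non-empty and equal in length")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")
    n = len(estimates)
    bias = sum(b - beta_true for b in estimates) / n
    rmse = math.sqrt(sum((b - beta_true) ** 2 for b in estimates) / n)
    # Two-sided normal critical value, about 1.96 when alpha = 0.05
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = sum(abs(b / se) > crit for b, se in zip(estimates, std_errors))
    # With beta_true == 0 the null is true, so this share is the type-I error rate
    return {"bias": bias, "rmse": rmse, "rejection_rate": rejections / n}


if __name__ == "__main__":
    # Three hypothetical replicates generated under a true null (beta = 0)
    print(evaluate([0.1, -0.1, 0.3], [0.05, 0.2, 0.1], beta_true=0))
```

In the hypothetical example, two of the three t-ratios (2.0 and 3.0) exceed the critical value, so two thirds of the replicates would falsely reject $\beta = 0$.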
    We then turn to Scenario 2, which violates not only the independence assumption but also, directly, the exclusion restriction. Figure 3.4 parameterizes the relations among the explained variable, the explanatory variable, and the unobserved confounder in Scenario 2. In this scenario, the additional key parameter is $\xi$, the marginal effect of $X_{i-1}$ on $Y_i$, which measures the magnitude of the endogeneity arising from the violation of the exclusion restriction. $\xi$ takes the values 0.5 and 2, representing an attenuated and an amplified marginal effect of $X_{i-1}$ on $Y_i$, respectively. 
    Next, we discuss Scenario 3, which violates not only the independence assumption but also, indirectly, the exclusion restriction. Figure 3.7 parameterizes the relations among the explained variable, the explanatory variable and the unobserved confounder in Scenario 3. In this scenario, the additional key parameter is $\psi$, the marginal effect of $X_{i-1}$ on $U_i$, which measures the magnitude of the endogeneity arising from the violation of the exclusion restriction. $\psi$ takes the values 0.5 and 2, representing an attenuated and an amplified marginal effect of $X_{i-1}$ on $U_i$, respectively. 
 
B. Monte Carlo Simulation Results 
    Figure 3.2 summarizes the simulation results when $\kappa = 0.5$ and 2, $\rho = 0.5$, and $\phi$ ranges from 0 to 0.9. The results show that (1) both $\hat{\beta}_{NAIVE}$ and $\hat{\beta}_{LAGIV}$ are biased, and the bias of the lagged IV estimate is smaller than that of the OLS estimate. This is consistent with our theoretical prediction that, because the lagged IV violates only the independence assumption in Scenario 1, it is less problematic than the OLS estimate. (2) As $\phi$ increases, the bias of the lagged IV estimate also increases; as $\kappa$ increases, the bias of the lagged IV estimate decreases. This is also consistent with our theoretical prediction that the lagged IV estimate’s violation of the independence assumption is quantified by $\phi/\kappa$, the synchronous change of $U_i$ by $X_i$: as $\phi/\kappa$ increases, the independence assumption is violated to a larger extent, and as a result the lagged IV estimate suffers from a larger bias. (3) The RMSEs show patterns similar to those of the biases. 
    Admittedly, applied researchers may care less about the magnitude of bias in their estimates than about whether the p-values from their t-tests lead to a false rejection of the null hypothesis $\beta = 0$ at a given significance level. Therefore, in the simulation we also examine what happens, when the null hypothesis is true ($\beta = 0$), if applied researchers use the lagged IV method to test it against the alternative hypothesis $\beta \neq 0$. We use the 95% confidence level, that is, a 5% significance level. 
    Our simulation results show that when $\kappa > 0$, the likelihood of type-I errors rises dramatically as $\phi$ increases and, like the magnitude of the estimation bias, becomes close to 1. The reason is that the lagged IV method yields nonzero estimates of $\beta$ even when $\beta = 0$, because $\delta$, the marginal effect of the unobserved confounder on the explained variable, and $\kappa$, the marginal effect of the unobserved confounder on the explanatory variable, are both nonzero. Accordingly, these results suggest that using the lagged IV method to address endogeneity from unobserved confounders can hardly mitigate type-I errors: applied researchers may tend to reject true null hypotheses and thus report estimated associations that are in fact spurious. 
    Going further, Figure 3.3 presents the simulation results when $\phi = 0.5$, $\rho$ ranges from 0 to 0.9, and $\kappa = 0.5$ and 2. The results show that (1) both $\hat{\beta}_{NAIVE}$ and $\hat{\beta}_{LAGIV}$ are biased, and the bias of the lagged IV estimate is smaller than that of the OLS estimate. This is consistent with our theoretical prediction that, because the lagged IV violates only the independence assumption in Scenario 1, it is less problematic than the OLS estimate. (2) As $\rho$ increases, the bias of the lagged IV estimate decreases: as the lagged IV becomes more relevant to the endogenous variable, its validity also improves. (3) As $\kappa$ increases, the bias of the lagged IV estimate decreases. This is also consistent with our theoretical prediction that the lagged IV estimate’s violation of the independence assumption is quantified by $\phi/\kappa$, the synchronous change of $U_i$ by $X_i$: as $\phi/\kappa$ increases, the independence assumption is violated to a larger extent, and as a result the lagged IV estimate suffers from a larger bias. (4) The RMSEs show patterns similar to those of the biases. (5) The likelihood of a type-I error is very high. 
    In sum, our simulation results convey an unambiguous message: if the lagged explanatory variable has no direct causal effect on either the explained variable or the unobserved confounder, using it as the IV in instrumental estimation mitigates the estimation bias and RMSE. However, the lagged IV method can hardly mitigate type-I errors in applied research. These results imply that even if the exclusion restriction is satisfied, the lagged IV method is still problematic. 
    We also consider the cases in which the lagged explanatory variable has a direct causal effect on the explained variable, on the unobserved confounder, or on both. These cases correspond to Scenarios 2, 3 and 4 in our conceptual framework. They yield very different results for estimation bias and RMSE: both the bias and the RMSE of the lagged IV estimate are substantially larger than those of OLS. In addition, in these three cases the likelihood of type-I errors is close to, or even equal to, one, and substantially higher than that of OLS. These results imply that when the lagged IV violates both the exclusion restriction and the independence assumption, it aggravates the endogeneity rather than alleviating it. 
    Figure 3.5 summarizes the simulation results when $\xi = 0.5$ and 2, $\rho = 0.5$, and $\phi$ ranges from 0 to 0.9. The results show that (1) both $\hat{\beta}_{NAIVE}$ and $\hat{\beta}_{LAGIV}$ are biased, and the bias of the lagged IV estimate is much larger than that of the OLS estimate. This is consistent with our theoretical prediction that, because the lagged IV violates both the independence assumption and the exclusion restriction in Scenario 2, it is much more problematic than the OLS estimate. (2) As $\phi$ increases, the bias of the lagged IV estimate also increases, consistent with our theoretical prediction that the lagged IV estimate’s violation of the independence assumption is quantified by $\phi/\kappa$, the synchronous change of $U_i$ by $X_i$: as $\phi/\kappa$ increases, the independence assumption is violated to a larger extent, and as a result the lagged IV estimate suffers from a larger bias. (3) As $\xi$ increases, the bias of the lagged IV estimate also increases, consistent with our theoretical prediction that the lagged IV estimate’s violation of the exclusion restriction in Scenario 2 is quantified by $\xi$, the marginal effect of $X_{i-1}$ on $Y_i$: as $\xi$ increases, the exclusion restriction is violated to a larger extent, and as a result the lagged IV estimate suffers from a larger bias. (4) The RMSEs show patterns similar to those of the biases. (5) The likelihood of a type-I error is very high, close to 1. 
    Figure 3.6 summarizes the simulation results when $\xi = 0.5$ and 2, $\phi = 0.5$, and $\rho$ ranges from 0 to 0.9. The results show that (1) both $\hat{\beta}_{NAIVE}$ and $\hat{\beta}_{LAGIV}$ are biased, and the bias of the lagged IV estimate is much larger than that of the OLS estimate. This is consistent with our theoretical prediction that, because the lagged IV violates both the independence assumption and the exclusion restriction in Scenario 2, it is much more problematic than the OLS estimate. (2) As $\rho$ increases, the bias of the lagged IV estimate decreases: as the lagged IV becomes more relevant to the endogenous variable, its validity also improves. (3) As $\xi$ increases, the bias of the lagged IV estimate also increases, consistent with our theoretical prediction that the lagged IV estimate’s violation of the exclusion restriction in Scenario 2 is quantified by $\xi$, the marginal effect of $X_{i-1}$ on $Y_i$: as $\xi$ increases, the exclusion restriction is violated to a larger extent, and as a result the lagged IV estimate suffers from a larger bias. (4) The RMSEs show patterns similar to those of the biases. (5) The likelihood of a type-I error is very high, close to 1. 
    Figure 3.8 summarizes the simulation results when $\psi = 0.5$ and 2, $\rho = 0.5$, and $\phi$ ranges from 0 to 0.9. The results show that (1) both $\hat{\beta}_{NAIVE}$ and $\hat{\beta}_{LAGIV}$ are biased, and the bias of the lagged IV estimate is much larger than that of the OLS estimate. This is consistent with our theoretical prediction that, because the lagged IV violates both the independence assumption and the exclusion restriction in Scenario 3, it is much more problematic than the OLS estimate. (2) As $\phi$ increases, the bias of the lagged IV estimate also increases, consistent with our theoretical prediction that the lagged IV estimate’s violation of the independence assumption is quantified by $\phi/\kappa$, the synchronous change of $U_i$ by $X_i$: as $\phi/\kappa$ increases, the independence assumption is violated to a larger extent, and as a result the lagged IV estimate suffers from a larger bias. (3) As $\psi$ increases, the bias of the lagged IV estimate also increases, consistent with our theoretical prediction that the lagged IV estimate’s violation of the exclusion restriction in Scenario 3 is quantified by $\psi$, the marginal effect of $X_{i-1}$ on $U_i$: as $\psi$ increases, the exclusion restriction is violated to a larger extent, and as a result the lagged IV estimate suffers from a larger bias. (4) The RMSEs show patterns similar to those of the biases. (5) The likelihood of a type-I error is very high, close to 1. 
    Figure 3.9 summarizes the simulation results when $\psi = 0.5$ and 2, $\phi = 0.5$, and $\rho$ ranges from 0 to 0.9. The results show that (1) both $\hat{\beta}_{NAIVE}$ and $\hat{\beta}_{LAGIV}$ are biased, and the bias of the lagged IV estimate is much larger than that of the OLS estimate. This is consistent with our theoretical prediction that, because the lagged IV violates both the independence assumption and the exclusion restriction in Scenario 3, it is much more problematic than the OLS estimate. (2) As $\rho$ increases, the bias of the lagged IV estimate decreases: as the lagged IV becomes more relevant to the endogenous variable, its validity also improves. (3) As $\psi$ increases, the bias of the lagged IV estimate also increases, consistent with our theoretical prediction that the lagged IV estimate’s violation of the exclusion restriction in Scenario 3 is quantified by $\psi$, the marginal effect of $X_{i-1}$ on $U_i$: as $\psi$ increases, the exclusion restriction is violated to a larger extent, and as a result the lagged IV estimate suffers from a larger bias. (4) The RMSEs show patterns similar to those of the biases. (5) The likelihood of a type-I error is very high, close to 1. 
 
3.5. Conclusion 
    Our discussion of the independence assumption and the exclusion restriction in lagged IV estimation across the four scenarios implies the following. If the lagged IV satisfies the exclusion restriction, that is, if specific causal influences are strictly assumed away, the lagged IV method is acceptable and helpful, as its estimate is consistent and less biased than the OLS estimate. However, the violation of the independence assumption still makes the lagged IV method troubling, as the lagged IV estimate is highly likely to suffer from type-I errors. If the lagged IV violates both the independence assumption and the exclusion restriction, its estimate is unambiguously inconsistent and much more biased than the OLS estimate. 
    Few applied researchers have discussed the independence assumption and the exclusion restriction in detail; instead, they assume that the lagged IV method yields estimates that are at least less biased than OLS. However, the bias of the lagged IV estimate is smaller than that of OLS only when specific causal influences are assumed away, and only within limited ranges of parameter values; once this strict assumption is relaxed, the lagged IV method can even enlarge the bias. Beyond estimation bias, whether or not this assumption is relaxed, the high likelihood of type-I errors always jeopardizes the validity of the lagged IV method. Worse still, because the causal impacts of the lagged explanatory variable on unobserved covariates can hardly be ruled out, both the independence assumption and the exclusion restriction are inevitably violated, so that most of the time the lagged IV yields a larger estimation bias than OLS. 
    Causal inference usually requires experimental data to identify the treatment effect of explanatory variables. With observational data, natural experiments are usually indispensable for providing an exogenous shock for causal identification (Angrist and Krueger, 2001; Freeman, 2005), although they lack underlying theoretical relationships (Rosenzweig and Wolpin, 2000). Valid instrumental variables are therefore likely to be obtained from natural experiments, because such instruments are very likely to be exogenous and to satisfy both the independence assumption and the exclusion restriction. Lagged explanatory variables, by contrast, have a simultaneous relationship with the unobserved confounder that influences the explained variable, and the lagged IV lacks the exogeneity of a natural experiment. Therefore, the lagged IV method can hardly provide additional information for causal inference. 
Table 3.1. Reviewed Journals: Articles Published in 2013-2018 Using Lagged IV Methods 
| Journal | Discipline | Articles, 2013-2018 | Articles, 2015-2018 |
|---|---|---|---|
| American Economic Review | Economics | 5 | 3 |
| Econometrica | Economics | 0 | 0 |
| Journal of Political Economy | Economics | 1 | 0 |
| Quarterly Journal of Economics | Economics | 3 | 2 |
| Review of Economic Studies | Economics | 3 | 1 |
| Review of Economics & Statistics | Economics | 7 | 2 |
| American Political Science Review | Political Science | 1 | 0 |
| American Journal of Political Science | Political Science | 1 | 1 |
| British Journal of Political Science | Political Science | 6 | 4 |
| Comparative Political Studies | Political Science | 3 | 1 |
| Journal of Politics | Political Science | 1 | 0 |
Table 3.2. Simulation Parameters 
| Parameter | Causal pathway | Simulation values |
|---|---|---|
| Basic parameters | | |
| $\beta$ | $X_i \to Y_i$ | {0, 2} |
| $\delta$ | $U_i \to Y_i$ | {1} |
| Key parameters | | |
| $\phi$ | $U_{i-1} \to U_i$ | {0, 0.1, 0.2, …, 0.9}; {0.5} |
| $\rho$ | $X_{i-1} \to X_i$ | {0.5}; {0, 0.1, 0.2, …, 0.9} |
| $\kappa$ | $U_i \to X_i$, $U_{i-1} \to X_{i-1}$ | {0.5, 2} |
| $\xi$ | $X_{i-1} \to Y_i$ | {0.5, 2} |
| $\psi$ | $X_{i-1} \to U_i$ | {0.5, 2} |
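The alternating entries for $\phi$ and $\rho$ in Table 3.2 can be read as two sweeps: one varies $\phi$ with $\rho = 0.5$ (as in Figure 3.2), the other varies $\rho$ with $\phi = 0.5$ (as in Figure 3.3). The Python sketch below enumerates the Scenario 1 design under that reading. Treating the two sweeps as separate arms, so that $\phi = \rho = 0.5$ appears in both, is an assumption, and the function name `scenario1_design` is hypothetical.

```python
from itertools import product

# 0.0, 0.1, ..., 0.9; rounding removes float artifacts such as 0.30000000000000004
GRID = [round(0.1 * k, 1) for k in range(10)]


def scenario1_design():
    """Parameter settings for Scenario 1 (xi = psi = 0) as read from Table 3.2."""
    rows = []
    for beta, kappa in product((0, 2), (0.5, 2)):
        base = {"beta": beta, "delta": 1, "kappa": kappa}
        # Sweep 1: phi varies while rho is held at 0.5
        rows += [dict(base, phi=phi, rho=0.5, sweep="phi") for phi in GRID]
        # Sweep 2: rho varies while phi is held at 0.5
        rows += [dict(base, phi=0.5, rho=rho, sweep="rho") for rho in GRID]
    return rows


if __name__ == "__main__":
    design = scenario1_design()
    print(len(design), "settings, each to be replicated 100 times")
```

Under this reading there are 2 values of $\beta$ times 2 values of $\kappa$ times 20 sweep points, or 80 settings for Scenario 1.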
 
 
 
Figure 3.1. Representation of Monte Carlo Simulation Setup 
 
Figure 3.2. Monte Carlo Results: $\kappa = 0.5$ and 2, $\phi$ ranges from 0 to 1, $\rho = 0.5$ 
 
Figure 3.3. Monte Carlo Results: $\kappa = 0.5$ and 2, $\rho$ ranges from 0 to 1, $\phi = 0.5$ 
 
 
Figure 3.4. Representation of Monte Carlo Simulation Setup: $X_{i-1}$ Also Has Causal Effects on $Y_i$ 
 
Figure 3.5. Monte Carlo Results: $\kappa = 0.5$ and 2, $\phi$ ranges from 0 to 1; Lagged Causality on Explained Variable 
 
Figure 3.6. Monte Carlo Results: $\kappa = 0.5$ and 2, $\rho$ ranges from 0 to 1; Lagged Causality on Explained Variable 
 
 
Figure 3.7. Representation of Monte Carlo Simulation Setup: $X_{i-1}$ Also Has Causal Effects on $U_i$ 
 
Figure 3.8. Monte Carlo Results: $\kappa = 0.5$ and 2, $\phi$ ranges from 0 to 1; Lagged Causality on Unobserved Confounder 
 
Figure 3.9. Monte Carlo Results: $\kappa = 0.5$ and 2, $\rho$ ranges from 0 to 1; Lagged Causality on Unobserved Confounder
4. Spatially Lagged Variables as Instruments: The Spatially Local Average Treatment Effect (SLATE) in Estimation¹ 
 
YU WANG² 
 
 
4.1. Introduction 
    It is becoming more common to use spatially lagged variables, that is, endogenous variables premultiplied by a spatial weighting matrix, as instrumental variables (IVs) to address endogeneity concerns. This trend reflects the growing availability of spatial data and the lack of valid IVs. Typically, when valid IVs are lacking, empirical studies use the values of the endogenous variables in neighboring units as IVs (Wong et al., 2017). In the spatial econometric context, these neighboring IVs are constructed by applying a spatial weighting matrix to the endogenous variables. Few formal theoretical analyses, however, have examined whether the spatially lagged IV method addresses endogeneity. According to the Local Average Treatment Effect (LATE) theorem, valid IVs should satisfy the independence assumption and the exclusion restriction. However, it is unknown whether the LATE theorem takes a specific form when the spatially lagged IV is used. 
  In this paper, I characterize the Spatially Local Average Treatment Effect (SLATE) to examine theoretically the validity of the spatially lagged IV strategy, and I propose the SLATE theorem, which comprises the spatial independence assumption and the spatial exclusion restriction. The spatial independence assumption has two components. First, there is no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the unobserved confounders, namely the external spatial exogeneity of the explanatory variables. Second, there is no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the explanatory variables, namely the internal spatial exogeneity of the explanatory variables. Together, external and internal exogeneity ensure that the spatially lagged IV is correlated neither with the latent outcome of the explained variable nor with the latent treatment condition. 

¹ I thank Dave Donaldson for valuable comments and suggestions. All errors are the author’s. 
² Wang: Department of Applied Economics, University of Minnesota, email: wang5979@umn.edu. 
    To ensure that the spatially lagged IV estimate is unbiased and consistent, the spatially lagged IV should also satisfy the spatial exclusion restriction, which consists of a direct and an indirect spatial exclusion restriction. The direct spatial exclusion restriction means that the spatially lagged IV has no direct causal impact on the explained variable, and the indirect spatial exclusion restriction means that it has no indirect causal impact on the explained variable. Together, they ensure that the spatially lagged IV influences the explained variable only through the explanatory variable, ruling out any other channel. Accordingly, when both the spatial independence assumption and the spatial exclusion restriction are satisfied, the spatially lagged IV estimate is unbiased and consistent. 
    I set up a structural model to compare, both qualitatively and quantitatively, the average treatment effect (ATE) in the OLS estimate and the spatially local average treatment effect (SLATE) in the spatially lagged IV estimate. In this model, the explained variable is determined by an explanatory variable and an unobserved confounder, and possibly also by the spatially lagged explanatory variable. The explanatory variable is determined by the spatially lagged explanatory variable and by the unobserved confounder. The unobserved confounder is determined by the spatially lagged unobserved confounder, and may also be influenced by the spatially lagged explanatory variable. I find that when the spatially lagged IV violates the spatial exclusion restriction, the spatial independence assumption, or both, its estimate suffers from endogeneity for two reasons: its SLATE contains a “spatially local selection bias”, and the “relaxed spatially local ATT” in its SLATE differs from the average treatment effect on the treated (ATT) in the ATE of the OLS estimate. 
    I then characterize numerically the SLATE of the spatially lagged IV estimates and compare it with the ATE of the OLS estimates. I find that a valid spatially lagged IV should satisfy both the spatial independence assumption, meaning that the explanatory variables satisfy both external and internal spatial exogeneity, and the spatial exclusion restriction, both direct and indirect. Accordingly, I propose the spatially local average treatment effect (SLATE) theorem. 
    I also discuss the dynamic spatially local average treatment effect numerically. I find that when the conditions of the SLATE theorem hold, namely the spatial independence assumption and, where relevant, the spatial exclusion restriction, the spatially lagged IV estimate is unbiased and consistent, even if the treatment is implemented in multiple waves. I use pioneers and stragglers, a situation common in empirical studies, as an example to explain the dynamic SLATE. 
    My findings imply for applied researchers that the spatially lagged IV method, to a large extent, addresses endogeneity concerns, even if the treatment is implemented in multiple waves. By discussing the SLATE theorem, my analysis also contributes to credible causal inference based on the LATE theorem in instrumental variable estimation (Angrist et al., 1996; Imbens, 2014). 
    The rest of this paper is organized as follows. Section 4.2 presents the theoretical framework. Section 4.3 numerically analyzes the spatially local average treatment effect and introduces the spatially local average treatment effect theorem. Section 4.4 numerically discusses the dynamic spatially local average treatment effect. Section 4.5 concludes. 
 
4.2. Theoretical Framework 
    This section derives the spatially local average treatment effects (SLATE) in spatially lagged IV estimation and the average treatment effects (ATE) in OLS estimation. In light of the local average treatment effects (LATE) theorem (Angrist and Pischke, 2009), it shows that a valid spatially lagged IV should satisfy both the independence assumption and the exclusion restriction. In a data-generating process in which both the unobserved confounder and the explanatory variable are spatially autocorrelated, I compare the SLATE in spatially lagged IV estimation with the ATE in OLS estimation. I find that when the spatially lagged IV violates only the independence assumption, its estimation bias is smaller than that of OLS, whereas when it also violates the exclusion restriction, its estimation bias is greater than that of OLS. 
 
A. Setup 
    In empirical studies, the existence of unobserved confounders that influence both the explanatory variable and the explained variable is the most common source of endogeneity (Stock and Trebbi, 2003; Angrist and Krueger, 2001). Measurement error in the explanatory variable and reverse causality between the explanatory and explained variables, the other two sources of endogeneity, can also largely be attributed to unobserved confounders (Krueger, 1999; Angrist and Lang, 2004). 
    With the growing availability of spatial data sets, the spatially lagged explanatory variable, which I refer to as the spatially lagged IV, increasingly serves as an instrumental variable. The spatially lagged IV, which in its standard form is the endogenous explanatory variable premultiplied by a spatial weighting matrix, is commonly regarded as a valid IV. As for the relevance condition, the spatial weighting matrix shapes the extent to which the spatially lagged IV is correlated with the endogenous explanatory variable. As for the exclusion restriction, if the data-generating process does not take the spatial Durbin form, the spatially lagged IV influences the explained variable only through the endogenous explanatory variable. 
  However, once the unobserved confounder is taken into account, one must ask whether the spatially lagged IV works as a random shock, in other words, whether it satisfies the independence assumption. In a typical spatial econometric model driven by omitted variables, the unobserved confounder is spatially autocorrelated, and the explanatory variable may be correlated with the random error in the unobserved confounder’s spatial autocorrelation. In addition, the endogeneity of the explanatory variable may also come from its own spatial autocorrelation, so the explanatory variable may be correlated with the random error in that process. As a result, the OLS estimate in this spatial econometric model is biased and inconsistent. On the other hand, suppose the spatially lagged IV is correlated neither with the random error in the unobserved confounder’s spatial autocorrelation nor with the random error in the explanatory variable’s spatial autocorrelation. Then the spatially lagged IV is correlated neither with the latent state of the explained variable nor with the latent state of the explanatory variable; in other words, it works as a random shock. 
    To illustrate these points, I use the following structural model as the standard data-generating process:
$$\mathbf{Y} = \mathbf{X}\beta + \mathbf{U}\delta + \boldsymbol{\epsilon} \tag{4.1}$$
where $\mathbf{Y}$, $\mathbf{X}$ and $\mathbf{U}$ represent the explained variable, the explanatory variable and the unobserved confounder, respectively, and $\boldsymbol{\epsilon}$ represents an independent and identically distributed random error. 
    The spatial autocorrelation function of the unobserved confounder is
$$\mathbf{U} = \rho\mathbf{W}\mathbf{U} + \varphi\mathbf{X} + \boldsymbol{\gamma} \tag{4.2}$$
where $\mathbf{W}$ represents the spatial weighting matrix, and the explanatory variable is correlated with the random error in this spatial autocorrelation process, such that $E(\mathbf{X}, \boldsymbol{\gamma}) \neq 0$. 
    The spatial autocorrelation function of the explanatory variable is
$$\mathbf{X} = \tau\mathbf{W}\mathbf{X} + \boldsymbol{\eta} \tag{4.3}$$
where the explanatory variable is also correlated with the random error in this spatial autocorrelation process, such that $E(\mathbf{X}, \boldsymbol{\eta}) \neq 0$. 
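A minimal Python sketch of how one might draw data from (4.1)-(4.3) follows. It is illustrative only. The weighting matrix is a hypothetical row-standardized “ring” in which each unit’s two neighbors are the adjacent units; the coefficients on $\mathbf{X}$ and $\mathbf{U}$ in (4.1) are written `beta` and `delta` (an assumed labeling); and the loading `lam`, which is not part of the original model, makes $\boldsymbol{\gamma}$ correlated with $\boldsymbol{\eta}$ and hence with $\mathbf{X}$, so that $E(\mathbf{X}, \boldsymbol{\gamma}) \neq 0$. Because $\mathbf{U}$ and $\mathbf{X}$ appear on both sides of (4.2) and (4.3), the sketch solves the reduced forms $\mathbf{X} = (\mathbf{I} - \tau\mathbf{W})^{-1}\boldsymbol{\eta}$ and $\mathbf{U} = (\mathbf{I} - \rho\mathbf{W})^{-1}(\varphi\mathbf{X} + \boldsymbol{\gamma})$, which exist when $|\rho| < 1$ and $|\tau| < 1$ for a row-standardized $\mathbf{W}$.

```python
import numpy as np


def ring_weights(n):
    """Row-standardized 'ring' W: each unit's two neighbours are the adjacent units (hypothetical)."""
    if n < 3:
        raise ValueError("a ring needs at least 3 units")
    w = np.zeros((n, n))
    idx = np.arange(n)
    w[idx, (idx - 1) % n] = 0.5
    w[idx, (idx + 1) % n] = 0.5
    return w


def simulate_spatial(n, beta, delta, rho, varphi, tau, lam, rng):
    """Draw one cross-section from (4.1)-(4.3) by solving the spatial systems.

    beta, delta: assumed coefficients on X and U in (4.1).
    lam: hypothetical loading that makes gamma correlated with eta, hence with X.
    Requires |rho| < 1 and |tau| < 1 so that I - rho W and I - tau W are invertible.
    """
    w = ring_weights(n)
    eye = np.eye(n)
    eta = rng.standard_normal(n)
    gamma = lam * eta + rng.standard_normal(n)
    eps = rng.standard_normal(n)
    x = np.linalg.solve(eye - tau * w, eta)                  # (4.3): X = tau W X + eta
    u = np.linalg.solve(eye - rho * w, varphi * x + gamma)   # (4.2): U = rho W U + varphi X + gamma
    y = beta * x + delta * u + eps                           # (4.1)
    return y, x, u, w, eta, gamma


def ols_and_spatial_iv(y, x, w):
    """OLS of Y on X and just-identified IV using WX as the instrument (no intercept)."""
    z = w @ x
    return (x @ y) / (x @ x), (z @ y) / (z @ x)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y, x, u, w, eta, gamma = simulate_spatial(
        n=500, beta=1.0, delta=1.0, rho=0.5, varphi=0.0, tau=0.5, lam=0.5, rng=rng)
    print("OLS, spatial IV:", ols_and_spatial_iv(y, x, w))
```

The sketch only generates data and computes the two estimators; which one is less biased depends on whether conditions such as $E(\mathbf{W}\mathbf{X}, \boldsymbol{\gamma}) = 0$ hold in the chosen design. In this particular construction they need not: with `lam` and `tau` both nonzero, a unit’s own shock $\eta_i$ reaches $(\mathbf{W}\mathbf{X})_i$ through its neighbors, so $\mathbf{W}\mathbf{X}$ is also correlated with $\boldsymbol{\gamma}$.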
    The independence assumption requires that a valid instrumental variable work as a random shock. On the one hand, (4.3) shows that the spatially lagged IV is not correlated with the latent state of the explanatory variable. On the other hand, when $\varphi = 0$, $E(\mathbf{W}\mathbf{X}, \boldsymbol{\gamma}) = 0$ implies that the spatially lagged IV is not correlated with the latent state of the explained variable; when $\varphi \neq 0$, both $E(\mathbf{W}\mathbf{X}, \boldsymbol{\gamma}) = 0$ and $E(\mathbf{W}\mathbf{X}, \boldsymbol{\eta}) = 0$ are required for the spatially lagged IV to be uncorrelated with the latent state of the explained variable. 
    In this spatial data generation process, $\mathbf{W}\mathbf{X}$ stacks the weighted averages of each individual's neighboring explanatory variables. More specifically, the $i$-th element of $\mathbf{W}\mathbf{X}$ is $\sum_{j \neq i}\mu_{ij}\mathbf{X}_j$, the weighted average of the explanatory variable over all of individual $i$'s neighbors, and $\mathbf{W}\mathbf{X}$ collects these weighted averages for all individuals. Therefore, $E(\mathbf{X}'\mathbf{W}\boldsymbol{\gamma}) = \mathbf{0}$ means that there is no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the unobserved confounders. Similarly, $E(\mathbf{X}'\mathbf{W}\boldsymbol{\eta}) = \mathbf{0}$ means that there is no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the explanatory variable. These two assumptions imply that, in this spatial data generation process, the spatially lagged IV is not causally influenced by either the explanatory variable or the unobserved confounder, nor does it have any synchronic relationship with the explanatory variable or the unobserved confounder. In short, the spatially lagged IV works as a random shock, satisfying the independence assumption.
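To make the definition of $\mathbf{W}\mathbf{X}$ concrete, the Python sketch below computes the spatially lagged IV for a hypothetical line of four regions (1-2-3-4) with made-up values of the explanatory variable, once in matrix form and once as the explicit neighbor sum $\sum_{j \neq i}\mu_{ij}x_j$. The weights are row-standardized with $\mu_{ii} = 0$; note that, unlike the matrix assumed later in this section, this particular $\mathbf{W}$ is not symmetric.

```python
import numpy as np

# Hypothetical 4-region example: regions on a line, 1-2-3-4.
# Row i of W holds the weights mu_ij that region i places on its neighbours j;
# mu_ii = 0 and each row sums to 1 (row-standardized).
W = np.array([
    [0.0, 1.0, 0.0, 0.0],   # region 1 has one neighbour: region 2
    [0.5, 0.0, 0.5, 0.0],   # region 2: regions 1 and 3
    [0.0, 0.5, 0.0, 0.5],   # region 3: regions 2 and 4
    [0.0, 0.0, 1.0, 0.0],   # region 4: region 3
])
x = np.array([2.0, 4.0, 6.0, 10.0])  # explanatory variable by region (made-up values)

# Matrix form: the spatially lagged IV is the vector WX.
wx = W @ x

# Element-wise form: (WX)_i = sum_{j != i} mu_ij * x_j, a weighted average of i's neighbours.
wx_loop = np.array([sum(W[i, j] * x[j] for j in range(4) if j != i) for i in range(4)])

print(wx)  # [4. 4. 7. 6.]
```

Each entry is a neighbor average: region 3, for example, receives $0.5 \times 4 + 0.5 \times 10 = 7$.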
    It is also necessary to ask whether the spatially lagged IV influences the explained variable only through the explanatory variable, in other words, whether the spatially lagged IV satisfies the exclusion restriction. In the standard data generation process above, the spatially lagged IV has no channel of influence on the explained variable other than the explanatory variable. If the spatially lagged IV also has a direct causal effect on the explained variable, it violates the exclusion restriction. Likewise, if the spatially lagged IV has an indirect causal effect on the explained variable, that is, if it influences the explained variable through the unobserved confounder, it still violates the exclusion restriction. In these cases, the spatially lagged IV estimate may have a much larger bias and inconsistency than the OLS estimate.
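    To explain these cases, I keep the structural model above as the standard data generation process and extend it as
$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{U}\delta + \psi\mathbf{W}\mathbf{X} + \boldsymbol{\epsilon} \qquad (4.4)$$
$$\mathbf{U} = \rho\mathbf{W}\mathbf{U} + \xi\mathbf{W}\mathbf{X} + \boldsymbol{\gamma} \qquad (4.5)$$
$$\mathbf{X} = \tau\mathbf{W}\mathbf{X} + \boldsymbol{\eta} \qquad (4.6)$$
in which I add $\psi\mathbf{W}\mathbf{X}$, the direct causal effect of the spatially lagged IV on the explained variable, and $\xi\mathbf{W}\mathbf{X}$, its indirect causal effect through the unobserved confounder. When $\psi \neq 0$, $\xi \neq 0$, or both, the exclusion restriction is violated.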
    In this spatial data generation process, $\psi = 0$ means that there is no inter-regional direct causal effect of the explanatory variable on the explained variable, and $\xi = 0$ means that there is no inter-regional indirect causal effect of the explanatory variable on the explained variable. In that case, the structural model shows that the neighbors' explanatory variables, namely the spatially lagged IV, affect the explained variable only through the explanatory variable itself. In short, the spatially lagged IV excludes all other causal channels, satisfying the exclusion restriction.
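A minimal Python sketch of how one sample could be drawn from (4.4) to (4.6) by solving the two spatial systems. The weighting matrix (a ring in which each region's two neighbors receive weight 1/2), the i.i.d. standard-normal disturbances, and the function names `ring_weights` and `simulate_dgp` are illustrative assumptions rather than part of the model; because the disturbances here are independent, the sketch does not reproduce the endogeneity structures discussed above, it only shows the mechanics of generating $\mathbf{X}$, $\mathbf{U}$, and $\mathbf{Y}$.

```python
import numpy as np


def ring_weights(n):
    """Hypothetical W (n >= 3): two ring neighbours with weight 1/2; symmetric, rows sum to 1."""
    W = np.zeros((n, n))
    idx = np.arange(n)
    W[idx, (idx - 1) % n] = 0.5
    W[idx, (idx + 1) % n] = 0.5
    return W


def simulate_dgp(W, beta, delta, psi, xi, rho, tau, rng):
    """Draw one sample from (4.4)-(4.6) by solving the two spatial systems."""
    n = W.shape[0]
    I = np.eye(n)
    eta = rng.standard_normal(n)     # disturbance of the X process (assumed i.i.d. N(0, 1))
    gamma = rng.standard_normal(n)   # disturbance of the U process (assumed i.i.d. N(0, 1))
    eps = rng.standard_normal(n)     # outcome error (assumed i.i.d. N(0, 1))

    X = np.linalg.solve(I - tau * W, eta)              # (4.6): X = (I - tau W)^{-1} eta
    WX = W @ X                                         # the spatially lagged IV
    U = np.linalg.solve(I - rho * W, xi * WX + gamma)  # (4.5): U = (I - rho W)^{-1}(xi WX + gamma)
    Y = beta * X + delta * U + psi * WX + eps          # (4.4)
    return Y, X, U, eta, gamma, eps


rng = np.random.default_rng(0)
W = ring_weights(200)
Y, X, U, eta, gamma, eps = simulate_dgp(W, beta=1.0, delta=1.0, psi=0.3, xi=0.2,
                                        rho=0.4, tau=0.6, rng=rng)

# The draws satisfy the structural equations up to floating-point error.
print(np.abs(X - (0.6 * W @ X + eta)).max())
```

Setting `psi=0.0` and `xi=0.0` gives the data generation process under which the exclusion restriction holds.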
B. The SLATE in the Spatially Lagged IV and the ATE in OLS 
    In this section, I conceptually discuss the extent to which the spatially lagged IV method and the OLS method address endogeneity by comparing the spatially local average treatment effect (SLATE) in the spatially lagged IV estimate with the average treatment effect (ATE) in the OLS estimate. Building on the local average treatment effect (LATE) theorem in Angrist and Pischke (2009), I discuss the SLATE in the spatially lagged IV estimation. For simplicity and without loss of generality, I assume a binary explanatory variable and a binary explained variable, each taking the value 1 or 0,³ and I also assume that there is only one explanatory variable. Denote by $Y_i(x, \mathbf{w}\bar{x})$ region $i$'s latent outcome when its treatment is $X_i = x$ and its spatially lagged treatment, the spatially lagged IV, is $\mathbf{W}_i X_i = \sum_{j \neq i}\mu_{ij}X_j = \mathbf{w}\bar{x}$, where $\mathbf{W}_i$, a row vector, is region $i$'s row of the spatial weighting matrix. To specify the heterogeneous causal effect of the spatially lagged IV, denote by $X_{1i}$ region $i$'s latent treatment state when $\sum_{j \neq i}\mu_{ij}X_j = 1$, and by $X_{0i}$ its latent treatment state when $\sum_{j \neq i}\mu_{ij}X_j = 0$. The observed treatment state can therefore be written as
$$X_i = X_{0i} + (X_{1i} - X_{0i})\sum_{j \neq i}\mu_{ij}X_j \qquad (4.7)$$
in which only one of $X_{1i}$ and $X_{0i}$ is observed for each region, and $(X_{1i} - X_{0i})$ represents the heterogeneous causal effect of the spatially lagged IV, $\sum_{j \neq i}\mu_{ij}X_j$. These notations conform to the independence assumption, which states that the instrumental variable should have no association with the latent outcomes or with the latent treatment states. Specifically,
$$\left[\{Y_i(x, \mathbf{w}\bar{x});\ \forall x, \mathbf{w}\bar{x}\},\ X_{1i},\ X_{0i}\right] \perp\!\!\!\perp \sum_{j \neq i}\mu_{ij}X_j \qquad (4.8)$$
This implies that the spatially lagged IV should behave like a random assignment. In other words, the spatially lagged IV should be independent of the latent outcomes and of the latent treatment states of the explanatory variable.
    Similarly, the exclusion restriction states that $Y_i(x, \mathbf{w}\bar{x})$ is a function only of $X_i$ and not of $\sum_{j \neq i}\mu_{ij}X_j$; in other words, the spatially lagged IV influences the explained variable only through the explanatory variable. This is denoted as
$$Y_i(x, 0) = Y_i(x, 1), \quad x = 0, 1 \qquad (4.9)$$

³ This assumption is based on the latent index model (Heckman, 1978). Specifically, the binary values of the explanatory variable and the explained variable can be regarded as an ultimate choice, which is influenced by an unobserved decision process with a latent revenue and cost, as well as a random error, if necessary.
    When $\psi \neq 0$, namely when the spatially lagged IV has a direct causal effect on the explained variable, or $\xi \neq 0$, namely when it has an indirect causal effect, or both, the exclusion restriction is violated.
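    To compare the endogeneity in the spatially lagged IV estimation with that in the OLS estimation, I first discuss the average treatment effect (ATE) in OLS, that is, the difference in mean outcomes between treated and untreated regions:
$$\begin{aligned}
\mathbb{E}[Y_i \mid X_i = 1] - \mathbb{E}[Y_i \mid X_i = 0] &= \mathbb{E}[Y_{1i} \mid X_i = 1] - \mathbb{E}[Y_{0i} \mid X_i = 1] + \mathbb{E}[Y_{0i} \mid X_i = 1] - \mathbb{E}[Y_{0i} \mid X_i = 0] \\
&= \mathbb{E}[Y_{1i} - Y_{0i} \mid X_i = 1] + \mathbb{E}[Y_{0i} \mid X_i = 1] - \mathbb{E}[Y_{0i} \mid X_i = 0]
\end{aligned} \qquad (4.10)$$
where $\mathbb{E}[Y_{1i} - Y_{0i} \mid X_i = 1]$ is the average treatment effect on the treated (ATT), the causal effect of interest, and $\mathbb{E}[Y_{0i} \mid X_i = 1] - \mathbb{E}[Y_{0i} \mid X_i = 0]$ is the selection bias, which is the source of the endogeneity in the OLS estimate.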
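The decomposition in (4.10) is an exact accounting identity, which a short Python simulation with known potential outcomes can confirm. The confounder `ability`, the binary latent-index rules, and all parameter values below are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

ability = rng.standard_normal(n)   # hypothetical unobserved confounder
# Binary potential outcomes from latent indices (cf. footnote 3); treatment raises the index by 0.8.
y0 = (ability + rng.standard_normal(n) > 0).astype(float)
y1 = (ability + 0.8 + rng.standard_normal(n) > 0).astype(float)
# Units with higher ability are more likely to be treated, so X is endogenous.
x = (ability + rng.standard_normal(n) > 0).astype(int)
y = np.where(x == 1, y1, y0)       # observed outcome

naive = y[x == 1].mean() - y[x == 0].mean()              # what OLS of Y on a constant and binary X estimates
att = (y1 - y0)[x == 1].mean()                           # E[Y1 - Y0 | X = 1]
selection_bias = y0[x == 1].mean() - y0[x == 0].mean()   # E[Y0 | X = 1] - E[Y0 | X = 0]

print(f"naive = {naive:.3f}, ATT = {att:.3f}, selection bias = {selection_bias:.3f}")
```

Because `ability` raises both the chance of treatment and the untreated outcome, the selection bias is positive and the naive comparison overstates the ATT.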
    In the spatially lagged IV estimation, following Angrist and Pischke (2009), the spatially local average treatment effect (SLATE) is
$$\frac{\mathbb{E}[Y_i \mid \mathbf{W}_i X_i = 1] - \mathbb{E}[Y_i \mid \mathbf{W}_i X_i = 0]}{\mathbb{E}[X_i \mid \mathbf{W}_i X_i = 1] - \mathbb{E}[X_i \mid \mathbf{W}_i X_i = 0]} = \mathbb{E}[Y_{1i} - Y_{0i} \mid X_{1i} > X_{0i}]$$
when both the exclusion restriction and the independence assumption are satisfied. This SLATE is the causal effect of interest.
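The Wald ratio above can be illustrated with a binary instrument standing in for $\mathbf{W}_i X_i$. In the hypothetical Python sketch below, the instrument is randomly assigned (independence), it does not enter the potential outcomes (exclusion), and no region is a "defier" ($X_{1i} \geq X_{0i}$), a monotonicity condition that the LATE theorem of Angrist and Pischke (2009) also relies on. The population shares and effect sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

z = rng.integers(0, 2, n)   # binary stand-in for W_i X_i, randomly assigned
# Compliance types fix the latent treatment states X0i and X1i (no defiers).
kind = rng.choice(np.array(["never", "complier", "always"]), size=n, p=[0.3, 0.5, 0.2])
x0 = (kind == "always").astype(int)
x1 = (kind != "never").astype(int)
x = x0 + (x1 - x0) * z      # (4.7) with a binary instrument

# Heterogeneous effects by type; z does not appear in y0 or y1 (exclusion restriction).
effect = np.select([kind == "never", kind == "complier", kind == "always"], [-1.0, 2.0, 5.0])
y0 = rng.standard_normal(n) + 1.0 * (kind == "always")
y1 = y0 + effect
y = np.where(x == 1, y1, y0)


def wald(y, x, z):
    """Reduced-form difference in means divided by first-stage difference in means."""
    return (y[z == 1].mean() - y[z == 0].mean()) / (x[z == 1].mean() - x[z == 0].mean())


late = effect[x1 > x0].mean()   # E[Y1 - Y0 | X1 > X0]; 2.0 by construction
print(f"Wald = {wald(y, x, z):.3f}, complier effect = {late:.3f}")
```

The Wald ratio recovers the compliers' effect (2.0) rather than the population-average effect, which also mixes in the always-takers and never-takers.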
    When there is inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the unobserved confounders, or between the explanatory variables and the disturbances in the spatial autocorrelation of the explanatory variables, or both, the independence assumption is violated. When there is no inter-regional direct or indirect causal effect of the explanatory variable on the explained variable, the exclusion restriction is satisfied. In this scenario,
$$\mathbb{E}[Y_i \mid \mathbf{W}_i X_i = 1] = \mathbb{E}\left[Y_i(0, \mathbf{W}_i X_i) + \left(Y_i(1, \mathbf{W}_i X_i) - Y_i(0, \mathbf{W}_i X_i)\right)X_{1i} \,\middle|\, \mathbf{W}_i X_i = 1\right] \qquad (4.11)$$
Because the exclusion restriction is satisfied, it is known that $Y_i(0, \mathbf{W}_i X_i) = Y_{0i}$ and $Y_i(1, \mathbf{W}_i X_i) = Y_{1i}$. Therefore,
$$\mathbb{E}[Y_i \mid \mathbf{W}_i X_i = 1] = \mathbb{E}[Y_{0i} + (Y_{1i} - Y_{0i})X_{1i} \mid \mathbf{W}_i X_i = 1] = \mathbb{E}[Y_{0i} \mid \mathbf{W}_i X_i = 1] + \mathbb{E}[(Y_{1i} - Y_{0i})X_{1i} \mid \mathbf{W}_i X_i = 1] \qquad (4.12)$$
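    Similarly,
$$\begin{aligned}
\mathbb{E}[Y_i \mid \mathbf{W}_i X_i = 0] &= \mathbb{E}\left[Y_i(0, \mathbf{W}_i X_i) + \left(Y_i(1, \mathbf{W}_i X_i) - Y_i(0, \mathbf{W}_i X_i)\right)X_{0i} \,\middle|\, \mathbf{W}_i X_i = 0\right] \\
&= \mathbb{E}[Y_{0i} + (Y_{1i} - Y_{0i})X_{0i} \mid \mathbf{W}_i X_i = 0] \\
&= \mathbb{E}[Y_{0i} \mid \mathbf{W}_i X_i = 0] + \mathbb{E}[(Y_{1i} - Y_{0i})X_{0i} \mid \mathbf{W}_i X_i = 0]
\end{aligned} \qquad (4.13)$$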
    As the exclusion restriction is satisfied, it is also known that
$$\mathbb{E}[Y_{0i} \mid \mathbf{W}_i X_i = 1] = \mathbb{E}[Y_{0i} \mid \mathbf{W}_i X_i = 0] \qquad (4.14)$$
    Therefore, the SLATE in the spatially lagged IV estimation becomes
$$\frac{\mathbb{E}[Y_i \mid \mathbf{W}_i X_i = 1] - \mathbb{E}[Y_i \mid \mathbf{W}_i X_i = 0]}{\mathbb{E}[X_i \mid \mathbf{W}_i X_i = 1] - \mathbb{E}[X_i \mid \mathbf{W}_i X_i = 0]} = \frac{\mathbb{E}[(Y_{1i} - Y_{0i})X_{1i} \mid \mathbf{W}_i X_i = 1] - \mathbb{E}[(Y_{1i} - Y_{0i})X_{0i} \mid \mathbf{W}_i X_i = 0]}{\mathbb{E}[X_{1i} \mid \mathbf{W}_i X_i = 1] - \mathbb{E}[X_{0i} \mid \mathbf{W}_i X_i = 0]} \qquad (4.15)$$
which I refer to as the "restricted spatially local ATT". As a result, the endogeneity of the spatially lagged IV estimate in this scenario stems from the "restricted spatially local ATT".
    Compared with the ATE in the OLS estimation, the SLATE in the spatially lagged IV estimation does not include a selection bias. This implies that the spatially lagged IV estimation suffers from less endogeneity than the OLS estimation, whose ATE includes the selection bias $\mathbb{E}[Y_{0i} \mid X_i = 1] - \mathbb{E}[Y_{0i} \mid X_i = 0]$.
    Because the independence assumption is violated, the latent outcomes $(Y_{1i} - Y_{0i})X_{1i}$ and $(Y_{1i} - Y_{0i})X_{0i}$ are not independent of the spatially lagged IV, $\mathbf{W}_i X_i$; specifically, $\mathbb{E}[(Y_{1i} - Y_{0i})X_{1i} \mid \mathbf{W}_i X_i = 1] \neq \mathbb{E}[(Y_{1i} - Y_{0i})X_{1i}]$ and $\mathbb{E}[(Y_{1i} - Y_{0i})X_{0i} \mid \mathbf{W}_i X_i = 0] \neq \mathbb{E}[(Y_{1i} - Y_{0i})X_{0i}]$. For the same reason, the latent treatment states $X_{1i}$ and $X_{0i}$ are not independent of $\mathbf{W}_i X_i$ either; specifically, $\mathbb{E}[X_{1i} \mid \mathbf{W}_i X_i = 1] \neq \mathbb{E}[X_{1i}]$ and $\mathbb{E}[X_{0i} \mid \mathbf{W}_i X_i = 0] \neq \mathbb{E}[X_{0i}]$. As a result, the SLATE cannot be simplified as
$$\frac{\mathbb{E}[Y_i \mid \mathbf{W}_i X_i = 1] - \mathbb{E}[Y_i \mid \mathbf{W}_i X_i = 0]}{\mathbb{E}[X_i \mid \mathbf{W}_i X_i = 1] - \mathbb{E}[X_i \mid \mathbf{W}_i X_i = 0]} = \mathbb{E}[Y_{1i} - Y_{0i} \mid X_{1i} > X_{0i}].$$
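To see numerically why this simplification fails once independence is violated, the hypothetical Python sketch below uses compliance types as in the LATE framework but lets a binary instrument (standing in for $\mathbf{W}_i X_i$) depend on a latent factor that also shifts $Y_{0i}$, so that $\mathbb{E}[Y_{0i} \mid \mathbf{W}_i X_i = 1] \neq \mathbb{E}[Y_{0i} \mid \mathbf{W}_i X_i = 0]$. The latent factor and all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

latent = rng.standard_normal(n)                          # hypothetical factor shared by Y0 and the instrument
z = (latent + rng.standard_normal(n) > 0).astype(int)    # instrument is no longer randomly assigned
kind = rng.choice(np.array(["never", "complier", "always"]), size=n, p=[0.3, 0.5, 0.2])
x0 = (kind == "always").astype(int)
x1 = (kind != "never").astype(int)
x = x0 + (x1 - x0) * z

effect = np.select([kind == "never", kind == "complier", kind == "always"], [-1.0, 2.0, 5.0])
y0 = latent + rng.standard_normal(n)                     # Y0 depends on the same latent factor
y1 = y0 + effect
y = np.where(x == 1, y1, y0)

first_stage = x[z == 1].mean() - x[z == 0].mean()
wald = (y[z == 1].mean() - y[z == 0].mean()) / first_stage
late = effect[x1 > x0].mean()                            # true complier effect, 2.0
print(f"Wald = {wald:.3f}, complier effect = {late:.3f}")
```

The reduced-form difference now also contains $\mathbb{E}[Y_{0i} \mid Z = 1] - \mathbb{E}[Y_{0i} \mid Z = 0]$, which the first stage then scales up, so the Wald ratio lands well above the complier effect.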
    When there is no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the unobserved confounders, nor between the explanatory variables and the disturbances in the spatial autocorrelation of the explanatory variables, the independence assumption is satisfied. However, when there is an inter-regional direct or indirect causal effect of the explanatory variable on the explained variable, the exclusion restriction is violated. In this scenario, to derive the SLATE,
$$\frac{\mathbb{E}[Y_i \mid \mathbf{W}_i X_i = 1] - \mathbb{E}[Y_i \mid \mathbf{W}_i X_i = 0]}{\mathbb{E}[X_i \mid \mathbf{W}_i X_i = 1] - \mathbb{E}[X_i \mid \mathbf{W}_i X_i = 0]},$$
note that
$$\mathbb{E}[Y_i \mid \mathbf{W}_i X_i = 1] = \mathbb{E}\left[Y_i(0, \mathbf{W}_i X_i) + \left(Y_i(1, \mathbf{W}_i X_i) - Y_i(0, \mathbf{W}_i X_i)\right)X_{1i} \,\middle|\, \mathbf{W}_i X_i = 1\right] \qquad (4.16)$$
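    Because the independence assumption is satisfied, it is known that
$$\mathbb{E}\left[Y_i(0, \mathbf{W}_i X_i) + \left(Y_i(1, \mathbf{W}_i X_i) - Y_i(0, \mathbf{W}_i X_i)\right)X_{1i} \,\middle|\, \mathbf{W}_i X_i = 1\right] = \mathbb{E}\left[Y_i(0, \mathbf{W}_i X_i) + \left(Y_i(1, \mathbf{W}_i X_i) - Y_i(0, \mathbf{W}_i X_i)\right)X_{1i}\right] \qquad (4.17)$$
    Therefore,
$$\mathbb{E}[Y_i \mid \mathbf{W}_i X_i = 1] = \mathbb{E}\left[Y_i(0, \mathbf{W}_i X_i) + \left(Y_i(1, \mathbf{W}_i X_i) - Y_i(0, \mathbf{W}_i X_i)\right)X_{1i}\right] \qquad (4.18)$$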
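    Similarly,
$$\mathbb{E}[Y_i \mid \mathbf{W}_i X_i = 0] = \mathbb{E}\left[Y_i(0, \mathbf{W}_i X_i) + \left(Y_i(1, \mathbf{W}_i X_i) - Y_i(0, \mathbf{W}_i X_i)\right)X_{0i} \,\middle|\, \mathbf{W}_i X_i = 0\right] = \mathbb{E}\left[Y_i(0, \mathbf{W}_i X_i) + \left(Y_i(1, \mathbf{W}_i X_i) - Y_i(0, \mathbf{W}_i X_i)\right)X_{0i}\right] \qquad (4.19)$$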
    In addition, satisfying the independence assumption implies that
$$\mathbb{E}[X_i \mid \mathbf{W}_i X_i = 1] - \mathbb{E}[X_i \mid \mathbf{W}_i X_i = 0] = \mathbb{E}[X_{1i} \mid \mathbf{W}_i X_i = 1] - \mathbb{E}[X_{0i} \mid \mathbf{W}_i X_i = 0] = \mathbb{E}[X_{1i} - X_{0i}]$$
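    Therefore, the SLATE in the spatially lagged IV estimation becomes
$$\begin{aligned}
\frac{\mathbb{E}[Y_i \mid \mathbf{W}_i X_i = 1] - \mathbb{E}[Y_i \mid \mathbf{W}_i X_i = 0]}{\mathbb{E}[X_i \mid \mathbf{W}_i X_i = 1] - \mathbb{E}[X_i \mid \mathbf{W}_i X_i = 0]}
&= \frac{\mathbb{E}\left[\left(Y_i(1, \mathbf{W}_i X_i) - Y_i(0, \mathbf{W}_i X_i)\right)X_{1i}\right] - \mathbb{E}\left[\left(Y_i(1, \mathbf{W}_i X_i) - Y_i(0, \mathbf{W}_i X_i)\right)X_{0i}\right]}{\mathbb{E}[X_{1i} - X_{0i}]} \\
&= \frac{\mathbb{E}\left[\left(Y_i(1, \mathbf{W}_i X_i) - Y_i(0, \mathbf{W}_i X_i)\right)(X_{1i} - X_{0i})\right]}{\mathbb{E}[X_{1i} - X_{0i}]} \\
&= \mathbb{E}\left[Y_i(1, \mathbf{W}_i X_i) - Y_i(0, \mathbf{W}_i X_i)\right] \\
&= \mathbb{E}[Y_{1i} \mid X_i = 1, \mathbf{W}_i X_i] - \mathbb{E}[Y_{0i} \mid X_i = 1, \mathbf{W}_i X_i] + \mathbb{E}[Y_{0i} \mid X_i = 1, \mathbf{W}_i X_i] - \mathbb{E}[Y_{0i} \mid X_i = 0, \mathbf{W}_i X_i] \\
&= \mathbb{E}[Y_{1i} - Y_{0i} \mid X_i = 1, \mathbf{W}_i X_i] + \mathbb{E}[Y_{0i} \mid X_i = 1, \mathbf{W}_i X_i] - \mathbb{E}[Y_{0i} \mid X_i = 0, \mathbf{W}_i X_i]
\end{aligned} \qquad (4.20)$$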
in which $\mathbb{E}[Y_{1i} - Y_{0i} \mid X_i = 1, \mathbf{W}_i X_i]$ is the "relaxed spatially local ATT" and $\mathbb{E}[Y_{0i} \mid X_i = 1, \mathbf{W}_i X_i] - \mathbb{E}[Y_{0i} \mid X_i = 0, \mathbf{W}_i X_i]$ is the "spatially local selection bias". As a result, the endogeneity of the spatially lagged IV estimate in this scenario stems from the "spatially local selection bias".
    Compared with the ATE in the OLS estimation, the SLATE in the spatially lagged IV estimation includes a spatially local selection bias, which could be greater than the selection bias in the OLS estimate. In that case, the spatially lagged IV estimation suffers from more endogeneity than the OLS estimation.
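A hypothetical Python sketch of this scenario with a binary instrument standing in for $\mathbf{W}_i X_i$: the instrument is randomly assigned, so independence holds, but it also shifts the outcome directly by `psi`, a stand-in for a nonzero direct effect. With a binary instrument the reduced form gains exactly `psi`, so the Wald ratio equals the complier effect plus `psi` divided by the first stage, and a weak first stage magnifies the bias. Parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
psi = 0.5                                                 # hypothetical direct effect of the instrument on Y

z = rng.integers(0, 2, n)                                 # randomly assigned: independence holds
kind = rng.choice(np.array(["never", "complier", "always"]), size=n, p=[0.3, 0.5, 0.2])
x0 = (kind == "always").astype(int)
x1 = (kind != "never").astype(int)
x = x0 + (x1 - x0) * z

effect = np.select([kind == "never", kind == "complier", kind == "always"], [-1.0, 2.0, 5.0])
y0 = rng.standard_normal(n)
y1 = y0 + effect
y_excl = np.where(x == 1, y1, y0)                         # exclusion restriction holds
y_viol = y_excl + psi * z                                 # exclusion restriction violated


def wald(y, x, z):
    """Reduced-form difference in means divided by first-stage difference in means."""
    return (y[z == 1].mean() - y[z == 0].mean()) / (x[z == 1].mean() - x[z == 0].mean())


first_stage = x[z == 1].mean() - x[z == 0].mean()
print(f"Wald (exclusion holds) = {wald(y_excl, x, z):.3f}")
print(f"Wald (exclusion fails) = {wald(y_viol, x, z):.3f}, predicted shift = {psi / first_stage:.3f}")
```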
    When there is inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the unobserved confounders, or between the explanatory variables and the disturbances in the spatial autocorrelation of the explanatory variables, or both, the independence assumption is violated. When, in addition, there is an inter-regional direct or indirect causal effect of the explanatory variable on the explained variable, the exclusion restriction is also violated. In this scenario, to derive the SLATE,
$$\frac{\mathbb{E}[Y_i \mid \mathbf{W}_i X_i = 1] - \mathbb{E}[Y_i \mid \mathbf{W}_i X_i = 0]}{\mathbb{E}[X_i \mid \mathbf{W}_i X_i = 1] - \mathbb{E}[X_i \mid \mathbf{W}_i X_i = 0]},$$
note that
$$\begin{aligned}
\mathbb{E}[Y_i \mid \mathbf{W}_i X_i = 1] &= \mathbb{E}\left[Y_i(0, \mathbf{W}_i X_i) + \left(Y_i(1, \mathbf{W}_i X_i) - Y_i(0, \mathbf{W}_i X_i)\right)X_i \,\middle|\, \mathbf{W}_i X_i = 1\right] \\
&= \mathbb{E}[Y_i(0, \mathbf{W}_i X_i) \mid \mathbf{W}_i X_i = 1] + \mathbb{E}\left[\left(Y_i(1, \mathbf{W}_i X_i) - Y_i(0, \mathbf{W}_i X_i)\right)X_i \,\middle|\, \mathbf{W}_i X_i = 1\right]
\end{aligned} \qquad (4.21)$$
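and similarly,
$$\begin{aligned}
\mathbb{E}[Y_i \mid \mathbf{W}_i X_i = 0] &= \mathbb{E}\left[Y_i(0, \mathbf{W}_i X_i) + \left(Y_i(1, \mathbf{W}_i X_i) - Y_i(0, \mathbf{W}_i X_i)\right)X_i \,\middle|\, \mathbf{W}_i X_i = 0\right] \\
&= \mathbb{E}[Y_i(0, \mathbf{W}_i X_i) \mid \mathbf{W}_i X_i = 0] + \mathbb{E}\left[\left(Y_i(1, \mathbf{W}_i X_i) - Y_i(0, \mathbf{W}_i X_i)\right)X_i \,\middle|\, \mathbf{W}_i X_i = 0\right]
\end{aligned} \qquad (4.22)$$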
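    Therefore, the SLATE becomes the sum of the "relaxed spatially local ATT" in the spatially lagged IV estimate, that is,
$$\frac{\mathbb{E}\left[\left(Y_i(1, \mathbf{W}_i X_i) - Y_i(0, \mathbf{W}_i X_i)\right)X_i \,\middle|\, \mathbf{W}_i X_i = 1\right] - \mathbb{E}\left[\left(Y_i(1, \mathbf{W}_i X_i) - Y_i(0, \mathbf{W}_i X_i)\right)X_i \,\middle|\, \mathbf{W}_i X_i = 0\right]}{\mathbb{E}[X_i \mid \mathbf{W}_i X_i = 1] - \mathbb{E}[X_i \mid \mathbf{W}_i X_i = 0]}$$
and the "spatially local selection bias" in the spatially lagged IV estimate, that is,
$$\frac{\mathbb{E}[Y_i(0, \mathbf{W}_i X_i) \mid \mathbf{W}_i X_i = 1] - \mathbb{E}[Y_i(0, \mathbf{W}_i X_i) \mid \mathbf{W}_i X_i = 0]}{\mathbb{E}[X_i \mid \mathbf{W}_i X_i = 1] - \mathbb{E}[X_i \mid \mathbf{W}_i X_i = 0]}.$$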
    As a result, when both the independence assumption and the exclusion restriction 
are violated, the endogeneity of the spatially lagged IV estimate is due to the “spatially 
local selection bias” and the “relaxed spatially local ATT” in the spatially lagged IV 
estimate. 
    Compared with the ATE in the OLS estimation, the SLATE in a spatially lagged IV estimate that violates both the independence assumption and the exclusion restriction includes a "spatially local selection bias", which could be greater than the selection bias in the OLS estimation. Moreover, this SLATE includes the "relaxed spatially local ATT", which differs from the "restricted spatially local ATT" that arises when only the independence assumption is violated. These results imply that the endogeneity of a spatially lagged IV estimate that violates both the independence assumption and the exclusion restriction is greater than that of a spatially lagged IV estimate that violates only the independence assumption, and could also be greater than that of OLS.
    To sum up, the OLS estimate suffers from endogeneity because its ATE contains the selection bias. When the spatially lagged IV estimate violates only the spatial independence assumption, it suffers from endogeneity because the "restricted spatially local ATT" in its SLATE differs from the ATT in the OLS estimate's ATE. When the spatially lagged IV estimate violates only the spatial exclusion restriction, it suffers from endogeneity because its SLATE contains the "spatially local selection bias". When the spatially lagged IV estimate violates both the spatial exclusion restriction and the spatial independence assumption, it suffers from endogeneity because, on the one hand, its SLATE contains the "spatially local selection bias" and, on the other, the "relaxed spatially local ATT" in its SLATE differs from the ATT in the OLS estimate's ATE.
 
4.3. The Numerical Spatially Local Average Treatment Effects (SLATE) 
    In this section, I characterize the spatially local average treatment effects (SLATE) of the spatially lagged IV estimates numerically and compare them with the average treatment effect (ATE) of the OLS estimates. I demonstrate the spatially local average treatment effect theorem numerically, focusing on its two key properties: the spatial independence assumption and the spatial exclusion restriction. I find that a valid spatially lagged IV should satisfy the spatial independence assumption; that is, the explanatory variables should satisfy both external and internal spatial exogeneity. It should also satisfy the spatial exclusion restriction in both its direct and its indirect form.
 
A. The Spatial Independence 
    The first key property of a valid spatially lagged IV is the Spatial Independence Assumption. This property entails (1) the external spatial exogeneity of the explanatory variables, which means that there is no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the unobserved confounders, and (2) the internal spatial exogeneity of the explanatory variables, which means that there is no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the explanatory variables. Together, external and internal exogeneity ensure that $\mathbf{W}\mathbf{X}$ is correlated neither with the latent outcome nor with the latent treatment condition.
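    The external exogeneity. To understand the external exogeneity of the spatially lagged IV, I start with the following standard spatial data generation process:
$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{U}\delta + \boldsymbol{\epsilon} \qquad (4.23)$$
$$\mathbf{U} = \rho\mathbf{W}\mathbf{U} + \boldsymbol{\gamma} \qquad (4.24)$$
$$\mathbf{X} = \tau\mathbf{W}\mathbf{X} + \boldsymbol{\eta} \qquad (4.25)$$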
where $\mathbf{U}$ is the unobserved confounder. $\mathbf{U}$ follows a spatial data generation process, and is derived as
$$\mathbf{U} = (\mathbf{I}_n - \rho\mathbf{W})^{-1}\boldsymbol{\gamma} \qquad (4.26)$$
where $(\mathbf{I}_n - \rho\mathbf{W})^{-1}$ is assumed to exist. This means that the endogeneity from the unobserved confounder is due to the spatial autocorrelation of the unobserved confounders.
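The role of $(\mathbf{I}_n - \rho\mathbf{W})^{-1}$ in (4.26) can be seen through its power-series expansion $\mathbf{I}_n + \rho\mathbf{W} + \rho^2\mathbf{W}^2 + \cdots$, which converges when $|\rho| < 1$ and $\mathbf{W}$ is row-standardized (so that its spectral radius is 1): a disturbance in one region's confounder propagates to its neighbors, their neighbors, and so on. The Python sketch below checks the expansion numerically and traces how a single shock spreads; the ring-shaped $\mathbf{W}$ and the value of $\rho$ are assumptions for illustration.

```python
import numpy as np

n, rho = 50, 0.5
idx = np.arange(n)
W = np.zeros((n, n))
W[idx, (idx - 1) % n] = 0.5      # hypothetical ring: two neighbours, weight 1/2 each
W[idx, (idx + 1) % n] = 0.5

I = np.eye(n)
exact = np.linalg.inv(I - rho * W)

# Truncated power series: sum_{k=0}^{59} (rho W)^k
series = np.zeros((n, n))
term = I.copy()
for _ in range(60):
    series += term
    term = rho * W @ term        # next power of rho W

# A unit shock to gamma in region 0 only: U = (I - rho W)^{-1} gamma.
gamma = np.zeros(n)
gamma[0] = 1.0
U = exact @ gamma
print(np.round(U[:4], 4))        # the effect fades with distance from region 0
```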
    $\mathbf{W}$ is the $n \times n$ spatial weighting matrix with elements $\mu_{ij}$. Let $\mu_{ii} = 0$, and assume that each row of the spatial weighting matrix sums to 1. $\mathbf{W}$ is also symmetric, that is, $\mathbf{W}' = \mathbf{W}$.
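Requiring $\mathbf{W}$ to be both symmetric and row-standardized is restrictive, since symmetry then forces every column to sum to 1 as well. The Python sketch below, with two hypothetical layouts, shows that row-standardizing a symmetric contiguity matrix generally breaks symmetry (here for four regions on a line), whereas a ring in which every region has the same number of neighbors satisfies all three conditions. The helper names are illustrative.

```python
import numpy as np


def row_standardize(C):
    """Divide each row of a contiguity matrix by its row sum (assumes every region has a neighbour)."""
    return C / C.sum(axis=1, keepdims=True)


def satisfies_text_assumptions(W):
    """Zero diagonal (mu_ii = 0), rows summing to 1, and W' = W."""
    return (np.allclose(np.diag(W), 0.0)
            and np.allclose(W.sum(axis=1), 1.0)
            and np.allclose(W, W.T))


# Hypothetical layout 1: four regions on a line, 1-2-3-4 (end regions have one neighbour).
line = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

# Hypothetical layout 2: four regions on a ring (every region has two neighbours).
ring = np.array([[0, 1, 0, 1],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [1, 0, 1, 0]], dtype=float)

print(satisfies_text_assumptions(row_standardize(line)))   # False: symmetry is lost
print(satisfies_text_assumptions(row_standardize(ring)))   # True
```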
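    Therefore, (4.23) becomes
$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \delta(\mathbf{I}_n - \rho\mathbf{W})^{-1}\boldsymbol{\gamma} + \boldsymbol{\epsilon} \qquad (4.27)$$
    Assumption 1 concerns the external exogeneity of the explanatory variables in the spatial estimation.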
 
ASSUMPTION 1 (The external spatial exogeneity): There is no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the unobserved confounders; specifically, $E(x_{ik}\gamma_j) = 0$ when $i \neq j$. There may, however, be intra-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the unobserved confounders; specifically, $E(x_{ik}\gamma_j) \neq 0$ when $i = j$.
 
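    Because $E(x_{ik}\gamma_j) \neq 0$ when $i = j$, $\hat{\boldsymbol{\beta}}$, the OLS estimate, is biased and inconsistent. Specifically,
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'[\mathbf{X}\boldsymbol{\beta} + \delta(\mathbf{I}_n - \rho\mathbf{W})^{-1}\boldsymbol{\gamma} + \boldsymbol{\epsilon}] = \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\delta(\mathbf{I}_n - \rho\mathbf{W})^{-1}\mathbf{X}'\boldsymbol{\gamma}$$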
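    Because Assumption 1 allows $E(x_{ik}\gamma_j) \neq 0$ when $i = j$, it follows that
$$E(\mathbf{X}'\boldsymbol{\gamma}) = E\left(\begin{bmatrix} x_{11} & \cdots & x_{i1} & \cdots & x_{N1} \\ \vdots & \ddots & \vdots & & \vdots \\ x_{1k} & \cdots & x_{ik} & \cdots & x_{Nk} \\ \vdots & & \vdots & \ddots & \vdots \\ x_{1K} & \cdots & x_{iK} & \cdots & x_{NK} \end{bmatrix} \begin{bmatrix} \gamma_1 \\ \vdots \\ \gamma_j \\ \vdots \\ \gamma_N \end{bmatrix}\right) = \left(\sum_{i=1}^{N} E(x_{ik}\gamma_i)\right)_{k=1}^{K} \neq \mathbf{0}$$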
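    Therefore, $E(\hat{\boldsymbol{\beta}}) \neq \boldsymbol{\beta}$.
    Similarly, $\operatorname*{plim}_{n\to\infty} \frac{\mathbf{X}'\boldsymbol{\gamma}}{n} \neq \mathbf{0}$. Therefore, $\operatorname*{plim}_{n\to\infty} \hat{\boldsymbol{\beta}} \neq \boldsymbol{\beta}$; in other words, the OLS estimate is biased and inconsistent. ∎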
    Given the external spatial exogeneity of the explanatory variables, Proposition 1 states the conditions that determine whether the spatially lagged IV estimate is endogenous, and how these conditions make the spatially lagged IV valid.
 
PROPOSITION 1 (The Special Spatial Independence Assumption): When there 
exists no inter-regional correlation between the explanatory variables and the 
disturbances in the spatial autocorrelation of the unobserved confounders, the spatially 
lagged IV estimate is unbiased and consistent. 
 
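Proof: Using $\mathbf{W}\mathbf{X}$, the spatially lagged explanatory variables, as the instrumental variables, namely the spatially lagged IV, and using $\mathbf{W}' = \mathbf{W}$, it is derived that
$$\hat{\boldsymbol{\beta}}_{IV} = [(\mathbf{W}\mathbf{X})'\mathbf{X}]^{-1}(\mathbf{W}\mathbf{X})'\mathbf{Y} = (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\mathbf{Y} = (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}[\mathbf{X}\boldsymbol{\beta} + \delta(\mathbf{I}_n - \rho\mathbf{W})^{-1}\boldsymbol{\gamma} + \boldsymbol{\epsilon}] = \boldsymbol{\beta} + (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\delta(\mathbf{I}_n - \rho\mathbf{W})^{-1}\mathbf{X}'\mathbf{W}\boldsymbol{\gamma}$$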
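    Because Assumption 1 states that $E(x_{ik}\gamma_j) = 0$ when $i \neq j$, and the weights $\mu_{ij}$ are fixed,
$$\begin{aligned}
E(\mathbf{X}'\mathbf{W}\boldsymbol{\gamma}) &= E\left(\begin{bmatrix} x_{11} & \cdots & x_{i1} & \cdots & x_{N1} \\ \vdots & \ddots & \vdots & & \vdots \\ x_{1k} & \cdots & x_{ik} & \cdots & x_{Nk} \\ \vdots & & \vdots & \ddots & \vdots \\ x_{1K} & \cdots & x_{iK} & \cdots & x_{NK} \end{bmatrix} \begin{bmatrix} \mu_{11} & \cdots & \mu_{1j} & \cdots & \mu_{1N} \\ \vdots & \ddots & \vdots & & \vdots \\ \mu_{i1} & \cdots & \mu_{ij} & \cdots & \mu_{iN} \\ \vdots & & \vdots & \ddots & \vdots \\ \mu_{N1} & \cdots & \mu_{Nj} & \cdots & \mu_{NN} \end{bmatrix} \begin{bmatrix} \gamma_1 \\ \vdots \\ \gamma_j \\ \vdots \\ \gamma_N \end{bmatrix}\right) = \left(\sum_{j=1}^{N}\sum_{i=1}^{N} \mu_{ij}E(x_{ik}\gamma_j)\right)_{k=1}^{K} \\
&= \left(\sum_{j=1}^{N}\sum_{i=1,\, i=j}^{N} \mu_{ij}E(x_{ik}\gamma_j) + \sum_{j=1}^{N}\sum_{i=1,\, i\neq j}^{N} \mu_{ij}E(x_{ik}\gamma_j)\right)_{k=1}^{K}
\end{aligned}$$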
    As is assumed, $\mu_{ij} = 0$ when $i = j$; therefore, $\sum_{j=1}^{N}\sum_{i=1,\, i=j}^{N} \mu_{ij}E(x_{ik}\gamma_j) = 0$ for every $k$. In addition, as Assumption 1 states, $E(x_{ik}\gamma_j) = 0$ when $i \neq j$; therefore, $\sum_{j=1}^{N}\sum_{i=1,\, i\neq j}^{N} \mu_{ij}E(x_{ik}\gamma_j) = 0$ for every $k$. Accordingly, $E(\mathbf{X}'\mathbf{W}\boldsymbol{\gamma}) = \mathbf{0}$, and thus $E(\hat{\boldsymbol{\beta}}_{IV}) = \boldsymbol{\beta}$.
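    Similarly, it is also known that $\operatorname*{plim}_{n\to\infty} \frac{\mathbf{X}'\mathbf{W}\boldsymbol{\gamma}}{n} = \mathbf{0}$. Therefore, $\operatorname*{plim}_{n\to\infty} \hat{\boldsymbol{\beta}}_{IV} = \boldsymbol{\beta}$; in other words, the spatially lagged IV estimate is unbiased and consistent. ∎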
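The two cancellations in this proof can be checked directly. With a single regressor ($K = 1$) and fixed weights, $E(\mathbf{X}'\mathbf{W}\boldsymbol{\gamma}) = \sum_{i}\sum_{j}\mu_{ij}E(x_i\gamma_j)$. In the hypothetical Python sketch below, `S[i, j]` stands for $E(x_i\gamma_j)$: when `S` is diagonal, as Assumption 1 permits, the sum is exactly zero because $\mu_{ii} = 0$; adding correlation between neighbors' disturbances and $x_i$ makes it nonzero. The ring-shaped $\mathbf{W}$, the covariance values, and the function name are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 6
idx = np.arange(n)
W = np.zeros((n, n))
W[idx, (idx - 1) % n] = 0.5     # hypothetical ring weights, mu_ii = 0
W[idx, (idx + 1) % n] = 0.5


def moment_XWgamma(W, S):
    """E(X'W gamma) = sum_ij mu_ij E(x_i gamma_j) for K = 1, with S[i, j] = E(x_i gamma_j)."""
    return float(np.sum(W * S))


# Assumption 1: only intra-regional correlation, so S is diagonal.
S_ok = np.diag(rng.uniform(0.5, 1.5, n))
# Violation: neighbours' disturbances are also correlated with x_i (off-diagonal 0.3).
S_bad = S_ok + 0.3 * (W > 0)

print(moment_XWgamma(W, S_ok))    # 0.0
print(moment_XWgamma(W, S_bad))   # approximately 0.3 * n = 1.8
```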
    A typical assumption in the spatial data generation process is that $\mu_{ij} > 0$ when observations $i$ and $j$ are neighbors and $\mu_{ij} = 0$ otherwise. Under this assumption, Proposition 1 still holds; for the proof, see the Appendix.
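A small Monte Carlo experiment can illustrate Proposition 1. The Python sketch below is an illustrative setup chosen for this purpose: it sets $\rho = 0$, so that $\mathbf{U} = \boldsymbol{\gamma}$ and the bias terms reduce to $\mathbf{X}'\boldsymbol{\gamma}$ and $\mathbf{X}'\mathbf{W}\boldsymbol{\gamma}$; it uses a symmetric ring for $\mathbf{W}$; and it builds $\boldsymbol{\gamma} = c(\mathbf{I}_n - \tau\mathbf{W})\boldsymbol{\eta} + \mathbf{v}$, which gives $E(x_i\gamma_j) = c$ when $i = j$ and $0$ otherwise, exactly the pattern Assumption 1 describes. Under these assumptions, the OLS estimate should be biased upward while the spatially lagged IV estimate should center on $\boldsymbol{\beta}$.

```python
import numpy as np


def ring_weights(n):
    """Hypothetical symmetric, row-standardized W: two ring neighbours with weight 1/2 (n >= 3)."""
    W = np.zeros((n, n))
    idx = np.arange(n)
    W[idx, (idx - 1) % n] = 0.5
    W[idx, (idx + 1) % n] = 0.5
    return W


def one_draw(W, beta, delta, tau, c, rng):
    """One sample from (4.23)-(4.25) with rho = 0; returns the (OLS, spatially lagged IV) estimates."""
    n = W.shape[0]
    I = np.eye(n)
    eta = rng.standard_normal(n)
    X = np.linalg.solve(I - tau * W, eta)                     # X = tau W X + eta
    # gamma = c (I - tau W) eta + v  =>  E(x_i gamma_j) = c if i == j, 0 otherwise (Assumption 1)
    gamma = c * (I - tau * W) @ eta + rng.standard_normal(n)
    U = gamma                                                 # rho = 0
    Y = beta * X + delta * U + rng.standard_normal(n)
    WX = W @ X
    b_ols = (X @ Y) / (X @ X)                                 # (X'X)^{-1} X'Y
    b_iv = (WX @ Y) / (WX @ X)                                # [(WX)'X]^{-1} (WX)'Y
    return b_ols, b_iv


rng = np.random.default_rng(6)
W = ring_weights(400)
draws = np.array([one_draw(W, beta=1.0, delta=1.0, tau=0.6, c=1.0, rng=rng) for _ in range(200)])
mean_ols, mean_iv = draws.mean(axis=0)
print(f"true beta = 1.0, mean OLS = {mean_ols:.3f}, mean spatially lagged IV = {mean_iv:.3f}")
```

Setting `c=0.0` removes the intra-regional correlation as well, in which case both estimators should center on the true value.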
    The implication of the external exogeneity is that the spatially lagged IV works as a random shock. Specifically, on the one hand, $\mathbf{W}\mathbf{X}$, the spatially lagged explanatory variables, has no correlation with $\boldsymbol{\gamma}$, the disturbances in the spatial autocorrelation of the unobserved confounders; therefore, $\mathbf{W}\mathbf{X}$ has no correlation with the latent outcome. On the other hand, (4.25) implies that $\mathbf{W}\mathbf{X}$ has a one-way causal effect on $\mathbf{X}$; therefore, $\mathbf{W}\mathbf{X}$ has no correlation with the latent treatment condition.
    The internal exogeneity. The standard spatial model only assumes that the explanatory variables have no inter-regional correlation with the disturbances in the spatial autocorrelation of the unobserved confounders. However, when the unobserved confounder is influenced not only by its own spatial lag but also by the explanatory variables, it is also necessary to assume that there is no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the explanatory variables themselves. This is because, when the unobserved confounder is influenced by the explanatory variables, the endogeneity from the unobserved confounder is due not only to the spatial autocorrelation of the unobserved confounders but also to the spatial autocorrelation of the explanatory variables.
    To discuss this internal exogeneity, I extend the standard spatial data generation process as
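$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{U}\delta + \boldsymbol{\epsilon} \qquad (4.28)$$
$$\mathbf{U} = \rho\mathbf{W}\mathbf{U} + \varphi\mathbf{X} + \boldsymbol{\gamma} \qquad (4.29)$$
$$\mathbf{X} = \tau\mathbf{W}\mathbf{X} + \boldsymbol{\eta} \qquad (4.30)$$
    $\mathbf{U}$ follows a spatial data generation process, and is derived as
$$\mathbf{U} = (\mathbf{I}_n - \rho\mathbf{W})^{-1}(\varphi\mathbf{X} + \boldsymbol{\gamma}) \qquad (4.31)$$
and $\mathbf{X}$ also follows a spatial data generation process, and is derived as
$$\mathbf{X} = (\mathbf{I}_n - \tau\mathbf{W})^{-1}\boldsymbol{\eta} \qquad (4.32)$$
where $(\mathbf{I}_n - \rho\mathbf{W})^{-1}$ and $(\mathbf{I}_n - \tau\mathbf{W})^{-1}$ are assumed to exist.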
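    Therefore, $\mathbf{U}$ becomes $(\mathbf{I}_n - \rho\mathbf{W})^{-1}[\varphi(\mathbf{I}_n - \tau\mathbf{W})^{-1}\boldsymbol{\eta} + \boldsymbol{\gamma}]$, which means that the endogeneity from the unobserved confounder is due not only to the spatial autocorrelation of the unobserved confounders but also to the spatial autocorrelation of the explanatory variables. Then $\mathbf{Y}$ becomes
$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \delta\left[(\mathbf{I}_n - \rho\mathbf{W})^{-1}\left(\varphi(\mathbf{I}_n - \tau\mathbf{W})^{-1}\boldsymbol{\eta} + \boldsymbol{\gamma}\right)\right] + \boldsymbol{\epsilon} \qquad (4.33)$$
    Assumption 2 concerns the internal exogeneity of the explanatory variables.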
 
ASSUMPTION 2 (The internal exogeneity): There is no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the explanatory variables themselves; specifically, $E(x_{ik}\eta_j) = 0$ when $i \neq j$. There may, however, be intra-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the explanatory variables themselves; specifically, $E(x_{ik}\eta_j) \neq 0$ when $i = j$.
 
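    Because $E(x_{ik}\eta_j) \neq 0$ when $i = j$, $\hat{\boldsymbol{\beta}}$, the OLS estimate, is biased and inconsistent. Specifically,
$$\begin{aligned}
\hat{\boldsymbol{\beta}} &= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\left[\mathbf{X}\boldsymbol{\beta} + \delta(\mathbf{I}_n - \rho\mathbf{W})^{-1}\left(\varphi(\mathbf{I}_n - \tau\mathbf{W})^{-1}\boldsymbol{\eta} + \boldsymbol{\gamma}\right) + \boldsymbol{\epsilon}\right] \\
&= \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\delta\left[(\mathbf{I}_n - \rho\mathbf{W})^{-1}\varphi(\mathbf{I}_n - \tau\mathbf{W})^{-1}\boldsymbol{\eta} + (\mathbf{I}_n - \rho\mathbf{W})^{-1}\boldsymbol{\gamma}\right] \\
&= \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\delta(\mathbf{I}_n - \rho\mathbf{W})^{-1}\varphi(\mathbf{I}_n - \tau\mathbf{W})^{-1}\mathbf{X}'\boldsymbol{\eta} + (\mathbf{X}'\mathbf{X})^{-1}\delta(\mathbf{I}_n - \rho\mathbf{W})^{-1}\mathbf{X}'\boldsymbol{\gamma}
\end{aligned}$$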
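    As discussed before, because Assumption 1 allows $E(x_{ik}\gamma_j) \neq 0$ when $i = j$, it is known that $E(\mathbf{X}'\boldsymbol{\gamma}) \neq \mathbf{0}$ and $\operatorname*{plim}_{n\to\infty} \frac{\mathbf{X}'\boldsymbol{\gamma}}{n} \neq \mathbf{0}$.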
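    Similarly, because Assumption 2 allows $E(x_{ik}\eta_j) \neq 0$ when $i = j$, it follows that
$$E(\mathbf{X}'\boldsymbol{\eta}) = E\left(\begin{bmatrix} x_{11} & \cdots & x_{i1} & \cdots & x_{N1} \\ \vdots & \ddots & \vdots & & \vdots \\ x_{1k} & \cdots & x_{ik} & \cdots & x_{Nk} \\ \vdots & & \vdots & \ddots & \vdots \\ x_{1K} & \cdots & x_{iK} & \cdots & x_{NK} \end{bmatrix} \begin{bmatrix} \eta_1 \\ \vdots \\ \eta_j \\ \vdots \\ \eta_N \end{bmatrix}\right) = \left(\sum_{i=1}^{N} E(x_{ik}\eta_i)\right)_{k=1}^{K} \neq \mathbf{0}$$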
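    A similar derivation shows that $\operatorname*{plim}_{n\to\infty} \frac{\mathbf{X}'\boldsymbol{\eta}}{n} \neq \mathbf{0}$. As a result, $E(\hat{\boldsymbol{\beta}}) \neq \boldsymbol{\beta}$ and $\operatorname*{plim}_{n\to\infty} \hat{\boldsymbol{\beta}} \neq \boldsymbol{\beta}$; in other words, the OLS estimate is biased and inconsistent. ∎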
    Given both the external and the internal exogeneity of the explanatory variables, Proposition 2 states the conditions that determine whether the spatially lagged IV estimate is endogenous, and how these conditions make the spatially lagged IV valid.
 
PROPOSITION 2 (The General Spatial Independence Assumption): When there is no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the unobserved confounders, and there is also no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the explanatory variables themselves, the spatially lagged IV estimate is unbiased and consistent.
 
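Proof: Using $\mathbf{W}\mathbf{X}$, the spatially lagged explanatory variables, as the instrumental variables, namely the spatially lagged IV, it is derived that
$$\begin{aligned}
\hat{\boldsymbol{\beta}}_{IV} &= [(\mathbf{W}\mathbf{X})'\mathbf{X}]^{-1}(\mathbf{W}\mathbf{X})'\mathbf{Y} = (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\mathbf{Y} \\
&= (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\left[\mathbf{X}\boldsymbol{\beta} + \delta(\mathbf{I}_n - \rho\mathbf{W})^{-1}(\varphi\mathbf{X} + \boldsymbol{\gamma}) + \boldsymbol{\epsilon}\right] \\
&= \boldsymbol{\beta} + (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\delta(\mathbf{I}_n - \rho\mathbf{W})^{-1}\mathbf{X}'\mathbf{W}\left[\varphi(\mathbf{I}_n - \tau\mathbf{W})^{-1}\boldsymbol{\eta} + \boldsymbol{\gamma}\right] \\
&= \boldsymbol{\beta} + (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\delta(\mathbf{I}_n - \rho\mathbf{W})^{-1}\mathbf{X}'\mathbf{W}\boldsymbol{\gamma} + (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\delta(\mathbf{I}_n - \rho\mathbf{W})^{-1}\varphi(\mathbf{I}_n - \tau\mathbf{W})^{-1}\mathbf{X}'\mathbf{W}\boldsymbol{\eta}
\end{aligned}$$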
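    As discussed before, because Assumption 1 states that $E(x_{ik}\gamma_j) = 0$ when $i \neq j$, it is known that $E(\mathbf{X}'\mathbf{W}\boldsymbol{\gamma}) = \mathbf{0}$ and $\operatorname*{plim}_{n\to\infty} \frac{\mathbf{X}'\mathbf{W}\boldsymbol{\gamma}}{n} = \mathbf{0}$.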
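    Similarly, because Assumption 2 states that $E(x_{ik}\eta_j) = 0$ when $i \neq j$, and the weights $\mu_{ij}$ are fixed,
$$\begin{aligned}
E(\mathbf{X}'\mathbf{W}\boldsymbol{\eta}) &= E\left(\begin{bmatrix} x_{11} & \cdots & x_{i1} & \cdots & x_{N1} \\ \vdots & \ddots & \vdots & & \vdots \\ x_{1k} & \cdots & x_{ik} & \cdots & x_{Nk} \\ \vdots & & \vdots & \ddots & \vdots \\ x_{1K} & \cdots & x_{iK} & \cdots & x_{NK} \end{bmatrix} \begin{bmatrix} \mu_{11} & \cdots & \mu_{1j} & \cdots & \mu_{1N} \\ \vdots & \ddots & \vdots & & \vdots \\ \mu_{i1} & \cdots & \mu_{ij} & \cdots & \mu_{iN} \\ \vdots & & \vdots & \ddots & \vdots \\ \mu_{N1} & \cdots & \mu_{Nj} & \cdots & \mu_{NN} \end{bmatrix} \begin{bmatrix} \eta_1 \\ \vdots \\ \eta_j \\ \vdots \\ \eta_N \end{bmatrix}\right) = \left(\sum_{j=1}^{N}\sum_{i=1}^{N} \mu_{ij}E(x_{ik}\eta_j)\right)_{k=1}^{K} \\
&= \left(\sum_{j=1}^{N}\sum_{i=1,\, i=j}^{N} \mu_{ij}E(x_{ik}\eta_j) + \sum_{j=1}^{N}\sum_{i=1,\, i\neq j}^{N} \mu_{ij}E(x_{ik}\eta_j)\right)_{k=1}^{K}
\end{aligned}$$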
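    As is assumed, $\mu_{ij} = 0$ when $i = j$; therefore, $\sum_{j=1}^{N}\sum_{i=1,\, i=j}^{N} \mu_{ij}E(x_{ik}\eta_j) = 0$ for every $k$. As Assumption 2 states, $E(x_{ik}\eta_j) = 0$ when $i \neq j$; therefore, $\sum_{j=1}^{N}\sum_{i=1,\, i\neq j}^{N} \mu_{ij}E(x_{ik}\eta_j) = 0$ for every $k$. Accordingly, $E(\mathbf{X}'\mathbf{W}\boldsymbol{\eta}) = \mathbf{0}$. Because both $E(\mathbf{X}'\mathbf{W}\boldsymbol{\gamma}) = \mathbf{0}$ and $E(\mathbf{X}'\mathbf{W}\boldsymbol{\eta}) = \mathbf{0}$, $E(\hat{\boldsymbol{\beta}}_{IV}) = \boldsymbol{\beta}$.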
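    A similar derivation also shows that $\operatorname*{plim}_{n\to\infty} \frac{\mathbf{X}'\mathbf{W}\boldsymbol{\eta}}{n} = \mathbf{0}$. Because $\operatorname*{plim}_{n\to\infty} \frac{\mathbf{X}'\mathbf{W}\boldsymbol{\gamma}}{n} = \mathbf{0}$ as well, $\operatorname*{plim}_{n\to\infty} \hat{\boldsymbol{\beta}}_{IV} = \boldsymbol{\beta}$; in other words, the spatially lagged IV estimate is unbiased and consistent. ∎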
    A typical assumption in the spatial data generation process is that $\mu_{ij} > 0$ when observations $i$ and $j$ are neighbors and $\mu_{ij} = 0$ otherwise. Under this assumption, Proposition 2 still holds; for the proof, see the Appendix.
    The internal exogeneity also implies that the spatially lagged IV works as a random shock. Specifically, $\mathbf{W}\mathbf{X}$, the spatially lagged explanatory variables, is correlated neither with $\boldsymbol{\gamma}$ nor with $\boldsymbol{\eta}$, the disturbances in the spatial autocorrelation of the explanatory variables; therefore, $\mathbf{W}\mathbf{X}$ has no correlation with the latent outcome. In addition, $\mathbf{W}\mathbf{X}$'s one-way causal effect on $\mathbf{X}$ means that $\mathbf{W}\mathbf{X}$ has no correlation with the latent treatment condition.
 
B. The Spatial Exclusion 
    The second key property of a valid spatially lagged IV is the Spatial Exclusion Restriction. This property entails (1) the direct spatial exclusion restriction, which means that the spatially lagged IV has no direct causal impact on the explained variable, and (2) the indirect spatial exclusion restriction, which means that the spatially lagged IV has no indirect causal impact on the explained variable. Together, the direct and the indirect spatial exclusion restrictions ensure that the spatially lagged IV influences the explained variable only through the explanatory variable, excluding all other channels of influence.
    The direct spatial exclusion restriction. To understand the direct spatial exclusion 
restriction of the spatially lagged IV, I add 𝝍𝝍𝑾𝑾𝑿𝑿 , the direct causal impact of the 
spatially lagged IV on the explained variable, to the standard spatial data generation 
process, such that 
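$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{U}\delta + \psi\mathbf{W}\mathbf{X} + \boldsymbol{\epsilon} \qquad (4.34)$$
$$\mathbf{U} = \rho\mathbf{W}\mathbf{U} + \boldsymbol{\gamma} \qquad (4.35)$$
$$\mathbf{X} = \tau\mathbf{W}\mathbf{X} + \boldsymbol{\eta} \qquad (4.36)$$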
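    To derive the OLS estimate, we have
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'[\mathbf{X}\boldsymbol{\beta} + \delta(\mathbf{I}_n - \rho\mathbf{W})^{-1}\boldsymbol{\gamma} + \psi\mathbf{W}\mathbf{X} + \boldsymbol{\epsilon}] = \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\delta(\mathbf{I}_n - \rho\mathbf{W})^{-1}\mathbf{X}'\boldsymbol{\gamma} + \psi(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\mathbf{X}$$
Because, as discussed under Assumption 1, $E(\mathbf{X}'\boldsymbol{\gamma}) \neq \mathbf{0}$, it follows that $E[(\mathbf{X}'\mathbf{X})^{-1}\delta(\mathbf{I}_n - \rho\mathbf{W})^{-1}\mathbf{X}'\boldsymbol{\gamma}] \neq \mathbf{0}$. Therefore, whether or not $\psi \neq 0$, $\hat{\boldsymbol{\beta}}$ is biased. As for the asymptotic property, as discussed under Assumption 1, $\operatorname*{plim}_{n\to\infty} \frac{\mathbf{X}'\boldsymbol{\gamma}}{n} \neq \mathbf{0}$. Therefore, whether or not $\psi \neq 0$, $\hat{\boldsymbol{\beta}}$ is inconsistent. ∎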
    Proposition 3 states the conditions that determine whether the spatially lagged IV estimate is endogenous, and how these conditions make the spatially lagged IV valid.
 
PROPOSITION 3 (The Direct Exclusion Restriction): When the spatially lagged 
IV has no direct causal impact on the explained variable, the spatially lagged IV 
estimate is unbiased and consistent. 
 
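Proof: Using $\mathbf{W}\mathbf{X}$, the spatially lagged explanatory variables, as the instrumental variables, namely the spatially lagged IV, it is derived that
$$\hat{\boldsymbol{\beta}}_{IV} = [(\mathbf{W}\mathbf{X})'\mathbf{X}]^{-1}(\mathbf{W}\mathbf{X})'\mathbf{Y} = (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\mathbf{Y} = (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}[\mathbf{X}\boldsymbol{\beta} + \delta(\mathbf{I}_n - \rho\mathbf{W})^{-1}\boldsymbol{\gamma} + \psi\mathbf{W}\mathbf{X} + \boldsymbol{\epsilon}] = \boldsymbol{\beta} + (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\delta(\mathbf{I}_n - \rho\mathbf{W})^{-1}\mathbf{X}'\mathbf{W}\boldsymbol{\gamma} + \psi(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\mathbf{W}\mathbf{X}$$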
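    Because Assumption 1 states that $E(x_{ik}\gamma_j) = 0$ when $i \neq j$, it is derived that $E(\mathbf{X}'\mathbf{W}\boldsymbol{\gamma}) = \mathbf{0}$ and that $\operatorname*{plim}_{n\to\infty} \frac{\mathbf{X}'\mathbf{W}\boldsymbol{\gamma}}{n} = \mathbf{0}$. Moreover, when $\psi = 0$, we have $\psi(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\mathbf{W}\mathbf{X} = \mathbf{0}$. In this case, $E(\hat{\boldsymbol{\beta}}_{IV}) = \boldsymbol{\beta}$ and $\operatorname*{plim}_{n\to\infty} \hat{\boldsymbol{\beta}}_{IV} = \boldsymbol{\beta}$. In other words, the spatially lagged IV estimate is unbiased and consistent. ∎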
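Under illustrative assumptions ($\rho = 0$, a symmetric ring for $\mathbf{W}$, and $\boldsymbol{\gamma} = c(\mathbf{I}_n - \tau\mathbf{W})\boldsymbol{\eta} + \mathbf{v}$, which makes $E(x_i\gamma_j)$ nonzero only when $i = j$, so that Assumption 1 holds), the hypothetical Python sketch below adds the direct effect $\psi\mathbf{W}\mathbf{X}$ from (4.34). Because each draw is evaluated with and without the direct effect, the difference between the two spatially lagged IV estimates equals the extra term $\psi(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\mathbf{W}\mathbf{X}$ from the proof, up to rounding.

```python
import numpy as np


def ring_weights(n):
    """Hypothetical symmetric, row-standardized W: two ring neighbours with weight 1/2 (n >= 3)."""
    W = np.zeros((n, n))
    idx = np.arange(n)
    W[idx, (idx - 1) % n] = 0.5
    W[idx, (idx + 1) % n] = 0.5
    return W


def iv_with_and_without_psi(W, beta, delta, psi, tau, c, rng):
    """Spatially lagged IV estimates for one draw of (4.34)-(4.36) with rho = 0, for psi = 0 and psi != 0."""
    n = W.shape[0]
    I = np.eye(n)
    eta = rng.standard_normal(n)
    X = np.linalg.solve(I - tau * W, eta)
    gamma = c * (I - tau * W) @ eta + rng.standard_normal(n)   # E(x_i gamma_j) = c only if i == j
    WX = W @ X
    y_excl = beta * X + delta * gamma + rng.standard_normal(n)  # psi = 0: direct exclusion holds
    y_psi = y_excl + psi * WX                                  # (4.34) with a direct effect
    b_excl = (WX @ y_excl) / (WX @ X)
    b_psi = (WX @ y_psi) / (WX @ X)
    extra = psi * (WX @ WX) / (WX @ X)                         # psi (X'WX)^{-1} X'WWX, using W' = W
    return b_excl, b_psi, extra


rng = np.random.default_rng(7)
W = ring_weights(400)
res = np.array([iv_with_and_without_psi(W, 1.0, 1.0, 0.5, 0.6, 1.0, rng) for _ in range(200)])
b_excl, b_psi, extra = res.T
print(f"mean IV, psi = 0: {b_excl.mean():.3f}; mean IV, psi = 0.5: {b_psi.mean():.3f}")
```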
    The implication of the direct exclusion restriction is that the spatially lagged IV influences the explained variable only through the explanatory variable, excluding any direct effect of the spatially lagged IV on the explained variable. Specifically, as shown in (4.34), when $\psi = 0$, the explained variable depends on the explanatory variable, the unobserved confounder, and the error term, but not directly on the spatially lagged IV.
    The indirect exclusion restriction. In addition to the direct spatial exclusion restriction, a valid spatially lagged IV should also satisfy the indirect spatial exclusion restriction. To see this, I add $\xi WX$, the indirect causal impact of the spatially lagged IV on the explained variable (that is, the impact of the spatially lagged IV on the unobserved confounder), to the standard spatial data generation process, such that
$$Y = X\beta + U\delta + \epsilon \qquad (4.37)$$
$$U = \rho WU + \xi WX + \gamma \qquad (4.38)$$
$$X = \tau WX + \eta \qquad (4.39)$$
    $U$ follows a spatial data generation process and can be written as
$$U = (I_n - \rho W)^{-1}(\xi WX + \gamma) \qquad (4.40)$$
and $X$ also follows a spatial data generation process and can be written as
$$X = (I_n - \tau W)^{-1}\eta \qquad (4.41)$$
where $(I_n - \rho W)^{-1}$ and $(I_n - \tau W)^{-1}$ are assumed to exist.
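The existence of these inverses is usually secured by restricting the spatial parameters. As an illustrative sketch, assume $W$ is row-standardized with nonnegative entries (every row sums to one, as in a hypothetical ring of regions). Then $|\rho| < 1$ makes the Neumann series $\sum_{k \geq 0} \rho^k W^k$ converge to $(I_n - \rho W)^{-1}$. The code builds the truncated series and reports how far $(I_n - \rho W)$ times it is from the identity; the ring layout and function names are assumptions.

```python
def ring_weights(n):
    """Hypothetical row-standardized ring: each region's two neighbours get weight 1/2."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        w[i][(i - 1) % n] = 0.5
        w[i][(i + 1) % n] = 0.5
    return w


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def neumann_inverse(w, rho, terms=200):
    """Truncated series sum_{k=0}^{terms-1} (rho W)^k, approximating (I_n - rho W)^{-1}."""
    n = len(w)
    total = identity(n)
    power = identity(n)  # holds (rho W)^k
    for _ in range(1, terms):
        power = [[rho * v for v in row] for row in matmul(power, w)]
        total = [[t + p for t, p in zip(t_row, p_row)] for t_row, p_row in zip(total, power)]
    return total


def inverse_error(w, rho, terms=200):
    """Largest absolute deviation of (I_n - rho W) times the truncated series from I_n."""
    n = len(w)
    a = [[(1.0 if i == j else 0.0) - rho * w[i][j] for j in range(n)] for i in range(n)]
    prod = matmul(a, neumann_inverse(w, rho, terms))
    eye = identity(n)
    return max(abs(prod[i][j] - eye[i][j]) for i in range(n) for j in range(n))
```

The truncation error shrinks like $|\rho|^{\text{terms}}$; for $|\rho| \geq 1$ the series need not converge, which is one reason the spatial parameter is commonly restricted.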
    Because the endogeneity from the unobserved confounder now stems not only from the spatial autocorrelation of the unobserved confounders but also from the spatial autocorrelation of the explanatory variables, substituting (4.41) into (4.40) gives $U = (I_n - \rho W)^{-1}[\xi W(I_n - \tau W)^{-1}\eta + \gamma]$, and then
$$Y = X\beta + \delta(I_n - \rho W)^{-1}[\xi W(I_n - \tau W)^{-1}\eta + \gamma] + \epsilon \qquad (4.42)$$
    Because $E(x_{ik}\gamma_j) \neq 0$ when $i = j$, as discussed in Assumption 1, the OLS estimate $\hat{\beta}$ is biased and inconsistent. Specifically,
$$\hat{\beta} = (X'X)^{-1}X'Y = (X'X)^{-1}X'\{X\beta + \delta(I_n - \rho W)^{-1}[\xi W(I_n - \tau W)^{-1}\eta + \gamma] + \epsilon\} = \beta + \delta(X'X)^{-1}(I_n - \rho W)^{-1}\xi W(I_n - \tau W)^{-1}X'\eta + \delta(X'X)^{-1}(I_n - \rho W)^{-1}X'\gamma$$
    On the one hand, as discussed in Assumption 1, $E(X'\gamma) \neq 0$; on the other, as discussed in Assumption 2, $E(X'\eta) \neq 0$. Hence, regardless of the value of $\xi$, $\hat{\beta}$ is biased. As for the asymptotic property, as discussed in Assumption 1, $\operatorname{plim}_{n\to\infty} X'\gamma/n \neq 0$, and, as discussed in Assumption 2, $\operatorname{plim}_{n\to\infty} X'\eta/n \neq 0$. Hence, regardless of the value of $\xi$, $\hat{\beta}$ is inconsistent. ∎
    Proposition 4 identifies the source of endogeneity in the spatially lagged IV estimate under this data generation process and the condition under which this endogeneity disappears, making the spatially lagged IV valid.
 
PROPOSITION 4 (The Indirect Exclusion Restriction): When the spatially lagged IV has no indirect causal impact on the explained variable, the spatially lagged IV estimate is unbiased and consistent.
 
Proof: Using $WX$, the spatially lagged explanatory variables, as the instrumental variables, namely the spatially lagged IV, and using the symmetry of $W$, we obtain
$$\hat{\beta}_{IV} = (X'WX)^{-1}X'WY = (X'WX)^{-1}X'W\{X\beta + \delta(I_n - \rho W)^{-1}[\xi W(I_n - \tau W)^{-1}\eta + \gamma] + \epsilon\} = \beta + \delta(X'WX)^{-1}(I_n - \rho W)^{-1}\xi W(I_n - \tau W)^{-1}X'W\eta + \delta(X'WX)^{-1}(I_n - \rho W)^{-1}X'W\gamma$$
    Given that Assumption 1 implies $E(x_{ik}\gamma_j) = 0$ when $i \neq j$, it follows that $E(X'W\gamma) = 0$ and $\operatorname{plim}_{n\to\infty} X'W\gamma/n = 0$. Given that Assumption 2 implies $E(x_{ik}\eta_j) = 0$ when $i \neq j$, it follows that $E(X'W\eta) = 0$ and $\operatorname{plim}_{n\to\infty} X'W\eta/n = 0$.
However, only when $\xi = 0$ do we have $\delta(X'WX)^{-1}(I_n - \rho W)^{-1}\xi W(I_n - \tau W)^{-1}X'W\eta = 0$ identically. In this case, $E(\hat{\beta}_{IV}) = \beta$ and $\operatorname{plim}_{n\to\infty}\hat{\beta}_{IV} = \beta$. In other words, the spatially lagged IV estimate is unbiased and consistent. ∎
    The implication of the indirect exclusion restriction is that the spatially lagged IV influences the explained variable only through the explanatory variable, which rules out an indirect effect of the spatially lagged IV on the explained variable. Specifically, as shown in (4.38), when $\xi = 0$, the unobserved confounder is influenced only by its own spatial lag, and thus the explained variable does not depend on the spatially lagged IV except through the explanatory variable.
 
C. The Spatially Local Average Treatment Effect 
    Section 4.2 discusses the local average treatment effect in the context of the spatially lagged IV. In this subsection, I introduce the Spatially Local Average Treatment Effect (SLATE) Theorem.
 
    Theorem 1: The Spatially Local Average Treatment Effect Theorem, which consists of the following four conditions:
1. (The Spatial Independence Assumption): External exogeneity means that there is no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the unobserved confounders, that is, $E(x_{ik}\gamma_j) = 0$ when $i \neq j$. Internal exogeneity means that there is no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the explanatory variables themselves, that is, $E(x_{ik}\eta_j) = 0$ when $i \neq j$. In this sense, the spatially lagged IV works as a random assignment:
$$[\{Y_i(x, \boldsymbol{w}\bar{x});\ \forall x, \boldsymbol{w}\bar{x}\}, X_{1i}, X_{0i}] \perp\!\!\!\perp WX \qquad (4.43)$$
 
2. (The Spatial Exclusion Restriction): The direct spatial exclusion restriction means that the spatially lagged IV has no direct causal impact on the explained variable, that is, $\psi = 0$. The indirect spatial exclusion restriction means that the spatially lagged IV has no indirect causal impact on the explained variable, that is, $\xi = 0$. In this sense, the spatially lagged IV influences the explained variable only through the endogenous explanatory variable:
$$Y_i(x, 0) = Y_i(x, 1), \quad x = 0, 1 \qquad (4.44)$$
 
3. (The Existence of a First Stage): The endogenous explanatory variable is correlated with the spatially lagged IV, that is, $\tau \neq 0$; in other words, the first stage of the 2SLS estimation exists:
$$E[X_{1i} - X_{0i}] \neq 0$$
 
4. (Monotonicity): $X_{1i} - X_{0i} \geq 0$ for all $i$.
    When these four conditions are satisfied, the spatially lagged IV estimate is unbiased and consistent.
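Conditions 3 and 4 can be read off the potential treatments $X_{1i}$ and $X_{0i}$ (treatment status with the spatially lagged IV switched on and off). The sketch below uses made-up potential-treatment pairs: it classifies each region as a complier, always-taker, never-taker or defier, checks monotonicity, and computes the first-stage quantity $E[X_{1i} - X_{0i}]$ as a sample average. In real data the two potential treatments are never observed together, so this only illustrates the definitions; the function names are hypothetical.

```python
def compliance_type(x1, x0):
    """Classify a region by its potential treatments (x1: IV on, x0: IV off)."""
    if x1 == 1 and x0 == 0:
        return "complier"
    if x1 == 1 and x0 == 1:
        return "always-taker"
    if x1 == 0 and x0 == 0:
        return "never-taker"
    return "defier"  # x1 == 0 and x0 == 1, ruled out by monotonicity


def check_conditions(pairs):
    """Return compliance types, whether monotonicity holds, and the first-stage mean."""
    types = [compliance_type(x1, x0) for x1, x0 in pairs]
    monotone = all(x1 - x0 >= 0 for x1, x0 in pairs)
    first_stage = sum(x1 - x0 for x1, x0 in pairs) / len(pairs)
    return types, monotone, first_stage
```

For the pairs (1, 0), (1, 1), (0, 0), (1, 0), monotonicity holds and the first-stage mean is 0.5; adding a defier can push the mean to zero even though some regions respond to the instrument, which is why the theorem needs both conditions.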
   
    The spatially local average treatment effect is therefore not the average causal effect for all regions, but only the average causal effect for the compliers, as discussed in detail in the next section.
 
4.4. The Dynamic Spatially Local Average Treatment Effects (SLATE) 
    This section numerically discusses the dynamic spatially local average treatment effect (SLATE). The spatially local average treatment effect is the average causal effect for the compliers, excluding the always-takers and the never-takers. Therefore, when using the spatially lagged IV method, a common empirical strategy for excluding the always-takers and the never-takers is to exclude the pioneer and straggler regions.
    More generally than the case of pioneers and stragglers, suppose the treatment spreads in several waves; then the spatial weighting matrix is no longer symmetric but partially asymmetric. In this case, satisfying the conditions of the SLATE theorem still makes the spatially lagged IV estimate unbiased and consistent; in other words, the SLATE theorem holds dynamically.
 
A. The Compliers of the Spatially Local Average Treatment Effect 
    In light of the Local Average Treatment Effect Theorem, each observation can be classified by its response to the spatially lagged IV as a complier ($X_{1i} = 1$ and $X_{0i} = 0$), an always-taker ($X_{1i} = 1$ and $X_{0i} = 1$), or a never-taker ($X_{1i} = 0$ and $X_{0i} = 0$). The spatially local average treatment effect is the average causal effect for the complier regions. However, what we are interested in is the average causal effect for the treated regions, $E[Y_{1i} - Y_{0i} \mid X_i = 1]$; the treated regions comprise the always-taker regions and the complier regions that take the treatment when $W_iX_i = 1$. When the spatial independence assumption is satisfied, that is, when the spatially lagged IV works as a random assignment, the average causal effect for the treated regions is
$$E[Y_{1i} - Y_{0i} \mid X_i = 1] = E[Y_{1i} - Y_{0i} \mid X_{0i} = 1]\cdot P[X_{0i} = 1 \mid X_i = 1] + E[Y_{1i} - Y_{0i} \mid X_{1i} > X_{0i}]\cdot P[X_{1i} > X_{0i}, W_iX_i = 1 \mid X_i = 1]$$
where $E[Y_{1i} - Y_{0i} \mid X_{0i} = 1]$ is the average causal effect for the always-takers and $E[Y_{1i} - Y_{0i} \mid X_{1i} > X_{0i}]$ is the average causal effect for the compliers that take the treatment when $W_iX_i = 1$. The two weights sum to one: $P[X_{0i} = 1 \mid X_i = 1] + P[X_{1i} > X_{0i}, W_iX_i = 1 \mid X_i = 1] = 1$.
    Similarly, the average causal effect for the untreated regions is
$$E[Y_{1i} - Y_{0i} \mid X_i = 0] = E[Y_{1i} - Y_{0i} \mid X_{1i} = 0]\cdot P[X_{1i} = 0 \mid X_i = 0] + E[Y_{1i} - Y_{0i} \mid X_{1i} > X_{0i}]\cdot P[X_{1i} > X_{0i}, W_iX_i = 0 \mid X_i = 0]$$
where $E[Y_{1i} - Y_{0i} \mid X_{1i} = 0]$ is the average causal effect for the never-takers and $E[Y_{1i} - Y_{0i} \mid X_{1i} > X_{0i}]$ is the average causal effect for the compliers that do not take the treatment because $W_iX_i = 0$. Again, $P[X_{1i} = 0 \mid X_i = 0] + P[X_{1i} > X_{0i}, W_iX_i = 0 \mid X_i = 0] = 1$.
    As a result, the average causal effect is
$$E[Y_{1i} - Y_{0i}] = E[Y_{1i} - Y_{0i} \mid X_i = 1]\cdot P[X_i = 1] + E[Y_{1i} - Y_{0i} \mid X_i = 0]\cdot P[X_i = 0]$$
which is a weighted average of the average causal effects for the compliers, the always-takers, and the never-takers.
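Because each of these decompositions is a weighted average, a small numeric sketch shows how far the SLATE (the complier effect) can sit from the average causal effect. The group effects and population shares below are hypothetical, and `p_on` is an assumed probability that a region's spatially lagged IV is switched on, taken to be independent of compliance type as the spatial independence assumption requires.

```python
def average_effect(effects, shares):
    """Share-weighted average of group-specific causal effects."""
    if abs(sum(shares.values()) - 1.0) > 1e-12:
        raise ValueError("shares must sum to one")
    return sum(effects[g] * shares[g] for g in effects)


def effect_on_treated(effects, shares, p_on):
    """Average effect among the treated: every always-taker plus the compliers
    whose spatially lagged IV is switched on (probability p_on)."""
    w_always = shares["always-taker"]
    w_complier = shares["complier"] * p_on
    return (effects["always-taker"] * w_always
            + effects["complier"] * w_complier) / (w_always + w_complier)
```

With complier, always-taker and never-taker effects of 2.0, 5.0 and 1.0, shares of 0.6, 0.2 and 0.2, and `p_on = 0.5`, the complier effect is 2.0 while the average causal effect is 2.4 and the effect on the treated is 3.2; the gap closes only when the always-taker and never-taker shares go to zero.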
    These derivations show that the spatially lagged IV estimate cannot distinguish the average causal effect for the compliers from those for the always-takers and the never-takers. Therefore, the spatially local average treatment effect (SLATE) is not the average causal effect unless there are neither always-takers nor never-takers.
 
B. Pioneers and Stragglers 
    In the spatial data generation process, a typical example of always-takers is the pioneers, and a typical example of never-takers is the stragglers. Consider a treatment (a policy, an institutional change, or an unexpected shock) that occurs in one region of a province or state but in no other neighboring or related region of that province or state; the treated region is then called a pioneer region. Similarly, if the treatment occurs in almost all regions of a province or state except one, the untreated region is called a straggler region.
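These definitions translate into a simple rule applied province by province. The sketch below assumes adoption is recorded as a 0/1 indicator per region together with a province label (both hypothetical inputs); it flags a region as a pioneer if it is the only treated region in its province, and as a straggler if it is the only untreated one. Which regions count as "related" (province, state, or another grouping) is a modelling choice left to the researcher.

```python
from collections import defaultdict


def flag_pioneers_stragglers(regions):
    """regions: iterable of (region_id, province, treated) with treated in {0, 1}."""
    by_province = defaultdict(list)
    for region_id, province, treated in regions:
        by_province[province].append((region_id, treated))
    pioneers, stragglers = [], []
    for members in by_province.values():
        if len(members) < 2:
            continue  # a lone region has no related regions to compare against
        n_treated = sum(treated for _, treated in members)
        for region_id, treated in members:
            if treated == 1 and n_treated == 1:
                pioneers.append(region_id)      # only treated region in its province
            elif treated == 0 and n_treated == len(members) - 1:
                stragglers.append(region_id)    # only untreated region in its province
    return sorted(pioneers), sorted(stragglers)
```

In a toy sample where province P1 has one treated region out of three and province P2 has one untreated region out of three, the function returns the single treated region of P1 as a pioneer and the single untreated region of P2 as a straggler.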
    An example of pioneer and straggler regions is the introduction of local direct elections for village leaders in rural China. In the late 1990s, the Chinese central government stipulated that village leaders be directly elected by local village residents, namely the local direct election. Before implementing this policy in all villages, each province in China selected a limited number of villages as pioneers to introduce the local direct election as a policy experiment. Therefore, the pioneer villages are always-takers in terms of the implementation of the local direct election, or specifically, $\{X_i = 1 \perp W_iX_i\}$. Similarly, after the policy had been implemented in almost all villages, a limited number of straggler villages in each province in China had still not introduced the local direct election. Therefore, the straggler villages are never-takers in terms of the implementation of the local direct election, or specifically, $\{X_i = 0 \perp W_iX_i\}$ (Martinez-Bravo et al., 2012; Martinez-Bravo et al., 2017; Wong et al., 2017).
    The implication of pioneers and stragglers is that, when using the spatially lagged IV method, excluding pioneers and stragglers makes the spatially local average treatment effect approximately equal to the average causal effect. Likewise, if pioneers and stragglers make up only a small share of the whole sample, the spatially local average treatment effect is also approximately equal to the average causal effect. Specifically, once pioneers and stragglers, the typical always-takers and never-takers, are excluded, every treated region is a complier with $W_iX_i = 1$ and every untreated region is a complier with $W_iX_i = 0$, so that
$$E[Y_{1i} - Y_{0i}] = E[Y_{1i} - Y_{0i} \mid X_{1i} > X_{0i}]\cdot P[X_i = 1] + E[Y_{1i} - Y_{0i} \mid X_{1i} > X_{0i}]\cdot P[X_i = 0] = E[Y_{1i} - Y_{0i} \mid X_{1i} > X_{0i}]$$
where $E[Y_{1i} - Y_{0i}]$ is the average causal effect, which equals the spatially local average treatment effect.
    A natural question is how the SLATE theorem deals with the fact that the spatially local average treatment effect need not equal the average causal effect; in other words, if no pioneers or stragglers are excluded, and the spatial independence assumption and the spatial exclusion restriction are satisfied, is the spatially lagged IV estimate still unbiased and consistent?
    Suppose no pioneers or stragglers are excluded; then the spatial weighting matrix is no longer symmetric. Specifically, define $\mu_{ij}$ as the spatial correlation from region $i$ on region $j$, and $\mu_{ji}$ as the spatial correlation from region $j$ on region $i$. Without loss of generality, let $i = 1$ represent the pioneer region and $i = N$ represent the straggler region. As a result, $\mu_{12} \neq \mu_{21}$, $\mu_{13} \neq \mu_{31}$, …, $\mu_{1N} \neq \mu_{N1}$: because of its “pioneering role” in spreading the treatment, the pioneer region’s spatial impact on other regions differs from their impact on it. Similarly, $\mu_{N1} \neq \mu_{1N}$, $\mu_{N2} \neq \mu_{2N}$, …, $\mu_{N,N-1} \neq \mu_{N-1,N}$: because of its “straggling role” in spreading the treatment, the straggler region’s spatial impact on other regions also differs from their impact on it.
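This asymmetry can be illustrated with a small matrix of $\mu_{ij}$ values. The construction below is hypothetical and not taken from the text: every pair of distinct regions in a province starts with weight 1, and the outgoing influence of the pioneer (region 1, index 0 in code) and of the straggler (region $N$, index $N-1$) is rescaled by assumed factors; the weights are not row-standardized. The checks confirm that the full matrix is asymmetric while the block that drops the pioneer and the straggler stays symmetric, which is the structure used in Corollaries 1 and 2.

```python
def province_weights(n, pioneer_scale=2.0, straggler_scale=0.5):
    """Hypothetical mu matrix: mu[i][j] is the influence of region i on region j (0-based).

    Distinct pairs start at 1, the diagonal is 0, and the pioneer's (index 0) and
    straggler's (index n - 1) outgoing weights are rescaled.
    """
    mu = [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
    for j in range(n):
        if j != 0:
            mu[0][j] *= pioneer_scale
        if j != n - 1:
            mu[n - 1][j] *= straggler_scale
    return mu


def is_symmetric(m, tol=1e-12):
    return all(abs(m[i][j] - m[j][i]) <= tol for i in range(len(m)) for j in range(len(m)))


def drop_regions(m, drop):
    """Sub-matrix after removing the regions whose indices are in `drop`."""
    keep = [i for i in range(len(m)) if i not in drop]
    return [[m[i][j] for j in keep] for i in keep]
```

With five regions, $\mu_{1N} = 2.0$ and $\mu_{N1} = 0.5$ under these assumed factors, yet dropping regions 1 and $N$ leaves a symmetric block.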
    Corollary 1 examines whether, under the external exogeneity part of the spatial independence assumption, the spatially lagged IV estimate that excludes pioneers and stragglers has the same properties as the one that includes them.
 
COROLLARY 1: When there is no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the unobserved confounders, the spatially lagged IV estimate is unbiased and consistent, whether or not pioneer and straggler regions are excluded.
 
Proof: Denote by $\dot{W}$ the spatial weighting matrix excluding pioneers and stragglers, by $\dot{X}$ the explanatory variables excluding pioneers and stragglers, and by $\dot{Y}$ the explained variable excluding pioneers and stragglers. Using the spatially lagged IV method, we obtain
$$\hat{\beta}_{IV} = [(\dot{W}\dot{X})'\dot{X}]^{-1}(\dot{W}\dot{X})'\dot{Y} = (\dot{X}'\dot{W}\dot{X})^{-1}\dot{X}'\dot{W}\dot{Y} = (\dot{X}'\dot{W}\dot{X})^{-1}\dot{X}'\dot{W}[\dot{X}\beta + \delta(I_n - \rho\dot{W})^{-1}\dot{\gamma} + \dot{\epsilon}] = \beta + \delta(\dot{X}'\dot{W}\dot{X})^{-1}(I_n - \rho\dot{W})^{-1}\dot{X}'\dot{W}\dot{\gamma}$$
Given that Assumption 1 implies $E(x_{ik}\gamma_j) = 0$ when $i \neq j$, consider the $k$-th element of $E(\dot{X}'\dot{W}\dot{\gamma})$, $k = 1, \dots, K$, where $x_{ik}$ is the $k$-th explanatory variable of region $i$ and the retained regions are $i, j = 2, \dots, N-1$:
$$E(\dot{X}'\dot{W}\dot{\gamma})_k = \sum_{i=2}^{N-1}\sum_{j=2}^{N-1}\mu_{ij}E(x_{ik}\gamma_j) = \sum_{i=2}^{N-1}\mu_{ii}E(x_{ik}\gamma_i) + \sum_{i=2}^{N-1}\sum_{\substack{j=2\\ j\neq i}}^{N-1}\mu_{ij}E(x_{ik}\gamma_j)$$
    As is assumed, $\mu_{ij} = 0$ when $i = j$, so the first sum is zero. In addition, as Assumption 1 states, $E(x_{ik}\gamma_j) = 0$ when $i \neq j$, so the second sum is zero. Accordingly, $E(\dot{X}'\dot{W}\dot{\gamma}) = 0$, and thus $E(\hat{\beta}_{IV}) = \beta$.
    Similarly, $\operatorname{plim}_{n\to\infty}\dot{X}'\dot{W}\dot{\gamma}/n = 0$. Therefore, $\operatorname{plim}_{n\to\infty}\hat{\beta}_{IV} = \beta$; in other words, the spatially lagged IV estimate excluding pioneers and stragglers is unbiased and consistent.
    Now suppose pioneers and stragglers are included, so that the spatial weighting matrix is asymmetric, that is, $W \neq W'$. Using the spatially lagged IV method, we obtain
$$\hat{\beta}_{IV} = [(WX)'X]^{-1}(WX)'Y = (X'W'X)^{-1}X'W'Y = (X'W'X)^{-1}X'W'[X\beta + \delta(I_n - \rho W)^{-1}\gamma + \epsilon] = \beta + \delta(X'W'X)^{-1}(I_n - \rho W)^{-1}X'W'\gamma$$
    Given that Assumption 1 implies $E(x_{ik}\gamma_j) = 0$ when $i \neq j$, the $k$-th element of $E(X'W'\gamma)$ now runs over all $N$ regions. It can be split into the terms among regions $2, \dots, N-1$ and the remaining terms, indexed by the set $\mathcal{P}$ of pairs $(i, j)$ in which $i$ or $j$ is the pioneer (region 1) or the straggler (region $N$):
$$E(X'W'\gamma)_k = \sum_{i=1}^{N}\sum_{j=1}^{N}\mu_{ij}E(x_{ik}\gamma_j) = \sum_{i=2}^{N-1}\mu_{ii}E(x_{ik}\gamma_i) + \sum_{i=2}^{N-1}\sum_{\substack{j=2\\ j\neq i}}^{N-1}\mu_{ij}E(x_{ik}\gamma_j) + \sum_{(i,j)\in\mathcal{P}}\mu_{ij}E(x_{ik}\gamma_j)$$
    As is assumed, $\mu_{ij} = 0$ when $i = j$, so the first sum is zero, as are the diagonal pairs $(1,1)$ and $(N,N)$ in $\mathcal{P}$. In addition, as Assumption 1 states, $E(x_{ik}\gamma_j) = 0$ when $i \neq j$, so the second sum and the off-diagonal pairs in $\mathcal{P}$ are zero as well, however asymmetric the weights involving the pioneer and the straggler may be. Accordingly, $E(X'W'\gamma) = 0$, and thus $E(\hat{\beta}_{IV}) = \beta$.
    Similarly, $\operatorname{plim}_{n\to\infty} X'W'\gamma/n = 0$. Therefore, $\operatorname{plim}_{n\to\infty}\hat{\beta}_{IV} = \beta$; in other words, the spatially lagged IV estimate including pioneers and stragglers is also unbiased and consistent. ∎
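The two-part argument in this proof (diagonal terms vanish because $\mu_{ii} = 0$; off-diagonal terms vanish because $E(x_{ik}\gamma_j) = 0$ for $i \neq j$) never uses symmetry of the weights. The sketch below makes this concrete for one regressor ($K = 1$): it evaluates $\sum_i \sum_j \mu_{ij} E(x_i\gamma_j)$ for a hypothetical asymmetric weight matrix with a zero diagonal and a hypothetical cross-moment matrix $E(x_i\gamma_j)$. The numbers are placeholders; only the zero pattern matters.

```python
def expected_cross_term(mu, cross_moment):
    """Split sum_i sum_j mu[i][j] * E(x_i gamma_j) (one regressor, K = 1) into
    its diagonal and off-diagonal parts, mirroring the proof of Corollary 1."""
    n = len(mu)
    diagonal = sum(mu[i][i] * cross_moment[i][i] for i in range(n))
    off_diagonal = sum(mu[i][j] * cross_moment[i][j]
                       for i in range(n) for j in range(n) if i != j)
    return diagonal, off_diagonal, diagonal + off_diagonal
```

With an asymmetric, zero-diagonal matrix such as `[[0, 2, 1], [0.5, 0, 1], [3, 1, 0]]` and a cross-moment matrix that is nonzero only on its diagonal, all three parts are zero. Setting a single off-diagonal cross-moment to 0.4 (a violation of Assumption 1) makes the off-diagonal part nonzero, which is the case the proof rules out.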
    Corollary 2 examines whether, under the internal exogeneity part of the spatial independence assumption, the spatially lagged IV estimate that excludes pioneers and stragglers has the same properties as the one that includes them.
 
COROLLARY 2: When there is no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the explanatory variables, the spatially lagged IV estimate is unbiased and consistent, whether or not pioneer and straggler regions are excluded.
 
Proof: Denote by $\dot{W}$ the spatial weighting matrix excluding pioneers and stragglers, by $\dot{X}$ the explanatory variables excluding pioneers and stragglers, and by $\dot{Y}$ the explained variable excluding pioneers and stragglers. Using the spatially lagged IV method, we obtain
$$\hat{\beta}_{IV} = [(\dot{W}\dot{X})'\dot{X}]^{-1}(\dot{W}\dot{X})'\dot{Y} = (\dot{X}'\dot{W}\dot{X})^{-1}\dot{X}'\dot{W}\dot{Y} = (\dot{X}'\dot{W}\dot{X})^{-1}\dot{X}'\dot{W}[\dot{X}\beta + \delta(I_n - \rho\dot{W})^{-1}(\varphi\dot{X} + \dot{\gamma}) + \dot{\epsilon}]$$
$$= \beta + \delta(\dot{X}'\dot{W}\dot{X})^{-1}(I_n - \rho\dot{W})^{-1}\dot{X}'\dot{W}[\varphi(I_n - \tau\dot{W})^{-1}\dot{\eta} + \dot{\gamma}]$$
$$= \beta + \delta(\dot{X}'\dot{W}\dot{X})^{-1}(I_n - \rho\dot{W})^{-1}\dot{X}'\dot{W}\dot{\gamma} + \delta(\dot{X}'\dot{W}\dot{X})^{-1}(I_n - \rho\dot{W})^{-1}\varphi(I_n - \tau\dot{W})^{-1}\dot{X}'\dot{W}\dot{\eta}$$
    As discussed above, because Assumption 1 implies $E(x_{ik}\gamma_j) = 0$ when $i \neq j$, we have $E(\dot{X}'\dot{W}\dot{\gamma}) = 0$ and $\operatorname{plim}_{n\to\infty}\dot{X}'\dot{W}\dot{\gamma}/n = 0$.
    Similarly, because Assumption 2 implies $E(x_{ik}\eta_j) = 0$ when $i \neq j$, the $k$-th element of $E(\dot{X}'\dot{W}\dot{\eta})$, $k = 1, \dots, K$, over the retained regions $i, j = 2, \dots, N-1$ is
$$E(\dot{X}'\dot{W}\dot{\eta})_k = \sum_{i=2}^{N-1}\sum_{j=2}^{N-1}\mu_{ij}E(x_{ik}\eta_j) = \sum_{i=2}^{N-1}\mu_{ii}E(x_{ik}\eta_i) + \sum_{i=2}^{N-1}\sum_{\substack{j=2\\ j\neq i}}^{N-1}\mu_{ij}E(x_{ik}\eta_j)$$
    As is assumed, $\mu_{ij} = 0$ when $i = j$, so the first sum is zero. In addition, as Assumption 2 states, $E(x_{ik}\eta_j) = 0$ when $i \neq j$, so the second sum is zero. Accordingly, $E(\dot{X}'\dot{W}\dot{\eta}) = 0$, and thus $E(\hat{\beta}_{IV}) = \beta$.
    Similarly, $\operatorname{plim}_{n\to\infty}\dot{X}'\dot{W}\dot{\eta}/n = 0$. Therefore, $\operatorname{plim}_{n\to\infty}\hat{\beta}_{IV} = \beta$; in other words, the spatially lagged IV estimate excluding pioneers and stragglers is unbiased and consistent.
    Now suppose pioneers and stragglers are included, so that the spatial weighting matrix is asymmetric, that is, $W \neq W'$. Using the spatially lagged IV method, we obtain
$$\hat{\beta}_{IV} = [(WX)'X]^{-1}(WX)'Y = (X'W'X)^{-1}X'W'Y = (X'W'X)^{-1}X'W'[X\beta + \delta(I_n - \rho W)^{-1}(\varphi X + \gamma) + \epsilon]$$
$$= \beta + \delta(X'W'X)^{-1}(I_n - \rho W)^{-1}X'W'[\varphi(I_n - \tau W)^{-1}\eta + \gamma]$$
$$= \beta + \delta(X'W'X)^{-1}(I_n - \rho W)^{-1}X'W'\gamma + \delta(X'W'X)^{-1}(I_n - \rho W)^{-1}\varphi(I_n - \tau W)^{-1}X'W'\eta$$
    As discussed above, because Assumption 1 implies $E(x_{ik}\gamma_j) = 0$ when $i \neq j$, we have $E(X'W'\gamma) = 0$ and $\operatorname{plim}_{n\to\infty} X'W'\gamma/n = 0$.
    Similarly, because Assumption 2 implies $E(x_{ik}\eta_j) = 0$ when $i \neq j$, the $k$-th element of $E(X'W'\eta)$ runs over all $N$ regions and can be split into the terms among regions $2, \dots, N-1$ and the remaining terms, indexed by the set $\mathcal{P}$ of pairs $(i, j)$ in which $i$ or $j$ is the pioneer (region 1) or the straggler (region $N$):
$$E(X'W'\eta)_k = \sum_{i=1}^{N}\sum_{j=1}^{N}\mu_{ij}E(x_{ik}\eta_j) = \sum_{i=2}^{N-1}\mu_{ii}E(x_{ik}\eta_i) + \sum_{i=2}^{N-1}\sum_{\substack{j=2\\ j\neq i}}^{N-1}\mu_{ij}E(x_{ik}\eta_j) + \sum_{(i,j)\in\mathcal{P}}\mu_{ij}E(x_{ik}\eta_j)$$
    As is assumed, $\mu_{ij} = 0$ when $i = j$, so the first sum is zero, as are the diagonal pairs $(1,1)$ and $(N,N)$ in $\mathcal{P}$. In addition, as Assumption 2 states, $E(x_{ik}\eta_j) = 0$ when $i \neq j$, so the second sum and the off-diagonal pairs in $\mathcal{P}$ are zero as well. Accordingly, $E(X'W'\eta) = 0$, and thus $E(\hat{\beta}_{IV}) = \beta$.
    Similarly, $\operatorname{plim}_{n\to\infty} X'W'\eta/n = 0$. Therefore, $\operatorname{plim}_{n\to\infty}\hat{\beta}_{IV} = \beta$; in other words, the spatially lagged IV estimate including pioneers and stragglers is also unbiased and consistent. ∎
    To sum up, if the spatial independence assumption holds, together with the spatial exclusion restriction where relevant, the spatially lagged IV estimate is unbiased and consistent whether or not pioneers and stragglers are excluded.
 
C. The Numerical Dynamic Spatially Local Average Treatment Effect (SLATE) 
    A natural generalization of pioneers and stragglers is a treatment implemented across an area in multiple waves, which gives rise to the dynamic spatially local average treatment effect (SLATE). As in the pioneer and straggler case, the spatial weighting matrix is asymmetric. Without loss of generality, suppose the treatment is implemented in three waves. Let $i = 1, \dots, P$ represent the regions in the first wave, $i = P+1, \dots, S-1$ the regions in the second wave, and $i = S, \dots, N$ the regions in the third and last wave.
    Corollary 3 examines whether, under the external exogeneity part of the spatial independence assumption, the spatially lagged IV estimate remains unbiased and consistent when the treatment is implemented in multiple waves (without loss of generality, three waves).
 
COROLLARY 3: When there exists no inter-regional correlation between the 
explanatory variables and the disturbances in the spatial autocorrelation of the 
unobserved confounders, the spatially lagged IV estimate is unbiased and consistent, 
even if there exist multiple waves of implementation of the treatment. 
 
Proof: See the Appendix. ∎ 
    Corollary 4 examines whether, under the internal exogeneity part of the spatial independence assumption, the spatially lagged IV estimate remains unbiased and consistent when the treatment is implemented in multiple waves (without loss of generality, three waves).
 
COROLLARY 4: When there exists no inter-regional correlation between the 
explanatory variables and the disturbances in the spatial autocorrelation of the 
explanatory variable, the spatially lagged IV estimate is unbiased and consistent, even 
if the treatment has multiple waves of implementation. 
 
Proof: See the Appendix. ∎ 
    Accordingly, Theorem 2 extends the SLATE theorem to a treatment implemented in multiple waves:
 
    Theorem 2: The Dynamic Spatially Local Average Treatment Effect (SLATE) Theorem, which states that if the conditions of the SLATE theorem hold, including the spatial independence assumption and, where relevant, the spatial exclusion restriction, the spatially lagged IV estimate is unbiased and consistent even if the treatment is implemented in multiple waves.
    The implication of the dynamic SLATE theorem is that, because time-varying and region-varying unobserved factors are absorbed into the random errors of the spatial autocorrelation of either the unobserved confounder or the explanatory variable, the endogenous explanatory variable is uncorrelated with those unobserved factors, as implied by the spatial independence assumption. In short, the SLATE theorem holds when the treatment is implemented dynamically.
 
4.5. Conclusion 
    This paper introduces the spatially local average treatment effect (SLATE) theorem to examine the validity of the spatially lagged IV. It finds that when the spatially lagged IV satisfies the spatial independence assumption, including external and internal exogeneity, and the spatial exclusion restriction, both direct and indirect, the spatially lagged IV estimate is unbiased and consistent. Even if the treatment is implemented in multiple waves, with pioneers and stragglers as a distinct example, the spatially lagged IV method remains valid, a result captured by the dynamic spatially local average treatment effect. These findings imply that using the spatially lagged explanatory variable as the IV helps address endogeneity, especially when other valid IVs are lacking.
    Most applied researchers pay sufficient attention to the exclusion restriction in the LATE framework, yet tend to ignore the independence assumption (Wang and Bellemare, 2020). In both the LATE and the SLATE frameworks, it is relatively easy to assess whether the exclusion restriction is satisfied in a given data generation process, especially on theoretical grounds. However, it is much harder to assess whether the independence assumption holds, because “working as a random assignment” is abstract. Fortunately, the spatial independence assumption makes this assessment easier, because it restricts the discussion to the spatial dimension, namely external and internal exogeneity.
    It is usually challenging to use observational data, rather than experimental data, to identify the treatment effect of explanatory variables (Angrist and Krueger, 2001; Freedman, 2005). On the other hand, causal identification based on experimental data usually lacks underlying theoretical relationships (Rosenzweig and Wolpin, 2000). This paper demonstrates that the spatially lagged IV is valid under the conditions above and requires no additional data. As spatial data sets accumulate, empirical studies could suffer less from endogeneity concerns.
References 
 
Alesina, Alberto, and Guido Tabellini. 2007. “Bureaucrats or Politicians? Part I: A Single 
Policy Task.” American Economic Review 97 (1): 169-179. 
Anderson, Theodore Wilbur, and Cheng Hsiao. 1981. “Estimation of Dynamic Models with Error Components.” Journal of the American Statistical Association 76 (375): 598-606.
Angrist, Joshua D., Guido W. Imbens, and Donald B. Rubin. 1996. “Identification of Causal Effects Using Instrumental Variables.” Journal of the American Statistical Association 91 (434): 444-455.
Angrist, Joshua D., and Alan B. Krueger. 2001. “Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments.” Journal of Economic Perspectives 15 (4): 69-85.
Angrist, Joshua D., and Kevin Lang. 2004. “Does School Integration Generate Peer Effects? Evidence from Boston’s Metco Program.” American Economic Review 94 (5): 1613-1634.
Angrist, Joshua, and Jörn-Steffen Pischke. 2009. Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton, NJ: Princeton University Press.
Bai, Ying, and Ruixue Jia. 2016. “Elite Recruitment and Political Stability: The Impact of 
the Abolition of China’s Civil Service Exam.” Econometrica 84 (2): 677-733. 
Baker, Wayne E., and Robert R. Faulkner. 1993. “The Social Organization of Conspiracy: Illegal Networks in the Heavy Electrical Equipment Industry.” American Sociological Review 58 (6): 837-860.
Bell, Daniel A. 2016. The China Model: Political Meritocracy and the Limits of 
Democracy. Princeton, NJ: Princeton University Press.  
Bellemare, Marc F., Takaaki Masaki, and Thomas B. Pepinsky. 2017. “Lagged Explanatory Variables and the Estimation of Causal Effect.” The Journal of Politics 79 (3): 949-963.
Besley, Timothy. 2005. “Political Selection.” Journal of Economic Perspectives 19 (3): 
43-60. 
Besley, Timothy. 2006. Principled Agents: The Political Economy of Good Government. 
New York, NY: Oxford University Press on Demand.  
Besley, Timothy, Jose G. Montalvo, and Marta Reynal-Querol. 2011. “Do Educated 
Leaders Matter?” The Economic Journal 121 (554): F205-227. 
Bloom, Nicholas, Renata Lemos, Raffaella Sadun, and John Van Reenen. 2015. “Does 
Management Matter in Schools?” The Economic Journal 125 (584): 647-674. 
Blundell, Richard, and Stephen Bond. 1998. “Initial Conditions and Moment Restrictions in Dynamic Panel Data Models.” Journal of Econometrics 87 (1): 115-143.
Blundell, Richard, and Stephen Bond. 2000. “GMM Estimation with Persistent Panel Data: An Application to Production Functions.” Econometric Reviews 19 (3): 321-340.
Bronars, Stephen G., and Jeff Grogger. 1994. “The Economic Consequences of Unwed Motherhood: Using Twin Births as a Natural Experiment.” The American Economic Review: 1141-1156.
Central Committee of the Chinese Communist Party. 1999. Working Regulation on the 
Rural Grassroots Organizations of the Chinese Communist Party (WRRGOCCP). 
(In Chinese) 
    http://news.12371.cn/2015/03/11/ARTI1426061212036535.shtml 
Chan, Joseph. 2013. “Political Meritocracy and Meritorious Rule: A Confucian Perspective.” In The East Asian Challenge for Democracy: Political Meritocracy in Comparative Perspective, edited by Daniel A. Bell and Chenyang Li. New York, NY: Cambridge University Press.
Dal Bó, Ernesto, Frederico Finan, Olle Folke, Torsten Persson, and Johanna Rickne. 2017. 
“Who Becomes a Politician?” The Quarterly Journal of Economics 132 (4): 1877-
1914. 
De Janvry, Alain, Frederico Finan, and Elisabeth Sadoulet. 2012. “Local Electoral 
Incentives and Decentralized Program Performance.” Review of Economics and 
Statistics 94 (3): 672-685. 
Elman, Benjamin A. 2013. Civil Examinations and Meritocracy in Late Imperial China. 
Cambridge, MA: Harvard University Press.  
Ferraz, Claudio, and Frederico Finan. 2011. “Electoral Accountability and Corruption: Evidence from the Audits of Local Governments.” American Economic Review 101 (4): 1274-1311.
Freedman, David A. 2005. “Linear Statistical Models for Causation: A Critical Review.” Wiley StatsRef: Statistics Reference Online.
Ghatak, Maitreesh. 1999. “Group Lending, Local Information and Peer Selection.” 
Journal of Development Economics 60 (1): 27-50.  
Hamilton, Alexander, James Madison, and John Jay. 2008. The Federalist Papers. New 
York City, NY: Oxford University Press.  
Hayek, Friedrich August. 1945. “The Use of Knowledge in Society.” The American 
Economic Review 35 (4): 519-530. 
Heckman, James J. "Dummy Endogenous Variables in a Simultaneous Equation System." 
Econometrica 46, no. 4 (1978): 931-959. 
Imbens, Guido. Instrumental Variables: An Econometrician's Perspective. No. w19983. 
National Bureau of Economic Research, 2014. 
Imbens, Guido W., and Joshua D. Angrist. 1994. “Identification and Estimation of Local Average Treatment Effects.” Econometrica 62 (2): 467-475.
Jones, Benjamin F., and Benjamin A. Olken. 2005. “Do Leaders Matter? National 
Leadership and Growth since World War II.” The Quarterly Journal of Economics 
120 (3): 835-864. 
Kazin, Michael, Rebecca Edwards, and Adam Rothman, eds. 2009. The Princeton 
Encyclopedia of American Political History. (Two volume set). Princeton, NJ: 
Princeton University Press. 
Koop, Gary, Dale J. Poirier, and Justin L. Tobias. 2007. Bayesian Econometric Methods. 
New York City, NY: Cambridge University Press. 
Krueger, Alan B. 1999. “Experimental Estimates of Education Production Functions.” The Quarterly Journal of Economics 114 (2): 497-532.
Laffont, Jean-Jacques. 2001. Incentives and Political Economy. New York City, NY: 
Oxford University Press.  
Lancaster, Tony. 2004. An Introduction to Modern Bayesian Econometrics. Oxford: 
Blackwell. 
Liu, Chengfang, Renfu Luo, Scott Rozelle, and Linxiu Zhang. 2009. “Infrastructure 
Investment in Rural China: Is Quality Being Compromised During Quantity 
Expansion?” The China Journal 61: 105-129. 
Loeper, Antoine. 2017. “Cross-border Externalities and Cooperation Among Representative Democracies.” European Economic Review 91: 180-208.
Martinez-Bravo, Monica, Gerard Padró i Miquel, Nancy Qian, and Yang Yao. 2014. 
“Political Reform in China: Elections, Public Goods and Income Distribution.” 
SSRN Working Paper.  
    https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2356343 
Martinez-Bravo, Monica, Gerard Padró i Miquel, Nancy Qian, and Yang Yao. 2017. “The Rise and Fall of Local Elections in China: Theory and Empirical Evidence on the Autocrat's Trade-off.” NBER Working Paper w24032.
Martinez-Bravo, Monica, Nancy Qian, and Yang Yao. 2011. “Do Local Elections in Non-
democracies Increase Accountability? Evidence from Rural China.” NBER Working 
Paper w16948.  
    https://www.nber.org/papers/w16948  
Martinez-Bravo, Monica, Nancy Qian, and Yang Yao. 2012. “Elections in China.” NBER Working Paper w18101.
Menaldo, Victor. 2016. “The Fiscal Roots of Financial Underdevelopment.” American Journal of Political Science 60 (2): 456-471.
Milgrom, Paul, and John Roberts. 1986. “Price and Advertising Signals of Product 
Quality.” Journal of Political Economy 94 (4): 796-821. 
National People’s Congress of China. 1998. Organic Law of the Village Committees 
(OLVC).  
    https://www.cecc.gov/resources/legal-provisions/organic-law-of-the-villagers-committees-of-the-peoples-republic-of-china
O'Brien, Kevin J., and Rongbin Han. 2009. “Path to Democracy? Assessing Village Elections in China.” Journal of Contemporary China 18 (60): 359-378.
O’Brien, Kevin J., and Lianjiang Li. 2000. “Accommodating ‘Democracy’ in A One-party 
State: Introducing Village Elections in China.” The China Quarterly 162: 465-489. 
O'Brien, Kevin J., and Suisheng Zhao, eds. 2014. Grassroots Elections in China. Abingdon: Routledge.
Oi, Jean C., and Scott Rozelle. 2000. “Elections and Power: The Locus of Decision-
making in Chinese Villages.” The China Quarterly 162: 513-539.  
Oreopoulos, Philip. 2006. “Estimating Average and Local Average Treatment Effects of Education When Compulsory Schooling Laws Really Matter.” American Economic Review 96 (1): 152-175.
Padró i Miquel, Gerard, Nancy Qian, Yiqing Xu, and Yang Yao. 2015. “Making Democracy Work: Culture, Social Capital and Elections in China.” NBER Working Paper w21058.
    https://www.nber.org/papers/w21058
Pearl, Judea. 2009. Causality. Cambridge University Press.
Qian, Mu. 2012. Chinese Political Gain and Losses During the Past Dynasties. Hong 
Kong, SAR: SDX Joint Publishing Company.  
Robins, James M., Miguel Angel Hernán, and Babette Brumback. 2000. “Marginal Structural Models and Causal Inference in Epidemiology.” Epidemiology 11 (5): 551.
Rosenzweig, Mark R., and Kenneth I. Wolpin. 1980. “Testing the Quantity-Quality Fertility Model: The Use of Twins as a Natural Experiment.” Econometrica: 227-240.
Rosenzweig, Mark R., and Kenneth I. Wolpin. 2000. “‘Natural’ Natural Experiments in Economics.” Journal of Economic Literature 38 (4): 827-874.
Sienkewicz, Thomas J. 2003. Encyclopedia of the Ancient World. Pasadena, CA: Salem Press.
Sovey, Allison J., and Donald P. Green. 2011. “Instrumental Variables Estimation in Political Science: A Readers’ Guide.” American Journal of Political Science 55 (1): 188-200.
Spence, Michael. 1973. “Job Market Signaling.” The Quarterly Journal of Economics 87 
(3): 355-374. 
Stock, James H., and Francesco Trebbi. 2003. “Retrospectives: Who Invented Instrumental Variable Regression?” Journal of Economic Perspectives 17 (3): 177-194.
Tang, Huangfeng. 2016. “New Meritocracy: The Democratization and Modernization of 
Bureaucratic Selection in China.” (In Chinese). Fudan Journal: Social Science 
Edition 4: 144-154.  
Todd, Petra E., and Kenneth I. Wolpin. 2003. “On the Specification and Estimation of the Production Function for Cognitive Achievement.” The Economic Journal 113 (485).
Wang, Yu, and Marc F. Bellemare. 2019. “Lagged Variables as Instruments.”
Wong, Ho Lun, Yu Wang, Renfu Luo, Linxiu Zhang, and Scott Rozelle. 2017. “Local 
Governance and the Quality of Local Infrastructure: Evidence from Village Road 
Projects in Rural China.” Journal of Public Economics 152: 119-132. 
Yao, Yang, and Muyang Zhang. 2015. “Subnational Leaders and Economic Growth: 
Evidence from Chinese Cities.” Journal of Economic Growth 20 (4): 405-436. 
Zhang, Weiwei. 2012. “Meritocracy Versus Democracy.” The New York Times.
Appendices for Local Direct Elections, Local Information, and Meritocratic Selection
A. Posterior Distribution of the Virtue and Capacity of Village Leader Candidates
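    A.1 Virtue: Following Koop et al. (2007) and Lancaster (2004), we derive the density kernel of the posterior distribution of virtue, such that
$$p(\alpha_i \mid \boldsymbol{\Omega}_{iN}^{\alpha}) \propto \gamma(\alpha_i)\cdot\left[L(\alpha_i;\Omega_{i1}^{\alpha})\cdots L(\alpha_i;\Omega_{it}^{\alpha})\cdots L(\alpha_i;\Omega_{iN}^{\alpha})\right], \tag{A1}$$
where the vector $\boldsymbol{\Omega}_{iN}^{\alpha}$ is $[\Omega_{i1}^{\alpha}\ \cdots\ \Omega_{it}^{\alpha}\ \cdots\ \Omega_{iN}^{\alpha}]'$.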
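    First, we derive $\gamma(\alpha_i)$, the density kernel of the prior distribution of virtue. Because the prior distribution is a normal distribution truncated to [0, 1], we have
$$\begin{aligned}
\gamma(\alpha_i) &= \gamma(\alpha_i \mid 0 \le \alpha_i \le 1, \alpha_i^E) \\
&= \frac{\frac{1}{\sigma_{\alpha E}}\,\phi\!\left(\frac{\alpha_i-\alpha_i^E}{\sigma_{\alpha E}}\right)}{P(0 \le \alpha_i \le 1 \mid \alpha_i^E)} \\
&= \frac{(2\pi\sigma_{\alpha E}^2)^{-1/2}\exp\left\{-\frac{(\alpha_i-\alpha_i^E)^2}{2\sigma_{\alpha E}^2}\right\}}{P(0 \le \alpha_i \le 1 \mid \alpha_i^E)} \\
&\propto \frac{\exp\left\{-\frac{(\alpha_i-\alpha_i^E)^2}{2\sigma_{\alpha E}^2}\right\}}{P(0 \le \alpha_i \le 1 \mid \alpha_i^E)},
\end{aligned}\tag{A2}$$
where $P(0 \le \alpha_i \le 1 \mid \alpha_i^E) = \Phi\left(\frac{1-\alpha_i^E}{\sigma_{\alpha E}}\right)-\Phi\left(\frac{-\alpha_i^E}{\sigma_{\alpha E}}\right)$ is the probability that $\alpha_i$ lies in [0, 1], conditional on $\alpha_i^E$.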
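The truncation probability and the truncated prior density in (A2) are straightforward to evaluate numerically. The following is a minimal sketch that uses only the Python standard library, with $\Phi$ written through `math.erf`. The function names and the parameter values in the check are illustrative assumptions and are not part of the model.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncation_prob(prior_mean: float, prior_sd: float) -> float:
    """P(0 <= alpha_i <= 1 | alpha_i^E) for an untruncated N(prior_mean, prior_sd^2) prior."""
    return std_normal_cdf((1.0 - prior_mean) / prior_sd) - std_normal_cdf(-prior_mean / prior_sd)


def truncated_prior_density(alpha: float, prior_mean: float, prior_sd: float) -> float:
    """gamma(alpha_i) in (A2): the normal density on [0, 1], rescaled by the truncation probability."""
    if not 0.0 <= alpha <= 1.0:
        return 0.0
    z = (alpha - prior_mean) / prior_sd
    pdf = math.exp(-0.5 * z * z) / (prior_sd * math.sqrt(2.0 * math.pi))
    return pdf / truncation_prob(prior_mean, prior_sd)


if __name__ == "__main__":
    # Midpoint-rule check that the truncated density integrates to one on [0, 1].
    n = 10_000
    total = sum(truncated_prior_density((k + 0.5) / n, 0.6, 0.3) for k in range(n)) / n
    print(total)
```

The same helpers apply to capacity in (A12) once $\alpha_i^E$ and $\sigma_{\alpha E}$ are replaced by $\theta_i^E$ and $\sigma_{\theta E}$.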
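    Second, we derive the joint density of $\boldsymbol{\Omega}_{iN}^{\alpha}$ as a likelihood function given by
$$\begin{aligned}
L(\alpha_i;\boldsymbol{\Omega}_{iN}^{\alpha}) &= \prod_{t=1}^{N}(2\pi\sigma_{\upsilon\alpha}^2)^{-1/2}\exp\left\{-\frac{(\Omega_{it}^{\alpha}-\alpha_i)^2}{2\sigma_{\upsilon\alpha}^2}\right\} \\
&= (2\pi\sigma_{\upsilon\alpha}^2)^{-N/2}\exp\left\{-\sum_{t=1}^{N}\frac{(\Omega_{it}^{\alpha}-\alpha_i)^2}{2\sigma_{\upsilon\alpha}^2}\right\} \\
&\propto \exp\left\{-\frac{1}{2\sigma_{\upsilon\alpha}^2}\sum_{t=1}^{N}(\Omega_{it}^{\alpha}-\alpha_i)^2\right\} \\
&= \exp\left\{-\frac{1}{2\sigma_{\upsilon\alpha}^2}\sum_{t=1}^{N}(\Omega_{it}^{\alpha}-\overline{\Omega}_i^{\alpha}+\overline{\Omega}_i^{\alpha}-\alpha_i)^2\right\} \\
&= \exp\left\{-\frac{1}{2\sigma_{\upsilon\alpha}^2}\left[\sum_{t=1}^{N}(\Omega_{it}^{\alpha}-\overline{\Omega}_i^{\alpha})^2+\sum_{t=1}^{N}(\overline{\Omega}_i^{\alpha}-\alpha_i)^2+2\sum_{t=1}^{N}(\Omega_{it}^{\alpha}-\overline{\Omega}_i^{\alpha})(\overline{\Omega}_i^{\alpha}-\alpha_i)\right]\right\},
\end{aligned}\tag{A3}$$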
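where $\overline{\Omega}_i^{\alpha} = \frac{1}{N}\sum_{t=1}^{N}\Omega_{it}^{\alpha}$. The second-order moment $\sum_{t=1}^{N}(\Omega_{it}^{\alpha}-\overline{\Omega}_i^{\alpha})^2$ does not depend on $\alpha_i$, and the cross term vanishes, since $2\sum_{t=1}^{N}(\Omega_{it}^{\alpha}-\overline{\Omega}_i^{\alpha})(\overline{\Omega}_i^{\alpha}-\alpha_i) = 2(\overline{\Omega}_i^{\alpha}-\alpha_i)\sum_{t=1}^{N}(\Omega_{it}^{\alpha}-\overline{\Omega}_i^{\alpha}) = 0$. We therefore have
$$L(\alpha_i;\boldsymbol{\Omega}_{iN}^{\alpha}) \propto \exp\left\{-\sum_{t=1}^{N}\frac{(\overline{\Omega}_i^{\alpha}-\alpha_i)^2}{2\sigma_{\upsilon\alpha}^2}\right\} = \exp\left\{-\frac{N(\overline{\Omega}_i^{\alpha}-\alpha_i)^2}{2\sigma_{\upsilon\alpha}^2}\right\}. \tag{A4}$$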
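    Third, the density kernel of the posterior distribution of virtue is
$$\begin{aligned}
L(\alpha_i;\boldsymbol{\Omega}_{iN}^{\alpha})\,\gamma(\alpha_i) &\propto \frac{\exp\left\{-\frac{N(\overline{\Omega}_i^{\alpha}-\alpha_i)^2}{2\sigma_{\upsilon\alpha}^2}\right\}\exp\left\{-\frac{(\alpha_i-\alpha_i^E)^2}{2\sigma_{\alpha E}^2}\right\}}{P(0 \le \alpha_i \le 1 \mid \alpha_i^E)} \\
&= \frac{\exp\left\{-\frac{1}{2}\left[\frac{N}{\sigma_{\upsilon\alpha}^2}\left((\overline{\Omega}_i^{\alpha})^2-2\alpha_i\overline{\Omega}_i^{\alpha}+\alpha_i^2\right)+\frac{1}{\sigma_{\alpha E}^2}\left(\alpha_i^2-2\alpha_i^E\alpha_i+(\alpha_i^E)^2\right)\right]\right\}}{P(0 \le \alpha_i \le 1 \mid \alpha_i^E)} \\
&= \frac{\exp\left\{-\frac{1}{2}\left[\frac{\sigma_{\upsilon\alpha}^2+N\sigma_{\alpha E}^2}{\sigma_{\upsilon\alpha}^2\sigma_{\alpha E}^2}\,\alpha_i^2-2\left(\frac{\alpha_i^E}{\sigma_{\alpha E}^2}+\frac{N\overline{\Omega}_i^{\alpha}}{\sigma_{\upsilon\alpha}^2}\right)\alpha_i+\frac{N(\overline{\Omega}_i^{\alpha})^2}{\sigma_{\upsilon\alpha}^2}+\frac{(\alpha_i^E)^2}{\sigma_{\alpha E}^2}\right]\right\}}{P(0 \le \alpha_i \le 1 \mid \alpha_i^E)} \\
&\propto \frac{\exp\left\{-\frac{1}{2}\left[\frac{\sigma_{\upsilon\alpha}^2+N\sigma_{\alpha E}^2}{\sigma_{\upsilon\alpha}^2\sigma_{\alpha E}^2}\,\alpha_i^2-2\left(\frac{\alpha_i^E}{\sigma_{\alpha E}^2}+\frac{N\overline{\Omega}_i^{\alpha}}{\sigma_{\upsilon\alpha}^2}\right)\alpha_i\right]\right\}}{P(0 \le \alpha_i \le 1 \mid \alpha_i^E)}.
\end{aligned}\tag{A5}$$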
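    Completing the square in the numerator of (A5), we have
$$L(\alpha_i;\boldsymbol{\Omega}_{iN}^{\alpha})\cdot\gamma(\alpha_i) \propto \frac{\exp\left\{-\frac{1}{2}\,\frac{\sigma_{\upsilon\alpha}^2+N\sigma_{\alpha E}^2}{\sigma_{\upsilon\alpha}^2\sigma_{\alpha E}^2}\left[\alpha_i-\frac{\sigma_{\upsilon\alpha}^2\sigma_{\alpha E}^2}{\sigma_{\upsilon\alpha}^2+N\sigma_{\alpha E}^2}\left(\frac{1}{\sigma_{\alpha E}^2}\alpha_i^E+\frac{N}{\sigma_{\upsilon\alpha}^2}\overline{\Omega}_i^{\alpha}\right)\right]^2\right\}}{P(0 \le \alpha_i \le 1 \mid \alpha_i^E)}, \tag{A6}$$
where $P(0 \le \alpha_i \le 1 \mid \alpha_i^E) = \Phi\left(\frac{1-\alpha_i^E}{\sigma_{\alpha E}}\right)-\Phi\left(\frac{-\alpha_i^E}{\sigma_{\alpha E}}\right)$. This density kernel is therefore still that of a normal distribution truncated to [0, 1]. The posterior variance of virtue is
$$\Sigma(\alpha_i) = \frac{\sigma_{\upsilon\alpha}^2\sigma_{\alpha E}^2}{\sigma_{\upsilon\alpha}^2+N\sigma_{\alpha E}^2} \tag{A7}$$
and the posterior mean of virtue is
$$S(\alpha_i) = \Sigma(\alpha_i)\left[\frac{1}{\sigma_{\alpha E}^2}\alpha_i^E+\frac{N}{\sigma_{\upsilon\alpha}^2}\overline{\Omega}_i^{\alpha}\right] = \frac{\sigma_{\upsilon\alpha}^2}{\sigma_{\upsilon\alpha}^2+N\sigma_{\alpha E}^2}\alpha_i^E+\frac{N\sigma_{\alpha E}^2}{\sigma_{\upsilon\alpha}^2+N\sigma_{\alpha E}^2}\overline{\Omega}_i^{\alpha}. \tag{A8}$$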
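    Because $\Omega_{it}^{\alpha} \sim \mathcal{N}(\alpha_i, \sigma_{\upsilon\alpha}^2)$ and $N = N^{ELE}$ (or $N^{APP}$) $\to +\infty$, the law of large numbers gives
$$\overline{\Omega}_i^{\alpha} = \frac{1}{N}\sum_{t=1}^{N}\Omega_{it}^{\alpha} \xrightarrow{\;p\;} E(\Omega_{it}^{\alpha}) = \alpha_i. \tag{A9}$$
  Thus, the posterior mean of virtue is
$$S(\alpha_i) = \frac{\sigma_{\upsilon\alpha}^2}{\sigma_{\upsilon\alpha}^2+N\sigma_{\alpha E}^2}\alpha_i^E+\frac{N\sigma_{\alpha E}^2}{\sigma_{\upsilon\alpha}^2+N\sigma_{\alpha E}^2}\alpha_i. \tag{A10}$$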
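The completing-the-square step in (A5) to (A8) can be checked numerically. If (A6) is right, the log of the kernel in the first line of (A5), minus the log of a normal kernel with mean $S(\alpha_i)$ and variance $\Sigma(\alpha_i)$, should be constant in $\alpha_i$. The sketch below tests this on a grid over [0, 1]. The signal values and variances are made-up inputs for illustration only, and the truncation constant $P(\cdot)$ is omitted because it does not depend on $\alpha_i$.

```python
def posterior_params(prior_mean, prior_var, signal_var, signals):
    """Posterior variance (A7) and mean (A8) of alpha_i given N signals Omega_it."""
    n = len(signals)
    omega_bar = sum(signals) / n
    post_var = signal_var * prior_var / (signal_var + n * prior_var)
    post_mean = post_var * (prior_mean / prior_var + n * omega_bar / signal_var)
    return post_mean, post_var


def log_kernel(alpha, prior_mean, prior_var, signal_var, signals):
    """Log of L(alpha; Omega) * gamma(alpha) up to constants, as in the first line of (A5)."""
    log_lik = -sum((w - alpha) ** 2 for w in signals) / (2.0 * signal_var)
    log_prior = -((alpha - prior_mean) ** 2) / (2.0 * prior_var)
    return log_lik + log_prior


def square_completion_gap(prior_mean, prior_var, signal_var, signals, grid=101):
    """Spread of [log kernel - log N(S, Sigma) kernel] over a grid on [0, 1].

    If (A6)-(A8) are correct, the difference is constant in alpha, so the spread is ~0.
    """
    mean, var = posterior_params(prior_mean, prior_var, signal_var, signals)
    diffs = [
        log_kernel(k / (grid - 1), prior_mean, prior_var, signal_var, signals)
        + (k / (grid - 1) - mean) ** 2 / (2.0 * var)
        for k in range(grid)
    ]
    return max(diffs) - min(diffs)
```

In the test inputs, $\sigma_{\upsilon\alpha}^2 = N\sigma_{\alpha E}^2$. The two weights in (A8) are therefore both one half, and the posterior mean is the simple average of $\alpha_i^E$ and $\overline{\Omega}_i^{\alpha}$.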
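    A.2 Capacity: As for virtue, we derive the density kernel of the posterior distribution of capacity, such that
$$p(\theta_i \mid \boldsymbol{\Omega}_{iN}^{\theta}) \propto \gamma(\theta_i)\cdot\left[L(\theta_i;\Omega_{i1}^{\theta})\cdots L(\theta_i;\Omega_{it}^{\theta})\cdots L(\theta_i;\Omega_{iN}^{\theta})\right], \tag{A11}$$
where the vector $\boldsymbol{\Omega}_{iN}^{\theta}$ is $[\Omega_{i1}^{\theta}\ \cdots\ \Omega_{it}^{\theta}\ \cdots\ \Omega_{iN}^{\theta}]'$.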
    First, we derive $\gamma(\theta_i)$, the density kernel of the prior distribution of capacity. Because the prior distribution is a normal distribution truncated to [0, 1], we have
$$\gamma(\theta_i) = \gamma(\theta_i \mid 0 \le \theta_i \le 1, \theta_i^E) \propto \frac{\exp\left\{-\frac{(\theta_i-\theta_i^E)^2}{2\sigma_{\theta E}^2}\right\}}{P(0 \le \theta_i \le 1 \mid \theta_i^E)}, \tag{A12}$$
where $P(0 \le \theta_i \le 1 \mid \theta_i^E) = \Phi\left(\frac{1-\theta_i^E}{\sigma_{\theta E}}\right)-\Phi\left(\frac{-\theta_i^E}{\sigma_{\theta E}}\right)$ is the probability that $\theta_i$ lies in [0, 1], conditional on $\theta_i^E$.
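    Second, we derive the joint density of $\boldsymbol{\Omega}_{iN}^{\theta}$ as a likelihood function given by
$$\begin{aligned}
L(\theta_i;\boldsymbol{\Omega}_{iN}^{\theta}) &= \prod_{t=1}^{N}(2\pi\sigma_{\omega\theta}^2)^{-1/2}\exp\left\{-\frac{(\Omega_{it}^{\theta}-\theta_i)^2}{2\sigma_{\omega\theta}^2}\right\} \\
&\propto \exp\left\{-\sum_{t=1}^{N}\frac{(\Omega_{it}^{\theta}-\overline{\Omega}_i^{\theta}+\overline{\Omega}_i^{\theta}-\theta_i)^2}{2\sigma_{\omega\theta}^2}\right\} \\
&= \exp\left\{-\frac{1}{2\sigma_{\omega\theta}^2}\left[\sum_{t=1}^{N}(\Omega_{it}^{\theta}-\overline{\Omega}_i^{\theta})^2+\sum_{t=1}^{N}(\overline{\Omega}_i^{\theta}-\theta_i)^2+2\sum_{t=1}^{N}(\Omega_{it}^{\theta}-\overline{\Omega}_i^{\theta})(\overline{\Omega}_i^{\theta}-\theta_i)\right]\right\},
\end{aligned}\tag{A13}$$
where $\overline{\Omega}_i^{\theta} = \frac{1}{N}\sum_{t=1}^{N}\Omega_{it}^{\theta}$. The second-order moment $\sum_{t=1}^{N}(\Omega_{it}^{\theta}-\overline{\Omega}_i^{\theta})^2$ does not depend on $\theta_i$, and the cross term vanishes, since $2\sum_{t=1}^{N}(\Omega_{it}^{\theta}-\overline{\Omega}_i^{\theta})(\overline{\Omega}_i^{\theta}-\theta_i) = 2(\overline{\Omega}_i^{\theta}-\theta_i)\sum_{t=1}^{N}(\Omega_{it}^{\theta}-\overline{\Omega}_i^{\theta}) = 0$. We therefore have
$$L(\theta_i;\boldsymbol{\Omega}_{iN}^{\theta}) \propto \exp\left\{-\sum_{t=1}^{N}\frac{(\overline{\Omega}_i^{\theta}-\theta_i)^2}{2\sigma_{\omega\theta}^2}\right\} = \exp\left\{-\frac{N(\overline{\Omega}_i^{\theta}-\theta_i)^2}{2\sigma_{\omega\theta}^2}\right\}. \tag{A14}$$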
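    Third, the density kernel of the posterior distribution of capacity is
$$\begin{aligned}
L(\theta_i;\boldsymbol{\Omega}_{iN}^{\theta})\,\gamma(\theta_i) &\propto \frac{\exp\left\{-\frac{N(\overline{\Omega}_i^{\theta}-\theta_i)^2}{2\sigma_{\omega\theta}^2}\right\}\exp\left\{-\frac{(\theta_i-\theta_i^E)^2}{2\sigma_{\theta E}^2}\right\}}{P(0 \le \theta_i \le 1 \mid \theta_i^E)} \\
&= \frac{\exp\left\{-\frac{1}{2}\left[\frac{N}{\sigma_{\omega\theta}^2}\left((\overline{\Omega}_i^{\theta})^2-2\theta_i\overline{\Omega}_i^{\theta}+\theta_i^2\right)+\frac{1}{\sigma_{\theta E}^2}\left(\theta_i^2-2\theta_i^E\theta_i+(\theta_i^E)^2\right)\right]\right\}}{P(0 \le \theta_i \le 1 \mid \theta_i^E)} \\
&\propto \frac{\exp\left\{-\frac{1}{2}\left[\frac{\sigma_{\omega\theta}^2+N\sigma_{\theta E}^2}{\sigma_{\omega\theta}^2\sigma_{\theta E}^2}\,\theta_i^2-2\left(\frac{\theta_i^E}{\sigma_{\theta E}^2}+\frac{N\overline{\Omega}_i^{\theta}}{\sigma_{\omega\theta}^2}\right)\theta_i\right]\right\}}{P(0 \le \theta_i \le 1 \mid \theta_i^E)}.
\end{aligned}\tag{A15}$$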
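    Completing the square in the numerator of (A15), we have
$$L(\theta_i;\boldsymbol{\Omega}_{iN}^{\theta})\,\gamma(\theta_i) \propto \frac{\exp\left\{-\frac{1}{2}\,\frac{\sigma_{\omega\theta}^2+N\sigma_{\theta E}^2}{\sigma_{\omega\theta}^2\sigma_{\theta E}^2}\left[\theta_i-\frac{\sigma_{\omega\theta}^2\sigma_{\theta E}^2}{\sigma_{\omega\theta}^2+N\sigma_{\theta E}^2}\left(\frac{1}{\sigma_{\theta E}^2}\theta_i^E+\frac{N}{\sigma_{\omega\theta}^2}\overline{\Omega}_i^{\theta}\right)\right]^2\right\}}{P(0 \le \theta_i \le 1 \mid \theta_i^E)}, \tag{A16}$$
where $P(0 \le \theta_i \le 1 \mid \theta_i^E) = \Phi\left(\frac{1-\theta_i^E}{\sigma_{\theta E}}\right)-\Phi\left(\frac{-\theta_i^E}{\sigma_{\theta E}}\right)$. This density kernel is therefore still that of a normal distribution truncated to [0, 1]. The posterior variance of capacity is
$$\Sigma(\theta_i) = \frac{\sigma_{\omega\theta}^2\sigma_{\theta E}^2}{\sigma_{\omega\theta}^2+N\sigma_{\theta E}^2} \tag{A17}$$
and the posterior mean of capacity is
$$S(\theta_i) = \Sigma(\theta_i)\left[\frac{1}{\sigma_{\theta E}^2}\theta_i^E+\frac{N}{\sigma_{\omega\theta}^2}\overline{\Omega}_i^{\theta}\right] = \frac{\sigma_{\omega\theta}^2}{\sigma_{\omega\theta}^2+N\sigma_{\theta E}^2}\theta_i^E+\frac{N\sigma_{\theta E}^2}{\sigma_{\omega\theta}^2+N\sigma_{\theta E}^2}\overline{\Omega}_i^{\theta}. \tag{A18}$$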
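    Because $\Omega_{it}^{\theta} \sim \mathcal{N}(\theta_i, \sigma_{\omega\theta}^2)$ and $N = N^{ELE}$ (or $N^{APP}$) $\to +\infty$, the law of large numbers gives
$$\overline{\Omega}_i^{\theta} = \frac{1}{N}\sum_{t=1}^{N}\Omega_{it}^{\theta} \xrightarrow{\;p\;} E(\Omega_{it}^{\theta}) = \theta_i. \tag{A19}$$
  Thus, the posterior mean of capacity is
$$S(\theta_i) = \frac{\sigma_{\omega\theta}^2}{\sigma_{\omega\theta}^2+N\sigma_{\theta E}^2}\theta_i^E+\frac{N\sigma_{\theta E}^2}{\sigma_{\omega\theta}^2+N\sigma_{\theta E}^2}\theta_i. \tag{A20}$$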
B. Comparison of the Expectation and Variance of the Competence of Village Leaders Before and After the Introduction of Local Direct Elections
    B.1 Expectation: We compare, in a representative village, the expected competence of the elected village leader with that of the appointed village leader. Specifically, we compare
$$[\pi]^{VL,ELE} \equiv \mathbb{E}^{ELE}(\pi_i) = \frac{\int_0^1\int_0^1[\mu\alpha_i+(1-\mu)\theta_i]\,p_i^{ELE}\,d\alpha_i\,d\theta_i}{\int_0^1\int_0^1 p_i^{ELE}\,d\alpha_i\,d\theta_i} \tag{B1}$$
and
$$[\pi]^{VL,APP} \equiv \mathbb{E}^{APP}(\pi_i) = \frac{\int_0^1\int_0^1[\mu\alpha_i+(1-\mu)\theta_i]\,p_i^{APP}\,d\alpha_i\,d\theta_i}{\int_0^1\int_0^1 p_i^{APP}\,d\alpha_i\,d\theta_i}. \tag{B2}$$
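We also know that
$$p_i^{ELE(APP)} = \mu l\,S^{ELE(APP)}(\alpha_i)+(1-\mu)l\,S^{ELE(APP)}(\theta_i), \tag{B3}$$
where
$$S^{ELE(APP)}(\alpha_i) = \frac{\Sigma^{ELE(APP)}(\alpha_i)}{\sigma_{\alpha E}^2}\alpha_i^E+\left(1-\frac{\Sigma^{ELE(APP)}(\alpha_i)}{\sigma_{\alpha E}^2}\right)\alpha_i = \frac{\sigma_{\upsilon\alpha}^2}{\sigma_{\upsilon\alpha}^2+N^{ELE(APP)}\sigma_{\alpha E}^2}\alpha_i^E+\frac{N^{ELE(APP)}\sigma_{\alpha E}^2}{\sigma_{\upsilon\alpha}^2+N^{ELE(APP)}\sigma_{\alpha E}^2}\alpha_i, \tag{B4}$$
$$S^{ELE(APP)}(\theta_i) = \frac{\Sigma^{ELE(APP)}(\theta_i)}{\sigma_{\theta E}^2}\theta_i^E+\left(1-\frac{\Sigma^{ELE(APP)}(\theta_i)}{\sigma_{\theta E}^2}\right)\theta_i = \frac{\sigma_{\omega\theta}^2}{\sigma_{\omega\theta}^2+N^{ELE(APP)}\sigma_{\theta E}^2}\theta_i^E+\frac{N^{ELE(APP)}\sigma_{\theta E}^2}{\sigma_{\omega\theta}^2+N^{ELE(APP)}\sigma_{\theta E}^2}\theta_i. \tag{B5}$$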
    Because $N^{ELE} > N^{APP}$, comparing the expected competence of the elected and the appointed village leader amounts to determining how the following function changes with $N$:
$$f\big(x(N),y(N)\big) = \frac{\int_0^1\int_0^1[\mu\alpha_i+(1-\mu)\theta_i]\,[\mu l\,S(\alpha_i)+(1-\mu)l\,S(\theta_i)]\,d\alpha_i\,d\theta_i}{\int_0^1\int_0^1[\mu l\,S(\alpha_i)+(1-\mu)l\,S(\theta_i)]\,d\alpha_i\,d\theta_i}, \tag{B6}$$
where $S(\alpha_i) = (1-x)\alpha_i^E+x\alpha_i$ with $x \equiv \frac{\sigma_{\alpha E}^2N}{\sigma_{\upsilon\alpha}^2+\sigma_{\alpha E}^2N}$; $S(\theta_i) = (1-y)\theta_i^E+y\theta_i$ with $y \equiv \frac{\sigma_{\theta E}^2N}{\sigma_{\omega\theta}^2+\sigma_{\theta E}^2N}$; and $N = N^{ELE}$ or $N^{APP}$. We then need to prove that
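$$\frac{\partial f}{\partial N} = \frac{\partial f}{\partial x}\cdot\frac{\partial x}{\partial N}+\frac{\partial f}{\partial y}\cdot\frac{\partial y}{\partial N} > 0. \tag{B7}$$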
    It is easy to prove that $\frac{\partial x}{\partial N} > 0$ and $\frac{\partial y}{\partial N} > 0$, so we only need to prove that $\frac{\partial f}{\partial x} > 0$ and $\frac{\partial f}{\partial y} > 0$.
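For completeness, the monotonicity of $x$ and $y$ in $N$ follows from differentiating their definitions below (B6) directly. This assumes, as in Appendix A, that all variances are strictly positive:
$$\frac{\partial x}{\partial N} = \frac{\sigma_{\alpha E}^2(\sigma_{\upsilon\alpha}^2+\sigma_{\alpha E}^2N)-\sigma_{\alpha E}^2N\cdot\sigma_{\alpha E}^2}{(\sigma_{\upsilon\alpha}^2+\sigma_{\alpha E}^2N)^2} = \frac{\sigma_{\alpha E}^2\,\sigma_{\upsilon\alpha}^2}{(\sigma_{\upsilon\alpha}^2+\sigma_{\alpha E}^2N)^2} > 0, \qquad \frac{\partial y}{\partial N} = \frac{\sigma_{\theta E}^2\,\sigma_{\omega\theta}^2}{(\sigma_{\omega\theta}^2+\sigma_{\theta E}^2N)^2} > 0.$$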
    First, simplify the numerator of (B6):
$$\begin{aligned}
&\int_0^1\int_0^1[\mu\alpha_i+(1-\mu)\theta_i]\{\mu l[(1-x)\alpha_i^E+x\alpha_i]+(1-\mu)l[(1-y)\theta_i^E+y\theta_i]\}\,d\alpha_i\,d\theta_i \\
&= \int_0^1\int_0^1\Big\{\mu^2l\,\alpha_i[(1-x)\alpha_i^E+x\alpha_i]+(1-\mu)^2l\,\theta_i[(1-y)\theta_i^E+y\theta_i] \\
&\qquad+\mu(1-\mu)l\,\alpha_i[(1-y)\theta_i^E+y\theta_i]+\mu(1-\mu)l\,\theta_i[(1-x)\alpha_i^E+x\alpha_i]\Big\}\,d\alpha_i\,d\theta_i,
\end{aligned}$$
where
$$\int_0^1\int_0^1\mu^2l\,\alpha_i[(1-x)\alpha_i^E+x\alpha_i]\,d\alpha_i\,d\theta_i = \int_0^1\int_0^1\mu^2l(1-x)\alpha_i^E\alpha_i\,d\alpha_i\,d\theta_i+\int_0^1\int_0^1\mu^2l\,x\,\alpha_i^2\,d\alpha_i\,d\theta_i = \frac{1}{2}\mu^2l(1-x)\alpha_i^E+\frac{1}{3}\mu^2l\,x, \tag{B8}$$
$$\int_0^1\int_0^1(1-\mu)^2l\,\theta_i[(1-y)\theta_i^E+y\theta_i]\,d\alpha_i\,d\theta_i = \int_0^1\int_0^1(1-\mu)^2l(1-y)\theta_i^E\theta_i\,d\alpha_i\,d\theta_i+\int_0^1\int_0^1(1-\mu)^2l\,y\,\theta_i^2\,d\alpha_i\,d\theta_i = \frac{1}{2}(1-\mu)^2l(1-y)\theta_i^E+\frac{1}{3}(1-\mu)^2l\,y, \tag{B9}$$
$$\int_0^1\int_0^1\mu(1-\mu)l\,\alpha_i[(1-y)\theta_i^E+y\theta_i]\,d\alpha_i\,d\theta_i = \int_0^1\int_0^1\mu(1-\mu)l\,\alpha_i(1-y)\theta_i^E\,d\alpha_i\,d\theta_i+\int_0^1\int_0^1\mu(1-\mu)l\,\alpha_i\,y\,\theta_i\,d\alpha_i\,d\theta_i = \frac{1}{2}\mu(1-\mu)(1-y)l\,\theta_i^E+\frac{1}{4}\mu(1-\mu)l\,y, \tag{B10}$$
$$\int_0^1\int_0^1\mu(1-\mu)l\,\theta_i[(1-x)\alpha_i^E+x\alpha_i]\,d\alpha_i\,d\theta_i = \int_0^1\int_0^1\mu(1-\mu)l(1-x)\alpha_i^E\theta_i\,d\alpha_i\,d\theta_i+\int_0^1\int_0^1\mu(1-\mu)l\,x\,\alpha_i\theta_i\,d\alpha_i\,d\theta_i = \frac{1}{2}\mu(1-\mu)l(1-x)\alpha_i^E+\frac{1}{4}\mu(1-\mu)l\,x. \tag{B11}$$
    Therefore, the numerator of (B6) becomes
$$\frac{1}{2}\mu l(1-x)\alpha_i^E+\frac{1}{2}(1-\mu)l(1-y)\theta_i^E+\frac{1}{12}\mu(\mu+3)l\,x+\frac{1}{12}(\mu-4)(\mu-1)l\,y. \tag{B12}$$
    Second, simplify the denominator of (B6):
$$\begin{aligned}
&\int_0^1\int_0^1\{\mu l[(1-x)\alpha_i^E+x\alpha_i]+(1-\mu)l[(1-y)\theta_i^E+y\theta_i]\}\,d\alpha_i\,d\theta_i \\
&= \int_0^1\int_0^1\mu l(1-x)\alpha_i^E\,d\alpha_i\,d\theta_i+\int_0^1\int_0^1\mu l\,x\,\alpha_i\,d\alpha_i\,d\theta_i \\
&\qquad+\int_0^1\int_0^1(1-\mu)l(1-y)\theta_i^E\,d\alpha_i\,d\theta_i+\int_0^1\int_0^1(1-\mu)l\,y\,\theta_i\,d\alpha_i\,d\theta_i \\
&= \mu l(1-x)\alpha_i^E+\frac{1}{2}\mu l\,x+(1-\mu)l(1-y)\theta_i^E+\frac{1}{2}(1-\mu)l\,y.
\end{aligned}\tag{B13}$$
    Accordingly, (B6) becomes
$$f\big(x(N),y(N)\big) = \frac{\frac{1}{2}\mu l(1-x)\alpha_i^E+\frac{1}{2}(1-\mu)l(1-y)\theta_i^E+\frac{1}{12}\mu(\mu+3)l\,x+\frac{1}{12}(\mu-4)(\mu-1)l\,y}{\mu l(1-x)\alpha_i^E+(1-\mu)l(1-y)\theta_i^E+\frac{1}{2}\mu l\,x+\frac{1}{2}(1-\mu)l\,y}, \tag{B14}$$
where the first-order derivative of the numerator of (B14) with respect to $x$ is $-\frac{1}{2}\mu l\,\alpha_i^E+\frac{1}{12}\mu(\mu+3)l$, and the first-order derivative of the denominator of (B14) with respect to $x$ is $-\mu l\,\alpha_i^E+\frac{1}{2}\mu l$.
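The closed form (B14) can be checked against a direct numerical evaluation of the two double integrals in (B6). The sketch below uses a midpoint rule on $[0,1]^2$. The parameter tuples in the test ($\mu$, $l$, $x$, $y$, $\alpha_i^E$, $\theta_i^E$) are arbitrary illustrative values, not estimates from the dissertation.

```python
def f_closed_form(mu, l, x, y, a_e, t_e):
    """f(x, y) from (B14); a_e and t_e stand for alpha_i^E and theta_i^E."""
    num = (0.5 * mu * l * (1 - x) * a_e + 0.5 * (1 - mu) * l * (1 - y) * t_e
           + mu * (mu + 3) * l * x / 12 + (mu - 4) * (mu - 1) * l * y / 12)
    den = (mu * l * (1 - x) * a_e + (1 - mu) * l * (1 - y) * t_e
           + 0.5 * mu * l * x + 0.5 * (1 - mu) * l * y)
    return num / den


def f_numeric(mu, l, x, y, a_e, t_e, n=200):
    """Midpoint-rule evaluation of the double integrals in (B6) over [0, 1]^2."""
    num = den = 0.0
    for i in range(n):
        alpha = (i + 0.5) / n
        for j in range(n):
            theta = (j + 0.5) / n
            # Selection weight mu*l*S(alpha) + (1-mu)*l*S(theta), with S as defined below (B6).
            weight = (mu * l * ((1 - x) * a_e + x * alpha)
                      + (1 - mu) * l * ((1 - y) * t_e + y * theta))
            num += (mu * alpha + (1 - mu) * theta) * weight
            den += weight
    return num / den  # the common cell area 1/n^2 cancels in the ratio
```

With 200 cells per axis, the midpoint rule is exact for the linear and cross terms. The only errors come from the $\alpha_i^2$ and $\theta_i^2$ terms and are on the order of $10^{-6}$, well inside the tolerance used below.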
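    Then, we can obtain
$$\begin{aligned}
\frac{\partial f}{\partial x} &= \frac{\frac{1}{12}\mu^3l^2\alpha_i^E+\frac{1}{12}\mu(1-\mu)^2l^2y\,\alpha_i^E+\frac{1}{12}\mu^2(1-\mu)l^2(1-y)\theta_i^E+\frac{1}{24}\mu(1-\mu)(2\mu-1)l^2y}{\left[\mu l(1-x)\alpha_i^E+(1-\mu)l(1-y)\theta_i^E+\frac{1}{2}\mu l\,x+\frac{1}{2}(1-\mu)l\,y\right]^2} \\
&= \frac{\frac{1}{12}\mu^3l^2\alpha_i^E+\frac{1}{12}\mu^2(1-\mu)l^2(1-y)\theta_i^E+\frac{1}{24}\mu(1-\mu)l^2y\left[2\mu(1-\alpha_i^E)+2\alpha_i^E-1\right]}{\left[\mu l(1-x)\alpha_i^E+(1-\mu)l(1-y)\theta_i^E+\frac{1}{2}\mu l\,x+\frac{1}{2}(1-\mu)l\,y\right]^2}.
\end{aligned}\tag{B15}$$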
    To ensure that $\frac{\partial f}{\partial x} > 0$, a sufficient condition is $\mu(1-\mu)(2\mu-1) \ge 0$, as shown in the first step of (B15), that is, $\frac{1}{2} \le \mu \le 1$ or $\mu = 0$. When $0 < \mu < \frac{1}{2}$, however, we need to rearrange the terms of the numerator in (B15), as shown in the second step of (B15). If $\alpha_i^E \ge \frac{1}{2}$, then $2\mu(1-\alpha_i^E)+2\alpha_i^E-1 \ge 2\mu(1-\alpha_i^E) \ge 0$, and hence $\frac{\partial f}{\partial x} > 0$. In other words, $\frac{\partial f}{\partial x} > 0$ holds when $\frac{1}{2} \le \mu \le 1$, or $\mu = 0$, or both $0 < \mu < \frac{1}{2}$ and $\alpha_i^E \ge \frac{1}{2}$.
    Now we discuss $\frac{\partial f}{\partial y}$. The first-order derivative of the numerator of (B14) with respect to $y$ is $-\frac{1}{2}(1-\mu)l\,\theta_i^E+\frac{1}{12}(\mu-4)(\mu-1)l$, and the first-order derivative of the denominator of (B14) with respect to $y$ is $-(1-\mu)l\,\theta_i^E+\frac{1}{2}(1-\mu)l$.
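    Then, we can obtain
$$\begin{aligned}
\frac{\partial f}{\partial y} &= \frac{\frac{1}{12}\mu^2(1-\mu)l^2x\,\theta_i^E+\frac{1}{12}(1-\mu)^3l^2\theta_i^E+\frac{1}{12}\mu(1-\mu)^2l^2(1-x)\alpha_i^E+\frac{1}{24}\mu(1-\mu)(1-2\mu)l^2x}{\left[\mu l(1-x)\alpha_i^E+(1-\mu)l(1-y)\theta_i^E+\frac{1}{2}\mu l\,x+\frac{1}{2}(1-\mu)l\,y\right]^2} \\
&= \frac{\frac{1}{12}(1-\mu)^3l^2\theta_i^E+\frac{1}{12}\mu(1-\mu)^2l^2(1-x)\alpha_i^E+\frac{1}{24}\mu(1-\mu)l^2x\left[1-2\mu+2\mu\theta_i^E\right]}{\left[\mu l(1-x)\alpha_i^E+(1-\mu)l(1-y)\theta_i^E+\frac{1}{2}\mu l\,x+\frac{1}{2}(1-\mu)l\,y\right]^2}.
\end{aligned}\tag{B16}$$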
    To ensure that $\frac{\partial f}{\partial y} > 0$, a sufficient condition is $\mu(1-\mu)(1-2\mu) \ge 0$, as shown in the first step of (B16), that is, $0 \le \mu \le \frac{1}{2}$ or $\mu = 1$. When $\frac{1}{2} < \mu < 1$, however, we need to rearrange the terms of the numerator in (B16), as shown in the second step of (B16). If $\theta_i^E \ge \frac{1}{2}$, then $1-2\mu+2\mu\theta_i^E \ge 1-2\mu+\mu = 1-\mu \ge 0$, and hence $\frac{\partial f}{\partial y} > 0$. In other words, $\frac{\partial f}{\partial y} > 0$ holds when $0 \le \mu \le \frac{1}{2}$, or $\mu = 1$, or both $\frac{1}{2} < \mu < 1$ and $\theta_i^E \ge \frac{1}{2}$.
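Because (B15) and (B16) involve a fair amount of algebra, the closed-form derivatives can be compared with central finite differences of (B14). The sketch below does this. The parameter values are arbitrary illustrations, and $l = 2$ is used in one case to confirm that $l$ cancels between numerator and denominator.

```python
def _den(mu, l, x, y, a, b):
    """Common denominator of (B14)-(B16); a = alpha_i^E, b = theta_i^E."""
    return mu * l * (1 - x) * a + (1 - mu) * l * (1 - y) * b + 0.5 * mu * l * x + 0.5 * (1 - mu) * l * y


def f_xy(mu, l, x, y, a, b):
    """f(x, y) from (B14)."""
    num = (0.5 * mu * l * (1 - x) * a + 0.5 * (1 - mu) * l * (1 - y) * b
           + mu * (mu + 3) * l * x / 12 + (mu - 4) * (mu - 1) * l * y / 12)
    return num / _den(mu, l, x, y, a, b)


def df_dx(mu, l, x, y, a, b):
    """Second line of (B15)."""
    top = l ** 2 * (mu ** 3 * a / 12 + mu ** 2 * (1 - mu) * (1 - y) * b / 12
                    + mu * (1 - mu) * y * (2 * mu * (1 - a) + 2 * a - 1) / 24)
    return top / _den(mu, l, x, y, a, b) ** 2


def df_dy(mu, l, x, y, a, b):
    """Second line of (B16)."""
    top = l ** 2 * ((1 - mu) ** 3 * b / 12 + mu * (1 - mu) ** 2 * (1 - x) * a / 12
                    + mu * (1 - mu) * x * (1 - 2 * mu + 2 * mu * b) / 24)
    return top / _den(mu, l, x, y, a, b) ** 2
```

The first test case has $0 < \mu < \frac{1}{2}$ and $\alpha_i^E = \frac{1}{2}$, and the second has $\frac{1}{2} < \mu < 1$ and $\theta_i^E > \frac{1}{2}$. Both derivatives are therefore positive in each case, as the conditions above predict.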
    Because $\frac{\partial f}{\partial x} > 0$ and $\frac{\partial f}{\partial y} > 0$, it follows that $\frac{\partial f}{\partial N} > 0$. As $N^{ELE} > N^{APP}$, we know that $\mathbb{E}^{ELE}(\pi_i) > \mathbb{E}^{APP}(\pi_i)$.
    In summary, when $\mu \in \{0, \frac{1}{2}, 1\}$, we have $[\pi]^{VL,ELE} \equiv \mathbb{E}^{ELE}(\pi_i) > \mathbb{E}^{APP}(\pi_i) \equiv [\pi]^{VL,APP}$. When $0 < \mu < \frac{1}{2}$, assuming that $\alpha_i^E \ge \frac{1}{2}$, we have $[\pi]^{VL,ELE} \equiv \mathbb{E}^{ELE}(\pi_i) > \mathbb{E}^{APP}(\pi_i) \equiv [\pi]^{VL,APP}$. When $\frac{1}{2} < \mu < 1$, assuming that $\theta_i^E \ge \frac{1}{2}$, we have $[\pi]^{VL,ELE} \equiv \mathbb{E}^{ELE}(\pi_i) > \mathbb{E}^{APP}(\pi_i) \equiv [\pi]^{VL,APP}$.
   
    B.2 Variance: We then compare, in a representative village, the variance of the competence of the elected village leader with that of the appointed village leader. We know that
$$\mathrm{Var}^{ELE(APP)}(\pi_i) = \mathbb{E}^{ELE(APP)}[(\pi_i)^2]-\left(\mathbb{E}^{ELE(APP)}[\pi_i]\right)^2. \tag{B17}$$
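    We first define $g\big(x(N),y(N)\big) \equiv \mathbb{E}^{ELE(APP)}[(\pi_i)^2]$, and we have
$$g\big(x(N),y(N)\big) = \frac{\int_0^1\int_0^1[\mu\alpha_i+(1-\mu)\theta_i]^2\,[\mu l\,S(\alpha_i)+(1-\mu)l\,S(\theta_i)]\,d\alpha_i\,d\theta_i}{\int_0^1\int_0^1[\mu l\,S(\alpha_i)+(1-\mu)l\,S(\theta_i)]\,d\alpha_i\,d\theta_i}, \tag{B18}$$
where $S(\alpha_i) = (1-x)\alpha_i^E+x\alpha_i$, $S(\theta_i) = (1-y)\theta_i^E+y\theta_i$, $x \equiv \frac{\sigma_{\alpha E}^2N}{\sigma_{\upsilon\alpha}^2+\sigma_{\alpha E}^2N}$, and $y \equiv \frac{\sigma_{\theta E}^2N}{\sigma_{\omega\theta}^2+\sigma_{\theta E}^2N}$.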
    First, we simplify the numerator of $g(\cdot)$:
$$\begin{aligned}
&\int_0^1\int_0^1[\mu^2\alpha_i^2+(1-\mu)^2\theta_i^2+2\mu(1-\mu)\alpha_i\theta_i]\cdot\{\mu l[(1-x)\alpha_i^E+x\alpha_i]+(1-\mu)l[(1-y)\theta_i^E+y\theta_i]\}\,d\alpha_i\,d\theta_i \\
&= \int_0^1\int_0^1\Big\{\mu^3l\,\alpha_i^2[(1-x)\alpha_i^E+x\alpha_i]+\mu^2(1-\mu)l\,\alpha_i^2[(1-y)\theta_i^E+y\theta_i] \\
&\qquad+\mu(1-\mu)^2l\,\theta_i^2[(1-x)\alpha_i^E+x\alpha_i]+(1-\mu)^3l\,\theta_i^2[(1-y)\theta_i^E+y\theta_i] \\
&\qquad+2\mu^2(1-\mu)l\,\alpha_i\theta_i[(1-x)\alpha_i^E+x\alpha_i]+2\mu(1-\mu)^2l\,\alpha_i\theta_i[(1-y)\theta_i^E+y\theta_i]\Big\}\,d\alpha_i\,d\theta_i,
\end{aligned}\tag{B19}$$
where
$$\int_0^1\int_0^1\mu^3l\,\alpha_i^2[(1-x)\alpha_i^E+x\alpha_i]\,d\alpha_i\,d\theta_i = \frac{1}{3}\mu^3l(1-x)\alpha_i^E+\frac{1}{4}\mu^3l\,x, \tag{B20}$$
$$\int_0^1\int_0^1\mu^2(1-\mu)l\,\alpha_i^2[(1-y)\theta_i^E+y\theta_i]\,d\alpha_i\,d\theta_i = \frac{1}{3}\mu^2(1-\mu)l(1-y)\theta_i^E+\frac{1}{6}\mu^2(1-\mu)l\,y, \tag{B21}$$
$$\int_0^1\int_0^1\mu(1-\mu)^2l\,\theta_i^2[(1-x)\alpha_i^E+x\alpha_i]\,d\alpha_i\,d\theta_i = \frac{1}{3}\mu(1-\mu)^2l(1-x)\alpha_i^E+\frac{1}{6}\mu(1-\mu)^2l\,x, \tag{B22}$$
$$\int_0^1\int_0^1(1-\mu)^3l\,\theta_i^2[(1-y)\theta_i^E+y\theta_i]\,d\alpha_i\,d\theta_i = \frac{1}{3}(1-\mu)^3l(1-y)\theta_i^E+\frac{1}{4}(1-\mu)^3l\,y, \tag{B23}$$
$$\int_0^1\int_0^1 2\mu^2(1-\mu)l\,\alpha_i\theta_i[(1-x)\alpha_i^E+x\alpha_i]\,d\alpha_i\,d\theta_i = \frac{1}{2}\mu^2(1-\mu)l(1-x)\alpha_i^E+\frac{1}{3}\mu^2(1-\mu)l\,x, \tag{B24}$$
$$\int_0^1\int_0^1 2\mu(1-\mu)^2l\,\alpha_i\theta_i[(1-y)\theta_i^E+y\theta_i]\,d\alpha_i\,d\theta_i = \frac{1}{2}\mu(1-\mu)^2l(1-y)\theta_i^E+\frac{1}{3}\mu(1-\mu)^2l\,y. \tag{B25}$$
    The numerator of $g(\cdot)$ becomes $\frac{1}{6}\mu(\mu^2-\mu+2)l(1-x)\alpha_i^E+\frac{1}{6}(1-\mu)(\mu^2-\mu+2)l(1-y)\theta_i^E+\frac{1}{12}\mu(\mu^2+2)l\,x+\frac{1}{12}(1-\mu)(\mu^2-2\mu+3)l\,y$. Therefore, we have
$$g(\cdot) = \frac{\frac{1}{6}\mu(\mu^2-\mu+2)l(1-x)\alpha_i^E+\frac{1}{6}(1-\mu)(\mu^2-\mu+2)l(1-y)\theta_i^E+\frac{1}{12}\mu(\mu^2+2)l\,x+\frac{1}{12}(1-\mu)(\mu^2-2\mu+3)l\,y}{\mu l(1-x)\alpha_i^E+(1-\mu)l(1-y)\theta_i^E+\frac{1}{2}\mu l\,x+\frac{1}{2}(1-\mu)l\,y}. \tag{B26}$$
    Then, we define $f\big(x(N),y(N)\big) \equiv \mathbb{E}^{ELE(APP)}[\pi_i]$, and we know from (B14) that
$$f\big(x(N),y(N)\big) = \frac{\frac{1}{2}\mu l(1-x)\alpha_i^E+\frac{1}{2}(1-\mu)l(1-y)\theta_i^E+\frac{1}{12}\mu(\mu+3)l\,x+\frac{1}{12}(\mu-4)(\mu-1)l\,y}{\mu l(1-x)\alpha_i^E+(1-\mu)l(1-y)\theta_i^E+\frac{1}{2}\mu l\,x+\frac{1}{2}(1-\mu)l\,y}.$$
    From (B17), we know that
$$\frac{\partial\,\mathrm{Var}(\cdot)}{\partial N} = \frac{\partial g(\cdot)}{\partial N}-2f(\cdot)\cdot\frac{\partial f(\cdot)}{\partial N}. \tag{B27}$$
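    It is known that $f(\cdot) > 0$ and $\frac{\partial f(\cdot)}{\partial N} > 0$. We then need to derive $\frac{\partial g(\cdot)}{\partial N} = \frac{\partial g(\cdot)}{\partial x}\cdot\frac{\partial x}{\partial N}+\frac{\partial g(\cdot)}{\partial y}\cdot\frac{\partial y}{\partial N}$, where it is known that $\frac{\partial x}{\partial N} > 0$ and $\frac{\partial y}{\partial N} > 0$. Differentiating (B26) with respect to $x$ and $y$ in the same way as (B14), we have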
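$$\frac{\partial g(\cdot)}{\partial x} = \frac{\frac{1}{12}\mu^3l^2\alpha_i^E+\frac{1}{12}\mu(1-\mu)^2l^2y\,\alpha_i^E+\frac{1}{12}\mu^2(1-\mu)l^2(1-y)\theta_i^E+\frac{1}{24}\mu(1-\mu)(2\mu-1)l^2y}{\left[\mu l(1-x)\alpha_i^E+(1-\mu)l(1-y)\theta_i^E+\frac{1}{2}\mu l\,x+\frac{1}{2}(1-\mu)l\,y\right]^2}, \tag{B28}$$
$$\frac{\partial g(\cdot)}{\partial y} = \frac{\frac{1}{12}\mu^2(1-\mu)l^2x\,\theta_i^E+\frac{1}{12}(1-\mu)^3l^2\theta_i^E+\frac{1}{12}\mu(1-\mu)^2l^2(1-x)\alpha_i^E+\frac{1}{24}\mu(1-\mu)(1-2\mu)l^2x}{\left[\mu l(1-x)\alpha_i^E+(1-\mu)l(1-y)\theta_i^E+\frac{1}{2}\mu l\,x+\frac{1}{2}(1-\mu)l\,y\right]^2}. \tag{B29}$$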
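  Thus,
$$\begin{aligned}
\frac{\partial g(\cdot)}{\partial N} &= \frac{\frac{1}{12}\mu^3l^2\alpha_i^E+\frac{1}{12}\mu(1-\mu)^2l^2y\,\alpha_i^E+\frac{1}{12}\mu^2(1-\mu)l^2(1-y)\theta_i^E+\frac{1}{24}\mu(1-\mu)(2\mu-1)l^2y}{\left[\mu l(1-x)\alpha_i^E+(1-\mu)l(1-y)\theta_i^E+\frac{1}{2}\mu l\,x+\frac{1}{2}(1-\mu)l\,y\right]^2}\cdot\frac{\partial x}{\partial N} \\
&\quad+\frac{\frac{1}{12}\mu^2(1-\mu)l^2x\,\theta_i^E+\frac{1}{12}(1-\mu)^3l^2\theta_i^E+\frac{1}{12}\mu(1-\mu)^2l^2(1-x)\alpha_i^E+\frac{1}{24}\mu(1-\mu)(1-2\mu)l^2x}{\left[\mu l(1-x)\alpha_i^E+(1-\mu)l(1-y)\theta_i^E+\frac{1}{2}\mu l\,x+\frac{1}{2}(1-\mu)l\,y\right]^2}\cdot\frac{\partial y}{\partial N}.
\end{aligned}$$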
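    Comparing (B28) and (B29) with (B15) and (B16) shows that $\frac{\partial g(\cdot)}{\partial x} = \frac{\partial f(\cdot)}{\partial x}$ and $\frac{\partial g(\cdot)}{\partial y} = \frac{\partial f(\cdot)}{\partial y}$, so $\frac{\partial g(\cdot)}{\partial N} = \frac{\partial f(\cdot)}{\partial N}$. In addition, (B14) implies $f(\cdot) > \frac{1}{2}$, because the numerator of $f(\cdot)$ exceeds half of its denominator: subtracting half of the denominator from the numerator leaves $\frac{1}{12}l\left[\mu^2x+(1-\mu)^2y\right] > 0$. As a result, (B27) gives $\frac{\partial\,\mathrm{Var}(\cdot)}{\partial N} = \frac{\partial g(\cdot)}{\partial N}-2f(\cdot)\cdot\frac{\partial f(\cdot)}{\partial N} = [1-2f(\cdot)]\,\frac{\partial f(\cdot)}{\partial N} < 0$. As $N^{ELE} > N^{APP}$, we know that $\mathrm{Var}^{ELE}(\pi_i) < \mathrm{Var}^{APP}(\pi_i)$.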
C. Comparison of the Expectation and the Variance of the Competence of Village Party Secretaries Before and After the Introduction of Local Direct Elections
    C.1 Expectation: We compare, in a representative village, the expected competence of the village party secretary under two regimes. In the first, part of the candidates, namely the Type-I village party branch members, are directly elected by local village residents. In the second, these Type-I village party branch members are appointed by township officials.
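    Note that $\pi_j \equiv \mu\alpha_j+(1-\mu)\theta_j$, $\pi_{\tilde{j}} \equiv \mu\alpha_{\tilde{j}}+(1-\mu)\theta_{\tilde{j}}$, $\pi_j^u \equiv \mu\alpha_j^u+(1-\mu)\theta_j^u$, and $\pi_{\tilde{j}}^u \equiv \mu\alpha_{\tilde{j}}^u+(1-\mu)\theta_{\tilde{j}}^u$. Specifically, we compare
$$[\pi]^{VPS,ELE} = \mathbb{E}^{ELE}(\pi_{j,\tilde{j}}) = \frac{\int_0^1[\pi_j]^{ELE}R_j^{ELE}\,d\pi_j+\int_0^1\pi_{\tilde{j}}R_{\tilde{j}}\,d\pi_{\tilde{j}}}{\int_0^1R_j^{ELE}\,d\pi_j+\int_0^1R_{\tilde{j}}\,d\pi_{\tilde{j}}} \tag{C1}$$
and
$$[\pi]^{VPS,APP} = \mathbb{E}^{APP}(\pi_{j,\tilde{j}}) = \frac{\int_0^1[\pi_j]^{APP}R_j^{APP}\,d\pi_j+\int_0^1\pi_{\tilde{j}}R_{\tilde{j}}\,d\pi_{\tilde{j}}}{\int_0^1R_j^{APP}\,d\pi_j+\int_0^1R_{\tilde{j}}\,d\pi_{\tilde{j}}}. \tag{C2}$$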
    In general, the expected competence of the village party secretary is
$$[\pi]^{VPS} = \mathbb{E}(\pi_{j,\tilde{j}}) = \frac{\int_0^1\pi_jR_j\,d\pi_j+\int_0^1\pi_{\tilde{j}}R_{\tilde{j}}\,d\pi_{\tilde{j}}}{\int_0^1R_j\,d\pi_j+\int_0^1R_{\tilde{j}}\,d\pi_{\tilde{j}}}, \tag{C3}$$
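where
$$R_j = \mu w\left[\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2+m\sigma_{\alpha u}^2}\alpha_j^u+\frac{m\sigma_{\alpha u}^2}{\sigma_\varepsilon^2+m\sigma_{\alpha u}^2}\alpha_j\right]+(1-\mu)w\left[\frac{\sigma_\eta^2}{\sigma_\eta^2+m\sigma_{\theta u}^2}\theta_j^u+\frac{m\sigma_{\theta u}^2}{\sigma_\eta^2+m\sigma_{\theta u}^2}\theta_j\right], \tag{C4}$$
with $\varphi_\alpha \equiv \frac{m\sigma_{\alpha u}^2}{\sigma_\varepsilon^2+m\sigma_{\alpha u}^2}$ and $\varphi_\theta \equiv \frac{m\sigma_{\theta u}^2}{\sigma_\eta^2+m\sigma_{\theta u}^2}$.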
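    Because $m \to \infty$, we have $\operatorname{plim}_{m\to\infty}\varphi_\alpha = \operatorname{plim}_{m\to\infty}\varphi_\theta \equiv \varphi$. Therefore,
$$R_j = \mu w\left[(1-\varphi)\alpha_j^u+\varphi\alpha_j\right]+(1-\mu)w\left[(1-\varphi)\theta_j^u+\varphi\theta_j\right] = (1-\varphi)w\pi_j^u+\varphi w\pi_j. \tag{C5}$$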
    Similarly,
$$R_{\tilde{j}} = (1-\varphi)w\pi_{\tilde{j}}^u+\varphi w\pi_{\tilde{j}}. \tag{C6}$$
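    As a result, we have
$$\begin{aligned}
\mathbb{E}(\pi_{j,\tilde{j}}) &= \frac{\int_0^1\pi_j\left[(1-\varphi)w\pi_j^u+\varphi w\pi_j\right]d\pi_j+\int_0^1\pi_{\tilde{j}}\left[(1-\varphi)w\pi_{\tilde{j}}^u+\varphi w\pi_{\tilde{j}}\right]d\pi_{\tilde{j}}}{\int_0^1\left[(1-\varphi)w\pi_j^u+\varphi w\pi_j\right]d\pi_j+\int_0^1\left[(1-\varphi)w\pi_{\tilde{j}}^u+\varphi w\pi_{\tilde{j}}\right]d\pi_{\tilde{j}}} \\
&= \frac{(1-\varphi)w\pi_j^u\int_0^1\pi_j\,d\pi_j+\varphi w\int_0^1\pi_j^2\,d\pi_j+(1-\varphi)w\pi_{\tilde{j}}^u\int_0^1\pi_{\tilde{j}}\,d\pi_{\tilde{j}}+\varphi w\int_0^1\pi_{\tilde{j}}^2\,d\pi_{\tilde{j}}}{(1-\varphi)w\pi_j^u+\varphi w\int_0^1\pi_j\,d\pi_j+(1-\varphi)w\pi_{\tilde{j}}^u+\varphi w\int_0^1\pi_{\tilde{j}}\,d\pi_{\tilde{j}}}.
\end{aligned}\tag{C7}$$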
    The term $\int_0^1\pi_j\,d\pi_j \equiv E(\pi_j)$ represents the expected competence of all village committee members, who are the Type-I village party branch members. As discussed in Sections 2.3.A and 2.3.B, the inferences about, and the selection of, each village committee member, including the village leader as representative, are homogeneous. The expected competence is therefore the same for every village committee member: $\mathbb{E}_j(\pi_j)$ takes the same value for each $j$. As a result, $E(\pi_j) = E\big(\mathbb{E}_j(\pi_j)\big) = \mathbb{E}_1(\pi_1)$, where $j = 1$ denotes the village leader as representative of all village committee members, and $\mathbb{E}_1(\pi_1)$ is the expected competence of the village leader, such that
$$\mathbb{E}_1(\pi_1) \equiv f(N) \equiv f\big(x(N),y(N)\big) = \frac{\int_0^1\int_0^1[\mu\alpha_i+(1-\mu)\theta_i]\,[\mu l\,S(\alpha_i)+(1-\mu)l\,S(\theta_i)]\,d\alpha_i\,d\theta_i}{\int_0^1\int_0^1[\mu l\,S(\alpha_i)+(1-\mu)l\,S(\theta_i)]\,d\alpha_i\,d\theta_i}, \tag{C8}$$
where $S(\alpha_i) = (1-x)\alpha_i^E+x\alpha_i$ with $x \equiv \frac{\sigma_{\alpha E}^2N}{\sigma_{\upsilon\alpha}^2+\sigma_{\alpha E}^2N}$, and $S(\theta_i) = (1-y)\theta_i^E+y\theta_i$ with $y \equiv \frac{\sigma_{\theta E}^2N}{\sigma_{\omega\theta}^2+\sigma_{\theta E}^2N}$.
    Similarly, $\int_0^1\pi_j^2\,d\pi_j \equiv E(\pi_j^2)$ represents the expected squared competence of all village committee members. As discussed in Sections 2.3.A and 2.3.B, the inferences about, and the selection of, each village committee member, including the village leader as representative, are homogeneous. The expected squared competence is therefore the same for every village committee member: $\mathbb{E}_j(\pi_j^2)$ takes the same value for each $j$. As a result, $E(\pi_j^2) = E\big(\mathbb{E}_j(\pi_j^2)\big) = \mathbb{E}_1(\pi_1^2)$, where $j = 1$ denotes the village leader as representative of all village committee members, and $\mathbb{E}_1(\pi_1^2)$ is the expected squared competence of the village leader, such that
$$\mathbb{E}_1(\pi_1^2) \equiv g(N) \equiv g\big(x(N),y(N)\big) = \frac{\int_0^1\int_0^1[\mu\alpha_i+(1-\mu)\theta_i]^2\,[\mu l\,S(\alpha_i)+(1-\mu)l\,S(\theta_i)]\,d\alpha_i\,d\theta_i}{\int_0^1\int_0^1[\mu l\,S(\alpha_i)+(1-\mu)l\,S(\theta_i)]\,d\alpha_i\,d\theta_i}, \tag{C9}$$
where $S(\alpha_i) = (1-x)\alpha_i^E+x\alpha_i$, $S(\theta_i) = (1-y)\theta_i^E+y\theta_i$, $x \equiv \frac{\sigma_{\alpha E}^2N}{\sigma_{\upsilon\alpha}^2+\sigma_{\alpha E}^2N}$, and $y \equiv \frac{\sigma_{\theta E}^2N}{\sigma_{\omega\theta}^2+\sigma_{\theta E}^2N}$.
    The Type-II village party branch members are always appointed by the representative township official. We denote by $\tilde{N}$ the number of natural communications between the representative township official and each Type-II village party branch member candidate. Following the derivations above, $\int_0^1\pi_{\tilde{j}}\,d\pi_{\tilde{j}} = f(\tilde{N})$ and $\int_0^1\pi_{\tilde{j}}^2\,d\pi_{\tilde{j}} = g(\tilde{N})$.
    (C7) becomes
$$\frac{(1-\varphi)w\pi_j^uf(N)+\varphi w\,g(N)+(1-\varphi)w\pi_{\tilde{j}}^uf(\tilde{N})+\varphi w\,g(\tilde{N})}{(1-\varphi)w\pi_j^u+\varphi w\,f(N)+(1-\varphi)w\pi_{\tilde{j}}^u+\varphi w\,f(\tilde{N})}. \tag{C10}$$
    From Appendix B.2, it is easy to deduce that $f(N) > g(N)$ and $f'(N) = g'(N)$. We can also derive that $f(\tilde{N}) \le 1$ and $g(\tilde{N}) \le f(\tilde{N})$, and thus $(1-\varphi)w\pi_{\tilde{j}}^uf(\tilde{N})+\varphi w\,g(\tilde{N}) \le (1-\varphi)w\pi_{\tilde{j}}^u+\varphi w\,f(\tilde{N})$. Therefore, the first-order derivative of (C10) with respect to $N$ is greater than 0.
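The last step can be made explicit. The shorthand $c_1$, $c_2$, $K_1$, $K_2$ below is introduced only for this derivation, and the derivation assumes $w > 0$, $0 \le \varphi \le 1$, and $\pi_j^u, \pi_{\tilde{j}}^u \ge 0$. Using $g'(N) = f'(N)$, differentiating (C10) gives
$$\begin{aligned}
&c_1 \equiv (1-\varphi)w\pi_j^u,\quad c_2 \equiv \varphi w,\quad K_1 \equiv (1-\varphi)w\pi_{\tilde{j}}^uf(\tilde{N})+\varphi w\,g(\tilde{N}),\quad K_2 \equiv (1-\varphi)w\pi_{\tilde{j}}^u+\varphi w\,f(\tilde{N}), \\
&\frac{d}{dN}\,\frac{c_1f(N)+c_2g(N)+K_1}{c_1+c_2f(N)+K_2} = \frac{f'(N)\left[c_1^2+c_1c_2+c_1K_2+c_2^2\big(f(N)-g(N)\big)+c_2(K_2-K_1)\right]}{\left[c_1+c_2f(N)+K_2\right]^2}.
\end{aligned}$$
Every term in the square brackets is nonnegative because $f(N) > g(N)$ and $K_1 \le K_2$. The derivative therefore has the sign of $f'(N)$, which is positive under the conditions stated in Appendix B.1.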
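    We know that
$$\mathbb{E}^{ELE}(\pi_{j,\tilde{j}}) = \frac{(1-\varphi)w\pi_j^uf(N^{ELE})+\varphi w\,g(N^{ELE})+(1-\varphi)w\pi_{\tilde{j}}^uf(\tilde{N})+\varphi w\,g(\tilde{N})}{(1-\varphi)w\pi_j^u+\varphi w\,f(N^{ELE})+(1-\varphi)w\pi_{\tilde{j}}^u+\varphi w\,f(\tilde{N})} \tag{C11}$$
and that
$$\mathbb{E}^{APP}(\pi_{j,\tilde{j}}) = \frac{(1-\varphi)w\pi_j^uf(N^{APP})+\varphi w\,g(N^{APP})+(1-\varphi)w\pi_{\tilde{j}}^uf(\tilde{N})+\varphi w\,g(\tilde{N})}{(1-\varphi)w\pi_j^u+\varphi w\,f(N^{APP})+(1-\varphi)w\pi_{\tilde{j}}^u+\varphi w\,f(\tilde{N})}. \tag{C12}$$
    Because $N^{ELE} > N^{APP}$, we know that $[\pi]^{VPS,ELE} = \mathbb{E}^{ELE}(\pi_{j,\tilde{j}}) > \mathbb{E}^{APP}(\pi_{j,\tilde{j}}) = [\pi]^{VPS,APP}$.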
 
    C.2 Variance: We compare, in a representative village, the variance of the competence of the village party secretary under the same two regimes. In the first, part of the candidates, namely the Type-I village party branch members, are directly elected by local village residents. In the second, these Type-I village party branch members are appointed by township officials. Specifically, we compare
$$\mathrm{Var}^{VPS,ELE}(\pi) = \mathbb{E}^{ELE}\left[(\pi_{j,\tilde{j}})^2\right]-\left(\mathbb{E}^{ELE}\left[\pi_{j,\tilde{j}}\right]\right)^2 \tag{C13}$$
and
$$\mathrm{Var}^{VPS,APP}(\pi) = \mathbb{E}^{APP}\left[(\pi_{j,\tilde{j}})^2\right]-\left(\mathbb{E}^{APP}\left[\pi_{j,\tilde{j}}\right]\right)^2. \tag{C14}$$
    In general, the variance of the competence of the village party secretary is
$$\mathrm{Var}^{VPS}(\pi) = \mathbb{E}\left[(\pi_{j,\tilde{j}})^2\right]-\left(\mathbb{E}\left[\pi_{j,\tilde{j}}\right]\right)^2. \tag{C15}$$
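    As shown in Appendix C.1,
$$\mathbb{E}(\pi_{j,\tilde{j}}) = \frac{(1-\varphi)w\pi_j^uf(N)+\varphi w\,g(N)+(1-\varphi)w\pi_{\tilde{j}}^uf(\tilde{N})+\varphi w\,g(\tilde{N})}{(1-\varphi)w\pi_j^u+\varphi w\,f(N)+(1-\varphi)w\pi_{\tilde{j}}^u+\varphi w\,f(\tilde{N})}. \tag{C16}$$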
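    Therefore, by direct calculation, the numerator of $\mathrm{Var}_{VPS}(\pi)$ is
$$
\begin{aligned}
A \equiv\ & (1-\varphi)^2\mu^2\pi_j^{u}(\pi_j^{u}+\pi_{\tilde{j}}^{u})g(N) - \varphi(1-\varphi)\mu^2\pi_j^{u}f(N)g(N)\\
& + \varphi(1-\varphi)\mu^2(\pi_j^{u}-2\pi_{\tilde{j}}^{u})f(\tilde{N})g(N) + (1-\varphi)^2\mu^2\pi_{\tilde{j}}^{u}(\pi_j^{u}+\pi_{\tilde{j}}^{u})g(\tilde{N})\\
& + \varphi(1-\varphi)\mu^2(\pi_{\tilde{j}}^{u}-2\pi_j^{u})f(N)g(\tilde{N}) - \varphi(1-\varphi)\mu^2\pi_{\tilde{j}}^{u}f(\tilde{N})g(\tilde{N})\\
& + \varphi(1-\varphi)\mu^2(\pi_j^{u}+\pi_{\tilde{j}}^{u})h(N) + \varphi^2\mu^2 f(N)h(N) + \varphi^2\mu^2 f(\tilde{N})h(N)\\
& + \varphi(1-\varphi)\mu^2(\pi_j^{u}+\pi_{\tilde{j}}^{u})h(\tilde{N}) + \varphi^2\mu^2 f(N)h(\tilde{N}) + \varphi^2\mu^2 f(\tilde{N})h(\tilde{N})\\
& - (1-\varphi)^2\mu^2(\pi_j^{u})^2 f^2(N) - \varphi^2\mu^2 g^2(N)\\
& - (1-\varphi)^2\mu^2(\pi_{\tilde{j}}^{u})^2 f^2(\tilde{N}) - \varphi^2\mu^2 g^2(\tilde{N})\\
& - 2(1-\varphi)^2\mu^2\pi_j^{u}\pi_{\tilde{j}}^{u}f(N)f(\tilde{N}) - 2\varphi^2\mu^2 g(N)g(\tilde{N})
\end{aligned} \tag{C17}
$$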
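and the denominator of $\mathrm{Var}_{VPS}(\pi)$ is
$$
\begin{aligned}
B \equiv\ & \big[(1-\varphi)\mu\pi_j^{u} + \varphi\mu f(N) + (1-\varphi)\mu\pi_{\tilde{j}}^{u} + \varphi\mu f(\tilde{N})\big]^2\\
=\ & (1-\varphi)^2\mu^2(\pi_j^{u}+\pi_{\tilde{j}}^{u})^2 + \varphi^2\mu^2 f^2(N) + \varphi^2\mu^2 f^2(\tilde{N})\\
& + 2\varphi(1-\varphi)\mu^2(\pi_j^{u}+\pi_{\tilde{j}}^{u})f(N) + 2\varphi(1-\varphi)\mu^2(\pi_j^{u}+\pi_{\tilde{j}}^{u})f(\tilde{N}) + 2\varphi^2\mu^2 f(\tilde{N})f(N)
\end{aligned} \tag{C18}
$$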
where 
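$$f(N) = \frac{\frac{1}{2}\mu E(1-e)\alpha_{i}^{e} + \frac{1}{2}(1-\mu)E(1-\partial)\theta_{i}^{e} + \frac{1}{12}\mu(\mu+3)E^{e} + \frac{1}{12}(\mu-4)(\mu-1)E^{\partial}}{\mu E(1-e)\alpha_{i}^{e} + (1-\mu)E(1-\partial)\theta_{i}^{e} + \frac{1}{2}\mu E^{e} + \frac{1}{2}(1-\mu)E^{\partial}} \tag{C19}$$
$$g(N) = \frac{\frac{1}{6}\mu(\mu^2-\mu+2)E(1-e)\alpha_{i}^{e} + \frac{1}{6}(1-\mu)(\mu^2-\mu+2)E(1-\partial)\theta_{i}^{e} + \frac{1}{12}\mu(\mu^2+2)E^{e} + \frac{1}{12}(1-\mu)(\mu^2-2\mu+3)E^{\partial}}{\mu E(1-e)\alpha_{i}^{e} + (1-\mu)E(1-\partial)\theta_{i}^{e} + \frac{1}{2}\mu E^{e} + \frac{1}{2}(1-\mu)E^{\partial}} \tag{C20}$$
$$h(N) = \frac{\frac{1}{4}\mu(\mu^2-\mu+1)E(1-e)\alpha_{i}^{e} - \frac{1}{4}(\mu^3-2\mu^2+2\mu-1)E(1-\partial)\theta_{i}^{e} + \frac{1}{120}\mu(4\mu^3+10\mu^2-5\mu+15)E^{e} + \frac{1}{120}(4\mu^4-26\mu^3+49\mu^2-51\mu+24)E^{\partial}}{\mu E(1-e)\alpha_{i}^{e} + (1-\mu)E(1-\partial)\theta_{i}^{e} + \frac{1}{2}\mu E^{e} + \frac{1}{2}(1-\mu)E^{\partial}} \tag{C21}$$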
    It is easy to see that
$$0 < h(N) < g(N) < \tfrac{1}{2} < f(N) < 1 \tag{C22}$$
$$0 < h(\tilde{N}) < g(\tilde{N}) < \tfrac{1}{2} < f(\tilde{N}) < 1 \tag{C23}$$
$$0 < h'(N) < g'(N) = f'(N) < 1 \tag{C24}$$
$$0 < h'(\tilde{N}) < g'(\tilde{N}) = f'(\tilde{N}) < 1 \tag{C25}$$
    It is assumed that $N > \tilde{N}$; because $f$, $g$ and $h$ are increasing, we have $f(N) > f(\tilde{N})$, $g(N) > g(\tilde{N})$, and $h(N) > h(\tilde{N})$.
    Taking the derivative with respect to $N$, we have
$$\frac{\partial \mathrm{Var}_{VPS}(\pi)}{\partial N} = \frac{A'B - B'A}{B^2},$$
where $A'B - B'A$ is
$$
\begin{aligned}
& -2\mu^4 f^2(N)g(N)f'(N) - 2\mu^4 g(\tilde{N})f^2(N)f'(N)\\
& -2\mu^4 f^2(\tilde{N})g(N)f'(N) - 2\mu^4 f^2(\tilde{N})g(\tilde{N})f'(N)\\
& -4\mu^4 f(\tilde{N})f(N)g(N)f'(N) - 4\mu^4 f(\tilde{N})g(\tilde{N})f(N)f'(N)\\
& +2\mu^4 f(N)g^2(N)f'(N) + 2\mu^4 g^2(\tilde{N})f(N)f'(N) + 4\mu^4 g(\tilde{N})f(N)g(N)f'(N)\\
& +2\mu^4 f(\tilde{N})g^2(N)f'(N) + 2\mu^4 f(\tilde{N})g^2(\tilde{N})f'(N) + 4\mu^4 f(\tilde{N})g(\tilde{N})g(N)f'(N)\\
& +\mu^4 f^3(N)h'(N) + \mu^4 f^3(\tilde{N})h'(N) + 3\mu^4 f^2(\tilde{N})f(N)h'(N) + 3\mu^4 f(\tilde{N})f^2(N)h'(N)\\
& -\mu^4 f^2(N)h(N)f'(N) - \mu^4 h(\tilde{N})f^2(N)f'(N) - 2\mu^4 f(\tilde{N})f(N)h(N)f'(N)\\
& -\mu^4 f^2(\tilde{N})h(N)f'(N) - \mu^4 f^2(\tilde{N})h(\tilde{N})f'(N) - 2\mu^4 f(\tilde{N})h(\tilde{N})f(N)f'(N)
\end{aligned}
$$
where $\lim_{N,\tilde{N}\to 1}\varphi = 1$, so that every term involving $1-\varphi$ vanishes and only the $\mu^4$ terms above remain.
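Grouping the terms makes the sign analysis that follows easier to check. Writing $F = f(N) + f(\tilde{N})$, $G = g(N) + g(\tilde{N})$ and $H = h(N) + h(\tilde{N})$ (shorthand introduced here for illustration only), the limit $\varphi \to 1$ gives $A = \mu^2(FH - G^2)$ and $B = \mu^2F^2$, so that $\mathrm{Var}_{VPS}(\pi) = H/F - (G/F)^2$. Using $g'(N) = f'(N)$, the expression above then collapses to

$$
A'B - B'A = \underbrace{-2\mu^4 F G (F - G)\, f'(N)}_{\text{terms in } g} \;+\; \underbrace{\mu^4 F^2\big(F\,h'(N) - H\,f'(N)\big)}_{\text{terms in } h}.
$$

By (C22)–(C24), $F > G > 0$ and $f'(N) > 0$, so the first group is negative; the sign of the second group depends on whether $F\,h'(N)$ exceeds $H\,f'(N)$, which these conditions do not determine.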
    Because of (C22)–(C25), and $f(N) > f(\tilde{N})$, $g(N) > g(\tilde{N})$, $h(N) > h(\tilde{N})$, it is easily derived that in $A'B - B'A$,
$$
\begin{aligned}
& -2\mu^4 f^2(N)g(N)f'(N) - 2\mu^4 g(\tilde{N})f^2(N)f'(N)\\
& -2\mu^4 f^2(\tilde{N})g(N)f'(N) - 2\mu^4 f^2(\tilde{N})g(\tilde{N})f'(N)\\
& -4\mu^4 f(\tilde{N})f(N)g(N)f'(N) - 4\mu^4 f(\tilde{N})g(\tilde{N})f(N)f'(N)\\
& +2\mu^4 f(N)g^2(N)f'(N) + 2\mu^4 g^2(\tilde{N})f(N)f'(N) + 4\mu^4 g(\tilde{N})f(N)g(N)f'(N)\\
& +2\mu^4 f(\tilde{N})g^2(N)f'(N) + 2\mu^4 f(\tilde{N})g^2(\tilde{N})f'(N) + 4\mu^4 f(\tilde{N})g(\tilde{N})g(N)f'(N)
\end{aligned}
\;<\; 0.
$$
    However, it is ambiguous whether
$$
\begin{aligned}
& +\mu^4 f^3(N)h'(N) + \mu^4 f^3(\tilde{N})h'(N) + 3\mu^4 f^2(\tilde{N})f(N)h'(N) + 3\mu^4 f(\tilde{N})f^2(N)h'(N)\\
& -\mu^4 f^2(N)h(N)f'(N) - \mu^4 h(\tilde{N})f^2(N)f'(N) - 2\mu^4 f(\tilde{N})f(N)h(N)f'(N)\\
& -\mu^4 f^2(\tilde{N})h(N)f'(N) - \mu^4 f^2(\tilde{N})h(\tilde{N})f'(N) - 2\mu^4 f(\tilde{N})h(\tilde{N})f(N)f'(N)
\end{aligned}
$$
is positive or negative. Accordingly, it is also ambiguous whether the derivative of $\mathrm{Var}_{VPS}(\pi)$ with respect to $N$ is negative, that is, whether introducing the local direct election reduces the variance of the competence of the village party secretary.
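The ambiguity can also be seen numerically. The sketch below codes the 22 listed terms of $A'B - B'A$ (in the limit $\varphi \to 1$, with $g'(N) = f'(N)$) and compares them with the grouped form $-2\mu^4FG(F-G)f'(N) + \mu^4F^2\big(Fh'(N) - Hf'(N)\big)$. The parameter values are hypothetical, chosen only to satisfy (C22)–(C25) and the ordering $f(N) > f(\tilde{N})$, $g(N) > g(\tilde{N})$, $h(N) > h(\tilde{N})$; they are not estimates from the model.

```python
def listed_terms(f, ft, g, gt, h, ht, fp, hp, mu=1.0):
    """The 22 terms of A'B - B'A as listed in the text (phi -> 1, g'(N) = f'(N)).

    f, g, h are f(N), g(N), h(N); ft, gt, ht are the same functions at N-tilde;
    fp = f'(N) and hp = h'(N)."""
    terms = [
        -2 * f**2 * g * fp, -2 * gt * f**2 * fp,
        -2 * ft**2 * g * fp, -2 * ft**2 * gt * fp,
        -4 * ft * f * g * fp, -4 * ft * gt * f * fp,
        2 * f * g**2 * fp, 2 * gt**2 * f * fp, 4 * gt * f * g * fp,
        2 * ft * g**2 * fp, 2 * ft * gt**2 * fp, 4 * ft * gt * g * fp,
        f**3 * hp, ft**3 * hp, 3 * ft**2 * f * hp, 3 * ft * f**2 * hp,
        -(f**2) * h * fp, -ht * f**2 * fp, -2 * ft * f * h * fp,
        -(ft**2) * h * fp, -(ft**2) * ht * fp, -2 * ft * ht * f * fp,
    ]
    return mu**4 * sum(terms)


def grouped(f, ft, g, gt, h, ht, fp, hp, mu=1.0):
    """The same quantity written as a g-part plus an h-part."""
    F, G, H = f + ft, g + gt, h + ht
    g_part = -2 * F * G * (F - G) * fp   # negative whenever F > G > 0 and fp > 0
    h_part = F**2 * (F * hp - H * fp)    # sign depends on F*hp versus H*fp
    return mu**4 * (g_part + h_part)


# Hypothetical values with 0 < h < g < 1/2 < f < 1 and 0 < h' < g' = f' < 1
base = dict(f=0.8, ft=0.7, g=0.4, gt=0.3, h=0.2, ht=0.1, fp=0.5)
for hp in (0.05, 0.4):
    print(hp, listed_terms(hp=hp, **base), grouped(hp=hp, **base))
```

With these values the total is negative for the smaller $h'(N)$ and positive for the larger one, so the sign of the derivative of $\mathrm{Var}_{VPS}(\pi)$ cannot be pinned down by (C22)–(C25) alone.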
 
Appendices of Lagged Variables as Instruments 
A. Derivation of Cov(X, U)
    In Scenarios 1 and 2, following the appendix of Bellemare et al. (2017), given equations (3.2) and (3.3), or (3.17) and (3.18), we have
$$\mathrm{Cov}(X_{t-1},U_{t-1}) = \mathrm{Cov}\left(\frac{1}{\rho}X_t - \frac{\kappa}{\rho}U_t - \frac{1}{\rho}\eta_t,\ \frac{1}{\phi}U_t - \frac{1}{\phi}\upsilon_t\right) \tag{A.1}$$
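    Expanding (A.1), $\eta_t$ is uncorrelated with $U_t$ and $\upsilon_t$, and the terms in $\upsilon_t$ cancel because $\mathrm{Cov}(X_t,\upsilon_t) = \kappa\mathrm{Var}(\upsilon_t)$ and $\mathrm{Cov}(U_t,\upsilon_t) = \mathrm{Var}(\upsilon_t)$. Then we have
$$\mathrm{Cov}(X_{t-1},U_{t-1}) = \frac{1}{\phi\rho}\left[\mathrm{Cov}(X_t,U_t) - \kappa\mathrm{Var}(U_t)\right]$$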
which yields 
$$\mathrm{Cov}(X_t,U_t) - \kappa\mathrm{Var}(U_t) = \phi\rho\,\mathrm{Cov}(X_{t-1},U_{t-1}) \tag{A.2}$$
    Since $\rho, \phi \in (0,1)$, both $X$ and $U$ are mean-reverting (stationary) series, so the covariance between $X$ and $U$ does not depend on $t$. In other words, asymptotically, we have
$$\mathop{\mathrm{plim}}_{t\to\infty}\mathrm{Cov}(X_t,U_t) = \mathop{\mathrm{plim}}_{t\to\infty}\mathrm{Cov}(X_{t-1},U_{t-1}) = \mathrm{Cov}(X,U) \tag{A.3}$$
    Therefore, (A.2) becomes 
$$\mathrm{Cov}(X,U) - \kappa\mathrm{Var}(U) = \phi\rho\,\mathrm{Cov}(X,U) \tag{A.4}$$
implying that 
$$\mathrm{Cov}(X,U) = \frac{\kappa\mathrm{Var}(U)}{1-\phi\rho} \tag{A.5}$$
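A quick Monte Carlo check of (A.5) can be sketched as follows. It assumes the Scenario 1 processes implied by (A.1), $U_t = \phi U_{t-1} + \upsilon_t$ and $X_t = \rho X_{t-1} + \kappa U_t + \eta_t$, with independent standard normal shocks; the function name, seed, series length and parameter values are illustrative choices, not those used in Section 3.3.

```python
import random


def simulate_scenario1(kappa, rho, phi, n=100_000, burn=1_000, seed=0):
    """Simulate U_t = phi*U_{t-1} + v_t and X_t = rho*X_{t-1} + kappa*U_t + eta_t.

    v_t and eta_t are assumed i.i.d. standard normal and mutually independent.
    Returns the sample Cov(X_t, U_t) and Var(U_t) after a burn-in period."""
    rng = random.Random(seed)
    x = u = 0.0
    xs, us = [], []
    for t in range(n + burn):
        u = phi * u + rng.gauss(0.0, 1.0)              # U_t
        x = rho * x + kappa * u + rng.gauss(0.0, 1.0)  # X_t uses the current U_t
        if t >= burn:  # discard draws still influenced by the zero starting values
            xs.append(x)
            us.append(u)
    mean_x, mean_u = sum(xs) / n, sum(us) / n
    cov_xu = sum((a - mean_x) * (b - mean_u) for a, b in zip(xs, us)) / n
    var_u = sum((b - mean_u) ** 2 for b in us) / n
    return cov_xu, var_u


kappa, rho, phi = 0.5, 0.5, 0.5
cov_xu, var_u = simulate_scenario1(kappa, rho, phi)
predicted = kappa * var_u / (1 - phi * rho)  # right-hand side of (A.5)
print(f"simulated Cov(X,U) = {cov_xu:.3f}, (A.5) gives {predicted:.3f}")
```

The two numbers agree up to simulation noise; increasing `n` tightens the agreement.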
    In Scenario 3, similarly, given equations (3.26) and (3.27), we have
$$\mathrm{Cov}(X_{t-1},U_{t-1}) = \mathrm{Cov}\left(\frac{1}{\rho}X_t - \frac{\kappa}{\rho}U_t - \frac{1}{\rho}\eta_t,\ \frac{1}{\phi}U_t - \frac{\psi}{\phi}X_{t-1} - \frac{1}{\phi}\upsilon_t\right) \tag{A.6}$$
    Then we have
$$\mathrm{Cov}(X_{t-1},U_{t-1}) = \frac{1}{\phi\rho}\left[\mathrm{Cov}(X_t,U_t) - \psi\mathrm{Cov}(X_t,X_{t-1}) - \kappa\mathrm{Var}(U_t) + \kappa\psi\mathrm{Cov}(X_{t-1},U_t)\right] \tag{A.7}$$
which yields 
$$\mathrm{Cov}(X_t,U_t) - \psi\mathrm{Cov}(X_t,X_{t-1}) - \kappa\mathrm{Var}(U_t) + \kappa\psi\mathrm{Cov}(X_{t-1},U_t) = \phi\rho\,\mathrm{Cov}(X_{t-1},U_{t-1}) \tag{A.8}$$
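    Similarly, since $\rho, \phi \in (0,1)$ and the feedback between $X$ and $U$ through $\kappa$ and $\psi$ is assumed weak enough for the joint process to be stationary, both $X$ and $U$ are mean-reverting series, so the covariance between $X$ and $U$ does not depend on $t$. In other words, asymptotically, we have
$$\mathop{\mathrm{plim}}_{t\to\infty}\mathrm{Cov}(X_t,U_t) = \mathop{\mathrm{plim}}_{t\to\infty}\mathrm{Cov}(X_{t-1},U_{t-1}) = \mathrm{Cov}(X,U) \tag{A.9}$$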
    We also know that 
$$\mathrm{Cov}(X_t,X_{t-1}) = \mathrm{Cov}(\rho X_{t-1} + \kappa U_t + \eta_t,\ X_{t-1}) = \rho\mathrm{Var}(X) + \kappa\mathrm{Cov}(X_{t-1},U_t)$$
and that 
$$\mathrm{Cov}(X_{t-1},U_t) = \mathrm{Cov}(X_{t-1},\ \phi U_{t-1} + \psi X_{t-1} + \upsilon_t) = \phi\,\mathrm{Cov}(X,U) + \psi\mathrm{Var}(X)$$
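    Substituting the first expression into (A.8), the two $\kappa\psi\mathrm{Cov}(X_{t-1},U_t)$ terms cancel; therefore, (A.8) becomes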
$$\mathrm{Cov}(X,U) - \kappa\mathrm{Var}(U) - \psi\rho\mathrm{Var}(X) = \phi\rho\,\mathrm{Cov}(X,U) \tag{A.9}$$
implying that 
$$\mathrm{Cov}(X,U) = \frac{\kappa\mathrm{Var}(U) + \psi\rho\mathrm{Var}(X)}{1-\phi\rho} \tag{A.10}$$
    Therefore, 
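$$\frac{\mathrm{Cov}(X_{t-1},U_t)}{\mathrm{Var}(X_{t-1})} = \frac{\phi\kappa\mathrm{Var}(U) + \phi\psi\rho\mathrm{Var}(X)}{(1-\phi\rho)\mathrm{Var}(X)} + \psi = \frac{\phi\kappa\mathrm{Var}(U)}{(1-\phi\rho)\mathrm{Var}(X)} + \frac{\psi}{1-\phi\rho} \tag{A.11}$$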
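The closed forms (A.10) and (A.11) can be checked against the exact stationary moments of the Scenario 3 system. The sketch assumes the processes used in the derivation above, $U_t = \phi U_{t-1} + \psi X_{t-1} + \upsilon_t$ and $X_t = \rho X_{t-1} + \kappa U_t + \eta_t$, with independent unit-variance shocks and parameter values for which the system is stationary; the helper name and values are hypothetical. Setting $\psi = 0$ recovers Scenario 1 and (A.5).

```python
def stationary_moments(kappa, rho, phi, psi, s2_eta=1.0, s2_v=1.0, iters=2_000):
    """Stationary Var(X), Cov(X,U), Var(U) and Cov(X_{t-1}, U_t) for the system
        U_t = phi*U_{t-1} + psi*X_{t-1} + v_t,  X_t = rho*X_{t-1} + kappa*U_t + eta_t.
    Substituting U_t into the X equation gives z_t = A z_{t-1} + e_t, z_t = (X_t, U_t)."""
    A = [[rho + kappa * psi, kappa * phi],
         [psi, phi]]
    # Covariance of e_t = (kappa*v_t + eta_t, v_t), with v_t and eta_t independent
    Q = [[kappa**2 * s2_v + s2_eta, kappa * s2_v],
         [kappa * s2_v, s2_v]]
    S = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(iters):  # fixed-point iteration S <- A S A' + Q (requires a stable A)
        AS = [[sum(A[r][m] * S[m][c] for m in range(2)) for c in range(2)] for r in range(2)]
        S = [[sum(AS[r][m] * A[c][m] for m in range(2)) + Q[r][c] for c in range(2)]
             for r in range(2)]
    var_x, cov_xu, var_u = S[0][0], S[0][1], S[1][1]
    # Cov(X_{t-1}, U_t) is the (0, 1) entry of S A'
    cov_lag = psi * var_x + phi * cov_xu
    return var_x, cov_xu, var_u, cov_lag


kappa, rho, phi, psi = 0.5, 0.5, 0.5, 0.2
var_x, cov_xu, var_u, cov_lag = stationary_moments(kappa, rho, phi, psi)
a10 = (kappa * var_u + psi * rho * var_x) / (1 - phi * rho)
a11 = phi * kappa * var_u / ((1 - phi * rho) * var_x) + psi / (1 - phi * rho)
print(cov_xu, a10)           # Cov(X,U) versus the right-hand side of (A.10)
print(cov_lag / var_x, a11)  # Cov(X_{t-1},U_t)/Var(X) versus (A.11)
```

Iterating the covariance recursion, rather than simulating, removes sampling noise, so the two columns agree to floating-point precision when the system is stationary.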
 
B. Effectiveness of the Lagged IV Across Sample Sizes 
 
    For a large enough sample, the probability of a type I error becomes arbitrarily close to one. With the growing availability of large data sets, using a lagged IV can therefore produce a type I error with probability close to one. In this sense, with a large enough sample, even if the lagged IV estimate is consistent when the lagged IV violates only the independence assumption, the unavoidable type I error still makes the lagged IV method problematic.
    To test the effectiveness of the lagged IV method in terms of the probability of a type I error, we run a series of simulations with ten times as many individuals as in the simulations in Section 3.4. The simulations follow the same data generating processes (DGPs) as in Section 3.3. In each simulation, we generate a panel with T = 50 periods and N = 1,000 cross-sectional units, for a total of 50,000 observations.
    As Figure 3.A1 shows, the probability of a type I error is closer to one than in Figure 2. This implies that as the sample size grows, the probability of a type I error approaches one. Figures 3.A2 to 3.A6 show the same pattern when compared with Figures 3, 5, 6, 8, and 9, respectively.
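The mechanism behind this pattern can be illustrated with a normal approximation. The sketch assumes, for illustration only, that the estimator is centred at the true coefficient plus a bias $b$ that does not shrink with the sample, with standard error $\sigma/\sqrt{n}$. The bias measured in standard errors then grows like $\sqrt{n}$, so the probability of rejecting the true value rises toward one. The bias and $\sigma$ below are hypothetical and are not taken from the simulations in this appendix.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def rejection_probability(bias, sigma, n, z_crit=1.959963984540054):
    """Probability that a two-sided 5% test rejects the true coefficient.

    Assumes the estimator is approximately normal, centred at (true value + bias),
    with standard error sigma / sqrt(n)."""
    shift = bias * sqrt(n) / sigma  # bias expressed in standard-error units
    return normal_cdf(-z_crit - shift) + 1.0 - normal_cdf(z_crit - shift)


for n in (500, 5_000, 50_000):
    print(n, round(rejection_probability(bias=0.02, sigma=1.0, n=n), 3))
```

With no bias the rejection probability stays at the nominal 5% for every $n$; any fixed bias pushes it toward one as $n$ grows.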
 
 
Figure 3.A1. Monte Carlo Results: $\kappa = 0.5$ and 2, $\phi$ ranges from 0 to 1, $\rho = 0.5$, $NT = 50{,}000$
  
Figure 3.A2. Monte Carlo Results: $\kappa = 0.5$ and 2, $\rho$ ranges from 0 to 1, $\phi = 0.5$, $NT = 50{,}000$
  
Figure 3.A3. Monte Carlo Results: $\kappa = 0.5$ and 2, $\phi$ ranges from 0 to 1, $NT = 50{,}000$; Lagged Causality on Explained Variable
  
Figure 3.A4. Monte Carlo Results: $\kappa = 0.5$ and 2, $\rho$ ranges from 0 to 1, $NT = 50{,}000$; Lagged Causality on Explained Variable
 
Figure 3.A5. Monte Carlo Results: $\kappa = 0.5$ and 2, $\phi$ ranges from 0 to 1, $NT = 50{,}000$; Lagged Causality on Unobserved Confounder
  
Figure 3.A6. Monte Carlo Results: $\kappa = 0.5$ and 2, $\rho$ ranges from 0 to 1, $NT = 50{,}000$; Lagged Causality on Unobserved Confounder
 
Appendices of Spatially Lagged Variables as Instruments: 
Spatially Local Average Treatment Effect (SLATE) in 
Estimation 
 
A. Derivations of Simpler Cases of PROPOSITION 1 and PROPOSITION 2 
 
    Proposition 1 states that when there is no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the unobserved confounders, the spatially lagged IV estimate is unbiased and consistent. The proof is simpler in the special case considered here, in which $\mu_{ij} > 0$ when observations $i$ and $j$ are neighbors and $\mu_{ij} = 0$ otherwise.
 
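Proof of PROPOSITION 1: Using $\mathbf{W}\mathbf{X}$, the explanatory variables weighted by the spatial weighting matrix $\mathbf{W}$, as the instrumental variables, namely the spatially lagged IV, and with a symmetric $\mathbf{W}$ so that $(\mathbf{W}\mathbf{X})' = \mathbf{X}'\mathbf{W}$, we obtain
$$
\begin{aligned}
\hat{\boldsymbol{\beta}}_{IV} &= [(\mathbf{W}\mathbf{X})'\mathbf{X}]^{-1}(\mathbf{W}\mathbf{X})'\mathbf{Y} = (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\mathbf{Y}\\
&= (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\big[\mathbf{X}\boldsymbol{\beta} + \mathbf{U}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\boldsymbol{\gamma} + \boldsymbol{\epsilon}\big]\\
&= \boldsymbol{\beta} + (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{U}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\mathbf{X}'\mathbf{W}\boldsymbol{\gamma}
\end{aligned}
$$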
    By assumption, $\mu_{ij} > 0$ when observations $i$ and $j$ are neighbors, and $\mu_{ij} = 0$ otherwise. Write $i = \tilde{i}$ and $j = \tilde{j}$ when observations $i$ and $j$ are not neighbors, so that $\mu_{\tilde{i}\tilde{j}} = 0$.
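    Given Assumption 1, which states that $E(x_{ik}\gamma_j) = 0$ when $i \neq j$, we have
$$
\begin{aligned}
E(\mathbf{X}'\mathbf{W}\boldsymbol{\gamma}) &= E\left(
\begin{bmatrix}
x_{11} & \cdots & x_{i1} & \cdots & x_{N1}\\
\vdots & & \vdots & & \vdots\\
x_{1k} & \cdots & x_{ik} & \cdots & x_{Nk}\\
\vdots & & \vdots & & \vdots\\
x_{1K} & \cdots & x_{iK} & \cdots & x_{NK}
\end{bmatrix}
\begin{bmatrix}
\mu_{11} & \cdots & \mu_{1j} & \cdots & \mu_{1N}\\
\vdots & & \vdots & & \vdots\\
\mu_{i1} & \cdots & \mu_{ij} & \cdots & \mu_{iN}\\
\vdots & & \vdots & & \vdots\\
\mu_{N1} & \cdots & \mu_{Nj} & \cdots & \mu_{NN}
\end{bmatrix}
\begin{bmatrix}
\gamma_1\\ \vdots\\ \gamma_j\\ \vdots\\ \gamma_N
\end{bmatrix}
\right)\\
&= \sum_{k=1}^{K}\sum_{j=1}^{N}\sum_{i=1}^{N} E(x_{ik}\mu_{ij}\gamma_j)\\
&= \sum_{k=1}^{K}\sum_{j=1}^{N}\sum_{\substack{i=1\\ i=j}}^{N} E(x_{ik}\mu_{ij}\gamma_j)
+ \sum_{k=1}^{K}\sum_{j=1}^{N}\sum_{\substack{i=1,\ i\neq j\\ i\neq\tilde{i},\ j\neq\tilde{j}}}^{N} E(x_{ik}\mu_{ij}\gamma_j)
+ \sum_{k=1}^{K}\sum_{j=1}^{N}\sum_{\substack{i=1,\ i\neq j\\ i=\tilde{i},\ j=\tilde{j}}}^{N} E(x_{ik}\mu_{ij}\gamma_j)\\
&= \sum_{k=1}^{K}\sum_{j=1}^{N}\sum_{\substack{i=1\\ i=j}}^{N} \mu_{ij}E(x_{ik}\gamma_j)
+ \sum_{k=1}^{K}\sum_{j=1}^{N}\sum_{\substack{i=1,\ i\neq j\\ i\neq\tilde{i},\ j\neq\tilde{j}}}^{N} \mu_{ij}E(x_{ik}\gamma_j)
+ \sum_{k=1}^{K}\sum_{j=1}^{N}\sum_{\substack{i=1,\ i\neq j\\ i=\tilde{i},\ j=\tilde{j}}}^{N} \mu_{ij}E(x_{ik}\gamma_j)
\end{aligned}
$$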
    As assumed, $\mu_{ij} = 0$ when $i = j$; therefore, $\sum_{k=1}^{K}\sum_{j=1}^{N}\sum_{i=1,\,i=j}^{N}\mu_{ij}E(x_{ik}\gamma_j) = 0$. Also by assumption, $\mu_{ij} = 0$ when observations $i$ and $j$ are not neighbors; therefore, $\sum_{k=1}^{K}\sum_{j=1}^{N}\sum_{i=1,\,i\neq j,\,i=\tilde{i},\,j=\tilde{j}}^{N}\mu_{ij}E(x_{ik}\gamma_j) = 0$. In addition, as Assumption 1 states, $E(x_{ik}\gamma_j) = 0$ when $i \neq j$; therefore, $\sum_{k=1}^{K}\sum_{j=1}^{N}\sum_{i=1,\,i\neq j,\,i\neq\tilde{i},\,j\neq\tilde{j}}^{N}\mu_{ij}E(x_{ik}\gamma_j) = 0$. Accordingly, $E(\mathbf{X}'\mathbf{W}\boldsymbol{\gamma}) = \mathbf{0}$, and thus $E(\hat{\boldsymbol{\beta}}_{IV}) = \boldsymbol{\beta}$.
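    Similarly, $\mathop{\mathrm{plim}}_{n\to\infty}\frac{1}{n}\mathbf{X}'\mathbf{W}\boldsymbol{\gamma} = \mathbf{0}$. Therefore, $\mathop{\mathrm{plim}}_{n\to\infty}\hat{\boldsymbol{\beta}}_{IV} = \boldsymbol{\beta}$; in other words, the spatially lagged IV estimate is unbiased and consistent.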
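The estimator in the first line of the proof can be computed directly. The sketch below uses plain Python lists, an intercept plus one regressor ($K = 2$), and a hypothetical five-unit contiguity matrix with $\mu_{ij} = 1$ for adjacent units. With a noise-free outcome $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta}$ it recovers $\boldsymbol{\beta}$, which confirms only the algebra of $[(\mathbf{W}\mathbf{X})'\mathbf{X}]^{-1}(\mathbf{W}\mathbf{X})'\mathbf{Y}$, not the unbiasedness argument. Forming $(\mathbf{W}\mathbf{X})'$ directly also covers an asymmetric $\mathbf{W}$, as in Corollaries 3 and 4. All names and data here are illustrative.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][m] * B[m][j] for m in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("(WX)'X is singular: the spatially lagged instruments are not relevant")
    return [[d / det, -b / det], [-c / det, a / det]]


def spatially_lagged_iv(W, X, Y):
    """beta_IV = [(WX)'X]^{-1} (WX)'Y for an N x 2 regressor matrix X."""
    Z = matmul(W, X)   # instruments: the spatially lagged regressors WX
    Zt = transpose(Z)  # (WX)' = X'W', which equals X'W only when W is symmetric
    return matmul(inverse_2x2(matmul(Zt, X)), matmul(Zt, Y))


# Hypothetical example: five units on a line, mu_ij = 1 for adjacent units, 0 otherwise
W = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(5)] for i in range(5)]
X = [[1.0, 0.5], [1.0, 1.5], [1.0, -0.3], [1.0, 2.0], [1.0, 0.7]]  # intercept, regressor
beta = [[1.0], [2.0]]
Y = matmul(X, beta)  # noise-free outcome, so the estimate should equal beta up to rounding
print(spatially_lagged_iv(W, X, Y))
```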
 
    Proposition 2 states that when there is no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the unobserved confounders, and likewise no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the explanatory variables themselves, the spatially lagged IV estimate is unbiased and consistent. As with Proposition 1, the proof is simpler in the special case where $\mu_{ij} > 0$ when observations $i$ and $j$ are neighbors and $\mu_{ij} = 0$ otherwise.
 
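Proof of PROPOSITION 2: Using $\mathbf{W}\mathbf{X}$, the explanatory variables weighted by the spatial weighting matrix $\mathbf{W}$, as the instrumental variables, namely the spatially lagged IV, we obtain
$$
\begin{aligned}
\hat{\boldsymbol{\beta}}_{IV} &= [(\mathbf{W}\mathbf{X})'\mathbf{X}]^{-1}(\mathbf{W}\mathbf{X})'\mathbf{Y} = (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\mathbf{Y}\\
&= (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\big[\mathbf{X}\boldsymbol{\beta} + \mathbf{U}[(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}(\boldsymbol{\varphi}\mathbf{X} + \boldsymbol{\gamma})] + \boldsymbol{\epsilon}\big]\\
&= \boldsymbol{\beta} + (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{U}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\mathbf{X}'\mathbf{W}\big[\boldsymbol{\varphi}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\boldsymbol{\eta} + \boldsymbol{\gamma}\big]\\
&= \boldsymbol{\beta} + (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{U}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\mathbf{X}'\mathbf{W}\boldsymbol{\gamma}\\
&\quad + (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{U}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\boldsymbol{\varphi}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\mathbf{X}'\mathbf{W}\boldsymbol{\eta}
\end{aligned}
$$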
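    As discussed above, because Assumption 1 implies that $E(x_{ik}\gamma_j) = 0$ when $i \neq j$, we have $E(\mathbf{X}'\mathbf{W}\boldsymbol{\gamma}) = \mathbf{0}$ and $\mathop{\mathrm{plim}}_{n\to\infty}\frac{1}{n}\mathbf{X}'\mathbf{W}\boldsymbol{\gamma} = \mathbf{0}$.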
    By assumption, $\mu_{ij} > 0$ when observations $i$ and $j$ are neighbors, and $\mu_{ij} = 0$ otherwise. Write $i = \tilde{i}$ and $j = \tilde{j}$ when observations $i$ and $j$ are not neighbors, so that $\mu_{\tilde{i}\tilde{j}} = 0$.
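    Similarly, given Assumption 2, which states that $E(x_{ik}\eta_j) = 0$ when $i \neq j$, we have
$$
\begin{aligned}
E(\mathbf{X}'\mathbf{W}\boldsymbol{\eta}) &= E\left(
\begin{bmatrix}
x_{11} & \cdots & x_{i1} & \cdots & x_{N1}\\
\vdots & & \vdots & & \vdots\\
x_{1k} & \cdots & x_{ik} & \cdots & x_{Nk}\\
\vdots & & \vdots & & \vdots\\
x_{1K} & \cdots & x_{iK} & \cdots & x_{NK}
\end{bmatrix}
\begin{bmatrix}
\mu_{11} & \cdots & \mu_{1j} & \cdots & \mu_{1N}\\
\vdots & & \vdots & & \vdots\\
\mu_{i1} & \cdots & \mu_{ij} & \cdots & \mu_{iN}\\
\vdots & & \vdots & & \vdots\\
\mu_{N1} & \cdots & \mu_{Nj} & \cdots & \mu_{NN}
\end{bmatrix}
\begin{bmatrix}
\eta_1\\ \vdots\\ \eta_j\\ \vdots\\ \eta_N
\end{bmatrix}
\right)\\
&= \sum_{k=1}^{K}\sum_{j=1}^{N}\sum_{i=1}^{N} E(x_{ik}\mu_{ij}\eta_j)\\
&= \sum_{k=1}^{K}\sum_{j=1}^{N}\sum_{\substack{i=1\\ i=j}}^{N} E(x_{ik}\mu_{ij}\eta_j)
+ \sum_{k=1}^{K}\sum_{j=1}^{N}\sum_{\substack{i=1,\ i\neq j\\ i\neq\tilde{i},\ j\neq\tilde{j}}}^{N} E(x_{ik}\mu_{ij}\eta_j)
+ \sum_{k=1}^{K}\sum_{j=1}^{N}\sum_{\substack{i=1,\ i\neq j\\ i=\tilde{i},\ j=\tilde{j}}}^{N} E(x_{ik}\mu_{ij}\eta_j)\\
&= \sum_{k=1}^{K}\sum_{j=1}^{N}\sum_{\substack{i=1\\ i=j}}^{N} \mu_{ij}E(x_{ik}\eta_j)
+ \sum_{k=1}^{K}\sum_{j=1}^{N}\sum_{\substack{i=1,\ i\neq j\\ i\neq\tilde{i},\ j\neq\tilde{j}}}^{N} \mu_{ij}E(x_{ik}\eta_j)
+ \sum_{k=1}^{K}\sum_{j=1}^{N}\sum_{\substack{i=1,\ i\neq j\\ i=\tilde{i},\ j=\tilde{j}}}^{N} \mu_{ij}E(x_{ik}\eta_j)
\end{aligned}
$$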
    As assumed, $\mu_{ij} = 0$ when $i = j$; therefore, $\sum_{k=1}^{K}\sum_{j=1}^{N}\sum_{i=1,\,i=j}^{N}\mu_{ij}E(x_{ik}\eta_j) = 0$. Also by assumption, $\mu_{ij} = 0$ when observations $i$ and $j$ are not neighbors; therefore, $\sum_{k=1}^{K}\sum_{j=1}^{N}\sum_{i=1,\,i\neq j,\,i=\tilde{i},\,j=\tilde{j}}^{N}\mu_{ij}E(x_{ik}\eta_j) = 0$. In addition, as Assumption 2 states, $E(x_{ik}\eta_j) = 0$ when $i \neq j$; therefore, $\sum_{k=1}^{K}\sum_{j=1}^{N}\sum_{i=1,\,i\neq j,\,i\neq\tilde{i},\,j\neq\tilde{j}}^{N}\mu_{ij}E(x_{ik}\eta_j) = 0$. Accordingly, $E(\mathbf{X}'\mathbf{W}\boldsymbol{\eta}) = \mathbf{0}$, and thus, together with $E(\mathbf{X}'\mathbf{W}\boldsymbol{\gamma}) = \mathbf{0}$, $E(\hat{\boldsymbol{\beta}}_{IV}) = \boldsymbol{\beta}$.
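    A similar derivation shows that $\mathop{\mathrm{plim}}_{n\to\infty}\frac{1}{n}\mathbf{X}'\mathbf{W}\boldsymbol{\eta} = \mathbf{0}$. Therefore, $\mathop{\mathrm{plim}}_{n\to\infty}\hat{\boldsymbol{\beta}}_{IV} = \boldsymbol{\beta}$; in other words, the spatially lagged IV estimate is unbiased and consistent.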
 
B. Proof of COROLLARY 3 and COROLLARY 4 
 
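Proof of COROLLARY 3: Given the three waves of implementation of the treatment, the spatial weighting matrix is asymmetric, that is, $\mathbf{W} \neq \mathbf{W}'$. Using the spatially lagged IV method, we obtain
$$
\begin{aligned}
\hat{\boldsymbol{\beta}}_{IV} &= [(\mathbf{W}\mathbf{X})'\mathbf{X}]^{-1}(\mathbf{W}\mathbf{X})'\mathbf{Y} = (\mathbf{X}'\mathbf{W}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}'\mathbf{Y}\\
&= (\mathbf{X}'\mathbf{W}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}'\big[\mathbf{X}\boldsymbol{\beta} + \mathbf{U}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\boldsymbol{\gamma} + \boldsymbol{\epsilon}\big]\\
&= \boldsymbol{\beta} + (\mathbf{X}'\mathbf{W}'\mathbf{X})^{-1}\mathbf{U}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\mathbf{X}'\mathbf{W}'\boldsymbol{\gamma}
\end{aligned}
$$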
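    Given Assumption 1, which states that $E(x_{ik}\gamma_j) = 0$ when $i \neq j$, and noting that the $(i,j)$ element of $\mathbf{W}'$ is $\mu_{ji}$, we have
$$
\begin{aligned}
E(\mathbf{X}'\mathbf{W}'\boldsymbol{\gamma}) &= E\left(
\begin{bmatrix}
x_{11} & \cdots & x_{i1} & \cdots & x_{N1}\\
\vdots & & \vdots & & \vdots\\
x_{1k} & \cdots & x_{ik} & \cdots & x_{Nk}\\
\vdots & & \vdots & & \vdots\\
x_{1K} & \cdots & x_{iK} & \cdots & x_{NK}
\end{bmatrix}
\begin{bmatrix}
\mu_{11} & \cdots & \mu_{i1} & \cdots & \mu_{N1}\\
\vdots & & \vdots & & \vdots\\
\mu_{1j} & \cdots & \mu_{ij} & \cdots & \mu_{Nj}\\
\vdots & & \vdots & & \vdots\\
\mu_{1N} & \cdots & \mu_{iN} & \cdots & \mu_{NN}
\end{bmatrix}
\begin{bmatrix}
\gamma_1\\ \vdots\\ \gamma_j\\ \vdots\\ \gamma_N
\end{bmatrix}
\right)\\
&= \sum_{k=1}^{K}\sum_{j=1}^{N}\sum_{i=1}^{N} E(x_{ik}\mu_{ji}\gamma_j)\\
&= \sum_{k=1}^{K}\sum_{j=V+1}^{S-1}\sum_{\substack{i=V+1\\ i=j}}^{S-1} E(x_{ik}\mu_{ji}\gamma_j)
+ \sum_{k=1}^{K}\sum_{j=V+1}^{S-1}\sum_{\substack{i=V+1\\ i\neq j}}^{S-1} E(x_{ik}\mu_{ji}\gamma_j)
+ \sum_{k=1}^{K}\sum_{j=1}^{V}\sum_{\substack{i=1\\ i\neq j}}^{V} E(x_{ik}\mu_{ji}\gamma_j)
+ \sum_{k=1}^{K}\sum_{j=S}^{N}\sum_{\substack{i=S\\ i\neq j}}^{N} E(x_{ik}\mu_{ji}\gamma_j)\\
&= \sum_{k=1}^{K}\sum_{j=V+1}^{S-1}\sum_{\substack{i=V+1\\ i=j}}^{S-1} \mu_{ji}E(x_{ik}\gamma_j)
+ \sum_{k=1}^{K}\sum_{j=V+1}^{S-1}\sum_{\substack{i=V+1\\ i\neq j}}^{S-1} \mu_{ji}E(x_{ik}\gamma_j)
+ \sum_{k=1}^{K}\sum_{j=1}^{V}\sum_{\substack{i=1\\ i\neq j}}^{V} \mu_{ji}E(x_{ik}\gamma_j)
+ \sum_{k=1}^{K}\sum_{j=S}^{N}\sum_{\substack{i=S\\ i\neq j}}^{N} \mu_{ji}E(x_{ik}\gamma_j)
\end{aligned}
$$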
    As assumed, $\mu_{ji} = 0$ when $i = j$; therefore, $\sum_{k=1}^{K}\sum_{j=V+1}^{S-1}\sum_{i=V+1,\,i=j}^{S-1}\mu_{ji}E(x_{ik}\gamma_j) = 0$. In addition, as Assumption 1 states, $E(x_{ik}\gamma_j) = 0$ when $i \neq j$; therefore, $\sum_{k=1}^{K}\sum_{j=V+1}^{S-1}\sum_{i=V+1,\,i\neq j}^{S-1}\mu_{ji}E(x_{ik}\gamma_j) = 0$, $\sum_{k=1}^{K}\sum_{j=1}^{V}\sum_{i=1,\,i\neq j}^{V}\mu_{ji}E(x_{ik}\gamma_j) = 0$, and $\sum_{k=1}^{K}\sum_{j=S}^{N}\sum_{i=S,\,i\neq j}^{N}\mu_{ji}E(x_{ik}\gamma_j) = 0$. Accordingly, $E(\mathbf{X}'\mathbf{W}'\boldsymbol{\gamma}) = \mathbf{0}$, and thus $E(\hat{\boldsymbol{\beta}}_{IV}) = \boldsymbol{\beta}$.
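    Similarly, $\mathop{\mathrm{plim}}_{n\to\infty}\frac{1}{n}\mathbf{X}'\mathbf{W}'\boldsymbol{\gamma} = \mathbf{0}$. Therefore, $\mathop{\mathrm{plim}}_{n\to\infty}\hat{\boldsymbol{\beta}}_{IV} = \boldsymbol{\beta}$; in other words, the spatially lagged IV estimate is unbiased and consistent. ∎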
 
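Proof of COROLLARY 4: Given the three waves of implementation of the treatment, the spatial weighting matrix is asymmetric, that is, $\mathbf{W} \neq \mathbf{W}'$. Using the spatially lagged IV method, and since $(\mathbf{W}\mathbf{X})' = \mathbf{X}'\mathbf{W}'$, we obtain
$$
\begin{aligned}
\hat{\boldsymbol{\beta}}_{IV} &= [(\mathbf{W}\mathbf{X})'\mathbf{X}]^{-1}(\mathbf{W}\mathbf{X})'\mathbf{Y} = (\mathbf{X}'\mathbf{W}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}'\mathbf{Y}\\
&= (\mathbf{X}'\mathbf{W}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}'\big[\mathbf{X}\boldsymbol{\beta} + \mathbf{U}[(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}(\boldsymbol{\varphi}\mathbf{X} + \boldsymbol{\gamma})] + \boldsymbol{\epsilon}\big]\\
&= \boldsymbol{\beta} + (\mathbf{X}'\mathbf{W}'\mathbf{X})^{-1}\mathbf{U}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\mathbf{X}'\mathbf{W}'\big[\boldsymbol{\varphi}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\boldsymbol{\eta} + \boldsymbol{\gamma}\big]\\
&= \boldsymbol{\beta} + (\mathbf{X}'\mathbf{W}'\mathbf{X})^{-1}\mathbf{U}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\mathbf{X}'\mathbf{W}'\boldsymbol{\gamma}\\
&\quad + (\mathbf{X}'\mathbf{W}'\mathbf{X})^{-1}\mathbf{U}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\boldsymbol{\varphi}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\mathbf{X}'\mathbf{W}'\boldsymbol{\eta}
\end{aligned}
$$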
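    As shown in the proof of Corollary 3, because Assumption 1 implies that $E(x_{ik}\gamma_j) = 0$ when $i \neq j$, we have $E(\mathbf{X}'\mathbf{W}'\boldsymbol{\gamma}) = \mathbf{0}$ and $\mathop{\mathrm{plim}}_{n\to\infty}\frac{1}{n}\mathbf{X}'\mathbf{W}'\boldsymbol{\gamma} = \mathbf{0}$.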
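    Similarly, given Assumption 2, which states that $E(x_{ik}\eta_j) = 0$ when $i \neq j$, and noting that the $(i,j)$ element of $\mathbf{W}'$ is $\mu_{ji}$, we have
$$
\begin{aligned}
E(\mathbf{X}'\mathbf{W}'\boldsymbol{\eta}) &= E\left(
\begin{bmatrix}
x_{11} & \cdots & x_{i1} & \cdots & x_{N1}\\
\vdots & & \vdots & & \vdots\\
x_{1k} & \cdots & x_{ik} & \cdots & x_{Nk}\\
\vdots & & \vdots & & \vdots\\
x_{1K} & \cdots & x_{iK} & \cdots & x_{NK}
\end{bmatrix}
\begin{bmatrix}
\mu_{11} & \cdots & \mu_{i1} & \cdots & \mu_{N1}\\
\vdots & & \vdots & & \vdots\\
\mu_{1j} & \cdots & \mu_{ij} & \cdots & \mu_{Nj}\\
\vdots & & \vdots & & \vdots\\
\mu_{1N} & \cdots & \mu_{iN} & \cdots & \mu_{NN}
\end{bmatrix}
\begin{bmatrix}
\eta_1\\ \vdots\\ \eta_j\\ \vdots\\ \eta_N
\end{bmatrix}
\right)\\
&= \sum_{k=1}^{K}\sum_{j=1}^{N}\sum_{i=1}^{N} E(x_{ik}\mu_{ji}\eta_j)\\
&= \sum_{k=1}^{K}\sum_{j=V+1}^{S-1}\sum_{\substack{i=V+1\\ i=j}}^{S-1} E(x_{ik}\mu_{ji}\eta_j)
+ \sum_{k=1}^{K}\sum_{j=V+1}^{S-1}\sum_{\substack{i=V+1\\ i\neq j}}^{S-1} E(x_{ik}\mu_{ji}\eta_j)
+ \sum_{k=1}^{K}\sum_{j=1}^{V}\sum_{\substack{i=1\\ i\neq j}}^{V} E(x_{ik}\mu_{ji}\eta_j)
+ \sum_{k=1}^{K}\sum_{j=S}^{N}\sum_{\substack{i=S\\ i\neq j}}^{N} E(x_{ik}\mu_{ji}\eta_j)\\
&= \sum_{k=1}^{K}\sum_{j=V+1}^{S-1}\sum_{\substack{i=V+1\\ i=j}}^{S-1} \mu_{ji}E(x_{ik}\eta_j)
+ \sum_{k=1}^{K}\sum_{j=V+1}^{S-1}\sum_{\substack{i=V+1\\ i\neq j}}^{S-1} \mu_{ji}E(x_{ik}\eta_j)
+ \sum_{k=1}^{K}\sum_{j=1}^{V}\sum_{\substack{i=1\\ i\neq j}}^{V} \mu_{ji}E(x_{ik}\eta_j)
+ \sum_{k=1}^{K}\sum_{j=S}^{N}\sum_{\substack{i=S\\ i\neq j}}^{N} \mu_{ji}E(x_{ik}\eta_j)
\end{aligned}
$$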
    As assumed, $\mu_{ji} = 0$ when $i = j$; therefore, $\sum_{k=1}^{K}\sum_{j=V+1}^{S-1}\sum_{i=V+1,\,i=j}^{S-1}\mu_{ji}E(x_{ik}\eta_j) = 0$. In addition, as Assumption 2 states, $E(x_{ik}\eta_j) = 0$ when $i \neq j$; therefore, $\sum_{k=1}^{K}\sum_{j=V+1}^{S-1}\sum_{i=V+1,\,i\neq j}^{S-1}\mu_{ji}E(x_{ik}\eta_j) = 0$, $\sum_{k=1}^{K}\sum_{j=1}^{V}\sum_{i=1,\,i\neq j}^{V}\mu_{ji}E(x_{ik}\eta_j) = 0$, and $\sum_{k=1}^{K}\sum_{j=S}^{N}\sum_{i=S,\,i\neq j}^{N}\mu_{ji}E(x_{ik}\eta_j) = 0$. Accordingly, $E(\mathbf{X}'\mathbf{W}'\boldsymbol{\eta}) = \mathbf{0}$, and thus $E(\hat{\boldsymbol{\beta}}_{IV}) = \boldsymbol{\beta}$.
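    Similarly, $\mathop{\mathrm{plim}}_{n\to\infty}\frac{1}{n}\mathbf{X}'\mathbf{W}'\boldsymbol{\eta} = \mathbf{0}$. Therefore, $\mathop{\mathrm{plim}}_{n\to\infty}\hat{\boldsymbol{\beta}}_{IV} = \boldsymbol{\beta}$; in other words, the spatially lagged IV estimate is unbiased and consistent. ∎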