Three Essays on Political Economy and the Methods

A DISSERTATION SUBMITTED TO THE FACULTY OF THE UNIVERSITY OF MINNESOTA BY

Yu Wang

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

Advisor: Marc F. Bellemare

April 2020

© Yu Wang 2020 ALL RIGHTS RESERVED

Acknowledgements

I would like to thank my Ph.D. advisor, Prof. Marc F. Bellemare, whose advising and instruction have influenced and benefited me tremendously. He has offered invaluable instruction and help on my job market paper, which was also my second-year paper. In co-authoring the second essay of my dissertation with me, he has offered insightful and critical ideas on how commonly used empirical strategies address identification problems such as endogeneity. He has also provided invaluable academic support so that I could have a good start in econometric theory. I deeply appreciate the summer research assistantship that he offered, his recommendations for my job market search and my graduate assistantship applications, and his suggestions on the job market, which allowed me to prepare as a job market candidate very early.

I’m deeply grateful to my Ph.D. dissertation committee members: Prof. Jay Coggins, Prof. John Freeman and Prof. Paul Glewwe. My heartfelt appreciation goes to their invaluable academic instruction on the three essays in my dissertation, their recommendations for my job market search and my graduate assistantship applications, and their suggestions on my job market. I also thank Prof. Steve Miller, Prof. Elton Mykerezi, Prof. Terry Roe and Prof. Sean Sylvia for invaluable comments and suggestions on my essays.

I’m also deeply grateful to my master’s advisor, Prof. Renfu Luo of Peking University. My sincere appreciation goes to his advising and instruction on my publication, my job market paper, and other papers in progress or planned.
I sincerely thank him for his insightful ideas on the Chinese economy. With his ideas, I hold strong confidence in Chinese economic development and am devoted to telling a good story about China in my research.

I would like to thank my supervisors at MPC, Dr. Tracy Kugler, Dr. David Van Riper and Dr. Jonathan Schroeder, for their instruction on data management. I would also like to thank the other professors who have supervised my graduate research and teaching assistantships: Prof. Ragui Assaad, Prof. Chengyan Yue and Prof. Jeff Apland.

I would like to thank our DGSs, Prof. Joe Ritter and Prof. Rodney Smith, our graduate coordinator, Jenna Mead and Gary Cooper, for their academic support and suggestions. I thank Elaine Reber for uploading recommendation letters for me, and Linda Eells for helping me look for literature.

I would like to thank all the professors who have taught me in courses in our department, as well as professors in other departments, especially Colleen Meyers for Practicum in University Teaching for Nonnative English Speakers (Grad 5105), Prof. Tim Kehoe for International Trade (Econ 8401) and Prof. Mikhail Safonov for Math Analysis (Math 5615H & 5616H).

I would like to thank Sebastian Anti, Haseed Ali, Yuan Chai, Xiangwen Kong, Qingxiao Li, Bixuan Sun, Huichun Sun, Berenger Djoumessi Tiague, Khoa Vu, Jingjing Wang, Yanghao Wang, Zhiyu Wang, Shuoli Zhao, and all other friends in our department. I would also like to thank my friends from other universities who have helped me: Yongdong Liu (Assistant Professor, UCL) and Liangjie Wu (Ph.D. Candidate, U Chicago).

I would like to thank Prof. Chris Blattman, Prof. Dave Donaldson, Prof. Christopher Neilson, Prof. Nathan Nunn, Prof. David Zilberman, and all other professors who have provided me with invaluable academic suggestions for my research.

My deepest appreciation and love go to my wife Fan, my daughter Grace, my son Vincent, and my parents.
Dedication

To my wife, Fan, and my two kids, Grace and Vincent.

Abstract

This dissertation consists of three essays on political economy and the theoretical discussion of two empirical methods.

Chapter 2 (Essay 1) discusses how the introduction of local direct elections, by providing local information, facilitates the meritocratic selection of local leaders. Using a Bayesian framework, this paper finds that because local residents, the voters, communicate with local leader candidates more often than upper officials do, local residents infer each candidate’s virtue or capacity more accurately and precisely. This paper then shows that, due to the higher accuracy, the expected competence (a weighted average of virtue and capacity) of the elected local leader is higher than that of the appointed leader; due to the higher precision, the variance of the competence of the elected local leader is lower than that of the appointed leader.

Chapter 3 (Essay 2) discusses the lagged IV method, namely using the lagged endogenous explanatory variable as its instrumental variable (IV). This paper starts with a conceptual framework and then conducts the numerical analysis. It shows that when the lagged IV violates only the independence assumption, the lagged IV estimate is consistent and has lower bias than the OLS estimate; however, when the lagged IV violates both the independence assumption and the exclusion restriction, the lagged IV estimate is inconsistent and has much higher bias than the OLS estimate. The simulation results support the numerical analysis.

Chapter 4 (Essay 3) discusses the spatially lagged IV method, namely using the spatially lagged endogenous explanatory variable (more standardly, the spatial weighting matrix) as its instrumental variable (IV).
This paper introduces the spatially local average treatment effect (SLATE) theorem, which consists of two key properties: the spatial independence assumption and the spatial exclusion restriction. This paper demonstrates that when the spatially lagged IV satisfies the spatial independence assumption and the spatial exclusion restriction, its estimate is unbiased and consistent. Even if the treatment has multiple waves of implementation, the spatially lagged IV is still valid.

Contents

Acknowledgements
Dedication
Abstract
Contents
List of Tables
List of Figures
1. Introduction
2. Local Direct Elections, Local Information, and Meritocratic Selection
2.1. Introduction
2.2. Local Governance in Rural China
2.3. Meritocratic Selection with the Improved Inference Effectiveness
2.4. Meritocratic Selection with the Improved Candidate Pool
2.5. Concluding Remarks
3. Lagged Variables as Instruments
3.1. Introduction
3.2. Theoretical Framework
3.3. Numerical Analysis
3.4. Simulation Analysis
3.5. Conclusion
4. Spatially Lagged Variables as Instruments: The Spatially Local Average Treatment Effect (SLATE) in Estimation
4.1. Introduction
4.2. Theoretical Framework
4.3. The Numerical Spatially Local Average Treatment Effects (SLATE)
4.4. The Dynamic Spatially Local Average Treatment Effects (SLATE)
4.5. Conclusion
References
Appendices for Local Direct Elections, Local Information, and Meritocratic Selection
Appendices of Lagged Variables as Instruments
Appendices of Spatially Lagged Variables as Instruments: Spatially Local Average Treatment Effect (SLATE) in Estimation

List of Tables

Table 3.1. Reviewed Journals Published in 2013-2018, Using Lagged IV Methods
Table 3.2. Simulation Parameters

List of Figures

Figure 2.1. Inference Accuracy and Precision with Natural Communication Times
Figure 3.1. Representation of Monte Carlo Simulation Setup
Figure 3.2. Monte Carlo Results: $\kappa$ = 0.5 and 2, $\phi$ ranges from 0 to 1, $\rho$ = 0.5
Figure 3.3. Monte Carlo Results: $\kappa$ = 0.5 and 2, $\rho$ ranges from 0 to 1, $\phi$ = 0.5
Figure 3.4. Representation of Monte Carlo Simulation Setup: $X_{t-1}$ Also Has Causal Effects on $Y_t$
Figure 3.5. Monte Carlo Results: $\kappa$ = 0.5 and 2, $\phi$ ranges from 0 to 1; Lagged Causality on Explained Variable
Figure 3.6. Monte Carlo Results: $\kappa$ = 0.5 and 2, $\rho$ ranges from 0 to 1; Lagged Causality on Explained Variable
Figure 3.7. Representation of Monte Carlo Simulation Setup: $X_{t-1}$ Also Has Causal Effects on $U_t$
Figure 3.8. Monte Carlo Results: $\kappa$ = 0.5 and 2, $\phi$ ranges from 0 to 1; Lagged Causality on Unobserved Confounder
Figure 3.9. Monte Carlo Results: $\kappa$ = 0.5 and 2, $\rho$ ranges from 0 to 1; Lagged Causality on Unobserved Confounder
Figure 3.A1. Monte Carlo Results: $\kappa$ = 0.5 and 2, $\phi$ ranges from 0 to 1, $\rho$ = 0.5, $NT$ = 50,000
Figure 3.A2. Monte Carlo Results: $\kappa$ = 0.5 and 2, $\rho$ ranges from 0 to 1, $\phi$ = 0.5, $NT$ = 50,000
Figure 3.A3. Monte Carlo Results: $\kappa$ = 0.5 and 2, $\phi$ ranges from 0 to 1, $NT$ = 50,000; Lagged Causality on Explained Variable
Figure 3.A4. Monte Carlo Results: $\kappa$ = 0.5 and 2, $\rho$ ranges from 0 to 1, $NT$ = 50,000; Lagged Causality on Explained Variable
Figure 3.A5. Monte Carlo Results: $\kappa$ = 0.5 and 2, $\phi$ ranges from 0 to 1, $NT$ = 50,000; Lagged Causality on Unobserved Confounder
Figure 3.A6.
Monte Carlo Results: $\kappa$ = 0.5 and 2, $\rho$ ranges from 0 to 1, $NT$ = 50,000; Lagged Causality on Unobserved Confounder

1. Introduction

Discussing political issues is usually indispensable in economic research, because as production scale and social complexity rise, it is impossible to ignore or underestimate the role that the government or other authorities play. One of the core issues in political economy is political selection, yet most previous studies emphasize accountability, including political incentives and monitoring. Although ex post accountability is of great importance, ex ante political selection, which aims at selecting the most competent politicians, also deserves sufficient academic attention. Therefore, to gain greater insight into achieving better governance, I choose to study political selection.

As for methodology, I noticed that the instrumental variable (IV) method is popular in applied social science studies. However, it is difficult to find a valid IV, so most researchers look for alternative IVs such as the lagged IV or the spatially lagged IV. Few theoretical studies, however, have discussed the validity of these IVs in detail. Chapters 3 and 4 therefore discuss these two methods theoretically and in detail.

Chapter 2: This paper studies the relationship between local direct elections and meritocratic selection through the mechanism of local information. This paper’s theoretical model, based on the Bayesian inference framework, shows that local direct elections in rural China facilitate the meritocratic selection of both village committee members and village party secretaries. Local direct elections transfer the authority for selecting village committee members from township officials to village residents.
Because village residents, compared to township officials, have an advantage in local information on village committee candidates, they infer those candidates’ virtue and capacity more accurately and precisely. The introduction of local direct elections, with such improved inference effectiveness, raises the expected competence of village committee members while reducing the variance of that competence. Further, some or all village committee members are also candidates for village party secretary. Therefore, with such improved candidate pools, the expected competence of village party secretaries also rises, yet the change in its variance is ambiguous.

Chapter 3: Lagged explanatory variables remain commonly used as instrumental variables (IVs) to address endogeneity concerns in empirical studies with observational data. Few theoretical studies, however, address whether “lagged IVs” mitigate endogeneity. We develop a structural model in which dynamics among the endogenous explanatory variable and the unobserved confounders cannot be ruled out, and examine the endogeneity of lagged IV estimates. We then use Monte Carlo simulations to illustrate our analytical findings. We show that, in the Local Average Treatment Effect framework, when the lagged IVs violate only the independence assumption, the lagged IV method mitigates the endogeneity problem by yielding consistent estimates whose biases are smaller than those of the OLS estimates. However, when the lagged IVs violate both the independence assumption and the exclusion restriction, the lagged IV method does not mitigate, and even aggravates, the endogeneity, yielding inconsistent estimates whose biases are much greater than those of the OLS estimates.
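To make the comparison between OLS and a lagged IV concrete, the following is a minimal sketch in Python. It is not the structural model or the simulation design of Chapter 3. The data-generating process, the function names `simulate` and `covariance`, and all parameter values are assumptions chosen for illustration; the symbols `phi`, `rho`, and `kappa` are used loosely here and need not match their roles in Chapter 3. In this assumed setup the lagged regressor is uncorrelated with the current confounder only when the confounder has no serial correlation (`phi = 0`).

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def simulate(phi, rho=0.5, kappa=1.0, delta=1.0, beta=1.0,
             periods=100_000, burn_in=500, seed=0):
    """Simulate one long series and return (ols_estimate, lagged_iv_estimate).

    Hypothetical DGP (an assumption of this sketch, not the source's model):
        U_t = phi * U_{t-1} + nu_t                  # unobserved confounder
        X_t = rho * X_{t-1} + kappa * U_t + eta_t   # endogenous regressor
        Y_t = beta * X_t + delta * U_t + eps_t      # outcome
    All shocks are independent standard normal draws.
    """
    rng = random.Random(seed)
    u_prev, x_prev = 0.0, 0.0
    xs, ys = [], []
    for t in range(periods + burn_in):
        u = phi * u_prev + rng.gauss(0.0, 1.0)
        x = rho * x_prev + kappa * u + rng.gauss(0.0, 1.0)
        y = beta * x + delta * u + rng.gauss(0.0, 1.0)
        if t >= burn_in:  # drop the burn-in so start values do not matter
            xs.append(x)
            ys.append(y)
        u_prev, x_prev = u, x

    # Pair each period with its own lag: (X_t, Y_t, X_{t-1}).
    x_now, y_now, x_lag = xs[1:], ys[1:], xs[:-1]
    ols = covariance(x_now, y_now) / covariance(x_now, x_now)
    # Simple IV estimator with X_{t-1} as the instrument for X_t.
    lagged_iv = covariance(x_lag, y_now) / covariance(x_lag, x_now)
    return ols, lagged_iv


if __name__ == "__main__":
    for phi in (0.0, 0.5):
        ols, iv = simulate(phi)
        print(f"phi={phi}: OLS bias={ols - 1.0:+.3f}, lagged IV bias={iv - 1.0:+.3f}")
```

Under these assumptions the lagged IV estimate should sit close to the true `beta` when `phi = 0` and drift away from it once `phi > 0`, while OLS is biased in both cases. A single long time series is used instead of a panel only to keep the sketch short.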
Chapter 4: Spatially lagged variables, or more standardly spatial weighting matrices, are commonly used as instrumental variables to address endogeneity in estimation, yet theoretical discussion of whether spatially lagged variables are valid instruments is lacking. In light of the Local Average Treatment Effects Theorem, this paper introduces the Spatially Local Average Treatment Effects (SLATE) Theorem to discuss such validity. This theorem consists of (1) the spatial independence assumption, that there is no inter-regional correlation between the endogenous explanatory variables and the disturbances in the spatial autocorrelation of either the unobserved confounders or the endogenous explanatory variables, and (2) the spatial exclusion restriction, that the spatially lagged IV has neither a direct nor an indirect causal impact on the explained variable. This paper’s theory shows that typical spatial weighting matrices serving as the spatially lagged IVs satisfy the spatial independence assumption, yielding unbiased and consistent estimates; however, if those matrices violate the spatial exclusion restriction, the estimates are biased and inconsistent. This paper also discusses the dynamic spatially local average treatment effect (SLATE) and shows that the spatially lagged IV method is acceptable even if the treatment involves multiple waves of implementation.

2. Local Direct Elections, Local Information, and Meritocratic Selection¹

YU WANG² AND RENFU LUO³

“When the Grand course was pursued, a public and common spirit ruled all under the sky; they chose men of talents, virtue, and ability; their words were sincere, and what they cultivated was harmony.” – Confucius (450 BC, translated by James Legge [1885]), Li Chi: Book of Rites.
“The aim of every political constitution, is or ought to be, first to obtain for rulers men who possess most wisdom to discern, and most virtue to pursue, the common good of society.” – Hamilton, Madison and Jay (1788 [2008]), The Federalist Papers.

2.1. Introduction

Meritocratic selection has been pursued around the world since ancient times. Since around 500 BC, Chinese politicians and philosophers have argued that those who govern should be selected by merit rather than inherited status (Sienkewicz, 2003). When the concept of meritocracy spread to Europe and the U.S., it was favored by philosophers (Kazin et al., 2009) and advocated in political statements (Hamilton, Madison and Jay, 2008).

¹ This research has been supported by funding from the National Natural Science Foundation of China (Grant No.71873008). The authors declare that they have no relevant or material financial interests that relate to the research described in this paper. We would like to thank Marc Bellemare, Loren Brandt, Jay Coggins, Paul Glewwe, John Freeman, Elton Mykerezi, Terry Roe, and Sean Sylvia for their valuable comments and suggestions. All errors are ours.
² Wang: Department of Applied Economics, University of Minnesota (email: wang5979@umn.edu).
³ Luo: China Center for Agricultural Policies, School of Advanced Agricultural Sciences, Peking University (email: luorf.ccap@pku.edu.cn).

China has developed a series of top-down political selection schemes that emphasize the assessment, recommendation, and promotion of politicians based on their virtue and capacity, aiming at ex-ante meritocratic selection⁴. However, these top-down political selection schemes may suffer from adverse selection, thus their effects on the meritocratic selection of politicians could be limited. In contrast to China’s top-down schemes, elections (bottom-up political selection schemes) were established in ancient Western regimes and gradually prevailed⁵.
However, most previous studies emphasize elections’ role in addressing moral hazard, that is, facilitating the ex-post accountability of politicians (Laffont, 2001; Besley, 2005). This paper studies how local direct elections address adverse selection and facilitate the meritocratic selection of politicians by providing local information.

⁴ Around 134 BC, an assessment and recommendation system for noble families was established (Qian, 2012). Following the expansion of enfranchisement, the civil examination system of scholars was developed around 605 AD. This system prevailed for more than 1,000 years and greatly influenced the political selection schemes of China and other countries (Elman, 2013; Bai and Jia, 2016; Bell, 2016).
⁵ Around 508 BC, Athenian democracy was established. Through the expansion of enfranchisement, this electoral system has evolved into the modern representative democracy (Loeper, 2017).

Chinese local governance is a typical political selection context characterized by a stratified governance structure with several small-scale organizations at the grassroots level. The introduction of local direct elections to Chinese local governance enables an institutional comparison to identify the relationship between, and the mechanism of, local direct elections and meritocratic selection. Specifically, after local direct elections were introduced, village leaders (the small organizations’ executive leaders) and other village committee members⁶ were no longer appointed by township officials, but directly elected by village residents. We build a theoretical model showing that local direct elections facilitate the meritocratic selection of all village committee members by providing more local information on the virtue and capacity of village committee candidates. Village party secretaries (the small organizations’ highest leaders), who supervise the village committees, are still appointed by township officials. Even so, our theoretical model shows that local direct elections of village committee members facilitate the meritocratic selection of village party secretaries because some or all village committee members are candidates and are thus likely to be appointed village party secretaries by township officials.

⁶ In the following, village committee members include village leaders, who are chairs of village committees, and other village committee members.

The essential mechanism through which local direct elections work in the meritocratic selection of politicians is local voters’ advantage in local information on political candidates. Our model uses the Bayesian inference framework to frame this. This differs from studies that focus on strategic behavior between politicians and voters⁷ and from studies that use a game-theoretic framework. The introduction of local direct elections allows village residents rather than township officials to select village committee members. Village residents naturally communicate with village committee candidates more often than township officials do, implying that village residents have an advantage in obtaining local information about these candidates (Ghatak, 1999; Bell, 2016). Therefore, village residents can infer the virtue and capacity of village committee candidates more accurately and precisely than township officials can.⁸ In other words, the inference effectiveness is improved by local direct elections.

Due to village residents’ advantage in local information, local direct elections that empower them address adverse selection by facilitating the meritocratic selection of village committees. Our theoretical model proves that, with more accurate inferences, in a representative village the expected competence (a weighted average of virtue and capacity) of each elected village committee member is greater than that of each appointed village committee member.
Our model also proves that, with more precise inferences, the variance in the competence of each elected village committee member is smaller than that of each appointed village committee member. These theoretical findings of improved inference effectiveness are in line with Hayek (1945) and Chan (2013), who found that assessment and decisions should be left to people with local information advantages.

⁷ See Laffont (2001) and Besley (2006) for theoretical demonstrations, Ferraz and Finan (2011) and De Janvry et al. (2012) for empirical evidence, and Bell (2016) for demonstrations in political science.
⁸ In the Bayesian inference, “more accurately” means that the posterior mean of virtue (or capacity) is closer to the real value of virtue (or capacity), whereas “more precisely” means that the posterior variance of virtue (or capacity) becomes smaller. These are discussed in detail in Section 2.3.

By aggregating the local information on political candidates, local direct elections further facilitate the meritocratic selection of superior politicians, who are promoted by appointment based on their performance, in a stratified governance structure. More specifically, our model shows that by improving the candidate pools of village party secretaries, local direct elections facilitate the meritocratic selection of village party secretaries, who are the highest village officials and the chairs of village party branches. Some or all village committee members, including village leaders, are also village party branch members and are therefore candidates to become village party secretaries (O’Brien and Li, 2000). Upon observing their performance in village affairs, township officials are likely to appoint village party branch members as village party secretaries. In a representative village, local direct election enhances the expected competence of each village committee member. In other words, the candidate pool for the village party secretary is improved.
Our model shows that the expected competence of the village party secretary also increases, yet the change in the variance of the competence of the village party secretary is ambiguous. These theoretical findings of improved candidate pools imply that the local information provided by local direct elections benefits bottom-up local political selection directly, yet it benefits top-down political selection at higher rungs of the governance ladder only indirectly and to a limited extent.

The remainder of this paper is organized as follows. Section 2.2 introduces the institutional background. Section 2.3 develops the theory of meritocratic selection with improved inference effectiveness. Section 2.4 develops the theory of meritocratic selection with improved candidate pools. Section 2.5 concludes the paper.

2.2. Local Governance in Rural China

The administrative organizations of Chinese villages consist of two committees. Village committees, which are de facto government entities at the village level, are chaired by village leaders and composed of other members. Village party branches, which represent the village-level leadership of the Chinese Communist Party (CCP), are chaired by village party secretaries and composed of other members. In practice, some or all village committee members, especially village leaders, are also members of village party branches. Likewise, some or all village party branch members are also members of village committees. Village party branches oversee village committees; thus, village party secretaries are superior to village leaders in the governance hierarchy, as stipulated by the Organic Law of Village Committees (OLVC) (National People’s Congress of China, 1998) and the Working Regulation for Rural Grassroots Organizations of the Chinese Communist Party (Central Committee of the Chinese Communist Party, 1999).
Village committees are responsible for providing village infrastructure and public services, developing the local economy, and improving village residents’ income (National People’s Congress of China, 1998; Martinez-Bravo et al., 2011). The role of village party branches in the development of the local economy is to approve village committees’ plans and to monitor their implementation (Oi and Rozelle, 2000).

The likelihood of being selected as either a village committee member or a village party branch member is positively associated with both the candidate’s virtue and her capacity (Bell, 2016; Tang, 2016); this is rooted in the concept and practice of meritocratic selection in Chinese history (Zhang, 2012). The selection of village committee members, including village leaders, requires candidates to be law-abiding, have moral integrity, be intrinsically motivated to serve village residents, and have a diploma and administrative capacity (National People’s Congress of China, 1998). The selection of village party branch members, including village party secretaries, requires candidates to have professional knowledge and skills, to be responsive to the needs and demands of the village residents, and to be intrinsically motivated to serve them (Central Committee of the Chinese Communist Party, 1999).

The selection scheme for village leaders and other village committee members has changed: previously, they were appointed by township officials, but they are now directly elected by village residents. The establishment of local direct elections in rural China has been a gradual process (Martinez-Bravo et al., 2014).⁹ By 2010, most villages had introduced local direct elections for village leaders and other village committee members (Padró i Miquel et al., 2015; Wong et al., 2017). In contrast, village party secretaries and other village party branch members are still appointed by township officials (Central Committee of the Chinese Communist Party, 1999).
The affairs of village party branches are comprehensively supervised by the township officials of the township party branches.¹⁰ By observing the performance of each village party branch candidate (village committee members and other village residents, both party members and non-party members), township officials decide whether to appoint them as village party branch members.¹¹ If a village party branch candidate is not yet a party member, township officials can decide to make her a party member first and then appoint her as a village party branch member.

⁹ In 1987, the National People’s Congress of China passed the OLVC, which stipulated that village leaders and other village committee members were to be elected. In 1998, the National People’s Congress of China passed a revised OLVC that introduced local direct elections for village leaders and other village committee members in rural China, resulting in the election of village leaders and other village committee members through open nomination and competitive elections (O’Brien and Han, 2009). After the national legislation in 1998, each province in China introduced its own Provincial Measures for Implementing the Organic Law of Village Committees to provide additional instructions on the implementation of local direct elections (O’Brien and Zhao, 2014). Counties and townships followed (Wong et al., 2017).

¹⁰ Township officials’ supervision duties include, but are not limited to, deciding whether to set up a village party branch, how to select party members in each village, how to appoint village party branch members, and how to appoint one village party branch member as village party secretary. Although elections are conducted among all village party members to select village party branch members, election procedures, including the nomination process and the stipulation of election standards, are directly led by township officials (Central Committee of the Chinese Communist Party, 1999).

¹¹ Although it is stipulated that village party branch members are elected by village party members, the whole election procedure is directly led by township officials (Central Committee of the Chinese Communist Party, 1999). Therefore, village party branch members are regarded as appointed by township officials (O’Brien and Li, 2000).

2.3. Meritocratic Selection with the Improved Inference Effectiveness

Our theoretical model, based on the Bayesian inference framework, investigates how the introduction of local direct elections in a representative village facilitates the meritocratic selection of village committee members. The mechanism is that, because local direct elections provide more local information on each village committee candidate, inferences about the virtue and capacity of each candidate become more effective.

A. Inferences on Village Committee Candidates

In this section, we use the Bayesian inference framework to discuss assumptions about the virtue and capacity of village committee candidates in a representative village. Because the village leader is the chair of the village committee, we only discuss the village leader and consider her the representative of all village committee members. Thus, our theoretical findings in this section also apply to other village committee members.

We find that because the representative village resident has an advantage in terms of local information about village leader candidates (that is, she naturally communicates with the village leader candidates more often than the representative township official does), her inferences of the virtue and capacity of these candidates are more accurate and precise. By more accurate we mean that the posterior mean of the virtue or capacity gets closer to its real value, and by more precise we mean that the posterior variance of the virtue or capacity gets smaller. In short, the inference effectiveness is improved.

1. Setup: We consider a representative village in which all adult village residents are potential village leader candidates. Each resident has two personal characteristics, virtue and capacity,¹² both of which are assumed to be independent and identically distributed on $[0, 1]$ with a mean of 0.5.¹³ After the introduction of local direct elections in this representative village, a pool of village leader candidates, a subset of all potential candidates, competes to be elected village leader by the village residents. Before this introduction, the pool of village leader candidates competed to be appointed village leader by the township officials. The virtue of village leader candidate $i$ is denoted by $\alpha_i$, with $\alpha_i \in [0, 1]$, and her capacity is denoted by $\theta_i$, with $\theta_i \in [0, 1]$, where $i \in \{1, 2, \dots\}$. In the following analysis, we discuss one representative village resident or township official instead of village residents or township officials, assuming that village residents or township officials have homogenous inferences of each candidate for the village committee, including the village leader.¹⁴

¹² Virtue refers to characteristics including, but not limited to, being law-abiding, having moral integrity, and being intrinsically motivated to serve the village residents (Central Committee of the Chinese Communist Party, 1999; Dal Bó et al., 2017; National People’s Congress of China, 1998). Capacity refers to characteristics including, but not limited to, having professional knowledge and administrative skills (Alesina and Tabellini, 2007; Central Committee of the Chinese Communist Party, 1999; Dal Bó et al., 2017; National People’s Congress of China, 1998).

¹³ The virtue and capacity of village residents are both assumed to be bounded because (1) the number of residents in a village, a local area, is usually small (Liu et al., 2009; Martinez-Bravo et al., 2014) and (2) their socio-economic characteristics, thinking patterns, and behaviors tend to be homogenous due to homogenous socio-economic, cultural, and institutional constraints, and, in the long term, generation-by-generation interactions in local areas (Bell, 2016). For simplicity, we assume that their virtue and capacity are both bounded on $[0, 1]$.

¹⁴ The perceptions are both assumed to be homogenous (1) because of the small number of village residents and township officials and (2) because village residents and township officials have homogenous socio-economic, cultural, and institutional constraints.

2. Bayesian Inferences: The representative village resident or township official cannot directly observe each village leader candidate’s virtue or capacity. Thus, natural communication is necessary. Natural communication is defined as daily communication at work, in everyday life, or in other circumstances in which communicators behave naturally and artlessly (Bell, 2016). Importantly, it does not lead to illegal outcomes in the management of village affairs, such as conspiracy or rent-seeking (Baker and Faulkner, 1993). The representative village resident or township official obtains $\Omega_{in}^{\alpha}$, a series of observations of the virtue of village leader candidate $i$, through natural communication at the $n$-th occurrence,¹⁵ where $n = 1, \dots, N$ and $N = N_{Ele}$ or $N_{App}$.
$N_{Ele}$ and $N_{App}$ represent the total numbers of occurrences¹⁶ of natural communication between the representative village resident ($N_{Ele}$) or the representative township official ($N_{App}$) and each village leader candidate before a candidate is selected as village leader. $\Omega_{in}^{\alpha}$ is given by

$$\Omega_{in}^{\alpha} = \alpha_i + \upsilon_{in}, \tag{2.1}$$

where $\upsilon_{in}$ is a series of random shocks when observing virtue. Similarly, the representative village resident or township official, by naturally communicating with each village leader candidate, obtains $\Omega_{in}^{\theta}$, a series of observations of the capacity of village leader candidate $i$ at time $n = 1, \dots, N_{Ele}$ or $n = 1, \dots, N_{App}$. $\Omega_{in}^{\theta}$ is given by

$$\Omega_{in}^{\theta} = \theta_i + \omega_{in}, \tag{2.2}$$

where $\omega_{in}$ is a series of random shocks when observing capacity.

¹⁵ Natural communication at “the $n$-th occurrence” is referred to as “time $n$” for short in what follows.

¹⁶ “Numbers of occurrences” is referred to as “times” for short in what follows.

As natural communication happens often and in various situations, communicators have little opportunity, and are thus unwilling, to behave strategically to hide their true personal characteristics. Therefore, it is acceptable to assume, first, that the number of occurrences of natural communication is sufficiently large, that is, $N_{Ele\,(App)} \to +\infty$, and second, that the series of observations of virtue or capacity is normally distributed.

ASSUMPTION 1 (Natural communication and observations on virtue and capacity): The representative village resident or township official communicates with the village leader candidates naturally, which means that we have $\upsilon_{in} \sim N(0, \sigma_{\upsilon\alpha}^{2})$ and $\omega_{in} \sim N(0, \sigma_{\omega\theta}^{2})$, and often, which means that $N_{Ele}$ and $N_{App}$ are sufficiently large.

Based on Assumption 1, we have $\Omega_{in}^{\alpha} \sim N(\alpha_i, \sigma_{\upsilon\alpha}^{2})$ and $\Omega_{in}^{\theta} \sim N(\theta_i, \sigma_{\omega\theta}^{2})$. Prior to natural communication, the representative village resident and the township official have their own prior perceptions of the virtue and capacity of each village leader candidate, as discussed below.

ASSUMPTION 2 (Prior distribution of the virtue and capacity of village leader candidates): The representative village resident or township official’s prior perceptions of the virtue of village leader candidate $i$ are distributed as $N(\alpha_i^{e}, \sigma_{\alpha e}^{2})$, truncated at $[0, 1]$, where $\alpha_i^{e} \in [0, 1]$. Their prior perceptions of the capacity of village leader candidate $i$ are distributed as $N(\theta_i^{e}, \sigma_{\theta e}^{2})$, truncated at $[0, 1]$, where $\theta_i^{e} \in [0, 1]$. The prior means, $\alpha_i^{e}$ and $\theta_i^{e}$, and the prior variances, $\sigma_{\alpha e}^{2}$ and $\sigma_{\theta e}^{2}$, are known to the representative village resident or township official.

As virtue and capacity are assumed to be bounded on $[0, 1]$ and the representative village resident or township official engages in long-term natural communication with each village leader candidate, their prior perceptions of the virtue or capacity of each village leader candidate are truncated at $[0, 1]$.

Based on their prior perceptions and their observations in natural communication, the representative village resident or township official obtains inferred perceptions of the virtue of the village leader candidates. According to Bayes’ rule, these posterior perceptions are obtained by iteration, such that the inferred perceptions of virtue at time $n$ depend on the inferred perceptions of virtue at time $n - 1$ and the observations of virtue at time $n$. We now introduce the derivation of the density kernel of the posterior distribution of the virtue of village leader candidate $i$.
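The representative village resident and the township official have identical prior perceptions of the virtue of village leader candidate $i$, whose density kernel is $\gamma(\alpha_i)$. After naturally communicating with village leader candidate $i$ for the first time, the representative village resident or township official updates the density kernel of the posterior distribution of her virtue as

$$p(\alpha_i \mid \Omega_{i1}^{\alpha}) = \gamma(\alpha_i) \cdot L(\alpha_i; \Omega_{i1}^{\alpha}). \tag{2.3}$$

This posterior distribution at time $n = 1$ is also the prior distribution at time $n = 2$. After natural communication at time $n = 2$, the updated density kernel of the posterior distribution is given by

$$p(\alpha_i \mid \Omega_{i1}^{\alpha}, \Omega_{i2}^{\alpha}) = [\gamma(\alpha_i) \cdot L(\alpha_i; \Omega_{i1}^{\alpha})] \cdot L(\alpha_i; \Omega_{i2}^{\alpha}) = \gamma(\alpha_i)\,[L(\alpha_i; \Omega_{i1}^{\alpha}) \cdot L(\alpha_i; \Omega_{i2}^{\alpha})]. \tag{2.4}$$

Repeating this iteration, after $N$ occurrences of natural communication between the representative village resident or township official and village leader candidate $i$, the density kernel of the posterior distribution of the virtue of village leader candidate $i$ becomes (see the Appendix)

$$p(\alpha_i \mid \Omega_{i1}^{\alpha}, \dots, \Omega_{iN}^{\alpha}) = \gamma(\alpha_i) \cdot [L(\alpha_i; \Omega_{i1}^{\alpha}) \cdots L(\alpha_i; \Omega_{iN}^{\alpha})] \propto \exp\left\{ -\frac{1}{2} \left[ \frac{1}{\Sigma(\alpha_i)} \bigl(\alpha_i - S(\alpha_i)\bigr)^{2} \right] \right\}, \tag{2.5}$$

where the posterior mean of the virtue of village leader candidate $i$ is

$$S(\alpha_i) = \frac{\sigma_{\upsilon\alpha}^{2}}{\sigma_{\upsilon\alpha}^{2} + N\sigma_{\alpha e}^{2}}\,\alpha_i^{e} + \frac{N\sigma_{\alpha e}^{2}}{\sigma_{\upsilon\alpha}^{2} + N\sigma_{\alpha e}^{2}}\,\alpha_i, \tag{2.6}$$

and the posterior variance of the virtue of village leader candidate $i$ is

$$\Sigma(\alpha_i) = \frac{\sigma_{\alpha e}^{2}\,\sigma_{\upsilon\alpha}^{2}}{\sigma_{\upsilon\alpha}^{2} + N\sigma_{\alpha e}^{2}}. \tag{2.7}$$

The posterior mean and the posterior variance change across occurrences of natural communication. That is, at time $n = 1, \dots, N$, the posterior means of virtue are

$$\left\{ \frac{\sigma_{\upsilon\alpha}^{2}}{\sigma_{\upsilon\alpha}^{2} + \sigma_{\alpha e}^{2}}\,\alpha_i^{e} + \frac{\sigma_{\alpha e}^{2}}{\sigma_{\upsilon\alpha}^{2} + \sigma_{\alpha e}^{2}}\,\alpha_i,\ \dots,\ \frac{\sigma_{\upsilon\alpha}^{2}}{\sigma_{\upsilon\alpha}^{2} + N\sigma_{\alpha e}^{2}}\,\alpha_i^{e} + \frac{N\sigma_{\alpha e}^{2}}{\sigma_{\upsilon\alpha}^{2} + N\sigma_{\alpha e}^{2}}\,\alpha_i \right\},$$

and the posterior variances of virtue are

$$\left\{ \frac{\sigma_{\alpha e}^{2}\,\sigma_{\upsilon\alpha}^{2}}{\sigma_{\upsilon\alpha}^{2} + \sigma_{\alpha e}^{2}},\ \dots,\ \frac{\sigma_{\alpha e}^{2}\,\sigma_{\upsilon\alpha}^{2}}{\sigma_{\upsilon\alpha}^{2} + N\sigma_{\alpha e}^{2}} \right\}.$$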
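The iteration in (2.3) to (2.5) can be checked numerically. The following is a minimal sketch rather than the derivation in the Appendix: it assumes an untruncated normal prior (the truncation at [0, 1] in Assumption 2 is ignored), and all function names and parameter values are hypothetical and chosen only for illustration. It updates the prior one observation at a time and compares the result with the closed forms (2.6) and (2.7), using the sample mean of the observations in the place where (2.6) has the true value, to which the sample mean converges as the number of occurrences grows.

```python
import random


def update_once(prior_mean, prior_var, observation, noise_var):
    """One Bayesian update of a normal prior with one normal observation."""
    denom = noise_var + prior_var
    post_mean = (noise_var * prior_mean + prior_var * observation) / denom
    post_var = prior_var * noise_var / denom
    return post_mean, post_var


def closed_form(prior_mean, prior_var, obs_mean, n_obs, noise_var):
    """Posterior mean and variance after n_obs observations, in the form of (2.6)-(2.7)."""
    denom = noise_var + n_obs * prior_var
    post_mean = (noise_var * prior_mean + n_obs * prior_var * obs_mean) / denom
    post_var = prior_var * noise_var / denom
    return post_mean, post_var


# Hypothetical values, for illustration only.
random.seed(0)
TRUE_VIRTUE = 0.7   # true virtue of the candidate
PRIOR_MEAN = 0.5    # prior mean of virtue
PRIOR_VAR = 0.04    # prior variance of virtue
NOISE_VAR = 0.25    # variance of the observation shocks
N_OBS = 200         # total occurrences of natural communication

observations = [random.gauss(TRUE_VIRTUE, NOISE_VAR ** 0.5) for _ in range(N_OBS)]

# Sequential updating: the posterior at one time is the prior at the next.
mean, var = PRIOR_MEAN, PRIOR_VAR
for obs in observations:
    mean, var = update_once(mean, var, obs, NOISE_VAR)

cf_mean, cf_var = closed_form(
    PRIOR_MEAN, PRIOR_VAR, sum(observations) / N_OBS, N_OBS, NOISE_VAR
)
print("sequential :", mean, var)
print("closed form:", cf_mean, cf_var)
```

The two lines printed should agree up to floating-point error, which illustrates that the order of the updates does not matter and that the posterior variance depends only on the number of observations, not on their values.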
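Given different total occurrences of natural communication in relation to election and appointment, the representative village resident (by election) or the representative township official (by appointment) obtains the posterior mean of the virtue of village leader candidate $i$ as

$$S_{Ele\,(App)}(\alpha_i) = \frac{\sigma_{\upsilon\alpha}^{2}}{\sigma_{\upsilon\alpha}^{2} + N_{Ele\,(App)}\sigma_{\alpha e}^{2}}\,\alpha_i^{e} + \frac{N_{Ele\,(App)}\sigma_{\alpha e}^{2}}{\sigma_{\upsilon\alpha}^{2} + N_{Ele\,(App)}\sigma_{\alpha e}^{2}}\,\alpha_i, \tag{2.8}$$

and obtains the posterior variance of the virtue of village leader candidate $i$ as

$$\Sigma_{Ele\,(App)}(\alpha_i) = \frac{\sigma_{\alpha e}^{2}\,\sigma_{\upsilon\alpha}^{2}}{\sigma_{\upsilon\alpha}^{2} + N_{Ele\,(App)}\sigma_{\alpha e}^{2}}. \tag{2.9}$$

Following a similar Bayesian inference, the density kernel of the posterior distribution of the capacity of village leader candidate $i$ is given by (see the Appendix)

$$p(\theta_i \mid \Omega_{i1}^{\theta}, \dots, \Omega_{iN}^{\theta}) = \gamma(\theta_i) \cdot [L(\theta_i; \Omega_{i1}^{\theta}) \cdots L(\theta_i; \Omega_{iN}^{\theta})] \propto \exp\left\{ -\frac{1}{2} \left[ \frac{1}{\Sigma(\theta_i)} \bigl(\theta_i - S(\theta_i)\bigr)^{2} \right] \right\}, \tag{2.10}$$

where the posterior mean of the capacity of village leader candidate $i$ is

$$S(\theta_i) = \frac{\sigma_{\omega\theta}^{2}}{\sigma_{\omega\theta}^{2} + N\sigma_{\theta e}^{2}}\,\theta_i^{e} + \frac{N\sigma_{\theta e}^{2}}{\sigma_{\omega\theta}^{2} + N\sigma_{\theta e}^{2}}\,\theta_i, \tag{2.11}$$

and the posterior variance of the capacity of village leader candidate $i$ is

$$\Sigma(\theta_i) = \frac{\sigma_{\theta e}^{2}\,\sigma_{\omega\theta}^{2}}{\sigma_{\omega\theta}^{2} + N\sigma_{\theta e}^{2}}. \tag{2.12}$$

The posterior mean and the posterior variance change across occurrences of natural communication. That is, at time $n = 1, \dots, N$, the posterior means of capacity are

$$\left\{ \frac{\sigma_{\omega\theta}^{2}}{\sigma_{\omega\theta}^{2} + \sigma_{\theta e}^{2}}\,\theta_i^{e} + \frac{\sigma_{\theta e}^{2}}{\sigma_{\omega\theta}^{2} + \sigma_{\theta e}^{2}}\,\theta_i,\ \dots,\ \frac{\sigma_{\omega\theta}^{2}}{\sigma_{\omega\theta}^{2} + N\sigma_{\theta e}^{2}}\,\theta_i^{e} + \frac{N\sigma_{\theta e}^{2}}{\sigma_{\omega\theta}^{2} + N\sigma_{\theta e}^{2}}\,\theta_i \right\},$$

and the posterior variances of capacity are

$$\left\{ \frac{\sigma_{\theta e}^{2}\,\sigma_{\omega\theta}^{2}}{\sigma_{\omega\theta}^{2} + \sigma_{\theta e}^{2}},\ \dots,\ \frac{\sigma_{\theta e}^{2}\,\sigma_{\omega\theta}^{2}}{\sigma_{\omega\theta}^{2} + N\sigma_{\theta e}^{2}} \right\}.$$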
Given the different total occurrences of natural communication in relation to election and appointment, the representative village resident (by election) or the representative township official (by appointment) obtains the posterior mean of the capacity of village leader candidate $i$ as

$$S_{Ele\,(App)}(\theta_i) = \frac{\sigma_{\omega\theta}^{2}}{\sigma_{\omega\theta}^{2} + N_{Ele\,(App)}\sigma_{\theta e}^{2}}\,\theta_i^{e} + \frac{N_{Ele\,(App)}\sigma_{\theta e}^{2}}{\sigma_{\omega\theta}^{2} + N_{Ele\,(App)}\sigma_{\theta e}^{2}}\,\theta_i, \tag{2.13}$$

and their posterior variance of the capacity of village leader candidate $i$ is

$$\Sigma_{Ele\,(App)}(\theta_i) = \frac{\sigma_{\theta e}^{2}\,\sigma_{\omega\theta}^{2}}{\sigma_{\omega\theta}^{2} + N_{Ele\,(App)}\sigma_{\theta e}^{2}}. \tag{2.14}$$

Proposition 1 explains how the accumulation of natural communication improves inferences of the virtue and capacity of village leader candidates.

PROPOSITION 1: As the number of times that the representative village resident or township official communicates naturally with the village leader candidates increases, their inference of each village leader candidate’s virtue and capacity improves in the following respects:

(a) Inference Precision increases with occurrences of natural communication, as evidenced by the decrease in the posterior variance of virtue (or capacity) with the total occurrences of natural communication.

(b) Inference Accuracy increases with occurrences of natural communication, as evidenced by the decrease in the difference between the posterior mean of virtue (or capacity) and the real value of virtue (or capacity) with the total occurrences of natural communication.

(c) Marginal Inference Accuracy decreases with occurrences of natural communication, as evidenced by the positive second-order derivative of the difference between the posterior mean of virtue (or capacity) and the real value of virtue (or capacity) with respect to the total occurrences of natural communication.

Proof: (a) The first-order derivative of $\Sigma(\alpha_i)$ with respect to $N$ is

$$\frac{\partial \Sigma(\alpha_i)}{\partial N} = -\frac{\sigma_{\alpha e}^{4}\,\sigma_{\upsilon\alpha}^{2}}{(\sigma_{\upsilon\alpha}^{2} + N\sigma_{\alpha e}^{2})^{2}} < 0. \tag{2.15}$$
Similarly, the first-order derivative of $\Sigma(\theta_i)$ with respect to $N$ is

$$\frac{\partial \Sigma(\theta_i)}{\partial N} = -\frac{\sigma_{\theta e}^{4}\,\sigma_{\omega\theta}^{2}}{(\sigma_{\omega\theta}^{2} + N\sigma_{\theta e}^{2})^{2}} < 0. \tag{2.16}$$

(b) The difference between the posterior mean of virtue and the real value of virtue is

$$|S(\alpha_i) - \alpha_i| = \frac{\sigma_{\upsilon\alpha}^{2}}{\sigma_{\upsilon\alpha}^{2} + N\sigma_{\alpha e}^{2}}\,|\alpha_i^{e} - \alpha_i| = \frac{\Sigma(\alpha_i)}{\sigma_{\alpha e}^{2}}\,|\alpha_i^{e} - \alpha_i|. \tag{2.17}$$

Therefore, we obtain

$$\frac{\partial |S(\alpha_i) - \alpha_i|}{\partial N} = \frac{|\alpha_i^{e} - \alpha_i|}{\sigma_{\alpha e}^{2}}\,\frac{\partial \Sigma(\alpha_i)}{\partial N} < 0. \tag{2.18}$$

Similarly, the difference between the posterior mean of capacity and the real value of capacity is

$$|S(\theta_i) - \theta_i| = \frac{\sigma_{\omega\theta}^{2}}{\sigma_{\omega\theta}^{2} + N\sigma_{\theta e}^{2}}\,|\theta_i^{e} - \theta_i| = \frac{\Sigma(\theta_i)}{\sigma_{\theta e}^{2}}\,|\theta_i^{e} - \theta_i|. \tag{2.19}$$

Therefore, we obtain

$$\frac{\partial |S(\theta_i) - \theta_i|}{\partial N} = \frac{|\theta_i^{e} - \theta_i|}{\sigma_{\theta e}^{2}}\,\frac{\partial \Sigma(\theta_i)}{\partial N} < 0. \tag{2.20}$$

(c) The second-order derivative of $\Sigma(\alpha_i)$ with respect to $N$ is

$$\frac{\partial^{2} \Sigma(\alpha_i)}{\partial N^{2}} = \frac{2\sigma_{\alpha e}^{6}\,\sigma_{\upsilon\alpha}^{2}}{(\sigma_{\upsilon\alpha}^{2} + N\sigma_{\alpha e}^{2})^{3}} > 0. \tag{2.21}$$

Therefore, the second-order derivative of $|S(\alpha_i) - \alpha_i|$ with respect to $N$ is

$$\frac{\partial^{2} |S(\alpha_i) - \alpha_i|}{\partial N^{2}} = \frac{|\alpha_i^{e} - \alpha_i|}{\sigma_{\alpha e}^{2}}\,\frac{\partial^{2} \Sigma(\alpha_i)}{\partial N^{2}} > 0. \tag{2.22}$$

Similarly, the second-order derivative of $\Sigma(\theta_i)$ with respect to $N$ is

$$\frac{\partial^{2} \Sigma(\theta_i)}{\partial N^{2}} = \frac{2\sigma_{\theta e}^{6}\,\sigma_{\omega\theta}^{2}}{(\sigma_{\omega\theta}^{2} + N\sigma_{\theta e}^{2})^{3}} > 0. \tag{2.23}$$

Therefore, the second-order derivative of $|S(\theta_i) - \theta_i|$ with respect to $N$ is

$$\frac{\partial^{2} |S(\theta_i) - \theta_i|}{\partial N^{2}} = \frac{|\theta_i^{e} - \theta_i|}{\sigma_{\theta e}^{2}}\,\frac{\partial^{2} \Sigma(\theta_i)}{\partial N^{2}} > 0. \tag{2.24}$$

∎

The implication of the improved precision and accuracy of inferences is that each occurrence of natural communication brings local information on the village leader candidates, leading to more precise and accurate inferences of their virtue and capacity. For marginal inference accuracy, the implication is that as the number of occurrences of natural communication increases, the amount of additional local information that each occurrence contributes to inferring the virtue and capacity of the village leader candidates decreases.

3. Institutional Comparison (Inference Accuracy and Precision): To compare inference precision and inference accuracy before and after the introduction of local direct elections, we make the following assumption about $N_{Ele}$ and $N_{App}$, the total occurrences of natural communication of the representative village resident and of the representative township official, respectively, with each village leader candidate.

ASSUMPTION 3: $N_{Ele} > N_{App}$.

Assumption 3 indicates that the representative village resident naturally communicates with the village leader candidates more often than the representative township official does, implying that the representative village resident has an advantage in terms of local information about the village leader candidates. The reason is that the village leader candidates are also residents of the representative village, and have long-term and frequent natural communication with other village residents in various situations (Bell, 2016). For instance, the village leader candidates and other village residents have usually known each other since childhood. As they grow up together in the village, they communicate frequently at school, in production or commercial activities, and in everyday life. In contrast, village leader candidates have fewer opportunities to communicate naturally with township officials. The reasons may be that the village leader candidates usually communicate with township officials only when dealing with the public affairs of the village or private affairs related to township administration, and that township officials are often posted across different towns.

Given that $N_{Ele} > N_{App}$, we compare the precision and accuracy of the inferences of the virtue and capacity of each village leader candidate before and after the introduction of local direct elections.
(a) The representative village resident’s inferences regarding the virtue and capacity of the village leader candidates before electing one as village leader are more precise than those of the representative township official before appointing one as village leader.

(b) The representative village resident’s inferences regarding the virtue and capacity of the village leader candidates before electing one as village leader are more accurate than those of the representative township official before appointing one as village leader.

Figure 1 shows that as the number of times the representative village resident or township official communicates naturally with the village leader candidates increases, (a) the bandwidth of the square root of the posterior variance decreases, reflecting greater inference precision; (b) the difference between the posterior mean of virtue (or capacity) and the real value of virtue (or capacity) decreases, reflecting greater inference accuracy; and (c) the curve of the posterior mean is concave when ascending and convex when descending, reflecting reduced marginal inference accuracy.

As shown in Figure 1, the bandwidth representing the square root of the posterior variance of the virtue or capacity of the village leader candidates is narrower under election than under appointment, implying more precise inference by the representative village resident. The difference between the posterior mean of the virtue or capacity of the village leader candidates and its real value is smaller under election than under appointment, implying greater inference accuracy by the representative village resident.

In summary, the representative village resident, by naturally communicating more often with the village leader candidates, has more local information about their virtue and capacity than the representative township official.
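The comparison above, together with Proposition 1, can be illustrated with a short numerical sketch. It evaluates the posterior variance in the form of (2.7) and the accuracy gap in the form of (2.17) over a range of occurrence counts and at two totals. All parameter values, including the two totals standing in for election and appointment, are hypothetical and serve only as an illustration under Assumption 3; the function names are not from the original text.

```python
def posterior_variance(n_obs, prior_var, noise_var):
    """Posterior variance after n_obs occurrences, in the form of (2.7)."""
    return prior_var * noise_var / (noise_var + n_obs * prior_var)


def accuracy_gap(n_obs, prior_mean, true_value, prior_var, noise_var):
    """Distance between posterior mean and true value, in the form of (2.17)."""
    weight_on_prior = noise_var / (noise_var + n_obs * prior_var)
    return weight_on_prior * abs(prior_mean - true_value)


# Hypothetical values, for illustration only.
PRIOR_MEAN, TRUE_VALUE = 0.5, 0.8
PRIOR_VAR, NOISE_VAR = 0.04, 0.25
N_ELE, N_APP = 60, 10  # Assumption 3: N_ELE > N_APP

counts = range(1, 101)
variances = [posterior_variance(n, PRIOR_VAR, NOISE_VAR) for n in counts]
gaps = [accuracy_gap(n, PRIOR_MEAN, TRUE_VALUE, PRIOR_VAR, NOISE_VAR) for n in counts]
# Improvement in accuracy brought by each additional occurrence.
gap_reductions = [gaps[k] - gaps[k + 1] for k in range(len(gaps) - 1)]

for label, n in (("election", N_ELE), ("appointment", N_APP)):
    print(
        label,
        "variance:", posterior_variance(n, PRIOR_VAR, NOISE_VAR),
        "gap:", accuracy_gap(n, PRIOR_MEAN, TRUE_VALUE, PRIOR_VAR, NOISE_VAR),
    )
```

In this sketch the variance and the gap both fall as the count rises, each additional occurrence reduces the gap by less than the previous one, and both quantities are smaller at the larger total, which is the pattern that parts (a) to (c) of Proposition 1 and statements (a) and (b) describe.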
Therefore, as local direct elections allow the representative village resident and not the representative township official to select the village leader, the virtue and capacity of the village leader candidates are inferred with greater precision and accuracy.

This theoretical demonstration applies to all village committee members. The inferences for each village committee member, including the village leader as representative, are homogenous. Accordingly, the virtue and capacity of each village committee member are inferred more precisely and accurately.

[Insert Figure 1 here]

B. Selection of Village Committee Members

In this section, we discuss how local direct elections facilitate the meritocratic selection of village committee members through the improved inference effectiveness. We discuss the village leader as the representative of all village committee members, and our theoretical findings also apply to other village committee members.

We find that by providing more accurate inferences about the virtue and capacity of village leader candidates, local direct election in a representative village improves the expected competence of the village leader. In addition, by providing more precise inferences about the virtue and capacity of village leader candidates, local direct election reduces the variance in the competence of the village leader.

1. Setup: The representative village resident and township official both select the village leader candidate with the highest competence as the village leader. Our theory defines $\pi_i$, the competence of village leader candidate $i$, as a weighted average of the virtue and capacity of village leader candidate $i$, such that $\pi_i \equiv \mu\alpha_i + (1 - \mu)\theta_i$. $\mu$ represents the weight assigned by the representative village resident or township official to the virtue of the village leader candidates, and $\mu \in [0, 1]$,¹⁷ so $\mu$ is also called the village leader’s virtue-capacity spectrum.
We have the following assumption about $\mu$:

ASSUMPTION 4 (Village leader’s virtue-capacity spectrum): $\mu = \mu_{VL,TO} = \mu_{VL,VR}$.

Assumption 4 states that $\mu_{VL,VR}$, the representative village resident’s preference for the village leader’s virtue-capacity spectrum, is equal to $\mu_{VL,TO}$, the representative township official’s preference for the village leader’s virtue-capacity spectrum; both take the value $\mu$.

¹⁷ The value that $\mu$ takes has general implications. In public sectors, we assume politicians’ virtue-capacity spectrum to be $\mu \in (0.5, 1]$. In contrast, in private sectors, we assume the leaders’ virtue-capacity spectrum to be $\mu \in [0, 0.5)$ because private sectors have less public purpose, instead tending to emphasize making profits.

$\mu_{VL,VR}$ is contingent on the village leader’s responsibility in managing village affairs and serving village residents. In other words, the representative village resident will select a village leader whose virtue-capacity spectrum is $\mu_{VL,VR}$ in order to safeguard village residents’ rights and protect village residents’ interests. The law stipulates that the village leader must not only be capable of developing the local economy, but also abide by the laws, be intrinsically motivated to serve the people, and protect people’s rights and interests (National People’s Congress of China, 1998). Therefore, the assumption underlying $\mu_{VL,VR}$ is that there exists a virtue-capacity spectrum of the village leader, and that the village leader makes full use of her virtue and capacity to safeguard and protect the representative village resident’s rights and interests.

$\mu_{VL,TO} = \mu_{VL,VR}$ implies that the representative township official would select a village leader whose virtue-capacity spectrum is $\mu_{VL,TO}$, which is equal to $\mu_{VL,VR}$.
This is because the representative township official’s preference for the village leader’s virtue-capacity spectrum is stipulated to be consistent with the representative village resident’s preference. In other words, the representative township official is required to safeguard and protect the representative village resident’s rights and interests in each village (National People’s Congress of China, 2004).

Expected Competence. To calculate $\pi_{VL,Ele\,(App)}$, the expected competence of the elected (or appointed) village leader in a representative village, we calculate $\mathbb{E}_{Ele\,(App)}(\pi_i)$, the weighted average of the competence of all village leader candidates in a representative village that has already (or has not) introduced local direct elections, with weight $A_i$ as the probability that village leader candidate $i$ will be elected (or appointed). $A_i$ has the following properties:

(a) $A_i$ is contingent on the competence of village leader candidate $i$.

(b) $A_i \in [0, 1]$; thus, its value represents the probability of electing or appointing village leader candidate $i$ as village leader.

(c) $A_i$ is positively associated with the virtue and capacity of village leader candidate $i$, which reflects a positive screening of the election and appointment of village leaders (Dal Bó et al., 2017) based on the candidates’ virtue and capacity. Specifically, $\partial A_i / \partial \alpha_i > 0$ and $\partial A_i / \partial \theta_i > 0$.

To satisfy these three properties, for simplicity and without loss of generality, we assume that village leader candidate $i$’s probability of being elected or appointed as village leader increases linearly with the weighted average of her posterior mean of virtue and her posterior mean of capacity, with weight $\mu$, the virtue-capacity spectrum. Following Alesina and Tabellini (2007), the probability of being elected or appointed can be considered a reward.
Therefore, the probability that village leader candidate $i$ will be elected or appointed is given by

$$A_i = l\,[\mu S(\alpha_i) + (1 - \mu) S(\theta_i)] = \mu l \left[ \frac{\Sigma(\alpha_i)}{\sigma_{\alpha e}^{2}}\,\alpha_i^{e} + \left(1 - \frac{\Sigma(\alpha_i)}{\sigma_{\alpha e}^{2}}\right)\alpha_i \right] + (1 - \mu)\, l \left[ \frac{\Sigma(\theta_i)}{\sigma_{\theta e}^{2}}\,\theta_i^{e} + \left(1 - \frac{\Sigma(\theta_i)}{\sigma_{\theta e}^{2}}\right)\theta_i \right], \tag{2.25}$$

where $\Sigma(\alpha_i) \equiv \dfrac{\sigma_{\alpha e}^{2}\,\sigma_{\upsilon\alpha}^{2}}{\sigma_{\upsilon\alpha}^{2} + N\sigma_{\alpha e}^{2}}$, $\Sigma(\theta_i) \equiv \dfrac{\sigma_{\theta e}^{2}\,\sigma_{\omega\theta}^{2}}{\sigma_{\omega\theta}^{2} + N\sigma_{\theta e}^{2}}$, and $l \in [0, 1]$. As discussed in Section 2.3.A., $\Sigma(\alpha_i)$ and $\Sigma(\theta_i)$ measure inference precision. As the number of occurrences of natural communication increases, $\Sigma(\alpha_i)$ and $\Sigma(\theta_i)$ decrease; thus, $A_i$ tends to be the product of the inferred competence and $l$. Therefore, when $N = N_{Ele}$, $\Sigma(\alpha_i) = \Sigma_{Ele}(\alpha_i)$, and $\Sigma(\theta_i) = \Sigma_{Ele}(\theta_i)$, we have $A_i = A_i^{Ele}$, the probability that village leader candidate $i$ will be elected. When $N = N_{App}$, $\Sigma(\alpha_i) = \Sigma_{App}(\alpha_i)$, and $\Sigma(\theta_i) = \Sigma_{App}(\theta_i)$, we have $A_i = A_i^{App}$, the probability that village leader candidate $i$ will be appointed.

When calculating the weighted average of the competence of all village leader candidates with weight $A_i$, as $\int_0^1 \int_0^1 A_i \, d\alpha_i \, d\theta_i < 1$ (that is, the sum of all weights is less than 1), we should have $\int_0^1 \int_0^1 [\mu\alpha_i + (1 - \mu)\theta_i]\, A_i \, d\alpha_i \, d\theta_i$, the weighted average of the competence of all village leader candidates, divided by $\int_0^1 \int_0^1 A_i \, d\alpha_i \, d\theta_i$ to standardize the weights.
As a result, the expected competence of the elected (or appointed) village leader is given by

$$\pi_{VL,Ele\,(App)} = \mathbb{E}_{Ele\,(App)}(\pi_i) = \frac{\int_0^1 \int_0^1 \pi_i\, A_i^{Ele\,(App)} \, d\alpha_i \, d\theta_i}{\int_0^1 \int_0^1 A_i^{Ele\,(App)} \, d\alpha_i \, d\theta_i}, \tag{2.26}$$

where the probability that village leader candidate $i$ will be elected (or appointed) is

$$A_i^{Ele\,(App)} = \mu l\, S_{Ele\,(App)}(\alpha_i) + (1 - \mu)\, l\, S_{Ele\,(App)}(\theta_i) = \mu l \left[ \frac{\Sigma_{Ele\,(App)}(\alpha_i)}{\sigma_{\alpha e}^{2}}\,\alpha_i^{e} + \left(1 - \frac{\Sigma_{Ele\,(App)}(\alpha_i)}{\sigma_{\alpha e}^{2}}\right)\alpha_i \right] + (1 - \mu)\, l \left[ \frac{\Sigma_{Ele\,(App)}(\theta_i)}{\sigma_{\theta e}^{2}}\,\theta_i^{e} + \left(1 - \frac{\Sigma_{Ele\,(App)}(\theta_i)}{\sigma_{\theta e}^{2}}\right)\theta_i \right], \tag{2.27}$$

where $\Sigma_{Ele\,(App)}(\alpha_i) \equiv \dfrac{\sigma_{\alpha e}^{2}\,\sigma_{\upsilon\alpha}^{2}}{\sigma_{\upsilon\alpha}^{2} + N_{Ele\,(App)}\sigma_{\alpha e}^{2}}$, $\Sigma_{Ele\,(App)}(\theta_i) \equiv \dfrac{\sigma_{\theta e}^{2}\,\sigma_{\omega\theta}^{2}}{\sigma_{\omega\theta}^{2} + N_{Ele\,(App)}\sigma_{\theta e}^{2}}$, and $l \in [0, 1]$. This shows that what distinguishes the expected competence of the elected village leader from that of the appointed village leader in a representative village is the number of times that the representative village resident and township official communicate naturally with the village leader candidates.

Variance of Competence. We can also calculate $Var_{VL,Ele\,(App)}(\pi)$, the variance of the competence of the elected (or appointed) village leader in a representative village. To this end, we calculate $Var_{Ele\,(App)}(\pi_i)$, the variance of the competence of all village leader candidates in a representative village that has already (or has not) introduced local direct elections. By definition, the variance of the competence of the elected (or appointed) village leader in a representative village is

$$Var_{VL,Ele\,(App)}(\pi) = Var_{Ele\,(App)}(\pi_i) = \mathbb{E}_{Ele\,(App)}[(\pi_i)^{2}] - \mathbb{E}_{Ele\,(App)}^{2}[\pi_i]. \tag{2.28}$$

This measures the extent to which the competence of the elected (or appointed) village leader varies.
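The quantities in (2.26) to (2.28) can be approximated numerically. The sketch below is an illustration under simplifying assumptions that are not in the original text: the prior means are held constant across candidates, virtue and capacity share the same prior variance and the same shock variance, the truncation of the priors is ignored, and the double integrals are approximated by a midpoint rule on a grid. The function names, the `scale` argument (standing in for the constant multiplying the probability in (2.27)), and all numerical values are hypothetical.

```python
def prior_weight(n_obs, prior_var, noise_var):
    """Posterior variance divided by prior variance: the weight left on the prior mean."""
    return noise_var / (noise_var + n_obs * prior_var)


def selection_moments(n_obs, mu, prior_virtue, prior_capacity,
                      prior_var, noise_var, scale=1.0, grid=200):
    """Selection-weighted mean and variance of competence, as in (2.26) and (2.28)."""
    w = prior_weight(n_obs, prior_var, noise_var)
    points = [(k + 0.5) / grid for k in range(grid)]  # midpoints on [0, 1]
    total_weight = first_moment = second_moment = 0.0
    for alpha in points:
        post_virtue = w * prior_virtue + (1 - w) * alpha
        for theta in points:
            post_capacity = w * prior_capacity + (1 - w) * theta
            # Selection probability, linear in the posterior means as in (2.27).
            prob = scale * (mu * post_virtue + (1 - mu) * post_capacity)
            competence = mu * alpha + (1 - mu) * theta
            total_weight += prob
            first_moment += competence * prob
            second_moment += competence ** 2 * prob
    mean = first_moment / total_weight
    variance = second_moment / total_weight - mean ** 2
    return mean, variance


# Hypothetical values, for illustration only.
MU = 0.6                      # virtue-capacity spectrum
PRIOR_VIRTUE = 0.6            # prior mean of virtue
PRIOR_CAPACITY = 0.6          # prior mean of capacity
PRIOR_VAR, NOISE_VAR = 0.25, 1.0
N_ELE, N_APP = 50, 5          # more natural communication under election

mean_ele, var_ele = selection_moments(N_ELE, MU, PRIOR_VIRTUE, PRIOR_CAPACITY, PRIOR_VAR, NOISE_VAR)
mean_app, var_app = selection_moments(N_APP, MU, PRIOR_VIRTUE, PRIOR_CAPACITY, PRIOR_VAR, NOISE_VAR)
print("election   :", mean_ele, var_ele)
print("appointment:", mean_app, var_app)
```

Because the grid spacing cancels in the ratios, the result does not depend on `scale`. For these particular values the selection-weighted mean is higher and the variance lower at the larger number of occurrences; this is a single numerical instance and not a substitute for the proofs referred to in the Appendix.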
Similar to the expectation, what distinguishes the variance of the competence of the elected village leader from that of the appointed village leader in a representative village is the number of times that the representative village resident and township official communicate naturally with the village leader candidates.

2. Institutional Comparison (Meritocratic Selection of Village Leaders): Proposition 2 compares $[\pi]_{VL,Ele}$, the expected competence of the elected village leader, with $[\pi]_{VL,App}$, the expected competence of the appointed village leader.

PROPOSITION 2: The expected competence of the elected village leader is greater than that of the appointed village leader in a representative village. Specifically, we have

$$[\pi]_{VL,Ele} > [\pi]_{VL,App}, \tag{2.29}$$

where $[\pi]_{VL,Ele} \equiv \mathbb{E}_{Ele}(\pi_i)$ and $[\pi]_{VL,App} \equiv \mathbb{E}_{App}(\pi_i)$. Here are some specific cases:

Case 1: $\mu = 1$; that is, the representative village resident or township official only considers the virtue of the village leader candidates.

Case 2: $\mu = 0$; that is, the representative village resident or township official only considers the capacity of the village leader candidates.

Case 3: $\mu = \frac{1}{2}$; that is, the representative village resident or township official considers the virtue and capacity of the village leader candidates with equal weights.

Case 4: $\mu \in (\frac{1}{2}, 1)$; that is, the representative village resident or township official puts more emphasis on virtue. This is contingent on $\theta_i^{e} \in [0.5, 1]$; that is, the prior mean of the capacity of each village leader candidate is at least as great as the mean of the real value of the capacity of all potential village leader candidates.

Case 5: $\mu \in (0, \frac{1}{2})$; that is, the representative village resident or township official puts more emphasis on capacity.
This is contingent on $\alpha_i^{e} \in [0.5, 1]$; that is, the prior mean of the virtue of each village leader candidate is at least as great as the mean of the real value of the virtue of all potential village leader candidates.

Proof: See the Appendix. ∎

In practice, the conditions in Case 4 and Case 5 are both always satisfied. According to the requirements of the OLVC, village leader candidates satisfy certain personal characteristics in terms of capacity, such as having a diploma or management experience (National People’s Congress of China, 1998). In this sense, the mean of the prior perceptions of the representative village resident or township official regarding the capacity of each village leader candidate is greater than or equal to 0.5, the mean of the capacity of all village residents; in other words, $\theta_i^{e} \in [0.5, 1]$. The OLVC also requires that village leader candidates satisfy certain personal characteristics in terms of virtue, such as having no criminal record or affiliation with the Communist Party of China (National People’s Congress of China, 1998). In this sense, the mean of the prior perceptions of the representative village resident or township official regarding the virtue of the village leader candidates is greater than or equal to 0.5, the mean of the virtue of all village residents; in other words, $\alpha_i^{e} \in [0.5, 1]$.

Proposition 3 compares $Var_{VL,Ele}(\pi)$, the variance of the competence of the elected village leader, with $Var_{VL,App}(\pi)$, the variance of the competence of the appointed village leader.

PROPOSITION 3: The variance of the competence of the elected village leader is smaller than that of the appointed village leader in a representative village. Specifically, we have

$$Var_{VL,Ele}(\pi) < Var_{VL,App}(\pi) \tag{2.30}$$

regardless of the value of $\mu$ in $[0, 1]$. As discussed in (2.28), $Var_{VL,Ele}(\pi) = Var_{Ele}(\pi_i)$ and $Var_{VL,App}(\pi) = Var_{App}(\pi_i)$.

Proof: See the Appendix.
∎

In summary, after the introduction of local direct elections in a representative village, the expected competence of the village leader increases, while its variance decreases. This implies that by providing local information on the virtue and capacity of village leader candidates, local direct elections facilitate the meritocratic selection of village leaders.

This theoretical demonstration applies to all village committee members. The selection of each village committee member, with the village leader as the representative case, is homogeneous. Thus, the expected competence of each village committee member is also homogeneous. Accordingly, by providing local information on the virtue and capacity of village committee candidates, local direct elections increase the expected competence of each village committee member while reducing its variance. In other words, local direct elections facilitate the meritocratic selection of all village committee members.

C. Summary

This section demonstrates that local direct elections facilitate the meritocratic selection of village committee members. By providing local information on each village committee candidate, local direct elections make inference about the virtue and capacity of each candidate more effective. Once local direct elections transfer the authority for selecting village committee members from township officials to village residents, who have an advantage in local information about these candidates, the virtue and capacity of these candidates are inferred with greater precision and accuracy. As a result, the expected competence of village committee members increases and its variance decreases; that is, village committee members are selected meritocratically. Note that, with the virtue-capacity spectrum in our model, our demonstrations apply to a continuum of selection scenarios, from purely virtue-based to purely capacity-based.

2.4. Meritocratic Selection with the Improved Candidate Pool

Our theoretical model, based on the Bayesian inference framework, investigates how the introduction of local direct elections in a representative village facilitates the meritocratic selection of the village party secretary. The mechanism is that, as the introduction of local direct elections facilitates the meritocratic selection of village committee members, the candidate pool for the village party secretary is improved.

A. Performance-based Promotion of Village Party Secretaries

The inference about village party secretary candidates and the selection of the village party secretary, together known as performance-based promotion, are similar to those discussed in Sections 2.3.A and 2.3.B.

Village party branch members are the candidates for village party secretary. The village party branch has two types of members: (I) individuals who are members of both the village party branch and the village committee, denoted by village party branch member $j$, $j = 1, 2, \dots$; and (II) individuals who are members of the village party branch only, denoted by village party branch member $\tilde{j}$, $\tilde{j} = 1, 2, \dots$. The village leader is a Type I village party branch member.

The candidate pool for the village party secretary is partially improved by local direct elections. After the introduction of local direct elections, the expected competence of Type I village party branch members increases, while that of Type II village party branch members remains unchanged. With local direct elections, Type I village party branch members are no longer appointed by the representative township official, but elected by the representative village resident; thus, their expected competence increases, as discussed in Section 2.3.B. In contrast, as Type II village party branch members are still appointed by the representative township official, their expected competence remains unchanged.
Before the introduction of local direct elections, the expected competence of Type I and Type II village party branch members is the same, because both types were appointed by the representative township official.

In a representative village, the representative township official cannot observe each village party branch member’s virtue or capacity. To infer their virtue, the representative township official observes the performance of each village party branch member and obtains $P^{\alpha}_{jm}$ or $P^{\alpha}_{\tilde{j}m}$, the performance on virtue of village party branch member $j$ or $\tilde{j}$ in task $m = 1, 2, \dots, M$, with $M$ sufficiently large. $P^{\alpha}_{jm}$ is given by $P^{\alpha}_{jm} = \alpha_j + \varepsilon_{jm}$, and $P^{\alpha}_{\tilde{j}m}$ is given by $P^{\alpha}_{\tilde{j}m} = \alpha_{\tilde{j}} + \varepsilon_{\tilde{j}m}$, where $\varepsilon_{jm}$ and $\varepsilon_{\tilde{j}m}$ represent two series of random shocks, $\varepsilon_{jm}, \varepsilon_{\tilde{j}m} \sim N(0, \sigma_{\varepsilon}^{2})$.

Similarly, the representative township official observes the performance of each village party branch member and obtains $P^{\theta}_{jm}$ or $P^{\theta}_{\tilde{j}m}$, the performance on capacity of village party branch member $j$ or $\tilde{j}$ in task $m = 1, 2, \dots, M$, with $M$ sufficiently large. $P^{\theta}_{jm}$ is given by $P^{\theta}_{jm} = \theta_j + \eta_{jm}$, and $P^{\theta}_{\tilde{j}m}$ is given by $P^{\theta}_{\tilde{j}m} = \theta_{\tilde{j}} + \eta_{\tilde{j}m}$, where $\eta_{jm}$ and $\eta_{\tilde{j}m}$ represent two series of random shocks¹⁸, $\eta_{jm}, \eta_{\tilde{j}m} \sim N(0, \sigma_{\eta}^{2})$.

Prior to performance assessment, the representative township official has her own prior perceptions of the virtue and capacity of each village party branch member, whose distribution is discussed below.

ASSUMPTION 5 (Prior distribution of the virtue and capacity of village party branch members): The representative township official’s prior perceptions of the virtue of village party branch member $j$ or $\tilde{j}$ are distributed as $N(\alpha_j^{u}, \sigma_{\alpha u}^{2})$ or $N(\alpha_{\tilde{j}}^{u}, \sigma_{\alpha u}^{2})$, truncated at $[0, 1]$, where $\alpha_j^{u}, \alpha_{\tilde{j}}^{u} \in [0, 1]$.
Her prior perceptions of the capacity of village party branch member 𝑗𝑗𝑘𝑘 or 𝑗𝑗𝑘𝑘� are distributed as 𝑁𝑁�𝜃𝜃𝑗𝑗𝑢𝑢,𝜎𝜎𝜃𝜃𝑢𝑢2 � or 𝑁𝑁�𝜃𝜃?̃?𝚥𝑢𝑢,𝜎𝜎𝜃𝜃𝑢𝑢2 � , truncated at [0,1], where 𝜃𝜃𝑗𝑗𝑢𝑢, 𝜃𝜃?̃?𝚥𝑢𝑢 ∈ [0, 1]. The prior means and the prior variances are known to the representative township official. Therefore, similar to Section 2.3.A., the posterior mean of the virtue of village party branch member 𝑗𝑗 is (2.31) 𝑆𝑆𝑉𝑉𝑉𝑉𝑆𝑆�𝛼𝛼𝑗𝑗� = 𝜎𝜎𝜀𝜀2𝜎𝜎𝜀𝜀2+𝑗𝑗𝜎𝜎𝛼𝛼𝑢𝑢2 𝛼𝛼𝑗𝑗𝑢𝑢 + 𝑗𝑗𝜎𝜎𝛼𝛼𝑢𝑢2𝜎𝜎𝜀𝜀2+𝑗𝑗𝜎𝜎𝛼𝛼𝑢𝑢2 𝛼𝛼𝑗𝑗 , and the posterior mean of the virtue of village party branch member 𝚥𝚥̃ is (2.32) 𝑆𝑆𝑉𝑉𝑉𝑉𝑆𝑆�𝛼𝛼?̃?𝚥� = 𝜎𝜎𝜀𝜀2𝜎𝜎𝜀𝜀2+𝑗𝑗𝜎𝜎𝛼𝛼𝑢𝑢2 𝛼𝛼?̃?𝚥𝑢𝑢 + 𝑗𝑗𝜎𝜎𝛼𝛼𝑢𝑢2𝜎𝜎𝜀𝜀2+𝑗𝑗𝜎𝜎𝛼𝛼𝑢𝑢2 𝛼𝛼?̃?𝚥. The posterior mean of the capacity of village party branch member 𝑗𝑗 is (2.33) 𝑆𝑆𝑉𝑉𝑉𝑉𝑆𝑆�𝜃𝜃𝑗𝑗� = 𝜎𝜎𝜂𝜂2𝜎𝜎𝜂𝜂2+𝑗𝑗𝜎𝜎𝜃𝜃𝑢𝑢2 𝜃𝜃𝑗𝑗𝑢𝑢 + 𝑗𝑗𝜎𝜎𝜃𝜃𝑢𝑢2𝜎𝜎𝜂𝜂2+𝑗𝑗𝜎𝜎𝜃𝜃𝑢𝑢2 𝜃𝜃𝑗𝑗 , and the posterior mean of the capacity of village party branch member 𝚥𝚥̃ is (2.34) 𝑆𝑆𝑉𝑉𝑉𝑉𝑆𝑆�𝜃𝜃?̃?𝚥� = 𝜎𝜎𝜂𝜂2𝜎𝜎𝜂𝜂2+𝑗𝑗𝜎𝜎𝜃𝜃𝑢𝑢2 𝜃𝜃?̃?𝚥𝑢𝑢 + 𝑗𝑗𝜎𝜎𝜃𝜃𝑢𝑢2𝜎𝜎𝜂𝜂2+𝑗𝑗𝜎𝜎𝜃𝜃𝑢𝑢2 𝜃𝜃?̃?𝚥. 18 This follows Jones and Olken (2005), Besley et al. (2011), Yao and Zhang (2015), and Bloom et al. (2015). 25 Similar to Section 2.3.B., the representative township official selects the village party branch member with the highest competence as the village party secretary. Our theory defines 𝜋𝜋𝑗𝑗 , the competence of Type I village party branch member 𝚥𝚥̃ , as a weighted average of the virtue and capacity of village party branch member 𝑗𝑗, such that 𝜋𝜋𝑗𝑗 ≡ 𝜇𝜇𝛼𝛼𝑗𝑗 + (1 − 𝜇𝜇)𝜃𝜃𝑗𝑗 , and defines 𝜋𝜋?̃?𝚥 , the competence of Type II village party branch member 𝚥𝚥̃, as a weighted average of the virtue and capacity of village party branch member 𝚥𝚥̃, such that 𝜋𝜋?̃?𝚥 ≡ 𝜇𝜇𝛼𝛼?̃?𝚥 + (1 − 𝜇𝜇)𝜃𝜃?̃?𝚥. [𝜋𝜋𝑗𝑗]𝐸𝐸𝐸𝐸𝐸𝐸 is the competence of Type I village party branch member 𝑗𝑗, who was elected, and [𝜋𝜋𝑗𝑗]𝐴𝐴𝐴𝐴𝐴𝐴 is the competence of Type I village party branch member 𝑗𝑗, who was appointed. As Type II village party branch members are always appointed, 𝜋𝜋?̃?𝚥 is uncorrelated with the introduction of local direct elections. 
𝜇𝜇 represents the village party secretary’s virtue-capacity spectrum, which is assumed to be the same as the village leader’s virtue-capacity spectrum. This is because (1) both the village leader and the village party secretary deal with village affairs and (2) the representative township official, to safeguard and protect village residents’ rights and interests, is required to select the most competent village leader and the most competent village party secretary (National People’s Congress of China, 2004). Similar to Section 2.3.B., the likelihood of appointing village party branch member 𝑗𝑗𝑘𝑘 as the village party secretary increases linearly with the posterior mean of her competence. Specifically, (2.35) 𝑅𝑅𝑗𝑗 = 𝜇𝜇𝜇𝜇 � 𝜎𝜎𝜀𝜀2𝜎𝜎𝜀𝜀2+𝑗𝑗𝜎𝜎𝛼𝛼𝑢𝑢2 𝛼𝛼𝑗𝑗𝑢𝑢 + 𝑗𝑗𝜎𝜎𝛼𝛼𝑢𝑢2𝜎𝜎𝜀𝜀2+𝑗𝑗𝜎𝜎𝛼𝛼𝑢𝑢2 𝛼𝛼𝑗𝑗� +(1 − 𝜇𝜇)𝜇𝜇 � 𝜎𝜎𝜂𝜂2 𝜎𝜎𝜂𝜂 2+𝑗𝑗𝜎𝜎𝜃𝜃𝑢𝑢 2 𝜃𝜃𝑗𝑗 𝑢𝑢 + 𝑗𝑗𝜎𝜎𝜃𝜃𝑢𝑢2 𝜎𝜎𝜂𝜂 2+𝑗𝑗𝜎𝜎𝜃𝜃𝑢𝑢 2 𝜃𝜃𝑗𝑗� and the likelihood of appointing village party branch member 𝚥𝚥̃ as the village party secretary increases linearly with the posterior mean of her competence. Specifically, (2.36) 𝑅𝑅?̃?𝚥 = 𝜇𝜇𝜇𝜇 � 𝜎𝜎𝜀𝜀2𝜎𝜎𝜀𝜀2+𝑗𝑗𝜎𝜎𝛼𝛼𝑢𝑢2 𝛼𝛼?̃?𝚥𝑢𝑢 + 𝑗𝑗𝜎𝜎𝛼𝛼𝑢𝑢2𝜎𝜎𝜀𝜀2+𝑗𝑗𝜎𝜎𝛼𝛼𝑢𝑢2 𝛼𝛼?̃?𝚥� +(1 − 𝜇𝜇)𝜇𝜇 � 𝜎𝜎𝜂𝜂2 𝜎𝜎𝜂𝜂 2+𝑗𝑗𝜎𝜎𝜃𝜃𝑢𝑢 2 𝜃𝜃?̃?𝚥 𝑢𝑢 + 𝑗𝑗𝜎𝜎𝜃𝜃𝑢𝑢2 𝜎𝜎𝜂𝜂 2+𝑗𝑗𝜎𝜎𝜃𝜃𝑢𝑢 2 𝜃𝜃?̃?𝚥�, where 𝜇𝜇 ∈ [0, 1]. 26 B. Selection of Village Party Secretaries In this section, we discuss how local direct elections facilitate the meritocratic selection of the village party secretary through the improved candidate pool. We find that because the local direct election in a representative village improves the expected competence of village committee members, who are candidates of the village party secretary, the expectation of the competence of the village party secretary is also improved. However, the variance of the competence of the village party secretary is ambiguously changed. Expected Competence. This section calculates [𝜋𝜋]𝑉𝑉𝑉𝑉𝑆𝑆, the expected competence of the village party secretary. It then compares [𝜋𝜋]𝑉𝑉𝑉𝑉𝑆𝑆,𝐸𝐸𝐸𝐸𝐸𝐸 and [𝜋𝜋]𝑉𝑉𝑉𝑉𝑆𝑆,𝐴𝐴𝐴𝐴𝐴𝐴 . 
$[\pi]_{VPS,Ele}$ refers to the expected competence of the village party secretary when some of the candidates, namely the Type I village party branch members, are directly elected by local village residents. $[\pi]_{VPS,App}$ refers to the expected competence of the village party secretary when some of the candidates, namely the Type I village party branch members, are appointed by township officials.

To calculate $[\pi]_{VPS}$, we calculate the weighted average of the competence of all elected (or appointed) Type I village party branch members and that of all Type II village party branch members. As discussed above, $[\pi_j]_{Ele\,(App)}$ is the competence of Type I village party branch member $j$, who was elected (or appointed), and $R_j^{Ele\,(App)}$ is village party branch member $j$’s likelihood of being appointed village party secretary. As discussed earlier, $\pi_{\tilde{j}}$ is not correlated with the introduction of local direct elections. As a result, the expected competence of the village party secretary is

(2.37) $[\pi]_{VPS} \equiv \mathbb{E}(\pi_{j,\tilde{j}}) = \dfrac{\int_0^1 \pi_j R_j \, d\pi_j + \int_0^1 \pi_{\tilde{j}} R_{\tilde{j}} \, d\pi_{\tilde{j}}}{\int_0^1 R_j \, d\pi_j + \int_0^1 R_{\tilde{j}} \, d\pi_{\tilde{j}}}$.

Specifically, we compare

(2.38) $[\pi]_{VPS,Ele} \equiv \mathbb{E}_{Ele}(\pi_{j,\tilde{j}}) = \dfrac{\int_0^1 [\pi_j]_{Ele} R_j^{Ele} \, d\pi_j + \int_0^1 \pi_{\tilde{j}} R_{\tilde{j}} \, d\pi_{\tilde{j}}}{\int_0^1 R_j^{Ele} \, d\pi_j + \int_0^1 R_{\tilde{j}} \, d\pi_{\tilde{j}}}$

and

(2.39) $[\pi]_{VPS,App} \equiv \mathbb{E}_{App}(\pi_{j,\tilde{j}}) = \dfrac{\int_0^1 [\pi_j]_{App} R_j^{App} \, d\pi_j + \int_0^1 \pi_{\tilde{j}} R_{\tilde{j}} \, d\pi_{\tilde{j}}}{\int_0^1 R_j^{App} \, d\pi_j + \int_0^1 R_{\tilde{j}} \, d\pi_{\tilde{j}}}$.

Proposition 4 explains how local direct elections improve the competence of the village party secretary in a representative village through the improved candidate pool.

PROPOSITION 4: After the introduction of local direct elections in a representative village, the expected competence of the village party secretary increases due to the increased expected competence of Type I village party branch members.
Specifically, we have

(2.40) $[\pi]_{VPS,Ele} > [\pi]_{VPS,App}$.

That is, the expected competence of a village party secretary who has been directly elected as a village committee member is greater than that of a village party secretary who has never been elected directly as a village committee member.

Proof: See the Appendix. ∎

Variance of Competence. This section calculates $\mathrm{Var}_{VPS}(\pi)$, the variance of the competence of the village party secretary. It then compares $\mathrm{Var}_{VPS,Ele}(\pi)$ and $\mathrm{Var}_{VPS,App}(\pi)$. $\mathrm{Var}_{VPS,Ele}(\pi)$ refers to the variance of the competence of the village party secretary when some of the candidates, namely the Type I village party branch members, are directly elected by local village residents. $\mathrm{Var}_{VPS,App}(\pi)$ refers to the variance of the competence of the village party secretary when some of the candidates, namely the Type I village party branch members, are appointed by township officials.

By definition, the variance of the competence of the village party secretary in a representative village is

(2.41) $\mathrm{Var}_{VPS}(\pi) = \mathbb{E}\left[(\pi_{j,\tilde{j}})^{2}\right] - \mathbb{E}^{2}\left[\pi_{j,\tilde{j}}\right]$.

Specifically, we compare

(2.42) $\mathrm{Var}_{VPS,Ele}(\pi) = \mathbb{E}_{Ele}\left[(\pi_{j,\tilde{j}})^{2}\right] - \mathbb{E}_{Ele}^{2}\left[\pi_{j,\tilde{j}}\right]$

and

(2.43) $\mathrm{Var}_{VPS,App}(\pi) = \mathbb{E}_{App}\left[(\pi_{j,\tilde{j}})^{2}\right] - \mathbb{E}_{App}^{2}\left[\pi_{j,\tilde{j}}\right]$.

Proposition 5 discusses how local direct elections facilitate the meritocratic selection of the village party secretary because of the improved candidate pool.

PROPOSITION 5: After the introduction of local direct elections in a representative village, the variance of the competence of the village party secretary changes ambiguously.

Proof: See the Appendix. ∎

C. Summary

This section demonstrates that local direct elections facilitate, indirectly and to a limited extent, the meritocratic selection of village party secretaries by improving the candidate pools.
The introduction of local direct elections turns some village party branch members, who are also village committee members, from appointed to elected, and thus improves the expected competence of those village committee members. Therefore, the expected competence of village party branch members is also improved. In this sense, as the candidate pools for village party secretaries are partially improved, the expected competence of village party secretaries also increases. However, the variance of the competence of village party secretaries changes ambiguously.

2.5. Concluding Remarks

To the best of our knowledge, this paper is the first to use local information to address adverse selection in political selection. Two major problems of political economy, both related to information asymmetry, are political selection that suffers from adverse selection and political incentives that suffer from moral hazard. Many studies address moral hazard, either by explaining current institutional arrangements or by designing new mechanisms, to discuss the incentives of politicians (Laffont, 2001; Besley, 2006). However, few studies discuss adverse selection in the selection of politicians (Besley, 2005). By contrast, adverse selection is commonly discussed elsewhere, for example in the literature on job market signaling (Spence, 1973) and product advertising (Milgrom and Roberts, 1986).

This paper’s theory emphasizes the role of local information in addressing adverse selection in political selection. It shows that accumulated local information, quantified by the increasing number of natural communications between political candidates and political decision makers, improves the accuracy of inferences about each political candidate’s virtue and capacity. Specifically, as the number of occurrences of natural communication increases, political decision makers infer each political candidate’s virtue and capacity more accurately and precisely.
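As a rough numerical illustration of this mechanism, the sketch below assumes a normal prior and normally distributed signals (one signal per natural communication), and assumes that the decision maker simply picks the candidate with the highest posterior mean. The function names, the number of candidates, and all parameter values are hypothetical choices made for illustration; they are not taken from the model or its appendix.

```python
import random
import statistics

def posterior(prior_mean, prior_var, noise_var, signals):
    """Normal-normal update: posterior mean and variance after observing signals."""
    n = len(signals)
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + sum(signals) / noise_var)
    return post_mean, post_var

def simulate_selection(n_signals, n_candidates=5, n_villages=5000,
                       prior_mean=0.5, prior_var=1.0 / 12.0,
                       noise_sd=0.5, seed=0):
    """Mean and variance of the true competence of the selected leader."""
    rng = random.Random(seed)
    selected = []
    for _ in range(n_villages):
        best_truth, best_post = None, float("-inf")
        for _ in range(n_candidates):
            truth = rng.random()  # assumed: true competence uniform on [0, 1]
            signals = [truth + rng.gauss(0.0, noise_sd) for _ in range(n_signals)]
            post_mean, _ = posterior(prior_mean, prior_var, noise_sd ** 2, signals)
            if post_mean > best_post:
                best_truth, best_post = truth, post_mean
        selected.append(best_truth)
    return statistics.mean(selected), statistics.variance(selected)

# Few signals stands in for a distant appointer, many signals for a local voter.
few_mean, few_var = simulate_selection(n_signals=2)
many_mean, many_var = simulate_selection(n_signals=50)
print(f"few signals:  mean={few_mean:.3f}, variance={few_var:.3f}")
print(f"many signals: mean={many_mean:.3f}, variance={many_var:.3f}")
```

Under these assumed settings, the decision maker with more signals should select leaders whose true competence is higher on average and less dispersed, which is the qualitative pattern the theory describes. The argmax selection rule is a simplification chosen for brevity.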
To frame the theory, this paper uses the Bayesian inference framework rather than the game theory framework, because it focuses on how accumulating local information influences the effectiveness of inference rather than on the strategic behaviors of political candidates and decision makers.

The essential mechanism through which local direct elections support the meritocratic selection of politicians is the local information on political candidates that local direct elections provide. In the context of Chinese local governance, by providing local information on each village committee candidate’s virtue and capacity, the introduction of local direct elections enhances the expected competence of each village committee member and reduces its variance. Further, the expected competence of the village party secretary is also enhanced indirectly. In this sense, one criterion for evaluating the effect of an institutional arrangement on meritocratic selection is whether the arrangement provides sufficient local information to infer political candidates’ virtue and capacity.

This paper’s theory, although discussed in the context of Chinese local governance, has general implications. The essential scenario that the theory addresses is that of small grassroots-level groups in a stratified governance structure, whether in a rural or urban area and whether in the public or private sector. In this scenario, the leader of each small group is locally and directly elected and then has the opportunity to be promoted upward based on her performance. A typical example is an organization with multiple departments, in which the head of each department is directly elected within that department; those heads are likely to be promoted to the board of the organization based on their performance.
In this process, the local information on each head candidate’s virtue and capacity is aggregated, facilitating the meritocratic selection of each head and of the board directors.

Several limitations and extensions should be noted. (1) This paper only discusses the unilateral inference about local leader candidates by the decision makers (local voters or upper officials), and it finds that local direct elections support meritocratic selection. However, once bilateral inference is considered, side effects may accompany local direct elections. For instance, local leader candidates may find it easier to bribe or conspire with local residents than with upper officials, because they communicate naturally with residents more often. (2) The perceptions that local voters or upper officials hold of each local leader candidate are assumed to be homogeneous. A future study could discuss how various distributions of these perceptions influence political selection. (3) Our theory does not consider whether the meritocratic selection facilitated by local direct elections yields an equilibrium in which village committee members, village party secretaries, village residents, and township officials all obtain optimized gains. Future studies could discuss the existence of such an equilibrium with either unilateral or bilateral inference.

Figure 2.1. Inference Accuracy and Precision with Natural Communication Times

3. Lagged Variables as Instruments¹

Yu Wang² and Marc Bellemare³

3.1. Introduction

To address endogeneity concerns in empirical studies with observational data, it is common to use a lagged endogenous variable as the instrumental variable (IV) in estimation. This strategy, which we call the “lagged IV” method, is popular among applied researchers because it requires no other variables as IVs, which are usually difficult to find.
Admittedly, lagged variables may not be proper IVs because they are not exogenous; still, it is often argued that lagged variables could at least alleviate endogeneity to some extent (Anderson and Hsiao, 1981; Todd and Wolpin, 2003). However, few formal theoretical analyses have examined whether the lagged IV method reduces the threat of endogeneity. Applied researchers have few theoretical references on the conditions under which the lagged IV method alleviates the endogeneity problem, or on the conditions under which it even aggravates that problem.

¹ We thank Jay Coggins, John Freeman, Paul Glewwe and Steve Miller for valuable comments and suggestions. All errors are the authors’.
² Wang: Corresponding Author, Department of Applied Economics, University of Minnesota, email: wang5979@umn.edu.
³ Bellemare: Department of Applied Economics, University of Minnesota, email: mbellema@umn.edu.

In this paper, we provide a theoretical argument on the validity of the lagged IV strategy as a response to endogeneity concerns, together with simulation results that support our findings. We find that when the lagged IV has no direct causal impact on the explained variable or on the unobserved confounder, it violates only the independence assumption, and not the exclusion restriction, as stated in the Local Average Treatment Effect (LATE) Theorem. In this case, the LATE in the lagged IV estimation consists only of the “restricted local Average Treatment Effect on the Treated (restricted local ATT)”. By comparison, the Average Treatment Effect (ATE) in the OLS estimation consists of both the “Average Treatment Effect on the Treated (ATT)” and the selection bias. As a result, when the lagged IV violates only the independence assumption, its estimate could suffer from less endogeneity than the OLS estimate; in other words, the lagged IV method could mitigate the endogeneity.
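The comparison between OLS and the lagged IV estimator can be explored with a small simulation. The sketch below is an assumption-laden illustration rather than the paper's own simulation design: it generates one long series in which a serially correlated confounder drives both the regressor and the outcome, then computes the OLS slope and the simple IV slope that uses the lagged regressor as the instrument. All parameter values and the argument names `direct_effect_on_y` and `effect_on_confounder` are hypothetical; setting both to zero corresponds to the case where the lag has no direct causal impact on the explained variable or the confounder.

```python
import random

def cov(a, b):
    """Sample covariance of two equally long lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / len(a)

def simulate(beta=1.0, delta=1.0, rho=0.5, kappa=1.0, phi=0.5,
             direct_effect_on_y=0.0, effect_on_confounder=0.0,
             n_periods=200_000, burn_in=500, seed=0):
    """Return (OLS estimate, lagged IV estimate) of beta from one long series."""
    rng = random.Random(seed)
    x_prev, u_prev = 0.0, 0.0
    xs, lags, ys = [], [], []
    for t in range(n_periods + burn_in):
        # Confounder: serially correlated, optionally driven by the lagged regressor.
        u = phi * u_prev + effect_on_confounder * x_prev + rng.gauss(0.0, 1.0)
        # Regressor: depends on its own lag and on the current confounder.
        x = rho * x_prev + kappa * u + rng.gauss(0.0, 1.0)
        # Outcome: the confounder u is omitted from both estimators below.
        y = beta * x + direct_effect_on_y * x_prev + delta * u + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            xs.append(x)
            lags.append(x_prev)
            ys.append(y)
        x_prev, u_prev = x, u
    ols = cov(xs, ys) / cov(xs, xs)
    lagged_iv = cov(lags, ys) / cov(lags, xs)
    return ols, lagged_iv

ols_est, iv_est = simulate()
print(f"true beta = 1.0, OLS = {ols_est:.3f}, lagged IV = {iv_est:.3f}")
```

With these particular values, the lagged IV estimate should land closer to the true coefficient than the OLS estimate, although neither recovers it. The ordering is not general: other parameter values, for instance a confounder that is much more persistent than the regressor, can reverse it.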
However, when the lagged IV has a direct causal impact on the explained variable, on the unobserved confounder, or on both, it violates not only the independence assumption but also the exclusion restriction. In these three cases, the LATE in the lagged IV estimation consists of both the “relaxed local Average Treatment Effect on the Treated (relaxed local ATT)” and the “local selection bias”. By comparison, the ATE in the OLS estimation consists of both the ATT and the selection bias. As a result, when the lagged IV violates both the independence assumption and the exclusion restriction, its estimate could suffer from more endogeneity than the OLS estimate, and thus the lagged IV method may aggravate rather than mitigate the endogeneity.

We set up a structural model to compare the ATE in the OLS estimate and the LATE in the lagged IV estimate both qualitatively and quantitatively. In this model, the explained variable is determined by an explanatory variable, an unobserved confounder, and possibly the lagged explanatory variable. The explanatory variable is determined by its first-order lag and by the unobserved confounder; in addition, the unobserved confounder has a positive serial correlation and may also be influenced by the lagged explanatory variable.

With this model, we discuss four scenarios of causal relationships. In Scenario 1, the lagged explanatory variable has no direct causal effect on the explained variable or on the unobserved confounder. Therefore, Scenario 1 only violates the independence assumption. In Scenario 2, the lagged explanatory variable has a direct causal effect on the explained variable; in Scenario 3, the lagged explanatory variable has a direct causal effect on the unobserved confounder; and in Scenario 4, the lagged explanatory variable has a direct causal effect on both.
Therefore, Scenarios 2, 3 and 4 violate both the independence assumption and the exclusion restriction.

In line with our theoretical framework, our numerical analysis and simulation results show that in Scenario 1, (1) both the OLS estimate and the lagged IV estimate are biased, and the bias of the lagged IV estimate is smaller than that of the OLS estimate. (2) Both the OLS estimate and the lagged IV estimate are consistent. (3) The greater the extent to which the independence assumption is violated, the larger the bias of the lagged IV estimate. (4) The root mean squared errors (RMSEs) show patterns similar to those of the biases. (5) The likelihood that the lagged IV estimate suffers from a Type I error is very high, close to 1. In short, when only the independence assumption is violated, the lagged IV method is acceptable in that its estimate is consistent and less biased than the OLS estimate, yet it remains problematic because of the high likelihood of a Type I error.

In Scenarios 2, 3 and 4, our numerical analysis and simulation results show that (1) both the OLS estimate and the lagged IV estimate are biased, and the bias of the lagged IV estimate is smaller than that of the OLS estimate. (2) Both the OLS estimate and the lagged IV estimate are inconsistent. In Scenarios 2 and 4, the lagged IV estimate is more inconsistent than the OLS estimate; in Scenario 3, it is ambiguous whether the lagged IV estimate is more inconsistent than the OLS estimate. (3) The greater the extent to which the exclusion restriction is violated, the larger the bias of the lagged IV estimate. (4) The RMSEs show patterns similar to those of the biases. (5) The likelihood that the lagged IV estimate suffers from a Type I error is very high, close to 1.
In short, when both the independence assumption and the exclusion restriction are violated, the lagged IV method is unacceptable: its estimate is inconsistent, it could even aggravate the endogeneity by enlarging the bias in the estimate, and it is problematic because of the high likelihood of a Type I error.

Blundell and Bond (1998, 2000) argue that, since the lagged explanatory variable is weakly correlated with the first difference of the endogenous explanatory variable, the GMM method using lagged explanatory variables may not solve endogeneity. Instead of discussing the GMM method only, our analysis covers more general types of endogeneity, focusing on the use of the first-order lagged explanatory variable as the single IV in estimation, as is commonly done in empirical studies. Rossi (2014) also argues against using the lagged explanatory variable as the IV. Our results, based on mathematical proof and simulation, are consistent with the previous literature. For applied researchers in the social sciences, our findings imply that the lagged IV method cannot mitigate, and may even aggravate, endogeneity. Our analysis also contributes to the literature on credible causal inference based on the LATE Theorem in instrumental variable estimation (Angrist et al., 1996; Imbens, 2014).

To see how common the lagged IV method is in practice, we examine all articles published in the top general academic journals in economics and political science. We identify the articles that use the lagged IV method by searching the text of each paper for keywords such as “lag”, “lagged” or “lagging”, and then determining whether any endogenous variable in those papers is instrumented with its own lag. In these papers, lagged explanatory variables are used as instrumental variables either as the main method to address endogeneity or as a robustness check for the baseline estimation result.
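The first, mechanical step of such a keyword screen could look like the sketch below. This is only a hypothetical illustration: the paper titles and texts are made up, the function names are not from any tool used by the authors, and the second step (reading each flagged paper to check whether a lag actually serves as an instrument) is a manual judgment that the code does not attempt.

```python
import re

# Matches the whole words "lag", "lagged" or "lagging", case-insensitively.
LAG_PATTERN = re.compile(r"\blag(?:ged|ging)?\b", re.IGNORECASE)

def mentions_lag(text):
    """Return True if the text contains one of the lag keywords as a whole word."""
    return LAG_PATTERN.search(text) is not None

def screen(papers):
    """Return the titles of papers whose text mentions a lag keyword."""
    return [title for title, text in papers.items() if mentions_lag(text)]

# Made-up examples for illustration only.
papers = {
    "Paper A": "We instrument the regressor with its lagged value.",
    "Paper B": "Flagging outliers does not change the results.",
    "Paper C": "Standard errors are clustered by state.",
}
candidates = screen(papers)
print(candidates)
```

The word-boundary anchors keep words such as "flagging" from being counted as hits, so only papers that genuinely mention a lag move on to manual review.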
Table 1 shows the number of papers using the lagged IV method published between 2013 and 2018 in economics journals, including American Economic Review, Econometrica, Journal of Political Economy, Quarterly Journal of Economics, Review of Economic Studies and Review of Economics and Statistics, and in political science journals, including American Political Science Review, American Journal of Political Science, British Journal of Political Science, Comparative Political Studies and Journal of Politics. In total, we find 31 papers in 2013-2018 using the lagged IV method, of which 19 are in economics and 12 are in political science. Narrowing to 2015-2018, 15 papers use the lagged IV method, of which 9 are in economics and 6 are in political science.

These papers use first-order lags, or first-order lags together with higher-order lags, of the explanatory variable as instrumental variables to alleviate endogeneity concerns, which, in most papers, are attributed to unobserved confounders. Most papers mention that the availability of data on the lagged explanatory variable is one of the key reasons why it is used as the IV; however, they do not discuss in detail the difference between the bias of the lagged IV method and that of OLS. More importantly, they seldom discuss whether the lagged explanatory variable has an explicit direct causal effect on the explained variable, namely whether the lagged IV violates the exclusion restriction, which makes the estimation bias of the lagged IV method more questionable.

This literature review shows that the lagged IV method is commonly used in economics and political science research. The authors of those papers believe that, although the lagged IV method may only mitigate the endogeneity, the lagged endogenous variable is a somewhat valid instrumental variable because it is at least exogenous to some extent and satisfies the relevance restriction.
Our analysis examines whether, compared with OLS, the lagged IV method lowers the estimation bias under specific parameter values or instead exaggerates it.

The rest of this paper is organized as follows. Section 3.2 discusses the theoretical framework. Section 3.3 derives the numerical analysis in light of the theoretical framework. Section 3.4 presents simulation results that support our numerical analysis. Section 3.5 summarizes.

3.2. Theoretical Framework

This section compares the endogeneity in lagged IV estimation with that in OLS estimation qualitatively, by deriving the local average treatment effect (LATE) in lagged IV estimation and the average treatment effect (ATE) in OLS estimation. In light of the LATE theorem (Angrist and Pischke, 2009), this section finds that, due to the synchronous relationship between the lagged IV and the unobserved confounder, the lagged IV estimation violates the independence assumption. As a result, the lagged IV estimate suffers from endogeneity, and it is ambiguous whether the lagged IV estimate has less endogeneity than the OLS estimate. This section also finds that if the lagged IV causally influences the explained variable not only through the explanatory variable but also through the unobserved confounders, the lagged IV violates the exclusion restriction in addition to the independence assumption. As a result, the lagged IV estimate suffers from an explicitly greater extent of endogeneity than the OLS estimate.

A. Setup

Three sources of endogeneity in identification exist: unobserved confounders that influence both the explanatory variable and the explained variable, measurement errors in the explanatory variable, and reverse causality between the explanatory variable and the explained variable.
In empirical studies, the first source is the most common: without randomized treatment, unobserved factors usually influence both the explained variable and the explanatory variable (Stock and Trebbi, 2003; Angrist and Krueger, 2001).

Empirically, there are three reasons why the lagged explanatory variable may serve as a valid IV. For the relevance restriction, autocorrelation in the explanatory variable implies that the endogenous variable is, to some extent, correlated with its lag. For the exclusion restriction, if theoretically no causal relationship exists between the lagged explanatory variable and the explained variable, the lagged explanatory variable is held to be highly likely to be an exogenous IV. For data availability, the lagged IV method requires no other data and is convenient in panel data sets, especially with the increasing availability of long panel data sets.

However, in the case of unobserved confounders, if autocorrelation exists both in the explanatory variable and in the unobserved confounders, the lagged explanatory variable could be correlated, through the lagged unobserved confounders, with the unobserved confounders in the current period, leading to biased estimates. To explain this, suppose the structural model is

$Y_{it} = \beta X_{it} + \xi X_{i,t-1} + \delta U_{it} + \epsilon_{it}$ (3.1)

where $Y_{it}$, $X_{it}$, $X_{i,t-1}$ and $U_{it}$ represent the explained variable, the explanatory variable, the lagged explanatory variable and the unobserved confounder, respectively, and $\mathrm{Cov}(X_{it}, U_{it}) \neq 0$, so that there is indeed an identification problem. If $\xi \neq 0$, the lagged explanatory variable has a direct impact on the explained variable; otherwise such a lagged impact does not exist.
The autoregressive process of the explanatory variable is

$$X_{it} = \rho X_{i,t-1} + \kappa U_{it} + \eta_{it} \tag{3.2}$$

The autoregressive process of the unobserved confounder is

$$U_{it} = \phi U_{i,t-1} + \psi X_{i,t-1} + \nu_{it} \tag{3.3}$$

If $\psi \neq 0$, the lagged explanatory variable has a direct impact on the unobserved confounder; otherwise such an impact does not exist. Therefore, we have four scenarios of endogeneity in the dynamic causal relationship framework:

Scenario 1: $\xi = 0$ and $\psi = 0$. In this scenario, the lagged explanatory variable has no direct impact on the explained variable, nor does it have any direct impact on the unobserved confounder.

Scenario 2: $\xi \neq 0$ while $\psi = 0$. In this scenario, the lagged explanatory variable has a direct impact on the explained variable, but has no direct impact on the unobserved confounder.

Scenario 3: $\xi = 0$ while $\psi \neq 0$. In this scenario, the lagged explanatory variable has no direct impact on the explained variable, but has a direct impact on the unobserved confounder.

Scenario 4: $\xi \neq 0$ and $\psi \neq 0$. In this scenario, the lagged explanatory variable has a direct impact on the explained variable, and also has a direct impact on the unobserved confounder.

In light of the LATE theorem (Angrist and Pischke, 2009), we discuss the local average treatment effect of the lagged IV method. For simplicity and without loss of generality, we assume a binary explanatory variable taking the value 1 or 0. Denote by $Y_{it}(x, \tilde{x})$ individual $i$'s latent outcome when its treatment is $X_{it} = x$ and its lagged treatment, the lagged IV, is $X_{i,t-1} = \tilde{x}$. To specify the heterogeneous causal effect of the lagged IV, we denote by $X_{1it}$ individual $i$'s latent treatment state when $X_{i,t-1} = 1$, and by $X_{0it}$ individual $i$'s latent treatment state when $X_{i,t-1} = 0$.
Therefore, the observed treatment state is expressed in terms of the latent states as

$$X_{it} = X_{0it} + (X_{1it} - X_{0it}) X_{i,t-1} \tag{3.4}$$

in which only one of $X_{1it}$ and $X_{0it}$ is observed, and $(X_{1it} - X_{0it})$ represents the heterogeneous causal effect of $X_{i,t-1}$. With this notation, we state the independence assumption and the exclusion restriction of the lagged IV as follows:

1. The independence assumption implies that the instrumental variable should have no association with the latent outcome, nor should it have any association with the latent treatment state. Specifically, we have

$$\left[\{Y_{it}(x, \tilde{x});\ \forall x, \tilde{x}\},\ X_{1it},\ X_{0it}\right] \perp\!\!\!\perp X_{i,t-1} \tag{3.5}$$

This implies that the lagged IV should behave as a random assignment does. In other words, the lagged IV should be uncorrelated with the latent outcomes of the explained variable and with the latent treatment states of the explanatory variable.

Scenario 1 violates the independence assumption, because the lagged IV is synchronously correlated with the unobserved confounder. Specifically, because $U_{i,t-1}$ causally influences $U_{it}$ with marginal effect $\phi$, and causally influences $X_{i,t-1}$ with marginal effect $\kappa$, $X_{i,t-1}$ and $U_{it}$ move together. In other words, as $X_{i,t-1}$ changes, $U_{it}$ changes not causally but synchronously, and this in turn changes $Y_{it}$: as $X_{i,t-1}$ changes by one unit, $U_{it}$ changes synchronously by $\phi/\kappa$ units. As a result, $X_{i,t-1}$ violates the independence assumption because it does not serve as a random exogenous shock. The independence assumption can be satisfied only by assuming that the unobserved confounders have no dynamics, which is almost never plausible. This implies that it is almost unavoidable for the lagged IV to be problematic.

2. The exclusion restriction implies that $Y_{it}(x, \tilde{x})$ is a function of $x$ only; in other words, the lagged IV influences the explained variable only through the explanatory variable.
This is denoted as

$$Y_{it}(x, 0) = Y_{it}(x, 1), \quad x = 0, 1 \tag{3.6}$$

In Scenario 2, since $\xi \neq 0$, $X_{i,t-1}$ has a direct causal influence on $Y_{it}$ with marginal effect $\xi$. In Scenario 3, although $\xi = 0$, since $\psi \neq 0$, $X_{i,t-1}$ has a causal influence on $Y_{it}$ through $U_{it}$ with marginal effect $\delta\psi$, as follows from (3.1) and (3.3). As a result, both Scenario 2 and Scenario 3 violate not only the independence assumption, but also the exclusion restriction. The same is true for Scenario 4, which combines Scenarios 2 and 3. Because in Scenario 3 $X_{i,t-1}$ has a direct impact on $U_{it}$, which could include more than one unobserved covariate, $X_{i,t-1}$ could influence $Y_{it}$ through more than one causal path. Accordingly, it is difficult to rule out Scenario 3, which makes the violation of the exclusion restriction almost inevitable.

B. The LATE in Lagged IV and the ATE in OLS

To compare the endogeneity in lagged IV with that in OLS, we first discuss the average treatment effect (ATE) in OLS:

$$\begin{aligned}
\mathbb{E}[Y_{it} \mid X_{it} = 1] - \mathbb{E}[Y_{it} \mid X_{it} = 0]
&= \mathbb{E}[Y_{1it} \mid X_{it} = 1] - \mathbb{E}[Y_{0it} \mid X_{it} = 1] + \mathbb{E}[Y_{0it} \mid X_{it} = 1] - \mathbb{E}[Y_{0it} \mid X_{it} = 0] \\
&= \mathbb{E}[Y_{1it} - Y_{0it} \mid X_{it} = 1] + \mathbb{E}[Y_{0it} \mid X_{it} = 1] - \mathbb{E}[Y_{0it} \mid X_{it} = 0]
\end{aligned} \tag{3.7}$$

where $\mathbb{E}[Y_{1it} - Y_{0it} \mid X_{it} = 1]$ is the average treatment effect on the treated (ATT), the causal effect that we are interested in, and $\mathbb{E}[Y_{0it} \mid X_{it} = 1] - \mathbb{E}[Y_{0it} \mid X_{it} = 0]$ is the selection bias, the source of the endogeneity from which the OLS estimate suffers.

In lagged IV estimation, the local average treatment effect (LATE), in light of Angrist and Pischke (2009), is

$$\frac{\mathbb{E}[Y_{it} \mid X_{i,t-1} = 1] - \mathbb{E}[Y_{it} \mid X_{i,t-1} = 0]}{\mathbb{E}[X_{it} \mid X_{i,t-1} = 1] - \mathbb{E}[X_{it} \mid X_{i,t-1} = 0]} = \mathbb{E}[Y_{1it} - Y_{0it} \mid X_{1it} > X_{0it}]$$

when both the exclusion restriction and the independence assumption are satisfied. Here the LATE is the causal effect that we are interested in.

1.
The LATE in Scenario 1: In Scenario 1, where only the independence assumption is violated, we have

$$\mathbb{E}[Y_{it} \mid X_{i,t-1} = 1] = \mathbb{E}\left[Y_{it}(0, X_{i,t-1}) + \left(Y_{it}(1, X_{i,t-1}) - Y_{it}(0, X_{i,t-1})\right) X_{it} \,\middle|\, X_{i,t-1} = 1\right] \tag{3.8}$$

Because the exclusion restriction is satisfied, we have $Y_{it}(0, X_{i,t-1}) = Y_{0it}$ and $Y_{it}(1, X_{i,t-1}) = Y_{1it}$. Therefore,

$$\begin{aligned}
\mathbb{E}[Y_{it} \mid X_{i,t-1} = 1]
&= \mathbb{E}[Y_{0it} + (Y_{1it} - Y_{0it}) X_{1it} \mid X_{i,t-1} = 1] \\
&= \mathbb{E}[Y_{0it} \mid X_{i,t-1} = 1] + \mathbb{E}[(Y_{1it} - Y_{0it}) X_{1it} \mid X_{i,t-1} = 1]
\end{aligned} \tag{3.9}$$

Similarly, we have

$$\begin{aligned}
\mathbb{E}[Y_{it} \mid X_{i,t-1} = 0]
&= \mathbb{E}\left[Y_{it}(0, X_{i,t-1}) + \left(Y_{it}(1, X_{i,t-1}) - Y_{it}(0, X_{i,t-1})\right) X_{it} \,\middle|\, X_{i,t-1} = 0\right] \\
&= \mathbb{E}[Y_{0it} + (Y_{1it} - Y_{0it}) X_{0it} \mid X_{i,t-1} = 0] \\
&= \mathbb{E}[Y_{0it} \mid X_{i,t-1} = 0] + \mathbb{E}[(Y_{1it} - Y_{0it}) X_{0it} \mid X_{i,t-1} = 0]
\end{aligned} \tag{3.10}$$

As the exclusion restriction is satisfied, we have

$$\mathbb{E}[Y_{0it} \mid X_{i,t-1} = 1] = \mathbb{E}[Y_{0it} \mid X_{i,t-1} = 0] \tag{3.11}$$

Therefore, the LATE becomes

$$\frac{\mathbb{E}[Y_{it} \mid X_{i,t-1} = 1] - \mathbb{E}[Y_{it} \mid X_{i,t-1} = 0]}{\mathbb{E}[X_{it} \mid X_{i,t-1} = 1] - \mathbb{E}[X_{it} \mid X_{i,t-1} = 0]} = \frac{\mathbb{E}[(Y_{1it} - Y_{0it}) X_{1it} \mid X_{i,t-1} = 1] - \mathbb{E}[(Y_{1it} - Y_{0it}) X_{0it} \mid X_{i,t-1} = 0]}{\mathbb{E}[X_{1it} \mid X_{i,t-1} = 1] - \mathbb{E}[X_{0it} \mid X_{i,t-1} = 0]} \tag{3.12}$$

which is the “restricted local ATT”. As a result, in Scenario 1, the inconsistency of the lagged IV estimate is due to the “restricted local ATT” in the lagged IV.

Compared with the ATE in the OLS estimation, the LATE in Scenario 1 of the lagged IV estimation does not include a selection bias, implying that the extent of endogeneity in Scenario 1 of the lagged IV estimation is smaller than that of the OLS estimation. Moreover, the LATE in Scenario 1 of the lagged IV estimation still carries some endogeneity.
This is because of (1) the lagged IV's dependence on the latent treatment, scaled by $\rho$, the marginal causal effect of the lagged IV on the treatment variable, and (2) the lagged IV's dependence on the latent outcome, scaled by $\phi/\kappa$, the synchronous relationship between the lagged IV and the unobserved confounder. Because the unobserved confounder's marginal causal effect on the outcome variable is $\delta$, we can initially predict that the key parameters for the extent of endogeneity in Scenario 1 of the lagged IV estimation are $\rho$, $\phi$, $\kappa$ and $\delta$.

2. The LATE in Scenarios 2, 3 and 4: In Scenarios 2, 3 and 4, both the exclusion restriction and the independence assumption are violated. To derive the LATE,

$$\frac{\mathbb{E}[Y_{it} \mid X_{i,t-1} = 1] - \mathbb{E}[Y_{it} \mid X_{i,t-1} = 0]}{\mathbb{E}[X_{it} \mid X_{i,t-1} = 1] - \mathbb{E}[X_{it} \mid X_{i,t-1} = 0]},$$

we have

$$\begin{aligned}
\mathbb{E}[Y_{it} \mid X_{i,t-1} = 1]
&= \mathbb{E}\left[Y_{it}(0, X_{i,t-1}) + \left(Y_{it}(1, X_{i,t-1}) - Y_{it}(0, X_{i,t-1})\right) X_{it} \,\middle|\, X_{i,t-1} = 1\right] \\
&= \mathbb{E}[Y_{it}(0, X_{i,t-1}) \mid X_{i,t-1} = 1] + \mathbb{E}\left[\left(Y_{it}(1, X_{i,t-1}) - Y_{it}(0, X_{i,t-1})\right) X_{it} \,\middle|\, X_{i,t-1} = 1\right]
\end{aligned} \tag{3.13}$$

and similarly,

$$\begin{aligned}
\mathbb{E}[Y_{it} \mid X_{i,t-1} = 0]
&= \mathbb{E}\left[Y_{it}(0, X_{i,t-1}) + \left(Y_{it}(1, X_{i,t-1}) - Y_{it}(0, X_{i,t-1})\right) X_{it} \,\middle|\, X_{i,t-1} = 0\right] \\
&= \mathbb{E}[Y_{it}(0, X_{i,t-1}) \mid X_{i,t-1} = 0] + \mathbb{E}\left[\left(Y_{it}(1, X_{i,t-1}) - Y_{it}(0, X_{i,t-1})\right) X_{it} \,\middle|\, X_{i,t-1} = 0\right]
\end{aligned} \tag{3.14}$$

Therefore, the LATE becomes the sum of the “relaxed local ATT” in the lagged IV, that is,

$$\frac{\mathbb{E}\left[\left(Y_{it}(1, X_{i,t-1}) - Y_{it}(0, X_{i,t-1})\right) X_{it} \,\middle|\, X_{i,t-1} = 1\right] - \mathbb{E}\left[\left(Y_{it}(1, X_{i,t-1}) - Y_{it}(0, X_{i,t-1})\right) X_{it} \,\middle|\, X_{i,t-1} = 0\right]}{\mathbb{E}[X_{it} \mid X_{i,t-1} = 1] - \mathbb{E}[X_{it} \mid X_{i,t-1} = 0]}$$

and the “local selection bias” in the lagged IV, that is,

$$\frac{\mathbb{E}[Y_{it}(0, X_{i,t-1}) \mid X_{i,t-1} = 1] - \mathbb{E}[Y_{it}(0, X_{i,t-1}) \mid X_{i,t-1} = 0]}{\mathbb{E}[X_{it} \mid X_{i,t-1} = 1] - \mathbb{E}[X_{it} \mid X_{i,t-1} = 0]}.$$

As a result, in Scenarios 2, 3 and 4, the inconsistency of the lagged IV estimates is due to the “local selection bias” and the “relaxed local ATT” in the lagged IV.
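The decomposition can be written in a single line by subtracting (3.14) from (3.13) and dividing by the first-stage difference. The display below is only a compact restatement of those two equations; the shorthand $Z \equiv X_{i,t-1}$, $\Delta_{it}(Z)$ and $\Delta_X$ is introduced here for brevity and is not the notation of the text.

```latex
\begin{aligned}
\Delta_{it}(Z) &\equiv Y_{it}(1,Z) - Y_{it}(0,Z), \qquad
\Delta_X \equiv \mathbb{E}[X_{it}\mid Z=1] - \mathbb{E}[X_{it}\mid Z=0], \\[4pt]
\frac{\mathbb{E}[Y_{it}\mid Z=1] - \mathbb{E}[Y_{it}\mid Z=0]}{\Delta_X}
&= \underbrace{\frac{\mathbb{E}[\Delta_{it}(Z)\,X_{it}\mid Z=1] - \mathbb{E}[\Delta_{it}(Z)\,X_{it}\mid Z=0]}{\Delta_X}}_{\text{relaxed local ATT}}
 + \underbrace{\frac{\mathbb{E}[Y_{it}(0,Z)\mid Z=1] - \mathbb{E}[Y_{it}(0,Z)\mid Z=0]}{\Delta_X}}_{\text{local selection bias}} .
\end{aligned}
```

When the exclusion restriction holds, $Y_{it}(0,Z) = Y_{0it}$ and $\Delta_{it}(Z) = Y_{1it} - Y_{0it}$; if (3.11) also holds, the second term vanishes and the first term reduces to the expression in (3.12).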
Compared with the ATE in the OLS estimation, the LATEs in Scenarios 2, 3 and 4 of the lagged IV estimation include a “local selection bias”, which could be greater than the selection bias in the OLS estimation. Moreover, the LATEs in Scenarios 2, 3 and 4 of the lagged IV estimation also include the “relaxed local ATTs”, which differ from the “restricted local ATT” in Scenario 1. These imply that the extent of inconsistency of the estimates in Scenarios 2, 3 and 4 is greater than that in Scenario 1, and could be greater than that in OLS.

To sum up, the OLS estimate suffers from endogeneity because it has selection bias in its ATE. When the lagged IV estimate violates only the independence assumption, it suffers from endogeneity because the “restricted local ATT” in its LATE differs from the ATT in the OLS estimate's ATE. When the lagged IV estimate violates both the exclusion restriction and the independence assumption, it suffers from endogeneity because, on the one hand, it has the “local selection bias” in its LATE and, on the other, the “relaxed local ATT” in its LATE differs from the ATT in the OLS estimate's ATE.

3.3. Numerical Analysis

The theoretical framework demonstrates why using the lagged explanatory variable as the IV is unlikely to mitigate the endogeneity problem. In this section, we characterize the LATE of lagged IV estimates quantitatively and compare it with the ATE of OLS estimates. The numerical results are consistent with the findings of the theoretical framework. For simplicity, we set up a bivariate regression and assume an $AR(1)$ process in the data generating process both for the endogenous explanatory variable and for the unobserved confounder.

A. LATE and ATE

Scenario 1: We first quantitatively discuss the LATE in Scenario 1, which violates only the independence assumption but not the exclusion restriction. Following Bellemare et al. (2017), we consider the following model:

$$Y_{it} = \beta X_{it} + \delta U_{it} + \epsilon_{it} \tag{3.15}$$

$$X_{it} = \rho X_{i,t-1} + \kappa U_{it} + \eta_{it} \tag{3.16}$$

$$U_{it} = \phi U_{i,t-1} + \nu_{it} \tag{3.17}$$

where $i$ and $t$ index units and time, respectively, with $i = 1, 2, \ldots, I$ and $t = 1, 2, \ldots, T$. For simplicity, we drop $i$ for the remainder of this section. $Y_t$ is the main explained variable, and $X_t$ represents the explanatory variable. Since $U_t$, the unobserved confounder, is omitted from the OLS estimation, the estimation suffers from endogeneity. The $AR(1)$ process implies that $X_t$ is determined both by its lagged value and by the unobserved confounder, and that $U_t$ is determined by its first-order lag. For the coefficients we assume that $\rho, \phi \in (0, 1)$; for the random errors we assume that $\eta_t \sim N(0, \sigma_\eta^2)$, $\epsilon_t \sim N(0, \sigma_\epsilon^2)$ and $\nu_t \sim N(0, \sigma_\nu^2)$.

It is well known that without an unobserved confounder, OLS yields consistent estimates. However, given that the unobserved confounder exists and is omitted from the regression, OLS yields

$$\hat{\beta}_{OLS} = \frac{\operatorname{Cov}(X_t, Y_t)}{\operatorname{Var}(X_t)} = \frac{\operatorname{Cov}(X_t, \beta X_t + \delta U_t + \epsilon_t)}{\operatorname{Var}(X_t)} = \beta + \frac{\delta \operatorname{Cov}(X_t, U_t)}{\operatorname{Var}(X_t)} \tag{3.18}$$

Therefore, (3.18) implies that in Scenario 1 the OLS estimate is biased, and the bias, $\delta \operatorname{Cov}(X_t, U_t) / \operatorname{Var}(X_t)$, is in line with the selection bias in the ATE.
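The covariance in (3.18) has a closed form under stationarity. The steps below sketch one way to obtain it directly from (3.16) and (3.17), assuming that the process is covariance stationary, that $\eta_t$ is uncorrelated with $U_t$, and that $\nu_t$ is uncorrelated with $X_{t-1}$; the Appendix derivation may proceed differently.

```latex
\begin{aligned}
\operatorname{Cov}(X_t, U_t)
  &= \operatorname{Cov}(\rho X_{t-1} + \kappa U_t + \eta_t,\; U_t)
   = \rho \operatorname{Cov}(X_{t-1}, U_t) + \kappa \operatorname{Var}(U_t), \\
\operatorname{Cov}(X_{t-1}, U_t)
  &= \operatorname{Cov}(X_{t-1},\; \phi U_{t-1} + \nu_t)
   = \phi \operatorname{Cov}(X_{t-1}, U_{t-1})
   = \phi \operatorname{Cov}(X_t, U_t) \quad \text{(stationarity)}, \\
\Longrightarrow\quad \operatorname{Cov}(X_t, U_t)
  &= \phi\rho \operatorname{Cov}(X_t, U_t) + \kappa \operatorname{Var}(U_t)
  \quad\Longrightarrow\quad
  \operatorname{Cov}(X_t, U_t) = \frac{\kappa \operatorname{Var}(U_t)}{1 - \phi\rho}.
\end{aligned}
```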
To discuss the consistency of the OLS estimate, we use equation (A.3) in the Online Appendix, from which we can derive

$$\operatorname{Cov}(X_t, U_t) = \frac{\kappa \operatorname{Var}(U_t)}{1 - \phi\rho} \tag{3.19}$$

Therefore, plugging equation (3.19) into (3.18), we have

$$\hat{\beta}_{OLS} = \beta + \frac{\delta\kappa \operatorname{Var}(U_t)}{(1 - \phi\rho) \operatorname{Var}(X_t)} = \beta + \frac{\delta\kappa \sum_{t=1}^{T} U_t^2}{(1 - \phi\rho) \sum_{t=1}^{T} X_t^2} \tag{3.20}$$

Using the Slutsky theorem, (3.20) becomes

$$\operatorname{plim} \hat{\beta}_{OLS} = \beta + \frac{\delta\kappa \left[\operatorname{plim} \frac{1}{T} \sum_{t=1}^{T} U_t^2\right]}{(1 - \phi\rho) \left[\operatorname{plim} \frac{1}{T} \sum_{t=1}^{T} X_t^2\right]} \tag{3.21}$$

where $\beta$ is in line with the ATT, and the second term is in line with the selection bias, in the ATE. Because $\phi \in (0, 1)$, (3.16) and (3.17) imply that as $T \to \infty$, $\operatorname{plim} \frac{1}{T} \sum_{t=1}^{T} U_t^2 \ll \operatorname{plim} \frac{1}{T} \sum_{t=1}^{T} X_t^2$. As a result, $\operatorname{plim}_{T \to \infty} \hat{\beta}_{OLS} \to \beta$; in other words, the OLS estimate in Scenario 1 is consistent.

Now consider an IV estimation using $X_{t-1}$ as the instrumental variable for $X_t$. The IV estimator is

$$\hat{\beta}_{IV,1} = \frac{\operatorname{Cov}(X_{t-1}, Y_t)}{\operatorname{Cov}(X_{t-1}, X_t)} \tag{3.22}$$

Plugging equation (3.15) into (3.22), we have

$$\hat{\beta}_{IV,1} = \frac{\operatorname{Cov}(X_{t-1}, \beta X_t + \delta U_t + \epsilon_t)}{\operatorname{Cov}(X_{t-1}, X_t)} \tag{3.23}$$

and then

$$\hat{\beta}_{IV,1} = \beta + \frac{\delta \operatorname{Cov}(X_{t-1}, U_t)}{\operatorname{Cov}(X_{t-1}, X_t)} = \beta + \frac{\delta\, \dfrac{\operatorname{Cov}(X_{t-1}, U_t)}{\operatorname{Var}(X_{t-1})}}{\rho + \kappa\, \dfrac{\operatorname{Cov}(X_{t-1}, U_t)}{\operatorname{Var}(X_{t-1})}} \tag{3.24}$$

Therefore, (3.24) implies that in Scenario 1 the lagged IV estimate is biased. The bias, the second term in (3.24), is in line with the “restricted local ATT” in the LATE, and $\kappa$ is the key parameter determining the extent to which the lagged IV estimate is biased. This is because the extent to which the lagged IV violates the independence assumption is measured by $\kappa$.
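The moment expressions (3.18) to (3.20) and (3.24) can be checked numerically. The following is a minimal sketch, not the author's simulation code: it assumes unit error variances, a single long time series with a burn-in period, and illustrative parameter values; the function names `simulate_scenario1` and `cov` are hypothetical helpers introduced here.

```python
import numpy as np

def simulate_scenario1(T, beta, delta, rho, kappa, phi, seed=0, burn=500):
    """Simulate one long series from (3.15)-(3.17) with unit-variance errors (assumption)."""
    rng = np.random.default_rng(seed)
    n = T + burn
    nu, eta, eps = rng.standard_normal((3, n))
    U = np.zeros(n)
    X = np.zeros(n)
    for t in range(1, n):
        U[t] = phi * U[t - 1] + nu[t]                    # (3.17)
        X[t] = rho * X[t - 1] + kappa * U[t] + eta[t]    # (3.16)
    Y = beta * X + delta * U + eps                       # (3.15)
    return Y[burn:], X[burn:], U[burn:]                  # drop burn-in draws

def cov(a, b):
    """Sample covariance (sample variance when a is b)."""
    return np.mean((a - a.mean()) * (b - b.mean()))

# Illustrative parameter values (assumptions).
beta, delta, rho, kappa, phi = 2.0, 1.0, 0.5, 0.5, 0.5
Y, X, U = simulate_scenario1(200_000, beta, delta, rho, kappa, phi)

# OLS slope and the right-hand side of (3.20), both from sample moments.
b_ols = cov(X, Y) / cov(X, X)
ols_formula = beta + delta * kappa * cov(U, U) / ((1 - phi * rho) * cov(X, X))

# Lagged-IV slope (3.22) and the right-hand side of (3.24).
X_lag, X_t, Y_t, U_t = X[:-1], X[1:], Y[1:], U[1:]
b_iv = cov(X_lag, Y_t) / cov(X_lag, X_t)
c = cov(X_lag, U_t) / cov(X_lag, X_lag)
iv_formula = beta + delta * c / (rho + kappa * c)

print("OLS estimate:", b_ols, " formula (3.20):", ols_formula)
print("IV estimate: ", b_iv, " formula (3.24):", iv_formula)
```

Changing `phi`, `rho` or `kappa` in the sketch shows how each moment expression responds to the autocorrelation and confounding parameters.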
To discuss the consistency of the lagged IV estimate in Scenario 1, using equation (A.3) in the Appendix, we can also derive

$$\frac{\operatorname{Cov}(X_{t-1}, U_t)}{\operatorname{Var}(X_{t-1})} = \frac{\operatorname{Cov}(X_{t-1}, \phi U_{t-1} + \nu_t)}{\operatorname{Var}(X_{t-1})} = \frac{\phi \operatorname{Cov}(X_{t-1}, U_{t-1})}{\operatorname{Var}(X_{t-1})} = \frac{\phi\kappa \operatorname{Var}(U_t)}{(1 - \phi\rho) \operatorname{Var}(X_t)} \tag{3.25}$$

Therefore, we have

$$\hat{\beta}_{IV,1} = \beta + \frac{\delta\phi\kappa \operatorname{Var}(U_t)}{\rho (1 - \phi\rho) \operatorname{Var}(X_t) + \phi\kappa^2 \operatorname{Var}(U_t)} = \beta + \frac{\delta\kappa \operatorname{Var}(U_t)}{\frac{\rho}{\phi} (1 - \phi\rho) \operatorname{Var}(X_t) + \kappa^2 \operatorname{Var}(U_t)} \tag{3.26}$$

Using the Slutsky theorem, (3.26) becomes

$$\operatorname{plim} \hat{\beta}_{IV,1} = \beta + \frac{\delta\kappa \left[\operatorname{plim} \frac{1}{T} \sum_{t=1}^{T} U_t^2\right]}{\frac{\rho}{\phi} (1 - \phi\rho) \left[\operatorname{plim} \frac{1}{T} \sum_{t=1}^{T} X_t^2\right] + \kappa^2 \left[\operatorname{plim} \frac{1}{T} \sum_{t=1}^{T} U_t^2\right]} \tag{3.27}$$

where the right-hand side of (3.27) is in line with the “restricted local ATT” in the LATE. Because $\phi \in (0, 1)$, (3.16) and (3.17) imply that as $T \to \infty$, $\operatorname{plim} \frac{1}{T} \sum_{t=1}^{T} U_t^2 \ll \operatorname{plim} \frac{1}{T} \sum_{t=1}^{T} X_t^2$. As a result, $\operatorname{plim}_{T \to \infty} \hat{\beta}_{IV,1} \to \beta$; in other words, the lagged IV estimate in Scenario 1 is consistent.

Scenario 2: We then discuss the ATE and the LATE in Scenario 2, which violates not only the independence assumption but also the exclusion restriction. We consider the following model:

$$Y_{it} = \beta X_{it} + \xi X_{i,t-1} + \delta U_{it} + \epsilon_{it} \tag{3.28}$$

$$X_{it} = \rho X_{i,t-1} + \kappa U_{it} + \eta_{it} \tag{3.29}$$

$$U_{it} = \phi U_{i,t-1} + \nu_{it} \tag{3.30}$$

For simplicity, we drop $i$ for the remainder of this section; everything else is as in Section 3.3.A.

Consider the OLS estimate in Scenario 2:

$$\hat{\beta}_{OLS} = \frac{\operatorname{Cov}(X_t, Y_t)}{\operatorname{Var}(X_t)} = \frac{\operatorname{Cov}(X_t, \beta X_t + \xi X_{t-1} + \delta U_t + \epsilon_t)}{\operatorname{Var}(X_t)} = \beta + \frac{\delta \operatorname{Cov}(X_t, U_t)}{\operatorname{Var}(X_t)} + \frac{\xi \operatorname{Cov}(X_t, X_{t-1})}{\operatorname{Var}(X_t)} \tag{3.31}$$

Therefore, (3.31) implies that in Scenario 2 the OLS estimate is biased, and the bias, $\frac{\delta \operatorname{Cov}(X_t, U_t)}{\operatorname{Var}(X_t)} + \frac{\xi \operatorname{Cov}(X_t, X_{t-1})}{\operatorname{Var}(X_t)}$, is in line with the selection bias in the ATE.
To discuss the consistency of the OLS estimate, we use equation (A.3) in the Appendix, from which we can derive

$$\operatorname{Cov}(X_t, U_t) = \frac{\kappa \operatorname{Var}(U_t)}{1 - \phi\rho} \tag{3.32}$$

and

$$\frac{\operatorname{Cov}(X_t, X_{t-1})}{\operatorname{Var}(X_t)} = \rho + \frac{\phi\kappa^2 \operatorname{Var}(U_t)}{(1 - \phi\rho) \operatorname{Var}(X_t)} \tag{3.33}$$

Therefore, we have

$$\begin{aligned}
\hat{\beta}_{OLS} &= \beta + \frac{\delta\kappa \operatorname{Var}(U_t)}{(1 - \phi\rho) \operatorname{Var}(X_t)} + \xi\rho + \frac{\phi\xi\kappa^2 \operatorname{Var}(U_t)}{(1 - \phi\rho) \operatorname{Var}(X_t)} \\
&= \beta + \xi\rho + \frac{\delta\kappa \sum_{t=1}^{T} U_t^2}{(1 - \phi\rho) \sum_{t=1}^{T} X_t^2} + \frac{\phi\xi\kappa^2 \sum_{t=1}^{T} U_t^2}{(1 - \phi\rho) \sum_{t=1}^{T} X_t^2}
\end{aligned} \tag{3.34}$$

Using the Slutsky theorem, (3.34) becomes

$$\operatorname{plim} \hat{\beta}_{OLS} = \beta + \xi\rho + \frac{(\delta\kappa + \phi\xi\kappa^2) \left[\operatorname{plim} \frac{1}{T} \sum_{t=1}^{T} U_t^2\right]}{(1 - \phi\rho) \left[\operatorname{plim} \frac{1}{T} \sum_{t=1}^{T} X_t^2\right]} \tag{3.35}$$

where $\beta$ is in line with the ATT, and the remaining terms are in line with the selection bias, in the ATE. Because $\phi \in (0, 1)$, (3.29) and (3.30) imply that as $T \to \infty$, $\operatorname{plim} \frac{1}{T} \sum_{t=1}^{T} U_t^2 \ll \operatorname{plim} \frac{1}{T} \sum_{t=1}^{T} X_t^2$. As a result, $\operatorname{plim}_{T \to \infty} \hat{\beta}_{OLS} \to \beta + \xi\rho$; in other words, the OLS estimate in Scenario 2 is inconsistent.
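The Scenario 2 expressions (3.33) and (3.35) can be checked against simulated data in the same way. This is a minimal sketch under assumed unit error variances and illustrative parameter values, not the author's code; `simulate_scenario2` and `cov` are hypothetical helper names.

```python
import numpy as np

def simulate_scenario2(T, beta, xi, delta, rho, kappa, phi, seed=1, burn=500):
    """Simulate one long series from (3.28)-(3.30) with unit-variance errors (assumption)."""
    rng = np.random.default_rng(seed)
    n = T + burn
    nu, eta, eps = rng.standard_normal((3, n))
    U = np.zeros(n)
    X = np.zeros(n)
    Y = np.zeros(n)
    for t in range(1, n):
        U[t] = phi * U[t - 1] + nu[t]                                  # (3.30)
        X[t] = rho * X[t - 1] + kappa * U[t] + eta[t]                  # (3.29)
        Y[t] = beta * X[t] + xi * X[t - 1] + delta * U[t] + eps[t]     # (3.28)
    return Y[burn:], X[burn:], U[burn:]

def cov(a, b):
    """Sample covariance (sample variance when a is b)."""
    return np.mean((a - a.mean()) * (b - b.mean()))

# Illustrative parameter values (assumptions).
beta, xi, delta, rho, kappa, phi = 2.0, 0.5, 1.0, 0.5, 0.5, 0.5
Y, X, U = simulate_scenario2(200_000, beta, xi, delta, rho, kappa, phi)
Y_t, X_t, U_t, X_lag = Y[1:], X[1:], U[1:], X[:-1]

# Ratio Var(U_t) / Var(X_t) that appears in (3.33)-(3.35).
ratio = cov(U_t, U_t) / cov(X_t, X_t)

# (3.33): first-order autocorrelation of X, sample vs. formula.
lhs_333 = cov(X_t, X_lag) / cov(X_t, X_t)
rhs_333 = rho + phi * kappa**2 * ratio / (1 - phi * rho)

# (3.35): OLS slope of Y on X, sample vs. formula.
b_ols = cov(X_t, Y_t) / cov(X_t, X_t)
rhs_335 = beta + xi * rho + (delta * kappa + phi * xi * kappa**2) * ratio / (1 - phi * rho)

print("(3.33) sample:", lhs_333, " formula:", rhs_333)
print("(3.35) sample:", b_ols, " formula:", rhs_335)
```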
Consider an IV estimation using $X_{t-1}$ as the instrumental variable for $X_t$. The IV estimator is

$$\hat{\beta}_{IV,2} = \frac{\operatorname{Cov}(X_{t-1}, Y_t)}{\operatorname{Cov}(X_{t-1}, X_t)} \tag{3.36}$$

Plugging equation (3.28) into (3.36), we have

$$\hat{\beta}_{IV,2} = \frac{\operatorname{Cov}(X_{t-1}, \beta X_t + \xi X_{t-1} + \delta U_t + \epsilon_t)}{\operatorname{Cov}(X_{t-1}, X_t)} \tag{3.37}$$

and then

$$\hat{\beta}_{IV,2} = \beta + \frac{\xi \operatorname{Var}(X_{t-1})}{\operatorname{Cov}(X_{t-1}, X_t)} + \frac{\delta \operatorname{Cov}(X_{t-1}, U_t)}{\operatorname{Cov}(X_{t-1}, X_t)} = \beta + \frac{\xi}{\rho + \kappa\, \dfrac{\operatorname{Cov}(X_{t-1}, U_t)}{\operatorname{Var}(X_{t-1})}} + \frac{\delta\, \dfrac{\operatorname{Cov}(X_{t-1}, U_t)}{\operatorname{Var}(X_{t-1})}}{\rho + \kappa\, \dfrac{\operatorname{Cov}(X_{t-1}, U_t)}{\operatorname{Var}(X_{t-1})}} \tag{3.38}$$

Therefore, (3.38) implies that in Scenario 2 the lagged IV estimate is biased. The second term in (3.38), which is proportional to $\xi$, is in line with the “local selection bias”, and $\beta$ plus the third term, which is proportional to $\delta$, is in line with the “relaxed local ATT” in the LATE in Scenario 2. Not only $\kappa$ but also $\xi$ is a key parameter determining the extent to which the lagged IV estimate is biased. This is because the extent to which the lagged IV violates the exclusion restriction is measured by $\xi$.

Then we discuss the consistency of the lagged IV estimate in Scenario 2.
We already know, from the Online Appendix, that

$$\operatorname{plim}_{T \to \infty} \frac{\operatorname{Cov}(X_{t-1}, U_t)}{\operatorname{Var}(X_{t-1})} = \frac{\phi\kappa \operatorname{Var}(U_t)}{(1 - \phi\rho) \operatorname{Var}(X_t)} \tag{3.39}$$

Therefore, we have

$$\hat{\beta}_{IV,2} = \beta + \frac{\xi (1 - \phi\rho) \operatorname{Var}(X_t) + \delta\phi\kappa \operatorname{Var}(U_t)}{\rho (1 - \phi\rho) \operatorname{Var}(X_t) + \phi\kappa^2 \operatorname{Var}(U_t)} = \beta + \frac{\xi \left(\frac{1}{\phi} - \rho\right) \operatorname{Var}(X_t) + \delta\kappa \operatorname{Var}(U_t)}{\frac{\rho}{\phi} (1 - \phi\rho) \operatorname{Var}(X_t) + \kappa^2 \operatorname{Var}(U_t)} \tag{3.40}$$

Using the Slutsky theorem, we have

$$\operatorname{plim} \hat{\beta}_{IV,2} = \beta + \frac{\xi \left(\frac{1}{\phi} - \rho\right) \left[\operatorname{plim} \frac{1}{T} \sum_{t=1}^{T} X_t^2\right] + \delta\kappa \left[\operatorname{plim} \frac{1}{T} \sum_{t=1}^{T} U_t^2\right]}{\rho \left(\frac{1}{\phi} - \rho\right) \left[\operatorname{plim} \frac{1}{T} \sum_{t=1}^{T} X_t^2\right] + \kappa^2 \left[\operatorname{plim} \frac{1}{T} \sum_{t=1}^{T} U_t^2\right]} \tag{3.41}$$

where the part of the fraction in (3.41) whose numerator is $\xi \left(\frac{1}{\phi} - \rho\right) \left[\operatorname{plim} \frac{1}{T} \sum_{t=1}^{T} X_t^2\right]$ is in line with the “local selection bias”, and $\beta$ plus the part whose numerator is $\delta\kappa \left[\operatorname{plim} \frac{1}{T} \sum_{t=1}^{T} U_t^2\right]$ is in line with the “relaxed local ATT”, in the LATE in Scenario 2. Because $\phi \in (0, 1)$, (3.29) and (3.30) imply that as $T \to \infty$, $\operatorname{plim} \frac{1}{T} \sum_{t=1}^{T} U_t^2 \ll \operatorname{plim} \frac{1}{T} \sum_{t=1}^{T} X_t^2$. As a result, $\operatorname{plim}_{T \to \infty} \hat{\beta}_{IV,2} \to \beta + \xi/\rho$; in other words, the lagged IV estimate in Scenario 2 is inconsistent. We have also derived that in Scenario 2, $\operatorname{plim}_{T \to \infty} \hat{\beta}_{OLS} \to \beta + \xi\rho$; in other words, the OLS estimate in Scenario 2 is inconsistent. As $\xi/\rho > \xi\rho$, the lagged IV estimate has a significantly larger extent of inconsistency than the OLS estimate.

Scenario 3: We then discuss the ATE and the LATE in Scenario 3, which violates both the independence assumption and the exclusion restriction. We consider the following model:

$$Y_{it} = \beta X_{it} + \delta U_{it} + \epsilon_{it} \tag{3.42}$$

$$X_{it} = \rho X_{i,t-1} + \kappa U_{it} + \eta_{it} \tag{3.43}$$

$$U_{it} = \phi U_{i,t-1} + \psi X_{i,t-1} + \nu_{it} \tag{3.44}$$

For simplicity, we drop $i$ for the remainder of this section; everything else is as in Section 3.3.A.
Consider the OLS estimate in Scenario 3:

$$\hat{\beta}_{OLS} = \frac{\operatorname{Cov}(X_t, Y_t)}{\operatorname{Var}(X_t)} = \frac{\operatorname{Cov}(X_t, \beta X_t + \delta U_t + \epsilon_t)}{\operatorname{Var}(X_t)} = \beta + \frac{\delta \operatorname{Cov}(X_t, U_t)}{\operatorname{Var}(X_t)} \tag{3.45}$$

Therefore, (3.45) implies that in Scenario 3 the OLS estimate is biased, and the bias, $\delta \operatorname{Cov}(X_t, U_t) / \operatorname{Var}(X_t)$, is in line with the selection bias in the ATE.

To discuss the consistency of the OLS estimate, we use equation (A.11) in the Online Appendix. Using the Slutsky theorem, we have

$$\operatorname{plim} \hat{\beta}_{OLS} = \beta + \frac{\delta\psi\rho}{1 - \phi\rho} + \frac{\delta\kappa \left[\operatorname{plim} \frac{1}{T} \sum_{t=1}^{T} U_t^2\right]}{(1 - \phi\rho) \left[\operatorname{plim} \frac{1}{T} \sum_{t=1}^{T} X_t^2\right]} \tag{3.46}$$

where $\beta$ is in line with the ATT, and the remaining two terms are in line with the selection bias, in the ATE. Therefore, in Scenario 3, the OLS estimate is inconsistent.

Consider an IV estimation using $X_{t-1}$ as the instrumental variable for $X_t$. The lagged IV estimator is

$$\hat{\beta}_{IV,3} = \frac{\operatorname{Cov}(X_{t-1}, Y_t)}{\operatorname{Cov}(X_{t-1}, X_t)} \tag{3.47}$$

Plugging equation (3.42) into (3.47), we have

$$\hat{\beta}_{IV,3} = \frac{\operatorname{Cov}(X_{t-1}, \beta X_t + \delta U_t + \epsilon_t)}{\operatorname{Cov}(X_{t-1}, X_t)} \tag{3.48}$$

and then

$$\hat{\beta}_{IV,3} = \beta + \frac{\delta \operatorname{Cov}(X_{t-1}, U_t)}{\operatorname{Cov}(X_{t-1}, X_t)} = \beta + \frac{\delta\, \dfrac{\operatorname{Cov}(X_{t-1}, U_t)}{\operatorname{Var}(X_{t-1})}}{\rho + \kappa\, \dfrac{\operatorname{Cov}(X_{t-1}, U_t)}{\operatorname{Var}(X_{t-1})}} \tag{3.49}$$

Therefore, (3.49) implies that in Scenario 3 the lagged IV estimate is biased. The right-hand side of (3.49) is in line with the “local selection bias” and the “relaxed local ATT” in the LATE in Scenario 3, and not only $\kappa$ but also $\psi$ is a key parameter determining the extent to which the lagged IV estimate is biased. This is because the extent to which the lagged IV violates the exclusion restriction is measured by $\psi$.

Then we discuss the consistency of the lagged IV estimate in Scenario 3.
We already know, from the Online Appendix, that

$$\operatorname{plim}_{T \to \infty} \frac{\operatorname{Cov}(X_{t-1}, U_t)}{\operatorname{Var}(X_{t-1})} = \frac{\phi\kappa \operatorname{Var}(U_t)}{(1 - \phi\rho) \operatorname{Var}(X_t)} + \frac{\psi}{1 - \phi\rho}$$

Therefore, we have

$$\operatorname{plim} \hat{\beta}_{IV,3} = \beta + \frac{\delta\kappa \operatorname{Var}(U_t) + \frac{\delta\psi}{\phi} \operatorname{Var}(X_t)}{\left[\frac{\rho}{\phi} (1 - \phi\rho) + \frac{\psi\kappa}{\phi}\right] \operatorname{Var}(X_t) + \kappa^2 \operatorname{Var}(U_t)} \tag{3.50}$$

Using the Slutsky theorem, we have

$$\operatorname{plim} \hat{\beta}_{IV,3} = \beta + \frac{\delta\kappa \left[\operatorname{plim} \frac{1}{T} \sum_{t=1}^{T} U_t^2\right] + \frac{\delta\psi}{\phi} \left[\operatorname{plim} \frac{1}{T} \sum_{t=1}^{T} X_t^2\right]}{\left[\frac{\rho}{\phi} (1 - \phi\rho) + \frac{\psi\kappa}{\phi}\right] \left[\operatorname{plim} \frac{1}{T} \sum_{t=1}^{T} X_t^2\right] + \kappa^2 \left[\operatorname{plim} \frac{1}{T} \sum_{t=1}^{T} U_t^2\right]} \tag{3.51}$$

where the part of the fraction in (3.51) whose numerator is $\frac{\delta\psi}{\phi} \left[\operatorname{plim} \frac{1}{T} \sum_{t=1}^{T} X_t^2\right]$ is in line with the “local selection bias”, and $\beta$ plus the part whose numerator is $\delta\kappa \left[\operatorname{plim} \frac{1}{T} \sum_{t=1}^{T} U_t^2\right]$ is in line with the “relaxed local ATT”, in the LATE in Scenario 3. The “relaxed local ATT” in Scenario 3 is smaller than the “restricted local ATT” in Scenario 1; however, because of the “local selection bias” in Scenario 3, $\hat{\beta}_{IV,3}$ has a greater extent of inconsistency than $\hat{\beta}_{IV,1}$ in Scenario 1.

When comparing with OLS, we know that in Scenario 3,

$$\operatorname{plim} \hat{\beta}_{OLS} = \beta + \frac{\delta\psi\rho}{1 - \phi\rho} + \frac{\delta\kappa \left[\operatorname{plim} \frac{1}{T} \sum_{t=1}^{T} U_t^2\right]}{(1 - \phi\rho) \left[\operatorname{plim} \frac{1}{T} \sum_{t=1}^{T} X_t^2\right]}.$$

Therefore, in Scenario 3, it is ambiguous whether the lagged IV estimate has a larger extent of inconsistency than the OLS estimate.

Scenario 4: Combining the discussion of Scenarios 2 and 3, we know that the lagged IV estimate in Scenario 4 could have a greater extent of inconsistency than the OLS estimate.

B. Implications

Our implications for empirical research are:

(1) If both $\xi = 0$ and $\psi = 0$, the lagged IV satisfies the exclusion restriction, but violates the independence assumption. In this scenario, the lagged IV is safe, because its estimate is consistent.
(2) If $\xi \neq 0$ but $\psi = 0$, the lagged IV violates both the exclusion restriction and the independence assumption. In this scenario, the lagged IV is unambiguously unsafe, because its estimate is inconsistent and has a larger extent of inconsistency than the OLS estimate.

(3) If $\xi = 0$ but $\psi \neq 0$, the lagged IV violates both the exclusion restriction and the independence assumption. In this scenario, the lagged IV is unambiguously unsafe, because its estimate is inconsistent. In addition, it is ambiguous whether the lagged IV estimate has a larger extent of inconsistency than the OLS estimate.

(4) If $\xi \neq 0$ and $\psi \neq 0$, the lagged IV violates both the exclusion restriction and the independence assumption. In this scenario, the lagged IV is unambiguously unsafe, because its estimate is inconsistent and has a larger extent of inconsistency than the OLS estimate.

3.4. Simulation Analysis

So far, we have shown with mathematical arguments that using the lagged explanatory variable as the instrumental variable can either alleviate or aggravate endogeneity. By characterizing analytically the source and magnitude of the bias in the lagged IV and in OLS, in a simple $AR(1)$ setup, we have found that whether the bias in OLS is larger than that in the lagged IV method is determined by whether, and how, the independence assumption and (or) the exclusion restriction are violated.

In this section, we use Monte Carlo methods to simulate the theoretical setups of the four scenarios discussed in the conceptual framework and the numerical analysis. We quantitatively discuss the bias of both the lagged IV estimates and the OLS estimates, together with the root mean squared errors (RMSE) and the likelihood of type-I errors of the lagged IV and OLS estimations.

A. Setup

We start with Scenario 1, which violates only the independence assumption but not the exclusion restriction.
Figure 1 parameterizes the relations between the explained variable, the explanatory variable and the unobserved confounders in Scenario 1. As shown, the unobserved confounder, regarded as a general representation of the source of endogeneity, is correlated both with $Y_t$ and with $X_t$. $\delta$, the direct marginal effect of $U_t$ on $Y_t$, is normalized to 1. $\beta$, the direct marginal effect of $X_t$ on $Y_t$, is set to 0 or 2.

The first key parameter in our simulation is $\kappa$, the marginal effect of $U_t$ on $X_t$ in our setup, which measures the magnitude of the endogeneity arising from the violation of the independence assumption. The value of $\kappa$ is set to 0.5 or 2, to represent an attenuated and an amplified marginal effect of $U$ on $X$, respectively. The second and third key parameters are the autocorrelation parameters $\rho$ and $\phi$. In turn, one is fixed at 0.5 while the other takes values in {0, 0.1, 0.2, …, 0.9}, to represent the relevance of $X_t$, the endogenous variable, and $X_{t-1}$, the lagged IV, relative to the relevance of the current and the lagged unobserved confounder.

In each simulation, we generate a panel with $T = 50$ periods and $N = 100$ cross-section units, for a total of 5,000 observations. Our simulation follows the same data generating processes (DGPs) as in Section 3.3. Each set of parameter values, shown in Table 2, is simulated 100 times. Three estimators of $\beta$ are then compared: (1) the “naïve” estimator ($\hat{\beta}_{NAIVE}$), or the OLS estimator, which regresses $Y_t$ on $X_t$ and ignores the unobserved confounder, (2) the “lagged IV” estimator ($\hat{\beta}_{LAGIV}$), which regresses $Y_t$ on $X_t$ and uses $X_{t-1}$ as the IV for $X_t$, and (3) the “correct” estimator ($\hat{\beta}_{CORRECT}$), which regresses $Y_t$ on $X_t$ and also on the unobserved confounder.
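The setup above can be sketched as a short Monte Carlo loop. This is a minimal illustration for Scenario 1 only, not the author's simulation code: it assumes unit error variances, pooled estimation without unit fixed effects, a burn-in of 50 periods, and one illustrative parameter combination; all function names are hypothetical. Because the instrument is a lag, one period per unit is lost.

```python
import numpy as np

def simulate_panel(n_units, n_periods, beta, delta, rho, kappa, phi, rng, burn=50):
    """Scenario 1 DGP, eqs. (3.15)-(3.17), simulated for all units at once."""
    total = n_periods + burn
    nu, eta, eps = rng.standard_normal((3, n_units, total))
    U = np.zeros((n_units, total))
    X = np.zeros((n_units, total))
    for t in range(1, total):
        U[:, t] = phi * U[:, t - 1] + nu[:, t]
        X[:, t] = rho * X[:, t - 1] + kappa * U[:, t] + eta[:, t]
    Y = beta * X + delta * U + eps
    return Y[:, burn:], X[:, burn:], U[:, burn:]

def estimate(Y, X, U):
    """Return the (naive OLS, lagged IV, 'correct' OLS) estimates of beta on pooled data."""
    y, x, u = Y[:, 1:].ravel(), X[:, 1:].ravel(), U[:, 1:].ravel()
    z = X[:, :-1].ravel()                       # lagged X used as the instrument
    y, x, u, z = (v - v.mean() for v in (y, x, u, z))
    naive = (x @ y) / (x @ x)                   # OLS ignoring U
    lag_iv = (z @ y) / (z @ x)                  # just-identified IV estimator
    correct = np.linalg.lstsq(np.column_stack([x, u]), y, rcond=None)[0][0]
    return naive, lag_iv, correct

def monte_carlo(n_reps=100, n_units=100, n_periods=50,
                beta=2.0, delta=1.0, rho=0.5, kappa=0.5, phi=0.5, seed=0):
    """Bias and RMSE of the three estimators over n_reps simulated panels."""
    rng = np.random.default_rng(seed)
    draws = np.array([
        estimate(*simulate_panel(n_units, n_periods, beta, delta, rho, kappa, phi, rng))
        for _ in range(n_reps)
    ])
    bias = draws.mean(axis=0) - beta
    rmse = np.sqrt(((draws - beta) ** 2).mean(axis=0))
    return bias, rmse

bias, rmse = monte_carlo()
for name, b, r in zip(("naive", "lagged IV", "correct"), bias, rmse):
    print(f"{name:10s} bias = {b: .4f}   RMSE = {r:.4f}")
```

Looping `monte_carlo` over a grid of `phi` or `rho` values would trace out bias and RMSE curves for each estimator under these assumptions.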
Here the “correct” estimator is the counterfactual. Since researchers cannot observe the confounders in applied studies, our DGPs allow us to test the performance of both the OLS estimates and the lagged IV estimates by comparing each of their biases with that of the “correct” estimator, whose bias is zero. To keep the analysis simple and straightforward, we use only one-period autocorrelation.

Three criteria are used to evaluate the performance of the lagged IV estimates: (1) bias, (2) root mean squared error (RMSE), and (3) the likelihood of type-I error, which tells researchers the extent to which they could draw false inferences from the estimates by rejecting the true null hypothesis that $\beta = 0$.

We then discuss Scenario 2, which violates not only the independence assumption, but also the exclusion restriction directly. Figure 4 parameterizes the relations between the explained variable, the explanatory variable and the unobserved confounders in Scenario 2. In this scenario, the first key parameter in our simulation is $\xi$, the marginal effect of $X_{t-1}$ on $Y_t$ in our setup, which measures the magnitude of the endogeneity arising from the violation of the exclusion restriction. The value of $\xi$ is set to 0.5 or 2, to represent an attenuated and an amplified marginal effect of $X_{t-1}$ on $Y_t$, respectively.

After that, we discuss Scenario 3, which violates not only the independence assumption, but also the exclusion restriction indirectly. Figure 7 parameterizes the relations between the explained variable, the explanatory variable and the unobserved confounders in Scenario 3. In this scenario, the first key parameter in our simulation is $\psi$, the marginal effect of $X_{t-1}$ on $U_t$ in our setup, which measures the magnitude of the endogeneity arising from the violation of the exclusion restriction. The value of $\psi$ is set to 0.5 or 2, to represent an attenuated and an amplified marginal effect of $X_{t-1}$ on $U_t$, respectively.

B.
Monte Carlo Simulation Results

Figure 2 summarizes the simulation results when $\kappa = 0.5$ and 2, $\rho = 0.5$, and $\phi$ ranges from 0 to 0.9. The simulation results show the following. (1) Both $\hat{\beta}_{NAIVE}$ and $\hat{\beta}_{LAGIV}$ are biased, and the bias of the lagged IV estimate is smaller than that of the OLS estimate. This is consistent with our theoretical prediction that, as the lagged IV violates only the independence assumption in Scenario 1, it is less problematic than the OLS estimate. (2) As $\phi$ increases, the bias of the lagged IV estimate also increases; as $\kappa$ increases, the bias of the lagged IV estimate decreases. This is also consistent with our theoretical prediction that the lagged IV's violation of the independence assumption is quantified by $\phi/\kappa$, the synchronous change of $U_t$ with $X_t$; as $\phi/\kappa$ increases, the independence assumption is violated to a larger extent and, as a result, the lagged IV estimate suffers from higher bias. (3) The RMSEs show patterns similar to those of the biases.

Admittedly, it can be argued that applied researchers may be less interested in the degree to which their estimates are biased than in whether the p-values of their t-tests lead to a false rejection of the null hypothesis that $\beta = 0$ at some level of significance. Therefore, in the simulation, we also examine what happens, provided that the null hypothesis is true ($\beta = 0$), if applied researchers use the lagged IV method to test the alternative hypothesis that $\beta \neq 0$. Here we use the 95% confidence level. Our simulation results imply that when $\kappa > 0$ and as $\phi$ ranges from 0 to 1, the likelihood of type-I errors rises dramatically. The reason is that lagged IV identification leads to nonzero estimates of $\beta$ even if $\beta = 0$, because $\delta$, the marginal effect of the unobserved confounder on the explained variable, and $\kappa$, the marginal effect of the unobserved confounder on the explanatory variable, are both nonzero.
In addition, similar to the magnitude of the estimation bias, the likelihood of rejecting the true null hypothesis rises dramatically and approaches 1 as $\phi$ goes up. Accordingly, these results suggest that using the lagged IV method in response to endogeneity from unobserved confounders can hardly help mitigate type-I errors, because applied researchers may tend to reject null hypotheses that are true, and thus conclude that the estimated associations exist while in fact they are spurious.

Going a step further, Figure 3 presents the simulation results when $\phi = 0.5$, $\rho$ ranges from 0 to 1, and $\kappa = 0.5$ and 2. The simulation results show that (1) both $\hat{\beta}_{NAIVE}$ and $\hat{\beta}_{VALIV}$ are biased, and the bias of the lagged IV estimate is smaller than that of the OLS estimate. This is consistent with our theoretical prediction that, as the lagged IV only violates the independence assumption in Scenario 1, it is less problematic than the OLS estimate. (2) As $\rho$ increases, the bias of the lagged IV estimate decreases. This shows that as the relevance of the lagged IV to the endogenous variable goes up, the validity of the lagged IV also goes up. (3) As $\kappa$ increases, the bias of the lagged IV estimate decreases. This is also consistent with our theoretical prediction that the lagged IV estimate’s violation of the independence assumption is quantified by $\phi/\kappa$, the synchronous change of $U_i$ by $X_i$; as $\phi/\kappa$ increases, the independence assumption is violated to a larger extent and, as a result, the lagged IV estimate suffers from higher bias. (4) The RMSEs show patterns similar to those of the biases. (5) The likelihood of the type-I error is very high.
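The three evaluation criteria used in these simulations (bias, RMSE, and the type-I error rate) can be illustrated with a minimal Python sketch. It is not the authors' simulation code: it assumes a two-period DGP with standard normal shocks that follows the causal pathways in Table 3.2, and the function name `simulate_lagged_iv`, the sample size, the number of replications, and the 1.96 critical value are all illustrative assumptions. Setting `xi` or `psi` to a nonzero value switches on the Scenario 2 or Scenario 3 pathway, respectively.

```python
import numpy as np

def simulate_lagged_iv(phi, rho=0.5, kappa=0.5, beta=0.0, delta=1.0,
                       xi=0.0, psi=0.0, n=2000, reps=200, seed=0):
    """Monte Carlo sketch (assumed DGP): compare OLS with the lagged IV X_{i-1}."""
    rng = np.random.default_rng(seed)
    ols = np.empty(reps)
    iv = np.empty(reps)
    reject = np.empty(reps, dtype=bool)
    for r in range(reps):
        u0 = rng.standard_normal(n)                        # U_{i-1}
        x0 = kappa * u0 + rng.standard_normal(n)           # X_{i-1}, the lagged IV
        u1 = phi * u0 + psi * x0 + rng.standard_normal(n)  # U_i
        x1 = rho * x0 + kappa * u1 + rng.standard_normal(n)              # X_i
        y = beta * x1 + delta * u1 + xi * x0 + rng.standard_normal(n)    # Y_i
        xc, yc, zc = x1 - x1.mean(), y - y.mean(), x0 - x0.mean()
        ols[r] = (xc @ yc) / (xc @ xc)      # naive OLS slope
        iv[r] = (zc @ yc) / (zc @ xc)       # just-identified IV slope
        resid = yc - iv[r] * xc
        # conventional homoskedastic IV standard error
        se = np.sqrt(resid @ resid / (n - 2) * (zc @ zc)) / abs(zc @ xc)
        reject[r] = abs(iv[r] / se) > 1.96  # two-sided test of beta = 0
    return {
        "ols_bias": ols.mean() - beta,
        "iv_bias": iv.mean() - beta,
        "iv_rmse": np.sqrt(np.mean((iv - beta) ** 2)),
        "type1": reject.mean(),  # a type-I error rate only when beta = 0
    }

no_autocorr = simulate_lagged_iv(phi=0.0)          # independence holds
scenario1 = simulate_lagged_iv(phi=0.5)            # independence violated only
scenario2 = simulate_lagged_iv(phi=0.5, xi=0.5)    # exclusion also violated
print(no_autocorr, scenario1, scenario2, sep="\n")
```

Under these assumed parameter values, the population lagged IV bias in the Scenario 1 case is $\delta\kappa\phi / [\rho(\kappa^2 + 1) + \kappa^2\phi] = 1/3$, against roughly 0.43 for OLS, so the sketch reproduces the qualitative ranking described above while still rejecting the true null almost always.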
In sum, our simulation results convey an unambiguous message: if the lagged explanatory variable does not have a direct causal effect on the explained variable or on the unobserved confounder, using the lagged explanatory variable as the IV in instrumental estimation mitigates the estimation bias and RMSE. However, type-I errors can hardly be mitigated by the lagged IV method in applied research. These results imply that even if the exclusion restriction is satisfied, the lagged IV method is still problematic.

We also discuss the case in which the lagged explanatory variable has a direct causal effect on the explained variable, the case in which the lagged explanatory variable has a direct causal effect on the unobserved confounder, and the case in which the lagged explanatory variable has direct causal effects on both the explained variable and the unobserved confounder. These cases coincide with Scenarios 2, 3 and 4 discussed in our conceptual framework. These three cases yield very different results regarding estimation bias and RMSE, in that both the bias and the RMSE in lagged IV estimation are significantly larger than those in OLS; in addition, in these three cases the likelihood of type-I errors is close to, or even equal to, one, and significantly higher than that in OLS. These results imply that when lagged IV estimation violates both the exclusion restriction and the independence assumption, it even aggravates the endogeneity.

Figure 5 summarizes the simulation results when $\xi = 0.5$ and 2, $\rho = 0.5$, and $\phi$ ranges from 0 to 0.9. The simulation results show that (1) both $\hat{\beta}_{NAIVE}$ and $\hat{\beta}_{VALIV}$ are biased, and the bias of the lagged IV estimate is much larger than that of the OLS estimate. This is consistent with our theoretical prediction that, as the lagged IV violates both the independence assumption and the exclusion restriction in Scenario 2, it is much more problematic than the OLS estimate.
(2) As $\phi$ increases, the bias of the lagged IV estimate also increases. This is also consistent with our theoretical prediction that the lagged IV estimate’s violation of the independence assumption is quantified by $\phi/\kappa$, the synchronous change of $U_i$ by $X_i$; as $\phi/\kappa$ increases, the independence assumption is violated to a larger extent and, as a result, the lagged IV estimate suffers from higher bias. (3) As $\xi$ increases, the bias of the lagged IV estimate also increases. This is also consistent with our theoretical prediction that the lagged IV estimate’s violation of the exclusion restriction in Scenario 2 is quantified by $\xi$, the marginal effect of $X_{i-1}$ on $Y_i$; as $\xi$ increases, the exclusion restriction is violated to a larger extent and, as a result, the lagged IV estimate suffers from higher bias. (4) The RMSEs show patterns similar to those of the biases. (5) The likelihood of the type-I error is very high, and close to 1.

Figure 6 summarizes the simulation results when $\xi = 0.5$ and 2, $\phi = 0.5$, and $\rho$ ranges from 0 to 0.9. The simulation results show that (1) both $\hat{\beta}_{NAIVE}$ and $\hat{\beta}_{VALIV}$ are biased, and the bias of the lagged IV estimate is much larger than that of the OLS estimate. This is consistent with our theoretical prediction that, as the lagged IV violates both the independence assumption and the exclusion restriction in Scenario 2, it is much more problematic than the OLS estimate. (2) As $\rho$ increases, the bias of the lagged IV estimate decreases. This shows that as the relevance of the lagged IV to the endogenous variable goes up, the validity of the lagged IV also goes up. (3) As $\xi$ increases, the bias of the lagged IV estimate also increases.
This is also consistent with our theoretical prediction that the lagged IV estimate’s violation of the exclusion restriction in Scenario 2 is quantified by $\xi$, the marginal effect of $X_{i-1}$ on $Y_i$; as $\xi$ increases, the exclusion restriction is violated to a larger extent and, as a result, the lagged IV estimate suffers from higher bias. (4) The RMSEs show patterns similar to those of the biases. (5) The likelihood of the type-I error is very high, and close to 1.

Figure 8 summarizes the simulation results when $\psi = 0.5$ and 2, $\rho = 0.5$, and $\phi$ ranges from 0 to 0.9. The simulation results show that (1) both $\hat{\beta}_{NAIVE}$ and $\hat{\beta}_{VALIV}$ are biased, and the bias of the lagged IV estimate is much larger than that of the OLS estimate. This is consistent with our theoretical prediction that, as the lagged IV violates both the independence assumption and the exclusion restriction in Scenario 3, it is much more problematic than the OLS estimate. (2) As $\phi$ increases, the bias of the lagged IV estimate also increases. This is also consistent with our theoretical prediction that the lagged IV estimate’s violation of the independence assumption is quantified by $\phi/\kappa$, the synchronous change of $U_i$ by $X_i$; as $\phi/\kappa$ increases, the independence assumption is violated to a larger extent and, as a result, the lagged IV estimate suffers from higher bias. (3) As $\psi$ increases, the bias of the lagged IV estimate also increases. This is also consistent with our theoretical prediction that the lagged IV estimate’s violation of the exclusion restriction in Scenario 3 is quantified by $\psi$, the marginal effect of $X_{i-1}$ on $U_i$; as $\psi$ increases, the exclusion restriction is violated to a larger extent and, as a result, the lagged IV estimate suffers from higher bias. (4) The RMSEs show patterns similar to those of the biases. (5) The likelihood of the type-I error is very high, and close to 1.

Figure 9 summarizes the simulation results when $\psi = 0.5$ and 2, $\phi = 0.5$, and $\rho$ ranges from 0 to 0.9.
The simulation results show that (1) both $\hat{\beta}_{NAIVE}$ and $\hat{\beta}_{VALIV}$ are biased, and the bias of the lagged IV estimate is much larger than that of the OLS estimate. This is consistent with our theoretical prediction that, as the lagged IV violates both the independence assumption and the exclusion restriction in Scenario 3, it is much more problematic than the OLS estimate. (2) As $\rho$ increases, the bias of the lagged IV estimate decreases. This shows that as the relevance of the lagged IV to the endogenous variable goes up, the validity of the lagged IV also goes up. (3) As $\psi$ increases, the bias of the lagged IV estimate also increases. This is also consistent with our theoretical prediction that the lagged IV estimate’s violation of the exclusion restriction in Scenario 3 is quantified by $\psi$, the marginal effect of $X_{i-1}$ on $U_i$; as $\psi$ increases, the exclusion restriction is violated to a larger extent and, as a result, the lagged IV estimate suffers from higher bias. (4) The RMSEs show patterns similar to those of the biases. (5) The likelihood of the type-I error is very high, and close to 1.

3.5. Conclusion

The discussion of the independence assumption and the exclusion restriction in lagged IV estimation across the four scenarios implies that if the lagged IV satisfies the exclusion restriction, by strictly assuming the non-existence of specific causal influence, the lagged IV method is acceptable and helpful, as its estimate is consistent and yields less bias than the OLS estimate. However, the violation of the independence assumption still makes the lagged IV method troubling, as the lagged IV estimate is highly likely to suffer from the type-I error. If the lagged IV violates both the independence assumption and the exclusion restriction, its estimate is unambiguously inconsistent and yields much more bias than the OLS estimate.
Few applied researchers have discussed the independence assumption and the exclusion restriction in detail, assuming empirically that the lagged IV method could at least yield estimates with a lower bias than that of OLS. However, the bias of the lagged IV method is smaller than that of OLS only when the non-existence of specific causal influence holds and the parameter values fall within limited ranges; once this strict assumption is relaxed, the lagged IV method could even enlarge the bias. In addition to estimation biases, whether or not such an assumption is relaxed, the high likelihood of type-I error always jeopardizes the validity of the lagged IV method. What’s worse, since the causal impacts of the lagged explanatory variable on unobserved covariates can hardly be excluded, not only the independence assumption but also the exclusion restriction is inevitably violated, resulting in a larger estimation bias for the lagged IV than for OLS most of the time.

Causal inference usually requires experimental data to identify the treatment effect of explanatory variables. With observational data, natural experiments are usually indispensable to provide an exogenous shock in causal identification (Angrist and Krueger, 2001; Freeman, 2005), although they lack underlying theoretical relationships (Rosenzweig and Wolpin, 2000). Therefore, valid instrumental variables are likely to be obtained from natural experiments because, in this sense, they are very likely to be exogenous and to satisfy both the independence assumption and the exclusion restriction. Lagged explanatory variables, on the contrary, have a simultaneous relationship with the unobserved confounder that influences the explained variable, and the lagged IV lacks the exogeneity of a natural experiment. Therefore, the lagged IV method can hardly provide additional information in causal inference.

Table 3.1.
Reviewed Journals Published in 2013-2018, Using Lagged IV Methods

Journal Name | Discipline | 2013-2018 | 2015-2018
American Economic Review | Economics | 5 | 3
Econometrica | Economics | 0 | 0
Journal of Political Economy | Economics | 1 | 0
Quarterly Journal of Economics | Economics | 3 | 2
Review of Economic Studies | Economics | 3 | 1
Review of Economics & Statistics | Economics | 7 | 2
American Political Science Review | Political Science | 1 | 0
American Journal of Political Science | Political Science | 1 | 1
British Journal of Political Science | Political Science | 6 | 4
Comparative Political Studies | Political Science | 3 | 1
Journal of Politics | Political Science | 1 | 0

Table 3.2. Simulation Parameters

Parameters | Causal Pathway | Simulation Values
Basic Parameters
$\beta$ | $X_i \to Y_i$ | {0, 2}
$\delta$ | $U_i \to Y_i$ | {1}
Key Parameters
$\phi$ | $U_{i-1} \to U_i$ | {0, 0.1, 0.2,…,0.9}, {0.5}
$\rho$ | $X_{i-1} \to X_i$ | {0.5}, {0, 0.1, 0.2,…,0.9}
$\kappa$ | $U_i \to X_i$, $U_{i-1} \to X_{i-1}$ | {0.5, 2}
$\xi$ | $X_{i-1} \to Y_i$ | {0.5, 2}
$\psi$ | $X_{i-1} \to U_i$ | {0.5, 2}

Figure 3.1. Representation of Monte Carlo Simulation Setup

Figure 3.2. Monte Carlo Results: $\kappa$=0.5 and 2, $\phi$ ranges from 0 to 1, $\rho$ = 0.5

Figure 3.3. Monte Carlo Results: $\kappa$=0.5 and 2, $\rho$ ranges from 0 to 1, $\phi$ = 0.5

Figure 3.4. Representation of Monte Carlo Simulation Setup: $X_{i-1}$ Also Has Causal Effects on $Y_i$

Figure 3.5. Monte Carlo Results: $\kappa$=0.5 and 2, $\phi$ ranges from 0 to 1; Lagged Causality on Explained Variable

Figure 3.6. Monte Carlo Results: $\kappa$=0.5 and 2, $\rho$ ranges from 0 to 1; Lagged Causality on Explained Variable

Figure 3.7. Representation of Monte Carlo Simulation Setup: $X_{i-1}$ Also Has Causal Effects on $U_i$

Figure 3.8. Monte Carlo Results: $\kappa$=0.5 and 2, $\phi$ ranges from 0 to 1; Lagged Causality on Unobserved Confounder

Figure 3.9. Monte Carlo Results: $\kappa$=0.5 and 2, $\rho$ ranges from 0 to 1; Lagged Causality on Unobserved Confounder

4. Spatially Lagged Variables as Instruments: The Spatially Local Average Treatment Effect (SLATE) in Estimation¹

YU WANG²

4.1.
Introduction

It is becoming more common to use spatially lagged variables, or technically speaking, the spatial weighting matrices, as instrumental variables (IVs) to address endogeneity concerns. This is due to the growing availability of spatial data and the lack of valid IVs. Typically, when valid IVs are lacking, the neighboring variables of the endogenous variables are used as the IVs in empirical studies (Wong et al., 2017). In the spatial econometric context, the neighboring IVs constitute the spatial weighting matrices of the endogenous variables. Few formal theoretical analyses, however, have been conducted to discuss whether the spatially lagged IV method addresses endogeneity. Valid IVs should satisfy, according to the Local Average Treatment Effects (LATE) Theorem, the independence assumption and the exclusion restriction. However, it is unknown whether the LATE Theorem takes any specific form when the spatially lagged IV is used.

In this paper, I develop the Spatially Local Average Treatment Effects (SLATE) to discuss theoretically the validity of the spatially lagged IV strategy, and I propose the SLATE theorem, which includes the spatial independence assumption and the spatial exclusion restriction. The spatial independence assumption states that there is no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the unobserved confounders, namely the external spatial exogeneity of the explanatory variable; and that there is no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the explanatory variables, namely the internal spatial exogeneity of the explanatory variables. Both the external and the internal exogeneity ensure that the spatially lagged IV has no

1 I thank Dave Donaldson for valuable comments and suggestions. All errors are the author’s.
2 Wang: Department of Applied Economics, University of Minnesota, email: wang5979@umn.edu. 70 correlation with the latent outcome of the explained variable, nor does it have correlation with the latent treatment condition. To ensure the unbiased and consistent estimate of the spatially lagged IV method, the spatially lagged IV should also satisfy the spatial exclusion restriction, which consists of the direct, and the indirect, spatial exclusion restriction. The direct spatial exclusion restriction of the spatially lagged IV means the spatially lagged IV has no direct causal impact on the explained variable, and the indirect spatial exclusion restriction of the spatially lagged IV means the spatially lagged IV has no indirect causal impact on the explained variable. Both the direct and the indirect spatial exclusion restrictions ensure that the spatially lagged IV influences the explained variable only through the explanatory variable, excluding other influencing channels of the spatially lagged IV on the explained variable. Accordingly, upon satisfying the spatial independence assumption and the spatial exclusion restriction, the spatially lagged IV estimate is unbiased and consistent. I set up a structural model to compare the average treatment effect (ATE) in the OLS estimate and the spatially local average treatment effect (SLATE) in the lagged IV estimate both qualitatively and quantitatively. In this model, the explained variable is determined by both an explanatory variable and an unobserved confounder, and perhaps also by the spatially lagged explanatory variable. The explanatory variable is determined by the spatially lagged explanatory variable and also the unobserved confounder. The unobserved confounder is determined by the spatially lagged unobserved confounder, and may also be influenced by the spatially lagged explanatory variable. 
It is found that when the spatially lagged IV estimate violates the spatial exclusion restriction or the spatial independence assumption, or both, it suffers from the endogeneity, because on one hand, the spatially lagged IV estimate has the “spatially local selection bias” in its SLATE; on the other, the “relaxed spatially local ATT” in its SLATE is different from the average treatment effect on the treated (ATT) in the OLS estimate’s ATE. I then numerically discuss the spatially local average treatment effect. I characterize the spatially local average treatment effect (SLATE) of the spatially lagged IV estimates numerically, and compare them with the average treatment effect (ATE) of the OLS estimates. I find that a valid spatially lagged IV should satisfy both the spatial independence assumption, that is, the explanatory variables should satisfy both the 71 external and the internal spatial exogeneity, and the spatial exclusion restriction, both the direct and the indirect. Accordingly, I raise the spatially local average treatment effect (SLATE) theorem. I also discuss the dynamic spatially local average treatment effect numerically. I find that when satisfying the SLATE theorem, including the spatial independence assumption, and the spatial exclusion restriction if necessary, the spatially lagged IV estimate is unbiased and consistent, even if the treatment is implemented in multiple waves. I use pioneers and stragglers, a situation commonly seen in empirical studies, as an example to explain the dynamic SLATE. My findings provide implications for applied researchers that the spatially lagged IV method, to a large extent, addresses the endogeneity concern, even if the treatment has multiple waves of implementation of the treatment. By discussing the SLATE theorem, my analysis also contributes to the credible estimates of causal inference with the LATE theorem in instrumental estimation (Angrist et al., 1996; Imbens, 2014). 
The rest of this paper is organized as follows. Section 4.2 discusses the theoretical framework. Section 4.3 presents the numerical analysis of the spatially local average treatment effect, and introduces the spatially local average treatment effect theorem. Section 4.4 numerically discusses the dynamic spatially local average treatment effect. Section 4.5 concludes.

4.2. Theoretical Framework

This section derives the spatially local average treatment effects (SLATE) in the spatially lagged IV estimation and the average treatment effects (ATE) in the OLS estimation. In light of the local average treatment effects (LATE) theorem (Angrist and Pischke, 2009), this section shows that a valid spatially lagged IV should satisfy both the independence assumption and the exclusion restriction. In a data generation process with spatial autocorrelation of the unobserved confounder and the explanatory variable, I compare the SLATE in the spatially lagged IV estimation and the ATE in the OLS estimation. I find that when the spatially lagged IV estimate violates the independence assumption, it yields an estimation bias smaller than the bias in the OLS estimation. I also find that when the spatially lagged IV estimate violates the exclusion restriction, it yields an estimation bias greater than the bias in the OLS estimation.

A. Setup

In empirical studies, the existence of unobserved confounders that influence both the explanatory variable and the explained variable is the most common source of endogeneity (Stock and Trebbi, 2003; Angrist and Krueger, 2001). Measurement errors in the explanatory variable and reverse causality between the explanatory variable and the explained variable, the other two sources of endogeneity, can also be largely attributed to unobserved confounders (Krueger, 1999; Angrist and Lang, 2004).
With the accumulating availability of spatial data sets, the spatially lagged explanatory variable is serving more frequently as the instrumental variable, namely the spatially lagged IV. The spatially lagged IV, with its standard form of the spatial weighting matrix of the endogenous explanatory variable, is commonly regarded as a valid IV. For the relevance restriction, the spatial weighting matrix implies to what extent the spatially lagged IV is correlated with the endogenous explanatory variable. For the exclusion restriction, suppose theoretically that the data generation process does not take the spatial Durbin form; if so, then the spatially lagged IV influences the explained variable only through the endogenous explanatory variable. However, when taking the unobserved confounder into consideration, it is inevitable to think about whether the spatially lagged IV works as a random shock, in other words, whether the spatially lagged IV satisfies the independence assumption. In a typical spatial econometric model with the drive of omitted variables, the unobserved confounder follows the spatial autocorrelation, and the explanatory variable may be correlated with the random error in the unobserved confounder’s spatial autocorrelation. In addition, the endogeneity of the explanatory variable may also come from the explanatory variable’s spatial autocorrelation; thus the explanatory variable may be correlated with the random error in the explanatory variable’s spatial autocorrelation. As a result, the OLS estimate in this spatial econometric model is biased and inconsistent. 
On the other hand, suppose the spatially lagged IV is not correlated with the random error in the unobserved confounder’s spatial autocorrelation, nor with the random error in the explanatory variable’s spatial autocorrelation. Then the spatially lagged IV is not correlated with the latent state of the explained variable, nor with the latent state of the explanatory variable; in other words, the spatially lagged IV works as a random shock.

To explain the above, I use the following structural model as the standard data generation process:

$$\boldsymbol{Y} = \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{U}\boldsymbol{\delta} + \boldsymbol{\epsilon} \tag{4.1}$$

where $\boldsymbol{Y}$, $\boldsymbol{X}$, $\boldsymbol{U}$ represent the explained variable, the explanatory variable and the unobserved confounder, respectively, and $\boldsymbol{\epsilon}$ represents an independent and identically distributed random error. The spatial autocorrelation function of the unobserved confounder is

$$\boldsymbol{U} = \boldsymbol{\rho}\boldsymbol{W}\boldsymbol{U} + \boldsymbol{\varphi}\boldsymbol{X} + \boldsymbol{\gamma} \tag{4.2}$$

where $\boldsymbol{W}$ represents the spatial weighting matrix, and the explanatory variable is correlated with the random error in this spatial autocorrelation process, such that $E(\boldsymbol{X}, \boldsymbol{\gamma}) \neq 0$. The spatial autocorrelation function of the explanatory variable is

$$\boldsymbol{X} = \boldsymbol{\tau}\boldsymbol{W}\boldsymbol{X} + \boldsymbol{\eta} \tag{4.3}$$

where the explanatory variable is also correlated with the random error in this spatial autocorrelation process, such that $E(\boldsymbol{X}, \boldsymbol{\eta}) \neq 0$.

The independence assumption requires that a valid instrumental variable work as a random shock. On one hand, (4.3) shows that the spatially lagged IV is not correlated with the latent state of the explanatory variable. On the other hand, when $\boldsymbol{\varphi} = \boldsymbol{0}$, $E(\boldsymbol{W}\boldsymbol{X}, \boldsymbol{\gamma}) = 0$ implies that the spatially lagged IV is not correlated with the latent state of the explained variable; when $\boldsymbol{\varphi} \neq \boldsymbol{0}$, both $E(\boldsymbol{W}\boldsymbol{X}, \boldsymbol{\gamma}) = 0$ and $E(\boldsymbol{W}\boldsymbol{X}, \boldsymbol{\eta}) = 0$ are required for the spatially lagged IV to be uncorrelated with the latent state of the explained variable. In this spatial data generation process, $\boldsymbol{W}\boldsymbol{X}$ represents a set of weighted averages of neighboring explanatory variables.
More specifically, $\boldsymbol{W}\boldsymbol{X} = \sum_{i}\left[\sum_{\sim i}\mu_{\sim i}\boldsymbol{X}_{\sim i}\right]$, where $\sum_{\sim i}\mu_{\sim i}\boldsymbol{X}_{\sim i}$ represents a weighted average of the explanatory variable over all of individual $i$’s neighboring individuals, and $\sum_{i}\left[\sum_{\sim i}\mu_{\sim i}\boldsymbol{X}_{\sim i}\right]$ represents the set summing up all those weighted averages. Therefore, $E(\boldsymbol{W}\boldsymbol{X}, \boldsymbol{\gamma}) = 0$ means there exists no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the unobserved confounders. Similarly, $E(\boldsymbol{W}\boldsymbol{X}, \boldsymbol{\eta}) = 0$ means there exists no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the explanatory variable. These two assumptions imply that in this spatial data generation process, the spatially lagged IV is not causally influenced by either the explanatory variable or the unobserved confounder, nor does the spatially lagged IV have any synchronic relationship with the explanatory variable or the unobserved confounder. In short, the spatially lagged IV works as a random shock, satisfying the independence assumption.

It is also necessary to consider whether the spatially lagged IV influences the explained variable only through the explanatory variable, in other words, whether the spatially lagged IV satisfies the exclusion restriction. In the standard data generation process above, the spatially lagged IV has no channel of influence on the explained variable other than the explanatory variable. If the spatially lagged IV also has a direct causal effect on the explained variable, it violates the exclusion restriction. In addition, if the spatially lagged IV also has an indirect causal effect on the explained variable, in other words, if the spatially lagged IV influences the explained variable through the unobserved confounder, it still violates the exclusion restriction. In these cases, the spatially lagged IV estimate may yield much larger bias and inconsistency, compared to the OLS estimate.
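As a concrete illustration of the spatial lag and the data generation process in (4.1)-(4.3), the following Python sketch builds a row-normalized weighting matrix on a ring of regions and solves the two spatial autoregressions for the explanatory variable and the unobserved confounder. The ring layout, the equal neighbor weights of 0.5, the scalar parameter values, the coefficient names `beta` and `delta`, and the helper names `ring_weights` and `simulate_spatial_dgp` are all assumptions made for illustration; the shocks are drawn independently here, so the sketch shows only the mechanics of the spatial lag, not the correlation structure that generates the endogeneity.

```python
import numpy as np

def ring_weights(n):
    """Row-normalized W for n >= 3 regions on a ring: two neighbors, weight 0.5 each."""
    W = np.zeros((n, n))
    idx = np.arange(n)
    W[idx, (idx - 1) % n] = 0.5
    W[idx, (idx + 1) % n] = 0.5
    return W

def simulate_spatial_dgp(n=200, tau=0.4, rho=0.3, phi=0.5,
                         beta=2.0, delta=1.0, seed=0):
    """Draw one realization of (4.1)-(4.3) under the stated assumptions."""
    rng = np.random.default_rng(seed)
    W = ring_weights(n)
    I = np.eye(n)
    eta, gamma, eps = rng.standard_normal((3, n))
    X = np.linalg.solve(I - tau * W, eta)              # X = tau W X + eta
    U = np.linalg.solve(I - rho * W, phi * X + gamma)  # U = rho W U + phi X + gamma
    Y = beta * X + delta * U + eps                     # Y = X beta + U delta + eps
    return {"W": W, "X": X, "U": U, "Y": Y, "eta": eta, "gamma": gamma}

d = simulate_spatial_dgp()
WX = d["W"] @ d["X"]  # spatially lagged explanatory variable, one entry per region
print(WX[:3])
```

Each entry of `WX` is the weighted average of the explanatory variable over the two neighbors of the corresponding region, which is the quantity used as the spatially lagged IV.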
To explain these cases, I keep using the structural model above as the standard data generation process, such that

$$\boldsymbol{Y} = \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{U}\boldsymbol{\delta} + \boldsymbol{\psi}\boldsymbol{W}\boldsymbol{X} + \boldsymbol{\epsilon} \tag{4.4}$$

$$\boldsymbol{U} = \boldsymbol{\rho}\boldsymbol{W}\boldsymbol{U} + \boldsymbol{\xi}\boldsymbol{W}\boldsymbol{X} + \boldsymbol{\gamma} \tag{4.5}$$

$$\boldsymbol{X} = \boldsymbol{\tau}\boldsymbol{W}\boldsymbol{X} + \boldsymbol{\eta} \tag{4.6}$$

in which I add $\boldsymbol{\psi}\boldsymbol{W}\boldsymbol{X}$, the direct causal effect of the spatially lagged IV, and $\boldsymbol{\xi}\boldsymbol{W}\boldsymbol{X}$, the indirect causal effect of the spatially lagged IV, on the explained variable. When $\boldsymbol{\psi} \neq \boldsymbol{0}$, or $\boldsymbol{\xi} \neq \boldsymbol{0}$, or both, the exclusion restriction is violated. In this spatial data generation process, $\boldsymbol{\psi} = \boldsymbol{0}$ means there exists no inter-regional direct causal effect of the explanatory variable on the explained variable, and $\boldsymbol{\xi} = \boldsymbol{0}$ means there exists no inter-regional indirect causal effect of the explanatory variable on the explained variable. As a result, this structural model shows that the inter-regional causal effect of the explanatory variable on the explained variable, namely that of the spatially lagged IV, exists only through the explanatory variable. In short, the spatially lagged IV works to the exclusion of other causal channels, satisfying the exclusion restriction.

B. The SLATE in the Spatially Lagged IV and the ATE in OLS

In this section, I conceptually discuss the extent to which the spatially lagged IV method addresses the endogeneity and the extent to which the OLS method addresses the endogeneity, by comparing the spatially local average treatment effect (SLATE) in the spatially lagged IV estimate and the average treatment effect (ATE) in the OLS estimate. In light of the local average treatment effect (LATE) theorem in Angrist and Pischke (2009), I discuss the spatially local average treatment effect in the spatially lagged IV estimation. For simplicity and without loss of generality, I assume a binary-valued explanatory variable and an explained variable with values of 1 or 0 (see footnote 3), and also assume that there is only one explanatory variable.
Denote $Y_i(e, \boldsymbol{w}\tilde{e})$ as region $i$’s latent outcome when its treatment is $X_i = e$ and its spatially lagged treatment, the spatially lagged IV, is $\boldsymbol{W}_i X_i = \sum_{\sim i}\mu_{\sim i}X_{\sim i} = \boldsymbol{w}\tilde{e}$, where $\boldsymbol{W}_i$ represents the spatial weighting vector, a row vector, of the explanatory variable of region $i$. To specify the heterogeneous causal effect of the spatially lagged IV, denote $X_{1i}$ as region $i$’s latent treatment state when $\sum_{\sim i}\mu_{\sim i}X_{\sim i} = 1$ and $X_{0i}$ as region $i$’s latent treatment state when $\sum_{\sim i}\mu_{\sim i}X_{\sim i} = 0$. Therefore, the observed treatment state is latently represented as

$$X_i = X_{0i} + (X_{1i} - X_{0i})\sum_{\sim i}\mu_{\sim i}X_{\sim i} \tag{4.7}$$

in which only one of $X_{1i}$ and $X_{0i}$ can be observed, and $(X_{1i} - X_{0i})$ represents the heterogeneous causal effect of $\sum_{\sim i}\mu_{\sim i}X_{\sim i}$, the spatially lagged IV. These notations conform to the independence assumption, which states that the instrumental variable should have no association with the latent outcome, nor should it have any association with the latent treatment state. Specifically,

$$\left[\{Y_i(e, \boldsymbol{w}\tilde{e});\ \forall e, \boldsymbol{w}\tilde{e}\},\ X_{1i},\ X_{0i}\right] \perp\!\!\!\perp \sum_{\sim i}\mu_{\sim i}X_{\sim i} \tag{4.8}$$

This implies that the effects of a spatially lagged IV should be similar to the effects of a random assignment. In other words, the spatially lagged IV should be uncorrelated with the latent outcome of the explained variable and with the latent treatment state of the explanatory variable.

3 This assumption is based on the latent index model (Heckman, 1978). Specifically, the binary values of the explanatory variable and the explained variable can be regarded as the ultimate choice, which is influenced by an unobserved decision process with the latent revenue and cost, as well as a random error, if necessary.

Similarly, the exclusion restriction states that $Y_i(e, \boldsymbol{w}\tilde{e})$ is a function of $X_i$ only, and not of $\sum_{\sim i}\mu_{\sim i}X_{\sim i}$; in other words, the spatially lagged IV influences the explained variable only through the explanatory variable.
This is denoted as

$$Y_i(e, 0) = Y_i(e, 1), \quad e = 0, 1 \tag{4.9}$$

When $\boldsymbol{\psi} \neq \boldsymbol{0}$, namely the direct causal effect of the spatially lagged IV on the explained variable exists, or $\boldsymbol{\xi} \neq \boldsymbol{0}$, namely the indirect causal effect exists, or both, the exclusion restriction is violated.

To compare the endogeneity in the spatially lagged IV estimation and in the OLS estimation, I first discuss the average treatment effect (ATE) in OLS, such that

$$\begin{aligned}
\mathbb{E}[Y_i \mid X_i = 1] - \mathbb{E}[Y_i \mid X_i = 0]
&= \mathbb{E}[Y_{1i} \mid X_i = 1] - \mathbb{E}[Y_{0i} \mid X_i = 1] + \mathbb{E}[Y_{0i} \mid X_i = 1] - \mathbb{E}[Y_{0i} \mid X_i = 0] \\
&= \mathbb{E}[Y_{1i} - Y_{0i} \mid X_i = 1] + \mathbb{E}[Y_{0i} \mid X_i = 1] - \mathbb{E}[Y_{0i} \mid X_i = 0]
\end{aligned} \tag{4.10}$$

where $\mathbb{E}[Y_{1i} - Y_{0i} \mid X_i = 1]$ is the average treatment effect on the treated (ATT), the causal effect of interest, and $\mathbb{E}[Y_{0i} \mid X_i = 1] - \mathbb{E}[Y_{0i} \mid X_i = 0]$ is the selection bias, which arises from the endogeneity suffered by the OLS estimate.

In the spatially lagged IV estimation, the spatially local average treatment effect (SLATE), in light of Angrist and Pischke (2009), is

$$\frac{\mathbb{E}[Y_i \mid \boldsymbol{W}_i X_i = 1] - \mathbb{E}[Y_i \mid \boldsymbol{W}_i X_i = 0]}{\mathbb{E}[X_i \mid \boldsymbol{W}_i X_i = 1] - \mathbb{E}[X_i \mid \boldsymbol{W}_i X_i = 0]} = \mathbb{E}[Y_{1i} - Y_{0i} \mid X_{1i} > X_{0i}]$$

when both the exclusion restriction and the independence assumption are satisfied. Here the SLATE is the causal effect of interest.

When there exists inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the unobserved confounders, or there exists inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the explanatory variables, or both, the independence assumption is violated. When there exists no inter-regional direct or indirect causal effect of the explanatory variable on the explained variable, the exclusion restriction is satisfied.
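The Wald ratio that defines the SLATE, and its contrast with the naive treated-versus-untreated comparison in (4.10), can be checked numerically in the benchmark case where both the independence assumption and the exclusion restriction hold. The Python sketch below is only an illustration under assumed values: a randomly assigned binary variable `z` stands in for the spatially lagged IV $\boldsymbol{W}_i X_i \in \{0, 1\}$, and the shares of the three response types, their treatment effects, and the baseline shift for the always-treated type are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Binary stand-in for the spatially lagged IV, assigned at random (independence holds).
z = rng.integers(0, 2, size=n)

# Hypothetical response types: 0 = treated only if z = 1, 1 = always treated, 2 = never treated.
kind = rng.choice(3, size=n, p=[0.5, 0.25, 0.25])
x1 = (kind != 2).astype(int)   # latent treatment state X_1i (when z = 1)
x0 = (kind == 1).astype(int)   # latent treatment state X_0i (when z = 0)

# Latent outcomes depend on treatment only (exclusion holds); effects differ by type.
effect = np.array([2.0, 5.0, 1.0])[kind]
y0 = rng.standard_normal(n) + (kind == 1)  # always-treated units start from a higher baseline
y1 = y0 + effect

x = x0 + (x1 - x0) * z         # observed treatment, as in (4.7)
y = y0 + (y1 - y0) * x         # observed outcome

wald = (y[z == 1].mean() - y[z == 0].mean()) / (x[z == 1].mean() - x[z == 0].mean())
local_effect = (y1 - y0)[x1 > x0].mean()        # E[Y_1i - Y_0i | X_1i > X_0i]
naive = y[x == 1].mean() - y[x == 0].mean()     # ATT plus selection bias, as in (4.10)

print(round(wald, 3), round(local_effect, 3), round(naive, 3))
```

In this design the Wald ratio is close to the effect among units whose treatment responds to the instrument, whereas the naive comparison is pulled away from it by the always-treated units' larger effect and higher baseline.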
In this scenario, it is known that

$$\mathbb{E}[Y_i \mid \boldsymbol{W}_i X_i = 1] = \mathbb{E}\left[Y_i(0, \boldsymbol{W}_i X_i) + \left(Y_i(1, \boldsymbol{W}_i X_i) - Y_i(0, \boldsymbol{W}_i X_i)\right)X_{1i} \mid \boldsymbol{W}_i X_i = 1\right] \tag{4.11}$$

Because the exclusion restriction is satisfied, it is known that $Y_i(0, \boldsymbol{W}_i X_i) = Y_{0i}$ and $Y_i(1, \boldsymbol{W}_i X_i) = Y_{1i}$. Therefore,

$$\begin{aligned}
\mathbb{E}[Y_i \mid \boldsymbol{W}_i X_i = 1]
&= \mathbb{E}[Y_{0i} + (Y_{1i} - Y_{0i})X_{1i} \mid \boldsymbol{W}_i X_i = 1] \\
&= \mathbb{E}[Y_{0i} \mid \boldsymbol{W}_i X_i = 1] + \mathbb{E}[(Y_{1i} - Y_{0i})X_{1i} \mid \boldsymbol{W}_i X_i = 1]
\end{aligned} \tag{4.12}$$

Similarly,

$$\begin{aligned}
\mathbb{E}[Y_i \mid \boldsymbol{W}_i X_i = 0]
&= \mathbb{E}\left[Y_i(0, \boldsymbol{W}_i X_i) + \left(Y_i(1, \boldsymbol{W}_i X_i) - Y_i(0, \boldsymbol{W}_i X_i)\right)X_{0i} \mid \boldsymbol{W}_i X_i = 0\right] \\
&= \mathbb{E}[Y_{0i} + (Y_{1i} - Y_{0i})X_{0i} \mid \boldsymbol{W}_i X_i = 0] \\
&= \mathbb{E}[Y_{0i} \mid \boldsymbol{W}_i X_i = 0] + \mathbb{E}[(Y_{1i} - Y_{0i})X_{0i} \mid \boldsymbol{W}_i X_i = 0]
\end{aligned} \tag{4.13}$$

As the exclusion restriction is satisfied, it is also known that

$$\mathbb{E}[Y_{0i} \mid \boldsymbol{W}_i X_i = 1] = \mathbb{E}[Y_{0i} \mid \boldsymbol{W}_i X_i = 0] \tag{4.14}$$

Therefore, the SLATE in the spatially lagged IV estimation becomes

$$\frac{\mathbb{E}[Y_i \mid \boldsymbol{W}_i X_i = 1] - \mathbb{E}[Y_i \mid \boldsymbol{W}_i X_i = 0]}{\mathbb{E}[X_i \mid \boldsymbol{W}_i X_i = 1] - \mathbb{E}[X_i \mid \boldsymbol{W}_i X_i = 0]} = \frac{\mathbb{E}[(Y_{1i} - Y_{0i})X_{1i} \mid \boldsymbol{W}_i X_i = 1] - \mathbb{E}[(Y_{1i} - Y_{0i})X_{0i} \mid \boldsymbol{W}_i X_i = 0]}{\mathbb{E}[X_{1i} \mid \boldsymbol{W}_i X_i = 1] - \mathbb{E}[X_{0i} \mid \boldsymbol{W}_i X_i = 0]} \tag{4.15}$$

which refers to the “restricted spatially local ATT”. As a result, the endogeneity of the spatially lagged IV estimate results from the “restricted spatially local ATT” in the spatially lagged IV estimate. Compared with the ATE in the OLS estimation, it is easy to see that the SLATE in the spatially lagged IV estimation does not include a selection bias, implying that the extent of endogeneity of the spatially lagged IV estimation is smaller than the extent of endogeneity of the OLS estimation, which includes $\mathbb{E}[Y_{0i} \mid X_i = 1] - \mathbb{E}[Y_{0i} \mid X_i = 0]$, a selection bias.

Because of the violation of the independence assumption, $(Y_{1i} - Y_{0i})X_{1i}$ and $(Y_{1i} - Y_{0i})X_{0i}$, the latent outcomes, are not independent of $\boldsymbol{W}_i X_i$, the spatially lagged IV; specifically, $\mathbb{E}[(Y_{1i} - Y_{0i})X_{1i} \mid \boldsymbol{W}_i X_i = 1] \neq \mathbb{E}[(Y_{1i} - Y_{0i})X_{1i}]$ and $\mathbb{E}[(Y_{1i} - Y_{0i})X_{0i} \mid \boldsymbol{W}_i X_i = 0] \neq \mathbb{E}[(Y_{1i} - Y_{0i})X_{0i}]$.
For the same reason, $X_{1i}$ and $X_{0i}$, the latent treatment states, are also not independent of $\mathbf{W}_i X_i$; specifically, $\mathbb{E}[X_{1i} \mid \mathbf{W}_i X_i = 1] \neq \mathbb{E}[X_{1i}]$ and $\mathbb{E}[X_{0i} \mid \mathbf{W}_i X_i = 0] \neq \mathbb{E}[X_{0i}]$. As a result, the SLATE cannot be simplified to

$$\frac{\mathbb{E}[Y_i \mid \mathbf{W}_i X_i = 1] - \mathbb{E}[Y_i \mid \mathbf{W}_i X_i = 0]}{\mathbb{E}[X_i \mid \mathbf{W}_i X_i = 1] - \mathbb{E}[X_i \mid \mathbf{W}_i X_i = 0]} = \mathbb{E}[Y_{1i} - Y_{0i} \mid X_{1i} > X_{0i}].$$

When there exists no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the unobserved confounders, and no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the explanatory variables themselves, the independence assumption is satisfied. However, when there exists an inter-regional direct or indirect causal effect of the explanatory variable on the explained variable, the exclusion restriction is violated. In this scenario, to derive the SLATE,

$$\frac{\mathbb{E}[Y_i \mid \mathbf{W}_i X_i = 1] - \mathbb{E}[Y_i \mid \mathbf{W}_i X_i = 0]}{\mathbb{E}[X_i \mid \mathbf{W}_i X_i = 1] - \mathbb{E}[X_i \mid \mathbf{W}_i X_i = 0]},$$

it is known that

$$\mathbb{E}[Y_i \mid \mathbf{W}_i X_i = 1] = \mathbb{E}\big[Y_i(0, \mathbf{W}_i X_i) + \big(Y_i(1, \mathbf{W}_i X_i) - Y_i(0, \mathbf{W}_i X_i)\big) X_{1i} \mid \mathbf{W}_i X_i = 1\big] \tag{4.16}$$

Because the independence assumption is satisfied, it is known that

$$\mathbb{E}\big[Y_i(0, \mathbf{W}_i X_i) + \big(Y_i(1, \mathbf{W}_i X_i) - Y_i(0, \mathbf{W}_i X_i)\big) X_{1i} \mid \mathbf{W}_i X_i = 1\big] = \mathbb{E}\big[Y_i(0, \mathbf{W}_i X_i) + \big(Y_i(1, \mathbf{W}_i X_i) - Y_i(0, \mathbf{W}_i X_i)\big) X_{1i}\big] \tag{4.17}$$

Therefore,

$$\mathbb{E}[Y_i \mid \mathbf{W}_i X_i = 1] = \mathbb{E}\big[Y_i(0, \mathbf{W}_i X_i) + \big(Y_i(1, \mathbf{W}_i X_i) - Y_i(0, \mathbf{W}_i X_i)\big) X_{1i}\big] \tag{4.18}$$

Similarly,

$$\begin{aligned} \mathbb{E}[Y_i \mid \mathbf{W}_i X_i = 0] &= \mathbb{E}\big[Y_i(0, \mathbf{W}_i X_i) + \big(Y_i(1, \mathbf{W}_i X_i) - Y_i(0, \mathbf{W}_i X_i)\big) X_{0i} \mid \mathbf{W}_i X_i = 0\big] \\ &= \mathbb{E}\big[Y_i(0, \mathbf{W}_i X_i) + \big(Y_i(1, \mathbf{W}_i X_i) - Y_i(0, \mathbf{W}_i X_i)\big) X_{0i}\big] \end{aligned} \tag{4.19}$$

In addition, satisfying the independence assumption implies that

$$\mathbb{E}[X_i \mid \mathbf{W}_i X_i = 1] - \mathbb{E}[X_i \mid \mathbf{W}_i X_i = 0] = \mathbb{E}[X_{1i} \mid \mathbf{W}_i X_i = 1] - \mathbb{E}[X_{0i} \mid \mathbf{W}_i X_i = 0] = \mathbb{E}[X_{1i} - X_{0i}]$$

Therefore, the SLATE in the spatially lagged IV estimation becomes

$$\begin{aligned} \frac{\mathbb{E}[Y_i \mid \mathbf{W}_i X_i = 1] - \mathbb{E}[Y_i \mid \mathbf{W}_i X_i = 0]}{\mathbb{E}[X_i \mid \mathbf{W}_i X_i = 1] - \mathbb{E}[X_i \mid \mathbf{W}_i X_i = 0]} &= \frac{\mathbb{E}\big[\big(Y_i(1, \mathbf{W}_i X_i) - Y_i(0, \mathbf{W}_i X_i)\big) X_{1i}\big] - \mathbb{E}\big[\big(Y_i(1, \mathbf{W}_i X_i) - Y_i(0, \mathbf{W}_i X_i)\big) X_{0i}\big]}{\mathbb{E}[X_{1i} - X_{0i}]} \\ &= \frac{\mathbb{E}\big[\big(Y_i(1, \mathbf{W}_i X_i) - Y_i(0, \mathbf{W}_i X_i)\big)(X_{1i} - X_{0i})\big]}{\mathbb{E}[X_{1i} - X_{0i}]} \\ &= \mathbb{E}[Y_i(1, \mathbf{W}_i X_i) - Y_i(0, \mathbf{W}_i X_i)] \\ &= \mathbb{E}[Y_{1i} \mid X_i = 1, \mathbf{W}_i X_i] - \mathbb{E}[Y_{0i} \mid X_i = 1, \mathbf{W}_i X_i] + \mathbb{E}[Y_{0i} \mid X_i = 1, \mathbf{W}_i X_i] - \mathbb{E}[Y_{0i} \mid X_i = 0, \mathbf{W}_i X_i] \\ &= \mathbb{E}[Y_{1i} - Y_{0i} \mid X_i = 1, \mathbf{W}_i X_i] + \mathbb{E}[Y_{0i} \mid X_i = 1, \mathbf{W}_i X_i] - \mathbb{E}[Y_{0i} \mid X_i = 0, \mathbf{W}_i X_i] \end{aligned} \tag{4.20}$$

in which $\mathbb{E}[Y_{1i} - Y_{0i} \mid X_i = 1, \mathbf{W}_i X_i]$ is the “relaxed spatially local ATT”, and $\mathbb{E}[Y_{0i} \mid X_i = 1, \mathbf{W}_i X_i] - \mathbb{E}[Y_{0i} \mid X_i = 0, \mathbf{W}_i X_i]$ is the “spatially local selection bias”. As a result, the endogeneity of the spatially lagged IV estimate results from the “spatially local selection bias” in the spatially lagged IV estimate. Compared with the ATE in the OLS estimation, the SLATE in the spatially lagged IV estimation includes a spatially local selection bias, which could be greater than the selection bias in the OLS estimate. This implies that the extent of endogeneity of the spatially lagged IV estimation could be greater than that of the OLS estimation.

When there exists inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the unobserved confounders, or inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the explanatory variables themselves, or both, the independence assumption is violated. In addition, when there exists an inter-regional direct or indirect causal effect of the explanatory variable on the explained variable, the exclusion restriction is also violated.
In this scenario, to derive the SLATE,

$$\frac{\mathbb{E}[Y_i \mid \mathbf{W}_i X_i = 1] - \mathbb{E}[Y_i \mid \mathbf{W}_i X_i = 0]}{\mathbb{E}[X_i \mid \mathbf{W}_i X_i = 1] - \mathbb{E}[X_i \mid \mathbf{W}_i X_i = 0]},$$

it is known that

$$\begin{aligned} \mathbb{E}[Y_i \mid \mathbf{W}_i X_i = 1] &= \mathbb{E}\big[Y_i(0, \mathbf{W}_i X_i) + \big(Y_i(1, \mathbf{W}_i X_i) - Y_i(0, \mathbf{W}_i X_i)\big) X_i \mid \mathbf{W}_i X_i = 1\big] \\ &= \mathbb{E}[Y_i(0, \mathbf{W}_i X_i) \mid \mathbf{W}_i X_i = 1] + \mathbb{E}\big[\big(Y_i(1, \mathbf{W}_i X_i) - Y_i(0, \mathbf{W}_i X_i)\big) X_i \mid \mathbf{W}_i X_i = 1\big] \end{aligned} \tag{4.21}$$

and similarly,

$$\begin{aligned} \mathbb{E}[Y_i \mid \mathbf{W}_i X_i = 0] &= \mathbb{E}\big[Y_i(0, \mathbf{W}_i X_i) + \big(Y_i(1, \mathbf{W}_i X_i) - Y_i(0, \mathbf{W}_i X_i)\big) X_i \mid \mathbf{W}_i X_i = 0\big] \\ &= \mathbb{E}[Y_i(0, \mathbf{W}_i X_i) \mid \mathbf{W}_i X_i = 0] + \mathbb{E}\big[\big(Y_i(1, \mathbf{W}_i X_i) - Y_i(0, \mathbf{W}_i X_i)\big) X_i \mid \mathbf{W}_i X_i = 0\big] \end{aligned} \tag{4.22}$$

Therefore, the SLATE becomes the sum of the “relaxed spatially local ATT” in the spatially lagged IV estimate, that is,

$$\frac{\mathbb{E}\big[\big(Y_i(1, \mathbf{W}_i X_i) - Y_i(0, \mathbf{W}_i X_i)\big) X_i \mid \mathbf{W}_i X_i = 1\big] - \mathbb{E}\big[\big(Y_i(1, \mathbf{W}_i X_i) - Y_i(0, \mathbf{W}_i X_i)\big) X_i \mid \mathbf{W}_i X_i = 0\big]}{\mathbb{E}[X_i \mid \mathbf{W}_i X_i = 1] - \mathbb{E}[X_i \mid \mathbf{W}_i X_i = 0]}$$

and the “spatially local selection bias” in the spatially lagged IV estimate, that is,

$$\frac{\mathbb{E}[Y_i(0, \mathbf{W}_i X_i) \mid \mathbf{W}_i X_i = 1] - \mathbb{E}[Y_i(0, \mathbf{W}_i X_i) \mid \mathbf{W}_i X_i = 0]}{\mathbb{E}[X_i \mid \mathbf{W}_i X_i = 1] - \mathbb{E}[X_i \mid \mathbf{W}_i X_i = 0]}.$$

As a result, when both the independence assumption and the exclusion restriction are violated, the endogeneity of the spatially lagged IV estimate is due to the “spatially local selection bias” and the “relaxed spatially local ATT” in the spatially lagged IV estimate. Compared with the ATE in the OLS estimation, it is easy to see that the SLATE in the spatially lagged IV estimate, which violates both the independence assumption and the exclusion restriction, includes a “spatially local selection bias”, which could be greater than the selection bias in the OLS estimation.
Moreover, the SLATE in the spatially lagged IV estimate, when both the independence assumption and the exclusion restriction are violated, also includes the “relaxed spatially local ATT”, which differs from the “restricted spatially local ATT” that arises when only the independence assumption is violated. Together, these imply that the extent of endogeneity of the spatially lagged IV estimate that violates both the independence assumption and the exclusion restriction is greater than that of the spatially lagged IV estimate that violates only the independence assumption, and could also be greater than that of OLS.

To sum up, the OLS estimate suffers from endogeneity because it has the selection bias in its ATE. When the spatially lagged IV estimate violates only the spatial independence assumption, it suffers from endogeneity because the “restricted spatially local ATT” in its SLATE is different from the ATT in the OLS estimate’s ATE. When the spatially lagged IV estimate violates only the spatial exclusion restriction, it suffers from endogeneity because it has the “spatially local selection bias” in its SLATE. When the spatially lagged IV estimate violates both the spatial exclusion restriction and the spatial independence assumption, it suffers from endogeneity because, on one hand, it has the “spatially local selection bias” in its SLATE and, on the other, the “relaxed spatially local ATT” in its SLATE is different from the ATT in the OLS estimate’s ATE.

4.3. The Numerical Spatially Local Average Treatment Effects (SLATE)

In this section, I characterize the spatially local average treatment effects (SLATE) of the spatially lagged IV estimates numerically and compare them with the average treatment effect (ATE) of the OLS estimates.
I demonstrate the spatially local average treatment effect theorem numerically, especially its two key properties: the spatial independence assumption and the spatial exclusion restriction. I find that a valid spatially lagged IV should satisfy the spatial independence assumption; that is, the explanatory variables should satisfy both the external and the internal spatial exogeneity. It should also satisfy the spatial exclusion restriction, both the direct and the indirect.

A. The Spatial Independence

The first key property of a valid spatially lagged IV is the Spatial Independence Assumption. This property entails (1) the external spatial exogeneity of the explanatory variables, which means there is no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the unobserved confounders, and (2) the internal exogeneity of the explanatory variables, which means there is no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the explanatory variables. Together, the external and the internal exogeneity ensure that $\mathbf{W}\mathbf{X}$ is correlated with neither the latent outcome nor the latent treatment condition.

The external exogeneity. To understand the external exogeneity of the spatially lagged IV, I start with the following standard spatial data generation process

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{U}\boldsymbol{\delta} + \boldsymbol{\epsilon} \tag{4.23}$$

$$\mathbf{U} = \boldsymbol{\rho}\mathbf{W}\mathbf{U} + \boldsymbol{\gamma} \tag{4.24}$$

$$\mathbf{X} = \boldsymbol{\tau}\mathbf{W}\mathbf{X} + \boldsymbol{\eta} \tag{4.25}$$

where $\mathbf{U}$ is the unobserved confounder. $\mathbf{U}$ follows a spatial data generation process, and is derived as

$$\mathbf{U} = (\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\boldsymbol{\gamma} \tag{4.26}$$

where $(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}$ is assumed to exist. This means the endogeneity from the unobserved confounder is due to the spatial autocorrelation of the unobserved confounders. $\mathbf{W}$ is the $n \times n$ spatial weighting matrix with elements $\mu_{ij}$. Let $\mu_{ii} = 0$, and assume that each row of the spatial weighting matrix sums to 1. $\mathbf{W}$ is symmetric, that is, $\mathbf{W}' = \mathbf{W}$.
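As a concrete illustration of the objects just defined, the following is a minimal sketch, not the dissertation's own code, of one weighting matrix that satisfies all three stated conditions at once (zero diagonal, rows summing to 1, symmetry), together with the reduced form (4.26). The ring layout, the function names `ring_weights` and `reduced_form_U`, and the values of `n` and `rho` are assumptions made only for illustration.

```python
import numpy as np

def ring_weights(n: int) -> np.ndarray:
    """Hypothetical example of W: regions on a ring, each with two neighbors weighted 1/2.

    Requires n >= 3 so that the two neighbors of a region are distinct.
    """
    W = np.zeros((n, n))
    idx = np.arange(n)
    W[idx, (idx - 1) % n] = 0.5  # left neighbor
    W[idx, (idx + 1) % n] = 0.5  # right neighbor
    return W

def reduced_form_U(W: np.ndarray, rho: float, gamma: np.ndarray) -> np.ndarray:
    """Solve U = rho * W U + gamma, i.e. U = (I - rho W)^{-1} gamma as in (4.26)."""
    n = W.shape[0]
    # Solving the linear system avoids forming the inverse explicitly.
    return np.linalg.solve(np.eye(n) - rho * W, gamma)

rng = np.random.default_rng(0)
n, rho = 6, 0.4                  # illustrative values only
W = ring_weights(n)
gamma = rng.normal(size=n)       # disturbances of the unobserved confounder
U = reduced_form_U(W, rho, gamma)
```

With a row-normalized $\mathbf{W}$ and $|\rho| < 1$, the matrix $\mathbf{I}_n - \rho\mathbf{W}$ is invertible, which is the existence condition assumed in the text.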
Therefore, (4.23) becomes

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\delta}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\boldsymbol{\gamma} + \boldsymbol{\epsilon} \tag{4.27}$$

Assumption 1 discusses the external exogeneity of the explanatory variables in the spatial estimation.

ASSUMPTION 1 (The external spatial exogeneity): There exists no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the unobserved confounders, or specifically, $x_{ik}\gamma_j = 0$ when $i \neq j$, although there may exist intra-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the unobserved confounders, or specifically, $x_{ik}\gamma_j \neq 0$ when $i = j$.

As a result of $x_{ik}\gamma_j \neq 0$ when $i = j$, $\hat{\boldsymbol{\beta}}$, the OLS estimate, is biased and inconsistent. Specifically,

$$\begin{aligned} \hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} &= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\big[\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\delta}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\boldsymbol{\gamma} + \boldsymbol{\epsilon}\big] \\ &= \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\boldsymbol{\delta}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\mathbf{X}'\boldsymbol{\gamma} \end{aligned}$$

Given that Assumption 1 allows $x_{ik}\gamma_j \neq 0$ when $i = j$, it is known that

$$E(\mathbf{X}'\boldsymbol{\gamma}) = \begin{bmatrix} x_{11} & \cdots & x_{i1} & \cdots & x_{N1} \\ \vdots & \ddots & \vdots & & \vdots \\ x_{1k} & \cdots & x_{ik} & \cdots & x_{Nk} \\ \vdots & & \vdots & \ddots & \vdots \\ x_{1K} & \cdots & x_{iK} & \cdots & x_{NK} \end{bmatrix} \cdot \begin{bmatrix} \gamma_1 \\ \vdots \\ \gamma_j \\ \vdots \\ \gamma_N \end{bmatrix} = \sum_{k=1}^{K}\Big(\sum_{i=1,\,j=1}^{N} x_{ik}\gamma_j\Big) \neq 0$$

Therefore, $E(\hat{\boldsymbol{\beta}}) \neq \boldsymbol{\beta}$. Similarly, it is known that $\operatorname{plim}_{n\to\infty} \mathbf{X}'\boldsymbol{\gamma}/n \neq 0$. Therefore, $\operatorname{plim}_{n\to\infty} \hat{\boldsymbol{\beta}} \neq \boldsymbol{\beta}$; in other words, the OLS estimate is biased and inconsistent. ∎

Given the external spatial exogeneity of the explanatory variables, Proposition 1 demonstrates the conditions from which the endogeneity of the spatially lagged IV estimate is derived, and how these conditions make the spatially lagged IV valid.

PROPOSITION 1 (The Special Spatial Independence Assumption): When there exists no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the unobserved confounders, the spatially lagged IV estimate is unbiased and consistent.
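Proof: Using $\mathbf{W}\mathbf{X}$, the spatially lagged explanatory variables, as the instrumental variables, namely the spatially lagged IV, it is derived that

$$\begin{aligned} \hat{\boldsymbol{\beta}}_{IV} = [(\mathbf{W}\mathbf{X})'\mathbf{X}]^{-1}(\mathbf{W}\mathbf{X})'\mathbf{Y} &= (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\mathbf{Y} \\ &= (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\big[\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\delta}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\boldsymbol{\gamma} + \boldsymbol{\epsilon}\big] \\ &= \boldsymbol{\beta} + (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\boldsymbol{\delta}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\mathbf{X}'\mathbf{W}\boldsymbol{\gamma} \end{aligned}$$

Given that Assumption 1 states $x_{ik}\gamma_j = 0$ when $i \neq j$, it is known that

$$\begin{aligned} E(\mathbf{X}'\mathbf{W}\boldsymbol{\gamma}) &= \begin{bmatrix} x_{11} & \cdots & x_{i1} & \cdots & x_{N1} \\ \vdots & \ddots & \vdots & & \vdots \\ x_{1k} & \cdots & x_{ik} & \cdots & x_{Nk} \\ \vdots & & \vdots & \ddots & \vdots \\ x_{1K} & \cdots & x_{iK} & \cdots & x_{NK} \end{bmatrix} \cdot \begin{bmatrix} \mu_{11} & \cdots & \mu_{i1} & \cdots & \mu_{N1} \\ \vdots & \ddots & \vdots & & \vdots \\ \mu_{1j} & \cdots & \mu_{ij} & \cdots & \mu_{Nj} \\ \vdots & & \vdots & \ddots & \vdots \\ \mu_{1N} & \cdots & \mu_{iN} & \cdots & \mu_{NN} \end{bmatrix} \cdot \begin{bmatrix} \gamma_1 \\ \vdots \\ \gamma_j \\ \vdots \\ \gamma_N \end{bmatrix} \\ &= \sum_{k=1}^{K}\sum_{j=1}^{N}\Big(\sum_{i=1}^{N} x_{ik}\mu_{ij}\gamma_j\Big) \\ &= \sum_{k=1}^{K}\sum_{j=1}^{N}\Big(\sum_{i=1,\,i=j}^{N} \mu_{ij}x_{ik}\gamma_j\Big) + \sum_{k=1}^{K}\sum_{j=1}^{N}\Big(\sum_{i=1,\,i\neq j}^{N} \mu_{ij}x_{ik}\gamma_j\Big) \end{aligned}$$

As is assumed, $\mu_{ij} = 0$ when $i = j$; therefore, $\sum_{k=1}^{K}\sum_{j=1}^{N}\big(\sum_{i=1,\,i=j}^{N} \mu_{ij}x_{ik}\gamma_j\big) = 0$. In addition, as Assumption 1 states, $x_{ik}\gamma_j = 0$ when $i \neq j$; therefore, $\sum_{k=1}^{K}\sum_{j=1}^{N}\big(\sum_{i=1,\,i\neq j}^{N} \mu_{ij}x_{ik}\gamma_j\big) = 0$. Accordingly, it is known that $E(\mathbf{X}'\mathbf{W}\boldsymbol{\gamma}) = 0$, and thus $E(\hat{\boldsymbol{\beta}}_{IV}) = \boldsymbol{\beta}$. Similarly, it is also known that $\operatorname{plim}_{n\to\infty} \mathbf{X}'\mathbf{W}\boldsymbol{\gamma}/n = 0$. Therefore, $\operatorname{plim}_{n\to\infty} \hat{\boldsymbol{\beta}}_{IV} = \boldsymbol{\beta}$; in other words, the spatially lagged IV estimate is unbiased and consistent. ∎

A typical assumption in the spatial data generation process is that $\mu_{ij} > 0$ when observation $i$ and observation $j$ neighbor each other; otherwise $\mu_{ij} = 0$. In this case, Proposition 1 still holds. For the proof, see the Appendix.

The implication of the external exogeneity is that the spatially lagged IV works as a random shock. Specifically, on one hand, $\mathbf{W}\mathbf{X}$, the spatially lagged explanatory variable, has no correlation with $\boldsymbol{\gamma}$, the disturbances in the spatial autocorrelation of the unobserved confounders; therefore, $\mathbf{W}\mathbf{X}$ has no correlation with the latent outcome.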
On the other hand, (4.25) implies that $\mathbf{W}\mathbf{X}$ has a one-way causal effect on $\mathbf{X}$; therefore, $\mathbf{W}\mathbf{X}$ has no correlation with the latent treatment condition.

The internal exogeneity. The standard spatial model only assumes that the explanatory variables have no inter-regional correlation with the disturbances in the spatial autocorrelation of the unobserved confounders. However, when the unobserved confounder is influenced not only by its own spatial lags but also by the explanatory variables, it is also indispensable to assume that no inter-regional correlation exists between the explanatory variables and the disturbances in the spatial autocorrelation of the explanatory variables themselves. This is because, when the unobserved confounder is influenced by the explanatory variables, the endogeneity from the unobserved confounder is due not only to the spatial autocorrelation of the unobserved confounders, but also to the spatial autocorrelation of the explanatory variables. To discuss this internal exogeneity, I extend the standard spatial data generation process as

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{U}\boldsymbol{\delta} + \boldsymbol{\epsilon} \tag{4.28}$$

$$\mathbf{U} = \boldsymbol{\rho}\mathbf{W}\mathbf{U} + \boldsymbol{\varphi}\mathbf{X} + \boldsymbol{\gamma} \tag{4.29}$$

$$\mathbf{X} = \boldsymbol{\tau}\mathbf{W}\mathbf{X} + \boldsymbol{\eta} \tag{4.30}$$

$\mathbf{U}$ follows a spatial data generation process, and is derived as

$$\mathbf{U} = (\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}(\boldsymbol{\varphi}\mathbf{X} + \boldsymbol{\gamma}) \tag{4.31}$$

and $\mathbf{X}$ also follows a spatial data generation process, and is derived as

$$\mathbf{X} = (\mathbf{I}_n - \boldsymbol{\tau}\mathbf{W})^{-1}\boldsymbol{\eta} \tag{4.32}$$

where $(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}$ and $(\mathbf{I}_n - \boldsymbol{\tau}\mathbf{W})^{-1}$ are assumed to exist. Therefore, $\mathbf{U}$ becomes $(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}[\boldsymbol{\varphi}(\mathbf{I}_n - \boldsymbol{\tau}\mathbf{W})^{-1}\boldsymbol{\eta} + \boldsymbol{\gamma}]$, which means the endogeneity from the unobserved confounder is due not only to the spatial autocorrelation of the unobserved confounders, but also to the spatial autocorrelation of the explanatory variables. Then $\mathbf{Y}$ becomes

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\delta}\big[(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\big(\boldsymbol{\varphi}(\mathbf{I}_n - \boldsymbol{\tau}\mathbf{W})^{-1}\boldsymbol{\eta} + \boldsymbol{\gamma}\big)\big] + \boldsymbol{\epsilon} \tag{4.33}$$

Assumption 2 discusses the internal exogeneity of the explanatory variables.
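ASSUMPTION 2 (The internal exogeneity): There exists no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the explanatory variables themselves, or specifically, $x_{ik}\eta_j = 0$ when $i \neq j$, although there may exist intra-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the explanatory variables themselves, or specifically, $x_{ik}\eta_j \neq 0$ when $i = j$.

As a result of $x_{ik}\eta_j \neq 0$ when $i = j$, $\hat{\boldsymbol{\beta}}$, the OLS estimate, is biased and inconsistent. Specifically,

$$\begin{aligned} \hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} &= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\Big[\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\delta}\big[(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\big(\boldsymbol{\varphi}(\mathbf{I}_n - \boldsymbol{\tau}\mathbf{W})^{-1}\boldsymbol{\eta} + \boldsymbol{\gamma}\big)\big] + \boldsymbol{\epsilon}\Big] \\ &= \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\delta}\big[(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\boldsymbol{\varphi}(\mathbf{I}_n - \boldsymbol{\tau}\mathbf{W})^{-1}\boldsymbol{\eta} + (\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\boldsymbol{\gamma}\big] \\ &= \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\boldsymbol{\delta}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\boldsymbol{\varphi}(\mathbf{I}_n - \boldsymbol{\tau}\mathbf{W})^{-1}\mathbf{X}'\boldsymbol{\eta} + (\mathbf{X}'\mathbf{X})^{-1}\boldsymbol{\delta}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\mathbf{X}'\boldsymbol{\gamma} \end{aligned}$$

As discussed before, given that Assumption 1 allows $x_{ik}\gamma_j \neq 0$ when $i = j$, it is known that $E(\mathbf{X}'\boldsymbol{\gamma}) \neq 0$ and $\operatorname{plim}_{n\to\infty} \mathbf{X}'\boldsymbol{\gamma}/n \neq 0$. Similarly, given that Assumption 2 allows $x_{ik}\eta_j \neq 0$ when $i = j$, it is known that

$$E(\mathbf{X}'\boldsymbol{\eta}) = \begin{bmatrix} x_{11} & \cdots & x_{i1} & \cdots & x_{N1} \\ \vdots & \ddots & \vdots & & \vdots \\ x_{1k} & \cdots & x_{ik} & \cdots & x_{Nk} \\ \vdots & & \vdots & \ddots & \vdots \\ x_{1K} & \cdots & x_{iK} & \cdots & x_{NK} \end{bmatrix} \cdot \begin{bmatrix} \eta_1 \\ \vdots \\ \eta_j \\ \vdots \\ \eta_N \end{bmatrix} = \sum_{k=1}^{K}\Big(\sum_{i=1,\,j=1}^{N} x_{ik}\eta_j\Big) \neq 0$$

A similar derivation shows that $\operatorname{plim}_{n\to\infty} \mathbf{X}'\boldsymbol{\eta}/n \neq 0$. As a result, $E(\hat{\boldsymbol{\beta}}) \neq \boldsymbol{\beta}$ and $\operatorname{plim}_{n\to\infty} \hat{\boldsymbol{\beta}} \neq \boldsymbol{\beta}$; in other words, the OLS estimate is biased and inconsistent. ∎

Given both the external and the internal exogeneity of the explanatory variables, Proposition 2 demonstrates the conditions from which the endogeneity of the spatially lagged IV estimate is derived, and how these conditions make the spatially lagged IV valid.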
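PROPOSITION 2 (The General Spatial Independence Assumption): When there exists no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the unobserved confounders, and no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the explanatory variables themselves, the spatially lagged IV estimate is unbiased and consistent.

Proof: Using $\mathbf{W}\mathbf{X}$, the spatially lagged explanatory variables, as the instrumental variables, namely the spatially lagged IV, it is derived that

$$\begin{aligned} \hat{\boldsymbol{\beta}}_{IV} = [(\mathbf{W}\mathbf{X})'\mathbf{X}]^{-1}(\mathbf{W}\mathbf{X})'\mathbf{Y} &= (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\mathbf{Y} \\ &= (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\Big[\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\delta}\big[(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}(\boldsymbol{\varphi}\mathbf{X} + \boldsymbol{\gamma})\big] + \boldsymbol{\epsilon}\Big] \\ &= \boldsymbol{\beta} + (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\boldsymbol{\delta}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\mathbf{X}'\mathbf{W}\big[\boldsymbol{\varphi}(\mathbf{I}_n - \boldsymbol{\tau}\mathbf{W})^{-1}\boldsymbol{\eta} + \boldsymbol{\gamma}\big] \\ &= \boldsymbol{\beta} + (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\boldsymbol{\delta}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\mathbf{X}'\mathbf{W}\boldsymbol{\gamma} + (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\boldsymbol{\delta}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\boldsymbol{\varphi}(\mathbf{I}_n - \boldsymbol{\tau}\mathbf{W})^{-1}\mathbf{X}'\mathbf{W}\boldsymbol{\eta} \end{aligned}$$

As discussed before, given that Assumption 1 states $x_{ik}\gamma_j = 0$ when $i \neq j$, it is known that $E(\mathbf{X}'\mathbf{W}\boldsymbol{\gamma}) = 0$ and $\operatorname{plim}_{n\to\infty} \mathbf{X}'\mathbf{W}\boldsymbol{\gamma}/n = 0$. Similarly, given that Assumption 2 states $x_{ik}\eta_j = 0$ when $i \neq j$, it is known that

$$\begin{aligned} E(\mathbf{X}'\mathbf{W}\boldsymbol{\eta}) &= \begin{bmatrix} x_{11} & \cdots & x_{i1} & \cdots & x_{N1} \\ \vdots & \ddots & \vdots & & \vdots \\ x_{1k} & \cdots & x_{ik} & \cdots & x_{Nk} \\ \vdots & & \vdots & \ddots & \vdots \\ x_{1K} & \cdots & x_{iK} & \cdots & x_{NK} \end{bmatrix} \cdot \begin{bmatrix} \mu_{11} & \cdots & \mu_{i1} & \cdots & \mu_{N1} \\ \vdots & \ddots & \vdots & & \vdots \\ \mu_{1j} & \cdots & \mu_{ij} & \cdots & \mu_{Nj} \\ \vdots & & \vdots & \ddots & \vdots \\ \mu_{1N} & \cdots & \mu_{iN} & \cdots & \mu_{NN} \end{bmatrix} \cdot \begin{bmatrix} \eta_1 \\ \vdots \\ \eta_j \\ \vdots \\ \eta_N \end{bmatrix} \\ &= \sum_{k=1}^{K}\sum_{j=1}^{N}\Big(\sum_{i=1}^{N} x_{ik}\mu_{ij}\eta_j\Big) \\ &= \sum_{k=1}^{K}\sum_{j=1}^{N}\Big(\sum_{i=1,\,i=j}^{N} \mu_{ij}x_{ik}\eta_j\Big) + \sum_{k=1}^{K}\sum_{j=1}^{N}\Big(\sum_{i=1,\,i\neq j}^{N} \mu_{ij}x_{ik}\eta_j\Big) \end{aligned}$$

As is assumed, $\mu_{ij} = 0$ when $i = j$; therefore, $\sum_{k=1}^{K}\sum_{j=1}^{N}\big(\sum_{i=1,\,i=j}^{N} \mu_{ij}x_{ik}\eta_j\big) = 0$. As Assumption 2 states, $x_{ik}\eta_j = 0$ when $i \neq j$; therefore, $\sum_{k=1}^{K}\sum_{j=1}^{N}\big(\sum_{i=1,\,i\neq j}^{N} \mu_{ij}x_{ik}\eta_j\big) = 0$.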
Accordingly, it is known that $E(\mathbf{X}'\mathbf{W}\boldsymbol{\eta}) = 0$. Because $E(\mathbf{X}'\mathbf{W}\boldsymbol{\gamma}) = 0$ and $E(\mathbf{X}'\mathbf{W}\boldsymbol{\eta}) = 0$, $E(\hat{\boldsymbol{\beta}}_{IV}) = \boldsymbol{\beta}$. A similar derivation also shows that $\operatorname{plim}_{n\to\infty} \mathbf{X}'\mathbf{W}\boldsymbol{\eta}/n = 0$. Because $\operatorname{plim}_{n\to\infty} \mathbf{X}'\mathbf{W}\boldsymbol{\gamma}/n = 0$ as well, $\operatorname{plim}_{n\to\infty} \hat{\boldsymbol{\beta}}_{IV} = \boldsymbol{\beta}$; in other words, the spatially lagged IV estimate is unbiased and consistent. ∎

A typical assumption in the spatial data generation process is that $\mu_{ij} > 0$ when observation $i$ and observation $j$ neighbor each other; otherwise $\mu_{ij} = 0$. In this case, Proposition 2 still holds. For the proof, see the Appendix.

The internal exogeneity also implies that the spatially lagged IV works as a random shock. Specifically, $\mathbf{W}\mathbf{X}$, the spatially lagged explanatory variable, is correlated neither with $\boldsymbol{\gamma}$ nor with $\boldsymbol{\eta}$, the disturbances in the spatial autocorrelation of the explanatory variables; therefore, $\mathbf{W}\mathbf{X}$ has no correlation with the latent outcome. In addition, the one-way causal effect of $\mathbf{W}\mathbf{X}$ on $\mathbf{X}$ means that $\mathbf{W}\mathbf{X}$ has no correlation with the latent treatment condition.

B. The Spatial Exclusion

The second key property of a valid spatially lagged IV is the Spatial Exclusion Restriction. This property entails (1) the direct spatial exclusion restriction of the spatially lagged IV, which means the spatially lagged IV has no direct causal impact on the explained variable, and (2) the indirect spatial exclusion restriction of the spatially lagged IV, which means the spatially lagged IV has no indirect causal impact on the explained variable. Together, the direct and the indirect spatial exclusion restrictions ensure that the spatially lagged IV influences the explained variable only through the explanatory variable, ruling out any other channel from the spatially lagged IV to the explained variable.

The direct spatial exclusion restriction.
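To understand the direct spatial exclusion restriction of the spatially lagged IV, I add $\boldsymbol{\psi}\mathbf{W}\mathbf{X}$, the direct causal impact of the spatially lagged IV on the explained variable, to the standard spatial data generation process, such that

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{U}\boldsymbol{\delta} + \boldsymbol{\psi}\mathbf{W}\mathbf{X} + \boldsymbol{\epsilon} \tag{4.34}$$

$$\mathbf{U} = \boldsymbol{\rho}\mathbf{W}\mathbf{U} + \boldsymbol{\gamma} \tag{4.35}$$

$$\mathbf{X} = \boldsymbol{\tau}\mathbf{W}\mathbf{X} + \boldsymbol{\eta} \tag{4.36}$$

To derive the OLS estimate, we have

$$\begin{aligned} \hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} &= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\big[\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\delta}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\boldsymbol{\gamma} + \boldsymbol{\psi}\mathbf{W}\mathbf{X} + \boldsymbol{\epsilon}\big] \\ &= \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\boldsymbol{\delta}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\mathbf{X}'\boldsymbol{\gamma} + \boldsymbol{\psi}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\mathbf{X} \end{aligned}$$

As discussed in Assumption 1, $E(\mathbf{X}'\boldsymbol{\gamma}) \neq 0$, so $E[(\mathbf{X}'\mathbf{X})^{-1}\boldsymbol{\delta}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\mathbf{X}'\boldsymbol{\gamma}] \neq 0$. Therefore, whether $\boldsymbol{\psi} \neq \mathbf{0}$ or not, $\hat{\boldsymbol{\beta}}$ is biased. As for the asymptotic property, as discussed in Assumption 1, $\operatorname{plim}_{n\to\infty} \mathbf{X}'\boldsymbol{\gamma}/n \neq 0$. Therefore, whether $\boldsymbol{\psi} \neq \mathbf{0}$ or not, $\hat{\boldsymbol{\beta}}$ is inconsistent. ∎

Proposition 3 demonstrates the conditions from which the endogeneity of the spatially lagged IV estimate is derived, and how these conditions make the spatially lagged IV valid.

PROPOSITION 3 (The Direct Exclusion Restriction): When the spatially lagged IV has no direct causal impact on the explained variable, the spatially lagged IV estimate is unbiased and consistent.

Proof: Using $\mathbf{W}\mathbf{X}$, the spatially lagged explanatory variables, as the instrumental variables, namely the spatially lagged IV, it is derived that

$$\begin{aligned} \hat{\boldsymbol{\beta}}_{IV} = [(\mathbf{W}\mathbf{X})'\mathbf{X}]^{-1}(\mathbf{W}\mathbf{X})'\mathbf{Y} &= (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\mathbf{Y} \\ &= (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\big[\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\delta}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\boldsymbol{\gamma} + \boldsymbol{\psi}\mathbf{W}\mathbf{X} + \boldsymbol{\epsilon}\big] \\ &= \boldsymbol{\beta} + (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\boldsymbol{\delta}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\mathbf{X}'\mathbf{W}\boldsymbol{\gamma} + \boldsymbol{\psi}(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}'\mathbf{W}\mathbf{X} \end{aligned}$$

Given that Assumption 1 states $x_{ik}\gamma_j = 0$ when $i \neq j$, it is derived that $E(\mathbf{X}'\mathbf{W}\boldsymbol{\gamma}) = 0$ and that $\operatorname{plim}_{n\to\infty} \mathbf{X}'\mathbf{W}\boldsymbol{\gamma}/n = 0$. However, only when $\boldsymbol{\psi} = 0$ do we have $\boldsymbol{\psi}(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}'\mathbf{W}\mathbf{X} = 0$. In this case, $E(\hat{\boldsymbol{\beta}}_{IV}) = \boldsymbol{\beta}$ and $\operatorname{plim}_{n\to\infty} \hat{\boldsymbol{\beta}}_{IV} = \boldsymbol{\beta}$. In other words, the spatially lagged IV estimate is unbiased and consistent.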
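∎

The implication of the direct exclusion restriction is that the spatially lagged IV influences the explained variable only through the explanatory variable, excluding the direct effect of the spatially lagged IV on the explained variable. Specifically, as is shown in (4.34), when $\boldsymbol{\psi} = 0$, the explained variable is merely a function of the explanatory variable, excluding the spatially lagged IV.

The indirect exclusion restriction. In addition to the direct spatial exclusion restriction, a valid spatially lagged IV should also satisfy the indirect spatial exclusion restriction. To understand this, I add $\boldsymbol{\xi}\mathbf{W}\mathbf{X}$, the indirect causal impact of the spatially lagged IV on the explained variable, in other words, the impact of the spatially lagged IV on the unobserved confounder, to the standard spatial data generation process, such that

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{U}\boldsymbol{\delta} + \boldsymbol{\epsilon} \tag{4.37}$$

$$\mathbf{U} = \boldsymbol{\rho}\mathbf{W}\mathbf{U} + \boldsymbol{\xi}\mathbf{W}\mathbf{X} + \boldsymbol{\gamma} \tag{4.38}$$

$$\mathbf{X} = \boldsymbol{\tau}\mathbf{W}\mathbf{X} + \boldsymbol{\eta} \tag{4.39}$$

$\mathbf{U}$ follows a spatial data generation process, and is derived as

$$\mathbf{U} = (\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}(\boldsymbol{\xi}\mathbf{W}\mathbf{X} + \boldsymbol{\gamma}) \tag{4.40}$$

and $\mathbf{X}$ also follows a spatial data generation process, and is derived as

$$\mathbf{X} = (\mathbf{I}_n - \boldsymbol{\tau}\mathbf{W})^{-1}\boldsymbol{\eta} \tag{4.41}$$

where $(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}$ and $(\mathbf{I}_n - \boldsymbol{\tau}\mathbf{W})^{-1}$ are assumed to exist. Because the endogeneity from the unobserved confounder is due not only to the spatial autocorrelation of the unobserved confounders, but also to the spatial autocorrelation of the explanatory variables, $\mathbf{U}$ becomes $(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}[\boldsymbol{\xi}\mathbf{W}(\mathbf{I}_n - \boldsymbol{\tau}\mathbf{W})^{-1}\boldsymbol{\eta} + \boldsymbol{\gamma}]$, and then $\mathbf{Y}$ becomes

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\delta}\big[(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}[\boldsymbol{\xi}\mathbf{W}(\mathbf{I}_n - \boldsymbol{\tau}\mathbf{W})^{-1}\boldsymbol{\eta} + \boldsymbol{\gamma}]\big] + \boldsymbol{\epsilon} \tag{4.42}$$

As a result of $x_{ik}\gamma_j \neq 0$ when $i = j$, as discussed in Assumption 1, $\hat{\boldsymbol{\beta}}$, the OLS estimate, is biased and inconsistent. Specifically,

$$\begin{aligned} \hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} &= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\Big[\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\delta}\big[(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}[\boldsymbol{\xi}\mathbf{W}(\mathbf{I}_n - \boldsymbol{\tau}\mathbf{W})^{-1}\boldsymbol{\eta} + \boldsymbol{\gamma}]\big] + \boldsymbol{\epsilon}\Big] \\ &= \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\boldsymbol{\delta}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\boldsymbol{\xi}\mathbf{W}(\mathbf{I}_n - \boldsymbol{\tau}\mathbf{W})^{-1}\mathbf{X}'\boldsymbol{\eta} + (\mathbf{X}'\mathbf{X})^{-1}\boldsymbol{\delta}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\mathbf{X}'\boldsymbol{\gamma} \end{aligned}$$

On one hand, as is discussed in Assumption 1, $E(\mathbf{X}'\boldsymbol{\gamma}) \neq 0$.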
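On the other, as is discussed in Assumption 2, $E(\mathbf{X}'\boldsymbol{\eta}) \neq 0$. Then no matter whether $\boldsymbol{\xi} \neq \mathbf{0}$ or not, $\hat{\boldsymbol{\beta}}$ is biased. As for the asymptotic property, as is discussed in Assumption 1, $\operatorname{plim}_{n\to\infty} \mathbf{X}'\boldsymbol{\gamma}/n \neq 0$, and as is discussed in Assumption 2, $\operatorname{plim}_{n\to\infty} \mathbf{X}'\boldsymbol{\eta}/n \neq 0$. Then no matter whether $\boldsymbol{\xi} \neq \mathbf{0}$ or not, $\hat{\boldsymbol{\beta}}$ is inconsistent. ∎

Proposition 4 demonstrates the conditions from which the endogeneity of the spatially lagged IV estimate is derived, and how these conditions make the spatially lagged IV valid.

PROPOSITION 4 (The Indirect Exclusion Restriction): When the spatially lagged IV has no indirect causal impact on the explained variable, the spatially lagged IV estimate is unbiased and consistent.

Proof: Using $\mathbf{W}\mathbf{X}$, the spatially lagged explanatory variables, as the instrumental variables, namely the spatially lagged IV, it is derived that

$$\begin{aligned} \hat{\boldsymbol{\beta}}_{IV} = [(\mathbf{W}\mathbf{X})'\mathbf{X}]^{-1}(\mathbf{W}\mathbf{X})'\mathbf{Y} &= (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\mathbf{Y} \\ &= (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\Big[\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\delta}\big[(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}[\boldsymbol{\xi}\mathbf{W}(\mathbf{I}_n - \boldsymbol{\tau}\mathbf{W})^{-1}\boldsymbol{\eta} + \boldsymbol{\gamma}]\big] + \boldsymbol{\epsilon}\Big] \\ &= \boldsymbol{\beta} + (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\boldsymbol{\delta}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\boldsymbol{\xi}\mathbf{W}(\mathbf{I}_n - \boldsymbol{\tau}\mathbf{W})^{-1}\mathbf{X}'\mathbf{W}\boldsymbol{\eta} + (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\boldsymbol{\delta}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\mathbf{X}'\mathbf{W}\boldsymbol{\gamma} \end{aligned}$$

Given that Assumption 1 states $x_{ik}\gamma_j = 0$ when $i \neq j$, it is derived that $E(\mathbf{X}'\mathbf{W}\boldsymbol{\gamma}) = 0$ and that $\operatorname{plim}_{n\to\infty} \mathbf{X}'\mathbf{W}\boldsymbol{\gamma}/n = 0$. Given that Assumption 2 states $x_{ik}\eta_j = 0$ when $i \neq j$, it is derived that $E(\mathbf{X}'\mathbf{W}\boldsymbol{\eta}) = 0$ and that $\operatorname{plim}_{n\to\infty} \mathbf{X}'\mathbf{W}\boldsymbol{\eta}/n = 0$. However, only when $\boldsymbol{\xi} = 0$ do we have $(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\boldsymbol{\delta}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\boldsymbol{\xi}\mathbf{W}(\mathbf{I}_n - \boldsymbol{\tau}\mathbf{W})^{-1}\mathbf{X}'\mathbf{W}\boldsymbol{\eta} = 0$. In this case, $E(\hat{\boldsymbol{\beta}}_{IV}) = \boldsymbol{\beta}$ and $\operatorname{plim}_{n\to\infty} \hat{\boldsymbol{\beta}}_{IV} = \boldsymbol{\beta}$. In other words, the spatially lagged IV estimate is unbiased and consistent. ∎

The implication of the indirect exclusion restriction is that the spatially lagged IV influences the explained variable only through the explanatory variable, excluding the indirect effect of the spatially lagged IV on the explained variable.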
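The role of the exclusion restriction can be illustrated with a small Monte Carlo sketch. This is not the dissertation's simulation: it uses a deliberately simplified data generation process in which the external exogeneity holds by construction. The assumptions are a single explanatory variable, regions on a ring with two equally weighted neighbors, $\rho = 0$ so that the confounder equals its disturbance, and spatial dependence in `x` that comes from a hypothetical exogenous factor `s` rather than from (4.36). The names `lag`, `simulate`, `delta` (the coefficient on the confounder) and all parameter values are illustrative.

```python
import numpy as np

def lag(v: np.ndarray) -> np.ndarray:
    """Spatial lag W v on a ring: the average of the two adjacent regions."""
    return 0.5 * (np.roll(v, 1) + np.roll(v, -1))

def simulate(n: int, beta: float, delta: float, psi: float, seed: int):
    """Return (OLS, spatially lagged IV) estimates of beta for one simulated sample."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    s = z + lag(z)                       # exogenous, spatially correlated driver of x (assumption)
    gamma = rng.normal(size=n)           # disturbance of the unobserved confounder
    u = gamma                            # rho = 0, so U = gamma
    x = s + gamma + rng.normal(size=n)   # x_i correlates with gamma_i but not with gamma_j, j != i
    wx = lag(x)                          # the spatially lagged IV
    y = beta * x + delta * u + psi * wx + rng.normal(size=n)
    ols = (x @ y) / (x @ x)              # (X'X)^{-1} X'Y with a single regressor
    iv = (wx @ y) / (wx @ x)             # [(WX)'X]^{-1} (WX)'Y with a single instrument
    return ols, iv

# psi = 0: the direct exclusion restriction holds.
ols_valid, iv_valid = simulate(n=20_000, beta=1.0, delta=1.0, psi=0.0, seed=0)
# psi = 0.5: the spatially lagged IV enters the outcome equation directly.
ols_bad, iv_bad = simulate(n=20_000, beta=1.0, delta=1.0, psi=0.5, seed=0)
print(f"psi=0.0  OLS={ols_valid:.3f}  IV={iv_valid:.3f}")
print(f"psi=0.5  OLS={ols_bad:.3f}  IV={iv_bad:.3f}")
```

Under these assumptions the IV estimate should sit close to the true `beta` when `psi = 0` while OLS is pulled away by the intra-regional correlation between `x` and `gamma`, and once `psi` is nonzero the IV estimate is pulled away from `beta` as well.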
Specifically, as is shown in (4.38), when $\boldsymbol{\xi} = 0$, the unobserved confounder is influenced only by its own spatial lags, and thus the explained variable is merely a function of the explanatory variable, excluding the spatially lagged IV.

C. The Spatially Local Average Treatment Effect

Section 4.2 discusses the Local Average Treatment Effects with the spatially lagged IV. In Section 4.3.C, I introduce the Spatially Local Average Treatment Effect (SLATE) Theorem.

Theorem 1 (The Spatially Local Average Treatment Effect Theorem) consists of the following four conditions.

1. (The Spatial Independence Assumption): The External Exogeneity implies that there exists no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the unobserved confounders, or specifically, $x_{ik}\gamma_j = 0$ when $i \neq j$. The Internal Exogeneity implies that there exists no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the explanatory variables themselves, or specifically, $x_{ik}\eta_j = 0$ when $i \neq j$. In this sense, the spatially lagged IV works as a random assignment, specifically,

$$\big[\{Y_i(x, \mathbf{w}x);\ \forall x, \mathbf{w}x\},\ X_{1i},\ X_{0i}\big] \perp\!\!\!\perp \mathbf{W}\mathbf{X} \tag{4.43}$$

2. (The Spatial Exclusion Restriction): The Direct Spatial Exclusion Restriction of the spatially lagged IV implies that the spatially lagged IV has no direct causal impact on the explained variable, or specifically, $\boldsymbol{\psi} = 0$. The Indirect Spatial Exclusion Restriction of the spatially lagged IV implies that the spatially lagged IV has no indirect causal impact on the explained variable, or specifically, $\boldsymbol{\xi} = 0$. In this sense, the spatially lagged IV influences the explained variable only through the endogenous explanatory variable, specifically,

$$Y_i(x, 0) = Y_i(x, 1), \quad x = 0, 1 \tag{4.44}$$

3. (The Existence of First Stage): The endogenous explanatory variable is correlated with the spatially lagged IV, that is, $\boldsymbol{\tau} \neq 0$; in other words, the first stage of the 2SLS estimation exists, specifically, $E[X_{1i} - X_{0i}] \neq 0$.

4. (Monotonicity): $X_{1i} - X_{0i} \geq 0,\ \forall i$.

When these four assumptions are satisfied, the spatially lagged IV estimate is unbiased and consistent. The implication is that the spatially local average treatment effect is not the average causal effect for all regions, but only the average causal effect for the compliers, which is discussed in detail in the next section.

4.4. The Dynamic Spatially Local Average Treatment Effects (SLATE)

This section numerically discusses the dynamic spatially local average treatment effect (SLATE). The spatially local average treatment effect is the average causal effect of the compliers, excluding the always-takers and the never-takers. Therefore, a common empirical strategy for excluding the always-takers and the never-takers when using the spatially lagged IV method is to exclude the pioneer regions and the straggler regions. More generally, beyond pioneers and stragglers, when the treatment spreads in different waves, the spatial weighting matrix is no longer symmetric but partially asymmetric. In this case, satisfying the SLATE theorem still makes the spatially lagged IV estimate unbiased and consistent; in other words, the SLATE theorem holds dynamically.

A. The Compliers of the Spatially Local Average Treatment Effect

In light of the Local Average Treatment Effect Theorem, each observation can be classified, by its response to the spatially lagged IV, as a complier ($X_{1i} = 1$ and $X_{0i} = 0$), an always-taker ($X_{1i} = 1$ and $X_{0i} = 1$) or a never-taker ($X_{1i} = 0$ and $X_{0i} = 0$). The spatially local average treatment effect is the average causal effect of the complier regions.
However, what we are interested in is the average causal effect on the treated regions, that is, $E[Y_{1i} - Y_{0i} \mid X_i = 1]$, where the treated regions include the always-taker regions and the complier regions that take the treatment when $\mathbf{W}_i X_i = 1$. When the spatial independence assumption is satisfied, that is, when the spatially lagged IV works as a random assignment, the average causal effect on the treated regions is

$$E[Y_{1i} - Y_{0i} \mid X_i = 1] = E[Y_{1i} - Y_{0i} \mid X_{0i} = 1] \cdot P[X_{0i} = 1 \mid X_i = 1] + E[Y_{1i} - Y_{0i} \mid X_{1i} > X_{0i}] \cdot P[X_{1i} > X_{0i}, \mathbf{W}_i X_i = 1 \mid X_i = 1]$$

where $E[Y_{1i} - Y_{0i} \mid X_{0i} = 1]$ is the average causal effect of the always-takers, and $E[Y_{1i} - Y_{0i} \mid X_{1i} > X_{0i}]$ is the average causal effect of the compliers, which take the treatment when $\mathbf{W}_i X_i = 1$. It is also known that $P[X_{0i} = 1 \mid X_i = 1] + P[X_{1i} > X_{0i}, \mathbf{W}_i X_i = 1 \mid X_i = 1] = 1$. Similarly, the average causal effect on the untreated regions is

$$E[Y_{1i} - Y_{0i} \mid X_i = 0] = E[Y_{1i} - Y_{0i} \mid X_{1i} = 0] \cdot P[X_{1i} = 0 \mid X_i = 0] + E[Y_{1i} - Y_{0i} \mid X_{1i} > X_{0i}] \cdot P[X_{1i} > X_{0i}, \mathbf{W}_i X_i = 0 \mid X_i = 0]$$

where $E[Y_{1i} - Y_{0i} \mid X_{1i} = 0]$ is the average causal effect of the never-takers, and $E[Y_{1i} - Y_{0i} \mid X_{1i} > X_{0i}]$ is the average causal effect of the compliers, which do not take the treatment when $\mathbf{W}_i X_i = 0$. It is also known that $P[X_{1i} = 0 \mid X_i = 0] + P[X_{1i} > X_{0i}, \mathbf{W}_i X_i = 0 \mid X_i = 0] = 1$. As a result, the average causal effect is

$$E[Y_{1i} - Y_{0i}] = E[Y_{1i} - Y_{0i} \mid X_i = 1] \cdot P[X_i = 1] + E[Y_{1i} - Y_{0i} \mid X_i = 0] \cdot P[X_i = 0]$$

which is a weighted average of the average causal effect of the compliers, that of the always-takers, and that of the never-takers. From these derivations, it is known that the spatially lagged IV estimate cannot distinguish among the average causal effect of the compliers, that of the always-takers, and that of the never-takers.
Therefore, the spatially local average treatment effect (SLATE) is not the average causal effect, unless there exist no always-takers or never-takers.

B. Pioneers and Stragglers

In the spatial data generation process, a typical case of always-takers is the pioneers, and a typical case of never-takers is the stragglers. Consider a type of treatment (a policy, an institutional change or an unexpected shock) that happens in one region within a province or a state and does not happen in any other neighboring or related region in that province or state; the region with that treatment is called a pioneer region. Similarly, if that treatment happens in almost all regions within a province or state except one region, the region without that treatment is called a straggler region.

An example of pioneer regions and straggler regions is the introduction of local direct elections for village leaders in rural China. In the late 1990s, the Chinese central government stipulated that village leaders be directly elected by local village residents, namely the local direct election. Before extending this policy to all villages, each province in China selected a limited number of villages as pioneers to introduce the local direct election as a kind of policy experiment. Therefore, the pioneer villages are always-takers in terms of the implementation of the local direct election, or specifically, $\{X_i = 1 \perp \mathbf{W}_i X_i\}$. Similarly, after the policy was extended to almost all villages, a limited number of straggler villages in each province had still not introduced the local direct election. Therefore, the straggler villages are never-takers in terms of the implementation of the local direct election, or specifically, $\{X_i = 0 \perp \mathbf{W}_i X_i\}$ (Martinez-Bravo et al., 2012; Martinez-Bravo et al., 2017; Wong et al., 2017).
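The gap between the SLATE and the average causal effect can be seen in a small simulated example. The sketch below is illustrative only: the instrument `z` is an independent coin flip standing in for a binary $\mathbf{W}_i X_i$, the shares of pioneers (always-takers), stragglers (never-takers) and compliers and their treatment effects are made-up numbers, and the region types are observed here only because the data are simulated. The helper `wald` is a hypothetical name for the ratio of differences in means used in the text.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Region type: 0 = straggler (never-taker), 1 = complier, 2 = pioneer (always-taker).
types = rng.choice([0, 1, 2], size=n, p=[0.2, 0.6, 0.2])
z = rng.integers(0, 2, size=n)            # binary instrument, stand-in for W_i X_i

x0 = (types == 2).astype(int)             # treatment status when z = 0
x1 = (types >= 1).astype(int)             # treatment status when z = 1 (monotonicity holds)
x = np.where(z == 1, x1, x0)              # observed treatment

effect = np.array([0.5, 1.0, 3.0])[types]  # heterogeneous causal effect Y_1i - Y_0i by type
y = rng.normal(size=n) + effect * x        # observed outcome

def wald(y, x, z):
    """Ratio of the difference in mean outcomes to the difference in mean treatment."""
    return (y[z == 1].mean() - y[z == 0].mean()) / (x[z == 1].mean() - x[z == 0].mean())

slate_all = wald(y, x, z)                  # close to the complier effect (1.0)
ate_all = effect.mean()                    # average causal effect over all regions (about 1.3)

keep = types == 1                          # drop pioneers and stragglers
slate_kept = wald(y[keep], x[keep], z[keep])
ate_kept = effect[keep].mean()             # equals 1.0 in the retained sample
print(slate_all, ate_all, slate_kept, ate_kept)
```

In this constructed example the Wald ratio recovers the complier effect rather than the full-sample average effect, and the two coincide once pioneers and stragglers are removed from the sample. In real data the types are not observed, so exclusion has to rely on institutional knowledge such as which villages were selected as policy experiments.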
The implication of pioneers and stragglers is that, when using the spatially lagged IV method, excluding pioneers and stragglers makes the spatially local average treatment effect approximately equal to the average causal effect. Besides, if the pioneers and stragglers are a small portion of the whole sample, the spatially local average treatment effect is also approximately equal to the average causal effect. Specifically,

\[
E[Y_{1i} - Y_{0i}] = E[Y_{1i} - Y_{0i} \mid X_{1i} > X_{0i}] \cdot P[X_{1i} > X_{0i}, W_iX_i = 1 \mid X_i = 1] + E[Y_{1i} - Y_{0i} \mid X_{1i} > X_{0i}] \cdot P[X_{1i} > X_{0i}, W_iX_i = 1 \mid X_i = 0],
\]

where $E[Y_{1i} - Y_{0i}]$ is the average causal effect, and it is equal to the spatially local average treatment effect.

A natural question is how the SLATE theorem addresses the problem that the spatially local average treatment effect does not equal the average causal effect; in other words, if no pioneers or stragglers are excluded, and if the spatial independence assumption and the spatial exclusion restriction are satisfied, is the spatially lagged IV estimate still unbiased and consistent?

Suppose no pioneers or stragglers are excluded; then the spatial weighting matrix is no longer symmetric. Specifically, define $\mu_{ij}$ as the spatial correlation from region $i$ on region $j$, and $\mu_{ji}$ as the spatial correlation from region $j$ on region $i$. Without loss of generality, let $i = 1$ represent the pioneer region, and let $i = N$ represent the straggler region. As a result, $\mu_{12} \neq \mu_{21}$, $\mu_{13} \neq \mu_{31}$, …, $\mu_{1N} \neq \mu_{N1}$; the implication is that, due to its "pioneering role" in spreading the treatment, the pioneer region's spatial impact on other regions differs from their impact on it. Similarly, $\mu_{N1} \neq \mu_{1N}$, $\mu_{N2} \neq \mu_{2N}$, …, $\mu_{N,N-1} \neq \mu_{N-1,N}$; the implication is that, due to its "straggling role" in spreading the treatment, the straggler region's spatial impact on other regions also differs from their impact on it.
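The role of an asymmetric weighting matrix can be illustrated with a small simulation. The sketch below is not the author's design: the ring of regions, the 0.7/0.3 neighbor weights, the shared-shock data generating process, and the function name `simulate_spatially_lagged_iv` are all hypothetical choices made for illustration. It assumes a single explanatory variable and a region-specific confounder that is uncorrelated with neighboring regions' treatment, which is the spirit of the spatial independence assumption.

```python
import random


def simulate_spatially_lagged_iv(n_regions=20000, beta=2.0, seed=0):
    """Return (beta_iv, beta_ols) for one simulated cross-section of regions.

    Hypothetical data generating process, for illustration only.
    """
    rng = random.Random(seed)
    n = n_regions
    shock = [rng.gauss(0.0, 1.0) for _ in range(n)]       # shocks shared by adjacent regions
    confounder = [rng.gauss(0.0, 1.0) for _ in range(n)]  # unobserved and region-specific
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]

    # Treatment intensity: spatially correlated through the shared shocks,
    # endogenous through the region's own confounder.
    x = [shock[i] + shock[(i + 1) % n] + confounder[i] for i in range(n)]
    y = [beta * x[i] + confounder[i] + noise[i] for i in range(n)]

    # Asymmetric weights on a ring: region i puts 0.7 on its left neighbor
    # and 0.3 on its right neighbor, so W != W'.
    z = [0.7 * x[i - 1] + 0.3 * x[(i + 1) % n] for i in range(n)]

    # Just-identified IV with one regressor: (z'y) / (z'x); OLS: (x'y) / (x'x).
    beta_iv = sum(zi * yi for zi, yi in zip(z, y)) / sum(zi * xi for zi, xi in zip(z, x))
    beta_ols = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return beta_iv, beta_ols


beta_iv, beta_ols = simulate_spatially_lagged_iv()
print(f"IV estimate: {beta_iv:.3f}, OLS estimate: {beta_ols:.3f}")
```

Under these assumptions the spatially lagged instrument recovers a value close to the true coefficient of 2.0, while OLS is pushed upward by the confounder. The asymmetry of the weights plays no role in that result; what matters is that neighbors' treatment is unrelated to a region's own confounder.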
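Corollary 1 then addresses whether, when the external exogeneity in the spatial independence assumption is satisfied, the spatially lagged IV estimate excluding pioneers and stragglers is the same as that including them.

COROLLARY 1: When there exists no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the unobserved confounders, the spatially lagged IV estimate is unbiased and consistent, whether or not pioneer regions and straggler regions are excluded.

Proof: Denote $\dot{\mathbf{W}}$ as the spatial weighting matrix excluding pioneers and stragglers, $\dot{\mathbf{X}}$ as the explanatory variables excluding pioneers and stragglers, and $\dot{\mathbf{Y}}$ as the dependent variable excluding pioneers and stragglers. Using the spatially lagged IV method, it is derived that

\[
\begin{aligned}
\hat{\boldsymbol{\beta}}_{IV} &= \bigl[(\dot{\mathbf{W}}\dot{\mathbf{X}})'\dot{\mathbf{X}}\bigr]^{-1}(\dot{\mathbf{W}}\dot{\mathbf{X}})'\dot{\mathbf{Y}} = (\dot{\mathbf{X}}'\dot{\mathbf{W}}\dot{\mathbf{X}})^{-1}\dot{\mathbf{X}}'\dot{\mathbf{W}}\dot{\mathbf{Y}} \\
&= (\dot{\mathbf{X}}'\dot{\mathbf{W}}\dot{\mathbf{X}})^{-1}\dot{\mathbf{X}}'\dot{\mathbf{W}}\bigl[\dot{\mathbf{X}}\boldsymbol{\beta} + \boldsymbol{\theta}(\mathbf{I}_n - \boldsymbol{\rho}\dot{\mathbf{W}})^{-1}\dot{\boldsymbol{\gamma}} + \dot{\boldsymbol{\epsilon}}\bigr] \\
&= \boldsymbol{\beta} + (\dot{\mathbf{X}}'\dot{\mathbf{W}}\dot{\mathbf{X}})^{-1}\boldsymbol{\theta}(\mathbf{I}_n - \boldsymbol{\rho}\dot{\mathbf{W}})^{-1}\dot{\mathbf{X}}'\dot{\mathbf{W}}\dot{\boldsymbol{\gamma}}.
\end{aligned}
\]

Given that Assumption 1 states $x_{ik}\gamma_j = 0$ when $i \neq j$, it is known that

\[
\begin{aligned}
E(\dot{\mathbf{X}}'\dot{\mathbf{W}}\dot{\boldsymbol{\gamma}}) &=
\begin{bmatrix}
x_{21} & \cdots & x_{i1} & \cdots & x_{N-1,1} \\
\vdots & \ddots & \vdots & & \vdots \\
x_{2k} & \cdots & x_{ik} & \cdots & x_{N-1,k} \\
\vdots & & \vdots & \ddots & \vdots \\
x_{2K} & \cdots & x_{iK} & \cdots & x_{N-1,K}
\end{bmatrix}
\cdot
\begin{bmatrix}
\mu_{22} & \cdots & \mu_{i2} & \cdots & \mu_{N-1,2} \\
\vdots & \ddots & \vdots & & \vdots \\
\mu_{2j} & \cdots & \mu_{ij} & \cdots & \mu_{N-1,j} \\
\vdots & & \vdots & \ddots & \vdots \\
\mu_{2,N-1} & \cdots & \mu_{i,N-1} & \cdots & \mu_{N-1,N-1}
\end{bmatrix}
\cdot
\begin{bmatrix}
\gamma_2 \\ \vdots \\ \gamma_j \\ \vdots \\ \gamma_{N-1}
\end{bmatrix} \\
&= \sum_{k=1}^{K}\sum_{j=2}^{N-1}\Bigl(\sum_{i=2}^{N-1} x_{ik}\mu_{ij}\gamma_j\Bigr) \\
&= \sum_{k=1}^{K}\sum_{j=2}^{N-1}\Bigl(\sum_{i=2,\, i=j}^{N-1} \mu_{ij}x_{ik}\gamma_j\Bigr) + \sum_{k=1}^{K}\sum_{j=2}^{N-1}\Bigl(\sum_{i=2,\, i\neq j}^{N-1} \mu_{ij}x_{ik}\gamma_j\Bigr).
\end{aligned}
\]

As is assumed, $\mu_{ij} = 0$ when $i = j$; therefore, $\sum_{k=1}^{K}\sum_{j=2}^{N-1}\bigl(\sum_{i=2,\, i=j}^{N-1} \mu_{ij}x_{ik}\gamma_j\bigr) = 0$. In addition, as Assumption 1 states, $x_{ik}\gamma_j = 0$ when $i \neq j$; therefore, $\sum_{k=1}^{K}\sum_{j=2}^{N-1}\bigl(\sum_{i=2,\, i\neq j}^{N-1} \mu_{ij}x_{ik}\gamma_j\bigr) = 0$.

Accordingly, it is known that $E(\dot{\mathbf{X}}'\dot{\mathbf{W}}\dot{\boldsymbol{\gamma}}) = 0$, and thus $E(\hat{\boldsymbol{\beta}}_{IV}) = \boldsymbol{\beta}$. Similarly, it is also known that $\operatorname{plim}_{n\to\infty} \dot{\mathbf{X}}'\dot{\mathbf{W}}\dot{\boldsymbol{\gamma}}/n = 0$. Therefore, $\operatorname{plim}_{n\to\infty} \hat{\boldsymbol{\beta}}_{IV} = \boldsymbol{\beta}$; in other words, the spatially lagged IV estimate is unbiased and consistent.

Now suppose pioneers and stragglers are included, so that the spatial weighting matrix is asymmetric, that is, $\mathbf{W} \neq \mathbf{W}'$. Using the spatially lagged IV method, it is derived that

\[
\begin{aligned}
\hat{\boldsymbol{\beta}}_{IV} &= [(\mathbf{W}\mathbf{X})'\mathbf{X}]^{-1}(\mathbf{W}\mathbf{X})'\mathbf{Y} = (\mathbf{X}'\mathbf{W}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}'\mathbf{Y} \\
&= (\mathbf{X}'\mathbf{W}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}'\bigl[\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\theta}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\boldsymbol{\gamma} + \boldsymbol{\epsilon}\bigr] \\
&= \boldsymbol{\beta} + (\mathbf{X}'\mathbf{W}'\mathbf{X})^{-1}\boldsymbol{\theta}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\mathbf{X}'\mathbf{W}'\boldsymbol{\gamma}.
\end{aligned}
\]

Given that Assumption 1 states $x_{ik}\gamma_j = 0$ when $i \neq j$, it is known that

\[
\begin{aligned}
E(\mathbf{X}'\mathbf{W}'\boldsymbol{\gamma}) &=
\begin{bmatrix}
x_{11} & \cdots & x_{i1} & \cdots & x_{N1} \\
\vdots & \ddots & \vdots & & \vdots \\
x_{1k} & \cdots & x_{ik} & \cdots & x_{Nk} \\
\vdots & & \vdots & \ddots & \vdots \\
x_{1K} & \cdots & x_{iK} & \cdots & x_{NK}
\end{bmatrix}
\cdot
\begin{bmatrix}
\mu_{11} & \cdots & \mu_{i1} & \cdots & \mu_{N1} \\
\vdots & \ddots & \vdots & & \vdots \\
\mu_{1j} & \cdots & \mu_{ij} & \cdots & \mu_{Nj} \\
\vdots & & \vdots & \ddots & \vdots \\
\mu_{1N} & \cdots & \mu_{iN} & \cdots & \mu_{NN}
\end{bmatrix}
\cdot
\begin{bmatrix}
\gamma_1 \\ \vdots \\ \gamma_j \\ \vdots \\ \gamma_N
\end{bmatrix} \\
&= \sum_{k=1}^{K}\sum_{j=1}^{N}\Bigl(\sum_{i=1}^{N} x_{ik}\mu_{ij}\gamma_j\Bigr) \\
&= \sum_{k=1}^{K}\sum_{j=2}^{N-1}\Bigl(\sum_{i=2,\, i=j}^{N-1} \mu_{ij}x_{ik}\gamma_j\Bigr) + \sum_{k=1}^{K}\sum_{j=2}^{N-1}\Bigl(\sum_{i=2,\, i\neq j}^{N-1} \mu_{ij}x_{ik}\gamma_j\Bigr) + \sum_{k=1}^{K}\sum_{j=1,N}\Bigl(\sum_{i=1,N,\, i\neq j} \mu_{ij}x_{ik}\gamma_j\Bigr).
\end{aligned}
\]

As is assumed, $\mu_{ij} = 0$ when $i = j$; therefore, $\sum_{k=1}^{K}\sum_{j=2}^{N-1}\bigl(\sum_{i=2,\, i=j}^{N-1} \mu_{ij}x_{ik}\gamma_j\bigr) = 0$. In addition, as Assumption 1 states, $x_{ik}\gamma_j = 0$ when $i \neq j$; therefore, $\sum_{k=1}^{K}\sum_{j=2}^{N-1}\bigl(\sum_{i=2,\, i\neq j}^{N-1} \mu_{ij}x_{ik}\gamma_j\bigr) = 0$ and $\sum_{k=1}^{K}\sum_{j=1,N}\bigl(\sum_{i=1,N,\, i\neq j} \mu_{ij}x_{ik}\gamma_j\bigr) = 0$. Accordingly, it is known that $E(\mathbf{X}'\mathbf{W}'\boldsymbol{\gamma}) = 0$, and thus $E(\hat{\boldsymbol{\beta}}_{IV}) = \boldsymbol{\beta}$. Similarly, it is also known that $\operatorname{plim}_{n\to\infty} \mathbf{X}'\mathbf{W}'\boldsymbol{\gamma}/n = 0$. Therefore, $\operatorname{plim}_{n\to\infty} \hat{\boldsymbol{\beta}}_{IV} = \boldsymbol{\beta}$; in other words, the spatially lagged IV estimate is unbiased and consistent.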
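∎

Corollary 2 addresses whether, when the internal exogeneity in the spatial independence assumption is satisfied, the spatially lagged IV estimate excluding pioneers and stragglers is the same as that including them.

COROLLARY 2: When there exists no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the explanatory variable, the spatially lagged IV estimate is unbiased and consistent, whether or not pioneer regions and straggler regions are excluded.

Proof: Denote $\dot{\mathbf{W}}$ as the spatial weighting matrix excluding pioneers and stragglers, $\dot{\mathbf{X}}$ as the explanatory variables excluding pioneers and stragglers, and $\dot{\mathbf{Y}}$ as the dependent variable excluding pioneers and stragglers. Using the spatially lagged IV method, it is derived that

\[
\begin{aligned}
\hat{\boldsymbol{\beta}}_{IV} &= \bigl[(\dot{\mathbf{W}}\dot{\mathbf{X}})'\dot{\mathbf{X}}\bigr]^{-1}(\dot{\mathbf{W}}\dot{\mathbf{X}})'\dot{\mathbf{Y}} = (\dot{\mathbf{X}}'\dot{\mathbf{W}}\dot{\mathbf{X}})^{-1}\dot{\mathbf{X}}'\dot{\mathbf{W}}\dot{\mathbf{Y}} \\
&= (\dot{\mathbf{X}}'\dot{\mathbf{W}}\dot{\mathbf{X}})^{-1}\dot{\mathbf{X}}'\dot{\mathbf{W}}\Bigl[\dot{\mathbf{X}}\boldsymbol{\beta} + \boldsymbol{\theta}\bigl[(\mathbf{I}_n - \boldsymbol{\rho}\dot{\mathbf{W}})^{-1}(\boldsymbol{\varphi}\dot{\mathbf{X}} + \dot{\boldsymbol{\gamma}})\bigr] + \dot{\boldsymbol{\epsilon}}\Bigr] \\
&= \boldsymbol{\beta} + (\dot{\mathbf{X}}'\dot{\mathbf{W}}\dot{\mathbf{X}})^{-1}\boldsymbol{\theta}(\mathbf{I}_n - \boldsymbol{\rho}\dot{\mathbf{W}})^{-1}\dot{\mathbf{X}}'\dot{\mathbf{W}}\bigl[\boldsymbol{\varphi}(\mathbf{I}_n - \boldsymbol{\rho}\dot{\mathbf{W}})^{-1}\dot{\boldsymbol{\eta}} + \dot{\boldsymbol{\gamma}}\bigr] \\
&= \boldsymbol{\beta} + (\dot{\mathbf{X}}'\dot{\mathbf{W}}\dot{\mathbf{X}})^{-1}\boldsymbol{\theta}(\mathbf{I}_n - \boldsymbol{\rho}\dot{\mathbf{W}})^{-1}\dot{\mathbf{X}}'\dot{\mathbf{W}}\dot{\boldsymbol{\gamma}} + (\dot{\mathbf{X}}'\dot{\mathbf{W}}\dot{\mathbf{X}})^{-1}\boldsymbol{\theta}(\mathbf{I}_n - \boldsymbol{\rho}\dot{\mathbf{W}})^{-1}\boldsymbol{\varphi}(\mathbf{I}_n - \boldsymbol{\rho}\dot{\mathbf{W}})^{-1}\dot{\mathbf{X}}'\dot{\mathbf{W}}\dot{\boldsymbol{\eta}}.
\end{aligned}
\]

As discussed before, given that Assumption 1 implies $x_{ik}\gamma_j = 0$ when $i \neq j$, it is known that $E(\dot{\mathbf{X}}'\dot{\mathbf{W}}\dot{\boldsymbol{\gamma}}) = 0$ and $\operatorname{plim}_{n\to\infty} \dot{\mathbf{X}}'\dot{\mathbf{W}}\dot{\boldsymbol{\gamma}}/n = 0$.

Similarly, given that Assumption 2 states $x_{ik}\eta_j = 0$ when $i \neq j$, it is known that

\[
\begin{aligned}
E(\dot{\mathbf{X}}'\dot{\mathbf{W}}\dot{\boldsymbol{\eta}}) &=
\begin{bmatrix}
x_{21} & \cdots & x_{i1} & \cdots & x_{N-1,1} \\
\vdots & \ddots & \vdots & & \vdots \\
x_{2k} & \cdots & x_{ik} & \cdots & x_{N-1,k} \\
\vdots & & \vdots & \ddots & \vdots \\
x_{2K} & \cdots & x_{iK} & \cdots & x_{N-1,K}
\end{bmatrix}
\cdot
\begin{bmatrix}
\mu_{22} & \cdots & \mu_{i2} & \cdots & \mu_{N-1,2} \\
\vdots & \ddots & \vdots & & \vdots \\
\mu_{2j} & \cdots & \mu_{ij} & \cdots & \mu_{N-1,j} \\
\vdots & & \vdots & \ddots & \vdots \\
\mu_{2,N-1} & \cdots & \mu_{i,N-1} & \cdots & \mu_{N-1,N-1}
\end{bmatrix}
\cdot
\begin{bmatrix}
\eta_2 \\ \vdots \\ \eta_j \\ \vdots \\ \eta_{N-1}
\end{bmatrix} \\
&= \sum_{k=1}^{K}\sum_{j=2}^{N-1}\Bigl(\sum_{i=2}^{N-1} x_{ik}\mu_{ij}\eta_j\Bigr) \\
&= \sum_{k=1}^{K}\sum_{j=2}^{N-1}\Bigl(\sum_{i=2,\, i=j}^{N-1} \mu_{ij}x_{ik}\eta_j\Bigr) + \sum_{k=1}^{K}\sum_{j=2}^{N-1}\Bigl(\sum_{i=2,\, i\neq j}^{N-1} \mu_{ij}x_{ik}\eta_j\Bigr).
\end{aligned}
\]

As is assumed, $\mu_{ij} = 0$ when $i = j$; therefore, $\sum_{k=1}^{K}\sum_{j=2}^{N-1}\bigl(\sum_{i=2,\, i=j}^{N-1} \mu_{ij}x_{ik}\eta_j\bigr) = 0$. In addition, as Assumption 2 states, $x_{ik}\eta_j = 0$ when $i \neq j$; therefore, $\sum_{k=1}^{K}\sum_{j=2}^{N-1}\bigl(\sum_{i=2,\, i\neq j}^{N-1} \mu_{ij}x_{ik}\eta_j\bigr) = 0$. Accordingly, it is known that $E(\dot{\mathbf{X}}'\dot{\mathbf{W}}\dot{\boldsymbol{\eta}}) = 0$, and thus $E(\hat{\boldsymbol{\beta}}_{IV}) = \boldsymbol{\beta}$. Similarly, it is also known that $\operatorname{plim}_{n\to\infty} \dot{\mathbf{X}}'\dot{\mathbf{W}}\dot{\boldsymbol{\eta}}/n = 0$. Therefore, $\operatorname{plim}_{n\to\infty} \hat{\boldsymbol{\beta}}_{IV} = \boldsymbol{\beta}$; in other words, the spatially lagged IV estimate is unbiased and consistent.

Now suppose pioneers and stragglers are included, so that the spatial weighting matrix is asymmetric, that is, $\mathbf{W} \neq \mathbf{W}'$. Using the spatially lagged IV method, it is derived that

\[
\begin{aligned}
\hat{\boldsymbol{\beta}}_{IV} &= [(\mathbf{W}\mathbf{X})'\mathbf{X}]^{-1}(\mathbf{W}\mathbf{X})'\mathbf{Y} = (\mathbf{X}'\mathbf{W}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}'\mathbf{Y} \\
&= (\mathbf{X}'\mathbf{W}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}'\Bigl[\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\theta}\bigl[(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}(\boldsymbol{\varphi}\mathbf{X} + \boldsymbol{\gamma})\bigr] + \boldsymbol{\epsilon}\Bigr] \\
&= \boldsymbol{\beta} + (\mathbf{X}'\mathbf{W}'\mathbf{X})^{-1}\boldsymbol{\theta}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\mathbf{X}'\mathbf{W}'\bigl[\boldsymbol{\varphi}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\boldsymbol{\eta} + \boldsymbol{\gamma}\bigr] \\
&= \boldsymbol{\beta} + (\mathbf{X}'\mathbf{W}'\mathbf{X})^{-1}\boldsymbol{\theta}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\mathbf{X}'\mathbf{W}'\boldsymbol{\gamma} + (\mathbf{X}'\mathbf{W}'\mathbf{X})^{-1}\boldsymbol{\theta}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\boldsymbol{\varphi}(\mathbf{I}_n - \boldsymbol{\rho}\mathbf{W})^{-1}\mathbf{X}'\mathbf{W}'\boldsymbol{\eta}.
\end{aligned}
\]

As discussed before, given that Assumption 1 implies $x_{ik}\gamma_j = 0$ when $i \neq j$, it is known that $E(\mathbf{X}'\mathbf{W}'\boldsymbol{\gamma}) = 0$ and $\operatorname{plim}_{n\to\infty} \mathbf{X}'\mathbf{W}'\boldsymbol{\gamma}/n = 0$.

Similarly, given that Assumption 2 states $x_{ik}\eta_j = 0$ when $i \neq j$, it is known that

\[
\begin{aligned}
E(\mathbf{X}'\mathbf{W}'\boldsymbol{\eta}) &=
\begin{bmatrix}
x_{11} & \cdots & x_{i1} & \cdots & x_{N1} \\
\vdots & \ddots & \vdots & & \vdots \\
x_{1k} & \cdots & x_{ik} & \cdots & x_{Nk} \\
\vdots & & \vdots & \ddots & \vdots \\
x_{1K} & \cdots & x_{iK} & \cdots & x_{NK}
\end{bmatrix}
\cdot
\begin{bmatrix}
\mu_{11} & \cdots & \mu_{i1} & \cdots & \mu_{N1} \\
\vdots & \ddots & \vdots & & \vdots \\
\mu_{1j} & \cdots & \mu_{ij} & \cdots & \mu_{Nj} \\
\vdots & & \vdots & \ddots & \vdots \\
\mu_{1N} & \cdots & \mu_{iN} & \cdots & \mu_{NN}
\end{bmatrix}
\cdot
\begin{bmatrix}
\eta_1 \\ \vdots \\ \eta_j \\ \vdots \\ \eta_N
\end{bmatrix} \\
&= \sum_{k=1}^{K}\sum_{j=1}^{N}\Bigl(\sum_{i=1}^{N} x_{ik}\mu_{ij}\eta_j\Bigr) \\
&= \sum_{k=1}^{K}\sum_{j=2}^{N-1}\Bigl(\sum_{i=2,\, i=j}^{N-1} \mu_{ij}x_{ik}\eta_j\Bigr) + \sum_{k=1}^{K}\sum_{j=2}^{N-1}\Bigl(\sum_{i=2,\, i\neq j}^{N-1} \mu_{ij}x_{ik}\eta_j\Bigr) + \sum_{k=1}^{K}\sum_{j=1,N}\Bigl(\sum_{i=1,N,\, i\neq j} \mu_{ij}x_{ik}\eta_j\Bigr).
\end{aligned}
\]

As is assumed, $\mu_{ij} = 0$ when $i = j$; therefore, $\sum_{k=1}^{K}\sum_{j=2}^{N-1}\bigl(\sum_{i=2,\, i=j}^{N-1} \mu_{ij}x_{ik}\eta_j\bigr) = 0$. In addition, as Assumption 2 states, $x_{ik}\eta_j = 0$ when $i \neq j$; therefore, $\sum_{k=1}^{K}\sum_{j=2}^{N-1}\bigl(\sum_{i=2,\, i\neq j}^{N-1} \mu_{ij}x_{ik}\eta_j\bigr) = 0$ and $\sum_{k=1}^{K}\sum_{j=1,N}\bigl(\sum_{i=1,N,\, i\neq j} \mu_{ij}x_{ik}\eta_j\bigr) = 0$. Accordingly, it is known that $E(\mathbf{X}'\mathbf{W}'\boldsymbol{\eta}) = 0$, and thus $E(\hat{\boldsymbol{\beta}}_{IV}) = \boldsymbol{\beta}$. Similarly, it is also known that $\operatorname{plim}_{n\to\infty} \mathbf{X}'\mathbf{W}'\boldsymbol{\eta}/n = 0$. Therefore, $\operatorname{plim}_{n\to\infty} \hat{\boldsymbol{\beta}}_{IV} = \boldsymbol{\beta}$; in other words, the spatially lagged IV estimate is unbiased and consistent. ∎

To sum up, when the spatial independence assumption is satisfied, together with the spatial exclusion restriction if necessary, the spatially lagged IV estimate is unbiased and consistent, whether or not the pioneers and stragglers are excluded.

C. The Numerical Dynamic Spatially Local Average Treatment Effect (SLATE)

A natural generalization of pioneers and stragglers is the case in which the treatment is implemented in an area in multiple waves, namely the dynamic spatially local average treatment effect (SLATE). This case, like the pioneer and straggler case, has an asymmetric spatial weighting matrix.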
Without loss of generality, suppose the treatment is implemented in three waves. Let $i = 1, \ldots, P$ represent the first wave of regions, and let $i = S, \ldots, N$ represent the third, and also the last, wave of regions.

Corollary 3 addresses whether, when the external exogeneity in the spatial independence assumption is satisfied, the spatially lagged IV estimate is still unbiased and consistent when the treatment is implemented in multiple waves (without loss of generality, three waves).

COROLLARY 3: When there exists no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the unobserved confounders, the spatially lagged IV estimate is unbiased and consistent, even if there exist multiple waves of implementation of the treatment.

Proof: See the Appendix. ∎

Corollary 4 addresses whether, when the internal exogeneity in the spatial independence assumption is satisfied, the spatially lagged IV estimate is still unbiased and consistent when the treatment is implemented in multiple waves (without loss of generality, three waves).

COROLLARY 4: When there exists no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the explanatory variable, the spatially lagged IV estimate is unbiased and consistent, even if the treatment has multiple waves of implementation.

Proof: See the Appendix. ∎

Accordingly, Theorem 2 states the SLATE theorem with multiple waves of implementation of the treatment.

THEOREM 2 (The Dynamic Spatially Local Average Treatment Effect (SLATE) Theorem): When the conditions of the SLATE theorem are satisfied, including the spatial independence assumption and, if necessary, the spatial exclusion restriction, the spatially lagged IV estimate is unbiased and consistent, even if there exist multiple waves of implementation of the treatment.

The implication of the dynamic SLATE theorem is that, because the time-varying and region-varying unobserved factors are incorporated in the random errors of the spatial autocorrelation of either the unobserved confounder or the explanatory variable, the endogenous explanatory variable is uncorrelated with those unobserved factors, as implied by the spatial independence assumption. In short, the SLATE theorem holds under the dynamic implementation of the treatment.

4.5. Conclusion

This paper introduces the spatially local average treatment effect (SLATE) theorem to discuss the validity of the spatially lagged IV. It finds that when the spatially lagged IV satisfies the spatial independence assumption, including the external and internal exogeneity, and the spatial exclusion restriction, both direct and indirect, the spatially lagged IV estimate is unbiased and consistent. Even if there exist multiple waves of implementation of the treatment, with pioneers and stragglers as a distinct example, the spatially lagged IV method is still valid, namely the dynamic spatially local average treatment effect. These findings imply that using the spatially lagged explanatory variable as the IV helps address endogeneity, especially when other valid IVs are lacking. Most applied researchers pay sufficient attention to the exclusion restriction in the LATE framework, yet tend to ignore the independence assumption (Wang and Bellemare, 2020).
In both the LATE and the SLATE frameworks, it is relatively easy to assess whether the exclusion restriction is satisfied in a given data generation process, especially with a theoretical argument. However, it is quite difficult to assess whether the independence assumption is satisfied, as "working as a random assignment" is abstract. The spatial independence assumption, fortunately, provides an easier way to assess it, because it limits the discussion to the spatial dimension, which refers to the external and internal exogeneity.

It is usually challenging to use observational data, rather than experimental data, to identify the treatment effect of explanatory variables (Angrist and Krueger, 2001; Freedman, 2005). On the other hand, using experimental data in causal identification usually lacks underlying theoretical relationships (Rosenzweig and Wolpin, 2000). This paper demonstrates that the spatially lagged IV is valid and requires no other data. With the accumulation of spatial data sets, empirical studies could suffer less from the endogeneity concern.

References

Alesina, Alberto, and Guido Tabellini. 2007. “Bureaucrats or Politicians? Part I: A Single Policy Task.” American Economic Review 97 (1): 169-179.
Anderson, Theodore Wilbur, and Cheng Hsiao. "Estimation of Dynamic Models with Error Components." Journal of the American Statistical Association 76, no. 375 (1981): 598-606.
Angrist, Joshua D., Guido W. Imbens, and Donald B. Rubin. "Identification of Causal Effects Using Instrumental Variables." Journal of the American Statistical Association 91, no. 434 (1996): 444-455.
Angrist, Joshua D., and Alan B. Krueger. "Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments." Journal of Economic Perspectives 15, no. 4 (2001): 69-85.
Angrist, Joshua D., and Kevin Lang. "Does School Integration Generate Peer Effects? Evidence from Boston's Metco Program." American Economic Review 94, no. 5 (2004): 1613-1634.
Angrist, Joshua, and Jörn-Steffen Pischke. Mostly Harmless Econometrics: An Empiricist's Companion. Princeton: Princeton University Press, 2009.
Bai, Ying, and Ruixue Jia. 2016. “Elite Recruitment and Political Stability: The Impact of the Abolition of China’s Civil Service Exam.” Econometrica 84 (2): 677-733.
Baker, Wayne E., and Robert R. Faulkner. 1993. “The Social Organization of Conspiracy: Illegal Networks in the Heavy Electrical Equipment Industry.” American Sociological Review 58 (6): 837-860.
Bell, Daniel A. 2016. The China Model: Political Meritocracy and the Limits of Democracy. Princeton, NJ: Princeton University Press.
Bellemare, Marc F., Takaaki Masaki, and Thomas B. Pepinsky. "Lagged Explanatory Variables and the Estimation of Causal Effect." The Journal of Politics 79, no. 3 (2017): 949-963.
Besley, Timothy. 2005. “Political Selection.” Journal of Economic Perspectives 19 (3): 43-60.
Besley, Timothy. 2006. Principled Agents: The Political Economy of Good Government. New York, NY: Oxford University Press on Demand.
Besley, Timothy, Jose G. Montalvo, and Marta Reynal-Querol. 2011. “Do Educated Leaders Matter?” The Economic Journal 121 (554): F205-227.
Bloom, Nicholas, Renata Lemos, Raffaella Sadun, and John Van Reenen. 2015. “Does Management Matter in Schools?” The Economic Journal 125 (584): 647-674.
Blundell, Richard, and Stephen Bond. "Initial Conditions and Moment Restrictions in Dynamic Panel Data Models." Journal of Econometrics 87, no. 1 (1998): 115-143.
Blundell, Richard, and Stephen Bond. "GMM Estimation with Persistent Panel Data: An Application to Production Functions." Econometric Reviews 19, no. 3 (2000): 321-340.
Bronars, Stephen G., and Jeff Grogger. "The Economic Consequences of Unwed Motherhood: Using Twin Births as A Natural Experiment." The American Economic Review (1994): 1141-1156.
Central Committee of the Chinese Communist Party. 1999. Working Regulation on the Rural Grassroots Organizations of the Chinese Communist Party (WRRGOCCP). (In Chinese) http://news.12371.cn/2015/03/11/ARTI1426061212036535.shtml
Chan, Joseph. 2013. “Political Meritocracy and Meritorious Rule: A Confucian Perspective.” The East Asian Challenge for Democracy: Political Meritocracy in Comparative Perspective, edited by Daniel A. Bell and Chenyang Li. New York, NY: Cambridge University Press.
Dal Bó, Ernesto, Frederico Finan, Olle Folke, Torsten Persson, and Johanna Rickne. 2017. “Who Becomes a Politician?” The Quarterly Journal of Economics 132 (4): 1877-1914.
De Janvry, Alain, Frederico Finan, and Elisabeth Sadoulet. 2012. “Local Electoral Incentives and Decentralized Program Performance.” Review of Economics and Statistics 94 (3): 672-685.
Elman, Benjamin A. 2013. Civil Examinations and Meritocracy in Late Imperial China. Cambridge, MA: Harvard University Press.
Ferraz, Claudio, and Frederico Finan. 2011. “Electoral Accountability and Corruption: Evidence from the Audits of Local Governments.” American Economic Review 101 (4): 1274-1311.
Freedman, David A. "Linear Statistical Models for Causation: A Critical Review." Wiley StatsRef: Statistics Reference Online (2005).
Ghatak, Maitreesh. 1999. “Group Lending, Local Information and Peer Selection.” Journal of Development Economics 60 (1): 27-50.
Hamilton, Alexander, James Madison, and John Jay. 2008. The Federalist Papers. New York City, NY: Oxford University Press.
Hayek, Friedrich August. 1945. “The Use of Knowledge in Society.” The American Economic Review 35 (4): 519-530.
Heckman, James J. "Dummy Endogenous Variables in a Simultaneous Equation System." Econometrica 46, no. 4 (1978): 931-959.
Imbens, Guido. Instrumental Variables: An Econometrician's Perspective. No. w19983. National Bureau of Economic Research, 2014.
Imbens, Guido W., and Joshua D. Angrist. "Identification and Estimation of Local Average Treatment Effects." Econometrica 62, no. 2 (1994): 467-475.
Jones, Benjamin F., and Benjamin A. Olken. 2005. “Do Leaders Matter? National Leadership and Growth since World War II.” The Quarterly Journal of Economics 120 (3): 835-864.
Kazin, Michael, Rebecca Edwards, and Adam Rothman, eds. 2009. The Princeton Encyclopedia of American Political History. (Two volume set). Princeton, NJ: Princeton University Press.
Koop, Gary, Dale J. Poirier, and Justin L. Tobias. 2007. Bayesian Econometric Methods. New York City, NY: Cambridge University Press.
Krueger, Alan B. "Experimental Estimates of Education Production Functions." The Quarterly Journal of Economics 114, no. 2 (1999): 497-532.
Laffont, Jean-Jacques. 2001. Incentives and Political Economy. New York City, NY: Oxford University Press.
Lancaster, Tony. 2004. An Introduction to Modern Bayesian Econometrics. Oxford: Blackwell.
Liu, Chengfang, Renfu Luo, Scott Rozelle, and Linxiu Zhang. 2009. “Infrastructure Investment in Rural China: Is Quality Being Compromised During Quantity Expansion?” The China Journal 61: 105-129.
Loeper, Antoine. 2017. “Cross-border Externalities and Cooperation Among Representative Democracies.” European Economic Review 91: 180-208.
Martinez-Bravo, Monica, Gerard Padró i Miquel, Nancy Qian, and Yang Yao. 2014. “Political Reform in China: Elections, Public Goods and Income Distribution.” SSRN Working Paper. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2356343
Martinez-Bravo, Monica, Gerard Padró i Miquel, Nancy Qian, and Yang Yao. The Rise and Fall of Local Elections in China: Theory and Empirical Evidence on the Autocrat's Trade-off. No. w24032. National Bureau of Economic Research, 2017.
Martinez-Bravo, Monica, Nancy Qian, and Yang Yao. 2011. “Do Local Elections in Non-democracies Increase Accountability? Evidence from Rural China.” NBER Working Paper w16948. https://www.nber.org/papers/w16948
Martinez-Bravo, Monica, Nancy Qian, and Yang Yao. Elections in China. No. w18101. National Bureau of Economic Research, 2012.
Menaldo, Victor. "The Fiscal Roots of Financial Underdevelopment." American Journal of Political Science 60, no. 2 (2016): 456-471.
Milgrom, Paul, and John Roberts. 1986. “Price and Advertising Signals of Product Quality.” Journal of Political Economy 94 (4): 796-821.
National People’s Congress of China. 1998. Organic Law of the Village Committees (OLVC). https://www.cecc.gov/resources/legal-provisions/organic-law-of-the-villagers-committees-of-the-peoples-republic-of-china
O'Brien, Kevin J., and Rongbin Han. 2009. "Path to Democracy? Assessing Village Elections in China." Journal of Contemporary China 18 (60): 359-378.
O’Brien, Kevin J., and Lianjiang Li. 2000. “Accommodating ‘Democracy’ in A One-party State: Introducing Village Elections in China.” The China Quarterly 162: 465-489.
O'Brien, Kevin J., and Suisheng Zhao, eds. 2014. Grassroots Elections in China. Abingdon: Routledge.
Oi, Jean C., and Scott Rozelle. 2000. “Elections and Power: The Locus of Decision-making in Chinese Villages.” The China Quarterly 162: 513-539.
Oreopoulos, Philip. "Estimating Average and Local Average Treatment Effects of Education When Compulsory Schooling Laws Really Matter." American Economic Review 96, no. 1 (2006): 152-175.
Padró i Miquel, Gerard, Nancy Qian, Yiqing Xu, and Yang Yao. 2015. “Making Democracy Work: Culture, Social Capital and Elections in China.” NBER Working Paper w21058. https://www.nber.org/papers/w21058
Pearl, Judea. Causality. Cambridge University Press, 2009.
Qian, Mu. 2012. Chinese Political Gain and Losses During the Past Dynasties. Hong Kong, SAR: SDX Joint Publishing Company.
Robins, James M., Miguel Angel Hernán, and Babette Brumback. "Marginal Structural Models and Causal Inference in Epidemiology." Epidemiology 11, no. 5 (2000): 551.
Rosenzweig, Mark R., and Kenneth I. Wolpin. "Testing the Quantity-Quality Fertility Model: The Use of Twins as A Natural Experiment." Econometrica (1980): 227-240.
Rosenzweig, Mark R., and Kenneth I. Wolpin. "Natural 'Natural Experiments' in Economics." Journal of Economic Literature 38, no. 4 (2000): 827-874.
Sienkewicz, Thomas J. 2003. Encyclopedia of the Ancient World. Pasadena, CA: Salem Press.
Sovey, Allison J., and Donald P. Green. "Instrumental Variables Estimation in Political Science: A Readers’ Guide." American Journal of Political Science 55, no. 1 (2011): 188-200.
Spence, Michael. 1973. “Job Market Signaling.” The Quarterly Journal of Economics 87 (3): 355-374.
Stock, James H., and Francesco Trebbi. "Retrospectives: Who Invented Instrumental Variable Regression?" Journal of Economic Perspectives 17, no. 3 (2003): 177-194.
Tang, Huangfeng. 2016. “New Meritocracy: The Democratization and Modernization of Bureaucratic Selection in China.” (In Chinese). Fudan Journal: Social Science Edition 4: 144-154.
Todd, Petra E., and Kenneth I. Wolpin. "On the Specification and Estimation of the Production Function for Cognitive Achievement." The Economic Journal 113, no. 485 (2003).
Wang, Yu, and Marc F. Bellemare. "Lagged Variables as Instruments." (2019).
Wong, Ho Lun, Yu Wang, Renfu Luo, Linxiu Zhang, and Scott Rozelle. 2017. “Local Governance and the Quality of Local Infrastructure: Evidence from Village Road Projects in Rural China.” Journal of Public Economics 152: 119-132.
Yao, Yang, and Muyang Zhang. 2015. “Subnational Leaders and Economic Growth: Evidence from Chinese Cities.” Journal of Economic Growth 20 (4): 405-436.
Zhang, Weiwei. 2012. “Meritocracy Versus Democracy.” The New York Times.

Appendices for Local Direct Elections, Local Information, and Meritocratic Selection

A. Posterior Distribution of the Virtue and Capacity of Village Leader Candidates

A.1 Virtue: Following Koop et al.
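(2007) and Lancaster (2004), we derive the density kernel of the posterior distribution of virtue, such that

\[
p(\alpha_i \mid \boldsymbol{\Omega}_{iT}^{\alpha}) = \gamma(\alpha_i) \cdot \bigl[L(\alpha_i;\Omega_{i1}^{\alpha}) \cdots L(\alpha_i;\Omega_{it}^{\alpha}) \cdots L(\alpha_i;\Omega_{iT}^{\alpha})\bigr], \tag{A1}
\]

where the vector $\boldsymbol{\Omega}_{iT}^{\alpha}$ is $[\Omega_{i1}^{\alpha} \ldots \Omega_{it}^{\alpha} \ldots \Omega_{iT}^{\alpha}]'$.

First, we derive $\gamma(\alpha_i)$, the density kernel of the prior distribution of virtue. As the prior distribution is a normal distribution truncated at $[0, 1]$, we have

\[
\gamma(\alpha_i) = \gamma(\alpha_i \mid 0 \le \alpha_i \le 1, \alpha_i^{e})
= \frac{\frac{1}{\sigma_{\alpha^{e}}}\,\phi\!\left[\frac{\alpha_i - \alpha_i^{e}}{\sigma_{\alpha^{e}}}\right]}{P(\alpha_i \mid 0 \le \alpha_i \le 1, \alpha_i^{e})}
= \frac{(2\pi\sigma_{\alpha^{e}}^{2})^{-\frac{1}{2}} \exp\!\left\{-\frac{(\alpha_i - \alpha_i^{e})^{2}}{2\sigma_{\alpha^{e}}^{2}}\right\}}{P(\alpha_i \mid 0 \le \alpha_i \le 1, \alpha_i^{e})}
\propto \frac{\exp\!\left\{-\frac{(\alpha_i - \alpha_i^{e})^{2}}{2\sigma_{\alpha^{e}}^{2}}\right\}}{P(\alpha_i \mid 0 \le \alpha_i \le 1, \alpha_i^{e})}, \tag{A2}
\]

where $P(\alpha_i \mid 0 \le \alpha_i \le 1, \alpha_i^{e}) = \Phi\!\left(\frac{1 - \alpha_i^{e}}{\sigma_{\alpha^{e}}}\right) - \Phi\!\left(\frac{-\alpha_i^{e}}{\sigma_{\alpha^{e}}}\right)$ represents the probability that $\alpha_i$ lies in $[0, 1]$, contingent on $\alpha_i^{e}$.

Second, we derive the joint density of $\boldsymbol{\Omega}_{iT}^{\alpha}$ as a likelihood function given by

\[
\begin{aligned}
L(\alpha_i;\boldsymbol{\Omega}_{iT}^{\alpha}) &= \prod_{t=1}^{T} (2\pi\sigma_{\upsilon^{\alpha}}^{2})^{-\frac{1}{2}} \exp\!\left\{-\frac{(\Omega_{it}^{\alpha} - \alpha_i)^{2}}{2\sigma_{\upsilon^{\alpha}}^{2}}\right\}
= (2\pi\sigma_{\upsilon^{\alpha}}^{2})^{-\frac{T}{2}} \exp\!\left\{-\sum_{t=1}^{T}\frac{(\Omega_{it}^{\alpha} - \alpha_i)^{2}}{2\sigma_{\upsilon^{\alpha}}^{2}}\right\} \\
&\propto \exp\!\left\{-\frac{1}{2\sigma_{\upsilon^{\alpha}}^{2}}\sum_{t=1}^{T}(\Omega_{it}^{\alpha} - \alpha_i)^{2}\right\}
\propto \exp\!\left\{-\frac{1}{2\sigma_{\upsilon^{\alpha}}^{2}}\sum_{t=1}^{T}(\Omega_{it}^{\alpha} - \bar{\Omega}_{i}^{\alpha} + \bar{\Omega}_{i}^{\alpha} - \alpha_i)^{2}\right\} \\
&\propto \exp\!\left\{-\frac{1}{2\sigma_{\upsilon^{\alpha}}^{2}}\Bigl[\sum_{t=1}^{T}(\Omega_{it}^{\alpha} - \bar{\Omega}_{i}^{\alpha})^{2} + \sum_{t=1}^{T}(\bar{\Omega}_{i}^{\alpha} - \alpha_i)^{2} + 2\sum_{t=1}^{T}(\Omega_{it}^{\alpha} - \bar{\Omega}_{i}^{\alpha})(\bar{\Omega}_{i}^{\alpha} - \alpha_i)\Bigr]\right\},
\end{aligned} \tag{A3}
\]

where $\bar{\Omega}_{i}^{\alpha} = \frac{1}{T}\sum_{t=1}^{T}\Omega_{it}^{\alpha}$. As $\sum_{t=1}^{T}(\Omega_{it}^{\alpha} - \bar{\Omega}_{i}^{\alpha})^{2}$, the second-order moment, is a constant, and $2\sum_{t=1}^{T}(\Omega_{it}^{\alpha} - \bar{\Omega}_{i}^{\alpha})(\bar{\Omega}_{i}^{\alpha} - \alpha_i) = 2(\bar{\Omega}_{i}^{\alpha} - \alpha_i)\sum_{t=1}^{T}(\Omega_{it}^{\alpha} - \bar{\Omega}_{i}^{\alpha}) = 0$, we have

\[
L(\alpha_i;\boldsymbol{\Omega}_{iT}^{\alpha}) \propto \exp\!\left\{-\sum_{t=1}^{T}\frac{(\bar{\Omega}_{i}^{\alpha} - \alpha_i)^{2}}{2\sigma_{\upsilon^{\alpha}}^{2}}\right\} \propto \exp\!\left\{-\frac{T(\bar{\Omega}_{i}^{\alpha} - \alpha_i)^{2}}{2\sigma_{\upsilon^{\alpha}}^{2}}\right\}. \tag{A4}
\]

Third, the density kernel of the posterior distribution of virtue is

\[
\begin{aligned}
L(\alpha_i;\boldsymbol{\Omega}_{iT}^{\alpha})\,\gamma(\alpha_i)
&\propto \frac{\exp\!\left\{-\frac{T(\bar{\Omega}_{i}^{\alpha} - \alpha_i)^{2}}{2\sigma_{\upsilon^{\alpha}}^{2}}\right\}\exp\!\left\{-\frac{(\alpha_i - \alpha_i^{e})^{2}}{2\sigma_{\alpha^{e}}^{2}}\right\}}{P(\alpha_i \mid 0 \le \alpha_i \le 1, \alpha_i^{e})} \\
&\propto \frac{\exp\!\left\{-\frac{1}{2}\Bigl[\frac{T}{\sigma_{\upsilon^{\alpha}}^{2}}\bigl((\bar{\Omega}_{i}^{\alpha})^{2} - 2\alpha_i\bar{\Omega}_{i}^{\alpha} + \alpha_i^{2}\bigr) + \frac{1}{\sigma_{\alpha^{e}}^{2}}\bigl(\alpha_i^{2} - 2\alpha_i^{e}\alpha_i + (\alpha_i^{e})^{2}\bigr)\Bigr]\right\}}{P(\alpha_i \mid 0 \le \alpha_i \le 1, \alpha_i^{e})} \\
&\propto \frac{\exp\!\left\{-\frac{1}{2}\Bigl[\frac{\sigma_{\upsilon^{\alpha}}^{2} + T\sigma_{\alpha^{e}}^{2}}{\sigma_{\upsilon^{\alpha}}^{2}\sigma_{\alpha^{e}}^{2}}\alpha_i^{2} - 2\Bigl(\frac{1}{\sigma_{\alpha^{e}}^{2}}\alpha_i^{e} + \frac{T}{\sigma_{\upsilon^{\alpha}}^{2}}\bar{\Omega}_{i}^{\alpha}\Bigr)\alpha_i + \frac{T}{\sigma_{\upsilon^{\alpha}}^{2}}(\bar{\Omega}_{i}^{\alpha})^{2} + \frac{1}{\sigma_{\alpha^{e}}^{2}}(\alpha_i^{e})^{2}\Bigr]\right\}}{P(\alpha_i \mid 0 \le \alpha_i \le 1, \alpha_i^{e})} \\
&\propto \frac{\exp\!\left\{-\frac{1}{2}\Bigl[\frac{\sigma_{\upsilon^{\alpha}}^{2} + T\sigma_{\alpha^{e}}^{2}}{\sigma_{\upsilon^{\alpha}}^{2}\sigma_{\alpha^{e}}^{2}}\alpha_i^{2} - 2\Bigl(\frac{1}{\sigma_{\alpha^{e}}^{2}}\alpha_i^{e} + \frac{T}{\sigma_{\upsilon^{\alpha}}^{2}}\bar{\Omega}_{i}^{\alpha}\Bigr)\alpha_i\Bigr]\right\}}{P(\alpha_i \mid 0 \le \alpha_i \le 1, \alpha_i^{e})}.
\end{aligned} \tag{A5}
\]

Completing the square in the numerator of (A5), we have

\[
L(\alpha_i;\boldsymbol{\Omega}_{iT}^{\alpha}) \cdot \gamma(\alpha_i) \propto
\frac{\exp\!\left\{-\frac{1}{2}\,\frac{\sigma_{\upsilon^{\alpha}}^{2} + T\sigma_{\alpha^{e}}^{2}}{\sigma_{\upsilon^{\alpha}}^{2}\sigma_{\alpha^{e}}^{2}}\Bigl[\alpha_i - \frac{\sigma_{\upsilon^{\alpha}}^{2}\sigma_{\alpha^{e}}^{2}}{\sigma_{\upsilon^{\alpha}}^{2} + T\sigma_{\alpha^{e}}^{2}}\Bigl(\frac{1}{\sigma_{\alpha^{e}}^{2}}\alpha_i^{e} + \frac{T}{\sigma_{\upsilon^{\alpha}}^{2}}\bar{\Omega}_{i}^{\alpha}\Bigr)\Bigr]^{2}\right\}}{P(\alpha_i \mid 0 \le \alpha_i \le 1, \alpha_i^{e})}, \tag{A6}
\]

where $P(\alpha_i \mid 0 \le \alpha_i \le 1, \alpha_i^{e}) = \Phi\!\left(\frac{1 - \alpha_i^{e}}{\sigma_{\alpha^{e}}}\right) - \Phi\!\left(\frac{-\alpha_i^{e}}{\sigma_{\alpha^{e}}}\right)$. Therefore, this density kernel is still that of a truncated normal distribution, in which the posterior variance of virtue is

\[
\Sigma(\alpha_i) = \frac{\sigma_{\upsilon^{\alpha}}^{2}\sigma_{\alpha^{e}}^{2}}{\sigma_{\upsilon^{\alpha}}^{2} + T\sigma_{\alpha^{e}}^{2}} \tag{A7}
\]

and the posterior mean of virtue is

\[
S(\alpha_i) = \Sigma(\alpha_i)\Bigl[\frac{1}{\sigma_{\alpha^{e}}^{2}}\alpha_i^{e} + \frac{T}{\sigma_{\upsilon^{\alpha}}^{2}}\bar{\Omega}_{i}^{\alpha}\Bigr]
= \frac{\sigma_{\upsilon^{\alpha}}^{2}}{\sigma_{\upsilon^{\alpha}}^{2} + T\sigma_{\alpha^{e}}^{2}}\alpha_i^{e} + \frac{T\sigma_{\alpha^{e}}^{2}}{\sigma_{\upsilon^{\alpha}}^{2} + T\sigma_{\alpha^{e}}^{2}}\bar{\Omega}_{i}^{\alpha}. \tag{A8}
\]

As $\Omega_{it}^{\alpha} \sim N(\alpha_i, \sigma_{\upsilon^{\alpha}}^{2})$ and $T = T^{Ele}\,(T^{App}) \to +\infty$, we have

\[
\bar{\Omega}_{i}^{\alpha} = \frac{1}{T}\sum_{t=1}^{T}\Omega_{it}^{\alpha} = E(\Omega_{it}^{\alpha}) = \alpha_i. \tag{A9}
\]

Thus, the posterior mean of virtue is

\[
S(\alpha_i) = \frac{\sigma_{\upsilon^{\alpha}}^{2}}{\sigma_{\upsilon^{\alpha}}^{2} + T\sigma_{\alpha^{e}}^{2}}\alpha_i^{e} + \frac{T\sigma_{\alpha^{e}}^{2}}{\sigma_{\upsilon^{\alpha}}^{2} + T\sigma_{\alpha^{e}}^{2}}\alpha_i. \tag{A10}
\]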
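The updating formulas (A7) and (A8) can be checked numerically. The sketch below is only illustrative: the function name `posterior_virtue` and the example numbers are hypothetical, and the truncation to [0, 1] is ignored, so it computes the mean and variance of the untruncated normal kernel rather than the moments of the truncated posterior.

```python
def posterior_virtue(prior_mean, prior_var, signals, signal_var):
    """Posterior (mean, variance) of the normal kernel in (A7)-(A8).

    prior_mean, prior_var: the prior expectation of virtue and its variance.
    signals: the T observed signals; signal_var: the signal noise variance.
    Truncation to [0, 1] is deliberately ignored in this sketch.
    """
    T = len(signals)
    signal_mean = sum(signals) / T
    post_var = signal_var * prior_var / (signal_var + T * prior_var)              # (A7)
    post_mean = post_var * (prior_mean / prior_var + T * signal_mean / signal_var)  # (A8)
    return post_mean, post_var


def posterior_mean_weighted(prior_mean, prior_var, signals, signal_var):
    """Second form of (A8): a weighted average of prior mean and signal mean."""
    T = len(signals)
    signal_mean = sum(signals) / T
    weight_prior = signal_var / (signal_var + T * prior_var)
    return weight_prior * prior_mean + (1.0 - weight_prior) * signal_mean


mean, var = posterior_virtue(0.5, 0.04, [0.6, 0.8, 0.7], 0.09)
print(mean, var, posterior_mean_weighted(0.5, 0.04, [0.6, 0.8, 0.7], 0.09))
```

The two forms of the posterior mean agree, and the posterior variance is smaller than the prior variance. As more signals arrive, the weight on the prior mean shrinks, which is the mechanism behind (A9) and (A10).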
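A.2 Capacity: Similar to virtue, we derive the density kernel of the posterior distribution of capacity, such that

\[
p(\theta_i \mid \boldsymbol{\Omega}_{iT}^{\theta}) = \gamma(\theta_i) \cdot \bigl[L(\theta_i;\Omega_{i1}^{\theta}) \cdots L(\theta_i;\Omega_{it}^{\theta}) \cdots L(\theta_i;\Omega_{iT}^{\theta})\bigr], \tag{A11}
\]

where the vector $\boldsymbol{\Omega}_{iT}^{\theta}$ is $[\Omega_{i1}^{\theta} \ldots \Omega_{it}^{\theta} \ldots \Omega_{iT}^{\theta}]'$.

First, we derive $\gamma(\theta_i)$, the density kernel of the prior distribution of capacity. As the prior distribution is a normal distribution truncated at $[0, 1]$, we have

\[
\gamma(\theta_i) = \gamma(\theta_i \mid 0 \le \theta_i \le 1, \theta_i^{e}) \propto \frac{\exp\!\left\{-\frac{(\theta_i - \theta_i^{e})^{2}}{2\sigma_{\theta^{e}}^{2}}\right\}}{P(\theta_i \mid 0 \le \theta_i \le 1, \theta_i^{e})}, \tag{A12}
\]

where $P(\theta_i \mid 0 \le \theta_i \le 1, \theta_i^{e}) = \Phi\!\left(\frac{1 - \theta_i^{e}}{\sigma_{\theta^{e}}}\right) - \Phi\!\left(\frac{-\theta_i^{e}}{\sigma_{\theta^{e}}}\right)$ represents the probability that $\theta_i$ lies in $[0, 1]$, contingent on $\theta_i^{e}$.

Second, we derive the joint density of $\boldsymbol{\Omega}_{iT}^{\theta}$ as a likelihood function given by

\[
\begin{aligned}
L(\theta_i;\boldsymbol{\Omega}_{iT}^{\theta}) &= \prod_{t=1}^{T} (2\pi\sigma_{\omega^{\theta}}^{2})^{-\frac{1}{2}} \exp\!\left\{-\frac{(\Omega_{it}^{\theta} - \theta_i)^{2}}{2\sigma_{\omega^{\theta}}^{2}}\right\}
\propto \exp\!\left\{-\sum_{t=1}^{T}\frac{(\Omega_{it}^{\theta} - \bar{\Omega}_{i}^{\theta} + \bar{\Omega}_{i}^{\theta} - \theta_i)^{2}}{2\sigma_{\omega^{\theta}}^{2}}\right\} \\
&\propto \exp\!\left\{-\frac{1}{2\sigma_{\omega^{\theta}}^{2}}\Bigl[\sum_{t=1}^{T}(\Omega_{it}^{\theta} - \bar{\Omega}_{i}^{\theta})^{2} + \sum_{t=1}^{T}(\bar{\Omega}_{i}^{\theta} - \theta_i)^{2} + 2\sum_{t=1}^{T}(\Omega_{it}^{\theta} - \bar{\Omega}_{i}^{\theta})(\bar{\Omega}_{i}^{\theta} - \theta_i)\Bigr]\right\},
\end{aligned} \tag{A13}
\]

where $\bar{\Omega}_{i}^{\theta} = \frac{1}{T}\sum_{t=1}^{T}\Omega_{it}^{\theta}$. As $\sum_{t=1}^{T}(\Omega_{it}^{\theta} - \bar{\Omega}_{i}^{\theta})^{2}$, the second-order moment, is a constant, and $2\sum_{t=1}^{T}(\Omega_{it}^{\theta} - \bar{\Omega}_{i}^{\theta})(\bar{\Omega}_{i}^{\theta} - \theta_i) = 2(\bar{\Omega}_{i}^{\theta} - \theta_i)\sum_{t=1}^{T}(\Omega_{it}^{\theta} - \bar{\Omega}_{i}^{\theta}) = 0$, we have

\[
L(\theta_i;\boldsymbol{\Omega}_{iT}^{\theta}) \propto \exp\!\left\{-\sum_{t=1}^{T}\frac{(\bar{\Omega}_{i}^{\theta} - \theta_i)^{2}}{2\sigma_{\omega^{\theta}}^{2}}\right\} \propto \exp\!\left\{-\frac{T(\bar{\Omega}_{i}^{\theta} - \theta_i)^{2}}{2\sigma_{\omega^{\theta}}^{2}}\right\}. \tag{A14}
\]

Third, the density kernel of the posterior distribution of capacity is

\[
\begin{aligned}
L(\theta_i;\boldsymbol{\Omega}_{iT}^{\theta})\,\gamma(\theta_i)
&\propto \frac{\exp\!\left\{-\frac{T(\bar{\Omega}_{i}^{\theta} - \theta_i)^{2}}{2\sigma_{\omega^{\theta}}^{2}}\right\}\exp\!\left\{-\frac{(\theta_i - \theta_i^{e})^{2}}{2\sigma_{\theta^{e}}^{2}}\right\}}{P(\theta_i \mid 0 \le \theta_i \le 1, \theta_i^{e})} \\
&\propto \frac{\exp\!\left\{-\frac{1}{2}\Bigl[\frac{T}{\sigma_{\omega^{\theta}}^{2}}\bigl((\bar{\Omega}_{i}^{\theta})^{2} - 2\theta_i\bar{\Omega}_{i}^{\theta} + \theta_i^{2}\bigr) + \frac{1}{\sigma_{\theta^{e}}^{2}}\bigl(\theta_i^{2} - 2\theta_i^{e}\theta_i + (\theta_i^{e})^{2}\bigr)\Bigr]\right\}}{P(\theta_i \mid 0 \le \theta_i \le 1, \theta_i^{e})} \\
&\propto \frac{\exp\!\left\{-\frac{1}{2}\Bigl[\frac{\sigma_{\omega^{\theta}}^{2} + T\sigma_{\theta^{e}}^{2}}{\sigma_{\omega^{\theta}}^{2}\sigma_{\theta^{e}}^{2}}\theta_i^{2} - 2\Bigl(\frac{1}{\sigma_{\theta^{e}}^{2}}\theta_i^{e} + \frac{T}{\sigma_{\omega^{\theta}}^{2}}\bar{\Omega}_{i}^{\theta}\Bigr)\theta_i\Bigr]\right\}}{P(\theta_i \mid 0 \le \theta_i \le 1, \theta_i^{e})}.
\end{aligned} \tag{A15}
\]

Completing the square in the numerator of (A15), we have

\[
L(\theta_i;\boldsymbol{\Omega}_{iT}^{\theta})\,\gamma(\theta_i) \propto
\frac{\exp\!\left\{-\frac{1}{2}\,\frac{\sigma_{\omega^{\theta}}^{2} + T\sigma_{\theta^{e}}^{2}}{\sigma_{\omega^{\theta}}^{2}\sigma_{\theta^{e}}^{2}}\Bigl[\theta_i - \frac{\sigma_{\omega^{\theta}}^{2}\sigma_{\theta^{e}}^{2}}{\sigma_{\omega^{\theta}}^{2} + T\sigma_{\theta^{e}}^{2}}\Bigl(\frac{1}{\sigma_{\theta^{e}}^{2}}\theta_i^{e} + \frac{T}{\sigma_{\omega^{\theta}}^{2}}\bar{\Omega}_{i}^{\theta}\Bigr)\Bigr]^{2}\right\}}{P(\theta_i \mid 0 \le \theta_i \le 1, \theta_i^{e})}, \tag{A16}
\]

where $P(\theta_i \mid 0 \le \theta_i \le 1, \theta_i^{e}) = \Phi\!\left(\frac{1 - \theta_i^{e}}{\sigma_{\theta^{e}}}\right) - \Phi\!\left(\frac{-\theta_i^{e}}{\sigma_{\theta^{e}}}\right)$. Therefore, this density kernel is still that of a truncated normal distribution, in which the posterior variance of capacity is

\[
\Sigma(\theta_i) = \frac{\sigma_{\omega^{\theta}}^{2}\sigma_{\theta^{e}}^{2}}{\sigma_{\omega^{\theta}}^{2} + T\sigma_{\theta^{e}}^{2}} \tag{A17}
\]

and the posterior mean of capacity is

\[
S(\theta_i) = \Sigma(\theta_i)\Bigl[\frac{1}{\sigma_{\theta^{e}}^{2}}\theta_i^{e} + \frac{T}{\sigma_{\omega^{\theta}}^{2}}\bar{\Omega}_{i}^{\theta}\Bigr]
= \frac{\sigma_{\omega^{\theta}}^{2}}{\sigma_{\omega^{\theta}}^{2} + T\sigma_{\theta^{e}}^{2}}\theta_i^{e} + \frac{T\sigma_{\theta^{e}}^{2}}{\sigma_{\omega^{\theta}}^{2} + T\sigma_{\theta^{e}}^{2}}\bar{\Omega}_{i}^{\theta}. \tag{A18}
\]

As $\Omega_{it}^{\theta} \sim N(\theta_i, \sigma_{\omega^{\theta}}^{2})$ and $T = T^{Ele}\,(T^{App}) \to +\infty$, we have

\[
\bar{\Omega}_{i}^{\theta} = \frac{1}{T}\sum_{t=1}^{T}\Omega_{it}^{\theta} = E(\Omega_{it}^{\theta}) = \theta_i. \tag{A19}
\]

Thus, the posterior mean of capacity is

\[
S(\theta_i) = \frac{\sigma_{\omega^{\theta}}^{2}}{\sigma_{\omega^{\theta}}^{2} + T\sigma_{\theta^{e}}^{2}}\theta_i^{e} + \frac{T\sigma_{\theta^{e}}^{2}}{\sigma_{\omega^{\theta}}^{2} + T\sigma_{\theta^{e}}^{2}}\theta_i. \tag{A20}
\]

B. Comparison of the Expectation and Variance of the Competence of Village Leaders Before and After the Introduction of Local Direct Elections

B.1 Expectation: We compare, in a representative village, the expectation of the competence of the elected village leader and the appointed village leader.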
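Specifically, we compare

$$[\pi]^{VL,Ele} \equiv \mathbb{E}_{Ele}(\pi_i) = \frac{\int_0^1\!\!\int_0^1\left[\mu\alpha_i+(1-\mu)\theta_i\right]A_i^{Ele}\,d\alpha_i\,d\theta_i}{\int_0^1\!\!\int_0^1 A_i^{Ele}\,d\alpha_i\,d\theta_i} \tag{B1}$$

and

$$[\pi]^{VL,App} \equiv \mathbb{E}_{App}(\pi_i) = \frac{\int_0^1\!\!\int_0^1\left[\mu\alpha_i+(1-\mu)\theta_i\right]A_i^{App}\,d\alpha_i\,d\theta_i}{\int_0^1\!\!\int_0^1 A_i^{App}\,d\alpha_i\,d\theta_i}. \tag{B2}$$

We know that

$$A_i^{Ele\,(App)} = \mu l\,S^{Ele\,(App)}(\alpha_i)+(1-\mu)l\,S^{Ele\,(App)}(\theta_i), \tag{B3}$$

where

$$S^{Ele\,(App)}(\alpha_i) = \frac{\Sigma^{Ele\,(App)}(\alpha_i)}{\sigma_{\alpha^e}^2}\alpha_i^e+\left(1-\frac{\Sigma^{Ele\,(App)}(\alpha_i)}{\sigma_{\alpha^e}^2}\right)\alpha_i = \frac{\sigma_{\upsilon^\alpha}^2}{\sigma_{\upsilon^\alpha}^2+N^{Ele\,(App)}\sigma_{\alpha^e}^2}\alpha_i^e+\frac{N^{Ele\,(App)}\sigma_{\alpha^e}^2}{\sigma_{\upsilon^\alpha}^2+N^{Ele\,(App)}\sigma_{\alpha^e}^2}\alpha_i, \tag{B4}$$

$$S^{Ele\,(App)}(\theta_i) = \frac{\Sigma^{Ele\,(App)}(\theta_i)}{\sigma_{\theta^e}^2}\theta_i^e+\left(1-\frac{\Sigma^{Ele\,(App)}(\theta_i)}{\sigma_{\theta^e}^2}\right)\theta_i = \frac{\sigma_{\omega^\theta}^2}{\sigma_{\omega^\theta}^2+N^{Ele\,(App)}\sigma_{\theta^e}^2}\theta_i^e+\frac{N^{Ele\,(App)}\sigma_{\theta^e}^2}{\sigma_{\omega^\theta}^2+N^{Ele\,(App)}\sigma_{\theta^e}^2}\theta_i. \tag{B5}$$

As $N^{Ele} > N^{App}$, comparing the competence of the elected village leader with that of the appointed village leader is equivalent to determining how the following function changes with $N$:

$$f\left(x(N),y(N)\right) = \frac{\int_0^1\!\!\int_0^1\left[\mu\alpha_i+(1-\mu)\theta_i\right]\left[\mu l\,S(\alpha_i)+(1-\mu)l\,S(\theta_i)\right]d\alpha_i\,d\theta_i}{\int_0^1\!\!\int_0^1\left[\mu l\,S(\alpha_i)+(1-\mu)l\,S(\theta_i)\right]d\alpha_i\,d\theta_i}, \tag{B6}$$

where $S(\alpha_i) = (1-x)\alpha_i^e+x\alpha_i$ with $x \equiv \dfrac{\sigma_{\alpha^e}^2 N}{\sigma_{\upsilon^\alpha}^2+\sigma_{\alpha^e}^2 N}$; $S(\theta_i) = (1-y)\theta_i^e+y\theta_i$ with $y \equiv \dfrac{\sigma_{\theta^e}^2 N}{\sigma_{\omega^\theta}^2+\sigma_{\theta^e}^2 N}$; and $N = N^{Ele}$ or $N^{App}$. We then need to prove that

$$\frac{\partial f}{\partial N} = \frac{\partial f}{\partial x}\cdot\frac{\partial x}{\partial N}+\frac{\partial f}{\partial y}\cdot\frac{\partial y}{\partial N} > 0. \tag{B7}$$

It is easy to prove that $\dfrac{\partial x}{\partial N} > 0$ and $\dfrac{\partial y}{\partial N} > 0$, so we only need to prove that $\dfrac{\partial f}{\partial x} > 0$ and $\dfrac{\partial f}{\partial y} > 0$.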
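The weights $x(N)$ and $y(N)$ in (B6) share one functional form, and the claim that they increase in $N$ is easy to see numerically. The sketch below is only an illustration under assumed values: the variances, the prior mean, the true trait, and the two communication counts are hypothetical numbers, and the function names are not from the dissertation.

```python
def signal_weight(n_signals, prior_var, noise_var):
    """Weight x(N) (or y(N)) placed on the true trait in the posterior mean."""
    return prior_var * n_signals / (noise_var + prior_var * n_signals)


def posterior_mean(n_signals, prior_mean, true_value, prior_var, noise_var):
    """Posterior mean S(.) = (1 - x) * prior mean + x * true value."""
    x = signal_weight(n_signals, prior_var, noise_var)
    return (1.0 - x) * prior_mean + x * true_value


# Hypothetical numbers of communications under appointment and election.
n_app, n_ele = 5, 50

# Weight with no signals, with few signals, and with many signals.
weights = [signal_weight(n, prior_var=0.04, noise_var=0.25) for n in (0, n_app, n_ele)]

# Posterior means for a candidate whose true virtue (0.8) exceeds the prior mean (0.5).
means = [posterior_mean(n, prior_mean=0.5, true_value=0.8, prior_var=0.04, noise_var=0.25)
         for n in (n_app, n_ele)]
```

With more signals the weight moves from the prior mean toward the true trait, which is the channel through which a larger $N$ affects (B6).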
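First, simplify the numerator of (B6):

$$\int_0^1\!\!\int_0^1\left[\mu\alpha_i+(1-\mu)\theta_i\right]\left\{\mu l\left[(1-x)\alpha_i^e+x\alpha_i\right]+(1-\mu)l\left[(1-y)\theta_i^e+y\theta_i\right]\right\}d\alpha_i\,d\theta_i$$

$$= \int_0^1\!\!\int_0^1\Big\{\mu^2 l\,\alpha_i\left[(1-x)\alpha_i^e+x\alpha_i\right]+(1-\mu)^2 l\,\theta_i\left[(1-y)\theta_i^e+y\theta_i\right]+\mu(1-\mu)l\,\alpha_i\left[(1-y)\theta_i^e+y\theta_i\right]+\mu(1-\mu)l\,\theta_i\left[(1-x)\alpha_i^e+x\alpha_i\right]\Big\}d\alpha_i\,d\theta_i,$$

where

$$\int_0^1\!\!\int_0^1\mu^2 l\,\alpha_i\left[(1-x)\alpha_i^e+x\alpha_i\right]d\alpha_i\,d\theta_i = \int_0^1\!\!\int_0^1\mu^2 l(1-x)\alpha_i^e\alpha_i\,d\alpha_i\,d\theta_i+\int_0^1\!\!\int_0^1\mu^2 l\,x\,\alpha_i^2\,d\alpha_i\,d\theta_i = \frac{1}{2}\mu^2 l(1-x)\alpha_i^e+\frac{1}{3}\mu^2 l\,x, \tag{B8}$$

$$\int_0^1\!\!\int_0^1(1-\mu)^2 l\,\theta_i\left[(1-y)\theta_i^e+y\theta_i\right]d\alpha_i\,d\theta_i = \int_0^1\!\!\int_0^1(1-\mu)^2 l(1-y)\theta_i^e\theta_i\,d\alpha_i\,d\theta_i+\int_0^1\!\!\int_0^1(1-\mu)^2 l\,y\,\theta_i^2\,d\alpha_i\,d\theta_i = \frac{1}{2}(1-\mu)^2 l(1-y)\theta_i^e+\frac{1}{3}(1-\mu)^2 l\,y, \tag{B9}$$

$$\int_0^1\!\!\int_0^1\mu(1-\mu)l\,\alpha_i\left[(1-y)\theta_i^e+y\theta_i\right]d\alpha_i\,d\theta_i = \int_0^1\!\!\int_0^1\mu(1-\mu)l\,\alpha_i(1-y)\theta_i^e\,d\alpha_i\,d\theta_i+\int_0^1\!\!\int_0^1\mu(1-\mu)l\,\alpha_i\,y\,\theta_i\,d\alpha_i\,d\theta_i = \frac{1}{2}\mu(1-\mu)(1-y)l\,\theta_i^e+\frac{1}{4}\mu(1-\mu)l\,y, \tag{B10}$$

$$\int_0^1\!\!\int_0^1\mu(1-\mu)l\,\theta_i\left[(1-x)\alpha_i^e+x\alpha_i\right]d\alpha_i\,d\theta_i = \int_0^1\!\!\int_0^1\mu(1-\mu)l(1-x)\alpha_i^e\theta_i\,d\alpha_i\,d\theta_i+\int_0^1\!\!\int_0^1\mu(1-\mu)l\,x\,\alpha_i\theta_i\,d\alpha_i\,d\theta_i = \frac{1}{2}\mu(1-\mu)l(1-x)\alpha_i^e+\frac{1}{4}\mu(1-\mu)l\,x. \tag{B11}$$

Therefore, the numerator of (B6) becomes

$$\frac{1}{2}\mu l(1-x)\alpha_i^e+\frac{1}{2}(1-\mu)l(1-y)\theta_i^e+\frac{1}{12}\mu(\mu+3)l\,x+\frac{1}{12}(\mu-4)(\mu-1)l\,y. \tag{B12}$$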
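The closed form (B12) can be checked against direct numerical integration of the numerator of (B6). The sketch below uses a two-point Gauss-Legendre rule on each axis, which is exact for polynomials up to degree three in each variable and therefore exact for this integrand; the parameter values are arbitrary assumptions chosen only for the check, and the function names are illustrative.

```python
import math

# Two-point Gauss-Legendre nodes on [0, 1]; each node carries weight 1/2.
_OFFSET = 0.5 / math.sqrt(3.0)
NODES = (0.5 - _OFFSET, 0.5 + _OFFSET)


def integrate_unit_square(func):
    """Integrate func(alpha, theta) over [0, 1]^2 with the tensor-product rule."""
    return sum(0.25 * func(a, t) for a in NODES for t in NODES)


def numerator_integrand(alpha, theta, mu, l, x, y, alpha_e, theta_e):
    """Integrand of the numerator of (B6)."""
    competence = mu * alpha + (1 - mu) * theta
    weight = (mu * l * ((1 - x) * alpha_e + x * alpha)
              + (1 - mu) * l * ((1 - y) * theta_e + y * theta))
    return competence * weight


def numerator_closed_form(mu, l, x, y, alpha_e, theta_e):
    """Closed form (B12)."""
    return (0.5 * mu * l * (1 - x) * alpha_e
            + 0.5 * (1 - mu) * l * (1 - y) * theta_e
            + mu * (mu + 3) * l * x / 12
            + (mu - 4) * (mu - 1) * l * y / 12)


# Arbitrary illustrative parameter values.
params = dict(mu=0.3, l=2.0, x=0.6, y=0.4, alpha_e=0.7, theta_e=0.2)
numeric = integrate_unit_square(lambda a, t: numerator_integrand(a, t, **params))
closed = numerator_closed_form(**params)
```

Agreement of `numeric` and `closed` up to rounding error confirms the term-by-term integrals (B8) to (B11) for the chosen parameter values.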
Second, simplify the denominator of (B6):

$$\int_0^1\!\!\int_0^1\left\{\mu l\left[(1-x)\alpha_i^e+x\alpha_i\right]+(1-\mu)l\left[(1-y)\theta_i^e+y\theta_i\right]\right\}d\alpha_i\,d\theta_i$$

$$= \int_0^1\!\!\int_0^1\mu l(1-x)\alpha_i^e\,d\alpha_i\,d\theta_i+\int_0^1\!\!\int_0^1\mu l\,x\,\alpha_i\,d\alpha_i\,d\theta_i+\int_0^1\!\!\int_0^1(1-\mu)l(1-y)\theta_i^e\,d\alpha_i\,d\theta_i+\int_0^1\!\!\int_0^1(1-\mu)l\,y\,\theta_i\,d\alpha_i\,d\theta_i$$

$$= \mu l(1-x)\alpha_i^e+\frac{1}{2}\mu l\,x+(1-\mu)l(1-y)\theta_i^e+\frac{1}{2}(1-\mu)l\,y. \tag{B13}$$

Accordingly, (B6) becomes

$$f\left(x(N),y(N)\right) = \frac{\frac{1}{2}\mu l(1-x)\alpha_i^e+\frac{1}{2}(1-\mu)l(1-y)\theta_i^e+\frac{1}{12}\mu(\mu+3)l\,x+\frac{1}{12}(\mu-4)(\mu-1)l\,y}{\mu l(1-x)\alpha_i^e+(1-\mu)l(1-y)\theta_i^e+\frac{1}{2}\mu l\,x+\frac{1}{2}(1-\mu)l\,y}, \tag{B14}$$

where the first-order derivative of the numerator of (B14) with respect to $x$ is $-\frac{1}{2}\mu l\,\alpha_i^e+\frac{1}{12}\mu(\mu+3)l$, and the first-order derivative of the denominator of (B14) with respect to $x$ is $-\mu l\,\alpha_i^e+\frac{1}{2}\mu l$. Writing $D$ for the denominator of (B14), the quotient rule gives

$$\frac{\partial f}{\partial x} = \frac{\frac{1}{12}\mu^3 l^2\alpha_i^e+\frac{1}{12}\mu(1-\mu)^2 l^2\,y\,\alpha_i^e+\frac{1}{12}\mu^2(1-\mu)l^2(1-y)\theta_i^e+\frac{1}{24}\mu(1-\mu)(2\mu-1)l^2\,y}{D^2}$$

$$= \frac{\frac{1}{12}\mu^3 l^2\alpha_i^e+\frac{1}{12}\mu^2(1-\mu)l^2(1-y)\theta_i^e+\frac{1}{24}\mu(1-\mu)l^2\,y\left[2\mu\left(1-\alpha_i^e\right)+2\alpha_i^e-1\right]}{D^2}. \tag{B15}$$

For $\dfrac{\partial f}{\partial x} > 0$, it suffices that $\mu(1-\mu)(2\mu-1) \ge 0$, as shown in the first step of (B15), that is, $1 \ge \mu \ge \frac{1}{2}$ or $\mu = 0$. When $0 < \mu < \frac{1}{2}$, we instead rearrange the terms of the numerator as in the second step of (B15): when $2\mu(1-\alpha_i^e)+2\alpha_i^e-1 \ge 2\mu(1-\alpha_i^e) \ge 0$, that is, $\alpha_i^e \ge \frac{1}{2}$, we have $\dfrac{\partial f}{\partial x} > 0$. In other words, $\dfrac{\partial f}{\partial x} > 0$ holds when $1 \ge \mu \ge \frac{1}{2}$, or $\mu = 0$, or both $0 < \mu < \frac{1}{2}$ and $\alpha_i^e \ge \frac{1}{2}$.

Now we discuss $\dfrac{\partial f}{\partial y}$. The first-order derivative of the numerator of (B14) with respect to $y$ is $-\frac{1}{2}(1-\mu)l\,\theta_i^e+\frac{1}{12}(\mu-4)(\mu-1)l$, and the first-order derivative of the denominator of (B14) with respect to $y$ is $-(1-\mu)l\,\theta_i^e+\frac{1}{2}(1-\mu)l$.
Then, writing $D$ for the denominator of (B14), we can obtain

$$\frac{\partial f}{\partial y} = \frac{\frac{1}{12}\mu^2(1-\mu)l^2\,x\,\theta_i^e+\frac{1}{12}(1-\mu)^3 l^2\theta_i^e+\frac{1}{12}\mu(1-\mu)^2 l^2(1-x)\alpha_i^e+\frac{1}{24}\mu(1-\mu)(1-2\mu)l^2\,x}{D^2}$$

$$= \frac{\frac{1}{12}(1-\mu)^3 l^2\theta_i^e+\frac{1}{12}\mu(1-\mu)^2 l^2(1-x)\alpha_i^e+\frac{1}{24}\mu(1-\mu)l^2\,x\left[1-2\mu+2\mu\,\theta_i^e\right]}{D^2}. \tag{B16}$$

For $\dfrac{\partial f}{\partial y} > 0$, it suffices that $\mu(1-\mu)(1-2\mu) \ge 0$, as shown in the first step of (B16), that is, $0 \le \mu \le \frac{1}{2}$ or $\mu = 1$. When $\frac{1}{2} < \mu < 1$, we instead rearrange the terms of the numerator as in the second step of (B16): when $1-2\mu+2\mu\,\theta_i^e \ge 1-2\mu+\mu \ge 0$, that is, $\theta_i^e \ge \frac{1}{2}$, we have $\dfrac{\partial f}{\partial y} > 0$. In other words, $\dfrac{\partial f}{\partial y} > 0$ holds when $0 \le \mu \le \frac{1}{2}$, or $\mu = 1$, or both $\frac{1}{2} < \mu < 1$ and $\theta_i^e \ge \frac{1}{2}$.

By proving that $\dfrac{\partial f}{\partial x} > 0$ and $\dfrac{\partial f}{\partial y} > 0$, we prove that $\dfrac{\partial f}{\partial N} > 0$. As $N^{Ele} > N^{App}$, we know that $\mathbb{E}_{Ele}(\pi_i) > \mathbb{E}_{App}(\pi_i)$.

In summary, when $\mu = 0$, $1$, or $\frac{1}{2}$, we have $[\pi]^{VL,Ele} \equiv \mathbb{E}_{Ele}(\pi_i) > \mathbb{E}_{App}(\pi_i) \equiv [\pi]^{VL,App}$. When $0 < \mu < \frac{1}{2}$, by assuming that $\alpha_i^e \ge \frac{1}{2}$, we have $[\pi]^{VL,Ele} > [\pi]^{VL,App}$. When $\frac{1}{2} < \mu < 1$, by assuming that $\theta_i^e \ge \frac{1}{2}$, we have $[\pi]^{VL,Ele} > [\pi]^{VL,App}$.

B.2 Variance: We then compare, in a representative village, the variance of the competence of the elected village leader with that of the appointed village leader. We know that

$$Var^{Ele\,(App)}(\pi_i) = \mathbb{E}_{Ele\,(App)}\left[(\pi_i)^2\right]-\left(\mathbb{E}_{Ele\,(App)}\left[\pi_i\right]\right)^2. \tag{B17}$$
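We first define $g(x(N),y(N)) \equiv \mathbb{E}_{Ele\,(App)}[(\pi_i)^2]$, and we have

$$g\left(x(N),y(N)\right) = \frac{\int_0^1\!\!\int_0^1\left[\mu\alpha_i+(1-\mu)\theta_i\right]^2\left[\mu l\,S(\alpha_i)+(1-\mu)l\,S(\theta_i)\right]d\alpha_i\,d\theta_i}{\int_0^1\!\!\int_0^1\left[\mu l\,S(\alpha_i)+(1-\mu)l\,S(\theta_i)\right]d\alpha_i\,d\theta_i}, \tag{B18}$$

where $S(\alpha_i) = (1-x)\alpha_i^e+x\alpha_i$, $S(\theta_i) = (1-y)\theta_i^e+y\theta_i$, $x \equiv \dfrac{\sigma_{\alpha^e}^2 N}{\sigma_{\upsilon^\alpha}^2+\sigma_{\alpha^e}^2 N}$, and $y \equiv \dfrac{\sigma_{\theta^e}^2 N}{\sigma_{\omega^\theta}^2+\sigma_{\theta^e}^2 N}$.

First, we simplify the numerator of $g(\cdot)$:

$$\int_0^1\!\!\int_0^1\left[\mu^2\alpha_i^2+(1-\mu)^2\theta_i^2+2\mu(1-\mu)\alpha_i\theta_i\right]\cdot\left\{\mu l\left[(1-x)\alpha_i^e+x\alpha_i\right]+(1-\mu)l\left[(1-y)\theta_i^e+y\theta_i\right]\right\}d\alpha_i\,d\theta_i$$

$$= \int_0^1\!\!\int_0^1\Big\{\mu^3 l\,\alpha_i^2\left[(1-x)\alpha_i^e+x\alpha_i\right]+\mu^2 l(1-\mu)\alpha_i^2\left[(1-y)\theta_i^e+y\theta_i\right]+\mu(1-\mu)^2 l\,\theta_i^2\left[(1-x)\alpha_i^e+x\alpha_i\right]$$

$$\qquad+(1-\mu)^3 l\,\theta_i^2\left[(1-y)\theta_i^e+y\theta_i\right]+2\mu^2 l(1-\mu)\alpha_i\theta_i\left[(1-x)\alpha_i^e+x\alpha_i\right]+2\mu(1-\mu)^2 l\,\alpha_i\theta_i\left[(1-y)\theta_i^e+y\theta_i\right]\Big\}d\alpha_i\,d\theta_i, \tag{B19}$$

where

$$\int_0^1\!\!\int_0^1\mu^3 l\,\alpha_i^2\left[(1-x)\alpha_i^e+x\alpha_i\right]d\alpha_i\,d\theta_i = \frac{1}{3}\mu^3 l(1-x)\alpha_i^e+\frac{1}{4}\mu^3 l\,x, \tag{B20}$$

$$\int_0^1\!\!\int_0^1\mu^2 l(1-\mu)\alpha_i^2\left[(1-y)\theta_i^e+y\theta_i\right]d\alpha_i\,d\theta_i = \frac{1}{3}\mu^2 l(1-\mu)(1-y)\theta_i^e+\frac{1}{6}\mu^2 l(1-\mu)y, \tag{B21}$$

$$\int_0^1\!\!\int_0^1\mu(1-\mu)^2 l\,\theta_i^2\left[(1-x)\alpha_i^e+x\alpha_i\right]d\alpha_i\,d\theta_i = \frac{1}{3}\mu(1-\mu)^2 l(1-x)\alpha_i^e+\frac{1}{6}\mu l(1-\mu)^2 x, \tag{B22}$$

$$\int_0^1\!\!\int_0^1(1-\mu)^3 l\,\theta_i^2\left[(1-y)\theta_i^e+y\theta_i\right]d\alpha_i\,d\theta_i = \frac{1}{3}(1-\mu)^3 l(1-y)\theta_i^e+\frac{1}{4}(1-\mu)^3 l\,y, \tag{B23}$$

$$\int_0^1\!\!\int_0^1 2\mu^2 l(1-\mu)\alpha_i\theta_i\left[(1-x)\alpha_i^e+x\alpha_i\right]d\alpha_i\,d\theta_i = \frac{1}{2}\mu^2(1-\mu)l(1-x)\alpha_i^e+\frac{1}{3}\mu^2(1-\mu)l\,x, \tag{B24}$$

$$\int_0^1\!\!\int_0^1 2\mu(1-\mu)^2 l\,\alpha_i\theta_i\left[(1-y)\theta_i^e+y\theta_i\right]d\alpha_i\,d\theta_i = \frac{1}{2}\mu(1-\mu)^2 l(1-y)\theta_i^e+\frac{1}{3}\mu(1-\mu)^2 l\,y. \tag{B25}$$

The numerator of $g(\cdot)$ becomes

$$\frac{1}{6}\mu\left(\mu^2-\mu+2\right)l(1-x)\alpha_i^e+\frac{1}{6}(1-\mu)\left(\mu^2-\mu+2\right)l(1-y)\theta_i^e+\frac{1}{12}\mu\left(\mu^2+2\right)l\,x+\frac{1}{12}(1-\mu)\left(\mu^2-2\mu+3\right)l\,y.$$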
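The closed-form numerator of $g(\cdot)$ can be checked the same way as the first moment, and the two moments combine into the variance in (B17). The following sketch is a numerical illustration only: it uses a two-point Gauss-Legendre rule (exact here because the integrand has degree at most three in each variable), and the parameter values and helper names are assumptions rather than anything specified in the text.

```python
import math

# Two-point Gauss-Legendre nodes on [0, 1]; each node carries weight 1/2.
_OFFSET = 0.5 / math.sqrt(3.0)
NODES = (0.5 - _OFFSET, 0.5 + _OFFSET)


def integrate_unit_square(func):
    """Integrate func(alpha, theta) over [0, 1]^2 with the tensor-product rule."""
    return sum(0.25 * func(a, t) for a in NODES for t in NODES)


def selection_weight(alpha, theta, mu, l, x, y, alpha_e, theta_e):
    """Weight mu*l*S(alpha) + (1-mu)*l*S(theta) from (B18)."""
    return (mu * l * ((1 - x) * alpha_e + x * alpha)
            + (1 - mu) * l * ((1 - y) * theta_e + y * theta))


def weighted_moment_numerator(power, p):
    """Numerator of the weighted moment of competence of the given power."""
    mu = p["mu"]
    return integrate_unit_square(
        lambda a, t: (mu * a + (1 - mu) * t) ** power * selection_weight(a, t, **p))


def g_numerator_closed_form(mu, l, x, y, alpha_e, theta_e):
    """Closed-form numerator of g(.) obtained by summing (B20) to (B25)."""
    return (mu * (mu ** 2 - mu + 2) * l * (1 - x) * alpha_e / 6
            + (1 - mu) * (mu ** 2 - mu + 2) * l * (1 - y) * theta_e / 6
            + mu * (mu ** 2 + 2) * l * x / 12
            + (1 - mu) * (mu ** 2 - 2 * mu + 3) * l * y / 12)


# Arbitrary illustrative parameter values.
params = dict(mu=0.3, l=2.0, x=0.6, y=0.4, alpha_e=0.7, theta_e=0.2)
denominator = weighted_moment_numerator(0, params)
f_value = weighted_moment_numerator(1, params) / denominator
g_value = weighted_moment_numerator(2, params) / denominator
variance = g_value - f_value ** 2  # (B17)
g_closed = g_numerator_closed_form(**params)
```

Here `g_closed` should match `weighted_moment_numerator(2, params)`, and `variance` is the quantity whose dependence on $N$ the rest of the section studies.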
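Therefore, writing $D \equiv \mu l(1-x)\alpha_i^e+(1-\mu)l(1-y)\theta_i^e+\frac{1}{2}\mu l\,x+\frac{1}{2}(1-\mu)l\,y$ for the common denominator, we have

$$g(\cdot) = \frac{\frac{1}{6}\mu\left(\mu^2-\mu+2\right)l(1-x)\alpha_i^e+\frac{1}{6}(1-\mu)\left(\mu^2-\mu+2\right)l(1-y)\theta_i^e+\frac{1}{12}\mu\left(\mu^2+2\right)l\,x+\frac{1}{12}(1-\mu)\left(\mu^2-2\mu+3\right)l\,y}{D}. \tag{B26}$$

Then, we define $f(x(N),y(N)) \equiv \mathbb{E}_{Ele\,(App)}[\pi_i]$, and we know from (B14) that

$$f\left(x(N),y(N)\right) = \frac{\frac{1}{2}\mu l(1-x)\alpha_i^e+\frac{1}{2}(1-\mu)l(1-y)\theta_i^e+\frac{1}{12}\mu(\mu+3)l\,x+\frac{1}{12}(\mu-4)(\mu-1)l\,y}{D}.$$

From (B17), we know that

$$\frac{\partial Var(\cdot)}{\partial N} = \frac{\partial g(\cdot)}{\partial N}-2f(\cdot)\cdot\frac{\partial f(\cdot)}{\partial N}. \tag{B27}$$

It is known that $f(\cdot) > 0$ and $\dfrac{\partial f(\cdot)}{\partial N} > 0$. We then need to derive $\dfrac{\partial g(\cdot)}{\partial N} = \dfrac{\partial g(\cdot)}{\partial x}\cdot\dfrac{\partial x}{\partial N}+\dfrac{\partial g(\cdot)}{\partial y}\cdot\dfrac{\partial y}{\partial N}$. It is known that $\dfrac{\partial x}{\partial N} > 0$ and $\dfrac{\partial y}{\partial N} > 0$; therefore, we have

$$\frac{\partial g(\cdot)}{\partial x} = \frac{\frac{1}{12}\mu^3 l^2\alpha_i^e+\frac{1}{12}\mu(1-\mu)^2 l^2\,y\,\alpha_i^e+\frac{1}{12}\mu^2(1-\mu)l^2(1-y)\theta_i^e+\frac{1}{24}\mu(1-\mu)(2\mu-1)l^2\,y}{D^2}, \tag{B28}$$

$$\frac{\partial g(\cdot)}{\partial y} = \frac{\frac{1}{12}\mu^2(1-\mu)l^2\,x\,\theta_i^e+\frac{1}{12}(1-\mu)^3 l^2\theta_i^e+\frac{1}{12}\mu(1-\mu)^2 l^2(1-x)\alpha_i^e+\frac{1}{24}\mu(1-\mu)(1-2\mu)l^2\,x}{D^2}. \tag{B29}$$

Thus,

$$\frac{\partial g(\cdot)}{\partial N} = \frac{\frac{1}{12}\mu^3 l^2\alpha_i^e+\frac{1}{12}\mu(1-\mu)^2 l^2\,y\,\alpha_i^e+\frac{1}{12}\mu^2(1-\mu)l^2(1-y)\theta_i^e+\frac{1}{24}\mu(1-\mu)(2\mu-1)l^2\,y}{D^2}\cdot\frac{\partial x}{\partial N}$$

$$\qquad+\frac{\frac{1}{12}\mu^2(1-\mu)l^2\,x\,\theta_i^e+\frac{1}{12}(1-\mu)^3 l^2\theta_i^e+\frac{1}{12}\mu(1-\mu)^2 l^2(1-x)\alpha_i^e+\frac{1}{24}\mu(1-\mu)(1-2\mu)l^2\,x}{D^2}\cdot\frac{\partial y}{\partial N}.$$

It is easy to prove that $\dfrac{\partial g(\cdot)}{\partial N} = \dfrac{\partial f(\cdot)}{\partial N}$. In addition, from (B14), we can deduce that $f(\cdot) > \frac{1}{2}$, as the numerator of $f(\cdot)$ exceeds one half of the denominator of $f(\cdot)$ by $\frac{1}{12}\mu^2 l\,x+\frac{1}{12}(1-\mu)^2 l\,y$. As a result, we can deduce from (B27) that

$$\frac{\partial Var(\cdot)}{\partial N} = \frac{\partial g(\cdot)}{\partial N}-2f(\cdot)\cdot\frac{\partial f(\cdot)}{\partial N} = \left[1-2f(\cdot)\right]\frac{\partial f(\cdot)}{\partial N} < 0.$$

As $N^{Ele} > N^{App}$, we know that $Var^{Ele}(\pi_i) < Var^{App}(\pi_i)$.

C.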
Comparison of the Expectation and the Variance of the Competence of Village Party Secretaries Before and After the Introduction of Local Direct Elections

C.1 Expectation: We compare, in a representative village, two expectations of the competence of the village party secretary: the expectation when part of the candidates, namely the Type-I village party branch members, are directly elected by local village residents, and the expectation when those Type-I members are appointed by township officials. Note that $\pi_j \equiv \mu\alpha_j+(1-\mu)\theta_j$, $\pi_{\tilde{j}} \equiv \mu\alpha_{\tilde{j}}+(1-\mu)\theta_{\tilde{j}}$, $\pi_j^u \equiv \mu\alpha_j^u+(1-\mu)\theta_j^u$, and $\pi_{\tilde{j}}^u \equiv \mu\alpha_{\tilde{j}}^u+(1-\mu)\theta_{\tilde{j}}^u$. Specifically, we compare

$$[\pi]^{VPS,Ele} = \mathbb{E}_{Ele}\left(\pi_{j,\tilde{j}}\right) = \frac{\int_0^1[\pi_j]^{Ele}R_j^{Ele}\,d\pi_j+\int_0^1\pi_{\tilde{j}}R_{\tilde{j}}\,d\pi_{\tilde{j}}}{\int_0^1 R_j^{Ele}\,d\pi_j+\int_0^1 R_{\tilde{j}}\,d\pi_{\tilde{j}}} \tag{C1}$$

and

$$[\pi]^{VPS,App} = \mathbb{E}_{App}\left(\pi_{j,\tilde{j}}\right) = \frac{\int_0^1[\pi_j]^{App}R_j^{App}\,d\pi_j+\int_0^1\pi_{\tilde{j}}R_{\tilde{j}}\,d\pi_{\tilde{j}}}{\int_0^1 R_j^{App}\,d\pi_j+\int_0^1 R_{\tilde{j}}\,d\pi_{\tilde{j}}}. \tag{C2}$$

In general, the expectation of the competence of the village party secretary is

$$[\pi]^{VPS} = \mathbb{E}\left(\pi_{j,\tilde{j}}\right) = \frac{\int_0^1\pi_j R_j\,d\pi_j+\int_0^1\pi_{\tilde{j}}R_{\tilde{j}}\,d\pi_{\tilde{j}}}{\int_0^1 R_j\,d\pi_j+\int_0^1 R_{\tilde{j}}\,d\pi_{\tilde{j}}}, \tag{C3}$$

where

$$R_j = \mu w\left(\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2+m\sigma_{\alpha^u}^2}\alpha_j^u+\frac{m\sigma_{\alpha^u}^2}{\sigma_\varepsilon^2+m\sigma_{\alpha^u}^2}\alpha_j\right)+(1-\mu)w\left(\frac{\sigma_\eta^2}{\sigma_\eta^2+m\sigma_{\theta^u}^2}\theta_j^u+\frac{m\sigma_{\theta^u}^2}{\sigma_\eta^2+m\sigma_{\theta^u}^2}\theta_j\right). \tag{C4}$$

Define $\varphi_\alpha \equiv \dfrac{m\sigma_{\alpha^u}^2}{\sigma_\varepsilon^2+m\sigma_{\alpha^u}^2}$ and $\varphi_\theta \equiv \dfrac{m\sigma_{\theta^u}^2}{\sigma_\eta^2+m\sigma_{\theta^u}^2}$. Because $m \to \infty$, we have $\operatorname{plim}_{m\to\infty}\varphi_\alpha = \operatorname{plim}_{m\to\infty}\varphi_\theta \equiv \varphi$. Therefore,

$$R_j = \mu w\left[(1-\varphi)\alpha_j^u+\varphi\alpha_j\right]+(1-\mu)w\left[(1-\varphi)\theta_j^u+\varphi\theta_j\right] = (1-\varphi)w\pi_j^u+\varphi w\pi_j. \tag{C5}$$

Similarly,

$$R_{\tilde{j}} = (1-\varphi)w\pi_{\tilde{j}}^u+\varphi w\pi_{\tilde{j}}. \tag{C6}$$
As a result, we have

$$\mathbb{E}\left(\pi_{j,\tilde{j}}\right) = \frac{\int_0^1\pi_j\left[(1-\varphi)w\pi_j^u+\varphi w\pi_j\right]d\pi_j+\int_0^1\pi_{\tilde{j}}\left[(1-\varphi)w\pi_{\tilde{j}}^u+\varphi w\pi_{\tilde{j}}\right]d\pi_{\tilde{j}}}{\int_0^1\left[(1-\varphi)w\pi_j^u+\varphi w\pi_j\right]d\pi_j+\int_0^1\left[(1-\varphi)w\pi_{\tilde{j}}^u+\varphi w\pi_{\tilde{j}}\right]d\pi_{\tilde{j}}}$$

$$= \frac{(1-\varphi)w\pi_j^u\int_0^1\pi_j\,d\pi_j+\varphi w\int_0^1\pi_j^2\,d\pi_j+(1-\varphi)w\pi_{\tilde{j}}^u\int_0^1\pi_{\tilde{j}}\,d\pi_{\tilde{j}}+\varphi w\int_0^1\pi_{\tilde{j}}^2\,d\pi_{\tilde{j}}}{(1-\varphi)w\pi_j^u+\varphi w\int_0^1\pi_j\,d\pi_j+(1-\varphi)w\pi_{\tilde{j}}^u+\varphi w\int_0^1\pi_{\tilde{j}}\,d\pi_{\tilde{j}}}. \tag{C7}$$

Here $\int_0^1\pi_j\,d\pi_j \equiv E(\pi_j)$ represents the expectation of the competence of all village committee members, who are the Type-I village party branch members. As discussed in Sections 2.3.A and 2.3.B, the inference about and selection of each village committee member, including the village leader as representative, are homogeneous. Therefore, the expectation of the competence of each village committee member is homogeneous. In other words, $\mathbb{E}_j(\pi_j)$, the expectation of the competence, is the same for each $j$. As a result, $E(\pi_j) = E\left(\mathbb{E}_j(\pi_j)\right) = \mathbb{E}_1(\pi_1)$, where $j = 1$ denotes the village leader as representative of all village committee members, and $\mathbb{E}_1(\pi_1)$ represents the expectation of the competence of the village leader, such that

$$\mathbb{E}_1(\pi_1) \equiv f(N) \equiv f\left(x(N),y(N)\right) = \frac{\int_0^1\!\!\int_0^1\left[\mu\alpha_i+(1-\mu)\theta_i\right]\left[\mu l\,S(\alpha_i)+(1-\mu)l\,S(\theta_i)\right]d\alpha_i\,d\theta_i}{\int_0^1\!\!\int_0^1\left[\mu l\,S(\alpha_i)+(1-\mu)l\,S(\theta_i)\right]d\alpha_i\,d\theta_i}, \tag{C8}$$

where $S(\alpha_i) = (1-x)\alpha_i^e+x\alpha_i$ with $x \equiv \dfrac{\sigma_{\alpha^e}^2 N}{\sigma_{\upsilon^\alpha}^2+\sigma_{\alpha^e}^2 N}$, and $S(\theta_i) = (1-y)\theta_i^e+y\theta_i$ with $y \equiv \dfrac{\sigma_{\theta^e}^2 N}{\sigma_{\omega^\theta}^2+\sigma_{\theta^e}^2 N}$.

Similarly, $\int_0^1\pi_j^2\,d\pi_j \equiv E(\pi_j^2)$ represents the expectation of the squared competence of all village committee members. By the same homogeneity argument, the expectation of the squared competence of each village committee member is homogeneous. In other words, $\mathbb{E}_j(\pi_j^2)$, the expectation of the squared competence, is the same for each $j$.
As a result, $E(\pi_j^2) = E\left(\mathbb{E}_j(\pi_j^2)\right) = \mathbb{E}_1(\pi_1^2)$, where $j = 1$ denotes the village leader as representative of all village committee members, and $\mathbb{E}_1(\pi_1^2)$ represents the expectation of the squared competence of the village leader, such that

$$\mathbb{E}_1(\pi_1^2) \equiv g(N) \equiv g\left(x(N),y(N)\right) = \frac{\int_0^1\!\!\int_0^1\left[\mu\alpha_i+(1-\mu)\theta_i\right]^2\left[\mu l\,S(\alpha_i)+(1-\mu)l\,S(\theta_i)\right]d\alpha_i\,d\theta_i}{\int_0^1\!\!\int_0^1\left[\mu l\,S(\alpha_i)+(1-\mu)l\,S(\theta_i)\right]d\alpha_i\,d\theta_i}, \tag{C9}$$

where $S(\alpha_i) = (1-x)\alpha_i^e+x\alpha_i$, $S(\theta_i) = (1-y)\theta_i^e+y\theta_i$, $x \equiv \dfrac{\sigma_{\alpha^e}^2 N}{\sigma_{\upsilon^\alpha}^2+\sigma_{\alpha^e}^2 N}$, and $y \equiv \dfrac{\sigma_{\theta^e}^2 N}{\sigma_{\omega^\theta}^2+\sigma_{\theta^e}^2 N}$.

The Type-II village party branch members are always appointed by the representative township official, and we denote the number of natural communications between the representative township official and each Type-II village party branch member candidate by $\tilde{N}$. Therefore, similar to the derivations above, $\int_0^1\pi_{\tilde{j}}\,d\pi_{\tilde{j}} = f(\tilde{N})$ and $\int_0^1\pi_{\tilde{j}}^2\,d\pi_{\tilde{j}} = g(\tilde{N})$. (C7) becomes

$$\frac{(1-\varphi)w\pi_j^u f(N)+\varphi w\,g(N)+(1-\varphi)w\pi_{\tilde{j}}^u f(\tilde{N})+\varphi w\,g(\tilde{N})}{(1-\varphi)w\pi_j^u+\varphi w\,f(N)+(1-\varphi)w\pi_{\tilde{j}}^u+\varphi w\,f(\tilde{N})}. \tag{C10}$$

From Appendix B.2, it is easy to deduce that $f(N) > g(N)$ and $f'(N) = g'(N)$. We can also derive that $f(\tilde{N}) \le 1$ and $g(\tilde{N}) \le f(\tilde{N})$, and thus we know that $(1-\varphi)w\pi_{\tilde{j}}^u f(\tilde{N})+\varphi w\,g(\tilde{N}) \le (1-\varphi)w\pi_{\tilde{j}}^u+\varphi w\,f(\tilde{N})$. Therefore, it is easily derived that the first-order derivative of (C10) with respect to $N$ is greater than 0. We know that

$$\mathbb{E}_{Ele}\left(\pi_{j,\tilde{j}}\right) = \frac{(1-\varphi)w\pi_j^u f(N^{Ele})+\varphi w\,g(N^{Ele})+(1-\varphi)w\pi_{\tilde{j}}^u f(\tilde{N})+\varphi w\,g(\tilde{N})}{(1-\varphi)w\pi_j^u+\varphi w\,f(N^{Ele})+(1-\varphi)w\pi_{\tilde{j}}^u+\varphi w\,f(\tilde{N})} \tag{C11}$$

and that

$$\mathbb{E}_{App}\left(\pi_{j,\tilde{j}}\right) = \frac{(1-\varphi)w\pi_j^u f(N^{App})+\varphi w\,g(N^{App})+(1-\varphi)w\pi_{\tilde{j}}^u f(\tilde{N})+\varphi w\,g(\tilde{N})}{(1-\varphi)w\pi_j^u+\varphi w\,f(N^{App})+(1-\varphi)w\pi_{\tilde{j}}^u+\varphi w\,f(\tilde{N})}. \tag{C12}$$

Because $N^{Ele} > N^{App}$, we know that $[\pi]^{VPS,Ele} = \mathbb{E}_{Ele}(\pi_{j,\tilde{j}}) > \mathbb{E}_{App}(\pi_{j,\tilde{j}}) = [\pi]^{VPS,App}$.
C.2 Variance: We compare, in a representative village, two variances of the competence of the village party secretary: the variance when part of the candidates, namely the Type-I village party branch members, are directly elected by local village residents, and the variance when those Type-I members are appointed by township officials. Specifically, we compare

$$Var^{VPS,Ele}(\pi) = \mathbb{E}_{Ele}\left[\left(\pi_{j,\tilde{j}}\right)^2\right]-\left(\mathbb{E}_{Ele}\left[\pi_{j,\tilde{j}}\right]\right)^2 \tag{C13}$$

and

$$Var^{VPS,App}(\pi) = \mathbb{E}_{App}\left[\left(\pi_{j,\tilde{j}}\right)^2\right]-\left(\mathbb{E}_{App}\left[\pi_{j,\tilde{j}}\right]\right)^2. \tag{C14}$$

In general, the variance of the competence of the village party secretary is

$$Var^{VPS}(\pi) = \mathbb{E}\left[\left(\pi_{j,\tilde{j}}\right)^2\right]-\left(\mathbb{E}\left[\pi_{j,\tilde{j}}\right]\right)^2. \tag{C15}$$

As is shown in Appendix C.1,

$$\mathbb{E}\left(\pi_{j,\tilde{j}}\right) = \frac{(1-\varphi)w\pi_j^u f(N)+\varphi w\,g(N)+(1-\varphi)w\pi_{\tilde{j}}^u f(\tilde{N})+\varphi w\,g(\tilde{N})}{(1-\varphi)w\pi_j^u+\varphi w\,f(N)+(1-\varphi)w\pi_{\tilde{j}}^u+\varphi w\,f(\tilde{N})}. \tag{C16}$$

Therefore, the numerator of $Var^{VPS}(\pi)$, by calculation, is

$$\begin{aligned}
A \equiv{}& (1-\varphi)^2w^2\pi_j^u\left(\pi_j^u+\pi_{\tilde{j}}^u\right)g(N)-\varphi(1-\varphi)w^2\pi_j^u f(N)g(N)+\varphi(1-\varphi)w^2\left(\pi_j^u-2\pi_{\tilde{j}}^u\right)f(\tilde{N})g(N)\\
&+(1-\varphi)^2w^2\pi_{\tilde{j}}^u\left(\pi_j^u+\pi_{\tilde{j}}^u\right)g(\tilde{N})+\varphi(1-\varphi)w^2\left(\pi_{\tilde{j}}^u-2\pi_j^u\right)f(N)g(\tilde{N})-\varphi(1-\varphi)w^2\pi_{\tilde{j}}^u f(\tilde{N})g(\tilde{N})\\
&+\varphi(1-\varphi)w^2\left(\pi_j^u+\pi_{\tilde{j}}^u\right)h(N)+\varphi^2w^2f(N)h(N)+\varphi^2w^2f(\tilde{N})h(N)\\
&+\varphi(1-\varphi)w^2\left(\pi_j^u+\pi_{\tilde{j}}^u\right)h(\tilde{N})+\varphi^2w^2f(N)h(\tilde{N})+\varphi^2w^2f(\tilde{N})h(\tilde{N})\\
&-(1-\varphi)^2w^2\left(\pi_j^u\right)^2f^2(N)-\varphi^2w^2g^2(N)-(1-\varphi)^2w^2\left(\pi_{\tilde{j}}^u\right)^2f^2(\tilde{N})-\varphi^2w^2g^2(\tilde{N})\\
&-2(1-\varphi)^2w^2\pi_j^u\pi_{\tilde{j}}^u f(N)f(\tilde{N})-2\varphi^2w^2g(N)g(\tilde{N})
\end{aligned} \tag{C17}$$

and the denominator of $Var^{VPS}(\pi)$, by calculation, is

$$\begin{aligned}
B \equiv{}& \left[(1-\varphi)w\pi_j^u+\varphi w f(N)+(1-\varphi)w\pi_{\tilde{j}}^u+\varphi w f(\tilde{N})\right]^2\\
={}& (1-\varphi)^2w^2\left(\pi_j^u+\pi_{\tilde{j}}^u\right)^2+\varphi^2w^2f^2(N)+\varphi^2w^2f^2(\tilde{N})\\
&+2\varphi(1-\varphi)w^2\left(\pi_j^u+\pi_{\tilde{j}}^u\right)f(N)+2\varphi(1-\varphi)w^2\left(\pi_j^u+\pi_{\tilde{j}}^u\right)f(\tilde{N})+2\varphi^2w^2f(\tilde{N})f(N),
\end{aligned} \tag{C18}$$

where, writing $D \equiv \mu l(1-x)\alpha_i^e+(1-\mu)l(1-y)\theta_i^e+\frac{1}{2}\mu l\,x+\frac{1}{2}(1-\mu)l\,y$,

$$f(N) = \frac{\frac{1}{2}\mu l(1-x)\alpha_i^e+\frac{1}{2}(1-\mu)l(1-y)\theta_i^e+\frac{1}{12}\mu(\mu+3)l\,x+\frac{1}{12}(\mu-4)(\mu-1)l\,y}{D}, \tag{C19}$$

$$g(N) = \frac{\frac{1}{6}\mu\left(\mu^2-\mu+2\right)l(1-x)\alpha_i^e+\frac{1}{6}(1-\mu)\left(\mu^2-\mu+2\right)l(1-y)\theta_i^e+\frac{1}{12}\mu\left(\mu^2+2\right)l\,x+\frac{1}{12}(1-\mu)\left(\mu^2-2\mu+3\right)l\,y}{D}, \tag{C20}$$

$$h(N) = \frac{\frac{1}{4}\mu\left(\mu^2-\mu+1\right)l(1-x)\alpha_i^e-\frac{1}{4}\left(\mu^3-2\mu^2+2\mu-1\right)l(1-y)\theta_i^e+\frac{1}{120}\mu\left(4\mu^3+10\mu^2-5\mu+15\right)l\,x+\frac{1}{120}\left(4\mu^4-26\mu^3+49\mu^2-51\mu+24\right)l\,y}{D}. \tag{C21}$$

It is easy to see that

$$0 < h(N) < g(N) < \tfrac{1}{2} < f(N) < 1, \tag{C22}$$

$$0 < h(\tilde{N}) < g(\tilde{N}) < \tfrac{1}{2} < f(\tilde{N}) < 1, \tag{C23}$$

$$0 < h'(N) < g'(N) = f'(N) < 1, \tag{C24}$$

$$0 < h'(\tilde{N}) < g'(\tilde{N}) = f'(\tilde{N}) < 1. \tag{C25}$$

It is assumed that $N > \tilde{N}$, so we have $f(N) > f(\tilde{N})$, $g(N) > g(\tilde{N})$, and $h(N) > h(\tilde{N})$. Taking the derivative with respect to $N$, we have $\dfrac{\partial Var^{VPS}(\pi)}{\partial N} = \dfrac{A'B-B'A}{B^2}$. Using $\lim\varphi = 1$ and $g'(N) = f'(N)$ from (C24), $A'B-B'A$ is

$$\begin{aligned}
A'B-B'A ={}& -2w^4f'(N)\left[f(N)+f(\tilde{N})\right]^2\left[g(N)+g(\tilde{N})\right]+2w^4f'(N)\left[f(N)+f(\tilde{N})\right]\left[g(N)+g(\tilde{N})\right]^2\\
&+w^4h'(N)\left[f(N)+f(\tilde{N})\right]^3-w^4f'(N)\left[f(N)+f(\tilde{N})\right]^2\left[h(N)+h(\tilde{N})\right],
\end{aligned}$$

which expands into 22 terms: twelve terms in $g$ from the first line and ten terms in $h$ from the second line.
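The expression for $A'B-B'A$ at $\varphi = 1$ can be written either term by term (22 terms) or grouped into four products of sums. The sketch below checks numerically that the two forms agree for arbitrary inputs. The argument names (`ft`, `gt`, `ht` for the values at $\tilde{N}$, `fp` and `hp` for $f'(N)$ and $h'(N)$) and the random test values are assumptions made for this illustration only.

```python
import random


def expanded_form(w, f, ft, g, gt, h, ht, fp, hp):
    """A'B - B'A at phi = 1, written out as the 22 individual terms."""
    g_terms = (-2 * f**2 * g - 2 * gt * f**2 - 2 * ft**2 * g - 2 * ft**2 * gt
               - 4 * ft * f * g - 4 * ft * gt * f
               + 2 * f * g**2 + 2 * gt**2 * f + 4 * gt * f * g
               + 2 * ft * g**2 + 2 * ft * gt**2 + 4 * ft * gt * g) * fp
    h_prime_terms = (f**3 + ft**3 + 3 * ft**2 * f + 3 * ft * f**2) * hp
    h_terms = -(f**2 * h + ht * f**2 + 2 * ft * f * h
                + ft**2 * h + ft**2 * ht + 2 * ft * ht * f) * fp
    return w**4 * (g_terms + h_prime_terms + h_terms)


def grouped_form(w, f, ft, g, gt, h, ht, fp, hp):
    """The same expression grouped by the sums f + f~, g + g~, h + h~."""
    sf, sg, sh = f + ft, g + gt, h + ht
    return w**4 * (-2 * fp * sf**2 * sg + 2 * fp * sf * sg**2
                   + hp * sf**3 - fp * sf**2 * sh)


# Compare the two forms on a handful of random inputs in (0, 1).
rng = random.Random(1)
max_gap = 0.0
for _ in range(100):
    args = [rng.random() for _ in range(9)]
    max_gap = max(max_gap, abs(expanded_form(*args) - grouped_form(*args)))
```

The grouped form makes the sign discussion easier to follow: the two $g$ products share the factor $f'(N)\,[f(N)+f(\tilde{N})]\,[g(N)+g(\tilde{N})]$, while the two $h$ products do not share a factor whose sign is pinned down by (C22) to (C25).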
Because of (C22) to (C25), and $f(N) > f(\tilde{N})$, $g(N) > g(\tilde{N})$, $h(N) > h(\tilde{N})$, it is easily derived that the first twelve terms of $A'B-B'A$, those involving $g$, sum to a negative number:

$$-2w^4f'(N)\left[f(N)+f(\tilde{N})\right]^2\left[g(N)+g(\tilde{N})\right]+2w^4f'(N)\left[f(N)+f(\tilde{N})\right]\left[g(N)+g(\tilde{N})\right]^2$$

$$= -2w^4f'(N)\left[f(N)+f(\tilde{N})\right]\left[g(N)+g(\tilde{N})\right]\left\{\left[f(N)+f(\tilde{N})\right]-\left[g(N)+g(\tilde{N})\right]\right\} < 0,$$

since $f > g$ at both $N$ and $\tilde{N}$. However, it is ambiguous whether the remaining ten terms, those involving $h$,

$$w^4h'(N)\left[f(N)+f(\tilde{N})\right]^3-w^4f'(N)\left[f(N)+f(\tilde{N})\right]^2\left[h(N)+h(\tilde{N})\right],$$

sum to a positive or a negative number. Accordingly, it is also ambiguous whether the derivative of $Var^{VPS}(\pi)$ with respect to $N$ is negative or not; that is, it is ambiguous whether introducing the local direct election reduces the variance of the competence of the village party secretary.

Appendices of Lagged Variables as Instruments

A. Derivation of $Cov(X, U)$

In Scenarios 1 and 2, following the appendix of Bellemare et al. (2017), we have, given equations (3.2) and (3.3), or (3.17) and (3.18), the expression

$$Cov(X_{t-1},U_{t-1}) = Cov\left(\frac{1}{\rho}X_t-\frac{\kappa}{\rho}U_t-\frac{1}{\rho}\eta_t,\ \frac{1}{\phi}U_t-\frac{1}{\phi}\upsilon_t\right). \tag{A.1}$$

Then we have

$$Cov(X_{t-1},U_{t-1}) = \frac{1}{\phi\rho}\left[Cov(X_t,U_t)-\kappa Var(U_t)\right],$$

which yields

$$Cov(X_t,U_t)-\kappa Var(U_t) = \phi\rho\,Cov(X_{t-1},U_{t-1}). \tag{A.2}$$

Since $\rho,\phi \in (0,1)$, both $X$ and $U$ are mean-reverting series; that is, the covariance between $X$ and $U$ does not depend on $t$. In other words, asymptotically, we have

$$\operatorname*{plim}_{t\to\infty}Cov(X_t,U_t) = \operatorname*{plim}_{t\to\infty}Cov(X_{t-1},U_{t-1}) = Cov(X,U). \tag{A.3}$$

Therefore, (A.2) becomes

$$Cov(X,U)-\kappa Var(U) = \phi\rho\,Cov(X,U), \tag{A.4}$$

implying that

$$Cov(X,U) = \frac{\kappa Var(U)}{1-\phi\rho}. \tag{A.5}$$

In Scenario 3, similarly, we have, given equations (3.26) and (3.27), the expression

$$Cov(X_{t-1},U_{t-1}) = Cov\left(\frac{1}{\rho}X_t-\frac{\kappa}{\rho}U_t-\frac{1}{\rho}\eta_t,\ \frac{1}{\phi}U_t-\frac{\psi}{\phi}X_{t-1}-\frac{1}{\phi}\upsilon_t\right). \tag{A.6}$$

Then we have

$$Cov(X_{t-1},U_{t-1}) = \frac{1}{\phi\rho}\left[Cov(X_t,U_t)-\psi Cov(X_t,X_{t-1})-\kappa Var(U_t)+\kappa\psi Cov(X_{t-1},U_t)\right], \tag{A.7}$$

which yields

$$Cov(X_t,U_t)-\psi Cov(X_t,X_{t-1})-\kappa Var(U_t)+\kappa\psi Cov(X_{t-1},U_t) = \phi\rho\,Cov(X_{t-1},U_{t-1}). \tag{A.8}$$

Similarly, since $\rho,\phi \in (0,1)$, both $X$ and $U$ are mean-reverting series; that is, the covariance between $X$ and $U$ does not depend on $t$. In other words, asymptotically, we have

$$\operatorname*{plim}_{t\to\infty}Cov(X_t,U_t) = \operatorname*{plim}_{t\to\infty}Cov(X_{t-1},U_{t-1}) = Cov(X,U). \tag{A.9}$$

We also know that

$$Cov(X_t,X_{t-1}) = Cov(\rho X_{t-1}+\kappa U_t+\eta_t,\ X_{t-1}) = \rho Var(X)+\kappa Cov(X_{t-1},U_t)$$

and that

$$Cov(X_{t-1},U_t) = Cov(X_{t-1},\ \phi U_{t-1}+\psi X_{t-1}+\upsilon_t) = \phi\,Cov(X,U)+\psi Var(X).$$

Therefore, (A.8) becomes

$$Cov(X,U)-\kappa Var(U)-\psi\rho Var(X) = \phi\rho\,Cov(X,U), \tag{A.9}$$

implying that

$$Cov(X,U) = \frac{\kappa Var(U)+\psi\rho Var(X)}{1-\phi\rho}. \tag{A.10}$$

Therefore,

$$\frac{Cov(X_{t-1},U_t)}{Var(X_{t-1})} = \frac{\phi\kappa Var(U)+\phi\psi\rho Var(X)}{(1-\phi\rho)Var(X)}+\psi = \frac{\phi\kappa Var(U)}{(1-\phi\rho)Var(X)}+\frac{\psi}{1-\phi\rho}. \tag{A.11}$$

B. Effectiveness of the Lagged IV Across Sample Sizes

The likelihood of a type-I error becomes arbitrarily close to one for a large enough sample. With the growing availability of large data sets, using a lagged IV could push the likelihood of a type-I error to almost one. In this sense, with a large enough sample, even if the lagged IV estimate is consistent when it violates only the independence assumption, the unavoidable type-I error still makes the lagged IV method problematic.

To test the effectiveness of the lagged IV method in terms of the likelihood of a type-I error, we run a series of simulations with ten times as many individuals as in the simulations in Section 3.4.
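The stationary covariance in (A.5) of Appendix A lends itself to the same kind of simulation check used in this appendix. The sketch below simulates one long series from a process of the Scenario 1 form, $U_t = \phi U_{t-1}+\upsilon_t$ and $X_t = \rho X_{t-1}+\kappa U_t+\eta_t$, and compares the sample covariance with $\kappa Var(U)/(1-\phi\rho)$. The standard-normal shocks, the parameter values, the series length, and the function name are assumptions for illustration; they are not the settings of the Monte Carlo study reported here.

```python
import random


def simulate_cov_xu(rho, phi, kappa, periods, seed=0, burn_in=1000):
    """Simulate (X_t, U_t) and return the sample Cov(X, U) and Var(U)."""
    rng = random.Random(seed)
    x = u = 0.0
    sum_x = sum_u = sum_xu = sum_uu = 0.0
    for t in range(burn_in + periods):
        u = phi * u + rng.gauss(0.0, 1.0)              # unobserved confounder
        x = rho * x + kappa * u + rng.gauss(0.0, 1.0)  # explanatory variable
        if t >= burn_in:  # discard the start-up periods
            sum_x += x
            sum_u += u
            sum_xu += x * u
            sum_uu += u * u
    mean_x, mean_u = sum_x / periods, sum_u / periods
    cov_xu = sum_xu / periods - mean_x * mean_u
    var_u = sum_uu / periods - mean_u ** 2
    return cov_xu, var_u


rho, phi, kappa = 0.5, 0.5, 1.0
cov_xu, var_u = simulate_cov_xu(rho, phi, kappa, periods=200_000)
predicted = kappa * var_u / (1.0 - phi * rho)  # right-hand side of (A.5)
```

With a long enough series, `cov_xu` and `predicted` should be close; the gap shrinks as `periods` grows because both are sample moments of a stationary process.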
Our simulation follows the same data generating processes (DGPs) as in Section 3.3. In each simulation, we generate a panel with $T = 50$ periods and $N = 1000$ cross-section units, for a total of 50,000 observations. As is shown in Figure A1, the likelihood of a type-I error gets closer to one, compared to what is shown in Figure 2. This implies that as the sample size is enlarged, the likelihood of a type-I error gets arbitrarily close to one. Similar patterns of the type-I error likelihood are shown in Figures A2 to A6, which are the counterparts of Figures 3, 5, 6, 8 and 9.

Figure 3.A1. Monte Carlo Results: $\kappa$ = 0.5 and 2, $\phi$ ranges from 0 to 1, $\rho = 0.5$, $NT = 50{,}000$

Figure 3.A2. Monte Carlo Results: $\kappa$ = 0.5 and 2, $\rho$ ranges from 0 to 1, $\phi = 0.5$, $NT = 50{,}000$

Figure 3.A3. Monte Carlo Results: $\kappa$ = 0.5 and 2, $\phi$ ranges from 0 to 1, $NT = 50{,}000$; Lagged Causality on Explained Variable

Figure 3.A4. Monte Carlo Results: $\kappa$ = 0.5 and 2, $\rho$ ranges from 0 to 1, $NT = 50{,}000$; Lagged Causality on Explained Variable

Figure 3.A5. Monte Carlo Results: $\kappa$ = 0.5 and 2, $\phi$ ranges from 0 to 1, $NT = 50{,}000$; Lagged Causality on Unobserved Confounder

Figure 3.A6. Monte Carlo Results: $\kappa$ = 0.5 and 2, $\rho$ ranges from 0 to 1, $NT = 50{,}000$; Lagged Causality on Unobserved Confounder

Appendices of Spatially Lagged Variables as Instruments: Spatially Local Average Treatment Effect (SLATE) in Estimation

A. Derivations of Simpler Cases of PROPOSITION 1 and PROPOSITION 2

Proposition 1 states that when there exists no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the unobserved confounders, the spatially lagged IV estimate is unbiased and consistent. Suppose $w_{ij} > 0$ when observation $i$ and observation $j$ are neighbors, and $w_{ij} = 0$ otherwise; in this case, the proof of Proposition 1 is simpler.
Proof of PROPOSITION 1: Using $\boldsymbol{WX}$, the explanatory variables premultiplied by the spatial weighting matrix, as the instrumental variables, namely the spatially lagged IV, it is derived that

$$\hat{\boldsymbol{\beta}}_{IV} = \left[(\boldsymbol{WX})'\boldsymbol{X}\right]^{-1}(\boldsymbol{WX})'\boldsymbol{Y} = (\boldsymbol{X}'\boldsymbol{WX})^{-1}\boldsymbol{X}'\boldsymbol{WY} = (\boldsymbol{X}'\boldsymbol{WX})^{-1}\boldsymbol{X}'\boldsymbol{W}\left[\boldsymbol{X\beta}+\boldsymbol{\theta}(\boldsymbol{I_n}-\boldsymbol{\rho W})^{-1}\boldsymbol{\gamma}+\boldsymbol{\epsilon}\right]$$

$$= \boldsymbol{\beta}+(\boldsymbol{X}'\boldsymbol{WX})^{-1}\boldsymbol{\theta}(\boldsymbol{I_n}-\boldsymbol{\rho W})^{-1}\boldsymbol{X}'\boldsymbol{W\gamma}.$$

It is known that $w_{ij} > 0$ when observation $i$ and observation $j$ are neighbors, and $w_{ij} = 0$ otherwise. Denote $i = \tilde{\imath}$ and $j = \tilde{\jmath}$ when observation $i$ and observation $j$ are not neighbors; in other words, $w_{\tilde{\imath}\tilde{\jmath}} = 0$. Given that Assumption 1 states that $x_{ik}\gamma_j = 0$ when $i \ne j$, it is known that

$$E(\boldsymbol{X}'\boldsymbol{W\gamma}) = \begin{bmatrix} x_{11} & \cdots & x_{i1} & \cdots & x_{N1}\\ \vdots & \ddots & \vdots & & \vdots\\ x_{1k} & \cdots & x_{ik} & \cdots & x_{Nk}\\ \vdots & & \vdots & \ddots & \vdots\\ x_{1K} & \cdots & x_{iK} & \cdots & x_{NK}\end{bmatrix}\cdot\begin{bmatrix} w_{11} & \cdots & w_{i1} & \cdots & w_{N1}\\ \vdots & \ddots & \vdots & & \vdots\\ w_{1j} & \cdots & w_{ij} & \cdots & w_{Nj}\\ \vdots & & \vdots & \ddots & \vdots\\ w_{1N} & \cdots & w_{iN} & \cdots & w_{NN}\end{bmatrix}\cdot\begin{bmatrix}\gamma_1\\ \vdots\\ \gamma_j\\ \vdots\\ \gamma_N\end{bmatrix} = \sum_{k=1}^{K}\sum_{j=1}^{N}\left(\sum_{i=1}^{N}x_{ik}w_{ij}\gamma_j\right)$$

$$= \sum_{k=1}^{K}\sum_{j=1}^{N}\Bigg(\sum_{\substack{i=1\\ i=j}}^{N}w_{ij}x_{ik}\gamma_j\Bigg)+\sum_{k=1}^{K}\sum_{j=1}^{N}\Bigg(\sum_{\substack{i=1,\ i\ne j\\ i\ne\tilde{\imath},\ j\ne\tilde{\jmath}}}^{N}w_{ij}x_{ik}\gamma_j\Bigg)+\sum_{k=1}^{K}\sum_{j=1}^{N}\Bigg(\sum_{\substack{i=1,\ i\ne j\\ i=\tilde{\imath},\ j=\tilde{\jmath}}}^{N}w_{ij}x_{ik}\gamma_j\Bigg).$$

As is assumed, $w_{ij} = 0$ when $i = j$; therefore, the first sum (over $i = j$) equals 0. As is assumed, $w_{ij} = 0$ when observation $i$ and observation $j$ are not neighbors; therefore, the third sum (over $i = \tilde{\imath}$, $j = \tilde{\jmath}$) equals 0. In addition, as Assumption 1 states, $x_{ik}\gamma_j = 0$ when $i \ne j$; therefore, the second sum (over neighboring $i \ne j$) equals 0. Accordingly, it is known that $E(\boldsymbol{X}'\boldsymbol{W\gamma}) = 0$, and thus $E(\hat{\boldsymbol{\beta}}_{IV}) = \boldsymbol{\beta}$. Similarly, it is also known that $\operatorname*{plim}_{n\to\infty}\dfrac{\boldsymbol{X}'\boldsymbol{W\gamma}}{n} = 0$.
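Therefore, $\operatorname*{plim}_{n\to\infty}\hat{\boldsymbol{\beta}}_{IV} = \boldsymbol{\beta}$; in other words, the spatially lagged IV estimate is unbiased and consistent.

Proposition 2 states that when there exists no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the unobserved confounders, and there exists no inter-regional correlation between the explanatory variables and the disturbances in the spatial autocorrelation of the explanatory variables themselves either, the spatially lagged IV estimate is unbiased and consistent. Suppose $w_{ij} > 0$ when observation $i$ and observation $j$ are neighbors, and $w_{ij} = 0$ otherwise; in this case, the proof of Proposition 2 is simpler.

Proof of PROPOSITION 2: Using $\boldsymbol{WX}$, the explanatory variables premultiplied by the spatial weighting matrix, as the instrumental variables, namely the spatially lagged IV, it is derived that

$$\hat{\boldsymbol{\beta}}_{IV} = \left[(\boldsymbol{WX})'\boldsymbol{X}\right]^{-1}(\boldsymbol{WX})'\boldsymbol{Y} = (\boldsymbol{X}'\boldsymbol{WX})^{-1}\boldsymbol{X}'\boldsymbol{WY} = (\boldsymbol{X}'\boldsymbol{WX})^{-1}\boldsymbol{X}'\boldsymbol{W}\left[\boldsymbol{X\beta}+\boldsymbol{\theta}\left[(\boldsymbol{I_n}-\boldsymbol{\rho W})^{-1}(\boldsymbol{\varphi X}+\boldsymbol{\gamma})\right]+\boldsymbol{\epsilon}\right]$$

$$= \boldsymbol{\beta}+(\boldsymbol{X}'\boldsymbol{WX})^{-1}\boldsymbol{\theta}(\boldsymbol{I_n}-\boldsymbol{\rho W})^{-1}\boldsymbol{X}'\boldsymbol{W}\left[\boldsymbol{\varphi}(\boldsymbol{I_n}-\boldsymbol{\rho W})^{-1}\boldsymbol{\eta}+\boldsymbol{\gamma}\right]$$

$$= \boldsymbol{\beta}+(\boldsymbol{X}'\boldsymbol{WX})^{-1}\boldsymbol{\theta}(\boldsymbol{I_n}-\boldsymbol{\rho W})^{-1}\boldsymbol{X}'\boldsymbol{W\gamma}+(\boldsymbol{X}'\boldsymbol{WX})^{-1}\boldsymbol{\theta}(\boldsymbol{I_n}-\boldsymbol{\rho W})^{-1}\boldsymbol{\varphi}(\boldsymbol{I_n}-\boldsymbol{\rho W})^{-1}\boldsymbol{X}'\boldsymbol{W\eta}.$$

As discussed before, given that Assumption 1 implies that $x_{ik}\gamma_j = 0$ when $i \ne j$, it is known that $E(\boldsymbol{X}'\boldsymbol{W\gamma}) = 0$ and $\operatorname*{plim}_{n\to\infty}\dfrac{\boldsymbol{X}'\boldsymbol{W\gamma}}{n} = 0$. It is known that $w_{ij} > 0$ when observation $i$ and observation $j$ are neighbors, and $w_{ij} = 0$ otherwise. Denote $i = \tilde{\imath}$ and $j = \tilde{\jmath}$ when observation $i$ and observation $j$ are not neighbors; in other words, $w_{\tilde{\imath}\tilde{\jmath}} = 0$.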
Similarly, given that Assumption 2 states that $x_{ik}\eta_j = 0$ when $i \ne j$, it is known that

$$E(\boldsymbol{X}'\boldsymbol{W\eta}) = \begin{bmatrix} x_{11} & \cdots & x_{i1} & \cdots & x_{N1}\\ \vdots & \ddots & \vdots & & \vdots\\ x_{1k} & \cdots & x_{ik} & \cdots & x_{Nk}\\ \vdots & & \vdots & \ddots & \vdots\\ x_{1K} & \cdots & x_{iK} & \cdots & x_{NK}\end{bmatrix}\cdot\begin{bmatrix} w_{11} & \cdots & w_{i1} & \cdots & w_{N1}\\ \vdots & \ddots & \vdots & & \vdots\\ w_{1j} & \cdots & w_{ij} & \cdots & w_{Nj}\\ \vdots & & \vdots & \ddots & \vdots\\ w_{1N} & \cdots & w_{iN} & \cdots & w_{NN}\end{bmatrix}\cdot\begin{bmatrix}\eta_1\\ \vdots\\ \eta_j\\ \vdots\\ \eta_N\end{bmatrix} = \sum_{k=1}^{K}\sum_{j=1}^{N}\left(\sum_{i=1}^{N}x_{ik}w_{ij}\eta_j\right)$$

$$= \sum_{k=1}^{K}\sum_{j=1}^{N}\Bigg(\sum_{\substack{i=1\\ i=j}}^{N}w_{ij}x_{ik}\eta_j\Bigg)+\sum_{k=1}^{K}\sum_{j=1}^{N}\Bigg(\sum_{\substack{i=1,\ i\ne j\\ i\ne\tilde{\imath},\ j\ne\tilde{\jmath}}}^{N}w_{ij}x_{ik}\eta_j\Bigg)+\sum_{k=1}^{K}\sum_{j=1}^{N}\Bigg(\sum_{\substack{i=1,\ i\ne j\\ i=\tilde{\imath},\ j=\tilde{\jmath}}}^{N}w_{ij}x_{ik}\eta_j\Bigg).$$

As is assumed, $w_{ij} = 0$ when $i = j$; therefore, the first sum (over $i = j$) equals 0. As is assumed, $w_{ij} = 0$ when observation $i$ and observation $j$ are not neighbors; therefore, the third sum (over $i = \tilde{\imath}$, $j = \tilde{\jmath}$) equals 0. In addition, as Assumption 2 states, $x_{ik}\eta_j = 0$ when $i \ne j$; therefore, the second sum (over neighboring $i \ne j$) equals 0. Accordingly, it is known that $E(\boldsymbol{X}'\boldsymbol{W\eta}) = 0$, and thus $E(\hat{\boldsymbol{\beta}}_{IV}) = \boldsymbol{\beta}$. A similar derivation shows that $\operatorname*{plim}_{n\to\infty}\dfrac{\boldsymbol{X}'\boldsymbol{W\eta}}{n} = 0$. Therefore, $\operatorname*{plim}_{n\to\infty}\hat{\boldsymbol{\beta}}_{IV} = \boldsymbol{\beta}$; in other words, the spatially lagged IV estimate is unbiased and consistent.

B. Proof of COROLLARY 3 and COROLLARY 4

Proof of COROLLARY 3: Given the three waves of implementation of the treatment, the spatial weighting matrix is asymmetric, that is, $\boldsymbol{W} \ne \boldsymbol{W}'$.
Using the spatially lagged IV method, it is derived that

$$\hat{\boldsymbol{\beta}}_{IV} = \left[(\boldsymbol{WX})'\boldsymbol{X}\right]^{-1}(\boldsymbol{WX})'\boldsymbol{Y} = (\boldsymbol{X}'\boldsymbol{W}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{W}'\boldsymbol{Y} = (\boldsymbol{X}'\boldsymbol{W}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{W}'\left[\boldsymbol{X\beta}+\boldsymbol{\theta}(\boldsymbol{I_n}-\boldsymbol{\rho W})^{-1}\boldsymbol{\gamma}+\boldsymbol{\epsilon}\right]$$

$$= \boldsymbol{\beta}+(\boldsymbol{X}'\boldsymbol{W}'\boldsymbol{X})^{-1}\boldsymbol{\theta}(\boldsymbol{I_n}-\boldsymbol{\rho W})^{-1}\boldsymbol{X}'\boldsymbol{W}'\boldsymbol{\gamma}.$$

Given that Assumption 1 states that $x_{ik}\gamma_j = 0$ when $i \ne j$, it is known that

$$E(\boldsymbol{X}'\boldsymbol{W}'\boldsymbol{\gamma}) = \begin{bmatrix} x_{11} & \cdots & x_{i1} & \cdots & x_{N1}\\ \vdots & \ddots & \vdots & & \vdots\\ x_{1k} & \cdots & x_{ik} & \cdots & x_{Nk}\\ \vdots & & \vdots & \ddots & \vdots\\ x_{1K} & \cdots & x_{iK} & \cdots & x_{NK}\end{bmatrix}\cdot\begin{bmatrix} w_{11} & \cdots & w_{i1} & \cdots & w_{N1}\\ \vdots & \ddots & \vdots & & \vdots\\ w_{1j} & \cdots & w_{ij} & \cdots & w_{Nj}\\ \vdots & & \vdots & \ddots & \vdots\\ w_{1N} & \cdots & w_{iN} & \cdots & w_{NN}\end{bmatrix}\cdot\begin{bmatrix}\gamma_1\\ \vdots\\ \gamma_j\\ \vdots\\ \gamma_N\end{bmatrix} = \sum_{k=1}^{K}\sum_{j=1}^{N}\left(\sum_{i=1}^{N}x_{ik}w_{ij}\gamma_j\right)$$

$$= \sum_{k=1}^{K}\sum_{j=R+1}^{S-1}\Bigg(\sum_{\substack{i=R+1\\ i=j}}^{S-1}w_{ij}x_{ik}\gamma_j\Bigg)+\sum_{k=1}^{K}\sum_{j=R+1}^{S-1}\Bigg(\sum_{\substack{i=R+1\\ i\ne j}}^{S-1}w_{ij}x_{ik}\gamma_j\Bigg)+\sum_{k=1}^{K}\sum_{j=1}^{R}\Bigg(\sum_{\substack{i=1\\ i\ne j}}^{R}w_{ij}x_{ik}\gamma_j\Bigg)+\sum_{k=1}^{K}\sum_{j=S}^{N}\Bigg(\sum_{\substack{i=S\\ i\ne j}}^{N}w_{ij}x_{ik}\gamma_j\Bigg).$$

As is assumed, $w_{ij} = 0$ when $i = j$; therefore, the first sum (over $i = j$) equals 0. In addition, as Assumption 1 states, $x_{ik}\gamma_j = 0$ when $i \ne j$; therefore, each of the three remaining sums (over $i \ne j$ within $R+1,\ldots,S-1$, within $1,\ldots,R$, and within $S,\ldots,N$) equals 0. Accordingly, it is known that $E(\boldsymbol{X}'\boldsymbol{W}'\boldsymbol{\gamma}) = 0$, and thus $E(\hat{\boldsymbol{\beta}}_{IV}) = \boldsymbol{\beta}$. Similarly, it is also known that $\operatorname*{plim}_{n\to\infty}\dfrac{\boldsymbol{X}'\boldsymbol{W}'\boldsymbol{\gamma}}{n} = 0$. Therefore, $\operatorname*{plim}_{n\to\infty}\hat{\boldsymbol{\beta}}_{IV} = \boldsymbol{\beta}$; in other words, the spatially lagged IV estimate is unbiased and consistent. ∎

Proof of COROLLARY 4: Given the three waves of implementation of the treatment, the spatial weighting matrix is asymmetric, that is, $\boldsymbol{W} \ne \boldsymbol{W}'$.
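Using the spatially lagged IV method, and noting that $(\mathbf{W}\mathbf{X})' = \mathbf{X}'\mathbf{W}'$, it follows that

$$
\begin{aligned}
\hat{\boldsymbol{\beta}}_{IV}
&= [(\mathbf{W}\mathbf{X})'\mathbf{X}]^{-1}(\mathbf{W}\mathbf{X})'\mathbf{Y} \\
&= (\mathbf{X}'\mathbf{W}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}'\mathbf{Y} \\
&= (\mathbf{X}'\mathbf{W}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}'\bigl[\mathbf{X}\boldsymbol{\beta} + \boldsymbol{U}\bigl[(\mathbf{I}_n - \rho\mathbf{W})^{-1}(\varphi\mathbf{X} + \boldsymbol{\gamma})\bigr] + \boldsymbol{\epsilon}\bigr] \\
&= \boldsymbol{\beta} + (\mathbf{X}'\mathbf{W}'\mathbf{X})^{-1}\boldsymbol{U}(\mathbf{I}_n - \rho\mathbf{W})^{-1}\mathbf{X}'\mathbf{W}'\bigl[\varphi(\mathbf{I}_n - \rho\mathbf{W})^{-1}\boldsymbol{\eta} + \boldsymbol{\gamma}\bigr] \\
&= \boldsymbol{\beta} + (\mathbf{X}'\mathbf{W}'\mathbf{X})^{-1}\boldsymbol{U}(\mathbf{I}_n - \rho\mathbf{W})^{-1}\mathbf{X}'\mathbf{W}'\boldsymbol{\gamma} \\
&\quad + (\mathbf{X}'\mathbf{W}'\mathbf{X})^{-1}\boldsymbol{U}(\mathbf{I}_n - \rho\mathbf{W})^{-1}\varphi(\mathbf{I}_n - \rho\mathbf{W})^{-1}\mathbf{X}'\mathbf{W}'\boldsymbol{\eta}
\end{aligned}
$$

As discussed above, since Assumption 1 implies that $e_{ik}\gamma_j = 0$ when $i \neq j$, it is known that $E(\mathbf{X}'\mathbf{W}'\boldsymbol{\gamma}) = 0$ and $\operatorname*{plim}_{n\to\infty} \frac{\mathbf{X}'\mathbf{W}'\boldsymbol{\gamma}}{n} = 0$. Similarly, since Assumption 2 implies that $e_{ik}\eta_j = 0$ when $i \neq j$, it is known that

$$
\begin{aligned}
E(\mathbf{X}'\mathbf{W}'\boldsymbol{\eta})
&=
\begin{bmatrix}
e_{11} & \cdots & e_{i1} & \cdots & e_{N1} \\
\vdots & \ddots & \vdots & & \vdots \\
e_{1k} & \cdots & e_{ik} & \cdots & e_{Nk} \\
\vdots & & \vdots & \ddots & \vdots \\
e_{1K} & \cdots & e_{iK} & \cdots & e_{NK}
\end{bmatrix}
\cdot
\begin{bmatrix}
\mu_{11} & \cdots & \mu_{i1} & \cdots & \mu_{N1} \\
\vdots & \ddots & \vdots & & \vdots \\
\mu_{1j} & \cdots & \mu_{ij} & \cdots & \mu_{Nj} \\
\vdots & & \vdots & \ddots & \vdots \\
\mu_{1N} & \cdots & \mu_{iN} & \cdots & \mu_{NN}
\end{bmatrix}
\cdot
\begin{bmatrix}
\eta_1 \\ \vdots \\ \eta_j \\ \vdots \\ \eta_N
\end{bmatrix} \\
&= \sum_{k=1}^{K}\sum_{j=1}^{N}\Biggl(\sum_{i=1}^{N} e_{ik}\mu_{ij}\eta_j\Biggr) \\
&= \sum_{k=1}^{K}\sum_{j=V+1}^{S-1}\Biggl(\sum_{\substack{i=V+1 \\ i=j}}^{S-1} e_{ik}\mu_{ij}\eta_j\Biggr)
 + \sum_{k=1}^{K}\sum_{j=V+1}^{S-1}\Biggl(\sum_{\substack{i=V+1 \\ i\neq j}}^{S-1} e_{ik}\mu_{ij}\eta_j\Biggr) \\
&\quad + \sum_{k=1}^{K}\sum_{j=1}^{V}\Biggl(\sum_{\substack{i=1 \\ i\neq j}}^{V} e_{ik}\mu_{ij}\eta_j\Biggr)
 + \sum_{k=1}^{K}\sum_{j=S}^{N}\Biggl(\sum_{\substack{i=S \\ i\neq j}}^{N} e_{ik}\mu_{ij}\eta_j\Biggr) \\
&= \sum_{k=1}^{K}\sum_{j=V+1}^{S-1}\Biggl(\sum_{\substack{i=V+1 \\ i=j}}^{S-1} \mu_{ij}e_{ik}\eta_j\Biggr)
 + \sum_{k=1}^{K}\sum_{j=V+1}^{S-1}\Biggl(\sum_{\substack{i=V+1 \\ i\neq j}}^{S-1} \mu_{ij}e_{ik}\eta_j\Biggr) \\
&\quad + \sum_{k=1}^{K}\sum_{j=1}^{V}\Biggl(\sum_{\substack{i=1 \\ i\neq j}}^{V} \mu_{ij}e_{ik}\eta_j\Biggr)
 + \sum_{k=1}^{K}\sum_{j=S}^{N}\Biggl(\sum_{\substack{i=S \\ i\neq j}}^{N} \mu_{ij}e_{ik}\eta_j\Biggr)
\end{aligned}
$$

As is assumed, $\mu_{ij} = 0$ when $i = j$; therefore,

$$
\sum_{k=1}^{K}\sum_{j=V+1}^{S-1}\Biggl(\sum_{\substack{i=V+1 \\ i=j}}^{S-1} \mu_{ij}e_{ik}\eta_j\Biggr) = 0 .
$$

In addition, as Assumption 2 states, $e_{ik}\eta_j = 0$ when $i \neq j$; therefore,

$$
\sum_{k=1}^{K}\sum_{j=V+1}^{S-1}\Biggl(\sum_{\substack{i=V+1 \\ i\neq j}}^{S-1} \mu_{ij}e_{ik}\eta_j\Biggr) = 0, \qquad
\sum_{k=1}^{K}\sum_{j=1}^{V}\Biggl(\sum_{\substack{i=1 \\ i\neq j}}^{V} \mu_{ij}e_{ik}\eta_j\Biggr) = 0, \qquad
\sum_{k=1}^{K}\sum_{j=S}^{N}\Biggl(\sum_{\substack{i=S \\ i\neq j}}^{N} \mu_{ij}e_{ik}\eta_j\Biggr) = 0 .
$$

Accordingly, $E(\mathbf{X}'\mathbf{W}'\boldsymbol{\eta}) = 0$, and thus $E(\hat{\boldsymbol{\beta}}_{IV}) = \boldsymbol{\beta}$. Similarly, $\operatorname*{plim}_{n\to\infty} \frac{\mathbf{X}'\mathbf{W}'\boldsymbol{\eta}}{n} = 0$.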
Therefore, $\operatorname*{plim}_{n\to\infty} \hat{\boldsymbol{\beta}}_{IV} = \boldsymbol{\beta}$; in other words, the spatially lagged IV estimate is unbiased and consistent. ∎
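The estimator used in these proofs, $[(\mathbf{W}\mathbf{X})'\mathbf{X}]^{-1}(\mathbf{W}\mathbf{X})'\mathbf{Y}$, can be illustrated numerically. The following is a minimal sketch in Python with NumPy, not the dissertation's own code. Everything in it is a hypothetical design chosen for illustration: the function name `simulate`, the line-shaped neighborhood structure that makes $\mathbf{W}$ asymmetric, the AR(1) component of the regressor, and the parameter values. It also makes the simplifying assumption that $\rho = 0$, so the unobservable enters the outcome without spatial spillover; the sketch therefore only shows the moment condition at work when the regressor of unit $i$ is correlated with its own unobservable but not with those of other units.

```python
import numpy as np


def simulate(n=2000, beta=2.0, seed=0):
    """Compare OLS with the spatially lagged IV estimator on simulated data.

    Hypothetical design (not from the dissertation): units sit on a line and
    unit i's neighbours are the two units before it, so W is asymmetric.
    """
    rng = np.random.default_rng(seed)

    # Asymmetric spatial weighting matrix with zero diagonal, row-normalised.
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i - 2):
            if j >= 0:
                W[i, j] = 1.0
    row_sums = W.sum(axis=1, keepdims=True)
    W = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)

    # Exogenous, spatially correlated part of X (AR(1) along the line),
    # which is what makes the instrument WX relevant for X.
    z = rng.normal(size=n)
    s = np.zeros(n)
    s[0] = z[0]
    for i in range(1, n):
        s[i] = 0.8 * s[i - 1] + z[i]

    # v_i makes X_i endogenous: it enters both X_i and the unobservable
    # gamma_i, but never X_j for j != i, so cross-unit moments vanish.
    v = rng.normal(size=n)
    gamma = v
    eps = rng.normal(size=n)

    X = (s + v).reshape(-1, 1)
    # Simplifying assumption: rho = 0, so gamma enters Y without spillover.
    Y = beta * X[:, 0] + gamma + eps

    Z = W @ X  # spatially lagged instrument WX
    beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)[0]
    beta_iv = np.linalg.solve(Z.T @ X, Z.T @ Y)[0]
    return beta_ols, beta_iv, W


if __name__ == "__main__":
    b_ols, b_iv, _ = simulate()
    print(f"OLS estimate: {b_ols:.3f}")
    print(f"Spatially lagged IV estimate: {b_iv:.3f}")
```

Under this design the OLS estimate is pulled away from the true coefficient by the own-unit correlation between the regressor and the unobservable, while the IV estimate built from $\mathbf{W}\mathbf{X}$ should land close to it. Allowing $\rho \neq 0$ would require the instrument to be uncorrelated with the spatially filtered unobservable as well, which this sketch does not examine.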