(ArcGIS 10 for Economics Research)
Masayuki Kudamatsu
19 October, 2018
Press SPACE to proceed.
To go back to the previous slide, press SHIFT+SPACE.
Create a polygon of an input feature's neighborhood
Point
|
Polyline | Polygon |
Buffer + Spatial Join $\Rightarrow$
$\Rightarrow$ Identify neighbors for each point
This is useful for economics research for (at least) 3 reasons
1. Generate a treatment variable
Estimate the impact of shale gas wells on house prices in their neighborhood
2. Estimate spillover effects of treatment
RCT on deworming school children
# of treated schools within 6km can be identified by Buffer tool
3. Mitigate omitted variable bias in peer effect estimation
1. Conley & Udry (2010)
2. UTM projections
3. Replicate Conley & Udry (2010)
4. Set relative file paths in Python
Do farmers learn from their peers regarding input use (fertilizer) for new technology (pineapple)?
Important?
Original?
Feasible?
Panel household surveys (every 6 week in 1996-98) in 3 villages of southern Ghana
Define the treatment variable for farmer $i$ in period $t$ as:
\begin{align*} M_{i,t} \equiv \frac{GoodNews(x_{j,s-5}) \times (x_{j,s-5} - x_{i,t_p})}{Experience_{it}} \end{align*}
$x_{i,t_p}$ | $i$'s use of fertilizer at period $t_p$ (i.e. previous planting) |
$x_{j,s-5}$ | farmer $j$ ($i$'s friend)'s use of fertilizer at period $s-5$, where $s \in [t_p,t]$ |
$GoodNews(x_{j,s-5})$ | Indicator of higher-than-expected profit (fn.22) from using $x_{j,s-5}$ |
$Experience_{it}$ | $i$'s experience of planting pineapple |
Then estimate: \begin{align*} \Delta x_{i,t} = \beta_1 M_{i,t} + \beta_2 \Gamma_{i,t} + \boldsymbol{z}'_{i,t}\boldsymbol{\beta_3} + \nu_{it} \end{align*}
$\Delta x_{i,t}$ | Changes in fertilizer use since $t_p$ |
$\boldsymbol{z}'_{i,t}$ | Other controls |
$\Gamma_{i,t}$ | Growing condition for $i$'s plot at period $t$ |
\begin{align*} \Gamma_{i,t} \equiv x_{i,t}^{close} - x_{i,t_p} \end{align*} where $x_{i,t}^{close}$ averages fertilizer use over plots within 1km from $i$ during previous 3 periods
$\Rightarrow$ $\Gamma_{i,t}$ controls for common shocks to $i$ and $i$'s friends
We will learn how to identify plots within 1km from $i$
Source of identification: Friends outside 1km radius
(Figure 4 of Conley & Udery 2010)
\begin{align*} \Delta x_{i,t} = \beta_1 M_{i,t} + \beta_2 \Gamma_{i,t} + \boldsymbol{z}'_{i,t}\boldsymbol{\beta_3} + \nu_{it} \end{align*}
Standard errors: calculated by Conley (1999)
Which is now the standard procedure for cross-sectional regression with spatial data
Let $\mathbf{b}$ denote a vector of coefficients estimated by OLS
Variance of OLS estimator: \begin{align*} V(\mathbf{b}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'E(\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}')\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} \end{align*}
$\mathbf{X}$ | $n$ (# of obs.) by $k$ (# of regressors) data matrix |
$\boldsymbol{\varepsilon}$ | $n$-dimensional vector of the error term |
Let $\rho_{ij}$ denote $ij$-th element of $E(\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}')$
Conley (1999) imposes $\rho_{ij} = K_{ij}\hat{\varepsilon}_i\hat{\varepsilon}_j$ where
\begin{align*} K_{ij} \equiv \left\{ \begin{array}{cl} (1-\frac{x_{ij}}{\bar{x}})(1-\frac{y_{ij}}{\bar{y}}) & \mbox{if $x_{ij}<\bar{x}$ & $y_{ij} < \bar{y}$} \\ 0 & \mbox{otherwise} \end{array} \right. \end{align*}
$x_{ij}, y_{ij}$ | Distance between $i$ & $j$ in x,y dimension |
$\bar{x},\bar{y}$ | Cut-off value chosen by researcher (1.5km in Conley-Udry 2010) |
$\hat{\varepsilon}_i$ | OLS residual for $i$ |
Tim Conley's website offers Stata ado files for implementing Conley (1999) for OLS, GMM, and Logit
In this ado program, $x_{ij},y_{ij}$ is calculated from the coordinates of $i$ and $j$
Make sure to use a projected coordinate system (cf. Lecture 1)
Conley's (1999) Stata ado cannot be used for panel regressions...
Hsiang (2010) extends it to a panel data, with a new Stata ado
(Table 5 of Conley & Udry 2010)
(Table 7 of Conley & Udry 2010)
1. Launch ArcMap 10 (it takes time)
2. Download the zipped dataset for lecture 3
3. Save it to Desktop (C:\\Users\\yourname\\Desktop
)
4. Right-click it and choose 7-Zip > Extract to "Lecture3\"
C:\\Users\\yourname\\Desktop\\Lecture3
Now in ArcMap's Catalogue Window:
5. Establish connection to data folder
6. Prepare the Model Builder
code/models.tbx/lecture3
Buffer tool requires distance calculation
How to calculate distance depends on which coordinate system is used, geographic or projected
This lecture uses UTM, a projected coordinate system
So let's learn about UTM.
Stands for Universal Transverse Mercator
Project earth surface onto the cylinder that is tangent on standard meridian
Father away from standard meridian, more distortion
To minimize distortion:
1. Divide Earth into 60 zones (6° wide in longitude)
2. For each zone, set standard meridian in the middle
3. Scale down distance along standard meridian by 0.9996
120 UTM projections available
Pick the appropriate zone for your study area
Conley and Udry (2010) construct the growing condition variable as
\begin{align*} \Gamma_{it} \equiv x_{it}^{close} - x_{it_p} \end{align*}
$x_{it}^{close}$ | Average of $x_{ks}$ |
$k$ | Plots within 1km from plot $i$ |
$s$ | $\in \{t-3, t-2, t-1, t \}$ |
Conley and Udry (2010) construct the growing condition variable as
\begin{align*} \Gamma_{it} \equiv x_{it}^{close} - x_{it_p} \end{align*}
$x_{it}^{close}$ | Average of $x_{ks}$ |
$k$ | Plots within 1km from plot $i$ |
$s$ | $\in \{t-3, t-2, t-1, t \}$ |
Browse the plot location data (Lecture3/input/udry2010.csv
) with Notepad
village hh plot: plot identifier
Xcoord Ycoord: plot coordinate in meters
Which geo-processing tool(s) do we need to use? (answer)
XY Table: ...\Lecture3\input\udry2010.csv
X Field: Xcoord
Y Field: Ycoord
Spatial Reference: WGS_1984_UTM_Zone_30N
Leave the other options as they are.
Input Features: udry2010_Layer
Output Feature Class: ...\Lecture3\temporary\plots.shp
Now save and run the Model.
Browse the output. You should see something like this:
Browse the attritube table, too. Is everything as expected?
Geo-processing tools to be used for this exercise:
1. Buffer (Analysis)
2. Spatial Join
Input Features: plots.shp
Output Feature Class: ...\Lecture3\temporary\buffer1km.shp
Distance: Linear unit / 1 / Kilometers
Dissolve Type: NONE
Now save and run the Model.
Browse the output. You should see something like this:
Browse the attribute table of the output. Notice that field names are all the same as plots.shp
Target Feature Class: buffer1km.shp
Join Features: plots.shp
Output Feature Class: ...\Lecture3\temporary\plot_neighbors.shp
Join Operation: JOIN_ONE_TO_MANY
Check "Keep All Target Features"
Field Map of Join Features: see next slide
Match Option: INTERSECT
Supposed to allow you to control which fields to be kept
When exported as a Python script, it tends to misbehave...
Best solution:
$\Rightarrow$ For now, keep every field
When target and join features have the same set of fields
$\Rightarrow$ Fields from join features are suffixed with "_1"
That's why you see village_1, hhn_1, plot_1, etc.
Now save and run the Model.
Browse the output and its attribute table
Which geo-processing tool(s) do we use? (cf. Lec 2 Ex 3)
If you prefer Excel...
plot_neighbors.shp
...\Lecture3\output\plot_neighbors.xls
If you prefer ASCII...
Input Feature Class: plot_neighbors.shp
Value Field: see next slide
Delimiter: SPACE
Output ASCII File: ...\Lecture3\output\plot_neighbors.txt
Check "Add Field Name to Output"
From target features: village, hhn, plot, xcoord, ycoord
From join features: village_1, hhn_1, plot_1
Look at models.tbx/exercises1-3
in the folder solutions4exercises/
Delete every field in Spatial Join's Field Map (Why)
Export the model as a Python script (cf. Lec 2 Ex 4)
The last three commands should look like this:
# Process: Spatial Join
arcpy.SpatialJoin_analysis(buffer1km_shp, plots_shp, plot_neighbors_shp, "JOIN_ONE_TO_MANY", "KEEP_ALL", "", "INTERSECT", "", "")
# Process: Table To Excel
arcpy.TableToExcel_conversion(plot_neighbors_shp, plot_neighbors_xls, "NAME", "CODE")
# Process: Export Feature Attribute to ASCII
arcpy.ExportXYv_stats(plot_neighbors_shp, "village;hhn;plot;village_1;hhn_1;plot_1", "SPACE", plot_neighbors_txt, "ADD_FIELD_NAMES")
The highlighted in yellow is where you specify fields to keep
Then edit the script by using the template (Lecture3\code\template4L3.py
)
Browse the output
Select one plot neighborhood polygon
Browse the attribute table
Count # of matched plots for this selected plot
See if the same # of plot points are located within the polygon
Currently we set an absolute file path
arcpy.env.workspace = "C:\\Users\\your_username\\Desktop\\Lecture3"
This code will break when
Super-tedious to change this line of code if
import os
work_dir = os.path.dirname(os.path.realpath(__file__))
os.chdir(work_dir)
These 3 lines of Python code sets the folder where the current script is saved (e.g. .../Lecture3/code/
)
as the working directory
We can then set its parent directory (e.g. Lecture3/
) as the working directory for ArcGIS by
arcpy.env.workspace = "../"
cf. In Stata, cd ..
moves up one directory
Now the code will run successfully as long as you
move the entire project directory (e.g. Lecture3/
)
to another computer
1. Open code/template4exercise5.py
import os
work_dir = os.path.dirname(os.path.realpath(__file__))
os.chdir(work_dir)
2. Copy and paste the whole Exercise 4 code below these three lines
3. Replace the line for arcpy.env.workspace
with
arcpy.env.workspace = "../"
4. Delete all the files in temporary/
and output/
5. Run the code. Check if every file is created in the expected folders
6. Now delete all the temporary and output files again.
7. Move the entire Lecture3/
folder to somewhere else (say, Documents/
)
8. Open the Python script in this new location, and run it.
9. Also run the Exercise 4 script and see how it returns an error
Do you remember which geo-processing tools you used for each of these tasks?