(ArcGIS 10 for Economics Research)
Masayuki Kudamatsu
16 November, 2018
Press SPACE to proceed.
To go back to the previous slide, press SHIFT+SPACE.
1. Elevation in economics
2. Duflo & Pande (2007)
3. Replicate instruments in Duflo & Pande (2007)
4. Calculate polyline length
5. Loop over values in Python
It's a great source of exogenous variation!
Electrification in South Africa (1996-2001)
$\Rightarrow$ Female labor supply $\uparrow$
Electrification: instrumented by mean land gradient
Tea production $\uparrow$ in China due to liberalization in 1979
$\Rightarrow$ Male-to-female ratio $\downarrow$
Tea production: instrumented by mean land gradient
Anti-Hutu radio station in Rwanda
$\Rightarrow$ Genocide incidents $\uparrow$
Radio: instrumented by signal strength
Do irrigation dams improve agricultural production and reduce rural poverty in India?
Important?
Original?
Feasible?
Unit of analysis: districts
Agricultural production data: annual 1971-99
Poverty data: 1973, 83, 87, 93, 99
Dams: location (nearest city), data of completion
Dam constriction: instrumented by river gradient
$\Leftarrow$ Dams: easier to build if river gradient is
How to use time-invariant river gradient for panel regression?
\begin{align*} D_{ist} =& \ \sum_{k=2}^4 \alpha_k (RGr_{ki} \ast \bar{D}_{st}) + (\mathbf{M}'_i \ast \bar{D}_{st})\boldsymbol{\beta} \\ & + \sum_{k=2}^4 \gamma_k (RGr_{ki} \ast l_{t}) + \nu_i + \mu_{st} + \omega_{ist} \end{align*}
$D_{ist}$ | # dams in district $i$ state $s$ in year $t$ |
$RGr_{ki}$ | Fraction of river areas in gradient category $k$ |
(2 for 1.5-3%; 3 for 3-6%, 4 for over 6%) |
We will learn how to calculate $RGr_{ki}$ in ArcGIS
\begin{align*} D_{ist} =& \ \sum_{k=2}^4 \alpha_k (RGr_{ki} \ast \bar{D}_{st}) + (\mathbf{M}'_i \ast \bar{D}_{st})\boldsymbol{\beta} \\ & + \sum_{k=2}^4 \gamma_k (RGr_{ki} \ast l_{t}) + \nu_i + \mu_{st} + \omega_{ist} \end{align*}
$\bar{D}_{st}$ | Predicted # of dams in state $s$ in year $t$ $=\frac{D_{s,1970}}{\sum_s D_{s,1970}}*\sum_s D_{st}$ ($D_{st}$: # dams in state $s$ in year $t$) |
$\Leftarrow$ $D_{st}$ is endogenous to poverty |
(Figure III of Duflo and Pande 2007)
\begin{align*} D_{ist} =& \ \sum_{k=2}^4 \alpha_k (RGr_{ki} \ast \bar{D}_{st}) + (\mathbf{M}'_i \ast \bar{D}_{st})\boldsymbol{\beta} \\ & + \sum_{k=2}^4 \gamma_k (RGr_{ki} \ast l_{t}) + \nu_i + \mu_{st} + \omega_{ist} \end{align*}
$\mathbf{M}_i$ | Controls: river length, fraction of district areas with each gradient cateogry, etc. |
$l_{t}$ | Year dummies |
$\nu_i,\mu_{st}$ | District and state-year FEs |
Why should we control for each of these variables?
\begin{align*} D_{ist} =& \ \sum_{k=2}^4 \alpha_k (RGr_{ki} \ast \bar{D}_{st}) + (\mathbf{M}'_i \ast \bar{D}_{st})\boldsymbol{\beta} \\ & + \sum_{k=2}^4 \gamma_k (RGr_{ki} \ast l_{t}) + \nu_i + \mu_{st} + \omega_{ist} \end{align*}
$\mathbf{M}_i$ | Controls: river length, fraction of district areas with each gradient cateogry, etc. |
$l_{t}$ | Year dummies |
$\nu_i,\mu_{st}$ | District and state-year FEs |
We will learn how to obtain river length in $\mathbf{M}_i$
\begin{align*} y_{ist} =& \ \gamma_i + \eta_{st} + \delta D_{ist} + \delta^U D_{ist}^U \\ & + \mathbf{Z}_{ist}\mathbf{\delta_Z} + \mathbf{Z^U}_{ist}\mathbf{\delta_Z^U} + \varepsilon_{ist} \end{align*}
$y_{ist}$ | Agricultural outcomes / poverty measures |
$D_{ist}$ | # of dams in own district (harmful) |
$\Leftarrow$ Displacement, soil degradation, restricted water use | |
$D_{ist}^U$ | # of dams in upstream districts (beneficial) |
$\Leftarrow$ Irrigation | |
$\mathbf{Z}_{ist}$ | All the controls for own district |
$\mathbf{Z^U}_{ist}$ | All the controls for upstream districts |
\begin{align*} y_{ist} =& \ \gamma_i + \eta_{st} + \delta D_{ist} + \delta^U D_{ist}^U \\ & + \mathbf{Z}_{ist}\mathbf{\delta_Z} + \mathbf{Z^U}_{ist}\mathbf{\delta_Z^U} + \varepsilon_{ist} \end{align*}
$D_{ist}$: instrumented by fitted value from 1st stage
$D_{ist}^U$: instrumented by sum of fitted values from 1st stage for upstream districts
cf. Using the fitted value from 1st stage as IV yields the 2SLS estimator (see Section 9.5.1 of Greene (2000), p. 374)
(Table II of Duflo and Pande 2007)
(Table III of Duflo and Pande 2007)
(Table III of Duflo and Pande 2007)
(Table VIII of Duflo and Pande 2007)
1. Launch ArcMap 10 (it takes time)
2. Download the zipped dataset for Lecture 6
3. Save it to Desktop (C:\\Users\\yourname\\Desktop
)
4. Right-click it and choose 7-Zip > Extract to "Lecture6\"
C:\\Users\\yourname\\Desktop\\Lecture6
In the input
folder:
5. Right-click 10m-rivers-lake-centerlines.zip
and select 7-Zip > Extract to Here
Now in ArcMap's Catalogue Window:
6. Establish connection to data folder
7. Create two empty models in the code/models.tbx
exercises1-3
exercise4
We use SRTM30 (version 2.1)
Resolution: 30x30 arc seconds (roughly 1x1km)
Avaialable in 27 separate tiles
1. Log on to: webgis.com/srtm30.html
2. Click the tile covering your study area
3. Download:
***.dem.zip
(data file)
***.hdr.zip
(header file)
***.prj.zip
(projection file)
***.stx.zip
(statistics file)
For this lecture, these four files for India (060n40.***.zip
) are already downloaded
in the input/
folder.
4. Unzip them so the four files (.DEM, .HDR, .PRJ, .STX) are all saved in the same directory
5. Drag the .DEM file from ArcMap's Calatogue window
6. For geo-processing, use the .DEM file as the input file
7. If you use multiple tiles, Mosaic to New Raster appends them all
Make sure to set the appropriate pixel type
8_BIT_UNSIGNED
(i.e. 0 to 255)
16_BIT_SIGNED
(i.e. -32,768 to 32,767)
District-level data on fraction of river areas with gradient:
$\Rightarrow$ Obtain gradient of each segment of river by matching river polylines with gradient raster:
If each raster cell is not of equal size within a district
$\Rightarrow$ Need to project the raster data (cf. Lecture 7)
But this lecture assumes:
Each river segment is of equal size within the same district
$\frac{107551-96486}{120}*\frac{3}{30-15} \approx 18$
If each river segment is of equal size,
$\Rightarrow$ We can use Zonal Statitics (cf. Lecture 5) with two inputs:
Mean cell value among river cells | = | Fraction of river areas with gradient 3-6% |
$\uparrow$ | ||
Zonal Statistics |
1. Indicator raster for each gradient category
2. River polylines intersected & dissolved by district
3. % of river segments in each gradient category
Input: elevation data (E060N40.DEM
)
Geo-processing tools:
1. Slope
2. Reclassify (Spatial Analyst)
Return either $\theta$ or $\tan\theta$ in diagram below
$\tan\theta$ for cell $e$ is obtained by
\begin{align*} \Rightarrow \ \tan \theta = \sqrt{(dz/dx)^2+(dz/dy)^2} \end{align*}
where
\begin{align*} \frac{dz}{dx} = & \ \Big[\frac{c + 2f + i}{4} - \frac{a+2d+g}{4}\Big] / 2 \\ \frac{dz}{dy} = & \ \Big[\frac{a+2b+c}{4} - \frac{g + 2h + i}{4}\Big] / 2 \end{align*}
Input raster: ...\Lecture6\input\E060N40.DEM
Output raster: ...\Lecture6\temporary\slope.tif
Output Measurement: PERCENT_RISE
Z factor: 0.000009
1 if raster data is in projected coordinate system
0.000009 if geographic coordinate system and low-latitude areas
For middle- to high-latitude areas, use Project Raster first to use UTM projections (cf. Lecture 7)
Input raster: slope.tif
Reclass field: VALUE
Reclassification: see next slide
Output raster: ...\Lecture6\temporary\gradient3060.tif
Uncheck "Change missing values to NoData"
For gradient 3-6%:
Repeat for gradient 1.5-3% and over 6%
Inputs:
Natural Earth 1:10m river + lake centerlines (ne_10m_river_lake_centerlines.shp
)
Indian district polygons (gadm36_IND_2.shp
)
Geo-processing tools:
1. Intersect
2. Dissolve
cf. Lecture 5
Input Features (both in ...\Lecture6\input\
):
gadm36_IND_2.shp
ne_10m_rivers_lake_centerlines.shp
Output Feature Class: ...\Lecture6\temporary\intersect.shp
Input Features: intersect.shp
Output Feature Class: ...\output\river_by_district.shp
Dissolve_Field(s): GID_2
Check "Create multipart features"
Statistics Field(s)
NAME_1 FIRST
NAME_2 FIRST
Now save and run the Model.
Browse river_by_district.shp
To check if multipart features are properly created:
river_by_disrict.shp
and gadm36_IND_2.shp
gadm36_IND_2.shp
for GID_2 of the selected district
river_by_disrict.shp
and select the row for this GID_2
Geo-processing tools:
1. Zonal Statistics as Table
2. Table To Excel
Input raster or feature zone data: river_by_district.shp
Zone field: GID_2
Input value raster: gradient3060.tif
Output Table: ...\temporary\river_gradient3060.dbf
Check "Ignore NoData in calculation"
Statistics type: MEAN
Input Table: river_gradient3060.dbf
Output Excel File: ...\Lecture6\output\river_gradient3060.xls
Repeat for gradient 1.5-3% and over 6%
Look at
models.tbx\exercises1-3
exercises1-3.py
in the solutions4exercises/
folder
Use % of areas with slope > 15% as a measure of unsuitability for urban development
Finds housing supply inelastic in such areas
Slope of over 15%: now often used as IV for housing supply or urban development
Gradient 1.5-3%
Gradient 3-6%
Gradient over 6%
Elevation 250-500m
Elevation 500-1000m
Elevation over 1000m
Cannot use geographic coordinate systems
For small areas, use UTM (Lecture 3)
For large areas such as India:
No single map projection preserves distance in every direction
$\Rightarrow$ Project each district to appropriate UTM projection
$\Rightarrow$ Loop over UTM zone numbers
1. Assign UTM zone number to each river-by-district
2. Extract rivers whose centroid is within each UTM zone
3. Project to UTM
4. Calculate river length by district
5. Loop over steps 2-4 for all UTM zone numbers
$\Rightarrow$ Use Model Builder first, then edit the exported Python script to create a loop.
Inputs:
river_by_district.shp
(created in Exercise #2)
utm.shp
: UTM zone polygons, provided by ArcGIS
Geo-processing tools:
1. Feature To Point (cf. Lecture 4)
2. Spatial Join
3. Copy Features + Join Field
There is a tool called "Calculate UTM Zone"
But it's not useful at all.
Now open the empty model exercise4
It's best to separate a model from the one for Exercises 1-3
Input Features: ...\output\river_by_district.shp
Output Feature Class: ...\temporary\river_centroids.shp
Uncheck "Inside"
Target Features: river_centroids.shp
Join Features: C:\Program Files (x86)\ArcGIS\Desktop10.6\Reference Systems\utm.shp
Output Feature Class: ...\Lecture6\temporary\utm_zones.shp
Join Operation: JOIN_ONE_TO_ONE
Check "Keep All Target Features"
Field Map of Join Features: leave as it is
Match Option: Intersect
Input Features: river_by_district.shp
Output Feature Class: ...\Lecture6\temporary\river_utm.shp
Add fields from another shapefile's attribute table
merge
Overwrites the input. So use Copy Features first.
Input Table: river_utm.shp
Input Join Field: GID_2
Join Table: utmzones.shp
Output Join Field: GID_2
Join Field(s): check ZONE
NOTE: This tool overwrites the input data.
Create your data as a csv file
Then use Table to Table, to convert it into dBASE format.
Finally, use Join Field
We will learn this in Lecture 7
Replicate Figure IV of Duflo & Pande (2007)
Geo-processing tool: Select (Analysis)
This tool keeps features whose field values satisfy the conditions you specify
Useful in various situations
Input Features: river_utm.shp (2)
Output Feature Class: ...\Lecture6\temporary\river43.shp
Expression: "ZONE" = 43
Which geo-processing tool(s) should we use?
Input Dataset or Feature Class: river43.shp
Output Dataset or Feature Class: ...\temporary\river43_projected.shp
Output Coordinate System: WGS_1984_UTM_Zone_43N
Geo-processing tools: Add Geometry Attributes
cf. Surface area calculation (Lecture 4)
Input Features: river43_projected.shp
Geometry Property: LENGTH
Length Unit: METERS
NOTE: This tool overwrites the input data.
Input Table: river43_projected.shp (2)
Output Excel File: ...\Lecture6\output\river_length43.xls
You can directly use Add Geometry Attributes (cf. Lecture 4)
Input Features: river43.shp
Geometry Properties: LENGTH_GEODESIC
Area Unit: METERS
Coordinate System: WGS_1984_UTM_Zone_43N
Now export the model exercise4
as a Python script
Open both the exported script and the template script (code/template4L6.py
) with IDLE
We first review what we have learned (cf Lecture 2)
Then we create a loop over UTM zones
1. Copy geo-processing commands in exported script
2. Paste them below try:
in template script
3. Select them all
4. Indent them by pressing the tab key
1. Copy local variables in exported script
2. Paste them above try:
in template script
3. Sort them by inputs, intermediate files, and outputs
temporary\\river_by_district.shp
, ...\\utm.shp
output\\river_length43.xls
4. Specify the working directory (see Lecture 3)
arcpy.env.workspace = "../"
5. Remove directory path from file names
utm_zones_shp = "C:\\...\\utm_zones.shp"
$\Downarrow$
utm_zones_shp = "temporary\\utm_zones.shp"
6. Replace duplicated variables
rivers_utm_shp___2_
$\rightarrow$ rivers_utm_shp
river43_projected_shp___2_
$\rightarrow$ river43_projected_shp
Replace the lengthy 6th argument with ""
Whenever using Spatial Join in Python, make sure the 6th argument is empty.
Put the following geo-processing steps into a loop
At each turn of loop, change:
1. UTM zone number for Select
"\"ZONE\" = 43"
2. UTM coordinate system for Project
3. Output file names
Create a loop over values from 43 to 47
To do so, add the following command line
for zone in range(43,48):
for zone in range(43,48):
This command line does the following:
1. Creates a variable zone
that contains number 43
2. Executes all the indented commands below
3. Then assigns next number (44) to zone
4. Executes all the indented commands below
5. Repeats this until number 47 (not 48!)
Same can be done by the while loop
zone = 43
while zone < 48:
commands
zone = zone + 1
or the list loop (to be used in Lecture 7)
zoneList = [43,44,45,46,47]
for zone in zoneList:
commands
1. Add:
for zone in range(43,48):
arcpy.Select_analysis...
2. Indent the four geo-processing commands
Now the code should look like this:
for zone in range(43,48):
# Process: Select
arcpy.Select_analysis(...)
# Process: Project
arcpy.Project_management(...)
# Process: Add Geometry Attributes
arcpy.AddGeometryAttributes_management(...)
# Process: Table To Excel
arcpy.TableToExcel_conversion(...)
Let's first understand the syntax for Select tool
arcpy.Select_analysis(river_utm_shp, river43_shp, "\"ZONE\"=43")
1st argument: input file name
arcpy.Select_analysis(river_utm_shp, river43_shp, "\"ZONE\"=43")
2nd argument: output file name
arcpy.Select_analysis(river_utm_shp, river43_shp, "\"ZONE\"=43")
3rd argument: selection rule
"\"ZONE\"=43"
Must be string, thus enclosed by " "
"\"ZONE\"=43"
Field name must be enclosed by \" \"
"\"ZONE\"=43"
We want to replace this number at each turn of the loop
To do so, write as follows:
"\"ZONE\"="+str(zone)
"\"ZONE\"="+str(zone)
Variable zone
contains number, not string
But selection rule expression has to be string
"\"ZONE\"="+str(zone)
So transform number into string by str( )
"\"ZONE\"="+str(zone)
To "concatenate" (i.e. connect) two strings, use +
"\"ZONE\"="+str(zone)
Don't forget to enclose a string with " "
"\"ZONE\"="+str(zone)
We have two strings
"\"ZONE\"="+str(zone)
We have two strings concatenated by +
"\"ZONE\"="+str(zone)
Number must be converted into string by str()
"\"ZONE\"="+str(zone)
Field name must be enclosed by \" \"
Replace the 3rd argument for the Select tool
"\"ZONE\"= 43"
with
"\"ZONE\"="+str(zone)
Let's first understand the syntax for Project tool
arcpy.Project_management(river43_shp, river43_projected_shp, "PROJCS[...
1st argument: input file name
arcpy.Project_management(river43_shp, river43_projected_shp, "PROJCS[...
2nd argument: output file name
arcpy.Project_management(river43_shp, river43_projected_shp, "PROJCS[...
3rd argument: coordinate system name
(too long to be shown in this slide)
Use factory code (unique ID for each coordinate system)
UTM projections for zones 43-47:
32643, 32644, 32645, 32646, 32647
Then use SpatialReference method
factorycode = 32643
cs = arcpy.SpatialReference(factorycode)
arcpy.Project_management(river43_shp, river43_projected_shp, cs)
Create a variable for factory code
factorycode = 32643
cs = arcpy.SpatialReference(factorycode)
arcpy.Project_management(river43_shp, river43_projected_shp, cs)
SpatialReference method takes factory code as input
factorycode = 32643
cs = arcpy.SpatialReference(factorycode)
arcpy.Project_management(river43_shp, river43_projected_shp, cs)
Then returns that long string name for coordinate system
factorycode = 32643
cs = arcpy.SpatialReference(factorycode)
arcpy.Project_management(river43_shp, river43_projected_shp, cs)
We assign variable cs
to coordinate system name
factorycode = 32643
cs = arcpy.SpatialReference(factorycode)
arcpy.Project_management(river43_shp, river43_projected_shp, cs)
Use it as 3rd argument for Project
for zone in range(43,48):
factorycode = 32600 + zone
cs = arcpy.SpatialReference(factorycode)
arcpy.Project_management(river43_shp, river43_projected_shp, cs)
Then use variable zone
to change factory code for each turn of the loop
str(zone)
Modify the script as follows (marked in yellow)
for zone in range(43,48):
arcpy.Select_analysis(river_utm_shp, river43_shp, "\"ZONE\" = " + str(zone))
factorycode = 32600 + zone
cs = arcpy.SpatialReference(factorycode)
arcpy.Project_management(river43_shp, river43_projected_shp, cs)
Select the names of files to be created within loop
Cut-and-paste them to beginning of loop
for zone in range(43,48):
river43_shp = "river43.shp"
river43_projected_shp = "river43_projected.shp"
river_length43_xls = "river_length43.xls"
Don't forget indenting them
Replace 43
in the file names with "+str(zone)+"
for zone in range(43,48):
river43_shp = "rivers"+str(zone)+".shp"
river43_projected_shp = "rivers_pj"+str(zone)+".shp"
river_length43_xls = "river_length"+str(zone)+".xls"
This way, each turn of the loop will create different files
To explain what each command line does, add:
print "..."
print
must be a string
To indicate each turn of the loop:
for zone in range(43,48):
print "Processing UTM zone "+str(zone)
Another example of string concatenation
Now run the script by pressing F5 in IDLE
Look at solutions4exercises/exercise4.py
Do you remember which geo-processing tools you used for each of these tasks?