(ArcGIS 10 for Economics Research)
Masayuki Kudamatsu
7 December, 2018
Press SPACE to proceed.
To go back to the previous slide, press SHIFT+SPACE.
Cell-by-cell calculation across multiple raster datasets
1. Arithmetic operations
raster*2
)
ras1 + ras2
)
2. Functions
1. Mayshar, Moav, Neeman, & Pascali (2015)
2. Arithmetics in Map Algebra
3. Cell Statistics
4. Focal Statistics
5. Other geo-processing tools used in economics
This study was featured in Washington Post
What caused the formation of the state?
Important?
Original?
Feasible?
Potatoes, cassava, yam, taro, bananas...
Perishable upon harvest
Harvesting is non-seasonal, not visible (underground)
$\Rightarrow$ Costly to steal / confiscate
Wheat, rice, maize...
Storable
Harvest within a short period of time, clearly visible
$\Rightarrow$ Cheap to steal / confiscate
Building a state incurs a fixed cost
With roots & tubers, no incentive to build a state
Historical examples
cf. Coltan vs gold in DR Congo (Sanchez de la Sierra (2017))
State formation
(Figure 4 of Mayshar, Moav, Neeman, & Pascali (2015))
Relative productivity of cereals vs. tuber/roots
Resolution: 5 x 5 arc-minutes (about 10 X 10 km)
Data: potential yields by crop, based on climate & soil
$\Rightarrow$ Exogeneous to human activities
$\Rightarrow$ Widely used by economists
(Figure 7 of Mayshar, Moav, Neeman, & Pascali (2015))
Sample: 952 pre-colonial societies
\begin{align*} y_{ic} = \beta CAL\_DIF_i + \gamma MAX\_CAL_{i} + \mu_c + \varepsilon_{i} \end{align*}
$y_{ic}$ | # jurisdictional levels beyond villages |
$CAL\_DIF_i$ | Cereal's relative caloric yield to tuber |
$MAX\_CAL_i$ | Maximum caloric yield among all crops |
$\mu_c$ | Continent fixed effects |
We will learn how to calculate $CAL\_DIF_i$ and $MAX\_CAL_i$ from GAEZ data
(Table 3 of Mayshar, Moav, Neeman, & Pascali (2015))
Until 1500:
Cereals | Tubers | |
New World | Maize only | All but yams |
Old World | All but maize | Yams only |
$\Rightarrow$ Around 1500, cereals' relative productivity to tubers changed exogenously!
Sample: 159 countries from 1000 to 2000 (every 50 years)
\begin{align*} y_{ct} = \beta CAL\_DIF_{it} + \gamma MAX\_CAL_{it} + \mu_c + \eta_t*x_c + \varepsilon_{i} \end{align*}
$y_{ct}$ | Government above tribal level |
$\mu_c$ | Country fixed effect |
$\eta_t*x_c$ | Year fixed effect * country characteristics |
(Table 6 of Mayshar, Moav, Neeman, & Pascali (2015))
1. Launch ArcMap 10 (it takes time)
2. Download the zipped dataset for Lecture 8
3. Save it to Desktop (C:\\Users\\yourname\\Desktop
)
4. Right-click it and choose 7-Zip > Extract to "Lecture8\"
C:\\Users\\yourname\\Desktop\\Lecture8
Now in ArcMap's Catalogue Window:
5. Establish connection to data folder
We directly write a Python script to use Map Algebra
Caloric yields by crop at each 5-minute cell across the world
$\Rightarrow$ Multiply it w/ crop-specific conversion factor
(Table A.1 of Online Appendix for Galor & Ozak 2016)
So we need to edit each 5-minute cell of GAEZ
$\Rightarrow$ Use Map Algebra
To use Map Algebra, start the script by:
import arcpy
from arcpy.sa import *
arcpy.CheckOutExtension("spatial")
.../Lecture8/code/template4L8ex*.py
To use Map Algebra, start the script by:
import arcpy
from arcpy.sa import *
arcpy.CheckOutExtension("spatial")
Without this line, every method for Map Algebra has to have the prefix arcpy.sa.
To use Map Algebra, start the script by:
import arcpy
from arcpy.sa import *
arcpy.CheckOutExtension("spatial")
Map Algebra is part of Spatial Analyst extension.
Without this, you cannot use it
To use Map Algebra, we need to understand the concept of Raster Object
To use arithmetic operators, first convert each raster data into a Raster object by Raster method:
maize_yield = Raster("maize.tif")
cf. If the script didn't start with
from arcpy.sa import *
we have to write:
maize_yield = arcpy.sa.Raster("maize.tif")
Raster object can be used just like X in high-school algebra
maize_yield = Raster("maize.tif")
maize_calorie = maize_yield * 36.5
(taken from ArcGIS Help)
Output from calculation is also a Raster object
maize_yield = Raster("maize.tif")
maize_calorie = maize_yield * 36.5
It's not saved in the disk. Just in memory.
To save a raster object in the disk, use the save method
maize_yield = Raster("maize.tif")
maize_calorie = maize_yield * 36.5
maize_calorie.save("maize_calorie.tif")
To save a raster object in the disk, use the save method
maize_yield = Raster("maize.tif")
maize_calorie = maize_yield * 36.5
maize_calorie.save("maize_calorie.tif")
Add .save()
to raster object name
To save a raster object in the disk, use the save method
maize_yield = Raster("maize.tif")
maize_calorie = maize_yield * 36.5
maize_calorie.save("maize_calorie.tif")
Within ( ), specify the output raster file name
Various mathematical functions available
To use functions, the input raster doesn't have to be a Raster object
Output of a function is a Raster object, though
$\Rightarrow$ To save the output, use save method
For some reason, the following way of specifying relative file paths (cf. Lecture 3) does not work with Map Algebra
import os
work_dir = os.path.dirname(os.path.realpath(__file__))
os.chdir(work_dir)
arcpy.env.workspace = "../"
input_raster = "input/GAEZ/maize.tif"
output_raster = "output/maize_calorie.tif"
The problem seems to be arcpy.env.workspace
So we directly specify the file paths:
import os
work_dir = os.path.dirname(os.path.realpath(__file__))
os.chdir(work_dir)
input_raster = "../input/GAEZ/maize.tif"
output_raster = "../output/maize_calorie.tif"
We will later use arcpy.env.workspace
for another purpose
Start with code/template4L8ex1.py
Inputs
../input/GAEZ/maize.tif
Output
../output/maize_calorie.tif
print "Inputs being set..."
input_raster = "../input/GAEZ/maize.tif"
conversion_factor = 36.5
print "Outputs being set..."
output_raster = "../output/calorie_maize.tif"
try:
maize_yield = Raster(input_raster)
maize_calorie = maize_yield * conversion_factor
maize_calorie.save(output_raster)
except arcpy.ExecuteError:
...
How do we repeat this for other 14 crops?
Naive approach by 1st-year PhD student:
We should avoid copy-and-paste
Use a loop over 15 crops
If tempted to copy-and-paste during programming
$\Rightarrow$ Youl should better use a loop
One example of what's known as defensive programming
"Computer scientists assume programmers will make mistakes, and instead of chiding people to “just be careful,” they have developed a battery of practices to address the problem." (Nick Eubank)
(Click the link to learn more)
We need a loop over crop names
Each crop has a different conversion factor
$\Rightarrow$ Link crop name + conversion factor in each turn of loop
$\Rightarrow$ Use dictionary in Python
Another of Python data types to be assigned to a variable
It's a set of pairs
Just like a Japanese-English dictionary
dict = {'watashi':'I', 'anata':'you', 'kare':'he'}
Enclose the whole by { }
dict = {'watashi':'I', 'anata':'you', 'kare':'he'}
Each pair is separated by , (comma)
dict = {'watashi':'I', 'anata':'you', 'kare':'he'}
Each element within a pair is separated by : (colon)
dict = {'watashi':'I', 'anata':'you', 'kare':'he'}
can be replaced with any other variable name of your choice
dict = {'watashi':'I', 'anata':'you', 'kare':'he'}
1st element in pair: called key
dict = {'watashi':'I', 'anata':'you', 'kare':'he'}
2nd element in pair: called value
dict = {'watashi':'I', 'anata':'you', 'kare':'he'}
Once this dictionary called dict
is defined:
dict['watashi']
|
returns | 'I'
|
dict['anata']
|
returns |
'you'
|
dict['kare']
|
returns |
'he'
|
dict = {}
dict['watashi'] = 'I'
dict['anata'] = 'you'
dict['kare'] = 'he'
Create an empty dictonary
dict = {}
dict['watashi'] = 'I'
dict['anata'] = 'you'
dict['kare'] = 'he'
Assign key 'watashi' to value 'I'
dict = {}
dict['watashi'] = 'I'
dict['anata'] = 'you'
dict['kare'] = 'he'
More readable if there are many key-value pairs
dict = {}
dict['watashi'] = 'I'
dict['anata'] = 'you'
dict['kare'] = 'he'
dict['nihongo'] = 'Japanese'
dict['eigo'] = 'English'
Also you can add more key-value pairs later
key_list = dict.keys()
$\Rightarrow$ Can create a loop over keys by
for key in key_list:
Start with code/template4L8ex2.py
1. Create a dictionary of calorie conversion factors
2. Loop over 15 crops
calorie = {}
calorie['barley'] = 35.2
calorie['buckwheat'] = 34.3
calorie['maize'] = 36.5
calorie['millet_pearl'] = 37.8
calorie['millet_foxtail'] = 37.8
calorie['oat'] = 24.6
calorie['rice_indica'] = 37
calorie['rice_wetland'] = 37
calorie['rye'] = 33.8
calorie['sorghum'] = 33.9
calorie['wheat'] = 34.2
calorie['cassava'] = 16
calorie['sweetpotato'] = 8.6
calorie['whitepotato'] = 7.7
calorie['yams'] = 11.8
Create a Python list of crop names as crops
:
crops = calorie.keys()
for crop in crops:
Set a loop over crop names:
crops = calorie.keys()
for crop in crops:
Specify the directory in which the input raster is saved
crops = calorie.keys()
for crop in crops:
arcpy.env.workspace = "../input/GAEZ"
crop_yield = Raster(crop+".tif")
Create a Raster object
crops = calorie.keys()
for crop in crops:
arcpy.env.workspace = "../input/GAEZ"
crop_yield = Raster(crop+".tif")
Retrieve conversion factor as dictionary value for key crop
crops = calorie.keys()
for crop in crops:
arcpy.env.workspace = "../input/GAEZ"
crop_yield = Raster(crop+".tif")
calorie_yield = crop_yield * calorie[crop]
Convert yield into calorie by Map Algebra
crops = calorie.keys()
for crop in crops:
arcpy.env.workspace = "../input/GAEZ"
crop_yield = Raster(crop+".tif")
calorie_yield = crop_yield * calorie[crop]
crops = calorie.keys()
for crop in crops:
arcpy.env.workspace = "../input/GAEZ"
crop_yield = Raster(crop+".tif")
calorie_yield = crop_yield * calorie[crop]
arcpy.env.workspace = "../output"
calorie_yield.save("calorie_"+crop+".tif")
Use the same name as the script that creates the output
If a script creates multiple outputs
$\Rightarrow$ use the script's name as prefix
calorie.py
$\rightarrow$ calorie_maize.tif
, calorie_cassava.tif
, etc.
So save the script as calorie.py
If your script crashes, close Python Shell from the previous execution
Maximum caloric yield among cereal crops
Maximum caloric yield among tubers
Difference of these two (main regressor)
Maximum caloric yield among all crops (control)
$\Rightarrow$ Cell Statistics tool can handle these
Calculate summary statistics in each raster cell
Inputs: multiple raster datasets
$\Rightarrow$ Use a Python list to specify inputs
1st argument: input raster list
If input_rasters
is a Python list of calorie rasters, we can write:
max_calorie = CellStatistics(input_rasters,"MAXIMUM","DATA")
2nd argument: statistics to be calculated
Don't forget to enclose it with " "
max_calorie = CellStatistics(input_rasters,"MAXIMUM","DATA")
See ArcGIS Help for other statistics
max_calorie = CellStatistics(input_rasters,"MAXIMUM","DATA")
3rd argument: ignore missing values if "DATA"
If "NODATA", output cell value is missing wherever at least one input raster data has a missing value
Now how to specify the list of input rasters?
input_rasters = ???
max_calorie = CellStatistics(input_rasters,"MAXIMUM","DATA")
Directly writing file names: likely to make a mistake
When file names are changed, this script will crash
ListRasters method helps
arcpy.env.workspace = "../output"
input_rasters = arcpy.ListRasters("calorie_*", "TIF")
max_calorie = CellStatistics(input_rasters,"MAXIMUM","DATA")
Collect all TIFF files with "calorie_" as prefix
ListRasters method helps
arcpy.env.workspace = "../output"
input_rasters = arcpy.ListRasters("calorie_*", "TIF")
max_calorie = CellStatistics(input_rasters,"MAXIMUM","DATA")
Raster data folder is specified by arcpy.env.workspace
For other file types
Fields (i.e. columns in attribute table)
arcpy.env.workspace = "../output"
input_rasters = arcpy.ListRasters("calorie_*", "TIF")
max_calorie = CellStatistics(input_rasters,"MAXIMUM","DATA")
This code is fine if we need maximum among all crops
To obtain maximum for cereals and for tubers separately...
By renaming all cereal (tuber) raster files as "calorie_cereal_" ("calorie_tuber_"):
arcpy.env.workspace = "../output"
input_cereals = arcpy.ListRasters("calorie_cereal_*", "TIF")
max_cereal = CellStatistics(input_cereals, "MAXIMUM", "DATA")
input_tubers = arcpy.ListRasters("calorie_tuber_*", "TIF")
max_tuber = CellStatistics(input_tubers, "MAXIMUM", "DATA")
Maximum will be calculated for each type of crops
$\Rightarrow$ Revise calorie.py
as in next slide:
Use if/else statement to branch by crop type
for crop in crops:
arcpy.env.workspace = "../input/GAEZ"
crop_yield = Raster(crop+".tif")
calorie_yield = crop_yield * calorie[crop]
arcpy.env.workspace = "../output"
if crop == "cassava" or crop == "sweetpotato" or crop == "whitepotato" or crop == "yams":
calorie_yield.save("calorie_tuber_"+crop+".tif")
else:
calorie_yield.save("calorie_cereal_"+crop+".tif")
Start with code/template4L8ex3.py
1. Create the list of caloric yield rasters for cereal and tuber
2. Calculate caloric difference between cereal and tuber
3. Obtain maximum caloric yield for all crops
4. Save the outputs
Use simple arithmetic expression
max_cereal = CellStatistics(input_cereals, "MAXIMUM", "DATA")
max_tuber = CellStatistics(input_tubers, "MAXIMUM", "DATA")
caloric_diff = max_cereal - max_tuber
Use Cell Statistics again
max_cereal = CellStatistics(input_cereals, "MAXIMUM", "DATA")
max_tuber = CellStatistics(input_tubers, "MAXIMUM", "DATA")
caloric_diff = max_cereal - max_tuber
max_calorie = CellStatistics([max_cereal, max_tuber], "MAXIMUM", "DATA")
Use the save method
max_cereal = CellStatistics(input_cereals, "MAXIMUM", "DATA")
max_tuber = CellStatistics(input_tubers, "MAXIMUM", "DATA")
caloric_diff = max_cereal - max_tuber
max_calorie = CellStatistics([max_cereal, max_tuber], "MAXIMUM", "DATA")
arcpy.env.workspace = "../output"
max_cereal.save("max_calorie_cereal.tif")
max_tuber.save("max_calorie_tuber.tif")
caloric_diff.save("max_calorie_diff.tif")
max_calorie.save("max_calorie_all.tif")
And save the script as max_calorie.py
Look at solutions4exercises/
folder
Obtain average caloric difference within 20-mile radius of each society
INPUT: input/ethnographic_atlas.shp
Which geo-processing tools should we use?
Solution: solutions4exercises/tribes20m.py
1. Create indicator raster for New World
2. Use the Con tool to assign 0 calorie to relevant cells
Con(americas=1, calorie_maize, 0)
returns:
Calculates summary statistics in neighbouring raster cells
e.g. Sum of immediate neighbors
You can define the neighborhood very flexibly (more detail)
Rectangle
You can define the neighborhood very flexibly (more detail)
Circle
You can define the neighborhood very flexibly (more detail)
Annulus (ring)
You can define the neighborhood very flexibly (more detail)
Wedge
You can define the neighborhood very flexibly (more detail)
Irregular
You can define the neighborhood very flexibly (more detail)
Weight
Estimate terrain ruggedness's impact on income per capita
Negative outside Africa
Positive in Africa
Originally proposed by Riley et al. (1999)
Defined as: \begin{align*} TRI_{xy} =& \sqrt{\sum_{i=x-1}^{x+1} \sum_{j=y-1}^{y+1} (e_{ij}-e_{xy})^2} \end{align*}
$e_{xy}$ | Elevation at longitude $x$ latitude $y$ |
Can be obtained by Focal Statistics + Map Algebra
Raster cell (30x30 arc-sec) level data is downloadable from Diego Puga's website
Expand the expression inside the square root:
\begin{align*} TRI_{xy} =& \sqrt{\sum_i \sum_j (e_{ij})^2 - 2e_{xy}\sum_i\sum_j e_{ij} + 9 (e_{xy})^2} \end{align*}
Map Algebra calculates $(e_{ij})^2$:
elev = Raster("srtm30.tif")
elev_sq = elev**2
Expand the expression inside the square root:
\begin{align*} TRI_{xy} =& \sqrt{\sum_i \sum_j (e_{ij})^2 - 2e_{xy}\sum_i\sum_j e_{ij} + 9 (e_{xy})^2} \end{align*}
Focal Statistics calculates $\sum_i \sum_j (e_{ij})^2$ and $\sum_i\sum_j e_{ij}$:
sum_elev_sq = FocalStatistics(elev_sq, "", "SUM", "")
sum_elev = FocalStatistics(elev, "", "SUM", "")
Expand the expression inside the square root:
\begin{align*} TRI_{xy} =& \sqrt{\sum_i \sum_j (e_{ij})^2 - 2e_{xy}\sum_i\sum_j e_{ij} + 9 (e_{xy})^2} \end{align*}
Map Algebra sums them up and takes square root:
TRI_square = sum_elev_sq - 2*elev*sum_elev + 9*elev_sq
TRI = SquareRoot(TRI_square)
TRI.save("ruggedness.tif")
How city shapes affect population, wages, and housing rents
How can we measure city shapes?
Use nighttime light image
Luminosity over 40 is defined as urban area
$\Rightarrow$ Reclassify tool (cf. Lec 6)
Then use the Region Group tool, to assign unique ID to each urban area
For shape metrics, use Spatial Statistics Toolbox
Estimate the externality of military efforts on allied partners' effort in DR Congo civil wars
Use rainfall shock as instruments
But how do we measure the location of each military actor?
Minimum Bounding Geometry tool with Convex Hull option
Divide surface by nearest point (aka Voronoi partition)
Alesina et al. (2016) use it as a robustness check to Murdock's ethnic homeland boundaries
Alsan (2015) (cf. Lec 2) uses it as an alternative to the 20-miles radius of Ethnographic Atlas society location
Clip (Analysis): clip vector by polygon
Extrast By Mask: clip raster by polygon
Erase: open holes in polygons by polygons