(ArcGIS 10 for Economics Research)
Masayuki Kudamatsu
5 October, 2018
1. Why GIS for economics?
2. Satellite images and scanned old maps
3. GIS software
4. Polygon, polyline, point, and raster
5. Coordinate systems
Satellite images & old maps (this lecture)
Merge datasets by proximity (Lecture 2)
e.g., weather data with survey data
Estimate the spillover effect on the control group (Lecture 3)
Control for more covariates / fixed effects (Lectures 3, 5)
Instruments (Lectures 4, 6)
RD design (Lecture 7)
# of districts in a province $\uparrow$
$\Rightarrow$ Each district govt official engages in Cournot competition in selling (illegal) logging permits
$\Rightarrow$ Deforestation in the province $\uparrow$
Cannot rely on official stats of logging
$\Rightarrow$ Use satellite images
(Figure I of Burgess et al. 2012)
(Figure II of Burgess et al. 2012)
Henderson et al. (2012): correlate with real GDP growth
Pinkovskiy & Sala-i-Martin (2016): accuracy of GDP versus household surveys
Michalopoulos & Papaioannou (2013, 2014), Alesina et al. (2016): measure ethnicity-level development in Africa
Hodler & Raschky (2014): Presidents' home region brighter
Storeygard (2016): Impact of trade infrastructure in Africa
Henderson et al. (2018): Geographic correlates of light
Campante and Yanagizawa-Drott (2018): Impact of air links
For more satellite image examples, see a survey by Donaldson and Storeygard (2016).
Digitize Michelin maps for Kenya since 1961
Track road network expansion over time
See if the president's ethnic group gets more roads built than other groups
(source: Remi Jedwab's presentation slide)
Drawn by Murdock (1959)
Digitized by Nunn (2008), to match ethnicity-level data on slave trade with country-level data (Lecture 4)
Also used by Alsan (2015) (Lecture 2)
Other examples: Nunn & Wantchekon (2011), Michalopoulos & Papaioannou (2013, 2014, 2016), Alesina et al. (2016)
Figure II of Nunn (2008)
Satellite images: some are freely available but very costly (time & money) to process
Old maps: digitizing is also time-consuming but feasible with patience in two steps:
$\Rightarrow$ This course helps you use these datasets
ArcGIS
QGIS
$\Rightarrow$ For the ease of use of Python (for replication), we will learn ArcGIS
R
Purchasing ArcGIS
License fee (in case of Japan): 18,000 yen per year
Make sure the package you buy includes:
1. Show file extensions (.shp etc.) in File Explorer
2. Install 7-Zip for 64-bit Windows (needed to extract .tar files)
1. Launch ArcMap 10 (it takes time)
2. Download the zipped dataset for lecture 1
3. Save it to Desktop (C:\Users\yourname\Desktop)
4. Right-click it and choose 7-Zip > Extract to "Lecture1\"
This creates the folder C:\Users\yourname\Desktop\Lecture1
Browse the inside of the Lecture1 folder
I've created 4 folders:
code/: files to edit datasets (e.g. Python scripts)
input/: original data
output/: final data to be used for analysis
temporary/: other files created during the process
It's a standard directory structure to organize files for empirical analysis
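As a minimal sketch, the same four-folder layout could be created for a new project with a few lines of Python. The function name and the project root used here are hypothetical; only the four folder names come from the slide.

```python
from pathlib import Path

# Folder names taken from the slide; everything else is illustrative.
SUBFOLDERS = ["code", "input", "output", "temporary"]


def make_project_folders(root):
    """Create the four standard subfolders under root and return their paths."""
    root = Path(root)
    created = []
    for name in SUBFOLDERS:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(folder)
    return created


if __name__ == "__main__":
    import tempfile

    # Demonstrate in a throwaway directory so nothing is left on disk.
    with tempfile.TemporaryDirectory() as tmp:
        for folder in make_project_folders(Path(tmp) / "Lecture2"):
            print(folder.name)
```

Keeping input/ untouched and writing only to output/ and temporary/ makes it easier to rerun the whole analysis from the original data.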
Now browse the input/ folder
5. Right-click all the .zip files and select 7-Zip > Extract Here
10m-rivers-lake-centerlines.zip
gadm36.zip
glds90ag.zip
Leave F162008.v4.tar (nighttime light raster) for the time being.
Spatial data comes in two different formats: vector and raster
How to edit data differs a lot between them
We now learn how to browse spatial datasets in ArcGIS while learning these different formats of spatial data
Each spatial unit in vector data is called a feature
Three types of features: polygon, polyline, and point
A set of features of the same type: a feature class
File format: Shapefile (.shp)
Represent geographic zones
Data to be read: GADM (widely used by economists)
gadm36/gadm36_0.shp (countries)
gadm36/gadm36_2.shp (subnational districts)
Drag these files from the Catalog Window to the Data Frame
If you don't see Catalog Window...
$\Rightarrow$ Click Windows in the menu bar
If you don't see the data directory in Catalog Window...
$\Rightarrow$ Right-click Folder Connections and click Connect To Folder...
Represent networks / routes
Data to be read: Natural Earth's Rivers and Lake Centerlines data (10m_rivers_lake_centerlines.shp)
Browse this data (cf. Exercise 1)
Uncheck the subnational boundary data in Table of Contents
$\Rightarrow$ Now rivers are shown on national boundaries.
Change the color of rivers to blue.
If you read the river data first and then the national boundary data, the river data would be hidden below the national boundary data.
If you cannot drag the data in the Table of Contents window, check if the List By Drawing Order icon is selected on top left.
Represent point location
Can easily be created from XY data
Each row: point feature
Column 1: longitude (x value)
Column 2: latitude (y value)
Other columns: attributes of point feature
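To make the table layout concrete, here is a small Python sketch that writes XY data as comma-delimited text, with longitude first, latitude second, and one attribute column. The function name, the attribute column `name`, and the choice of 10 decimal digits are assumptions for illustration; the Osaka coordinates are the ones quoted in these slides.

```python
import csv
import io


def write_xy_csv(rows, handle):
    """Write (lon, lat, name) rows as comma-delimited text with 10 decimal digits."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["lon", "lat", "name"])  # column 1: x value, column 2: y value
    for lon, lat, name in rows:
        # Fixed-point formatting keeps the decimal digits from being truncated.
        writer.writerow([f"{lon:.10f}", f"{lat:.10f}", name])


# Write to an in-memory buffer here; pass an open file handle to write to disk.
buffer = io.StringIO()
write_xy_csv([(135.5022, 34.6937, "Osaka")], buffer)
print(buffer.getvalue())
```

Each row of the resulting file corresponds to one point feature once the table is converted in ArcGIS.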
1. GPS receivers
If you conduct your own survey
2. Online gazetteer
If location names are available, search at:
See Sundberg et al. (2010) for an example protocol
3. Geocoding tools
If postal address is available, use:
Data format: Comma-delimited text (csv) / Excel worksheet
To export data from Stata in this format, use export delimited (Stata help)
Code example for 10 decimal digits for longitude/latitude:
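* Display longitude/latitude with 10 decimal digits
format lon %15.10f
format lat %14.10f
* datafmt exports values in their display formats rather than as raw values
export delimited lon lat using filename.csv, datafmt replace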
To convert XY data into a point feature class, use the Make XY Event Layer and Copy Features tools
These are examples of geo-processing tools
Data to be read: CEPII GeoDist Database (geo_cepii.xls)
Browse the data in Excel
To implement geo-processing tools, we use Model Builder
XY Table: input/geo_cepii.xls/geo_cepii$
X Field: lon
Y Field: lat
Spatial Reference: WGS 1984
If you don't know what to fill in on the geo-processing tool window:
This tool creates a temporary layer
But the layer often doesn't work properly with other tools
We also want to save the point feature class on disk
$\Rightarrow$ Always use the Copy Features tool to convert the layer into a shapefile
Input Features: the output from the Make XY Event Layer
Output Feature Class: ...\Desktop\Lecture1\output\cities.shp
Ignore other options. Rarely used.
Also useful if you want to keep the original data intact
To save the model:
Pick a folder (e.g. code/)
Create a toolbox there (e.g. lec1.tbx)
Save the model in that toolbox (e.g. exercise3)
A model can only be saved inside the toolbox.
Save frequently; ArcGIS often crashes.
To edit an existing model:
Now run the Model by clicking the triangle icon at top right
Browse the output point feature shapefile
Capital cities (and other major cities) around the world should appear as point features.
One more thing about vector data...
The attribute table contains fields (i.e. variables), which can take a different value for each feature
To browse in ArcMap:
Spatial data comes in two different formats
How to edit data differs a lot between them
Divides the earth surface into many "square" cells (or pixels)
Each cell contains one value
Often created from satellite images
Examples:
Can create a new variable for vector data (Lectures 5, 6, 8)
Most file formats can be browsed in ArcMap
e.g. TIFF (.tif)
$\Rightarrow$ I recommend using TIFF format
For the ASCII format (.asc), you need to convert it into TIFF (Exercise #5 below)
Data to be read: DMSP-OLS nighttime light for year 2008
Extract F162008.v4.tar to obtain .gz and .tfw files.
Extract the .gz files to the same folder as the .tfw files
A .tfw file allows ArcMap to read a TIFF image of the same file name as geo-referenced (see Greenberg 2003 for detail)
Raster to browse: F162008.v4b_web.stable_lights.avg_vis.tif
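To illustrate how a .tfw file geo-references an image, the sketch below reads the six numbers of a world file and converts a pixel position into map coordinates. The function names are hypothetical, and the example values are made up for illustration; they are not the contents of the DMSP-OLS .tfw file.

```python
def parse_world_file(text):
    """Read the six numbers of a world file in their standard order: A, D, B, E, C, F."""
    a, d, b, e, c, f = (float(line) for line in text.split())
    return a, d, b, e, c, f


def pixel_to_map(col, row, params):
    """Map coordinates of the center of pixel (col, row), counted from the upper left."""
    a, d, b, e, c, f = params
    x = a * col + b * row + c  # A: pixel width, B: rotation term, C: x of upper-left pixel
    y = d * col + e * row + f  # D: rotation term, E: negative pixel height, F: y of upper-left pixel
    return x, y


# Hypothetical world file for a 30 arc-second grid (illustrative values only).
example = """0.0083333333
0.0
0.0
-0.0083333333
-180.0
75.0
"""

params = parse_world_file(example)
print(pixel_to_map(0, 0, params))
print(pixel_to_map(120, 120, params))
```

The fourth number is negative because image rows are counted downward from the top while latitude increases upward.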
Data to be read: population density in 1990 (input/glds90ag.asc)
Use the ASCII to Raster tool
Output raster: output/popden90.tif
Add ".tif" to the output file name to save in the TIFF format
When you read the converted population density raster, you'll get a pop-up alert message: "Unknown Spatial Reference"
We'll come back to this issue shortly.
Stata can read raster in the ASCII format, with ras2dta ado (Muller 2005)
Each cell becomes one row in Stata
To export raster as the ASCII format, use the Raster to ASCII tool in ArcGIS
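The idea that each cell becomes one row can be sketched in Python as follows. This is not the ras2dta code. It is a minimal reader for the Esri ASCII grid layout, assuming the header uses xllcorner/yllcorner (lower-left corner) rather than cell centers; the function name and the tiny sample grid are hypothetical.

```python
def ascii_grid_to_rows(text):
    """Turn an Esri ASCII grid into (x, y, value) tuples, one per cell with data."""
    tokens = text.split()
    header = {}
    # The header is a run of key/value pairs; keys start with a letter.
    while tokens and tokens[0][0].isalpha():
        header[tokens[0].lower()] = float(tokens[1])
        tokens = tokens[2:]
    ncols, nrows = int(header["ncols"]), int(header["nrows"])
    size = header["cellsize"]
    nodata = header.get("nodata_value")
    rows = []
    for i, token in enumerate(tokens):
        value = float(token)
        if nodata is not None and value == nodata:
            continue  # skip cells without data
        r, c = divmod(i, ncols)  # values are listed row by row, starting from the top
        x = header["xllcorner"] + (c + 0.5) * size          # x of the cell center
        y = header["yllcorner"] + (nrows - r - 0.5) * size  # y of the cell center
        rows.append((x, y, value))
    return rows


# Hypothetical 2 x 2 grid with one missing cell.
sample = """ncols 2
nrows 2
xllcorner 0
yllcorner 0
cellsize 1
NODATA_value -9999
1 2
-9999 4
"""

for row in ascii_grid_to_rows(sample):
    print(row)
```

A global raster at fine resolution yields a very large number of rows, so dropping the no-data cells early keeps the resulting table manageable.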
NetCDF files: Widely used for time-series raster data
To browse NetCDF data at each point in time, use:
Earth is a sphere (approximately)
Various ways to two-dimensionally represent the earth surface
Each way corresponds to a coordinate system
aka. spatial reference / map projection
To calculate distance and surface area properly
To merge different spatial datasets accurately
cf. Apple Maps did this wrong when it was launched in 2012
See Eubank (2018) for a more exact explanation of the difference
Each location is coded by degrees
e.g. Osaka: 34.6937° North, 135.5022° East
Not suitable for calculating surface area
But useful for calculating distance between two locations
WGS 1984: most popular
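As a rough illustration of computing distance directly from degrees, the sketch below applies the haversine (great-circle) formula. It treats the earth as a perfect sphere with a mean radius of 6371 km, which is an assumption; ArcGIS tools may compute distances differently. The Osaka coordinates come from the slide, while the Tokyo coordinates are approximate values added for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; assumes a perfectly spherical earth


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


osaka = (34.6937, 135.5022)  # latitude, longitude from the slide
tokyo = (35.6762, 139.6503)  # approximate, assumed for illustration

print(round(great_circle_km(*osaka, *tokyo)), "km")
```

Note that one degree of longitude covers a shorter distance at higher latitudes, which is why degrees cannot be treated as a constant unit of length or area.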
Earth surface is projected by the "light" from the center of the earth onto a cylinder:
Earth surface is projected by the "light" from the center of the earth onto a cone:
Earth surface is projected by the "light" from the center of the earth onto a plane:
Each location: coded in meters from a certain origin
WGS 1984
UTM
Any equal area projections
Differ just in how the world is shown
Map Projections: A Working Manual, by John P. Snyder (U.S. Geological Survey, 1987) (Downloadable for free)
Project: for vector data
Project Raster: for raster data (Lecture 7)
Define Projection: if undefined (for both vector & raster)
Don't use Project (Coverage) or Define Projection (Coverage)
Spatial data usually comes with metadata that specifies the coordinate system used when the data was created
GPW's metadata says it's WGS 1984
Use Define Projection to assign WGS 1984 (cf. Exercise #3)
Note that this geo-processing tool overwrites the input file.
Suppose we are interested in calculating the surface area of districts around the world
Sinusoidal projection allows you to calculate surface area properly
Use the Project tool to change the coordinate system from WGS 1984
Sinusoidal projection is found at:
Projected Coordinate Systems > World > Sinusoidal(world)
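To see why the sinusoidal projection preserves surface area, the following sketch projects a longitude/latitude cell onto the plane and compares its planar area with the exact area on a sphere. It assumes a spherical earth of radius 6371 km and a central meridian at 0 degrees, so the numbers may differ from what ArcGIS reports; all function names are hypothetical.

```python
import math

R_KM = 6371.0  # spherical earth radius, an assumption made for this sketch


def sinusoidal(lon_deg, lat_deg):
    """Forward sinusoidal projection on a sphere, central meridian at 0 degrees."""
    lam, phi = math.radians(lon_deg), math.radians(lat_deg)
    return R_KM * lam * math.cos(phi), R_KM * phi


def projected_cell_area(lon_w, lon_e, lat_s, lat_n, strips=1000):
    """Area of a lon/lat cell measured on the projected plane, summed over thin strips."""
    step = (lat_n - lat_s) / strips
    total = 0.0
    for i in range(strips):
        lat_mid = lat_s + (i + 0.5) * step
        x_w, _ = sinusoidal(lon_w, lat_mid)
        x_e, _ = sinusoidal(lon_e, lat_mid)
        _, y_lo = sinusoidal(0.0, lat_mid - step / 2)
        _, y_hi = sinusoidal(0.0, lat_mid + step / 2)
        total += (x_e - x_w) * (y_hi - y_lo)  # strip width times strip height
    return total


def sphere_cell_area(lon_w, lon_e, lat_s, lat_n):
    """Exact area of the same cell on the sphere."""
    return R_KM ** 2 * math.radians(lon_e - lon_w) * (
        math.sin(math.radians(lat_n)) - math.sin(math.radians(lat_s))
    )


# A one-degree cell at 60 degrees north: the two areas should nearly coincide.
print(projected_cell_area(0, 1, 60, 61), sphere_cell_area(0, 1, 60, 61))
```

The horizontal scaling by the cosine of latitude is what shrinks the map toward the poles by exactly the amount needed to keep areas correct, at the cost of distorting shapes.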
Browse the output. Does it look like this?
You cannot overlay data with different coordinate systems
ArcMap displays data in the coordinate system of the data read first
To browse data with a different coordinate system, open a new map document
Saves the way you overlay / color-code / symbolize different spatial datasets
File extension: .mxd
DOES NOT contain spatial data. It just has links to them
Set the relative path to refer to each dataset
Save it in the code/ folder
Now you should see something like this:
See code/solutions4exercises.tbx
Browse the spatial data in Windows File Explorer
You'll see many, many files for one dataset
.shp + .shx + .dbf + .prj + ...
.tif + .tfw + .ovr + .aux + .prj
.dbf | Attribute table
.prj | Projection file
$\Rightarrow$ Use the Catalog Window in ArcMap, not File Explorer, to move / copy / delete spatial data
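To see at a glance which component files belong to which dataset, a short Python sketch can group file names by their stem. It only lists names and does not move or delete anything. The function name and the folder listing are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePath


def group_by_dataset(filenames):
    """Group file names by stem so each dataset's component files are listed together."""
    groups = defaultdict(list)
    for name in filenames:
        path = PurePath(name)
        groups[path.stem].append(path.suffix)
    return {stem: sorted(exts) for stem, exts in groups.items()}


# Hypothetical folder listing, for illustration only.
listing = [
    "cities.shp", "cities.shx", "cities.dbf", "cities.prj",
    "popden90.tif", "popden90.tfw",
]

for stem, extensions in group_by_dataset(listing).items():
    print(stem, extensions)
```

If any one of these component files is moved or renamed on its own, the dataset may no longer open correctly, which is why the Catalog Window is the safer way to manage them.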
Do you remember which geo-processing tools you used for each of these tasks?
Yale University Library: GIS Workshop Archive
Keep an eye on the publication of papers using spatial data