Watch the Borders (GIS pro tip)

December 08, 2016

Speaking of the countryside, Evgeny Finkel, Dmitrii Kofanov (UW grad student), and I are writing a short paper on peasant unrest in 1917 for a special issue of Slavic Review on the 100-year anniversary of the Russian Revolution. There are limits to what you can do in a 3000-word essay for an interdisciplinary audience—data description, a couple of maps (the figure here illustrates disturbances, March–October 1917, using one Soviet-era data source), and a few statistical comparisons. We’re especially interested in the impact of soil fertility and the legacy of serfdom on unrest in 1917. But there is a lot of unexplained variation, and what we don’t observe is likely spatially correlated—what drives unrest in Tambov, or reports thereof, may be similar to that in Penza. So, spatial regression.

There’s more than one way to run a spatial model. A key choice is the type of weighting matrix to use. One such matrix is a “contiguity” matrix, which assumes similarity among immediate neighbors only: Tambov and Penza but not Tambov and Simbirsk. An easy way to generate such a matrix is to run a GIS shapefile through the appropriate routine in Stata or R. And in our case, the resulting matrix was wrong: for many regions, some but not all neighbors showed up in the matrix. Look at the map: Archangel has two neighbors, only one of which is recognized by Stata.
My best guess is that there is an infinitesimally small “demilitarized zone” that runs between many of the regions in our data. You can’t see it on the map, but it’s messing with the identification of neighbors.
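To make the guess concrete, here is a toy sketch in Python. It is not what Stata or R does internally, and the regions are hypothetical rectangles rather than real province borders. It builds a border-sharing ("rook") contiguity matrix twice: once requiring borders to coincide exactly, and once allowing a small tolerance (the `tol` parameter and its value of 1e-6 are assumptions made for illustration). Region C sits a hair's breadth away from region A, the way a sliver gap in a shapefile might.

```python
from itertools import combinations

# Hypothetical toy regions as axis-aligned boxes (xmin, ymin, xmax, ymax).
REGIONS = {
    "A": (0.0, 0.0, 1.0, 1.0),
    "B": (1.0, 0.0, 2.0, 1.0),         # shares A's right edge exactly
    "C": (0.0, 1.0 + 1e-9, 1.0, 2.0),  # sits 1e-9 units above A's top edge
    "D": (5.0, 5.0, 6.0, 6.0),         # far away from everything
}


def touches(a, b, tol=0.0):
    """True if boxes a and b share a border segment, allowing a gap up to tol."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    # Signed gap along each axis: negative means the boxes overlap on that axis.
    gap_x = max(ax0, bx0) - min(ax1, bx1)
    gap_y = max(ay0, by0) - min(ay1, by1)
    # Rook contiguity: edges (nearly) meet on one axis and overlap on the other.
    side_by_side = abs(gap_x) <= tol and gap_y < -tol
    stacked = abs(gap_y) <= tol and gap_x < -tol
    return side_by_side or stacked


def contiguity(regions, tol=0.0):
    """Symmetric 0/1 contiguity matrix stored as a dict of dicts."""
    names = sorted(regions)
    w = {i: {j: 0 for j in names} for i in names}
    for i, j in combinations(names, 2):
        if touches(regions[i], regions[j], tol):
            w[i][j] = w[j][i] = 1
    return w


def neighbors(w, name):
    """Sorted list of the regions flagged as neighbors of `name`."""
    return sorted(j for j, flag in w[name].items() if flag)


strict = contiguity(REGIONS)              # borders must coincide exactly
tolerant = contiguity(REGIONS, tol=1e-6)  # forgive sliver-sized gaps

print(neighbors(strict, "A"))    # ['B']
print(neighbors(tolerant, "A"))  # ['B', 'C']
```

With exact matching, A loses a neighbor it plainly has on the map, which is the pattern we saw with Archangel. A small tolerance restores the link, but how large it should be depends on the map's units and on how sloppy the borders are, so the result still deserves a check by eye.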

Anybody looking for a class replication project? Take a paper with spatial regressions and reproduce the spatial-weighting matrices by hand. And let me know if you find anything. My hunch is that this is a common problem.