Spatial Autocorrelation Analysis
Overview
- Spatial data carry not only their own attribute information but also information about geographical space. Many conventional linear models have been applied to such data, but because they do not account for the 'space' factor, they failed to produce meaningful results.
- According to Doreian (1981), "when spatially-referenced data are analyzed with traditional linear methods, which assume that variables are random and that error terms are independent and identically distributed, the spatial dependence and spatial interaction that characterize many socioeconomic, demographic, and natural phenomena in space cannot be controlled for."
- As Tobler's first law of geography puts it, "Everything is related to everything else, but near things are more related than distant things." Objects in space are not located at random; they exist while influencing one another.
- This mutual dependence and interaction among spatial objects in geographical space is called spatial autocorrelation.
- Anselin and Bera (1998) define it as the phenomenon in which "the values of spatial objects become more similar as their locations become more similar."
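- Spatial autocorrelation is either positive or negative. Under positive spatial autocorrelation, neighboring spatial units take similar values, so high values cluster with high values and low values with low values. Under negative spatial autocorrelation, neighboring spatial units take dissimilar values, which produces a dispersed, checkerboard-like pattern rather than clusters.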
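The contrast between positive and negative spatial autocorrelation can be sketched on a toy example. The snippet below is a minimal illustration, not part of the original notebook: it assumes a 4 x 4 grid with rook contiguity (cells sharing an edge are neighbors) and row-standardized weights, and the helper names `rook_neighbors` and `morans_i` are hypothetical. It uses only the standard library so that it runs without the shapefile or PySAL.

```python
def rook_neighbors(nrows, ncols):
    """Map each cell id (row-major) to the ids of the cells sharing an edge with it."""
    neighbors = {}
    for r in range(nrows):
        for c in range(ncols):
            cell = r * ncols + c
            adjacent = []
            if r > 0:
                adjacent.append(cell - ncols)   # cell above
            if r < nrows - 1:
                adjacent.append(cell + ncols)   # cell below
            if c > 0:
                adjacent.append(cell - 1)       # cell to the left
            if c < ncols - 1:
                adjacent.append(cell + 1)       # cell to the right
            neighbors[cell] = adjacent
    return neighbors


def morans_i(values, neighbors):
    """Global Moran's I with row-standardized weights (assumes no unit is an island)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    # Cross-product of each deviation with the average deviation of its neighbors
    cross = sum(z[i] * sum(z[j] for j in nbrs) / len(nbrs)
                for i, nbrs in neighbors.items())
    return cross / sum(d * d for d in z)


grid = rook_neighbors(4, 4)
clustered = [1, 1, 0, 0] * 4   # left half high, right half low
checkerboard = [(r + c) % 2 for r in range(4) for c in range(4)]

i_clustered = morans_i(clustered, grid)
i_checkerboard = morans_i(checkerboard, grid)
print(round(i_clustered, 3), round(i_checkerboard, 3))  # 0.708 -1.0
```

The clustered layout yields a clearly positive value, while the checkerboard, where every neighbor differs, reaches -1.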
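Analysis Data
- Applicable to any data aggregated at the administrative dong (행정동) level
- e.g., traffic, floating population, or crime data by administrative dong
- Because the current MBD data are incomplete, this analysis notebook was written against arbitrary data per administrative dong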
import warnings

import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pysal.lib as lps
from pysal.explore import esda

%matplotlib inline
warnings.filterwarnings("ignore")

# Boundary shapefile; the attribute table is read with the EUC-KR encoding
shp_link = "AL_00_D001_20200106(EMD).shp"
df = gpd.read_file(shp_link, encoding="euckr")
df.head()
| | A0 | A1 | A2 | A3 | geometry |
|---|---|---|---|---|---|
| 0 | 295 | 11110140 | 삼청동 | 2019-02-21 | POLYGON ((197597.735 454551.233, 197599.083 45... |
| 1 | 296 | 11110101 | 청운동 | 2019-02-21 | POLYGON ((196524.180 453809.271, 196541.748 45... |
| 2 | 297 | 11110168 | 동숭동 | 2019-02-21 | POLYGON ((200515.600 453698.981, 200521.372 45... |
| 3 | 315 | 11110166 | 연건동 | 2019-02-21 | POLYGON ((199711.309 453300.989, 199718.145 45... |
| 4 | 316 | 11110105 | 창성동 | 2019-02-21 | POLYGON ((197592.504 453280.724, 197592.907 45... |
df.columns
Index(['A0', 'A1', 'A2', 'A3', 'geometry'], dtype='object')
df['A0'].describe()
count     5079.000000
mean      2251.962985
std       5697.271617
min          3.000000
25%        120.000000
50%        381.000000
75%       1329.000000
max      61906.000000
Name: A0, dtype: float64
ax = df.plot(figsize=(11, 11), color="w", edgecolor="k")
#ax.set_title("South Korea")
ax.set_axis_off()
plt.show()
Similarity
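# Queen contiguity: polygons that share an edge or a vertex are neighbors
wq = lps.weights.Queen.from_dataframe(df)
# Row-standardize so that the neighbor weights of each unit sum to 1
wq.transform = 'r'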
('WARNING: ', 539, ' is an island (no neighbors)')
('WARNING: ', 663, ' is an island (no neighbors)')
('WARNING: ', 705, ' is an island (no neighbors)')
('WARNING: ', 718, ' is an island (no neighbors)')
('WARNING: ', 719, ' is an island (no neighbors)')
('WARNING: ', 725, ' is an island (no neighbors)')
('WARNING: ', 726, ' is an island (no neighbors)')
('WARNING: ', 735, ' is an island (no neighbors)')
('WARNING: ', 745, ' is an island (no neighbors)')
('WARNING: ', 746, ' is an island (no neighbors)')
('WARNING: ', 747, ' is an island (no neighbors)')
('WARNING: ', 748, ' is an island (no neighbors)')
('WARNING: ', 749, ' is an island (no neighbors)')
('WARNING: ', 919, ' is an island (no neighbors)')
('WARNING: ', 953, ' is an island (no neighbors)')
('WARNING: ', 1472, ' is an island (no neighbors)')
('WARNING: ', 1653, ' is an island (no neighbors)')
('WARNING: ', 1778, ' is an island (no neighbors)')
('WARNING: ', 2001, ' is an island (no neighbors)')
('WARNING: ', 2364, ' is an island (no neighbors)')
('WARNING: ', 2864, ' is an island (no neighbors)')
('WARNING: ', 3057, ' is an island (no neighbors)')
('WARNING: ', 3140, ' is an island (no neighbors)')
('WARNING: ', 3213, ' is an island (no neighbors)')
('WARNING: ', 3308, ' is an island (no neighbors)')
('WARNING: ', 3337, ' is an island (no neighbors)')
('WARNING: ', 3400, ' is an island (no neighbors)')
('WARNING: ', 3597, ' is an island (no neighbors)')
('WARNING: ', 3769, ' is an island (no neighbors)')
('WARNING: ', 4000, ' is an island (no neighbors)')
('WARNING: ', 4251, ' is an island (no neighbors)')
('WARNING: ', 4327, ' is an island (no neighbors)')
('WARNING: ', 4328, ' is an island (no neighbors)')
('WARNING: ', 4329, ' is an island (no neighbors)')
('WARNING: ', 4330, ' is an island (no neighbors)')
('WARNING: ', 4331, ' is an island (no neighbors)')
('WARNING: ', 4333, ' is an island (no neighbors)')
('WARNING: ', 4334, ' is an island (no neighbors)')
('WARNING: ', 4336, ' is an island (no neighbors)')
('WARNING: ', 4337, ' is an island (no neighbors)')
('WARNING: ', 4338, ' is an island (no neighbors)')
('WARNING: ', 4341, ' is an island (no neighbors)')
('WARNING: ', 4346, ' is an island (no neighbors)')
('WARNING: ', 4347, ' is an island (no neighbors)')
('WARNING: ', 4362, ' is an island (no neighbors)')
('WARNING: ', 4372, ' is an island (no neighbors)')
('WARNING: ', 4373, ' is an island (no neighbors)')
('WARNING: ', 4379, ' is an island (no neighbors)')
('WARNING: ', 4380, ' is an island (no neighbors)')
('WARNING: ', 4381, ' is an island (no neighbors)')
('WARNING: ', 4382, ' is an island (no neighbors)')
('WARNING: ', 4383, ' is an island (no neighbors)')
('WARNING: ', 4384, ' is an island (no neighbors)')
('WARNING: ', 4385, ' is an island (no neighbors)')
('WARNING: ', 4386, ' is an island (no neighbors)')
('WARNING: ', 4474, ' is an island (no neighbors)')
('WARNING: ', 4542, ' is an island (no neighbors)')
('WARNING: ', 4567, ' is an island (no neighbors)')
('WARNING: ', 4629, ' is an island (no neighbors)')
('WARNING: ', 4637, ' is an island (no neighbors)')
('WARNING: ', 4703, ' is an island (no neighbors)')
('WARNING: ', 4856, ' is an island (no neighbors)')
('WARNING: ', 5061, ' is an island (no neighbors)')
('WARNING: ', 5062, ' is an island (no neighbors)')
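The island warnings matter because row standardization divides each weight by the number of neighbors, and an island has none. The following is a minimal standard-library sketch of what row standardization and the spatial lag do, with a hypothetical four-unit neighbor table in which unit 3 is an island; the helpers `row_standardize` and `spatial_lag` are illustrative names, not libpysal functions. In libpysal itself, a weights object lists such units in its `islands` attribute.

```python
def row_standardize(neighbors):
    """Give each neighbor of a unit the weight 1 / (number of neighbors)."""
    weights = {}
    for unit, nbrs in neighbors.items():
        # An island has an empty neighbor list, so it receives no weights at all
        weights[unit] = {j: 1.0 / len(nbrs) for j in nbrs}
    return weights


def spatial_lag(weights, values):
    """Weighted average of the neighbors' values for every unit."""
    return {i: sum(w * values[j] for j, w in row.items())
            for i, row in weights.items()}


# Hypothetical example: units 0-2 are mutual neighbors, unit 3 is an island
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: []}
values = {0: 10.0, 1: 20.0, 2: 30.0, 3: 99.0}

islands = [i for i, nbrs in neighbors.items() if not nbrs]
weights = row_standardize(neighbors)
lag = spatial_lag(weights, values)

print(islands)  # [3]
print(lag)      # {0: 25.0, 1: 20.0, 2: 15.0, 3: 0}
```

The island's lag is 0 regardless of its own value, so islands contribute nothing to the cross-products behind Moran's I. Whether to drop them or to connect them through a different neighbor definition is a judgment call that depends on the study.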
Moran's I
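- A global index that summarizes the spatial autocorrelation of the entire study area in a single value.
- A useful measure of spatial autocorrelation, computed by comparing the values of neighboring spatial units.
- If neighboring spatial units have similar values across the entire study area, Moran's I takes a large positive value (strong positive spatial autocorrelation); if neighboring units have dissimilar values, Moran's I takes a large negative value (strong negative spatial autocorrelation).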
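For reference, the statistic described above is commonly written as follows. The symbols are assumptions introduced here rather than notation from the notebook: $y_i$ is the value of unit $i$, $\bar{y}$ the mean over all $n$ units, and $w_{ij}$ the spatial weight between units $i$ and $j$.

```latex
I = \frac{n}{S_0}\,
    \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(y_i - \bar{y})(y_j - \bar{y})}
         {\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
S_0 = \sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}
```

With row-standardized weights and no islands, every row sums to 1, so $S_0 = n$ and the leading factor reduces to 1.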
y = df['A0']
# Fix the seed so that the permutation-based inference (p_sim) is reproducible
np.random.seed(12345)
mi = esda.moran.Moran(y, wq)
mi.I
0.6087514418350501
- A positive z-value: similar values tend to be found near one another, i.e. the data are spatially clustered.
- A negative z-value: dissimilar values tend to be found near one another, i.e. the data are dispersed as if in competition. For example, high values may repel other high values, and low values may repel other low values.
mi.p_sim
0.001
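The pseudo p-value comes from a permutation test: the values are shuffled across locations many times, and the observed statistic is compared with the shuffled ones. The sketch below illustrates the idea with the standard library only. It is a one-sided (upper-tail) simplification under stated assumptions, not esda's implementation; the ten-unit chain, the helper names, and the seed are all hypothetical.

```python
import random


def morans_i(values, neighbors):
    """Global Moran's I with row-standardized weights (assumes no islands)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    cross = sum(z[i] * sum(z[j] for j in nbrs) / len(nbrs)
                for i, nbrs in neighbors.items())
    return cross / sum(d * d for d in z)


def pseudo_p_value(values, neighbors, permutations=999, seed=12345):
    """Share of random reshufflings whose Moran's I is at least the observed one."""
    rng = random.Random(seed)
    observed = morans_i(values, neighbors)
    shuffled = list(values)
    at_least_as_large = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # break any link between value and location
        if morans_i(shuffled, neighbors) >= observed:
            at_least_as_large += 1
    # The +1 counts the observed arrangement itself
    return (at_least_as_large + 1) / (permutations + 1)


# Ten units on a line; each unit neighbors the units directly before and after it
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
trend = list(range(1, 11))  # steadily increasing values: strong positive autocorrelation

p_value = pseudo_p_value(trend, chain)
smallest_possible = 1 / (999 + 1)
print(p_value, smallest_possible)
```

Under this construction the smallest attainable value with 999 permutations is 0.001. Assuming the notebook ran with 999 permutations, the reported `p_sim` of 0.001 would therefore mean that essentially no reshuffled map was as extreme as the observed one.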
fig, ax = plt.subplots(figsize=(12,10), subplot_kw={'aspect':'equal'})
df.plot(column='A0', scheme='Quantiles', k=5, cmap='GnBu', legend=True, ax=ax)
<matplotlib.axes._subplots.AxesSubplot at 0x2a362e202c8>
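Anselin Local Moran's I (LISA)
- A visual indicator that accounts for the local variations of spatial autocorrelation that can occur within the study area.
- With LISA, one can find 'hot spots', areas where the spatial autocorrelation of a variable is particularly high.
- To measure spatial autocorrelation at the local scale, a spatial autocorrelation value must be computed for each areal unit. Among the various LISA statistics, the easiest to use is the local Moran.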
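One common way to write the local Moran statistic for unit $i$ is shown below. The notation is an assumption introduced for illustration: $z_i = y_i - \bar{y}$ is the deviation of unit $i$ from the mean and $w_{ij}$ is the row-standardized spatial weight. Scaling constants differ between texts and software, so this should be read as one convention rather than the exact form a given library uses.

```latex
I_i = \frac{z_i}{m_2} \sum_{j=1}^{n} w_{ij}\, z_j,
\qquad
m_2 = \frac{1}{n} \sum_{i=1}^{n} z_i^2
```

A positive $I_i$ means unit $i$ and its neighbors deviate from the mean in the same direction; a negative $I_i$ means they deviate in opposite directions.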
Moran Scatterplot
np.random.seed(12345)
wq.transform = 'r'
# Spatial lag: weighted average of the values of each unit's neighbors
lag_value = lps.weights.lag_spatial(wq, df['A0'])
value = df['A0']
# Least-squares line of the lag on the value; with row-standardized
# weights its slope b corresponds to global Moran's I
b, a = np.polyfit(value, lag_value, 1)
f, ax = plt.subplots(1, figsize=(9, 9))
plt.plot(value, lag_value, '.', color='firebrick')
# dashed vertical line at the mean of the value
plt.vlines(value.mean(), lag_value.min(), lag_value.max(), linestyle='--')
# dashed horizontal line at the mean of the spatial lag
plt.hlines(lag_value.mean(), value.min(), value.max(), linestyle='--')
# red line of best fit
plt.plot(value, a + b*value, 'r')
plt.title('Moran Scatterplot')
plt.ylabel('Spatial Lag of Value')
plt.xlabel('Value')
plt.show()
- The upper-right quadrant and the lower-left quadrant correspond with positive spatial autocorrelation (similar values at neighboring locations).
- We refer to them as respectively high-high and low-low spatial autocorrelation.
- In contrast, the lower-right and upper-left quadrant correspond to negative spatial autocorrelation (dissimilar values at neighboring locations).
- We refer to them as respectively high-low and low-high spatial autocorrelation.
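The quadrant logic above can be sketched in a few lines. This standard-library example assumes the 1 = high-high, 2 = low-high, 3 = low-low, 4 = high-low coding and classifies a unit by the sign of its deviation from the mean and the sign of the average deviation of its neighbors. The eight-unit chain and the helper name `moran_quadrants` are hypothetical.

```python
def moran_quadrants(values, neighbors):
    """Assign each unit to a Moran scatterplot quadrant.

    1 = high-high, 2 = low-high, 3 = low-low, 4 = high-low
    (own value first, neighbors' average second).
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    quadrants = []
    for i in range(n):
        nbrs = neighbors[i]
        lag = sum(z[j] for j in nbrs) / len(nbrs)  # average deviation of neighbors
        if z[i] > 0 and lag > 0:
            quadrants.append(1)
        elif z[i] <= 0 and lag > 0:
            quadrants.append(2)
        elif z[i] <= 0 and lag <= 0:
            quadrants.append(3)
        else:
            quadrants.append(4)
    return quadrants


# Eight units on a line; each unit neighbors the units directly before and after it
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}
values = [8, 9, 1, 9, 2, 1, 1, 1]  # mean is 4

quadrants = moran_quadrants(values, chain)
print(quadrants)  # [1, 1, 2, 4, 2, 3, 3, 3]
```

Unit 2 (value 1 between two 9s) lands in the low-high quadrant, and unit 3 (value 9 between 1 and 2) in the high-low quadrant, matching the negative-autocorrelation cases described above.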
LISA
li = esda.moran.Moran_Local(y, wq)
li.q
array([3, 3, 3, ..., 3, 3, 3])
(li.p_sim < 0.05).sum()
1671
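# Units whose local statistic is significant at the 5% pseudo-significance level
sig = li.p_sim < 0.05
# Quadrant codes in li.q: 1 = high-high, 2 = low-high, 3 = low-low, 4 = high-low
hotspot = sig & (li.q == 1)
coldspot = sig & (li.q == 3)
doughnut = sig & (li.q == 2)
diamond = sig & (li.q == 4)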
# Map of significant high-high units (hot spots)
spots = ['n.sig.', 'hot spot']
labels = [spots[i] for i in hotspot*1]
from matplotlib import colors
# Colors follow the sorted category labels: 'hot spot', then 'n.sig.'
hmap = colors.ListedColormap(['red', 'lightgrey'])
f, ax = plt.subplots(1, figsize=(9, 9))
df.assign(cl=labels).plot(column='cl', categorical=True,
                          k=2, cmap=hmap, linewidth=0.1, ax=ax,
                          edgecolor='white', legend=True)
ax.set_axis_off()
plt.show()
# Map of significant low-low units (cold spots)
spots = ['n.sig.', 'cold spot']
labels = [spots[i] for i in coldspot*1]
hmap = colors.ListedColormap(['blue', 'lightgrey'])
f, ax = plt.subplots(1, figsize=(9, 9))
df.assign(cl=labels).plot(column='cl', categorical=True,
                          k=2, cmap=hmap, linewidth=0.1, ax=ax,
                          edgecolor='white', legend=True)
ax.set_axis_off()
plt.show()
# Map of significant low-high units (doughnuts: low values surrounded by high values)
spots = ['n.sig.', 'doughnut']
labels = [spots[i] for i in doughnut*1]
hmap = colors.ListedColormap(['lightblue', 'lightgrey'])
f, ax = plt.subplots(1, figsize=(9, 9))
df.assign(cl=labels).plot(column='cl', categorical=True,
                          k=2, cmap=hmap, linewidth=0.1, ax=ax,
                          edgecolor='white', legend=True)
ax.set_axis_off()
plt.show()
# Map of significant high-low units (diamonds: high values surrounded by low values)
spots = ['n.sig.', 'diamond']
labels = [spots[i] for i in diamond*1]
hmap = colors.ListedColormap(['pink', 'lightgrey'])
f, ax = plt.subplots(1, figsize=(9, 9))
df.assign(cl=labels).plot(column='cl', categorical=True,
                          k=2, cmap=hmap, linewidth=0.1, ax=ax,
                          edgecolor='white', legend=True)
ax.set_axis_off()
plt.show()
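# Encode every unit as a single class: 0 = not significant,
# otherwise its quadrant code (1 hot spot, 2 doughnut, 3 cold spot, 4 diamond)
sig = 1 * (li.p_sim < 0.05)
hotspot = 1 * ((sig * li.q) == 1)
coldspot = 3 * ((sig * li.q) == 3)
doughnut = 2 * ((sig * li.q) == 2)
diamond = 4 * ((sig * li.q) == 4)
spots = hotspot + coldspot + doughnut + diamond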
spots
array([3, 0, 0, ..., 3, 0, 0])
# The numeric prefixes keep the sorted labels aligned with the color order below
spot_labels = ['0 ns', '1 hot spot', '2 doughnut', '3 cold spot', '4 diamond']
labels = [spot_labels[i] for i in spots]
from matplotlib import colors
hmap = colors.ListedColormap(['lightgrey', 'red', 'lightblue', 'blue', 'pink'])
f, ax = plt.subplots(1, figsize=(9, 9))
df.assign(cl=labels).plot(column='cl', categorical=True,
                          k=2, cmap=hmap, linewidth=0.1, ax=ax,
                          edgecolor='white', legend=True)
ax.set_axis_off()
plt.show()
Reference
Background material on spatial autocorrelation:
- 이경주, 황명화, 한선희, & 양은정. (2015). 공간통계 분석의 이해와 활용을 위한 첫걸음.
Further reading:
- Andresen, M. A. (2011). Estimating the probability of local crime clusters: The impact of immediate spatial neighbors. Journal of Criminal Justice, 39(5), 394-404.
- Suthanaya, P. A. (2011). Spatial Autocorrelation Analyses of the Commuting Preferences by Bus in the Sydney Metropolitan Region. Journal of Civil Engineering, 18(1), 71-80.
- Truong, L. T., & Somenahalli, S. V. (2011). Using GIS to identify pedestrian-vehicle crash hot spots and unsafe bus stops. Journal of Public Transportation, 14(1), 6.
- Tselios, V. (2008). Income and educational inequalities in the regions of the European Union: geographical spillovers under welfare state restrictions. Papers in Regional Science, 87(3), 403-430.
- Yun, J. M., & Choi, D. J. (2015). Geographically weighted regression on the characteristics of land use and spatial patterns of floating population in Seoul city. Journal of Korean Society for Geospatial Information System, 23(3), 77-84.
- 김현중, & 이성우. (2013). 범죄발생의 공간의존성 변화와 핫스팟 분포, 2001-2010. 주거환경, 11(2), 27-41.
- 이연수, 진창종, & 추상호. (2012). 공간계량분석을 이용한 대중교통 이용에 영향을 미치는 공간적 특성요인 분석에 관한 연구: 서울시 행정동을 중심으로. 서울도시연구, 13(4), 97-111.