Spatial Autocorrelation Analysis

Overview

Spatial data carry not only their own attribute information but also information about their location in geographical space. Many conventional linear models have been applied to such data, but because they do not account for the "space" factor, they failed to yield meaningful results.
According to Doreian (1981), when spatially referenced data are analyzed with traditional linear methods, which assume that variables are random and that error terms are independent and identically distributed, the analysis cannot control for spatial dependence and spatial interaction, the characteristics that many socioeconomic, demographic, and natural phenomena exhibit in space.
Tobler's first law of geography states: "Everything is related to everything else, but near things are more related than distant things." In this sense, objects are not placed randomly in space; they exist while influencing one another.
This interdependence and interaction among spatial objects in geographic space is called spatial autocorrelation.
Anselin and Bera (1998) define it as the phenomenon in which the values of spatial objects become more similar as their locations become more similar.
Spatial autocorrelation can be positive or negative. Positive spatial autocorrelation occurs when spatial units with similar values are located near one another, forming clusters. Negative spatial autocorrelation occurs when neighboring spatial units hold dissimilar values.
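The contrast between positive and negative spatial autocorrelation can be made concrete with a toy measure. The sketch below is only an illustration, not one of the statistics used later in this document: it counts the share of edge-adjacent cell pairs that hold the same value on a binary grid. The function name `same_neighbor_share` and both grids are hypothetical.

```python
# Illustrative only: share of edge-adjacent cell pairs holding the same value.
def same_neighbor_share(grid):
    """Return the fraction of horizontally/vertically adjacent pairs with equal values."""
    rows, cols = len(grid), len(grid[0])
    same = total = 0
    for r in range(rows):
        for c in range(cols):
            # Look only right and down so every adjacent pair is counted once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    total += 1
                    same += grid[r][c] == grid[rr][cc]
    return same / total


# Hypothetical 4x4 binary surfaces.
clustered = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]

print(same_neighbor_share(clustered))     # most neighbors match: positive pattern
print(same_neighbor_share(checkerboard))  # no neighbors match: negative pattern
```

A share close to 1 corresponds to the clustered (positive) case, and a share close to 0 corresponds to the alternating (negative) case.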

	A0	A1	A2	A3	geometry
0	295	11110140	삼청동	2019-02-21	POLYGON ((197597.735 454551.233, 197599.083 45...
1	296	11110101	청운동	2019-02-21	POLYGON ((196524.180 453809.271, 196541.748 45...
2	297	11110168	동숭동	2019-02-21	POLYGON ((200515.600 453698.981, 200521.372 45...
3	315	11110166	연건동	2019-02-21	POLYGON ((199711.309 453300.989, 199718.145 45...
4	316	11110105	창성동	2019-02-21	POLYGON ((197592.504 453280.724, 197592.907 45...

Similarity

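import libpysal as lps  # assumed alias; the imports are not shown in the original
wq = lps.weights.Queen.from_dataframe(df)  # queen contiguity: shared edge or vertex
wq.transform = 'r'  # row-standardize: each unit's weights sum to 1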

WARNING: the following observations are islands (no neighbors):
539, 663, 705, 718, 719, 725, 726, 735, 745, 746, 747, 748, 749, 919, 953, 1472, 1653, 1778, 2001, 2364, 2864, 3057, 3140, 3213, 3308, 3337, 3400, 3597, 3769, 4000, 4251, 4327, 4328, 4329, 4330, 4331, 4333, 4334, 4336, 4337, 4338, 4341, 4346, 4347, 4362, 4372, 4373, 4379, 4380, 4381, 4382, 4383, 4384, 4385, 4386, 4474, 4542, 4567, 4629, 4637, 4703, 4856, 5061, 5062
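To show what queen contiguity, row standardization, and an "island" mean, here is a minimal plain-Python sketch on grid cells. It assumes cells are identified by (row, col) pairs and stores weights as a dictionary of dictionaries. The helper names and the layout are hypothetical, and this is not how libpysal builds its weights internally.

```python
def queen_neighbors(cells):
    """Map each (row, col) cell to the cells that touch it by an edge or a corner."""
    cell_set = set(cells)
    neighbors = {}
    for r, c in cells:
        neighbors[(r, c)] = [
            (r + dr, c + dc)
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0) and (r + dr, c + dc) in cell_set
        ]
    return neighbors


def row_standardize(neighbors):
    """Give each neighbor the weight 1/k, where k is the unit's number of neighbors."""
    return {
        unit: {nb: 1.0 / len(nbs) for nb in nbs}
        for unit, nbs in neighbors.items()
    }


# Hypothetical layout: a 2x2 block of cells plus one detached cell at (5, 5).
cells = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
weights = row_standardize(queen_neighbors(cells))

print(weights[(0, 0)])  # three neighbors, each weighted 1/3
print(weights[(5, 5)])  # empty: an island has no neighbors
```

An island contributes nothing to any neighbor-based statistic, which is why the warning above is worth checking before interpreting the results.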

Moran's I

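Moran's I is a global index that summarizes the spatial autocorrelation of the entire study area in a single value.
It is a useful measure of spatial autocorrelation, computed by comparing the values of neighboring spatial units.
If neighboring spatial units have similar values across the entire study area, Moran's I takes a high positive value (positive spatial autocorrelation). If neighboring units have dissimilar values, it takes a strongly negative value (negative spatial autocorrelation).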
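For reference, the statistic described above is usually written as follows. The symbols are not defined in the original text and are introduced here as assumptions: n is the number of spatial units, x_i is the value at unit i, x-bar is the mean value, w_ij is the spatial weight between units i and j, and S_0 is the sum of all weights.

```latex
I = \frac{n}{S_0}\,
    \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(x_i-\bar{x})(x_j-\bar{x})}
         {\sum_{i=1}^{n}(x_i-\bar{x})^{2}},
\qquad
S_0 = \sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}
```

Under the null hypothesis of no spatial autocorrelation the expected value is E[I] = -1/(n-1), which is close to zero for large n.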

import numpy as np
import esda

y = df['A0']

np.random.seed(12345)  # fix the seed so the permutation-based p-value is reproducible
mi = esda.moran.Moran(y, wq)
mi.I

0.6087514418350501

A positive z-value indicates that the data are spatially clustered: similar values tend to be located near one another.
A negative z-value indicates a dispersed, competitive pattern: similar values tend to avoid one another. For example, high values may repel other high values, or low values may repel other low values.

mi.p_sim

0.001
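The value returned by `mi.p_sim` is a pseudo p-value from a permutation test: the observed values are shuffled across locations many times, and the observed I is compared with the I of each shuffled map. The sketch below illustrates that idea in plain Python, assuming 999 permutations (understood to be the esda default) and a dictionary-of-dictionaries weights structure. The function names and the toy data are hypothetical, and this is not esda's implementation.

```python
import random


def morans_i(values, weights):
    """Global Moran's I for a list of values and weights given as {i: {j: w_ij}}."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(w for row in weights.values() for w in row.values())
    cross = sum(w * z[i] * z[j] for i, row in weights.items() for j, w in row.items())
    return (n / s0) * cross / sum(d * d for d in z)


def pseudo_p_value(values, weights, permutations=999, seed=12345):
    """Share of shuffled maps whose I is at least as extreme as the observed I."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    larger = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # reassign the values to locations at random
        if morans_i(shuffled, weights) >= observed:
            larger += 1
    # Use the smaller tail so a strongly negative I is also detected.
    larger = min(larger, permutations - larger)
    return (larger + 1) / (permutations + 1)


# Hypothetical data: ten units on a line, each linked to its adjacent units.
n = 10
weights = {i: {j: 1.0 for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

p = pseudo_p_value(values, weights)
print(morans_i(values, weights), p)
```

With 999 permutations the smallest attainable pseudo p-value is 1/1000 = 0.001, so a reported 0.001 means that none of the shuffled maps was as extreme as the observed one.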

import matplotlib.pyplot as plt

# Quantile choropleth of A0 in five classes
fig, ax = plt.subplots(figsize=(12,10), subplot_kw={'aspect':'equal'})
df.plot(column='A0', scheme='Quantiles', k=5, cmap='GnBu', legend=True, ax=ax)


Figure 1-2: quantile map of A0 (five classes)

Anselin Local Moran's I (LISA)

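LISA (local indicators of spatial association) is an indicator, well suited to visualization, that accounts for local variations in spatial autocorrelation within the study area.
With LISA, one can find "hot spots": areas where the spatial autocorrelation of a variable is locally high.
To measure spatial autocorrelation at the local scale, a spatial autocorrelation value must be computed for each areal unit. Among the several LISA statistics, the local Moran is the easiest to apply.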
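One common way to write the local Moran statistic for unit i, following Anselin's formulation, is shown below. The notation is an assumption added here: z_i is the deviation of the value at unit i from the mean, w_ij is the spatial weight, and m_2 is the second moment of the deviations. Software packages may scale the statistic slightly differently (for example with n-1 instead of n).

```latex
I_i = \frac{z_i}{m_2}\sum_{j=1}^{n} w_{ij}\,z_j,
\qquad
z_i = x_i - \bar{x},
\qquad
m_2 = \frac{1}{n}\sum_{i=1}^{n} z_i^{2}
```

With this scaling the local values sum to S_0 times the global I, where S_0 is the sum of all weights. When the weights are row-standardized and there are no islands, S_0 = n, so the global I is the average of the local values.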

Moran Scatterplot

np.random.seed(12345)

wq.transform = 'r'
lag_price = lps.weights.lag_spatial(wq, df['A0'])  # weighted mean of each unit's neighbors

price = df['A0']
b, a = np.polyfit(price, lag_price, 1)  # slope b and intercept a of the fitted line
f, ax = plt.subplots(1, figsize=(9, 9))

plt.plot(price, lag_price, '.', color='firebrick')

# dashed vertical line at the mean of the value
plt.vlines(price.mean(), lag_price.min(), lag_price.max(), linestyle='--')

# dashed horizontal line at the mean of the spatial lag
plt.hlines(lag_price.mean(), price.min(), price.max(), linestyle='--')

# red line of best fit; with row-standardized weights its slope approximates the global I
plt.plot(price, a + b*price, 'r')
plt.title('Moran Scatterplot')
plt.ylabel('Spatial Lag of Value')
plt.xlabel('Value')
plt.show()

Figure 1-3: Moran scatterplot of A0 against its spatial lag
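The comment "line of best fit using global I as slope" in the scatterplot code rests on a known property: when the weights are row-standardized and no unit is an island, the least-squares slope of the spatial lag on the value equals the global Moran's I. The plain-Python sketch below checks this on a small example. The ring layout, the data, and the helper names are hypothetical.

```python
def spatial_lag(values, weights):
    """Weighted sum of neighbor values for each unit (a mean when rows sum to 1)."""
    return [sum(w * values[j] for j, w in weights[i].items())
            for i in range(len(values))]


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = sum((a - mean_x) ** 2 for a in x)
    return num / den


def morans_i(values, weights):
    """Global Moran's I for weights given as {i: {j: w_ij}}."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(w for row in weights.values() for w in row.values())
    cross = sum(w * z[i] * z[j] for i, row in weights.items() for j, w in row.items())
    return (n / s0) * cross / sum(d * d for d in z)


# Hypothetical data: six units on a ring, each with two neighbors weighted 1/2.
n = 6
weights = {i: {(i - 1) % n: 0.5, (i + 1) % n: 0.5} for i in range(n)}
values = [3.0, 5.0, 4.0, 8.0, 9.0, 7.0]

slope = ols_slope(values, spatial_lag(values, weights))
print(slope, morans_i(values, weights))  # the two numbers agree up to rounding
```

When some units are islands, as in the warning earlier in this document, their weight rows sum to zero rather than one, so the slope and the global I can differ slightly.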

The upper-right and lower-left quadrants correspond to positive spatial autocorrelation (similar values at neighboring locations).
We refer to them as high-high and low-low spatial autocorrelation, respectively.
In contrast, the lower-right and upper-left quadrants correspond to negative spatial autocorrelation (dissimilar values at neighboring locations).
We refer to them as high-low and low-high spatial autocorrelation, respectively.
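The quadrant logic can be sketched in a few lines of plain Python. The sketch assumes the split is made at the two dashed mean lines of the scatterplot. It uses the codes 1 = high-high, 2 = low-high, 3 = low-low, and 4 = high-low, which is the numbering the LISA code in this document relies on. The function name and the data are hypothetical, and a library may center the variables slightly differently.

```python
def moran_quadrants(values, lags):
    """Return 1 (high-high), 2 (low-high), 3 (low-low) or 4 (high-low) per unit."""
    value_mean = sum(values) / len(values)
    lag_mean = sum(lags) / len(lags)
    quadrants = []
    for value, lag in zip(values, lags):
        high_value = value > value_mean   # right of the vertical mean line
        high_lag = lag > lag_mean         # above the horizontal mean line
        if high_value and high_lag:
            quadrants.append(1)           # high value, high neighbors
        elif not high_value and high_lag:
            quadrants.append(2)           # low value, high neighbors
        elif not high_value and not high_lag:
            quadrants.append(3)           # low value, low neighbors
        else:
            quadrants.append(4)           # high value, low neighbors
    return quadrants


# Hypothetical values and spatial lags for four units.
values = [10, 2, 1, 9]
lags = [9, 8, 2, 1]
print(moran_quadrants(values, lags))  # [1, 2, 3, 4]
```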

LISA

li = esda.moran.Moran_Local(y, wq)

li.q

array([3, 3, 3, ..., 3, 3, 3])

(li.p_sim < 0.05).sum()

1671

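sig = li.p_sim < 0.05  # locations significant at the 5% level
# li.q is the scatterplot quadrant: 1 = high-high, 2 = low-high, 3 = low-low, 4 = high-low
hotspot = sig & (li.q == 1)
coldspot = sig & (li.q == 3)
doughnut = sig & (li.q == 2)
diamond = sig & (li.q == 4)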
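Each mask above combines two conditions: the local statistic must be significant, and the location must fall in a given quadrant. The plain-Python sketch below shows the same rule on lists, assuming the quadrant codes 1 to 4 and the 0.05 threshold used in this document. The function name and the sample inputs are hypothetical.

```python
def cluster_labels(quadrants, p_values, alpha=0.05):
    """Label each unit by its quadrant if significant, otherwise as 'n.sig.'."""
    names = {1: 'hot spot', 2: 'doughnut', 3: 'cold spot', 4: 'diamond'}
    return [
        names[q] if p < alpha else 'n.sig.'
        for q, p in zip(quadrants, p_values)
    ]


# Hypothetical quadrant codes and pseudo p-values for five units.
quadrants = [1, 3, 2, 4, 1]
p_values = [0.01, 0.20, 0.03, 0.049, 0.05]
print(cluster_labels(quadrants, p_values))
# ['hot spot', 'n.sig.', 'doughnut', 'diamond', 'n.sig.']
```

The last unit is not labeled because the comparison is strict: a p-value of exactly 0.05 does not pass `p < 0.05`.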

spots = ['n.sig.', 'hot spot']
labels = [spots[i] for i in hotspot*1]

from matplotlib import colors
# categories are colored in sorted order: 'hot spot' -> red, 'n.sig.' -> lightgrey
hmap = colors.ListedColormap(['red', 'lightgrey'])
f, ax = plt.subplots(1, figsize=(9, 9))
df.assign(cl=labels).plot(column='cl', categorical=True,
                          cmap=hmap, linewidth=0.1, ax=ax,
                          edgecolor='white', legend=True)
ax.set_axis_off()
plt.show()

Figure 1-4: significant hot spots

spots = ['n.sig.', 'cold spot']
labels = [spots[i] for i in coldspot*1]

from matplotlib import colors
# categories are colored in sorted order: 'cold spot' -> blue, 'n.sig.' -> lightgrey
hmap = colors.ListedColormap(['blue', 'lightgrey'])
f, ax = plt.subplots(1, figsize=(9, 9))
df.assign(cl=labels).plot(column='cl', categorical=True,
                          cmap=hmap, linewidth=0.1, ax=ax,
                          edgecolor='white', legend=True)
ax.set_axis_off()
plt.show()

Figure 1-5: significant cold spots

spots = ['n.sig.', 'doughnut']
labels = [spots[i] for i in doughnut*1]

from matplotlib import colors
# categories are colored in sorted order: 'doughnut' -> lightblue, 'n.sig.' -> lightgrey
hmap = colors.ListedColormap(['lightblue', 'lightgrey'])
f, ax = plt.subplots(1, figsize=(9, 9))
df.assign(cl=labels).plot(column='cl', categorical=True,
                          cmap=hmap, linewidth=0.1, ax=ax,
                          edgecolor='white', legend=True)
ax.set_axis_off()
plt.show()

Figure 1-6: significant doughnuts (low values surrounded by high values)

spots = ['n.sig.', 'diamond']
labels = [spots[i] for i in diamond*1]

from matplotlib import colors
# categories are colored in sorted order: 'diamond' -> pink, 'n.sig.' -> lightgrey
hmap = colors.ListedColormap(['pink', 'lightgrey'])
f, ax = plt.subplots(1, figsize=(9, 9))
df.assign(cl=labels).plot(column='cl', categorical=True,
                          cmap=hmap, linewidth=0.1, ax=ax,
                          edgecolor='white', legend=True)
ax.set_axis_off()
plt.show()

Figure 1-7: significant diamonds (high values surrounded by low values)

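sig = 1 * (li.p_sim < 0.05)
# encode each significant location by its quadrant code; 0 marks non-significant locations
hotspot = 1 * (sig * (li.q == 1))
coldspot = 3 * (sig * (li.q == 3))
doughnut = 2 * (sig * (li.q == 2))
diamond = 4 * (sig * (li.q == 4))
spots = hotspot + coldspot + doughnut + diamond
spots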

array([3, 0, 0, ..., 3, 0, 0])

spot_labels = ['0 ns', '1 hot spot', '2 doughnut', '3 cold spot', '4 diamond']
labels = [spot_labels[i] for i in spots]

from matplotlib import colors
# one color per label, in the sorted order of the labels above
hmap = colors.ListedColormap(['lightgrey', 'red', 'lightblue', 'blue', 'pink'])
f, ax = plt.subplots(1, figsize=(9, 9))
df.assign(cl=labels).plot(column='cl', categorical=True,
                          cmap=hmap, linewidth=0.1, ax=ax,
                          edgecolor='white', legend=True)
ax.set_axis_off()
plt.show()

Figure 1-8: all significant LISA cluster types on one map

Reference

Source for the explanatory material on spatial autocorrelation:

이경주, 황명화, 한선희, & 양은정. (2015). 공간통계 분석의 이해와 활용을 위한 첫걸음.

Further references:

Andresen, M. A. (2011). Estimating the probability of local crime clusters: The impact of immediate spatial neighbors. Journal of Criminal Justice, 39(5), 394-404.
Suthanaya, P. A. (2011). Spatial Autocorrelation Analyses of the Commuting Preferences by Bus in the Sydney Metropolitan Region. Journal of Civil Engineering, 18(1), 71-80.
Truong, L. T., & Somenahalli, S. V. (2011). Using GIS to identify pedestrian-vehicle crash hot spots and unsafe bus stops. Journal of Public Transportation, 14(1), 6.
Tselios, V. (2008). Income and educational inequalities in the regions of the European Union: geographical spillovers under welfare state restrictions. Papers in Regional Science, 87(3), 403-430.
Yun, J. M., & Choi, D. J. (2015). Geographically weighted regression on the characteristics of land use and spatial patterns of floating population in Seoul city. Journal of Korean Society for Geospatial Information System, 23(3), 77-84.
김현중, & 이성우. (2013). 범죄발생의 공간의존성 변화와 핫스팟 분포, 2001-2010. 주거환경, 11(2), 27-41.
이연수, 진창종, & 추상호. (2012). 공간계량분석을 이용한 대중교통 이용에 영향을 미치는 공간적 특성요인 분석에 관한 연구: 서울시 행정동을 중심으로. 서울도시연구, 13(4), 97-111.

National Transport Data Open Market manual