Projects/🏪 Convenience Store Location Analysis

[Mini Project] 10. Checking the Sales Distribution (+ Model Performance Check After Outlier Removal)

by ISLA! 2023. 9. 18.

 

🌿 Checking the sales distribution

  • While modeling, I checked the distribution of sales, the dependent variable.
  • The distribution of this key variable decides whether model performance should be evaluated with RMSE or with MAE.
  • RMSE squares each error, so it weights large errors heavily, while MAE weights every error in proportion to its size. If skewness is high and the distribution is uneven, RMSE should be used; otherwise MAE gives a more intuitive reading of the results.
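To make the difference between the two metrics concrete, here is a minimal sketch on made-up numbers (not the project's data). The helper names `rmse` and `mae` are only for illustration. It shows that a single large miss, the kind a heavily skewed target tends to produce, moves RMSE far more than MAE.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: squares each error before averaging."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error: each error counts in proportion to its size."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

# Toy values (hypothetical): every prediction is off by exactly 10
y_true = [100, 120, 130, 150]
y_pred = [110, 110, 140, 140]
rmse_even, mae_even = rmse(y_true, y_pred), mae(y_true, y_pred)
print('even errors  -> RMSE:', rmse_even, 'MAE:', mae_even)

# Add one extreme observation that the prediction misses by 1000
y_true_skewed = y_true + [2000]
y_pred_skewed = y_pred + [1000]
rmse_skewed = rmse(y_true_skewed, y_pred_skewed)
mae_skewed = mae(y_true_skewed, y_pred_skewed)
print('one big miss -> RMSE:', rmse_skewed, 'MAE:', mae_skewed)
```

When all errors have the same size the two metrics agree; once one error dominates, RMSE pulls away from MAE, so the gap between them is itself a rough signal of how uneven the errors are.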

 

✔ Sales distribution in the alley commercial districts (골목상권)

  • I wrote the code so that it also reports the skewness, and plotted the sales distribution of the alley commercial districts as a histogram.
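import matplotlib.pyplot as plt
import seaborn as sns

# df_gol: the alley-district DataFrame, assumed to be loaded already
fig, ax = plt.subplots(1, 1, figsize=(18, 10))
# Put the skewness of the sales column in the legend label
g = sns.histplot(df_gol['매출'], color='b', label='Skewness : {:.2f}'.format(df_gol['매출'].skew()), ax=ax)
g.legend(loc='best', prop={'size': 16})
g.set_xlabel("Alley district sales distribution (before outlier removal)", fontsize=16)
g.set_ylabel("Count", fontsize=16)

plt.show()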

 

# Box plot
fig, ax = plt.subplots(figsize=(18, 10))

# Pass the series as x so the box is drawn horizontally along the sales axis
g = sns.boxplot(x=df_gol['매출'], color='b', ax=ax)
g.set_xlabel("Alley district sales distribution (before outlier removal)", fontsize=16)
plt.show()

 

  • At this point, the following question came up.
  • So I decided to run a simple outlier removal and then try modeling.

 

How much does model performance improve if the outliers are removed?

✔ Removing outliers from the alley districts (using quartiles)

  • I computed the quartiles, treated values beyond the box-plot whiskers as outliers, and tried removing them.
import numpy as np

Q1 = np.percentile(df_gol['매출'], 25)
Q3 = np.percentile(df_gol['매출'], 75)
IQR = Q3 - Q1

# Whisker bounds: 1.5 * IQR beyond the quartiles
upper = Q3 + 1.5 * IQR
lower = Q1 - 1.5 * IQR

# Index labels of the rows whose sales fall outside the whiskers
outlier_ind = df_gol[(df_gol['매출'] < lower) | (df_gol['매출'] > upper)].index

print('upper: ', upper)
print('lower: ', lower)
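The quartile rule can also be applied end to end on a small example. The sketch below is an assumption-laden stand-in: `split_by_iqr` is a hypothetical helper, and `demo` is synthetic right-skewed data, not the real `df_gol`. It splits the rows into kept and removed sets and compares skewness before and after.

```python
import numpy as np
import pandas as pd

def split_by_iqr(df, col, k=1.5):
    """Split df into (kept, removed) rows using the k * IQR whisker rule on `col`."""
    q1, q3 = np.percentile(df[col], [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    is_outlier = (df[col] < lower) | (df[col] > upper)
    return df[~is_outlier], df[is_outlier]

# Synthetic, right-skewed stand-in for the sales column (made-up values)
rng = np.random.default_rng(0)
demo = pd.DataFrame({'매출': rng.lognormal(mean=10, sigma=1, size=1000)})

kept, removed = split_by_iqr(demo, '매출')
print('rows kept / removed:', len(kept), len(removed))
print('skewness before: {:.2f}'.format(demo['매출'].skew()))
print('skewness after : {:.2f}'.format(kept['매출'].skew()))
```

On right-skewed, strictly positive data like this, the lower bound falls below zero, so every removed row comes from the high end of the distribution. Keeping the removed rows in their own DataFrame makes it easy to inspect them later instead of discarding them outright.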

 

  • Sales distribution of the alley districts after outlier removal: the skewness is clearly improved.

 

 

  • Box plot: the maximum sales value dropped sharply, so I judged that simply removing the outliers is risky.
  • Still, I was curious what the model would do in this setting, so I tried modeling anyway.

 

 

🌿 Modeling results after outlier removal

  • 13 columns included (after exploring derived variables)
  • k raised to 10
Mean RMSE: 26238.05105698613
Mean MAE: 18370.828031660763

 

👉 RMSE dropped to roughly half, but as noted above, I felt the need to look more closely at the district codes and sales values of the removed outliers.
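For readers who want to see how averaged metrics like these can be produced, here is a rough sketch. It assumes that k is the number of cross-validation folds, which the post does not state explicitly. The model here is a plain least-squares fit used only as a placeholder, since the post does not name the actual model, and `kfold_mean_errors` and the data are hypothetical.

```python
import numpy as np

def kfold_mean_errors(X, y, k=10, seed=0):
    """Return (mean RMSE, mean MAE) of a least-squares fit across k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    rmses, maes = [], []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        # Add an intercept column, then fit on the training folds only
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        err = y[test_idx] - A_test @ coef
        rmses.append(np.sqrt(np.mean(err ** 2)))
        maes.append(np.mean(np.abs(err)))
    return float(np.mean(rmses)), float(np.mean(maes))

# Synthetic data with 13 feature columns; all values are made up
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 13))
y = X @ rng.normal(size=13) + rng.normal(scale=0.5, size=500)

mean_rmse, mean_mae = kfold_mean_errors(X, y, k=10)
print('mean RMSE:', mean_rmse)
print('mean MAE :', mean_mae)
```

Within each fold MAE can never exceed RMSE, so the averaged MAE is also at most the averaged RMSE, which matches the ordering of the two numbers reported above.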


🌿 Analyzing the removed outliers

  • I extracted only the removed outlier rows, saved them to a CSV file, and ran a simple EDA with visualizations.
  • I then summarized the mean sales and the number of deleted rows for the 11 deleted district codes, as below.
# Partial code for reference (checking groupby usage)
merged = merged.groupby('상권_코드').agg({'매출': 'mean', 'count': 'max'}).reset_index()
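The same kind of per-district summary can be sketched on a tiny made-up table. This is an alternative formulation using pandas named aggregation, not the code from the post. The `removed` DataFrame, its district codes, and its sales values are all hypothetical.

```python
import pandas as pd

# Hypothetical removed-outlier rows; codes and sales values are made up
removed = pd.DataFrame({
    '상권_코드': [1001, 1001, 1001, 1002, 1002, 1003],
    '매출': [300.0, 500.0, 400.0, 900.0, 1100.0, 700.0],
})

# Named aggregation: mean sales and number of removed rows per district code
summary = (
    removed.groupby('상권_코드')
    .agg(매출=('매출', 'mean'), count=('매출', 'size'))
    .reset_index()
)
print(summary)
```

Counting with `size` inside the same `agg` call avoids merging a separate count column back in before grouping.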

 

 

  • Visualized, this looks as follows.
import plotly.graph_objects as go

fig = go.Figure()

# Add the max count per district code as a line chart
fig.add_trace(go.Scatter(x=merged['상권_코드'], y=merged['count'], mode='lines+markers', name='Number of removed districts', line=dict(color='red')))

# Layout settings
fig.update_layout(title='Deleted count per district code',
                  xaxis_title='District code',
                  yaxis_title='Value',
                  height=600)

# Show the figure
fig.show()

 

 

fig = go.Figure()

# Add the mean sales per district code as a bar chart
fig.add_trace(go.Bar(x=merged['상권_코드'], y=merged['매출'], name='Mean sales of removed districts', marker_color='blue', yaxis='y'))

# Layout settings
fig.update_layout(title='Mean sales per district code',
                  xaxis_title='District code',
                  yaxis_title='Value',
                  height=600)

# Show the figure
fig.show()

 

 

 

🌿 Deciding on the outlier removal criterion

  • The outlier removal above deleted all of the high-value data, so I concluded that removing outliers purely by quartiles is not suitable for this project.
  • Instead, I decided to review the full sales data once more and move toward deleting rows whose sales are too low, such as 0.
  • The reasoning and results will be covered in the next post.
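As a rough illustration of the direction described above, the sketch below drops rows whose sales are at or below a cutoff. `MIN_SALES` is a hypothetical threshold, since the post leaves the final criterion to the next entry, and the `demo` rows are made up.

```python
import pandas as pd

MIN_SALES = 0  # hypothetical cutoff; the actual criterion is not stated in this post

# Made-up rows standing in for the sales data
demo = pd.DataFrame({
    '상권_코드': [1001, 1002, 1003, 1004, 1005],
    '매출': [0, 15000, 0, 82000, 43000],
})

# Keep only rows whose sales are strictly above the cutoff
filtered = demo[demo['매출'] > MIN_SALES].reset_index(drop=True)
print(len(demo) - len(filtered), 'rows dropped')
print(filtered)
```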
