๋ณธ๋ฌธ ๋ฐ”๋กœ๊ฐ€๊ธฐ

[pandas Practice] Total Subway Boardings by Administrative Dong / Hourly Visualization

by ISLA! 2023. 8. 28.

 

์ตœ์ข… ๊ฒฐ๊ณผ๋ฌผ

 

API ๋ฐ์ดํ„ฐ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ

  • ๊ณต๊ณต๋ฐ์ดํ„ฐ ํฌํ„ธ์˜ API๋Š” 1000ํŽ˜์ด์ง€์”ฉ ๋ฐ์ดํ„ฐ๋ฅผ ๋”ฐ๋กœ ๊ฐ€์ ธ์˜ฌ ์ˆ˜ ์žˆ๋‹ค.
  • ๋ฐ˜๋ณต๋ฌธ์„ ์‚ฌ์šฉํ•˜์—ฌ 1000ํŽ˜์ด์ง€ ๋‹จ์œ„๋กœ url์„ ์ˆ˜์ •ํ•˜๋ฉฐ ๋ฐ์ดํ„ฐ๋ฅผ ๊ฐ€์ ธ์˜จ๋‹ค
  • requests.get(url) ๋กœ Json ๋ฐ์ดํ„ฐ ๊ฐ€์ ธ์˜ค๊ธฐ
  • json ๋ฐ์ดํ„ฐ์—์„œ ํ•„์š”ํ•œ ๋ถ€๋ถ„๋งŒ get 
    • data.get('๊ฐ€์ ธ์˜ฌ ํ‚ค๊ฐ’1', { }).get('๊ฐ€์ ธ์˜ฌ ํ‚ค๊ฐ’2', [ ])
    • { } ์™€ [ ] ๋Š” ๋ฐ์ดํ„ฐ ํ˜•ํƒœ์— ๋”ฐ๋ผ ์ผ์น˜ํ•˜๋Š” ๊ฒƒ์„ ๊ธฐ์ž…
    • get ํ•œ ๊ฐ’์„ Item์— ์ €์žฅ
  • ๋ฐ˜๋ณต๋ฌธ ์ข…๋ฃŒ๋ฅผ ์œ„ํ•ด, item ์ด ์—†์œผ๋ฉด break ๋˜๋„๋ก ํ•จ
  • item์„ dataframe์œผ๋กœ ๋ณ€ํ™˜ํ•˜๊ณ , ๋ณ€ํ™˜๋œ df๋ฅผ data_frames ๋ฆฌ์ŠคํŠธ์— ์ €์žฅ
  • ์ตœ์ข…์ ์œผ๋กœ data_frames์— ๋“ค์–ด์žˆ๋Š” ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์„ concat 
  • concatํ•œ ์ตœ์ข… ๋ฐ์ดํ„ฐ ํ”„๋ ˆ์ž„์„ return
base_url = "http://openapi.seoul.go.kr:8088/์ธ์ฆํ‚ค/json/tpssSubwayPassenger/"
items_per_page = 1000
total_pages = None
data_frames = []

for i in range(1, 1001):  # Adjust the range according to your needs
    start_page = (i - 1) * items_per_page + 1
    end_page = i * items_per_page
    url = f"{base_url}{start_page}/{end_page}/"
    
    response = requests.get(url)
    data = response.json()

    # Check if the retrieved data is empty
    items = data.get('tpssSubwayPassenger', {}).get('row', [])
    if not items:
        break
    
    df = pd.DataFrame(items)
    data_frames.append(df)

# Concatenate all DataFrames into a single DataFrame
combined_df = pd.concat(data_frames, ignore_index=True)

# Now you have a single DataFrame containing data from all pages
print(combined_df)
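
The last bullet above mentions returning the concatenated frame. Since exactly the same pagination pattern is reused for the dong master data below, the loop can also be wrapped in a small reusable helper. A minimal sketch, where fetch_seoul_api is a hypothetical name of my own and the service name and key are passed in as parameters:

def fetch_seoul_api(service_name, api_key, items_per_page=1000, max_requests=1000):
    """Fetch every row of a Seoul open API service in fixed-size chunks and return one DataFrame."""
    data_frames = []
    for i in range(1, max_requests + 1):
        start_index = (i - 1) * items_per_page + 1
        end_index = i * items_per_page
        url = f"http://openapi.seoul.go.kr:8088/{api_key}/json/{service_name}/{start_index}/{end_index}/"

        data = requests.get(url).json()
        items = data.get(service_name, {}).get('row', [])
        if not items:  # no more rows: stop paginating
            break
        data_frames.append(pd.DataFrame(items))

    if not data_frames:  # nothing came back (e.g. wrong key or service name)
        return pd.DataFrame()
    return pd.concat(data_frames, ignore_index=True)

# e.g. combined_df = fetch_seoul_api('tpssSubwayPassenger', '인증키')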

๊ฒฐ๊ณผ ํ™•์ธ

๊ฒฐ๊ณผ

 

ํ–‰์ •๋™(ADMDONG_ID) ๋ฐ์ดํ„ฐ ๋ณ‘ํ•ฉํ•ด์„œ ์‚ดํŽด๋ณด๊ธฐ

  • ํ–‰์ •๋™ ๋ฐ์ดํ„ฐ๋ฅผ ๊ฐ€์ ธ์™€์„œ, ์–ด๋Š ์ง€์—ญ์˜ ๋ฐ์ดํ„ฐ์ธ์ง€ ์‚ดํŽด๋ด์•ผํ•œ๋‹ค
  • ์„œ์šธ์‹œ ์๋ฉด๋™ ๋งˆ์Šคํ„ฐ ์ •๋ณด ๊ฐ€์ ธ์˜ค๊ธฐ > combined_dong์— ์ €์žฅ
base_url = "http://openapi.seoul.go.kr:8088/57756f69527273303830644c4b4c6f/json/districtEmd/"
items_per_page = 1000
data_frames = []

for i in range(1, 1001):  # upper bound on the number of requests; adjust as needed
    # 1-based row range for this request (the API paginates by row index, not page number)
    start_index = (i - 1) * items_per_page + 1
    end_index = i * items_per_page
    url = f"{base_url}{start_index}/{end_index}/"
    
    response = requests.get(url)
    data = response.json()

    # Check if the retrieved data is empty
    items = data.get('districtEmd', {}).get('row', [])
    if not items:
        break
    
    df = pd.DataFrame(items)
    data_frames.append(df)

# Concatenate all DataFrames into a single DataFrame
combined_dong = pd.concat(data_frames, ignore_index=True)

# Now you have a single DataFrame containing data from all pages
print(combined_dong)
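
With the helper sketched above, this second block collapses to a single call:

# Equivalent to the loop above, using the fetch_seoul_api helper sketched earlier
combined_dong = fetch_seoul_api('districtEmd', '인증키')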

 

ํ–‰์ •๋™ ๋ฐ์ดํ„ฐ์™€ ์ง€ํ•˜์ฒ  ์ด์šฉ์ž ์ˆ˜ ๋ฐ์ดํ„ฐ ํ•ฉ์น˜๊ธฐ

  • ํ–‰์ •๋™ ์•„์ด๋””๋ฅผ ๊ธฐ์ค€์œผ๋กœ, ๋ฐ์ดํ„ฐ๋ฅผ ์กฐ์ธ
  • ์ด๋•Œ, ์ง€ํ•˜์ฒ  ๋ฐ์ดํ„ฐ๋ฅผ ๊ธฐ์ค€์œผ๋กœ left join ๋˜๋„๋ก ํ•œ๋‹ค
subway_dong = pd.merge(combined_df, combined_dong, on = 'ADMDONG_ID', how = 'left')

๊ฒฐ๊ณผ ํ™•์ธ(๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„ ๋์ชฝ)

 

ํ–‰์ •๋™ ์•„์ด๋”” ์ปฌ๋Ÿผ ์‚ญ์ œ & ์‹œ๊ฐ„ ์ปฌ๋Ÿผ๋ช… ๋ณ€๊ฒฝ

subway_dong.drop('ADMDONG_ID', axis = 1, inplace = True)

# ์ปฌ๋Ÿผ๋ช… ๋ณ€๊ฒฝ
subway_dong = subway_dong.rename(columns = {'SBWY_PSGR_CNT_00HH' : '00์‹œ', 'SBWY_PSGR_CNT_01HH' : '01์‹œ', 'SBWY_PSGR_CNT_02HH' : '02์‹œ',
                                           'SBWY_PSGR_CNT_03HH': '03์‹œ', 'SBWY_PSGR_CNT_04HH' :'04์‹œ', 'SBWY_PSGR_CNT_05HH':'05์‹œ',
                                           'SBWY_PSGR_CNT_06HH' : '06์‹œ', 'SBWY_PSGR_CNT_07HH' : '07์‹œ', 'SBWY_PSGR_CNT_08HH' : '08์‹œ',
                                           'SBWY_PSGR_CNT_09HH': '09์‹œ', 'SBWY_PSGR_CNT_10HH' :'10์‹œ', 'SBWY_PSGR_CNT_11HH':'11์‹œ', 'SBWY_PSGR_CNT_12HH':'12์‹œ',
                                           'SBWY_PSGR_CNT_13HH' : '13์‹œ', 'SBWY_PSGR_CNT_14HH' : '14์‹œ', 'SBWY_PSGR_CNT_15HH' : '15์‹œ',
                                           'SBWY_PSGR_CNT_16HH': '16์‹œ', 'SBWY_PSGR_CNT_17HH' :'17์‹œ', 'SBWY_PSGR_CNT_18HH':'18์‹œ',
                                           'SBWY_PSGR_CNT_19HH' : '19์‹œ', 'SBWY_PSGR_CNT_20HH' : '20์‹œ', 'SBWY_PSGR_CNT_21HH' : '21์‹œ',
                                           'SBWY_PSGR_CNT_22HH': '22์‹œ', 'SBWY_PSGR_CNT_23HH' :'23์‹œ', 'SBWY_PSGR_CNT_24HH':'24์‹œ'})
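
The same mapping can also be generated with a dict comprehension instead of spelling out all 25 entries; a compact equivalent of the rename above:

# Equivalent rename map for SBWY_PSGR_CNT_00HH .. SBWY_PSGR_CNT_24HH
hour_map = {f'SBWY_PSGR_CNT_{h:02d}HH': f'{h:02d}시' for h in range(25)}
subway_dong = subway_dong.rename(columns=hour_map)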

 

์„œ์šธ ๋ฐ์ดํ„ฐ์ด๋ฏ€๋กœ, SIDO_NM ์ปฌ๋Ÿผ ์‚ญ์ œ

# ์„œ์šธ ๊ฐ’๋งŒ ์žˆ๋Š”์ง€ ๊ฐ„๋‹จํžˆ ์ฒดํฌ ํ›„, 
subway_dong['SIDO_NM'].unique()

# ํ•ด๋‹น ์ปฌ๋Ÿผ ์‚ญ์ œ
subway_dong.drop('SIDO_NM', axis =1, inplace = True)
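
The check and the drop can also be folded into one defensive step that removes the column only when it is still present and truly constant:

# Drop SIDO_NM only if it still exists and holds a single value (Seoul-only data)
if 'SIDO_NM' in subway_dong.columns and subway_dong['SIDO_NM'].nunique() == 1:
    subway_dong = subway_dong.drop(columns='SIDO_NM')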

 

๋ฐ์ดํ„ฐํƒ€์ž… ํ™•์ธ ํ›„, ๋‚ ์งœ ์ปฌ๋Ÿผ์„ datetime์œผ๋กœ

๐Ÿ‘‰ ๋‚ ์งœ์— ํ•ด๋‹นํ•˜๋Š” CRTR_DT : object(๋ฌธ์žํ˜•)

subway_dong.info()

๐Ÿ‘‰ CRTR_DT๋ฅผ datetime์œผ๋กœ

subway_dong['CRTR_DT'] = pd.to_datetime(subway_dong['CRTR_DT'])
subway_dong.info()
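
If CRTR_DT arrives as YYYYMMDD strings (an assumption about this dataset), an explicit format can be passed as an alternative to the plain call above; it is stricter and noticeably faster on large frames:

# Alternative to the call above; assumes CRTR_DT strings look like '20230801'
subway_dong['CRTR_DT'] = pd.to_datetime(subway_dong['CRTR_DT'], format='%Y%m%d')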

 

๐Ÿ‘‰ CRTR_DT์—์„œ ์—ฐ๋„, ์›”, ์ผ ์ถ”์ถœํ•ด๋ณด๊ธฐ

# ์—ฐ์›”์ผ ์ถ”์ถœ
def extract_date_info(df, date_column_name):
    df['์—ฐ๋„'] = df[date_column_name].dt.year
    df['์›”'] = df[date_column_name].dt.month
    df['์ผ'] = df[date_column_name].dt.day
    return df

subway_dong = extract_date_info(subway_dong, 'CRTR_DT')
subway_dong.head(1)
# subway_dong.info()

๐Ÿ‘‰ ๊ฐ•๋‚จ๊ตฌ ๋ฐ์ดํ„ฐ๋งŒ ์ถ”์ถœ

subway_gangnam = subway_dong[subway_dong['ATDRC_NM'] == '๊ฐ•๋‚จ๊ตฌ']
subway_gangnam.head()
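
To see which administrative dongs survive the filter (each becomes a line in the chart below):

# Number and names of the administrative dongs within Gangnam-gu
print(subway_gangnam['ADMDONG_NM'].nunique())
print(subway_gangnam['ADMDONG_NM'].unique())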

๐Ÿ‘‰ ์ปฌ๋Ÿผ ์ˆœ์„œ ๋ณ€๊ฒฝ(๋ณด๊ธฐ ์ข‹๊ฒŒ!)

cols_list = subway_gangnam.columns[-7:].tolist() + subway_gangnam.columns[:-7].tolist()
subway_gangnam = subway_gangnam[cols_list]

subway_gangnam.head()

 

์‹œ๊ฐํ™”

  • 2023๋…„, ๊ฐ•๋‚จ๊ตฌ ๋‚ด ํ–‰์ •๋™๋ณ„ ์ง€ํ•˜์ฒ  ์‚ฌ์šฉ์ž ์ˆ˜ ๋ฐ์ดํ„ฐ
grouped_df = subway_gangnam[subway_gangnam['์—ฐ๋„'] == 2023].groupby('ADMDONG_NM')

import plotly.graph_objects as go
fig = go.Figure()

# One trace per dong: plot the group's first row across the hourly columns (from position 7 onward)
for dong, df in grouped_df:
    fig.add_trace(go.Scatter(x = df.columns[7:], y = df.iloc[0, 7:], name = dong))
    
fig.update_layout(title = '๊ฐ•๋‚จ๊ตฌ ๋™๋ณ„ ์‹œ๊ฐ„๋‹น ์ง€ํ•˜์ฒ  ์ด์šฉ์ž์ˆ˜',
                  xaxis_title = '์‹œ๊ฐ„',
                  yaxis_title = '์ง€ํ•˜์ฒ  ์ด์šฉ์ž์ˆ˜')
fig.update_xaxes(tickangle=45)
    
fig.show()
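
Note that df.iloc[0, 7:] draws only the first row, i.e. the first 2023 date, of each dong. If a typical day is more useful than a single date, the traces could instead be built from each dong's mean over the hourly columns; a sketch under that assumption:

# Average-day version: one trace per dong, averaging the hourly counts over all 2023 dates
hour_names = [c for c in subway_gangnam.columns if c.endswith('시')]

fig = go.Figure()
for dong, df in grouped_df:
    fig.add_trace(go.Scatter(x=hour_names, y=df[hour_names].mean(), name=dong))

fig.update_layout(title='강남구 동별 시간당 지하철 이용자수 (일평균)',
                  xaxis_title='시간',
                  yaxis_title='지하철 이용자수')
fig.update_xaxes(tickangle=45)
fig.show()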
