Final result
Loading data from the API
- The Seoul open data portal API serves at most 1,000 records per request, so the full dataset cannot be fetched in one call.
- Use a loop that advances the request range in 1,000-record steps, rewriting the URL each time.
- Fetch the JSON payload with requests.get(url).
- Extract only the needed part of the JSON:
- data.get('key_to_fetch_1', {}).get('key_to_fetch_2', [])
- Pick {} or [] as the default so it matches the shape of the value stored under that key (see the short example after this list).
- Store the extracted value in items.
- To end the loop, break as soon as items comes back empty.
- Convert items to a DataFrame and append it to the data_frames list.
- Finally, concat all the DataFrames collected in data_frames.
- Return the concatenated DataFrame.
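As a quick illustration of matching the default to the value's shape, here is a toy payload (not the real API response):

sample = {'tpssSubwayPassenger': {'list_total_count': 1, 'row': [{'CRTR_DT': '20230101'}]}}
rows = sample.get('tpssSubwayPassenger', {}).get('row', [])  # outer value is a dict, inner one a list
empty = sample.get('missing_key', {}).get('row', [])         # returns [] instead of raising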
import requests
import pandas as pd

# Replace YOUR_API_KEY with an API key issued by the Seoul open data portal
base_url = "http://openapi.seoul.go.kr:8088/YOUR_API_KEY/json/tpssSubwayPassenger/"
items_per_page = 1000
data_frames = []
for i in range(1, 1001):  # upper bound on the number of requests; adjust as needed
    start_index = (i - 1) * items_per_page + 1
    end_index = i * items_per_page
    url = f"{base_url}{start_index}/{end_index}/"
    response = requests.get(url)
    data = response.json()
    # Stop once the requested range returns no rows
    items = data.get('tpssSubwayPassenger', {}).get('row', [])
    if not items:
        break
    df = pd.DataFrame(items)
    data_frames.append(df)

# Concatenate all DataFrames into a single DataFrame
combined_df = pd.concat(data_frames, ignore_index=True)

# Now combined_df holds the data from every page
print(combined_df)
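The same pagination pattern is reused for the dong master data below, so it can be factored into a helper. A minimal sketch, assuming the standard Seoul API URL layout (fetch_seoul_api is a hypothetical name, not part of the original code):

def fetch_seoul_api(base_url, service_name, page_size=1000, max_requests=1000):
    # Fetch every page of a Seoul open data API service into one DataFrame
    frames = []
    for i in range(1, max_requests + 1):
        start, end = (i - 1) * page_size + 1, i * page_size
        response = requests.get(f"{base_url}{start}/{end}/")
        response.raise_for_status()  # fail loudly on HTTP errors
        rows = response.json().get(service_name, {}).get('row', [])
        if not rows:  # an empty range means we are past the last record
            break
        frames.append(pd.DataFrame(rows))
    return pd.concat(frames, ignore_index=True)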
Checking the result
Merging the administrative dong (ADMDONG_ID) data
- The ridership rows only carry an ADMDONG_ID, so we need the administrative dong data to see which area each row belongs to.
- Fetch the Seoul eup/myeon/dong master data and store it in combined_dong.
base_url = "http://openapi.seoul.go.kr:8088/57756f69527273303830644c4b4c6f/json/districtEmd/"
items_per_page = 1000
data_frames = []
for i in range(1, 1001):  # upper bound on the number of requests; adjust as needed
    start_index = (i - 1) * items_per_page + 1
    end_index = i * items_per_page
    url = f"{base_url}{start_index}/{end_index}/"
    response = requests.get(url)
    data = response.json()
    # Stop once the requested range returns no rows
    items = data.get('districtEmd', {}).get('row', [])
    if not items:
        break
    df = pd.DataFrame(items)
    data_frames.append(df)

# Concatenate all DataFrames into a single DataFrame
combined_dong = pd.concat(data_frames, ignore_index=True)

# Now combined_dong holds the data from every page
print(combined_dong)
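With the helper sketched earlier, this second fetch would collapse to a single call:

combined_dong = fetch_seoul_api(
    "http://openapi.seoul.go.kr:8088/57756f69527273303830644c4b4c6f/json/districtEmd/",
    'districtEmd')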
Joining the dong data with the subway ridership data
- Join the two tables on the administrative dong ID.
- Use a left join with the subway data as the left table, so no ridership rows are dropped.
subway_dong = pd.merge(combined_df, combined_dong, on='ADMDONG_ID', how='left')
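A quick sanity check after a left join: ridership rows whose ADMDONG_ID had no match in the dong master come back with NaN in the joined columns, e.g. ADMDONG_NM:

# 0 means every ridership row found its dong
print(subway_dong['ADMDONG_NM'].isna().sum())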
Dropping the dong ID column & renaming the hour columns
subway_dong.drop('ADMDONG_ID', axis=1, inplace=True)

# Rename the hourly passenger-count columns to readable hour labels
subway_dong = subway_dong.rename(columns={
    'SBWY_PSGR_CNT_00HH': '00시', 'SBWY_PSGR_CNT_01HH': '01시', 'SBWY_PSGR_CNT_02HH': '02시',
    'SBWY_PSGR_CNT_03HH': '03시', 'SBWY_PSGR_CNT_04HH': '04시', 'SBWY_PSGR_CNT_05HH': '05시',
    'SBWY_PSGR_CNT_06HH': '06시', 'SBWY_PSGR_CNT_07HH': '07시', 'SBWY_PSGR_CNT_08HH': '08시',
    'SBWY_PSGR_CNT_09HH': '09시', 'SBWY_PSGR_CNT_10HH': '10시', 'SBWY_PSGR_CNT_11HH': '11시',
    'SBWY_PSGR_CNT_12HH': '12시', 'SBWY_PSGR_CNT_13HH': '13시', 'SBWY_PSGR_CNT_14HH': '14시',
    'SBWY_PSGR_CNT_15HH': '15시', 'SBWY_PSGR_CNT_16HH': '16시', 'SBWY_PSGR_CNT_17HH': '17시',
    'SBWY_PSGR_CNT_18HH': '18시', 'SBWY_PSGR_CNT_19HH': '19시', 'SBWY_PSGR_CNT_20HH': '20시',
    'SBWY_PSGR_CNT_21HH': '21시', 'SBWY_PSGR_CNT_22HH': '22시', 'SBWY_PSGR_CNT_23HH': '23시',
    'SBWY_PSGR_CNT_24HH': '24시'})
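The same 25 mappings can be generated rather than typed out, assuming the suffix always follows the SBWY_PSGR_CNT_##HH pattern:

hour_map = {f'SBWY_PSGR_CNT_{h:02d}HH': f'{h:02d}시' for h in range(25)}
subway_dong = subway_dong.rename(columns=hour_map)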
Since the data covers only Seoul, drop the SIDO_NM column
# Quick check that only Seoul values are present
subway_dong['SIDO_NM'].unique()
# Then drop the column
subway_dong.drop('SIDO_NM', axis=1, inplace=True)
Check the dtypes, then convert the date column to datetime
The date column CRTR_DT is stored as object (string):
subway_dong.info()
Convert CRTR_DT to datetime:
subway_dong['CRTR_DT'] = pd.to_datetime(subway_dong['CRTR_DT'])
subway_dong.info()
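If the automatic parsing ever misreads the strings, an explicit format is stricter; a sketch assuming the raw CRTR_DT values look like '20230101' (check a sample value first):

# errors='coerce' turns unparseable values into NaT instead of raising
subway_dong['CRTR_DT'] = pd.to_datetime(subway_dong['CRTR_DT'], format='%Y%m%d', errors='coerce')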
Extracting only the Gangnam-gu data
subway_gangnam = subway_dong[subway_dong['ATDRC_NM'] == '강남구'].copy()  # .copy() so later edits don't warn on a slice
subway_gangnam.head()
Extracting year, month, and day from CRTR_DT
# Pull year/month/day into separate columns via the .dt accessor
def extract_date_info(df, date_column_name):
    df['연도'] = df[date_column_name].dt.year
    df['월'] = df[date_column_name].dt.month
    df['일'] = df[date_column_name].dt.day
    return df

subway_gangnam = extract_date_info(subway_gangnam, 'CRTR_DT')
subway_gangnam.head(1)
# subway_gangnam.info()
Reordering the columns (for readability)
# Move the last 7 columns (dong info and the year/month/day parts) to the front
cols_list = subway_gangnam.columns[-7:].tolist() + subway_gangnam.columns[:-7].tolist()
subway_gangnam = subway_gangnam[cols_list]
subway_gangnam.head()
Visualization
- Hourly subway ridership by administrative dong within Gangnam-gu, for 2023.
grouped_df = subway_gangnam[subway_gangnam['연도'] == 2023].groupby('ADMDONG_NM')
import plotly.graph_objects as go

fig = go.Figure()
for dong, df in grouped_df:
    # One trace per dong: the first row's values across the hourly columns
    fig.add_trace(go.Scatter(x=df.columns[7:], y=df.iloc[0, 7:], name=dong))
fig.update_layout(title='강남구 동별 시간당 지하철 이용자수',
                  xaxis_title='시간',
                  yaxis_title='지하철 이용자수')
fig.update_xaxes(tickangle=45)
fig.show()
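Note that df.iloc[0, 7:] plots only the first date of 2023 for each dong. To compare typical daily profiles instead, one could average the hourly columns over the whole year; a sketch assuming the '00시'..'24시' columns created above hold numeric counts:

# Mean ridership per hour across all 2023 dates, per dong
hour_cols = [f'{h:02d}시' for h in range(25)]
avg_2023 = (subway_gangnam[subway_gangnam['연도'] == 2023]
            .groupby('ADMDONG_NM')[hour_cols].mean())

fig = go.Figure()
for dong, row in avg_2023.iterrows():
    fig.add_trace(go.Scatter(x=hour_cols, y=row.values, name=dong))
fig.show()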