
[Mini Project] 5. Merging Public Transit (Subway, Bus) Location Data

by ISLA! 2023. 9. 11.

🥑 Loading the Datasets

  • Subway station latitude and longitude
  • Bus stop latitude and longitude
  • Commercial district code latitude and longitude
import pandas as pd

market_area = pd.read_csv('./상권_총합본(Center_Points).csv')
subway_stations = pd.read_csv('./최종_지하철_3개년_승하차.csv')
bus_stations = pd.read_csv('./최종_버스데이터+버스정류장_위경도_값(Bus_Points).csv')

 

 

➡ market_area (commercial district) DataFrame: extract the latitude and longitude values for each row

# Split the string on the comma to create the '경도' (longitude) and '위도' (latitude) columns.
# This assumes the source string is ordered "longitude, latitude".
market_area[['경도', '위도']] = market_area['상권_중앙위경도_값'].str.split(', ', expand=True)
market_area[['경도', '위도']] = market_area[['경도', '위도']].astype(float)
market_area.head(3)

 

➡ market_area (commercial district) DataFrame: extract the longitude and latitude for each district code

Store the district center coordinates in a list (duplicates removed)

markets = market_area[['상권_코드', '경도', '위도']]
markets = markets.drop_duplicates().sort_values(by='상권_코드')
# Each tuple is (경도, 위도), i.e. (longitude, latitude)
market_list = list(zip(markets['경도'], markets['위도']))
market_list[:3]

 

➡ subway & bus DataFrames: extract the longitude and latitude of each station and stop

Store the subway station coordinates in a list (duplicates removed)

sub_temp = subway_stations[['위도', '경도']]
sub_temp = sub_temp.drop_duplicates()

# Each tuple is (경도, 위도), i.e. (longitude, latitude)
sub_list = list(zip(sub_temp['경도'], sub_temp['위도']))
sub_list[:3]
len(sub_list)

Store the bus stop coordinates in a list (duplicates removed)

# info() prints its summary and returns None, so call it separately from display()
display(bus_stations.head(3))
bus_stations.info()
# Drop rows that contain any missing value
bus_stations = bus_stations.dropna()
bus_stations.info()

bus_temp = bus_stations[['X좌표', 'Y좌표']]
bus_temp = bus_temp.drop_duplicates()

# Each tuple is (Y좌표, X좌표)
bus_list = list(zip(bus_temp['Y좌표'], bus_temp['X좌표']))
bus_list[:3]
len(bus_list)
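
One thing worth checking before measuring distances: geopy's geodesic reads each point as (latitude, longitude), while the lists above are built from differently named columns. A minimal sanity check might look like the sketch below. The helper name looks_like_lat_lon and the rough South Korea bounding box are assumptions made for illustration; they are not part of the original notebook.

```python
def looks_like_lat_lon(points, lat_range=(33.0, 39.0), lon_range=(124.0, 132.0)):
    """Return True if every point reads as (latitude, longitude).

    The default ranges are a rough bounding box around South Korea
    (an assumption for this sketch, not a value taken from the data).
    """
    return all(
        lat_range[0] <= lat <= lat_range[1] and lon_range[0] <= lon <= lon_range[1]
        for lat, lon in points
    )

# Hypothetical sample points near Seoul City Hall
ordered = [(37.5665, 126.9780)]   # (latitude, longitude)
swapped = [(126.9780, 37.5665)]   # (longitude, latitude)

ordered_ok = looks_like_lat_lon(ordered)
swapped_ok = looks_like_lat_lon(swapped)
```

If the check returns False for one of the lists, its tuples are probably ordered (longitude, latitude) and should be swapped before being passed to geodesic.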

 

 

➑ μƒκΆŒ μ’Œν‘œ 500λ―Έν„° 이내 μ§€ν•˜μ² μ—­, λ²„μŠ€μ •λ₯˜μž₯ 개수λ₯Ό λ°˜ν™˜

  • market_summary에 기쀀이 λ˜λŠ” μƒκΆŒμ½”λ“œμ™€ μƒκΆŒλ³„ μœ„λ„μ™€ 경도 μ €μž₯
market_summary = markets[['μƒκΆŒ_μ½”λ“œ', '경도', 'μœ„λ„']]
market_summary.head(3)
  • geopy 라이브러리λ₯Ό μ‚¬μš©ν•˜μ—¬, μƒκΆŒμ’Œν‘œ - μ§€ν•˜μ² μ—­/λ²„μŠ€μ •λ₯˜μž₯ μ‚¬μ΄μ˜ 거리λ₯Ό μΈ‘μ •ν•˜μ—¬
  • 반경 500λ―Έν„° μ•ˆμ— λ“€ 경우, μ§€ν•˜μ² μ—­/λ²„μŠ€μ •λ₯˜μž₯ 개수λ₯Ό count ν•˜λŠ” ν•¨μˆ˜ μ •μ˜
  • market_summary λ°μ΄ν„°ν”„λ ˆμž„μ— μƒκΆŒμ½”λ“œλ³„ 반경 500λ―Έν„° μ§€ν•˜μ² μ—­/λ²„μŠ€μ •λ₯˜μž₯ 개수 컬럼 μΆ”κ°€
from geopy.distance import geodesic

RADIUS_M = 500  # search radius in meters


def within_radius(center, target):
    # geodesic expects each point as a (latitude, longitude) tuple
    return geodesic(center, target).meters <= RADIUS_M


def count_stations_within_radius(market_list, station_list):
    """Count the stations that lie within the radius of at least one market."""
    return sum(
        any(within_radius(market, station) for market in market_list)
        for station in station_list
    )


# One count per district, in the same row order as market_summary
market_summary['지하철역_수'] = [count_stations_within_radius([market], sub_list) for market in market_list]
market_summary['버스정류장_수'] = [count_stations_within_radius([market], bus_list) for market in market_list]
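
Calling geodesic once for every district and station pair gets slow as the lists grow. As a rough alternative, the same 500-meter test can be sketched with the haversine formula using only the standard library. This swaps geodesic (ellipsoidal) distance for a spherical approximation, which differs by a fraction of a percent and matters little at this radius. The function names and sample points below are hypothetical, and every point is assumed to be a (latitude, longitude) tuple.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation


def haversine_m(p, q):
    """Great-circle distance in meters between two (latitude, longitude) points."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def count_within_radius(center, points, radius_m=500):
    """Count the points that lie within radius_m meters of center."""
    return sum(haversine_m(center, p) <= radius_m for p in points)


# Hypothetical district center and three stops: 0 m, about 300 m, about 1.5 km away
center = (37.5665, 126.9780)
stops = [(37.5665, 126.9780), (37.5692, 126.9780), (37.5800, 126.9780)]
n_within = count_within_radius(center, stops)
```

With lists shaped like the ones above, a per-district count would then be a list comprehension over the district centers, one call to count_within_radius per center.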

 


μƒκΆŒ λΆ„λ₯˜νŒ€

  • k-means clustering 으둜 μœ μ˜λ―Έν•΄λ³΄μ΄λŠ” λ³€μˆ˜λ³„λ‘œ ꡰ집화 μ‹œλ„ ->> μ œλŒ€λ‘œ 된 ꡰ집이 λ‚˜μ˜€μ§€ μ•ŠλŠ” 쀑
  • 집객 μ‹œμ„€ 선택 : λ‹€μ–‘ν•œ μƒκΆŒλΆ„μ„ μ„œλΉ„μŠ€ λ‚΄μ˜ 집객 μ‹œμ„€ 쀑, μ˜λ―Έμžˆμ„ κ²ƒμœΌλ‘œ μ˜ˆμƒλ˜λŠ” 집객 μ‹œμ„€ λΆ„λ₯˜, 선택
  • 골λͺ©μƒκΆŒμ˜ 배후지 데이터(μ§‘κ°μ‹œμ„€ ν•œμ •) 병합 μ˜ˆμ •

 


09.11 update ➡ The result data from this post is not used

  • We decided not to use the counts of subway stations and bus stops within a radius of each district's center point.
  • The reason is that the rest of our data is aggregated over each district's exact area. For consistency, we dropped the radius concept and will instead count the subway stations and bus stops that fall 'within the area' of each district.