[Kaggle] E-commerce Data Analysis 3 (CRM Analytics 🛍️🛒)

by ISLA! 2023. 10. 8.

This post continues from E-commerce Data Analysis parts 1 and 2.

With preprocessing done, let's run an RFM analysis on the cleaned data.

 

 

What is RFM?

  • RFM stands for "Recency, Frequency, Monetary" and is a core concept in customer segmentation and customer analysis.
  • The three metrics measure a customer's purchasing behavior and value: how recently they bought, how often they buy, and how much they spend.

Computing the RFM values

  • To compute Recency, we need the number of days that have passed between each customer's last purchase and "now" (a reference date).
  • The maximum invoice date gives the most recent purchase in the dataset.
print(df['InvoiceDate'].max())
2011-12-09 12:50:00

 

  • Use datetime to set a reference date. Here it is 2011-12-11, two days after the last invoice date in the data.
  • Group the data by customer ID.
    • For InvoiceDate, take the difference between the reference date and the customer's latest invoice date, and return it as a number of days.
    • For InvoiceNo, count the unique values to get the total number of purchases (frequency).
    • Finally, sum TotalPrice to get Monetary.
import datetime as dt
import pandas as pd  # used by pd.qcut in the scoring step

# Reference date for "today": 2011-12-11, two days after the last invoice date
today_date = dt.datetime(2011, 12, 11)

rfm = df.groupby('CustomerID').agg({
    'InvoiceDate': lambda x: (today_date - x.max()).days,  # whole days since the customer's last purchase
    'InvoiceNo': lambda x: x.nunique(),                    # number of distinct invoices (purchases)
    'TotalPrice': lambda x: x.sum(),                       # total amount spent
})
rfm
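
To see what this aggregation produces, here is a minimal sketch on a tiny made-up table. The toy rows and the names toy, reference_date, and toy_rfm are assumptions for illustration only; the column names follow the ones used in this post. It uses pandas named aggregation rather than a dict, which sets the output column names in the same step.

```python
import datetime as dt
import pandas as pd

# Hypothetical toy data; only the column names mirror the real dataset.
toy = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2, 2],
    "InvoiceNo": ["A1", "A1", "B1", "B2", "B3"],
    "InvoiceDate": pd.to_datetime([
        "2011-12-01 10:00", "2011-12-01 10:00",
        "2011-11-01 09:00", "2011-12-05 15:30", "2011-12-09 12:50",
    ]),
    "TotalPrice": [10.0, 5.0, 20.0, 30.0, 40.0],
})

# Same reference date as in the post.
reference_date = dt.datetime(2011, 12, 11)

toy_rfm = toy.groupby("CustomerID").agg(
    recency=("InvoiceDate", lambda x: (reference_date - x.max()).days),
    frequency=("InvoiceNo", "nunique"),  # two rows on invoice A1 count once
    monetary=("TotalPrice", "sum"),
)
print(toy_rfm)
```

Note that .days keeps only whole days: customer 2 last bought on 2011-12-09 at 12:50, which is 1 day and about 11 hours before midnight on 2011-12-11, so the recency is 1 rather than 2.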

 

  • Rename the columns.
  • Keep only customers whose total order amount is greater than 0, then reset the index.
rfm.columns = ['recency', 'frequency', 'monetary']

rfm = rfm[rfm['monetary'] > 0]
rfm = rfm.reset_index()
rfm.head()

 

Computing the RFM Scores

  • Now let's turn the values computed above into RFM scores by defining a function.
  • pd.qcut() splits the data into quantile bins, so each bin holds roughly the same number of customers.
  • recency_score : split into 5 bins. A larger value is worse (more time has passed since the last purchase), so the labels are assigned in reverse order, from 5 down to 1.
  • frequency_score : first rank the frequency column with .rank(), setting method to 'first' so that tied values are ranked in order of appearance (among customers with the same purchase frequency, the one that appears first gets the smaller rank number). This makes every value unique, which avoids the duplicate-bin-edge error pd.qcut() can raise when many customers share the same frequency.
    • More frequent buyers are better, so the labels run from 1 up to 5.
  • monetary_score : split into 5 bins. A larger value (higher spend) is better, so the labels run from 1 up to 5.
  • RFM_SCORE : concatenate the score columns as strings (R and F only), so a customer with recency_score 5 and frequency_score 4 gets "54". This is a two-character code, not a numeric sum.
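def get_rfm_scores(df):
    # Work on a copy so the input DataFrame is not modified
    df_ = df.copy()

    # Smaller recency (a more recent purchase) is better, so the lowest bin gets 5
    df_['recency_score'] = pd.qcut(df_['recency'], 5, labels=[5, 4, 3, 2, 1])
    # Rank first to break ties, then bin; the most frequent buyers get 5
    df_['frequency_score'] = pd.qcut(df_['frequency'].rank(method='first'), 5, labels=[1, 2, 3, 4, 5])
    # Higher spend is better, so the highest bin gets 5
    df_['monetary_score'] = pd.qcut(df_['monetary'], 5, labels=[1, 2, 3, 4, 5])

    # Concatenate the R and F scores as strings, e.g. '5' + '4' -> '54'
    df_['RFM_SCORE'] = df_['recency_score'].astype(str) + df_['frequency_score'].astype(str)

    return df_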
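
Why rank before binning? The sketch below, on made-up frequencies, shows what can happen when many customers share the same value. The Series freq and the names raised, ranked, and scores are hypothetical and not part of the original notebook.

```python
import pandas as pd

# Hypothetical frequencies: most customers bought exactly once.
freq = pd.Series([1, 1, 1, 1, 1, 1, 2, 3, 5, 8])

# Binning the raw values fails: several quantile edges fall on the same value (1).
try:
    pd.qcut(freq, 5, labels=[1, 2, 3, 4, 5])
    raised = False
except ValueError:
    raised = True

# rank(method="first") breaks ties by order of appearance, so every rank is unique.
ranked = freq.rank(method="first")
scores = pd.qcut(ranked, 5, labels=[1, 2, 3, 4, 5])

print(raised)
print(scores.astype(int).tolist())
```

One side effect worth keeping in mind: the six customers with a frequency of 1 end up with scores 1, 1, 2, 2, 3, 3, so identical frequencies can receive different scores depending on row order.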

 

🚀 Apply the function

rfm = get_rfm_scores(rfm)
rfm
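
As a quick sanity check of how the scores combine, the sketch below applies the same recency and frequency steps to five made-up customers. The values and the name toy are assumptions for illustration; with five distinct values and five bins, each customer lands in its own bin.

```python
import pandas as pd

# Hypothetical customers, ordered from best (recent, frequent) to worst.
toy = pd.DataFrame({
    "recency": [1, 10, 30, 90, 300],
    "frequency": [12, 7, 4, 2, 1],
})

# Reverse labels for recency: the smallest value gets the highest score.
toy["recency_score"] = pd.qcut(toy["recency"], 5, labels=[5, 4, 3, 2, 1])
# Rank, then bin: the largest frequency gets the highest score.
toy["frequency_score"] = pd.qcut(
    toy["frequency"].rank(method="first"), 5, labels=[1, 2, 3, 4, 5]
)
# String concatenation, not addition: '5' + '5' gives '55'.
toy["RFM_SCORE"] = toy["recency_score"].astype(str) + toy["frequency_score"].astype(str)

print(toy["RFM_SCORE"].tolist())
```

Because RFM_SCORE is a string, "55" marks a customer who scores 5 on both recency and frequency, and it should be read as a two-part code rather than compared as a number.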
