
[Kaggle] E-commerce Data Analysis 4 (CRM Analytics 🛍️🛒)

ISLA! 2023. 10. 8. 14:04

์•ž์˜ ํฌ์ŠคํŒ… [Kaggle] ์ด์ปค๋จธ์Šค ๋ฐ์ดํ„ฐ ๋ถ„์„ 3 (CRM Analytics ๐Ÿ›๏ธ๐Ÿ›’)๊ณผ ์ด์–ด์ง‘๋‹ˆ๋‹ค.

Here we segment customers based on their RFM scores.

 


Segmentation

Using regular expressions, we split customers into groups based on their RFM_SCORE.

r : this prefix marks the pattern as a raw string literal, so \ is not treated as an escape character.
[1-2] : a character class that matches exactly one character, either 1 or 2, at this position.
[3-4] : likewise, matches exactly one character, either 3 or 4, at this position.
In other words, r'[1-2][3-4]' matches a 1 or 2 immediately followed by a 3 or 4.
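A quick check with Python's standard `re` module shows how such a pattern behaves. The sample strings are made up for illustration. Note the last one: an unanchored pattern such as `r'[1-2][3-4]'` is also found inside a longer string, which is harmless only while every score is exactly two characters long.

```python
import re

# The r prefix keeps backslashes literally: r'\d' is two characters, '\' and 'd'
raw_digit = r'\d'

pattern = r'[1-2][3-4]'  # a 1 or 2, immediately followed by a 3 or 4
samples = ['13', '24', '31', '15', '513']  # made-up example strings

# re.search finds the pattern anywhere; re.fullmatch requires the whole string to match
results = {s: (bool(re.search(pattern, s)), bool(re.fullmatch(pattern, s)))
           for s in samples}

for s, (found, whole) in results.items():
    print(f'{s}: search={found}, fullmatch={whole}')
```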

 

seg_map = {r'[1-2][1-2]' : 'hibernating',
          r'[1-2][3-4]' : 'at_Risk',
          r'[1-2]5':'cant_loose',
          r'3[1-2]': 'about_to_sleep',
          r'33':'need_attention',
          r'[3-4][4-5]': 'loyal_customers',
          r'41' : 'promising',
          r'51':'new_customers',
          r'[4-5][2-3]' : 'potential_loyalists',
          r'5[4-5]':'champions'}
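Because the map is written by hand, it is worth checking that the ten patterns cover all 25 combinations of two scores from 1 to 5, and that no combination is claimed by two segments. A minimal sketch with the standard `re` module, assuming RFM_SCORE is a two-character string built from two scores between 1 and 5:

```python
import re

# seg_map as defined above (label spelled 'at_Risk')
seg_map = {r'[1-2][1-2]': 'hibernating',
           r'[1-2][3-4]': 'at_Risk',
           r'[1-2]5': 'cant_loose',
           r'3[1-2]': 'about_to_sleep',
           r'33': 'need_attention',
           r'[3-4][4-5]': 'loyal_customers',
           r'41': 'promising',
           r'51': 'new_customers',
           r'[4-5][2-3]': 'potential_loyalists',
           r'5[4-5]': 'champions'}

# For every two-digit score, list the segments whose pattern matches the whole string
matches = {}
for first in range(1, 6):
    for second in range(1, 6):
        score = f'{first}{second}'
        matches[score] = [name for pattern, name in seg_map.items()
                          if re.fullmatch(pattern, score)]

unmatched = [s for s, names in matches.items() if not names]
ambiguous = {s: names for s, names in matches.items() if len(names) > 1}
print('unmatched:', unmatched)
print('ambiguous:', ambiguous)
```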

 

  • Each customer is assigned a segment according to the regular expressions above.
  • To do this, seg_map is applied to RFM_SCORE with the replace() function.
  • The regex = True option tells replace() to treat the dictionary keys as regular expressions when performing the replacement.
rfm['segment'] = rfm['RFM_SCORE'].replace(seg_map, regex = True)
rfm.head()
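To see what `replace(seg_map, regex=True)` does in isolation, here is a small sketch on a hypothetical Series of scores standing in for `rfm['RFM_SCORE']`. Each key is used as a regular expression and the matched text is replaced by the segment name. Since the patterns are not anchored, only the matching part of a string is replaced; for two-character scores that is the whole string. If a score could ever be longer, anchoring the keys with `^` and `$` would avoid partial replacements.

```python
import pandas as pd

# seg_map as defined above (label spelled 'at_Risk')
seg_map = {r'[1-2][1-2]': 'hibernating',
           r'[1-2][3-4]': 'at_Risk',
           r'[1-2]5': 'cant_loose',
           r'3[1-2]': 'about_to_sleep',
           r'33': 'need_attention',
           r'[3-4][4-5]': 'loyal_customers',
           r'41': 'promising',
           r'51': 'new_customers',
           r'[4-5][2-3]': 'potential_loyalists',
           r'5[4-5]': 'champions'}

# Hypothetical two-character scores (not the post's data)
scores = pd.Series(['11', '13', '25', '33', '41', '55'], name='RFM_SCORE')

# Dictionary keys are treated as regex patterns, values as replacements
segments = scores.replace(seg_map, regex=True)
print(pd.DataFrame({'RFM_SCORE': scores, 'segment': segments}))
```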

 

Segmentation Map

  • We use squarify, a Python library that draws data as a treemap.
    • The area of each rectangle is proportional to the size of the value it represents.
  • squarify.plot(): creates and draws the treemap.
  • sizes: the size of each rectangle, taken from the counts stored in the variable `segments`.
  • label: the text shown on each rectangle, one label per entry in `sizes` and in the same order.
  • color: the color of each rectangle, given as a list of colors.
  • pad: `True` or `False`; `True` adds a small gap between rectangles, which makes the treemap easier to read.
  • bar_kwargs: a dictionary of properties for the rectangles. Here `alpha` is set to 1, so the rectangles are fully opaque.
  • text_kwargs: a dictionary of properties for the label text. Here `fontsize` is set to 15.
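squarify.plot() does not use the sizes as drawing coordinates directly. By default (norm_x=100, norm_y=100) it first rescales them so they add up to the area of a 100 × 100 canvas, so each rectangle's area equals the segment's share of all customers. A minimal standard-library sketch of that rescaling, with made-up counts and a hypothetical helper name `normalize_to_area`:

```python
# Made-up customer counts per segment (not the post's data), largest first
counts = {'hibernating': 50, 'champions': 30, 'new_customers': 20}


def normalize_to_area(sizes, width=100.0, height=100.0):
    """Hypothetical helper: scale sizes so they sum to width * height, keeping their ratios."""
    total = sum(sizes)
    return [size * width * height / total for size in sizes]


areas = normalize_to_area(list(counts.values()))
for name, area in zip(counts, areas):
    # area / 10000 is the segment's share of all customers
    print(f'{name}: area={area:.0f}, share={area / 10000:.0%}')
```

This is also why the treemap's x and y axes run from 0 to 100: they are layout coordinates, not Recency or Frequency values.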

 

import matplotlib.pyplot as plt
import squarify

# Number of customers per segment, largest first
segments = rfm['segment'].value_counts().sort_values(ascending = False)
segments  # in a notebook, this line displays the counts

# Get the current figure (the canvas)
fig = plt.gcf()
fig.set_size_inches(16, 10)

# Add a subplot to fig (it can also be called like fig.add_subplot(2, 2, 1))
ax = fig.add_subplot()

# Visualize the data as a treemap
squarify.plot(sizes = segments,
              # Labels come from segments.index so each name stays paired with its own count;
              # seg_map.values() is in dict order, not in the sorted count order
              label = segments.index,
              color = ["#AFB6B5", "#F0819A", "#926717", "#F0F081", "#81D5F0",
                       "#C78BE5", "#748E80", "#FAAF3A", "#7B8FE4", "#86E8C0"],
              pad = False,
              bar_kwargs = {'alpha': 1},
              text_kwargs = {'fontsize': 15},
              ax = ax)

plt.title('Customer Segmentation Map', fontsize = 20)
# The axes are treemap layout coordinates (0 to 100), not Frequency/Recency values
ax.axis('off')
plt.show()
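squarify.plot() pairs each label with a rectangle purely by position, so the labels must come in the same order as the sizes. The sketch below uses `collections.Counter` on made-up segment labels to mimic `value_counts()`, and shows what happens when the labels are taken from an unrelated ordering such as the insertion order of `seg_map`. In pandas, passing `label=segments.index` keeps each name next to its own count.

```python
from collections import Counter

# Made-up segment labels for seven customers (not the post's data)
customer_segments = ['hibernating', 'champions', 'hibernating', 'at_Risk',
                     'hibernating', 'champions', 'new_customers']

# Like value_counts(): (segment, count) pairs, largest count first
counts = Counter(customer_segments).most_common()
sizes = [n for _, n in counts]

# Correct: each label comes from the same pair as its size
labels_ok = [name for name, _ in counts]

# Wrong: labels listed in an unrelated order, as with seg_map.values()
dict_order = ['hibernating', 'at_Risk', 'champions', 'new_customers']

print('aligned:   ', list(zip(labels_ok, sizes)))
print('misaligned:', list(zip(dict_order, sizes)))
```

In the misaligned version, at_Risk is drawn with champions' count. If some segment never occurs, the dict-order list is also longer than the sizes, and the extra labels are silently dropped.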

 

 

Segmentation Model Evaluation

# Use the following evaluation metrics
from sklearn.metrics import (silhouette_score,
                             calinski_harabasz_score,
                             davies_bouldin_score)

 

  • Above, we derived recency_score and frequency_score and built segments from different combinations of the two.
  • To check whether the segmentation defined by seg_map works well, we apply clustering evaluation metrics.
  • silhouette_score(X, labels) takes the data X and the cluster label assigned to each data point (labels) and returns the Silhouette Score; round(..., 3) rounds the result to three decimal places for printing.
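To see what silhouette_score computes, here is a small pure-Python sketch of the mean silhouette coefficient with Euclidean distance (scikit-learn's default metric). For each point, a is the mean distance to the other members of its own cluster and b is the mean distance to the members of the nearest other cluster; the point's score is (b - a) / max(a, b), and the overall score is the mean over all points. Points in a one-member cluster count as 0, following scikit-learn's convention. The toy points are made up, and this O(n²) version is for understanding only.

```python
import math


def _mean_distance(points, labels, i, cluster):
    """Mean Euclidean distance from points[i] to the other points labelled `cluster`."""
    dists = [math.dist(points[i], points[j])
             for j in range(len(points))
             if j != i and labels[j] == cluster]
    return sum(dists) / len(dists)


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (needs at least two clusters)."""
    clusters = set(labels)
    total = 0.0
    for i, own in enumerate(labels):
        if labels.count(own) == 1:
            continue  # singleton cluster: its score is defined as 0
        a = _mean_distance(points, labels, i, own)  # cohesion within own cluster
        b = min(_mean_distance(points, labels, i, c)  # distance to nearest other cluster
                for c in clusters if c != own)
        total += (b - a) / max(a, b)
    return total / len(points)


# Made-up 2-D points: two tight pairs far apart
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette(points, ['a', 'a', 'b', 'b'])  # clusters follow the geometry
bad = silhouette(points, ['a', 'b', 'a', 'b'])   # clusters cut across it
print(round(good, 3), round(bad, 3))
```

The labelling that follows the geometry scores about 0.9, while the one that cuts across it comes out negative, matching the interpretation of the -1 to 1 range.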

 

  • Silhouette Score: ranges from -1 to 1. Values close to 1 mean well-separated clusters, values near 0 mean overlapping clusters, and negative values suggest points assigned to the wrong cluster.
  • Calinski Harabasz Score: the ratio of between-cluster dispersion to within-cluster dispersion. Higher is better.
  • Davies Bouldin Score: for each cluster, the worst-case ratio of within-cluster scatter to the distance between cluster centroids, averaged over all clusters. Lower is better, with 0 as the minimum.
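In the same spirit, here are pure-Python sketches of the other two metrics, following the definitions in the scikit-learn documentation. Calinski-Harabasz divides the between-cluster dispersion (size-weighted squared distances from each centroid to the overall mean, over k - 1) by the within-cluster dispersion (squared distances from each point to its own centroid, over n - k). Davies-Bouldin takes, for each cluster i, the largest value of (s_i + s_j) / d(c_i, c_j) over the other clusters j, where s is the mean distance of a cluster's points to its centroid, and averages these. The toy points are made up, and edge cases such as zero within-cluster dispersion are not handled.

```python
import math
from collections import defaultdict


def _groups_and_centroids(points, labels):
    """Group points by label and compute each group's centroid."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    centroids = {lab: tuple(sum(coord) / len(ps) for coord in zip(*ps))
                 for lab, ps in groups.items()}
    return groups, centroids


def calinski_harabasz(points, labels):
    groups, cents = _groups_and_centroids(points, labels)
    n, k = len(points), len(groups)
    overall = tuple(sum(coord) / n for coord in zip(*points))
    # Between-cluster dispersion: size-weighted squared centroid-to-mean distances
    between = sum(len(ps) * math.dist(cents[lab], overall) ** 2
                  for lab, ps in groups.items())
    # Within-cluster dispersion: squared point-to-own-centroid distances
    within = sum(math.dist(p, cents[lab]) ** 2
                 for lab, ps in groups.items() for p in ps)
    return (between / (k - 1)) / (within / (n - k))


def davies_bouldin(points, labels):
    groups, cents = _groups_and_centroids(points, labels)
    # s_i: mean distance of a cluster's points to its centroid
    scatter = {lab: sum(math.dist(p, cents[lab]) for p in ps) / len(ps)
               for lab, ps in groups.items()}
    labs = list(groups)
    worst = [max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                 for j in labs if j != i)
             for i in labs]
    return sum(worst) / len(labs)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]  # made-up 2-D points
labels = ['a', 'a', 'b', 'b']
ch = calinski_harabasz(points, labels)
db = davies_bouldin(points, labels)
print(ch, db)
```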
print(' RFM Model Evaluation '.center(70, '='))
X = rfm[['recency_score', 'frequency_score']]
labels = rfm['segment']

print(f'Number of observations : {X.shape[0]}')
print(f'Number of segments : {labels.nunique()}')
print(f'Silhouette Score : {round(silhouette_score(X, labels), 3)}')
print(f'Calinski Harabasz Score: {round(calinski_harabasz_score(X, labels), 3)}')
print(f'Davies Bouldin Score: {round(davies_bouldin_score(X, labels), 3)} \n{70*"="}')

 

 

👉 Based on these scores, the customer segmentation (clustering) appears to be reasonably good.
      Next, let's take a closer look at the segmentation results themselves.

      (Continued in the next post.)


 
