[Kaggle] E-commerce Data Analysis 5 (CRM Analytics 🛍️🛒)

by ISLA! 2023. 10. 8.

In the previous post, I split the customers into segments and checked the overall size of each one.

Now let's look at each segment in more detail, using R, F, and M as the basis.

 


Segment Analysis

1. Descriptive statistics of R, F, M by segment

  • Group by segment and check the mean, standard deviation, maximum, and minimum of the r, f, and m values.
# Summary statistics per segment; a list (unlike a set) keeps the column order fixed
rfm[['recency', 'monetary', 'frequency', 'segment']] \
        .groupby('segment').agg(['mean', 'std', 'max', 'min'])
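To make the shape of that result easier to picture, here is a minimal sketch on made-up data. The `rfm_toy` table and the segment names `Champions` and `Hibernating` are assumptions for illustration only; they are not the values from the Kaggle dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for the `rfm` table built in the previous post.
rfm_toy = pd.DataFrame({
    "recency":   [5, 10, 200, 300],
    "frequency": [12, 8, 1, 2],
    "monetary":  [900.0, 700.0, 50.0, 30.0],
    "segment":   ["Champions", "Champions", "Hibernating", "Hibernating"],
})

# One row per segment, one (feature, statistic) column pair per aggregation.
stats = (
    rfm_toy.groupby("segment")[["recency", "frequency", "monetary"]]
    .agg(["mean", "std", "min", "max"])
)

# Selecting a top-level column returns the four statistics for that feature.
print(stats["recency"])

# A single cell is addressed with a (feature, statistic) tuple.
print(stats.loc["Champions", ("frequency", "mean")])
```

The columns come back as a two-level index, so `stats["recency"]` gives the recency statistics for every segment at once.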

 

 

2. Visualizing the distribution of segments

  • So far we have seen how each segment behaves.
  • Now let's visualize the number of customers in each segment. We also divide each segment's count by the total number of customers and show that share on the chart.
import matplotlib.pyplot as plt
import seaborn as sns

# `rfm` is the DataFrame with a 'segment' column built in the previous post
plt.figure(figsize = (18, 8))
palette = 'Set2'

ax = sns.countplot(data = rfm,
                  x = 'segment',
                  palette = palette)

# Label each bar with its share of all customers
total = len(rfm)
for patch in ax.patches:
    percentage = '{:.1f}%'.format(100 * patch.get_height()/total)
    x = patch.get_x() + patch.get_width()/2
    y = patch.get_y() + patch.get_height() * 1.01
    ax.annotate(percentage, (x, y), size = 14, ha = 'center')

plt.title('Number of Customers by Segments', size = 16)
plt.xlabel('Segment', size = 14)
plt.ylabel('Count', size = 14)
plt.xticks(size = 12)
plt.yticks(size = 12)
plt.show()
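The percentages written on the bars can also be checked as plain numbers, without a chart. A minimal sketch, assuming a hypothetical `segments` Series with made-up segment names in place of `rfm['segment']`:

```python
import pandas as pd

# Hypothetical segment labels; in the post this would be rfm['segment'].
segments = pd.Series(
    ["Champions", "Champions", "Hibernating", "At Risk"], name="segment"
)

# Absolute counts and the share of the total, in percent.
counts = segments.value_counts()
shares = (segments.value_counts(normalize=True) * 100).round(1)

# The two Series are aligned on the segment name.
summary = pd.DataFrame({"count": counts, "share_pct": shares})
print(summary)
```

`value_counts(normalize=True)` does the same division by the total that the annotation loop performs for each bar.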

 

 

3. Visualizing R and F values by segment

  • Look at the relationship between recency and frequency, and how they are distributed, for each segment (scatterplot)
    • Pass segment as hue
plt.figure(figsize = (18, 8))

sns.scatterplot(
    data = rfm, x = 'recency', y = 'frequency', hue = 'segment', palette = palette, s = 60)
plt.title('Recency & Frequency by Segments', size = 16)
plt.xlabel('Recency', size = 12)
plt.ylabel('Frequency', size = 12)
plt.xticks(size = 10)
plt.yticks(size = 10)
plt.legend(loc = 'best', fontsize = 14, title = '- Segments -', title_fontsize = 14)
plt.show()

 

 

4. Visualizing R and F values by segment (boxplot)

  • monetary was not part of the RFM score in this exercise, so it is left out of the plots.
  • Let's also use box plots to see how the R and F values look for each segment.
fig, axes = plt.subplots(1, 2, figsize = (9, 6))
fig.suptitle('RFM Segment Analysis', size = 14)

feature_list = ["recency", "frequency"]

for idx, col in enumerate(feature_list):
    sns.boxplot(ax = axes[idx], data = rfm, x = 'segment', y = col, palette = palette)
    # Rotate the segment labels so they do not overlap
    axes[idx].tick_params(axis = 'x', rotation = 60)

    if idx == 1:
        # Limit the frequency axis to the 0-30 range
        axes[idx].set_ylim([0, 30])

plt.tight_layout()
plt.show()
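A box plot draws the quartiles of each group, and seaborn's default whiskers reach out to 1.5 times the interquartile range. The sketch below computes those numbers directly on made-up data; `rfm_toy` and the segment names `A` and `B` are hypothetical. It also shows how one extreme frequency can stretch an axis, which is presumably why the frequency panel above is limited to 0-30.

```python
import pandas as pd

# Hypothetical toy data; segment "B" contains one extreme frequency value.
rfm_toy = pd.DataFrame({
    "segment":   ["A"] * 5 + ["B"] * 5,
    "frequency": [1, 2, 3, 4, 5, 2, 4, 6, 8, 40],
})

# Quartiles per segment, reshaped so each quantile becomes a column.
q = (
    rfm_toy.groupby("segment")["frequency"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
q.columns = ["q1", "median", "q3"]

# Interquartile range and the upper limit used for the 1.5 * IQR whisker rule.
q["iqr"] = q["q3"] - q["q1"]
q["upper_fence"] = q["q3"] + 1.5 * q["iqr"]
print(q)

# Values above the fence are the points a box plot draws as outliers.
fence = rfm_toy["segment"].map(q["upper_fence"])
outliers = rfm_toy[rfm_toy["frequency"] > fence]
print(outliers)
```

In this toy data the only point beyond the fence is the value 40 in segment `B`.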

 

 

5. Visualizing R and F values by segment (histplot)

  • Color each segment differently and look at how the R and F values are distributed.
fig, axes = plt.subplots(2, 1, figsize = (12, 8))
fig.suptitle('RFM Segment Analysis', size = 14)
feature_list = ['recency', 'frequency']

for idx, col in enumerate(feature_list):
    sns.histplot(ax = axes[idx], data = rfm, x = col, hue = 'segment', palette = palette)

    if idx == 1:
        # Limit the frequency axis to the 0-30 range
        axes[idx].set_xlim([0, 30])


plt.tight_layout()
plt.show()
