[Kaggle] E-commerce Data Analysis 5 (CRM Analytics)
ISLA!
2023. 10. 8. 15:23
In the previous post, we divided customers into segments and checked the overall size of each.
Now let's look at each segment in more detail, based on R, F, and M.
Segment Analysis
1. Descriptive statistics of R, F, M by segment
- Check the mean, standard deviation, maximum, and minimum of the R, F, and M values for each segment.
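# Per-segment mean, standard deviation, min and max of R, F, M
# (a list keeps the statistic order fixed; a set {...} has no guaranteed order)
rfm[['recency', 'monetary', 'frequency', 'segment']] \
    .groupby('segment').agg(['mean', 'std', 'min', 'max'])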
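Below is a self-contained sketch of the same per-segment summary. It runs on a tiny made-up table, `toy_rfm`, with hypothetical segments A, B and C, and is named so that it does not overwrite the real `rfm`. Passing the statistics as a list fixes the column order. A segment with a single customer shows NaN for `std`, because the sample standard deviation needs at least two values.

```python
import pandas as pd

# Hypothetical toy table standing in for the post's `rfm` (values and labels are made up)
toy_rfm = pd.DataFrame({
    "segment":   ["A", "A", "B", "B", "B", "C"],
    "recency":   [5, 15, 40, 60, 80, 200],
    "frequency": [12, 8, 3, 2, 4, 1],
    "monetary":  [900.0, 700.0, 150.0, 90.0, 210.0, 20.0],
})

# Columns become a MultiIndex: (feature, statistic), statistics in list order
stats = (
    toy_rfm[["recency", "frequency", "monetary", "segment"]]
    .groupby("segment")
    .agg(["mean", "std", "min", "max"])
)
print(stats.round(1))
```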
2. Visualizing the distribution by segment
- See what pattern each segment shows.
- Plot the number of customers in each segment. Also divide each count by the total number of customers so that each segment's share appears as a percentage.
import matplotlib.pyplot as plt
import seaborn as sns

# rfm: customer-level RFM table with a 'segment' column (built in the previous post)
plt.figure(figsize = (18, 8))
palette = 'Set2'
ax = sns.countplot(data = rfm,
                   x = 'segment',
                   palette = palette)

# Write each segment's share of all customers above its bar
total = len(rfm.segment)
for patch in ax.patches:
    percentage = '{:.1f}%'.format(100 * patch.get_height() / total)
    x = patch.get_x() + patch.get_width() / 2    # horizontal centre of the bar
    y = patch.get_y() + patch.get_height() * 1.01
    ax.annotate(percentage, (x, y), ha = 'center', size = 14)

plt.title('Number of Customers by Segments', size = 16)
plt.xlabel('Segment', size = 14)
plt.ylabel('Count', size = 14)
plt.xticks(size = 12)
plt.yticks(size = 12)
plt.show()
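The percentage labels can also be placed without a hand-tuned x offset. The sketch below uses hypothetical labels (`toy_segments`). It computes each segment's share with `value_counts` and attaches the labels with Matplotlib's `Axes.bar_label`, which is available since Matplotlib 3.4. It draws a plain Matplotlib bar chart instead of `sns.countplot` so that it does not depend on how a given seaborn version groups its bars.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical segment labels standing in for rfm.segment
toy_segments = pd.Series(["A", "A", "B", "B", "B", "C", "A", "B"])

seg_counts = toy_segments.value_counts().sort_index()   # customers per segment
seg_shares = seg_counts / seg_counts.sum() * 100         # percent of all customers

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(list(seg_counts.index), seg_counts.values)
# bar_label centres each label over its bar and returns the created annotations
texts = ax.bar_label(bars, labels=[f"{p:.1f}%" for p in seg_shares], size=12)
ax.set_title("Number of Customers by Segments")
ax.set_xlabel("Segment")
ax.set_ylabel("Count")
# call plt.show() to display the figure
```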
3. Visualizing R and F values by segment
- Examine the relationship between recency and frequency, and how they are distributed, for each segment (scatterplot)
- Set hue to segment
plt.figure(figsize = (18, 8))
sns.scatterplot(data = rfm, x = 'recency', y = 'frequency',
                hue = 'segment', palette = palette, s = 60)
plt.title('Recency & Frequency by Segments', size = 16)
plt.xlabel('Recency', size = 12)
plt.ylabel('Frequency', size = 12)
plt.xticks(size = 10)
plt.yticks(size = 10)
plt.legend(loc = 'best', fontsize = 14, title = '- Segments -', title_fontsize = 14)
plt.show()
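The scatterplot shows the relationship between recency and frequency visually. One way to put a number on it per segment is a Spearman rank correlation. This is an addition of mine and not part of the original analysis. Here it is computed as the Pearson correlation of the ranks, which avoids needing SciPy. The table `toy_rfm` is hypothetical.

```python
import pandas as pd

# Hypothetical toy data; in practice this would be the post's `rfm` table
toy_rfm = pd.DataFrame({
    "segment":   ["A"] * 4 + ["B"] * 4,
    "recency":   [3, 10, 20, 30, 50, 90, 120, 200],
    "frequency": [20, 12, 9, 5, 4, 3, 3, 1],
})

# Spearman correlation = Pearson correlation of (average) ranks, per segment
corr_by_segment = toy_rfm.groupby("segment")[["recency", "frequency"]].apply(
    lambda g: g["recency"].rank().corr(g["frequency"].rank())
)
print(corr_by_segment)
```

Rank correlation is less sensitive than Pearson to a few customers with very large frequency values. A value near -1 means that, within that segment, customers who bought more recently also tend to buy more often.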
4. Visualizing R and F values by segment (boxplot)
- Monetary was not included in the RFM score in this exercise, so it is left out of these plots.
- Let's also use box plots, as below, to see how the R and F values are spread within each segment.
fig, axes = plt.subplots(1, 2, figsize = (9, 6))
fig.suptitle('RFM Segment Analysis', size = 14)
feature_list = ['recency', 'frequency']
for idx, col in enumerate(feature_list):
    sns.boxplot(ax = axes[idx], data = rfm, x = 'segment', y = col, palette = palette)
    # Rotate the segment names so they do not overlap
    axes[idx].tick_params(axis = 'x', labelrotation = 60)
    if col == 'frequency':
        # Zoom in on frequencies 0-30; values above 30 are not shown
        axes[idx].set_ylim([0, 30])
plt.tight_layout()
plt.show()
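A box plot summarizes each group by its quartiles. Its whiskers extend to the most extreme points within 1.5 × IQR of the box (seaborn's default `whis=1.5`), and points beyond that are drawn as outliers. The sketch below works on a hypothetical `toy_rfm` and computes three things: those quartiles, the upper outlier fence for frequency, and how many customers sit above the `set_ylim([0, 30])` cut and therefore do not appear in the right-hand panel.

```python
import pandas as pd

# Hypothetical toy data (values and segment labels are made up)
toy_rfm = pd.DataFrame({
    "segment":   ["A"] * 5 + ["B"] * 5,
    "recency":   [2, 5, 8, 12, 20, 60, 75, 90, 150, 300],
    "frequency": [10, 15, 22, 35, 80, 1, 2, 2, 3, 5],
})

# Quartiles behind each box (Q1, median, Q3), per segment and feature
box_stats = toy_rfm.groupby("segment")[["recency", "frequency"]].quantile([0.25, 0.5, 0.75])

# Upper outlier fence for frequency: Q3 + 1.5 * IQR
q = toy_rfm.groupby("segment")["frequency"].quantile([0.25, 0.75]).unstack()
upper_fence = q[0.75] + 1.5 * (q[0.75] - q[0.25])
is_outlier = toy_rfm["frequency"] > toy_rfm["segment"].map(upper_fence)
n_outliers = is_outlier.groupby(toy_rfm["segment"]).sum()

# Customers whose frequency lies above the 0-30 window of the frequency panel
n_hidden = (toy_rfm["frequency"] > 30).groupby(toy_rfm["segment"]).sum()
print(box_stats, n_outliers, n_hidden, sep="\n\n")
```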
5. Visualizing R and F values by segment (histplot)
- Give each segment its own color and look at how the R and F values are distributed.
fig, axes = plt.subplots(2, 1, figsize = (12, 8))
fig.suptitle('RFM Segment Analysis', size = 14)
feature_list = ['recency', 'frequency']
for idx, col in enumerate(feature_list):
    # hue = 'segment' gives each segment its own color
    sns.histplot(ax = axes[idx], data = rfm, x = col, hue = 'segment', palette = palette)
    if col == 'frequency':
        # Zoom in on frequencies 0-30; larger values fall outside the view
        axes[idx].set_xlim([0, 30])
plt.tight_layout()
plt.show()
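By default, `histplot` with `hue` bins all segments on a common grid (`common_bins=True`), so the colored bars can be compared directly. Below is a sketch of that binning with NumPy on hypothetical data (`toy_rfm`). It builds one set of edges from the whole column and then counts the customers of each segment that fall in each bin.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (values and segment labels are made up)
toy_rfm = pd.DataFrame({
    "segment": ["A", "A", "A", "B", "B", "B", "B"],
    "recency": [5, 12, 18, 45, 70, 95, 98],
})

# One set of bin edges shared by every segment
edges = np.histogram_bin_edges(toy_rfm["recency"], bins=4)

# Customers per bin for each segment (rows: segments, columns: bin ranges)
bin_counts = pd.DataFrame(
    {seg: np.histogram(g["recency"], bins=edges)[0] for seg, g in toy_rfm.groupby("segment")},
    index=[f"{lo:g}-{hi:g}" for lo, hi in zip(edges[:-1], edges[1:])],
).T
print(bin_counts)
```

When segment sizes differ a lot, small segments are hard to see in raw counts. In that case `histplot(..., stat='density', common_norm=False)` normalizes each segment's histogram separately.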