Library Import
import os
import numpy as np
import pandas as pd
import random
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import Ridge
Data pre-processing : One-Hot Encoding
- For a nominal variable, each distinct value becomes a new column; a row gets 1 in the column matching its original value and 0 in all the others
- Why the test data gets its own step inside the loop: fitting the One Hot Encoder on the Test data would be Data Leakage,
so the Test data must only be transformed by the One Hot Encoder that was fitted on the Train data!
# One-hot encode the qualitative (categorical) columns
qual_col = ['propertyType', 'suburbName']
# handle_unknown='ignore' encodes categories not seen during fit as all zeros,
# so the fitted encoder never has to be patched with values from the test set.
# (On scikit-learn < 1.2, pass sparse=False instead of sparse_output=False.)
ohe = OneHotEncoder(sparse_output=False, handle_unknown='ignore')
for i in qual_col:
    # Fit on the Train data only, then append the encoded columns
    train_ohe = pd.DataFrame(ohe.fit_transform(train_x[[i]]), columns=ohe.categories_[0], index=train_x.index)
    train_x = pd.concat([train_x, train_ohe], axis=1)
    # Fitting the One Hot Encoder on the Test data would be Data Leakage,
    # so the Test data is only transformed by the encoder fitted on the Train data.
    test_ohe = pd.DataFrame(ohe.transform(test_x[[i]]), columns=ohe.categories_[0], index=test_x.index)
    test_x = pd.concat([test_x, test_ohe], axis=1)

# Drop the original categorical columns now that they are encoded
train_x = train_x.drop(qual_col, axis=1)
test_x = test_x.drop(qual_col, axis=1)
print('Done')
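To see the fit-on-train, transform-on-test rule in isolation, here is a minimal sketch on toy data. The values ('apartment', 'house', 'villa') and the `toy_` names are made up for illustration and are not from the competition data. It assumes `handle_unknown='ignore'` is an acceptable way to deal with a category that appears only in the test set.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy data (hypothetical values): 'villa' appears only in the test set
toy_train = pd.DataFrame({'propertyType': ['apartment', 'house', 'apartment']})
toy_test = pd.DataFrame({'propertyType': ['house', 'villa']})

# handle_unknown='ignore' maps categories unseen during fit to an all-zero row
toy_ohe = OneHotEncoder(handle_unknown='ignore')

# Fit on the train data only; .toarray() turns the sparse result into a dense array
train_enc = pd.DataFrame(
    toy_ohe.fit_transform(toy_train[['propertyType']]).toarray(),
    columns=toy_ohe.categories_[0],
)

# The test data is only transformed, never fitted
test_enc = pd.DataFrame(
    toy_ohe.transform(toy_test[['propertyType']]).toarray(),
    columns=toy_ohe.categories_[0],
)

print(train_enc)
print(test_enc)
```

Both frames end up with the same two columns, 'apartment' and 'house', and the 'villa' row in the test frame is all zeros, so the model receives the same feature layout for train and test.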
Model Hyperparameter Setting
- The Ridge Regression model exposes alpha as a hyperparameter
- alpha controls the strength of the model's regularization term, which helps prevent overfitting
Model = Ridge(alpha = 1.0)
Model.fit(train_x, train_y)
preds = Model.predict(test_x)
submit['monthlyRent(us_dollar)'] = preds
submit.head()
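The effect of alpha can be checked on a small example. The sketch below uses synthetic data (the coefficients 3, -2, 1 and the alpha values are arbitrary choices for illustration, not taken from this task) and prints the size of the fitted coefficient vector for several alpha values. scikit-learn's Ridge minimizes the squared error plus alpha times the squared L2 norm of the coefficients, so a larger alpha should shrink the coefficients more.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic regression data (hypothetical): 50 samples, 3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([3.0, -2.0, 1.0]) + rng.normal(scale=0.1, size=50)

alphas = [0.01, 1.0, 100.0]
coef_norms = []
for alpha in alphas:
    # Fit one Ridge model per alpha and record the L2 norm of its coefficients
    ridge = Ridge(alpha=alpha).fit(X, y)
    coef_norms.append(np.linalg.norm(ridge.coef_))
    print(alpha, coef_norms[-1])
```

The printed norm gets smaller as alpha grows: a very small alpha stays close to ordinary least squares, while a very large alpha pushes the coefficients toward zero and can underfit.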