
[🦀 Crab Age Prediction (3)] Baseline Modeling (Hist Gradient Boosting)

ISLA! 2023. 9. 24. 23:16

------>>  The basic preprocessing code is the same as in the previous post and carries over here.

 

 

🚀 What is Hist Gradient Boosting?

  • ์‚ฌ์ดํ‚ท๋Ÿฐ์˜ Gradient Boosting ์˜ ๋ณ€ํ˜• ์ค‘ ํ•˜๋‚˜๋กœ, ์ผ๋ฐ˜ Gradient Boosting(ํšŒ๊ท€/๋ถ„๋ฅ˜)๊ณผ ๋น„๊ตํ•˜์—ฌ ํšจ์œจ์ ์ธ ๊ตฌํ˜„์„ ์ œ๊ณต
  • ๋Œ€๊ทœ๋ชจ ๋ฐ์ดํ„ฐ์…‹์— ์ ํ•ฉํ•˜๋ฉฐ, ๋Œ€๋ถ€๋ถ„์˜ ๊ฒฝ์šฐ Gradient Boosting๋ณด๋‹ค ๋น ๋ฅธ ํ•™์Šต๊ณผ ์˜ˆ์ธก์„ ์ œ๊ณต
  • ์žฅ์ 
    • ํžˆ์Šคํ† ๊ทธ๋žจ ๊ธฐ๋ฐ˜ ๋ถ„ํ•  : ๋ฐ์ดํ„ฐ๋ฅผ ํžˆ์Šคํ† ๊ทธ๋žจ ๊ธฐ๋ฐ˜ ๋ถ„ํ• ์„ ์‚ฌ์šฉํ•˜์—ฌ, ์—ฐ์†ํ˜• ํŠน์„ฑ์„ ๋น ๋ฅด๊ฒŒ ์ด์‚ฐํ™”ํ•˜๊ณ  ์ด์‚ฐ์ ์ธ ๊ฐ’์„ ์‚ฌ์šฉํ•˜์—ฌ ๋ถ„ํ• ์„ ์ˆ˜ํ–‰ํ•˜์—ฌ ํ•™์Šต๊ณผ ์˜ˆ์ธก์„ ๊ฐ€์†ํ™” ํ•จ
    • ๋ฉ”๋ชจ๋ฆฌ ํšจ์œจ์  : ํžˆ์Šคํ† ๊ทธ๋žจ ๊ธฐ๋ฐ˜์œผ๋กœ ๋ฐ์ดํ„ฐ๋ฅผ ์••์ถ•/์ €์žฅํ•˜์—ฌ ๋ฉ”๋ชจ๋ฆฌ ์š”๊ตฌ๋Ÿ‰์ด ๋‚ฎ์•„์ง
    • ๋ณ‘๋ ฌ ์ฒ˜๋ฆฌ : ๋ฉ€ํ‹ฐ์ฝ”์–ด CPU์—์„œ ๋ณ‘๋ ฌ ์ฒ˜๋ฆฌ๊ฐ€ ๊ฐ€๋Šฅ
    • ๋น ๋ฅธ ํ•™์Šต ๋ฐ ์˜ˆ์ธก : ํšจ์œจ์ ์ธ ๋ถ„ํ•  ๋ฐ ๋ฉ”๋ชจ๋ฆฌ ๊ด€๋ฆฌ๋กœ ๊ธฐ์กด Gradient Boosting ๋ณด๋‹ค ๋น ๋ฅธ ํ•™์Šต๊ณผ ์˜ˆ์ธก ์ œ๊ณต
    • ๋ถ„๋ฅ˜ ๋ฐ ํšŒ๊ท€ ์ง€์›
    • ์Šค์ผ€์ผ ๋ถˆ๋ณ€์„ฑ : ํŠน์„ฑ ์Šค์ผ€์ผ์— ๋œ ๋ฏผ๊ฐํ•˜๋ฉฐ, ๋ฐ์ดํ„ฐ ์Šค์ผ€์ผ ์กฐ์ •์ด ํ•„์š”ํ•˜์ง€ ์•Š์„ ์ˆ˜ ์žˆ์Œ
  • Gradient Boosting ๊ณผ ๋น„๊ตํ•˜์—ฌ ๋ชจ๋ธ ์„ฑ๋Šฅ์€ ์œ ์‚ฌํ•˜๊ฑฐ๋‚˜ ๋” ์šฐ์ˆ˜ํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ๋ฉ”๋ชจ๋ฆฌ ๋ฐ ์ฒ˜๋ฆฌ ์‹œ๊ฐ„ ์ธก๋ฉด์—์„œ ์ด์ ์ด ์žˆ์Œ!
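
To make the histogram-based splitting idea above concrete, here is a minimal NumPy sketch of how a continuous feature might be mapped to at most 255 integer bins. It is a simplified illustration, not scikit-learn's internal binning code; the helper name `bin_feature` and the random feature values are assumptions made only for this example.

```python
import numpy as np

def bin_feature(values, max_bins=255):
    """Map a continuous feature to integer bin indices using quantile edges.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    # Interior quantiles act as candidate thresholds (at most max_bins - 1 of them)
    quantiles = np.linspace(0, 100, max_bins + 1)[1:-1]
    edges = np.unique(np.percentile(values, quantiles))
    # Each value is replaced by the index of the bin it falls into
    binned = np.searchsorted(edges, values, side="right")
    return binned, edges

# Synthetic feature, used only to show the shape of the result
rng = np.random.default_rng(0)
feature = rng.normal(size=10_000)

binned, edges = bin_feature(feature)
print("distinct raw values   :", np.unique(feature).size)
print("distinct binned values:", np.unique(binned).size)
```

After binning, a split search only has to look at a few hundred candidate thresholds per feature instead of every distinct raw value, which is where the speed and memory savings come from.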

 

Modeling code

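from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold

# X, Y, and test_baseline are assumed to come from the preprocessing code in the previous post.
# Per-fold MAE scores and per-fold predictions on the test set
hist_cv_scores, hist_preds = [], []

# 10-fold CV with shuffling (plain KFold, not stratified, despite the variable name)
skf = KFold(n_splits = 10, random_state = 42, shuffle = True)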

for i, (train_ix, test_ix) in enumerate(skf.split(X, Y)):
    X_train, X_test = X.iloc[train_ix], X.iloc[test_ix]
    Y_train, Y_test = Y.iloc[train_ix], Y.iloc[test_ix]
    
    print('----------------------------------------------------------------')

    # histGradientBoosting

    hist_md = HistGradientBoostingRegressor(loss = 'absolute_error',
                                            l2_regularization = 0.01,
                                            early_stopping = False,
                                            learning_rate = 0.01,
                                            max_iter = 1000,
                                            max_depth = 15,
                                            max_bins = 255,
                                            min_samples_leaf = 70,
                                            max_leaf_nodes = 115)
    hist_md.fit(X_train, Y_train)

    # Validate only on the rows flagged generated == 1 in this fold
    gen_mask = X_test['generated'] == 1
    hist_pred_1 = hist_md.predict(X_test[gen_mask])
    # Predict the separate test set with this fold's model
    hist_pred_2 = hist_md.predict(test_baseline)
    hist_score_fold = mean_absolute_error(Y_test[gen_mask], hist_pred_1)
    hist_cv_scores.append(hist_score_fold)
    hist_preds.append(hist_pred_2)

    print('Fold', i, '==> HistGradient of MAE is ==>', hist_score_fold)
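
Once the loop has filled `hist_cv_scores` and `hist_preds`, a natural next step is to summarize the fold scores and combine the per-fold test predictions. The sketch below shows one possible way to do that with a simple mean across folds; the helper `summarize_folds` is hypothetical, and the toy numbers are made-up stand-ins, not results from this model.

```python
import numpy as np

def summarize_folds(cv_scores, fold_preds):
    """Return (mean MAE, std of MAE, fold-averaged test predictions).

    Hypothetical helper for illustration only.
    """
    scores = np.asarray(cv_scores, dtype=float)
    preds = np.vstack(fold_preds)  # shape: (n_folds, n_test_rows)
    return scores.mean(), scores.std(), preds.mean(axis=0)

# Toy stand-ins for hist_cv_scores and hist_preds (made-up values)
toy_scores = [1.0, 2.0, 3.0]
toy_preds = [np.array([9.0, 10.0]),
             np.array([10.0, 11.0]),
             np.array([11.0, 12.0])]

mean_mae, std_mae, ensemble_pred = summarize_folds(toy_scores, toy_preds)
print(f"CV MAE: {mean_mae:.4f} +/- {std_mae:.4f}")
print("Averaged test prediction:", ensemble_pred)
```

In the loop above this would be called as `summarize_folds(hist_cv_scores, hist_preds)`. Since the model is trained with an absolute-error loss, taking the median across folds (`np.median(preds, axis=0)`) is another option worth comparing.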
