λ³Έλ¬Έ λ°”λ‘œκ°€κΈ°
SQL/SQL Grammar

[BigQuery] 데이터 νƒ€μž…λ³„ ν•¨μˆ˜ 정리 (λ¬Έμžμ—΄κ³Ό μ‹œκ°„/λ‚ μ§œ)

by ISLA! 2024. 4. 1.

 

 

🎯 데이터λ₯Ό κ·ΈλŒ€λ‘œ κ°€μ Έμ˜€μ§€ μ•Šκ³  λ³€ν™˜μ΄ ν•„μš”ν•œ κ²½μš°μ— μ“Έ 수 μžˆλŠ” ν•¨μˆ˜λ₯Ό μ•Œμ•„λ³΄μž

  • SELECT λ¬Έμ—μ„œ 데이터λ₯Ό λ³€ν™˜μ‹œν‚¬ 수 있으며, WHERE의 μ‘°κ±΄λ¬Έμ—μ„œλ„ μ‚¬μš© κ°€λŠ₯
  • λ°μ΄ν„°μ˜ νƒ€μž…μ— 따라 λ‹€μ–‘ν•œ ν•¨μˆ˜κ°€ 쑴재

 

1. 자료 νƒ€μž…μ„ λ³€ν˜•ν•˜λŠ” ν•¨μˆ˜ : CAST

SELECT
 CAST(1 AS STRING) # 숫자 1을 문자 1둜 λ³€κ²½

 

 

πŸ‘‰ 더 μ•ˆμ „ν•˜κ²Œ 데이터 νƒ€μž… λ³€κ²½ν•˜κΈ° : SAFE_CAST() 

→ λ³€ν™˜μ΄ μ‹€νŒ¨ν•  경우 NULL을 λ°˜ν™˜ν•˜λ©°, λ‹€μŒκ³Ό 같은 κ²½μš°μ— μ—λŸ¬ λŒ€μ‹   NULL이 됨

SELECT
 SAFE_CAST("λ¬Έμžμ—΄" AS INT64)

 

 

(μœ μ‚¬ 사둀) μ•ˆμ „ν•˜κ²Œ λ‚˜λˆ„κΈ° μ—°μ‚°ν•˜κΈ° : SAFE_DIVIDE()

SAFE_DIVIDE(x, y) # zero error λŒ€μ‹  null λ°œμƒ

 


2. λ¬Έμžμ—΄μ„ λ‹€λ£¨λŠ” ν•¨μˆ˜

(1) CONCAT : λ¬Έμžμ—΄ 뢙이기

CONCAT(컬럼1, 컬럼2, ...)

SELECT
  CONCAT('μ•ˆλ…•', 'ν•˜μ„Έμš”', '!!') AS result;

 

 

(2) SPLIT : λ¬Έμžμ—΄ λΆ„λ¦¬ν•˜κΈ°

SPLIT(λ¬Έμžμ—΄ 원본, λ‚˜λˆŒ 기쀀이 λ˜λŠ” 문자)

SELECT
 SPLIT('κ°€, λ‚˜, λ‹€, 라', ", ") AS result;

 

(3) REPLACE : νŠΉμ • 단어 μˆ˜μ •ν•˜κΈ°

REPLACE(λ¬Έμžμ—΄ 원본, 찾을 단어, λ°”κΏ€ 단어)

SELECT
 REPLACE('μ•ˆλ…•ν•˜μ„Έμš”', 'μ•ˆλ…•', 'μ‹€μ²œ') AS result;

 

(3) TRIM : λ¬Έμžμ—΄ 자λ₯΄κΈ°

TRIM(λ¬Έμžμ—΄ 원본, 자λ₯Ό 단어)

SELECT
 TRIM("μ•ˆλ…•ν•˜μ„Έμš”", "ν•˜μ„Έμš”") AS trim_example

 

 

(4) UPPER : μ˜μ–΄ λŒ€λ¬Έμž λ³€ν™˜ν•˜κΈ°

UPPER(λ¬Έμžμ—΄ 원본)

SELECT
 UPPER("ab") AS upper_example


3. λ‚ μ§œ 및 μ‹œκ°„ 데이터 (κΈ°λ³Έ)

πŸ“œ μ‹œκ°„ κ΄€λ ¨ 데이터 νƒ€μž…μ˜ μ’…λ₯˜

  • DATE : DATE만 ν‘œμ‹œν•˜λŠ” 데이터 → 예: 2024-03-01
  • DATETIME : DATE와 TIMEκΉŒμ§€ ν‘œμ‹œν•˜λŠ” 데이터(Timezone 정보 μ—†μŒ) → 예: 2024-03-03 14:00:00
  • TIME : λ‚ μ§œμ™€ λ¬΄κ΄€ν•˜κ²Œ μ‹œκ°„λ§Œ ν‘œμ‹œν•˜λŠ” 데이터 → 예: 23:59:59.00

 

πŸ“œ νƒ€μž„μ‘΄ κ°œλ… 짚고 κ°€κΈ°

  • UTC(Universal Time Coordinated : ν•œκ΅­μ‹œκ°„ = UTC + 9)
    • ꡭ제적인 ν‘œμ€€ μ‹œκ°„μœΌλ‘œ, ν˜‘μ • μ„Έκ³„μ‹œ
    • νƒ€μž„μ‘΄μ΄ μ‘΄μž¬ν•¨ → νŠΉμ • μ§€μ—­μ˜ ν‘œμ€€ μ‹œκ°„λŒ€
  • TIMESTAMP
    • UTCλΆ€ν„° κ²½κ³Όν•œ μ‹œκ°„μ„ λ‚˜νƒ€λ‚΄λŠ” κ°’
    • νƒ€μž„μ‘΄ 정보 있음
    • ν‘œμ‹œ 예: 2024-03-03 14:00:00 UTC

 

πŸ“œ millisecond & microsecond μ•Œκ³  κ°€κΈ°

  • millisecond(ms) : 천 λΆ„μ˜ 1초(1,000 ms = 1 sec)
    • λΉ λ₯Έ λ°˜μ‘μ΄ ν•„μš”ν•œ λΆ„μ•Όμ—μ„œ μ‚¬μš©(μ΄ˆλ³΄λ‹€ 더 μ •ν™•ν•˜κ²Œ ⏰)
    • Millisecond TIMESTAMP   DATETIME으둜 λ³€κ²½ν•΄μ„œ μ‚¬μš©ν•˜λŠ” 경우 많음
  • microsecond(μs) : 1 / 1,000 ms
  • μ•„λž˜ μ˜ˆμ‹œμ™€ 같이 μ‹œκ°„ νƒ€μž…μ„ μ „ν™˜ν•˜λ©°, DATETIME으둜 μ „ν™˜ μ‹œ 'νƒ€μž„μ‘΄' λͺ…μ‹œν•΄μ•Ό 함!
  • λ§Žμ€ νšŒμ‚¬λ“€μ˜ Table에 μ‹œκ°„μ΄ Timestamp둜 μ €μž₯된 κ²½μš°κ°€ 많음(λ˜λŠ” Datetime) : λ”°λΌμ„œ λ³€ν™˜μ΄ ν•„μš”ν•œ 경우 많음
SELECT
 TIMESTAMP_MILLIS(1704176819711) AS milli_to_timestamp_value,
 TIMESTAMP_MICROS(1704176819711000) AS micro_to_timestamp_value,
 DATETIME(TIMESTAMP_MICROS(1704176819711000), 'Asia/Seoul') AS datetime_value

 

 

πŸ“œ 정리 : TIMESTAMP vs. DATETIME

  TIMESTAMP DATETIME
νƒ€μž„μ‘΄ UTC라고 λ‚˜μ˜΄ Tκ°€ λ‚˜μ˜΄(Time 의미)
μ‹œκ°„ 차이 ν•œκ΅­ μ‹œκ°„ -9μ‹œκ°„ ν•œκ΅­ Zone μ‚¬μš©μ‹œ ν•œκ΅­ μ‹œκ°„κ³Ό 동일

 


 

3. Datetime 가곡 ν•¨μˆ˜

(1) CURRENT_DATETIME : ν˜„μž¬ Datetime 좜λ ₯

 

CURRENT_DATETIME([time_zone])

CURRENT_DATE('Asia/Seoul')
CURRENT_DATETIME('Asia/Seoul')

 

(2) EXTRACT : Datetimeμ—μ„œ νŠΉμ • λΆ€λΆ„λ§Œ μΆ”μΆœ

 

EXTRACT(part FROM [col])

EXTRACT(DATE FROM [datetime_col])
EXTRACT(YEAR FROM [datetime_col])
EXTRACT(MONTH FROM [datetime_col])
EXTRACT(DAY FROM [datetime_col])
EXTRACT(HOUR FROM [datetime_col])
EXTRACT(MINUTE FROM [datetime_col])

 

πŸ‘‰ μš”μΌ μΆ”μΆœ : EXTRACT(DAYOFWEEK FROM [datetime_col])

→ ν•œ 주의 첫 날인 μΌμš”μΌλΆ€ν„° 1둜 λ°˜ν™˜λ¨(ν† μš”μΌμ΄ λ§ˆμ§€λ§‰ 7) 

 

 

(3) DATETIME_TRUNC : Datetime μ‹œκ°„ 자λ₯΄κΈ°

 

 DATETIME_TRUNC([datetime_col], [HOUR:자λ₯Ό λΆ€λΆ„ λͺ…μ‹œ])

DATETIME_TRUNC([datetime_col], DAY)
DATETIME_TRUNC([datetime_col], YEAR)
DATETIME_TRUNC([datetime_col], MONTH)
DATETIME_TRUNC([datetime_col], HOUR)

datetime_trunc μ‹€ν–‰ μ˜ˆμ‹œ

 

 

(4) PARSE_DATETIME : λ¬Έμžμ—΄λ‘œ μ €μž₯된 Datetime을 Datetime νƒ€μž…μœΌλ‘œ λ°”κΎΈκΈ°

 

 PARSE_DATETIME('λ¬Έμžμ—΄μ˜ ν˜•νƒœ', 'DATETIME λ¬Έμžμ—΄')

→ Format Elements λ¬Έμ„œλ₯Ό μ°Έκ³ ν•˜μ—¬ λ¬Έμžμ—΄ ν˜•νƒœμ˜ 의미λ₯Ό 체크(ν•„μš”ν•  λ•Œ 확인) : https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_elements_date_time

PARSE_DATETIME('%Y-%m-%d %H:%M:%S', '2024-01-11 12:35:35') AS parse_datetime;

 

 

 

(5) FORMAT_DATETIME : Datetime νƒ€μž… 데이터λ₯Ό νŠΉμ • ν˜•νƒœμ˜ λ¬Έμžμ—΄ λ°μ΄ν„°λ‘œ λ³€ν™˜ν•˜κΈ°

 

 FORMAT_DATETIME('λ¬Έμžμ—΄μ˜ ν˜•νƒœ', 'DATETIME λ¬Έμžμ—΄')

SELECT
 FORMAT_DATETIME("%c", DATETIME "2024-01-11 12:35:35") AS formatted;

 

(6) LAST_DAY : μžλ™μœΌλ‘œ μ›”μ˜ λ§ˆμ§€λ§‰ 값을 κ³„μ‚°ν•΄μ„œ νŠΉμ • 연산을 ν•  경우

 

 LAST_DAY([DATETIME], λ§ˆμ§€λ§‰λ‚  κΈ°μ€€)

→ μ•„λž˜ μ˜ˆμ‹œλ₯Ό 보고 차이λ₯Ό 확인

SELECT
    LAST_DAY(DATETIME '2024-01-03 15:30:00') AS last_day,
    LAST_DAY(DATETIME '2024-01-03 15:30:00', MONTH) AS last_day_month,
    LAST_DAY(DATETIME '2024-01-03 15:30:00', WEEK) AS last_day_week,
    LAST_DAY(DATETIME '2024-01-03 15:30:00', WEEK(SUNDAY)) AS last_day_week_sun,
    LAST_DAY(DATETIME '2024-01-03 15:30:00', WEEK(MONDAY)) AS last_day_week_mon

 

 

(7) DATETIME_DIFF : μžλ™μœΌλ‘œ μ›”μ˜ λ§ˆμ§€λ§‰ 값을 κ³„μ‚°ν•΄μ„œ νŠΉμ • 연산을 ν•  경우

 DATETIME_DIFF([첫 DATETIME], [λ‹€μŒ DATETIME], 차이 ν™˜μ‚° κΈ°μ€€)

→ μ•„λž˜ μ˜ˆμ‹œλ₯Ό 보고 차이λ₯Ό 확인

SELECT
    DATETIME_DIFF(first_datetime, second_datetime, DAY) AS day_diff1,
    DATETIME_DIFF(second_datetime, first_datetime, DAY) AS day_diff2,
    DATETIME_DIFF(first_datetime, second_datetime, MONTH) AS month_diff,
    DATETIME_DIFF(first_datetime, second_datetime, WEEK) AS week_diff,
FROM ( SELECT
    DATETIME "2024-04-02 10:20:00" AS first_datetime,
    DATETIME "2021-01-01 15:30:00" AS second_datetime, )

728x90