[Machine Learning with Python] Time-Series Anomaly Detection with STL Decomposition and Z-Score (Using Google Colab)

📌 What Is STL + Z-Score?

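STL (Seasonal-Trend decomposition using Loess) splits a time series into three additive components: Trend + Seasonality + Residual. Applying a Z-score to the residual makes it easy to identify points that remain statistically unusual after the trend and the seasonal pattern have been removed.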
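To make the idea concrete, here is a minimal sketch on synthetic data. It is a toy under stated assumptions: the trend, daily cycle, and noise are built by hand, so the residual is known exactly rather than estimated by STL, and names such as `spike_at` are hypothetical. It compares a Z-score on the raw values with a Z-score on the residual for a spike placed at the low point of the daily cycle.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 24 * 14                                   # two weeks of hypothetical hourly data
t = np.arange(n)

trend = 0.05 * t                              # slow upward drift
seasonal = 100 * np.sin(2 * np.pi * t / 24)   # daily cycle
noise = rng.normal(0, 5, n)

spike_at = 24 * 5 + 18                        # an hour at the trough of the daily cycle
resid = noise.copy()
resid[spike_at] += 60                         # injected anomaly

y = trend + seasonal + resid                  # additive model: Trend + Seasonality + Residual

def zscore(x):
    """Standardize x to zero mean and unit standard deviation."""
    return (x - x.mean()) / x.std()

z_raw = zscore(y)
z_resid = zscore(resid)

print(f"|z| on raw values at spike: {abs(z_raw[spike_at]):.2f}")
print(f"|z| on residual at spike:   {abs(z_resid[spike_at]):.2f}")
```

On the raw values the spike is hidden inside the normal daily swing, while on the residual it stands far above the threshold of 3. This is the reason for decomposing before scoring.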

🗂 Exercise Overview

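Data: NYC taxi ridership time series from the Numenta Anomaly Benchmark (NAB), covering July 2014 through January 2015
Main technique: STL decomposition + Z-score-based outlier detection
Visualization: marking the anomalous points with Matplotlib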

1. Hands-On

1) Install the required libraries

The libraries needed for this exercise are:

pandas statsmodels matplotlib seaborn

Colab ships with these libraries preinstalled, so no separate installation is needed. If you do need to install them, run:

!pip install pandas statsmodels matplotlib seaborn

2) Load the NYC taxi time-series data.

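import pandas as pd

# NYC taxi passenger counts from the Numenta Anomaly Benchmark (30-minute intervals)
url = 'https://raw.githubusercontent.com/numenta/NAB/master/data/realKnownCause/nyc_taxi.csv'
df = pd.read_csv(url, parse_dates=['timestamp'], index_col='timestamp')

# Aggregate to hourly totals ('1h' replaces the deprecated '1H' alias)
df = df.resample('1h').sum()

# Forward-fill any missing values (fillna(method='ffill') is deprecated)
df = df.ffill()

df.head()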
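One detail of this loading step is worth checking on your own data. When a resampled bin is empty, `.sum()` returns 0 by default rather than NaN, so a forward fill that follows it has nothing to fill. The sketch below uses a small hypothetical frame (`toy`, with a one-hour gap) to show the difference; passing `min_count=1` is one possible way to get NaN for empty bins, not something the original code does.

```python
import pandas as pd

# Hypothetical 30-minute counts with a one-hour gap (02:00 and 02:30 are missing)
idx = pd.to_datetime([
    "2014-07-01 00:00", "2014-07-01 00:30",
    "2014-07-01 01:00", "2014-07-01 01:30",
    "2014-07-01 03:00", "2014-07-01 03:30",
])
toy = pd.DataFrame({"value": [10, 20, 30, 40, 50, 60]}, index=idx)

# Default sum(): the empty 02:00 bin becomes 0, so ffill() has nothing to fill
hourly_default = toy.resample("1h").sum().ffill()

# min_count=1: the empty bin becomes NaN, which ffill() then replaces
hourly_filled = toy.resample("1h").sum(min_count=1).ffill()

print(hourly_default)
print(hourly_filled)
```

Which behavior is appropriate depends on whether an empty hour means "no rides" or "no data". A quick `df.isna().sum()` before filling shows whether the question matters for a given dataset.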

3) Next, run the STL decomposition and visualize the result.

from statsmodels.tsa.seasonal import STL
import matplotlib.pyplot as plt

# Apply STL decomposition with a weekly period (24 hours x 7 days = 168 hourly samples)
stl = STL(df['value'], period=24*7)
res = stl.fit()
df['resid'] = res.resid

# Plot the original, trend, seasonal, and residual components
fig, axes = plt.subplots(4, 1, figsize=(15, 10), sharex=True)

axes[0].plot(df.index, df['value'], label='Original')
axes[0].set_title('Original Time Series')

axes[1].plot(df.index, res.trend, label='Trend', color='orange')
axes[1].set_title('Trend Component')

axes[2].plot(df.index, res.seasonal, label='Seasonal', color='green')
axes[2].set_title('Seasonal Component')

axes[3].plot(df.index, res.resid, label='Residual', color='red')
axes[3].set_title('Residual Component')

# Add a legend and grid to every panel
for ax in axes:
    ax.legend()
    ax.grid(True)

plt.tight_layout()
plt.show()

The plot confirms that the series has been separated cleanly into trend, seasonal, and residual components.

4) Now use the residual to detect outliers with a Z-score.

The Z-score threshold is set to 3: any point whose residual has an absolute Z-score greater than 3 is treated as an anomaly.
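As a rough guide to what a threshold of 3 implies, the sketch below computes the two-sided tail probability of a standard normal variable. It assumes the residuals are approximately Gaussian, which real residuals often are not, and `n_points` is a hypothetical series length chosen only for illustration.

```python
import math

def two_sided_tail(threshold):
    """P(|Z| > threshold) for a standard normal variable Z."""
    return math.erfc(threshold / math.sqrt(2))

n_points = 5000   # hypothetical length, roughly seven months of hourly data

for thr in (2.0, 3.0, 4.0):
    p = two_sided_tail(thr)
    print(f"threshold={thr}: P(|z|>thr)={p:.5f}, "
          f"expected flags in {n_points} points={p * n_points:.1f}")
```

Under that assumption about 0.27% of points exceed a threshold of 3 by chance alone, so a handful of flags in a series of this length is expected even with no real anomaly. Raising the threshold reduces these false alarms at the cost of missing milder events.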

import numpy as np
import matplotlib.pyplot as plt

threshold = 3.0

# Standardize the STL residual and flag points whose |z| exceeds the threshold
z_score = (df['resid'] - df['resid'].mean()) / df['resid'].std()
df['anomaly'] = np.abs(z_score) > threshold

# Plot the series and mark the flagged points in red
plt.figure(figsize=(14,5))
plt.plot(df.index, df['value'], label='Taxi Demand')
plt.scatter(df[df['anomaly']].index, df[df['anomaly']]['value'],
            color='red', label='Anomaly', s=10)
plt.title('NYC Taxi Anomaly Detection (STL + Z-score)')
plt.legend()
plt.tight_layout()
plt.show()

In the resulting plot, points whose absolute Z-score exceeds 3 are marked with red dots; these are the anomalies.
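One limitation of the mean and standard deviation is that the outliers themselves inflate them, which can hide the very points being searched for. A common alternative, shown here as a sketch and not as the method used in this article, is the modified Z-score based on the median and the median absolute deviation (MAD). The `resid` array is hypothetical, and the cutoff of 3.5 is the value commonly suggested for this score.

```python
import numpy as np

def modified_zscore(x):
    """Modified Z-score using the median and the median absolute deviation (MAD)."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))   # assumed non-zero here; guard against 0 in real use
    return 0.6745 * (x - med) / mad    # 0.6745 rescales MAD to match std under normality

# Hypothetical residuals with one obvious outlier
resid = np.array([1.0, -2.0, 0.5, -1.5, 2.0, -0.5, 1.5, -1.0, 40.0, 0.0])

classic = (resid - resid.mean()) / resid.std(ddof=1)   # ddof=1 matches pandas .std()
robust = modified_zscore(resid)

classic_flags = np.abs(classic) > 3.0
robust_flags = np.abs(robust) > 3.5

print("classic flags:", np.flatnonzero(classic_flags))
print("robust flags: ", np.flatnonzero(robust_flags))
```

In this small sample the value 40 inflates the standard deviation enough that its own classic Z-score stays below 3, while the modified Z-score flags it. The effect is weaker in a series with thousands of points, but it is a reason to compare both scores when the residual contains large spikes.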

License: CC BY-NC-SA (Attribution, NonCommercial, ShareAlike)

Innov_AI_te
