#title Time Series [[TableOfContents]] * https://cran.r-project.org/web/packages/ggfortify/vignettes/plot_ts.html ==== 주요용어 ==== 시계열자료는 일반적으로 독립이 아니며, 시간순서에 따라 분석을 해야 한다. 시계열은 4개의 성분으로 분해되는데, 장기적 변화인 추세(Trend: T), 계절변동(Seasonal fluctuation: S), 비계절적 변동을 의미하는 순환(Cyclic: C), 그리고 불규칙적 오차(Random error: R)가 그 성분들이 된다. --[http://terms.naver.com/entry.nhn?docId=1924236&cid=531&categoryId=531 네이버 교육용어사전] * trend * seasonal fluctuation * cycles * white noise ==== 시차상관(선행, 후행) ==== {{{ > tmp curr_val prev_val 5 3949515 4316313 6 3835690 4149917 7 3975137 4165120 8 3993659 4050347 9 3942373 3949515 10 3825807 3835690 11 3920338 3975137 12 3586568 3993659 13 3633877 3942373 14 3444306 3825807 15 3903889 3920338 16 3529384 3586568 17 3531147 3633877 18 3518490 3444306 19 3481886 3903889 20 3298512 3529384 21 3400325 3531147 22 3641914 3518490 23 3466527 3481886 24 6028886 3298512 25 6543880 3400325 26 5516709 3641914 27 5100715 3466527 28 5259752 6028886 29 5370662 6543880 30 5009019 5516709 31 4778911 5100715 32 4836214 5259752 33 4662136 5370662 }}} {{{ tmp <- na.omit(tmp) ccf(tmp$curr_val, tmp$prev_val) }}} 첫 번째 매개변수를 기준으로 얼마나 선행/후행 하는지 알 수 있다. 가장 볼록하게 나온 부분이 x축에 -4 이니까 tmp$curr_val가 대략 4일정도 빠르다. attachment:TimeSeries/ts01.png 첫 번째 매개변수를 기준으로 얼마나 선행하는지 알 수 있다. lag.plot2(attachment:TimeSeries/itall.R) 함수를 이용해서 차트를 그려보자. {{{ source("http://databaser.net/moniwiki/pds/TimeSeries/itall.R") lag.plot2(tmp$curr_val , tmp$prev_val, 11) }}} attachment:TimeSeries/ts02.png 표준화해서 라인 차트를 그려보자. {{{ plot(scale(tmp$curr_val), type="l", col="red") lines(scale(tmp$prev_val), col="blue") }}} attachment:TimeSeries/ts03.png 최대/최소 교차상관 및 시차 ([http://blog.naver.com/lofie21?Redirect=Log&logNo=110171809189 출처]) {{{ max_ccf <- function(...) { d <- ccf(...) cor <- d$acf[,,1] lag <- d$lag[,,1] dd <- data.frame(cor, lag) max_val <- dd[which.max(dd$cor),] min_val <- dd[which.min(dd$cor),] result <- c() if(abs(max_val[1]) >= abs(min_val[1])){ result <- max_val } else { result <- min_val } return(result) } max_ccf <- function(...) { d <- ccf(...) cor <- d$acf[,,1] lag <- d$lag[,,1] df <- data.frame(cor, lag) max_val <- df[which.max(df$cor),] return(max_val) } min_ccf <- function(...) { d <- ccf(...) cor <- d$acf[,,1] lag <- d$lag[,,1] df <- data.frame(cor, lag) min_ccf <- df[which.min(df$cor),] return(min_ccf) } }}} 예를 들어, 다음과 같이 결과가 나왔다면.. {{{ > max_ccf(sales, val) cor lag 12 0.8932095 -1 }}} * 최대 상관 계수는 0.89 * 시간축 상에서 sales가 val보다 -1일 뒤져 있음을 뜻함. ==== plotForecastErrors ==== {{{ plotForecastErrors <- function(forecasterrors) { # make a histogram of the forecast errors: mybinsize <- IQR(forecasterrors)/4 mysd <- sd(forecasterrors) mymin <- min(forecasterrors) - mysd*5 mymax <- max(forecasterrors) + mysd*3 # generate normally distributed data with mean 0 and standard deviation mysd mynorm <- rnorm(10000, mean=0, sd=mysd) mymin2 <- min(mynorm) mymax2 <- max(mynorm) if (mymin2 < mymin) { mymin <- mymin2 } if (mymax2 > mymax) { mymax <- mymax2 } # make a red histogram of the forecast errors, with the normally distributed data overlaid: mybins <- seq(mymin, mymax, mybinsize) hist(forecasterrors, col="red", freq=FALSE, breaks=mybins) # freq=FALSE ensures the area under the histogram = 1 # generate normally distributed data with mean 0 and standard deviation mysd myhist <- hist(mynorm, plot=FALSE, breaks=mybins) # plot the normal curve as a blue line on top of the histogram of forecast errors: points(myhist$mids, myhist$density, type="l", col="blue", lwd=2) } p10 <- forecast(m, 10) plotForecastErrors(p10 $residuals) }}} attachment:TimeSeries/forecast_erros.png ==== 지수평활법: Holt-Winters filtering ==== 계절적요인과 불규칙요인을 제거 {{{ m1<- HoltWinters(ts, gamma=FALSE) plot(m1) }}} attachment:TimeSeries/HoltWinters.png {{{ plot(forecast.HoltWinters(m1, h=10)) }}} ==== 이상점 탐지 ==== {{{ library("TSA") detectAO(m) #가법적 이상점(AO;additive outlier)을 탐지 - T번째 관측값 수준 관점에의 이상치 detectIO(m) #혁신적 이상점(IO;innovative outlier)을 탐지 - 모든값 관점의 이상치 }}} ==== ARIMA ==== {{{ library("zoo") ts <- zoo(long_term$dau, as.Date(long_term$dt)) plot(ts) fit <- auto.arima(ts) plot(forecast(fit, h=20)) arima(ts, order=c(2,1,2)) m <- arima(ts, order=c(2,1,2), fixed=c(0,NA,0,NA)) confint(m) tsdiag(m) predict(m, n.ahead=10) }}} {{{ library("lmtest") coeftest(fit) # ARIMA p-value test }}} 월단위? {{{ t <- ts(training$amt, frequency=30) fit <- auto.arima(t,D=1) pred <- forecast(fit,h=30) plot(pred) grid() }}} * http://www.r-bloggers.com/simple-and-advanced-time-series-with-oracle-r-enterprise/?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+RBloggers+%28R+bloggers%29&utm_content=Yahoo%21+Mail * http://stat.epfl.ch/files/content/sites/stat/files/shared/Applied%20Statistics/notes10.pdf * http://stats.stackexchange.com/questions/41070/how-to-setup-xreg-argument-in-auto-arima-in-r --> xreg * http://robjhyndman.com/ * http://stackoverflow.com/questions/20210504/r-times-series-arima-model-forecasting-daily-data * https://www.otexts.org/fpp/8/9 * http://stackoverflow.com/questions/10302261/forecasting-time-series-data ==== AR ==== {{{ library("MASS") ar.result <- ar(ts) predict(ar.result, n.ahead=9) }}} ==== 평활 ==== {{{ library("forecast") data(gold) g <- data.frame(g = na.omit(coredata(gold)))$g y_width <- c(min(g) - min(g) * 0.05, max(g) - max(g) * 0.05) plot(g, type="l", lty="dotted", ylim=y_width) par(new=T) plot(smooth(g, twiceit=T), type="l", col="blue", ylim=y_width, ylab="") }}} 이동중간값 함수 : runmde() attachment:TimeSeries/loess.gif --그림 출처: http://simplystatistics.org/2014/02/13/loess-explained-in-a-gif/ ==== 기타 ==== 7일 {{{ ts_df <- ts(mydata$y, freq=365.25/7, start=decimal_date(ymd(min(mydata$date_key)))) }}} ==== VAR ==== * http://christophj.github.io/replicating/r/vector-autoregression-var-in-r/ * attachment:TimeSeries/varModels.pdf * attachment:TimeSeries/벡터자기회귀_VAR_모형의_이해PDF.pdf ==== auto.arima ==== {{{ library(lubridate) library("forecast") ts_df <- ts(mydata$y, freq=7) #ts_df <- ts(mydata$y, freq=365.25/7, start=decimal_date(ymd(min(mydata$dt)))) d.arima <- auto.arima(ts_df) d.forecast <- data.frame(forecast(d.arima, level = c(99), h = 1)) }}} ==== 참고자료 ==== * http://www.itc.nl/~rossiter/teach/R/R_ts.pdf * http://a-little-book-of-r-for-time-series.readthedocs.org/en/latest/src/timeseries.html * http://maths-people.anu.edu.au/~johnm/courses/r/ASC2008/pdf/Rtimeseries-ohp.pdf * [http://www.stat.pitt.edu/stoffer/tsa3/R_toot.htm Time Series Analysis and Its Applications: With R Examples] * https://onlinecourses.science.psu.edu/stat510/?q=node/41 * attachment:TimeSeries/itall.R * [http://www.statoek.wiso.uni-goettingen.de/veranstaltungen/zeitreihen/sommer03/ts_r_intro.pdf Time Series Analysis with R - Part I] * [http://www.statmethods.net/advstats/timeseries.html Quick-R: Time Series]