I spoke about time series data on day 10, where I used the datasets package to look at Johnson and Johnson quarterly earnings over a roughly 20-year period. We then used detrend() to remove the major trend from the dataset. One way to analyze time series data is to look at how each data point relates to subsequent data points. This kind of analysis is called autocorrelation, and the R function for it is acf(). ACF plots can be difficult to interpret, so let’s look at our example:
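
For anyone following along, here is a minimal sketch of how a plot like this can be produced. It assumes the series is the JohnsonJohnson object that ships with R's datasets package; the variable names and the plot title are my own.

```r
# JohnsonJohnson is a quarterly ts object in the datasets package
jj <- datasets::JohnsonJohnson

# acf() draws the plot and also returns the estimated autocorrelations
jj_acf <- acf(jj, main = "ACF of Johnson and Johnson quarterly earnings")

# The first stored value is lag 0, i.e. the series correlated with itself
jj_acf$acf[1]
```

Keeping the return value around is handy if you want to inspect the numbers behind the bars rather than read them off the plot.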

This is an ACF plot of the Johnson and Johnson dataset. Because of the increasing (almost log-linear) nature of the dataset, each datapoint is significantly related to subsequent datapoints. ACF plots show how each datapoint relates to subsequent datapoints at different “lags”, or time intervals “after present”, so to speak. The blue dashed lines mark the approximate 95% significance bounds: a bar that extends beyond them indicates a correlation at that lag that is significantly different from zero. By default, R only shows a limited number of lags, based on the length of the series. You can ask for more with the lag.max argument, although estimates at long lags are based on fewer pairs of points and are less reliable.
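
To make the lag limit and the blue lines a little more concrete, here is a rough sketch. It relies on the defaults of acf() for a single series (about 10 * log10(n) lags, and 95% bounds for an uncorrelated series); the choice of 40 lags is arbitrary and only for illustration.

```r
jj <- datasets::JohnsonJohnson
n <- length(jj)

# Default number of lags acf() shows for a single series
default_lags <- floor(10 * log10(n))

# Height of the blue dashed lines (plus and minus) at the default 95% level
ci_bound <- qnorm((1 + 0.95) / 2) / sqrt(n)

# Ask for more lags than the default; 40 is an arbitrary illustrative choice
jj_acf_long <- acf(jj, lag.max = 40, main = "ACF with lag.max = 40")
```

Note that lag.max counts observations (quarters here), while the x-axis of the plot for a ts object is labelled in units of the series' time scale (years here).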

Now let’s look at the ACF of the detrended version of the log of the Johnson and Johnson dataset:

As I mentioned before, transforming data makes interpretation less straightforward. In addition, analyses that use detrend() spend one degree of freedom on the detrending, which limits how much further analysis the data can support. However, one initial way to interpret this plot is that there is a repeating pattern every 4th lag. This data is quarterly, so this may reflect some internal process within Johnson and Johnson that produces a yearly cycle, for example consistently higher earnings in one particular quarter.
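
Here is a sketch of one way to get a detrended log series and its ACF using only base R. It does not call detrend() itself; instead it takes the residuals of a straight-line fit against time as a stand-in for linear detrending, so the result may differ slightly from what detrend() gave on day 10. All variable names are my own.

```r
jj <- datasets::JohnsonJohnson
log_jj <- log(jj)

# Stand-in for detrend(): fit a straight line in time and keep the residuals
tt <- as.numeric(time(jj))
fit <- lm(as.numeric(log_jj) ~ tt)

# Re-wrap the residuals as a quarterly ts so acf() knows the time scale
jj_detrended <- ts(unname(residuals(fit)),
                   start = start(jj), frequency = frequency(jj))

jj_dt_acf <- acf(jj_detrended, main = "ACF of detrended log(JohnsonJohnson)")
```

Because the series has frequency 4, the x-axis of this plot is in years, so a lag of 1.0 on the axis corresponds to 4 quarters.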

Let’s look at another dataset; this time we’ll look at monthly average temperatures in Nottingham (the nottem dataset in the datasets package):
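
A quick sketch for plotting the raw series, assuming the nottem object from R's datasets package; the axis label and title are my own.

```r
# nottem is a monthly ts object in the datasets package
nt <- datasets::nottem

# Plot the raw monthly temperatures over time
plot(nt, ylab = "Temperature (deg F)",
     main = "Monthly average temperatures, Nottingham")
```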

The seasonal variation in temperatures is fairly obvious from the data. Now let’s look at an ACF of the nottem data. (NOTE: there was no major visible trend in the nottem dataset that needed to be accounted for, and detrend() uses up one degree of freedom in your analysis, so I did not detrend the nottem dataset.)

This ACF is, in the words of Jim Carrey, B-E-A-UTIFUL! There is a strong, significantly negative correlation at lags 4-8, peaking at lag 6, and a strong positive correlation at lags 10-14, peaking at lag 12. These correspond very nicely with 6 months and 12 months: temperatures 6 months apart are strongly negatively correlated (when one is high, the other tends to be low), and temperatures 12 months apart are strongly positively correlated.
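
If you would rather check those peaks numerically than read them off the plot, here is a small sketch. It assumes the nottem object from the datasets package and asks for 24 lags (two years), which is my own choice; the variable names are also mine.

```r
nt <- datasets::nottem

# Two years of monthly lags; lag.max is counted in months here
nt_acf <- acf(nt, lag.max = 24, main = "ACF of nottem")

# Flatten to a plain vector; r[1] is lag 0, r[2] is lag 1, and so on
r <- drop(nt_acf$acf)

# Lag (in months) with the most negative autocorrelation
most_negative_lag <- which.min(r) - 1

# Lag (in months) with the largest autocorrelation, excluding lag 0
most_positive_lag <- which.max(r[-1])

most_negative_lag
most_positive_lag
```

As with the quarterly data, the x-axis of the plot is in years, so 6 months shows up at 0.5 and 12 months at 1.0.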
