In this tutorial, two unsupervised clustering algorithms from the
`funtimes`

package are used to identify clusters of
Australia’s sea level time series.

First, load the essential libraries for the analysis:

```
library(funtimes)
library(ggplot2)
library(gridExtra)
library(readxl)
library(reshape2)
```

The daily sea level data are available from 1993 to 2012 for 17
locations. The data are obtained from Maharaj,
D’Urso, and Caido (2019) using the following link http://www.tsclustering.homepage.pt/index.php?p=3.
Download `Application7_3.zip`

folder, where the
`Aus Sea Levels 17.xlsx`

file contains the sea level records.
Annual average is taken to convert the temporal resolution.

```
<- readxl::read_xlsx("Aus_Sea_Levels_17.xlsx", skip = 1, n_max = 7300)
d_org # yearly average
<- data.frame(aggregate(d_org[, 4:20], list(d_org$Year),
d FUN = 'mean', na.rm = TRUE)[, -1],
row.names = unique(d_org$Year))
```

Below is the plot of annual time series of sea level for 17 locations:

```
<- reshape2::melt(t(d))
dlong names(dlong)[1:2] <- c("Location", "Year")
ggplot(dlong) + geom_line(aes(x = Year, y = value, color = Location), size = 1) +
ylab('Sea level (m)') +
theme_bw()
# Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
# ℹ Please use `linewidth` instead.
# This warning is displayed once every 8 hours.
# Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
# generated.
```

This plot demonstrates the variation in the sea levels across the locations. It can be seen that not all the time series are having a common trend since 1993. Grouping the locations with a common trend could benefit Australian government to assess and implement climate adaptation strategies for the impact of sea level rise on clustered locations.

The first function from the package to test is the
`sync_cluster`

that groups the time series with the common
linear trend. The window parameter `w`

is set here for number
of slides in each window. If the number of years are not enough in the
time series, this parameter is required to be set.

```
set.seed(123)
<- sync_cluster(d ~ t, Window = 3, B = 100)
Clus_sync # [1] "Cluster labels:"
# [1] 1 1 0 1 0 0 1 1 1 1 1 1 1 1 1 1 0
# [1] "Number of single-element clusters (labeled with '0'): 4"
Clus_sync# $cluster
# [1] 1 1 0 1 0 0 1 1 1 1 1 1 1 1 1 1 0
#
# $elements
# $elements$`Time series that each formed a separate cluster`
# [1] "Broome" "Cape.Fergusen" "Carnarvon" "Wyndham"
#
# $elements$`1`
# [1] "Booby.Island" "Brisbane" "Burnie" "Cocos.Island" "Darwin"
# [6] "Esperance" "Freemantle" "Hillarys" "Portland" "Spring.Bay"
# [11] "Sydney" "Thevenard" "Townsville"
#
#
# $estimate
# $estimate[[1]]
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -1.230297 0.2652816 -4.637704 2.045787e-04
# t 2.343424 0.4429056 5.291023 4.966844e-05
#
#
# $pval
# $pval[[1]]
# [1] 0.06
#
#
# $statistic
# $statistic[[1]]
# Test statistic
# 0.2646767
#
#
# $ar_order
# $ar_order[[1]]
# 1 2 4 7 8 9 10 11 12 13 14 15 16
# ar.order 0 12 11 0 12 0 0 0 0 0 0 11 1
#
#
# $window_used
# $window_used[[1]]
# 1 2 4 7 8 9 10 11 12 13 14 15 16
# Window 3 3 3 3 3 3 3 3 3 3 3 3 3
#
#
# $all_considered_windows
# $all_considered_windows[[1]]
# Window Statistic p-value Asympt. p-value
# 3 0.2646767 0.06 0.4608282
#
#
# $WAVK_obs
# $WAVK_obs[[1]]
# [1] 0.212437377 -0.009172104 -0.011933680 -0.158881835 -0.017872773
# [6] -0.017770762 0.199031778 0.060299520 -0.081576400 -0.100934598
# [11] 0.099766572 -0.007834631 0.099118234
```

Total 13 locations are clustered with a common linear trend, while the remaining 4 are not tied to any other location and form so-called noise cluster.

Below is the plot of the clustered time series of sea level, where
`Cluster 0`

indicates the noise cluster without any common
linear trend, while `Cluster 1`

shows the time series of
locations with a common linear trend:

```
for (i in 0:max(Clus_sync$cluster)) {
assign(paste('py', i, sep = ''),
ggplot(melt(t(d[, Clus_sync$cluster == i]))) +
geom_line(aes(x = Var2,y = value,color = Var1),size = 1) +
ylab('Sea level (m)') + xlab('Year') +
theme_bw() + ggtitle(paste('Cluster',i)) +
theme(axis.text = element_text(size = 13), axis.title.x = element_text(size = 15),
axis.title.y = element_text(size = 15), legend.text = element_text(size = 10),
legend.title = element_blank(), legend.key.size = unit(0.3, "cm")))
}grid.arrange(py0, py1)
```

The `BICC`

function applies an unsupervised spatiotemporal
clustering algorithm, TRUST, from Ciampi, Appice,
and Malerba (2010). The algorithm has a few tuning parameters,
and the `BICC`

function automatically selects two of those
(`Delta`

and `Epsilon`

; for manual setting of all
the parameters, use the lower-level functions `CSlideCluster`

and `CWindowCluster`

). First, the time series are clustered
within small slides; the length of the slides is defined with the
parameter `p`

(i.e., number of time-series observations in
each slide). Then, slides are aggregated into windows (each window
contains `w`

consecutive slides), and slide-level cluster
assignments are used to cluster the time series at the window level.
When defining the windows, the user can also set the step
`s`

, which is the number of steps used to shift the window
(if `s = w`

, the windows do not overlap).

```
<- BICC(as.matrix(d), p = 5, w = 4, s = 4)
Clus_BICC
Clus_BICC# $delta.opt
# [1] 1.405918
#
# $epsilon.opt
# [1] 0.25
#
# $clusters
# Booby.Island Brisbane Broome Burnie Cape.Fergusen Carnarvon
# Window_1 1 1 1 1 1 1
# Cocos.Island Darwin Esperance Freemantle Hillarys Portland Spring.Bay
# Window_1 1 1 1 1 1 1 1
# Sydney Thevenard Townsville Wyndham
# Window_1 1 1 1 1
#
# $IC
# [,1] [,2] [,3] [,4]
# [1,] 244.1827 244.1827 244.1827 244.1827
# [2,] 244.1827 244.1827 244.1827 244.1827
# [3,] 244.1827 244.1827 244.1827 244.1827
# [4,] 244.1827 244.1827 244.1827 244.1827
# [5,] 244.1827 244.1827 244.1827 244.1827
#
# $delta.all
# [1] 1.405918 8.435507 15.465096 22.494685 29.524274
#
# $epsilon.all
# [1] 0.25 0.50 0.75 1.00
```

The algorithm detected only one cluster.

This vignette belongs to R package `funtimes`

. If you wish
to cite this page, please cite the package:

```
citation("funtimes")
#
# To cite package 'funtimes' in publications use:
#
# Lyubchich V, Gel Y, Vishwakarma S (2023). _funtimes: Functions for
# Time Series Analysis_. R package version 9.1.
#
# A BibTeX entry for LaTeX users is
#
# @Manual{,
# title = {funtimes: Functions for Time Series Analysis},
# author = {Vyacheslav Lyubchich and Yulia R. Gel and Srishti Vishwakarma},
# year = {2023},
# note = {R package version 9.1},
# }
```

Ciampi, A., A. Appice, and D. Malerba. 2010. “Discovering
Trend-Based Clusters in Spatially Distributed Data Streams.” In
*International Workshop of Mining Ubiquitous and Social
Environments*, 107–22. Barcelona, Spain.

Maharaj, E. A., P. D’Urso, and J. Caido. 2019. *Time Series
Clustering and Classification*. 1st ed. Computer Science and Data
Analysis. Australia: CRC Press. https://doi.org/10.1201/9780429058264.