Getting started with read.gt3x

Tuomo Nieminen

2022-06-30

This document describes how to use the read.gt3x package to read binary activity data from .gt3x files into R. To load the package, use:

library(read.gt3x)

For source code and installation instructions, see the GitHub page.

Reading .gt3x data into R

The read.gt3x package includes sample .gt3x data, which we will use to demonstrate reading. First we need the path to a single .gt3x file. Here we use the file embedded in the package:

gt3xfile <-
  system.file(
    "extdata", "TAS1H30182785_2019-09-17.gt3x",
    package = "read.gt3x")

The examples below use this embedded file. Longer and more extensive sample data can be downloaded with gt3x_datapath(), which returns the path to the downloaded file:

gt3xfile <- gt3x_datapath(1)

Method 1: Temporary unzip and read

The read.gt3x() function takes the path to a single .gt3x file, unzips it to a temporary location, and returns the activity samples as an R matrix.

X <- read.gt3x(gt3xfile)
head(X)
#> Sampling Rate: 100Hz
#> Firmware Version: 1.7.2
#> Serial Number Prefix: TAS
#>          X      Y     Z
#> [1,] 0.000  0.008 0.996
#> [2,] 0.016  0.000 1.008
#> [3,] 0.020 -0.008 1.004
#> [4,] 0.016 -0.012 1.012
#> [5,] 0.016 -0.008 1.008
#> [6,] 0.008 -0.008 1.008

Method 2: Permanent unzip, then read

.gt3x files are zip archives containing two files: log.bin, a binary file with the actual samples, and info.txt, a plain-text file with device and recording metadata. Because read.gt3x() has to unzip a .gt3x archive to a temporary location every time you read it, it can make sense to store the data as unzipped folders containing these two files instead.
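Because a .gt3x file is an ordinary zip archive, base R can inspect it without any help from read.gt3x. The sketch below lists the archive members with utils::unzip(list = TRUE) and reads info.txt straight out of the archive through a unz() connection. It assumes, as the text above states, that both members sit at the top level of the archive.

```r
library(read.gt3x)

# Sample file shipped with the package (same call as at the top of this document)
gt3xfile <- system.file(
  "extdata", "TAS1H30182785_2019-09-17.gt3x",
  package = "read.gt3x")

# List the members of the zip archive without extracting anything
contents <- utils::unzip(gt3xfile, list = TRUE)
print(contents[, c("Name", "Length")])

# info.txt is plain text, so it can be read directly from inside the archive
info_lines <- readLines(unz(gt3xfile, "info.txt"), warn = FALSE)
print(head(info_lines))
```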

read.gt3x() also accepts paths to unzipped gt3x folders. To demonstrate, we’ll unzip the sample .gt3x data in the package and then read it. The unzip.gt3x() helper function unzips all .gt3x files in a given directory. By default, the contents of a file named “subject001.gt3x” are extracted to a folder named “subject001”. unzip.gt3x() returns a vector of paths to the unzipped folders, and the location argument controls where those folders are created.

datadir <- dirname(gt3xfile) # location of .gt3x files
gt3xfolders <- unzip.gt3x(datadir, location = tempdir())
#> Unzipping gt3x data to /tmp/Rtmp4tnfEu
#> 1/1
#> Unzipping /tmp/Rtmp0hKxug/Rinst13d359d09861/read.gt3x/extdata/TAS1H30182785_2019-09-17.gt3x
#>  === info.txt, log.bin extracted to /tmp/Rtmp4tnfEu/TAS1H30182785_2019-09-17
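For comparison, the following is a minimal sketch that produces a similar folder for a single file using only base R's utils::unzip(). It is not how unzip.gt3x() is implemented; it only mirrors the default naming described above. It assumes that read.gt3x() needs nothing more than a folder containing log.bin and info.txt, and the output directory manual_dir is a hypothetical choice made for this example.

```r
library(read.gt3x)

gt3xfile <- system.file(
  "extdata", "TAS1H30182785_2019-09-17.gt3x",
  package = "read.gt3x")

# Mirror the default naming: "<name>.gt3x" is extracted to a folder "<name>"
folder_name <- tools::file_path_sans_ext(basename(gt3xfile))

# Hypothetical output location, kept separate from the folders created above
manual_dir <- file.path(tempdir(), "manual", folder_name)
dir.create(manual_dir, recursive = TRUE, showWarnings = FALSE)

# Extract only the two files that make up a gt3x folder
utils::unzip(gt3xfile, files = c("info.txt", "log.bin"), exdir = manual_dir)

# Assumption: read.gt3x() treats this folder like one created by unzip.gt3x()
X_manual <- read.gt3x(manual_dir)
head(X_manual)
```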

Passing the path of an unzipped folder to read.gt3x() skips the temporary unzip step, so reading is somewhat faster:

gt3xfolder <- gt3xfolders[1]
X <- read.gt3x(gt3xfolder)
head(X)
#> Sampling Rate: 100Hz
#> Firmware Version: 1.7.2
#> Serial Number Prefix: TAS
#>          X      Y     Z
#> [1,] 0.000  0.008 0.996
#> [2,] 0.016  0.000 1.008
#> [3,] 0.020 -0.008 1.004
#> [4,] 0.016 -0.012 1.012
#> [5,] 0.016 -0.008 1.008
#> [6,] 0.008 -0.008 1.008
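Because unzip.gt3x() returns one path per recording, a whole directory can be read with lapply(). The sketch below does that. It also checks the assumption that reading the folder and reading the original archive give the same samples: as.vector() strips the attributes so that only the numeric values are compared. The subdirectory name "batch" is a hypothetical choice that keeps these folders apart from any created earlier.

```r
library(read.gt3x)

gt3xfile <- system.file(
  "extdata", "TAS1H30182785_2019-09-17.gt3x",
  package = "read.gt3x")

# Unzip every .gt3x file in the directory once, into a fresh subdirectory
batch_dir <- file.path(tempdir(), "batch")
dir.create(batch_dir, showWarnings = FALSE)
gt3xfolders <- unzip.gt3x(dirname(gt3xfile), location = batch_dir)

# Read every unzipped folder, naming each result after its folder
activity_list <- lapply(gt3xfolders, read.gt3x)
names(activity_list) <- basename(gt3xfolders)

# Compare the folder route with the archive route on the sample values only
from_archive <- read.gt3x(gt3xfile)
from_folder  <- activity_list[[1]]
all.equal(as.vector(from_archive), as.vector(from_folder))
```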

Activity data matrix

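The matrix returned by read.gt3x() carries more than the raw samples. It has class “activity”, and its attributes record the relative timestamp of each observation (time_index), the gaps in the recording (missingness), the sample rate, and the device header information: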

str(X)
#>  'activity' num [1:33000, 1:3] 0 0.016 0.02 0.016 0.016 0.008 0.016 0.02 0.016 0.012 ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : NULL
#>   ..$ : chr [1:3] "X" "Y" "Z"
#>  - attr(*, "time_index")= num [1:33000] 0 1 2 3 4 5 6 7 8 9 ...
#>  - attr(*, "missingness")='data.frame':  10 obs. of  2 variables:
#>   ..$ time     : POSIXct[1:10], format: "2019-09-17 18:40:10" "2019-09-17 18:44:21" ...
#>   ..$ n_missing: int [1:10] 400 10500 55400 112600 3300 100 100 500 100 24500
#>  - attr(*, "total_records")= int 33000
#>  - attr(*, "start_time_param")= num 1.57e+09
#>  - attr(*, "features")= chr "sleep mode"
#>  - attr(*, "start_time_info")= num 1.57e+09
#>  - attr(*, "sample_rate")= int 100
#>  - attr(*, "impute_zeroes")= logi FALSE
#>  - attr(*, "add_light")= logi FALSE
#>  - attr(*, "start_time")= POSIXct[1:1], format: "2019-09-17 18:40:00"
#>  - attr(*, "stop_time")= POSIXct[1:1], format: "2019-09-18 19:00:00"
#>  - attr(*, "last_sample_time")= POSIXct[1:1], format: "2019-09-17 19:20:05"
#>  - attr(*, "subject_name")= chr "suffix_85"
#>  - attr(*, "time_zone")= chr "-04:00:00"
#>  - attr(*, "firmware")= chr "1.7.2"
#>  - attr(*, "serial_prefix")= chr "TAS"
#>  - attr(*, "acceleration_min")= chr "-8.0"
#>  - attr(*, "acceleration_max")= chr "8.0"
#>  - attr(*, "bad_samples")= logi FALSE
#>  - attr(*, "old_version")= logi FALSE
#>  - attr(*, "header")=List of 17
#>   ..$ Serial Number     : chr "TAS1H30182785"
#>   ..$ Device Type       : chr "Link"
#>   ..$ Firmware          : chr "1.7.2"
#>   ..$ Battery Voltage   : chr "4.18"
#>   ..$ Sample Rate       : num 100
#>   ..$ Start Date        : POSIXct[1:1], format: "2019-09-17 18:40:00"
#>   ..$ Stop Date         : POSIXct[1:1], format: "2019-09-18 19:00:00"
#>   ..$ Last Sample Time  : POSIXct[1:1], format: "2019-09-17 19:20:05"
#>   ..$ TimeZone          : chr "-04:00:00"
#>   ..$ Download Date     : POSIXct[1:1], format: "2019-09-17 19:20:05"
#>   ..$ Board Revision    : chr "8"
#>   ..$ Unexpected Resets : chr "0"
#>   ..$ Acceleration Scale: int 256
#>   ..$ Acceleration Min  : chr "-8.0"
#>   ..$ Acceleration Max  : chr "8.0"
#>   ..$ Subject Name      : chr "suffix_85"
#>   ..$ Serial Prefix     : chr "TAS"
#>   ..- attr(*, "class")= chr [1:2] "gt3x_info" "list"
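Each of these is an ordinary R attribute, so attr() retrieves it. The sketch below pulls out a few of them. The reconstruction of absolute times rests on an assumption that is not stated in the package output: that time_index counts samples since start_time, so dividing by sample_rate gives seconds. The result is labelled GMT in the same way as the time column described in the next section.

```r
library(read.gt3x)

gt3xfile <- system.file(
  "extdata", "TAS1H30182785_2019-09-17.gt3x",
  package = "read.gt3x")
X <- read.gt3x(gt3xfile)

sample_rate <- attr(X, "sample_rate")  # samples per second
start_time  <- attr(X, "start_time")   # POSIXct
time_index  <- attr(X, "time_index")   # one entry per row of X

# Assumption: time_index counts samples since start_time
approx_time <- start_time + time_index / sample_rate
head(approx_time)

# Gaps in the recording: one row per gap with the number of missing samples
missing <- attr(X, "missingness")
sum(missing$n_missing)

# Fields from the device header are stored as a named list
attr(X, "header")[["Serial Number"]]
```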

Converting to a data.frame

The read.gt3x package provides an as.data.frame() method for the activity matrix, which converts the matrix to a data frame and adds a “time” column giving the timestamp of each sample. The timestamps are stored in R with the GMT time zone, but this is misleading: they actually correspond to the local time of the device.

X <- as.data.frame(X)
head(X)
#> Sampling Rate: 100Hz
#> Firmware Version: 1.7.2
#> Serial Number Prefix: TAS
#>                     time     X      Y     Z
#> 1 2019-09-17 18:40:00.00 0.000  0.008 0.996
#> 2 2019-09-17 18:40:00.00 0.016  0.000 1.008
#> 3 2019-09-17 18:40:00.01 0.020 -0.008 1.004
#> 4 2019-09-17 18:40:00.02 0.016 -0.012 1.012
#> 5 2019-09-17 18:40:00.03 0.016 -0.008 1.008
#> 6 2019-09-17 18:40:00.04 0.008 -0.008 1.008
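If true instants are needed, for example to align recordings from devices in different time zones, the wall-clock readings can be shifted by the device's UTC offset. The sketch below uses base R only and two hypothetical helpers, offset_to_hours() and to_true_utc(). It assumes a fixed offset given as a string in the same form as the time_zone attribute shown earlier ("-04:00:00"), and it does not handle daylight-saving changes during a recording.

```r
# Hypothetical helper: convert an offset string such as "-04:00:00" to hours
offset_to_hours <- function(offset) {
  sign  <- if (startsWith(offset, "-")) -1 else 1
  parts <- as.numeric(strsplit(sub("^[+-]", "", offset), ":", fixed = TRUE)[[1]])
  sign * (parts[1] + parts[2] / 60 + parts[3] / 3600)
}

# Hypothetical helper: the stored value is the device's wall-clock reading
# labelled as GMT; subtracting the offset gives the actual instant in UTC
to_true_utc <- function(time, offset_hours) {
  .POSIXct(as.numeric(time) - offset_hours * 3600, tz = "UTC")
}

# First timestamp from the output above, with the device's "-04:00:00" offset
stored <- as.POSIXct("2019-09-17 18:40:00", tz = "GMT")
utc    <- to_true_utc(stored, offset_to_hours("-04:00:00"))
format(utc, "%Y-%m-%d %H:%M:%S %Z")
```

In practice, time would be the time column of the data frame, and the offset string would be read with attr(X, "time_zone") before the conversion with as.data.frame(). Shifting the numeric value, rather than formatting and re-parsing strings, keeps the sub-second precision of the timestamps intact.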