Helper Functions

Michael Koohafkan

2021-02-17

This document illustrates some of the helper functions in cimir.

First, load the cimir package:

library(cimir)

In this vignette, we’ll use some example data from the Markleeville station (#246). The station metadata can be retrieved with cimis_station():

station.meta = cimis_station(246)
print(station.meta)
StationNbr Name City RegionalOffice County ConnectDate DisconnectDate IsActive IsEtoStation Elevation GroundCover HmsLatitude HmsLongitude ZipCodes SitingDesc
246 Markleeville Markleeville North Central Region Office Alpine 6/13/2014 12/31/2050 True True 5517 Grass 38º46’24N / 38.773409 -119º47’31W / -119.791930 96120
246 Markleeville Markleeville North Central Region Office Alpine 6/13/2014 12/31/2050 True True 5517 Grass 38º46’24N / 38.773409 -119º47’31W / -119.791930 96133

Notice that the station latitude and longitude are each provided as a single text string containing both the Hour Minute Second (HMS) and Decimal Degree (DD) formats. We can extract one or the other of these formats using cimis_format_location():

station.meta = cimis_format_location(station.meta, "DD")
head(station.meta)
StationNbr Name City RegionalOffice County ConnectDate DisconnectDate IsActive IsEtoStation Elevation GroundCover Latitude Longitude ZipCodes SitingDesc
246 Markleeville Markleeville North Central Region Office Alpine 6/13/2014 12/31/2050 True True 5517 Grass 38.77341 -119.7919 96120
246 Markleeville Markleeville North Central Region Office Alpine 6/13/2014 12/31/2050 True True 5517 Grass 38.77341 -119.7919 96133
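To make the conversion concrete, here is a minimal base R sketch of how the decimal-degree part could be pulled out of the combined strings shown above. The helper extract_dd() is hypothetical and is not part of cimir. It assumes the two formats are always separated by " / " with the decimal-degree value second, as in the Markleeville output. It is not a description of how cimis_format_location() is implemented.

```r
# Example strings copied from the station metadata output above.
hms.lat = "38º46’24N / 38.773409"
hms.lon = "-119º47’31W / -119.791930"

# Hypothetical helper (not part of cimir): keep the text after the
# " / " separator and convert it to a number.
extract_dd = function(x) {
  parts = strsplit(x, " / ", fixed = TRUE)
  as.numeric(vapply(parts, function(p) p[2], character(1)))
}

dd = extract_dd(c(hms.lat, hms.lon))
print(dd)
```

In practice cimis_format_location() should be preferred, since it also renames the columns to Latitude and Longitude as shown in the output above.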

Now let’s retrieve some data with cimis_data():

station.data = cimis_data(246, "2017-04-01", "2017-04-30",
  c("day-air-tmp-avg", "hly-air-tmp"))
head(station.data)
Name Type Owner Date Julian Station Standard ZipCodes Scope Item Value Qc Unit Hour
cimis station water.ca.gov 2017-04-01 91 246 english 96120, 96133 daily DayAirTmpAvg 42.8 (F) NA
cimis station water.ca.gov 2017-04-02 92 246 english 96120, 96133 daily DayAirTmpAvg 45.7 (F) NA
cimis station water.ca.gov 2017-04-03 93 246 english 96120, 96133 daily DayAirTmpAvg 41.1 (F) NA
cimis station water.ca.gov 2017-04-04 94 246 english 96120, 96133 daily DayAirTmpAvg 47.0 (F) NA
cimis station water.ca.gov 2017-04-05 95 246 english 96120, 96133 daily DayAirTmpAvg 52.4 (F) NA
cimis station water.ca.gov 2017-04-06 96 246 english 96120, 96133 daily DayAirTmpAvg 48.9 (F) NA

Notice that timestamps are returned in two columns, “Date” and “Hour”. Furthermore, since we requested both a daily item and an hourly item, the daily item records have NA values in the “Hour” column. We can collapse these columns into a single datetime column using cimis_to_datetime():

station.data = cimis_to_datetime(station.data)
head(station.data)
Name Type Owner Datetime Julian Station Standard ZipCodes Scope Item Value Qc Unit
cimis station water.ca.gov 2017-04-01 00:00:00 91 246 english 96120, 96133 daily DayAirTmpAvg 42.8 (F)
cimis station water.ca.gov 2017-04-02 00:00:00 92 246 english 96120, 96133 daily DayAirTmpAvg 45.7 (F)
cimis station water.ca.gov 2017-04-03 00:00:00 93 246 english 96120, 96133 daily DayAirTmpAvg 41.1 (F)
cimis station water.ca.gov 2017-04-04 00:00:00 94 246 english 96120, 96133 daily DayAirTmpAvg 47.0 (F)
cimis station water.ca.gov 2017-04-05 00:00:00 95 246 english 96120, 96133 daily DayAirTmpAvg 52.4 (F)
cimis station water.ca.gov 2017-04-06 00:00:00 96 246 english 96120, 96133 daily DayAirTmpAvg 48.9 (F)

Note that a time of 00:00:00 is used for daily records.
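The idea behind this step can be sketched in base R on a small toy data frame. Everything here is illustrative. The toy data, the "HHMM" text format assumed for the “Hour” column, and the UTC time zone are assumptions made for the example, and the code is not taken from cimis_to_datetime().

```r
# Toy records: one daily record (Hour is NA) and two hourly records.
# The "HHMM" text format for Hour is an assumption for this sketch.
toy = data.frame(
  Date = as.Date(c("2017-04-01", "2017-04-01", "2017-04-01")),
  Hour = c(NA, "0100", "1300"),
  stringsAsFactors = FALSE
)

# Daily records have no hour, so treat them as midnight.
hour = ifelse(is.na(toy$Hour), "0000", toy$Hour)

# Paste date and hour together and parse them as a single datetime.
# The time zone is fixed to UTC here purely for reproducibility.
toy$Datetime = as.POSIXct(paste(format(toy$Date), hour),
  format = "%Y-%m-%d %H%M", tz = "UTC")

print(toy)
```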

The CIMIS Web API places fairly conservative limits on the number of records you can request in a single query. Large queries can be split automatically into a series of smaller queries using cimis_split_query():

queries = cimis_split_query(247, "2017-04-01", "2018-04-30",
  c("day-air-tmp-avg", "hly-air-tmp"))
queries
#> # A tibble: 7 x 4
#>   start.date end.date   items     targets  
#>   <date>     <date>     <list>    <list>   
#> 1 2017-04-01 2018-04-30 <chr [1]> <dbl [1]>
#> 2 2017-04-01 2017-06-04 <chr [1]> <dbl [1]>
#> 3 2017-06-05 2017-08-09 <chr [1]> <dbl [1]>
#> 4 2017-08-10 2017-10-14 <chr [1]> <dbl [1]>
#> 5 2017-10-15 2017-12-18 <chr [1]> <dbl [1]>
#> 6 2017-12-19 2018-02-22 <chr [1]> <dbl [1]>
#> 7 2018-02-23 2018-04-30 <chr [1]> <dbl [1]>
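In the output above, the daily item is kept as a single query while the hourly item is broken into consecutive date ranges. The general idea of chunking a date range can be sketched in base R as follows. The helper split_range() and its max.days argument are hypothetical. The sketch does not reproduce the rules cimis_split_query() uses to choose chunk sizes.

```r
# Hypothetical helper (not part of cimir): split the period from
# `start` to `end` into consecutive chunks of at most `max.days` days.
split_range = function(start, end, max.days) {
  start = as.Date(start)
  end = as.Date(end)
  # Chunk start dates, max.days apart.
  starts = seq(start, end, by = max.days)
  # Each chunk ends the day before the next one starts;
  # the last chunk ends on the requested end date.
  ends = c(starts[-1] - 1, end)
  data.frame(start.date = starts, end.date = ends)
}

chunks = split_range("2017-04-01", "2017-04-30", max.days = 10)
print(chunks)
```

The chunks do not overlap and together cover the whole requested period, so no record is requested twice.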

The queries can then be run in sequence using, for example, mapply() or one of the purrr::pmap() functions:

purrr::pmap_dfr(queries, cimis_data)

Note that the CIMIS API may reject your requests if you submit too many queries in a short period of time.
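One way to reduce the chance of rejected requests is to pause between queries. The sketch below uses a hypothetical helper, run_with_pause(), which is not part of cimir. It calls a function once per row of a query table, passes each column as a named argument, and sleeps between calls. The pause length is an arbitrary assumption, since no specific rate limit is stated here. A dummy function stands in for cimis_data() so the example runs without an API key.

```r
# Hypothetical helper (not part of cimir): call `fun` once per row of
# `queries`, waiting `pause` seconds between consecutive calls.
run_with_pause = function(queries, fun, pause = 1) {
  results = vector("list", nrow(queries))
  for (i in seq_len(nrow(queries))) {
    if (i > 1) Sys.sleep(pause)
    # Take the i-th element of every column as a named argument.
    args = lapply(queries, `[[`, i)
    results[[i]] = do.call(fun, args)
  }
  # Stack the per-query results into one data frame.
  do.call(rbind, results)
}

# Dummy stand-ins for the query table and the data function.
dummy.queries = data.frame(a = 1:3, b = c(10, 20, 30))
dummy.fun = function(a, b) data.frame(total = a + b)

out = run_with_pause(dummy.queries, dummy.fun, pause = 0)
print(out)
```

With the real query table, the call would look like run_with_pause(queries, cimis_data, pause = 2), assuming the column names of queries match the argument names of cimis_data().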