GNRS R package

Brian Maitner

2021-10-12

Geographic Name Resolution Service

The package GNRS is designed to interact with the Geographic Name Resolution Service API (GNRS; https://gnrs.biendata.org/) of the Botanical Information and Ecology Network (BIEN; https://bien.nceas.ucsb.edu). The GNRS is a tool for resolving, standardizing, and indexing political division names. It resolves political division names against standard world political units in the Geonames (https://www.geonames.org/) and Global Administrative Areas (GADM; https://gadm.org/) databases. Names are resolved at three levels: country, state/province, and county/parish. The GNRS uses both exact and fuzzy matching to match standard and alternative political division names in a variety of languages, as well as abbreviations and codes such as ISO and FIPS codes. Results returned by the GNRS include the original names submitted, the standard names and codes of the political units matched, unique identifiers from the Geonames and GADM databases, and additional fields describing how each name was resolved. An overall match score from 0 to 1 describes how closely the submitted names match standard names, where 1 is a perfect match.
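Because every result carries an overall match score between 0 and 1, a common follow-up is to set aside low-confidence matches for manual review. The sketch below is a hypothetical helper, not part of the GNRS package: the name flag_low_confidence and the 0.9 default threshold are arbitrary choices, it assumes a results data frame with an overall_score column, and the toy data frame is illustrative rather than real GNRS output.

```r
# Hypothetical helper (not part of GNRS): return the rows of a results data
# frame whose overall_score is below `threshold`. Missing scores are kept too,
# since they also need a closer look.
flag_low_confidence <- function(results, threshold = 0.9) {
  # as.character() first, so factors or character scores convert correctly
  score <- suppressWarnings(as.numeric(as.character(results$overall_score)))
  keep <- is.na(score) | score < threshold
  results[keep, , drop = FALSE]
}

# Illustrative input only: a toy data frame shaped like GNRS output
toy_results <- data.frame(
  country_verbatim = c("USA", "Untied States", "Mexico"),
  country = c("United States", "United States", "Mexico"),
  overall_score = c(1.00, 0.85, NA),
  stringsAsFactors = FALSE
)

flag_low_confidence(toy_results)
```

Here the second row (score below 0.9) and the third row (no score) are returned. The right threshold depends on how much manual checking you are willing to do.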

Installing the GNRS package

The current stable version of the GNRS package is available on CRAN, while the development version can be installed from GitHub using devtools.

# To install the stable version from CRAN
install.packages("GNRS")

# To install the development version from GitHub
# (requires the devtools package: install.packages("devtools"))
library(devtools)
install_github("EnquistLab/RGNRS")

Standardizing a single name

In some cases, we may only want to standardize a single name. Say, we’d like to check what the standardized name for the United States of America is. Or perhaps we’d like to get the standardized name for the Canadian province of Quebec. We can use the function GNRS_super_simple for this.

library(GNRS)


# Standardizing a single country

USA_standardized <- GNRS_super_simple(country = "United States of America")

# Take a look at the columns returned
colnames(USA_standardized)
##  [1] "poldiv_full"                 "country_verbatim"           
##  [3] "state_province_verbatim"     "state_province_verbatim_alt"
##  [5] "county_parish_verbatim"      "county_parish_verbatim_alt" 
##  [7] "country"                     "state_province"             
##  [9] "county_parish"               "country_id"                 
## [11] "state_province_id"           "county_parish_id"           
## [13] "country_iso"                 "state_province_iso"         
## [15] "county_parish_iso"           "geonameid"                  
## [17] "gid_0"                       "gid_1"                      
## [19] "gid_2"                       "match_method_country"       
## [21] "match_method_state_province" "match_method_county_parish" 
## [23] "match_score_country"         "match_score_state_province" 
## [25] "match_score_county_parish"   "overall_score"              
## [27] "poldiv_submitted"            "poldiv_matched"             
## [29] "match_status"                "user_id"
# The most useful columns here are country, overall_score and match_method_country
USA_standardized[c("country","overall_score","match_method_country")]
##         country overall_score match_method_country
## 1 United States          1.00 exact alternate name

In this case, the standardized name is simply "United States". We can be confident in this name because the submitted name exactly matched an alternate name of "United States" (match_method_country = "exact alternate name", overall_score = 1.00). Note that even though we didn't supply any state/province or county/parish names, fields for these are still returned. This is because the GNRS always returns the same set of columns, although some of them may be empty.
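To see at a glance which of the always-returned columns are empty for a given query, a small helper can list them. This is a minimal sketch, not a GNRS function: empty_columns is a hypothetical name, and it assumes empty fields come back either as NA or as empty strings. The toy data frame is illustrative only.

```r
# Hypothetical helper (not part of GNRS): names of the columns in which every
# value is NA or an empty/whitespace-only string.
empty_columns <- function(df) {
  is_empty <- vapply(df, function(col) {
    values <- as.character(col)
    all(is.na(values) | trimws(values) == "")
  }, logical(1))
  names(df)[is_empty]
}

# Illustrative input shaped like a country-only result
toy_result <- data.frame(
  country = "United States",
  state_province = "",
  county_parish = NA,
  overall_score = 1,
  stringsAsFactors = FALSE
)

empty_columns(toy_result)
# returns c("state_province", "county_parish")
```

The same call could be applied to USA_standardized to check which of its 30 columns are unused for a country-only query.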

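# Standardizing a single state/province

Quebec_standardized <- GNRS_super_simple(country = "Canada", state_province = "Quebec")

Quebec_standardized[c("country", "state_province", "overall_score")]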

Multiple political divisions

# First, we'll load the test data included with this package, gnrs_testfile

data("gnrs_testfile", package = "GNRS")

head(gnrs_testfile, n = 10)
##    user_id   country          state_province
## 1        1    Russia                 Lipetsk
## 2        2    Mexico       Sonora, Estado de
## 3        3 Guatemala                  Izabal
## 4        4       USA                 Arizona
## 5        5     U.S.A                 Arizona
## 6        6       USA                 Ilinois
## 7        7    Mexico            Quintana Roo
## 8        8    Mexico            Quintana Roo
## 9        9   Ukraine                 Kharkiv
## 10      10    Canada Province of Nova Scotia
##                           county_parish
## 1                      Dobrovskiy rayon
## 2                               HuÃ©pac
## 3                                      
## 4                           Pima County
## 5                                  Pima
## 6                                      
## 7                     LÃ¡zaro CÃ¡rdenas
## 8        Municipio de LÃ¡zaro CÃ¡rdenas
## 9                       Novovodolaz'kyi
## 10

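As you can see, the sample data include spelling variants (USA vs. U.S.A), misspellings (Ilinois), names with descriptive prefixes or suffixes (Province of Nova Scotia, Pima County), and mis-encoded characters (as in the county names of rows 2, 7 and 8) that may cause problems. The GNRS standardizes many of these variants, but not all of them: in the output, the county names in rows 2, 7 and 8 come back empty.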

gnrs_results <- GNRS(gnrs_testfile)

# The standardized names are found in these columns:
head(gnrs_results[c("country","state_province","county_parish")], n = 10)
##          country      state_province    county_parish
## 1         Russia  Lipetskaya Oblast' Dobrovskiy Rayon
## 2         Mexico              Sonora                 
## 3      Guatemala              Izabal                 
## 4  United States             Arizona             Pima
## 5  United States             Arizona             Pima
## 6  United States            Illinois                 
## 7         Mexico        Quintana Roo                 
## 8         Mexico        Quintana Roo                 
## 9        Ukraine Kharkivs'ka Oblast'  Novovodolaz'kyi
## 10        Canada         Nova Scotia
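Rows like 2, 7 and 8 above, where a name was submitted but no standardized name came back, are worth reviewing by hand. One way to pull them out is to compare each *_verbatim column with its resolved counterpart. This is a minimal sketch under stated assumptions: find_unresolved is a hypothetical helper, it relies on the column naming shown by colnames() earlier (for example county_parish_verbatim and county_parish), and it assumes unresolved fields are NA or empty strings. The toy data frame is illustrative only.

```r
# Hypothetical helper (not part of GNRS): rows where a name was submitted at
# the given level but no standardized name was returned.
find_unresolved <- function(results, level = "county_parish") {
  verbatim_col <- paste0(level, "_verbatim")
  stopifnot(all(c(verbatim_col, level) %in% names(results)))
  # TRUE where a value is missing or blank
  is_blank <- function(x) {
    x <- as.character(x)
    is.na(x) | trimws(x) == ""
  }
  submitted <- !is_blank(results[[verbatim_col]])
  resolved <- !is_blank(results[[level]])
  results[submitted & !resolved, , drop = FALSE]
}

# Illustrative input only, shaped like GNRS output
toy_counties <- data.frame(
  user_id = 1:3,
  county_parish_verbatim = c("Pima County", "Municipio X", ""),
  county_parish = c("Pima", "", ""),
  stringsAsFactors = FALSE
)

find_unresolved(toy_counties)$user_id
# returns 2L
```

Called as find_unresolved(gnrs_results), or with level = "state_province", it should flag the corresponding unresolved rows in the real results, provided the empty fields follow the assumption above.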

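The GNRS function expects four columns as input (user_id, country, state_province and county_parish), but all of them are optional. If you forget the column names, the function GNRS_template returns a one-row template filled with NA, which you can use as a quick reference or as a data frame to populate: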

head(GNRS_template())
##   user_id country state_province county_parish
## 1      NA      NA             NA            NA
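Because all four columns are optional, one way to prepare your own data is to start from the columns you have and fill in the rest to match the template. This is a minimal sketch: as_gnrs_input is a hypothetical helper, not part of the package. It assumes the template columns are exactly those printed above, fills missing ones with NA as the template does, numbers the rows when no user_id is given, and drops any extra columns (also an assumption about what GNRS accepts).

```r
# Hypothetical helper (not part of GNRS): add any missing template columns
# (filled with NA), number the rows if no user_id was given, and return the
# columns in template order (extra columns are dropped).
as_gnrs_input <- function(df,
                          template_cols = c("user_id", "country",
                                            "state_province", "county_parish")) {
  for (col in setdiff(template_cols, names(df))) {
    df[[col]] <- NA
  }
  if (all(is.na(df$user_id))) {
    df$user_id <- seq_len(nrow(df))
  }
  df[template_cols]
}

my_places <- data.frame(
  country = c("Canada", "Mexico"),
  state_province = c("Quebec", "Sonora"),
  stringsAsFactors = FALSE
)

gnrs_input <- as_gnrs_input(my_places)
# gnrs_results <- GNRS(gnrs_input)  # queries the GNRS API; needs an internet connection
```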