rfishbase

R-check cran checks Coverage status Onboarding CRAN status Downloads

Welcome to rfishbase 3. This package is the third rewrite of the original rfishbase package described in Boettiger et al. (2012).

rfishbase 3 queries pre-compressed tables from a static server and employs local caching (through memoization) to provide much greater performance and stability, particularly for dealing with large queries involving 10s of thousands of species. The user is never expected to deal with pagination or curl headers and timeouts.

We welcome any feedback, issues or questions that users may encounter through our issues tracker on GitHub: https://github.com/ropensci/rfishbase/issues

Installation

remotes::install_github("ropensci/rfishbase")
library("rfishbase")
library("dplyr") # convenient but not required

Getting started

FishBase (https://fishbase.org) makes it relatively easy to look up a lot of information on most known species of fish. However, looking up a single bit of data, such as the estimated trophic level, for many different species becomes tedious very soon. This is a common reason for using rfishbase. As such, our first step is to assemble a good list of species we are interested in.

Building a species list

Almost all functions in rfishbase take a list (character vector) of species scientific names, for example:

fish <- c("Oreochromis niloticus", "Salmo trutta")

You can also read in a list of names from any existing data you are working with. When providing your own species list, you should always begin by validating the names. Taxonomy is a moving target, and this well help align the scientific names you are using with the names used by FishBase, and alert you to any potential issues:

fish <- validate_names(c("Oreochromis niloticus", "Salmo trutta"))

Another typical use case is in wanting to collect information about all species in a particular taxonomic group, such as a Genus, Family or Order. The function species_list recognizes six taxonomic levels, and can help you generate a list of names of all species in a given group:

fish <- species_list(Genus = "Labroides")
fish
[1] "Labroides bicolor"       "Labroides dimidiatus"   
[3] "Labroides pectoralis"    "Labroides phthirophagus"
[5] "Labroides rubrolabiatus"

rfishbase also recognizes common names. When a common name refers to multiple species, all matching species are returned:

trout <- common_to_sci("trout")
Importing /home/cboettig/.local/share/R/rfishbase/comnames_fb_2104.tsv.bz2 in 1000000 line chunks:

Rows: 324211 Columns: 35

── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (34): autoctr, ComName, Transliteration, StockCode, SpecCode, C_Code, La...
dbl  (1): ComNamesRefNo


ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

    ...Done! (in 12.5654 secs)
trout
# A tibble: 296 × 4
   Species                   ComName              Language SpecCode
   <chr>                     <chr>                <chr>    <chr>   
 1 Salmo obtusirostris       Adriatic trout       English  6210    
 2 Schizothorax richardsonii Alawan snowtrout     English  8705    
 3 Schizopyge niger          Alghad snowtrout     English  24454   
 4 Salvelinus fontinalis     American brook trout English  246     
 5 Salmo trutta              Amu-Darya trout      English  238     
 6 Salmo kottelati           Antalya trout        English  67602   
 7 Oncorhynchus apache       Apache Trout         English  2687    
 8 Oncorhynchus apache       Apache trout         English  2687    
 9 Plectropomus areolatus    Apricot trout        English  6082    
10 Salmo trutta              Aral Sea Trout       English  238     
# … with 286 more rows

Note that there is no need to validate names coming from common_to_sci or species_list, as these will always return valid names.

Getting data

With a species list in place, we are ready to query fishbase for data. Note that if you have a very long list of species, it is always a good idea to try out your intended functions with a subset of that list first to make sure everything is working.

The species() function returns a table containing much (but not all) of the information found on the summary or homepage for a species on FishBase. rfishbase functions always return tidy data tables: rows are observations (e.g. a species, individual samples from a species) and columns are variables (fields).

species(trout$Species)
# A tibble: 296 × 101
   SpecCode Species                   Genus SpeciesRefNo Author FBname PicPreferredName
   <chr>    <chr>                     <chr>        <dbl> <chr>  <chr>  <chr>           
 1 6210     Salmo obtusirostris       Salmo        59043 (Heck… Adria… Saobt_u0.jpg    
 2 8705     Schizothorax richardsonii Schi…         4832 (Gray… Snowt… Scric_u1.jpg    
 3 24454    Schizopyge niger          Schi…         4832 (Heck… Algha… <NA>            
 4 246      Salvelinus fontinalis     Salv…        86798 (Mitc… Brook… Safon_u4.jpg    
 5 238      Salmo trutta              Salmo         4779 Linna… Sea t… Satru_u2.jpg    
 6 67602    Salmo kottelati           Salmo        99540 Turan… Antal… Sakot_m0.jpg    
 7 2687     Oncorhynchus apache       Onco…         5723 (Mill… Apach… Onapa_u0.jpg    
 8 2687     Oncorhynchus apache       Onco…         5723 (Mill… Apach… Onapa_u0.jpg    
 9 6082     Plectropomus areolatus    Plec…         5222 (Rüpp… Squar… Plare_u4.jpg    
10 238      Salmo trutta              Salmo         4779 Linna… Sea t… Satru_u2.jpg    
# … with 286 more rows, and 94 more variables: PicPreferredNameM <chr>,
#   PicPreferredNameF <chr>, PicPreferredNameJ <chr>, FamCode <dbl>,
#   Subfamily <chr>, GenCode <dbl>, SubGenCode <dbl>, BodyShapeI <chr>,
#   Source <chr>, AuthorRef <chr>, Remark <chr>, TaxIssue <chr>, Fresh <dbl>,
#   Brack <dbl>, Saltwater <dbl>, DemersPelag <chr>, Amphibious <chr>,
#   AmphibiousRef <int>, AnaCat <chr>, MigratRef <dbl>,
#   DepthRangeShallow <chr>, DepthRangeDeep <dbl>, DepthRangeRef <dbl>, …

Most tables contain many fields. To avoid overly cluttering the screen, rfishbase displays tables as “tibbles” from the dplyr package. These act just like the familiar data.frames of base R except that they print to the screen in a more tidy fashion. Note that columns that cannot fit easily in the display are summarized below the table. This gives us an easy way to see what fields are available in a given table.

Most rfishbase functions will let the user subset these fields by listing them in the fields argument, for instance:

dat <- species(trout$Species, fields=c("Species", "PriceCateg", "Vulnerability"))
dat
# A tibble: 296 × 3
   Species                   PriceCateg Vulnerability
   <chr>                     <chr>              <dbl>
 1 Salmo obtusirostris       very high           47.0
 2 Schizothorax richardsonii unknown             34.8
 3 Schizopyge niger          unknown             46.8
 4 Salvelinus fontinalis     very high           43.4
 5 Salmo trutta              very high           56.3
 6 Salmo kottelati           <NA>                33.1
 7 Oncorhynchus apache       very high           52.3
 8 Oncorhynchus apache       very high           52.3
 9 Plectropomus areolatus    very high           30.3
10 Salmo trutta              very high           56.3
# … with 286 more rows

Alternatively, just subset the table using the standard column selection in base R ([[) or dplyr::select.

FishBase Docs: Discovering data

Unfortunately identifying what fields come from which tables is often a challenge. Each summary page on FishBase includes a list of additional tables with more information about species ecology, diet, occurrences, and many other things. rfishbase provides functions that correspond to most of these tables.

Because rfishbase accesses the back end database, it does not always line up with the web display. Frequently rfishbase functions will return more information than is available on the web versions of the these tables. Some information found on the summary homepage for a species is not available from the species summary function, but must be extracted from a different table. For instance, the species Resilience information is not one of the fields in the species summary table, despite appearing on the species homepage of FishBase. To discover which table this information is in, we can use the special rfishbase function list_fields, which will list all tables with a field matching the query string:

list_fields("Resilience")
# A tibble: 1 × 1
  table 
  <chr> 
1 stocks

This shows us that this information appears on the stocks table. We can then request this data from the stocks table:

stocks(trout$Species, fields=c("Species", "Resilience", "StockDefs"))
Importing /home/cboettig/.local/share/R/rfishbase/stocks_fb_2104.tsv.bz2 in 1000000 line chunks:

Rows: 35322 Columns: 119

── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr  (50): StockCode, SpecCode, SynOC, StockDefs, StockDefsGeneral, Level, L...
dbl  (67): StocksRefNo, Easternmost, BoundingRef, BoundingMethod, TempMin, T...
dttm  (2): DateModified, DateChecked


ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

    ...Done! (in 2.653803 secs)

# A tibble: 397 × 3
   Species                   Resilience StockDefs                               
   <chr>                     <chr>      <chr>                                   
 1 Salmo obtusirostris       Medium     Europe:  Adriatic basin in Krka, Jardo,…
 2 Schizothorax richardsonii Medium     Asia:  Himalayan region of India, Sikki…
 3 Schizopyge niger          Medium     Asia:  Kashmir Valley in India and Azad…
 4 Salvelinus fontinalis     Medium     North America:  native to most of easte…
 5 Salmo trutta              <NA>       <i>Salmo trutta aralensis</i>:  Asia:  …
 6 Salmo trutta              <NA>       <i>Salmo trutta aralensis</i>:  Asia:  …
 7 Salmo trutta              Medium     <i>Salmo trutta fario</i>:  Northeast  …
 8 Salmo trutta              <NA>       <i>Salmo trutta lacustris</i>           
 9 Salmo trutta              <NA>       <i>Salmo trutta oxianus</i>             
10 Salmo trutta              <NA>       Baltic Sea (ICES subdivisions 22-32)    
# … with 387 more rows

Version stability

rfishbase relies on periodic cache releases – the database release 17.07 dates to a release in July 2017. Set the version of FishBase you wish to access by setting the environmental variable:

options(FISHBASE_VERSION="19.04")

Note that the same version number applies to both the fishbase and sealifebase data. Stay tuned for new releases.

SeaLifeBase

SeaLifeBase.org is maintained by the same organization and largely parallels the database structure of Fishbase. As such, almost all rfishbase functions can instead be instructed to address the

We can begin by getting the taxa table for sealifebase:

sealife <- load_taxa(server="sealifebase")
Importing /home/cboettig/.local/share/R/rfishbase/species_slb_2104.tsv.bz2 in 1000000 line chunks:

Rows: 96814 Columns: 109

── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr  (63): SpecCode, Genus, Species, Author, SpeciesRefNo, FBname, Subfamily...
dbl  (40): FamCode, TaxIssue, AuthorRef, SubGenCode, Brack, Land, MigratRef,...
lgl   (3): Pic, PictureFemale, Profile
dttm  (2): DateEntered, TS
date  (1): E_DateAppend


ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

    ...Done! (in 9.136215 secs)

Importing /home/cboettig/.local/share/R/rfishbase/genera_slb_2104.tsv.bz2 in 1000000 line chunks:

Rows: 27798 Columns: 60

── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr  (26): GenCode, GEN_NAME, AuthorYear, SUBGEN, CommonName, AUTH, QUALIFIC...
dbl  (12): GenusRefno, Famcode, Syncode, LineageID, CurrentGenusID, Marine, ...
lgl  (17): FB_NbSpp, DspinesMin, DspinesMax, DsoftRaysMin, DsoftRaysMax, Tot...
dttm  (5): STAT_CODE1, Dateentered, Datemodified, Datechecked, TS


ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

    ...Done! (in 1.100098 secs)

Importing /home/cboettig/.local/share/R/rfishbase/families_slb_2104.tsv.bz2 in 1000000 line chunks:

Rows: 4564 Columns: 79

── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr  (21): FamCode, Family, AuthorYear, Qualification, CommonName, SuperFami...
dbl  (14): FamiliesRefNo, Fossil, Ordnum, Genera, Species, Brackish, Freshwa...
lgl  (41): SwimMode, Activity, PeriodRange, Period, EpochRange, Epoch, LarvP...
dttm  (3): DateEntered, DateModified, DateChecked


ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

    ...Done! (in 0.316843 secs)

Importing /home/cboettig/.local/share/R/rfishbase/orders_slb_2104.tsv.bz2 in 1000000 line chunks:

Rows: 743 Columns: 45

── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr   (6): Order, CommonName, Class, Phylum, Remark, CommonName_German
dbl  (11): OrderRefNo, Ordnum, ClassNum, PhylumNum, Marine, Brackish, Freshw...
lgl  (26): EtymologyOrder, Class_Russian, EtymologyClass, SisterOrder, ComAn...
dttm  (2): DateEntered, DateModified


ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

    ...Done! (in 0.09213448 secs)

Importing /home/cboettig/.local/share/R/rfishbase/classes_slb_2104.tsv.bz2 in 1000000 line chunks:

Rows: 176 Columns: 9

── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (5): Class, CommonName, Phylum, Remarks, WaterSalinity
dbl (2): ClassNum, PhylumNum
lgl (2): SortNo, TS


ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

    ...Done! (in 0.071419 secs)

Importing /home/cboettig/.local/share/R/rfishbase/phylums_slb_2104.tsv.bz2 in 1000000 line chunks:

Rows: 110 Columns: 10

── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (3): Kingdom, Phylum, CommonName
dbl (5): PhylumId, Valid, ValidPhylumId, ParentPhylumId, PhylumRefNo
lgl (2): ParentKingdomId, ParentKingdomInfId


ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

    ...Done! (in 0.04270363 secs)

Importing /home/cboettig/.local/share/R/rfishbase/phylumclass_slb_2104.tsv.bz2 in 1000000 line chunks:

Rows: 238 Columns: 2

── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (2): Phylum, Class


ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

    ...Done! (in 0.03282166 secs)

(Note: running load_taxa() at the beginning of any session, for either fishbase or sealifebase is a good way to “warm up” rfishbase by loading in taxonomic data it will need. This information is cached throughout your session and will make all subsequent commands run faster. But no worries if you skip this step, rfishbase will peform it for you on the first time it is needed, and will cache these results thereafter.)

Let’s look at some Gastropods:

sealife %>% filter(Class == "Gastropoda")
# A tibble: 19,390 × 9
   SpecCode Species     Genus   Subfamily  Family   Order  Class  Phylum Kingdom
   <chr>    <chr>       <chr>   <chr>      <chr>    <chr>  <chr>  <chr>  <chr>  
 1 148430   Aartsenia … Aartse… <NA>       Pyramid… Not a… Gastr… Mollu… Animal…
 2 80334    Abranchaea… Abranc… <NA>       Pneumod… Gymno… Gastr… Mollu… Animal…
 3 5182     Acanthina … Acanth… Ocenebrin… Muricid… Neoga… Gastr… Mollu… Animal…
 4 148525   Acanthina … Acanth… Ocenebrin… Muricid… Neoga… Gastr… Mollu… Animal…
 5 5179     Acanthina … Acanth… Ocenebrin… Muricid… Neoga… Gastr… Mollu… Animal…
 6 5180     Acanthina … Acanth… Ocenebrin… Muricid… Neoga… Gastr… Mollu… Animal…
 7 5181     Acanthina … Acanth… Ocenebrin… Muricid… Neoga… Gastr… Mollu… Animal…
 8 147501   Acanthina … Acanth… Ocenebrin… Muricid… Neoga… Gastr… Mollu… Animal…
 9 151551   Acmaturris… Acmatu… Mangeliin… Turridae Neoga… Gastr… Mollu… Animal…
10 150126   Acmaturris… Acmatu… Mangeliin… Turridae Neoga… Gastr… Mollu… Animal…
# … with 19,380 more rows

All other tables can also take an argument to server:

species(server="sealifebase")
# A tibble: 96,816 × 109
   SpecCode Species  Genus  Author SpeciesRefNo FBname FamCode Subfamily GenCode
   <chr>    <chr>    <chr>  <chr>  <chr>        <chr>    <dbl> <chr>     <chr>  
 1 32307    Aaptola… Aapto… (Pils… 19           <NA>       815 <NA>      27838  
 2 32306    Aaptola… Aapto… Newma… 81749        <NA>       815 <NA>      27838  
 3 32308    Aaptola… Aapto… (Pils… 19           <NA>       815 <NA>      27838  
 4 32304    Aaptola… Aapto… Newma… 19           <NA>       815 <NA>      27838  
 5 32305    Aaptola… Aapto… Newma… 19           <NA>       815 <NA>      27838  
 6 51720    Aaptos … Aaptos (Schm… 19           <NA>      2630 <NA>      9253   
 7 165941   Aaptos … Aaptos de La… 108813       <NA>      2630 <NA>      9253   
 8 105687   Aaptos … Aaptos (Wils… 3477         <NA>      2630 <NA>      9253   
 9 139407   Aaptos … Aaptos (Tops… 85482        <NA>      2630 <NA>      9253   
10 130070   Aaptos … Aaptos (Soll… 81108        <NA>      2630 <NA>      9253   
# … with 96,806 more rows, and 100 more variables: TaxIssue <dbl>,
#   Remark <chr>, PicPreferredName <chr>, PicPreferredNameM <chr>,
#   PicPreferredNameF <chr>, PicPreferredNameJ <chr>, Source <chr>,
#   AuthorRef <dbl>, SubGenCode <dbl>, Fresh <chr>, Brack <dbl>,
#   Saltwater <chr>, Land <dbl>, BodyShapeI <chr>, DemersPelag <chr>,
#   AnaCat <chr>, MigratRef <dbl>, DepthRangeShallow <dbl>,
#   DepthRangeDeep <dbl>, DepthRangeRef <chr>, DepthRangeComShallow <dbl>, …

CAUTION: if switching between fishbase and sealifebase in a single R session, we strongly advise you always set server explicitly in your function calls. Otherwise you may confuse the caching system.

Backwards compatibility

rfishbase 3.0 tries to maintain as much backwards compatibility as possible with rfishbase 2.0. However, there are cases in which the rfishbase 2.0 behavior was not desirable – such as throwing errors when a introducing simple NAs for missing data would be more appropriate, or returning vectors where data.frames were needed to include all the context.


Please note that this package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

ropensci_footer