Filtering occurrence records

William K. Morris

When getting records from FinBIF, you can filter the data in many ways before it is downloaded, which saves bandwidth and local post-processing time. For the full list of filtering options, see ?filters.

Location

Records can be filtered by the name of a location.

finbif_occurrence(filter = c(country = "Finland"))
#> Records downloaded: 10
#> Records available: 44691386
#> A data.frame [10 x 12]
#>                                 record_id      scientific_name abundance lat_wgs84 lon_wgs84
#> 1                           …JX.1594385#3 Sciurus vulgaris Li…  1         60.23584  25.05693
#> 2  …KE.176/64895825d5de884fa20e297d#Unit1 Heracleum persicum …        NA  61.08302  22.38983
#> 3                           …JX.1594382#9 Hirundo rustica Lin…        NA  64.12716  23.99111
#> 4                          …JX.1594382#37 Pica pica (Linnaeus…        NA  64.12716  23.99111
#> 5                          …JX.1594382#49 Muscicapa striata (…        NA  64.12716  23.99111
#> 6                          …JX.1594382#39 Larus canus Linnaeu…        NA  64.12716  23.99111
#> 7                           …JX.1594382#5 Emberiza citrinella…        NA  64.12716  23.99111
#> 8                          …JX.1594382#31 Ficedula hypoleuca …        NA  64.12716  23.99111
#> 9                          …JX.1594382#41 Alauda arvensis Lin…        NA  64.12716  23.99111
#> 10                         …JX.1594382#21 Numenius arquata (L…        NA  64.12716  23.99111
#> ...with 0 more record and 7 more variables:
#> date_time, coordinates_uncertainty, any_issues, requires_verification, requires_identification,
#> record_reliability, record_quality

Records can also be filtered by a set of coordinates.

finbif_occurrence(
  filter = list(coordinates = list(c(60, 68), c(20, 30), "wgs84"))
)
#> Records downloaded: 10
#> Records available: 37318868
#> A data.frame [10 x 12]
#>                                 record_id      scientific_name abundance lat_wgs84 lon_wgs84
#> 1                           …JX.1594385#3 Sciurus vulgaris Li…  1         60.23584  25.05693
#> 2  …KE.176/64895825d5de884fa20e297d#Unit1 Heracleum persicum …        NA  61.08302  22.38983
#> 3                           …JX.1594382#9 Hirundo rustica Lin…        NA  64.12716  23.99111
#> 4                          …JX.1594382#37 Pica pica (Linnaeus…        NA  64.12716  23.99111
#> 5                          …JX.1594382#49 Muscicapa striata (…        NA  64.12716  23.99111
#> 6                          …JX.1594382#39 Larus canus Linnaeu…        NA  64.12716  23.99111
#> 7                           …JX.1594382#5 Emberiza citrinella…        NA  64.12716  23.99111
#> 8                          …JX.1594382#31 Ficedula hypoleuca …        NA  64.12716  23.99111
#> 9                          …JX.1594382#41 Alauda arvensis Lin…        NA  64.12716  23.99111
#> 10                         …JX.1594382#21 Numenius arquata (L…        NA  64.12716  23.99111
#> ...with 0 more record and 7 more variables:
#> date_time, coordinates_uncertainty, any_issues, requires_verification, requires_identification,
#> record_reliability, record_quality

See ?filters section “Location” for more details.
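
When the same bounding box is reused across queries, it can help to build the coordinates filter in one place. The sketch below uses a hypothetical helper, bbox_filter(), which is not part of finbif. It assumes, based on the values in the example above, that the first vector holds the latitude range, the second the longitude range, and the third element names the coordinate system.

```r
# Hypothetical helper (not part of finbif): wrap a bounding box in the
# list structure used by the coordinates filter in the example above.
# Assumption: latitude range first, then longitude range, then the system.
bbox_filter <- function(lat, lon, system = "wgs84") {
  stopifnot(
    length(lat) == 2, length(lon) == 2,
    lat[1] <= lat[2], lon[1] <= lon[2]
  )
  list(coordinates = list(lat, lon, system))
}

finland_box <- bbox_filter(lat = c(60, 68), lon = c(20, 30))
str(finland_box)
```

The result has the same shape as the filter in the example above, so it could be passed on as finbif_occurrence(filter = finland_box).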

Time

The event or import date of records can be used to filter occurrence data from FinBIF. A date filter can be a single year, month or date.

finbif_occurrence(filter = list(date_range_ym = c("2020-12")))
#> Records downloaded: 10
#> Records available: 23847
#> A data.frame [10 x 12]
#>    record_id      scientific_name abundance lat_wgs84 lon_wgs84           date_time
#> 1       …107 Pica pica (Linnaeus…  31        65.0027   25.49381 2020-12-31 10:20:00
#> 2        …45 Larus argentatus Po…  1         65.0027   25.49381 2020-12-31 10:20:00
#> 3       …153 Emberiza citrinella…  2         65.0027   25.49381 2020-12-31 10:20:00
#> 4        …49 Columba livia domes…  33        65.0027   25.49381 2020-12-31 10:20:00
#> 5       …117 Corvus corax Linnae…  1         65.0027   25.49381 2020-12-31 10:20:00
#> 6       …111 Corvus monedula Lin…  7         65.0027   25.49381 2020-12-31 10:20:00
#> 7       …161 Sciurus vulgaris Li…  1         65.0027   25.49381 2020-12-31 10:20:00
#> 8       …123 Passer montanus (Li…  28        65.0027   25.49381 2020-12-31 10:20:00
#> 9       …149 Pyrrhula pyrrhula (…  1         65.0027   25.49381 2020-12-31 10:20:00
#> 10       …77 Turdus pilaris Linn…  1         65.0027   25.49381 2020-12-31 10:20:00
#> ...with 0 more record and 6 more variables:
#> coordinates_uncertainty, any_issues, requires_verification, requires_identification,
#> record_reliability, record_quality


For record events, a date filter can also be a range, given as a character vector.

finbif_occurrence(
  filter = list(date_range_ymd = c("2019-06-01", "2019-12-31"))
)
#> Records downloaded: 10
#> Records available: 911735
#> A data.frame [10 x 12]
#>                     record_id      scientific_name abundance lat_wgs84 lon_wgs84
#> 1  …KE.921/LGE.627772/1470480 Pteromys volans (Li…        NA  61.81362  25.75756
#> 2             …JX.1054648#107 Pica pica (Linnaeus…  3         65.30543  25.70355
#> 3              …JX.1054648#85 Poecile montanus (C…  1         65.30543  25.70355
#> 4             …JX.1054648#103 Garrulus glandarius…  3         65.30543  25.70355
#> 5             …JX.1054648#123 Passer montanus (Li…  3         65.30543  25.70355
#> 6             …JX.1054648#149 Pyrrhula pyrrhula (…  1         65.30543  25.70355
#> 7              …JX.1054648#93 Cyanistes caeruleus…  9         65.30543  25.70355
#> 8              …JX.1054648#95 Parus major Linnaeu…  35        65.30543  25.70355
#> 9             …JX.1054648#137 Carduelis flammea (…  2         65.30543  25.70355
#> 10            …JX.1056695#107 Pica pica (Linnaeus…  6         62.7154   23.0893 
#>              date_time
#> 1  2019-12-31 12:00:00
#> 2  2019-12-31 10:20:00
#> 3  2019-12-31 10:20:00
#> 4  2019-12-31 10:20:00
#> 5  2019-12-31 10:20:00
#> 6  2019-12-31 10:20:00
#> 7  2019-12-31 10:20:00
#> 8  2019-12-31 10:20:00
#> 9  2019-12-31 10:20:00
#> 10 2019-12-31 10:15:00
#> ...with 0 more record and 6 more variables:
#> coordinates_uncertainty, any_issues, requires_verification, requires_identification,
#> record_reliability, record_quality


Records for a specific season or time-span across all years can also be requested.

finbif_occurrence(
  filter = list(
    date_range_md = c(begin = "12-21", end = "12-31"),
    date_range_md = c(begin = "01-01", end = "02-20")
  )
)
#> Records downloaded: 10
#> Records available: 1486845
#> A data.frame [10 x 12]
#>      record_id      scientific_name abundance lat_wgs84 lon_wgs84           date_time
#> 1  …433443#318 Accipiter nisus (Li…  1         64.8162   25.32106 2023-02-20 15:00:00
#> 2  …531663#107 Pica pica (Linnaeus…  10        62.9199   27.71032 2023-02-20 07:40:00
#> 3  …530610#107 Pica pica (Linnaeus…  21        65.78623  24.49119 2023-02-20 09:15:00
#> 4  …530449#107 Pica pica (Linnaeus…  4         65.74652  24.62216 2023-02-20 08:20:00
#> 5  …531663#153 Emberiza citrinella…  12        62.9199   27.71032 2023-02-20 07:40:00
#> 6   …531663#49 Columba livia domes…  10        62.9199   27.71032 2023-02-20 07:40:00
#> 7   …530610#49 Columba livia domes…  2         65.78623  24.49119 2023-02-20 09:15:00
#> 8  …530610#117 Corvus corax Linnae…  1         65.78623  24.49119 2023-02-20 09:15:00
#> 9   …531663#61 Dendrocopos major (…  6         62.9199   27.71032 2023-02-20 07:40:00
#> 10 …531663#111 Corvus monedula Lin…  7         62.9199   27.71032 2023-02-20 07:40:00
#> ...with 0 more record and 6 more variables:
#> coordinates_uncertainty, any_issues, requires_verification, requires_identification,
#> record_reliability, record_quality
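
A season that spans the new year cannot be written as one month-day range, which is why the example above repeats date_range_md. The sketch below shows one way to produce that pair of ranges from a single begin and end. The helper season_filter() is hypothetical (not part of finbif), and it assumes that repeated date_range_md entries are combined as in the example above.

```r
# Hypothetical helper (not part of finbif): build month-day range filters,
# splitting the range in two when it wraps around the end of the year.
season_filter <- function(begin, end) {
  # Turn "MM-DD" into an integer such as 1221 so that dates compare correctly.
  key <- function(x) as.integer(sub("-", "", x, fixed = TRUE))
  if (key(begin) <= key(end)) {
    return(list(date_range_md = c(begin = begin, end = end)))
  }
  list(
    date_range_md = c(begin = begin, end = "12-31"),
    date_range_md = c(begin = "01-01", end = end)
  )
}

winter <- season_filter("12-21", "02-20")
summer <- season_filter("06-01", "08-31")
str(winter)
```

Here winter reproduces the two-entry filter used above, while summer needs only a single entry.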


Data Quality

You can filter occurrence records by indicators of data quality. See ?filters section “Quality” for details.

strict <- c(
  collection_quality = "professional", coordinates_uncertainty_max = 1,
  record_quality = "expert_verified"
)
permissive <- list(
  wild_status = c("wild", "non_wild", "wild_unknown"),
  record_quality = c(
    "expert_verified", "community_verified", "unassessed", "uncertain",
    "erroneous"
  ),
  abundance_min = 0
)
c(
  strict     = finbif_occurrence(filter = strict,     count_only = TRUE),
  permissive = finbif_occurrence(filter = permissive, count_only = TRUE)
)
#>     strict permissive 
#>      52654   51733557

Collection

The FinBIF database consists of a number of constituent collections. You can filter by collection with either the collection or not_collection filters. Use finbif_collections() to see metadata on the FinBIF collections.

finbif_occurrence(
  filter = c(collection = "iNaturalist Suomi Finland"), count_only = TRUE
)
#> [1] 691076
finbif_occurrence(
  filter = c(collection = "Notebook, general observations"), count_only = TRUE
)
#> [1] 2110409
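
To compare several collections at once, the single-collection calls above can be wrapped in a loop. This is a minimal sketch: count_by_collection() is a hypothetical helper, and its fetch argument exists only so that the query function can be swapped out, for example for a stub when working offline. It assumes, as in the output above, that count_only = TRUE returns a single number.

```r
# Hypothetical helper (not part of finbif): count records per collection.
# `fetch` defaults to finbif::finbif_occurrence; the default is only
# evaluated when no replacement is supplied.
count_by_collection <- function(collections,
                                fetch = finbif::finbif_occurrence) {
  vapply(
    collections,
    function(x) {
      as.numeric(fetch(filter = c(collection = x), count_only = TRUE))
    },
    numeric(1)
  )
}

collections <- c("iNaturalist Suomi Finland", "Notebook, general observations")

# Offline stub standing in for the real query: it returns the number of
# characters in the collection name instead of a record count.
stub <- function(filter, count_only) nchar(filter[["collection"]])
offline <- count_by_collection(collections, fetch = stub)
print(offline)
```

With finbif installed and configured, count_by_collection(collections) would instead send one count_only query per collection.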

Informal taxonomic groups

You can filter occurrence records based on informal taxonomic groups such as Birds or Mammals.

finbif_occurrence(filter = list(informal_groups = c("Birds", "Mammals")))
#> Records downloaded: 10
#> Records available: 22116048
#> A data.frame [10 x 12]
#>    record_id      scientific_name abundance lat_wgs84 lon_wgs84           date_time
#> 1       …5#3 Sciurus vulgaris Li…  1         60.23584  25.05693 2023-06-14 08:56:00
#> 2       …2#9 Hirundo rustica Lin…        NA  64.12716  23.99111 2023-06-14 08:48:00
#> 3      …2#37 Pica pica (Linnaeus…        NA  64.12716  23.99111 2023-06-14 08:48:00
#> 4      …2#49 Muscicapa striata (…        NA  64.12716  23.99111 2023-06-14 08:48:00
#> 5      …2#39 Larus canus Linnaeu…        NA  64.12716  23.99111 2023-06-14 08:48:00
#> 6       …2#5 Emberiza citrinella…        NA  64.12716  23.99111 2023-06-14 08:48:00
#> 7      …2#31 Ficedula hypoleuca …        NA  64.12716  23.99111 2023-06-14 08:48:00
#> 8      …2#41 Alauda arvensis Lin…        NA  64.12716  23.99111 2023-06-14 08:48:00
#> 9      …2#21 Numenius arquata (L…        NA  64.12716  23.99111 2023-06-14 08:48:00
#> 10     …2#29 Dendrocopos major (…        NA  64.12716  23.99111 2023-06-14 08:48:00
#> ...with 0 more record and 6 more variables:
#> coordinates_uncertainty, any_issues, requires_verification, requires_identification,
#> record_reliability, record_quality


See finbif_informal_groups() for the full list of groups you can filter by. You can use the same function to see the subgroups that make up a higher level informal group:

finbif_informal_groups("macrofungi")
#> Error in finbif_informal_groups("macrofungi"): Group not found

Regulatory

Many records in the FinBIF database are of taxa that have one or more regulatory statuses. See finbif_metadata("regulatory_status") for a list of regulatory statuses and short-codes.

# Search for birds on the EU invasive species list
finbif_occurrence(
  filter = list(informal_groups = "Birds", regulatory_status = "EU_INVSV")
)
#> Records downloaded: 10
#> Records available: 471
#> A data.frame [10 x 12]
#>                                 record_id      scientific_name abundance lat_wgs84 lon_wgs84
#> 1                           …JX.1580858#3 Oxyura jamaicensis …  1         60.28687  25.0271 
#> 2                           …JX.1580860#3 Oxyura jamaicensis …  1         60.28671  25.02713
#> 3  …KE.176/62b1ad90d5deb0fafdc6212b#Unit1 Oxyura jamaicensis …  7         61.66207  23.57706
#> 4                          …JX.1045316#34 Alopochen aegyptiac…  3         52.16081  4.485534
#> 5                          …JX.138840#123 Alopochen aegyptiac…  4         53.36759  6.191796
#> 6                          …JX.139978#214 Alopochen aegyptiac…  6         53.37574  6.207861
#> 7                           …JX.139710#17 Alopochen aegyptiac…  30        52.3399   5.069133
#> 8                           …JX.139645#57 Alopochen aegyptiac…  36        51.74641  4.535283
#> 9                           …JX.139645#10 Alopochen aegyptiac…  3         51.74641  4.535283
#> 10                          …JX.139442#16 Alopochen aegyptiac…  2         51.90871  4.53258 
#> ...with 0 more record and 7 more variables:
#> date_time, coordinates_uncertainty, any_issues, requires_verification, requires_identification,
#> record_reliability, record_quality


IUCN red list

Filtering can be done by IUCN red list category. See finbif_metadata("red_list") for the IUCN red list categories and their short-codes.

# Search for near threatened mammals
finbif_occurrence(
  filter = list(informal_groups = "Mammals", red_list_status = "NT")
)
#> Records downloaded: 10
#> Records available: 42510
#> A data.frame [10 x 12]
#>                                 record_id      scientific_name abundance lat_wgs84 lon_wgs84
#> 1                          …JX.1594024#23 Rangifer tarandus f…  15        63.31266  24.43298
#> 2                        …JX.1588853#1075 Rangifer tarandus f…  1         63.84551  29.8366 
#> 3                           …JX.1593780#3 Pusa hispida botnic…  1         65.02313  25.40505
#> 4                    …HR.3211/166639315-U Rangifer tarandus f…        NA  63.7      24.7    
#> 5                    …HR.3211/166049302-U Rangifer tarandus f…        NA  64.1      26.5    
#> 6                    …HR.3211/165761924-U Rangifer tarandus f…        NA  63.9      24.9    
#> 7                         …JX.1589779#105 Rangifer tarandus f…  3         63.7261   23.40827
#> 8  …KE.176/647ad84dd5de884fa20e25e6#Unit1 Rangifer tarandus f…  1         64.12869  24.73877
#> 9                    …HR.3211/165005253-U Pusa hispida botnic…        NA  64.2865   23.87402
#> 10                         …JX.1588052#18 Rangifer tarandus f…  2         64.13286  26.26767
#> ...with 0 more record and 7 more variables:
#> date_time, coordinates_uncertainty, any_issues, requires_verification, requires_identification,
#> record_reliability, record_quality


Habitat type

Many taxa are associated with one or more primary or secondary habitat types (e.g., forest) or subtypes (e.g., herb-rich alpine birch forests). Use finbif_metadata("habitat_type") to see the habitat types in FinBIF. You can filter occurrence records based on primary (or primary/secondary) habitat type or subtype codes. Note that habitat filters apply to taxa, not to locations: filtering with primary_habitat = "M" returns only records of taxa considered to primarily inhabit forests, but the locations of those records may include habitats other than forests.

head(finbif_metadata("habitat_type"))
#>                code name                                              
#> MKV.habitatMt  Mt   alpine birch forests (excluding herb-rich alpine …
#> MKV.habitatTlk Tlk  alpine calcareous rock outcrops and boulder fields
#> MKV.habitatTlr Tlr  alpine gorges and canyons                         
#> MKV.habitatT   T    Alpine habitats                                   
#> MKV.habitatTp  Tp   alpine heath scrubs                               
#> MKV.habitatTk  Tk   alpine heaths
# Search records of taxa for which forests are their primary or secondary
# habitat type
finbif_occurrence(filter = c(primary_secondary_habitat = "M"))
#> Records downloaded: 10
#> Records available: 26362337
#> A data.frame [10 x 12]
#>    record_id      scientific_name abundance lat_wgs84 lon_wgs84           date_time
#> 1       …5#3 Sciurus vulgaris Li…  1         60.23584  25.05693 2023-06-14 08:56:00
#> 2      …2#37 Pica pica (Linnaeus…        NA  64.12716  23.99111 2023-06-14 08:48:00
#> 3      …2#49 Muscicapa striata (…        NA  64.12716  23.99111 2023-06-14 08:48:00
#> 4       …2#5 Emberiza citrinella…        NA  64.12716  23.99111 2023-06-14 08:48:00
#> 5      …2#31 Ficedula hypoleuca …        NA  64.12716  23.99111 2023-06-14 08:48:00
#> 6      …2#29 Dendrocopos major (…        NA  64.12716  23.99111 2023-06-14 08:48:00
#> 7      …2#15 Sylvia borin (Bodda…        NA  64.12716  23.99111 2023-06-14 08:48:00
#> 8      …2#11 Anthus trivialis (L…        NA  64.12716  23.99111 2023-06-14 08:48:00
#> 9      …2#45 Corvus monedula Lin…        NA  64.12716  23.99111 2023-06-14 08:48:00
#> 10      …2#3 Phylloscopus trochi…        NA  64.12716  23.99111 2023-06-14 08:48:00
#> ...with 0 more record and 6 more variables:
#> coordinates_uncertainty, any_issues, requires_verification, requires_identification,
#> record_reliability, record_quality


You may further refine habitat-based searching using a specific habitat type qualifier such as “sun-exposed” or “shady”. Use finbif_metadata("habitat_qualifier") to see the qualifiers available. To specify qualifiers, use a named list of character vectors, where the names are habitat types or subtypes and the elements of the character vectors are the qualifier codes.

finbif_metadata("habitat_qualifier")[4:6, ]
#>                           code name                                              
#> MKV.habitatSpecificTypeCA CA   calcareous effect                                 
#> MKV.habitatSpecificTypeH  H    esker forests, also semi-open forests             
#> MKV.habitatSpecificTypeKE KE   intermediate-basic rock outcrops and boulder fiel…
# Search records of taxa for which forests with sun-exposure and broadleaved
# deciduous trees are their primary habitat type
finbif_occurrence(filter = list(primary_habitat = list(M = c("PAK", "J"))))
#> Records downloaded: 10
#> Records available: 178
#> A data.frame [10 x 12]
#>      record_id      scientific_name abundance lat_wgs84 lon_wgs84           date_time
#> 1  …502812#393 Pammene fasciana (L…        NA  60.45845  22.17811 2022-08-14 12:00:00
#> 2    …435062#6 Pammene fasciana (L…  1         60.20642  24.66127          2022-08-04
#> 3    …435050#9 Pammene fasciana (L…  1         60.20642  24.66127          2022-07-25
#> 4   …501598#39 Pammene fasciana (L…  1         60.08841  22.48629 2022-07-21 12:00:00
#> 5  …501387#162 Pammene fasciana (L…  1         60.08841  22.48629 2022-07-20 12:00:00
#> 6  …448030#159 Pammene fasciana (L…  1         60.08841  22.48629 2022-07-18 12:00:00
#> 7   …447556#78 Pammene fasciana (L…  1         60.08841  22.48629 2022-07-14 12:00:00
#> 8  …446841#408 Pammene fasciana (L…  1         60.08841  22.48629 2022-07-12 12:00:00
#> 9   …443339#36 Pammene fasciana (L…  1         60.08841  22.48629 2022-07-10 12:00:00
#> 10 …440849#159 Pammene fasciana (L…  2         60.08841  22.48629 2022-07-08 12:00:00
#> ...with 0 more record and 6 more variables:
#> coordinates_uncertainty, any_issues, requires_verification, requires_identification,
#> record_reliability, record_quality
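
The nesting of the qualifier filter is easy to get wrong, so the sketch below builds it step by step. The helper habitat_filter() is hypothetical (not part of finbif); it only reproduces the named-list structure described above for the primary_habitat filter.

```r
# Hypothetical helper (not part of finbif): pair one habitat type code with
# its qualifier codes in the named-list form described above.
habitat_filter <- function(type, qualifiers) {
  # Inner list: name is the habitat type, value is the qualifier codes.
  inner <- stats::setNames(list(qualifiers), type)
  list(primary_habitat = inner)
}

sunny_broadleaved <- habitat_filter("M", c("PAK", "J"))
str(sunny_broadleaved)
```

The result is the same object as list(primary_habitat = list(M = c("PAK", "J"))) in the example above.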


Status of taxa in Finland

You can restrict occurrence records by the status of the taxa in Finland. For example, you can request records of rare species only.

finbif_occurrence(filter = c(finnish_occurrence_status = "rare"))
#> Records downloaded: 10
#> Records available: 406005
#> A data.frame [10 x 12]
#>                                 record_id      scientific_name abundance lat_wgs84 lon_wgs84
#> 1                    …HR.3211/167313706-U Pygaera timon (Hübn…        NA  62.1281   27.45272
#> 2                          …JX.1594282#21 Carterocephalus pal…  1         64.65322  24.58941
#> 3                    …HR.3211/167197097-U Carterocephalus pal…        NA  65.07819  25.55236
#> 4                    …HR.3211/167183358-U Glaucopsyche alexis…        NA  60.46226  22.76647
#> 5                           …JX.1594291#3 Glaucopsyche alexis…  1         60.42692  22.20411
#> 6  …KE.176/6488c111d5de884fa20e295f#Unit1 Panemeria tenebrata…  1         61.16924  25.56036
#> 7                           …JX.1593930#3 Hemaris tityus (Lin…  1         60.63969  27.29052
#> 8  …KE.176/64889455d5de884fa20e294f#Unit1 Pseudopanthera macu…  2         62.054    30.352  
#> 9                         …JX.1594170#199 Glaucopsyche alexis…  1         61.10098  28.68453
#> 10                          …JX.1594112#3 Hemaris tityus (Lin…  1         61.25511  28.89127
#> ...with 0 more record and 7 more variables:
#> date_time, coordinates_uncertainty, any_issues, requires_verification, requires_identification,
#> record_reliability, record_quality


Or, by using the negation of occurrence status, you can request records of birds excluding those considered vagrants.

finbif_occurrence(
  filter = list(
    informal_groups               = "Birds",
    finnish_occurrence_status_neg = sprintf("vagrant_%sregular", c("", "ir"))
  )
)
#> Records downloaded: 10
#> Records available: 21725426
#> A data.frame [10 x 12]
#>    record_id      scientific_name abundance lat_wgs84 lon_wgs84           date_time
#> 1         …9 Hirundo rustica Lin…        NA  64.12716  23.99111 2023-06-14 08:48:00
#> 2        …37 Pica pica (Linnaeus…        NA  64.12716  23.99111 2023-06-14 08:48:00
#> 3        …49 Muscicapa striata (…        NA  64.12716  23.99111 2023-06-14 08:48:00
#> 4        …39 Larus canus Linnaeu…        NA  64.12716  23.99111 2023-06-14 08:48:00
#> 5         …5 Emberiza citrinella…        NA  64.12716  23.99111 2023-06-14 08:48:00
#> 6        …31 Ficedula hypoleuca …        NA  64.12716  23.99111 2023-06-14 08:48:00
#> 7        …41 Alauda arvensis Lin…        NA  64.12716  23.99111 2023-06-14 08:48:00
#> 8        …21 Numenius arquata (L…        NA  64.12716  23.99111 2023-06-14 08:48:00
#> 9        …29 Dendrocopos major (…        NA  64.12716  23.99111 2023-06-14 08:48:00
#> 10       …15 Sylvia borin (Bodda…        NA  64.12716  23.99111 2023-06-14 08:48:00
#> ...with 0 more record and 6 more variables:
#> coordinates_uncertainty, any_issues, requires_verification, requires_identification,
#> record_reliability, record_quality


See finbif_metadata("finnish_occurrence_status") for a full list of statuses and their descriptions.
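
The sprintf() call in the vagrant example is a compact way to build two status codes from one template. Evaluated on its own in base R, it expands as follows.

```r
# sprintf() is vectorised over its arguments, so one template yields one
# string per element of the second argument.
vagrant_statuses <- sprintf("vagrant_%sregular", c("", "ir"))

# yields "vagrant_regular" and "vagrant_irregular"
print(vagrant_statuses)
```

Writing the two codes out by hand as c("vagrant_regular", "vagrant_irregular") gives the same character vector.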