Home

OpenCage, CC BY-SA 2.5, via Wikimedia Commons

1 Overview

Here we present a possible workflow for modeling and predicting human interactions with the world’s largest bony fish, the Mola mola. We’ll use R and a number of important packages to retrieve observations from OBIS, sea surface temperature from OISST, and wind estimates from NBS v2. With these we’ll build a presence-only model using a pure-R implementation of MaxEnt. We’ll also try our hand at prediction (a hindcast).

The cartoon below generalizes the two-step process that we’ll walk through. The first step is where most of the coding effort occurs, as we gather our observations and the predictor variables we have chosen. The product of this first step, the model, is often saved to disk for later use in making predictions, which is the second step.

Figure: overview of the two-step modeling and prediction workflow
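The two-step idea can be sketched in a few lines of base R. This is only an illustration under stated assumptions: a logistic regression from glm() and simulated data stand in for the MaxEnt model and the real covariates used later, and the file path is a temporary placeholder.

```r
# Step 1: fit a model and save it to disk.
# NOTE: glm() and the simulated data are stand-ins, not the tutorial's MaxEnt model.
set.seed(1)
train <- data.frame(sst = runif(200, min = 5, max = 25))
train$presence <- rbinom(200, size = 1, prob = plogis(0.3 * (train$sst - 15)))
model <- glm(presence ~ sst, data = train, family = binomial())

model_file <- tempfile(fileext = ".rds")  # placeholder path for illustration
saveRDS(model, model_file)

# Step 2: later (possibly in another R session) reload the model and predict.
reloaded <- readRDS(model_file)
newdata <- data.frame(sst = c(8, 15, 22))
pred <- predict(reloaded, newdata = newdata, type = "response")
```

Saving the fitted object means the expensive first step need not be repeated each time we want new predictions.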

2 Prerequisites

We assume that the reader is familiar with the R programming language, the tools available in the tidyverse, and spatial data handling.

3 Getting started

“According to the ancient Chinese proverb, A journey of a thousand miles must begin with a single step.” ~ John F. Kennedy

3.1 Handling spatial data

In this tutorial we use the sf and stars packages to handle spatial data. These tutorials for sf and for stars will get you off to a great start.

3.2 Project-specific functions

We have developed a suite of functions that facilitate accessing and working with data. These can be loaded into your R session by source()-ing the setup.R file. Here’s an example where we show the study area using the ancillary function get_bb() to retrieve the project’s bounding box.

source("setup.R", echo = FALSE)
bb = get_bb(form = 'polygon')
coast = rnaturalearth::ne_coastline(scale = 'large', returnclass = 'sf')
plot(sf::st_geometry(coast), extent = bb, axes = TRUE, reset = FALSE)
plot(bb, lwd = 2, border = 'orange', add = TRUE)
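For readers curious about what a helper like get_bb() might do internally, here is a minimal sketch. It is not the project's implementation: the function name get_bb_sketch(), the accepted form values other than 'polygon', and the default coordinates are all assumptions made for illustration.

```r
# Hypothetical sketch of a bounding-box helper; NOT the project's get_bb().
# The default coordinates are placeholders chosen for illustration only.
get_bb_sketch <- function(form = c("numeric", "bbox", "polygon"),
                          bb = c(xmin = -80, ymin = 30, xmax = -50, ymax = 50)) {
  form <- match.arg(form)
  if (form == "numeric") return(bb)
  # The remaining forms need the sf package.
  box <- sf::st_bbox(bb, crs = sf::st_crs(4326))
  if (form == "bbox") return(box)
  sf::st_as_sfc(box)  # a one-feature polygon geometry
}
```

Returning the box as a polygon is convenient because it can be passed to plot() directly, as in the example above.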

The setup file also checks for the required packages, and will attempt to install them into the user’s R library directory if not already installed.
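A check-and-install step of this kind could look roughly like the sketch below. The function names and the repository URL argument are our own choices for illustration; the actual setup.R may do this differently.

```r
# Return the names in `pkgs` that are not currently installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Install whatever is missing, then report anything that still is not available.
install_if_missing <- function(pkgs, repos = "https://cloud.r-project.org") {
  need <- missing_packages(pkgs)
  if (length(need) > 0) install.packages(need, repos = repos)
  invisible(missing_packages(pkgs))
}
```

For the packages named so far, a call might be install_if_missing(c("sf", "stars", "robis", "rnaturalearth", "dplyr", "fs")).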

3.3 Fetching data

The robis package facilitates easy access to OBIS, which is a huge public database of oceanographic species information. We have written a wrapper function to download the Mola mola species records in our study region. To simplify our task we drop many of the columns delivered by OBIS, but there is much in the original worth exploring. Note that you can use this function to access other species in other parts of the world.

Note

We have already fetched the data, so we don’t run this next step (but you can if you would like updated data).

if (!file.exists('data/obis/Mola_mola.gpkg')){
  x = fetch_obis(scientificname = 'Mola mola')
}
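To give a sense of what such a wrapper might involve, here is a hedged sketch built on robis::occurrence(). It is not the project's fetch_obis(): the function names, the list of columns to keep, and the choice to return an sf object are assumptions, and running the fetch itself requires a network connection.

```r
# Hypothetical sketch; NOT the project's fetch_obis().
# Keep only the requested columns that are actually present in `x`.
trim_columns <- function(x, keep) {
  x[, intersect(keep, names(x)), drop = FALSE]
}

fetch_obis_sketch <- function(scientificname = "Mola mola",
                              geometry = NULL,  # optional WKT polygon of the study area
                              keep = c("occurrenceID", "eventDate", "basisOfRecord",
                                       "bathymetry", "shoredistance", "sst", "sss",
                                       "decimalLongitude", "decimalLatitude")) {
  # Download the records from OBIS (needs network access).
  x <- robis::occurrence(scientificname = scientificname, geometry = geometry)
  # Drop the columns we do not need, then convert to spatial points.
  x <- trim_columns(as.data.frame(x), keep)
  sf::st_as_sf(x, coords = c("decimalLongitude", "decimalLatitude"), crs = 4326)
}
```

Using intersect() makes the column trimming tolerant of fields that OBIS does not return for a given species.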

Since we already have the data, we need only read it, which we do below. We also “glimpse” the data to get a sense of the variables within the data frame and their respective data types.

x = read_obis(scientificname = 'Mola mola') |>
  dplyr::glimpse()
Rows: 10,646
Columns: 8
$ occurrenceID  <chr> "1817_15415", "283_15788", "513_6751", "513_49055", "513…
$ date          <date> 2016-08-07, 1980-05-23, 2002-06-17, 2009-07-10, 2009-08…
$ basisOfRecord <chr> "HumanObservation", "HumanObservation", "HumanObservatio…
$ bathymetry    <dbl> 121, 102, 65, 166, 133, 19, 218, 90, 1283, 160, 219, 97,…
$ shoredistance <dbl> 134559, 121292, 77684, 127643, 58028, 524, 94816, 141127…
$ sst           <dbl> 16.36, 15.44, 13.38, 11.70, 9.20, 12.35, 11.79, 13.90, 1…
$ sss           <dbl> 34.10, 33.41, 32.64, 32.22, 30.99, 31.38, 32.12, 33.38, …
$ geom          <POINT [°]> POINT (-72.8074 39.056), POINT (-72.283 39.733), P…
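Reading the cached file can be sketched as follows. The mapping from 'Mola mola' to data/obis/Mola_mola.gpkg comes from the file name shown earlier; the function names and the root argument are hypothetical, and the project's read_obis() may differ.

```r
# Hypothetical sketch; NOT the project's read_obis().
# Build the GeoPackage path from a scientific name,
# e.g. "Mola mola" becomes data/obis/Mola_mola.gpkg
obis_path <- function(scientificname, root = file.path("data", "obis")) {
  file.path(root, paste0(gsub(" ", "_", scientificname, fixed = TRUE), ".gpkg"))
}

read_obis_sketch <- function(scientificname = "Mola mola", ...) {
  path <- obis_path(scientificname, ...)
  if (!file.exists(path)) stop("file not found: ", path)
  sf::read_sf(path)  # returns an sf data frame with a geometry column
}
```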

Let’s see what we found on a map. We first plot the coastline, but provide the bounding box to establish the limits of the plot. Then we add the box itself and then the points.

plot(sf::st_geometry(coast), extent = bb, axes = TRUE, reset = FALSE)
plot(bb, lwd = 2, border = 'orange', add = TRUE)
plot(sf::st_geometry(x), pch = "+", col = 'blue', add = TRUE)

3.4 Data storage

We have set up a data directory, data, for storing data collected for the project. To start out there isn’t much more than the downloaded data set, but we’ll add to it as we go. We try to keep the data storage area tidy by using subdirectories. Below we print the directory tree; yours might look slightly different until you have run all of the code in this tutorial.

fs::dir_tree("data", recurse = 1)
data
├── bkg
│   ├── bkg-covariates.gpkg
│   ├── bkg-covariates.gpkg-journal
│   ├── buffered-polygon.gpkg
│   └── obs-covariates.gpkg
├── mask
│   ├── mask_factor.tif
│   └── mask_factor.tif.aux.xml
├── model
│   ├── tidysdm
│   ├── v1
│   ├── v2
│   └── v3
├── nbs
│   ├── 2000
│   ├── 2001
│   ├── 2002
│   ├── 2003
│   ├── 2004
│   ├── 2005
│   ├── 2006
│   ├── 2007
│   ├── 2008
│   ├── 2009
│   ├── 2010
│   ├── 2011
│   ├── 2012
│   ├── 2013
│   ├── 2014
│   ├── 2015
│   ├── 2016
│   ├── 2017
│   ├── 2018
│   ├── 2019
│   ├── 2020
│   ├── 2021
│   ├── 2022
│   ├── 2023
│   └── database.csv.gz
├── obis
│   └── Mola_mola.gpkg
├── obs
│   ├── model_input.gpkg
│   ├── obs-covariates.gpkg
│   ├── obs.gpkg
│   └── thinned_obs.gpkg
└── oisst
    ├── 2000
    ├── 2001
    ├── 2002
    ├── 2003
    ├── 2004
    ├── 2005
    ├── 2006
    ├── 2007
    ├── 2008
    ├── 2009
    ├── 2010
    ├── 2011
    ├── 2012
    ├── 2013
    ├── 2014
    ├── 2015
    ├── 2016
    ├── 2017
    ├── 2018
    ├── 2019
    ├── 2020
    ├── 2021
    ├── 2022
    ├── 2023
    └── database.csv.gz
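The tree suggests one subdirectory per data source and, within nbs and oisst, one per year. A small helper for building and creating such paths might look like the sketch below; the function name year_dir() and its arguments are hypothetical, and the example writes to a temporary directory rather than to the project's data directory.

```r
# Hypothetical helper: build (and create if needed) a per-source, per-year directory.
year_dir <- function(source, year, root = "data") {
  path <- file.path(root, source, sprintf("%04d", as.integer(year)))
  if (!dir.exists(path)) dir.create(path, recursive = TRUE)
  path
}

# Example use in a temporary location, so nothing in the project is touched.
demo_root <- file.path(tempdir(), "data-sketch")
oisst_2000 <- year_dir("oisst", 2000, root = demo_root)
```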