source("setup.R")
Prediction
- It’s tough to make predictions, especially about the future.
- Yogi Berra
Finally we come to the end product of forecasting: the prediction. This last step is actually fairly simple, given a recipe and model (now bundled in a workflow
container). We have already made predictions using tables of input covariates, but we now want to predict across the entire domain of our Brickman data set. This requires predicting with stars
objects not tables. You may recall that we are able to read these arrays, display them and extract point data from them. But we haven’t used them en mass as a variable yet.
1 Setup
As always, we start by running our setup function. Start RStudio/R, and reload your project with the menu File > Recent Projects
.
2 Load the Brickman data
Once again, we’ll use the August data where we started with a greedy sampling approach. We are going to make a prediction about the present, which means it something akin to a nowcast.
= read_configuration(version = "g_Aug")
cfg = brickman_database()
db = read_brickman(db |> filter(scenario == "PRESENT", interval == "mon")) |>
covars select(all_of(cfg$keep_vars)) |>
slice("month", "Aug")
3 Load the workflow
We read the recipe and model workflow bundle.
= read_workflow(version = cfg$version) wflow
Recall that the workflow is a container that has has two elements: pre-processing recipe and model. When we make a prediction with the workflow it will accept new data that then gets filtered and/or transformed as specified by the recipe steps. The data that survives the preprocessing will then be used to feed into the model that was trained on a specific domain (time and space).
4 Make a prediction
First we shall make a “nowcast” which is just a prediction of the current environmental conditions.
4.1 Nowcast
First make the prediction using predict_stars()
. The function yields a stars
array object that has three attributes: .pred_presence
, .pred_background
and .pred
. The leading dot simply gives us the heads up that these three values are all computed. The first two range from 0-1 which implies a probability. The last, .pred
, is the class label we would assign if we accept that any .pred_presence >= 0.5
should be considered suitable habitat where a reported observation might occur.
= predict_stars(wflow, covars)
nowcast nowcast
stars object with 2 dimensions and 3 attributes
attribute(s):
.pred_presence .pred_background .pred
Min. :0.000 Min. :0.002 presence : 613
1st Qu.:0.032 1st Qu.:0.748 background:5173
Median :0.097 Median :0.903 NA's :4983
Mean :0.184 Mean :0.816
3rd Qu.:0.252 3rd Qu.:0.968
Max. :0.998 Max. :1.000
NA's :4983 NA's :4983
dimension(s):
from to offset delta refsys point x/y
x 1 121 -74.93 0.08226 WGS 84 FALSE [x]
y 1 89 46.08 -0.08226 WGS 84 FALSE [y]
Now we can plot what is often called a “habitat suitability index” (hsi) map.
= read_coastline()
coast plot(nowcast['.pred_presence'], main = "Nowcast August",
axes = TRUE, breaks = seq(0, 1, by = 0.1), reset = FALSE)
plot(coast, col = "orange", lwd = 2, add = TRUE)
We can also plot a presence/background labeled map, but keep in mind it is just a thresholded version of the above where “presence” means .pred_presence >= 0.5
.
plot(nowcast['.pred'], main = "Nowcast August Labels",
axes = TRUE, reset = FALSE)
plot(coast, col = "black", lwd = 2, add = TRUE)
4.2 Forecast
Now let’s try our hand at forecasting - let’s try RCP85 in 2075. First we load those parameters, then run the prediction and plot.
= read_brickman(db |> filter(scenario == "RCP85", year == 2075, interval == "mon")) |>
covars_rcp85_2075 select(all_of(cfg$keep_vars)) |>
slice("month", "Aug")
= predict_stars(wflow, covars_rcp85_2075)
forecast_2075 forecast_2075
stars object with 2 dimensions and 3 attributes
attribute(s):
.pred_presence .pred_background .pred
Min. :0.000 Min. :0.347 presence : 53
1st Qu.:0.145 1st Qu.:0.681 background:5733
Median :0.270 Median :0.730 NA's :4983
Mean :0.237 Mean :0.763
3rd Qu.:0.319 3rd Qu.:0.855
Max. :0.653 Max. :1.000
NA's :4983 NA's :4983
dimension(s):
from to offset delta refsys point x/y
x 1 121 -74.93 0.08226 WGS 84 FALSE [x]
y 1 89 46.08 -0.08226 WGS 84 FALSE [y]
= read_coastline()
coast plot(forecast_2075['.pred_presence'], main = "RCP85 2075 August",
axes = TRUE, breaks = seq(0, 1, by = 0.1), reset = FALSE)
plot(coast, col = "orange", lwd = 2, add = TRUE)
Hmmm, that’s pretty different than what the nowcast predicts.
5 Time series
It would be nice to see a time series: current, 2055 and 2075 on the same graphic. Let’s load RCP85 2055 data, and make yet another prediction.
5.1 Forecast 2055
= read_brickman(db |> filter(scenario == "RCP85", year == 2055, interval == "mon")) |>
covars_rcp85_2055 select(all_of(cfg$keep_vars)) |>
slice("month", "Aug")
= predict_stars(wflow, covars_rcp85_2055)
forecast_2055 forecast_2055
stars object with 2 dimensions and 3 attributes
attribute(s):
.pred_presence .pred_background .pred
Min. :0.000 Min. :0.426 presence : 28
1st Qu.:0.138 1st Qu.:0.677 background:5758
Median :0.269 Median :0.731 NA's :4983
Mean :0.237 Mean :0.763
3rd Qu.:0.323 3rd Qu.:0.862
Max. :0.574 Max. :1.000
NA's :4983 NA's :4983
dimension(s):
from to offset delta refsys point x/y
x 1 121 -74.93 0.08226 WGS 84 FALSE [x]
y 1 89 46.08 -0.08226 WGS 84 FALSE [y]
5.2 Bind time series
We want to bind the .pred_presence
attribute for each of the predictions (nowcast, forecast_2055 and forecast_2075). Let’s assume the “present” mean 2020 so we can assign a year.
= c(nowcast, forecast_2055, forecast_2075, along = list(year = c("2020", "2055", "2075"))) rcp85
Curious about why we provide year as a vector of characters instead of a vector of integers? Try running the command above again but this time using along = list(year = c(2020, 2055, 2075))
(as numbers not strings). Then check out the 3rd dimension.
Since we are plotting multiple arrays, we need to plot the coastline using a “hook” function.
= function(){
plot_coast plot(coast, col = "orange", lwd = 2, add = TRUE)
}
plot(rcp85['.pred_presence'],
hook = plot_coast,
axes = TRUE, breaks = seq(0, 1, by = 0.1), join_zlim = TRUE, reset = FALSE)
Hmmmm. Why does there seem to be a strong shift between 2020 and 2055, while the 2055 to 2075 shift seems less pronounced?
Don’t forget that there are other ways to plot array based spatial data.
5.3 Save the predictions
We could save all three attributes, but .pred_background
is just 1 - .pred_presence
, and .pred
is just coding “presence” where .pred_presence >= 0.5
, so we can always compute those as needed if we have .pred_presence
. In that case, let’s just save the first attribute, .pred_presence
, in a multilayer GeoTIFF formatted image array file. The write_prediction()
function will do just that.
# make sure the output directory exists
= data_path("predictions")
path if (!dir.exists(path)) ok = dir.create(path, recursive = TRUE)
# write individual arrays?
write_prediction(nowcast, file = file.path(path,"g_Aug_RCP85_2020.tif"))
write_prediction(forecast_2055, file = file.path(path, "g_Aug_RCP85_2055.tif"))
write_prediction(forecast_2075, file = file.path(path, "g_Aug_RCP85_2075.tif"))
# or write them together in a "multi-layer" file?
write_prediction(rcp85, file = file.path(path, "g_Aug_RCP85_all.tif"))
To read it back simply provide the filename to read_prediction()
. If you are reading back a multi-layer array, be sure to check out the time
argument to assign values to the time dimension. Single layer arrays don’t have the concept of time so the time
argument is ignored.
6 Recap
We made both a nowcast and a number predictions using a previously saved workflow. Contrary to Yogi Berra’s claim, it’s actually pretty easy to predict the future. Perhaps more challenging is to interpret the prediction. We bundled these together to make time series plots, and we saved the .pred_presence
values.
7 Coding Assignment
For each each climate scenario create a monthly forecast (so that’s three: nowcast, forecast_2055 and forecast_2075) and save each to in your predictions
directory. Whether you choose to draw upon the greedy background sampling method, the conservative background sampling method or both is up to you. Keep in mind that some months may not have enough data to model without throwing an error. We suggest that you wrap your critical steps in a try()
function which will catch the error without crashing your iterator. There is a tutorial on error catching that specifically uses try()
.