Blog posts

2024

Spatial Clustering for Species Distribution Modeling

5 minute read

Published:

Citizen science biodiversity data can span large extents of space and time, but usually lack the structure required for building species distribution models that account for imperfect detection, i.e., occupancy models. Existing approaches for introducing this structure either discard too much data because of strict site definitions, or ignore similarity in environmental feature space, or both, which leads to weaker downstream occupancy models. Spatial clustering algorithms from the machine learning literature offer clear advantages over these existing approaches.

2023

How to Evaluate ML Models in Geospatial Settings?

6 minute read

Published:

Standard K-fold cross-validation (KFCV) randomly divides a training set into K non-overlapping folds and iteratively holds out one fold at a time, training a model on the remaining folds (the training folds) and measuring error on the held-out fold (the validation fold). The average of these errors across folds serves as the estimate of generalization performance on an unseen test set. KFCV provides approximately unbiased performance estimates when applied to independent, identically distributed (iid) data. But does it also work well on geospatial data?
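For readers who want to see the procedure concretely, here is a minimal sketch of standard KFCV in Python with NumPy. The function names `make_folds` and `kfold_cv_error`, the ordinary least-squares model, and the mean squared error metric are illustrative assumptions rather than anything taken from the post; any model and error measure could be substituted.

```python
import numpy as np


def make_folds(n_samples, k=5, seed=0):
    """Randomly split the indices 0..n_samples-1 into k non-overlapping folds."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    return np.array_split(shuffled, k)


def kfold_cv_error(X, y, k=5, seed=0):
    """Return (mean validation MSE, per-fold MSEs) for a least-squares model.

    The linear model is only a stand-in for whatever model is being evaluated.
    """
    folds = make_folds(len(X), k=k, seed=seed)
    fold_errors = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])

        # Fit on the training folds (a column of ones adds an intercept).
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)

        # Measure error on the held-out validation fold.
        A_val = np.column_stack([np.ones(len(val_idx)), X[val_idx]])
        pred = A_val @ coef
        fold_errors.append(float(np.mean((pred - y[val_idx]) ** 2)))

    # The average across folds is the generalization estimate.
    return float(np.mean(fold_errors)), fold_errors


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.normal(size=(200, 3))
    y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
    mean_err, per_fold = kfold_cv_error(X, y, k=5)
    print(f"mean validation MSE: {mean_err:.4f}")
```

Note that the random shuffle in `make_folds` is exactly the step that assumes iid data: it can place nearby samples in both the training and validation folds.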

A comparison of remotely sensed environmental predictors for SDMs

6 minute read

Published:

There are many options when deriving environmental variables from satellite imagery. First, one must select a satellite dataset (e.g., Landsat, MODIS, Sentinel). Second, one must decide how to summarize the data (i.e., how to turn satellite imagery into input feature vectors). Given the many satellite imagery datasets and methods of summarization, it remains an open question which environmental variables are best suited for SDMs. To help address this question, we compared the predictive power of several sets of environmental variables derived from Landsat satellite imagery in predicting 13 bird species across the state of Oregon, USA. This work was done in collaboration with the Oregon 2020 Project and was published in Landscape Ecology.