library(dplyr)  # Data manipulation
library(tidyr)  # Tidy data
library(sf)     # Simple features for R (spatial objects)
library(sit)    # Analyse MRR data from SIT experiments
Importing data from files into R is performed using functions from other packages, which are either installed by default with R or have to be installed beforehand.
Typical candidates are
utils::read.csv for reading plain
text files (which is the
recommended format for storing data from MRR SIT experiments). MS
Excel files can be read with
readxl::read_excel. If you
have data in relational databases (e.g. SQLite, MySQL, PostgreSQL,
etc), see the package DBI and
Almost certainly, you will need to transform your
data a little in order to put it in the format and structure expected by
sit. You might be tempted to do this manually,
making a copy of your data files and removing, renaming and computing
variables as needed, filtering observations, copying and pasting values
from one table to another, etc.
I strongly encourage you to perform these operations from within R instead, using code rather than manual point-and-click operations, which are risky, error-prone, and difficult to verify and reproduce. In R, you can easily save your manipulations in an R script that you or someone else can check, reproduce and repeat if necessary (e.g. if some of the original data are corrected or added).
Fortunately, this is not too complicated and the required skills can
be learned in a couple of hours. You will mostly need to know a few
data-manipulation verbs (function names):
tidyr::separate(). A couple of tutorials covering
these are the Introduction to
dplyr and Pivoting.
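As a rough illustration of what such a scripted transformation can look like, here is a minimal sketch. The table `raw` and all of its column names are hypothetical and only stand in for a typical field sheet; they are not a format required by sit.

```r
library(dplyr)
library(tidyr)

# Hypothetical raw field sheet: one row per trap visit, counts in wide format
raw <- data.frame(
  trap_code  = c("A-01", "A-02", "B-01"),
  visit_date = c("2019-11-26", "2019-11-26", "2019-11-27"),
  n_yellow   = c(3, 0, 1),
  n_red      = c(0, 2, 0),
  notes      = c("", "trap tilted", "")
)

tidy <- raw %>%
  select(-notes) %>%               # Remove variables that are not needed
  rename(date = visit_date) %>%    # Rename variables
  separate(                        # Split one column into two
    trap_code,
    into = c("block", "trap"),
    sep = "-"
  ) %>%
  pivot_longer(                    # Wide to long: one row per colour
    cols = starts_with("n_"),
    names_to = "colour",
    names_prefix = "n_",
    values_to = "n"
  ) %>%
  filter(n > 0) %>%                # Keep only observations with captures
  mutate(n = as.integer(n))        # Compute or convert variables

tidy
```

Because every step is written down, the whole transformation can be re-run unchanged whenever the original file is corrected or extended.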
Finally, you will need a little understanding of how to deal with
spatial data in R. The section
on vector data in Robin Lovelace’s excellent on-line book
Geocomputation with R gives an overview of how the relevant
package, sf, works. This is a vast topic, but we will only use point data,
which narrows the problem considerably. Essentially, you will only need to use
sf::st_as_sf() to convert a data table into an
sf object with proper spatial interpretation.
The Introductory example in Getting Started should cover most of the typical needs. Look for inspiration there, and use the materials provided above for reference and for a deeper understanding.
Below is a table of functions used for importing data from MRR SIT experiments.
||Import release events and sites.|
||Import adult surveys|
||Import egg surveys|
||Gather data about traps, release events and field
survey data into a
Release sites are automatically derived from the release events.
sit provides a default table of trap
types that you can query as follows:
|3||Human Landing Catch||HLC||adult||NA|
If you require other types of traps, you can import your own table of
trap types using the same function. See
its help page, or more specific tutorials, for more details.
Special data types
In sit, we need to specify the location of adult traps
and release points using spatial coordinates.
If GPS coordinates were collected as data variables, they can be
safely imported into
sit as shown in Getting Started, i.e. using
4326 as the EPSG code
for geographical coordinates or, alternatively, the string 'WGS84':
points <- data.frame(id = "I'm at the Origin!", lon = 0, lat = 0) %>%
  st_as_sf(                     # Convert to spatial
    coords = c("lon", "lat"),   # Variables with coordinates
    crs = 4326                  # Code for GPS coordinates
    # crs = 'WGS84'             # Same
  )
If no GPS devices were used in the field, a simple way of gathering (approximate) coordinates is to use a web-mapping tool such as OpenStreetMap or Google Maps to identify the points and note down their coordinates.
Coordinates can also be available in projected form, in some specific Coordinate Reference System (CRS). A very common case is Universal Transverse Mercator (UTM) coordinates. The issue is that you need to know the corresponding EPSG code, or a more general definition of the CRS in order to interpret the coordinates.
Once you know it, use sf::st_as_sf() in the same way, but
specifying the right CRS code in the argument crs.
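For instance, a minimal sketch for projected coordinates might look as follows. The trap table is made up, and the choice of UTM zone 30N (EPSG code 32630, based on WGS 84) is only an assumption for illustration; use the code that matches your own data.

```r
library(sf)

# Hypothetical trap locations recorded as UTM coordinates, in metres
traps_utm <- data.frame(
  id = 1:2,
  x  = c(500000, 500100),    # Easting
  y  = c(4400000, 4400000)   # Northing
)

traps_sf <- st_as_sf(
  traps_utm,
  coords = c("x", "y"),      # Variables with coordinates
  crs = 32630                # Assumed CRS: WGS 84 / UTM zone 30N
)

st_crs(traps_sf)$epsg        # Check the CRS that was registered
```

If you are unsure about the zone or datum, searching the site https://epsg.io/ for your study region is usually the quickest way to find the code.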
sf, used behind the scenes by
sit for distance calculations, will correctly compute
distances in metres, even if you used geographical coordinates in the
input. No need to project your data.
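The point about distances can be checked directly with sf. The following sketch uses two made-up points on the Greenwich meridian, one thousandth of a degree of latitude apart, and calls `sf::st_distance()` on unprojected coordinates.

```r
library(sf)

# Two hypothetical points given in geographical coordinates (degrees)
pts <- st_as_sf(
  data.frame(id = c("a", "b"), lon = c(0, 0), lat = c(0, 0.001)),
  coords = c("lon", "lat"),
  crs = 4326
)

d <- st_distance(pts)   # Matrix of pairwise distances, with units attached
d[1, 2]                 # Roughly 111 m, reported in metres rather than degrees
```

The result carries its units explicitly, so there is no ambiguity about whether a value is in degrees or in metres.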
Robin Lovelace (2021). Geocomputation with R. https://geocompr.robinlovelace.net/
Edzer Pebesma et al. (2021). Simple Features for R. https://r-spatial.github.io/sf/
Coordinate Systems Worldwide: https://epsg.io/
Dates and times
Dates are expected as a character string in the
ISO 8601 format, i.e. YYYY-MM-DD.
Dates and times are specified jointly as a character
string of the form
2019-11-23 15:00. This is a
standard variation (profile) of ISO 8601 specified by RFC 3339, omitting the
time-zone specification for simplicity (it is not needed in practice) and
assuming the time is the local time at the specified date.
sit will internally
convert them to date or date-time format. If you have data stored in a
different format (e.g. ‘DD/MM/YYYY’ or ‘MM/DD/YYYY’), you
need to convert it yourself prior to importing into sit.
Otherwise, the import will fail:
## Importing release event dates in a non-supported format fails:
point_releases <- fake_rpoints
point_releases$date <- c('11/25/2019', '12/01/2019', '12/13/2019')
sit_revents(point_releases)
#> Error in as.POSIXlt.character(x, tz, ...): character string is not in a standard unambiguous format
Instead, convert to date yourself prior to import:
point_releases$date <- as.POSIXct(point_releases$date, format = "%m/%d/%Y")
sit_revents(point_releases)
#>   id  type site_id       date colour     n    geometry
#> 1  1 point       1 2019-11-25 yellow 10000 POINT (1 2)
#> 3  2 point       2 2019-12-01    red 10000 POINT (2 1)
#> 2  3 point       1 2019-12-13   blue 10000 POINT (1 2)
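The same approach works for other layouts. As a sketch using base R only, and assuming day-first values such as ‘DD/MM/YYYY HH:MM’ (the vectors below are made up), you can parse with an explicit format and then write the values back as character strings in the forms described above:

```r
# Hypothetical raw date-times in 'DD/MM/YYYY HH:MM' format
raw_datetime <- c("23/11/2019 15:00", "01/12/2019 08:30")

# Parse with an explicit format. The time zone is fixed to UTC only to make
# parsing independent of the computer's settings; the values are still
# understood as local times.
parsed <- as.POSIXct(raw_datetime, format = "%d/%m/%Y %H:%M", tz = "UTC")

# Write back as 'YYYY-MM-DD HH:MM' character strings
iso_datetime <- format(parsed, "%Y-%m-%d %H:%M")
iso_datetime
#> [1] "2019-11-23 15:00" "2019-12-01 08:30"

# Dates without time: 'DD/MM/YYYY' to 'YYYY-MM-DD'
raw_date <- c("25/11/2019", "01/12/2019")
iso_date <- format(as.Date(raw_date, format = "%d/%m/%Y"), "%Y-%m-%d")
iso_date
#> [1] "2019-11-25" "2019-12-01"
```

Stating the format explicitly avoids the ambiguity between day-first and month-first conventions, which cannot be resolved automatically for values such as 01/12/2019.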