library(dplyr)  # Data manipulation
library(tidyr)  # Tidy data
library(sf)     # Simple features for R (spatial objects)
library(sit)    # Analyse MRR data from SIT experiments


Importing data from files into R is performed with functions from other packages, which are either installed by default with R or need to be installed beforehand.

Typical candidates are utils::read.csv for reading plain-text files (which is the recommended format for storing data from MRR SIT experiments). MS Excel files can be read with readxl::read_excel. If your data live in relational databases (e.g. SQLite, MySQL, PostgreSQL), see the package DBI and the references therein.
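For instance (the file names here are hypothetical; adapt them to your own files):

```r
## Plain-text (CSV) files, with utils (installed with R):
adult_surveys_raw <- read.csv("adult_surveys.csv")

## MS Excel files, with readxl (install.packages("readxl") first):
release_events_raw <- readxl::read_excel("release_events.xlsx", sheet = 1)
```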

Almost certainly, you will need to transform your data somewhat in order to put it into the format and structure expected by sit. You might be tempted to do this manually: making a copy of your data files and removing, renaming or computing variables as needed, filtering observations, copying and pasting values from one table to another, and so on.

I strongly encourage you to perform these operations from within R instead, using code rather than manual point-and-click operations, which are risky, error-prone, and difficult to verify and reproduce. In R, you can easily save your manipulations in an R script that you or someone else can check, reproduce and repeat if necessary (e.g. if some of the original data is corrected or extended).

Fortunately, this is not too complicated, and the required skills can be learned in a couple of hours. You will mostly need to know a few data-manipulation verbs (function names): dplyr::filter(), dplyr::select(), dplyr::mutate(), tidyr::pivot_longer(), and perhaps tidyr::separate(). Two tutorials covering these are Introduction to dplyr and Pivoting.
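For instance, a typical reshaping task with these verbs might look as follows (the table and variable names are made up for illustration):

```r
library(dplyr)
library(tidyr)

## Hypothetical raw capture table, with one column per survey day
raw <- data.frame(
  trap  = c("T1", "T2"),
  day_1 = c(3, 0),
  day_2 = c(5, 2)
)

captures <- raw %>%
  pivot_longer(
    cols = starts_with("day_"),       # Reshape to one row per trap and day
    names_to = "day",
    names_prefix = "day_",
    values_to = "n"
  ) %>%
  mutate(day = as.integer(day)) %>%   # Compute a proper numeric day variable
  filter(n > 0)                       # Keep only positive captures
```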

Finally, you will need a basic understanding of how to deal with spatial data in R. The section on vector data in Robin Lovelace’s excellent on-line book Geocomputation with R gives an overview of how the relevant package, sf, works. This is a vast topic, but we will only use point data, which narrows the problem considerably. Essentially, you will only need the function sf::st_as_sf() to convert a data table into a sf object with a proper spatial interpretation.

The Introductory example in Getting Started should cover most of the typical needs. Look for inspiration there, and use the materials provided above for reference and for a deeper understanding.

Below is a table of functions used for importing data from MRR SIT experiments.

Import functions.

Function              Description
sit_revents()         Import release events and sites.
sit_traps()           Import traps.
sit_adult_surveys()   Import adult surveys.
sit_egg_surveys()     Import egg surveys.
sit()                 Gather data about traps, release events and field surveys into a sit object.

Release sites are automatically derived from the release events.

The package sit provides a default table of trap types that you can query as follows:
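A sketch of the query, assuming that sit_trap_types() (the same function used to import custom trap types, mentioned below) returns the default table when called with no arguments:

```r
library(sit)

sit_trap_types()
```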

Default trap types in sit.

id  name                 label  stage  description
 1  Ovitrap              OVT    egg    NA
 2  BG-Sentinel          BGS    adult  NA
 3  Human Landing Catch  HLC    adult  NA
If you require other types of traps, you can import your own table of trap types with the same function, sit_trap_types(). Check its help page, or the more specific tutorials, for details.

Special data types

Spatial coordinates

In sit we need to specify the location of adult traps and release points using spatial coordinates.

If GPS coordinates were collected as data variables, they can be safely imported into sit as shown in Getting Started, i.e. using 4326 as the EPSG code for geographical coordinates or, alternatively, the string 'WGS84'. E.g.

points <- data.frame(id = "I'm at the Origin!", lon = 0, lat = 0) %>% 
  st_as_sf(                             # Convert to spatial
    coords = c("lon", "lat"),           # Variables with coordinates
    crs = 4326                          # Code for GPS coordinates
    # crs = 'WGS84'                     # Same
  )
If no GPS devices were used in the field, a simple way of gathering (approximate) coordinates is to use a web-mapping tool such as OpenStreetMap or Google Maps to identify the points and note their coordinates down.

Coordinates can also be available in projected form, in some specific Coordinate Reference System (CRS). A very common case is Universal Transverse Mercator (UTM) coordinates. The catch is that you need to know the corresponding EPSG code, or a more general definition of the CRS, in order to interpret the coordinates.

In that case, use sf::st_as_sf() in the same way, but specify the right CRS code in the argument crs. See sf::st_crs().
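For example, for a hypothetical table of traps with UTM coordinates in zone 30N on the WGS 84 datum (EPSG code 32630; look up the code for your own zone and datum):

```r
library(sf)

## Hypothetical traps located with UTM easting/northing coordinates
traps <- data.frame(
  id = c("A", "B"),
  x  = c(440000, 441500),
  y  = c(4470000, 4470800)
)

traps_sf <- st_as_sf(
  traps,
  coords = c("x", "y"),
  crs = 32630   # EPSG code for WGS 84 / UTM zone 30N (use your zone's code)
)
```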

The package sf, used behind the scenes by sit for distance calculations, will correctly compute distances in metres even if you used geographical coordinates in the input. There is no need to project your data.
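A quick check of this behaviour, using two made-up points one degree of longitude apart on the Equator:

```r
library(sf)

p <- st_as_sf(
  data.frame(id = 1:2, lon = c(0, 1), lat = c(0, 0)),
  coords = c("lon", "lat"),
  crs = 4326                # Geographical (GPS) coordinates
)

st_distance(p)              # Distance matrix in metres (roughly 111 km here),
                            # despite the geographic input
```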


Dates and times

Dates are expected as character strings in the ISO 8601 format (e.g. 2021-12-31).

Dates and times are jointly specified as a character string of the form 2019-11-23 15:00. This is a standard profile of ISO 8601 specified by RFC 3339, omitting the time-zone specification for simplicity (not needed in practice) and assuming the time is the local time at the given date.

sit internally uses as.POSIXct() to convert these strings to date or date-time objects. If your data are stored in a different format (e.g. ‘DD/MM/YYYY’ or ‘MM/DD/YYYY’), you must convert them yourself prior to importing into sit. Otherwise, the import will fail:

## Importing release event dates in a non-supported format fails,
## because the internal conversion cannot parse them:
point_releases <- fake_rpoints
point_releases$date <- c('11/25/2019', '12/01/2019', '12/13/2019')
as.POSIXct(point_releases$date)
#> Error in as.POSIXlt.character(x, tz, ...): character string is not in a standard unambiguous format

Instead, convert to date yourself prior to import:

point_releases$date <- as.POSIXct(point_releases$date, format = "%m/%d/%Y")
point_releases
#>   id  type site_id       date colour     n    geometry
#> 1  1 point       1 2019-11-25 yellow 10000 POINT (1 2)
#> 3  2 point       2 2019-12-01    red 10000 POINT (2 1)
#> 2  3 point       1 2019-12-13   blue 10000 POINT (1 2)