
Importing Data
library(dplyr) # Data manipulation
library(tidyr) # Tidy data
library(sf) # Simple features for R (spatial objects)
library(sit) # Analyse MRR data from SIT experiments
Introduction
Importing data from files into R is performed using functions from other packages, which are either installed by default with R or need to be installed beforehand. A typical candidate is utils::read.csv(), for reading plain text files (which is the recommended format for storing data from MRR SIT experiments). MS Excel files can be read with readxl::read_excel(). If you have data in relational databases (e.g. SQLite, MySQL, PostgreSQL, etc.), see the package DBI and references therein.
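As a minimal sketch of the plain-text route, the following writes a small, made-up CSV file to a temporary location and reads it back with utils::read.csv(). The file contents and the column names (trap_id, date, n_males) are hypothetical and serve only as illustration; they are not the variable names required by sit.

```r
# Hypothetical example data: file contents and column names are made up
csv_file <- tempfile(fileext = ".csv")
writeLines(
  c(
    "trap_id,date,n_males",
    "1,2019-11-25,12",
    "2,2019-11-25,7"
  ),
  csv_file
)

# Read the plain-text file into a data.frame.
# stringsAsFactors = FALSE keeps the dates as character strings.
raw_surveys <- utils::read.csv(csv_file, stringsAsFactors = FALSE)
str(raw_surveys)
```

With your own data, only the call to utils::read.csv() is needed, pointing at the path of your file.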
Almost certainly, you will need to transform your data a little in order to put it in the format and structure expected by sit. You might be tempted to do this manually, making a copy of your data files and removing, renaming and computing variables as needed, filtering observations, copying and pasting values from one table to another, etc.
I strongly encourage you to perform these operations from within R instead, using code rather than manual point-and-click operations, which are risky, error-prone, and difficult to verify and reproduce. In R, you can easily save your manipulations in an R script that you or someone else can check, reproduce and repeat if necessary (e.g. if some of the original data are corrected or added).
Fortunately, this is not too complicated and the required skills can be learned in a couple of hours. You will mostly need to know a few data-manipulation verbs (function names): dplyr::filter(), dplyr::select(), dplyr::mutate(), tidyr::pivot_longer(), and perhaps tidyr::separate(). A couple of tutorials covering these are the Introduction to dplyr and Pivoting.
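The verbs listed above can be chained into a short pipeline. The sketch below uses an entirely made-up table of raw counts (the variable names trap_code, yellow, red and notes are hypothetical) and is only meant to show what each verb does, not the structure that sit expects.

```r
library(dplyr)
library(tidyr)

# Hypothetical raw survey table in "wide" format:
# one column of counts per marking colour (all names are made up)
raw_surveys <- data.frame(
  trap_code = c("BGS-1", "BGS-2", "OVT-1"),
  date = c("2019-11-25", "2019-11-25", "2019-11-26"),
  yellow = c(3, 0, NA),
  red = c(1, 2, NA),
  notes = c("", "windy", "")
)

adult_counts <- raw_surveys %>%
  # Split the trap code into trap type and trap number
  separate(trap_code, into = c("trap_type", "trap_number"), sep = "-") %>%
  # Keep only the observations from adult traps
  filter(trap_type == "BGS") %>%
  # Drop variables that are not needed
  select(-notes) %>%
  # Compute or transform variables
  mutate(trap_number = as.integer(trap_number)) %>%
  # Reshape to one row per trap, date and colour
  pivot_longer(
    cols = c(yellow, red),
    names_to = "colour",
    values_to = "n"
  )

adult_counts
```

Because every step is written down, the whole transformation can be re-run unchanged if the original file is corrected or extended.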
Finally, you will need a little understanding of how to deal with spatial data in R. The section on vector data in Robin Lovelace’s excellent on-line book Geocomputation with R gives an overview of how the relevant package, sf, works. This is a vast topic, but we will only use point data, which narrows the problem a lot. Essentially, you will only need to use the function sf::st_as_sf() to convert a data table into an sf object with proper spatial interpretation.
The Introductory example in Getting Started should cover most of the typical needs. Look for inspiration there, and use the materials provided above for reference and for a deeper understanding.
Below is a table of functions used for importing data from MRR SIT experiments.
Function | Description |
---|---|
sit_revents() | Import release events and sites. |
sit_traps() | Import traps. |
sit_adult_surveys() | Import adult surveys. |
sit_egg_surveys() | Import egg surveys. |
sit() | Gather data about traps, release events and field survey data into a sit object. |
Release sites are automatically derived from the release events.
The package sit provides a default table of trap types that you can query with sit_trap_types():
id | name | label | stage | description |
---|---|---|---|---|
1 | Ovitrap | OVT | egg | NA |
2 | BG-Sentinel | BGS | adult | NA |
3 | Human Landing Catch | HLC | adult | NA |
If you require other types of traps, you can import your own table of trap types using the same function sit_trap_types(). Check its help page, or more specific tutorials, for more details.
Special data types
Spatial coordinates
In sit we need to specify the location of adult traps and release points using spatial coordinates.
If GPS coordinates were collected as data variables, they can be safely imported into sit as shown in Getting Started, i.e. using 4326 as the EPSG code for geographical coordinates or, alternatively, the string 'WGS84'. E.g.
points <- data.frame(id = "I'm at the Origin!", lon = 0, lat = 0) %>%
  st_as_sf(                    # Convert to spatial
    coords = c("lon", "lat"),  # Variables with coordinates
    crs = 4326                 # Code for GPS coordinates
    # crs = 'WGS84'            # Same
  )
If no GPS devices were used in the field, a simple way of gathering (approximate) coordinates is to use a web-mapping tool such as OpenStreetMap or Google Maps to identify the points and note their coordinates down.
Coordinates can also be available in projected form, in some specific Coordinate Reference System (CRS). A very common case is Universal Transverse Mercator (UTM) coordinates. The issue is that you need to know the corresponding EPSG code, or a more general definition of the CRS in order to interpret the coordinates.
Apart from that, use sf::st_as_sf() in the same way, but specifying the right CRS code in the argument crs. See sf::st_crs().
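A minimal sketch for projected coordinates follows. The coordinates are made up, and EPSG:32631 (WGS 84 / UTM zone 31N) is only an assumption for illustration; use the code that matches the zone and datum of your own data.

```r
library(sf)

# Hypothetical trap locations in UTM coordinates (metres).
# EPSG:32631 (WGS 84 / UTM zone 31N) is assumed here for illustration only.
traps_utm <- data.frame(
  id = c("trap_1", "trap_2"),
  x = c(500000, 500100),
  y = c(4649776, 4649776)
)

traps_sf <- st_as_sf(
  traps_utm,
  coords = c("x", "y"),  # Variables with coordinates
  crs = 32631            # EPSG code of the projected CRS
)

# Check how the CRS was interpreted
st_crs(traps_sf)$epsg

# Optionally, convert to geographical coordinates as a sanity check:
# the points should fall where you expect them on a map
traps_lonlat <- st_coordinates(st_transform(traps_sf, 4326))
traps_lonlat
```

Transforming back to longitude and latitude is a quick way of spotting a wrong EPSG code, since the points then land far from the study area.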
The package sf, used behind the scenes by sit for distance calculations, will correctly compute distances in metres, even if you used geographical coordinates in the input. There is no need to project your data.
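This behaviour can be checked directly with sf::st_distance(). The sketch below uses two made-up points one degree of latitude apart; how sit calls sf internally is not shown here, so this only illustrates the behaviour of sf itself.

```r
library(sf)

# Two illustrative points one degree of latitude apart (made-up locations)
pts <- st_as_sf(
  data.frame(id = c("a", "b"), lon = c(0, 0), lat = c(0, 1)),
  coords = c("lon", "lat"),
  crs = 4326
)

# Pairwise distance matrix; the result carries units of metres
d <- st_distance(pts)
d[1, 2]  # roughly 111 km, expressed in metres
```

The exact figure depends on whether sf uses a spherical or an ellipsoidal model of the Earth, but the units are metres in both cases.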
References:
Robin Lovelace (2021). Geocomputation with R. https://geocompr.robinlovelace.net/
Edzer Pebesma et al. (2021). Simple Features for R. https://r-spatial.github.io/sf/
Coordinate Systems Worldwide: https://epsg.io/
Dates and times
Dates are expected as character strings in the ISO 8601 format (e.g. 2021-12-31).
Dates and times are specified jointly as a character string of the form 2019-11-23 15:00. This is a standard variation (profile) of ISO 8601 specified by RFC 3339, omitting the time-zone specification for simplicity (not needed in practice) and assuming the time is the local time at the specified date.
sit will internally use as.POSIXct() to convert them to date or date-time format. If you have data stored in a different format (e.g. ‘DD/MM/YYYY’ or ‘MM/DD/YYYY’), you need to convert it yourself prior to importing into sit. Otherwise, the import will fail:
## Importing release event dates in a non-supported format fails:
point_releases <- fake_rpoints
point_releases$date <- c('11/25/2019', '12/01/2019', '12/13/2019')
sit_revents(point_releases)
#> Error in as.POSIXlt.character(x, tz, ...): character string is not in a standard unambiguous format
Instead, convert to date yourself prior to import:
point_releases$date <- as.POSIXct(point_releases$date, format = "%m/%d/%Y")
sit_revents(point_releases)
#>   id  type site_id       date colour     n    geometry
#> 1  1 point       1 2019-11-25 yellow 10000 POINT (1 2)
#> 3  2 point       2 2019-12-01    red 10000 POINT (2 1)
#> 2  3 point       1 2019-12-13   blue 10000 POINT (1 2)
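The same approach works for other layouts. As a small sketch, assuming day-first dates and date-times (the values below are made up), you can parse them with an explicit format and write them back as ISO 8601 strings before importing. The time zone "UTC" is used here only so that parsing and formatting are consistent with each other; it does not change the clock time that is written out.

```r
# Hypothetical dates recorded as 'DD/MM/YYYY'
dates_raw <- c("25/11/2019", "01/12/2019", "13/12/2019")

# Parse with an explicit format, then write back as ISO 8601 strings
dates_iso <- format(as.Date(dates_raw, format = "%d/%m/%Y"), "%Y-%m-%d")
dates_iso

# Hypothetical date-times recorded as 'DD/MM/YYYY HH:MM'
times_raw <- c("23/11/2019 15:00", "24/11/2019 08:30")

# Same idea, keeping hours and minutes
times_iso <- format(
  as.POSIXct(times_raw, format = "%d/%m/%Y %H:%M", tz = "UTC"),
  "%Y-%m-%d %H:%M"
)
times_iso
```

Stating the format explicitly also removes the ambiguity between day-first and month-first dates, which cannot be detected automatically for values such as 01/12/2019.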