Skip to contents
library(dplyr)   # Data manipulation
library(ggplot2) # Plotting
library(tidyr)   # Tidy data
library(sf)      # Simple features for R (spatial objects)
library(sit)     # Analyse MRR data from SIT experiments

Overview of the experimental setup

Printing the sit object gives a very short description of its contents:

sit_prototype
#> ── sit ─────────────────────────────────────────────────────────────────
#> Release events: 1 (areal) + 3 (point)
#> Release sites : 2
#> Adult traps   : 21 (sit)
#> Egg traps     : 7 (control) + 14 (sit)
#> Trap types    : 21 (BGS) + 21 (OVT)
#> Surveys       : 3486 (adult) + 462 (egg)

Use summary() to get a more detailed summary of each of the sit components:

summary(sit_prototype)
#> ── Releases ────────────────────────────────────────────────────────────
#> Number of point releases: 1 
#> Number of areal releases: 1 
#> 
#> Release events:
#>   id  type site_id                date colour     n    geometry
#> 1  1 point       1 2019-11-24 23:00:00 yellow 10000 POINT (0 0)
#> 2  2 point       1 2019-11-30 23:00:00    red 10000 POINT (0 0)
#> 3  3 point       1 2019-12-12 23:00:00   blue 10000 POINT (0 0)
#> 4  4 areal       2 2019-12-20 23:00:00   pink 10000 POINT EMPTY
#> 
#> Release sites:
#> Simple feature collection with 2 features and 1 field (with 1 geometry empty)
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: 0 ymin: 0 xmax: 0 ymax: 0
#> CRS:           +proj=laea +x_0=0 +y_0=0 +lon_0=16.41472 +lat_0=48.23528
#>   id    geometry
#> 1  1 POINT (0 0)
#> 2  2 POINT EMPTY
#> 
#> ── Traps ───────────────────────────────────────────────────────────────
#> Number of traps in each group:
#>      Area Type Coords  n
#> 1 control  OVT  FALSE  7
#> 2     sit  OVT  FALSE 14
#> 3     sit  BGS   TRUE 21
#> 
#> ── Adult surveys (total captures) ──────────────────────────────────────
#> , , sit
#> 
#>        female male
#> yellow     NA   58
#> red        NA   79
#> blue       NA   86
#> pink       NA   90
#> wild      226  299
#> 
#> 
#> ── Egg surveys (total captures) ────────────────────────────────────────
#>         Sterile Fertile
#> control    3407   18432
#> sit        4488    6808

Use plot() to get a very basic image of the experimental spatial set up.

plot(sit_prototype)

You can make more professional and even interactive maps by leveraging other R packages of your choice. This is beyond the scope of sit, but I’ll provide a few examples here using the package tmap. To learn more on mapping, see the chapter Making maps with R in Robin Lovelace’s excellent book Geocomputation with R.

library(tmap)
tmap_cmode <- tmap_mode("view")
#> tmap mode set to interactive viewing
tm_basemap("CartoDB") +
tm_shape(
  sit_traps(sit_prototype, area = "sit", stage = "adult")
) +
  tm_dots() +
  tm_shape(
    sit_revents(sit_prototype, type = "point") %>%
      st_as_sf()
  ) +
  tm_bubbles(
    col = "colour",
    jitter = 1       # Jitter a bit to see the overlapping release points
  )
tmap_mode(tmap_cmode)
#> tmap mode set to plotting

Note how we retrieved only adult traps from the sit area, and only the point releases, out of all available traps and releases. We will see how that worked in the following section.

Extraction functions

Extraction functions allow retrieving the data contained in a sit object, while filtering specific groups as needed.

Here we describe the list of arguments of each extraction function, with a few examples. This is very much borrowed from the help pages, but focused on the extraction method only.

sit_trap_types()

Arguments:
x A sit object.

This one is simple: it only takes a sit object and returns the list of trap types. E.g.

sit_trap_types(sit_prototype)
#>   id                name label stage description
#> 1  1             Ovitrap   OVT   egg          NA
#> 2  2         BG-Sentinel   BGS adult          NA
#> 3  3 Human Landing Catch   HLC adult          NA

The result is a data.frame itself. Thus, if you really need to filter some observations, you can put your R-skills to use:

sit_trap_types(sit_prototype) %>% 
  dplyr::filter(stage == "egg")
#>   id    name label stage description
#> 1  1 Ovitrap   OVT   egg          NA

sit_traps()

Arguments:
x A sit object.
type Character vector. Labels of requested trap types. By default, sit_trap_types()$label.
stage Character vector. Either adult, egg or both (default).
area Character vector. Either control, sit or both (default).

Retrieve the table of ovitraps:

sit_traps(sit_prototype, type = "OVT")
#> Simple feature collection with 21 features and 8 fields (with 21 geometries empty)
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
#> CRS:           +proj=laea +x_0=0 +y_0=0 +lon_0=16.41472 +lat_0=48.23528
#> First 10 features:
#>    id code area label type_name type_label type_stage type_description
#> 1  22 CS01  sit    NA   Ovitrap        OVT        egg               NA
#> 2  23 CS02  sit    NA   Ovitrap        OVT        egg               NA
#> 3  24 CS03  sit    NA   Ovitrap        OVT        egg               NA
#> 4  25 CS04  sit    NA   Ovitrap        OVT        egg               NA
#> 5  26 CS05  sit    NA   Ovitrap        OVT        egg               NA
#> 6  27 CS06  sit    NA   Ovitrap        OVT        egg               NA
#> 7  28 CS07  sit    NA   Ovitrap        OVT        egg               NA
#> 8  29 CS08  sit    NA   Ovitrap        OVT        egg               NA
#> 9  30 CS09  sit    NA   Ovitrap        OVT        egg               NA
#> 10 31 CS10  sit    NA   Ovitrap        OVT        egg               NA
#>       geometry
#> 1  POINT EMPTY
#> 2  POINT EMPTY
#> 3  POINT EMPTY
#> 4  POINT EMPTY
#> 5  POINT EMPTY
#> 6  POINT EMPTY
#> 7  POINT EMPTY
#> 8  POINT EMPTY
#> 9  POINT EMPTY
#> 10 POINT EMPTY

Note that this coincides with the table of traps for the stage = 'egg'.

However, traps in the control area are only a subset:

sit_traps(sit_prototype, area = "control")
#> Simple feature collection with 7 features and 8 fields (with 7 geometries empty)
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
#> CRS:           +proj=laea +x_0=0 +y_0=0 +lon_0=16.41472 +lat_0=48.23528
#>   id code    area label type_name type_label type_stage
#> 1 36 BL01 control    NA   Ovitrap        OVT        egg
#> 2 37 BL02 control    NA   Ovitrap        OVT        egg
#> 3 38 BL03 control    NA   Ovitrap        OVT        egg
#> 4 39 BL04 control    NA   Ovitrap        OVT        egg
#> 5 40 BL05 control    NA   Ovitrap        OVT        egg
#> 6 41 BL06 control    NA   Ovitrap        OVT        egg
#> 7 42 BL07 control    NA   Ovitrap        OVT        egg
#>   type_description    geometry
#> 1               NA POINT EMPTY
#> 2               NA POINT EMPTY
#> 3               NA POINT EMPTY
#> 4               NA POINT EMPTY
#> 5               NA POINT EMPTY
#> 6               NA POINT EMPTY
#> 7               NA POINT EMPTY

sit_revents()

Arguments:
x A sit object.
type Character vector. Which types of release events to retrieve. Either point or areal or both (default).
sit_revents(sit_prototype, type = 'point')
#>   id  type site_id                date colour     n    geometry
#> 1  1 point       1 2019-11-24 23:00:00 yellow 10000 POINT (0 0)
#> 2  2 point       1 2019-11-30 23:00:00    red 10000 POINT (0 0)
#> 3  3 point       1 2019-12-12 23:00:00   blue 10000 POINT (0 0)
sit_revents(sit_prototype, type = 'areal')
#>   id  type site_id                date colour     n    geometry
#> 1  4 areal       2 2019-12-20 23:00:00   pink 10000 POINT EMPTY
sit_revents(sit_prototype)
#>   id  type site_id                date colour     n    geometry
#> 1  1 point       1 2019-11-24 23:00:00 yellow 10000 POINT (0 0)
#> 2  2 point       1 2019-11-30 23:00:00    red 10000 POINT (0 0)
#> 3  3 point       1 2019-12-12 23:00:00   blue 10000 POINT (0 0)
#> 4  4 areal       2 2019-12-20 23:00:00   pink 10000 POINT EMPTY

sit_adult_surveys() and sit_egg_surveys()

Extract survey data, possibly filtering results.

Arguments:
x A sit object
area Character. Either control, sit or both (default).
trap_type Character. Any of sit_trap_types(x)$label. Typically, OVT for sit_egg_surveys() and BGS or HLC for sit_adult_surveys().
following_releases A sit_release object with a subset of release events or missing (default) for all release events. Use in combination with following_days to filter surveys of the target populations within a given number of days after release. Note that counts of wild populations will always be included in the results with a value of NA in pop_col.
following_days Integer or missing (default). Number of days after releases to return, if following_releases is not missing.
species Character or NULL (default). Filter surveys for a particular species, or ignore the species if NULL.

Retrieve all the egg surveys in the control area:

sit_egg_surveys(sit_prototype, area = "control")
#> # A tibble: 154 × 6
#>       id trap_code datetime_start      datetime_end        fertile     n
#>    <int> <chr>     <dttm>              <dttm>              <lgl>   <dbl>
#>  1   309 BL01      2019-11-21 23:00:00 2019-11-28 23:00:00 TRUE       41
#>  2   310 BL01      2019-11-21 23:00:00 2019-11-28 23:00:00 FALSE      37
#>  3   311 BL02      2019-11-21 23:00:00 2019-11-28 23:00:00 TRUE      102
#>  4   312 BL02      2019-11-21 23:00:00 2019-11-28 23:00:00 FALSE      20
#>  5   313 BL03      2019-11-21 23:00:00 2019-11-28 23:00:00 TRUE      204
#>  6   314 BL03      2019-11-21 23:00:00 2019-11-28 23:00:00 FALSE      18
#>  7   315 BL04      2019-11-21 23:00:00 2019-11-28 23:00:00 TRUE      354
#>  8   316 BL04      2019-11-21 23:00:00 2019-11-28 23:00:00 FALSE      54
#>  9   317 BL05      2019-11-21 23:00:00 2019-11-28 23:00:00 TRUE       31
#> 10   318 BL05      2019-11-21 23:00:00 2019-11-28 23:00:00 FALSE      10
#> # … with 144 more rows

Retrieve the adult surveys from the day that followed the areal release. Only counts from the target release, or the wild population, please.

sit_adult_surveys(
  sit_prototype,
  following_releases = sit_revents(sit_prototype, type = 'areal'),
  following_days = 1
)
#> # A tibble: 63 × 7
#>       id trap_code pop_col datetime_start      datetime_end        sex  
#>    <int> <chr>     <chr>   <dttm>              <dttm>              <chr>
#>  1  2227 1         wild    2019-12-20 23:00:00 2019-12-21 23:00:00 fema…
#>  2  2228 1         wild    2019-12-20 23:00:00 2019-12-21 23:00:00 male 
#>  3  2232 1         pink    2019-12-20 23:00:00 2019-12-21 23:00:00 male 
#>  4  2233 2         wild    2019-12-20 23:00:00 2019-12-21 23:00:00 fema…
#>  5  2234 2         wild    2019-12-20 23:00:00 2019-12-21 23:00:00 male 
#>  6  2238 2         pink    2019-12-20 23:00:00 2019-12-21 23:00:00 male 
#>  7  2239 3         wild    2019-12-20 23:00:00 2019-12-21 23:00:00 fema…
#>  8  2240 3         wild    2019-12-20 23:00:00 2019-12-21 23:00:00 male 
#>  9  2244 3         pink    2019-12-20 23:00:00 2019-12-21 23:00:00 male 
#> 10  2245 4         wild    2019-12-20 23:00:00 2019-12-21 23:00:00 fema…
#> # … with 53 more rows, and 1 more variable: n <dbl>

SIT results

1. Descriptive summaries

Dispersal of sterile males in space and time.

## Retrieve age since release date of each survey
sm_ages <- survey_ages(
  surveys  = sit_adult_surveys(sit_prototype),
  releases = sit_revents(sit_prototype, type = 'point')
) 

## Summarise counts by trap and age-group (by 5 days)
sm_summary <- 
  sm_ages %>% 
  mutate(
    age_group = cut(age, 5*(0:6))
  ) %>% 
  filter(!is.na(age_group)) %>%   # remove ages 30+ (max is 36)
  group_by(trap_code, age_group) %>% 
  summarise(n = sum(n), .groups = 'drop')

## Join this data to the geospatial information of the traps
sm_sf <- sit_traps(sit_prototype, area = "sit", stage = "adult") %>% 
  left_join(sm_summary, by = c(code = 'trap_code'))

tm_shape(sm_sf) +
  tm_bubbles(size = 'n') +
  tm_facets(by = 'age_group')

2. Competitiveness

The competitiveness \(\gamma\) of the sterile male individuals is defined as their relative capacity to mate with a wild female, compared to a wild male.

Thus, in a homogeneously mixed population with \(M_s\) sterile males and \(M_w\) wild males, the probability that a mating occurs with a sterile individual is \[ p_s = \frac{\gamma M_s}{M_w + \gamma M_s} = \frac{\gamma R_{sw}}{1 + \gamma R_{sw}}, \] where \(R_{sw} = M_s/M_w\) is the sterile-wild male ratio. At a given sterile-wild male ratio \(R_{sw} = M_s/M_w\) we observe a fertility rate \(H_s\) in the field.

Assuming a residual fertility rate \(H_{rs}\) for sterile males and a natural fertility rate \(H_w\) for wild males, the observed fertility rate \(H_s\) in the field is: \[\begin{equation} H_s = p_s H_{rs} + (1-p_s)H_w = \frac{\gamma R_{sw}}{1 + \gamma R_{sw}} H_{rs} + \frac{1}{1 + \gamma R_{sw}} H_w. \end{equation}\]

Solving the above equation for \(\gamma\) we obtain the Fried index, which estimates the competitiveness based on observed fertility rates and the sterile-wild male ratio: \[\begin{equation} F = \frac{H_w – H_s}{H_s-H_{rs}} \bigg/ R_{sw} \end{equation}\] where \(H_w\) is the percentage egg hatch in the control area (i.e. the natural fertility); \(H_s\) is the percentage egg hatch in the release area (observed fertility under a sterile-wild male ratio \(R_{sw}\)) and \(H_{rs}\) is the residual fertility of sterile males.

This requires the estimation of:

  1. The sterile-wild males ratio \(R_{sw}\). Estimated from the observed ratios in adult surveys in the following 7 days after areal releases by default.

  2. The natural fertility \(H_w\). Estimated from egg-hatching rates in the control area.

  3. The observed fertility in the release area \(H_s\). Estimated from egg-hatching rates in the SIT area.

  4. The residual fertility of sterile males \(H_{rs}\). This depends on the level of radiation used for sterilisation, and the influx of females from beyond the SIT area. This cannot be estimated from SIT data and needs to be fixed by the user. Using \(H_{rs} = 0\) reduces to the so-called CIS index.

sit_competitiveness(sit_prototype, residual_fertility = 0.05)
#> 
#> ── Sterile male competitiveness ────────────────────────────────────────
#>  Estimated value: 0.6
#> 
#> 
#> Component                         Value
#> -------------------------------  ------
#> Sterile-wild male ratio            1.13
#> Natural fertility                  0.84
#> Observed fertility in SIT area     0.52
#> Sterile fertility (assumed)        0.05

If there were more than one areal release, we could compute the estimated competitiveness for each one separately by specifying following_release. We can also control the number of days following the release to be used for the calculation of the sterile-wild male ratio.

sit_competitiveness(
  sit_prototype,
  following_releases = sit_revents(sit_prototype, type = "areal")[1,],
  following_days = 10,
  residual_fertility = 0.05
)
#> 
#> ── Sterile male competitiveness ────────────────────────────────────────
#>  Estimated value: 1.18
#> 
#> 
#> Component                         Value
#> -------------------------------  ------
#> Sterile-wild male ratio            0.58
#> Natural fertility                  0.84
#> Observed fertility in SIT area     0.52
#> Sterile fertility (assumed)        0.05

We can be shocked by the difference in the estimate after simply widening the observation window to 10 days rather than 7, and legitimately ask which time-span is more appropriate.

Notice that the only component of the calculation that changed sensibly is the sterile-wild male ratio, which essentially halved.

Let’s see how the relationship between captured sterile and wild males evolve in time. We can look at the ratio, but the difference is a more natural scale.

areal_release <- sit_revents(sit_prototype, type = "areal")

areal_surveys <- sit_adult_surveys(
    sit_prototype,
    area = "sit",
    following_releases = areal_release,
    following_days = 10
  )

sterile_wild_male_ratio(areal_surveys) %>% 
  mutate(
    age = as.numeric(datetime_end - areal_release$date)
  ) %>% 
  group_by(age) %>% 
  summarise(
    sterile = sum(sterile),
    wild = sum(wild),
    swdiff = sterile - wild
  ) %>% 
  ggplot(aes(age, swdiff)) +
  geom_point() +
  geom_smooth(method = 'loess', formula = 'y ~ x') +
  labs(x = "Age", y = "Sterile - wild difference")

Starting from day 8, the number of captured sterile males drops consistently with respect to the captured wild males, which explains the difference in the estimate of competitiveness.

Since the calculation pools observations assuming a constant ratio, we can conclude that 7 days was an appropriate time-span for these data.

3. Dispersal

Mean Distance Travelled (MDT)

Since the dispersion evolves in time, typically increasing with time since release, we compute the MDT at each day \(j\) and, potentially, for the population corresponding to the release event \(k\).

\[\begin{equation} \text{MDT}_{jk} = \sum_{i = 1}^{n_t} w_{ik}\,n_{ijk}\,d_i \bigg/ \sum_{i = 1}^{n_t} w_{ik}\,n_{ijk} \end{equation}\] where \(n_t\) is the total number of traps, \(d_i\) is the distance from trap \(i\) to the release site of event \(k\) and \(n_{ijk}\) is the number of sterile males surveyed at trap \(i\), in day \(j\) after release event \(k\).

Furthermore, a set of weights \(w_{ik}\) for each trap and release event can be used, in order to account for a inhomogeneous arrangement of the traps, as explained below. This is activated by the argument spatial_adjustment in sit_mdt() and sit_flight_range(). Without spatial adjustment, the weights are set as \(w_{ik} = 1\) and the MDT reduces to the so-called Mean Dispersal Distance (MDD).

Depending on the needs and hypotheses, the calculation could also aggregate observations by days, populations or both.

If most traps were concentrated far away from the release points, then MDT would be artificially inflated as a consequence. A way of correcting for this is to weight the counts by the inverse relative local trap density at each trap. However, a spatial density estimate of traps would be too noisy with only a couple dozen traps and suffer from considerable edge effects. Thus, we assume isotropy (neglect the direction) and consider only the radial density from the release site.

Trap density with respect to the distance from the release point increases linearly with the distance for a uniform distribution of traps.

The relative density is the ratio between the observed density and the expected density under uniformity (straight line). While the \(\omega_i\) is the inverse of the ratio at the distance of the corresponding trap (Figure below).

This approach also suffers from edge effect. However, the trick to solve it is much simpler and the results more robust. The figure above shows the uncorrected density in grey and the adjusted density as a black curve with points at the observed distances.

The only issue would be for trap patterns that are strongly concentrated in specific directions at different distances. However, this seems unlikely.

With this spatial adjustment (performed by default), we can assess the MDT by each released population (point releases only), and we can also plot the histogram of travelled distances weighted by the number of observations and the spatial adjustment:

(mdt_prototype <- sit_mdt(sit_prototype, by = "population"))
#> 
#> ── Mean Distance Travelled ─────────────────────────────────────────────
#> 
#> 
#> population       mdt
#> -----------  -------
#> blue          110.08
#> red            63.48
#> yellow         69.35
plot(mdt_prototype)

The histogram excludes the most extreme 2.5 % of observations on each side, to improve visualisation. The vertical lines display the mean distances travelled by population. They appear shifted towards 0 due to a large number of adjusted distances near 0 that are not shown.

Contrast with the results without the spatial adjustment. The estimated distances are shorter, due to a over-represented number of traps at shorter distances.

sit_mdt(sit_prototype, by = "population", spatial_adjustment = FALSE)
#> 
#> ── Mean Distance Travelled ─────────────────────────────────────────────
#> 
#> 
#> population      mdt
#> -----------  ------
#> blue          66.97
#> red           52.44
#> yellow        58.67

We can also consider the MDT by age (pooling across populations).

sit_mdt(sit_prototype, by = "age")
#> 
#> ── Mean Distance Travelled ─────────────────────────────────────────────
#> 
#> 
#>  age      mdt
#> ----  -------
#>    1    88.39
#>    2    51.75
#>    3    57.35
#>    4    68.11
#>    5   109.28
#>    6    79.75
#>    7   154.33
#>    8    79.38
#>    9   104.39
#>   10   128.51
#>   11   128.51
#>   12    37.38
#>   13   330.73
#>   14   128.51
#>   15    76.27

There is no obvious increase of MDT with time. It could be one particular population which is distorting the results, but we don’t have enough observations to split the MDT by age and population.

On the other hand, estimates get noisier for larger ages since observations are more scarce. If we restricted the analysis to the first 10 days of age, the pattern is more apparent.

sit_mdt(sit_prototype, by = "age") %>% 
  filter(age <= 10) %>% 
  ggplot(aes(age, mdt)) +
  geom_point() +
  geom_smooth(method = "lm", formula = 'y ~ x')

Flight Range

This is an alternative measure of the dispersal capacity of sterile males based on quantiles rather than averages.

The method regresses the log-distances (\(+1\) to avoid divergence) from the release point as a function of the cumulative (trap-density adjusted) catches. This relationship is modelled linearly, and the expected distances at the 50 and 90% adjusted catches with respect to the total are defined as the corresponding flight ranges FR50 and FR90.

The function sit_flight_range() uses the levels 50 and 90 by default and allows making a basic plot of the underlying model for verification.

(fr_prototype <- sit_flight_range(sit_prototype))
#> 
#> ── Flight range ────────────────────────────────────────────────────────
#> 
#> 
#> population    level   cum_adj_catch       FR
#> -----------  ------  --------------  -------
#> blue             50           15.92    73.54
#> blue             90           28.66   240.57
#> red              50           11.63    51.89
#> red              90           20.93   165.90
#> yellow           50            9.13    64.86
#> yellow           90           16.44   172.20
fr_plot <- plot(fr_prototype)

Moreover, the plot method returns the underlying data invisibly, which allows making a more customised figure. E.g.

fr_plot %>% 
  ggplot(aes(cx, logd, colour = population)) +
  geom_point(show.legend = FALSE) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, show.legend = FALSE) +
  labs(x = "Cumulative captures", y = "Log-distance (log-m)") +
  scale_colour_manual(values = unique(fr_plot$population))

The function also allows defining any number of quantiles and gives results either disaggregated by population (default) or all populations pooled together.

sit_flight_range(sit_prototype, levels = c(25, 50, 75), pool = TRUE)
#> 
#> ── Flight range ────────────────────────────────────────────────────────
#> 
#> 
#> population    level   cum_adj_catch       FR
#> -----------  ------  --------------  -------
#> pooled           25           18.34    29.31
#> pooled           50           36.68    61.46
#> pooled           75           55.02   127.70

Diffusion

Assuming that mosquitoes fly randomly (Brownian motion), their dispersion follow Fick’s first law of diffusion, which postulates that flux goes from regions of high concentration to regions of low concentration, with a magnitude that is proportional to the concentration gradient (spatial derivative). The proportionality constant is the diffusivity \(D\) and depend of the species’ behaviour for that location, season and weather conditions.

Under this model, the Mean Squared Displacement (MSD) of mosquitoes from their release point at time \(t\) is \[ \text{MSD}(t) = 4Dt. \]

From which we can solve for the Diffusion coefficient by using the distances from the release point at which sterile males were captured to estimate the MSD at a given age.

The function sit_diffusion() computes the MSD of the individuals (by population, by default, or possibly pooling populations) at the different observed ages, and derives \(D\) as the regression slope. The resulting object can be plotted to see the underlying regression model.

(diff_prototype <- sit_diffusion(sit_prototype))
#> 
#> ── Diffusion coefficient ───────────────────────────────────────────────
#> 
#> 
#> population      Dest
#> -----------  -------
#> blue          860.72
#> red           139.33
#> yellow        173.71
plot(diff_prototype)

4. Survival

We assume that for a released population of sterile males there is a certain probability of daily survival (\(\pi_\text{PDS}\)), which represents the probability that an individual (or the proportion of individuals that) makes it to the next day.

The proportion \(p_j\) of sterile males still alive at day \(j\) decreases geometrically, starting from some initial survival \(p_0\) at day 0:

\[ p_j = p_0 \, \pi_\text{PDS}^j. \]

Assuming that each day \(j\) we capture a fraction \(c\) of the released individuals that remain alive, we have that the expected number of surveyed individuals at day \(j\) is

\[ E[y_j] = c\, p_0 \, \pi_\text{PDS}^j. \]

In log-scale, \[ \log(E[y_j]) = \alpha + \beta \cdot j, \] where \(\alpha = \log(c\, p_0)\) and \(\beta = \log(\pi_\text{PDS})\).

This leads to estimating PDS by regressing the log-number of catches (+ 1, to avoid divergences) on the number of days since release (age).

The methodology was taken from Kay and Muir (1998), but originally developed in Gillies (1961). There is no appropriate statistical model other than the observation the the rate of decrease in the caught individuals is approximately constant, which leads to the regression approach.

The recapture \(\theta\) and survival \(S\) rates are computed as

\[\begin{equation} \begin{aligned} \theta & = \exp(\alpha) / (N + \exp(\alpha)) \\ S & = \exp(\beta) / (1 + \theta)^{1/d}, \end{aligned} \end{equation}\] where \(\alpha\) and \(\beta\) are the regression coefficients of the log-catches against age, \(N\) is the number of individuals released and \(d\) is the number of days after release (age in days). We use \(d = 1\).

The function sit_survival() returns a table with the Probability of Daily Survival (PDS), the Average Life Expectancy ALE, Recapture Rate RR, Survival Rate SR), by population (by default, unless pool = TRUE)

(surv_prototype <- sit_survival(sit_prototype))
#> 
#> ── Survival ────────────────────────────────────────────────────────────
#> 
#> 
#> population    N_released    PDS    ALE   RRx1e3     SR
#> -----------  -----------  -----  -----  -------  -----
#> blue               10000   0.81   4.83     1.96   0.81
#> red                10000   0.81   4.83     1.76   0.82
#> yellow             10000   0.81   4.83     1.38   0.82
plot(surv_prototype)

5. Density or size of the wild population

An estimation of the density, or equivalently, the size of the wild mosquito population is necessary for the determination of the appropriate quantity of sterile males to be released for controlling the population effectively.

A simple estimate is obtained as follows (Thompson 2012, Ch. 18). Let the total captures at a day \(t\) be the sum of \(m_t\) marked and \(n_t\) wild mosquitoes. Assuming that the proportion of marked individuals in the sample equals that in the population of size \(P\): \[ \frac{m_t}{m_t + n_t} = \frac{M_t}{M_t + P}, \] where \(M_t = R\,S^{a_t}\) is the number of marked individuals captured at time \(t\), with \(R\) the number of released adults, \(S\) the daily survival rate and \(a_t\) the number of days since release (age). I.e., the number of marked individuals at time \(t\) is the remaining number from those released that survived for \(a_t\) days.

The Lincoln Index (a.k.a. the Petersen estimator) has been used as a simple estimate of the wild male population size, assuming that the survival rate of an individual remains constant. Here we use a modified version that corrects for small samples and compensates for daily survival. \[ P_t = R\,S^{a_t}\,(n_t + 1) / (m_t + 1). \]

The values of \(R\), \(n_t\), \(m_t\) and \(t\) can be gathered from the adult surveys data. The calculation requires the estimation of the survival rate \(S\).

This provides multiple estimates of the population size, one for each age \(t\) and for each released population, which can be averaged to compute a final estimate based on all the available data.

The calculation needs to be split by released population. First, because survival rates could be different among populations, but also because the number \(n_t\) of captured wild males at a given date can be compared against the number of captured sterile males \(m_t\) at different ages. Still, the function allows pooling data across population as if they were the same.

The function returns a point estimate and the standard deviation.

sit_wild_size(sit_prototype)
#> 
#> ── Wild male population size ───────────────────────────────────────────
#> 
#> 
#> population      p_est      p_sd
#> -----------  --------  --------
#> blue          4981.57   5420.85
#> red           3471.37   1919.16
#> yellow        6527.88   4248.72

Bibliography

Gillies, M. T. 1961. “Studies on the Dispersion and Survival of Anopheles Gambiae Giles in East Africa, by Means of Marking and Release Experiments.” Bulletin of Entomological Research 52 (1): 99–127. https://doi.org/10.1017/S0007485300055309.
Kay, B H, and L E Muir. 1998. “Aedes Aegypti Survival and Dispersal Estimated by Mark-Release-Recapture in Northern Australia.” The American Journal of Tropical Medicine and Hygiene 58 (3): 277–82. https://doi.org/10.4269/ajtmh.1998.58.277.
Thompson, Steven K. 2012. Sampling. 3rd ed. Wiley Series in Probability and Statistics. Hoboken, N.J: Wiley.