Individual is designed for running big individual-based models. But if you find your model taking too long or consuming all of your memory, here are some things you can try to speed up your simulation.
The bitset data
structure is used to record the presence or absence of an element in a
finite set. individual::Bitset
implements this data
structure and is able to preform set operations extremely quickly using
bitwise
operations. Taking advantage of these operations can lead to very
fast processes, when you need to find some particular subset of
individuals.
Let’s take a look at the recovery process in
vignette("Tutorial")
. A crucial operation here is to get
all infectious individuals who are not already scheduled for recovery.
The object I
is a bitset containing all those individuals
currently infectious, and already_scheduled
is another
bitset containing those individuals scheduled for a recovery event.
Using already_scheduled$not()
returns a new bitset of those
individuals not in the set of already scheduled persons. This
is passed to the I$and()
, which modifies I
in-place so that the result is the intersection of currently infectious
persons and persons who have not yet been scheduled for a recovery
event, which is precisely the set of people we want.
recovery_process <- function(t){
I <- health$get_index_of("I")
already_scheduled <- recovery_event$get_scheduled()
I$and(already_scheduled$not())
rec_times <- rgeom(n = I$size(),prob = pexp(q = gamma * dt)) + 1
recovery_event$schedule(target = I,delay = rec_times)
}
Bitsets can also be efficiently sampled using
Bitset$sample()
. This is used in the infection process of
vignette("Tutorial")
. Once the per-capita force of
infection (probability of moving from S to I during this time step) is
calculated, the bitset S
is sampled with that probability
which modifies it in-place. The number of elements remaining after being
sampled is binomially distributed. The argument rate
can
also be specified as a vector of probabilities, one for each element in
the bitset.
infection_process <- function(t){
I <- health$get_size_of("I")
foi <- beta * I/N
S <- health$get_index_of("S")
S$sample(rate = pexp(q = foi * dt))
health$queue_update(value = "I",index = S)
}
When creating a new Bitset, a user must specify the maximum size of
the bitset. This is the maximum number of positive integers which the
bitset can store. For example, if calling
Bitset$new(size = 100)
, the resulting object is able to
store the presence or absence of integers between 1 and 100 (inclusive).
Attempting to insert or remove elements outside of this range will
result in an error.
Bitsets offer methods to preform unions (Bitset$or()
),
intersections (Bitset$and()
), symmetric set difference
(also known as exclusive or, Bitset$xor()
), and set
difference (Bitset$set_difference()
) with other bitsets.
These methods modify the bitset in-place. The method
Bitset$not()
gives the complement of a bitset, and returns
a new individual::Bitset
object, leaving the original
bitset intact. Because these set operations use bitwise operations
directly rather than more expensive relational operators, computations
with bitsets are extremely fast. Taking advantage of bitset operations
can help make processes in “individual” much faster.
This can be seen when implementing a common pattern in
epidemiological models: sampling success or failure for a bitset of
individuals, and then generating two bitsets to hold individuals sampled
one way or the other. A first method might use
individual::filter_bitset
.
n <- 1e4
bset <- Bitset$new(n)$insert(1:n)
probs <- runif(n)
keep <- probs >= 0.5
stay <- filter_bitset(bitset = bset,other = which(keep))
leave <- filter_bitset(bitset = bset,other = which(!keep))
This pattern is almost always slower than using the sample method with a set difference:
stay <- bset$copy()
stay$sample(rate = probs)
leave <- bset$copy()$set_difference(stay)
In both instances the original bitset object bset
is not
modified. The latter pattern can be made even faster if the original may
be modified by directly taking the set difference with it. For models
with large population sizes, the speed differences can be
substantial.
Because a bitset stores integers in some finite set, it can be
returned as an integer vector by using Bitset$to_vector()
.
However, this is a slow and expensive operation, as data must be copied
into a new vector which is returned to R. If your model’s dynamics
require the frequent returning of integer vectors, an
individual::IntegerVariable
object will be more
appropriate. However, for most discrete variables, and especially those
which mirror compartments in mathematical models, bitset operations and
individual::CategoricalVariable
(which uses bitsets
internally) should be preferred.
Every time your processes ask for a variable, there is an overhead associated with moving simulation data into R, potentially incurring expensive copying of data.
Because many epidemiological models have similar state transitions,
we’ve included several “prefab” processes and event listeners
implemented in C++ which provide significant speed improvements and can
be used out of the box in models. The functions return pointers which
can be passed to the process list of
individual::simulate_loop
or event listeners just like
closures in R. The processes available are:
individual::bernoulli_process
: moves individuals from
one categorical variable state to another at a constant probabilityindividual::multi_probability_bernoulli_process
: moves
individuals from one categorical variable state to another at a
individual level probability specified by a
individual::DoubleVariable
objectindividual::fixed_probability_multinomial_process
:
moves individuals from one categorical variable state to a set of
possible destination values with constant probability to leave and
multinomially distributed choice of destination state.individual::multi_probability_multinomial_process
:
moves individuals from one categorical variable state to a set of
possible destination values with individual level probability to leave
specified by a individual::DoubleVariable
object and
multinomially distributed choice of destination state.individual::infection_age_process
: Simulates infection
for age-structured models, where individuals come into contact at a rate
given by a mixing (contact) matrix.Prefabs for event listeners and renderers:
individual::update_category_listener
: event listener
for individual::TargetedEvent
objects which updates the
categorical variable state when it fires.individual::reschedule_listener
: event listener for
individual::TargetedEvent
objects which schedules some new
followup event when it fires.individual::categorical_count_renderer_process
: used
for individual::Render
objects that counts the size of each
state in a categorical variable.Unfortunately, we don’t have a prefab for every situation. Please feel free to write one of your own!
These are the basic steps to add C++ processes to your R package:
Run usethis::use_rcpp
to set your package up for C++
development.
Add individual
to the LinkingTo
section
of your package DESCRIPTION.
If you package is named mypackage
, create a header
file containing #include<individual.h>
in any of
these locations:
/mypackage_types.h
src/mypackage_types.hpp
src/include/mypackage_types.h
inst/include/mypackage_types.hpp inst
Then this header file will be automatically included in
RcppExports.cpp
. For more information, see section “2.5
Types in Generated Code” in the Rcpp
Attributes vignette.
Create a file src/Makecars
containing the line
CXX_STD = CXX14
. Because individual
uses C++14
features, when compiling your package against it you must let the
compiler know it should use the C++14 standard, otherwise it will not be
able to compile.
Write your process!
Processes in C++ are of type process_t
, defined in
inst/include/common_types.h
. Types for listeners for
individual::Event
and
individual::TargetedEvent
are listener_t
and
targeted_listener_t
, defined in
inst/include/Event.h
. Below is how the C++ implementation
of multi_probability_bernoulli_process
is coded.
Note that the return type is a Rcpp::XPtr
(see
here) to a process_t
, which is implemented as a
std::function
(see
here) object, a C++ class that can hold any callable type. The
Rcpp::XPtr
is initialized with a pointer to a
process_t
object, which itself holds a C++ lambda
function, basically a function closure.
The lambda function captures the input arguments by value, and takes
a single argument when called t
, giving the current time
step (just like process functions in R). Sampling those individuals to
change state is implemented with the C++ API for these objects.
#include <individual.h>
#include <Rcpp.h>
// [[Rcpp::export]]
::XPtr<process_t> multi_probability_bernoulli_process_cpp(
Rcpp::XPtr<CategoricalVariable> variable,
Rcppconst std::string from,
const std::string to,
const Rcpp::XPtr<DoubleVariable> rate_variable
){
// make pointer to lambda function and return XPtr to R
return Rcpp::XPtr<process_t>(
new process_t([variable,rate_variable,from,to](size_t t){
// sample leavers with their unique prob
individual_index_t leaving_individuals(variable->get_index_of(std::vector<std::string>{from}));
std::vector<double> rate_vector = rate_variable->get_values(leaving_individuals);
(leaving_individuals, rate_vector.begin(), rate_vector.end());
bitset_sample_multi_internal
->queue_update(to, leaving_individuals);
variable
}),
true
);
};
The exported function can be used normally in R when creating the list of processes:
processes <- list(
multi_probability_bernoulli_process_cpp(state, "I", "R", prob),
other_process_1,
other_process_2
)
That’s everything you need to scale your models up to millions of individuals!