Package 'dd4d' reference manual

Title:	Dummy Data for Dummies
Description:	Allows you to specify and sample from a Bayesian Network (a.k.a. a parametric Directed Acyclic Graph, or pDAG).
Authors:	William Hulme [aut, cre]
Maintainer:	William Hulme <william.hulme@thedatalab.org>
License:	MIT + file LICENSE
Version:	0.0.0.9000
Built:	2025-03-18 05:34:19 UTC
Source:	https://github.com/wjchulme/dd4d

Inverse Value Matching

Description

Complement of %in%. Returns the elements of x that are not in y.

Usage

x %ni% y

Arguments

`x`	a vector
`y`	a vector
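
As an illustration of what "complement of %in%" means, the following base R sketch defines a stand-in operator. The name `%not_in%` is hypothetical and chosen so as not to claim this is how the package defines `%ni%`; the sketch shows both the logical complement and the subsetting idiom that yields the elements of x not found in y.

```r
# Hypothetical stand-in for illustration; not the package's own definition.
`%not_in%` <- function(x, y) !(x %in% y)

x <- c("a", "b", "c", "d")
y <- c("b", "d")

x %in% y          # membership test, one logical value per element of x
x %not_in% y      # the complement of that test
x[x %not_in% y]   # the elements of x that do not appear in y
```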

Get all functions that are used in a formula `expr`.

Description

Get all functions that are used in a formula expr.

Usage

all_funs(expr)

Arguments

`expr`	a formula object
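
To make the idea of "functions used in a formula" concrete, here is a rough base R approximation using all.names and all.vars. It is only a sketch of the concept: whether all_funs reports operators such as `~` and `+`, or returns its result in this form, is not stated in this manual.

```r
# A one-sided formula of the kind used for missing_rate in bn_node
f <- ~ plogis(-2 + age * 0.05)

symbols <- all.names(f)        # every symbol, including function names and operators
vars <- all.vars(f)            # variable names only
funs <- setdiff(symbols, vars) # what remains are the functions and operators

funs
```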

Creates a Bayesian network object from a list of nodes

Description

Converts the list to a data frame, which is easier to work with, and adds some useful columns. The function performs a few checks on the list, for instance that the graph is acyclic and that variables used in the expressions are either defined elsewhere in the list or already known. The known_variables argument takes a character vector of names for variables that are already defined externally in a dataset, which can then be passed to bn_simulate. For these known variables, variable_formula is the variable name itself. This is to help the bn_simulate function; it does not actually lead to self-dependence (e.g. var depending on var).

Usage

bn_create(list, known_variables = NULL)

Arguments

`list`	a list of node objects, each created by `bn_node`.
`known_variables`	character vector of variables that will be provided by an external dataset

Value

data.frame
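
A minimal sketch of building a network from two nodes, using only the signatures documented in this manual. The node names `age` and `diabetes` and the coefficients are illustrative assumptions, and the block is guarded so that it only runs when dd4d is installed.

```r
# Sketch only: node names and coefficients are made up for illustration.
if (requireNamespace("dd4d", quietly = TRUE)) {
  library(dd4d)

  nodes <- list(
    # age has no parents
    age = bn_node(~ floor(rnorm(n = ..n, mean = 60, sd = 15))),
    # diabetes depends on age, so age must be defined in the same list
    diabetes = bn_node(~ rbinom(n = ..n, size = 1, prob = plogis(-1 + age * 0.002)))
  )

  bn <- bn_create(nodes)
  is.data.frame(bn)  # the Value section states a data.frame is returned
}
```

Because `diabetes` refers to `age`, dropping `age` from the list without naming it in known_variables should be caught by the checks described above.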

Specify a variable node in the network

Description

Specify a variable node in the network

Usage

bn_node(variable_formula, missing_rate = ~0, keep = TRUE, needs = character())

Arguments

`variable_formula`	A RHS-only formula specifying how to simulate the variable. Use `..n` for the number of observations, which is later replaced by `pop_size` in the `bn_simulate` function.
`missing_rate`	A RHS-only formula specifying how missing values should be distributed. This can be a simple proportion such as `~0.5`, or missingness can depend on other variables, for example `~plogis(-2 + age*0.05)`, which says missingness increases with age.
`keep`	logical. Should this variable be kept in the final simulated output?
`needs`	A character vector of variables. If any variables given in `needs` are missing / `NA`, then this variable is missing too.

Value

Object of class node and list.

Examples

bn_node(variable_formula = ~floor(rnorm(n=..n, mean=60, sd=15)))
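
To see what the `missing_rate` example `~plogis(-2 + age*0.05)` implies, the expression can be evaluated directly in base R for a few ages. The ages below are arbitrary illustrative values; the sketch only evaluates the formula's right-hand side and does not show how the package applies it.

```r
# Arbitrary ages chosen for illustration
age <- c(20, 40, 60, 80)

# Right-hand side of the missing_rate formula from the argument description
p_missing <- plogis(-2 + age * 0.05)

# Probabilities rise with age, passing through 0.5 at age 40
round(p_missing, 3)
```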

Plot bn_df object

Description

Plot bn_df object

Usage

bn_plot(bn_df, connected_only = FALSE)

Arguments

`bn_df`	initialised bn_df object, with simulation instructions. Created with `bn_create`
`connected_only`	logical. Only plot nodes that are connected to other nodes

Value

plot

Simulate data from bn_df object

Description

Simulate data from bn_df object

Usage

bn_simulate(bn_df, known_df = NULL, pop_size, keep_all = FALSE, .id = NULL)

Arguments

`bn_df`	initialised bn_df object, with simulation instructions. Created with `bn_create`
`known_df`	data.frame. Optional data.frame containing upstream variables used for simulation.
`pop_size`	integer. The size of the dataset to be created.
`keep_all`	logical. Keep all simulated variables or only keep those specified by `keep`
`.id`	character. Name of id column placed at the start of the dataset. If NULL (default) then no id column is created.

Value

tbl
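
A short end-to-end sketch of simulating a dataset, based on the arguments documented above. The variable names, the 10% missingness rate, and the id column name `patient_id` are illustrative assumptions; the block is guarded so that it only runs when dd4d is installed.

```r
# Sketch only: variable names and parameter values are illustrative.
if (requireNamespace("dd4d", quietly = TRUE)) {
  library(dd4d)

  bn <- bn_create(list(
    age = bn_node(~ floor(rnorm(n = ..n, mean = 60, sd = 15))),
    sex = bn_node(
      ~ rfactor(n = ..n, levels = c("F", "M"), p = c(0.51, 0.49)),
      missing_rate = ~ 0.1
    )
  ))

  # ..n in each formula is replaced by pop_size
  dummy <- bn_simulate(bn, pop_size = 100, keep_all = FALSE, .id = "patient_id")

  nrow(dummy)
  names(dummy)
}
```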

Converts a bn_df object to a dagitty object

Description

Converts a bn_df object to a dagitty object

Usage

bn2dagitty(bn_df)

Arguments

`bn_df`	initialised bn_df object, with simulation instructions. Created with `bn_create`

Value

dagitty object

Random categorical variables

Description

Random categorical variables

Usage

rcat(n, levels, p)

Arguments

`n`	number of samples
`levels`	vector of categories to sample from
`p`	vector of probabilities

Value

a character vector

Examples

rcat(n=10, levels=c("a","b"), p=c(0.2,0.8))
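
For readers without the package to hand, a comparable draw can be sketched with base R's sample. This is an approximation of the behaviour described above, not a statement of how rcat is implemented.

```r
set.seed(1)  # fixed seed so the draw is repeatable

# Ten draws from the categories "a" and "b" with probabilities 0.2 and 0.8
draws <- sample(x = c("a", "b"), size = 10, replace = TRUE, prob = c(0.2, 0.8))

draws
table(draws)
```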

Random factor variables

Description

Random factor variables

Usage

rfactor(n, levels, p)

Arguments

`n`	number of samples
`levels`	vector of categories to sample from
`p`	vector of probabilities

Value

a factor vector

Examples

rfactor(n=10, levels=c("a","b"), p=c(0.2,0.8))
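
One practical difference between a factor and a character result is that a factor can carry levels that were never drawn. The base R sketch below illustrates this with a third level "c" given probability zero; it is an approximation of the idea and makes no claim about how rfactor itself handles unused levels.

```r
set.seed(1)  # fixed seed so the draw is repeatable

lv <- c("a", "b", "c")

# "c" has probability zero, so it never appears among the draws
draws <- sample(x = lv, size = 10, replace = TRUE, prob = c(0.2, 0.8, 0))

# Converting with explicit levels keeps "c" as a level with a count of zero
f <- factor(draws, levels = lv)

levels(f)
table(f)
```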
