treefit: Estimate the goodness-of-fit between tree models and data (Treefit for R)

Estimate the goodness-of-fit between tree models and data.

treefit(
  target,
  name = NULL,
  perturbations = NULL,
  normalize = NULL,
  reduce_dimension = NULL,
  build_tree = NULL,
  max_p = 20,
  n_perturbations = 20
)

Arguments

target

The target data to be estimated. It must be one of the following:

list(counts=COUNTS, expression=EXPRESSION): At least one of COUNTS and EXPRESSION must be specified. Both are matrices whose rows correspond to samples, such as cells, and whose columns correspond to features, such as genes. COUNTS contains count data, such as gene expression counts. EXPRESSION contains normalized count data.
Seurat object
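A minimal sketch of building the list form of target from simulated data. The matrix sizes and the log-normalization used to derive EXPRESSION are illustrative assumptions, not treefit defaults. The treefit call is left commented out so the snippet runs without the package installed.

```r
# Toy count matrix: 100 samples (rows) x 50 features (columns).
# Values are simulated, not real data.
set.seed(1)
counts <- matrix(rpois(100 * 50, lambda = 2), nrow = 100, ncol = 50,
                 dimnames = list(paste0("cell", 1:100),
                                 paste0("gene", 1:50)))

# Normalized values with the same shape: library-size scaling followed
# by log1p (an illustrative choice only).
expression <- log1p(counts / rowSums(counts) * 1e4)

# Either element may be omitted, but at least one must be present.
target <- list(counts = counts, expression = expression)

# fit <- treefit::treefit(target, name = "toy")
```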

name

The name of the target, as a string.

perturbations

How to perturb the target data.

If this is NULL, all available perturbation methods are used.

You can specify the perturbation methods to use as a list. Here are the available methods:

normalize

How to normalize the count data.

If this is NULL, the default normalization is applied.

You can specify a function that normalizes count data.
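This page does not state the signature treefit expects for normalize. The sketch below assumes the function receives the COUNTS matrix (samples in rows) and returns a normalized matrix of the same shape; log_normalize and scale_factor are hypothetical names, and library-size scaling plus log1p is one common choice rather than treefit's default.

```r
# Hypothetical normalizer: scale each sample (row) to a common library
# size, then apply log1p. Assumes a samples x features count matrix in
# and a matrix of the same shape out.
log_normalize <- function(counts, scale_factor = 1e4) {
  library_size <- rowSums(counts)
  # Avoid division by zero for samples with no counts.
  library_size[library_size == 0] <- 1
  # Dividing by a vector of length nrow(counts) scales row-wise.
  log1p(counts / library_size * scale_factor)
}

counts <- matrix(c(0, 1, 3,
                   2, 2, 0), nrow = 2, byrow = TRUE)
log_normalize(counts)
```

Under that assumption it would be passed as `normalize = log_normalize`.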

reduce_dimension

How to reduce the dimensionality of the expression data.

If this is NULL, the default dimensionality reduction is applied.

You can specify a function that reduces the dimensionality of expression data.
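As with normalize, the expected signature is not documented here. This sketch assumes the function receives the EXPRESSION matrix (samples in rows) and returns a samples x components matrix. PCA via stats::prcomp is used as one common technique, not necessarily the default applied by treefit; reduce_with_pca and n_components are hypothetical names.

```r
# Hypothetical reducer: project samples onto the leading principal
# components. Assumes a samples x features matrix in and a
# samples x components matrix out.
reduce_with_pca <- function(expression, n_components = 10) {
  # Cannot request more components than the data supports.
  n_components <- min(n_components, nrow(expression), ncol(expression))
  pca <- stats::prcomp(expression, center = TRUE, scale. = FALSE,
                       rank. = n_components)
  pca$x  # principal component scores, one row per sample
}

set.seed(1)
expression <- matrix(rnorm(30 * 8), nrow = 30)
embedding <- reduce_with_pca(expression, n_components = 3)
dim(embedding)
```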

build_tree

How to build a tree from the expression data.

If this is NULL, a minimum spanning tree (MST) is built.

You can specify a function that builds a tree from expression data.
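For reference, a base-R sketch of a Euclidean minimum spanning tree built with Prim's algorithm. The input (a samples x dimensions matrix) and the output (a symmetric 0/1 adjacency matrix) are assumptions: the format treefit actually passes to and expects from build_tree is not documented on this page, and build_mst_adjacency is a hypothetical name.

```r
# Hypothetical tree builder: Euclidean MST via Prim's algorithm.
# Assumes `points` is a samples x dimensions matrix; returns a
# symmetric 0/1 adjacency matrix (an assumed output format).
build_mst_adjacency <- function(points) {
  n <- nrow(points)
  dist_mat <- as.matrix(stats::dist(points))
  adjacency <- matrix(0, n, n)
  in_tree <- rep(FALSE, n)
  in_tree[1] <- TRUE
  # best_dist[j]: length of the cheapest edge from the tree to vertex j;
  # best_from[j]: the tree vertex at the other end of that edge.
  best_dist <- dist_mat[1, ]
  best_from <- rep(1L, n)
  for (step in seq_len(n - 1)) {
    candidates <- which(!in_tree)
    next_vertex <- candidates[which.min(best_dist[candidates])]
    parent <- best_from[next_vertex]
    adjacency[next_vertex, parent] <- 1
    adjacency[parent, next_vertex] <- 1
    in_tree[next_vertex] <- TRUE
    # Update cheapest connections through the newly added vertex.
    closer <- (!in_tree) & (dist_mat[next_vertex, ] < best_dist)
    best_dist[closer] <- dist_mat[next_vertex, closer]
    best_from[closer] <- next_vertex
  }
  adjacency
}

pts <- matrix(c(0, 1, 2, 10), ncol = 1)
build_mst_adjacency(pts)
```

Prim's algorithm is used here only because it fits in a few lines of base R; a graph library would serve equally well.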

max_p

The maximum number of low-dimensional Laplacian eigenvectors to use.

The default is 20.
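To illustrate what max_p controls: for a tree with adjacency matrix A and degree matrix D, the low-dimensional Laplacian eigenvectors are the eigenvectors of L = D - A with the smallest eigenvalues. This sketch drops the constant eigenvector (eigenvalue 0) of a connected graph; whether treefit does the same is an assumption, and low_laplacian_eigenvectors is a hypothetical name.

```r
# Illustrative sketch: the p Laplacian eigenvectors with the smallest
# nonzero eigenvalues of a connected graph given by `adjacency`.
low_laplacian_eigenvectors <- function(adjacency, p) {
  n <- nrow(adjacency)
  stopifnot(p >= 1, p <= n - 1)
  # nrow = n keeps diag() correct even for a 1 x 1 input.
  laplacian <- diag(rowSums(adjacency), nrow = n) - adjacency
  eig <- eigen(laplacian, symmetric = TRUE)
  # eigen() returns eigenvalues in decreasing order; reverse to ascending.
  vectors <- eig$vectors[, rev(seq_len(n)), drop = FALSE]
  # Skip the first (constant) eigenvector.
  vectors[, 2:(p + 1), drop = FALSE]
}

# Path graph on 5 vertices.
A <- matrix(0, 5, 5)
for (i in 1:4) {
  A[i, i + 1] <- 1
  A[i + 1, i] <- 1
}
low_laplacian_eigenvectors(A, p = 2)
```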

n_perturbations

The number of times to perturb the target data.

The default is 20.

Value

The estimated result as a treefit object. It has the following attributes:

max_cca_distance: The max canonical correlation analysis distance, as a data.frame.
rms_cca_distance: The root mean square canonical correlation analysis distance, as a data.frame.
n_principal_paths_candidates: Candidate values for the number of principal paths.

The max_cca_distance and rms_cca_distance data.frames have the same structure, with the following columns:

p: Dimensionality of the feature space of tree structures.
mean: The mean of the target distance values.
standard_deviation: The standard deviation of the target distance values.
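A sketch of how per-p distances might be computed and summarized into the documented columns (p, mean, standard_deviation). It assumes the distances come from the canonical correlations ρ between two p-dimensional embeddings (for example, Laplacian eigenvectors of the original and a perturbed tree), using the usual principal-angle definitions: max distance sqrt(1 - min ρ²) and RMS distance sqrt(mean(1 - ρ²)). treefit's exact formulas are not given on this page; cca_distances and summarize_distances are hypothetical names, and the input data are simulated.

```r
# Assumed definitions: canonical correlations between the column spaces
# of x and y (no centering), turned into max and RMS distances.
cca_distances <- function(x, y) {
  rho <- stats::cancor(x, y, xcenter = FALSE, ycenter = FALSE)$cor
  rho <- pmin(rho, 1)  # guard against tiny numerical overshoot
  c(max = sqrt(1 - min(rho)^2),
    rms = sqrt(mean(1 - rho^2)))
}

# distance_matrix: one row per perturbation, one column per p = 1..max_p.
summarize_distances <- function(distance_matrix) {
  data.frame(p = seq_len(ncol(distance_matrix)),
             mean = colMeans(distance_matrix),
             standard_deviation = apply(distance_matrix, 2, stats::sd))
}

set.seed(42)
n_cells <- 20
max_p <- 3
n_perturbations <- 5
# Simulated orthonormal "original" embedding.
original <- qr.Q(qr(matrix(rnorm(n_cells * max_p), n_cells)))
max_dist <- matrix(NA_real_, n_perturbations, max_p)
for (i in seq_len(n_perturbations)) {
  noise <- matrix(rnorm(n_cells * max_p, sd = 0.1), n_cells)
  perturbed <- qr.Q(qr(original + noise))
  for (p in seq_len(max_p)) {
    max_dist[i, p] <- cca_distances(original[, 1:p, drop = FALSE],
                                    perturbed[, 1:p, drop = FALSE])[["max"]]
  }
}
summarize_distances(max_dist)
```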

Examples

# \dontrun{
# Generate star-shaped tree data whose values are normalized
# expression values, not count data.
star <- treefit::generate_2d_n_arms_star_data(300, 3, 0.1)
# Estimate tree-likeness of the tree data.
fit <- treefit::treefit(list(expression=star))
# }