Tools

Advanced analysis functions available via sclab.tools.

Pseudotime & Trajectory (cellflow)

pseudotime

pseudotime(adata: AnnData, use_rep: str, t_key: str, t_range: tuple[float, float], n_dims: int = 10, min_snr: float = 0.25, periodic: bool = False, method: Literal['fourier', 'splines'] = 'splines', largest_harmonic: int = 5, roughness: float | None = None, key_added='pseudotime') -> PseudotimeResult

Compute pseudotime ordering for cells by fitting a curve through a low-dimensional embedding.

Fits either a Fourier series or smoothing spline to a reduced-dimensional representation of the data, then projects each cell onto the nearest point along the fitted curve. The arc-length along that curve is used as the pseudotime coordinate, normalised to the range [0, 1].

Parameters:

Name Type Description Default
adata AnnData

Annotated data matrix. Must contain adata.obsm[use_rep] and adata.obs[t_key].

required
use_rep str

Key in adata.obsm containing the low-dimensional embedding (e.g. "X_pca") used to fit the pseudotime curve.

required
t_key str

Key in adata.obs that holds an initial continuous ordering of cells (e.g. a coarse time label or an existing pseudotime estimate) used to initialise the curve fit.

required
t_range tuple[float, float]

(t_min, t_max) interval of t_key values to consider. Cells outside this range are excluded from fitting and their pseudotime is set to NaN.

required
n_dims int

Maximum number of embedding dimensions to use for the curve fit. Default is 10.

10
min_snr float

Minimum signal-to-noise ratio (relative to the dimension with the highest SNR) required to include a dimension in the fit. Dimensions below this threshold are discarded. Default is 0.25.

0.25
periodic bool

If True, treat the trajectory as periodic (cyclic). Requires t_range[0] == 0.0. Supported by method="fourier" and by method="splines", which then uses periodic boundary conditions. Default is False.

False
method (splines, fourier)

Curve-fitting method. "splines" fits an N-D smoothing spline; "fourier" fits an N-D Fourier series (only valid when periodic=True). Default is "splines".

"splines"
largest_harmonic int

Highest harmonic to include when method="fourier". Ignored for method="splines". Default is 5.

5
roughness float or None

Roughness penalty for the smoothing spline when method="splines". If None, an automatic penalty is chosen. Default is None.

None
key_added str

Base key under which results are stored. Default is "pseudotime". The following entries are written to adata:

  • adata.obs[key_added] -- arc-length pseudotime in [0, 1].
  • adata.obs[key_added + "_path_residue"] -- Euclidean distance from each cell to its nearest point on the fitted curve.
  • adata.obsm[key_added + "_path"] -- fitted curve coordinates evaluated at each cell's projected pseudotime.
  • adata.obsm[key_added + "_path_derivative"] -- first derivative of the fitted curve at each cell's projected pseudotime.
  • adata.uns[key_added] -- dictionary of run parameters and SNR values.
'pseudotime'

Returns:

Type Description
PseudotimeResult

A named tuple with the following fields:

  • pseudotime -- arc-length pseudotime values for cells within t_range, normalised to [0, 1].
  • residues -- Euclidean residuals between each cell and its nearest curve point.
  • phi -- raw parameter values (in the original t_key units) of the nearest curve point for each cell.
  • F -- fitted curve object (NDBSpline or NDFourier) defined over the full embedding dimensionality.
  • SNR -- per-dimension signal-to-noise ratios, normalised so the maximum is 1.
  • snr_mask -- boolean mask indicating which dimensions passed the min_snr threshold.
  • t_mask -- boolean mask indicating which cells fall within t_range.
  • fp_resolution -- floating-point resolution used during the final pseudotime refinement stage.
Notes

Results for cells outside t_range are stored as NaN in adata.obs. The curve is fitted only on cells whose t_key value lies within [t_min, t_max].
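
The projection and arc-length normalisation described for pseudotime can be illustrated with a minimal, standard-library sketch. It is not the sclab implementation: the fitted curve is replaced by a densely sampled 2-D polyline, the nearest-point search is brute force, and the names arc_length_pseudotime, curve and cells are hypothetical.

```python
import math

def arc_length_pseudotime(cells, curve):
    """Project 2-D cells onto a sampled curve; return arc-length pseudotime in [0, 1]."""
    # Cumulative arc length at each curve sample.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]

    pseudotime, residues = [], []
    for cx, cy in cells:
        # Distance from this cell to every curve sample (brute force; fine for a sketch).
        dists = [math.hypot(cx - px, cy - py) for px, py in curve]
        i = min(range(len(curve)), key=dists.__getitem__)
        pseudotime.append(cum[i] / total)  # normalised arc length of the nearest sample
        residues.append(dists[i])          # distance to the nearest sample
    return pseudotime, residues

# Assumption: a quarter circle sampled at 101 points stands in for the fitted curve.
curve = [(math.cos(a), math.sin(a)) for a in (k * math.pi / 200 for k in range(101))]
cells = [(1.1, 0.0), (0.8, 0.8), (0.0, 0.9)]
pt, res = arc_length_pseudotime(cells, curve)
```

The two returned lists play the roles of the pseudotime and residues fields: a cell lying off the start of the curve gets pseudotime 0, one off the end gets 1, and the residue is its distance to the curve.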


density_dynamics

density_dynamics(adata: AnnData, time_key: str = 'pseudotime', t_range: tuple[float, float] | None = None, periodic: bool | None = None, bandwidth: float = 1 / 64, algorithm: str = 'auto', kernel: str = 'gaussian', metric: str = 'euclidean', max_grid_size: int = 2 ** 8 + 1, derivative: int = 0, mode: Literal['peaks', 'valleys'] = 'peaks', find_peaks_kwargs: dict = {}, plot_density: bool = False, plot_density_fit: bool = False, plot_density_fit_derivative: bool = False, plot_histogram: bool = False, histogram_nbins: int = 50)

Detect density peaks or valleys along pseudotime via B-spline fitting.

Fits a KDE to the pseudotime distribution, smooths it with a B-spline, optionally takes a derivative of the spline, and identifies peaks (or valleys) using scipy.signal.find_peaks. Detected peak times, heights, and inter-peak durations are stored in adata.uns.

Parameters:

Name Type Description Default
adata AnnData

Annotated data matrix. Must contain pseudotime values in adata.obs[time_key] and, when t_range is None, adata.uns[time_key]['t_range'].

required
time_key str

Column in adata.obs that holds pseudotime values and key under which results are stored in adata.uns. Default is "pseudotime".

'pseudotime'
t_range tuple of float

(t_min, t_max) domain for the density estimate. When None, the value stored in adata.uns[time_key]['t_range'] is used (an AssertionError is raised if that key is absent). Default is None.

None
periodic bool

Whether pseudotime is periodic. When None, inferred from adata.uns[time_key]['periodic'] if available, otherwise False. Default is None.

None
bandwidth float

Bandwidth for the KDE. Default is 1/64.

1 / 64
algorithm str

Algorithm passed to the KDE back-end. Default is "auto".

'auto'
kernel str

Kernel function for the KDE. Default is "gaussian".

'gaussian'
metric str

Distance metric for the KDE. Default is "euclidean".

'euclidean'
max_grid_size int

Number of grid points for KDE evaluation. Default is 2**8 + 1.

2 ** 8 + 1
derivative int

Order of the B-spline derivative to analyse. 0 analyses the density itself; 1 analyses its rate of change, etc. Default is 0.

0
mode (peaks, valleys)

Whether to detect peaks or valleys in the (derivative of the) density. Default is "peaks".

"peaks"
find_peaks_kwargs dict

Extra keyword arguments forwarded to scipy.signal.find_peaks. The 'height' key, if present, is treated as a fraction of the global maximum and rescaled accordingly. Default is {}.

{}
plot_density bool

If True, plot the raw KDE. Default is False.

False
plot_density_fit bool

If True, plot the smoothed B-spline fit. Default is False.

False
plot_density_fit_derivative bool

If True, plot the derivative of the B-spline. Default is False.

False
plot_histogram bool

If True, overlay a histogram on the plot. Default is False.

False
histogram_nbins int

Number of histogram bins. Default is 50.

50

Returns:

Type Description
None

Modifies adata in-place. Results are stored under adata.uns[time_key][f'density_dynamics_d{derivative}_{mode}'] as a dict with keys:

  • 'times' — pseudotime positions of detected peaks.
  • 'deltas' — inter-peak durations (or phase durations for periodic data).
  • 'heights' — density (or derivative) values at each peak.
  • 'params' — KDE and peak-finding hyper-parameters.
  • 'density_bspline_tck' — B-spline representation of the fitted density.
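
The density-peak workflow can be sketched with the standard library alone. This is an illustration under stated assumptions, not the sclab code: the B-spline smoothing step is skipped, a plain local-maximum scan stands in for scipy.signal.find_peaks, and the helper names gaussian_kde_on_grid and local_maxima are hypothetical. The rel_height argument mimics the documented treatment of 'height' as a fraction of the global maximum.

```python
import math

def gaussian_kde_on_grid(samples, grid, bandwidth):
    """Evaluate a Gaussian KDE with the given bandwidth at each grid point."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - s) / bandwidth) ** 2) for s in samples)
        for g in grid
    ]

def local_maxima(values, rel_height=0.0):
    """Indices of strict interior local maxima at least rel_height * max(values) high."""
    threshold = rel_height * max(values)
    return [
        i for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] > values[i + 1] and values[i] >= threshold
    ]

# Toy pseudotime values clustered around 0.25 and 0.75.
samples = [0.25 + 0.01 * k for k in range(-3, 4)] + [0.75 + 0.01 * k for k in range(-3, 4)]
grid = [k / 256 for k in range(257)]  # 2**8 + 1 points on [0, 1]
density = gaussian_kde_on_grid(samples, grid, bandwidth=1 / 64)

peaks = local_maxima(density, rel_height=0.5)
times = [grid[i] for i in peaks]                    # analogue of 'times'
heights = [density[i] for i in peaks]               # analogue of 'heights'
deltas = [b - a for a, b in zip(times, times[1:])]  # analogue of 'deltas'
```

For mode="valleys" the same scan could be applied to the negated density. For periodic data the gap that wraps around from the last peak to the first would also have to be counted.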

expression_dynamics

expression_dynamics(adata: AnnData, time_key: str, t_range: tuple[float, float] | None = None, periodic: bool | None = None, layer: str | None = None, gene_mask: str | None = None, n_grid: int = 1001, progress: bool = False)

Compute per-cell gene turnover from expression dynamics over pseudotime.

Fits a smooth B-spline to the expression matrix over pseudotime, takes the analytical derivative (dX/dt), then counts the number of genes with high activation (rate > median of positives) and high repression (rate < median of negatives) for each cell.

Additionally computes per-gene timing summaries (pseudotime of peak activation, peak repression, acceleration onset, and deceleration onset) and a per-cell transcriptional flux (total absolute velocity across genes).

Parameters:

Name Type Description Default
adata AnnData

Annotated data matrix. Must contain pseudotime values in adata.obs[time_key].

required
time_key str

Column in adata.obs with pseudotime values. If adata.uns[time_key] exists, t_range and periodic are read from it when not explicitly provided.

required
t_range tuple[float, float] | None

Min and max pseudotime for the spline domain. Inferred from adata.uns[time_key]['t_range'] or the data range if None.

None
periodic bool | None

Whether pseudotime is periodic (e.g. cell cycle). Inferred from adata.uns[time_key]['periodic'] or defaults to False.

None
layer str | None

Layer in adata.layers to use as expression matrix. Uses adata.X when None.

None
gene_mask str | None

Boolean column in adata.var to subset genes before fitting. When provided, output columns are prefixed with {gene_mask}_ instead of the defaults.

None
n_grid int

Number of evenly spaced points over t_range used to locate per-gene derivative extrema. Higher values give more precise timing estimates at modest computational cost.

1001
progress bool

Show a progress bar during spline fitting.

False

Returns:

Type Description
None

Modifies adata in-place.

obs columns (per-cell):

  • n_activation / {gene_mask}_up — number of genes with velocity above the median of all positive velocities.
  • n_repression / {gene_mask}_dw — number of genes with velocity below the median of all negative velocities.
  • transcriptional_flux / {gene_mask}_flux — sum of absolute velocities across genes.

var columns (per-gene, restricted to gene_mask rows when provided):

  • peak_activation_t / {gene_mask}_peak_activation_t — pseudotime of maximum first derivative.
  • peak_repression_t / {gene_mask}_peak_repression_t — pseudotime of minimum first derivative.
  • acceleration_onset_t / {gene_mask}_acceleration_onset_t — pseudotime of maximum second derivative.
  • deceleration_onset_t / {gene_mask}_deceleration_onset_t — pseudotime of minimum second derivative.
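
The per-cell counts and flux can be illustrated on a small velocity matrix. This sketch assumes the medians are pooled over all cells and genes, which the description above does not state explicitly. The function name turnover_counts and the toy numbers are hypothetical.

```python
from statistics import median

def turnover_counts(velocity):
    """velocity[c][g] is dX/dt for cell c and gene g (rows: cells, columns: genes)."""
    flat = [v for row in velocity for v in row]
    pos_median = median(v for v in flat if v > 0)  # median of positive velocities
    neg_median = median(v for v in flat if v < 0)  # median of negative velocities

    n_activation = [sum(v > pos_median for v in row) for row in velocity]
    n_repression = [sum(v < neg_median for v in row) for row in velocity]
    flux = [sum(abs(v) for v in row) for row in velocity]  # total absolute velocity
    return n_activation, n_repression, flux

velocity = [
    [0.5, 2.0, -0.1, -3.0],
    [1.0, 4.0, -1.0, 0.0],
    [3.0, 0.2, -2.0, -0.5],
]
n_up, n_dw, flux = turnover_counts(velocity)
```

Here the positive median is 1.5 and the negative median is -1.0, so a velocity of exactly -1.0 does not count as strong repression.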

real_time

real_time(adata: AnnData, pseudotime_key: str = 'pseudotime', pseudotime_t_range: tuple[float, float] | None = None, periodic: bool | None = None, key_added: str = 'real_time', tmax: float = 100, units: Literal['minutes', 'hours', 'days', 'percent'] = 'percent', bandwidth: float = 1 / 64, algorithm: str = 'auto', kernel: str = 'gaussian', metric: str = 'euclidean', max_grid_size: int = 2 ** 8 + 1, plot_density: bool = False, plot_density_fit: bool = False, plot_density_fit_derivative: bool = False, plot_histogram: bool = False, histogram_nbins: int = 50)

Convert pseudotime to real time by normalising for cell-cycle density.

Fits a density profile along pseudotime (via the density function) and then maps each cell's pseudotime to a real-time value by integrating the inverse of the density curve (area-under-curve normalisation). This corrects for non-uniform sampling across the trajectory so that equal real-time intervals contain proportionally equal numbers of cells.

Parameters:

Name Type Description Default
adata AnnData

Annotated data matrix. Must contain pseudotime values in adata.obs[pseudotime_key].

required
pseudotime_key str

Column in adata.obs with pseudotime values. Default is "pseudotime".

'pseudotime'
pseudotime_t_range tuple of float

(t_min, t_max) domain of the pseudotime axis. When None, inferred from the data via the density function. Default is None.

None
periodic bool

Whether pseudotime is periodic. When None, inferred from adata.uns[pseudotime_key]['periodic'] if available, otherwise False. Default is None.

None
key_added str

Column in adata.obs and key in adata.uns under which the real-time values and metadata are stored. Default is "real_time".

'real_time'
tmax float

Maximum real-time value (upper bound of the output axis). Cells at the very end of the trajectory are mapped to this value. Default is 100.

100
units (minutes, hours, days, percent)

Interpretive label for the real-time axis; stored in adata.uns[key_added]['t_units'] but does not affect the computation. Default is "percent".

"percent"
bandwidth float

Bandwidth for the KDE. Default is 1/64.

1 / 64
algorithm str

Algorithm passed to the KDE back-end. Default is "auto".

'auto'
kernel str

Kernel function for the KDE. Default is "gaussian".

'gaussian'
metric str

Distance metric for the KDE. Default is "euclidean".

'euclidean'
max_grid_size int

Number of grid points for KDE evaluation. Default is 2**8 + 1.

2 ** 8 + 1
plot_density bool

If True, plot the raw KDE. Default is False.

False
plot_density_fit bool

If True, plot the smoothed B-spline fit. Default is False.

False
plot_density_fit_derivative bool

If True, plot the derivative of the B-spline. Default is False.

False
plot_histogram bool

If True, overlay a histogram on the plot. Default is False.

False
histogram_nbins int

Number of histogram bins. Default is 50.

50

Returns:

Type Description
None

Modifies adata in-place:

  • adata.obs[key_added] — real-time values for each cell. Cells outside pseudotime_t_range are assigned NaN.
  • adata.uns[key_added] — dict containing fitting parameters, the B-spline TCK representation, 'tmax', 't_range', 't_units', and 'periodic'.
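
One way to read the area-under-curve normalisation is as a cumulative integral of the density rescaled to [0, tmax], which gives the stated property that equal real-time intervals hold equal shares of cells. The sketch below follows that reading as an assumption; it is not taken from the sclab source. The helpers density_to_real_time and interpolate, and the linear toy density, are hypothetical.

```python
def density_to_real_time(grid, density, tmax=100.0):
    """Map each grid pseudotime to tmax * (cumulative density / total density)."""
    # Trapezoidal cumulative integral of the density over the grid.
    cum = [0.0]
    for i in range(1, len(grid)):
        step = 0.5 * (density[i] + density[i - 1]) * (grid[i] - grid[i - 1])
        cum.append(cum[-1] + step)
    return [tmax * c / cum[-1] for c in cum]

def interpolate(x, xs, ys):
    """Linear interpolation of ys over sorted xs at position x."""
    for i in range(1, len(xs)):
        if x <= xs[i]:
            w = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + w * (ys[i] - ys[i - 1])
    return ys[-1]

# Toy density that is high early in pseudotime and falls linearly to zero.
grid = [k / 100 for k in range(101)]
density = [2.0 - 2.0 * t for t in grid]
real_grid = density_to_real_time(grid, density, tmax=100.0)
halfway = interpolate(0.5, grid, real_grid)  # real time of a cell at pseudotime 0.5
```

Because three quarters of the toy density lies in the first half of pseudotime, a cell at pseudotime 0.5 maps to roughly 75 on a 0 to 100 axis.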

piecewise_rescale

piecewise_rescale(adata: AnnData, time_key: str, groupby: str, groups: Sequence[str], durations: list[float] | dict[str, float], new_key: str = 'real_time', periodic: bool = False, t_range: tuple[float, float] | None = None) -> None

Rescale pseudotime to real-time using piecewise linear mapping.

Parameters:

Name Type Description Default
adata AnnData

Annotated data matrix.

required
time_key str

Key in adata.obs for pseudotime.

required
groupby str

Key in adata.obs for categorical labels used to define intervals.

required
groups Sequence[str]

Ordered list of category labels to include in the scaling. Cells belonging to other categories will be assigned NaN.

required
durations list[float] | dict[str, float]

Durations for each interval defined by groups. If a list, must match number of intervals (len(groups)). If a dictionary, must map category labels to durations.

required
new_key str

Key in adata.obs to store the rescaled real-time values.

'real_time'
periodic bool

Whether the trajectory is periodic.

False
t_range tuple[float, float] | None

Range of pseudotime. If None, inferred from adata.obs[time_key].

None
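
A piecewise linear mapping of this kind can be sketched as follows. How sclab derives the interval boundaries from groupby and groups is not documented above, so the sketch takes explicit pseudotime breakpoints as input. The function name, breakpoints and durations are all hypothetical.

```python
from bisect import bisect_right

def piecewise_linear_rescale(t, breakpoints, durations):
    """Map pseudotime t to real time; interval i spans breakpoints[i]..breakpoints[i + 1]."""
    if not breakpoints[0] <= t <= breakpoints[-1]:
        return float("nan")  # outside every interval
    # Interval containing t; the last breakpoint belongs to the final interval.
    i = min(bisect_right(breakpoints, t) - 1, len(durations) - 1)
    elapsed = sum(durations[:i])  # real time spent in earlier intervals
    frac = (t - breakpoints[i]) / (breakpoints[i + 1] - breakpoints[i])
    return elapsed + frac * durations[i]

breakpoints = [0.0, 0.5, 0.75, 1.0]  # hypothetical pseudotime boundaries
durations = [10.0, 8.0, 4.0]         # hypothetical durations, e.g. in hours
real = [
    piecewise_linear_rescale(t, breakpoints, durations)
    for t in (0.25, 0.5, 0.875, 1.0)
]
```

Within each interval pseudotime is stretched linearly to that interval's duration, and the durations of the preceding intervals are added as an offset.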

Doublet Detection

scrublet

scrublet(adata: AnnData, layer: str = 'X', key_added: str = 'scrublet', total_counts: ndarray | None = None, sim_doublet_ratio: float = 2.0, n_neighbors: int = None, expected_doublet_rate: float = 0.1, stdev_doublet_rate: float = 0.02, random_state: int = 0, scrub_doublets_kwargs: dict[str, Any] = dict(synthetic_doublet_umi_subsampling=1.0, use_approx_neighbors=True, distance_metric='euclidean', get_doublet_neighbor_parents=False, min_counts=3, min_cells=3, min_gene_variability_pctl=85, log_transform=False, mean_center=True, normalize_variance=True, n_prin_comps=30, svd_solver='arpack', verbose=True))

Detect doublet cells using Scrublet.

Simulates synthetic doublets from the observed count matrix and uses a k-NN classifier to assign each cell a doublet score. Cells are then labelled as "singlet" or "doublet".

Requires scrublet to be installed (pip install scrublet).

Parameters:

Name Type Description Default
adata AnnData

Annotated data matrix. Modified in-place.

required
layer str

Layer to use as the count matrix. Use "X" for adata.X. Default is "X".

'X'
key_added str

Prefix for the columns added to adata.obs. Results are stored as {key_added}_score and {key_added}_label. Default is "scrublet".

'scrublet'
total_counts ndarray or None

Pre-computed per-cell total counts. If None, Scrublet computes them internally. Default is None.

None
sim_doublet_ratio float

Number of synthetic doublets to simulate relative to the number of observed cells. Default is 2.0.

2.0
n_neighbors int or None

Number of neighbors used to classify doublets. If None, Scrublet uses a heuristic based on the number of cells. Default is None.

None
expected_doublet_rate float

Expected fraction of doublets in the dataset. Default is 0.1.

0.1
stdev_doublet_rate float

Uncertainty in the expected doublet rate. Default is 0.02.

0.02
random_state int

Random seed for reproducibility. Default is 0.

0
scrub_doublets_kwargs dict

Additional keyword arguments forwarded to scrublet.Scrublet.scrub_doublets.

dict(synthetic_doublet_umi_subsampling=1.0, use_approx_neighbors=True, distance_metric='euclidean', get_doublet_neighbor_parents=False, min_counts=3, min_cells=3, min_gene_variability_pctl=85, log_transform=False, mean_center=True, normalize_variance=True, n_prin_comps=30, svd_solver='arpack', verbose=True)

Returns:

Type Description
None

Adds the following columns to adata.obs:

  • {key_added}_score (float): Doublet score for each cell.
  • {key_added}_label (Categorical): "singlet" or "doublet".
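
The idea of simulating doublets and scoring cells by their neighbours can be shown with a deliberately simplified sketch. It is not Scrublet's algorithm, which normalises counts, reduces dimensionality and applies its own scoring formula. Here the score is simply the fraction of a cell's k nearest neighbours that are synthetic, and all names and toy data are hypothetical.

```python
import math
import random

def simulate_doublets(counts, n_doublets, rng):
    """Create synthetic doublets by summing the counts of two randomly chosen cells."""
    doublets = []
    for _ in range(n_doublets):
        a, b = rng.sample(range(len(counts)), 2)
        doublets.append([x + y for x, y in zip(counts[a], counts[b])])
    return doublets

def knn_doublet_scores(observed, synthetic, k):
    """Score each observed cell by the fraction of its k nearest neighbours that are synthetic."""
    pool = [(cell, False) for cell in observed] + [(cell, True) for cell in synthetic]
    scores = []
    for idx, cell in enumerate(observed):
        neighbours = sorted(
            (math.dist(cell, other), is_synth)
            for j, (other, is_synth) in enumerate(pool)
            if j != idx  # skip the cell itself
        )[:k]
        scores.append(sum(is_synth for _, is_synth in neighbours) / k)
    return scores

rng = random.Random(0)
observed = (
    [[10 + rng.random(), rng.random()] for _ in range(5)]    # toy cell type A
    + [[rng.random(), 10 + rng.random()] for _ in range(5)]  # toy cell type B
    + [[10.5, 10.5]]                                         # one A+B doublet
)
# Two synthetic doublets per observed cell, mirroring sim_doublet_ratio=2.0.
synthetic = simulate_doublets(observed, n_doublets=int(2.0 * len(observed)), rng=rng)
scores = knn_doublet_scores(observed, synthetic, k=3)
```

Cells that sit close to many synthetic doublets receive scores near 1. A threshold on the score would then yield the "singlet" or "doublet" label.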

doubletdetection

doubletdetection(adata: AnnData, layer: str = 'X', key_added: str = 'doubletdetection', boost_rate=0.25, n_components=30, n_top_var_genes=10000, replace=False, clustering_algorithm='phenograph', clustering_kwargs=None, n_iters=10, normalizer=None, pseudocount=0.1, random_state=0, verbose=False, standard_scaling=False, n_jobs=1) -> None

scdblfinder

scdblfinder(adata: AnnData, layer: str = 'X', key_added: str = 'scDblFinder', clusters_col: str | bool | None = None, samples_col: str | None = None, clust_cor: ndarray | int | None = None, artificial_doublets: int | None = None, known_doublets_col: int | None = None, known_use: Literal['discard', 'positive'] = 'discard', dbr: float | None = None, dbr_sd: float | None = None, nfeatures: int = 1352, dims: int = 20, k: int | None = None, remove_unidentifiable: bool = True, include_pcs: int = 19, prop_random=0, prop_markers=0, aggregate_features: bool = False, score: Literal['xgb', 'weighted', 'ratio'] = 'xgb', processing: str = 'default', metric: str = 'logloss', nrounds: float = 0.25, max_depth: int = 4, iter: int = 3, training_features: list[str] | None = None, unident_th: float | None = None, multi_sample_mode: Literal['split', 'singleModel', 'singleModelSplitThres', 'asOne'] = 'split', threshold: bool = True, verbose: bool = True, random_state: int = 31415)

Differential Expression

pseudobulk_edger

pseudobulk_edger(adata_: AnnData, group_key: str, condition_group: str | list[str] | None = None, reference_group: str | None = None, cell_identity_key: str | None = None, batch_key: str | None = None, layer: str | None = None, replicas_per_group: int = 5, min_cells_per_group: int = 30, bootstrap_sampling: bool = False, use_cells: dict[str, list[str]] | None = None, aggregate: bool = True, verbosity: int = 0) -> dict[str, DataFrame]

Fits a model using edgeR and computes top tags for a given condition vs reference group.

Parameters:

Name Type Description Default
adata_ AnnData

Annotated data matrix.

required
group_key str

Key in AnnData object to use to group cells.

required
condition_group str | list[str] | None

Condition group to compare to reference group. If None, each group will be contrasted to the corresponding reference group.

None
reference_group str | None

Reference group to compare condition group(s) to. If None, the condition group is compared to the rest of the cells.

None
cell_identity_key str | None

If provided, separate contrasts will be computed for each identity. Defaults to None.

None
layer str | None

Layer in AnnData object to use. EdgeR requires raw counts. Defaults to None.

None
replicas_per_group int

Number of replicas to create for each group. Defaults to 5.

5
min_cells_per_group int

Minimum number of cells required for a group to be included. Defaults to 30.

30
bootstrap_sampling bool

Whether to use bootstrap sampling to create replicas. Defaults to False.

False
use_cells dict[str, list[str]] | None

If not None, only use the specified cells. Each dictionary key is a categorical variable in the obs dataframe and each value is a list of categories to include. Defaults to None.

None
aggregate bool

Whether to aggregate cells before fitting the model. EdgeR requires a small number of samples, so if adata_ is a single-cell experiment, the cells should be aggregated. Defaults to True.

True
verbosity int

Verbosity level. Defaults to 0.

0

Returns:

Type Description
dict[str, DataFrame]

Dictionary of dataframes, one for each contrast, with the following columns:

  • gene_ids (str): Gene IDs.
  • logFC (float): Log2 fold change.
  • logCPM (float): Log2 counts per million.
  • F (float): F-statistic.
  • PValue (float): p-value.
  • FDR (float): False discovery rate.
  • pct_expr_cnd (float): Percentage of cells in the condition group expressing the gene.
  • pct_expr_ref (float): Percentage of cells in the reference group expressing the gene.

pseudobulk_limma

pseudobulk_limma(adata: AnnData, group_key: str, condition_group: str | list[str] | None = None, reference_group: str | None = None, cell_identity_key: str | None = None, batch_key: str | None = None, layer: str | None = None, replicas_per_group: int = 5, min_cells_per_group: int = 30, bootstrap_sampling: bool = False, use_cells: dict[str, list[str]] | None = None, aggregate: bool = True, verbosity: int = 0) -> dict[str, DataFrame]

Pseudobulk differential expression analysis using limma-voom.

Aggregates single cells into pseudobulk samples, then fits a linear model with limma-voom (via R) and computes top-table statistics for each requested contrast.

Requires R with the packages limma, edgeR, MAST, and SingleCellExperiment, as well as the Python packages rpy2 and anndata2ri.

Parameters:

Name Type Description Default
adata AnnData

Annotated data matrix.

required
group_key str

Column in adata.obs defining the experimental groups.

required
condition_group str or list of str or None

Group(s) to test against reference_group. If None, each group is contrasted with the corresponding reference. Default is None.

None
reference_group str or None

Reference group for contrasts. If None, each condition group is contrasted with all remaining cells. Default is None.

None
cell_identity_key str or None

Column in adata.obs for stratifying contrasts by cell type or identity. Separate DE results are returned per identity. Default is None.

None
batch_key str or None

Column in adata.obs to include as a covariate in the design matrix for batch correction. Default is None.

None
layer str or None

Layer containing raw counts required by limma/edgeR. Uses adata.X if None. Default is None.

None
replicas_per_group int

Number of pseudobulk replicas to create per group. Default is 5.

5
min_cells_per_group int

Minimum number of cells required for a group to be included. Default is 30.

30
bootstrap_sampling bool

If True, use bootstrap sampling when creating pseudobulk replicas. Default is False.

False
use_cells dict or None

Restrict analysis to specific cell subsets. Keys are adata.obs columns and values are lists of categories to include. Default is None.

None
aggregate bool

If True, aggregate cells into pseudobulk samples before fitting. Default is True.

True
verbosity int

Verbosity level (0 = silent). Default is 0.

0

Returns:

Type Description
dict of str to pd.DataFrame

One DataFrame per contrast (keyed by contrast label), with columns:

  • logFC — log2 fold change.
  • AveExpr — average log2 expression.
  • t — moderated t-statistic.
  • P.Value — raw p-value.
  • adj.P.Val — Benjamini-Hochberg adjusted p-value.
  • B — log-odds of differential expression.
  • pct_expr_cnd / pct_expr_ref — fraction of expressing cells in condition/reference group.
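
The aggregation step shared by pseudobulk_edger and pseudobulk_limma can be pictured as summing the raw counts of cells into a few replicas per group. The exact sampling scheme used by sclab is not described above, so the sketch below is an assumption: without bootstrap, cells are shuffled and dealt into disjoint replicas; with bootstrap, each replica resamples cells with replacement. The function name and toy counts are hypothetical.

```python
import random

def make_pseudobulk_replicas(counts, n_replicas, bootstrap=False, seed=0):
    """Sum per-cell count vectors into n_replicas pseudobulk samples for one group."""
    rng = random.Random(seed)
    n_cells = len(counts)
    if bootstrap:
        # Each replica draws n_cells cells with replacement.
        assignments = [
            [rng.randrange(n_cells) for _ in range(n_cells)]
            for _ in range(n_replicas)
        ]
    else:
        # Shuffle once, then deal cells into disjoint replicas.
        order = list(range(n_cells))
        rng.shuffle(order)
        assignments = [order[r::n_replicas] for r in range(n_replicas)]
    n_genes = len(counts[0])
    return [
        [sum(counts[c][g] for c in cells) for g in range(n_genes)]
        for cells in assignments
    ]

counts = [[c, 2 * c, 1] for c in range(30)]  # 30 toy cells x 3 genes
replicas = make_pseudobulk_replicas(counts, n_replicas=5)
```

With 30 cells and 5 replicas, each disjoint replica sums 6 cells. This is the situation the min_cells_per_group default of 30 guards: a group with very few cells would produce replicas built from only one or two cells each.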