Tools
Advanced analysis functions available via `sclab.tools`.
Pseudotime & Trajectory (cellflow)
pseudotime
pseudotime(adata: AnnData, use_rep: str, t_key: str, t_range: tuple[float, float], n_dims: int = 10, min_snr: float = 0.25, periodic: bool = False, method: Literal['fourier', 'splines'] = 'splines', largest_harmonic: int = 5, roughness: float | None = None, key_added='pseudotime') -> PseudotimeResult
Compute pseudotime ordering for cells by fitting a curve through a low-dimensional embedding.
Fits either a Fourier series or smoothing spline to a reduced-dimensional representation of the data, then projects each cell onto the nearest point along the fitted curve. The arc-length along that curve is used as the pseudotime coordinate, normalised to the range [0, 1].
Parameters:
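| Name | Type | Description | Default |
|---|---|---|---|
| `adata` | `AnnData` | Annotated data matrix. Must contain the embedding named by `use_rep` and the time values named by `t_key`. | required |
| `use_rep` | `str` | Key in `adata.obsm` of the low-dimensional embedding through which the curve is fitted. | required |
| `t_key` | `str` | Key in `adata.obs` of the per-cell time values used to select the cells for the fit. | required |
| `t_range` | `tuple[float, float]` | Range `(t_min, t_max)` of `t_key` values; only cells inside it are used for the fit. | required |
| `n_dims` | `int` | Maximum number of embedding dimensions to use for the curve fit. Default is 10. | `10` |
| `min_snr` | `float` | Minimum signal-to-noise ratio (relative to the dimension with the highest SNR) required to include a dimension in the fit. Dimensions below this threshold are discarded. Default is 0.25. | `0.25` |
| `periodic` | `bool` | If `True`, fit a closed curve for a periodic trajectory (e.g. the cell cycle). | `False` |
| `method` | `{'splines', 'fourier'}` | Curve-fitting method: a smoothing spline or a Fourier series. | `'splines'` |
| `largest_harmonic` | `int` | Highest harmonic to include when `method='fourier'`. | `5` |
| `roughness` | `float` or `None` | Roughness penalty for the smoothing spline when `method='splines'`. | `None` |
| `key_added` | `str` | Base key under which results are stored. Default is `'pseudotime'`. | `'pseudotime'` |

Returns:

| Type | Description |
|---|---|
| `PseudotimeResult` | A named tuple holding the fit results. |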
Notes
Results for cells outside `t_range` are stored as NaN in `adata.obs`. The curve is fitted only on cells whose `t_key` value lies within `[t_min, t_max]`.
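The projection and arc-length steps described above can be illustrated with NumPy. This is a minimal sketch under stated assumptions, not sclab's implementation: it assumes the fitted curve (Fourier series or spline) has already been sampled densely into a polyline, it finds each cell's nearest curve point by brute force, and the helper name `arc_length_pseudotime` is hypothetical.

```python
import numpy as np


def arc_length_pseudotime(X, curve, t, t_range):
    """Hypothetical helper: project cells onto a sampled curve.

    X       : (n_cells, n_dims) embedding coordinates of the cells
    curve   : (n_points, n_dims) points sampled densely along a fitted curve
    t       : (n_cells,) time values playing the role of ``t_key``
    t_range : (t_min, t_max); cells outside it get NaN
    """
    # Cumulative arc length along the polyline, normalised to [0, 1].
    seg = np.linalg.norm(np.diff(curve, axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(seg)])
    arc /= arc[-1]

    # Nearest curve point for every cell (brute force; fine for a sketch).
    d2 = ((X[:, None, :] - curve[None, :, :]) ** 2).sum(axis=-1)
    pt = arc[d2.argmin(axis=1)]

    # Cells outside t_range take no part in the fit and are reported as NaN.
    t_min, t_max = t_range
    return np.where((t < t_min) | (t > t_max), np.nan, pt)


# Toy example: a straight "curve" from (0, 0) to (1, 0).
curve = np.column_stack([np.linspace(0, 1, 101), np.zeros(101)])
X = np.array([[0.0, 0.1], [0.5, -0.2], [1.0, 0.0], [0.3, 0.0]])
t = np.array([0.0, 0.5, 1.0, 2.0])  # the last cell lies outside t_range
pt = arc_length_pseudotime(X, curve, t, t_range=(0.0, 1.0))
```

Each cell takes the arc length of its nearest curve point, so a cell lying off the curve still receives a pseudotime; the NaN for the last cell mirrors the `t_range` behaviour described in the Notes.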
density_dynamics
density_dynamics(adata: AnnData, time_key: str = 'pseudotime', t_range: tuple[float, float] | None = None, periodic: bool | None = None, bandwidth: float = 1 / 64, algorithm: str = 'auto', kernel: str = 'gaussian', metric: str = 'euclidean', max_grid_size: int = 2 ** 8 + 1, derivative: int = 0, mode: Literal['peaks', 'valleys'] = 'peaks', find_peaks_kwargs: dict = {}, plot_density: bool = False, plot_density_fit: bool = False, plot_density_fit_derivative: bool = False, plot_histogram: bool = False, histogram_nbins: int = 50)
Detect density peaks or valleys along pseudotime via B-spline fitting.
Fits a KDE to the pseudotime distribution, smooths it with a B-spline, optionally takes a derivative of the spline, and identifies peaks (or valleys) using `scipy.signal.find_peaks`. Detected peak times, heights, and inter-peak durations are stored in `adata.uns`.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `adata` | `AnnData` | Annotated data matrix. Must contain pseudotime values in `adata.obs[time_key]`. | required |
| `time_key` | `str` | Column in `adata.obs` holding the pseudotime values. | `'pseudotime'` |
| `t_range` | `tuple of float` | Pseudotime range `(t_min, t_max)`. | `None` |
| `periodic` | `bool` | Whether pseudotime is periodic. When `None`, inferred from `adata`. | `None` |
| `bandwidth` | `float` | Bandwidth for the KDE. Default is `1 / 64`. | `1 / 64` |
| `algorithm` | `str` | Algorithm passed to the KDE back-end. Default is `'auto'`. | `'auto'` |
| `kernel` | `str` | Kernel function for the KDE. Default is `'gaussian'`. | `'gaussian'` |
| `metric` | `str` | Distance metric for the KDE. Default is `'euclidean'`. | `'euclidean'` |
| `max_grid_size` | `int` | Number of grid points for KDE evaluation. Default is `2 ** 8 + 1` (257). | `2 ** 8 + 1` |
| `derivative` | `int` | Order of the B-spline derivative to analyse; `0` analyses the smoothed density itself. | `0` |
| `mode` | `{'peaks', 'valleys'}` | Whether to detect peaks or valleys in the (derivative of the) density. Default is `'peaks'`. | `'peaks'` |
| `find_peaks_kwargs` | `dict` | Extra keyword arguments forwarded to `scipy.signal.find_peaks`. | `{}` |
| `plot_density` | `bool` | If True, plot the raw KDE. Default is False. | `False` |
| `plot_density_fit` | `bool` | If True, plot the smoothed B-spline fit. Default is False. | `False` |
| `plot_density_fit_derivative` | `bool` | If True, plot the derivative of the B-spline. Default is False. | `False` |
| `plot_histogram` | `bool` | If True, overlay a histogram on the plot. Default is False. | `False` |
| `histogram_nbins` | `int` | Number of histogram bins. Default is `50`. | `50` |

Returns:

| Type | Description |
|---|---|
| `None` | Modifies `adata` in-place. Detected peak times, heights, and inter-peak durations are stored in `adata.uns`. |
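The peak search can be sketched with NumPy alone. The sketch assumes pseudotime scaled to [0, 1] and a Gaussian KDE, omits the B-spline smoothing and the `derivative` option, and swaps `scipy.signal.find_peaks` for a plain comparison with the two neighbouring grid points; `density_peaks` is a hypothetical helper, not part of sclab.

```python
import numpy as np


def density_peaks(pt, bandwidth=1 / 64, n_grid=257, periodic=True):
    """Hypothetical helper: KDE peaks on a pseudotime axis scaled to [0, 1]."""
    pt = np.asarray(pt, dtype=float)
    # On a periodic axis, 0 and 1 are the same point, so drop the endpoint.
    grid = np.linspace(0.0, 1.0, n_grid, endpoint=not periodic)
    diff = grid[:, None] - pt[None, :]
    if periodic:
        diff = (diff + 0.5) % 1.0 - 0.5  # wrap distances around the circle
    # Gaussian KDE evaluated on the grid.
    dens = np.exp(-0.5 * (diff / bandwidth) ** 2).sum(axis=1)
    dens /= pt.size * bandwidth * np.sqrt(2.0 * np.pi)

    # Strict local maxima: higher than both neighbours (wrapping if periodic).
    if periodic:
        is_peak = (dens > np.roll(dens, 1)) & (dens > np.roll(dens, -1))
    else:
        is_peak = np.zeros(dens.shape, dtype=bool)
        is_peak[1:-1] = (dens[1:-1] > dens[:-2]) & (dens[1:-1] > dens[2:])

    peak_times = grid[is_peak]
    peak_heights = dens[is_peak]
    durations = np.diff(peak_times)
    if periodic and peak_times.size > 0:
        # The last interval wraps from the final peak back to the first one.
        durations = np.append(durations, 1.0 - peak_times[-1] + peak_times[0])
    return peak_times, peak_heights, durations


# Two clusters of cells: a larger one near 0.25 and a smaller one near 0.75.
pt = np.concatenate([
    0.25 + 0.02 * np.linspace(-1, 1, 200),
    0.75 + 0.02 * np.linspace(-1, 1, 100),
])
peak_times, peak_heights, durations = density_peaks(pt)
```

On a periodic axis the inter-peak durations include the gap that wraps from the last peak back to the first, so they add up to the full period.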
expression_dynamics
expression_dynamics(adata: AnnData, time_key: str, t_range: tuple[float, float] | None = None, periodic: bool | None = None, layer: str | None = None, gene_mask: str | None = None, n_grid: int = 1001, progress: bool = False)
Compute per-cell gene turnover from expression dynamics over pseudotime.
Fits a smooth B-spline to the expression matrix over pseudotime, takes the analytical derivative (dX/dt), then counts the number of genes with high activation (rate > median of positives) and high repression (rate < median of negatives) for each cell.
Additionally computes per-gene timing summaries (pseudotime of peak activation, peak repression, acceleration onset, and deceleration onset) and a per-cell transcriptional flux (total absolute velocity across genes).
Parameters:
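| Name | Type | Description | Default |
|---|---|---|---|
| `adata` | `AnnData` | Annotated data matrix. Must contain pseudotime values in `adata.obs[time_key]`. | required |
| `time_key` | `str` | Column in `adata.obs` holding the pseudotime values. | required |
| `t_range` | `tuple[float, float]` or `None` | Min and max pseudotime for the spline domain. If `None`, inferred from `adata`. | `None` |
| `periodic` | `bool` or `None` | Whether pseudotime is periodic (e.g. cell cycle). If `None`, inferred from `adata`. | `None` |
| `layer` | `str` or `None` | Layer in `adata.layers` holding the expression values to fit. | `None` |
| `gene_mask` | `str` or `None` | Boolean column in `adata.var` selecting the genes to analyse. | `None` |
| `n_grid` | `int` | Number of evenly spaced points over `t_range` at which the spline is evaluated. | `1001` |
| `progress` | `bool` | Show a progress bar during spline fitting. | `False` |

Returns:

| Type | Description |
|---|---|
| `None` | Modifies `adata` in-place. Adds per-cell columns to `adata.obs` (activation and repression gene counts, transcriptional flux) and per-gene columns to `adata.var` (pseudotime of peak activation, peak repression, acceleration onset, and deceleration onset; restricted to `gene_mask` rows when provided). |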
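The per-cell counting rule can be illustrated on a matrix of rates that is assumed to be already available. In `expression_dynamics` the rates come from the analytical derivative of the fitted B-spline; here they are typed in by hand. The description above does not say whether the medians are taken per gene or over the whole matrix, so this sketch assumes per-gene thresholds, and `turnover_counts` is a hypothetical name.

```python
import warnings

import numpy as np


def turnover_counts(V):
    """Hypothetical helper: per-cell activation/repression counts and flux.

    V : (n_cells, n_genes) expression rates dX/dt evaluated at each cell's
        pseudotime. Thresholds are computed per gene (an assumption).
    """
    V = np.asarray(V, dtype=float)
    with warnings.catch_warnings():
        # nanmedian warns for genes that have no positive (or negative) rates.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        up_thr = np.nanmedian(np.where(V > 0, V, np.nan), axis=0)
        down_thr = np.nanmedian(np.where(V < 0, V, np.nan), axis=0)
    # NaN thresholds compare as False, so such genes are never counted.
    n_activated = (V > up_thr).sum(axis=1)
    n_repressed = (V < down_thr).sum(axis=1)
    flux = np.abs(V).sum(axis=1)  # total absolute velocity across genes
    return n_activated, n_repressed, flux


# Four cells x two genes of toy rates.
V = np.array([
    [2.0, -1.0],
    [1.0, -3.0],
    [3.0, 2.0],
    [-1.0, -2.0],
])
n_activated, n_repressed, flux = turnover_counts(V)
```

With these rates the per-gene activation thresholds are 2 and 2 and the repression thresholds are -1 and -2, so only the third cell counts an activated gene and only the second cell counts a repressed one.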
real_time
real_time(adata: AnnData, pseudotime_key: str = 'pseudotime', pseudotime_t_range: tuple[float, float] | None = None, periodic: bool | None = None, key_added: str = 'real_time', tmax: float = 100, units: Literal['minutes', 'hours', 'days', 'percent'] = 'percent', bandwidth: float = 1 / 64, algorithm: str = 'auto', kernel: str = 'gaussian', metric: str = 'euclidean', max_grid_size: int = 2 ** 8 + 1, plot_density: bool = False, plot_density_fit: bool = False, plot_density_fit_derivative: bool = False, plot_histogram: bool = False, histogram_nbins: int = 50)
Convert pseudotime to real time by normalising for cell-cycle density.
Fits a density profile along pseudotime (via `density`) and then maps each cell's pseudotime to a real-time value by integrating the density curve (area-under-curve normalisation). This corrects for non-uniform sampling across the trajectory so that equal real-time intervals contain proportionally equal numbers of cells.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `adata` | `AnnData` | Annotated data matrix. Must contain pseudotime values in `adata.obs[pseudotime_key]`. | required |
| `pseudotime_key` | `str` | Column in `adata.obs` holding the pseudotime values. | `'pseudotime'` |
| `pseudotime_t_range` | `tuple of float` | Pseudotime range `(t_min, t_max)`. | `None` |
| `periodic` | `bool` | Whether pseudotime is periodic. When `None`, inferred from `adata`. | `None` |
| `key_added` | `str` | Column in `adata.obs` where the real-time values are stored. | `'real_time'` |
| `tmax` | `float` | Maximum real-time value (upper bound of the output axis). Cells at the very end of the trajectory are mapped to this value. Default is `100`. | `100` |
| `units` | `{'minutes', 'hours', 'days', 'percent'}` | Interpretive label for the real-time axis; stored in `adata`. | `'percent'` |
| `bandwidth` | `float` | Bandwidth for the KDE. Default is `1 / 64`. | `1 / 64` |
| `algorithm` | `str` | Algorithm passed to the KDE back-end. Default is `'auto'`. | `'auto'` |
| `kernel` | `str` | Kernel function for the KDE. Default is `'gaussian'`. | `'gaussian'` |
| `metric` | `str` | Distance metric for the KDE. Default is `'euclidean'`. | `'euclidean'` |
| `max_grid_size` | `int` | Number of grid points for KDE evaluation. Default is `2 ** 8 + 1` (257). | `2 ** 8 + 1` |
| `plot_density` | `bool` | If True, plot the raw KDE. Default is False. | `False` |
| `plot_density_fit` | `bool` | If True, plot the smoothed B-spline fit. Default is False. | `False` |
| `plot_density_fit_derivative` | `bool` | If True, plot the derivative of the B-spline. Default is False. | `False` |
| `plot_histogram` | `bool` | If True, overlay a histogram on the plot. Default is False. | `False` |
| `histogram_nbins` | `int` | Number of histogram bins. Default is `50`. | `50` |

Returns:

| Type | Description |
|---|---|
| `None` | Modifies `adata` in-place; real-time values are written to `adata.obs[key_added]`. |
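A rough NumPy sketch of the area-under-curve mapping, assuming a non-periodic pseudotime in [0, 1], a plain Gaussian KDE in place of sclab's KDE and B-spline fit, and a hypothetical helper `pseudotime_to_real_time`. Each cell's real time is `tmax` times the fraction of the density area to the left of its pseudotime, which is what makes equal real-time intervals hold roughly equal numbers of cells.

```python
import numpy as np


def pseudotime_to_real_time(pt, tmax=100.0, bandwidth=1 / 64, n_grid=257):
    """Hypothetical helper: map non-periodic pseudotime in [0, 1] to [0, tmax]."""
    pt = np.asarray(pt, dtype=float)
    grid = np.linspace(0.0, 1.0, n_grid)
    # Gaussian KDE of the cells along pseudotime (unnormalised is fine here).
    dens = np.exp(-0.5 * ((grid[:, None] - pt[None, :]) / bandwidth) ** 2).sum(axis=1)
    # Cumulative area under the density (trapezoid rule), rescaled to [0, tmax].
    steps = 0.5 * (dens[1:] + dens[:-1]) * np.diff(grid)
    area = np.concatenate([[0.0], np.cumsum(steps)])
    area *= tmax / area[-1]
    # Each cell's real time is the scaled area to the left of its pseudotime.
    return np.interp(pt, grid, area)


# 800 cells crowd the first 20% of pseudotime; 200 cells spread over the rest.
pt = np.concatenate([np.linspace(0.0, 0.2, 800), np.linspace(0.2, 1.0, 200)])
rt = pseudotime_to_real_time(pt)
```

Because most of the toy cells sit in the first fifth of pseudotime, that stretch is expanded to well over half of the real-time axis, while the sparsely populated remainder is compressed.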
piecewise_rescale
piecewise_rescale(adata: AnnData, time_key: str, groupby: str, groups: Sequence[str], durations: list[float] | dict[str, float], new_key: str = 'real_time', periodic: bool = False, t_range: tuple[float, float] | None = None) -> None
Rescale pseudotime to real time using a piecewise linear mapping.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `adata` | `AnnData` | Annotated data matrix. | required |
| `time_key` | `str` | Key in `adata.obs` holding the pseudotime values. | required |
| `groupby` | `str` | Key in `adata.obs` with the categorical labels that define the intervals. | required |
| `groups` | `Sequence[str]` | Ordered list of category labels to include in the scaling. Cells belonging to other categories are assigned NaN. | required |
| `durations` | `list[float]` or `dict[str, float]` | Durations for each interval defined by `groups`, either as a list in the same order or as a dict keyed by group label. | required |
| `new_key` | `str` | Key in `adata.obs` where the rescaled values are stored. | `'real_time'` |
| `periodic` | `bool` | Whether the trajectory is periodic. | `False` |
| `t_range` | `tuple[float, float]` or `None` | Range of pseudotime. If `None`, inferred from `adata`. | `None` |
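How the interval boundaries are derived from `groups` is not specified above. The sketch below assumes each group starts at its smallest pseudotime and the last group ends at its largest, then interpolates linearly between cumulative durations. It handles only the non-periodic case, and both `piecewise_linear_rescale` and the cell-cycle phase labels are hypothetical.

```python
import numpy as np


def piecewise_linear_rescale(t, labels, groups, durations):
    """Hypothetical helper: map pseudotime to real time, one group per segment.

    Assumption (not taken from sclab): group g spans from its own minimum
    pseudotime to the next group's minimum, and the last group ends at its
    maximum. Only the non-periodic case is handled.
    """
    t = np.asarray(t, dtype=float)
    labels = np.asarray(labels)
    if isinstance(durations, dict):
        durations = [durations[g] for g in groups]
    starts = [t[labels == g].min() for g in groups]
    knots_t = np.array(starts + [t[labels == groups[-1]].max()])
    knots_real = np.concatenate([[0.0], np.cumsum(durations)])
    out = np.interp(t, knots_t, knots_real)
    out[~np.isin(labels, groups)] = np.nan  # categories not in `groups`
    return out


t = [0.0, 0.1, 0.3, 0.4, 0.6, 1.0, 0.5]
labels = ["G1", "G1", "S", "S", "G2M", "G2M", "other"]
rt = piecewise_linear_rescale(t, labels, ["G1", "S", "G2M"],
                              {"G1": 10.0, "S": 8.0, "G2M": 4.0})
```

Group boundaries land exactly on the cumulative durations (0, 10, 18, 22), and the cell labelled `"other"` receives NaN.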
Doublet Detection
scrublet
scrublet(adata: AnnData, layer: str = 'X', key_added: str = 'scrublet', total_counts: ndarray | None = None, sim_doublet_ratio: float = 2.0, n_neighbors: int = None, expected_doublet_rate: float = 0.1, stdev_doublet_rate: float = 0.02, random_state: int = 0, scrub_doublets_kwargs: dict[str, Any] = dict(synthetic_doublet_umi_subsampling=1.0, use_approx_neighbors=True, distance_metric='euclidean', get_doublet_neighbor_parents=False, min_counts=3, min_cells=3, min_gene_variability_pctl=85, log_transform=False, mean_center=True, normalize_variance=True, n_prin_comps=30, svd_solver='arpack', verbose=True))
Detect doublet cells using Scrublet.
Simulates synthetic doublets from the observed count matrix and uses a k-NN classifier to assign each cell a doublet score. Cells are then labelled as `"singlet"` or `"doublet"`.
Requires `scrublet` to be installed (`pip install scrublet`).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `adata` | `AnnData` | Annotated data matrix. Modified in-place. | required |
| `layer` | `str` | Layer to use as the count matrix. Use `'X'` for `adata.X`. | `'X'` |
| `key_added` | `str` | Prefix for the columns added to `adata.obs`. | `'scrublet'` |
| `total_counts` | `ndarray` or `None` | Pre-computed per-cell total counts. If None, Scrublet computes them internally. Default is None. | `None` |
| `sim_doublet_ratio` | `float` | Number of synthetic doublets to simulate relative to the number of observed cells. Default is 2.0. | `2.0` |
| `n_neighbors` | `int` or `None` | Number of neighbors used to classify doublets. If None, Scrublet uses a heuristic based on the number of cells. Default is None. | `None` |
| `expected_doublet_rate` | `float` | Expected fraction of doublets in the dataset. Default is 0.1. | `0.1` |
| `stdev_doublet_rate` | `float` | Uncertainty in the expected doublet rate. Default is 0.02. | `0.02` |
| `random_state` | `int` | Random seed for reproducibility. Default is 0. | `0` |
| `scrub_doublets_kwargs` | `dict` | Additional keyword arguments forwarded to `scrublet.Scrublet.scrub_doublets`. | `dict(synthetic_doublet_umi_subsampling=1.0, use_approx_neighbors=True, distance_metric='euclidean', get_doublet_neighbor_parents=False, min_counts=3, min_cells=3, min_gene_variability_pctl=85, log_transform=False, mean_center=True, normalize_variance=True, n_prin_comps=30, svd_solver='arpack', verbose=True)` |

Returns:

| Type | Description |
|---|---|
| `None` | Adds doublet-score and singlet/doublet label columns, prefixed with `key_added`, to `adata.obs`. |
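The idea behind the score can be illustrated in a few lines of NumPy: simulate doublets by adding the counts of two random observed cells, embed observed and simulated cells together, and score each observed cell by the fraction of simulated doublets among its nearest neighbours. This toy version is not Scrublet's algorithm; it skips normalisation, gene filtering, PCA, Scrublet's score correction and its automatic threshold. The helper `toy_doublet_scores` and the fixed 0.5 cut-off are assumptions.

```python
import numpy as np


def toy_doublet_scores(counts, sim_doublet_ratio=2.0, n_neighbors=10, random_state=0):
    """Hypothetical helper: fraction of simulated doublets among each cell's k-NN."""
    rng = np.random.default_rng(random_state)
    n_obs = counts.shape[0]
    n_sim = int(round(sim_doublet_ratio * n_obs))
    # Synthetic doublets: add the counts of two distinct, randomly chosen cells.
    first = rng.integers(0, n_obs, size=n_sim)
    second = (first + rng.integers(1, n_obs, size=n_sim)) % n_obs
    doublets = counts[first] + counts[second]

    # Shared space for observed and synthetic cells (log1p; no PCA here).
    X = np.log1p(np.vstack([counts, doublets]))
    is_sim = np.r_[np.zeros(n_obs, dtype=bool), np.ones(n_sim, dtype=bool)]

    # Brute-force k nearest neighbours of each observed cell, excluding itself.
    d2 = ((X[:n_obs, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    d2[np.arange(n_obs), np.arange(n_obs)] = np.inf
    nn = np.argsort(d2, axis=1)[:, :n_neighbors]
    return is_sim[nn].mean(axis=1)


# Toy data: 50 cells of type A, 50 of type B, and 5 real A+B doublets.
rng = np.random.default_rng(1)
profile_a = np.r_[np.full(5, 20.0), np.zeros(5)]
profile_b = np.r_[np.zeros(5), np.full(5, 20.0)]
counts = np.vstack([
    rng.poisson(profile_a, size=(50, 10)),
    rng.poisson(profile_b, size=(50, 10)),
    rng.poisson(profile_a + profile_b, size=(5, 10)),
]).astype(float)
scores = toy_doublet_scores(counts)
# Fixed cut-off for illustration only; Scrublet chooses its own threshold.
labels = np.where(scores > 0.5, "doublet", "singlet")
```

A real doublet has at most four other real doublets nearby, so at least six of its ten neighbours are simulated and it scores at least 0.6, while singlets sit among other singlets of their own type.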
doubletdetection
doubletdetection(adata: AnnData, layer: str = 'X', key_added: str = 'doubletdetection', boost_rate=0.25, n_components=30, n_top_var_genes=10000, replace=False, clustering_algorithm='phenograph', clustering_kwargs=None, n_iters=10, normalizer=None, pseudocount=0.1, random_state=0, verbose=False, standard_scaling=False, n_jobs=1) -> None
scdblfinder
scdblfinder(adata: AnnData, layer: str = 'X', key_added: str = 'scDblFinder', clusters_col: str | bool | None = None, samples_col: str | None = None, clust_cor: ndarray | int | None = None, artificial_doublets: int | None = None, known_doublets_col: int | None = None, known_use: Literal['discard', 'positive'] = 'discard', dbr: float | None = None, dbr_sd: float | None = None, nfeatures: int = 1352, dims: int = 20, k: int | None = None, remove_unidentifiable: bool = True, include_pcs: int = 19, prop_random=0, prop_markers=0, aggregate_features: bool = False, score: Literal['xgb', 'weighted', 'ratio'] = 'xgb', processing: str = 'default', metric: str = 'logloss', nrounds: float = 0.25, max_depth: int = 4, iter: int = 3, training_features: list[str] | None = None, unident_th: float | None = None, multi_sample_mode: Literal['split', 'singleModel', 'singleModelSplitThres', 'asOne'] = 'split', threshold: bool = True, verbose: bool = True, random_state: int = 31415)
Differential Expression
pseudobulk_edger
pseudobulk_edger(adata_: AnnData, group_key: str, condition_group: str | list[str] | None = None, reference_group: str | None = None, cell_identity_key: str | None = None, batch_key: str | None = None, layer: str | None = None, replicas_per_group: int = 5, min_cells_per_group: int = 30, bootstrap_sampling: bool = False, use_cells: dict[str, list[str]] | None = None, aggregate: bool = True, verbosity: int = 0) -> dict[str, DataFrame]
Fits a model using edgeR and computes top tags for a given condition vs reference group.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `adata_` | `AnnData` | Annotated data matrix. | required |
| `group_key` | `str` | Column in `adata_.obs` used to group cells. | required |
| `condition_group` | `str`, `list[str]`, or `None` | Condition group(s) to compare with the reference group. If None, every group is contrasted with the reference group. | `None` |
| `reference_group` | `str` or `None` | Reference group to compare condition group(s) to. If None, the condition group is compared to the rest of the cells. | `None` |
| `cell_identity_key` | `str` or `None` | If provided, separate contrasts are computed for each identity. Defaults to None. | `None` |
| `layer` | `str` or `None` | Layer in `adata_.layers` to use. edgeR requires raw counts. Defaults to None. | `None` |
| `replicas_per_group` | `int` | Number of replicas to create for each group. Defaults to 5. | `5` |
| `min_cells_per_group` | `int` | Minimum number of cells required for a group to be included. Defaults to 30. | `30` |
| `bootstrap_sampling` | `bool` | Whether to use bootstrap sampling to create replicas. Defaults to False. | `False` |
| `use_cells` | `dict[str, list[str]]` or `None` | If not None, only use the specified cells: each key is a categorical column in `adata_.obs` and each value is a list of categories to include. Defaults to None. | `None` |
| `aggregate` | `bool` | Whether to aggregate cells into pseudobulk samples before fitting the model. edgeR is designed for a small number of samples, so single-cell data should be aggregated. Defaults to True. | `True` |
| `verbosity` | `int` | Verbosity level. Defaults to 0. | `0` |

Returns:

| Type | Description |
|---|---|
| `dict[str, DataFrame]` | Dictionary of DataFrames, one per contrast, holding the edgeR top-tags results. |
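As the `aggregate` parameter notes, single cells are summed into pseudobulk samples before the edgeR fit. The NumPy sketch below shows one plausible way to build those samples from the `replicas_per_group`, `min_cells_per_group`, and `bootstrap_sampling` settings; the splitting rule, the bootstrap sample size, and the helper name `make_pseudobulk` are assumptions, not sclab's implementation.

```python
import numpy as np


def make_pseudobulk(counts, labels, replicas_per_group=5, min_cells_per_group=30,
                    bootstrap_sampling=False, random_state=0):
    """Hypothetical helper: sum raw counts into pseudo-replicates per group.

    Returns (matrix, sample_groups) with one row of summed counts per replicate.
    """
    rng = np.random.default_rng(random_state)
    labels = np.asarray(labels)
    rows, sample_groups = [], []
    for g in np.unique(labels):
        idx = np.flatnonzero(labels == g)
        if idx.size < min_cells_per_group:
            continue  # too few cells: the group is left out
        if bootstrap_sampling:
            # Each replicate resamples the group's cells with replacement
            # (sample size per replicate is an assumption).
            size = idx.size // replicas_per_group
            parts = [rng.choice(idx, size=size, replace=True)
                     for _ in range(replicas_per_group)]
        else:
            # Disjoint random split of the group's cells.
            parts = np.array_split(rng.permutation(idx), replicas_per_group)
        for part in parts:
            rows.append(counts[part].sum(axis=0))
            sample_groups.append(g)
    return np.vstack(rows), np.array(sample_groups)


counts = np.ones((85, 3))
labels = ["ctrl"] * 40 + ["treat"] * 35 + ["rare"] * 10
pb, groups = make_pseudobulk(counts, labels)
```

Here the `"rare"` group falls below `min_cells_per_group` and is dropped, and with the disjoint split every `"ctrl"` cell is counted exactly once across its five replicates.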
pseudobulk_limma
pseudobulk_limma(adata: AnnData, group_key: str, condition_group: str | list[str] | None = None, reference_group: str | None = None, cell_identity_key: str | None = None, batch_key: str | None = None, layer: str | None = None, replicas_per_group: int = 5, min_cells_per_group: int = 30, bootstrap_sampling: bool = False, use_cells: dict[str, list[str]] | None = None, aggregate: bool = True, verbosity: int = 0) -> dict[str, DataFrame]
Pseudobulk differential expression analysis using limma-voom.
Aggregates single cells into pseudobulk samples, then fits a linear model with limma-voom (via R) and computes top-table statistics for each requested contrast.
Requires R with the packages `limma`, `edgeR`, `MAST`, and `SingleCellExperiment`, as well as the Python packages `rpy2` and `anndata2ri`.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `adata` | `AnnData` | Annotated data matrix. | required |
| `group_key` | `str` | Column in `adata.obs` used to group cells. | required |
| `condition_group` | `str`, list of `str`, or `None` | Group(s) to test against `reference_group`. | `None` |
| `reference_group` | `str` or `None` | Reference group for contrasts. If None, each condition group is contrasted with all remaining cells. Default is None. | `None` |
| `cell_identity_key` | `str` or `None` | Column in `adata.obs` with cell identities; if provided, separate contrasts are computed for each identity. | `None` |
| `batch_key` | `str` or `None` | Column in `adata.obs` with batch labels. | `None` |
| `layer` | `str` or `None` | Layer containing the raw counts required by limma/edgeR. | `None` |
| `replicas_per_group` | `int` | Number of pseudobulk replicas to create per group. Default is 5. | `5` |
| `min_cells_per_group` | `int` | Minimum number of cells required for a group to be included. Default is 30. | `30` |
| `bootstrap_sampling` | `bool` | If True, use bootstrap sampling when creating pseudobulk replicas. Default is False. | `False` |
| `use_cells` | `dict` or `None` | Restrict analysis to specific cell subsets. Keys are categorical columns in `adata.obs` and values are lists of categories to include. | `None` |
| `aggregate` | `bool` | If True, aggregate cells into pseudobulk samples before fitting. Default is True. | `True` |
| `verbosity` | `int` | Verbosity level (0 = silent). Default is 0. | `0` |

Returns:

| Type | Description |
|---|---|
| `dict` of `str` to `pd.DataFrame` | One DataFrame per contrast (keyed by contrast label) with the limma top-table statistics. |
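limma-voom first converts counts to log2 counts per million with small offsets, then estimates precision weights from the mean-variance trend before the linear model fit. The NumPy sketch below covers only the first step, using the voom offsets (0.5 added to each count, 1 added to each library size); it ignores normalisation factors and precision weights, and `voom_logcpm` is a hypothetical name.

```python
import numpy as np


def voom_logcpm(counts):
    """Hypothetical helper: log2((r + 0.5) / (R + 1) * 1e6), as in voom.

    counts : (n_samples, n_genes) raw pseudobulk counts; R is each sample's
             library size (no TMM normalisation factors in this sketch).
    """
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=1, keepdims=True)
    return np.log2((counts + 0.5) / (lib_size + 1.0) * 1e6)


pb = np.array([[0, 999_999], [10, 90]])
y = voom_logcpm(pb)
```

The offsets keep zero counts finite: in the first sample the library size plus one is exactly one million, so the zero count maps to log2(0.5) = -1.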