Functions
- build_reference_workflow
- infer_query_workflow
- list_reference_panels
- generate_infer_report
- generate_reference_report
build_reference_workflow
Description
The build_reference_workflow function constructs a reference panel for cell-cell communication analysis. It processes reference count data, quantifies and ranks it, and prepares the necessary inputs for downstream CCC analysis. Additionally, the function saves the processed reference configuration and relevant files for future use.
- Reads and preprocesses reference raw count data from an .h5ad file.
  - Uses HGNC symbols as gene names (i.e., anndata.var_names are official gene symbols).
  - Count data in CSR sparse format (i.e., type(anndata.X) == scipy.sparse.csr_matrix).
- Extracts interaction information from a given ligand-receptor interaction (LRI) database.
- Configures and stores reference settings for later analyses.
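The input requirements above can be checked before a panel is built. The following is a minimal sketch, assuming the dataset has already been loaded (for example with anndata.read_h5ad). check_counts_layout is a hypothetical helper, not part of FastCCC, and the Ensembl-ID test is only a heuristic for spotting gene names that are not symbols.

```python
def check_counts_layout(adata):
    """Return a list of problems found in an AnnData-like object.

    Hypothetical pre-flight helper (not a FastCCC function). It only reads
    `adata.X` and `adata.var_names`, so any AnnData object can be passed.
    """
    problems = []

    # SciPy sparse containers expose a `format` string; "csr" marks CSR storage.
    # Note: this accepts any CSR container, which is slightly looser than an
    # exact type check against scipy.sparse.csr_matrix.
    if getattr(adata.X, "format", None) != "csr":
        problems.append("X is not in CSR sparse format")

    names = [str(name) for name in adata.var_names]
    if len(set(names)) != len(names):
        problems.append("var_names contains duplicates")

    # Heuristic: Ensembl gene IDs (ENSG...) suggest var_names are not symbols.
    if any(name.startswith("ENSG") for name in names):
        problems.append("var_names look like Ensembl IDs, not HGNC symbols")

    return problems
```

An empty list means none of these checks failed. A dense matrix can be converted beforehand with scipy.sparse.csr_matrix(adata.X).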
Function Signature
def build_reference_workflow(
    database_file_path,
    reference_counts_file_path,
    celltype_file_path,
    reference_name,
    save_path,
    meta_key=None,
    min_percentile=0.1
)
Parameters
| Parameter | Type | Default Value | Description |
|---|---|---|---|
| database_file_path | str | required | Path to the database directory containing the candidate LRIs. |
| reference_counts_file_path | str | required | Path to the reference raw count matrix file in h5ad format. |
| celltype_file_path | str or None | required | Path to the cell type annotation file for the reference count file. If the h5ad count file already contains cell type labels, this can be set to None, and the meta_key parameter should be specified instead. |
| reference_name | str | required | Name of the reference dataset. A folder named reference_name will be created under save_path to store the reference panel, so the name must be a valid folder name. |
| save_path | str | required | Path where the processed reference panel data will be saved. Used together with reference_name, i.e., save_path/reference_name/. |
| meta_key | str or None | None | Metadata key specifying the column in adata.obs that contains the cell type labels. |
| min_percentile | float | 0.1 | Minimum percentile threshold for filtering interactions. The same value is used during inference, so keeping the default is recommended. |
Returns
This function does not return a value. It writes multiple output files to save_path/reference_name/. Users can ignore these file details: once the reference panel is built, it can be used directly through infer_query_workflow.
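How save_path and reference_name combine into the panel folder can be sketched as follows. panel_dir is a hypothetical helper, and the character whitelist is an assumed, conservative reading of "valid for file naming conventions", not a rule taken from FastCCC.

```python
import os
import re

# Assumed conservative whitelist: letters, digits, dot, underscore, hyphen.
_SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def panel_dir(save_path, reference_name):
    """Return save_path/reference_name after a basic name check.

    Hypothetical helper, not part of FastCCC.
    """
    if not _SAFE_NAME.fullmatch(reference_name) or reference_name in {".", ".."}:
        raise ValueError(f"unsafe reference_name: {reference_name!r}")
    return os.path.join(save_path, reference_name)


# The resulting folder is the value later passed as reference_path.
reference_path = panel_dir("./reference", "my_control_panel")
```

Computing the folder once and reusing it keeps the build and inference steps pointing at the same panel.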
infer_query_workflow
Description
The infer_query_workflow function performs query inference using a pre-built cell-cell communication reference.
It processes query count data, applies quality control, aligns metadata, and compares the query dataset with the reference to infer cell interactions.
This function enables researchers to analyze new datasets in the context of a predefined reference.
- Reads and preprocesses query raw count data from an .h5ad file.
  - Uses HGNC symbols as gene names (i.e., anndata.var_names are official gene symbols).
  - Count data in CSR sparse format (i.e., type(anndata.X) == scipy.sparse.csr_matrix).
- The LRI database must be the same as the one used in the reference panel. (We will introduce a new feature to remove this restriction.)
Function Signature
def infer_query_workflow(
    database_file_path,
    reference_path,
    query_counts_file_path,
    celltype_file_path,
    save_path,
    celltype_mapping_dict=None,
    meta_key=None
)
Parameters
| Parameter | Type | Default Value | Description |
|---|---|---|---|
| database_file_path | str | required | Path to the database directory containing the candidate LRIs. It must be the same as the one used in build_reference_workflow. |
| reference_path | str | required | Path to the pre-built CCC reference panel, i.e., your/save/path/reference_name/. |
| query_counts_file_path | str | required | Path to the user-provided query raw count matrix file in h5ad format. |
| celltype_file_path | str or None | required | Path to the cell type annotation file for the query count file. If the h5ad count file already contains cell type labels, this can be set to None, and the meta_key parameter should be specified instead. |
| save_path | str | required | Path where the inference results will be saved. |
| celltype_mapping_dict | dict or None | None | Dictionary mapping reference cell types to query cell types. For example, it can merge fine-grained cell subtypes in the reference into broader categories so that they match the cell type annotations in the query dataset. If None, the cell type annotations stored in the reference are used directly. See examples for details. |
| meta_key | str or None | None | Metadata key specifying the column in adata.obs that contains the cell type labels. |
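A minimal sketch of a celltype_mapping_dict, assuming keys are reference cell type names and values are query cell type names, as the table describes. The cell type labels and the unmapped_targets helper are hypothetical and only illustrate the idea.

```python
# Keys: cell type names stored in the reference panel (hypothetical labels).
# Values: the broader labels used in the query annotation.
celltype_mapping_dict = {
    "CD4+ T cell": "T cell",
    "CD8+ T cell": "T cell",
    "Kupffer cell": "Macrophage",
    "Hepatocyte": "Hepatocyte",
}


def unmapped_targets(mapping, query_labels):
    """Return mapped-to names that never occur in the query annotation."""
    return sorted(set(mapping.values()) - set(query_labels))


# Hypothetical set of labels present in the query dataset.
query_labels = ["T cell", "Macrophage", "Hepatocyte"]
missing = unmapped_targets(celltype_mapping_dict, query_labels)
```

If missing is non-empty, some reference cell types are mapped to labels that the query never uses, which usually points to a typo in the mapping.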
Returns
This function does not return values directly but generates and saves multiple output files in the specified save_path. These include:
- query_infer_results.tsv, results comparing query interactions against the reference dataset. This is a dataframe file with tab-separated values.
- query_percents_analysis.tsv, results of the percent analysis in the user-provided query dataset. This is a dataframe file with tab-separated values.
- query_interactions_strength.tsv, the \(CS\) of each candidate LRI for each sender-receiver cell type. This is a dataframe file with tab-separated values.
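Because the three output files listed above are plain tab-separated text, they can be inspected with the standard library alone. The following is a minimal sketch. read_tsv and summarize_outputs are hypothetical helpers, and no column names are assumed because the file schema is not described here.

```python
import csv
import os


def read_tsv(path):
    """Read a tab-separated file into (header, rows)."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, [])
        rows = list(reader)
    return header, rows


def summarize_outputs(save_path):
    """Map each expected output file to (n_columns, n_rows), or None if absent."""
    names = [
        "query_infer_results.tsv",
        "query_percents_analysis.tsv",
        "query_interactions_strength.tsv",
    ]
    summary = {}
    for name in names:
        path = os.path.join(save_path, name)
        if os.path.exists(path):
            header, rows = read_tsv(path)
            summary[name] = (len(header), len(rows))
        else:
            summary[name] = None
    return summary
```

Calling summarize_outputs on the save_path used for inference gives a quick check that all three files were written and are non-empty.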
list_reference_panels
Description
list_reference_panels lists healthy tissue reference panels available under a downloaded FastCCC reference root. Use it before selecting reference_tissue in generate_reference_report.
Function Signature
from fastccc.report import list_reference_panels
list_reference_panels(reference_root=None)
Parameters
| Parameter | Type | Default Value | Description |
|---|---|---|---|
| reference_root | str or None | None | Folder containing FastCCC reference panel subfolders. If omitted, FastCCC checks ./reference and the source checkout reference folder. |
Returns
A list of tissue panel names whose folders contain config.toml.
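The documented behaviour can be pictured with a short stand-alone sketch. It illustrates "folders that contain config.toml" and is not the FastCCC implementation. find_panels is a hypothetical name, and the sorted order and the empty list for a missing root are assumptions.

```python
from pathlib import Path


def find_panels(reference_root):
    """Return names of subfolders of reference_root that contain config.toml."""
    root = Path(reference_root)
    if not root.is_dir():
        return []
    return sorted(
        child.name
        for child in root.iterdir()
        if child.is_dir() and (child / "config.toml").is_file()
    )
```

A subfolder without config.toml is skipped, so partially downloaded panels would not appear in the result.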
generate_infer_report
Description
generate_infer_report renders an HTML report from existing reference-based inference outputs. Use this when query_infer_results.tsv already exists and you do not want to rerun inference.
Function Signature
from fastccc.report import generate_infer_report
generate_infer_report(
    infer_result_dir,
    database_path,
    output_dir=None,
    query_name="Query",
    reference_name=None,
    reference_path=None,
    reference_tissue=None,
    reference_root=None,
    dpi=150,
    save_individual_figures=True,
    top_n_lr=25,
    top_n_celltypes=20,
)
Parameters
| Parameter | Type | Default Value | Description |
|---|---|---|---|
| infer_result_dir | str | required | Directory containing query_infer_results.tsv and optionally query_interactions_strength.tsv. |
| database_path | str | required | LRI database directory used by the inference results. |
| output_dir | str or None | None | Where to write infer_report.html; defaults to <infer_result_dir>/infer_report/. |
| query_name | str | "Query" | Query label shown in the report. |
| reference_name | str or None | None | Reference label shown in the report. If omitted with reference_tissue, the report derives a healthy tissue label. |
| reference_path | str or None | None | Optional custom reference panel path for report metadata. Mutually exclusive with reference_tissue. |
| reference_tissue | str or None | None | Optional healthy tissue panel name under reference_root. |
| reference_root | str or None | None | Folder containing healthy tissue reference panels. |
| dpi | int | 150 | Resolution for saved figure PNGs. |
| save_individual_figures | bool | True | Whether to save each figure as a PNG in addition to embedding it in HTML. |
| top_n_lr | int | 25 | Number of top L-R pairs shown in dotplots. |
| top_n_celltypes | int | 20 | Number of top cell types shown in heatmaps and summaries. |
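The constraints stated in the table can be checked before rendering. The following is a minimal sketch that assumes only what the table says. check_infer_report_inputs is a hypothetical helper, not a FastCCC function, and the exceptions it raises are illustrative choices.

```python
import os


def check_infer_report_inputs(infer_result_dir, output_dir=None,
                              reference_path=None, reference_tissue=None):
    """Validate inputs and return the directory the report would be written to."""
    # reference_path and reference_tissue are documented as mutually exclusive.
    if reference_path is not None and reference_tissue is not None:
        raise ValueError("pass reference_path or reference_tissue, not both")

    # The report is rendered from an existing inference result file.
    results = os.path.join(infer_result_dir, "query_infer_results.tsv")
    if not os.path.isfile(results):
        raise FileNotFoundError(results)

    # Documented default: <infer_result_dir>/infer_report/
    if output_dir is None:
        output_dir = os.path.join(infer_result_dir, "infer_report")
    return output_dir
```

Once the check passes, the same arguments can be handed to generate_infer_report together with database_path.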
generate_reference_report
Description
generate_reference_report runs reference inference and renders the HTML reference report in one call. Users can choose a FastCCC healthy tissue panel by name, for example reference_tissue='liver', or pass reference_path for a custom reference panel.
Function Signature
from fastccc.report import generate_reference_report
generate_reference_report(
    database_path,
    query_counts_file_path,
    infer_result_dir,
    celltype_file_path=None,
    output_dir=None,
    reference_path=None,
    reference_tissue=None,
    reference_root=None,
    celltype_mapping_dict=None,
    meta_key=None,
    query_name="Query",
    reference_name=None,
    dpi=150,
    save_individual_figures=True,
    top_n_lr=25,
    top_n_celltypes=20,
    debug_mode=False,
)
Parameters
| Parameter | Type | Default Value | Description |
|---|---|---|---|
| database_path | str | required | LRI database directory. It must match the database used to build the selected reference panel. |
| query_counts_file_path | str or AnnData | required | Query raw-count dataset. Reference workflows perform rank preprocessing internally. |
| infer_result_dir | str | required | Directory where reference inference TSV outputs are saved. |
| celltype_file_path | str or None | None | Optional query cell-type metadata file. Use meta_key when labels are already in .obs. |
| output_dir | str or None | None | Where to write infer_report.html. |
| reference_path | str or None | None | Custom FastCCC reference panel directory. Mutually exclusive with reference_tissue. |
| reference_tissue | str or None | None | Healthy tissue panel name under reference_root, for example liver. |
| reference_root | str or None | None | Folder containing healthy tissue panel folders. |
| celltype_mapping_dict | dict or None | None | Mapping from reference cell type name to query cell type name when granularities differ. |
| meta_key | str or None | None | Query .obs column containing cell type labels. |
| query_name | str | "Query" | Query label shown in the report. |
| reference_name | str or None | None | Reference label shown in the report. |
| dpi | int | 150 | Resolution for saved figure PNGs. |
| save_individual_figures | bool | True | Whether to save each figure as a PNG in addition to embedding it in HTML. |
| top_n_lr | int | 25 | Number of top L-R pairs shown in dotplots. |
| top_n_celltypes | int | 20 | Number of top cell types shown in heatmaps and summaries. |
| debug_mode | bool | False | Passed to reference inference for debugging output. |
Healthy tissue panel example
from fastccc.report import generate_reference_report
report_path = generate_reference_report(
    database_path='./db/CPDBv5.0.0',
    query_counts_file_path='../data/clean/liver_query_disease_exp1.h5ad',
    infer_result_dir='./results/PBC_vs_healthy_liver',
    output_dir='./report/PBC_vs_healthy_liver',
    reference_tissue='liver',
    reference_root='./reference',
    meta_key='cell_type',
    query_name='PBC',
)
Use fastccc.report.list_reference_panels('./reference') to list tissue panel names available under a downloaded reference root. The selected panel must use the same LRI database as database_path.
Custom reference panel example
Use reference_path when the baseline is a user-built control panel.
from fastccc.report import generate_reference_report
report_path = generate_reference_report(
    database_path='./db/CPDBv5.0.0',
    query_counts_file_path='./data/disease_raw_counts.h5ad',
    infer_result_dir='./results/disease_vs_control_reference',
    output_dir='./report/disease_vs_control_reference',
    reference_path='./reference/my_control_panel',
    meta_key='cell_type',
    query_name='Disease',
    reference_name='Control',
)
Report contents
The reference report includes global trend summaries, sender-receiver heatmaps, top L-R dotplots, communication-score distributions, pathway breakdowns, a cell-type reference explorer, and a figure-generation audit. The cell-type explorer lets users switch among cell types, sender/receiver roles, L-R pairs, and annotated pathway profiles.
Version Information
- Author: Siyu Hou
- Version: early access
- Last Updated: 2026-05-22