FastCCC’s Outputs for Reference-based Analyses
Introduction
Unlike direct CCC analysis on the scRNA-seq dataset, the outputs of reference-based comparison analyses include the following three files:
- The query_infer_results.tsv file contains the results of the CCC comparison analysis between the user-provided query data and the reference panel.
- The query_interactions_strength.tsv file includes the interaction strengths (i.e., communication scores) for each ligand-receptor interaction (columns) across cell-cell interaction pairs (rows) in the query dataset.
- The query_percents_analysis.tsv file indicates whether the proportion of cells expressing both the ligand and receptor (nonzero expression) exceeds the given threshold in the query dataset.
- The query_infer_results.tsv file only includes candidate LRIs where both the ligand and receptor in the corresponding cell types have a nonzero expression proportion exceeding the given threshold (default = 10%).
- An LRI that is significant in the query results is not necessarily insignificant in the reference data, and vice versa. The results leverage accumulated large-scale data to build a reference that gives a more reliable null distribution. The significance of each LRI in the reference panel (if available) is also reported for comparison.
- The results obtained from the reference comparison and from direct analysis on the query data may not be identical. However, as demonstrated in our article, the results are stable and the conclusions are largely consistent. Moreover, when the query data has a relatively simple cell type composition, using the reference improves accuracy.
- Thus, reference-based analysis is highly effective for scenarios such as examining changes in cell-cell communication between disease and normal states, or when the user-provided query dataset is too small to generate an accurate null distribution.
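The expression-proportion filter mentioned in the notes above can be illustrated with a small sketch. This is not FastCCC's implementation; the function names and the toy counts are hypothetical, and treating "exceeding" as a strict greater-than comparison is an assumption.

```python
def nonzero_fraction(expression_values):
    """Fraction of cells with nonzero expression for one gene in one cell type."""
    if not expression_values:
        return 0.0
    return sum(1 for v in expression_values if v > 0) / len(expression_values)


def passes_expression_filter(ligand_in_sender, receptor_in_receiver, threshold=0.10):
    """True only if BOTH proportions exceed the threshold (strict '>' is an assumption)."""
    return (
        nonzero_fraction(ligand_in_sender) > threshold
        and nonzero_fraction(receptor_in_receiver) > threshold
    )


# Toy counts (made up): ligand detected in 3 of 10 sender cells,
# receptor detected in 1 of 10 receiver cells.
ligand_counts = [0, 2, 0, 0, 5, 0, 1, 0, 0, 0]
receptor_counts = [0, 0, 0, 4, 0, 0, 0, 0, 0, 0]

keep = passes_expression_filter(ligand_counts, receptor_counts)
```

Here the ligand passes but the receptor sits exactly at 10%, so under the strict comparison assumed above this candidate LRI would be dropped.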
Outputs of infer_query_workflow
The results of infer_query_workflow are saved to a specified directory, with filenames structured as shown in the figure.
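As a rough sketch of how the three outputs might be located after a run, the helper below searches a directory for files whose names end with the three filenames listed in the introduction. The helper name `locate_outputs` is hypothetical, and matching on the filename ending (rather than the exact name) is an assumption made because the exact naming scheme is only shown in the figure.

```python
from pathlib import Path

# The three output names come from this page; anything in front of them
# in the real filenames is not assumed here.
OUTPUT_SUFFIXES = (
    "query_infer_results.tsv",
    "query_interactions_strength.tsv",
    "query_percents_analysis.tsv",
)


def locate_outputs(save_dir):
    """Map each expected output name to the first matching file, or None if absent."""
    save_dir = Path(save_dir)
    located = {}
    for suffix in OUTPUT_SUFFIXES:
        matches = sorted(save_dir.glob(f"*{suffix}"))
        located[suffix] = matches[0] if matches else None
    return located
```

Calling `locate_outputs` on the directory passed to the workflow returns a dictionary that makes a missing output easy to spot before any downstream parsing.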
Example of the query_infer_results.tsv file format
For better visualization, we have transposed the dataframe. The row names in the image correspond to the actual column names in the output.
Column Name | Description |
---|---|
sender\|receiver | Sender and receiver cell types, separated by \|. |
in_reference | Whether the sender and receiver cell types exist in the reference panel (requires a case-sensitive name match, or a mapping supplied via celltype_mapping_dict; see here). |
LRI_ID | Ligand-receptor interaction ID. |
ligand | Ligand involved in the interaction. |
receptor | Receptor involved in the interaction. |
comm_score | Communication score of the ligand-receptor interaction (LRI). |
null_comm_score | Expected communication score under the null distribution in the query dataset. |
sig_threshold_CI | 95% confidence interval of the communication score threshold for significance (p = 0.05), inferred from the reference panel. |
ligand_null_ref | Expected ligand summary score under the null distribution in the reference panel. |
receptor_null_ref | Expected receptor summary score under the null distribution in the reference panel. |
above_expr_threshold | Whether both ligand and receptor expression levels exceed the predefined threshold in the query dataset. |
ligand_expr_percent | Percentage of sender cells expressing the ligand (nonzero expression) in the query dataset. |
receptor_expr_percent | Percentage of receiver cells expressing the receptor (nonzero expression) in the query dataset. |
above_expr_threshold_ref | Whether both ligand and receptor expression levels exceed the predefined threshold in the reference dataset. |
ligand_expr_percent_ref | Percentage of sender cells expressing the ligand in the reference dataset. |
receptor_expr_percent_ref | Percentage of receiver cells expressing the receptor in the reference dataset. |
ligand_CS_component | Ligand contribution to the communication score. |
ligand_CS_CI | 95% CI of ligand expression in the reference dataset, mapped to the query dataset. |
receptor_CS_component | Receptor contribution to the communication score. |
receptor_CS_CI | 95% CI of receptor expression in the reference dataset, mapped to the query dataset. |
is_significant | Whether the ligand-receptor interaction is statistically significant in the query dataset. |
is_significant_ref | Whether the ligand-receptor interaction is statistically significant in the reference dataset. |
trend_vs_ref | Change in communication score compared to the reference dataset (e.g., Up for upregulated, Down for downregulated, Both Sig or NS for unchanged). |
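A minimal sketch of how the columns above might be used downstream: the snippet parses a tab-separated table with Python's standard csv module and keeps the interactions whose trend_vs_ref is Up or Down. The sample rows, cell type names, and LRI identifiers are made up for illustration, only a subset of columns is shown, and the encoding of boolean columns as the strings True/False is an assumption.

```python
import csv
import io

# Subset of the documented columns; all values below are invented.
COLUMNS = ["sender|receiver", "LRI_ID", "ligand", "receptor",
           "is_significant", "is_significant_ref", "trend_vs_ref"]
ROWS = [
    ["CellA|CellB", "LRI_1", "LIG1", "REC1", "True", "False", "Up"],
    ["CellA|CellB", "LRI_2", "LIG2", "REC2", "True", "True", "Both Sig"],
    ["CellB|CellA", "LRI_3", "LIG3", "REC3", "False", "True", "Down"],
    ["CellB|CellA", "LRI_4", "LIG4", "REC4", "False", "False", "NS"],
]
SAMPLE_TSV = "\n".join("\t".join(fields) for fields in [COLUMNS] + ROWS) + "\n"


def changed_interactions(tsv_text):
    """Return rows whose trend_vs_ref is Up or Down, with sender and receiver split out."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    changed = []
    for row in reader:
        if row["trend_vs_ref"] in {"Up", "Down"}:
            # The combined column uses "|" as the separator, as described in the table.
            sender, receiver = row["sender|receiver"].split("|", 1)
            changed.append({"sender": sender, "receiver": receiver, **row})
    return changed


changed = changed_interactions(SAMPLE_TSV)
```

For a real output file, the same function can be fed the file's text, for example `changed_interactions(open(path).read())` with `path` pointing at the query_infer_results.tsv file.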
Example of the query_interactions_strength.tsv file format
The format of the results is the same as that of the direct CCC analysis. For more details, refer to this link.
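Since the introduction describes this file as a matrix with cell-cell interaction pairs as rows and ligand-receptor interactions as columns, a sketch of ranking the strongest interactions for one pair could look as follows. The sample matrix is invented, and two layout details are assumptions: that the first column holds the pair label, and that the label joins sender and receiver with "|".

```python
import csv
import io

# Invented matrix: rows are cell-cell pairs, columns are LRI IDs.
SAMPLE_STRENGTH_TSV = (
    "\tLRI_A\tLRI_B\tLRI_C\n"
    "CellA|CellB\t0.8\t0.1\t0.4\n"
    "CellB|CellA\t0.0\t0.6\t0.2\n"
)


def top_interactions(tsv_text, pair, n=2):
    """Return the n highest-scoring (LRI_ID, score) tuples for one cell-cell pair."""
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(rows)
    lri_ids = header[1:]  # first header cell labels the row index
    for row in rows:
        if row[0] == pair:
            scores = [float(value) for value in row[1:]]
            ranked = sorted(zip(lri_ids, scores), key=lambda item: item[1], reverse=True)
            return ranked[:n]
    raise KeyError(f"pair not found: {pair}")


top = top_interactions(SAMPLE_STRENGTH_TSV, "CellA|CellB")
```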
Example of the query_percents_analysis.tsv file format
The format of the results is the same as that of the direct CCC analysis. For more details, refer to this link.