Input Data
tribal requires the input data to first be clustered into clonotypes, i.e., groups of cells that descend from the same naive B cell receptor. We recommend Dandelion to assist with this preprocessing step.
tribal requires two input files for preprocessing: the single-cell RNA sequencing data and the germline root sequences.
Sequencing data
Sequencing data should be provided in a csv file with the following columns:
Column Name | Type | Description | Required |
---|---|---|---|
cell | str or int | unique id or barcode of the sequenced B cell | True |
clonotype | str | unique clonotype id to which that cell belongs | True |
heavy_chain_isotype | str | the isotype of the constant region of the heavy chain | True |
heavy_chain_seq | str | the variable region sequence of the heavy chain | True |
heavy_chain_allele | str | the V allele of the heavy chain | True |
light_chain_seq | str | the variable region sequence of the light chain | False |
light_chain_allele | str | the V allele of the light chain | True if light_chain_seq is provided, otherwise False |
See data.csv for an example.
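As a rough illustration of the layout described in the table above, the following sketch writes a two-row sequencing file and checks that the required columns are present. It uses only the Python standard library and does not call tribal. The barcodes, clonotype ids, isotype labels, and truncated sequences are made-up placeholders, and the helper names `write_sequencing_csv` and `missing_required_columns` are hypothetical, not part of the tribal package.

```python
import csv
import io

# Column names copied from the table above.
REQUIRED_COLUMNS = [
    "cell", "clonotype", "heavy_chain_isotype",
    "heavy_chain_seq", "heavy_chain_allele",
]
OPTIONAL_COLUMNS = ["light_chain_seq", "light_chain_allele"]

# Placeholder rows: ids, labels, and sequences are illustrative only.
rows = [
    {"cell": "cell_1", "clonotype": "clonotype_1",
     "heavy_chain_isotype": "IgM",
     "heavy_chain_seq": "CAGGTGCAGCTG", "heavy_chain_allele": "IGHV1-69*01",
     "light_chain_seq": "GACATCCAGATG", "light_chain_allele": "IGKV1-39*01"},
    {"cell": "cell_2", "clonotype": "clonotype_1",
     "heavy_chain_isotype": "IgG1",
     "heavy_chain_seq": "CAGGTGCAGCTC", "heavy_chain_allele": "IGHV1-69*01",
     "light_chain_seq": "GACATCCAGATC", "light_chain_allele": "IGKV1-39*01"},
]


def write_sequencing_csv(rows, handle):
    """Write rows with the documented column order (hypothetical helper)."""
    writer = csv.DictWriter(handle, fieldnames=REQUIRED_COLUMNS + OPTIONAL_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


def missing_required_columns(handle):
    """Return the required columns that are absent from the csv header."""
    header = csv.DictReader(handle).fieldnames or []
    return [name for name in REQUIRED_COLUMNS if name not in header]


# An in-memory buffer stands in for a file such as data.csv.
buffer = io.StringIO()
write_sequencing_csv(rows, buffer)
buffer.seek(0)
missing = missing_required_columns(buffer)
print(missing)
```

Running a check like this before preprocessing may catch a misspelled header early. The exact isotype labels that tribal accepts are not specified here, so the values above should be treated as assumptions.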
Germline clonotype roots
Additionally, the germline root sequence of each clonotype should be provided in a csv file containing the heavy chain sequence and, optionally, the light chain sequence.
Column Name | Type | Description | Required |
---|---|---|---|
clonotype | str | unique clonotype id of the germline root (naive BCR) | True |
heavy_chain_root | str | the heavy chain variable region germline root sequence | True |
light_chain_root | str | the light chain variable region germline root sequence | False |
See roots.csv for an example.
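Since the two files are linked by the clonotype column, it can help to confirm that every clonotype in the sequencing data has a matching germline root. The sketch below does this with the standard library only. It assumes that each clonotype needs exactly one root row, which the text implies but does not state outright. The file contents are made-up placeholders and `clonotypes_without_root` is a hypothetical helper, not a tribal function.

```python
import csv
import io

# Placeholder file contents; in practice these would be data.csv and roots.csv.
data_csv = """\
cell,clonotype,heavy_chain_isotype,heavy_chain_seq,heavy_chain_allele
cell_1,clonotype_1,IgM,CAGGTGCAGCTG,IGHV1-69*01
cell_2,clonotype_2,IgG1,GAGGTGCAGCTG,IGHV3-23*01
"""

roots_csv = """\
clonotype,heavy_chain_root,light_chain_root
clonotype_1,CAGGTGCAGCTG,GACATCCAGATG
"""


def clonotypes_without_root(data_handle, roots_handle):
    """Return the sorted clonotype ids that appear in the data but not in the roots."""
    data_ids = {row["clonotype"] for row in csv.DictReader(data_handle)}
    root_ids = {row["clonotype"] for row in csv.DictReader(roots_handle)}
    return sorted(data_ids - root_ids)


unrooted = clonotypes_without_root(io.StringIO(data_csv), io.StringIO(roots_csv))
print(unrooted)
```

Here `clonotype_2` has cells in the sequencing data but no row in the roots file, so it is reported as missing a root.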
Note
All light chain columns may be omitted if the use_light_chain argument in preprocess is False. In other words, tribal may be used with only the heavy chain BCR sequences.
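For a heavy-chain-only analysis, an existing file that contains light chain columns can be reduced to the heavy chain columns beforehand. The following is a minimal sketch using the standard library; it does not call tribal, and `drop_light_chain_columns` is a hypothetical helper. The light chain column names are the ones listed in the two tables above, and the example row is a made-up placeholder.

```python
import csv
import io

# Light chain columns named in the sequencing and roots tables above.
LIGHT_CHAIN_COLUMNS = {"light_chain_seq", "light_chain_allele", "light_chain_root"}


def drop_light_chain_columns(src, dst):
    """Copy a csv from src to dst without light chain columns (hypothetical helper)."""
    reader = csv.DictReader(src)
    kept = [name for name in (reader.fieldnames or []) if name not in LIGHT_CHAIN_COLUMNS]
    # extrasaction="ignore" silently skips the light chain values in each row.
    writer = csv.DictWriter(dst, fieldnames=kept, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return kept


# Placeholder input standing in for a file such as data.csv.
source = io.StringIO(
    "cell,clonotype,heavy_chain_isotype,heavy_chain_seq,heavy_chain_allele,"
    "light_chain_seq,light_chain_allele\n"
    "cell_1,clonotype_1,IgM,CAGGTGCAGCTG,IGHV1-69*01,GACATCCAGATG,IGKV1-39*01\n"
)
heavy_only = io.StringIO()
kept_columns = drop_light_chain_columns(source, heavy_only)
print(kept_columns)
```

This step is optional: per the note above, the light chain columns may simply be left out when use_light_chain is False.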