Input Data
tribal requires the input data to be first clustered into clonotypes, i.e., groups of cells that descend from the same naive B cell receptor. We recommend Dandelion to assist with this preprocessing step.
tribal requires two input files for preprocessing: the single-cell RNA sequencing data and the germline root sequences.
Sequencing data
Sequencing data should be provided in a csv file with the following columns:
| Column Name | Type | Description | Required |
|---|---|---|---|
| cell | str or int | unique id or barcode of the sequenced B cell | True |
| clonotype | str | unique clonotype id to which that cell belongs | True |
| heavy_chain_isotype | str | the isotype of the constant region of the heavy chain | True |
| heavy_chain_seq | str | the variable region sequence of the heavy chain | True |
| heavy_chain_allele | str | the V allele of the heavy chain | True |
| light_chain_seq | str | the variable region sequence of the light chain | False |
| light_chain_allele | str | the V allele of the light chain | False if light_chain_seq is not provided, otherwise True |
See data.csv for an example.
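A quick header check before preprocessing can catch a misnamed or missing column early. The sketch below is illustrative only and is not part of tribal: the helper `missing_columns` and the sample rows are hypothetical, the column names are copied from the table above, and the `use_light_chain` flag mirrors the `preprocess` argument of the same name.

```python
import csv
import io

# Column names copied from the table above; the helper is a hypothetical sketch.
HEAVY_COLUMNS = [
    "cell",
    "clonotype",
    "heavy_chain_isotype",
    "heavy_chain_seq",
    "heavy_chain_allele",
]
LIGHT_COLUMNS = ["light_chain_seq", "light_chain_allele"]


def missing_columns(fieldnames, use_light_chain=False):
    """Return the required column names that are absent from a CSV header."""
    required = HEAVY_COLUMNS + (LIGHT_COLUMNS if use_light_chain else [])
    present = set(fieldnames or [])
    return [name for name in required if name not in present]


# Made-up placeholder rows containing heavy chain columns only.
SAMPLE = """cell,clonotype,heavy_chain_isotype,heavy_chain_seq,heavy_chain_allele
cell1,clone1,IGHM,ACGTACGT,IGHV1-2*01
cell2,clone1,IGHG1,ACGTACGA,IGHV1-2*01
"""

reader = csv.DictReader(io.StringIO(SAMPLE))
heavy_only_missing = missing_columns(reader.fieldnames, use_light_chain=False)
with_light_missing = missing_columns(reader.fieldnames, use_light_chain=True)

print(heavy_only_missing)   # nothing missing for heavy-chain-only input
print(with_light_missing)   # the two light chain columns are reported
```

To check a real file, replace `io.StringIO(SAMPLE)` with an open file handle, for example `open("data.csv", newline="")`.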
Germline clonotype roots
Additionally, the germline root sequences for each clonotype should be provided in a csv file containing the heavy chain sequence and, optionally, the light chain sequence.
| Column Name | Type | Description | Required |
|---|---|---|---|
| clonotype | str | unique clonotype id of the germline root (naive BCR) | True |
| heavy_chain_root | str | the heavy chain variable region germline root sequence | True |
| light_chain_root | str | the light chain variable region germline root sequence | False |
See roots.csv for an example.
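Because the two files are linked through the clonotype column, it can help to confirm that they agree before running preprocessing. The following is a minimal sketch under the assumption that every clonotype in the sequencing data should have a matching row in the roots file; the helper `clonotypes_without_root` and the sample contents are hypothetical and not part of tribal.

```python
import csv
import io

# Made-up placeholder contents; only the columns needed for the check are shown.
DATA = """cell,clonotype
cell1,clone1
cell2,clone1
cell3,clone2
"""

ROOTS = """clonotype,heavy_chain_root
clone1,ACGTACGT
"""


def clonotypes_without_root(data_file, roots_file):
    """Return sorted clonotype ids that appear in the data but not in the roots."""
    data_ids = {row["clonotype"] for row in csv.DictReader(data_file)}
    root_ids = {row["clonotype"] for row in csv.DictReader(roots_file)}
    return sorted(data_ids - root_ids)


unmatched = clonotypes_without_root(io.StringIO(DATA), io.StringIO(ROOTS))
print(unmatched)  # clone2 has cells but no germline root row
```

An empty result means every clonotype in the sequencing data has a germline root entry.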
Note
All light chain columns may be omitted if the use_light_chain argument in preprocess is False. In other words, tribal may be used with only the heavy chain BCR sequences.