Input Data
tribal requires the input data to be first clustered into clonotypes, i.e., groups of cells that descend from the same naive B cell receptor.  We recommend Dandelion to assist with this preprocessing step. 
tribal requires two input files for preprocessing: the single-cell RNA sequencing data and the germline root sequences.
Sequencing data
Sequencing data should be provided in a csv file with the following columns:
| Column Name | Type | Description | Required | 
|---|---|---|---|
| cell | str or int | unique id or barcode of the sequenced B cell | True | 
| clonotype | str | unique clonotype id to which that cell belongs | True | 
| heavy_chain_isotype | str | the isotype of the constant region of the heavy chain | True | 
| heavy_chain_seq | str | the variable region sequence of the heavy chain | True | 
| heavy_chain_allele | str | the V allele of the heavy chain | True | 
| light_chain_seq | str | the variable region sequence of the light chain | False | 
| light_chain_allele | str | the V allele of the light chain | True if light_chain_seq is provided, otherwise False | 
See data.csv for an example.
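As a quick sanity check before preprocessing, the column requirements in the table above can be tested against a file's header. The following is a minimal sketch using only the Python standard library; `missing_columns` is a hypothetical helper and not part of tribal, and the row values are made-up placeholders rather than real identifiers or sequences.

```python
import csv
import io

# Columns the table above marks as required.
HEAVY_COLUMNS = [
    "cell",
    "clonotype",
    "heavy_chain_isotype",
    "heavy_chain_seq",
    "heavy_chain_allele",
]


def missing_columns(header):
    """Return the columns from the table above that are missing from header.

    Hypothetical helper for illustration; it is not part of tribal.
    """
    missing = [col for col in HEAVY_COLUMNS if col not in header]
    # light_chain_allele is only needed when light_chain_seq is present.
    if "light_chain_seq" in header and "light_chain_allele" not in header:
        missing.append("light_chain_allele")
    return missing


# In-memory stand-in for a sequencing csv file (placeholder values).
example = io.StringIO(
    "cell,clonotype,heavy_chain_isotype,heavy_chain_seq,heavy_chain_allele\n"
    "cell_1,clone_A,isotype_1,ACGT,allele_1\n"
)
header = csv.DictReader(example).fieldnames
print(missing_columns(header))  # []
```

For a real file, the `io.StringIO` object would be replaced by an open file handle, for example `open("data.csv", newline="")`.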
Germline clonotype roots
The germline root sequence of each clonotype should be provided in a separate csv file containing the heavy chain sequence and, optionally, the light chain sequence.
| Column Name | Type | Description | Required | 
|---|---|---|---|
| clonotype | str | unique clonotype id of the germline root (naive BCR) | True | 
| heavy_chain_root | str | the heavy chain variable region germline root sequence | True | 
| light_chain_root | str | the light chain variable region germline root sequence | False | 
See roots.csv for an example.
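Because the two files are linked by the clonotype column, it can help to confirm that every clonotype in the sequencing data has a matching row in the roots file. The sketch below assumes that such a one-to-one match is expected, which this page does not state explicitly; `clonotypes_without_root` is a hypothetical helper, not a tribal function, and all values are placeholders.

```python
import csv
import io


def clonotypes_without_root(sequencing_csv, roots_csv):
    """Return sorted clonotype ids found in the sequencing data but not in the roots file.

    Both arguments are open text file objects. Hypothetical helper; not part of tribal.
    """
    seq_ids = {row["clonotype"] for row in csv.DictReader(sequencing_csv)}
    root_ids = {row["clonotype"] for row in csv.DictReader(roots_csv)}
    return sorted(seq_ids - root_ids)


# In-memory stand-ins for the two csv files (placeholder values).
sequencing = io.StringIO(
    "cell,clonotype,heavy_chain_isotype,heavy_chain_seq,heavy_chain_allele\n"
    "cell_1,clone_A,isotype_1,ACGT,allele_1\n"
    "cell_2,clone_B,isotype_1,ACGA,allele_2\n"
)
roots = io.StringIO(
    "clonotype,heavy_chain_root\n"
    "clone_A,ACGT\n"
)
print(clonotypes_without_root(sequencing, roots))  # ['clone_B']
```

An empty list means every clonotype in the sequencing data has a germline root entry.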
Note
All light chain columns may be omitted if the use_light_chain argument in preprocess is False.  In other words, tribal may be used with only the heavy chain BCR sequences.