
Input Data

tribal requires the input data to first be clustered into clonotypes, i.e., groups of cells that descend from the same naive B cell receptor. We recommend using Dandelion for this preprocessing step.

tribal requires two input files for preprocessing: the single-cell RNA sequencing data and the germline root sequences.

Sequencing data

Sequencing data should be provided in a csv file with the following columns:

| Column Name | Type | Description | Required |
|---|---|---|---|
| cell | str or int | unique id or barcode of the sequenced B cell | True |
| clonotype | str | unique clonotype id to which the cell belongs | True |
| heavy_chain_isotype | str | the isotype of the constant region of the heavy chain | True |
| heavy_chain_seq | str | the variable region sequence of the heavy chain | True |
| heavy_chain_allele | str | the V allele of the heavy chain | True |
| light_chain_seq | str | the variable region sequence of the light chain | False |
| light_chain_allele | str | the V allele of the light chain | False, unless light_chain_seq is provided |

See data.csv for an example.
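Before running tribal, it can help to check that a sequencing-data file matches the table above. The sketch below is a hypothetical helper, not part of tribal: it uses only Python's standard `csv` module, reads CSV text from a string for illustration, and checks the required columns, the conditional light_chain_allele rule, and duplicate cell ids. The toy rows use made-up sequences.

```python
import csv
import io

# Required columns, taken from the sequencing-data table above.
REQUIRED_COLUMNS = {
    "cell",
    "clonotype",
    "heavy_chain_isotype",
    "heavy_chain_seq",
    "heavy_chain_allele",
}


def check_sequencing_data(text):
    """Hypothetical helper: return a list of problems found in a sequencing-data CSV."""
    reader = csv.DictReader(io.StringIO(text))
    columns = set(reader.fieldnames or [])
    problems = [f"missing column: {c}" for c in sorted(REQUIRED_COLUMNS - columns)]

    # light_chain_allele is only needed when light_chain_seq is supplied.
    if "light_chain_seq" in columns and "light_chain_allele" not in columns:
        problems.append("light_chain_seq provided without light_chain_allele")

    # Each cell id or barcode should identify exactly one row.
    if "cell" in columns:
        seen = set()
        for row in reader:
            if row["cell"] in seen:
                problems.append(f"duplicate cell id: {row['cell']}")
            seen.add(row["cell"])
    return problems


# Toy input with made-up sequences, for illustration only.
sample = (
    "cell,clonotype,heavy_chain_isotype,heavy_chain_seq,heavy_chain_allele\n"
    "c1,clono_1,IGHM,ACGTACGT,IGHV1-2*02\n"
    "c2,clono_1,IGHG1,ACGTACGA,IGHV1-2*02\n"
)
print(check_sequencing_data(sample))  # → []
```

To check a file on disk, pass its contents, e.g. `check_sequencing_data(open("data.csv", newline="").read())`.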

Germline clonotype roots

Additionally, the germline root sequences of each clonotype should be provided in a csv file containing the heavy chain sequence and, optionally, the light chain sequence.

| Column Name | Type | Description | Required |
|---|---|---|---|
| clonotype | str | unique clonotype id of the germline root (naive BCR) | True |
| heavy_chain_root | str | the heavy chain variable region germline root sequence | True |
| light_chain_root | str | the light chain variable region germline root sequence | False |

See roots.csv for an example.
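Since every clonotype in the sequencing data needs a germline root, a quick consistency check between the two files may catch mismatched ids early. The following is a minimal sketch under stated assumptions: `load_roots` and `clonotypes_without_root` are hypothetical helpers (not tribal functions), the roots CSV is passed as a string, each clonotype is assumed to have exactly one root row, and the sequence shown is made up.

```python
import csv
import io


def load_roots(text):
    """Hypothetical helper: map each clonotype id to (heavy_chain_root, light_chain_root)."""
    roots = {}
    for row in csv.DictReader(io.StringIO(text)):
        clonotype = row["clonotype"]
        if clonotype in roots:
            raise ValueError(f"more than one root for clonotype {clonotype}")
        # light_chain_root is optional: use None when the column is absent or empty.
        roots[clonotype] = (row["heavy_chain_root"], row.get("light_chain_root") or None)
    return roots


def clonotypes_without_root(clonotype_ids, roots):
    """Return the clonotype ids from the sequencing data that have no germline root."""
    return sorted(set(clonotype_ids) - set(roots))


# Toy roots file with a made-up sequence and no light chain column.
roots = load_roots("clonotype,heavy_chain_root\nclono_1,ACGTACGT\n")
print(roots)  # → {'clono_1': ('ACGTACGT', None)}
print(clonotypes_without_root(["clono_1", "clono_2", "clono_1"], roots))  # → ['clono_2']
```

The clonotype ids passed to `clonotypes_without_root` would come from the clonotype column of the sequencing data.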

Note

All light chain columns may be omitted if the use_light_chain argument in preprocess is False. In other words, tribal may be used with only the heavy chain BCR sequences.
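If you want heavy-chain-only input files, one option is to strip the light chain columns from both CSVs before preprocessing. The sketch below is a hypothetical standard-library helper, not part of tribal; omitting these columns is optional, the column names are taken from the two tables above, and the call to preprocess itself is not shown here.

```python
import csv
import io

# Light chain columns listed in the sequencing-data and roots tables above.
LIGHT_CHAIN_COLUMNS = {"light_chain_seq", "light_chain_allele", "light_chain_root"}


def drop_light_chain(text):
    """Hypothetical helper: return a CSV (data or roots) without light chain columns."""
    reader = csv.DictReader(io.StringIO(text))
    keep = [c for c in (reader.fieldnames or []) if c not in LIGHT_CHAIN_COLUMNS]
    out = io.StringIO()
    # extrasaction="ignore" silently skips the dropped light chain fields in each row.
    writer = csv.DictWriter(out, fieldnames=keep, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(reader)
    return out.getvalue()


# Toy roots file with made-up sequences.
print(drop_light_chain("clonotype,heavy_chain_root,light_chain_root\nclono_1,ACGT,TTGA\n"))
```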