Command line tool (CLI)¶
tribal can also be run as a command line tool.
❯ tribal -h
usage: tribal [-h] {preprocess,fit} ...
Tribal CLI Tool
positional arguments:
{preprocess,fit} Sub-commands
preprocess Preprocess data
fit B cell lineage tree inference
optional arguments:
-h, --help show this help message and exit
Overview¶
The CLI has two sub-commands:
1. preprocess - filter the data, then compute a multiple sequence alignment and a parsimony forest for each clonotype.
2. fit - infer a set of optimal B cell lineage trees per clonotype and a shared isotype transition probability matrix.
Tip
It is recommended to use the preprocessing tool to prepare the input data in the proper format for tribal.
Preprocess¶
The preprocessing command will:
1. filter out clonotypes that are below the minimum number of cells,
2. filter out cells which have v alleles that differ from the majority of the clonotype,
3. perform a multiple sequence alignment (MSA) for each valid clonotype using mafft,
4. infer a parsimony forest for each clonotype given the MSA using dnapars.
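The two filtering steps can be sketched in plain Python. This is only an illustration of the logic described above, not tribal's implementation: the function name `filter_cells` and the record keys `clonotype` and `v_allele` are hypothetical, and the real tool may break ties or re-check sizes differently.

```python
from collections import Counter, defaultdict


def filter_cells(cells, min_size=4):
    """Group cells by clonotype and apply the two filters.

    `cells` is a list of dicts with the hypothetical keys
    'clonotype' and 'v_allele'.
    """
    by_clonotype = defaultdict(list)
    for cell in cells:
        by_clonotype[cell["clonotype"]].append(cell)

    kept = {}
    for clonotype, members in by_clonotype.items():
        # Step 1: drop clonotypes below the minimum number of cells.
        if len(members) < min_size:
            continue
        # Step 2: keep only cells whose V allele matches the clonotype majority.
        majority, _ = Counter(c["v_allele"] for c in members).most_common(1)[0]
        kept[clonotype] = [c for c in members if c["v_allele"] == majority]
    return kept
```

The alignment and parsimony steps are delegated to mafft and dnapars and are not reproduced here.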
Usage¶
❯ tribal preprocess -h
usage: tribal preprocess [-h] -d DATA -r ROOTS -e ENCODING [--min-size MIN_SIZE] [--dataframe DATAFRAME] [-o OUT]
[-j CORES] [--heavy] [-v]
optional arguments:
-h, --help show this help message and exit
-d DATA, --data DATA filename of csv file with the sequencing data
-r ROOTS, --roots ROOTS
filename of csv file with the root sequences
-e ENCODING, --encoding ENCODING
filename isotype encodings
--min-size MIN_SIZE minimum clonotype size (default 4)
--dataframe DATAFRAME
path to where the filtered dataframe with additional sequences and isotype encodings should be
saved.
-o OUT, --out OUT path to where pickled clonotype dictionary input should be saved
-j CORES, --cores CORES
number of cores to use (default 1)
--heavy only use the heavy chain and ignore the light chain
-v, --verbose print additional messages
Input¶
The --data, --roots and --encoding arguments are required. See data description for more details. The --encoding argument should be the path to a text file that lists the isotype labels present in the input data, one per line, in the correct isotype order. For example:
IGHM
IGHG3
IGHG1
IGHA1
IGHG2
IGHG4
IGHE
IGHA2
isotype labeling
Be sure that the labels used in the encoding file exactly match the labeling syntax in the input data. There is no standard convention for isotype labels, e.g., IgM, M, IghM and IGHM, and therefore the convention must be provided by the user.
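A quick check along these lines can catch label mismatches before running preprocess. It is a minimal sketch: `unmatched_labels` is a hypothetical helper, and how the isotype labels are extracted from the input data (for example, which csv column holds them) is left to the caller.

```python
def unmatched_labels(encoding_text, data_labels):
    """Return the labels seen in the data that are missing from the encoding file.

    `encoding_text` is the content of the encoding file (one label per line);
    `data_labels` is any iterable of isotype labels taken from the input data.
    The comparison is exact and case-sensitive.
    """
    encoded = {line.strip() for line in encoding_text.splitlines() if line.strip()}
    return sorted(set(data_labels) - encoded)
```

An empty result means every label in the data also appears in the encoding file; a label such as IgM in the data would be reported if the file only lists IGHM.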
Output¶
The main output from the preprocess sub-command is a pickled dictionary storing a Clonotype object for each retained clonotype in the data. This output file is used as the input to the fit sub-command.
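To inspect the pickled dictionary from Python, something like the following should work. It is a sketch under two assumptions: the tribal package must be importable in the same environment so that the Clonotype objects can be unpickled, and `load_clonotypes` is a hypothetical helper name, not part of tribal. As with any pickle, only load files you trust.

```python
import pickle


def load_clonotypes(path):
    """Load the pickled dictionary written by `tribal preprocess -o`."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

For example, `len(load_clonotypes("clonotypes.pkl"))` would give the number of retained clonotypes.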
Example¶
Here is an example of how to run the preprocess sub-command.
$ tribal preprocess -d data.csv -r roots.csv -e isotypes.txt -j 4 --min-size 4 --dataframe filtered.csv -o clonotypes.pkl -v
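When preprocessing is one step of a larger pipeline, the same command can be launched from Python. The sketch below only assembles flags that appear in the help text above; the helper names `preprocess_command` and `run_preprocess` are hypothetical, and it assumes the `tribal` executable is on the PATH.

```python
import subprocess


def preprocess_command(data, roots, encoding, out, cores=1, min_size=4, heavy=False):
    """Build the argument list for `tribal preprocess`."""
    cmd = [
        "tribal", "preprocess",
        "-d", data,
        "-r", roots,
        "-e", encoding,
        "-o", out,
        "-j", str(cores),
        "--min-size", str(min_size),
    ]
    if heavy:
        # Only use the heavy chain and ignore the light chain.
        cmd.append("--heavy")
    return cmd


def run_preprocess(**kwargs):
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(preprocess_command(**kwargs), check=True)
```

Calling `run_preprocess(data="data.csv", roots="roots.csv", encoding="isotypes.txt", out="clonotypes.pkl", cores=4)` would mirror the shell example, minus the optional --dataframe and -v flags.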
Fit¶
The fit sub-command will infer a set of B cell lineage trees for each clonotype and a shared isotype transition probability matrix.
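To make the notion of an isotype transition probability matrix concrete, here is a small sketch of what such a matrix can look like. It assumes that class switching only moves forward along the isotype ordering, so entries below the diagonal are zero, and it spreads the remaining probability uniformly over later isotypes. The function `example_transmat` and this particular construction are illustrative assumptions, not the way tribal initializes or infers its matrix.

```python
def example_transmat(n_isotypes, stay_prob):
    """Build a row-stochastic, upper-triangular transition matrix.

    Row i gives the probability of moving from isotype i to each isotype j.
    `stay_prob` is placed on the diagonal; the rest is split evenly over
    the isotypes that come later in the ordering.
    """
    if not 0.0 < stay_prob <= 1.0:
        raise ValueError("stay_prob must be in (0, 1]")
    matrix = []
    for i in range(n_isotypes):
        row = [0.0] * n_isotypes
        n_later = n_isotypes - i - 1
        if n_later == 0:
            # The last isotype has nowhere to switch to.
            row[i] = 1.0
        else:
            row[i] = stay_prob
            for j in range(i + 1, n_isotypes):
                row[j] = (1.0 - stay_prob) / n_later
        matrix.append(row)
    return matrix
```

With the eight isotypes from the encoding example, `example_transmat(8, 0.55)` yields an 8 x 8 matrix whose rows each sum to 1.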
Usage¶
❯ tribal fit -h
usage: tribal fit [-h] -c CLONOTYPES --n_isotypes N_ISOTYPES [--stay-prob STAY_PROB] [-t TRANSMAT] [--niter NITER]
[--thresh THRESH] [-j CORES] [--max-cand MAX_CAND] [-s SEED] [--restarts RESTARTS]
[--mode {score,refinement}] [--score SCORE] [--transmat-infer TRANSMAT_INFER] [--verbose]
[--pickle PICKLE] [--write-results WRITE_RESULTS]
optional arguments:
-h, --help show this help message and exit
-c CLONOTYPES, --clonotypes CLONOTYPES
path to pickled clonotypes dictionary of parsimony forests, alignments, and isotypes
--n_isotypes N_ISOTYPES
the number of isotypes states to use
--stay-prob STAY_PROB
the lower and upper bound of not class switching, example: 0.55,0.95
-t TRANSMAT, --transmat TRANSMAT
optional filename of isotype transition probabilities
--niter NITER max number of iterations during fitting
--thresh THRESH theshold for convergence in during fitting
-j CORES, --cores CORES
number of cores to use
--max-cand MAX_CAND max candidate tree size per clonotype
-s SEED, --seed SEED random number seed
--restarts RESTARTS number of restarts
--mode {score,refinement}
mode for fitting B cell lineage trees, one of 'refinment' or 'score'
--score SCORE filename where the objective values file should be saved
--transmat-infer TRANSMAT_INFER
filename where the inferred transition matrix should be saved
--verbose print additional messages.
--pickle PICKLE path where the output dictionary of LineageTree lists should be pickled
--write-results WRITE_RESULTS
path where all optimal solution results are saved
Input¶
The pickled dictionary of clonotypes (clonotypes.pkl) that is output from tribal preprocess is the input to tribal fit. See Clonotype for details.
Example¶
Assuming clonotypes.pkl is in the working directory, here is an example of how to run tribal fit.
$ tribal fit -c clonotypes.pkl -j 3 --transmat-infer transmat.txt --pickle lineage_trees.pkl --write-results results --score objective.csv
Output¶
There are four optional outputs from tribal fit:
1. --transmat-infer: the inferred isotype transition probability matrix.
2. --pickle: a dictionary mapping each clonotype id to a LineageTreeList containing the optimal lineage trees for that clonotype.
3. --score: a csv file containing the SHM parsimony scores and CSR likelihood.
4. --write-results: a directory where the lineage tree files will be saved, including:
+ a fasta file containing the inferred BCR sequences,
+ a csv file containing the inferred isotypes,
+ a text file containing the edge list of the lineage tree,
+ a png file containing a visualization of the lineage tree.