

AbForest
1 Introduction
AbForest is a clonal lineage evolution analysis tool designed for B cell immune repertoires. It covers the entire process from clonal expansion, somatic hypermutation (SHM), and isotype switching to antigen-driven clonal selection, so that antibody developmental trajectories can be reconstructed end to end. Starting from raw sequencing data, the tool performs germline alignment, sequence filtering, clonal grouping, and AI likelihood prediction in sequence, then constructs evolutionary trees with the maximum parsimony method. The output of this integrated workflow is a set of B cell clonal lineage trees.
The workflow has four steps. First, germline alignment identifies the ancestral Fv sequences and the VDJ segments. Second, sequences are classified into clonotypes based on the ancestral sequences, and the IMGT numbering scheme is applied to locate positions and annotate functional regions so that the key variable regions are aligned accurately. Third, lineage trees are built with the maximum parsimony method, taking somatic hypermutation and isotype switching features into account. Finally, the leaves of each tree are annotated visually with sequence and clone abundance and with generation likelihood scores computed by a pre-trained antibody language model, which shows sequence rarity and potential functionality at a glance and supports in-depth analysis of clonal evolution dynamics.
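The idea behind maximum parsimony can be illustrated with a short sketch. The code below is not AbForest's implementation, which this document does not specify. It only shows the standard Fitch algorithm, which counts the minimum number of nucleotide changes a fixed binary tree needs to explain a set of aligned sequences. The tree encoding, the function names, and the toy sequences are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical encoding: a tree is a leaf name (str) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (candidate ancestral states, minimum changes) for one alignment column."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no extra change needed here.
        return shared, left_cost + right_cost
    # The children disagree: one change is required on one of the two branches.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree: Tree, seqs: Dict[str, str]) -> int:
    """Sum the per-column Fitch costs over an alignment of equal-length sequences."""
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to equal length")
    n = lengths.pop()
    return sum(
        fitch_site(tree, {name: s[i] for name, s in seqs.items()})[1]
        for i in range(n)
    )


# Toy data (invented): a germline sequence and three mutated descendants.
seqs = {"germline": "ACGT", "s1": "ACGA", "s2": "ACTA", "s3": "GCTA"}
tree_a = (("germline", "s1"), ("s2", "s3"))
tree_b = (("germline", "s3"), ("s1", "s2"))
score_a = parsimony_score(tree_a, seqs)
score_b = parsimony_score(tree_b, seqs)
print(score_a, score_b)
```

Under the parsimony criterion, the topology with the lower score is preferred because it explains the observed sequences with fewer mutations. A full method also has to search over topologies and, for B cell lineages, may weigh isotype switching as well; neither is covered by this sketch.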

Figure 1. Workflow of AbForest
2 Parameter Description
- Heavy Chain Sequences: Heavy chain DNA sequences from a single-cell sequencing file.
- Light Chain Sequences: Optional. Light chain DNA sequences from the same single-cell sequencing file; each light chain sequence corresponds to one heavy chain sequence.
- VDJ Germline Species:
  - Human: Matches human VDJ genes (Homo sapiens).
  - Mouse: Matches mouse VDJ genes (Mus musculus).
  - Alpaca (H): Matches alpaca VDJ genes (Vicugna pacos).
  - Rat: Matches rat VDJ genes (Rattus norvegicus).
  - Rhesus Monkey: Matches rhesus monkey VDJ genes (Macaca mulatta).
  - Rabbit: Matches rabbit VDJ genes (Oryctolagus cuniculus).
  - HUGO H3K3: Matches HUGO fully humanized antibody mouse VDJ genes.
- C Germline Species:
  - Human: Matches human C genes (Homo sapiens).
  - Mouse: Matches mouse C genes (Mus musculus).
- Clonotype Definition Method:
  - Strict: Groups sequences whose germline Fv regions (both FWR and CDR) have identical nucleotide sequences into the same clonotype. Suitable for analyses that require very high resolution of clonal diversity.
  - General: Groups sequences that share the same V and J germline genes and the same CDR3 amino acid length into the same clonotype. This tolerates some sequence variation and suits broader definitions of a clonal population.
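The difference between the two clonotype definitions can be sketched as a choice of grouping key. The snippet below is an illustrative assumption, not AbForest's code. The record fields (`id`, `germline_fv_nt`, `v_gene`, `j_gene`, `cdr3_aa`) and the function names are hypothetical, and the germline strings are short stand-ins for full Fv nucleotide sequences.

```python
from collections import defaultdict
from typing import Dict, Hashable, List


def clonotype_key(rec: Dict[str, str], method: str) -> Hashable:
    """Build the grouping key for one annotated sequence record."""
    if method == "strict":
        # Strict: identical germline Fv nucleotide sequence.
        return rec["germline_fv_nt"]
    if method == "general":
        # General: same V gene, same J gene, same CDR3 amino acid length.
        return (rec["v_gene"], rec["j_gene"], len(rec["cdr3_aa"]))
    raise ValueError(f"unknown clonotype definition method: {method!r}")


def group_clonotypes(records: List[Dict[str, str]], method: str) -> Dict[Hashable, List[str]]:
    """Map each clonotype key to the ids of the sequences that fall into it."""
    groups: Dict[Hashable, List[str]] = defaultdict(list)
    for rec in records:
        groups[clonotype_key(rec, method)].append(rec["id"])
    return dict(groups)


# Invented example records.
records = [
    {"id": "a", "germline_fv_nt": "CAGGTGCAG", "v_gene": "IGHV1-69", "j_gene": "IGHJ4", "cdr3_aa": "ARDYYGSGSYFD"},
    {"id": "b", "germline_fv_nt": "CAGGTGCAG", "v_gene": "IGHV1-69", "j_gene": "IGHJ4", "cdr3_aa": "ARDYWGSGSYFD"},
    {"id": "c", "germline_fv_nt": "CAGGTTCAG", "v_gene": "IGHV1-69", "j_gene": "IGHJ4", "cdr3_aa": "ARGGYSSGWYFD"},
    {"id": "d", "germline_fv_nt": "GAGGTGCAG", "v_gene": "IGHV3-23", "j_gene": "IGHJ6", "cdr3_aa": "AKDLGY"},
]
strict = group_clonotypes(records, "strict")
general = group_clonotypes(records, "general")
print(len(strict), len(general))
```

In this toy input, record `c` forms its own clonotype under the strict key because its germline Fv differs, but it merges with `a` and `b` under the general key because all three share V gene, J gene, and CDR3 length.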
3 Result Description
Chart: B cell clonal lineage tree
Each clonotype corresponds to one tree, and each tree has one primary ancestor. Nodes filled in gray represent inferred ancestors, while nodes filled with other colors represent real (observed) sequences; the darker the fill, the lower the AI likelihood score. Node border color indicates the isotype, node size indicates sequence abundance, and the value on each branch is the genetic distance between the two node sequences it connects.
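The visual encodings described above can be sketched as simple mappings. The document does not state which distance metric, color scale, or size scale AbForest uses, so everything below is an assumption: Hamming distance on aligned sequences for branch values, a linear darkness scale for likelihood, and node area proportional to abundance. All function names are hypothetical.

```python
import math


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def fill_darkness(likelihood: float, lo: float, hi: float) -> float:
    """Map a likelihood score to darkness in [0, 1]; lower likelihood is darker."""
    if hi == lo:
        return 0.5  # all scores equal: use a neutral shade
    clipped = min(max(likelihood, lo), hi)
    return (hi - clipped) / (hi - lo)


def node_radius(abundance: int, base: float = 4.0) -> float:
    """Scale radius with sqrt(abundance) so node area grows linearly with abundance."""
    return base * math.sqrt(abundance)


# Invented values: a parent-child branch and two node scores on a log-likelihood range.
branch_value = hamming("ACGT", "ACTA")
darkest = fill_darkness(-10.0, lo=-10.0, hi=-2.0)
lightest = fill_darkness(-2.0, lo=-10.0, hi=-2.0)
radius = node_radius(4)
print(branch_value, darkest, lightest, radius)
```

Scaling the radius by the square root keeps highly expanded clones from visually overwhelming the tree, which is a common design choice rather than a documented AbForest behavior.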

Figure 2. Example of a B cell clonal lineage tree
4 References
[1] Weber LL, Reiman D, Roddur MS, Qi Y, El-Kebir M, Khan AA. TRIBAL: Tree Inference of B cell Clonal Lineages. Cell Genomics. 2024; 100637. https://doi.org/10.1016/j.xgen.2024.100637

