

AbAtlas
1 Introduction
AbAtlas is a dimensionality-reduction and visualization tool that maps antibody sequences into two- or three-dimensional plots. It uses data from the Observed Antibody Space (OAS) database [1], which covers heavy and light chains from six major species (human, mouse, rat, rhesus monkey, camel, and rabbit) together with their germline genes. AbAtlas combines AntiBERTy [2], which produces the sequence embeddings, with UMAP [3], which reduces those embeddings to two or three dimensions. Given an input antibody sequence, AbAtlas analyzes it and plots it alongside antibody chains from different species or different V gene families, so that its similarity to each group, and therefore its likely features, can be read directly from the chart.
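The embed-then-reduce pipeline described above can be sketched in Python as follows. This is a minimal sketch, not AbAtlas's actual implementation: the mean pooling of per-residue embeddings, the function names (`mean_pool`, `antiberty_embed`, `umap_reduce`, `build_atlas`), and the default UMAP settings are assumptions made for illustration. It relies on the `antiberty` and `umap-learn` packages, which are imported lazily so that the pooling helper can be used without them.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def mean_pool(per_residue: Sequence[Sequence[float]]) -> Vector:
    """Average per-residue vectors into one fixed-length sequence vector."""
    if not per_residue:
        raise ValueError("cannot pool an empty embedding")
    n = len(per_residue)
    dim = len(per_residue[0])
    return [sum(row[j] for row in per_residue) / n for j in range(dim)]


def antiberty_embed(sequences: Sequence[str]) -> List[List[Vector]]:
    """Per-residue AntiBERTy embeddings, one matrix per sequence."""
    from antiberty import AntiBERTyRunner  # heavy dependency, imported lazily

    runner = AntiBERTyRunner()
    return [e.detach().cpu().tolist() for e in runner.embed(list(sequences))]


def umap_reduce(vectors: Sequence[Vector], n_components: int = 2) -> List[Vector]:
    """Project fixed-length vectors to 2 or 3 dimensions with UMAP."""
    import numpy as np
    import umap  # provided by the umap-learn package

    reducer = umap.UMAP(n_components=n_components)
    return reducer.fit_transform(np.asarray(vectors)).tolist()


def build_atlas(
    sequences: Sequence[str],
    n_components: int = 2,
    embed_fn: Callable = antiberty_embed,
    reduce_fn: Callable = umap_reduce,
) -> List[Vector]:
    """Embed each sequence, pool it to one vector, then reduce to 2D or 3D."""
    if n_components not in (2, 3):
        raise ValueError("n_components must be 2 or 3")
    pooled = [mean_pool(matrix) for matrix in embed_fn(sequences)]
    return reduce_fn(pooled, n_components)
```

In a setup like this, the input sequences and the control samples would presumably be passed to `build_atlas` together, so that all points share a single projection and their positions are comparable.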
2 Parameters
- Sequences: Input the nucleic acid or amino acid sequences of the Fv region of the antibody heavy or light chain.
- Species: Select a specific species for subsequent V/D/J gene matching, or select "Match" to have the tool match the species automatically.
- Cgene Species: Select a specific species for C gene matching. Currently only human, mouse, and rat are supported. If you select "None", the tool will not perform C gene matching in the subsequent analysis.
- Components: Number of dimensions of the output plot, 2D or 3D.
- Annotation: The basis for annotation; select either Human V gene family or Species.
- Control Samples: Number of control samples, from 10,000 to 100,000. This many chains are sampled from the chains of the various species in the OAS data to serve as controls.
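Two of the parameters above lend themselves to a short illustration: telling nucleic acid input from amino acid input, and drawing the requested number of control samples. The sketch below is a guess at how such checks could look, not AbAtlas's own code. The alphabet-based heuristic, the uniform sampling over the pooled chains, and the names `classify_sequence`, `sample_controls`, and `chains_by_species` are all assumptions; only the 10,000 to 100,000 range comes from the parameter description.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

NUCLEOTIDES = set("ACGTUN")
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
MIN_CONTROLS, MAX_CONTROLS = 10_000, 100_000  # range stated for Control Samples


def classify_sequence(seq: str) -> str:
    """Heuristic: a sequence made only of nucleotide letters is nucleic acid."""
    letters = set(seq.strip().upper())
    if not letters:
        raise ValueError("empty sequence")
    if letters <= NUCLEOTIDES:
        return "nucleic_acid"
    if letters <= AMINO_ACIDS:
        return "amino_acid"
    raise ValueError(f"unexpected characters: {sorted(letters - AMINO_ACIDS)}")


def sample_controls(
    chains_by_species: Dict[str, Sequence[str]],
    n_controls: int,
    seed: Optional[int] = None,
) -> List[Tuple[str, str]]:
    """Draw n_controls (species, chain) pairs without replacement.

    Assumption: chains are drawn uniformly from the pooled chains of all
    species, so larger species pools contribute more controls on average.
    """
    if not MIN_CONTROLS <= n_controls <= MAX_CONTROLS:
        raise ValueError(
            f"n_controls must be between {MIN_CONTROLS} and {MAX_CONTROLS}"
        )
    pool = [
        (species, chain)
        for species, chains in chains_by_species.items()
        for chain in chains
    ]
    if len(pool) < n_controls:
        raise ValueError("not enough chains in the pool")
    return random.Random(seed).sample(pool, n_controls)
```

Passing a fixed `seed` makes the control set reproducible between runs, which keeps repeated plots of the same input comparable.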
3 Results Explanation
Chart: The 2D or 3D plot, with red diamond-shaped points representing the input sequences and circular points representing the control samples.

Figure 1. 2D Chart of AbAtlas on 20,000 control samples.

Figure 2. 3D Chart of AbAtlas on 20,000 control samples.
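The chart encoding described in this section (red diamonds for input sequences, circles for control samples) can be reproduced with a few lines of matplotlib. This is an illustrative sketch only: the function names `columns` and `plot_atlas`, the marker sizes, and the output file name are assumptions, and the controls are drawn in a single color rather than colored by species or V gene family.

```python
from typing import List, Sequence


def columns(points: Sequence[Sequence[float]]) -> List[List[float]]:
    """Transpose [(x, y[, z]), ...] into one list per axis."""
    return [list(axis) for axis in zip(*points)]


def plot_atlas(
    controls: Sequence[Sequence[float]],
    inputs: Sequence[Sequence[float]],
    out_path: str = "abatlas.png",
) -> None:
    """Scatter controls as circles and inputs as red diamonds, in 2D or 3D."""
    import matplotlib

    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    dims = len(inputs[0])
    if dims not in (2, 3):
        raise ValueError("points must be 2D or 3D")

    fig = plt.figure()
    ax = fig.add_subplot(projection="3d") if dims == 3 else fig.add_subplot()
    # Control samples: small, semi-transparent circles in the background.
    ax.scatter(*columns(controls), marker="o", s=4, alpha=0.4,
               label="control samples")
    # Input sequences: larger red diamonds drawn on top.
    ax.scatter(*columns(inputs), marker="D", s=40, c="red",
               label="input sequences")
    ax.legend()
    fig.savefig(out_path, dpi=150)
    plt.close(fig)
```

The same function handles both settings of the Components parameter, because the dimensionality is read from the points themselves.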
4 References
[1] Olsen T H, Boyles F, Deane C M. Observed Antibody Space: A diverse database of cleaned, annotated, and translated unpaired and paired antibody sequences[J]. Protein Science, 2022, 31(1): 141-146. https://doi.org/10.1002/pro.4205
[2] Ruffolo J A, Gray J J, Sulam J. Deciphering antibody affinity maturation with language models and weakly supervised learning[J]. arXiv preprint arXiv:2112.07782, 2021. https://doi.org/10.48550/arXiv.2112.07782
[3] McInnes L, Healy J, Melville J. UMAP: Uniform manifold approximation and projection for dimension reduction[J]. arXiv preprint arXiv:1802.03426, 2018. https://doi.org/10.48550/arXiv.1802.03426

