
ABiView
1 Introduction
ABiView is a visualization and analysis tool for Sanger sequencing data, designed for batch parsing, quality assessment, and visualization of antibody ABi sequencing files. Users can import multiple files with one click; the tool automatically extracts the core sequence from each file and generates an interactive sequencing chromatogram that supports basic interactive operations. One-click analysis then evaluates sequencing quality and outputs standardized results, which can be exported as sequences or as a summary table of abnormalities.
2 Parameter Description
- File: Import multiple antibody ABi sequencing files with one click, supporting batch input of up to 100 files.
- Run Analysis: Perform quality analysis on antibody ABi sequencing files.
- Primer: Primer sequence, default is GGTGCAGCTGGTGCAGTCTGG.
- Region: Quality analysis region, default is 100 bp – 800 bp.
- Export: Export sequences or abnormality analysis summary tables from antibody ABi sequencing files with one click.
3 Result Description
Interactive Sequencing Chromatogram
Each ABi file corresponds to one sequencing chromatogram, displaying fluorescence curves for A/T/C/G and sequencing quality. The chromatogram supports interactions such as dragging and dynamic fluorescence intensity display.
- Search: Input sequence fragments to search.
- Peak height: Adjust peak height.
- Peak width: Adjust peak width.

Figure 1. Interactive Sequencing Chromatogram
Quality Analysis Summary Table
| Field | Description |
|---|---|
| file_name | Name of the input ABI sequencing file |
| seq | DNA nucleotide sequence extracted from the ABI file |
| seq_len | Total number of base pairs in the extracted DNA sequence |
| evaluation | Overall evaluation result. Marked as 'abnormal' if any abnormality is detected. |
| low_quality | Whether sequencing quality is low. True if a region has quality scores below the threshold or below mean quality × factor. |
| interruption | Whether sequencing is prematurely terminated (sequence length < region end). |
| abnormal_region_count | Total count of abnormal regions (stacking peaks, decay, no signal, high GC, consecutive bases, bubbles, dispersion). |
| stacking | Number of regions with stacking peaks |
| decay | Number of regions with decay |
| no_signal | Number of regions with no signal |
| GC_high | Number of regions with high GC content |
| consecutive_base | Number of regions with consecutive identical bases |
| bubble | Number of regions with bubbles |
| disperse | Number of regions with dispersion |
Detailed Table Description
The detailed table reports, for each detection region, which abnormalities were found. Abnormalities include stacking peaks, decay, no signal, high GC content, consecutive identical bases, bubbles, and dispersion.
| Field | Description |
|---|---|
| region | Detection region |
| stacking | Whether stacking peaks exist in the region |
| decay | Whether decay exists in the region |
| no_signal | Whether no signal exists in the region |
| GC_high | Whether GC content is high in the region |
| consecutive_base | Whether consecutive identical bases exist |
| bubble | Whether bubbles exist |
| disperse | Whether dispersion exists |
4 Abnormality Calculation and Definition
Abnormality Detection Workflow

Figure 2. Abnormality Detection Workflow
Low-Quality Region
Calculate low-quality flags and identify all consecutive low-quality regions.
For each base quality score, determine if it is below the low-quality threshold or less than the product of the mean quality score and the low-quality factor.
\text{low\_quality\_flags} = [\, \text{score} < \text{low\_quality\_threshold} \ \text{or} \ \text{score} < \text{mean\_quality} \times \text{low\_quality\_factor} \,]
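The flagging rule and the grouping into consecutive regions can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function name `low_quality_regions` and the default values for `low_quality_threshold` and `low_quality_factor` are assumptions, since the source does not state them.

```python
from typing import List, Tuple


def low_quality_regions(
    scores: List[float],
    low_quality_threshold: float = 20.0,  # assumed default, not from the source
    low_quality_factor: float = 0.5,      # assumed default, not from the source
) -> Tuple[List[bool], List[Tuple[int, int]]]:
    """Flag low-quality bases and merge consecutive flags into (start, end) regions."""
    if not scores:
        return [], []
    mean_quality = sum(scores) / len(scores)
    # A base is low quality if it is below the absolute threshold
    # or below the mean quality scaled by the factor.
    flags = [
        s < low_quality_threshold or s < mean_quality * low_quality_factor
        for s in scores
    ]
    regions: List[Tuple[int, int]] = []
    start = None
    for i, flagged in enumerate(flags):
        if flagged and start is None:
            start = i                       # a new low-quality run begins
        elif not flagged and start is not None:
            regions.append((start, i - 1))  # the run ended at the previous base
            start = None
    if start is not None:                   # run extends to the last base
        regions.append((start, len(flags) - 1))
    return flags, regions
```

Regions are returned as inclusive 0-based index pairs, so a single isolated low-quality base appears as `(i, i)`.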
stacking (Stacking Peak Detection)
Determine whether a stacking peak occurs within the interval. For stacking peak detection, first sort the signals, calculate the signal intensity of the primary and secondary peaks and their respective proportions, then perform further judgment based on peak width.
- Signal sorting:
(\text{main\_peak\_base}, \text{main\_peak\_sig}), (\text{sub\_peak\_base}, \text{sub\_peak\_sig}) = \text{sorted\_sigs}[0], \text{sorted\_sigs}[1]
- Ratio calculation:
\text{main\_peak\_ratio} = \frac{\text{main\_peak\_sig}}{\text{total\_signal}}, \quad \text{sub\_peak\_ratio} = \frac{\text{sub\_peak\_sig}}{\text{total\_signal}}
- Peak shape detection:
\text{peak\_width} = \text{calculate\_peak\_width}(\text{intensity\_data}[\text{main\_peak\_base}], \text{pos})
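- Overlay determination:
  - Filter low sub-peak signal (the position is not treated as a stacking peak):
    \text{if } \text{sub\_peak\_sig} < \text{sub\_peak\_min\_abs} \ \text{or} \ \text{sub\_peak\_ratio} < \text{sub\_peak\_min\_ratio}
  - Select the reference width avg_width: the global average width when it is available, otherwise the local average width when at least 5 local widths were measured:
    \text{if } \text{global\_avg\_width} > 0
    \text{else if } \text{local\_width\_count} \geq 5
  - Width validation (a peak narrower than the scaled reference width is not treated as a stacking peak):
    \text{if } \text{peak\_width} < \text{avg\_width} \times \text{peak\_width\_threshold}
  - Otherwise:
    \text{is\_overlay} = \text{True}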
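One possible reading of the stacking decision at a single position is sketched below. Several points are assumptions: the function name `is_stacking_peak`, the parameter `local_avg_width`, all default thresholds, and the choice to skip the width check when neither average width is available. The peak width is passed in directly rather than computed, because the source does not define `calculate_peak_width`.

```python
from typing import Dict


def is_stacking_peak(
    signals: Dict[str, float],           # per-base intensity at one position
    peak_width: float,                   # width of the main peak, computed elsewhere
    global_avg_width: float = 0.0,
    local_avg_width: float = 0.0,        # hypothetical name for the local average
    local_width_count: int = 0,
    sub_peak_min_abs: float = 50.0,      # assumed default
    sub_peak_min_ratio: float = 0.3,     # assumed default
    peak_width_threshold: float = 1.2,   # assumed default
) -> bool:
    total_signal = sum(signals.values())
    if total_signal <= 0:
        return False
    # Sort bases by intensity, strongest first, and take the top two.
    sorted_sigs = sorted(signals.items(), key=lambda kv: kv[1], reverse=True)
    (_main_base, _main_sig), (_sub_base, sub_peak_sig) = sorted_sigs[0], sorted_sigs[1]
    sub_peak_ratio = sub_peak_sig / total_signal
    # A weak secondary peak cannot form a stacking peak.
    if sub_peak_sig < sub_peak_min_abs or sub_peak_ratio < sub_peak_min_ratio:
        return False
    # Prefer the global average width; fall back to the local one if it is reliable.
    avg_width = None
    if global_avg_width > 0:
        avg_width = global_avg_width
    elif local_width_count >= 5:
        avg_width = local_avg_width
    # Assumption: with no usable average width, the width check is skipped.
    if avg_width is not None and peak_width < avg_width * peak_width_threshold:
        return False
    return True
```

Passing the width in keeps the sketch independent of how peak widths are measured on the trace.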
no_signal (No Signal Detection)
Determine whether the signal is insignificant within the interval. No-signal detection checks whether the total signal is below the set minimum threshold, or whether there is no obvious main peak.
- If the total signal is less than the minimum threshold, it is directly determined as no signal; otherwise, sort the signals to find the main peak and calculate the main peak's signal proportion:
\text{if } \text{total\_signal} < \text{total\_signal\_min} \Rightarrow \text{True}
\text{main\_peak\_ratio} = \frac{\text{main\_peak\_sig}}{\text{total\_signal}}
- If the main peak ratio is less than the minimum ratio threshold or the main peak signal is less than the minimum absolute threshold, it is determined as no signal:
\text{if } \text{main\_peak\_ratio} < \text{main\_peak\_min\_ratio} \ \text{or} \ \text{main\_peak\_sig} < \text{main\_peak\_min\_abs}
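The two-stage check above can be sketched as a small function. The name `is_no_signal` and the default threshold values are assumptions for illustration; the source names the thresholds but does not give their values.

```python
from typing import Dict


def is_no_signal(
    signals: Dict[str, float],          # per-base intensity at one position
    total_signal_min: float = 100.0,    # assumed default
    main_peak_min_ratio: float = 0.4,   # assumed default
    main_peak_min_abs: float = 50.0,    # assumed default
) -> bool:
    total_signal = sum(signals.values())
    # Stage 1: too little total signal is reported as no signal directly.
    if total_signal <= 0 or total_signal < total_signal_min:
        return True
    # Stage 2: the strongest base must dominate, both relatively and absolutely.
    main_peak_sig = max(signals.values())
    main_peak_ratio = main_peak_sig / total_signal
    return main_peak_ratio < main_peak_min_ratio or main_peak_sig < main_peak_min_abs
```

The explicit `total_signal <= 0` guard only protects the division when a caller sets `total_signal_min` to zero or below.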
decay (Signal Decay Detection)
Determine whether the main fluorescence signal intensity gradually decreases at each position. For decay detection, analyze the main peak signal values within the specified region and perform linear fitting to determine if significant decay exists.
- Iterate through the specified region to collect main peak signal values main_peak_signals
- If the number of main peak signal values is less than min_decay_data (default 10), ignore this region; otherwise, perform linear fitting:
x = \text{np.arange(len(main\_peak\_signals))}
y = \frac{\text{main\_peak\_signals}}{\text{np.mean(main\_peak\_signals)}}
\text{decay} = \begin{cases} \text{True} & \text{if } \text{slope} < \text{slope\_threshold} \\ \text{False} & \text{otherwise} \end{cases}
Here slope is the slope of the line fitted to (x, y).
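A minimal sketch of the decay test follows. It assumes `np.polyfit` with degree 1 as the linear fit, which the source does not specify, and an illustrative `slope_threshold` of -0.01; the function name `has_decay` is hypothetical. Only `min_decay_data = 10` comes from the text.

```python
import numpy as np


def has_decay(
    main_peak_signals,
    slope_threshold: float = -0.01,  # assumed default, not from the source
    min_decay_data: int = 10,        # default stated in the text
) -> bool:
    signals = np.asarray(main_peak_signals, dtype=float)
    # Too few points: the region is ignored.
    if signals.size < min_decay_data:
        return False
    mean_signal = signals.mean()
    if mean_signal <= 0:
        return False
    x = np.arange(signals.size)
    y = signals / mean_signal              # normalize so the slope is scale-free
    slope, _intercept = np.polyfit(x, y, 1)  # least-squares straight line
    return bool(slope < slope_threshold)
```

Dividing by the mean makes the slope comparable across traces with different overall intensity, so a single threshold can be applied to all files.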
GC_high (High GC Content)
Calculate the GC content and determine whether it exceeds the set threshold (gc_threshold, default 0.9). First calculate the GC ratio, then determine if the GC content is high:
\text{gc\_ratio} = \begin{cases} \frac{\text{gc\_count}}{\text{region\_len}} & \text{if } \text{region\_len} > 0 \\ 0 & \text{otherwise} \end{cases}
\text{gc\_high} = \begin{cases} \text{True} & \text{if } \text{gc\_ratio} \geq \text{gc\_threshold} \\ \text{False} & \text{otherwise} \end{cases}
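The GC check translates almost directly into code. In this sketch the function name `is_gc_high` and the argument `region_seq` (the bases of one detection region) are hypothetical; the default `gc_threshold` of 0.9 is taken from the text.

```python
def is_gc_high(region_seq: str, gc_threshold: float = 0.9) -> bool:
    region_len = len(region_seq)
    # Count G and C bases, ignoring case.
    gc_count = sum(1 for base in region_seq.upper() if base in "GC")
    # An empty region has a GC ratio of 0 by definition.
    gc_ratio = gc_count / region_len if region_len > 0 else 0.0
    return gc_ratio >= gc_threshold
```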
consecutive_base (Consecutive Identical Bases)
Check whether the region contains a run of consecutive identical bases, or whether a single base accounts for more than 0.9 of the region. Let consecutive_base_count be the run-length threshold for consecutive identical bases and region_len the length of the region. The flag is then:
\text{consecutive\_base} = (\exists i \text{ such that repeated bases}) \ \vee \ (\exists \text{base with proportion} > \text{threshold})
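The sketch below shows one way to evaluate both conditions. It assumes that a run counts once it reaches `consecutive_base_count` bases and uses an illustrative default of 8 for that threshold; the source states neither. The function name `has_consecutive_base` and the parameter `single_base_ratio` are hypothetical, with 0.9 taken from the text.

```python
from itertools import groupby


def has_consecutive_base(
    region_seq: str,
    consecutive_base_count: int = 8,   # assumed default, not from the source
    single_base_ratio: float = 0.9,    # proportion threshold stated in the text
) -> bool:
    region_len = len(region_seq)
    if region_len == 0:
        return False
    seq = region_seq.upper()
    # Longest run of one repeated base, e.g. "AAAA" inside "CGAAAAT" gives 4.
    longest_run = max(len(list(group)) for _base, group in groupby(seq))
    # Largest share of the region taken by any single base.
    max_ratio = max(seq.count(base) for base in set(seq)) / region_len
    return longest_run >= consecutive_base_count or max_ratio > single_base_ratio
```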
bubble (Bubble Detection)
Determine whether the position with the highest signal value falls in this region while its main peak is not obvious.
- Calculate the total signal at the current position and the average total signal:
\text{max\_signal\_pos} = \underset{i}{\text{argmax}} \left( \text{a\_intensity}[i] + \text{c\_intensity}[i] + \text{g\_intensity}[i] + \text{t\_intensity}[i] \right)
\text{total\_peak\_signals}_i = \text{a\_intensity}[i] + \text{c\_intensity}[i] + \text{g\_intensity}[i] + \text{t\_intensity}[i]
\text{mean\_total\_peak\_signals} = \frac{1}{n} \sum_{i=1}^{n} \text{total\_peak\_signals}_i
- Bubble determination: The total signal at the current position suddenly increases (≥ 2 times the average total signal) and the signal is dispersed (the sum of the top 2 signal proportions at the current position is < 60%, i.e., no obvious main peak):
\text{bubble} = \begin{cases} \text{True} & \text{if } \text{total\_peak\_signals}_i > 2 \times \text{mean\_total\_peak\_signals} \text{ and } \frac{\text{signals}[0] + \text{signals}[1]}{\text{total\_peak\_signals}_i} < 0.6 \\ \text{False} & \text{otherwise} \end{cases}
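A minimal sketch of the bubble rule is shown below. It evaluates every position rather than only the position of maximum signal, and it uses the strict comparison from the formula; both are choices made for illustration. The function name `find_bubbles` and the parameters `signal_factor` and `top2_ratio_max` are hypothetical names for the constants 2 and 0.6 in the text.

```python
from typing import List, Sequence


def find_bubbles(
    a_intensity: Sequence[float],
    c_intensity: Sequence[float],
    g_intensity: Sequence[float],
    t_intensity: Sequence[float],
    signal_factor: float = 2.0,    # "2 times the average total signal"
    top2_ratio_max: float = 0.6,   # "sum of the top 2 signal proportions < 60%"
) -> List[int]:
    """Return the 0-based positions that satisfy the bubble condition."""
    channels = list(zip(a_intensity, c_intensity, g_intensity, t_intensity))
    if not channels:
        return []
    total_peak_signals = [sum(sigs) for sigs in channels]
    mean_total = sum(total_peak_signals) / len(total_peak_signals)
    bubbles = []
    for i, sigs in enumerate(channels):
        total = total_peak_signals[i]
        if total <= 0:
            continue
        top_two = sorted(sigs, reverse=True)[:2]
        surge = total > signal_factor * mean_total          # sudden increase
        dispersed = sum(top_two) / total < top2_ratio_max   # no obvious main peak
        if surge and dispersed:
            bubbles.append(i)
    return bubbles
```

With four channels the top two always hold at least half of the total, so the dispersion test effectively selects positions where the top two hold between 50% and 60%.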
disperse (Signal Dispersion Detection)
The signal of a single base is detected across multiple consecutive base positions, resulting in peak broadening. Under the premise of sufficient total signal and dispersed signal, if a single base shows significant proportion across continuous or adjacent positions, it is determined as disperse.
- Main peak base proportion:
\text{Main peak base proportion}_i = \frac{\text{main\_peak\_base\_signal\_intensity}_i}{\text{total\_signal}_i}
- Cross-position proportion (a base shows significant proportion at both the current position and adjacent positions). For current base position i and adjacent position j (j = i \pm 1, i \pm 2), the cross-position proportion condition is:
\text{Cross-position proportion condition} = \left( \frac{\text{base\_signal\_intensity}_i}{\text{total\_signal}_i} > \text{threshold} \right) \land \left( \frac{\text{base\_signal\_intensity}_j}{\text{total\_signal}_j} > \text{threshold} \right)
- Let \text{disp\_candidates} be the list of base positions satisfying disperse conditions. The formula for extracting continuous disperse regions is:
\text{Continuous disperse region} = \left\{ (s, e) \mid s, e \in \text{disp\_candidates}, e - s + 1 \geq \text{min\_continuous} \right\}
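The last step, grouping candidate positions into continuous regions, can be sketched as follows. The sketch assumes that "continuous" means strictly adjacent positions and uses an illustrative default of 3 for `min_continuous`; the source gives no value. The function name `continuous_disperse_regions` is hypothetical.

```python
from typing import Iterable, List, Tuple


def continuous_disperse_regions(
    disp_candidates: Iterable[int],
    min_continuous: int = 3,   # assumed default, not from the source
) -> List[Tuple[int, int]]:
    """Merge adjacent candidate positions into inclusive (s, e) regions."""
    positions = sorted(set(disp_candidates))
    regions: List[Tuple[int, int]] = []
    if not positions:
        return regions
    start = prev = positions[0]
    for pos in positions[1:]:
        if pos == prev + 1:        # still inside the same run
            prev = pos
            continue
        if prev - start + 1 >= min_continuous:
            regions.append((start, prev))
        start = prev = pos         # begin a new run
    if prev - start + 1 >= min_continuous:   # close the final run
        regions.append((start, prev))
    return regions
```

Runs shorter than `min_continuous` are dropped, which matches the length condition e - s + 1 ≥ min_continuous in the set definition.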

