DH cluster 2 has a significant deletion
We analyzed the DH regions of the recent assembly of the Bos taurus immunoglobulin heavy chain locus [29] for features associated with the ultralong IGHD8-2 region. Of particular note, the DH regions at the heavy chain locus are divided into “clusters” that arose through duplication events during evolution. The IMGT nomenclature for DH regions includes numerical designations for the family and cluster of each gene; for example, IGHD3-2 belongs to family 3 and is located in cluster 2 [16, 31-33]. There are four clusters, of which clusters 2-4 are homologous, with pairwise nucleotide identities of 92% (cluster 2 vs cluster 3), 99.7% (cluster 3 vs cluster 4), and 92% (cluster 2 vs cluster 4). The DH regions within the clusters are also homologous: DH regions occupying analogous positions are 96% to 100% identical at the nucleotide level (Supplemental Figure 1). A major discrepancy among the cluster sequences, however, is that cluster 2 (3480 nt) is 358 and 364 nucleotides shorter than clusters 3 (3838 nt) and 4 (3844 nt), respectively. Additionally, cluster 2 contains only five DH regions, one of which is the ultralong IGHD8-2, whereas clusters 3 and 4 each contain six DH regions (Figure 1). Thus, cluster 2 appears to carry a significant genomic deletion relative to the highly homologous clusters 3 and 4. We hypothesized that this deletion might be related to formation of the ultralong IGHD8-2 region located in cluster 2. In simple terms, an ultralong DH region could form by fusion of two DH regions through deletion of the intervening intergenic sequence, with the fused gene retaining the 5’ recombination signal sequence (RSS) of the 5’ DH and the 3’ RSS of the 3’ DH.
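The length comparison and identity values above are simple arithmetic on the cluster sequences and their alignments. The Python sketch below reproduces the length differences from the reported cluster sizes and computes percent identity from a pairwise alignment. It assumes gaps are written as '-' and that identity is counted over all alignment columns; the alignment tool and identity definition behind the reported values may differ, and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two equal-length aligned sequences ('-' marks a gap).

    Assumption: identity = identical non-gap columns / all alignment columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


# Cluster lengths (nt) as reported in the text.
cluster_lengths = {"cluster 2": 3480, "cluster 3": 3838, "cluster 4": 3844}

# How much shorter cluster 2 is than each of the other clusters.
shortfall = {
    name: length - cluster_lengths["cluster 2"]
    for name, length in cluster_lengths.items()
    if name != "cluster 2"
}
print(shortfall)  # {'cluster 3': 358, 'cluster 4': 364}

# Hypothetical toy alignment (not real cluster sequence): 8 of 10 columns match.
print(percent_identity("ACGT-ACGTA", "ACGTTACCTA"))  # 80.0
```

Counting gap columns in the denominator penalizes indels such as the cluster 2 deletion; excluding them would give higher identities for the same alignment.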
Cluster 2 has a short chromosomal rearrangement
To locate the deletion in cluster 2 relative to clusters 3 and 4, we performed a series of sequence alignments of the clusters, the DH regions, and the intergenic regions (between DH regions). The deletion in cluster 2 relative to clusters 3 and 4 indeed occurred at IGHD8-2; however, it was also associated with a larger chromosomal rearrangement. Specifically, IGHD5-2 in cluster 2 appears to have replaced the paralog of IGHD3-3 (cluster 3) and IGHD3-4 (cluster 4) (Figure 1, Supplemental Figures 2-3). The IGHD5 homologs lie immediately 5’ of the IGHD6 family members in clusters 3 and 4, whereas IGHD5-2 lies immediately 3’ of IGHD2-2 and immediately 5’ of the ultralong IGHD8-2 region in cluster 2 (Figure 1). Cluster 2 contains no IGHD3 family member (Supplemental Figure 3); the paralog of IGHD3-3 and IGHD3-4 was either deleted or fused to the adjacent DH region, which would be the paralog of IGHD7-3 (cluster 3) or IGHD7-4 (cluster 4). Global alignments of the clusters show deleted nucleotides at IGHD8-2 as well as at the position occupied by family 5 genes in clusters 3 and 4 (i.e., between IGHD7 and IGHD6). Alignments of the intergenic regions show that the region corresponding to the sequence between IGHD3-3 and IGHD7-3 in cluster 3 (or between IGHD3-4 and IGHD7-4 in cluster 4) is deleted in cluster 2 (Supplemental Figure 4). Although IGHD5-2 has been transposed to a position between IGHD2-2 and IGHD8-2, the genetic material actually deleted clearly includes IGHD3 and its 3’ intergenic region. Thus, one possibility is that the ultralong IGHD8-2 region resulted from a deletion and associated fusion of the cluster 2 paralogs of IGHD3-3 and IGHD7-3. However, local sequence alignment reveals that the 5’ end of IGHD6-3 is 91.2% identical to IGHD8-2 (89.4% for IGHD6-2) over the first 85 nucleotides, whereas IGHD3-3 (and IGHD3-4) is only 80% identical over the first 62 nucleotides (Supplemental Figure 7).
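Local identities such as those over the 5’ ends of IGHD6-3 and IGHD8-2 come from local alignments. As a hedged illustration, the following is a textbook Smith-Waterman local alignment with a linear gap penalty. The scoring values (match +2, mismatch -1, gap -2), the function names, and the toy sequences are assumptions for illustration, not necessarily the tool or parameters used for the reported values.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Best local alignment of a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    Scoring values are illustrative assumptions.
    """
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1].
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j

    # Trace back from the best cell until a zero cell ends the local alignment.
    out_a, out_b = [], []
    i, j = best_i, best_j
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def aligned_identity(x: str, y: str) -> float:
    """Percent of alignment columns that are identical (gaps count as mismatches)."""
    same = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * same / len(x) if x else 0.0


# Hypothetical toy sequences; in practice one might compare, e.g., the first
# 85 nt of one DH region (query[:85]) against another DH region.
query, target = "GGTACAGTGGTTA", "TTTACAGTGGTTACC"
score, aln_q, aln_t = smith_waterman(query, target)
print(score, aln_q, aln_t, aligned_identity(aln_q, aln_t))
```

Restricting the query to a prefix (for example `query[:85]`) mirrors the “first N nucleotides” comparisons; note that the reported identity then depends on both the window and the scoring scheme.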
Of note, IGHD6 family sequences share a cysteine at the same position as the conserved cysteine in IGHD8-2, which is highly conserved in deep-sequenced ultralong CDR H3 antibodies and participates in a conserved disulfide bond at the base of the ultralong CDR H3 stalk [23, 26]. Thus, donation of an IGHD6 sequence to the 5’ end of an IGHD7 through recombination or gene conversion is a likely mechanism for producing IGHD8-2. Given the high sequence similarity among many of the DH regions and intergenic regions, we cannot definitively identify the exact chromosomal breakpoints, nor can we rule out that other events occurred in conjunction with deletion of the intergenic region between IGHD3 and IGHD7. For example, gene conversion could alternatively have occurred between IGHD6 and IGHD7 paralogs, or a deletion event could have been followed by insertion of repeats into an IGHD7 paralog. However, RSS analysis indicates that the 5’ RSS of IGHD8-2 shares identity with either the IGHD3 or the IGHD6 family (Table 1); thus, fusion of IGHD6 with IGHD7, or gene conversion of IGHD6 into IGHD3 followed by fusion to IGHD7, are likely mechanisms for producing the IGHD8-2 gene. The 3’ RSS of IGHD8-2 is identical to those of IGHD7 genes, and local alignments show homology between IGHD7 and IGHD8-2, suggesting that a primordial cluster 2 IGHD7 paralog now forms the 3’ region of IGHD8-2.
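The RSS comparison behind Table 1 amounts to scoring a fixed-length RSS against the corresponding RSS of each DH family and ranking the results. A minimal sketch, assuming the RSSs have already been extracted in the same orientation and are of equal length (a 12-RSS is a heptamer, a 12-nt spacer, and a nonamer, 28 nt in total). The sequences are hypothetical placeholders loosely modeled on the consensus heptamer CACAGTG and nonamer ACAAAAACC, not the bovine RSSs, and the family labels are invented.

```python
def rss_identity(a: str, b: str) -> float:
    """Percent identity of two equal-length, same-orientation RSS sequences."""
    if len(a) != len(b):
        raise ValueError("RSS sequences must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def rank_families(query: str, references: dict) -> list:
    """Rank reference RSSs (family name -> sequence) by identity to the query."""
    scores = {family: rss_identity(query, seq) for family, seq in references.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical 28-nt 12-RSSs: heptamer (7) + 12-nt spacer + nonamer (9).
# These are placeholders, not the RSSs reported in Table 1.
HEPTAMER, SPACER, NONAMER = "CACAGTG", "ACTGCATGCAGT", "ACAAAAACC"
query = HEPTAMER + SPACER + NONAMER
references = {
    "family_X": HEPTAMER + SPACER + NONAMER,              # identical
    "family_Y": HEPTAMER + SPACER + "ACAAAAACA",          # 1 mismatch
    "family_Z": "CACTGTG" + "T" + SPACER[1:] + NONAMER,   # 2 mismatches
}
for family, identity in rank_families(query, references):
    print(f"{family}: {identity:.1f}%")
```

An ungapped comparison is reasonable here because 12-RSSs have a fixed length; if spacer lengths differed, a gapped alignment would be needed instead.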
DH genes have expanded repeats
Bovine IGHD regions consist of multiple repeating short sequence motifs, and the major differences among several DH regions are length differences caused by variable numbers of nucleotide repeats (Figure 2). IGHD7-4 is the second longest DH region and differs from IGHD7-3 (its paralog in cluster 3) by only one TGGTTA repeat, which results in a two-amino-acid insertion. IGHD7-3, IGHD7-4, and IGHD8-2 (the ultralong DH region) are very similar in having several repeating units, but IGHD8-2 is dramatically longer. The 3’ ends of IGHD7-3 and IGHD7-4 are 85.6% and 77.4% identical, respectively, to IGHD8-2 over the last 96 nucleotides (Supplemental Figure 5). The longer DH regions appear to be evolutionarily active in length, through expansion or contraction of repeats, as polymorphic variants of Bos taurus IGHD8-2 differ in repeat lengths (Figure 2, Supplemental Figure 6). In this regard, two IGHD8-2 polymorphisms have been reported that differ in length and cysteine position but share similar repeating nucleotide and amino acid sequences [29, 30, 34]. Related species such as Bos grunniens (domestic yak) and Bison bison (American buffalo) also have ultralong CDR H3 regions encoded by IGHD8-2 orthologs, but these differ in length owing to apparent differences in hexanucleotide repeat expansion within the coding regions (Figure 2, Supplemental Figure 6). Thus, while two DH genes may have fused to form the long IGHD8-2 gene, nucleotide repeat expansion or contraction also appears to play a role in long DH region evolution in these species.
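The link between repeat copy number and protein length is simple arithmetic: a hexanucleotide repeat inserted in frame adds exactly two codons, which is why one extra TGGTTA yields a two-amino-acid insertion. The sketch below counts back-to-back copies of a motif and converts a nucleotide length change into codons. The function names and toy sequences are hypothetical, and this naive scan is a stand-in for dedicated tandem-repeat tools, not the method used for Figure 2.

```python
from typing import Optional


def max_tandem_copies(seq: str, motif: str) -> int:
    """Longest run of back-to-back copies of motif in seq, over all offsets."""
    k = len(motif)
    best = 0
    for offset in range(k):  # check every phase relative to the motif
        run = 0
        for i in range(offset, len(seq) - k + 1, k):
            if seq[i:i + k] == motif:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best


def aa_length_change(nt_delta: int) -> Optional[int]:
    """Codons gained (or lost) for an in-frame change; None if it shifts the frame."""
    return nt_delta // 3 if nt_delta % 3 == 0 else None


# Hypothetical toy DH-like sequences differing by one TGGTTA repeat
# (not the real IGHD7-3/IGHD7-4 sequences).
short_dh = "GG" + "TGGTTA" * 3 + "CC"
long_dh = "GG" + "TGGTTA" * 4 + "CC"
extra_copies = max_tandem_copies(long_dh, "TGGTTA") - max_tandem_copies(short_dh, "TGGTTA")
nt_delta = len(long_dh) - len(short_dh)
print(extra_copies, nt_delta, aa_length_change(nt_delta))  # 1 6 2
```

Because repeat units here are multiples of three nucleotides, expansion or contraction changes CDR H3 length without disrupting the reading frame, consistent with length variation among IGHD8-2 alleles and orthologs.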