Breeding column | 6-New genotyping technology: FBI seq

Frontier sharing

2022-10-28


I originally planned to introduce targeted sequencing for genotyping this week, but I unexpectedly came across the latest genotyping technology, FBI-seq ("A prolific and robust whole-genome genotyping method using PCR amplification via primer-template mismatched annealing"), recently developed by Chang Yuxiao of the Shenzhen Institute of Agricultural Genomics, Chinese Academy of Agricultural Sciences. So let's get to know this new technology first.

 

FBI seq discovery

FBI-seq (Foreground and Background Integrated genotyping by sequencing) has nothing to do with the US Federal Bureau of Investigation; it is a genotyping-by-sequencing method that integrates foreground and background selection. As the name implies, it detects and selects foreground genes and the genetic background at the same time. Foreground selection and background selection are two very important steps in molecular breeding. At present, breeders often have to carry out these two steps independently: first screen the foreground target sites, then develop a large number of probes, baits or PCR primers to detect background genotypes. These time-consuming and costly preparations greatly delay the start of breeding projects.

When we design PCR primers for genotyping, we usually want the amplification products to be specific, that is, to come from only one position in the genome, so that the target site can be detected accurately. However, non-specific amplification is common in PCR, and people usually try to reduce or suppress it by optimizing the primer design or the reaction system. This study took the opposite approach and asked how to enhance non-specific amplification. The previous view was that non-specific amplification is caused by primers annealing at homologous or partially homologous sites, and that the number of such sites is generally small. This study unexpectedly found that, in addition to the Primer-Template Perfect Annealing (PTPA) sites between the primer and the DNA template, any PCR primer has tens of thousands of Primer-Template Mismatched Annealing (PTMA) sites on the genome, and that amplification from PTMA sites is stable and repeatable. Based on this unexpected discovery, the researchers developed FBI-seq.

 

FBI seq process

To efficiently isolate unknown sequences flanking known DNA fragments, Chang's team had previously developed flanking sequence tag sequencing (FST-seq). FST-seq uses specific nested primers that anneal to the known fragment to capture the target sequences, and adds adapters with Tn5 transposase. Initially, the researchers wanted to target different transposons, rely on the efficient adapter ligation provided by Tn5 transposase, and amplify a large number of FSTs as molecular genetic markers, instead of designing primers for each site as in conventional studies.

When analyzing the sequencing data, they accidentally found three types of reads, because they had inadvertently used a single primer for the Rim2 transposon instead of the original nested primers. The types are distinguished by how read1 begins, and when multiple reads align to the same region they form a tag. The first type, PTPA reads/tags, accounts for 1.1%; these start from the Rim2 primer sequence (FBI-Rim2) and end at about 350 bp of the transposon. The second type, PTMA reads/tags, accounts for 43.8%. Their pattern is similar to PTPA, but they contain soft-clipped bases of different lengths at the beginning of read1 (that is, some bases cannot be aligned to the genome but are kept in the read). PTMA amplification comes from mismatched annealing between the FBI-Rim2 primer and the template. The third type accounts for the remaining 55.1%. These reads align randomly across the whole genome, and most of them contain no sequence related to FBI-Rim2. They are a by-product of using Tn5 and amount to a typical genome-wide low-depth sequencing library.
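
The three read types described above can be told apart from the read1 sequence and its CIGAR string. The following is only a minimal sketch of that idea, not the authors' script: the function names, the exact-prefix primer check, and the rule "primer prefix plus 5' soft clip means PTMA" are assumptions made for illustration, and the primer used in the example is a made-up sequence, not the real FBI-Rim2 primer.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def five_prime_soft_clip(cigar, reverse=False):
    """Number of soft-clipped bases at the 5' end of the read.

    SAM stores reverse-strand reads reverse-complemented, so for those
    the 5' end of the original read is the LAST CIGAR operation.
    Hard clips (H) are ignored.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar) if op != "H"]
    if not ops:
        return 0
    n, op = ops[-1] if reverse else ops[0]
    return n if op == "S" else 0


def classify_read1(seq, cigar, primer, reverse=False):
    """Assumed rule of thumb for the three read types.

    - read1 does not start with the primer         -> "low_coverage"
    - starts with the primer, 5' end soft-clipped  -> "PTMA"
    - starts with the primer, aligned from base 1  -> "PTPA"
    """
    read = revcomp(seq.upper()) if reverse else seq.upper()
    if not read.startswith(primer.upper()):
        return "low_coverage"
    if five_prime_soft_clip(cigar, reverse) > 0:
        return "PTMA"
    return "PTPA"


# Hypothetical primer, for illustration only.
primer = "ACGTACGTAC"
print(classify_read1("ACGTACGTACGGGTTT", "16M", primer))
print(classify_read1("ACGTACGTACGGGTTT", "7S9M", primer))
print(classify_read1("TTTTGGGGCCCCAAAA", "16M", primer))
```

A real analysis would probably allow a few mismatches in the primer prefix and would read the strand from the SAM flag rather than from a function argument.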

 

 

 

The FBI-seq workflow and its characteristics:

(A) Complete steps of FBI-seq:

  • Genomic DNA is fragmented and adapters are attached by Tn5 transposase.
  • In the primary PCR, the flanking sequence of the target gene (blue box) is amplified through PTPA of the FBI-SP primer. Tens of thousands of loci scattered across the whole genome (red boxes) are also amplified through PTMA of FBI-SP together with primer 2.
  • In the secondary PCR, all sequence elements required for Illumina sequencing are added.
  • The final library structure is ready for Illumina-based sequencing.

  

  

A minor question: has the method not been tested on the MGI platform?

(B) IGV shows three types of reads/tags generated by a single FBI seq library (namely, PTPA reads/tags, low coverage reads, and PTMA reads/tags).

(C) Stacked histogram showing the sequences of the FBI-SP PTMA sites in the R498 genome. PTMA tags generated from read1 with 7-, 8- and 9-nt soft clips are shown, with the Rim2 primer sequence at the bottom.

(D) Histogram showing the number of PTMA tags detected from read1 sequences with soft clips of different lengths.

(E) Heat map showing the genomic distribution of the 306K PTMA tags detected.

 

FBI seq verification

Being able to detect these three types of reads with a single specific primer is of great significance for genome-wide genotyping. PTPA reads can be used to track the foreground markers of target genes, while PTMA reads represent hundreds of thousands of background markers, so foreground and background genotyping can be performed at the same time. To evaluate the applicability and performance of FBI-seq, the authors tested whether PTPA and PTMA also occur for other foreground genes or genomic regions when FBI-seq primers other than FBI-Rim2 are used.

With specific primers designed to target the rice Pi2 gene, they confirmed that FBI-Pi2-1 amplifies the Pi2 gene through PTPA. They also introduced FBI-seq-like modifications into the Mu-seq method, replacing the nested primers with non-nested primers, and amplified the flanking sequences of Dissociation (Ds) elements inserted in maize mutant lines; a large number of PTMA tags were detected. This shows that FBI-seq primers can be designed for any genomic site of interest and that PTPA and PTMA reads are amplified simultaneously. Furthermore, three rice varieties (R498, ZF802 and 9311) were subjected to FBI-seq with the FBI-Rim2 primer, with three replicates per variety, which showed that PTMA-initiated amplification is stable and repeatable.

FBI seq application

The selection of foreground and background is very important in backcross breeding. The authors therefore applied FBI-seq to a backcross population (CNG1 × VE6219).

CNG1 is susceptible to rice blast, while VE6219 carries the rice blast resistance gene Pi2 on chromosome 6. The whole genomes of the two parents were resequenced, and 1.9 million SNPs were found between CNG1 and VE6219. In addition to the FBI-Pi2-1 primer, another Pi2-specific primer (FBI-Pi2-2) was designed, with an annealing site 111 bp away from that of FBI-Pi2-1. The two primers were mixed during FBI-seq library preparation to test whether they would interfere with each other in the amplification and detection of multiple foreground sites. Five BC2F4 lines were then subjected to FBI-seq with the FBI-Pi2-1 and FBI-Pi2-2 mixture.

Analysis of the PTPA reads starting from FBI-Pi2-1 and FBI-Pi2-2 showed that FBI-seq can be used for direct foreground selection of multiple genes at the same time, and confirmed that all five rice lines carry the Pi2 allele from the donor parent VE6219. The detected SNPs were then divided into three classes: low depth (depth < 3×), high depth (depth ≥ 3× but missing rate > 20%), and SNPs shared by the five lines (depth ≥ 3× and missing rate ≤ 20%). A bin map was then constructed. The background genotyping results were highly consistent, especially near the Pi2 region of chromosome 6, because all five lines contain the VE6219 resistance allele, and the genetic background recovery rate was 83.8%~93.0%.
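
The SNP classes and the recovery rate above can be illustrated with a few lines of Python. This is a rough sketch under stated assumptions, not the authors' procedure: here a line counts as "missing" at a SNP when its depth is below the threshold, the classification is done from the point of view of one line, and the recovery rate is simply the fraction of called markers that are homozygous for the recurrent parent. The function names, the genotype codes "R"/"D" and the line names "line1" to "line5" are all hypothetical.

```python
def classify_snp(depths, line, min_depth=3, max_missing=0.20):
    """Classify one SNP for one line.

    depths: dict mapping line name -> read depth at this SNP.
    Assumption: a line is 'missing' at this SNP if its depth < min_depth.
    """
    if depths[line] < min_depth:
        return "low_depth"
    missing_rate = sum(d < min_depth for d in depths.values()) / len(depths)
    if missing_rate > max_missing:
        return "high_depth"   # well covered here, but missing in too many lines
    return "shared"           # well covered here and in most other lines


def background_recovery(genotypes, recurrent="R"):
    """Fraction of called markers homozygous for the recurrent parent.

    genotypes: list of marker calls, e.g. "R" (recurrent), "D" (donor),
    or None for a missing call. Heterozygous calls are not modelled.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called markers")
    return sum(g == recurrent for g in called) / len(called)


# Made-up depths for five lines at two SNPs.
snp1 = {"line1": 5, "line2": 4, "line3": 3, "line4": 6, "line5": 0}
snp2 = {"line1": 5, "line2": 0, "line3": 0, "line4": 6, "line5": 1}
print(classify_snp(snp1, "line1"), classify_snp(snp1, "line5"), classify_snp(snp2, "line1"))
print(background_recovery(["R", "R", "D", "R", None]))
```

With five lines, a missing rate of at most 20% means that at most one line may lack a call at a shared SNP. A bin-map-based recovery rate would weight each bin by its physical length rather than counting markers equally.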

 

 

Foreground and background genotyping of a rice BC2F4 population using FBI-seq

(A) Circos diagram of SNP counts in 1 Mb sliding windows, from outside to inside: (i) total number of SNPs between the parents; (ii) number of SNPs shared by the five BC2F4 lines; (iii) number of high-depth (>3×) SNPs in line 8GWB5-54; (iv) number of low-depth (<3×) SNPs in line 8GWB5-54.

(B) IGV view showing the binding positions of the FBI-Pi2-1 and FBI-Pi2-2 primers on the genome and the deep coverage of PTPA reads over the adjacent regions.

(C) Multiple sequence alignment of the Pi2 gene sequence among Nipponbare, CNG1 and VE6219. The two blue horizontal arrows mark the primer binding regions of FBI-Pi2-1 (1-19 bp) and FBI-Pi2-2 (112-131 bp). The four vertical arrows mark SNPs among the three rice varieties in this region.

(D) Schematic diagram of foreground and background selection in the CNG1 × VE6219 population. The black line on chromosome 6 indicates the target gene Pi2. Sample names of the five BC2F4 lines: ① 8GWB5-54, ② 8GWB5-67, ③ 8GWB5-83, ④ 8GWB6-61, and ⑤ 8GWB6-69.

In addition to rice, the authors also ran FBI-seq with four gene-specific primers on tomato and cowpea to illustrate its applicability in different species.

 

Comparison of genotyping methods

Finally, the authors also compared FBI-seq with commonly used genotyping methods to highlight its advantages.

 

 

In general, FBI-seq is a new method that integrates foreground and background genotyping. Only one specific primer is needed: PTPA is used for foreground genotyping and PTMA for background genotyping. Unlike gene chips, multiplex PCR or targeted genome sequencing, FBI-seq requires little preliminary work beyond primer design, and it is applicable to different foreground genes and species. It should therefore readily meet the needs of researchers and breeders, and it has important application potential in crop variety identification, germplasm resource identification, genetic map construction, genetic relationship prediction and other fields. However, its performance in practice still needs time to be verified.

 

Data analysis pipeline

Thankfully, the authors have released the data analysis pipeline and scripts for PTMA tag detection in FBI-seq: https://github.com/dashengzhao/FBI-seq

 

The pipeline starts from BAM files that have been aligned, sorted and deduplicated:

  1. Extract the read1 alignments from the BAM file;
  2. Split the SAM file into two files;
  3. Calculate the base depth in one of the SAM files;
  4. Calculate the base depth in the other SAM file;
  5. Detect PTMA tags on the forward and reverse strands of the reference genome;
  6. Merge the PTMA tags from the forward and reverse strands;
  7. Obtain all tag regions on the genome that the FBI-seq primer binds to.
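
To make steps 1 to 4 more concrete, here is a minimal pure-Python sketch that works on SAM text lines. It is not taken from the repository above. It assumes that the split in step 2 is by strand (which the later forward/reverse steps suggest, but the step list does not say so), that read1 is identified by the 0x40 SAM flag bit, and that depth counts only aligned bases (M, = and X operations). All function names are made up for illustration.

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
UNMAPPED, REVERSE, READ1 = 0x4, 0x10, 0x40  # SAM flag bits


def split_read1_by_strand(sam_lines):
    """Steps 1-2 (assumed): keep mapped read1 records, split by strand."""
    forward, reverse = [], []
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & UNMAPPED or not flag & READ1:
            continue
        (reverse if flag & REVERSE else forward).append(fields)
    return forward, reverse


def base_depth(records):
    """Steps 3-4 (assumed): per-base depth from POS and CIGAR.

    M/=/X add depth and advance along the reference; D/N only advance;
    I/S/H/P consume no reference bases.
    """
    depth = Counter()
    for fields in records:
        chrom, pos = fields[2], int(fields[3])
        for n, op in CIGAR_RE.findall(fields[5]):
            n = int(n)
            if op in "M=X":
                for p in range(pos, pos + n):
                    depth[(chrom, p)] += 1
                pos += n
            elif op in "DN":
                pos += n
    return depth


def sam(name, flag, chrom, pos, cigar):
    """Build a minimal 11-column SAM line for the demo."""
    return "\t".join([name, str(flag), chrom, str(pos), "60", cigar,
                      "=", "0", "0", "*", "*"])


lines = ["@HD\tVN:1.6", sam("r1", 67, "chr1", 100, "7S43M"),
         sam("r2", 131, "chr1", 100, "50M"),      # read2, ignored
         sam("r3", 83, "chr1", 500, "43M7S")]     # reverse-strand read1
fwd, rev = split_read1_by_strand(lines)
print(len(fwd), len(rev), base_depth(fwd)[("chr1", 100)])
```

In practice these steps are usually done with samtools on the BAM file directly; the sketch only shows what "extract read1, split, compute depth" means at the record level.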

 

Interested readers can try this pipeline, though the key still lies in the upstream primer design and library construction.

