Breeding Column | 7 - Targeted Sequencing for Genotyping



Since sequencing technology emerged, its applications have developed in two opposite directions: large-scale, comprehensive whole-genome sequencing on one side, and small-scale, precise targeted sequencing on the other.


Introduction to Targeted Sequencing

Targeted sequencing is a method for isolating, enriching, and sequencing a selected set of genes or genomic regions. It lets researchers concentrate time, cost, and data analysis on specific regions of interest (target regions or genes), achieving higher sensitivity and accuracy with less data and thus enabling rapid screening of mutation sites. Typical targets include the exome (the protein-coding portion of the genome), specific genes of interest (custom content), and target regions within genes or in mitochondrial DNA.

Compared with whole-genome sequencing (WGS) and whole-exome sequencing (WES), targeted sequencing eliminates redundant data and makes the most of each sequencing read, delivering higher depth at lower cost. This is especially valuable in clinical applications, where the amount of sample is often small. In genetic mutation and tumor screening, for example, a targeted gene panel is a very useful tool for analyzing specific mutations in a given sample. Such a panel contains a selected set of genes or gene regions with known or suspected associations with the disease or phenotype under study. When building a panel, you can choose pre-selected content or design custom content covering the genomic regions of interest, which saves resources and minimizes the data-analysis workload.
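The depth advantage follows from simple arithmetic: mean depth is roughly the number of usable sequenced bases divided by the size of the region they are spread over. The Python sketch below illustrates this; the genome size, panel size, data volume, and on-target fraction are hypothetical values chosen for illustration, not figures from this article.

```python
# Sketch: expected mean depth = usable bases / size of the sequenced region.
# All numbers below are hypothetical illustration values.

def expected_mean_depth(total_bases, region_size, on_target_fraction=1.0):
    """Approximate mean depth (x) when total_bases of data are spread over
    region_size bp, of which only on_target_fraction lands in the region."""
    if region_size <= 0:
        raise ValueError("region_size must be positive")
    return total_bases * on_target_fraction / region_size

data = 2e9  # 2 Gb of sequence per sample (hypothetical)

# WGS: all data spread over a hypothetical 2.5 Gb genome.
wgs_depth = expected_mean_depth(data, 2.5e9)

# Targeted panel: a hypothetical 2 Mb target with 60% of reads on target.
panel_depth = expected_mean_depth(data, 2e6, on_target_fraction=0.6)

print(f"WGS: {wgs_depth:.1f}x, targeted panel: {panel_depth:.0f}x")
# prints: WGS: 0.8x, targeted panel: 600x
```

In practice, off-target reads, duplicate reads, and uneven coverage all reduce usable depth, so figures like these are upper-bound estimates.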

Targeted sequencing is widely used in tumor gene testing, genetic disease testing, pathogenic microorganism detection, and other fields, but agricultural breeding has lagged behind. There are three main reasons. First, basic research on animals and plants is insufficient, so the available data and known loci are too few. Second, methods and technologies adapted to and innovated for agriculture are lacking, so the field largely reuses approaches developed for human research. Third, the economic value of an individual animal or plant is low, while the cost of testing for breeding is comparatively high.

After years of effort by researchers, however, molecular breeding technologies and platforms based on targeted sequencing have begun to take shape, and many testing companies have launched products (mostly based on the same principles). These are gradually demonstrating their considerable value in applications such as faster and more accurate localization of candidate genes for specific traits, animal and plant line identification, screening for superior traits, development of molecular markers for marker-assisted breeding, population genetic analysis, and analysis of genetic relationships among germplasm.


Principle of targeted sequencing

In terms of technical principle, targeted sequencing falls into two types: hybridization capture sequencing and multiplex PCR amplicon sequencing.


Liquid phase hybridization capture sequencing

Hybridization capture sequencing uses biotinylated probes designed to be complementary to the target sequences. The probes hybridize to library fragments containing the target genes, enriching those fragments for high-throughput sequencing. Depending on the support, probe hybridization capture is divided into liquid-phase and solid-phase hybridization. Solid-phase hybridization has largely been abandoned because of its higher cost and more cumbersome operation. In liquid-phase hybridization, the target fragments hybridize directly with the biotin-labeled probes in solution, and streptavidin-coated magnetic beads then pull down the fragments bound to the probes. After unbound DNA is washed away, the enriched DNA is amplified to construct a high-throughput sequencing library.
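To make the probe-design step concrete, the Python sketch below tiles probes complementary to a target sequence and reports each probe's GC content, a property that affects capture performance. The 120-nt probe length and 60-nt step (about 2x tiling) are assumptions for illustration; commercial probe design also handles repeats, variants under the probe, and biotin labeling, none of which is modeled here.

```python
# Minimal sketch: tile capture probes across a target region.
# Assumptions (not from the article): 120-nt probes, 60-nt step (~2x tiling),
# probes taken as the reverse complement of the given strand.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def tile_probes(target: str, probe_len: int = 120, step: int = 60) -> list[tuple[int, str]]:
    """Return (start, probe_sequence) pairs whose probes tile the target.

    Each probe is complementary to a probe_len window of the target; a final
    window is added if needed so the end of the target is always covered.
    """
    if len(target) < probe_len:
        return [(0, reverse_complement(target))]
    starts = list(range(0, len(target) - probe_len + 1, step))
    last = len(target) - probe_len
    if starts[-1] != last:
        starts.append(last)
    return [(s, reverse_complement(target[s:s + probe_len])) for s in starts]

def gc_fraction(seq: str) -> float:
    """GC content; extreme values often flag probes likely to capture poorly."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

target = "ATGCGTACGTTAGC" * 20  # hypothetical 280-bp target region
for start, probe in tile_probes(target):
    print(start, len(probe), round(gc_fraction(probe), 2))
```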

Liquid-phase hybridization capture sequencing relies on complementary base pairing: nucleic acid probes are designed and synthesized, the target regions are enriched by hybridization in solution, and the resulting DNA library is sequenced. It is suited to target regions ranging from several kb to hundreds of Mb and can detect SNVs, InDels, CNVs, SVs, gene fusions, and other variants.



Multiplex PCR amplicon sequencing

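In multiplex PCR amplicon sequencing, multiple primer pairs are designed to amplify the target regions of interest in a single reaction, and the resulting amplicons are sequenced. The method is typically used to detect tens to thousands of sites, or target regions smaller than a few tens of kb.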
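A multiplex panel only works if each primer pair produces its intended amplicon. As a rough illustration, the Python sketch below performs a naive in-silico PCR: it looks for exact primer matches on a reference sequence and returns the expected products. The primer sequences, reference, and 300-bp size limit are hypothetical; real multiplex design also checks melting temperature, primer-dimer formation, and off-target products, which this sketch ignores.

```python
# Naive in-silico multiplex PCR sketch (hypothetical primers and reference).
# Exact-match search on the top strand only; uppercase sequences assumed.

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def in_silico_pcr(reference, primer_pairs, max_len=300):
    """Return {name: amplicon} for every primer pair that yields a product.

    A product requires the forward primer on the top strand and the reverse
    primer's reverse complement downstream of it, spanning <= max_len bp.
    """
    products = {}
    for name, (fwd, rev) in primer_pairs.items():
        start = reference.find(fwd)
        if start < 0:
            continue  # forward primer has no binding site
        rev_site = reference.find(revcomp(rev), start + len(fwd))
        if rev_site < 0:
            continue  # reverse primer has no downstream binding site
        end = rev_site + len(rev)
        if end - start <= max_len:
            products[name] = reference[start:end]
    return products

reference = "GGGG" + "ATGCATGC" + "CCCCCCCC" + "TTAGGCAA" + "GGGG"  # hypothetical
primers = {
    "locus_1": ("ATGCATGC", "TTGCCTAA"),  # forward, reverse (both 5'->3')
    "locus_2": ("AAAAAAAA", "TTGCCTAA"),  # forward primer has no site
}
print(in_silico_pcr(reference, primers))  # only locus_1 yields a product
```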



The following table compares the two targeted sequencing methods with solid-phase chips:



In animal and plant breeding, liquid-phase hybridization capture is more commonly used than multiplex PCR, although the choice depends on the specific application. Generally, medium- and high-density panels use hybridization capture, while low-density panels (typically fewer than 5K markers) use multiplex PCR. Low-density panels are suited to genotyping important loci across large numbers of samples; they fill a role similar to KASP markers but cover comparatively more markers.


Targeted sequencing data quality metrics

The data quality of targeted capture sequencing is evaluated mainly by target coverage, capture specificity (capture efficiency), and coverage uniformity across the target region.

Target coverage is the proportion of the target region that is actually sequenced. Ideally it is 100%, but factors such as GC content, sequence features, and copy number cause probes for some regions to perform poorly and drag down the performance of the whole panel. To preserve overall capture efficiency, capture of roughly 0.1-3% of the target region is usually abandoned.

Because base pairing tolerates some mismatches, probes also bind non-target regions with similar sequences while capturing the targets. The proportion of total data that falls within the target region is the capture specificity (or capture efficiency); higher specificity means better use of sequencing data. Capture efficiency can be improved in many ways, such as optimizing probe design, improving the blocking reagents for repetitive sequences and adapters, and optimizing hybridization conditions, including the buffer, hybridization procedure, and wash stringency.

Ideally, per-base depth across the target region follows a Poisson distribution, which is approximately normal at typical capture depths. Uniformity is usually evaluated as the proportion of target bases covered at no less than 20% or 50% of the mean depth. Data with good uniformity show a narrow peak in the depth distribution, so the proportion of bases at or above 50% of the mean depth is high. Data with poor uniformity show a broad, dispersed distribution, and a large proportion of bases fall below 50% of the mean depth.
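These metrics are straightforward to compute once per-base depths and read counts are available. The Python sketch below assumes hypothetical inputs (a list of per-base depths over the target region, plus on-target and total read counts, which in practice would come from aligned reads) and also computes the fraction expected under an ideal Poisson depth distribution as a uniformity benchmark. The function names and the min_depth threshold of 1 are illustrative choices, not a standard pipeline.

```python
import math
from statistics import mean

def capture_metrics(target_depths, on_target_reads, total_reads, min_depth=1):
    """Compute target coverage, capture specificity and uniformity metrics."""
    n = len(target_depths)
    avg = mean(target_depths)
    return {
        "mean_depth": avg,
        # proportion of target bases with at least min_depth reads
        "target_coverage": sum(d >= min_depth for d in target_depths) / n,
        # proportion of all reads that fall in the target region
        "capture_specificity": on_target_reads / total_reads,
        # uniformity: proportion of target bases at >= 20% / 50% of mean depth
        "uniformity_20pct": sum(d >= 0.2 * avg for d in target_depths) / n,
        "uniformity_50pct": sum(d >= 0.5 * avg for d in target_depths) / n,
    }

def poisson_fraction_at_least(lam, threshold):
    """P(X >= threshold) for X ~ Poisson(lam): the ideal uniformity value."""
    if lam <= 0:
        return 1.0 if threshold <= 0 else 0.0
    k_min = math.ceil(threshold)
    # sum P(X = k) for k < k_min in log space to avoid overflow at high depth
    cdf = sum(math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
              for k in range(k_min))
    return max(0.0, 1.0 - cdf)

# Hypothetical example: 8 target bases, one uncovered and one poorly covered.
depths = [100, 120, 80, 0, 10, 90, 100, 100]
m = capture_metrics(depths, on_target_reads=700, total_reads=1000)
print(m)
print(poisson_fraction_at_least(m["mean_depth"], 0.5 * m["mean_depth"]))
```

Comparing the observed uniformity_50pct (0.75 here) with the near-1.0 Poisson benchmark shows how far a capture falls short of ideal uniformity.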



In addition, concordance with resequencing data from the same samples can serve as another evaluation metric.
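Concordance with resequencing is typically reported as the proportion of sites genotyped by both methods at which the calls agree. A minimal Python sketch, assuming genotypes are stored as unphased strings such as "A/G" keyed by a site identifier (a hypothetical format chosen for illustration):

```python
def genotype_concordance(panel_calls, reseq_calls):
    """Return (concordance_rate, n_compared) over shared, non-missing sites.

    Genotypes are unphased strings like "A/G"; allele order is ignored.
    Sites missing from either call set, or called as None, are skipped.
    """
    shared = [s for s in panel_calls
              if s in reseq_calls
              and panel_calls[s] is not None
              and reseq_calls[s] is not None]
    if not shared:
        return float("nan"), 0
    agree = sum(sorted(panel_calls[s].split("/")) == sorted(reseq_calls[s].split("/"))
                for s in shared)
    return agree / len(shared), len(shared)

# Hypothetical calls for one sample from a panel and from resequencing.
panel = {"chr1:100": "A/G", "chr1:200": "C/C", "chr2:50": "T/T", "chr3:10": None}
reseq = {"chr1:100": "G/A", "chr1:200": "C/T", "chr2:50": "T/T", "chr3:10": "A/A"}
rate, n = genotype_concordance(panel, reseq)
print(f"{rate:.1%} concordance over {n} sites")
```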

MNP (mSNP) markers

Conventionally, a pair of specific primers is designed for each SNP marker so that the resulting amplicon carries that one SNP; that is, one amplicon corresponds to one SNP marker, and the detected SNPs are spread singly across the genome. To make full use of the information in the DNA fragment amplified by each primer pair, the Science and Technology Development Center of the Ministry of Agriculture and Rural Affairs and Jianghan University jointly developed a technology that detects multiple SNPs within a single amplicon, called multiple single nucleotide polymorphism (mSNP, or MNP).
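The idea of reading several SNPs from one amplicon can be illustrated with a short Python sketch: given amplicon sequences from several varieties, already aligned to equal length (an assumption; real data would come from read alignment and variant calling), every position where the sequences differ is a SNP within the same mSNP locus. The sequences and variety names below are hypothetical.

```python
def snp_positions(amplicons):
    """Return 0-based positions where aligned amplicon sequences differ.

    Assumes all sequences are aligned to the same length with no indels;
    each variable position is one SNP within this single amplicon.
    """
    seqs = list(amplicons.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i in range(length) if len({s[i] for s in seqs}) > 1]

# Hypothetical aligned amplicons for one locus in three varieties.
amplicons = {
    "variety_1": "ACGTACGTAC",
    "variety_2": "ACGTTCGTAC",
    "variety_3": "ACCTACGTAA",
}
print(snp_positions(amplicons))  # [2, 4, 9]: three SNPs from one amplicon
```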

Compared with liquid chips that detect only one SNP per amplified site, mSNP liquid chips offer four improvements. First, each amplified site can yield multiple SNP markers, a cluster of SNPs called an mSNP, expanding the number of detectable SNPs to more than four times the number originally targeted. Second, multiple SNPs within the same amplicon can be combined into haplotypes, which improves the efficiency of variant detection. Third, the SNP with the highest minor allele frequency (MAF) in each mSNP locus (amplicon) can be selected to form a core marker set. Fourth, mSNPs enable finer detection of genetic variation, both within and between mSNP loci, which can be analyzed at the haplotype level and the SNP level respectively. Through this "one locus, multiple markers" approach, mSNP technology greatly improves marker utilization as well as the accuracy and sensitivity of marker-based identification.
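The core-marker and haplotype ideas can be sketched in Python for a single mSNP locus. The sketch assumes homozygous inbred lines (so each variety contributes one haplotype string, with one allele per SNP in order) and biallelic SNPs; the variety names and alleles are hypothetical. It computes each SNP's MAF, picks the highest-MAF SNP as the core marker (ties go to the first SNP, an arbitrary choice), and counts distinct haplotypes.

```python
from collections import Counter

def minor_allele_frequency(alleles):
    """MAF of one biallelic SNP, given one allele per variety."""
    counts = Counter(alleles)
    if len(counts) < 2:
        return 0.0  # monomorphic site
    return min(counts.values()) / sum(counts.values())

def summarize_msnp_locus(haplotypes):
    """Return (per-SNP MAFs, core SNP index, haplotype counts) for one locus."""
    n_snps = len(next(iter(haplotypes.values())))
    mafs = [minor_allele_frequency([h[i] for h in haplotypes.values()])
            for i in range(n_snps)]
    # core marker: highest MAF; max() returns the first index on ties
    core = max(range(n_snps), key=lambda i: mafs[i])
    return mafs, core, Counter(haplotypes.values())

# Hypothetical haplotypes (alleles at 3 SNPs) for four varieties.
haplotypes = {"v1": "GAC", "v2": "GTC", "v3": "CAA", "v4": "GTA"}
mafs, core, haps = summarize_msnp_locus(haplotypes)
print(mafs, core, len(haps))
```

Each SNP on its own splits these four varieties into only two groups, whereas the three-SNP haplotypes tell all four apart, which is the intuition behind the gain in discriminating power.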



MNP technology filled the gap in China's DNA identification standards for essentially derived varieties. It enabled the first systematic analysis of essential derivation relationships among more than 10,000 authorized rice and maize varieties in China (30 million pairwise comparisons), helping to gauge the level of plant germplasm innovation in China, and it brought plant variety DNA identification technology under domestic control. Building on these results, national standards and technical specifications for DNA identification of plant varieties have been developed, DNA fingerprint databases covering more than 10,000 authorized plant varieties have been constructed, and more than 1 million identification tests have been performed for variety breeding, variety authorization, anti-counterfeiting, and rights protection.


References

Gene Valley. How to achieve high-quality gene capture and sequencing?

GB/T 38551-2020. MNP Marker Method for Plant Variety Identification.

Xu Y, et al. Genotyping by target sequencing (GBTS) and its applications. Scientia Agricultura Sinica. 2020; 53(15): 2983-3004.

Guo Z, Yang Q, Huang F, et al. Development of high-resolution multiple-SNP arrays for genetic analyses and molecular breeding through genotyping by target sequencing and liquid chip. Plant Commun. 2021; 2(6): 100230.
