Special Column on Breeding | 3 - Development and Characteristics of Molecular Markers
Molecular markers are the basis of molecular breeding
Today, let's go through the types and characteristics of the main molecular markers together with the editor.
First, what are molecular markers?
Put simply: in daily life, we usually look for a landmark building to find our way to a place. In the same way, locating genes or loci on the chromosomes of a genome requires markers as guides, and molecular markers are markers defined at the level of nucleotides and other molecules. Polymorphism is the most important feature of a marker. As the name implies, polymorphism means existing in multiple states, that is, diversity. A gene or locus can be distinguished and compared only when it occurs in different states that differ between materials. That is what makes a molecular marker useful.
Before molecular markers came into use, genetic markers developed from morphological markers (such as tall or short, stout or slender) to cytological markers (such as chromosome number and shape) and then to biochemical markers (such as serum proteins and isoenzymes). However, these markers do not reflect the genetic material directly. They are only indirect reflections of it and are easily affected by the environment, so they have serious limitations.
Molecular markers directly reflect genetic variation at the DNA level. They are stably inherited, carry a large amount of information, are highly reliable, and are not affected by the environment. There are now dozens of molecular marker technologies, and they are widely used in crop genetics and breeding, for example in genetic map construction, gene mapping and cloning, identification of genetic relationships between plants, and construction of germplasm gene banks.
Ideal molecular markers usually have the following characteristics:
High polymorphism;
Codominant inheritance (the homozygous and heterozygous states of a locus can be distinguished);
High frequency in the genome;
Even distribution across the genome;
Selective neutrality;
Easy to obtain and quick to analyze;
Low cost of marker development and detection.
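To make the codominance criterion concrete, here is a minimal sketch. The band sizes (150 bp for allele "A", 180 bp for allele "B") and the function names are hypothetical and chosen only for illustration. A codominant marker shows one band per distinct allele, so the heterozygote has its own pattern, while a dominant marker only reports whether one allele's band is present.

```python
# Hypothetical band sizes: allele "A" gives a 150 bp band, allele "B" a 180 bp band.
ALLELE_BAND = {"A": 150, "B": 180}

def codominant_bands(genotype):
    """Band sizes seen with a codominant marker: one band per distinct allele."""
    return sorted({ALLELE_BAND[allele] for allele in genotype})

def dominant_band_present(genotype, amplified_allele="A"):
    """A dominant marker only reports presence or absence of one allele's band."""
    return amplified_allele in genotype

for genotype in ("AA", "AB", "BB"):
    print(genotype, codominant_bands(genotype), dominant_band_present(genotype))
```

With the dominant marker, AA and AB give the same result, which is why codominant markers are preferred when heterozygotes must be identified.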
It is generally accepted that molecular markers have gone through three generations of technology. Representative molecular markers include RFLP, AFLP, SSR, SNP and InDel.
First generation molecular markers
The first generation of molecular markers is based on molecular hybridization. The most typical representative, restriction fragment length polymorphism (RFLP), is built on Southern hybridization.
The basic principle is that specific restriction endonucleases recognize and cut (digest) the genomic DNA of different individuals. Base substitutions, rearrangements, deletions and other changes between the alleles of different individuals alter the restriction endonuclease recognition sites, so the lengths of the restriction fragments differ between genotypes. The fragments are separated by electrophoresis, transferred to a membrane, denatured, hybridized with labeled probes and washed, and the resulting polymorphism is then analyzed.
RFLP was widely used in early work on gene mapping and on chromosome structure and function. Most RFLP markers are codominant, with good repeatability and stability. However, the technique has low polymorphism, high requirements for DNA quality, and a complex procedure, which limits its use. As the first low-cost DNA typing technology, RFLP is now largely obsolete.
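The fragment-length idea behind RFLP can be sketched in a few lines. This is a simplified in-silico digestion, not a model of the full Southern hybridization workflow. The EcoRI recognition sequence GAATTC and its cut position (after the first G) are real, but the two allele sequences are made up for illustration.

```python
SITE = "GAATTC"   # EcoRI recognition sequence
CUT_OFFSET = 1    # EcoRI cuts between G and AATTC

def digest(seq, site=SITE, cut_offset=CUT_OFFSET):
    """Return fragment lengths after cutting seq at every occurrence of site."""
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]

# Hypothetical alleles: allele_2 differs from allele_1 by one base substitution
# that destroys the first recognition site.
allele_1 = "ATGC" * 5 + "GAATTC" + "CGTA" * 10 + "GAATTC" + "TTAG" * 3
allele_2 = allele_1.replace("GAATTC", "GAGTTC", 1)

print(digest(allele_1))  # [21, 46, 17]
print(digest(allele_2))  # [67, 17]
```

A single substitution turns three fragments into two, and that length difference is what the hybridization step would reveal as a polymorphism.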
Second generation molecular markers
The second generation of molecular markers is based on PCR. Depending on the type of primer (random or specific) and on whether PCR is combined with restriction enzyme digestion, they can be divided into many different types.
The classification is as follows:
- PCR with random primers as the core: random amplified polymorphic DNA (RAPD), inter-simple sequence repeat (ISSR), random amplified microsatellite polymorphism (RAMP).
- PCR combined with restriction digestion, where restriction fragments are selectively amplified to reveal fragment length polymorphism: amplified fragment length polymorphism (AFLP), cDNA-AFLP.
- PCR with specific primers as the core: simple sequence repeat (SSR), sequence-related amplified polymorphism (SRAP), target region amplification polymorphism (TRAP).
- PCR combined with restriction digestion, where the PCR products are digested to reveal polymorphism in the amplified region: cleaved amplified polymorphic sequence (CAPS), dCAPS.
Compared with the first generation, second generation marker technology is more advanced, has higher polymorphism and requires less DNA, but costs more. It was widely used early on in marker-assisted breeding, genetic map construction, phylogenetics and genetic diversity analysis.
Some scholars regard only SSR as representative of the second generation and place other markers such as RAPD and AFLP in the first generation, but the distinction matters little. The marker names above already give a rough idea of the technique behind each one, so we will not introduce them individually. In terms of how long it stayed in use, SSR stands out: it was the main marker for most of the editor's time in graduate school, and many laboratories probably still use it, so let's say more about it here.
SSR, also known as a microsatellite marker, is a DNA sequence made up of tandem repeats of a short motif of 1-6 bases, generally less than 100 bp in total length. Common forms are (TG)n, (GA)n, (AAT)n and (GACA)n. In plant genomes, (AT)n is the most common. Such motifs are widespread in eukaryotic genomes and in some prokaryotic genomes, and are randomly distributed in nuclear, chloroplast and mitochondrial DNA.
The principle of SSR markers is that the sequences flanking most microsatellites are highly conserved single-copy sequences, so primers can be designed from the flanking sequences for PCR amplification, and the size of the PCR products can then be determined by gel electrophoresis. The high polymorphism of SSRs comes mainly from differences in the number of tandem repeats of the core motif.
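The relationship between repeat number and PCR product size can be illustrated with a short sketch. The flanking sequences, the line names and the repeat counts below are hypothetical. The code only counts the longest uninterrupted run of a motif and reports the length of the simulated amplicon.

```python
import re

def longest_repeat(seq, motif):
    """Return (start, copies) of the longest uninterrupted tandem run of motif."""
    best = (-1, 0)
    for match in re.finditer(f"(?:{re.escape(motif)})+", seq):
        copies = len(match.group()) // len(motif)
        if copies > best[1]:
            best = (match.start(), copies)
    return best

# Hypothetical conserved flanking sequences where the primers would anneal.
LEFT, RIGHT = "ACGTTGCAAGCT", "TCGGATCCATGA"

def amplicon(n_repeats, motif="GA"):
    """Simulated PCR product: left flank + n tandem copies of motif + right flank."""
    return LEFT + motif * n_repeats + RIGHT

for name, n in (("line_1", 12), ("line_2", 15)):
    product = amplicon(n)
    copies = longest_repeat(product, "GA")[1]
    print(name, copies, "repeats,", len(product), "bp product")
```

Three extra copies of a two-base motif make the product 6 bp longer, a difference that gel electrophoresis with sufficient resolution can separate.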
Features of SSR markers:
Advantages: codominant, so heterozygotes and homozygotes can be distinguished; only a small amount of DNA is needed and the requirements for DNA quality are modest; good repeatability and high reliability; many alleles per locus and rich polymorphism.
Disadvantages: the DNA sequences flanking the repeat motif must be known. If they cannot be retrieved directly from a DNA database, the region must be sequenced before primers can be designed, so the development cost is high.
Third generation molecular markers
The third generation of molecular markers is based on nucleic acid sequences. With the development of DNA sequencing technology, third generation markers, represented by single nucleotide polymorphism (SNP), have rapidly become mainstream. A SNP is a difference caused by variation of a single nucleotide at a given position in the genome.
Features of SNP markers:
Advantages: very numerous, widely and evenly distributed; highly stable; codominant; suitable for rapid, large-scale screening.
Disadvantages: if detection relies on sequencing or DNA chip hybridization, the cost is relatively high.
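As a minimal illustration of what a SNP is at the sequence level, the sketch below compares two already aligned sequences of equal length and reports the positions where a single base differs. The sequences and the function name are made up for illustration. Real SNP calling works on sequencing reads and also has to handle alignment, indels and sequencing errors, none of which is covered here.

```python
def find_snps(ref, alt):
    """Return (1-based position, ref base, alt base) for each single-base difference."""
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, r, a)
        for pos, (r, a) in enumerate(zip(ref, alt), start=1)
        if r != a
    ]

# Hypothetical aligned sequences from two materials.
reference = "ATGCCGTAAGCTTGA"
sample    = "ATGCCATAAGCTCGA"

print(find_snps(reference, sample))  # [(6, 'G', 'A'), (13, 'T', 'C')]
```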
Although large numbers of SNP markers are now available, genotyping SNPs across large sets of samples remains difficult. Classical SNP typing methods include designing restriction-based markers (CAPS, dCAPS) and first-generation (Sanger) sequencing, whose drawbacks are long turnaround and high cost. To address this, researchers have developed many SNP typing techniques in recent years, such as Kompetitive Allele Specific PCR (KASP), genotyping by target sequencing (GBTS) and Hyper-seq. However, ordinary research laboratories find it hard to purchase the supporting instruments and equipment, and for companies on the application side, cost is still a major problem.
At present, there are at least 20 SNP genotyping methods, including traditional gel-based methods and newer methods based on high-throughput automation.
Some of them are listed here:
- Cleaved amplified polymorphic sequence: CAPS, PCR-RFLP
- Allele-specific PCR: AS-PCR
- Single-strand conformation polymorphism: SSCP
- Denaturing high-performance liquid chromatography: DHPLC
- High-resolution melting curve analysis: HRM
- Mass spectrometry: MALDI-TOF MS
- Gene chip technology: Illumina, Affymetrix
- Kompetitive allele-specific PCR: KASP
- Reduced-representation genome sequencing: GBS, RAD-seq, Hyper-seq
- Targeted sequencing: GBTS, MNP
- Whole-genome resequencing: WGS, lcWGS
The comparison of several common SNP detection methods is shown in the following table:
Genotyping that is high-throughput, accurate and low-cost all at once is still the key factor limiting the wide use of SNPs in production. In later issues, we will focus on several mainstream genotyping strategies with relatively low cost.
The following table summarizes and compares the characteristics of several representative DNA molecular markers across the three generations (the table itself is not preserved here; one of the compared properties is reproducibility and reliability).
Other marker classifications
Of course, besides DNA there are other types of molecular markers, such as proteins and small metabolite molecules. In addition to classification by type of molecule or omics layer, markers can also be classified by location, function and so on.
mRNA-based: expressed sequence tags (EST)
By location: physical markers
By function: functional markers
Non-nuclear markers: such as mtDNA
These are not our focus here, so we will not go into detail.
Well, that's it for the development and characteristics of molecular markers. In the next issue, we will introduce several mainstream genotyping techniques in common use today. Bye~~~