With the cost of sequencing and detection decreasing continuously, it will eventually be cheaper to determine a material's genotype than its phenotype, and a large amount of genotype data can be expected. Commercial breeding projects cultivate thousands of new materials every year, and each material contains tens of thousands of genotype loci, so tens of millions or even hundreds of millions of genotype data points will be generated every year. Conventional relational databases may struggle to manage genotype data at this scale. BioCloud aims to build a genotypic big data system that supports the storage and management of massive (PB-level) marker and genotype data.
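The scale estimate above is a simple product of materials and loci. The sketch below works it through in Python; the specific counts are illustrative assumptions chosen from within the stated ranges ("thousands" of materials, "tens of thousands" of loci), not figures from any real breeding program.

```python
def genotype_calls_per_year(materials: int, loci_per_material: int) -> int:
    """Number of genotype data points: one call per material per locus."""
    return materials * loci_per_material


# Hypothetical lower and upper scenarios within the ranges given in the text.
low = genotype_calls_per_year(materials=1_000, loci_per_material=10_000)
high = genotype_calls_per_year(materials=9_000, loci_per_material=90_000)

print(f"low estimate:  {low:,} genotype calls per year")
print(f"high estimate: {high:,} genotype calls per year")
```

Under these assumptions the yearly volume runs from 10 million to roughly 800 million genotype calls, which matches the "tens of millions or even hundreds of millions" range.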

Brief Introduction

SNP markers are widely used in plant genetic map construction, genome-wide association analysis, genetic diversity research, germplasm identification and variety discrimination, whole-genome prediction and other fields. The backend of BioCloud's genotypic big data system adopts a distributed database architecture, which makes massive data easy to manage. The system also has built-in common genotype analysis tools, so professional bioinformatics analysis can be completed easily.


  • More massive: Distributed storage based on a big data cluster can hold SNP genotype data on the order of 10 billion records, with high scalability, high reliability and high performance.
  • Faster: Retrieval completes in seconds, making it easy to view the genotypes of specified regions, and retrieval speed does not degrade as data volume grows.
  • More professional: Public library resources are integrated for effective variety comparison, and professional bioinformatics analysis tools are integrated to improve target range and design depth in breeding.
  • More cutting-edge: The system integrates a whole-genome prediction module for more professional model prediction, and can also manage and analyze genotype data from different species as well as data from multiple reference genome versions of the same species.
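To make the region-based retrieval idea concrete, here is a minimal single-machine sketch of one common approach: keep markers sorted by position within each chromosome and binary-search the region bounds, so lookup cost grows only logarithmically with the number of markers. This is not BioCloud's implementation, which is not described here; the `RegionIndex` class, its method names and the sample records are all hypothetical.

```python
from bisect import bisect_left, bisect_right


class RegionIndex:
    """Hypothetical in-memory index of genotype calls, sorted by position."""

    def __init__(self):
        self._positions = {}  # chromosome -> sorted list of positions
        self._genotypes = {}  # chromosome -> genotypes in the same order

    def load(self, records):
        """Build the index from (chromosome, position, genotype) tuples."""
        by_chrom = {}
        for chrom, pos, genotype in records:
            by_chrom.setdefault(chrom, []).append((pos, genotype))
        for chrom, rows in by_chrom.items():
            rows.sort(key=lambda row: row[0])
            self._positions[chrom] = [pos for pos, _ in rows]
            self._genotypes[chrom] = [gt for _, gt in rows]

    def query(self, chrom, start, end):
        """Return (position, genotype) pairs with start <= position <= end."""
        positions = self._positions.get(chrom, [])
        genotypes = self._genotypes.get(chrom, [])
        lo = bisect_left(positions, start)
        hi = bisect_right(positions, end)
        return list(zip(positions[lo:hi], genotypes[lo:hi]))


# Made-up sample records for illustration only.
index = RegionIndex()
index.load([
    ("chr1", 300, "A/G"),
    ("chr1", 100, "A/A"),
    ("chr2", 150, "T/T"),
    ("chr1", 200, "G/G"),
])
print(index.query("chr1", 100, 200))  # [(100, 'A/A'), (200, 'G/G')]
```

A distributed system would typically partition such an index across nodes, for example by chromosome or position range, but that design is an assumption here rather than something the text specifies.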


  • Promotion decision and tracking
  • Trait stability analysis and trait correlation analysis
  • Raw data correction
  • Data visualization
  • Regional adaptability analysis
  • Field heat map
  • Variety test report