Q: What factors can affect the quality of sequencing results?
A: (1) Individual heterozygosity: The higher the heterozygosity of the individual, the more difficult the assembly is; very high heterozygosity may even cause the assembly to fail.
(2) Polymorphism of the species genome: Individuals of some species (such as some parasites) are so small that the genomic DNA extracted from a single individual may not meet the sequencing requirements, so genomic DNA has to be extracted from a pool of multiple individuals. In such cases, the polymorphism of the species genome needs to be assessed; if it is too high, it will affect the subsequent genome assembly.
(3) Quality of DNA samples: For bacteria and fungi, the sample must come from a single colony free of contamination. Animal and plant samples should be as homozygous as possible and free of contamination; otherwise the quality of the sequencing results will be seriously affected. In addition, the prepared genomic DNA should not be smaller than 23 kb. If the DNA is too small, small fragments are easily lost during genomic fragmentation, so the constructed sequencing library is incomplete, which has a significant impact on the sequencing results.
(4) If the GC content of some regions of the genome is too high (GC% ≥ 65%), bias occurs during sequencing and coverage of those regions becomes too low, which affects subsequent assembly and annotation.
(5) For species with too many repeats, the large number of repeated sequences creates many erroneous overlaps, so the contigs produced by assembly are too short and the results deviate severely.
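As an illustration of point (4), the sketch below scans a sequence in fixed windows and reports those whose GC fraction reaches the 65% threshold mentioned above. The function names, the window size, and the step are assumptions chosen for the example, not part of any particular pipeline.

```python
def gc_fraction(seq):
    """Return the fraction of G and C among the A/C/G/T bases in seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    # Ambiguous bases such as N are ignored; an all-N window counts as 0.0.
    return gc / acgt if acgt else 0.0


def high_gc_windows(seq, window=100, step=100, threshold=0.65):
    """Yield (start, end, gc) for windows whose GC fraction is >= threshold.

    window, step and threshold are illustrative defaults (assumptions).
    """
    last_start = max(len(seq) - window + 1, 1)
    for start in range(0, last_start, step):
        chunk = seq[start:start + window]
        gc = gc_fraction(chunk)
        if gc >= threshold:
            yield start, start + len(chunk), gc


if __name__ == "__main__":
    toy = "AT" * 50 + "GC" * 50  # 100 AT-rich bases followed by 100 GC-rich bases
    for start, end, gc in high_gc_windows(toy):
        print(f"{start}-{end}: GC = {gc:.2f}")
```

Windows flagged this way are only candidates for low coverage; whether coverage actually drops depends on the library preparation and the sequencing platform.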
Q: How is the genome assembled?
A: In general, the assembly strategy based on Roche 454 FLX+ sequencing results is as follows:
(1) First, use short-read assembly software to assemble the paired-end data de novo into contigs. This stage generally requires high-coverage paired-end sequencing data and a lot of computer memory, and it is the most difficult step of genome assembly;
(2) Gradually add the long-insert mate-pair data to build scaffolds. In general, the sequencing depth of the mate-pair data does not need to be high. The contigs are joined into larger scaffolds using the distance information between the two ends of each mate pair;
(3) Use the paired-end and mate-pair insert-length information again to fill the gaps;
(4) Sometimes adding Sanger data will greatly help fill gaps and extend contigs.
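The distance information used in step (2) can be sketched as follows. If one read of a mate pair maps to contig A and the other to the following contig B, the known insert size gives an estimate of the gap between them. This is a simplified sketch, not the method of any specific scaffolder: the function name and input layout are hypothetical, read orientation is ignored, and a single fixed insert size is assumed rather than a distribution.

```python
from statistics import median


def estimate_gap(len_a, links, insert_size):
    """Estimate the gap (in bases) between contig A and the following contig B.

    len_a       : length of contig A
    links       : list of (start_on_a, end_on_b) pairs, one per mate pair, where
                  start_on_a is the 0-based position on A where the fragment
                  begins and end_on_b is the position on B where it ends
    insert_size : assumed library insert size (a single value, for simplicity)
    """
    if not links:
        raise ValueError("no mate-pair links between the two contigs")
    # insert = (part of fragment on A) + gap + (part of fragment on B)
    estimates = [
        insert_size - (len_a - start_a) - end_b
        for start_a, end_b in links
    ]
    # The median is less sensitive to a few mis-mapped pairs than the mean.
    return median(estimates)


if __name__ == "__main__":
    links = [(8000, 2000), (9000, 3100), (7500, 1400)]
    print(estimate_gap(len_a=10000, links=links, insert_size=5000))
```

A negative estimate would suggest that the two contigs overlap rather than being separated by a gap.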
Q: What are the common contents of comparative genomic analysis?
A: Comparative genomics refers to the comparison of known genes and gene structures, based on genomic maps and sequence analysis, in order to understand the functions of genes, the mechanisms of expression regulation, and the evolution of species. It generally includes the following aspects:
(1) Pairwise genome alignment with a closely related species. The sequence and structural homology between the two genomes can be used to locate genes in one genome from the mapping information of the other, known genome, thereby revealing the potential function of those genes and changes in the internal structure of the genome.
(2) Multiple-genome alignment with closely related species. When sequences are compared between two or more genomes, what is essentially obtained is the evolutionary relationship of those sequences in a phylogenetic tree. The growing amount of genomic information makes it possible to study molecular evolution and gene function at the genome level. By studying the genomic data of a variety of organisms, together with their vertical and horizontal evolution, we can understand the structure and regulation of genes.
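To make the link between sequence comparison and evolutionary relationships concrete, here is a minimal sketch that computes pairwise p-distances (the proportion of differing sites) between sequences that are assumed to be already aligned. The species names and sequences are made up for illustration; real analyses use alignment software and substitution models rather than raw p-distances.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def distance_matrix(aligned):
    """Return {(name1, name2): p-distance} for every pair in the dict."""
    return {
        (n1, n2): p_distance(aligned[n1], aligned[n2])
        for n1, n2 in combinations(aligned, 2)
    }


if __name__ == "__main__":
    # Hypothetical, pre-aligned toy sequences.
    aligned = {
        "sp1": "ACGTACGT",
        "sp2": "ACGTACGA",
        "sp3": "AC-TTCGA",
    }
    for pair, dist in distance_matrix(aligned).items():
        print(pair, round(dist, 3))
```

A distance matrix like this is the usual input to distance-based tree-building methods such as neighbor joining.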
De novo sequencing means that a species can be sequenced without any existing genetic sequence information. The reads are assembled by bioinformatics analysis to obtain the genomic sequence map of the species. De novo sequencing is currently widely used to analyze the genomic sequence, gene composition, and evolutionary characteristics of unknown species.