Information courtesy of Roche 454 Life Sciences
Titanium Shotgun DNA Sequencing
Whole genome shotgun sequencing allows sequencing and detection of structural variation in viruses, bacteria and other microbes, fungi, animals and plants, as well as in whole-genome-amplified or randomly amplified DNA and purified DNA such as BACs, fosmids and plasmids.
Shotgun Sample Requirements
The procedure requires 500 ng of sample DNA but we request at least 1 μg in case of problems. Ideally the DNA should be checked to ensure it is derived from the target organism and contains no other contaminating DNA.
At a minimum, the DNA sample should meet the following criteria:
- DNA must be double stranded
- DNA should not be from an amplified library (which may compromise representativity)
- DNA should not be degraded and should contain no particulate matter
- Input DNA size should be in pieces >1.5 kb
- OD260/280 should be approximately 1.8
- Concentration should be 50 ng/μl or above, in TE
Because DNA quantitation using OD260 is variable and dependent upon DNA purity, the sample DNA concentration should be checked by fluorometry or by gel electrophoresis on a 1-2% agarose gel using a DNA Mass Ladder.
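As a worked example of these quantities, the Python sketch below converts a fluorometric concentration into the volume needed to supply the requested 1 μg (and the 500 ng minimum), and flags samples that miss the concentration or purity criteria. The function name is hypothetical, and the +/- 0.1 window used for "approximately 1.8" is an assumption chosen for illustration, not a facility threshold.

```python
def shotgun_submission(conc_ng_per_ul: float, od_260_280: float) -> dict:
    """Hypothetical helper: volumes to send and any criteria the sample misses."""
    required_ng = 500     # amount the procedure actually uses
    requested_ng = 1000   # 1 ug requested, in case of problems
    min_conc = 50         # ng/ul, in TE
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    problems = []
    if conc_ng_per_ul < min_conc:
        problems.append(f"concentration {conc_ng_per_ul} ng/ul is below {min_conc} ng/ul")
    # "approximately 1.8": the +/- 0.1 window is an illustrative assumption
    if abs(od_260_280 - 1.8) > 0.1:
        problems.append(f"OD260/280 of {od_260_280} is not close to 1.8")
    return {
        "volume_for_requested_ul": requested_ng / conc_ng_per_ul,
        "volume_for_minimum_ul": required_ng / conc_ng_per_ul,
        "problems": problems,
    }

# Example: a 100 ng/ul stock measured by fluorometry
print(shotgun_submission(100, 1.82))
```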
A variation on the standard procedure allows the preparation of ssDNA libraries from Low Molecular Weight (LMW) DNA samples. An LMW DNA sample consists of fragments in the 70-500 bp range, such as short sequence tags, PCR products, cDNA derived from microRNAs, etc.
Titanium Paired End Sequencing
Titanium Paired End Sequencing generates pairs of reads from DNA ends separated by approximately 3 kb, 8 kb or 20 kb. Paired End data is typically used to order and orient the contigs from a Shotgun sequencing project, a process also referred to as scaffolding. Paired End sequencing is therefore usually performed in parallel with the sequencing of a shotgun library made from the same DNA sample, but it requires the preparation of a separate Paired End library. Such a library is composed of paired tags that lie a known (approximate) distance apart in the original sample. When the two tags of a pair map to two different assembly contigs, they specify the contigs' order, orientation and approximate distance apart in the initial sample DNA.
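To illustrate how a single pair constrains two contigs, the following Python sketch infers each contig's orientation and the approximate gap between them. It assumes, hypothetically, that the two tags face each other in the original molecule. The actual tag orientation produced by the 454 Paired End protocol is not described here and may differ. Real scaffolding software combines evidence from many pairs rather than trusting one. The names `TagHit` and `link_contigs` are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class TagHit:
    contig_len: int  # length of the contig the tag mapped to
    pos: int         # 0-based position of the tag on the contig's forward strand
    strand: str      # "+" or "-"

def flip(hit: TagHit) -> TagHit:
    """Express the same hit on the reverse complement of its contig."""
    return TagHit(hit.contig_len, hit.contig_len - 1 - hit.pos,
                  "-" if hit.strand == "+" else "+")

def link_contigs(tag1: TagHit, tag2: TagHit, span_bp: int):
    """Return (orientation of contig 1, orientation of contig 2, estimated gap).

    Contig 1 (carrying tag1) is placed first. "+" means as assembled, "-" means
    reverse complemented. ASSUMPTION: the two tags face each other, so tag1
    must read forward (towards contig 2) and tag2 must read back towards contig 1.
    """
    orient1, orient2 = "+", "+"
    if tag1.strand == "-":
        tag1, orient1 = flip(tag1), "-"
    if tag2.strand == "+":
        tag2, orient2 = flip(tag2), "-"
    to_end_of_1 = tag1.contig_len - tag1.pos  # bases from tag1 to the end of contig 1
    from_start_of_2 = tag2.pos                # bases from the start of contig 2 to tag2
    return orient1, orient2, span_bp - to_end_of_1 - from_start_of_2

# 3 kb library: tag1 1,000 bp from the end of a 10 kb contig, tag2 1,000 bp into contig 2
print(link_contigs(TagHit(10_000, 9_000, "+"), TagHit(5_000, 1_000, "-"), 3_000))
```

In this example, the pair implies a gap of about 1 kb between the contigs. A negative gap would instead suggest overlapping contigs or a chimeric pair.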
Paired End Sample Requirements
The quality and quantity of the DNA sample are critical to the success of this procedure. Any contamination or degradation of the starting material will be directly reflected in the quality of the output library.
The procedure requires 5 μg of input DNA for a 3 kb span, 15 μg for an 8 kb span and 30 μg for a 20 kb span Paired End library. Using smaller amounts of DNA may result in poor-quality or underperforming Paired End libraries. Ideally, the DNA should be checked to ensure it is derived from the target organism and contains no other contaminating DNA.
At a minimum, the DNA sample should meet the following criteria:
- DNA must be double stranded
- DNA should preferably not be the result of any amplification (which may compromise representativity)
- DNA should not be degraded and should contain no particulate matter
- Input DNA size should be in pieces >15 kb for a 3 kb span, >24 kb for an 8 kb span and >60 kb for a 20 kb span Paired End library
- OD260/280 should be approximately 1.8
- Concentration should be at least 200 ng/μl, in 10 mM Tris-HCl, pH 7.5-8.5, or Molecular Biology Grade water. Do not use TE buffer, as higher salt concentrations will alter the shearing characteristics of the sample
DNA quantitation using OD260 is not reliable. Input DNA concentrations must be verified by fluorometry. Other methods of quantitation often overestimate sample concentration, resulting in an inadequate amount of starting material. This could lead to poor library yield and reduced quality.
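The span-dependent requirements above can be captured in a small lookup table. The Python sketch below checks a sample's fluorometric concentration, volume and fragment size against the listed figures for the chosen span, and returns any problems found. The function and field names are hypothetical.

```python
# Requirements per span (kb), transcribed from the criteria above
PE_REQUIREMENTS = {
    3:  {"input_ug": 5,  "min_fragment_kb": 15},
    8:  {"input_ug": 15, "min_fragment_kb": 24},
    20: {"input_ug": 30, "min_fragment_kb": 60},
}
MIN_CONC_NG_PER_UL = 200

def check_paired_end(span_kb: int, fluorometric_conc_ng_per_ul: float,
                     volume_ul: float, fragment_kb: float) -> list[str]:
    """Hypothetical checklist: return the problems found (empty list = passes)."""
    if span_kb not in PE_REQUIREMENTS:
        raise ValueError(f"span must be one of {sorted(PE_REQUIREMENTS)} kb")
    req = PE_REQUIREMENTS[span_kb]
    problems = []
    total_ug = fluorometric_conc_ng_per_ul * volume_ul / 1000  # ng -> ug
    if total_ug < req["input_ug"]:
        problems.append(f"{total_ug:.1f} ug supplied, {req['input_ug']} ug needed")
    if fluorometric_conc_ng_per_ul < MIN_CONC_NG_PER_UL:
        problems.append(f"concentration below {MIN_CONC_NG_PER_UL} ng/ul")
    if fragment_kb <= req["min_fragment_kb"]:  # criterion is strictly greater than
        problems.append(f"fragments must be > {req['min_fragment_kb']} kb")
    return problems

# An 8 kb span library from 80 ul at 250 ng/ul (20 ug) with ~30 kb fragments
print(check_paired_end(8, 250, 80, 30))
```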
Titanium Amplicon DNA Sequencing
The processing and sequencing of amplicons is quite flexible and allows a wide range of experimental designs. A researcher can choose among a variety of design parameters, such as the length of the amplicons, the number of amplicons pooled together, the number of reads desired for a given amplicon pool, and whether to read from the A end, the B end, or both. Although the setup for a given experiment will depend on the specific project goals, a number of general guidelines will help ensure the best possible result:
- The highest confidence in low frequency variation will result from bi-directional reads.
- A high-fidelity polymerase must be used in the amplicon generation step. A low-fidelity polymerase will introduce many amplification-induced variations into the sequence. Although there are many choices of enzyme, Roche's FastStart High Fidelity PCR System combines high fidelity with robust amplification of a wide array of input templates.
- Greater confidence in results may be achieved by running replicates of the biological material through the sequencing process and comparing the results.
- The level of multiplexing should be determined by:
  - the number of amplicons of interest
  - the desired sensitivity or depth of coverage
- When sequencing mixtures of multiple amplicons, take care in quantifying and pooling the amplicons, as equimolar mixtures give the best results.
- Forward and reverse reads will eliminate most systematic, context-dependent sequencing errors. The ideal experiment has reads covering the amplicon in both the forward and reverse directions.
The number of amplicons that can be combined in an experiment, while theoretically unlimited, is primarily determined by the desired sensitivity of detection.
The following guidelines are a general, though conservative, aid for determining the level of oversampling required for a desired level of detection. They accommodate experimental realities such as variation in quantitation, pooling, amplification or sequencing efficiencies of MID-labeled amplicons, and differences in amplification efficiency between long and short amplicons.
- Heterozygote detection: 40x coverage
- 5% variation (single-base changes and multi-base deletions): 1000x coverage (good statistical chance of 50 variant reads)
- 1% variation (single-base changes and multi-base deletions): 5000x coverage (good statistical chance of 50 variant reads)
- Single-base indels may require additional depth
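The coverage figures follow from simple expectations. At 1000x, a 5% variant is expected in about 50 reads, as is a 1% variant at 5000x. The Python sketch below computes that expectation, the binomial probability of seeing at least a given number of variant reads, and a rough upper bound on how many amplicon-sample combinations fit in one run. It is a simplified model that assumes independent reads and ignores sequencing error, uneven pooling and reads lost to filtering. The figure of about 1 million reads per full GS FLX Titanium run is the one quoted for cDNA sequencing and is used here only as an example.

```python
from math import exp, lgamma, log

def expected_variant_reads(depth: int, freq: float) -> float:
    """Mean number of reads carrying a variant present at fraction `freq`."""
    return depth * freq

def prob_at_least(k: int, depth: int, freq: float) -> float:
    """P(X >= k) for X ~ Binomial(depth, freq), with 0 < freq < 1.

    Simplified model: reads sample template molecules independently and
    sequencing errors are ignored.
    """
    def pmf(i: int) -> float:
        # computed in log space so large depths do not overflow
        return exp(lgamma(depth + 1) - lgamma(i + 1) - lgamma(depth - i + 1)
                   + i * log(freq) + (depth - i) * log(1 - freq))
    return 1.0 - sum(pmf(i) for i in range(min(k, depth + 1)))

def max_targets_per_run(reads_per_run: int, depth: int) -> int:
    """Upper bound on amplicon x sample combinations at a given depth."""
    return reads_per_run // depth

print(expected_variant_reads(1000, 0.05), expected_variant_reads(5000, 0.01))
print(round(prob_at_least(25, 1000, 0.05), 4))
print(max_targets_per_run(1_000_000, 5000))
```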
Amplicon Sample Requirements
The quality and quantity of the DNA sample are critical to the success of this procedure. Any contamination in the starting material will be directly reflected in the output library.
Since it includes an amplification step, this procedure requires less input DNA than the other DNA library preparation procedures, though the amount required will depend on the nature of the experiment. For example, if searching for low-abundance sequence variants in a complex sample (such as genomic DNA), one should start with 5-20 ng of DNA. If the starting material is cloned into a plasmid or is a PCR-generated DNA fragment, 1-2 ng is usually sufficient.
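The difference between 5-20 ng of genomic DNA and 1-2 ng of a cloned target comes down to template copy number. The Python sketch below estimates the number of copies in a given mass of double-stranded DNA, assuming an average of 650 g/mol per base pair. The 3.1 Gb genome and 5 kb plasmid in the usage lines are hypothetical examples.

```python
AVOGADRO = 6.022e23        # molecules per mole
BP_MASS_G_PER_MOL = 650    # assumed average mass of one base pair of dsDNA

def genome_copies(input_ng: float, genome_size_bp: float) -> float:
    """Approximate number of template copies in `input_ng` of dsDNA."""
    grams_per_copy = genome_size_bp * BP_MASS_G_PER_MOL / AVOGADRO
    return input_ng * 1e-9 / grams_per_copy

# Hypothetical examples: a ~3.1 Gb haploid genome and a 5 kb plasmid
print(round(genome_copies(10, 3.1e9)))   # genomic DNA, 10 ng
print(f"{genome_copies(1, 5_000):.1e}")  # plasmid, 1 ng
```

On these assumptions, 10 ng of a 3.1 Gb genome holds only about 3,000 copies, so a 1% variant is present in roughly 30 of them. By contrast, 1 ng of a 5 kb plasmid holds about 2 x 10^8 copies.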
Ideally the DNA should be checked to ensure it is derived from the target organism and contains no other contaminating DNA.
At a minimum, the DNA sample should meet the following criteria:
- DNA should not be degraded, and should contain no particulate matter
- Input DNA size should be sufficient to allow amplification of the target
- OD260/280 should be approximately 1.8
- Concentration should be 5 ng/μl or above, in TE (0.5 ng/μl for cloned or PCR-generated targets)
Because DNA quantitation using OD260 is variable and dependent upon DNA purity, the sample DNA concentration should be checked by fluorometry or by gel electrophoresis on a 1-2% agarose gel using a DNA Mass Ladder.
The procedure for preparing a DNA sample for Amplicon Sequencing consists of a simple PCR amplification, but it requires special Fusion Primers, which must be designed by the user according to the specific requirements of the experiment. Amplicon Fusion Primers must contain a directional GS FLX Titanium Primer A or Primer B sequence (which includes a four-base library key sequence) at the 5' portion of the oligonucleotide, in addition to the template-specific sequence at the 3' end. An optional Multiplex Identifier (MID) sequence may be added between the Primer A (or Primer B) and template-specific sequences to allow automated software identification of samples after pooling or multiplexing and sequencing (also referred to as barcoding).
The choice of appropriate PCR primers for the generation of the Amplicon library is critical for a successful experimental design, as studies aimed at identifying and quantitating sequence variants can only be as accurate and unbiased as the original amplification.
The 5' part of each Fusion Primer, Primer A and B, is always the same, as dictated by the requirements of the Genome Sequencer System. The requirements are as follows:
- Length: 25 nt
- Sequence:
  - Primer A: 5' CGT ATC GCC TCC CTC GCG CCA TCAG [optional MID] [template-specific sequence] 3'
  - Primer B: 5' CTA TGC GCC TTG CCA GCC CGC TCAG [optional MID] [template-specific sequence] 3'
- Functions:
  - bind to the DNA Capture Beads of GS emPCR Kits
  - anneal the Amplification Primers of GS emPCR Kits
  - anneal the Sequencing Primers of GS emPCR Kits
  - end with the sequencing key TCAG, which the system's software uses for base calling and to recognise legitimate library reads
The 3' part of each Fusion Primer is specific to each Amplicon:
- Length: typically 20-25 nt (may vary)
- Sequence: specific to each side of the desired Amplicon
- Functions:
  - anneal to either side of the target to be sequenced
  - serve as PCR amplification primers during library preparation
  - start the sequence of the reads, since they are fused directly to the sequencing key from Primer A or Primer B
  - allow the Amplicon Variant Analyzer software to assign reads to the corresponding Amplicons
- Design considerations:
  - The normal constraints of PCR primer specificity and annealing conditions apply.
  - The total length of the amplified products (including Fusion Primers) should preferably be between 200 and 600 bp.
  - Total amplicon length should be less than 800 bp to facilitate high-quality sequencing.
  - When possible, design amplicons to cover the sequence of interest within the first 400 bp of sequencing, i.e. the first 400 bp after the adaptor sequence, including the key and both MID sequences (if applicable).
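The fusion primer layout and the length limits above can be expressed as a short Python sketch. The adaptor constants are the Primer A and Primer B sequences listed above with spaces removed. The template-specific sequence in the usage lines is made up for illustration, and the length thresholds mirror the design considerations. The helper names are hypothetical.

```python
PRIMER_A = "CGTATCGCCTCCCTCGCGCCATCAG"  # 25 nt, ends in the TCAG key
PRIMER_B = "CTATGCGCCTTGCCAGCCCGCTCAG"  # 25 nt, ends in the TCAG key

def fusion_primer(adaptor: str, template_specific: str, mid: str = "") -> str:
    """Assemble 5' adaptor (with key) + optional MID + 3' template-specific part."""
    return adaptor + mid + template_specific

def total_amplicon_length(target_bp: int, mid_len_a: int = 0, mid_len_b: int = 0) -> int:
    """Full product length; target_bp includes both template-specific primer sites."""
    return target_bp + len(PRIMER_A) + mid_len_a + len(PRIMER_B) + mid_len_b

def check_length(total_bp: int) -> str:
    if total_bp >= 800:
        return "too long: keep total amplicon length below 800 bp"
    if 200 <= total_bp <= 600:
        return "within the recommended 200-600 bp"
    return "allowed, but outside the recommended 200-600 bp"

# Hypothetical 20 nt template-specific sequence, tagged with MID1 (ACG AGT GCG T)
fwd = fusion_primer(PRIMER_A, "AGCTGACCTGAAGTCCGAAT", mid="ACGAGTGCGT")
print(len(fwd))  # 55
total = total_amplicon_length(300, 10, 10)  # 300 + 2 * (25 + 10) = 370
print(total, check_length(total))
```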
Multiplexing
There are many ways to segregate samples to maximize the throughput of a single sequencing run. These include separating the samples physically (loading samples into different regions of the PicoTiterPlate (PTP) gasket), coded separation using Multiplex Identifiers (MIDs), or a combination of the two. If employed, MID sequences should be used in both the A and B Fusion Primers. Using different MIDs in each of the two Fusion Primers enables a broad range of multiplexing possibilities, up to 196-fold with 14 MIDs on each end (14 x 14 = 196). In all cases, bidirectional sequencing should be employed.
While other barcode sequences may be incorporated, we recommend using MIDs from the Standard 454 set in the table below (more MID sequences can be provided on request). These 10-mer sequences have been carefully engineered to avoid misassignment of reads and are tolerant of several errors, such as those often introduced during primer synthesis.
These 14 MID sequences have been preloaded in the Amplicon Variant Analyzer Software developed by 454 to analyze data from Amplicon library sequencing.
| ID | MID Sequence | ID | MID Sequence |
|---|---|---|---|
| MID1 | ACG AGT GCG T | MID8 | CTC GCG TGT C |
| MID2 | ACG CTC GAC A | MID9 | TAG TAT CAG C |
| MID3 | AGA CGC ACT C | MID10 | TCT CTA TGC G |
| MID4 | AGC ACT GTA G | MID11 | TGA TAC GTC T |
| MID5 | ATC AGA CAC G | MID12 | TAC TGA GCT A |
| MID6 | ATA TCG CGA G | MID13 | CAT AGT AGT G |
| MID7 | CGT GTC TCT A | MID14 | CGA GAG ATA C |
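Read assignment by MID can be sketched as follows. Strip the TCAG key, compare the next 10 bases with each MID, and accept the closest match only if it is unambiguous and within a mismatch limit. This Python sketch is an illustration under stated assumptions, not the Amplicon Variant Analyzer's algorithm. It uses Hamming distance (substitutions only), whereas 454 errors are dominated by insertions and deletions in homopolymer runs. The one-mismatch limit is arbitrary, and only four of the 14 MIDs are included for brevity.

```python
from typing import Optional

# Four of the Standard 454 MIDs from the table above, spaces removed
MIDS = {
    "MID1": "ACGAGTGCGT",
    "MID2": "ACGCTCGACA",
    "MID3": "AGACGCACTC",
    "MID4": "AGCACTGTAG",
}
KEY = "TCAG"
MID_LEN = 10

def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def assign_mid(read: str, max_mismatches: int = 1) -> Optional[str]:
    """Return the MID name for a read, or None if it cannot be assigned safely."""
    if not read.startswith(KEY):
        return None  # no library key: not a legitimate library read
    tag = read[len(KEY):len(KEY) + MID_LEN]
    if len(tag) < MID_LEN:
        return None  # read too short to contain a full MID
    scored = sorted((hamming(tag, seq), name) for name, seq in MIDS.items())
    best_dist, best_name = scored[0]
    runner_up_dist = scored[1][0] if len(scored) > 1 else None
    if best_dist > max_mismatches or best_dist == runner_up_dist:
        return None  # too many errors, or ambiguous between two MIDs
    return best_name

print(assign_mid("TCAG" + "ACGAGTGCGA" + "GGATCC"))  # one mismatch from MID1
```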
cDNA Sequencing
Transcriptome sequencing is a term that encompasses experiments including mRNA transcript-expression analysis (full-length mRNA, expressed sequence tags (ESTs) and ditags), novel gene discovery, gene space identification in novel genomes, assembly of full-length genes, single nucleotide polymorphism (SNP), insertion-deletion and splice-variant discovery, as well as analyses of allele-specific expression and chromosomal rearrangement. The combination of long, accurate reads and high throughput makes 454 Sequencing on the Genome Sequencer FLX ideally suited to detailed transcriptome investigation.
Guidelines to cDNA Sequencing
The number of transcriptome sequencing runs required depends on several factors, including the following:
- The purpose of the experiment
- The completeness of mRNA species in the sequenced sample
- The estimated gene count
- The sequence yield (total number of reads) per run
- The average length of raw reads
- Whether the cDNA library is normalized
- The quality of the mRNA sample
We recommend the following equation to estimate the amount of sequencing required:
Total number of bases required = (estimated gene count in the sample to be sequenced) x 40,000
The number of runs is this total divided by the yield of one run. In practice, for every 8,000 genes in a sequenced RNA sample, we recommend one full GS FLX Titanium sequencing run (approximately 1 million reads with an average length of 350 bases, i.e. about 350 Mb).
To obtain a wider dynamic range in read counts, add further runs as desired.
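The estimate above can be written as a short Python sketch. The constants come from the text (40,000 bases per gene, about 1 million reads of about 350 bases per full run). Rounding up to whole runs is an assumption about how the result would be used.

```python
import math

BASES_PER_GENE = 40_000
READS_PER_RUN = 1_000_000   # approximate, one full GS FLX Titanium run
MEAN_READ_LEN = 350         # bases

def runs_needed(gene_count: int) -> int:
    """Full runs needed to cover the recommended number of bases."""
    total_bases = gene_count * BASES_PER_GENE
    bases_per_run = READS_PER_RUN * MEAN_READ_LEN   # ~350 Mb
    return math.ceil(total_bases / bases_per_run)

print(runs_needed(8_000), runs_needed(20_000))
```

With these figures, 8,000 genes need about 320 Mb, which fits in one run. By contrast, 20,000 genes need about 800 Mb, or three runs.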
Pooled, normalized cDNA libraries are recommended for gene discovery and genome annotation. Normalized samples are likely to give higher sensitivity and better gene coverage than non-normalized cDNA libraries. Non-normalized cDNA libraries are used for measuring relative gene expression using read counts.
RNA Sample Requirements
The sample RNA should be:
- total amount of RNA > 200 ng
- quantitated with RiboGreen
- sample volume > 19 μl
- pure (OD 260/280 > 1.8)
- DNA free
- enriched in the RNA of interest (for example, if the RNA of interest is mRNA, remove ribosomal RNA before proceeding with the procedure)
- quality-assessed on an RNA 6000 Pico Chip on the Agilent 2100 Bioanalyzer instrument. A typical RNA sample will produce a smear ranging from 0.2 kb to 7 kb.
This protocol is not designed for preparing small RNA molecules, for example snoRNA, microRNA or tRNA.
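The RNA criteria can likewise be expressed as a checklist. This Python sketch returns the criteria a sample fails. The class and function names are hypothetical. The Bioanalyzer smear is not checked, because the text describes it as typical rather than required.

```python
from dataclasses import dataclass

@dataclass
class RnaSample:
    total_ng: float        # quantitated with RiboGreen
    volume_ul: float
    od_260_280: float
    dna_free: bool
    enriched: bool         # e.g. rRNA removed when the target is mRNA

def rna_problems(s: RnaSample) -> list[str]:
    """Return the listed criteria this sample fails (empty list = all met)."""
    problems = []
    if s.total_ng <= 200:
        problems.append("total RNA must exceed 200 ng")
    if s.volume_ul <= 19:
        problems.append("sample volume must exceed 19 ul")
    if s.od_260_280 <= 1.8:
        problems.append("OD260/280 must exceed 1.8")
    if not s.dna_free:
        problems.append("sample must be DNA free")
    if not s.enriched:
        problems.append("enrich for the RNA of interest first")
    return problems

print(rna_problems(RnaSample(total_ng=500, volume_ul=20, od_260_280=2.0,
                             dna_free=True, enriched=True)))
```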
Samples and completed 454 sequencing request forms can be dropped off in the basket provided in the Biochemistry reception (Sanger Building) or posted to the facility at the following address:
FAO: Mrs Shilo Dickens
Department of Biochemistry
University of Cambridge
DNA Sequencing Facility
80 Tennis Court Road
Old Addenbrooke's Site
Cambridge
CB2 1GA
Sequencing requests from anywhere other than Biochemistry must be accompanied by a purchase order and a VAT exemption form where relevant.