2D LC and Data Analysis Procedures
Mass spectrometry and tandem MS/MS
2D LC-MALDI Separation and Analysis Procedures
After iTRAQ labeling is complete, or for LC-MALDI MudPIT experiments, 2D LC separation of the tryptic peptides is carried out as follows:
The samples are dried down and resuspended in SCX loading buffer (Buffer A below).
SCX separations are performed on a passivated Waters 600E HPLC system using a 4.6 × 250 mm PolySULFOETHYL Aspartamide column (PolyLC, Columbia, MD) at a flow rate of 1 ml/min. Buffer A contains 10 mM ammonium formate, pH 3.6, in 20% acetonitrile/80% water. Buffer B contains 666 mM ammonium formate, pH 3.6, in 20% acetonitrile/80% water.
The gradient is 100% Buffer A (0-22 min after sample injection), 0%-->40% Buffer B (22-48 min), 40%-->100% Buffer B (48-49 min), and 100% Buffer B isocratic (49-56 min); at 56 min the system is switched back to 100% Buffer A to re-equilibrate for the next injection. The first 28 ml of eluate (containing all flow-through fractions) are combined into one fraction, and 14 additional 2-ml fractions are then collected. All 15 SCX fractions are dried down completely to reduce volume and remove the volatile ammonium formate salts, then resuspended in 9 µl of 2% (v/v) acetonitrile, 0.1% (v/v) trifluoroacetic acid and filtered prior to reverse phase C18 nanoflow LC separation.
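The collection scheme above can be checked with a short sketch. At the stated constant 1 ml/min flow, collected volume maps directly onto run time, so the pooled first fraction plus 14 × 2 ml fractions span 0-56 min, ending when the 100% Buffer B step ends. The constant and function names are illustrative only, and the sketch assumes no dead volume between column outlet and collector.

```python
"""Sketch: SCX fraction collection windows implied by the protocol text.

Assumes a constant 1 ml/min flow and no dead volume, so collected volume (ml)
equals elapsed time (min). Names are hypothetical.
"""

FLOW_ML_PER_MIN = 1.0   # SCX flow rate stated in the protocol
POOLED_FIRST_ML = 28.0  # first 28 ml (all flow-through) pooled as one fraction
N_EXTRA = 14            # number of additional fractions
EXTRA_ML = 2.0          # volume of each additional fraction


def fraction_windows():
    """Return (fraction_number, start_min, end_min) for every collected fraction."""
    windows = []
    start = 0.0
    volumes = [POOLED_FIRST_ML] + [EXTRA_ML] * N_EXTRA
    for number, volume in enumerate(volumes, start=1):
        end = start + volume / FLOW_ML_PER_MIN
        windows.append((number, start, end))
        start = end
    return windows


if __name__ == "__main__":
    for n, t0, t1 in fraction_windows():
        print(f"SCX fraction {n:2d}: {t0:5.1f}-{t1:5.1f} min")
```

Changing `FLOW_ML_PER_MIN` shows how the same volume-based scheme would shift in time at a different flow rate.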
For LC/MS/MS Analysis using the ABSciex TripleTOF 5600
Initial SCX fractionation is performed as described above, although ERLIC (electrostatic repulsion-hydrophilic interaction chromatography) and high-pH reverse phase separations are also used as the first dimension for some sample types.
For analysis with the TripleTOF 5600, the second-dimension separation by low-pH reverse phase nanoflow LC is performed by autoinjecting 2 µg of each SCX fraction from a NanoLC AS-2 Autosampler (ABSciex/Eksigent) into a NanoLC-Ultra-2D Plus HPLC (ABSciex/Eksigent) using a 10 µl injector loop. Each SCX fraction is separated in trap-and-elute mode using the microfluidics of a cHiPLC Nanoflex system equipped with a trap column (200 µm x 0.5 mm Reprosil-Pur C18-AQ 3 µm 120 Å) and a separation column (75 µm x 15 cm Reprosil-Pur C18-AQ 3 µm 120 Å). Buffer C is degassed 0.1% formic acid in water, and Buffer D is degassed 0.1% formic acid in acetonitrile. After the trap column is loaded with 95% Buffer C/5% Buffer D, elution and nanospray into the mass spectrometer source are accomplished with the following 185-minute gradient, in which most peptides elute between 15 and 110 minutes (shorter 30-120 minute gradients are used for less complex samples): 5% Buffer D (0-1 min after sample injection), 5%-->35% Buffer D (1-155 min), 35%-->85% Buffer D (155-157 min), isocratic 85% Buffer D (157-165 min), 85%-->5% Buffer D (165-166 min), then isocratic at the starting 5% Buffer D (166-185 min) to re-equilibrate for the next injection.
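A gradient program like this one is a table of breakpoints with linear ramps between them. The minimal sketch below, assuming the pump ramps linearly between consecutive listed time points, returns the %Buffer D being delivered at any time in the 185-minute program. The names are hypothetical; a shorter 30-120 minute program could be handled by passing a different breakpoint list.

```python
"""Sketch: %Buffer D delivered at time t for the 185-min nano-LC program.

Breakpoints are transcribed from the protocol text; linear ramps between
consecutive breakpoints are an assumption about pump behavior.
"""
from bisect import bisect_right

# (time in min, % Buffer D) breakpoints for the 185-min gradient
GRADIENT_185 = [
    (0, 5), (1, 5),   # hold at loading conditions
    (155, 35),        # main separating ramp
    (157, 85),        # wash ramp
    (165, 85),        # isocratic wash
    (166, 5),         # return to starting conditions
    (185, 5),         # re-equilibration
]


def percent_d(t, program=GRADIENT_185):
    """Linearly interpolate %D at time t (min) within the program."""
    times = [point[0] for point in program]
    if t <= times[0]:
        return float(program[0][1])
    if t >= times[-1]:
        return float(program[-1][1])
    i = bisect_right(times, t)  # program[i-1] time <= t < program[i] time
    (t0, d0), (t1, d1) = program[i - 1], program[i]
    return d0 + (d1 - d0) * (t - t0) / (t1 - t0)
```

For example, `percent_d(78)` gives 20.0, the midpoint of the main ramp, whose slope is 30% over 154 min, or roughly 0.19% Buffer D per minute.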
Eluate is delivered into the ABSciex TripleTOF 5600 mass spectrometer through a NanoSpray III source fitted with a 10 µm i.d. nanospray tip (New Objective, Woburn, MA).
Mass spectrometer settings vary with each day's optimization, but typical values are curtain gas = 25, Gas1 = 4-6, Gas2 = 0, and an ion spray floating voltage of about 2200 V; rolling collision energy is used for CID fragmentation during MS/MS acquisition. Each cycle consists of a 250 ms TOF-MS survey spectrum (mass range 400-1250 Da), followed by information-dependent acquisition of up to 50 MS/MS spectra (50 ms each; TOF mass range 65-1600 Da) of MS peaks with intensity above 150 and a charge state between 2 and 5, for a total of about 2.8 seconds per full cycle. Once MS/MS fragment spectra have been acquired for a particular mass, that mass is dynamically excluded for 6 seconds. The mass spectrometer is recalibrated with a known beta-galactosidase digest before each fraction is analyzed, and full instrument optimization is performed at least once a week.
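To show how the acquisition parameters above interact, here is a simplified sketch of one information-dependent cycle. It is not the instrument's actual selection logic: the `Peak` record, the function names, and the 0.05 m/z tolerance used to decide whether a peak matches an excluded mass are all hypothetical. It only applies the stated rules (top 50 precursors by intensity, intensity above 150, charge 2-5, 6 s dynamic exclusion) and computes the nominal accumulation time per cycle.

```python
"""Simplified sketch of one information-dependent acquisition (IDA) cycle.

NOT the instrument's real algorithm. Thresholds come from the protocol text;
the exclusion m/z tolerance and all names are hypothetical.
"""
from dataclasses import dataclass

SURVEY_S = 0.250       # TOF-MS survey scan time
MSMS_S = 0.050         # accumulation time per MS/MS spectrum
MAX_MSMS = 50          # up to 50 MS/MS spectra per cycle
MIN_INTENSITY = 150    # peaks must be above this intensity
CHARGES = range(2, 6)  # charge states 2-5 inclusive
EXCLUSION_S = 6.0      # dynamic exclusion window


@dataclass
class Peak:
    mz: float
    charge: int
    intensity: float


def select_precursors(peaks, now_s, excluded, tol_mz=0.05):
    """Pick up to MAX_MSMS precursors, most intense first.

    `excluded` maps m/z -> time (s) it was last fragmented; it is updated in place.
    """
    chosen = []
    for p in sorted(peaks, key=lambda pk: pk.intensity, reverse=True):
        if len(chosen) == MAX_MSMS:
            break
        if p.intensity <= MIN_INTENSITY or p.charge not in CHARGES:
            continue
        # skip masses fragmented within the last EXCLUSION_S seconds
        if any(abs(p.mz - mz) <= tol_mz and now_s - t < EXCLUSION_S
               for mz, t in excluded.items()):
            continue
        chosen.append(p)
        excluded[p.mz] = now_s
    return chosen


def cycle_time_s(n_msms=MAX_MSMS):
    """Nominal accumulation time per cycle, ignoring switching overhead."""
    return SURVEY_S + n_msms * MSMS_S
```

With all 50 MS/MS slots used, `cycle_time_s()` gives about 2.75 s of accumulation, which rounds to the roughly 2.8 s cycle stated above.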
Second Dimension for MALDI Analysis:
For 2nd dimension separation by reverse phase nanoflow LC for subsequent MALDI analysis, each SCX fraction from above is autoinjected onto a Chromolith CapRod column (150 X 0.1 mm, Merck) using a 5 µl injector loop on a Tempo LC MALDI Spotting system (ABI-MDS/Sciex). Buffer C is 2% acetonitrile, 0.1% trifluoroacetic acid, and Buffer D is 98% acetonitrile, 0.1% trifluoroacetic acid.
The elution gradient starts at 95% C/5% D, with a flow rate of 2 µl per minute from 0-3 min that switches to 2.5 µl per minute at 3 min for the remainder of the gradient. Buffer D then increases from 5% to 38% (8.1-40 min) and from 38% to 80% (41-44 min), and finally returns from 80% to the initial 5% (44-49 min). A 3 µl per minute flow of MALDI matrix solution is added post-column (7 mg/ml recrystallized CHCA (α-cyano-4-hydroxycinnamic acid), 2 mg/ml ammonium phosphate, 0.1% trifluoroacetic acid, 80% acetonitrile).
The combined eluate is automatically spotted onto a stainless steel MALDI target plate every 6 seconds (0.55 µl per spot), for a total of 370 spots per original SCX fraction.
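The spotting numbers above are internally consistent, which a short calculation confirms: after 3 min the column delivers 2.5 µl/min and the matrix pump adds 3 µl/min, so 6 s of combined flow deposits 0.55 µl, and 370 spots at one spot per 6 s cover 37 min of run time. The function names in this sketch are illustrative; the protocol does not state at what time spotting begins, so the sketch does not assume one.

```python
"""Check of the LC-MALDI spotting arithmetic stated in the protocol."""

LC_FLOW_UL_MIN = 2.5      # column flow after the first 3 min
MATRIX_FLOW_UL_MIN = 3.0  # post-column matrix flow
SPOT_INTERVAL_S = 6.0     # one spot every 6 s
N_SPOTS = 370             # spots per SCX fraction


def spot_volume_ul(lc=LC_FLOW_UL_MIN, matrix=MATRIX_FLOW_UL_MIN,
                   interval_s=SPOT_INTERVAL_S):
    """Combined eluate + matrix volume deposited per spot (µl)."""
    return (lc + matrix) * interval_s / 60.0


def spotting_window_min(n_spots=N_SPOTS, interval_s=SPOT_INTERVAL_S):
    """Run time covered by n_spots at one spot per interval (min)."""
    return n_spots * interval_s / 60.0


print(f"{spot_volume_ul():.2f} µl per spot, {spotting_window_min():.0f} min spotted")
```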
5800 MALDI TOF-TOF Mass Spec analysis:
After the sample spots have dried, thirteen calibrant spots (ABI 4700 Mix) are added manually to each plate. The MALDI target plates (15 per experiment) are analyzed in a data-dependent manner on an ABI 5800 MALDI TOF-TOF.
As each plate is loaded into the instrument, a plate calibration/MS default calibration update is performed, followed by an update of the MS/MS default calibration. MS spectra are then acquired from each sample spot using the newly updated default calibration, with 500 laser shots per spot at laser intensity 3200 (this value can change somewhat with laser age and tuning). A plate-wide interpretation is then performed automatically, selecting for each observed m/z value the spot where it gives its highest peak for subsequent MS/MS analysis.
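The plate-wide interpretation step can be sketched as a grouping problem: peaks from different spots whose m/z values agree within a tolerance are treated as one precursor, and only the spot with the most intense peak is scheduled for MS/MS. This is a minimal illustration under hypothetical assumptions (a list of `(spot_id, mz, intensity)` tuples and a 50 ppm grouping tolerance); the instrument software's own grouping rules are not described here.

```python
"""Sketch of plate-wide precursor selection: fragment each m/z only once,
on the spot where it is most intense.

Hypothetical data model and tolerance; not the vendor software's algorithm.
"""


def plan_msms(peaks, tol_ppm=50.0):
    """peaks: iterable of (spot_id, mz, intensity).

    Returns {representative_mz: (spot_id, intensity)} for the strongest spot.
    """
    best = {}  # representative m/z -> (spot_id, intensity)
    for spot, mz, inten in sorted(peaks, key=lambda p: p[1]):
        # look for an existing precursor group within tolerance
        match = next((ref for ref in best
                      if abs(mz - ref) / ref * 1e6 <= tol_ppm), None)
        if match is None:
            best[mz] = (spot, inten)
        elif inten > best[match][1]:
            best[match] = (spot, inten)
    return best


peaks = [("A1", 1000.00, 500), ("A2", 1000.02, 900), ("A3", 1500.0, 300)]
print(plan_msms(peaks))
```

Here m/z 1000.00 and 1000.02 (20 ppm apart) fall into one group, so only spot A2, where that precursor is strongest, would be fragmented.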
Up to 2500 laser shots at laser power 4200 are accumulated for each MS/MS spectrum, then analyzed as described below.
Proteomic Data Analysis
The combined MS and MS/MS spectra from all SCX (or other first-dimension) fractions are analyzed with ProteinPilot 4.5 Beta software (Build 1656, ABSciex, as of April 2013; ProteinPilot 5.0 as of Fall 2014), which uses the Paragon and ProGroup algorithms (Shilov IV, Seymour SL, Patel AA, Loboda A, Tang WH, Keating SP, Hunter CL, Nuwaysir LM, Schaeffer DA. The Paragon Algorithm, a next generation search engine that uses sequence temperature values and feature probabilities to identify peptides from tandem mass spectra. Mol Cell Proteomics. 2007 Sep;6(9):1638-55) to search against complete RefSeq databases from NCBI concatenated with a reversed-sequence decoy database derived from the same RefSeq database, plus a list of 536 common lab contaminants. Protein identifications are accepted if they have an estimated Local False Discovery Rate of less than 5%, which is a more stringent criterion than the often-used 1% Global False Discovery Rate. The Local False Discovery Rate for each protein is calculated from the accumulation of decoy database hits using the Proteomics System Performance Evaluation Pipeline (PSPEP) algorithm (Tang WH, Shilov IV, Seymour SL. A Non-linear Fitting Method for Determining Local False Discovery Rates from Decoy Database Searches. J Proteome Res. 2008 Sep;7(9):3661-7. Epub 2008 Aug 14. PMID: 18700793).
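The concatenated target-decoy database described above can be sketched in a few lines: every target protein is kept, and a decoy with the reversed sequence is appended. This is a minimal illustration, not the procedure used to build the facility's databases; the `REVERSED_` accession prefix is a hypothetical convention, since ProteinPilot/PSPEP and other search engines expect their own decoy naming, and the contaminant list is not handled here.

```python
"""Sketch: concatenate a target FASTA with its reversed-sequence decoy.

The 'REVERSED_' prefix is hypothetical; check the search software's docs for
the decoy naming it recognizes.
"""


def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA text lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif line:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def target_decoy(records, prefix="REVERSED_"):
    """Return all targets followed by their reversed decoys, in the same order."""
    records = list(records)
    decoys = [(prefix + header, seq[::-1]) for header, seq in records]
    return records + decoys


def to_fasta(records, width=60):
    """Format (header, sequence) pairs as FASTA text with wrapped sequences."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

Usage: `to_fasta(target_decoy(read_fasta(open("refseq.fasta"))))` would produce the concatenated database text (the file name is only an example).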
The resulting ProteinPilot .group files can be viewed on a computer running Windows XP or later by installing a trial version of the ProteinPilot software from http://www.absciex.com/products/software/proteinpilot-software; once the trial period is over, the software continues to work as a viewer for .group files produced by licensed versions of the software.
Older analyses were performed with earlier versions of ProteinPilot (version 3.0 prior to 2011, or version 2.01 prior to July 2009, from ABI/MDS-Sciex), or with GPS Explorer software (ABI) and the Matrix Science Mascot algorithm version 2.1. In either case, the spectra were searched against full or RefSeq species-specific NCBInr databases (plus 536 common lab contaminants) concatenated with a reversed "decoy" version of themselves. In Winter 2014 we are using the January 1, 2014 versions of these FASTA databases, obtained from http://www.ncbi.nlm.nih.gov/sites/entrez?db=Taxonomy&cmd=search&term= and updated to the latest release every 3-6 months. We occasionally use the UniProt/SwissProt database plus a decoy database (the January 2013 UniProt release concatenated with a reversed "decoy" version of itself was in use in Spring 2013).
For the predominantly used ProteinPilot analyses, the preset Thorough (iTRAQ or Identification) search settings are used, and identifications must have a ProteinPilot Unused Score > 1.3 (> 95% confidence) to be accepted (an Excel list of all modifications searched with this Thorough setting is available for download). In addition, every accepted protein ID must have a Local False Discovery Rate estimate no higher than 5%, as calculated from the slope of the accumulated decoy database hits by the PSPEP (Proteomics System Performance Evaluation Pipeline) program of Sean Seymour and colleagues (Tang WH, Shilov IV, Seymour SL. J Proteome Res. 2008 Sep;7(9):3661-7. PMID: 18700793).
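ProteinPilot's Unused Score is a log-transformed confidence. Assuming the usual ProteinPilot definition, ProtScore = -log10(1 - C/100), where C is the percent confidence, the 1.3 cutoff converts to about 95% as follows:

```latex
\text{Unused} = -\log_{10}\!\left(1 - \frac{C}{100}\right)
\;\Longrightarrow\;
C = 100\left(1 - 10^{-\text{Unused}}\right)
  = 100\left(1 - 10^{-1.3}\right)
  \approx 100\,(1 - 0.0501) \approx 95\%
```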
Note that this Local or "Instantaneous" FDR estimate is much more stringent than p < 0.05 or 95% confidence scores in Mascot, Sequest, or ProteinPilot, or than the aggregate False Discovery Rate estimates (number of decoy database IDs/total IDs at any chosen threshold score) commonly used in the literature. Combined with the ProGroup algorithm included in ProteinPilot, it gives a very conservative and fully MIAPE-compliant list of identified proteins. Mascot and other lists of "Proteins ID'd at p<0.05" will produce more numerous "significant" IDs from the same data, but those larger lists are highly likely to contain many more false positive IDs. For additional discussion of False Discovery Rates and their estimation, please see "Calculating False Discovery Rates". For iTRAQ and LC-MudPIT experiments analyzed with ProteinPilot, we recommend accepting all protein IDs with a local estimated FDR of 5% or lower.
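The difference between a local and a global FDR can be seen with a toy score-ranked list. The sketch below is a deliberately simplified stand-in, not PSPEP: PSPEP fits a non-linear model to the cumulative decoy curve, whereas this sketch uses a plain sliding window, so its numbers will differ. It assumes the common concatenated-database convention that each decoy hit implies roughly one false target hit, so the false fraction is estimated as 2 × decoys / IDs; function names and the window size are hypothetical.

```python
"""Simplified local vs. global FDR from a concatenated target-decoy search.

Not PSPEP: a sliding window replaces PSPEP's non-linear fit. Assumes
false fraction ~= 2 * decoys / IDs (each decoy implies ~one false target).
"""


def global_fdr(is_decoy_ranked):
    """Cumulative FDR of all IDs at or above each rank (best score first)."""
    out, decoys = [], 0
    for rank, is_decoy in enumerate(is_decoy_ranked, start=1):
        decoys += int(is_decoy)
        out.append(min(1.0, 2.0 * decoys / rank))
    return out


def local_fdr(is_decoy_ranked, window=100):
    """FDR estimated in a window centred on each rank (best score first)."""
    n = len(is_decoy_ranked)
    half = window // 2
    prefix = [0]  # prefix sums of decoy counts for O(1) window queries
    for is_decoy in is_decoy_ranked:
        prefix.append(prefix[-1] + int(is_decoy))
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        decoys = prefix[hi] - prefix[lo]
        out.append(min(1.0, 2.0 * decoys / (hi - lo)))
    return out


# toy list: 90 clean targets, then targets and decoys alternating
ranked = [False] * 90 + [True, False] * 5
print(global_fdr(ranked)[-1], local_fdr(ranked, window=10)[-1])
```

In this toy list the lowest-ranked IDs have a global FDR of only 10% but a local FDR of 100%: the list as a whole looks acceptable, yet the IDs at its bottom are very likely false, which is the distinction drawn above.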
For statistical analysis of quantitative iTRAQ experiments, we have modified and implemented the MATLAB program WHATraq (Workflow for Hierarchical Analysis of iTRAQ datasets), as published in "A hierarchical statistical modeling approach to analyze proteomic isobaric tag for relative and absolute quantitation data" (Zhou C, Walker MJ, Williamson AJ, Pierce A, Berzuini C, Dive C, Whetton AD. Bioinformatics. 2014 Feb 15;30(4):549-58. doi:10.1093/bioinformatics/btt722).
To the original WHATraq analysis, we have added Local FDR calculations (q-values calculated from p-values) for the quantitative aspects of the iTRAQ experiment, based on Storey JD and Tibshirani R (2003), "Statistical significance for genome-wide studies", PNAS 100: 9440-9445.
The q-value is similar to the well-known p-value, except that it measures significance in terms of the False Discovery Rate (FDR) rather than the false positive rate. FDR is the most generally accepted multiple-testing correction for genomic and proteomic data, where hundreds to thousands of hypotheses are tested simultaneously. The Local FDR < 0.05 that we use as a significance threshold is more conservative (fewer positives called) than the easier-to-calculate Global FDR < 0.01, but not overly conservative (too few positives called) like the Bonferroni and other family-wise multiple-testing corrections.
Importantly, the Local FDR also estimates the likelihood that a particular protein is a false discovery, unlike the Global FDR, which estimates only the overall proportion of false positives in an entire dataset above a certain score.
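The q-value step described above can be sketched following Storey and Tibshirani (2003): estimate the proportion of true null hypotheses, pi0, then take q(p_(i)) = min over j ≥ i of pi0 · m · p_(j) / j for the sorted p-values. This minimal sketch estimates pi0 at a single tuning value lambda = 0.5, whereas the paper also describes smoothing over many lambda values; it is not the facility's modified WHATraq code, and the function names are hypothetical. With pi0 = 1 it reduces to Benjamini-Hochberg adjusted p-values.

```python
"""Sketch of Storey & Tibshirani (2003) q-values from a list of p-values.

pi0 is estimated at a single lambda (0.5); the paper's smoother over many
lambdas is omitted. With few p-values the estimate is unstable (it can even
be 0), so pass pi0=1.0 for small lists.
"""


def estimate_pi0(pvals, lam=0.5):
    """pi0 = #{p > lambda} / (m * (1 - lambda)), capped at 1."""
    m = len(pvals)
    return min(1.0, sum(p > lam for p in pvals) / (m * (1.0 - lam)))


def qvalues(pvals, pi0=None):
    """Return q-values in the original order of `pvals`."""
    m = len(pvals)
    if pi0 is None:
        pi0 = estimate_pi0(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down, enforcing monotonic q-values
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvals[i] / rank)
        q[i] = running_min
    return q


print(qvalues([0.01, 0.04, 0.03, 0.5], pi0=1.0))
```

Proteins whose q-value falls below the chosen threshold (0.05 in the text above) would be called significantly changed.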
For Mascot searches, the parameters used are 50-100 ppm mass error tolerance for MS spectra, 0.4 Da MS/MS error tolerance, no missed cleavages, fixed modifications of carbamidomethylation (plus iTRAQ (lysine) and iTRAQ (N-terminus) for iTRAQ experiments), and variable modifications of methionine oxidation and deamidation. Individual peptides must be identified with an Ion Score Confidence Interval % of at least 90% to contribute to protein identification and quantitation, and protein identifications must have a Total Ion Score Confidence Interval % of at least 95% to be considered significant.
Mascot IDs should never be accepted at final score cutoffs low enough to produce a global false discovery rate of more than 5%; more stringent cutoffs are more commonly used to keep the global false discovery rate below 1-2%. The global FDR is estimated from 2x the number of identifications from the reversed decoy portion of the concatenated database at any given score cutoff.
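The 50-100 ppm MS tolerance listed for Mascot searches is relative, so its absolute width grows with mass, whereas the 0.4 Da MS/MS tolerance is fixed. A minimal sketch of the conversion (the function name is illustrative and the masses are arbitrary examples):

```python
"""Convert a ppm precursor tolerance into an absolute half-window in Da."""


def ppm_window_da(mass, ppm):
    """Half-width of the tolerance window, in Da, at the given mass."""
    return mass * ppm * 1e-6


for mass in (1000.0, 2000.0, 3000.0):
    lo, hi = ppm_window_da(mass, 50), ppm_window_da(mass, 100)
    print(f"{mass:7.1f} Da: ±{lo:.3f} to ±{hi:.3f} Da")
```

At 2000 Da, for instance, 50-100 ppm corresponds to ±0.1 to ±0.2 Da.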