MS²PIP Server

MS² Peak Intensity Prediction

MS²PIP is a tool to predict MS² signal peak intensities from peptide sequences. It employs the XGBoost machine learning algorithm and is written in Python.

You can install MS²PIP on your machine by following the extended installation instructions in the MS²PIP GitHub repository. For a more user-friendly experience, we created this web server. Below, you can upload a list of peptide sequences, after which the corresponding predicted MS² spectra can be downloaded in multiple file formats.

More advanced users can also access MS²PIP Server through our RESTful API. We provide Swagger-generated API documentation and an example Python script to contact the API.

If you use MS²PIP for your research, please cite the following papers:

  • Gabriels, R., Martens, L., & Degroeve, S. (2019). Updated MS²PIP web server delivers fast and accurate MS² peak intensity prediction for multiple fragmentation methods, instruments and labeling techniques. Nucleic Acids Research. https://doi.org/10.1093/nar/gkz299
  • Degroeve, S., Maddelein, D., & Martens, L. (2015). MS²PIP prediction server: compute and visualize MS2 peak intensity predictions for CID and HCD fragmentation. Nucleic Acids Research, 43(W1), W326–W330. https://doi.org/10.1093/nar/gkv542
  • Degroeve, S., & Martens, L. (2013). MS²PIP: a tool for MS/MS peak intensity prediction. Bioinformatics (Oxford, England), 29(24), 3199–203. https://doi.org/10.1093/bioinformatics/btt544

How to

PEPREC file

MS²PIP takes a PEPREC (Peptide Record) file as input. This is a space, tab, colon, or semicolon-separated file that lists all peptides. To keep the server running smoothly, we limit the number of peptides per file to 100 000. If you need to predict spectra for more peptides, we recommend splitting your dataset into multiple batches, or downloading MS²PIP from GitHub and running it locally.
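Splitting a large dataset into batches can be scripted. The following is a minimal sketch, assuming a PEPREC file with a single header line and one peptide per line; the function name `split_peprec` and the `_batchN` file naming are hypothetical and not part of MS²PIP.

```python
from pathlib import Path

MAX_PEPTIDES = 100_000  # server limit quoted above


def split_peprec(path, batch_size=MAX_PEPTIDES):
    """Write batch files next to `path` and return the list of files created.

    Hypothetical helper: every batch repeats the header line and holds at
    most `batch_size` peptide rows.
    """
    path = Path(path)
    with path.open(encoding="utf-8") as handle:
        header = handle.readline()
        rows = [line for line in handle if line.strip()]  # skip blank lines

    batch_paths = []
    for start in range(0, len(rows), batch_size):
        number = start // batch_size + 1
        batch_path = path.with_name(f"{path.stem}_batch{number}{path.suffix}")
        with batch_path.open("w", encoding="utf-8") as out:
            out.write(header)
            out.writelines(rows[start:start + batch_size])
        batch_paths.append(batch_path)
    return batch_paths
```

Calling `split_peprec("my_peptides.peprec")` on a file with 250 000 peptides would produce three files, each of which can be uploaded separately.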

A PEPREC file contains the following columns:

  • spec_id: A unique ID for the peptide.
  • peptide: Peptide sequence.
  • modifications: PTMs for the given peptide. Every modification is listed as location|name, with a pipe (|) separating the location from the name and each PTM from the next. The location is an integer, counted starting at 1 for the first amino acid; 0 is reserved for N-terminal modifications. The name has to correspond to a preset or custom PTM (see below). Unmodified peptides are marked with a hyphen (-).
  • charge: Precursor charge of the peptide.

Optionally, a protein_list column can be provided with proteins formatted as "['example_protein_1', 'example_protein_2']". The provided proteins will then be written in the Comment field of the MSP file.
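To make the modifications notation concrete, here is a minimal sketch of how such a field could be parsed into (location, name) pairs. The function `parse_modifications` is a hypothetical helper written for illustration, not MS²PIP's own parser.

```python
def parse_modifications(field):
    """Return a list of (location, name) tuples from a PEPREC modifications field.

    Hypothetical helper: "-" means unmodified; otherwise the field alternates
    location and name, all separated by pipes.
    """
    if field == "-":
        return []
    parts = field.split("|")
    if len(parts) % 2 != 0:
        raise ValueError(f"Expected location|name pairs, got: {field!r}")
    return [(int(parts[i]), parts[i + 1]) for i in range(0, len(parts), 2)]


print(parse_modifications("-"))
# []
print(parse_modifications("0|iTRAQ|10|Oxidation"))
# [(0, 'iTRAQ'), (10, 'Oxidation')]
```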

Example of a PEPREC file:

spec_id modifications peptide charge
peptide1 - ACDE 2
peptide2 2|Carbamidomethyl ACDEFGHI 3
peptide3 0|iTRAQ|10|Oxidation ACDEFGHIKMNPQ 2

Allowed filename extensions for the PEPREC file are: .peprec, .csv, .tsv and .txt.

PEPREC files can be created in, for example, Excel by exporting the table to a .csv file. On the MS²PIP GitHub repository, we also provide a host of Python scripts to convert common search engine output files to a PEPREC file.
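A PEPREC file can also be written directly from Python. The sketch below uses only the standard library and reproduces the space-separated example shown above; the function name `write_peprec` and the output filename are assumptions made for illustration.

```python
import csv

COLUMNS = ["spec_id", "modifications", "peptide", "charge"]

peptides = [
    {"spec_id": "peptide1", "modifications": "-", "peptide": "ACDE", "charge": 2},
    {"spec_id": "peptide2", "modifications": "2|Carbamidomethyl", "peptide": "ACDEFGHI", "charge": 3},
    {"spec_id": "peptide3", "modifications": "0|iTRAQ|10|Oxidation", "peptide": "ACDEFGHIKMNPQ", "charge": 2},
]


def write_peprec(rows, path):
    """Write rows (dicts keyed by COLUMNS) as a space-separated PEPREC file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=COLUMNS, delimiter=" ", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    write_peprec(peptides, "example.peprec")  # hypothetical output filename
```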

Modifications

A list of all modifications and the corresponding mass shifts is required for MS²PIP to properly calculate the fragmentation peak m/z values. Even though we provide specialized models for certain modifications, the specific modification info you provide here and in the PEPREC file does not influence the predicted peak intensities. It is only used to calculate m/z values.
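As an illustration of why the mass shifts matter, the following sketch computes singly charged b- and y-ion m/z values and shows how a modification shifts only the fragments that contain the modified residue. It is not MS²PIP's implementation: the function `fragment_mz` is hypothetical, and the table holds standard monoisotopic residue masses (rounded to five decimals) for only the four amino acids used in the example.

```python
# Standard monoisotopic residue masses in Da; only the residues needed here.
RESIDUE_MASS = {"A": 71.03711, "C": 103.00919, "D": 115.02694, "E": 129.04259}
PROTON = 1.007276
WATER = 18.010565


def fragment_mz(peptide, mass_shifts=None):
    """Return (b_ions, y_ions) m/z lists for singly charged fragments.

    mass_shifts maps a location (1-based residue position, 0 for the
    N-terminus) to a mass shift in Da. y_ions[i] is the complementary
    fragment of b_ions[i].
    """
    mass_shifts = mass_shifts or {}
    masses = [
        RESIDUE_MASS[aa] + mass_shifts.get(position, 0.0)
        for position, aa in enumerate(peptide, start=1)
    ]
    masses[0] += mass_shifts.get(0, 0.0)  # N-terminal shift travels with residue 1

    b_ions, y_ions = [], []
    for cut in range(1, len(peptide)):
        b_ions.append(sum(masses[:cut]) + PROTON)
        y_ions.append(sum(masses[cut:]) + WATER + PROTON)
    return b_ions, y_ions


# Carbamidomethyl (57.021464 Da) on the cysteine at position 2 of ACDE.
b_plain, y_plain = fragment_mz("ACDE")
b_mod, y_mod = fragment_mz("ACDE", {2: 57.021464})
for name, plain, mod in [("b", b_plain, b_mod), ("y", y_plain, y_mod)]:
    for index, (p, m) in enumerate(zip(plain, mod)):
        print(f"{name} fragment {index}: {p:.4f} -> {m:.4f}")
```

In this example b1 is unchanged, while b2, b3 and the y fragment containing the cysteine shift by the mass of the modification.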

You can select some preset modifications below or provide your own list. For the preset modifications, we use the PSI-MS names and monoisotopic mass shifts from Unimod. This means that, if you use these preset modifications, the modification names in your PEPREC file need to match the Unimod PSI-MS names. If MS²PIP encounters a modification in the PEPREC file that is not provided in the modifications list, it will skip that peptide.

If you provide your own list of modifications, each line can only contain one modification, with the following comma-separated properties:

  • Modification name, as used in the PEPREC file
  • Monoisotopic mass shift
  • Amino acid one-letter code, N-term or C-term

If a certain modification occurs on different amino acids, every modification-amino acid combination should have its own entry and a unique name (e.g. PhosphoS, PhosphoT and PhosphoY, or TMT6plex and TMT6plexN). N- and C-terminal modifications can be added in the same way, but require N-term or C-term instead of an amino acid code.

Example of a custom modification list:

Oxidation,15.994915,M
Carbamidomethyl,57.021464,C
PhosphoS,79.966331,S
PhosphoT,79.966331,T
PhosphoY,79.966331,Y
iTRAQ,144.102063,N-term
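Because peptides with an undefined modification are skipped, it can be useful to check a PEPREC file against the modification list before uploading. The sketch below parses a custom list in the format described above and reports names that are missing from it; both function names are hypothetical and this is not MS²PIP's own validation logic.

```python
def parse_modification_list(text):
    """Map modification name -> (mass shift, target) from comma-separated lines."""
    modifications = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, mass_shift, target = (part.strip() for part in line.split(","))
        if name in modifications:
            # Each modification-amino acid combination needs a unique name.
            raise ValueError(f"Duplicate modification name: {name}")
        modifications[name] = (float(mass_shift), target)
    return modifications


def unknown_modifications(peprec_field, modifications):
    """Return names used in a PEPREC modifications field that are not defined."""
    if peprec_field == "-":
        return []
    names = peprec_field.split("|")[1::2]  # every second item is a name
    return [name for name in names if name not in modifications]


custom_list = """\
Oxidation,15.994915,M
Carbamidomethyl,57.021464,C
iTRAQ,144.102063,N-term
"""
mods = parse_modification_list(custom_list)
print(unknown_modifications("0|iTRAQ|10|Oxidation", mods))
# []
print(unknown_modifications("2|Carbamidomethyl|5|Acetyl", mods))
# ['Acetyl']
```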

Models

MS²PIP currently supports the models listed in the table below. Always take note of the MS²PIP version and model versions you use and mention these in your publications. The current MS²PIP version is v20190312.

| Model | Current model version | Train-test dataset (unique peptides) | Evaluation dataset (unique peptides) | Median Pearson correlation on evaluation dataset |
|---|---|---|---|---|
| HCD2019 | v20190107 | MassIVE-KB (1 623 712) | PXD008034 (35 269) | 0.903786 |
| HCD2021 | v20210416 | [Combined dataset] (520 579) | PXD008034 (35 269) | 0.932361 |
| CID | v20190107 | NIST CID Human (340 356) | NIST CID Yeast (92 609) | 0.904947 |
| iTRAQ | v20190107 | NIST iTRAQ (704 041) | PXD001189 (41 502) | 0.905870 |
| iTRAQphospho | v20190107 | NIST iTRAQ phospho (183 383) | PXD001189 (9 088) | 0.843898 |
| TMT | v20190107 | Peng Lab TMT Spectral Library (1 185 547) | PXD009495 (36 137) | 0.950460 |
| TTOF5600 | v20190107 | PXD000954 (215 713) | PXD001587 (15 111) | 0.746823 |
| HCDch2 (including b++ and y++ ions) | v20190107 | MassIVE-KB (1 623 712) | PXD008034 (35 269) | 0.903786 (+) and 0.644162 (++) |
| CIDch2 (including b++ and y++ ions) | v20190107 | NIST CID Human (340 356) | NIST CID Yeast (92 609) | 0.904947 (+) and 0.813342 (++) |
| Immuno-HCD | v20210316 | [Combined dataset] (460 191) | PXD005231 (HLA-I) (46 753); PXD020011 (HLA-II) (23 941) | 0.963736; 0.942383 |
| CID-TMT | v20220104 | [in-house dataset] (72 138) | PXD005890 (69 768) | 0.851085 |

In the following table, we list all MS² acquisition information and peptide properties for the different models. For optimal results, your experimental data should match the properties of the MS²PIP model. For instance, most MS²PIP models were trained exclusively on tryptic peptides. As the C-terminal lysine and arginine heavily influence MS² fragmentation, these models are not intended to make predictions for non-tryptic peptides.

For more specific information on the experimental settings, please refer to the publications of the training datasets (links are provided in the table above).

| Model | Fragmentation method | MS² mass analyzer | Peptide properties |
|---|---|---|---|
| HCD2019 | HCD | Orbitrap | Tryptic digest |
| HCD2021 | HCD | Orbitrap | Tryptic/Chymotrypsin digest |
| CID | CID | Linear ion trap | Tryptic digest |
| iTRAQ | HCD | Orbitrap | Tryptic digest, iTRAQ-labeled |
| iTRAQphospho | HCD | Orbitrap | Tryptic digest, iTRAQ-labeled, enriched for phosphorylation |
| TMT | HCD | Orbitrap | Tryptic digest, TMT-labeled |
| TTOF5600 | CID | Quadrupole Time-of-Flight | Tryptic digest |
| HCDch2 (including b++ and y++ ions) | HCD | Orbitrap | Tryptic digest |
| CIDch2 (including b++ and y++ ions) | CID | Linear ion trap | Tryptic digest |
| Immuno-HCD | HCD | Orbitrap | Immunopeptides |
| CID-TMT | CID | Linear ion trap | Tryptic digest, TMT-labeled |

Results

MS²PIP predictions can be downloaded in CSV, MGF, MSP and BiblioSpec / Skyline (SSL and MS2) file formats. Predicted intensities are normalized to the total ion current (the sum of all intensities): they add up to 1 in the CSV file and to 10 000 in the MGF, MSP and MS2 files. On the download page, we also provide an interactive visualization of the predicted spectra.
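The normalization described above amounts to dividing every intensity by the sum of all intensities and scaling to the target total. A minimal sketch, with `normalize_to_tic` as a hypothetical helper name:

```python
def normalize_to_tic(intensities, total=1.0):
    """Scale intensities so that they sum to `total`.

    total=1 matches the CSV output; total=10_000 matches MGF, MSP and MS2.
    """
    tic = sum(intensities)  # total ion current
    if tic <= 0:
        raise ValueError("Total ion current must be positive")
    return [value * total / tic for value in intensities]


print(normalize_to_tic([2.0, 3.0, 5.0]))
# [0.2, 0.3, 0.5]
print(normalize_to_tic([2.0, 3.0, 5.0], total=10_000))
# [2000.0, 3000.0, 5000.0]
```

The same function can be used to convert between the two scales, for example when comparing intensities from a CSV download with those in an MGF file.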

Run MS²PIP

Select preset modifications

Upload your PEPREC file or use the example file

Select the desired output formats

Contact

If you have any questions, feedback or suggestions, please contact one of the following people: