MS²PIP Server

MS² Peak Intensity Prediction

MS²PIP is a tool to predict MS2 signal peak intensities from peptide sequences. It employs the XGBoost machine learning algorithm and is written in Python.

Below, you can easily upload a list of peptide sequences or a protein FASTA file, after which the corresponding predicted MS2 spectra can be downloaded in multiple file formats. We also provide ready-to-download proteome-wide spectral libraries for various model organisms, ideal for DIA or DDA spectral library searching.

More advanced users can also access MS²PIP Server through our RESTful API. We provide Swagger-generated API documentation and an example Python script to contact the API.

For more customizability, MS²PIP can be installed locally. Check out the MS²PIP GitHub repository for more information.

If you use MS²PIP for your research, please cite the following publication:

  • Arthur Declercq, Robbin Bouwmeester, Cristina Chiva, Eduard Sabidó, Aurélie Hirschler, Christine Carapito, Lennart Martens, Sven Degroeve & Ralf Gabriels (2023). Updated MS²PIP web server supports cutting-edge proteomics applications. Nucleic Acids Research doi:10.1093/nar/gkad335

Prior MS²PIP publications:

  • Ralf Gabriels, Lennart Martens, & Sven Degroeve (2019). Updated MS²PIP web server delivers fast and accurate MS² peak intensity prediction for multiple fragmentation methods, instruments and labeling techniques. Nucleic Acids Research, 47(W1), W295–W299. doi:10.1093/nar/gkz299
  • Sven Degroeve, Davy Maddelein, & Lennart Martens (2015). MS²PIP prediction server: compute and visualize MS2 peak intensity predictions for CID and HCD fragmentation. Nucleic Acids Research, 43(W1), W326–W330. doi:10.1093/nar/gkv542
  • Sven Degroeve, & Lennart Martens (2013). MS²PIP: a tool for MS/MS peak intensity prediction. Bioinformatics (Oxford, England), 29(24), 3199–203. doi:10.1093/bioinformatics/btt544

How to

Input file and settings

From a peptide list
PEPREC file

MS²PIP takes a PEPREC (Peptide Record) file as input. This is a space-, tab-, comma-, or semicolon-separated file that lists all peptides. To keep the server running smoothly, we limit the number of peptides to 500,000. If you need to predict more peptide spectra, we recommend splitting your dataset into multiple batches, or downloading MS²PIP from GitHub and running it locally.
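If your peptide list exceeds the 500,000-peptide limit, it can be split into batches that each repeat the header line. Below is a minimal standard-library sketch, assuming the first line of the file is the header and every other non-empty line is one peptide. The function names and the `<stem>_batch<N>` output naming are hypothetical and not part of MS²PIP.

```python
from pathlib import Path

MAX_PEPTIDES = 500_000  # peptide limit of the web server


def batch_peprec_lines(lines, batch_size=MAX_PEPTIDES):
    """Yield batches of PEPREC lines; every batch starts with the header line."""
    header, *rows = lines
    rows = [row for row in rows if row.strip()]  # drop empty lines
    for start in range(0, len(rows), batch_size):
        yield [header, *rows[start:start + batch_size]]


def split_peprec(path, batch_size=MAX_PEPTIDES):
    """Write <stem>_batch<N><suffix> files next to `path` and return their paths."""
    path = Path(path)
    lines = path.read_text().splitlines()
    written = []
    for n, batch in enumerate(batch_peprec_lines(lines, batch_size), start=1):
        out_path = path.with_name(f"{path.stem}_batch{n}{path.suffix}")
        out_path.write_text("\n".join(batch) + "\n")
        written.append(out_path)
    return written
```

For example, `split_peprec("my_peptides.peprec")` would write `my_peptides_batch1.peprec`, `my_peptides_batch2.peprec`, and so on, each of which can be submitted as a separate job.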

A PEPREC file contains the following columns:

  • spec_id: A unique ID for the peptide.
  • peptide: Peptide sequence.
  • modifications: PTMs for the given peptide. Each modification is written as location|name, and multiple modifications are concatenated with the same pipe (|) separator (e.g., 0|iTRAQ|10|Oxidation). The location is an integer counted from 1 for the first amino acid; 0 is reserved for N-terminal modifications. The name has to correspond to a preset or custom PTM (see below). Unmodified peptides are marked with a hyphen (-).
  • charge: Precursor charge of the peptide.
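Because the modifications field alternates locations and names in a single pipe-separated string, it is easiest to handle as a list of (location, name) pairs. A minimal sketch of converting between the two forms; `parse_modifications` and `format_modifications` are hypothetical helpers, not functions from MS²PIP.

```python
def parse_modifications(field):
    """Parse a PEPREC modifications field, e.g. '0|iTRAQ|10|Oxidation'.

    Returns a list of (location, name) tuples; '-' means unmodified.
    """
    if field == "-":
        return []
    parts = field.split("|")
    if len(parts) % 2 != 0:
        raise ValueError(f"expected location|name pairs, got {field!r}")
    # Even-indexed items are locations, odd-indexed items are names
    return [(int(loc), name) for loc, name in zip(parts[0::2], parts[1::2])]


def format_modifications(mods):
    """Inverse of parse_modifications: [(2, 'Carbamidomethyl')] -> '2|Carbamidomethyl'."""
    if not mods:
        return "-"
    return "|".join(f"{loc}|{name}" for loc, name in mods)
```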

Optionally, a protein_list column can be provided with proteins formatted as "['example_protein_1', 'example_protein_2']". The provided proteins will then be written in the Comment field of the MSP file.

Example of a PEPREC file:

spec_id modifications peptide charge
peptide1 - ACDE 2
peptide2 2|Carbamidomethyl ACDEFGHI 3
peptide3 0|iTRAQ|10|Oxidation ACDEFGHIKMNPQ 2
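Since PEPREC files may use different delimiters, a reader first has to determine which one a file uses. A minimal standard-library sketch that reads text like the example above into one dict per peptide. The delimiter heuristic (tab, then semicolon, then comma, otherwise any whitespace) is our assumption, not MS²PIP's own parser, and it does not handle values that themselves contain the delimiter, such as a protein_list entry.

```python
def detect_delimiter(header):
    """Return tab, semicolon or comma if present in the header, else None."""
    for delimiter in ("\t", ";", ","):
        if delimiter in header:
            return delimiter
    return None  # str.split(None) splits on runs of whitespace


def read_peprec(text):
    """Parse PEPREC text into a list of dicts, one per peptide."""
    lines = [line for line in text.splitlines() if line.strip()]
    delimiter = detect_delimiter(lines[0])
    columns = [column.strip() for column in lines[0].split(delimiter)]
    records = []
    for n, line in enumerate(lines[1:], start=1):
        values = [value.strip() for value in line.split(delimiter)]
        if len(values) != len(columns):
            raise ValueError(f"row {n}: expected {len(columns)} fields")
        record = dict(zip(columns, values))
        record["charge"] = int(record["charge"])  # precursor charge as integer
        records.append(record)
    return records
```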

Allowed filename extensions for the PEPREC file are: .peprec, .csv, .tsv and .txt.

PEPREC files can be created, for example, from Excel by exporting the table to a .csv file. We recommend using the psm_utils Python package to convert various search engine output files to PEPREC files.

Modifications

A list of all modifications and their mass shifts is required for MS²PIP to correctly calculate the fragment ion m/z values. Even though we provide specialized models for certain modifications, the modification information you provide here and in the PEPREC file does not influence the predicted peak intensities; it is only used to calculate m/z values.
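To illustrate how a mass shift only moves peaks along the m/z axis, the sketch below computes singly charged b and y ion m/z values from standard monoisotopic residue masses, adding each modification's mass shift at its PEPREC location. It is a simplified illustration, not MS²PIP's implementation: `fragment_mz` is a hypothetical helper, only 1+ b/y ions are covered, and C-terminal modifications are ignored.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # H2O; y ions carry the peptide C-terminus
PROTON = 1.007276  # charge carrier for singly charged ions


def fragment_mz(peptide, mods=None):
    """Return singly charged (b_ions, y_ions) m/z lists.

    `mods` maps a PEPREC location to a mass shift in Da: 1..len(peptide)
    for residues, 0 for the N-terminus.
    """
    mods = mods or {}
    masses = [RESIDUE_MASS[aa] + mods.get(i, 0.0)
              for i, aa in enumerate(peptide, start=1)]
    masses[0] += mods.get(0, 0.0)  # N-terminal shift goes on the first residue
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, len(peptide))]
    y_ions = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(peptide))]
    return b_ions, y_ions
```

For peptide2 of the example PEPREC file, `fragment_mz("ACDEFGHI", {2: 57.021464})` shifts every b ion from b2 onward and only the y ion that contains the cysteine (y7); all other fragments keep their unmodified m/z.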

You can select some preset modifications below or provide your own list. For the preset modifications, we use the PSI-MS names and monoisotopic mass shifts from Unimod. This means that, if you use these preset modifications, the modification names in your PEPREC file need to match the Unimod PSI-MS names. If MS²PIP encounters a modification in the PEPREC file that is not provided in the modifications list, it will skip that peptide.

If you provide your own list of modifications, each line can only contain one modification, with the following comma-separated properties:

  • Modification name, as used in the PEPREC file
  • Monoisotopic mass shift
  • Amino acid one-letter code, N-term or C-term

If a certain modification occurs on different amino acids, every modification-amino acid combination should have its own entry with a unique name (e.g., PhosphoS, PhosphoT and PhosphoY, or TMT6plex and TMT6plexN). N- and C-terminal modifications can be added in the same way, but require N-term or C-term instead of an amino acid code.

Example of a custom modification list:

Oxidation,15.994915,M
Carbamidomethyl,57.021464,C
PhosphoS,79.966331,S
PhosphoT,79.966331,T
PhosphoY,79.966331,Y
iTRAQ,144.102063,N-term
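The rules above (one modification per line, three comma-separated fields, unique names, and an amino acid code or N-term/C-term as site) can be checked before uploading. A minimal sketch of such a validator; `parse_modification_list` is hypothetical, and restricting sites to the 20 standard one-letter codes is our assumption.

```python
TERMINI = {"N-term", "C-term"}
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def parse_modification_list(text):
    """Parse 'name,mass_shift,site' lines into {name: (mass_shift, site)}."""
    modifications = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 comma-separated fields")
        name, mass_shift, site = fields
        if site not in STANDARD_AA and site not in TERMINI:
            raise ValueError(f"line {line_no}: invalid site {site!r}")
        if name in modifications:
            # Each modification-amino acid combination needs a unique name
            raise ValueError(f"line {line_no}: duplicate name {name!r}")
        modifications[name] = (float(mass_shift), site)
    return modifications
```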

From a protein FASTA file

MS²PIP and DeepLC (for retention time prediction) can also generate a spectral library from a protein FASTA file. Protein entries are first digested in silico into peptides; then, for each combination of precursor charge state and modifications, the fragmentation spectrum is predicted. As for a normal peptide list input, the modifications are only considered for the m/z values, not for the predicted peak intensities.
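The digestion step can be sketched as follows. The server itself uses Pyteomics for digestion (see the Materials and methods text); this plain-Python version swaps in a simplified trypsin rule (cleave after K or R unless followed by P), combines adjacent fragments to allow missed cleavages, and filters by peptide length. The function name is hypothetical, and the defaults mirror the library generation settings listed further down this page.

```python
import re


def tryptic_digest(protein, missed_cleavages=2, min_len=8, max_len=30):
    """Return the set of tryptic peptides of `protein`.

    Simplified rule (an assumption): cleave after K or R unless followed by P.
    """
    # Split at zero-width positions right after K/R that are not followed by P
    fragments = [fragment for fragment in re.split(r"(?<=[KR])(?!P)", protein) if fragment]
    peptides = set()
    for start in range(len(fragments)):
        # Join up to missed_cleavages + 1 consecutive fragments
        for end in range(start, min(start + missed_cleavages + 1, len(fragments))):
            peptide = "".join(fragments[start:end + 1])
            if min_len <= len(peptide) <= max_len:
                peptides.add(peptide)
    return peptides
```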

While some peptide 'search space' parameters can be configured on this web server, the available options are restricted, so we recommend a local MS²PIP installation for more flexibility. These restrictions mainly prevent server overload caused by parameter settings that accidentally lead to a combinatorial explosion of the peptide 'search space'. The local version offers more options for cleavage agents, peptide lengths, precursor charge states, custom modifications, output file formats, etc.
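To see how quickly the search space grows: if each site that can carry a variable modification is independently modified or not, a peptide with n such sites has 2^n modification variants, and every variant is predicted once per precursor charge state. The sketch below enumerates and counts these combinations under that simplifying assumption; the helper names are hypothetical and not MS²PIP's enumeration logic.

```python
from itertools import product


def modified_variants(peptide, variable_mods):
    """Yield PEPREC modifications strings for all variable-mod combinations.

    `variable_mods` maps a one-letter amino acid to a modification name,
    e.g. {"M": "Oxidation"}; each matching residue is modified or not.
    """
    sites = [(i, variable_mods[aa])
             for i, aa in enumerate(peptide, start=1) if aa in variable_mods]
    for choice in product((False, True), repeat=len(sites)):
        mods = [f"{loc}|{name}" for (loc, name), on in zip(sites, choice) if on]
        yield "|".join(mods) if mods else "-"


def count_spectra(peptides, variable_mods, charges):
    """Total spectra to predict: 2**(modifiable sites) variants x charge states."""
    return sum(
        len(charges) * 2 ** sum(aa in variable_mods for aa in peptide)
        for peptide in peptides
    )
```

Each additional modifiable site doubles the number of variants, so a few extra modification types or charge states can multiply the size of a proteome-wide library many times over.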

Prediction models

MS²PIP contains prediction models for various fragmentation methods, instruments, and peptide modifications. The following table lists the MS2 acquisition settings and peptide properties for each model. For optimal results, your experimental data should match the properties of the selected MS²PIP model. For more specific information on the experimental settings, please refer to the training dataset publications, which are listed on the MS²PIP GitHub README page.

Always take note of the MS²PIP version and model you use and mention these in your publications. The current online MS²PIP version is v3.11.0.

| Model | Fragmentation method | MS2 mass analyzer | Peptide properties |
|---|---|---|---|
| HCD (2021) | HCD | Orbitrap | Tryptic/chymotryptic digest |
| CID | CID | Linear ion trap | Tryptic digest |
| iTRAQ | HCD | Orbitrap | Tryptic digest, iTRAQ-labeled |
| iTRAQphospho | HCD | Orbitrap | Tryptic digest, iTRAQ-labeled, enriched for phosphorylation |
| TMT | HCD | Orbitrap | Tryptic digest, TMT-labeled |
| TTOF5600 | CID | Quadrupole time-of-flight | Tryptic digest |
| HCDch2 (including b++ and y++ ions) | HCD | Orbitrap | Tryptic digest |
| CIDch2 (including b++ and y++ ions) | CID | Linear ion trap | Tryptic digest |
| Immuno-HCD | HCD | Orbitrap | Immunopeptides (HLA class I and class II) |
| CID-TMT | CID | Linear ion trap | Tryptic digest, TMT-labeled |

Results

MS²PIP predictions can be downloaded in CSV, MGF, MSP, and BiblioSpec / Skyline (SSL and MS2) file formats. Predicted intensities are normalized to the total ion current (the sum of all intensities): they add up to 1 in the CSV file and to 10,000 in the MGF, MSP, and MS2 files. The download page also provides an interactive visualization of the predicted spectra.
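Total ion current (TIC) normalization simply divides every intensity by the sum of all intensities and multiplies by the target total. A minimal sketch, useful for example when comparing CSV output (total 1) with MSP or MGF output (total 10,000); `normalize_tic` is a hypothetical helper.

```python
def normalize_tic(intensities, total=1.0):
    """Scale intensities so that they sum to `total`.

    total=1.0 corresponds to the CSV output; total=10_000 to the MGF,
    MSP and MS2 outputs.
    """
    tic = sum(intensities)
    if tic == 0:
        raise ValueError("cannot normalize an all-zero spectrum")
    return [value * total / tic for value in intensities]
```

Because both scales are TIC-normalized, converting CSV intensities to the MSP scale amounts to multiplying each value by 10,000.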

Get predictions


Note that full-proteome prediction can take an hour or more, depending on the peptide 'search space' parameters. While MS²PIP is running, you can close your browser and later return to the results page using the link provided on the next page.

Pregenerated spectral libraries for common model organisms can be downloaded here. These libraries are available in multiple formats and are compatible with most DIA and DDA spectral library search engines. The libraries are updated whenever new prediction models become available, and at least yearly with new UniProt Proteome versions. Older versions remain available for download here.
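Once downloaded, an MSP library can be inspected with a few lines of Python. The sketch below assumes the common NIST-style MSP layout: each entry starts with a `Name:` line, header lines have the form `Key: value`, and each peak line starts with the m/z and intensity separated by whitespace, optionally followed by an annotation. `read_msp` is a hypothetical helper that keeps all spectra in memory; for very large libraries a generator would be preferable.

```python
def read_msp(lines):
    """Parse NIST-style MSP lines into a list of spectrum dicts.

    Each dict holds the header fields (e.g. 'Name', 'Num peaks') and a
    'peaks' list of (mz, intensity) tuples; extra peak columns are ignored.
    """
    spectra = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Name:"):
            current = {"peaks": []}  # a Name line starts a new entry
            spectra.append(current)
        if current is None:
            continue  # skip anything before the first entry
        if line[0].isdigit():
            mz, intensity = line.split()[:2]
            current["peaks"].append((float(mz), float(intensity)))
        else:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return spectra
```

A file handle can be passed directly, e.g. `with open("library.msp") as handle: spectra = read_msp(handle)`.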

Human
Homo sapiens

Protein sequences downloaded from UniProt Proteomes on 13/02/2023. Contains 20594 protein entries.

Downloads: MSP (DIA-NN compatible), BiblioSpec / Skyline (SSL, MS2)

Mouse-ear cress
Arabidopsis thaliana

Protein sequences downloaded from UniProt Proteomes on 13/02/2023. Contains 27498 protein entries.

Downloads: MSP (DIA-NN compatible), BiblioSpec / Skyline (SSL, MS2)

Cattle
Bos taurus

Protein sequences downloaded from UniProt Proteomes on 12/12/2022. Contains 23844 protein entries.

Downloads: MSP (DIA-NN compatible), BiblioSpec / Skyline (SSL, MS2)

C. elegans
Caenorhabditis elegans

Protein sequences downloaded from UniProt Proteomes on 12/12/2022. Contains 19838 protein entries.

Downloads: MSP (DIA-NN compatible), BiblioSpec / Skyline (SSL, MS2)

Wolf
Canis lupus

Protein sequences downloaded from UniProt Proteomes on 12/12/2022. Contains 23844 protein entries.

Downloads: MSP (DIA-NN compatible), BiblioSpec / Skyline (SSL, MS2)

Zebrafish
Danio rerio

Protein sequences downloaded from UniProt Proteomes on 12/12/2022. Contains 20358 protein entries.

Downloads: MSP (DIA-NN compatible), BiblioSpec / Skyline (SSL, MS2)

Fruit fly
Drosophila melanogaster

Protein sequences downloaded from UniProt Proteomes on 12/12/2022. Contains 13821 protein entries.

Downloads: MSP (DIA-NN compatible), BiblioSpec / Skyline (SSL, MS2)

E. coli
Escherichia coli

Protein sequences downloaded from UniProt Proteomes on 13/02/2023. Contains 4402 protein entries.

Downloads: MSP (DIA-NN compatible), BiblioSpec / Skyline (SSL, MS2)

Mouse
Mus musculus

Protein sequences downloaded from UniProt Proteomes on 13/02/2023. Contains 21968 protein entries.

Downloads: MSP (DIA-NN compatible), BiblioSpec / Skyline (SSL, MS2)

Rat
Rattus norvegicus

Protein sequences downloaded from UniProt Proteomes on 12/12/2022. Contains 22860 protein entries.

Downloads: MSP (DIA-NN compatible), BiblioSpec / Skyline (SSL, MS2)

Library generation settings

charges: [2, 3]
min_peplen: 8
max_peplen: 30
min_precursor_mz: None
max_precursor_mz: None
cleavage_rule: trypsin
missed_cleavages: 2
modifications: [{'name': 'Acetyl', 'unimod_accession': 1, 'mass_shift': 42.01057, 'protein_n_term': True}, {'name': 'Oxidation', 'unimod_accession': 35, 'mass_shift': 15.9994, 'amino_acid': 'M'}, {'name': 'Carbamidomethyl', 'unimod_accession': 4, 'mass_shift': 57.0513, 'amino_acid': 'C'}]
ms2pip_model: HCD2021
add_retention_time: True

Materials and methods

A prebuilt in silico predicted spectral library was downloaded from the MS²PIP web server (https://iomics.ugent.be/ms2pip/). The library was generated using the following software packages: MS²PIP v3.11.0 for peptide spectrum prediction, DeepLC v1.2.1 for peptide retention time prediction, and Pyteomics v4.5.6 for parsing the FASTA file and applying in silico digestion. The following parameters were used: charges: [2, 3], min_peplen: 8, max_peplen: 30, min_precursor_mz: None, max_precursor_mz: None, cleavage_rule: trypsin, missed_cleavages: 2, modifications: [{'name': 'Acetyl', 'unimod_accession': 1, 'mass_shift': 42.01057, 'protein_n_term': True}, {'name': 'Oxidation', 'unimod_accession': 35, 'mass_shift': 15.9994, 'amino_acid': 'M'}, {'name': 'Carbamidomethyl', 'unimod_accession': 4, 'mass_shift': 57.0513, 'amino_acid': 'C'}], ms2pip_model: HCD2021, add_retention_time: True.

This text is licensed under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. Feel free to use it in your manuscript.

Contact

If you have any questions, feedback or suggestions, please contact one of the following people: