MS²PIP takes a PEPREC (Peptide Record) file as input. This is a space-separated file that lists all peptides. To keep our server running smoothly, we limit the number of peptides to 100,000. If you need to predict more peptide spectra, we recommend splitting your dataset into multiple batches, or downloading MS²PIP from GitHub and running it locally.
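As a rough illustration of the batching suggested above, the Python sketch below splits the lines of a PEPREC file into chunks that stay within the server limit and repeats the header line in every chunk. The names `split_peprec` and `MAX_PEPTIDES` are hypothetical helpers for this example and are not part of MS²PIP.

```python
from typing import List

MAX_PEPTIDES = 100_000  # server limit mentioned in the text


def split_peprec(lines: List[str], batch_size: int = MAX_PEPTIDES) -> List[List[str]]:
    """Split PEPREC lines (header first) into batches of at most batch_size peptides.

    Every batch starts with a copy of the header line so that it can be
    uploaded as a standalone PEPREC file.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    header, rows = lines[0], lines[1:]
    return [
        [header] + rows[start:start + batch_size]
        for start in range(0, len(rows), batch_size)
    ]
```

Each returned batch can then be written to its own file, for example with `"\n".join(batch)`.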
A PEPREC file contains the following columns:
spec_id: A unique ID for the peptide.
peptide: Peptide sequence.
modifications: PTMs for the given peptide. Every modification is listed as location|name, separated by a pipe (|) between the location, the name, and other PTMs. The location is an integer counted starting at 1 for the first AA. 0 is reserved for N-terminal modifications. Name has to correspond to a preset or custom PTM (see below). Unmodified peptides are marked with a hyphen (-).
charge: Precursor charge of the peptide.
Example of a PEPREC file:
spec_id modifications peptide charge
peptide1 - ACDE 2
peptide2 2|Carbamidomethyl ACDEFGHI 3
peptide3 0|iTRAQ|10|Oxidation ACDEFGHIKMNPQ 2
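To make the layout of the example concrete, here is a minimal Python sketch that reads such a file and unpacks the modifications column into (location, name) pairs, following the order used in the example rows (location first, then name). The function names `parse_modifications` and `parse_peprec` are made up for this illustration; MS²PIP's own parser may work differently.

```python
from typing import Dict, List, Tuple


def parse_modifications(field: str) -> List[Tuple[int, str]]:
    """Turn '0|iTRAQ|10|Oxidation' into [(0, 'iTRAQ'), (10, 'Oxidation')]."""
    if field == "-":  # a hyphen marks an unmodified peptide
        return []
    parts = field.split("|")
    if len(parts) % 2 != 0:
        raise ValueError(f"expected location|name pairs, got {field!r}")
    return [(int(parts[i]), parts[i + 1]) for i in range(0, len(parts), 2)]


def parse_peprec(text: str) -> List[Dict[str, object]]:
    """Parse space-separated PEPREC text, using the header line for column names."""
    lines = [line for line in text.splitlines() if line.strip()]
    columns = lines[0].split()
    records = []
    for line in lines[1:]:
        record: Dict[str, object] = dict(zip(columns, line.split()))
        record["modifications"] = parse_modifications(str(record["modifications"]))
        record["charge"] = int(str(record["charge"]))
        records.append(record)
    return records


EXAMPLE = """spec_id modifications peptide charge
peptide1 - ACDE 2
peptide2 2|Carbamidomethyl ACDEFGHI 3
peptide3 0|iTRAQ|10|Oxidation ACDEFGHIKMNPQ 2
"""

records = parse_peprec(EXAMPLE)
```

Reading the column names from the header line, rather than hard-coding their positions, keeps the sketch independent of the column order.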
A list of all modifications and the corresponding mass shifts is needed for MS²PIP to properly calculate the fragmentation peak m/z values. You can select some preset modifications below or provide your own list. For the preset modifications, we use the PSI-MS names and monoisotopic mass shifts from Unimod. This means that, if you use these preset modifications, the modification names in your PEPREC file need to match the Unimod PSI-MS names. If MS²PIP encounters a modification in the PEPREC file that is not provided in the modifications list, it will skip that peptide.
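Because peptides with an unlisted modification are skipped, it can help to check a PEPREC file against your modification list before uploading. The Python sketch below does this for a single modifications field. The `KNOWN_MASS_SHIFTS` dictionary is a tiny hypothetical subset (the two values are the Unimod monoisotopic mass shifts for Carbamidomethyl and Oxidation), and `unknown_modifications` is an illustrative helper, not an MS²PIP function.

```python
from typing import Dict, List

# Hypothetical subset of a modification list: name -> monoisotopic mass shift (Da)
KNOWN_MASS_SHIFTS: Dict[str, float] = {
    "Carbamidomethyl": 57.021464,
    "Oxidation": 15.994915,
}


def unknown_modifications(field: str, known: Dict[str, float]) -> List[str]:
    """Return the modification names in a PEPREC field that are missing from `known`."""
    if field == "-":  # unmodified peptide
        return []
    # In the example rows, names sit at the odd positions: location|name|location|name
    names = field.split("|")[1::2]
    return sorted({name for name in names if name not in known})
```

A peptide whose field yields a non-empty list here would need either a matching preset or a custom entry in the modification list.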
If you provide your own list of modifications, each line can only contain one modification, with the following comma-separated properties:
If a certain modification occurs on different amino acids, every modification-amino acid combination should have its own entry and a unique name.
N- and C-terminal modifications can be added in the same way, but require a terminus label such as C-term instead of an amino acid code.
Example of a custom modification list:
MS²PIP currently supports the models listed in the table below. Always take note of the MS²PIP version and model versions you use and mention these in your publications. The current MS²PIP version is v20190120.
|Model||Current model version||Train-test dataset (unique peptides)||Evaluation dataset (unique peptides)||Median Pearson correlation on evaluation dataset|
(1 623 712)
|CID||v20190107||NIST CID Human||NIST CID Yeast|| |
|TMT-labeling||v20190107||Peng Lab TMT Spectral Library (1 185 547)|| || |
|iTRAQ-labeled phosphopeptides||v20190107||NIST iTRAQ phospho|| || |
|HCD (including b++ and y++ ions)||v20190107||MassIVE-KB (1 623 712)|| ||0.903786 (+) and 0.644162 (++)|
|CID (including b++ and y++ ions)||v20190107||NIST CID Human||NIST CID Yeast||0.904947 (+) and 0.813342 (++)|
Note that the MS²PIP models were trained on tryptic peptides. As the C-terminal lysine and arginine heavily influence MS² fragmentation, these models are not intended to make predictions for non-tryptic peptides.
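A simple pre-check can flag peptides that fall outside this scope. The Python sketch below only tests whether a sequence ends in lysine (K) or arginine (R). This is a rough proxy chosen for this example: it does not verify the cleavage context or count missed cleavages, and the helper name `has_tryptic_c_terminus` is hypothetical.

```python
def has_tryptic_c_terminus(peptide: str) -> bool:
    """Return True if the peptide ends in K or R, as expected after tryptic cleavage."""
    return bool(peptide) and peptide[-1].upper() in ("K", "R")


# Illustrative sequences only
peptides = ["ACDEFGHIK", "ACDEFGR", "ACDE"]
flagged = [p for p in peptides if not has_tryptic_c_terminus(p)]
```

Peptides that end up in `flagged` are ones for which the predictions should be treated with extra caution.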
MS²PIP predictions can be downloaded in CSV or MGF file format. Predicted intensities are normalized to the total ion current (the sum of all intensities), so they add up to 1 in the CSV file and to 10,000 in the MGF file. On the download page we also provide an interactive visualization of the predicted spectra.
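The two scalings differ only by a constant factor. The Python sketch below illustrates total ion current normalization as described above, with made-up intensity values; `normalize_to_tic` is an illustrative helper and not MS²PIP's own code.

```python
from typing import List


def normalize_to_tic(intensities: List[float], total: float = 1.0) -> List[float]:
    """Scale intensities so that they sum to `total` (1 for CSV, 10,000 for MGF)."""
    tic = sum(intensities)  # total ion current
    if tic <= 0:
        raise ValueError("total ion current must be positive")
    return [total * value / tic for value in intensities]


raw = [2.0, 6.0, 2.0]                       # hypothetical peak intensities
csv_scale = normalize_to_tic(raw)            # sums to 1
mgf_scale = normalize_to_tic(raw, 10_000.0)  # sums to 10,000
```

To compare values across the two download formats, divide the MGF intensities by 10,000.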