The FASTQ files of the triplicate human proteome screens in S. cerevisiae and P. pastoris were deposited in SRA under SRP094995. The corresponding fragment count tables can be found on Figshare (10.6084/m9.figshare.5349943 and 10.6084/m9.figshare.5349955), as can the R code used for data processing (10.6084/m9.figshare.5349979). Other datasets are available on request.
A fragment mapped to part of your protein and labeled ‘enriched’ means that a polypeptide of this sequence was detected as consistently displayed by yeast (S. cerevisiae or P. pastoris, as noted in the species) when placed between an N-terminal secretory leader sequence (preproMF) and the Sag1 surface display anchor at the C-terminus. This suggests that the polypeptide can traverse (or evade) the yeast secretory system without being degraded, i.e. that the sequence is ‘secretable’, regardless of its original cellular localization in human cells. ‘Depleted’ fragments, on the other hand, are not consistently detected as displayed. This could mean that the fragment intrinsically cannot be secreted, that the context of the secretory leader or Sag1 interfered with secretory passage, or that technical reasons hampered detection (e.g. low display levels compared to noise, inconsistencies between replicates, clone loss during expansions, etc.). Depleted fragments are thus not necessarily ‘non-secretable’.
All fragments are derived from polyA+ transcripts extracted from human cell lines. For the secretion screens in S. cerevisiae, we used HEK293T cells; for the secretion screens in P. pastoris, we pooled transcripts from four cell lines of different origins (breast, brain, liver and blood) to cover a larger fraction of all human ORFs.
No, the current output is binary (secretable/not detected as secretable) and fragment count levels are not correlated with display levels. We are working on implementing this type of information in future versions of this platform.
Currently, our libraries cover between 25 and 40% of the human transcriptome. We are working on expanding library size, screening scale and sequencing depth in order to cover the human transcriptome to near-saturation.
Failure to express a full-length heterologous protein in a recombinant host, be it yeast or any other organism, through secretion or intracellular expression, is a very common issue. The cause of such expression failures is often local, which is why many researchers resort to expressing only a part of their protein of interest, for instance only the soluble domain of a transmembrane protein. By focusing on protein fragments instead of full-length proteins, and given enough fragments covering the protein of interest in the input library, our method should make it possible to identify which regions of a protein can be expressed (and in our case, secreted) and which ones are problematic, and to do so across entire proteomes. Whether the resulting fragment, when expressed alone, has the same structural, biochemical, or functional properties as in the full-length protein will depend on the protein itself and warrants examination on a case-by-case basis.
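As an illustration of how fragment-level calls might be turned into a per-protein view, the following minimal sketch merges overlapping ‘enriched’ fragments into contiguous regions. The input format (1-based inclusive residue coordinates plus a label), the function name `enriched_regions` and the example values are assumptions made for this illustration; they are not the format of the deposited count tables or part of the published R code.

```python
from typing import List, Tuple

# Hypothetical fragment record: (start, end, label), with 1-based inclusive
# residue positions on the protein of interest. Not the platform's own format.
Fragment = Tuple[int, int, str]


def enriched_regions(fragments: List[Fragment]) -> List[Tuple[int, int]]:
    """Merge overlapping or directly adjacent 'enriched' fragments into regions."""
    spans = sorted((s, e) for s, e, label in fragments if label == "enriched")
    merged: List[Tuple[int, int]] = []
    for start, end in spans:
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous region: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Made-up example values, for illustration only.
example = [
    (10, 80, "enriched"),
    (60, 140, "enriched"),
    (200, 270, "depleted"),
    (300, 360, "enriched"),
]
print(enriched_regions(example))  # [(10, 140), (300, 360)]
```

Residues outside the merged regions should not be read as non-secretable: they may be covered only by depleted fragments, or not covered by the library at all.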
Regarding the length of the fragments, we aimed to generate fragments about the size of protein domains (50-100 amino acids), considering the structural, functional and evolutionary modularity of proteins in domains. While we have countered many of the size biases that can accumulate during the procedure (e.g. during PCR or cloning), we are still working on broadening the size range of the tested fragments and incorporating more long (> 100 aa) fragments.
Results from our lab and others have indicated that, for a wide variety of single proteins, Sag1 display efficiency correlates well with relative secretion levels. However, it cannot be excluded that for certain fragments, C-terminal fusion to the approximately 300-amino-acid Sag1 anchor might differentially influence fragment folding, solubility, or stability, which can lead to false positives and false negatives when assessing secretability. Currently, we estimate the sensitivity of a single replicate screening experiment to lie around 90%, and likely higher when combining results from replicate screens.
Lead contact: Nico Callewaert firstname.lastname@example.org
Experiments and sequencing analysis: Morgane Boone email@example.com
PDB and Pfam mapping: Pathmanaban Ramasamy firstname.lastname@example.org
Dynamics predictions: Wim Vranken email@example.com