Introduction
Large sequence-based datasets are often scanned for conserved sequence patterns to extract useful biological information [1].
Sequence logos [2] were the first to visualize conserved patterns in oligonucleotide and protein sequences and rely on Shannon’s
information theory to calculate the conservation level at each position in a multiple sequence alignment. A sequence logo is a
histogram-like representation in which the bars are vertical stacks of symbols: the stack height reflects the level of conservation, and the
height of each individual symbol is a measure of its frequency at a given position. However, no tool can compare, in a statistically
sound manner, an experimental peptide or protein sequence set to the background of species-specific natural occurrences of amino acids, to a
position-specific background set, or to a background set that is influenced by the experimental protocol. In addition, underrepresented
elements (non-tolerated amino acids or nucleotides) are generally not shown, or are not shown in a statistically sound way.
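The classical sequence-logo calculation described above can be sketched in a few lines of Python. This is only an illustration of the information-theory approach, not code from any logo tool: the function name `information_content` is hypothetical, the alphabet size of 20 assumes protein sequences, and the small-sample correction used in practice is left out for brevity.

```python
import math
from collections import Counter


def information_content(aligned, alphabet_size=20):
    """Per-position stack height (bits) and per-symbol heights.

    `aligned` is a list of equal-length sequences. No small-sample
    correction is applied (a simplification made for this sketch).
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")

    max_bits = math.log2(alphabet_size)  # maximum possible conservation
    result = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in aligned)
        total = sum(counts.values())
        freqs = {symbol: n / total for symbol, n in counts.items()}
        # Shannon entropy of the observed symbol distribution
        entropy = -sum(f * math.log2(f) for f in freqs.values())
        stack_height = max_bits - entropy
        # Each symbol takes a share of the stack proportional to its frequency
        heights = {symbol: f * stack_height for symbol, f in freqs.items()}
        result.append((stack_height, heights))
    return result
```

Note that every symbol height computed this way is non-negative, so a residue that is absent or rare at a position simply does not appear in the stack. This is why underrepresentation is hard to read from a classical logo.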
Recently we introduced iceLogo [3], which takes the analysis and visualisation of consensus patterns in aligned peptide sequences
to a new level. iceLogo is a free, open-source Java application that can be downloaded at
http://icelogo.googlecode.com/. Here we present an iceLogo web application and a SOAP web server.
Instead of relying on information theory, iceLogo builds
on probability theory. This theory and the iceLogo algorithm are explained in the
manual.
In essence, the algorithm takes the experimental set normally used to
generate a sequence logo and compares it with a reference set. This reference set can be configured by the user so that it approximates the expected background
distribution as closely as possible. The experimental sequence set is generally a multiple sequence alignment of peptides that are expected to share sequence features. These two sets are used in a probability analysis, and the result is shown in complementary illustrations
such as heat maps, amino acid parameter graphs and so-called iceLogos, which were all developed to aid the analysis, visualisation and
understanding of consensus sequences in an intuitive way.
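To make the idea of comparing an experimental set with a reference set more concrete, the following Python sketch shows one plausible form of such a probability analysis. It is an assumption-laden illustration, not the iceLogo implementation: the function names, the use of a z-score, the binomial standard deviation based on the experimental set size, and the 1.96 threshold (roughly p = 0.05, two-sided) are all choices made for this example. The manual remains the authoritative description of the actual algorithm.

```python
import math
from collections import Counter


def position_frequencies(sequences, pos):
    """Relative frequency of each residue at one alignment position."""
    counts = Counter(seq[pos] for seq in sequences)
    total = sum(counts.values())
    return {residue: n / total for residue, n in counts.items()}


def compare_sets(experimental, reference, z_threshold=1.96):
    """Per position, return {residue: z-score} for significant residues.

    Positive z-scores mean over-representation in the experimental set,
    negative z-scores mean under-representation (hypothetical sketch).
    """
    length = len(experimental[0])
    n = len(experimental)  # sample size used for the expected spread
    result = []
    for pos in range(length):
        exp_f = position_frequencies(experimental, pos)
        ref_f = position_frequencies(reference, pos)
        significant = {}
        for residue in set(exp_f) | set(ref_f):
            p = ref_f.get(residue, 0.0)
            sd = math.sqrt(p * (1 - p) / n)
            if sd == 0:
                continue  # reference frequency is 0 or 1: z-score undefined
            z = (exp_f.get(residue, 0.0) - p) / sd
            if abs(z) >= z_threshold:
                significant[residue] = z
        result.append(significant)
    return result


# Toy data: a position-specific reference set and a biased experimental set
reference = ["RA", "KA", "GA", "LA"] * 5
experimental = ["RA"] * 36 + ["KA"] * 4
scores = compare_sets(experimental, reference)
```

Unlike the information-theory calculation, a comparison of this kind yields signed scores, so residues that occur less often than the reference set predicts can be reported alongside the enriched ones.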
1. Hulo, N. et al. Nucleic Acids Res 36, D245-249 (2008).
2. Schneider, T. D. & Stephens, R. M. Nucleic Acids Res 18, 6097-6100 (1990).
3. Colaert, N. et al. Nature Methods 6, 786-787 (2009).
Reference
If you use the iceLogo web application or the stand-alone version of iceLogo, please cite the iceLogo publication:
Colaert, N. et al. Nature Methods 6, 786-787 (2009).
Acquiring iceLogo
A stand-alone Java application that can generate iceLogos can be found
here.
The iceLogo server and SOAP client examples can be found
here.