iceLogo is a web service for visualizing conserved patterns in protein and nucleotide sequences using probability theory.
Large sequence-based data sets are often scanned for conserved sequence patterns to extract useful biological information¹. Sequence logos² were the first graphical method for visualizing conserved patterns in oligonucleotide and protein sequences.
They rely on Shannon's information theory to calculate the level of conservation at each position of a multiple sequence alignment. Each position is drawn as a vertical stack of symbols: the total stack height reflects the level of conservation, and the height of each symbol within the stack is proportional to its frequency at that position.
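To make the stack heights concrete, the sketch below computes them for each column of a small, made-up DNA alignment, using the common formulation R = log2(alphabet size) − H, where H is the Shannon entropy of the observed symbol frequencies. It is an illustration of classic sequence logos under that assumption, not part of iceLogo; the function name `column_stack` is hypothetical, and the small-sample correction used by many logo tools is deliberately omitted.

```python
import math
from collections import Counter


def column_stack(column, alphabet_size):
    """Return (stack height in bits, {symbol: symbol height}) for one alignment column.

    Assumes R = log2(alphabet_size) - H, with H the Shannon entropy of the
    observed frequencies. No small-sample correction is applied.
    """
    counts = Counter(column)
    n = len(column)
    freqs = {s: c / n for s, c in counts.items()}
    entropy = -sum(f * math.log2(f) for f in freqs.values())
    stack_height = math.log2(alphabet_size) - entropy
    # Each symbol receives a share of the stack proportional to its frequency
    heights = {s: f * stack_height for s, f in freqs.items()}
    return stack_height, heights


# Hypothetical four-sequence DNA alignment (alphabet size 4, so at most 2 bits)
alignment = ["ACGT", "ACGA", "ACTT", "ACGC"]
for pos, column in enumerate(zip(*alignment)):
    total, heights = column_stack(column, alphabet_size=4)
    print(pos, round(total, 3), {s: round(h, 3) for s, h in sorted(heights.items())})
```

A fully conserved column reaches the 2-bit maximum, while the mixed last column (T, A, T, C) drops to 0.5 bits, with T drawn twice as tall as A or C.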
Because information-theoretic measures such as entropy and mutual information quantify the information in a random variable relative to an expected distribution, a background set or reference experimental setting must be specified for comparison. In most cases, however, the background is unknown or shaped by many experimental and evolutionary factors, so no general solution is available.
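The dependence on a background can be illustrated with relative entropy (Kullback-Leibler divergence), a standard way of measuring information against a background distribution. In this sketch the helper `relative_entropy` and the GC-rich background are hypothetical values chosen for illustration; the point is only that the same alignment column scores differently depending on which background is assumed.

```python
import math
from collections import Counter


def relative_entropy(column, background):
    """Kullback-Leibler divergence (in bits) of observed column frequencies from a background."""
    counts = Counter(column)
    n = len(column)
    # Sum over observed symbols only; unobserved symbols contribute 0
    return sum((c / n) * math.log2((c / n) / background[s]) for s, c in counts.items())


column = "GGTG"
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
gc_rich = {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}  # hypothetical background

print(round(relative_entropy(column, uniform), 3))
print(round(relative_entropy(column, gc_rich), 3))
```

Against a uniform background the column carries about 1.19 bits, against the hypothetical GC-rich background only about 1.01 bits. Note also that a background probability of zero for an observed symbol makes the division fail, which is one way excluded elements break this kind of model.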
Another common problem, and one that frequently arises in a biological context, is that excluded or under-represented elements disrupt these statistical models.
To address these problems we developed iceLogo³, a free, open-source Java application based on probability theory instead of information theory. Although Java follows the "write once, run anywhere" paradigm, we also offer the algorithm as a SOAP web service and as a web application to make iceLogo even more accessible. More information on the iceLogo algorithm can be found in the manual.
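As a rough illustration of what a probability-based comparison can look like, the sketch below compares residue frequencies at one position in an experimental set against a reference set with a binomial z-score, Z = (X − nP) / sqrt(nP(1 − P)), where X is the residue count in the experimental set, n the number of experimental sequences and P the residue's frequency in the reference set. This is not the iceLogo implementation or API: the function `position_scores`, the 1.96 cutoff (roughly a two-sided 0.05 level under a normal approximation) and the toy sequences are all assumptions. Consult the manual for the actual algorithm.

```python
import math
from collections import Counter


def position_scores(experimental, reference, position, z_cutoff=1.96):
    """Hypothetical helper: compare residue frequencies at one aligned position.

    Returns {residue: (percentage difference, z-score)} for residues with |z| >= z_cutoff.
    """
    exp_counts = Counter(seq[position] for seq in experimental)
    ref_counts = Counter(seq[position] for seq in reference)
    n, m = len(experimental), len(reference)
    results = {}
    for residue in set(exp_counts) | set(ref_counts):
        p = ref_counts[residue] / m  # expected probability, taken from the reference set
        x = exp_counts[residue]      # observed count in the experimental set
        if p in (0.0, 1.0):
            continue  # binomial variance is zero here, so the z-score is undefined
        z = (x - n * p) / math.sqrt(n * p * (1 - p))
        if abs(z) >= z_cutoff:
            pct_diff = 100 * (x / n - p)
            results[residue] = (round(pct_diff, 1), round(z, 2))
    return results


# Toy aligned peptides (hypothetical data)
experimental = ["AKL"] * 8 + ["ARL"] * 2
reference = ["AKL"] * 4 + ["AEL"] * 8 + ["ARL"] * 8
print(position_scores(experimental, reference, position=1))
```

Unlike an information-content stack, this kind of score has a sign, so under-represented residues (here E, absent from the experimental set) are reported alongside over-represented ones (K).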
If you use the iceLogo web application or the stand-alone version, please cite the iceLogo publication: Colaert, N. et al. Nature Methods 6, 786-787 (2009).