iceLogo

Introduction

iceLogo is a web service for visualizing conserved patterns in protein and nucleotide sequences using probability theory.

Background

Large sequence-based data sets are often scanned for conserved sequence patterns to extract useful biological information¹. Sequence logos² were the first to visualize conserved patterns in oligonucleotide and protein sequences.

They rely on Shannon's information theory to calculate the conservation level at each position of a multiple sequence alignment. A sequence logo is usually drawn as vertical stacks of symbols, one stack per position: the total height of a stack reflects the level of conservation at that position, and the height of each symbol within the stack is a measure of its frequency there.
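The stack and symbol heights described above can be sketched in a few lines of Python. This is a minimal illustration of the classic sequence logo calculation, not code from iceLogo: the function names are hypothetical, the input is assumed to be an ungapped alignment of equal-length sequences, and no small-sample correction is applied.

```python
import math
from collections import Counter


def information_content(column, alphabet_size=20):
    """Return (stack_height_bits, {symbol: symbol_height}) for one column."""
    counts = Counter(column)
    n = len(column)
    freqs = {symbol: count / n for symbol, count in counts.items()}
    # Shannon entropy of the observed symbol distribution, in bits.
    entropy = -sum(f * math.log2(f) for f in freqs.values())
    # Conservation = maximum possible entropy minus observed entropy.
    stack_height = math.log2(alphabet_size) - entropy
    # Each symbol takes a share of the stack proportional to its frequency.
    heights = {symbol: f * stack_height for symbol, f in freqs.items()}
    return stack_height, heights


def logo_heights(sequences, alphabet_size=20):
    """Compute stack and symbol heights for every alignment position."""
    # zip(*sequences) iterates over the alignment column by column.
    return [information_content(col, alphabet_size) for col in zip(*sequences)]


if __name__ == "__main__":
    # Toy nucleotide alignment (alphabet of 4 symbols), made up for illustration.
    alignment = ["ACGT", "ACGA", "ACCA", "ATCA"]
    for position, (bits, heights) in enumerate(logo_heights(alignment, 4), start=1):
        print(position, round(bits, 3), heights)
```

A fully conserved position reaches the maximum height (2 bits for nucleotides), while a position where every symbol is equally frequent has height 0.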

Problems with this method

Because information theory builds on entropy, mutual information and the information contained in a random variable, a background set or experimental setting must be specified for comparison. In most cases the background is unknown or shaped by a multitude of experimental and evolutionary factors, so a general solution is rarely available.

Another common problem is that elements which are excluded or under-represented, as is often the case in a biological context, disrupt the statistical models.

Our solution

To address these problems we have developed iceLogo³, a free, open-source Java application based on probability theory instead of information theory. Although Java follows the "write once, run anywhere" paradigm, we also offer the algorithm as a SOAP web service and as a web application to make iceLogo more accessible. More information on the iceLogo algorithm can be found in the manual.
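To give a feel for what a probability-based comparison against a reference set can look like, the following Python sketch flags residues whose frequency at one position deviates from a reference frequency. It is an illustration under stated assumptions, not the iceLogo implementation: the function names, the binomial standard error, and the 1.96 threshold (roughly p = 0.05, two-sided) are all choices made for this example, and the manual remains the authority on the actual algorithm.

```python
import math


def z_score(observed_count, n, p_ref):
    """z-score of an observed frequency against a reference probability.

    Assumes a binomial model with a normal approximation (an assumption
    of this sketch).
    """
    if n == 0 or p_ref <= 0.0 or p_ref >= 1.0:
        return 0.0  # no meaningful standard error in these cases
    observed_freq = observed_count / n
    std_err = math.sqrt(p_ref * (1.0 - p_ref) / n)
    return (observed_freq - p_ref) / std_err


def deviating_residues(column, reference_freqs, z_threshold=1.96):
    """Return {residue: (percent_difference, z)} for significant residues.

    column          -- string of residues observed at one position
    reference_freqs -- {residue: frequency in the reference set}
    """
    n = len(column)
    result = {}
    for residue, p_ref in reference_freqs.items():
        count = column.count(residue)
        z = z_score(count, n, p_ref)
        if abs(z) >= z_threshold:
            percent_difference = 100.0 * (count / n - p_ref)
            result[residue] = (percent_difference, z)
    return result


if __name__ == "__main__":
    # Made-up data: 10 sequences, one position, hypothetical reference frequencies.
    column = "KKKKKKKKRA"
    reference = {"K": 0.2, "R": 0.1, "A": 0.1}
    print(deviating_residues(column, reference))
```

In this toy example only K is reported, because it is the only residue whose observed frequency differs clearly from its reference frequency; positive values indicate over-representation and negative values under-representation.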

Reference

If you use the iceLogo web application or the standalone version of iceLogo, please cite the iceLogo publication.

Colaert, N. et al. Nature Methods 6, 786-787 (2009)

Acquiring iceLogo

A standalone Java application that can generate iceLogos can be found here. The iceLogo server and SOAP client examples can be found here.