

PredAlgo - A new subcellular localization prediction tool dedicated to green algae


To improve production yields of third-generation biofuels, biologists seek to harness the conversion and storage of solar energy by microalgae. Studies focus in particular on the chloroplast, the organelle where photosynthesis takes place and where energy reserves (starch or lipids) are produced. In the ALGOMICS project, researchers at the Large Scale Biology Laboratory, in collaboration with colleagues from the iBEB, have developed a tool to annotate proteins from a model alga in order to track their location in the different cellular compartments (including the chloroplast).

Published on 4 September 2012
An important contribution to our understanding of cellular mechanisms is the assignment of proteins to the different subcellular compartments. In eukaryotes, organelles such as mitochondria and chloroplasts have a bacterial endosymbiotic origin. Genes of the ancestral bacterium migrated to the cell nucleus, and their products, the proteins, are synthesized in the cytoplasm. This is why the biogenesis and function of mitochondria and chloroplasts involve the migration of these proteins from the cytoplasm into the organelle, across its delimiting membrane system. Most of these proteins have acquired a signal sequence, called an import sequence, that allows them to be correctly targeted to the subcellular compartment.

Predicting the presence of an import sequence (also called a transit peptide in the literature) is a key step in annotating the subcellular localization of a protein. However, existing tools were developed using sequences from animals or terrestrial higher plants, and thus prove unsuitable for green algae. In this new study, the goal of the Exploring the Dynamics of Proteomes (EDyP) team at the Large Scale Biology Laboratory was to design software dedicated to green algae, called PredAlgo [1].

The unicellular green alga Chlamydomonas reinhardtii was chosen as the source of experimental data for training the software. It is the only green alga for which a sufficiently large inventory of proteins with known subcellular localization is available. Such proteins have their import sequence cleaved off during their transfer into the chloroplast or mitochondrion. The precise delimitation of the import sequences (i.e. the position of the cleavage site) of these proteins was obtained by re-analyzing tandem mass spectrometry data (Figure 1) previously acquired for the analysis of the mitochondrial [2] and chloroplastic [3] proteomes.


Figure 1: Identification of N-terminal peptides of proteins by mass spectrometry.
In a classical tandem mass spectrometry (MS/MS) analysis, the sample is digested with trypsin, generating peptides that have a trypsin cleavage consensus site at both their N- and C-terminal ends. Only a peptide located at one end of the protein has a single tryptic cleavage site. To identify the N-terminal peptides of proteins (peptide in green), the search is configured to require a trypsin site on the C-terminal side only. Because proteins imported into mitochondria or chloroplasts have their import sequence cleaved off, the cleavage site and the import sequence (in light blue) can be determined accurately by comparing the N-terminal peptide with the full sequence of the protein (from databases): the residues upstream of the peptide form the import sequence.
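
The subtraction step described in this caption can be sketched in a few lines of Python. This is only an illustration, not the procedure used by the authors: the function names and the example sequence are hypothetical, and the trypsin rule is the common simplification (cleavage after K or R, except before P), which is an assumption not stated in the article.

```python
from typing import Optional, Tuple

TRYPSIN_RESIDUES = "KR"  # assumption: simplified trypsin rule (cut after K or R, not before P)


def is_tryptic_cut(protein: str, pos: int) -> bool:
    """True if an internal cut between residues pos-1 and pos (0-based) matches the trypsin rule."""
    return (
        0 < pos < len(protein)
        and protein[pos - 1] in TRYPSIN_RESIDUES
        and protein[pos] != "P"
    )


def split_import_sequence(protein: str, nterm_peptide: str) -> Optional[Tuple[str, str]]:
    """Return (import_sequence, mature_protein), or None if the peptide is not
    a plausible N-terminal peptide of the mature protein.

    The peptide must be tryptic on its C-terminal side only: a tryptic site on
    its N-terminal side would make it an ordinary internal peptide.
    """
    start = protein.find(nterm_peptide)
    if start < 0:
        return None  # peptide not found in the database sequence
    end = start + len(nterm_peptide)
    cterm_ok = end == len(protein) or is_tryptic_cut(protein, end)
    nterm_is_tryptic = is_tryptic_cut(protein, start)
    if not cterm_ok or nterm_is_tryptic:
        return None
    # Everything upstream of the N-terminal peptide is the import sequence.
    return protein[:start], protein[start:]


# Hypothetical precursor sequence, made up for illustration only.
precursor = "MALSSAQTVAA" + "ASGEVTKLLPDRGGHK"
result = split_import_sequence(precursor, "ASGEVTK")
```

Here `result` holds the presumed import sequence and the mature protein, while a fully tryptic internal peptide such as `"LLPDR"` is rejected because its N-terminal side is also a trypsin site.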

The analysis of the sequences adjacent to the identified cleavage sites reveals general trends (Figure 2), but these do not yield a clear and distinctive consensus for either the chloroplastic or the mitochondrial compartment. As a result, PredAlgo was built on neural networks, a "black box" approach in which training optimizes decision rules that are not interpretable by humans.


Figure 2: Relative occurrence of amino acids over a window of 10 positions around the cleavage site (arrow). These diagrams were generated from the chloroplastic and mitochondrial proteins used for training PredAlgo.
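
As a rough illustration of how such per-position occurrences can be tabulated, the Python sketch below counts residues at each offset from the cleavage site. The function name, the `flank` parameter (number of positions counted on each side) and the toy sequences are assumptions made for this example; the article does not describe how the figure was actually computed.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def position_frequencies(
    proteins_with_sites: Iterable[Tuple[str, int]], flank: int = 10
) -> Dict[int, Dict[str, float]]:
    """Relative frequency of each amino acid at each offset from the cleavage site.

    Each item is (sequence, site), where site is the 0-based index of the first
    residue of the mature protein. Offsets -flank..-1 lie in the import
    sequence, offsets +1..+flank in the mature protein.
    """
    offsets = list(range(-flank, 0)) + list(range(1, flank + 1))
    counts = {off: Counter() for off in offsets}
    for seq, site in proteins_with_sites:
        for off in offsets:
            idx = site + off if off < 0 else site + off - 1
            if 0 <= idx < len(seq):  # skip positions falling outside the sequence
                counts[off][seq[idx]] += 1
    freqs: Dict[int, Dict[str, float]] = {}
    for off, counter in counts.items():
        total = sum(counter.values())
        freqs[off] = {aa: n / total for aa, n in counter.items()} if total else {}
    return freqs


# Toy training set (made-up sequences, cleavage assumed after residue 5).
toy_set = [("MAAVRAASGK", 5), ("MSSLRASTEK", 5)]
freqs = position_frequencies(toy_set, flank=2)
```

In this toy set, arginine is the only residue seen at offset -1, so `freqs[-1]` is `{"R": 1.0}`. With real data, computing one such table per compartment gives the kind of comparison shown in the figure.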

Compared with existing software, PredAlgo achieves the best performance on algae, especially in discriminating between chloroplastic and mitochondrial proteins. The tool is relevant not only for Chlamydomonas but also for closely related lineages of green algae. It is less predictive for the localization of mitochondrial proteins from more distant algae.

Green algae are increasingly recognized as a potential source of third-generation biofuels (fatty acids, sugars, hydrogen), whose production is associated with chloroplast metabolism. PredAlgo is the most suitable tool for predicting subcellular localization in these organisms and should accelerate the understanding of their metabolism and compartmentalization, expanding the knowledge base required for green algae engineering.
