GBoxes | Data Science

G-domain prediction across the diversity of G protein families

Guanosine-5′-triphosphate (GTP) binds proteins that perform diverse cellular functions through a conserved domain known as the GTP-binding domain (G domain). The G domain consists of five short consensus motifs (G boxes; G1-G5) separated by spacer amino acids (inter-G box spacers). The G5 sequence is less conserved than those of the other G boxes (G1-G4). However, the structural fold of the G domain is conserved across different GTP-binding proteins (G proteins).

We developed a spacers- and mismatch-based algorithm (SMA) that identifies a G protein class-specific G5 box sequence. SMA takes either a single protein sequence or a set of protein sequences and identifies putative G boxes (G1-G5) based on the structural information already available for twenty-three well-studied G protein families. It can also be used to identify putative G boxes in less-studied multi-domain proteins.

(a) Ras protein structure (3X1W) as the prototype G domain

(b) Topology diagram of the G domain with β-strands in green, α-helices in red, and N and C termini as indicated

The many G proteins discovered over time are characterized by diverse functions and sequences. This sequence diversity is also observed in the G box motifs (specifically the G5 box) and in the lengths of the inter-G box spacers. Here we have created the Spacers and Mismatch Algorithm (SMA), which predicts G domains in a given G protein sequence based on user-specified constraints on approximate G box patterns and inter-box gaps for each G protein family. The SMA parameters can be customized as more G proteins are discovered and structurally characterized. Family-specific G box motifs, including the less characterized G5 motif, as well as G domain boundaries were predicted with higher precision. Overall, our analysis suggests that G protein families can possibly be classified based on family-specific G box sequences and the lengths of inter-G box spacers.
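To make the idea of "approximate G box patterns plus inter-box gap constraints" concrete, here is a minimal Python sketch. It is not the SMA implementation: the function names (`parse_pattern`, `approximate_hits`, `find_g_domains`), the exhaustive scan with backtracking, the mismatch limits and the gap ranges are all assumptions made for illustration. The patterns are loosely modeled on commonly cited Ras-like consensus motifs and are placeholders, not the family-specific parameters used by SMA. The input sequence is a made-up toy string, not a real protein.

```python
from typing import FrozenSet, List, Optional, Tuple

Hit = Tuple[int, int, int]          # (start, end, mismatches), end is exclusive
Box = Tuple[str, int, int, int]     # (box name, start, end, mismatches)

def parse_pattern(pattern: str) -> List[Optional[FrozenSet[str]]]:
    """Turn e.g. 'GxxxxGK[ST]' into per-position allowed sets (None = any residue)."""
    positions: List[Optional[FrozenSet[str]]] = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "[":                      # residue class such as [ST]
            j = pattern.index("]", i)
            positions.append(frozenset(pattern[i + 1:j]))
            i = j + 1
        else:                                      # 'x' is a wildcard
            positions.append(None if pattern[i] == "x" else frozenset(pattern[i]))
            i += 1
    return positions

def approximate_hits(seq: str, pattern: str, max_mismatches: int) -> List[Hit]:
    """Slide the pattern along seq and keep windows with few enough mismatches."""
    pos = parse_pattern(pattern)
    hits = []
    for start in range(len(seq) - len(pos) + 1):
        mismatches = sum(
            1 for k, allowed in enumerate(pos)
            if allowed is not None and seq[start + k] not in allowed
        )
        if mismatches <= max_mismatches:
            hits.append((start, start + len(pos), mismatches))
    return hits

def find_g_domains(seq, boxes, gaps) -> List[List[Box]]:
    """Chain one hit per box, in order, so that every spacer length lies in its range.

    boxes: list of (name, pattern, max_mismatches)
    gaps:  list of (min_len, max_len) for the spacer between consecutive boxes
    """
    hits = [approximate_hits(seq, pat, mm) for _, pat, mm in boxes]
    chains: List[List[Box]] = []

    def extend(idx: int, chain: List[Box]) -> None:
        if idx == len(boxes):
            chains.append(list(chain))
            return
        for start, end, mismatches in hits[idx]:
            if chain:
                spacer = start - chain[-1][2]      # residues between the two boxes
                lo, hi = gaps[idx - 1]
                if not lo <= spacer <= hi:
                    continue
            chain.append((boxes[idx][0], start, end, mismatches))
            extend(idx + 1, chain)
            chain.pop()

    extend(0, [])
    chains.sort(key=lambda c: sum(b[3] for b in c))  # fewest total mismatches first
    return chains

# Illustrative (assumed) constraints for one hypothetical family.
BOXES = [("G1", "GxxxxGK[ST]", 0), ("G2", "T", 0), ("G3", "DxxG", 0),
         ("G4", "[NT]KxD", 1), ("G5", "SA[KL]", 1)]
GAPS = [(5, 15), (10, 30), (40, 60), (15, 35)]

# Made-up toy sequence: five boxes embedded in filler; its G5 ("SAR")
# deviates from the placeholder pattern at one position.
toy = ("MEE" + "GAGGVGKS" + "E" * 10 + "T" + "E" * 20 + "DTAG"
       + "E" * 50 + "NKCD" + "E" * 25 + "SAR" + "EEE")

chains = find_g_domains(toy, BOXES, GAPS)
for name, start, end, mismatches in chains[0]:
    print(name, start, end, toy[start:end], mismatches)
```

In this sketch the predicted G domain boundaries would simply be the start of the G1 hit and the end of the G5 hit in the best chain. Loosening a box's mismatch limit or widening a spacer range is how a user-specified constraint for a new family would be expressed here.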

Team
Hiral M. Sanghavi, Richa Rashmi, Anirban Dasgupta, and Sharmistha Majumdar.

Demo

You can upload your sequences and run the algorithm: click here to run it from outside the IITGN network, and here to run it from inside the IITGN network. (Note: the recommended browser is Mozilla Firefox.)