According to Wikipedia:
“Structural genomics seeks to describe the three-dimensional structure of every protein encoded by a given genome.”
But what does this mean?
The first time I heard about structural genomics it had a much better name, and a more limited scope: it was being called the ‘human protein structure project’. It was 1996, the human genome project was well underway, and it had really caught the public’s imagination. Looking at the proteins encoded by these genes seemed the obvious next step.
I was a PhD student attending a summer school course and we were celebrating the end of the week with a formal meal. David Blow, a renowned scientist who was in the same lab in Cambridge as James Watson and Francis Crick when they published the structure of DNA, was the guest speaker. His view was that it was an extraordinarily exciting time to be in science, and that to really understand human biology we needed a human protein structure project.
So the initial idea was to establish the molecular shape, the three-dimensional structure, of every protein in the body. This was a big ask, especially considering that at that point we didn’t know how many protein-coding genes there were in the human genome, although the working estimate at the time was 100,000. (We still don’t really know the answer to this, but it looks to be between 20,000 and 40,000.) And if that wasn’t a big enough problem, it didn’t take into account how difficult it can be to solve a protein structure. Years can be lost in the lab trying to produce protein that is suitable to work with.
Around 1997 various pilot projects started, notably in Japan. Somewhere along the line, the aim of the project grew and grew. Why limit ourselves to one genome when there are so many others?
On a practical level this makes sense. You can spend years struggling with a human protein only to find that a virtually identical one from a pig works like magic. Plus, the human genome was practically the beginning of the field of genomics; thousands of genomes have now been sequenced. And if we’re hoping to use this information to treat disease, then we’ll need to know what the protein structures of micro-organisms look like and, importantly, which proteins bacteria have that we don’t, so that we can make drugs that disable only the bacterial ones and not our own.
Solving the structure of every protein in every genome is a Herculean task, partly because of its sheer scale and partly because it is never-ending: we keep discovering new organisms, and thus new genomes.
In practice, the idea is to solve enough structures, which means thousands of them, to know what’s out there. In an ideal world, we’d reach a point where, for each new genome that is sequenced, we could say: ‘Ah, that new gene is very similar to one we’ve seen before in budding yeast, so we can confidently predict that the protein structure will look like this…’