Sugars are found in cells and on the cell surface. The Human Proteoform Project will include mapping glycoproteoforms.

Nucleic acids, proteins, lipids and small molecules are major constituents of the human cell. But as molecular biology barreled ahead, another major class of macromolecules was “inadequately considered,” notes Ajit Varki of the University of California at San Diego. The molecules left behind are sugar chains, also called glycans.

Protein-bound glycans are found in the nucleus and the cytoplasm of human cells. And on the outside, cells are covered with a forest of sugar chains, or glycans, attached to the lipids or proteins of the cell membrane. Glycosylation is involved in many biological processes, including cell growth, molecular recognition and immune defense.

You can read more about glycoproteomics in my latest story in Nature Methods called ‘Tools to cut the sweet layer-cake that is glycoproteomics.’

Nucleic acids and proteins are diverse in terms of sequence and the types of changes they can show. Nucleic acids can be methylated; proteins can adopt a variety of conformations. In glycobiology, too, scientists face diversity.

Lloyd Smith from the University of Wisconsin-Madison is excited about The Human Proteoform Project, which he, Northwestern University researcher Neil Kelleher and others are advocating for. Proteoforms are the different molecular forms in which the protein product of a single gene can be found, arising from genetic variation, alternative splicing and post-translational modifications. In Smith’s view “the world needs to develop powerful new technology for comprehensive proteoform analysis in complex systems.”

The aim of the project is to generate a reference set of the human proteoforms produced from the genome, including the products of alternative splicing, various polymorphisms and the roughly 400 types of post-translational modifications, such as glycosylation, phosphorylation, acetylation and methylation.

Conventional ‘bottom-up’ proteomics involves digesting protein mixtures into peptides, which are then analyzed by tandem mass spec. The approach is powerful and invaluable for studying protein expression, note Smith and Kelleher. But just as different isoforms can arise from the same gene, different proteoforms can contain the same peptide, so a given peptide cannot always be traced back to the proteoform it came from.

The “bottom-up paradigm of proteomics sacrifices information on proteoforms,” they point out in Science. Top-down proteomics analyzes the entire intact proteoform, and although it is still in “its early stage of development,” they believe it is “poised to mature rapidly.”
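A toy example can make the shared-peptide problem concrete. The Python sketch below is purely illustrative and is not Smith’s or Kelleher’s method: it assumes a made-up sequence, a simplified trypsin rule (cut after K or R unless the next residue is P) and a hypothetical notation in which a lowercase letter marks a modified residue. Two different mixtures of proteoforms, one in which each molecule carries a single modification and one holding an unmodified and a doubly modified molecule, yield exactly the same pooled peptides, while the intact molecules differ.

```python
import re

# Hypothetical notation for this sketch: an uppercase letter is an unmodified
# residue; a lowercase letter is the same residue carrying some (unspecified)
# post-translational modification.


def apply_mods(sequence: str, positions: set[int]) -> str:
    """Mark the residues at the given 0-based positions as modified."""
    return "".join(aa.lower() if i in positions else aa
                   for i, aa in enumerate(sequence))


def tryptic_digest(proteoform: str) -> list[str]:
    """Simplified trypsin rule: cut after K or R unless the next residue is P."""
    # Zero-width split points; drop the empty piece left after a C-terminal K/R.
    return [p for p in re.split(r"(?<=[KR])(?!P)", proteoform) if p]


def bottom_up_view(mixture: list[str]) -> set[str]:
    """Pooled peptides, with no record of which molecule each came from."""
    return {pep for form in mixture for pep in tryptic_digest(form)}


def top_down_view(mixture: list[str]) -> set[str]:
    """Each intact proteoform, kept whole."""
    return set(mixture)


BASE = "MKTASYIAKQRGNLTEVK"  # made-up sequence, not a real human protein

# Mixture 1: each molecule is modified at one site only.
mixture_1 = [apply_mods(BASE, {4}), apply_mods(BASE, {12})]
# Mixture 2: one unmodified molecule plus one modified at both sites.
mixture_2 = [apply_mods(BASE, set()), apply_mods(BASE, {4, 12})]

print(bottom_up_view(mixture_1) == bottom_up_view(mixture_2))  # True
print(top_down_view(mixture_1) == top_down_view(mixture_2))    # False
```

In this toy case the peptides alone cannot reveal whether the two modifications sit on the same molecule; only the intact-proteoform view tells the two mixtures apart.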

In their pre-print ‘The Human Proteoform Project: Bringing Proteoforms to Life A Plan to Define the Human Proteome’, which they co-authored with others, they point out:

We propose here a plan to elucidate the complete human proteome– a community effort to identify a definitive reference set of the expressed proteoforms derived from the ~20,000 gene blueprint encoded in the human genome. We outline a two-pronged strategy: on the one hand, we will pursue deep proteoform-level analysis of critical medically-relevant system (neurodegeneration, cardiovascular health, infectious disease, cancer, immunobiology, …this will open up fundamental new insights into these critical medical targets.

In parallel, we will invest heavily in the accelerated development of proteoform discovery and characterization technologies, and apply these technologies to global proteoform-wide analysis…The plan is modeled roughly after the successful roadmap provided by the Human Genome Project (HGP), which provided a reference sequence for the Human Genome…

In 2013, they wrote in a correspondence in Nature Methods:

Accordingly, we propose that the term ‘proteoform’ be used to designate all of the different molecular forms in which the protein product of a single gene can be found, including changes due to genetic variations, alternatively spliced RNA transcripts and post-translational modifications…. Any gene or protein processing events such as those using inteins or RNA-editing mechanisms are now covered cleanly by the term ‘proteoform’.

The job in proteoform analysis, Smith told me, is to obtain the accurate amino acid sequence from the N to the C terminus and then to identify and localize all the post-translational modifications that may be present. This is what he calls ‘class 1’ identification.

Learning about the function of all of those different molecules matters, too, of course. But Smith separates the chemical problem of identifying all proteoforms in a sample of interest from the biological problem of learning what their functions are.

The Human Proteoform Project will include glycoproteoforms. “I am interested in proteoforms particularly and glycoproteoforms are a particularly challenging target,” says Smith.

It is a good idea to include glycoproteomic analysis in the Human Proteoform Project, says Shisheng Sun, a researcher at Northwest University in Xi’an, China. As a research fellow in Hui Zhang’s lab at Johns Hopkins University, Sun was part of the first phase of the National Cancer Institute’s Clinical Proteomic Tumor Analysis Consortium (CPTAC), which sought an enhanced molecular understanding of cancer. In a study published in Nature Communications, they and their colleagues examined glycoproteomic signatures in ovarian cancer.

During CPTAC-2 and CPTAC-3, says Sun, Zhang pushed hard to have glycoproteomic analysis included in the CPTAC project. But, says Sun, at the time most scientists still thought glycosylation was less important than phosphorylation.

Now running his own lab, Sun analyzes glycoproteomes alongside his work in proteomics and phosphoproteomics, so including glycoproteoforms in the future proteoform project fits with his lab’s ongoing work.

His group would probably not have become “all mass spec all the time,” as it is now, had the Orbitrap instrument not become available, says Smith. The quality and volume of its data are quite enabling, he says. Mass spectrometry still has many limitations, though. It’s “much less sensitive than you would think, doesn’t hold a candle to fluorescence, for example, where single molecule has become routine.”

The Smith lab team built a fluorescence detector to observe, during an experiment, the molecules they were electrospraying. It was “super easy to match and exceed the mass spec sensitivity,” says Smith. He champions the development of powerful new technology for comprehensive proteoform analysis in complex systems and is, he says, a “steadfast advocate” for a large Human Proteoform Project, “which would include glycoproteoforms of course.”

Original story published in the Protocols and Methods Community on Aug 23, 2021, by Vivien Marx, Journalist, Nature Portfolio