The Human Genome Project (HGP) was a remarkable and unqualified success, profoundly transforming and accelerating biological and medical research while converting a ~$4B public investment into over $700B of economic activity and new industries (1). The challenge of revealing the “Blueprints of Life,” however, is surpassed by the challenge we face today: deriving from these blueprints an understanding of the structures they dictate and how these function within biological systems.
Proteins are primary effectors of function in biology, and thus, complete knowledge of their structure and behavior is fundamental to deciphering function in basic and translational research. The richness of protein structure and function goes far beyond the linear amino acid sequence dictated by the genetic code. Genetic variation, alternative splicing, and posttranslational modification (PTM) work together to create a rich variety of different proteoforms arising from our genes. The chemical diversity of proteins is foundational for the biological complexes and networks that control biology, yet it remains largely unknown. Genome sequence alone does not provide the needed information; only direct analysis of the proteoforms themselves can reveal their composition, enabling studies of their spatial distributions and temporal dynamics in biological systems. We propose here an ambitious initiative to define the human proteome, that is, to generate a definitive set of reference proteoforms produced from the genome.
The original white paper by Neil Kelleher and coauthors appeared on June 12, 2021 in Science Advances.