FusionProt: Fusing Sequence and Structural Information for Unified Protein Representation Learning
Dan Kalifa, Uriel Singer, Kira Radinsky
ABSTRACT

Accurate protein representation is vital for diverse biological and biomedical applications. While three-dimensional (3D) structural context is central to protein function, most computational approaches either ignore it or fuse it with sequence information in a single late step, yielding limited benefits. We present FusionProt, a unified representation learning framework that iteratively exchanges information between a protein language model and a graph-based structure encoder via a single learnable fusion token. This early, bidirectional conditioning preserves structural cues across layers while maintaining near-constant complexity. Across EC and GO benchmarks, FusionProt achieves state-of-the-art results, improving Fmax by up to 3% over strong joint baselines; on mutation stability prediction it boosts AUROC by 5.1% versus the best structure model, with only 2–5% runtime overhead. We further demonstrate how analysing the specific gains in predictive capability can help

(SOTA) techniques today for 3D protein representation is GearNet [58]. This approach converts the 3D structure of a protein into a graph that captures its biological characteristics. Subsequently, graph neural network techniques [16, 37] are applied to this graph, facilitating the creation of comprehensive protein representations.

Recent research emphasizes the importance of comprehensive protein representation that includes both 1D and 3D structures to capture the protein’s functional and interactional properties accurately. ESM-GearNet [57] was one of the first approaches to integrate these modalities. Although the study explored various fusion strategies, empirical results showed that the most effective method is using a large protein language model (PLM) such as ESM [22] to generate representations, which were then used as context for a graph encoder like GearNet [58]. Other approaches, such as SaProt [41], leverage an AlphaFold-based model [46] to reduce the 3D structure to tokens and train them along with amino
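The iterative exchange described in the abstract, in which a single fusion token passes between a sequence model and a graph-based structure encoder, can be illustrated with a toy sketch. This is not the FusionProt implementation. The layers below are parameter-free averaging stand-ins for a protein language model layer and a message-passing layer, and the names `toy_sequence_layer`, `toy_graph_layer`, and `fuse` are hypothetical. Attaching the fusion token to the graph as a virtual node linked to every residue is likewise an assumption made here for illustration.

```python
from typing import List, Tuple

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def mix(a: Vector, b: Vector) -> Vector:
    """Average two vectors; a stand-in for a learned update."""
    return [(x + y) / 2.0 for x, y in zip(a, b)]


def toy_sequence_layer(tokens: List[Vector]) -> List[Vector]:
    """Stand-in for a language-model layer: each token sees the global mean."""
    context = mean(tokens)
    return [mix(t, context) for t in tokens]


def toy_graph_layer(nodes: List[Vector], neighbors: List[List[int]]) -> List[Vector]:
    """Stand-in for message passing: each node sees the mean of its neighbors."""
    out = []
    for i, feat in enumerate(nodes):
        if neighbors[i]:
            out.append(mix(feat, mean([nodes[j] for j in neighbors[i]])))
        else:
            out.append(list(feat))
    return out


def fuse(
    seq_tokens: List[Vector],
    node_feats: List[Vector],
    neighbors: List[List[int]],
    fusion: Vector,
    num_layers: int = 2,
) -> Tuple[List[Vector], List[Vector], Vector]:
    """Alternate sequence and structure layers, carrying one shared fusion token."""
    n = len(node_feats)
    # Assumption: the fusion token is a virtual node (index n) linked to every residue.
    graph_neighbors = [nbrs + [n] for nbrs in neighbors] + [list(range(n))]
    for _ in range(num_layers):
        # Sequence step: the fusion token joins the residue tokens, then is split off.
        updated = toy_sequence_layer(seq_tokens + [fusion])
        seq_tokens, fusion = updated[:-1], updated[-1]
        # Structure step: the same token joins the residue graph, then is split off.
        updated = toy_graph_layer(node_feats + [fusion], graph_neighbors)
        node_feats, fusion = updated[:-1], updated[-1]
    return seq_tokens, node_feats, fusion


# Three residues in a chain, with 2-dimensional toy features.
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
struct = [[5.0, 5.0], [5.0, 5.0], [5.0, 5.0]]
chain = [[1], [0, 2], [1]]
seq_out, struct_out, token = fuse(seq, struct, chain, fusion=[0.0, 0.0])
```

In this sketch the only path between the two encoders is the fusion token, so the extra work per layer is one additional token on the sequence side and one additional node on the graph side. Structural information first reaches the sequence tokens at the second layer, after the token has visited the graph once.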