DS569k: Protein Sequence and Function Joint Embeddings Dataset

By Donald Bertucci and Alex Endert

Protein embeddings based on function (ProteinCLIP + ESM2) for ~569k proteins from UniProtKB, plus a web app for querying similar proteins given a sequence.
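This README does not say how the web app ranks similar proteins. As a minimal sketch, assuming similarity is measured as cosine similarity between embedding vectors, a brute-force top-k lookup could look like the following. The protein IDs (`P_A`, `P_B`, `P_C`) and the 3-dimensional vectors are hypothetical toy data, not real DS569k embeddings, and `top_k_similar` is an illustrative helper, not part of this project's code.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(
    query: Sequence[float], index: Dict[str, Sequence[float]], k: int = 5
) -> List[Tuple[str, float]]:
    """Hypothetical helper: score every stored embedding against the query and
    return the k (protein_id, similarity) pairs with the highest similarity."""
    scored = [(pid, cosine_similarity(query, emb)) for pid, emb in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy index of hypothetical protein IDs mapped to made-up embedding vectors.
toy_index = {
    "P_A": [1.0, 0.0, 0.0],
    "P_B": [0.9, 0.1, 0.0],
    "P_C": [0.0, 1.0, 0.0],
}
print(top_k_similar([1.0, 0.0, 0.0], toy_index, k=2))
```

Here `P_A` ranks first (identical direction to the query) and `P_B` second. A linear scan is simple but touches every vector per query; for ~569k embeddings a vectorized or approximate nearest-neighbor index would likely be faster, though this sketch does not claim which approach the app uses.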

Demo

Paper

Cite