DS569k: Protein Sequence and Function Joint Embeddings Dataset

By Donald Bertucci and Alex Endert
Protein embeddings based on function (ProteinCLIP + ESM2) for ~569k proteins from UniProtKB, plus a web app to query similar proteins given a sequence.
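To illustrate what querying similar proteins over an embedding table can look like, here is a minimal sketch of a cosine-similarity top-k lookup. It is not the web app's actual implementation: the function name `top_k_similar`, the 128-dimensional vectors, and the random stand-in data are all assumptions made for illustration, and a real query would first embed the input sequence with the model rather than perturb an existing row.

```python
import numpy as np


def top_k_similar(query, embeddings, k=5):
    """Return (indices, scores) of the k rows of `embeddings` closest to `query`.

    Hypothetical helper: closeness is measured by cosine similarity.
    """
    # L2-normalize so that a dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = e @ q
    # Sort descending by similarity and keep the first k.
    order = np.argsort(-sims)[:k]
    return order, sims[order]


# Synthetic stand-in for the dataset (assumed shape: n_proteins x dim).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 128))

# Pretend the query embedding is a slightly noisy copy of protein 42.
query = embeddings[42] + 0.01 * rng.normal(size=128)

idx, scores = top_k_similar(query, embeddings, k=3)
```

A brute-force scan like this is simple to reason about; at the scale of hundreds of thousands of vectors, an approximate nearest-neighbor index is a common alternative, though which approach the demo uses is not stated here.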
Demo
Paper
Cite
BibTeX
@misc{bertucci2024ds569k,
  author = {Donald Bertucci and Alex Endert},
  title  = {DS569k: Protein Sequence and Function Joint Embeddings Dataset},
  year   = {2024},
  url    = {https://xnought.github.io/files/DS569k.pdf},
}