ProteinScatter: Visualizing Structurally Similar Proteins with 3Di Embeddings

By Donald Bertucci

Trained a GPT-like model on more than 300,000 protein 3Di sequences (from Foldseek) and visualized the resulting embeddings in a 2D scatterplot using UMAP.
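As a rough illustration of the embedding step, the sketch below turns 3Di strings into fixed-length vectors. It is not the project's code: the fixed random embedding table and mean pooling stand in for the trained GPT-like model, and `EMBED_DIM`, the function names, and the example sequences are all hypothetical. It also assumes that 3Di states are written with the 20 standard amino-acid letters.

```python
import random

# Assumption: 3Di states are written with the 20 standard amino-acid letters.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
TOKEN_ID = {ch: i for i, ch in enumerate(ALPHABET)}
EMBED_DIM = 8  # hypothetical size, chosen only to keep the example small


def tokenize(seq):
    """Map a 3Di string to integer token ids (case-insensitive)."""
    return [TOKEN_ID[ch] for ch in seq.upper()]


def make_embedding_table(seed=0):
    """Stand-in for learned token embeddings: one fixed random vector per state."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)] for _ in ALPHABET]


def embed(seq, table):
    """Mean-pool the token vectors into one fixed-length vector per sequence."""
    ids = tokenize(seq)
    if not ids:
        raise ValueError("empty sequence")
    return [sum(table[i][d] for i in ids) / len(ids) for d in range(EMBED_DIM)]


table = make_embedding_table()
# Made-up 3Di strings, for illustration only.
sequences = ["DVVLVVLLCVV", "DPPPVVSVVSC", "dvvlvvllcvv"]
vectors = [embed(s, table) for s in sequences]
```

With real model embeddings in place of `vectors`, the 2D coordinates for the scatterplot could then be computed with the umap-learn package, for example `umap.UMAP(n_components=2).fit_transform(vectors)`.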

Molecular Modeling with Juan Vanegas, Oregon State University (2024). Corvallis, OR

Demo

Paper

Cite

BibTeX
@misc{bertucci2024proteinscatter,
  author = {Donald Bertucci},
  title = {ProteinScatter: Visualizing Structurally Similar Proteins with 3Di Embeddings},
  howpublished = {Molecular Modeling with Juan Vanegas, Oregon State University (2024). Corvallis, OR},
  year = {2024},
  url = {https://xnought.github.io/files/protein-scatter.pdf},
}