ProteinScatter: Visualizing Structurally Similar Proteins with 3Di Embeddings

Trained a GPT-like model on 300+ thousand protein 3Di sequences (from Foldseek) and visualized embeddings in a 2D scatterplot via UMAP.
Molecular Modeling with Juan Vanegas, Oregon State University (2024). Corvallis, OR