Donny Bertucci

Interested in applying my skills to decode computation in the brain, discover mechanisms in biology, and improve AI safety/alignment.
I've researched machine learning model interpretability, built systems to analyze protein function, trained large deep learning models, and developed large-scale data visualization tools.
Previously a member of the Georgia Tech Visual Analytics Lab, Oregon State Venom Biochem Lab, Carnegie Mellon Data Interaction Group (DIG), and Oregon State Data Interaction and Visualization Lab.

Education

9.2020 – 6.2024

B.S. Computer Science

Oregon State University
Degree Focus: Artificial Intelligence, Minor: Mathematics

Experience

8.2024 – 11.2024

Georgia Institute of Technology

Research Assistant, GT Visual Analytics Lab
Researched Explainable AI and built interfaces to interact with ML and bio models [_, _, _, _]. Advised by Dr. Alex Endert.
9.2023 – 6.2024

Oregon State University

Software Engineer, Venom Biochemistry and Molecular Biology Lab
Built a system to store and analyze venom protein structure and function [_]. Advised by Michael Youkhateh and Dr. Nathan Mortimer.
6.2023 – 9.2023

Carnegie Mellon University

Data Visualization Research Intern, CMU Data Interaction Group (DIG)
Researched interactive methods to improve language model prompt generation and transparency with Dr. Adam Perer. Developed interactive visualizations of neural network compression/quantization error [_].
9.2022 – 6.2023

Carnegie Mellon University

Research Assistant, CMU Data Interaction Group (DIG)
Developed human-centered ways to evaluate machine learning model behavior within Zeno [_] with Dr. Alex Cabrera. Enabled linked visualizations at scale with Falcon [_] with Dr. Dominik Moritz.
5.2022 – 8.2022

Carnegie Mellon University

HCII Summer Undergraduate Research Program
Developed user interfaces to interactively discover poor behavior in neural networks [_]. Advised by Dr. Alex Cabrera and Dr. Adam Perer. Hosted by the Data Interaction Group (DIG).
8.2021 – 6.2022

Oregon State University

Research Assistant, Data Interaction and Visualization (DIV) Lab
Developed user interfaces to visualize large data and interpret complex machine learning models [_, _]. Mentored and advised by Dr. Minsuk Kahng.
2.2021 – 6.2021

Oregon State University

URSA Engage Research Program
Developed interactive interfaces to visualize difficult concepts in learned neural networks [_, _]. Advised by Dr. Minsuk Kahng.

Publications

Featured

(papers that reflect current interests)

DeepLocalizer: A Library to Find Functional Specialization in Deep Neural Networks

Donald Bertucci
Python library to find function-specific activations in artificial neural networks using fMRI-like localization.

DS569k: Protein Sequence and Function Joint Embeddings Dataset

Donald Bertucci and Alex Endert
Protein embeddings based on function (ProteinCLIP + ESM2) for ~569k proteins from UniProtKB, plus a web app to query similar proteins given a sequence.

Venome: A Computational Analysis Tool for Protein Function

Donald Bertucci, Ansen Garvin, Cora Bailey, Amanda Sinha, Michael Youkhateh, and Nathan Mortimer
A tool to analyze venom protein function (with Foldseek and TM-align integration).
2024 Engineering Expo, Oregon State University. Corvallis, OR

ProteinScatter: Visualizing Structurally Similar Proteins with 3Di Embeddings

Donald Bertucci
Trained a GPT-like model on 300+ thousand protein 3Di sequences (from Foldseek) and visualized embeddings in a 2D scatterplot via UMAP.
Molecular Modeling with Juan Vanegas, Oregon State University (2024). Corvallis, OR

Zeno: An Interactive Framework for Behavioral Evaluation of Machine Learning

Alex Cabrera, Erica Fu, Donald Bertucci, Kenneth Holstein, Ameet Talwalkar, Jason I. Hong, and Adam Perer
ACM Conference on Human Factors in Computing Systems (CHI). Hamburg, Germany, 2023.

DendroMap: Visual Exploration of Large-Scale Image Datasets for Machine Learning with Treemaps

Donald Bertucci, Md Montaser Hamid, Yashwanthi Anand, Anita Ruangrotsakun, Delyar Tabatabai, Melissa Perez, and Minsuk Kahng
IEEE Transactions on Visualization and Computer Graphics (IEEE VIS 2022). Oklahoma City, OK

Conference

C3

Zeno: An Interactive Framework for Behavioral Evaluation of Machine Learning

Alex Cabrera, Erica Fu, Donald Bertucci, Kenneth Holstein, Ameet Talwalkar, Jason I. Hong, and Adam Perer
ACM Conference on Human Factors in Computing Systems (CHI). Hamburg, Germany, 2023.
C2

DendroMap: Visual Exploration of Large-Scale Image Datasets for Machine Learning with Treemaps

Donald Bertucci, Md Montaser Hamid, Yashwanthi Anand, Anita Ruangrotsakun, Delyar Tabatabai, Melissa Perez, and Minsuk Kahng
IEEE Transactions on Visualization and Computer Graphics (IEEE VIS 2022). Oklahoma City, OK
C1

Beyond Value: CHECKLIST for Testing Inferences in Planning-Based Reinforcement Learning

Kin-Ho Lam, Delyar Tabatabai, Jed Irvine, Donald Bertucci, Anita Ruangrotsakun, Minsuk Kahng, and Alan Fern
32nd International Conference on Automated Planning and Scheduling (ICAPS 2022).

Workshop

W4

Venome: A Computational Analysis Tool for Protein Function

Donald Bertucci, Ansen Garvin, Cora Bailey, Amanda Sinha, Michael Youkhateh, and Nathan Mortimer
2024 Engineering Expo, Oregon State University. Corvallis, OR
W3

Mirror: Interactive Discovery of Blindspots in Machine Learning Models

Donald Bertucci, Alex Cabrera, Nari Johnson, Gregory Plumb, Erica Fu, and Adam Perer
Human-Computer Interaction Institute (HCII) Summer Research Showcase (2022). Pittsburgh, PA
W2

Backprop Explainer: Interactive Explanation of Backpropagation in Neural Network Training

Donald Bertucci and Minsuk Kahng
Workshop on Visualization for AI Explainability (VISxAI, IEEE VIS 2021).
W1

An Interactive Introduction to Autoencoders

Donald Bertucci
Workshop on Visualization for AI Explainability (VISxAI, IEEE VIS 2021).

Miscellaneous

M14

DeepLocalizer: A Library to Find Functional Specialization in Deep Neural Networks

Donald Bertucci
Python library to find function-specific activations in artificial neural networks using fMRI-like localization.
M13

Paper Implement: Reproduce Interesting Research Papers for Educational Purposes

Donald Bertucci
Ongoing repository for CS/Bio/ML/Neuro paper reimplementations with code implemented from scratch.
M12

GPT from scratch to generate bioRxiv titles

Donald Bertucci
BPE Tokenizer and Transformer model implemented from scratch to generate or embed bioRxiv titles.
M11

TensorScript: Tensor Library accelerated by WebGPU

Donald Bertucci
Tensor operations and auto differentiation with custom WebGPU kernels.
M10

WebGPU Compute Library

Donald Bertucci
PyCuda-like library for WebGPU to easily run compute shaders with minimal lines of code.
M9

VQ-VAE Explainer: Learn the VQ-VAE Implementation with Interactive Visualization

Donald Bertucci and Polo Chau
Interact with and visualize a VQ-VAE (Vector-Quantized Variational Autoencoder) directly in the browser.
M8

Explore ARC-AGI

Donald Bertucci
Visualize the ARC-AGI dataset with live crossfiltering for compression metrics.
M7

VAE Explainer: Supplement Learning Variational Autoencoders with Interactive Visualization

Donald Bertucci and Alex Endert
Interact with and visualize a Variational Autoencoder directly in the browser.
M6

DS569k: Protein Sequence and Function Joint Embeddings Dataset

Donald Bertucci and Alex Endert
Protein embeddings based on function (ProteinCLIP + ESM2) for ~569k proteins from UniProtKB, plus a web app to query similar proteins given a sequence.
M5

Random Number Generator with Elementary Cellular Automata in Matlab

Donald Bertucci
Random numbers with Elementary Cellular Automata Rule 30 in Matlab + transform to any other distribution.
Mathematical Software with Torrey Johnson, Oregon State University. Corvallis, OR
M4

ProteinScatter: Visualizing Structurally Similar Proteins with 3Di Embeddings

Donald Bertucci
Trained a GPT-like model on 300+ thousand protein 3Di sequences (from Foldseek) and visualized embeddings in a 2D scatterplot via UMAP.
Molecular Modeling with Juan Vanegas, Oregon State University (2024). Corvallis, OR
M3

Visualizing Neural Network Compression

Donald Bertucci and Adam Perer
An interactive article exploring how model compression error affects neural network behavior.
M2

FalconVis: A Library to Cross-Filter Billions of Data Entries on the Web

Donald Bertucci and Dominik Moritz
A JavaScript library for visualizing big data on the web with your custom visualizations and scalable data formats.
M1

Finding the Distance Function in the Poincaré Disk using Stereographic Projection

Donald Bertucci
A paper that derives the Poincaré disk distance function using stereographic projection from Minkowski space.
Non-Euclidean Geometry with Tevian Dray, Oregon State University (2023). Corvallis, OR

Skills

  • ML/AI: Python, PyTorch, JAX, TensorFlow, Keras, Scikit-learn, NumPy.
  • Data Visualization: JavaScript/TypeScript, Svelte, SVG, WebGL, WebGPU, D3, Vega/VL, Matplotlib, Altair.
  • Frontend: JavaScript/TypeScript, HTML/CSS, Svelte, React, Vue, Tailwind, Website Design (Figma).
  • Backend: Python, Pandas, FastAPI, Flask, NodeJS, MySQL, DuckDB, PostgreSQL, C/C++, Assembly.
  • HPC: CUDA, OpenCL, OpenMP, MPI, Slurm, Multi-Processing/Threading, SIMD.
  • OS: Linux, Bash, Git, SSH, FTP, NGINX, Apache, Vim, Docker.
  • Math/Stats: R, MATLAB, Mathematica.
  • Bio: ChimeraX, Mol*, BioPython.
  • Research: LaTeX, Paper Figure Design, Statistical Analysis.

Volunteer

2023
ACM CHI Late-Breaking Work Review
2022 – 2023
IEEE Workshop on Visualization for AI Explainability Program Committee
2018 – 2019
Helped surgeons with cleft lip and palate surgery on two trips (Cure Kenya Clinic in Kijabe)
2019
Led fundraiser for toddler shoes to give to children after surgery (Cure Kenya Clinic in Kijabe)

Awards

2022
NSF Summer REU Scholarship
2021
URSA Engage REU Award
2020 – 2024
Oregon State University Dean's List every term + summa cum laude at graduation
2020 – 2024
Oregon State University Finley Academic Scholarship

References

Dr. Minsuk Kahng

Senior Research Scientist at Google DeepMind

Dr. Alex Cabrera

Founding Engineer at Axiom Bio

Dr. Adam Perer

Professor at Carnegie Mellon University

Dr. Dominik Moritz

Professor at Carnegie Mellon University and Apple ML Research Scientist

Dr. Nathan Mortimer

Professor at Oregon State University