Donny Bertucci
Interested in applying my skills to decode computation in the brain, discover mechanisms in biology, and improve AI safety and alignment.
I've researched machine learning model interpretability, built systems to analyze protein function, trained large deep learning models, and developed large-scale data visualization tools.
Previously a member of the Georgia Tech Visual Analytics Lab, Oregon State Venom Biochemistry Lab, Carnegie Mellon Data Interaction Group (DIG), and Oregon State Data Interaction and Visualization (DIV) Lab.
Education
9.2020 – 6.2024
B.S. Computer Science
Oregon State University
Experience
8.2024 – 11.2024
Georgia Institute of Technology
Research Assistant, GT Visual Analytics Lab
Researched explainable AI and built interfaces for interacting with machine learning and biological models [_, _, _, _]. Advised by Dr. Alex Endert.
9.2023 – 6.2024
Oregon State University
Software Engineer, Venom Biochemistry and Molecular Biology Lab
Built a system to store and analyze venom protein structure and function [_]. Advised by Michael Youkhateh and Dr. Nathan Mortimer.
6.2023 – 9.2023
Carnegie Mellon University
Data Visualization Research Intern, CMU Data Interaction Group (DIG)
Researched interactive methods to improve language model prompt generation and transparency with Dr. Adam Perer. Developed interactive visualizations of neural network compression/quantization error [_].
9.2022 – 6.2023
Carnegie Mellon University
Research Assistant, CMU Data Interaction Group (DIG)
Developed human-centered ways to evaluate machine learning model behavior within Zeno [_] with Dr. Alex Cabrera. Enabled linked visualizations at scale with Falcon [_] with Dr. Dominik Moritz.
5.2022 – 8.2022
Carnegie Mellon University
Developed user interfaces for interactively discovering failure modes (blindspots) in neural networks [_]. Advised by Dr. Alex Cabrera and Dr. Adam Perer. Hosted by the Data Interaction Group (DIG).
8.2021 – 6.2022
Oregon State University
Research Assistant, Data Interaction and Visualization (DIV) Lab
Developed user interfaces to visualize large datasets and interpret complex machine learning models [_, _]. Mentored and advised by Dr. Minsuk Kahng.
2.2021 – 6.2021
Oregon State University
Developed interactive interfaces to visualize difficult concepts in learned neural networks [_, _]. Advised by Dr. Minsuk Kahng.
Publications
Featured
(work that reflects current interests)

DeepLocalizer: A Library to Find Functional Specialization in Deep Neural Networks
Python library to find function-specific activations in artificial neural networks using fMRI-like localization.

DS569k: Protein Sequence and Function Joint Embeddings Dataset
Function-based protein embeddings (ProteinCLIP + ESM2) for ~569k proteins from UniProtKB, plus a web app to query similar proteins given a sequence.

Venome: A Computational Analysis Tool for Protein Function
A tool to analyze venom protein function (with Foldseek and TM-align integration).
2024 Engineering Expo, Oregon State University. Corvallis, OR

ProteinScatter: Visualizing Structurally Similar Proteins with 3Di Embeddings
Trained a GPT-like model on 300k+ protein 3Di sequences (from Foldseek) and visualized the embeddings in a 2D scatterplot via UMAP.
Molecular Modeling with Juan Vanegas, Oregon State University (2024). Corvallis, OR

Zeno: An Interactive Framework for Behavioral Evaluation of Machine Learning
Alex Cabrera, Erica Fu, Donald Bertucci, Kenneth Holstein, Ameet Talwalkar, Jason I. Hong, and Adam Perer
ACM Conference on Human Factors in Computing Systems (CHI). Hamburg, Germany, 2023.

DendroMap: Visual Exploration of Large-Scale Image Datasets for Machine Learning with Treemaps
Donald Bertucci, Md Montaser Hamid, Yashwanthi Anand, Anita Ruangrotsakun, Delyar Tabatabai, Melissa Perez, and Minsuk Kahng
IEEE Transactions on Visualization and Computer Graphics (IEEE VIS 2022). Oklahoma City, OK
Conference

C3
Zeno: An Interactive Framework for Behavioral Evaluation of Machine Learning
Alex Cabrera, Erica Fu, Donald Bertucci, Kenneth Holstein, Ameet Talwalkar, Jason I. Hong, and Adam Perer
ACM Conference on Human Factors in Computing Systems (CHI). Hamburg, Germany, 2023.

C2
DendroMap: Visual Exploration of Large-Scale Image Datasets for Machine Learning with Treemaps
Donald Bertucci, Md Montaser Hamid, Yashwanthi Anand, Anita Ruangrotsakun, Delyar Tabatabai, Melissa Perez, and Minsuk Kahng
IEEE Transactions on Visualization and Computer Graphics (IEEE VIS 2022). Oklahoma City, OK

C1
Beyond Value: CHECKLIST for Testing Inferences in Planning-Based Reinforcement Learning
Kin-Ho Lam, Delyar Tabatabai, Jed Irvine, Donald Bertucci, Anita Ruangrotsakun, Minsuk Kahng, and Alan Fern
32nd International Conference on Automated Planning and Scheduling (ICAPS 2022).
Workshop

W4
Venome: A Computational Analysis Tool for Protein Function
2024 Engineering Expo, Oregon State University. Corvallis, OR

W3
Mirror: Interactive Discovery of Blindspots in Machine Learning Models
Human-Computer Interaction Institute (HCII) Summer Research Showcase (2022). Pittsburgh, PA

W2
Backprop Explainer: Interactive Explanation of Backpropagation in Neural Network Training
Workshop on Visualization for AI Explainability (VISxAI, IEEE VIS 2021).

W1
An Interactive Introduction to Autoencoders
Workshop on Visualization for AI Explainability (VISxAI, IEEE VIS 2021).
Miscellaneous

M14
DeepLocalizer: A Library to Find Functional Specialization in Deep Neural Networks
Python library to find function-specific activations in artificial neural networks using fMRI-like localization.

M13
Paper Implement: Reproduce Interesting Research Papers for Educational Purposes
Ongoing repository of from-scratch reimplementations of CS, biology, ML, and neuroscience papers.

M12
GPT from scratch to generate bioRxiv titles
BPE tokenizer and Transformer model implemented from scratch to generate or embed bioRxiv titles.

M11
TensorScript: Tensor Library Accelerated by WebGPU
Tensor operations and automatic differentiation with custom WebGPU kernels.

M10
WebGPU Compute Library
PyCUDA-like library for WebGPU that runs compute shaders with minimal boilerplate code.

M9
VQ-VAE Explainer: Learn the VQ-VAE Implementation with Interactive Visualization
Interact with and visualize a VQ-VAE (Vector-Quantized Variational Autoencoder) directly in the browser.

M8
Explore ARC-AGI
Visualize the ARC-AGI dataset with live cross-filtering on compression metrics.

M7
VAE Explainer: Supplement Learning Variational Autoencoders with Interactive Visualization
Interact with and visualize a Variational Autoencoder directly in the browser.

M6
DS569k: Protein Sequence and Function Joint Embeddings Dataset
Function-based protein embeddings (ProteinCLIP + ESM2) for ~569k proteins from UniProtKB, plus a web app to query similar proteins given a sequence.

M5
Random Number Generator with Elementary Cellular Automata in MATLAB
Generates random numbers with elementary cellular automaton Rule 30 in MATLAB and transforms them to other distributions.
Mathematical Software with Torrey Johnson, Oregon State University. Corvallis, OR

M4
ProteinScatter: Visualizing Structurally Similar Proteins with 3Di Embeddings
Trained a GPT-like model on 300k+ protein 3Di sequences (from Foldseek) and visualized the embeddings in a 2D scatterplot via UMAP.
Molecular Modeling with Juan Vanegas, Oregon State University (2024). Corvallis, OR

M3
Visualizing Neural Network Compression
An interactive article exploring how model compression error affects neural network behavior.

M2
FalconVis: A Library to Cross-Filter Billions of Data Entries on the Web
A JavaScript library for visualizing big data on the web with custom visualizations and scalable data formats.

M1
Finding the Distance Function in the Poincaré Disk using Stereographic Projection
A paper that derives the Poincaré disk distance function using stereographic projection from Minkowski space.
Non-Euclidean Geometry with Tevian Dray, Oregon State University (2023). Corvallis, OR
Skills
- ML/AI: Python, PyTorch, JAX, TensorFlow, Keras, scikit-learn, NumPy.
- Data Visualization: JavaScript/TypeScript, Svelte, SVG, WebGL, WebGPU, D3, Vega/Vega-Lite, Matplotlib, Altair.
- Frontend: JavaScript/TypeScript, HTML/CSS, Svelte, React, Vue, Tailwind, Website Design (Figma).
- Backend: Python, Pandas, FastAPI, Flask, Node.js, MySQL, DuckDB, PostgreSQL, C/C++, Assembly.
- HPC: CUDA, OpenCL, OpenMP, MPI, Slurm, Multi-Processing/Threading, SIMD.
- OS: Linux, Bash, Git, SSH, FTP, NGINX, Apache, Vim, Docker.
- Math/Stats: R, MATLAB, Mathematica.
- Bio: ChimeraX, Mol*, BioPython.
- Research: LaTeX, Paper Figure Design, Statistical Analysis.
Volunteer
2023
ACM CHI Late-Breaking Work Reviewer
2022 – 2023
IEEE Workshop on Visualization for AI Explainability Program Committee
2018 – 2019
Helped surgeons with cleft lip and palate surgery on two trips (Cure Kenya Clinic in Kijabe)
2019
Led fundraiser for toddler shoes to give to children after surgery (Cure Kenya Clinic in Kijabe)
Awards
2022
NSF Summer REU Scholarship
2021
URSA Engage REU Award
2020 – 2024
Oregon State University Dean's List every term; graduated summa cum laude
2020 – 2024
Oregon State University Finley Academic Scholarship
References
Dr. Minsuk Kahng
Senior Research Scientist at Google DeepMind
Dr. Alex Cabrera
Founding Engineer at Axiom Bio
Dr. Adam Perer
Professor at Carnegie Mellon University
Dr. Dominik Moritz
Professor at Carnegie Mellon University and Apple ML Research Scientist
Dr. Nathan Mortimer
Professor at Oregon State University