Donny Bertucci
I want to help make biology programmable using my skills (ML/AI, Large-Scale Data Visualization,
Design).
I've published machine learning research [_, _, _, _], applied machine
learning to biology [_, _, _], explored the internals of machine learning
architectures [_, _, _, _, _, _], and developed libraries to process large-scale data [_, _, _].
Fortunate to have been a member of the Carnegie Mellon Data Interaction Group (DIG), Oregon State Venom Biochemistry & Molecular Biology Lab, and Oregon State Data Interaction and Visualization Lab.
Education
9.2020 – 6.2024
B.S. Computer Science
Oregon State University
Experience
8.2024 – 11.2024
Georgia Institute of Technology
Research Assistant, GT Visualization Lab
Built interactive visualizations to interpret machine learning model architectures (such as VAEs
and VQ-VAEs) [_, _, _, _].
9.2023 – 6.2024
Oregon State University
Research Assistant, Venom Biochemistry and Molecular Biology Lab
Built a system to store, visualize, and search for similar venom protein structures using
computational methods [_]. Advised by Michael Youkhateh and Dr. Nathan Mortimer.
Summer 2023
Carnegie Mellon University
Research Intern, CMU Data Interaction Group (DIG)
Researched interactive methods to improve language model prompt generation and transparency
with Dr. Adam Perer. Developed interactive visualizations of neural
network compression/quantization error [_].
9.2022 – 6.2023
Carnegie Mellon University
Research Assistant, CMU Data Interaction Group (DIG)
Developed human-centered ways to evaluate machine learning model behavior within Zeno [_] with Dr. Alex Cabrera. Enabled linked visualizations at scale
with Falcon [_] with Dr. Dominik Moritz.
Summer 2022
Carnegie Mellon University
Developed user interfaces to interactively discover poor behavior in neural networks [_]. Advised by Dr. Alex Cabrera and Dr. Adam Perer. Hosted by the Data Interaction Group (DIG).
8.2021 – 6.2022
Oregon State University
Research Assistant, Data Interaction and Visualization (DIV) Lab
Developed user interfaces to visualize large data and interpret complex machine learning models
[_, _]. Mentored and advised by Dr. Minsuk Kahng.
2.2021 – 6.2021
Oregon State University
Developed interactive interfaces to visualize difficult concepts in learned neural networks [_, _]. Advised by Dr. Minsuk Kahng.
Publications
Conference

C3
Zeno: An Interactive Framework for Behavioral Evaluation of Machine Learning
Alex Cabrera, Erica Fu, Donald Bertucci, Kenneth Holstein, Ameet Talwalkar, Jason I. Hong, and Adam Perer
ACM Conference on Human Factors in Computing Systems (CHI). Hamburg, Germany, 2023.

C2
DendroMap: Visual Exploration of Large-Scale Image Datasets for Machine Learning with Treemaps
Donald Bertucci, Md Montaser Hamid, Yashwanthi Anand, Anita Ruangrotsakun, Delyar Tabatabai, Melissa Perez, and Minsuk Kahng
IEEE Transactions on Visualization and Computer Graphics (IEEE VIS 2022). Oklahoma City, OK

C1
Beyond Value: CHECKLIST for Testing Inferences in Planning-Based Reinforcement Learning
Kin-Ho Lam, Delyar Tabatabai, Jed Irvine, Donald Bertucci, Anita Ruangrotsakun, Minsuk Kahng, and Alan Fern
32nd International Conference on Automated Planning and Scheduling (ICAPS 2022).
Workshop

W4
Venome: A Computational Analysis Tool for Protein Function
2024 Engineering Expo, Oregon State University. Corvallis, OR

W3
Mirror: Interactive Discovery of Blindspots in Machine Learning Models
Human-Computer Interaction Institute (HCII) Summer Research Showcase (2022). Pittsburgh, PA

W2
Backprop Explainer: Interactive Explanation of Backpropagation in Neural Network Training
Workshop on Visualization for AI Explainability (VISxAI, IEEE VIS 2021).

W1
An Interactive Introduction to Autoencoders
Workshop on Visualization for AI Explainability (VISxAI, IEEE VIS 2021).
Miscellaneous

M7
VQ-VAE Explainer: Learn the VQ-VAE Implementation with Interactive Visualization
Interact with and visualize a VQ-VAE (Vector-Quantized Variational Autoencoder) directly in the browser.

M6
VAE Explainer: Supplement Learning Variational Autoencoders with Interactive Visualization
Interact with and visualize a Variational Autoencoder directly in the browser.

M5
DS569k: Protein Sequence and Function Joint Embeddings Dataset
Protein embeddings based on function (ProteinCLIP + ESM2) for ~569k proteins from UniProtKB, plus a web app to query similar proteins given a sequence.

M4
Random Number Generator with Elementary Cellular Automata in Matlab
Random numbers with Elementary Cellular Automata Rule 30 in Matlab + transform to any other distribution.
Mathematical Software with Torrey Johnson, Oregon State University. Corvallis, OR

M3
ProteinScatter: Visualizing Structurally Similar Proteins with 3Di Embeddings
Trained a GPT-like model on 300+ thousand protein 3Di sequences (from Foldseek) and visualized embeddings in a 2D scatterplot via UMAP.
Molecular Modeling with Juan Vanegas, Oregon State University (2024). Corvallis, OR

M2
Visualizing Neural Network Compression
An interactive article exploring how model compression error affects neural network behavior.

M1
Finding the Distance Function in the Poincaré Disk using Stereographic Projection
A paper that derives the Poincaré disk distance function using stereographic projection from Minkowski space.
Non-Euclidean Geometry with Tevian Dray, Oregon State University (2023). Corvallis, OR
Software/Packages

S8
nsdget: Quickly use Single Trial Betas and COCO Images from the Natural Scenes Dataset
Python library to easily download and use the single trial betas (1.8 mm resolution) and COCO images from the Natural Scenes Dataset.

S7
DeepLocalizer: A Library to Find Functional Specialization in Deep Neural Networks
Python library to find function-specific activations in artificial neural networks using fMRI-like localization.

S6
Paper Implement: Reproduce Interesting Research Papers for Educational Purposes
Ongoing repository for CS/Bio/ML/Neuro paper reimplementations with code implemented from scratch.

S5
GPT from scratch to generate bioRxiv titles
BPE Tokenizer and Transformer model implemented from scratch to generate or embed bioRxiv titles.
S4
TensorScript: Tensor Library accelerated by WebGPU
Tensor operations and auto differentiation with custom WebGPU kernels.

S3
WebGPU Compute Library
PyCuda-like library for WebGPU to easily run compute shaders with minimal lines of code.

S2
Explore ARC-AGI
Visualize the ARC-AGI dataset with live crossfiltering for compression metrics.

S1
FalconVis: A Library to Cross-Filter Billions of Data Entries on the Web
A JavaScript library for visualizing big data on the web with your custom visualizations and scalable data formats.
Skills
- Languages Python, JavaScript/TypeScript, C/C++, MATLAB, R, Mathematica, Bash.
- ML/AI PyTorch, JAX, TensorFlow, Keras, Scikit-learn, NumPy, CUDA.
- Bio/Neuro ChimeraX, Mol*, BioPython, NiBabel, FSL.
- Frontend Figma, HTML, CSS, Svelte, React, Vue, Tailwind.
- Backend Pandas, FastAPI, Flask, Node.js, MySQL, DuckDB, PostgreSQL, Assembly.
- OS Linux, Git, SSH, FTP, NGINX, Apache, Docker.
- HPC Slurm, CUDA, OpenCL, OpenMP, MPI.
- Data Vis D3, SVG, Canvas, WebGPU, Vega, Matplotlib, Seaborn, Altair.
- Research LaTeX, Figure Design, Statistical Analysis.
- Relevant Coursework Machine Learning, Artificial Intelligence, Parallel Programming, Molecular Modeling, Animal Genetics, Human Behavioral Biology, Discrete Math, Linear Algebra I & II, Numerical Linear Algebra, Non-Euclidean Geometry, Differential Calculus, Integral Calculus, Vector Calculus, Mathematical Statistics I & II.
References
Dr. Minsuk Kahng
Senior Research Scientist at Google DeepMind
Dr. Alex Cabrera
Founding Engineer at Axiom Bio
Dr. Adam Perer
Computer Science Professor at Carnegie Mellon University HCII
Dr. Dominik Moritz
Computer Science Professor at Carnegie Mellon University HCII and Apple ML Research Scientist
Dr. Nathan Mortimer
Biochemistry Professor at Oregon State University