Donny Bertucci

I want to help make biology programmable using my skills in ML/AI, large-scale data visualization, and design.
I've published machine learning research [_, _, _, _], applied machine learning to biology [_, _, _], explored the internals of machine learning architectures [_, _, _, _, _, _], and developed libraries to process large-scale data [_, _, _].
I have been fortunate to be a member of the Carnegie Mellon Data Interaction Group (DIG), the Oregon State Venom Biochemistry and Molecular Biology Lab, and the Oregon State Data Interaction and Visualization Lab.

Education

9.2020 – 6.2024

B.S. Computer Science

Oregon State University
Degree Focus: Artificial Intelligence, Minor: Mathematics

Experience

8.2024 – 11.2024

Georgia Institute of Technology

Research Assistant, GT Visualization Lab
Built interactive visualizations to interpret machine learning model architectures (such as VAEs and VQ-VAEs) [_, _, _, _].
9.2023 – 6.2024

Oregon State University

Research Assistant, Venom Biochemistry and Molecular Biology Lab
Built a system to store, visualize, and search for similar venom protein structures using computational methods [_]. Advised by Michael Youkhateh and Dr. Nathan Mortimer.
Summer 2023

Carnegie Mellon University

Research Intern, CMU Data Interaction Group (DIG)
Researched interactive methods to improve language model prompt generation and transparency with Dr. Adam Perer. Developed interactive visualizations of neural network compression/quantization error [_].
9.2022 – 6.2023

Carnegie Mellon University

Research Assistant, CMU Data Interaction Group (DIG)
Developed human-centered ways to evaluate machine learning model behavior within Zeno [_] with Dr. Alex Cabrera. Enabled linked visualizations at scale with Falcon [_] with Dr. Dominik Moritz.
Summer 2022

Carnegie Mellon University

HCII Summer Undergraduate Research Program
Developed user interfaces to interactively discover poor behavior in neural networks [_]. Advised by Dr. Alex Cabrera and Dr. Adam Perer. Hosted by the Data Interaction Group (DIG).
8.2021 – 6.2022

Oregon State University

Research Assistant, Data Interaction and Visualization (DIV) Lab
Developed user interfaces to visualize large data and interpret complex machine learning models [_, _]. Mentored and advised by Dr. Minsuk Kahng.
2.2021 – 6.2021

Oregon State University

URSA Engage Research Program
Developed interactive interfaces to visualize difficult concepts in learned neural networks [_, _]. Advised by Dr. Minsuk Kahng.

Publications

Conference

C3

Zeno: An Interactive Framework for Behavioral Evaluation of Machine Learning

Alex Cabrera, Erica Fu, Donald Bertucci, Kenneth Holstein, Ameet Talwalkar, Jason I. Hong, and Adam Perer
ACM Conference on Human Factors in Computing Systems (CHI). Hamburg, Germany, 2023.
C2

DendroMap: Visual Exploration of Large-Scale Image Datasets for Machine Learning with Treemaps

Donald Bertucci, Md Montaser Hamid, Yashwanthi Anand, Anita Ruangrotsakun, Delyar Tabatabai, Melissa Perez, and Minsuk Kahng
IEEE Transactions on Visualization and Computer Graphics (IEEE VIS 2022). Oklahoma City, OK
C1

Beyond Value: CHECKLIST for Testing Inferences in Planning-Based Reinforcement Learning

Kin-Ho Lam, Delyar Tabatabai, Jed Irvine, Donald Bertucci, Anita Ruangrotsakun, Minsuk Kahng, and Alan Fern
32nd International Conference on Automated Planning and Scheduling (ICAPS 2022).

Workshop

W4

Venome: A Computational Analysis Tool for Protein Function

Donald Bertucci, Ansen Garvin, Cora Bailey, Amanda Sinha, Michael Youkhateh, and Nathan Mortimer
2024 Engineering Expo, Oregon State University. Corvallis, OR
W3

Mirror: Interactive Discovery of Blindspots in Machine Learning Models

Donald Bertucci, Alex Cabrera, Nari Johnson, Gregory Plumb, Erica Fu, and Adam Perer
Human-Computer Interaction Institute (HCII) Summer Research Showcase (2022). Pittsburgh, PA
W2

Backprop Explainer: Interactive Explanation of Backpropagation in Neural Network Training

Donald Bertucci and Minsuk Kahng
Workshop on Visualization for AI Explainability (VISxAI, IEEE VIS 2021).
W1

An Interactive Introduction to Autoencoders

Donald Bertucci
Workshop on Visualization for AI Explainability (VISxAI, IEEE VIS 2021).

Miscellaneous

M7

VQ-VAE Explainer: Learn the VQ-VAE Implementation with Interactive Visualization

Donald Bertucci and Polo Chau
Interact with and visualize a VQ-VAE (Vector-Quantized Variational Autoencoder) directly in the browser.
M6

VAE Explainer: Supplement Learning Variational Autoencoders with Interactive Visualization

Donald Bertucci and Alex Endert
Interact with and visualize a Variational Autoencoder directly in the browser.
M5

DS569k: Protein Sequence and Function Joint Embeddings Dataset

Donald Bertucci and Alex Endert
Protein embeddings based on function (ProteinCLIP + ESM2) for ~569k proteins from UniProtKB, plus a web app to query similar proteins given a sequence.
M4

Random Number Generator with Elementary Cellular Automata in Matlab

Donald Bertucci
Generates random numbers with Elementary Cellular Automaton Rule 30 in Matlab and transforms them to any other distribution.
Mathematical Software with Torrey Johnson, Oregon State University. Corvallis, OR
M3

ProteinScatter: Visualizing Structurally Similar Proteins with 3Di Embeddings

Donald Bertucci
Trained a GPT-like model on more than 300,000 protein 3Di sequences (from Foldseek) and visualized the embeddings in a 2D scatterplot via UMAP.
Molecular Modeling with Juan Vanegas, Oregon State University (2024). Corvallis, OR
M2

Visualizing Neural Network Compression

Donald Bertucci and Adam Perer
An interactive article exploring how model compression error affects neural network behavior.
M1

Finding the Distance Function in the Poincaré Disk using Stereographic Projection

Donald Bertucci
A paper that derives the Poincaré disk distance function using stereographic projection from Minkowski space.
Non-Euclidean Geometry with Tevian Dray, Oregon State University (2023). Corvallis, OR

Software/Packages

S8

nsdget: Quickly use Single Trial Betas and COCO Images from the Natural Scenes Dataset

Donald Bertucci
Python library to easily download and use the single-trial betas (1.8 mm resolution) and COCO images from the Natural Scenes Dataset.
S7

DeepLocalizer: A Library to Find Functional Specialization in Deep Neural Networks

Donald Bertucci
Python library to find function-specific activations in artificial neural networks using fMRI-like localization.
S6

Paper Implement: Reproduce Interesting Research Papers for Educational Purposes

Donald Bertucci
Ongoing repository of CS/Bio/ML/Neuro paper reimplementations, with code written from scratch.
S5

GPT from scratch to generate bioRxiv titles

Donald Bertucci
BPE tokenizer and Transformer model implemented from scratch to generate or embed bioRxiv titles.
S4

TensorScript: Tensor Library accelerated by WebGPU

Donald Bertucci
Tensor operations and automatic differentiation with custom WebGPU kernels.
S3

WebGPU Compute Library

Donald Bertucci
PyCUDA-like library for WebGPU to run compute shaders with minimal lines of code.
S2

Explore ARC-AGI

Donald Bertucci
Visualize the ARC-AGI dataset with live cross-filtering on compression metrics.
S1

FalconVis: A Library to Cross-Filter Billions of Data Entries on the Web

Donald Bertucci and Dominik Moritz
A JavaScript library for visualizing big data on the web with your custom visualizations and scalable data formats.

Skills

  • Languages: Python, JavaScript/TypeScript, C/C++, Matlab, R, Mathematica, Bash.
  • ML/AI: PyTorch, JAX, TensorFlow, Keras, Scikit-learn, NumPy, CUDA.
  • Bio/Neuro: ChimeraX, Mol*, BioPython, NiBabel, FSL.
  • Frontend: Figma, HTML, CSS, Svelte, React, Vue, Tailwind.
  • Backend: Pandas, FastAPI, Flask, Node.js, MySQL, DuckDB, PostgreSQL, Assembly.
  • OS: Linux, Git, SSH, FTP, NGINX, Apache, Docker.
  • HPC: Slurm, CUDA, OpenCL, OpenMP, MPI.
  • Data Vis: D3, SVG, Canvas, WebGPU, Vega, Matplotlib, Seaborn, Altair.
  • Research: LaTeX, Figure Design, Statistical Analysis.
  • Relevant Coursework: Machine Learning, Artificial Intelligence, Parallel Programming, Molecular Modeling, Animal Genetics, Human Behavioral Biology, Discrete Math, Linear Algebra I & II, Numerical Linear Algebra, Non-Euclidean Geometry, Differential Calculus, Integral Calculus, Vector Calculus, Mathematical Statistics I & II.

References

Dr. Minsuk Kahng

Senior Research Scientist at Google DeepMind

Dr. Alex Cabrera

Founding Engineer at Axiom Bio

Dr. Adam Perer

Computer Science Professor at Carnegie Mellon University HCII

Dr. Dominik Moritz

Computer Science Professor at Carnegie Mellon University HCII and Apple ML Research Scientist

Dr. Nathan Mortimer

Biochemistry Professor at Oregon State University