Caleb Dame | Data Science & AI

# About Me

I'm a data scientist with a track record of building and shipping predictive models that move real business metrics, from churn prediction at credit unions to fraud detection at scale at TransUnion.

My work spans the full ML lifecycle: problem framing, feature engineering, model development, A/B testing, and production deployment. I'm as comfortable wrangling multi-terabyte data in Databricks or PySpark as I am designing the business strategy around a model's output.

I'm currently a Senior Data Scientist at BlastPoint and the founder of Modulus Partners, a data science consultancy. I'm also pursuing an M.S. in Artificial Intelligence at Johns Hopkins, building on a B.S. in Applied and Computational Mathematics & Economics from Brigham Young University.

# Featured Projects

HOA CC&R PDF Parser + RAG Q&A Pipeline

Built an API-triggered document knowledge extraction service for a client who needed to ingest large 50- to 100-page HOA/CC&R PDFs and answer dozens of structured questions automatically. The service accepts a webhook payload, fetches the PDF from a provided URL/path, converts pages to images, runs OCR locally (Tesseract), and chunks the text into blocks whose SentenceTransformer embeddings are stored in a FAISS index for fast, token-efficient retrieval. To answer each question from the relevant snippets, the service uses an OpenAI model for grounded answers, with quality controls that ensure answers match the expected JSON format.

PDF-to-Image (Poppler), OCR (Tesseract / pytesseract), SentenceTransformers Embeddings, FAISS Vector Index / Similarity Search, Retrieval-Augmented Generation (RAG), OpenAI API
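As a rough illustration of the chunk, embed, and retrieve steps described above, here is a minimal standard-library sketch. It is not the production service: the bag-of-words `embed` function is a toy stand-in for SentenceTransformer embeddings, the brute-force `top_k` search is a stand-in for a FAISS index lookup, and the chunk size, overlap, and sample text are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_words(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows (hypothetical chunking policy)."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a SentenceTransformer embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Brute-force similarity search; a stand-in for a FAISS index lookup."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Hypothetical OCR output from a CC&R document.
ocr_text = (
    "Section 4.2 Pets. Each owner may keep no more than two household pets. "
    "Section 5.1 Parking. Vehicles must be parked in assigned spaces only. "
    "Section 6.3 Fences. Fences may not exceed six feet in height."
)
chunks = chunk_words(ocr_text, size=12, overlap=0)
snippets = top_k("How many pets may an owner keep?", chunks, k=1)
print(snippets[0])
```

Only the retrieved snippets, rather than the whole document, would then be passed to the language model along with the question, which is what keeps the approach token-efficient.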

Music-Theoretic Attention Biases for Symbolic Music Generation

Injected music-theory inductive biases into transformer attention for symbolic MIDI generation, encoding circle-of-fifths harmonic distance and temporal onset distance as learned bias embeddings added directly to attention logits. Trained a decoder-only transformer on GigaMIDI (~382K files, ~565M notes) using octuple tokenization across four experimental conditions (no bias, harmonic-only, temporal-only, combined) with data-fraction ablations to measure sample efficiency gains.

PyTorch, Transformers, Music Information Retrieval, Attention Mechanism Design, Symbolic Music (MIDI), Weights & Biases
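To make the harmonic bias concrete, the sketch below computes circle-of-fifths distance between pitches and adds a per-distance bias to causal attention logits before the softmax. It is a simplified, single-head, pure-Python illustration rather than the project's PyTorch code: the `bias_by_distance` values are fixed, hypothetical numbers standing in for the learned bias embeddings, and the logits are placeholders. A temporal onset-distance bias could be added to the same logits in the same way.

```python
import math


def fifths_distance(pitch_a: int, pitch_b: int) -> int:
    """Steps between two MIDI pitches around the circle of fifths (0..6)."""
    # Multiplying a pitch class by 7 semitones (a perfect fifth), mod 12,
    # gives its position on the circle of fifths: C=0, G=1, D=2, ...
    pos_a = (7 * (pitch_a % 12)) % 12
    pos_b = (7 * (pitch_b % 12)) % 12
    diff = abs(pos_a - pos_b)
    return min(diff, 12 - diff)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention_weights(logits, pitches, bias_by_distance):
    """Add a harmonic-distance bias to causal attention logits, then softmax each row."""
    n = len(pitches)
    weights = []
    for i in range(n):
        row = [
            logits[i][j] + bias_by_distance[fifths_distance(pitches[i], pitches[j])]
            for j in range(i + 1)  # causal mask: position i attends to 0..i
        ]
        weights.append(softmax(row) + [0.0] * (n - i - 1))
    return weights


# Hypothetical example: C4, G4, C#4 with uniform raw logits.
pitches = [60, 67, 61]
logits = [[0.0] * 3 for _ in range(3)]
# One bias per distance 0..6; learned in a real model, fixed here for illustration.
bias_by_distance = [0.0, -0.5, -1.0, -1.5, -2.0, -2.5, -3.0]
weights = biased_attention_weights(logits, pitches, bias_by_distance)
```

With these illustrative values, harmonically distant notes receive less attention than close ones when the raw logits are equal; a learned table is free to discover a different shape.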

Reducing Reward Variance in Continuous-Control DRL via Genetic Pre-Training

Investigated whether genetic algorithm (GA) pre-training of policy weights could improve stability, reduce reward variance, and potentially increase reward in deep reinforcement learning (DRL) continuous-control problems, using CleanRL and Weights & Biases tracking. Designed two GA variants to evolve actor networks quickly in parallel, then benchmarked further RL training of the evolved actors against training from randomly initialized weights.

PyTorch, Reinforcement Learning, PPO, Soft Actor-Critic (SAC), Genetic Algorithms, Weights & Biases
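The general idea of evolving weights before handing them to an RL algorithm can be sketched with a generic truncation-selection GA. This is not either of the two variants from the project, whose details are not given here: the function names, hyperparameters, and the `toy_return` fitness (a stand-in for episodic return from environment rollouts) are all hypothetical.

```python
import random


def evolve(fitness, n_params, pop_size=32, n_elite=8, sigma=0.1, generations=50, seed=0):
    """Truncation-selection GA with Gaussian mutation over flat weight vectors."""
    rng = random.Random(seed)
    population = [
        [rng.gauss(0.0, 1.0) for _ in range(n_params)] for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Rank by fitness (higher is better) and keep the elite unchanged,
        # so the best score never gets worse between generations.
        population.sort(key=fitness, reverse=True)
        elite = population[:n_elite]
        # Refill the population with mutated copies of randomly chosen elites.
        children = [
            [w + rng.gauss(0.0, sigma) for w in rng.choice(elite)]
            for _ in range(pop_size - n_elite)
        ]
        population = elite + children
    return max(population, key=fitness)


# Hypothetical stand-in for episodic return: closer to TARGET is better.
TARGET = [0.5, -0.3, 0.8, 0.1]


def toy_return(weights):
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))


# In a pre-training setup, these weights would initialize the actor network
# before PPO or SAC training, instead of a random initialization.
init_weights = evolve(toy_return, n_params=len(TARGET))
```

Because each candidate's fitness is evaluated independently, the evaluations inside a generation are straightforward to run in parallel.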

See All Projects →

# Experience

Senior Data Scientist

BlastPoint Dec. 2025 – Present

Data Scientist

BlastPoint Oct. 2023 – Nov. 2025

Lead development of predictive models for strategic product marketing, churn prediction, and collections optimization used by credit unions and utility partners, directly impacting program adoption rates and retention strategies.

Drive the go-to-market effort for a privacy-affirming strategy: serving ensemble federated prediction models that aggregate insights into generic, industry-wide product models while protecting client data privacy.

Build scalable internal tooling to standardize data science processes, streamline project throughput, monitor model performance, and expose model interpretability to end users.

Data Scientist Consultant & Founder

Modulus Partners Jun. 2023 – Present

Built ML-driven lead qualification models that improved sales efficiency by eliminating 20–40% of low-quality meetings and increased top salesperson conversion rates 2–3x.

Designed and managed ongoing A/B testing programs to optimize marketing funnel performance and application outcomes.

Created and oversaw CRM integrations (HubSpot, Close, Zapier) and real-time dashboards to enable data-driven marketing decisions.

Data Scientist

TransUnion Feb. 2022 – Oct. 2023

Developed fraud detection models using LightGBM and XGBoost, and maintained Python packages for measuring drift in unlabeled production data.

Implemented advanced ML architectures (autoencoders, stacked models, and graph neural networks) to research methods for identifying fraud networks.

Led cross-team collaborations, integrating newly acquired data assets to enrich features and improve model performance.

Data Engineer

M Science LLC Jun. 2021 – Mar. 2022

Designed and maintained 15+ ETL pipelines in Databricks, Airflow, and Snowflake, processing 3.5 TB of transformed data daily for financial analysts.

Built automated anomaly detection using clustering algorithms, increasing data purity by 10–15% and enabling real-time visualization in Tableau dashboards.

Machine Learning Researcher

Brigham Young University, Economics Apr. 2020 – Jul. 2020

Researched ML applications (XGBoost, Neural Networks, Naive Bayes) to extract insights from large-scale census data.

Managed large-scale databases in MS SQL Server, ultimately reducing record-linking error by 80%.

# Skills & Tools

Languages

Python, Unix Shell, HTML, JavaScript, SQL, C++, Stata

ML Frameworks

Scikit-Learn, PyTorch, TensorFlow, Keras, XGBoost, LightGBM, StatsModels, NLTK, transformers

Data Engineering

Databricks, PySpark, Airflow, Snowflake, PostgreSQL, NoSQL

Cloud

AWS (S3, EC2, Lambda, Redshift), GCP, Railway, Vercel

MLOps

Git, Docker, Conda, Weights & Biases, MLflow, Airflow, AWS SageMaker Model Registry, Flask, FastAPI

Visualization

Tableau, Grafana, Matplotlib, Seaborn, Looker

# Education

M.S. in Artificial Intelligence

Johns Hopkins University 2022 – Present

Pursuing advanced coursework in deep learning, AI/MLOps, and reinforcement learning with a focus on transformer architectures.

Transformers, NLP, Reinforcement Learning, Computer Vision

B.S. in Applied & Computational Mathematics, Economics

Brigham Young University 2017 – 2021

Studied applied mathematics and economics with coursework in statistical modeling, optimization, and econometrics. Undergraduate research in machine learning applications for census record-linking.

Applied Mathematics, Econometrics, Statistical Modeling, Optimization

Explore the Full Project Portfolio

Detailed write-ups, demos, and source code for every project.
