Hello, I'm

Caleb Dame

Having Too Much Fun With Data

# About Me

I'm a data scientist with a track record of building and shipping predictive models that move real business metrics — from churn prediction at credit unions to fraud detection at scale at TransUnion.

My work spans the full ML lifecycle: problem framing, feature engineering, model development, A/B testing, and production deployment. I'm as comfortable wrangling multi-terabyte data in Databricks or PySpark as I am designing the business strategy around a model's output.

I'm currently a Senior Data Scientist at BlastPoint and the founder of Modulus Partners, a data science consultancy. I'm also pursuing an M.S. in Artificial Intelligence at Johns Hopkins, building on a B.S. in Applied and Computational Mathematics & Economics from Brigham Young University.

# Featured Projects

HOA CC&R PDF Parser + RAG Q&A Pipeline

Built an API-triggered document knowledge extraction service for a client who needed to ingest large (50- to 100-page) HOA/CC&R PDFs and answer dozens of structured questions automatically. The service accepts a webhook payload, fetches the PDF from a provided URL/path, converts pages to images, runs OCR locally (Tesseract), and chunks the text into blocks whose SentenceTransformer embeddings are stored in a FAISS index for fast, token-efficient retrieval. To answer each question, the service passes the most relevant snippets to an OpenAI model for grounded answers, with quality controls that ensure the answers match the expected JSON format.

PDF-to-Image (Poppler), OCR (Tesseract / pytesseract), SentenceTransformers Embeddings, FAISS Vector Index / Similarity Search, Retrieval-Augmented Generation (RAG), OpenAI API
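The chunk, retrieve, and validate steps of that pipeline can be sketched in a few lines. This is a minimal, standard-library-only illustration, not the production code: token counts stand in for SentenceTransformer embeddings, a brute-force cosine ranking stands in for the FAISS index, and the function names, chunk sizes, and the `answer` key in the JSON schema are all hypothetical.

```python
import json
import math
import re
from collections import Counter


def chunk_text(text, chunk_words=200, overlap=40):
    """Split OCR text into overlapping windows of words."""
    if overlap >= chunk_words:
        raise ValueError("overlap must be smaller than chunk_words")
    words = text.split()
    step = chunk_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_words]))
        if start + chunk_words >= len(words):
            break
    return chunks


def embed(text):
    """Stand-in embedding: lowercase token counts (not a neural model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question, chunks, k=3):
    """Brute-force similarity search (stand-in for a vector index)."""
    query = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(query, embed(c)), reverse=True)
    return ranked[:k]


def validate_answer(raw, required_keys):
    """Return the parsed answer if it is a JSON object with the required keys, else None."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(parsed, dict) or not set(required_keys).issubset(parsed):
        return None
    return parsed


# Hypothetical snippets standing in for chunks of an OCR'd CC&R document.
snippets = [
    "Pets are limited to two per household.",
    "Short-term rentals under thirty days are prohibited.",
    "Fences may not exceed six feet in height.",
]
best = top_k("Are short-term rentals allowed?", snippets, k=1)
```

Only the retrieved snippets, rather than the full document, would be sent to the language model, which is where the token savings come from. A response that fails `validate_answer` could be retried or flagged for review.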

Reducing Reward Variance in Continuous-Control DRL via Genetic Pre-Training

Investigated whether genetic algorithm (GA) pre-training of policy weights could improve stability, reduce reward variance, and potentially increase reward in deep reinforcement learning (DRL) continuous-control problems, using CleanRL and Weights & Biases tracking. Designed two GA variants that evolve actor networks quickly in parallel, then benchmarked further RL training of the pre-trained agents against agents with randomly initialized weights.

PyTorch, Reinforcement Learning, PPO, Soft Actor-Critic (SAC), Genetic Algorithms, Weights & Biases
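The general idea of evolving weights before gradient-based training can be sketched as below. This is a generic elitist GA with Gaussian mutation, not either of the two variants from the project: the toy fitness function, the target vector, and all hyperparameters are hypothetical. In a real experiment the fitness would be the episodic return of an actor network in the environment.

```python
import random

# Hypothetical target standing in for "good" policy weights.
TARGET = [0.5, -1.0, 2.0]


def toy_return(weights):
    """Toy stand-in for episodic return: higher is better, maximum is 0."""
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))


def evolve(fitness_fn, n_weights, pop_size=20, n_elite=4,
           generations=30, sigma=0.1, seed=0):
    """Elitist GA: keep the top candidates unchanged, refill with mutated copies."""
    rng = random.Random(seed)
    population = [[rng.gauss(0, 1) for _ in range(n_weights)]
                  for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        ranked = sorted(population, key=fitness_fn, reverse=True)
        history.append(fitness_fn(ranked[0]))
        elites = ranked[:n_elite]
        # Each child is a Gaussian-perturbed copy of a randomly chosen elite.
        children = [[w + rng.gauss(0, sigma) for w in rng.choice(elites)]
                    for _ in range(pop_size - n_elite)]
        population = elites + children
    best = max(population, key=fitness_fn)
    return best, history


best_weights, history = evolve(toy_return, n_weights=len(TARGET))
```

The returned `best_weights` would then initialize the actor before PPO or SAC training, in place of random initialization. Because the elites carry over unchanged, the best fitness in `history` never decreases from one generation to the next.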

Real-Time Web Lead Qualification and Calendar Routing

Built real-time, ML-driven lead-scoring models with dynamic rerouting that cut low-quality sales meetings by 20–40% and boosted top salesperson conversion rates 2–3x, reducing the need for large sales headcounts. Delivered for clients at Modulus Partners alongside A/B testing programs and CRM integrations with HubSpot, OnceHub, and Zapier.

FastAPI, Scikit-Learn, A/B Testing, Site Tracking Data Collection, Campaign Management, HubSpot API, Custom Zapier Webhook

Census Record-Linking Research

ML research within the BYU Economics department applying boosted-tree classifiers to census data to identify and track individuals named in multiple census years. Using graph-theoretical relationships between documents to choose record pairs strategically for training and inference reduced the record-linking error rate by 80%.

Graph Theory, XGBoost, Cross-Discipline, MS SQL Server, Research

# Experience

BlastPoint

Senior Data Scientist

BlastPoint Dec. 2025 – Present

Data Scientist

BlastPoint Oct. 2023 – Nov. 2025

Lead development of predictive models for strategic product marketing, churn prediction, and collections optimization used by credit unions and utility partners, directly impacting program adoption rates and retention strategies.

Bringing to market a privacy-affirming strategy that aggregates insights to develop generic, industry-wide product models, which can then be fine-tuned automatically to a new client's unique customer base.

Building scalable internal tooling to standardize data science processes, streamline project throughput, monitor model performance, and expose model interpretability to end users.

Modulus Partners

Data Scientist Consultant & Founder

Modulus Partners Jun. 2023 – Present

Built ML-driven lead qualification models that improved sales efficiency by eliminating 20–40% of low-quality meetings and increased top salesperson conversion rates 2–3x.

Designed and managed ongoing A/B testing programs to optimize marketing funnel performance and application outcomes.

Created and oversaw CRM integrations (HubSpot, Close, Zapier) and real-time dashboards to enable data-driven marketing decisions.

TransUnion

Data Scientist

TransUnion Feb. 2022 – Oct. 2023

Developed fraud detection models using LightGBM and XGBoost, and maintained Python packages for measuring drift in unlabeled production data.

Implemented advanced ML architectures — Autoencoders, Stacked Models, and Graph Neural Networks — to research applications for identifying fraud networks.

Led cross-team collaborations, integrating newly acquired data assets to enrich features and improve model performance.

M Science LLC

Data Engineer

M Science LLC Jun. 2021 – Mar. 2022

Designed and maintained 15+ ETL pipelines in Databricks, Airflow, and Snowflake, processing 3.5 TB of transformed data daily for financial analysts.

Built automated anomaly detection using clustering algorithms, increasing data purity by 10–15% and enabling real-time visualization in Tableau dashboards.

Brigham Young University, Economics

Machine Learning Researcher

Brigham Young University, Economics Apr. 2020 – Jul. 2020

Researched ML applications — XGBoost, Neural Networks, Naive Bayes — to extract insights from large-scale census data.

Managed large-scale databases in MS SQL Server, ultimately reducing record-linking error by 80%.

# Skills & Tools

Languages

Python, Unix Shell, HTML, JavaScript, SQL, C++, STATA

ML Frameworks

Scikit-Learn, PyTorch, TensorFlow, Keras, XGBoost, LightGBM, StatsModels, NLTK, transformers

Data Engineering

Databricks, PySpark, Airflow, Snowflake, PostgreSQL, NoSQL

Cloud

AWS (S3, EC2, Lambda, Redshift), GCP, Railway, Vercel

MLOps

Git, Docker, Conda, Weights & Biases, MLflow, Flask, FastAPI

Visualization

Tableau, Grafana, Matplotlib, Seaborn, Looker

# Education

Johns Hopkins University

M.S. in Artificial Intelligence

Johns Hopkins University 2022 – Present

Pursuing advanced coursework in deep learning, AI/MLOps, and reinforcement learning with a focus on transformer architectures.

Transformers, NLP, Reinforcement Learning, Computer Vision

Brigham Young University

B.S. in Applied & Computational Mathematics, Economics

Brigham Young University 2017 – 2021

Studied applied mathematics and economics with coursework in statistical modeling, optimization, and econometrics. Undergraduate research in machine learning applications for census record-linking.

Applied Mathematics, Econometrics, Statistical Modeling, Optimization