Paul Anzel’s Portfolio

Introduction

Hello there, my name is Paul Anzel! I’m a Data Engineer/Data Scientist with experience across the entire data pipeline. I was most recently employed with Codecov (acquired by Sentry), where I worked on building ETL pipelines, handling ad-hoc data analyses, and integrating our data stack with Sentry’s stack. Prior to that I worked at H-E-B (data workflow productionalization and data quality management), at Metromile (managing our ETL system and using telematics data to detect vehicle crashes and evaluate potential insurance fraud), and at Wiser (price-demand estimation).

I did doctoral research (left ABD) in Applied Physics at Caltech with Chiara Daraio, where I worked on developing a new type of acoustic imaging system for non-destructive evaluation. During my graduate research I received a NASA Space Technology Research Fellowship and an M.S. in Applied Physics. I have a B.S. and B.A. in Chemical Physics and Mathematics (respectively) from Rice University.

Outside of work, I’ve been involved in different political efforts. I was a volunteer with Tech for Campaigns, where I was recognized as a “Super Volunteer” for improving our processes and documentation around email fundraising. Before that, I worked with East Bay For Everyone to advocate for building more housing in the SF Bay Area. I am an instructor with Software Carpentry, have helped organize the mentorship program for the SciPy conference, and am on the NumFOCUS Affiliate Project Selection Committee. I managed Caltech’s bicycle repair cooperative for three years, play accordion and piano, and have started getting into ham radio.

I live with my lovely wife Rose, young son Isaac, and fussy cats Coltrane, Simone, and Dinah.

Public talks

Dead simple CI with pre-commit (local meetup, 2024)

A lightning talk introducing the pre-commit framework. I emphasize the security potential of using this tool. Database passwords in plaintext are real and can hurt you.

Introduction to dbt (local DS Meetup, 2023)

An introduction to dbt (data build tool). The presentation has a live demo, but the middle set of slides covers everything that would be in the demo. I used Rob Conery’s A Curious Moon as my inspiration for generating a data set. The resulting dbt project can be seen here.

How do you test data workflows? (SciPy 2022)

I’ve long wondered how to go about testing data code: how do you even unit-test a SQL query? After a lot of thought and experimentation, I think I have an approach I’m happy with. Most advice I see focuses on the standard Python testing tooling, but I see a three-pronged approach (testing, static analysis, and data quality management) as the solution to this problem.

2022 SciPy video here.

2023 SATX Data Science Meetup slides here.
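As a rough illustration of the testing prong only, here is a minimal sketch of unit-testing a SQL query against an in-memory SQLite database. It is not necessarily the approach from the talk; the `orders` table, the query, and the function names are all hypothetical.

```python
import sqlite3

# Hypothetical query under test: total order amount per customer.
TOTALS_QUERY = """
    SELECT customer_id, SUM(amount) AS total
    FROM orders
    GROUP BY customer_id
    ORDER BY customer_id
"""


def run_totals_query(rows):
    """Load fixture rows into a throwaway in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        return conn.execute(TOTALS_QUERY).fetchall()
    finally:
        conn.close()


def test_totals_are_grouped_by_customer():
    # Small hand-built fixture whose expected totals are easy to verify by eye.
    fixture = [(1, 10.0), (1, 5.0), (2, 7.5)]
    assert run_totals_query(fixture) == [(1, 15.0), (2, 7.5)]
```

A test runner such as pytest would pick up `test_totals_are_grouped_by_customer` automatically. Static analysis and data quality checks on production data would still be needed alongside a test like this.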

Introduction to Bayesian A/B testing (local DS Meetup, 2022)

With my involvement in Tech for Campaigns, I was always stymied that the email lists I worked with never had enough of a population for good statistical power for A/B testing. I wondered if Bayesian methods could play more of a role, and after looking into them I’m never going back to Frequentist statistics. I cover a lot of the approach outlined by the VWO paper and go a bit into picking sample sizes for A/B tests based on a multi-armed-bandit approach.
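To give a flavor of why Bayesian methods help with small lists, here is a minimal sketch of a Beta-Binomial comparison of two email variants. It is a simplified stand-in rather than the full VWO method: the uniform Beta(1, 1) prior, the function name, and the example counts are all assumptions made for illustration.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Estimate P(rate_B > rate_A) by sampling from each variant's posterior.

    Assumes a uniform Beta(1, 1) prior on each conversion rate, so the
    posterior is Beta(1 + conversions, 1 + non-conversions).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


# Hypothetical counts: 200 recipients per variant, 10 vs 30 conversions.
print(prob_b_beats_a(10, 200, 30, 200))
```

The output is a direct probability that variant B is better, which stays interpretable even when the sample is too small for a conventional significance test to be well powered.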

PandasUDFs - One Weird Trick to Scaled Ensembles (Data + AI Summit 2021)

Presentation on the use of Spark PandasUDFs to speed up workflows where you might use regular Python UDFs or are building ensembles of models. They really were my One Weird Trick for productionalization at H-E-B.

Git-ting along with others (PyDataLA 2019)

Tutorial on using Git for collaboration. Topics included creating issues, branching, and doing code review. I’m very proud of the pun. Slides can be found here.

Hi, I’m Your Technical Interviewer: Advice for Breaking Into Industry (SciPy 2019)

Quick talk for folks looking to make the transition from academia to industry, sharing some of the advice I wish I had had.

The Science(?) of Documentation (WriteTheDocs 2017)

Inspired by Greg Wilson’s commentary that the state of software-development research is thin and poorly distributed on the ground, I did some investigation on my own looking into what research there is regarding software documentation. Long story short, there’s precious little. I attended the WTD conference to see if I could find any more, and ended up giving a lightning talk about how little we know and how to assess relevant scientific evidence.

Some projects

Statistics Study Group Notes

During my time at Wiser, I started a journal club to go over some techniques I wanted to learn more about, and to teach some of the other analysts some basic techniques (e.g. what a Fourier transform is). Upon moving to Metromile, I was pleased to see that they had an active study group there. Here is a collection of IPython notebooks where I go through implementing various algorithms myself and demonstrate their use.

A few I’m particularly proud of:

Beer analysis

I had some friends in grad school with whom I’d grab a Friday afternoon beer, and eventually I started rating the different beers we’d try. Many years and beers later, I’ve learned some interesting things about the beers I like.

A Moveable Feast Kinetic Sculpture

An eight-person pedal-powered dining table. The project was led by Daniel Busby; I provided many bicycle parts, ergonomic advice, and plenty of drilling and grinding.

PyBadge conference badge

Some code for an Adafruit PyBadge to work as a fun name badge for conferences. Has some blinkenlights for good measure.

Pantsuit Politics bingo card

Little bingo card generator based on this tweet for one of my favorite podcasts. The biggest challenge was finding a good way to get interactive widgets to work online. I initially tried to get this operational with Pyodide (link) but came to the conclusion that Pyodide wasn’t ready for me to use. The current version is hosted on Binder.

Domestic Violence Program mapping

Mapping and basic analysis of domestic violence programs in the Los Angeles area completed on request of a colleague.