Projects

I contribute to the stochastic gradient algorithms module for scalable MCMC. PyMC3 is a Python package for Bayesian statistical modeling and probabilistic machine learning, focusing on advanced Markov chain Monte Carlo (MCMC) and variational inference (VI) algorithms.

After repeatedly solving missing-data problems with some form of tensor or matrix factorization, I figured it was worthwhile to build my own library of these techniques. These algorithms are commonly used for building recommendation engines.
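To give a flavor of the idea, here is a minimal sketch of matrix factorization for missing data, fit by stochastic gradient descent on the observed entries only. It is not the library's code: the function names (`factorize`, `predict`, `rmse`), the hyperparameters, and the toy ratings are all assumptions made up for illustration.

```python
import random

def factorize(ratings, n_rows, n_cols, rank=2, lr=0.02, reg=0.01, epochs=2000, seed=0):
    """Fit ratings[(i, j)] ~ dot(P[i], Q[j]) by SGD over the observed entries only."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        for (i, j), value in ratings.items():
            err = value - sum(p * q for p, q in zip(P[i], Q[j]))
            for k in range(rank):
                p_ik, q_jk = P[i][k], Q[j][k]
                # Gradient step on squared error with L2 regularization
                P[i][k] += lr * (err * q_jk - reg * p_ik)
                Q[j][k] += lr * (err * p_ik - reg * q_jk)
    return P, Q

def predict(P, Q, i, j):
    """Estimate entry (i, j), whether or not it was observed."""
    return sum(p * q for p, q in zip(P[i], Q[j]))

def rmse(ratings, P, Q):
    """Root mean squared error on the observed entries."""
    sq = [(v - predict(P, Q, i, j)) ** 2 for (i, j), v in ratings.items()]
    return (sum(sq) / len(sq)) ** 0.5

# Hypothetical user x item ratings; any (user, item) pair not listed is missing
observed = {
    (0, 0): 5.0, (0, 1): 3.0, (0, 3): 1.0,
    (1, 0): 4.0, (1, 3): 1.0,
    (2, 0): 1.0, (2, 1): 1.0, (2, 3): 5.0,
    (3, 0): 1.0, (3, 2): 5.0, (3, 3): 4.0,
}
P, Q = factorize(observed, n_rows=4, n_cols=4)
print("training RMSE:", round(rmse(observed, P, Q), 3))
print("filled-in estimate for (1, 1):", round(predict(P, Q, 1, 1), 2))
```

The same pattern is what a recommendation engine relies on: the missing cells are exactly the user and item pairs you want to score.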

I have contributed to visualizations. I use it for data exploration and data analysis for my projects. It is an open-source enterprise-ready business intelligence web application.

This project uses a public dataset maintained by the Centers for Medicare & Medicaid Services (CMS) on every physician who accepts Medicare. The data file contains key attributes about each physician and how they are employed in groups. I did a network analysis of the data to find out how physicians work together.
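As a rough sketch of this kind of analysis, the snippet below links physicians who share a group practice and then looks at degrees and connected components. The `(physician_id, group_id)` rows and all function names are hypothetical; the real CMS file has its own columns and is far larger.

```python
from itertools import combinations

# Hypothetical (physician_id, group_id) rows standing in for the CMS file
rows = [
    ("dr_a", "g1"), ("dr_b", "g1"), ("dr_c", "g1"),
    ("dr_c", "g2"), ("dr_d", "g2"),
    ("dr_e", "g3"),
]

def co_practice_graph(rows):
    """Build an undirected graph with an edge between physicians who share a group."""
    members = {}
    for physician, group in rows:
        members.setdefault(group, set()).add(physician)
    graph = {}
    for physicians in members.values():
        for p in physicians:
            graph.setdefault(p, set())  # keep solo practitioners as isolated nodes
        for a, b in combinations(sorted(physicians), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def connected_components(graph):
    """Return the sets of physicians reachable from one another (depth-first search)."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components

graph = co_practice_graph(rows)
degree = {p: len(neighbors) for p, neighbors in graph.items()}
components = connected_components(graph)
print(degree)
print(components)
```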

I implemented multi-threaded training of a Naive Bayes classifier using Akka Streams. It trains on 10k negative and 3.4k positive HTML files in 30 seconds on a MacBook.
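The reason Naive Bayes training parallelizes well is that the model is just a set of counts, and counts from independent shards can be added together. The sketch below illustrates that in single-threaded Python rather than Akka Streams; it is not the project's code, and the function names and toy documents are assumptions.

```python
import math
from collections import Counter

def train(docs):
    """docs: list of (tokens, label). Returns (class counts, per-class word counts, vocab)."""
    class_counts, word_counts, vocab = Counter(), {}, set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def merge(a, b):
    """Combine two partial models; counts are additive, so shards can train independently."""
    classes = a[0] + b[0]
    words = {label: a[1].get(label, Counter()) + b[1].get(label, Counter())
             for label in classes}
    return classes, words, a[2] | b[2]

def predict(model, tokens):
    """Pick the label with the highest log posterior, using add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(n_docs / total_docs)
        for tok in tokens:
            if tok in vocab:  # skip words never seen during training
                score += math.log((counts[tok] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical tokenized documents
docs = [
    ("cheap pills buy now".split(), "positive"),
    ("buy cheap watches".split(), "positive"),
    ("meeting notes for monday".split(), "negative"),
    ("project meeting agenda".split(), "negative"),
    ("notes from the project review".split(), "negative"),
]

# Train on two shards separately, then merge, as parallel workers would
model = merge(train(docs[:2]), train(docs[2:]))
print(predict(model, "buy cheap pills".split()))
print(predict(model, "project meeting notes".split()))
```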

Notes and code from my attempt to build three regression models with heavy-tailed noise drawn from a common random variable, implemented using PyMC3.

I built this with Akka. It listens to any Reddit comment feed via Akka HTTP and processes it using Akka Streams. The output is persisted to an Apache Kafka queue as the sink.

An implementation of the dining philosophers problem and other multi-agent problems, such as shared ledger systems, using Akka actors and Akka Cluster, with data persistence on Apache Cassandra.

Interests

Experience

Amino Inc

Data Scientist • April, 2016 — August, 2018

I worked on algorithms and data pipelines for the cost transparency tool. I was the technical lead on a partnership project between Amino and Aon, where I collaborated with senior actuaries and consultants with extensive experience in the health benefits space. We wrote about it here. I have also written about my work in Demystifying healthcare costs with canonical episodes of care and How deep learning can help us understand physician specialties from billions of insurance claims.

Lumiata

Data Scientist • July, 2014 — March, 2016

I worked as a lead engineer on the design and implementation of the claims data pipeline. It ran graph algorithms on billions of claim files ingested from FTP, a FHIR server, or a REST API. We built this on a SMACK stack.