Shashank Shekhar


PyMC3 is a Python package for Bayesian statistical modeling and probabilistic machine learning, focusing on advanced Markov chain Monte Carlo (MCMC) and variational inference (VI) algorithms. I contribute to its stochastic gradient algorithms module for scalable MCMC.

After trying to solve missing-data problems with various tensor and matrix factorization techniques, I decided it was worthwhile to build my own library of these algorithms. They are commonly used for building recommendation engines.
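
As a rough illustration of the idea (not code from the library itself), matrix factorization fills a missing cell by fitting low-rank row and column factors to the observed cells only. The function names, hyperparameters, and the plain-SGD training loop below are all assumptions made for this sketch.

```python
import random


def factorize(observed, n_rows, n_cols, rank=1, lr=0.02, reg=0.01,
              epochs=3000, seed=0):
    """Fit factors P (rows) and Q (columns) to the observed cells by SGD.

    `observed` maps (row, col) -> value; missing cells are simply absent.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    cells = list(observed.items())
    for _ in range(epochs):
        rng.shuffle(cells)
        for (i, j), value in cells:
            pred = sum(P[i][k] * Q[j][k] for k in range(rank))
            err = value - pred
            for k in range(rank):
                p, q = P[i][k], Q[j][k]
                # Gradient step on squared error with L2 regularization.
                P[i][k] += lr * (err * q - reg * p)
                Q[j][k] += lr * (err * p - reg * q)
    return P, Q


def predict(P, Q, i, j):
    """Reconstruct cell (i, j), observed or missing, from the factors."""
    return sum(p * q for p, q in zip(P[i], Q[j]))


# A 3x3 rank-1 table (row i, column j holds (i+1)*(j+1)) with one cell hidden.
observed = {(i, j): float((i + 1) * (j + 1))
            for i in range(3) for j in range(3) if (i, j) != (2, 2)}
P, Q = factorize(observed, n_rows=3, n_cols=3, rank=1)
estimate = predict(P, Q, 2, 2)  # estimate for the hidden cell
```

In a recommendation setting the rows would be users, the columns items, and the hidden cells the ratings to predict.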

It is an open-source, enterprise-ready business intelligence web application. I have contributed to its visualizations, and I use it for data exploration and analysis in my projects.

Using a public dataset maintained by the Centers for Medicare & Medicaid Services (CMS) on every physician who accepts Medicare, I did a network analysis to find out how physicians work together. The data file contains key attributes about each physician and how they are employed in groups.

Implemented multi-threaded training of a naive Bayes classifier using Akka Streams. It trains on 10k negative and 3.4k positive HTML files in 30 seconds on a MacBook.
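
Naive Bayes training parallelizes well because it reduces to counting tokens, and partial counts from separate chunks can be merged by addition. The sketch below shows that structure in Python with a thread pool rather than Akka Streams; the function names, whitespace tokenization, and Laplace smoothing are assumptions for illustration, and Python threads will not reproduce the original's throughput.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_tokens(docs):
    """Count whitespace-separated tokens in one chunk of documents."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def train(docs_by_label, workers=4, chunk_size=2):
    """Count tokens per label in parallel chunks, then merge the partial counts."""
    model = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for label, docs in docs_by_label.items():
            chunks = [docs[i:i + chunk_size]
                      for i in range(0, len(docs), chunk_size)]
            total = Counter()
            for partial in pool.map(count_tokens, chunks):
                total.update(partial)  # merging counters is plain addition
            model[label] = (len(docs), total)
    return model


def classify(model, text):
    """Return the label with the highest Laplace-smoothed log posterior."""
    vocab = set().union(*(counts for _, counts in model.values()))
    n_docs = sum(n for n, _ in model.values())
    scores = {}
    for label, (n, counts) in model.items():
        total_tokens = sum(counts.values())
        score = math.log(n / n_docs)  # log prior
        for token in text.lower().split():
            score += math.log((counts[token] + 1) / (total_tokens + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)
```

Because the merge step is associative, the same counting stage could be distributed across stream stages or machines without changing the resulting model.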

Notes and code from my attempt to build three regression models with heavy-tailed noise drawn from a common random variable, implemented using PyMC3.

Built with Akka. It listens to any Reddit comment feed via Akka HTTP and processes it using Akka Streams. The output is persisted to an Apache Kafka queue as the sink.

An implementation of the dining philosophers problem and other multi-agent problems, such as shared ledger systems, using Akka actors and Akka Cluster, with data persistence on Apache Cassandra.
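
For readers unfamiliar with the dining philosophers problem, here is a minimal sketch of one classic deadlock-free solution. It uses Python threads and locks with a fixed fork-acquisition order, which is a different technique from the actor-based approach described above; the function name and parameters are assumptions for illustration.

```python
import threading


def dine(n_philosophers=5, meals=3):
    """Run the simulation and return how many meals each philosopher ate."""
    if n_philosophers < 2:
        raise ValueError("need at least two philosophers")
    forks = [threading.Lock() for _ in range(n_philosophers)]
    eaten = [0] * n_philosophers

    def philosopher(i):
        left, right = i, (i + 1) % n_philosophers
        # Always pick up the lower-numbered fork first. A global ordering
        # on resources rules out the circular wait that causes deadlock.
        first, second = min(left, right), max(left, right)
        for _ in range(meals):
            with forks[first]:
                with forks[second]:
                    eaten[i] += 1  # only thread i writes this slot

    threads = [threading.Thread(target=philosopher, args=(i,))
               for i in range(n_philosophers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return eaten
```

An actor-based version replaces the shared locks with messages: each fork is an actor that grants or refuses requests, so no philosopher ever blocks while holding a resource.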



Hypertrack Inc

Lead Data Scientist • August, 2018 — February, 2019

I worked on data pipelines and algorithms that turn workforce movement data into accurate, descriptive stories. I wrote about this work in "Reducing on-demand delivery time with activity data". Some of my projects,

Amino Inc

Data Scientist • April, 2016 — August, 2018

I worked on algorithms and data pipelines for the cost transparency tool. I led development of multiple machine learning models in a data science team of 3-4 people. I was the technical lead on a project partnership between Amino and Aon, where I collaborated with senior actuaries and consultants with extensive experience in the health benefits space. I have written about my work in "Demystifying healthcare costs with canonical episodes of care" and "How deep learning can help us understand physician specialties from billions of insurance claims". Some of my projects,


Lead Data Engineer • July, 2014 — March, 2016

I led multiple projects in a team of 5-7 data engineers. I was a lead engineer on the design and implementation of the claims data pipeline, which ran graph algorithms on billions of claim files ingested from FTP, a FHIR server, or a REST API. We built this on the SMACK stack (Spark, Mesos, Akka, Cassandra, Kafka). Some of my projects,