Shashank Shekhar
I like to do data analysis
Projects
PyMC3
I contribute to the stochastic gradient algorithms module for scalable MCMC. PyMC3 is a Python package for Bayesian statistical modeling and Probabilistic Machine Learning focusing on advanced Markov chain Monte Carlo (MCMC) and variational inference (VI) algorithms.
Noisy matrix completion
After repeatedly solving missing-data problems with some form of tensor or matrix factorization, I figured it was worthwhile to build my own library of these algorithms. They are commonly used for building recommendation engines.
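As a rough illustration of the matrix factorization idea (not code from the library itself), the sketch below fits low-rank factors to the observed entries of a small matrix with stochastic gradient descent and then predicts a held-out cell. The function names, hyperparameters, and toy data are all assumptions made for this example.

```python
import random


def factorize(observed, n_rows, n_cols, rank=1, lr=0.01, reg=0.01,
              epochs=5000, seed=0):
    """Fit factors U (n_rows x rank) and V (n_cols x rank) by SGD.

    observed is a list of (row, col, value) triples; every other cell
    is treated as missing and never enters the loss.
    """
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        for i, j, value in observed:
            err = value - sum(u * v for u, v in zip(U[i], V[j]))
            for k in range(rank):
                u_ik, v_jk = U[i][k], V[j][k]
                # Gradient step on squared error with L2 regularization.
                U[i][k] += lr * (err * v_jk - reg * u_ik)
                V[j][k] += lr * (err * u_ik - reg * v_jk)
    return U, V


def predict(U, V, i, j):
    """Estimate cell (i, j) as the dot product of its row and column factors."""
    return sum(u * v for u, v in zip(U[i], V[j]))


# Hypothetical toy data: a rank-1 matrix with entry (2, 2) held out.
full = [[r * c for c in (1, 2, 3)] for r in (1, 2, 3)]
observed = [(i, j, full[i][j]) for i in range(3) for j in range(3)
            if (i, j) != (2, 2)]

U, V = factorize(observed, n_rows=3, n_cols=3)
train_rmse = (sum((v - predict(U, V, i, j)) ** 2 for i, j, v in observed)
              / len(observed)) ** 0.5
held_out = predict(U, V, 2, 2)
print(f"train RMSE: {train_rmse:.3f}, held-out estimate: {held_out:.2f}")
```

In a recommendation setting the rows would be users, the columns items, and the held-out cell a rating to predict. Real data is noisy and far larger, so the rank and regularization strength would need tuning.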
Apache Superset (incubating)
It is an open-source, enterprise-ready business intelligence web application. I have contributed to its visualizations, and I use it for data exploration and analysis in my projects.
Physician Compare
This project uses a public dataset maintained by the Centers for Medicare & Medicaid Services (CMS) on every physician who accepts Medicare. The data file contains key attributes about each physician and how they are employed in groups. I did a network analysis of the data to find out how physicians work together.
Classifying Wikipedia pages as `disease` or `not disease`
I implemented multi-threaded training of a naive Bayes classifier using Akka Streams. It trains on 10k negative and 3.4k positive HTML files in 30 seconds on a MacBook.
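For readers unfamiliar with the technique, here is a minimal single-threaded sketch of a multinomial naive Bayes text classifier with Laplace smoothing. It is written in plain Python rather than Akka Streams, does no HTML parsing, and uses a made-up four-document corpus. The class name and data are assumptions for illustration, not the project's code.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial naive Bayes over whitespace tokens with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha            # smoothing strength
        self.word_counts = {}         # label -> Counter of token counts
        self.doc_counts = Counter()   # label -> number of training documents
        self.vocab = set()

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            tokens = text.lower().split()
            self.doc_counts[label] += 1
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Tokens never seen in training carry no evidence and are skipped.
        tokens = [t for t in text.lower().split() if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            for tok in tokens:
                score += math.log((counts[tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy corpus standing in for the extracted page text.
docs = [
    "malaria is an infectious disease spread by mosquitoes",
    "diabetes is a chronic disease affecting blood sugar",
    "paris is the capital city of france",
    "the guitar is a string instrument",
]
labels = ["disease", "disease", "not disease", "not disease"]

model = NaiveBayes().fit(docs, labels)
print(model.predict("a chronic infectious disease"))
print(model.predict("the capital city of spain"))
```

Training only accumulates per-class counts, and counts from separate batches of documents can be summed. That is what makes this kind of model straightforward to train in parallel.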
Bayesian linear regression
Notes and code from my attempt to build three regression models with heavy-tailed noise drawn from a common random variable, implemented using PyMC3.
RedditFlow
I built this with Akka. It listens to any Reddit comment feed via Akka HTTP and processes it using Akka Streams. The output is persisted to an Apache Kafka topic, which acts as the sink.
Dining philosophers and other multi-agent problems
An implementation of the dining philosophers problem and other multi-agent problems, such as shared ledger systems, using Akka actors and Akka Cluster, with data persistence on Apache Cassandra.
Interests
- I like reading books on sociology, economics, and other related topics. I try to stay up to date with weekly essays by Zat Rana, andrewgelman, cafehayek, marginalrevolution and philosophical economics.
- I wrote a piece on "Why I read books on history".
- I enjoy storytelling and, at some point, decided to post a few stories on Medium.
Experience
Hypertrack Inc
Lead Data Scientist • August, 2018 — February, 2019
I worked on data pipelines and algorithms to make accurate and descriptive stories out of workforce movement. I wrote about "Reducing on-demand delivery time with activity data".
Amino Inc
Data Scientist • April, 2016 — August, 2018
I worked on algorithms and data pipelines for the cost transparency tool. I led development of multiple machine learning models in a data science team of 3-4 people. I was the technical lead on a project partnership between Amino and Aon, where I collaborated with senior actuaries and consultants with extensive experience in the health benefits space. I have written about my work in "Demystifying healthcare costs with canonical episodes of care" and "How deep learning can help us understand physician specialties from billions of insurance claims".
Lumiata
Lead Data Engineer • July, 2014 — March, 2016
I led multiple projects in a team of 5-7 data engineers. I worked as a lead engineer on the design and implementation of the claims data pipeline. It ran graph algorithms on billions of claim files ingested from FTP, a FHIR server, or a REST API. We built this on a SMACK stack.
Social Links
- Github: https://github.com/shkr
- LinkedIn: https://www.linkedin.com/in/shashank-shekhar-01506949/