# Google Summer of Code '20 Highlights with NumFOCUS

This post summarizes the work done over the GSoC coding period. Let's get started.

## About the project

My GSoC proposal was about adding a Variational Inference (VI) interface to PyMC4. Unlike MCMC algorithms, which sample from the posterior, VI fits an approximating distribution to it. The plan was to implement two Variational Inference algorithms - Mean Field ADVI and Full Rank ADVI.
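Concretely, VI turns inference into optimization: pick a family of approximating distributions q(z; θ) and maximize the evidence lower bound (ELBO) over the variational parameters θ:

$$
\mathrm{ELBO}(\theta) = \mathbb{E}_{q(z;\,\theta)}\big[\log p(x, z) - \log q(z;\theta)\big]
$$

Mean Field ADVI restricts q to a fully factorized Gaussian; Full Rank ADVI allows a full covariance matrix, which can capture correlations between parameters.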

## Resolving key challenges

| Key Challenges | Solutions proposed | How it's resolved |
|---|---|---|
| `theano.clone` equivalent for TF2 | Model execution with replaced inputs | The Normal distribution's `sample` method is executed over a flattened view of parameters |
| Flattened view of parameters | Use `tf.reshape()` | Used `tf.concat()` with `tf.reshape()` |
| Optimizers for ELBO | Use `tf.keras.optimizers` | Optimizers from either TFv1 or TFv2, with defaults from `pymc3.updates`, can be used |
| Initialization of MeanField and FullRank ADVI | Manually set bijectors | Relied on `tfp.TransformedVariable` |
| Progress bar | Use `tqdm` or `tf.keras.utils.Progbar` | A small hack over `tf.print` |
| Minibatch processing of data | Capture slice in memory | The only incomplete feature. The `tf.data.Dataset` API may need more exploration, or we may implement our own `tfp.vi.fit_surrogate_posterior` |
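The flattened-view trick from the table can be sketched with numpy in place of `tf.concat()`/`tf.reshape()` (the parameter names and shapes here are made up for illustration):

```python
import numpy as np

# Hypothetical sketch: pack every model parameter into a single flat
# vector (so one Normal can be sampled over it), then unpack by shape.
params = {"mu": np.zeros((3,)), "sigma": np.ones((2, 2))}

def flatten(params):
    # Record each shape so the flat vector can be split back later.
    shapes = {k: v.shape for k, v in params.items()}
    flat = np.concatenate([v.reshape(-1) for v in params.values()])
    return flat, shapes

def unflatten(flat, shapes):
    out, i = {}, 0
    for k, shape in shapes.items():
        size = int(np.prod(shape))
        out[k] = flat[i:i + size].reshape(shape)
        i += size
    return out

flat, shapes = flatten(params)
restored = unflatten(flat, shapes)
```

Round-tripping through `flatten`/`unflatten` recovers every parameter with its original shape.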

## Community Bonding Period

- This was a super interesting period. I got to know many PyMC core developers through Slack.
- I spent the entire time learning about the basics of Bayesian statistics, prior, posterior predictive checks, and the theory of Variational Inference.
- I also wrote a blog post during this interval about the nuts and bolts of VI, including an implementation of Mean Field ADVI in TensorFlow Probability. Here is the blog post - Demystify Variational Inference.
- The most difficult part of learning VI was to understand the transformations because PyMC3 and TFP handle transformations differently.

## Month 1

The coding period started on June 1, and my goal for this period was to add a basic, general Variational Inference interface to PyMC4. Here is the PR #280. The workflow of the basic interface was:

- Get the vectorized `log_prob` of the model.
- For each parameter of the model, create a Normal distribution with the same shape, then build a posterior using `tfd.JointDistributionSequential`.
- Add optimizers with defaults from PyMC3 and perform VI using `tfp.vi.fit_surrogate_posterior`.
- Sample from `tfd.JointDistributionSequential`; there is no need for an equivalent of `theano.clone`.
- Transform the samples by querying the `SamplingState`, but `Deterministics` have to be added as well.
- Resolve shape issues with ArviZ - in short, setting `chains=1`.
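In miniature, the workflow boils down to: pose a Gaussian surrogate per parameter, estimate the ELBO gradient with the reparameterization trick, and run an optimizer. A hypothetical numpy-only sketch on a conjugate model (the actual interface relies on `tfd.JointDistributionSequential` and `tfp.vi.fit_surrogate_posterior` instead):

```python
import numpy as np

# Toy Mean Field ADVI, numpy only. Model: x ~ N(theta, 1) with prior
# theta ~ N(0, 1), so the exact posterior is Gaussian and known in
# closed form - a good sanity check for the variational fit.
rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.0, size=50)
n = len(x)

# Exact conjugate posterior, for reference.
post_var = 1.0 / (n + 1.0)
post_mean = x.sum() * post_var

mu, log_sd = 0.0, 0.0  # variational parameters of q(theta) = N(mu, sd)
lr = 0.05
for _ in range(2000):
    eps = rng.normal()
    sd = np.exp(log_sd)
    theta = mu + sd * eps  # reparameterization trick
    # Gradient of log p(x|theta) + log p(theta) w.r.t. theta.
    dlogp = (x - theta).sum() - theta
    grad_mu = dlogp
    grad_log_sd = dlogp * sd * eps + 1.0  # +1 from the entropy of q
    mu += lr * grad_mu / n
    log_sd += lr * grad_log_sd / n
```

Because the model is conjugate, `mu` and `exp(log_sd)` converge (up to Monte Carlo noise) to the exact posterior mean and standard deviation.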

I got the basic interface merged by late June, and then it was time to work on Full Rank ADVI. I managed to open PR #289 with a Full Rank ADVI interface by the end of June.

## Month 2

This was the most dramatic month of the GSoC coding period, because the Full Rank ADVI proposed in PR #289 resulted in errors most of the time. Here is the gist of the workflow that was followed to get useful insights about the errors:

- Instead of solving the shape issues independently and posing a `MvNormal` distribution for each parameter, build the posterior using a flattened view of parameters.
- There were lots of NaNs in the ELBO because of improper handling of transformations. As a result, `Interval`, `LowerBounded`, and `UpperBounded` transformations were added as well.
- Then came the issue of Cholesky decomposition errors while working with Gaussian Processes and Variational Inference. Here are a few insights after rigorous testing with different inputs:
  - Use dtype `tf.float64` with Full Rank ADVI to maintain positive definiteness of the covariance matrix.
  - Avoid aggressive optimization of the ELBO; keep learning rates around `1e-3`.
  - Stabilize the diagonal of the covariance matrix by adding a small jitter.
  - Double check for NaNs in the data.
- Here are the results after trying reparametrization and different jitter amounts while doing VI.
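A minimal numpy sketch of the jitter trick (the matrix and jitter value are illustrative): a rank-deficient covariance can make the Cholesky factorization fail, while a tiny diagonal offset restores strict positive definiteness.

```python
import numpy as np

# Illustrative values: this covariance is singular (rank 1), so a
# Cholesky factorization of it can fail. Adding a small jitter to the
# diagonal makes every eigenvalue strictly positive.
cov = np.array([[1.0, 1.0],
                [1.0, 1.0]])
jitter = 1e-6
stabilized = cov + jitter * np.eye(2)
L = np.linalg.cholesky(stabilized)  # succeeds on the stabilized matrix
```

The jitter biases the covariance slightly, which is why small values like `1e-6` are preferred.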

I got this PR merged by the end of July, and then it was time to work on adding some features to ADVI.

## Month 3

After adding the missing transformations in PR #289, my mentor asked me to write a proposal so that Bounded Distributions are inherited instead of applying transformations manually to each distribution. I explored every possibility to make a generalized version of transformations, as is done in PyMC3 using `tf.cond`. Since we do not have values before model execution, it was difficult to use `tf.cond`. Here is the proposal's source.

After getting an interface to use MeanField and FullRank ADVI, here are some features included in PR #310:

- Add a progress bar (a small hack over `tf.print`).
- Test the progress bar on different operating systems.
- Add a `ParameterConvergence` criterion to test convergence.
- Add LowRank approximation.
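The progress-bar hack itself lives behind `tf.print`, but the idea can be sketched in plain Python (this stand-in uses `sys.stdout` and a carriage return; the function name and format are made up):

```python
import sys

# Sketch of a single-line progress bar: instead of a full widget like
# tqdm, overwrite one terminal line per step using a carriage return.
def progress_line(step, total, width=20):
    filled = int(width * step / total)
    bar = "=" * filled + " " * (width - filled)
    return f"\r[{bar}] {step}/{total}"

for step in range(1, 101):
    sys.stdout.write(progress_line(step, 100))
sys.stdout.write("\n")
```

The same formatting trick works inside a TensorFlow training loop, where printing must go through `tf.print` to play well with graph execution.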

I am still working on adding examples with hierarchical models, and I hope to get it merged soon.

## Contributions

Here are the pull requests I opened and got merged during GSoC. I have explained each one above, but here is a summary:

- Add Variational Inference Interface: #280
- Add Full Rank Approximation: #289
- Add features to ADVI: #310 (WIP)
- Remove transformations for Discrete distributions: #314

## Gists created

Whatever experiments I perform to aid my learning, I polish and share through GitHub gists. I do not know why, but I have come to prefer sharing code through GitHub gists rather than Colab or a GitHub repo. Here are all the experiments I performed with ADVI during this summer:

- Comparison of MeanField ADVI in TFP, PyMC3, PyMC4: Source
- Demonstration of shape issues while working with InferenceData: Source
- Playing around Convergence and Optimizers: Source
- Tracking all parameters including deterministics: Source
- Implementation of FullRank ADVI in TFP: Source
- Comparison of MeanField and FullRank ADVI over correlated Gaussians: Source
- Model flattening and Full Rank ADVI in PyMC4: Source
- Missing transformations in PyMC4: Source
- Testing transformations in PyMC4: Source
- Distribution Enhancement Proposal: Source
- Hacking `tf.print` for progress bar: Source
- Parameter Convergence Checks in TFP: Source

## Future Goals

Some future tasks I would like to work on:

- Configure minibatch processing of data.
- Add Normalizing Flows to variational inference interface.
- Add support of Variational AutoEncoders to PyMC4.

## Conclusion

It was an incredible experience contributing to open source, and I have improved my Python skills along the way. I want to thank my mentors @ferrine and @twiecki for being extremely supportive throughout this entire journey. I have loved my time with the PyMC community. I also want to thank the @numfocus community for providing this opportunity via Google Summer of Code.

Thank you for being a part of this fantastic summer.

With love, Sayam Kumar