RL | Mayank Kumar Pal

I assume the reader is familiar with the basics of reinforcement learning and has a basic understanding of statistics and a bit of calculus. You should be comfortable manipulating value functions, policies, and Bellman equations. The main idea of this blog post is to summarize and extend the understanding of reinforcement learning methods that directly optimize the policy. More or less, this post is a summary for me to revisit the concepts and various tricks that are helpful when dealing with policy-based optimization....