You can reach out to me at mynkpl1998@gmail.com.
[~]$
whoamiYou can reach out to me at mynkpl1998@gmail.com.
[~]$
whoamiWhat is tokenization ? Machine learning models operate on numerical representations of input data. Language models, such as LLMs, work with text input. Tokenization is the process of converting the text into smaller chunks called tokens, which are then assigned numerical representations. The collection of unique tokens forms what we call a vocabulary. It is essentially a dictionary that maps each token to a unique integer. The size of the vocabulary is determined by the number of unique tokens in the corpus....
Bitwise Operators The following are the bitwise operators which are available in C/C++ programming language. Operator Meaning & Bitwise AND | Bitwise OR ^ Bitwise XOR ~ Bitwise complement « Shift left » Shift right XOR The main property of XOR is that it keep the bit same if both operands are same, else it flips the bit. A B Result 0 0 0 0 1 1 1 0 1 1 1 0 Few Properties of XOR Property Result A ^ 0 A A ^ 1 ~A A ^ A 0 A ^ A ^ A A A ^ B ^ A B A ^ B ^ B A Few properties of Shift operators....
I assume the reader is familiar with the basics of Reinforcement learning and has a basic understanding of statistics and a bit of calculus. One should be comfortable with manipulating value functions, policy, and bellman equations. The main idea of writing this blog post is to summarize and extend the understanding of reinforcement learning methods that directly optimizes policy. More or less, this blog post is a summary for me to revisit the concepts and various tricks that are helpful while dealing with Policy-based optimization....