Demystifying the Magical ChatGPT

Hi there, everyone! Let's start today with Generative AI (Gen AI), and let's also decode GPTs along the way.
What is Gen AI?
Gen AI is a field of AI that deals with generating new data from a given input prompt, using a model that was pre-trained on some dataset. The generation could be Text-to-Text, Text-to-Image, or Image-to-Text.
For example: OpenAI's ChatGPT.
GPT
GPT stands for Generative Pre-trained Transformer, i.e.,
Generative:- The model is generative in nature, meaning it generates an output from the given input.
Pre-trained:- The generation is done with the help of pre-trained data, so that output relevant to the input can be generated.
Transformer:- The Transformer is the part (or function) responsible for generating the output from the given input. It generally produces only the next token at a time (a token is a small piece of the output, such as a word, part of a word, or a single letter).
How does the GPT model work?
Token
Tokens are what we use to communicate with the model. As we know, computer systems do not understand human language directly, so we first convert our input into tokens, and the output produced by the Transformer (the AI model) is also in the form of tokens.
For each AI model (like ChatGPT, Gemini, etc.), the conversion from the input (prompt) to tokens and from tokens to the output can be different.
These two processes are called Tokenisation and De-tokenisation.
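To make Tokenisation and De-tokenisation concrete, here is a minimal sketch. It assumes a tiny, made-up word-level vocabulary (`VOCAB` and the helper names are hypothetical); real models such as ChatGPT or Gemini use their own, much larger sub-word vocabularies, so the actual ids will differ.

```python
# Toy word-level tokeniser. The vocabulary is hypothetical and only for
# illustration; real models learn far larger sub-word vocabularies.
VOCAB = ["<EOS>", "<UNK>", "a", "river", "bank", "the", "hdfc"]
TOKEN_TO_ID = {token: idx for idx, token in enumerate(VOCAB)}


def tokenise(text: str) -> list[int]:
    """Convert text into a list of integer token ids."""
    words = text.lower().split()
    # Words missing from the vocabulary map to the <UNK> (unknown) token.
    return [TOKEN_TO_ID.get(word, TOKEN_TO_ID["<UNK>"]) for word in words]


def detokenise(token_ids: list[int]) -> str:
    """Convert a list of token ids back into text."""
    return " ".join(VOCAB[i] for i in token_ids)


ids = tokenise("A river bank")
print(ids)              # [2, 3, 4]
print(detokenise(ids))  # a river bank
```

Notice that the model never sees the words themselves, only the numbers.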
How does the Transformer work?
As we know, the transformer only generates (or predicts) the next token. But how does it do that?
Conceptually, all the data in the dataset is mapped onto a vector graph/database in the form of vector embeddings. The relationships between the data items are then established, and this is what lets the transformer predict the next token.
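As a rough illustration of how relationships between embeddings can be measured, here is a small sketch using cosine similarity. The three-dimensional vectors are hand-picked assumptions for this example; a real model learns its embeddings during training, with hundreds or thousands of dimensions.

```python
import math

# Hypothetical 3-dimensional embeddings, hand-picked for illustration only.
EMBEDDINGS = {
    "river": [0.9, 0.1, 0.0],
    "lake": [0.8, 0.2, 0.1],
    "money": [0.0, 0.9, 0.3],
}


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


river_lake = cosine_similarity(EMBEDDINGS["river"], EMBEDDINGS["lake"])
river_money = cosine_similarity(EMBEDDINGS["river"], EMBEDDINGS["money"])

# Related words sit closer together in the vector space.
print(river_lake > river_money)  # True
```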
So we first create the input embeddings from the input prompt (after tokenisation).
Then we place these vector embeddings (input embeddings) on the graph.
But before placing the vectors over the graph, we perform 2 more operations:
Positional Encodings:- These add information about each token's position in the sequence, so that similar tokens at different positions do not clash.
Self Attention:- In this step we let the vector embeddings talk to each other so that their meaning can be understood in context. E.g.:-
A River Bank.
An HDFC Bank.
Both of these contain the word "Bank", but with different meanings. The difference only comes out once the embeddings can identify each other.
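The two operations above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not how any particular GPT is implemented: the positional encoding uses the sinusoidal scheme from the original Transformer design, the embeddings are made-up 4-dimensional vectors, and the attention step skips the learned query/key/value weights by using the input vectors directly for all three.

```python
import math


def positional_encoding(position: int, dim: int) -> list[float]:
    """Sinusoidal encoding: sine on even indices, cosine on odd indices."""
    encoding = []
    for i in range(dim):
        angle = position / (10000 ** ((i - i % 2) / dim))
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding


def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors: list[list[float]]) -> list[list[float]]:
    """Scaled dot-product attention with Q = K = V = vectors (no learned weights)."""
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # Each output is a weighted mix of every vector in the sequence.
        outputs.append(
            [sum(w * value[j] for w, value in zip(weights, vectors)) for j in range(dim)]
        )
    return outputs


# Hypothetical embeddings, chosen only for illustration.
EMBEDDINGS = {
    "river": [1.0, 0.0, 0.0, 0.0],
    "hdfc": [0.0, 1.0, 0.0, 0.0],
    "bank": [0.0, 0.0, 1.0, 0.0],
}


def contextual_vectors(words: list[str]) -> list[list[float]]:
    """Add positional encodings to the embeddings, then apply self-attention."""
    inputs = []
    for position, word in enumerate(words):
        pe = positional_encoding(position, 4)
        inputs.append([e + p for e, p in zip(EMBEDDINGS[word], pe)])
    return self_attention(inputs)


bank_near_river = contextual_vectors(["river", "bank"])[1]
bank_near_hdfc = contextual_vectors(["hdfc", "bank"])[1]
print(bank_near_river != bank_near_hdfc)  # True
```

The word "bank" starts from the same embedding in both phrases, but after self-attention its vector is different because it has mixed in information from its neighbour.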
Then we place these encodings on the vector graph along with the pre-trained data.
Then the transformer predicts the next token based on the SoftMax values of the possible next tokens.
- The SoftMax values give the probability of each possible next token, and the next token is then chosen based on these values.
Some models always choose the token with the highest probability, while others sample from among the likely tokens.
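Here is a minimal sketch of the SoftMax step and of the two ways of choosing the next token. The candidate tokens and their raw scores (often called logits) are made-up values for illustration; a real model scores every token in its vocabulary.

```python
import math
import random


def softmax(logits: list[float]) -> list[float]:
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical candidates and raw scores for the next token.
candidates = ["bank", "boat", "tree"]
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)

# Option 1: always pick the token with the highest probability.
greedy = candidates[probs.index(max(probs))]
print(greedy)  # bank

# Option 2: sample a token in proportion to its probability.
rng = random.Random(0)  # seeded so repeated runs give the same result
sampled = rng.choices(candidates, weights=probs, k=1)[0]
print(sampled)
```

Always picking the top token gives the same answer every time, while sampling lets less likely tokens through occasionally, which makes the output more varied.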
Finally, this process is repeated until the transformer gives us a token that marks the end of the string, something like <EOS>. At that point the transformer's work is complete. The newly generated tokens are then converted back into text through De-tokenisation.
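The whole loop can be sketched as follows. The `NEXT_TOKEN` table and `predict_next_token` function are hypothetical stand-ins for a trained transformer (a real model would compute probabilities over its whole vocabulary at each step), and a simple join stands in for De-tokenisation.

```python
# Hypothetical stand-in for a trained model: maps the last token to the next one.
NEXT_TOKEN = {"the": "river", "river": "bank", "bank": "<EOS>"}


def predict_next_token(tokens: list[str]) -> str:
    """Return the next token; a real transformer would predict it from all tokens."""
    return NEXT_TOKEN.get(tokens[-1], "<EOS>")


def generate(prompt_tokens: list[str], max_new_tokens: int = 10) -> list[str]:
    """Keep predicting one token at a time until <EOS> or the length limit."""
    tokens = list(prompt_tokens)
    generated = []
    for _ in range(max_new_tokens):
        next_token = predict_next_token(tokens)
        if next_token == "<EOS>":
            break  # the model signalled the end of the string
        tokens.append(next_token)  # the new token becomes part of the input
        generated.append(next_token)
    return generated


new_tokens = generate(["the"])
print(" ".join(new_tokens))  # river bank
```

The `max_new_tokens` limit is a safety net in case the end-of-string token never shows up.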
Summary
This article explores the intriguing world of Generative AI, focusing on the Generative Pre-trained Transformer (GPT) model. It explains how GPT models, like ChatGPT, function by generating data based on input prompts and pre-trained datasets. The article breaks down the components of GPT—Generative, Pre-trained, and Transformer—highlighting their roles in producing human-like text. It also delves into the concept of tokens and the process of tokenization and de-tokenization, which are crucial for communication between humans and AI models. Additionally, the article provides insights into how transformers work, including the use of vector embeddings, positional encodings, and self-attention mechanisms to predict the next token in a sequence.



