Attention is all you need: Self Attention Clearly explained!

An illustrated guide! ✍️ Author: @akshay_pachaar

In 2017, a groundbreaking paper titled "Attention is All You Need" introduced the transformer architecture, which led to the Large Language Model (LLM) revolution that we witness today.

At the heart of this architecture lies the attention mechanism.

In this post, I'll clearly explain self-attention & how it can be thought of as a directed graph.

Before we start, a quick primer on tokenization!

Raw text → Tokenization → Embedding → Model

An embedding is a meaningful representation of each token (roughly a word) as a vector of numbers.

This embedding is what we provide as an input to our language models.

Check this👇
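To make this concrete, here's a minimal sketch of the pipeline in PyTorch, assuming a toy whitespace "tokenizer" and a tiny made-up vocabulary (real LLMs use subword tokenizers such as BPE, but the flow is the same):

```python
import torch
import torch.nn as nn

# Toy example: a hypothetical whitespace "tokenizer" and a tiny vocabulary.
vocab = {"i": 0, "love": 1, "tennis": 2}
text = "I love tennis"

# Raw text -> token ids
token_ids = torch.tensor([vocab[w] for w in text.lower().split()])  # tensor([0, 1, 2])

# Token ids -> embeddings: each token becomes a vector of numbers
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)
x = embedding(token_ids)

print(x.shape)  # torch.Size([3, 8]) -- 3 tokens, 8 numbers each
```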

The core idea of language modeling is to understand the structure and patterns within language.

By modeling the relationships between words (tokens) in a sentence, we can capture the context and meaning of the text.

Now, self-attention is a communication mechanism that helps establish these relationships, expressed as probability scores.

Each token assigns the highest score to itself and additional scores to other tokens based on their relevance.

You can think of it as a directed graph 👇
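To make the graph picture concrete, here's a small, made-up attention matrix for "I love tennis": each row is a token, each entry is the weight on the outgoing edge to another token, and every row sums to 1.

```python
import torch

# Hypothetical attention weights for the tokens ["I", "love", "tennis"].
# Row i holds the probability scores token i assigns to every token
# (including itself): the weights on the outgoing edges of node i.
attn = torch.tensor([
    [0.60, 0.30, 0.10],  # "I"      -> itself, then "love", then "tennis"
    [0.20, 0.50, 0.30],  # "love"
    [0.10, 0.30, 0.60],  # "tennis"
])

print(attn.sum(dim=-1))  # every row sums to 1
```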

To understand how these probability/attention scores are obtained, we must first understand 3 key terms:

  • Query Vector

  • Key Vector

  • Value Vector

Crafting Keys, Queries, and Values:

For each token, we derive three elements: a key (K), a query (Q), and a value (V).

Let’s relate this to a library analogy:

🔹 Keys (K): Imagine each book in a library has a unique identifier. This identifier, or key, helps us locate the book on the shelf.

🔸 Query (Q): Suppose you’re looking for a specific book. The information you have about this book acts as the query, which is used to search through the library.

🔹 Value (V): Once you find your book, the content inside it is the value. It holds the actual information you’re looking for.

So, as we derive K, Q & V for each token, we know how a token identifies itself, what it is looking for, and what it will provide to other tokens.
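Here's a minimal sketch of how K, Q & V are typically derived, assuming a toy embedding size and three learned linear projections (the sizes are illustrative):

```python
import torch
import torch.nn as nn

d_model = 8   # embedding size (illustrative)
d_head = 8    # size of each query/key/value vector (illustrative)

x = torch.randn(3, d_model)  # 3 token embeddings, e.g. for "I love tennis"

# Three learned linear projections turn each embedding into Q, K and V.
W_q = nn.Linear(d_model, d_head, bias=False)
W_k = nn.Linear(d_model, d_head, bias=False)
W_v = nn.Linear(d_model, d_head, bias=False)

Q = W_q(x)  # what each token is looking for          -> shape (3, d_head)
K = W_k(x)  # how each token identifies itself        -> shape (3, d_head)
V = W_v(x)  # what each token offers to other tokens  -> shape (3, d_head)
```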

Building a context-aware embedding: A 3-step process!

With our keys, queries, and values in place, it’s time for the tokens to start interacting!

(refer to the image below as you read ahead)

🔹 Step 1: Attention score calculation:

Each token’s query (Q) interacts with all the keys (K) in the sentence, including its own.

This interaction is a dot product, followed by scaling (dividing by the square root of the key dimension).

The result? An attention score, representing the relationship between our token and every other token in the sentence.

Example:

Consider the sentence "I love tennis".

Let's say our token is "I". The query for "I" interacts with the key for every word, revealing how closely "I" should attend to "love" & "tennis".
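A tiny sketch of this step, using made-up query/key tensors for our three tokens (the values are random; only the shapes and the operation matter):

```python
import math
import torch

d_head = 8
Q = torch.randn(3, d_head)  # queries for "I", "love", "tennis" (random toy values)
K = torch.randn(3, d_head)  # keys for the same three tokens

# Every query is dotted with every key, then scaled by sqrt(d_head).
scores = Q @ K.transpose(-2, -1) / math.sqrt(d_head)  # shape (3, 3)
# scores[0] = how strongly "I" matches "I", "love" and "tennis"
```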

🔸 Step 2: Softmax and Weighting

With attention scores in hand, the next step is normalization.

We apply the softmax function to these scores, ensuring that they sum up to 1.

This gives us the weights – indicating the level of attention each token should pay to every other token.

For example, "I" might pay most of its attention to "love" and less to "tennis".

🔹 Step 3: Constructing the Context

Now that we have the weights, we use them to compute a weighted sum of the values (V).

This sum is the context-aware representation of the token, incorporating information from other tokens according to the calculated attention weights.

So, the context of "I" would be a combination of the information from "love" and "tennis", with more emphasis on "love" as per our example.
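A sketch of this final step, with made-up weights and value vectors:

```python
import torch

# Made-up attention weights for "I" over ["I", "love", "tennis"] (sum to 1)
weights = torch.tensor([0.56, 0.34, 0.10])

# Value vectors for the three tokens (random toy values)
V = torch.randn(3, 8)

# Context-aware representation of "I": a weighted sum of all value vectors
context_I = weights @ V  # shape (8,)
```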

Here's an illustration of what we discussed so far👇

A big shout-out to AbacusAI for supporting my work.

The world's first end-to-end ML and LLM Ops platform where AI, not humans, build end-to-end Applied AI agents and systems at scale.

Check this out: https://abacus.ai/

Implementing self-attention in PyTorch doesn't get easier! 🚀

It's very intuitive! 💡

Check this out 👇
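As a rough sketch, a minimal single-head self-attention module (illustrative dimensions, no masking or multi-head split) could look like this:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """Minimal single-head self-attention (no masking, no multi-head split)."""

    def __init__(self, d_model: int, d_head: int):
        super().__init__()
        self.W_q = nn.Linear(d_model, d_head, bias=False)
        self.W_k = nn.Linear(d_model, d_head, bias=False)
        self.W_v = nn.Linear(d_model, d_head, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        Q, K, V = self.W_q(x), self.W_k(x), self.W_v(x)

        # Step 1: scaled dot-product scores, (batch, seq_len, seq_len)
        scores = Q @ K.transpose(-2, -1) / math.sqrt(K.size(-1))

        # Step 2: softmax so each row of weights sums to 1
        weights = F.softmax(scores, dim=-1)

        # Step 3: weighted sum of values -> context-aware token representations
        return weights @ V


# Usage: one sentence of 3 tokens ("I love tennis"), 8-dim embeddings
x = torch.randn(1, 3, 8)
attn = SelfAttention(d_model=8, d_head=8)
out = attn(x)
print(out.shape)  # torch.Size([1, 3, 8])
```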

I would also encourage you to check out LightningAI's ⚡️ LLM Learning Lab.

A curated collection of blogs, tutorials, and how-to videos on:

  • Training

  • Fine-tuning

  • And deploying LLMs 🚀 

Hope you enjoyed reading, share it across your socials to support my work!

Until next time!

Cheers! 🥂 
