Week 11 - OOP concepts & LLMs

Object oriented programming

Classes and objects

Encapsulation

Code style: clean code

LLMs

Tokenization

Inference

Tools

RAGs

Using LLMs in Code

Practice

Assignment

Inference: How a Response is Generated

When you send a prompt, the model doesn't write the full response in one go. It generates one token at a time, each time calculating the most probable next token based on everything before it - your prompt plus all the tokens it has already generated. This loop repeats until the response is complete.
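The loop described above can be sketched as a toy program. This is purely illustrative: `next_token_probs` is a hypothetical stand-in for the real neural network, which computes these probabilities from billions of learned parameters.

```python
def next_token_probs(tokens):
    # Hypothetical lookup table standing in for the real model's computation:
    # it maps the sequence seen so far to a distribution over the next token.
    table = {
        ("The",): {"cat": 0.6, "dog": 0.4},
        ("The", "cat"): {"sat": 0.7, "ran": 0.3},
        ("The", "cat", "sat"): {"<end>": 1.0},
        ("The", "cat", "ran"): {"<end>": 1.0},
        ("The", "dog"): {"ran": 1.0},
        ("The", "dog", "ran"): {"<end>": 1.0},
    }
    return table[tuple(tokens)]

def generate(prompt_tokens):
    tokens = list(prompt_tokens)
    while True:
        probs = next_token_probs(tokens)
        # Pick the most probable next token (greedy decoding).
        next_tok = max(probs, key=probs.get)
        if next_tok == "<end>":
            return tokens
        # Committed: earlier tokens are never revised.
        tokens.append(next_tok)

print(generate(["The"]))  # ['The', 'cat', 'sat']
```

Notice that each pass through the loop sees the prompt plus everything generated so far, and the loop only stops when the model predicts an end-of-response token.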

Source: "Large Language Models explained briefly" by 3Blue1Brown

<aside> 💭

This has an important implication: the model has no ability to go back and revise. Each token is committed before the next one is chosen. What looks like a coherent, considered response is actually the result of thousands of small, sequential predictions.

</aside>

Next token probability

Looking at the animated image above, you may have noticed that the LLM doesn't produce just one option for the next token; it offers multiple options, each with a probability percentage.

Why doesn't the model always choose the most probable token?

If the model always picked the most probable token, every response to the same prompt would be identical and predictable. Instead, the model sometimes picks a less likely token, which is what produces varied, creative, and natural-sounding responses. It's a simple mathematical trick for introducing a controlled amount of randomness.

How much randomness is introduced is controlled by three sampling parameters:

- Temperature: how much the probability distribution is flattened or sharpened. A temperature of 0 makes the model deterministic (always the top token); higher values make less likely tokens more probable.
- Top-k: only the k most probable tokens are kept as candidates.
- Top-p (nucleus sampling): only the smallest set of tokens whose combined probability reaches p is kept as candidates.
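As a toy illustration, temperature can be sketched like this. It is a simplified version: real implementations apply temperature to the model's raw scores (logits) rather than to finished probabilities, but the effect is the same.

```python
import math
import random

def sample_with_temperature(probs, temperature):
    # probs: dict mapping token -> probability (a toy next-token distribution).
    if temperature == 0:
        # Deterministic: always pick the most probable token.
        return max(probs, key=probs.get)
    # Low temperature sharpens the distribution toward the top token;
    # high temperature flattens it, giving unlikely tokens more of a chance.
    scaled = {t: math.exp(math.log(p) / temperature) for t, p in probs.items() if p > 0}
    total = sum(scaled.values())
    reshaped = {t: s / total for t, s in scaled.items()}
    return random.choices(list(reshaped), weights=list(reshaped.values()))[0]

probs = {"cat": 0.6, "dog": 0.3, "fish": 0.1}
print(sample_with_temperature(probs, 0))    # always 'cat'
print(sample_with_temperature(probs, 1.0))  # usually 'cat', sometimes 'dog' or 'fish'
```

Run the last line a few times: with temperature 0 the output never changes, while at temperature 1.0 the less likely tokens occasionally appear.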

The context window

The context window is the total amount of text the model can "see" at any given moment. Your prompt, the conversation history, and the response being generated all share this space. Once the limit is reached, older content is dropped—not gradually forgotten, but removed entirely, as if it never existed. This is measured in tokens, not words or characters. Modern models have huge context windows - from hundreds of thousands to millions of tokens.
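A minimal sketch of how a conversation might be trimmed to fit the context window. The `count_tokens` function here is a stand-in assumption (one token per word); real systems use an actual tokenizer.

```python
def fit_context(messages, max_tokens, count_tokens):
    # Keep the most recent messages that fit in the token budget.
    # Older messages are dropped entirely, as described above.
    kept, used = [], 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(msg)
        if used + cost > max_tokens:
            break  # everything older than this is removed
        kept.append(msg)
        used += cost
    return list(reversed(kept))

# Toy tokenizer assumption: one token per whitespace-separated word.
count = lambda m: len(m.split())
history = ["hello there", "how are you today", "fine thanks", "what is a token"]
print(fit_context(history, 8, count))  # ['fine thanks', 'what is a token']
```

With a budget of 8 "tokens", the two oldest messages fall out first: from the model's point of view, they no longer exist.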

Watch: What is a context window

https://www.youtube.com/watch?v=-QVoIxEpFkM

Additional Resources

The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0

https://hackyourfuture.net/

Found a mistake or have a suggestion? Let us know in the feedback form.