Free course · 3 modules

NLP & Large Language Models

📚 3 lessons
🎯 Advanced

Lesson 2 of 3 · NLP Fundamentals

Beginner · concepts + examples ⏱ 134 min

The spelled-out intro to language modeling: building makemore

This lesson walks through building makemore, a character-level language model, from scratch. The video takes about 134 minutes.

What You Will Learn

How to build a character-level language model from scratch
How to train a model to generate new names based on a given dataset
How to use PyTorch to create and manipulate tensors for language modeling

Key Concepts

Character-level language modeling trains a model to predict the next character in a sequence from the characters that precede it. In this lesson, we’re using a dataset of names to train a model that generates new, name-like strings. The model is a bi-gram language model: it predicts each character from the single character immediately before it, so it only ever works with pairs of adjacent characters. We’re also using PyTorch tensors, which are multi-dimensional arrays, to hold the model’s counts and data.
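
To make the bi-gram idea concrete, here is a minimal sketch of the (previous, next) pairs such a model would see for one name. The name "emma" and the boundary tokens `<S>` and `<E>` are illustrative assumptions, not something this page specifies.

```python
def bigram_pairs(word, start="<S>", end="<E>"):
    """Return (previous, next) token pairs for one word, including boundaries."""
    # The boundary tokens are an assumption: they let the model learn
    # which characters tend to begin and end a name.
    tokens = [start] + list(word) + [end]
    return list(zip(tokens, tokens[1:]))


pairs = bigram_pairs("emma")
for prev_ch, next_ch in pairs:
    print(f"{prev_ch!r} -> {next_ch!r}")
```

Each pair is one training example: the left element is the context and the right element is the character to predict.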

Code Examples

for character_one, character_two in zip(w, w[1:]):
    print(character_one, character_two)

For a single word w, this loop pairs each character with the one that follows it and prints every consecutive pair. Running it inside a loop over the dataset covers every word.
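
Before moving to tensors, the same pairing can be extended to count bi-grams across several words with a plain dictionary. This is a rough sketch: the three-name list stands in for the real dataset, and the `<S>`/`<E>` boundary tokens are assumptions.

```python
from collections import Counter

# Hypothetical stand-in for the names dataset.
words = ["emma", "olivia", "ava"]

counts = Counter()
for w in words:
    # Assumed boundary tokens marking the start and end of each word.
    tokens = ["<S>"] + list(w) + ["<E>"]
    for ch1, ch2 in zip(tokens, tokens[1:]):
        counts[(ch1, ch2)] += 1

# Show the most frequent bi-grams first.
for pair, count in counts.most_common(3):
    print(pair, count)
```

A dictionary is easy to read, but a 2D tensor indexed by integers is more convenient once the counts need to be normalized and sampled.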

import torch
n = torch.zeros((28, 28), dtype=torch.int32)

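This code creates a 28x28 integer tensor filled with zeros, which will store the counts of each bi-gram in the dataset: the entry at row i, column j counts how often character j follows character i. The size 28 accommodates the 26 lowercase letters plus two special tokens for the start and end of a word.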
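
As a quick illustration of how such a count tensor behaves, the sketch below increments a few cells and reads them back. The indices used here are arbitrary and chosen only for demonstration.

```python
import torch

n = torch.zeros(28, 28, dtype=torch.int32)

# Increment individual cells in place; the indices are arbitrary examples.
n[1, 3] += 1
n[1, 3] += 1
n[5, 0] += 1

shape = tuple(n.shape)         # dimensions of the tensor
cell = n[1, 3].item()          # .item() converts a 0-d tensor to a Python int
row_total = n[1].sum().item()  # total count for everything following row 1
total = n.sum().item()         # total number of counted pairs

print(shape, cell, row_total, total)
```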

s2i = {c: i for i, c in enumerate(chars)}

This code creates a dictionary that maps each character to its corresponding integer index.
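
One way chars might be built, together with the reverse mapping needed to turn sampled indices back into characters, is sketched below. The tiny word list and the name `i2s` are assumptions made for illustration.

```python
# Hypothetical stand-in for the names dataset.
words = ["emma", "olivia", "ava"]

# Collect every distinct character in the dataset, in sorted order.
chars = sorted(set("".join(words)))

s2i = {c: i for i, c in enumerate(chars)}  # character -> index
i2s = {i: c for c, i in s2i.items()}       # index -> character (assumed name)

encoded = [s2i[c] for c in "emma"]
decoded = "".join(i2s[i] for i in encoded)
print(chars, encoded, decoded)
```

Sorting the characters keeps the mapping stable between runs, so a given row of the count tensor always refers to the same character.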

Lesson Summary

In this lesson, we started building a character-level language model from scratch using a dataset of names. We began by loading the dataset and splitting it into individual words, and then we created a bi-gram language model that predicts the next character from the current one. We used PyTorch to create and manipulate the tensors that store the model’s counts and data. We also created a dictionary that maps each character to its integer index, which is used to index into the tensor. By the end of this lesson, we had a tensor that stores the counts of each bi-gram in the dataset; once those counts are normalized into probabilities, they can be sampled to generate new names.
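
Putting the pieces from this lesson together, a minimal sketch of filling the count tensor might look like this. The word list is a hypothetical stand-in for the dataset, and placing the assumed `<S>` and `<E>` tokens at indices 26 and 27 is one possible layout, not necessarily the one used in the video.

```python
import string

import torch

# Hypothetical stand-in for the names dataset.
words = ["emma", "olivia", "ava"]

# 26 lowercase letters plus two assumed boundary tokens = 28 symbols.
chars = list(string.ascii_lowercase) + ["<S>", "<E>"]
s2i = {c: i for i, c in enumerate(chars)}

n = torch.zeros(28, 28, dtype=torch.int32)
for w in words:
    tokens = ["<S>"] + list(w) + ["<E>"]
    for ch1, ch2 in zip(tokens, tokens[1:]):
        # Row = previous character, column = next character.
        n[s2i[ch1], s2i[ch2]] += 1

# Which symbol most often follows "a" in this tiny dataset?
most_common_after_a = chars[n[s2i["a"]].argmax().item()]
print(most_common_after_a, n.sum().item())
```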

Practice Exercise

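Using the code snippets from this lesson, try to sample the first character of a new name. You can do this by taking the row of the count tensor that corresponds to the start-of-word token (the first row, if that token has index 0), normalizing its counts so they sum to 1, and then using the torch.multinomial function to sample an index from the resulting distribution. To generate a full name, repeat the step from the row of each sampled character until the end-of-word token is drawn.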
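
If you get stuck, here is one possible approach, offered as a sketch rather than a reference solution. It assumes a tiny stand-in dataset, `<S>`/`<E>` boundary tokens at indices 26 and 27, and a hypothetical helper called `sample_name`; the seed only makes runs repeatable.

```python
import string

import torch

words = ["emma", "olivia", "ava"]  # hypothetical stand-in dataset
chars = list(string.ascii_lowercase) + ["<S>", "<E>"]  # assumed tokens
s2i = {c: i for i, c in enumerate(chars)}
i2s = {i: c for c, i in s2i.items()}

# Build the bi-gram count tensor.
n = torch.zeros(28, 28, dtype=torch.int32)
for w in words:
    tokens = ["<S>"] + list(w) + ["<E>"]
    for ch1, ch2 in zip(tokens, tokens[1:]):
        n[s2i[ch1], s2i[ch2]] += 1

g = torch.Generator().manual_seed(0)

# Step 1: sample only the first character from the start-token row.
start_row = n[s2i["<S>"]].float()
p = start_row / start_row.sum()  # normalize counts into probabilities
ix = torch.multinomial(p, num_samples=1, replacement=True, generator=g).item()
first_char = i2s[ix]
print("first character:", first_char)


def sample_name(max_len=20):
    """Hypothetical helper: follow the bi-gram counts until <E> is drawn."""
    ix = s2i["<S>"]
    out = []
    while len(out) < max_len:  # cap the length as a safety net
        row = n[ix].float()
        probs = row / row.sum()
        ix = torch.multinomial(
            probs, num_samples=1, replacement=True, generator=g
        ).item()
        if ix == s2i["<E>"]:
            break
        out.append(i2s[ix])
    return "".join(out)


name = sample_name()
print("sampled name:", name)
```

With only three names the samples stay close to the training data; the full dataset gives the counts enough variety to produce new combinations.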

What Is Next

In the next lesson, we’ll be exploring more advanced techniques for language modeling, including the use of recurrent neural networks and transformers. We’ll also be learning how to fine-tune our model to generate more realistic and diverse names.

