The Adventure Zone Script Generation

This is a word-level Recurrent Neural Network approach to generating new scripts based on the transcript of the entire Balance arc of The Adventure Zone, a well-known Dungeons & Dragons tabletop roleplaying podcast.

To begin, the entire script from The Adventure Zone Season 1: Balance is loaded and then transformed into lowercase characters for training.

In [1]:
# load in data
import helper
data_dir = './taz.txt'
text = helper.load_data(data_dir)
text = text.lower()

Explore the Data

You can see that the entire script is now all lowercase text and each new line of dialogue is separated by a newline character \n.

In [2]:
import numpy as np

view_line_range = (63, 85)

print('Number of unique words: {}'.format(len({word: None for word in text.split()})))

lines = text.split('\n')
print('Number of lines: {}'.format(len(lines)))
word_count_line = [len(line.split()) for line in lines]
print('Average number of words in each line: {}'.format(np.average(word_count_line)))

print()
print('The lines {} to {}:'.format(*view_line_range))
print('\n'.join(text.split('\n')[view_line_range[0]:view_line_range[1]]))
Number of unique words: 48960
Number of lines: 91028
Average number of words in each line: 9.18031814386782

The lines 63 to 85:
clint: i’ve never played in my entire life.
griffin: you’re the biggest goddamn nerd i’ve ever met!
clint: i know, and how did this happen?
griffin: i don’t know, you should—
travis: was it something you actively avoided? ‘cause that’s what i did for
a while, where i was—
griffin: yeah.
travis: i just wasn’t willing to take that step.
clint: i just never had anybody who was interested in it besides me.
griffin: but you, like, came up in the gygax era! you—
clint: oh yeah.
griffin: you were in the—
clint: i’ve actually met gygax.
griffin: did you really?
clint: yeah, at a san diego con.
justin: i bet that’s a, that’s a fucking awkward conversation. “so what’s
your thing again? dungeons & dragons?”
travis: [ laughs ] “that sounds really good, can’t wait to play it, gary.”
griffin: “i know of those ideas as separate entities, but—”
justin: “keep plugging at it, gar.”
travis: [ laughing ] “you’ll get ‘em.”
justin & griffin: [at the same time] “you’ll get there.”

Implement Pre-processing Functions

Lookup Table

To create a word embedding, the words are first transformed to encodings, or ids.

  • Dictionary to go from the words to an id, we'll call vocab_to_int
  • Dictionary to go from the id to word, we'll call int_to_vocab

These dictionaries are returned in the following tuple (vocab_to_int, int_to_vocab)

In [3]:
from collections import Counter

def create_lookup_tables(text):
    """
    Create lookup tables for vocabulary
    :param text: The text of tv scripts split into words
    :return: A tuple of dicts (vocab_to_int, int_to_vocab)
    """
    word_counts = Counter(text)
    sorted_vocab = sorted(word_counts, key=word_counts.get, reverse=True)
    
    int_to_vocab = {ii: word for ii, word in enumerate(sorted_vocab)}
    vocab_to_int = {word: ii for ii, word in int_to_vocab.items()}
    
    # return tuple
    return (vocab_to_int, int_to_vocab)

Tokenize Punctuation

The script is split into a word array using spaces as delimiters. However, punctuation marks like periods and exclamation points can create multiple ids for the same word. For example, "bye" and "bye!" would generate two different word ids.

The function token_lookup returns a dictionary that is used to tokenize symbols like "!" into tokens like "<EXCLAMATION_MARK>".

This dictionary is used to tokenize the symbols and add a delimiter (space) around each one. This separates each symbol into its own word, making it easier for the neural network to predict the next word. A sketch of how the dictionary is presumably applied appears after the next cell.

In [4]:
def token_lookup():
    """
    Generate a dict to turn punctuation into a token.
    :return: Tokenized dictionary where the key is the punctuation and the value is the token
    """
    token_dict = {
        ".": "<PERIOD>",
        ",": "<COMMA>",
        '"': "<QUOTATION_MARK>",
        ":": "<COLON>",
        ";": "<SEMICOLON>",
        "!": "<EXCLAMATION_MARK>",
        "?": "<QUESTION_MARK>",
        "(": "<LEFT_PAREN>",
        ")": "<RIGHT_PAREN>",
        "[": "<LEFT_BRACKET>",
        "]": "<RIGHT_BRACKET>",
        "{": "<LEFT_BRACE>",
        "}": "<RIGHT_BRACE>",
        "-": "<HYPHEN>",
        "–": "<EN_DASH>",
        "—": "<EM_DASH>",
        "\n": "<RETURN>",
        "&": "<AMPERSAND>",
        "…": "<ELLIPSIS>"
    }
        
    return token_dict
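
The preprocessing itself happens inside helper.preprocess_and_save_data, which is part of the course-provided helper module and is not shown in this notebook. As a rough sketch of what that step presumably looks like (an assumption based on the token dictionary above, not the helper's actual source), applying the tokens amounts to padding each symbol with spaces before lowercasing and splitting:

# A minimal, hypothetical sketch of how the token dictionary might be applied;
# the real work is done by helper.preprocess_and_save_data.
def tokenize_text(text, token_dict):
    # surround each symbol with spaces so it splits out as its own "word"
    for symbol, token in token_dict.items():
        text = text.replace(symbol, ' {} '.format(token))
    return text.lower().split()

print(tokenize_text('bye! see you later.', token_lookup()))
# ['bye', '<exclamation_mark>', 'see', 'you', 'later', '<period>']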

Pre-process and Save the Data

In [5]:
# pre-process training data
helper.preprocess_and_save_data(data_dir, token_lookup, create_lookup_tables)
In [6]:
import helper
int_text, vocab_to_int, int_to_vocab, token_dict = helper.load_preprocess()

Network Architecture

In this section, an RNN is built by subclassing PyTorch's nn.Module class, with custom forward propagation and hidden-state initialization functions; the combined forward/backpropagation training step is defined separately below.

Check Access to GPU

In [7]:
import torch

# Check for a GPU
train_on_gpu = torch.cuda.is_available()
if not train_on_gpu:
    print('No GPU found.')

Input

Starting with the preprocessed input data, TensorDataset will be used to provide a known format to our dataset in combination with DataLoader. These will handle batching, shuffling, and other dataset iteration functions.

data = TensorDataset(feature_tensors, target_tensors)
data_loader = torch.utils.data.DataLoader(data, 
                                          batch_size=batch_size)

Batching

The batch_data function batches word data into chunks of size batch_size using the TensorDataset and DataLoader classes.

For example, say we have these as input:

words = [1, 2, 3, 4, 5, 6, 7]
sequence_length = 4

Your first feature_tensor contains the values:

[1, 2, 3, 4]

And the corresponding target_tensor is just the next "word"/tokenized word value:

5

This will then continue with the second feature_tensor, target_tensor being:

[2, 3, 4, 5]  # features
6             # target
In [8]:
from torch.utils.data import TensorDataset, DataLoader


def batch_data(words, sequence_length, batch_size):
    """
    Batch the neural network data using DataLoader
    :param words: The word ids of the TV scripts
    :param sequence_length: The sequence length of each batch
    :param batch_size: The size of each batch; the number of sequences in a batch
    :return: DataLoader with batched data
    """
    text, labels = [], []
    
    for i in range(len(words)-sequence_length):
        text.append(words[i:i+sequence_length])
        labels.append(words[i+sequence_length])
    
    data = TensorDataset(torch.from_numpy(np.array(text)),
                         torch.from_numpy(np.array(labels)))
    
    dataloader = DataLoader(data, shuffle=True, batch_size=batch_size)
    
    # return a dataloader
    return dataloader

Test your dataloader

Below, some test text data is generated and a dataloader is defined using the function above. Then, a sample batch of inputs sample_x and targets sample_y are retrieved from the dataloader.

torch.Size([10, 5])
tensor([[ 28,  29,  30,  31,  32],
        [ 21,  22,  23,  24,  25],
        [ 17,  18,  19,  20,  21],
        [ 34,  35,  36,  37,  38],
        [ 11,  12,  13,  14,  15],
        [ 23,  24,  25,  26,  27],
        [  6,   7,   8,   9,  10],
        [ 38,  39,  40,  41,  42],
        [ 25,  26,  27,  28,  29],
        [  7,   8,   9,  10,  11]])

torch.Size([10])
tensor([ 33,  26,  22,  39,  16,  28,  11,  43,  30,  12])

Sizes

The sample_x should be of size (batch_size, sequence_length) or (10, 5) in this case and sample_y should just have one dimension: batch_size (10).

Values

You should also notice that the targets, sample_y, are the next value in the ordered test_text data. So, for an input sequence [ 28, 29, 30, 31, 32] that ends with the value 32, the corresponding output should be 33.

In [9]:
# test dataloader
test_text = range(50)
t_loader = batch_data(test_text, sequence_length=5, batch_size=10)

data_iter = iter(t_loader)
sample_x, sample_y = data_iter.next()

print(sample_x.shape)
print(sample_x)
print()
print(sample_y.shape)
print(sample_y)
torch.Size([10, 5])
tensor([[ 36,  37,  38,  39,  40],
        [  6,   7,   8,   9,  10],
        [ 27,  28,  29,  30,  31],
        [ 18,  19,  20,  21,  22],
        [  8,   9,  10,  11,  12],
        [ 44,  45,  46,  47,  48],
        [ 24,  25,  26,  27,  28],
        [  1,   2,   3,   4,   5],
        [ 11,  12,  13,  14,  15],
        [ 37,  38,  39,  40,  41]])

torch.Size([10])
tensor([ 41,  11,  32,  23,  13,  49,  29,   6,  16,  42])

Build the Neural Network

Implement an RNN using PyTorch's Module class. You may choose to use a GRU or an LSTM. To complete the RNN, you'll have to implement the following functions for the class:

  • __init__ - The initialize function.
  • init_hidden - The initialization function for an LSTM/GRU hidden state
  • forward - Forward propagation function.

The initialize function should create the layers of the neural network and save them to the class. The forward propagation function will use these layers to run forward propagation and generate an output and a hidden state.

The output of this model should be the last batch of word scores after a complete sequence has been processed. That is, for each input sequence of words, we only want to output the word scores for a single, most likely, next word.

Hints

  1. Make sure to stack the outputs of the LSTM to pass to your fully-connected layer; you can do this with lstm_output = lstm_output.contiguous().view(-1, self.hidden_dim)
  2. You can get the last batch of word scores by shaping the output of the final, fully-connected layer like so:
# reshape into (batch_size, seq_length, output_size)
output = output.view(batch_size, -1, self.output_size)
# get last batch
out = output[:, -1]
In [10]:
import torch.nn as nn

class RNN(nn.Module):
    
    def __init__(self, vocab_size, output_size, embedding_dim, hidden_dim, n_layers, dropout=0.5):
        """
        Initialize the PyTorch RNN Module
        :param vocab_size: The number of input dimensions of the neural network (the size of the vocabulary)
        :param output_size: The number of output dimensions of the neural network
        :param embedding_dim: The size of embeddings, should you choose to use them        
        :param hidden_dim: The size of the hidden layer outputs
        :param n_layers: The number of stacked LSTM/GRU layers
        :param dropout: dropout to add in between LSTM/GRU layers
        """
        super(RNN, self).__init__()
        
        # set class variables
        self.output_size = output_size
        self.n_layers = n_layers
        self.hidden_dim = hidden_dim
        
        # define model layers
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, n_layers, dropout=dropout, batch_first=True)
        
        self.fc = nn.Linear(hidden_dim, output_size)
        
    
    def forward(self, nn_input, hidden):
        """
        Forward propagation of the neural network
        :param nn_input: The input to the neural network
        :param hidden: The hidden state        
        :return: Two Tensors, the output of the neural network and the latest hidden state
        """
        # the batch size is needed to reshape the output below
        batch_size = nn_input.size(0)
        
        nn_input = nn_input.long()
        embeds = self.embedding(nn_input)
        lstm_out, hidden = self.lstm(embeds, hidden)
        
        lstm_out = lstm_out.contiguous().view(-1, self.hidden_dim)
        
        out = self.fc(lstm_out)
        out = out.view(batch_size, -1, self.output_size)
        out = out[:, -1]

        # return one batch of output word scores and the hidden state
        return out, hidden
    
    
    def init_hidden(self, batch_size):
        '''
        Initialize the hidden state of an LSTM/GRU
        :param batch_size: The batch_size of the hidden state
        :return: hidden state of dims (n_layers, batch_size, hidden_dim)
        '''
        # use an existing weight tensor so the new zero tensors match the model's dtype and device
        weight = next(self.parameters()).data
        
        # initialize hidden state with zero weights, and move to GPU if available
        if (train_on_gpu):
            hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda(),
                      weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda())
        else:
            hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_(),
                      weight.new(self.n_layers, batch_size, self.hidden_dim).zero_())
        
        return hidden

Define forward and backpropagation

Use the RNN class you implemented to apply forward and back propagation. This function will be called, iteratively, in the training loop as follows:

loss, hidden = forward_back_prop(rnn, optimizer, criterion, inp, target, hidden)

And it should return the average loss over a batch and the hidden state returned by a call to RNN(inp, hidden). Recall that you can get this loss by computing it, as usual, and calling loss.item().

If a GPU is available, you should move your data to that GPU device, here.

In [11]:
def forward_back_prop(rnn, optimizer, criterion, inp, target, hidden):
    """
    Forward and backward propagation on the neural network
    :param rnn: The PyTorch Module that holds the neural network
    :param optimizer: The PyTorch optimizer for the neural network
    :param criterion: The PyTorch loss function
    :param inp: A batch of input to the neural network
    :param target: The target output for the batch of input
    :param hidden: The hidden state from the previous batch
    :return: The loss and the latest hidden state Tensor
    """
    # move data to GPU, if available
    if train_on_gpu:
        rnn.cuda()
        inp, target = inp.cuda(), target.cuda()
    
    # perform backpropagation and optimization
    # first, detach the hidden state from its history so gradients don't flow back through every previous batch
    hidden = tuple([each.data for each in hidden])
    
    rnn.zero_grad()
    out, hidden = rnn(inp, hidden)
    
    loss = criterion(out, target)
    loss.backward(retain_graph=True)
    
    # clip gradients to help prevent them from exploding, then take an optimizer step
    nn.utils.clip_grad_norm_(rnn.parameters(), 5)
    optimizer.step()

    # return the loss over a batch and the hidden state produced by our model
    return loss.item(), hidden

Neural Network Training

With the structure of the network complete and data ready to be fed in the neural network, it's time to train it.

Train Loop

The training loop is implemented in the train_rnn function. This function trains the network over all the batches for the given number of epochs. The model's progress is printed every show_every_n_batches batches; this parameter is set along with the other hyperparameters in the next section.

In [12]:
def train_rnn(rnn, batch_size, optimizer, criterion, n_epochs, show_every_n_batches=100):
    best_loss = np.Inf
    batch_losses = []
    report = 0
    losses = []
    
    rnn.train()

    print("Training for %d epoch(s)..." % n_epochs)
    for epoch_i in range(1, n_epochs + 1):
        
        # initialize hidden state
        hidden = rnn.init_hidden(batch_size)
        
        for batch_i, (inputs, labels) in enumerate(train_loader, 1):
            
            # make sure you iterate over completely full batches, only
            n_batches = len(train_loader.dataset)//batch_size
            if(batch_i > n_batches):
                break
            
            # forward, back prop
            loss, hidden = forward_back_prop(rnn, optimizer, criterion, inputs, labels, hidden)          
            # record loss
            batch_losses.append(loss)

            # printing loss stats
            if batch_i % show_every_n_batches == 0:
                avg_loss = np.average(batch_losses)
                print('Epoch: {:>4}/{:<4}  Loss: {}\n'.format(
                    epoch_i, n_epochs, avg_loss))
                losses.append([report, avg_loss])
                report += 1
                
                if avg_loss < best_loss:
                    helper.save_model('./save/trained_rnn', rnn)
                    print('New Best Loss: Model Saved')
                    best_loss = avg_loss
                batch_losses = []

    # returns a trained rnn
    return rnn, losses

Hyperparameters

Set and train the neural network with the following parameters:

  • sequence_length is the length of a sequence.
  • batch_size is the batch size.
  • num_epochs is the number of epochs to train for.
  • learning_rate is the learning rate for the Adam optimizer.
  • vocab_size is the number of unique tokens in our vocabulary.
  • output_size is the desired size of the output.
  • embedding_dim is the embedding dimension; much, much smaller than the vocab_size.
  • hidden_dim is the hidden dimension of the RNN.
  • n_layers is the number of layers/cells in the RNN.
  • show_every_n_batches is the number of batches at which the neural network should print its training progress.

If the network isn't getting your desired results, tweak these parameters and/or the layers in the RNN class.

In [13]:
# Data params
# Sequence Length
sequence_length = 12  # of words in a sequence
# Batch Size
batch_size = 128

# data loader - do not change
train_loader = batch_data(int_text, sequence_length, batch_size)
In [14]:
# Training parameters
# Number of Epochs
num_epochs = 20
# Learning Rate
learning_rate = 0.001

# Model parameters
# Vocab size
vocab_size = len(int_to_vocab) + 1
# Output size
output_size = len(int_to_vocab)
# Embedding Dimension
embedding_dim = 500
# Hidden Dimension
hidden_dim = 1200
# Number of RNN Layers
n_layers = 2

# Show stats for every n number of batches
show_every_n_batches = 1000

A number of different hyperparameter combinations were tested one by one to determine which were most effective. show_every_n_batches was set to a low value, 150, and each combination was trained for only 1 or 2 epochs in order to find promising settings before committing to prolonged training; a sketch of this quick-search process follows the list below.

  • batch_size was tested at 32, 64, and 128, but all worked within memory constraints once other errors were fixed, so 128 was chosen for training speed.
  • sequence_length was found to have a large impact on training speed, so values of 9, 12, 15, and 20 were tested; since larger values did not noticeably improve the loss, 12 was settled on.
  • n_layers was set to 2 and not experimented with, as this was the value recommended in most classroom lectures, and 3 or more layers rarely offer meaningful benefits for word-level text generation RNNs.
  • embedding_dim was tested at every hundred from 100 to 700, with 500 appearing to be the best middle ground: 400 and below seemed too small for the model, and anything above 500 showed no benefit over 500.
  • hidden_dim ended up being the most important hyperparameter to tune, as the model did not converge well at the initial values. Values from 300 to 1500 were tested; the lower values left the model too small to learn effectively from the training data, and 1200 gave the best results.
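
For reference, the quick one-or-two-epoch trials described above can be expressed as a simple loop. This is only an illustrative sketch (the actual trials were run by hand, cell by cell); it assumes the RNN, batch_data, and train_rnn defined earlier in this notebook, and the candidate_hidden_dims list is introduced here purely for illustration:

# Illustrative sketch of the short hyperparameter trials described above.
# Note: train_rnn also checkpoints to ./save/trained_rnn whenever its running
# average loss improves, so results should be copied aside between trials.
candidate_hidden_dims = [300, 600, 900, 1200, 1500]

for hd in candidate_hidden_dims:
    train_loader = batch_data(int_text, sequence_length=12, batch_size=128)
    rnn = RNN(vocab_size, output_size, embedding_dim=500, hidden_dim=hd,
              n_layers=2, dropout=0.5)
    if train_on_gpu:
        rnn.cuda()
    optimizer = torch.optim.Adam(rnn.parameters(), lr=0.001)
    criterion = nn.CrossEntropyLoss()
    # train a single epoch, reporting often, just to compare early loss curves
    _, trial_losses = train_rnn(rnn, 128, optimizer, criterion,
                                n_epochs=1, show_every_n_batches=150)
    print('hidden_dim={}: final reported loss {:.3f}'.format(hd, trial_losses[-1][1]))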

Training

In the next cells, the neural network is trained on the pre-processed data. Steady loss improvement isn't guaranteed, and the hyperparameters may need to be adjusted multiple times to find a good combination for the training data and task. In general, larger hidden_dim and n_layers values may give better results, but larger models also take longer to train.

A respectable loss to aim for is less than 3.5.

Different sequence lengths should also be experimented with, as this length determines the size of the long-range dependencies the model can learn.

Note:

This project was originally trained on Udacity-provided servers. The active_session() context manager defined in the following cell keeps the connection alive while training completes. To replicate this on your own machine, remove that cell and the "with active_session():" line from the training cell.

In [15]:
import signal

from contextlib import contextmanager

import requests


DELAY = INTERVAL = 4 * 60  # interval time in seconds
MIN_DELAY = MIN_INTERVAL = 2 * 60
KEEPALIVE_URL = "https://nebula.udacity.com/api/v1/remote/keep-alive"
TOKEN_URL = "http://metadata.google.internal/computeMetadata/v1/instance/attributes/keep_alive_token"
TOKEN_HEADERS = {"Metadata-Flavor":"Google"}


def _request_handler(headers):
    def _handler(signum, frame):
        requests.request("POST", KEEPALIVE_URL, headers=headers)
    return _handler


@contextmanager
def active_session(delay=DELAY, interval=INTERVAL):
    """
    Example:

    from workspace_utils import active_session

    with active_session():
        # do long-running work here
    """
    token = requests.request("GET", TOKEN_URL, headers=TOKEN_HEADERS).text
    headers = {'Authorization': "STAR " + token}
    delay = max(delay, MIN_DELAY)
    interval = max(interval, MIN_INTERVAL)
    original_handler = signal.getsignal(signal.SIGALRM)
    try:
        signal.signal(signal.SIGALRM, _request_handler(headers))
        signal.setitimer(signal.ITIMER_REAL, delay, interval)
        yield
    finally:
        signal.signal(signal.SIGALRM, original_handler)
        signal.setitimer(signal.ITIMER_REAL, 0)


def keep_awake(iterable, delay=DELAY, interval=INTERVAL):
    """
    Example:

    from workspace_utils import keep_awake

    for i in keep_awake(range(5)):
        # do iteration with lots of work here
    """
    with active_session(delay, interval): yield from iterable
In [16]:
with active_session():
    # create model and move to gpu if available
    rnn = RNN(vocab_size, output_size, embedding_dim, hidden_dim, n_layers, dropout=0.5)
    if train_on_gpu:
        rnn.cuda()

    # defining loss and optimization functions for training
    optimizer = torch.optim.Adam(rnn.parameters(), lr=learning_rate)
    criterion = nn.CrossEntropyLoss()

    # training the model
    trained_rnn, losses = train_rnn(rnn, batch_size, optimizer, criterion, num_epochs, show_every_n_batches)
Training for 20 epoch(s)...
Epoch:    1/20    Loss: 5.023212515354157

/opt/conda/lib/python3.6/site-packages/torch/serialization.py:193: UserWarning: Couldn't retrieve source code for container of type RNN. It won't be checked for correctness upon loading.
  "type " + obj.__name__ + ". It won't be checked "
New Best Loss: Model Saved
Epoch:    1/20    Loss: 4.523681572914123

New Best Loss: Model Saved
Epoch:    1/20    Loss: 4.357919644832611

New Best Loss: Model Saved
Epoch:    1/20    Loss: 4.256966039896011

New Best Loss: Model Saved
Epoch:    1/20    Loss: 4.206574712276459

New Best Loss: Model Saved
Epoch:    1/20    Loss: 4.169255956172943

New Best Loss: Model Saved
Epoch:    1/20    Loss: 4.135705820322037

New Best Loss: Model Saved
Epoch:    1/20    Loss: 4.114186443090439

New Best Loss: Model Saved
Epoch:    1/20    Loss: 4.074571540594101

New Best Loss: Model Saved
Epoch:    2/20    Loss: 3.841714834509068

New Best Loss: Model Saved
Epoch:    2/20    Loss: 3.8300334751605987

New Best Loss: Model Saved
Epoch:    2/20    Loss: 3.8445535378456115

Epoch:    2/20    Loss: 3.826541626930237

New Best Loss: Model Saved
Epoch:    2/20    Loss: 3.838555276632309

Epoch:    2/20    Loss: 3.827766616344452

Epoch:    2/20    Loss: 3.839187493562698

Epoch:    2/20    Loss: 3.832147630214691

Epoch:    2/20    Loss: 3.842352229595184

Epoch:    3/20    Loss: 3.575420247612399

New Best Loss: Model Saved
Epoch:    3/20    Loss: 3.5771549525260924

Epoch:    3/20    Loss: 3.5859152026176453

Epoch:    3/20    Loss: 3.61576287817955

Epoch:    3/20    Loss: 3.6328304860591887

Epoch:    3/20    Loss: 3.644551205396652

Epoch:    3/20    Loss: 3.645131004810333

Epoch:    3/20    Loss: 3.662799038171768

Epoch:    3/20    Loss: 3.707046409368515

Epoch:    4/20    Loss: 3.399345034763543

New Best Loss: Model Saved
Epoch:    4/20    Loss: 3.4146838200092318

Epoch:    4/20    Loss: 3.42980473613739

Epoch:    4/20    Loss: 3.444021119117737

Epoch:    4/20    Loss: 3.4838025910854338

Epoch:    4/20    Loss: 3.4844834179878235

Epoch:    4/20    Loss: 3.5134334399700164

Epoch:    4/20    Loss: 3.5272429411411284

Epoch:    4/20    Loss: 3.54845725107193

Epoch:    5/20    Loss: 3.242033117051129

New Best Loss: Model Saved
Epoch:    5/20    Loss: 3.273796578168869

Epoch:    5/20    Loss: 3.28691948223114

Epoch:    5/20    Loss: 3.313450812101364

Epoch:    5/20    Loss: 3.3269090163707733

Epoch:    5/20    Loss: 3.3544340674877167

Epoch:    5/20    Loss: 3.3820889763832094

Epoch:    5/20    Loss: 3.4025665228366853

Epoch:    5/20    Loss: 3.4252790093421934

Epoch:    6/20    Loss: 3.129925215845444

New Best Loss: Model Saved
Epoch:    6/20    Loss: 3.1412453763484955

Epoch:    6/20    Loss: 3.1792691149711607

Epoch:    6/20    Loss: 3.1948207347393036

Epoch:    6/20    Loss: 3.2189578416347504

Epoch:    6/20    Loss: 3.246541239976883

Epoch:    6/20    Loss: 3.2866171128749846

Epoch:    6/20    Loss: 3.3054614861011506

Epoch:    6/20    Loss: 3.335029572725296

Epoch:    7/20    Loss: 3.020186313552929

New Best Loss: Model Saved
Epoch:    7/20    Loss: 3.0533013434410097

Epoch:    7/20    Loss: 3.0772991843223574

Epoch:    7/20    Loss: 3.1019813883304597

Epoch:    7/20    Loss: 3.1202284171581267

Epoch:    7/20    Loss: 3.169564992427826

Epoch:    7/20    Loss: 3.1844072852134704

Epoch:    7/20    Loss: 3.2218992977142333

Epoch:    7/20    Loss: 3.240302947998047

Epoch:    8/20    Loss: 2.935693301664547

New Best Loss: Model Saved
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-16-9fb8b1f24668> in <module>()
     14 
     15     # training the model
---> 16     trained_rnn, losses = train_rnn(rnn, batch_size, optimizer, criterion, num_epochs, show_every_n_batches)

<ipython-input-12-b41b479ea2f2> in train_rnn(rnn, batch_size, optimizer, criterion, n_epochs, show_every_n_batches)
     21 
     22             # forward, back prop
---> 23             loss, hidden = forward_back_prop(rnn, optimizer, criterion, inputs, labels, hidden)
     24             # record loss
     25             batch_losses.append(loss)

<ipython-input-11-acee5f63dbed> in forward_back_prop(rnn, optimizer, criterion, inp, target, hidden)
     23     loss.backward(retain_graph=True)
     24 
---> 25     nn.utils.clip_grad_norm_(rnn.parameters(), 5)
     26     optimizer.step()
     27 

/opt/conda/lib/python3.6/site-packages/torch/nn/utils/clip_grad.py in clip_grad_norm_(parameters, max_norm, norm_type)
     26         total_norm = 0
     27         for p in parameters:
---> 28             param_norm = p.grad.data.norm(norm_type)
     29             total_norm += param_norm ** norm_type
     30         total_norm = total_norm ** (1. / norm_type)

KeyboardInterrupt: 

The training was cut off here before the full 20 epochs due to time constraints, but the current loss values already give reasonable results. Because training was interrupted manually, the losses were never returned by the train_rnn function and have been manually copied into a list for plotting below.

In [15]:
losses = [5.023212515354157, 4.523681572914123, 4.357919644832611, 4.256966039896011, 4.206574712276459, 4.169255956172943, 4.135705820322037, 4.114186443090439, 4.074571540594101, 3.841714834509068, 3.8300334751605987, 3.8445535378456115, 3.826541626930237, 3.838555276632309, 3.827766616344452, 3.839187493562698, 3.832147630214691, 3.842352229595184, 3.575420247612399, 3.5771549525260924, 3.5859152026176453, 3.61576287817955, 3.6328304860591887, 3.644551205396652, 3.645131004810333, 3.662799038171768, 3.707046409368515, 3.399345034763543, 3.4146838200092318, 3.42980473613739, 3.444021119117737, 3.4838025910854338, 3.4844834179878235, 3.5134334399700164, 3.5272429411411284, 3.54845725107193, 3.242033117051129, 3.273796578168869, 3.28691948223114, 3.313450812101364, 3.3269090163707733, 3.3544340674877167, 3.3820889763832094, 3.4025665228366853, 3.4252790093421934, 3.129925215845444, 3.1412453763484955, 3.1792691149711607, 3.1948207347393036, 3.2189578416347504, 3.246541239976883, 3.2866171128749846, 3.3054614861011506, 3.335029572725296, 3.020186313552929, 3.0533013434410097,3.0772991843223574, 3.1019813883304597, 3.1202284171581267, 3.169564992427826, 3.184407285213470, 3.2218992977142333, 3.240302947998047, 2.935693301664547]
In [19]:
import matplotlib.pyplot as plt
%matplotlib inline

plt.plot(losses)
plt.title("TAZ Script Generation RNN Training Progress")
plt.xlabel("1000th Batch Results")
plt.ylabel("Model Loss")
plt.show()

The loss dropped steadily with these hyperparameters, representing a model that was effectively learning the patterns and structure of its training data. Since the loss never actually plateaued, further training would likely have continued to improve the model's performance for quite some time. Finally, notice that the loss descends in a staircase pattern. This is because the batch-averaged loss was recorded every 1,000 batches rather than once per epoch: each large step down marks the start of a new training epoch and the first averaged loss reported after the previous epoch completed.
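
Since the losses list records nine averages per epoch (one per 1,000 batches, as seen in the training log above), the epoch boundaries can be marked on the plot to make the staircase pattern explicit. This is a small, optional re-plot of the same data:

# re-plot the recorded losses with epoch boundaries marked
# (each epoch above produced nine reports, so boundaries fall every nine entries)
plt.plot(losses)
for boundary in range(9, len(losses), 9):
    plt.axvline(boundary, color='gray', linestyle='--', linewidth=0.5)
plt.title("TAZ Script Generation RNN Training Progress")
plt.xlabel("1000th Batch Results")
plt.ylabel("Model Loss")
plt.show()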


Checkpoint

After running the above training cell, the model is saved to the file trained_rnn. The notebook can be resumed from here by running the next cell, which loads the word:id dictionaries and the saved model by name.

In [20]:
import torch
import helper
import numpy as np

_, vocab_to_int, int_to_vocab, token_dict = helper.load_preprocess()
trained_rnn = helper.load_model('./save/trained_rnn')

Generate a TAZ Podcast Episode Script

With the network trained and saved, it can be used to generate a new, "fake" The Adventure Zone script.

Generate Text

To generate the text, the network starts with a priming word and repeats its predictions until it reaches a set length. The generate function does this: it takes a word id to start with, prime_id, and generates a set length of text, predict_len. Also note that it uses top-k sampling to introduce some randomness when choosing the most likely next word from the output word scores.

In [21]:
import torch.nn.functional as F

def generate(rnn, prime_id, int_to_vocab, token_dict, pad_value, predict_len=100):
    """
    Generate text using the neural network
    :param rnn: The PyTorch Module that holds the trained neural network
    :param prime_id: The word id to start the first prediction
    :param int_to_vocab: Dict of word id keys to word values
    :param token_dict: Dict of punctuation token keys to punctuation values
    :param pad_value: The value used to pad a sequence
    :param predict_len: The length of text to generate
    :return: The generated text
    """
    rnn.eval()
    
    # create a sequence (batch_size=1) with the prime_id
    current_seq = np.full((1, sequence_length), pad_value)
    current_seq[-1][-1] = prime_id
    predicted = [int_to_vocab[prime_id]]
    
    for _ in range(predict_len):
        if train_on_gpu:
            current_seq = torch.LongTensor(current_seq).cuda()
        else:
            current_seq = torch.LongTensor(current_seq)
        
        # initialize the hidden state
        hidden = rnn.init_hidden(current_seq.size(0))
        
        # get the output of the rnn
        output, _ = rnn(current_seq, hidden)
        
        # get the next word probabilities
        p = F.softmax(output, dim=1).data
        if(train_on_gpu):
            p = p.cpu() # move to cpu
         
        # use top_k sampling to get the index of the next word
        top_k = 5
        p, top_i = p.topk(top_k)
        top_i = top_i.numpy().squeeze()
        
        # select the likely next word index with some element of randomness
        p = p.numpy().squeeze()
        word_i = np.random.choice(top_i, p=p/p.sum())
        
        # retrieve that word from the dictionary
        word = int_to_vocab[word_i]
        predicted.append(word)     
        
        # the generated word becomes the next "current sequence" and the cycle can continue
        current_seq = np.roll(current_seq, -1, 1)
        current_seq[-1][-1] = word_i
    
    gen_sentences = ' '.join(predicted)
    
    # Replace punctuation tokens
    for key, token in token_dict.items():
        ending = ' ' if key in ['\n', '(', '"'] else ''
        gen_sentences = gen_sentences.replace(' ' + token.lower(), key)
    gen_sentences = gen_sentences.replace('\n ', '\n')
    gen_sentences = gen_sentences.replace('( ', '(')
    
    # return all the sentences
    return gen_sentences

Generate a New Script

Set gen_length to the length of script you want to generate and set prime_word to start the prediction:

You can set the prime word to any word in the encoding id dictionary, but it's best to start with a name from the original Balance Arc (or "previously", as in "Previously on the Adventure Zone") for generating a TAZ script.

In [26]:
# run the cell multiple times to get different results!
gen_length = 400 # modify the length to your preference
prime_word = 'previously' # word to start the script with

pad_word = helper.SPECIAL_WORDS['PADDING']
generated_script = generate(trained_rnn, vocab_to_int[prime_word], int_to_vocab, token_dict, vocab_to_int[pad_word], gen_length)
print(generated_script)
/opt/conda/lib/python3.6/site-packages/ipykernel_launcher.py:40: UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
previously and
thinking, but…
magnus: yeah!
taako: no, no, no, it’s not like a good idea.
clint: okay.
griffin: and then you hear the sound of, like, scream in a room?
griffin: yeah, sure, sure.
justin:[ crosstalk] yeah, sure.
travis:[ rolls dice] that’s a 17, plus 7, 25?
griffin: yeah, that’s good.
justin: i mean it sounds like the only ones that we have, like, in the
world, like, a lot of people are wonderin’,
they’re all seated, so they don’t really know about it.
travis: okay, i’m gonna do it.
griffin: yeah, that’s a hit.
travis: i rolled, i rolled a 14, plus two, so that’s fourteen points of damage.
justin: okay, so that’s 16+7.
griffin: okay.
justin:[ snorts]
griffin: and the other two goblins come out from under their weapons, and you
can see it, and it’s a pretty small stretch of wood, it’s a
sentient- shaker ceremony.
justin:[ laughing]
merle:[ imitating] o cha, oontz… ”
griffin: okay.
clint:— to that, to find out your solution.
taako: i mean, i mean, you can see it, right?
travis: i think that.
griffin: okay.
clint:[ laughs]
griffin: okay. you pop open isaak’s locker.
travis: okay, i’m gonna go first with the chance lance.
justin: okay, i just wanna see if i can figure it out.
griffin: okay, you slam the handle.
justin: yeah, that’s fine.
clint:[ crosstalk] i have an ax, right?
justin: i don’t know what the fuck is goin’ on.
[ travis laughs]
griffin: you

Save Scripts

In [27]:
# save script to a text file
f =  open("generated_script_5.txt","w")
f.write(generated_script)
f.close()

Results

It's alright if the script doesn't make perfect sense. It should look like alternating lines of dialogue. Here is one such example of a few generated lines.

Example generated script

taako.

clint: well, we got that.

griffin: okay. uh, and uh, he-

justin:[ in a high- pitched voice] hey guys.[ pause]

griffin: she says,

lup: okay, i got a— i have a question. i’m gonna need to go to the quarry, and try and deduce what happened, because we have to warn you, but i promise you.

magnus: well, i have a question for you.

hudson: i mean you guys are getting pretty close to the astral plane. you can see it.

magnus:[ whispering] oh!

merle: no, no, no, no, no! no! no. no.

justin:[ crosstalk] oh, i see.

travis:[ crosstalk] i have no idea what the fuck i do.

griffin: yeah, i mean you just chill in that.

justin: okay.

griffin: okay, so this brings a rock around its central neck and you realize that this memory is just radiating you in the ceiling.

clint: okay, and i cast zone on truth.

griffin: okay. you pop open the stair in the gachapon cell, you hear a voice say:

lydia: well, i have a— i had a— i have a +1 skill.

griffin: okay!

travis: okay, i’m gonna roll a d8.

griffin: okay.

travis: and i’m gonna do a nature check now?

griffin: no, he is alive. he’s lying in the center, he’s not especially familiar, he has a big bandage of stars and steel. and then you hear the woman’s voice say,

female elf: i don’t know what to tell you, merle.

clint: alright, let’s go.

griffin: yeah, so you guys have just won a war.

clint: well, i’m not gonna do a perception check

You can see that there are multiple characters that say (somewhat) complete sentences, but it's not going to be perfect. It takes a lot of training and a large amount of computing power to get good results, and it is often necessary to gather more data and use a reduced vocabulary that discards unimportant and uncommon words. A reduced vocabulary could be explored here (a sketch of the idea follows below), but the nature of the podcast produces a plethora of fantastical words and colorful phrases that are uncommon or nonexistent in English and often appear only a handful of times in the transcript. Additionally, with the Balance arc ending in August of 2017, additional training data for it is not really available. Transcripts from live shows or later seasons could be used to better learn the actors' personalities, but extreme care would be needed to keep the model from blending character personalities, which is well beyond the scope of this project.
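
As a rough illustration of that reduced-vocabulary idea (not something done in this project), rare words could be collapsed into a single placeholder token before building the lookup tables. The function name, threshold, and <UNKNOWN> token below are all hypothetical choices:

from collections import Counter

def reduce_vocab(words, min_count=3, unk_token='<UNKNOWN>'):
    # replace words that appear fewer than min_count times with unk_token
    counts = Counter(words)
    return [word if counts[word] >= min_count else unk_token for word in words]

# usage sketch: reduce the tokenized word list before calling create_lookup_tables
# reduced_words = reduce_vocab(tokenized_words, min_count=3)
# vocab_to_int, int_to_vocab = create_lookup_tables(reduced_words)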