Create and train a model from scratch!
- What will we do?
- Getting and preparing the data (you can skip this part)
- Create and train the model
What will we do?
We will create a linear model and then a neural network from scratch to do binary classification, and train both with gradient descent. Finally, we'll see how current libraries simplify these steps for us, and we'll write our own class that mimics the tools these libraries provide.
If you have never used PyTorch before, I'd refer you to my other post, "Start simple, start with a baseline!", which gives a gentler introduction to the tools needed here.
Getting and preparing the data (you can skip this part)
We'll be using the MNIST_SAMPLE dataset provided by fastai, which contains images of 3's and 7's.
We start by downloading the data, opening each image, and converting it to a float tensor scaled to the range [0, 1] by dividing by 255. We then stack the images of each digit into a rank-3 tensor (number of images × 28 × 28), separately for the training and validation sets.
# fastai's star import also brings in torch, tensor, Image (PIL) and the other names used below
from fastai.vision.all import *
path = untar_data(URLs.MNIST_SAMPLE)
three_train = torch.stack([tensor(Image.open(o)).float()/255 for o in (path/'train'/'3').ls()])
seven_train = torch.stack([tensor(Image.open(o)).float()/255 for o in (path/'train'/'7').ls()])
seven_valid = torch.stack([tensor(Image.open(o)).float()/255 for o in (path/'valid'/'7').ls()])
three_valid = torch.stack([tensor(Image.open(o)).float()/255 for o in (path/'valid'/'3').ls()])
Then we concatenate the 3 and 7 tensors into a single tensor, separately for training and validation. We also flatten each image into a vector of 28*28 = 784 pixels by reshaping it with view(-1, 28*28).
For our dependent variable (the labels) we create the corresponding tensor, with 1 indicating a 3 and 0 indicating a 7. We add a dimension with unsqueeze(1) so that the labels have shape (n, 1), like the model's predictions. Otherwise broadcasting would cause problems further down the road.
train_x = torch.cat([three_train, seven_train]).view(-1, 28*28)
train_y = tensor([1] * len(three_train) + [0] * len(seven_train)).unsqueeze(1)
train_x.shape, train_y.shape
valid_x = torch.cat([three_valid, seven_valid]).view(-1, 28*28)
valid_y = tensor([1] * len(three_valid) + [0] * len(seven_valid)).unsqueeze(1)
valid_x.shape, valid_y.shape
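The next sketch shows why the extra dimension matters. It implements, in pure Python, the broadcasting rule that PyTorch (like NumPy) uses to combine two shapes. The shapes are aligned from the right, and each pair of sizes must either be equal or contain a 1. The helper broadcast_shape is hypothetical and exists only for illustration.

```python
def broadcast_shape(a, b):
    """Return the shape obtained by broadcasting shapes a and b (NumPy/PyTorch rules)."""
    result = []
    # Compare dimensions from the right; a missing dimension counts as size 1
    for i in range(1, max(len(a), len(b)) + 1):
        x = a[-i] if i <= len(a) else 1
        y = b[-i] if i <= len(b) else 1
        if x != y and x != 1 and y != 1:
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
        # A size-1 dimension is stretched to match the other one
        result.append(y if x == 1 else x)
    return tuple(reversed(result))

# Predictions (256, 1) vs labels (256,): every prediction gets compared with every label
print(broadcast_shape((256, 1), (256,)))    # (256, 256)
# After unsqueeze(1), labels are (256, 1): one comparison per image
print(broadcast_shape((256, 1), (256, 1)))  # (256, 1)
```

Without unsqueeze(1), an expression like (predictions > 0) == targets silently produces a 256 × 256 matrix instead of raising an error. In PyTorch itself, torch.broadcast_shapes performs the same computation.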
Next we create our dataset. In PyTorch, a dataset is anything that can be indexed and has a length and returns (independent, dependent) pairs, i.e. (x, y), so a plain list of tuples does the job.
Then we create a DataLoader, which shuffles the dataset and groups it into batches of size 256.
Finally, we wrap the training and validation DataLoaders in a DataLoaders object.
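dset = list(zip(train_x, train_y))
dl = DataLoader(dset, batch_size=256, shuffle=True)
dset_valid = list(zip(valid_x, valid_y))
# No need to shuffle the validation set: we only evaluate on it
dl_valid = DataLoader(dset_valid, batch_size=256)
# dls[0] (or dls.train) is the training DataLoader, dls[1] (or dls.valid) the validation one
dls = DataLoaders(dl, dl_valid)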
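Conceptually, a DataLoader with shuffle=True works like the following pure-Python generator. This is a simplified sketch, and the name simple_dataloader is hypothetical. The real fastai and PyTorch DataLoaders also collate each batch into stacked tensors, draw a new shuffled order every epoch, and can load batches in parallel.

```python
import random

def simple_dataloader(dataset, batch_size, shuffle=False, seed=None):
    """Yield lists of (x, y) items, batch_size at a time (the last batch may be smaller)."""
    indices = list(range(len(dataset)))
    if shuffle:
        # Randomize the order in which the items are visited
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]

# Toy dataset of 10 (x, y) pairs
dset = [(x, x % 2) for x in range(10)]
batches = list(simple_dataloader(dset, batch_size=4, shuffle=True, seed=0))
print([len(batch) for batch in batches])  # [4, 4, 2]
```

Shuffling changes only the order in which items are visited. Every item still appears exactly once per pass through the data.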
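To monitor training, we use accuracy as our metric. A prediction counts as a 3 when the model's raw output is greater than 0, which is the same as its sigmoid being greater than 0.5.
def batch_accuracy(predictions, targets):
    # Raw output > 0 means sigmoid(output) > 0.5, i.e. the model predicts a 3
    return ((predictions > 0) == targets).float().mean()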
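We also need a loss function to minimize with stochastic gradient descent (SGD). We can't use accuracy for this. A small change in the weights almost never flips a prediction, so the gradient of accuracy is zero almost everywhere. Instead, mnist_loss squashes the predictions into the range (0, 1) with the sigmoid and measures how far each one is from its target: 1 - sigmoid for a 3, and sigmoid for a 7.
def mnist_loss(predictions, targets):
    # Squash the raw outputs into the range (0, 1)
    result = predictions.sigmoid()
    # Distance from the target: 1 - result when the target is 1 (a 3), result when it is 0 (a 7)
    return torch.where(targets == 1, 1 - result, result).mean()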
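This pure-Python sketch shows why the loss is more useful than accuracy for training. It uses plain lists instead of tensors, and mnist_loss_py and accuracy_py are hypothetical names that mirror mnist_loss and batch_accuracy. Nudging one prediction further in the right direction leaves the accuracy unchanged but lowers the loss, so the loss gives SGD a signal to follow.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def mnist_loss_py(predictions, targets):
    # Same logic as mnist_loss, on plain Python lists
    distances = [1 - sigmoid(p) if t == 1 else sigmoid(p) for p, t in zip(predictions, targets)]
    return sum(distances) / len(distances)

def accuracy_py(predictions, targets):
    # A raw output > 0 (sigmoid > 0.5) is read as "this is a 3"
    return sum((p > 0) == (t == 1) for p, t in zip(predictions, targets)) / len(targets)

targets = [1, 0, 1]
before = [0.5, -1.0, -0.2]  # the last image (a 3) is misclassified
after = [0.6, -1.0, -0.2]   # first prediction nudged further in the right direction

print(accuracy_py(before, targets) == accuracy_py(after, targets))   # True: same accuracy
print(mnist_loss_py(before, targets) > mnist_loss_py(after, targets))  # True: the loss went down
```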
Where's the model? No worries, it's right here waiting for you!
In Python, "@" is the matrix multiplication operator, so our linear model computes xb @ w + b for a whole batch at once.
def linear(xb): return xb @ w + b
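Written out in pure Python, xb @ w + b multiplies each image's pixel vector by the weight column and then adds the bias. The sketch below uses tiny made-up numbers (3 pixels instead of 784). The matmul and linear_py helpers are hypothetical.

```python
def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows."""
    assert len(a[0]) == len(b), "inner dimensions must match"
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def linear_py(xb, w, b):
    # xb: (batch, pixels), w: (pixels, 1), b: a single bias added to every output
    return [[v + b for v in row] for row in matmul(xb, w)]

xb = [[0.0, 0.5, 1.0],      # a batch of 2 "images" with 3 pixels each
      [1.0, 1.0, 0.0]]
w = [[0.2], [-0.4], [0.1]]  # one weight per pixel, shape (3, 1)
out = linear_py(xb, w, 0.5)
print(out)  # one prediction per image, shape (2, 1): approximately [[0.4], [0.3]]
```

With the real data, the shapes are (256, 784) @ (784, 1) → (256, 1): one prediction per image in the batch.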
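We also need a function to randomly initialize the parameters. Calling requires_grad_() tells PyTorch to track their gradients. Then we initialize the weights w and the bias b.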
def init_params(*size): return (torch.randn(size)).requires_grad_()
w, b = init_params(28*28, 1), init_params(1)
Our training loop follows the gradient descent steps: predict, calculate the loss, calculate the gradients, take a step, and repeat. So let's do just that.
epochs = 5
# Setting the learning rate
lr = 1
# Training loop
for i in range(epochs):
    for xb, yb in dls[0]:
        # Predict
        result = linear(xb)
        # Calculate the loss
        loss = mnist_loss(result, yb)
        # Calculate the gradients
        loss.backward()
        # Take a step
        w.data -= w.grad * lr
        b.data -= b.grad * lr
        # Set the gradients to zero, so the next time we calculate them, they don't accumulate
        w.grad.zero_()
        b.grad.zero_()
    # Show the validation accuracy after each epoch (no gradients needed here)
    with torch.no_grad():
        print(tensor([batch_accuracy(linear(xb), yb) for xb, yb in dls[1]]).mean().item(), end=' ')
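The loop above relies on PyTorch's autograd to compute the gradients. The sketch below shows what autograd computes here. It runs the same predict, loss, gradient, step cycle in pure Python for a hypothetical one-feature model z = w*x + b. The gradient of mnist_loss is derived by hand using sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)). The toy data and names are made up, and the whole toy dataset serves as a single batch.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def loss_and_grads(w, b, xs, ts):
    """mnist_loss for the model z = w*x + b, plus its gradients w.r.t. w and b."""
    loss = grad_w = grad_b = 0.0
    for x, t in zip(xs, ts):
        s = sigmoid(w * x + b)
        # The loss is 1 - s for a target of 1 and s for a target of 0
        loss += (1 - s) if t == 1 else s
        # d(loss)/dz, using sigmoid'(z) = s * (1 - s)
        dz = -s * (1 - s) if t == 1 else s * (1 - s)
        grad_w += dz * x  # dz/dw = x
        grad_b += dz      # dz/db = 1
    n = len(xs)
    return loss / n, grad_w / n, grad_b / n

# Toy data: positive x values are labelled 1, negative ones 0
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ts = [0, 0, 0, 1, 1, 1]
w, b, lr = -1.0, 0.0, 1.0  # start with a weight pointing the wrong way

initial_loss, _, _ = loss_and_grads(w, b, xs, ts)
for epoch in range(50):
    loss, grad_w, grad_b = loss_and_grads(w, b, xs, ts)  # predict, loss and gradients
    w -= lr * grad_w                                     # take a step
    b -= lr * grad_b
    # No zeroing needed: loss_and_grads recomputes the gradients from scratch each time
final_loss, _, _ = loss_and_grads(w, b, xs, ts)
print(initial_loss, final_loss)  # the final loss is lower than the initial one
```

In PyTorch, gradients accumulate across backward() calls, which is why the real loop needs grad.zero_().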
Now let's scale up a bit and build a neural net. It consists of two linear layers with a non-linearity between them: the rectified linear unit (ReLU), which replaces every negative value with 0.
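def simple_net(xb):
    # First linear layer: 784 inputs -> 30 activations
    result1 = xb @ w1 + b1
    # Non-linearity
    result2 = F.relu(result1)
    # Second linear layer: 30 activations -> 1 output
    return result2 @ w2 + b2
w1, b1 = init_params(28*28, 30), init_params(30)
w2, b2 = init_params(30, 1), init_params(1)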
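Why do we need the ReLU at all? Without it, two linear layers collapse into a single linear layer, because (x @ w1 + b1) @ w2 + b2 = x @ (w1 @ w2) + (b1 @ w2 + b2). Here is a pure-Python check with made-up weights for 1 input, a hypothetical hidden layer of 2 units, and 1 output.

```python
def relu(v):
    return max(v, 0.0)

# Made-up weights: 1 input -> 2 hidden units -> 1 output
w1, b1 = [2.0, -1.0], [0.5, 1.0]
w2, b2 = [1.5, 2.0], 0.25

def two_layers(x, activation=None):
    hidden = [wi * x + bi for wi, bi in zip(w1, b1)]      # first linear layer
    if activation is not None:
        hidden = [activation(h) for h in hidden]           # optional non-linearity
    return sum(h * wo for h, wo in zip(hidden, w2)) + b2  # second linear layer

# The equivalent single linear layer: weight w1 @ w2, bias b1 @ w2 + b2
w_combined = sum(a * c for a, c in zip(w1, w2))
b_combined = sum(a * c for a, c in zip(b1, w2)) + b2

for x in [-2.0, 0.0, 3.0]:
    # Without ReLU the first two columns match; with ReLU the output differs at x = -2 and x = 3
    print(two_layers(x), w_combined * x + b_combined, two_layers(x, relu))
```

The ReLU is what lets the network represent functions that a single linear layer cannot.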
We'll use the same training loop, but since this is a more "complex" model, we'll lower the learning rate and train for more epochs. To keep the output short, we'll only show the accuracy after the last epoch.
epochs = 40
# Setting the learning rate
lr = 0.1
# Training loop
for i in range(epochs):
    for xb, yb in dls[0]:
        # Predict
        result = simple_net(xb)
        # Calculate the loss
        loss = mnist_loss(result, yb)
        # Calculate the gradients
        loss.backward()
        # Take a step
        w1.data -= w1.grad * lr
        b1.data -= b1.grad * lr
        w2.data -= w2.grad * lr
        b2.data -= b2.grad * lr
        # Set the gradients to zero, so the next time we calculate them, they don't accumulate
        w1.grad.zero_()
        b1.grad.zero_()
        w2.grad.zero_()
        b2.grad.zero_()
# Show the validation accuracy after the last epoch (no gradients needed here)
with torch.no_grad():
    print(tensor([batch_accuracy(simple_net(xb), yb) for xb, yb in dls[1]]).mean().item())
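Now let's see how PyTorch and fastai simplify all of this. PyTorch's nn.Sequential chains the layers together and creates and initializes their parameters for us, and fastai's Learner runs the training loop.
simple_net = nn.Sequential(
    nn.Linear(28*28, 30),
    nn.ReLU(),
    nn.Linear(30, 1)
)
# Create the Learner object
learn = Learner(dls, simple_net, loss_func=mnist_loss, opt_func=SGD, metrics=batch_accuracy)
# Train the model
learn.fit(40, lr=0.1)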
As you can see, we create a Learner object, a class that handles the whole training process for you. You just give it five things:
- Your DataLoaders object.
- The model.
- The loss function to optimize.
- The optimization function.
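- The metrics to be displayed (optional).
The optimization function (the optimizer) handles the gradient step for you. Its step() method updates the parameters using their gradients, and zero_grad() sets the gradients back to zero. Here SGD is fastai's optimizer; PyTorch provides its own as torch.optim.SGD.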
opt = SGD(simple_net.parameters(), 0.1) # we give it the params and learning rate
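To make the optimizer's role concrete, here is a minimal pure-Python sketch of a plain SGD optimizer that exposes the same two methods, step() and zero_grad(). The Param and BasicSGD classes are hypothetical; Param stands in for a tensor with .data and .grad. This is not the fastai or PyTorch implementation, both of which also support momentum, weight decay, and more.

```python
class Param:
    """Hypothetical stand-in for a parameter tensor: a value plus its gradient."""
    def __init__(self, data):
        self.data = data
        self.grad = 0.0

class BasicSGD:
    def __init__(self, params, lr):
        self.params, self.lr = list(params), lr

    def step(self):
        # Same update as in our manual loop: data -= grad * lr
        for p in self.params:
            p.data -= p.grad * self.lr

    def zero_grad(self):
        # Reset the gradients so they don't accumulate across batches
        for p in self.params:
            p.grad = 0.0

w, b = Param(1.0), Param(0.5)
opt = BasicSGD([w, b], lr=0.1)
w.grad, b.grad = 2.0, -1.0  # pretend loss.backward() produced these
opt.step()
opt.zero_grad()
print(w.data, b.data, w.grad)  # approximately 0.8 0.6 0.0
```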
Let's try to mimic this by creating our own Learner class from scratch.
class MyLearner:
    # Initializing the learner
    def __init__(self, dls, model, opt_func=SGD, loss_func=mnist_loss, metrics=batch_accuracy):
        self.dls = dls
        self.model = model
        self.opt_func = opt_func
        self.loss_func = loss_func
        self.metrics = metrics
        # We store the metric values in a list (to plot them, for example)
        self.metric_values = []

    # Method for training our model
    def fit(self, epochs, lr=1):
        # Create the optimizer
        opt = self.opt_func(self.model.parameters(), lr)
        # Training loop (same as before)
        for i in range(epochs):
            for xb, yb in self.dls[0]:
                result = self.model(xb)
                loss = self.loss_func(result, yb)
                loss.backward()
                # We update the weights using our optimizer
                opt.step()
                # Setting the gradients to zero using the same optimizer
                opt.zero_grad()
            # Calculate the metric on the validation set, store it and print it for each epoch
            with torch.no_grad():
                b_accuracy = tensor([self.metrics(self.model(xb), yb) for xb, yb in self.dls[1]]).mean().item()
            self.metric_values.append(b_accuracy)
            print(round(b_accuracy, 4), end=' ')
Let's try it on a simple linear model. IT WORKS!!!
my_learner = MyLearner(dls, nn.Linear(28*28, 1), opt_func=SGD, loss_func=mnist_loss, metrics=batch_accuracy)
# Train it
my_learner.fit(20)
# Plot the metric values with each epoch
plt.plot(my_learner.metric_values);