Getting started with PyTorch for Deep Learning (Part 1: Tensors)

This is Part 1 of the tutorial series. Please also see the other parts (Part 2, Part 3).

In this tutorial I will try to give a short, to-the-point guide to using PyTorch for Deep Learning. I will not be explaining the concepts behind machine learning, neural networks, deep learning, etc. There are already several great tutorials on these topics out there, such as the thorough series by Adam Geitgey, Machine Learning is Fun!, as well as the free lectures from Stanford University. Also look at the Deep Learning with PyTorch tutorial, which was used as the basis for this post.

This tutorial also assumes a basic understanding of Python and NumPy. For that see this quickstart guide. NOTE: we will use Python 3.6 for this tutorial.

Also there are many useful code samples on GitHub, such as this nice tutorial.

Why use PyTorch?

A good overview is given on the PyTorch Website. But in short:

  • PyTorch allows tensor computations (like NumPy), but with strong GPU acceleration, which significantly speeds up the process of training a neural network.
  • It has a built-in autograd system, which allows automatic back-propagation through the network.
  • The network is defined dynamically, which means the behavior of the network can change during training. This is not possible with static frameworks such as TensorFlow, Theano, and Caffe, where the network is defined once before training.
  • Due to the dynamic nature of PyTorch, debugging is much simpler, since the line at which an error occurred can be inspected directly (as opposed to digging through a deep stack trace of the internally built model).
  • It is fast.

Installation

If you are planning to use a GPU to train your neural networks, you will have to install CUDA first. This is probably something you would want to do if you have a strong GPU, since training can easily be 10 times faster than running the same process on a CPU.

You can install CUDA from the Nvidia website.

Next we will install PyTorch. Use the section appropriate to your operating system.

Linux

We will need both PyTorch and TorchVision (contains built-in data sets like MNIST and CIFAR10), so using conda, install them using the following commands:

conda install pytorch torchvision cuda90 -c pytorch

This assumes you installed CUDA 9; if you are still using CUDA 8, simply drop the cuda90 part.
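
For example, without the cuda90 part the install command simply becomes:

conda install pytorch torchvision -c pytorch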

Windows

At the time of writing (November 2017) Windows was not officially supported by PyTorch, but the community had released a mostly stable version. Some problems you may encounter are probably already addressed here. We will need both PyTorch and TorchVision (contains built-in data sets like MNIST and CIFAR10), so using conda, install them using the following commands:

# for CPU only packages
conda install -c peterjc123 pytorch

# for Windows 10 and Windows Server 2016, CUDA 9
conda install -c peterjc123 pytorch cuda90

conda install -c soumith torchvision

This assumes you installed CUDA 9; if you are still using CUDA 8, simply change cuda90 to cuda80.

PyTorch basics

Before we can use PyTorch we first import it using:

from __future__ import print_function
import torch

For a neural network we will need inputs, outputs, weights and biases. All of these will be represented with PyTorch Tensors. A tensor can be thought of as a general term for a multi-dimensional array (a vector is a 1D tensor, a matrix is a 2D tensor, etc.). After importing PyTorch, we can now define a Tensor (which is similar to the ndarray in NumPy) as:

x = torch.Tensor(5, 3)
print(x)

This produces the output:

1.00000e-36 *
  0.0228  0.0000  1.3490
  0.0000  0.0958  0.0000
  0.0958  0.0000  0.0958
  0.0000  0.0958  0.0000
  0.0958  0.0000  0.0958
[torch.FloatTensor of size 5x3]

Note that defining a Tensor this way does not automatically initialize its values to zero. This is because only memory is allocated for the tensor, and whatever happened to be in that memory at the time becomes the values of the Tensor. Therefore, it is more common to create a Tensor using one of the several initialization functions built into PyTorch (see here and here), such as:

torch.eye(3)                       # 3x3 identity matrix
torch.linspace(3, 10, steps=5)     # 5 values evenly spaced between 3 and 10
torch.zeros(5, 3)                  # 5x3 matrix filled with zeros
torch.rand(5, 3)                   # 5x3 matrix with uniform random values in [0, 1)
torch.Tensor(3, 3).uniform_(0, 1)  # uninitialized 3x3 matrix, filled in-place with uniform values
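
After creating a tensor you can confirm its shape with size(), for instance (a minimal check, reusing the zeros() call from above):

x = torch.zeros(5, 3)
print(x.size())  # prints torch.Size([5, 3])
print(x)         # all values are now initialized to zero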

NumPy to/from PyTorch Tensors

Since PyTorch Tensors are so similar to NumPy arrays, they can easily be converted from the one to the other:

import numpy as np
a = np.ones(5)
b = torch.from_numpy(a) # convert ndarray to Tensor

a = torch.ones(5)
b = a.numpy() # convert Tensor to ndarray
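
Note that in both directions the Tensor and the ndarray share the same underlying memory, so modifying one in place also changes the other. A minimal sketch to illustrate this:

a = np.ones(5)
b = torch.from_numpy(a)
a += 1      # modify the ndarray in place...
print(b)    # ...and the Tensor reflects the change, since the memory is shared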

Indexing/Slicing

Indexing and slicing of PyTorch Tensors work the same as in NumPy.

b = a[:, 3:5]  # selects all rows, 4th column and 5th column from a
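
Here are a few more examples of the same NumPy-style syntax (a small sketch, assuming a is a 2D tensor such as torch.rand(5, 6)):

a = torch.rand(5, 6)
first_row = a[0]      # the first row
first_col = a[:, 0]   # the first column
block = a[1:3, :]     # rows 2 and 3, all columns
element = a[4, 2]     # single element in the 5th row, 3rd column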

Note that, at the time of writing, negative strides are not supported by PyTorch (you can follow the issue here). To make a negatively-strided slice, the best workaround is to convert the Tensor to a NumPy array, make a (reversed) copy, and convert it back. Since a copy is made, this will also use extra memory, which might be undesirable, so use it with caution.

a_temp = a.numpy()[-3:3:-1].copy()  # reversed slice, copied as a NumPy array
a = torch.from_numpy(a_temp)        # convert the copy back to a Tensor

Math operations

There are several Tensor functions available to perform standard math operations, such as abs(), cos(), sin(), add(), mul(), min(), max(), etc. Many of these functions have an in-place and an out-of-place version, where the in-place version is identified by an underscore _ at the end of the function name. For example, multiplication can be written as:

a = torch.Tensor([3.0])

b = a.mul(4.0)  # out-of-place (doesn't modify a)
# a = 3.0 and b = 12.0

a.mul_(4.0)     # in-place (modifies a)
# a = 12.0

Furthermore, the standard math operators also behave as expected. Just note that addition and multiplication of a constant with a matrix is applied to every element in the matrix. Similarly, addition and multiplication of a vector with a matrix is applied to every row/column of the matrix (given that the dimensions of the vector and matrix agree).

c = a * b
d = a + b
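
For example, broadcasting a scalar and a vector over a matrix looks as follows (a small sketch, assuming a 5x3 matrix):

m = torch.rand(5, 3)  # a 5x3 matrix
s = m * 2.0           # every element of m is multiplied by 2
t = m + 1.0           # 1 is added to every element of m
v = torch.ones(3)     # a vector of length 3
u = m + v             # v is added to every row of m (the dimensions agree)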

Using the GPU

Usually we would like all of our Tensors to reside in GPU memory, in order to eliminate the transfer bottleneck from RAM to the GPU when training a model. This can be done by simply calling the cuda() function.

# First check if we can use the GPU
if torch.cuda.is_available():
    x = x.cuda()
    y = y.cuda()
    x + y
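
A Tensor can be moved back to main memory with the cpu() function, which is also needed before converting it to a NumPy array (a small sketch, assuming CUDA is available):

if torch.cuda.is_available():
    x = torch.rand(5, 3).cuda()  # create a Tensor and move it to the GPU
    y = torch.rand(5, 3).cuda()
    z = x + y                    # the addition runs on the GPU
    z = z.cpu()                  # move the result back to main memory
    print(z.numpy())             # numpy() only works on CPU Tensors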

Note that if your check for CUDA availability returns False, it probably means that CUDA has not been installed correctly (see the download link at the beginning of this post). Here is an installation guide that you might find useful.

Conclusion

That is the end of Part 1 of Getting Started with PyTorch. In Part 2 we will look at the automatic differentiation functionality that PyTorch offers through its autograd package.

Continue to Part 2: Autograd >>

3 thoughts on “Getting started with PyTorch for Deep Learning (Part 1: Tensors)”

  1. Thanks for the series with its clear explanations.

    On a minor note, does the second example in the ‘Indexing/Slicing’ section work for you? I had problems getting negative steps to work. Looking into it a bit, I got the sense that the PyTorch folks are interested in supporting that but haven’t gotten around to it yet:

    https://github.com/pytorch/pytorch/issues/604
    https://github.com/pytorch/pytorch/issues/229

    In any case, thanks again for your efforts!


    1. Hi sogaiu,

      Thank you for your comment. I actually did not test it myself, I just assumed it would work, since the PyTorch documentation said that slicing works the same as NumPy and it seemed reasonable that it should work, so thanks for picking that up. As you pointed out, it is actually not supported at the moment since PyTorch explicitly requires the slice step to be greater than 0. And even something like a_t=torch.from_numpy(a[-3:3:-1]) gives an error, stating that it is not currently supported…

      Currently it seems like slicing a tensor requires some numpy/pytorch acrobatics. Here is a possible solution that I came up with. a_t2=torch.from_numpy(a_t.numpy()[-3:3:-1].copy())
      One big downside of this is that you are making a copy of the data, and if you are working with massive tensors (as is often the case in deep learning), you will use unnecessarily large amounts of memory. So let's hope they can implement it soon, since it is quite useful.

      Rensu

