What is PyTorch?
- A machine learning framework in Python.
- Two main features:
  - N-dimensional tensor (matrix) computation on GPUs (like NumPy)
  - Automatic differentiation for training deep neural networks
Tensor
High-dimensional matrices (arrays)

Check its shape with .shape

Create tensor:
- Directly from data:
x = torch.tensor([[1, -1], [-1, 1]])
x = torch.from_numpy(np.array([[1, -1], [-1, 1]]))
"""
tensor([[1., -1.],
[-1., 1.]])
"""- Tensor of constant zeros & ones:
x = torch.zeros([2, 2])
"""
tensor([[0., 0.],
[0., 0.]])
"""
x = torch.ones([1, 2, 5])
"""
tensor([
[[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.]]
])
"""Tensor Operations:
- Basic arithmetic
  - Addition: z = x + y
  - Subtraction: z = x - y
  - Summation: y = x.sum()
  - Mean: y = x.mean()
  - Power: y = x.pow(2)
- Transpose: swap two specified dimensions
x = torch.zeros([2, 3])
x.shape # torch.Size([2, 3])
x = x.transpose(0, 1)
x.shape # torch.Size([3, 2])
- Squeeze: remove the specified dimension of length 1
x = torch.zeros([1, 2, 3])
x.shape # torch.Size([1, 2, 3])
x = x.squeeze(0)
x.shape # torch.Size([2, 3])
- Unsqueeze: insert a new dimension of length 1
x = torch.zeros([2, 3])
x.shape # torch.Size([2, 3])
x = x.unsqueeze(1) # Insert a dimension with length == 1 at the given index
x.shape # torch.Size([2, 1, 3])
- Cat: concatenate multiple tensors along an existing dimension
x = torch.zeros([2, 1, 3])
y = torch.zeros([2, 3, 3])
z = torch.zeros([2, 2, 3])
w = torch.cat([x, y, z], dim=1)
w.shape # torch.Size([2, 6, 3])
- Stack: stack multiple tensors of the same size along a new dimension
x = torch.zeros([2, 3])
y = torch.zeros([2, 3])
z = torch.stack([x, y], dim=0)
z.shape # torch.Size([2, 2, 3])
- Permute: rearrange the dimensions
x = torch.zeros([3, 5, 7])
x = x.permute(1, 2, 0)
x.shape # torch.Size([5, 7, 3])
- Bmm: batch matrix multiplication between two 3-D tensors
x = torch.zeros([10, 8, 6]) # (B, M, N)
y = torch.zeros([10, 6, 9]) # (B, N, K)
# Must have same B, and N1 == N2
z = torch.bmm(x, y) # (B, M, K)
z.shape # torch.Size([10, 8, 9])

Data Type
| Data type | dtype | Notes |
|---|---|---|
| 32-bit floating point | torch.float32 | default float type; common for training |
| 64-bit floating point | torch.float64 | high-precision numerical computation |
| 16-bit floating point | torch.float16 | mixed precision / GPU |
| bfloat16 floating point | torch.bfloat16 | no dedicated tensor class |
| 8-bit integer (unsigned) | torch.uint8 | common for image pixels |
| 8-bit integer (signed) | torch.int8 | quantization |
| 16-bit integer (signed) | torch.int16 | rarely used |
| 32-bit integer (signed) | torch.int32 | CUDA indexing |
| 64-bit integer (signed) | torch.int64 | indices / labels |
| Boolean | torch.bool | masks |
| Complex (64-bit) | torch.complex64 | FFT |
| Complex (128-bit) | torch.complex128 | FFT |
Casting:
.to()
import torch
t = torch.randn(3, 4)
t_double = t.to(torch.float64)
t_int = t.to(torch.int32)
- "Shortcut" methods
| Target type | Shortcut method | PyTorch dtype |
|---|---|---|
| Float | t.float() | torch.float32 |
| Double | t.double() | torch.float64 |
| Half | t.half() | torch.float16 |
| Int | t.int() | torch.int32 |
| Long | t.long() | torch.int64 |
| Byte | t.byte() | torch.uint8 |
| Bool | t.bool() | torch.bool |
x = torch.ones(2, 2)  # torch.float32
x = x.long()          # torch.int64

Device
Tensors & modules are computed on the CPU by default.
Use .to() to move tensors to the appropriate device:
x = x.to("cpu")
x = x.to("cuda")
- GPU:
  - Check whether your machine has an NVIDIA GPU with torch.cuda.is_available()
  - Multiple GPUs: specify "cuda:0", "cuda:1", "cuda:2", …
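The device selection above can be sketched as follows; the fallback-to-CPU pattern is a common idiom, not something this document prescribes:

```python
import torch

# Pick a device: use the GPU if one is available, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.zeros(2, 3)
x = x.to(device)  # move the tensor to the chosen device
print(x.device)

# Modules move the same way; calling .to() on a module moves its parameters.
model = torch.nn.Linear(3, 1).to(device)
```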
Gradient Calculation
x = torch.tensor([[1., 0.], [-1., 1.]], requires_grad=True)
z = x.pow(2).sum()
z.backward()
x.grad # tensor([[ 2., 0.], [-2., 2.]])

Let $x$ be the tensor above and define $z = \sum_{i,j} x_{ij}^2$.

Gradient of a single element: for any element $x_{ij}$,
$$\frac{\partial z}{\partial x_{ij}} = 2x_{ij}.$$

Gradient in matrix form: therefore the gradient with respect to $x$ is
$$\nabla_x z = 2x.$$

Numerical substitution: given
$$x = \begin{bmatrix} 1 & 0 \\ -1 & 1 \end{bmatrix},$$
then
$$\nabla_x z = 2x = \begin{bmatrix} 2 & 0 \\ -2 & 2 \end{bmatrix}.$$

- Backpropagation from the Autograd perspective
  - Computational graph: $x \rightarrow y = x^2 \rightarrow z = \sum y$
  - Chain rule:
$$\frac{\partial z}{\partial x_{ij}} = \frac{\partial z}{\partial y_{ij}} \cdot \frac{\partial y_{ij}}{\partial x_{ij}}$$
  - Where: $\dfrac{\partial z}{\partial y_{ij}} = 1$ and $\dfrac{\partial y_{ij}}{\partial x_{ij}} = 2x_{ij}$
  - Thus: $\dfrac{\partial z}{\partial x_{ij}} = 2x_{ij}$
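The autograd result can be checked directly against the analytic gradient $2x$:

```python
import torch

x = torch.tensor([[1., 0.], [-1., 1.]], requires_grad=True)
z = x.pow(2).sum()
z.backward()

# autograd's result matches the hand-derived gradient dz/dx = 2x
print(x.grad)                 # tensor([[ 2.,  0.], [-2.,  2.]])
print(torch.equal(x.grad, 2 * x.detach()))  # True
```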
Leaf Tensor and Autograd
Definition
A leaf tensor is a tensor that
- is created directly by the user,
- has requires_grad=True, and
- is not the result of another operation.

Non-leaf tensors are intermediate results.
- Only leaf tensors have their .grad populated automatically after backward().
- A non-leaf tensor, even though it participates in backpropagation, does not keep its .grad unless you explicitly request it with .retain_grad().
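A minimal sketch of the leaf vs. non-leaf distinction and retain_grad():

```python
import torch

x = torch.ones(2, 2, requires_grad=True)  # leaf tensor (created by the user)
y = x * 2                                 # non-leaf tensor (intermediate result)
y.retain_grad()                           # ask autograd to keep y.grad

z = y.sum()
z.backward()

print(x.grad)  # populated automatically (dz/dx = 2): tensor of 2s
print(y.grad)  # kept only because of retain_grad() (dz/dy = 1): tensor of 1s
```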
Notices
- Gradients accumulate across backward() calls:
z.backward()
z.backward()
So always clear gradients first: optimizer.zero_grad()
- Only floating-point / complex tensors can require gradients:
torch.tensor([1, 2, 3], requires_grad=True) # invalid: integer dtype
- .grad is not part of the computation graph; .grad is a buffer and does not track gradients itself.
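The accumulation behaviour can be demonstrated on a small tensor (using .grad.zero_(), which is what optimizer.zero_grad() does for each parameter):

```python
import torch

x = torch.tensor([1., 2.], requires_grad=True)

z = x.pow(2).sum()
z.backward()
first = x.grad.clone()   # tensor([2., 4.])

z = x.pow(2).sum()
z.backward()
second = x.grad.clone()  # tensor([4., 8.]) -- accumulated, not overwritten

x.grad.zero_()           # clear the buffer, as optimizer.zero_grad() would
print(first, second, x.grad)
```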
The Training Procedures
Load Data
torch.utils.data.Dataset
torch.utils.data.DataLoader
- Dataset: stores data samples and expected values
- DataLoader: groups data into batches, enables multiprocessing

You should override Dataset.
E.g.
dataset = MyDataset(file)
dataloader = DataLoader(dataset, batch_size, shuffle=True)
# shuffle: True for training and False for testing

How to override?
from torch.utils.data import Dataset, DataLoader
class MyDataset(Dataset):
def __init__(self, file): # Read data & preprocess
self.data = ...
def __getitem__(self, index): # Returns one sample at a time
return self.data[index]
def __len__(self): # Returns the size of the dataset
return len(self.data)
dataset = MyDataset(file)
dataloader = DataLoader(dataset, batch_size=5, shuffle=False)
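To see the batching behaviour without a data file, the built-in TensorDataset (an assumption for illustration, not part of the MyDataset example above) can stand in for a custom Dataset:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# 12 synthetic samples of 4 features each, with binary labels
data = torch.randn(12, 4)
labels = torch.randint(0, 2, (12,))
dataset = TensorDataset(data, labels)

loader = DataLoader(dataset, batch_size=5, shuffle=False)
for x, y in loader:
    print(x.shape, y.shape)
# batches of 5, 5, and the remaining 2 samples
```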
Define Neural Network
Use torch.nn.Module (The super class of all models)
- model.parameters(): return all parameters
- model.children(), model.modules(): manage its child modules
- torch.save(model.state_dict(), 'model.pth'), model.load_state_dict(torch.load('model.pth')): save and load parameters
- model.train(), model.eval(): switch modes
E.g.
import torch
import torch.nn as nn
import torch.nn.functional as F
class MyNet(nn.Module):
def __init__(self, input_size, hidden_size, output_size):
super(MyNet, self).__init__()
self.fc1 = nn.Linear(input_size, hidden_size)
self.fc2 = nn.Linear(hidden_size, output_size)
def forward(self, x):
x = F.relu(self.fc1(x))
x = self.fc2(x)
return x
model = MyNet(10, 20, 1)
x = torch.randn(5, 10) # batch_size=5, input_size=10
y = model(x)
print(y.shape) # torch.Size([5, 1])

torch.nn – Network Layers
nn.Linear(in_features, out_features)

layer = torch.nn.Linear(32, 64)
layer.weight.shape # torch.Size([64, 32])
layer.bias.shape # torch.Size([64])
torch.nn – Non-Linear Activation Functions
nn.Sigmoid()
nn.ReLU()
More examples:
import torch.nn as nn
class MyModel(nn.Module): # Inherit from nn.Module
def __init__(self): # Initialize your model & define layers
super(MyModel, self).__init__()
self.net = nn.Sequential(
nn.Linear(10, 32),
nn.Sigmoid(),
nn.Linear(32, 1)
)
def forward(self, x): # Compute output of your NN
return self.net(x)

Equivalent to:
import torch.nn as nn
class MyModel(nn.Module):
def __init__(self):
super(MyModel, self).__init__()
self.layer1 = nn.Linear(10, 32)
self.layer2 = nn.Sigmoid()
self.layer3 = nn.Linear(32, 1)
def forward(self, x):
out = self.layer1(x)
out = self.layer2(out)
out = self.layer3(out)
return out

Training and Testing
torch.nn – Loss Functions
- Mean Squared Error (for regression tasks):
criterion = nn.MSELoss()
- Cross Entropy (for classification tasks):
criterion = nn.CrossEntropyLoss()
loss = criterion(model_output, expected_value)
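A shape sketch for the cross-entropy case (the 3-class problem with a batch of 4 is an assumption for illustration): CrossEntropyLoss expects raw logits of shape (batch_size, num_classes) and integer class indices of shape (batch_size,).

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

logits = torch.randn(4, 3)            # (batch_size, num_classes), raw scores
targets = torch.tensor([0, 2, 1, 2])  # (batch_size,), class indices

loss = criterion(logits, targets)
print(loss.item())  # a single scalar, averaged over the batch
```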
torch.optim – Optimization Algorithms
- For every batch of data:
  - Call optimizer.zero_grad() to reset gradients of model parameters.
  - Call loss.backward() to backpropagate gradients of the prediction loss.
  - Call optimizer.step() to adjust model parameters.
- SGD: torch.optim.SGD(model.parameters(), lr, momentum=0)
Entire Procedure
"""
Setup
"""
train_dataset = MyDataset(file1)
validation_dataset = MyDataset(file2)
test_dataset = MyDataset(file3) # read data via MyDataset
tr_set = DataLoader(train_dataset, 16, shuffle=True)
dv_set = DataLoader(validation_dataset, 16, shuffle=False)
tt_set = DataLoader(test_dataset, 16, shuffle=False)
# put dataset into Dataloader
model = MyModel().to(device) # construct model and move to device (cpu/cuda)
criterion = nn.MSELoss() # set loss function
optimizer = torch.optim.SGD(model.parameters(), 0.1) # set optimizer

"""
Training Loops
"""
for epoch in range(n_epochs): # iterate n_epochs
model.train() # set model to train mode
for x, y in tr_set: # iterate through the dataloader
optimizer.zero_grad() # set gradient to zero
x, y = x.to(device), y.to(device) # move data to device (cpu/cuda)
pred = model(x) # forward pass (compute output)
loss = criterion(pred, y) # compute loss
loss.backward() # compute gradient (backpropagation)
optimizer.step() # update model with optimizer

"""
Validation Loops
"""
model.eval() # set model to evaluation mode
total_loss = 0
for x, y in dv_set: # iterate through the dataloader
x, y = x.to(device), y.to(device)
with torch.no_grad():
pred = model(x)
loss = criterion(pred, y)
total_loss += loss.cpu().item() * len(x)
avg_loss = total_loss / len(dv_set.dataset)

"""
Testing Loops
"""
model.eval() # set model to evaluation mode
preds = []
for x in tt_set: # iterate through the dataloader
x = x.to(device) # move data to device (cpu/cuda)
with torch.no_grad(): # disable gradient calculation
pred = model(x) # forward pass (compute output)
preds.append(pred.cpu()) # collect predictions

model.eval(): changes the behaviour of some layers, such as dropout and batch normalization.
with torch.no_grad(): disables the Autograd engine; prevents computations from being added to the gradient computation graph. Usually used to avoid accidental training on validation/testing data.
"""
Saving and Loading
"""
torch.save(model.state_dict(), path) # Saving
ckpt = torch.load(path)
model.load_state_dict(ckpt) # Loading
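To resume training rather than just inference, a common pattern (sketched here; the key names "model"/"optimizer" and the path "ckpt.pth" are illustrative choices, not fixed by PyTorch) is to save the optimizer state alongside the model state:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Save model and optimizer state together in one checkpoint dict
torch.save({
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
}, "ckpt.pth")

# Restore both into fresh instances to resume training where it stopped
model2 = nn.Linear(10, 1)
optimizer2 = torch.optim.SGD(model2.parameters(), lr=0.1)
ckpt = torch.load("ckpt.pth")
model2.load_state_dict(ckpt["model"])
optimizer2.load_state_dict(ckpt["optimizer"])
```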