By the end of this 3-hour session you should be able to:

- explain what a tensor is and why machine learning frameworks are built around it,
- create tensors from Python data, NumPy arrays, and built-in factories,
- inspect and reshape tensors confidently,
- write small numerical programs using broadcasting and matrix multiplication,
- move computation to a GPU,
- read and write `[C, H, W]` image tensors and apply simple filters to them.
The tensor is the only data structure deep learning really has. Every model input, every weight, every gradient, every output is a tensor. Spending three hours getting comfortable with them pays off in every chapter that follows.
A tensor is a multi-dimensional container for numbers. You already know its low-dimensional cousins:
| Math name | Tensor name | Example |
| --- | --- | --- |
| Number | scalar (0-D) | `7` |
| List of numbers | vector (1-D) | `[1, 2, 3]` |
| Table | matrix (2-D) | `[[1, 2], [3, 4]]` |
| Cube of numbers | 3-D tensor | RGB image, time series of frames |
| … | n-D tensor | Mini-batch of RGB images: `[batch, channels, height, width]` |
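Each row of the table corresponds to a one-line creation call. A quick sketch, with sizes chosen arbitrarily for illustration:

```python
import torch

scalar = torch.tensor(7)                 # 0-D: a single number
vector = torch.tensor([1, 2, 3])         # 1-D
matrix = torch.tensor([[1, 2], [3, 4]])  # 2-D
image = torch.zeros(3, 32, 32)           # 3-D: [channels, height, width]
batch = torch.zeros(8, 3, 32, 32)        # 4-D: [batch, channels, height, width]

# .ndim reports the rank, .shape the size of each dimension.
print(scalar.ndim, vector.ndim, matrix.ndim, image.ndim, batch.ndim)  # 0 1 2 3 4
print(batch.shape)  # torch.Size([8, 3, 32, 32])
```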
Why does deep learning need them?

- **Hardware fits.** GPUs are designed to execute the same operation on millions of numbers in parallel — exactly what tensor operations do.
- **Calculus fits.** Backpropagation reduces to repeated matrix multiplications and element-wise functions. Tensors are the natural type for both.
- **Models fit.** A neural network is essentially a stack of tensor operations. The “weights” of a layer are tensors and the data flowing through it is tensors.
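To make the last point concrete: a single linear layer is just a matrix multiply plus a broadcasted add. The names `W` and `b` and all sizes below are made up for illustration:

```python
import torch

x = torch.randn(16, 4)  # a mini-batch of 16 inputs with 4 features each
W = torch.randn(4, 3)   # the layer's weights: a (4, 3) tensor
b = torch.randn(3)      # the layer's bias: a (3,) tensor

y = x @ W + b           # one layer = matmul + broadcasted add
print(y.shape)          # torch.Size([16, 3])
```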
In-place variants end with an underscore: `x.add_(10)` mutates `x`. Most of the time you should avoid them — non-mutating code is easier to reason about and plays better with autograd.
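A minimal sketch of the difference:

```python
import torch

x = torch.tensor([1, 2, 3])

y = x.add(10)  # out-of-place: returns a new tensor, x is unchanged
print(x)       # tensor([1, 2, 3])
print(y)       # tensor([11, 12, 13])

x.add_(10)     # in-place: mutates x itself
print(x)       # tensor([11, 12, 13])
```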
When shapes don’t match exactly, PyTorch tries to broadcast one tensor across the other. The rule, applied from the right:
Two dimensions are compatible when they are equal or one of them is 1.
```py title="main.py"
import torch

a = torch.ones(3, 4)                  # shape (3, 4)
b = torch.tensor([1, 2, 3, 4])        # shape (4,)
print(a + b)                          # b is repeated for every row

c = torch.tensor([[10], [20], [30]])  # shape (3, 1)
print(a + c)                          # c is repeated across columns
```
If the rule fails, you get `RuntimeError: The size of tensor a (...) must match the size of tensor b (...)`. The fix is almost always `unsqueeze`, `view`, or `transpose` so the shapes line up.
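A sketch of one such failure and its fix — the shapes here are arbitrary:

```python
import torch

a = torch.ones(3, 4)               # shape (3, 4)
b = torch.tensor([1.0, 2.0, 3.0])  # shape (3,) — trailing dims 4 and 3 clash

try:
    a + b
except RuntimeError as e:
    print("broadcast failed:", e)

# unsqueeze turns (3,) into (3, 1), which broadcasts against (3, 4).
print((a + b.unsqueeze(1)).shape)  # torch.Size([3, 4])
```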
`mean()` requires a floating dtype — cast with `.float()` first if you started with integers.
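A two-line sketch of the cast:

```python
import torch

x = torch.tensor([1, 2, 3, 4])  # int64 by default

# x.mean() would raise because the dtype is integral; cast first.
print(x.float().mean())         # tensor(2.5000)
```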
You can also aggregate along a single axis:
```py title="main.py"
import torch

x = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(x.sum(dim=0))  # tensor([5., 7., 9.])  sum over rows -> one value per column
print(x.sum(dim=1))  # tensor([6., 15.])     sum over columns -> one value per row
```
A useful mnemonic: `dim` is the dimension that disappears.
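The same mnemonic expressed in shapes, with `keepdim=True` as the escape hatch when you want the reduced dimension to survive as size 1:

```python
import torch

x = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])  # shape (2, 3)

print(x.sum(dim=0).shape)                # torch.Size([3])    dim 0 disappeared
print(x.sum(dim=1).shape)                # torch.Size([2])    dim 1 disappeared
print(x.sum(dim=1, keepdim=True).shape)  # torch.Size([2, 1]) kept as size 1
```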
```py title="main.py"
import torch

x = torch.arange(1, 10)
print(x)
print(x.reshape(3, 3))                # change shape, same data
print(x.view(1, 9))                   # alias of the same data
print(torch.stack([x, x], dim=0))     # add new outer dim
print(x.unsqueeze(dim=0).shape)       # torch.Size([1, 9])
print(x.unsqueeze(dim=0).squeeze().shape)  # torch.Size([9])
print(x.reshape(3, 3).T)              # transpose
print(x.reshape(3, 3).permute(1, 0))  # equivalent for 2-D
```
`view` only works on contiguous memory; `reshape` always works (it copies if needed). When in doubt, use `reshape`.
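A sketch of the failure mode — a transpose leaves the data non-contiguous, so `view` refuses while `reshape` quietly copies:

```python
import torch

x = torch.arange(6).reshape(2, 3)
t = x.T                       # transpose: same data, non-contiguous strides

print(t.is_contiguous())      # False
print(t.reshape(6))           # works: copies into contiguous memory

try:
    t.view(6)
except RuntimeError as e:
    print("view failed:", e)
```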
```py title="main.py"
import torch

x = torch.arange(1, 10).reshape(1, 3, 3)
print(x[0])          # first matrix
print(x[0, 1])       # second row of that matrix
print(x[0, 1, 2])    # the scalar at row 1, col 2
print(x[:, :, 0])    # first column of every matrix
print(x[0, :, ::2])  # every other column of the first matrix
```
Boolean masks are particularly useful:
```py title="main.py"
import torch

x = torch.arange(10)
print(x[x > 5])  # tensor([6, 7, 8, 9])

x[x > 5] = 0     # zero out elements above 5
print(x)
```
`torch.from_numpy` keeps the original dtype. NumPy floats default to float64; PyTorch defaults to float32. Cast with `.float()` if the tensor is going into a model.
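A short sketch of the dtype mismatch — and of the fact that `from_numpy` shares memory with the array rather than copying it:

```python
import numpy as np
import torch

arr = np.array([1.0, 2.0, 3.0])  # NumPy defaults to float64
t = torch.from_numpy(arr)

print(t.dtype)          # torch.float64 — the dtype came along
print(t.float().dtype)  # torch.float32 — what most models expect

# from_numpy shares memory: mutating the array mutates the tensor too.
arr[0] = 99.0
print(t[0])             # tensor(99., dtype=torch.float64)
```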
A tensor on the GPU cannot be converted to NumPy directly. Bring it back to the CPU first: `tensor.cpu().numpy()`.
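A sketch of the round trip. The availability check makes it run on machines without a GPU; on a CPU tensor, `.cpu()` is a no-op, so the final line is safe either way:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.ones(3).to(device)

# x.numpy() would raise on a CUDA tensor; .cpu() brings it back first.
arr = x.cpu().numpy()
print(type(arr), arr)  # <class 'numpy.ndarray'> [1. 1. 1.]
```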
torchvision reads images for us. The result is a [C, H, W] tensor of uint8 values in [0, 255].
```py title="capstone.py"
import torch
import matplotlib.pyplot as plt
from torchvision.io import read_image

# Any small JPG/PNG works. You can use a photo of your own.
image = read_image("cat.jpg")
print(image.shape, image.dtype)  # e.g. torch.Size([3, 300, 400]) torch.uint8

# matplotlib expects [H, W, C], so permute the axes.
plt.imshow(image.permute(1, 2, 0))
plt.axis("off")
plt.show()
```
A blur replaces every pixel with the average of its 3×3 neighborhood. We can implement that with a single `torch.nn.functional.conv2d` call. The kernel is a `(out_channels, in_channels, kH, kW)` tensor of 1/9 values.
```py title="capstone.py"
import torch.nn.functional as F

kernel = torch.ones(1, 1, 3, 3) / 9.0

# conv2d expects [B, C, H, W] of floats; one channel at a time.
def blur(channel: torch.Tensor) -> torch.Tensor:
    x = channel.float().unsqueeze(0).unsqueeze(0)  # [1, 1, H, W]
    out = F.conv2d(x, kernel, padding=1)
    return out.squeeze().clamp(0, 255).to(torch.uint8)

blurred = torch.stack([blur(image[c]) for c in range(image.shape[0])], dim=0)
plt.imshow(blurred.permute(1, 2, 0))
plt.axis("off")
plt.show()
```
The trick is the shape juggling, not the math. Practice reading the comments — every line is a tensor reshape.