Machine learning models for image classification often use convolution layers to extract features from images while employing max-pooling layers to reduce dimensionality. The goal is to extract increasingly higher-level features from regions of the image, to ultimately make some kind of prediction such as an image classification.