If you have 2000 images, at the default setting (70:20:10) it will use 1400 for training, 400 for validation and 200 for test per epoch.
Note that PL makes the split early on and then does not mix samples between the subsets, so over (say) 100 epochs the same 1400 will be used for training 100 times, the same 400 for validation 100 times, and so on. Randomisation only occurs within each split.
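To illustrate what that means in practice (this is just a sketch, not PL's actual code), a one-off 70:20:10 split where per-epoch shuffling only happens inside the training subset looks roughly like this:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

indices = rng.permutation(2000)      # shuffle once, before splitting

# 70:20:10 split, made once and then kept fixed
train_idx = indices[:1400]
val_idx = indices[1400:1800]
test_idx = indices[1800:]

for epoch in range(100):
    # only the order within the training split changes each epoch;
    # samples never move between train/validation/test
    epoch_order = rng.permutation(train_idx)
    # ... train on epoch_order, validate on val_idx ...
```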
Yes, increasing dropout is one way of trying to avoid classical overfitting - it helps prevent memorisation of the training data when the model size is large compared to the dataset size (I know the principle, but the specifics elude me). (I should really get a better handle on Batch Normalisation, which I don't really understand right now.)
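For reference, in plain TF/Keras dropout and batch normalisation are just extra layers - this little model is purely illustrative (the shapes, rates and layer choices are mine, not a recommendation):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Illustrative only: Dropout and BatchNormalization as ordinary layers in a small CNN.
model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),
    layers.Conv2D(16, 3, activation="relu"),
    layers.BatchNormalization(),          # normalises activations across each batch
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),                  # randomly zeroes 50% of units, training only
    layers.Dense(10, activation="softmax"),
])
```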
Model Size
Within TF there's a model.summary function that will tell you how many parameters a model has, and hence what load it would present - it's not accessible via PL right now (but see this feature request and upvote if you like it). However, you can calculate it by hand, see e.g. here. Maybe @robertl has other ideas. It's certainly not a small model, but with a GPU (I have a 1080 Ti) I found very little difference in training speed up to about 75k parameters - I haven't tried anything really big myself.
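For the by-hand calculation, the standard formulas are: a Dense layer has inputs × units weights plus units biases, and a Conv2D layer has kernel_height × kernel_width × input_channels × filters weights plus filters biases. A tiny sketch (the layer sizes are just examples I made up):

```python
# Hand-counting trainable parameters (standard formulas, illustrative layer sizes).

def dense_params(n_inputs, n_units):
    # weight matrix + one bias per unit
    return n_inputs * n_units + n_units

def conv2d_params(kernel_h, kernel_w, in_channels, n_filters):
    # one kernel per filter + one bias per filter
    return kernel_h * kernel_w * in_channels * n_filters + n_filters

print(conv2d_params(3, 3, 3, 16))   # 448
print(dense_params(128, 64))        # 8256
print(dense_params(64, 10))         # 650
```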
Dropout Can Cause Validation Loss < Training Loss
One thing about dropout is that (see e.g. here, but this google query will lead to other reasons, such as biased splits) the training loss is calculated with the effects of dropout, whereas dropout is not applied during validation/test. Thus, if your dropout rate is 'high', the validation loss can in fact be lower than the training loss, which goes against the naive expectation.
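You can see the mechanism directly in TF/Keras (again, just an illustration): a Dropout layer is only active when called in training mode, so the training loss is computed through it while validation/test runs bypass it.

```python
import numpy as np
import tensorflow as tf

dropout = tf.keras.layers.Dropout(rate=0.5)
x = np.ones((1, 8), dtype="float32")

print(dropout(x, training=True))   # roughly half the values zeroed, the rest scaled up by 2
print(dropout(x, training=False))  # unchanged: dropout is switched off at validation/test time
```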
I would therefore suggest an initial run to see what the behaviour is before adjusting dropout or anything else (such as reordering your CSV).
A Very Non-Standard Approach
Here's a technique for effectively NON-RANDOM sampling I came up with specifically so I could inspect for generalisation: if it is possible to rank the targets on quality/'difficulty' (I used the std. dev. of the observations in the regression inputs… low-SD data was better/easier), manually order your data so that the hardest examples are represented in both the training and test datasets, but with fewer 'easy' targets in the test dataset (watch out if you change the split % afterwards, though).
Then, if your model performs well in test you need to worry less about sampling effects (of course, the more data you have, the less sample size can affect things). Better performance on test than on training is normally a warning sign, but with this approach it can actually give confidence in the model's generalisation.
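If you want to try something similar, here's a rough pandas sketch of that reordering. Everything in it is an assumption on my part: the file name, the input-column naming, using the row-wise std. dev. of the inputs as 'difficulty', and that the 70:20:10 split is taken in file order (first 70% train, last 10% test).

```python
import pandas as pd

df = pd.read_csv("data.csv")                                 # hypothetical file name
input_cols = [c for c in df.columns if c.startswith("x")]    # hypothetical input columns

df["difficulty"] = df[input_cols].std(axis=1)                # high std. dev. = harder example
df = df.sort_values("difficulty", ascending=False)

n_test = int(0.10 * len(df))                                 # size of the test slice

hardest = df.iloc[:2 * n_test]      # the hardest rows get shared between train and test
easy = df.iloc[2 * n_test:]

to_test = hardest.iloc[::2]         # every other hard row goes to the test slice...
to_train = hardest.iloc[1::2]       # ...and the rest to the front of the training slice

# In-order layout: hard training rows first, easy rows in the middle, hard test rows last.
reordered = pd.concat([to_train, easy, to_test])
reordered.drop(columns="difficulty").to_csv("data_ordered.csv", index=False)
```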