Super resolution model with simple FastAPI serving

Lovely read :rofl:
Glad to hear you sorted things out :slight_smile:

Btw, I would expect that training this model to produce sharp upscaled images may take hours if not days on a consumer-grade GPU… which is a bit of a pain right now because of the slow-down issue… If you decide to train it, please let me know how it goes for you, as your rig is probably a little more powerful than mine :sweat_smile:

Edit: btw, you probably want to train the model with noise for a couple of epochs, then decrease it, run some more, decrease it and so on until there is practically no noise left… at least that's what I think right now. Exploring that path is taking some time though :sweat_smile:
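Something like this stepwise decay is what I have in mind (just a rough Python sketch with made-up noise levels, not something I have validated yet):

```python
import numpy as np

# epoch -> noise stddev; the values are invented, the point is only the
# "decrease the noise in steps as training progresses" idea
NOISE_SCHEDULE = {0: 0.10, 5: 0.05, 10: 0.02, 15: 0.0}

def noise_std_for_epoch(epoch):
    """Pick the most recent schedule entry at or before this epoch."""
    key = max(k for k in NOISE_SCHEDULE if k <= epoch)
    return NOISE_SCHEDULE[key]

def add_input_noise(batch, epoch, rng=None):
    """Add Gaussian noise to a batch of images scaled to [0, 1]."""
    rng = rng or np.random.default_rng()
    std = noise_std_for_epoch(epoch)
    if std == 0.0:
        return batch
    noisy = batch + rng.normal(0.0, std, size=batch.shape)
    return np.clip(noisy, 0.0, 1.0)
```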

I have finally started training! It looks as though it will take ~9-10 hours based on 11% progress after ~1hr. Not too bad. (NB I did start training from scratch.)

Question: 112k images and batch size = 8… that’s quite a small batch size.

Was it specified after experimenting with other batch sizes? I wonder whether training would be faster with larger batches (answer: surely yes, since PL has callbacks on batch end and there's other overhead, but I also wonder how much data is being transferred, perhaps inefficiently, to the GPU per batch. In theory, I think my GPU could keep ALL the data in memory, I'm just not sure how this bit works!)
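For context, my (possibly wrong) mental picture is a tf.data-style pipeline like the sketch below. I have no idea whether PL does anything like this internally; the function and file handling here are purely illustrative:

```python
import tensorflow as tf

def make_dataset(file_paths, batch_size=8):
    # Hypothetical input pipeline: decode once, cache in host RAM,
    # and overlap host->GPU transfer with compute via prefetching.
    ds = tf.data.Dataset.from_tensor_slices(file_paths)
    ds = ds.map(lambda p: tf.io.decode_png(tf.io.read_file(p), channels=3),
                num_parallel_calls=tf.data.AUTOTUNE)
    ds = ds.map(lambda img: tf.image.convert_image_dtype(img, tf.float32),
                num_parallel_calls=tf.data.AUTOTUNE)
    ds = ds.cache()                     # keep decoded images in host RAM after the first epoch
    ds = ds.shuffle(4096)
    ds = ds.batch(batch_size)
    ds = ds.prefetch(tf.data.AUTOTUNE)  # prepare the next batch while the GPU is busy
    return ds
```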

Info for @robertl… it took ~3 minutes for the training panels to become "active", during which time there is no indication of what is happening or how long it will take before training starts. Is that something that can be added?

And… recalling a Slack discussion about training slow-down, there seem to be a number of things that could be done to speed things up…

  1. The info display is absolutely great for model development - it’s a major strength :slight_smile:
  2. However, once the model is “developed” and ready for a full training run you could:
    2a. Still acquire data, but turn off live updates to the screen until e.g. a refresh button is clicked (or update every nth batch, or…)
    2b. Don’t acquire data at all (if that makes it even faster) and just show a message "Disabled for

Could PL give a simple ETA for training completion? One can sort of do it in one's head based on progress, but it would be nice to have a date/time for completion.
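The arithmetic I have in mind is nothing fancy, just something along these lines (a toy sketch, not a spec):

```python
from datetime import datetime, timedelta

def eta(start_time, progress):
    """Estimate completion time; progress is a fraction in (0, 1], e.g. 0.11 for 11%."""
    elapsed = datetime.now() - start_time
    remaining = elapsed * (1 - progress) / progress
    return datetime.now() + remaining

# e.g. started ~1 hour ago and 11% done -> completion roughly 8 hours from now
print(eta(datetime.now() - timedelta(hours=1), 0.11))
```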

What’s the internal thinking on performance these days?


Thanks @JulianSMoore!
I think this is the one you are looking for: https://perceptilabs.canny.io/feature-requests/p/more-informative-loading-when-starting-training :slight_smile:
It’s quite high on the roadmap, should come out this year.

As for performance, we are first looking for a solution to the progressive slowdown; that shouldn't be too bad, but it is behind some other fixes.
As for training speed vs. gathering data in general, we used to have something called "headless mode", which did something similar to your 2b suggestion: whenever you were not looking at the statistics it would only gather accuracy and loss (for history) and nothing else.
We are looking to re-introduce something similar, where you can toggle between how much visualization and how fast a model training you want.

As for the ETA, we can estimate it as soon as training has started, and we will for sure add something like that. I had forgotten to add that feature to Canny, so I added it now (as a Feature Draft until I fill out the spec a bit more): https://perceptilabs.canny.io/drafts/p/allow-the-option-to-train-with-less-visualizations-for-faster-training

I often run into OOMs when using higher batch sizes, so I set it a bit lower to be on the safe side… running this model with batch size 8 still consumes most of the 4 GB on my GTX 980 anyway :sweat_smile: Surely, with more VRAM one could use a higher batch size :wink: This model has millions of trainables, and since all activations need to be stored during the forward pass, it adds up pretty easily. This is something gradient checkpointing could remedy a bit, but unfortunately there doesn't seem to be an official implementation for it in TF. There is, however, in PyTorch. (Hint: look at the modified U-Net model in my colorization project on GitHub.)
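For anyone curious, this is roughly what gradient checkpointing looks like in PyTorch (a minimal illustrative block, not the actual code from my colorization repo):

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

class CheckpointedBlock(nn.Module):
    """Wrapping a block in checkpoint() drops its intermediate activations
    during the forward pass and recomputes them in the backward pass,
    trading extra compute for lower VRAM usage."""

    def __init__(self, channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(),
        )

    def forward(self, x):
        # use_reentrant=False is the recommended mode in recent PyTorch versions
        return checkpoint(self.block, x, use_reentrant=False)

# Tiny smoke test: batch of 8 feature maps, backward pass still works
x = torch.randn(8, 64, 128, 128, requires_grad=True)
y = CheckpointedBlock(64)(x)
y.mean().backward()
```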