Super resolution model with simple FastAPI serving

https://github.com/birdstream/Super_Resolution

This app takes an image and ~~doubles the resolution~~ upscales it to 4x… with A.I! :slight_smile:

The model itself is as simple as it can be: First, the image is scaled down to ~~0.5x~~ 0.25x, then back to 1x. This is then fed to an attention-enabled U-Net with ResNet50V2 as backbone. The resulting output will have the same dimensions as the input, but (hopefully) with the pixels that were lost from scaling down and then up again recreated. Doing the model this way, I only had to use one set of images as training data and simply let the target be the same as the input in my .csv file.
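For anyone who wants to reproduce the trick outside PerceptiLabs, here is a minimal sketch of the degrade-then-restore idea using Pillow; the function name, the bicubic interpolation and the 4x factor are my own illustration, not code from the repo.

```python
from PIL import Image

def degrade(img: Image.Image, factor: int = 4) -> Image.Image:
    """Scale the image down to 1/factor and back up to its original size.
    The detail lost on the round trip is exactly what the U-Net is
    trained to reconstruct."""
    small = img.resize((img.width // factor, img.height // factor), Image.BICUBIC)
    return small.resize(img.size, Image.BICUBIC)

# Training pairs: input = degrade(original), target = original,
# so a single set of images is all that is needed.
```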

The model is served via a simple (reeeally simple!) web interface with FastAPI: You just select an image file (if browsing from a smartphone, you can use the camera, too!), click upload, and the response will be the ~~2x~~ 4x upscaled image. It can accept any input size; the program will slice the image into blocks of the size the model was trained on, make a prediction on each of them individually, and then stitch them back together.
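If you would rather wire this up yourself than use the repo as-is, a rough sketch of the serving side could look like the code below. It assumes a Keras-style model that maps a 224x224 RGB patch to a same-sized patch; the endpoint name, model path, tile size, bicubic pre-upscaling and 0–1 normalisation are all assumptions, not taken from the actual app.

```python
from io import BytesIO

import numpy as np
import tensorflow as tf
from fastapi import FastAPI, File, UploadFile
from fastapi.responses import Response
from PIL import Image

app = FastAPI()
TILE = 224                                             # assumed training patch size
SCALE = 4                                              # target upscaling factor
model = tf.keras.models.load_model("superres_model")   # hypothetical model path

def upscale(img: Image.Image) -> Image.Image:
    """Naively enlarge the image, slice it into TILE x TILE blocks,
    predict on each block, and stitch the results back together."""
    big = img.convert("RGB").resize((img.width * SCALE, img.height * SCALE),
                                    Image.BICUBIC)
    arr = np.asarray(big, dtype=np.float32) / 255.0
    h, w, _ = arr.shape
    pad_h, pad_w = (-h) % TILE, (-w) % TILE             # pad to a multiple of TILE
    arr = np.pad(arr, ((0, pad_h), (0, pad_w), (0, 0)), mode="reflect")
    out = np.zeros_like(arr)
    for y in range(0, arr.shape[0], TILE):
        for x in range(0, arr.shape[1], TILE):
            block = arr[y:y + TILE, x:x + TILE]
            out[y:y + TILE, x:x + TILE] = model.predict(block[None, ...], verbose=0)[0]
    out = (out[:h, :w] * 255).clip(0, 255).astype(np.uint8)
    return Image.fromarray(out)

@app.post("/upscale")
async def upscale_endpoint(file: UploadFile = File(...)):
    img = Image.open(BytesIO(await file.read()))
    buf = BytesIO()
    upscale(img).save(buf, format="PNG")
    return Response(content=buf.getvalue(), media_type="image/png")
```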

3 Likes

Congratulations @Birdstream on producing the first model shared by a user.

Not only that, I think the way you approached it was really neat, and it looks great too.

AND you can use a smartphone camera??? That is so cool!

1 Like

This is really cool!!

1 Like

I guess there’s a proper way, but I always use DownGit to download, in case that helps anyone.

2 Likes

For those of you who might be interested, I just wanted to let you know that I’ve given this project some love over the week :slight_smile: I’m working on improving the model and having it upscale to 4x instead, added a Gaussian noise layer to make the model more robust, etc. I’ll likely push the changes sometime this weekend :slight_smile:
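For anyone curious what that looks like outside PerceptiLabs, here is a rough Keras sketch of putting a Gaussian noise layer on the input (it is only active during training); the noise level and the placeholder layers are assumptions, not the actual model.

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(224, 224, 3))
# Jitter the training inputs so the network cannot rely on exact pixel values
x = tf.keras.layers.GaussianNoise(0.05)(inputs)
# ... attention-enabled U-Net with a ResNet50V2 backbone would go here ...
outputs = tf.keras.layers.Conv2D(3, 3, padding="same", activation="sigmoid")(x)
model = tf.keras.Model(inputs, outputs)
```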

2 Likes

I think the use of Gaussian noise in preference to dropout is a very practical idea - I’d love to know how much difference it makes - and how adding noise works in this use case…

Any chance you can share the PerceptiLabs model.json too?

Can I just share the json file found in the PerceptiLabs model folder? Or would you need the .csv and data files too?

model.json alone would be a start, but CSV + datafiles would make a turnkey demo :wink:

1 Like

Pushed the changes now and also added releases with the PerceptiLabs model.json and the training dataset :slight_smile:

1 Like

Is there a way to download the dataset in one go? The “downgit” approach applied to “https://github.com/birdstream/Super_Resolution/releases/tag/dataset” gave a “Server failure or wrong URL” error…

Hmm, I’ve never used downgit but I guess maybe that only works for the repo and not for releases :thinking:

My initial thought was to just push the files to the repo, but I guess not everyone is keen on pulling a repo that just jumped up to over 2GB :sweat_smile:

2GB - ha! - a mere bagatelle in these days of multi-terabyte disks! Mention it not! But you’re right, not everyone would want all that. Oh well, I suppose the mouse could do with the exercise :slight_smile:

I wasn’t clear about what was in the different “source code” things… are zip and tar.gz the same, just packaged differently?

Tbh I don’t really know, because the source code stuff is something GitHub puts there :sweat_smile: it’s the zip you should go for

1 Like

Hi @birdstream

Well, I finally downloaded the releases from GitHub - datasets and PL model - and extracted them. Many thanks - though I would appreciate your input/help on the image size issue below.

Info to others: don’t be confused by seeing the same file used as both input and target in the CSV - when you open the PL model itself you will see that the model generates low res versions to learn from by down-sampling/rescaling to 1/4 size to support the target x4 up-sampling.
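(For anyone rebuilding the CSV themselves, something along these lines should be enough; the column names and output filename are my guesses, so check the release for the real ones.)

```python
import glob
import pandas as pd

files = sorted(glob.glob("faces/*"))                     # images from the release
pd.DataFrame({"input": files, "target": files}).to_csv(  # same file on both sides
    "training_data.csv", index=False)
```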

Issue 1: For info, the CSV file refers to images in a ‘cropped’ sub-folder, whereas the archives place the images in a “faces” folder. Workaround: a simple global search & replace in the CSV file should take care of file placement.
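For example, a couple of lines of Python will do it (the CSV filename here is a guess):

```python
from pathlib import Path

csv_path = Path("training_data.csv")  # adjust to the CSV shipped in the release
csv_path.write_text(csv_path.read_text().replace("cropped", "faces"))
```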

Issue 2: the model seems to be expecting images of 224x224, but the images in the dataset are 1024x1024. Does it matter? (I seem to recall some discussion between you and @robertl on this or a related topic). If it does matter, how would you recommend dealing with the difference in image sizes?

Thx!

The answer is simple: I accidentally uploaded the original dataset :man_facepalming::man_facepalming: Sorry for that, I’ll fix it when I get back home :rofl:

Ah ok, I was also going to say I had discovered 112k rows in the CSV but only 28k images… and no _00x :slight_smile: All clear now!

Sorry for the delay, but it’s up now. Hopefully it’s right this time :sweat_smile:

1 Like

7zip says that training_data.z06 is missing… :slight_smile:

Noo hahah you have to be joking, right? :see_no_evil::sweat_smile:

Would you mind trying another zip program? There is no .z06, and p7zip (Ubuntu) opens and extracts the files just fine :thinking:

TL;DR I have sorted it. Mea culpa, Joakim!

The masochists in the audience may enjoy the longer, more painful version; the technical elite may be content with smirks of smug superiority at what follows.

Please bear in mind that I intended to repeat exactly what I had done previously with the .z01 to .z24 files of the initial release - downloaded everything in the release (remember that, it will be important!), selected them all, right-clicked and extracted with 7zip from the context menu. No problems: I got 28k files… 28k good files, even if they weren’t the right files. It follows, logically, that on at least one occasion, I knew what I was doing - and despite the flaws of the inductive method, maybe, just maybe, I was not doing anything wrong this time either.

Anyway, when there was a new release to process I deleted the 1st 24 zips. Am I sure I did that? Yes… because if I hadn’t done that, the new downloads would have prompted to overwrite or rename and that did not happen, so I am sure I had the right files.

Except <sigh> I didn’t download everything in the release… and I might not have downloaded training_data.zip, which I think is in fact index 0 of a set of split archive files - in the zip files there is a Volume column… and looking in the (latest) zip the files are in volumes 0-5. That might have confused 7zip - I selected only the numbered .zXX files and it might have inferred .zip for volume 0 and got rather confused when it wasn’t the right .zip… though I would have expected the error behaviour to be different. i.e. Yes! I did have the right files! Alas, not all of them :man_facepalming:

The gory details of the 7zip struggle were: not only did 7zip complain about .z06 being missing, there were header errors (5,760 files) and ~22k “unavailable data” errors, for a total of 27,579 errored files. 421 files were however extracted, but they were 1k x 1k files, IIRC.

But, rather than assume I had the moral high ground, I did try what @Birdstream suggested.

And the lesson is, when tech support ask you to check your device is plugged in and switched on, just go with the flow… regardless of certainty, one day you will find that the cleaner unplugged it for you.

Anyway. First I installed PeaZip; that couldn’t handle the archive format. So I then installed Zipware, and that also complained about errors… but, remember the wrong zip file? That was in play there too.

Finally, I fired up an Ubuntu VM, installed p7zip to be just like @birdstream’s system, and had a look at the files that I downloaded again from GitHub - they were fine!

So I re-downloaded for Windows - but this time I took the .zip and the .zXX files, and lo and behold, there were all the cropped images.

1 Like