This will be a little hard to describe so bear with me.
When I run a long training session (3-4 days), a disconnection seems to occur between the trainer and the web end, to the point where it stops receiving updates from the trainer: status, percentage, loss, accuracy, all of it. This has happened twice, and it seems to happen only when it’s left idle after it has automatically logged you out.
I left it to continue training without receiving updates until I saw the terminal window log “INFO:perceptilabs.applogger:Finished epoch 10/10”, and then restarted Perceptilabs. Unfortunately, it still thinks the model is being trained and shows it doing a validation at 86%, which it isn’t, so I suppose it failed to save after the final epoch.
I’m not exactly sure what’s going on, and it’s a little tricky to capture logs on long runs like this. The only things I could catch while they were still in the buffer were some “BadRequest: /mixpanel/track/” messages and a couple of “POST” entries with 400 23.
Any ideas?