Changing model Data

JulianSMoore · 11 November 2021 17:09

I am generating some test data to look at the Double Descent phenomenon as a regression problem, because in a regression I can control the numbers very precisely in order to look at “generalisation” etc.

I need to push a model until it shows the right behaviour, and one way I can do this is to make it:

harder to approximate as the function becomes more complex
harder to “memorise” because there are more training data points supplied

To do this, I thought I would set up a single model and use different CSV files (with exactly the same structure etc., just different numbers of rows) by copying/pasting each one to the same path.
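A minimal sketch of how such CSV files might be generated, assuming a one-input regression target built from a sum of sine terms. The function names, the `x`/`y` column names, the file naming and the choice of target function are all illustrative assumptions, not anything from PerceptiLabs or this thread. More sine terms give a more complex function; more rows give more training points.

```python
import csv
import math
import os
import random


def make_rows(n_rows, n_terms, seed=0):
    """Return n_rows of (x, y), where y is a sum of n_terms sine components."""
    rng = random.Random(seed)  # seeded so the same call reproduces the same data
    rows = []
    for _ in range(n_rows):
        x = rng.uniform(-1.0, 1.0)
        # Each extra term adds a higher-frequency component (assumed target).
        y = sum(math.sin((k + 1) * math.pi * x) / (k + 1) for k in range(n_terms))
        rows.append((x, y))
    return rows


def write_csv(path, rows):
    """Write rows under a fixed header so every file has the same structure."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["x", "y"])
        writer.writerows(rows)


def generate_datasets(out_dir, sizes, n_terms, seed=0):
    """Write one CSV per row count in sizes and return the file paths."""
    paths = []
    for n_rows in sizes:
        path = os.path.join(out_dir, f"data_{n_rows}_rows.csv")
        write_csv(path, make_rows(n_rows, n_terms, seed))
        paths.append(path)
    return paths
```

For example, `generate_datasets("data", [100, 1000, 10000], n_terms=5)` would write three files into an existing `data` folder that differ only in row count; any one of them could then be copied over the path the model reads from.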

Now, PL has a pre-processing pipeline and I suspect I will need to do something each time I update the CSV…

What is the best way to proceed?

robertl · 13 November 2021 14:09

Hi @JulianSMoore,
Cool project!

I think this will be a lot easier when this feature is done: https://perceptilabs.canny.io/feature-requests/p/load-a-model-into-an-existing-dataset
It’s being specced out right now, so I should soon be able to give you a time estimate for when to expect it.

If you want to start on this a bit earlier, the best way would likely be to create all the different CSV files, load them into PL as separate datasets, and create one model for each of them (with the right preprocessing). Then you can build out the model you want in one of them and copy and paste the components over to the other models.

Hope that helps!

JulianSMoore · 13 November 2021 20:52

Thanks @robertl

That sounds do-able for a couple of datasets… I’ll give it a go!

And yes, that feature-to-be sounds good

However - and I hope to remember to add these to feature requests - the following would be nice variations that could also help:

Allow user to specify start/end rows of data to be processed for training/validation/test
Even more flexible: allow user to specify training/validation/test blocks by individual row ranges

(I like suggesting features that don’t require big functional changes but do provide more user control/flexibility)
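To illustrate the behaviour the two suggestions above describe, here is a rough sketch done outside the tool. It is not an existing PerceptiLabs option; `split_by_ranges` and the shape of the `ranges` argument are hypothetical names chosen for this example. Each partition is given as a list of half-open `(start, end)` row ranges, so a single start/end pair covers the first suggestion and several pairs cover the second.

```python
def split_by_ranges(rows, ranges):
    """Split rows into named partitions by row-index ranges.

    ranges maps a partition name to a list of (start, end) pairs,
    where start is included and end is excluded (0-based indices).
    """
    return {
        name: [row for start, end in blocks for row in rows[start:end]]
        for name, blocks in ranges.items()
    }


# Example with ten rows: training uses two separate blocks of rows.
rows = list(range(10))
parts = split_by_ranges(
    rows,
    {
        "training": [(0, 4), (6, 8)],
        "validation": [(4, 6)],
        "test": [(8, 10)],
    },
)
```

Nothing here checks that the ranges are disjoint or cover every row; a real implementation would probably want to validate that.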

robertl · 14 November 2021 14:36

Thanks!
I created feature requests for them here:
https://perceptilabs.canny.io/feature-requests/p/allow-user-to-specify-startend-rows-of-data-to-be-processed-for-trainingvalidati
https://perceptilabs.canny.io/feature-requests/p/allow-user-to-specify-trainingvalidationtest-blocks-by-individual-row-ranges
