I am generating some test data to look at the Double Descent phenomenon as a regression problem, because in a regression I can control the numbers very precisely when measuring "generalisation" and so on.
I need to push a model until it shows the right behaviour, and one way I can do this is to make the task (see the data-generation sketch after this list)
- harder to approximate, by making the target function more complex
- harder to "memorise", by supplying more training data points
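Here is the kind of generator I have in mind, as a minimal sketch (`make_regression_csv` and all of its parameters are my own names, nothing framework-specific): `complexity` adds higher-frequency terms to the target, and `n_rows` controls the training-set size.

```python
import numpy as np
import pandas as pd

def make_regression_csv(path: str, n_rows: int, complexity: int,
                        noise: float = 0.1, seed: int = 0) -> None:
    """Write a noisy 1-D regression dataset to `path`."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n_rows)
    # Higher `complexity` adds higher-frequency sine terms, so the target
    # gets harder to approximate; more rows make it harder to memorise.
    y = sum(np.sin((k + 1) * np.pi * x) / (k + 1) for k in range(complexity))
    y = y + noise * rng.standard_normal(n_rows)
    pd.DataFrame({"x": x, "y": y}).to_csv(path, index=False)
```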
To do this, I thought I would set up a single model and use different CSV files, each with exactly the same structure but a different number of rows, copying each one in turn to the same path.
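Concretely, the copy-to-one-path step would just be a loop like this (`run_experiment` is a hypothetical stand-in for whatever trains and evaluates the model):

```python
DATA_PATH = "data/train.csv"  # the one fixed path the model always reads

for n_rows in [50, 100, 200, 500, 1000, 5000]:
    # Regenerate the file in place: same columns, different row count.
    make_regression_csv(DATA_PATH, n_rows=n_rows, complexity=8, seed=42)
    run_experiment(DATA_PATH)  # hypothetical: train + evaluate on this file
```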
Now, PyTorch Lightning (PL) has a pre-processing pipeline, and I suspect I will need to do something each time I replace the CSV so that the new rows are actually picked up…
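For context, this is roughly what I assume the Lightning side looks like: a `LightningDataModule` that re-reads the CSV in `setup()`. My guess is that if I construct a fresh DataModule (and a fresh `Trainer`) for every regenerated file, no stale pre-processed state can survive between runs, but I would like to confirm that. (`CSVDataModule` and its internals are my own sketch, not anything from the docs.)

```python
import pandas as pd
import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset

class CSVDataModule(pl.LightningDataModule):
    """Sketch: re-reads the CSV at `csv_path` whenever setup() runs."""

    def __init__(self, csv_path: str, batch_size: int = 32):
        super().__init__()
        self.csv_path = csv_path
        self.batch_size = batch_size

    def setup(self, stage=None):
        # All file-dependent pre-processing lives here, so a freshly
        # constructed DataModule always sees the current contents of the file.
        df = pd.read_csv(self.csv_path)
        x = torch.tensor(df["x"].values, dtype=torch.float32).unsqueeze(1)
        y = torch.tensor(df["y"].values, dtype=torch.float32).unsqueeze(1)
        self.train_set = TensorDataset(x, y)

    def train_dataloader(self):
        return DataLoader(self.train_set, batch_size=self.batch_size,
                          shuffle=True)
```

So each iteration of the loop above would do `dm = CSVDataModule(DATA_PATH)`, build a fresh `pl.Trainer(...)`, and call `trainer.fit(model, datamodule=dm)`, with everything rebuilt per run.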
What is the best way to proceed?