And… we're back! Data Wizard issue (because of 18 columns?)

PL 0.12.16 is much smoother all around; however, I specialise in cracks :wink:

Win 10 21H1, TF2.5 with the right cudart, cuDNN (all seem to load OK at PL startup)

I have now exported a CSV file from my Python pre-processing pipeline for photometric redshift estimation. On attempting to create a new model, I get this when I try to select whatever is on the 2nd row: the columns (of which there are 18, including the 1st, which is the pandas DataFrame index column and should be ignored) overflow, and there is no selection action (which should be ?numeric/categorical, IIRC).

It was the same in 0.12.15 but I updated to 0.12.16 before reporting.

Any ideas?

UPDATE: Columns still overflow, but adding a label in the CSV for the index column (on the left) at least makes selection actions possible.
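For reference, this is the pandas export behaviour that produces the unlabelled first column; a minimal sketch (with made-up column names, not the real CSV) of two ways to avoid it:

```python
import pandas as pd

# Toy stand-in for the real pre-processing output (hypothetical columns)
df = pd.DataFrame({"z_spec": [0.12, 0.34], "mag_w1": [15.2, 16.8]})

# Option 1: drop the index entirely, so the Data Wizard only sees real features
df.to_csv("photoz.csv", index=False)

# Option 2: keep the index but give it a header label, so every column is named
df.to_csv("photoz_labelled.csv", index_label="row_id")

# Inspect the headers that the Data Wizard would see
print(open("photoz.csv").read().splitlines()[0])           # z_spec,mag_w1
print(open("photoz_labelled.csv").read().splitlines()[0])  # row_id,z_spec,mag_w1
```

By default `to_csv` writes the index with an empty header cell, which is exactly the "missing title for the first column" situation.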

And… oops! I chose z_spec as the target but mag_w1 seems to have been set… Not handling arbitrary numbers of columns correctly?

Adding issues related to the same model in the same thread… I’ve also attached the source CSV for you to reproduce this with.

My guess is that PL is not optimised for mixed numerical/categorical types like this yet, is that so?

When I customise the model

a) model top left is off the top of the screen
b) there were initially 4 unhandled problems, but selecting all to drag the elements into view reduced that to 2 (I don’t think there should be errors for an autogenerated model) - see 2nd image below, esp. for the merge error
c) after rearrangement (see pic 3) we see the connections are not as expected (and the problems are now 4 again)
d) this is a multiple input regression - what form should it take in PerceptiLabs given the mix of numerical and categorical variables?

9772_4PL.zip (2.3 MB)

There are in fact a couple of other inputs off screen… but only 16 in total, whereas there are 18 columns in the CSV, with only one, I thought, marked as unused. (UPDATE: of course, one became the target!)

More significantly, I can’t see any way to tell which inputs correspond to which columns. Is there a way to identify them?

Regression? The Merge node (which has no preview) is selectable by cursor (click or crossing drag) within the non-existent preview extent.

UPDATE: I think inputs are numbered in column order, because inputs 11 & 12 correspond to the categorical columns 12 & 13: the max x-values match the number of distinct categories in each column. Can you clarify why there would be a Dense connected to a categorical input like this?

It would be nice if inputs were labelled with the CSV column label - or at least if there were an option for that.
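If it helps anyone reproduce the mapping check above, the per-column category counts are easy to get on the pandas side; a toy sketch with hypothetical column names (the real check would use the attached CSV):

```python
import pandas as pd

# Hypothetical stand-in for the real CSV: categorical columns among numerics
df = pd.DataFrame({
    "mag_w1": [15.2, 16.8, 14.9],
    "survey": ["SDSS", "WISE", "SDSS"],   # categorical, 2 distinct values
    "quality_flag": ["A", "B", "C"],      # categorical, 3 distinct values
})

# Distinct values per column: this should match the max x-extent (one-hot
# width) shown on the corresponding Input component's preview
widths = df.nunique()
print(widths["survey"], widths["quality_flag"])  # 2 3
```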

Hey @JulianSMoore,
Great example of why we need to get the array data type into the tool :sweat_smile:

Some follow-ups:

  • PL is not yet optimized for multi-input, but it is functional (although clunky).
  • It looks like your CSV is missing a title for the first column, which appears to cause an issue in the Data Wizard (you are not getting recommendations).
  • We have on our July Todo (it might be pushed a little into August though) to name the components based on their columns to make identifying them easier. (I just now saw that you recommended that as well).
  • The inputs are in the same order as the columns.
  • How it’s supposed to work is to create an encoder for each input and a decoder for each target. I believe that numerical inputs don’t have any encoders, but categorical ones have a Dense component as their encoder. For your case, you may want to delete those encoders, connect everything up to a Merge component, and then use a single Dense between the Merge and the Target.
  • I think the initial Merge component that’s placed is broken, I would recommend deleting it and adding a new one (again, multi-input’s a bit clunky right now).
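Not PerceptiLabs itself, but the wiring described in the last two bullets corresponds roughly to this Keras functional-API sketch (input names, counts, and category widths are invented for illustration, not read from the actual CSV):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Assumed shapes: 15 scalar numeric inputs plus two already-one-hot
# categorical inputs, all merged, then a single Dense block between the
# merge and the regression target (z_spec)
numeric_inputs = [layers.Input(shape=(1,), name=f"num_{i}") for i in range(15)]
cat_a = layers.Input(shape=(4,), name="cat_a")  # e.g. 4 distinct categories
cat_b = layers.Input(shape=(7,), name="cat_b")  # e.g. 7 distinct categories

# The Merge component as a concatenation along the feature axis
merged = layers.Concatenate(axis=-1)(numeric_inputs + [cat_a, cat_b])
hidden = layers.Dense(64, activation="relu")(merged)
output = layers.Dense(1, name="z_spec")(hidden)  # single regression output

model = Model(inputs=numeric_inputs + [cat_a, cat_b], outputs=output)
print(model.output_shape)  # (None, 1)
```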

Hi @robertl Yes, the missing column-1 label was noticed and fixed - obviously after I attached the CSV :wink: In another Q I did rewire in a similar way to what you describe, but maybe I also need to replace the Merge as suggested.

Ah, I have yet to get to that topic. Yes, replacing the Merge layer should help; let me know if it doesn’t :slight_smile:

Sadly, the behaviour of the Merge node is quite maddening if one wants to change the number of inputs… I was trying to add progressively, one connector at a time, but forgot that I can’t type "10" because "1" is converted automatically to 2, and then somehow I reduced the number of inputs and lost existing connections. Adding the scalars was going OK up to that point…

Also confused by the concat dimension, which I think should be 0, but -1 is the default and seems to work. Did I ask about that once before?

Sorry about that, seems that entry box is not very friendly to use at the moment.

-1 refers to the last dimension, so if 0 is your only dimension it will pick that one.

Thx for the dim info - that explains it. It gives the same result when the data is only 1-dimensional :slight_smile:
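A quick way to see why axis -1 and axis 0 agree for 1-D data (shown here with NumPy for brevity; concatenation along an axis behaves the same way in TF):

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([3.0])

# For 1-D arrays the last axis IS axis 0, so axis=-1 and axis=0 coincide
print(np.concatenate([a, b], axis=-1))  # [1. 2. 3.]
print(np.concatenate([a, b], axis=0))   # [1. 2. 3.]
```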

Alas, I can’t get the merge to work… I’ve tried this about 4 times now; I finally got up to 9 scalar inputs and then it errored, but deleting all the way back to 4 didn’t clear it. Also, picking up outputs of the input components is hampered by the connector proximity selection (old connections are kept until replaced by a new set).

I’ll try again when there’s a PL update that should resolve this. :slight_smile:

Sounds good, hopefully we can get it out soon :slight_smile:
Before you stop though, would you mind sending the error you got (and maybe a screenshot)? It sounds strange that it throws a random error just because a new component is connected, given that the data in that component is similar to others.

Edit: I just remembered that you sent the dataset for this, I’ll try it locally also and see what happens.

Edit 2: I finally read your other thread, which I guess contains the error, and gave an answer there :slight_smile:

I think between those two threads you’ve got it covered now :slight_smile: