How to concatenate scalars and one-hot encodings?

JulianSMoore · 23 July 2021 10:28

I tend to think of a one-hot encoding as just another set of columns, and from that POV they should concatenate straightforwardly, but something there is that does not like an encoding… (cf. Robert Frost & walls)

(This is essentially what I did directly in TF, using pandas get_dummies for the categories and then feeding everything to a dense layer.)
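For reference, here is a minimal numpy sketch of that "one-hot columns are just more columns" approach. It assumes toy data and uses `np.unique` plus an identity matrix in place of pandas get_dummies (same idea: one 0/1 column per distinct category), and a plain matrix multiply standing in for a dense layer. It is an illustration, not the exact TF pipeline described above.

```python
import numpy as np

# Toy data (assumed for illustration): two numerical features per row
# and one categorical feature per row.
numeric = np.array([[1.5, 0.2],
                    [3.0, 0.7],
                    [2.2, 0.1]], dtype=np.float32)
categories = np.array(["red", "green", "red"])

# One-hot encode the categorical column: one 0/1 column per distinct
# category, with categories in sorted order ("green", "red").
levels, codes = np.unique(categories, return_inverse=True)
one_hot = np.eye(len(levels), dtype=np.float32)[codes]   # shape (3, 2)

# The one-hot columns sit side by side with the numerical columns
# along the feature axis (axis=1).
features = np.concatenate([numeric, one_hot], axis=1)    # shape (3, 4)

# A single dense layer is an affine map over all 4 feature columns.
rng = np.random.default_rng(0)
W = rng.normal(size=(features.shape[1], 1)).astype(np.float32)
b = np.zeros(1, dtype=np.float32)
output = features @ W + b                                 # shape (3, 1)
print(features.shape, output.shape)
```

Both kinds of input end up as 2-D `(rows, columns)` arrays before they are joined, which is what makes the concatenation straightforward.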

What is the correct/recommended way of combining the one-hot encodings with numerical values?

robertl · 26 July 2021 11:45

Hey @JulianSMoore,

Hmm, the input component automatically one-hot encodes data of categorical type (you can see its output is an array rather than a scalar), so you have now one-hot encoded twice. Is this intended?
If you leave it at a single one-hot encoding, it should work fine to concatenate it with scalars.
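To make the difference concrete, here is a small numpy sketch (the `one_hot` helper and the data are hypothetical, and the tool's own internals may differ). Encoded once, a categorical column becomes a 2-D block that joins cleanly with a scalar column. Encoding that result again treats every 0/1 entry as a new category and adds an extra dimension.

```python
import numpy as np

def one_hot(codes, depth):
    """Hypothetical helper: map integer codes to one-hot vectors of length `depth`.

    Codes of shape (n,) give an (n, depth) matrix; codes of shape (n, m)
    give an (n, m, depth) array.
    """
    return np.eye(depth, dtype=np.float32)[codes]

codes = np.array([2, 0, 1, 2])                                       # categorical column as integer codes
scalars = np.array([[0.5], [1.2], [3.3], [0.9]], dtype=np.float32)  # shape (4, 1)

# Encoded once: 3 one-hot columns that concatenate cleanly with the scalar column.
once = one_hot(codes, 3)                                  # shape (4, 3)
combined = np.concatenate([once, scalars], axis=1)        # shape (4, 4)

# Encoded twice: each 0/1 entry of `once` is treated as a category again,
# producing a 3-D array that no longer lines up with the 2-D scalar column.
twice = one_hot(once.astype(int), 2)                      # shape (4, 3, 2)
print(combined.shape, twice.shape)
```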

However, if you are trying to concatenate a scalar to a 2D array, or the other way around, I can see how there could be some issues, as you can only concatenate “items” that have the same number of dimensions and the same size on every dimension except the concat dimension.
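A quick numpy illustration of that rule, assuming one scalar value per row: a `(3,)` vector cannot be concatenated with a `(3, 3)` one-hot block, but reshaping it to `(3, 1)` fixes it, because then only the concat dimension differs in size.

```python
import numpy as np

one_hot = np.eye(3, dtype=np.float32)[[0, 2, 1]]        # shape (3, 3)
scalar = np.array([0.4, 1.1, 2.5], dtype=np.float32)    # shape (3,), one value per row

# Fails: the arrays have different numbers of dimensions (2 vs 1).
try:
    np.concatenate([one_hot, scalar], axis=1)
except ValueError as err:
    print("ValueError:", err)

# Works: give the scalar an explicit feature axis so it has shape (3, 1).
# Axis 0 now matches (3 rows each); only the sizes along the concat
# axis (3 and 1) differ, which is allowed.
combined = np.concatenate([one_hot, scalar[:, np.newaxis]], axis=1)  # shape (3, 4)
print(combined.shape)
```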

JulianSMoore · 26 July 2021 12:02

Ho ho ho. I was running on autopilot… didn’t notice it was one-hot encoding. (What happens if I don’t want that to happen?) Thx

robertl · 26 July 2021 12:04

Haha, if you don’t want it to happen, you can set the datatype to numerical and it won’t do any one-hot encoding.
The categorical datatype is essentially the numerical one with one-hot encoding added; we should probably document that somewhere.
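As a rough mental model of that difference, here is a hypothetical numpy sketch. The `preprocess` function and its `datatype` strings are made up for illustration and are not the tool's actual API. "numerical" passes values through as a single column, while "categorical" takes the same values and one-hot encodes them.

```python
import numpy as np

def preprocess(column, datatype):
    """Hypothetical sketch of the two datatypes described above.

    'numerical'   -> values pass through unchanged as an (n, 1) float column.
    'categorical' -> the same values, mapped to one-hot columns of shape (n, k),
                     where k is the number of distinct values.
    """
    values = np.asarray(column)
    if datatype == "numerical":
        return values.astype(np.float32).reshape(-1, 1)
    if datatype == "categorical":
        levels, codes = np.unique(values, return_inverse=True)
        return np.eye(len(levels), dtype=np.float32)[codes]
    raise ValueError(f"unknown datatype: {datatype!r}")

column = [3, 1, 3, 2]
print(preprocess(column, "numerical").shape)    # (4, 1)
print(preprocess(column, "categorical").shape)  # (4, 3)
```

Under this model, columns that are already one-hot encoded (e.g. from get_dummies) would be set to numerical so they are not encoded a second time.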