Training ValueError?

PerceptiLabs 0.13.1, Windows 10, Python 3.8.10

This is odd. It’s a regression task with only 61 rows of data; batch size = 32, loss = cross-entropy. There’s nothing wrong with the data in the CSV file as far as I can see.

After further experiments it seems the cross-entropy loss is the culprit: with the same batch size of 32 but quadratic loss there is no problem.

Error during training!
Traceback (most recent call last):
  File "perceptilabs\coreInterface.py", line 32, in perceptilabs.coreInterface.TrainingSessionInterface.run_stepwise
  File "perceptilabs\coreInterface.py", line 33, in perceptilabs.coreInterface.TrainingSessionInterface.run_stepwise
  File "perceptilabs\coreInterface.py", line 52, in _main_loop
  File "perceptilabs\trainer\base.py", line 174, in run_stepwise
  File "perceptilabs\trainer\base.py", line 289, in _loop_over_dataset
  File "perceptilabs\trainer\base.py", line 443, in perceptilabs.trainer.base.Trainer._update_tracked_values
  File "perceptilabs\layers\iooutput\stats\numerical.py", line 216, in perceptilabs.layers.iooutput.stats.numerical.NumericalOutputStatsTracker.update
  File "perceptilabs\stats\r_squared.py", line 76, in perceptilabs.stats.r_squared.RSquaredStatsTracker.update
  File "perceptilabs\stats\r_squared.py", line 89, in perceptilabs.stats.r_squared.RSquaredStatsTracker._store_r_squared_values
  File "c:\users\julian\anaconda3\envs\pl_tf250_py3810\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "c:\users\julian\anaconda3\envs\pl_tf250_py3810\lib\site-packages\sklearn\metrics\_regression.py", line 676, in r2_score
    y_type, y_true, y_pred, multioutput = _check_reg_targets(
  File "c:\users\julian\anaconda3\envs\pl_tf250_py3810\lib\site-packages\sklearn\metrics\_regression.py", line 90, in _check_reg_targets
    y_pred = check_array(y_pred, ensure_2d=False, dtype=dtype)
  File "c:\users\julian\anaconda3\envs\pl_tf250_py3810\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "c:\users\julian\anaconda3\envs\pl_tf250_py3810\lib\site-packages\sklearn\utils\validation.py", line 720, in check_array
    _assert_all_finite(array,
  File "c:\users\julian\anaconda3\envs\pl_tf250_py3810\lib\site-packages\sklearn\utils\validation.py", line 103, in _assert_all_finite
    raise ValueError(
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
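
For what it’s worth, the last step of the traceback can be reproduced in isolation: sklearn’s r2_score rejects any input containing a NaN, so a single NaN prediction is enough to trigger this ValueError. A minimal sketch (not the PerceptiLabs code path, just the sklearn call at the bottom of the stack):

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([1.0, 2.0], dtype=np.float32)
y_pred = np.array([np.nan, 2.0], dtype=np.float32)  # one NaN prediction is enough

# Raises the "Input contains NaN, infinity or a value too large..." ValueError
r2_score(y_true, y_pred)
```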

Model looks like this:

Data file attached: Uniform-X, Nyqist x 5.zip (1.7 KB)

Hey @JulianSMoore,

Hmm, it seems like the first row, where x = 0, is the issue.
I’m guessing that this causes cross-entropy to divide by zero in some way, producing a NaN somewhere.
Remove that row and everything runs fine 🙂
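
For illustration, here’s a minimal numpy sketch of the mechanism I suspect (not the actual PerceptiLabs loss code): cross-entropy takes a log of the prediction, and with a zero in play you get -inf and then NaN:

```python
import numpy as np

y_true = np.float32(0.0)
y_pred = np.float32(0.0)

# log(0) is -inf (divide-by-zero warning), and 0 * -inf is NaN (invalid-value warning)
term = y_true * np.log(y_pred)
print(term)  # nan; once a NaN reaches the R^2 stats tracker, sklearn raises the ValueError above
```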

I didn’t think of that. I wouldn’t call that a cross-entropy bug, except that IIRC there’s a log(x) in there - and see Chris Olah’s blog, where he protests the awful cross-entropy notation that makes it look symmetric when it isn’t!

Which raises the question: can this be avoided in future? Because it will happen again and again for other users experimenting with loss functions.

It would be nice if one could define 0 × NaN = 0.
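
For example (a sketch of general options only, not a claim about how PerceptiLabs handles this internally): scipy’s xlogy already adopts the convention that x · log(y) is 0 whenever x = 0, which is effectively the “0 × NaN = 0” behaviour wanted here, and clipping predictions away from zero before the log is another common guard:

```python
import numpy as np
from scipy.special import xlogy

# Naive cross-entropy term: 0 * log(0) evaluates to NaN
print(0.0 * np.log(0.0))      # nan (with runtime warnings)

# xlogy defines x * log(y) to be 0 whenever x == 0, so the zero case stays finite
print(xlogy(0.0, 0.0))        # 0.0

# Alternatively, clip predictions away from zero before taking the log
eps = 1e-7
y_pred = np.array([0.0, 0.5], dtype=np.float32)
print(np.log(np.clip(y_pred, eps, 1.0)))  # finite everywhere
```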