Testing a model: Precision and Recall are the exact same value as categorical accuracy

Hi everyone,
I’m creating an image classification model (CNN) with fNIRS time series data converted to images. I’m doing a 3-class classification currently and it achieves validation acc at 20% and training acc at 40%. When testing, it achieves 44% categorical accuracy - but Recall and Precision are the exact same value of 0.44 as well. This seems extremely weird - is it a bug? It has happened more times over with other accuracy scores as well. It also says top K categorical accuracy of 1.000.

TL;DR: Accuracy, Precision, and Recall show the exact same number, which is also much higher than the validation accuracy. Isn’t this weird?

Best,
PBrams

Hi @PBrams!

Welcome, and excellent find :wink: I think @robertl will have a quick answer about the duplicate values because, IIRC, it has been mentioned once before - I guess fixing it just didn’t make it into the v0.13 release.

i.e. yes, unless someone from PL says otherwise, it’s a bug!

I’m curious… there can’t be many people doing ML on fNIRS time series, so would I be right in thinking you’re working with @esbenkran?

How’s it going? :slight_smile:

Thanks for the reply, Julian! Your inference is exactly right - my colleague Klara and I are working with @esbenkran on this; all three of us are enrolled in the same cohort of the Cognitive Science BSc at Aarhus University :slight_smile: How fun that you know each other; small world! So far it’s going well, but it isn’t easy at all - the data can be challenging to work with, and Klara and I are using TensorFlow/Keras/PerceptiLabs for the first time here, so we’re grateful for these quick replies on the forum :slight_smile:

Just to be clear though - given that it is a bug (unless someone from PL says otherwise) - is there any other way to access the correct values?

Best,
Pernille

It’s not really that small a world - I started talking to Esben here (because I do brain stuff too, in a theoretical sort of way).

Unfortunately I don’t know how else to access the correct metric values, but I doubt you’ll have long to wait before someone from PL helps out.

Once upon a time everything was a problem… now I think it’s much more about data handling and ML concepts because things like PL make it so much easier to do the messy correlation/prediction stuff (otherwise you’d have to do a lot of Granger causality calculations etc. - probably!)

I’ve done a bit of work on time series myself (don’t expect too much!), so I’d be happy to comment on anything in that area too. One thing I did do was some special windowed Fourier code… but IIRC all your time series are of fixed length, so windowing wouldn’t be necessary. It could be interesting to see whether wavelet decomposition would be of assistance - still via re-encoding as images for the CNN for now…
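In case it’s useful as a starting point, here is a very rough sketch of a wavelet (CWT) scalogram of a single time series using PyWavelets - the sampling rate, wavelet choice, and scales are all placeholder assumptions, and the result is simply a 2-D array you could re-encode as an image for a CNN:

```python
import numpy as np
import pywt

# Placeholder signal standing in for one fNIRS channel (assumed 10 Hz sampling)
fs = 10.0
t = np.arange(0, 30, 1 / fs)
signal = np.sin(2 * np.pi * 0.1 * t) + 0.3 * np.random.randn(t.size)

# Continuous wavelet transform -> 2-D "scalogram" (scales x time)
scales = np.arange(1, 65)
coeffs, freqs = pywt.cwt(signal, scales, "morl", sampling_period=1 / fs)
scalogram = np.abs(coeffs)   # this array can be saved / re-encoded as an image
print(scalogram.shape)       # (64, 300)
```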

Don’t ask me about wavelets though… I barely know how to spell the word :wink:

Hi @PBrams,
Welcome to the forum!

It’s a bug…-ish :sweat_smile:
What happens is that Precision and Recall are calculated for every class and then averaged together, which just so happens to end up at the same value as the accuracy.
To fix this, we are going to let you view the metrics for each class individually, which will come together with a larger update of the Evaluation view.
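If you want to see the same effect outside of PerceptiLabs, here is a minimal sketch with scikit-learn (made-up labels, and not PerceptiLabs’ own calculation): pooling the counts across classes (“micro” averaging) always reproduces the accuracy for single-label data, while per-class (“macro”) averages generally differ.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Made-up 3-class labels and predictions, purely for illustration
y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 0, 2, 2, 1])

acc = accuracy_score(y_true, y_pred)

# Micro-averaging pools true/false positives over all classes, so for
# single-label multiclass data precision == recall == accuracy.
prec_micro = precision_score(y_true, y_pred, average="micro")
rec_micro = recall_score(y_true, y_pred, average="micro")
print(acc, prec_micro, rec_micro)   # all three identical (0.667)

# Per-class (macro) averages usually tell a different story.
prec_macro = precision_score(y_true, y_pred, average="macro")
rec_macro = recall_score(y_true, y_pred, average="macro")
print(prec_macro, rec_macro)        # ~0.694 and ~0.639 here
```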

Until that happens though, the best way is to look at the confusion matrix instead to see how well the individual classes perform :slight_smile: There you can see what percentage of the time each class is classified correctly, what percentage of the time it is classified as a specific other class, and how often other classes are misclassified as it - all the things that go into Precision and Recall.
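As a reference, per-class precision and recall fall straight out of a confusion matrix; a small sketch with scikit-learn (again with made-up labels, just to show the general calculation):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up 3-class labels and predictions, purely for illustration
y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 0, 2, 2, 1])

cm = confusion_matrix(y_true, y_pred)   # rows = true class, columns = predicted class
print(cm)

# Diagonal = correct predictions per class
per_class_recall = np.diag(cm) / cm.sum(axis=1)     # correct / all samples of that class
per_class_precision = np.diag(cm) / cm.sum(axis=0)  # correct / all predictions of that class
print(per_class_recall)
print(per_class_precision)
```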

Hope that helps!
All the best,
Robert
