Classifying Chest X-Rays to Detect Pneumonia

Timely diagnosis of diseases has been at the core of artificial intelligence in healthcare for the last 50 years. To accurately diagnose ailments, doctors need every possible tool at their disposal. Now with the growing use of machine learning (ML) in healthcare, we wanted to see how quickly we could create an image classification model that could give doctors an upper hand in diagnosing patients’ ailments, specifically pneumonia.

To accomplish this, we built an ML model that can classify images of chest x-rays as either normal or infected. This involved preparing and wrangling the training data, building a .csv file to map that data to the two classifications, and iterating on the model in PerceptiLabs. Let’s see how we did!
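If you'd like to reproduce the mapping step yourself, here is a minimal sketch of one way to generate such a .csv file with plain Python. It assumes the Kaggle dataset's folder layout (NORMAL and PNEUMONIA subfolders under the Test split); the paths, column names, and output filename are illustrative rather than our exact script.

```python
import csv
from pathlib import Path

# Assumed layout of the Kaggle chest x-ray dataset (test split used here);
# adjust the root path to wherever the data was extracted.
DATA_ROOT = Path("chest_xray/test")
LABELS = {"NORMAL": "normal", "PNEUMONIA": "infected"}

rows = []
for folder, label in LABELS.items():
    # Files in this dataset are .jpeg; change the pattern if yours differ.
    for image_path in sorted((DATA_ROOT / folder).glob("*.jpeg")):
        rows.append((str(image_path), label))

# Write one row per image: the file path and its classification.
with open("pneumonia_dataset.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["image", "label"])
    writer.writerows(rows)

print(f"Wrote {len(rows)} rows to pneumonia_dataset.csv")
```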

[Screenshot: Dataset Sample]

[Screenshot: Workspace]

[Screenshot: Statistics View]

[Screenshot: Accuracy Plot]


Do the validation accuracy and training accuracy seem weird to anyone else? Or is it just me?

I think this is one of those cases where class weighting should be used, because the two outcomes (infection / no infection) are not equally important to get right. It’s a bit like fraud detection: you don’t want to miss the rarer events. However, I haven’t looked at the balance of clear/infected samples in the dataset. That said, it all seems done by the end of epoch 1, and somehow that doesn’t feel right, even though it doesn’t seem to have overfitted (validation accuracy is below training accuracy).
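For anyone wanting to try weighting outside PL, a rough sketch in Keras/scikit-learn would look something like this (the label array and weights here are placeholders for illustration, not values from this project):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# y_train is assumed to be a 1-D array of 0/1 labels (0 = normal, 1 = infected).
y_train = np.array([0, 0, 0, 1, 1])  # placeholder labels for illustration

# "balanced" weights each class inversely to its frequency,
# so the rarer class contributes more to the loss.
weights = compute_class_weight(class_weight="balanced",
                               classes=np.unique(y_train),
                               y=y_train)
class_weight = dict(enumerate(weights))

# In plain Keras this would then be passed to fit(), e.g.:
# model.fit(x_train, y_train, epochs=10, class_weight=class_weight)
print(class_weight)
```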

Also, there’s remarkably little variation between epochs… I can’t recall whether PL does shuffle between epochs (or whether there’s an option for it), but that seems a little odd to me. No idea what the dataset size is, though.

What exactly were you thinking of?

(NB: I don’t like PL’s choice of colours; they’re not distinct enough to me when there are small samples, e.g. on the legend)

From the looks of it (in the Predictions for Each Class graph), it’s a little unbalanced, but not badly: 345 to 200-something (we really need to fix that text :sweat_smile:).
The full set used here (the smaller set in the Test folder) is about 600 samples: https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia
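If anyone wants the exact per-class counts rather than reading them off the graph, a quick sanity check is to count the files in each class folder. This assumes the Kaggle layout with NORMAL and PNEUMONIA subfolders under chest_xray/test; adjust the path to your copy of the data.

```python
from pathlib import Path

# Assumed location of the Kaggle test split; each class sits in its own folder.
DATA_ROOT = Path("chest_xray/test")

# Count the .jpeg files per class folder (change the pattern if needed).
counts = {folder.name: sum(1 for _ in folder.glob("*.jpeg"))
          for folder in DATA_ROOT.iterdir() if folder.is_dir()}

total = sum(counts.values())
for name, n in counts.items():
    print(f"{name}: {n} images ({n / total:.1%})")
```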

Maybe it’s the small dataset size that has this effect, or MobileNet just does a lot of heavy lifting combined with the dataset imbalance :thinking:
It does classify well without overfitting though, as you mentioned, and from what I recall it did well on the testing partition as well (do you have an image of the confusion matrix, @aruneshm?)
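In case it’s useful, here’s a minimal sketch of how a confusion matrix could be computed outside the tool with scikit-learn; the y_true/y_pred arrays below are placeholders standing in for the test labels and the model’s predictions, not results from this run.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Placeholder arrays standing in for the real test labels and predictions
# (0 = normal, 1 = infected).
y_true = np.array([0, 0, 1, 1, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0])

# Rows are the true classes, columns the predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
print("Confusion matrix (rows = true, cols = predicted):")
print(cm)
```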

I can’t recall whether PL does shuffle between epochs

We do, if enabled in the Training Settings :slight_smile:
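(For anyone curious what that setting corresponds to in code, the snippet below illustrates per-epoch reshuffling with tf.data on a toy dataset; it’s just an illustration of the behaviour, not PL’s internal implementation.)

```python
import tensorflow as tf

# Toy dataset standing in for the real image pipeline.
dataset = tf.data.Dataset.range(10)

# reshuffle_each_iteration=True re-shuffles the data at the start of every epoch.
shuffled = dataset.shuffle(buffer_size=10, reshuffle_each_iteration=True).batch(4)

for epoch in range(2):
    print(f"epoch {epoch}:", [batch.numpy().tolist() for batch in shuffled])
```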

(NB: I don’t like PL’s choice of colours; they’re not distinct enough to me when there are small samples, e.g. on the legend)

We have some color changes on the way for this :wink:


I think my notifications were turned off, so I didn’t see the message :slight_smile: From what I remember, the testing accuracy was close to 92%. I don’t have a screenshot because the Test Hub was disappearing instantly, but it did a pretty good job on the test set as well!