The colors correspond to the number of samples that are classified as one thing or another. For example, if you have many samples for one class and few for another, then you have an unbalanced test dataset. If the samples are not on the diagonal, then they are falsely classified as another class, making it easy to see if any one class has better classifications than any other.