MineralId: On choosing the appropriate evaluation metric
This is an ongoing series about the progress on improving our mineral identification web app.
TL;DR:
Choosing the correct evaluation metric matters. In this post, I talk about why the evaluation metric nDCG@k (normalized Discounted Cumulative Gain), typically used in recommender systems, is the appropriate metric for the task.
Mineral classification is an inherently multi-class multi-label problem
Minerals typically do not occur in isolation in nature. For example, the photo below shows a rock with uvite crystals in talc schist. This means that classifying the image as either uvite or talc would be “correct”, i.e., this is a multi-label problem.

However, we find that people normally do not go to the trouble of specifying all the minerals in their photos. Labeling everything exactly would be crazy expensive. Therefore, we have to tackle this inherently multi-label problem as a single-label classification problem.
nDCG as the appropriate metric
A good evaluation metric helps frame the problem and gives us feedback on how well we are doing. Typically, accuracy, precision and recall are good enough metrics for straightforward classification problems. However, it is clear that these metrics might not be appropriate here, since they only score the single most likely prediction (the mineral with the highest probability in the output vector). In the example above, if the photo is labeled uvite and the classifier gives talc 70% and uvite 30%, then all three metrics would be hurt, even though talc is also present in the photo.
Instead, it is more appropriate to treat this as a ranking problem where nDCG is a good evaluation metric.
Typically, you would normalize the DCG by the ideal DCG, so that the score lies between 0 and 1.
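To make the ranking view concrete, here is a minimal sketch of nDCG@k for the single-label setting described above. It assumes binary relevance (the labeled class has relevance 1, everything else 0), in which case the ideal DCG is 1 and nDCG@k reduces to 1 / log2(rank + 1) when the labeled class is at position `rank` within the top k, and 0 otherwise. The function name `ndcg_at_k` and the probabilities are made up for illustration; this is not necessarily how the app computes it.

```python
import math

def ndcg_at_k(ranked_classes, true_class, k):
    """nDCG@k for one image with a single labeled class (binary relevance).

    With one relevant class the ideal DCG is 1 (true class at rank 1),
    so nDCG@k is 1 / log2(rank + 1) if the class is in the top k, else 0.
    """
    top_k = ranked_classes[:k]
    if true_class not in top_k:
        return 0.0
    rank = top_k.index(true_class) + 1  # 1-indexed position
    return 1.0 / math.log2(rank + 1)

# Hypothetical classifier output for the uvite-in-talc photo, labeled "uvite".
probs = {"talc": 0.7, "uvite": 0.3}
ranking = sorted(probs, key=probs.get, reverse=True)  # most likely first
score = ndcg_at_k(ranking, "uvite", k=5)
```

Under these assumptions the uvite photo gets partial credit (the labeled class is ranked second) instead of being counted as a plain miss, which is the behaviour we want from the metric.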
Interpretation of nDCG@k for mineral classification
The nDCG metric gives us much more information on how well our classifier is doing. In our use case, we typically show the user a ranking of possible mineral classes for the given image. It is likely that the user would look at the first few suggestions before moving on.
This means that we would like to have the correct mineral class within the top k suggestions.
Moreover, computing the average nDCG@k over observations with non-zero nDCG@k for a particular class tells us the average position of the correct mineral class when we manage to get the class within the top k.
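The per-class average can be sketched as follows, again assuming the single-label, binary-relevance form where nDCG = 1 / log2(rank + 1). The helper names `mean_nonzero_ndcg_by_class` and `implied_rank`, and the sample records, are hypothetical. Note that inverting an averaged nDCG gives an indicative position, not the exact arithmetic mean of the ranks, because the mapping from rank to nDCG is nonlinear.

```python
from collections import defaultdict

def mean_nonzero_ndcg_by_class(records):
    """Average nDCG per true class, ignoring observations with nDCG == 0.

    records: iterable of (true_class, ndcg_at_k) pairs.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for true_class, ndcg in records:
        if ndcg > 0:  # keep only images where the class made the top k
            sums[true_class] += ndcg
            counts[true_class] += 1
    return {c: sums[c] / counts[c] for c in sums}

def implied_rank(ndcg):
    """Invert nDCG = 1 / log2(rank + 1) for a single relevant class."""
    return 2 ** (1.0 / ndcg) - 1

# Made-up per-image scores: (labeled class, nDCG@k).
records = [("uvite", 1.0), ("uvite", 0.5), ("uvite", 0.0), ("talc", 1.0)]
means = mean_nonzero_ndcg_by_class(records)
positions = {c: implied_rank(m) for c, m in means.items()}
```

A class whose implied position sits close to 1 is usually shown first whenever it is found at all, while a larger value suggests the user has to scan further down the list.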
Lastly, we can use the metric to compute the precision and recall for getting the correct mineral class within the top k.
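One possible reading of this, sketched below, treats an image as a "hit" when its labeled class appears in the top k, which is the same as having a non-zero nDCG@k. Recall for a class is then the share of its images that are hits. For precision I assume the definition "of the images where the class was suggested in the top k, how many were actually labeled with it", which needs the top-k lists and not only the nDCG values. The function name and example data are hypothetical.

```python
from collections import Counter

def topk_precision_recall(examples, k):
    """Per-class precision and recall of top-k suggestions.

    examples: list of (true_class, ranked_classes) pairs.
    A hit means the true class is in the top k, i.e. nDCG@k > 0.
    """
    hits = Counter()       # images whose labeled class is in their top k
    support = Counter()    # images labeled with the class
    suggested = Counter()  # images where the class appears in the top k
    for true_class, ranked in examples:
        top_k = ranked[:k]
        support[true_class] += 1
        suggested.update(set(top_k))
        if true_class in top_k:
            hits[true_class] += 1
    recall = {c: hits[c] / support[c] for c in support}
    precision = {c: hits[c] / suggested[c] for c in suggested}
    return precision, recall

# Made-up rankings: (labeled class, classes sorted by predicted probability).
examples = [
    ("uvite", ["talc", "uvite", "quartz"]),
    ("uvite", ["quartz", "talc", "uvite"]),
    ("talc", ["talc", "quartz", "uvite"]),
]
precision, recall = topk_precision_recall(examples, k=2)
```

With single labels on multi-mineral photos, precision defined this way is pessimistic: a class that is suggested and genuinely present but not labeled still counts against it.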