Key AI Terms and Metrics

This section defines key Machine Learning terms and concepts that help operations teams understand the AI process used to operationalize the Ushur SmartMail solution.

Machine Learning

Machine Learning (ML) is an application of Artificial Intelligence (AI) that gives systems the ability to perform tasks without being explicitly programmed; instead of following hand-written rules, the system learns from data.

Let’s use a simple classification example to describe the key metrics used in Machine Learning to evaluate performance. Consider an email classification model that classifies 100 samples as either “spam” or “not-spam”. In this example, there are nine spam and 91 not-spam emails. This is a heavily imbalanced dataset, as more than 90% of the samples are not-spam. Note that in a real-world ML deployment, the dataset would be rebalanced to improve model performance.

This spam prediction model can be summarized in a 2 x 2 “confusion matrix,” as shown in the diagram below, which lists all four possible outcomes. A “True Positive” is an outcome where the model correctly predicts the “spam” class. A “True Negative” is when it correctly predicts the “not-spam” class. On the other hand, a “False Positive” is when the model incorrectly predicts “spam” for a not-spam email, and a “False Negative” is when it incorrectly predicts “not-spam” for a spam email.
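
The short Python sketch below tallies these four outcomes from labeled data. It is illustrative only: the y_true and y_pred lists are hypothetical, chosen to reproduce the counts in this example (1 true positive, 8 false negatives, 1 false positive, 90 true negatives).

```python
from collections import Counter

# Hypothetical labels that reproduce the article's example:
# 9 actual spam emails (1 predicted correctly) and
# 91 actual not-spam emails (90 predicted correctly).
y_true = ["spam"] * 9 + ["not-spam"] * 91
y_pred = (["spam"] * 1 + ["not-spam"] * 8      # spam:     1 TP, 8 FN
          + ["spam"] * 1 + ["not-spam"] * 90)  # not-spam: 1 FP, 90 TN

# Tally each (actual, predicted) pair into a confusion-matrix cell.
cells = Counter(zip(y_true, y_pred))
tp = cells[("spam", "spam")]          # True Positive
fn = cells[("spam", "not-spam")]      # False Negative
fp = cells[("not-spam", "spam")]      # False Positive
tn = cells[("not-spam", "not-spam")]  # True Negative

print("                 predicted spam  predicted not-spam")
print(f"actual spam           {tp:3d}              {fn:3d}")
print(f"actual not-spam       {fp:3d}              {tn:3d}")
```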

In practice, the operations team uses the confusion matrix to maintain accuracy in production; this exercise is described briefly in the sections that follow. Ushur also includes the confusion matrix as a topic in Onboarding Training.

Metrics Summary

Organizations rely on three measures to assess model performance: topic accuracy, misclassification, and the Default category. Each is addressed in the sections below.

Topic Accuracy

Topic accuracy is the ratio of the number of correct predictions per topic to the total number of results per topic.

Topic Accuracy = Total number of correct results per topic / Total number of results per topic

In this example, the topic accuracy of spam is 11% (1 of 9 correct), and that of not-spam is roughly 99% (90 of 91 correct). The overall model accuracy is (1 + 90)/100 = 91%. Using just overall classification accuracy can lead us to incorrectly gauge the model’s predictive power: the spam topic accuracy is poor, but the large not-spam class masks it in the overall number. This is why Ushur recommends using per-topic accuracy metrics to determine the performance of the model.
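
A minimal sketch of these calculations, using the confusion-matrix counts from the example above:

```python
# Confusion-matrix counts from the example: 1 TP, 8 FN, 1 FP, 90 TN.
tp, fn, fp, tn = 1, 8, 1, 90

spam_accuracy = tp / (tp + fn)                      # 1/9   ~ 11%
not_spam_accuracy = tn / (tn + fp)                  # 90/91 ~ 99%
overall_accuracy = (tp + tn) / (tp + fn + fp + tn)  # 91/100 = 91%

print(f"spam topic accuracy:     {spam_accuracy:.0%}")
print(f"not-spam topic accuracy: {not_spam_accuracy:.0%}")
print(f"overall accuracy:        {overall_accuracy:.0%}")
```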

Misclassification

Misclassification per topic measures the proportion of model predictions that did not match the labeled samples (the source of truth).

Misclassification = Total number of incorrect results per topic / Total number of results per topic

In this example, the misclassification of spam is roughly 89%, as 8 of the 9 spam emails were incorrectly classified as not-spam. This extreme misclassification is most likely because the training data is imbalanced and heavily skewed towards not-spam samples.
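
The same sketch extends to misclassification, which is simply the complement of topic accuracy:

```python
# Confusion-matrix counts from the example: 1 TP, 8 FN, 1 FP, 90 TN.
tp, fn, fp, tn = 1, 8, 1, 90

spam_misclassification = fn / (tp + fn)      # 8/9  ~ 89%
not_spam_misclassification = fp / (tn + fp)  # 1/91 ~ 1%

print(f"spam misclassification:     {spam_misclassification:.0%}")
print(f"not-spam misclassification: {not_spam_misclassification:.0%}")
```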

Note

Misclassification should be treated as an indicator that the modeling exercise needs improvement, as described in Appendix A - Monitor and Maintain. Correctly identifying the root cause behind this metric, addressing it, and iterating improves topic-level accuracy.

Default

One of the tactics Ushur deploys when there is a high level of misclassification is to tailor the qualified ML model so that low-confidence results are placed in a Default category instead of one of the defined topics. The parameters that define the Default category can be set based on the topic confidence level and/or by training a distinct dummy topic as Default.

In our example, this means Ushur can decline to categorize an email as either spam or not-spam when the model’s confidence in the prediction is not sufficiently high. In that case, the model classifies it as Default. This set of emails can then be manually reviewed, correctly labeled, and used to retrain the model to improve its performance.
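
A minimal sketch of this confidence-based routing; the 0.75 threshold and the sample predictions below are illustrative assumptions, not Ushur’s actual configuration:

```python
# Hypothetical confidence threshold; the real value is a tuning decision.
CONFIDENCE_THRESHOLD = 0.75

def route(predicted_topic: str, confidence: float) -> str:
    """Keep the model's prediction only when it is confident enough;
    otherwise route the email to Default for manual review."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return predicted_topic
    return "Default"

# Illustrative (topic, confidence) predictions.
for topic, confidence in [("spam", 0.92), ("not-spam", 0.55), ("spam", 0.61)]:
    print(f"{topic:10s} conf={confidence:.2f} -> {route(topic, confidence)}")
```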

