Better assessment of disease progression with ranking SVM

© ML2R

During clinical studies, it is crucial to evaluate the effectiveness of therapy and monitor disease progression over time. Both rely on determining the activity level of the disease, that is, its current severity. Clinical studies therefore include not only objectively measured values, such as laboratory results, but also subjective expert assessments of the current activity of the disease under investigation. The methods for assessing activity vary depending on the disease. This article focuses on assessing the activity of Psoriatic Arthritis (PsA), a chronic inflammatory joint disease associated with psoriasis.

Because subjective assessments of PsA activity fluctuate considerably with differences in expert knowledge and physician intuition, this article presents a method for predicting a more robust and stable assessment of a patient's status.

Assessing patient activity

There are generally two widely used medical methods to determine the current status of a disease:

  1. Firstly, the status can be determined by physicians using numerical disease activity ratings. Such a rating requires domain expertise and is highly subjective, since no strict diagnostic criteria are established.
  2. Secondly, disease status can be calculated from existing symptoms. This method is more objective but also requires domain knowledge to weight the individual attributes correctly.

In a study by Fraunhofer IAIS and ITMP, a method based on the Ranking SVM was developed to predict a more robust and stable assessment of PsA activity from existing medical evaluations of the disease. The prediction combines the two existing assessments described above. The methodology is described below.

How does the Ranking SVM rank disease activities?

The challenge of assessing disease activity can be framed as an ordinal regression problem. Ordinal regression sits between pure classification and regression: data points are assigned to discrete classes, and the classes follow a linear order. Such an order between data points is useful not only for classifying disease activity but also in information retrieval and econometrics, among other areas. The order typically represents a preference: Object A is preferred over Object B (A ≻ B). The method is described below using the example of PsA patients.

An example of ordinal regression. Classes 1-3 are separated by parallel hyperplanes.

One way to solve ordinal regression problems is the Ranking Support Vector Machine (Ranking SVM). The Ranking SVM learns an order between the disease activities of patients, meaning that patients are ordered by the strength of their PsA activity. The physicians’ assessment serves as the label to be ordered, and the symptoms and their weighting form the feature vector. To reduce the influence of strong subjective fluctuations in activity assessments, only patient pairs with measurably different symptoms and activity statuses are used for training.

Using the learned weight vector, a new activity value for each patient can be calculated, and the disease activity can be classified relative to other patients. Converting the ordinal scaled dataset into a binary classification dataset enables ordinal regression using a classification SVM.
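
The link between ranking and binary classification can be written compactly. The notation here is an assumption made for illustration and is not taken from the study: x_i is the symptom vector of patient i, y_i the physician's rating, w the weight vector, C a regularization constant, and P the set of patient pairs (i, j) with y_i > y_j. The block follows the commonly used soft-margin formulation of a linear Ranking SVM.

```latex
\begin{aligned}
f(x) &= w^\top x, \qquad
x_i \succ x_j \;\Longleftrightarrow\; f(x_i) > f(x_j)
\;\Longleftrightarrow\; w^\top (x_i - x_j) > 0 \\[6pt]
\min_{w,\,\xi}\quad & \tfrac{1}{2}\lVert w \rVert^2 + C \sum_{(i,j)\in P} \xi_{ij} \\
\text{s.t.}\quad & w^\top (x_i - x_j) \ge 1 - \xi_{ij}, \qquad
\xi_{ij} \ge 0 \qquad \forall\,(i,j) \in P
\end{aligned}
```

Because each constraint depends only on the difference x_i - x_j, the problem has the same form as a binary classification SVM trained on difference vectors.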

Ordinal scaled order of patients (left: highest disease activity, right: lowest disease activity) and the difference between patient pairs (blue – positive value, orange – negative value).

To convert the dataset, data points (patients) are considered pairwise, and the difference between the two feature vectors is calculated. The difference allows two points to be ordered relative to each other. In the case of PsA patients, it indicates whether certain symptoms are more or less pronounced in the minuend (the first patient) than in the subtrahend (the second patient). Positive values indicate that a symptom is more pronounced in the first patient; negative values indicate that it is less pronounced. Symptoms that are equally pronounced in both patients cancel out in the difference. If the minuend is higher in the order, the label of the difference is set to 1, otherwise to 0 (see the graphic above). The result is a dataset that can be used to train binary classification models such as an SVM.
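
The pairwise transformation described above could be sketched as follows. This is a minimal illustration, not the code used in the study: the function name `pairwise_differences`, the three symptom columns, and the toy values are all hypothetical, and using both orientations of each pair is a design choice made here so that both labels occur.

```python
from itertools import combinations


def pairwise_differences(features, ratings):
    """Turn an ordinally rated dataset into a binary classification dataset.

    features: one symptom vector (list of floats) per patient
    ratings:  physician activity rating per patient (higher = more active)
    Returns (diffs, labels): difference vectors and their 1/0 labels.
    """
    diffs, labels = [], []
    for i, j in combinations(range(len(features)), 2):
        # Keep only pairs that differ in both rating and symptoms.
        if ratings[i] == ratings[j] or features[i] == features[j]:
            continue
        # Use both orientations of the pair: (minuend, subtrahend).
        for a, b in ((i, j), (j, i)):
            diff = [xa - xb for xa, xb in zip(features[a], features[b])]
            diffs.append(diff)
            # Label 1 if the minuend is higher in the order, otherwise 0.
            labels.append(1 if ratings[a] > ratings[b] else 0)
    return diffs, labels


# Hypothetical toy data: [swollen joints, tender joints, skin involvement]
patients = [[8.0, 6.0, 3.0], [2.0, 1.0, 0.0], [5.0, 4.0, 3.0]]
ratings = [3, 1, 2]

diffs, labels = pairwise_differences(patients, ratings)
for diff, label in zip(diffs, labels):
    print(diff, label)
```

Each pair contributes one positive and one negative example, so the resulting binary dataset is balanced by construction.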

Training the Classification SVM

This modified dataset can now be used to train an SVM for binary classification. Here, the model learns the order of object pairs instead of classes. The trained SVM yields a weight vector and a decision function. Taking the dot product of a data point with the learned weight vector gives an approximation of the patient’s disease activity rank, so new data points can be sorted into the existing order. The detailed methodology can be found in the paper by Herbrich et al.
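
As a rough sketch of this training and scoring step, the example below fits a linear weight vector on pairwise differences and then ranks patients by their dot product with it. Several things here are assumptions for illustration only: the solver is a plain subgradient descent on the regularized hinge loss, standing in for a proper SVM solver; the function names, hyperparameters, patient identifiers, and symptom values are hypothetical and do not come from the study.

```python
def train_linear_svm(diffs, labels, epochs=200, lr=0.01, reg=0.01):
    """Fit a weight vector by subgradient descent on the regularized hinge loss.

    No bias term is used: for difference vectors only the direction of w matters.
    """
    w = [0.0] * len(diffs[0])
    for _ in range(epochs):
        for x, label in zip(diffs, labels):
            y = 1.0 if label == 1 else -1.0  # map labels 1/0 to +1/-1
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            for k in range(len(w)):
                # The hinge term contributes only while the margin is violated.
                grad = reg * w[k] - (y * x[k] if margin < 1.0 else 0.0)
                w[k] -= lr * grad
    return w


def activity_score(w, x):
    """Dot product of weight vector and symptom vector: the predicted activity."""
    return sum(wi * xi for wi, xi in zip(w, x))


# Hypothetical toy data: [swollen joints, tender joints, skin involvement]
patients = {"A": [8.0, 6.0, 3.0], "B": [2.0, 1.0, 0.0], "C": [5.0, 4.0, 3.0]}
ratings = {"A": 3, "B": 1, "C": 2}

# Build the pairwise difference dataset (label 1 if the first patient ranks higher).
diffs, labels = [], []
for p in patients:
    for q in patients:
        if ratings[p] != ratings[q]:
            diffs.append([a - b for a, b in zip(patients[p], patients[q])])
            labels.append(1 if ratings[p] > ratings[q] else 0)

w = train_linear_svm(diffs, labels)

# Rank known patients from highest to lowest predicted activity.
ranking = sorted(patients, key=lambda p: activity_score(w, patients[p]), reverse=True)
print(ranking)

# A new, unrated patient can be placed into the existing order by score.
new_patient = [6.0, 5.0, 2.0]
print(activity_score(w, new_patient))
```

In practice, the hand-written loop would typically be replaced by a library solver such as scikit-learn's `LinearSVC`, whose learned coefficients play the role of `w`.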

In the end, a model is available that can predict patient disease activities based on symptoms. Through this estimation of disease activities, a patient’s status can be tracked over an extended period, or patients can be ranked according to the severity of activity.

To assess the success of therapy during a clinical study, it is important to determine the level of disease activity. Clinical studies often include subjective expert assessments, such as the disease activity of arthritis participants, alongside objectively measured indicators, such as laboratory values. These assessments show strong fluctuations due to differences in expert knowledge and physician intuition. We have shown that a more stable activity assessment can be derived using the Ranking SVM. The model reduces the subjective influence of individual physicians and allows more accurate assessments of patient conditions. In our study, the Ranking SVM achieved the highest accuracy, at 80%.

More information can be found in the associated publication:

Aligning Subjective Ratings in Clinical Decision Making. A. Pick, S. Ginzel, S. Rüping, J. Sander, A. C. Foldenauer, M. Köhm. 2020, arXiv.

Sabine Kugler

Sabine Kugler is a data scientist at the Fraunhofer Institute IAIS in Sankt Augustin in the Healthcare Analytics business unit and works mainly on projects in the field of Artificial Intelligence in Pharmacology. Her research interests include explainable AI and causal inference.
