In addition to class probabilities, AutoML predicts categorical labels. For the best model, an optimal threshold is computed that maximizes the F1 score.
The predicted data frame currently looks like this:
p_0, p_1, label
0.1, 0.9, 1
0.1, 0.9, 1
0.9, 0.1, 0
...
The p_0 column is the probability of class 0. The p_1 column is the probability of class 1. The 'label' column is the predicted label, decided based on the threshold.
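The sketch below illustrates how a threshold that maximizes F1 could be chosen and then used to fill the 'label' column. It is a minimal illustration in plain Python, not the library's actual implementation: the function names (`f1_score`, `best_f1_threshold`, `add_labels`) are hypothetical, the candidate thresholds are assumed to be the observed p_1 values, and a row is assumed to get label 1 when p_1 is greater than or equal to the threshold.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (1), computed from raw counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(y_true, p_1):
    """Try each observed probability as a threshold; keep the best F1.

    Assumption: ties are resolved in favour of the lowest threshold.
    """
    best_threshold, best_f1 = 0.5, -1.0
    for threshold in sorted(set(p_1)):
        y_pred = [1 if p >= threshold else 0 for p in p_1]
        score = f1_score(y_true, y_pred)
        if score > best_f1:
            best_threshold, best_f1 = threshold, score
    return best_threshold, best_f1


def add_labels(p_1, threshold):
    """Build rows shaped like the predicted data frame: p_0, p_1, label."""
    return [
        {"p_0": 1.0 - p, "p_1": p, "label": 1 if p >= threshold else 0}
        for p in p_1
    ]


# Toy validation data (made up for illustration only).
y_true = [1, 1, 0, 0, 1]
p_1 = [0.9, 0.8, 0.7, 0.3, 0.6]

threshold, f1 = best_f1_threshold(y_true, p_1)
rows = add_labels(p_1, threshold)
labels = [row["label"] for row in rows]
```

With this toy data the selected threshold is not 0.5, which shows why the 'label' column can differ from a simple "largest probability wins" rule.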
If the target column contains values other than 0 and 1, they are converted internally to 0 and 1, but the original values appear in the column names and labels of the predicted data frame. For example, if the target column contains the values A and B, the predicted data frame will look like this:
p_A, p_B, label
0.1, 0.9, B
0.1, 0.9, B
0.9, 0.1, A
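The conversion of target values described above can be sketched as follows. This is an assumption-laden illustration rather than the library's own code: the helper names (`encode_target`, `decode_predictions`) are hypothetical, the two classes are assumed to be mapped to 0 and 1 in sorted order, and the threshold of 0.5 is only a placeholder for the optimized one.

```python
def encode_target(values):
    """Map the two distinct target values to 0 and 1.

    Assumption: classes are ordered by sorting, so 'A' -> 0 and 'B' -> 1.
    """
    classes = sorted(set(values))
    if len(classes) != 2:
        raise ValueError("expected exactly two distinct target values")
    to_int = {value: index for index, value in enumerate(classes)}
    return [to_int[v] for v in values], classes


def decode_predictions(p_1, classes, threshold=0.5):
    """Build rows whose column names and labels use the original values."""
    negative, positive = classes
    return [
        {
            f"p_{negative}": 1.0 - p,
            f"p_{positive}": p,
            "label": positive if p >= threshold else negative,
        }
        for p in p_1
    ]


# Toy data (made up for illustration only).
encoded, classes = encode_target(["A", "B", "B", "A"])
rows = decode_predictions([0.9, 0.9, 0.1], classes)
labels = [row["label"] for row in rows]
```

Here `encoded` holds the internal 0/1 representation, while `rows` uses p_A, p_B and the labels A and B, matching the layout of the table above.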