In that case, the score function differs from what was announced earlier during the competition. In addition, the best submission (winner by dsuc) achieves only 62% classification accuracy. Would this indicate that the lack of sensor fault training data significantly hindered participants in verifying their sensor fault hypotheses? Would it be possible to reveal the ground truth to us at some point?