Build a multi-class classifier using softmax and cross-entropy loss.
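The forward pass computes the logits $z$ (with $b$ added to every row), the softmax probabilities $p$, and the cross-entropy loss $\mathcal{L}$ averaged over the batch:
$$z = xW + b$$
$$p_{ij} = \frac{\exp(z_{ij})}{\sum_{k=0}^{C-1} \exp(z_{ik})}$$
$$\mathcal{L} = -\frac{1}{N} \sum_{i=0}^{N-1} \log p_{i, y_i}$$
where $N$ is the batch size, $C$ is num_classes, and $y_i$ is the class index for sample $i$.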
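A small hypothetical worked example, assuming natural logarithms: take one sample ($N = 1$, so averaging over the batch changes nothing) with $C = 3$ classes, logits chosen so their exponentials are small integers, and true label $y_0 = 2$. The softmax divides each exponentiated logit by their sum, and the loss is the negative log of the probability assigned to the true class:
$$z_0 = (0, \ln 2, \ln 5) \;\Rightarrow\; \exp(z_0) = (1, 2, 5), \qquad \textstyle\sum_k \exp(z_{0k}) = 8$$
$$p_0 = \left(\tfrac{1}{8}, \tfrac{2}{8}, \tfrac{5}{8}\right) = (0.125,\ 0.25,\ 0.625)$$
$$\mathcal{L} = -\log p_{0,2} = -\log 0.625 = \log 1.6 \approx 0.470$$
Had the label been $y_0 = 0$ instead, the loss would be $-\log 0.125 = \log 8 \approx 2.079$: the loss grows as the probability on the true class shrinks.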
Input:
- x: input tensor of shape (batch, features)
- W: weight matrix of shape (features, num_classes)
- b: bias vector of shape (num_classes,)
- y: integer class labels of shape (batch,), with values in [0, num_classes)
Output: A dict with keys "probabilities" (shape (batch, num_classes)) and "loss" (a scalar).
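A minimal NumPy sketch of a forward pass that follows this input/output contract is shown below. It assumes the loss is the batch mean and computes log-probabilities with the log-sum-exp shift, so large logits neither overflow `exp` nor lead to `log(0)`. The function name `forward` and the specific validation checks are illustrative choices, not part of the specification.

```python
import numpy as np


def forward(x: np.ndarray, W: np.ndarray, b: np.ndarray, y: np.ndarray) -> dict:
    """Softmax + cross-entropy forward pass (illustrative sketch)."""
    batch, features = x.shape
    if W.shape[0] != features:
        raise ValueError("W must have shape (features, num_classes)")
    num_classes = W.shape[1]
    if b.shape != (num_classes,):
        raise ValueError("b must have shape (num_classes,)")
    if y.shape != (batch,):
        raise ValueError("y must have shape (batch,)")
    if not np.issubdtype(y.dtype, np.integer):
        raise TypeError("y must contain integer class indices")
    if np.any(y < 0) or np.any(y >= num_classes):
        raise ValueError("labels must lie in [0, num_classes)")

    # Logits: (batch, features) @ (features, num_classes), bias broadcast over rows.
    z = x @ W + b

    # Log-softmax via log-sum-exp: subtracting each row's max leaves the
    # softmax unchanged but keeps exp() from overflowing.
    z_shifted = z - z.max(axis=1, keepdims=True)
    log_p = z_shifted - np.log(np.exp(z_shifted).sum(axis=1, keepdims=True))

    # Cross-entropy: negative log-probability of each sample's true class, averaged.
    loss = -log_p[np.arange(batch), y].mean()

    return {"probabilities": np.exp(log_p), "loss": float(loss)}
```

As a quick sanity check, all-zero `W` and `b` make every row of "probabilities" uniform, so the loss equals $\log C$ (for example, `log(3)` with three classes). Working in log space and exponentiating once at the end is a design choice: it yields both outputs from a single stable computation instead of taking the log of probabilities that may have underflowed to zero.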