Implement a cosine annealing learning rate scheduler.
The cosine annealing schedule is: $$\text{lr}(t) = \text{lr}_{min} + \frac{1}{2}(\text{lr}_{max} - \text{lr}_{min})\left(1 + \cos\left(\frac{t}{T}\pi\right)\right)$$
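where $t$ is the current step index, $T$ is the total number of steps, and $\text{lr}_{max}$ and $\text{lr}_{min}$ are the maximum and minimum learning rates.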
Compute the learning rate for each step from 0 to T-1.
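As a quick sanity check on the formula, the values at the start, the midpoint, and the last computed step can be worked out by hand. This assumes $\text{lr}_{max} > \text{lr}_{min}$, and the midpoint case assumes $T$ is even:

$$\text{lr}(0) = \text{lr}_{min} + \frac{1}{2}(\text{lr}_{max} - \text{lr}_{min})(1 + \cos 0) = \text{lr}_{max}$$

$$\text{lr}\left(\tfrac{T}{2}\right) = \text{lr}_{min} + \frac{1}{2}(\text{lr}_{max} - \text{lr}_{min})\left(1 + \cos\tfrac{\pi}{2}\right) = \frac{\text{lr}_{max} + \text{lr}_{min}}{2}$$

$$\text{lr}(T-1) = \text{lr}_{min} + \frac{1}{2}(\text{lr}_{max} - \text{lr}_{min})\left(1 + \cos\left(\tfrac{T-1}{T}\pi\right)\right) > \text{lr}_{min}$$

Because the steps run from 0 to T-1, the schedule starts exactly at lr_max but stops just short of lr_min; the minimum would only be reached at $t = T$.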
Input:
lr_max: maximum learning rate (float)
lr_min: minimum learning rate (float)
T: total number of steps (int)
Output: A 1D tensor of shape (T,) with the learning rate at each step.
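One possible solution is sketched below. It is a minimal sketch, not a reference implementation: the function name `cosine_annealing_schedule` is hypothetical, and it returns a plain Python list rather than a tensor so that it runs with the standard library alone. The conversion to a 1D tensor of shape (T,) is left to whichever tensor library is in use. Rejecting non-positive T is also an assumption, since the problem statement does not say how that case should be handled.

```python
import math
from typing import List


def cosine_annealing_schedule(lr_max: float, lr_min: float, T: int) -> List[float]:
    """Return the cosine-annealed learning rate for steps t = 0, ..., T-1.

    Hypothetical helper: returns a list of length T; wrap the result in a
    tensor constructor if a tensor of shape (T,) is required.
    """
    # Assumption: T must be positive, since the formula divides by T.
    if T <= 0:
        raise ValueError("T must be a positive integer")

    amplitude = 0.5 * (lr_max - lr_min)
    # lr(t) = lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * t / T))
    return [lr_min + amplitude * (1.0 + math.cos(math.pi * t / T)) for t in range(T)]


if __name__ == "__main__":
    for step, lr in enumerate(cosine_annealing_schedule(0.1, 0.0, 4)):
        print(f"step {step}: lr = {lr:.6f}")
```

With lr_max = 0.1, lr_min = 0.0 and T = 4, the four values are approximately 0.1, 0.0854, 0.05 and 0.0146: the schedule starts at lr_max and decreases monotonically without reaching lr_min.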