Implement a custom activation function f(x) = x * sigmoid(x) (SiLU/Swish)
and compute its gradient at given input values.
Return two quantities:
- output: f(x) = x * sigmoid(x)
- gradient: f'(x) = d/dx [x * sigmoid(x)]
The derivative is: f'(x) = sigmoid(x) + x * sigmoid(x) * (1 - sigmoid(x))
Input: A 1D tensor x.
Output: A dict with keys "output" and "gradient", each a 1D tensor of the same shape as the input.
API Reference:
- PyTorch: torch.autograd.grad
- JAX: jax.value_and_grad or jax.grad
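A minimal reference sketch of the closed-form solution, written in plain NumPy so it is self-contained (the exercise itself expects a torch or jax tensor, but the arithmetic is identical; the function name `silu_with_grad` is illustrative, not part of the problem's API):

```python
import numpy as np

def silu_with_grad(x):
    """Compute SiLU f(x) = x * sigmoid(x) and its analytic gradient.

    Returns a dict with keys "output" and "gradient", each the same
    shape as x, matching the required output format.
    """
    s = 1.0 / (1.0 + np.exp(-x))       # sigmoid(x)
    output = x * s                     # f(x) = x * sigmoid(x)
    # f'(x) = sigmoid(x) + x * sigmoid(x) * (1 - sigmoid(x))
    gradient = s + x * s * (1.0 - s)
    return {"output": output, "gradient": gradient}
```

As a sanity check, at x = 0 the sigmoid is 0.5, so the output is 0 and the gradient is 0.5; an autodiff framework (torch.autograd.grad or jax.grad) should reproduce the same values as this closed-form gradient.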