>>> import autograd.numpy as np
>>> from autograd import grad
>>> def fn(x):
...     return np.power(np.power(x, 3), 1/3)
...
>>> gradfn = grad(fn)
>>> gradfn(0.0)
/usr/local/lib/python3.6/dist-packages/autograd/numpy/numpy_vjps.py:59: RuntimeWarning: divide by zero encountered in double_scalars
lambda ans, x, y : unbroadcast_f(x, lambda g: g * y * x ** anp.where(y, y - 1, 1.)),
/usr/local/lib/python3.6/dist-packages/autograd/numpy/numpy_vjps.py:59: RuntimeWarning: invalid value encountered in double_scalars
lambda ans, x, y : unbroadcast_f(x, lambda g: g * y * x ** anp.where(y, y - 1, 1.)),
nan
… Damn.
>>> import torch
>>> x = torch.tensor(0.0, requires_grad=True)
>>> y = ((x**3) ** (1/3))
>>> y.backward()
>>> x.grad
tensor(nan)
Except that, by the definition of dual numbers, eps^2 is 0, and so is eps^3. So the slope you computed is 0, which is rather wrong for (a complicated representation of) y = x.
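To make the eps^2 = 0 point concrete, here is a minimal sketch of forward-mode differentiation with dual numbers in plain Python. The names `Dual`, `dpow` and `f` are made up for illustration and are not from autograd or PyTorch. Substituting `inf` for the diverging slope at 0 is an assumption, made to mimic what floating-point array libraries do (plain Python would raise an exception instead).

```python
import math

class Dual:
    """Hypothetical minimal dual number real + eps_coeff * eps, with eps**2 == 0."""

    def __init__(self, real, eps=0.0):
        self.real = real
        self.eps = eps

    def __mul__(self, other):
        # (a + b*eps) * (c + d*eps) = a*c + (a*d + b*c)*eps; the b*d*eps**2 term is dropped.
        return Dual(self.real * other.real,
                    self.real * other.eps + self.eps * other.real)


def dpow(x, p):
    """Power rule on a dual number: d/du u**p = p * u**(p - 1)."""
    if x.real == 0.0 and p < 1:
        # u**(p - 1) diverges at u = 0. Plain Python raises ZeroDivisionError here,
        # so inf is substituted (an assumption) to mimic floating-point array libraries.
        slope = math.inf
    else:
        slope = p * x.real ** (p - 1)
    return Dual(x.real ** p, slope * x.eps)


def f(x):
    # cuberoot(x^3), built from the two primitive operations above
    return dpow(x * x * x, 1 / 3)


at_zero = f(Dual(0.0, 1.0))  # seed eps coefficient 1 to get df/dx
at_two = f(Dual(2.0, 1.0))
print(at_zero.real, at_zero.eps)
print(at_two.real, at_two.eps)
```

At x = 0 the cube already has an eps coefficient of 0, and the cube root then multiplies it by an unbounded slope, so the result is inf * 0 = nan rather than the correct 1. Away from 0 (for example at x = 2) the same code gives a slope of about 1.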
> I don't see a problem with simplifying the exponents first.
Sure, we call that computer algebra. As soon as you start doing that, you aren't doing automatic differentiation, you are doing (at least in part) symbolic differentiation.
Ok,
f(x) = cuberoot(x^3)
f(0 + eps) = cuberoot((0 + eps)^3) = cuberoot(eps^3) = eps
i.e. the value is 0 and the slope is 1.