Sorry, I was out. Ok, f(x) = cuberoot(x^3), f(0 + eps) = cuberoot((0 + eps)^3) = ...

tntn · on June 22, 2019

    >>> import autograd.numpy as np
    >>> from autograd import grad
    >>> def fn(x):
    ...   return np.power(np.power(x, 3), 1/3)
    ...
    >>> gradfn = grad(fn)
    >>> gradfn(0.0)
    /usr/local/lib/python3.6/dist-packages/autograd/numpy/numpy_vjps.py:59: RuntimeWarning: divide by zero encountered in double_scalars
      lambda ans, x, y : unbroadcast_f(x, lambda g: g * y * x ** anp.where(y, y - 1, 1.)),
    /usr/local/lib/python3.6/dist-packages/autograd/numpy/numpy_vjps.py:59: RuntimeWarning: invalid value encountered in double_scalars
      lambda ans, x, y : unbroadcast_f(x, lambda g: g * y * x ** anp.where(y, y - 1, 1.)),
    nan

… Damn.

    >>> import torch
    >>> x = torch.tensor(0.0, requires_grad=True)
    >>> y = ((x**3) ** (1/3))
    >>> y.backward()
    >>> x.grad
    tensor(nan)

… Damn.

adamnemecek · on June 22, 2019

eps^3 is undefined

gugagore · on June 22, 2019

What is eps^3? ;) You have to turn that into some representation on the computer before you can even pass it to cuberoot.

adamnemecek · on June 22, 2019

There is no representation of that value. In this case power is defined only in the range 1-2. You need to simplify the roots first.

petschge · on June 22, 2019

Except that, by the definition of dual numbers, eps^2 is 0, as is eps^3. So the slope you computed is 0, which is rather wrong for (a complicated representation of) y=x.
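A minimal forward-mode sketch with hand-rolled dual numbers makes both halves of this concrete. The `Dual` class and the `cube`/`cbrt` helpers are hypothetical, written for illustration; this is not how autograd or PyTorch are implemented (both are used in reverse mode above). Cubing 0 + 1·eps leaves an eps part of exactly 0, as stated. A cube-root rule that applies the chain rule then multiplies that 0 by the infinite derivative of t^(1/3) at t = 0, and IEEE arithmetic makes 0 × inf a nan. That is plausibly the same product behind the divide-by-zero and invalid-value warnings and the nan in the sessions above.

```python
import math


class Dual:
    """Hypothetical dual number real + eps*e, with e**2 == 0 (forward-mode AD)."""

    def __init__(self, real, eps=0.0):
        self.real = real
        self.eps = eps

    def __mul__(self, other):
        # (a + b e)(c + d e) = ac + (ad + bc) e, because the e**2 term vanishes
        return Dual(self.real * other.real,
                    self.real * other.eps + self.eps * other.real)

    def __repr__(self):
        return f"Dual({self.real!r}, {self.eps!r})"


def cube(x):
    return x * x * x


def cbrt(x):
    """Real cube root, using the chain rule d/dt t**(1/3) = 1 / (3 * t**(2/3))."""
    r = math.copysign(abs(x.real) ** (1.0 / 3.0), x.real)
    # The derivative is +inf at t == 0. Python raises on 1 / 0.0, so the
    # IEEE result (inf) is written out explicitly.
    slope = math.inf if r == 0.0 else 1.0 / (3.0 * r * r)
    # 0.0 * inf evaluates to nan in Python, as it does under IEEE rules.
    return Dual(r, x.eps * slope)


x = Dual(0.0, 1.0)     # x = 0 + 1*e, seeding dx/dx = 1
print(cube(x))         # Dual(0.0, 0.0): the eps part is 3 * 0**2 = 0
print(cbrt(cube(x)))   # Dual(0.0, nan): the eps part is 0 * inf
```

Away from zero the same code behaves: seeding `Dual(2.0, 1.0)` gives an eps part of (approximately) 1, the slope of y = x.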

adamnemecek · on June 22, 2019

Eps^3 is undefined. I don't see a problem with simplifying the exponents first.

> (a complicated representation of) y=x.

The problem started as that. So no surprise.

tntn · on June 22, 2019

> I don't see a problem with simplifying the exponents first.

Sure, we call that computer algebra. As soon as you start doing that, you aren't doing automatic differentiation, you are doing (at least in part) symbolic differentiation.
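A toy illustration of what "simplifying the exponents first" involves. This is a minimal sketch with hypothetical `power`, `diff`, and `evaluate` helpers that operate on symbolic (coefficient, exponent) pairs rather than on numbers. Once (x^3)^(1/3) has been rewritten to x, the symbolic derivative is 1 everywhere, including at 0. The rewrite rule is the symbolic part, and it only holds under an assumption (real cube roots here). The same rule applied to (x^2)^(1/2) would give x instead of |x|.

```python
from fractions import Fraction

# Toy symbolic monomial c * x**n stored as a (c, n) pair.
# Hypothetical helpers for illustration, not a real computer-algebra library.


def power(mono, k):
    """(c * x**n)**k for c == 1: multiply the exponents.

    Assumes the real cube root, where (x**3)**(1/3) == x for every real x.
    The same rule would wrongly turn (x**2)**(1/2) into x instead of |x|.
    """
    c, n = mono
    if c != 1:
        raise ValueError("toy rule only handles a unit coefficient")
    return (c, n * k)


def diff(mono):
    """Symbolic derivative: d/dx c * x**n = (c * n) * x**(n - 1)."""
    c, n = mono
    return (c * n, n - 1)


def evaluate(mono, x):
    """Numerically evaluate c * x**n at the point x."""
    c, n = mono
    return float(c) * float(x) ** float(n)


x = (Fraction(1), Fraction(1))              # the monomial x
f = power(power(x, 3), Fraction(1, 3))      # (x**3)**(1/3) rewritten to x**1
df = diff(f)                                # 1 * x**0
print(evaluate(df, 0.0))                    # 1.0
```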

adamnemecek · on June 22, 2019

You could similarly say that a computer can't calculate x^1000/x^998 because you can't fit it in 32/64 bits.

petschge · on June 22, 2019

And computers have a hard time with that. Take the following example code:

  #include <math.h>
  #include <stdio.h>
  #include <stdlib.h>
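  double f(const double x) {
      // Algebraically x^2, but evaluated literally: at x = 0 this is 0/0 (NaN),
      // and once |x| is large enough both pow() calls overflow to inf (inf/inf).
      return pow(x, 1000) / pow(x, 998);
  }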
  int main(int argc, char* argv[]) {
      if(argc != 2) {
          fprintf(stderr, "Usage: %s x\n", argv[0]);
          exit(1);
      }
      const double x = atof(argv[1]); // demo code without error checking
      printf("x = %g, f(x) = %g\n", x, f(x));
      exit(0);
  }

compile with

  gcc -Wall -g -o test test.c -lm

and run it

  petschge@localhost:~$ ./test 0
    x = 0, f(x) = -nan
  petschge@localhost:~$ ./test 1
    x = 1, f(x) = 1
  petschge@localhost:~$ ./test 2
    x = 2, f(x) = 4

The fun thing is, when you compile with

  gcc -Wall -O3 -ffast-math -g -o test2 test.c -lm

you actually get

  petschge@localhost:~$ ./test2 0
    x = 0, f(x) = 0
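One way to separate the two issues in this exchange (the range of 64-bit floats versus the value at x = 0) is to redo the same expression with Python's exact `fractions.Fraction` arithmetic. This is a sketch with illustrative function names, not a translation of the C program. Magnitudes such as 3^1000, which would overflow a double, are no longer a problem. But x = 0 still divides 0 by 0, and only the hand-simplified x^2 gives 0 there, the value the -ffast-math build printed.

```python
from fractions import Fraction


def f_exact(x):
    """x**1000 / x**998 evaluated literally, but with exact rationals."""
    x = Fraction(x)
    return x ** 1000 / x ** 998


def f_simplified(x):
    """The same expression after cancelling the exponents by hand: x**2."""
    return Fraction(x) ** 2


print(f_exact(3))        # 9: 3**1000 is huge but exact, so nothing overflows
print(f_simplified(0))   # 0
try:
    f_exact(0)
except ZeroDivisionError as err:
    # Still 0/0; exact arithmetic reports it as an exception instead of nan.
    print("x = 0:", err)
```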

adamnemecek · on June 22, 2019

That's exactly what I'm saying.