The gradient of a scalar-valued function $f$ of a vector $\mathbf{x} = (x_1, \ldots, x_n)$ is the vector of partial derivatives of $f$ with respect to each component $x_i$:

$$
\nabla f(\mathbf{x}) = \left[ \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right]^\top
$$
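
As a minimal sketch of that definition (using `jax.grad` as the differentiation tool and a made-up two-variable function, neither of which is prescribed by the text above), the gradient is just the vector of the two partial derivatives:

```python
# A minimal sketch: gradient of f(x) = x1^2 + 3*x2, computed two ways.
import jax.numpy as jnp
from jax import grad

def f(x):
    # scalar-valued function of a 2-vector
    return x[0] ** 2 + 3.0 * x[1]

x = jnp.array([2.0, -1.0])

# Analytic partial derivatives: df/dx1 = 2*x1, df/dx2 = 3
analytic = jnp.array([2.0 * x[0], 3.0])

print(grad(f)(x))  # [4. 3.]
print(analytic)    # [4. 3.]
```
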
Wikipedia defines it in terms of a specific point $\mathbf{p}$ as “the vector field $\nabla f$ whose value at a point $\mathbf{p}$ gives the direction and the rate of fastest increase”:

$$
\nabla f(\mathbf{p}) = \left[ \frac{\partial f}{\partial x_1}(\mathbf{p}), \ldots, \frac{\partial f}{\partial x_n}(\mathbf{p}) \right]^\top
$$
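
To make the “direction and rate of fastest increase” reading concrete, here is a small check (the function and point are arbitrary choices for illustration). The directional derivative at $\mathbf{p}$ along a unit vector $\mathbf{d}$ is $\nabla f(\mathbf{p}) \cdot \mathbf{d}$, so no sampled unit direction can beat the gradient's own norm:

```python
# Sketch: the directional derivative at p along a unit vector d is
# grad_f(p) . d, so it is maximized when d points along the gradient,
# and the maximum rate equals the gradient's norm.
import numpy as np
import jax.numpy as jnp
from jax import grad

def f(x):
    return x[0] ** 2 + jnp.sin(x[1])

p = jnp.array([1.0, 0.5])
g = grad(f)(p)

rng = np.random.default_rng(0)
best = -np.inf
for _ in range(1000):
    d = rng.normal(size=2)
    d = d / np.linalg.norm(d)               # random unit direction
    best = max(best, float(jnp.dot(g, d)))  # directional derivative g . d

print(best)                       # never exceeds ||g||
print(float(jnp.linalg.norm(g)))  # ||g||: the rate of fastest increase
```
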
Intuitively, it can be understood as an $n$-dimensional surface whose contours reflect the effect on $f$ of a change in any one input value. It is one of the most important operations in computational optimization, as it forms the basis of gradient descent.
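
Here is a minimal gradient-descent sketch on a toy quadratic loss; the loss, learning rate, and iteration count are arbitrary illustration choices, not anything fixed by the discussion above:

```python
# A minimal gradient-descent sketch: repeatedly step against the gradient.
import jax.numpy as jnp
from jax import grad

def loss(x):
    # convex bowl with its minimum at (3, -2)
    return (x[0] - 3.0) ** 2 + (x[1] + 2.0) ** 2

x = jnp.zeros(2)
lr = 0.1
for _ in range(100):
    x = x - lr * grad(loss)(x)  # step against the gradient (fastest decrease)

print(x)  # approaches [ 3. -2.]
```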

Note that the same premise extends to scalar-valued functions of tensor inputs. (In this case, the gradient is also a tensor.)
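
A quick illustration of the tensor case (the function and matrix below are made up for the example): the gradient of a scalar-valued function of a $2 \times 3$ matrix is itself a $2 \times 3$ matrix:

```python
# Sketch: a scalar-valued function of a matrix has a matrix-shaped gradient.
import jax.numpy as jnp
from jax import grad

def f(W):
    # scalar output: the sum of squared entries
    return jnp.sum(W ** 2)

W = jnp.arange(6.0).reshape(2, 3)
g = grad(f)(W)

print(g.shape)                   # (2, 3) -- same shape as W
print(jnp.allclose(g, 2.0 * W))  # True: each element is df/dW_ij = 2 * W_ij
```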

Notation abuse warning!

Be aware that, in the neural network literature, gradients are often written as partial derivatives. To be absolutely crystal clear, this is an abuse of notation. Partial derivatives are always scalar-valued. The gradient of a function $f$ with respect to some tensor $\mathbf{W}$ is another tensor with the same shape as $\mathbf{W}$, whose elements are the partial derivatives of $f$ with respect to the corresponding elements of $\mathbf{W}$.
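
To spell out what that shorthand hides, the sketch below (with a made-up function and weight matrix) compares one entry of the gradient tensor against the corresponding scalar partial derivative, estimated by a central finite difference:

```python
# What the partial-derivative shorthand really denotes: a tensor whose (i, j)
# entry is the scalar partial derivative of f with respect to W[i, j]. Here
# one entry of the gradient tensor is checked against a central difference.
import jax.numpy as jnp
from jax import grad

def f(W):
    return jnp.sum(jnp.tanh(W))

W = jnp.array([[0.1, -0.3], [0.7, 0.2]])
g = grad(f)(W)  # the full gradient tensor, same shape as W

# Scalar partial derivative with respect to the single element W[1, 0],
# estimated with a central difference.
eps = 1e-3
E = jnp.zeros_like(W).at[1, 0].set(eps)
partial_10 = (f(W + E) - f(W - E)) / (2 * eps)

print(g[1, 0])     # analytic: 1 - tanh(0.7)^2, about 0.635
print(partial_10)  # finite-difference estimate of the same scalar
```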