gradient
In the context of neural networks, a gradient is a vector that represents the direction and rate of the fastest increase of a function. More specifically, it is the collection of partial derivatives of a function with respect to all its variables. In simpler terms, for a function that takes multiple inputs, the gradient tells us how much the output of the function changes if we slightly adjust any of the inputs. This concept is crucial in training neural networks through a process called gradient descent, where the gradient is used to adjust the network’s parameters (weights) in the opposite direction of the gradient to minimize the error or loss function. This iterative adjustment helps the network to learn from the data and improve its predictions[1][2][3].
Citations:
[1] https://machinelearningmastery.com/gradient-in-machine-learning/
[2] https://web.stanford.edu/class/cs224n/readings/gradient-notes.pdf
[3] https://www.ibm.com/topics/gradient-descent
[4] https://stats.stackexchange.com/questions/181629/why-use-gradient-descent-with-neural-networks
[5] https://youtube.com/watch?v=IHZwWFHWa-w
[6] https://towardsdatascience.com/calculating-gradient-descent-manually-6d9bee09aa0b