More formal treatment of multivariable chain rule

Khan Academy

20 May 2016 · 11:55

Educational, Learning

TLDR: This video presents the formal argument behind the multivariable chain rule, addressing the looseness of earlier, more intuitive explanations. The host considers a vector-valued function 'v' and a scalar-valued function 'f' and shows how differentiating their composition calls for the gradient of 'f' and the vector-valued derivative of 'v'. Starting from the formal limit definition of a derivative, the video introduces the vector-valued derivative, connects it to the directional derivative, and arrives at the multivariable chain rule in a rigorous manner.

Takeaways

📚 The video discusses the formal argument behind the multivariable chain rule, providing a more rigorous justification for concepts previously explained in a more intuitive manner.
🔍 The script clarifies that the multivariable chain rule is applicable when dealing with a composition of functions where the output is a real number, despite the intermediate steps involving multi-dimensional spaces.
📉 The video explains the use of the derivative's formal definition, which involves taking a limit as a variable approaches zero, in this case, represented by 'h' or 'dt'.
📈 The concept of 'dv' is introduced as the change in the vector-valued function 'v' when the input 't' is nudged by 'dt', highlighting the role of the vector-valued derivative in this process.
📝 The script emphasizes the importance of considering the error term 'E(h)' in the formal definition of derivatives, which approaches zero as 'h' approaches zero.
🧩 The video uses the concept of 'little o of h' to represent terms that shrink to zero faster than 'h' as 'h' approaches zero, which is crucial for the formal manipulation of the multivariable chain rule.
🔄 The process involves rewriting the expression for 'v(t+h)' to include the derivative term and an error term, which helps in understanding the change in the vector 'v' when 't' is nudged.
📐 The script connects the formal definition of the derivative to the directional derivative, showing that the multivariable chain rule is the directional derivative of 'f' at 'v(t)' in the direction of the vector-valued derivative of 'v'.
📘 The final expression for the multivariable chain rule is the gradient of 'f' evaluated at 'v(t)', taken as a dot product with the vector-valued derivative of 'v' at 't'.
🔑 The video concludes by affirming that the multivariable chain rule aligns with the intuitive understanding of nudging inputs and observing the resulting changes in outputs.
🚀 Mention is made of a more general multivariable chain rule for vector-valued functions, which will be discussed in future content, indicating a broader application of the concept.
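
Written out in symbols, the final expression described in the takeaways reads roughly as follows. The notation is an assumption: the summary names only 'v', 'f', and 't', so the prime, the nabla, and the dimension n are conventional choices and may not match what is written on screen in the video.

```latex
\frac{d}{dt}\, f\bigl(\vec{v}(t)\bigr)
  \;=\; \nabla f\bigl(\vec{v}(t)\bigr) \cdot \vec{v}\,'(t),
\qquad
\vec{v} : \mathbb{R} \to \mathbb{R}^{n}, \quad
f : \mathbb{R}^{n} \to \mathbb{R}.
```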

Q & A

What is the main topic of the video?
-The main topic of the video is the formal argument behind the multivariable chain rule in calculus.
Why might some viewers find the initial explanation of the multivariable chain rule to be hand-wavy?
-Some viewers might find the initial explanation hand-wavy because it involves treating derivatives and differential operators in a way that resembles canceling out fractions, which is not mathematically rigorous.
What does the video aim to clarify about the multivariable chain rule?
-The video aims to clarify the formal mathematical argument behind the multivariable chain rule, providing a more rigorous understanding that aligns with the intuitive explanations given in previous videos.
What is the role of the vector-valued function v in the context of the multivariable chain rule?
-The vector-valued function v takes an input t from a number line and maps it to a high-dimensional space, serving as an intermediary step in the composition function before it is mapped onto the number line by function f.
How is the function f related to the vector-valued function v in the composition?
-Function f takes the output of the vector-valued function v, which is in a multi-dimensional space, and maps it back onto the number line, resulting in a single-variable function.
What is the significance of the term 'dt' in the video?
-The term 'dt' represents an infinitesimal change in the input variable t, and it is used in the context of derivatives and differentials in the explanation of the multivariable chain rule.
What is the formal definition of a derivative in the context of the video?
-The formal definition of a derivative in the video is a limit: the change in the output of the composition when the input t is nudged by a small amount h (standing in for dt), divided by h, as h approaches zero.
How does the video connect the intuitive understanding of the multivariable chain rule to its formal definition?
-The video connects the intuitive understanding by considering the nudge in the input and its effect on the output, and then formalizes this by using the definitions of derivatives and vector-valued derivatives, ultimately leading to the definition of the directional derivative.
What is the 'error function' E(h) mentioned in the video?
-The 'error function' E(h) is the gap between the difference quotient of v for a nudge of size h and the actual derivative of v at t. It goes to zero as h approaches zero, so after multiplying through by h, the mismatch between the true change in v and its linear approximation is h times E(h).
What is the significance of the 'little o of h' notation used in the video?
-The 'little o of h' notation is used to represent a function that shrinks to zero faster than h as h approaches zero. It is a way to express the error term in the derivative definition, which becomes negligible in the limit.
How does the video explain the connection between the multivariable chain rule and the directional derivative?
-The video explains that the multivariable chain rule can be understood as the directional derivative of function f in the direction of the vector-valued derivative of v, evaluated at the point v(t).
What is the final expression for the multivariable chain rule given in the video?
-The final expression for the multivariable chain rule is the gradient of f evaluated at v(t), taken as a dot product with the vector-valued derivative of v at t.
Is there a more general multivariable chain rule for vector-valued functions?
-Yes, the video mentions that there is a more general multivariable chain rule for vector-valued functions, which will be discussed in a future video.
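
The answers above can be checked numerically. The following is a minimal sketch, not something shown in the video: the particular functions `v` (a point moving around the unit circle) and `f` (x²y) are hypothetical examples chosen for illustration, and their derivatives are written out by hand. It compares the difference quotient from the formal definition with the gradient-dot-derivative expression.

```python
import math

def v(t):
    # Hypothetical inner function: maps a real number t to a point in 2D.
    return (math.cos(t), math.sin(t))

def v_prime(t):
    # Vector-valued derivative of v, computed by hand.
    return (-math.sin(t), math.cos(t))

def f(x, y):
    # Hypothetical outer function: maps a 2D point back to a real number.
    return x * x * y

def grad_f(x, y):
    # Gradient of f, computed by hand: (df/dx, df/dy).
    return (2 * x * y, x * x)

def chain_rule(t):
    # Gradient of f at v(t), dotted with v'(t).
    gx, gy = grad_f(*v(t))
    dx, dy = v_prime(t)
    return gx * dx + gy * dy

def difference_quotient(t, h):
    # (f(v(t+h)) - f(v(t))) / h, the expression inside the limit.
    return (f(*v(t + h)) - f(*v(t))) / h

t = 0.7
for h in (1e-1, 1e-3, 1e-5):
    print(h, difference_quotient(t, h), chain_rule(t))
```

As h shrinks, the printed difference quotient should move toward the chain-rule value, which is what the limit definition predicts.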

Outlines

00:00

📚 Introduction to Multivariable Chain Rule

This section introduces the multivariable chain rule and addresses possible confusion from the earlier, informal explanation. The speaker notes that although the intuitive approach can look loose, it lines up closely with the formal argument. The setup involves a vector-valued function 'v' that maps a real number 't' into a high-dimensional space, and a function 'f' that maps that space back to a real number. The composition is therefore an ordinary single-variable function, and the goal is its ordinary derivative; because the composition passes through a multi-dimensional space, the answer involves the gradient of 'f' and the vector-valued derivative of 'v'. The formal definition of a derivative is introduced, with 'h' standing in for the small change 'dt', and the process of differentiating the composition is outlined.
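
In symbols, the starting point described here would look something like the following. This is a sketch in conventional notation (arrows over 'v' are an assumption, not taken from the video):

```latex
\frac{d}{dt}\, f\bigl(\vec{v}(t)\bigr)
  \;=\; \lim_{h \to 0}
  \frac{f\bigl(\vec{v}(t+h)\bigr) - f\bigl(\vec{v}(t)\bigr)}{h}
```

Here h plays the role of the nudge dt, and the numerator is the resulting change in the output of the composition.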

05:00

🔍 Formalizing the Multivariable Chain Rule

The speaker works through the formal argument behind the multivariable chain rule, starting with the vector-valued derivative of 'v'. The limit definition of that derivative is rewritten to include an error term 'E(h)' that approaches zero as 'h' (or 'dt') approaches zero. The 'little o of h' convention is introduced for terms that shrink to zero faster than 'h'. The section then ties this formalism back to the original intuition of nudging the input and observing the change in the output, arriving at an expression for 'v(t+h)' as the original value plus a derivative term and an error term. It follows that the change in the output of 'f' caused by the change in 'v' is governed by the directional derivative of 'f' in the direction of the vector-valued derivative of 'v'.
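
The rewriting step described in this section can be sketched as two lines. The vector arrows and the exact placement of E(h) are assumptions based on the summary rather than a transcription of the video:

```latex
\begin{aligned}
\frac{\vec{v}(t+h) - \vec{v}(t)}{h}
  &= \vec{v}\,'(t) + \vec{E}(h),
  \qquad \vec{E}(h) \to \vec{0} \ \text{as}\ h \to 0, \\[4pt]
\vec{v}(t+h)
  &= \vec{v}(t) + h\,\vec{v}\,'(t) + h\,\vec{E}(h)
   = \vec{v}(t) + h\,\vec{v}\,'(t) + o(h).
\end{aligned}
```

The second line is the first multiplied through by h and rearranged; the product h E(h) qualifies as o(h) because dividing it by h leaves E(h), which goes to zero.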

10:00

📘 Deriving the Multivariable Chain Rule

The final section completes the derivation of the multivariable chain rule by substituting the expression for 'v(t+h)' back into the original definition of the derivative of the composition. As 'h' approaches zero, the 'o(h)' term becomes negligible and can be dropped from the limit. What remains is exactly the definition of the directional derivative of 'f' at 'v(t)' in the direction of the vector-valued derivative of 'v'. The section reinforces the connection between the formal derivation and the initial intuition about nudging, and it highlights the roles of the gradient of 'f' and the vector-valued derivative of 'v' in the multivariable chain rule. The speaker also mentions a more general form of the multivariable chain rule for vector-valued functions, which will be discussed in future content.
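
The substitution and limit described in this section can be laid out as a short chain of equalities. This is a sketch under the assumption that f is differentiable at v(t), which is what justifies dropping the o(h) term and rewriting the directional derivative as a dot product; the subscripted nabla for the directional derivative is one common convention and may differ from the video's.

```latex
\begin{aligned}
\frac{d}{dt}\, f\bigl(\vec{v}(t)\bigr)
  &= \lim_{h \to 0}
     \frac{f\bigl(\vec{v}(t) + h\,\vec{v}\,'(t) + o(h)\bigr) - f\bigl(\vec{v}(t)\bigr)}{h} \\[4pt]
  &= \lim_{h \to 0}
     \frac{f\bigl(\vec{v}(t) + h\,\vec{v}\,'(t)\bigr) - f\bigl(\vec{v}(t)\bigr)}{h} \\[4pt]
  &= \nabla_{\vec{v}\,'(t)} f\bigl(\vec{v}(t)\bigr)
   = \nabla f\bigl(\vec{v}(t)\bigr) \cdot \vec{v}\,'(t).
\end{aligned}
```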

Keywords

💡Multivariable Chain Rule

The multivariable chain rule is a fundamental concept in calculus for differentiating a composition of functions when the composition passes through a multi-dimensional space. In the context of the video, it is used to differentiate a function of a function where the inner function is vector-valued and the outer function maps to the real numbers. The video script discusses the formal argument behind this rule, emphasizing its importance in understanding how changes in the input affect the output through a multi-dimensional intermediary space.

💡Vector-Valued Function

A vector-valued function is a function that maps its input to a vector in a multi-dimensional space rather than a single scalar value. In the video, the script describes v as a vector-valued function that takes a real number t as input and maps it to a high-dimensional space, which could be two, three, or even 100 dimensions. This concept is central to understanding the multivariable chain rule as it deals with the changes in the vector space that result from an input change.

💡Derivative

The derivative is a measure of how a function changes as its input changes. In the video, the derivative is discussed in the context of both single-variable and multivariable functions. The script explains the formal definition of a derivative as a limit, which is used to define the rate of change of a function at a specific point. The concept of the derivative is essential for understanding the chain rule, as it is the tool used to measure the sensitivity of the output to changes in the input.

💡Limit

In calculus, a limit is the value that a function or sequence approaches as the input approaches some value. The script mentions the limit as the foundation for defining derivatives, where the derivative is the limit of the function's change over an infinitesimally small change in the input, often denoted as h or dt. The concept of the limit is crucial for the formal definition of the multivariable chain rule, as it allows for the precise mathematical expression of the rate of change.

💡Differential Operator

A differential operator is a mathematical operator that acts on functions to produce their derivatives. The script mentions that the earlier, informal explanation treated derivatives and differential operators as if they were fractions whose terms cancel, which is not mathematically rigorous. The video aims to show that, although this treatment is loose, the intuition behind it aligns well with the formal argument for the multivariable chain rule.

💡Directional Derivative

The directional derivative generalizes the ordinary derivative to functions of several variables. It measures the rate of change of a scalar-valued function as its input moves in a particular direction in the domain. In the video, the script connects the intuition of the multivariable chain rule to the formal definition of the directional derivative, which is the derivative of the function in the direction of a given vector, in this case the vector-valued derivative of v.

💡Gradient

The gradient is the vector of partial derivatives of a scalar-valued function with respect to its variables. In the context of the video, the gradient appears in the multivariable chain rule, where it is used to find the rate of change of a function in the direction of a vector. The script explains that the gradient of the outer function f is combined with the vector-valued derivative of the inner function v to find the overall rate of change.

💡Taylor Polynomial

A Taylor polynomial is a polynomial approximation of a function near a given point, using the derivatives of the function at that point. In the script, the video uses the analogy of a Taylor polynomial to describe the first-order term in the expansion of v(t+h), which represents the change in the vector-valued function v due to a small change in t, represented by h.

💡Little o Notation

Little o notation is used in mathematics to describe the limiting behavior of a function. In the video, the script uses little o of h to represent a function that shrinks to zero faster than h as h approaches zero. This notation accounts for the error term that arises when the definition of the vector-valued derivative is rearranged, showing that it becomes negligible as h gets smaller.
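
To make "shrinks to zero faster than h" concrete, here is a small numerical sketch. The function `v` below is a hypothetical example (a point on the unit circle), not one taken from the video. The code measures the leftover after subtracting the first-order term from v(t+h) and divides its length by h; for an o(h) term that ratio should itself head toward zero.

```python
import math

def v(t):
    # Hypothetical vector-valued function: a point on the unit circle.
    return (math.cos(t), math.sin(t))

def v_prime(t):
    # Derivative of v, computed by hand.
    return (-math.sin(t), math.cos(t))

def remainder_norm(t, h):
    # Length of v(t+h) - v(t) - h * v'(t), the part not captured
    # by the first-order (derivative) term.
    x1, y1 = v(t + h)
    x0, y0 = v(t)
    dx, dy = v_prime(t)
    return math.hypot(x1 - x0 - h * dx, y1 - y0 - h * dy)

t = 1.0
steps = (1e-1, 1e-2, 1e-3, 1e-4)
ratios = [remainder_norm(t, h) / h for h in steps]
for h, r in zip(steps, ratios):
    print(h, r)
```

For this example the printed ratios get smaller with each step, so the remainder is vanishing faster than h itself.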

💡Dot Product

The dot product is an algebraic operation that takes two equal-length sequences of numbers and returns a single number. In the video, the script refers to the dot product in the context of the directional derivative: the gradient of the function f, evaluated at the point v(t), is dotted with the vector-valued derivative of v, giving the directional derivative of f in the direction of that derivative.

Highlights

Introduction to the optional video on the multivariable chain rule, addressing potential confusion from previous explanations.

Discussion on the informal approach to the multivariable chain rule, which involved treating derivatives like fractions.

Explanation of the formal argument behind the multivariable chain rule, emphasizing its alignment with intuitive understanding.

Setup of the problem with v as a vector-valued function and f as a function mapping to the number line.

Clarification that the composition function results in a single-variable function despite its multi-dimensional intermediate steps.

Formal definition of a derivative as a limit, with h used as a placeholder for dt.

Description of the change in the function f when the input is nudged by h, relating it to the original value.

Intuitive approach to the multivariable chain rule, considering the change dv in the intermediary space.

Introduction of the vector-valued derivative of v, and its definition through a limit.

Manipulation of the vector-valued derivative definition to include an error function E(h) that approaches zero.

Use of the little o notation to represent terms that shrink faster than h as h approaches zero.

Expression of v(t+h) in terms of the original value, derivative term, and an error term.

Application of the vector-valued derivative to the original definition of the ordinary derivative of the composition function.

Identification of the limit in the derivative definition and its simplification by ignoring the o(h) term.

Reveal of the multivariable chain rule as the directional derivative of f, at v(t), in the direction of the vector-valued derivative of v.

Connection of the formal reasoning to the initial intuitive approach, showing consistency.

Mention of a more general multivariable chain rule for vector-valued functions and its future discussion.

Conclusion summarizing the multivariable chain rule for compositions that take a real number to a real number through a multi-dimensional space, and a teaser for the next video.
