Big Ideas of Calculus
As odd as this sounds, what we now refer to as calculus (or as real analysis if we're British) was independently invented by Isaac Newton in England and G.W. Leibniz in Germany during the 1600s. It would be hard to overstate its importance. Basically everything we know about the world from the physical sciences was only made possible thanks to calculus.
The two big ideas of calculus are:
- An instantaneous rate of change
- Linking quantities with their instantaneous rate of change
These two key ideas may seem like small potatoes and of little practical relevance to ordinary daily life. Maybe so, but you wouldn’t be able to reliably land a rocket on the moon without a lot of calculus. Nor for that matter, as we’ll soon see, can you really understand bonds or options without calculus.
Derivatives
The price of gas at my local gas station varies across time. In math language we would say that the price of gas, a quantity expressed in dollars and cents, varies as a function of time. Given a finite interval of time, the rate of change of that quantity is the ratio between the amount of change and the length of the time interval. For example, the price of gasoline might change by thirteen cents over five days.
Graphically, this rate of change is the slope of the secant line connecting the prices at the start and the end of my time interval on the curve mapping all of the price movements across the days. If I modify the time interval, this will generally also modify the rate of change.
What happens when the length of that time interval gets smaller and smaller? Calculus made the concept of infinitely small quantities precise with the notion of limit. If the rate of change can get arbitrarily close to a definite number by making the time interval incredibly teeny tiny, which in math language is called approaching zero, that number is the instantaneous rate of change. The instantaneous rate of change is the limit of the rate of change when the length of the interval gets infinitely small. This limit is referred to as the derivative of a function. Graphically, the derivative is the steepness of the tangent to a curve.
By using this definition and following the computational rules presented below, you can calculate the instantaneous rate of change for a wide range of functions with a step-by-step procedure. This computational process is known as differentiation.
Integrals
Differentiation tells us the steepness of the tangent line to a curve at one specific point. Its inverse operation, integration, tells us the area below that curve.
The reasoning underlying integration is similar to what we do with differentiation. We calculate the area under a curve by drawing imaginary rectangles underneath the curve. We do this because we already know how to compute the area of a rectangle. We multiply its length times its width. Do that for each rectangle and sum them altogether.
As each individual rectangle becomes smaller, this approximation of the curve becomes increasingly accurate. As the rectangles become incredibly teeny tiny, their width approaching zero, we take the limit of these sums to derive the integral.
If you integrate the derivative of a function, you get the original function itself. This was a key insight: integration and differentiation are inverse operations, like division and multiplication in arithmetic.
Why Is Calculus Useful?
An even more important insight was the discovery of differential equations. Differential equations link a quantity with its various instantaneous rates of change.
Why is that useful? Well, if you already know some of the initial values of a system, then you can precisely know where that system will be at a future time. This is why physics and engineering benefit so much from calculus.
Think of the problem of trying to land a rocket on the moon. If you know the differential equations for the motion of objects as they move through space, it becomes possible to predict the motion of any projectile once you know its initial position and speed. The reason why physics has been so successful at explaining the physical reality of our world, and why engineering has been so successful at inventing things like spaceships, is largely thanks to calculus in general and specifically to being able to express physical laws as differential equations.
Problem of Uncertainty
The reason why the social science of economics and the applied science of finance have not similarly been able to leverage the powers of calculus and its differential equations is that, unlike the laws of physics, human behavior cannot be realistically expressed in terms of differential equations. Economic laws, such as they are, contain an enormous amount of uncertainty.
The way that we deal with the problem of uncertainty in economics and finance is to approximate things as best we can through the framework of probability. Probability is not really a feature inherent to anything. Instead, probability is a useful framework for making our predictions more likely to be correct than incorrect.
There are three ways that we can work with probability in economics and finance.
- Probability distributions can be used within differential equations instead of stationary values to simulate bounded uncertainty. This falls within the framework of ordinary calculus.
- Randomness can be represented through direct relationships among stochastic processes. This falls within the framework of stochastic calculus.
- Vast amounts of observations of prior behavior can be modeled through unsupervised machine learning in ways we don’t fully understand that can usefully predict future economic behaviors. This falls within the framework of data science.
Sets and Set Operations
The concept of a set is fundamental within both calculus and probability. A set is a group of elements. Conventionally, sets are denoted by Latin or Greek capital letters:
\[A, B, C, \Omega, …\] and elements using lowercase Latin or Greek letters:
\[a, b, c, \omega, …\] Don't panic! Set notation is notoriously confusing, especially at first glance. There's nothing inherently complicated about sets or set operations. It often helps clarify things if you verbally translate the crazy symbols of set notation into plain English.
Proper Subsets
An element t of a set M is written as:
\[t \in M\]If every element within set C is also contained within D, we write:\[C \subseteq D\]If there are additionally no elements within D that are not also within C, meaning the sets are equal, then we write: \[C = D\] If, however, C is a proper subset of D, meaning that set D contains elements beyond those in set C, then we write: \[C \subset D\]
Additional useful math notation for working with sets:
- The symbol \[\forall\] is the universal quantifier and means “for all”, “for every”, or “for any element.”
- The symbol \[\exists\] means “there exists”
- The symbol \[\Longrightarrow\] means “implies”
Empty Sets
Unsurprisingly, an empty set is a set that contains no elements. Its symbol is:\[\emptyset\]
Given a subset B of a set A, the complement of B with respect to A is all those elements of A that do not belong to B. In math notation we would write the complement of B as:\[B^C\]
Union of Sets
A union of two sets is the combination of all the elements of both sets. We notate that set C is the union of sets A and B:\[C = A \cup B\]
Intersection of Sets
The intersection of two sets contains all elements contained within both of those sets. We notate that set C is the intersection of sets A and B:\[C = A \cap B\]
Elementary Properties of Sets
Let’s imagine that we have a set T that includes all the elements we currently care about. That set would be called the “total set.” Given this, we can confidently declare the following three properties:
- The complement of the total set is the empty set, and the complement of the empty set is the total set: \[T^C = \emptyset, \quad \emptyset^C = T\]
- If sets A,B,C are subsets of T, then the distributive properties of union and intersection hold:\[A \cup (B \cap C) = (A \cup B) \cap (A \cup C)\]\[A \cap (B \cup C) = (A \cap B) \cup (A \cap C)\]
- The complement of the union is the intersection of the complements and the complement of the intersection is the union of the complements (these are De Morgan's laws):\[(B \cup C)^C = B^C \cap C^C\]\[(B \cap C)^C = B^C \cup C^C\]
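These properties are easy to check concretely. Here is a minimal sketch using Python's built-in set type; the total set T and the subsets A, B, C are made-up examples:

```python
# Verify the elementary set properties with Python's built-in set type.
T = set(range(10))                  # the total set: {0, 1, ..., 9}
A, B, C = {1, 2, 3}, {3, 4, 5}, {5, 6, 7}

def complement(s):
    """Complement of s with respect to the total set T."""
    return T - s

print(B | C)    # union: {3, 4, 5, 6, 7}
print(B & C)    # intersection: {5}

# Distributive properties
assert A | (B & C) == (A | B) & (A | C)
assert A & (B | C) == (A & B) | (A & C)

# De Morgan: the complement of the union is the intersection of the complements
assert complement(B | C) == complement(B) & complement(C)
assert complement(B & C) == complement(B) | complement(C)
```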
Distances and Quantities
The connection between sets and calculus is that calculus describes the dynamics (change) of quantitative phenomena (things). To make this work in practice, we need a way to denote the changes we observe in a variable over time, and that way must keep the changes in one variable separate from changes in other variables.
Here’s an example. Mom wants to keep track of her son Jimmy’s height and weight during his childhood. So on the first day of every month, she measures both his height and weight. How might she record these measurements? Well if mom were a mathematician or a scientist, she would record these observations using n-dimensional vectors.
N-Dimensional Vectors
Mom has two vectors: Height and Weight that we will denote as H and W respectively. This notation system works by assigning subscripts to the observations of H and W from Month 1 and onwards as: \[H_1, H_2, H_3, … \] and \[ W_1, W_2, W_3, …\]
An n-tuple is another name for an n-dimensional vector. Tuples can also be written as a series of observations separated by a comma or space within a bracket, like so: \[Jimmy_H = \begin{bmatrix}22 & 24 & 25 & 27 & 30 & 31 & 34 & 36\end{bmatrix}\]
You can perform operations on these vectors. A common example of this from the world of finance is to compute the trailing twelve month tracking error of a portfolio of investments. The first step in that computation is to calculate the monthly difference between the portfolio’s return and that of a benchmark index like the S&P 500. In order, each of the twelve elements of the S&P 500’s 12-tuple is subtracted from the correspondingly ordered element of the portfolio 12-tuple.
\[r_{port} - r_{SP500} \] \[r_{port} = \begin{bmatrix} 1.10 & 1.37 & 2.95 & 5.78 & 0.51 & 7.32 & 7.13 & 1.47 & 9.54 & 7.32 & 6.19 & -4.92 \end{bmatrix}\] \[r_{SP500} = \begin{bmatrix} -1.46 & 1.93 & 3.76 & 6.06 & 0.74 & 7.09 & 7.80 & 0.66 & 10.87 & 8.80 & 5.89 & -5.88 \end{bmatrix}\] \[r_{port} - r_{SP500} = \begin{bmatrix} 2.56 & -0.56 & -0.81 & -0.28 & -0.23 & 0.23 & -0.67 & 0.81 & -1.33 & -1.48 & 0.30 & 0.96 \end{bmatrix}\]
In addition to performing operations between tuples, you can also perform operations on a single tuple. For example, to calculate the geometric mean return of this portfolio, add 1 to every component of the portfolio tuple (expressed as a decimal), multiply the 12 components together, take the 12th root of that product, and subtract 1. Equivalently, take the natural logarithm of each of these 12 components, average the logarithmic returns arithmetically, and exponentiate the result.
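Here is a minimal sketch of both tuple operations in Python, using the made-up return figures from the example above (returns are in percent):

```python
import math

# Monthly returns in percent, from the example above.
r_port  = [1.10, 1.37, 2.95, 5.78, 0.51, 7.32, 7.13, 1.47, 9.54, 7.32, 6.19, -4.92]
r_sp500 = [-1.46, 1.93, 3.76, 6.06, 0.74, 7.09, 7.80, 0.66, 10.87, 8.80, 5.89, -5.88]

# Element-wise difference between the two 12-tuples.
diffs = [p - b for p, b in zip(r_port, r_sp500)]
print([round(d, 2) for d in diffs])

# Geometric mean monthly return: add 1 to each return (as a decimal),
# multiply the 12 components together, and take the 12th root.
growth = [1 + r / 100 for r in r_port]
geo_mean = math.prod(growth) ** (1 / 12) - 1
print(f"geometric mean monthly return: {geo_mean:.4%}")
```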
Distance
Let’s imagine the set of all real numbers arrayed along a line. This is known as the real line.
Remember the difference between rational and irrational numbers? A rational number can be expressed as a fraction (aka ratio) of integers. An irrational number can’t be expressed as a fraction of integers. The most famous irrational numbers are \[\pi \approx 3.14159…\]\[e \approx 2.71828…\]
On the real line, distance is the absolute value of the difference between two numbers: \[d = \vert a - b \vert = \sqrt{(a-b)^2}\] More formally, the Euclidean distance between any two n-dimensional points a and b is: \[d \left[ \left( a_1, a_2, …, a_n \right) , \left( b_1, b_2, …, b_n \right) \right] = \sqrt{\sum_{i=1}^{n} \left( a_i - b_i \right) ^2 }\]
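Both distance formulas are one-liners in Python; the points below are arbitrary examples, and the standard library's math.dist computes exactly this Euclidean distance:

```python
import math

# On the real line, distance is the absolute value of the difference.
a, b = 3.5, -1.5
print(abs(a - b))       # 5.0

# Euclidean distance between two n-dimensional points.
p = (1.0, 2.0, 3.0)
q = (4.0, 6.0, 3.0)
print(math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q))))   # 5.0
print(math.dist(p, q))  # same result via the standard library
```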
The least upper bound of a set is the smallest number that is greater than or equal to every element of the set. This least upper bound is known as the supremum. If the supremum belongs to the set, then it is called the set's maximum.
The infimum is the greatest lower bound of a set. No number contained in the set is less than the infimum. When the infimum belongs to the set, it is called the minimum.
Density of Points
The density of points is another key point of connection between set theory and calculus. The primary distinction to keep in mind regarding the density of points is between discrete and continuous quantities.
Discrete quantities are values that have a finite distance between them. Integers like 1 and 2 are a good example of discrete quantities. Between the integers 1 and 2 is a distance exactly equal to 1; between the integers 1 and 9, a distance exactly equal to 8.
Continuous quantities, however, have distances between them that contain every possible intermediate value. In the abstract this may seem weird and confusing. Think about the passing of time between any two times. In between these two times is every possible moment between them, including every infinitesimally small instant of time that passes between those two times. There are no gaps in-between these two times that time does not pass through. Whereas with discrete values, we can step over and ignore 1.2 and 1.538916204 on our way from 1 to 2, with continuous values we cannot ignore any of the infinite values separating 1pm from 2pm.
Functions
The math conception of a function describes a relationship between two quantities. A function maps the elements of a set A onto the elements of a set B. Set A is called the function’s domain. Set B is the function’s range.
These set elements can be numbers, but they don't have to be. When they are numbers, the function is said to be a real function. In practice, calculus as used in finance deals primarily with numbers.
A composite function is a function of a function. The following example has the function h as a composite function of the functions g and f. The function g itself takes in the range of function f as its domain. \[h(x) = g[f(x)]\]
Variables
A variable is a symbol that represents any element within a given set. For example, if we declare the variable Y to mean years, this symbol Y can now mean any possible year within our set.
What is our set? Whatever we define it to be. It could be the years ending in 0 between 1870 and 1970 (i.e., 1870, 1880, 1890, …) as in the case of my research on the growth of transaction costs within the US economy between 1870 and 1970.
In standard notation, we denote a specific element within our variable’s set by using a subscript to that variable to denote the element’s ordered position within the set. For example, the third element within variable Y’s set of decadal years from my aforementioned research would then be: \[Y_3 = 1890\]
Limits
The concept of the limit is a key idea in calculus. Consider the function \[f(x) = 3x + 1\] Think about what happens to the range of this function as the domain values get closer and closer to 1. As can be seen in the table below, whether we approach 1 from the left-hand or the right-hand side of the number line, the function’s value gets closer and closer to 4.
| x | f(x) |
|--------|--------|
| 0.9 | 3.7 |
| 0.99 | 3.97 |
| 0.999 | 3.997 |
| 1.0001 | 4.0003 |
| 1.001 | 4.003 |
| 1.01 | 4.03 |
In math language we would say: Let c = 1. The limit of the function f as x approaches c is 4. In math notation we would write: \[\lim_{x\rightarrow c} f(x) = 4\]
If the left-hand limit is the same as the right-hand limit, as is the case with our function above, then the limit of that function exists in our simple sense here. If we directly substitute 1 into our function, we see that the answer is 4.
Continuity
Continuity is a property of functions and essentially means that the function does not make jumps. A standard linear function that is continuous is one long smooth line. A standard linear function that is discontinuous will have a jump in it somewhere. The graph below has a discontinuous function in blue and a continuous function in red. Notice the jump at x=4 in the Blue function.
Blue function: \[ f(x) = \begin{cases} 0.5x + 3 & x \lt 4 \\ 0.5x + 4 & x \geq 4 \end{cases} \]
Red function: \[ f(x) = 0.9x\]
This may seem esoteric but these Red and Blue function types are common within corporate finance. The Blue function is called a step function and is a common cost function in manufacturing companies. Up to a certain level of production, this is your cost. Above that level of production, the company needs to add additional overhead, machinery or whatever, and this is your cost for that level of production. The Red function is a common cost function type for companies that outsource production. At low levels of production, it is less costly than the Blue function, but it rapidly becomes more costly as volume increases.
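Here is a minimal sketch of the two cost functions in Python; the jump in the step function at x = 4, and the crossover at x = 10 beyond which outsourcing costs more, are easy to see numerically:

```python
# Step (Blue) cost function: overhead jumps at a production level of 4.
def step_cost(x):
    return 0.5 * x + 3 if x < 4 else 0.5 * x + 4

# Outsourced (Red) cost function: continuous, but steeper per unit.
def outsourced_cost(x):
    return 0.9 * x

for x in (3.9, 4.0, 10.0, 12.0):
    print(x, step_cost(x), outsourced_cost(x))
# step_cost jumps from 4.95 to 6.0 across x = 4;
# outsourced_cost catches up at x = 10 and is more costly after that.
```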

Differentiation
Let’s consider the quadratic function \[f(x) = x^2\]

We want to know what the instantaneous rate of change is for this function whenever x=2. To figure that out graphically, we want to know what is the slope of the tangent line that touches this function at exactly x=2. At that tangent point, x=2 and f(x)=4.
Until this moment in our math education we needed two points to determine the slope of a line, but we only have the point (2,4). \[\frac{y_2 - y_1}{x_2 - x_1}\]
Here is where the concept of the limit really comes into its own. Select a second point on the curve near x=2. The difference between its x-value and x=2 is the change in x, which in math language is called delta x and is written as: \[\Delta x\] Now let's substitute: \[f(x) = y_1\] \[f(x + \Delta x) = y_2\] \[x + \Delta x = x_2\] Our formula for finding the slope of the line through these two points is thus: \[\frac{f(x + \Delta x) - f(x)}{x + \Delta x - x}\] We can simplify the denominator since x - x = 0: \[\frac{f(x + \Delta x) - f(x)}{\Delta x}\]
Next, shrink the distance between our point of tangency (x=2) and this second point way down so that the difference between them approaches zero. Then take this limit. \[\lim_{\Delta x\rightarrow 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}\]
This is in fact the standard definition of the derivative. In math language we often say "f prime of x" and use this notation for the derivative: \[f'(x)\] Sometimes we instead use the Leibniz notation for the derivative: \[\frac{dy}{dx}\] More rarely you will occasionally see this alternative notation for the derivative: \[\frac{df}{dx}\] Regardless of the notation, the derivative is the slope of the tangent line to f at a given point. \[f'(x) = \lim_{\Delta x\rightarrow 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}\]
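You can watch this limit converge numerically. Here is a short sketch for f(x) = x², approximating the derivative at x = 2, where the true slope of the tangent is 4:

```python
# Difference quotient for f(x) = x**2 at x = 2; the limit should be 4.
f = lambda x: x ** 2
x = 2.0
for dx in (0.1, 0.01, 0.001, 0.0001):
    slope = (f(x + dx) - f(x)) / dx
    print(f"dx = {dx:>7}: slope = {slope:.4f}")
# Slopes: 4.1000, 4.0100, 4.0010, 4.0001 -- approaching 4 as dx shrinks.
```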

Not all functions are differentiable. Some functions are only differentiable over some domains. The prerequisite for a function to be differentiable at a point is that it is continuous at that point. That is not the only requirement, but it is the primary one. We will cover functions that are continuous but not differentiable when we cover the path of Brownian motion later on.
Common Rules for Differentiation
The following simple rules help us to compute derivatives.
- The derivative of a constant is 0.
- Power rule: \[\frac{d}{dx}(bx^n) = nbx^{n-1}\] where b is a real constant.
- Product rule: \[h(x) = f(x)g(x)\]\[h'(x) = f'(x)g(x) + f(x)g'(x)\]
- Reciprocal rule (a special case of the quotient rule): \[h(x) = \frac{1}{g(x)}\]\[h'(x) = -\frac{g'(x)}{(g(x))^2}\]
- Termwise differentiation: \[\frac{d}{dx}(af(x) + bg(x)) = a\frac{d}{dx} f(x) + b\frac{d}{dx} g(x)\] where a and b are real constants.
The termwise differentiation rule shows us that differentiation is a linear operation. Here is an example of differentiating a function:
\[y = a + b_1x + b_2x^2 + b_3x^3 + … + b_kx^k\] \[a, b_1, b_2, b_3, …, b_k \;are\; constants\] According to Rule 1 the derivative of a is zero. According to the power rule: \[\frac{d}{dx}(b_1x) = b_1\] \[\frac{d}{dx}(b_2x^2) = 2b_2x\] \[\frac{d}{dx}(b_3x^3) = 3b_3x^2\] \[\frac{d}{dx}(b_kx^k) = kb_kx^{k-1}\] Thus, the derivative of y is \[\frac{dy}{dx} = b_1 + 2b_2x + 3b_3x^2 + … + kb_kx^{k-1}\]
For composite functions we have the ever popular chain rule.
\[h(x) = f[g(x)]\] If g is differentiable at the point x and f is differentiable at the point s = g(x), then the chain rule says: \[h'(x) = f'(g(x))g'(x)\] Or, in Leibniz notation: \[\frac{dh}{dx} = \left(\frac{df}{dg}\right) \left( \frac{dg}{dx} \right)\]
Sometimes it helps to see the chain rule in action.
\[f(x) = 6x + 3\]\[g(x) = -2x + 5\]\[h(x) = f(g(x))\]
Calculate h'(x):
Begin by computing the derivatives of f(x) and g(x).
\[f'(x) = 6\]\[g'(x) = -2\]
According to the chain rule:
\[h'(x) = f'(g(x))g'(x)\]
\[h'(x) = f'(-2x + 5)(-2)\]
\[h'(x) = 6(-2)\]
\[h'(x) = -12\]
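If you have the sympy library available, the same result can be checked symbolically:

```python
import sympy as sp

x = sp.symbols("x")
f = lambda u: 6 * u + 3
g = -2 * x + 5
h = f(g)              # h(x) = f(g(x)) = 6*(-2x + 5) + 3 = -12x + 33
print(sp.diff(h, x))  # -12, matching the chain-rule computation above
```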
The most commonly used derivatives in finance are:
| Function | Derivative | Constraints |
|----------|------------|-------------|
| f(x) | \[\frac{df}{dx}\] | Domain of f |
| \[x^n\] | \[nx^{n-1}\] | Real numbers; if \[n\lt 0, x\neq 0\] |
| \[x^a\] | \[ax^{a-1}\] | \[x \gt 0\] |
| sin x | cos x | Real numbers |
| cos x | - sin x | Real numbers |
| tan x | \[\frac{1}{cos^2(x)}\] | \[x \neq \frac{\pi}{2} + n\pi\] |
| ln x | \[\frac{1}{x}\] | \[x \gt 0\] |
| \[e^x\] | \[e^x\] | Real numbers |
| log (f(x)) | \[\frac{f'(x)}{f(x)}\] | \[f(x) \gt 0\] |
In finance the function p = p(t) represents prices at time t. Its logarithmic derivative, \[\frac{d(\log p)}{dt} = \frac{p'(t)}{p(t)}\] tells us the instantaneous returns.
A commonly used approximation in finance for incremental changes in the value of a function uses the derivative. Given a function \[y = f(x)\] its increments \[\Delta f = f(x + \Delta x) - f(x)\] are generally approximated by \[\Delta f \approx f'(x)\Delta x\]
Application to Bond Analysis
One of the most important components in valuing a bond is the bond's duration. Duration is the derivative of a bond's value with respect to interest rates divided by the value itself. That is also what a logarithmic derivative is. (By market convention duration is quoted as a positive number, so a minus sign is attached when estimating price changes, since bond values fall as rates rise.) Let V = value and i = the interest rate $$Duration\; = \left(\frac{dV}{di}\right)\left(\frac{1}{V}\right) = \frac{d(\log V)}{di}$$
Higher Order Derivatives
We can also differentiate derivatives. The derivative of the derivative of a function is called a second-order derivative, or more simply the second derivative. We notate that as f''(x). The higher order derivatives of the sine function are fun and keep repeating: \[f(x) = \sin(x)\] \[f'(x) = \cos(x)\] \[f''(x) = -\sin(x)\] \[f'''(x) = -\cos(x)\] \[f^{(4)}(x) = \sin(x)\]
Application to Bond Analysis
The second derivative of a bond's value with respect to interest rates is called dollar convexity. When we divide dollar convexity by the value of the bond we get convexity. $$Convexity\; = \left(\frac{d^2V}{di^2}\right)\left(\frac{1}{V}\right)$$
Taylor Series Expansion
One of the most important uses of derivatives in economics and finance is to approximate how the value of a function like a price function will change by using Taylor Series expansion. A Taylor Series is an expansion of a function into an infinite sum of terms, where each term’s exponent gets larger and larger like this: \[e^x = 1 +x + \frac{x^2}{2!}+\frac{x^3}{3!}+\frac{x^4}{4!} + …\]
Recall that ! is the factorial function and means multiply all the integers from this number down to 1.\[4! = 4 \times 3 \times 2 \times 1 = 24\]
It works like this. Using a calculator we get \[e^2 = 7.389056…\] How does this work with a Taylor Series expansion?
| Terms | Result |
|-------|--------|
| $$1+2$$ | 3 |
| $$1+2+\frac{2^2}{2!}$$ | 5 |
| $$1+2+\frac{2^2}{2!}+\frac{2^3}{3!}$$ | 6.333… |
| $$1+2+\frac{2^2}{2!}+\frac{2^3}{3!}+\frac{2^4}{4!}$$ | 7 |
| $$1+2+\frac{2^2}{2!}+\frac{2^3}{3!}+\frac{2^4}{4!}+\frac{2^5}{5!}$$ | 7.2666… |
| $$1+2+\frac{2^2}{2!}+\frac{2^3}{3!}+\frac{2^4}{4!}+\frac{2^5}{5!}+\frac{2^6}{6!}$$ | 7.3555… |
| $$1+2+\frac{2^2}{2!}+\frac{2^3}{3!}+\frac{2^4}{4!}+\frac{2^5}{5!}+\frac{2^6}{6!}+\frac{2^7}{7!}$$ | 7.3809… |
As you can see from the table, as we increase the number of terms we get closer and closer to the value of the function. We can use the first few terms of a Taylor Series to get an approximate value for a function. This can be a very useful tool when we want to approximate, for example, how much a bond’s value will change in response to interest rate changes.
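The same partial sums are easy to generate programmatically; here is a short sketch reproducing the table above:

```python
import math

# Partial sums of the Taylor series for e**x at x = 2.
x, total = 2.0, 0.0
for n in range(8):
    total += x ** n / math.factorial(n)
    print(f"{n + 1} terms: {total:.4f}")
print(f"math.exp(2) = {math.exp(2):.4f}")   # 7.3891
```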
Application to Bond Analysis
Consider an option-free bond with a 9% coupon that pays interest semiannually and has 20 years until maturity. If the initial yield is 6%, this bond is worth $134.6722.
Don’t worry if you don’t yet know how to value a bond. Just know that changes in the interest rate (yield) will affect the price of the bond. So what happens if the interest rate instantaneously changes from 6% to 8%? That bond will lose 18.4% of its value.
What if we are trying to assess the riskiness of this bond and want an approximation for what will happen to its value in a hundred different interest rate change scenarios? We can approximate these changes in value using only the first two terms in the Taylor Series expansion.
Using the aforementioned formula for calculating a bond's duration, the duration for this bond is 10.66. The first term of the Taylor series multiplies the (negative) duration by the change in interest rates (di), which is 0.02. $$-10.66 \times 0.02 = -0.2132 = -21.32\%$$ This overstates the actual change of -18.4%.
The second term of the Taylor series is the bond's convexity measure, which using the aforementioned convexity formula is 164.11, multiplied by the interest rate change squared and divided by two factorial $$164.11 \times \frac{0.02^2}{2} = 3.28\%$$ Add this second term of the Taylor series to the first $$-21.32\% + 3.28\% = -18.04\% $$ Using only the first two terms of the Taylor series already does a good job of approximating the percentage change in value of this bond from a large change in the interest rate. This approximation method is even more accurate for smaller interest rate changes.
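Here is a minimal sketch in Python pulling the whole example together. It prices the bond, estimates duration and convexity by numerical differentiation (rather than the closed-form formulas used above), and compares the two-term Taylor approximation to the actual price change:

```python
def bond_price(y, coupon=0.09, face=100.0, years=20, freq=2):
    """Price of an option-free bond at annual yield y (as a decimal)."""
    c = coupon * face / freq          # semiannual coupon payment
    n = years * freq                  # number of periods
    i = y / freq                      # per-period yield
    return c * (1 - (1 + i) ** -n) / i + face * (1 + i) ** -n

y0, dy = 0.06, 0.02
v0 = bond_price(y0)                   # 134.6722
actual = bond_price(y0 + dy) / v0 - 1 # about -18.4%

# First and second derivatives of price with respect to yield, numerically.
h = 1e-5
dV  = (bond_price(y0 + h) - bond_price(y0 - h)) / (2 * h)
d2V = (bond_price(y0 + h) - 2 * v0 + bond_price(y0 - h)) / h ** 2

duration  = -dV / v0                  # about 10.66
convexity = d2V / v0                  # about 164.1

approx = -duration * dy + 0.5 * convexity * dy ** 2
print(f"actual: {actual:.2%}, Taylor approximation: {approx:.2%}")
# actual: -18.40%, Taylor approximation: -18.04%
```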
Integration
Integration calculates the area of an arbitrary figure, such as the area underneath a curve on a graph. Calculating the area of rectangles and triangles is simple, but thanks to calculus, we can calculate the area of any sort of figure.
The notation for an integral is the long S-like figure below, and a definite integral is said to go from a to b for a function: $$\int_a^b f(x)dx$$
Riemann Integrals
The German mathematician Bernhard Riemann defined the integral. The idea is that under a curve you can draw a number of rectangles that will approximate the shape of that curve. The sum of the areas of all of these rectangles is approximately the area under that curve.
The narrower that you draw each of these rectangles, the more accurate your approximation will become. The integral is the limit of these Riemann sums as the width of the widest rectangle approaches zero. $$I = \lim_{\max \Delta x_i \rightarrow 0} S_n$$
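Here is a short sketch of the idea, approximating the area under f(x) = x² on the interval (0,1), whose exact value is 1/3, with ever narrower rectangles:

```python
def riemann_sum(f, a, b, n):
    """Left-endpoint Riemann sum with n equal-width rectangles."""
    dx = (b - a) / n
    return sum(f(a + i * dx) * dx for i in range(n))

f = lambda x: x ** 2
for n in (10, 100, 1000, 10000):
    print(f"n = {n:>5}: {riemann_sum(f, 0.0, 1.0, n):.5f}")
# 0.28500, 0.32835, 0.33283, 0.33328 -- converging to 1/3
```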
Riemann Integral Properties
The following four properties help you compute integrals, provided all the integrals involved exist. Here a, b and c are fixed real numbers, and f, g and h are functions defined on the same domain and integrable on the same interval (a,b).
- Property 1: An integral over an interval of zero length is zero. $$\int_a^a f(x)dx = 0$$
- Property 2: Integrals are additive with respect to integration limits. $$\int_a^c f(x)dx = \int_a^b f(x) dx + \int_b^c f(x)dx, \; a \leq b \leq c$$
- Property 3: Integration is a linear operation. $$h(x) = \alpha f(x) + \beta g(x) \Longrightarrow \int_a^b h(x)dx = \alpha \int_a^b f(x)dx + \beta \int_a^b g(x)dx$$
- Property 4: The rule of integration by parts. $$\int_a^b f'(x)g(x)dx = f(x)g(x)\big\vert_a^b - \int_a^b f(x)g'(x)dx$$
As previously mentioned, integrals and derivatives are inverse operations. So, if we know the derivative we can sometimes figure out the integral. If the derivative is 2x then the integral is x² + C. The C stands for an unknown constant of integration, which is lost in differentiation because the derivative of a constant is zero.
Integration Rules
| Common Functions | Function | Integral |
|------------------|----------|----------|
| Constant | $$\int a \, dx$$ | $$ax + C$$ |
| Variable | $$\int x \, dx$$ | $$\frac {x^2}{2} \, +C$$ |
| Square | $$\int x^2 \, dx$$ | $$\frac {x^3}{3} + C$$ |
| Reciprocal | $$\int \frac{1}{x} \, dx$$ | $$ln \vert x \vert + C$$ |
| Exponential | $$\int e^x \,dx$$ | $$e^x + C$$ |
| | $$\int a^x \, dx$$ | $$\frac{a^x}{ln(a)} + C$$ |
| | $$\int ln(x) \, dx$$ | $$x \, ln(x)\, - x + C$$ |
| Trigonometry (x in radians) | $$\int cos(x) \, dx$$ | $$sin(x) + C$$ |
| | $$\int sin(x) \, dx$$ | $$-cos(x) + C$$ |
| | $$\int sec^2(x) \, dx$$ | $$tan(x) + C$$ |
| Rule | Function | Integral |
|------|----------|----------|
| Multiplication by a constant | $$\int C\, f(x) \, dx$$ | $$C \int f(x) \, dx$$ |
| Power Rule (n ≠ -1) | $$\int x^n \, dx$$ | $$\frac{x^{n+1}}{n+1} + C$$ |
| Sum Rule | $$\int (f + g) \, dx$$ | $$\int f \, dx \; + \; \int g \, dx$$ |
| Difference Rule | $$\int (f - g) \, dx$$ | $$\int f \, dx \; - \; \int g \, dx$$ |
Fundamental Theorems of Calculus
The first fundamental theorem of calculus shows that integration is the inverse operation of differentiation. For a continuous function f(x) on an interval (a,b) with the integral: $$F(x) = \int_a^x f(t) \, dt$$ The derivative of this integral gets us the original function back: $$F'(x) = f(x)$$ This means that the derivative of the integral of f with respect to its upper limit is the function f itself.
Definite Integrals, Indefinite Integrals and Improper Integrals
Definite integrals have starting and ending values. They are bounded by an interval (a,b). If we allow the upper limit of that interval b to vary, then we have an indefinite integral. Given this, for any function, there must be an indefinite integral for each starting point. This implies that for a given function, any two indefinite integrals of that function differ only by a constant. $$F_a(x) = \int_a^x f(u) \, du, \; F_b(x) = \int_b^x f(u) \, du$$ If a < b, then $$F_a(x) = \int_a^x f(u) \, du = \int_a^b f(u) \, du \; + \int_b^x f(u) \, du \; = \; C \, + \, F_b(x)$$
This is the second fundamental theorem of calculus. Given a continuous function f(x) on an interval (a,b) and its indefinite integral F(x), then $$\int_a^b f(x) \, dx \; = \; F(b) \, - \, F(a)$$ The definite integral of f(x) from a to b is the difference in the values of the indefinite integral F(x) at b and a.
Consider a simple integration example of finding the definite integral from 1 to 2 of the linear function y=2x. In standard notation for an integral’s interval, a=1 and b=2. $$\int_1^2 2x \, dx$$
First we find the indefinite integral. $$\int 2x \, dx \; = \; x^2 + C$$ Next we evaluate this indefinite integral at our starting and ending points (a,b): $$at \; a: \;\;\; 1^2 + C$$ $$at \; b: \;\;\; 2^2 + C$$ Then we subtract the starting from the ending. $$(2^2 + C) \, - \, (1^2 + C)$$ $$4 \, + \, C \, - 1 \, - C \; \; = 3$$ The constant C drops out in definite integrals so we ignore it. In standard notation, when we evaluate the indefinite integral, we use brackets and show the interval's limits after the right bracket. So this example would ordinarily look like this: $$\int_1^2 2x \, dx \; = \; [x^2]_1^2$$ $$= 2^2 \, - \, 1^2$$ $$= 3$$
We can also use this example to demonstrate the first fundamental theorem of calculus. The integral of 2x is x². $$F(x) \; = \; \int_a^x 2t \, dt \; = \; x^2 \, - \, a^2$$ We take the derivative: $$F'(x) \; = \; \frac{d}{dx} (x^2 - a^2) \; = \; 2x \, - \, 0 \; = \; 2x$$ The derivative of the integral of 2x is 2x.
Improper integrals are limits of definite integrals, either when the integration limits are infinite or when the integrand diverges to infinity at a given point.
Integration by Parts
When you want to integrate two functions that are multiplied together, you can use integration by parts. When you have: $$f(x) \; \times \; g(x)$$ Let f = f(x), g = g(x) and f' = the derivative of f(x). Integration by parts says that: $$\int f g \, dx \; = \; f \int g \, dx \;\; - \; \int f' \left(\int g \, dx\right) dx $$
As an example, we have x multiplied by cos(x). So, $$f = x$$ $$ g = cos(x)$$ First, differentiate f: $$f' = x' = 1$$ Next, integrate g: $$\int g \, dx \; = \; \int cos(x) \, dx \; = \; sin(x)$$ Put them together: $$x \, sin(x) \; - \, \int 1 \cdot sin(x) \, dx$$ Simplify: $$x \, sin(x) \; - \int sin(x) \, dx$$ Solve: $$x \, sin(x) + cos(x) + C$$
Substitution Rule
As with derivatives, for composite functions there is the chain rule of integration. This is integration by substitution and is also sometimes known as the reverse chain rule. Take the composite function $$h(x) = f[g(x)]$$ Provided that g is differentiable and invertible on the interval (a,b) and that f is integrable on the interval corresponding to all the points s = g(x), then you have the chain rule of integration: $$\int_a^b f(y)dy = \int_{g^{-1}(a)}^{g^{-1}(b)} f(g(x))g'(x)dx$$
An example will help. We have two functions: $$g(x) = x^2$$ $$f(x) = cos(g(x))$$ We write our integral in this form: $$\int f(g(x))g'(x) dx$$ Our integral is therefore: $$\int cos(x^2)\, 2x \, dx$$ We make two substitutions: $$f(g(x)) \longrightarrow f(u) $$ $$g'(x) dx \longrightarrow du$$ Then we integrate f(u): $$\int cos(u) du = sin(u) + C$$ Finish by substituting g(x) back in for u: $$sin(x^2) + C$$ So we have: $$\int cos(x^2) \, 2x \, dx = sin(x^2) + C$$
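If you have sympy available, both worked examples (the integration by parts above and this substitution) can be checked directly:

```python
import sympy as sp

x = sp.symbols("x")

# Integration by parts example: x*cos(x) integrates to x*sin(x) + cos(x) + C.
print(sp.integrate(x * sp.cos(x), x))           # x*sin(x) + cos(x)

# Substitution example: cos(x**2)*2x integrates to sin(x**2) + C.
print(sp.integrate(sp.cos(x ** 2) * 2 * x, x))  # sin(x**2)
```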
Convolution
Convolution is a structured form of multiplication: there is an input list and a rule, called the kernel, for how to multiply and sum those inputs. For two functions f(x) and g(x), their convolution h(x) in math notation looks like this: $$h(x) = f(x) * g(x)$$
A corporate finance example is that at a large manufacturing company, new production hires learn on the job. During their first week, 5% of them need remedial training. That level falls to 3% by the second week and 1% by their third week. Each week there are new production hires. We would set f(x) as the percent of new production hires needing remedial training. $$ f(x) = [0.05, 0.03, 0.01] $$ We would set g(x) as the volume of new production hires for the rest of the quarter. $$ g(x) = [100,200,300,200,100,100,100] $$ This volume starts at 100, rises to 300 and then decays to 100.
For planning purposes, we are interested in the peak value this convolution hits, and that happens in the third week, when 22 new production hires need remedial training. For each of the seven weeks, we calculate how many new hires will need training based on their weeks on the job. Each element of the weekly new hire list will have a percentage of its volume needing training during each of its first three weeks.
As you can imagine, computing how much training would be needed during any given week can quickly get confusing as it’s easy to lose track of where you are in the computation. To make this computation easier, reverse the order of the elements in g(x) so that the first ones in are the first ones out. In math language we would say: take the horizontal reflection of g(x) and notate it as g(-x) to mean reorder the elements of g(x) to go from: [100,200,300,200,100,100,100] to [100,100,100,200,300,200,100].
To figure out the total number of trainings needed for any given week t, we multiply each weekly volume of new hires by the percent needing training for that week on the job and sum the results. That summation is an integral. To account for any possible length, the integral's interval is from -infinity to +infinity. In calculus notation, this convolution is: $$ (f*g)(t) \, = \, \int_{-\infty}^{\infty} f(\tau )g(t-\tau )d\tau $$ The variable τ in this notation is the running index of the kernel method: for the specific week t, it sweeps across the prior weeks so that we sum the new hires from this week, last week and the week before last who need training during week t.
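If you have numpy available, the whole computation is a one-liner; it reproduces the weekly training counts and the peak of 22 in the third week:

```python
import numpy as np

f = [0.05, 0.03, 0.01]                   # fraction needing remedial training
g = [100, 200, 300, 200, 100, 100, 100]  # weekly new production hires

trainings = np.convolve(f, g)
print(trainings)        # [ 5. 13. 22. 21. 14. 10.  9.  4.  1.]
print(trainings.max())  # 22.0, hit in the third week
```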
Integral Transforms
Integral transforms take a function f(x) into another function F(s) of a different variable s through an improper integral. $$ F(s) \,=\, \int_{-\infty}^{\infty} G(s,x)f(x) \, dx$$ The function G(s,x) is known as the kernel of the transform. This association is one-to-one so that f can be uniquely recovered from its transform F.
A common application of integral transforms is to processes that can be studied in both the time domain and the frequency domain; the transform links the two domains. The two most important types of integral transforms in finance are the Laplace transform and the Fourier transform.
Laplace Transforms
There are both one-sided and two-sided Laplace transforms. Given a real-valued function f, its one-sided Laplace transform is an operator that maps f to the function $$L(s) = {\scr L} (f(x))$$ The Laplace transform of a real-valued function is also a real-valued function. This mapping is defined, when the improper integral exists, like this: $$ L(s) \, = \, {\scr L}[f(x)] \, = \, \int_0^{\infty} e^{-sx} f(x)dx $$
The one-sided Laplace transform is the most common type of Laplace transform used in engineering, but in probability theory, the most common type is the two-sided Laplace transform. This is due to probability theory’s heavy use of density functions, which are defined on the entire real axis. Within probability theory, the two-sided Laplace transform is known as the moment generating function. The moment generating function is defined, if the improper integral exists, as: $$ L(s) = {\scr L}[f(x)] \, = \, \int_{-\infty}^{\infty} e^{-sx} f(x)dx $$
So, Laplace transforms project a function into a different function space. They only exist for functions that are sufficiently smooth and that decay to zero rapidly enough as x approaches infinity. Use the following two conditions to ensure the existence of a Laplace transform:
- The function f(x) is piecewise continuous.
- The function f(x) is of exponential order as x approaches infinity, meaning there exist positive real constants K, a and T such that $$ \vert f(x)\vert \leq Ke^{ax} \; for \, x \gt T $$
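As a concrete check, the standard normal density satisfies both conditions, and its two-sided transform (the moment generating function) has the known closed form e^(s²/2). Here is a short numerical sketch, assuming scipy is available:

```python
import math
from scipy.integrate import quad

def normal_pdf(x):
    """Standard normal density, defined on the entire real axis."""
    return math.exp(-x ** 2 / 2) / math.sqrt(2 * math.pi)

def two_sided_laplace(f, s):
    """Numerically evaluate the improper integral of exp(-s*x) * f(x)."""
    value, _ = quad(lambda x: math.exp(-s * x) * f(x), -math.inf, math.inf)
    return value

s = 0.5
print(two_sided_laplace(normal_pdf, s))  # 1.1331...
print(math.exp(s ** 2 / 2))              # the closed form e**(s**2/2) = 1.1331...
```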
Properties of Laplace Transforms
- The Laplace transform is a linear operator.
- Laplace transforms are invertible such that the original function can be recovered.
- If f,g are real-valued functions that have Laplace transforms and a,b are real-valued constants, then: $$ L[a f(x) + b g(x)] \, = \, \int_{-\infty}^{\infty} e^{-sx} (a f(x) + b g(x))dx $$ $$ = a \, \int_{-\infty}^{\infty} e^{-sx} f(x)dx + b \, \int_{-\infty}^{\infty} e^{-sx} g(x)dx $$ $$ = \, a{\scr L}[f(x)] + b{\scr L}[g(x)] $$
- Laplace transforms convert differentiation, integration and convolution into algebraic operations as shown below.
Differentiation
For the one-sided Laplace transform: $$ {\scr L}\left[ \frac{df(x)}{dx} \right] \, = \, s{\scr L}[f(x)] \, - \, f(0) $$ For the two-sided Laplace transform: $$ {\scr L}\left[ \frac{df(x)}{dx} \right] \, = \, s{\scr L}[f(x)] $$ For higher-order derivatives, the one-sided transform is: $$ {\scr L}[f^{(n)} (x)] \, = \, s^n {\scr L} [f(x)] \, - \, s^{n-1} f(0) \, - \, s^{n-2}f'(0) \, - \, … \, - \, f^{(n-1)}(0) $$
Integration
For both the one-sided and the two-sided Laplace transform: $$ {\scr L}\left[ \int_0^t f(x) \, dx \right] \, = \, \frac{1}{s}{\scr L} [f(x)] $$
Convolution
For convolution where h(x) = (f * g)(x): $$ {\scr L} [h(x)] \, = \, {\scr L} [f * g] \, = \, {\scr L}[f(x)] {\scr L}[g(x)] $$
Fourier Transforms
Fourier transforms are similar to Laplace transforms. Given a function f, its Fourier transform, if the improper integral exists, is $$ {\widehat f}(\omega) \, = \, {\scr F} [f(x)] = \int_{-\infty}^{+\infty} e^{-2\pi i \omega x} f(x)dx $$ Here i is the imaginary unit. As with the Laplace transform, the original function can be recovered from its transform.
Fourier transforms are linear operations. Fourier transforms of derivatives and integrals are like the Laplace transform. The Fourier transform of convolutions is the product of Fourier transforms.
Multivariate Calculus
The calculus of a single variable can be extended to functions of more than one variable. Given a function of n variables, y = f(x1, …, xn), there are n partial derivatives $$ \frac{\partial f(x_1, …, x_n)}{\partial x_i}, \;\; i = 1, …, n $$ each formed by holding the other n - 1 variables constant and then using the definition for derivatives of univariate functions: $$ \frac{\partial f(x_1, …, x_n)}{\partial x_i} \, = \, \lim_{h\to 0} \frac{f(x_1, …, x_i \, + \, h, …, x_n) \, - \, f(x_1, …, x_i, …, x_n)}{h} $$ Repeating this process, partial derivatives of any order can be defined.
Consider this function of two variables. $$ f(x,y) = e^{-(x^2 + \sigma xy + y^2)} $$ Its partial derivatives up to order 2 are: $$ \frac{\partial f}{\partial x} = -(2x + \sigma y)e^{-(x^2 + \sigma xy + y^2)}$$ $$ \frac{\partial f}{\partial y} = -(2y + \sigma x) e^{-(x^2 + \sigma xy + y^2)}$$ $$ \frac{\partial^2 f}{\partial x^2} = -2e^{-(x^2 + \sigma xy + y^2)} + (2x + \sigma y)^2 e^{-(x^2 + \sigma xy + y^2)}$$ $$ \frac{\partial^2 f}{\partial y^2} = -2e^{-(x^2 + \sigma xy + y^2)} + (2y + \sigma x)^2 e^{-(x^2 + \sigma xy + y^2)}$$ $$ \frac{\partial^2 f}{\partial x \partial y} = (2x + \sigma y)(2y + \sigma x) e^{-(x^2 + \sigma xy + y^2)} - \sigma e^{-(x^2 + \sigma xy + y^2)}$$
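With sympy, these partials can be reproduced mechanically rather than by hand:

```python
import sympy as sp

x, y, sigma = sp.symbols("x y sigma")
f = sp.exp(-(x ** 2 + sigma * x * y + y ** 2))

print(sp.diff(f, x))     # first partial with respect to x
print(sp.diff(f, x, 2))  # second partial with respect to x
print(sp.diff(f, x, y))  # mixed second partial
```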
The multivariate integral is defined as the limit of the sums of multidimensional rectangles. Multidimensional integrals represent the same concept of area as with univariate integrals except we would call it volume instead of area.
In bond analysis, partial derivatives are used when interest rates are not the same for each time period, that is, when the yield curve is not flat. In such a scenario, a partial derivative is calculated with respect to each time period's interest rate.
Calculus Essentials Summary
- Calculus makes the infinitesimally small and infinitely large precise.
- A function tends to a finite limit if there is a number to which the function can get arbitrarily close.
- A function tends to an infinite limit if it can exceed any given quantity.
- A derivative of a function is the limit of its incremental ratio when the interval approaches zero. This represents the rate of change of quantities.
- Integrals represent the area below a curve and are the limit of the rectangle sums below the curve that approximate this area. They can be used to represent cumulated quantities.
- Convolution is a structured form of multiplication defined by a kernel.
- Integrals and derivatives are inverse operations.
- The derivative of the product of a constant and a function is the product of the constant and the derivative of the function.
- The integral of the product of a constant and a function is the product of the constant and the integral of the function.
- The derivative and the integral of a sum of functions is the sum of derivatives or integrals.
- Differentiation and integration are both linear operations.
- The derivative of a product of functions is the derivative of the first function times the second plus the first function times the derivative of the second.
- The derivative of a function of a function (a composite function) is the derivative of the outer function with respect to the inner function times the derivative of the inner function.
- A derivative of order n of a function is defined as the function that results from applying the differentiation operation n times.
- A function that is differentiable to any order at a given point a can be represented as a series of the powers of (x - a) times the nth derivative at a times the reciprocal of n factorial (n!). This is known as a Taylor series expansion.
- Taylor series that are truncated to the first or second terms are known as first and second order approximations, respectively.
- Laplace and Fourier transforms of a function are the integral of that function times an exponential.
- Laplace and Fourier transforms are useful because they transform differentiation and integration into algebraic operations, thereby providing a method for solving linear differential equations.
- Differentiation and integration can be extended to functions of more than one variable.
- A function of n variables has n first derivatives, n-squared second derivatives, n-cubed third derivatives, and so forth.