Chapter 1 considered general limitations of parametric models, and polynomial regression in particular (see Figure 1.1), which motivated the use of the more flexible spline models (see Figure 1.3) – though at that stage no mathematical details were presented. In Chapter 2, basic spline definitions were given, including the notion of smoothness constraints, and these ideas were further explored in the Exercises in Section 3.6. This chapter now gives the mathematical details of the interpolating spline problem and considers its application to data. A feature of all these interpolation methods is that they fit the data exactly and that the fitted functions are smooth. Figure 3.1 shows cubic spline interpolation, which fits a cubic polynomial between successive data points such that the function, its gradient and its curvature are all continuous at each data point. The solid line shows the fitted values within the range of the data, whereas the dashed line shows the fitted values outside the range of the data – extrapolation.
Suppose we have \(m\) observations \(\{y_1,\dots,y_m\}\) at locations \(\{t_1,\dots,t_m\}\). We can construct a cubic spline (that is, with \(p=3\)) to pass through (interpolate) all the points \((t_i,y_i),\ i=1,\dots,m\). In fact, for any given set of points, there are infinitely many cubic splines which interpolate them; see Figure 3.2 for examples. Exactly one of these splines has the property that, in the leftmost and rightmost intervals, it is a straight line. Such a spline is called a natural cubic spline.
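As a quick, minimal sketch of this in R (my own illustration, using the base \(\texttt{splinefun}\) command discussed further in Section 3.5, with arbitrary simulated values), we can confirm that a natural cubic spline reproduces the data exactly at the knots:

Code

# Minimal sketch: a natural cubic interpolating spline passes exactly
# through every data point (the values used here are arbitrary).
set.seed(1)
t <- 1:6
y <- rnorm(6)
f <- splinefun(t, y, method = "natural")
all.equal(f(t), y)   # TRUE: the spline interpolates the data exactly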
Note that natural splines are not the only choice for the spline method – see the R help page for other options – with perhaps the most useful alternative being \(\texttt{periodic}\), which can be considered if we expect the unknown function to be periodic. Figure 3.3 shows natural and periodic splines fitted to data simulated from a cosine function. The only noticeable difference is outside the range of the data, that is, for extrapolation. Great care is needed, however, as imposing such additional restrictions on the fitted function can lead to unforeseen modelling errors when we do not observe the full range of \(\texttt{x}\) values.
Code
set.seed(15340) # for reproducibility
par(mar=c(3,3,1,1), mgp=c(2,1,0))
# Generate some artificial data
n = 12
x = seq(0, 2*pi, length.out=n); y = cos(x+1) + rnorm(n, 0, 0.1)
y = c(y[-n], y[1]) # make sure first/last point equal
# Fit splines but with different "methods"
# ... using natural splines
plot(x, y, xlim=c(-pi/2, 7*pi/3), ylim=c(-3,3), pch=16)
mysplinefit2 = splinefun(x, y, method="natural")
curve(mysplinefit2, 0, 2*pi, add=TRUE)
curve(mysplinefit2, -pi/2, 7*pi/3, add=TRUE, lty=2)
# ... using periodic splines
plot(x, y, xlim=c(-pi/2, 7*pi/3), ylim=c(-3,3), pch=16)
mysplinefit2 = splinefun(x, y, method="periodic")
curve(mysplinefit2, 0, 2*pi, add=TRUE)
curve(mysplinefit2, -pi/2, 7*pi/3, add=TRUE, lty=2)
3.3 Properties of natural splines
Natural splines are a special case of polynomial splines of odd order \(p\). Thus we have natural linear splines (\(p=1\)), natural cubic splines (\(p=3\)), and so on. A spline is said to be natural if, beyond the boundary knots \(t_1\) and \(t_{m}\), its \((p+1)/2\) highest-order derivatives are zero: \[
f^{(j)}(t) = 0,
\tag{3.1}\] for \(j={(p+1)}/{2},\ldots,p\) and either \(t \leq t_1\) or \(t \geq t_m\).
Thus a natural spline of order \(p\) has the following \(p+1\) constraints, in addition to those of Equation 2.2: \[
\lim _{\epsilon\rightarrow 0} f^{(\ell)}(t_1-\epsilon)
=
\lim _{\epsilon\rightarrow 0} f^{(\ell)}(t_m+\epsilon) =0,
\tag{3.2}\] for \(\ell={(p+1)}/{2},\ldots,p\).
In particular,
a natural linear spline has \(p+1=2\) additional constraints: \[
\lim _{\epsilon\rightarrow 0} f^{(1)}(t_1-\epsilon)
=
\lim _{\epsilon\rightarrow 0} f^{(1)}(t_m+\epsilon) =0,
\tag{3.3}\] implying that \(f(t)\) is constant in the outer intervals of a natural linear spline,
while a natural cubic spline has \(p+1=4\) additional constraints: \[
\lim _{\epsilon\rightarrow 0} f^{(2)}(t_1-\epsilon) = \lim _{\epsilon\rightarrow 0} f^{(2)}(t_m+\epsilon) =0,
\]\[
\lim _{\epsilon\rightarrow 0} f^{(3)}(t_1-\epsilon) = \lim _{\epsilon\rightarrow 0} f^{(3)}(t_m+\epsilon) =0,
\tag{3.4}\] implying that \(f(t)\) is linear in the outer intervals of a natural cubic spline.
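As an informal numerical check of these boundary conditions (my own sketch, reusing \(\texttt{splinefun}\) as in the earlier examples), the second and third derivatives of a fitted natural cubic spline are indeed zero beyond the boundary knots:

Code

# Informal check of the natural boundary conditions in Equation 3.4:
# outside [t_1, t_m] a natural cubic spline is linear, so its second and
# third derivatives vanish there.
set.seed(1)
t <- 1:6
y <- rnorm(6)
f <- splinefun(t, y, method = "natural")
f(c(-1, 10), deriv = 2)   # both zero: no curvature outside the data range
f(c(-1, 10), deriv = 3)   # both zero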
The total degrees of freedom of a natural spline, starting from Equation 2.3 but now taking into account the \(p+1\) additional constraints, is \[
\text{df}_{\text{nat.spline}} = m+p+1 - (p+1) = m.
\tag{3.5}\] That is, the degrees of freedom of a natural spline equals \(m\), whatever the value of \(p\).
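For example, with \(m=9\) data points (as used in the R examples later in this chapter) a natural cubic spline (\(p=3\)) has \(\text{df} = 9+3+1-4 = 9\) degrees of freedom, one per data point, which is exactly what is needed for the interpolation conditions \(f(t_i)=y_i\) to determine the spline uniquely.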
Proposition 3.1 Linear and cubic natural splines have the following representations:
Proof: Not covered here (but may be included later in the module if time allows).
Focus on natural splines quiz
Test your knowledge recall and comprehension to reinforce basic ideas about natural splines.
Which of the following is a key property of every spline?
Which of the following is NOT a key property of a cubic spline function?
Which of the following statements is true?
Which of the following statements is true?
What determines the degrees of freedom of a natural spline?
3.4 Roughness penalties
An aim of spline models is to describe an unknown function using piecewise polynomials which are smooth. In the previous section, smoothness was imposed by explicitly constraining specified high-order derivatives. An alternative approach is to measure, and then control, the degree of smoothness of the spline. In practice it is the roughness of the spline that is usually measured, and one definition of roughness is: \[
J_\nu(f) = \int_{-\infty}^\infty \left[ f^{(\nu)}(t) \right]^2 \text{d}t
\tag{3.8}\] where \(\nu\geq 1\) is an integer and \(f^{(\nu)}\) denotes the \(\nu\)th derivative of \(f\). Thus \(f^{(1)}(t)\) denotes the first derivative and \(f^{(2)}(t)\) denotes the second derivative of \(f\).
Intuitively, roughness measures the “wiggliness” of a function.
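To make this concrete, the following sketch (my own illustration, not part of the notes' figures) approximates \(J_2(f)\) for a fitted natural cubic spline by numerically integrating the squared second derivative; since a natural cubic spline has zero second derivative outside the data range, integrating over \([t_1,t_m]\) is enough:

Code

# Approximate the roughness J_2(f) of a fitted natural cubic spline by
# numerical integration of its squared second derivative.
set.seed(15342)
x <- 1:9; y <- rnorm(9)
fhat <- splinefun(x, y, method = "natural")
J2 <- integrate(function(t) fhat(t, deriv = 2)^2,
                lower = min(x), upper = max(x),
                subdivisions = 1000)$value
J2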
The aim might be to find the smoothest function which interpolates the data points. Hence, an alternative approach to that of the previous sections is to find the function \(f\) which minimizes Equation 3.8 and satisfies \(f(t_i)=y_i\) for \(i=1,\dots,m\). We refer to the solution of this problem as the optimal interpolating function.
It turns out that there is a very close link between \(J_\nu(\cdot)\) and \(p\)th-order natural splines, where \(p=2\nu-1\) (so \(p\) is odd). Important special cases are \(\nu=1\) with \(p=1\), and \(\nu=2\) with \(p=3\). This relationship is stated in the following proposition.
Proposition 3.2 The optimal interpolating function is a \(p\)th-order natural spline, where \(p = 2\nu - 1\). That is, the interpolating natural spline \(f\) is the unique minimizer of \(J_\nu(f)\) among all functions which interpolate the data.
Proof: Not covered here (but may be included later in the module if time allows).
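As an informal check of Proposition 3.2 (again my own sketch, restricting the integral to the data range, where the comparison between interpolants is fair), the natural cubic spline has smaller roughness \(J_2\) than another cubic interpolant of the same points, here R's default \(\texttt{fmm}\) spline:

Code

# Compare the roughness J_2 (over the data range) of two cubic interpolants
# of the same points: the natural spline should give the smaller value.
set.seed(15342)
x <- 1:9; y <- rnorm(9)
f_nat <- splinefun(x, y, method = "natural")
f_fmm <- splinefun(x, y, method = "fmm")
J2 <- function(f) integrate(function(t) f(t, deriv = 2)^2,
                            lower = min(x), upper = max(x),
                            subdivisions = 1000)$value
c(natural = J2(f_nat), fmm = J2(f_fmm))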
Comments
Linear and cubic interpolating splines are also of interest in numerical analysis, for example to interpolate tables of numbers.
The linear interpolating spline is simply the piecewise-linear path connecting the data points.
Of course, in the linear spline case, knot points are clearly visible as kinks in the interpolating function. But, in the cubic spline case, knot points are invisible to the naked eye. Hence, in general, there is little motivation to use higher-order splines.
Numerical considerations: the interpolating spline solutions involve matrix inversion. The inversion of an \(m \times m\) matrix involves \(O(m^3)\) operations – hence it is time consuming if \(m\) is large (for example, \(m=1000\) or \(10000\)). Fortunately there are tricks to reduce the computation to \(O(m)\), as illustrated in the timing sketch below.
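A rough, machine-dependent timing sketch (my own, not from the notes): because the underlying linear system for a cubic interpolating spline is tridiagonal, fitting remains fast even for very large \(m\).

Code

# Rough timing sketch (exact timings depend on your machine): fitting a
# natural cubic interpolating spline to a large number of points is quick.
set.seed(1)
m <- 1e5
x <- sort(runif(m)); y <- rnorm(m)
system.time(fbig <- splinefun(x, y, method = "natural"))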
3.5 Fitting interpolating splines in R
There are two main functions within \(\mathbf{R}\) for fitting interpolating splines to data: \(\texttt{spline}\), which outputs fitted values at specified points, and \(\texttt{splinefun}\), which returns an \(\mathbf{R}\) function that can be used directly by other commands, such as \(\texttt{curve}\). The following illustrates the two approaches.
Code
set.seed(15342)
par(mar=c(3,3,1,1), mgp=c(2,1,0))
x = 1:9; y = rnorm(9)
# spline deals with points
plot(x, y, xlim=c(0,10), ylim=c(-3,3), pch=4)
mysplinefit1 = spline(x, y, method="natural")
lines(mysplinefit1)
# splinefun produces a function
plot(x, y, xlim=c(0,10), ylim=c(-3,3), pch=4)
mysplinefit2 = splinefun(x, y, method="natural")
curve(mysplinefit2, 0, 10, add=TRUE)
(a) Using the spline command
(b) Using the splinefun command
Figure 3.4: R code for cubic interpolating splines.
The following illustrates the different ways to draw the spline and to calculate fitted values.
Code
set.seed(15342)
x = 1:9; y = rnorm(9)
# spline deals with points
mysplinefit1 = spline(x, y, method="natural")
spline(x, y, xout=c(2.5, 7.5), method="natural")
$x
[1] 2.5 7.5
$y
[1] 0.08785792 -0.78655273
Code
# splinefun produces a function
mysplinefit2 = splinefun(x, y, method="natural")
mysplinefit2(c(2.5, 7.5))
[1] 0.08785792 -0.78655273
Before finishing, as we saw in Lecture 2, let us consider the derivatives of the fitted spline function. Figure 3.5 shows (a) the fitted natural spline, along with its first three derivatives in (b)–(d). Note that the function and the first two derivatives are continuous everywhere, but that the third derivative is not continuous and has jumps at the knot locations. Note also that the first and higher derivatives are all constant outside the range of the data.
Code
set.seed(15342)
x = 1:9; y = rnorm(9)
mysplinefit2 = splinefun(x, y, method="natural")
plot(x, y, pch=16)
curve(mysplinefit2, 0, 10, add=TRUE, lwd=1.5)
abline(v=x, col="grey")
# Plot the first derivative
curve(mysplinefit2(x, deriv=1), 0, 10, lwd=1.5, xlab="x", ylab="First derivative")
abline(v=x, col="grey"); abline(h=0, col="grey")
# Plot the second derivative
curve(mysplinefit2(x, deriv=2), 0, 10, lwd=1.5, xlab="x", ylab="Second derivative")
abline(v=x, col="grey"); abline(h=0, col="grey")
# Plot the third derivative
curve(mysplinefit2(x, deriv=3), 0, 10, lwd=1.5, n=10001, xlab="x", ylab="Third derivative")
abline(v=x, col="grey"); abline(h=0, col="grey")
(a) Data and spline function
(b) First derivative
(c) Second derivative
(d) Third derivative
Figure 3.5: Derivatives of cubic interpolating splines.
Focus on fitting splines quiz
Test your knowledge recall and comprehension to reinforce basic ideas about R commands for splines.
Which of the following should be first in every data analysis?
Which of the following is NOT part of a data analysis using interpolating splines?
What is the main difference between the R commands spline and splinefun?
Which of the following is NOT an optional method for the R command spline?
When using the output from splinefun, which of the following CANNOT be plotted?
3.6 Exercises
3.1 For the situation shown in Figure 2.2, but taking \(p=1\), write down the linear functions for the three intervals and clearly identify all \(6\) model parameters. Next, write down the constraints required to make the functions pass through the \(m=2\) data points, and the two constraints which impose continuity of the function. What additional constraints are needed to fix the first derivative at zero in the outer two intervals?
You might get two redundant constraints but think carefully about which can be removed.
3.2 Continuing the problem described in Exercise 3.1, write the constraints as a system of 6 linear equations in the 6 unknown model parameters. How might you solve this system to give the parameter values which solve the interpolation problem?
You need to write the six equations as a set of simultaneous equations. A simple matrix inversion is all that is needed for the solution – but don’t try to actually do the inversion!
3.3 Continuing the linear system described in Exercise 3.2, create a synthetic problem by choosing two data response values. Then solve the system in \({\mathbf R}\), or otherwise, and plot the fitted spline interpolating function.
The two data point locations (x-values) and function values (y-values) are completely your choice. Then, define the two vectors and the matrix from the previous question. R has a function “solve” which will invert the design matrix – look at “help(solve)”.
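For reference, here is a tiny generic sketch of the \(\texttt{solve}\) pattern (purely illustrative values, not the matrix for this exercise):

Code

# Generic pattern: solve the linear system A %*% theta = d for theta
# (solve(A, d) avoids forming the inverse explicitly).
A <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)
d <- c(1, 2)
theta <- solve(A, d)
theta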
3.4 Again consider the situation shown in Figure 2.2, but taking \(p=1\). Using the alternative representation in Equation 3.6, write down the two constraints involving the data points and the additional constraint on the \(b_i\) parameters. Write this linear system of 3 equations in three unknowns in matrix form.
The first part only requires checking the definition of the alternative form in the notes. Then, write it in vector/matrix form.
3.5 Continuing the linear system described in Exercise 3.4, and using the same points created in Exercise 3.3, calculate the parameter values in this new parameterization. Check that your two fitted interpolating splines give the same answer. Which approach do you prefer? Justify your answer.
Again, in R, define the vectors and matrix, and use “solve” to find the parameter estimates. You can check for equality by any approach, for example just plot both solutions and see if they match. The choice of which is preferable is yours – there is no wrong answer, but it is the justification that matters.
3.6 Create your own version of the R code used to produce Figure 3.4 and experiment with the two alternative spline fitting commands. Remove the \(\texttt{set.seed(15342)}\) command so that you produce different data each time, and comment on the similarities and differences when using different data sets.
Simply copy the code from the figure referred to and remove the set.seed command. Then, each time you run the code you will get a different answer.
3.7 Let \[\begin{equation*}
f(t) = 3 + 2t + 4|t|^3 + |t-1|^3.
\end{equation*}\] Write \(f\) as a cubic polynomial in each of the intervals \((-\infty,0)\), \((0,1)\) and \((1, \infty)\). Verify that \(f\) and its first two derivatives are continuous at the knots.
Is \(f\) a spline? Is \(f\) a natural spline?
To show this is a spline, you need to re-write the equation as polynomials; that is, consider what happens to the absolute values in each interval. Then, substitute the knot locations (0 and 1) into the expressions for the function, its derivative and its second derivative, and check that everything matches. There is also a solution based on the equation of the alternative form – see if you can argue it this way as well.
Show by direct calculation that \[\begin{equation*}
\int_{-\infty}^{\infty} \{f'(t)\}^2 \, dt = 8.
\end{equation*}\]
Show that this integral can also be written in the form \(-2 \mathbf b^T K \mathbf b\), where you should define \(\mathbf b\) and \(K\).
Use a similar approach to that in the previous question to “remove” the absolute value. You should then see that the integral is straightforward. For the matrix form, go back to the definitions of each part; multiplying out will then give the same answer as the integral.