We'll now target integrations of differentiable physics (DP) setups into NNs. When using DP approaches for learning applications, there is a lot of flexibility w.r.t. the combination of DP and NN building blocks. As some of the differences are subtle, the following section will go into more detail. We'll especially focus on solvers that repeat the PDE and NN evaluations multiple times, e.g., to compute multiple states of the physical system over time.
To re-cap, here's the previous figure about combining NNs and DP operators. In the figure these operators look like a loss term: they typically don't have weights, and only provide a gradient that influences the optimization of the NN weights:
---
height: 220px
name: diffphys-short
---
The DP approach as described in the previous chapters. A network produces an input to a PDE solver $\mathcal P$, which provides a gradient for training during the backpropagation step.
This setup can be seen as the network receiving information about how its output influences the outcome of the PDE solver. I.e., the gradient will provide information on how to produce an NN output that minimizes the loss.
Similar to the previously described physical losses (from {doc}`physicalloss`), this can mean upholding a conservation law.
However, with DP, there's no real reason to be limited to this setup. E.g., we could imagine a swap of the NN and DP components, giving the following structure:
---
height: 220px
name: diffphys-switch
---
A PDE solver produces an output which is processed by an NN.
In this case the PDE solver essentially represents an on-the-fly data generator. That's not always useful: this setup could be replaced by a pre-computation of the same inputs, as the PDE solver is not influenced by the NN. Hence, there's no backpropagation through $\mathcal P$.
This version does not leverage the gradient information from a differentiable solver, which is why the following variant is much more interesting.
In general, there's no combination of NN layers and DP operators that is forbidden (as long as their dimensions are compatible). One that makes particular sense is to "unroll" the iterations of a time stepping process of a simulator, and let the state of a system be influenced by an NN.
In this case we compute a (potentially very long) sequence of PDE solver steps in the forward pass. In between these solver steps, an NN modifies the state of our system, which is then used to compute the next PDE solver step. During the backpropagation pass, we move backwards through all of these steps to evaluate contributions to the loss function (it can be evaluated in one or more places anywhere in the execution chain), and to backprop the gradient information through the DP and NN operators. This unrollment of solver iterations essentially gives feedback to the NN about how its "actions" influence the state of the physical system and the resulting loss. Here's a visual overview of this form of combination:
---
height: 180px
name: diffphys-mulitstep
---
Time stepping with interleaved DP and NN operations for $k$ solver iterations. The dashed gray arrows indicate optional intermediate evaluations of loss terms (similar to the solid gray arrow for the last step $k$), and intermediate outputs of the NN are indicated with a tilde.
Due to the iterative nature of this process, errors will start out very small, and then grow exponentially over the course of the iterations. Hence they are extremely difficult to detect in a single evaluation, e.g., with a simpler supervised training setup. Rather, it is crucial to provide feedback to the NN at training time on how the errors evolve over the course of the iterations. Additionally, a pre-computation of the states is not possible for such iterative cases, as the iterations depend on the state of the NN. Naturally, the NN state is unknown before training time and changes while being trained. Hence, a DP-based training is crucial in these recurrent settings to provide the NN with gradients about how its current state influences the solver iterations, and correspondingly, how the weights should be changed to better achieve the learning objectives.
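The interleaved time stepping can be sketched in a few lines of plain Python. Everything in this sketch is a hypothetical stand-in: `solver_step` replaces the PDE solver $\mathcal P$ with a scalar update, `correction` replaces the NN with a single-weight function, and merging the NN output additively is just one possible choice. It only covers the forward pass and the places where loss terms could be evaluated; an actual setup would rely on an autodiff framework for the backward pass.

```python
import math

def solver_step(x, dt=0.1):
    """Hypothetical stand-in for the PDE solver P: one Euler step of dx/dt = x - x^3."""
    return x + dt * (x - x**3)

def correction(x, theta):
    """Hypothetical stand-in for the NN: a function with a single trainable weight."""
    return theta * math.tanh(x)

def unroll(x0, theta, k):
    """Interleave NN correction and solver step k times; return states x_0..x_k."""
    states = [x0]
    for _ in range(k):
        x_tilde = states[-1] + correction(states[-1], theta)  # NN modifies the state
        states.append(solver_step(x_tilde))                   # solver advances it
    return states

def unrolled_loss(states, reference, intermediate=True):
    """Squared error on the final state, optionally also on all intermediate states."""
    steps = range(1, len(states)) if intermediate else [len(states) - 1]
    return sum(0.5 * (states[n] - reference[n]) ** 2 for n in steps)

# Reference trajectory, here simply generated with a different weight (assumption).
reference = unroll(0.5, theta=0.05, k=10)
for theta in (0.0, 0.02):
    states = unroll(0.5, theta, k=10)
    print(theta, unrolled_loss(states, reference), unrolled_loss(states, reference, False))
```

Every state after the first depends on `theta`, which is why the trajectory cannot be pre-computed once and stored: it has to be re-evaluated whenever the weight changes.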
DP setups with many time steps can be difficult to train: the gradients need to backpropagate through the full chain of PDE solver evaluations and NN evaluations. Typically, each of them represents a non-linear and complex function. Hence, for larger numbers of steps, the vanishing and exploding gradient problem can make training difficult. Some practical considerations for alleviating this will follow in {doc}`diffphys-code-sol`.
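A rough way to see why long unrolls are delicate: the gradient that reaches an early step contains a product of per-step Jacobians. The sketch below assumes a scalar toy problem in which every step has the same Jacobian. This is a strong simplification, since real solvers and networks have state-dependent Jacobian matrices, and the function name is purely illustrative.

```python
def backpropagated_factor(step_jacobian, k):
    """Product of k identical scalar step Jacobians, as seen by the first step."""
    factor = 1.0
    for _ in range(k):
        factor *= step_jacobian
    return factor

# Compare a slightly contracting, a neutral, and a slightly expanding step.
for jac in (0.9, 1.0, 1.1):
    print(jac, [backpropagated_factor(jac, k) for k in (10, 50, 100)])
```

Factors slightly below one shrink the gradient towards zero as the number of steps grows, while factors slightly above one blow it up.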
One question that we have ignored so far is how to merge the output of the NN into the iterative solving process. In the images above, it looks like the NN
While this approach is possible, it is not necessarily the best in all cases. Especially if the NN should produce only a correction of the current state, we can reuse parts of the current state. This avoids allocating resources of the NN in the form of parts of
In general, we can use any differentiable operator for
Next, we'll formalize the descriptions of the previous paragraphs. Specifically,
we'll answer the question:
what does the resulting update step for
% ... we'll use an additive correction with
$$ \frac{\partial L}{\partial\theta}= \sum_i \sum_{m=1}^{k} \Big[ \frac{\partial L}{\partial x_{i,k}} \Big(\prod_{n=k}^{m+1} \frac{\partial x_{i,n} }{ \partial x_{i,n-1}} \Big) \frac{\partial x_{i,m}}{\partial \tilde{x}_{i,m-1}} \frac{\partial \tilde{x}_{i,m-1}}{\partial\theta} \Big] $$ (gradient-time-unroll)
This doesn't look too intuitive at first sight, but this expression has a fairly simple structure: the first sum for
At each last step
It's important to keep in mind that for large
In terms of implementation, all deep learning frameworks will re-use the overlapping parts that repeat for different
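To make the structure of the gradient sum more tangible, the following sketch evaluates it term by term for a scalar toy problem and compares the result with a finite-difference estimate. It assumes a single sample (so the sum over $i$ is dropped), an additive correction $\tilde{x}_{m} = x_{m} + \theta \tanh(x_{m})$ followed by a solver step $x_{m+1} = \mathcal P(\tilde{x}_{m})$, and a squared-error loss on the final state. The toy solver, the one-weight "network" and all function names are hypothetical stand-ins.

```python
import math

DT = 0.1  # step size of the toy solver (assumption)

def solver_step(x):
    """Hypothetical stand-in for P: one Euler step of dx/dt = x - x^3."""
    return x + DT * (x - x**3)

def solver_step_dx(x):
    """Derivative of solver_step with respect to its input."""
    return 1.0 + DT * (1.0 - 3.0 * x**2)

def unroll(x0, theta, k):
    """Return states x_0..x_k and corrected states x~_0..x~_{k-1}."""
    xs, x_tildes = [x0], []
    for _ in range(k):
        x_tildes.append(xs[-1] + theta * math.tanh(xs[-1]))  # additive correction
        xs.append(solver_step(x_tildes[-1]))
    return xs, x_tildes

def loss(x0, theta, k, target):
    xs, _ = unroll(x0, theta, k)
    return 0.5 * (xs[-1] - target) ** 2

def grad_theta(x0, theta, k, target):
    """Evaluate the unrolled gradient sum for a single sample."""
    xs, x_tildes = unroll(x0, theta, k)
    dL_dxk = xs[-1] - target
    # step_jac[n - 1] = d x_n / d x_{n-1}: solver Jacobian times d x~_{n-1} / d x_{n-1}
    step_jac = [
        solver_step_dx(x_tildes[j]) * (1.0 + theta * (1.0 - math.tanh(xs[j]) ** 2))
        for j in range(k)
    ]
    total = 0.0
    for m in range(1, k + 1):
        chain = 1.0
        for n in range(m + 1, k + 1):  # product of step Jacobians from step k down to m+1
            chain *= step_jac[n - 1]
        dxm_dxtilde = solver_step_dx(x_tildes[m - 1])  # d x_m / d x~_{m-1}
        dxtilde_dtheta = math.tanh(xs[m - 1])          # direct d x~_{m-1} / d theta
        total += dL_dxk * chain * dxm_dxtilde * dxtilde_dtheta
    return total

def finite_difference(x0, theta, k, target, eps=1e-6):
    """Central finite-difference estimate of dL/dtheta, used as a sanity check."""
    return (loss(x0, theta + eps, k, target) - loss(x0, theta - eps, k, target)) / (2 * eps)

analytic = grad_theta(0.5, 0.05, 8, 1.0)
numeric = finite_difference(0.5, 0.05, 8, 1.0)
print(analytic, numeric)
```

The nested loop recomputes the Jacobian products for every $m$ to stay close to the formula. Reverse-mode differentiation instead accumulates these products in a single backward sweep.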
Now that we have all this machinery set up, a good question to ask is: "How much does training with a differentiable physics simulator really improve things? Couldn't we simply unroll a supervised setup, along the lines of standard recurrent training, without using a differentiable solver?" Or to pose it differently, how much do we really gain by backpropagating through multiple steps of the solver?
In short, quite a lot! The next paragraphs show an evaluation for a turbulent mixing layer case from List et al. {cite}`list2022piso` to illustrate this difference. Before going into details, it's worth noting that this comparison uses a differentiable second-order semi-implicit flow solver with a set of custom turbulence loss terms. So it's not a toy problem, but shows the influence of differentiability for a complex, real-world case.
The nice thing about this case is that we can evaluate it in terms of established statistical measurements for turbulence cases, and quantify the differences in this way. The energy spectrum of the flow is typically a starting point here, but we'll skip it and refer to the original paper {cite}`list2022piso`, and rather focus on two metrics that are more informative. The graphs below show the Reynolds stresses and the turbulence kinetic energy (TKE), both in terms of resolved quantities for a cross section in the flow. The reference solution is shown with orange dots.
% height: 220px
---
name: diffphys-unrollment-graphs
---
Quantified evaluation with turbulence metrics: Reynolds stresses (L) and TKE (R). The red curve of the training without a differentiable solver deviates more strongly from the ground truth (orange dots) than the training with DP (green).
Especially in the regions indicated by the colored arrows, the red curve of the "unrolled supervised" training deviates more strongly from the reference solution. Both measurements are taken after 1024 time steps of simulation using the fluid solver in conjunction with a trained NN. Hence, both solutions are quite stable, and fare significantly better than the unmodified output of the solver, which is shown in blue in the graphs.
The differences are also very obvious visually, when qualitatively comparing visualizations of the vorticity fields:
---
name: diffphys-unrollment-imgs
---
Qualitative, visual comparison in terms of vorticity. The training with a differentiable physics solver (top) results in structures that better preserve those of the reference solution obtained via a direct numerical simulation.
Both versions, with and without solver gradients, strongly benefit from unrollment, for 10 steps in this comparison. However, the supervised variant without DP cannot use longer-term information about the effects of the NN at training time, and hence its capabilities are limited. The version trained with the differentiable solver receives feedback for the whole course of the 10 unrolled steps, and in this way can infer corrections that give an improved accuracy for the resulting, NN-powered solver.
As an outlook, this case also highlights the practical advantages of incorporating NNs into solvers: we
can measure how long a regular simulation would take to achieve a certain accuracy in terms of turbulence statistics.
For this case, it would take more than 14x longer than the solver with the NN {cite}`list2022piso`.
While this is just a first data point, it's good to see that, once a network is trained, real-world improvements in terms of performance can be achieved more or less out-of-the-box.
Other works have proposed perturbing the inputs and
the iterations at training time with noise {cite}`sanchez2020learning`, somewhat similar to
regularizers like dropout.
This can help to prevent overfitting to the training states, and in this way can help to stabilize the training of iterative solvers.
However, the noise is very different in nature. It is typically undirected, and hence not as accurate as training with the actual evolutions of simulations. So noise can be a good starting point for training setups that tend to overfit. However, if possible, it is preferable to incorporate the actual solver in the training loop via a DP approach to give the network feedback about the time evolution of the system.
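As a minimal sketch of such a noise-based setup, the state seen by the network can be perturbed with undirected Gaussian noise in every iteration. The toy solver, the one-weight "network", the additive merge and the noise level `sigma` are all assumptions made for illustration, and this is not meant to reproduce the exact procedure of the cited work.

```python
import math
import random

def solver_step(x, dt=0.1):
    """Hypothetical stand-in for the PDE solver: one Euler step of dx/dt = x - x^3."""
    return x + dt * (x - x**3)

def correction(x, theta):
    """Hypothetical stand-in for the NN: a function with a single trainable weight."""
    return theta * math.tanh(x)

def unroll_with_noise(x0, theta, k, sigma, rng):
    """Unroll k steps, perturbing the state fed to the NN with Gaussian noise."""
    x = x0
    for _ in range(k):
        x_noisy = x + rng.gauss(0.0, sigma)  # undirected perturbation of the input
        x = solver_step(x_noisy + correction(x_noisy, theta))
    return x

rng = random.Random(0)  # fixed seed so that the perturbations are reproducible
clean = unroll_with_noise(0.5, 0.02, 5, sigma=0.0, rng=rng)
noisy = unroll_with_noise(0.5, 0.02, 5, sigma=0.01, rng=rng)
print(clean, noisy)
```

With `sigma=0.0` the function reduces to the unperturbed unroll, which makes it easy to switch the noise off at inference time.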
The following sections will give code examples of more complex cases to show what can be achieved via differentiable physics training.
First, we'll show a scenario that employs deep learning to represent the errors
of numerical simulations, following Um et al. {cite}`um2020sol`.
This is a very fundamental task, and requires the learned model to closely
interact with a numerical solver. Hence, it's a prime example of
situations where it's crucial to bring the numerical solver into the
deep learning loop.
Next, we'll show how to let NNs solve tough inverse problems, namely the long-term control
of a Navier-Stokes simulation, following Holl et al. {cite}`holl2019pdecontrol`.
This task requires long-term planning,
and hence needs two networks, one to predict the evolution,
and another one to act to reach the desired goal. (Later on, in {doc}`reinflearn-code` we will compare
this approach to another DL variant using reinforcement learning.)
Both cases require quite a bit more resources than the previous examples, so you can expect these notebooks to run longer (and it's a good idea to use check-pointing when working with these examples).