<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Spine II - An Overview</title>
<meta name="description" content="A walk-through of how The Big NHS Computer was replaced">
<meta name="author" content="Martin Sumner">
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no, minimal-ui">
<link rel="stylesheet" href="css/reveal.css">
<link rel="stylesheet" href="css/theme/black.css" id="theme">
<!-- Code syntax highlighting -->
<link rel="stylesheet" href="lib/css/zenburn.css">
<!-- Printing and PDF exports -->
<script>
var link = document.createElement( 'link' );
link.rel = 'stylesheet';
link.type = 'text/css';
link.href = window.location.search.match( /print-pdf/gi ) ? 'css/print/pdf.css' : 'css/print/paper.css';
document.getElementsByTagName( 'head' )[0].appendChild( link );
</script>
<!--[if lt IE 9]>
<script src="lib/js/html5shiv.js"></script>
<![endif]-->
</head>
<body>
<div class="reveal">
<!-- Any section element inside of this container is displayed as a slide -->
<div class="slides">
<section>
<h2>The Big NHS Computer</h2>
<p>The Prime Minister said ..</p>
<blockquote cite="http://www.independent.co.uk/voices/commentators/oliver-wright-the-potential-was-huge-but-so-were-the-problems-2330925.html">
“The possibilities are enormous if we can get this right”
</blockquote>
</section>
<section>
<h2>Tonight I intend to ...</h2>
<p>Tell a (long) story ... as fast as I can</p>
<p>Focus on the operations part of the story</p>
<p></p>
<p><small>http://martinsumner.github.io/presentations/spine2_devops.html#/</small></p>
</section>
<section>
<p>See Wikipedia </p>
<p><a href="https://en.wikipedia.org/wiki/List_of_failed_and_overbudget_custom_software_projects#Permanent_failures" target="_blank">List of failed and overbudget custom software projects - Permanent Failures</a>
</p>
<img width="1200" height="200" data-src="images/wikipedia.png" alt="Wikipedia Screenshot"/>
</section>
<section>
<h2>The Spine Part - The supplier speaks ...</h2>
<blockquote cite="http://www.globalservices.bt.com/uk/en/casestudy/nhs_spine">
“It has made transformational healthcare applications available to approximately 1.3 million NHS healthcare staff across England, providing care to circa 50 million UK citizens.”
</blockquote>
<blockquote cite="http://www.globalservices.bt.com/uk/en/casestudy/nhs_spine">
“20-plus customised NHS Spine applications ... combined cutting edge technologies to meet the demanding service level agreements and response times required ”
</blockquote>
</section>
<section>
<h2>More of their own words</h2>
<blockquote cite="http://www.globalservices.bt.com/uk/en/casestudy/nhs_spine">
“The contract was (and continues to be) one of the largest IT programmes in the world, consuming over 15,000 man-years of effort to date ... Over 3,000 servers are hosted and supported”
</blockquote>
<blockquote cite="http://www.globalservices.bt.com/uk/en/casestudy/nhs_spine">
“(The delivery) methodology is now an internationally recognised standard for complex software development programme delivery”
</blockquote>
</section>
<section>
<p>What did we build again?</p>
<img width="500" height="500" data-src="images/first-death-star.png" alt="The Death Star">
</section>
<section>
<h2>What does this kind of success look like?</h2>
<p align="left">Spine can release with <strong>£30m</strong> in transition costs alone</p>
<p align="left">It costs over <strong>£50m</strong> per annum to keep the lights on</p>
<p align="left">Around <strong>50%</strong> of the original business case met</p>
<p align="left">The system is <strong>stable</strong> when <strong>untouched</strong></p>
</section>
<section>
<blockquote cite="https://github.com/GovernmentCommunicationsHeadquarters/BoilingFrogs/blob/master/GCHQ_Boiling_Frogs.pdf">
“They shouldn't build these death stars any more. They keep getting blown up”
</blockquote>
<img width="500" height="500" data-src="images/death-star-2.jpg" alt="The Death Star">
</section>
<section data-background="#A3C2FF">
<h2>How did our rebel alliance approach the problem?</h2>
<p align="left">Making predictions is hard ... especially about the future</p>
<p align="left">If the answer is big and expensive ... re-frame the question</p>
<p align="left">Take responsibility ... no other “who” to blame</p>
</section>
<section data-background="#A3C2FF">
<h2>What did/does it cost?</h2>
<p>Took <strong>100 man-years</strong> from inception to one year of service</p>
<p>Requires just over <strong>100 commodity 1RU servers</strong> in live</p>
<p>Release costs are <strong>&lt; 0.1%</strong> of previous release costs</p>
<p><strong>90%</strong> reduction in operating costs</p>
<p>Total running team of <strong>30</strong> people supporting and ...</p>
<p>... Managing more than <strong>£10m</strong> pa of change backlog</p>
</section>
<section data-background="#A3C2FF">
<h2>Does it work?</h2>
<p>(Nearly) like-for-like functional replacement ...</p>
<p><strong>99.999%</strong> available since go live</p>
<p>Supports over <strong>300</strong> message interactions, eight UI applications</p>
<p><strong>41.3M</strong> messages a day</p>
<p>Provides access to <strong>1.5bn</strong> records and documents</p>
<p>The NHS waits more than <strong>800 working days less</strong> each day</p>
</section>
<section>
<p>Ops conversations from inside Death Star 1</p>
<img width="700" height="500" data-src="images/stormtrooper-despair.jpg" alt="Stormtrooper Despair">
</section>
<section>
<h2>Network Down</h2>
<blockquote align="left">
“We've spent an hour troubleshooting DNS - but it turns out the network is down”
</blockquote>
<blockquote align="right">
<font color="#FFFF66">
“Aren't they resilient links?”
</font>
</blockquote>
<blockquote align="left">
“The standby failed three weeks ago”
</blockquote>
<blockquote align="right">
<font color="#FFFF66">
“And nobody fixed it?”
</font>
</blockquote>
<blockquote align="left">
“It was only a standby”
</blockquote>
</section>
<section>
<h2>Storage Down</h2>
<blockquote align="left">
“Since the SAN failed on Friday at 18:05 we've had the best experts in the world crawling over it to understand why - and they can't find anything wrong”
</blockquote>
<blockquote align="right">
<font color="#FFFF66">
“What about this error log that's been raised 360,000 times since 18:05 Friday?”
</font>
</blockquote>
<blockquote align="left">
“Oh ... that might be something”
</blockquote>
</section>
<section>
<h2>Half a brain is dangerous</h2>
<blockquote align="left">
“The server won't start - listener is busy. Must be the load-balancer keepalive. Disable it.”
</blockquote>
<blockquote align="right">
<font color="#FFFF66">
“WTF? See, makes no difference”
</font>
</blockquote>
<blockquote align="left">
“Made no difference - unconfigure it.”
</blockquote>
<blockquote align="right">
<font color="#FFFF66">
“WTF? See, makes no difference”
</font>
</blockquote>
<blockquote align="left">
“Made no difference - reboot the load-balancer, erase the config and start again.”
</blockquote>
</section>
<section>
<h2>Big problems need big action</h2>
<blockquote align="left">
“The CRL is massive - why are you sending it over the network 60 times per second, just to confirm which CRL you've checked?”
</blockquote>
<blockquote align="right">
<font color="#FFFF66">
“Fixing this is unnecessary - we're re-architecting the whole system as it isn't reliable and doesn't perform”
</font>
</blockquote>
</section>
<section>
<h2>Oh, you used that option</h2>
<blockquote align="left">
“Ah yes, turning that option on may have a bug which causes any failure to propagate throughout the cluster, failing every single node”
</blockquote>
<blockquote align="right">
<font color="#FFFF66">
“Perhaps knowing that <strong>before</strong> everyone had lost their jobs would have been helpful”
</font>
</blockquote>
</section>
<section>
<h2>But telling you that isn't in my interest</h2>
<blockquote align="left">
“We don't know why the database slowed down - can we view the stats from the SAN?”
</blockquote>
<blockquote align="right">
<font color="#FFFF66">
“Where's your proof it's a SAN issue?”
</font>
</blockquote>
<blockquote align="left">
“I have no proof”
</blockquote>
<blockquote align="right">
<font color="#FFFF66">
“No proof, no stats”
</font>
</blockquote>
</section>
<section data-background="#E6E68A">
<h2>Operations Learning Applied to Spine 2</h2>
</section>
<section data-background="#E6E68A">
<h2>You build it, you run it - Learning is Everything</h2>
<p>If you run away from running things - you shouldn't decide anything</p>
</section>
<section data-background="#E6E68A">
<h2>Engineer Aspiration</h2>
<blockquote cite="https://en.wikipedia.org/wiki/No_Silver_Bullet">
<p>“Whereas the difference between poor conceptual designs and good ones may lie in the soundness of design method, the difference between good designs and great ones surely does not.</p>
<p>Great designs come from great designers.</p>
<p>... very best designers produce structures that are faster, smaller, simpler, cleaner, and produced with less effort.”</p>
</blockquote>
<p><small>Fred Brooks, No Silver Bullet, 1986</small></p>
</section>
<section data-background="#E6E68A">
<h2>Getting away from failure</h2>
<blockquote cite="http://research.microsoft.com/pubs/191008/FailureRecoveryBeEvil.pdf"><p>“ ... failure recovery can cause more problems than it solves, and so must be engineered explicitly according to a <strong>do no harm</strong> requirement”</p>
</blockquote>
</section>
<section data-background="#E6E68A">
<h2>Five Whys</h2>
<blockquote cite="https://daringtolivefully.com/the-5-whys">
<p>“The Toyota production system has been built on the practice and evolution of this scientific approach. By asking and answering ‘why’ five times, we can get to the real cause of the problem, which is often hidden behind more obvious symptoms.”</p>
</blockquote>
</section>
<section data-background="#E6E68A">
<h2>Logs and Automation</h2>
<p>Use the logs, practice using the logs</p>
<p>Unify the view of the system in one tool</p>
<p>Forsake speed in development</p>
<p>See detail ... act on detail</p>
</section>
<section data-background="#E6E68A">
<h2>Monolith First</h2>
<p>Finding the simplest path to most destinations</p>
<img width="700" height="400" data-src="images/ChooseBoringTech.png" alt="Map Problems to Few Solutions" cite="http://mcfunley.com/choose-boring-technology">
<p>Use building blocks not magic boxes</p>
</section>
<section data-background="#A3C2FF">
<h2>In conclusion</h2>
<p align="left">Government IT need not be all bad</p>
<p align="left">Radical change is possible</p>
<p align="left">DevOps means something really important to us</p>
</section>
</div>
</div>
<script src="lib/js/head.min.js"></script>
<script src="js/reveal.js"></script>
<script>
// Full list of configuration options available at:
// https://github.com/hakimel/reveal.js#configuration
Reveal.initialize({
controls: true,
progress: true,
history: true,
center: true,
transition: 'slide', // none/fade/slide/convex/concave/zoom
// Optional reveal.js plugins
dependencies: [
{ src: 'lib/js/classList.js', condition: function() { return !document.body.classList; } },
{ src: 'plugin/markdown/marked.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },
{ src: 'plugin/markdown/markdown.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },
{ src: 'plugin/highlight/highlight.js', async: true, condition: function() { return !!document.querySelector( 'pre code' ); }, callback: function() { hljs.initHighlightingOnLoad(); } },
{ src: 'plugin/zoom-js/zoom.js', async: true },
{ src: 'plugin/notes/notes.js', async: true }
]
});
</script>
</body>
</html>