SERVICE RESTORED: Any services accessed via services.postcodeanywhere.co.uk or api.addressy.com would have experienced increased error rates and latency.
Incident Report for GBG
Postmortem

The Capture+ product experienced an outage starting at 20:01 GMT on 27 November 2020. The outage was triggered by a sequence of cascading events that affected both services.postcodeanywhere.co.uk and api.addressy.com.

What happened?

  • The US Datacentre experienced a degradation of service caused by the system hitting an internal file limitation.
  • The failed requests caused a large proportion of the US traffic to switch to the UK Datacentres (the intended failover process).
  • The UK Firewall capacity was 40% less than the specification and testing had indicated. As a result, the firewall unexpectedly saturated under the additional traffic and the UK Datacentres also experienced a degradation of service.

What was done to fix it?

  • In the UK, additional manual load balancing was performed to stabilise the routing of traffic between the two UK Datacentres.
  • For the US Datacentre, fresh components, without the cached files that caused the internal limit to be hit, were created and deployed.

What will prevent this happening again?

  • Functionality added to prevent the internal file limitation being reached in the future – Complete.
  • Firewall capacity to be doubled (target date: 31-Dec-20).
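
As a rough illustration of the firewall figures above, the short Python sketch below works out what a 40% shortfall and a doubling imply relative to the original specification. It assumes that "doubled" means twice the capacity actually measured, which the report does not state. The specification is normalised to 1 and the function name is illustrative only.

```python
from fractions import Fraction

def effective_capacity(spec, shortfall):
    """Capacity actually available when it falls `shortfall` below spec."""
    return spec * (1 - shortfall)

# Specified firewall capacity, normalised to 1 (no absolute figure is given).
spec = Fraction(1)
# "40% less than the specification and testing had indicated".
shortfall = Fraction(40, 100)

actual = effective_capacity(spec, shortfall)
# Assumption: the planned doubling applies to the measured capacity.
doubled = 2 * actual

print(f"actual capacity:  {float(actual):.0%} of specification")
print(f"doubled capacity: {float(doubled):.0%} of specification")
```

Under that assumption, the firewall was running at 60% of its specified capacity during the incident, and doubling it would bring it to 120% of the original specification.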

During this incident the Capture service remained available; however, performance was degraded, with increased latency in response times to customers.

Posted Dec 01, 2020 - 11:42 GMT

Resolved
Degraded performance on services.postcodeanywhere.co.uk from 20:00 - 22:00 UK time

Degraded performance on api.addressy.com from 20:00 - 02:15 UK time
Posted Nov 28, 2020 - 02:47 GMT