On the 2nd of April 2020 between 16:25 and 17:25, GBG experienced service impact to its products ID3global/URU, IDscan, IdM and Identity Solution. This was caused by an internet routing issue at one of our service providers (Imperva), which affected the routing of traffic to GBG products, mainly on UK network routes. GBG uses Imperva as a protection layer for its products, providing Web Application Firewall and DDoS protection.
On April 2, 2020, an account configuration update was published by Imperva within their own environment, through a system administrator-only component of the Imperva Cloud Management console.
Within their systems, this configuration introduced invalid field values, which made it through operational safeguards, including configuration validation. Consequently, the invalid configuration introduced an intermittent degradation of service, affecting proxies in a subset of Imperva Point of Presence (PoP) data centres. From their investigation, the subset is defined as PoPs that received traffic for specific sites that loaded the invalid configuration. This meant that GBG UK traffic routes within the Imperva platform were affected.
For validation, Imperva run all configurations through a sandbox environment where they examine side effects, including errors, crashes, and high load. A sandbox is an error-checking layer in their production environment that does not process customer traffic. This configuration did not exhibit anomalous behaviour in their sandbox environment and was propagated to their production proxies.
The configuration produced a side effect of CPU exhaustion in a subset (as defined above) of the production proxies. The proxies were automatically restarted by a watchdog process when the CPUs reached a specific threshold, causing the outage to occur.
Operational service was restored within ~1 hour by removing the invalid configuration and temporarily disabling configuration propagation for the duration of the incident.
Software patches were released to Imperva Management Console and proxy servers within ~5.5 hours to remediate the defect.