Service Disruption - ID3Global platforms
Incident Report for GBG
Postmortem

Incident Affecting ID3global and URU Products

P1 Incident Start 05/11/2019 23:02

P1 Incident End 06/11/2019 09:30

Between 23:02 on Tuesday 5th of November and 09:30 on Wednesday 6th of November, you may have experienced an issue with transactions reaching our ID3global/URU platform.

The issue was caused by one of our load balancers going into an error state, which prevented some transactions from reaching ID3global. Although the load balancer was in an error state, it did not alert the GBG support teams because it was still serving the majority of customer transactions.

At 08:30 our application support team completed their daily log reviews and identified that a percentage of transactions were affected. Their investigation found an issue with one of the load balancers on the production site, which meant that SOAP 1.2 transactions were not being processed. Having determined the issue, we failed over to our secondary site at 09:30, which moved customers away from the issue whilst the investigation continued.
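
The failure mode described above, where one class of traffic (SOAP 1.2) failed while the overall success rate stayed high enough to avoid alerting, can be illustrated with a short sketch. This is not GBG's monitoring implementation: the function names, the record format and the 5% threshold are assumptions made purely for illustration. The only fixed fact it relies on is that SOAP 1.1 requests carry a `text/xml` Content-Type while SOAP 1.2 requests carry `application/soap+xml`.

```python
from collections import defaultdict

# Hypothetical threshold: flag a traffic class when more than 5% of it fails.
FAILURE_THRESHOLD = 0.05


def soap_version(content_type: str) -> str:
    """Classify a request by its Content-Type header.

    SOAP 1.1 uses text/xml; SOAP 1.2 uses application/soap+xml.
    """
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/soap+xml":
        return "soap1.2"
    if media_type == "text/xml":
        return "soap1.1"
    return "other"


def failing_classes(records, threshold=FAILURE_THRESHOLD):
    """Return {class: failure_rate} for every class above the threshold.

    `records` is an iterable of (content_type, succeeded) pairs,
    an assumed log format used only for this sketch.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for content_type, succeeded in records:
        cls = soap_version(content_type)
        totals[cls] += 1
        if not succeeded:
            failures[cls] += 1
    return {
        cls: failures[cls] / totals[cls]
        for cls in totals
        if failures[cls] / totals[cls] > threshold
    }
```

With 95 successful SOAP 1.1 requests and 5 failed SOAP 1.2 requests, the overall failure rate is 5%, which does not exceed the threshold, yet `failing_classes` reports a 100% failure rate for SOAP 1.2. Checking each traffic class separately is what surfaces a partial failure of this kind.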

Our remediation actions are as follows:

We had monitoring on our load balancers at the time, however it did not identify this issue. We have since developed and enhanced our monitoring. Done: deployed to the platform.

A vendor support ticket is progressing, with some remediation actions completed. Ongoing: some changes have been made, and we are aiming for completion on 31/01/2020.

Posted Dec 13, 2019 - 15:54 GMT

Resolved
We’re pleased to inform you that the service disruption we reported earlier is now resolved. Thank you for your patience whilst we restored service.

We identified an intermittent issue on our production site that was causing some customers to receive the error message below:

“The message could not be processed. This is most likely because the action 'http://www.id3global.com/ID3gWS/2013/04/IGlobalAuthenticate/AuthenticateSP' is incorrect or because the message contains an invalid or expired security context token or because there is a mismatch between bindings. The security context token would be invalid if the service aborted the channel due to inactivity. To prevent the service from aborting idle sessions prematurely increase the Receive timeout on the service endpoint's binding”.

We have failed our customers over to our secondary site, where the issue is not present. We are sorry for the inconvenience caused.
Posted Nov 06, 2019 - 10:07 GMT
Update
We are continuing to work on a fix for this issue.
Posted Nov 06, 2019 - 09:00 GMT
Identified
Initial investigations have determined that the issue is not affecting all customers.

Users may be experiencing intermittent connectivity issues preventing access to the portal and slowness or timeouts when processing transactions.

We’re working to restore our service to normal operating levels as soon as possible.

We will update you at 9:00 GMT or as soon as the issue is resolved. Thank you for your patience whilst we restore our service.
Posted Nov 06, 2019 - 08:22 GMT
Investigating
We are investigating an issue affecting our service. We apologise for the impact this may be having on your business. Our Incident Team is working to identify the root cause and implement a solution as a priority. We hope to advise you within a few minutes that the issue has been resolved. Should this not be the case, we will provide a further update and an estimated resolution time.
Posted Nov 06, 2019 - 02:46 GMT
This incident affected: GBG ID3global (ID3global Portal, ID3global Service).