Rail Delivery Group - LSM Risk Removal
Nasstar secures RDG's Live Sales Management (LSM) platform
The Rail Delivery Group (RDG) has a membership that includes all UK passenger and freight rail companies, as well as Network Rail and HS2. The organisation provides important services including ticketing, travel information, and refund facilities for passengers and staff on behalf of member companies.
RDG’s Live Sales Management (LSM) platform is hosted on AWS and managed by Nasstar. LSM underpins National Rail’s Ticket on Departure offering, which allows customers to purchase a ticket online or through a third party in advance of travel and then collect it at their convenience.
One of the core parts of the LSM service is a reconciliation process that compares ticket sales with ticket issues. This process provides key information to RDG and its retailers, allowing them to calculate profit and loss from ticket sales.
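As a rough illustration of what comparing sales with issues involves, the sketch below matches sale records against issue records by a shared reference. The booking references, the values, and the `reconcile` function are hypothetical and are not taken from the LSM codebase.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Reconciliation:
    matched: set[str]          # sold and issued
    sold_not_issued: set[str]  # sold, but no ticket issued yet
    issued_not_sold: set[str]  # issued with no matching sale record


def reconcile(sales: dict[str, float], issues: set[str]) -> Reconciliation:
    """Compare sale records (keyed by a hypothetical booking reference) with issue records."""
    sold = set(sales)
    return Reconciliation(
        matched=sold & issues,
        sold_not_issued=sold - issues,
        issued_not_sold=issues - sold,
    )


# Hypothetical sample data: booking reference -> sale value in pounds.
sales = {"REF001": 42.50, "REF002": 18.00, "REF003": 97.20}
issues = {"REF001", "REF003", "REF004"}

result = reconcile(sales, issues)
print("Matched:", sorted(result.matched))
print("Sold but not issued:", sorted(result.sold_not_issued))
print("Issued but not sold:", sorted(result.issued_not_sold))
```

The two unmatched sets are the exceptions a retailer would want to investigate; the production process performs this kind of comparison at far larger scale on EMR.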
As part of a regular service review with RDG, our team identified several risks within the LSM reconciliation process. The Reconciliation Risk Removal project sought to mitigate or eliminate these concerns to ensure the platform remained secure and operational. The recognised risks included:
- An outdated version of the Infobright database, which underpins reconciliation reporting and was no longer supported by AWS.
- The high number of AWS access keys within the reconciliation process configuration. This increased the possibility of credentials being leaked via inclusion in code repositories or service compromise.
- Interoperability with the deployed AWS SDK version. If the SDK is not kept up to date, changes made by AWS could render it incompatible with Amazon’s services.
- The Secure File Transfer Protocol (SFTP) server used by the reconciliation process required both an operating system (O/S) patch and an instance-type upgrade.
- The EMR Hadoop/HIVE version used in the reconciliation to process large data sets had been assigned a “legacy” status by AWS.
Each had the potential to significantly impact the LSM’s reconciliation process, either by causing the process to fail outright or by preventing reports from being produced in a timely manner.
The consequences would severely compromise the business operations of ticket retailers and train operating companies across the UK.
The changes made during the project not only removed the risks our team had identified but also improved security and performance while reducing costs.
The migration of the Infobright-based reporting database from a hosted EC2 instance to an AWS Aurora MySQL serverless instance removed the instance, OS, and database risks. This move also reduced expenditure since serverless models only generate costs while in use.
The use of a serverless Aurora MySQL instance has eliminated the need for a complex database disaster recovery process. In the unlikely event of a full loss of service, platform improvements have reduced Return-to-Operation time from more than 12 hours to 30 minutes.
The use of instance role permissions eliminates the risk of credentials becoming compromised. Additionally, the isolation of the SFTP service improves the system’s security posture as a whole and allows additional improvements to the SFTP service if necessary.
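One way to confirm that no long-lived keys remain after a move to role-based permissions is to scan configuration files for access key IDs. The following is a minimal sketch, assuming keys follow the standard AWS format (20 characters, beginning `AKIA` for long-term keys or `ASIA` for temporary ones). The configuration text is hypothetical, and the key shown is the example value from AWS documentation.

```python
import re

# AWS access key IDs are 20 characters long: long-term keys begin with "AKIA",
# temporary (STS) keys with "ASIA".
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b")


def find_access_key_ids(text: str) -> list:
    """Return (line_number, key_id) pairs for every access key ID found in text."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in ACCESS_KEY_ID.finditer(line):
            hits.append((line_number, match.group()))
    return hits


# Hypothetical configuration file contents, for illustration only.
config = """\
[reconciliation]
region = eu-west-2
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
bucket = example-bucket
"""

findings = find_access_key_ids(config)
for line_number, key_id in findings:
    print(f"line {line_number}: embedded access key ID {key_id}")
```

A check like this could run in a pre-commit hook or build pipeline, so that any key reintroduced into a repository is caught before it is pushed.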
The removal of the EC2-hosted Infobright reporting database has reduced licensing and support costs, as has the transition to a managed, event-driven serverless database. The EMR version uplift has also significantly reduced the compute cost of EMR processing.
Thanks to our team’s intervention, the LSM EMR process runtime has been reduced by approximately 2 hours, while report processing time has fallen by 3-4 hours. Overall, the total end-to-end reconciliation process has been reduced from 8-9 hours to 2-3 hours.
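The figures above can be cross-checked with a little arithmetic. This short sketch uses only the ranges quoted in the text; the helper function name is illustrative.

```python
def reduction_range(before, after):
    """Return (hours_saved, percent_saved) ranges from (low, high) runtimes in hours."""
    low_before, high_before = before
    low_after, high_after = after
    # Smallest saving: shortest old runtime against longest new runtime.
    # Largest saving: longest old runtime against shortest new runtime.
    hours_saved = (low_before - high_after, high_before - low_after)
    percent_saved = (
        100 * hours_saved[0] / low_before,
        100 * hours_saved[1] / high_before,
    )
    return hours_saved, percent_saved


hours_saved, percent_saved = reduction_range(before=(8, 9), after=(2, 3))
print(f"Hours saved: {hours_saved[0]}-{hours_saved[1]}")
print(f"Reduction: {percent_saved[0]:.1f}%-{percent_saved[1]:.1f}%")

# Component savings quoted in the text: roughly 2 hours (EMR) plus 3-4 hours (reports).
component_saving = (2 + 3, 2 + 4)
print(f"Sum of component savings: {component_saving[0]}-{component_saving[1]} hours")
```

The component savings of 5-6 hours fall within the 5-7 hour range implied by the end-to-end figures, which corresponds to a reduction of roughly 62% to 78%.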
Our team deployed Amazon CloudWatch to monitor and observe the uplifted instances involved in the reconciliation process. The platform provides performance metrics and enables a high degree of end-to-end infrastructure observability, real-time alerting, and feedback. The new code structure also allows for improved auditing and tracking if required.
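To give a sense of what real-time alerting on an instance might look like, the sketch below builds the parameters for a CloudWatch CPU alarm. The source does not say which metrics or thresholds were used, so the metric choice, alarm name, threshold, instance ID, and SNS topic ARN are all assumptions made for illustration.

```python
def cpu_alarm_params(instance_id: str, threshold_percent: float, topic_arn: str) -> dict:
    """Build keyword arguments for a CloudWatch alarm on EC2 CPU utilisation."""
    return {
        "AlarmName": f"lsm-reconciliation-high-cpu-{instance_id}",  # hypothetical name
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,           # seconds per evaluation window
        "EvaluationPeriods": 3,  # alarm after three consecutive breaches
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],  # e.g. an SNS topic that notifies the support team
    }


# Hypothetical identifiers, for illustration only.
params = cpu_alarm_params(
    instance_id="i-0123456789abcdef0",
    threshold_percent=80.0,
    topic_arn="arn:aws:sns:eu-west-2:123456789012:lsm-alerts",
)
print(params["AlarmName"])
```

With the boto3 SDK installed and credentials supplied by an instance role, a dictionary like this could be passed to `boto3.client("cloudwatch").put_metric_alarm(**params)`.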
Our team’s detailed analysis highlighted that each problem could be solved by re-platforming RDG’s existing code. This would change both the underlying AWS infrastructure and how the code was deployed.
In a process that avoided outages and downtime, Nasstar designed and managed a smooth transition to a new solution. We took the following steps to eliminate risk from the LSM reconciliation process:
Infobright database: Migrated all data within the Infobright reporting database to an AWS Aurora MySQL serverless instance.
AWS access keys: Migrated all LSM service components away from AWS access key usage in favour of instance-based permissions.
AWS SDK version: Repackaged the core reconciliation process code so that all AWS service interactions were using the latest AWS SDK versions.
SFTP Server O/S: Increased the isolation of the SFTP service within LSM, unlocking the ability to implement a full SFTP service rebuild (the service rebuild was not part of the scope of the solution as this work was already underway).
EMR/HIVE version uplift: Updated the EMR image from the legacy version in use. This included migrating away from some EMR roles that AWS has deprecated.
RDG and Nasstar have engaged regularly for several years, providing solutions and services that continue to support and improve the LSM processes. The purpose of this activity was to remove risk, improve flexibility, efficiency, cost, and performance, and future-proof the platform for RDG LSM.
This type of innovative thinking and expert advice is why we continue to trust Nasstar with our cloud estate. The reconciliation process is central to the service we provide to our customers, and the migration project with Nasstar has not only reduced our costs but also given us more visibility and further options for the future, while increasing the efficiency and reporting capabilities of the TOC/TIS communities.