Trading in the Cloud

by Ben Newton

The great cloud migration is well underway, but only the most forward-thinking financial institutions have committed everything to the cloud. This post addresses one common exception, matching trades in the cloud, because once matching is unlocked, trading will be freed too.

Many financial organisations already have significant cloud commitments. Deutsche Bank, for example, has just committed $15B/£11B to cloud migration, with $1.7B/£1.3B over 10 years going to GCP directly. Wells Fargo is targeting 60% of consumer-facing workloads in the cloud by 2025, and 100% by 2031. Perhaps most importantly, NASDAQ has just partnered with AWS to become the first ever private AWS Local Zone, making trading on NASDAQ easier than ever before. NASDAQ may have stormed ahead, but the world has a further 60 major stock exchanges, and hundreds of thousands of participants, yet to make the leap.

The ambition of many players is to migrate everything possible, but could this include the most latency-sensitive of workloads, trading and matching? Once the exchanges are there, trading will migrate naturally, so let us focus on the key question: how can financial exchanges set up cloud-native matching?

First, let’s look at the benefits trading in the cloud could bring. Of course, all the usual cloud truisms apply:

  • Flexibility & scalability
  • Agility
  • Services / *aaS

On top of these well-known arguments is the unique selling proposition of trading in the cloud: lowering participation costs.

Traditionally, many participants set up dedicated hardware for an exchange, possibly even in colocation (Colo); both are high-cost, high-effort activities. The amount of time and effort it takes to set up dedicated hardware is easy to underestimate, and the cost of decommissioning that hardware is even greater. A minimum setup estimate of 3 months is reasonable, although it recently took the writer of this post more than 9 months for a large investment bank. Here are some example costs:

Cost Estimates – Traditional Participant Onboarding

Staff include an Engineer, a Network Specialist and a Project Manager, each for a proportion of 9 months. Taking the lowest plausible figure, 9 person-months in total, the staff cost is $200k/£150k.

Hardware, at a minimum, is suggested to be two servers, estimated at $13k/£10k.

Minimum participant setup cost ≈ $210k/£160k

With these costs in mind, what could a cloud-first exchange do to ease onboarding? The exchange can provide cloud landing zones and additional resources:

  • Cloud native SDK
  • Terraform for typical infrastructure and connectivity: VMs, K8s etc.
  • CSP-native connectivity monitoring
  • Integration with cloud PaaS offerings
    • e.g. market data in native databases and messaging tools (e.g. BigQuery, Pub/Sub, Redshift, SQS…)

With these assets the participant could be up and running in a few weeks, saving months of effort and opening the door to far more participants.

Every exchange would hope that others build on top of its platform too. This is exactly what has happened in the crypto world, where platforms like Freqtrade and CCXT connect to multiple markets and have tens of thousands of users each. If the LSE-G currently has around 300 participants, is there an opportunity for 100x growth?

Of course, crypto is effectively an unregulated market, so how can this be achieved for a traditional exchange?

The Matching Recipe

The following recipe outlines the four general stages required to set up matching of regulated instruments in the cloud:

  1. Time Perimeter: create a time perimeter on the Points of Presence (PoPs) by accurately time synchronising them. Then set an auction window size, starting at double the bad time offset, perhaps 2×20µs.
  2. Reorder and Randomise: reorder and randomise matching within that window; the reordering happens after the PoP.
  3. Confirmation: confirm after the window closes. With 100µs of time accuracy hard to achieve, a 1ms delay may be valuable for MiFID relevant markets.
  4. Kill Switch: halt on abnormal or disorderly behaviour, such as a bad time offset.

Recipe Details:

  1. Time Perimeter

Multiple PoPs are used to scale and support fair market access, e.g. SEC 1934. These need to be time synced as tightly as possible. See Figure 1 (Simplified participant and Exchange components) for the simplified layout considered. Cloud vendors’ networks and time infrastructure are solid, so even with the free Chrony time daemon, testing shows the offset could be just tens of microseconds; alternatively, ClockWork could improve accuracy 1,000 times, creating window sizes of tens of nanoseconds. Whatever tool is used, testing needs to identify the threshold for a bad offset, one that should pause trading if it is breached. As the bad offset could be positive or negative, it should be multiplied by two to create a key attribute: the auction window.

Figure 1 Simplified participant and Exchange components
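
The relationship between the bad offset and the auction window can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: the `TimePerimeter` class and its method names are hypothetical, and the 20µs value is simply the example figure from the recipe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimePerimeter:
    """Hypothetical helper tying the bad offset to the auction window."""

    bad_offset_us: float  # threshold found by testing; breaching it should pause trading

    @property
    def window_us(self) -> float:
        # The offset can be positive or negative, so the window is twice the threshold.
        return 2 * self.bad_offset_us

    def is_breached(self, measured_offset_us: float) -> bool:
        # Only the magnitude matters: a clock running fast is as bad as one running slow.
        return abs(measured_offset_us) > self.bad_offset_us

    def window_index(self, timestamp_us: float) -> int:
        # Orders stamped within the same window are treated as simultaneous.
        return int(timestamp_us // self.window_us)


perimeter = TimePerimeter(bad_offset_us=20)
print(perimeter.window_us)          # window derived from the 2×20µs example
print(perimeter.is_breached(-25))   # a negative offset breaches the threshold too
```

Deriving the window from the threshold, instead of configuring the two separately, keeps them from drifting apart when the threshold is retuned after further testing.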

  2. Reorder and Randomise matching

With any distributed system there is always a synchronisation error. No matter how small that error is, reordering and randomisation will neutralise it. This approach is thoroughly documented by organisations such as Deutsche Boerse as a variant on a continuous auction.

Traditional matching technology is still relevant, whether matching runs on a single host or on a cluster.

The continuous auction scenario can prioritise larger orders, and still be randomised at its core.
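
As a rough sketch of this step, the Python below groups orders by auction window, shuffles each window, and optionally moves larger orders to the front while leaving equal-sized orders in random order. The `Order` fields, the `matching_sequence` function and the `prioritise_size` flag are all assumptions made for illustration; a real exchange would use its own order model and an auditable source of randomness.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: str
    pop_id: str
    timestamp_us: int  # time stamped at the PoP
    quantity: int


def matching_sequence(orders, window_us, rng, prioritise_size=False):
    """Return orders in the sequence they would be handed to the matcher."""
    windows = defaultdict(list)
    for order in orders:
        # Timestamps inside one window are not trusted to order the orders.
        windows[order.timestamp_us // window_us].append(order)

    sequence = []
    for index in sorted(windows):   # windows themselves stay in time order
        batch = windows[index]
        rng.shuffle(batch)          # randomise within the window
        if prioritise_size:
            # Stable sort: larger orders first, equal sizes keep their shuffled order.
            batch.sort(key=lambda o: o.quantity, reverse=True)
        sequence.extend(batch)
    return sequence


orders = [
    Order("a", "pop-1", 5, 100),
    Order("b", "pop-2", 35, 500),
    Order("c", "pop-1", 45, 50),
]
for order in matching_sequence(orders, window_us=40, rng=random.Random()):
    print(order.order_id)
```

Orders "a" and "b" share a window, so either may come first, while "c" always follows because it falls in the next window.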

  3. Confirmation

Confirmations need to be sent after the matching window closes. For MiFID regulated markets there is value in reporting after 1 millisecond. One reason is that testing shows a major cloud vendor has difficulty keeping clock accuracy under 100µs. Both AWS and GCP do achieve this level, but the additional hurdles associated with a sub-1ms end-to-end time make the 1ms delay a sensible place to begin.

A higher priced solution could be offered for faster confirmations. Once the platform’s compliance has been validated, this is an achievable goal.
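
One way to express the confirmation rule is as a release time computed per order. The sketch below is only an interpretation: it assumes the delay is measured from the order's PoP timestamp and that a confirmation is never released before the order's window has closed. The function name and parameters are hypothetical.

```python
def confirmation_release_us(order_time_us: int, window_us: int, delay_us: int = 1000) -> int:
    """Earliest time (in µs) at which a confirmation may be sent.

    Assumption: the confirmation waits for both the end of the order's
    auction window and a fixed delay after the order timestamp (1ms by default).
    """
    window_close_us = (order_time_us // window_us + 1) * window_us
    return max(window_close_us, order_time_us + delay_us)


# With a 40µs window and the default 1ms delay, the delay dominates.
print(confirmation_release_us(85, window_us=40))
# A faster, premium tier could shrink the delay until the window close dominates.
print(confirmation_release_us(85, window_us=40, delay_us=10))
```

Keeping the delay as a parameter reflects the idea of offering faster confirmations as a separate tier once compliance has been validated.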

  4. Kill Switch

Disorderly market requirements quickly lead to discussions of a multi-purpose kill switch. As maintaining clock synchronisation is of paramount importance, a key scenario to prepare for is virtual machine migration, though this scenario could be generalised to any clock sync violation.

Virtual machine migration is often a cause of sudden clock sync deterioration, and can be regularly observed. Though it is called VM migration, the same handling could be generalised to unlock serverless platforms too.

In this scenario an exchange’s PoP migrates and its offset spikes, breaching the bad offset threshold defined in step 1. Once the bad offset is breached, that PoP’s orders should be rejected, and as offset is a lagging indicator, a safe bet would be to reject that PoP’s orders for the entire window. To lower the cost of rejection, it may be desirable to automatically resubmit those orders in the next window. See an example scenario below:
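
The rejection and resubmission logic described above might look something like the following Python sketch. Everything here is illustrative: the `Order` shape, the `apply_kill_switch` function and the layout of `offsets_us` (offset samples keyed by PoP and window) are assumptions, not an existing API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Order:
    order_id: str
    pop_id: str
    window: int  # index of the auction window the order arrived in


def apply_kill_switch(orders, offsets_us, bad_offset_us, resubmit=True):
    """Split orders into accepted, rejected and carried-over lists.

    offsets_us maps (pop_id, window) to the clock offset samples (µs) seen
    for that PoP during that window. A PoP with no samples is treated as
    healthy in this sketch; a real system may prefer to fail closed.
    """
    accepted, rejected, carried = [], [], []
    for order in orders:
        samples = offsets_us.get((order.pop_id, order.window), [])
        # One bad sample taints the whole window, since offset is a lagging indicator.
        if not any(abs(s) > bad_offset_us for s in samples):
            accepted.append(order)
            continue
        rejected.append(order)
        if resubmit:
            # Lower the cost of rejection by replaying the order in the next window.
            carried.append(replace(order, window=order.window + 1))
    return accepted, rejected, carried


orders = [Order("a1", "pop-a", 7), Order("b1", "pop-b", 7)]
offsets = {("pop-a", 7): [3.0, -41.0], ("pop-b", 7): [5.0]}
accepted, rejected, carried = apply_kill_switch(orders, offsets, bad_offset_us=35)
print([o.order_id for o in accepted], [o.order_id for o in rejected], carried)
```

Here pop-a breaches the 35µs threshold during window 7, so its order is rejected and carried into window 8, while pop-b's order is matched as normal.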

Scenario:

Testing shows that an achievable time window would be 70µs.

This is because testing on a stock Red Hat OS with no specialist technology shows that setting the bad offset at 35µs (half the 70µs window) would lead to an SLA of 99.8% uptime and result in 6 lost windows a day.

If a smaller window size, or fewer lost windows, is desired, then a higher resolution time daemon can be used, for instance ClockWork or Timekeeper. ClockWork deserves special mention as the record holder for clock sync accuracy, offering window sizes 1,000 times smaller.

Fairness and Placement

With workload placement being random by default, it’s not obvious how workload location will impact PoP latency. Luckily, the cloud vendors are beginning to offer a solution in the form of controlled workload placement.

AWS provides the ability to create placement groups, either to allow the PoPs to be spread widely, or to enable workloads to be set up in proximity to the PoPs.

Summary

While crypto trading in the cloud has begun, the traditional regulated financial exchanges and participants have yet to start their journey. With developments in time accuracy and workload placement, and armed with the recipe above, it is only a matter of time before the last bastion of traditional data centers starts journeying to the cloud. Beyond that, the advantages of trading in the cloud could accelerate growth beyond anything we’ve seen before.
