Using feedback control techniques to build better systems
By Simonluca Landi, Onebip Product Quality and Innovation Senior Manager
One of the most attractive features of the Onebip platform is the ability to support subscription services, which can be explained with a user story like this:
“As a merchant, I want to sell a service to my customers with recurring billing, so that I will have a steady flow of revenue”
It’s easy to implement – all you need is a “scheduling engine” to trigger the recurring billing every week, right? Well, actually, wrong. Continue reading this post to find out why this “naive” solution just can’t work, and what I suggest as a better approach.
The very first implementation of the recurring billing system on the Onebip platform followed this easy, “naive” solution, as most people would have done.
But since the Onebip platform is built on well-defined “application contexts”, we’ve been able to build a separate block – a “subscription engine” that implements the scheduling mechanism required to trigger the billing every week. This is how the flow works today:
– Every minute a “job” selects the users that should receive their weekly billing, and sends a billing request to our “core platform”, where transactions are recorded, something like “Hey core platform, it’s time to renew the subscription for this user!”
– The core platform puts the billing requests in an internal queue, and replies to the subscription engine “Ok, I’ll do my best and let you know”
– The core platform sends a billing request to a “connectivity layer”: “hey, connectivity layer, I want to bill this guy, please do it and let me know”
– The connectivity layer implements the different carriers’ protocols, executes the actions required to bill the user, and eventually returns the outcome of the billing request to the previous modules in the chain (core platform and subscription engine)
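The four steps above can be sketched as a chain of queues. This is a minimal illustration of the asynchronous hand-offs between modules; the function names, payload fields and queue objects are my own assumptions, not Onebip’s actual code.

```python
import queue

core_queue = queue.Queue()          # core platform's internal queue
connectivity_queue = queue.Queue()  # connectivity layer's local queue

def subscription_engine_tick(users_due):
    """Every minute: select users due for renewal and push requests downstream."""
    for user_id in users_due:
        # "Hey core platform, it's time to renew the subscription for this user!"
        core_queue.put({"user": user_id, "action": "renew"})

def core_platform_step():
    """Record the transaction and forward the request to the connectivity layer."""
    request = core_queue.get()
    # ... record the transaction, reply "Ok, I'll do my best and let you know" ...
    connectivity_queue.put(request)

def connectivity_step():
    """Talk to the carrier and return the billing outcome up the chain."""
    request = connectivity_queue.get()
    # ... carrier-specific protocol would run here ...
    return {"user": request["user"], "outcome": "billed"}

subscription_engine_tick(["alice", "bob"])
core_platform_step()
core_platform_step()
print(connectivity_step()["outcome"])  # the outcome flows back to the callers
```

Note that every hand-off goes through a queue and nothing blocks on the carrier – which is exactly the “everything is asynchronous” mantra described next.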
As you can see, we adopt the mantra “everything is asynchronous”, which helps the system to scale up well, but…
The “Pull, don’t push” principle
At Onebip we believe in agile methodologies, and we apply the Kanban method in both product and software development. The essence of Kanban is that you should limit the WIP (Work-In-Progress) in each step of your process; this prevents overproduction and reveals the bottlenecks, so you can address them.
So, where are the bottlenecks here? And what have we learnt?
It’s easy to see that with this flow we are just moving a large number of requests from one queue to the next in the chain, i.e. from the subscription engine to the core platform and then to connectivity. But this last layer (connectivity) has to store the billing requests in a local queue and obey the “maximum rate” imposed by the telco carriers. It’s a waste of time and resources to push all these billing requests through the chain when we know that the carrier can’t handle them, similar to “overproduction” in the classical mass-production industry.
Although producing software is a creative activity, and therefore different from mass-producing cars, the underlying mechanism for managing the production line can still be applied. For us, overproduction means that we need more resources (more powerful servers), and therefore more money to spend on the underlying infrastructure – all of this just to store billing requests in a local “warehouse” that the rest of the system can’t handle.
To be more efficient, we need to limit the WIP at the connectivity layer and propagate this limit backward to the subscription engine, just as a Just-In-Time (JIT) production line minimizes the local warehouse of spare parts.
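A minimal way to picture a WIP limit is a bounded queue at the bottleneck: once it is full, the producer is refused and must hold back, so the limit propagates backward. This sketch uses an arbitrary capacity of 3 for illustration; it is not Onebip’s implementation.

```python
import queue

# WIP limit at the bottleneck: the connectivity layer accepts at most 3
# in-flight billing requests (the capacity is an illustrative assumption).
connectivity_queue = queue.Queue(maxsize=3)

def try_push(request):
    """Push a billing request only if the bottleneck still has capacity."""
    try:
        connectivity_queue.put_nowait(request)
        return True
    except queue.Full:
        # Backpressure: the caller should hold the request upstream and
        # retry later, instead of piling up work the carrier can't handle.
        return False

accepted = [try_push(f"user-{i}") for i in range(5)]
print(accepted)  # [True, True, True, False, False]
```

The rejected requests stay in the upstream queue, which is exactly the “warehouse” we want to keep small.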
Feedback control to the rescue
When an engineer hears the words “propagate backward”, old university memories of “feedback control” theory come to mind… “Transfer function”, “Laplace transform”, “Closed loop”…
So, why not apply to the Onebip platform the same “control theory” that your car’s cruise control uses to keep the vehicle at the constant speed set by the driver?
In a cruise control system, a sensor monitors the system output (the car’s speed) and feeds the data to a controller, which adjusts the control input (the throttle position) as necessary to maintain the desired system output (matching the car’s speed to the reference speed). When the car goes uphill, the decrease in speed is measured and the throttle position is changed to increase engine power, speeding the vehicle back up. When the car goes downhill, the throttle position is adjusted again to slow the car down and maintain the desired speed.
Feedback from measuring the car’s speed allows the controller to dynamically compensate for changes to the car’s speed.
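The closed loop can be shown in a few lines of code. This is a toy simulation of a feedback controller with integral action (each step, the throttle is nudged in proportion to the measured error); the gain, the drag term and the crude car model are all made-up numbers for illustration.

```python
def simulate(target_speed=100.0, slope_drag=5.0, steps=50, kp=0.5):
    """Closed-loop toy: feed the measured speed back and adjust the throttle."""
    speed = 0.0
    throttle = 0.0
    for _ in range(steps):
        error = target_speed - speed  # sensor: compare output to the reference
        throttle += kp * error        # controller: adjust the control input
        # Crude car dynamics: inertia, throttle effect, and an uphill drag.
        speed = 0.8 * speed + 0.2 * throttle - slope_drag
    return speed

print(round(simulate(), 1))  # the speed settles close to the 100 target
```

Despite the constant uphill drag, the accumulated feedback drives the error toward zero – the same compensation the cruise control performs on a real slope.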
In the case of Onebip, we want to maintain a constant “speed” at the connectivity layer, and the “throttle” is our subscription engine, which can push more or fewer renewal requests down the system chain to the connectivity layer. What we are building is a “controller” able to adjust the rate of billing requests flowing into the system: we measure the current rate of requests at the interface with the carrier (the “sensor”) and let the system “pull” more or fewer requests from the queue of users “that should be renewing their subscription today”.
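One possible shape for such a pull-based controller step is sketched below: measure how much room the carrier has left, and pull only that many renewals from the upstream queue. The function name, parameters and numbers are assumptions for illustration, not the actual Onebip controller.

```python
def controller_step(pending_renewals, carrier_capacity, in_flight):
    """Pull only as many requests as the measured carrier capacity allows."""
    budget = max(carrier_capacity - in_flight, 0)  # room left at the bottleneck
    batch = pending_renewals[:budget]              # pull, don't push
    remaining = pending_renewals[budget:]          # the rest waits upstream
    return batch, remaining

pending = [f"user-{i}" for i in range(10)]
batch, pending = controller_step(pending, carrier_capacity=4, in_flight=1)
print(len(batch), len(pending))  # 3 pulled this round, 7 left waiting upstream
```

Run periodically, this loop keeps the connectivity layer at its maximum sustainable rate while the backlog stays in a single upstream queue instead of being overproduced through the whole chain.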
Isn’t it nice to see how production principles from the “old economy”, engineering theories and new development methodologies can be mixed together to solve a complex problem?