This repository has been archived by the owner on Apr 9, 2024. It is now read-only.

Supporting High Availability on AMF #31

Open
ajay-kashyap opened this issue Feb 25, 2022 · 0 comments

Proposal: Supporting High Availability on AMF

Elevator Pitch

High availability is essential in a core network because it keeps services running when a node fails.
This proposal achieves it with a load balancer.

When the active node fails, the load balancer detects the failure and switches traffic to the standby node, which transitions to active and starts handling the requests.

This HA proposal covers the AMF node only.

Total ask

HA support for the Magma architecture will be delivered in 3 milestones.

Contact Information

Ajay Kashyap ([email protected])

Project Details

Prerequisite: The AMF is assumed to be stateless, with all data structures, counters and configuration stored in Redis DB.

Current Architecture

 <Does not have HA support> 

[Figure: Current Architecture]

Proposed Architecture

  This proposal introduces HA functionality for the AMF using Redis Sentinel together with an open source load balancer.

[Figure: Proposed Architecture]

Sequence of operations

 * The load balancer continuously monitors the heartbeat of the active node and sends requests to it.
 * All data on the active node is continuously replicated to the standby node through Redis replication, supervised by Redis Sentinel.
 * On a heartbeat failure, the LB signals the standby node to transition from standby to active.
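
The sequence above can be sketched as a small state machine. This is only an illustration under stated assumptions: the names `Node` and `FailoverMonitor` and the `max_missed` threshold are hypothetical and do not come from Magma or from any particular load balancer. It shows the LB counting consecutive missed heartbeats and promoting the standby once the threshold is reached.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str  # "active", "standby" or "failed"


@dataclass
class FailoverMonitor:
    """Hypothetical LB-side view of one active/standby AMF pair."""
    active: Node
    standby: Node
    max_missed: int = 3  # assumed threshold, not taken from the proposal
    missed: int = 0

    def on_heartbeat_result(self, ack_received: bool) -> str:
        """Record one probe result and return the node that should get traffic."""
        if ack_received:
            self.missed = 0
            return self.active.name
        self.missed += 1
        if self.missed >= self.max_missed:
            self._fail_over()
        return self.active.name

    def _fail_over(self) -> None:
        # Demote the unresponsive node and promote the standby.
        failed, promoted = self.active, self.standby
        failed.role = "failed"
        promoted.role = "active"
        self.active, self.standby = promoted, failed
        self.missed = 0


monitor = FailoverMonitor(Node("amf-1", "active"), Node("amf-2", "standby"))
for ack in (True, False, False, False):
    target = monitor.on_heartbeat_result(ack)
print(target)  # node that receives requests after three missed heartbeats
```

Requiring several consecutive misses before switching is one possible way to avoid failing over on a single lost probe; the right threshold would have to be chosen during Milestone 1.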

Proposed approach

* An open source load balancer that can monitor the heartbeat of the Magma AMF needs to be identified.

* If a heartbeat is not received, the load balancer needs to signal the standby node to transition from standby to active.

* Redis DB stores the session and policy details as well as the in-memory data structures of the AMF. This information is currently confined to a single node.

* The approach is to use Redis Sentinel to replicate the data on the active node to the standby node (master-slave).

* When the active node goes down, all its resources have to be cleared and the process needs to be shut down gracefully.

* The load balancer will assign a floating IP to the current active node.
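
To make the intended outcome of the approach concrete, here is a minimal in-memory sketch. It does not talk to Redis or to a real load balancer: `AmfNode`, `Cluster`, the key `session:imsi-001` and the synchronous copy in `write` are all hypothetical stand-ins (real Redis replication is asynchronous). It only shows the three effects of a failover: the failed node is cleaned up, the replicated state is available on the new active, and the floating IP moves.

```python
from dataclasses import dataclass, field


@dataclass
class AmfNode:
    name: str
    store: dict = field(default_factory=dict)  # stand-in for the node's Redis DB
    running: bool = True


@dataclass
class Cluster:
    active: AmfNode
    standby: AmfNode
    floating_ip_owner: str = ""

    def __post_init__(self) -> None:
        # The floating IP starts on the initially active node.
        self.floating_ip_owner = self.active.name

    def write(self, key: str, value: str) -> None:
        """Store state on the active node and copy it to the standby."""
        self.active.store[key] = value
        self.standby.store[key] = value  # simplification of replication

    def fail_over(self) -> None:
        failed = self.active
        failed.running = False  # process is shut down
        failed.store.clear()    # its resources are cleared
        self.active, self.standby = self.standby, failed
        self.floating_ip_owner = self.active.name


cluster = Cluster(AmfNode("amf-1"), AmfNode("amf-2"))
cluster.write("session:imsi-001", "REGISTERED")
cluster.fail_over()
print(cluster.floating_ip_owner, cluster.active.store)
```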

Feature Roadmap

The feature will be delivered in 3 milestones. Each milestone will pass through the following 5 process gates.

  • Design
  • Development & unit testing
  • Code review
  • Integration testing with a multi-node setup
  • Resolve integration issues and regression issues

Milestone 1

1) Identify an HA load balancer for the Magma product.

2) Understand and test the load balancer (LB) configuration for the active and standby nodes.

3) Modify the existing AMF to respond to the heartbeat messages sent from the LB.

4) Integration testing of the AMF and LB for heartbeat and heartbeat Ack.
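
As an illustration of the heartbeat and heartbeat Ack exchange in this milestone, the sketch below answers a probe on a connected socket. The payloads `HB` and `HB-ACK` and the function names are assumptions made for this example only; the actual probe format depends on the load balancer that is selected, and the AMF itself is not written in Python.

```python
import socket
from typing import Optional

HEARTBEAT = b"HB"          # hypothetical probe payload
HEARTBEAT_ACK = b"HB-ACK"  # hypothetical acknowledgement payload


def build_reply(message: bytes) -> Optional[bytes]:
    """Return the Ack for a heartbeat probe, or None for any other message."""
    if message.strip() == HEARTBEAT:
        return HEARTBEAT_ACK
    return None


def serve_once(sock: socket.socket) -> bool:
    """Read one message from a connected socket and acknowledge it if it is a probe."""
    reply = build_reply(sock.recv(64))
    if reply is None:
        return False
    sock.sendall(reply)
    return True


# Local demonstration: one end plays the LB, the other the AMF.
lb_side, amf_side = socket.socketpair()
lb_side.sendall(HEARTBEAT)
serve_once(amf_side)
print(lb_side.recv(64))
lb_side.close()
amf_side.close()
```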

Milestone 2

1) Identify the Redis Sentinel configuration for active-to-standby replication.

2) Create the multi-node setup.

3) Test Redis DB replication on the standby node.
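
As a starting point for item 1, the helper below renders a minimal Sentinel configuration. The directives (`sentinel monitor`, `sentinel down-after-milliseconds`, `sentinel failover-timeout`, `sentinel parallel-syncs`) are standard Redis Sentinel settings, but the function name, the master name `amf-redis`, the address and every numeric value are placeholders chosen for illustration and would need to be tuned for Magma. Note that the Redis documentation recommends running at least three Sentinel instances for a robust deployment.

```python
def render_sentinel_conf(master_name: str, master_ip: str,
                         master_port: int = 6379, quorum: int = 2,
                         down_after_ms: int = 5000,
                         failover_timeout_ms: int = 60000) -> str:
    """Render a minimal sentinel.conf for one monitored master (values are placeholders)."""
    lines = [
        "port 26379",
        f"sentinel monitor {master_name} {master_ip} {master_port} {quorum}",
        f"sentinel down-after-milliseconds {master_name} {down_after_ms}",
        f"sentinel failover-timeout {master_name} {failover_timeout_ms}",
        f"sentinel parallel-syncs {master_name} 1",
    ]
    return "\n".join(lines) + "\n"


print(render_sentinel_conf("amf-redis", "10.0.0.1"))
```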

Milestone 3

1) Test the active-to-standby transition.

2) After the active-to-standby transition, resources on the node that has become standby need to be released gracefully; this needs to be tested.

3) After the transition, all new calls are diverted to the new active node, the LB assigns it the floating IP, and current calls remain intact; this needs to be tested.

4) Rigorously test transitions from active to standby and vice versa.

Test plan

  • The HA feature has to be tested with a multi-node setup.
  • Initial unit testing (UT) can be done against the load balancer.
  • Integration testing with multi-node setups.

| Milestone | Deliverable Summary |
| --- | --- |
| MS1 | Identify load balancer; LB sends heartbeat to SCTPd and AMF; modify SCTPd and AMF to respond to heartbeat; integration testing |
| MS2 | Configure Redis Sentinel for master-slave; multi-node setup creation; Redis data replication on standby and its testing |
| MS3 | Active-standby transition testing; LB floating IP assignment testing; testing that all calls stay intact and new calls are directed to the current active |

References

https://redis.io/topics/sentinel

https://www.haproxy.org/
