FreeSWITCH angers Poly phones due to NOTIFY CSeq rollover at midnight January 1 (year boundary) #2688

Open
rlaager opened this issue Jan 2, 2025 · 0 comments
Labels
bug Something isn't working

Describe the bug
Exactly at 00:00 on January 1, FreeSWITCH rolls over the CSeq values used in NOTIFY packets. This makes our Poly phones mad. When they get their first NOTIFY with a lower CSeq, they reply with "500 Internal Server Error" and start spamming the server with SUBSCRIBEs. The resulting server load is easily high enough to impact service. The phones are fine after a reboot.
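To make the failure mode concrete, here is a minimal sketch of what the phones appear to be doing, judging only from their observable behavior. The `notify_dialog_t` type and `on_notify` function are hypothetical names for illustration and are not taken from Poly firmware or FreeSWITCH.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-subscription state kept by the phone (an assumption). */
typedef struct {
    bool     seen_any;   /* false until the first NOTIFY is accepted */
    uint32_t last_cseq;  /* CSeq of the most recently accepted NOTIFY */
} notify_dialog_t;

/* Returns the SIP status code the modelled phone would answer with. */
static int on_notify(notify_dialog_t *d, uint32_t cseq)
{
    if (d->seen_any && cseq < d->last_cseq) {
        /* Lower CSeq than before: rejected, state left unchanged. */
        return 500;
    }
    d->seen_any = true;
    d->last_cseq = cseq;
    return 200;
}
```

In this model, once the server's CSeq drops, every later NOTIFY keeps failing until the phone's stored `last_cseq` is discarded, which would match the phones recovering after a reboot.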

From my reading of the RFCs, the phones seem to be in the wrong for expecting an ever-increasing CSeq. But I'm trying to see whether anything better can be done within real-world limitations, including the fact that we can't fix the phones' firmware.

To Reproduce
Steps to reproduce the behavior:

  1. Use presence with Poly phones.
  2. Wait until 00:00 January 1.

Expected behavior
This is the tricky part. Obviously I don't want the Poly phones to crash. But it's unclear to me what can be done better on the FreeSWITCH side.

There were some discussions about this in the past:
https://lists.freeswitch.org/pipermail/freeswitch-dev/2018-August/007873.html
https://freeswitch-users.freeswitch.narkive.com/TYeYlPf7/notify-s-cseq-too-high

I believe the whole reason FreeSWITCH derives the presence CSeq from a presence epoch is to avoid this same problem when FreeSWITCH is restarted. If nothing special were done, a restart would reset the CSeq and the phones would see it go backwards.
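A simplified model of that scheme, as I understand it, is sketched below. The formula (seconds since the start of the current year, multiplied by `SOFIA_PRESENCE_COLLISION_DELTA`) is my assumption about how the epoch is used, and `presence_cseq` is a hypothetical name, not the actual FreeSWITCH function.

```c
#include <stdint.h>
#include <time.h>

#define SOFIA_PRESENCE_COLLISION_DELTA 50

/*
 * Assumed model: the CSeq depends only on the wall clock and on the
 * start of the current year, so it survives a restart unchanged.
 * The caller passes year_start explicitly to keep the sketch
 * independent of time zone handling.
 */
static uint32_t presence_cseq(time_t now, time_t year_start)
{
    return (uint32_t)((now - year_start) * SOFIA_PRESENCE_COLLISION_DELTA);
}
```

Using UTC for illustration, one second before 2025-01-01 00:00:00 (`1735689600`), with the 2024 epoch (`1704067200`), this gives 1581119950. One second after midnight, with the 2025 epoch, it gives 50. The yearly maximum stays below the 2^31 limit that RFC 3261 places on CSeq values, but the drop at the year boundary is exactly what the phones reject.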

In that first linked message, @anthmFS said, "Many phones break until you reset the whole registration if they are unhappy with the cseq." That seems to suggest that the phones might accept a CSeq rollover when they re-REGISTER. If that's the case (which is a big if), would it be reasonably possible to calculate the presence CSeq based on the timestamp of the last registration? If phones really do accept the rollover when REGISTERing, this would be better. If they do not, then this would make the problem massively worse, as the phones would freak out on every REGISTER rather than just once a year.
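As a rough sketch of that idea, the epoch would become per-registration instead of per-year. Everything here is hypothetical: `registration_t`, its `last_register` field, and `presence_cseq_since_register` are illustrative names, and the formula simply reuses the assumed "elapsed seconds times delta" scheme.

```c
#include <stdint.h>
#include <time.h>

#define SOFIA_PRESENCE_COLLISION_DELTA 50

/* Hypothetical registration record; only the timestamp matters here. */
typedef struct {
    time_t last_register;  /* time of the most recent REGISTER */
} registration_t;

/* CSeq counted from the last REGISTER instead of from January 1. */
static uint32_t presence_cseq_since_register(const registration_t *reg,
                                             time_t now)
{
    return (uint32_t)((now - reg->last_register)
                      * SOFIA_PRESENCE_COLLISION_DELTA);
}
```

With a REGISTER at time `t`, a NOTIFY at `t + 3000` carries CSeq 150000. After a re-REGISTER at `t + 3600`, a NOTIFY one second later carries CSeq 50. That drop now happens on every re-REGISTER, which is why this only helps if the phones really do tolerate a lower CSeq at that point.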

Package version or git hash
1.10.11~release~25~f24064f7c9~buster-1~buster+1 (i.e. f24064f), but the issue is still present in the code in git master:

#define SOFIA_PRESENCE_COLLISION_DELTA 50

rlaager added the bug label Jan 2, 2025