fdb_flush.lua executes so long leading to REDIS BUSY #1397
Why I did it
With MAC learning enabled, TC1 was configured to send traffic with changing source MACs (217600 of them). The log shows "No More Resources" errors and orchagent hangs:
Nov 29 17:50:47.706137 NV2 ERR syncd#SDK: [FDB_UC.ERR] Polling enabled on error
Nov 29 17:50:47.806214 NV2 ERR syncd#SDK: [FDB_UC.ERR] Failed adding entries to RM (No More Resources)
Nov 29 17:50:47.806214 NV2 ERR syncd#SDK: [FDB_UC.ERR] Process polled data failed on SWID - 0, status - No More Resources
Nov 29 17:50:47.806214 NV2 ERR syncd#SDK: [FDB_UC.ERR] Polling enabled on error
Nov 29 17:50:47.906356 NV2 ERR syncd#SDK: [FDB_UC.ERR] Failed adding entries to RM (No More Resources)
What I did
Instead of using a Lua script to flush, delete the matching entries one by one in a loop in the code.
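For readers without the full diff, the loop deletion described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `FdbEntry`, `FakeAsicDb`, and `flushFdbByLoop` are hypothetical names, a `std::map` stands in for the ASIC DB, and treating an empty `bvidStr` or `portStr` as a wildcard is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for the attributes of one FDB entry in the ASIC DB.
struct FdbEntry
{
    std::string bvid;          // bridge/VLAN object ID
    std::string bridgePortId;  // bridge port object ID
    bool isStatic;             // static vs. dynamic entry
};

// Hypothetical in-memory stand-in for the ASIC DB: key -> entry.
using FakeAsicDb = std::map<std::string, FdbEntry>;

// Delete matching entries one at a time and return how many were removed.
// Assumption: an empty bvidStr or portStr matches every entry.
std::size_t flushFdbByLoop(FakeAsicDb& db,
                           const std::string& bvidStr,
                           const std::string& portStr,
                           bool flushStatic)
{
    std::size_t countFlushed = 0;
    for (auto it = db.begin(); it != db.end(); )
    {
        const FdbEntry& e = it->second;
        const bool match =
            e.isStatic == flushStatic &&
            (bvidStr.empty() || e.bvid == bvidStr) &&
            (portStr.empty() || e.bridgePortId == portStr);
        if (match)
        {
            it = db.erase(it);  // erase returns the next valid iterator
            ++countFlushed;
        }
        else
        {
            ++it;
        }
    }
    return countFlushed;
}
```

Each deletion here is a separate short operation, so no single step runs for long, but the flush as a whole is no longer atomic.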
Hi @yxieca , @vaibhavhd , could you please kindly review these? #1397 #1399 #1400
@qiluo-msft to help take an initial assessment.
}
}
}
else if (vals.size() == 1)
portStr.c_str(),
std::to_string(flush_static).c_str());

swss::RedisReply r(m_dbAsic.get(), command);
I am trying to understand the root cause of "REDIS BUSY". If m_dbAsic is using a Redis pipeline, you could explicitly call its flush function to improve Redis responsiveness.
*bridgePortIdFromDb == portStr)))
{ | ||
m_dbAsic->del(it); | ||
countFlushed++; |
The script was added in Lua because it flushes the FDB faster than deleting entries one by one. Are you sure it's the flush that leads to Redis busy? How many FDB entries do you have in the database?
How come Lua is slower than deleting one by one using hiredis? Do you have performance stats for this?
@@ -893,17 +893,46 @@ void RedisClient::processFlushEvent(
SWSS_LOG_THROW("unknown fdb flush entry type: %d", type);
}

for (int flush_static: vals)
// With a large number of MACs (for example, 217600), the Lua script causes REDIS BUSY.
// Delete the entries in a loop instead; this gives up atomicity.
Atomicity would be desirable here, since without it we could hit some weird issues if a new MAC is learned during deletion.
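To make the atomicity concern concrete, here is a small self-contained sketch of one way a non-atomic flush can interleave with learning. It is only an illustration under stated assumptions: `FakeFdb`, `scanKeys`, and `deleteKeys` are hypothetical names, a `std::map` stands in for the database, and the two-step scan-then-delete shape is assumed rather than taken from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory FDB: MAC address -> bridge port.
using FakeFdb = std::map<std::string, std::string>;

// Step 1 of a non-atomic flush: snapshot the keys currently on the port.
std::vector<std::string> scanKeys(const FakeFdb& db, const std::string& port)
{
    std::vector<std::string> keys;
    for (const auto& kv : db)
    {
        if (kv.second == port)
        {
            keys.push_back(kv.first);
        }
    }
    return keys;
}

// Step 2: delete the snapshotted keys one at a time; returns the number removed.
std::size_t deleteKeys(FakeFdb& db, const std::vector<std::string>& keys)
{
    std::size_t removed = 0;
    for (const auto& k : keys)
    {
        removed += db.erase(k);  // erase returns 0 or 1 for std::map
    }
    return removed;
}
```

If an entry is learned on the same port between `scanKeys` and `deleteKeys`, it is not in the snapshot and survives the flush. A flush that runs as a single atomic step leaves no such window between the two steps.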