feat:big key metricx #3000
base: unstable
Conversation
Walkthrough
This pull request introduces a comprehensive "Big Key" monitoring and logging mechanism across the Pika server's storage components. The changes span multiple files and add functionality to track and report on large keys across data structures such as hashes, lists, sets, and sorted sets. A new method, CheckBigKeyAndLog, detects accesses to oversized keys and records their access counts.
Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant PikaServer
    participant Redis
    participant Database
    Client->>PikaServer: Perform key operation
    PikaServer->>Redis: Process key
    Redis->>Redis: Check key size
    alt Key is "big"
        Redis->>Redis: Log key access
        Redis->>Database: Perform operation
    else Key is normal
        Redis->>Database: Perform operation
    end
    PikaServer->>Client: Return result
```
Actionable comments posted: 4
🔭 Outside diff range comments (2)
src/storage/src/redis_zsets.cc (2)
Line range hint 87-1959: Consider deduplication for frequent key access.
The CheckBigKeyAndLog call is consistently added across all zset operations, but this could lead to duplicate logging of the same key in rapid succession. Consider:
- Adding a cooldown period between logs for the same key
- Implementing a logging rate limiter
- Aggregating logs for the same key over a time window
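The cooldown idea above could be realized by remembering the last log time per key and skipping logging inside the window. A minimal sketch (hypothetical helper, not code from this PR; the class and method names are illustrative):

```cpp
#include <chrono>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical helper: suppress repeated big-key logs for the same key
// within a cooldown window.
class BigKeyLogThrottle {
 public:
  explicit BigKeyLogThrottle(std::chrono::milliseconds cooldown)
      : cooldown_(cooldown) {}

  // Returns true if the caller should emit a log line for `key` now.
  bool ShouldLog(const std::string& key) {
    const auto now = std::chrono::steady_clock::now();
    std::lock_guard<std::mutex> lock(mu_);
    auto it = last_logged_.find(key);
    if (it != last_logged_.end() && now - it->second < cooldown_) {
      return false;  // still inside the cooldown window for this key
    }
    last_logged_[key] = now;
    return true;
  }

 private:
  std::chrono::milliseconds cooldown_;
  std::mutex mu_;
  std::unordered_map<std::string,
                     std::chrono::steady_clock::time_point> last_logged_;
};
```

CheckBigKeyAndLog would then consult `ShouldLog(key)` before writing a log line, keeping the access counter update unchanged.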
Line range hint 87-1959: Add error handling for monitoring.
The monitoring implementation should handle failures gracefully:
- Add error handling for logging failures
- Implement timeouts for monitoring operations
- Add a circuit breaker to disable monitoring if it causes issues
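A circuit breaker here could be as simple as a consecutive-failure counter that disables monitoring once tripped. A hypothetical sketch (names are illustrative, not from the PR):

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical sketch: disable monitoring after too many consecutive
// failures, so a broken logging path cannot slow down key operations.
class MonitoringBreaker {
 public:
  explicit MonitoringBreaker(uint32_t max_failures)
      : max_failures_(max_failures) {}

  // Monitoring is allowed only while the failure count stays below the limit.
  bool Allowed() const { return failures_.load() < max_failures_; }

  void RecordSuccess() { failures_.store(0); }
  void RecordFailure() { failures_.fetch_add(1); }

 private:
  const uint32_t max_failures_;
  std::atomic<uint32_t> failures_{0};
};
```

Each CheckBigKeyAndLog call would first test `Allowed()` and record the outcome of the logging attempt; a half-open/recovery phase could be layered on top if needed.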
🧹 Nitpick comments (9)
src/storage/src/redis.h (2)
373-376: Optimize GetBigKeyStatistics performance
The current implementation creates a full copy of the map on each call, which could be expensive for large maps. Consider adding filtering and pagination.

```diff
-std::unordered_map<std::string, int> GetBigKeyStatistics() {
+std::unordered_map<std::string, int> GetBigKeyStatistics(
+    size_t limit = 100,
+    uint32_t min_count = 1) {
   std::lock_guard<std::mutex> lock(big_key_access_mutex_);
-  return big_key_access_count_;
+  std::unordered_map<std::string, int> filtered_stats;
+
+  for (const auto& pair : big_key_access_count_) {
+    if (pair.second >= min_count && filtered_stats.size() < limit) {
+      filtered_stats.insert(pair);
+    }
+  }
+  return filtered_stats;
 }
```
558-559: Add documentation for big key tracking members
Add comments to explain the purpose and lifecycle of these member variables.

```diff
+// Tracks the number of times each key exceeding the size threshold was accessed
 std::unordered_map<std::string, int> big_key_access_count_;
+// Protects concurrent access to big_key_access_count_
 std::mutex big_key_access_mutex_;
```

include/pika_server.h (1)
313-313: Add documentation for GetBigKeyStatistics
Add a descriptive comment explaining the method's purpose, return value format, and usage.

```diff
+/**
+ * Returns statistics about keys that exceed the configured size threshold.
+ * @return Map of key names to their access counts
+ * @note This is an expensive operation as it creates a copy of the statistics
+ */
 std::unordered_map<std::string, int> GetBigKeyStatistics();
```

src/storage/src/redis_sets.cc (1)
93-93: LGTM: Monitoring large sets before operations
The addition of CheckBigKeyAndLog helps track large sets before performing add operations. Consider adding error handling for the CheckBigKeyAndLog call to ensure monitoring failures don't impact operations.

src/storage/src/redis_zsets.cc (2)
87-87: LGTM! Consider batching the logging calls.
The placement of CheckBigKeyAndLog is correct: after meta validation but before operations. However, since this monitoring is added to many methods, consider batching the log writes to reduce I/O overhead.
Line range hint 87-1959: Consider performance impact of monitoring.
The universal addition of monitoring could impact performance, especially for high-frequency operations. Consider:
- Sampling instead of monitoring every operation
- Making the monitoring frequency configurable
- Adding metrics to track the monitoring overhead
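The sampling idea above could look like the following hypothetical sketch, where only a configurable fraction of operations pays the monitoring cost (class and parameter names are illustrative):

```cpp
#include <random>

// Hypothetical sketch: check only a configurable fraction of operations,
// trading completeness of statistics for lower per-operation overhead.
class SampledMonitor {
 public:
  // sample_rate in [0.0, 1.0]; 1.0 checks every operation.
  explicit SampledMonitor(double sample_rate)
      : dist_(0.0, 1.0), sample_rate_(sample_rate) {}

  // Draw a uniform value in [0, 1) and compare against the sample rate.
  bool ShouldCheck() { return dist_(rng_) < sample_rate_; }

 private:
  std::mt19937 rng_{std::random_device{}()};
  std::uniform_real_distribution<double> dist_;
  double sample_rate_;
};
```

With a rate of, say, 0.01, roughly one in a hundred operations would invoke CheckBigKeyAndLog; the resulting counts would then be estimates rather than exact totals.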
src/pika_admin.cc (1)
1094-1098: LGTM! Consider adding total count for better monitoring.
The big key statistics section is well integrated into InfoStats. Consider adding a total count of big keys at the start of the section for easier monitoring.

```diff
 tmp_stream << "# Big Key Statistics\r\n";
+tmp_stream << "total_big_keys:" << big_key_stats.size() << "\r\n";
 for (const auto& entry : big_key_stats) {
   tmp_stream << "key:" << entry.first << ", access_count:" << entry.second << "\r\n";
 }
```
src/storage/src/redis_lists.cc (1)
82-82: Consider the performance impact of CheckBigKeyAndLog calls.
The CheckBigKeyAndLog function is called on every list operation, including reads. For frequently accessed large lists, this could introduce additional overhead.
Consider optimizing by:
- Adding sampling or throttling for read operations
- Caching the big key status for a short duration
- Making the check asynchronous for read operations
Also applies to: 127-127, 163-163, 204-204, 244-244, 280-280, 339-339, 396-396, 441-441, 502-502, 572-572, 705-705, 750-750, 824-824, 1133-1133, 1174-1174, 1212-1212, 1252-1252, 1292-1292
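The caching suggestion above could be sketched as a small TTL cache that remembers the last verdict per key, so hot read paths skip repeated size lookups (a hypothetical helper; names are illustrative, not from the PR):

```cpp
#include <chrono>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical sketch: cache the last big-key verdict per key for a short
// TTL, so frequently read keys are not re-checked on every operation.
class BigKeyStatusCache {
 public:
  explicit BigKeyStatusCache(std::chrono::milliseconds ttl) : ttl_(ttl) {}

  // Returns true and fills *is_big if a fresh cached verdict exists.
  bool Lookup(const std::string& key, bool* is_big) {
    const auto now = std::chrono::steady_clock::now();
    std::lock_guard<std::mutex> lock(mu_);
    auto it = cache_.find(key);
    if (it == cache_.end() || now - it->second.checked_at > ttl_) {
      return false;  // missing or expired: caller must re-check the size
    }
    *is_big = it->second.is_big;
    return true;
  }

  void Store(const std::string& key, bool is_big) {
    std::lock_guard<std::mutex> lock(mu_);
    cache_[key] = {is_big, std::chrono::steady_clock::now()};
  }

 private:
  struct Entry {
    bool is_big;
    std::chrono::steady_clock::time_point checked_at;
  };
  std::chrono::milliseconds ttl_;
  std::mutex mu_;
  std::unordered_map<std::string, Entry> cache_;
};
```

A read path would try `Lookup` first and fall back to the full size check plus `Store` on a miss; an eviction pass would be needed to bound memory, mirroring the unbounded-growth concern raised for `big_key_access_count_`.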
src/storage/src/redis_hashes.cc (1)
Line range hint 82-82: Consider architectural improvements for big key monitoring.
The current implementation adds monitoring consistently across data structures, but could benefit from the following architectural improvements:
Configuration Management:
- Add configurable thresholds for what constitutes a "big key"
- Allow different thresholds for different data structures
- Enable/disable monitoring per data structure
Performance Optimization:
- Implement sampling for high-traffic keys
- Add throttling mechanisms to prevent monitoring overhead
- Consider async monitoring for read operations
Monitoring Strategy:
- Add gradual levels of monitoring (warn, critical, etc.)
- Implement rate limiting for logging
- Consider aggregating stats before logging
Would you like me to propose a detailed design for any of these improvements?
Also applies to: 127-127, 163-163, 204-204, 244-244, 280-280, 339-339, 396-396, 441-441, 502-502, 572-572, 705-705, 750-750, 824-824, 1133-1133, 1174-1174, 1212-1212, 1252-1252, 1292-1292, 98-98, 163-163, 204-204, 248-248, 306-306, 400-400, 485-485, 529-529, 569-569, 634-634, 710-710, 786-786, 850-850, 914-914, 992-992, 1063-1063, 1135-1135, 1198-1198, 1239-1239, 1277-1277, 1317-1317, 1357-1357
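The configuration-management items above could be captured in a small config struct with a global default, per-data-structure overrides, and an enable switch. A hypothetical sketch (struct and field names are illustrative, not from the Pika codebase):

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical sketch of the configuration idea: per-data-structure
// thresholds with a global default and an enable switch.
struct BigKeyMonitorConfig {
  bool enabled = true;
  uint64_t default_threshold = 10000;
  std::unordered_map<std::string, uint64_t> per_type_threshold;

  // Returns the threshold for a data type ("hash", "list", "set", "zset"),
  // falling back to the global default when no override is configured.
  uint64_t ThresholdFor(const std::string& type) const {
    auto it = per_type_threshold.find(type);
    return it == per_type_threshold.end() ? default_threshold : it->second;
  }

  bool IsBig(const std::string& type, uint64_t size) const {
    return enabled && size > ThresholdFor(type);
  }
};
```

CheckBigKeyAndLog would then take the data type as a parameter and consult `IsBig(type, size)` instead of a hardcoded constant.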
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- include/pika_server.h (1 hunks)
- src/pika_admin.cc (1 hunks)
- src/pika_server.cc (1 hunks)
- src/storage/src/redis.h (2 hunks)
- src/storage/src/redis_hashes.cc (22 hunks)
- src/storage/src/redis_lists.cc (17 hunks)
- src/storage/src/redis_sets.cc (10 hunks)
- src/storage/src/redis_zsets.cc (25 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: build_on_ubuntu
- GitHub Check: build_on_macos
- GitHub Check: build_on_centos
- GitHub Check: Analyze (go)
🔇 Additional comments (6)
src/storage/src/redis_sets.cc (6)
173-173: LGTM: Consistent monitoring pattern
The CheckBigKeyAndLog call maintains consistency with other set operations for monitoring large keys.
939-939: LGTM: Monitoring before pop operations
Appropriate placement of CheckBigKeyAndLog before performing pop operations.
1057-1057: LGTM: Monitoring before random member access
The CheckBigKeyAndLog call helps track large sets before random member operations.
1132-1132: LGTM: Monitoring before remove operations
Consistent monitoring pattern maintained for remove operations.
1362-1362: LGTM: Monitoring before scan operations
The CheckBigKeyAndLog call helps track large sets during scan operations.
1513-1513: LGTM: Monitoring before TTL operations
Consistent monitoring pattern maintained across expiry-related operations (Expireat, Persist, TTL).
Also applies to: 1553-1553, 1593-1593
```cpp
void CheckBigKeyAndLog(const std::string& key, uint64_t size) {
  static const uint64_t kBigKeyThreshold = 10000;
  if (size > kBigKeyThreshold) {
    std::lock_guard<std::mutex> lock(big_key_access_mutex_);
    big_key_access_count_[key]++;
    std::cerr << "[BIGKEY DETECTED] Key: " << key
              << ", Size: " << size
              << ", Access Count: " << big_key_access_count_[key] << std::endl;
  }
}
```
🛠️ Refactor suggestion
Consider improving the big key monitoring implementation
The current implementation has several areas that could be enhanced:
- The threshold should be configurable rather than hardcoded
- Logging to stderr may not be suitable for production environments
- Consider adding rate limiting to prevent excessive logging
- Add a cleanup mechanism for
big_key_access_count_
to prevent unbounded memory growth
Consider these improvements:
```diff
+// Configurable threshold with a reasonable default
+static const uint64_t kDefaultBigKeyThreshold = 10000;
+uint64_t big_key_threshold_;
+
 void CheckBigKeyAndLog(const std::string& key, uint64_t size) {
-  static const uint64_t kBigKeyThreshold = 10000;
-  if (size > kBigKeyThreshold) {
+  if (size > big_key_threshold_) {
     std::lock_guard<std::mutex> lock(big_key_access_mutex_);
+
+    // Rate limit logging using a token bucket
+    if (ShouldLog()) {
       big_key_access_count_[key]++;
-      std::cerr << "[BIGKEY DETECTED] Key: " << key
-                << ", Size: " << size
-                << ", Access Count: " << big_key_access_count_[key] << std::endl;
+      LOG(WARNING) << "[BIGKEY DETECTED] Key: " << key
+                   << ", Size: " << size
+                   << ", Access Count: " << big_key_access_count_[key];
+    }
+
+    // Cleanup old entries periodically
+    MaybeCleanupBigKeyMap();
   }
 }
```
Committable suggestion skipped: line range outside the PR's diff.
```cpp
std::unordered_map<std::string, int> PikaServer::GetBigKeyStatistics() {
  std::unordered_map<std::string, int> aggregated_stats;

  for (const auto& db_entry : dbs_) {
    auto db_name = db_entry.first;
    auto db_instance = db_entry.second;

    if (db_instance) {
      auto stats = g_pika_server->GetBigKeyStatistics();
      for (const auto& entry : stats) {
        aggregated_stats[entry.first] += entry.second;
      }
    }
  }

  return aggregated_stats;
}
```
Fix infinite recursion in GetBigKeyStatistics
The method has a critical infinite recursion issue where it calls itself recursively on line 814, which will cause stack overflow.
Apply this fix to resolve the recursion:
```diff
 std::unordered_map<std::string, int> PikaServer::GetBigKeyStatistics() {
   std::unordered_map<std::string, int> aggregated_stats;
   for (const auto& db_entry : dbs_) {
     auto db_name = db_entry.first;
     auto db_instance = db_entry.second;
     if (db_instance) {
-      auto stats = g_pika_server->GetBigKeyStatistics();
+      auto stats = db_instance->storage()->GetBigKeyStatistics();
       for (const auto& entry : stats) {
         aggregated_stats[entry.first] += entry.second;
       }
     }
   }
   return aggregated_stats;
 }
```
```diff
@@ -79,6 +79,7 @@ Status Redis::LIndex(const Slice& key, int64_t index, std::string* element) {
   }
   if (s.ok()) {
     ParsedListsMetaValue parsed_lists_meta_value(&meta_value);
+    CheckBigKeyAndLog(key.ToString(), parsed_lists_meta_value.Count());
```
💡 Codebase verification
🛠️ Refactor suggestion
Verification confirms missing CheckBigKeyAndLog call in ListsExpire
The review comment is correct. The CheckBigKeyAndLog function is consistently used across all Redis data type operations, including expire methods, but is missing in ListsExpire. The call should be added after parsing the meta value but before checking the stale state, matching the pattern used in other methods.
The suggested fix in the review comment is accurate:
```diff
 if (s.ok()) {
   ParsedListsMetaValue parsed_lists_meta_value(&meta_value);
+  CheckBigKeyAndLog(key.ToString(), parsed_lists_meta_value.Count());
   if (parsed_lists_meta_value.IsStale()) {
     return Status::NotFound("Stale");
```
🔗 Analysis chain
Verify consistent placement of CheckBigKeyAndLog calls.
The CheckBigKeyAndLog function has been added consistently across all list operations. The function is called at the right location - after parsing the meta value but before checking the stale state, which ensures accurate tracking of large keys before any operation is performed.
However, there's one inconsistency in the ListsExpire method at line 1133 where the CheckBigKeyAndLog call is missing.
Let's verify if this is intentional by checking the implementation in other data structures:
Add the missing CheckBigKeyAndLog call in ListsExpire for consistency:
```diff
 if (s.ok()) {
   ParsedListsMetaValue parsed_lists_meta_value(&meta_value);
+  CheckBigKeyAndLog(key.ToString(), parsed_lists_meta_value.Count());
   if (parsed_lists_meta_value.IsStale()) {
     return Status::NotFound("Stale");
```
Also applies to: 127-127, 163-163, 204-204, 244-244, 280-280, 339-339, 396-396, 441-441, 502-502, 572-572, 705-705, 750-750, 824-824, 1133-1133, 1174-1174, 1212-1212, 1252-1252, 1292-1292
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:

```bash
#!/bin/bash
# Check if other expire methods also skip the CheckBigKeyAndLog call
rg -A 5 "Expire.*\{" | rg -B 5 "CheckBigKeyAndLog"
```

Length of output: 50

Script:

```bash
#!/bin/bash
# Find expire method implementations
ast-grep --pattern 'Expire($$$) {
  $$$
}'
# Also search with ripgrep to be thorough
rg -B 5 -A 10 "Status.*Expire.*\{" src/
```

Length of output: 19891

Script:

```bash
#!/bin/bash
# Check for CheckBigKeyAndLog usage in expire methods
rg -B 5 -A 10 "CheckBigKeyAndLog" src/storage/src/redis_*
```

Length of output: 83683

Script:

```bash
#!/bin/bash
# Check ListsExpire implementation
rg -B 10 -A 10 "ListsExpire.*\{" src/storage/src/redis_lists.cc
```

Length of output: 1730
```diff
@@ -95,6 +95,7 @@ Status Redis::HDel(const Slice& key, const std::vector<std::string>& fields, int
   }
   if (s.ok()) {
     ParsedHashesMetaValue parsed_hashes_meta_value(&meta_value);
+    CheckBigKeyAndLog(key.ToString(), parsed_hashes_meta_value.Count());
```
🛠️ Refactor suggestion
Verify consistent placement of CheckBigKeyAndLog calls.
The CheckBigKeyAndLog function has been added consistently across all hash operations. The placement and usage pattern matches that of the list operations, which is good for maintainability.
However, there's one inconsistency in the HashesExpire method at line 1198 where the CheckBigKeyAndLog call is placed after the stale check, unlike other methods.
Move the CheckBigKeyAndLog call before the stale check for consistency:
```diff
 if (s.ok()) {
   ParsedHashesMetaValue parsed_hashes_meta_value(&meta_value);
   CheckBigKeyAndLog(key.ToString(), parsed_hashes_meta_value.Count());
-  if (parsed_hashes_meta_value.IsStale()) {
-    return Status::NotFound("Stale");
-  } else if (parsed_hashes_meta_value.Count() == 0) {
-    return Status::NotFound();
-  }
+  if (parsed_hashes_meta_value.IsStale()) {
+    return Status::NotFound("Stale");
+  } else if (parsed_hashes_meta_value.Count() == 0) {
+    return Status::NotFound();
+  }
```
Also applies to: 163-163, 204-204, 248-248, 306-306, 400-400, 485-485, 529-529, 569-569, 634-634, 710-710, 786-786, 850-850, 914-914, 992-992, 1063-1063, 1135-1135, 1198-1198, 1239-1239, 1277-1277, 1317-1317, 1357-1357