-
Notifications
You must be signed in to change notification settings - Fork 145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Outlook] Feature: Office365 multi-user support #2844
base: main
Are you sure you want to change the base?
[Outlook] Feature: Office365 multi-user support #2844
Conversation
💚 CLA has been signed |
Introduced BaseOffice365User abstract base class to standardize Office 365 user handling. Added MultiOffice365Users to manage multiple emails from config. Added client_emails (comma-separated) in OutlookDataSource config. Resolved issue with fetching too many users causing SMTP server not found error.
37f7483
to
897fed8
Compare
buildkite test this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! Thanks for submitting a patch! I've left a few comments, but overall I think this is a pretty clean approach and solves a clear ask.
Outside of the below comments, I'd also request that you change the
Closes #2843
in your PR description to use "Relates to" or "Part of", since another part of the issue you filed is that the current solution is limited to 999 users. I expect that's something we can better control with the graph API and another separate configuration, for folks who want "all users", and have more than 999.
@seanstory After rechecking the code, I realized I was mistaken in my initial description. The code isn't limited to fetching the top 999 users—it handles pagination using the @odata.nextLink property from the Microsoft Graph API, which retrieves all users. I missed that detail. I'll update my issue description to reflect the correct behavior. connectors/connectors/sources/outlook.py Lines 411 to 429 in 6e74c3a
|
…Cloud connector - Refactored the method to utilize Microsoft Graph's JSON batching feature. - The method now processes client emails in batches of up to 20 to optimize API requests. - Added error handling for individual user fetch errors within the batch response. - If any batch request fails, the method collects the errors and raises an exception with details.
buildkite test this |
Lint still fails due to async-related issues. Autoformat doesn't solve the lint errors. I'll investigate further after completing test functions. If unable to resolve, I'll seek assistance |
@ilyasabdellaoui do you want to make this PR ready for review, or is there anything we can help with? |
@artem-shelkovnikov Thanks for checking in! I encountered some linting issues, which I've now resolved in my latest commit, particularly related to async iteration. Everything is fixed and ready for review. |
buildkite test this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Left a comment about configurable field to make it consistent with other connectors, but otherwise great change.
Additional question - have you done any benchmarks on how faster the connector gets?
I know that you're saving a lot of time on network, but throttling-wise I believe this change makes not impact, is it true?
connectors/sources/outlook.py
Outdated
"required": False, | ||
"type": "list", | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We normally handle lists like this with *
being default value and *
meaning that everything should be exported.
See this as an example: https://github.com/elastic/connectors/blob/main/connectors/sources/sharepoint_online.py#L1192-L1199
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've applied the suggested changes and updated the logic to the WILDCARD *
when client_emails is not specified
920697a
connectors/sources/outlook.py
Outdated
async def get_users(self): | ||
access_token = await self._fetch_token() | ||
errors = [] | ||
for i in range(0, len(self.client_emails), 20): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be great to make it a constant at least (20). Also why was 20 chosen as batch size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the code to use the GRAPH_API_BATCH_SIZE
constant instead of hardcoding the value 20
, see commit
e4923b5.
The batch size of 20
is due to the max limit set by the Graph API. For ref you can check the Graph API documentation on JSON batching
I haven't run any benchmarks yet, but I expect this change will improve performance by reducing unnecessary user fetches. As for throttling, the client_emails configuration is most effective at avoiding it when the list of emails is small, by limiting the API calls to only the relevant users, it reduces the load on the API. For larger email lists, though, the impact on throttling might not be as important! |
buildkite test this |
Change looks good, but I'll not merge immediately, will address a bit later - we need to do same change in Kibana to make it properly work for native connectors. I'll follow-up and merge both at the same time |
Hi @ilyasabdellaoui, it's on my radar - we currently don't have a running subscription with. Outlook to be able to manually test it, it's ongoing work and we'll get back to this PR once it's been properly tested. |
Thanks for the update @artem-shelkovnikov! Let me know if I can assist with anything to facilitate testing. |
Introduced BaseOffice365User abstract base class to standardize Office 365 user handling. Added MultiOffice365Users to manage multiple emails from config. Added client_emails (comma-separated) in OutlookDataSource config. Resolved issue with fetching too many users causing SMTP server not found error.
Closes #2843
Checklists
Pre-Review Checklist
config.yml.example
)