[Feature Request] Add batching and concurrency options #98
Comments
That is a good suggestion. I had this in mind to implement for the dynamodb service, but I suggest to either merge the options into the second argument, or add some higher-order function like ...

That is most probably the case, but I'm not against customizing clients, especially as it is already the case for client-s3, where I added presigned url overloads as a custom addition: effect-aws/packages/client-s3/src/S3Service.ts, Lines 850 to 862 in eabaa4a
I took a look at the Request/RequestResolver interfaces in the Batching docs. Here I was taking the approach that requests could be made for single entries, and the resolver would submit the batch in a single call. However, I don't seem to be able to get the correct return type, as Request.complete/completeEffect have void returns:

```typescript
class PutEventError {
  readonly _tag = 'PutEventError'
}

interface PutEvent extends Request.Request<PutEventsResultEntry, PutEventError> {
  readonly _tag: 'PutEvent'
  readonly entry: PutEventsRequestEntry
}

const PutEvent = Request.tagged<PutEvent>('PutEvent')

const PutEventResolver = RequestResolver.makeBatched((requests: ReadonlyArray<PutEvent>) =>
  EventBridge.putEvents({ Entries: requests.map(request => request.entry) }).pipe(
    Effect.map(response => response.Entries ?? []),
    Effect.andThen(results =>
      Effect.forEach(requests, (request, index) =>
        Request.completeEffect(request, Effect.succeed(results[index]!))))
  )
)
```

gives:
Updated the code in the previous comment to include request types.
You complete your request when it succeeds, but you also need to complete it in case of errors. However, I see you actually use ...

```typescript
Effect.catchAll((error) =>
  Effect.forEach(requests, Request.completeEffect(Effect.fail(new PutEventError()))),
),
```
Thanks, somehow I wasn't thinking that was important to the outcome! Here is an outline of Batching code for an EventBridge PutEvent (singular):

```typescript
class PutEventError {
  readonly _tag = 'PutEventError'
}

interface PutEventRequest extends Request.Request<PutEventsResultEntry, PutEventError> {
  readonly _tag: 'PutEventRequest'
  readonly entry: PutEventsRequestEntry
}

const PutEventRequest = Request.tagged<PutEventRequest>('PutEventRequest')

const PutEventResolver = RequestResolver.makeBatched((requests: ReadonlyArray<PutEventRequest>) =>
  EventBridge.putEvents({ Entries: requests.map(request => request.entry) }).pipe(
    Effect.map(response => response.Entries ?? []),
    Effect.andThen(results =>
      Effect.forEach(requests, (request, index) =>
        Request.completeEffect(request, Effect.succeed(results[index]!)))
    ),
    Effect.catchAll((_error) =>
      Effect.forEach(requests, Request.completeEffect(Effect.fail(new PutEventError()))),
    ),
  )
).pipe(
  RequestResolver.batchN(10),
  RequestResolver.contextFromServices(EventBridge)
)

const PutEvent = (entry: PutEventsRequestEntry) =>
  Effect.request(PutEventRequest({ entry }), PutEventResolver)
```

And here is example usage, with my previous chunking code commented out:

```typescript
const decodeServiceHandler = (event: unknown, _context: Context) => pipe(
  Schema.decodeUnknown(EventBridgeEventWithUplink)(event),
  Effect.andThen(uplink => decodeUplink(uplink.detail)),
  Effect.andThen(Effect.forEach(({ type, ...payload }) => Effect.gen(function* () {
    const config = yield* DecoderConfig
    return Common.PutEventsRequestEntry.make({
      Source: config.source,
      DetailType: type ?? config.detailType,
      Detail: JSON.stringify(payload),
      EventBusName: config.eventBusArn,
    })
  }))),
  Effect.andThen(entries => Effect.forEach(entries, PutEvent, { batching: true })),
  // Effect.andThen(payloads => Chunk.fromIterable(payloads).pipe(Chunk.chunksOf(10))),
  // Effect.andThen(Effect.forEach(entries => EventBridge.putEvents({ Entries: Chunk.toArray(entries) }), { concurrency: 10 })),
  Effect.as(undefined),
)
```

Observations:

Creating the Request/RequestResolver is quite a lot of boilerplate on top of the existing effect-aws code, and given that it might be hard to determine which services/commands could benefit from batching, it might be easier to create a utility, perhaps as a static on the service, that allows the developer to make the Request query and associated code on demand. Something like:

```typescript
const PutEvent = EventBridge.makeBatching(PutEventsCommand, {
  max: 10,
  window: Duration.milliseconds(100),
  concurrency: 10
})
```

I also think that, if possible, the query effect could present a plural interface:

```typescript
const PutEvents = (entries: ReadonlyArray<PutEventsRequestEntry>) =>
  Effect.forEach(entries, PutEvent, { batching: true })
```

Maybe that's an option to the utility function, as to whether it presents a singular or plural interface. As previously suggested, batching could be added as an option to the existing service methods, but I think I like the idea of keeping those as an 'effect'ual version of the SDK, and supporting batching on an ad hoc basis. I think something would need to be done about accumulating results, or maybe it's sufficient to just return an array of the regular command output.
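As a rough illustration of what such a utility would have to do internally (`makeBatching`, `max` and `window` are only proposed names here, not existing effect-aws API), here is a hedged, dependency-free, Promise-based sketch of the max/window micro-batching semantics: single entries are queued, and a batch is flushed when `max` entries accumulate or `windowMs` elapses, whichever comes first.

```typescript
type Pending<A, R> = {
  entry: A
  resolve: (result: R) => void
  reject: (error: unknown) => void
}

// Hypothetical sketch only, not effect-aws API. `sendBatch` stands in for a
// batched service call such as EventBridge putEvents.
const makeMicroBatcher = <A, R>(
  sendBatch: (batch: ReadonlyArray<A>) => Promise<ReadonlyArray<R>>,
  options: { max: number; windowMs: number }
) => {
  let pending: Array<Pending<A, R>> = []
  let timer: ReturnType<typeof setTimeout> | undefined

  const flush = () => {
    if (timer !== undefined) {
      clearTimeout(timer)
      timer = undefined
    }
    const batch = pending
    pending = []
    if (batch.length === 0) return
    sendBatch(batch.map((p) => p.entry)).then(
      // assumes the service returns one result entry per request, in order
      (results) => batch.forEach((p, i) => p.resolve(results[i]!)),
      (error) => batch.forEach((p) => p.reject(error))
    )
  }

  // the returned function has the singular interface discussed above
  return (entry: A): Promise<R> =>
    new Promise<R>((resolve, reject) => {
      pending.push({ entry, resolve, reject })
      if (pending.length >= options.max) flush()
      else if (timer === undefined) timer = setTimeout(flush, options.windowMs)
    })
}
```

A real implementation on top of Request/RequestResolver would additionally need to honour the `concurrency` option, which this sketch does not model.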
Thinking out loud here - I went back to see how Batching compared to the chunking alternative. I define the features as:

For completeness, here is an implementation of PutEvents that uses Chunks to meet the service limit whilst allowing concurrency:

```typescript
const ChunkedPutEvents = ({ batchSize, concurrency }: { batchSize: number, concurrency: Concurrency }) =>
  (entries: ReadonlyArray<PutEventsRequestEntry>) => {
    const chunks = Chunk.chunksOf(batchSize)(Chunk.fromIterable(entries))
    return Effect.forEach(chunks, chunk => EventBridge.putEvents({ Entries: Chunk.toArray(chunk) }), { concurrency })
  }
```

I'm not familiar enough with Effect or the Batching API to fully understand whether it's capable of providing the concurrency support. I don't see a way to configure concurrency, so I'm assuming there is none. I am also assuming that it will aggregate separate requests until a batch is full, but without the possibility of a batching window to flush downstream when the upstream is slow. I'm somewhat familiar with Mailbox, which might provide the necessary support. It seems possible to have an API similar to ... The mailbox can be converted to a ...
I worked up batching for DynamoDB Get, using BatchGetCommand under the hood. This is less straightforward, as requests need to be grouped by table and then the responses matched back up to the requests. This is far from complete. The main value here was seeing that there is plenty of nuance in how individual AWS clients handle batching, and that might make adding broad support for batching in effect-aws more effort.

```typescript
import { KeysAndAttributes } from "@aws-sdk/client-dynamodb"
import { BatchGetCommandInput, NativeAttributeValue } from "@aws-sdk/lib-dynamodb"
import { DynamoDBDocument } from "@effect-aws/lib-dynamodb"
import { Data, Effect, Equal, Request, RequestResolver } from "effect"

type Item = Record<string, NativeAttributeValue>

type KeyAndAttributes = Omit<KeysAndAttributes, "Keys"> & {
  Key: Item
}

export class GetItemError extends Data.TaggedError('GetItemError')<{
  Key: Item
  cause: Error
}> {}

interface GetItemRequest extends Request.Request<Item, GetItemError>, KeyAndAttributes {
  readonly _tag: 'GetItemRequest'
  readonly TableArn: string
}

export const GetItemRequest = Request.tagged<GetItemRequest>('GetItemRequest')

const GetItemResolver = RequestResolver.makeBatched((requests: ReadonlyArray<GetItemRequest>) =>
  Effect.gen(function* () {
    const groupRequestsByTable = () => {
      type TableArn = string
      const requestsPerTableMap = new Map<TableArn, Set<GetItemRequest>>()
      for (const request of requests) {
        const setOfRequests = requestsPerTableMap.get(request.TableArn) ?? new Set()
        setOfRequests.add(request)
        requestsPerTableMap.set(request.TableArn, setOfRequests)
      }
      return requestsPerTableMap
    }

    const makeBatchGetItemCommand = (requestsPerTableMap: ReturnType<typeof groupRequestsByTable>) => {
      const input: BatchGetCommandInput = { RequestItems: {} }
      for (const [tableArn, tableRequests] of requestsPerTableMap) {
        input.RequestItems ??= {}
        input.RequestItems[tableArn] ??= { Keys: [] }
        for (const request of tableRequests) {
          input.RequestItems[tableArn].Keys!.push(request.Key)
        }
      }
      return input
    }

    const requestsPerTableMap = groupRequestsByTable()
    const input = makeBatchGetItemCommand(requestsPerTableMap)

    const itemMatchesKey = (item: Item, key: Item) => {
      for (const [k, v] of Object.entries(key)) {
        if (!Equal.equals(item[k], v)) {
          return false
        }
      }
      return true
    }

    yield* DynamoDBDocument.batchGet(input).pipe(
      Effect.andThen(response =>
        Effect.forEach(Object.entries(response.Responses ?? {}), ([tableArn, items]) =>
          Effect.forEach(items, item =>
            Effect.filter(requests, request =>
              Effect.succeed(request.TableArn === tableArn && itemMatchesKey(item, request.Key))
            ).pipe(
              Effect.andThen(Effect.forEach(request =>
                Request.completeEffect(request, Effect.succeed(item)))),
            ),
          ),
        ),
      ),
      Effect.catchAll(cause =>
        Effect.forEach(requests, request =>
          Request.completeEffect(request, new GetItemError({
            cause,
            Key: request.Key,
          })),
        ),
      ),
    )
  })
).pipe(
  RequestResolver.batchN(100),
  RequestResolver.contextFromServices(DynamoDBDocument),
)

export const GetItem = (request: GetItemRequest) => Effect.request(request, GetItemResolver)
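The per-table grouping step above is the part that varies between clients. As an aside, that step can be expressed as a small dependency-free helper (the names here are illustrative, not part of effect-aws):

```typescript
// Bucket items under a string key, e.g. GetItemRequests under their TableArn.
// Mirrors the groupRequestsByTable step in the resolver above, minus the effect types.
const groupByKey = <A>(items: ReadonlyArray<A>, key: (item: A) => string): Map<string, Array<A>> => {
  const groups = new Map<string, Array<A>>()
  for (const item of items) {
    const bucket = groups.get(key(item))
    if (bucket === undefined) {
      groups.set(key(item), [item])
    } else {
      bucket.push(item)
    }
  }
  return groups
}
```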
Many of the AWS clients support sending multiple events within a single request, such as SQS and EventBridge, but there are typically quota limits on how many. Additionally, the client is usually configured with a maximum number of concurrent requests, typically defined by the HTTP agent (tends to be 50).

I have often found myself repeating the process of taking some iterable of events, chunking it according to the service limit per request, and then handling the requests serially or with some degree of concurrency.
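The repeated chunk-then-dispatch pattern described above can be sketched without any AWS types; `batchSize` stands for the per-request quota and `concurrency` for the in-flight request limit (an illustrative sketch, not effect-aws API):

```typescript
// Split an array into chunks of at most `batchSize` elements.
const chunksOf = <A>(items: ReadonlyArray<A>, batchSize: number): Array<ReadonlyArray<A>> => {
  const chunks: Array<ReadonlyArray<A>> = []
  for (let i = 0; i < items.length; i += batchSize) chunks.push(items.slice(i, i + batchSize))
  return chunks
}

// Send chunks `concurrency` at a time; `sendBatch` stands in for a batched
// service call such as SQS SendMessageBatch or EventBridge PutEvents.
const sendAll = async <A, R>(
  items: ReadonlyArray<A>,
  batchSize: number,
  concurrency: number,
  sendBatch: (batch: ReadonlyArray<A>) => Promise<R>
): Promise<Array<R>> => {
  const chunks = chunksOf(items, batchSize)
  const results: Array<R> = []
  for (let i = 0; i < chunks.length; i += concurrency) {
    const window = chunks.slice(i, i + concurrency)
    results.push(...(await Promise.all(window.map(sendBatch))))
  }
  return results
}
```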
It would be useful if the generated Effect clients had additional options to control this, though I appreciate this might then affect response handling (success and errors). I have tended to rely on errors being returned to the invoker, so that it takes advantage of the inherent retries and DLQ handling where configured, rather than attempting to implement retry logic at the client level.

I'm not sure if it's possible for the code gen to know what the service limits are, and which commands they would apply to. There are additional service limits, such as payload size, to also consider, although calculating those can, I think, be service specific and perhaps much harder to implement generically.

For example: