Problem
I have been using PHP ETL and the Parallel extension's functional API to ingest large CSV datasets. It has been reliable and fast, but several hundred database connections can remain open until the ingest is complete. I looked through this ETL package's source code but found no methods or documentation for removing connections.
For context, this is how the ingest script works:

1. Stream the contents of the CSV file into a temporary `$batch` array.
2. Once 5K rows have been loaded into the batch, push a task into the `$tasks` array.
3. At the end of the file, load the remaining batch into a task.
4. Each task runs ETL in a parallel process using a closure:
   - Initialize a `new ETL` instance
   - `addConnection()` for each target database
   - `extract()` & `transform()` the batch data
   - `load()` the batch into the DB
   - `run()` ETL
   - `unset($etl)`
   - `return` (also tried `exit`)
5. Await completion of all tasks.
6. Move to the next dataset...
Though I can't share any actual code, those closure bullet points are essentially what happens.
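A minimal sketch of the batching steps above, in plain PHP. The function name `batchCsv()` and the `$dispatch` callback are hypothetical: `$dispatch` stands in for pushing a parallel task, so the sketch runs without the Parallel extension or the ETL package.

```php
<?php
// Hypothetical sketch: stream CSV rows into $batch and hand off every full
// batch (plus the final partial one) to $dispatch.

/**
 * @param resource $handle    Open, readable CSV stream.
 * @param int      $batchSize Rows per batch (5000 in the ingest script).
 * @param callable $dispatch  Receives each batch as an array of rows.
 * @return int                Number of batches dispatched.
 */
function batchCsv($handle, int $batchSize, callable $dispatch): int
{
    $batch = [];
    $dispatched = 0;

    // An empty $escape disables PHP's non-standard backslash escaping.
    while (($row = fgetcsv($handle, null, ',', '"', '')) !== false) {
        $batch[] = $row;
        if (count($batch) === $batchSize) {
            $dispatch($batch);   // e.g. push a parallel task
            $batch = [];         // start a fresh batch
            $dispatched++;
        }
    }

    // End of file: load the remaining rows into one last task.
    if ($batch !== []) {
        $dispatch($batch);
        $dispatched++;
    }

    return $dispatched;
}
```

With ext-parallel loaded, the callback could do something like `$tasks[] = \parallel\run($closure, [$batch]);`, and awaiting completion would then mean calling `->value()` on each `\parallel\Future` in `$tasks`. Here `$closure` is the per-batch ETL closure described above.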
Inside the ETL closure, I have tried both `exit` and `return` after `$etl->run()` completes, and I have tried unsetting the ETL instance in the closure. Still, the processes and DB connections remain open.
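One possible explanation, offered only as an assumption since the package internals aren't shown here: if the connections live in shared state (a static property or a service container) rather than on the ETL object itself, `unset($etl)` removes only the reference to the ETL object, and the PDO objects stay alive. `FakeEtl`, `ConnectionRegistry` and `FakeConnection` are hypothetical stand-ins:

```php
<?php
// Hypothetical stand-ins; none of these classes come from the ETL package.

final class FakeConnection
{
    public static int $open = 0;                    // number of live "connections"
    public function __construct() { self::$open++; }
    public function __destruct() { self::$open--; } // mimics PDO closing on destroy
}

final class ConnectionRegistry
{
    /** @var array<string, FakeConnection> shared by all FakeEtl objects */
    public static array $connections = [];
}

final class FakeEtl
{
    public function addConnection(string $name): void
    {
        // Stored in static state, not on $this, so it outlives the FakeEtl object.
        ConnectionRegistry::$connections[$name] = new FakeConnection();
    }
}

$etl = new FakeEtl();
$etl->addConnection('target');
unset($etl);   // frees the FakeEtl object only; the connection stays open
```

Clearing `ConnectionRegistry::$connections` afterwards drops the last reference, and only then does the stand-in connection's destructor run.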
Documentation for the Parallel extension could be more robust.
Request
The `Manager` class would benefit from a `removeConnection` or `destroyConnection` method that removes the connection from `$connections`. Would that terminate the PDO connection?
I'm happy to open a PR if this would work. I would also welcome advice on using persistent connections with ETL and Parallel.
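A minimal sketch of what the requested method could look like. `SketchManager` and `StubConnection` are hypothetical stand-ins, not the package's `Manager` or PDO; the destructor counter simulates PDO closing its connection when the object is destroyed.

```php
<?php
// Hypothetical sketch of the requested method; SketchManager and
// StubConnection are illustrative stand-ins for the package's Manager and PDO.

final class StubConnection
{
    public static int $closed = 0;
    public function __destruct() { self::$closed++; } // where PDO would disconnect
}

final class SketchManager
{
    /** @var array<string, StubConnection> */
    private array $connections = [];

    public function addConnection(StubConnection $connection, string $name = 'default'): void
    {
        $this->connections[$name] = $connection;
    }

    /** Drops the manager's reference; returns false for an unknown name. */
    public function removeConnection(string $name): bool
    {
        if (!array_key_exists($name, $this->connections)) {
            return false;
        }
        // If this was the last reference, PHP destroys the object right away.
        unset($this->connections[$name]);
        return true;
    }
}
```

This closes the connection only if the manager held the last reference; any other variable or object still pointing at the same PDO instance keeps it open. Also note that a connection opened with `PDO::ATTR_PERSISTENT => true` is, by design, not closed when the object is destroyed: PHP caches it and reuses it for later connections with the same credentials.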
The implementation is very basic: it removes any reference to the connection, so the garbage collector could remove it on its next pass.
Official PHP Doc:
The connection remains active for the lifetime of that PDO object. To close the connection, you need to destroy the object by ensuring that all remaining references to it are deleted--you do this by assigning NULL to the variable that holds the object. If you don't do this explicitly, PHP will automatically close the connection when your script ends.
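A quick way to check the behavior described in that quote is `WeakReference` (PHP 7.4+), which observes an object without keeping it alive. `FakePdo` is a stand-in so the sketch runs without a database; the same probe should work on a real `PDO` instance, for example to confirm inside a task that a connection was actually released.

```php
<?php
// Checking that an object really was destroyed, using WeakReference (PHP 7.4+).
// FakePdo is a hypothetical stand-in so the sketch runs without a database.

final class FakePdo {}

$dbh   = new FakePdo();
$alias = $dbh;                         // a second reference, e.g. held by a manager
$probe = WeakReference::create($dbh);  // does not keep the object alive

$dbh = null;
$stillAlive = $probe->get() !== null;  // true: $alias still points at it

$alias = null;
$destroyed = $probe->get() === null;   // true: the last reference is gone
```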