-
Notifications
You must be signed in to change notification settings - Fork 861
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rows.Close() getting all result rows #2153
Comments
When you encounter a scanning error inside the According to the PostgreSQL wire protocol, all remaining messages must be consumed before a new query can be processed (actually PostgreSQL can accept new query, but to consume new query results you still need to consume all previous messages). If you really want to stop receiving data, probably your best option is to terminate the connection to the server. |
It is not true. In C driver you have PGCancel |
As mentioned in the PostgreSQL documentation (https://www.postgresql.org/docs/current/protocol-flow.html#PROTOCOL-FLOW-CANCELING-REQUESTS):
Once you're receiving query results, the query is already done, so there’s nothing left to cancel. |
You can still use same API in pgx: https://pkg.go.dev/github.com/jackc/pgx/v5/pgconn#PgConn.CancelRequest However, as mentioned earlier, cancellation only affects queries that are still in progress |
I understand ,what you are talking about. When I work with API Query -> rows.Next() should I close rows by rows.Close() or not? |
It's generally better to explicitly use But you can let pgx to manage it for you, see Lines 421 to 447 in 2ec9004
Here's how you use this API: myRows, err := pgx.CollectRows(rows, pgx.RowToAddrOfStructByName[Sometype]) |
Here's how do I wrap this API usage in my personal app: type Queryer interface {
Query(ctx context.Context, sql string, args ...any) (pgx.Rows, error)
Exec(ctx context.Context, sql string, args ...any) (pgconn.CommandTag, error)
}
func PgxFetchRowsPtrCtx[T any](ctx context.Context, pool Queryer, sql string, values ...any) ([]*T, error) {
rows, _ := pool.Query(ctx, sql, values...)
res, err := pgx.CollectRows(rows, pgx.RowToAddrOfStructByNameLax[T])
if err != nil {
return nil, err
}
return res, nil
}
// Then in your app you write something like that
rows, err := utils.PgxFetchRowsPtrCtx[yourRowType](ctx, r.pgxPool, sql, query bindings goes here) Such wrapper can make your life even easier |
Then I have to admit, that if you have a case like mine, API requires additional clarification. I have a query which returns about 40 000 000 rows (10 GB) which I need to process. I don't want to load it into memory (I don't have much), and my question is not about deserialize data to a structure. Let's imagine I have some kind of processing and in the middle of processing I have whatever error and decide to cancel processing and close the rows, I have to wait until rows.Close() read all the rows for nothing, just several GB of data through network just because we need to be a connection in a particular state, instead of cancel a query and wait until DB really stopped the stream. As we work all in any DB IDE, we run a query, ups we need to cancel it, we do, and then we run another query with the same connection. The solution for the case with huge queries is:
Is it right? In all of those words, I don't like the answer "You can close the connection". This is how I think it should work, in the function rows.Close(), we have to work as in C, we send a message to the server cancel a query and wait for What do you think? |
It seems there's some misunderstanding about query processing. Once you start receiving rows, the query has already finished on the PostgreSQL server. You can't cancel it because nothing is running on the PostgreSQL server, at that point - it's simply sending results to your application. P.S. Querying 40 000 000 rows at once might not be ideal. PostgreSQL may not fit such a large result set in the connection's RAM ( |
(Nit: that's only true if you're not using a [1] https://www.postgresql.org/docs/current/sql-declare.html |
What you are talking about?
Are you for real? Who returns rows then?
until we get CompleteResponse query is on the server And there is no difference between cursor and select internally in the DB. Pagination is not related problem I described. only meaningful thing in our discussion was : to resolve a bug, yes I belive it is a bug: func (rr *ResultReader) Close() (CommandTag, error) {
if rr.closed {
return rr.commandTag, rr.err
}
rr.closed = true
// solution
err := rr.pgConn.CancelRequest(context.Background())
if err != nil {
fmt.Println("can't deliver a cancelation")
}
//
for !rr.commandConcluded {
_, err := rr.receiveMessage()
if err != nil {
return CommandTag{}, rr.err
}
} As I said, request is not finished, we send a signal to cancel, wait 'rr.commandConcluded' and driver even returned right message: By the way connection stays and ready to do another query. if your query is finished successfully I believe rr.closed will work We just have to repeat What do you think? |
I see your point about the cancellation request, I did a simple test with PostgreSQL 15: postgres=# create table some_tbl (id bigint PRIMARY KEY);
CREATE TABLE
postgres=# INSERT INTO some_tbl SELECT s FROM generate_series(1, 40000000) s;
INSERT 0 40000000 Then I wrote a Go app to test cancelling a request during result set retrieval: Go Codepackage main
import (
"context"
"github.com/jackc/pgx/v5"
"log"
)
func main() {
conn, err := pgx.Connect(context.Background(), "postgresql://postgres:[email protected]:5432/postgres?sslmode=disable")
if err != nil {
panic(err)
}
log.Println("Sending query")
rows, err := conn.Query(context.Background(), "select * from some_tbl ORDER BY id")
if err != nil {
panic(err)
}
rows.Next()
log.Println("First row has been arrived")
var num int64
if err := rows.Scan(&num); err != nil {
log.Fatal("Failed to scan row", err)
}
// Now we sending cancel request
if err := conn.PgConn().CancelRequest(context.Background()); err != nil {
log.Println("Failed to cancel request", err)
}
log.Println("After cancel request has been sent")
var n int
// And we still receiving rows after cancellation request was sent
for rows.Next() {
var num int64
if err := rows.Scan(&num); err != nil {
log.Println("Failed to scan row", err)
break
}
n++
}
log.Printf("Rows loop ended, rows received=%d err=%v", n, rows.Err())
} After sending a cancel request, rows continued to be received, but eventually the cancellation took effect with this output:
So, you're right - the query is still considered in progress while streaming results. It’s fine to send a CancelRequest during this stage to stop reading all 40 million rows. |
While your approach of placing CancelRequest inside the rows.Close logic seems reasonable for your use case, it could introduce a breaking change. This could result in exceeding the PostgreSQL connection limit. So I think it's better to keep the CancelRequest in application logic rather than in the driver. |
Thank you. Maybe we consider adding it to the driver, just ignore the error, I mean, if there are no connections, and we can't send the cancelation signal, doesn't matter, wait for all rows. The cancelation is just one signal and sending connection should be closed. It is quite fast operation. I think a situation when we cancel queries in the same time less probable, than the problem which causes by waiting until you get all the rows. Maybe discussion with other Contributors? |
As mentioned by others above Unfortunately, it is somewhere between very difficult to impossible for the driver to handle query cancellation properly in every case. Here is a subset of the issues: How to ensure the proper query is cancelled?Query execution and the cancellation signal are parallel operations. It is possible for a race condition to occur.
Batch queries and pipeline mode are even more difficult. How long to wait for cancellation?We have no way of knowing if or when the cancellation signal will take affect. How long should the blocked Go code wait? pgx had this quandary for what to do with context cancellation. By default pgx chooses to unblock the Go code as soon as possible at the expense of closing the connection. pgx sends the cancellation signal to avoid wasting server resources, and also immediately closes the underlying net.Conn. In a previous version of pgx, we tried to automatically recover to avoid closing connections unnecessarily, but the logic was too tricky to handle reliably and there were certain cases where the connection was guaranteed to be broken (interrupted TLS connections) so application code still needed to handle context cancellation killing the connection. In current pgx, you can reconfigure context cancellation logic if you are really ambitious. I doubt that is the right approach for you though. All that said, unless you specifically need to use a single, stateful connection then what I would recommend is to use a pgxpool and use context cancellation. Everything will just work. If for some reason you really need to use a single connection then you would need to manually call CancelRequest yourself. |
@jackc What I have to do in this situation? Solution for my previous situation was to terminate connection, Ok, understandable, not ordinary situation, huge query, consider that if query is huge then connection belongs to query. But in this situation? It is really another good question, Why do we close Rows if we can't scan it? but it is not related. |
OK, My personal problem solved, by server side import (
"context"
"fmt"
"github.com/jackc/pgx/v5"
)
type Cursor struct {
Name string
SQL string
FetchSize int
Tx pgx.Tx
count int
open bool
closed bool
}
const (
CURSORDECLARATIONSQL = "DECLARE %s CURSOR FOR %s"
FETCHSQL = "FETCH %d FROM %s"
CLOSESQL = "CLOSE %s"
)
func NewCursor(name, sql string, fetchSize int, tx pgx.Tx) *Cursor {
return &Cursor{
Name: name,
SQL: sql,
FetchSize: fetchSize,
Tx: tx,
}
}
func (c *Cursor) Open() error {
if c.open {
return nil
}
if c.closed {
return fmt.Errorf("Cursor already closed")
}
_, err := c.Tx.Exec(context.Background(), c.DeclareSQL())
if err != nil {
return err
}
c.open = true
return nil
}
func (c *Cursor) Fetch() (pgx.Rows, error) {
if !c.open {
return nil, fmt.Errorf("cursor is not open")
}
if c.closed {
return nil, fmt.Errorf("cursor is already closed")
}
return c.Tx.Query(context.Background(), c.FetchSQL())
}
func (c *Cursor) Close() error {
if !c.open {
return fmt.Errorf("cursor is not open")
}
if c.closed {
return nil
}
c.open = false
c.closed = true
_, err := c.Tx.Exec(context.Background(), c.CloseSQL())
return err
}
func (c *Cursor) DeclareSQL() string {
return fmt.Sprintf(CURSORDECLARATIONSQL, c.Name, c.SQL)
}
func (c *Cursor) FetchSQL() string {
c.count++
return fmt.Sprintf(FETCHSQL, c.FetchSize, c.Name)
}
func (c *Cursor) CloseSQL() string {
return fmt.Sprintf(CLOSESQL, c.Name)
}
func (c *Cursor) Count() int {
return c.count
} You Question about, Why we close rows inside |
IMO it's what the developer would expect in most cases and it's convenient. A It is a convenience because it means that in many cases it is not necessary to check the error from The only case I can think of where a Finally, if you really want to scan without risk of the rows being closed by an error you can use https://pkg.go.dev/github.com/jackc/pgx/v5#ScanRow. |
In this case I completely disagree, I don't expect func ImpossibleToDoWithoutGettingAllRows1() error {
-----
rows, err := conn.Query(`SELECT...`)
if err!= nil {
return err
}
defer rows.Close()
for rows.Next() {
err := rows.Scan(.....)
/// I have to wait here, get all the rows even if it is several Gigabyte
/// instead It could be immediately done by Cancelling
/// and yes , I remember what you said before but it is my problem as a developer
/// to be concerned about race conditions
if err != nil {
slog.Error("error scanning row", "err", err)
_ = conn.PgConn().CancelRequest(context.Background())
return err
/// could be return err - which also closes rows implicitly
}
//------ Some Processing -------
}
if rows.err != nil {
return err
}
return nil
} or func ImpossibleToDoWithoutGettingAllRows2() error {
-----
rows, err := conn.Query(`SELECT...`)
if err!= nil {
return err
}
// I want to close the connection thereby terminate a query
// As Postgres documentation says
defer conn.Close()
for rows.Next() {
err := rows.Scan(.....)
/// I'm waiting and waiting here .....
if err != nil {
slog.Error("error scanning row", "err", err)
return err
}
//------ Some Processing -------
}
if rows.err != nil {
return err
}
return nil
} What I really like in At the end I would say that closing rows in the Scan function is wrong because it doesn't bring you any benefits and I mean it, any. Because, you still have in you function a code like this:
which does exactly it. And when you write an application with DB interaction, not always DB belongs to you, one day field is 'NOT NULL' and you scan it to int, other day UPS it is NULL. But you app fairly passed all ... GB or TB of data through network without reason. Of course I believe it is huge change, because some developers might rely on it by not Closing rows at the end of the function. I understand that if you write an app where a query returns 100 or even 1000 rows you don't care about it, but I work with 40_000_000 rows (it is one day by the way) and about 10 GB of compressed data, maybe that is why I brought this topic. I don't remember exactly but version 4, worked in a different way, didn't it? |
This behavior goes all the way back to v2 when the But a good thing about pgx's design is that pgx doesn't have to change for you to get the behavior you want. pgx is designed in a layered fashion where you can usually drop down a layer if you need something different than pgx provides out of the box. As I mentioned above you can use https://pkg.go.dev/github.com/jackc/pgx/v5#ScanRow to do what you want. And if that didn't work you could go down to the pgconn layer, a lower level library similar to the C libpq library. If that didn't work you could go down to the pgproto3 layer and directly read and write PostgreSQL protocol messages. And finally, if even that wasn't sufficient, you can get access to the underlying |
I have a part of code like this
but when I get an error, app tries to close rows and it is stuck in a loop where it is getting all result rows, which I don't need anymore. Particularly here:
pgconn.go 1593
and If your query returns huge amount of rows you just wait until all rows are read.
Do I do something wrong?
Shouldn't I close the rows? Or ..?
I expect that
rows.Close()
just stops the query in the database and cleans the memory.Version:
v5.7.1
The text was updated successfully, but these errors were encountered: