Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add all_ecocredit_txes function #34

Merged
merged 1 commit into from
Aug 17, 2023
Merged

Conversation

wgwz
Copy link
Contributor

@wgwz wgwz commented Aug 15, 2023

Closes: #32

@wgwz
Copy link
Contributor Author

wgwz commented Aug 15, 2023

While I was testing locally, it seemed like no matter how I wrote a test query, the index was not being used.
I learned that postgres does not always use indexes, in cases where it does not need to (postgres smart, me dumb):
https://www.pgmustard.com/blog/why-isnt-postgres-using-my-index
And also, there are some rules about the types of queries you can write involving GIN indexes on JSONB columns:
https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING

So after writing the query properly and disabling seqscan (using set enable_seqscan=false), the indexes are working locally.
here's a query filtering on the 'code' in the json:

select data -> 'tx_response' -> 'txhash' as txhash, data -> 'tx_response' -> 'code' as code, data -> 'tx_response' -> 'tx' -> '@type' as type from tx where data ->'tx_response'->'code' @> '0'

And here's a screenshot showing the explain/analyze results for the query (which shows the index being used):
Screen Shot 2023-08-15 at 3 33 01 PM

@wgwz
Copy link
Contributor Author

wgwz commented Aug 15, 2023

TL;DR due to a limitation in the connection filter plugin, i think we need to write a custom function to expose as a graphql query.

While setting this up and testing the integration with regen-server, I think I've found that using the connection filter plugin will not work for us in this case.
The problem is that the connection filter does not generate a SQL query that properly makes use of the GIN indexes.
Here's how I have come to that conclusion.

  1. Run a graphql query utilizing the connection filter plugin
{
  allTxes(filter: {data: {contains: {tx_response: {code: 0}}}}) {
    nodes {
      hash
    }
    totalCount
  }
}
  1. Observe the raw SQL query as postgresql shows in it's log files
2023-08-15 19:33:02.128 EDT [5606] STATEMENT:  with __local_0__ as (select to_json((json_build_object('hash'::text, (__local_1__."hash")))) as "@nodes" from (select __local_1__.*                                                                                                                                                          
        from "public"."tx" as __local_1__                                                                                                                                                                                                                                                                                                   
        where (((__local_1__."data" @> $1))) and (TRUE) and (TRUE)                                                                                                            order by __local_1__."chain_num" ASC,__local_1__."block_height" ASC,__local_1__."tx_idx" ASC                                                                  
                                                                                                                                                                      
        ) __local_1__), __local_2__ as (select json_agg(to_json(__local_0__)) as data from __local_0__) select coalesce((select __local_2__.data from __local_2__), '[
]'::json) as "data", (                                                                                                                                                
          select json_build_object('totalCount'::text, count(1))                                                                                                                from "public"."tx" as __local_1__                                                                                                                           
          where (((__local_1__."data" @> $1)))                                                                                                                                ) as "aggregates"                                                                                                                                             
2023-08-15 19:33:02.128 EDT [5606] LOG:  execute DDsfnUKoGgwtKn4U3KkDyZ9+Aas=: with __local_0__ as (select to_json((json_build_object('hash'::text, (__local_1__."hash")))) as "@nodes" from (select __local_1__.*                                                                                                                          
        from "public"."tx" as __local_1__                                                                                                                             
                                                                                                                                                                      
        where (((__local_1__."data" @> $1))) and (TRUE) and (TRUE)                                                                                                    
        order by __local_1__."chain_num" ASC,__local_1__."block_height" ASC,__local_1__."tx_idx" ASC                                                                                                                                                                                                                                        
        ) __local_1__), __local_2__ as (select json_agg(to_json(__local_0__)) as data from __local_0__) select coalesce((select __local_2__.data from __local_2__), '[]'::json) as "data", (       
          select json_build_object('totalCount'::text, count(1))                                                                                                                from "public"."tx" as __local_1__                                        
          where (((__local_1__."data" @> $1)))                                                                                                                        
        ) as "aggregates"                                                                                                                                             
2023-08-15 19:33:02.128 EDT [5606] DETAIL:  parameters: $1 = '{"tx_response": {"code": 0}}'
  1. Copy and paste the generated SQL and substitute the parametrized value into a database IDE, and run the query with explain analyze and observe that the tx_data_tx_response_code_idx is not used.

Screen Shot 2023-08-15 at 7 39 28 PM

This can be reproduced with this minimal:

  • Query 1: returns correct result but does not use the index (same as generated query)
select data -> 'tx_response' -> 'txhash' as txhash, data -> 'tx_response' -> 'code' as code, data -> 'tx_response' -> 'tx' -> '@type' as type from tx where data @> '{"tx_response": {"code": 0}}';
  • Query 2: returns correct result and uses the GIN index
select data -> 'tx_response' -> 'txhash' as txhash, data -> 'tx_response' -> 'code' as code, data -> 'tx_response' -> 'tx' -> '@type' as type from tx where data ->'tx_response'->'code' @> '0'

I think the best path forward is to just a custom query, and since there's probably enough details here, open an issue in the connection filter plugin and see if it's a problem they would like to solve.

@wgwz wgwz force-pushed the kyle/32-index-code-and-type branch 2 times, most recently from e782de2 to fee80d2 Compare August 16, 2023 20:20
@wgwz wgwz changed the title feat: index tx.data tx_response code and type feat: add all_ecocredit_txes function Aug 16, 2023
@wgwz wgwz force-pushed the kyle/32-index-code-and-type branch from fee80d2 to 1567827 Compare August 16, 2023 20:25
@wgwz
Copy link
Contributor Author

wgwz commented Aug 16, 2023

Screen Shot 2023-08-16 at 4 26 13 PM

With the latest code and new all_ecocredit_txes in this branch, we can see that the added indexes are being utilized as expected in my local development environment.

Next step will be to deploy these changes to the staging indexer, and test it as a graphql query on the regen-server staging environment.

If there are performance issues with this approach, we might be able to get better performance by switching the newly added msg_event_type_idx to be a GIN index rather than a btree index (ref1, ref2 ).

@wgwz wgwz marked this pull request as ready for review August 16, 2023 21:04
@wgwz wgwz requested a review from a team August 16, 2023 21:04
Copy link
Member

@ryanchristo ryanchristo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK. This looks good to me. As discussed, we may need to account for different chains if and when we move forward with #33. I made a note in the issue. And thanks for the walkthrough on the architecture call today!

@wgwz
Copy link
Contributor Author

wgwz commented Aug 17, 2023

Just noting that I successfully tested this in the regen-server staging environment.
I'm going to go ahead, merge and deploy this to production now to start evaluating the performance there.

@wgwz wgwz merged commit ba1e2e9 into main Aug 17, 2023
@wgwz wgwz deleted the kyle/32-index-code-and-type branch August 17, 2023 20:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Set up indexer to enable improved filtering with allTxes query
2 participants