Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing creator transaction hashes after migration #1573

Closed
manuelwedler opened this issue Aug 16, 2024 · 6 comments
Closed

Missing creator transaction hashes after migration #1573

manuelwedler opened this issue Aug 16, 2024 · 6 comments
Assignees

Comments

@manuelwedler
Copy link
Collaborator

We found that some contracts that had a creator-tx-hash.txt in the /files/any/ endpoint shown in our old kubernetes deployment didn't have it after the migration for our new setup. The transaction_hash column in contract_deployments is simply NULL for these cases.

I investigated two examples:

0x00000000219ab540356cBB839Cbe05303d7705Fa on chain 1

image

The logs show that there was a creatorTxHash found but fetching the creation bytecode failed. The latter probably happened due to an unresponsive RPC.

We should still store the creatorTxHash in these cases. This is fixed by #1572

0x4E7095a3519A33dF3D25774c2F9D7a89eB99745D on chain 11155111

image

Here the log is a bit misleading. The fetching of the creatorTxHash failed actually because no creatorTx is in the jsonPayload. Therefore, lib-sourcify also didn't match with creation tx. I fixed the logging also in the above PR.

So in this case the cause was probably the Etherscan API being unresponsive.

Database problem

What still remains is that we probably have a lot of contracts after the migration for which we are missing the creation tx hashes. I queried the DB and it shows that there are 3.7M contract_deployments without tx_hash and 1.4M with the tx_hash. This seems a lot without tx hash to me. Unfortunately, we cannot know how it was before because we have never recovered the old production DB fully.

Two options two fix this problem now:

  • We run the migration again and maybe add some checks for the transaction hashes to make sure they are fetched and stored.
  • We go through the old repo, check the creator-tx-hash.txt files and add them manually to the database if not present yet.

A side note: There are a probably more problems after the migration. We should check the database more thoroughly. An unresponsive RPC could also have caused a change in the match status after the migration.

@kuzdogan
Copy link
Member

If we go with the second option we are going to have a dirty DB, that is contracts will have a creatorTxHash but no creation bytecodes. I think we should re-verify them.

We'll need a script that needs to connect to the both filesystems, the current one on GCP and the legacy one running on Argo. This script needs two lists:

  1. The contracts that don't have a creationTxHash on the DB.
  2. The contract that have a creator-tx-hash.txt in the legacy Argo repo

Then it should take an intersection of these two lists (address+chainId as identifiers). This would give us the full list that needs modification and also the number of contracts affected. We can write this list of contracts in a new database instance, a local one etc. to mark off the ones that are verified.

Finally we send these contracts to re-verification on the migration instance (the instance container image needs to be updated) and have them re-verified. Bear in mind we need a small change in the code to trigger updating the row of that contract is it goes from a null match to a partial.

Also before all of this review #1572

@marcocastignoli
Copy link
Member

marcocastignoli commented Aug 19, 2024

I think we can optimize this process by:

  1. creating a dedicated sourcify_sync like table (sourcify_sync_tx_hash with additional transaction_hash field) on prod database and pushing all the contracts that have a creator-tx-hash.txt from the legacy Argo repo
  2. Do a query to compare contract_deployments.transaction_hash with sourcify_sync_tx_hash.transaction_hash for each contract finding all the contracts that don't have transaction_hash in contract_deployments but they have it in sourcify_sync_tx_hash
  3. For each of these contracts copy creator-tx-hash.txt from argo repo to gcp repo
  4. Update those contracts contract_deployments.transaction_hash with an update query filtering using the query above

@kuzdogan
Copy link
Member

Yes we have to do 1. and 2. regardless. Let's do that. Just not sure if using the prod database is a good idea. Maybe create a user that only has read rights to the main tables but writes just to the new table.

Regardin the rest, I'm not sure if it's a good idea to accept the creatorTxHash even if we can't fetch the Transaction from this hash. That's basically what this PR is doing.

First thing is, we are not checking if this txHash is indeed the txHash that created this contract. Yes we get this from a somewhat reliable resource (Etherscan, Blockscout etc.) but I think we should (not must) check this. How we do this now is by fetching the Transaction object and checking if the 'new contract address algorithm', that is the deployer address + nonce yields the address of the contract being verified.

Second thing is, even if we decide it's ok to write the creatorTxHash without double checking, we still need the values deployer, block_number, and tx_index of the contract. These are available only if we fetch the Transaction via this hash. I don't think it would make sense if we only write the creatorTxHash but not these values.

What we can do maybe is we can fail the whole verification if we can't fetch the Transaction from the expected API like Etherscan, instead of saving this contract as just a "runtime match". The verification will fail and it needs to be submitted again. In that case (or actually regardless) we should monitor the errors in fetching Transaction's and get notified if e.g. we are hitting Etherscan API limits.

@marcocastignoli
Copy link
Member

Regardin the rest, I'm not sure if it's a good idea to accept the creatorTxHash even if we can't fetch the Transaction from this hash. That's basically what this PR is doing.

I agree we should not accept creatorTxHash regardless. From my POV, we should implement a new service that goes through the database and tries to re-verify all the contracts that don't have creation information. But I'm wondering if it makes sense to first implement a more reliable way to fetch the creation bytecode, for example, by starting to index all creation bytecode.

@kuzdogan
Copy link
Member

kuzdogan commented Dec 19, 2024

I'm looking to backfill this data by re-verifying those contracts with missing tx hashes, and consequently without creation_matches.

I'll report progress here and leave some queries for future reference:

Query to select contracts without a `transaction_hash`
SELECT 
    vc.id as verification_id,
    vc.created_at,
    vc.created_by,
    cd.chain_id,
    cd.address,
    cd.transaction_hash,
    cd.block_number,
    cd.transaction_index,
    vc.creation_match,
    vc.runtime_match
FROM verified_contracts vc
JOIN contract_deployments cd ON vc.deployment_id = cd.id
WHERE cd.transaction_hash = '\x'::bytea
   OR cd.transaction_hash IS NULL
ORDER BY cd.chain_id, cd.block_number
limit 100;
Missing `transaction_hash`es per chain
SELECT 
    cd.chain_id,
    COUNT(*) as missing_tx_hash_count
FROM verified_contracts vc
JOIN contract_deployments cd ON vc.deployment_id = cd.id
WHERE cd.transaction_hash = '\x'::bytea 
   OR cd.transaction_hash IS NULL
GROUP BY cd.chain_id
ORDER BY missing_tx_hash_count DESC;
Result of above, missing `transaction_hashes` per chain in production
chain_id missing_tx_hash_count
10 1749473
1 496517
42161 426920
11155111 236064
80001 207305
5 152755
369 131314
4 124177
3 76503
8453 66197
84531 49830
421613 20584
137 20055
97 12200
69 11221
17000 7819
44787 6344
420 5238
421611 4212
1116 1611
2222 1585
42220 1490
77 1149
7700 866
80002 667
11297108099 547
62320 491
288 420
28 414
84532 406
7001 360
9001 299
71401 262
1287 248
250 209
9996 206
34443 203
42261 199
11297108109 198
7672 170
1313161555 167
9000 160
11155420 142
25 142
314 136
33111 136
1115 115
7000 103
1001 102
10200 99
71402 96
122 92
22776 85
23294 76
7668 73
888 68
999 66
42170 60
1291 59
32769 56
200901 56
106 51
56 50
51 50
200810 50
33101 50
37714555429 49
48900 45
41 44
8217 44
48899 39
660279 38
314159 36
62621 33
35443 33
35441 32
295 31
1030 28
420666 28
212 22
23295 21
10242 20
62831 19
2221 19
690 18
841 18
336 16
4157 15
99 15
17069 14
167005 14
100010 13
8 12
534 11
42262 11
100 11
2000 10
44 10
14 9
59141 8
7701 8
167006 7
2358 6
356256156 6
421614 6
10850 5
46 5
7777777 5
59902 5
25925 4
78430 4
1149 4
4200 4
59144 3
43113 3
2442 3
78431 3
12898 3
255 3
534351 3
723107 3
19011 2
5000 2
13381 2
4337 2
43114 2
252 2
2017 2
10243 2
2038 2
83 2
1516 2
570 2
9977 2
100009 2
111000 2
103090 1
534352 1
6321 1
2021 1
7078815900 1
49797 1
96 1
648 1
1101 1
57000 1
54211 1
666666666 1
Query to see the missing `transaction_hash`es as a percentage of total verified per chain
WITH missing_txs AS (
    SELECT 
        cd.chain_id,
        COUNT(*) as missing_tx_count
    FROM verified_contracts vc
    JOIN contract_deployments cd ON vc.deployment_id = cd.id
    WHERE cd.transaction_hash = '\x'::bytea 
       OR cd.transaction_hash IS NULL
    GROUP BY cd.chain_id
),
total_verifications AS (
    SELECT 
        cd.chain_id,
        COUNT(*) as total_count
    FROM verified_contracts vc
    JOIN contract_deployments cd ON vc.deployment_id = cd.id
    GROUP BY cd.chain_id
)
SELECT 
    t.chain_id,
    COALESCE(m.missing_tx_count, 0) as missing_tx_count,
    t.total_count as total_verifications,
    ROUND(COALESCE(m.missing_tx_count::numeric * 100 / t.total_count, 0), 2) as missing_percentage
FROM total_verifications t
LEFT JOIN missing_txs m ON t.chain_id = m.chain_id
ORDER BY missing_tx_count DESC;
Result of above
chain_id missing_tx_count total_verifications missing_percentage
10 1749473 1934321 90.44
1 496517 769986 64.48
42161 426920 512576 83.29
11155111 236064 462156 51.08
80001 207305 207305 100.00
5 152755 152755 100.00
369 131314 131326 99.99
4 124177 124177 100.00
3 76503 76503 100.00
8453 66197 896400 7.38
84531 49830 49830 100.00
421613 20584 20584 100.00
137 20055 118460 16.93
97 12200 13705 89.02
69 11221 11221 100.00
17000 7819 48963 15.97
44787 6344 7969 79.61
420 5238 5238 100.00
421611 4212 4212 100.00
1116 1611 4878 33.03
2222 1585 1585 100.00
42220 1490 4891 30.46
77 1149 1149 100.00
7700 866 866 100.00
80002 667 15722 4.24
11297108099 547 547 100.00
62320 491 491 100.00
288 420 434 96.77
28 414 414 100.00
84532 406 118510 0.34
7001 360 360 100.00
9001 299 299 100.00
71401 262 262 100.00
1287 248 344 72.09
250 209 211 99.05
9996 206 206 100.00
34443 203 287 70.73
42261 199 236 84.32
11297108109 198 198 100.00
7672 170 170 100.00
1313161555 167 167 100.00
9000 160 160 100.00
25 142 195 72.82
11155420 142 22538 0.63
314 136 140 97.14
33111 136 136 100.00
1115 115 1545 7.44
7000 103 103 100.00
1001 102 102 100.00
10200 99 278 35.61
71402 96 96 100.00
122 92 535 17.20
22776 85 85 100.00
23294 76 286 26.57
7668 73 73 100.00
888 68 68 100.00
999 66 66 100.00
42170 60 132 45.45
1291 59 60 98.33
32769 56 56 100.00
200901 56 57 98.25
106 51 121 42.15
51 50 50 100.00
56 50 12485 0.40
33101 50 50 100.00
200810 50 50 100.00
37714555429 49 49 100.00
48900 45 45 100.00
41 44 1090 4.04
8217 44 44 100.00
48899 39 39 100.00
660279 38 38 100.00
314159 36 37 97.30
62621 33 33 100.00
35443 33 33 100.00
35441 32 32 100.00
295 31 31 100.00
1030 28 28 100.00
420666 28 28 100.00
212 22 22 100.00
23295 21 519 4.05
10242 20 35 57.14
62831 19 19 100.00
2221 19 36 52.78
841 18 18 100.00
690 18 18 100.00
336 16 16 100.00
4157 15 15 100.00
99 15 15 100.00
17069 14 14 100.00
167005 14 14 100.00
100010 13 13 100.00
8 12 12 100.00
534 11 11 100.00
42262 11 85 12.94
100 11 10280 0.11
44 10 10 100.00
2000 10 38 26.32
14 9 37 24.32
7701 8 174 4.60
59141 8 808 0.99
167006 7 7 100.00
356256156 6 6 100.00
421614 6 10042 0.06
2358 6 36 16.67
7777777 5 141 3.55
46 5 5 100.00
10850 5 5 100.00
59902 5 5 100.00
1149 4 31 12.90
4200 4 4 100.00
78430 4 4 100.00
25925 4 4 100.00
2442 3 61 4.92
12898 3 3 100.00
43113 3 19963 0.02
59144 3 614 0.49
255 3 16 18.75
78431 3 3 100.00
534351 3 1470 0.20
723107 3 3 100.00
2017 2 2 100.00
43114 2 17282 0.01
19011 2 2 100.00
2038 2 22 9.09
570 2 49 4.08
13381 2 2 100.00
252 2 40 5.00
100009 2 2 100.00
111000 2 2 100.00
10243 2 33 6.06
83 2 972 0.21
9977 2 2 100.00
4337 2 219 0.91
5000 2 231 0.87
1516 2 2 100.00
96 1 1 100.00
2021 1 1 100.00
666666666 1 85 1.18
7078815900 1 1 100.00
534352 1 3517 0.03
6321 1 1 100.00
648 1 29 3.45
49797 1 27 3.70
1101 1 415 0.24
54211 1 1 100.00
57000 1 12 8.33
103090 1 1 100.00
11111 0 73 0.00
10849 0 31 0.00
999999999 0 152 0.00
1891 0 8 0.00
2522 0 19 0.00
39797 0 5 0.00
82 0 842 0.00
61 0 50 0.00
57 0 65 0.00
1339 0 2 0.00
432204 0 23 0.00
5700 0 73 0.00
641230 0 12 0.00
4000 0 12 0.00
710420 0 46 0.00
6119 0 36 0.00
764984 0 1 0.00
1127469 0 5 0.00
1285 0 101 0.00
1088 0 35 0.00
919 0 100 0.00
1313161554 0 1109 0.00
1284 0 149 0.00
2037 0 96 0.00
40 0 1521 0.00
53935 0 58 0.00
30 0 31 0.00
192837465 0 15 0.00
16180 0 7 0.00
222000222 0 433 0.00
333000333 0 62 0.00
13337 0 484 0.00
335 0 56 0.00
73799 0 128 0.00
11235 0 3 0.00
19 0 28 0.00
5003 0 51 0.00
246 0 3 0.00
223 0 4 0.00
Query to see the total number of connections on the DB
SELECT 
    client_ip,
    connection_count,
    percentage
FROM (
    SELECT 
        CASE 
            WHEN client_addr IS NULL THEN 'local/null'
            ELSE client_addr::text 
        END as client_ip,
        COUNT(*) as connection_count,
        ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM pg_stat_activity), 2) as percentage,
        2 as sort_order
    FROM pg_stat_activity 
    GROUP BY client_addr
    UNION ALL
    SELECT 
        'TOTAL' as client_ip,
        COUNT(*) as connection_count,
        100.00 as percentage,
        1 as sort_order
    FROM pg_stat_activity
) t
ORDER BY sort_order, connection_count DESC;

@kuzdogan kuzdogan moved this from Sprint - Up Next to Sprint - In Progress in Sourcify Public Dec 20, 2024
@kuzdogan
Copy link
Member

kuzdogan commented Jan 3, 2025

Done Verifying 99% of the contracts. There are some left but we don't want to dig into each of them for now. We can run another round of re-verification after #1665 and #1367 and see if they'll verify.

Results
chain_id total_count reverified_count not_reverified_count
42161 178825 178825 0
10 97454 97361 93
1 32788 32710 78
11155111 23140 22488 652
137 17312 17311 1
97 5802 5802 0
17000 3813 3805 8
8453 3449 3388 61
1116 1512 1511 1
84532 452 452 0
80002 265 265 0
1287 248 248 0
250 210 210 0
42261 199 199 0
11155420 169 167 2
10200 100 100 0
122 92 92 0
1115 87 86 1
25 85 85 0
23294 76 76 0
1291 59 59 0
42170 55 55 0
56 55 55 0
51 50 50 0
37714555429 49 49 0
41 43 43 0
660279 38 38 0
1030 28 28 0
10242 20 20 0
62831 19 19 0
841 18 18 0
690 18 18 0
23295 17 17 0
17069 14 14 0
42262 11 11 0
7701 8 8 0
2358 6 6 0
59141 6 6 0
59902 5 5 0
10850 5 5 0
4200 4 4 0
7777777 4 4 0
59144 3 3 0
12898 3 0 3
2442 3 3 0
255 3 3 0
570 2 2 0
252 2 2 0
43114 2 2 0
83 2 2 0
421614 1 1 0
534351 1 1 0
57000 1 1 0
54211 1 1 0
1101 1 1 0
648 1 1 0
666666666 1 1 0
7078815900 1 1 0
2038 1 1 0

@kuzdogan kuzdogan closed this as completed Jan 3, 2025
@github-project-automation github-project-automation bot moved this from Sprint - In Progress to Sprint - Done in Sourcify Public Jan 3, 2025
@kuzdogan kuzdogan moved this from Sprint - Done to COMPLETED in Sourcify Public Jan 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

No branches or pull requests

4 participants
@marcocastignoli @kuzdogan @manuelwedler and others