How to connect db entries from the table "sites" to a belonging warc-file? #156
Comments
You can set the WARC prefix using warcprox-meta as shown here: https://github.com/internetarchive/brozzler/blob/master/job-conf.rst#using-warcprox-meta If you don't, captures from all your jobs and sites will be mixed together in the same WARCs.
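Once each site has its own prefix, a WARC file name can be traced back to its sites row by comparing the file name with the stored prefix. The following is only a minimal sketch under stated assumptions, not brozzler's own lookup: it assumes the sites rows have already been exported as dictionaries, that each row keeps its settings under a `warcprox_meta` key with a `warc-prefix` entry, and that warcprox starts WARC file names with that prefix followed by a hyphen. The helper name `sites_for_warc` and the sample rows are hypothetical.

```python
def sites_for_warc(warc_filename, sites):
    """Return the site rows whose warc-prefix matches the given WARC file name.

    Assumption: WARC file names start with "<warc-prefix>-".
    """
    matches = []
    for site in sites:
        # warcprox_meta may be missing or None for sites without custom settings
        prefix = (site.get("warcprox_meta") or {}).get("warc-prefix")
        if prefix and warc_filename.startswith(prefix + "-"):
            matches.append(site)
    return matches


# Hypothetical rows, shaped like documents exported from the sites table
sites = [
    {"id": 1, "job_id": 10, "warcprox_meta": {"warc-prefix": "job10-site1"}},
    {"id": 2, "job_id": 10, "warcprox_meta": {"warc-prefix": "job10-site2"}},
    {"id": 3, "job_id": 11, "warcprox_meta": None},
]

# Hypothetical file name that begins with the second site's prefix
matched = sites_for_warc("job10-site2-20180101000000000-00000-abcdefgh.warc.gz", sites)
print([s["id"] for s in matched])
```

Prefixes need to be chosen so that none is the beginning of another (for example `job10` and `job10-site1`), otherwise one file name would match several sites. From the matched row, job_id and the site id lead to the jobs and pages entries.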
Thank you for your reply.
Example of WARC file names:
On the brozzler dashboard, navigating through it and to the captured content works. I understand how the tables jobs, sites and pages are connected: via job_id and site_id.
I need this connection to export the information related to the WARC files (in jobs, sites, pages) from the database. Can you tell me how brozzler connects the WARC files to its table entries in jobs, sites and pages?
Part of a sites entry:
Example of warcinfo record: WARC/1.0 software: warcprox 2.4b6
Hi brozzler team,
I want to export the database entries belonging to a specific WARC file from the tables jobs, sites and pages.
I know how to connect those tables to each other, but I couldn't find a connection to the table captures or directly to the corresponding WARC file.
Does it work via the "WARC-Date" in the warcinfo record of the WARC file and "last_claimed" in the table sites?
A hint would be great. Thanks.