Based on the JSON documents produced by Perceval and its source code, try to answer the following questions:
-
What is the meaning of the JSON attribute `timestamp`?
The Unix timestamp of the moment when the item was fetched from the data source.
-
What is the meaning of the JSON attribute `updated_on`?
The Unix timestamp of the last time the item was updated in the data source.
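The difference between `timestamp` and `updated_on` can be sketched with the standard library alone. This is a minimal illustration, not Perceval's code. `fetch_timestamp()` stands in for the time recorded when an item is fetched. `commit_date_to_updated_on()` assumes a date string in Git's default format, like the one in a commit's `CommitDate` field. As far as we can tell from the Git backend source, that field is where Git items take their `updated_on` from. All helper names here are hypothetical.

```python
from datetime import datetime, timezone


def fetch_timestamp() -> float:
    """Current UTC time as Unix epoch seconds, i.e. when an item is fetched."""
    return datetime.now(timezone.utc).timestamp()


def commit_date_to_updated_on(commit_date: str) -> float:
    """Convert a Git-style date, e.g. 'Tue Feb 11 22:10:39 2014 -0800',
    into Unix epoch seconds, honouring the date's UTC offset."""
    parsed = datetime.strptime(commit_date, "%a %b %d %H:%M:%S %Y %z")
    return parsed.timestamp()


def to_utc_iso(ts: float) -> str:
    """Render Unix epoch seconds as an ISO 8601 date in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()


if __name__ == "__main__":
    updated_on = commit_date_to_updated_on("Tue Feb 11 22:10:39 2014 -0800")
    timestamp = fetch_timestamp()
    print(to_utc_iso(updated_on))  # 2014-02-12T06:10:39+00:00
    print(to_utc_iso(timestamp))   # the current UTC time
    print(timestamp > updated_on)  # True
```

`updated_on` comes from the data source, while `timestamp` comes from the moment of fetching. For a freshly fetched item, the former is therefore normally earlier than the latter.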
-
What is the meaning of the JSON attribute `origin`?
The URL of the data source the item comes from, e.g., a Git repository.
-
What is the meaning of the JSON attribute `category`?
The category of the item fetched from the data source (e.g., 'commit' for Git).
-
How many categories do the Git and GitHub backends have?
As shown in microtask-2, Git has only one category, 'commit', while GitHub has three: 'issue', 'pull_request', and 'repository'.
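The category lists above can be pictured as a per-backend table that each fetch request is checked against. The sketch below is hypothetical. The dictionary and `check_category()` only loosely mirror the idea that each backend declares the categories it supports; in the source we looked at, backends do this through a `CATEGORIES` class attribute. Perceval's own error type is not reproduced.

```python
# Hypothetical table of the categories reported for each backend in microtask-2
BACKEND_CATEGORIES = {
    "Git": ("commit",),
    "GitHub": ("issue", "pull_request", "repository"),
}


def check_category(backend: str, category: str) -> str:
    """Return `category` if `backend` supports it, otherwise raise ValueError."""
    supported = BACKEND_CATEGORIES[backend]
    if category not in supported:
        raise ValueError(
            f"{category!r} is not a valid category for {backend}; "
            f"choose one of {', '.join(supported)}"
        )
    return category


if __name__ == "__main__":
    for name, categories in BACKEND_CATEGORIES.items():
        print(name, len(categories), categories)
    print(check_category("GitHub", "pull_request"))
```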
-
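What is the meaning of the JSON attribute `uuid`?
A universally unique identifier (UUID) of the item. It is the SHA1 hash of the concatenation of the values passed to Perceval's `uuid()` function, separated by ':'. For each item, these values are its `origin` and its ID (`metadata_id`).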
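The UUID computation can be reproduced in a few lines. This is a sketch based on our reading of `perceval/backend.py`, not a verbatim copy. Each value is checked to be a non-empty string, the values are joined with ':', and the result is hashed with SHA1. The origin URL and item ID in the usage lines are placeholders.

```python
import hashlib


def uuid(*args: str) -> str:
    """Return the SHA1 hex digest of `args` joined with ':'.

    Every value must be a non-empty string, otherwise ValueError is raised.
    """
    for value in args:
        if not isinstance(value, str) or not value:
            raise ValueError(f"{value!r} is not a non-empty string")
    joined = ":".join(args)
    return hashlib.sha1(joined.encode("utf-8", errors="surrogateescape")).hexdigest()


if __name__ == "__main__":
    origin = "https://github.com/owner/repo"  # placeholder origin
    item_id = "0123456789abcdef"              # placeholder item ID
    print(uuid(origin, item_id))              # 40 hex characters
```

Because the inputs are deterministic, fetching the same item from the same origin twice yields the same `uuid`. This is what makes it usable as a stable key.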
-
What is the meaning of the JSON attribute `search_fields`?
It contains the values of the fields listed in the backend's `SEARCH_FIELDS` class attribute, each under its corresponding key. In the case of GitHub, it contains the value of `metadata_id` plus the `owner` and `repo`.
-
What is stored in the attribute `data` of each JSON document produced by Perceval?
The `data` attribute stores the item itself, i.e., the information extracted from the data source by the Perceval backend:
- For a Git commit, see microtask-2.ipynb.
- For a GitHub issue, see issue.json.
- For a GitHub pull request, see pull_request.json.
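To show where `data` sits relative to the other attributes discussed above, here is a hypothetical `wrap_item()` helper that builds a Perceval-like envelope around an item. Only the keys covered in these answers are included. The GitHub issue payload is made up, and the `item_id` key in `search_fields` is an assumption about how the item ID is labelled.

```python
import hashlib
from datetime import datetime, timezone


def wrap_item(item: dict, origin: str, item_id: str, category: str,
              updated_on: float, search_fields: dict) -> dict:
    """Build a Perceval-like document; `item` is stored as-is under 'data'."""
    return {
        "origin": origin,
        "uuid": hashlib.sha1(f"{origin}:{item_id}".encode("utf-8")).hexdigest(),
        "timestamp": datetime.now(timezone.utc).timestamp(),  # fetch time
        "updated_on": updated_on,                              # from the source
        "category": category,
        "search_fields": search_fields,
        "data": item,                                          # the backend item
    }


if __name__ == "__main__":
    issue = {"id": 1, "number": 42, "title": "Example issue"}  # made-up payload
    doc = wrap_item(
        issue,
        origin="https://github.com/owner/repo",
        item_id=str(issue["id"]),
        category="issue",
        updated_on=datetime(2020, 1, 1, tzinfo=timezone.utc).timestamp(),
        # assumed key name for the item ID; owner/repo as in the GitHub answer
        search_fields={"item_id": str(issue["id"]), "owner": "owner", "repo": "repo"},
    )
    print(doc["category"], doc["data"]["title"])
```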
-
Identify the code in charge of dealing with remote APIs and explain its logic.
The method `_fetch_from_remote()` is used to obtain data from the remote data source. It is implemented in `HttpClient`, the abstract class for HTTP clients, so that subclasses can use it to query data sources while retries in case of connection issues are handled for them. When a specific client is initialized, `_create_http_session()` creates an HTTP session, which provides cookie persistence, connection pooling, and configuration. The client then calls `self.session.get()` or `self.session.post()` to send the request to the remote API and returns the resulting `response` object.
-
Which is the folder that stores the archives generated by Perceval?
The archive path can be specified with the `--archive-path` option. If it is not set, the default archive path is `~/.perceval/archives/`. The function `create_archive()` creates a brand new archive with a random SHA1 as its name. The first byte of the hashcode (its first two hex characters) becomes the name of the subdirectory, and the remaining bytes become the archive name.