Creating networks from streamer data
Currently, you can use data collected from the streamer to make post-reply networks or user-user networks (via post-reply ties). Below is a very basic description of how you would create the networks.
The key data fields from the streamer are:
- id: ID of the status in the database.
- in_reply_to_id: ID of the status being replied to.
  - Note: All of the values in `in_reply_to_id` can be matched to a corresponding post in the `id` field (with one exception, explained below).
  - Posts with a non-null value for `in_reply_to_id` are "reply posts". Those with a null value for `in_reply_to_id` are original posts.
  - `in_reply_to_id` contains the `id` of the original post (the post being replied to).
  - Use the `in_reply_to_id` field to create an edgelist by matching the `id` of the reply post (let's name it `id_reply`) to the `id` of the original post (let's name it `id_orig`). This results in a directed `id_reply` to `id_orig` edgelist, which connects to the post attributes via `id`.
- account_id: ID of the account that authored this status.
- in_reply_to_account_id: ID of the account that authored the status being replied to.
  - Similar to the post-reply network described above, you can create a user-user network by building an `account_id` edgelist that links each account that made a reply to the account that authored the original post.
  - `in_reply_to_account_id` contains the `account_id` of the account that authored the original post being replied to.
  - Use the `in_reply_to_account_id` field to create an edgelist by matching the `account_id` of the reply's author (let's name it `account_id_reply`) to the `account_id` of the original post's author (let's name it `account_id_orig`). This results in a directed `account_id_reply` to `account_id_orig` edgelist, which connects to the account attributes via `account_id`.
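
The two edgelists described in the list above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's own code: it assumes the streamer records have already been loaded as a list of dictionaries, and the `statuses` sample data and the variable names `replies`, `post_edges`, and `user_edges` are made up for the example.

```python
# Hypothetical sample of streamer records; only the four key fields are shown.
statuses = [
    {"id": 1, "in_reply_to_id": None, "account_id": 10, "in_reply_to_account_id": None},
    {"id": 2, "in_reply_to_id": 1, "account_id": 11, "in_reply_to_account_id": 10},
    {"id": 3, "in_reply_to_id": 1, "account_id": 12, "in_reply_to_account_id": 10},
    {"id": 4, "in_reply_to_id": 2, "account_id": 10, "in_reply_to_account_id": 11},
]

# Reply posts are the records with a non-null in_reply_to_id.
replies = [s for s in statuses if s["in_reply_to_id"] is not None]

# Post-reply network: directed edges (id_reply, id_orig).
post_edges = [(s["id"], s["in_reply_to_id"]) for s in replies]

# User-user network: directed edges (account_id_reply, account_id_orig).
user_edges = [(s["account_id"], s["in_reply_to_account_id"]) for s in replies]

print(post_edges)
print(user_edges)
```

Each tuple is one directed tie from the reply to what it replies to, so the lists can be handed to whichever network package you prefer. Node attributes can then be joined on `id` (posts) or `account_id` (accounts).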
There are some cases where the value of `in_reply_to_id` does not correspond to an original post via the `id` field. All of these cases are self-reply posts, where `in_reply_to_account_id == account_id`. It is still unclear why the `id` of the original post is not included in the data pulled from the streamer, since there are other cases where self-reply posts are connected to the original post via the `id`. Nevertheless, since this exception only impacts self-loop ties, it seems OK to move ahead with data visualization/analysis. This note will be updated if a better explanation is discovered.
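
To check whether your own data follows the same pattern, you can list the reply posts whose `in_reply_to_id` has no matching `id` and confirm that they are all self-replies. The snippet below is a rough sketch under the same assumption as before (records held as a list of dictionaries). The sample data and the names `known_ids`, `unmatched`, and `all_self_replies` are hypothetical.

```python
# Hypothetical sample: status 3 replies to a post (id 99) that was not collected.
statuses = [
    {"id": 1, "in_reply_to_id": None, "account_id": 10, "in_reply_to_account_id": None},
    {"id": 2, "in_reply_to_id": 1, "account_id": 11, "in_reply_to_account_id": 10},
    {"id": 3, "in_reply_to_id": 99, "account_id": 12, "in_reply_to_account_id": 12},
]

# Every post id present in the collected data.
known_ids = {s["id"] for s in statuses}

# Reply posts whose parent post cannot be found via the id field.
unmatched = [
    s for s in statuses
    if s["in_reply_to_id"] is not None and s["in_reply_to_id"] not in known_ids
]

# True when every unmatched reply is a self-reply.
all_self_replies = all(
    s["in_reply_to_account_id"] == s["account_id"] for s in unmatched
)

print([s["id"] for s in unmatched], all_self_replies)
```

If `all_self_replies` comes back `False` on real data, some unmatched replies connect two different accounts, and the exception would then affect more than self-loop ties.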