Creating networks from streamer data

Bryan Stephens edited this page Sep 11, 2023 · 1 revision

Notes for creating networks from streamer data

Currently, you can use data collected from the streamer to build post-reply networks or user-user networks (via post-reply ties). Below is a basic description of how to create each kind of network.

Post-Reply Networks

The key data fields from the streamer are:

  • id: ID of the status in the database.
  • in_reply_to_id: ID of the status being replied to.
  • Note: Every value in in_reply_to_id can be matched to a corresponding post in the id field (with one exception, explained in the exception note below).
  • Posts with a non-null in_reply_to_id are "reply posts". Posts with a null in_reply_to_id are original posts.
  • The in_reply_to_id field contains the id of the post being replied to (referred to here as the original post).
  • Use the in_reply_to_id field to create an edgelist by pairing the id of each reply post (call it id_reply) with the id of the original post it replies to (call it id_orig). The result is a directed id_reply to id_orig edgelist, which connects to the post attributes via id.
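
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's own code: the sample rows and the function name post_reply_edges are hypothetical, and it assumes the streamer data has already been loaded as a list of dicts with null values represented as None. Only the field names id and in_reply_to_id come from the streamer.

```python
# Hypothetical sample rows; only the field names come from the streamer data.
statuses = [
    {"id": "101", "in_reply_to_id": None},   # original post
    {"id": "102", "in_reply_to_id": "101"},  # reply to 101
    {"id": "103", "in_reply_to_id": "101"},  # reply to 101
    {"id": "104", "in_reply_to_id": "102"},  # reply to 102
    {"id": "105", "in_reply_to_id": None},   # original post
]


def post_reply_edges(rows):
    """Return directed (id_reply, id_orig) pairs, one per reply post."""
    return [
        (row["id"], row["in_reply_to_id"])
        for row in rows
        if row["in_reply_to_id"] is not None  # null means an original post
    ]


edges = post_reply_edges(statuses)
for id_reply, id_orig in edges:
    print(id_reply, "->", id_orig)
```

Each pair can then be joined back to the post attributes through id, for example by building a dict keyed on id.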

User-User Networks (via post-reply links)

  • account_id: ID of the account that authored this status.
  • in_reply_to_account_id: ID of the account that authored the status being replied to.
  • As with the post-reply network (above), you can create a user-user network by building account_id edgelists that link the accounts that replied to the account that authored the original post.
  • The in_reply_to_account_id field contains the account_id of the account that authored the original post being replied to.
  • Use the in_reply_to_account_id field to create an edgelist by pairing the account_id of each reply post (call it account_id_reply) with the account_id of the original post (call it account_id_orig). The result is a directed account_id_reply to account_id_orig edgelist, which connects to the account attributes via account_id.
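
A minimal sketch of the user-user edgelist follows, under the same assumptions as before: rows held as a list of dicts, nulls as None, and hypothetical sample data. Counting repeated replies between the same pair of accounts as an edge weight is an added assumption for illustration; the notes above only call for the directed pairs themselves.

```python
from collections import Counter

# Hypothetical sample rows; only the field names come from the streamer data.
statuses = [
    {"id": "101", "account_id": "a", "in_reply_to_account_id": None},
    {"id": "102", "account_id": "b", "in_reply_to_account_id": "a"},
    {"id": "103", "account_id": "c", "in_reply_to_account_id": "a"},
    {"id": "104", "account_id": "b", "in_reply_to_account_id": "c"},
    {"id": "105", "account_id": "b", "in_reply_to_account_id": "a"},
]


def user_user_edges(rows):
    """Count directed (account_id_reply, account_id_orig) pairs.

    The count is used as an edge weight, which is an assumption made
    for this sketch rather than something the streamer data dictates.
    """
    return Counter(
        (row["account_id"], row["in_reply_to_account_id"])
        for row in rows
        if row["in_reply_to_account_id"] is not None  # skip original posts
    )


weighted_edges = user_user_edges(statuses)
for (account_id_reply, account_id_orig), weight in weighted_edges.items():
    print(account_id_reply, "->", account_id_orig, "weight", weight)
```

If an unweighted edgelist is preferred, the keys of the counter give the distinct directed pairs.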

Exception note:

There are some cases where the value of in_reply_to_id does not correspond to an original post via the id field. All of these cases are self-reply posts, where in_reply_to_account_id == account_id. It is still unclear why the id of the original post is missing from the data pulled from the streamer, since other self-reply posts are connected to their original post via the id. Nevertheless, since this exception only affects self-loop ties, it seems OK to move ahead with data visualization/analysis. This note will be updated if a better explanation is discovered.
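
One way to check for this exception in a given data pull is sketched below. The function name unmatched_replies and the sample rows are hypothetical, and the sketch assumes the rows are a list of dicts with nulls as None. It finds reply posts whose in_reply_to_id has no matching id, then tests whether all of them are self-replies.

```python
# Hypothetical sample rows; only the field names come from the streamer data.
statuses = [
    {"id": "201", "account_id": "a",
     "in_reply_to_id": None, "in_reply_to_account_id": None},
    {"id": "202", "account_id": "b",
     "in_reply_to_id": "201", "in_reply_to_account_id": "a"},
    # Self-reply whose original post ("999") is absent from the data.
    {"id": "203", "account_id": "a",
     "in_reply_to_id": "999", "in_reply_to_account_id": "a"},
]


def unmatched_replies(rows):
    """Return reply posts whose in_reply_to_id matches no id in rows."""
    known_ids = {row["id"] for row in rows}
    return [
        row for row in rows
        if row["in_reply_to_id"] is not None
        and row["in_reply_to_id"] not in known_ids
    ]


unmatched = unmatched_replies(statuses)

# True when every unmatched reply is a self-reply, as described above.
only_self_replies = all(
    row["in_reply_to_account_id"] == row["account_id"] for row in unmatched
)
print(len(unmatched), "unmatched replies; all self-replies:", only_self_replies)
```

If only_self_replies comes back False for a data pull, the exception described in this note does not fully explain the unmatched values and they are worth inspecting before analysis.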