
Data fields from streamer

Pasan Kamburugamuwa edited this page Sep 19, 2023 · 7 revisions

Data Fields

With Mastodon.py streaming, you can collect status data with the fields below (see the full documentation here).

  • id: ID of the status in the database.
  • uri: URI of the status used for federation
  • created_at: The date when this status was created
  • account: The account that authored this status.
  • account_id: The account id that authored this status.
  • content: HTML-encoded status content.
  • visibility: Toot visibility ('public', 'unlisted', 'private', or 'direct')
  • sensitive: Is this status marked as sensitive content?
  • spoiler_text: Subject or summary line, below which status content is collapsed until expanded.
  • media_attachments: Media that is attached to this status.
  • mentions: Mentions of users within the status content.
  • tags: Hashtags used within the status content.
  • emojis: Custom emoji to be used when rendering status content.
  • url: A link to the status’s HTML representation.
  • in_reply_to_id: ID of the status being replied to.
  • in_reply_to_account_id: ID of the account that authored the status being replied to.
  • reblog: The status being reblogged.
  • poll: The poll attached to the status.
  • card: Preview card for links included within status content.
  • language: Primary language of this status.
  • edited_at: Timestamp of when the status was last edited.
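To show how the fields above are typically read, here is a minimal sketch that reduces one status dict to a flat summary and turns the HTML-encoded `content` into plain text. The helper names `summarize_status` and `html_to_text` are hypothetical. The sketch assumes `account` is a dict with an `acct` key and `tags` is a list of dicts with a `name` key, following the Mastodon API status entity.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment (entities are decoded)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Strip tags from the HTML-encoded `content` field.

    Tags are dropped without inserting spaces, so adjacent <p> blocks run together.
    """
    parser = _TextExtractor()
    parser.feed(html or "")
    parser.close()
    return "".join(parser.parts).strip()


def summarize_status(status):
    """Hypothetical helper: pick a few fields from one status dict."""
    account = status.get("account") or {}
    created_at = status.get("created_at")
    return {
        "id": status.get("id"),
        # Mastodon.py returns created_at as a datetime; keep it as text.
        "created_at": str(created_at) if created_at is not None else None,
        "account_acct": account.get("acct"),
        "language": status.get("language"),
        "visibility": status.get("visibility"),
        "text": html_to_text(status.get("content")),
        "tags": [tag.get("name") for tag in status.get("tags") or []],
        "is_reply": status.get("in_reply_to_id") is not None,
        "is_reblog": status.get("reblog") is not None,
    }
```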

The following fields are currently NOT accessible via the Mastodon.py package (as of 09/05/2023), or they may require a token from an authorized user.

  • application OPTIONAL: The application used to post this status.
  • favourites_count: How many favourites this status has received.
  • reblogs_count: How many reblogs this status has received.
  • replies_count: How many replies this status has received.
  • favourited OPTIONAL: If the current token has an authorized user: Have you favourited this status?
  • reblogged OPTIONAL: If the current token has an authorized user: Have you boosted this status?
  • muted OPTIONAL: If the current token has an authorized user: Have you muted notifications for this status’s conversation?
  • bookmarked OPTIONAL: If the current token has an authorized user: Have you bookmarked this status?
  • pinned OPTIONAL: If the current token has an authorized user: Have you pinned this status? Only appears if the status is pinnable.
  • filtered OPTIONAL: If the current token has an authorized user: The filter and keywords that matched this status.

Data files and structure

Currently, the streamer saves Mastodon data in a .json file. See this link for an example JSON response.

  • Note that some of the data fields (account, media_attachments, mentions, tags, emojis, card), along with their subfields, have a variably nested JSON structure (i.e., the structure is not consistent across entries). For example, a post can have 0...N tags. Thus, if you are converting the data into a .csv or other delimited text file for analysis, you need a function that processes the JSON files dynamically: determine the unique set of keys across all entries and use it to build the CSV header (e.g., tags_1, tags_2, ... tags_N).
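The dynamic approach described above can be sketched with the standard library alone. This is a minimal sketch, not the project's own converter: it assumes the saved .json file holds a single JSON array of status dicts, and `flatten`, `statuses_to_csv`, and `json_file_to_csv` are hypothetical names. Nested dict keys are joined with `_` and list items get 1-based indices, so two tags become `tags_1_name` and `tags_2_name`.

```python
import csv
import json


def flatten(value, prefix=""):
    """Flatten nested dicts/lists into a single level of keys.

    Dict keys are joined with "_"; list items get 1-based indices.
    Empty lists/dicts contribute no keys at all.
    """
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}_{key}" if prefix else str(key)))
    elif isinstance(value, list):
        for i, child in enumerate(value, start=1):
            flat.update(flatten(child, f"{prefix}_{i}" if prefix else str(i)))
    else:
        flat[prefix] = value
    return flat


def statuses_to_csv(statuses, csv_path):
    """Write statuses to CSV using the union of keys across all entries as header."""
    rows = [flatten(status) for status in statuses]
    header, seen = [], set()
    for row in rows:
        for key in row:
            if key not in seen:  # keep first-seen order for a stable header
                seen.add(key)
                header.append(key)
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        # restval fills columns an entry lacks (e.g. tags_3 on a post with 2 tags).
        writer = csv.DictWriter(f, fieldnames=header, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return header


def json_file_to_csv(json_path, csv_path):
    """Assumes the file contains one JSON array of status objects."""
    with open(json_path, encoding="utf-8") as f:
        statuses = json.load(f)
    return statuses_to_csv(statuses, csv_path)
```

Because the header is the union of keys, the column count grows with the most heavily tagged or mentioned post; None values are written as empty cells.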

How to connect to the streamer

We used the Mastodon.py package to connect to the instance (here). For streaming, we used the on_update method of StreamListener, which is called whenever a new status appears; its status argument is the parsed status dict describing the status.

  • To connect to the instance, install the package: pip3 install Mastodon.py
  • Import the package: from mastodon import Mastodon, StreamListener
  • Here is the reference link to the implementation: code