This is a small project written in Python 3.x that ties together a few different packages and APIs to accomplish the goal of quickly sending an email when a new Reddit post appears.
At a high level, this project offers scheduled independently-defined Reddit searches for new posts and automated email notifications of those posts.
Some of the technical offerings of the project are:
- Integration with Reddit via the Reddit API by way of the PRAW Python package
  - Used to perform searches on Reddit and return submission objects containing a post's metadata
- Integration with Gmail via Google APIs and the OAuth2 protocol
  - Used to perform a first-time interactive authentication, given an API ID and Secret, to grant access to the Google account
  - Used to integrate with Gmail to send emails
- Extensible JSON configuration parser to read in and parse a set of hierarchical parameters from an ordered set of configuration files
  - Provides a nested object search to quickly search multiple nested JSON levels
- Extensible logging class to provide verbose console and file-based logging for the project's classes
- Configurable search schedule to check for new results
- Stateful tracking of previous results sent out to prevent repeated content in emails
- Minimal third-party library dependencies
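The nested object search mentioned above can be pictured as a dotted-path lookup over parsed JSON. A minimal sketch; `get_nested` is a hypothetical helper name, not the project's actual API:

```python
import json

# Hypothetical dotted-path lookup over nested JSON config, in the spirit of
# the "nested object search" feature above (the real class may differ).
def get_nested(obj, dotted_path, default=None):
    current = obj
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

config = json.loads('{"logging": {"file_log_level": "DEBUG"}}')
print(get_nested(config, "logging.file_log_level"))   # DEBUG
print(get_nested(config, "logging.missing", "INFO"))  # INFO (falls back)
```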
- macOS
  - Install pyenv to easily manage different pre- and post-installed Python versions. Make sure it's close to the front of your environment `PATH` so calling `which python3` directs you to a folder under its shim
  - Make sure `pip3` is also installed along with `python3`. Pip will help get other packages and dependencies installed
  - JetBrains PyCharm (if using it as an IDE) should be pointing to this `pyenv` shim directory for its interpreter, so package installations occur in the same place as `pip3` via the CLI
- CentOS 7
  - Use Software Collections (SCL) to install Python 3 alongside the already-installed Python 2. SCL can be installed with `sudo yum install centos-release-scl` and then Python 3 can be installed with `sudo yum install rh-python36`
  - Finally, enabling the Python 3 installation by adding it to the `PATH` can be done by launching an SCL shell with `scl enable rh-python36 bash`
  - I also added `scl enable rh-python36 bash` to the `~/.bash_profile` configuration file on the last line, after the `PATH` is set
  - Here is a more comprehensive guide on how to do it
- CentOS 8
  - The default repositories seem to include Python. I ran `dnf install python38` after trying to run generic `python` and seeing `There are following alternatives for "python": python2, python36, python38`

After setup, running `which python3` should return a valid executable.
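A quick sanity check that the interpreter on your `PATH` is the one you expect (a small standalone sketch, not part of the project):

```python
import shutil
import sys

# Print which Python version is running and whether `python3` resolves
# on PATH; the resolved path should sit under the pyenv shims or SCL
# root described above, depending on your platform.
print(sys.version.split()[0])              # e.g. a 3.x version string
print(shutil.which("python3") is not None)
```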
This project allows you to specify a personal JSON file and pass it in to the executable. If values aren't defined in the personal JSON file, they will be picked up from the `default_base_config.json` file that came with the project. That way, if the value in the default file is what you prefer, you can leave it be. Or you can edit the default JSON file to your heart's content. Take care not to share your API keys and personal information!
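The override behavior can be pictured as a merge where personal values win over defaults. A minimal sketch, assuming a flat shallow merge; `merge_configs` is illustrative, and the real parser is hierarchical and may merge nested objects too:

```python
import json

# Illustrative shallow merge: personal values override the defaults that
# ship in default_base_config.json (field names from this README).
def merge_configs(default_cfg, personal_cfg):
    merged = dict(default_cfg)
    merged.update(personal_cfg)
    return merged

default_cfg = json.loads('{"search_interval_minutes": 30, "praw_client_id": ""}')
personal_cfg = json.loads('{"praw_client_id": "3Y6aB66notreal"}')
merged = merge_configs(default_cfg, personal_cfg)
print(merged)  # personal client ID wins; default interval is kept
```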
- Searches
  - It's expected that you will add a number of searches to the JSON file. You can refer to the Reddit search API to learn fancy ways of searching
  - At a basic level, you can combine subreddits to search with a "temporary multireddit" by defining the search subreddits like `"subreddits":"redditdev+learnpython"`
  - You can then search for exact matches in the subreddit with backslash-escaped quotes like `"search_params":"looking for an \"EXACT MATCH\""`
  - Finally, you can add one or more emails for those results to be emailed to. This setting can be added per search so different searches go to different people. To override the default email and send a search result to a different email, add `"email_recipient":"someone@example.com"` to the search
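Put together, a single search entry could look like the following. The individual keys come from this README, but the surrounding layout (how searches are listed in the file) is an assumption for illustration, and the address is a placeholder:

```python
import json

# Hypothetical search entry using the keys described above.
search = {
    "subreddits": "redditdev+learnpython",            # temporary multireddit
    "search_params": 'looking for an "EXACT MATCH"',  # exact-match query
    "email_recipient": "someone@example.com",         # per-search override
}
print(json.dumps(search, indent=2))
```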
- Reddit API Key
  - PRAW and Reddit both require an API key in order to use the API
  - Go to https://www.reddit.com/prefs/apps/ while logged in, or click on the "Apps" tab at the top of your Reddit preferences page
  - Click "create another app" and fill out the information
  - Give it a Name and Description so you know what the API key is used for. This isn't used for anything, but the Name is required
  - Choose "script" for the purpose
  - A redirect URI is required. I just set it to my username page (https://www.reddit.com/user/spez), as it isn't used for our purposes
  - Finally, click "create app" and make sure the app is created
  - Open your App for editing and note the alphanumeric string under the App name and "personal use script." It should look like `3Y6aB66notreal` (15 characters at the time of writing). Add this to the JSON config with the `praw_client_id` key like `"praw_client_id":"3Y6aB66notreal",`
  - Also in your App you should see a longer string, the Secret. Add it to your config like `"praw_client_secret":"GsR8notAhpfTmXIdqP2pqrealx8"`
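Those two values are all PRAW needs, besides a user agent string, to build a read-only client for a script-type app. A sketch, assuming `praw` is installed (e.g. via `pip3 install praw`); the function name and user agent are arbitrary choices for illustration:

```python
def make_reddit_client(client_id, client_secret):
    """Build a read-only PRAW client from the two JSON config values."""
    import praw  # deferred import so this sketch reads even without praw installed
    return praw.Reddit(
        client_id=client_id,          # "praw_client_id" from the JSON
        client_secret=client_secret,  # "praw_client_secret" from the JSON
        user_agent="reddit-search-notifier sketch",  # arbitrary, but required
    )

# The fake example credentials from this README; real ones come from your app page.
credentials = {"praw_client_id": "3Y6aB66notreal",
               "praw_client_secret": "GsR8notAhpfTmXIdqP2pqrealx8"}
```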
- Google API Key
  - Use the Google API console to create a new project and set of OAuth client ID credentials. The code will create a URL from those credentials. Navigate to it in a browser to authenticate. A verification code will be generated for you to paste into the Python executable (it watches the CLI for keyboard input). Then, a new "refresh token" will be generated for you to add to your JSON config
  - Go to the Google API Console and make a new project using the dropdown in the upper left. You may have to activate/agree beforehand
  - Then on the left side, go to Credentials and use the Create dropdown to make a new OAuth client ID. On the next screen, choose Other and give it a name (it doesn't really matter), then click OK
  - A Client ID and Client Secret should now appear. Copy those into your JSON as `google_api_client_id` and `google_api_client_secret`, contained in an object under the "email_settings" key. At this time you should also add the `email_sender` key and value, which is the email address of the Google account
  - Next, run `search_runner.py` using the instructions in the Execution section of this README to start the main script. As part of the script, it will check if the Google refresh token already exists
    - If the token exists and you get an error like 400 Bad Request or 401 Unauthorized, it's likely the email address + client ID + client secret + refresh token is mistyped somewhere or expired
    - If the token hasn't yet been filled in the JSON (existing as `"google_refresh_token":""`), you will see a URL printed out in the CLI when you try to run it
    - ALTERNATIVE: You can skip the JSON entry and just pass the values directly into the `email_tools.py` script! Call it with arguments to pass in the Client ID and Secret like `email_tools.py -i <your_client_id_here> -s <your_client_secret_here>`
  - Follow the generated link in a browser on any computer. Note that an "Unverified App" warning will appear since you just made this API/project/credential. Log into your account and authenticate, and get the verification code. Paste it into the CLI where it asks you to `Enter verification code:` and press Enter/Return. If all was configured properly you should now see a "Credentials Valid!" message
  - A new alphanumeric string will be printed out before the "Credentials Valid" message. This is your Refresh Token. Copy and paste it into your JSON configuration and try to run the program again. If all goes well, you should not see an error when EmailTools initializes, as it does a check to validate the credentials when starting up (regardless of whether an email will be sent)
  - This was the inspiration and guide for this style of Gmail integration
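Once the refresh token is in place, a credential check like the one above boils down to exchanging the refresh token for a short-lived access token. A sketch of the request body for Google's standard OAuth2 token endpoint; how the project's own code builds this is an assumption:

```python
from urllib.parse import urlencode

# Google's standard OAuth2 token endpoint for the refresh-token grant.
TOKEN_ENDPOINT = "https://oauth2.googleapis.com/token"

# Build the form body; POSTing it (e.g. with urllib.request) returns a
# short-lived access token that can then be used to send mail via Gmail.
def refresh_token_body(client_id, client_secret, refresh_token):
    return urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
        "grant_type": "refresh_token",
    })

body = refresh_token_body("your_client_id", "your_client_secret", "your_refresh_token")
print("grant_type=refresh_token" in body)  # True
```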
- Other Settings
  - `search_interval_minutes` defines the number of minutes to wait before running all the searches on Reddit again. This starts counting from the time the program is started, and is not guaranteed to run on even minutes (:10, :20, etc.). An integer should be passed in without quotes, like `"search_interval_minutes":30`
  - `logging.file_log_level` and `logging.console_log_level` define the log level to print outputs at. For the most verbose logging, use "DEBUG"; for less logging, "INFO"; for almost no logging, "WARN"
  - `logging.file_log_absolute_path` defines the location and name of the log file. This file is created from the working directory where `search_runner.py` is called from
  - `email_settings.email_subject_text` defines the string used in the subject of every email sent
  - `email_settings.default_email_recipient` defines the email recipient if one isn't specified in the Reddit searches described above. This is the "fallback" email
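The level strings above map directly onto Python's `logging` module, where `logging.WARN` is a built-in alias for `logging.WARNING`. A small sketch; the handler wiring is an assumption about how the project's logging class works:

```python
import logging

# Turn a level string from the JSON config into a logging constant.
def resolve_level(name):
    return getattr(logging, name)

# A console handler filtered at the configured console_log_level, as the
# setting above suggests (the file handler would be configured the same way).
console = logging.StreamHandler()
console.setLevel(resolve_level("WARN"))

print(resolve_level("DEBUG"), resolve_level("INFO"), resolve_level("WARN"))  # 10 20 30
```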
This project uses argparse to configure and consume command line arguments. The `search_runner.py` script is the heart of the project, and spins up everything it needs to run. The script is configured with the following parameters:

- `--config` or `-c`, plus a string containing the path to a JSON configuration. The path will be constructed from the current working directory, NOT from the location of the script
- `--skipdedupe` or `-s`, to ignore the existing and previous search results and send the full set of search results every time the script runs. No additional argument needed. Useful for testing search parameters or tweaking other parts of the JSON
- `--onerun` or `-o`, which avoids scheduling the job to run repeatedly on the interval specified in the JSON. Instead, the script runs one time and then exits
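An argparse setup matching the flags above could look like the following sketch; the defaults and help strings are assumptions, and the real script may differ:

```python
import argparse

# Parser mirroring the three documented flags; default values are guesses.
parser = argparse.ArgumentParser(description="Reddit search email notifier")
parser.add_argument("--config", "-c", default="default_base_config.json",
                    help="path to a JSON config, relative to the working directory")
parser.add_argument("--skipdedupe", "-s", action="store_true",
                    help="resend all results, ignoring previously-sent posts")
parser.add_argument("--onerun", "-o", action="store_true",
                    help="run the searches once and exit instead of scheduling")

args = parser.parse_args(["--skipdedupe", "-o", "-c", "my_config.json"])
print(args.config, args.skipdedupe, args.onerun)  # my_config.json True True
```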
All together, a script call could look like `python3 search_runner.py --skipdedupe -o --config ~/path/to/some/file.json`, which would run once, return all results, and use `~/path/to/some/file.json` as the primary config, falling back on the `default_base_config.json` in the project directory if a value isn't defined.
However, the script is most useful when it runs on an interval, and you might not want to leave a terminal session up and running for as long as you want the script watching your searches.
To handle the long-term (or run-at-startup) use case, a `launch.sh` shell script is provided. It passes through any arguments it was called with, meaning the script does not need to be modified to specify your config and other runtime options. It then runs `search_runner.py` with your arguments in a Screen session named "search_runner", with a separate process ID (PID) from the session the Bash script is running in. The shell script then exits, leaving the Python code running in the background for you to check in on, either via the file log or by reattaching the screen session.