RapidUnfurl is a Python library that quickly pulls and processes a page's metadata to unfurl a URL into a JSON object, which other programs can then use to display a preview of that page, similar to how link expansion works in apps like Slack.
This library was originally forked from Loftie Ellis' pyunfurl library, which is an awesome project. I wanted to speed up the process and drop the HTML rendering, which I didn't need.
- Supports all oEmbed providers from https://oembed.com/ and https://noembed.com/ by default.
- Supports the autodiscovery part of the oEmbed spec.
- Supports the Open Graph protocol.
- Supports Twitter Cards.
- Falls back to meta tags and the site favicon/title if all else fails.
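RapidUnfurl's own parsing code isn't shown in this README, so the following is only a minimal, stdlib-only sketch of the kind of fallback chain the list describes. It assumes a priority order of Open Graph first, then Twitter Cards, then plain meta tags, the `<title>` element and the favicon. It also locates an oEmbed endpoint through autodiscovery (a `<link>` with `type="application/json+oembed"`) but does not fetch it. `MetaCollector` and `extract` are hypothetical names, not part of RapidUnfurl's API.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaCollector(HTMLParser):
    """Collects <meta>, <link> and <title> data from an HTML document."""

    def __init__(self):
        super().__init__()
        self.og = {}        # Open Graph: <meta property="og:...">
        self.twitter = {}   # Twitter Cards: <meta name="twitter:...">
        self.meta = {}      # plain meta tags such as <meta name="description">
        self.links = []     # (rel, type, href) tuples from <link> tags
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}  # boolean attributes have value None
        if tag == "meta":
            key = a.get("property") or a.get("name") or ""
            content = a.get("content", "")
            if key.startswith("og:"):
                self.og[key[3:]] = content
            elif key.startswith("twitter:"):
                self.twitter[key[8:]] = content
            elif key:
                self.meta[key.lower()] = content
        elif tag == "link":
            self.links.append((a.get("rel", "").lower(), a.get("type", "").lower(), a.get("href", "")))
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract(html, page_url):
    p = MetaCollector()
    p.feed(html)
    # oEmbed autodiscovery: <link rel="alternate" type="application/json+oembed" href="...">
    oembed = next((urljoin(page_url, href) for rel, typ, href in p.links
                   if typ == "application/json+oembed" and href), None)
    # Favicon: a <link rel="icon"> (or "shortcut icon"), else /favicon.ico at the site root.
    icon = next((urljoin(page_url, href) for rel, typ, href in p.links
                 if "icon" in rel.split() and href), urljoin(page_url, "/favicon.ico"))

    def pick(field, meta_key=None):
        # Assumed priority: Open Graph, then Twitter Card, then a plain meta tag.
        return p.og.get(field) or p.twitter.get(field) or (p.meta.get(meta_key) if meta_key else None)

    return {
        "oembed_endpoint": oembed,
        "title": pick("title") or p.title.strip() or None,
        "description": pick("description", "description"),
        "image": pick("image"),
        "favicon": icon,
    }


sample = """<html><head><title>Fallback Title</title>
<meta property="og:title" content="OG Title">
<meta name="twitter:description" content="Card description">
<link rel="alternate" type="application/json+oembed" href="/oembed?url=x">
</head></html>"""
print(extract(sample, "https://example.com/page"))
```

Keeping each source in its own dict makes the priority order a single expression in `pick`, so reordering the fallbacks is a one-line change.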
Use the package manager pip to install rapidunfurl.
```bash
pip install rapidunfurl
```
```python
import rapidunfurl

rapidunfurl.unfurl('https://davintaddeo.com')
```
This returns a dict similar in shape to an oEmbed response:
```json
{
  "type": "website",
  "url": "https://davintaddeo.com",
  "title": "Davin Taddeo | DevOps Advocate",
  "site_name": "@tdarwin",
  "description": "Homepage of Davin Taddeo, DevOps Advocate, Senior Customer Architect for Chef",
  "image": "https://davintaddeo.com/assets/images/round_headshot.png",
  "card": "summary",
  "favicon": "https://davintaddeo.com/favicon.ico"
}
```
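The returned dict is meant to be consumed by other programs. As a rough sketch of that, assuming only the keys in the sample output (any of which may be missing for a given page), `format_preview` below builds a plain-text, Slack-style preview, and `json.dumps` turns the dict into a JSON string. `format_preview` is a hypothetical helper, not part of RapidUnfurl.

```python
import json


def format_preview(card):
    """Build a plain-text preview from an unfurl result dict (hypothetical helper)."""
    lines = []
    # Every field is optional; skip whatever the page didn't provide.
    if card.get("site_name"):
        lines.append(card["site_name"])
    if card.get("title"):
        lines.append(card["title"])
    if card.get("description"):
        lines.append(card["description"])
    lines.append(card.get("url", ""))
    return "\n".join(lines)


# The sample result from above, hard-coded so this runs without network access.
result = {
    "type": "website",
    "url": "https://davintaddeo.com",
    "title": "Davin Taddeo | DevOps Advocate",
    "site_name": "@tdarwin",
    "description": "Homepage of Davin Taddeo, DevOps Advocate, Senior Customer Architect for Chef",
    "image": "https://davintaddeo.com/assets/images/round_headshot.png",
    "card": "summary",
    "favicon": "https://davintaddeo.com/favicon.ico",
}

print(format_preview(result))
print(json.dumps(result, indent=2))  # JSON string for other programs
```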
Pull requests are welcome. RapidUnfurl supports custom integrations for sites that don't return any meta tags; if you want to improve the integration for a specific site, take a look at the hackernews example.
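This README doesn't describe how the custom integrations are wired up, so the following is only a hypothetical sketch of the general idea: dispatch on the URL's hostname to a site-specific handler and fall back to the generic path otherwise. `SITE_HANDLERS`, `site_handler`, `generic` and `unfurl_with_integrations` are illustrative names, not RapidUnfurl's API; the hackernews example in the repository shows the real pattern.

```python
from urllib.parse import urlparse

# Hypothetical registry mapping a hostname to a site-specific handler.
# This is NOT RapidUnfurl's actual integration mechanism.
SITE_HANDLERS = {}


def site_handler(hostname):
    """Decorator that registers a metadata builder for one hostname."""
    def register(func):
        SITE_HANDLERS[hostname] = func
        return func
    return register


@site_handler("news.ycombinator.com")
def hackernews(url):
    # A real integration would fetch data for the item here;
    # this sketch returns fixed placeholder fields.
    return {"type": "website", "url": url, "site_name": "Hacker News"}


def generic(url):
    # Stand-in for the generic oEmbed / Open Graph / meta tag path.
    return {"type": "website", "url": url}


def unfurl_with_integrations(url):
    host = urlparse(url).hostname or ""  # .hostname is already lowercased
    if host.startswith("www."):
        host = host[4:]
    handler = SITE_HANDLERS.get(host, generic)
    return handler(url)


print(unfurl_with_integrations("https://news.ycombinator.com/item?id=1"))
```

A registry like this keeps each site's quirks in its own function, so adding a new integration doesn't touch the generic code path.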