rss.from_as1: remove possible addition of invalid author element #847

Merged: 6 commits, merged Dec 22, 2024


@isnbh0 (Contributor) commented Dec 18, 2024

I appreciate this project, thanks for your work on it!

Here's a change that I think will fix a few broken cases, such as the one below, in which sites directly publish RSS that granary.io converted from a JSON Feed.

In brief, this brings granary closer to the RSS spec by no longer using the default author email value `-` when constructing an RSS item. RSS 2.0 defines an item's `<author>` as the author's email address, so `-` is not a valid value.

(relevant upstream project code: https://github.com/lkiesow/python-feedgen/blob/v1.0.0/feedgen/entry.py#L366-L372)
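A minimal sketch of the rule described above, under stated assumptions: author data arrives as an optional name and an optional email, and the helpers `looks_like_email` and `rss_author_text` are hypothetical (they are not granary's or feedgen's code). It follows what the linked feedgen code appears to do for RSS: an `<author>` is only produced when an email exists, formatted as `email (name)` when a name is also present. The email check is an extra assumption added for illustration; the actual fix simply stops supplying `-` as a default.

```python
import re
from typing import Optional

# Hypothetical, deliberately loose check: one "@", non-empty parts, a dot in the domain.
_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def looks_like_email(value: Optional[str]) -> bool:
    """Return True if value plausibly is an email address (not a full RFC 5322 check)."""
    return bool(value) and _EMAIL_RE.match(value) is not None


def rss_author_text(name: Optional[str], email: Optional[str]) -> Optional[str]:
    """Return text for an RSS 2.0 <item><author>, or None to omit the element.

    'email (name)' if both are present, just 'email' if only the email is
    present, and None otherwise (no name-only <author> in RSS).
    """
    if not looks_like_email(email):
        # A placeholder such as '-' (the default this PR removes) is rejected,
        # so the caller emits no <author> element at all.
        return None
    return f"{email} ({name})" if name else email


print(rss_author_text("Jane Doe", "jane@example.com"))  # jane@example.com (Jane Doe)
print(rss_author_text(None, "jane@example.com"))        # jane@example.com
print(rss_author_text("Jane Doe", "-"))                  # None
```

Returning `None` instead of a placeholder keeps the output valid, since `<author>` is optional in RSS 2.0 items.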

Illustrative example

https://granary.io/url?input=jsonfeed&output=rss&url=https://jamesg.blog/hf-papers.json

At the time of writing, the link above produces an RSS file that is technically invalid when run through a feed validator, and this invalidity appears to break some RSS readers.
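For reference, the granary.io link above is a plain GET URL whose query parameters (`input`, `output`, `url`) select the source format, the output format, and the feed to convert. A minimal sketch that rebuilds it with the standard library; `granary_url` is a hypothetical helper name, and only the three parameters visible in the link are assumed.

```python
from urllib.parse import urlencode

GRANARY_URL_ENDPOINT = "https://granary.io/url"


def granary_url(feed_url: str, input_format: str = "jsonfeed", output_format: str = "rss") -> str:
    """Build a granary.io /url conversion link (hypothetical helper)."""
    # safe=":/" leaves the nested feed URL readable, matching the link above;
    # fully percent-encoding it would decode to the same value.
    query = urlencode(
        {"input": input_format, "output": output_format, "url": feed_url},
        safe=":/",
    )
    return f"{GRANARY_URL_ENDPOINT}?{query}"


print(granary_url("https://jamesg.blog/hf-papers.json"))
# https://granary.io/url?input=jsonfeed&output=rss&url=https://jamesg.blog/hf-papers.json
```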

url.rss excerpt
<?xml version='1.0' encoding='UTF-8'?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" version="2.0">
  <channel>
    <title>Feed for https://jamesg.blog/hf-papers.json</title>
    <link>https://jamesg.blog/</link>
    <description>-</description>
    <atom:link href="https://granary.io/url?input=jsonfeed&amp;output=rss&amp;url=https://jamesg.blog/hf-papers.json" rel="self"/>
    <docs>http://www.rssboard.org/rss-specification</docs>
    <generator>granary</generator>
    <language>en</language>
    <lastBuildDate>Wed, 18 Dec 2024 05:46:10 +0000</lastBuildDate>
    <item>
      <title>Are Your LLMs Capable of Stable Reasoning?</title>
      <link>https://huggingface.co/papers/2412.13147</link>
      <description><![CDATA[The rapid advancement of Large Language Models (LLMs) has demonstrated remarkable progress in complex reasoning tasks. However, a significant discrepancy persists between benchmark performances and real-world applications. We identify this gap as primarily stemming from current evaluation protocols and metrics, which inadequately capture the full spectrum of LLM capabilities, particularly in complex reasoning tasks where both accuracy and consistency are crucial. This work makes two key contributions. First, we introduce G-Pass@k, a novel evaluation metric that provides a continuous assessment of model performance across multiple sampling attempts, quantifying both the model's peak performance potential and its stability. Second, we present LiveMathBench, a dynamic benchmark comprising challenging, contemporary mathematical problems designed to minimize data leakage risks during evaluation. Through extensive experiments using G-Pass@k on state-of-the-art LLMs with LiveMathBench, we provide comprehensive insights into both their maximum capabilities and operational consistency. Our findings reveal substantial room for improvement in LLMs' "realistic" reasoning capabilities, highlighting the need for more robust evaluation methods. The benchmark and detailed results are available at: https://github.com/open-compass/GPassK.]]></description>
      <author>-</author> <!-- INVALID: RSS 2.0 expects an email address here -->
      <guid isPermaLink="true">https://huggingface.co/papers/2412.13147</guid>
    </item>
    <!-- etc... -->
  </channel>
</rss>
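One way to spot this problem programmatically is to parse the feed and flag any `<item><author>` whose text is not an email address, since RSS 2.0 defines `author` as the email address of the item's author. A minimal sketch using only the standard library; `invalid_item_authors` is a hypothetical name, and the regex is a loose heuristic, not what any particular feed validator actually does.

```python
import re
import xml.etree.ElementTree as ET

# Loose heuristic: an email address, optionally followed by "(Name)",
# e.g. "jane@example.com (Jane Doe)".
_AUTHOR_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s()]+(\s+\(.*\))?$")


def invalid_item_authors(rss_xml: str) -> list[str]:
    """Return the text of every <item><author> that doesn't look like an email."""
    root = ET.fromstring(rss_xml)
    bad = []
    for author in root.iterfind("./channel/item/author"):
        text = (author.text or "").strip()
        if not _AUTHOR_RE.match(text):
            bad.append(text)
    return bad


feed = """<rss version="2.0"><channel><title>t</title>
  <item><title>ok</title><author>jane@example.com (Jane Doe)</author></item>
  <item><title>bad</title><author>-</author></item>
</channel></rss>"""
print(invalid_item_authors(feed))  # ['-']
```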

@isnbh0 force-pushed the rss-remove-invalid-author-email branch from a841d77 to 61cac45 on December 18, 2024 06:00
@isnbh0 (Contributor, Author) commented Dec 18, 2024

It seems I've gotten a bit lost in fixing up the tests -- I'll get back to this later, sorry :)

@isnbh0 marked this pull request as draft on December 18, 2024 07:20
granary/rss.py: review thread (outdated, resolved)

@snarfed (Owner) commented Dec 18, 2024

This looks great! Thank you for the contribution! I think you're almost there, don't let the tests get you down.

@isnbh0 marked this pull request as ready for review on December 21, 2024 13:15
granary/tests/test_rss.py: review thread (outdated, resolved)

@snarfed (Owner) commented Dec 21, 2024

Tests are passing! Awesome! One small comment left, and please also add an entry to the changelog in the readme:

granary/README.md, lines 339 to 340 in 55b5610:

* `rss`:
  * Support image enclosures, both directions.

...then I think we're ready to merge!

@isnbh0 force-pushed the rss-remove-invalid-author-email branch from 3975a03 to 79a8f54 on December 21, 2024 23:50

@snarfed (Owner) commented Dec 22, 2024

Love it. Merging, thank you again!

cc @capjamesg, let me know if you install granary from GitHub or if you need a new release on PyPI for this.

@snarfed merged commit c940712 into snarfed:main on Dec 22, 2024
2 checks passed
@snarfed added a commit to snarfed/bridgy-fed that referenced this pull request on Dec 23, 2024

@capjamesg commented

I have just installed Granary from source for the hosted version of the Papers with Code RSS feed. Thank you everyone for the fix!

@capjamesg commented

Oh wait, this pertains to the HF Papers project. I see. @snarfed The project depends on the hosted version of granary at granary.io:

curl -I https://jamesg.blog/hf-papers.xml
HTTP/2 301
server: nginx
date: Mon, 23 Dec 2024 23:24:50 GMT
content-type: text/html
content-length: 162
location: https://granary.io/url?input=jsonfeed&output=rss&url=https://jamesg.blog/hf-papers.json

The hosted version would thus need to be updated to use the latest GitHub version.
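The `location` header in the `curl -I` output above shows that `hf-papers.xml` is only a redirect to a granary.io conversion, which is why the fix reaches readers only once granary.io itself is redeployed. A minimal sketch that decodes such a redirect target with the standard library; `describe_granary_redirect` is a hypothetical helper, and it assumes the `input`/`output`/`url` parameters seen in the header each appear exactly once.

```python
from urllib.parse import parse_qs, urlsplit


def describe_granary_redirect(location: str) -> dict[str, str]:
    """Split a granary.io /url redirect target into host and conversion parameters."""
    parts = urlsplit(location)
    params = parse_qs(parts.query)
    # parse_qs returns a list of values per key; take the single value of each.
    return {
        "host": parts.netloc,
        "input": params["input"][0],
        "output": params["output"][0],
        "source": params["url"][0],
    }


loc = "https://granary.io/url?input=jsonfeed&output=rss&url=https://jamesg.blog/hf-papers.json"
print(describe_granary_redirect(loc))
```

If `host` is `granary.io`, the feed is generated by the hosted service rather than a locally installed copy of granary.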

@snarfed (Owner) commented Dec 24, 2024

Deployed to granary.io!


3 participants