new notebook: Information Extraction with Haystack and NuExtract #150

Merged (3 commits) on Jul 23, 2024

anakin87
Contributor

What does this PR do?

Add a new notebook: "Information Extraction with Haystack and NuExtract".

In the notebook, we create an Information Extraction pipeline with the Haystack open-source framework and the NuExtract open model.

Then we extract structured data and derive insights from startups' funding announcements.

(@TuanaCelik)

Who can review?

@merveenoyan @stevhliu

Notes on visualization

I think some rendered outputs/visualizations are quite important to make this notebook clear and engaging (e.g., the pipeline diagram, the dataframe, and the graph visualization).

As far as I know, #48 is still valid.
If there is a way to preview what the notebook would look like in the HF site, we can check it out and then add the missing visualization using images in markdown cells.
If a preview is not possible, I would be more than happy to add those images in markdown cells to ensure that the notebook looks nice and understandable.

merveenoyan commented on 2024-07-18T10:15:03Z
----------------------------------------------------------------

Maybe explain what it does in one short sentence and link to it, since people might not know about it.


anakin87 commented on 2024-07-18T11:36:14Z
----------------------------------------------------------------

done!

merveenoyan commented on 2024-07-18T10:15:04Z
----------------------------------------------------------------

Renders nicely 😍 I wonder whether it will also render in the cookbook, but I say we keep it.


Collaborator

@merveenoyan merveenoyan left a comment

Great PR. To be honest, I couldn't find anything to comment on. I'd like to wait for @stevhliu to comment, and then we can merge.

@anakin87
Contributor Author

@merveenoyan thx for the quick review!

About visualization: while the cell outputs are displayed correctly in ReviewNB, I suspect that some of them won't be displayed in https://huggingface.co/learn/cookbook

We have two options:

  • once the PR is OK, merge it; if the relevant outputs (pipeline diagram, dataframe, and graph) are not displayed on the website, create another PR that adds them as images, following the approach for images outlined here
  • add the images in advance, before merging

LMK...

stevhliu commented on 2024-07-22T15:22:40Z
----------------------------------------------------------------

Can we try either omitting the output of print(streams) and print(docs) or changing its format? It appears to be breaking the doc-builder for some reason.


stevhliu commented on 2024-07-22T15:22:41Z
----------------------------------------------------------------

"...you will probably see a warning saying:..."


anakin87 commented on 2024-07-23T10:30:21Z
----------------------------------------------------------------

fixed

Member

@stevhliu stevhliu left a comment

Very nice job and great visualizations! Once the build PR documentation check passes, we can preview whether the visualizations work and then merge 🙂

@anakin87
Contributor Author

@stevhliu I made a change to not display ByteStream objects, which I think are problematic.
Can you please try rerunning the documentation build workflow?
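
For illustration, one way to keep binary payloads out of notebook cell outputs is to print a short text summary instead of the objects themselves. This is only a sketch under stated assumptions: `FakeByteStream` and `summarize` are hypothetical names introduced here, not Haystack classes, and this is not necessarily the change made in the notebook.

```python
from dataclasses import dataclass, field


@dataclass
class FakeByteStream:
    """Hypothetical stand-in for an object holding raw bytes (not the Haystack class)."""
    data: bytes
    meta: dict = field(default_factory=dict)


def summarize(stream) -> str:
    """Return a compact, text-only summary instead of the raw bytes repr."""
    return f"{type(stream).__name__}(size={len(stream.data)} bytes, meta={stream.meta})"


# Assumed example input: one fetched page with a placeholder URL.
streams = [FakeByteStream(b"<html>...</html>", {"url": "https://example.com"})]

# Print one summary line per stream rather than print(streams).
for s in streams:
    print(summarize(s))
```

The cell output then contains only plain text, which is less likely to confuse a documentation build than a long bytes literal.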

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@stevhliu
Member

The doc preview is currently down but everything else looks good and ready to merge! If the visualization doesn't render, then let's go with option 2 and open a PR and add the images as outlined here

@stevhliu stevhliu merged commit 1e1f7e3 into huggingface:main Jul 23, 2024
1 check passed
4 participants