Feasibility study: ditch farm and use pytorch #178

Open
Shreyanand opened this issue Jul 14, 2022 · 5 comments
Labels
nlp-internal Indicates that the issue exists to improve the internal NLP model and its code

Comments

@Shreyanand
Member

Estimate the benefit, risk, and labor cost involved in removing the FARM architecture and using plain PyTorch. We want to see whether doing so would make our workflow, which involves the Quay image and Kubeflow pipelines, easier to understand, simpler, and less error-prone.

@Shreyanand added the nlp-internal label on Jul 14, 2022
@MichaelTiemannOSC
Contributor

Noting that the FARM project has transitioned to HAYSTACK: https://github.com/deepset-ai/haystack/ and that some OS-Climate members are also HAYSTACK sponsors/supporters.

@Shreyanand
Member Author

Shreyanand commented Jul 22, 2022

There are two alternatives that seem plausible here:

  1. Using a solution like the one mentioned here (see the Haystack sketch after this list)

    Benefits:

    • the retriever and reader live in the same pipeline, so the architecture is simpler
    • access to the latest models via the HAYSTACK library
    • may solve the multiprocessing issue with kpi-train

    Risk and Cost:

    • may introduce new errors while running in the container env.
    • 1–2 weeks of developer time to convert data and results into the right format
  2. Using Hugging Face QA (transformers package; see the second sketch after this list)

    Benefits:

    • transformers is the root library and the de facto standard for transformer models, so it gives a lot of flexibility
    • may solve the multiprocessing issue with kpi-train

    Risk and Cost:

    • may introduce new errors while running in the container env.
    • 1–2 weeks of developer time to convert data and results into the right format
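
For reference, a minimal sketch of what option 1 could look like with farm-haystack 1.x. The model name, example document, and query are placeholders, and the exact imports may differ between Haystack versions:

```python
# Hedged sketch of option 1: retriever + reader in a single Haystack pipeline.
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import FARMReader, TfidfRetriever
from haystack.pipelines import ExtractiveQAPipeline

# Placeholder documents; in our case these would be the extracted report paragraphs.
document_store = InMemoryDocumentStore()
document_store.write_documents([
    {"content": "The company reported Scope 1 emissions of 1.2 Mt CO2e in 2021.",
     "meta": {"source": "example_report.pdf"}},
])

retriever = TfidfRetriever(document_store=document_store)
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=False)

# Retriever and reader are wired into one pipeline, which is the main simplification.
pipe = ExtractiveQAPipeline(reader=reader, retriever=retriever)
result = pipe.run(
    query="What were the Scope 1 emissions in 2021?",
    params={"Retriever": {"top_k": 3}, "Reader": {"top_k": 1}},
)
print(result["answers"])
```

And a corresponding sketch for option 2 with the transformers package (again, the checkpoint and example text are illustrative only). For training, the transformers Trainer also exposes dataloader_num_workers in TrainingArguments, which would give us direct control over the multiprocessing behaviour that currently trips up kpi-train:

```python
# Hedged sketch of option 2: extractive QA directly with the transformers package.
from transformers import pipeline

# deepset/roberta-base-squad2 is just an illustrative extractive-QA checkpoint.
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

result = qa(
    question="What were the Scope 1 emissions in 2021?",
    context="The company reported Scope 1 emissions of 1.2 Mt CO2e in 2021.",
)
print(result)  # {'score': ..., 'start': ..., 'end': ..., 'answer': ...}
```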

@erikerlandson

@MichaelTiemannOSC
Contributor

Let's not forget that we have a technical solution (disable multiprocessing) and a view to a technical solution (allocating appropriate amounts of SHM) for the kpi-extraction training problem. If there are OTHER problems with FARM we want to solve, let's talk about those, but we should be able to dispose of the kpi-extraction problem very simply.
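
For completeness, a minimal sketch of the "disable multiprocessing" workaround in FARM, assuming the kpi-extraction training still builds its data through FARM's DataSilo. File names and paths below are placeholders, and parameter names may vary across FARM versions:

```python
# Hedged sketch: keep FARM preprocessing in the main process via max_processes=1.
from farm.data_handler.data_silo import DataSilo
from farm.data_handler.processor import SquadProcessor
from farm.modeling.tokenization import Tokenizer

tokenizer = Tokenizer.load(pretrained_model_name_or_path="roberta-base")

processor = SquadProcessor(
    tokenizer=tokenizer,
    max_seq_len=384,
    label_list=["start_token", "end_token"],
    metric="squad",
    data_dir="data/",                 # placeholder path to the annotated KPI data
    train_filename="kpi_train.json",  # placeholder file names
    dev_filename="kpi_dev.json",
)

# max_processes=1 keeps all preprocessing in the main process, i.e. the
# "disable multiprocessing" workaround that avoids the /dev/shm pressure.
data_silo = DataSilo(processor=processor, batch_size=16, max_processes=1)
```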

@Shreyanand
Member Author

The disable-multiprocessing solution makes training almost impractical: in one experiment it took around 1.5 days to get through 80% of training, only to run into a kernel error. That was with 145 training files; once we get more annotated files, it will be even more difficult. Allocating appropriate amounts of SHM seems like the longer-term solution to get working.

One other obvious problem with FARM, as @MichaelTiemannOSC pointed out, is that it is no longer actively maintained. It has transitioned to farm-haystack, and any further development will happen in that pypi package. In my comment above, I list two alternatives: using farm-haystack, which is actively maintained, or using the transformers package, which is the root package that farm-haystack builds upon.

@erikerlandson
Contributor

We should absolutely get off of the unmaintained package. We could use farm-haystack, but it's an added layer of dependency, and I'm trying to sort out the costs/benefits versus just using the transformers package.

cc @JeremyGohBNP @ChristianMeyndt @idemir-ids @andraNew
