OpenMinTeD SSH UC Hackathon #6

Closed
reckart opened this issue Apr 8, 2018 · 105 comments
Labels
Component Participant is providing component(s) UIMA UIMA based component/application

Comments

@reckart
Member

reckart commented Apr 8, 2018

I have deployed a component and tried to run it on the platform. The result of the operation is listed as "FAILED", but I have no idea why. How can one get access to the log output?

[Screenshot: 2018-04-08_14-37-41]

Instance: test.openminted.eu

@galanisd
Member

galanisd commented Apr 8, 2018

> I have deployed a component and tried to run it on the platform.

For an application, you can run it directly after registration. If it is a component, this is not possible.

@reckart
Member Author

reckart commented Apr 8, 2018

> For an application, you can run it directly after registration. If it is a component, this is not possible.

I know. I have built a workflow which makes use of the component that I had deployed (cf. #7):

  • omtdImporter
  • PdfReader
  • OpenNlpSegmenter
  • VariableMentionDisambiguator

@reckart
Member Author

reckart commented Apr 8, 2018

FYI @azielinskiACC

@galanisd
Member

galanisd commented Apr 8, 2018

OK, I had a look in Galaxy.
VariableMentionDisambiguator is a UIMA component with the following coordinates:

eu.openminted.uc-tdm-socialsciences
ss-variable-detection
1.0.1-SNAPSHOT

Is it available on:

  • Maven Central?
  • zoidberg public snapshots?
  • OMTD repo? -> The executor that we have does not look there.

Also, the workflow is created in the OMTD Workflow Editor instance of Galaxy. The OMTD Registry then copies it to the OMTD Workflow Execution instance of Galaxy. Do you know the name of the workflow so I can check whether it is there?

@reckart
Member Author

reckart commented Apr 8, 2018

> OMTD repo? -> the executor that we have does not look there.

It is in the OMTD SNAPSHOTs repo. The registry seems to be able to resolve artifacts from there. Would it be possible to ensure that the executors and the registry use the same set of repos to look up components, ideally in the same order?
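As an illustration of that request, here is a minimal sketch of a consistency check between two ordered repository lists. The repository names and the function `repo_config_diff` are hypothetical placeholders, not actual OMTD configuration or code.

```python
# Hypothetical repository lists; the names are placeholders, not real OMTD configuration.
REGISTRY_REPOS = ["maven-central", "omtd-releases", "omtd-snapshots"]
EXECUTOR_REPOS = ["maven-central", "omtd-releases"]


def repo_config_diff(registry, executor):
    """Describe how two ordered repository lists differ."""
    problems = []
    missing = [r for r in registry if r not in executor]
    extra = [r for r in executor if r not in registry]
    if missing:
        problems.append(f"executor is missing: {', '.join(missing)}")
    if extra:
        problems.append(f"registry is missing: {', '.join(extra)}")
    if not missing and not extra and registry != executor:
        problems.append("same repositories, but in a different lookup order")
    return problems


for problem in repo_config_diff(REGISTRY_REPOS, EXECUTOR_REPOS):
    print(problem)
```

An empty result would mean that both sides look in the same repositories in the same order.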

The workflow URL is: https://test.openminted.eu/landingPage/application/c58d1986-690e-40b9-b408-f649443c7d33

@galanisd
Member

galanisd commented Apr 8, 2018

> It is in the OMTD SNAPSHOTs repo. The registry seems to be able to resolve artifacts from there. Would it be possible to ensure that the executors and the registry use the same set of repos to look up components, ideally in the same order?

Until now this was not required. I have added it to my to-do list.

> The workflow URL is: https://test.openminted.eu/landingPage/application/c58d1986-690e-40b9-b408-f649443c7d33

I downloaded the metadata record from the Registry (attached). The workflow name is
[email protected] 13865a76-613b-475a-88bf-4af5357b9263

I also downloaded the workflow from the Galaxy executor (also attached). It is empty: it has no steps. This is probably why it fails. It seems to be a Registry issue.

rec.zip

@reckart
Member Author

reckart commented Apr 8, 2018

I'll try building a new one.

@galanisd
Member

galanisd commented Apr 8, 2018

Ok. Please send me the landing page as you did with the previous one. I will download the metadata record, find the Galaxy workflow and check whether it is OK. If it is not, we have to inform Antonis.

@reckart
Member Author

reckart commented Apr 8, 2018

Ok, I have created a new one. This time, it is not empty when I re-open it in the workflow editor:

https://test.openminted.eu/landingPage/application/89d5e9ea-32fb-45f7-bf00-1fe466e33c4f

[Screenshot: 2018-04-08_20-02-53]

However, it still fails:

[Screenshot: 2018-04-08_20-06-45]

@azielinskiACC @galanisd note that I have pasted a full multi-line XML file into the parameter variableSpecification. I am not sure whether that could cause a problem. Aside from the XML getting a bit squashed when pasting it into the input field, it seemed ok in the Galaxy editor.

<?xml version="1.0" encoding="UTF-8"?>
<variables>
   <variable v_id="140" correct="YesNo">
       <v_label>INGLEHART-INDEX </v_label>
       <v_topic>Political attitudes and participation</v_topic>
       <v_question> What are your political priorities? </v_question>
       <v_subquestion> </v_subquestion>
       <v_answer a_id="1">Postmaterialist</v_answer>
       <v_answer a_id="2">Postmaterialist mixed-type</v_answer>
       <v_answer a_id="3">Materialist mixed-type</v_answer>
       <v_answer a_id="4">Materialist</v_answer>
       <v_answer a_id="5">Don't know</v_answer>
       <v_answer a_id="99">No answer</v_answer>
   </variable>
</variables>
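
Whether squashing matters can be checked locally. The sketch below is only an illustration under assumptions: it uses a shortened copy of the XML above and simulates an input field that collapses every run of whitespace into one space (how the Galaxy form actually treats the value is not established here). It then compares what a standard XML parser sees in both versions, ignoring leading and trailing whitespace in text nodes.

```python
import re
import xml.etree.ElementTree as ET

# Shortened copy of the variable specification, for illustration only.
ORIGINAL = """<?xml version="1.0" encoding="UTF-8"?>
<variables>
   <variable v_id="140" correct="YesNo">
       <v_label>INGLEHART-INDEX </v_label>
       <v_answer a_id="1">Postmaterialist</v_answer>
       <v_answer a_id="99">No answer</v_answer>
   </variable>
</variables>
"""

# Assumption: the input field collapses each run of whitespace
# (including line breaks) into a single space.
SQUASHED = re.sub(r"\s+", " ", ORIGINAL).strip()


def summarize(xml_text):
    """Return (tag, attributes, stripped text) for every element, in document order."""
    root = ET.fromstring(xml_text)
    return [(el.tag, dict(el.attrib), (el.text or "").strip()) for el in root.iter()]


# True if the parser sees the same elements, attributes and trimmed text.
print(summarize(ORIGINAL) == summarize(SQUASHED))
```

This only covers the parser's view of the document. Text that relies on internal line breaks or repeated spaces would still change under such squashing.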

The other thing is that the component should try to download a model from the OMTD Maven repo. That means it must have network access to that repo.

	<groupId>eu.openminted.uc-tdm-socialsciences</groupId>
	<artifactId>ss-variable-detection-model-disambiguation-en-ss</artifactId>
	<version>20180406.1</version>

Hm... that said, it might actually try to download the model from the wrong repo (i.e. the DKPro Core repo instead of the OMTD repo...). That is something I need to look into locally.

@reckart
Member Author

reckart commented Apr 8, 2018

Opened an issue regarding model-auto-downloads here: openminted/omtd-component-executor#1

@galanisd
Member

galanisd commented Apr 8, 2018

Yes, now it is not empty.
The workflow is 0931730980607790@openminted.eu 3c6c03b5-9a04-41bb-996a-a2cd536c7ace

I see the following error in the logs of workflow-service, which is the module that calls Galaxy.

--- [ Thread-625] e.o.w.service.WorkflowServiceImpl : Unable to locate workflow: 0931730980607790%40openminted.eu+3c6c03b5-9a04-41bb-996a-a2cd536c7ace

Maybe it has to do with the name of the workflow. It contains spaces and a "@" which are escaped at some point.
@courado @greenwoodma @antleb

@reckart
Member Author

reckart commented Apr 9, 2018

Ok. I have:

Then I tried running the workflow again on the variable test corpus that @azielinskiACC has published on the platform.

Still, I get a failure again.

Any idea what could be the reason now?

@galanisd
Member

galanisd commented Apr 9, 2018

I assume that the workflow-service again fails to call the workflow that was created in the Galaxy executor. As I said above, the reason is probably the name of the workflow.

@greenwoodma
Member

I've just pushed a fix for this that should URL decode the workflow name before looking for it in Galaxy. This should get built and pushed to beta automatically but won't end up on test until someone manually pulls in the latest workflow service code.
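
To illustrate the decoding step, here is a small Python sketch applied to the name from the log line above. It is not the actual fix in the workflow service, only a demonstration of what URL decoding does to that name.

```python
from urllib.parse import quote_plus, unquote_plus

# Workflow name exactly as it appears in the workflow-service log line above.
encoded_name = "0931730980607790%40openminted.eu+3c6c03b5-9a04-41bb-996a-a2cd536c7ace"

# unquote_plus turns "%40" back into "@" and "+" back into a space.
decoded_name = unquote_plus(encoded_name)
print(decoded_name)

# Round trip: encoding the decoded name gives the logged form again.
assert quote_plus(decoded_name) == encoded_name
```

A lookup by name in Galaxy would need the decoded form, since that is the name the workflow was stored under.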

@courado

courado commented Apr 11, 2018

I have also added the error message supplied by the workflow service to the My Operations page.

@reckart
Member Author

reckart commented Apr 11, 2018

@courado great! :)

[Screenshot: 2018-04-11_11-45-27]

I just tried running the workflow again, but it fails because it is unable to locate the named workflow.

Could somebody please push @greenwoodma's fix to test.openminted.eu?

@greenwoodma
Member

@reckart is it not possible to rename the workflow to avoid the bug until the fix is pushed to test?

@reckart
Member Author

reckart commented Apr 11, 2018

@greenwoodma how do I do that? The workflow editor only has a "save" button, not a "rename" or "save as" button as far as I remember.

@galanisd
Member

I think that the only way to do that is to:

a. rename the workflow in Galaxy
b. download the metadata record of the app and delete it from the registry
c. upload an updated metadata record with the new workflow name.

@greenwoodma
Member

@reckart hmmm, I thought the name of the workflow came from the name you gave the app in the registry UI, but maybe not, or maybe you can't change it there either. Certainly the workflow editor just gets passed the name from the platform; it doesn't generate it.

@reckart
Member Author

reckart commented Apr 11, 2018

Well, the name I have given to the workflow in the registry UI is "Simple Variable Disambiguation Example (English)". 0931730980607790@openminted.eu 3c6c03b5-9a04-41bb-996a-a2cd536c7ace looks like an auto-generated ID over which I probably do not have control. My guess would be that it is a representation of the user ID concatenated with some other ID...

@greenwoodma
Member

What's weird is that if all workflow IDs are generated the same way, then how have we ever run a workflow? We'd have hit this issue every time. I'm seriously confused by this one.

@reckart
Member Author

reckart commented Apr 11, 2018

Apparently one can edit the workflow name in Galaxy by clicking on the pre-generated name, entering a new value and pressing ENTER. I did that (see screenshot).

[Screenshot: 2018-04-11_12-08-31]

However, when I press "save" now, nothing happens. Odd...

Ok, when I go back to "My applications" and re-open the workflow in the editor, I can see that the name I put is still there, so I guess the "save" must have worked.

I wonder what would happen if I created a second workflow with the same name...

Anyway, running the now re-named workflow still gives me the same message:

Failed 
Unable to locate named workflow

@courado the "My operations" view has a date, but not a time stamp. It would be great if we could also see the submission and possibly completion times of the execution there.

@galanisd
Member

@greenwoodma

Workflow names in Galaxy are not all generated the same way.

  • Some test Galaxy workflows were named by me (manually). I call them programmatically in our tests (via workflow-service).
  • The Galaxy workflows that are created automatically from the OMTD Registry and correspond to ready-to-use OMTD applications (e.g. Chebi app) seem to have valid names.
  • The applications that are created in the Galaxy editor and then ingested into the OMTD Registry seem to have this problem.
    @courado Please have a look at it.

Also, the workflow ID is a different thing from the workflow name. For each workflow name there is an internal unique workflow ID: the one that workflow-service retrieves from Galaxy in order to initiate a workflow execution.
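
The name-to-ID lookup can be sketched as follows. This is a simplified, hypothetical stand-in: the list `WORKFLOWS`, its IDs and the function `find_workflow_id` are made up for illustration and do not reflect the real workflow-service or Galaxy API.

```python
# Hypothetical in-memory stand-in for the workflow list that Galaxy returns.
# The IDs are made up; real Galaxy IDs are opaque strings assigned by the server.
WORKFLOWS = [
    {"id": "wf-001", "name": "Chebi app"},
    {"id": "wf-002", "name": "0931730980607790@openminted.eu 3c6c03b5-9a04-41bb-996a-a2cd536c7ace"},
]


def find_workflow_id(workflows, name):
    """Return the ID of the workflow with exactly this name, or None if absent."""
    for workflow in workflows:
        if workflow["name"] == name:
            return workflow["id"]
    return None


# Lookup with the stored name succeeds; the URL-encoded form does not match.
print(find_workflow_id(WORKFLOWS, WORKFLOWS[1]["name"]))
print(find_workflow_id(WORKFLOWS, "0931730980607790%40openminted.eu+3c6c03b5-9a04-41bb-996a-a2cd536c7ace"))
```

With an exact-match lookup like this, an escaped name yields no ID, which would be consistent with the "Unable to locate workflow" message.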

@reckart
Member Author

reckart commented Apr 19, 2018

Btw. I have also registered the Keyword Assignment component now and am trying to run it on a single-document corpus. This comment is mainly for documenting when I started it, since this info is not shown in the operations screen. The pipeline is even more minimal than the disambiguation pipeline (no segmenter needed).

[Screenshot: 2018-04-19_02-20-20]

@azielinskiACC

@greenwoodma
Member

So I've had a look at this issue of workflows running forever and I think I've found the problem. I've just pushed a couple of fixes to the workflow service which should appear on beta shortly (not quite sure when they'll get pushed to test).

If you want the details, read on...

Essentially, when a workflow runs, we watch to see when the final step reaches the ok state (both the step and the underlying job). Unfortunately, if an error occurred while running the workflow, it was captured and stored within the workflow service, but there wasn't an exception associated with the error (no exception was thrown, as the error comes from checking the state, not from an exception). So while the internal object used for tracking progress within the workflow service recorded the failed state, there was a problem when it came to communicating this to the registry. The JMS message doesn't contain a flag signifying the state of the workflow; what it contains is an error field, which should be filled with a message when an error occurs. The code in the workflow service filled this in using the message from the exception which had put the workflow into the failed state. Unfortunately, in the case of a workflow failing because Galaxy reported an error state, there was no exception and so no message was returned. As such, while the workflow service knew that the workflow had failed, the registry assumed it was still running and just sat there waiting for the next message from the workflow service, which would never arrive. The fix involves never putting the internal object into the failed state without an associated exception, which means there is now always an error message (hopefully a useful one) which will be passed back to the registry.
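
The idea behind the fix can be sketched roughly like this. It is a simplified Python illustration, not the actual workflow-service code: the class `WorkflowStatus`, the function `check_step` and the message shape are assumptions based only on the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkflowStatus:
    """Hypothetical progress tracker; not the actual workflow-service class."""
    state: str = "RUNNING"
    error: Optional[Exception] = None

    def fail(self, error: Exception) -> None:
        # The only way into FAILED requires an exception, so a message always exists.
        if error is None:
            raise ValueError("a failure must carry an exception")
        self.state = "FAILED"
        self.error = error

    def to_message(self) -> dict:
        # Mirrors the described message shape: no state flag, only an error field.
        return {"error": str(self.error) if self.error is not None else None}


def check_step(status: WorkflowStatus, galaxy_state: str) -> None:
    """Translate a polled Galaxy state into the tracker's state."""
    if galaxy_state == "error":
        # Polling threw no exception, so one is created for the error state.
        status.fail(RuntimeError("Galaxy reported a step in the error state"))
    elif galaxy_state == "ok":
        status.state = "FINISHED"


status = WorkflowStatus()
check_step(status, "error")
print(status.to_message())
```

Because `fail` is the only path into the failed state, a receiver that relies solely on the error field can always tell that the workflow has stopped.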

@reckart I'm guessing your workflows are stuck in this situation. If you could send me the unique ID of the workflow (this is the long alphanumeric sequence next to the words "Workflow Canvas" at the top of the editor screen) then I can double check just to be certain. It won't help with working out why they failed; for that I'd need to look at the logs for the workflow service, I think. @galanisd can you remind me of the IP of the machine running the test instance of the workflow service?

@galanisd
Member

@reckart

I created the same workflow as you; the Variable Dis. component is available in the Workflow editor.
I retested. Steps 1, 2 and 3 were OK: output as expected (checked in Galaxy).

The Variable Dis. component fails while trying to download

de.tudarmstadt.ukp.dkpro.core#de.tudarmstadt.ukp.dkpro.core.variable-detection-model-disambiguation-en-default

Part of the log is attached.
log.zip

Locally on my laptop I do not have the same issue. I am trying to understand why...

@greenwoodma
Member

It would appear that the artifact isn't in any of the repos we look in.

@galanisd
Member

The last 3 steps of the workflow are DKPro UIMA components.

Hmmm...

@reckart
Member Author

reckart commented Apr 19, 2018

The model that VarDis is using should be in the same repo as VarDis itself. However, according to the logs, it tries to download the "default" variant, not the "ss" variant. I'm checking the workflow config again.

@galanisd
Member

I am checking the configuration for the repos on my laptop. I deleted the model, but when I run the script it is downloaded...

@galanisd
Member

I was using modelLocation, not modelVariant.
Corrected. I am retesting right now.

@reckart
Member Author

reckart commented Apr 19, 2018

@reckart
Member Author

reckart commented Apr 19, 2018

> @reckart I'm guessing your workflows are stuck in this situation. If you could send me the unique ID of the workflow (this is the long alphanumeric sequence next to the words "Workflow Canvas" at the top of the editor screen) then I can double check just to be certain.

There are at least two stuck with jobs:

  • 0931730980607790-9b9b1d64-fe3f-4de7-88ca-bf1f788e60f5
  • 0931730980607790-bcdd2736-498c-48e9-b61d-352a043e7175

Btw. I can still edit the workflow name in the workflow editor.

@galanisd
Member

Got results.... :-)
[Image: variabledis]

Attached...

bc4e4776-cc9c-47d1-bf28-0d9b5ab78c46.zip

I hope that is not an illusion...

[Image: result]

  • I still have to check what happens with the repo configuration, even though it seems to work right now.

  • I do not understand why your workflow never completed.

@reckart
Member Author

reckart commented Apr 19, 2018

@galanisd great news!!!

Out of curiosity: does it open in the Annotation Viewer?

@galanisd
Member

Nope... I think that is because the results are written to an "output" folder and not to an "annotations" folder.
This happens because the metadata of the component is currently not passed to the workflow-service.

I might be wrong.
@greenwoodma @antleb @courado ?

@greenwoodma
Member

Yes, there is a redmine issue https://redmine.openminted.eu/issues/767 which I've just bumped.

@azielinskiACC

So, finally. That's great.
Was it possible to use a configuration file?
For NER I use the following https://test.openminted.eu/landingPage/application/2d3fc2aa-6f9b-4a5b-bd75-763a39b8b18b
Correct?

@reckart
Member Author

reckart commented Apr 19, 2018

@azielinskiACC I cannot access the link above. Probably it is a private workflow in your account?

@galanisd
Member

I can ...

[Image: ner]

However, this metadata record seems to be for an image that I created 10 months ago...
(Identifiers OMTD: DemoWF3SSHNER)

Back then there were no docker specs and we created 5 apps (one of them was NER) in order

  1. to do some demos
  2. to experiment with Mesos/Galaxy and see what is required.

The respective image is not OMTD compliant; i.e. it does not follow the docker spec and it will not be executed in the current environment.

See also ...
#1

Who is working on this image/app?

@reckart
Member Author

reckart commented Apr 19, 2018

I'd have to look into the NER thing.

@galanisd
Member

If required please open a new issue (NER Hackathon)

@azielinskiACC

For testing, it would be great to have the proper landing ID for all SS-A applications, since search on the OpenMinTeD platform does not give any results.
Unfortunately, there are some 'empty' corpora that I created which cannot be deleted and which might cause confusion (a known issue?). So please also let me know which data input files (= landing ID) I should use.

@reckart
Member Author

reckart commented Apr 19, 2018

@azielinskiACC @galanisd since the "test.openminted.eu" platform is only for testing and may be reset again... does it make sense at all to use fixed IDs for corpora? Maybe it is better to have people upload their own data or build a corpus using the search functionality.

The names of the SSH components on the other hand are rather stable. I'll run a release and then could publish them to the main platform (non-test).

@pennyl67
Collaborator

@reckart Please note that @antleb is currently updating the main platform (services), so I wouldn't recommend adding anything there until we get notified. The idea is to use the services for all the testing etc., so it must be updated with all the fixes that the test platform has now.

@pennyl67
Collaborator

Sorry, by "testing" I meant the evaluation of the tenders/hackathon

@greenwoodma
Member

@pennyl67 is @antleb updating services to the same as test is currently or to the latest version of the code? The plan was to update test daily since the WP7 call last week, but the workflow-service hasn't been updated in the last week so it's still not got all the bug fixes we've made this week (which is quite a few). The problem is that while I think those fixes all work as expected, I'd assumed they were being tested on test as that was being updated. Now I find it hasn't been, so it may be that we get an up to date services which is buggier than test.

@reckart
Member Author

reckart commented Apr 19, 2018

@pennyl67 @azielinskiACC @galanisd @antleb WP9 also has to wrap up the "tutorial" material. My comment was related to what we can expect to later be able to find on the main platform and what not. Things we can find on the main platform can be in the tutorial, others maybe not. So IMHO it would make more sense to include the building of the corpus/uploading of documents into the tutorial, also the building of a workflow, but not the uploading of the components - we should be able to expect that the components we test now will exist on the main platform. Makes sense?

@pennyl67
Collaborator

@greenwoodma I'm not sure - I didn't know this detail; I'm trying to find out and I'll let you know.
@reckart I've asked @antleb not to delete anything from the main platform before I check; the problem is that we need to clean up various test resources and empty corpora. I will send out an email to everyone to check which resources they want us to keep.
And yes, I understand what you're saying about the tutorials (everything is happening at the same time!) - but how can you build a workflow without the components being on the same platform? I can understand it for the built/updated corpora.

@reckart
Member Author

reckart commented Apr 19, 2018

We publish the SSH UC components to the test platform now for the preparation of the tutorial material. The same components will later be published to the main platform - once the main platform is ready.

@reckart
Member Author

reckart commented Apr 20, 2018

My understanding is that for the VarDis and Keyword components, we are good now.

@azielinskiACC - do you agree? If yes, I would suggest closing this issue.

@azielinskiACC

azielinskiACC commented Apr 20, 2018 via email

@reckart reckart closed this as completed Apr 20, 2018

8 participants