Repressive TF prediction: "AnanseScanpy_outs/maelstrom/final.out.txt" not found #15
Comments
Dear @bueredlemon11; Xinqi, I went through your log file and it seems that maelstrom did not run. I saw this:
Instead of what it should be:
You should have maelstrom as a job. Then a folder will be generated in your output dir. I do not know whether this is an issue with the HPC rather than your local PC. I would recommend running on both in parallel to rule this possibility out. Please rerun anansnake and check that maelstrom appears in the job list in the snakemake log file before proceeding. Hope to have helped! Best, Julian
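The check suggested above (does maelstrom appear in the job list of the snakemake log?) can be scripted. This is a minimal sketch, assuming the log contains a "Job stats:" or "Job counts:" table with one rule per row; the helper name `scheduled_jobs` and the sample log text are hypothetical and only illustrate the idea, they are not taken from an actual anansnake run.

```python
def scheduled_jobs(log_text):
    """Collect rule names from the job table of a Snakemake log (assumed layout)."""
    jobs = set()
    in_table = False
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith(("job stats", "job counts")):
            in_table = True
            continue
        if not in_table:
            continue
        if not stripped:
            if jobs:
                break  # a blank line after the rows ends the table
            continue
        tokens = stripped.split()
        if len(tokens) < 2:
            continue  # e.g. a lone total count
        # Some layouts list "count job", others "job count": handle both.
        name = tokens[-1] if tokens[0].isdigit() else tokens[0]
        if name.startswith("-") or name in {"job", "jobs", "count", "total"}:
            continue  # header, separator, or total row
        jobs.add(name)
    return jobs


# Hypothetical log excerpt, for illustration only.
SAMPLE_LOG = """\
Building DAG of jobs...
Job stats:
job          count
---------  -------
all              1
binding          2
network          2
total            5

Select jobs to execute...
"""

print("maelstrom scheduled:", "maelstrom" in scheduled_jobs(SAMPLE_LOG))
```

For a real run, the text could be read with `pathlib.Path(log_path).read_text()` from the `.snakemake.log` file attached to the run.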
Hello @Arts-of-coding Julian, Thanks for your suggestion! I tried running on my local PC once and on the HPC several times, but the maelstrom job still didn't appear in the job list, as you can see in this log from my PC: 2024-11-20T111420.924211.snakemake.log The same happened in other runs on the HPC: 2024-11-14T235745.843525.snakemake.log Are there other possible reasons for the maelstrom job not running? Or is there a way to run maelstrom separately somewhere else? In the manuscript we're currently preparing, we plan to use the repressive TFs predicted by scANANSE for further modeling, but we got stuck here... Best,
Dear @bueredlemon11 Xinqi, Option 1: It could be that there is a mismatch in installed package versions. First of all, please compare your software environment to the one we used below. If there are any differences, try to install the versions specified below.
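One way to do the comparison from Option 1 inside the active environment is sketched below. It relies only on the standard library's `importlib.metadata`; the helper name `version_mismatches` is hypothetical, and the `EXPECTED` pins are an assumption to be replaced with the versions from the reference environment (the setuptools value is the one quoted later in this thread).

```python
from importlib.metadata import PackageNotFoundError, version


def version_mismatches(expected):
    """Return {package: (wanted, found)} for every package whose installed
    version differs from the expected pin; found is None if not installed."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


# Assumption: fill this in from the reference environment listing.
EXPECTED = {"setuptools": "65.5.1"}

for package, (wanted, found) in version_mismatches(EXPECTED).items():
    print(f"{package}: expected {wanted}, found {found}")
```

An empty result means every listed package matches its pin; it says nothing about packages that are not listed.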
Option 2: Second, it could be that the folder structure is not correctly picked up by anansnake. For replication, you could put your data in the "example" directory of anansnake. This should be the anansnake folder: This should be inside the "example" directory: Something similar to this should be in the outdir: This should be in the deseq2 folder, before starting the run (keep in mind the timestamps): Lastly, you should have a similar structure to this in the config.yaml:
Option 3: Third, you can try to run it multiple times in a row.
I hope that one of these options will make maelstrom appear in the job list. If not, please let me know. Kind regards, Julian
Hi @Arts-of-coding Julian, Thanks for your options! I tried options 1 and 3. The maelstrom job now runs; however, ananse network gives me error messages. First I managed to update the package versions as you indicated in the reply, then in the log I can see the maelstrom job running: As I opened log_average.txt I tried to run other datasets but they all gave the same error messages from ananse network. Have you perhaps seen this before? PS: the only difference in our package versions is "setuptools": in my environment it's 59.8.0, whereas in yours it's 65.5.1. I tried to upgrade the version but got the following incompatibility error. Could this be a cause? Best regards,
Dear @bueredlemon11 Xinqi, Good to see that maelstrom now runs for you. From a practical point of view, I would suggest having two different anansnake environments: one where you run Maelstrom and save the results externally (the latest environment), and another where network runs correctly, saving those results somewhere else. These network-related error messages did not pop up in the previous environment, right? Alternatively, it might be that the binding file is somehow corrupted during the run (e.g. average.h5); you can remove this file just before you begin re-running. However, if you also experienced these errors in the previous runs (with the previous anansnake environment), it might be your underlying data. You really need to check this statement below, as suggested in the logging:
To rule out that "average.h5" is the issue, you can of course only run direct contrasts (e.g.: "day2/3_day3/4") and see if that runs correctly (see option 2 config.yaml from my previous message). If the average.h5 network is indeed the issue, you can also think about implementing a biologically relevant comparison (instead of the average network). This might be a specific type of (naïve) stem cells (e.g. iPSCs) or using the first day of the time series as your baseline to compare everything else to. If you choose to do this you need to define the comparison for all your conditions in AnanseScanpy, in a similar fashion to this:
It does also worry me that you see this happening in the other log file:
If you have two different anansnake environments, you can directly compare if this happens in both of them. I hope that this helps you advance further. Kind regards, Julian
Hi @Arts-of-coding Julian, Thanks for your suggestion. I tried two environments: 1) anansnake, from the scANANSE tutorial; 2) anansnake_ja, from the .yml you shared in previous reply. In the first environment "anansnake", the job ran successfully without Maelstrom job, as you can see in the log below: The binding and network jobs were OK: In the second environment "anansnake_ja", However, under the folder "AnanseScanpy_outs" I didn't find "Maelstrom" folder I only found under "gimme" folder, But in "final.out.txt", When attempting to use "hg38-maelstrom" folder for the python code: It seems like Maelstrom job indeed ran but was not complete. Would it appear in later jobs if the run were successful? Have you seen this before as well? Thanks in advance for your help. Best regards, |
Dear @bueredlemon11 Xinqi, The final.out.txt indeed seems corrupted for your data; I have not seen this before. First, you can try to remove the final.out.txt file and then use option 3 as specified earlier (see below), by just rerunning everything (maelstrom should rerun). Alternatively, you can see whether replicating your pipeline with the provided sample data shows similar output. If it does, then the problem is the underlying maelstrom job in your environment. If the sample data works (and shows a good final.out.txt) and yours does not, then it must be something in the data that you supplied. You then need to compare the sample data (dataframes and .tsv files) to your data and troubleshoot from there until it is fixed. Within Python you can check the datatypes of pseudobulk tables in e.g. Pandas and convert them if needed with ".astype(...)". Kind regards, Julian
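The datatype check mentioned above could look roughly like this in Pandas. It is a sketch on a made-up table: the column and gene names are hypothetical, the helper `non_numeric_columns` is not part of AnanseScanpy, and converting to `int64` assumes the table holds whole-number counts.

```python
import pandas as pd

# Hypothetical pseudobulk table: genes as rows, clusters as columns.
# One column was read as text, which is the kind of issue to look for.
table = pd.DataFrame(
    {"cluster_a": ["10", "0", "3"], "cluster_b": [5, 2, 7]},
    index=["GENE1", "GENE2", "GENE3"],
)


def non_numeric_columns(df):
    """List the columns whose dtype is not numeric."""
    return [col for col in df.columns if not pd.api.types.is_numeric_dtype(df[col])]


print(table.dtypes)                # inspect the dtype of every column
bad = non_numeric_columns(table)   # columns that need converting

# Convert only the offending columns (assumes integer counts).
fixed = table.astype({col: "int64" for col in bad})
print(fixed.dtypes)
```

Running the same check on the sample data and on your own tables makes it easier to spot where the two differ.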
Hello,
I have tried to reproduce the results of the scANANSE manuscript using the example data. However, I got stuck on the final step, which should predict the repressive TFs. I ran the pipeline in Python following the code in “AnanseScanpy_equivalent.pdf”.
On this line:
asc.import_scanpy_maelstrom(anndata=adata, cluster_id="predicted.id", maelstrom_dir="AnanseScanpy_outs/maelstrom/")
I received an error message
FileNotFoundError: [Errno 2] No such file or directory: 'AnanseScanpy_outs/maelstrom/final.out.txt'
When I checked the generated files, the maelstrom folder and pfmscorefile.tsv file weren’t generated in previous steps of AnanseScanpy and anansnake. I understand that this line imports the motif enrichment results into the Scanpy object, but in my case this result is stored in another folder /AnanseScanpy_outs/gimme/pfmscorefile.tsv, which looks like
I therefore manually created a maelstrom folder and put the file pfmscorefile.tsv in there. I then changed the function import_scanpy_maelstrom() on line 187 in “anansescanpy_import.py” script as follows:
But I received another error from another function per_cluster_df()
ValueError: Length mismatch: Expected axis has 0 elements, new values have 131368 elements
Thus, I wanted to ask: what is this “AnanseScanpy_outs/maelstrom/final.out.txt” file? Should I exclude first 4 rows of pfmscorefile.tsv and save it as final.out.txt? The anansnake step didn't give me error messages.
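A small pre-check before calling `import_scanpy_maelstrom` can show whether the maelstrom output is missing, rather than failing inside the import. This is only a sketch: the helper name `missing_maelstrom_outputs` is hypothetical, and the single required file name is taken from the error message above, not from the AnanseScanpy source.

```python
from pathlib import Path


def missing_maelstrom_outputs(maelstrom_dir, required=("final.out.txt",)):
    """Return the required file names that are absent from maelstrom_dir."""
    directory = Path(maelstrom_dir)
    return [name for name in required if not (directory / name).is_file()]


missing = missing_maelstrom_outputs("AnanseScanpy_outs/maelstrom/")
if missing:
    print("maelstrom output incomplete, missing:", ", ".join(missing))
else:
    print("maelstrom output found")
```

If the file is reported missing, the maelstrom job most likely never ran, which points back to the anansnake step and its log rather than to the import function.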
Please find attached log file:
2024-09-23T143049.814049.snakemake.log
I ran scANANSE on a high-performance computing cluster provided by my university, instead of on my local PC. Could that be the reason?
Your help would be greatly appreciated!
Best,
Xinqi