
Memory Problems #10

Open · ian-adams opened this issue Oct 15, 2018 · 10 comments

Comments
@ian-adams

Hello again -

When I am running eda, I get the following error:

J(): 3900 unable to allocate real [29,536870912]
tuples_get_indicators(): - function returned error
tuples(): - function returned error
: - function returned error
r(3900);

Has this come up before? I am using a relatively modern (less than a year old) Windows laptop with 16 GB of RAM. Or is this not actually a memory error?
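
For scale, assuming each real that Mata allocates is an 8-byte double (an assumption, but the standard storage type), the failed allocation works out to roughly

29 * 536,870,912 * 8 bytes ≈ 125 GB

which is far beyond 16 GB of RAM, so the message does look like a genuine out-of-memory condition rather than a hardware or Windows problem.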

@ian-adams
Author

A little more info:

I've managed to narrow down the issue: it throws the error every time (across different but related datasets) right after finishing the "x^2 Probability Plots for [var]" for the last variable in Stata's variable list. I don't know what step comes next, but that is always the plot left showing when it fails.

@wbuchanan
Owner

How many categorical and continuous variables are you using? It is likely a problem with the number of permutations of the variables. The bubble plots will consume the most memory, but with a sufficient number of continuous variables there could be a massive number of possible pairs that can be created.
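
For a rough sense of how quickly the number of pairs grows, Stata's comb() function counts the distinct pairs directly; this is only a back-of-envelope illustration, not the package's actual bookkeeping.

* illustration only: the number of distinct pairs among k variables is comb(k, 2)
display comb(15, 2)    // 105 pairs from 15 continuous variables
display comb(45, 2)    // 990 pairs if 45 variables were all paired with each other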

@ian-adams
Author

ian-adams commented Oct 15, 2018

Running with the nobubble and noladder options hangs at the same error.

The actual dataset has approximately 15 continuous variables and 30 categorical. But if I cut down to a smaller training set of 10 categorical and 3 continuous variables, with nobubble, it gets to the correlation heat map and then gives the following:

(0 observations deleted)
(file C:/Users/adams/Google Drive/STATA Data/Officer Wellness/SLC_eda/1/graphs/edaheatmap.pdf written in PDF format)
matsize must be between 10 and 800
r(198);

I ran query memory just in case, but matsize is already set to 400. I've also noticed it isn't creating a batch file at all; not sure if that's related.

@ian-adams
Author

Trying it with sysuse auto.dta: it manages to get through to the heat map, then fails with the same error:

(0 observations deleted)
(file C:/Users/adams/Google Drive/STATA Data/Officer Wellness/eda testing/1/graphs/edaheatmap.pdf
written in PDF format)
matsize must be between 10 and 800
r(198);
end of do-file

As the package begins to run, it gives this note:

note: file C:/Users/adams/Google Drive/STATA Data/Officer Wellness/eda testing/1/autos.tex not found

The "autos.tex" file is created and is there in the correct root folder.

So it seems there's still something going on in the create PDF at the end of the script? This is on a fresh install of all .ado files.

@wbuchanan
Owner

The first error message seems like something related to the Stata configuration for matrix sizes. I can’t remember off the top of my head, but you should be able to do something like

set matsize 11000

To address the first issue above. Are you able to share the data you are working with and/or simulate a dataset with roughly the same properties?
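
(For anyone hitting the same matsize error: in Stata 15 and earlier you can check the current value and the ceiling your flavor allows before raising it; a quick sketch, assuming the usual c() system values are available.)

display c(matsize)       // current setting (400 in the log above)
display c(max_matsize)   // upper bound for this flavor (800 in Stata/IC)
set matsize 800          // raise to the flavor's maximum if needed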

@ian-adams
Author

ian-adams commented Oct 16, 2018

On the first issue: my Stata (IC, I think) has a matsize upper limit of 800, and I've already raised it to that maximum.

I can share the data.

Even when I'm running the auto sysuse data, I'm getting that *.tex not found issue.

@wbuchanan
Owner

The note isn’t an actual problem. It is basically just saying that the replace option under the hood isn’t being used. If you were to run things a second time it shouldn’t display the same message since it would be replacing that file.

@ian-adams
Author

Ah, I see.

So any idea why it's not compiling at the end? I have pdflatex on the system path, and smaller training sets progress through the package to the edaheat portion, but then it stalls out. The matsize issue is Stata-specific, and I can see how a larger set of variables could trigger it, but it shouldn't affect compiling the report, right?
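
One way to isolate whether the stall is on the Stata side or the pdflatex side is to compile the generated .tex by hand from Stata's shell; the path here is the one from the log above, and this is only a diagnostic sketch, not part of the package.

cd "C:/Users/adams/Google Drive/STATA Data/Officer Wellness/eda testing/1"
shell pdflatex autos.tex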

@wbuchanan
Owner

Not sure to be honest. I’ll see if I can replicate from a Windows machine later today or over the weekend. It is especially weird if that is happening only when you use some datasets and not others.

@wbuchanan
Owner

@iadams78
As an FYI, I recently ran into the memory issue that you hit when doing some work for a colleague. On the compilation issue, there are a couple different things that I started to notice.

  1. On Windows machines the command is adding a . character between makeLaTeX and .bat, so the file name ends up looking like makeLaTeX..bat (see the sketch after this list).
  2. Stata seems to have some issues resolving networked hard drive locations mapped as a local drive (e.g., \\Some-Network-Location\Directory\SubDirectory mapped to R:).
  3. On OSX the bash script now runs in the user's home directory instead of the directory it is supposed to run in.

I'm going to create some additional issues to look into this a bit more, but figured this might be helpful for explaining why the compilation might look like it isn't working (I thought the same and then realized that the files were all being created in my home directory).
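
On item 1, a guard along these lines could strip a stray trailing period before the extension is appended; this is only a sketch with hypothetical local names, not the package's actual code.

* hypothetical sketch: drop a trailing "." before appending the extension
local stub "makeLaTeX."
if substr("`stub'", -1, 1) == "." {
    local stub = substr("`stub'", 1, strlen("`stub'") - 1)
}
local batch "`stub'.bat"
display "`batch'"    // prints makeLaTeX.bat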
