Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prepare_graph_inputs translation error #10

Open
sanjaynagi opened this issue Dec 13, 2021 · 6 comments
Open

prepare_graph_inputs translation error #10

sanjaynagi opened this issue Dec 13, 2021 · 6 comments

Comments

@sanjaynagi
Copy link

sanjaynagi commented Dec 13, 2021

Dear FEEMS authors,

Thank you for your software. I am having trouble running the notebook on my own data - encountering the following error when using the prepare_graph_inputs function. I should add the test data works smoothly.

# path to discrete global grid
grid_path = "GhanaDDG.shp

# graph input files
outer, edges, grid, _ = prepare_graph_inputs(coord=coords, 
                                             ggrid=grid_path,
                                             translated=True, 
                                             buffer=0,
                                             outer=outer)

AssertionError: grid is empty changing translation

If I turn translation=False, the code runs with no errors, but I later get a CHOLMOD error when fitting the model, in that one my matrices is not positive definite (I don't know if that is related). I'm not clear on what translation=True is actually doing.

links to my own data files -

grid shape file - https://drive.google.com/file/d/1OHcAEPxTPLGRwuuro2vv_AyXJf3PoNB9/view?usp=sharing
outer boundary - https://drive.google.com/file/d/16HMvTlwzT4Ia4AHVhZtNjG0jb431WCey/view?usp=sharing
sample coords - https://drive.google.com/file/d/1XW0baGLR9X-nEBJkWY9VXZozNTZ4MxJB/view?usp=sharing

Your help is much appreciated.
Cheers
Sanjay

@lauterbur
Copy link

Hi @sanjaynagi, did you ever find a solution to this? I'm having the same issue, the grid is empty if translated=True. If I change to translated=False, prepare_graph_inputs runs but I then get an error that the Cholesky factorization is not positive definite:

CholmodNotPositiveDefiniteError: ../Cholesky/t_cholmod_rowfac.c:430: not positive definite (code 1)

I've tried messing around with a few things in the code, for me the initial problem seems to be coming in at create_tile_dict, specifically line 38 of utils.py where it checks for intersections between the outer polygon and the grid. When translate=True, there are no intersections so the grid is empty. When translate=False the grid is not empty, and I'm not familiar enough with the methods of spatial graph fitting to understand what might be causing the non-positive definite matrix.

@sanjaynagi
Copy link
Author

I didn't unfortunately. I was using my own grid (it needed to be denser, as I'm looking at a small region), and I don't know if that why? were you using your own grid or the one that comes with the software?

If i use the original grid, I can get FEEMS to work, but I have to include more samples from a wider region, which is not what I want to do.

@lauterbur
Copy link

No, I've tried both of the original grids. I wonder if it's a grid size issue since it works for you with the original grid. Would you mind sharing your denser grid, unless it's region-specific? I'm having trouble installing the dgreathgrid package used to make new grids as described in #9

@sanjaynagi
Copy link
Author

theres a link in the first post! :)

and the code to generate it is below, you just need the boundary of your area in a geopandas dataframe (gpd_df_outer_boundary). Although I cant fully promise its correct.

def random_points_within(poly, num_points):
    min_x, min_y, max_x, max_y = poly.bounds
    x = (np.linspace(min_x,max_x,num_points))
    y=  (np.linspace(min_y,max_y,num_points))
    xx,yy = np.meshgrid(x,y,sparse=True)
    xx = xx.reshape((np.prod(xx.shape),))
    yy = yy.reshape((np.prod(yy.shape),))
    points = []

    for x in xx:
       for y in yy:
           random_point = Point([x, y])
           if (random_point.within(poly)):
              points.append(list(random_point.coords))
    print(min_x,max_x,max_y,min_y)
    return points,xx,yy

pts, xs, ys = random_points_within(gpd_df_outer_boundary['geometry'].iloc[0], 10)
pts = np.array(pts)
xs = pts[:,0][:,0]
ys = pts[:,0][:,1]

GridGeo = gpd.GeoDataFrame(geometry=gpd.points_from_xy(xs, ys))
GridGeo['geometry'] = GridGeo['geometry'].buffer(0.2)
GridGeo.to_file("GhanaDDG.shp")

@lauterbur
Copy link

Thanks! I've requested access.

I'm making some progress leaving the outer boundary out altogether:
outer, edges, grid, _ = prepare_graph_inputs(coord=coord, ggrid=grid_path, translated=False, buffer=10)
Now I'm getting a convergence error rather than the Cholesky factorization error. This is making me suspect that I may just have too little data to be informative. If you have few samples and/or few SNPs I wonder if your issue was because of the same thing.

@sanjaynagi
Copy link
Author

ooops didnt realise it needed access approval.

Yeah, I agree, I have both a lot of samples and SNPs, but I think the reason I get it is because my small region is too genetically uniform for it to find any signal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants