Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create direct bindings to DGGRID #121

Open
danlooo opened this issue Oct 24, 2024 · 6 comments
Open

Create direct bindings to DGGRID #121

danlooo opened this issue Oct 24, 2024 · 6 comments
Assignees

Comments

@danlooo
Copy link
Owner

danlooo commented Oct 24, 2024

Currently, communication between DGGS.jl and DGGRID is done via files and the shell interface of DGGRID7_jll binary. This is ineffective and the bottleneck of this package. DGGS.jl would profit drastically from direct memory access of C++ DGGRID functions. This is known so far:

@danlooo danlooo self-assigned this Oct 24, 2024
@danlooo
Copy link
Owner Author

danlooo commented Oct 28, 2024

This is how to convert coordinates from geographic to Q2DI space using PROJ:

  1. Convert (lon, lat) to ISEA (x,y) using proj with +proj=isea +orient=isea +mode=plane +R=6371007.18091875 +x_0=0 +y_0=0
  2. Apply linear transformations i.e. this R code:
# proj (x,y) to q2di (n,i,j):
# TODO: allow quads other than 1
# TODO: Generalize to DGGRID res_spec resolution other than 3
proj_to_q2di <- function(x,y) {
  # rotate: diamond to parallelogram
  theta <- 45
  i <- x * cos(theta) - y * sin(theta)
  j <- y * cos(theta) - x * sin(theta)
  
  # shift: parallelogram to square
  i <- i - (2 * x)
  
  # move and reflect: align origin and basis
  # values after rotation and shift
  min_i <- 15563031 
  min_j <- 12159298
  max_i <- 25465716
  max_j <- 18269308

  i <- (max_i - (i - min_i) - min_i) 
  j <- (max_j - (j - min_j) - min_j)
  
  # scale to row and col number
  # already starts with 0, devide by step (x_1 - x_o), multiply by number of rows/cols
  # add one to step size so that last step still its into grid after floor
  i <- i / (9902685+1) * 8 
  j <- j / (6110010+1) * 8

  # floor: discretization
  # round would result in sheared grid
  # TODO: Is this equivalent to hexagonal nearest neighbor i.e. works on non-center points within a cell?
  i <- as.integer(i)
  j <- as.integer(j)

  # transpose
  i_old <- i
  i <- j
  j <- i_old

  return(c(i,j))
}

@d-consoli
Copy link

I tried to dig a bit on the library to understand what happens when we use it like you need:

  1. From the main in dggrid.cpp meta file is parsed from SubOpMain.cpp
  2. When we send TRANSFORM_POINTS, during the initialization the operation SubOpTransform will be selected from OpBasic.cpp
  3. Then the main continue to the operation execution, you can see at line 100 that the operation iterates over each location to compute the results
  4. The getNextLoc works ether on the input text file of from GDAL (I don’t really know what does it mean)
  5. We fall in the case “pointInputFileType == "TEXT"” so loc is initialized with inStrToPointLoc

I tried to dig further but I did not get how the core information of this DgLocationData is stored, but you can instantiate it with a constructor that takes a null pointer and then fill it with a DgDataList, and I did not find where the real numbers are stored.

For now what I tired to do to have a bit of control is to fork the repo and mody the SubOpTransofrm.cpp in order to write the output as I would always get the input coordinates “15.1,23” 4 times (and levels still depends on the meta file). At least I succeeded! I mights sound silly, but this means that if I do something similar for the result writing, and we set up CxxWrap.jl, we could:

  1. Get a matrix of floats for coordinates;
  2. Convert row to a string variable while iterating;
  3. Get the result as string variable;
  4. Convert it back to a vector of integers;
  5. Return the matrix of integers.

This would save at least the I/O of writing and reading the two files, leaving only the string/number conversion. But of course I will push to directly use the numbers, maybe adding some ad-hoc constructors to DgLocationData. Note also that I could easily make this threaded per point with OpenMP, it will be super fast!

@danlooo
Copy link
Owner Author

danlooo commented Nov 4, 2024

Thanks a lot @d-consoli ! Seems that we can't bypass parsing string to float in DGGRID. Usually, parsing is much slower than reading files (e.g. high user parsing time vs low system file reading time in CSV files). I expect that the speed up would not be much.
The PROJ approach doesn't have that bottleneck. The linear transformations from ISEA (x,y) space to Q2DI (n,i,j) space is fast. However, my R code above is only tested on cell center points. round/floor operation works for rectangular grids, but I don't know whether it is applicable on hexagonal grids, i.e. points nearby the corner of a hex cell will accurately be assigned to its cell center coordinate.

@d-consoli
Copy link

Update:

  1. From src/apps/dggrid/SubOpTransform.cpp it calls inStrToPointLoc in src/apps/dggrid/SubOpBasicMulti.cpp.
  2. The meaty part is remainder = loc->fromString(buff, op.inOp.inputDelimiter);where thisfromStringis implemented insrc/lib/dglib/lib/DgRF.hpp`.
  3. Here the information is passed to str2add, that is calling the implementation in the header src/lib/dglib/include/dglib/DgContCartRF.h that simply calls fromString of src/lib/dglib/lib/DgDVec2D.cpp, that does the actual work.

I easily implemented a method fromFloat in DgDVec2D.cpp, that is what we need, but I need to deal with some inheritance pipeline to make it work, but I see the light!

@d-consoli
Copy link

Good news, I manged add some methods to bypass all the strings conversion, both in input and in output! At least for the classes involved in our case. You can check this out. Now it's only missing to interface it with CxxWrap.jl.

@danlooo
Copy link
Owner Author

danlooo commented Nov 12, 2024

Great news :) This is a big leap towards reducing the overhead. Is https://github.com/d-consoli/DGGRID backwards compatible with the original https://github.com/sahrk/DGGRID by just extending the API with a function accepting integers? I'd be great to open a PR such that all other bindings e.g. https://github.com/am2222/pydggrid will profit from your work?

@danlooo danlooo closed this as completed Nov 12, 2024
@danlooo danlooo reopened this Nov 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants