Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing first order nodes? #1

Open
xindi-dumbledore opened this issue Aug 28, 2018 · 7 comments
Open

Missing first order nodes? #1

xindi-dumbledore opened this issue Aug 28, 2018 · 7 comments

Comments

@xindi-dumbledore
Copy link

xindi-dumbledore commented Aug 28, 2018

I'm trying to use the python code to generate some higher order network and have some prediction of the trajectory. However, it seems the generated network is missing first order nodes. For example, if the trajectory I put in is
`0 244 1619 2824 2454 1296 3626 4949 9195 9195 15708 950 1481

1 23 207 349 153 20 341 93 418 2092 2092 103 4316 4098 4316 1586 4098 4098 4098 4098 4098 4098 4098 3057 4098 4098 3818 4122 4098 1586 4098 4098 3309 4122 5455 4122 4098 3309`

and if I set some minimum support larger than 1, the network is basically empty.

Of course, the real data has more than just two trajectories. But from my understanding of the algorithm, it should have the first order network no matter what, and rewire edges based on the higher order rules. However, this is not the case based on my observation. I'm wondering whether I'm understanding the algorithm incorrectly? If this should be the case, how to predict the trajectory with some nodes missing?

Thanks a lot :)

@xyjprc
Copy link
Owner

xyjprc commented Aug 28, 2018

Thank you @xindi-dumbledore .

The MinSupport means that anything we observe less than that many of times will be discarded. In your example, other than self-loops (which are ignored by BuildHON), we observe all patterns only once (244->1619, 1619->2824, etc). In other words, if you build a first-order network, all edge weights will be 1. Setting MinSupport = 2 means throwing away all edge weights <= 2. This is expected.

The order of dependency is not controlled by MinSupport. Whenever the algorithm deems a higher-order is necessary, it will try to use higher-orders, unless hitting the MaxOrder cap.

Personally I set MinSupport = 1 unless you have a very large data set and want to speed up the computation. If you want to set MinSupport > 1, what you could try is introducing some repeating patterns, like
0 3 5 3 5 3 4 3 4 3 5 3 5
1 2 3 5 4 3 6 2 3 5 4 3 6

Feel free to let me know of further questions.

@xindi-dumbledore
Copy link
Author

Thanks a lot for the quick response @xyjprc !!!

I have one more question to ask. I saw your guys paper mentioned the trajectory prediction using random walk. Could you elaborate on how it works? Since now there are higher order nodes, given a trajectory, there maybe multiple nodes at each step.... I'm a bit confused on how it works.

Thanks again!

@xyjprc
Copy link
Owner

xyjprc commented Aug 29, 2018

@xindi-dumbledore : The purpose of random walking simulation on HON is not intended to do trajectory prediction (there are other more sophisticated methods on series prediction); rather, the random walking accuracy scores serve as a way to measure information retained in the network structure, and measure how well the random walkers' trajectories can show higher-order patterns by walking on HON rather than FON.

There can be multiple higher-order nodes representing the same physical location (state): if you see A|B.C.D, it still represents node A, but only paths from D->C->B will lead to this higher-order node.

When your random walker's trajectory looks like A -> B|A -> C|B -> D|C.B -> B|D, you are essentially seeing A -> B -> C -> D -> B, just ignore the higher-order part after the "given" symbol (vertical bar).

@xindi-dumbledore
Copy link
Author

@xyjprc : sorry I may have too many questions but I do want to understand how you guys obtain the random walking accuracy scores correctly, the process I have in my mind is:

  1. Split the data into train data and test data
  2. Obtain the HON on train data
  3. This step I am confused: have a random walker walking on the HON and get his trajectories and match with the test data? But the trajectories matched need to have the exact same previous pattern with the given trajectory?

One more thing, I kind of get your guys purpose is not doing trajectory prediction but do you think it kind of make sense to use this as a trajectory prediction method? Or you think it's a bad idea hahahah :D

@xyjprc
Copy link
Owner

xyjprc commented Aug 29, 2018

@xindi-dumbledore Questions are always welcome. You are understanding the process correctly. That is what Figure 2 in the paper illustrates (http://www.jianxu.net/en/files/HON_SA.pdf).

To do a training/testing split, you can change this line to hold out the last X steps for testing (https://github.com/xyjprc/hon/blob/master/pyHON/main.py#L42)

HON is not intended to be a trajectory prediction engine; it is a good way to distill significant higher-order movement patterns, and help roughly simulate movement patterns with the least added complexity to the network representation (that's why it's variable order).

@xyjprc xyjprc closed this as completed Dec 2, 2018
@xyjprc xyjprc reopened this Dec 2, 2018
@ICarvalho
Copy link

@xindi-dumbledore Questions are always welcome. You are understanding the process correctly. That is what Figure 2 in the paper illustrates (http://www.jianxu.net/en/files/HON_SA.pdf).

To do a training/testing split, you can change this line to hold out the last X steps for testing (https://github.com/xyjprc/hon/blob/master/pyHON/main.py#L42)

HON is not intended to be a trajectory prediction engine; it is a good way to distill significant higher-order movement patterns, and help roughly simulate movement patterns with the least added complexity to the network representation (that's why it's variable order).

Hi, Jian Xu

I have done what you mentioned about chaning line 42 of main.py to have the dataset split in trainining+ test, but when I do that, nothing is written in the output file. In addition to this, I cannot get the probabilities using the non-FREQ mode, only the counts of FREQ one. Could you please helpe giving me some clue about that?

Regards,

Igor.

@xyjprc
Copy link
Owner

xyjprc commented Oct 3, 2020

@ICarvalho Hi Igor,

When you increase LastStepsHoldOutForTesting, each series' length will reduce by that number when training. I wonder if you set a high value for that parameter while your each of input series is very short? For example, if each trajectory's length is 5, but you set LastStepsHoldOutForTesting=10, the algorithm essentially trains on empty inputs.

I suggest trying with some trivial inputs, such as 3 trajectories of lengths of 10, setting MaxOrder = 1 then slowly increase, and see if it works. Let me know!

Best,
Jian

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants