You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In the example above, 1-4 are just shifts around the (here assumed real) junction 100-400, 1-5 is lumped into the group only because of no. 4 but would not be otherwise, and no. 6 should not be considered part of the group.
In graph form, this looks like this:
1 ----- 2
| \ / |
| / \ |
3-----4 ------- 5
6
Ideally we want to select 1, 5 and 6, and discard 2-4 as minor variations on 1.
The way I would do it with a Mikado-like algorithm:
1- lump together junctions by their distance. So create a super-group of 1-6. These are our nodes.
2- Define the edges. Two junctions are connected if their acceptors are within 10 bps of each other, and their donors are within 10 bps of each other.
3- Define the subgraphs, using an algorithm to find connected components in a graph. This is an implementation in Python: http://networkx.readthedocs.io/en/networkx-1.11/_modules/networkx/algorithms/components/connected.html#connected_components
4- In this case, we have two subgraphs: (1,2,3,4,5) and (6). Consider them separately (this effectively solves the problem of potentially excluding 6).
5- Find the cliques in each subgraph: there are various algorithms but I am banal and I like the very popular Bron-Kerbosh solution. A possible implementation is here: http://networkx.readthedocs.io/en/networkx-1.11/_modules/networkx/algorithms/clique.html#find_cliques
In this case, the algorithm yields two cliques: (1,2,3,4) and (4,5)
6- Define the scoring. Let us assume it is as follows: 1 > 2==3 > 5 > 4.
7- The first node to consider is 1, as it is the one with the greatest score.
8- Assign all the nodes which are in a clique shared by (1) to be secondary to the (1) group. Remove them from further consideration.
9- This leaves you with disconnected components inside the subgraph. Repeat steps 5-8 until all nodes are exhausted.
10- In our case, step 5-9 yield two "primary" nodes: 1 and 5.
The text was updated successfully, but these errors were encountered:
The problem arises by having multiple junctions which are separated by small distances of acceptors and donors. A possible example would be:
1 Chr1 100 400
2 Chr1 95 403
3 Chr1 100 402
4 Chr1 109 410
5 Chr1 115 420
6 Chr1 100 560
In the example above, 1-4 are just shifts around the (here assumed real) junction 100-400, 1-5 is lumped into the group only because of no. 4 but would not be otherwise, and no. 6 should not be considered part of the group.
In graph form, this looks like this:
1 ----- 2
| \ / |
| / \ |
3-----4 ------- 5
6
Ideally we want to select 1, 5 and 6, and discard 2-4 as minor variations on 1.
The way I would do it with a Mikado-like algorithm:
1- lump together junctions by their distance. So create a super-group of 1-6. These are our nodes.
2- Define the edges. Two junctions are connected if their acceptors are within 10 bps of each other, and their donors are within 10 bps of each other.
3- Define the subgraphs, using an algorithm to find connected components in a graph. This is an implementation in Python:
http://networkx.readthedocs.io/en/networkx-1.11/_modules/networkx/algorithms/components/connected.html#connected_components
4- In this case, we have two subgraphs: (1,2,3,4,5) and (6). Consider them separately (this effectively solves the problem of potentially excluding 6).
5- Find the cliques in each subgraph: there are various algorithms but I am banal and I like the very popular Bron-Kerbosh solution. A possible implementation is here:
http://networkx.readthedocs.io/en/networkx-1.11/_modules/networkx/algorithms/clique.html#find_cliques
In this case, the algorithm yields two cliques: (1,2,3,4) and (4,5)
6- Define the scoring. Let us assume it is as follows: 1 > 2==3 > 5 > 4.
7- The first node to consider is 1, as it is the one with the greatest score.
8- Assign all the nodes which are in a clique shared by (1) to be secondary to the (1) group. Remove them from further consideration.
9- This leaves you with disconnected components inside the subgraph. Repeat steps 5-8 until all nodes are exhausted.
10- In our case, step 5-9 yield two "primary" nodes: 1 and 5.
The text was updated successfully, but these errors were encountered: