
pyNBS.network_propagation.network_propagation

Tongqiu Jia edited this page Jun 14, 2018 · 8 revisions

This function performs random-walk-based network propagation of binary somatic mutation profiles over a given molecular network. We implement the closed-form formulation of this propagation model as described by Leiserson et al. 2015:

$$F_t = (1 - \alpha)\, F_0\, (I - \alpha A_{norm})^{-1}$$

F_t is the resulting matrix of network-propagated somatic mutation profiles. F_0 is the binary mutation matrix. A_norm is the normalized adjacency matrix of the molecular network (see normalize_network for how it is calculated). α is the network propagation constant, which controls how much mutation information is spread over the network. The default value for α is 0.7, as suggested by Hofree et al. Results may vary if α is changed dramatically, but Hofree et al. also suggest that α values between 0.5 and 0.8 produce relatively robust results. Additional analysis of α can be found in this supplementary notebook.
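As a minimal sketch of the closed form described above (not the pyNBS source), the following NumPy example propagates two toy patient profiles over a four-gene path network and checks the result against the iterative random walk with restart. The toy network, the function name `propagate_closed_form`, and the row-wise degree normalization are assumptions made for illustration; the normalization pyNBS actually applies is documented under normalize_network.

```python
import numpy as np

def propagate_closed_form(F0, A_norm, alpha=0.7):
    """Closed-form propagation: (1 - alpha) * F0 * inverse(I - alpha * A_norm)."""
    n = A_norm.shape[0]
    return (1 - alpha) * F0 @ np.linalg.inv(np.eye(n) - alpha * A_norm)

# Toy path network of four genes: g0 - g1 - g2 - g3 (hypothetical example data).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# Assumption for this sketch: divide each row by the node degree.
A_norm = A / A.sum(axis=1, keepdims=True)

# Two patients (rows) by four genes (columns), binary mutation calls.
F0 = np.array([[1, 0, 0, 0],
               [0, 0, 1, 1]], dtype=float)

alpha = 0.7
Ft_closed = propagate_closed_form(F0, A_norm, alpha)

# The same result reached by iterating the random walk with restart to convergence.
Ft_iter = F0.copy()
for _ in range(200):
    Ft_iter = alpha * Ft_iter @ A_norm + (1 - alpha) * F0

print(np.allclose(Ft_closed, Ft_iter))
```

With this row-wise normalization each patient's total mutation signal is preserved, so the rows of the propagated matrix sum to the same values as the rows of F_0.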

This function can be used in two ways:

1. Computing a "kernel" for speeding up multiple propagation operations over the same network:

pyNBS performs many subsampling and propagation steps. We have found that the algorithm can be sped up significantly for large numbers of these iterations by pre-computing a gene-by-gene matrix that describes the influence of each gene on every other gene in the network under the random-walk propagation operation. We refer to this matrix as the "network propagation kernel". To compute this kernel, we essentially propagate all of the genes in the molecular network independently of one another. The "kernel" formulation substitutes F_0 with an identity matrix of the same dimensions as the adjacency matrix. The resulting "kernel" matrix is used in the network_kernel_propagation function.
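The kernel idea can be illustrated with a small NumPy sketch. It is not the pyNBS implementation: the toy network, the row-wise normalization, and the variable names are assumptions. It shows that propagating an identity matrix once yields a kernel, and that multiplying any binary profile matrix by that kernel gives the same result as propagating the profiles directly.

```python
import numpy as np

alpha = 0.7

# Toy network of four genes: a triangle (g0, g1, g2) with g3 attached to g2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_norm = A / A.sum(axis=1, keepdims=True)  # assumed row-wise degree normalization
n = A_norm.shape[0]

# The expensive step (matrix inversion) is done once.
inverse_term = np.linalg.inv(np.eye(n) - alpha * A_norm)

# Kernel: the closed form with the identity matrix in place of F_0.
kernel = (1 - alpha) * np.eye(n) @ inverse_term

# Three patients by four genes, binary mutation calls (hypothetical data).
F0 = np.array([[1, 0, 0, 0],
               [0, 1, 0, 1],
               [0, 0, 1, 0]], dtype=float)

direct = (1 - alpha) * F0 @ inverse_term  # propagate the profiles directly
via_kernel = F0 @ kernel                  # reuse the pre-computed kernel

print(np.allclose(direct, via_kernel))
```

Once the kernel exists, each additional propagation costs only one matrix product, which is why the approach pays off when the same network is propagated over many subsamples.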

2. Performing a single instance of random-walk network propagation:

This is how to use the function for standalone network propagation. If no network kernel is pre-computed, or the user only needs to propagate patient profiles a handful of times, the user simply passes a binary patients-by-genes matrix into the function alongside the network.

This function requires the usage of two helper functions:


Function Call:

network_propagation(network, binary_matrix, alpha=0.7, symmetric_norm=False, verbose=True, **save_args)

Parameters:

  • network (required, networkx.Graph): NetworkX graph object loaded from the network file.
  • binary_matrix (required, pandas.DataFrame): Binary somatic matrix loaded from file. May also be an identity matrix if attempting to construct network propagation kernel.
  • alpha (optional, float, default=0.7): Propagation constant to use in the propagation of mutations over molecular network. Range is 0.0-1.0 exclusive. See above for more details.
  • symmetric_norm (optional, bool, default=False): Parameter for determining whether or not to perform a symmetric degree normalization on the adjacency matrix (see normalize_network for additional details)
  • verbose (optional, bool, default=True): Verbosity flag for reporting on function progress.
  • **save_args (optional, dict, default=None): Dictionary of strings for saving results.
    • save_args['outdir']: A string containing the directory path in which to save the resulting propagated profiles (or network kernel). If this parameter is given within **save_args, the function will automatically write the propagated profiles (or network kernel) as a .csv to this location.
    • save_args['job_name']: A string containing a file prefix for the propagated profiles saved in save_args['outdir']. If it is not given, the base file name defaults to prop.csv.
    • save_args['iteration_label']: A string containing a file indicator for the propagated profiles saved in save_args['outdir'], used to keep track of which pyNBS iteration the propagated profiles correspond to. Without it, all propagated profile matrices may be saved under the same name. The value 'kernel' can also be passed to indicate that the function is saving a network kernel instead of propagated profiles.

Returns:

  • prop_data_df (pandas.DataFrame): The network-smoothed somatic mutation profiles. The rows will be patients/samples and the columns will be the genes in the network.

Additional notes about this function:
  • We first separate the molecular network into its connected components, perform network propagation on each connected component separately, and then concatenate the resulting propagated matrices along the diagonal, one block per subgraph.
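
The per-component approach in the note above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the pyNBS code: the component membership is hard-coded here (in practice it could come from a call such as networkx.connected_components), the normalization is a simple row-wise degree normalization, and the helper names `propagate` and `row_normalize` are hypothetical.

```python
import numpy as np

def row_normalize(A):
    """Assumed normalization for this sketch: divide each row by the node degree."""
    return A / A.sum(axis=1, keepdims=True)

def propagate(F0, A_norm, alpha=0.7):
    """Closed-form random-walk propagation over one (sub)network."""
    n = A_norm.shape[0]
    return (1 - alpha) * F0 @ np.linalg.inv(np.eye(n) - alpha * A_norm)

# Five genes in two connected components: a triangle (0, 1, 2) and an edge (3, 4).
A = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4)]:
    A[i, j] = A[j, i] = 1.0
components = [[0, 1, 2], [3, 4]]  # hard-coded for this toy example

# Three patients by five genes, binary mutation calls (hypothetical data).
F0 = np.array([[1, 0, 0, 0, 0],
               [0, 0, 0, 1, 0],
               [0, 1, 0, 0, 1]], dtype=float)

# Propagate within each component, then join the gene columns back together.
blocks = []
for genes in components:
    sub_A_norm = row_normalize(A[np.ix_(genes, genes)])
    blocks.append(propagate(F0[:, genes], sub_A_norm))
per_component = np.concatenate(blocks, axis=1)

# Propagating over the whole network at once gives the same matrix.
whole = propagate(F0, row_normalize(A))
print(np.allclose(per_component, whole))
```

Because no edges connect the two components, a mutation in one component never contributes signal to genes in the other, and inverting several small matrices is cheaper than inverting one large one.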