pyNBS.pyNBS_single.NBS_single
This is the primary function that wraps a single iteration of the core steps of the NBS algorithm. For each call of this function, pyNBS will perform the following steps:
- Subsample binary somatic mutation data (see the subsample_sm_mat function for more details).
- Propagate binary somatic mutation data over the given molecular network (see the network_propagation or network_kernel_propagation function for more details).
- Quantile normalize the network-smoothed data (see the qnorm function for more details).
- Perform network-regularized non-negative matrix factorization (netNMF) (see the netNMF function for more details).
However, this function is intentionally flexible: the user can prevent it from executing any of the specific steps above.
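The way the arguments switch individual steps on or off can be summarized in a small sketch. The helper below, planned_steps, is hypothetical and not part of pyNBS. It only mirrors the rules stated on this page (no propNet means no propagation, a supplied propNet_kernel selects network_kernel_propagation, and qnorm_data controls quantile normalization) and assumes that subsampling and netNMF always run.

```python
def planned_steps(propNet=None, propNet_kernel=None, **kwargs):
    """Return the step names a single NBS iteration would run.

    Hypothetical helper for illustration only; it does not call pyNBS.
    """
    # Assumption: subsampling is always the first step.
    steps = ["subsample_sm_mat"]

    # Propagation only happens when a network is supplied.
    if propNet is not None:
        if propNet_kernel is not None:
            # A pre-computed kernel selects the kernel-based propagation.
            steps.append("network_kernel_propagation")
        else:
            steps.append("network_propagation")

    # Only the string 'True' or the boolean True enables quantile normalization.
    qnorm_flag = kwargs.get("qnorm_data", "True")
    if qnorm_flag is True or qnorm_flag == "True":
        steps.append("qnorm")

    # Assumption: the netNMF decomposition is always the last step.
    steps.append("netNMF")
    return steps


# Placeholder objects stand in for a real network and kernel.
no_network = planned_steps()
with_network = planned_steps(propNet=object())
with_kernel_no_qnorm = planned_steps(
    propNet=object(), propNet_kernel=object(), qnorm_data="False"
)
```

Here no_network skips propagation entirely, with_network uses network_propagation, and with_kernel_no_qnorm uses network_kernel_propagation and skips quantile normalization.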
NBS_single(
sm_mat, regNet_glap, propNet=None, propNet_kernel=None, k=3, verbose=False, **kwargs
)
- sm_mat (required, pandas.DataFrame): Binary somatic mutation matrix loaded from file. This is the matrix of binary somatic mutation profiles of the cohort on which to run pyNBS. Rows are patients/samples and columns are genes that are mutated in those tumors.
- regNet_glap (required, pandas.DataFrame): Graph Laplacian (gene-by-gene) of the KNN influence network constructed by the network_inf_KNN_glap function. This is the regularization matrix for the netNMF step.
- propNet (optional, networkx.Graph, default=None): NetworkX object loaded from network file. If no network is given, then the subsampled somatic mutation data will not be restricted to any network gene space, nor will the sm_mat data be propagated over any network.
- propNet_kernel (optional, pandas.DataFrame, default=None): This is the output (gene-by-gene) of the network_propagation function when using it to construct the "network propagation kernel" described in the documentation. If no propNet_kernel is given, this function will propagate the sm_mat data using the native network_propagation function. However, if propNet_kernel is pre-computed and passed to this function, the network propagation step can be significantly faster, especially if this function is called multiple times. Giving propNet_kernel will call the network_kernel_propagation function.
- k (optional, int, default=3): Number of components to decompose patient mutation data into during the netNMF step. This is also the number of patient clusters to separate the data into.
- **kwargs (optional, dict, default=None): Dictionary of parameters to control the functions called by this function. Many of these parameters are the same as those described in the parameters file page and in the documentation of the function in which each kwarg parameter is used. All keys and values in the dictionary are expected to be strings, but values may be cast to other data types; see the descriptions below for specific kwarg details.
  - kwargs['gene_subsample_p']: Proportion of columns (mutated genes) in sm_mat to subsample when performing subsample_sm_mat. Range is (0.0-1.0] and the value must be convertible to a Python float. The default value of this parameter is 0.8. Setting this value to 1 will simply shuffle the sm_mat data columns, but not cause any subsampling of the columns. See subsample_sm_mat for more details.
  - kwargs['iteration_label']: A string containing a file indicator for the files saved in kwargs['outdir'] to keep track of which pyNBS iteration this propagation profile corresponds to. Otherwise all files will be saved with the same base name (see kwargs['job_name']).
  - kwargs['job_name']: A string containing a file prefix for the H matrix and propagated profiles (if specified by kwargs['save_prop']) saved in kwargs['outdir']. Otherwise the base file name will default to H.csv (or prop.csv for propagated profiles).
  - kwargs['min_muts']: Minimum number of mutations required in each subsampled sm_mat row from subsample_sm_mat to be included in the final subsampled data matrix. The value cannot be less than 0. There is currently no maximum, but if the number is too large, subsample_sm_mat may return an empty result. The value must be convertible to a Python int and the default value of this parameter is 10. See subsample_sm_mat for more details.
  - kwargs['netNMF_err_delta_tol']: This is the minimum error tolerance for the L2 norm of the difference in matrix reconstructions between iterations of netNMF for convergence. If the reconstruction error of the decomposition is not improving significantly, the function will return the H matrix from the decomposition at that time. The value must be convertible to a Python float and the default value of this parameter is 1e-8. See netNMF for more details.
  - kwargs['netNMF_err_tol']: This is the minimum error tolerance for matrix reconstruction of the original data for netNMF to reach convergence. If the decomposition has reached a sufficiently close estimation of the original data passed into the netNMF function, the function will return the H matrix from that decomposition at that time. The value must be convertible to a Python float and the default value of this parameter is 1e-4. See netNMF for more details.
  - kwargs['netNMF_eps']: Epsilon error value to adjust 0 (or very small) values during multiplicative matrix updates in netNMF. Essentially this is a parameter to define the machine precision for the netNMF step. The value must be convertible to a Python float and the default value of this parameter is 1e-15. Changing this parameter may lead to unexpected behavior, but it is accessible for algorithm testing purposes.
  - kwargs['netNMF_gamma']: This is the regularization constant that scales the network regularizer (knnGlap) term in netNMF. The value must be convertible to a Python int and the default value of this parameter is 200. We have found that larger positive integers for this value produce better and more robust results. We suggest using a value between 100 and 1000 for this parameter. Setting this value to 0 will perform netNMF with no network regularization penalty (similar to a non-network-regularized NMF). See netNMF for more details.
  - kwargs['netNMF_maxiter']: Maximum number of multiplicative updates to perform within netNMF if the result does not reach convergence by a different criterion. See netNMF for more details.
  - kwargs['netNMF_verbose']: Verbosity flag for determining whether or not to have the netNMF function report intermediate progress at each iteration. The default value for this is 'False'. Passing 'True' for this parameter will turn on stdout reporting of the netNMF function.
  - kwargs['outdir']: A string containing the path of the directory in which to save the resulting H matrix of this function. If this parameter is given within **kwargs, the function will automatically write its output H matrix as a .csv to this location. It also defines the save location of the propagated profiles, if saving is specified by kwargs['save_prop'].
  - kwargs['pats_subsample_p']: Proportion of rows (patients/samples) in sm_mat to subsample when performing subsample_sm_mat. Range is (0.0-1.0] and the value must be convertible to a Python float. The default value of this parameter is 0.8. Setting this value to 1 will simply shuffle the sm_mat data rows, but not cause any subsampling of the rows. See subsample_sm_mat for more details.
  - kwargs['prop_alpha']: This parameter is only used if a propNet is given and no propNet_kernel is given to the function. The propagation constant to use in the propagation of mutations over the molecular network. Range is 0.0-1.0 exclusive and the value must be convertible to a Python float. The default value of this parameter is 0.7. See network_propagation for more details.
  - kwargs['prop_symmetric_norm']: This parameter is only used if a propNet is given and no propNet_kernel is given to the function. Parameter for determining whether or not to perform a symmetric degree normalization on the adjacency matrix during propagation. The default value for this is 'False'. Passing 'True' for this parameter will perform symmetric normalization on the molecular network for propagation. This value can also be a boolean denoting True or False. See normalize_network for additional details.
  - kwargs['qnorm_data']: Parameter for determining whether or not to perform quantile normalization on the network-smoothed data. The default value for this is the string 'True'. Any value other than the string 'True' or the boolean True will prevent quantile normalization. See the qnorm function for more details.
  - kwargs['save_prop']: This parameter is only used if a propNet is given (so that this function will call a network propagation function). Parameter for determining whether or not to save the propagation result constructed by this function. Passes **kwargs to the appropriate network propagation function, which receives **kwargs as its **save_args parameters. The **save_args parameters here are the same as kwargs['outdir'], kwargs['job_name'], and kwargs['iteration_label'] described here. This value can either be a string or a boolean denoting True or False.
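Because the **kwargs values are expected to be strings that are later cast, it can help to see the casts and range checks written out. The sketch below is not pyNBS code: parse_nbs_kwargs is a hypothetical helper, and it only applies the types, defaults, and ranges listed above for a subset of the parameters.

```python
def parse_nbs_kwargs(**kwargs):
    """Cast string-valued options to the types described on this page.

    Hypothetical helper for illustration; pyNBS does its own parsing.
    """
    def as_bool(value):
        # Accept either the string 'True' or the boolean True.
        return value is True or value == "True"

    params = {
        "pats_subsample_p": float(kwargs.get("pats_subsample_p", "0.8")),
        "gene_subsample_p": float(kwargs.get("gene_subsample_p", "0.8")),
        "min_muts": int(kwargs.get("min_muts", "10")),
        "prop_alpha": float(kwargs.get("prop_alpha", "0.7")),
        "netNMF_gamma": int(kwargs.get("netNMF_gamma", "200")),
        "netNMF_err_tol": float(kwargs.get("netNMF_err_tol", "1e-4")),
        "netNMF_err_delta_tol": float(kwargs.get("netNMF_err_delta_tol", "1e-8")),
        "qnorm_data": as_bool(kwargs.get("qnorm_data", "True")),
        "prop_symmetric_norm": as_bool(kwargs.get("prop_symmetric_norm", "False")),
    }

    # Subsampling proportions must lie in (0.0, 1.0].
    for name in ("pats_subsample_p", "gene_subsample_p"):
        if not 0.0 < params[name] <= 1.0:
            raise ValueError(name + " must be in the range (0.0, 1.0]")

    # The propagation constant must lie strictly between 0.0 and 1.0.
    if not 0.0 < params["prop_alpha"] < 1.0:
        raise ValueError("prop_alpha must be between 0.0 and 1.0 (exclusive)")

    # The minimum mutation count cannot be negative.
    if params["min_muts"] < 0:
        raise ValueError("min_muts cannot be less than 0")

    return params


# Example: all values are passed as strings, as the page describes.
params = parse_nbs_kwargs(gene_subsample_p="0.9", min_muts="5", qnorm_data="False")
```

Options that are not passed fall back to the defaults documented above, so params["netNMF_gamma"] is the integer 200 in this example.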
- prop_data_df (pandas.DataFrame): The network-smoothed somatic mutation profiles.
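The speed-up from a pre-computed propNet_kernel can be understood from the fact that network propagation is linear in its input: propagating the mutation matrix gives the same network-smoothed profiles as multiplying it by the propagated identity matrix. The toy sketch below illustrates this with a simplified iterative random-walk-with-restart update on a three-gene network. It is an assumption-laden illustration, not the pyNBS implementation: the functions matmul and propagate, the row normalization, and the iteration count are all chosen here for the example, and it assumes the kernel is obtained by propagating an identity matrix.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def propagate(Y, A_norm, alpha=0.7, n_iter=200):
    """Iterate F <- alpha * F * A_norm + (1 - alpha) * Y, starting from F = Y."""
    F = [row[:] for row in Y]
    for _ in range(n_iter):
        FA = matmul(F, A_norm)
        F = [
            [alpha * fa + (1 - alpha) * y for fa, y in zip(fa_row, y_row)]
            for fa_row, y_row in zip(FA, Y)
        ]
    return F


# Toy path network gene0 - gene1 - gene2, with each row divided by its degree.
A_norm = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]

# Two patients (rows) by three genes (columns), binary mutation calls.
sm = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
]

identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

# Computed once: the gene-by-gene "kernel".
kernel = propagate(identity, A_norm)

# Direct propagation versus a single matrix product with the kernel.
direct = propagate(sm, A_norm)
via_kernel = matmul(sm, kernel)

max_diff = max(
    abs(d - k)
    for d_row, k_row in zip(direct, via_kernel)
    for d, k in zip(d_row, k_row)
)
```

In this toy setting max_diff is at the level of floating-point rounding, so once the kernel has been computed each further call only needs one matrix product instead of a full propagation.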