pyNBS Parameters File
Tongqiu (Iris) Jia edited this page Jan 29, 2018
·
8 revisions
The parameter configuration file is a 2-column comma-separated text file where the first column is the parameter name, and the second column is the parameter value. The delimiter for this file must be a comma.
- This parameter file is used when running pyNBS through its command line script.
- This file is read in by the load_params function.
- If no parameter file path is given, default parameters will be set instead (see documentation for details and default values).
- Blank lines and lines starting with "#" will be ignored.
- The parameter file may include as many or as few of the parameters from the pyNBS overall parameter space as needed (see all possible parameters below). For examples, compare these two parameter files: ./OV_run_pyNBS_Hofree_params.csv vs. ./run_pyNBS_default_params.csv
An excerpt of the full default parameters file is given below:
################################
# Overall pyNBS Parameters #
################################
verbose,True
outdir,./Results/
###############################
# Data Loading Parameters #
###############################
net_filedelim," "
mut_filetype,matrix
mut_filedelim,","
degree_preserved_shuffle,False
node_label_shuffle,False
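As a rough illustration of the file format described above, the following sketch parses parameter text into a dictionary of strings. The helper name parse_params_text is hypothetical and is not the pyNBS load_params function; converting values to bool, int, or float is deliberately left out.

```python
import csv

def parse_params_text(text, defaults=None):
    """Parse 2-column, comma-separated parameter text into a dict of strings.

    Hypothetical helper for illustration; not the pyNBS load_params function.
    """
    params = dict(defaults or {})
    # Drop blank lines and comment lines that start with "#".
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    # csv.reader handles quoted values such as "," correctly.
    for row in csv.reader(lines):
        if len(row) != 2:
            raise ValueError("expected 'name,value' but got %r" % (row,))
        params[row[0].strip()] = row[1]
    return params

example_text = '''\
################################
# Overall pyNBS Parameters #
################################
verbose,True
outdir,./Results/

mut_filetype,matrix
mut_filedelim,","
'''

# Parameters missing from the file fall back to the supplied defaults.
example_params = parse_params_text(example_text, defaults={"job_name": "pyNBS"})
print(example_params)
```

Using the csv module rather than splitting on "," matters for delimiter-valued parameters: a quoted comma is read as a single value instead of being split into empty fields.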
All parameters that can be edited by this file are described below. For additional details of each parameter, please see the linked function.
- verbose (bool, default=True): Verbosity flag for reporting on function progress.
- job_name (str, default='pyNBS'): Filename prefix used to tag a particular run of pyNBS.
- outdir (str, default='./Results/'): Path to output directory. pyNBS will attempt to create the directory at the file path if it does not already exist. The default output folder will be the current working directory unless otherwise defined by params_file.
- net_filedelim (str, default='\t'): Delimiter used in the network file between columns. This parameter is the delimiter parameter in load_network_file.
- mut_filetype (str, default='matrix'): File structure of binary mutation data. There are two options: matrix or list. This parameter is the filetype parameter in load_binary_mutation_data.
- mut_filedelim (str, default=','): Delimiter used in the binary mutation file. This parameter is the delimiter parameter in load_binary_mutation_data.
- degree_preserved_shuffle (bool, default=False): Whether or not to shuffle the network edges (while preserving node degree) when loading the network. This parameter is the degree_shuffle parameter in load_network_file.
- node_label_shuffle (bool, default=False): Whether or not to shuffle the network node labels (while preserving network topology) when loading the network. This parameter is the label_shuffle parameter in load_network_file.
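The difference between the two shuffle options above can be sketched on a plain edge list. This is a minimal illustration using only the standard library, not the code that load_network_file runs; the function names and the toy network are hypothetical.

```python
import random
from collections import Counter

def degrees(edges):
    """Count how many edges touch each node."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def shuffle_node_labels(edges, rng):
    """Permute node labels: the wiring is unchanged, the names move around."""
    nodes = sorted({n for e in edges for n in e})
    permuted = nodes[:]
    rng.shuffle(permuted)
    mapping = dict(zip(nodes, permuted))
    return [(mapping[u], mapping[v]) for u, v in edges]

def degree_preserving_shuffle(edges, rng, n_swaps=100):
    """Rewire with double-edge swaps: (a-b, c-d) becomes (a-d, c-b)."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        # Skip swaps that would create a self-loop.
        if len({a, b, c, d}) < 4:
            continue
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        # Skip swaps that would duplicate an existing edge.
        if new1 in present or new2 in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

toy_edges = [("A", "B"), ("A", "C"), ("A", "D"),
             ("B", "C"), ("D", "E"), ("E", "F")]
rng = random.Random(0)
rewired = degree_preserving_shuffle(toy_edges, rng)
relabeled = shuffle_node_labels(toy_edges, rng)
```

After a degree-preserving shuffle every node keeps its own degree but the edges differ. After a label shuffle the set of degrees is unchanged, but a given label may now sit on a node with a different degree.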
- reg_net_gamma (float, default=0.01): Constant value to add to the diagonal of molecular network graph laplacian to calculate influence matrix for regularization network construction. This parameter is the gamma parameter in network_inf_KNN_glap.
- k_nearest_neighbors (int, default=11): Number of nearest neighbors to add to the regularization network during construction. This parameter is the kn parameter in network_inf_KNN_glap.
- save_knn_glap (bool, default=False): Parameter to determine whether or not to save regularization network graph laplacian. This parameter is used in command line script run_pyNBS.
- pats_subsample_p (float, default=0.8): Proportion of rows (patients/samples) in sm_mat to subsample when performing subsampling. Range is (0.0-1.0] and the value must be able to be converted to a Python float. Setting this value to 1 will simply shuffle the sm_mat data rows, but not cause any subsampling of the rows. This parameter is used in function subsample_sm_mat.
- gene_subsample_p (float, default=0.8): Proportion of columns (mutated genes) in sm_mat to subsample when performing subsampling. Range is (0.0-1.0] and the value must be able to be converted to a Python float. Setting this value to 1 will simply shuffle the sm_mat data columns, but not cause any subsampling of the columns. This parameter is used in function subsample_sm_mat.
- min_muts (int, default=10): Minimum number of mutation counts for filtering. This parameter is used in function subsample_sm_mat.
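How the three subsampling parameters above interact can be sketched with the standard library. This is not the subsample_sm_mat implementation: the dict-of-sets data layout, the function name, and the rule that patients with fewer than min_muts remaining mutations are dropped are all assumptions made for illustration.

```python
import random

def subsample_mutations(sm, pats_p=0.8, gene_p=0.8, min_muts=10, rng=None):
    """Subsample patients (rows) and genes (columns), then filter.

    sm maps each patient ID to the set of genes mutated in that patient.
    Hypothetical helper; the min_muts rule used here is an assumption.
    """
    rng = rng or random.Random()
    patients = sorted(sm)
    genes = sorted({g for muts in sm.values() for g in muts})
    # With a proportion of 1.0, sample() returns every item in shuffled order.
    keep_pats = rng.sample(patients, int(round(pats_p * len(patients))))
    keep_genes = set(rng.sample(genes, int(round(gene_p * len(genes)))))
    out = {}
    for p in keep_pats:
        muts = sm[p] & keep_genes
        # Assumed filter: drop patients left with too few mutations.
        if len(muts) >= min_muts:
            out[p] = muts
    return out

# Toy data: 10 patients, 20 genes, 13 or 14 mutations per patient.
toy_sm = {"P%d" % i: {"G%d" % j for j in range(20) if (i + j) % 3 != 0}
          for i in range(10)}
toy_sub = subsample_mutations(toy_sm, 0.8, 0.8, min_muts=2,
                              rng=random.Random(0))
toy_full = subsample_mutations(toy_sm, 1.0, 1.0, min_muts=2,
                               rng=random.Random(0))
```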
- prop_alpha (float, default=0.7): Propagation constant to use in the propagation of mutations over the molecular network. Range is 0.0-1.0 exclusive. This parameter is the alpha parameter in the network_propagation and network_kernel_propagation functions.
- prop_symmetric_norm (bool, default=False): Parameter for determining whether or not to perform a symmetric degree normalization on the adjacency matrix (see normalize_network for additional details). This parameter is the symmetric_norm parameter in the network_propagation and normalize_network functions.
- save_kernel (bool, default=False): Parameter for determining whether or not to save the network propagation kernel. This parameter is used in command line script run_pyNBS.
- save_prop (bool, default=False): Parameter for determining whether or not to save propagated, sub-sampled data at each intermediate step. This parameter is used in command line script run_pyNBS.
- qnorm_data (bool, default=True): Parameter for determining whether or not to perform quantile normalization on the network-smoothed data. The default value for this is 'True'. Any other value will prevent quantile normalization. See the qnorm function for more details. This parameter is used in the **kwargs dictionary of the NBS_single function.
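The role of prop_alpha and prop_symmetric_norm can be illustrated with the standard closed form of network propagation (random walk with restart). This is a sketch with NumPy under stated assumptions, not the network_propagation code: the function name is hypothetical, and the choice of row normalization for the non-symmetric case is an assumption. It also requires that no node has degree zero.

```python
import numpy as np

def propagate(F0, A, alpha=0.7, symmetric_norm=False):
    """Closed-form fixed point of F = alpha * F @ A_norm + (1 - alpha) * F0."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    if symmetric_norm:
        d = 1.0 / np.sqrt(deg)
        A_norm = A * d[:, None] * d[None, :]   # D^-1/2 A D^-1/2
    else:
        A_norm = A / deg[:, None]              # D^-1 A (assumed orientation)
    n = A.shape[0]
    kernel = (1.0 - alpha) * np.linalg.inv(np.eye(n) - alpha * A_norm)
    return np.asarray(F0, dtype=float) @ kernel

# Toy network: path 0-1-2-3 plus an extra edge 1-3.
A_toy = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 1],
                  [0, 1, 0, 1],
                  [0, 1, 1, 0]], dtype=float)
# Two patients (rows) with binary mutations over the four genes (columns).
F0_toy = np.array([[1, 0, 0, 0],
                   [0, 0, 1, 1]], dtype=float)
F_toy = propagate(F0_toy, A_toy, alpha=0.7)
print(F_toy.round(3))
```

A larger alpha spreads more of each mutation signal to network neighbors, while a value near 0 leaves the original binary profile almost untouched.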
- netNMF_k (int, default=3): Number of components to decompose patient mutation data into during the netNMF. This is also the number of patient clusters to separate the data into. This parameter is used as parameter k in the mixed_netNMF and NBS_single functions.
- netNMF_gamma (int, default=200): This is the regularization constant to scale the network regularizer term in netNMF. The value must be able to be converted to a Python int and the default value of this parameter is 200. We have found that larger positive integers for this value produce better and more robust results. We suggest using a value between 100 and 1000 for this parameter. Setting this value to 0 will perform netNMF with no network regularization penalty (similar to a non-network-regularized NMF). This parameter is the l parameter in the mixed_netNMF function.
- netNMF_maxiter (int, default=250): Maximum number of update steps to perform during this function if the result does not reach convergence by a different method. This parameter is the maxiter parameter in the mixed_netNMF function.
- netNMF_eps (float, default=1e-15): Epsilon error value to adjust 0 (or very small) values during multiplicative matrix updates in netNMF. Essentially this is a parameter to define the machine precision for the netNMF step. This parameter is the eps parameter in the mixed_netNMF function.
- netNMF_err_tol (float, default=1e-4): This is the minimum error tolerance for matrix reconstruction of original data for this function to reach convergence. If the decomposition has reached a sufficiently close estimation of data, the function will return the H factor matrix from that decomposition at that time. This parameter is the err_tol parameter in the mixed_netNMF function.
- netNMF_err_delta_tol (float, default=1e-8): This is the minimum error tolerance for the L2 norm of difference in matrix reconstructions between iterations of netNMF for convergence. If the reconstruction error of the decomposition is not improving significantly, the function will return the H factor matrix from the decomposition at that time. This parameter is the err_delta_tol parameter in the mixed_netNMF function.
- save_H (bool, default=False): Parameter for determining whether or not to save individual H matrices to file. This parameter is used in command line script run_pyNBS.
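The three stopping conditions described above (maxiter, err_tol, err_delta_tol) and the role of eps can be sketched with a plain NMF using the standard multiplicative updates. This is not mixed_netNMF: there is no network regularizer here, which roughly corresponds to the unregularized case mentioned for netNMF_gamma set to 0. The function name, the random initialization, and the matrix orientation are assumptions.

```python
import numpy as np

def nmf_with_stopping(X, k=3, maxiter=250, eps=1e-15,
                      err_tol=1e-4, err_delta_tol=1e-8, seed=0):
    """Plain NMF (X ~ W @ H) that reports which stopping rule fired."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    W = rng.random((X.shape[0], k))
    H = rng.random((k, X.shape[1]))
    prev_recon = None
    for it in range(1, maxiter + 1):
        # Multiplicative updates; eps keeps the denominators away from zero.
        H *= (W.T @ X) / np.maximum(W.T @ W @ H, eps)
        W *= (X @ H.T) / np.maximum(W @ H @ H.T, eps)
        recon = W @ H
        # Rule 1: the reconstruction is already close enough to the data.
        if np.linalg.norm(X - recon) < err_tol:
            return W, H, it, "err_tol"
        # Rule 2: the reconstruction barely changed since the last step.
        if (prev_recon is not None
                and np.linalg.norm(recon - prev_recon) < err_delta_tol):
            return W, H, it, "err_delta_tol"
        prev_recon = recon
    # Rule 3: the iteration budget ran out.
    return W, H, maxiter, "maxiter"

# Toy non-negative data of rank 2.
X_toy = (np.random.default_rng(1).random((6, 2))
         @ np.random.default_rng(2).random((2, 5)))
W_toy, H_toy, n_steps, reason = nmf_with_stopping(X_toy, k=2)
print(n_steps, reason)
```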
- niter (int, default=100): Number of iterations to perform sub-sampling, propagation and network-regularized NMF before consensus clustering. This parameter is used in command line script run_pyNBS.
- hclust_linkage_method (str, default='average'): The hierarchical clustering linkage method to use. Other methods are described in the scipy.cluster.hierarchy.linkage documentation. This parameter is used in the consensus_hclust_hard function.
- hclust_linkage_metric (str, default='euclidean'): The distance metric to use when constructing the linkage map of patients to be clustered in each H matrix. Other distance measures are described in the scipy.spatial.distance.pdist documentation. This parameter is used in the consensus_hclust_hard function.
- save_cc_results (bool, default=True): Parameter for determining whether or not to save consensus clustering results. This parameter is used in command line script run_pyNBS.
- save_cc_map (bool, default=True): Parameter for determining whether or not to save the patient co-clustering map. This parameter is used in command line script run_pyNBS.
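The linkage method and metric above are passed on to SciPy's hierarchical clustering. The sketch below shows one way a co-clustering map could be built from several runs and then cut into k clusters. It is not consensus_hclust_hard: the function names are hypothetical, and it assumes every run assigns a cluster to every patient, which does not hold once patients are subsampled.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def co_clustering_matrix(assignments):
    """Fraction of runs in which each pair of patients shares a cluster."""
    a = np.asarray(assignments)                 # shape: (runs, patients)
    same = a[:, :, None] == a[:, None, :]       # shape: (runs, pats, pats)
    return same.mean(axis=0)

def consensus_clusters(assignments, k, method="average", metric="euclidean"):
    """Cluster patients by their rows of the co-clustering matrix."""
    cc = co_clustering_matrix(assignments)
    Z = linkage(cc, method=method, metric=metric)
    # Cut the tree so that at most k flat clusters remain.
    return fcluster(Z, t=k, criterion="maxclust")

# Four toy runs over six patients; cluster labels are arbitrary per run.
toy_runs = [[0, 0, 1, 1, 2, 2],
            [1, 1, 0, 0, 2, 2],
            [2, 2, 1, 1, 0, 0],
            [0, 0, 0, 1, 2, 2]]
toy_cc = co_clustering_matrix(toy_runs)
toy_labels = consensus_clusters(toy_runs, k=3)
print(toy_labels)
```

Working on the co-clustering matrix rather than on raw labels sidesteps the fact that cluster numbering is arbitrary and can differ from run to run.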
- plot_survival (bool, default=False): Parameter for determining whether or not to perform survival analysis. This parameter should only be set to True when patient survival data is provided. This parameter is used in command line script run_pyNBS.
- surv_file_delim (str, default='\t'): Delimiter used in the patient survival data file between columns. This is the delimiter parameter in the cluster_KMplot function.
- surv_lr_test (bool, default=True): Parameter for determining whether or not to perform a multivariate log-rank test on the full set (over the full length) of survival curves in the resulting KM plot. If True, this function will return the p-value of the log-rank test and add it to the title of the plot; otherwise, only the plot will be generated. This parameter is the lr_test parameter in the cluster_KMplot function.
- surv_tmax (int, default=-1): The number of days at which to cut off the KM plot display. The default (-1) shows the full length of all survival data; otherwise, surv_tmax should be a positive integer. Setting a shorter surv_tmax will not affect the log-rank test p-values. This parameter is the tmax parameter in the cluster_KMplot function.