R scripts used in

All-cause and cause-specific mortality in respiratory symptom clusters: a population-based multicohort study

Daniil Lisik, Helena Backman, Hannu Kankaanranta, Rani Basna, Linnea Hedman, Linda Ekerljung, Fredrik Nyberg, Anne Lindberg, Göran Wennergren, Eva Rönmark, Bright I. Nwaru, Lowie Vanfleteren

Under review

Description of R scripts

VARIABLES.r: define variables across cohorts and their characteristics (e.g., variables for cluster analysis)
CONSTANTS.r: define constants used in multiple R scripts (e.g., folder paths)
draw-dag.r: draw a directed acyclic graph (DAG)
clean-and-combine-datasets.r: do cleaning/preprocessing and merge data from the cohorts with standardized variable names
assess-pre-imputation-data.r: tabulate data pre-imputation by cohort
assess-missingness.r: quantify and visualize missingness in the data
select-random-dataset.r select random dataset for testing stability by random removal of subjects
impute.r: impute missing data (and assess validity of imputation), and set composite variables
pool-imputations.r: pool imputed datasets
add-comorbidity-data.r: add comorbidity register data and tabulate the results
assess-pre-post-imputation-data.r: tabulate data pre- and post-imputation (as well as tabulation of relevant register [comorbidity] data)
do-descriptive-statistics.r: generate descriptive statistics, primarily regarding respiratory symptoms frequency
assess-correlation.r: quantify and visualize correlation between potential cluster-defining variables
prepare-data-for-cluster-analysis.r: based on above findings, prepare data to be used in cluster analysis
lsh.r: perform cluster analysis
pool-cluster-labels.r: pool cluster labels
assess-clusters.r: tabulate data by clusters
add-mortality-data.r: prepare data to be used in mortality analysis
mortality-analysis.r: perform mortality analysis and extract results therefrom
pool-hazard-ratios-and-confidence-intervals.r: pool hazard ratios (HR) and corresponding 95% confidence intervals (95%CI)
catboost.r: assess feature importance by gradient boosting on decision trees
plot-cost.r: across all the cluster analyses across all datasets, pool and plot the clustering cost (total within-cluster distance)

Data availability

Due to regulations and contractual agreement with study participants, the underlying data are not available.

Contact

For any inquiries, please contact Daniil Lisik (daniil.lisik@gmail.com).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

R scripts used in

Daniil Lisik, Helena Backman, Hannu Kankaanranta, Rani Basna, Linnea Hedman, Linda Ekerljung, Fredrik Nyberg, Anne Lindberg, Göran Wennergren, Eva Rönmark, Bright I. Nwaru, Lowie Vanfleteren

Under review

Description of R scripts

Data availability

Contact

Files

README.md

Latest commit

History

README.md

File metadata and controls

R scripts used in

Daniil Lisik, Helena Backman, Hannu Kankaanranta, Rani Basna, Linnea Hedman, Linda Ekerljung, Fredrik Nyberg, Anne Lindberg, Göran Wennergren, Eva Rönmark, Bright I. Nwaru, Lowie Vanfleteren

Under review

Description of R scripts

Data availability

Contact