diff --git a/.gitignore b/.gitignore index e8864b72..47cd9fa3 100644 --- a/.gitignore +++ b/.gitignore @@ -1,56 +1,71 @@ -#ietoolkit specific files -test/ +######################################################################## +# +# Based on DIME .gitignore template. Follow the instructions in the URL +# below to set up this template in your own repository +# https://github.com/worldbank/dime-github-trainings/tree/master/GitHub-resources/DIME-GitHub-Templates +# +# Note that if you are using GitKraken, you need to use version 5.x or more +# recent for this template to work properly +# +######################################################################## + +####################### +# Start by ignoring everything, and below we are explicitly saying +# what to not ignore +* + +####################### +# List of files with GitHub functionality anywhere in the repo +# that we do not want to ignore + +# These files include GitHub settings +!.gitignore +!.gitattributes + +# Keep markdown files used for documentation on GitHub +!README.md +!CONTRIBUTING.md +!LICENSE* + +####################### +# For performance reasons, if a folder is already ignored, then +# GitHub does not check the content for that folder for matches +# with additional rules. The line below includes folder in the +# top folder (but not their content), so that anything matching +# the rules below will still not be ignored. 
+!*/ + +####################### +# The following file types are code that should always be +# included no matter where in the repository folder they are +# located unless you explicitly ignore that folder + +# Stata +!/**/*.do +!/**/*.ado -# Windows image file caches -Thumbs.db -ehthumbs.db - -# Folder config file -Desktop.ini - -# Recycle Bin used on file shares -$RECYCLE.BIN/ - -# Windows Installer files -*.cab -*.msi -*.msm -*.msp - -# Windows shortcuts -*.lnk - -# ========================= -# Operating System Files -# ========================= - -# OSX -# ========================= - -.DS_Store -.AppleDouble -.LSOverride - -# Thumbnails -._* - -# Files that might appear in the root of a volume -.DocumentRevisions-V100 -.fseventsd -.Spotlight-V100 -.TemporaryItems -.Trashes -.VolumeIcon.icns - -# Directories potentially created on remote AFP share -.AppleDB -.AppleDesktop -Network Trash Folder -Temporary Items -.apdisk - -# Folders and files that are only relevant for testing -run/ -admin/ -*.xls -*.xlsx +# R +!/**/*.R +!/**/*.Rmd + +# LaTeX +!/**/*.tex +!/**/*.bib + +# Python +!/**/*.py +!/**/*.ipynb +# Still ignore .ipynb files in checkpoint folders +.ipynb_checkpoints + +# Matlab +!/**/*.m + +# Markdown +!/**/*.md + +# Julia +!/**/*.jl + +#Ignore test folder +test/ diff --git a/admin/checklist-submitting-SSC.md b/admin/checklist-submitting-SSC.md index 444a2b82..6cf3245b 100644 --- a/admin/checklist-submitting-SSC.md +++ b/admin/checklist-submitting-SSC.md @@ -15,6 +15,7 @@ - [ ] 4.2 - If any of the meta info (title, description, keywords, version or authour/contact) has changed then include those updates in your email. - [ ] 5. **Draft release note** - Go to the [release notes](https://github.com/worldbank/iefieldkit/releases) and draft a new release note for the new version. Follow the format from previous releases with links to [issues](https://github.com/worldbank/iefieldkit/issues) solved. - [ ] 6. 
**Wait for publication confirmation** - Do not proceed pass this step until Prof. Baum has confirmed that the new version is uploaded to the servers. + - [ ] 6.1 - Update the package from SSC and make sure it's the latest version. - [ ] 7. **Merge version branch to *master*** - If step 2 and 3 was done correctly, then there should not be any merge conflicts in this step. - [ ] 8. **Rebase *develop* to *master*** - This step brings edits done in 3 and 3.1, as well as version updates done in 3.2 and 3.3 into the *develop* branch. The same result can be accomplished - although by creating a slightly messier history - by merging *master* into *develop*. Regardless if the branches are merged or rebased, if any branches created of *develop* was not included in this version, make sure to rebase them to *develop* afterwards, otherwise there is a big risk for very messy conflicts in the future. - [ ] 9. **Publish release note** - Once the new version is up on SSC, publish the release note. diff --git a/run/Master.do b/run/Master.do index 3dd4a999..55444674 100644 --- a/run/Master.do +++ b/run/Master.do @@ -14,13 +14,14 @@ qui { Part I: Set up *******************************************************************************/ - * Set folder paths + * Set root paths global GitHub "" global AnalyticsDB "" - * Calculate globals + * Set up folder globals global iefieldkit "${GitHub}/iefieldkit" global form "${AnalyticsDB}/Data Coordinator/iefieldkit/ietestform" + global testouput "${iefieldkit}/run/output" * Select commands to test local ieduplicates 1 @@ -41,6 +42,10 @@ qui { noi di as error "No commands to test" exit } + + ** Test if output folder exists, if not create it + mata : st_numscalar("r(dirExist)", direxists("${testouput}")) + if `r(dirExist)' == 0 mkdir "${testouput}" } /******************************************************************************* diff --git a/run/ieduplicates.do b/run/ieduplicates.do index 9755ff16..60437f91 100644 --- a/run/ieduplicates.do +++ 
b/run/ieduplicates.do @@ -6,6 +6,11 @@ * Add the path to your local clone of the iefieldkit repo do "${iefieldkit}/src/ado_files/ieduplicates.ado" + local iedup_output "${testouput}/ieduplicates_output" + + ** Test if output folder exists, if not create it + mata : st_numscalar("r(dirExist)", direxists("`iedup_output'")) + if `r(dirExist)' == 0 mkdir "`iedup_output'" sysuse auto, clear encode make, gen(uuid) @@ -20,54 +25,56 @@ *******************************************************************************/ * No duplicates - ieduplicates make using "${iefieldkit}\foo", uniquevars(make) + ieduplicates make using "`iedup_output'\foo", uniquevars(make) * Test file format use `duplicates', clear - ieduplicates uuid using "${iefieldkit}\foo.xlsx", uniquevars(make) force + ieduplicates uuid using "`iedup_output'\foo.xlsx", uniquevars(make) force use `duplicates', clear - ieduplicates uuid using "${iefieldkit}\foo.xls", uniquevars(make) force + ieduplicates uuid using "`iedup_output'\foo.xls", uniquevars(make) force * Test folder and suffix syntax use `duplicates', clear - ieduplicates uuid, uniquevars(make) folder("${iefieldkit}") force + ieduplicates uuid, uniquevars(make) folder("`iedup_output'") force use `duplicates', clear - ieduplicates uuid, uniquevars(make) folder("${iefieldkit}") suffix(bar) force - + ieduplicates uuid, uniquevars(make) folder("`iedup_output'") suffix(bar) force /******************************************************************************* Yes error *******************************************************************************/ * Observations were removed - cap ieduplicates uuid using "${iefieldkit}", uniquevars(make) + cap ieduplicates uuid using "`iedup_output'\iedupreport.xlsx", uniquevars(make) + di _rc assert _rc == 9 - - * Without 'clear' option + + * Without 'force' option use `duplicates', clear - cap ieduplicates uuid using "${iefieldkit}\foo", uniquevars(make) + cap ieduplicates uuid using "`iedup_output'\foo", uniquevars(make) + di 
_rc assert _rc == 198 * Invalid format use `duplicates', clear - cap ieduplicates uuid using "${iefieldkit}\foo.csv", uniquevars(make) + cap ieduplicates uuid using "`iedup_output'\foo.csv", uniquevars(make) assert _rc == 198 use `duplicates', clear - cap ieduplicates uuid using "${iefieldkit}\foo.", uniquevars(make) force + cap ieduplicates uuid using "`iedup_output'\foo.", uniquevars(make) force + di _rc assert _rc == 198 * Invalid name use `duplicates', clear - cap ieduplicates uuid using "${iefieldkit}\.xlsx", uniquevars(make) + cap ieduplicates uuid using "`iedup_output'\.xlsx", uniquevars(make) + di _rc assert _rc == 198 *Check that cd is not working - cd "${iefieldkit}" + cd "`iedup_output'" use `duplicates', clear cap ieduplicates uuid using "foo", uniquevars(make) force + di _rc assert _rc == 198 - - diff --git a/src/ado_files/iecodebook.ado b/src/ado_files/iecodebook.ado index beb6aea1..1e642d32 100644 --- a/src/ado_files/iecodebook.ado +++ b/src/ado_files/iecodebook.ado @@ -1,4 +1,4 @@ -*! version 1.4 8AUG2019 DIME Analytics dimeanalytics@worldbank.org +*! version 1.5 28APR2020 DIME Analytics dimeanalytics@worldbank.org // Main syntax --------------------------------------------------------------------------------- @@ -26,43 +26,49 @@ cap program drop iecodebook } // Select subcommand + noi di " " gettoken subcommand anything : anything // Check folder exists - // Start by finding the position of the last forward slash. If no forward - * slash exist, it is zero, then replace to to string len so it is never - * the min() below. - local r_f_slash = strpos(strreverse(`"`using'"'),"\") - if `r_f_slash' == 0 local r_f_slash = strlen(`"`using'"') + // Start by standardizing all slashes to forward slashes, and get the position of the last slash + local using = subinstr("`using'","\","/",.) 
+ local r_lastslash = strlen(`"`using'"') - strpos(strreverse(`"`using'"'),"/") + if strpos(strreverse(`"`using'"'),"/") == 0 local r_lastslash -1 // Set to -1 if there is no slash - // Start by finding the position of the last backward slash. If no backward - * slash exist, it is zero, then replace to to string len so it is never - * the min() below. - local r_b_slash = strpos(strreverse(`"`using'"'),"/") - if `r_b_slash' == 0 local r_b_slash = strlen(`"`using'"') + // Get the full folder path and the file name + local r_folder = substr(`"`using'"',1,`r_lastslash') + local r_file = substr(`"`using'"',`r_lastslash'+2,.) - // Get the last slash in the report file path regardless of back or forward - local r_lastslash = strlen(`"`using'"')-min(`r_f_slash',`r_b_slash') + // Test that the folder for the report file exists + mata : st_numscalar("r(dirExist)", direxists("`r_folder'")) + if `r(dirExist)' == 0 { + noi di as error `"{phang}The folder [`r_folder'/] does not exist.{p_end}"' + error 601 + } - // Get the folder - local r_folder = substr(`"`using'"',1,`r_lastslash') + // Find the position of the last dot in the file name and get the file format extension + local r_lastsdot = strlen(`"`r_file'"') - strpos(strreverse(`"`r_file'"'),".") + local r_fileextension = substr(`"`r_file'"',`r_lastsdot'+1,.) - // Test that the folder for the report file exists - mata : st_numscalar("r(dirExist)", direxists("`r_folder'")) - if `r(dirExist)' == 0 { - noi di as error `"{phang}The folder [`r_folder'/] does not exist.{p_end}"' - error 601 - } + // If no file extension was used, then add .xlsx to "`using'" + if "`r_fileextension'" == "" { + local using "`using'.xlsx" + } + // Throw an error if user input uses any extension other than the allowed + else if !inlist("`r_fileextension'",".xlsx",".xls") { + di as error "The codebook may only have the file extension [.xlsx] or [.xls]. The format [`r_fileextension'] is not allowed." 
+ error 601 + } // Throw error on [template] if codebook already exists - if ("`subcommand'" == "template") { + if ("`subcommand'" == "template") & !strpos(`"`options'"',"replace") { cap confirm file "`using'" if _rc == 0 { di as err "That template already exists. {bf:iecodebook} does not allow you to overwrite an existing template," di as err " since you may already have set it up. If you are {bf:sure} that you want to delete this template," - di as err `" you need to manually remove it from `using'. {bf:iecodebook} will now exit."' + di as err `" you need to manually delete the file `using'. {bf:iecodebook} will now exit."' error 602 } @@ -73,7 +79,7 @@ cap program drop iecodebook } } - if ("`subcommand'" == "export") { + if ("`subcommand'" == "export") & !strpos(`"`options'"',"replace") { cap confirm file "`using'" if (_rc == 0) & (!strpos(`"`options'"',"replace")) { @@ -98,7 +104,7 @@ cap program drop iecodebook iecodebook_labclean // Do this first in case export or template syntax iecodebook_`subcommand' `anything' using "`using'" , `options' iecodebook_labclean // Do this again to clean up after apply or append - qui compress + qui cap compress end @@ -146,7 +152,7 @@ cap program drop iecodebook_export qui { // Return a warning if there are lots of variables - if `c(k)' >= 1000 di "This dataset has `c(k)' variables. This may take a long time! Consider subsetting your variables first." + if `c(k)' >= 1000 noi di "This dataset has `c(k)' variables. This may take a long time! Consider subsetting your variables first." // Template Setup // Load dataset if argument @@ -210,6 +216,11 @@ qui { save `a' , replace } + // Clean up common characters + foreach character in , . < > / ? [ ] | & ! ^ + - : * = ( ) "{" "}" "`" "'" { + replace v1 = subinstr(v1,"`character'"," ",.) 
+ } + // Reshape one word per line split v1 drop v1 @@ -230,8 +241,8 @@ qui { qui count forvalues i = 1/`r(N)' { local next = `v'[`i'] - cap unab vars : `next' - if _rc == 0 local allVars "`allVars' `vars'" + cap novarabbrev unab vars : `next' + if (_rc == 0 & strpos("`vars'","__")!=1 ) local allVars "`allVars' `vars'" } // Keep only those variables @@ -245,7 +256,8 @@ qui { } keep `theKeepList' // Keep only variables mentioned in the dofiles compress - local savedta = subinstr(`"`using'"',".xlsx",".dta",.) + local savedta = subinstr(`"`using'"',".xls",".dta",.) + local savedta = subinstr(`"`savedta'"',".dtax",".dta",.) save "`savedta'" , replace } // End [trim] option @@ -317,10 +329,25 @@ qui { qui lookfor name clonevar name`template' = `: word 2 of `r(varlist)'' + local theNames = "`r(varlist)'" label var name`template' "name`template_colon'" - merge 1:1 name`template' using `newdata' , nogen + // Allow matching for more rounds + local nNames : list sizeof theNames + if `nNames' > 2 { + forvalues i = 2/`nNames' { + replace name`template' = `: word `i' of `theNames'' if name`template' == "" + } + } + + tempvar order + gen `order' = _n + + merge m:1 name`template' using `newdata' , nogen replace name`template' = "" if type`template' == "" + + sort `order' + drop `order' } // Export variable information to "survey" sheet @@ -333,8 +360,8 @@ qui { local rc = _rc } } - if `rc' != 0 di as err "A codebook didn't write properly. This can be caused by Dropbox syncing the file or having the file open." - if `rc' != 0 di as err "Consider turning Dropbox syncing off or using a non-Dropbox location. You may need to delete the file and try again." + if `rc' != 0 di as err "A codebook didn't write properly. This can be caused by file syncing or by having the file open." + if `rc' != 0 di as err "If the file is not currently open, consider turning file syncing off or using a non-synced location. You may need to delete the file and try again." 
if `rc' != 0 error 603 restore @@ -465,12 +492,17 @@ qui { forvalues i = 2/`r(N)' { local theName = name`survey'[`i'] local theRename = name[`i'] + local theRename = trim("`theRename'") if strtoname("`theRename'") != "`theRename'" & "`theRename'" != "." { di as err "Error: [`theRename'] on line `i' is not a valid Stata variable name." local QUITFLAG = 1 } local theLabel = label[`i'] local theChoices = choices[`i'] + if strtoname("`theChoices'") != "`theChoices'" & "`theChoices'" != "." { + di as err "Error: [`theChoices'] on line `i' is not a valid Stata choice list name." + local QUITFLAG = 1 + } local theRecode = recode`survey'[`i'] if "`theName'" != "" { @@ -498,6 +530,20 @@ qui { // Prepare list of values for each value label. import excel "`using'" , first clear sheet(choices) allstring + // Catch any labels called on choices that are not defined in choice sheet + levelsof list_name , local(theListedLabels) + local leftovers : list theValueLabels - theListedLabels + if `"`leftovers'"' != "" { + di as err "You have specified a value label in [choices] which is not defined in the {it:choices} sheet." + di as err "{bf:iecodebook} will exit. 
Define the following value labels and re-run the command to continue:" + di as err " " + foreach element in `leftovers' { + di as err " `element'" + } + di as err " " + error 100 + } + // Check for broken things, namely quotation marks foreach var of varlist * { cap confirm string variable `var' @@ -516,13 +562,13 @@ qui { local theNextValue = value[`i'] local theNextLabel = label[`i'] local theValueLabel = list_name[`i'] - local theLabelList_`theValueLabel' `" `theLabelList_`theValueLabel'' `theNextValue' "`theNextLabel'" "' + local L`theValueLabel' `" `L`theValueLabel'' `theNextValue' "`theNextLabel'" "' } // Add missing values if requested if `"`missingvalues'"' != "" { foreach theValueLabel in `theValueLabels' { - local theLabelList_`theValueLabel' `" `theLabelList_`theValueLabel'' `missingvalues' "' + if "`theValueLabel'" != "." local L`theValueLabel' `" `L`theValueLabel'' `missingvalues' "' } } @@ -531,11 +577,16 @@ qui { // Define value labels foreach theValueLabel in `theValueLabels' { - if "`theValueLabel'" != "." label def `theValueLabel' `theLabelList_`theValueLabel'', replace + if "`theValueLabel'" != "." label def `theValueLabel' `L`theValueLabel'', replace } // Drop leftovers if requested cap drop `allDrops' + qui des + if `r(k)' == 0 { + noi di as err "You are dropping all the variables in a dataset. This is not allowed. {bf:iecodebook} will exit." + error 102 + } // Apply all recodes, choices, and labels foreach type in Recodes Choices Labels { @@ -569,7 +620,7 @@ cap program drop iecodebook_append syntax [anything] [using/] , /// surveys(string asis) [GENerate(string asis)] /// - [clear] [match] [KEEPall] /// User options + [clear] [match] [KEEPall] [report] /// User options [template] [replace] /// System options [*] @@ -591,8 +642,8 @@ qui { local drop "drop" } else { - di "You have specified [keepall], which means you are forcing all variables to be appended even if you did not manually harmonize them." 
- di "Make sure to check the resulting dataset carefully. Forcibly appending data, especially of different types, may result in loss of information." + noi di "You have specified [keepall], which means you are forcing all variables to be appended even if you did not manually harmonize them." + noi di "Make sure to check the resulting dataset carefully. Forcibly appending data, especially of different types, may result in loss of information." local drop "" } @@ -630,14 +681,12 @@ qui { } // On success copy to final location - copy "`codebook'" `"`using'"' + copy "`codebook'" `"`using'"' , `replace' use `raw_data' , clear exit } - - // Loop over datasets and apply codebook local x = 0 foreach dataset in `anything' { @@ -661,12 +710,15 @@ qui { } // Success message - di `"Applied codebook using `using' to `anything' – check your data carefully!"' + noi di `"Applied codebook {browse `using'} to `anything' – check your data carefully!"' // Final codebook - local using = subinstr("`using'",".xlsx","_appended.xlsx",.) - iecodebook export using "`using'" + local using = subinstr("`using'",".xls","_report.xls",.) use `final_data' , clear + if "`report'" != "" { + iecodebook export using "`using'" , `replace' + noi di `"Wrote report to {browse `using'}!"' + } } // end qui end diff --git a/src/ado_files/iecompdup.ado b/src/ado_files/iecompdup.ado index 3dd5a321..86f13971 100644 --- a/src/ado_files/iecompdup.ado +++ b/src/ado_files/iecompdup.ado @@ -1,4 +1,4 @@ -*! version 1.4 8AUG2019 DIME Analytics dimeanalytics@worldbank.org +*! version 1.5 28APR2020 DIME Analytics dimeanalytics@worldbank.org capture program drop iecompdup program iecompdup , rclass diff --git a/src/ado_files/ieduplicates.ado b/src/ado_files/ieduplicates.ado index 7adf20ea..db3ca904 100644 --- a/src/ado_files/ieduplicates.ado +++ b/src/ado_files/ieduplicates.ado @@ -1,4 +1,4 @@ -*! version 1.4 8AUG2019 DIME Analytics dimeanalytics@worldbank.org +*! 
version 1.5 28APR2020 DIME Analytics dimeanalytics@worldbank.org capture program drop ieduplicates program ieduplicates , rclass @@ -301,25 +301,28 @@ *Locals indicating in which ways input is incorrect (if any) local local_multiInp 0 local local_multiCorr 0 - local local_inputNotYes 0 + local local_inputNotYes 0 local local_notDrop 0 /****************** Section 3.3.1 - Make sure input is yes or y for the correct and drop columns + Make sure input is "correct" or "drop" for the correct and drop columns + "Yes" and "y" are allowed for backward compatibility ******************/ - * 1. Trim the string of leading and trailing spaces, 2. make it lower case and 3. change "y" to "yes" - replace `correct' = trim(`correct') - replace `drop' = trim(`drop') - replace `correct' = lower(`correct') - replace `drop' = lower(`drop') - replace `correct' = "yes" if `correct' == "y" - replace `drop' = "yes" if `drop' == "y" + * 1. Trim the string of leading and trailing spaces, 2. make it lower case + * and 3. (change "y" to "yes" : backward compatibility) + replace `correct' = lower(trim(`correct')) + replace `drop' = lower(trim(`drop')) + replace `correct' = "yes" if `correct' == "y" //Backward compatibility + replace `drop' = "yes" if `drop' == "y" //Backward compatibility - *Check that variables are either empty or "yes" - gen `inputNotYes' = !((`correct' == "yes" | `correct' == "") & (`drop' == "yes" | `drop' == "")) + *Check that variables are either empty or correct/drop. 
(Or "yes" : backward compatibility) + gen `inputNotYes' = !( /// + (`correct' == "correct" | `correct' == "" | `correct' == "yes") & /// + (`drop' == "drop" | `drop' == "" | `drop' == "yes") /// + ) *Set local to 1 if error should be outputted cap assert `inputNotYes' == 0 @@ -345,8 +348,8 @@ Test that maximum one duplicate per duplicate group is indicated as correct ******************/ - *Generate dummy if correct column is set to yes - gen `yesCorrect' = (`correct' == "yes") + *Generate dummy if correct column is set to "correct" (or "yes" : backward compatibility) + gen `yesCorrect' = (`correct' == "correct" | `correct' == "yes") //("yes" : backward compatibility) *Count number of duplicates within duplicates where that dummy is 1 bys `idvar' : egen `groupNumCorrect' = total(`yesCorrect') @@ -381,7 +384,7 @@ ** If option droprest is not used, then all observations in a duplicate * group where at least one observation has a correction must have a - * correction or have drop set to yes. + * correction or have drop set to "drop" (or "yes" : backward compatibility) cap assert `notDrop' == 0 *Error will be outputted below @@ -390,11 +393,11 @@ } else { - ** Option -droprest- specified. Drop will be changed to yes + ** Option -droprest- specified. Drop will be changed to drop * for any observations without drop or any other correction * explicitly specified if the observation is in a duplicate * group with at least one observation has a correction - replace `drop' = "yes" if `notDrop' == 1 + replace `drop' = "drop" if `notDrop' == 1 } @@ -427,7 +430,7 @@ *Error in incorrect string if `local_inputNotYes' == 1 { - display as error "{phang}The following observations have an answer in either correct or drop that is neither yes nor y{p_end}" + display as error `"{phang}The following observations have an answer in either correct or drop that is not valid. The only valid value in correct is "correct", and in drop is "drop". 
("yes" is also allowed for backward compatibility.){p_end}"' list `idvar' `duplistid' `correct' `drop' `uniquevars' if `inputNotYes' == 1 di "" } @@ -576,9 +579,8 @@ * Only checking variables in the original data set and not variables in Excel report. local diffvars: list diffvars - excelVars - *Truncate list when longer than 256 to fit in old Stata string formats. - *255-29 (characters for " ||| List truncated, use iecompdup for full list")= 226 - if strlen("`diffvars'") > 256 local difflist_`nospaceid' = substr("`r(diffvars)'" ,1 ,207) + " ||| List truncated, use iecompdup for full list" + *Truncate list when longer than 250 to look better in report + if strlen("`diffvars'") > 250 local difflist_`nospaceid' = substr("`r(diffvars)'" ,1 ,200) + " ||| List truncated, use iecompdup for full list" else local difflist_`nospaceid' "`diffvars'" //List of diff is short enough to show in its entirety } @@ -682,7 +684,7 @@ Drop duplicates listed for drop ******************/ - drop if `drop' == "yes" + drop if (`drop' == "drop" | `drop' == "yes") /****************** Section 6.2 diff --git a/src/ado_files/iefieldkit.ado b/src/ado_files/iefieldkit.ado index c0b12c02..cc3c1808 100644 --- a/src/ado_files/iefieldkit.ado +++ b/src/ado_files/iefieldkit.ado @@ -1,11 +1,11 @@ -*! version 1.4 8AUG2019 DIME Analytics dimeanalytics@worldbank.org +*! 
version 1.5 28APR2020 DIME Analytics dimeanalytics@worldbank.org capture program drop iefieldkit program iefieldkit, rclass * UPDATE THESE LOCALS FOR EACH NEW VERSION PUBLISHED - local version "1.4" - local versionDate "8AUG2019" + local version "1.5" + local versionDate "28APR2020" syntax [anything] @@ -33,5 +33,9 @@ program iefieldkit, rclass noi di "" noi di _col(4) "This version of iefieldkit installed is version " _col(54)"`version'" noi di _col(4) "This version of iefieldkit was released on " _col(54)"`versionDate'" - + noi di "" + noi di _col(4) "This package includes the following commands:" + noi di _col(8) "- {help iecodebook}" + noi di _col(8) "- {help ieduplicates}/{help iecompdup}" + noi di _col(8) "- {help ietestform}" end diff --git a/src/ado_files/ietestform.ado b/src/ado_files/ietestform.ado index d6f3c2f1..88660652 100644 --- a/src/ado_files/ietestform.ado +++ b/src/ado_files/ietestform.ado @@ -1,4 +1,4 @@ -*! version 1.4 8AUG2019 DIME Analytics dimeanalytics@worldbank.org +*! version 1.5 28APR2020 DIME Analytics dimeanalytics@worldbank.org capture program drop ietestform program ietestform , rclass diff --git a/src/help_files/iecodebook.sthlp b/src/help_files/iecodebook.sthlp index 402cf1ce..461ebaa8 100644 --- a/src/help_files/iecodebook.sthlp +++ b/src/help_files/iecodebook.sthlp @@ -1,16 +1,9 @@ {smcl} -{* 8 Aug 2019}{...} +{* 28 Apr 2020}{...} {hline} help for {hi:iecodebook} {hline} - _ __ __ __ - (_)__ _________ ____/ /__ / /_ ____ ____ / /__ - / / _ \/ ___/ __ \/ __ / _ \/ __ \/ __ \/ __ \/ //_/ - / / __/ /__/ /_/ / /_/ / __/ /_/ / /_/ / /_/ / ,< -/_/\___/\___/\____/\__,_/\___/_.___/\____/\____/_/|_| - - {title:Title} {p}{cmdab:iecodebook} {hline 2} performs common data cleaning tasks using dataset definitions (codebooks) written in Excel files. 
{p_end} @@ -77,7 +70,8 @@ and optionally produces an export version of the dataset with only variables use {it:"/path/to/survey1.dta" "/path/to/survey2.dta" [...]} {break} {help using} {it:"/path/to/codebook.xlsx"} {break} {p_end} {p 2 3}, {bf:clear} {bf:surveys(}{it:Survey1Name} {it:Survey2Name} [...]{bf:)} {break} -[{opth gen:erate(varname)} {opt miss:ingvalues(# "label" [# "label" ...])} {bf:keepall}]{p_end} +[{opth gen:erate(varname)} {opt miss:ingvalues(# "label" [# "label" ...])}]{break} +[{bf:report replace keepall}] {p_end} {dlgtab 0:Export: Creating a full codebook of the current data} @@ -124,6 +118,11 @@ This must be specified during both the template and append steps.{p_end} {synopt:{opt miss:ingvalues()}}This option specifies standardized "extended missing values" to add to every value label definition. For example, specifying {bf:missingvalues(}{it:.d "Don't Know" .r "Refused" .n "Not Applicable"}{bf:)} will add those codes to every value-labeled answer.{p_end} {break} +{synopt:{opt report}}This option writes a codebook in the standard {bf:export} format describing the appended dataset. +It will be placed in the same folder as the append codebook, with the same name, with "_report" added to the filename.{p_end} +{break} +{synopt:{opt replace}}This option is required to overwrite a previous report.{p_end} +{break} {synopt:{opt keep:all}}By default, {cmdab:iecodebook append} will only retain those variables with a new {it:name} explicitly written in the codebook to signify manual review for harmonization. {bf:Specifying this option will keep all variables from all datasets. Use carefully!} Forcibly appending data, especially of different types, can result in loss of information. 
diff --git a/src/help_files/iecompdup.sthlp b/src/help_files/iecompdup.sthlp index e28e65b9..38369375 100644 --- a/src/help_files/iecompdup.sthlp +++ b/src/help_files/iecompdup.sthlp @@ -1,5 +1,5 @@ {smcl} -{* 8 Aug 2019}{...} +{* 28 Apr 2020}{...} {hline} help for {hi:iecompdup} {hline} diff --git a/src/help_files/ieduplicates.sthlp b/src/help_files/ieduplicates.sthlp index 68b6f731..3130d8ad 100644 --- a/src/help_files/ieduplicates.sthlp +++ b/src/help_files/ieduplicates.sthlp @@ -1,5 +1,5 @@ {smcl} -{* 8 Aug 2019}{...} +{* 28 Apr 2020}{...} {hline} help for {hi:ieduplicates} {hline} @@ -79,7 +79,7 @@ then returns the data set without these duplicates. {pstd}The Excel report includes three columns called {it:correct}, {it:drop} and {it:newid}. Each of them represents one way to correct the duplicates. If {it:correct} is indicated with -a "Yes" then that observation is kept unchanged, if {it:drop} is indicated with a "Yes" then +a "correct" then that observation is kept unchanged, if {it:drop} is indicated with a "drop" then that observation is deleted and if {it:newid} is indicated then that observation is assigned a new ID using the value in column {it:newid}. After corrections are entered, the report should be saved in the same location and without any changes to its name. @@ -187,23 +187,21 @@ is sorted at the time {cmd:ieduplicates} is executed. time for that duplicate. {phang}{it:listofdiffs} stores a list with the names of the variables that are -different in two different observations. This list is truncated at 256 characters -and is only stores when there are exactly two duplicates. For other cases, {help:iecompdup} -must be used to get this information. +different in two different observations. This list is truncated at 250 characters +and is only stored when there are exactly two duplicates. For the full list, or for cases +where there are more than two duplicates, {help:iecompdup} should be used. 
 {dlgtab:Columns in Excel Report to be filled in manually by a user:}
-{phang}{it:correct} is used to indicate that the duplicate should be kept. Valid values are
-restricted to "yes" and "y" to reduce the risk of unintended entries. The values
-are not sensitive to case. All valid values are changed to "yes" lower case when
-imported. If {it:correct} is indicated then both {it:drop} and {it:newid} must be
-left empty.
+{phang}{it:correct} is used to indicate that the duplicate should be kept. The only
+valid value is "correct" to reduce the risk of unintended entries ("yes" is also
+allowed for backward compatibility). The values are not sensitive to case. If {it:correct}
+is indicated then both {it:drop} and {it:newid} must be left empty.
-{phang}{it:drop} is used to indicate that the duplicate should be deleted. Valid values are
-restricted to "yes" and "y" to reduce the risk of unintended entries. The values
-are not sensitive to case. All valid values are changed to "yes" lower case when
-imported. If {it:drop} is indicated then both {it:correct} and {it:newid} must be
-left empty.
+{phang}{it:drop} is used to indicate that the duplicate should be deleted. The only
+valid value is "drop" to reduce the risk of unintended entries ("yes" is also
+allowed for backward compatibility). The values are not sensitive to case. If {it:drop}
+is indicated then both {it:correct} and {it:newid} must be left empty.
 {phang}{it:newid} is used to assign a new ID values to a duplicate. If {it:ID_varname} is a string then all values are valid for {it:newid}. If {it:ID_varname} is numeric then
@@ -281,12 +279,12 @@ unresolved duplicates were found
 {col 3}{c TLC}{hline 130}{c TRC}
 {col 3}{c |}{col 4}HHID{col 10}duplistid{col 21}datelisted{col 33}datefixed{col 44}correct{col 53}drop{col 59}newid{col 65}initials{col 75}notes{col 94}KEY{col 107}enumerator{col 121}listofdiffs{col 134}{c |}
 {col 3}{c LT}{hline 130}{c RT}
-{col 3}{c |}{col 4}4321{col 10}1{col 21}27Dec2015{col 33}02Jan2016{col 44}yes{col 53} {col 59} {col 65}KB{col 75}double submission{col 94}{it:uniquevalue}{col 107}{it:keepvarvalue}{col 121}{it:varlist}{col 134}{c |}
-{col 3}{c |}{col 4}4321{col 10}2{col 21}27Dec2015{col 33}02Jan2016{col 44} {col 53}yes{col 59} {col 65}KB{col 75}double submission{col 94}{it:uniquevalue}{col 107}{it:keepvarvalue}{col 121}{it:varlist}{col 134}{c |}
+{col 3}{c |}{col 4}4321{col 10}1{col 21}27Dec2015{col 33}02Jan2016{col 44}correct{col 53} {col 59} {col 65}KB{col 75}double submission{col 94}{it:uniquevalue}{col 107}{it:keepvarvalue}{col 121}{it:varlist}{col 134}{c |}
+{col 3}{c |}{col 4}4321{col 10}2{col 21}27Dec2015{col 33}02Jan2016{col 44} {col 53}drop{col 59} {col 65}KB{col 75}double submission{col 94}{it:uniquevalue}{col 107}{it:keepvarvalue}{col 121}{it:varlist}{col 134}{c |}
 {col 3}{c |}{col 4}7365{col 10}3{col 21}03Jan2016{col 33} {col 44} {col 53} {col 59} {col 65} {col 75} {col 94}{it:uniquevalue}{col 107}{it:keepvarvalue}{col 121}{it:varlist}{col 134}{c |}
 {col 3}{c |}{col 4}7365{col 10}4{col 21}03Jan2016{col 33} {col 44} {col 53} {col 59} {col 65} {col 75} {col 94}{it:uniquevalue}{col 107}{it:keepvarvalue}{col 121}{it:varlist}{col 134}{c |}
 {col 3}{c |}{col 4}1145{col 10}5{col 21}03Jan2016{col 33}11Jan2016{col 44} {col 53} {col 59}1245{col 65}IB{col 75}incorrect id {col 94}{it:uniquevalue}{col 107}{it:keepvarvalue}{col 121}{it:varlist}{col 134}{c |}
-{col 3}{c |}{col 4}1145{col 10}6{col 21}03Jan2016{col 33}11Jan2016{col 44}yes{col 53} {col 59} {col 65}IB{col 75}correct id {col 94}{it:uniquevalue}{col 107}{it:keepvarvalue}{col 121}{it:varlist}{col 134}{c |}
+{col 3}{c |}{col 4}1145{col 10}6{col 21}03Jan2016{col 33}11Jan2016{col 44}correct{col 53} {col 59} {col 65}IB{col 75}correct id {col 94}{it:uniquevalue}{col 107}{it:keepvarvalue}{col 121}{it:varlist}{col 134}{c |}
 {col 3}{c |}{col 4}9834{col 10}7{col 21}11Jan2016{col 33} {col 44} {col 53} {col 59} {col 65} {col 75} {col 94}{it:uniquevalue}{col 107}{it:keepvarvalue}{col 121}{it:varlist}{col 134}{c |}
 {col 3}{c |}{col 4}9834{col 10}8{col 21}11Jan2016{col 33} {col 44} {col 53} {col 59} {col 65} {col 75} {col 94}{it:uniquevalue}{col 107}{it:keepvarvalue}{col 121}{it:varlist}{col 134}{c |}
 {col 3}{c BLC}{hline 130}{c BRC}
@@ -315,12 +313,12 @@ observation. One is kept and one is dropped, usually it does not matter which yo
 {col 3}{c TLC}{hline 116}{c TRC}
 {col 3}{c |}{col 4}HHID{col 10}duplistid{col 21}datelisted{col 33}datefixed{col 44}correct{col 53}out{col 59}newid{col 65}initials{col 75}notes_enumerators{col 94}KEY{col 107}listofdiffs{col 120}{c |}
 {col 3}{c LT}{hline 116}{c RT}
-{col 3}{c |}{col 4}4321{col 10}1{col 21}27Dec2015{col 33}02Jan2016{col 44}yes{col 53} {col 59} {col 65}KB{col 75}double submission{col 94}{it:uniquevalue}{col 107}{it:varlist}{col 120}{c |}
-{col 3}{c |}{col 4}4321{col 10}2{col 21}27Dec2015{col 33}02Jan2016{col 44} {col 53}yes{col 59} {col 65}KB{col 75}double submission{col 94}{it:uniquevalue}{col 107}{it:varlist}{col 120}{c |}
+{col 3}{c |}{col 4}4321{col 10}1{col 21}27Dec2015{col 33}02Jan2016{col 44}correct{col 53} {col 59} {col 65}KB{col 75}double submission{col 94}{it:uniquevalue}{col 107}{it:varlist}{col 120}{c |}
+{col 3}{c |}{col 4}4321{col 10}2{col 21}27Dec2015{col 33}02Jan2016{col 44} {col 53}drop{col 59} {col 65}KB{col 75}double submission{col 94}{it:uniquevalue}{col 107}{it:varlist}{col 120}{c |}
 {col 3}{c |}{col 4}7365{col 10}3{col 21}03Jan2016{col 33} {col 44} {col 53} {col 59} {col 65} {col 75} {col 94}{it:uniquevalue}{col 107}{it:varlist}{col 120}{c |}
 {col 3}{c |}{col 4}7365{col 10}4{col 21}03Jan2016{col 33} {col 44} {col 53} {col 59} {col 65} {col 75} {col 94}{it:uniquevalue}{col 107}{it:varlist}{col 120}{c |}
 {col 3}{c |}{col 4}1145{col 10}5{col 21}03Jan2016{col 33}11Jan2016{col 44} {col 53} {col 59}1245{col 65}IB{col 75}incorrect id {col 94}{it:uniquevalue}{col 107}{it:varlist}{col 120}{c |}
-{col 3}{c |}{col 4}1145{col 10}6{col 21}03Jan2016{col 33}11Jan2016{col 44}yes{col 53} {col 59} {col 65}IB{col 75}correct id {col 94}{it:uniquevalue}{col 107}{it:varlist}{col 120}{c |}
+{col 3}{c |}{col 4}1145{col 10}6{col 21}03Jan2016{col 33}11Jan2016{col 44}correct{col 53} {col 59} {col 65}IB{col 75}correct id {col 94}{it:uniquevalue}{col 107}{it:varlist}{col 120}{c |}
 {col 3}{c |}{col 4}9834{col 10}7{col 21}11Jan2016{col 33} {col 44} {col 53} {col 59} {col 65} {col 75} {col 94}{it:uniquevalue}{col 107}{it:varlist}{col 120}{c |}
 {col 3}{c |}{col 4}9834{col 10}8{col 21}11Jan2016{col 33} {col 44} {col 53} {col 59} {col 65} {col 75} {col 94}{it:uniquevalue}{col 107}{it:varlist}{col 120}{c |}
 {col 3}{c BLC}{hline 116}{c BRC}
diff --git a/src/help_files/iefieldkit.sthlp b/src/help_files/iefieldkit.sthlp
index 86485207..08f7ac4f 100644
--- a/src/help_files/iefieldkit.sthlp
+++ b/src/help_files/iefieldkit.sthlp
@@ -1,5 +1,5 @@
 {smcl}
-{* 8 Aug 2019}{...}
+{* 28 Apr 2020}{...}
 {hline}
 help for {hi:iefieldkit}
 {hline}
@@ -21,11 +21,16 @@ command please see the {browse "https://dimewiki.worldbank.org/wiki/iefieldkit":
 {marker desc}
 {title:Description}
-{pstd}{cmdab:iegraph} This command returns the version of iefieldkit installed. It
+{pstd}This command returns the version of iefieldkit installed. It
 can be used in the beginning of a Master Do-file that is intended to be used by multiple users to programmatically test if iefieldkit is not installed for the user and therefore need to be installed, or if the version the user has
- installed is too old and needs to be upgraded.
+ installed is too old and needs to be upgraded.{p_end}
+
+{pstd}This package includes the following commands:{p_end}
+{pmore}- {help iecodebook}{p_end}
+{pmore}- {help ieduplicates} and {help iecompdup}{p_end}
+{pmore}- {help ietestform}{p_end}
 {marker optslong}
 {title:Options}
@@ -41,7 +46,7 @@ command please see the {browse "https://dimewiki.worldbank.org/wiki/iefieldkit":
 replaces the iefieldkit file with the latest version. In your code you can skip the second part if you are not sure which version is required. But you should always have the first part testing that {inp:r(version)} has a value before using
- it in less than or greater than expressions.
+ it in less than or greater than expressions.{p_end}
 {inp} cap iefieldkit
 {inp} if "`r(version)'" == "" {
@@ -53,15 +58,9 @@ command please see the {browse "https://dimewiki.worldbank.org/wiki/iefieldkit":
 {inp} ssc install iefieldkit , replace
 {inp} }{text}
-{title:Acknowledgements}
-
-{phang}We would like to acknowledge the help in testing and proofreading we received
- in relation to this command and help file from (in alphabetic order):{p_end}
-{pmore}Luiza Cardoso De Andrade{break}Seungmin Lee{break}
-
 {title:Authors}
-{phang}Kristoffer Bjarkefur, The World Bank, DECIE
+{phang}DIME Analytics, The World Bank, DECIE
 {phang}Please send bug-reports, suggestions and requests for clarifications writing "iefieldkit iefieldkit" in the subject line to the email address
diff --git a/src/help_files/ietestform.sthlp b/src/help_files/ietestform.sthlp
index 7b3e9f49..88c8e5cd 100644
--- a/src/help_files/ietestform.sthlp
+++ b/src/help_files/ietestform.sthlp
@@ -1,5 +1,5 @@
 {smcl}
-{* 8 Aug 2019}{...}
+{* 28 Apr 2020}{...}
 {hline}
 help for {hi:ietestform}
 {hline}
diff --git a/src/iefieldkit.pkg b/src/iefieldkit.pkg
index 963e154e..cc4a3917 100644
--- a/src/iefieldkit.pkg
+++ b/src/iefieldkit.pkg
@@ -1,4 +1,4 @@
-v 1.4
+v 1.5
 d iefieldkit. DIME Analytics iefieldkit
 d DIME Analytics, World Bank Group, Development Economics Research
 f /ado_files/iefieldkit.ado
diff --git a/src/stata.toc b/src/stata.toc
index b5036477..1b591e2a 100644
--- a/src/stata.toc
+++ b/src/stata.toc
@@ -1,3 +1,3 @@
-v 1.4
+v 1.5
 d DIME Analytics, World Bank Group, Development Economics Research
 p iefieldkit DIME Analytics iefieldkit
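
Reviewer note: the ieduplicates help text changed in this diff says that {it:correct} accepts "correct" (or "yes" for backward compatibility), {it:drop} accepts "drop" (or "yes"), that values are not case sensitive, and that only one of {it:correct}, {it:drop} and {it:newid} may be filled in per row. The sketch below restates that rule in Python for illustration only. It is not the command's Stata implementation; the function name `resolve_row` is hypothetical, and trimming surrounding whitespace is an assumption the help text does not state.

```python
# Illustrative restatement of the documented Excel-report rule.
# Assumptions (not from the source): the name resolve_row, and that
# surrounding whitespace in a cell is ignored.

CORRECT_VALUES = {"correct", "yes"}  # "yes" kept for backward compatibility
DROP_VALUES = {"drop", "yes"}        # "yes" kept for backward compatibility


def resolve_row(correct="", drop="", newid=""):
    """Return 'correct', 'drop', 'newid' or 'unresolved' for one report row.

    Raises ValueError if a cell holds an invalid value or if more than
    one of the three columns is filled in.
    """
    correct, drop, newid = (str(v).strip() for v in (correct, drop, newid))

    # Values are compared case-insensitively, as the help text describes.
    if correct and correct.lower() not in CORRECT_VALUES:
        raise ValueError(f"invalid value in correct: {correct!r}")
    if drop and drop.lower() not in DROP_VALUES:
        raise ValueError(f"invalid value in drop: {drop!r}")

    filled = [name for name, value in
              (("correct", correct), ("drop", drop), ("newid", newid))
              if value]
    if len(filled) > 1:
        raise ValueError(f"only one column may be filled in, got: {filled}")
    return filled[0] if filled else "unresolved"
```

Applied to the example table in the help file, the row with duplistid 1 (`correct` filled in) resolves to `correct`, the row with duplistid 5 (`newid` set to 1245) resolves to `newid`, and the rows with all three columns empty stay `unresolved`.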