A mix of Python scripts (first stage: fixing problems in the XML, parsing it, and calculating initial statistics) and R scripts (later statistics and plotting).
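As a rough illustration of what the first stage might look like, here is a minimal sketch that parses articles, counts the ones that fail, and writes a small CSV that an R script could read later. It is not the code in this repository: the element names (`article`, `title`, `body`), the function names, and the word-count statistic are all assumptions made for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET


def parse_article(xml_text):
    """Return (title, word_count), or None if the XML fails to parse.

    The <title> and <body> element names are hypothetical.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None
    title = root.findtext("title", default="")
    body = root.find("body")
    # Collect all text nested under <body>; an absent body counts as empty.
    text = " ".join(body.itertext()) if body is not None else ""
    return title, len(text.split())


def summarize(xml_texts):
    """Parse every article, keeping a count of the ones that fail."""
    rows, failed = [], 0
    for xml_text in xml_texts:
        result = parse_article(xml_text)
        if result is None:
            failed += 1
        else:
            rows.append(result)
    return rows, failed


def to_csv(rows):
    """Serialize the per-article statistics as CSV text for the R stage."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["title", "word_count"])
    writer.writerows(rows)
    return buf.getvalue()
```

Keeping a count of failed articles alongside the parsed rows makes it easy to see how much of the corpus is still being dropped.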
Comments and help are welcome. This is all at a very early stage, and a fair number of articles are still missing or fail to parse. Everything could be made more elaborate.