[PRE REVIEW]: Unsupervised learning approach towards anomaly detection in compat logs with ADE #2972

Closed
whedon opened this issue Jan 19, 2021 · 69 comments

@whedon

whedon commented Jan 19, 2021

Submitting author: @ayush-1506 (Ayush Shridhar)
Repository: https://github.com/openmainframeproject/ade.git
Version: v1.0.5
Editor: @gkthiruvathukal
Reviewers: @arcuri82, @mdpiper
Managing EiC: Kristen Thyng

⚠️ JOSS reduced service mode ⚠️

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

Author instructions

Thanks for submitting your paper to JOSS @ayush-1506. Currently, there isn't a JOSS editor assigned to your paper.

The author's suggestion for the handling editor is @bmcfee.

@ayush-1506 if you have any suggestions for potential reviewers then please mention them here in this thread (without tagging them with an @). In addition, this list of people have already agreed to review for JOSS and may be suitable for this submission (please start at the bottom of the list).

Editor instructions

The JOSS submission bot @whedon is here to help you find and assign reviewers and start the main review. To find out what @whedon can do for you type:

@whedon commands
@whedon

whedon commented Jan 19, 2021

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf

@whedon

whedon commented Jan 19, 2021

PDF failed to compile for issue #2972 with the following error:

Can't find any papers to compile :-(

@whedon

whedon commented Jan 19, 2021

Software report (experimental):

github.com/AlDanial/cloc v 1.88  T=5.22 s (90.8 files/s, 14113.6 lines/s)
--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
Java                            437          10599          22656          36800
Bourne Shell                      9            215            293            719
XSLT                              3            154             54            541
XSD                               4             66             59            406
XML                               5             18             21            310
Maven                             3              6             16            276
CSS                               1             16              0             85
Bourne Again Shell                8             44            161             81
Markdown                          1             17              0             47
HTML                              2              5             14             21
JSON                              1              0              0             17
--------------------------------------------------------------------------------
SUM:                            474          11140          23274          39303
--------------------------------------------------------------------------------


Statistical information for the repository 'aef4f59eaae7aa27d77f8f93' was
gathered on 2021/01/19.
The following historical commit information, by author, was found:

Author                     Commits    Insertions      Deletions    % of changes
Chris Brooker                    1         68690              0           94.84
Faisal Hameed                   13           190            260            0.62
Jim Caffrey                     26          1961            674            3.64
Neale Ferguson                   3           271             41            0.43
ayman abdelghany                 3           157            135            0.40
davidoh                          2            25             22            0.06

Below are the number of rows from each author that have survived and are still
intact in the current revision:

Author                     Rows      Stability          Age       % in comments
Faisal Hameed               161           84.7         14.7                0.00
James Caffrey              1562          100.0         10.7               23.62
Jim Caffrey                  11            0.6         14.8                0.00
Neale Ferguson              131           48.3         14.9                0.00
ayman abdelghany            130           82.8         12.8                0.00
cbrooker27                68035          100.0          0.0               36.32
davidoh                      25          100.0          0.1                0.00

@kthyng

kthyng commented Jan 19, 2021

Hi @ayush-1506 — is there a paper associated with your submission?

@ayush-1506

@kthyng Yes, the code and paper live in a different branch of the repository.
Link: https://github.com/openmainframeproject/ade/tree/logs

Can we get whedon to use this branch instead of master? Otherwise, I'll discuss merging it into master with my collaborators as soon as possible.

@kthyng

kthyng commented Jan 20, 2021

@whedon generate pdf from branch logs

@whedon

whedon commented Jan 20, 2021

Attempting PDF compilation from custom branch logs. Reticulating splines etc...

@whedon

whedon commented Jan 20, 2021

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@kthyng

kthyng commented Jan 20, 2021

@ayush-1506 Yes, it is fine to have the paper in another branch. Please look through the paper requirements to be sure you've covered them all. For one thing, we require a section entitled "Statement of Need".

@kthyng

kthyng commented Jan 20, 2021

@ayush-1506 This looks like interesting work, but can you make a compelling argument for why it is research software in particular? You can read more about that requirement here. I'm going to label this with a scope query to get the editorial board's input on this, which should take 1-2 weeks.

@kthyng

kthyng commented Jan 20, 2021

@whedon scope query

@whedon

whedon commented Jan 20, 2021

I'm sorry human, I don't understand that. You can see what commands I support by typing:

@whedon commands

@kthyng

kthyng commented Jan 20, 2021

@whedon query scope

@whedon

whedon commented Jan 20, 2021

Submission flagged for editorial review.

@whedon whedon added the query-scope (Submissions of uncertain scope for JOSS) label Jan 20, 2021
@ayush-1506

@kthyng Thanks for the input. I'll add a Statement of Need section (which will show that this software and approach solve a real problem). Should the argument for this being research software go in the paper, or will a comment here suffice?

@kthyng

kthyng commented Jan 20, 2021

Here is the specific section on what your paper should contain: https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain

Your statement of need should describe the research purpose of the software, but summarizing that or expanding on it here would also be helpful as the editors look through your submission to learn about it.

@ayush-1506

@kthyng Just realised that the Motivation section should probably be renamed to Statement of Need.

@ayush-1506

summarizing that or expanding on it here would also be helpful as the editors look through your submission to learn about it.

Sure, I'll make required edits to the paper and expand the same here.

@ayush-1506

ayush-1506 commented Jan 21, 2021

Made changes to the paper, summarizing the same here:

Objective:

The aim of the project is to efficiently detect anomalous log slices in large sets of logs. These can range from sparse logs, such as Linux syslogs in RFC3164/RFC5424 format, to very dense logs, such as those generated by Spark jobs. The problem is common in large systems and development clusters, where a system crash or unexpected behavior can have adverse effects. We introduce a novel statistical, data-driven approach to this problem, which is described later in this comment.

Need:

Why do we need to find anomalous log slices?

Debugging system failures is a cumbersome task. Abnormal behavior can be identified by studying the logs generated while the system was running: if the system fails or behaves unexpectedly, the evidence is logged somewhere. However, going through hours of dense logs is a challenge: sysadmins typically race against time to study large numbers of log messages and decipher the root cause of the issue. Such system failures are very common and at times unavoidable. Over the years they have led to huge losses of time and resources.

Relevant work and our approach:

While there has been work on anomaly detection in large logs, such as TadGAN (https://arxiv.org/abs/2009.07769) and semi-supervised adversarial learning with GANs (https://doi.org/10.1109/ciss.2019.8693024), most of these approaches rely on large deep learning models, and some treat the task as a supervised problem. These models are expensive to train and comparatively slow.

In contrast, we treat the problem as a statistical one and use unsupervised learning techniques for fast and robust detection of anomalous slices. Because the approach is unsupervised, it needs no labelled data. Avoiding computationally heavy deep learning keeps the system fast, and it is written in Java, which makes it well suited to enterprise IT use cases (and adaptable to others). To this end, the system is built around three main components:

  • Unsupervised learning algorithms: At the heart of ADE are unsupervised learning algorithms that are trained to learn the expected behavior of the system and compare it with the behavior observed during inference.

  • Model groups: We divide the training into several categories, called model groups. Within a model group, multiple systems contribute to a single shared model; the more systems in the group, the more data is available to build that model.

  • Statistical scores: To arrive at an anomaly score, we calculate a number of statistical scores that contribute to the final result. These include the Bernoulli score, Poisson score, LogNormal score, Best-of-two score, rarity score, severity score, Clustering score, Percentile score, and FullBernoulliClusterAware score, to name a few.
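
To illustrate what a count-based statistical score can look like, here is a minimal sketch of a Poisson-style score in Java. It is not ADE's actual scoring code: the class name, the method names, and the choice to score a message by how improbable its per-interval count is under a rate learned from training intervals are all assumptions made for this example.

```java
import java.util.List;

/** Hypothetical sketch of a Poisson-style score; not taken from the ADE code base. */
final class PoissonScoreSketch {

    /** Estimates the Poisson rate as the mean message count per training interval. */
    static double estimateRate(List<Integer> trainingCounts) {
        if (trainingCounts.isEmpty()) {
            throw new IllegalArgumentException("need at least one training interval");
        }
        long sum = 0;
        for (int count : trainingCounts) {
            sum += count;
        }
        return (double) sum / trainingCounts.size();
    }

    /** Returns P(X >= observed) for X ~ Poisson(rate). */
    static double upperTail(int observed, double rate) {
        double term = Math.exp(-rate); // P(X = 0)
        double cdfBelow = 0.0;         // accumulates P(X < observed)
        for (int k = 0; k < observed; k++) {
            cdfBelow += term;
            term *= rate / (k + 1);    // P(X = k + 1) from P(X = k)
        }
        return Math.max(0.0, 1.0 - cdfBelow);
    }

    /** Score in [0, 1]; values near 1 mean the count is unusually high (assumed convention). */
    static double score(int observed, double rate) {
        return 1.0 - upperTail(observed, rate);
    }

    public static void main(String[] args) {
        double rate = estimateRate(List.of(2, 3, 4, 3, 3));
        System.out.printf("rate=%.2f typical=%.3f burst=%.3f%n",
                rate, score(3, rate), score(12, rate));
    }
}
```

In this sketch a burst of 12 messages in an interval where about 3 are expected scores close to 1, while a typical count scores much lower. How ADE combines such per-message scores into a final anomaly score is not specified in this thread.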

In addition, we classify each message into one of four categories based on how frequently its family of messages is issued:

  • New: The message is completely new (previously unseen)
  • IN_SYNC: ADE expects the message to be issued in a periodic pattern, and the message was issued as expected
  • NOT_IN_SYNC: ADE expects the message to be issued in a periodic pattern, but the message was not issued as expected
  • NOT_PERIODIC: ADE does not expect the message to be periodic
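
The four categories above can be pictured as a small decision rule. The following Java sketch is only an illustration: the enum mirrors the names in the list (with "New" written as NEW), while the inputs (a learned period, the gap since the last occurrence, and a tolerance fraction) and the comparison rule are assumptions of this example, not ADE's actual logic.

```java
import java.util.OptionalDouble;

/** Hypothetical sketch of the periodicity classification; not ADE's implementation. */
final class PeriodicityStatusSketch {

    enum Status { NEW, IN_SYNC, NOT_IN_SYNC, NOT_PERIODIC }

    /**
     * @param seenInTraining        whether this message ID appeared in the training data
     * @param expectedPeriodSeconds learned period, empty if the message is not modeled as periodic
     * @param secondsSinceLastSeen  gap since the previous occurrence of the message
     * @param tolerance             allowed deviation as a fraction of the period (assumed parameter)
     */
    static Status classify(boolean seenInTraining,
                           OptionalDouble expectedPeriodSeconds,
                           double secondsSinceLastSeen,
                           double tolerance) {
        if (!seenInTraining) {
            return Status.NEW;
        }
        if (expectedPeriodSeconds.isEmpty()) {
            return Status.NOT_PERIODIC;
        }
        double period = expectedPeriodSeconds.getAsDouble();
        double deviation = Math.abs(secondsSinceLastSeen - period);
        // Within tolerance of the learned period counts as "issued as expected".
        return deviation <= tolerance * period ? Status.IN_SYNC : Status.NOT_IN_SYNC;
    }

    public static void main(String[] args) {
        System.out.println(classify(true, OptionalDouble.of(60), 63, 0.1));
        System.out.println(classify(true, OptionalDouble.of(60), 120, 0.1));
    }
}
```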

Using all this information, we assign an anomaly score to every interval slice. The higher the anomaly score, the more likely that slice is to be the source of anomalous logs.

Output format:

Finally, we write out the analysis output in XML format. An example of the analysis output for a day can be seen here. We also provide specialized output for each interval, which can be accessed by clicking on the XML links associated with each slice. Examples of analysis for a period can be viewed here. Our approach has shown comparatively accurate results when tested on real data, along with fast inference and training. We also provide sample data and instructions to build the binary and run it on the data.

Looking at the "What we mean by research software" section, I believe this falls under the category of software that solves complex modeling problems in a scientific context (physics, mathematics, biology, medicine, social science, neuroscience, engineering) and extracts knowledge from large data sets.

Kindly let me know if there are any questions or if I missed something.

@kthyng

kthyng commented Jan 22, 2021

@ayush-1506 Thank you! Someone from the editorial board will get back to you after a week or two.

@danielskatz

@ayush-1506 - can you explain what code you are submitting to JOSS in this branch vs the overall repo? The paper seems to describe ADE, which is what the repo contains, but you also have suggested that https://github.com/openmainframeproject/ade/tree/logs is the contribution being submitted here, and I can't tell if the paper describes that specific contribution.

@ayush-1506

@danielskatz There were some issues with pushing new commits to the master branch of the ADE repository (here: the CLA wasn't registering the contributors), so all development was temporarily being done and reviewed in the logs branch. However, the CLA issues have now been resolved and all changes have been merged into master. We can treat the master branch as the main branch for both paper and code from now on.

@danielskatz

So is all of the content in the main branch the JOSS submission?

@danielskatz

@whedon check repository

@whedon

whedon commented Jan 26, 2021

Software report (experimental):

github.com/AlDanial/cloc v 1.88  T=2.31 s (210.5 files/s, 32943.2 lines/s)
--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
Java                            444          10823          23275          37558
Bourne Shell                     10            271            359            895
XSD                               5             88             76            542
XSLT                              3            154             54            541
XML                               6             23             45            468
Maven                             3              6             16            276
Markdown                          2             56              0            191
CSS                               1             16              0             85
Bourne Again Shell                8             44            161             81
TeX                               1              2              0             27
HTML                              2              5             14             21
JSON                              1              0              0             17
YAML                              1              1              0              8
--------------------------------------------------------------------------------
SUM:                            487          11489          24000          40710
--------------------------------------------------------------------------------


Statistical information for the repository '23ad3deb8570f20df8428079' was
gathered on 2021/01/26.
The following historical commit information, by author, was found:

Author                     Commits    Insertions      Deletions    % of changes
Chris Brooker                    1         68690              0           92.61
Faisal Hameed                   13           190            260            0.61
Jim Caffrey                     26          1961            674            3.55
Neale Ferguson                   3           271             41            0.42
ayman abdelghany                 3           157            135            0.39
ayush-1506                      13          1673             72            2.35
davidoh                          2            25             22            0.06

Below are the number of rows from each author that have survived and are still
intact in the current revision:

Author                     Rows      Stability          Age       % in comments
Faisal Hameed               161           84.7         57.9                0.00
James Caffrey              1554          100.0         53.9               23.68
Jim Caffrey                  11            0.6         58.0                0.00
Neale Ferguson              131           48.3         58.1                0.00
ayman abdelghany            130           82.8         56.1                0.00
ayush-1506                 1619           96.8          6.5               42.62
cbrooker27                68025          100.0          0.0               36.31
davidoh                      25          100.0         43.3                0.00

@gkthiruvathukal

@ jonathanschilling are you willing to contribute a review for this JOSS submission?

@ayush-1506

@gkthiruvathukal I think jonathanschilling wasn't notified, since there's a space between @ and jonathanschilling in your comment above (if this was not intentional).

@gkthiruvathukal

@ayush-1506 So sorry!

@jonathanschilling are you willing to contribute a review for this JOSS submission?

@ayush-1506

@gkthiruvathukal Hi, I'm not sure if jonathanschilling is seeing this, should we proceed into the reviewing stage (I believe arcuri82 has agreed to be the reviewer here) while we wait for him? (Or request another reviewer?) Thanks.

@gkthiruvathukal

@ayush-1506 Yes, and please do suggest 2-3 names if you can. People are very busy. We need 2 reviewers to proceed to review. So having your input from our list of reviewers will be extremely helpful.

@jonathanschilling

@ayush-1506 @gkthiruvathukal Sorry, I am indeed quite busy at the moment. Maybe someone else more involved in this topic can perform the review in this case?

@ayush-1506

@gkthiruvathukal Adding some names to {kuangmeng, marcoapintoo} (mentioned above): {mdpiper, markbasham, johnsamuelwrites}. Kindly let me know if you'd like more suggestions.

@gkthiruvathukal

gkthiruvathukal commented Feb 14, 2021

@ayush-1506 I will get moving on this shortly. Thanks for your suggestions and patience!

@gkthiruvathukal

@kuangmeng Are you willing to contribute a review for this JOSS submission?

@arcuri82

@whedon generate pdf

@whedon

whedon commented Feb 18, 2021

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@arcuri82

Hi,

@gkthiruvathukal: I made a first review at openmainframeproject/ade#85. However, as I do not use Linux, I did not run the software (I just read the documentation, compiled the software, and ran its test cases). We would need the second reviewer to make sure to run the software. In the worst case, I can try to set up a virtual machine to run Linux (but that would be quite a bit of work, so it might take me a while...). However, the authors have quite extensive documentation, with a few examples of outputs.

@gkthiruvathukal

@arcuri82 Are you on Mac or Windows? (I'm assuming one of those two since you're not on Linux.) Still waiting on a second reviewer. I will be sending out another invite shortly if I don't hear back from @kuangmeng.

@gkthiruvathukal

And thank you for your early input, @arcuri82!

@gkthiruvathukal

@mdpiper, are you willing to contribute a review for this JOSS submission? I need a second reviewer in addition to @arcuri82, who has graciously accepted!

@mdpiper

mdpiper commented Feb 21, 2021

@gkthiruvathukal Yes, I can review this submission.

@gkthiruvathukal

@mdpiper Thanks for your response! We are always grateful to our reviewers during these challenging times! I will add you and get the review started.

@gkthiruvathukal

@whedon add @mdpiper as reviewer

@whedon whedon assigned arcuri82 and unassigned gkthiruvathukal and arcuri82 Feb 22, 2021
@whedon

whedon commented Feb 22, 2021

OK, @mdpiper is now a reviewer

@gkthiruvathukal

@whedon start review

@whedon

whedon commented Feb 22, 2021

OK, I've started the review over in #3052.

@whedon whedon closed this as completed Feb 22, 2021