Excess detail consuming disk resources #97

Open
datadavev opened this issue Dec 20, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@datadavev
Member

Disk usage on logproc-stage-ucsb-1.test.dataone.org is running close to 95%. By default, Elasticsearch switches its indices to read-only mode when disk usage reaches 95%, to avoid the errors and complications that come with a full disk.

The method for recovery is to reduce disk usage and issue the command:

curl -XPUT -H "Content-Type: application/json" http://localhost:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'
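
To avoid clearing the block while the disk is still too full, the two steps can be wrapped in a small guard. This is only a sketch: the function names are made up for illustration, the 95% threshold assumes the default flood-stage setting has not been changed, and the data path passed in is whatever filesystem Elasticsearch actually writes to on this host.

```shell
# Assumptions: Elasticsearch data lives on the filesystem passed as $1 and the
# default 95% threshold is in effect. Function names are hypothetical.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print the used percentage (digits only) of the filesystem holding a path.
disk_used_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed when a usage percentage is below the threshold (default 95).
below_flood_stage() {
  [ "$1" -lt "${2:-95}" ]
}

# Clear the read-only block only once usage is back under the threshold.
clear_block_if_safe() {
  used="$(disk_used_pct "$1")"
  if below_flood_stage "$used"; then
    curl -XPUT -H "Content-Type: application/json" \
      "${ES_URL}/_all/_settings" \
      -d '{"index.blocks.read_only_allow_delete": null}'
  else
    echo "disk still ${used}% full; free more space first" >&2
    return 1
  fi
}
```

It would be called as `clear_block_if_safe /var/lib/elasticsearch`, where that path is an assumed data directory and not something confirmed for this server.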

The vast majority of disk use is in the apacheperf-1 index, currently at around 720 GB, followed by eventlog-1 at around 144 GB.
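
For anyone wanting to re-check these figures, the `_cat/indices` API can list indices sorted by on-disk size. A minimal sketch, assuming Elasticsearch is reachable at the same localhost:9200 address used in the command above; the helper names are hypothetical.

```shell
# Assumption: same Elasticsearch endpoint as the recovery command above.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the _cat/indices URL: selected columns, sizes in GB, largest first.
indices_by_size_url() {
  printf '%s/_cat/indices?v&h=index,store.size,docs.count&bytes=gb&s=store.size:desc' "$ES_URL"
}

# Fetch the listing from the running cluster.
list_indices_by_size() {
  curl -s "$(indices_by_size_url)"
}
```

Running `list_indices_by_size` on the host should put apacheperf-1 and eventlog-1 at the top if the numbers above still hold.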

@datadavev datadavev added the bug Something isn't working label Dec 20, 2023
@mbjones
Member

mbjones commented Dec 20, 2023

Thanks for this, @datadavev -- Nick is out, so it's hard to expand the filesystem at the moment, but we can do that later. In the short term, is there anything we can clean up to gain some headroom? I see about 10GB of log data in /var/log in 3 subdirs:

1999	apache2
4026	elasticsearch
4129	journal

Maybe those can be trimmed some? Other ideas?
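
A rough sketch of how that trimming could be approached. It assumes the three figures above are megabytes (as `du -sm` would report), that the journal is managed by systemd, and that rotated Apache logs are gzip-compressed; the function names, the 500M target, and the 30-day cutoff are arbitrary examples.

```shell
# Sum the first column of `du -sm`-style output (sizes in MB) read from stdin.
sum_mb() {
  awk '{ total += $1 } END { print total + 0 }'
}

# Report current sizes of the three directories mentioned above, in MB.
log_dir_sizes() {
  du -sm /var/log/apache2 /var/log/elasticsearch /var/log/journal
}

# Shrink the systemd journal to a target size (needs root; 500M is an example).
trim_journal() {
  journalctl --vacuum-size="${1:-500M}"
}

# List, without deleting, compressed Apache logs older than N days (default 30).
old_apache_logs() {
  find /var/log/apache2 -name '*.gz' -mtime +"${1:-30}" -print
}
```

`log_dir_sizes | sum_mb` gives the combined total; fed the three figures quoted above, it comes to 10154 MB, which matches the "about 10GB" estimate.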

@datadavev
Member Author

Some space has been freed up, and I've slowed the firehose of events into the apacheperf-1 index by excluding events where the CN is calling itself through the API. That reduces the traffic considerably and buys some time for a more considered solution. The temporary fix was to adjust the Apache config on cn-ucsb-1 like this:

    #Performance logging
    # don't log self
    SetEnvIf Remote_Addr "128\.111\.85\.180" dontlog
    LogFormat "%{%Y-%m-%d}tT%{%T}t.%{msec_frac}t%{%z}t|%m|%>s|%{ms}T|%a|%U|\"%q\"|%{cache-status}e|\"%{User-agent}i\"|%u" performance_log
    CustomLog "/var/log/apache2/cn_perf.log" performance_log env=!dontlog

This is just a temporary fix to slow the deluge of events.
Thing is, I'm not sure this information is even needed for the current metrics processing. I'm reviewing the code and Logstash configuration...
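
To sanity-check the exclusion, the same pattern can be tried against sample addresses, and a log written before the change can be scanned to see how much of the traffic was self-calls. This is a sketch only: the helper names are hypothetical, and it relies on the client address (`%a`) being the fifth `|`-separated field of the `performance_log` format shown above.

```shell
# Same regular expression as the SetEnvIf line above.
SELF_PATTERN='128\.111\.85\.180'

# Succeed if a request from this address would be tagged "dontlog".
is_self_request() {
  printf '%s\n' "$1" | grep -Eq "$SELF_PATTERN"
}

# Count lines in a performance log whose client address (field 5) equals $2.
# Prints "<self> of <total>".
self_share() {
  awk -F'|' -v addr="$2" '$5 == addr { self++ } END { printf "%d of %d\n", self, NR }' "$1"
}
```

For example, `self_share /var/log/apache2/cn_perf.log.1 128.111.85.180` would report the self-call share of a rotated log, assuming a rotated file with that name exists on the host.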
