out_prometheus_exporter: respond with 200 even with no metrics #8352

Closed
wants to merge 1 commit into from

@ghost ghost commented Jan 5, 2024

Prometheus scrapers treat non-200 responses as errors. It is not necessarily an error that no filter has caused a metric to be exported yet. Consider, e.g., a filter that exposes the number of error log messages as a Prometheus metric, where there happen to have been no error log messages in the inputs so far.


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • [N/A] Run local packaging test showing all targets (including any new ones) build.
  • [N/A] Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

@edsiper
Member

edsiper commented Jan 10, 2024

Hi, thanks for this PR. To understand better, does it mean that if we return a 404 error, it can mess up the scraper, and it won't retry? I am looking to understand why 404 would cause an issue.

@ghost
Author

ghost commented Jan 11, 2024

> Hi, thanks for this PR. To understand better, does it mean that if we return a 404 error, it can mess up the scraper, and it won't retry? I am looking to understand why 404 would cause an issue.

It's not so much "mess up": scraping does continue, but Prometheus considers the target down (i.e. the up metric has value 0) if the response is not 200. The scrape target also shows as "DOWN" in the Prometheus web UI, which also displays the error the scrape received.

Further, we are using VictoriaMetrics, which exposes a metric, vm_promscrape_scrapes_failed_total, that counts failed scrapes, and we have an alert on it. Fluent Bit returning 404 on an empty scrape makes that alert fire even when everything is OK (there are just no metrics generated by Fluent Bit yet).
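
To make the scraper-side effect concrete, here is a rough sketch of the bookkeeping described above. It is an illustration only, not Prometheus or VictoriaMetrics code: ScrapeStats, record_scrape and the failed_scrapes field are hypothetical names, and real scrapers also account for timeouts, parse errors and so on.

```python
from dataclasses import dataclass


@dataclass
class ScrapeStats:
    """Hypothetical per-target state kept by a scraper."""
    up: int = 0              # 1 if the last scrape succeeded, else 0
    failed_scrapes: int = 0  # stand-in for a counter such as vm_promscrape_scrapes_failed_total


def record_scrape(stats: ScrapeStats, status_code: int) -> ScrapeStats:
    """Update target state from the HTTP status of one scrape.

    Assumption: only a 200 response counts as a successful scrape,
    as described in the comment above.
    """
    if status_code == 200:
        stats.up = 1
    else:
        stats.up = 0
        stats.failed_scrapes += 1
    return stats


if __name__ == "__main__":
    # Exporter with no metrics yet answering 404: every scrape counts as a failure.
    with_404 = ScrapeStats()
    for _ in range(3):
        record_scrape(with_404, 404)
    print(with_404)

    # Same exporter answering with an empty 200: the target stays up.
    with_200 = ScrapeStats()
    for _ in range(3):
        record_scrape(with_200, 200)
    print(with_200)
```

With the 404 responses the failure counter keeps growing, which is what trips an alert on it; with an empty 200 it stays at zero.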

@edsiper
Member

edsiper commented Jan 11, 2024

I understand, thanks for the explanation. Please let me know when this PR is ready for review.

@patrick-stephens
Contributor

FYI, we do provide a Vagrant definition if you want to develop in a VM: https://github.com/fluent/fluent-bit/blob/master/Vagrantfile

Builds can also all be done via containers for any target using the ./packaging/build.sh script. This gives you an ephemeral build environment: your local source is mounted into the container and compiled, and the resulting binaries are then exported from the container.
https://github.com/fluent/fluent-bit/tree/master/packaging should show how to use it, e.g.
./packaging/build.sh -d 'centos/7' will compile for CentOS 7 (AMD64) and create an RPM.

@ghost
Author

ghost commented Jan 16, 2024

@patrick-stephens thanks, it's mostly just a question of finding time...

Contributor

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the Stale label Apr 17, 2024
@ghost
Author

ghost commented Apr 17, 2024

example config file

[SERVICE]
    log_level debug
[INPUT]
    name tail
    path /tmp/foo
    tag foo
[FILTER]
    name log_to_metrics
    match foo
    tag foo_metric
    metric_name foos
    metric_mode counter
    metric_description foo count
[OUTPUT]
    name prometheus_exporter
    match foo_metric
    port 2021

debug log output (with valgrind)

$ valgrind --leak-check=full ./bin/fluent-bit -c flb.conf
==16481== Memcheck, a memory error detector
==16481== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==16481== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==16481== Command: ./bin/fluent-bit -c flb.conf
==16481==
Fluent Bit v3.0.3
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

___________.__                        __    __________.__  __          ________
\_   _____/|  |  __ __   ____   _____/  |_  \______   \__|/  |_  ___  _\_____  \
 |    __)  |  | |  |  \_/ __ \ /    \   __\  |    |  _/  \   __\ \  \/ / _(__  <
 |     \   |  |_|  |  /\  ___/|   |  \  |    |    |   \  ||  |    \   / /       \
 \___  /   |____/____/  \___  >___|  /__|    |______  /__||__|     \_/ /______  /
     \/                     \/     \/               \/                        \/

[2024/04/17 02:38:36] [ info] Configuration:
[2024/04/17 02:38:36] [ info]  flush time     | 1.000000 seconds
[2024/04/17 02:38:36] [ info]  grace          | 5 seconds
[2024/04/17 02:38:36] [ info]  daemon         | 0
[2024/04/17 02:38:36] [ info] ___________
[2024/04/17 02:38:36] [ info]  inputs:
[2024/04/17 02:38:36] [ info]      tail
[2024/04/17 02:38:36] [ info] ___________
[2024/04/17 02:38:36] [ info]  filters:
[2024/04/17 02:38:36] [ info]      log_to_metrics.0
[2024/04/17 02:38:36] [ info] ___________
[2024/04/17 02:38:36] [ info]  outputs:
[2024/04/17 02:38:36] [ info]      prometheus_exporter.0
[2024/04/17 02:38:36] [ info] ___________
[2024/04/17 02:38:36] [ info]  collectors:
[2024/04/17 02:38:36] [ info] [fluent bit] version=3.0.3, commit=5363cf8c83, pid=16481
[2024/04/17 02:38:36] [debug] [engine] coroutine stack size: 196608 bytes (192.0K)
[2024/04/17 02:38:36] [ info] [storage] ver=1.5.2, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2024/04/17 02:38:36] [ info] [cmetrics] version=0.7.3
[2024/04/17 02:38:36] [ info] [ctraces ] version=0.4.0
[2024/04/17 02:38:36] [ info] [input:tail:tail.0] initializing
[2024/04/17 02:38:36] [ info] [input:tail:tail.0] storage_strategy='memory' (memory only)
[2024/04/17 02:38:36] [debug] [tail:tail.0] created event channels: read=21 write=22
[2024/04/17 02:38:37] [debug] [input:tail:tail.0] flb_tail_fs_inotify_init() initializing inotify tail input
[2024/04/17 02:38:37] [debug] [input:tail:tail.0] inotify watch fd=27
[2024/04/17 02:38:37] [debug] [input:tail:tail.0] scanning path /tmp/foo
[2024/04/17 02:38:37] [debug] [input:tail:tail.0] inode=199246339 with offset=32 appended as /tmp/foo
[2024/04/17 02:38:37] [debug] [input:tail:tail.0] scan_glob add(): /tmp/foo, inode 199246339
[2024/04/17 02:38:37] [debug] [input:tail:tail.0] 1 new files found on path '/tmp/foo'
[2024/04/17 02:38:37] [ info] [input:emitter:emitter.1] initializing
[2024/04/17 02:38:37] [ info] [input:emitter:emitter.1] storage_strategy='memory' (memory only)
[2024/04/17 02:38:37] [debug] [emitter:emitter.1] created event channels: read=29 write=30
[2024/04/17 02:38:37] [debug] [prometheus_exporter:prometheus_exporter.0] created event channels: read=31 write=32
[2024/04/17 02:38:37] [ info] [output:prometheus_exporter:prometheus_exporter.0] listening iface=0.0.0.0 tcp_port=2021
[2024/04/17 02:38:37] [ info] [sp] stream processor started
[2024/04/17 02:38:37] [debug] [input:tail:tail.0] inode=199246339 file=/tmp/foo promote to TAIL_EVENT
[2024/04/17 02:38:37] [ info] [input:tail:tail.0] inotify_fs_add(): inode=199246339 watch_fd=1 name=/tmp/foo
[2024/04/17 02:38:37] [debug] [input:tail:tail.0] [static files] processed 0b, done
[2024/04/17 02:38:45] [debug] [input:tail:tail.0] inode=199246339, /tmp/foo, events: IN_MODIFY
[2024/04/17 02:38:46] [debug] [task] created task=0x53b95c0 id=0 without routes, dropping.
[2024/04/17 02:38:46] [debug] [task] destroy task=0x53b95c0 (task_id=0)
[2024/04/17 02:38:46] [debug] [task] created task=0x53b9770 id=0 OK
[2024/04/17 02:38:46] [debug] [out flush] cb_destroy coro_id=0
[2024/04/17 02:38:46] [debug] [task] destroy task=0x53b9770 (task_id=0)
^C[2024/04/17 02:38:49] [engine] caught signal (SIGINT)
[2024/04/17 02:38:49] [ warn] [engine] service will shutdown in max 5 seconds
[2024/04/17 02:38:49] [ info] [input] pausing tail.0
[2024/04/17 02:38:49] [ info] [input] pausing emitter.1
[2024/04/17 02:38:50] [ info] [engine] service has stopped (0 pending tasks)
[2024/04/17 02:38:50] [ info] [input] pausing tail.0
[2024/04/17 02:38:50] [ info] [input] pausing emitter.1
[2024/04/17 02:38:50] [debug] [input:tail:tail.0] inode=199246339 removing file name /tmp/foo
[2024/04/17 02:38:50] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=199246339 watch_fd=1
==16481==
==16481== HEAP SUMMARY:
==16481==     in use at exit: 0 bytes in 0 blocks
==16481==   total heap usage: 2,454 allocs, 2,454 frees, 2,265,918 bytes allocated
==16481==
==16481== All heap blocks were freed -- no leaks are possible
==16481==
==16481== For lists of detected and suppressed errors, rerun with: -s
==16481== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Before appending to /tmp/foo, :2021/metrics returns an empty 200 OK. After running echo foo >>/tmp/foo, it returns:

< HTTP/1.1 200 OK
< Server: Monkey/1.7.2
< Date: Wed, 17 Apr 2024 02:35:58 GMT
< Transfer-Encoding: chunked
< Content-Type: text/plain; version=0.0.4
<
# HELP log_metric_counter_foos foo count
# TYPE log_metric_counter_foos counter
log_metric_counter_foos 1
* Connection #0 to host localhost left intact
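
For checking the two responses described above programmatically, a small parser for the text exposition format could look like the sketch below. It is deliberately minimal (no labels, timestamps or escaping), and parse_samples is a hypothetical helper, not part of Fluent Bit or any Prometheus client library. The point it illustrates is that an empty 200 body is simply a scrape with zero samples.

```python
from typing import Dict


def parse_samples(body: str) -> Dict[str, float]:
    """Parse label-free samples from a Prometheus text exposition body.

    Comment lines (# HELP, # TYPE) and blank lines are skipped.
    Simplification: labels, timestamps and escaping are not handled.
    """
    samples: Dict[str, float] = {}
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # "<metric name> <value>", separated by whitespace
        name, value = line.split(None, 1)
        samples[name] = float(value.split()[0])
    return samples


if __name__ == "__main__":
    # Body copied from the curl output above.
    after_append = (
        "# HELP log_metric_counter_foos foo count\n"
        "# TYPE log_metric_counter_foos counter\n"
        "log_metric_counter_foos 1\n"
    )
    print(parse_samples(""))            # empty body: no samples
    print(parse_samples(after_append))  # one counter sample
```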

@ghost
Author

ghost commented Apr 17, 2024

> I understand, thanks for the explanation. Please let me know when this PR is ready for review.

@edsiper I think it's ready now.

@github-actions github-actions bot removed the Stale label Apr 19, 2024
@vchirikov

404 on empty metrics isn't ok :(
please check/merge the PR
cc: @edsiper

Contributor

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the Stale label Jul 26, 2024
@vchirikov

not stale

@github-actions github-actions bot removed the Stale label Jul 28, 2024
@ghost ghost closed this by deleting the head repository Nov 29, 2024
This pull request was closed.
3 participants