- The Envoy source code is formatted using clang-format. Thus all white spaces, etc.
issues are taken care of automatically. The Azure Pipelines will automatically check
the code format and fail. There are make targets that can both check the format
(check_format) as well as fix the code format for you (fix_format). Errors in
.clang-tidy are enforced while other warnings are suggestions. Note that code and
comment blocks designated
clang-format off
must be closed withclang-format on
. To run these checks locally, see Support Tools. - Beyond code formatting, for the most part Envoy uses the Google C++ style guidelines. The following section covers the major areas where we deviate from the Google guidelines.
- Please see REPO_LAYOUT.md.
- If you are modifying the data plane structually, please keep the Life of a Request documentation up-to-date.
- Exceptions are allowed and encouraged where appropriate. When using exceptions, do not add additional error handling that cannot possibly happen in the case an exception is thrown.
- Do use exceptions for:
- Configuration ingestion error handling. Invalid configurations (dynamic and
static) should throw meaningful
EnvoyException
s, the configuration ingestion code will catch these. - Constructor failure.
- Error handling in deep call stacks, where exceptions provide material improvements to code complexity and readability.
- Configuration ingestion error handling. Invalid configurations (dynamic and
static) should throw meaningful
- Apply caution when using exceptions on the data path for general purpose error handling. Exceptions are not caught on the data path and they should not be used for simple error handling, e.g. with shallow call stacks, where explicit error handling provides a more readable and easier to reason about implementation.
- References are always preferred over pointers when the reference cannot be null. This includes both const and non-const references.
- Function names should all use camel case starting with a lower case letter (e.g.,
doFoo()
). - Struct/Class member variables have a
_
postfix (e.g.,int foo_;
). - Enum values using PascalCase (e.g.,
RoundRobin
). - 100 columns is the line limit.
- Use your GitHub name in TODO comments, e.g.
TODO(foobar): blah
. - Smart pointers are type aliased:
using FooPtr = std::unique_ptr<Foo>;
using BarSharedPtr = std::shared_ptr<Bar>;
using BlahConstSharedPtr = std::shared_ptr<const Blah>;
- Regular pointers (e.g.
int* foo
) should not be type aliased.
- `absl::optional<std::reference_wrapper> is type aliased:
using FooOptRef = absl::optional<std::reference_wrapper<T>>;
using FooOptConstRef = absl::optional<std::reference_wrapper<const T>>;
- If move semantics are intended, prefer specifying function arguments with
&&
. E.g.,void onHeaders(Http::HeaderMapPtr&& headers, ...)
. The rationale for this is that it forces the caller to specifystd::move(...)
or pass a temporary and makes the intention at the callsite clear. Otherwise, it's difficult to tell if a const reference is actually being passed to the called function. This is true even forstd::unique_ptr
. - Prefer
unique_ptr
overshared_ptr
wherever possible.unique_ptr
makes ownership in production code easier to reason about. Note that this creates some test oddities where production code requires aunique_ptr
but the test must still have access to the memory the production code is using (mock or otherwise). In these cases it is acceptable to allocate raw memory in a test and return it to the production code with the expectation that the production code will hold it in aunique_ptr
and free it. Envoy uses the factory pattern quite a bit for these cases. (Search the code for "factory"). - The Google C++ style guide points out that non-PoD static and global variables are forbidden.
This includes types such as
std::string
. We encourage the use of the advice in the C++ FAQ on the static initialization fiasco for how to best handle this. - The Google C++ style guide points out that constant vars should be named
kConstantVar
. In the Envoy codebase we useConstantVar
orCONSTANT_VAR
. If you pickCONSTANT_VAR
, please be certain the name is globally significant to avoid potential conflicts with #defines, which are not namespace-scoped, and may appear in externally controlled header files. - API-level comments should follow normal Doxygen conventions. Use
@param
to describe parameters and@return <return-type>
for return values. Internal comments for methods and member variables may be regular C++//
comments or Doxygen at developer discretion. Where possible, methods should have meaningful documentation on expected input and state preconditions. - Header guards should use
#pragma once
. - All code should be inside a top-level Envoy namespace. There are some
exceptions such as
main()
functions. When code cannot be placed inside the Envoy namespace there should be a comment of the form// NOLINT(namespace-envoy)
at the top of the file. - If a method that must be defined outside the
test
directory is intended to be called only from test code then it should have a name that ends inForTest()
such asaMethodForTest()
. In most cases tests can and should be structured so this is not necessary. - Tests default to StrictMock so will fail if hitting unexpected warnings. Feel free to use NiceMock for mocks whose behavior is not the focus of a test.
- Thread
annotations,
such as
ABSL_GUARDED_BY
, should be used for shared state guarded by locks/mutexes. - Functions intended to be local to a cc file should be declared in an anonymous namespace, rather than using the 'static' keyword. Note that the Google C++ style guide allows either, but in Envoy we prefer anonymous namespaces.
- Braces are required for all control statements include single line if, while, etc. statements.
- Don't use mangled Protobuf enum names.
A few general notes on our error handling philosophy:
- All error code returns should be checked.
- At a very high level, our philosophy is that errors should be handled gracefully when caused by:
- Untrusted network traffic (from downstream, upstream, or extensions like filters)
- Raised by the Envoy process environment and are likely to happen
- Third party dependency return codes
- Examples of likely environnmental errors include any type of network error, disk IO error, bad data returned by an API call, bad data read from runtime files, etc. This includes loading configuration at runtime.
- Third party dependency return codes should be checked and gracefully handled. Examples include HTTP/2 or JSON parsers. Some return codes may be handled by continuing, for example, in case of an out of process RPC failure.
- Testing should cover any serious cases that may result in infinite loops, crashes, or serious errors. Non-trivial invariants are also encouraged to have testing. Internal, localized invariants may not need testing.
- Errors in the Envoy environment that are unlikely to happen after process initialization, should
lead to process death, under the assumption that the additional burden of defensive coding and
testing is not an effective use of time for an error that should not happen given proper system
setup. Examples of these types of errors include not being able to open the shared memory region,
system calls that should not fail assuming correct parameters (which should be validated via
tests), etc. Examples of system calls that should not fail when passed valid parameters include
the kernel returning a valid
sockaddr
after a successful call toaccept()
,pthread_create()
,pthread_join()
, etc. However, system calls that require permissions may cause likely errors in some deployments and need graceful error handling. - OOM events (both memory and FDs) or ENOMEM errors are considered fatal crashing errors. An OOM
error should never silently be ignored and should crash the process either via the C++ allocation
error exception, an explicit
RELEASE_ASSERT
following a third party library call, or an obvious crash on a subsequent line via null pointer dereference. This rule is again based on the philosophy that the engineering costs of properly handling these cases are not worth it. Time is better spent designing proper system controls that shed load if resource usage becomes too high, etc. - The "less is more" error handling philosophy described in the previous points is primarily based on the fact that restarts are designed to be fast, reliable and cheap.
- Although we strongly recommend that any type of startup error leads to a fatal error, since this
is almost always a result of faulty configuration which should be caught during a canary process,
there may be cases in which we want some classes of startup errors to be non-fatal. For example,
if a misconfigured option is not necessary for server operation. Although this is discouraged, we
will discuss these on a case by case basis during code review (an example of this
is the
--admin-address-path
option). If degraded mode error handling is implemented, we require that there is complete test coverage for the degraded case. Additionally, the user should be aware of the degraded state minimally via an error log of level warn or greater and via the increment of a stat. - If you do need to log a non-fatal warning or error, you can unit-test it with EXPECT_LOG_CONTAINS or EXPECT_NO_LOGS from logging.h. It's generally bad practice to test by depending on log messages unless the actual behavior being validated is logging. It's preferable to export statistics to enable consumption by external monitoring for any behavior that should be externally consumed or to introduce appropriate internal interfaces such as mocks for internal behavior.
- The error handling philosophy described herein is based on the assumption that Envoy is deployed using industry best practices (primarily canary). Major and obvious errors should always be caught in canary. If a low rate error leads to periodic crash cycling when deployed to production, the error rate should allow for rollback without large customer impact.
- Tip: If the thought of adding the extra test coverage, logging, and stats to handle an error and continue seems ridiculous because "this should never happen", it's a very good indication that the appropriate behavior is to terminate the process and not handle the error. When in doubt, please discuss.
-
The following macros are available:
RELEASE_ASSERT
: fatal check.ASSERT
: fatal check in debug-only builds. These should be used to document (and check in debug-only builds) program invariants.ENVOY_BUG
: logs and increments a stat in release mode, fatal check in debug builds. These should be used where it may be useful to detect if an efficient condition is violated in production (and fatal check in debug-only builds).
-
Sub-macros alias the macros above and can be used to annotate specific situations:
ENVOY_BUG_ALPHA
(aliasENVOY_BUG
): Used for alpha or rapidly changing protocols that need detectability on probable conditions or invariants.
-
Per above it's acceptable to turn failures into crash semantics via
RELEASE_ASSERT(condition)
orPANIC(message)
if there is no other sensible behavior, e.g. in OOM (memory/FD) scenarios. -
Do not
ASSERT
on conditions imposed by the external environment. Either add error handling (potentially with anENVOY_BUG
for detectability) orRELEASE_ASSERT
if the condition indicates that the process is unrecoverable. -
Use
ASSERT
andENVOY_BUG
liberally, but do not use them for things that will crash in an obvious way in a subsequent line. E.g., do not doASSERT(foo != nullptr); foo->doSomething();
. -
Use
ASSERT
s for true invariants and well-defined conditions that are useful for tests, debug-only checks and documentation. They may beENVOY_BUG
s if performance allows, see point below. -
ENVOY_BUG
s provide detectability and more confidence than anASSERT
. They are useful for non-trivial conditions, those with complex control flow, and rapidly changing protocols. Testing should be added to ensure that Envoy can continue to operate even if anENVOY_BUG
condition is violated. -
Annotate conditions with comments on belief or reasoning, for example
Condition is guaranteed by caller foo
orCondition is likely to hold after processing through external library foo
. -
Macro usage should be understandable to a reader. Add comments if not. They should be robust to future changes.
-
Note that there is a gray line between external environment failures and program invariant violations. For example, memory corruption due to a security issue (a bug, deliberate buffer overflow etc.) might manifest as a violation of program invariants or as a detectable condition in the external environment (e.g. some library returning a highly unexpected error code or buffer contents). Unfortunately no rule can cleanly cover when to use
RELEASE_ASSERT
vs.ASSERT
. In general we viewASSERT
as the common case andRELEASE_ASSERT
as the uncommon case, but experience and judgment may dictate a particular approach depending on the situation. The risk of process death fromRELEASE_ASSERT
should be justified with the severity and possibility of the condition to avoid unintentional crashes. You may use the following guide:- If a violation is high risk (will cause a crash in subsequent data processing or indicates a
failure state beyond recovery), use
RELEASE_ASSERT
. - If a violation is medium or low risk (Envoy can continue safely) and is not expensive,
consider
ENVOY_BUG
. - Otherwise (if a condition is expensive or test-only), use
ASSERT
.
- If a violation is high risk (will cause a crash in subsequent data processing or indicates a
failure state beyond recovery), use
Below is a guideline for macro usage. The left side of the table has invariants and the right side
has error conditions that can be triggered and should be gracefully handled. ENVOY_BUG
represents
a middle ground that can be used for uncertain conditions that need detectability. ENVOY_BUG
s can
also be added for errors if they warrant detection.
ASSERT /RELEASE_ASSERT |
ENVOY_BUG |
Error handling and Testing |
---|---|---|
Low level invariants in data structures | ||
Simple, provable internal class invariants | Complex, uncertain internal class invariants (e.g. need detectability if violated) | |
Provable (pre/post)-conditions | Complicated but likely (pre-/post-) conditions that are low-risk (Envoy can continue safely) | Triggerable or uncertain conditions, may be based on untrusted data plane traffic or an extensions’ contract. |
Conditions in alpha or changing extensions that need detectability. (ENVOY_BUG_ALPHA ) |
||
Unlikely environment errors after process initialization that would otherwise crash | Likely environment errors, e.g. return codes from untrusted extensions, dependencies or system calls, network error, bad data read, permission based errors, etc. | |
Fatal crashing events. e.g. OOMs, deadlocks, no process recovery possible |
Tests should be hermetic, i.e. have all dependencies explicitly captured and not depend on the local environment. In general, there should be no non-local network access. In addition:
-
Port numbers should not be hardcoded. Tests should bind to port zero and then discover the bound port when needed. This avoids flakes due to conflicting ports and allows tests to be executed concurrently by Bazel. See
test/integration/integration_test.h
andtest/common/network/listener_impl_test.cc
for examples of tests that do this. -
Paths should be constructed using:
- The methods in
TestEnvironment
for C++ tests. - With
${TEST_TMPDIR}
(for writable temporary space) or${TEST_SRCDIR}
for read-only access to test inputs in shell tests. - With
{{ test_tmpdir }}
,{{ test_rundir }}
and{{ test_udsdir }}
respectively for JSON templates.{{ test_udsdir }}
is provided for pathname based Unix Domain Sockets, which must fit within a 108 character limit on Linux, a property that might not hold for{{ test_tmpdir }}
.
- The methods in
Tests should be deterministic. They should not rely on randomness or details
such as the current time. Instead, mocks such as
MockRandomGenerator
and
Mock*TimeSource
should be used.