[AMORO-3335] Add interface ConfigShade to support encryption of sensitive configuration items and provide a base64 encoding implementation #3396

Open: Jzjsnow wants to merge 4 commits into master from add-support-for-config-shade

Conversation

Jzjsnow
Contributor

@Jzjsnow Jzjsnow commented Jan 6, 2025

Why are the changes needed?

When starting the Amoro Management Service (AMS), we have to set configuration items in plaintext in the config file, including sensitive ones such as admin-password and the passwords for connecting to databases (e.g., MySQL, PostgreSQL), which is a security risk. To avoid plaintext passwords, this PR adds an interface, ConfigShade, that developers can implement to customize how sensitive values are decrypted.

We also provide a base64 implementation, both as an example of how to implement the interface and as a first step toward keeping plaintext passwords out of the config file.

Close #3335.

Brief change log

  • Add interface ConfigShade to the amoro-common module to provide the ability to customize decryption of sensitive options in the config file
  • Provide an implementation org.apache.amoro.config.shade.impl.Base64ConfigShade for base64 encoding

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run test locally before making a pull request

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

How to use

Using the base64 implementation as an example, the following shows how to use a configuration file with sensitive items encrypted:

  1. Add two new options shade.identifier and shade.sensitive-keywords to the ams part in config.yaml to specify the encryption algorithm and the encrypted sensitive keywords.
  2. Replace the plaintext of the sensitive items specified in shade.sensitive-keywords with the encrypted ciphertext.
  3. Restart the ams service.

Example config file (partial):

ams:
  admin-username: admin
  admin-password: YWRtaW4=
  server-bind-host: "0.0.0.0"
  server-expose-host: "127.0.0.1"

  shade:
    identifier: base64
    sensitive-keywords: admin-password;database.password

  database:
    type: mysql
    jdbc-driver-class: com.mysql.cj.jdbc.Driver
    url: jdbc:mysql://127.0.0.1:3306/amoro?useUnicode=true&characterEncoding=UTF8&autoReconnect=true&useAffectedRows=true&allowPublicKeyRetrieval=true&useSSL=false
    username: root
    password: cGFzc3dvcmQ=

   ...
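
The encoded values in the example above can be produced with the JDK's own Base64 encoder. The following is a minimal sketch rather than code from this PR: the class name Base64EncodeExample is hypothetical, and it assumes standard (padded) base64 over UTF-8 bytes, which matches the two values shown in the config (YWRtaW4= and cGFzc3dvcmQ=).

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper (not part of the PR): prepares values for config.yaml.
class Base64EncodeExample {

  // Encode a plaintext value with standard, padded base64 over its UTF-8 bytes.
  static String encode(String plaintext) {
    return Base64.getEncoder().encodeToString(plaintext.getBytes(StandardCharsets.UTF_8));
  }

  // Reverse of encode(); useful to double-check a value before pasting it into the config.
  static String decode(String encoded) {
    return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    System.out.println(encode("admin"));        // YWRtaW4=
    System.out.println(encode("password"));     // cGFzc3dvcmQ=
    System.out.println(decode("cGFzc3dvcmQ=")); // password
  }
}
```

The printed strings are what would go into admin-password and database.password when shade.identifier is set to base64.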

How to customize the encryption algorithm

To use a user-defined encryption algorithm, the developer is expected to provide a dependency package that implements the ConfigShade interface:

/**
 * The interface that provides the ability to decrypt {@link
 * org.apache.amoro.config.Configurations}.
 */
public interface ConfigShade {
  /**
   * Initializes the custom instance using the service configurations.
   *
   * This method can be useful when decryption requires an external file (e.g. a key file)
   * defined in the service configs.
   */
  default void initialize(Configurations serviceConfig) throws Exception {}

  /**
   * The unique identifier of this implementation, used to select the correct {@link
   * ConfigShade}.
   */
  String getIdentifier();

  /**
   * Decrypt the content.
   *
   * @param content The content to decrypt
   */
  String decrypt(String content);
}

In this interface, getIdentifier() returns the unique identifier of the algorithm, which is the value to set in shade.identifier, and decrypt(String content) decrypts the given ciphertext.
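
As an illustration of what a custom implementation might look like, here is a minimal sketch. Everything in it is an assumption made for the example: the ConfigShade interface is re-declared locally as a stand-in so the snippet compiles on its own (the real one lives in amoro-common and also has the default initialize method), and ReverseBase64ConfigShade with its "reverse-base64" identifier is a toy scheme, not something this PR ships.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class CustomShadeExample {
  public static void main(String[] args) {
    ConfigShade shade = new ReverseBase64ConfigShade();
    // "bmltZGE=" is base64("nimda"), i.e. "admin" reversed and then encoded.
    System.out.println(shade.getIdentifier() + " -> " + shade.decrypt("bmltZGE="));
  }
}

// Local stand-in for the ConfigShade interface shown above, so this sketch is self-contained.
// initialize(Configurations) is left out because it defaults to a no-op.
interface ConfigShade {
  String getIdentifier();

  String decrypt(String content);
}

// Hypothetical toy scheme: values are stored reversed and then base64-encoded.
class ReverseBase64ConfigShade implements ConfigShade {
  @Override
  public String getIdentifier() {
    // This is the value that would be written to shade.identifier.
    return "reverse-base64";
  }

  @Override
  public String decrypt(String content) {
    String reversed = new String(Base64.getDecoder().decode(content), StandardCharsets.UTF_8);
    return new StringBuilder(reversed).reverse().toString();
  }
}
```

Because the implementations are discovered through a ServiceLoader (see the review thread below), the dependency package would presumably also need a provider-configuration file named META-INF/services/org.apache.amoro.config.shade.ConfigShade that lists the implementing class; that file name is inferred from the interface's package, not stated in the PR description.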

Member

@klion26 klion26 left a comment

@Jzjsnow thanks for the contribution, I left some comments, please have a look when you're free.

BiFunction<String, Object, String> processFunction =
    (key, value) -> configShade.decrypt(value.toString());

Preconditions.checkArgument(
Member

Why do we need to check this here?

Contributor Author

We can skip the empty-map check here, since the method already returns early when the ConfigShade is recognized as the default one.


@VisibleForTesting
public static String decryptOption(String identifier, String content) {
  ConfigShade configShade = CONFIG_SHADES.getOrDefault(identifier, DEFAULT_SHADE);
Member

Do we need to add a log here if there is no ConfigShade with the given identifier?

Contributor Author

Yes, logging is necessary. The method decryptConfig is the one called in normal use; if the given identifier cannot be loaded, it logs an ERROR and throws an IllegalStateException. The method decryptOption here is only used for unit testing.
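
The difference between the two lookups described in this reply can be sketched as follows. This is an illustration under stated assumptions, not the PR's ConfigShadeUtils: the registry maps identifiers to plain functions instead of ConfigShade instances, the method decryptStrict is a hypothetical stand-in for the lookup inside decryptConfig, System.err stands in for the ERROR log, and the default shade is assumed to pass values through unchanged.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

class ShadeLookupExample {
  // Hypothetical registry: identifier -> decrypt function.
  static final Map<String, UnaryOperator<String>> SHADES = new HashMap<>();
  // Assumption: the default shade returns the content unchanged.
  static final UnaryOperator<String> DEFAULT_SHADE = UnaryOperator.identity();

  static {
    SHADES.put("base64", s -> new String(Base64.getDecoder().decode(s), StandardCharsets.UTF_8));
  }

  // Lenient lookup, mirroring the getOrDefault call in decryptOption (test-only in the PR).
  static String decryptOption(String identifier, String content) {
    return SHADES.getOrDefault(identifier, DEFAULT_SHADE).apply(content);
  }

  // Strict lookup, as described for decryptConfig: an unknown identifier is an error.
  static String decryptStrict(String identifier, String content) {
    UnaryOperator<String> shade = SHADES.get(identifier);
    if (shade == null) {
      System.err.println("ERROR: no ConfigShade found for identifier " + identifier);
      throw new IllegalStateException("No ConfigShade found for identifier: " + identifier);
    }
    return shade.apply(content);
  }

  public static void main(String[] args) {
    System.out.println(decryptOption("base64", "YWRtaW4="));  // admin
    System.out.println(decryptOption("unknown", "YWRtaW4=")); // YWRtaW4= (passed through)
    try {
      decryptStrict("unknown", "YWRtaW4=");
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Failing fast in the normal path means a misspelled shade.identifier is reported at startup instead of being silently treated as "no decryption".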

public void testDecryptOptions() {
  String encryptUsername = "YWRtaW4=";
  String encryptPassword = "cGFzc3dvcmQ=";
  String decryptUsername = ConfigShadeUtils.decryptOption("base64", encryptUsername);
Member

Do we need to change the identifier to Base64ConfigShade.getIdentifier() here?

Contributor Author

I have changed the identifier string to a static final string with the value of Base64ConfigShade.getIdentifier().


@Test
public void testDecryptOptions() {
  String encryptUsername = "YWRtaW4=";
Member

We could add a comment here to show that these strings have been encoded with base64.

Contributor Author

@Jzjsnow Jzjsnow Jan 8, 2025

In the new commit, TestConfigShade#getBase64EncodedText is provided to show the base64 encoding process and can be used to encode plaintext for testing.

ConfigOptions.key("shade.sensitive-keywords")
    .stringType()
    .asList()
    .defaultValues("admin-password", "database.password")
Member

It seems the key here is only located in the ams part of the configuration; I'm not sure whether this needs to apply to the whole config.

Contributor Author

Currently, the sensitive keywords are all in the ams part, so for now I put the shade-related configurations under the ams part and only decrypt the ams configurations. If the containers also contain sensitive information in the future, I think it may be necessary to move the shade-related configurations into a separate part, alongside ams and containers.
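
To make the scoping concrete, the sketch below decrypts only the keys listed as sensitive and leaves every other entry untouched. It is an illustration, not the PR's code: it assumes the ams section has been flattened into dotted keys (as the keyword database.password suggests), and the class and method names are hypothetical. Base64 decoding stands in for configShade.decrypt.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class SensitiveKeyExample {

  // Stand-in for configShade.decrypt(...), using base64 as in the example config.
  static String decrypt(String content) {
    return new String(Base64.getDecoder().decode(content), StandardCharsets.UTF_8);
  }

  // Returns a copy of the (assumed flattened) ams config with only the sensitive keys decrypted.
  static Map<String, Object> decryptSensitive(
      Map<String, Object> amsConfig, Set<String> sensitiveKeys) {
    Map<String, Object> result = new LinkedHashMap<>(amsConfig);
    for (String key : sensitiveKeys) {
      // Keys that are listed but absent from the config are simply skipped.
      result.computeIfPresent(key, (k, value) -> decrypt(value.toString()));
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Object> ams = new LinkedHashMap<>();
    ams.put("admin-username", "admin");
    ams.put("admin-password", "YWRtaW4=");
    ams.put("database.password", "cGFzc3dvcmQ=");

    Set<String> sensitive =
        new HashSet<>(Arrays.asList("admin-password", "database.password"));
    System.out.println(decryptSensitive(ams, sensitive));
  }
}
```

Non-sensitive entries such as admin-username must not be passed through the decoder, which is why the keyword list drives the loop rather than the config map.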

Iterator<ConfigShade> it = serviceLoader.iterator();
it.forEachRemaining(
    configShade -> {
      CONFIG_SHADES.put(configShade.getIdentifier(), configShade);
Member

Do we need to log the relationship between the identifier and the configShade here? I assume the number of shades loaded here would not be large.

Contributor Author

I added a log that shows the identifier of each ConfigShade implementation together with the fully qualified name of its class.

@klion26 klion26 self-requested a review January 9, 2025 03:57

codecov-commenter commented Jan 9, 2025

Codecov Report

Attention: Patch coverage is 0% with 59 lines in your changes missing coverage. Please review.

Project coverage is 21.59%. Comparing base (243d289) to head (67c9d3e).
Report is 26 commits behind head on master.

Files with missing lines                                 Patch %   Lines
...che/amoro/config/shade/utils/ConfigShadeUtils.java    0.00%     54 Missing ⚠️
...che/amoro/config/shade/impl/Base64ConfigShade.java    0.00%     4 Missing ⚠️
...ava/org/apache/amoro/config/shade/ConfigShade.java    0.00%     1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #3396      +/-   ##
============================================
- Coverage     21.59%   21.59%   -0.01%     
- Complexity     2309     2313       +4     
============================================
  Files           426      429       +3     
  Lines         39719    39791      +72     
  Branches       5624     5632       +8     
============================================
+ Hits           8577     8591      +14     
- Misses        30414    30473      +59     
+ Partials        728      727       -1     
Flag Coverage Δ
trino 21.59% <0.00%> (-0.01%) ⬇️

Contributor

@czy006 czy006 left a comment

Thank you for your contribution. How should we handle this logic in the Helm Chart section?

@Jzjsnow Jzjsnow force-pushed the add-support-for-config-shade branch from 4210df3 to b6b9437 Compare January 17, 2025 10:23
Member

@klion26 klion26 left a comment

@Jzjsnow Thanks for the contribution, LGTM. Could you please file an issue to add the documentation, so that users know how to use this feature? Thanks.

@czy006 Could you please help take another look at the k8s part?

@klion26 klion26 requested a review from czy006 January 22, 2025 09:31
Contributor Author

Jzjsnow commented Jan 22, 2025

@czy006 Thanks for the heads-up. I've changed the script in the Helm charts in the latest commit and verified it locally:
Similar to the usage in config.yaml, users need to fill in the options amoroConf.shade.identifier and amoroConf.shade.sensitiveKeywords in charts/amoro/values.yaml, and replace the plaintext of the sensitive items specified in sensitive-keywords with the encrypted ciphertext. They can then deploy it via the helm install command, as described in the official documentation.
If users want to use their own encryption algorithm, they need to modify the amoro image before the above steps, because the decryption dependency package has to be added to the image first.

jzjsnow added 4 commits January 22, 2025 18:01
…tive configuration items and provide a base64 encoding implementation
…f sensitive configuration items and provide a base64 encoding implementation
…f sensitive configuration items and provide a base64 encoding implementation
…f sensitive configuration items and provide a base64 encoding implementation
@Jzjsnow Jzjsnow force-pushed the add-support-for-config-shade branch from 0947524 to 5b1d475 Compare January 22, 2025 10:01
Development

Successfully merging this pull request may close these issues.

[Improvement]: Add support for using encrypted passwords in configurations
4 participants