Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support config files #169

Open
wants to merge 17 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 11 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 11 additions & 5 deletions nbstripout/_nbstripout.py
Original file line number Diff line number Diff line change
Expand Up @@ -109,7 +109,7 @@
*.ipynb diff=ipynb
"""

from argparse import ArgumentParser, RawDescriptionHelpFormatter
from argparse import ArgumentParser, Namespace, RawDescriptionHelpFormatter
import collections
import io
import json
Expand All @@ -120,7 +120,7 @@
import sys
import warnings

from nbstripout._utils import strip_output, strip_zeppelin_output
from nbstripout._utils import strip_output, strip_zeppelin_output, merge_configuration_file
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
from nbstripout._utils import strip_output, strip_zeppelin_output, merge_configuration_file
from nbstripout._utils import strip_output, strip_zeppelin_output, merge_configuration_files

try:
# Jupyter >= 4
from nbformat import read, write, NO_CONVERT
Expand Down Expand Up @@ -351,7 +351,7 @@ def status(git_config, install_location=INSTALL_LOCATION_LOCAL, verbose=False):
return 1


def main():
def setup_commandline() -> Namespace:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is no longer returning a Namespace but a parser, right?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably be called get_parser or sth like that.

parser = ArgumentParser(epilog=__doc__, formatter_class=RawDescriptionHelpFormatter)
task = parser.add_mutually_exclusive_group()
task.add_argument('--dry-run', action='store_true',
Expand All @@ -377,7 +377,7 @@ def main():
help='Space separated list of extra keys to strip '
'from metadata, e.g. metadata.foo cell.metadata.bar')
parser.add_argument('--drop-empty-cells', action='store_true',
help='Remove cells where `source` is empty or contains only whitepace')
help='Remove cells where `source` is empty or contains only whitespace')
parser.add_argument('--drop-tagged-cells', default='',
help='Space separated list of cell-tags that remove an entire cell')
parser.add_argument('--strip-init-cells', action='store_true',
Expand All @@ -402,7 +402,13 @@ def main():
help='Prints stripped files to STDOUT')

parser.add_argument('files', nargs='*', help='Files to strip output from')
args = parser.parse_args()

return parser


def main():
parser = setup_commandline()
args = merge_configuration_file(parser)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
args = merge_configuration_file(parser)
args = merge_configuration_files(parser)


git_config = ['git', 'config']

Expand Down
109 changes: 108 additions & 1 deletion nbstripout/_utils.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,9 @@
from collections import defaultdict
from argparse import ArgumentParser, Namespace, _StoreTrueAction, _StoreFalseAction
import os
import sys
from collections import defaultdict
from functools import partial
from typing import Any, Dict, Optional

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep imports sorted

__all__ = ["pop_recursive", "strip_output", "strip_zeppelin_output", "MetadataError"]

Expand Down Expand Up @@ -153,3 +157,106 @@ def strip_output(nb, keep_output, keep_count, extra_keys=[], drop_empty_cells=Fa
for field in keys['cell']:
pop_recursive(cell, field)
return nb


def process_pyproject_toml(toml_file_path: str, parser: ArgumentParser) -> Optional[Dict[str, Any]]:
"""Extract config mapping from pyproject.toml file."""
try:
import tomllib # python 3.11+
except ModuleNotFoundError:
import tomli as tomllib

with open(toml_file_path, 'rb') as f:
dict_config = tomllib.load(f).get('tool', {}).get('nbstripout', None)

if not dict_config:
# could be {} if 'tool' not found, or None if 'nbstripout' not found
return dict_config

# special processing of boolean options, make sure we don't have invalid types
for a in parser._actions:
if a.dest in dict_config and isinstance(a, (_StoreTrueAction, _StoreFalseAction)):
Comment on lines +178 to +179
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big fan of reaching into the internals of ArgumentParser but also don't have a great idea what to do instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I thought that this would still be more robust than keeping an explicit list of Boolean arguments, although that wouldn't be impossible. I looked at a range of options, including shifting to an alternative arguments library, like Click, but that all got too complicated.
I think that this is a reasonable compromise if we keep on top of adding new Python versions to the test suite whenever they become available.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, agree

if not isinstance(dict_config[a.dest], bool):
raise ValueError(f'Argument {a.dest} in pyproject.toml must be a boolean, not {dict_config[a.dest]}')

return dict_config


def process_setup_cfg(cfg_file_path, parser: ArgumentParser) -> Optional[Dict[str, Any]]:
"""Extract config mapping from setup.cfg file."""
import configparser

reader = configparser.ConfigParser()
reader.read(cfg_file_path)
if not reader.has_section('nbstripout'):
return None

raw_config = reader['nbstripout']
dict_config = dict(raw_config)

# special processing of boolean options, to convert various configparser bool types to true/false
for a in parser._actions:
if a.dest in raw_config and isinstance(a, (_StoreTrueAction, _StoreFalseAction)):
Comment on lines +199 to +200
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

dict_config[a.dest] = raw_config.getboolean(a.dest)

return dict_config


def merge_configuration_file(parser: ArgumentParser, args_str=None) -> Namespace:
janosh marked this conversation as resolved.
Show resolved Hide resolved
"""Merge flags from config files into args."""
CONFIG_FILES = {
'pyproject.toml': partial(process_pyproject_toml, parser=parser),
'setup.cfg': partial(process_setup_cfg, parser=parser),
}

# parse args as-is to look for configuration files
args = parser.parse_args(args_str)

# Traverse the file tree common to all files given as argument looking for
# a configuration file
# TODO: make this more like Black:
# By default Black looks for pyproject.toml starting from the common base directory of all files and
# directories passed on the command line. If it’s not there, it looks in parent directories. It stops looking
# when it finds the file, or a .git directory, or a .hg directory, or the root of the file system, whichever
# comes first.
# if no files are given, start from cwd
config_path = os.path.commonpath([os.path.abspath(file) for file in args.files]) if args.files else os.path.abspath(os.getcwd())
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line gets a bit long with a ternary

config: Optional[Dict[str, Any]] = None
while True:
for config_file, processor in CONFIG_FILES.items():
config_file_path = os.path.join(config_path, config_file)
if os.path.isfile(config_file_path):
config = processor(config_file_path)
if config is not None:
break
if config is not None:
break
config_path, tail = os.path.split(config_path)
if not tail:
break

# black starts with default arguments (from click), updates that with the config file,
# then applies command line arguments. this all happens in the click framework, before main() is called
# we can use parser.set_defaults
if config:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't picked up on this in my first round review: The way you implemented this, config file settings take precedence over command line arguments, however it should be the other way around.

I realise this will be a bit harder to implement. You could turn the Namespace returned by argparse into a dict and merge that into the dict you get from reading the config files, where the argparse dict overwrites.

What makes this even more annoying is that you don't know if a value has been set by the caller or is simply the default :( I may need to look around for an idiomatic way to implement this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about the delay. Have to prioritize something else atm. Hope to come back to this at some point but if anyone sees fit, feel free to take over.

# check all arguments are valid
valid_args = vars(args).keys()
for name in config.keys():
if name not in valid_args:
raise ValueError(f'{name} in the config file is not a valid option')

# separate into default-overrides and special treatment
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a brief comment why extra_keys needs special treatment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure that it does. I left this with the same functionality that @janosh did.
This special treatment lets you specify a base set of extra keys in the config files, and then add to that in an ad-hoc manner, but does not let you remove extra keys....
Alternative treatment would be that whenever you specify extra keys you would have to list the complete set.
The more I think about it, the more I think that perhaps special treatment is NOT justified.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that's a fair point and treating it specially may lead to surprising results.

@janosh can you explain your rationale? Do you agree that special treatment of this flag may be surprising?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember for sure at this point, but I think it might be left over from the original autoflake8 config file PR. I agree minimal user surprise is usually the way to go.

extra_keys: Optional[str] = None
if 'extra_keys' in config:
extra_keys = config['extra_keys']
del config['extra_keys']

# merge the configuration options as new defaults, and re-parse the arguments
parser.set_defaults(**config)
args = parser.parse_args(args_str)

# merge extra_keys using set union
if extra_keys:
args.extra_keys = ' '.join(sorted(set(extra_keys.split()) | set(args.extra_keys.split())))

return args
1 change: 1 addition & 0 deletions tests/conftest.py
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
pytest_plugins = "pytester"
Loading