All notable changes to this project will be documented in this file. This project adheres to Semantic Versioning.
- `_stash_dataframe_as_csv` in `civis/ml/_model.py` now uses a `StringIO` object which has the `getvalue` method (required by `pandas` v0.23.1 if a file-like object is passed into `df.to_csv`). (#259)
- Added instructions in the README for adding an API key to a Windows 10 environment
- Configured Windows CI using AppVeyor. (#258)
- Coalesced `README.rst` and `index.rst`. (#254)
- Added more robust table name parsing in `civis.io`. You may now pass in tables like `schema."tablename.with.periods"`.
- Added missing documentation for `civis_file_to_table`
- Include JSON files with pip distributions (#244)
- Added flush to `civis_to_file` when passed a user-created buffer, ensuring the buffer contains the entire file when subsequently read.
- Fix several tests in the `test_io` module (#248)
- Travis tests for Python 3.4 are now restricted to pandas<=0.20, the last version which supported Python 3.4 (#249)
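
The flush entry above can be illustrated with the standard library alone. This is a minimal sketch, not the library's code: `write_chunks` is a hypothetical stand-in for a download loop, and the buffered writer plays the role of a user-created buffer.

```python
import io


def write_chunks(buffer, chunks):
    """Write byte chunks into a caller-supplied buffer, then flush it.

    Hypothetical helper for illustration only.
    """
    for chunk in chunks:
        buffer.write(chunk)
    # Without this call, small writes may still sit in the writer's
    # internal buffer instead of the underlying stream.
    buffer.flush()


raw = io.BytesIO()
buffered = io.BufferedWriter(raw)
write_chunks(buffered, [b"id,value\n", b"1,a\n"])
print(raw.getvalue())
```

After the flush, reading from `raw` sees every byte that was written through `buffered`.
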
- Added a utility function which can robustly split a Redshift schema name and table name which are presented as a single string joined by a "." (#225)
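
As an illustration of the kind of splitting this entry describes, the sketch below uses the standard library's `csv` reader so that double-quoted segments are treated as atomic. The name `split_qualified_name` is hypothetical and is not the utility added in #225; the sketch only assumes names of the form `schema.table` with optional double quotes.

```python
import csv


def split_qualified_name(name):
    """Split 'schema.table' into (schema, table).

    Hypothetical helper for illustration only. Double-quoted parts may
    contain periods; a name without a schema returns (None, table).
    """
    if not name:
        raise ValueError("empty table name")
    # '.' is the delimiter and '"' the quote character, so periods
    # inside double quotes are not split on.
    parts = next(csv.reader([name], delimiter=".", quotechar='"'))
    if len(parts) == 1:
        return None, parts[0]
    if len(parts) == 2:
        return parts[0], parts[1]
    raise ValueError("cannot parse table name: %r" % (name,))


print(split_qualified_name('schema."tablename.with.periods"'))
```
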
- Added docstrings for `civis.find` and `civis.find_one`. (#224)
- Executors in `futures` (and the joblib backend, which uses them) will now add "CIVIS_PARENT_JOB_ID" and "CIVIS_PARENT_RUN_ID" environment variables to the child jobs they create (#236)
- Update default CivisML version to v2.2. This includes a new function `ModelPipeline.register_pretrained_model` which allows users to train a model outside of Civis Platform and use CivisML to score it at scale (#242, #247).
- Added a new parameter `dvs_to_predict` to `civis.ml.ModelPipeline.predict`. This allows users to select a subset of a model's outputs for scoring (#241).
- Added `civis.io.export_to_civis_file` to store results of a SQL query to a Civis file
- Surfaced `civis.find` and `civis.find_one` in the Sphinx docs. (#250)
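
The parent job and run variables mentioned above (#236) could be derived along the following lines. This is a sketch under assumptions: `child_job_environment` is a hypothetical helper, and it assumes the parent's own IDs are available as `CIVIS_JOB_ID` and `CIVIS_RUN_ID`, which the changelog does not state.

```python
import os


def child_job_environment(parent_env=None):
    """Return the extra environment variables for a child job.

    Hypothetical helper. Assumes the parent's IDs live in CIVIS_JOB_ID
    and CIVIS_RUN_ID; values that are missing are skipped.
    """
    env = os.environ if parent_env is None else parent_env
    child = {}
    if env.get("CIVIS_JOB_ID"):
        child["CIVIS_PARENT_JOB_ID"] = env["CIVIS_JOB_ID"]
    if env.get("CIVIS_RUN_ID"):
        child["CIVIS_PARENT_RUN_ID"] = env["CIVIS_RUN_ID"]
    return child


print(child_job_environment({"CIVIS_JOB_ID": "123", "CIVIS_RUN_ID": "456"}))
```
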
- Moved "Optional Dependencies" doc section to top of ML docs, and added clarifications for pre-defined models with non-sklearn estimators (#238)
- Switched to pip install-ing dependencies for building the documentation (#230)
- Added a merge rule for the changelog to .gitattributes (#229)
- Default to "all" API resources rather than "base".
- Updated documentation on algorithm hyperparameters to reflect changes with CivisML v2.2 release (#240)
- Added a script for integration tests (smoke tests).
- Added missing string formatting to a log emit in file multipart upload and correct ordering of parameters in another log emit (#217)
- Updated documentation with new information about predefined stacking estimators (#221)
- Updated CivisML 2.0 notebook (#214)
- Reworded output of `civis notebooks new` CLI command (#215)
- Documentation updated to reflect CivisML 2.1 features (#209)
- `civis.io.dataframe_to_civis`, `civis.io.csv_to_civis`, and `civis.io.civis_file_to_table` functions now support the `diststyle` parameter.
- New notebook-related CLI commands: "new", "up", "down", and "open".
- Additional documentation for using the Civis joblib backend (#199)
- Documented additional soft dependencies for CivisML (#203)
- Changed `ModelPipeline.train` default for `n_jobs` from 4 to `None`, so that `n_jobs` will be dynamically calculated by default (#203)
- Use "feather"-formatted files to send data from users to CivisML, if possible. Require this when using `pd.Categorical` types, since CSVs require us to re-infer column types, and this can fail. Using feather should also give a speed improvement; it reads and writes faster than CSVs and produces smaller files (#200).
- `ModelFuture` objects will emit any warnings which occurred during their corresponding CivisML job (#204)
- Removed line setting "n_jobs" from an example of CivisML prediction. Recommended use is to let CivisML determine the number of jobs itself (#211).
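
The feather entry above rests on the fact that CSV carries no type information. The sketch below shows that with the standard library's `csv` module only (it does not use pandas or feather, and the sample row is made up): every value comes back as a string and the reader has to guess the original types.

```python
import csv
import io

# A made-up row with an integer, a boolean, and a zero-padded code.
rows = [{"id": 1, "flag": True, "code": "007"}]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "flag", "code"])
writer.writeheader()
writer.writerows(rows)

buf.seek(0)
round_tripped = [dict(row) for row in csv.DictReader(buf)]

# All values are now strings; "007" can no longer be told apart from
# a number unless the reader re-infers the column types.
print(round_tripped[0])
```

A typed binary format avoids the guessing step, which is the motivation the entry gives for preferring feather.
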
- Update maximum CivisML version to v2.1; adjust fallback logic such that users get the most recent available release (#212).
- Restored the pre-v1.7.0 default behavior of the `joblib` backend by setting the `remote_backend` parameter default to 'sequential' as opposed to 'civis'. The default of 'civis' would launch additional containers in nested calls to `joblib.Parallel`. (#205)
- If validation metadata are missing, `ModelFuture` objects will return `None` for metrics or validation metadata, rather than issuing an exception (#208)
- Allowed callers to pass `index` and `encoding` arguments to the `to_csv` method through `dataframe_to_civis`.
- `civis.io.file_to_civis` now uses additional file handles for multipart upload instead of writing to disk to reduce disk usage
- `civis.io.dataframe_to_civis` writes dataframes to disk instead of using an in memory buffer
- Relaxed requirement on `cloudpickle` version number (#187)
- Restore previous behavior of `civis.io.civis_to_csv` when using "compression='gzip'" (#195)
- Specify escape character in `civis.io.read_civis_sql` when performing parallel unload
- Issue uploading files in `civis.io.file_to_civis`
- Revert performance enhancement that will change format of file produced by `civis.io.civis_to_csv`
- Updated CivisML template ids to v2.0 (#139)
- Optional arguments to API endpoints now display in function signatures. Function signatures show a default value of "DEFAULT"; arguments will still only be transmitted to the Civis Platform API when explicitly provided. (#140)
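
The sentinel pattern this entry describes can be sketched as follows. The names `_Default`, `DEFAULT`, and `list_widgets` are illustrative and are not the library's generated code; the point is only that an argument left at the sentinel is dropped, while an explicit `None` is still sent.

```python
class _Default(object):
    """Sentinel meaning 'argument not provided'."""

    def __repr__(self):
        # This is what shows up as the default in function signatures.
        return "DEFAULT"


DEFAULT = _Default()


def list_widgets(limit=DEFAULT, order=DEFAULT):
    """Hypothetical endpoint wrapper: return the parameters it would send."""
    candidates = {"limit": limit, "order": order}
    # Only explicitly provided arguments are transmitted.
    return {k: v for k, v in candidates.items() if v is not DEFAULT}


print(list_widgets())
print(list_widgets(limit=None))
```

Using a dedicated sentinel rather than `None` keeps `None` available as a real value the caller may want to transmit.
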
- `APIClient.feature_flags` has been deprecated to avoid a name collision with the feature_flags endpoint. In v2.0.0, `APIClient.featureflags` will be renamed to `APIClient.feature_flags`.
- The following APIClient attributes have been deprecated in favor of the attribute that includes underscores:
  - `APIClient.bocceclusters` -> `APIClient.bocce_clusters`
  - `APIClient.matchtargets` -> `APIClient.match_targets`
  - `APIClient.remotehosts` -> `APIClient.remote_hosts`
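
One common way to keep old attribute names working while steering callers to the new ones is a `__getattr__` fallback that warns. The sketch below is an assumption about how such aliasing could be done, using a hypothetical `Client` class rather than the real `APIClient`.

```python
import warnings


class Client(object):
    """Hypothetical client; it only illustrates the aliasing pattern."""

    _deprecated_aliases = {
        "bocceclusters": "bocce_clusters",
        "matchtargets": "match_targets",
        "remotehosts": "remote_hosts",
    }

    def __init__(self):
        self.bocce_clusters = "bocce_clusters endpoint"
        self.match_targets = "match_targets endpoint"
        self.remote_hosts = "remote_hosts endpoint"

    def __getattr__(self, name):
        # Called only when normal attribute lookup has failed.
        new_name = self._deprecated_aliases.get(name)
        if new_name is None:
            raise AttributeError(name)
        warnings.warn(
            "%s is deprecated; use %s instead" % (name, new_name),
            FutureWarning, stacklevel=2)
        return getattr(self, new_name)


client = Client()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_value = client.matchtargets
print(old_value, len(caught))
```
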
- `civis.io.csv_to_civis` and `civis.io.dataframe_to_civis` functions now use `civis.io.file_to_civis` and `civis.io.civis_file_to_table` functions instead of separate logic
- `civis.io.file_to_civis`, `civis.io.csv_to_civis` and `civis.io.dataframe_to_civis` now support files over 5GB
- Refactor internals of `CivisFuture` and `PollableResult` to centralize handling of threads and `pubnub` subscription.
- Updated API specification and base resources to include all general availability endpoints.
- Changed `civis.io.file_to_civis` and `civis.io.civis_to_file` to allow strings for paths to local files in addition to just file/buffer objects.
- Fixed parsing of multiword endpoints. Parsing no longer removes underscores in endpoint names.
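
A name-normalization step consistent with this entry might look like the sketch below. `endpoint_to_attribute` is a hypothetical helper, not the library's parser: it keeps underscores as they are and, as a further assumption, maps hyphens to underscores so the result is a valid Python attribute name.

```python
def endpoint_to_attribute(name):
    """Turn an endpoint name into a Python attribute name.

    Hypothetical helper: underscores are preserved and hyphens are
    replaced with underscores.
    """
    attr = name.replace("-", "_")
    if not attr.isidentifier():
        raise ValueError("not a valid attribute name: %r" % (name,))
    return attr


print(endpoint_to_attribute("match_targets"))
print(endpoint_to_attribute("remote-hosts"))
```
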
- In `civis.futures.ContainerFuture`, return `False` when users attempt to cancel an already-completed job. Previously, the object would sometimes give a `CivisAPIError` with a 404 status code. This fix affects the executors and joblib backend, which use the `ContainerFuture`.
- Tell `flake8` to ignore a broad except in a `CivisFuture` callback.
- Close open sockets (in both the `APIClient` and `CivisFuture`) when they're no longer needed, so as to not use more system file handles than necessary (#173).
- Correct treatment of `FileNotFoundError` in Python 2 (#176).
- Fixed parsing of endpoints containing hyphens. Hyphens are replaced with underscores.
- Use `civis.compat.TemporaryDirectory` in `civis.io.file_to_civis` to be compatible with Python 2.7
- Catch notifications sent up to 30 seconds before the `CivisFuture` connects. Fixes a bug where we would sometimes miss an immediate error on SQL scripts (#174).
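
The `ContainerFuture` cancellation fix above matches the contract of the standard library's `concurrent.futures.Future`, where `cancel()` returns `False` for work that has already finished. The sketch below demonstrates that contract with a plain `Future`; it does not use `ContainerFuture` itself.

```python
from concurrent.futures import Future

finished = Future()
finished.set_result("done")
# A finished future cannot be cancelled; cancel() reports this with
# False rather than raising.
finished_cancelled = finished.cancel()

pending = Future()
# A future that has not started running can still be cancelled.
pending_cancelled = pending.cancel()

print(finished_cancelled, pending_cancelled)  # False True
```
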
- Documentation updated to include new CivisML features (#137).
- `civis.resources.cache_api_spec` function to make it easier to record the current API spec locally (#141).
- Autospecced mock of the `APIClient` for use in testing third-party code which uses this library (#141).
- Added `etl`, `n_jobs`, and `validation_data` arguments to `ModelPipeline.train` (#139).
- Added `cpu`, `memory`, and `disk` arguments to `ModelPipeline.predict` (#139).
- Added `remote_backend` keyword to the `civis.parallel.make_backend_factory` and `civis.parallel.infer_backend_factory` in order to set the joblib backend in the container for nested calls to `joblib.Parallel`.
- Added the PyPI trove classifiers for Python 3.4 and 3.6 (#152).
- `civis.io.civis_file_to_table` function to import an existing Civis file to a table
- `civis.io.file_to_civis` function will now automatically retry uploads to the Civis Platform up to 5 times if there is an HTTPError, ConnectionError or ConnectionTimeout
- Additional documentation about the use case for the Civis joblib backend.
- Added a note about serializing `ModelPipeline` `APIClient` objects to the docstring.
- Added `civis notebooks download` command-line interface command to facilitate downloading notebooks.
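
The upload retry behavior listed above can be sketched as a small retry loop. `call_with_retries` is a hypothetical helper, and the built-in `ConnectionError` and `TimeoutError` stand in for the HTTP library exceptions named in the entry; the library's actual retry logic may differ.

```python
import time


def call_with_retries(func, max_retries=5,
                      retry_on=(ConnectionError, TimeoutError), delay=0.0):
    """Call func(); on a listed exception, retry up to max_retries times."""
    attempt = 0
    while True:
        try:
            return func()
        except retry_on:
            if attempt >= max_retries:
                raise  # retries are used up; surface the last error
            attempt += 1
            time.sleep(delay)


calls = {"n": 0}


def flaky_upload():
    """Fail twice, then succeed (simulated upload)."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "uploaded"


result = call_with_retries(flaky_upload)
print(result, calls["n"])  # uploaded 3
```
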
- `civis.io.file_to_civis` now takes advantage of multipart uploads to chunk files and perform I/O in parallel
- `civis.io.civis_to_csv` and `civis.io.read_civis_sql` will always request data with gzip compression to reduce I/O. Also, they will attempt to fetch headers in a separate query so that data can be unloaded in parallel
- `civis.io.civis_to_csv` with `compression='gzip'` currently returns a file with no compression. In a future release, `compression='gzip'` will return a gzip compressed file.
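
The multipart upload entry above boils down to splitting a file into byte ranges that can be handled independently. The sketch below plans those ranges and reads them in parallel from an in-memory payload; `plan_parts` and the part size are illustrative assumptions, and no network calls are made.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_parts(file_size, part_size):
    """Return (offset, length) pairs that cover file_size bytes."""
    if file_size < 0 or part_size <= 0:
        raise ValueError("file_size must be >= 0 and part_size > 0")
    num_parts = (file_size + part_size - 1) // part_size  # ceiling division
    return [(i * part_size, min(part_size, file_size - i * part_size))
            for i in range(num_parts)]


data = bytes(range(10))
parts = plan_parts(len(data), 4)
print(parts)  # [(0, 4), (4, 4), (8, 2)]

# Each part can be processed by a separate worker; map() keeps the
# results in part order, so the pieces can be joined back together.
with ThreadPoolExecutor(max_workers=2) as pool:
    chunks = list(pool.map(lambda p: data[p[0]:p[0] + p[1]], parts))
```
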
- Added explanatory text to CivisML_parallel_training.ipynb (#126).
- Added `ResourceWarning` for Python 2.7 (#128).
- Added `TypeError` for multi-indexed dataframes when used as input to CivisML (#131).
- `ModelPipeline.from_existing` will warn if users attempt to recreate a model trained with a newer version of CivisML, and fall back on the most recent prediction template it knows of (#134).
- Make the `PaginatedResponse` returned by LIST endpoints a full iterator. This also makes the `iterator=True` parameter work in Python 2.
- When using `civis.io.civis_to_csv`, emit a warning on SQL queries which return no results instead of allowing a cryptic `IndexError` to surface (#135).
- Fixed the example code snippet for `civis.io.civis_to_multifile_csv`. Also provided more details on its return dict in the docstring.
- Pinned down `sphinx_rtd_theme` and `numpydoc` in `dev-requirements.txt` for building the documentation.
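
For the `PaginatedResponse` entry above, "a full iterator" means an object that implements both `__iter__` and `__next__` (spelled `next` in Python 2). The class below is a minimal sketch of that protocol over paged data; `PagedIterator` and the `fetch_page` callable are hypothetical and are not the library's implementation.

```python
class PagedIterator(object):
    """Iterate over items that are served one page at a time.

    `fetch_page(page_num)` is a hypothetical callable returning a list
    of items, or an empty list once the pages run out.
    """

    def __init__(self, fetch_page):
        self._fetch_page = fetch_page
        self._page_num = 1
        self._buffer = iter(())

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self._buffer)
        except StopIteration:
            # Current page is exhausted; lazily fetch the next one.
            page = self._fetch_page(self._page_num)
            if not page:
                raise StopIteration
            self._page_num += 1
            self._buffer = iter(page)
            return next(self._buffer)

    next = __next__  # Python 2 spelling of the iterator protocol


pages = {1: ["a", "b"], 2: ["c"]}
items = PagedIterator(lambda n: pages.get(n, []))
first = next(items)
rest = list(items)
print(first, rest)  # a ['b', 'c']
```
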
- Jupyter notebook with demonstrations of use patterns and abstractions in the Python API client (#127).
- Catch unnecessary warning while importing xgboost in CivisML_parallel_training.ipynb (#121)
- Fixed bug where instantiating a new model via `ModelPipeline.from_existing` from an existing model with empty "PARAMS" and "CV_PARAMS" boxes fails (#122).
- Users can now access the `ml` and `parallel` namespaces from the base `civis` namespace (#123).
- Parameters in the Civis API documentation now display in the proper order (#124).
- Edited example for safer null value handling
- Make `pubnub` and `joblib` hard dependencies instead of optional dependencies (#110).
- Retry network errors and wait for API rate limit refresh when using the CLI (#117).
- The CLI now provides a User-Agent header which starts with "civis-cli" (#117)
- Include `pandas` and `sklearn`-dependent code in Travis CI tests.
- Version 1.1 of CivisML, with custom dependency installation from remote git hosting services (e.g., GitHub, Bitbucket).
- Added email notifications option to `ModelPipeline`.
- Added custom `joblib` backend for multiprocessing in the Civis Platform. Public-facing functions are `make_backend_factory`, `make_backend_template_factory`, and `infer_backend_factory`. Includes a new hard dependency on `cloudpickle` to facilitate code transport.
- Fixed a bug where the version of a dependency for Python 2.7 usage was incorrectly specified.
- Non-seekable file-like objects can now be provided to `civis.io.file_to_civis`. Only seekable file-like objects will be streamed.
- The `civis.ml.ModelFuture` no longer raises an exception if its model job is cancelled.
- The CLI's API spec cache now expires after 24 hours instead of 10 seconds.
- Fixed a bug where `ModelFuture.validation_metadata` would not source training job metadata for a `ModelFuture` corresponding to prediction job (#90).
- Added more locks to improve thread safety in the `PollableResult` and `CivisFuture`.
- Fix issue with Python 2/3 dependency management (#89).
- Fixed a bug which caused an exception to be set on all `ModelFuture` objects, regardless of job status (#86).
- Fixed a bug which made the `ModelPipeline` unable to generate prediction jobs for models trained with v0.5 templates (#84).
- Handle the case when inputs to `ModelFuture` are `numpy.int64` (or other non-`integer` ints) (#85).
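
The CLI cache expiry mentioned above (24 hours instead of 10 seconds) can be pictured as an age check on the cached file. This is a sketch under assumptions: `cache_is_fresh` is a hypothetical helper that uses the file's modification time, which may not be how the CLI tracks age.

```python
import os
import tempfile
import time

MAX_AGE_SECONDS = 24 * 60 * 60  # 24 hours, per the changelog entry


def cache_is_fresh(path, now=None, max_age=MAX_AGE_SECONDS):
    """Return True if `path` exists and is younger than max_age seconds."""
    now = time.time() if now is None else now
    try:
        modified = os.path.getmtime(path)
    except OSError:
        return False  # no cache file yet
    return (now - modified) < max_age


with tempfile.NamedTemporaryFile() as handle:
    mtime = os.path.getmtime(handle.name)
    fresh = cache_is_fresh(handle.name, now=mtime + 60)
    stale = cache_is_fresh(handle.name, now=mtime + MAX_AGE_SECONDS + 1)
print(fresh, stale)  # True False
```
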
- Convert `README.md` (Markdown) to `README.rst` (reStructuredText).
- Retries to http request in `get_swagger_spec` to make calls to `APIClient` robust to network failure
- Parameter `local_api_spec` to `APIClient` to allow creation of client from local cache
- Clarify `civis.io.dataframe_to_civis` docstring with a note about treatment of the index.
- Added functions `civis.io.file_id_from_run_output`, `civis.io.file_to_dataframe`, and `civis.io.file_to_json`.
- Added `civis.ml` namespace with `ModelPipeline` interface to Civis Platform modeling capabilities.
- Added `examples` directory with sample `ModelPipeline` code from `civis.ml`.
- Python 2.7 compatibility
- Corrected the defaults listed in the docstring for `civis.io.civis_to_multifile_csv`.
- Do not allow uploading of files greater than 5GB to S3 (#58).
- Revised the example code in the `civis_to_file` docstring to use bytes when downloading a file
- Modified retry behavior so that 413, 429, or 503 errors accompanied by a "Retry-After" header will be retried regardless of the HTTP verb used.
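
The retry rule in this entry can be written as a small predicate. `should_retry` is a hypothetical helper that takes a status code and a mapping of response headers; it deliberately takes no HTTP verb, since the entry says the verb no longer matters in this case.

```python
RETRYABLE_WITH_HEADER = frozenset([413, 429, 503])


def should_retry(status_code, headers):
    """True for 413/429/503 responses that carry a Retry-After header."""
    # Header names are case-insensitive, so compare in lower case.
    has_header = any(key.lower() == "retry-after" for key in headers)
    return status_code in RETRYABLE_WITH_HEADER and has_header


print(should_retry(429, {"Retry-After": "3"}))
print(should_retry(429, {}))
```
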
- Add CSV settings arguments to `civis.io.civis_to_csv` function.
- Refactored use of "swagger" language. `get_swagger_spec` is now `get_api_spec` and `parse_swagger` is now `parse_api_spec`.
- Modified `CivisFuture` so if PubNub is disconnected, it will fall back to polling on a shorter interval.
- Deprecate `api_key` input to higher-level functions and classes in favor of an `APIClient` input. The `api_key` will be removed in v2.0.0. (#46)
- Improved threading implementation in `PollableResult` so that it no longer blocks interpreter shutdown.
- Allow the base url of the API to be configured through the `CIVIS_API_ENDPOINT` environment variable. (#43)
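
Reading the base URL from `CIVIS_API_ENDPOINT`, as the entry above describes, could look like the sketch below. The helper name `get_base_url` and the fallback URL are assumptions made for illustration; the changelog does not state the library's default.

```python
import os

# Assumed fallback for illustration; not taken from the changelog.
DEFAULT_BASE_URL = "https://api.civisanalytics.com"


def get_base_url(environ=None):
    """Return the API base URL, preferring CIVIS_API_ENDPOINT if set."""
    environ = os.environ if environ is None else environ
    return environ.get("CIVIS_API_ENDPOINT", DEFAULT_BASE_URL)


print(get_base_url({"CIVIS_API_ENDPOINT": "https://api.example.test"}))
```
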
- Decorator function for deprecating parameters (#46)
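
A decorator of this kind can be sketched as follows. This is an illustrative version, not the one added in #46: the name `deprecate_param`, its signature, and the `connect` function are assumptions, and it only detects parameters passed by keyword.

```python
import functools
import warnings


def deprecate_param(version_removed, *param_names):
    """Warn when any of `param_names` is passed as a keyword argument."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            used = [p for p in param_names if p in kwargs]
            if used:
                warnings.warn(
                    "%s: parameter(s) %s will be removed in %s"
                    % (func.__name__, ", ".join(used), version_removed),
                    FutureWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecate_param("v2.0.0", "api_key")
def connect(client=None, api_key=None):
    """Hypothetical function with a deprecated `api_key` parameter."""
    return client or api_key


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    connect(api_key="secret")   # warns
    connect(client="client")    # does not warn
print(len(caught))  # 1
```
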
- `civis.futures.CivisFuture` for tracking future results
- `civis.io.file_to_civis` will perform a streaming upload to Platform if the optional `requests-toolbelt` package is installed.
- Replace all `PollableResult` return values with `CivisFuture` to reduce the number of API calls and increase speed
- support for multifile csv exports
- support for subscription based polling
- civis.io functions use the "hidden" API option to keep jobs out of the UI. Deprecate the "archive" parameter in favor of "hidden".
- civis.io.query_civis now has a "hidden" parameter which defaults to True
- expose `poller` and `poller_args` as public attributes in `PollableResults`
- update to `default_credential` to handle pagination in `credentials.list` endpoint.
- miscellaneous documentation fixes
- unexpected keyword arguments passed to `APIClient` methods now raise appropriate TypeError
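
For methods that are generated dynamically and accept `**kwargs`, a check like the one below is one way to produce the `TypeError` this entry mentions. `make_endpoint` and the parameter names are hypothetical; the sketch only shows the validation idea.

```python
def make_endpoint(name, allowed):
    """Build a function that accepts only the keyword names in `allowed`."""
    allowed = frozenset(allowed)

    def endpoint(**kwargs):
        unexpected = sorted(set(kwargs) - allowed)
        if unexpected:
            raise TypeError(
                "%s() got unexpected keyword argument(s): %s"
                % (name, ", ".join(unexpected)))
        return kwargs  # stand-in for sending the request

    endpoint.__name__ = name
    return endpoint


list_tables = make_endpoint("list_tables", ["database_id", "limit"])
print(list_tables(limit=10))

try:
    list_tables(limt=10)  # misspelled on purpose
except TypeError as exc:
    print(exc)
```
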
- Decrease time required to create client objects from ~0.6 seconds to ~150 us for all objects after the first in a session
- civis.io reads/writes to/from memory instead of disk where appropriate
- Minor documentation corrections
- 204/205 responses now return valid Response object
- Initial release