Commit

Merge pull request #79 from impresso/adapt-to-runai
Adapt to runai
piconti authored Nov 30, 2023
2 parents 829f681 + b2004d0 commit f0067e0
Showing 107 changed files with 3,393 additions and 15,131 deletions.
87 changes: 69 additions & 18 deletions Dockerfile
@@ -1,28 +1,79 @@
FROM daskdev/dask:2.3.0
# Set base image
FROM daskdev/dask:2023.11.0-py3.11

# Install some necessary tools.
RUN apt-get update && apt-get install -y \
cmake \
build-essential \
&& rm -rf /var/lib/apt/lists/*
# Set environment variables for user
ENV GROUP_NAME=DHLAB-unit
ENV GROUP_ID=11703

# Add local impresso_pycommons
ADD . .
ARG USER_NAME
ARG USER_ID

# Install build tools and libraries
RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
build-essential \
ca-certificates \
pkg-config \
cmake \
software-properties-common

RUN DEBIAN_FRONTEND=noninteractive apt-get install -y \
apt-utils \
git \
curl \
vim \
unzip \
wget \
tmux \
screen \
wget \
sudo \
openssh-client

RUN DEBIAN_FRONTEND=noninteractive apt-get update && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*

RUN pip install --upgrade pip
# Create a group and user
RUN groupadd -g $GROUP_ID $GROUP_NAME
RUN useradd -ms /bin/bash -u $USER_ID -g $GROUP_ID $USER_NAME

# Add new user to sudoers
RUN echo "${USER_NAME} ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers

# install desired libraries.
# TODO remove boto once it's removed from all functions.
RUN pip install --upgrade pip setuptools
RUN pip install numpy scipy pillow beautifulsoup4 pandas PyYAML jsonlines pytest
RUN pip install \
git+https://github.com/dkpro/dkpro-pycas.git \
git+https://github.com/impresso/dask_k8.git \
boto \
boto3 \
bs4 \
docopt \
"kubernetes>=9.0.0,<10" \
kubernetes \
"urllib3>1.21.1<1.25" \
"opencv-python>=3.4,<=4" \
opencv-python \
smart_open \
jsonlines \
s3fs \
"s3fs>=2023.3.0" \
jupyter

RUN python setup.py install
EXPOSE 8080
EXPOSE 8786
EXPOSE 8787

# Set the working directory
WORKDIR /home/$USER_NAME/impresso_pycommons

# Add local impresso_pycommons
COPY . .

# Change ownership of the copied files to the new user and group
RUN chown -R ${USER_NAME}:${GROUP_NAME} /home/${USER_NAME}/impresso_pycommons

# Switch to the new user
USER $USER_NAME

RUN pip install -e .

# Make sure the script launching the rebuilt is executable
RUN chmod +x /home/${USER_NAME}/impresso_pycommons/scripts/start_rebuilt_runai.sh

CMD ["sleep", "infinity"]
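
`USER_NAME` and `USER_ID` are declared with `ARG` and have no default values, so they have to be supplied at build time. A minimal sketch of such a build command follows, assuming the image is built locally with Docker. The tag `impresso-pycommons:runai` is a hypothetical name, not one taken from this commit. The helper only prints the command so that it can be checked before it is run.

```shell
#!/bin/sh
# Hypothetical image tag; it is not defined anywhere in this commit.
IMAGE_TAG="impresso-pycommons:runai"

# Print a docker build command that forwards a user name and numeric id
# to the USER_NAME / USER_ID build arguments declared in the Dockerfile.
docker_build_cmd() {
    printf 'docker build --build-arg USER_NAME=%s --build-arg USER_ID=%s -t %s .\n' \
        "$1" "$2" "$3"
}

# Use the invoking user's name and id as the build arguments.
docker_build_cmd "$(id -un)" "$(id -u)" "$IMAGE_TAG"
```

Piping the printed line to `sh` would start the build. On a shared cluster the numeric id may need to match the user's id on that cluster rather than the local one.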
2 changes: 1 addition & 1 deletion README.md
@@ -18,7 +18,7 @@ pip install impresso-commons

## License

The second project 'impresso - Media Monitoring of the Past II. Beyond Borders: Connecting Historical Newspapers and Radio' is funded by the Swiss National Science Foundation (SNSF) under grant number [CRSII5_213585](https://data.snf.ch/grants/grant/213585) and the Luxembourg National Research Fund under grant No. 17498891.
The second project 'impresso - Media Monitoring of the Past II. Beyond Borders: Connecting Historical Newspapers and Radio' is funded by the Swiss National Science Foundation (SNSF) under grant number [CRSII5_213585](https://data.snf.ch/grants/grant/213585) and the Luxembourg National Research Fund under grant No. 17498891.

Aiming to develop and consolidate tools to process and explore large-scale collections of historical newspapers and radio archives, and to study the impact of this tooling on historical research practices, _Impresso II_ builds upon the first project – 'impresso - Media Monitoring of the Past' (grant number [CRSII5_173719](http://p3.snf.ch/project-173719), Sinergia program). More information at https://impresso-project.ch.

Empty file removed docs/.nojekyll
Empty file.
5 changes: 3 additions & 2 deletions docs/Makefile
@@ -1,7 +1,8 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line.
# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
SOURCEDIR = .
@@ -16,4 +17,4 @@ help:
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
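
The catch-all rule forwards any target name to Sphinx's make mode. As a rough illustration, the shell sketch below prints what `make html` would expand to when `SPHINXOPTS` and `O` are empty. The `BUILDDIR` value is not visible in this hunk; `_build` is an assumption based on the `docs/_build/` paths listed in this commit.

```shell
#!/bin/sh
# Values mirrored from the Makefile variables shown in the hunk.
SPHINXBUILD="sphinx-build"
SOURCEDIR="."
# Assumption: BUILDDIR is elided in the diff; "_build" is inferred from docs/_build/.
BUILDDIR="_build"

# Print the command the catch-all rule would run for a given target.
sphinx_make_mode_cmd() {
    printf '%s -M %s "%s" "%s"\n' "$SPHINXBUILD" "$1" "$SOURCEDIR" "$BUILDDIR"
}

sphinx_make_mode_cmd html
```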
Binary file modified docs/_build/doctrees/environment.pickle
Binary file not shown.
Binary file modified docs/_build/doctrees/images.doctree
Binary file not shown.
Binary file modified docs/_build/doctrees/index.doctree
Binary file not shown.
Binary file modified docs/_build/doctrees/io.doctree
Binary file not shown.
Binary file modified docs/_build/doctrees/rebuild.doctree
Binary file not shown.
Binary file modified docs/_build/doctrees/utils.doctree
Binary file not shown.
2 changes: 1 addition & 1 deletion docs/_build/html/.buildinfo
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: d5cdc2e340f0361617247c2bfe93c213
config: 52a307b7b889569b6d220f068464a8b3
tags: 645f666f9bcd5a90fca523b33c5a78b7
15 changes: 9 additions & 6 deletions docs/_build/html/_sources/images.rst.txt
@@ -1,15 +1,18 @@
Image handling
==============
================================

Image Utils
-----------
------------------------------------------

.. automodule:: impresso_commons.images.img_utils
:members:

:members:
:undoc-members:
:show-inheritance:

Olive Boxes
-----------
--------------------------------------------

.. automodule:: impresso_commons.images.olive_boxes
:members:
:members:
:undoc-members:
:show-inheritance:
2 changes: 1 addition & 1 deletion docs/_build/html/_sources/index.rst.txt
@@ -1,5 +1,5 @@
.. Impresso PyCommons documentation master file, created by
sphinx-quickstart on Tue Sep 18 17:18:04 2018.
sphinx-quickstart on Fri Nov 24 12:10:00 2023.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
23 changes: 15 additions & 8 deletions docs/_build/html/_sources/io.rst.txt
@@ -1,21 +1,28 @@
Input/Output
============
==============================

General
-------
---------------

.. automodule:: impresso_commons.path
:members:
:members:
:undoc-members:
:show-inheritance:


I/O from file system
--------------------
--------------------------------------

.. automodule:: impresso_commons.path.path_fs
:members:

:members:
:undoc-members:
:show-inheritance:

I/O from S3
-----------
--------------------------------------

.. automodule:: impresso_commons.path.path_s3
:members:
:members:
:undoc-members:
:show-inheritance:

24 changes: 15 additions & 9 deletions docs/_build/html/_sources/rebuild.rst.txt
@@ -1,5 +1,5 @@
Text Rebuild
============
==============================

Command Line interface
----------------------
@@ -21,17 +21,23 @@ Config file example


Rebuild functions
--------------------

A set of functions to transform JSON files in **impresso's canonical format**
into a number of JSON-based formats for different purposes.

.. autofunction:: impresso_commons.text.rebuilder.rebuild_for_solr
.. autofunction:: impresso_commons.text.rebuilder.rebuild_text
-----------------

.. automodule:: impresso_commons.text.rebuilder
:members:
:undoc-members:
:show-inheritance:

Helpers
-------

.. automodule:: impresso_commons.text.helpers
:members:
:members:
:undoc-members:
:show-inheritance:

Running Rebuild on Runai
------------------------

.. include:: ../scripts/rebuild_on_runai.md
:parser: myst_parser.sphinx_
63 changes: 33 additions & 30 deletions docs/_build/html/_sources/utils.rst.txt
@@ -1,42 +1,45 @@
Utilities
#########
===============================

Basic Utils Functions
------------------------------------

Command Line interface
----------------------
.. automodule:: impresso_commons.utils.utils
:members:
:undoc-members:
:show-inheritance:

to come...


Basic utils
-----------
S3 Utils Functions
---------------------------------

.. autofunctions:: impresso_commons.utils.init_logger
.. autofunctions:: impresso_commons.utils.user_confirmation
.. autofunctions:: impresso_commons.utils.timestamp
.. autofunctions:: impresso_commons.utils.Timer
.. autofunctions:: impresso_commons.utils.chunk
.. automodule:: impresso_commons.utils.s3
:members:
:undoc-members:
:show-inheritance:

Dask Utils Functions
----------------------------------------

S3 Utils Functions
------------------
.. automodule:: impresso_commons.utils.daskutils
:members:
:undoc-members:
:show-inheritance:

Apache UIMA XMI Utils Functions
-----------------------------------

.. autofunctions:: impresso_commons.utils.s3.get_s3_client
.. autofunctions:: impresso_commons.utils.s3.get_s3_resource
.. autofunctions:: impresso_commons.utils.s3.get_s3_connection
.. autofunctions:: impresso_commons.utils.s3.get_bucket
.. autofunctions:: impresso_commons.utils.s3.get_bucket_boto3
.. autofunctions:: impresso_commons.utils.s3.s3_get_articles
.. autofunctions:: impresso_commons.utils.s3.s3_get_pages
.. autofunctions:: impresso_commons.utils.s3.get_s3_versions
.. autofunctions:: impresso_commons.utils.s3.get_s3_versions_client
.. autofunctions:: impresso_commons.utils.s3.read_jsonlines
.. autofunctions:: impresso_commons.utils.s3.readtext_jsonlines
.. autofunctions:: impresso_commons.utils.s3.upload
.. automodule:: impresso_commons.utils.uima
:members:
:undoc-members:
:show-inheritance:

Config File Loader
------------------

Dask Utils Functions
--------------------
.. autofunctions:: impresso_commons.utils.daskutils.create_even_partitions
.. automodule:: impresso_commons.utils.config_loader
:members:
:undoc-members:
:show-inheritance:

.. include:: ../impresso_commons/config/config.loader.example.md
:parser: myst_parser.sphinx_