feat: Update base Docker images, improve Dockerfile section, fix 404s #631

Merged
merged 2 commits into from
Jul 11, 2023
@@ -9,34 +9,48 @@ sidebar_position: 4

---

## Base Docker images

Apify provides several Docker images that can be used as a base for user actors.

All images come in two versions: the **latest** tag corresponds to the stable version and **beta** to images where we test new features. Use the beta version at your own risk.

Note that all Apify Docker images are pre-cached on Apify servers in order to speed up the actor builds and runs. The source code used to generate the images is available in the [apify-actor-docker](https://github.com/apify/apify-actor-docker) GitHub repository.

## Images with Apify SDK and Crawlee preinstalled {#apify-sdk-actor-images}

The [Apify SDK for JavaScript](/sdk/js) and [Crawlee](https://crawlee.dev/) are preinstalled on these images. You can read more about them in the [Apify SDK Docker image guide](/sdk/js/docs/guides/docker-images).
### Node.js base images

- **Node.js 16 on Alpine Linux** ([`apify/actor-node`](https://hub.docker.com/r/apify/actor-node/)) - slim and efficient image, contains only the most elementary tools. Note that headless browsers (Puppeteer, Playwright) are not available in this image.
Apify provides several Docker images with Node.js, the [Apify SDK for JavaScript](/sdk/js) and [Crawlee](https://crawlee.dev/) preinstalled.
These images come with Node.js 16, 18 or 20; select the version you want with the `16`, `18` or `20` tag. The `latest` tag corresponds to the latest LTS version of Node.js.

- **Node.js 16 + Puppeteer + Chrome on Debian** ([`apify/actor-node-puppeteer-chrome`](https://hub.docker.com/r/apify/actor-node-puppeteer-chrome/)) - larger image with the Chromium and Google Chrome browsers and the [`puppeteer`](https://github.com/puppeteer/puppeteer) library bundled. With this image, you can use the [`launchPuppeteer()`](https://crawlee.dev/api/puppeteer-crawler/function/launchPuppeteer) function and [`PuppeteerCrawler`](https://crawlee.dev/api/puppeteer-crawler/class/PuppeteerCrawler). Note that Chrome requires quite a lot of resources, therefore the actor should run with at least 2048 MB of memory.
| Image | Description |
| ----- | ----------- |
| Node.js on Alpine Linux ([`actor-node`](https://hub.docker.com/r/apify/actor-node/)) | Slim and efficient image, contains only the most elementary tools. Note that headless browsers (Puppeteer, Playwright) are not available in this image. |
| Node.js + Puppeteer + Chrome on Debian ([`actor-node-puppeteer-chrome`](https://hub.docker.com/r/apify/actor-node-puppeteer-chrome/)) | Larger image with the Chromium and Google Chrome browsers and the [`puppeteer`](https://github.com/puppeteer/puppeteer) library bundled. |
| Node.js + Playwright + Chrome on Debian ([`actor-node-playwright-chrome`](https://hub.docker.com/r/apify/actor-node-playwright-chrome/)) | Larger image with the Chromium and Google Chrome browsers and the [`playwright`](https://github.com/microsoft/playwright) library bundled. |
| Node.js + Playwright + Firefox on Debian ([`actor-node-playwright-firefox`](https://hub.docker.com/r/apify/actor-node-playwright-firefox/)) | Larger image with the Firefox browser and the [`playwright`](https://github.com/microsoft/playwright) library bundled. |
| Node.js + Playwright + WebKit on Ubuntu ([`actor-node-playwright-webkit`](https://hub.docker.com/r/apify/actor-node-playwright-webkit/)) | Larger image with the WebKit browser engine and the [`playwright`](https://github.com/microsoft/playwright) library bundled. |
| Node.js + Playwright + all browsers on Ubuntu ([`actor-node-playwright`](https://hub.docker.com/r/apify/actor-node-playwright/)) | A very large and slow image with the [`playwright`](https://github.com/microsoft/playwright) library and all Playwright browsers (Chromium, Chrome, Firefox, WebKit) bundled. |

- **Node.js 16 + Playwright + Chrome on Debian** ([`apify/actor-node-playwright-chrome`](https://hub.docker.com/r/apify/actor-node-playwright-chrome/)) - similar to the `apify/actor-node-puppeteer-chrome` image, but it comes preinstalled the [`playwright`](https://github.com/microsoft/playwright) automation library instead of Puppeteer. With this image, you can use the [`launchPlaywright()`](https://crawlee.dev/api/playwright-crawler/function/launchPlaywright) function and [`PlaywrightCrawler`](https://crawlee.dev/api/playwright-crawler/class/PlaywrightCrawler). This image also comes with a `firefox` and `webkit` version.
You can read more about each of the images in the [Apify SDK Docker image guide](/sdk/js/docs/guides/docker-images).
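
The tag selection described above can be sketched in a Dockerfile as follows. This is only an illustration under stated assumptions: the `package.json` with a `start` script and the file layout are hypothetical, not something this change prescribes.

```dockerfile
# Pin the Node.js major version through the image tag (16, 18 or 20).
FROM apify/actor-node:20

# Copy the package manifests first so the dependency layer can be
# reused from the build cache when only the source code changes.
COPY package*.json ./

# Install production dependencies only.
RUN npm install --omit=dev

# Copy the rest of the Actor source code.
COPY . ./

# Assumption: package.json defines a "start" script.
CMD ["npm", "start"]
```

Swapping the first line for one of the browser images in the table, for example `apify/actor-node-playwright-chrome:20`, should be the only change needed to get a bundled browser.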

For a full list of available images, [see the Apify SDK Docker image guide](/sdk/js/docs/guides/docker-images).
### Python base images

## Images with Apify Client for Python preinstalled {#python-actor-images}
Apify provides several Docker images with Python 3 and the [Apify SDK for Python](/sdk/python) preinstalled.
These images come with Python 3.8, 3.9, 3.10 or 3.11; select the version you want with the `3.8`, `3.9`, `3.10` or `3.11` tag. The `latest` tag corresponds to the latest version of Python 3 supported by the Apify SDK.

The [Apify API client for Python](/api/client/python) is preinstalled on these images.
These images are all based on Debian Bullseye.

- **Python 3 on Alpine Linux** ([`apify/actor-python`](https://hub.docker.com/r/apify/actor-python/)) - a slim image with Python 3 and the [Apify API client for Python](/api/client/python) preinstalled. Comes in multiple versions containing Python 3.7, 3.8, 3.9 or 3.10.
| Image | Description |
| ----- | ----------- |
| Python ([`actor-python`](https://hub.docker.com/r/apify/actor-python)) | Slim and efficient image, containing just the Apify SDK for Python. Headless browsers (Playwright, Selenium) are not available in this image. |
| Python + Playwright ([`actor-python-playwright`](https://hub.docker.com/r/apify/actor-python-playwright)) | Larger image with the [`playwright`](https://github.com/microsoft/playwright) library and all its browsers bundled. |
| Python + Selenium + Chrome ([`actor-python-selenium`](https://hub.docker.com/r/apify/actor-python-selenium)) | Larger image with the [`selenium`](https://github.com/seleniumhq/selenium) library, Google Chrome and [ChromeDriver](https://chromedriver.chromium.org/) bundled. |
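
A Dockerfile built on one of the Python images might look like the sketch below. The `requirements.txt` file and the `src` package (run as a module) are assumptions made for illustration; adjust them to the actual project layout.

```dockerfile
# Pin the Python version through the image tag (3.8, 3.9, 3.10 or 3.11).
FROM apify/actor-python:3.11

# Copy the dependency list first so this layer is cached between builds.
# Assumption: dependencies are listed in requirements.txt.
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the Actor source code.
COPY . ./

# Assumption: the Actor's entry point is a package named "src"
# containing a __main__.py file.
CMD ["python3", "-m", "src"]
```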

## Custom Dockerfile

## [](#custom-dockerfile)Custom Dockerfile
Internally, Apify uses Docker to build and run Actors. If you create an Actor from a template, the Actor already contains an optimized Dockerfile for the given use-case.

Internally, Apify uses Docker to build and run Actors. To control the build of the Actor, you can create a custom **Dockerfile** and either reference from the `dockerfile` field in the Actor's config in **.actor/actor.json**, or store it in **.actor/Dockerfile** or **Dockerfile** in its root directory. These three sites are searched for in this order of preference. If the **Dockerfile** is missing, the system uses the following default:
To control the build of the Actor, you can create a custom **Dockerfile** and either reference it from the `dockerfile` field in the Actor's config in **.actor/actor.json**, or store it in **.actor/Dockerfile** or **Dockerfile** in its root directory. These three locations are searched in this order of preference. If the **Dockerfile** is missing, the system uses the following default:

```dockerfile
FROM apify/actor-node:16
# (remaining lines of the default Dockerfile are collapsed in this diff view)
```

@@ -17,7 +17,7 @@ This option is used by default when your actor's source code is hosted on Apify

The only required file is **Dockerfile**; all other files depend on your Dockerfile settings. By default, Apify's custom Node.js Dockerfile is used, which requires a **main.js** file containing your source code and a **package.json** file containing package configurations for [npm](https://www.npmjs.com/).

See [Custom Dockerfile](./source_types.md) and [base Docker images](../actor_definition/dockerfile.md) for more information about creating your own Dockerfile and using Apify's prepared base images.
See [Dockerfile](../actor_definition/dockerfile.md#custom-dockerfile) and [base Docker images](../actor_definition/dockerfile.md#base-docker-images) for more information about creating your own Dockerfile and using Apify's prepared base images.

## [](#git-repository)Git repository

@@ -32,7 +32,7 @@ To specify a Git branch or tag to check out, add a URL fragment to the URL. For

Optionally, the second part of the fragment in the Git URL (separated by a colon) specifies the directory from which the Actor will be built (and where the `.actor` folder is located). For example, `https://github.com/jancurn/some-actor.git#develop:some/dir` will check out the **develop** branch and set **some/dir** as the root directory of the Actor.

Note that you can easily set up an integration where the Actor is automatically rebuilt on every commit to the Git repository. For more details, see [GitHub integration](./source_types.md).
Note that you can easily set up an integration where the Actor is automatically rebuilt on every commit to the Git repository. For more details, see [GitHub integration](../../../integrations/github.md).

### [](#private-repositories)Private repositories

@@ -53,7 +53,7 @@ An example Actor monorepo is shown in the [`apify/actor-monorepo-example`](https

## [](#zip-file)Zip file

The source code for the Actor can also be located in a Zip archive hosted on an external URL. This option enables integration with arbitrary source code or continuous integration systems. Similarly, as with the [Git repository](#git-repository), the source code can consist of multiple files and directories, can contain a custom **Dockerfile**, and the actor description is taken from <strong>README.md</strong>. If you don't use a [custom Dockerfile](#custom-dockerfile), the root file of your application must be named `main.js`.
The source code for the Actor can also be located in a Zip archive hosted on an external URL. This option enables integration with arbitrary source code or continuous integration systems. Similarly, as with the [Git repository](#git-repository), the source code can consist of multiple files and directories, can contain a custom **Dockerfile**, and the actor description is taken from <strong>README.md</strong>. If you don't use a [custom Dockerfile](../actor_definition/dockerfile.md#custom-dockerfile), the root file of your application must be named `main.js`.

## [](#github-gist)GitHub Gist

@@ -68,5 +68,5 @@ Then set the **Source Type** to **GitHub Gist** and paste the Gist URL as follow

Note that the example Actor is available in the Apify Store as [apify/example-github-gist](https://apify.com/apify/example-github-gist).

Similarly, as with the [Git repository](./source_types.md), the source code can consist of multiple files and directories, it can contain a custom **Dockerfile** and the actor description is taken from <strong>README.md</strong>. If you don't use a [custom Dockerfile](#custom-dockerfile), the root file of your application must be named `main.js`.
Similarly, as with the [Git repository](#git-repository), the source code can consist of multiple files and directories, it can contain a custom **Dockerfile** and the actor description is taken from <strong>README.md</strong>. If you don't use a [custom Dockerfile](../actor_definition/dockerfile.md#custom-dockerfile), the root file of your application must be named `main.js`.

3 changes: 1 addition & 2 deletions sources/platform/actors/development/performance.md
@@ -49,6 +49,5 @@ We first copy the `package.json`, `package-lock.json` , and install the dependen

### Speed up the Actor startup times by using standardised images

If you use one of [Apify's standardized images](https://github.com/apify/apify-actor-docker), the startup time will be faster. This is because the images are cached at each worker machine, and so only the layers you added in your Actor's [Dockefile](./actor_definition/dockerfile.md) need to be pulled.
If you use one of [Apify's standardized images](https://github.com/apify/apify-actor-docker), the startup time will be faster. This is because the images are cached at each worker machine, and so only the layers you added in your Actor's [Dockerfile](./actor_definition/dockerfile.md) need to be pulled.

