OpenSearch Docker Ingest Attachment Plugin to index Plone content

How to get a Docker image with the OpenSearch ingest-attachment plugin.

OpenSearch, the fork of Elasticsearch and Kibana that Amazon started in 2021, is now established as a community project.

We use it to index Plone content with the collective.elastic.plone and collective.elastic.ingest packages. The first is a Plone add-on that provides a proxy index to the index server. The second is a Celery-based Python package that indexes content asynchronously and also creates and manages the index schemas and ingest pipelines.

However, to get OpenSearch up and running in a Docker container, we had to create our own Docker image. The official OpenSearch Docker image does not contain the ingest-attachment plugin, which we need to extract data from a variety of binary formats, like PDF, Word, … and index it.

The process is documented in the "Working with plugins" section of the OpenSearch documentation.

First, create a Dockerfile:

FROM opensearchproject/opensearch:latest
RUN /usr/share/opensearch/bin/opensearch-plugin install --batch ingest-attachment

Next, prepare and execute the build from the directory that contains the Dockerfile (the trailing dot is the build context):

docker buildx use default
docker buildx build --tag opensearch-ingest-attachment:latest .

The image is ready to be used on your local machine (you may want to push it to your trusted Docker registry).
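
To check that the plugin really ended up in the image, the plugin tool can be asked for its list before starting anything else. This is a minimal sketch, assuming the image tag used in the build step. The function name image_has_ingest_attachment is made up for this example.

```shell
# Hypothetical helper: succeeds if the given image lists the ingest-attachment plugin.
image_has_ingest_attachment() {
    # Override the entrypoint so the container only runs the plugin tool and exits.
    docker run --rm \
        --entrypoint /usr/share/opensearch/bin/opensearch-plugin \
        "$1" list \
        | grep -q 'ingest-attachment'
}
```

Calling image_has_ingest_attachment opensearch-ingest-attachment:latest && echo "plugin found" should print the message if the build worked.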

I used the example Docker Compose file provided in the documentation. Every line with the image name image: opensearchproject/opensearch:latest has to be changed to the new name image: opensearch-ingest-attachment:latest.
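
The image name can also be swapped with sed instead of by hand. This is a sketch, assuming the compose file uses exactly the image line quoted above. The function name use_custom_image is made up for this example, and the file name is up to you.

```shell
# Hypothetical helper: point the OpenSearch services in a compose file at the custom image.
use_custom_image() {
    # -i.bak edits in place and keeps a backup copy; this form works with GNU and BSD sed.
    # The pattern ends in "opensearch:latest", so an opensearch-dashboards image line is not touched.
    sed -i.bak \
        's|image: opensearchproject/opensearch:latest|image: opensearch-ingest-attachment:latest|' \
        "$1"
}
```

For a compose file saved as docker-compose.yml, the call would be use_custom_image docker-compose.yml. The original is kept as docker-compose.yml.bak.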

With docker compose up the container is started and the ingest-attachment plugin is ready for use.
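
A quick smoke test against the running node might look like the sketch below. It rests on several assumptions that are not part of the setup described here: the node listens on https://localhost:9200 with a self-signed certificate, the admin password is available in OPENSEARCH_INITIAL_ADMIN_PASSWORD, and the pipeline name attachment and the field name data are placeholders. In our setup collective.elastic.ingest manages the real pipelines, so this is only for checking the plugin by hand.

```shell
# Assumed endpoint and credentials; adjust them to your compose file.
OPENSEARCH_URL="${OPENSEARCH_URL:-https://localhost:9200}"
OPENSEARCH_AUTH="${OPENSEARCH_AUTH:-admin:${OPENSEARCH_INITIAL_ADMIN_PASSWORD:-admin}}"

# Print the JSON body of a minimal pipeline that runs the attachment processor
# on base64 content stored in the placeholder field "data".
attachment_pipeline_body() {
    cat <<'EOF'
{"description": "Extract text from binary files", "processors": [{"attachment": {"field": "data"}}]}
EOF
}

# List the installed plugins per node; ingest-attachment should show up.
list_plugins() {
    curl --silent --insecure --user "$OPENSEARCH_AUTH" \
        "$OPENSEARCH_URL/_cat/plugins?v"
}

# Create (or replace) the pipeline under the placeholder name "attachment".
create_attachment_pipeline() {
    attachment_pipeline_body | curl --silent --insecure --user "$OPENSEARCH_AUTH" \
        --request PUT "$OPENSEARCH_URL/_ingest/pipeline/attachment" \
        --header 'Content-Type: application/json' \
        --data-binary @-
}
```

Run list_plugins first. If ingest-attachment is missing from the output, create_attachment_pipeline is expected to be rejected because the attachment processor is unknown to the node.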

In my opinion this is a bit cumbersome, but it works. It would be better to have a generic Docker image with the plugin already installed, or ready to be activated using an environment variable.