Announcing Data Prepper 2.10.0

Introduction

Data Prepper 2.10 is now available!
This release includes two major features: a source that accepts data from clients using an API compatible with the OpenSearch _bulk API, and the ability to read from Amazon Kinesis Data Streams.

OpenSearch API source

Many existing OpenSearch clients that perform ingestion directly to OpenSearch can now send that data to Data Prepper first.
This means that you can use Data Prepper’s buffering and rich processor set before sending data to OpenSearch without having to change clients that are using the OpenSearch _bulk API.
A new Data Prepper source named opensearch_api has been added that accepts OpenSearch Document API bulk operation requests from clients using REST and ingests data into OpenSearch.
The behavior of this source is similar to that of the existing http source.
It supports industry-standard encryption in the form of TLS/HTTPS and HTTP basic authentication.
It also parses incoming requests and creates Data Prepper events and associated event metadata, making it compatible with the opensearch sink.
The request body is compatible with the OpenSearch Document API bulk operation and supports all actions: index, create, delete, and update.
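
As a sketch of the TLS and basic authentication options mentioned above, the following source fragment assumes the option names mirror those of the http source; verify them against the opensearch_api documentation:

source:
  opensearch_api:
    # Option names below are assumed to match the http source; verify in the docs.
    ssl: true
    ssl_certificate_file: "/usr/share/config/demo.crt"
    ssl_key_file: "/usr/share/config/demo.key"
    authentication:
      http_basic:
        username: "ingest-user"
        password: "ingest-password"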

The following two HTTP endpoints are now supported:

POST _bulk
POST <index>/_bulk

The second endpoint specifies the index in the path, so you don't need to include it in the request body.

Additionally, the following OpenSearch Document API bulk operation query parameters are supported:

  • pipeline
  • routing
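
For example, a client can pass both parameters in the query string; the pipeline name and routing value below are hypothetical:

POST _bulk?pipeline=my-ingest-pipeline&routing=user-123
{ "index": { "_index": "movies", "_id": "tt1979320" } }
{ "title": "Rush", "year": 2013 }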

The following example demonstrates how to use the source:

version: "2"
opensearch-api-pipeline:
  source:
    opensearch_api:
  sink:
    - opensearch:
        hosts: ["https://opensearch:9200"]
        insecure: true
        username: "admin"
        password: "admin"
        index: "${getMetadata("opensearch_index")}"
        action: "${getMetadata("opensearch_action")}"
        document_id: "${getMetadata("opensearch_id")}"
        routing: "${getMetadata("opensearch_routing")}"
        pipeline: "${getMetadata("opensearch_pipeline")}"

Consider the following example request:

POST _bulk
{ "index": { "_index": "movies", "_id": "tt1979320" } }
{ "title": "Rush", "year": 2013 }

This request will be ingested into OpenSearch, and a new document will be created under the index movies with the document ID tt1979320 and the document source { "title": "Rush", "year": 2013 }.
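
To try this against a running pipeline, you can send the same request with curl. The host and port below are assumptions; use the address your opensearch_api source is configured to listen on:

curl -X POST "http://localhost:9202/_bulk" \
  -H "Content-Type: application/json" \
  --data-binary $'{ "index": { "_index": "movies", "_id": "tt1979320" } }\n{ "title": "Rush", "year": 2013 }\n'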

The Data Prepper maintainers are interested in further expanding this source to support other indexing APIs, allowing it to stand in for an OpenSearch cluster in ingestion workloads.
To learn more or provide feedback, see Provide an OpenSearch API source #4180.

Kinesis source

Amazon Kinesis Data Streams is a serverless streaming data service for capturing and processing data at scale.
Data Prepper 2.10 also introduces a new source named kinesis that can ingest stream records from multiple Kinesis data streams into OpenSearch clusters.
You can configure it to start reading from either the oldest untrimmed record or the most recent record.
If you enable end-to-end acknowledgments, the source checkpoints each stream to prevent duplicate processing of records.

The following is an example pipeline:

version: "2"
kinesis-pipeline:
  source:
    kinesis:
      codec:
        newline:
      streams:
        - stream_name: "MyStream1"
          initial_position: LATEST
          checkpoint_interval: "PT5M"
        - stream_name: "MyStream2"
          # Start reading from the oldest untrimmed record in the stream.
          initial_position: EARLIEST
          consumer_strategy: "polling"
          polling:
            max_polling_records: 100
            idle_time_between_reads: "250ms"
  sink:
    # A pipeline requires a sink; stdout is used here to keep the example minimal.
    - stdout:
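
To enable the end-to-end acknowledgments described above, set the acknowledgments flag on the source. The flag shown here follows the convention of other Data Prepper sources, such as s3, and is assumed to apply to kinesis as well; verify it against the kinesis source documentation:

kinesis-pipeline:
  source:
    kinesis:
      # Assumed to follow the source-level flag used by other sources such as s3.
      acknowledgments: true
      streams:
        - stream_name: "MyStream1"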

Other features and improvements

Data Prepper 2.10 has introduced a number of other improvements:

  • The kafka source now supports authentication with an Apache Kafka cluster using SASL/SCRAM in addition to the SASL/PLAIN authentication provided in previous versions. See the configuration sketch after this list.
  • Data Prepper can now parse OpenTelemetry logs from sources such as Amazon Simple Storage Service (Amazon S3). The new otel_logs codec parses data from OpenTelemetry Protocol (OTLP) JSON-formatted files. You can now write OpenTelemetry logs using the AWS S3 Exporter for OpenTelemetry Collector and read them with Data Prepper, as shown in the second sketch after this list.
  • The maintainers have also improved performance by adding an internal cache for event keys. Data Prepper administrators can configure this cache as needed.
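
The following kafka source sketch shows where SCRAM credentials fit. The key names under sasl and the mechanism value are assumptions modeled on the existing SASL/PLAIN layout, so verify them against the kafka source documentation:

version: "2"
kafka-pipeline:
  source:
    kafka:
      bootstrap_servers:
        - "localhost:9092"
      topics:
        - name: "my-topic"
          group_id: "my-group"
      authentication:
        sasl:
          # Assumed key names, mirroring the existing SASL/PLAIN configuration.
          scram:
            username: "kafka-user"
            password: "kafka-password"
            mechanism: "SCRAM-SHA-512"
  sink:
    - stdout: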
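
And the following pipeline sketch reads OTLP JSON log files from Amazon S3 using the new otel_logs codec. The queue URL, AWS Region, and index name are placeholders:

version: "2"
otel-logs-pipeline:
  source:
    s3:
      codec:
        otel_logs:
      notification_type: "sqs"
      sqs:
        queue_url: "https://sqs.us-east-1.amazonaws.com/123456789012/otel-logs-queue"
      aws:
        region: "us-east-1"
  sink:
    - opensearch:
        hosts: ["https://opensearch:9200"]
        index: "otel-logs"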

Next steps

Data Prepper 2.10 is available for download now. To get started, see the Data Prepper documentation.

Thanks to our contributors!

The following community members contributed to this release. Thank you!