ASF Project Spotlight: Apache DolphinScheduler  

Can you tell us a bit about the project?  
Apache DolphinScheduler is a distributed, extensible open-source workflow orchestration platform with a powerful visual DAG interface that lets users build high-performance workflows with little code. DolphinScheduler was donated to the Apache Software Foundation (ASF) in 2019, becoming an ASF Incubator project, and became an ASF Top-Level Project in 2020.

When was the project started and why? 
DolphinScheduler was launched in 2017 to create an easy-to-use data scheduling tool that could solve the “complex task dependencies” problem, provide real-time monitoring of ETL running status, and support multi-tenancy, multiple task types, high availability (HA), and scalability. Before projects like DolphinScheduler, big data platforms were often hindered by complex ETL dependencies, poor usability, and difficult maintenance and re-development.

Who is your audience, and what key features of the technology do you believe will excite people? 
DolphinScheduler is built for data scientists, data engineers, data analysts, and other data practitioners who need a simple, easy-to-use way to solve big data workflow scheduling problems and speed up ETL development.

DolphinScheduler features and highlights include:

  • Fast workflow creation: Drag-and-drop workflow management speeds up authoring, and workflows can also be defined in code through Python, YAML, or the Open API, so users can create a workflow within a minute.
  • Steady data execution: DolphinScheduler allows users to execute big data tasks with high concurrency, high throughput, and low latency.
  • Powerful data backfill and workflow version control.
  • Easy management of complex tasks through cross-project and cross-workflow task dependencies.

What technology problem is Apache DolphinScheduler solving?
DolphinScheduler solves the problem of “complex task dependencies” while providing real-time monitoring of ETL running status and supporting multi-tenancy, multiple task types, HA, and scalability.
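
To make the “complex task dependencies” problem concrete, here is a minimal sketch in plain Python. It uses only the standard library's graphlib module and is not DolphinScheduler's API or implementation; the task names and the execution_waves helper are hypothetical. It shows how a dependency graph (a DAG) determines which tasks may run in parallel and which must wait.

```python
from graphlib import TopologicalSorter

# Hypothetical ETL tasks, each mapped to the set of tasks it depends on.
dependencies = {
    "extract_orders": set(),
    "extract_users": set(),
    "join_tables": {"extract_orders", "extract_users"},
    "build_report": {"join_tables"},
}


def execution_waves(deps):
    """Group tasks into waves; tasks within one wave have no unmet dependencies."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make the output stable
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished so dependents become ready
    return waves


waves = execution_waves(dependencies)
print(waves)
```

A scheduler still has to handle retries, timing, and distribution across workers on top of this ordering, which is where a dedicated platform comes in.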

Why is this work important?
DolphinScheduler has established a powerful big data workflow management platform that is highly reliable, user-friendly, and highly extensible. It supports multi-tenant environments and more than 30 task types, including Apache Spark, Apache Flink, Apache Hive, Shell, Python, and sub_process. It openly and freely provides a comprehensive solution to the problem of complex task dependencies in the field of big data, which greatly promotes the efficient use of big data globally. This has helped enterprises, organizations, and individuals enhance the value of their data.

The ASF’s mission is to provide software for the public good. In what ways does your project embody the ASF mission and “community over code” ethos?
Learning and practicing the Apache Way has permeated the entire development process of DolphinScheduler, and we are committed to practicing the essence of the ASF’s “Community Over Code” culture. The community is greater than the code itself; sustaining it requires thinking beyond the code about how to collaborate, how to communicate, and how to keep the community healthy as it moves forward. This constant reflection has enabled DolphinScheduler to attract people from all over the world who like the project and support open source, and who build the project and community together. We promote the concept of “community over code” through various channels, and the power of the community constantly improves the project, allowing more people to enjoy the revolution brought by open-source big data workflow scheduling technology.

Are there any use cases you would like to tell us about? 
DolphinScheduler has been deployed in 3000+ instances spanning enterprise, organizational, and personal use. One notable enterprise user is Changan Auto. DolphinScheduler helps Changan Auto’s intelligent internet-connected vehicle cloud platform handle tens of millions of data inputs. Changan Auto uses DolphinScheduler to manage prediction models, build timed task workflows that extract the signal data required for model prediction from the cluster, and centrally manage SQL analysis and Python code. With DolphinScheduler, Changan Auto can build a unified data platform in conjunction with data integration tools such as SeaTunnel and Sqoop. This allows data from multiple sources to be integrated smoothly and efficiently while being transmitted in both directions under a hybrid cloud architecture and across private and public cloud clusters.

For more details, please refer to this case study: Changan Auto Intelligent Vehicle Cloud Platform introduces the core workflow orchestration system Apache DolphinScheduler

What has been your experience growing the community? 
Growing the community has been a tough but interesting challenge. The community started with just a small group of people and has now grown to 550+ contributors from around the globe. People from China, the United States, South Korea, India, Europe, the Philippines, Singapore, and Australia, among others, are attracted to the goal of building a big data workflow scheduling platform that is more usable for everyone.

What’s the best way to learn about the project and try it out? 
To learn more about DolphinScheduler, you can refer to this quick start tutorial to try it out: https://dolphinscheduler.apache.org/en-us/docs/3.2.1/quick-start_menu

How can others contribute to this project – code contributions being only one of the ways? 
DolphinScheduler welcomes both code and non-code contributions. Anyone interested in the project and open-source software can find a suitable way to join the ranks of contributors.


The ASF is home to nearly 9,000 committers contributing to more than 320 active projects including Apache Airflow, Apache Camel, Apache Flink, Apache HTTP Server, Apache Kafka, and Apache Superset. With the support of volunteers, developers, stewards, and more than 75 sponsors, ASF projects create open source software that is used ubiquitously around the world. This work helps us realize our mission of providing software for the public good.

In the midst of hosting community events, engaging in collaboration, producing code and so much more, we often forget to take a moment to recognize and adequately showcase the important work being done across the ASF ecosystem. This blog series aims to do just that: shine a spotlight on the projects that help make the ASF community vibrant, diverse and long lasting. We want to share stories, use cases and resources among the ASF community and beyond so that the hard work of ASF communities and their contributors is not overlooked. 

If you are part of an ASF project and would like to be showcased, please reach out to [email protected]

The post ASF Project Spotlight: Apache DolphinScheduler appeared first on The Apache Software Foundation Blog.