ASF Project Spotlight: Apache SeaTunnel  

Can you tell us a bit about the project?
Apache SeaTunnel is a high-performance, distributed data integration tool for massive datasets. The project was originally developed in 2017, entered the ASF Incubator in December 2021, and became an ASF Top-Level Project in June 2023.
When was the project started and why?
Originally named…

PyTorch vs. TensorFlow for building streaming data apps

Machine learning (ML) has transformed problem-solving in software development. At its core, ML involves training algorithms to perform specific tasks by learning from data rather than being explicitly programmed to do so. Various frameworks offer prebuilt methods, functions, and structures that simplify the complex tasks of designing, training, and deploying…

The Significance of Databricks’ Acquisition of Tabular: A Triumph for Open Frameworks in Data

In a strategic move that has sent ripples through the data analytics industry, Databricks announced its acquisition of Tabular, a data platform built by the original creators of Apache Iceberg. This acquisition underscores the growing importance of open frameworks in the data landscape, heralding a new era of innovation, collaboration, and…

Building efficient workflows: Asynchronous Request-Reply pattern

Many modern applications and services depend on remote APIs to provide business logic and compose functionality. These API calls commonly occur over HTTP and follow request–response semantics. However, not all APIs can respond quickly enough to send a synchronous reply over the same connection, especially when the…
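
As a rough illustration of the pattern named in the title, here is a minimal in-process sketch, assuming the common form of Asynchronous Request-Reply: the server answers immediately with `202 Accepted` and a `Location` header pointing at a status endpoint, and the client polls that endpoint until the work is done. The `AsyncApi` class, the `Response` type, the `/status/<id>` path, and the uppercase "work" are all hypothetical stand-ins; no real HTTP server is involved, and a real implementation might instead redirect to the finished resource rather than return the result inline.

```python
import threading
import time
import uuid
from dataclasses import dataclass, field


@dataclass
class Response:
    # Minimal stand-in for an HTTP response: status code, headers, body.
    status: int
    headers: dict = field(default_factory=dict)
    body: object = None


class AsyncApi:
    """Hypothetical in-process server illustrating Asynchronous Request-Reply."""

    def __init__(self, work_seconds: float = 0.2):
        self._jobs = {}  # job_id -> result (None while the job is still running)
        self._lock = threading.Lock()
        self._work_seconds = work_seconds

    def submit(self, payload: str) -> Response:
        # Accept the request right away and hand the slow work to a background thread.
        job_id = str(uuid.uuid4())
        with self._lock:
            self._jobs[job_id] = None
        threading.Thread(target=self._run, args=(job_id, payload), daemon=True).start()
        # 202 Accepted plus a Location header telling the client where to poll.
        return Response(202, {"Location": f"/status/{job_id}"})

    def _run(self, job_id: str, payload: str) -> None:
        time.sleep(self._work_seconds)  # simulate a long-running backend task
        with self._lock:
            self._jobs[job_id] = payload.upper()

    def status(self, path: str) -> Response:
        job_id = path.rsplit("/", 1)[-1]
        with self._lock:
            if job_id not in self._jobs:
                return Response(404)
            result = self._jobs[job_id]
        if result is None:
            # Still running: hint that the client should retry later.
            return Response(200, {"Retry-After": "1"}, {"state": "running"})
        return Response(200, {}, {"state": "done", "result": result})


def call_and_wait(api: AsyncApi, payload: str, poll_interval: float = 0.05, timeout: float = 5.0):
    # Client side: submit, then poll the status endpoint until the job completes.
    # For brevity this polls at a fixed interval instead of honoring Retry-After.
    accepted = api.submit(payload)
    if accepted.status != 202:
        raise RuntimeError(f"unexpected status {accepted.status}")
    location = accepted.headers["Location"]
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        resp = api.status(location)
        if resp.body and resp.body.get("state") == "done":
            return resp.body["result"]
        time.sleep(poll_interval)
    raise TimeoutError(f"job at {location} did not finish within {timeout}s")


if __name__ == "__main__":
    api = AsyncApi()
    print(call_and_wait(api, "hello"))  # HELLO
```

The key design choice is that the original connection is released as soon as the work is accepted; the client trades one long-blocking call for several short status checks.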

The Architect’s Guide to Machine Learning Operations (MLOps)

MLOps, short for Machine Learning Operations, is a set of practices and tools aimed at addressing the specific needs of engineers building models and moving them into production. Some organizations start off with a few homegrown tools that version datasets after each experiment and checkpoint models after every epoch of…
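
To make the "homegrown tools" described above concrete, here is a minimal sketch, assuming a toy setup: the dataset is versioned by hashing its contents, and a checkpoint recording that dataset version is written after every epoch. The one-parameter model (`y = w * x` fit by gradient descent), the `dataset_version` and `train` helpers, and the `epoch_NNN.json` file layout are all hypothetical illustrations, not any particular MLOps tool's format.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def dataset_version(rows: list[tuple[float, float]]) -> str:
    # Content-address the dataset: identical data always yields the same version id.
    blob = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:12]


def train(rows: list[tuple[float, float]], epochs: int, lr: float, run_dir: Path) -> float:
    """Fit y = w * x by gradient descent, writing a checkpoint after every epoch."""
    version = dataset_version(rows)
    w = 0.0
    for epoch in range(1, epochs + 1):
        # One full pass over the data: gradient of mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in rows) / len(rows)
        w -= lr * grad
        # Record which dataset version produced this checkpoint so the run is traceable.
        ckpt = {"epoch": epoch, "w": w, "dataset_version": version}
        (run_dir / f"epoch_{epoch:03d}.json").write_text(json.dumps(ckpt))
    return w


if __name__ == "__main__":
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # toy data where y = 2x
    with tempfile.TemporaryDirectory() as tmp:
        run_dir = Path(tmp)
        w = train(data, epochs=50, lr=0.05, run_dir=run_dir)
        print(f"learned w={w:.3f}")  # learned w=2.000
        print(sorted(p.name for p in run_dir.iterdir())[-1])  # epoch_050.json
```

Scripts like this work for a single experiment, but keeping every checkpoint and dataset version consistent across many runs and teammates is exactly the kind of burden that dedicated MLOps tooling is meant to take over.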

Migrate to AI-Ready infrastructure: Hitachi Content Platform to MinIO

Transitioning from Hitachi Content Platform (HCP) to MinIO has never been easier, thanks to our HCP-to-MinIO tool. Developed to support our customers' evolving storage needs, this tool is freely available on GitHub and greatly simplifies the migration process. Many organizations are transitioning to leverage MinIO's modern, scalable, and high-performance object…

The Real Reasons Why AI is Built on Object Storage

tl;dr: In this post, we will explore four technical reasons why AI workloads rely on high-performance object storage.

1. No Limits on Unstructured Data

Figure: A typical (single-node) AI model training setup, with PyTorch feeding GPUs data from object storage.

In the current paradigm of machine learning, performance and ability scale with compute, which…

The Architect’s Guide to the GenAI Tech Stack – Ten Tools

This post first appeared on The New Stack on June 3rd, 2024. I previously wrote about the modern data lake reference architecture, addressing the challenges every enterprise faces: more data, aging Hadoop tooling (specifically HDFS), and greater demands for RESTful APIs (S3) and performance. But I want to fill…