IT Architecture – Page 5

ASF Project Spotlight: Apache SeaTunnel

Post published: 12/07/2024
Post category: IT Architecture

Can you tell us a bit about the project? Apache SeaTunnel is a high-performance, distributed, massive data integration tool. The project was originally developed in 2017, entered the ASF Incubator in December 2021, and became an ASF Top-Level Project in June 2023. When was the project started and why? Originally named…

Data-Centric AI with Snorkel and MinIO

Post published: 11/07/2024
Post category: IT Architecture

With all the talk in the industry today regarding large language models with their encoders, decoders, multi-headed attention layers, and billions (soon trillions) of parameters, it is tempting to believe that good AI is the result of model design only. Unfortunately, this is not the case. Good AI requires more…

PyTorch vs. TensorFlow for building streaming data apps

Post published: 09/07/2024
Post category: IT Architecture

Machine learning (ML) has transformed problem-solving in software development. At its core, ML involves training algorithms to perform specific tasks by learning from data rather than being explicitly programmed to do so. Various frameworks offer prebuilt methods, functions, and structures that simplify the complex tasks of designing, training, and deploying…

The Significance of Databricks’ Acquisition of Tabular: A Triumph for Open Frameworks in Data

Post published: 05/07/2024
Post category: IT Architecture

In a strategic move that has sent ripples through the data analytics industry, Databricks announced its acquisition of Tabular, a data platform by the original creators of Apache Iceberg. This acquisition underscores the growing importance of open frameworks in the data landscape, heralding a new era of innovation, collaboration, and…

Building efficient workflows: Asynchronous Request-Reply pattern

Post published: 02/07/2024
Post category: IT Architecture

Many modern applications and services depend on remote APIs to provide business logic and compose functionality. These API calls commonly occur over the HTTP protocol and follow request–response semantics. However, not all APIs can respond quickly enough to send a synchronous reply over the same connection, especially when the…

The Architect’s Guide to Machine Learning Operations (MLOps)

Post published: 29/06/2024
Post category: IT Architecture

MLOps, short for Machine Learning Operations, is a set of practices and tools aimed at addressing the specific needs of engineers building models and moving them into production. Some organizations start off with a few homegrown tools that version datasets after each experiment and checkpoint models after every epoch of…

Migrate to AI-Ready infrastructure: Hitachi Content Platform to MinIO

Post published: 28/06/2024
Post category: IT Architecture

Transitioning from Hitachi Content Platform (HCP) to MinIO has never been easier, thanks to our HCP-to-MinIO tool. Developed to support our customers' evolving storage needs, this tool is freely available on GitHub and greatly simplifies the migration process. Many organizations are transitioning to leverage MinIO's modern, scalable, and high-performance object…

Earn your RAG-ing rights with MinIO

Post published: 28/06/2024
Post category: IT Architecture

It’s often been said that in the age of AI, data is your moat. To that end, building a production-grade RAG application demands a suitable data infrastructure to store, version, process, evaluate, and query chunks of data that comprise your proprietary corpus. Since MinIO takes a data-first approach to…

The Real Reasons Why AI is Built on Object Storage

Post published: 25/06/2024
Post category: IT Architecture

tl;dr: In this post, we will explore four technical reasons why AI workloads rely on a high-performance object store. 1. No Limits on Unstructured Data. (Figure: A typical single-node AI model training setup, with PyTorch feeding GPUs data from an object store.) In the current paradigm of machine learning, performance and ability scale with compute, which…

The Architect’s Guide to the GenAI Tech Stack – Ten Tools

Post published: 25/06/2024
Post category: IT Architecture

This post first appeared on The New Stack on June 3rd, 2024. I previously wrote about the modern data lake reference architecture, addressing the challenges in every enterprise: more data, aging Hadoop tooling (specifically HDFS) and greater demands for RESTful APIs (S3) and performance, but I want to fill…