Mario CartiaHow to migrate data from Postgres (or any RDBMS) to MongoDB (or any NoSQL) in denormalized form…Migrating data from a relational database (RDBMS) to a NoSQL one is a very common task. One of the most common use cases in which you want…8 min read·Oct 12, 2022----
Mario CartiaThe fastest way to get a Jupyter-based local development environment for Apache Spark 3 in ScalaAs one of my main activities is training on Big Data topics, I often find myself having to set up local development environments for using…3 min read·Jun 15, 2021----
Mario CartiainAgile Lab EngineeringSpark 3.0: First hands-on approach with Adaptive Query Execution (Part 3)In the previous articles (1)(2), we started analyzing the individual features of Adaptive Query Execution introduced on Spark 3.0. In…4 min read·Dec 7, 2020----
Mario CartiainAgile Lab EngineeringSpark 3.0: First hands-on approach with Adaptive Query Execution (Part 2)In the previous article, we started analyzing the individual features of Adaptive Query Execution introduced on Spark 3.0. In particular…4 min read·Nov 5, 2020--2--2
Mario CartiainAgile Lab EngineeringSpark 3.0: First hands-on approach with Adaptive Query Execution (Part 1)Apache Spark is a distributed data processing framework that is suitable for any Big Data context thanks to its features. Despite being a…7 min read·Oct 14, 2020--1--1
Mario CartiainAgile Lab EngineeringHow to create an Apache Spark 3.0 development cluster on a single machine using DockerApache Spark is the most widely used in-memory parallel distributed processing framework in the field of Big Data advanced analytics. The…3 min read·Sep 23, 2020--2--2
Mario CartiaUtilizzo del formato PMML per esporre tramite REST API dei modelli di Machine Learning attraverso…Nel post precedente ho illustrato in modo semplice come creare una REST API a partire da un modello di Machine Learning realizzato in…3 min read·Oct 10, 2019----
Mario CartiaCome creare una REST API per il serving di un modello di Machine Learning Python con Google…La combinazione Python + Jupyter è oggi quasi uno standard-de-facto per quanto riguarda lo sviluppo di modelli di Machine (o Deep)…3 min read·Oct 7, 2019--1--1
Mario CartiaGestire files di piccole dimensioni su HDFS: analisi del problema e best practicesHadoop è ad oggi la piattaforma Big Data standard-de-facto nel mondo enterprise. In particolare HDFS, il modulo Hadoop che implementa la…4 min read·May 24, 2019----