Mario Cartia · How to migrate data from Postgres (or any RDBMS) to MongoDB (or any NoSQL) in denormalized form… · Migrating data from a relational database (RDBMS) to a NoSQL one is a very common task. One of the most common use cases in which you want… · Oct 12, 2022
Mario Cartia · The fastest way to get a Jupyter-based local development environment for Apache Spark 3 in Scala · As one of my main activities is training on Big Data topics, I often find myself having to set up local development environments for using… · Jun 15, 2021
Mario Cartia in Agile Lab Engineering · Spark 3.0: First hands-on approach with Adaptive Query Execution (Part 3) · In the previous articles (1)(2), we started analyzing the individual features of Adaptive Query Execution introduced on Spark 3.0. In… · Dec 7, 2020
Mario Cartia in Agile Lab Engineering · Spark 3.0: First hands-on approach with Adaptive Query Execution (Part 2) · In the previous article, we started analyzing the individual features of Adaptive Query Execution introduced on Spark 3.0. In particular… · Nov 5, 2020
Mario Cartia in Agile Lab Engineering · Spark 3.0: First hands-on approach with Adaptive Query Execution (Part 1) · Apache Spark is a distributed data processing framework that is suitable for any Big Data context thanks to its features. Despite being a… · Oct 14, 2020
Mario Cartia in Agile Lab Engineering · How to create an Apache Spark 3.0 development cluster on a single machine using Docker · Apache Spark is the most widely used in-memory parallel distributed processing framework in the field of Big Data advanced analytics. The… · Sep 23, 2020
Mario Cartia · Using the PMML format to expose Machine Learning models via REST API through… · In the previous post I showed, in a simple way, how to create a REST API from a Machine Learning model built in… · Oct 10, 2019
Mario Cartia · How to create a REST API for serving a Python Machine Learning model with Google… · The Python + Jupyter combination is today almost a de facto standard for the development of Machine (or Deep)… · Oct 7, 2019
Mario Cartia · Managing small files on HDFS: problem analysis and best practices · Hadoop is currently the de facto standard Big Data platform in the enterprise world. In particular HDFS, the Hadoop module that implements the… · May 24, 2019