Apache NiFi - A Complete Guide
This ebook is dedicated to all engineers, architects and administrators. We start by describing the architecture of NiFi and go through its specific details, such as the nature of NiFi's FlowFiles and NiFi High Availability. We then cover deploying and maintaining NiFi in different environments, developing and managing data pipelines, security, and CI/CD.
EBOOK
About
The possibility of using cloud-native services is one of the most popular topics in Big Data projects today. Many companies are interested in migrating their data platforms to the cloud or, if they are already there, in using cloud data processing and analysis tools more intensively. One popular option is Google Cloud Platform, a platform and set of tools for implementing solutions on infrastructure managed by Google.
Contents of the book
I. NiFi architecture - Universe made out of flow files
  • Masterless architecture
  • Boxes don't touch - independent workers
  • Distributed but local, dual nature of flowfiles
  • Anatomy of the flow file
  • Overcoming the NiFi limitations
II. Overcoming the NiFi limitations
  • What should you do when you reach the limits?
  • Custom Groovy scripts
  • We need to go deeper - custom processors
  • Offloading business logic from NiFi
III. Apache NiFi and Apache NiFi Registry - Management & Operations
  • Cluster vs. single NiFi instance
  • Bare Metal Servers or Virtual Machines
  • Kubernetes
  • NiFi and Apache Ranger - how should you combine them?
  • NiFi and Kerberos
IV. CI/CD of NiFi flows
  • NiFi Registry - the repository for the NiFi flows
  • Environment separation intro
  • NiFi Toolkit to the rescue
    • Additional hurdles in detail
V. One-year history of a certain NiFi flow
  • We need to keep it simple
  • Simple features make the system complex
  • From Proof-of-Concept to production
VI. Recommendations for using Apache NiFi
  • Harness the complexity
  • Choose the right tool
  • Keep your finger on the pulse
  • CI and CD take time
We work hard every day to make the lives of our clients better and happier!
About the Authors
Mateusz Pytel - Throughout his career, Mateusz has worked with multiple open-source Big Data technologies in areas such as marketing, banking and e-commerce. Over time, his focus shifted from on-premises solutions to cloud adoption, working mostly with Google Cloud.

Arkadiusz Gąsior - Data Engineer with over 15 years of professional experience. He has used various data tools and frameworks, including Google Cloud Platform, Hadoop, Kafka and Elasticsearch. His recent achievements include working on GCP training and automation, designing a petabyte-scale cloud-based data warehouse, and building tools for data ingestion, transformation and analytics.

Tomasz Żukowski - Tomasz gained experience working in logistics, medical services and IT. As a data analyst, he built reports using Microsoft Excel and VBA. He has also worked with SQL and Python. Currently, he analyzes Big Data using the Hadoop ecosystem, Python and R.

Piotr Pilis - Piotr has been working in Big Data for over three years, mainly in the Google Cloud Platform environment. At GetInData, he implements Data Lakes and analytical use cases based on Google Cloud.



We help data-oriented organizations succeed with open-source and cloud technologies such as Flink, Kafka, Spark, Hadoop and Google Cloud Platform by providing outsourcing, consulting and training services.

We have already worked with dozens of companies, ranging from fast-growing European startups to global corporations in the pharmaceutical, FMCG, banking and media sectors. We truly focus on helping our customers achieve real ROI from their data processing.




© 2020 Getindata sp. z o.o. sp. k