Matěj Černý
  • Bluesky
  • LinkedIn
  • Nostr
  • X
  • Čeština

Category: Apache Spark

The volume of data grows exponentially every year, and traditional relational databases were not designed to handle it. Processing intervals keep getting shorter, and more of the data being analyzed is unstructured and comes from multiple sources. Technologies in the Hadoop ecosystem address these problems, including the HDFS distributed file system, the Apache Spark computing framework, and columnar storage formats such as Apache Parquet.

Wick: A zero cost type safe Apache Spark API
2026-05-05

Apache Spark / Wick: Type-Safe Spark API

Whether you prefer writing queries in Spark SQL or are more used to calling functions through the DataFrame API, it’s only a matter of time before you run into a classic problem.

Matěj Černý
2025-12-26

Apache Spark / Procedural SQL

Apache Spark 4.0 introduces a significant enhancement to its SQL capabilities: experimental support for procedural SQL.

Matěj Černý
2022-05-10

Apache Spark / Dataframe API vs. SQL

There are several ways to work with data in Apache Spark. If you come from a software development background, you will likely lean towards using the DataFrame API

Matěj Černý
2019-12-03

Apache Spark / CSV file

Apache Spark, as one of the main representatives of distributed computing systems, supports several formats for reading and writing. Probably the simplest of these is the delimited text format

Matěj Černý

© Matěj Černý