Integration with Spark | ClickHouse Native JDBC

# Integration with Spark

# Requirements

Java 8, Scala 2.11/2.12, Spark 2.4
Or Java 8/11, Scala 2.12, Spark 3.0/3.1

For Spark 3.2, the Spark ClickHouse Connector is recommended.

Notes: Spark 2.3.x (EOL) should also work. We test on both Java 8 and Java 11, but Spark has officially supported Java 11 only since 3.0.0.

# Import

Gradle

// available since 2.4.0
// `implementation` replaces the `compile` configuration, which was removed in Gradle 7
implementation "com.github.housepower:clickhouse-integration-spark_2.11:${clickhouse_native_jdbc_version}"

Maven

<!-- available since 2.4.0 -->
<dependency>
    <groupId>com.github.housepower</groupId>
    <artifactId>clickhouse-integration-spark_2.11</artifactId>
    <version>${clickhouse-native-jdbc.version}</version>
</dependency>
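
sbt

For sbt builds, the same coordinates can be written as below. This is a sketch rather than an officially documented snippet: it assumes that an artifact is published for the Scala binary version of your build (`%%` appends `_2.11` or `_2.12` automatically), and the version value is only a placeholder taken from the "available since" comments above.

```scala
// build.sbt (sketch)
// Assumption: %% appends the Scala binary suffix (_2.11 or _2.12),
// so scalaVersion must match an artifact that is actually published.
// Assumption: replace the placeholder with the release you actually use.
val clickhouseNativeJdbcVersion = "2.4.0" // placeholder: earliest version mentioned above

libraryDependencies +=
  "com.github.housepower" %% "clickhouse-integration-spark" % clickhouseNativeJdbcVersion
```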

# Examples

Make sure to register ClickHouseDialect before using it:

    JdbcDialects.registerDialect(ClickHouseDialect)

Read from ClickHouse into a DataFrame

val df = spark.read
    .format("jdbc")
    .option("driver", "com.github.housepower.jdbc.ClickHouseDriver")
    .option("url", "jdbc:clickhouse://127.0.0.1:9000")
    .option("user", "default")
    .option("password", "")
    .option("dbtable", "db.test_source")
    .load()
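
The driver, URL, user, and password options are identical in the read and write examples on this page, so one option is to build them once and reuse them. The following is a minimal sketch, not part of ClickHouse Native JDBC: `ClickHouseJdbcOptions` and `connectionOptions` are hypothetical names, and the defaults simply mirror the values used here.

```scala
// Hypothetical helper for illustration; not provided by ClickHouse Native JDBC or Spark.
object ClickHouseJdbcOptions {

  // Builds the connection-level options shared by reads and writes.
  // Defaults mirror the example values on this page.
  def connectionOptions(
      host: String = "127.0.0.1",
      port: Int = 9000,
      user: String = "default",
      password: String = ""
  ): Map[String, String] =
    Map(
      "driver"   -> "com.github.housepower.jdbc.ClickHouseDriver",
      "url"      -> s"jdbc:clickhouse://$host:$port",
      "user"     -> user,
      "password" -> password
    )
}

// Usage with Spark (assumes an existing SparkSession named `spark`):
//   val df = spark.read
//     .format("jdbc")
//     .options(ClickHouseJdbcOptions.connectionOptions())
//     .option("dbtable", "db.test_source")
//     .load()
```

`DataFrameReader.options` accepts a `Map[String, String]`, so the per-table `dbtable` option can still be set separately for each read.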

Write a DataFrame to ClickHouse (truncating the target table is supported)

df.write
    .format("jdbc")
    .mode("overwrite")
    .option("driver", "com.github.housepower.jdbc.ClickHouseDriver")
    .option("url", "jdbc:clickhouse://127.0.0.1:9000")
    .option("user", "default")
    .option("password", "")
    .option("dbtable", "db.test_target")
    .option("truncate", "true")
    .option("batchsize", 10000)
    .option("isolationLevel", "NONE")
    .save()
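
The write-specific options in the example above are standard Spark JDBC options. As a rough guide, the sketch below groups them with a comment on what each one asks Spark to do. `ClickHouseWriteOptions` and `writeOptions` are hypothetical names introduced only for illustration, and the default batch size is just the value used above.

```scala
// Hypothetical helper for illustration; not provided by ClickHouse Native JDBC or Spark.
object ClickHouseWriteOptions {

  // Builds the table-level options used by the write example on this page.
  def writeOptions(table: String, batchSize: Int = 10000): Map[String, String] = {
    require(batchSize > 0, "batchSize must be positive")
    Map(
      "dbtable" -> table,
      // With mode("overwrite"), truncate the existing table instead of
      // dropping and recreating it.
      "truncate" -> "true",
      // Number of rows Spark sends per JDBC batch insert.
      "batchsize" -> batchSize.toString,
      // Ask Spark not to wrap each partition's inserts in a transaction.
      "isolationLevel" -> "NONE"
    )
  }
}

// Usage with Spark (assumes an existing DataFrame named `df`):
//   df.write
//     .format("jdbc")
//     .mode("overwrite")
//     .option("driver", "com.github.housepower.jdbc.ClickHouseDriver")
//     .option("url", "jdbc:clickhouse://127.0.0.1:9000")
//     .option("user", "default")
//     .option("password", "")
//     .options(ClickHouseWriteOptions.writeOptions("db.test_target"))
//     .save()
```

Keeping `truncate` set to `true` matters when the target table was created by hand: without it, an overwrite lets Spark drop the table and recreate it from the DataFrame schema.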
