# Integration with Spark
# Requirements
- Java 8, Scala 2.11/2.12, Spark 2.4
- Or Java 8/11, Scala 2.12, Spark 3.0/3.1
For Spark 3.2, the Spark ClickHouse Connector is recommended.
Note: Spark 2.3.x (EOL) should also work. We test on both Java 8 and Java 11, but Spark has officially supported Java 11 only since 3.0.0.
# Import
- Gradle
// available since 2.4.0
compile "com.github.housepower:clickhouse-integration-spark_2.11:${clickhouse_native_jdbc_version}"
- Maven
<!-- available since 2.4.0 -->
<dependency>
  <groupId>com.github.housepower</groupId>
  <artifactId>clickhouse-integration-spark_2.11</artifactId>
  <version>${clickhouse-native-jdbc.version}</version>
</dependency>
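For sbt builds, the same coordinates could be declared roughly as follows. This is a sketch rather than something taken from the project: the version value is a placeholder to replace with the release you actually use, and `%%` assumes an artifact is published for your Scala binary version (the `_2.11` suffix is shown above; a `_2.12` artifact is implied by the requirements).

```scala
// build.sbt
// Placeholder version: substitute the clickhouse-native-jdbc release you use
// (the Spark integration module is available since 2.4.0).
val clickhouseNativeJdbcVersion = "2.4.0"

// %% appends the Scala binary version, e.g. clickhouse-integration-spark_2.11
libraryDependencies += "com.github.housepower" %% "clickhouse-integration-spark" % clickhouseNativeJdbcVersion
```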
# Examples
Make sure to register ClickHouseDialect before using it:
import org.apache.spark.sql.jdbc.JdbcDialects
JdbcDialects.registerDialect(ClickHouseDialect)
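To confirm that the registration took effect, one option is to ask Spark which dialect it would pick for a ClickHouse URL. The sketch below assumes that `ClickHouseDialect` is shipped in the `org.apache.spark.sql.jdbc` package by the integration module; adjust the import if your version places it elsewhere. The object name `RegisterClickHouseDialect` is made up for illustration.

```scala
// Assumption: ClickHouseDialect lives in org.apache.spark.sql.jdbc.
import org.apache.spark.sql.jdbc.{ClickHouseDialect, JdbcDialects}

object RegisterClickHouseDialect {
  def main(args: Array[String]): Unit = {
    val url = "jdbc:clickhouse://127.0.0.1:9000"

    // Register once per JVM, before any JDBC read or write against ClickHouse.
    JdbcDialects.registerDialect(ClickHouseDialect)

    // JdbcDialects.get returns the registered dialect whose canHandle(url)
    // accepts the URL, or Spark's no-op dialect when none matches.
    val dialect = JdbcDialects.get(url)
    println(s"Dialect selected for $url: ${dialect.getClass.getName}")
  }
}
```

Spark resolves the dialect from the JDBC URL, so no connection to ClickHouse is needed for this check.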
Read from ClickHouse into a DataFrame:
val df = spark.read
  .format("jdbc")
  .option("driver", "com.github.housepower.jdbc.ClickHouseDriver")
  .option("url", "jdbc:clickhouse://127.0.0.1:9000")
  .option("user", "default")
  .option("password", "")
  .option("dbtable", "db.test_source")
  .load()
Write a DataFrame to ClickHouse (truncate table is supported):
df.write
  .format("jdbc")
  .mode("overwrite")
  .option("driver", "com.github.housepower.jdbc.ClickHouseDriver")
  .option("url", "jdbc:clickhouse://127.0.0.1:9000")
  .option("user", "default")
  .option("password", "")
  .option("dbtable", "db.test_target")
  .option("truncate", "true")
  .option("batchsize", 10000)
  .option("isolationLevel", "NONE")
  .save()
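The snippets above assume an existing `spark` session. A minimal end-to-end sketch that creates the session, registers the dialect, and copies `db.test_source` into `db.test_target` might look like the following. The object name `ClickHouseCopyJob`, the helper `connectionOptions`, and the `local[*]` master are illustrative choices, not part of the library, and the `ClickHouseDialect` import path is assumed to be `org.apache.spark.sql.jdbc`.

```scala
import org.apache.spark.sql.SparkSession
// Assumption: ClickHouseDialect lives in org.apache.spark.sql.jdbc.
import org.apache.spark.sql.jdbc.{ClickHouseDialect, JdbcDialects}

object ClickHouseCopyJob {

  // Hypothetical helper: connection options shared by the reader and the writer.
  def connectionOptions(host: String, port: Int, user: String, password: String): Map[String, String] =
    Map(
      "driver"   -> "com.github.housepower.jdbc.ClickHouseDriver",
      "url"      -> s"jdbc:clickhouse://$host:$port",
      "user"     -> user,
      "password" -> password
    )

  def main(args: Array[String]): Unit = {
    // local[*] is only for trying the sketch out; on a cluster, drop master()
    // and pass it through spark-submit instead.
    val spark = SparkSession.builder()
      .appName("clickhouse-copy-job")
      .master("local[*]")
      .getOrCreate()

    // Must happen before the first JDBC read or write.
    JdbcDialects.registerDialect(ClickHouseDialect)

    val options = connectionOptions("127.0.0.1", 9000, "default", "")

    // Read the source table into a DataFrame.
    val df = spark.read
      .format("jdbc")
      .options(options)
      .option("dbtable", "db.test_source")
      .load()

    // Overwrite the target table, truncating it instead of dropping it.
    df.write
      .format("jdbc")
      .mode("overwrite")
      .options(options)
      .option("dbtable", "db.test_target")
      .option("truncate", "true")
      .option("batchsize", 10000)
      .option("isolationLevel", "NONE")
      .save()

    spark.stop()
  }
}
```

Keeping the connection settings in one map avoids repeating the driver, URL, and credentials for every read and write.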