Copying Big Oracle Tables Using Apache Spark
Background

Sometimes we need to copy table data from one database to another. In principle, the best way to do this is a database-specific export (expdp in Oracle lingo) followed by an import into the destination database (impdp in Oracle). But this method has some deficiencies, such as the inability to process a single table in parallel and the need for DBA access to the database. This post shows how to copy a table using Apache Spark and Apache Zeppelin.

Preparations

To allow Apache Spark to open Oracle JDBC connections, we need to add a dependency on ojdbc6.jar. To do this, write this paragraph in Zeppelin:

    %dep
    z.reset()
    z.load("/path/to/ojdbc6.jar")

Basic Approach

The most basic approach to copying table data is to retrieve the data in one query and save the resulting records in the target database.

    val tableNameSrc = "TABLENAME"
    val tableNameTrg = "TABLENAME"
    import java.util.Properties
    Class.forName("ora...