
Cassandra, Spark, and Microsoft SQL Server part 2: Talend OpenStudio


There are many parts involved in building a custom application that integrates with many different third-party and legacy systems. A big part of that is building out a centralized database and figuring out how to move data between all the applications. Senior Systems Engineer Judge Hiciano documents some of his experiences going through that process.

OpenStudio made it very easy for us to create Cassandra column families based on our Microsoft SQL Server (MSSS) tables and views. Depending on the goal and the impact on our MSSS database, we used MSSS views to get the data as close as possible to its final form. We then used Spark for the heavy lifting and any additional transformations needed.

 

1) Connects to the DSE cluster.

2) Connects to Microsoft SQL Server.

3) Selects the table or view through a SQL SELECT statement; this step can also pull the schema.

4) Can create, drop, or truncate the column family, creates the column family based on the schema pulled from SQL Server in step 3, and imports the data.

5) Closes the connections to both Microsoft SQL Server and DSE Cassandra.
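
To make step 4 a little more concrete, here is a minimal Python sketch of deriving a CQL CREATE TABLE statement from a SQL Server schema while keeping the column names unchanged. This is not what OpenStudio generates internally. The type mapping, the `build_create_table` helper, the `demo` keyspace, the choice of primary key, and the sample columns are all assumptions made for illustration.

```python
# Assumed mapping from SQL Server types to CQL types; adjust for your schema.
MSSQL_TO_CQL = {
    "int": "int",
    "bigint": "bigint",
    "bit": "boolean",
    "float": "double",
    "real": "float",
    "decimal": "decimal",
    "numeric": "decimal",
    "varchar": "text",
    "nvarchar": "text",
    "char": "text",
    "nchar": "text",
    "datetime": "timestamp",
    "datetime2": "timestamp",
    "uniqueidentifier": "uuid",
}


def build_create_table(keyspace, table, columns, primary_key):
    """Build a CQL CREATE TABLE statement from (name, mssql_type) pairs."""
    names = [name for name, _ in columns]
    if primary_key not in names:
        raise ValueError(f"primary key {primary_key!r} is not a column")
    defs = []
    for name, mssql_type in columns:
        # Strip any length/precision suffix, e.g. "varchar(50)" -> "varchar".
        base = mssql_type.split("(")[0].strip().lower()
        if base not in MSSQL_TO_CQL:
            raise ValueError(f"no CQL mapping for {mssql_type!r}")
        # Column names are carried over unchanged (1:1 naming).
        defs.append(f"{name} {MSSQL_TO_CQL[base]}")
    defs.append(f"PRIMARY KEY ({primary_key})")
    return f"CREATE TABLE IF NOT EXISTS {keyspace}.{table} ({', '.join(defs)})"


# Hypothetical view schema, as it might come back from step 3.
customer_columns = [
    ("customer_id", "int"),
    ("full_name", "nvarchar(100)"),
    ("created_at", "datetime"),
]
statement = build_create_table("demo", "customer_view", customer_columns, "customer_id")
print(statement)
```

Cassandra requires a primary key on every table, so the sketch takes it as an explicit argument rather than guessing one from the source schema.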

This process made moving the data over much quicker: with a 1:1 naming convention between MSSS and Cassandra, Spark was able to import the data without much customization.
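
One way to confirm that a table really follows the 1:1 naming convention before handing it to Spark is to compare the column lists on both sides. The sketch below is a hypothetical helper, not part of OpenStudio or Spark, and the column lists are made up. It compares names in lowercase because unquoted CQL identifiers are case-insensitive.

```python
def unmatched_columns(source_columns, cassandra_columns):
    """Return source column names with no same-named Cassandra column."""
    # Unquoted CQL identifiers are case-insensitive, so compare in lowercase.
    target = {name.lower() for name in cassandra_columns}
    return [name for name in source_columns if name.lower() not in target]


# Hypothetical column lists for illustration.
mssql_view = ["CustomerId", "FullName", "CreatedAt"]
cassandra_table = ["customerid", "fullname", "created"]
missing = unmatched_columns(mssql_view, cassandra_table)
print(missing)
```

An empty result means every source column has a same-named target, so no renaming step should be needed.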

This is part two of a three-part series. If you missed the first post, you can find it here.