Warehouses and Lakes
Marketing Analytics

How to load data from Mailchimp to Snowflake destination

Learn how to use Airbyte to synchronize your Mailchimp data into Snowflake destination within minutes.

TL;DR

This can be done by building a data pipeline manually, usually a Python script (you can leverage a tool as Apache Airflow for this). This process can take more than a full week of development. Or it can be done in minutes on Airbyte in three easy steps:

  1. set up Mailchimp as a source connector (using Auth, or usually an API key)
  2. set up Snowflake destination as a destination connector
  3. define which data you want to transfer and how frequently

You can choose to self-host the pipeline using Airbyte Open Source or have it managed for you with Airbyte Cloud.

This tutorial’s purpose is to show you how.

What is Mailchimp

Mailchimp is a global marketing automation platform aimed at small to medium-sized businesses. Mailchimp provides essential marketing tools for growing a successful business, enabling businesses to automate messages and send marketing emails, create targeted business campaigns, expedite analytics and reporting, and effectively and efficiently sell online.

What is Snowflake destination

A cloud data platform, Snowflake Data Cloud provides a warehouse-as-a-service built specifically for the cloud. The Snowflake platform is designed to empower many types of data workloads, and offers secure, immediate, governed access to a comprehensive network of data. Snowflake’s innovative technology goes above the capabilities of the ordinary database, supplying users all the functionality of database storage, query processing, and cloud services in one package.

Integrate Mailchimp with Snowflake destination in minutes

Try for free now

Prerequisites

  1. A Mailchimp account to transfer your customer data automatically from.
  2. A Snowflake destination account.
  3. An active Airbyte Cloud account, or you can also choose to use Airbyte Open Source locally. You can follow the instructions to set up Airbyte on your system using docker-compose.

Airbyte is an open-source data integration platform that consolidates and streamlines the process of extracting and loading data from multiple data sources to data warehouses. It offers pre-built connectors, including Mailchimp and Snowflake destination, for seamless data migration.

When using Airbyte to move data from Mailchimp to Snowflake destination, it extracts data from Mailchimp using the source connector, converts it into a format Snowflake destination can ingest using the provided schema, and then loads it into Snowflake destination via the destination connector. This allows businesses to leverage their Mailchimp data for advanced analytics and insights within Snowflake destination, simplifying the ETL process and saving significant time and resources.

Step 1: Set up Mailchimp as a source connector

1. Open the Airbyte platform and navigate to the "Sources" tab on the left-hand side of the screen.

2. Click on the "Add Source" button and select "Mailchimp" from the list of available connectors.

3. Enter a name for your Mailchimp source connector and click on the "Next" button.

4. Enter your Mailchimp API key in the provided field. You can find your API key by logging into your Mailchimp account and navigating to the "Account" section. From there, click on "Extras" and then "API keys".

5. Select the Mailchimp lists you want to sync with Airbyte by checking the boxes next to their names.

6. Choose the sync mode you want to use for your Mailchimp data. You can choose between "Full Refresh" and "Incremental".

7. Click on the "Test" button to ensure that your Mailchimp source connector is working properly.

8. If the test is successful, click on the "Create" button to save your Mailchimp source connector.

9. You can now use your Mailchimp source connector to sync your Mailchimp data with other destinations on the Airbyte platform.

Step 2: Set up Snowflake destination as a destination connector

1. First, navigate to the Airbyte website and log in to your account.

2. Once you are logged in, click on the "Destinations" tab on the left-hand side of the screen.

3. Scroll down until you find the Snowflake Data Cloud destination connector and click on it.

4. You will be prompted to enter your Snowflake account information, including your account name, username, and password.

5. After entering your account information, click on the "Test" button to ensure that the connection is successful.

6. If the test is successful, click on the "Save" button to save your Snowflake Data Cloud destination connector settings.

7. You can now use the Snowflake Data Cloud destination connector to transfer data from your Airbyte sources to your Snowflake account.

8. To set up a data transfer, navigate to the "Sources" tab on the left-hand side of the screen and select the source you want to transfer data from.

9. Click on the "Create New Connection" button and select the Snowflake Data Cloud destination connector as your destination.

10. Follow the prompts to set up your data transfer, including selecting the tables or data sources you want to transfer and setting up any necessary transformations or mappings.

11. Once you have set up your data transfer, click on the "Run" button to start the transfer process.

Step 3: Set up a connection to sync your Mailchimp data to Snowflake destination

Once you've successfully connected Mailchimp as a data source and Snowflake destination as a destination in Airbyte, you can set up a data pipeline between them with the following steps:

  1. Create a new connection: On the Airbyte dashboard, navigate to the 'Connections' tab and click the '+ New Connection' button.
  2. Choose your source: Select Mailchimp from the dropdown list of your configured sources.
  3. Select your destination: Choose Snowflake destination from the dropdown list of your configured destinations.
  4. Configure your sync: Define the frequency of your data syncs based on your business needs. Airbyte allows both manual and automatic scheduling for your data refreshes.
  5. Select the data to sync: Choose the specific Mailchimp objects you want to import data from towards Snowflake destination. You can sync all data or select specific tables and fields.
  6. Select the sync mode for your streams: Choose between full refreshes or incremental syncs (with deduplication if you want), and this for all streams or at the stream level. Incremental is only available for streams that have a primary cursor.
  7. Test your connection: Click the 'Test Connection' button to make sure that your setup works. If the connection test is successful, save your configuration.
  8. Start the sync: If the test passes, click 'Set Up Connection'. Airbyte will start moving data from Mailchimp to Snowflake destination according to your settings.

Remember, Airbyte keeps your data in sync at the frequency you determine, ensuring your Snowflake destination data warehouse is always up-to-date with your Mailchimp data.

Use Cases to transfer your Mailchimp data to Snowflake destination

Integrating data from Mailchimp to Snowflake destination provides several benefits. Here are a few use cases:

  1. Advanced Analytics: Snowflake destination’s powerful data processing capabilities enable you to perform complex queries and data analysis on your Mailchimp data, extracting insights that wouldn't be possible within Mailchimp alone.
  2. Data Consolidation: If you're using multiple other sources along with Mailchimp, syncing to Snowflake destination allows you to centralize your data for a holistic view of your operations, and to set up a change data capture process so you never have any discrepancies in your data again.
  3. Historical Data Analysis: Mailchimp has limits on historical data. Syncing data to Snowflake destination allows for long-term data retention and analysis of historical trends over time.
  4. Data Security and Compliance: Snowflake destination provides robust data security features. Syncing Mailchimp data to Snowflake destination ensures your data is secured and allows for advanced data governance and compliance management.
  5. Scalability: Snowflake destination can handle large volumes of data without affecting performance, providing an ideal solution for growing businesses with expanding Mailchimp data.
  6. Data Science and Machine Learning: By having Mailchimp data in Snowflake destination, you can apply machine learning models to your data for predictive analytics, customer segmentation, and more.
  7. Reporting and Visualization: While Mailchimp provides reporting tools, data visualization tools like Tableau, PowerBI, Looker (Google Data Studio) can connect to Snowflake destination, providing more advanced business intelligence options. If you have a Mailchimp table that needs to be converted to a Snowflake destination table, Airbyte can do that automatically.

Wrapping Up

To summarize, this tutorial has shown you how to:

  1. Configure a Mailchimp account as an Airbyte data source connector.
  2. Configure Snowflake destination as a data destination connector.
  3. Create an Airbyte data pipeline that will automatically be moving data directly from Mailchimp to Snowflake destination after you set a schedule

With Airbyte, creating data pipelines take minutes, and the data integration possibilities are endless. Airbyte supports the largest catalog of API tools, databases, and files, among other sources. Airbyte's connectors are open-source, so you can add any custom objects to the connector, or even build a new connector from scratch without any local dev environment or any data engineer within 10 minutes with the no-code connector builder.

We look forward to seeing you make use of it! We invite you to join the conversation on our community Slack Channel, or sign up for our newsletter. You should also check out other Airbyte tutorials, and Airbyte’s content hub!

What should you do next?

Hope you enjoyed the reading. Here are the 3 ways we can help you in your data journey:

flag icon
Easily address your data movement needs with Airbyte Cloud
Take the first step towards extensible data movement infrastructure that will give a ton of time back to your data team. 
Get started with Airbyte for free
high five icon
Talk to a data infrastructure expert
Get a free consultation with an Airbyte expert to significantly improve your data movement infrastructure. 
Talk to sales
stars sparkling
Improve your data infrastructure knowledge
Subscribe to our monthly newsletter and get the community’s new enlightening content along with Airbyte’s progress in their mission to solve data integration once and for all.
Subscribe to newsletter

As an analytics engineer, I am always on the search for new tools to improve my data stack. The world of data is constantly evolving and if you’re not adapting along with it, you’re falling behind. This is why it's important for an analytics engineer to stay informed on all the tools available in the market.

While I’ve only ever used Fivetran and Stitch to build data ingestion pipelines, I’ve recently come across a tool called Airbyte. Airbyte is an open-source data ingestion tool that helps you consolidate your data in your data warehouses, lakes and databases. Unlike Fivetran and Stitch, Airbyte is an open-source tool, making it easy to seek support on Slack and update existing connectors.

If you’ve read my articles on dbt and Prefect, you know one of my favorite features they offer are their Slack communities. You can get all of your questions answered within hours and share best practices with other analytics and data engineers. No need to create a support ticket and wait days for a response! This is key for companies looking to move fast.

Airbyte also offers over 100 data connectors, making it easy to create data ingestion pipelines to load data from a source to a destination. I’ve found that it is always best to find one tool that can do it all for each step of the ETL/ELT process. It will save you money, time, and important brainpower.

In this article I’m going to set up my first Airbyte connector, ingesting raw data from Mailchimp to Snowflake. I’m also going to do some basic data cleaning to make the raw data ready to analyze. Throughout, I’ll give my insight on my favorite features of Airbyte and some of the more challenging pieces as well.

Running Airbyte locally

Before you set up your first connector, you’ll need to follow the instructions here to set up Airbyte on your localhost using Docker Compose. I’ve found this to be a pain with other tools such as Airflow, but getting Airbyte started locally was a breeze.

Setting up a Mailchimp source connector

Airbyte has a Mailchimp source connector. Like other data ingestion tools, Airbyte gives you the option to name your source on the UI. I recommend sticking with the name of the source unless you have multiple connectors for that same source that you need to set up. Since I have multiple Mailchimp accounts, I’m calling mine madison_mailchimp for my personal newsletter. Then select the data source you wish to connect to.

Image by author

You can find the API information for Mailchimp by going to account and then selecting “API keys” from the “Extras” dropdown menu. Create a key and copy it into the “apikey” field on Airbyte. Add the username or email you use to sign in to Mailchimp and click “Set up source”.

Setting up a Snowflake destination connector

Now you need to set up the destination that you want the Mailchimp data to be loaded into. I’ll be using Airbyte's Snowflake destination for this tutorial but feel free to use whatever destination works best for you. Similar to the connector name, I recommend choosing a name that matches the destination type. I only have one Snowflake environment, so I am simply calling it Snowflake.

Image by author

Now, you need to head to Snowflake to find all of the information necessary to set up your destination.

Figuring out the permissions for tools that connect to Snowflake can be tricky. You need to make sure you are giving the AIRBYTE_USER the correct role and all the permissions on that role. I’m not going to lie, I had some failed attempts with setting up the user and role before deciding to copy the exact script given in the Airbyte documentation.

Use their script and simply replace the variables at the top with whatever you wish to name your role, username, database, schema, and warehouse.

Lastly, make sure you select “Internal staging” for the loading method in the last step. This is the best way to load data into Snowflake.

Ingesting data from Mailchimp to Snowflake

Next, you’ll be brought to a screen that contains more information to create a data ingestion pipeline from Mailchimp to Snowflake.

Image by author

Choose to “mirror source structure” for your namespace configuration. You also have an option to add a table prefix. I recommend adding one with the name of your source connector. Tables across various connectors can have the same names (like campaigns or orders), so this is a good practice to put into place. Notice how I use madison_mailchimp_ as my table prefix.

If you look in the table, you can also see a column called “Sync mode” which specifies how you want the data to be ingested each time the connector syncs. There are four different options:

  • Full Refresh Overwrite: resyncs and replaces all data.
  • Full Refresh Append: resyncs and creates duplicates of all rows.
  • Incremental Append: syncs new rows and appends to already synced rows.
  • Incremental Deduped History: syncs new rows and appends to already synced rows, also provides a de-duplicated view mirroring the state of the ingestion.

I personally prefer incremental append because I don’t have to worry about duplicated rows. It also doesn’t require as much storage in the data warehouse. Full refresh overwrite is only recommended when you have a small amount of data, otherwise this can take a long time to sync. It may also run through precious Snowflake credits if your warehouse is running for a while.

Then, select the tables you wish to upload to Snowflake and choose “basic normalization”. Airbyte automatically creates a schema that fits your data when you choose this option, making our job as analytics engineers much easier! Otherwise, we would then have to parse a JSON object in Snowflake and that’s no fun.

Image by author

Airbyte also has an option to schedule your data syncs. You can choose every few hours, every 24 hours, or even manually. This all depends on your use case.

Image by author

After clicking “sync now”, it can be helpful to read through the logs as the data is uploading to ensure everything is going as expected. This is also a great feature in case something goes wrong. Logs make it easy to pinpoint the exact error.

Now let’s head on over to Snowflake. Go to the database and schema that you choose to upload your connector's data to. You should see tables beginning with that table prefix that you specified.

Image by author

Awesome! I can see all the Mailchimp tables that I ingested using the Airbyte data connector. Now it’s time to transform this data so it is ready to be analyzed.

Cleaning the ingested data

Let’s start by looking at the “AIRBYTE_DATABASE”.”AIRBYTE_SCHEMA”.”MADISON_MAILCHIMP_CAMPAIGNS” table. Here’s a small subset of what it looks like:

Image by author

Casting string columns

The first thing that jumps out to me is the TYPE and STATUS columns. These are both of data type variant. This data type stores a string, but is not a string itself. I want these to be strings, but more importantly, I want to remove the double quotes.

These double quotes would make analysis quite messy, especially in Snowflake. Snowflake actually uses double quotes to select from the database, schema, and table names.


select * from "AIRBYTE_DATABASE"."AIRBYTE_SCHEMA"."MADISON_MAILCHIMP_CAMPAIGNS"

Using this variant data type with double quotes would cause a lot of messy queries where you’d have to work around that. Next, let’s cast these columns so that they’re strings.


select
 id,
 CAST(type AS string),
 CAST(status AS string),
 send_time,
 resendable,
 created_time
from "AIRBYTE_DATABASE"."AIRBYTE_SCHEMA"."MADISON_MAILCHIMP_CAMPAIGNS"

Well, look at that! When you cast these columns from variant to string the double quotes are eliminated. That solves that problem.

Casting date columns

Now let’s look at the two columns that we want to be timestamps in our final dataset- send_time and created_time. Right now they are both varchar data types. Casting them to timestamps will allow us to perform date functions like date_part(), dateadd(), and year() on them in future analyses.


select
 id,
 CAST(type AS string),
 CAST(status AS string),
 send_time::timestamp_ntz,
 resendable,
 created_time::timestamp_ntz
from "AIRBYTE_DATABASE"."AIRBYTE_SCHEMA"."MADISON_MAILCHIMP_CAMPAIGNS"

Make sure you’re consistently casting the data in your base models to the same type of timestamps and dates. When I first started building my models, I would leave any timestamp columns that already had timestamp data types alone. No casting, nothing. They seemed ready to go!

Then, I started experiencing some weird issues when comparing timestamps. After a lot of digging, I realized my timestamp columns were all different types of timestamps. Now I make sure I cast them all to timestamp_ntz.

Renaming columns

Lastly, I recommend renaming the columns in tables like this one to something more descriptive. You’ll be using many of these Mailchimp tables together, so having simple names like id and status will get confusing fast.

I recommend renaming them according to the table name. Here I am looking at the CAMPAIGNS table, so I would rename id to campaign_id. This is a clear and descriptive name across all Mailchimp tables.


select
 id AS campaign_id,
 CAST(type AS string) AS campaign_type,
 CAST(status AS string) AS campaign_status,
 send_time::timestamp_ntz AS campaign_sent_at,
 resendable AS is_campaign_resendable,
 created_time::timestamp_ntz AS campaign_created_at
from "AIRBYTE_DATABASE"."AIRBYTE_SCHEMA"."MADISON_MAILCHIMP_CAMPAIGNS"

Here I renamed:

  • type to campaign_type
  • status to campaign_status
  • send_time to campaign_sent_at
  • resendable to is_campaign_resendable
  • created_time to campaign_created_at

Now the data is ready to be automated and analyzed by creating a data model! dbt is a great tool to store these SQL files used to create your data model. It then automates them so that you always have clean data available to you in your data warehouse.

Conclusion

Overall, I’ve really enjoyed using Airbyte to ingest my raw data. It was a seamless process besides setting up the correct permissions in Snowflake. The setup is clearly documented, the UI is user-friendly, and my raw data landed at the destination in a format that was easy to transform into a usable model.

However, I’ve struggled more with other data connectors like Facebook and Google. Because of the way they distribute API keys, it can be confusing to obtain the correct permissions. If you’re familiar with software development using these outside companies, this step should be a breeze for you. Otherwise, there may be a bit of a learning curve.

Because Airbyte is run locally, I would be interested to see the difficulty level of deploying it using a cloud solution like AWS. This is where I struggled the most with Prefect. Luckily, I was able to receive lots of support from their Slack channel. I imagine this would be the same with Airbyte.

What should you do next?

Hope you enjoyed the reading. Here are the 3 ways we can help you in your data journey:

flag icon
Easily address your data movement needs with Airbyte Cloud
Take the first step towards extensible data movement infrastructure that will give a ton of time back to your data team. 
Get started with Airbyte for free
high five icon
Talk to a data infrastructure expert
Get a free consultation with an Airbyte expert to significantly improve your data movement infrastructure. 
Talk to sales
stars sparkling
Improve your data infrastructure knowledge
Subscribe to our monthly newsletter and get the community’s new enlightening content along with Airbyte’s progress in their mission to solve data integration once and for all.
Subscribe to newsletter

Should you build or buy your data pipelines?

Download our free guide and discover the best approach for your needs, whether it's building your ELT solution in-house or opting for Airbyte Open Source or Airbyte Cloud.

Download now

Frequently Asked Questions

What data can you extract from Mailchimp?

Mailchimp's API provides access to a wide range of data related to email marketing campaigns. The following are the categories of data that can be accessed through Mailchimp's API:  

1. Lists: Information about the email lists, including the number of subscribers, the date of creation, and the list name.  

2. Campaigns: Data related to email campaigns, including the campaign name, the number of recipients, the open rate, click-through rate, and bounce rate.  

3. Subscribers: Information about the subscribers, including their email address, name, location, and subscription status.  

4. Reports: Detailed reports on the performance of email campaigns, including open rates, click-through rates, and bounce rates.  

5. Templates: Access to email templates that can be used to create new campaigns.  

6. Automation: Data related to automated email campaigns, including the number of subscribers, the date of creation, and the automation name.  

7. Tags: Information about tags that can be used to categorize subscribers and campaigns.  

Overall, Mailchimp's API provides a comprehensive set of data that can be used to analyze and optimize email marketing campaigns.

What data can you transfer to Snowflake destination?

You can transfer a wide variety of data to Snowflake destination. This usually includes structured, semi-structured, and unstructured data like transaction records, log files, JSON data, CSV files, and more, allowing robust, scalable data integration and analysis.

What are top ETL tools to transfer data from Mailchimp to Snowflake destination?

The most prominent ETL tools to transfer data from Mailchimp to Snowflake destination include:

  • Airbyte
  • Fivetran
  • Stitch
  • Matillion
  • Talend Data Integration

These tools help in extracting data from Mailchimp and various sources (APIs, databases, and more), transforming it efficiently, and loading it into Snowflake destination and other databases, data warehouses and data lakes, enhancing data management capabilities.