Extract, Transform, and Load (ETL) is a fundamental data processing technique to prepare data for analysis. It involves gathering data from various sources (extract), cleaning and converting it into a consistent format (transform), and then loading it into a central repository, where it's ready for tasks like business intelligence and machine learning.
These processes are the backbone of data integration, facilitating the movement of information between systems with efficiency and accuracy.
This guide walks through data integration between AWS S3 and Salesforce using Boomi. You'll learn to extract, transform, and load data, making it available for analysis and decision-making. Whatever your level of Boomi experience, this tutorial equips you to build a working data pipeline.
TL;DR
To build a simple ETL process that collects files from your AWS S3 bucket, checks if the files were last modified today, and stores them in Salesforce with Boomi, follow these steps:
- Create a new Boomi integration process
- Collect the data from the AWS S3 bucket
- Use a Business Rules shape to filter files last modified today
- Collect the files that were modified today from the S3 bucket
- Change the file type from CSV to XML
- Add a Salesforce connector to store the data
- Map the CSV values to the Salesforce fields
- Test the Boomi process
Prerequisites
To get the most out of this article, you must have the following:
- A Boomi platform account: if you do not have one, create a 30-day free trial account
- A Salesforce account: if you do not have one, create a free account
- A Boomi Atom installed locally and connected to a MySQL database: to learn how to do this, check out how to connect your Boomi Atom runtime and MySQL on Docker
- An AWS account: if you do not have one, create one for free
Creating files in your AWS S3 bucket
Before uploading files to your S3 bucket, you must create the CSV files you want to store. To accomplish this, create a new directory on your laptop using the following command:
mkdir <name of your directory>
Next, open this directory in your code editor and create as many CSV files as you need. This tutorial uses three: booking-1.csv, booking-2.csv, and booking-3.csv.
Populate these files with the CSV data you want to store. For instance, booking-1.csv can hold this data, with each field separated by a comma:
booking_id,customer_name,customer_email,customer_company
123,Alice Johnson,alice.johnson@hospitality.com,Grand View Hotel
456,Brian Jones,brian.jones@travelagency.net,Adventure Tours
Populate the rest of the booking files with the data you want to store.
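If you'd rather script this step, here's a minimal Python sketch that generates the three booking files. Only the booking-1.csv rows match the data shown above; the rows for booking-2.csv and booking-3.csv are made-up placeholders, so substitute your own data:

```python
import csv

# Sample rows per file; only booking-1.csv matches the data shown above,
# the other rows are illustrative placeholders.
bookings = {
    "booking-1.csv": [
        ("123", "Alice Johnson", "alice.johnson@hospitality.com", "Grand View Hotel"),
        ("456", "Brian Jones", "brian.jones@travelagency.net", "Adventure Tours"),
    ],
    "booking-2.csv": [
        ("789", "Carla Mendez", "carla.mendez@resorts.example", "Sunset Resorts"),
    ],
    "booking-3.csv": [
        ("101", "David Lee", "david.lee@events.example", "City Events Co"),
    ],
}

header = ["booking_id", "customer_name", "customer_email", "customer_company"]

for filename, rows in bookings.items():
    with open(filename, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)  # header row first
        writer.writerows(rows)
```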
Uploading these files to your AWS S3 bucket
To upload your CSV files to an S3 bucket, search for S3 in the AWS console's search bar, then click the S3 service in the results.
Clicking the S3 service takes you to the Amazon S3 page. To create a new S3 bucket, click the Create bucket button on this page.
The Create bucket button leads you to the General configuration page of the S3 bucket you're about to create. Enter your desired Bucket Name, then click the Create bucket button to complete the creation process.
When you are done, you should see your newly created bucket on your Amazon S3 page.
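To get the booking files into the new bucket, you can use the console's Upload button, or script it. Below is a minimal boto3 sketch, assuming your local AWS credentials are already configured; the bucket name is a placeholder, and the files go under a bookings/ prefix so they match the Prefix filter used later in the Boomi process (adjust the keys if you lay out your bucket differently):

```python
import boto3

# Assumes your AWS credentials are already configured locally
s3 = boto3.client("s3")

for filename in ["booking-1.csv", "booking-2.csv", "booking-3.csv"]:
    # Upload under a "bookings/" prefix to match the QUERY filter used later
    s3.upload_file(filename, "<your-bucket-name>", f"bookings/{filename}")
```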
Copy your AWS access and secret key
The AWS access and secret key allow applications or users to access the data in your S3 bucket.
To retrieve these keys, you'll need to create a user and specify which artifacts the user can access.
To create a user, search for and click on the IAM service. Then, open the Users page by clicking the Users tab in the sidebar.
On this page, click the Create User button. This action will open the Specify user details page. Input the user name you want to create, then click Next to proceed to the Set Permissions page.
Choose the Attach policies directly option on this page, then search for the AmazonS3FullAccess permissions policy and select it.
Click the Next and Create User buttons to create your new user. You will then see your newly created user on your User page.
To find your AWS access and secret access key, open your newly created user page by selecting the user from the sidebar, then click the Security Credentials tab.
Scroll down to the Access keys section, and click the Create access key button.
This action leads you to the Access key best practices & alternatives page. Choose the Third-party service option, since you'll manage the data in your S3 bucket with Boomi.
After that, check the box, "I understand the above recommendation and want to proceed to create an access key," and click the Next button.
You can input a description tag for the access key if you'd like, but it's not required. Then, click the Create access key button.
After this step, you will be directed to the Retrieve access keys page. Make sure to copy both the Access and Secret access keys. These keys are necessary for connecting your Boomi process to the AWS S3 bucket.
Click on the Done button to return to your user page.
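For intuition, this is the role the key pair plays: any API client presenting these two values gets the permissions attached to the IAM user. The boto3 snippet below is a quick, optional sanity check, not part of the Boomi setup; never commit real keys to source control:

```python
import boto3

s3 = boto3.client(
    "s3",
    aws_access_key_id="<your-access-key>",            # from the Retrieve access keys page
    aws_secret_access_key="<your-secret-access-key>",
)

# If the keys and the AmazonS3FullAccess policy are set up correctly,
# this lists the buckets the user can see
print([b["Name"] for b in s3.list_buckets()["Buckets"]])
```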
How to create the ETL process
This tutorial creates an ETL process that collects data stored in your AWS S3 bucket, filters it to return only the files modified today, transforms them into XML format, and then stores them in Salesforce. This gives the sales team direct access to fresh booking data and improves visibility for decision-making.
To create this ETL process in Boomi, follow these steps:
1. Create a new Boomi integration process
To create this process, go to the Integration page, click the Create New button, and choose Process from the dropdown menu.
This action will open the Start Shape sidebar. Choose the No Data radio button, indicating that the process will not receive external data, and then confirm by clicking the OK button.
2. Collect the data from the AWS S3 bucket
After initiating your new Boomi process, connect it to the AWS S3 bucket containing the files you want to extract, transform, and load into the Salesforce application. It's also important to incorporate error handling to catch any errors during connection.
Add the Try/Catch shape to your Boomi process to handle errors.
To add a shape to a Boomi process, click the small box at the end of the dotted line extending from the previous shape (in this case, the Start shape). Then, search for the shape you want to add to the process (the Try/Catch shape).
Add the Amazon S3 REST connector to the Try path. This connector links your S3 bucket to the Boomi shape. Next, click on the Amazon S3 REST connector to configure the shape. This action opens up the Connector Shape sidebar.
In this sidebar, select QUERY as the Action from the dropdown. Then, click the + in the Connection input field to create a new AWS S3 connection.
This action opens a connection page that requires you to input credentials for your S3 bucket. These credentials include the Amazon AWS Access Key and your Amazon AWS Secret Key. They grant your Boomi process the necessary permissions to access the data in your buckets.
Next, click the Save and Close button to return to your Connector Shape sidebar.
After creating the connection, click the + button in the Operation input field to define how to interact with the S3 bucket.
This action opens up the Operation page. On this page, click the Import Operation button.
This action opens up an Amazon S3 REST Operation Import modal. In this modal, fill in the required information:
- Atom: The atom you are running your Boomi process in
- Connection: The Amazon S3 REST connection you just created
Click the Next button to open the Amazon S3 REST Operation Import modal. In this modal, choose the AWS S3 bucket from which you want to retrieve the files, then click Next.
Next, click the Finish button to save your newly loaded operation.
You should see your S3 bucket in the operation page's Objects section. Click on this bucket.
Next, you'll define filter criteria for the Prefix field, which corresponds to the S3 bucket file path. To do this, click the downward arrow next to Filter, then select Add Logical Sub-Group from the menu that opens.
Next, click the downward-facing arrow again and select the Add Expression menu. This action opens a form. In this form, enter a filter name of Prefix, click the field search bar, and select the prefix option.
Finally, click the Save and Close button to return to the Amazon S3 sidebar.
Next, go to the Parameters tab to specify that you want to query the files in the AWS S3 bucket whose keys begin with the static value 'bookings'. To do that, click the + sign in this tab to open the Parameter Value modal. Fill out this modal with the following information:
- Input - Prefix
- Type - Static
- Static Value - bookings
Click the OK button to view your newly created parameter in your AWS S3 connector sidebar.
Click the OK button in the Connector sidebar to return to the canvas.
After connecting your Boomi process to your S3 bucket, add a Stop shape at the end of the catch branch to stop the process if any errors are encountered.
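To make the QUERY action concrete: outside Boomi, the equivalent call lists the objects in the bucket whose keys start with the bookings prefix, returning metadata such as Key and LastModified for each file. A rough boto3 sketch, with the bucket name as a placeholder:

```python
import boto3

s3 = boto3.client("s3")

# Equivalent of the connector's QUERY action with Prefix = "bookings"
response = s3.list_objects_v2(Bucket="<your-bucket-name>", Prefix="bookings")

for obj in response.get("Contents", []):
    # Key and LastModified are the fields the rest of the process relies on
    print(obj["Key"], obj["LastModified"])
```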
3. Use a Business Rules shape to filter files last modified today
After connecting your process to your S3 bucket, add a Business Rules shape to filter the files in your bucket based on whether they were modified today.
Click on the Business Rules shape to open the Business Rules Shape Options modal. In this modal, select a Profile Type of XML. In the Profile input field, you will see the profile you created for the Amazon connector. Simply select that profile.
Next, click the OK button to open the Business Rules Shape modal. In this modal, click the Create a New Rule button to create a new business rule.
On this New Business Rule page, click the Add dropdown and select the Function tab. Selecting the Function tab opens up an Add a Function modal.
In this modal, select a Category of Date, select the Get Current Date function to get the current date, and then click the OK -> OK buttons to save this selection.
Next, add the second input for the comparison; this input will be the file's last modified date.
To add this second input, click the Add dropdown, then select Field from the menu. This action will open an Add a Field modal. Click on the Field input area to open a New Input modal within this modal. Choose the LastModified field and click the OK -> OK button to save your selection.
Next, add a condition to compare these values and check if they are the same. This means checking if the last modified date equals the present date.
To create this condition, click the Add a Condition link.
In the dropdowns, make the following selections:
- In the first dropdown, select the LastModified(Contents/LastModified) field
- In the second dropdown, select the = operator
- In the third dropdown, select the Get Current Date function
Next, select the Save button and click OK to save the business rules.
Documents leaving the Business Rules shape can take one of two paths: Accepted or Rejected. If a file is rejected, you'll want to end its journey. To do that, add an End and Continue shape to the Rejected path from the Business Rules shape.
You'll want to retrieve and store the remaining files that pass this condition in your Salesforce application.
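In plain code, the business rule boils down to a date comparison: a file is accepted when its LastModified date equals today's date. Here is a standalone Python sketch of that logic; Boomi's Get Current Date is approximated with the current UTC date, and the bucket name is a placeholder:

```python
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")
response = s3.list_objects_v2(Bucket="<your-bucket-name>", Prefix="bookings")

# boto3 returns LastModified as a timezone-aware UTC datetime
today = datetime.now(timezone.utc).date()

contents = response.get("Contents", [])
accepted = [o["Key"] for o in contents if o["LastModified"].date() == today]
rejected = [o["Key"] for o in contents if o["LastModified"].date() != today]

print("accepted:", accepted)  # these continue down the pipeline
print("rejected:", rejected)  # these hit the End and Continue shape
```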
4. Collect the files that were modified today from the S3 bucket
To retrieve the files that meet these conditions, you need to add another Amazon S3 REST connector to the accepted path from the Business Rules shape.
Next, click on the Amazon S3 REST shape to configure it. In the Connection input field, select the Amazon S3 REST Connection you created earlier. Then, choose the GET action and click the + button in the operation field to create a new operation.
This action opens up the Operation page. Click the Import Operation button, and fill out the modal with the following details:
- Atom: Select the Atom you want to run your Boomi process in
- Connection: The Amazon S3 REST connection you created earlier
Click the Next button to open the Amazon S3 REST Operation Import modal. In this modal, select the S3 bucket with the files you want to get, then click on the Next -> Finish buttons.
Click the Save and Close button to save your Amazon S3 operation and return to the sidebar.
Next, pass the file key as a parameter to this AWS call so the GET operation can read the contents of each file. To pass the file key as a parameter, go to your sidebar, navigate to the Parameters section, and click the + button to create a new parameter.
This action opens up a Parameter Value modal. In this modal, fill in the following information:
- Input - Select the ID field in the New Input modal
- Type - Profile Element
- Profile Type - XML
- Profile - Your newly created query Amazon S3 profile
- Element - Key
Click on the OK button to save your parameter. Your newly created parameter should appear in the parameter section.
Then click on the OK button to save your Amazon S3 REST configuration.
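The GET operation with the Key parameter is roughly equivalent to fetching each accepted object by its key and reading its body. A small boto3 sketch with placeholder bucket and key names:

```python
import boto3

s3 = boto3.client("s3")

# Fetch one accepted file by its key and read its CSV contents
obj = s3.get_object(Bucket="<your-bucket-name>", Key="bookings/booking-1.csv")
csv_text = obj["Body"].read().decode("utf-8")
print(csv_text)
```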
5. Change the file type from CSV to XML
Next, you will transform the CSV files you get from the AWS S3 bucket into XML format for compatibility with your Salesforce application.
To change this CSV format, add a Map Shape to the Boomi process. Click on this shape to configure it, which opens up a Map sidebar.
Click the + sign in the Map input field in this sidebar to open the Map page.
Go to the Map source section on this page and select the Choose link.
This action opens the Choose a Source Profile modal; in this modal, select the Profile Type of Flat File, and in the Profile field, select the CSV profile you created previously.
Click the OK button to see your data fields populated in the Elements menu.
Next, click the Save and Close button; you will need to create a Salesforce profile before you can map the data from the CSV file to the Salesforce fields.
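If it helps to see the transformation outside Boomi, here's a rough Python equivalent of what the Map shape does: parse the CSV rows and emit one XML element per record. The Bookings and Booking element names are illustrative; Boomi derives the real ones from your profiles:

```python
import csv
import io
import xml.etree.ElementTree as ET

csv_text = """booking_id,customer_name,customer_email,customer_company
123,Alice Johnson,alice.johnson@hospitality.com,Grand View Hotel
456,Brian Jones,brian.jones@travelagency.net,Adventure Tours"""

root = ET.Element("Bookings")  # illustrative root element name
for row in csv.DictReader(io.StringIO(csv_text)):
    booking = ET.SubElement(root, "Booking")
    for field, value in row.items():
        # One child element per CSV column
        ET.SubElement(booking, field).text = value

print(ET.tostring(root, encoding="unicode"))
```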
6. Add a Salesforce connector to store the data
Next, add a Salesforce connector to your Boomi process.
Click the Salesforce connector to open the sidebar and configure it. In the sidebar, fill out the input fields with the following information:
- Connection: A Salesforce connection that links your Boomi process to Salesforce
- Action: Choose Send, as you are upserting data into your Salesforce application rather than retrieving it
- Operation: Defines how to interact with your Salesforce account
To learn how to create a Salesforce connection and Operation, check out How to sync your MySQL database with Salesforce in Docker using Boomi.
Click the OK button to save the configuration for the Salesforce connector Shape.
7. Map the CSV values to the Salesforce fields
To map your data to the fields in Salesforce, open up your CSV to XML map shape, and click the pencil icon to edit the Map shape.
On the right side of the page, in the Map destination section, click the Choose link. This action will open the Choose a Profile modal.
Select XML from the Profile Type dropdown menu within the modal. Then, choose the Salesforce Profile you just created. Confirm your selection by clicking OK.
The next step involves linking the corresponding fields between your CSV file and Salesforce. Simply drag your cursor from a field on the left side and connect it to its matching field on the right. For example, in the diagram below, the following fields are mapped together:
- booking_id -> Booking_ID_c
- customer_name -> Name
- customer_email -> Email_c
- customer_company -> Company_c
Once you've mapped all the fields, your page should resemble the example provided.
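For intuition, the upsert the Salesforce connector performs is roughly the sketch below, written with the simple-salesforce Python library. The Booking__c object name is an assumption (your Salesforce object may differ), the field names follow the mapping above, and Boomi handles authentication and batching for you:

```python
from simple_salesforce import Salesforce  # pip install simple-salesforce

sf = Salesforce(
    username="<your-username>",
    password="<your-password>",
    security_token="<your-security-token>",
)

# Field names follow the mapping above; confirm the exact API names
# (custom fields usually end in __c) in your Salesforce org
record = {
    "Name": "Alice Johnson",
    "Email_c": "alice.johnson@hospitality.com",
    "Company_c": "Grand View Hotel",
}

# Upsert keyed on the external booking ID so reruns update rather than duplicate
sf.Booking__c.upsert("Booking_ID_c/123", record)
```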
Next, click the Save and Close button to save this Map and return to the sidebar.
Then, click the OK button on the sidebar to save your Map Shape configuration and return to your Boomi process canvas.
Finally, add an End and Continue shape to signify the end of the process.
8. Test the Boomi process
To verify that the process works, click the Arrange button, then save and test the process.
Next, click the Salesforce connector shape and navigate to the Shape Source Data tab below.
You will see the data packets; click one of these packets to inspect the document data.
After the process completes, the data from your S3 files will be visible in your Salesforce application.
That's a Wrap
This guide has shown you how to build a basic ETL process using Boomi. You learned how to extract data from your AWS S3 bucket, check whether it was modified today, transform it into XML format, and finally load it into your Salesforce application for easy access.