Data Integration on Oracle Cloud Infrastructure
This blog introduces the Oracle Cloud Infrastructure (OCI) Data Integration service and reviews how to set up a Data Integration workspace in OCI. I love Data Integration in OCI and hope to get you excited about this great product as well. I will dive into the details and answer many common questions about the service. As you may be aware, there are several Oracle data integration tools, such as ODI 11g, ODI 12c, and ODI on Marketplace; here I focus on what Oracle Cloud Infrastructure Data Integration is and how it can benefit you.
Oracle Cloud Infrastructure Data Integration is a next-generation, fully managed, multi-tenant, serverless, cloud-native ETL service. It helps with common extract, transform, and load (ETL) tasks such as ingesting data from different sources; cleansing, transforming, and reshaping that data; and then efficiently loading it to target data sources on Oracle Cloud Infrastructure.
Key features include:
- Low-code approach: users design the programs that run in OCI graphically, in a visual designer.
- Data-immersive user experience to boost productivity
- Hybrid execution powered by Spark and SQL push-down capabilities
- Rules-based data integration patterns to support schema evolution
- Serverless execution with a pay-as-you-go pricing model
Use Cases for OCI DI:
I will review a few use cases below and how they can benefit you and your organization.
Use Case 1: Data integration for big data, data lakes, and data science
- Efficiently load and transform data at scale into data lakes for data science and analytics.
- Load the data into Object Storage and create high-quality models more quickly using OCI Data Science.
Use Case 2: Data integration for data marts and data warehousing
- Load and transform transactional data at scale into a data warehouse (e.g., Autonomous Data Warehouse) for analytics purposes.
The table below compares on-premises ODI, ODI on Marketplace, and OCI Data Integration.
Key Factors | ODI (on-premises) | ODI Marketplace | OCI Data Integration |
---|---|---|---|
Deployment | On-premises, customer managed | OCI Compute instance, Oracle managed | OCI PaaS |
Operating system | Linux or Windows | Linux only | N/A |
Repository database | ADW not supported | Oracle, ADW, MySQL, etc. | N/A |
Installation | Manual | Launch the compute instance (agent and studio are preinstalled) | Oracle managed |
Supported sources and targets | Oracle and non-Oracle databases, Hive, HDFS, object storage, etc. | Similar to on-premises ODI | Oracle and non-Oracle databases, Oracle Fusion Applications, HDFS, object storage, and many more |
Pricing | License needed | OCPU hours (BYOL or free for now) | OCPU hours |
Which one to use (on-premises ODI, ODI Marketplace, or OCI DI) depends on many factors. I have included some scenarios below:
- If your source and target databases are both on-premises, then on-premises ODI is a good choice.
- ODI Marketplace is preferred when you want to:
  - Avoid maintaining local infrastructure
  - Avoid ODI installation and configuration
  - Fully control scalability dynamically
- OCI DI is preferred:
  - For data lakehouse based implementations
  - To leverage data for AI/ML, data science, and big data use cases
  - When at least one of the source and target databases is in the cloud
  - Especially when non-Oracle database data assets are involved
[Figure: OCI Data Integration vision]
Important steps to perform before setting up the OCI DI workspace:
To ensure that the OCI DI workspace is set up correctly and delivers the most benefit, I recommend provisioning the Data Integration service as outlined below. Take care of the following before you begin the implementation steps.
Your environment should consist of:
- A Windows or Linux based system
- A web browser installed on your system, preferably Mozilla Firefox or Google Chrome
- The Oracle Cloud Infrastructure (OCI) account details assigned to you. Record the following:
  - Tenancy name or cloud account name
  - Username
  - Password
  - Compartment to be used
Implementation steps to set up Data Integration:
With your environment set up, you are ready for the actual implementation: creating the Data Integration workspace. Below are the two prerequisites for setting up the Data Integration workspace in OCI.
- Create the OCI Data Integration policies
- Create a Virtual Cloud Network in Oracle Cloud Infrastructure
Let’s get started:
1. Create the OCI Data Integration policies:
The policies required for OCI Data Integration are in addition to the regular policies used in Oracle Cloud Infrastructure for accessing other necessary resources.
In the policy statements below:
<group-name>: the group that your OCI user belongs to.
<compartment-name>: the OCI compartment you are using.
- allow group <group-name> to manage dis-workspaces in compartment <compartment-name>
- allow group <group-name> to manage dis-work-requests in compartment <compartment-name>
- allow group <group-name> to use virtual-network-family in compartment <compartment-name>
- allow group <group-name> to manage tag-namespaces in compartment <compartment-name>
[Figure: Policy Editor]
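The four policy statements in this section differ only in the group and compartment names, so they are easy to generate before pasting them into the policy editor. The following is a minimal sketch, not part of any Oracle tooling: the helper name `build_di_policy_statements` and the example names `di-developers` and `di-sandbox` are hypothetical, and only the statement text comes from this post.

```python
from typing import List

# Statement templates copied from the list in this post.
# {group} and {compartment} stand in for <group-name> and <compartment-name>.
POLICY_TEMPLATES = [
    "allow group {group} to manage dis-workspaces in compartment {compartment}",
    "allow group {group} to manage dis-work-requests in compartment {compartment}",
    "allow group {group} to use virtual-network-family in compartment {compartment}",
    "allow group {group} to manage tag-namespaces in compartment {compartment}",
]


def build_di_policy_statements(group: str, compartment: str) -> List[str]:
    """Fill the placeholders in the four statements (hypothetical helper)."""
    group = group.strip()
    compartment = compartment.strip()
    if not group or not compartment:
        raise ValueError("group and compartment names must be non-empty")
    return [
        template.format(group=group, compartment=compartment)
        for template in POLICY_TEMPLATES
    ]


if __name__ == "__main__":
    # Example names are assumptions; substitute your own group and compartment.
    for statement in build_di_policy_statements("di-developers", "di-sandbox"):
        print(statement)
```

Each printed line can then be pasted into the policy editor as one statement.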
2. Create a Virtual Cloud Network in Oracle Cloud Infrastructure:
You will need a Virtual Cloud Network (VCN) to use OCI Data Integration. Oracle virtual cloud networks provide customizable, private cloud networks in Oracle Cloud Infrastructure.
To create a VCN, use the following steps:
- 1. In the OCI console, open the navigation menu. Go to Networking and click Virtual Cloud Networks.
- 2. Select a compartment that has been allocated to you.
- 3. On the VCN page, click Start VCN Wizard to create a new VCN.
- 4. Select VCN with Internet Connectivity and click Start VCN Wizard.
- 5. In the Create VCN with Internet Connectivity dialog, enter a VCN name of your choice (in this case, vcn_dataintegration) and select the compartment. Accept the default values in Configure VCN and Subnets and click Next.
- 6. Verify the resource details and click “Create” to create the VCN.
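Step 5 accepts the wizard's proposed address ranges for the VCN and its subnets. If you change them, the subnets still need to sit inside the VCN range and must not overlap each other. The sketch below checks that with Python's standard `ipaddress` module. The function name `check_vcn_layout` is hypothetical, and the CIDR values in the example are assumptions for illustration; compare them with what the wizard actually shows in your tenancy.

```python
import ipaddress
from typing import Dict, List


def check_vcn_layout(vcn_cidr: str, subnet_cidrs: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the layout is consistent."""
    problems = []
    vcn = ipaddress.ip_network(vcn_cidr)
    subnets = {
        name: ipaddress.ip_network(cidr) for name, cidr in subnet_cidrs.items()
    }

    # Every subnet must be contained in the VCN address range.
    for name, net in subnets.items():
        if not net.subnet_of(vcn):
            problems.append(f"{name} ({net}) is outside the VCN range {vcn}")

    # No two subnets may share addresses.
    names = sorted(subnets)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if subnets[first].overlaps(subnets[second]):
                problems.append(f"{first} overlaps {second}")

    return problems


if __name__ == "__main__":
    # Assumed example values, not read from any OCI tenancy.
    issues = check_vcn_layout(
        "10.0.0.0/16",
        {"public": "10.0.0.0/24", "private": "10.0.1.0/24"},
    )
    print(issues or "layout looks consistent")
```

The check runs locally and does not call OCI, so it is only a sanity check on the values you plan to enter.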
Creating a Data Integration Workspace:
Before you can get started with Data Integration, you must create a workspace.
A workspace is an organizational construct that holds multiple data integration projects and their resources (data assets, data flows, etc.).
Use the following steps to create a DI workspace:
In the console, go to the navigation menu. Under Analytics & AI, go to Data Integration and click Workspaces.
Next, enter the workspace name and description, choose the VCN that you created earlier, and select the private subnet type in your compartment.
Once the workspace is created, click it to open the Data Integration home page.
Additional information regarding OCI Data Integration can be found at the URL below:
https://docs.oracle.com/en-us/iaas/data-integration/home.htm
Conclusion
In this blog, you have seen how to set up an OCI Data Integration workspace in Oracle Cloud Infrastructure. Using the steps that I outlined, you can now create data assets and data flows to transform and move data to the desired targets.
This capability helps data engineers and ETL developers with common extract, transform, and load (ETL) tasks such as ingesting data from a variety of data assets, cleansing, transforming, and reshaping that data and efficiently loading it to target data assets.
Please try out this process and give us your feedback.
Apps Associates’ Data and Analytics Practice is a team of 125 highly credentialed professionals dedicated to helping companies leverage their data assets to become data-driven organizations.
Stay tuned for more blogs in this series about OCI Data Integration. We will continue the series with the next topic, “Creating the Data Assets in OCI Data Integration Workspace” (Object Storage, Autonomous Data Warehouse, Oracle Fusion Applications).