Data Preparation with Oracle OAC Machine Learning
Financial institutions want to predict which customers are likely to open term deposits after a marketing campaign, which in turn helps the marketing team contact the right customers.
A term deposit is a deposit that a bank or other financial institution offers at a fixed rate (often better than that of a regular deposit account), with your money returned at a specified maturity date. Marketing, in this context, is the process by which companies create value for customers and build strong customer relationships in order to capture value from customers in return.
This is the first installment of a three-part blog series, which is divided into these segments:
- Part 1: OAC Machine Learning – Let’s Get Started & Data Preparation
- Part 2: OAC Machine Learning – Gain Deeper Insights Through Visual Exploration
- Part 3: OAC Machine Learning – Knowing Your Model Performance, Simplified
In this blog we identify the data, load it into OAC, and prepare it using Recommendations, a built-in OAC feature that suggests data preparation steps and enriches the data into a consistent format, saving a lot of time and effort.
This section covers “Part 1: OAC Machine Learning – Let’s Get Started & Data Preparation”.
Preparing the data for Machine Learning in OAC
With data growing at an unprecedented pace, OAC’s built-in data preparation features bring Machine Learning within reach of business users, ML enthusiasts, and anyone who wants quick results.
Oracle Analytics Cloud (OAC) falls into the category of augmented analytics: it automates insights using Machine Learning and serves as a platform for quickly applying Machine Learning algorithms and powerful visualizations. Let us solve a business problem using the Machine Learning features of OAC.
Before jumping into the details of data preparation, here is the full data science pipeline for arriving at a solution to the business problem described above.
- Data Loading and Preparation
You can use data that already exists in OAC, upload a data file from your local machine, or connect to other sources using the 50 built-in connectors.
OAC offers connectivity to a range of data sources: flat files (Excel and CSV), relational databases (Oracle, MS SQL, MySQL, PostgreSQL, IBM DB2, etc.), MongoDB and Apache Hive, and non-Oracle cloud applications (Google Drive, Google Analytics, Dropbox, etc.).
Let us look at two major types of data sources in this blog.
- Create Data Sets from Spreadsheets
Once we have gathered the marketing campaign data as a CSV file, we can simply drag and drop the file from the local machine.
A data set is a structured collection of data generally associated with a unique body of work. We will use this data set as the source for predicting the outcome of our business problem. Let’s get into the details of how to create data sets from spreadsheets.
- You can create a data set from an Excel spreadsheet (XLSX or XLS), CSV file, or TXT file located on your computer.
- On the Home page, click Create, and then click Data Set.
- Click File and browse to select an XLSX or XLS (with unpivoted data), CSV, or TXT file.
- Click Open to upload and open the selected spreadsheet.
- Click Add to create the data set. The View Data Source page is displayed.
- In the View Data Source page, you have the option to view the column properties and specify their formatting. The column type determines the available formatting options.
- You can now find this data set on the Data -> Data Sets tab.
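Before uploading a CSV file, it can help to confirm that every row has the same number of columns as the header, since ragged rows tend to cause surprises later. The following is a minimal sketch using only the Python standard library; it is not part of OAC, and the column names and sample rows are hypothetical stand-ins for the campaign export.

```python
import csv
import io


def summarize_csv(handle):
    """Return the header, the data row count, and line numbers of ragged rows."""
    reader = csv.reader(handle)
    header = next(reader)  # first line is treated as the column header
    row_count = 0
    ragged_lines = []
    for row in reader:
        row_count += 1
        # A row is "ragged" when its column count differs from the header's.
        if len(row) != len(header):
            ragged_lines.append(reader.line_num)
    return {"header": header, "rows": row_count, "ragged_lines": ragged_lines}


# Hypothetical sample standing in for the marketing campaign file.
SAMPLE = """age,job,balance,subscribed
34,technician,1200,no
51,retired,300,yes
29,admin.,450
"""

summary = summarize_csv(io.StringIO(SAMPLE))
print(summary)
```

For a real file, the same function can be called on a handle opened with open(path, newline=""), where path is wherever the export was saved.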
- Create Data Sets from Databases
If your data source is not a file, then you can also connect to databases.
- If you haven’t already selected a connection, click Create Connection, select a connection type, and specify the connection details for your data source.
- In the Data Set editor, browse or search for and double-click a schema, and then choose the table that you want to use in the data set. When you double-click to select a table, a list of its columns is displayed.
- In the column list, browse or search for the columns you want to include in the data set.
- Alternatively, you can select the Enter SQL option to view or modify the data source’s SQL statement or to write a SQL statement.
- Click Add. The View Data Source page is displayed.
- In the View Data Source page, you can optionally view the column properties and specify their formatting.
- As soon as you add the data, whether from a spreadsheet or a database, the built-in Recommendations feature appears on the right side.
- It offers suggestions such as extracting a part of a column or enriching a column whose values are inconsistent. Just click a recommendation and OAC performs the conversion for you, which saves a lot of time and effort.
- If the results are as you expected, click Apply Script and proceed.
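To make the two kinds of recommendation mentioned above more concrete, here is a rough Python sketch of what "extract a part of a column" and "tidy an inconsistent column" mean when done by hand. This is only an illustration of the idea and says nothing about how OAC implements Recommendations; the column names (contact_date, subscribed) and the sample values are assumptions made for the example.

```python
from datetime import datetime


def extract_month(date_text):
    """Extract the month number from a date string such as '2019-05-14'."""
    return datetime.strptime(date_text, "%Y-%m-%d").month


def normalize_label(raw):
    """Map inconsistent spellings of a yes/no value onto one form."""
    cleaned = raw.strip().lower()
    mapping = {"y": "yes", "yes": "yes", "n": "no", "no": "no"}
    # Unknown values are passed through (trimmed and lowercased) for review.
    return mapping.get(cleaned, cleaned)


# Hypothetical rows with a date column and an inconsistently entered flag.
rows = [
    {"contact_date": "2019-05-14", "subscribed": " Yes"},
    {"contact_date": "2019-11-02", "subscribed": "N"},
]

prepared = [
    {
        **row,
        "contact_month": extract_month(row["contact_date"]),
        "subscribed": normalize_label(row["subscribed"]),
    }
    for row in rows
]

for row in prepared:
    print(row)
```

Writing and maintaining small functions like these for every column is the manual effort that a one-click recommendation is meant to save.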
In this blog we have walked through identifying, loading, and preparing the data. Using the Recommendations smart feature saves a lot of time and effort in data cleaning. This matters because data preparation is often said to account for about 80% of a data scientist’s work.
In my next blog we will explore more exciting features, such as the ‘Explain’ smart feature, through Exploratory Data Analysis (EDA), which shows us hidden relationships and attributes in our data even before we feed it to an ML model. Stay tuned to discover quick visual insights in our data.