Get started using Python
  • 20 Jul 2020
  • 10 Minutes To Read

Before you begin


Getting an Aito instance

To follow along with this get started guide, you'll need to get your own Aito instance from the Aito Console.

  1. Sign in or create an account, if you haven't got one already.
  2. In the Aito Console go to the instances page and click the "Create an instance" button.
  3. Select the instance type you want to create and fill in the needed fields. Sandbox is the free instance type for testing and small projects; visit our pricing page to learn more about the Aito instance types.
  4. Click "Create instance" and wait a moment while your instance is created. You will receive an email once your instance is ready.

Getting the Aito Python SDK

The Aito Python SDK requires Python 3.6 or higher, so make sure you have a suitable Python installed.
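
If you're not sure which interpreter you are running, a quick check with the standard library is enough (a minimal sketch):

import sys

# The SDK needs Python 3.6 or higher; fail early if the interpreter is older.
assert sys.version_info >= (3, 6), "Python 3.6 or higher is required"
print(sys.version)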

Install Aito Python SDK

In this guide, the Aito Python SDK is used to integrate with the Aito instance.

To install the Aito Python SDK you can use pip:

pip install aitoai

Configuring the SDK

To access your Aito instance, you will need to get the Aito instance API URL and the read/write API key from the Aito Console. To find this information, follow these steps:

  1. Log in to the Aito console.
  2. Go to the Instances page and click on the instance you want to use.
  3. Select the Overview tab; there you can find the instance API URL and the read/write API key. The API key is revealed by pressing the eye icon.

[Screenshot: the instance API URL and API key on the Overview tab of the Aito Console]

Store the URL and the read/write API key in your Python code.

AITO_INSTANCE_URL="your-aito-instance-url"
AITO_API_KEY="your-read/write-api-key"
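
If you'd rather not hardcode the credentials, a common alternative is to read them from environment variables. This is a minimal sketch; it assumes you have exported AITO_INSTANCE_URL and AITO_API_KEY in your shell before running the script.

import os

# Read the instance URL and read/write API key from the environment
# instead of hardcoding them in the source.
AITO_INSTANCE_URL = os.environ["AITO_INSTANCE_URL"]
AITO_API_KEY = os.environ["AITO_API_KEY"]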

TL;DR


All of the code described in this guide is as follows.

  1. Download CSV (train.csv):
    https://www.kaggle.com/c/titanic/data
  2. Set up your instance information:
AITO_INSTANCE_URL="your-aito-instance-url"
AITO_API_KEY="your-read/write-api-key"
  3. Import the needed libraries:
from aito.client import AitoClient
from aito.schema import (
    AitoTableSchema,
    AitoStringType,
    AitoTokenNgramAnalyzerSchema,
    AitoAliasAnalyzerSchema
)
import numpy as np
import pandas as pd
  4. Infer schema:
# Read the data as pandas dataframe
titanic_df = pd.read_csv('train.csv')

# Infer the schema
titanic_schema = AitoTableSchema.infer_from_pandas_data_frame(titanic_df)

# Feel free to change the schema as you see fit. For example:

# Change the data type  of the `PassengerId` column  to `String` instead of `Int`
titanic_schema['PassengerId'].data_type = AitoStringType()

# Change the analyzer of the `Name` column
titanic_schema['Name'].analyzer = AitoTokenNgramAnalyzerSchema(
  source=AitoAliasAnalyzerSchema('en'),
  min_gram=1,
  max_gram=3
)
  5. Create table:
table_name = "Titanic"

# Configure Aito Client
aito_client = AitoClient(
    instance_url=AITO_INSTANCE_URL, 
    api_key=AITO_API_KEY
)

# Create the table into Aito
aito_client.create_table(
    table_name=table_name, 
    table_schema=titanic_schema
)

# Check your table schema in Aito
aito_client.get_table_schema(table_name=table_name)
  6. Convert data:
# Transform the PassengerId from int to string
titanic_df['PassengerId'] = titanic_df['PassengerId'].apply(str)

# Transform NaN values to None
titanic_df = titanic_df.where(pd.notnull(titanic_df), None)

entries = titanic_df.to_dict(orient="records")
  7. Upload data:
aito_client.upload_entries(
    table_name=table_name, 
    entries=entries
)
  8. Make a query:
query_result = aito_client.request(
    method='POST',
    endpoint='/api/v1/_predict',
    query={
        "from": "Titanic",
        "where": {
            "Pclass": 1,
            "Sex": "female"
        },
        "predict": "Survived"
    }
)
  9. Evaluate results:
evaluation_result = aito_client.job_request(
    job_endpoint='/api/v1/jobs/_evaluate',
    query={
        "test": {
            "$index": {
                "$mod": [4, 0]
            }
        },
        "evaluate":  {
            "from": "Titanic",
            "where": {
                "Pclass": {"$get": "Pclass"},
                "Sex": {"$get": "Sex"}
            },
            "predict": "Survived"
        },
        "select": ["trainSamples", "testSamples", "baseAccuracy", "accuracyGain", "accuracy", "error", "baseError"]
    }
)
  10. Clean up:
aito_client.delete_database()

Intro


This get started guide uses the famous Titanic dataset as an example of how to make predictions using Aito. The Titanic was a passenger liner (the largest of her time) that struck an iceberg on her maiden voyage and sank on April 15, 1912.

The dataset and problem framing are quite simple, but they demonstrate the steps of working with Aito, so you can go ahead and start making predictions with your own data and answer the questions you are curious about.

The problem


When starting to use Aito, you will want to frame the problem you're solving as a question, which helps with creating the queries. In this guide, we want to answer the question "What kind of people were more likely to survive the accident?" by using the Titanic passenger data (e.g. name, age, gender, socio-economic class).

Data


You can download the dataset from Kaggle: https://www.kaggle.com/c/titanic/data. The name of the data file is train.csv.

Aito needs data from the past in order to make predictions for the future. The Titanic dataset includes passenger details such as the class of the passenger, sex, age and so on. These details can be used to define the person we want to predict survival for. The value we want to predict also has to be encoded in the data as a column (or a feature, in data science terms); in this case, it is the Survived column.

Snapshot of the data

| PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26 | 0 | 0 | STON/O2. 3101282 | 7.925 | | S |
| 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 5 | 0 | 3 | Allen, Mr. William Henry | male | 35 | 0 | 0 | 373450 | 8.05 | | S |
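
If you want to peek at the data yourself before defining a schema, a quick look with pandas is enough (a minimal sketch; it assumes train.csv is in your working directory):

import pandas as pd

# Print the first rows and the dtypes pandas inferred for each column.
titanic_df = pd.read_csv('train.csv')
print(titanic_df.head())
print(titanic_df.dtypes)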

Import needed libraries

Import the following libraries in your Python code.

from aito.client import AitoClient
from aito.schema import (
    AitoTableSchema,
    AitoStringType,
    AitoTokenNgramAnalyzerSchema,
    AitoAliasAnalyzerSchema
)
import numpy as np
import pandas as pd

Table schema definition


Data lives in Aito as tables. The train.csv of the Titanic dataset will be put into Aito as one table which will be called Titanic. It is possible to use linked tables in Aito but in this example having just one table is enough.

In order to get data uploaded into Aito, you will have to define the data schema for the Titanic table. By defining the schema you tell Aito how to handle the different columns in the data, for example whether a column's values should be handled as integer or boolean values. Aito accepts numeric (integer, decimal), boolean, string and text data types. The nullable attribute defines whether the column can include empty values: nullable: true means the column can have empty values. Analyzers are used for text columns, that is, columns whose values contain longer pieces of text such as sentences.

For the Titanic dataset, the table schema can for example be defined as follows.

{
    "columns": {
        "Age": {
            "nullable": true,
            "type": "Decimal"
        },
        "Cabin": {
            "analyzer": "en",
            "nullable": true,
            "type": "Text"
        },
        "Embarked": {
            "nullable": true,
            "type": "String"
        },
        "Fare": {
            "nullable": false,
            "type": "Decimal"
        },
        "Name": {
            "analyzer": "en",
            "nullable": false,
            "type": "Text"
        },
        "Parch": {
            "nullable": false,
            "type": "Int"
        },
        "PassengerId": {
            "nullable": false,
            "type": "String"
        },
        "Pclass": {
            "nullable": false,
            "type": "Int"
        },
        "Sex": {
            "nullable": false,
            "type": "String"
        },
        "SibSp": {
            "nullable": false,
            "type": "Int"
        },
        "Survived": {
            "nullable": false,
            "type": "Int"
        },
        "Ticket": {
            "nullable": false,
            "type": "String"
        }
    },
    "type": "table"
}

You can use the infer_from_pandas_data_frame function to get the schema from the data file, and then use that schema to create the Titanic table in Aito.

Aito SDK Infer schema

To quickly infer the table schema from the data, use the following Aito SDK function.

# Read the data as pandas dataframe
titanic_df = pd.read_csv('train.csv')

# Infer the schema
titanic_schema = AitoTableSchema.infer_from_pandas_data_frame(titanic_df)

# Feel free to change the schema as you see fit. For example:

# Change the data type  of the `PassengerId` column  to `String` instead of `Int`
titanic_schema['PassengerId'].data_type = AitoStringType()

# Change the analyzer of the `Name` column
titanic_schema['Name'].analyzer = AitoTokenNgramAnalyzerSchema(
    source=AitoAliasAnalyzerSchema('en'),
    min_gram=1,
    max_gram=3
)

Always check that the types in the inferred schema look correct. After data has been uploaded into Aito the schema is immutable, and you can only change it by deleting all the data in Aito and re-uploading the schema.

Aito SDK Create table

To create the Titanic table in Aito using the SDK, you can use the following functions.

table_name = "Titanic"

# Configure Aito Client
aito_client = AitoClient(
    instance_url=AITO_INSTANCE_URL,
    api_key=AITO_API_KEY
)

# Create the table into Aito
aito_client.create_table(
    table_name=table_name,
    table_schema=titanic_schema
)

# Check your table schema in Aito
aito_client.get_table_schema(table_name=table_name)

Upload data


The data has to be in JSON format in order to be uploaded into Aito. The data can be uploaded either entry by entry or by uploading the whole file. In this guide, we use the upload entries functionality.

Convert into JSON

You can run the following code to convert the CSV data into JSON-style entries.

# Transform the PassengerId from int to string
titanic_df['PassengerId'] = titanic_df['PassengerId'].apply(str)

# Transform NaN values to None
titanic_df = titanic_df.where(pd.notnull(titanic_df), None)

entries = titanic_df.to_dict(orient="records")

You can also create your own function that transforms the data into the correct JSON format; generators are also accepted as input by the upload_entries function, as shown in the sketch below.
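
For example, a generator along these lines could be passed to upload_entries instead of a list. This is a minimal sketch; it simply mirrors the conversions shown above (PassengerId as a string, NaN as None) one row at a time.

def generate_entries(df):
    # Yield one JSON-style entry (dict) per dataframe row.
    for row in df.to_dict(orient="records"):
        row["PassengerId"] = str(row["PassengerId"])
        yield {key: (None if pd.isna(value) else value) for key, value in row.items()}

# The generator can then be passed directly to the upload step:
# aito_client.upload_entries(table_name=table_name, entries=generate_entries(titanic_df))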

Upload entries

To upload the data to the created Titanic table you can use the following function.

aito_client.upload_entries(
    table_name=table_name,
    entries=entries
)

Run a query


Aito query's generic syntax

An Aito query follows a syntax that is based on this rule:

From a given context (a specific table and what is known from that table), use an operation to find the known or the unknown.

{
  "from"            : define the initial context (table name),
  "where"           : more details of the context,
  "operation_name"  : operation to be perform,
  "orderBy"         : sort the result by some metric,
  "select"          : select specific attributes or parts of the result,
  "offset"          : define the number of rows in the result to be skipped,
  "limit"           : limit the number of rows to be shown in the result
}
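
As a concrete illustration of this generic syntax, a plain lookup can be sent to the _query endpoint with the same request function used later in this guide (the field values here are only illustrative):

# Illustrative only: list three female passengers from the Titanic table.
lookup_result = aito_client.request(
    method='POST',
    endpoint='/api/v1/_query',
    query={
        "from": "Titanic",
        "where": {"Sex": "female"},
        "limit": 3
    }
)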

Making the query

When the data is in Aito, you can start making queries against it. For example, if you want to answer the question "How likely would a first-class woman be to survive the Titanic accident?" you can send the following query to the _predict endpoint.

query_result = aito_client.request(
    method='POST',
    endpoint='/api/v1/_predict',
    query={
        "from": "Titanic",
        "where": {
            "Pclass": 1,
            "Sex": "female"
        },
        "predict": "Survived"
    }
)

The request function can be used for making queries to any endpoint Aito offers. Aito's query language resembles SQL in that it has from and where clauses. In the query we state that we want to use the data in the Titanic table (from: "Titanic"), and the where clause defines the attributes of the passengers we want to predict survival for. The predict clause states which attribute we are predicting.

Results


For the query, Aito will return the following result.

{
  "offset" : 0,
  "total" : 2,
  "hits" : [ {
    "$p" : 0.8853714713085219,
    "field" : "Survived",
    "feature" : 1
  }, {
    "$p" : 0.11462852869147827,
    "field" : "Survived",
    "feature" : 0
  } ]
}

In the result, $p is the probability of the field having a given feature. So, for example, with roughly 89% probability a first-class female passenger survived the Titanic accident. Aito also returns the probabilities of the field having the other features. In the case of survival the result is binary, so the passenger either survived (=1) or didn't (=0), and Aito returns results for two features.

You can try out how different kinds of people survived the accident by changing the attributes defined in the query's where clause.
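
If you want to pick out the most probable outcome programmatically, something along these lines works, assuming the response can be treated like the JSON document shown above (depending on your SDK version you may first need to convert the response to a plain dict):

# Assumes query_result behaves like the JSON result above (a dict with a "hits" list).
top_hit = query_result["hits"][0]
print(f"Survived={top_hit['feature']} with probability {top_hit['$p']:.2f}")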

Evaluation of results


Result evaluation is an important step when calculating probabilities. It tells you how accurate your predictions are and gives you a metric to use for improving them. Evaluation is a built-in functionality of Aito.

Evaluation can be run for the previous query as follows.

evaluation_result = aito_client.job_request(
    job_endpoint='/api/v1/jobs/_evaluate',
    query={
        "test": {
            "$index": {
                "$mod": [4, 0]
            }
        },
        "evaluate":  {
            "from": "Titanic",
            "where": {
                "Pclass": {"$get": "Pclass"},
                "Sex": {"$get": "Sex"}
            },
            "predict": "Survived"
        },
        "select": ["trainSamples", "testSamples", "baseAccuracy", "accuracyGain", "accuracy", "error", "baseError"]
    }
)

The test variable defines what data is used as the test data. Aito runs the evaluation by splitting the data in the database into a test and a training dataset. The test dataset is treated as unknown to Aito: predictions are made for it and then compared against the known correct values, while the training dataset is used as the known data when making those predictions. In this example, the test set is defined by $mod to be every fourth row of the data in the database, starting from index 0. In the evaluate variable we define the query we want to evaluate, which is the same as was used in the query step. The $get operator takes the value of the given column for each row. With the select operator you can restrict which values are returned by the evaluate endpoint.

The request starts an evaluation job because it can take some time: Aito calculates the accuracy using multiple data points, possibly running hundreds or thousands of predictions depending on the size of your dataset.

Evaluation result

{
  "trainSamples": 668.0,
  "testSamples": 223,
  "baseAccuracy": 0.6367713004484304,
  "accuracyGain": 0.15246636771300448,
  "accuracy": 0.7892376681614349,
  "error": 0.21076233183856508,
  "baseError": 0.36322869955156956
}

The accuracy variable shows the accuracy of Aito for the given query. The baseAccuracy is the accuracy that would be achieved just by using a Naive Bayesian algorithm for the prediction. For more about the response values, check our API documentation.
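
If you want to read these numbers in code, they can be picked from the result, again assuming the response can be treated like the JSON document shown above:

# Assumes evaluation_result behaves like the JSON result above.
print("accuracy:     ", evaluation_result["accuracy"])
print("baseAccuracy: ", evaluation_result["baseAccuracy"])
print("accuracyGain: ", evaluation_result["accuracyGain"])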

Deleting the data

If you then want to start your project on a clean slate, you can delete the schema and all data from the Aito instance with the following command.

aito_client.delete_database()