Get started using Aito Command Line Interface
- Updated on 22 Oct 2020
Before you begin
Getting an Aito instance
If you want to follow along with this get started guide, you'll need your own Aito instance from the Aito Console.
- Sign in, or create an account if you don't have one already.
- In the Aito Console, go to the Instances page and click the "Create an instance" button.
- Select the instance type you want to create and fill in the required fields. Sandbox is the free instance type for testing and small projects; visit our pricing page to learn more about the Aito instance types.
- Click "Create instance" and wait a moment while your instance is created. You will receive an email once it is ready.
Getting the Aito Command Line Interface tool
The Aito Command Line Interface (CLI) tool runs on Python, so you will need Python 3.6 or higher installed.
Install Aito CLI
To install the Aito CLI you can use pip:
pip install aitoai==0.4.0
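If you want to keep the CLI separate from your system-wide Python packages, you can optionally install it inside a virtual environment first (the environment name aito-env below is just an example):
# Create and activate a virtual environment, then install the CLI into it
python3 -m venv aito-env
source aito-env/bin/activate
pip install aitoai==0.4.0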
Running the help command
You can use aito --help to get detailed information on the aito command line tool.
aito --help
To get more information on a specific operation, you can include the operation name. For example, if you want to know what quick-add-table does, you can use the following call.
aito quick-add-table --help
Configuring the CLI
To access your Aito instance, you will need to get the Aito instance API URL and read/write API key from the Aito console. To find the information follow these steps:
- Log in to the Aito console.
- Go to the Instances page and click on the instance you want to use.
- Select the Overview tab, where you can find the instance API URL and the read/write API key. The API key is revealed by pressing the eye icon.
Then you can use the information to define the configuration for the CLI using the following command.
aito configure
The CLI will ask you for the instance URL and API key. Be sure to use the read/write API key in order to go through all of the steps in this guide. After you have given the URL and API key, the CLI creates a credentials file at $HOME/.config/aito/credentials (%UserProfile% on Windows), which it then uses when accessing the Aito instance.
Store the URL and API key as environment variables as well, so that you can easily run direct API calls, for example with curl, against your instance. Use the following environment variables.
Environment variable | Value |
---|---|
AITO_INSTANCE_URL | your-aito-instance-url |
AITO_API_KEY | your-api-key |
On Unix-based systems you can define the environment variables on the command line as follows (for the current session only).
export AITO_INSTANCE_URL=your-aito-instance-url
export AITO_API_KEY=your-api-key
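Once the variables are set, you can do a quick sanity check by calling the instance directly with curl. The call below assumes the database schema endpoint is /api/v1/schema, as in the Aito API documentation; it should return the current schema, which is empty on a fresh instance:
# Fetch the database schema to verify the URL and API key work
curl "$AITO_INSTANCE_URL/api/v1/schema" \
  -H "x-api-key: $AITO_API_KEY"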
TL;DR
If you're in a hurry, all of the steps below can be done with just one command. The downside is that you will have no control over the created schema. If you don't mind this and want to skip the data handling steps (schema inference, table creation, data conversion and upload) to get straight to predicting, you can use the following command.
aito quick-add-table --file-format csv --table-name Titanic train.csv
All of the commands described in this guide are as follows.
- Download the CSV (train.csv): https://www.kaggle.com/c/titanic/data
- Infer schema:
aito infer-table-schema csv train.csv > titanic_schema.json
- Create table:
aito create-table Titanic titanic_schema.json
- Convert data:
aito convert csv -s titanic_schema.json --json train.csv > titanic_data.json
- Upload data:
aito upload-entries Titanic < titanic_data.json
- Make a prediction query:
aito predict '
{
"from": "Titanic",
"where": {
"Pclass": 1,
"Sex": "female"
},
"predict": "Survived"
}'
- Evaluate the results:
aito evaluate --use-job '
{
"test": {
"$index": {
"$mod": [4, 0]
}
},
"evaluate": {
"from": "Titanic",
"where": {
"Pclass": {"$get": "Pclass"},
"Sex": {"$get": "Sex"}
},
"predict": "Survived"
},
"select": ["trainSamples", "testSamples", "baseAccuracy", "accuracyGain", "accuracy", "error", "baseError"]
}'
- Clean up:
aito delete-database
Intro
This get started guide uses the famous Titanic dataset as an example of how to make predictions with Aito. Titanic was a passenger liner (the largest of her time) that struck an iceberg on her maiden voyage and sank in the early hours of April 15, 1912.
The dataset and problem framing are quite simple, but they demonstrate the steps of working with Aito, so afterwards you can go ahead and start making predictions with your own data to answer the questions you are curious about.
The problem
When starting to use Aito, you will want to have the problem you're solving framed as a question, as this helps with creating the queries. In this guide, we want to answer the question "What kind of people were more likely to survive the accident?" using the Titanic passenger data (name, age, gender, socio-economic class, and so on).
Data
You can download the dataset from Kaggle: https://www.kaggle.com/c/titanic/data. The name of the data file is train.csv.
Aito needs data from the past in order to make predictions for the future. The Titanic dataset includes passenger details such as the class of the passenger, sex, age and so on. These details can be used to describe the person we want to predict survival for. The value we want to predict also has to be encoded in the data as a column (a feature, in data science terms); in this case it is the Survived column.
Snapshot of the data
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|
3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26 | 0 | 0 | STON/O2. 3101282 | 7.925 | | S |
4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35 | 1 | 0 | 113803 | 53.1 | C123 | S |
5 | 0 | 3 | Allen, Mr. William Henry | male | 35 | 0 | 0 | 373450 | 8.05 | | S |
Table schema definition
Data lives in Aito as tables. The train.csv
of the Titanic dataset will be put into Aito as a single table called Titanic. It is possible to use linked tables in Aito, but in this example one table is enough.
In order to upload data into Aito, you have to define the data schema for the Titanic table. The schema tells Aito how to handle the different columns in the data, for example whether a column's values should be treated as integer or boolean values. Aito accepts numeric (integers, decimals), boolean, string and text data types. The nullable property defines whether the column may contain empty values; nullable: true means it can. Analyzers are used for Text columns, that is, columns whose values contain longer free-form text.
For the Titanic dataset, the table schema can for example be defined as follows.
{
"columns": {
"Age": {
"nullable": true,
"type": "Decimal"
},
"Cabin": {
"analyzer": "en",
"nullable": true,
"type": "Text"
},
"Embarked": {
"nullable": true,
"type": "String"
},
"Fare": {
"nullable": false,
"type": "Decimal"
},
"Name": {
"analyzer": "en",
"nullable": false,
"type": "Text"
},
"Parch": {
"nullable": false,
"type": "Int"
},
"PassengerId": {
"nullable": false,
"type": "Int"
},
"Pclass": {
"nullable": false,
"type": "Int"
},
"Sex": {
"nullable": false,
"type": "String"
},
"SibSp": {
"nullable": false,
"type": "Int"
},
"Survived": {
"nullable": false,
"type": "Int"
},
"Ticket": {
"analyzer": "pt-br",
"nullable": false,
"type": "Text"
}
},
"type": "table"
}
You can either copy-paste the above JSON into a file named titanic_schema.json, or run the aito infer-table-schema command to create the table schema file, which you can then use to create the Titanic table in Aito.
Aito CLI Infer schema
To quickly infer the table schema from the data use the following Aito CLI command.
aito infer-table-schema csv train.csv > titanic_schema.json
Always check the created file to make sure the inferred types look correct. After data has been uploaded into Aito the schema is immutable, and you can only change it by deleting all the data in Aito and re-uploading the schema.
Aito CLI Create table
To create the Titanic table in Aito using the CLI, run the following command.
aito create-table Titanic titanic_schema.json
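If you prefer calling the API directly, the same table can also be created by sending the schema file with curl, using the environment variables defined earlier. This is a sketch that assumes the table schema endpoint is /api/v1/schema/<table name>; check the API documentation for the exact path:
# Create the Titanic table by uploading its schema
curl -X PUT "$AITO_INSTANCE_URL/api/v1/schema/Titanic" \
  -H "x-api-key: $AITO_API_KEY" \
  -H "content-type: application/json" \
  -d @titanic_schema.json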
Upload data
The data has to be in JSON format in order to be uploaded into Aito. It can be uploaded either entry by entry or as a whole file; in this guide, we use the upload-entries functionality.
Convert into JSON
You can run the following command to convert the CSV file into JSON.
aito convert csv -s titanic_schema.json --json train.csv > titanic_data.json
Upload entries
To upload the data to the created Titanic table you can use the following command.
aito upload-entries Titanic < titanic_data.json
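To check that the upload worked, you can, for example, query the table directly. The curl call below assumes the generic query endpoint is /api/v1/_query; it returns one row together with a total field, which should match the number of rows in train.csv:
# Fetch a single row and the total row count from the Titanic table
curl -X POST "$AITO_INSTANCE_URL/api/v1/_query" \
  -H "x-api-key: $AITO_API_KEY" \
  -H "content-type: application/json" \
  -d '{"from": "Titanic", "limit": 1}'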
Run a query
Aito query's generic syntax
An Aito query follows a syntax based on this rule:
From a given context (a specific table and what is known from that table), use an operation to find the known or the unknown.
{
"from" : define the initial context (table name),
"where" : more details of the context,
"operation_name" : operation to be perform,
"orderBy" : sort the result by some metric,
"select" : select specific attributes or parts of the result,
"offset" : define the number of rows in the result to be skipped,
"limit" : limit the number of rows to be shown in the result
}
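As a concrete illustration of this shape, the following body fills in a few of the clauses: it asks for first-class passengers from the Titanic table, skipping no rows and returning at most five. It could be sent, for example, to Aito's generic _query endpoint; the other clauses, such as orderBy and select, are optional:
{
  "from": "Titanic",
  "where": { "Pclass": 1 },
  "offset": 0,
  "limit": 5
}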
Making the query
When the data is in Aito, you can start making queries against it. For example, if you want to answer the question "How likely would a first-class woman survive the Titanic accident?" you can send the following query to the _predict endpoint.
aito predict '
{
"from": "Titanic",
"where": {
"Pclass": 1,
"Sex": "female"
},
"predict": "Survived"
}'
Aito's query language resembles SQL in that it has from and where clauses. In the query we state that we want to use the data in the Titanic table (from: "Titanic"), and the attributes of the passengers whose survival we want to predict are defined in where. With the predict clause we state which attribute we are predicting.
Copy the command to your command line and press enter. Aito will then immediately return the results.
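The same prediction can also be sent straight to the API with curl, using the environment variables defined earlier. This sketch assumes the prediction endpoint is /api/v1/_predict; see the API documentation for details:
# Send the prediction query directly to the _predict endpoint
curl -X POST "$AITO_INSTANCE_URL/api/v1/_predict" \
  -H "x-api-key: $AITO_API_KEY" \
  -H "content-type: application/json" \
  -d '{
  "from": "Titanic",
  "where": { "Pclass": 1, "Sex": "female" },
  "predict": "Survived"
}'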
Results
For the query, Aito will return the following result.
{
"offset" : 0,
"total" : 2,
"hits" : [ {
"$p" : 0.8853714713085219,
"field" : "Survived",
"feature" : 1
}, {
"$p" : 0.11462852869147827,
"field" : "Survived",
"feature" : 0
} ]
}
In the result, $p is the probability of the field having the given feature value. So, for example, a first-class female passenger survived the Titanic accident with roughly 89% probability. Aito also returns the probabilities of the field having other values; since survival is binary, the passenger either survived (=1) or didn't (=0), so Aito returns results for two features.
You can try out how different kinds of people survived the accident by changing the attributes defined in the query's where clause.
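For example, to see the corresponding probabilities for a third-class male passenger, you could run:
aito predict '
{
  "from": "Titanic",
  "where": {
    "Pclass": 3,
    "Sex": "male"
  },
  "predict": "Survived"
}'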
Evaluation of results
Result evaluation is an important step when calculating probabilities. It tells you how accurate your predictions are and gives you a metric to use when improving the prediction query. Evaluation is a built-in functionality of Aito.
Evaluation can be run for the previous query as follows.
aito evaluate --use-job '
{
"test": {
"$index": {
"$mod": [4, 0]
}
},
"evaluate": {
"from": "Titanic",
"where": {
"Pclass": {"$get": "Pclass"},
"Sex": {"$get": "Sex"}
},
"predict": "Survived"
},
"select": ["trainSamples", "testSamples", "baseAccuracy", "accuracyGain", "accuracy", "error", "baseError"]
}'
The test variable defines which data is used as the test set. Aito runs the evaluation by splitting the data in the database into a test set and a training set. The test set is treated as unknown: Aito predicts a value for each of its rows using only the training set as knowledge, and the predictions are then compared against the actual values in the test set to get the accuracy of the prediction query. In this example, the test set is every fourth row of the data in the database starting from index 0, as defined by $mod. In the evaluate variable we define the query we want to evaluate, which is the same one used in the query step. The $get operator takes the value of the given column for each test row. With the select clause you can restrict which values the evaluate endpoint returns.
The request starts an evaluation job, since calculating the accuracy can take some time: depending on the size of your dataset, Aito may run hundreds or thousands of predictions.
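For reference, the --use-job flag corresponds to Aito's jobs API. As a rough sketch (the endpoint paths and the evaluate_query.json file below are assumptions for illustration; verify the exact paths in the API documentation), the evaluation is first submitted as a job and its result fetched once the job has finished:
# Submit the evaluation query (saved here in a hypothetical evaluate_query.json) as a job;
# the response contains a job id
curl -X POST "$AITO_INSTANCE_URL/api/v1/jobs/_evaluate" \
  -H "x-api-key: $AITO_API_KEY" \
  -H "content-type: application/json" \
  -d @evaluate_query.json
# Poll the job status, and fetch the result once the job has finished
curl -H "x-api-key: $AITO_API_KEY" "$AITO_INSTANCE_URL/api/v1/jobs/<job id>"
curl -H "x-api-key: $AITO_API_KEY" "$AITO_INSTANCE_URL/api/v1/jobs/<job id>/result"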
Evaluation result
{
"trainSamples": 668.0,
"testSamples": 223,
"baseAccuracy": 0.6367713004484304,
"accuracyGain": 0.15246636771300448,
"accuracy": 0.7892376681614349,
"error": 0.21076233183856508,
"baseError": 0.36322869955156956
}
The accuracy variable shows the accuracy Aito achieved for the given query. The baseAccuracy is the accuracy that would be achieved simply by always predicting the most frequent value of the predicted column, and accuracyGain is the difference between the two. For more about the response values, check our API documentation.
Deleting the data
If you then want to start your project with a clean slate, you can delete the schema and all data from the Aito instance with the following command.
aito delete-database
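If you prefer the API, the whole database can also be cleared by deleting the schema. The endpoint below is an assumption based on the API documentation, and the operation is irreversible, so double-check before running it:
# Delete the database schema and all data (irreversible; endpoint path assumed)
curl -X DELETE "$AITO_INSTANCE_URL/api/v1/schema" \
  -H "x-api-key: $AITO_API_KEY"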