How to upload data
Updated on 17 Jul 2020


This article discusses how to upload data into Aito.

Data upload workflow

The following steps are needed to get data uploaded into Aito (some of the Aito tools take care of all of them for you):

  1. Create and upload a schema into Aito
  2. Transform data into
    A. JSON format
    B. NDJSON format
  3. Upload data
    A. Batch upload
    B. File upload

Create and upload schema

Before you can upload data into Aito, the database schema has to be defined.

Suppose you have a table that looks like the following.

| PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26 | 0 | 0 | STON/O2. 3101282 | 7.925 | | S |
| 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 5 | 0 | 3 | Allen, Mr. William Henry | male | 35 | 0 | 0 | 373450 | 8.05 | | S |

The following schema could be created for it and uploaded to Aito. For more information on schema creation, you can read this article.

{
  "schema": {
    "Titanic": {
      "columns": {
        "Age": {
          "nullable": true,
          "type": "Decimal"
        },
        "Cabin": {
          "analyzer": "english",
          "nullable": true,
          "type": "Text"
        },
        "Embarked": {
          "nullable": true,
          "type": "String"
        },
        "Fare": {
          "nullable": false,
          "type": "Decimal"
        },
        "Name": {
          "analyzer": "english",
          "nullable": false,
          "type": "Text"
        },
        "Parch": {
          "nullable": false,
          "type": "Int"
        },
        "PassengerId": {
          "nullable": false,
          "type": "Int"
        },
        "Pclass": {
          "nullable": false,
          "type": "Int"
        },
        "Sex": {
          "nullable": false,
          "type": "String"
        },
        "SibSp": {
          "nullable": false,
          "type": "Int"
        },
        "Survived": {
          "nullable": false,
          "type": "Int"
        },
        "Ticket": {
          "analyzer": "english",
          "nullable": false,
          "type": "Text"
        }
      },
      "type": "table"
    }
  }
}
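
Once you have the schema as a JSON document, it has to be uploaded to your instance before any data. As a minimal Python sketch (assuming the schema above is saved as schema.json and the requests package is installed; the instance URL and API key are placeholders), the document can be sent to the PUT /api/v1/schema endpoint:

import json
import requests

# Placeholder values: use your own instance URL and read-write API key
AITO_INSTANCE_URL = "https://your-env-name.api.aito.ai"
AITO_API_KEY = "your_rw_api_key"

# Load the schema document shown above (assumed to be saved as schema.json)
with open("schema.json") as f:
    schema = json.load(f)

# Upload the full database schema
response = requests.put(
    f"{AITO_INSTANCE_URL}/api/v1/schema",
    headers={"x-api-key": AITO_API_KEY},
    json=schema)
response.raise_for_status()
print(response.json())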

Transform data

Data needs to be in the JSON or NDJSON format in order to be uploaded into Aito. Some of the tools we provide can make this transformation from other file formats, e.g. CSV or XLSX. But if you'd like to use the REST API, you will have to take care of the transformation yourself.

Taking the same example table as above:

| PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26 | 0 | 0 | STON/O2. 3101282 | 7.925 | | S |
| 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 5 | 0 | 3 | Allen, Mr. William Henry | male | 35 | 0 | 0 | 373450 | 8.05 | | S |

It would be transformed into the JSON format as follows.

[{
  "PassengerId": 3,
  "Survived": 1,
  "Pclass": 3,
  "Name": "Heikkinen, Miss. Laina",
  "Sex": "female",
  "Age": 26,
  "SibSp": 0,
  "Parch": 0,
  "Ticket": "STON/O2. 3101282",
  "Fare": 7.925,
  "Cabin": "",
  "Embarked": "S"
}, {
  "PassengerId": 4,
  "Survived": 1,
  "Pclass": 1,
  "Name": "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
  "Sex": "female",
  "Age": 35,
  "SibSp": 1,
  "Parch": 0,
  "Ticket": "113803",
  "Fare": 53.1,
  "Cabin": "C123",
  "Embarked": "S"
}, {
  "PassengerId": 5,
  "Survived": 0,
  "Pclass": 3,
  "Name": "Allen, Mr. William Henry",
  "Sex": "male",
  "Age": 35,
  "SibSp": 0,
  "Parch": 0,
  "Ticket": "373450",
  "Fare": 8.05,
  "Cabin": "",
  "Embarked": "S"
}]

The Aito Python SDK offers functionality to transform your data file into the correct format. You can either use it to write your own Python script or use the CLI's aito convert feature.
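
If you'd rather do the transformation yourself, here is a minimal sketch using pandas (the file names are illustrative) that converts a CSV file into the gzipped NDJSON format expected by the file upload:

import gzip
import json

import pandas as pd

# Read the source CSV (illustrative file name)
df = pd.read_csv("tableEntries.csv")

# NDJSON means one JSON object per line; gzip the result for the file upload
with gzip.open("tableEntries.ndjson.gz", "wt", encoding="utf-8") as f:
    for record in df.to_dict(orient="records"):
        f.write(json.dumps(record) + "\n")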

Uploading data

We offer multiple tools which can be used for data upload. The most flexible way of uploading data is the Aito API, but then you will take care of creating and uploading the schema, transforming the data and uploading it yourself with the help of the API. The Aito Python SDK offers helper functions in Python to get through the upload steps. The Aito CLI has helper commands as well as a combined command which does all the necessary steps for you. The Aito Console does all the needed steps automatically for you.

You can decide to either upload the data as a file (gzipped NDJSON) or in batches (JSON).

Aito Console

File upload

You also have the option to use the Aito Console for uploading data.

  1. Log in to your account and then on the Instances tab click on the instance you want to upload data into.
  2. On the instance page go to the Overview tab where you can use the Upload CSV file feature to upload data. The feature will infer a schema based on the data in the file, upload the schema, transform the data and finally upload the data into your Aito instance.

[Screenshot: uploading a CSV file in the Aito Console]

Aito Command Line Interface (CLI)

For people keen on using the command line, we offer the Aito CLI, which is part of the Aito Python SDK. You will need Python 3.6-3.8 for the CLI to work. Note that before uploading, the data has to be transformed into either the JSON or NDJSON format.

File upload

To upload the data in the gzipped NDJSON format, you can use the following command:

aito upload-file tableName tableEntries.ndjson.gz

If you want to skip all the preprocessing steps (schema creation/upload and data transformation), you can use the following command. The quick-add-table command accepts CSV, JSON, Excel (XLS, XLSX) and NDJSON file formats.

aito quick-add-table --file-format csv --table-name tableName tableEntries.csv

Batch upload

When the data is in the JSON format, the following command can be used to upload the data.

aito upload-entries tableName < tableEntries.json

Aito Python SDK

If you're familiar with Python, you might want to try the Aito Python SDK for data upload. You can use it for either batch upload or file upload. You will need Python 3.6-3.8 for the SDK to work.

File upload

Python code example of the file upload. The file has to be a gzipped NDJSON file (.ndjson.gz).

from aito.aito_client import AitoClient

aito_client = AitoClient(
    instance_url="your_aito_instance_url", 
    api_key="your_rw_api_key")

# Path to your gzipped NDJSON file (illustrative file name)
file_path = "tableEntries.ndjson.gz"

aito_client.upload_file(
    table_name="table_name",
    file_path=file_path)

Batch upload

Python code example of the batch upload.

Configuration

from aito.aito_client import AitoClient

aito_client = AitoClient(
    instance_url="your_aito_instance_url", 
    api_key="your_rw_api_key")

Upload list of entries

entries = [
    {
        "PassengerId": 3,
        "Survived": 1,
        "Pclass": 3,
        "Name": "Heikkinen, Miss. Laina",
        "Sex": "female",
        "Age": 26,
        "SibSp": 0,
        "Parch": 0,
        "Ticket": "STON/O2. 3101282",
        "Fare": 7.925,
        "Cabin": "",
        "Embarked": "S"
    }
]

aito_client.upload_entries(
    table_name="table_name",
    entries=entries)

Upload pandas dataframe

import pandas as pd

# Load your data into a DataFrame, e.g. from a CSV file (illustrative name)
df = pd.read_csv("tableEntries.csv")

# Convert the DataFrame to a list of entries
entries = df.to_dict(orient="records")

aito_client.upload_entries(
    table_name="table_name", 
    entries=entries)

REST API

The REST API is the most flexible way of uploading data, as you can use any coding language you choose and can modify the schema and data as you find suitable.

File upload

The File API requires a minimum of three calls per uploaded table:

  1. Initiate the file upload process
  2. Upload compressed ndjson file to S3, using the signed URL
  3. Trigger file processing
  4. Poll the file processing status (Optional)

1. Initiate the file upload process

Request

curl -X POST \
  "$AITO_INSTANCE_URL/api/v1/data/$TABLE_NAME/file" \
  -H "x-api-key: $AITO_API_KEY"

Response

{
  "expires": "2020-07-06T10:22:25",
  "id": "8ac204ab-7004-4164-8228-9871ae30ac13",
  "method": "PUT",
  "url": "https://aitoai-customer-uploads.s3.eu-west-1.amazonaws.com/your-env-name/table_name/8ac204ab-7004-4164-8228-9871ae30ac13..."
}

You will need the ID and URL from the response in the next steps. Times returned by Aito are in UTC.

2. Upload compressed NDJSON file to S3, using the signed URL

Use the full URL you got from step 1 to upload your data file into AWS S3.

curl -v --upload-file file_name.ndjson.gz "https://aitoai-customer-uploads.s3.eu-west-1.amazonaws.com/your-env-name/table_name/8ac204ab-7004-4164-8228-9871ae30ac13..."

You should get an HTTP 200 OK response if everything goes smoothly in the S3 upload phase.

3. Trigger file processing

Use the ID you got in step 1 to trigger the file upload from S3 to Aito.

Request

curl -X POST \
  "$AITO_INSTANCE_URL/api/v1/data/$TABLE_NAME/file/$UPLOAD_ID" \
  -H "x-api-key: $AITO_API_KEY"

Response

{
  "id": "0a70fda6-5ece-49a6-95f7-ff2a9b695247",
  "status": "started"
}

The response will tell you if the file processing has been started successfully.

4. Poll the file processing status

If you want to know whether your file has been processed into Aito, you can use the following request. You will again need the upload ID you got as a response in the first step.

Request

curl -X GET \
  "$AITO_INSTANCE_URL/api/v1/data/$TABLE_NAME/file/$UPLOAD_ID" \
  -H "x-api-key: $AITO_API_KEY"

Response

{
  "errors": {
    "message": "Last 0 failing rows",
    "rows": null
  },
  "status": {
    "totalDurationMs": 1586146,
    "phase": "AitoDatabaseInsert",
    "finished": true,
    "completedCount": 891,
    "lastSuccessfulElement": {
      "Age": 32.0,
      "Cabin": null,
      "Embarked": "Q",
      "Fare": 7.75,
      "Name": "Dooley, Mr. Patrick",
      "Parch": 0,
      "PassengerId": 891,
      "Pclass": 3,
      "Sex": "male",
      "SibSp": 0,
      "Survived": 0,
      "Ticket": "370376"
    },
    "totalDuration": "26 minutes, 26 seconds and 146 milliseconds",
    "startedAt": "20200707T102426.246Z",
    "finishedAt": "20200707T105052.392Z",
    "throughput": "0.56/s"
  }
}

"finished": true will tell you if the data from the file has been uploaded into Aito.

Batch upload

The batch import can be used to upload multiple entries to a single table. The payload needs to be a valid JSON array (instead of NDJSON).

Note: the batch API supports payloads of at most 10 MB.

Request

curl -X POST \
  $AITO_INSTANCE_URL/api/v1/data/$TABLE_NAME/batch \
  -H "x-api-key: $AITO_API_KEY" \
  -H "content-type: application/json" \
  -d '
  [
    {
      "PassengerId": 3,
      "Survived": 1,
      "Pclass": 3,
      "Name": "Heikkinen, Miss. Laina",
      "Sex": "female",
      "Age": 26,
      "SibSp": 0,
      "Parch": 0,
      "Ticket": "STON/O2. 3101282",
      "Fare": 7.925,
      "Cabin": "",
      "Embarked": "S"
    }
  ]'

Response

{
  "entries": 1,
  "status": "ok"
}

The response will tell you the number of entries uploaded as well as the status of the upload.
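
Because of the 10 MB limit, larger datasets need to be split across several batch requests. Here is a minimal Python sketch (the batch size of 1000 entries is an arbitrary assumption; pick one that keeps each payload under the limit):

import requests

# Placeholder values: use your own instance URL, read-write API key and table name
AITO_INSTANCE_URL = "https://your-env-name.api.aito.ai"
AITO_API_KEY = "your_rw_api_key"
TABLE_NAME = "Titanic"

def upload_in_batches(entries, batch_size=1000):
    """Post entries to the batch endpoint in chunks that stay under the 10 MB cap."""
    url = f"{AITO_INSTANCE_URL}/api/v1/data/{TABLE_NAME}/batch"
    headers = {"x-api-key": AITO_API_KEY, "content-type": "application/json"}
    for i in range(0, len(entries), batch_size):
        response = requests.post(url, headers=headers, json=entries[i:i + batch_size])
        response.raise_for_status()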
