Search keeps your database clean
  • 22 Sep 2020

In this tutorial, you will learn how to utilize Aito's search capabilities to find correct company information even with typos in the name or a changed address.

Before you begin

Getting an Aito instance

If you want to follow along with this tutorial, you'll have to go and get your own Aito instance from the Aito Console.

  1. Sign in or create an account, if you haven't got one already.
  2. In the Aito Console go to the instances page and click the "Create an instance" button.
  3. Select the instance type you want to create and fill in the required fields. Sandbox is the free instance type for testing and small projects. Visit our pricing page to learn more about the Aito instance types.
  4. Click "Create instance" and wait a moment while your instance is created. You will receive an email once your instance is ready.

If you're totally new to Aito, the get started guides might be useful before reading this tutorial.

Accessing instance details

Once the instance has been created, you can find its URL and API keys as follows:

  1. Log in to the Aito Console.
  2. Click on the instance you created.
  3. Go to the overview page. You can copy the API keys after clicking the eye icon.

Get the Aito Command Line Interface tool

  1. Get the Aito CLI tool if you don't have it yet (you'll need Python 3.6 or higher installed):
pip install aitoai
  2. Configure the CLI using the instance URL and read/write API key:
aito configure
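
If you also plan to call the REST API from your own scripts, the same two values (instance URL and API key) can be kept in environment variables instead of being pasted into code. Here is a minimal sketch; the variable names AITO_INSTANCE_URL and AITO_API_KEY and the helper aito_settings are chosen for this example only, while the header names are the ones used in the cURL commands later in this tutorial.

```python
import os


def aito_settings(env=None):
    """Read the instance URL and API key from environment variables.

    The variable names are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    url = env.get("AITO_INSTANCE_URL", "").rstrip("/")
    key = env.get("AITO_API_KEY", "")
    if not url or not key:
        raise ValueError("Set AITO_INSTANCE_URL and AITO_API_KEY first")
    # Header names match the cURL examples in this tutorial.
    headers = {"content-type": "application/json", "x-api-key": key}
    return url, headers
```

Passing a plain dictionary as env makes the helper easy to try out without touching your real environment.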

Use case: Keeping your database clean

Let’s say you want to add a new company, “ABC Holdings”, in your customer database. Which version would you type in?

  1. ABC Holdings - Street 15, New York
  2. ABC Holdings Inc - 15 Street, New York
  3. ABC Holdings, Inc. - 15 Street, NY

You probably have a standard format you always use. But is it the same as what your colleagues use?

Now we get to the problem: How do we know for sure whether or not this company has already been contacted by our colleagues?

Unfortunately, the common solution is to implement strict and rigorous guidelines for adding new entries. Normally these systems consist of endless drop-down menus and dozens of fields that require you to fill in information you had no idea existed.

A more pleasant solution is to implement a search that finds possible duplicates when a new entry is being registered. A simple keyword search helps a lot already, but for a more reliable and robust search, you’ll need to consider the relationships between data points and their different variations. That’s when you need machine learning. This tutorial shows you how to get it done with Aito and it’ll take less than ten minutes.

Data

The dataset used in this tutorial is the USA public catalog. It contains the basic information of 3745 American companies in a tab-separated .txt format. You can find the cleaned-up .csv version which is used in this tutorial here.

ID | Name | Zip_Code | Street | Building | City | State | Number
1904 | ABRAHAM & CO., INC. | 829452 | 3724 47TH STREET | - | GIG HARBOR | WA | 98335
2303 | ROSPERA FINANCIAL SERVICES, INC. | 828164 | 5429 LBJ FREEWAY | SUITE 400 | DALLAS | TX | 75240
2554 | AEI SECURITIES, INC. | 816750 | 1300 WELLS FARGO PLACE | 30 SEVENTH STREET | ST. PAUL | MN | 55101-4901
... | ... | ... | ... | ... | ... | ... | ...

Pretty normal looking stuff. Before you get to try out the search, you need to upload the dataset into your Aito instance to serve as the learning data.
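
If you'd like to sanity-check the file before uploading, a header check along these lines may help. This is only a sketch: it assumes the .csv is comma-delimited with quoted company names, and it runs against an inline sample row taken from the table above rather than the real file. The helper name check_header is made up for this example.

```python
import csv
import io

EXPECTED_COLUMNS = [
    "ID", "Name", "Zip_Code", "Street", "Building", "City", "State", "Number",
]


def check_header(csv_file):
    """Return the data rows after confirming the expected columns exist."""
    reader = csv.DictReader(csv_file)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    return list(reader)


# Inline sample; for the real file use open("company_info.csv", newline="").
sample = io.StringIO(
    "ID,Name,Zip_Code,Street,Building,City,State,Number\n"
    '2303,"ROSPERA FINANCIAL SERVICES, INC.",828164,'
    "5429 LBJ FREEWAY,SUITE 400,DALLAS,TX,75240\n"
)
rows = check_header(sample)
print(rows[0]["Name"])
```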

Overview

The workflow will consist of two steps:

  1. Upload data into Aito.
  2. Search for companies.

#1 Upload data

  1. Download the example dataset from here.
  2. Upload the data using the quick-add-table feature in the Aito CLI
aito quick-add-table --file-format csv --table-name company_info company_info.csv

#2 Keeping your database clean

Check the data

The first thing to do after uploading is to take a quick look at your data to check for any errors or shenanigans. You can run the following commands in any terminal that has cURL installed, but a REST client like Insomnia is way more convenient. Remember to replace the API URL and key with those of your own instance. Here's the first cURL command, run against our public instance:

curl -X POST \
https://public-1.aito.app/api/v1/_query \
-H 'content-type: application/json' \
-H 'x-api-key: bvss2i2dIkaWUfBCdzEO89LpxUkwO3A24hYg8MBq' \
-d '
{
  "from": "company_info",
  "limit": 1
}'

And the response below looks all good:

{
  "Building": "SUITE 210",
  "City": "CHERRY HILL",
  "ID": 9319,
  "Name": "BCG SECURITIES, INC.",
  "Number": "08002",
  "State": "NJ",
  "Street": "51 HADDONFIELD ROAD",
  "Zip_Code": 812680
}
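
The same request can be made from Python with nothing but the standard library. The sketch below mirrors the cURL command above (same path, headers and body); the instance URL and API key are placeholders, and the helper names build_request and send are made up for this example. The request is only built here, not sent, so the snippet runs without network access.

```python
import json
import urllib.request


def build_request(instance_url, api_key, endpoint, body):
    """Build (but do not send) a POST request for an Aito API endpoint."""
    return urllib.request.Request(
        url=f"{instance_url}/api/v1/{endpoint}",
        data=json.dumps(body).encode("utf-8"),
        headers={"content-type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholders: replace with your own instance URL and API key.
request = build_request(
    "https://your-instance.aito.app",
    "YOUR_API_KEY",
    "_query",
    {"from": "company_info", "limit": 1},
)
print(request.full_url)
# To actually run the query: print(send(request))
```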

Now onto the fun part!

Search for a company

Aito offers the _similarity API endpoint specifically designed for identifying similar entries in the database.

Let’s use the above company information and give it a small twist. We’ll leave out some of the data points and remove the “, Inc.” from the company name.

curl -X POST \
https://public-1.aito.app/api/v1/_similarity \
-H 'content-type: application/json' \
-H 'x-api-key: bvss2i2dIkaWUfBCdzEO89LpxUkwO3A24hYg8MBq' \
-d '
{
  "from": "company_info",
  "similarity": {
    "Building": "SUITE 210",
    "City": "CHERRY HILL",
    "Name": "BCG SECURITIES",
    "State": "NJ",
    "Street": "51 HADDONFIELD ROAD"
  },
  "limit": 1
}'

The from clause defines the table you're searching in, much like FROM in SQL.

The similarity clause describes the company you're searching for. Setting limit to 1 shows only the best result; you can leave it out if you want to see all the search results Aito gives you.
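
When a new entry comes from a form, some fields are usually still empty. One way to turn such a partial record into a request body with the from, similarity and limit clauses described above is sketched here. The helper similarity_query is hypothetical, and dropping blank fields is a design choice of this sketch rather than something the API requires.

```python
def similarity_query(table, record, limit=None):
    """Build a _similarity request body from a partially filled record."""
    # Leave out fields the user has not filled in, so they are not sent.
    known = {k: v for k, v in record.items() if v not in (None, "")}
    query = {"from": table, "similarity": known}
    if limit is not None:
        query["limit"] = limit
    return query


new_entry = {
    "Name": "BCG SECURITIES",
    "Street": "51 HADDONFIELD ROAD",
    "Building": "SUITE 210",
    "City": "CHERRY HILL",
    "State": "NJ",
    "Number": "",  # not known yet, so it is dropped from the query
}
query = similarity_query("company_info", new_entry, limit=1)
print(query)
```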

Result

Ta-da! Aito returns the right company information. As you can see in the response below, Aito also returns a $score, which indicates the strength of the similarity. The similarity calculation done by Aito is based on term frequency-inverse document frequency (tf-idf). You’ll see the score go much lower as the queries get more difficult. This one was pretty easy.

{
  "$score": 1226332.030858179,
  "Building": "SUITE 210",
  "City": "CHERRY HILL",
  "ID": 9319,
  "Name": "BCG SECURITIES, INC.",
  "Number": "08002",
  "State": "NJ",
  "Street": "51 HADDONFIELD ROAD",
  "Zip_Code": 812680
}
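
To get a feel for why tf-idf copes with a dropped ", Inc.", here is a textbook-style toy version run on three company names from the dataset. It is not Aito's implementation, and its numbers will not match the $score values in this tutorial. The point is only that rare tokens such as "BCG" weigh more than common ones such as "INC". The tokenizer and the exact idf formula are assumptions made for this sketch.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it on whitespace, commas and periods."""
    return text.lower().replace(",", " ").replace(".", " ").split()


def idf_weights(documents):
    """Inverse document frequency: rare tokens get larger weights."""
    n = len(documents)
    doc_freq = Counter(t for doc in documents for t in set(tokenize(doc)))
    return {t: math.log(n / df) + 1.0 for t, df in doc_freq.items()}


def score(query, document, idf):
    """Sum tf * idf over the tokens shared by the query and the document."""
    tf = Counter(tokenize(document))
    return sum(tf[t] * idf.get(t, 0.0) for t in set(tokenize(query)))


names = ["BCG SECURITIES, INC.", "AEI SECURITIES, INC.", "ABRAHAM & CO., INC."]
idf = idf_weights(names)
ranked = sorted(
    names, key=lambda name: score("BCG SECURITIES", name, idf), reverse=True
)
print(ranked[0])
```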

Trial by fire

Now let's make things much more complex. What if the company moved to a completely different location and there’s a typo in the name?

curl -X POST \
https://public-1.aito.app/api/v1/_similarity \
-H 'content-type: application/json' \
-H 'x-api-key: bvss2i2dIkaWUfBCdzEO89LpxUkwO3A24hYg8MBq' \
-d '
{
  "from": "company_info",
  "similarity": {
    "Building": "A 30",
    "City": "NEW YORK",
    "Name": "BCG SECURTY",
    "State": "NY",
    "Street": "92 HELM STREET"
  },
  "limit": 1
}'

Aito still finds the right company. This time the score is significantly lower, as expected, but it’s multiple times larger than the next closest match. You can see more suggestions and their scores in the response by changing "limit": 1 in the query to a higher number.

{
  "$score": 15.109117592387861,
  "Building": "SUITE 210",
  "City": "CHERRY HILL",
  "ID": 9319,
  "Name": "BCG SECURITIES, INC.",
  "Number": "08002",
  "State": "NJ",
  "Street": "51 HADDONFIELD ROAD",
  "Zip_Code": 812680
}
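
Since the absolute score varies so much between easy and hard queries, one way to use it in a duplicate check is to compare the best hit with the runner-up rather than apply a fixed cut-off. The sketch below assumes you have collected the returned entries into a list of dictionaries, each carrying a $score. The function name, the min_ratio threshold of 3.0 and the runner-up entry are all made up for illustration; only the top score is taken from the response above.

```python
def likely_duplicate(hits, min_ratio=3.0):
    """Return the top hit if its $score clearly beats the runner-up, else None."""
    if not hits:
        return None
    ordered = sorted(hits, key=lambda hit: hit["$score"], reverse=True)
    if len(ordered) == 1:
        return ordered[0]
    best, runner_up = ordered[0], ordered[1]
    # A non-positive runner-up score means there is no real competitor.
    if runner_up["$score"] <= 0:
        return best
    if best["$score"] / runner_up["$score"] >= min_ratio:
        return best
    return None


hits = [
    {"$score": 15.109117592387861, "Name": "BCG SECURITIES, INC."},
    {"$score": 3.2, "Name": "EXAMPLE SECURITIES LLC"},  # made-up runner-up
]
match = likely_duplicate(hits)
print(match["Name"] if match else "no clear match")
```

A suitable threshold depends on your own data, so treat the value here as a starting point for experiments.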

There are a lot more scenarios we could try to see how Aito responds. Try it yourself!

What's next

What you probably really care about is how this would work with your own data. If you're interested in building your own pipelines with Aito, contact us at hello@aito.ai and tell us about your use case, and we'll see how we can help bring your ideas to life!

You can also reach us in Slack.

And by the way, there is a simple UiPath demo for you to play around with. You'll need to enable UiPath Web Activities in the Manage Packages console. Have fun!
