Import Dataset

graph LR
  A(UI)     --> |HTTP| D;
  B(Python) --> |HTTP| D(Data Service);
  D         --> |JDBC| E[(Data Database)];

A user wants to import a static dataset (e.g. from a .csv file). This action creates a new table in the database.

Importing a dataset requires at least write-own access, see Database Access. If you are the owner of the database, you have this access by default.

UI

Click "Create Table" in the database toolbar at the top. Give the table a name and, optionally, a description (which can also be added later), then set the visibility settings for transparency and insights. In this example, the dataset will be fully visible to the world.

In the next step, provide the dataset structure. The default will be sufficient for most cases.

Select the column separator that matches your CSV file. Opening the file in a text editor to check the separator prevents most errors.

The first line of a CSV file usually contains the column names. If that is not the case, select "Data only" to indicate that your CSV file has no header line.

Values in a CSV file are usually enclosed in double quotes when they contain the separator. If your CSV file uses a different quote character, select the correct one.

Many CSV files end each line with a newline character \n. If your file uses a different line ending, select the correct one from the box.

Finally, select the CSV file. It is uploaded automatically and its contents are analysed to recommend a table structure.
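If you prefer to check the separator, quote character, and header line before uploading, the following minimal sketch uses the `csv.Sniffer` class from the Python standard library. It is independent of the UI and only a heuristic; the helper name `inspect_csv` and the sample data are hypothetical and chosen for illustration.

```python
import csv


def inspect_csv(sample: str) -> dict:
    """Guess the separator, quote character and header presence of CSV text.

    Hypothetical helper for illustration; csv.Sniffer is a heuristic and
    can guess wrong on small or unusual samples.
    """
    sniffer = csv.Sniffer()
    # Restrict the candidates to separators commonly found in CSV files
    dialect = sniffer.sniff(sample, delimiters=",;\t|")
    return {
        "separator": dialect.delimiter,
        "quote": dialect.quotechar,
        "has_header": sniffer.has_header(sample),
    }


# Illustrative sample: semicolon-separated, quoted values contain the separator
sample = 'id;name;score\n1;"Doe; Jane";3.5\n2;"Roe; Rick";4.0\n'
settings = inspect_csv(sample)
print(settings)
```

For a real file, pass the first few kilobytes of its text as `sample` and compare the result with what you see in a text editor before choosing the settings in the UI.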

Next, confirm or correct the automatically recommended dataset schema. For example, change a data type if it was analysed incorrectly. You need to select one or more columns as the primary key; their values (or combination of values) must be unique. Typically, this is a column named id or similar.
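To see whether a column (or combination of columns) qualifies as a primary key before selecting it, the uniqueness requirement can be checked locally. The sketch below uses only the Python standard library; the helper name `is_valid_primary_key` and the sample data are hypothetical, and the rejection of empty values is an assumption based on how primary keys generally behave in SQL databases.

```python
import csv
import io


def is_valid_primary_key(rows, key_columns):
    """Return True if the key columns are non-empty and unique across all rows.

    Hypothetical helper for illustration; it does not call the service.
    """
    seen = set()
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        # Assumption: empty values are not allowed in a primary key
        if any(value is None or value == "" for value in key):
            return False
        if key in seen:
            return False
        seen.add(key)
    return True


# Illustrative sample data
data = "id,station,value\n1,A,0.5\n2,A,0.7\n3,B,0.7\n"
rows = list(csv.DictReader(io.StringIO(data)))

print(is_valid_primary_key(rows, ["id"]))
print(is_valid_primary_key(rows, ["station"]))
print(is_valid_primary_key(rows, ["station", "value"]))
```

In this sample, `station` alone repeats and therefore fails the check, while `id` alone and the combination of `station` and `value` are unique.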

The import page already takes over the settings from the previous page. Click "Import Data". The table now contains the dataset.

Python

Python Compatibility

Ensure that you use the same Python library version as the target instance. For example: if you see 1.9.2 in the bottom left, you need to use the 1.9.2 Python library.
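One way to compare the installed library version with the version shown by the target instance is `importlib.metadata` from the Python standard library. This is a minimal sketch: the helper name `library_matches` is hypothetical, and the distribution name `dbrepo` is an assumption, so adjust it if the library is installed under a different name.

```python
from importlib.metadata import PackageNotFoundError, version


def library_matches(target_version: str, package: str = "dbrepo") -> bool:
    """Return True if the installed package version equals target_version.

    Hypothetical helper; the default distribution name "dbrepo" is an assumption.
    """
    try:
        installed = version(package)
    except PackageNotFoundError:
        # The package is not installed in this environment
        return False
    return installed == target_version


# Replace "1.9.2" with the version shown by your target instance
print(library_matches("1.9.2"))
```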

You can import a dataset from a pandas DataFrame via our Python library.

  • Table from Dataset
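from dbrepo.RestClient import RestClient
from pandas import DataFrame

# Column values must be list-like; a bare scalar raises a ValueError in pandas
df = DataFrame({'some_col': [123]})

# Replace <hostname> and the credentials with those of your instance
client = RestClient("http://<hostname>", username="foo", password="bar")
# Replace <database_id> with the id of your database
table = client.create_table(<database_id>,
                            "Cool Table",
                            is_public=True,
                            is_schema_public=True,
                            dataframe=df)
print(f"table id: {table.id}")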
  • Import Data into existing Table
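from dbrepo.RestClient import RestClient
from pandas import DataFrame

# Column values must be list-like; a bare scalar raises a ValueError in pandas
df = DataFrame({'some_col': [123]})

# Replace <hostname> and the credentials with those of your instance
client = RestClient("http://<hostname>", username="foo", password="bar")
# Replace <database_id> and table_id with the ids of your database and table
client.import_table_data(<database_id>,
                         table_id='4ce60952-13d3-430f-a2ad-93e4759542a0',
                         dataframe=df)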