From Spreadsheet to Database with ddlgenerator

I’ve been using some fun open data sets recently but I’m more familiar with databases than with traditional data science formats (this one is CSV). Importing CSV into Postgres is fairly well-established, but you need a table structure to import into … and I was looking at a CSV with 50 columns (for my post about exoplanets on the Aiven blog). Enter this neat utility: ddlgenerator.

I used it for a CSV file and put the data into PostgreSQL but on closer inspection I realised that this tool can ingest a great selection of different formats, including JSON, YAML and even HTML. And it can create the table structure for any of the dialects that SQLAlchemy can speak – I expected PostgreSQL and MySQL support but the selection is pretty extensive, including SQLite, SQL Server and even Oracle databases.

The tool itself is so simple it’s almost disappointing! Install it with pip, and it’s a CLI tool.

$ pip install ddlgenerator
$ ddlgenerator postgres planets.csv

The output is the SQL statements to run on the platform you specified (e.g. postgres) to create the table structure for the input data (e.g. planets.csv). I used the built in support to import the CSV file to my new PostgreSQL database, but ddlgenerator can also generate all the INSERT statements you’d need to get SQL statements to put the data in the table as well, so that’s neat.

If you’re also wrangling data, this is one more tool for the box. And now I’ve written it down, I might remember it next time I need it too :)

One thought on “From Spreadsheet to Database with ddlgenerator

  1. That is neat, and definitely one for the toolbox. It reminded me that back in 2008, I had a similar issue with needing to import CSV data in daily batches into an Oracle database. Oracle has a tool called SQL*Loader which is a sort of Swiss-Army-knife data import tool, and I could run it from a cron job every day to pick up any new files and import them. I’ve just had a quick look to see if similar tools exist for other databases, and it seems that there is one for Postgres called pgloader. https://pgloader.io/

Leave a Reply

Please use [code] and [/code] around any source code you wish to share.

This site uses Akismet to reduce spam. Learn how your comment data is processed.