Saturday, 16 February 2019

Importing CSV files into SQLite

So this is another diversion from Python itself, but given how Python can interact with both SQLite and CSV files, I think it still fits into this blog's range of topics.

Previously I have debated SQLite vs CSV (Comma Separated Values) and came down on the side of SQLite. However, a lot of data is collected as spreadsheets - a quick and easy way to do tables of data, and tables are at the heart of how SQLite stores data. There is a method.
The example I will be using is for a database of Dungeons and Dragons things, including characters, monsters, spells and magic items. I have a spreadsheet (actually OpenOffice, .ods) from an older project where I painstakingly inputted over 800 spells. Rather than duplicate that effort with data entry in SQLite, it seems better to import the sheet into SQLite

Firstly there needs to be an appropriate SQLite table in the SQLite database. Usually the nature of the system and the problem  means that you are importing into an existing one, but just a warning that you can't import into an SQLite table that doesn't exist yet.
The data for the SQLite table "spell" is show in a program that makes use of the PRAGMA SQLite command.
Table: spell
============
(0, 'SpellName', 'text primary_key', 0, None, 0)
(1, 'Source', 'text', 0, None, 0)
(2, 'CasterClass', 'text', 0, None, 0)
(3, 'SpellLevel', 'integer', 0, None, 0)
(4, 'Description', 'text', 0, None, 0)
Number of records in spell = 2


So now we make sure that the spreadsheet has a similar layout.
Here are some sample rows
Caster/Domain Spell Name Level Brief Info Source
Air Obscuring Mist 1 Fog surrounds you Player's Handbook
Air Wind Wall 2 Deflects arrows, smaller creatures & gases Player's Handbook
Air Gaseous Form 3 Subject becomes insubstantial and can fly slowly Player's Handbook
Air Air Walk 4 Subject treads on air as if solid (climb at 45-degree angle) Player's Handbook
Air Control Winds 5 Change wind speed and direction Player's Handbook
Air Chain Lightning 6 1d6 damage/level, secondary bolts Player's Handbook
This does not use exactly the same column headings or order, so let's rearrange the columns  in the sheet. We also remove the heading rows (otherwise they will be imported as data). I decided to use semicolons as the delimiter between cells, so I use Find/Replace to remove any confusing semicolons from the data (replacing them with commas). Then we Save As... a CSV file. 
OpenOffice asks a few questions when saving from a spreadsheet to a CSV, including "Do you want to quote text values?" and I said yes. I hope this is right.

So if I now open up the CSV file using a text editor that than a spreadsheet program I get
"Obscuring Mist":"Player's Handbook":"Air":1:"Fog surrounds you"
"Wind Wall":"Player's Handbook":"Air":2:"Deflects arrows, smaller creatures & gases"
"Gaseous Form":"Player's Handbook":"Air":3:"Subject becomes insubstantial and can fly slowly"
"Air Walk":"Player's Handbook":"Air":4:"Subject treads on air as if solid (climb at 45-degree angle)"
"Control Winds":"Player's Handbook":"Air":5:"Change wind speed and direction"
"Chain Lightning":"Player's Handbook":"Air":6:"1d6 damage/level, secondary bolts"

Actually those look suspiciously like colons, not semicolons. But it doesn't matter in this case.
So here comes the meat of the matter. I'm not going to do this in Python but I'll go direct to the SQLite command line using Windows Command Prompt.

Microsoft Windows [Version 10.0.17134.590]
(c) 2018 Microsoft Corporation. All rights reserved.

C:\Users\pc>cd C:\sqlite
C:\sqlite>sqlite3
SQLite version 3.20.1 2017-08-24 16:21:36
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .separator :
sqlite>
sqlite> .open db/cardshuffle.db
sqlite> .tables
character  charlevel  event      location   magicitem  monster    spell
sqlite> .import C:/Users/pc/Documents/Misc D&D/SpellList2.csv spell
Usage: .import FILE TABLE
sqlite> .import "C:/Users/pc/Documents/Misc D&D/SpellList2.csv" spell
sqlite>


This is rather arcane knowledge (no, not the D&D spells, but the SQLite command line interface).
These are not commands that are executable via Python - they are SQLite interface commands.
So now when I run the same program as I did to show the metadata for the columns, I get:
Table: spell
============
(0, 'SpellName', 'text primary_key', 0, None, 0)
(1, 'Source', 'text', 0, None, 0)
(2, 'CasterClass', 'text', 0, None, 0)
(3, 'SpellLevel', 'integer', 0, None, 0)
(4, 'Description', 'text', 0, None, 0)
Number of records in spell = 814

The bottom line shows that there are now 814 records, and since there were 812 lines in the CSV file, I feel confident it has been imported as intended.

Let's try this again with the monsters table.  The metadata/PRAGMA for this is:
Table: monster
==============
(0, 'MonName', 'text primary_key', 0, None, 0)
(1, 'Source', 'text', 0, None, 0)
(2, 'CR', 'integer', 0, None, 0)
(3, 'MonType', 'text', 0, None, 0)
(4, 'Description', 'text', 0, None, 0)
(5, 'Image', 'text', 0, None, 0)
Number of records in monster = 2


And the first few lines of the spreadsheet are:
Name Type Basic HD CR Alignment Source
Spectral Plant Plant 0 0
CC2: DM
Bat Animal 0.25 0.1 TN MM
Toad Animal 0.25 0.1 TN MM
Floating Eye Beast 0.5 0.1 TN Tome of Horrors

Here the columns don't quite match up. In the spreadsheet, Name, Type, CR and Source are the equivalent of the SQLite table columns of MonName, MonType, CR and Source. However, the SQLite table has no column for Alignment or for HD, while the spreadsheet has no column for Description or Image. Also CR in the spreadsheet can be a decimal fraction (effectively a floating point number) while in the SQLite table it must be an integer (whole number).

Can this be done? Should it be done? These are judgements that should be made by the developer and maybe end user. Let's go ahead.
So I shuffle the columns around, remove the unwanted header row, round the floating point numbers down to integers, delete the unwanted columns and save it as a CSV file. Opening it up in a text editor, the first few lines are:
"Spectral Plant";"CC2: DM";0;"Plant"
"Bat";"MM";0;"Animal"
"Toad";"MM";0;"Animal"
"Floating Eye";"Tome of Horrors";0;"Beast"
"Centipede, Tiny";"MM";0;"Vermin"
"Rat";"MM";0;"Animal"

So I try importing this on the SQLite command line interface.
Microsoft Windows [Version 10.0.17134.590]
(c) 2018 Microsoft Corporation. All rights reserved.

C:\Users\pc>cd C:\sqlite
C:\sqlite>sqlite3
SQLite version 3.20.1 2017-08-24 16:21:36
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .open db/cardshuffle.db
sqlite> .tables
character  charlevel  event      location   magicitem  monster    spell
sqlite> .separator ;
sqlite> .import "C:/Users/pc/Documents/Misc D&D/MonsterList.csv" monster
C:/Users/pc/Documents/Misc D&D/MonsterList.csv:1: expected 6 columns but found 4 - filling the rest with NULL
C:/Users/pc/Documents/Misc D&D/MonsterList.csv:2: expected 6 columns but found 4 - filling the rest with NULL
C:/Users/pc/Documents/Misc D&D/MonsterList.csv:3: expected 6 columns but found 4 - filling the rest with NULL
C:/Users/pc/Documents/Misc D&D/MonsterList.csv:4: expected 6 columns but found 4 - filling the rest with NULL

….
sqlite>

Actually it produces a lot more warning messages - not error messages, because it is still partially completing the tasks and continuing with the overall process.  
And the results (still in SQLite command line interface)?

sqlite> .headers
Usage: .headers on|off
sqlite> .headers on
sqlite> .mode columns
sqlite> SELECT * FROM monster WHERE MonType ="Giant";
MonName     Source      CR          MonType     Description  Image
----------  ----------  ----------  ----------  -----------  ----------
Rock Troll  Drag286     1           Giant
Ogre        MM          2           Giant
Giant, Pha  Faerun      3           Giant
Troll, Ice  Tome of Ho  3           Giant
Troll, Swa  Tome of Ho  3           Giant
Celestian   CC          4           Giant
Tuscar      CC2: DM     4           Giant
Troll, For  Mon Man 3   4           Giant
Steppe Tro  CC          5           Giant
Fensir      Fiend Foli  5           Giant
Ettin       MM          5           Giant
Troll       MM          5           Giant
Ogre, Skul  Mon Man 3   5           Giant
Giant, Woo  Tome of Ho  5           Giant
Troll, Cav  Tome of Ho  5           Giant
Aberrant    Tome of Ho  5           Giant
Flayed Gia  CC2: DM     6           Giant
Surged Gia  CC2: DM     6           Giant
Giant, Bog  Fiend Foli  6           Giant
Troll, Roc  Tome of Ho  6           Giant
Troll, Two  Tome of Ho  6           Giant
Giant, Smo  Tome of Ho  6           Giant
Qor         Liber Best  7           Giant
Giant, Hil  MM          7           Giant
Troll, Cry  Mon Man 3   7           Giant
Gaurak Tro  CC2: DM     8           Giant
Giant, Fog  Faerun      8           Giant
Fensir, Ra  Fiend Foli  8           Giant
Giant, Sto  MM          8           Giant
Ogre Mage   MM          8           Giant
Ogre, Skul  Mon Man 3   8           Giant
Troll, Cav  Mon Man 3   8           Giant
Steel Gian  CC          9           Giant
Half-Troll  Fiend Foli  9           Giant
Giant, Fro  MM          9           Giant
Giant, Cav  Tome of Ho  9           Giant
Stick Gian  CC          10          Giant
Forest Wal  CC2: DM     10          Giant
Giant, Fir  MM          10          Giant
Giant, San  Mon Man 3   10          Giant
Troll, Mou  Mon Man 3   10          Giant
Giant, Sea  Tome of Ho  10          Giant
Giant, Clo  MM          11          Giant
Fomorian    Mon Man 2   11          Giant
Giant, For  Mon Man 2   11          Giant
Giant, Bro  Tome of Ho  11          Giant
Firbolg     Mon Man 2   12          Giant
Giant, Sun  Mon Man 2   12          Giant
Biclops     Tome of Ho  12          Giant
Giant, Sto  MM          13          Giant
Menkau (Sa  Necropolis  14          Giant
Giant, San  Tome of Ho  14          Giant
Giant, Vol  Tome of Ho  14          Giant
Giant, Eld  Mon Man 3   15          Giant
Giant, San  Mon Man 3   15          Giant
Biclops He  Tome of Ho  15          Giant
Giant, Dea  Mon Man 3   16          Giant
Giant, Fer  Tome of Ho  17          Giant
Giant, Jac  Tome of Ho  17          Giant
Giant, Sha  Fiend Foli  18          Giant
Giant, Oce  Mon Man 2   19          Giant
Geriviar    Mon Man 3   19          Giant
Giant, Eld  Mon Man 3   20          Giant
Giant, Mou  Mon Man 2   26          Giant
SQLite>


That's good enough for me. I can fill in the other columns as and when.