Wednesday, 7 November 2018

Random thoughts on CSV files

Why bother with SQLite when I could use CSV files?
Since restarting this blog a month ago all of my posts have involved SQLite to varying degrees. For those who are expecting to just learn about Python this may be an annoying side-trek. One alternative I discussed in an earlier post is using CSV or text files to hold persistent data. Python can open, read from and write to these files. There are a number of advantages that SQLite has.
  • SQLite is faster when file sizes and quantities of data increase. In terms of speed, SQLite is something of a compromise. For enterprise/corporation level databases, something like Oracle or Microsoft SQL Server are in a whole different league of performance and can cope with millions of records and thousands of tables (see this page for MS SQL Server). Nonetheless, SQLite is better than using Python manipulating CSV files, and it is easier for me to install and run than the enterprise-level SQL programs. 
  • SQLite does a lot of work that you would otherwise need to do in Python. This is something I've appreciated even at this relatively low level. One well-written SQLite command passed by the cursor object can replace a dozen lines of Python code with their associated possible errors. As stated before, SQL is a language itself, and brings with it extra capabilities for managing and searching databases, especially relational databases. The effort put into learning the basics of SQL starts to pay off quickly. 
  • SQLite is professionally more useful, mainly because it uses SQL and a lot of businesses use SQL-based programs and databases rather than CSV files. They might use a different SQL program (like those mentioned above) but the SQL is mostly the same.  
  • Editing data in SQLite is far easier than reading/writing to CSV files, particularly if there is one or two items of data to edit in the middle of the table but other data around it does not need to be changed. For SQLite you can use the UPDATE command. For CSV files it is not easy to rewrite a specific line while leaving the rest of the file alone. At my (limited) skill I would end up reading the whole file into Python, making modifications in Python, then overwriting the entire file with the new version held in Python. Although doable with small files, when you are dealing with big files this can become very inefficient. 
Having said all that there are modules that help Python to interact with CSV files, including the csv module documented here. I probably ought to learn these, but right now SQLite seems better, particularly now I've made the initial investment of time and effort in getting SQLite running in my Python programs. CSV files can also be opened and edited in spreadsheets, unlike SQLite files, thus partially nullifying my final bullet point.
And SQLite, being a language with many nice features, can import data from and export data to CSV files, so if you change your mind after entering big tables of data you can always move your data from one to the other.

No comments:

Post a Comment