Beginner

For most serious applications, you will often have to have persistent storage (storage that still exists after your applications stops running) of some sort.  For new developers, it can be quite daunting to decide which option to go for.  Is a simple flat file enough?  When should you use something like a database?  Which database should you use?  There are so many options that are available it becomes quite daunting to decide which way to go for.  

This is a starting guide to provide an overview of some of the many data storage options that are available for you and how you can go about deciding.  One thing to keep in mind is that if you are developing an application which is either planned or has a possibility to scale over time, your underlying database might also grow overtime.  It may be quick and easy to implement a file as storage, but as your data grows it might be better to use a relational database but it will take a little bit more effort.  Let’s look at this a bit deeper

What are the possible ways to store data?

There are many methods of persistent storage that you can use (persistent storage means that after your program is finished running your data is not lost).  The typical ways you can do this is either by using a file which you save data to, or by using the python pickle mechanism.  Firstly I will explain what some of the persistent storage options are:

  • File: This is where you store the data in a text based file in format such as CSV (comma separated values), JSON, and others
  • Python Pickle: A python pickle is a mechanism where you can save a data structure directly to a file, and then you can retrieve the data directly from the file next time you run your program.  You can do this with a library called “pickle”
  • Config files: config files are similar to File and Python Pickle in that the data is stored in a file format but is intended to be directly edited by a user
  • Database SQLite: this is a database where you can run queries to search for data, but the data is stored in a file
  • Database Postgres (or other SQL based database): this is a database service where there’s another program that you run to manage the database, and you call functions (or SQL queries) on the database service to get the data back in an efficient manner.  SQL based databases are great for structured data – e.g. table-like/excel-like data.  You would search for data by category fields as an example
  • Key-value database (e.g redis is one of the most famous): A key-value database is exactly that, it contains a database where you search by a key, and then it returns a value.  This value can be a single value or it can be a set of fields that are associated with that value.  A common use of a key-value database is for hash-based data.  Meaning that you have a specific key that you want to search for, and then you get all the related fields associated with that key – much like a dictionary in python, but the benefit being its in a persistent storage
  • Graph Database (e.g. Neo4J): A graph database stores data which is built to navigate relationships.  This is something that is rather cumbersome to do in a relational database where you need to have many intermediary tables but becomes trivial with GraphQL language
  • Text Search (e.g. Elastic Search): A purpose built database for text search which is extremely fast when searching for strings or long text
  • Time series database (e.g. influx): For IoT data where each record is stored with a timestamp key and you need to do queries in time blocks, time series databases are ideal.  You can do common operations such as to aggregate, search, slice data through specific query operations
  • NOSQL document database (e.g. mongodb, couchdb): this is a database that also runs as a separate service but is specifically for “unstructured data” (non-table like data) such as text, images where you search for records in a free form way such as by text strings.

There is no one persistent storage mechanism that fits all, it really depends on your purpose (or “use case”) to determine which database works best for you as there are pros and cons for each.

SetupEditable outside PythonVolumeRead SpeedWrite SpeedInbuilt Redundancy
File None – you can create a file in your python codeFor text basedSmallSlowSlowNo – manual
Python PickleNone- you can create this in your python codeNo – only in pythonSmallSlowSlowNo – manual
Config FileOptional.  You can create a config file before handYes – you can use any text based editorSmallSlowSlowNo – manual
Database SQLiteNone – database created automaticallyNo – only in pythonSmall-MedSlow-MedSlow-MedNo – manual
Relational SQL DatabaseSeparate installation of serverThrough the SQL console or other SQL clientsLargeFastFastYes, require extra setup
NoSQL Column DatabaseSeparate installation of serverYes, through external  clientVery largeVery fastVery fastYes, inbuilt
Key-Value databaseSeparate installation of serverYes, through external  clientVery largeVery fastFast-Very FastYes, require extra setup
Graph DatabaseSeparate installation of serverSeparate installation of serverYes, through external  clientLargeMedMedYes, require extra setup
Time Series DatabaseSeparate installation of serverYes, through external  clientVery largeVery fastFastYes, require extra setup
Text Search DatabaseSeparate installation of serverYes, through external  clientVery largeVery fastFastYes, require extra setup
NoSQL Documet DBSeparate installation of serverYes, through external  clientVery largeVery fastFastYes, require extra setup 

A big disclaimer here, for some of the responses, the more accurate answer is “it depends”.  For example, for redundancy for relational databases, some have it inbuilt such as Oracle RAC enterprise databases and for others you can set up redundancy where you could have an infrastructure solution.  However, to provide a simpler guidance, I’ve made this a bit more prescriptive.  If you would like to dive deeper, then please don’t rely purely on the table above!  Look into the documentation of the particular database product you are considering or reach out to me and I’m happy to provide some advice.

Summary

 There are in fact plenty of SaaS-based options for database or persistent storage that are popping up which is exciting.  These newer SaaS options (for example, firebase, restdb.io, anvil.works etc) are great in that they save you time on the heavy lifting, but then there may be times you still want to manage your own database.  This may be because you want to keep your data yourself, or simply because you want to save costs as you already have an environment either on your own laptop, or you’re paying a fixed price for a virtual machine.  Hence, managing your own persistent storage may be more cost effective rather than paying for another SaaS.   However, certainly don’t discount the SaaS options altogether, as they will at least help you with things like backups, security updates etc for you.

Join the Python Insiders Group and get FREE tips in your inbox
Also, when you subscribe, we will send you a list of the most useful python one liners which will help you save time, make your code more readable, and which you can use immediately in your code! Subscribe to our email list and get the list now!