How I approached software development and why I prefer PostgreSQL to MySQL

This post is basically a PostgreSQL vs MySQL comparison. Working with MySQL has always been a pain for me, on the other side the experience with PostgreSQL was always just great! That is why I prefer PostgreSQL.

This post is more a rant against MySQL (InnoDB) rather than a technical tutorial (as I was used to write in the previous months). Everything started with this tweet. I'm not going to talk that much about exotic features in a multi server configuration in my comparison, I will try to focus on inconsistencies that might affect relatively simple every-day-applications.

If you are interested just in the MySQL wideness I've found, just jump to the next paragraph. On the other side if you want to hear a nostalgic story to get a bit more of a context continue reading.

When I approached web development was the 2001 (I was 18). In reality I even did not know really well what "software development" means or where I will end up. I learned to code with PHP and to use MySQL, just to build my personal website.

Later I got hired to work in a local web agency. They were using PostgreSQL across the majority of their products. At that time, for me there were no differences between the two databases, the only difference I was noticing was that I had to use pg_connect instead of mysql_connect and the admin panel was "phppgadmin" instead of "phpmyadmin".

Fast forward, to nowadays. I've learned a lot and there is still a lot to learn. I fall in love with PostgreSQL and for its consistency, but really often I have to work on projects that use MySQL and the experience is not that nice.

I've tried to analyze the "why people start projects by using mysql instead of postgres?" My conclusion is just one. Back in the time when I started development (~2001) PostgreSQL was not available on Windows (Mac and Linux based SO were not yet that good). MySQL was easy to install and run. So generations of developers learned to use MySQL, not as "a" database, but as "the" database. In 2005 postgres 8.0 was running natively on windows, but only 2-3 years later become good enough.

In the last two decades I've built and/or maintained a variety of applications, working intensively with both databases. Each time I had to deal with Postgres was always "ordinary work", I was able to focus on the application logic I was developing. On the other hand, most of the time I had to work with MySQL I had always to pay attention to quirks and edge-cases.

MySQL...

MySQL is a great database that contributed to the success of many other projects (included PHP). With a promise of being SQL-compliant and ACID-compliant. But as many of the software out there, what is MySQL today is the product of evolution. Unfortunately MySQL developers did not manage to keep a good balance between stability, consistency and new features.

Many of the parameter configurations available in MySQL 5.7 (the most popular installed version) are about "disabling" some of the weird features that MySQL offers. The even more weird thing that most of the time, the default setting prefers the less-standard value, creating more damage than benefit when they were introduced in first.

Examples:

  • sql-mode is a mixed mess of options aimed to disable/enable some of the weird features were added in previous versions.
    • ALLOW_INVALID_DATES: (enabled by default) MySQL allows you to save a date as "1900-00-00". Why? Which days is "1900-00-00"
    • NO_ZERO_DATE: similarly as in the previous case, in the past (enabled by default) MySQL allows you to save an invalid date as "0000-00-00".
    • explicit_defaults_for_timestamp: to make things more confusing, in MySQL 8.0 there is this new parameter that is somewhere in between of the previous two.
    • ONLY_FULL_GROUP_BY: this option is still used nowadays to be able to retrieve some extra columns when doing a "group by" query; The rows retrieved are not predictable and in more of the case depend on how data are internally stored in MySQL.
  • ENUM is one of the basic data types in MySQL, but it has many trade offs:
    • A column with ENUM('Mercury', 'Venus', 'Earth') allows you to insert records as NULL, "Mercury", "Venus", "Earth" and "" (empty string)... Why ""... ?
  • (this is one of my favorites) Adding a column in a table might (and will in highly concurrent environments) create duplicate rows and the failure of the column addition. This weird behaviour is explained here, the MySQL documentation explains the technical reason of this here.

    When running an online DDL operation, the thread that runs the ALTER TABLE statement applies an “online log” of DML operations that were run concurrently on the same table from other connection threads. When the DML operations are applied, it is possible to encounter a duplicate key entry error (ERROR 1062 (23000): Duplicate entry), even if the duplicate entry is only temporary and would be reverted by a later entry in the “online log”.

    Documenting the bug allows MySQL to not fix it! Nice!

  • MySQL will lock the tables for most of the schema operation as adding columns, removing columns and so on, leading to the inability to write data to the table and in some cases also to read data from it. This is a real pain point if you have tables with a lot of data.
  • When making schema changes (adding/removing tables, columns, indexes and so on) MySQL will automatically commit all the changes... making transactions useless...
  • MySQL allows only one trigger per type/table... making that trigger generally more complex and hard to maintain.

Another fun detail, this year MySQL started to roll-out a new major release, MySQL 8.0. Wait... where is the 6.0 and 7.0?

Good examples

Out there, there are some examples of great ways of using MySQL.

  • Designing Schemaless, Uber Engineering’s Scalable Datastore Using MySQL: In this article Uber explains how they use MySQL to store their trip data.
    They are using MySQL basically as key-value storage (with some nice extras) to store a huge amount of data in an eventually-consistent system. It is not SQL (many operations are just not possible), it is just an application-level managed storage. Scales well and works well for the Uber use case.
    To be honest, I'm not that smart to understand why want they did could not be implemented with any other database (postgres included), since all the storage logic is implemented by the application.

  • MyRocks: A space- and write-optimized MySQL database In this article, the facebook team explains how they have optimized MySQL to make a write-efficient database.
    MyRocks looks to be just a different database built on top of the MySQL storage-engine API. It has weak points, limitations and advantages but is far from being MySQL with the InnoDB storage.

This are just two example of some great software build on top of MySQL. But this cases (and many other) have one point in common. They are not really using MySQL, they are using a subset of it and building a brand new application top of it.

MySQL master-master replication

MySQL can be configured to perform master-to-master replication. This is one of the main selling point of MySQL. Most people I've met, do not know what they are buying.

Master-Master replication is tricky, has many limitations and you lose many of the traditional ACID-compliant relational databases features. Auto-increments are tricky to manage because of possible write conflicts; Unique-keys can not be easily supported and can lead to conflict situations having on different masters duplicate rows; This is an answer to the mysql master-master question i really like.

On the other hand... PostgreSQL!

PostgreSQL, initially developed by the Berkeley University (California) started with a much more formal approach to databases than MySQL. The development process of PostgreSQL is relatively strict, but on the other side PostgreSQL has a very low bug-rate and the features added are always well thought and consistent.

Personally do not remember cases of weird features introduced just for "marketing".

The list of features that PostgreSQL offers is endless. Will try just to point out some of the most common (and that I miss when I have to work with MySQL).

  • SQL-standards: Reading the documentation postgres claims to follow at least 160 out of 179 SQL:2011 mandatory features required for full Core conformance. On the other hand MySQL explicitly says that they are ready to sacrifice standards for speed and reliability (instead of pursuing a way to achieve both goals as postgres does most of the time).
  • Functional Indexes: In Postgres you can use functions (build in or written by you via stored procedures) and build indexes over them. This is just a killer feature most of the time! You can putt all the complex logic in a custom function, index it and your searches will be super fast!
  • Window-Functions: If you have to deal with reporting, this will save your life! Otherwise just do not bother about it. Window functions perform a calculations across a set of table rows that are somehow related to the current row. More info about it here.
  • GIN and GIST indexes: This are special kind of indexes that store on the leafs more than one row-reference. They are a killer feature if you have to store (and search) arrays, geo-data, full-text and similar complex data-sets. Moore info about GIN/GIST indexes is available here.
  • EXPLAIN: The postgres EXPLAIN query feature is just great! You should try it to see the difference. The postgres EXPLAIN query contains detailed index usage, scan and join strategies, timing, buffers and memory usage and much much more.
  • Row level constraints: PostgreSQL in addition to foreign keys and unique indexes offers CHECK constraints. Essentially is a check executed on each update/insert of a row and ensures the consistency of the data. As example
    we can create a check as CHECK (price > 0 AND discount <=100 OR special IS NOT NULL) to check that the "price" column is always greater than zero and the discount is less than 100 only if "special" is not null. The expression inside CHECK can be any valid postgres expression, including checking geo coordinates checks, special indexes and so on.
  • Range types: Postgres indexable data types to represent datetime and numeric ranges.
  • The merge-join strategy: is a special join strategy that is super fast when both tables have the same sorting.
  • Locking: Postgres has many features to avoid locking as example the CREATE INDEX CONCURRENTLY that allows you to add indexes without locking.
  • Adding/removing columns is atomic: adding columns (with default NULL) and removing them is super-fast as it alters only the table metadata (no need to copy the table onto a new one). MySQL in most of the cases has to re-write the table (that might be very expensive if you have millions of rows). When using postgres at-scale in a highly concurrent environment this makes the difference.
  • Full text: MySQL has full text support, but compared to postgres-full-text looks like a toy. PostgreSQL allows you to configure the language of the text, fine-tune the stemming rules for each language, weighted ranking and so on.
  • Spatial data: if you have to work with spatial data (geo data or geometric types/shapes) postgres is just excellent. By default postgres has specific data types to store and query information about "shapes" as squares, triangles, circles spheres cubes and so on. For more advanced use-cases, PostGIS is an extension-set that makes geo-data handling a pleasure (used by OpenStreetMap as example). MySQL introduced a similar feature-set in 5.7 but is much simpler and limited, with postgres you have more operation types built in specialized indexes for read-optimized or write-optimized operations on shapes and much more.

The list of PostgreSQL features I like (and miss when working with MySQL) is endless. What makes postgres so attractive to me is the extension mechanism it has that allows to add extra features to postgres really really easily. Postgres has already dozens of extensions ready to be used and that will make your life easier. Some extensions I use often are bloom (super efficient index types for probabilistic index matching), cube (multi dimensional indexable data type for 3D or more dimensions searches), earthdistance (calculate easily geo distances), hstore (simple, efficient and ACID key-value storage).

PostgreSQL replication

PostgreSQL offers master-slave replication. Works similarly to MySQL. Obviously has some differences but the big picture is similar. Offers classical master-slave replication mode with also hot-standby master mode (essential a "waiting" master ready to take over in case of the main master failure).

Out of the box PostgreSQL does not offer master-master replication. This could sound a big disadvantage.
In my experience, with PostgreSQL takes much more time (and load) to reach a state where more than one master is necessary for performance reasons. (For high availability, the hot-standby master-mode is already part of the postgres core.)

In reality there are third-party solutions for master-master replications as Postgres-BDR that do an excellent job with multi-master solutions!

Conclusion

If there is a rule I've learned in life is that things are never black or white, most of the time are gray.

So probably there are use cases where MySQL does a much better job than PostgreSQL, I just have not met them yet. Would be glad to hear experiences from both sides as in my opinion there is always something to learn.

Would be glad to hear experiences where in your opinion MySQL did really a great job.

Hope you enjoyed this article and the others. If you have any type of feedback, do not hesitate to leave a comment!

Edit, 2018-03-18: I've updated this document based on the feedback received in the past weeks. Thanks to everybody.

php, extensibility, plugins, static-files, images, css, js, modules, software, design

Want more info?