As every year, at the end of summer I updated the lectures for the course called Database Systems in Practice that we teach at universities. This time I decided to radically update our computer workshop material. For my students, I have the same environment prepared on various database servers. The students look for errors in the relational model, write scripts for changes in the schema, compare different software engineering practices and their suitability for developing data-oriented systems, and finally, test the behaviour of individual systems while parallel processing user queries.
My task was not too complicated: on a Linux server to install as many different database servers as possible and configure each of them as an environment that is easy to administer and restore. At the same time, each student’s environment had to be isolated to prevent any interference with the work of others. I was also looking for one easy-to-use development tool for all the database servers.
When preparing the environment, I was surprised by how much the individual database servers had changed over the last few years. My surprise was such that I decided to share it with you.
I did not feel like wading through the licence quagmire—like whether the server could be used for teaching and who owns the hardware and other niceties invented by individual licence holders. To avoid that, I picked either the express or community editions of servers. In the end, I chose Oracle 18c XEXE, PostgreSQL 10, Microsoft SQL Server 2017, and MySQL 15 .7. All these servers offer easy installation using rpm packages. The biggest surprise for me was the installation of MS SQL, which was the easiest. Hats off to Microsoft developers who, it seems, decided that for Linux they would do it from the ground up, and well. All the servers, except Oracle, have a connection to the shared rpm repository that allows easy application of patches. Only for Oracle you need to download installation packages directly from Oracle Web. I was looking for a universal client capable of working with all servers. I needed an application with a sufficiently open license that worked in Windows as well as in Linux or Mac OS. I considered Oracle SQL Developer, a native client for Oracle that also supports the JDBC of other parties. SQL Developer supports the administration of data objects and SQL commands, but it has problems with quite simple scripts. All the servers have command line clients (sqlplus, pgsql, isql or mysql) allowing them to run scripts, but this is not sufficient for effective development. In the end, the DBeaver Community Edition appeared to be the best choice.
All the above-mentioned database servers support similar logic regarding disk space management: a physical definition of storage space on the disk and logical re-distribution for individual databases and schemas. But the specific implementation of this logic is very different. Then on top of that sits a more or less ANSI-compatible structure of tables and other database objects. The differences between the solutions mean that creating databases and backing-up or copying databases are done differently on each server. For example, PostgreSQL supports template-based database generation. Also, MSSQL does simple database generation using a model; but in this case, the model is intended for tempdb used when starting the server and it is not suitable for modifying into user databases. Oracle only allows the creation of multiple databases in version 12 and newer. But this does not pose a problem, since instead of having several databases, we can use multiple database schemas in a single database—which in Oracle is a natural way of separating users.
The network configuration and the way users are connected is very different. Only MS SQL did not need access to configuration files to change the port that the server listens on. Both PostgreSQL and MySQL stored all the essential rules for the user connection in files. In both cases, I found inconsistencies with the documentation. Unlike Oracle and MSSQL, both Postgres and MySQL servers can control user access based on the user’s IP address.
The database catalogue—tables and views describing the database structure—is different for each database. Also, you cannot totally rely on the existence of a normalized INFORMATION_SCHEMA. It is not supported by Oracle and the native database catalogues contain much more of the information needed for effective development than the universal INFORMATION_SCHEMA. Each database has its own commands for the description of data structures—for example, Oracle DESCRIBE, PostgreSQL \d, MySQL SHOW, MSSQL sp_help. With the exception of MSSQL, these commands are implemented at the client level. If using a non-standard client, it is necessary to query the database catalogue directly.
Database programming would deserve its own chapter. Each database has its own programming language: Oracle has PL/SQL, MSSQL uses TransactSQL, PostgreSQL has PL/pgSQL, while MySQL uses SQL/PSM standard with some exceptions. Additionally, the databases support other languages linked to their architecture. In Oracle you can write procedures in C, MSSQL supports .NET, and Postgres knows PERL and Python. In my experience, the freedom to choose from different languages is more of a complication than an advantage. The use of non-native languages leads to excessive reliance on the operating system, it increases demands on the development team, and it tempts developers to use custom code instead of the code already available in the database when implementing various functional and non-functional requirements. And all that only distracts from the main goal: a well-designed logical and physical database model.
The task of trying out database servers in workshops and simple student projects revealed that there are no major differences between the servers I have tested. A few years ago, research companies such as Gartner® and Forrester® reported that all these databases are suitable for deployment in corporate production environments. This means that when developing applications, it is not necessary to pay so much attention to requirements concerning scalability, operations, and administration. We encounter these databases in our customers’ production environments, where we have seen both their benefits as well as their drawbacks.
I myself was curious about what the students would come up with in the workshops. Namely, what criteria the students would pick for evaluating the servers and what servers they would prefer. The results can be summarized in three points:
- Students appreciated the opportunity to evaluate various technologies. They were surprised by how different the individual systems are and how different the code is when writing SQL queries.
- Even undergraduate students preferred data servers that they had previous experience with, gained from projects or employment. Probably because data-centric development requires considerable experience and knowledge specific to the particular type of data server.
- Some sort of universal development environment simply does not exist. If the development is even the slightest bit complex, its necessary to use a specialized client and specialized tools.