Database Shellscripts Considered Harmful

Why You Should Stop Scripting and Start Designing Applications

As a database application designer and developer I am constantly dismayed to find complex database logic embedded in Unix shell scripts and SQL*Plus scripts. However when I suggest that this is not a good approach, the reaction is often along the lines of "but it works", "it's simple", "everyone uses scripts" and "why the hell not?". A response on one forum was "Haven't you heard of ETL?" suggesting (I think) that OS scripting is the only way to write a warehouse file loading process.

This article is an attempt to explain why I believe that scripting is in general unsuitable for database applications (with a couple of possible exceptions, which I'll mention at the end). It is written with Oracle and shell scripts in mind, but I believe the principles apply equally to any database and any scripting language.

Some problems with scripts

On the plus side

Scripts can occasionally be valuable, for example when you need to load multiple OS files and do not know their names in advance - although even then I would try to keep processing within the database if I could, for example by using a stored Java procedure to retrieve the file names, or having an OS scheduled task periodically write the name of the next file to load into a text file, and defining an external table over that. If I did use a script, it would simply load files via SQL*Loader and perhaps call one or more PL/SQL procedures - it would not contain any business rules, and it would be structured so that it either succeeded or failed.

There are doubtless other cases where they are useful (system administration tasks such as backups are another one - and the kind of simple, portable utilities that run in SQL*Plus are of course invaluable), but you should never need a shell script, as part of a production application, to go into SQL*Plus, run queries and make decisions about how to proceed based on the results, because this is to embed business rules in a non-portable, non-visible, dynamic, inefficient, insecure client application. The fact that you see this type of thing a lot does not make it a way to design applications.

Life without scripts

If you have a job scheduler, each task can be implemented as a call to a PL/SQL procedure, without the need for a custom script. DBI-enabled Perl can make a database connection and call a procedure without the need for SQL*Plus, which should make a PL/SQL-only scheduler interface possible. Note particularly that a PL/SQL definer rights procedure (the default kind) will run against the schema it is installed in, even when called by another user. Therefore it is perfectly feasible to have a BATCH or SCHEDULER Oracle user that has no objects of its own or permission to create them, but is simply granted EXECUTE permission on relevant packages, procedures and functions.

From Oracle 9i, loading data can be accomplished by defining external tables, which have a normal table definition plus some additional clauses similar to SQL*Loader controlfile syntax; when you query the table, the OS file is read. This means you can pull data in from within the database rather than having to push it from the OS. It also means you can use all the power of SQL (DISTINCT, ORDER BY, MERGE, joins, analytic functions etc) in one pass through the OS file. You can even define a materialized view encapsulating all this logic, and your database staging process then becomes a matter of refreshing a materialized view. One advantage of this is that Oracle already provides the whole TRUNCATE - INSERT APPEND logic so you don't need to reinvent it, and afterwards the standard dependency and source viewing tools tell you when it was last refreshed and what external table it used.

© William Robertson 2005 www.williamrobertson.net