How to list the files in an OS directory using PL/SQL

The problem

With the introduction of External Tables in Oracle 9i, it became possible to read host operating system files from within the database. (There was UTL_FILE in Oracle 7.3 and BFILEs in 8i, but these aren't designed for bulk loading operations.) An obvious use for this is in batch data loads, where a file can be parsed, queried and loaded into database tables, perhaps using 9i's MERGE or multi-table insert commands. A user can push a button in a front-end application to launch a batch load process on the database server. At last we could pull the data into the database from PL/SQL, rather than pushing it from some external application such as SQL*Loader. However, one limitation that remains is that you have to know the name of the file in advance. Of course, if there is only one file then there is no problem, you just define the external table to refer to dailysales.dat. The location clause of the external table definition can even include multiple files:

LOCATION ('sales1.dat, sales2.dat, sales3.dat')

but not wildcards:

LOCATION ('sales*.dat') -- Not valid
If the file has a predicably changing definition, such as dailysales_20080726.dat, you can dynamically change the external table definition using something like:

EXECUTE IMMEDIATE
       'ALTER TABLE daily_sales_xt LOCATION (''sales_' || TO_CHAR(SYSDATE,'YYYYMMDD') || '.dat'')';

But that only takes you so far. What if the file doesn't necessarily have today's date, or there are multiple files per day, or they are sequentially numbered? It's a common requirement for multiple files to arrive daily, and after loading they should be moved or renamed. We can do the moving or renaming part using utl_file.frename, but how do we get the list of files to load, short of working through every possible filename sequentially and performing an existence check?

Solution 1

Tom Kyte posted a solution using Java for version 8i back in 2000, in which a Java stored procedure populates a global temporary table. You call the procedure then query the table:

SQL> exec get_dir_list( '/tmp' );
   SQL> select * from dir_list where rownum < 5;

   FILENAME
   ------------------------------------------------------
   data.dat
   .rpc_door
   .pcmcia
   ps_data

Now although this is very effective (and efficient - you can requery the table without having to repeat the call to the OS), it won't work in Oracle Express Edition (XE) because XE doesn't support Java stored procedures. It also bothers me a little that I have to call a procedure and then query a table, when it feels neater to me just to call a function and get the results straight back in a collection. I'm sure this is possible using a Java stored function, but converting between a Java array and an SQL collection type is not straightforward if (like me) you don't know Java. I wish there were a way to do this in good old PL/SQL.

Solution 2

Well, there is. Chris Poole discovered that the undocumented DBMS_BACKUP_RESTORE package, which appeared in 10g as part of the RMAN backup utility, contains a procedure named SEARCHFILES that lists all files for a specified directory. It doesn't use the standard directory object interface standard across other file utilities, or even the old utl_file_dir init.ora parameter that limits access to a named list of directories, but just lists all the files in any directory you ask for. Yikes! Anyway, after a successful call to this procedure, the undocumented fixed table X$KRBMSFT will contain the names of all the files in the specified directory (and any subdirectories, recursively). This works in Oracle 10.2 and 11.1, including XE.

Chris' solution involves a package named XUTL_FINDFILES. The package uses standard named directory objects rather than literal OS paths, thus imposing a layer of security since directory objects are subject to grants. You call xutl_findfiles.search_for('directory_name'), then query the view findfiles_listing to see the file listing. The view is defined using an application context whose value is reset by the search_for procedure, so it always reflects the directory you most recently searched. It also parses out the file name from the path, and uses dbms_lob.getlength() to look up the size of each file like the traditional ls -l or dir commands. Due to the limitations of fixed tables such as X$KRBMSFT, he defines all of the objects in the SYS schema (he does mention it's just a demo).

Although this is a nice, working solution, I still wish we didn't have to do the two-step process to access, essentially, the attributes of an OS object. I'd also prefer to create as few objects in the SYS schema as possible.

Solution 2.1

To do: describe how to do this with external tables. Decide whether to scrap the section about XUTL_FINDFILES now that www.chrispoole.co.uk appears to be abandoned.