Cardinality

Reports on how unique a column is. For example, 90% of its values might be unique, with the remaining duplicates appearing in groups of twos and threes. This is useful when investigating a system that doesn't have a full set of unique constraints, or just for analyzing data in general. (I've also implemented this as a PL/SQL Developer right-click option using Browser Extender.)

-- Report on cardinality of column x in table y
-- William Robertson 2003, www.williamrobertson.net
--
-- e.g:
-- SQL> @cardinality object_type dba_objects
--
--  Occurrences Groups with this cardinality   Total rows Proportion
-- ------------ ---------------------------- ------------ ----------
--            1                            1            1          0
--            2                            2            4          0
--            3                            1            3          0
--            4                            2            8          0
--            6                            1            6          0
--            8                            1            8          0
--           10                            1           10          0
--           11                            1           11          0
--           22                            1           22          0
--           25                            1           25          0
--           45                            1           45          0
--           52                            1           52          1
--           63                            1           63          1
--           81                            1           81          1
--           84                            1           84          1
--          335                            1          335          4
--          345                            1          345          4
--          507                            1          507          6
--          583                            1          583          6
--          855                            1          855          9
--          991                            1          991         11
--        1,447                            1        1,447         16
--        1,578                            1        1,578         17
--        2,131                            1        2,131         23
--                                           ------------ ----------
--                                                  9,195        100
--
-- This tells us that there is only one objects whose type is unique (it turns out that
-- on my system I have exactly one evaluation context). These show as 0% because the
-- percentages are rounded. There are also two object types that occur twice each, and so on,
-- with the most common type (as it happens, 'VIEW') occurring 2,131 times, accounting
-- for 23% of all values.
--
-- Compare this with:
-- SQL> @cardinality object_name all_objects
--
--  Occurrences Groups with this cardinality   Total rows Proportion
-- ------------ ---------------------------- ------------ ----------
--            1                        4,243        4,243         46
--            2                        1,416        2,832         31
--            3                          232          696          8
--            4                           33          132          1
--            5                            2           10          0
--            6                            2           12          0
--            8                            1            8          0
--           10                            7           70          1
--           71                            1           71          1
--           89                            3          267          3
--           92                            4          368          4
--          162                            3          486          5
--                                           ------------ ----------
--                                                  9,195        100
--
-- SQL> @cardinality object_id
--
--  Occurrences Groups with this cardinality   Total rows Proportion
-- ------------ ---------------------------- ------------ ----------
--            1                        9,191        9,191        100
--                                           ------------ ----------
--                                                  9,191        100
--
-- Null values are not counted: e.g. if a million-row table contains a value in one
-- record, this would be reported as one unique value (not one unique value and
-- 999,999 duplicates).

def column = "&1"
def table = "&2"
break on report
comp sum label "" of COUNT(*) total_rows proportion on report
col key_occurs format 999G999G999 hea "Occurrences"
col total_rows like key_occurs hea "Total rows"
col proportion format 990 hea "Proportion"
col groups_this_cardinality format 999G999G999 hea "Groups with this cardinality"

set term off
col notnullclause new_value notnullclause
select case
           when '&column' like '%,%'
           then 'where not (' ||
                replace('&column',',',' is null and ') ||
                ' is null)'
           else 'where &column is not null'
       end as notnullclause
from  dual;
set term on

select key_occurs
     , count(*) as groups_this_cardinality
     , key_occurs * count(*) as total_rows
     , 100 * ratio_to_report(key_occurs * count(*)) over () as proportion
from   ( select count(*) as key_occurs
         from   &table ¬nullclause
         group by &column )
group by key_occurs
/

cl bre comp