Cardinality
Reports on how unique a column is. For example, 90% of its values might be unique, with the remaining duplicates appearing in groups of twos and threes. This is useful when investigating a system that doesn't have a full set of unique constraints, or just for analyzing data in general. (I've also implemented this as a PL/SQL Developer right-click option using Browser Extender.)
-- Report on cardinality of column x in table y
-- William Robertson 2003, www.williamrobertson.net
--
-- e.g:
-- SQL> @cardinality object_type dba_objects
--
-- Occurrences Groups with this cardinality Total rows Proportion
-- ------------ ---------------------------- ------------ ----------
-- 1 1 1 0
-- 2 2 4 0
-- 3 1 3 0
-- 4 2 8 0
-- 6 1 6 0
-- 8 1 8 0
-- 10 1 10 0
-- 11 1 11 0
-- 22 1 22 0
-- 25 1 25 0
-- 45 1 45 0
-- 52 1 52 1
-- 63 1 63 1
-- 81 1 81 1
-- 84 1 84 1
-- 335 1 335 4
-- 345 1 345 4
-- 507 1 507 6
-- 583 1 583 6
-- 855 1 855 9
-- 991 1 991 11
-- 1,447 1 1,447 16
-- 1,578 1 1,578 17
-- 2,131 1 2,131 23
-- ------------ ----------
-- 9,195 100
--
-- This tells us that there is only one objects whose type is unique (it turns out that
-- on my system I have exactly one evaluation context). These show as 0% because the
-- percentages are rounded. There are also two object types that occur twice each, and so on,
-- with the most common type (as it happens, 'VIEW') occurring 2,131 times, accounting
-- for 23% of all values.
--
-- Compare this with:
-- SQL> @cardinality object_name all_objects
--
-- Occurrences Groups with this cardinality Total rows Proportion
-- ------------ ---------------------------- ------------ ----------
-- 1 4,243 4,243 46
-- 2 1,416 2,832 31
-- 3 232 696 8
-- 4 33 132 1
-- 5 2 10 0
-- 6 2 12 0
-- 8 1 8 0
-- 10 7 70 1
-- 71 1 71 1
-- 89 3 267 3
-- 92 4 368 4
-- 162 3 486 5
-- ------------ ----------
-- 9,195 100
--
-- SQL> @cardinality object_id
--
-- Occurrences Groups with this cardinality Total rows Proportion
-- ------------ ---------------------------- ------------ ----------
-- 1 9,191 9,191 100
-- ------------ ----------
-- 9,191 100
--
-- Null values are not counted: e.g. if a million-row table contains a value in one
-- record, this would be reported as one unique value (not one unique value and
-- 999,999 duplicates).
def column = "&1"
def table = "&2"
break on report
comp sum label "" of COUNT(*) total_rows proportion on report
col key_occurs format 999G999G999 hea "Occurrences"
col total_rows like key_occurs hea "Total rows"
col proportion format 990 hea "Proportion"
col groups_this_cardinality format 999G999G999 hea "Groups with this cardinality"
set term off
col notnullclause new_value notnullclause
select case
when '&column' like '%,%'
then 'where not (' ||
replace('&column',',',' is null and ') ||
' is null)'
else 'where &column is not null'
end as notnullclause
from dual;
set term on
select key_occurs
, count(*) as groups_this_cardinality
, key_occurs * count(*) as total_rows
, 100 * ratio_to_report(key_occurs * count(*)) over () as proportion
from ( select count(*) as key_occurs
from &table ¬nullclause
group by &column )
group by key_occurs
/
cl bre comp