I have been running a PostgreSQL cluster for a while as the primary backend for many services. The database system is very robust, well supported by the community and very powerful. In this post, I'm going to show how I use central authentication and authorization with PostgreSQL.
Centralized management is an important principle whenever deployments become very dispersed. For authentication and authorization, having a high-available LDAP is one of the more powerful components in any architecture. It isn't the only method though - it is also possible to use a distributed approach where the master data is centrally managed, but the proper data is distributed to the various systems that need it. Such a distributed approach allows for high availability without the need for a highly available central infrastructure (user ids, group membership and passwords are distributed to the servers rather than queried centrally). Here, I'm going to focus on a mixture of both methods: central authentication for password verification, and distributed authorization.
PostgreSQL default uses in-database credentials
By default, PostgreSQL uses in-database credentials for the
authentication and authorization. When a CREATE ROLE
(or
CREATE USER
) command is issued with a password, it is stored in the
pg_catalog.pg_authid
table:
postgres# select rolname, rolpassword from pg_catalog.pg_authid;
rolname | rolpassword
----------------+-------------------------------------
postgres_admin |
dmvsl |
johan |
hdc_owner |
hdc_reader |
hdc_readwrite |
hadoop |
swift |
sean |
hdpreport |
postgres | md5c127bc9fc185daf0e06e785876e38484
this cannot be moved outside):
postgres# \l db_hadoop
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+-----------+----------+------------+------------+---------------------------
db_hadoop | hdc_owner | UTF8 | en_US.utf8 | en_US.utf8 | hdc_owner=CTc/hdc_owner +
| | | | | hdc_reader=c/hdc_owner +
| | | | | hdc_readwrite=c/hdc_owner
Furthermore, PostgreSQL has some additional access controls through its
pg_hba.conf
file, in which the access towards the PostgreSQL service
itself can be governed based on context information (such as originating
IP address, target database, etc.).
For more information about the standard setups for PostgreSQL, definitely go through the official PostgreSQL documentation as it is well documented and kept up-to-date.
Now, for central management, in-database settings become more difficult to handle.
Using PAM for authentication
The first step to move the management of authentication and authorization outside the database is to look at a way to authenticate users (password verification) outside the database. I tend not to use a distributed password approach (where a central component is responsible for changing passwords on multiple targets), instead relying on a high-available LDAP setup, but with local caching (to catch short-lived network hick-ups) and local password use for last-hope accounts (such as root and admin accounts).
PostgreSQL can be configured to directly interact with an LDAP, but I
like to use Linux PAM whenever I can. For
my systems, it is a standard way of managing the authentication of many
services, so the same goes for PostgreSQL. And with the
sys-auth/pam_ldap
package integrating multiple services with LDAP is a breeze. So the
first step is to have PostgreSQL use PAM for authentication. This is
handled through its pg_hba.conf
file:
# TYPE DATABASE USER ADDRESS METHOD [OPTIONS]
local all all md5
host all all all pam pamservice=postgresql
This will have PostgreSQL use the postgresql
PAM service for
authentication. The PAM configuration is thus in
/etc/pam.d/postgresql
. In it, we can either directly use the LDAP PAM
modules, or use the SSSD modules and have SSSD work with LDAP.
Yet, this isn't sufficient. We still need to tell PostgreSQL which users can be authenticated - the users need to be defined in the database (just without password credentials because that is handled externally now). This is done together with the authorization handling.
Users and group membership
Every service on the systems I maintain has dedicated groups in which for instance its administrators are listed. For instance, for the PostgreSQL services:
# getent group gpgsqladmin
gpgsqladmin:x:413843:swift,dmvsl
A local batch job (ran through cron) queries this group (which I call
the masterlist, as well as queries which users in PostgreSQL are
assigned the postgres_admin
role (which is a superuser role like
postgres and is used as the intermediate role to assign to
administrators of a PostgreSQL service), known as the slavelist.
Delta's are then used to add the user or remove it.
# Note: membersToAdd / membersToRemove / _psql are custom functions
# so do not vainly search for them on your system ;-)
for member in $(membersToAdd ${masterlist} ${slavelist}) ; do
_psql "CREATE USER ${member} LOGIN INHERIT;" postgres
_psql "GRANT postgres_admin TO ${member};" postgres
done
for member in $(membersToRemove ${masterlist} ${slavelist}) ; do
_psql "REVOKE postgres_admin FROM ${member};" postgres
_psql "DROP USER ${member};" postgres
done
The postgres_admin
role is created whenever I create a PostgreSQL
instance. Likewise, for databases, a number of roles are added as well.
For instance, for the db_hadoop
database, the hdc_owner
,
hdc_reader
and hdc_readwrite
roles are created with the right set of
privileges. Users are then granted this role if they belong to the right
group in the LDAP. For instance:
# getent group gpgsqlhdc_own
gpgsqlhdc_own:x:413850:hadoop,johan,christov,sean
With this simple approach, granting users access to a database is a
matter of adding the user to the right group (like gpgsqlhdc_ro
for
read-only access to the Hadoop related database(s)) and either wait for
the cron-job to add it, or manually run the authorization
synchronization. By standardizing on infrastructural roles (admin,
auditor) and data roles (owner, rw, ro) managing multiple databases is a
breeze.