Papers, Please: Auditing SSH Access at Spreedly

As a PCI L1-compliant company, we treat securing our Cardholder Data Environment (CDE) as central to what we do. The PCI Data Security Standards specify that all users accessing CDE systems must be uniquely identified, i.e., no shared user accounts. This is standard best practice: you want to know who accessed what, when, and from where. Those facts are necessary, but even more important is understanding why a system is being accessed.

Although the default auth and SSH logs do tell us when systems are accessed, and by which user, these log entries lack the contextual information that would indicate why the system is being accessed:

  • A deploy?
  • Configuration Management?
  • A quick informational query?
  • A long-running interactive session?

It can sometimes be difficult or impossible to infer these things from the raw logs.

Also, the default logs are not easy to audit for patterns of use:

  • How long does each session last?
  • Are there days-old tmux sessions running?
  • Are deploys or configuration management runs getting faster or slowing down over time?

This kind of useful information simply isn’t baked into the logs by default.

Ideally, we’d like to understand the intent behind every system access while looking at a stream of log events. We’d like to force ourselves to think about why we are accessing any CDE system, and to log our expressed intention for the access. And, we’d like increased visibility into our patterns of access.

In this post, we’ll examine the system we’ve implemented to improve the audit posture of our SSH logs beyond PCI-DSS requirements, and make higher-level analysis of our SSH log data easier.

Just the Facts, Ma’am.

First, we need to talk about some relevant bits of our Cardholder Data Environment.

SSH everywhere

All CDE systems run either FreeBSD or Ubuntu Linux. All access to CDE systems occurs via SSH. SSH sessions come in one of two basic flavors: interactive and non-interactive.

Interactive Sessions

Interactive means attached to a TTY, almost always with a shell process. It’s what you get when you log into a remote system with ‘ssh user@remote’. For logins, the default shell process to invoke is specified in /etc/passwd for each user.

Non-interactive Sessions

A non-interactive SSH session lacks an attached remote terminal and sends the result of some command to your local terminal directly. This type of session is what you’d get if you added a command argument to your ssh login: ‘ssh user@remote somecommand’.
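On the server side, the two flavors can be told apart from the session environment. A minimal sketch, relying on the fact that sshd sets SSH_TTY only when a terminal is allocated; the function name session_kind is made up for this illustration:

```shell
# Classify the current SSH session. sshd sets SSH_TTY to the path of the
# allocated terminal device and leaves it unset when no TTY was requested.
# session_kind is a hypothetical helper name, not part of OpenSSH.
session_kind() {
    if [ -n "${SSH_TTY:-}" ]; then
        echo "interactive"
    else
        echo "non-interactive"
    fi
}
```

Note that a command run with a forced terminal (‘ssh -t user@remote somecommand’) would be classified as interactive by this check, since a TTY is attached.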

Non-interactive sessions are not just for ad-hoc command use, however. We use Ansible for configuration management and Capistrano for deployment. Both of these tools generate lots of non-interactive SSH sessions.

We want to quickly see which SSH sessions are normal configuration or deployment activity, so we’ll need a way to identify these types of ssh ‘clients’.

Centralized Log Processing

We use rsyslog to forward all system logs on each host to a central log aggregator. This host, in turn, emits centralized log events to our event processor, Riemann.

This flexible log processing pipeline allows us to do many things: deliver messages to Slack, issue PagerDuty alerts, perform basic anomaly detection, collect and send system telemetry data to Librato for graphing, and more. We also reap the benefit of a clear separation of concerns: local processes don’t have to know or care about what is being done with the log data they emit, and centralized processing of the aggregate log data allows us to do things we can’t do piecemeal on each host: things like global anomaly detection, or batching calls to external services with rolled-up data to avoid API rate limiting.

Logs All the Way Down

Every connection to the CDE should generate a log event. These log events should detail:

  • connecting username
  • connecting IP address
  • the command being executed
  • non-interactive client type (if applicable)
  • why the CDE is being accessed

Getting Fancy with Logging

The first hurdle is getting all the data we need onto the remote CDE host.

Passing Data to SSH

It turns out that most of the information we want to log is already available to the ssh process via the USER and SSH_CONNECTION environment variables, but our auditing mechanism still needs a way for users to pass along two fields: their stated intent for accessing the system, and what kind of client is connecting (e.g. Ansible or Capistrano).

This data passing happens via the AcceptEnv directive in /etc/ssh/sshd_config. Customarily, this mechanism is used to export the local user’s language/locale settings to the server: environment variables like LANG, LC_CTYPE, and LC_ALL are common. The ssh client configuration, however, supports sending arbitrary ENV variables via the SendEnv directive in a user’s ~/.ssh/config. We’ll use this mechanism to forward special environment variables we set locally on our machines to the CDE servers when we make an SSH connection.

We’ll call our environment variables RATIONALE and RATIONALE_CLIENT. This means that configuring SendEnv RATIONALE* on the client side and AcceptEnv RATIONALE* on the server side gets the job done. It is important to pick variable names that are not used by any known system on the server, as we’ll be setting these remotely for the duration of each user’s session.
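As a sketch, the two halves of that configuration might look like the fragment below. The Host pattern is a placeholder for illustration, not our real naming scheme:

```
# Client side: ~/.ssh/config
# (cde-*.example.com is a hypothetical host pattern)
Host cde-*.example.com
    SendEnv RATIONALE*

# Server side: /etc/ssh/sshd_config
AcceptEnv RATIONALE*
```

Both directives accept wildcard patterns, so a single RATIONALE* entry covers RATIONALE and RATIONALE_CLIENT. Scoping SendEnv to a Host block keeps the variables from being offered to unrelated servers.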

The following demonstrates the ENV mechanism with these changes in place:

Teaching SSH to Log

We initially experimented a bit with modified .bashrc and profile scripts, trying to get them to emit a log event upon every shell login, but this proved to be brittle, error-prone, and circumventable.

We need a solution that will work portably across all our hosts, and regardless of which shell a user might be invoking. It would be best if we could coax SSH itself into emitting our log events properly.

To do this, we make use of another really useful trick SSH has up its sleeve: the ForceCommand directive in /etc/ssh/sshd_config.

From the sshd_config(5) man page:

ForceCommand

Forces the execution of the command specified by ForceCommand,
ignoring any command supplied by the client and ~/.ssh/rc if
present. The command is invoked by using the user’s login shell
with the -c option. This applies to shell, command, or subsystem
execution. The command originally supplied by the client is
available in the SSH_ORIGINAL_COMMAND environment variable.

With ForceCommand, it is possible to do arbitrarily interesting things to your SSH sessions. We’re not doing anything too complex, so we’ll implement the logging mechanism via a small Bash script that:

  • determines the client type, if RATIONALE_CLIENT is not specified
  • extracts the connecting IP address from the SSH_CONNECTION var
  • prompts for a RATIONALE if missing and the session is interactive
  • logs the client type, IP, username, rationale, and the requested command with the logger command
  • uses SSH_ORIGINAL_COMMAND to actually perform the requested command (invoke a shell, run an ad-hoc command, etc)
  • logs the command duration upon completion, again with the logger command

This script has a few other niceties, like signal traps that prevent common keyboard interrupts, but the basic idea is to decorate any command passed to SSH with calls to logger.
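To make the shape of such a wrapper concrete, here is a stripped-down sketch of the steps listed above. It is not our production script: the function names, the ssh-rationale syslog tag, the key=value log format, and the fallback client labels are all assumptions made for illustration, and the signal traps are left out.

```shell
#!/bin/sh
# Sketch of a ForceCommand logging wrapper. Every name below (functions,
# syslog tag, log format) is illustrative, not taken from a real deployment.

# SSH_CONNECTION is "client_ip client_port server_ip server_port";
# keep only the first field.
client_ip() {
    echo "${SSH_CONNECTION%% *}"
}

# Prefer RATIONALE_CLIENT when the client sent one; otherwise guess from the
# session shape. The "command"/"interactive" labels are assumptions.
client_type() {
    if [ -n "${RATIONALE_CLIENT:-}" ]; then
        echo "$RATIONALE_CLIENT"
    elif [ -n "${SSH_ORIGINAL_COMMAND:-}" ]; then
        echo "command"
    else
        echo "interactive"
    fi
}

# Build one log line from the audit fields.
audit_line() {
    printf 'client=%s ip=%s user=%s rationale="%s" command="%s"' \
        "$(client_type)" "$(client_ip)" "${USER:-unknown}" \
        "${RATIONALE:-none}" "${SSH_ORIGINAL_COMMAND:-login shell}"
}

# Entry point: prompt if needed, log, run the request, log the duration.
rationale_main() {
    # Keep prompting while a terminal is attached and no rationale was given.
    # Give up (and deny the session) if input is closed.
    while [ -z "${RATIONALE:-}" ] && [ -t 0 ]; do
        printf 'RATIONALE: '
        read -r RATIONALE || return 1
    done

    logger -t ssh-rationale "start $(audit_line)"
    start=$(date +%s)

    if [ -n "${SSH_ORIGINAL_COMMAND:-}" ]; then
        # Run the command the client originally asked for.
        "${SHELL:-/bin/sh}" -c "$SSH_ORIGINAL_COMMAND"
    else
        # No command: start a login shell (-l works for bash; other shells
        # may need a different invocation).
        "${SHELL:-/bin/sh}" -l
    fi
    status=$?

    logger -t ssh-rationale \
        "end $(audit_line) duration=$(( $(date +%s) - start ))s status=$status"
    return "$status"
}
```

Saved under a path such as /usr/local/bin/ssh-rationale (hypothetical) with a final line that calls rationale_main, a script like this would be wired in with ‘ForceCommand /usr/local/bin/ssh-rationale’ in /etc/ssh/sshd_config. In this sketch a non-interactive session without a RATIONALE is still allowed and logged with rationale="none"; whether to reject it instead is a policy choice.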

Integration with Ansible and Capistrano

Interactive sessions are great from an audit viewpoint: they provide access to a terminal, so even if a user chooses not to export a RATIONALE variable locally, we can still prompt them for one before allowing login to continue.

Non-interactive commands run over ssh by definition do not give us the ability to prompt for a RATIONALE. In the best case, a user (or, more likely, a script or tool) can choose to export RATIONALE and RATIONALE_CLIENT variables.

It turns out that Ansible uses OpenSSH under the covers, and thus supports the SendEnv configuration we set in ~/.ssh/config. All we need to do is ensure that RATIONALE is exported when Ansible fires.

We already use wrapper scripts to invoke Ansible, so this was a natural fit: we ensure that RATIONALE is set locally before the ansible or ansible-playbook tools run, and automatically set the RATIONALE_CLIENT to ‘ansible’ for these.
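The core of such a wrapper can be quite small. The following is a sketch rather than our actual script: the function name with_rationale is hypothetical, and refusing to run without a RATIONALE (instead of prompting for one locally) is an assumption.

```shell
# Hypothetical wrapper helper: refuse to run unless RATIONALE is set, then
# tag the session as Ansible and run the given command with both variables
# exported so that SendEnv can forward them.
with_rationale() {
    if [ -z "${RATIONALE:-}" ]; then
        echo "RATIONALE is not set; export it before running Ansible" >&2
        return 1
    fi
    # The subshell keeps these exports out of the caller's environment.
    (
        export RATIONALE
        export RATIONALE_CLIENT="ansible"
        "$@"
    )
}
```

A run would then look like ‘export RATIONALE="rotate TLS certificates"’ followed by ‘with_rationale ansible-playbook site.yml’, where site.yml stands in for whatever playbook is being applied.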

Capistrano uses the net-ssh gem, which sadly does not respect ~/.ssh/config. Supporting a RATIONALE export for Capistrano requires a bit more work: in each Capfile, we do:

This provides useful deployment information to our central log stream with each deploy action, across several apps.

Et Voilà!

Now, when SSHing into a CDE host over our VPN, authorized users who have not exported a RATIONALE environment variable are challenged with the following prompt:

This prompt is persistent and must be satisfied before the user’s shell is invoked. The mechanism results in log entries like:

Getting from default SSH access logs to parsable, actionable log entries with clear descriptions of the intent behind each access took a little work. The RATIONALE: prompt during interactive SSH sessions adds an extra step that must be thought about, but we think that’s a feature, not a bug. Lots of interactive sessions in production are an anti-pattern, and this auditing mechanism helps us to better understand our patterns of use. In practice, it really hasn’t been too onerous. We think the results are worth it!