12 January 2022

One-Click Setup: Querying AWS ALB and VPC Flow Logs Made Easy

Let’s face it: as soon as you run into trouble with an application or infrastructure, logs are your first resort. To have the right logs at hand, VPC Flow Logs should be enabled whenever a VPC is part of the infrastructure. On top of that, a load balancer is probably the second most common component of a setup, and it’s advisable to enable access logging on it as well.

Because these two log streams are often quite verbose, ingesting them cost-efficiently means sending the log data to S3 instead of CloudWatch. ALB logs don’t even offer a choice: they are always stored on Amazon S3. The tradeoff of storing logs on S3? It makes querying harder. In response, AWS created Amazon Athena to make querying structured data on S3 easier.

Out-of-the-box ALB and Flow Log queries

Last year, AWS released the Flow Logs Athena integration, taking away the pain of setting up Athena for VPC Flow Logs. A similar counterpart for easily querying ALB logs is sadly still missing… Well, until now! 😉

Under the hood, the Athena Flow Logs integration turned out to be a vanilla CloudFormation template that bootstraps some Athena resources. I simply extended the template with an Athena ALB log integration, following AWS best practices.

To allow your account to easily query ALB and VPC Flow Logs, all you need to do is deploy this CloudFormation template. The details of the template are described below.

Here’s the result:

[Image: Overview]

[Image: Queries]

The CloudFormation Log Stack in Detail

The stack parameters:

  • EnvironmentName: the environment name, to support multiple environments (DTAP). Examples: dev or prod
  • Context: the ALB log context, for example an application or account name.
  • FlowLogsLocation: the S3 location of the flow logs. Format: s3://doc-example-bucket/prefix/AWSLogs/{account_id}/vpcflowlogs/{region_code}/
  • AlbLogsLocation: the S3 location of the ALB logs. Format: s3://your-alb-logs-directory/AWSLogs/{account_id}/elasticloadbalancing/{region_code}/
  • InitialPartitionDays: the number of days (in the past) of log data that will be partitioned on setup.
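As a rough illustration of how these parameters fit together, here is a minimal Python sketch that assembles the two S3 locations in the formats listed above and wraps all values in the ParameterKey/ParameterValue list shape that CloudFormation APIs accept. The helper names, the account ID, the region and the "webshop" context are made-up placeholders, not part of the template.

```python
def flow_logs_location(bucket: str, prefix: str, account_id: str, region: str) -> str:
    """Build a FlowLogsLocation value in the format listed above."""
    return f"s3://{bucket}/{prefix}/AWSLogs/{account_id}/vpcflowlogs/{region}/"


def alb_logs_location(bucket: str, account_id: str, region: str) -> str:
    """Build an AlbLogsLocation value in the format listed above."""
    return f"s3://{bucket}/AWSLogs/{account_id}/elasticloadbalancing/{region}/"


def stack_parameters(values: dict) -> list:
    """Turn a plain dict into a list of ParameterKey/ParameterValue entries."""
    return [{"ParameterKey": k, "ParameterValue": str(v)} for k, v in values.items()]


# Hypothetical example values; replace them with your own.
ACCOUNT_ID = "111122223333"
REGION = "eu-west-1"

params = stack_parameters({
    "EnvironmentName": "dev",
    "Context": "webshop",
    "FlowLogsLocation": flow_logs_location("doc-example-bucket", "prefix", ACCOUNT_ID, REGION),
    "AlbLogsLocation": alb_logs_location("your-alb-logs-directory", ACCOUNT_ID, REGION),
    "InitialPartitionDays": 30,
})

for p in params:
    print(f'{p["ParameterKey"]} = {p["ParameterValue"]}')
```

The resulting list could be handed to whatever tool you use to deploy the stack; building the locations in code mainly helps avoid typos in the long S3 paths.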

The most essential resources in the stack are:

  • An Athena Database
  • An Athena workgroup
  • An Athena Partitioned Table for the VPC Flow Logs
  • An Athena Partitioned Table for ALB logs
  • A partitioner that, on setup, partitions the log data for the past x days (x being InitialPartitionDays)
  • A partitioner that partitions new log data daily
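To give an idea of what the initial partitioner has to do, the sketch below generates one Athena ALTER TABLE ... ADD PARTITION statement per day for the last N days. It is an illustration only, not the code the template ships with. It assumes string-typed partition columns named year, month and day, and a yyyy/mm/dd folder layout beneath the log location; the function name and the bucket are hypothetical.

```python
from datetime import date, timedelta


def partition_statements(table: str, location: str, days: int, today: date) -> list:
    """Return one ALTER TABLE statement per day, newest first,
    covering `today` and the `days - 1` days before it."""
    base = location.rstrip("/")
    statements = []
    for offset in range(days):
        d = today - timedelta(days=offset)
        y, m, dd = f"{d:%Y}", f"{d:%m}", f"{d:%d}"
        # Assumed partition columns: year, month, day (all strings).
        statements.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION (year='{y}', month='{m}', day='{dd}') "
            f"LOCATION '{base}/{y}/{m}/{dd}/'"
        )
    return statements


# Hypothetical table location, following the FlowLogsLocation format.
stmts = partition_statements(
    "vpcflowlogs_non_prod",
    "s3://doc-example-bucket/prefix/AWSLogs/111122223333/vpcflowlogs/eu-west-1/",
    days=3,
    today=date(2022, 1, 1),
)
for s in stmts:
    print(s)
```

The daily partitioner would do the same thing with days=1. IF NOT EXISTS keeps the statements safe to re-run.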

A word on Partitioning Data in Athena

By partitioning Athena tables, you restrict the amount of data each query scans, which improves performance and reduces cost. You can partition your data by any key; a common practice for log data is to partition by time. In this case, the data is partitioned by year, month and day, since log queries typically filter on a time range.

To verify that a table is correctly partitioned, you can run:
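To make the benefit concrete, here is a small Python sketch that builds a query restricted to a single day's partition, so Athena only has to scan that day's objects. It assumes string-typed partition columns named year, month and day; the helper name is hypothetical and the table name is just an example.

```python
from datetime import date


def day_query(table: str, day: date, limit: int = 10) -> str:
    """Build a query whose WHERE clause filters on the partition columns,
    limiting the scan to a single day of log data."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE year = '{day:%Y}' AND month = '{day:%m}' AND day = '{day:%d}' "
        f"LIMIT {limit}"
    )


query = day_query("vpcflowlogs_non_prod", date(2022, 1, 12))
print(query)
```

Without the year, month and day predicates, the same query would be free to read every partition of the table.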

show partitions vpcflowlogs_non_prod

[Image: Partitions]

Enjoy querying your logs and until next time!