I had a problem where I needed to find out which users had accessed certain API endpoints, based on Nginx logs, so I made a handy script for that.
Update: the script below has since been revised.
zcat is nice, since it lets us read gzip-compressed logs and pipe the output through awk.
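As a minimal sketch of that zcat-plus-awk pattern: the snippet below writes a couple of fake Nginx "combined"-format lines into a gzip file (the IPs and paths are made up for illustration), then counts requests per endpoint. It assumes the request path is the 7th whitespace-separated field, which holds for the default combined log format.

```shell
# Create a small gzip-compressed sample log (two identical requests).
printf '%s\n' \
  '1.2.3.4 - - [01/Jan/2024:00:00:00 +0000] "GET /api/users/42 HTTP/1.1" 200 123' \
  '1.2.3.4 - - [01/Jan/2024:00:00:01 +0000] "GET /api/users/42 HTTP/1.1" 200 123' \
  | gzip > sample.log.gz

# Decompress on the fly and count hits per request path (field 7).
zcat sample.log.gz | awk '{ print $7 }' | sort | uniq -c
```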
The script for your pleasure:
#!/bin/bash

# Check if the log directory is passed as an argument
if [ -z "$1" ]; then
    echo "Usage: $0 /path/to/log/directory/"
    exit 1
fi

LOG_DIR="$1"

# Generate an output file name by removing any trailing slash and
# incorporating the directory name
DIR_NAME=$(basename "${LOG_DIR%/}")
OUTPUT_FILE="${DIR_NAME}_unique_api_user_combinations.txt"

# Empty the output file if it exists
> "$OUTPUT_FILE"

# Note: gensub() is a GNU awk extension, so this calls gawk explicitly.
process_log() {
    gawk '{
        api = $7  # Assume the API endpoint is the 7th field
        svc_user = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^svc/) {  # Find the field starting with "svc"
                svc_user = $i
                break
            }
        }
        if (svc_user != "") {
            # Normalize the API endpoint: replace /<integer segment> and the
            # rest of the path with /[ID], then replace anything after
            # "?", "=" or ":" with that character plus [PARAMS]
            normalized_api = gensub(/\/[0-9]+[^ \t]*/, "/[ID]", "g", api)
            normalized_api = gensub(/\?.*/, "?[PARAMS]", "g", normalized_api)
            normalized_api = gensub(/=.*/, "=[PARAMS]", "g", normalized_api)
            normalized_api = gensub(/:.*/, ":[PARAMS]", "g", normalized_api)
            print normalized_api, svc_user
        }
    }' "$1" >> "$OUTPUT_FILE"
}

# Process each regular log file in the directory
for log_file in "$LOG_DIR"/*.log; do
    [ -e "$log_file" ] || continue  # Skip if the glob matched nothing
    process_log "$log_file"
done

# Process each compressed log file in the directory
for gz_log_file in "$LOG_DIR"/*.log-*.gz; do
    [ -e "$gz_log_file" ] || continue
    zcat "$gz_log_file" | process_log "/dev/stdin"
done

# Remove duplicates from the output file
sort -u "$OUTPUT_FILE" -o "$OUTPUT_FILE"

echo "Unique API/user combinations written to $OUTPUT_FILE"
Mix & match for your use.