I had a problem where I needed to find out which users had accessed certain API endpoints, based on Nginx logs, so I made a handy script for that.
Update: the script below has since been revised.
zcat is nice, since it lets us read gzip-compressed logs and pipe the output through awk.
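As a minimal sketch of that zcat-plus-awk pattern: the snippet below writes a couple of fake Nginx "combined"-format lines into a gzip file (the IPs and paths are made up for illustration), then counts requests per endpoint. It assumes the request path is the 7th whitespace-separated field, which holds for the default combined log format.

```shell
# Create a small gzip-compressed sample log (two identical requests).
printf '%s\n' \
  '1.2.3.4 - - [01/Jan/2024:00:00:00 +0000] "GET /api/users/42 HTTP/1.1" 200 123' \
  '1.2.3.4 - - [01/Jan/2024:00:00:01 +0000] "GET /api/users/42 HTTP/1.1" 200 123' \
  | gzip > sample.log.gz

# Decompress on the fly and count hits per request path (field 7).
zcat sample.log.gz | awk '{ print $7 }' | sort | uniq -c
```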
The script for your pleasure:
#!/bin/bash

# Check if the log directory is passed as an argument
if [ -z "$1" ]; then
    echo "Usage: $0 /path/to/log/directory/"
    exit 1
fi

LOG_DIR="$1"

# Generate an output file name by removing any trailing slash and
# incorporating the directory name
DIR_NAME=$(basename "${LOG_DIR%/}")
OUTPUT_FILE="${DIR_NAME}_unique_api_user_combinations.txt"

# Empty the output file if it exists
> "$OUTPUT_FILE"

# Note: gensub() is a GNU awk extension, so this calls gawk explicitly.
process_log() {
    gawk '{
        api = $7  # Assume the API endpoint is the 7th field
        svc_user = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^svc/) {  # Find the field starting with "svc"
                svc_user = $i
                break
            }
        }
        if (svc_user != "") {
            # Normalize the API endpoint: replace /<integer segment> and the
            # rest of the path with /[ID], then replace anything after
            # "?", "=" or ":" with that character plus [PARAMS]
            normalized_api = gensub(/\/[0-9]+[^ \t]*/, "/[ID]", "g", api)
            normalized_api = gensub(/\?.*/, "?[PARAMS]", "g", normalized_api)
            normalized_api = gensub(/=.*/, "=[PARAMS]", "g", normalized_api)
            normalized_api = gensub(/:.*/, ":[PARAMS]", "g", normalized_api)
            print normalized_api, svc_user
        }
    }' "$1" >> "$OUTPUT_FILE"
}

# Process each regular log file in the directory
for log_file in "$LOG_DIR"/*.log; do
    [ -e "$log_file" ] || continue  # Skip if the glob matched nothing
    process_log "$log_file"
done

# Process each compressed log file in the directory
for gz_log_file in "$LOG_DIR"/*.log-*.gz; do
    [ -e "$gz_log_file" ] || continue
    zcat "$gz_log_file" | process_log "/dev/stdin"
done

# Remove duplicates from the output file
sort -u "$OUTPUT_FILE" -o "$OUTPUT_FILE"

echo "Unique API/user combinations written to $OUTPUT_FILE"
Mix & match for your use.