AWS – Using Logstash to Ingest Logs From S3 Bucket Into Elastic


I have written a few blog posts about setting up an ELK (Elasticsearch, Logstash, Kibana) stack but have not really touched on the power of Logstash. In this blog post I hope to remedy that by using Logstash to pull logs from an AWS S3 bucket and place them into Elasticsearch.

To follow along with this blog post I recommend:

  • AWS access
  • Some logs in an AWS S3 bucket
  • An ELK stack (need help with this? Check out my Ansible playbooks)

Create An AWS Role

Logstash is going to need to connect to the S3 bucket and will need credentials to do this. I recommend creating a new IAM user with programmatic access and limiting it to AWS’s managed read-only S3 policy (AmazonS3ReadOnlyAccess). AWS will generate an “access key” and a “secret access key”; keep these safe as they are needed later on. The policy limits what the account can do (i.e. it won’t be able to create EC2 instances), and as it’s read-only it prevents any accidental deletes.
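If you want to scope the account down even further than the managed policy, you could attach a bucket-specific read-only policy instead. A sketch (the bucket name here matches my test bucket below; adjust it to yours):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::geektechstuff-log-test"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::geektechstuff-log-test/*"
    }
  ]
}
```

`s3:ListBucket` lets Logstash enumerate the objects and `s3:GetObject` lets it read them; neither allows writes or deletes.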

Create An S3 Bucket & Add Some Logs

I am using a test bucket that I have called “geektechstuff-log-test”. Remember, bucket names have to be globally unique.

Example S3 bucket with a log file

I’m adding three log files:

  • log-17072020
    A text (txt) file located in the root of the bucket.
  • log-15072020.txt
    A text (txt) file located under /2020/07/15/
  • log-16072020.txt
    A text (txt) file located under /2020/07/16/

I grabbed an example log from online to use in this example, switching the dates and users in each example:

July 15 10:32:22 router1 mgd[3606]: UI_DBASE_LOGOUT_EVENT: User 'dr strange' exiting configuration mode
July 15 11:36:15 router1 mgd[3606]: UI_COMMIT: User 'root' performed commit: no comment
July 15 11:46:37 router1 mib2d[2905]: SNMP_TRAP_LINK_DOWN: ifIndex 82, ifAdminStatus up(1), ifOperStatus down(2), ifName at-1/0/0
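I’ve not done any filtering in this project, but if you later wanted to break these syslog-style lines into fields, a grok filter along these lines could work (a sketch; the field names are my own choice, not from this post):

```conf
filter {
  grok {
    match => {
      "message" => "%{SYSLOGTIMESTAMP:log_timestamp} %{HOSTNAME:device} %{WORD:process}\[%{NUMBER:pid}\]: %{WORD:event_id}: %{GREEDYDATA:event_message}"
    }
  }
}
```

This would turn, for example, `UI_COMMIT` into an `event_id` field you could filter on in Kibana.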

Configure Logstash

Logstash’s main.conf file

I’m running Logstash on an Ubuntu box, and the same box is also running Elasticsearch and Kibana. This is not my normal setup (I normally give each its own instance or machine), but I wanted to play with AWS S3 and didn’t want to repurpose my current ELK stack.

Logstash stores its configuration in /etc/logstash and uses a few different files, but the one to check is pipelines.yml, which should be telling Logstash to point at /etc/logstash/conf.d/ and to read any .conf files. If it is, navigate to /etc/logstash/conf.d/ and either create a new .conf file or amend the main.conf file. I’m amending the main.conf file; as noted above, this is a test project for me.
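For reference, the stock pipelines.yml entry that ships with the Logstash package looks something like this (a sketch of the default layout; check your own file):

```yaml
- pipeline.id: main
  path.config: "/etc/logstash/conf.d/*.conf"
```

If your pipelines.yml points somewhere else, put your .conf file in that location instead.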

Test 1 – Ingesting From Root Of S3 Bucket

My first test was to ingest the log file I had placed at the root of the S3 bucket. I edited the main.conf file so that it read:

input {
    s3 {
        "access_key_id" => "My AWS Access Key Here"
        "secret_access_key" => "My AWS Secret Access Key Here"
        "bucket" => "geektechstuff-log-test"
        "region" => "eu-west-2"
        "additional_settings" => {
            "force_path_style" => true
            "follow_redirects" => false
        }
    }
}

output {
    elasticsearch {
        hosts => ["http://localhost:9200"]
        index => "logs-%{+YYYY.MM.dd}"
    }
}
After amending main.conf I restarted the Logstash service and jumped into Kibana. Within a few minutes a new index showed up in Index Management with the correct naming convention.

logs- index showing in Index Management

And I could then use the Kibana Index Patterns to tell Kibana about the index.

Setting up Kibana logs- index pattern

Test 2 – Reading From A Particular Folder / Directory

Next up was rejigging main.conf so that I could read from a particular folder / directory within my S3 bucket. I did this in case the bucket contains more than logs, or in case I wanted logs from certain devices/services read but not others.

For this I’m going to add “prefix” to the input settings in the main.conf file.

My initial attempts went a little wrong.

No files found in bucket…

But with a little time and testing I got there.

Note: AWS S3 buckets may look like they are using folders / directories, but an object’s full key is treated as one long, flat file name. So log-16072020.txt in directory 16, of directory 07, of directory 2020 in the S3 bucket actually has the name:

2020/07/16/log-16072020.txt

S3 Logstash input with the Prefix working
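The caption above refers to a screenshot of the working configuration; a sketch of the input block with the prefix set (assuming the same bucket and credentials as in Test 1):

```conf
input {
    s3 {
        "access_key_id" => "My AWS Access Key Here"
        "secret_access_key" => "My AWS Secret Access Key Here"
        "bucket" => "geektechstuff-log-test"
        "region" => "eu-west-2"
        # prefix is matched literally against the start of each object key
        "prefix" => "2020/07/16/"
        "additional_settings" => {
            "force_path_style" => true
            "follow_redirects" => false
        }
    }
}
```

Because the key is flat, the prefix has to be the literal start of the key, including the trailing slash.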

Logstash successfully ingested the log file within 2020/07/16 and did not ingest the log file in 2020/07/15.

Some notes:

  • The “prefix” option does not accept regular expressions.
  • Filebeat may also be able to read from an S3 bucket.
  • The “exclude_pattern” option for the Logstash S3 input may be a better option.
  • I’ve not done any filtering in this project, instead relying just on input and output.
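Unlike “prefix”, “exclude_pattern” does take a regular expression, matched against the object key, so you could read a broad prefix and skip unwanted keys. A sketch (the pattern is my own illustration; credentials omitted for brevity):

```conf
input {
    s3 {
        "bucket" => "geektechstuff-log-test"
        "region" => "eu-west-2"
        "prefix" => "2020/07/"
        # exclude_pattern is a regular expression matched against the key
        "exclude_pattern" => "2020/07/15"
    }
}
```

This would ingest everything under 2020/07/ except the 15th’s logs.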

Further Reading:

AWS IAMs and Bucket Policies

AWS S3 Example Policies

Elastic Logstash S3