EFK

What is EFK

EFK is the acronym for three open source projects
Elastic , Fluentd, Kibana.

Fluentd is a log shipper and data processing pipeline that ingests data from multiple sources simultneously, transforms it, and then sends it to Elasticsearch.
Elasticsearch is a search and analytics engine. Kibana lets users
visualize data with charts and graphs in Elasticsearch.

EFK Stack Architecture

Simple architecure of EFK Stack

Logs: Server logs that need to be analyzed are
identified.
Fluentd: Deployed as daemonset as it need to
collect the container logs from all the nodes.It connects to the
Elasticsearch service endpoint to forward the logs.
Elastic Search: Deployed as statefulset as it holds
the log data. We also expose the service endpoint for Fluentd and kibana
to connect to it.
Kibana: Deployed as deployment and connects to
elasticsearch service endpoint.

Elasticsearch

Elasticsearch is a NoSQL database. It is based on Lucence Search
engine, and it is built with RESTful APIs. It is often simple
depolyment, maximum reliabilty, and easily manageable. it also offers
advanced queries to perform detail analysis and stores all the data
centrally. It is helpful for executing a quick search of the documents.

Elasticsearch also allows you to store, search and analyze big
volume of data.It is mostly used as the underlying engine which powers
applications that completes search requirements. It has been adopted in
search engine platforms for modern web and mobile applications. Apart
from quick search, the tool also offers complex analytics and many
advanced features.

Features

Open source search server is written using java
Used to index any kind of heterogeneous data
Has REST API web-interface with JSON output
Full-Text Search
Near Real TIme(NRT) search
Sharded,replicated searchable,JSON document store
Schema-free, REST & JSON based distributed document store
Multi-language & Geolocation support

Advatages

Store schema-less data and also creates a schema for your data
Manipulate your data record by record with the help of
Multi-document APIs
Perform filtering and querying your data for insights
Based on Apache Lucene and provides RESTful API
Provides horizontal scalability, reliability, and multitentant
capability for real time use of indexing to make it faster search
Helps you to scale vertically and horizontally

Fluentd

Fluentd is an efficient log aggregator. It is written in Ruby, and
scales very well. For most small to medium sized deployments, fluentd is
fast and consumes relatively minimal resources. “Fluent-bit”, a new
project from the creators of fluentd claims to scale even better and
has an even smaller resource footprint. For the purpose of this
discussion, lets focus on fluentd as it is more mature and more widely
used.

Fluentd scraps logs from a given set of sources, processes them
(converting into a structured data format) and then forwards them to
other services like Elasticsearch, object storage etc. Fluentd is
especially flexible when it comes to integrations – it works with 300+
log storage and analytic services.

Fluentd gets data from multiple sources.
It structures and tags data.
It then sends the data to multiple destinations, based on matching
tags

Advantages

Fluentd provides both active-active and active-passive deployment
patterns for both availability and scale.
Fluentd can forward log and event data to any number of additional
processing nodes
Tagging and Dynamic Routing with Fluentd
Larger Plugin Library with Fluentd
It uses less memory approx ~40MB

Kibana

Kibana is a data visualization which completes the EFK stack. This
tool is used for visualizing the Elasticsearch documents and helps
developers to have a quick insight into it. Kibana dashboard offers
various interactive diagrams, geospatial data, and graphs to visualize
complex queries.

It can be used for search, view, and interact with data stored in
Elasticsearch directories. Kibana helps you to perform advanced data
analysis and visualize your data in a variety of tables, charts, and
maps.

In Kibana there are different methods for performing searches on your
data.

Advantages and

Disadvantages of Kibana

Easy visualization
Fully integrated with Elasticsearch
Offers real-time analysis, charting, summarization, and debugging
capabilities
Provides instinctive and user-friendly interface
Allows sharing of snapshots of the logs searched through
Permits saving the dashboard and managing multiple dashboards

kubernetes configs

Elastic Search State-full

set

kind: Service
apiVersion: v1
metadata:
  name: elasticsearch
  labels:
    app: elasticsearch
spec:
  selector:
    app: elasticsearch
  # clusterIP:
  type: NodePort
  ports:
    - port: 9200
      name: rest
      targetPort: 9200
      nodePort: 31000
    - port: 9300
      name: inter-node
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: es-cluster
spec:
  serviceName: elasticsearch
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
        - name: elasticsearch
          image: docker.elastic.co/elasticsearch/elasticsearch:7.17.6
          resources:
            limits:
              cpu: 1000m
            requests:
              cpu: 100m
          ports:
            - containerPort: 9200
              name: rest
              protocol: TCP
            - containerPort: 9300
              name: inter-node
              protocol: TCP
          volumeMounts:
            - name: data
              mountPath: /usr/share/elasticsearch/data
          env:
            - name: cluster.name
              value: k8s-logs
            - name: node.name
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: discovery.seed_hosts
              value: "es-cluster-0.elasticsearch,es-cluster-1.elasticsearch,es-cluster-2.elasticsearch"
            - name: cluster.initial_master_nodes
              value: "es-cluster-0,es-cluster-1,es-cluster-2"
            - name: ES_JAVA_OPTS
              value: "-Xms512m -Xmx512m"
      initContainers:
        - name: fix-permissions
          image: busybox
          command:
            ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
          securityContext:
            privileged: true
          volumeMounts:
            - name: data
              mountPath: /usr/share/elasticsearch/data
        - name: increase-vm-max-map
          image: busybox
          command: ["sysctl", "-w", "vm.max_map_count=262144"]
          securityContext:
            privileged: true
        - name: increase-fd-ulimit
          image: busybox
          command: ["sh", "-c", "ulimit -n 65536"]
          securityContext:
            privileged: true
  volumeClaimTemplates:
    - metadata:
        name: data
        labels:
          app: elasticsearch
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: "local-path"
        resources:
          requests:
            storage: 30Gi

Fluentd Daemon set

#https://github.com/fluent/fluentd-kubernetes-daemonset
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluentd
rules:
  - apiGroups:
      - ""
    resources:
      - pods
      - namespaces
    verbs:
      - get
      - list
      - watch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentd
roleRef:
  kind: ClusterRole
  name: fluentd
  apiGroup: rbac.authorization.k8s.io
subjects:
  - kind: ServiceAccount
    name: fluentd
    namespace: default
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  labels:
    k8s-app: fluentd-logging
    version: v1
spec:
  selector:
    matchLabels:
      k8s-app: fluentd-logging
      version: v1
  template:
    metadata:
      labels:
        k8s-app: fluentd-logging
        version: v1
    spec:
      serviceAccount: fluentd
      serviceAccountName: fluentd
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          effect: NoSchedule
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
      containers:
        - name: fluentd
          image: fluent/fluentd-kubernetes-daemonset:v1.14.6-debian-elasticsearch7-1.1
          env:
            - name: FLUENT_ELASTICSEARCH_HOST
              value: "192.168.10.120"
            - name: FLUENT_ELASTICSEARCH_PORT
              value: "31000"
            # - name: FLUENT_ELASTICSEARCH_SCHEME
            #   value: "http"
            # - name: FLUENT_CONTAINER_TAIL_PARSER_TYPE
            #   value: "cri"
          resources:
            limits:
              memory: 200Mi
            requests:
              cpu: 100m
              memory: 200Mi
          volumeMounts:
            - name: fluentconfig
              mountPath: /fluentd/etc/
            - name: varlog
              mountPath: /var/log
            - name: dockercontainerlogdirectory
              mountPath: /var/log/pods
              readOnly: true
      terminationGracePeriodSeconds: 30
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: dockercontainerlogdirectory
          hostPath:
            path: /var/log/pods
        - name: fluentconfig
          configMap:
            name: fluentdconf

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentdconf
  namespace: default
data:
  fluent.conf: |

    <label @FLUENT_LOG>
      <match fluent.**>
          @type null
      </match>
    </label>
    
    #region kubernetes metadata
    <filter kubernetes.**>
        @type kubernetes_metadata
    </filter>
    #endregion 
    
    #Enable this for debugging purposes 
    # <match **>
    #   @type stdout
    # </match>


    #region herald logs filters
    <source>
      @type tail
      path /var/log/containers/herald*.log
      pos_file /var/log/herald.log.pos
      tag kubernetes.*
      read_from_head true
      format json
      <parse>
        @type cri
        <parse>
          @type json
          time_key time
          time_format %Y-%m-%dT%H:%M:%S
          keep_time_key true # if you want to keep "time" field, enable this parameter
        </parse>
      </parse>
    </source>
  
    <match kubernetes.var.log.containers.**herald**.log>
      @type elasticsearch
      host 192.168.10.120
      port 31000
      logstash_format true
      logstash_prefix herald-logs
      logstash_dateformat %Y-%m-%d
    </match>
    #endregion 

    #region Falcon logs filters
    <source>
      @type tail
      path /var/log/containers/falconsstateful*.log
      pos_file /var/log/falconsstateful.log.pos
      tag kubernetes.*
      read_from_head true
      format json
      <parse>
        @type cri
        <parse>
          @type json
          time_key time
          time_format %Y-%m-%dT%H:%M:%S
          keep_time_key true # if you want to keep "time" field, enable this parameter
        </parse>
      </parse>
    </source>

    <match kubernetes.var.log.containers.**falconsstateful**.log>
      @type elasticsearch
      host 192.168.10.120
      port 31000
      logstash_format true
      logstash_prefix falcon-logs
      logstash_dateformat %Y-%m-%d
    </match>
    #endregion 

    #region wright logs filters
    <source>
      @type tail
      path /var/log/containers/wright*.log
      pos_file /var/log/wright.log.pos
      tag kubernetes.*
      read_from_head true
      format json
      <parse>
        @type cri
        <parse>
          @type json
          time_key time
          time_format %Y-%m-%dT%H:%M:%S
          keep_time_key true # if you want to keep "time" field, enable this parameter
        </parse>
      </parse>
    </source>

    <match kubernetes.var.log.containers.**wright**.log>
      @type elasticsearch
      host 192.168.10.120
      port 31000
      logstash_format true
      logstash_prefix wright-logs
      logstash_dateformat %Y-%m-%d
    </match>
    #endregion

Kibana

apiVersion: v1
kind: Service
metadata:
  name: kibana-np
spec:
  selector:
    app: kibana
  type: NodePort
  ports:
    - port: 8080
      targetPort: 5601
      nodePort: 30000
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kibana
  labels:
    app: kibana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kibana
  template:
    metadata:
      labels:
        app: kibana
    spec:
      containers:
        - name: kibana
          image: docker.elastic.co/kibana/kibana:7.17.6
            #resources:
            #  limits:
            #    cpu: 1000m
            #  requests:
            #    cpu: 100m
          env:
            - name: ELASTICSEARCH_URL
              value: http://elasticsearch:9200
          ports:
            - containerPort: 5601

Test-pod:

apiVersion: v1
kind: Pod
metadata:
  name: counter
spec:
  containers:
    - name: count
      image: busybox
      args:
        [
          /bin/sh,
          -c,
          'i=0; while true; do echo "This is EFK Stack! $i"; i=$((i+1)); sleep 1; done',
        ]

How to deploy

local-path provisioning

NOTE: making as default storage class


kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml

kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

Deploy ElasticSearch

kubectl create -f ./elasticsearch/es-sts.yaml

Deploy Kibana

kubectl create -f ./kibana/kibana-deployment.yaml

Deploy Fluentd

kubectl create -f ./fluentd/fluentd-ds.yaml