DOCKER AND KUBERNETES IMPORTANT EXAMPLES

Docker Installation commands

  • Docker installation commands for Ubuntu 16.04:

sudo apt-get -f install

  • Install the packages required to install Docker on an Ubuntu system.

sudo apt-get install apt-transport-https ca-certificates curl software-properties-common

  • Now import Docker's official GPG key so that apt-get can verify package signatures before installing. Run the command below in a terminal.

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

  • After that, add the Docker repository to your Ubuntu system; it contains the Docker packages and their dependencies. This repository must be enabled to install Docker on Ubuntu.

sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu xenial stable"

sudo apt-get update

  • Run the following to search for the docker-ce package:

apt-cache search docker-ce

  • sample output:

docker-ce - Docker: the open-source application container engine

  • Install docker-ce:

sudo apt-get install docker-ce

 

Docker Container Commands

A Docker container is a running instance of an image. A container bundles only the libraries and settings required to make the application work, providing a lightweight, portable, encapsulated environment for an application.

Run Docker Container

  • Use the docker run command to launch a Docker container on your system. For example, the command below creates a Docker container from the hello-world image.

docker run hello-world

  • Now create a real Docker container instance using the CentOS image. The -it option provides an interactive session with a pseudo-TTY enabled, which drops you into the container shell immediately.

docker run -it centos

  • Use the docker ps command to list all running containers on your system. It does not list stopped containers. It shows the container ID, name and other useful information about each container.

docker ps

  • Use the -a option with the above command to list all containers, including stopped ones.

docker ps -a
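The listing can also be filtered; a minimal sketch (the status value is just an example) that shows only exited containers:

docker ps --filter "status=exited"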

 

  • Find all Details of a Container

Use the docker inspect command to view all details of a Docker container. Provide the container ID or container name to get details of a specific container.

docker inspect <containerid>
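Because docker inspect prints a verbose JSON document, a Go-template filter can extract a single field; a minimal sketch (substitute your own container ID) that prints only the container's IP address:

docker inspect --format '{{.NetworkSettings.IPAddress}}' <containerid>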

  • Delete Docker Container

Use the docker rm command to delete an existing Docker container. Provide the container ID or container name of the specific container to delete; stop the container first if it is still running.

docker stop <containerid>

docker rm <containerid>

Docker Images

An image is an inert, immutable file that is essentially a snapshot of a container. Images are created with the build command, and they produce a container when started with run.

  • List Docker Images. Use the docker images command to list all images available on your local system.

$ docker images

  • Download Docker Images

Use the docker pull command to download any image from Docker Hub. For example, download the centos image from Docker Hub to your local system; by default the latest tag is pulled, and it can be used to create containers.

$ docker pull centos

  • Delete Docker Images

Use the docker rmi command to remove a Docker image from your local system. For example, to remove the image named centos, use the following command.

$ docker rmi centos

 

Docker Data Volumes

  • In Docker, data volumes are useful for managing data between Docker containers and the host machine. There are two ways of managing data volumes for Docker containers:
  • Using a data volume container.
  • Sharing data between the host and the Docker container.
  • Use a Docker Data Volume Container
    • A Docker data volume container is the same as any other container, but it is used for storage only. Its storage can be attached to other containers. When you write data to the attached path in your container's file system, it is actually written to the storage container.
    • Create Data Volume Container:
    • $ docker create -v /data --name data_container centos
    • The above command creates a container named "data_container" with a data volume at /data. The container stops immediately because it runs no service. Now create your application container with the --volumes-from flag.
  • Use Data Volume to New Container:
    • $ docker run -it --volumes-from data_container centos /bin/bash
    • This gives you shell access to the newly created container. Now use the /data directory to create files and directories in it.
    • cd /data
    • echo "Hello Docker" > file.txt
    • Now exit the current instance. You can also remove it using the docker rm command. After that, launch a new instance using the same data volume and try to access the files created under the /data directory.
  • Verify Files on Data Volume:
    • Let's launch another container with the same data volume and check the files available in the /data directory.
    • $ docker run -it --volumes-from data_container centos /bin/bash
  • Sharing Host Volume to Docker Container
  • You can also share data between the host machine and a Docker container. For example, you can mount the host's /var/www directory at the container's /data directory. Any directory of the host machine can be shared with a container.
  • $ docker run -it -v /var/www:/data centos
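If the container should only read the host data, the bind mount can be made read-only by appending :ro; a minimal sketch of the same command with a read-only mount:

$ docker run -it -v /var/www:/data:ro centos

Writes to /data inside the container will then fail, while the host copy stays untouched.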

Working with Dockerfile

A Dockerfile is a file from which images are built by reading its instructions. The default file name is Dockerfile. You can create a Dockerfile in the current directory with specific instructions and build a customized image that meets your requirements.

  • Build an Image with a Dockerfile
  • As a best practice, the Dockerfile is called Dockerfile and located in the root of the build context. You can use the following command to build a Docker image; it reads the Dockerfile in the current directory.

$ docker build -t image_name .

  • You can also use the -f flag with the docker build command to point to a Dockerfile anywhere in your file system.

$ docker build  -t image_name -f /path/to/Dockerfile .

  • What are Dockerfile Directives?
  • The previous section showed how to build images with a Dockerfile. This section covers the basic Dockerfile directives and their uses.

FROM

  • The FROM directive sets the base image for the subsequent instructions. A Dockerfile must have a FROM directive with a valid image name as its first instruction.
  • Examples:

FROM ubuntu

LABEL

  • Using LABEL you can organize images properly. It is useful for setting the maintainer address, vendor name, image version, release date, and so on. Each line must begin with the keyword LABEL.

LABEL maintainer="kiran"

LABEL vendor="org"

LABEL com.example.version="0.0.1"

RUN

  • Using the RUN directive, you can run any command against the image at build time. For example, you can install required packages during the image build.

RUN apt-get update

RUN apt-get install -y apache2 automake build-essential curl

  • A more readable multi-line syntax can be used as follows:

RUN apt-get update && apt-get install -y \

automake \

build-essential \

curl

COPY

  • The COPY directive is used for copying files and directories from the host system into the image during the build. For example, the first command below copies all files from the host's html/ directory to the image's /var/www/html/ directory. The second command copies all files with the .conf extension to the /etc/apache2/sites-available/ directory.

COPY html/* /var/www/html/

COPY *.conf /etc/apache2/sites-available/

WORKDIR

  • The WORKDIR directive sets the working directory for any RUN, CMD, ENTRYPOINT, COPY and ADD instructions that follow it during the build.

WORKDIR /opt

CMD

  • The CMD directive is used to run the service or software contained in your image, along with any arguments, when the container is launched. CMD uses the following basic syntax:

CMD ["executable","param1","param2"]

  • For example, to start the Apache service when the container launches, use the following:

CMD ["apachectl", "-D", "FOREGROUND"]

EXPOSE

  • The EXPOSE directive indicates the ports on which a container listens for connections. You can then bind a host system port to the container port and use it.

EXPOSE 80

EXPOSE 443

ENV

  • The ENV directive is used to set environment variables for the container.

ENV PATH=$PATH:/usr/local/pgsql/bin/ \

PG_MAJOR=9.6.0

VOLUME

  • The VOLUME directive creates a mount point with the specified name and marks it as holding externally mounted volumes from native host or other containers.

VOLUME ["/data"]
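Putting the directives above together, here is a minimal sketch (package names, file paths and the image tag are illustrative assumptions, not from the original) that writes a small Apache Dockerfile and builds it:

# prepare a tiny build context (illustrative content)
mkdir -p html && echo "Hello from Apache" > html/index.html

# write a Dockerfile combining the directives described above
cat > Dockerfile <<'EOF'
FROM ubuntu
LABEL maintainer="kiran"
RUN apt-get update && apt-get install -y apache2
COPY html/ /var/www/html/
EXPOSE 80
CMD ["apachectl", "-D", "FOREGROUND"]
EOF

# build the customized image
docker build -t my-apache-image .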

 

Manage Ports in Docker

Docker containers run services internally on specific ports. To access a service running on a container port, you need to bind the container port to a Docker host port.
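As a minimal sketch (image and port numbers are illustrative), the -p flag performs this binding at run time, mapping host port 8080 to container port 80:

docker run -d -p 8080:80 nginx

# the service inside the container is now reachable via the host port
curl http://localhost:8080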

Docker POCs

Running MySQL as Docker

  • Consider a Docker host running two containers: one running Apache serving a website, and the other running MySQL.
  • Pull the MySQL Docker container image:

docker pull mysql/mysql-server

  • To use a specific version, run the container, inspect its logs, and map ports:

docker pull mysql/mysql-server:5.5

docker run --name=mysql1 -d mysql/mysql-server

If you need a specific version:

docker run --name=mysql1 -d mysql/mysql-server:5.5

docker logs mysql1

docker run --name=mysql1 -e MYSQL_ROOT_HOST=% -p 3306:3306 -d mysql/mysql-server

docker run --name=mysql1 -e MYSQL_ROOT_HOST=% -p 3306:3306 -d mysql/mysql-server:5.5

docker logs mysql1 2>&1 | grep GENERATED

  • This prints the generated root password; use it to get access:

docker exec -it mysql1 mysql -u root -p

ALTER USER 'root'@'localhost' IDENTIFIED BY 'admin@123';

  • To list all containers, including containers that are not running:

docker ps -a

CREATE USER 'newUser'@'%' IDENTIFIED BY 'newPass123';

GRANT ALL PRIVILEGES ON *.* TO 'newUser'@'%' WITH GRANT OPTION;
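Because port 3306 was published to the host and MYSQL_ROOT_HOST=% allows remote root logins, you can also connect from the Docker host with a local MySQL client; a minimal sketch, assuming the mysql client is installed on the host:

mysql -h 127.0.0.1 -P 3306 -u root -p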

Run Tomcat as Docker

  • Run the default Tomcat server (CMD ["catalina.sh", "run"]):

$ docker run -it --rm tomcat:8.0

  • You can test it by visiting http://container-ip:8080 in a browser or, if you need access outside the host, on port 8888:

$ docker run -it --rm -p 8888:8080 tomcat:8.0

  • You can then go to http://localhost:8888 or http://host-ip:8888 in a browser

Build a WAR file and create a Dockerfile to run the WAR file as a Tomcat Docker container.

Dockerfile for a custom WAR file:

FROM tomcat:8.0.43-jre8

ADD webdemodocker.war /usr/local/tomcat/webapps/

EXPOSE 8080

RUN chmod +x /usr/local/tomcat/bin/catalina.sh

CMD ["catalina.sh", "run"]

 

sudo docker build -t webdemo .

sudo docker run -i -t -d -p 8080:8080 webdemo

 

References :

https://tecadmin.net/tutorial/docker/docker-tutorials/

 

Docker file for nginx

FROM nginx

COPY index.html /usr/share/nginx/html

 

index.html

This is hello world from Nginx

docker build -t mynginx .

docker run -it -p 8080:80 mynginx

docker run -it -p 8081:80 mynginx

This runs two container instances of the same image; each listens on container port 80, mapped to a different host port.

Kubernetes setup

Microservice using Spring Boot, Docker and Kubernetes or Minikube in Linux Mint

 

https://medium.com/@shivraj.jadhav82/microservice-using-spring-boot-docker-and-kubernetes-or-minikube-in-linux-mint-5b0859770baf

Install docker as mentioned above

Install minikube as below

curl -Lo minikube https://storage.googleapis.com/minikube/releases/v1.3.0/minikube-linux-amd64 && chmod +x minikube && sudo cp minikube /usr/local/bin/ && rm minikube

 

Start minikube as below:

sudo minikube start --vm-driver=none

If there is an issue with minikube pod eviction because of low ephemeral storage:

$ rm -rf ~/.minikube

$ minikube start --disk-size 64g

 

Install kubectl

curl -LO https://storage.googleapis.com/kubernetes-release/release/`curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt`/bin/linux/amd64/kubectl

 

chmod +x ./kubectl

sudo mv ./kubectl /usr/local/bin/kubectl

kubectl version

kubectl get nodes will give the error below:

The connection to the server localhost:8080 was refused - did you specify the right host or port?

To start using the cluster, you need to execute the commands below:

sudo cp /etc/kubernetes/admin.conf $HOME/

sudo chown $(id -u):$(id -g) $HOME/admin.conf

export KUBECONFIG=$HOME/admin.conf

Now execute this command

kubectl get nodes

 

Create a spring boot application

Create a docker file

FROM openjdk:8-jdk-alpine

VOLUME /tmp

ARG JAR_FILE

COPY ${JAR_FILE} springbootapp.jar

ENTRYPOINT ["java","-Djava.security.egd=file:/dev/./urandom","-jar","springbootapp.jar"]

EXPOSE 8080

 

Copy the jar and dockerfile to one location

Build the docker image

sudo docker build -t kiranresearch2020/pochub:springbootapp --build-arg JAR_FILE="springbootapp.jar" .

Login to docker hub

sudo docker login

User name : <username>

Password : <password>

sudo docker push kiranresearch2020/pochub:springbootapp

 

kubectl run springbootapp --image=kiranresearch2020/pochub:springbootapp --image-pull-policy=Never

 

kubectl expose deployment springbootapp --type=NodePort --port=8080

 

kubectl get pods

To view pod logs

kubectl logs <podname>

kubectl get services

sudo minikube service springbootapp --url

This will give the url

 

Now when you execute kubectl get services, you will see the NodePort assigned to the service.

Now open it from a browser; 30541 is the node port that forwards to target port (container port) 8080:

http://<ipaddress>:30541
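The assigned node port and the node IP can also be read directly with kubectl and minikube instead of from the browser; a minimal sketch against the service created above (assumes the service exposes a single port):

kubectl get service springbootapp -o jsonpath='{.spec.ports[0].nodePort}'

minikube ip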

To delete pod, service and deployment

kubectl delete pods <podname>

kubectl delete services <servicename>

kubectl delete deployments <deploymentname>

 

minikube addons list - lists the add-ons and whether each is enabled or disabled

minikube addons enable heapster - enables heapster

kubectl top node

 

 

kubectl top pods --all-namespaces

 

sudo minikube service monitoring-grafana -n kube-system

Open the VM IP address with the reported port; this opens Grafana.

 

Kubernetes cluster setup on Ubuntu with kubeadm

Set up a Kubernetes cluster on Ubuntu 16.04 with kubeadm

https://medium.com/@SystemMining/setup-kubenetes-cluster-on-ubuntu-16-04-with-kubeadm-336f4061d929

  1. Install a secure Kubernetes cluster on your machines

Add and prepare kubeadm package

  • Install dependency library

root@system-mining:~$  apt-get update && apt-get install -y apt-transport-https

  • Add key for new repository and add repository

root@system-mining:~$  curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
root@system-mining:~$  cat <<EOF > /etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial main
EOF

  • Update repository list

root@system-mining:~$  apt-get update

  • Install docker.io

If Docker has already been installed, skip the docker.io package.

# Install docker if you don’t have it already.
root@system-mining:~$  apt-get install -y docker.io

  • Install kubelet & kubeadm

root@system-mining:~$  apt-get install -y kubelet kubeadm kubectl kubernetes-cni

Notes: You have to install kubernetes-cni to enable the CNI network on your machine; otherwise the Kubernetes network will not work.
You need to perform this step on all of the machines on which you want to run Kubernetes.

  1. Init your master cluster. Everything you need has been installed; now you need to initialize your master cluster.

sminer@system-mining:~$ kubeadm init

 

After initializing the master node, you get a token to execute on the slave nodes:

kubeadm join 10.128.0.5:6443 --token 7y5l81.3apfeomvgollp37s \
  --discovery-token-ca-cert-hash sha256:750d2c18fcb02a0205a761f0a0cc5fffb72b0d7f3b13bf1594fc4a86707e1887

After the master node is initialized, execute the command below to check the current node status:

kubectl --kubeconfig /etc/kubernetes/admin.conf get nodes

reference : https://github.com/kubernetes/kubernetes/issues/23726

  1. Set up the Kubernetes network

kubeadm only supports CNI networks at the moment, so we need to install a CNI network add-on on the master machine so that pods in the cluster can communicate with each other (more information about Kubernetes pod networks is available in the Kubernetes documentation).
In this example, weave-kube (https://github.com/weaveworks-experiments/weave-kube) is used as the pod network plugin:

kubectl --kubeconfig /etc/kubernetes/admin.conf apply -f https://git.io/weave-kube

 

Now log in to the slave nodes and execute step 1 above.

To set up another machine to join your cluster:

  • Prepare the machine as in step 1
  • Run the kubeadm join command with the secret token of your Kubernetes cluster and your master node IP as parameters

# This is the output of command *kubeadm init* on your master node
kubeadm join 10.128.0.5:6443 --token 7y5l81.3apfeomvgollp37s \
  --discovery-token-ca-cert-hash sha256:750d2c18fcb02a0205a761f0a0cc5fffb72b0d7f3b13bf1594fc4a86707e1887

After running the command, you can verify that your node has joined the cluster by running:

 

kubectl --kubeconfig /etc/kubernetes/kubelet.conf get nodes

 

Now, create pod and deployments

Dockerfile

FROM openjdk:8-jdk-alpine

VOLUME /tmp

ARG JAR_FILE

COPY ${JAR_FILE} springbootapp.jar

ENTRYPOINT ["java","-Djava.security.egd=file:/dev/./urandom","-jar","springbootapp.jar"]

EXPOSE 8080

 

Create kubernetes-pod.yml

apiVersion: v1
kind: Pod
metadata:
  name: springbootwebapp
  labels:
    app: springbootwebapp
spec:
  containers:
  - name: springbootwebapp1
    image: kiranresearch2020/pochub:springbootapp1
    ports:
    - containerPort: 8080

 

In the master node:

kubectl create -f kubernetes-pod.yml

Now check whether the pod is running or not. If describe pod shows the issue:

0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.

If the jq command is not found, install it with apt-get install jq.

kubectl get nodes -o json | jq .items[].spec.taints

[
  {
    "effect": "NoSchedule",
    "key": "node-role.kubernetes.io/master"
  }
]

kubectl taint nodes --all node-role.kubernetes.io/master-

reference : https://blog.kstaykov.eu/devops/kubernetes-taint/

You will get output that the taints were removed. If describe pod still shows an issue:

network plugin is not ready: cni config uninitialized

delete /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

reference : https://github.com/kubernetes/kubernetes/issues/48798

Sometimes you can use: kubectl apply --filename https://git.io/weave-kube-1.6

Now the pod will be running.

Create the deployment file kubernetes-deployment.yml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: springbootapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: springbootapp1
  template:
    metadata:
      labels:
        app: springbootapp1
        tier: backend
        track: stable
    spec:
      containers:
        - name: springbootapp
          image: "kiranresearch2020/pochub:springbootapp1"
          ports:
            - name: springbootapp1
              containerPort: 8080

kubectl create -f kubernetes-deployment.yml

Once the deployment is created, create a service with NodePort:

kubectl expose deployment springbootapp --type=NodePort --port=8080

Now the pods are replicated in worker nodes

Adding Prometheus

https://linuxacademy.com/blog/kubernetes/running-prometheus-on-kubernetes/

Before we jump in to the technical stuff, there are a few assumptions that are being made. The first one is that you have a Kubernetes stack already in place. This post will not cover setting it up. The second is that you have ports 9090-9094 configured for your Kubernetes cluster. If you do not, you may need to change the targetPort for the Prometheus service.

Create the monitoring namespace

We are going to start off by creating the monitoring namespace. Using the editor of your choice, create namespace.yml and add the contents below:

{
  "kind": "Namespace",
  "apiVersion": "v1",
  "metadata": {
    "name": "monitoring",
    "labels": {
      "name": "monitoring"
    }
  }
}

Namespaces act as virtual clusters in Kubernetes. We want to make sure that we run all of the Prometheus pods and services in the monitoring namespace. When you go to list anything you deploy out, you will need to use the -n flag and define monitoring as the namespace.

For example, if you want to list the Prometheus pods, you will need to do the following:

kubectl get pods -n monitoring

Apply the namespace

Now apply the namespace by executing the kubectl apply command:

kubectl apply -f namespace.yml

Next we will set up clusterRole.yml. This will be used to set up the cluster’s roles. We need to set this up so that Prometheus has the correct permissions to the Kubernetes API. Create the clusterRole.yml file and add the following contents to it:

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups:
  - extensions
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: default
  namespace: monitoring

Apply cluster roles

Apply the cluster roles to the Kubernetes cluster:

kubectl apply -f clusterRole.yml

We are going to use a ConfigMap to decouple any configuration artifacts from image content. This will help keep containerized applications more portable. We will be using this to manage the prometheus.yml configuration file.
Create prometheus-config-map.yml and add the following:

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-server-conf
  labels:
    name: prometheus-server-conf
  namespace: monitoring
data:
  prometheus.yml: |-
    global:
      scrape_interval: 5s
      evaluation_interval: 5s
    scrape_configs:
      - job_name: 'kubernetes-apiservers'
        kubernetes_sd_configs:
        - role: endpoints
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
          action: keep
          regex: default;kubernetes;https
      - job_name: 'kubernetes-nodes'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
        - role: node
        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - target_label: __address__
          replacement: kubernetes.default.svc:443
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
          action: replace
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_pod_name]
          action: replace
          target_label: kubernetes_pod_name
      - job_name: 'kubernetes-cadvisor'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
        - role: node
        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - target_label: __address__
          replacement: kubernetes.default.svc:443
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
        - role: endpoints
        relabel_configs:
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
          action: replace
          target_label: __scheme__
          regex: (https?)
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
          action: replace
          target_label: __address__
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_service_name]
          action: replace
          target_label: kubernetes_name

This file has a lot going on in it. In a nutshell, we are creating the Prometheus targets using service discovery with the Kubernetes API. This is the reason why we needed to configure the cluster roles earlier. Without it, Prometheus wouldn’t have the necessary permissions to access the APIs to discover the targets.

The following jobs are being configured as targets using service discovery.

  • kubernetes-apiservers: Gets metrics on the Kubernetes APIs.
  • kubernetes-nodes: Gets metrics on the Kubernetes nodes.
  • kubernetes-pods: Gets metrics from Pods that have the prometheus.io/scrape and prometheus.io/port annotations defined in the metadata.
  • kubernetes-cadvisor: Gets cAdvisor metrics reported from the Kubernetes cluster.
  • kubernetes-service-endpoints: Gets metrics from Services that have the prometheus.io/scrape and prometheus.io/port annotations defined in the metadata.

By using service discovery, we don’t need to update the prometheus.conf file with new pods and services as they come online and offline. As long as the prometheus.io/scrape and prometheus.io/port annotations are defined in the metadata of your pods and services, Prometheus will automatically be updated with the targets.
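As a minimal sketch of what this means in practice, an existing service (my-app-service is a hypothetical name) can be annotated from the command line so that Prometheus discovers it:

kubectl annotate service my-app-service prometheus.io/scrape='true' prometheus.io/port='8080'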

Apply the ConfigMap

Now apply the ConfigMap:

kubectl apply -f prometheus-config-map.yml

Now that the ConfigMap is in place, we can create the Prometheus Deployment and Service. Create prometheus-deployment.yml and add the following contents to it:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: prometheus-deployment
  namespace: monitoring
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: prometheus-server
    spec:
      containers:
      - name: prometheus
        image: prom/prometheus:v2.2.1
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--storage.tsdb.path=/prometheus/"
        ports:
        - containerPort: 9090
        volumeMounts:
        - name: prometheus-config-volume
          mountPath: /etc/prometheus/
        - name: prometheus-storage-volume
          mountPath: /prometheus/
      volumes:
      - name: prometheus-config-volume
        configMap:
          defaultMode: 420
          name: prometheus-server-conf
      - name: prometheus-storage-volume
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-service
  namespace: monitoring
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port:   '9090'
spec:
  selector:
    app: prometheus-server
  type: NodePort
  ports:
  - port: 8080
    targetPort: 9090
    nodePort: 30000

There are a few things I want to point out. Two volume mounts are being created. These are prometheus-config-volume and prometheus-storage-volume.

volumeMounts:
- name: prometheus-config-volume
  mountPath: /etc/prometheus/
- name: prometheus-storage-volume
  mountPath: /prometheus/

prometheus-config-volume will be using our ConfigMap to manage prometheus.yml, which is reflected in the volumes section.

- name: prometheus-config-volume
  configMap:
    defaultMode: 420
    name: prometheus-server-conf

This is how we are able to use the prometheus-server-conf ConfigMap with the Prometheus deployment. For prometheus-storage-volume, we are creating an emptyDir to store the Prometheus data.

- name: prometheus-storage-volume
  emptyDir: {}

This volume is ephemeral and is created and destroyed with the Pod. This means if you delete the Pod for any reason, the data in the prometheus-storage-volume is deleted with it. If you want this data to be persistent, then you will need to use a persistent volume instead.
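One option for persistence, sketched here under the assumption that the cluster has a default StorageClass (the claim name and size are illustrative), is to create a PersistentVolumeClaim and reference it from the volumes section in place of emptyDir:

kubectl apply -n monitoring -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-storage-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
EOF

In prometheus-deployment.yml, prometheus-storage-volume would then declare persistentVolumeClaim with claimName: prometheus-storage-claim instead of emptyDir: {}.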

Now let's take a look at the metadata defined in the service.

metadata:
  name: prometheus-service
  namespace: monitoring
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port:   '9090'

Here we are setting up the annotation so that this service will be discovered by Prometheus as a target to be scraped. To make the service available, set prometheus.io/scrape to true. Then, you need to make sure that prometheus.io/port is the targetPort defined in the service. If you don’t, the target will not be discovered.

ports:
- port: 8080
  targetPort: 9090
  nodePort: 30000

Because the targetPort is set to 9090 we will use that port with prometheus.io/port.

annotations:
  prometheus.io/scrape: 'true'
  prometheus.io/port:   '9090'

Create the deployment and service by executing kubectl apply.

kubectl apply -f prometheus-deployment.yml

Let’s verify that the pod and service were created.

kubectl get pods -n monitoring

kubectl get services -n monitoring

Once the pod and service are available, you can access Prometheus’s Expression Browser by going to http://<KUBERNETES_MASTER_IP>:9090.

Adding Grafana for visualization

https://medium.com/@wbassler23/getting-started-with-prometheus-pt-1-8f95eef417ed

docker run -d --name grafana -p 3000:3000 grafana/grafana

Login to Grafana dashboard

User name : admin

Password : admin

Set the data source URL in Grafana to point at Prometheus.


https://www.weave.works/docs/cloud/latest/tasks/monitor/configuration-k8s/

Annotations on pods allow a fine control of the scraping process:

  • prometheus.io/scrape: The default configuration scrapes all pods; if set to false, this annotation excludes the pod from the scraping process.
  • prometheus.io/path: If the metrics path is not /metrics, define it with this annotation.

 

RBAC Kubernetes

https://blog.viktorpetersson.com/2018/06/15/kubernetes-rbac.html

role.yaml

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: lockdown
rules:
- apiGroups: [""] # "" indicates the core API group
  resources: ["pods"]
  verbs: ["list"]

kubectl create -f role.yaml

rolebinding.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: rb-lockdown
subjects:
- kind: ServiceAccount
  name: sa-lockdown
roleRef:
  kind: Role
  name: lockdown
  apiGroup: rbac.authorization.k8s.io

kubectl create -f rolebinding.yaml
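The binding above refers to a ServiceAccount named sa-lockdown, which these steps never create; a minimal sketch of creating it, assuming (as the test below does) that everything lives in a lockdown namespace:

kubectl create namespace lockdown

kubectl create serviceaccount sa-lockdown --namespace lockdown

For the check below to succeed, the Role and RoleBinding may also need to be created in that namespace, for example by adding -n lockdown to the kubectl create commands above.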

kubectl auth can-i get pods \
  --namespace lockdown \
  --as system:serviceaccount:lockdown:sa-lockdown

sudo kubectl get pods --as system:serviceaccount:lockdown:sa-lockdown

Helm Charts

https://www.youtube.com/watch?v=LRDWxyFgZf0

How to Install Helm and Create Helm Chart on Ubuntu

Package manager for Kubernetes

Install Helm chart in Ubuntu

curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get > install-helm.sh

Give permissions to install-helm.sh

chmod u+x install-helm.sh

./install-helm.sh - installs Helm

create service account

sudo kubectl -n kube-system create serviceaccount tiller

create role binding

sudo kubectl create clusterrolebinding tiller --clusterrole cluster-admin --serviceaccount=kube-system:tiller

sudo helm init --service-account tiller - this installs Tiller in the Kubernetes cluster

 

sudo helm install stable/kubernetes-dashboard --name dashboard-demo

If we get an error like the one below:

an error occurred forwarding 36121 -> 44134: error forwarding port 44134 to pod c893914729ea40a021d21565026a57a9c6c51909b9ee3c752815c7a1d3ba5f0e, uid : unable to do port forwarding: socat not found.

Install socat

sudo apt-get install socat

once helm and tiller are installed

create chart

helm create hello-world

hello-world /

Chart.yaml

values.yaml

templates /

charts /

.helmignore

Let’s understand the relevance of these files and folders created for us:

  • Chart.yaml: This is the main file that contains the description of our chart
  • values.yaml: this is the file that contains the default values for our chart
  • templates: This is the directory where Kubernetes resources are defined as templates
  • charts: This is an optional directory that may contain sub-charts
  • .helmignore: This is where we can define patterns to ignore when packaging (similar in concept to .gitignore)

Creating Template

If we look inside the templates directory, we'll notice that a few templates for common Kubernetes resources have already been created for us:

 

hello-world /

templates /

deployment.yaml

service.yaml

ingress.yaml

……

We may need some of these and possibly other resources in our application, which we’ll have to create ourselves as templates.

For this tutorial, we'll create a deployment and a service to expose that deployment. Please note the emphasis here is not to understand Kubernetes in detail; hence we'll keep these resources as simple as possible.

In values.yaml

replicaCount: 1

image:
  repository: kiranresearch2020/pochub
  tag: springbootapp
  pullPolicy: IfNotPresent

imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""

service:
  type: NodePort
  port: 8080

In deployment.yaml

containers:
  - name: {{ .Chart.Name }}
    image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
    imagePullPolicy: {{ .Values.image.pullPolicy }}
    ports:
      - name: http
        containerPort: 8080
        protocol: TCP

 

References : https://www.baeldung.com/kubernetes-helm

Firstly, this is a simple command that takes the path to a chart and runs a battery of tests to ensure that the chart is well-formed:

helm lint ./hello-world

==> Linting ./hello-world

1 chart(s) linted, no failures

Also, we have this command to render the template locally, without a Tiller Server, for quick feedback:

helm template ./hello-world

Once we’ve verified the chart to be fine, finally, we can run this command to install the chart into the Kubernetes cluster:

helm install --name hello-world ./hello-world
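Defaults from values.yaml can be overridden at install time with --set; a minimal sketch using the replicaCount value defined earlier (Helm 2 style flags, matching the commands in this section):

helm install --name hello-world ./hello-world --set replicaCount=2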

Now, we would like to see which charts are installed as what release. This command lets us query the named releases:

helm ls --all

What if we have modified our chart and need to install the updated version? This command helps us to upgrade a release to a specified or current version of the chart or configuration:

helm upgrade hello-world ./hello-world

It can always happen that a release went wrong and needs to be taken back. This is the command to rollback a release to the previous version:

helm rollback hello-world 1

Although less likely, we may want to delete a release completely. We can use this command to delete a release from Kubernetes:

helm delete --purge hello-world

Easy Kubernetes Development with Google Skaffold

 

https://medium.com/stakater/easy-kubernetes-development-with-google-skaffold-cbddb7e40519

Installing on Linux

Download the package:

# wget https://storage.googleapis.com/skaffold/releases/latest/skaffold-linux-amd64  -O /usr/local/bin/skaffold

Make the file executable:

# chmod +x  /usr/local/bin/skaffold

Installing on macOS

Download the package.

# wget https://storage.googleapis.com/skaffold/releases/latest/skaffold-darwin-amd64  -O /usr/local/bin/skaffold
# chmod +x /usr/local/bin/skaffold

Confirm that the directory with newly added skaffold package is in your PATH environment. You can confirm using:

# echo $PATH

Scaled and distributed Jenkins on top of Kubernetes – Master and Slave

https://medium.com/muhammet-arslan/how-ive-created-scaled-and-distributed-jenkins-top-of-kubernetes-441db62b15cd

create namespace

sudo kubectl create ns jenkins

  1. Jenkins Service

Create jenkins-service.yml

##########################

#

# This template aims to Orchestrate / Provision Jenkins Services

#

# @author Muhammet Arslan <muhammet.arsln@gmail.com>

# @package tools

# @version 1.0

#

##########################

# [START jenkins_service_ui]
apiVersion: v1
kind: Service
metadata:
  name: jenkins
  namespace: jenkins
spec:
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080
      name: ui
  selector:
    app: master
  type: NodePort
# [END jenkins_service_ui]
---
# [START jenkins_service_discovery]
apiVersion: v1
kind: Service
metadata:
  name: jenkins-discovery
  namespace: jenkins
spec:
  selector:
    app: master
  ports:
    - protocol: TCP
      port: 50000
      targetPort: 50000
      name: slaves
# [END jenkins_service_discovery]

sudo kubectl create -f jenkins-service.yml

For service account access:

sudo kubectl create clusterrolebinding permissive-binding --clusterrole=cluster-admin --user=admin --user=kubelet --group=system:serviceaccounts:jenkins

  1. Dockerfile

After the services, we need to create the Jenkins Deployment. But before that, I'll create a basic Docker image that installs all required plugins.

Create a “Dockerfile” file and paste the below contents in it.

FROM jenkins/jenkins:lts

# Distributed Builds plugins
RUN /usr/local/bin/install-plugins.sh ssh-slaves

# Scaling
RUN /usr/local/bin/install-plugins.sh kubernetes

# install Maven
USER jenkins

Then build the template

docker build -t geass/jenkins:1.0.2 .

Then tag the Image

docker tag [IMAGE_ID] geass/jenkins:1.0.2

And push to Docker Hub

docker push geass/jenkins:1.0.2

To get more detail about pushing/pulling images to Docker Hub, please follow that link: https://docs.docker.com/docker-cloud/builds/push-images/

  1. Jenkins Deployment

Okay, we are ready to create the deployment for Jenkins. Create a file called "jenkins-deployment.yml" and paste in the contents below.

##########################
#
# This template aims to Orchestrate / Provision Jenkins Deployment
#
# @author Muhammet Arslan <muhammet.arsln@gmail.com>
# @package tools
# @version 1.0
#
##########################
# [START jenkins_deployment]
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: jenkins
  namespace: jenkins
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: master
    spec:
      containers:
      - image: geass/jenkins:1.0.2
        name: jenkins
        ports:
        - containerPort: 8080
          name: http-port
        - containerPort: 50000
          name: jnlp-port
        env:
        - name: JAVA_OPTS
          value: -Djenkins.install.runSetupWizard=false
# [END jenkins_deployment]

This indicates that we will use our Docker image from the hub and expose both the UI and the Jenkins slave ports to the services. We also set a JAVA_OPTS parameter to skip the Jenkins setup wizard, which is enabled by default.

Now create the deployment;

kubectl create -f jenkins-deployment.yml

The output should look like:

deployment.extensions "jenkins" created

  1. Check the pods

To be sure the Jenkins master is running, let's check the pods.

kubectl get pods --namespace jenkins
NAME                      READY     STATUS    RESTARTS   AGE
jenkins-85bb8dcc5-d4lgv   1/1       Running   0          1m

Okay, let’s open the service.

minikube service jenkins --namespace jenkins
Opening kubernetes service jenkins/jenkins in default browser…

And boom! Jenkins is ready!

The language might differ on your browser 🙂

  1. Enabling the Jenkins Slaves

With the Kubernetes plugin installed it must be configured by navigating to Manage Jenkins > Configure System and scrolling to the Cloud section. First, we configure the Kubernetes Section as below:

- Click the "Add New Cloud" drop-down button and select the "kubernetes" option.

To obtain the Kubernetes URL you should invoke:

$ kubectl cluster-info | grep master
Kubernetes master is running at https://192.168.99.110:8443

To obtain the Jenkins URL, first obtain the full pod name of the Jenkins master:

$ kubectl get pods --namespace jenkins | grep ^jenkins
jenkins-85bb8dcc5-d4lgv   1/1       Running   0          7m

then obtain the IP address of the pod:

$ kubectl describe pod jenkins-85bb8dcc5-d4lgv --namespace jenkins | grep IP:
IP:             172.17.0.14

With these configuration entries, the Jenkins Kubernetes plugin can interact with the Kubernetes API server. Next, we need to configure the pod template and container for the slave so that the plugin can provision a pod. Scroll down a bit, and click “Kubernetes pod template” option on the “Add Pod Template” drop-down.

Then click “Container Template” option in “Add Container” drop-down menu. And fill as below;

Okay! Apply and Save! Configurations are ready! Let’s create a Job.

  1. Create a Job!

Click “New Item” button and simply give a name then choose “Freestyle Project” and go ahead.

You will need to set the Label Expression field to match that specified in the Pod Template configuration.

I’ve also created an Execute shell build step.

You are now ready to build the job. Before doing that you should watch the Kubernetes Pods. Do this by installing watch (brew install watch) and executing the watch as follows:

Every 2.0s: kubectl get pods --namespace jenkins                 Tue Apr 24 10:08:46 2018

NAME                      READY     STATUS    RESTARTS   AGE
jenkins-85bb8dcc5-d4lgv   1/1       Running   0          29m
jenkins-slave-4svzj       2/2       Running   0          1m

Yes! That's all! Your job now runs in a completely fresh Kubernetes pod, called "jenkins-slave-*", and after your job finishes, the pod is removed automatically.

Script to be used in scripted pipeline in Jenkins

 

podTemplate(label: 'mypod', containers: [
    containerTemplate(name: 'git', image: 'alpine/git', ttyEnabled: true, command: 'cat'),
    containerTemplate(name: 'maven', image: 'maven:3.3.9-jdk-8-alpine', command: 'cat', ttyEnabled: true),
    containerTemplate(name: 'docker', image: 'docker', command: 'cat', ttyEnabled: true)
  ],
  volumes: [
    hostPathVolume(mountPath: '/var/run/docker.sock', hostPath: '/var/run/docker.sock'),
  ]
) {
  node('mypod') {
    stage('Check running containers') {
      container('docker') {
        // example to show you can run docker commands when you mount the socket
        sh 'hostname'
        sh 'hostname -i'
        sh 'docker ps'
      }
    }

    stage('Clone repository') {
      container('git') {
        sh 'whoami'
        sh 'hostname -i'
        sh 'git clone -b master https://github.com/lvthillo/hello-world-war.git'
      }
    }

    stage('Maven Build') {
      container('maven') {
        dir('hello-world-war/') {
          sh 'hostname'
          sh 'hostname -i'
          sh 'mvn clean install'
        }
      }
    }
  }
}

Testing running container

https://kubernetes.io/docs/tasks/debug-application-cluster/get-shell-running-container/

Verify that the Container is running:

kubectl get pod shell-demo

Get a shell to the running Container:

kubectl exec -it shell-demo -- /bin/bash
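A single command can also be run non-interactively instead of opening a shell; a minimal sketch against the same pod:

kubectl exec shell-demo -- ls /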

How to ssh to docker container

docker build -t swift3-ssh .

docker run -p 2222:22 -i -t swift3-ssh

docker ps # find container id

docker exec -i -t <containerid> /bin/bash

How to copy files from host to docker container

The cp command can be used to copy files.

One specific file can be copied TO the container like:

docker cp foo.txt mycontainer:/foo.txt

One specific file can be copied FROM the container like:

docker cp mycontainer:/foo.txt foo.txt

For emphasis, mycontainer is a container ID, not an image ID.

Multiple files contained by the folder src can be copied into the target folder using:

docker cp src/. mycontainer:/target

docker cp mycontainer:/src/. target

Copy directories and files to and from Kubernetes Container [POD]

As we all know, the SCP Linux command copies files and directories from a remote host to the local host and vice versa over SSH.

Similarly, we have 'kubectl cp' to copy files and directories from a Kubernetes container [pod] to the local host and vice versa.

Syntax:

kubectl cp <file-spec-src> <file-spec-dest>

To target a specific container in the pod:

kubectl cp <file-spec-src> <file-spec-dest> -c <specific-container>

Copy the local file /tmp/foo to /tmp/bar in a remote pod in a namespace:

kubectl cp /tmp/foo <some-namespace>/<some-pod>:/tmp/bar

Copy /tmp/foo from a remote pod to /tmp/bar locally

kubectl cp <some-namespace>/<some-pod>:/tmp/foo /tmp/bar

 

Microservices With CQRS and Event Sourcing

The main topic of this article is to describe how we can integrate an event-driven architecture with microservices using event sourcing and CQRS.

Microservices are independent, modular services that have their own layered architecture.

When microservices share the same database, the data model among the services can follow relationships among the tables associated with the microservices.

For example, there are two microservices running in their own containers: ‘Order’ and ‘Customer.’

The Order service will take care of creating, deleting, updating, and retrieving order data. The Customer service will work with customer data.

One customer can have multiple orders, which has a one-to-many relationship. As both tables are in a single database, a one-to-many relationship can be established.

The Order service and Customer service, though running in separate containers, can access the tables from the same database. This leverages proper transactions with ACID properties: when customer data is updated, order data can also be updated in the same transaction, guaranteeing atomicity.

Microservices Event Sourcing

There are some limitations to this approach, however. A shared database is not recommended in a microservices-based approach because, if there is a change in one data model, the other services are also impacted.

As part of microservices best practices, each microservice should have its own database.

The Order microservice accesses the Order database and the Customer microservice accesses the Customer database.

In this scenario, the relationships among the tables cannot be established, as both tables are in separate databases.

If the Customer microservice wants to update the Order data, it can pass the customer id as a request parameter to the HTTP service of the Order microservice to update the Order data for the corresponding customer id in the Order database, as shown in the below diagram.

Microservices database

The limitation of this approach is that transaction management cannot be properly handled. If customer data is deleted, the corresponding order also has to be deleted for that customer.

Though this can be achieved with workarounds, like calling a delete service in the Order service, atomicity is not achievable in a straightforward way. This needs to be handled with customization.

To overcome this limitation, we can integrate an event-driven architecture with our microservices components.

As per the below diagram, any change in the customer data will be published as an event to the messaging system, so that the event consumer consumes the data and updates the order data for the given customer changed event.

Event-driven architecture

The limitation of this approach is that atomic updates to the database and publishing events to the message queue cannot be handled together easily. Though these types of transactions can be handled by distributed transaction management, this is not recommended in a microservices approach, as there might not be support for XA transactions in all scenarios.

To avoid these limitations, event-sourcing can be introduced in this microservices architecture.

In event-sourcing, any event triggered will be stored in an event store. There is no update or delete operations on the data, and every event generated will be stored as a record in the database. If there is a failure in the transaction, the failure event is added as a record in the database. Each record entry will be an atomic operation.

The advantages of event-sourcing are as follows:

  • Solves atomicity issues.
  • Maintains history and audit of records.
  • Can be integrated with data analytics as historical records are maintained.

There are a few limitations, which are:

  • Queries on the latest data or a particular piece of data in the event store involve complex handling.
  • To make the data eventually consistent, this involves asynchronous operations because the data flow integrates with messaging systems.
  • The model that involves inserting and querying the data is the same and might lead to complexity in the model for mapping with the event store.
  • The event store capacity has to be larger in storing all the history of records.

Now we integrate CQRS (Command Query Responsibility Segregation) with event sourcing to overcome the above limitations.

CQRS

CQRS is another design pattern used in microservices architecture. It has a separate service, model, and database for insert operations into the data store, which acts as the command layer, and a separate service, model, and database for querying data, which acts as the query layer.

The read database can store a denormalized model where databases like NoSQL (that are horizontally scalable) can be leveraged.

The command layer is used for inserting data into a data store. The query layer is used for querying data from the data store.

In the Customer microservice, when used as a command model, any change in customer data, like a customer name being added or a customer address being updated, will generate events and publish them to the messaging queue. The events are also logged in the database in parallel.

The event published to the message queue will be consumed by the event consumer, which updates the data in the read storage.

The Customer microservice, when used as a query model, needs to retrieve customer data that invokes a query service, which gets data from read storage.

Similarly, events published across microservices also have to be passed through a message queue. 

The advantages of CQRS integrated with event sourcing and microservices are:

  • Leveraging microservices for modularity with separate databases.
  • Leveraging event sourcing for handling atomic operations.
  • Maintain historical/audit data for analytics with the implementation of event sourcing.
  • CQRS having separate models and services for read and insert operations.
  • Request load can be distributed between read and insert operations.
  • Read operations can be faster as the load is distributed between read and insert services.
  • The read model or DTO need not have all the fields of the command model; it can have only the fields required by the client view, which saves capacity in the read store.

The limitations of this approach are:

  • Additional maintenance of infrastructure, like having separate databases for command and query requests.
  • Models should be designed in an optimal way, or this will lead to complexity in handling and troubleshooting.

SQL Interfaces with Big Data systems

Big data SQL applications provide a SQL-query-like interface to query data stored in HDFS and other big data storage systems.

As SQL is a popular language used globally, big data systems can be accessed easily through a SQL interface.

Below are a few SQL interfaces for accessing big data systems:

Hive: Popular SQL engine on Hadoop. Provides SQL interface to access data in HDFS. Hive queries internally trigger MapReduce jobs for processing.

Hive on Apache Tez: The Hive-with-Tez combination runs HiveQL on the Tez execution engine. Analytics-oriented SQL queries become highly performance oriented because the work is executed as a DAG workflow rather than distinct MapReduce jobs, thereby increasing parallelism.

Hive on Apache Spark: Hive utilizing the Spark framework for execution, using in-memory clusters for fast performance.

Hive with LLAP: Hive with low-latency queries running on Hadoop and YARN. In normal Hive, every time a SQL job is submitted to HiveServer, a YARN application is started. With LLAP (Long Lived Analytical Processing), introduced in Hive 2, an LLAP daemon, also called HiveServer2-interactive, launches a long-running YARN application when it starts. This YARN application executes all the SQL queries, independently of the user running the query.

Impala: Impala is an MPP-based SQL query engine that provides low-latency, high-performance SQL queries on data stored in HDFS. Impala does not use MapReduce; it has daemons running on all nodes that cache the data in HDFS. These daemons return data quickly without having to go through a whole MapReduce job. Impala does not provide fault tolerance compared to Hive.

Apache Drill: Low-latency SQL query engine for interactive SQL analytics. It has the unique ability to discover schemas on read, with data discovery and exploration capabilities on data residing in multiple formats in flat files, HDFS or other file systems, and NoSQL databases. It is an open-source counterpart of Google BigQuery, which is a service in Google Cloud.

Apache Presto: Open-source distributed SQL query engine for interactive analytics against many data sources like Cassandra, Hive, RDBMSs and a few proprietary datastores. One query from Presto can aggregate data from multiple data sources. It is mainly used in ad hoc SQL analytics for large data sets. Presto supports ANSI SQL queries across a range of data sources, including Hive, Cassandra, relational databases or proprietary file systems (such as Amazon Web Services' S3).

Apache Tajo: Apache Tajo is a relational and distributed data warehouse for Hadoop. It is designed for low-latency and scalable ad hoc query analysis. It has a distributed SQL query processing engine with advanced query optimization. It is ANSI SQL compliant, allows access to the Hive Metastore and supports various file formats. Tajo is designed for low-latency and scalable ad hoc queries, online aggregation, and ETL (extract-transform-load) on large data sets stored on HDFS (Hadoop Distributed File System) and other data sources.

Flink Table API and SQL: The Table API is a SQL-like expression language for relational stream and batch processing that can be easily embedded in Flink’s DataSet and DataStream APIs (Java and Scala)

Apache Phoenix: Relational interface over HBase. Phoenix takes a SQL query as input and compiles it to a set of HBase scans, coordinates the running of those scans and outputs JDBC result sets.

BlinkDB: Large-scale data warehouse system that adds the ability to create and use smaller samples of large datasets to make queries even faster. It runs queries on data samples and presents results with valid threshold values.

Hadapt: Cloud-Optimized system offering an analytical platform for performing complex analytics on structured and unstructured data. Like Apache Hive and other technologies, Hadapt provides a familiar JDBC/ODBC interface for submitting SQL or MapReduce jobs to the cluster. Hadapt provides a cost-based query optimizer, which can decide between a combination of MapReduce jobs and MPP database jobs to fulfill a query, or the job can be handled by the MPP database for fast interactive response

Spark SQL: Allows querying structured and unstructured data within Spark using SQL. Access variety of data sources and file formats such as Hive, HBase, Cassandra, Avro, Parquet, ORC, JSON and relational datasets. Spark SQL uses Hive Metastore with access to Hive data, queries and UDF’s

Splice Machine: General purpose RDBMS, a unique hybrid database that combines the advantage of SQL, the scale out of NoSQL, and the performance of in-memory technology. It provides ANSI SQL and ACID transactions of an RDBMS on the Hadoop eco system

Apache Calcite: Open-source framework for building databases. It includes a SQL parser, validator and JDBC driver, and query optimization tools, including a relational algebra API, a rule-based planner and a cost-based query optimizer. Apache Hive uses Calcite for cost-based query optimization, while Apache Drill and Apache Kylin use its SQL parser.

Apache Kylin: Apache Kylin is an open-source Distributed Analytics Engine designed to provide SQL interface and multi-dimensional analysis (OLAP) on Hadoop supporting extremely large datasets

Google BigQuery: Google BigQuery is a cloud-based big data analytics warehouse for processing very large read-only data sets. BigQuery was designed for analyzing data on the order of billions of rows, using a SQL-like syntax. It runs on the Google Cloud Storage infrastructure and can be accessed with a REST-oriented application program interface (API)

Apache Trafodion: Trafodion is another SQL-on-HBase engine, similar to Apache Phoenix; it also integrates with Hive and supports transactions.

Lingual: Provides full ANSI SQL interface for Hadoop that can be easily integrated with existing BI tools and also supports JDBC

Apache Druid: Druid is an open-source data store designed for sub-second queries on real-time and historical data. It is primarily used for business intelligence (OLAP) queries. Druid SQL is a built-in SQL layer and an alternative to Druid’s native JSON-based query language

Actian Vector: High-performance analytic database that makes use of vector processing to perform the same operation on multiple data elements simultaneously.

AtScale: High performance OLAP server platform on Hadoop. This acts as a semantic layer between Big Data systems like Cloudera, Hortonworks or MapR and data visualization tools like Tableau, Qlik, MicroStrategy. Data access over Hadoop is leveraged through SQL or MDX

Citus: A horizontally scalable database built on PostgreSQL to solve real time big data challenges with a horizontally scalable architecture that combines with massive parallel query processing across highly available clusters.

Greenplum: High analytical query performance engine for petabytes of data built on PostgreSQL. It leverages standards-compliant SQL to support BI and reporting tools

HAWQ: Parallel SQL processing engine built on top of the HDFS optimized for analytics with full ACID transaction support. HAWQ breaks complex queries into smaller tasks and distributes them to query processing units for execution

JethroData: Index-based SQL engine that enables interactive BI on big data. It indexes every column in HDFS, and queries use the indexes to access only the data they need instead of doing a full scan, resulting in faster response times and low resource utilization.

SQLstream: Provides interactive real time processing of data in motion to build new real time processing applications. It has optimized SQL query engine for unstructured machine data streams.

VoltDB: In memory, massively parallel relational database. This adds horizontal partitioning, active-active redundant clustering.

Vertica: Product of HP. It is designed for column-oriented storage optimization, MPP, a SQL query interface and in-database machine learning for large-scale data analytics. Vertica is infrastructure independent, supporting deployments on multiple cloud platforms (AWS, Azure, Google), on-premises and natively on Hadoop nodes.

Netezza: Product of IBM. Designs and markets high-performance data warehouse appliances and advanced analytics applications for uses including enterprise data warehousing, business intelligence, predictive analytics and business continuity planning.

Aster: Analytics platform acquired by Teradata. Provides integration with SQL and MapReduce. The mapper and reducer functions are implemented in SQL

Oracle Big Data SQL: Oracle Big Data SQL supports queries against non-relational data stored in multiple big data sources, including Apache Hive, HDFS, Oracle NoSQL Database, and Apache HBase

Spark SQL with Tachyon: Spark SQL integrated with Tachyon, an in-memory storage system used to store intermediate results, which makes processing faster. Tachyon caching is more powerful than Spark in-memory caching: the Spark cache is volatile across different jobs, whereas Tachyon is useful when data or intermediate results have to be shared between different applications, for example caching an RDD for interactive exploratory analysis. Tachyon is now called Alluxio.

KSQL: KSQL provides a simple and completely interactive SQL interface for processing streaming data in Kafka provided from Confluent

Azure Stream Analytics: Streaming analytics platform supporting SQL like query language for performing transformations over stream of events provided by Azure

PolyBase: Provided by Microsoft, where T-SQL statements access data stored in Azure Blob storage or HDFS. PolyBase enables Azure SQL Data Warehouse to import and export data from Azure Data Lake Store and from Azure Blob Storage.

Big SQL: IBM Big SQL is a massively parallel processing engine that accesses various data sources like HDFS, RDBMSs, NoSQL databases and object stores using a single database connection or a single query.

Azure Synapse Analytics (formerly Azure SQL Data Warehouse): Azure Synapse Analytics is a fully managed cloud data warehouse and big data analytics platform for enterprises of any size, with lightning-fast query performance.

Amazon Redshift: Data warehousing service from Amazon Web Services that is fast, simple and cost-effective.

Cassandra CQL: A SQL-like query language for accessing the Cassandra database.

Apache Kudu: Columnar storage developed for the Hadoop ecosystem, with advantages of both HBase and Parquet. It is as fast as HBase at ingesting data and random access, and almost as quick as Parquet for analytics queries. Kudu has a SQL interface from Spark and Impala.

Dremio: Data lake engine that provides a self-service semantic layer directly on data lake storage without moving data to proprietary data warehouses; no cubes, no aggregation tables or extracts. Dremio reads data from any source (RDBMS, HDFS, S3, NoSQL) into Arrow (Apache Arrow in-memory storage) buffers, and provides fast SQL access via ODBC, JDBC, and REST for BI, Python and R.

OmniSciDB: OmniSciDB is an open-source SQL-based, relational, columnar database engine that leverages the full performance and parallelism of modern hardware (both CPUs and GPUs) to enable querying of multi-billion-row datasets in milliseconds, without the need for indexing, pre-aggregation, or downsampling.

Amazon Athena: Serverless interactive query service for analyzing big data in Amazon S3 using standard SQL. This is built on Presto.
