Build a Data Lake: Kafka Consumer in Spark-Scala | From Kafka to AWS S3 | Deploy on Container
Prasoon Parashar

Published on Jul 25, 2023

Medium :-   / real-time-data-ingestion-building-an-aws-s...  

Github:- https://github.com/Noosarpparashar/ec...
LinkedIn:-   / prasoon-parashar-47433919b  


Commands Used:-
git clone https://github.com/Noosarpparashar/st...

The repo above may have changed since the recording. To check out the exact state shown in the video, run: git checkout c3e227c61b13347c888c5201c16e8f7ec8387bac



docker-compose -f docker-compose-ch3.yml up

git clone https://github.com/Noosarpparashar/ec...

Postman connector payload:
{
  "name": "inventory-connector7",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "34.123.456.789",
    "database.port": "5432",
    "database.user": "postgres",
    "database.password": "9473",
    "database.dbname": "pinnacledb",
    "database.server.name": "fullfillment",
    "table.whitelist": "ECART.STOREINFO",
    "topic.prefix": "mysixthtopic"
  }
}
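The payload above is registered by POSTing it to the Kafka Connect REST API (the video uses Postman). A curl equivalent is sketched below; the host and port are placeholder assumptions (Connect's REST interface listens on 8083 by default):

```shell
# Register the Debezium Postgres connector with Kafka Connect.
# Replace localhost with your Connect host if it runs elsewhere.
curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "inventory-connector7",
    "config": {
      "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
      "database.hostname": "34.123.456.789",
      "database.port": "5432",
      "database.user": "postgres",
      "database.password": "9473",
      "database.dbname": "pinnacledb",
      "database.server.name": "fullfillment",
      "table.whitelist": "ECART.STOREINFO",
      "topic.prefix": "mysixthtopic"
    }
  }'
```

A GET on http://localhost:8083/connectors afterwards should list inventory-connector7 if registration succeeded.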
docker network create my-network

docker-compose up
docker exec -it 5625b32279c1 bash (replace the container ID with that of your Spark master node container)

docker cp /home/prasoon/IdeaProjects/ecart-migration/out/artifacts/ecart_migration_jar/ecart-migration.jar 5625b32279c1:/opt/bitnami/spark/myjars

docker exec -it 5625b32279c1 bash
cd myjars
spark-submit --master local[*] --class com.its.ecartsales.framework.jobs.controllers.StreamKafkaConsumerEcartFactOrder1 ecart-migration.jar
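The source of StreamKafkaConsumerEcartFactOrder1 lives in the ecart-migration repo rather than this description. A minimal Structured Streaming consumer of the same shape might look like the sketch below; the broker address, topic name, S3 bucket, and checkpoint path are placeholder assumptions, not the repo's actual values:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch: consume Debezium change events from Kafka
// and land them in S3 as Parquet via Spark Structured Streaming.
object StreamKafkaConsumerSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ecart-kafka-to-s3")
      .getOrCreate()

    // Read the topic as an unbounded streaming DataFrame
    val kafkaDf = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")  // placeholder broker
      .option("subscribe", "mysixthtopic.ECART.STOREINFO")  // placeholder topic
      .option("startingOffsets", "earliest")
      .load()

    // Kafka key/value arrive as binary; cast to strings before writing
    val events = kafkaDf.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

    // Sink to S3; the checkpoint makes the stream restartable after failure
    events.writeStream
      .format("parquet")
      .option("path", "s3a://my-bucket/ecart/storeinfo/")               // placeholder bucket
      .option("checkpointLocation", "s3a://my-bucket/checkpoints/storeinfo/")
      .start()
      .awaitTermination()
  }
}
```

Writing to s3a:// paths additionally requires the hadoop-aws package and valid AWS credentials on the cluster, which the video covers when generating access keys.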


Introduction (0:00-01:09)
Clone DataGenerator Repo (01:11-01:44)
Generate AWS Access Keys (01:46-03:06)
Clone ecart-migration repo (03:13-03:50)
Understanding Project Structure (03:50-06:01)
Setup Kafka Cluster (06:03-07:20)
Whitelist Public IP (07:25-08:27)
Create Kafka Connector (08:29-09:48)
Spark Scala Kafka Consumer code (09:50-12:08)
Start data generator (12:10-14:13)
Create Jar and Build Artifact (14:55-16:00)
Run spark-submit on local (16:20-17:00)
Stop all containers (17:02-17:45)
Whitelist Public IP (17:46-18:30)
Set up VM Instance with Ubuntu Image (18:32-19:22)
Connectivity Test (19:23-20:05)
Install docker on VM (20:07-20:58)
Kafka on GCP (21:04-23:05)
Kafka UI (23:07-23:40)
Whitelist VM's IP (23:42-24:43)
Create Kafka Connector (24:50-25:35)
Run Spark Scala consumer (25:37-26:27)
Dockerize spark multinode cluster (26:40-27:19)
Put jar in docker container (27:22-28:53)
Run jar from container (28:55-29:15)
Prod deployment tips (29:17-30:28)
