Completed the Udacity Data Engineering Nanodegree, which covers using SQL to store and manipulate data and working with modern cloud data solutions such as Amazon S3 (data lake storage) and Amazon Redshift (data warehousing). It also teaches the core ETL operations every data engineer needs to know, using tools such as PySpark and Apache Airflow, which can automate ETL pipelines.
- PostgreSQL
- PySpark
- Apache Airflow
- Amazon S3
- Amazon Redshift
- Project 3 -- where we extracted data stored in Amazon S3 and loaded it into staging tables in Amazon Redshift
- Project 4 -- where we performed data quality checks using PySpark
- Project 5 -- where we used Airflow to automate data extraction and data quality checks