Data Science Engineer Job at Dunnhumby
What we expect from you
Develop and maintain the engineering tools and products needed for simple, efficient data science development
Analyse complex data pipelines to identify performance bottlenecks and suggest robust ways to optimise workloads at a reasonable cost
Proficiency in at least one programming language: Python, Java or Scala
Good knowledge of data engineering techniques, e.g. data ingestion, data processing, data validation, data publishing and data quality
Proficiency in Hadoop, PySpark, Pandas, NumPy and Python (version > 3.5)
Good working experience with partitioning, joins, caching, HDFS, handling data skew and code optimisation in general
Good knowledge of Airflow or other data orchestration tools such as NiFi or Luigi
Good knowledge of SQL
Good knowledge of shell scripting and Git workflows
Experience with container orchestration platforms like Kubernetes and containerization technologies like Docker
Experience in re-engineering, automating and productionising code
Great communication skills, both written and oral
Should have experience handling and optimising various file formats such as Parquet and Avro
Should have good experience working with Hadoop ecosystem components (e.g. YARN, Spark UI, Hive, HDFS) and their cloud equivalents, preferably in GCP
Working knowledge of web application development and application integration (REST APIs, web services)

