Data Engineer - Python, Spark & Cloud

Job Description

This role is for candidates proficient in Python, Spark, and Cloud programming. The candidate will process data using PySpark: reading data from external sources, merging datasets, performing data enrichment, and loading the results into target data destinations.

Role

You will:

  • Understand functional requirements, translate them into technical designs, and develop data integration pipelines from a wide variety of source systems
  • Conduct data profiling and produce results to drive standardization and quality
  • Design, develop, and deliver data integration jobs for different ingestion patterns, writing to a variety of target data stores (a minimal PySpark sketch follows this list)
  • Offer alternate design options, where possible, to arrive at the most effective solution
  • Develop Python programs for all data ingestion from different databases
  • Apply object-oriented programming concepts to data processing and build efficient code with reusability in mind
  • Orchestrate end-to-end job flows based on individual job-level dependencies
  • Test and deploy data pipelines
  • Interact with the business to gather requirements, clarify questions, and support testing activities
  • Be an active participant in planning, estimating and delivering user stories in different sprints
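
A minimal PySpark sketch of the read-merge-enrich-load flow described above. The paths, column names, and file formats are hypothetical placeholders, not details from this posting:

    # Sketch: ingest raw orders, enrich with reference data, load to a curated zone.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("example_ingestion").getOrCreate()

    # Read raw data landed by an external source system (hypothetical path).
    orders = (spark.read
              .option("header", "true")
              .option("inferSchema", "true")
              .csv("s3://example-bucket/landing/orders/"))

    # Read reference data used for enrichment (hypothetical path).
    customers = (spark.read
                 .option("header", "true")
                 .csv("s3://example-bucket/reference/customers/"))

    # Merge and enrich: join on a shared key and stamp the load time.
    enriched = (orders.join(customers, on="customer_id", how="left")
                .withColumn("load_ts", F.current_timestamp()))

    # Load into the target destination, partitioned for downstream queries.
    (enriched.write
     .mode("overwrite")
     .partitionBy("order_date")
     .parquet("s3://example-bucket/curated/orders_enriched/"))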

Must Have (Required Skills):

  • 2 - 3 years of experience in designing and developing Python programs for data curation and processing
  • Knowledge of AWS storage, compute, and serverless services, particularly S3, Lambda, Kinesis, SQS, and Glue (a small boto3 sketch follows this list)
  • Expertise in at least two of these database technologies: relational, MPP, and distributed databases, hosted in the cloud or on-premises
  • 4 - 6 years of overall experience in IT delivery or large-scale IT analytics projects, in industry or professional services
  • Experience connecting and integrating with at least one of the following platforms: Google Cloud, Microsoft Azure, Amazon AWS
  • Experience with Amazon Redshift, including data modelling, data ingestion, integration, processing, and provisioning
  • Experience implementing data pipelines that automate the ingestion, transformation, and augmentation of data sources, and applying best practices for data pipeline operations
  • Able to work in a rapidly changing business environment and adapt quickly to its pace
  • Advanced SQL-writing skills and experience with data exploration and using databases with complex datasets in a business environment
  • Strong verbal and written communication skills, and the ability to work effectively across internal and external organizations
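
As a rough illustration of the S3 and SQS services named above, here is a hedged boto3 sketch that lists newly landed S3 objects and queues one message per file for downstream processing. The bucket, prefix, and queue URL are invented placeholders:

    # Sketch: enqueue newly landed S3 files onto SQS for downstream processing.
    import json
    import boto3

    s3 = boto3.client("s3")
    sqs = boto3.client("sqs")

    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/ingest-queue"  # hypothetical

    def queue_landed_files(bucket: str, prefix: str) -> int:
        """Send one SQS message per object under the given S3 prefix."""
        queued = 0
        paginator = s3.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
            for obj in page.get("Contents", []):
                sqs.send_message(
                    QueueUrl=QUEUE_URL,
                    MessageBody=json.dumps({"bucket": bucket, "key": obj["Key"]}),
                )
                queued += 1
        return queued

    queue_landed_files("example-bucket", "landing/orders/")  # hypothetical names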

Good to Have (Preferred Skills):

  • Experience programming in any of the following: Java, PySpark, or Spark
  • Exposure to Apache Airflow (a minimal DAG sketch follows this list)
  • Exposure to any open-source or commercial ETL tools such as Talend, Informatica, DataStage, etc.
  • Familiarity with data quality and standardization, including reference data management
  • Experience with catalog, lineage, and metadata management
  • Exposure to DevOps and CI/CD tools, services, and methodologies
  • Experience deploying logging and monitoring across the different integration points for critical alerts
  • Experience with different database computing paradigms such as in-memory, distributed, and massively parallel processing
  • Experience delivering data and analytics projects on any of the cloud platforms (AWS / Azure / GCP)
  • Experience delivering projects in a highly collaborative model with onsite and offshore teams
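
For the Airflow exposure mentioned above, a minimal DAG sketch (assuming Airflow 2.4+) showing job-level dependencies chained into an end-to-end flow; the DAG id, schedule, and task callables are hypothetical:

    # Sketch: three tasks chained by dependency into an end-to-end flow.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():    # placeholder: read from the source system
        pass

    def transform():  # placeholder: profile, standardize, enrich
        pass

    def load():       # placeholder: write to the target data store
        pass

    with DAG(
        dag_id="example_ingestion_pipeline",  # hypothetical name
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        t1 = PythonOperator(task_id="extract", python_callable=extract)
        t2 = PythonOperator(task_id="transform", python_callable=transform)
        t3 = PythonOperator(task_id="load", python_callable=load)

        t1 >> t2 >> t3  # job-level dependencies define the end-to-end order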

Qualification:
B.E / B.Tech
Experience Required:

2 to 5 Years

Vacancy:

2 - 4 Hires
