
Data Engineer at Ally

About the job

Job Description

Our client is looking for a Data Engineer whose primary role will be designing, building, and maintaining scalable data pipelines and infrastructure to support data-intensive applications and analytics solutions. The role involves close collaboration with data scientists, analysts, and software engineers to ensure efficient data processing, storage, and retrieval for business insights and decision-making.

Responsibilities

  • Data Pipeline Development: Design, implement, and maintain scalable data pipelines using tools such as Databricks, Python, and PySpark (a brief illustrative sketch follows this list).
  • Data Modeling: Design and optimize data models and schemas for efficient storage, retrieval, and analysis.
  • ETL Processes: Develop and automate ETL workflows for diverse data sources.
  • Big Data Technologies: Utilize technologies such as Spark, Kafka, and Flink for distributed data processing and analytics.
  • Cloud Platforms: Deploy and manage data solutions on cloud platforms like AWS, Azure, or Google Cloud Platform (GCP).
  • Data Quality and Governance: Implement data quality checks and governance policies.
  • Monitoring, Optimization, and Troubleshooting: Monitor pipeline performance, optimize for scalability and reliability, and troubleshoot issues.
  • DevOps: Build and maintain CI/CD pipelines, commit code to version control, and deploy data solutions.
  • Collaboration: Work with cross-functional teams to understand requirements, define data architectures, and deliver data-driven solutions.
  • Documentation: Create and maintain technical documentation.
  • Best Practices: Continuously learn and apply best practices in data engineering and cloud computing.
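
To give a rough sense of the pipeline work described above, the sketch below is a minimal PySpark batch job that reads raw events, applies a simple transformation, and writes partitioned Parquet. It is an illustration only, assuming hypothetical storage paths and column names (event_id, event_ts) that are not part of the job posting.

    # Minimal illustrative PySpark batch pipeline: extract raw CSV events,
    # transform them into daily counts, and load the result as Parquet.
    # Paths, bucket names, and column names are hypothetical placeholders.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F


    def run_pipeline(input_path: str, output_path: str) -> None:
        spark = SparkSession.builder.appName("example-events-pipeline").getOrCreate()

        # Extract: load raw CSV events (schema inference kept simple for the sketch).
        raw = spark.read.option("header", True).csv(input_path)

        # Transform: drop incomplete rows and aggregate events by calendar date.
        cleaned = (
            raw.dropna(subset=["event_id", "event_ts"])
               .withColumn("event_date", F.to_date("event_ts"))
        )
        daily_counts = cleaned.groupBy("event_date").agg(
            F.count("event_id").alias("events")
        )

        # Load: write results partitioned by date for efficient downstream reads.
        daily_counts.write.mode("overwrite").partitionBy("event_date").parquet(output_path)

        spark.stop()


    if __name__ == "__main__":
        # Hypothetical bucket paths used purely for illustration.
        run_pipeline(
            "s3://example-bucket/raw/events/",
            "s3://example-bucket/curated/daily_counts/",
        )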

Qualifications

  • Programming Languages: Proficiency in Python, Java, Scala, or SQL.
  • Data Modeling: Strong understanding of data modeling concepts and techniques.
  • Big Data Technologies: Experience with Databricks, Spark, Kafka, and Flink.
  • Modern Data Architectures: Experience with lakehouse architectures.
  • CI/CD and Version Control: Experience with CI/CD pipelines and Git.
  • ETL Tools: Knowledge of tools like Apache Airflow, Informatica, or Talend (see the sketch after this list).
  • Data Governance: Knowledge of data governance principles and best practices.
  • Cloud Platforms: Familiarity with AWS, Azure, or GCP services.
  • SQL and Real-Time Data Streaming: Proficiency in SQL and real-time data streaming technologies like Apache Spark Streaming and Kafka.
  • Cloud Data Tools: Experience using data tools on at least one cloud platform (AWS, Azure, or GCP), such as S3, EMR, Redshift, Glue, Azure Data Factory, Databricks, BigQuery, Dataflow, or Dataproc.
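
For the ETL Tools qualification above, the sketch below shows roughly what a minimal Apache Airflow DAG for a daily extract-transform-load workflow can look like (assuming Airflow 2.x). The DAG id, task names, and placeholder callables are hypothetical examples, not requirements from the posting.

    # Minimal illustrative Airflow 2.x DAG wiring extract -> transform -> load.
    # The callables below are empty placeholders standing in for real ETL logic.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def extract():
        # Placeholder: pull raw records from a source system.
        pass


    def transform():
        # Placeholder: clean and reshape the extracted records.
        pass


    def load():
        # Placeholder: write the transformed records to a warehouse table.
        pass


    with DAG(
        dag_id="example_daily_etl",       # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        transform_task = PythonOperator(task_id="transform", python_callable=transform)
        load_task = PythonOperator(task_id="load", python_callable=load)

        # Run the three steps in sequence.
        extract_task >> transform_task >> load_task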

Remuneration: USD 50,000–70,000 per annum

APPLY HERE
