ENGINEERING
Data Engineer at Ally
Job Description
Our client is looking for a Data Engineer whose primary role is to design, build, and maintain scalable data pipelines and infrastructure that support data-intensive applications and analytics solutions. The role involves close collaboration with data scientists, analysts, and software engineers to ensure efficient data processing, storage, and retrieval for business insights and decision-making.
Responsibilities
- Data Pipeline Development: Design, implement, and maintain scalable data pipelines using tools such as Databricks, Python, and PySpark (see the illustrative sketch after this list).
- Data Modeling: Design and optimize data models and schemas for efficient storage, retrieval, and analysis.
- ETL Processes: Develop and automate ETL workflows for diverse data sources.
- Big Data Technologies: Utilize technologies such as Spark, Kafka, and Flink for distributed data processing and analytics.
- Cloud Platforms: Deploy and manage data solutions on cloud platforms like AWS, Azure, or Google Cloud Platform (GCP).
- Data Quality and Governance: Implement data quality checks and governance policies.
- Monitoring, Optimization, and Troubleshooting: Monitor pipeline performance, optimize for scalability and reliability, and troubleshoot issues.
- DevOps: Build and maintain CI/CD pipelines, commit code to version control, and deploy data solutions.
- Collaboration: Work with cross-functional teams to understand requirements, define data architectures, and deliver data-driven solutions.
- Documentation: Create and maintain technical documentation.
- Best Practices: Continuously learn and apply best practices in data engineering and cloud computing.
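To give a concrete sense of the pipeline work described above, here is a minimal PySpark batch ETL sketch. It is illustrative only and not part of the posting: the bucket paths, the orders dataset, and its column names are hypothetical placeholders.

```python
# Illustrative only: a minimal PySpark batch ETL job (extract, transform, load).
# All paths, dataset names, and columns below are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-daily-etl").getOrCreate()

# Extract: read raw CSV files landed by an upstream source (placeholder path).
raw = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("s3://example-bucket/raw/orders/")
)

# Transform: deduplicate, filter invalid rows, and derive an order date.
clean = (
    raw.dropDuplicates(["order_id"])
       .filter(F.col("amount") > 0)
       .withColumn("order_date", F.to_date("order_ts"))
)

# Aggregate: daily revenue and distinct order counts for analytics.
daily_revenue = (
    clean.groupBy("order_date")
         .agg(F.sum("amount").alias("revenue"),
              F.countDistinct("order_id").alias("orders"))
)

# Load: write partitioned Parquet for downstream consumption (placeholder path).
(
    daily_revenue.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-bucket/curated/daily_revenue/")
)

spark.stop()
```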
Qualifications
- Programming Languages: Proficiency in Python, Java, Scala, or SQL.
- Data Modeling: Strong understanding of data modeling concepts and techniques.
- Big Data Technologies: Experience with Databricks, Spark, Kafka, and Flink.
- Modern Data Architectures: Experience with lakehouse architectures.
- CI/CD and Version Control: Experience with CI/CD pipelines and Git.
- ETL Tools: Knowledge of tools like Apache Airflow, Informatica, or Talend.
- Data Governance: Knowledge of data governance principles, policies, and best practices.
- Cloud Platforms: Familiarity with AWS, Azure, or GCP services.
- SQL and Real-Time Data Streaming: Proficiency in SQL and real-time data streaming technologies such as Apache Spark Streaming and Kafka (an illustrative streaming sketch follows this list).
- Cloud Data Tools: Experience using data tools in at least one cloud service – AWS, Azure, or GCP (e.g., S3, EMR, Redshift, Glue, Azure Data Factory, Databricks, BigQuery, Dataflow, Dataproc).
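As an illustration of the real-time streaming skills listed above, the sketch below shows a minimal Spark Structured Streaming job that reads events from a Kafka topic and appends them to Parquet. It is illustrative only: the broker address, topic name, event schema, and paths are hypothetical, and running it requires the spark-sql-kafka connector package on the classpath.

```python
# Illustrative only: a minimal Spark Structured Streaming read from Kafka.
# Broker address, topic, schema, and output/checkpoint paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("events-stream").getOrCreate()

# Schema of the JSON payload carried in the Kafka message value (assumed).
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_ts", TimestampType()),
    StructField("amount", DoubleType()),
])

# Source: subscribe to a Kafka topic and parse the JSON value column.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "orders-events")
    .load()
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# Sink: append parsed events to Parquet, with a checkpoint for fault tolerance.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "s3://example-bucket/stream/orders_events/")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/orders_events/")
    .outputMode("append")
    .start()
)

query.awaitTermination()
```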
Remuneration: 50k-70k USD per annum
APPLY HERE