
Cloudera Big Data Engineer

EPAM Big Data Practice is looking for Cloudera Big Data Engineers. As a Cloudera Big Data Engineer, you will be responsible for designing and implementing the management, monitoring, security, and privacy of data using the full stack of Cloudera Hadoop ecosystem services to satisfy business needs.

You are curious, persistent, logical and clever – a true techie at heart. You enjoy living by the code of your craft and developing elegant solutions for complex problems. If this sounds like you, keep reading to learn more about this exciting role!
Come and join EPAM, where engineering is in its DNA.

Responsibilities

  • Implement non-relational data stores:
    • Implement a solution that uses Hive, HBase, Impala, and HDFS
    • Implement data distribution and partitioning
    • Implement a consistency model in Hive/HBase
    • Provision a non-relational data store in HDFS
    • Provide access to data to meet security requirements
    • Implement for high availability, disaster recovery, and global distribution
  • Manage data security:
    • Implement data masking
    • Encrypt data at rest and in motion
  • Develop batch processing solutions (see the batch sketch after this list):
    • Develop batch processing solutions using Hive and Spark transformations
    • Ingest data using Sqoop
    • Create linked services and datasets
    • Create Oozie workflow pipelines and activities
    • Create and schedule jobs
    • Implement Cloudera Spark clusters, Jupyter notebooks, jobs, and autoscaling
    • Ingest data into Cloudera HDFS
  • Develop streaming solutions (see the streaming sketch after this list):
    • Configure input and output with Kafka
    • Select the appropriate windowing functions
    • Implement event processing using Spark Streaming/Kafka
    • Ingest and query streaming data using Spark
  • Monitor Cloudera services:
    • Monitor the Cloudera cluster and its services, such as Spark, Oozie workflows, HDFS, and Hive
  • Optimize Cloudera data solutions:
    • Troubleshoot data partitioning bottlenecks
    • Optimize HDFS storage
    • Optimize Spark Streaming analytics
    • Optimize Hive/Impala analytics
    • Manage the data lifecycle
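
To make the batch processing items above concrete, here is a minimal PySpark sketch of the kind of Hive/Spark transformation described: read a raw Hive table, aggregate it, and write the result back as a Hive table partitioned in HDFS. The database, table, and column names are hypothetical placeholders, not part of the role description.

    # Minimal PySpark batch sketch (hypothetical names throughout).
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("orders-daily-batch")
        .enableHiveSupport()               # use the cluster's Hive metastore
        .getOrCreate()
    )

    # Raw table, assumed to have been ingested earlier (e.g. via Sqoop).
    orders = spark.table("raw_db.orders")

    daily = (
        orders
        .groupBy("order_date", "country")
        .agg(F.sum("amount").alias("total_amount"),
             F.count("*").alias("order_count"))
    )

    # Partitioned output keeps the HDFS layout query-friendly for Hive/Impala.
    (daily.write
          .mode("overwrite")
          .partitionBy("order_date")
          .saveAsTable("curated_db.orders_daily"))

In practice a job like this would typically be packaged and scheduled through an Oozie workflow or another scheduler, per the "Create and schedule jobs" item above.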
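
For the streaming items, a comparable minimal sketch using Spark Structured Streaming with Kafka input, a watermarked window, and console output is shown below. The broker address, topic name, and event schema are assumptions made only for illustration, and the job would need the Spark-Kafka connector package available on the cluster.

    # Minimal Spark Structured Streaming sketch (hypothetical broker, topic, schema).
    # Assumes the spark-sql-kafka connector is available on the cluster.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import (StructType, StructField, StringType,
                                   DoubleType, TimestampType)

    spark = SparkSession.builder.appName("telemetry-streaming").getOrCreate()

    schema = StructType([
        StructField("event_time", TimestampType()),
        StructField("device_id", StringType()),
        StructField("value", DoubleType()),
    ])

    raw = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker-1:9092")   # placeholder broker
        .option("subscribe", "telemetry")                      # placeholder topic
        .load()
    )

    # Kafka delivers raw bytes; parse the JSON payload into typed columns.
    events = (
        raw.select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
           .select("e.*")
    )

    # 5-minute tumbling window per device, tolerating 10 minutes of late data.
    windowed = (
        events
        .withWatermark("event_time", "10 minutes")
        .groupBy(F.window("event_time", "5 minutes"), "device_id")
        .agg(F.avg("value").alias("avg_value"))
    )

    query = (
        windowed.writeStream
        .outputMode("update")
        .format("console")
        .start()
    )
    query.awaitTermination()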

Requirements

  • 5-10+ years of experience in IT
  • 2-3+ years of experience with Cloudera and the Hadoop ecosystem
  • Experience on projects managed with Agile or PMI methodologies
  • Experience in enterprise applications, solutions and data infrastructures
  • Experience in designing data management solutions
  • Experience in designing robust CI/CD solutions
  • Python/PySpark (highly desired)
  • Java/Scala
  • Apache Hadoop HDFS, MapReduce
  • Oozie
  • Hive
  • Cloudera Impala
  • Spark
  • Kafka
  • Spark Streaming
  • YARN
  • Sqoop
  • Git, GitLab, Artifactory (a plus)
  • ADO
  • Scrum Developer (a plus)
  • Big Data Hadoop certification (a plus)
           

What They Offer

They offer a range of discretionary benefits from time to time, including:

  • Group personal pension plan, life assurance, and income protection
  • Private medical insurance, private dental care, and critical illness cover
  • Cycle scheme and season ticket loan
  • Employee assistance program
  • Gym discount, Friday lunch, on-site massage, and social events
  • 1 day off for your wedding, and a baby basket
  • Tech purchase scheme
  • Unlimited access to LinkedIn learning solutions

Some of these benefits may be available only after you have passed your probationary period.

