Job Details

  • Title: Sr Data Engineer
  • Code: RCI-16055530
  • Location: Beaverton, OR 97006
  • Posted Date: 06/22/2022
  • Duration: 11 Months

  Job Description

Standard 40-hour, 8 am – 5 pm work week

 

Description:

  • Here are some of the tasks you could do day to day:
  • Design, implement, and support distributed data processing pipelines using Spark, Hive, Python, and other tools and languages prevalent in the Hadoop ecosystem.
  • You will be given the opportunity to own the design and implementation.
  • You will collaborate with product managers, data scientists, and engineers to accomplish your tasks.
  • Publish RESTful APIs to enable real-time data consumption using OpenAPI specifications.
  • This will enable many teams to consume the data being produced.
  • Explore and build proofs of concept using open-source NoSQL technologies such as HBase, DynamoDB, and Cassandra, and distributed stream processing frameworks like Apache Spark, Flink, and Kafka Streams.
  • Take part in DevOps by building utilities, user-defined functions, and frameworks to better enable data flow patterns.
  • Work with architecture and engineering leads and other teammates to ensure high-quality solutions through code reviews and documentation of engineering best practices.

Responsibilities

  • The Data Engineer will collaborate with product owners, developers, database architects, data analysts, visual developers, and data scientists on data initiatives, and will ensure optimal data delivery and consistent architecture throughout ongoing projects.
  • Must be self-directed and comfortable supporting the data needs of the product roadmap.
  • The right candidate will be excited by the prospect of optimizing and building integrated and aggregated data objects to architect and support our next generation of products and data initiatives.
  • Create and maintain optimal data pipeline architecture.
  • Assemble large, complex data sets that meet functional / non-functional business requirements.
  • Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, and redesigning for greater scalability.
  • Provide comprehensive documentation and knowledge transfer to Production Support.
  • Work with Production Support to analyze and fix production issues.
  • Participate in an Agile/Scrum methodology to deliver high-quality software releases every two weeks through sprints.
  • Refine and plan stories, and deliver them on time.
  • Analyze requirements documents and source-to-target mappings.

Must Have Skills

  • 5+ years’ experience in big data stack environments such as AWS EMR, Cloudera, and Hortonworks
  • 3+ years of experience with Spark in batch mode
  • 3+ years of experience in scripting using Python
  • 3+ years of experience working in an AWS cloud environment
  • In-depth knowledge of Hive and S3
  • Strong understanding of Hadoop, MPP systems and data structures
  • Strong understanding of solution and technical design
  • Experience building cloud scalable high-performance data lake solutions
  • Experience with relational SQL and tools like Snowflake
  • Awareness of data warehouse concepts
  • Experience with performance tuning on large datasets
  • Experience with source control tools such as GitHub and related dev processes
  • Experience with workflow scheduling tools like Airflow
  • Strong problem solving and analytical mindset
  • Able to influence and communicate effectively, both verbally and in writing, with team members and business stakeholders

 

Top Skills:

  • Hadoop
  • Python
  • Spark