- Title: Sr Data Engineer
- Code: RCI-15896728
- Location: Beaverton, OR 97006
- Posted Date: 06/22/2022
- Duration: 12 Months
Talk to our Recruiter
- Name: Prachi Saxena
- Email: email@example.com
- Phone: 908-704-8843
Standard 40-hour work week, 8 am – 5 pm
- Here are some tasks that you could do day to day.
- Design, implement and support distributed data processing pipelines using Spark, Hive, Python, and other tools and languages prevalent in the Hadoop ecosystem.
- You will be given the opportunity to own the design and implementation.
- You will collaborate with product managers, data scientists, and engineers to accomplish your tasks.
- Publish RESTful APIs to enable real-time data consumption using OpenAPI specifications.
- This will enable many teams to consume the data that is being produced.
- Explore and build proofs of concept using open source NoSQL technologies such as HBase, DynamoDB, and Cassandra, and distributed stream processing frameworks like Apache Spark, Flink, and Kafka Streams.
- Take part in DevOps by building utilities, user-defined functions, and frameworks to better enable data flow patterns.
- Work with architecture/engineering leads and other teammates to ensure high-quality solutions through code reviews and documentation of engineering best practices.
- The Data Engineer will collaborate with product owners, developers, database architects, data analysts, visual developers, and data scientists on data initiatives and will ensure that optimal data delivery and architecture are consistent across ongoing projects.
- Must be self-directed and comfortable supporting the data needs of the product roadmap.
- The right candidate will be excited by the prospect of optimizing and building integrated and aggregated data objects to architect and support our next generation of products and data initiatives.
- Create and maintain optimal data pipeline architecture.
- Assemble large, complex data sets that meet functional / non-functional business requirements.
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, and re-designing for greater scalability.
- Provide comprehensive documentation and knowledge transfer to Production Support.
- Work with Production Support to analyze and fix Production issues
- Participate in an Agile/Scrum methodology to deliver high-quality software releases every 2 weeks through sprints.
- Refine and plan stories, and deliver them on time.
- Analyze requirements documents and source-to-target mappings.
Must Have Skills
- 5+ years’ experience in Big Data stack environments like AWS EMR, Cloudera, and Hortonworks
- 3+ years of experience with Spark in batch mode
- 3+ years of experience in scripting using Python
- 3+ years of experience working in an AWS cloud environment
- In-depth knowledge of Hive and S3
- Strong understanding of Hadoop, MPP systems and data structures
- Strong understanding of solution and technical design
- Experience building scalable, high-performance cloud data lake solutions
- Experience with relational SQL and tools like Snowflake
- Familiarity with data warehouse concepts
- Performance tuning with large datasets
- Experience with source control tools such as GitHub and related dev processes
- Experience with workflow scheduling tools like Airflow
- Strong problem-solving skills and an analytical mindset
- Able to influence and communicate effectively, both verbally and in writing, with team members and business stakeholders