Sr. Data Engineer, Auto-Tech
ABOUT THE COMPANY:
The Company is the world’s leading roadside assistance platform. We expand mobility and transportation options for consumers, automotive, logistics, and technology companies. Analysts project global travel miles to increase by one third by 2030 as new services emerge, which means more things breaking. We help fix those things by offering a seamless, end-to-end digital platform, viewable by every stakeholder in real time.
The Company is proud that:
● We rank #6 out of 500 in Deloitte’s 2019 Technology Fast 500 fastest-growing tech-forward companies in North America
● We rank #10 on the Financial Times’ 2020 list of fastest-growing companies
● We rank #221 on the 2020 Inc. 500 list of fastest-growing private companies in the U.S.
● We are backed by BMW, Porsche, Jaguar Land Rover and some of the world’s other biggest mobility companies. We work with some of the coolest brands on the planet!
YOUR MISSION: Your mission is to manage and optimize our cloud data platform. You will work on a variety of data projects, which includes orchestrating our data pipelines using modern big data tools and re-engineering our existing transaction-processing systems, all while meeting our security requirements.
YOUR LEGACY: Your legacy will include ensuring that every team member across our growing organization is able to access data from any source within minutes and that none of our data is ever lost. The result? You helped the Company become the world’s leading mobility assistance company.
WHAT YOU’LL BE RESPONSIBLE FOR:
1) First 3 months:
● Understand our platform development environment and philosophy.
● Understand our cloud platform and applications’ infrastructure.
● Understand our engineering teams’ work culture.
2) First 6 months:
● Integrate a number of third-party data sources.
● Use cloud-agnostic tools to connect our internal systems, external systems, and third-party APIs.
● Develop data platform services.
● Build monitoring infrastructure and services to give visibility into each pipeline’s status.
● Interface with different teams to make data available for reporting and analytics.
3) Ongoing...
● Continue to optimize our data platform.
● Gather data requirements from other teams and implement solutions.
● Champion the flow of data across all of our systems and ensure consistency and integrity between them.
● Work with structured and unstructured data at scale from a variety of data stores (key-value, document, columnar, etc.) as well as traditional RDBMSs.
● Constantly monitor and support our complete data ecosystem.
● Maintain the data platform security and integrity.
● Operate and manage our services in production with high availability, disaster recovery, and CI/CD in mind.
WHO YOU ARE:
The ideal candidate for this role must have a Bachelor’s degree in computer science or applied mathematics and 8+ years of software development experience at startups or companies working with big data technologies.
Technical Skills:
○ Strong programming skills. Must be proficient in one of the following languages: Python, Scala, or Java.
○ Must have working knowledge of PySpark, pandas DataFrames, Spark SQL, etc.
○ Working knowledge of messaging and data pipeline tools such as Apache Kafka or Amazon Kinesis.
○ Must have experience developing APIs using frameworks like Flask/Django etc.
○ Experience with stream-processing systems: Apache Spark Streaming, Apache Storm, the AWS Kinesis suite of products, etc.
○ Experience with open table and file/in-memory formats for huge analytics datasets: Iceberg, Parquet, Arrow, Avro, etc.
○ Experience writing and understanding complex SQL queries.
○ Experience with data orchestration tools such as Airflow and log mining tools such as Splunk.
○ Experience using AWS tools such as Database Migration Service, S3, RDS, and Redshift.
Industry Experience:
○ You have developed at least one data pipeline that involved collecting/streaming, storing, and processing (ETL) data for various business use cases.
○ You have been an integral part of a team working with structured, semi-structured, and unstructured large data sets from real-time and batch streaming data feeds.
● Problem Solving: You are known for solving problems before they escalate. You are the kind of person who is constantly upgrading your skill set and always looking for ways to enhance the data platforms you work on.
● Team Member: You pride yourself on working collaboratively with all of your teammates. You are transparent in your communication and proactively share what others need to be aware of.
NICE TO HAVES:
● Experience with AWS cloud services: EMR, Glue, Athena.
● Have worked with data pipeline and governance tools such as Azkaban, Luigi, etc.
● Experience working with NoSQL databases like Apache Solr, DynamoDB, or MongoDB.
● Knowledge of HDFS, Flume, Hive, and MapReduce.
● Experience with data warehouse tools such as Snowflake.
THE NITTY GRITTY:
● Location = Great news! You have the option of working from anywhere in the U.S.! Successful candidates must be located outside of California, and all work for this position must be conducted outside of that state as well.
● Manager = You’ll report to the Director of Data Engineering
● Compensation = Commensurate with experience for a company of our size
● Benefits = We have awesome benefits! We cover 100% of the cost of your dental and vision plans, and we also provide short-term and long-term disability and life insurance to you free of charge! We have two medical plans: a base plan with a Health Savings Account add-on and a PPO option (you do have to pay for these). You’ll have 12 holidays off and unlimited paid time off. We match 100% of the first 3% you contribute to our 401(k) and 50% of the next 2% you contribute. So, if you contribute 5% of your paycheck, we’ll add a match equal to 4% of your pay. Free money!