As a Principal Data Architect, you will be the primary visionary for our global data strategy. You will tackle the "unsolved" problems of autonomous vehicle data: how to efficiently store, index, and query petabytes of high-dimensional, multi-modal sensor data.
You will lead the transition of our data infrastructure into a state-of-the-art Open Lakehouse architecture, leveraging Apache Iceberg and the Hadoop ecosystem to create a deterministic, high-performance environment for ML research and safety-critical validation.
This role requires you to be based in our Serbia office for two years, with the option to relocate to our US office afterward.
Architectural Innovation: Lead the R&D and design of a next-generation data lakehouse that supports the unique requirements of ADAS/AV, including 4D spatial-temporal querying and multi-modal data fusion.
Deep Optimization: Go beyond standard Apache Iceberg implementations to develop custom partitioning schemes, Z-order sort strategies, and hidden partitioning tailored to LiDAR, radar, and video metadata.
Theoretical Leadership: Apply advanced research in distributed systems to solve challenges in data consistency, deterministic "replay" of vehicle logs, and massive-scale data lineage.
Strategic Storage R&D: Develop novel algorithms for data deduplication and "intelligent tiering," ensuring that rare "edge-case" driving data is preserved while optimizing the cost-to-performance ratio of the petabyte-scale lake.
Cross-Functional Research: Partner with ML Research and Simulation teams to ensure the data architecture supports emerging paradigms like Foundation Models and End-to-End Autonomous Driving architectures.
Technical Mentorship: Act as a high-level consultant and mentor to the broader Data Engineering organization, fostering an environment of analytical rigor and engineering excellence.
Education: PhD in Computer Science, Distributed Systems, Database Systems, or a related quantitative field.
Specialized Experience: 5+ years of experience in data systems, with a significant track record of designing large-scale distributed architectures.
Iceberg & Hadoop Internals: Deep, "under-the-hood" knowledge of Apache Iceberg (specification and implementation) and the Hadoop ecosystem (HDFS, Spark, Trino/Presto).
Research & Publication: Evidence of contributions to the field, such as publications in top-tier conferences (e.g., SIGMOD, VLDB, ICDE, OSDI) or a history of significant contributions to major open-source data projects.
Computational Foundations: Expert-level understanding of query optimization, file format internals (Parquet/Avro), and the trade-offs of distributed consensus protocols.
Automotive Safety Standards: Understanding of data integrity requirements under ISO 26262 or SOTIF (ISO 21448, Safety of the Intended Functionality).
Geospatial Mastery: Experience with H3, S2, or other spatial indexing systems for high-frequency GPS and trajectory data.
Cloud Economics: Proven ability to manage the cost architecture and financial efficiency of massive cloud deployments (AWS/Azure/GCP).