Summary
Microsoft AI is looking for a talented Member of Technical Staff – Data Research Engineer at its Redmond office. This role sits at the intersection of data and innovation: you will collaborate with scientists, engineers, and annotators to curate, analyze, and evaluate the diverse multimodal data sources critical to model development. You will lead efforts to develop novel data collection strategies, improve dataset quality and integrity, understand data-driven model behaviors, and align datasets with ethical and societal values.
About the Role
As a Data Research Engineer, you will create high-quality datasets for training and evaluation and run experiments on new datasets (data ablations) to assess their impact and determine which data is most effective. You will also develop and maintain scalable data pipelines for multimodal ingestion, preprocessing, filtering, and annotation. Additionally, you will analyze real-world multimodal datasets to assess their quality, diversity, and relevance and to identify areas for improvement. You will build lightweight tools and workflows for dataset auditing, visualization, and versioning. You will collaborate with Safety, Ethics, and Governance teams to ensure datasets meet standards for quality, privacy, and responsible AI practices.
Accountabilities
- Create high-quality datasets for training and evaluation
- Run experiments on new datasets (data ablations) to assess their impact and determine the most effective data
- Develop and maintain scalable data pipelines for multimodal ingestion, preprocessing, filtering, and annotation
- Analyze real-world multimodal datasets to assess their quality, diversity, and relevance and to identify areas for improvement
- Build lightweight tools and workflows for dataset auditing, visualization, and versioning
The candidate we're looking for
Experience:
- 4+ years of technical engineering experience coding in languages including, but not limited to, Python, with common data libraries (Pandas, NumPy, etc.)
Technical skills:
- Proficiency in statistics and exploratory data analysis methods
- Familiarity with data processing frameworks such as Spark, Ray, or Apache Beam
Personal attributes:
- Ability to communicate technical findings clearly to research and product teams
Benefits
- Competitive salary
- Comprehensive benefits package
- Opportunities for professional growth and development
- Collaborative and dynamic work environment