AI Agent Data Pipeline Intern

23 days ago

Santa Clara, CA, USA · Intern

Responsibilities

Build pipelines to ingest and organize experiment-related data from team communications and documents.
Use LLM-based methods to clean noisy unstructured data and extract relevant information.
Design data schemas and quality checks that make the data easier to search and trace.
Support retrieval workflows, including semantic search and RAG-style pipelines.
Prepare curated datasets for agent evaluation and LLM fine-tuning.
Collaborate with MLEs and platform engineers to understand experiment workflows.
Evaluate how effectively the agent uses curated data to generate insights.
Contribute to internal tools and dashboards for monitoring experiment status.

Requirements

Strong skills in Python, SQL, and data processing.
Experience with structured and unstructured data, including text-heavy sources.
Familiarity with data pipelines, ETL workflows, or large-scale data processing.
Interest in LLM development, evaluation, and agentic AI systems.
Familiarity with machine learning workflows and evaluation metrics.
Strong analytical thinking and attention to data quality.
Comfort working with ambiguous data sources and collaborating with engineers.
Previous experience building internal tools or data quality checks.

Benefits

A fun, supportive, and engaging environment.
Infrastructure and computational resources to support your work.
Opportunity to work on cutting-edge technologies with top talent.
Chance to make a significant impact on the transportation revolution.
Competitive compensation package.
Snacks, lunches, dinners, and fun activities.

Categories

AI & ML, Data Engineering, Data Science