23 days ago
Santa Clara, CA, USA
Intern
Responsibilities
- Build pipelines to ingest and organize experiment-related data from team communications and documents.
- Use LLM-based methods to clean noisy unstructured data and extract relevant information.
- Design data schemas and quality checks for easier search and traceability.
- Support retrieval workflows, including semantic search and RAG-style pipelines.
- Prepare curated datasets for agent evaluation and LLM fine-tuning.
- Collaborate with MLEs and platform engineers to understand experiment workflows.
- Evaluate how well the agent uses curated data to generate insights.
- Contribute to internal tools and dashboards for monitoring experiment status.
Requirements
- Strong skills in Python, SQL, and data processing.
- Experience with structured and unstructured data, including text-heavy sources.
- Familiarity with data pipelines, ETL workflows, or large-scale data processing.
- Interest in LLM development, evaluation, and agentic AI systems.
- Familiarity with machine learning workflows and evaluation metrics.
- Strong analytical thinking and attention to data quality.
- Comfort working with ambiguous data sources and collaborating with engineers.
- Previous experience building internal tools or data quality checks.
Benefits
- A fun, supportive, and engaging environment.
- Infrastructure and computational resources to support your work.
- Opportunity to work on cutting-edge technologies with top talent.
- Chance to make a significant impact on the transportation revolution.
- Competitive compensation package.
- Snacks, lunches, dinners, and fun activities.