XPENG

AI Agent Data Pipeline Intern

Posted 23 days ago
Santa Clara, CA, USA
Intern

Responsibilities

  • Build pipelines to ingest and organize experiment-related data from team communications and documents.
  • Use LLM-based methods to clean noisy unstructured data and extract relevant information.
  • Design data schemas and quality checks for easier search and traceability.
  • Support retrieval workflows, including semantic search and RAG-style pipelines.
  • Prepare curated datasets for agent evaluation and LLM fine-tuning.
  • Collaborate with MLEs and platform engineers to understand experiment workflows.
  • Evaluate how well the agent uses curated data to generate insights.
  • Contribute to internal tools and dashboards for monitoring experiment status.

Requirements

  • Strong skills in Python, SQL, and data processing.
  • Experience with structured and unstructured data, including text-heavy sources.
  • Familiarity with data pipelines, ETL workflows, or large-scale data processing.
  • Interest in LLM development, evaluation, and agentic AI systems.
  • Familiarity with machine learning workflows and evaluation metrics.
  • Strong analytical thinking and attention to data quality.
  • Comfort working with ambiguous data sources and collaborating with engineers.
  • Previous experience building internal tools or data quality checks.

Benefits

  • A fun, supportive, and engaging environment.
  • Infrastructure and computational resources to support your work.
  • Opportunity to work on cutting-edge technologies with top talent.
  • Chance to make a significant impact on the transportation revolution.
  • Competitive compensation package.
  • Snacks, lunches, dinners, and fun activities.

Categories

AI & ML, Data Engineering, Data Science