Software Architect, Agent Evaluation & Core Framework Job at Datagrid AI, Santa Clara, CA

  • Datagrid AI
  • Santa Clara, CA

Job Description

Job Title:

Software Architect, Agent Evaluation & Core Framework

Location:

Remote First

SF Bay Area preferred

About Datagrid

Datagrid is the AI Agent that gets work done for you.

Instead of just answering questions, Datagrid’s agents take action—automating entire workflows across your tools, files, and systems. Whether it’s searching through documents to find answers, cross-referencing data to uncover gaps, or running a financial analysis that updates your Excel file—Datagrid does the work, so you don’t have to.

You get your time back. You 10x your output. The AI runs the playbook.

Behind the scenes, Datagrid connects to over 100 platforms and 2,000+ APIs, including Excel, Google Docs, SharePoint, Slack, PDFs, and websites. It handles multimodal problems with ease, from unstructured data such as images and documents to entire databases, and communicates through channels like Teams, Slack, or SMS.

It’s built for trust and precision: agents cite their sources and operate safely in real time. Enterprise teams get full control with teamspaces, RBAC, and usage reports. You can customize everything: launch fast on your own, or partner with our expert team.

From research to reporting, from digging through files to delivering results, Datagrid doesn’t just assist. It executes.

We’re looking for passionate individuals to join us at the frontier of AI innovation.

About the role:

Datagrid Agents operate where our customers work: across Teams, Slack, and even SMS. Agents make multistep plans, leverage vectorized data from 100+ sources, use tools like Docusign, and manipulate the Datagrid app.

The Software Architect, Agent Evaluation & Core Framework role is crucial because we cannot manually test the vast array of agent interactions and capabilities. You will own and drive the extension of our evaluation harness so that it provides actionable reports on agent regressions and improvements, directly impacting strategic direction and customer experience. A key part of this will be incorporating the best open-source benchmarks into our evaluation set and figuring out how to agentically generate evaluations that are representative of customer use cases. As you become established, you will also have the opportunity to make fundamental changes to the Core Framework to improve the way Agents reason, use tools, and collaborate with humans.

What you’ll do:

  • Work closely with an ex-Googler who built Gemini evals to create a harness for evaluating Agent performance, make that harness available both for local development and in CI/CD pipelines, and set up alerting for when Agents misbehave.
  • Influence and contribute to the extension of Datagrid’s Agentic capabilities.
  • Choose the best open- or closed-source components to build out the testing infrastructure.
  • Integrate publicly available benchmarks such as RAGBench into the testing system.
  • Grant subject matter experts the ability to add to the test library using customer queries, manually authored cases, and synthetically generated questions.
  • Expose evaluation performance via alerts and dashboards.

What you’ll have:
  • Proven track record of building test harnesses for Chat Agents from 0 ⇒ 1.
  • 10+ years of B2B software engineering experience.
  • Ability to write effective LLM prompts without assistance.
  • Proficiency with Node.js and server-side frameworks such as NestJS or Next.js.
  • Familiarity with JavaScript frameworks such as React and AngularJS.
  • Experience with databases such as Weaviate and BigQuery.
  • Experience working with GCP or similar cloud providers.

Nice to Haves
  • Experience with any LLM evaluation platform (Galileo, Arize, LangSmith, Orq)
  • Background in B2B SaaS automation tools
  • Contributions to open-source AI projects or published research
  • Familiarity with prompt engineering or model evaluation

Pay Range and Benefits

$200,000 – $240,000 USD per year, depending on experience and qualifications.

At Datagrid we set pay ranges using market data, internal benchmarks, and the scope of responsibilities. Final compensation within this range will be determined based on relevant experience, skills, and geographic location.

In addition to base salary, this role may be eligible for:

  • Equity in the company
  • Home office set-up reimbursement
  • Health, dental, and vision benefits
  • Flexible PTO and remote work options

Equal Opportunity Employer

Datagrid is an equal opportunity employer and is committed to building a diverse and inclusive team. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law. We encourage candidates from all backgrounds to apply.

