OpenAI Tool Calling · FastAPI · Streamlit

Research Intelligence,
Fully Automated

LabFlow AI automates six repetitive research tasks through an agentic LLM system, extracting structured insights, comparing studies, and generating reports without manual effort.

6 Workflows
+38% Quality Gain
3 Teams
REST API
Workflows

Six Automated Research Pipelines

Each workflow is an OpenAI tool. The agent selects and chains them automatically based on your task — no manual routing.

01 /
log_summarizer
Summarization

Transforms raw research notes into structured summaries — objectives, methods, results, and next steps extracted in a single LLM call.

02 /
findings_extractor
Extraction

Pulls key findings, hypotheses, and conclusions with evidence strength scores from unstructured research logs.

03 /
domain_classifier
Classification

Classifies research into scientific domain, subdomain, and keyword tags with a calibrated confidence score.

04 /
log_comparator
Comparison

Compares two research logs side-by-side — surfacing shared themes, unique elements, and contradictions between studies.

05 /
report_generator
Generation

Compiles multiple research summaries into an executive briefing with background, key findings, and actionable recommendations.

06 /
knowledge_searcher
Retrieval

Semantic search across your research corpus — returns ranked excerpts with relevance scores for any concept or query.
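Each of these workflows is exposed to the model as a function-calling tool definition. A minimal sketch of what the log_summarizer entry might look like, following the OpenAI tools format (the description text and parameter schema here are assumptions, not LabFlow's actual schema):

```python
# Hypothetical tool definition for log_summarizer in the OpenAI
# function-calling format; the real schema may carry more fields.
LOG_SUMMARIZER_TOOL = {
    "type": "function",
    "function": {
        "name": "log_summarizer",
        "description": (
            "Transform raw research notes into a structured summary "
            "with objectives, methods, results, and next steps."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "log_text": {
                    "type": "string",
                    "description": "Raw research log text to summarize.",
                }
            },
            "required": ["log_text"],
        },
    },
}
```

Passing all six such definitions in the `tools` parameter of a chat-completion request is what lets the model route tasks without any manual dispatch logic.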

Architecture

Agentic Loop, End to End

A clean four-step pipeline — the model decides which tools to call, applies the optimized prompt variant, and returns typed JSON.

STEP 01
📥

Ingest Log

Upload research notes via the FastAPI endpoint or Streamlit dashboard. Persisted to the PostgreSQL-compatible database.

STEP 02
🤖

Agent Selects Tools

AgentCore sends the task to OpenAI with all 6 tools. The model decides which to call — zero manual routing logic.

STEP 03
🧪

Prompt Variant Applied

PromptManager selects Variant A or B deterministically per session, enabling live A/B testing across all workflows.

STEP 04
📊

Structured Output

Results are quality-scored, stored, and returned as typed JSON — available in the analytics dashboard or via REST.
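The tool-dispatch half of Step 02 can be sketched as follows. This is a hypothetical simplification, not AgentCore's actual internals, and it assumes the model's tool calls have been serialized to plain dicts (e.g. via the OpenAI SDK's `model_dump()`):

```python
import json

def dispatch_tool_calls(tool_calls, registry):
    """Run each tool call the model requested against a local handler.

    `tool_calls` is a list of dicts in the OpenAI tool-call shape;
    `registry` maps workflow names to Python callables. Hypothetical
    sketch of the dispatch step only.
    """
    results = []
    for call in tool_calls:
        handler = registry[call["function"]["name"]]
        # Arguments arrive as a JSON string chosen by the model.
        args = json.loads(call["function"]["arguments"])
        # Tool results go back to the model as role="tool" messages.
        results.append({
            "tool_call_id": call["id"],
            "role": "tool",
            "content": json.dumps(handler(**args)),
        })
    return results
```

The returned messages are appended to the conversation and sent back to the model, which either chains another tool or emits the final structured answer.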

Demo

Live API Response

Run a single workflow in under a second. The agent handles tool selection, prompt optimization, and structured output automatically.

bash — labflow-ai
# Start the full stack
$ python run.py
API ready on localhost:8000
Dashboard ready on localhost:8501
6 workflows loaded · A/B enabled
 
# Call the agent endpoint
$ curl -X POST http://localhost:8000/api/v1/workflows/run \
   -H 'Content-Type: application/json' \
   -d '{"workflow_name":"log_summarizer",
     "payload":{"log_text":"..."}}'
 
# Agentic mode — model picks tools
$ curl -X POST http://localhost:8000/api/v1/workflows/agent \
   -H 'Content-Type: application/json' \
   -d '{"message":"Summarize and classify
      this research log: ..."}'
tools_called: [log_summarizer,
   domain_classifier] · 1240ms
POST /api/v1/workflows/run · log_summarizer 200 OK
{
  "run_id": 42,
  "workflow_name": "log_summarizer",
  "variant": "B",
  "quality_score": 0.91,
  "latency_ms": 843,
  "result": {
    "title": "Enzyme Kinetics Study",
    "objectives": [
      "Measure catalytic efficiency",
      "Compare thermal stability"
    ],
    "methods": ["ONPG assay", "SDS-PAGE"],
    "results": [
      "Peak activity at 37°C",
      "Vmax = 142 µmol/min/mg"
    ],
    "next_steps": ["Test B. subtilis variants"]
  }
}
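The stack validates responses with Pydantic v2; the shape of the payload above can be sketched with field names taken straight from the sample response (plain dataclasses used here only to keep the example dependency-free):

```python
from dataclasses import dataclass

@dataclass
class SummaryResult:
    """Inner `result` object of a log_summarizer run."""
    title: str
    objectives: list[str]
    methods: list[str]
    results: list[str]
    next_steps: list[str]

@dataclass
class WorkflowRun:
    """Top-level envelope returned by /api/v1/workflows/run."""
    run_id: int
    workflow_name: str
    variant: str
    quality_score: float
    latency_ms: int
    result: SummaryResult
```

In the real stack the equivalent Pydantic models give runtime validation for free: a malformed LLM response fails to parse instead of silently propagating.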
Prompt Engineering

A/B Testing Framework

Every workflow runs two prompt variants simultaneously. Systematic iteration delivered a 38% quality improvement across all pipelines.

Variant A Baseline Prompt
Variant B Chain-of-Thought Prompt Winner
Prompt Template
Summarize the following research log.
Return a JSON object with keys:
title, objectives, methods, results,
next_steps.

Log: {log_text}
Quality Score
0.58
Completeness
52%
Structure
61%
Prompt Template
You are a research analyst. Carefully
read the log below and extract a
structured summary.

Think step by step:
1. Identify research objectives.
2. Identify the methodology used.
3. Extract key results.

Return ONLY valid JSON with
exact keys: title, objectives...
Quality Score
0.80
Completeness
88%
Structure
91%
+38%

Average quality improvement

Chain-of-thought prompts with output constraints consistently outperform direct prompts across all 6 workflows.
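The deterministic per-session assignment described in the stack section (MD5-hashed session IDs) might look like this minimal sketch; the exact bucketing rule is an assumption:

```python
import hashlib

def assign_variant(session_id: str) -> str:
    """Deterministically map a session to prompt variant A or B.

    Hypothetical sketch of an MD5-based scheme: the same session
    always sees the same variant, and sessions split roughly 50/50,
    giving consistent UX and unbiased sampling.
    """
    digest = hashlib.md5(session_id.encode("utf-8")).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"
```

Because the hash is a pure function of the session ID, no assignment table needs to be stored: any process can recompute a session's variant.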

Tech Stack

Built to Production Standards

Every layer chosen for reliability and extensibility — switch from SQLite to PostgreSQL with one env var change.

🐍

Python 3.12

Type-annotated codebase with Pydantic v2 for runtime validation throughout.

FastAPI

Async REST API with OpenAPI docs auto-generated at /docs.

🤖

OpenAI Tool Calling

Native function calling API for agentic multi-tool orchestration — no LangChain abstractions.

📊

Streamlit

Multi-page analytics dashboard with live charts, A/B results, and an agent chat interface.

🗄️

PostgreSQL

SQLAlchemy ORM with full PostgreSQL support. SQLite used for zero-config local development.

🔧

SQLAlchemy 2.0

Declarative ORM models with typed CRUD layer. Database created automatically on first run.

🧪

A/B Framework

Deterministic variant assignment per session via MD5 hash — consistent UX, unbiased sampling.

🚀

Single Entry Point

python run.py starts the full stack. No Docker, no process manager required.
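The one-env-var database switch mentioned above can be sketched like this; `DATABASE_URL` and the default filename are assumptions, since the page does not name the actual variable:

```python
import os

def database_url() -> str:
    """Resolve the SQLAlchemy connection URL from the environment.

    Hypothetical sketch: defaults to zero-config SQLite, and any
    PostgreSQL DSN exported as DATABASE_URL takes over unchanged.
    """
    return os.environ.get("DATABASE_URL", "sqlite:///labflow.db")
```

Feeding this into `sqlalchemy.create_engine(database_url())` means the rest of the ORM layer never needs to know which backend is active.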

Explore the Full Source

Clone, set your API key, and run the complete stack in under 3 minutes.

▶ Open Live Demo View on GitHub