OpenAI Tool Calling · FastAPI · Streamlit

Research Intelligence,
Fully Automated

LabFlow AI automates six repetitive research tasks through an agentic LLM system, extracting structured insights, comparing studies, and generating reports without manual effort.

6 Workflows
+38% Quality Gain
3 Teams
REST API
Workflows

Six Automated Research Pipelines

Each workflow is an OpenAI tool. The agent selects and chains them automatically based on your task — no manual routing.

01 /
log_summarizer
Summarization

Transforms raw research notes into structured summaries — objectives, methods, results, and next steps extracted in a single LLM call.

02 /
findings_extractor
Extraction

Pulls key findings, hypotheses, and conclusions with evidence strength scores from unstructured research logs.

03 /
domain_classifier
Classification

Classifies research into scientific domain, subdomain, and keyword tags with a calibrated confidence score.

04 /
log_comparator
Comparison

Compares two research logs side-by-side — surfacing shared themes, unique elements, and contradictions between studies.

05 /
report_generator
Generation

Compiles multiple research summaries into an executive briefing with background, key findings, and actionable recommendations.

06 /
knowledge_searcher
Retrieval

Semantic search across your research corpus — returns ranked excerpts with relevance scores for any concept or query.
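Each of these workflows is exposed to the model as a function-calling tool definition. A minimal sketch of what the log_summarizer entry might look like, following the OpenAI tools format (the description text and parameter schema here are assumptions, not LabFlow's actual schema):

```python
# Hypothetical tool definition for log_summarizer in the OpenAI
# function-calling format; the real schema may carry more fields.
LOG_SUMMARIZER_TOOL = {
    "type": "function",
    "function": {
        "name": "log_summarizer",
        "description": (
            "Transform raw research notes into a structured summary "
            "with objectives, methods, results, and next steps."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "log_text": {
                    "type": "string",
                    "description": "Raw research log text to summarize.",
                }
            },
            "required": ["log_text"],
        },
    },
}
```

Passing all six such definitions in the `tools` parameter of a chat-completion request is what lets the model route tasks without any manual dispatch logic.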

Architecture

Agentic Loop, End to End

A clean four-step pipeline — the model decides which tools to call, applies the optimized prompt variant, and returns typed JSON.

STEP 01
📥

Ingest Log

Upload research notes via the FastAPI endpoint or Streamlit dashboard. Persisted to the PostgreSQL-compatible database.

STEP 02
🤖

Agent Selects Tools

AgentCore sends the task to OpenAI with all 6 tools. The model decides which to call — zero manual routing logic.

STEP 03
🧪

Prompt Variant Applied

PromptManager selects Variant A or B deterministically per session, enabling live A/B testing across all workflows.

STEP 04
📊

Structured Output

Results are quality-scored, stored, and returned as typed JSON — available in the analytics dashboard or via REST.
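The tool-dispatch half of Step 02 can be sketched as follows. This is a hypothetical simplification, not AgentCore's actual internals, and it assumes the model's tool calls have been serialized to plain dicts (e.g. via the OpenAI SDK's `model_dump()`):

```python
import json

def dispatch_tool_calls(tool_calls, registry):
    """Run each tool call the model requested against a local handler.

    `tool_calls` is a list of dicts in the OpenAI tool-call shape;
    `registry` maps workflow names to Python callables. Hypothetical
    sketch of the dispatch step only.
    """
    results = []
    for call in tool_calls:
        handler = registry[call["function"]["name"]]
        # Arguments arrive as a JSON string chosen by the model.
        args = json.loads(call["function"]["arguments"])
        # Tool results go back to the model as role="tool" messages.
        results.append({
            "tool_call_id": call["id"],
            "role": "tool",
            "content": json.dumps(handler(**args)),
        })
    return results
```

The returned messages are appended to the conversation and sent back to the model, which either chains another tool or emits the final structured answer.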

Demo

Live API Response

Run a single workflow in under a second. The agent handles tool selection, prompt optimization, and structured output automatically.

bash — labflow-ai
# Start the full stack
$ python run.py
API ready on localhost:8000
Dashboard ready on localhost:8501
6 workflows loaded · A/B enabled
 
# Call the agent endpoint
$ curl -X POST http://localhost:8000/api/v1/workflows/run \
   -H 'Content-Type: application/json' \
   -d '{"workflow_name":"log_summarizer",
     "payload":{"log_text":"..."}}'
 
# Agentic mode — model picks tools
$ curl -X POST http://localhost:8000/api/v1/workflows/agent \
   -H 'Content-Type: application/json' \
   -d '{"message":"Summarize and classify
      this research log: ..."}'
tools_called: [log_summarizer,
   domain_classifier] · 1240ms
POST /api/v1/workflows/run · log_summarizer 200 OK
{
  "run_id": 42,
  "workflow_name": "log_summarizer",
  "variant": "B",
  "quality_score": 0.91,
  "latency_ms": 843,
  "result": {
    "title": "Enzyme Kinetics Study",
    "objectives": [
      "Measure catalytic efficiency",
      "Compare thermal stability"
    ],
    "methods": ["ONPG assay", "SDS-PAGE"],
    "results": [
      "Peak activity at 37°C",
      "Vmax = 142 µmol/min/mg"
    ],
    "next_steps": ["Test B. subtilis variants"]
  }
}
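The stack validates responses with Pydantic v2; the shape of the payload above can be sketched with field names taken straight from the sample response (plain dataclasses used here only to keep the example dependency-free):

```python
from dataclasses import dataclass

@dataclass
class SummaryResult:
    """Inner `result` object of a log_summarizer run."""
    title: str
    objectives: list[str]
    methods: list[str]
    results: list[str]
    next_steps: list[str]

@dataclass
class WorkflowRun:
    """Top-level envelope returned by /api/v1/workflows/run."""
    run_id: int
    workflow_name: str
    variant: str
    quality_score: float
    latency_ms: int
    result: SummaryResult
```

In the real stack the equivalent Pydantic models give runtime validation for free: a malformed LLM response fails to parse instead of silently propagating.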
Prompt Engineering

A/B Testing Framework

Every workflow runs two prompt variants simultaneously. Systematic iteration delivered a 38% quality improvement across all pipelines.

Variant A Baseline Prompt
Variant B Chain-of-Thought Prompt Winner
Prompt Template
Summarize the following research log.
Return a JSON object with keys:
title, objectives, methods, results,
next_steps.

Log: {log_text}
Quality Score
0.58
Completeness
52%
Structure
61%
Prompt Template
You are a research analyst. Carefully
read the log below and extract a
structured summary.

Think step by step:
1. Identify research objectives.
2. Identify the methodology used.
3. Extract key results.

Return ONLY valid JSON with
exact keys: title, objectives...
Quality Score
0.80
Completeness
88%
Structure
91%
+38%

Average quality improvement

Chain-of-thought prompts with output constraints consistently outperform direct prompts across all 6 workflows.
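The deterministic per-session assignment described in the stack section (MD5-hashed session IDs) might look like this minimal sketch; the exact bucketing rule is an assumption:

```python
import hashlib

def assign_variant(session_id: str) -> str:
    """Deterministically map a session to prompt variant A or B.

    Hypothetical sketch of an MD5-based scheme: the same session
    always sees the same variant, and sessions split roughly 50/50,
    giving consistent UX and unbiased sampling.
    """
    digest = hashlib.md5(session_id.encode("utf-8")).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"
```

Because the hash is a pure function of the session ID, no assignment table needs to be stored: any process can recompute a session's variant.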

Tech Stack

Built to Production Standards

Every layer chosen for reliability and extensibility — switch from SQLite to PostgreSQL with one env var change.

🐍

Python 3.12

Type-annotated codebase with Pydantic v2 for runtime validation throughout.

FastAPI

Async REST API with OpenAPI docs auto-generated at /docs.

🤖

OpenAI Tool Calling

Native function calling API for agentic multi-tool orchestration — no LangChain abstractions.

📊

Streamlit

Multi-page analytics dashboard with live charts, A/B results, and an agent chat interface.

🗄️

PostgreSQL

SQLAlchemy ORM with full PostgreSQL support. SQLite used for zero-config local development.

🔧

SQLAlchemy 2.0

Declarative ORM models with typed CRUD layer. Database created automatically on first run.

🧪

A/B Framework

Deterministic variant assignment per session via MD5 hash — consistent UX, unbiased sampling.

🚀

Single Entry Point

python run.py starts the full stack. No Docker, no process manager required.
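The one-env-var database switch mentioned above can be sketched like this; `DATABASE_URL` and the default filename are assumptions, since the page does not name the actual variable:

```python
import os

def database_url() -> str:
    """Resolve the SQLAlchemy connection URL from the environment.

    Hypothetical sketch: defaults to zero-config SQLite, and any
    PostgreSQL DSN exported as DATABASE_URL takes over unchanged.
    """
    return os.environ.get("DATABASE_URL", "sqlite:///labflow.db")
```

Feeding this into `sqlalchemy.create_engine(database_url())` means the rest of the ORM layer never needs to know which backend is active.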

Explore the Full Source

Clone, set your API key, and run the complete stack in under 3 minutes.

▶ Open Live Demo View on GitHub