Designing the workflow behind my Azure AI POC: a controlled internal engineering assistant using Azure AI Search, RAG, Azure AI Services, guardrails, observability, and lightweight evaluation.

Series: Azure AI-103 infrastructure POC with Bicep, Azure DevOps, and Microsoft Foundry

In the previous article, I focused on the platform foundation: Bicep modules, Azure DevOps pipelines, tokenised parameter files, validation, What-If, environment flow, regional constraints, and the first Azure AI infrastructure baseline. This follow-up deliberately avoids walking through the same infrastructure deployment again. The next question is different:

Now that the platform foundation exists, what useful Azure AI workload could actually run on top of it?

The answer I wanted to use for this POC is an internal Azure engineering assistant. Not a public chatbot, not an autonomous operator, and not a broad enterprise search tool. Just a controlled assistant that can eventually answer questions from approved architecture, DevOps, platform, runbook, and AI study documentation.

That gives the POC a business wrapper. Instead of deploying Azure AI services for the sake of deploying them, the infrastructure is now connected to a plausible internal use case: helping engineers find trusted platform guidance faster.

The Business Scenario

The business scenario is intentionally narrow: build a controlled internal Azure engineering assistant that can answer questions from approved platform documentation using Azure AI Services, Azure AI Search, and a lightweight runtime layer.

In a real organisation, this could help engineers find approved platform standards, understand deployment patterns, retrieve runbook guidance, and get faster answers to common Azure engineering questions. It could also support onboarding for new cloud engineers, because a lot of platform knowledge is usually spread across architecture documents, pipeline examples, wiki pages, and old implementation notes.

This is still a POC, but the business wrapper matters. It changes the question from Can I deploy Azure AI resources? to Can I build a repeatable, observable, and secure enough foundation for a useful AI workload?

POC Boundary

I do not want this POC to become an uncontrolled enterprise knowledge assistant. The first version should stay deliberately small. The source content should come from approved markdown files, AI-103 notes, platform notes, and synthetic runbooks. Customer data, production secrets, personal information, HR content, legal documents, finance documents, and uncontrolled SharePoint or file share crawling should stay out of scope.

The same applies to runtime behaviour. The assistant should answer questions from indexed documents. It should not approve access, change infrastructure, deploy resources, create tickets, or act as an autonomous operator. Those actions would require a much stronger control model, including identity boundaries, approval workflows, audit trails, and risk review.

For security, the first version should prefer managed identity where possible, keep sensitive configuration out of source code, and use Key Vault for secrets or sensitive configuration references where required. For evaluation, a small repeatable test set is enough to start. A full enterprise AI evaluation platform can come later if the workload proves useful.

This boundary keeps the design realistic. Most AI POCs do not fail because the first demo cannot be made to work. They fail because scope, data, security, and success criteria are left vague for too long.

Workflow at a Glance

The workflow is simple: approved documents are prepared, indexed, retrieved at question time, passed to the model as focused context, and returned as a grounded answer with references.

flowchart TD
    A[Approved documentation] --> B[Prepare and clean content]
    B --> C[Chunk and enrich with metadata]
    C --> D[Index in Azure AI Search]
    D --> E[Engineer asks a question]
    E --> F[Runtime retrieves relevant chunks]
    F --> G[Model generates grounded answer]
    G --> H[Answer returned with source references]
    H --> I[Telemetry and evaluation]
    I --> J[Improve documents, prompts or retrieval]
    J --> D

The important design choice is that the model should not receive everything. It should receive only the context that is relevant to the question. This keeps the prompt smaller, reduces noise, and makes the behaviour easier to reason about.

Logical Component Responsibilities

The platform components from the previous article are still relevant, but this article focuses on behaviour rather than deployment topology. The runtime layer is responsible for request validation, retrieval, prompt construction, model calls, response formatting, and telemetry. Azure AI Search provides controlled retrieval. Azure AI Services provides the model endpoint. Application Insights and Log Analytics provide operational visibility.

Key Vault remains part of the platform pattern, but I would not treat it as the main focus of the assistant workflow. Its job is simple: keep secrets and sensitive configuration out of source code and application settings where possible. For the deployed runtime, managed identity and Key Vault references are a better direction than hardcoded keys.

Document Preparation Workflow

For this POC, I would start with documents that are easy to control: markdown files, short runbooks, architecture notes, and AI study notes. That avoids spending the first iteration on OCR, complex document parsing, and permission inheritance problems.

The initial document set should be small enough to inspect manually but realistic enough to test the retrieval pattern. Good first candidates are:

  • architecture decision notes and platform standards;
  • Bicep module usage notes and pipeline deployment guides;
  • AI-103 study notes, model deployment notes, and synthetic runbooks.

The document preparation process should be boring and repeatable. First, confirm that the document is approved for the POC. Then remove obsolete or duplicated content, check for secrets or personal data, normalise the content into a simple format, split it into useful chunks, add metadata, index it in Azure AI Search, and run a few retrieval tests before connecting the model.

The metadata matters. A chunk should not just contain text. It should carry enough information for the assistant to explain where the answer came from. Useful metadata could include the document name, source path, section heading, topic, environment, last updated date, and document owner where that information exists.

This makes source references possible and gives the runtime a better way to filter retrieval results later.

Request Flow

At request time, the assistant should behave more like a controlled engineering tool than a generic chatbot.

sequenceDiagram
    participant User as Engineer
    participant App as Runtime layer
    participant Search as Azure AI Search
    participant Model as Azure AI model
    participant Logs as App Insights / Log Analytics

    User->>App: Ask engineering question
    App->>App: Validate input and apply guardrails
    App->>Search: Search approved index
    Search-->>App: Return relevant chunks and metadata
    App->>App: Build compact grounded prompt
    App->>Model: Send question and retrieved context
    Model-->>App: Return generated answer
    App->>Logs: Record telemetry and dependency calls
    App-->>User: Return answer with references

The assistant should be able to say when it cannot answer from approved sources. That is more useful than a confident answer based on weak or missing context.

Context Engineering

A common mistake in early AI solutions is to put too much into the prompt. The system prompt becomes a dumping ground for policies, examples, rules, assumptions, and documentation extracts. That may work in a demo, but it becomes difficult to maintain and hard to evaluate.

For this POC, I would keep the system prompt short and retrieve task-specific context from Azure AI Search. The runtime should include only the most relevant passages, preserve metadata needed for references, avoid unnecessary chat history, and instruct the model to answer from the retrieved context when the question is document-specific.

This is one of the main reasons to use RAG. The search layer becomes the controlled way to bring knowledge into the model at the right time, instead of loading everything into the model upfront.

Tool Design: Start Narrow

I would not start this POC by giving the assistant many tools. That can come later if there is a real need. The first version should prove a smaller capability: accept a question, retrieve relevant approved content, generate a grounded answer, return source references, emit useful telemetry, and support basic evaluation.

I would explicitly avoid tools that create tickets, approve access, deploy infrastructure, change Azure resources, or update production systems. Those capabilities require stronger identity controls, approval workflows, audit trails, and risk review.

Guardrails

The first guardrails do not need to be complicated. They need to be clear. The assistant should check whether the question is within scope, search only approved content, return an answer only when relevant sources are found, and say when the answer is not available from the approved document set.

That last point is important. In an engineering assistant, I could not find this in the approved sources is often a better answer than a confident but unsupported explanation.

The assistant is not a decision-maker. It should support engineers by finding and explaining approved information. It should not approve designs, grant access, modify infrastructure, or replace human review.

Observability and Evaluation

For a normal web application, telemetry tells you whether the application is healthy. For an AI workload, that is still required, but it is not enough. You also need to understand whether retrieval is working and whether answers are acceptable.

flowchart LR
    A[Runtime request] --> B[Application telemetry]
    B --> C[Application Insights]
    C --> D[Log Analytics]

    E[Test questions] --> F[Evaluation script]
    F --> G[Assistant answers]
    G --> H[Compare with expected sources and facts]
    H --> I[Pass / Review / Fail]

For the first version, I would keep evaluation lightweight. A JSON file with test questions is enough to start. Each test question can include the expected source document and a few key facts the answer should contain.

[
  {
    "question": "What is the purpose of Azure AI Search in this POC?",
    "expected_source": "rag-design.md",
    "expected_answer_contains": ["retrieval", "approved documents", "grounded answer"]
  },
  {
    "question": "Can the assistant approve production access?",
    "expected_answer_contains": ["no", "human review", "out of scope"]
  }
]

This gives the POC a simple quality gate. After changing prompts, chunking, or retrieval settings, I can rerun the same questions and see whether behaviour improved or regressed.

How I Would Build the Next Iteration

I would build the next stage gradually. First, I would confirm that local code can authenticate and call the deployed model. Then I would create a small Azure AI Search index, upload a few approved documents, and query the index from local code. Once retrieval works, the next step is to combine the user question and retrieved chunks into a compact prompt and return an answer with source references.

Only after that would I deploy the runtime to Function App or App Service. At that point, managed identity, Key Vault references, Application Insights tracing, dependency tracking, and failure logging become more important. Finally, I would add a small evaluation set and run it after changes to prompts, chunking, retrieval settings, or model configuration.

This order is important. I do not want to harden a runtime that has not yet proven the basic model call and retrieval path. I also do not want to build complex orchestration before the assistant can answer simple grounded questions reliably.

What I Would Not Build Yet

A good POC is partly defined by what it does not try to solve too early. For this stage, I would not build a fully autonomous Azure operator, multi-agent orchestration, large-scale enterprise search, uncontrolled document crawling, production change actions, complex long-running task memory, active-active DR for the assistant, or full private networking before the application workflow is proven.

Those topics may become relevant later. Adding them too early would make it harder to see whether the core assistant pattern is actually useful.

Success Criteria

For this workflow-focused stage, success should be measurable. The assistant should answer from a small approved document set, include source references, say when an answer is not found, expose retrieval results during troubleshooting, log model calls and failures, avoid secrets in source code, and support a small repeatable evaluation set.

Before moving beyond this stage, I would want to see three things working reliably:

  • local code can call the model and query Azure AI Search;
  • answers are grounded in retrieved documents and include useful references;
  • telemetry and a small evaluation set can show whether changes improved or broke the assistant.

These are practical success criteria. They do not pretend the POC is production-ready, but they also avoid treating a nice demo response as enough evidence.

Final Thoughts

This article is the bridge between infrastructure and application behaviour. The previous stage proved that the platform foundation can be deployed in a structured way. This stage defines what the workload should actually do.

For me, the important lesson is that an Azure AI POC needs both sides. It needs the platform discipline: Bicep, pipelines, identity, configuration, monitoring and environment structure. But it also needs the AI workload discipline: scoped data, retrieval quality, context control, grounded answers, guardrails and evaluation.

The next useful milestone is not a more impressive demo. It is a small working workflow: approved documents in, relevant retrieval, grounded answer out, references included, telemetry captured, and evaluation repeatable.

That is the point where the POC starts to become something more useful than a collection of deployed Azure resources.