How to Verify Whether a Candidate Really Works with AI

“Proficient in AI tools.”

That phrase now appears in one out of every three CVs. The problem is that for some people it means a daily, deep workflow built around language models — for others, it means typing a question into ChatGPT once every two weeks.

For a recruiter or hiring manager, both profiles look identical on paper.

This article gives you concrete tools: behavioral questions, a live task, and portfolio evaluation tips. By the end, you’ll know how to distinguish an enthusiast from someone who has genuinely embedded AI into their work and is getting measurable results from it — all within a single interview.

Three Levels of Proficiency — What You’re Actually Evaluating

Before you start asking questions, establish who you’re looking for. A common recruitment mistake is writing job requirements that sound like “Level 3” while asking interview questions designed for Level 1. The result: you hire someone too weak, or you reject someone too strong because they speak a language you don’t understand.

Level 1 — Occasional user

Uses ready-made interfaces (ChatGPT, Copilot, Gemini) for specific, one-off tasks: write an email, summarize a document, translate a text. Doesn’t consciously build prompts, doesn’t iterate, doesn’t critically evaluate outputs. Sufficient for roles where AI is just an optional extra.

Level 2 — Advanced user

Has developed their own workflow: uses several tools, consciously selects the right model for each task, can write effective prompts, and knows when AI fails. Understands limitations and compensates for them. This is the profile needed for most roles that list “AI skills” as a requirement.

Level 3 — Integrator / Builder

Connects models to other systems (APIs, automations, agents, RAG), builds custom solutions, or manages AI projects for a team. Understands architecture, costs, and risks. Required for technical roles or where AI is meant to become part of the product.

Before moving on to questions, decide which level is the “pass threshold” for the role you’re filling.

Behavioral Questions That Reveal Real Habits

Classic STAR-style competency questions work better here than theoretical “what do you know about large language models” questions. The goal is concrete past situations — those are hard to fabricate without operational knowledge.

Opening questions (for every candidate)

“Describe a specific situation from the last three months where AI genuinely saved you time or improved the quality of your work. Which tool did you use? What exactly did you do, and roughly how much time did it save?”

A good answer includes the tool name, an approximate time saving, and a description of the workflow before and after. A weak answer is vague, with no numbers and no tool names.

“When did AI last give you a wrong or useless result? What did you do next?”

This is one of the most important questions. Someone who genuinely works with AI has dozens of such stories. Someone who doesn’t will improvise or say “AI always works well.” A good answer describes a specific model error and how it was verified or corrected.

“How do you verify AI outputs before using them in your work?”

You’re looking for a concrete process: checking sources, cross-referencing with another model, using domain knowledge as a filter. “I read it and make corrections” is Level 1. “I use the model’s search feature and compare with documentation, and for numerical data I always double-check manually” — that’s Level 2.

Questions for Level 2–3 candidates

“How do you choose a model or tool for a specific task? Give an example.”

A conscious user can explain why they chose Claude for document analysis and Perplexity for research on current data — or why they use GitHub Copilot for coding instead of a general chat model.

“Describe your most elaborate prompt. What was in it? Why did you structure it that way?”

Level 2 people spontaneously mention context, role, output format, and constraints. Level 1 people say: “I just write what I want to get.”

“What has changed in the way you work with AI over the past year?”

This reveals trajectory: is the candidate actively learning and experimenting, or standing still? An answer worth noting: “I started using X, then discovered Y does it better because…” A concerning answer: “Pretty much the same, really.”

A Live Trial Task — The Most Accurate Test

Questions reveal declarative knowledge. A live task reveals operational competence — how the candidate actually thinks and acts with a tool in hand.

Format: 15–20 minutes during the interview (or as a take-home task before the meeting). The candidate works in their own environment — their own tool, their own account. The goal isn’t a polished result; it’s observing the process.

What to watch for

  • Tool selection — Does the candidate know what they’re reaching for and why?
  • First prompt — Do they immediately build meaningful context, or type keywords and hope for the best?
  • Iteration — How do they respond to an insufficient result? Can they diagnose the problem and improve the prompt?
  • Output evaluation — Do they accept the result uncritically, or do they check and question it?
  • Speed — An advanced user is noticeably faster, because they have established patterns.
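The observations above can be recorded on a simple score sheet. The following is a minimal Python sketch, not part of the original method: the five criterion names mirror the list above, while the 0 to 2 scale and the function name score_live_task are assumptions made purely for illustration.

```python
# Criteria mirror the "What to watch for" list above.
CRITERIA = (
    "tool_selection",
    "first_prompt",
    "iteration",
    "output_evaluation",
    "speed",
)
# Hypothetical scale (an assumption): 0 = absent, 1 = partial, 2 = strong.
MAX_PER_CRITERION = 2


def score_live_task(scores: dict[str, int]) -> float:
    """Return the share of available points earned, from 0.0 to 1.0."""
    for name in CRITERIA:
        if name not in scores:
            raise ValueError(f"missing score for {name}")
        if not 0 <= scores[name] <= MAX_PER_CRITERION:
            raise ValueError(f"score for {name} must be 0..{MAX_PER_CRITERION}")
    total = sum(scores[name] for name in CRITERIA)
    return total / (MAX_PER_CRITERION * len(CRITERIA))


if __name__ == "__main__":
    observed = {
        "tool_selection": 2,
        "first_prompt": 1,
        "iteration": 1,
        "output_evaluation": 2,
        "speed": 0,
    }
    print(score_live_task(observed))
```

A single number like this is only a note-taking aid. The article's point stands that the process matters more than the result, so the written observations behind each score are worth keeping alongside it.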

Ready-made scenarios by role

Analyst / data: “You have this dataset (provide a CSV or screenshot). Extract three insights for management and suggest a chart. You can use any AI tool.”

Marketer / content: “We have a new product — here’s the brief (one paragraph). Write three headline variations for a landing page, each targeting a different segment. Justify your choices.”

Developer: “Here’s a code snippet with a bug (realistic, non-trivial). Use AI to diagnose and fix it. Think out loud as you go.”

Product manager: “You have survey results from users (ten responses). Extract the main themes and propose one feature for the roadmap. Justify your decision.”

After the task, always ask one follow-up question: “What would you do differently if you had more time?” The answer reveals the candidate’s level of self-awareness and their understanding of their output’s limitations.

Technical and Tool-Related Questions — For Advanced Roles

At Levels 2 and 3, it’s worth checking whether the candidate understands the ecosystem, not just one interface.

Knowledge of model limitations: Ask about hallucinations — a good answer describes a specific mechanism and mitigation strategies. Ask about context windows — does the candidate know what happens when a document is too long? This is operational knowledge acquired through use, not reading.

Ecosystem awareness: You don’t need to ask for technical detail, but it’s worth checking whether the candidate has heard of and understands concepts such as embedding models, RAG (Retrieval-Augmented Generation), agents, and automations with n8n or Make. For integrator roles, also check familiarity with the OpenAI or Anthropic API and with managing tokens and costs.

Tool knowledge per professional context: Instead of asking broadly “which AI tools do you use,” ask: “Which AI tool would you choose for X and why?” — where X is a specific, day-to-day problem relevant to the role. An answer with no justification, or one that says “because everyone uses it,” is a warning sign.

Signals from Portfolios, Code, and Work Samples

Start verifying before the interview.

GitHub and public projects: Look for repositories with AI integrations — system prompts, API-calling scripts, model notebooks. Important: check commit dates and code comments. An active project with an iteration history is more credible than a single repository pushed three days before the application.

Portfolio and case studies: A strong candidate describes not just the outcome but the process — which tool, what problem, what limitations they encountered. If the description reads like a corporate blog paragraph with no specifics, treat it as a yellow flag.

Posts and articles: Substantive public content (LinkedIn, Substack, blog) shows the candidate thinks about AI beyond working hours. This is not required, but if it exists, check whether the content goes deeper than “top 10 prompts you need to know.”

Verification questions during the interview: After reviewing the portfolio, ask: “In this project you wrote that you used AI for X. What did your prompt look like? What didn’t work the first time?” A genuine author will answer without hesitation. Someone who described someone else’s experience will start to stumble.
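The commit-date check suggested above for GitHub repositories can be made a little more systematic. Below is a rough Python sketch that assumes you have already collected the commit dates by hand or with a tool of your choice. The function name history_summary and the three reported figures are hypothetical, chosen only to illustrate the difference between a project with history and one pushed shortly before applying.

```python
from datetime import date


def history_summary(commit_dates: list[date], applied_on: date) -> dict[str, int]:
    """Summarize how long and how often a repository was worked on.

    All three figures are illustrative assumptions, not an established metric.
    """
    if not commit_dates:
        return {"active_days": 0, "span_days": 0, "days_before_application": 0}
    first, last = min(commit_dates), max(commit_dates)
    return {
        # Number of distinct calendar days with at least one commit.
        "active_days": len(set(commit_dates)),
        # Days between the first and the last commit.
        "span_days": (last - first).days,
        # How long before the application the project was started.
        "days_before_application": (applied_on - first).days,
    }


if __name__ == "__main__":
    commits = [date(2024, 1, 10), date(2024, 1, 10), date(2024, 3, 1)]
    print(history_summary(commits, applied_on=date(2024, 3, 15)))
```

A short span that starts a few days before the application matches the less credible pattern described above, but it is a prompt for a follow-up question rather than a verdict.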

Red Flags That Should Raise Alarm

The following signals don’t necessarily disqualify a candidate outright, but they should trigger follow-up probing.

Answers with no specifics. “I use AI for various things at work” with no example after follow-up — that’s Level 1 presenting itself as higher.

“AI always works well.” Anyone who intensively uses models has a catalogue of failures and frustrations. The absence of such stories is a signal that usage is superficial.

Confusing consumer products with developer APIs. A candidate for a role requiring AI integration who doesn’t know the difference between ChatGPT and the OpenAI API is at Level 1, regardless of what’s on their CV.

Buzzword-dropping without substance. “I work with AI agents and RAG” — but when pressed, can’t explain what that means in practice. A sign the candidate has read about trends without applying them.

No reflection on limitations and risks. Mature AI usage involves an awareness of where models fail. Someone who speaks exclusively in superlatives probably hasn’t run into those failures often enough.

When you hit a red flag, don’t end the conversation immediately. Ask one specific follow-up: “Can you give an example?” or “What does that look like in practice?” The answer will either confirm the problem or clear up a misunderstanding.

Summary: A Three-Step Verification Protocol

Instead of relying on intuition or CV declarations, apply a repeatable framework:

Step 1 — Before the interview: Review the portfolio, GitHub, and any public content. Prepare two follow-up questions about specific things you found.

Step 2 — During the interview: Three behavioral questions (a specific situation from recent months, an AI failure and how it was handled, the output verification process). One live task or a detailed walkthrough of a portfolio project.

Step 3 — Evaluation: Do the answers include specific tools, dates, numbers, and descriptions of errors? Does the candidate talk about limitations? Did the live task reveal real habits, or just familiarity with an interface?
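The Step 3 questions can be kept as a small yes/no record per candidate. This is a minimal Python sketch under stated assumptions: the class InterviewNotes, its field names, and the helper unmet_signals are hypothetical, derived from the evaluation questions above rather than from any existing tool.

```python
from dataclasses import dataclass


@dataclass
class InterviewNotes:
    """Yes/no answers to the Step 3 evaluation questions (field names are assumptions)."""
    named_specific_tools: bool
    gave_dates_or_numbers: bool
    described_ai_errors: bool
    discussed_limitations: bool
    live_task_showed_habits: bool


def unmet_signals(notes: InterviewNotes) -> list[str]:
    """List the evaluation questions answered 'no', in field order."""
    return [name for name, value in vars(notes).items() if not value]


if __name__ == "__main__":
    notes = InterviewNotes(
        named_specific_tools=True,
        gave_dates_or_numbers=True,
        described_ai_errors=False,
        discussed_limitations=True,
        live_task_showed_habits=True,
    )
    print(unmet_signals(notes))
```

Listing the unmet signals, rather than computing a pass or fail, keeps the decision with the interviewer and shows where a follow-up question is still needed.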

Someone who genuinely works with AI shares one common trait: they talk about it like a tool that sometimes fails, demands attention, and requires constant adjustment — not like a magic wand. That’s a good compass for any interview.


Piotr Pawłowski
