
TOEFL Speaking Interview 2026: 4 Question Types, 45-Second Response Framework, and Sample Band 6 Answers

May 2, 2026

14 min read

By Daniel Whitaker

The Take an Interview task is the second half of the new 2026 TOEFL Speaking section, and it is the single highest-anxiety moment in the entire test. A pre-recorded researcher asks you four questions in sequence. For each one, recording starts the instant the question audio ends. There is no preparation time, no second take, and no way to extend the 45-second response window. For most candidates, the gap between a band 4 and a band 6 on Speaking is decided in these four 45-second windows. This guide is the playbook for handling them: what the task actually looks like, the four question types in order, the 45-second response framework that hits the rubric, three annotated band 6 sample answers, and the no-prep recovery moves that turn a panicked silence into a clean opening line.

1. What the Take an Interview task actually looks like

The Speaking section in the redesigned 2026 TOEFL has two halves. The first half is Listen and Repeat, where you echo seven sentences after a beep. The second half is Take an Interview, also called the simulated interview or the interview task. ETS lists it as Speaking Questions 8-11.

The setup on screen is deliberate. A small video window plays a pre-recorded researcher who introduces herself and explains that you have volunteered for a research study about everyday life. She asks four questions, one at a time, with the recording starting automatically the instant her question audio ends. Between questions, the video loops a small nodding animation to simulate engagement, but there is no real interaction. The interviewer cannot hear you, react to a stumble, or rephrase a question. You are speaking to a video and an AI scorer.

The framing matters because it shapes the topics. Because the researcher is studying common everyday experiences, the questions are accessible and personal: how you commute, what you read, how you make decisions, how you spend free time, what you think about a daily-life policy. You will not be asked anything that requires academic knowledge. The challenge is producing 45 seconds of well-organized, fluent speech under cold conditions, not finding things to say.

2. The mechanics: 0 prep, 45 seconds, 4 questions, 1 video

The five constraints below define the task. Internalize them before you do any content prep; most lost points trace back to misjudging one of them.

Constraint	Value	What it means in practice
Preparation time	0 seconds	Recording starts the instant the question audio ends. Your opener has to be a stalling phrase, not silence.
Recording length	45 seconds	Hard cap. Aim for 40-44 seconds of content. Stopping under 30 seconds reads as undeveloped to SpeechRater.
Number of questions	4	All from one interview. The whole block is about 5 minutes including the video framing.
Audio plays	1	The question audio plays once. There is no replay button. If you mishear, you have to recover from the words you caught.
Question text on screen	Yes, brief	A short written prompt appears beneath the video while you respond. Use it to anchor the topic if your memory blanks.

Two of these constraints reward training more than the others. The zero-prep window is the one most candidates underprepare for, because most older TOEFL prep materials still teach 15-second prep rituals that no longer apply. The 45-second cap is the one most candidates overshoot on Question 1 (because the question feels easy) and undershoot on Question 4 (because they froze). Both patterns are correctable in three weeks of targeted practice — see section 9.
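For self-timed practice, the length thresholds in the table can be turned into a quick check. The sketch below is only an illustration: the function name and the labels for the 30-40 second and 44-45 second ranges are assumptions made here, since the table only names the 40-44 second target, the 45-second cap, and the under-30 "undeveloped" zone.

```python
HARD_CAP_S = 45.0              # recording stops here (from the table above)
TARGET_RANGE_S = (40.0, 44.0)  # content target (from the table above)
UNDEVELOPED_BELOW_S = 30.0     # under 30 seconds reads as undeveloped


def classify_duration(spoken_seconds: float) -> str:
    """Label a practice response by how much of the 45-second window it used."""
    if spoken_seconds < 0:
        raise ValueError("duration cannot be negative")
    if spoken_seconds > HARD_CAP_S:
        return "cut off"  # anything past the cap is never recorded
    if spoken_seconds < UNDEVELOPED_BELOW_S:
        return "undeveloped"
    low, high = TARGET_RANGE_S
    if spoken_seconds < low:
        return "short of target"  # assumed label for the 30-40 second range
    if spoken_seconds <= high:
        return "on target"
    return "at the cap"  # assumed label for the 44-45 second range


for seconds in (25, 35, 42, 44.5, 50):
    print(seconds, classify_duration(seconds))
```

Logging this label next to each recording makes the Question 1 overshoot and Question 4 undershoot patterns easy to spot across a week of practice.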

3. The four question types (in the order they always appear)

The four interview questions are not random. They follow a fixed difficulty progression that ETS uses to widen the score band across candidates. Knowing the progression lets you pre-plan the opener and frame for each slot, which buys 5-8 seconds of cognitive headroom on test day.

Q1 · Personal recall · Memory or past experience

Pattern: "Describe a time when..." / "Tell me about a memorable..." / "Think of a recent occasion when..."

What it tests: Whether you can quickly retrieve a concrete memory and tell it as a small story with a beginning, middle, and end.

Frame: Specific moment → what you did → how it ended → why it stuck. Easiest of the four if you keep it concrete; hardest if you abstract too early.

Q2 · Emotional reaction · Feelings or preferences

Pattern: "How do you feel about..." / "Do you prefer X or Y, and why?" / "What is your reaction when..."

What it tests: Whether you can take a clear feeling-position and back it with a personal example, not generic claims.

Frame: State the feeling clearly → one personal example → one extension or nuance. Avoid hedging openers like "It depends" — they cost you 5 seconds and signal weak commitment.

Q3 · Opinion with support · Take and justify a position

Pattern: "Some people think... others think... what is your view?" / "Do you agree that..." / "What is the best way to..."

What it tests: Whether you can structure a brief argument with a position, a reason, and an example, and land a one-sentence conclusion before time expires. This question type takes the place of the standalone "express an opinion" task from older TOEFL formats.

Frame: Position → primary reason → supporting example → one-line wrap. Pick a side fast; the AI does not reward balanced fence-sitting under 45 seconds.

Q4 · Policy or prediction · Speculation or broader evaluation

Pattern: "What might happen if..." / "Should governments..." / "How will X change in the next ten years?" / "What would be the consequences of..."

What it tests: Whether you can reason abstractly under no-prep pressure. The hardest of the four, both because it is most distant from personal experience and because it comes when your cognitive reserves are lowest.

Frame: Acknowledge the trade-off → commit to one prediction → supporting reason → brief implication. Treat it as Q3 with a future tense — same structure, different verb tense.

The progression matters for pacing your effort across the block. Q1 should feel comfortable; if it does not, it usually means you over-thought it. Q4 will feel hardest; if it does not, it usually means you under-developed it. Most candidates who average band 5.0 on this task get band 6 on Q1-Q2 and band 4 on Q3-Q4. The fastest way to lift the section average is targeted Q3-Q4 practice — see the practice plan in section 9.

4. How AI scoring works on this task

Speaking is scored by a combination of SpeechRater (ETS's automated speech scoring engine) and human raters. SpeechRater evaluates each of your four interview responses against four rubric dimensions on a 0-5 scale. The four scores are averaged per response, your four response scores are aggregated, and the aggregate gets combined with your seven Listen and Repeat scores to produce your final Speaking band on the new 1-6 scale.

Rubric dimension	What gets scored	Quickest way to lose points
Delivery	Pronunciation clarity, intonation, intelligibility	Mumbling, monotone, swallowed word endings
Language Use	Grammar accuracy, vocabulary range	Tense errors, repeated basic words like "thing", "good", "very"
Task Completion	Did you actually answer the question? Are ideas developed?	Stopping at 25 seconds; never landing on the actual prompt
Fluency	Pace, rhythm, smoothness without long pauses	Pauses longer than 2 seconds; restart-and-repeat loops

The two dimensions that tank the fastest are Task Completion and Fluency. Grammar and vocabulary errors usually cost half a band each on a single response; an undeveloped answer or a long unfilled pause costs a full band. That is why the framework in the next section starts with a stalling phrase and ends with a one-line wrap — both are anti-fluency-loss devices, not just structural scaffolding.

One non-obvious rule from the rubric: SpeechRater rewards filler words that sound natural ("well", "actually", "you know what") more than complete silence. A 3-second "well, that's interesting because..." is graded better than a 3-second silence followed by a perfect sentence. Train yourself to never go silent for more than 1.5 seconds during the response.
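When self-grading practice recordings, the averaging described in this section can be sketched as below. This is a rough illustration under stated assumptions, not the ETS algorithm: the dimension keys and function names are made up here, the aggregation across the four responses is assumed to be an unweighted mean, and the conversion to the 1-6 band is left out because this guide does not specify it.

```python
from statistics import mean

# The four rubric dimensions from the table above (key names are assumptions).
DIMENSIONS = ("delivery", "language_use", "task_completion", "fluency")


def response_score(scores: dict[str, float]) -> float:
    """Average the four rubric dimensions (each 0-5) for one response."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    for d in DIMENSIONS:
        if not 0 <= scores[d] <= 5:
            raise ValueError(f"{d} must be between 0 and 5")
    return mean(scores[d] for d in DIMENSIONS)


def interview_aggregate(responses: list[dict[str, float]]) -> float:
    """Assumption: combine the four responses with an unweighted mean."""
    if len(responses) != 4:
        raise ValueError("the interview has exactly four responses")
    return mean(response_score(r) for r in responses)


q1 = {"delivery": 4, "language_use": 4, "task_completion": 3, "fluency": 3}
print(response_score(q1))
```

Scoring each dimension separately, rather than giving one overall impression, shows whether a weak response was lost on Task Completion and Fluency or on the cheaper grammar and vocabulary errors.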

5. The 45-second response framework

This is the structural skeleton that fits any of the four question types. It is engineered for the SpeechRater rubric: stalling phrase covers fluency in the cold-start window, three body sentences hit Task Completion, and the wrap-up line lets you land cleanly even if you misjudge the timer.

The 45-second skeleton

0-5s: Stalling opener. Buys thinking time and signals structure. ("That's a really interesting question, because honestly...")
5-12s: Position or thesis. Your one-sentence answer, stated clearly.
12-25s: Reason / story / first support. Develop the main idea with a concrete reason, example, or memory.
25-38s: Second support / extension. Add a second example, a contrast, or a nuance.
38-44s: One-line wrap. Restate the position briefly and stop. ("So overall, that's why I'd say...")

Two timing rules are built into this skeleton. First, the position must come before the 12-second mark, because SpeechRater downgrades responses where the first 15 seconds do not contain anything answering the actual question. Second, the wrap-up has to happen by the 44-second mark, even if your second support is unfinished, because a sentence cut mid-phrase reads as worse than a clean one-line close.
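To get a feel for how little text each segment of the skeleton holds, the timings can be converted into a rough word budget. The sketch below assumes a steady pace of 145 words per minute, which is an illustrative figure chosen here and not an ETS number; the function names are also hypothetical.

```python
# (start_s, end_s, label) taken from the 45-second skeleton above.
SKELETON = [
    (0, 5, "stalling opener"),
    (5, 12, "position"),
    (12, 25, "first support"),
    (25, 38, "second support"),
    (38, 44, "one-line wrap"),
]

ASSUMED_WORDS_PER_MINUTE = 145  # assumption for illustration only


def word_budget(words_per_minute: float = ASSUMED_WORDS_PER_MINUTE) -> dict[str, int]:
    """Approximate number of words that fit in each segment at a steady pace."""
    return {
        label: round((end - start) * words_per_minute / 60)
        for start, end, label in SKELETON
    }


def segment_at(second: float) -> str:
    """Return the skeleton segment that should be in progress at a given time."""
    for start, end, label in SKELETON:
        if start <= second < end:
            return label
    return "stop"  # at or past the 44-second mark


for label, words in word_budget().items():
    print(f"{label}: about {words} words")
```

At the assumed pace the whole skeleton comes to about 105 words, so each support segment is only two or three sentences. Change `words_per_minute` to your own measured pace before relying on the numbers.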

The skeleton works for all four question types with one verb-tense swap per slot. For Q1 (personal recall), the body is a memory in past tense. For Q2 (feelings), the body is a feeling-claim in present tense plus an example. For Q3 (opinion), the body is an argument in present tense. For Q4 (policy), the body is a prediction in modal-future tense ("would", "could", "might"). Memorize the skeleton; let the verb tense flex with the question.

6. Stalling phrases that buy thinking time

The 0-5-second opener is the single most under-trained part of this task. A trained opener saves ten times its length in cognitive cost. Untrained, you either go silent (kills Fluency), or you start with the actual answer (which forces you to commit before your brain has organized the response). A good opener does three things at once: fills the silence with on-rubric speech, mirrors the question to lock the topic in working memory, and signals to the AI that a structured answer is coming.

Below are openers calibrated to each of the four question types. Memorize one per slot; rotate so they don't all sound identical across the block.

For Q1 (personal recall)

"That actually reminds me of something that happened last year, when..."
"You know, I had an experience just like that a few months ago. So..."
"OK, the first thing that comes to mind is a time when I was..."

For Q2 (feelings or preferences)

"My honest reaction would have to be that I really..."
"Well, if I'm being totally direct about it, I tend to prefer..."
"That's a question I've actually thought about before, and for me..."

For Q3 (opinion)

"I think the most reasonable position here is that..."
"My view on this is pretty clear, and it's that..."
"There are two sides to it, but I'd come down on the side of saying..."

For Q4 (policy or prediction)

"There are a couple of angles worth considering here, but my best guess would be..."
"Looking ahead, I think the most likely scenario is that..."
"It really depends on how you weigh the trade-off, but I'd predict that..."

One memorization tip: pair the opener with a hand gesture during practice (even a small finger tap on the desk). Motor patterns are easier to recall under stress than pure verbal patterns, and the opener has to fire automatically on test day or it costs more than it saves.

7. Three annotated band 6 sample responses

The three samples below cover Q1, Q3, and Q4; the last two are the slots where most candidates lose the most points (see section 3). Each is timed for 42-44 seconds of speech at a natural pace. Annotations explain why each segment hits the rubric.

Q1 · Personal recall

Question: "Describe a recent time when you had to make a difficult decision."

"That actually reminds me of something that happened just two months ago. I was offered an internship in another city, but it would've meant leaving my family for the whole summer. So I had to decide pretty quickly whether the experience was worth the time apart. In the end, I took it, mostly because I knew I'd never get the same chance again at this stage in my career. The first week was honestly really hard, but by the third week I started feeling like I'd made the right call. So overall, that decision is one I'm still glad I made."

Why it scores band 6: Stalling opener (0-3s) covers Fluency. Concrete moment ("two months ago", "another city") locks in Task Completion early. Two-clause sentences with varied connectors ("but", "mostly because", "by the third week") score Language Use. Past-tense control is consistent. One-line wrap lands at ~42s.

Q3 · Opinion

Question: "Some people prefer to read books on paper; others prefer e-readers. Which do you think is better, and why?"

"I think the most reasonable position is that paper books are still the better choice, at least for serious reading. The main reason is that I retain information much more reliably when I'm physically turning pages and writing notes in the margin. I tried switching to an e-reader for a couple of months, and I noticed that even though I could finish books faster, I remembered far less of what I read by the end of the week. There are real advantages to digital, like portability, but for actual learning, paper still wins for me. So that's why I'd come down on the side of paper."

Why it scores band 6: Position stated by 8s. Reason ("retain information") then example ("tried switching for a couple of months") in proper order. Acknowledges the counter ("real advantages to digital") without losing the position — SpeechRater rewards this nuance. One-line wrap at ~43s. Vocabulary range (retain, portability, reliably) avoids the basic-word penalty.

Q4 · Policy or prediction

Question: "Many cities are restricting cars in their downtown areas. What might happen in the next ten years if this trend continues?"

"There are a couple of angles worth considering here, but my best guess is that we'll see most major downtowns become much more pedestrian-focused over the next decade. The clearest driver is that public transit and bike infrastructure are getting cheaper to build, while car ownership is getting more expensive in dense areas. I'd predict that retail and restaurants in those zones would actually grow, because people on foot tend to spend more time and money than people driving through. The trade-off is going to fall on people who live further out, who'll need much better transit links. But on balance, I think the shift is mostly positive."

Why it scores band 6: Hardest slot, but holds together. Modal-future verbs ("would", "we'll see", "I'd predict") signal correct register for prediction. Trade-off acknowledgment ("the trade-off is going to fall...") is exactly what Q4 rewards. Lands at ~44s with a position-restating wrap. No long pauses.

What none of these samples do: use rare vocabulary, deploy memorized "academic" phrases ("In contemporary society..."), or speak unnaturally fast. SpeechRater is calibrated against natural speech, not essay prose. The fastest way to lose points on this task is to sound like you are reciting an IELTS Speaking Part 2 monologue. For more detail on what this looks like across both halves of the section, see our full TOEFL Speaking 2026 strategy guide.

8. Eight mistakes that cap your interview score at 3

Going silent for the first 4 seconds. Every silent second in the cold-start window costs Fluency. Train the opener so it fires automatically.

Stopping at 25-30 seconds. Even a perfect short response gets capped at band 3 because Task Completion punishes underdevelopment. Aim for 40-44 seconds every time.

Restating the question instead of answering. "The question is asking me about..." burns 8 seconds and answers nothing. Get to your position by the 12-second mark.

Using memorized "academic" phrases. "In contemporary society, it is widely acknowledged that..." flags as rehearsed and downgrades Delivery and Language Use.

Speaking faster to fit more in. Pace is a Fluency dimension. Going faster does not buy more content — it usually loses Delivery points without gaining anything.

Restarting after a stumble. "Sorry, let me start over" wastes 5 seconds and tells the rubric you cannot recover. Talk through the stumble; SpeechRater forgives it.

Saving energy for Q4. The Speaking section is too short to pace yourself. Treat every question as the last one you'll get.

Letting Q4 derail your average. A weak Q4 is normal; panicking over it is not. Speaking ends after Q4, so a bad Q4 costs only its own score and cannot drag down anything that follows.

9. The 3-week interview-only practice plan

This plan assumes you have at least a band 3 baseline on Speaking and a 3-week runway. If your full Speaking baseline is below band 3, follow the broader plan in our 4-week and 8-week TOEFL study plans first.

Week 1 — Skeleton drills (no full responses yet)

✓ Day 1-2: Memorize one stalling opener per question type. Practice each opener 10 times in front of a mirror until it fires within 1 second of seeing the question.
✓ Day 3-4: Drill 12-second skeleton fragments: opener + position only. Stop. The goal is to lock the opener-to-position transition, nothing more.
✓ Day 5: Drill 25-second responses (opener + position + first reason). Stop at 25.
✓ Day 6-7: Drill the full 45-second skeleton on practice prompts from our TOEFL Speaking topics 2026 collection. Record yourself. Listen for silent gaps over 1.5 seconds.

Week 2 — Full responses, all four question types

✓ Day 1-2: 8 full Q1 responses (personal recall). Time them. Aim for 40-44 seconds each.
✓ Day 3-4: 8 full Q3 responses (opinion). Same timing. Vary the position you take; the AI does not care which side, only that you commit to one.
✓ Day 5-6: 8 full Q4 responses (policy/prediction). This is the hardest set. Force yourself to take a clear prediction even when uncertain.
✓ Day 7: Mixed-question block: a full 4-question interview with no break, exactly as on test day. Record. Self-grade against the rubric in section 4.

Week 3 — Mock conditions and recovery drills

✓ Day 1: Two full Speaking sections (Listen and Repeat + interview). Headphones on, microphone live, no notes.
✓ Day 2-3: Recovery drills: practice mishearing scenarios. Have a partner mumble the question, or play a recorded version at low volume. Practice answering on partial information without restarting.
✓ Day 4: A full TOEFL timed mock test on the same setup you'll use on test day.
✓ Day 5-6: Light maintenance only: 4 fresh interview prompts per day, recorded.
✓ Day 7: Rest. No Speaking practice. Re-read the test day checklist instead.

The biggest predictor of band-6 outcomes from this plan is recording yourself daily. Self-grading by ear catches Fluency and Delivery issues that a partner usually misses. Keep the recordings; comparing week-1 to week-3 audio is the fastest way to see whether the prep is actually moving the score.
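The "listen for silent gaps over 1.5 seconds" check from the plan can be made less subjective once you have marked where you were speaking in a recording. The sketch below is a minimal illustration: the input format (a list of start and end times in seconds, for example hand-marked in an audio editor) and the function name are assumptions made here, and trailing silence after the wrap-up is deliberately ignored.

```python
MAX_GAP_S = 1.5  # training threshold used in this guide


def find_long_gaps(speech_segments, max_gap_s=MAX_GAP_S):
    """Return (start, end) silences longer than max_gap_s.

    speech_segments: list of (start_s, end_s) spans where you were speaking,
    measured from the moment recording began (hypothetical input format).
    Silence before the first word counts; silence after the last word does not.
    """
    gaps = []
    cursor = 0.0  # end of the most recent speech seen so far
    for start, end in sorted(speech_segments):
        if start - cursor > max_gap_s:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    return gaps


# Example: a 2-second freeze between the position and the first support.
marked = [(0.5, 10.0), (12.0, 30.0), (30.8, 42.0)]
print(find_long_gaps(marked))
```

Running this on week-1 and week-3 recordings gives a simple count of long pauses per response, which is easier to compare over time than an impression from listening.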

10. Test-day tactics for the no-prep moment

Speaking comes last on the 2026 test, which means you arrive at this task after 60+ minutes of Reading, Listening, and Writing. Cognitive reserves are low. The tactics below are designed for that state — they are not what you would do if Speaking came first.

1. Reset your posture before the interview block. Sit up straight, both feet on the floor, shoulders back. Breath rate drops, voice gets clearer. Delivery rubric responds to this within 5 seconds.
2. Aim your eyes at the video, not the screen text. The natural prosody of speech is better when you treat the interviewer like a person. The on-screen prompt is a backup, not the primary input.
3. Fire the opener within 1 second. Even if you are still processing the question, start the opener. The opener is content-free; you can buy 4 seconds of stalling time before you commit to a position.
4. If you mishear a word, answer the half you caught. SpeechRater scores responses on the rubric, not on whether you addressed every nuance of the prompt. A confident, on-topic-ish answer beats a paralyzed accurate one.
5. Watch the timer in your peripheral vision, not directly. Looking at the timer reads as hesitation in your voice. Trust the practice; if you've drilled the skeleton, your internal clock is accurate to within 3 seconds.
6. After Q4, reset before whatever comes next. On the 2026 format, Speaking is the last section, so "what comes next" is finishing the test. But the relief of finishing should not turn into a slumped exit; many Home Edition candidates lose proctoring points by relaxing too visibly. Stay still until the test fully ends. Re-read our Home Edition setup guide for the full check-out flow.

The interview task rewards process over inspiration. The candidates who hit band 6 reliably are not the most fluent speakers in the room — they are the ones who fired their opener within 1 second, stayed on the skeleton for 45 seconds, and landed the wrap-up clean four times in a row. Trust the framework, stick to the timing, and let the AI score what it is calibrated to score.

11. FAQ

What is the Take an Interview task on the new TOEFL Speaking section?

Take an Interview is the second and longer half of the 2026 TOEFL Speaking section. A pre-recorded video of an interviewer asks four questions in sequence about an everyday research topic. You answer each one immediately with no preparation time. Each response gets exactly 45 seconds of recording. The four questions move from personal recall, to emotional reaction, to opinion, to policy or prediction, getting harder as they progress. The whole interview block runs about five minutes.

How much preparation time do I get for each TOEFL interview question in 2026?

None. The 2026 simulated interview has zero preparation time. The interviewer asks the question, the recording light turns on, and you must start speaking immediately. This is a deliberate redesign decision: the old TOEFL Speaking task gave 15-30 seconds of prep, but the new interview format tests fluency under cold conditions. The first 3-5 seconds of every response should be a stalling phrase that buys thinking time without sounding like a stall.

How long is each TOEFL interview response in 2026?

Each of the four interview responses is 45 seconds of recording. The recording starts automatically after the question audio ends and stops at exactly 45 seconds. There is no way to stop early or extend. A typical band 6 response uses 40-44 seconds of the window with content and one or two seconds of natural trailing. Stopping at 25-30 seconds reads as undeveloped to the AI scoring engine and almost always caps the score at 3.0 or below.

What are the four interview question types in order?

Question 1 is personal recall, asking about a memory or past experience ("Describe a time when..."). Question 2 is emotional reaction, asking how you feel or what you prefer ("How do you feel about..."). Question 3 is opinion with support, asking you to take and justify a position. Question 4 is policy or prediction, asking you to evaluate a broader claim or speculate about consequences. The questions get progressively harder. Question 4 is where most candidates lose the most points because it requires abstract reasoning under no-prep pressure.

How is the TOEFL interview scored by AI in 2026?

ETS uses SpeechRater (the AI scoring engine) combined with human raters. SpeechRater evaluates four rubric dimensions per response on a 0-5 scale: Delivery (clarity, intonation, intelligibility), Language Use (grammar accuracy and vocabulary range), Task Completion (how fully the question is answered and how developed the ideas are), and Fluency (pace, rhythm, smoothness without long pauses). The four scores are averaged, then aggregated across all 11 Speaking items, then converted to your 1-6 band score. Long unfilled pauses and undeveloped answers hurt the most; grammar and vocabulary errors usually cost less.

What is the best opening line for a TOEFL interview response?

Start with a 3-5 second stalling phrase that buys thinking time without sounding empty. For Question 1 (personal recall): "That actually reminds me of a time last year when..." For Question 2 (feelings): "My honest reaction would have to be that..." For Question 3 (opinion): "I think the most reasonable position here is that..." For Question 4 (policy): "There are a couple of angles worth considering, but..." These openings sound natural to the AI scoring, give your brain 3 seconds to plan the body, and signal a structured response is coming.

What topics come up on the TOEFL interview in 2026?

Topics are deliberately everyday and accessible: cities and commuting, technology and devices, daily habits, decision-making, education and study habits, entertainment, exercise and health, reading, travel, food, and friendships. The simulated researcher framing means questions sound like a casual study about life experiences, not an academic exam. You will not get specialized topics that require subject-matter expertise. The challenge is producing 45 seconds of well-organized speech under no-prep pressure, not finding things to say.

Should I memorize templates for the TOEFL interview task?

Memorize the structural skeleton, not the content. A reusable opener, two transitions, and a wrap-up line save 10-15 seconds of cognitive load on test day, which translates directly into more developed body content. Memorizing full essay-style answers backfires because the AI scoring engine penalizes responses that sound rehearsed or off-topic. Use the framework in section 5 of this guide as the skeleton; fill the body with whatever you actually think about the prompt.

The interview task is the most coachable part of the 2026 TOEFL Speaking section because every constraint — no prep, fixed length, fixed question order, scored by a known rubric — is something you can train against directly. Treat the four 45-second windows as four small performances, not four small exams. Drill the opener until it fires automatically. Land the wrap-up cleanly even when the body is uneven. Use the practice plan, time everything, and record yourself. By test day, the opener should fire before you've consciously decided to start speaking — that is the difference between a band 4 and a band 6.

Practice the interview task under real conditions

Our free TOEFLMock Speaking practice tests use the 2026 interview format — same 45-second window, same four-question progression, same on-screen video framing, with AI feedback against the SpeechRater rubric. Run one full Speaking section before you book your test date and one more in the final week of your prep cycle.


Daniel Whitaker

Head of Curriculum

Test preparation specialist and former classroom instructor. Designs full-length mock content aligned to the 2026 ETS redesign and writes section-strategy, study-plan, and rubric-decoded guides for every TOEFL task type.