June 7, 2026 · 5 min read

Voice AI Hello World: Have a Phone Number Greet a Caller in Under a Minute

The smallest working voice app: a phone number that answers and plays a pre-recorded greeting. Three lines of VoyloML, a wav file in the right format, and you're live.

Tags: voice AI · VoyloML · tutorial

Every voice-AI tutorial we've seen opens with "and then your AI agent answers the call." None of them tell you how the call actually reaches the agent — what a phone number does between the carrier and Vapi / Retell / ElevenLabs / OpenAI Realtime, what audio format your "welcome message" needs to be in, why your first wav file plays silently, and what happens when the agent provider goes down at 2 a.m.

This post is the simplest possible starting point. A real phone number. A real welcome message. A real call that lands and plays the audio. No AI yet — just the layer underneath. If you can get this working you can build everything else on top of it.

What you'll build

A phone number that, when someone calls it, answers the line and plays a pre-recorded wav file. Total VoyloML:

app.xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Play>https://voylo.s3.ap-south-1.amazonaws.com/voylo-welcom.wav</Play>
  <Hangup/>
</Response>

Three lines of meaningful XML. Everything else is platform configuration that you do once and forget.
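If you'd rather generate the document than hand-write it, Python's standard-library ElementTree can produce the same three elements. This is a minimal sketch. It assumes VoyloML is ordinary XML that uses only the `<Response>`, `<Play>` and `<Hangup>` elements shown above. The helper `build_hello_world` is hypothetical and not part of any Voylo SDK.

```python
import xml.etree.ElementTree as ET

# Hosted sample greeting used throughout this post.
WELCOME_URL = "https://voylo.s3.ap-south-1.amazonaws.com/voylo-welcom.wav"


def build_hello_world(audio_url: str) -> str:
    """Return a VoyloML document that plays one file, then hangs up.

    Hypothetical helper: assumes VoyloML is plain XML built from the
    elements used in this post (<Response>, <Play>, <Hangup>).
    """
    response = ET.Element("Response")
    play = ET.SubElement(response, "Play")
    # Strip stray whitespace so a copy-paste slip can't corrupt the URL.
    play.text = audio_url.strip()
    ET.SubElement(response, "Hangup")
    # encoding="unicode" makes tostring() return a str without a declaration,
    # so the UTF-8 declaration is prepended by hand.
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_hello_world(WELCOME_URL))
```

The output is equivalent to the hand-written template, except that ElementTree serializes the empty element as `<Hangup />` (with a space), which is the same XML.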

What you'll need

  • A Voylo account (sign up; the free tier is enough).
  • A phone number in your workspace — either rent one through Voylo or import a number from a SIP carrier you already use (Twilio, Bandwidth, Plivo BYO mode, etc.).

We're using a pre-recorded welcome sample that we host, so you can skip audio prep entirely for this tutorial. When you're ready to swap in your own recording (a pre-recorded message, ElevenLabs TTS output, anything else), hosting it follows the same pattern, and a future post covers the audio-format gotchas in full.
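When you do swap in your own recording, it's worth checking what's actually inside the wav before uploading it. The sketch below reads the file header with Python's standard-library `wave` module. The target it compares against (8 kHz, mono, 16-bit PCM, the classic narrowband telephony shape) is an assumption for illustration, not a documented Voylo requirement. Confirm the real target against the audio-format post before relying on it. `inspect_wav`, `format_problems` and the `ASSUMED_*` constants are hypothetical names.

```python
import wave
from dataclasses import dataclass
from typing import BinaryIO, Union

# ASSUMPTION: narrowband telephony format. Not a documented Voylo requirement.
ASSUMED_RATE_HZ = 8000
ASSUMED_CHANNELS = 1
ASSUMED_SAMPLE_WIDTH_BYTES = 2  # 16-bit PCM


@dataclass
class WavInfo:
    channels: int
    sample_rate_hz: int
    sample_width_bytes: int
    duration_s: float


def inspect_wav(source: Union[str, BinaryIO]) -> WavInfo:
    """Read the header of a PCM wav file (a path or a binary file object).

    The stdlib reader only understands PCM; it raises wave.Error for other
    encodings, which is itself a useful signal.
    """
    with wave.open(source, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return WavInfo(
            channels=wav.getnchannels(),
            sample_rate_hz=rate,
            sample_width_bytes=wav.getsampwidth(),
            duration_s=frames / rate if rate else 0.0,
        )


def format_problems(info: WavInfo) -> list[str]:
    """List mismatches against the assumed format (empty list means it matches)."""
    problems = []
    if info.sample_rate_hz != ASSUMED_RATE_HZ:
        problems.append(f"sample rate {info.sample_rate_hz} Hz, expected {ASSUMED_RATE_HZ} Hz")
    if info.channels != ASSUMED_CHANNELS:
        problems.append(f"{info.channels} channels, expected mono")
    if info.sample_width_bytes != ASSUMED_SAMPLE_WIDTH_BYTES:
        problems.append(f"{8 * info.sample_width_bytes}-bit samples, expected 16-bit")
    if info.duration_s == 0:
        problems.append("file contains no audio frames")
    return problems
```

Usage would look like `print(format_problems(inspect_wav("greeting.wav")))`, where `greeting.wav` stands in for your own file.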

Step 1 — Create the Voylo application

In the Voylo console:

  1. Applications → New application.
  2. Name it Hello world (or anything memorable).
  3. Paste this as the static XML template:
app.xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Play>https://voylo.s3.ap-south-1.amazonaws.com/voylo-welcom.wav</Play>
  <Hangup/>
</Response>

Save. Copy the application's UUID for the next step.

Step 2 — Wire your phone number to the application

  1. Numbers → <your number>.
  2. Inbound routing: select Application → pick Hello world.
  3. Save.

That's the whole setup. The next call to your number runs this app.
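Conceptually, Step 2 adds one entry to a lookup table. When a call arrives, the platform finds the application routed to the dialed number and runs its VoyloML. The Python sketch below models that lookup with two dictionaries. The data structures, the function name `app_for_inbound_call` and the placeholder number are all hypothetical. It is not how Voylo stores routes. It only shows why a missing or mismatched route leaves a call with nothing to run.

```python
from typing import Optional

# Hypothetical, simplified model of inbound routing:
# dialed number -> application name -> VoyloML template.
HELLO_WORLD_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<Response>\n"
    "  <Play>https://voylo.s3.ap-south-1.amazonaws.com/voylo-welcom.wav</Play>\n"
    "  <Hangup/>\n"
    "</Response>\n"
)

APPLICATIONS = {"Hello world": HELLO_WORLD_XML}

# Placeholder number from the fictional 555-01xx range.
INBOUND_ROUTES = {"+12025550143": "Hello world"}


def app_for_inbound_call(dialed_number: str) -> Optional[str]:
    """Return the VoyloML that would run for a call to dialed_number, or None."""
    app_name = INBOUND_ROUTES.get(dialed_number)
    if app_name is None:
        return None  # number has no inbound route: no application to run
    return APPLICATIONS.get(app_name)  # None if the route names a missing app
```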

Step 3 — Test it

The fastest test is from inside the console — Test call → Inbound opens a WebRTC softphone in your browser. Pick the number, click Call. You should hear your welcome message play to completion, then the line clears.

For a real PSTN test, call the number from your cell. Same result, but you confirm the carrier path works too.

If the call rings and immediately drops: the application is wired to the wrong number, or the application UUID was mistyped. Re-check Numbers → <your number> → Inbound routing.

If the call connects but you hear silence: double-check the URL in the XML is exactly the one above (lowercase .wav, no trailing spaces).
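You can catch the silent-call mistakes before placing a call. The Python sketch below runs a small lint pass over the template. It first parses the XML, because a template that doesn't parse can't run at all. It then checks every `<Play>` URL for surrounding whitespace, a non-HTTPS scheme, and a path that doesn't end in lowercase `.wav`. The rules mirror this post's troubleshooting advice. They are our assumptions, not a Voylo validator, and `lint_voyloml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def lint_voyloml(xml_text: str) -> list[str]:
    """Return human-readable problems found in a VoyloML template."""
    try:
        # Leading whitespace before the XML declaration is a parse error,
        # so strip it; encoding to bytes lets the UTF-8 declaration apply.
        root = ET.fromstring(xml_text.strip().encode("utf-8"))
    except ET.ParseError as exc:
        return [f"XML does not parse: {exc}"]

    problems = []
    if root.tag != "Response":
        problems.append(f"root element is <{root.tag}>, expected <Response>")

    for play in root.iter("Play"):
        raw = play.text or ""
        url_text = raw.strip()
        if raw != url_text:
            problems.append(f"whitespace around Play URL: {raw!r}")
        url = urlparse(url_text)
        if url.scheme != "https":
            problems.append(f"Play URL is not HTTPS: {url_text!r}")
        if not url.path.endswith(".wav"):
            problems.append(f"Play URL path does not end in lowercase .wav: {url_text!r}")
    return problems
```

An empty list means the template passes these checks. It doesn't prove the file is reachable or correctly encoded.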

What you just built

This is the smallest possible voice application. Its three verbs together prove that every piece of the stack works:

Verb | What it proved
--- | ---
<Response> | The application was found and the VoyloML parsed.
<Play> | The audio was fetched over HTTPS, decoded, and streamed into the call, and codec negotiation with the caller's network succeeded.
<Hangup> | The application can end the call cleanly without dropping the channel.

Add anything else (Gather, Dial, Record, Switch, Flow, Redirect) and you're extending a working foundation rather than debugging it. Everything more complex in voice AI rests on the claims in that table holding true.
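To make "each verb proves one layer" concrete, here is a toy walker in Python. It steps through a VoyloML document top to bottom and records what a call would do at each step. It is a model of the behavior this post describes, not Voylo's engine. The walker assumes that verbs run in document order and that nothing after `<Hangup>` executes. The action strings are invented for illustration. Verbs it doesn't model, such as Gather or Dial, are reported rather than executed. `walk_voyloml` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def walk_voyloml(xml_text: str) -> list[str]:
    """Toy model: list the actions a call would take, in document order."""
    root = ET.fromstring(xml_text.strip().encode("utf-8"))
    if root.tag != "Response":
        raise ValueError(f"expected <Response>, got <{root.tag}>")

    actions = ["app found, VoyloML parsed"]  # what <Response> proves
    for verb in root:  # direct children, in document order
        if verb.tag == "Play":
            actions.append(f"play {(verb.text or '').strip()}")
        elif verb.tag == "Hangup":
            actions.append("hang up")
            break  # modelling assumption: nothing after <Hangup> runs
        else:
            actions.append(f"unmodelled verb <{verb.tag}>")
    return actions
```

Running it on the Hello world template yields three actions, one per row of the table.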

What's next

The real shape of a production voice app isn't "play a greeting and hang up" — it's "greet, route to the right place, fail over when something breaks, log it all." The next post walks through that: a digit-menu IVR that routes to an ElevenLabs voice agent, with automatic failover to a backup destination and a final human fallback if everything goes wrong.

Building Production Voice AI: ElevenLabs IVR with Failover to Human


Try this on Voylo

Free tier covers everything in this post. No credit card.