The problem - some books are not available in audiobook formats. 

The solution - I asked an AI agent to help create a small pipeline

  • PDF → extract text
  • Clean page numbers and captions
  • Send clean text to a text-to-speech model
  • Export audio

Three prompts later - I had working code.

First run on my laptop: 14 hour ETA. Overnight run.

Morning check: 8 hours left.

Evening check: Out-of-memory crash.

Classic.

Back to the agent.

It suggested running the job on GPU:

  • SageMaker
  • Kaggle

SageMaker refused to cooperate that day, so I switched to Kaggle.

Update. Upload. Run.

Three minutes later I had an audiobook.

Small difference, huh? 14h to get OOM crash vs 3 minute GPU jobby

Earlier that day the audiobook didn’t exist anywhere.

Now it did — because a small tool assembled it.

Preview: (quality is really pristine)

audio-thumbnail
Hw hacker
0:00
/34.092979

Success!


That’s the part of AI that excites me most. Not content generation. Tool generation.

Computer science has always been about building tools that reduce friction in everyday tasks. AI lowers the barrier dramatically.

You describe the tool.

The AI assembles it.

And problems that once required weeks of work - can become an evening experiment.

That’s the snowball.

One tool enables the next.