AI Text to Image Generator
A Next.js application that combines OpenAI's chat, text-to-speech, and Whisper speech-to-text capabilities with a local Stable Diffusion installation for image generation.
Technologies Used
Next.js, OpenAI API (chat, text-to-speech, Whisper), Stable Diffusion (local installation)
Skills Developed
Built solo as my final project for the Encode Club AI Foundation Bootcamp, this application challenged me to integrate multiple AI systems, including text generation, speech processing, and image creation, and significantly deepened my understanding of AI model pipelines.
About this Project
This tool chains multiple AI technologies into a single text-to-image workflow. Users enter a prompt, which is enhanced by OpenAI's language models, read aloud with OpenAI's text-to-speech API, and then passed to a local installation of Stable Diffusion to generate high-quality images. The application demonstrates how different AI systems can be linked together into a powerful creative tool.
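As an illustration of the speech stage in this chain, the sketch below converts an enhanced prompt to audio with OpenAI's text-to-speech endpoint via the official Node SDK. The model name, voice, and output path are assumptions chosen for the example, not the project's actual configuration.

```ts
// Minimal sketch: read an enhanced prompt aloud with OpenAI text-to-speech.
// Assumes OPENAI_API_KEY is set; model, voice, and file name are illustrative.
import OpenAI from "openai";
import { writeFile } from "node:fs/promises";

const openai = new OpenAI();

export async function speakPrompt(enhancedPrompt: string): Promise<void> {
  const response = await openai.audio.speech.create({
    model: "tts-1",   // assumed model name
    voice: "alloy",   // assumed voice
    input: enhancedPrompt,
  });

  // The SDK returns a Response-like object; convert it to a Buffer and save it.
  const audio = Buffer.from(await response.arrayBuffer());
  await writeFile("enhanced-prompt.mp3", audio);
}
```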
Key Features
- Text-to-image generation with Stable Diffusion
- AI-powered prompt enhancement
- Text-to-speech capability
- Speech-to-text input using the Whisper API (see the sketch after this list)
- Local image generation without dependency on external services
- Image history and management
- Customizable generation parameters
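To show how the speech-to-text feature could be wired up, here is a minimal sketch of a transcription helper built on the OpenAI Node SDK's Whisper endpoint. The helper name and audio path are hypothetical, and the example assumes a voice recording has already been saved server-side.

```ts
// Minimal sketch: transcribe a recorded voice prompt with the Whisper API.
// Assumes OPENAI_API_KEY is set; the audio path is illustrative.
import OpenAI from "openai";
import fs from "node:fs";

const openai = new OpenAI();

export async function transcribeVoicePrompt(audioPath: string): Promise<string> {
  const transcription = await openai.audio.transcriptions.create({
    file: fs.createReadStream(audioPath),
    model: "whisper-1",
  });
  return transcription.text; // plain-text transcript of the spoken prompt
}
```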
Challenges & Solutions
Challenge 1: Integrating local Stable Diffusion with web-based APIs
Solution: Created a modular architecture that allows the frontend to communicate with the local image generation system via a custom API bridge.
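A minimal sketch of what such a bridge could look like, assuming the Next.js App Router and a local Stable Diffusion instance exposing the AUTOMATIC1111 webui REST API on port 7860. The route path, endpoint URL, and generation parameters are assumptions for the example rather than the project's actual setup.

```ts
// app/api/generate/route.ts (illustrative path)
// Bridges the browser to a local Stable Diffusion instance, assumed here to
// expose the AUTOMATIC1111 webui API at http://127.0.0.1:7860.
import { NextResponse } from "next/server";

export async function POST(request: Request) {
  const { prompt, steps = 25, width = 512, height = 512 } = await request.json();

  const sdResponse = await fetch("http://127.0.0.1:7860/sdapi/v1/txt2img", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt, steps, width, height }),
  });

  if (!sdResponse.ok) {
    return NextResponse.json(
      { error: "Local Stable Diffusion is unreachable" },
      { status: 502 }
    );
  }

  // The webui API returns generated images as base64-encoded strings.
  const { images } = await sdResponse.json();
  return NextResponse.json({ image: images?.[0] ?? null });
}
```

Keeping the localhost call on the server side of the Next.js app avoids CORS issues in the browser and keeps generation parameters in one place.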
Challenge 2: Optimizing prompt engineering for better image outputs
Solution: Implemented an AI-assisted prompt enhancement system that refines user inputs before sending them to the image generation pipeline.
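A minimal sketch of how such an enhancement step could be implemented with OpenAI's Chat Completions API. The model name, system prompt, and function name are assumptions chosen for the example.

```ts
// Minimal sketch: refine a raw user prompt before it reaches Stable Diffusion.
// Assumes OPENAI_API_KEY is set; model and system prompt are illustrative.
import OpenAI from "openai";

const openai = new OpenAI();

const SYSTEM_PROMPT =
  "You rewrite short user prompts into detailed Stable Diffusion prompts. " +
  "Add subject details, style, lighting, and composition keywords. " +
  "Return only the rewritten prompt.";

export async function enhancePrompt(userPrompt: string): Promise<string> {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini", // assumed model
    messages: [
      { role: "system", content: SYSTEM_PROMPT },
      { role: "user", content: userPrompt },
    ],
    temperature: 0.7,
  });
  // Fall back to the original prompt if the model returns nothing usable.
  return completion.choices[0]?.message?.content?.trim() ?? userPrompt;
}
```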