AI Text to Image Generator
A Next.js application that combines OpenAI's chat, text-to-speech, and Whisper speech-to-text capabilities with a local Stable Diffusion installation for image generation.
Technologies Used
Next.js, OpenAI API (chat, text-to-speech, Whisper), Stable Diffusion (local installation)
Skills Developed
Built solo as my final project for the Encode Club AI Foundation Bootcamp, this application challenged me to integrate multiple AI systems, including text generation, speech processing, and image creation, and significantly deepened my understanding of AI model pipelines.
About this Project
This tool chains multiple AI technologies into a single text-to-image workflow. Users enter a prompt, which is enhanced by OpenAI's language models, read aloud with OpenAI's text-to-speech API, and then passed to a local installation of Stable Diffusion to generate high-quality images. The application demonstrates how different AI systems can be linked together into a powerful creative tool.
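As an illustration of the speech stage in this chain, the sketch below converts an enhanced prompt to audio with OpenAI's text-to-speech endpoint via the official Node SDK. The model name, voice, and output path are assumptions chosen for the example, not the project's actual configuration.

```ts
// Minimal sketch: read an enhanced prompt aloud with OpenAI text-to-speech.
// Assumes OPENAI_API_KEY is set; model, voice, and file name are illustrative.
import OpenAI from "openai";
import { writeFile } from "node:fs/promises";

const openai = new OpenAI();

export async function speakPrompt(enhancedPrompt: string): Promise<void> {
  const response = await openai.audio.speech.create({
    model: "tts-1",   // assumed model name
    voice: "alloy",   // assumed voice
    input: enhancedPrompt,
  });

  // The SDK returns a Response-like object; convert it to a Buffer and save it.
  const audio = Buffer.from(await response.arrayBuffer());
  await writeFile("enhanced-prompt.mp3", audio);
}
```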
Key Features
- Text-to-image generation with Stable Diffusion
- AI-powered prompt enhancement
- Text-to-speech capability
- Speech-to-text input using the Whisper API (see the sketch after this list)
- Local image generation without dependency on external services
- Image history and management
- Customizable generation parameters
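To show how the speech-to-text feature could be wired up, here is a minimal sketch of a transcription helper built on the OpenAI Node SDK's Whisper endpoint. The helper name and audio path are hypothetical, and the example assumes a voice recording has already been saved server-side.

```ts
// Minimal sketch: transcribe a recorded voice prompt with the Whisper API.
// Assumes OPENAI_API_KEY is set; the audio path is illustrative.
import OpenAI from "openai";
import fs from "node:fs";

const openai = new OpenAI();

export async function transcribeVoicePrompt(audioPath: string): Promise<string> {
  const transcription = await openai.audio.transcriptions.create({
    file: fs.createReadStream(audioPath),
    model: "whisper-1",
  });
  return transcription.text; // plain-text transcript of the spoken prompt
}
```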
Challenges & Solutions
Challenge 1: Integrating local Stable Diffusion with web-based APIs
Solution: Created a modular architecture that allows the frontend to communicate with the local image generation system via a custom API bridge.
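A minimal sketch of what such a bridge could look like, assuming the Next.js App Router and a local Stable Diffusion instance exposing the AUTOMATIC1111 webui REST API on port 7860. The route path, endpoint URL, and generation parameters are assumptions for the example rather than the project's actual setup.

```ts
// app/api/generate/route.ts (illustrative path)
// Bridges the browser to a local Stable Diffusion instance, assumed here to
// expose the AUTOMATIC1111 webui API at http://127.0.0.1:7860.
import { NextResponse } from "next/server";

export async function POST(request: Request) {
  const { prompt, steps = 25, width = 512, height = 512 } = await request.json();

  const sdResponse = await fetch("http://127.0.0.1:7860/sdapi/v1/txt2img", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt, steps, width, height }),
  });

  if (!sdResponse.ok) {
    return NextResponse.json(
      { error: "Local Stable Diffusion is unreachable" },
      { status: 502 }
    );
  }

  // The webui API returns generated images as base64-encoded strings.
  const { images } = await sdResponse.json();
  return NextResponse.json({ image: images?.[0] ?? null });
}
```

Keeping the localhost call on the server side of the Next.js app avoids CORS issues in the browser and keeps generation parameters in one place.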
Challenge 2: Optimizing prompt engineering for better image outputs
Solution: Implemented an AI-assisted prompt enhancement system that refines user inputs before sending them to the image generation pipeline.
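A minimal sketch of how such an enhancement step could be implemented with OpenAI's Chat Completions API. The model name, system prompt, and function name are assumptions chosen for the example.

```ts
// Minimal sketch: refine a raw user prompt before it reaches Stable Diffusion.
// Assumes OPENAI_API_KEY is set; model and system prompt are illustrative.
import OpenAI from "openai";

const openai = new OpenAI();

const SYSTEM_PROMPT =
  "You rewrite short user prompts into detailed Stable Diffusion prompts. " +
  "Add subject details, style, lighting, and composition keywords. " +
  "Return only the rewritten prompt.";

export async function enhancePrompt(userPrompt: string): Promise<string> {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini", // assumed model
    messages: [
      { role: "system", content: SYSTEM_PROMPT },
      { role: "user", content: userPrompt },
    ],
    temperature: 0.7,
  });
  // Fall back to the original prompt if the model returns nothing usable.
  return completion.choices[0]?.message?.content?.trim() ?? userPrompt;
}
```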