Kani TTS Installation Guide

Prerequisites

Before installing Kani TTS, ensure you have Python 3.8 or higher installed on your system. The following dependencies are required for core functionality.

Core Dependencies

# Core dependencies
pip install torch librosa soundfile numpy huggingface_hub
pip install "nemo_toolkit[tts]"
# CRITICAL: Custom transformers build required for "lfm2" model type
pip install -U "git+https://github.com/huggingface/transformers.git"
# Optional: For web interface
pip install fastapi uvicorn

Important Note

The "lfm2" model type requires the transformers build installed directly from the GitHub repository, as shown above. Without it, loading the model fails with an unsupported model type error.
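To confirm the installation before running anything, a small check like the one below can report which packages cannot be imported. This is a minimal sketch using only the standard library. The import names (for example `nemo` for `nemo_toolkit`) are assumptions based on the usual package layout, and the helper `missing_modules` is a hypothetical name, not part of Kani TTS.

```python
import importlib.util

# Import names assumed to correspond to the pip packages listed above.
REQUIRED = ["torch", "librosa", "soundfile", "numpy",
            "huggingface_hub", "nemo", "transformers"]
OPTIONAL = ["fastapi", "uvicorn"]  # only needed for the web interface


def missing_modules(names):
    """Return the subset of `names` that cannot be found on this system."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    print("Missing required:", missing_modules(REQUIRED) or "none")
    print("Missing optional:", missing_modules(OPTIONAL) or "none")
```

Note that this only checks that each package is present. It does not verify that the installed transformers version recognizes "lfm2".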

Quick Start

Once the dependencies are installed, you can generate speech with Kani TTS using the commands below.

Generate Audio with Default Sample Text

python basic/main.py

This command will load the TTS model and generate speech using the built-in sample text, saving the output as generated_audio_YYYYMMDD_HHMMSS.wav.
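The output name follows a date and time pattern. The sketch below shows how such a name can be built or predicted in Python. It assumes the script uses local time and standard `strftime` fields, which the guide does not state. `output_filename` is a hypothetical helper.

```python
from datetime import datetime


def output_filename(now=None):
    """Build a name following the generated_audio_YYYYMMDD_HHMMSS.wav pattern."""
    now = now or datetime.now()
    return now.strftime("generated_audio_%Y%m%d_%H%M%S.wav")


print(output_filename(datetime(2025, 1, 31, 9, 5, 7)))
# generated_audio_20250131_090507.wav
```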

Generate Audio with Custom Text

python basic/main.py --prompt "Hello world! My name is Kani, I'm a speech generation model!"

Use the --prompt flag to specify your own text for speech generation. The model will process your custom text and generate corresponding audio.
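If you want to generate several clips in a row, one option is to call the script once per prompt from Python. The following is a sketch under that assumption. It relies only on the script path and the --prompt flag shown above, and the helper names `build_command` and `generate_all` are hypothetical.

```python
import subprocess
import sys


def build_command(prompt=None, script="basic/main.py"):
    """Return the argument list for one run of the script."""
    cmd = [sys.executable, script]
    if prompt is not None:
        cmd += ["--prompt", prompt]  # omit the flag to use the sample text
    return cmd


def generate_all(prompts):
    """Run the script once per prompt, stopping on the first failure."""
    for prompt in prompts:
        subprocess.run(build_command(prompt), check=True)
```

Calling `generate_all(["Hello world!", "My name is Kani."])` from the repository root would run the script twice. Each run loads the model again, so this is convenient rather than fast.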

Web Interface

For a browser-based interface with real-time audio playback and interactive controls, you can use the included web interface.

Start the FastAPI Server

python fastapi_example/server.py

This starts the FastAPI server on http://localhost:8000. The server provides REST API endpoints for speech generation.
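Before opening the client, it can help to confirm that something is listening on port 8000. The sketch below checks only that a TCP connection succeeds. It makes no assumption about the server's endpoint paths, which this guide does not list. `server_is_up` is a hypothetical helper.

```python
import socket


def server_is_up(host="localhost", port=8000, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution errors.
        return False
```

A result of `True` means the port is open, not that the model has finished loading.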

Access the Web Interface

# Open fastapi_example/client.html in your web browser
# Server runs on http://localhost:8000

Open the client.html file in your web browser to access the interactive interface.

Web Interface Features

  • Interactive text input with example prompts
  • Parameter adjustment (temperature, max tokens)
  • Real-time audio generation and playback
  • Download functionality for generated audio
  • Server health monitoring

Configuration

Kani TTS comes with sensible default configurations that work well for most use cases. You can customize these settings based on your specific requirements.

Default Configuration

  • Model: https://huggingface.co/nineninesix/kani-tts-450m-0.1-pt
  • Sample Rate: 22,050 Hz
  • Generation: 1200 max tokens, temperature 1.4
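For reference, the defaults above can be collected in one place. The dataclass below is an illustrative sketch, not the project's actual configuration class. The values come from the list above, while the class and field names are assumptions.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GenerationDefaults:
    # Values taken from the defaults listed above; names are illustrative.
    model_name: str = "nineninesix/kani-tts-450m-0.1-pt"
    sample_rate: int = 22_050  # Hz
    max_tokens: int = 1200
    temperature: float = 1.4


defaults = GenerationDefaults()

# Derive a variant without mutating the defaults, e.g. a lower temperature.
calmer = replace(defaults, temperature=1.0)

# Number of samples in 3 seconds of audio at the default rate.
print(defaults.sample_rate * 3)  # 66150
```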

Model Variants

Choose different models for specific voice characteristics and performance requirements.

Base Model (Default)

nineninesix/kani-tts-450m-0.1-pt

Generates random voices with consistent quality.

Female Voice

nineninesix/kani-tts-450m-0.2-ft

Fine-tuned for female voice characteristics.

Male Voice

nineninesix/kani-tts-450m-0.1-ft

Fine-tuned for male voice characteristics.
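The three variants can be kept in a small lookup table so that a voice label maps to its model ID. This is a sketch. The labels "base", "female", and "male" and the helper `resolve_model` are hypothetical names chosen for illustration.

```python
MODEL_VARIANTS = {
    "base": "nineninesix/kani-tts-450m-0.1-pt",    # random voices (default)
    "female": "nineninesix/kani-tts-450m-0.2-ft",  # fine-tuned female voice
    "male": "nineninesix/kani-tts-450m-0.1-ft",    # fine-tuned male voice
}


def resolve_model(voice="base"):
    """Map a voice label to its model ID, rejecting unknown labels."""
    try:
        return MODEL_VARIANTS[voice]
    except KeyError:
        choices = ", ".join(sorted(MODEL_VARIANTS))
        raise ValueError(f"unknown voice {voice!r}; choose from: {choices}") from None
```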

Changing Models

To use a different model, modify the ModelConfig class in config.py:

class ModelConfig:
    model_name = "nineninesix/kani-tts-450m-0.2-ft"  # Change this
    # ... other configuration options

Troubleshooting

Common Installation Issues

Issue: "lfm2" model type not supported
Solution: Ensure you've installed the transformers build from GitHub, as shown under Core Dependencies

Issue: CUDA out of memory
Solution: Reduce batch size or use CPU mode for testing

Issue: Audio quality issues
Solution: Check sample rate settings and ensure proper audio codec installation
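When checking sample rate settings, you can read the rate directly from a generated file's header. The sketch below uses Python's standard `wave` module, which handles PCM WAV files only. Whether Kani TTS writes PCM output is an assumption here, and the helper names are hypothetical.

```python
import wave


def wav_sample_rate(path):
    """Read the sample rate (Hz) from a PCM WAV file header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def check_sample_rate(path, expected=22_050):
    """Return True if the file's sample rate matches the expected value."""
    return wav_sample_rate(path) == expected
```

If `check_sample_rate` returns `False` for a generated file, a mismatch between the configured rate and the playback or resampling step is one possible cause of distorted or pitch-shifted audio.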

Performance Optimization

  • Use GPU acceleration when available for faster processing
  • Adjust temperature and max tokens based on your quality vs speed requirements
  • Consider using smaller model variants for edge device deployment
  • Monitor memory usage and adjust batch sizes accordingly
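To use the GPU only when one is actually available, a common PyTorch pattern is to pick the device at startup and fall back to the CPU. The sketch below shows that pattern. How Kani TTS itself selects a device is not covered in this guide, so treat `pick_device` as a hypothetical helper.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; nothing can run on a GPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

The same fallback is useful when you hit the CUDA out of memory issue described above and want to test on the CPU.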

Ready to Start?

Now that you have Kani TTS installed, explore the demo and start building your applications.