github floneum/kalosm kalosm-0.3.0

Kalosm 0.3

Kalosm 0.3 makes structured generation significantly easier to use, improves transcription, and lets you track model download progress. It also includes performance improvements for text generation and transcription models, along with parser improvements.

Performance Improvements

The new version of Kalosm includes significant performance improvements for Llama, Mistral, and Phi models. We have also developed sampler-aware structured generation, which lets us skip parsing most tokens in loose structures. Generation should be 2-4x faster, depending on your use case:

Demo                    Kalosm 0.2                   Kalosm 0.3
Text generation         kalosm-0.2.mp4               kalosm-0.3.mp4
Structured generation   kalosm-0.2-structured.mp4    kalosm-0.3-structured.mp4
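
To give a feel for why sampler-aware structured generation can skip most parsing work, here is a toy sketch. It is not Kalosm's implementation: the digits-only constraint, the `token_is_allowed` check, and the greedy sampler are all assumptions made for illustration. The eager version runs the parser on every token in the vocabulary, while the sampler-aware version only checks tokens in the order the sampler would pick them and stops at the first valid one.

```rust
/// Hypothetical stand-in for a parser check: only digit tokens are allowed.
fn token_is_allowed(token: &str) -> bool {
    !token.is_empty() && token.chars().all(|c| c.is_ascii_digit())
}

/// Eager masking: parse every token, then keep the most likely valid one.
/// Returns the chosen token index and the number of parser calls.
fn pick_eager(vocab: &[(&str, f32)]) -> (Option<usize>, usize) {
    let mut checks = 0;
    let mut best: Option<usize> = None;
    for (i, (token, prob)) in vocab.iter().enumerate() {
        checks += 1;
        if token_is_allowed(token) && best.map_or(true, |b| *prob > vocab[b].1) {
            best = Some(i);
        }
    }
    (best, checks)
}

/// Sampler-aware: visit tokens in the order a greedy sampler prefers them
/// and stop at the first one the parser accepts.
fn pick_lazy(vocab: &[(&str, f32)]) -> (Option<usize>, usize) {
    let mut order: Vec<usize> = (0..vocab.len()).collect();
    // Sort indices by descending probability.
    order.sort_by(|&a, &b| vocab[b].1.partial_cmp(&vocab[a].1).unwrap());
    let mut checks = 0;
    for i in order {
        checks += 1;
        if token_is_allowed(vocab[i].0) {
            return (Some(i), checks);
        }
    }
    (None, checks)
}

fn main() {
    let vocab = [(" the", 0.40_f32), ("42", 0.30), ("hello", 0.15), ("7", 0.10), ("!", 0.05)];
    println!("eager: {:?}", pick_eager(&vocab));
    println!("lazy:  {:?}", pick_lazy(&vocab));
}
```

Both strategies choose the same token, but the lazy one stops parsing as soon as the sampler's preferred candidate is valid. With a real vocabulary of tens of thousands of tokens and a loose constraint, most tokens never need to be parsed at all.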

Structured Generation Improvements

Structured generation is both easier and faster in 0.3. Many structured generation tasks can use JSON. If you just need a JSON parser, Kalosm 0.3 lets you derive the parser from your data:

use kalosm::language::*;

/// A fictional character
#[derive(Parse, Schema, Clone, Debug)]
struct Character {
    /// The name of the character
    #[parse(pattern = "[A-Z][a-z]{2,10} [A-Z][a-z]{2,10}")]
    name: String,
    /// The age of the character
    #[parse(range = 1..=100)]
    age: u8,
    /// A description of the character
    #[parse(pattern = "[A-Za-z ]{40,200}")]
    description: String,
}

Then you can build a task that generates the character:

#[tokio::main]
async fn main() {
    // First create a model. Chat models tend to work best with structured generation
    let model = Llama::new_chat().await.unwrap();
    // Then create a task with the parser as constraints
    let task = Task::builder_for::<Character>("You generate realistic JSON placeholders for characters")
        .build();
    // Finally, run the task
    let mut stream = task.run("Create a random character", &model);
    stream.to_std_out().await.unwrap();
    let character = stream.await.unwrap();
    println!("{character:?}");
}
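
To make the `#[parse(...)]` constraints on `Character` concrete, the sketch below hand-writes the checks they describe using only the standard library. The helper names are hypothetical and this is not how the derive macro works internally. It also assumes each pattern must match the whole field, and it only shows which values the constraints would accept.

```rust
/// Matches one word of the form `[A-Z][a-z]{2,10}`.
fn is_capitalized_word(word: &str) -> bool {
    let mut chars = word.chars();
    let first_ok = chars.next().map_or(false, |c| c.is_ascii_uppercase());
    let rest: Vec<char> = chars.collect();
    first_ok && (2..=10).contains(&rest.len()) && rest.iter().all(|c| c.is_ascii_lowercase())
}

/// Matches `[A-Z][a-z]{2,10} [A-Z][a-z]{2,10}` (first name, space, last name).
fn is_valid_name(name: &str) -> bool {
    match name.split_once(' ') {
        Some((first, last)) => is_capitalized_word(first) && is_capitalized_word(last),
        None => false,
    }
}

/// Matches the `1..=100` range on `age`.
fn is_valid_age(age: u8) -> bool {
    (1..=100).contains(&age)
}

/// Matches `[A-Za-z ]{40,200}`.
fn is_valid_description(description: &str) -> bool {
    (40..=200).contains(&description.chars().count())
        && description.chars().all(|c| c.is_ascii_alphabetic() || c == ' ')
}

fn main() {
    println!("name ok: {}", is_valid_name("Ada Lovelace"));
    println!("age ok: {}", is_valid_age(36));
    println!("description ok: {}", is_valid_description("too short"));
}
```

The difference with structured generation is that these constraints are enforced while the model generates, so invalid output is never produced instead of being rejected afterwards.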

Along with the parser, you can also derive a JSON schema that matches the parser, which is useful for function-calling models.

You can read more about how structured generation works in Kalosm in our last blog post.

Streaming Voice Transcription

Kalosm 0.3 adds support for transcribing audio streams, such as microphone input, in chunks based on voice activity. You can now read the audio stream directly from the microphone and transcribe it as voices are detected:

// Create a new whisper model.
let model = Whisper::new().await.unwrap();

// Stream audio from the microphone
let mic = MicInput::default();
let stream = mic.stream().unwrap();

// Transcribe the audio into text in chunks based on voice activity.
let mut text_stream = stream.transcribe(model);

// Finally, print the text to the console
text_stream.to_std_out().await.unwrap();
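
As a rough illustration of what chunking by voice activity means, the sketch below splits a buffer of samples into voiced chunks. The notes above do not say how Kalosm detects voice activity, so this swaps in a plain frame-energy threshold. The function name, frame length, and threshold are all assumptions.

```rust
/// Splits mono samples into chunks of consecutive "voiced" frames.
/// A frame counts as voiced when its mean squared amplitude reaches `threshold`.
/// `frame_len` must be greater than zero.
fn voiced_chunks(samples: &[f32], frame_len: usize, threshold: f32) -> Vec<Vec<f32>> {
    let mut chunks = Vec::new();
    let mut current: Vec<f32> = Vec::new();
    for frame in samples.chunks(frame_len) {
        let energy = frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32;
        if energy >= threshold {
            // Still inside speech: keep growing the current chunk.
            current.extend_from_slice(frame);
        } else if !current.is_empty() {
            // Silence after speech: the chunk is complete.
            chunks.push(std::mem::take(&mut current));
        }
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

fn main() {
    // Silence, speech, silence, speech.
    let mut samples = vec![0.0_f32; 4];
    samples.extend(vec![0.5; 8]);
    samples.extend(vec![0.0; 4]);
    samples.extend(vec![0.5; 4]);
    for (i, chunk) in voiced_chunks(&samples, 4, 0.01).iter().enumerate() {
        println!("chunk {i}: {} samples", chunk.len());
    }
}
```

Each completed chunk is what a transcription model would receive, so text can be emitted as soon as a speaker pauses instead of after the whole recording ends.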

Model Progress

Loading models is now async with a callback for loading progress:

let model = Bert::builder()
    // build_with_loading_handler lets you track the progress of model loading
    .build_with_loading_handler(|loading| match loading {
        ModelLoadingProgress::Downloading {
            source,
            start_time,
            progress,
        } => {
            let elapsed = start_time.elapsed();
            println!("Downloading model from {source}...{progress}% (elapsed {elapsed:?})");
        }
        ModelLoadingProgress::Loading { progress } => {
            println!("Loading model into memory...{progress}%");
        }
    })
    .await
    .unwrap();
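
The callback pattern itself is independent of Kalosm. The sketch below shows the general shape with a hypothetical `LoadEvent` enum and `load_with_handler` function. Neither is part of the Kalosm API, and the integer percentages are an assumption. The loader calls the handler at each step, and the caller decides what to do with each event.

```rust
/// Hypothetical progress events, loosely modeled on the two phases above.
#[derive(Debug, Clone, PartialEq)]
enum LoadEvent {
    Downloading { source: String, percent: u8 },
    Loading { percent: u8 },
}

/// Simulates a loader that reports progress through a caller-supplied closure.
fn load_with_handler(source: &str, mut handler: impl FnMut(LoadEvent)) {
    for percent in [0u8, 50, 100] {
        handler(LoadEvent::Downloading { source: source.to_string(), percent });
    }
    for percent in [0u8, 100] {
        handler(LoadEvent::Loading { percent });
    }
}

fn main() {
    load_with_handler("example/model", |event| match event {
        LoadEvent::Downloading { source, percent } => {
            println!("Downloading model from {source}...{percent}%");
        }
        LoadEvent::Loading { percent } => {
            println!("Loading model into memory...{percent}%");
        }
    });
}
```

Taking an `FnMut` closure lets the handler update state it captures, such as a progress bar or a list of recorded events.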

Whisper transcriptions and Wuerstchen generations are also async with progress info thanks to @newfla:

// Create a new whisper model
let model = WhisperBuilder::default()
    .with_source(WhisperSource::QuantizedDistilLargeV3)
    .build()
    .await.unwrap();

let mic = MicInput::default();
let audio = mic.stream().unwrap();

// Transcribe the source audio into text
let mut text = audio.transcribe(model);

// As the model transcribes the audio, print the text to the console
while let Some(chunk) = text.next().await {
    let text = chunk.as_ref();
    println!("{text}");
    println!(
        "estimated time left to decode chunk: {}s",
        chunk.remaining_time().as_secs()
    );
}
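
A remaining-time figure like the one printed above can be estimated from elapsed time and fractional progress. The sketch below is one simple way to do it, by linear extrapolation. It is an assumption for illustration, not necessarily how `remaining_time()` is computed, and `estimate_remaining` is a hypothetical helper.

```rust
use std::time::Duration;

/// Estimates time left by assuming the work continues at the average rate so far.
/// `progress` is a fraction in (0, 1]; anything else returns `None`.
fn estimate_remaining(elapsed: Duration, progress: f64) -> Option<Duration> {
    if !(progress > 0.0 && progress <= 1.0) {
        return None;
    }
    // If `progress` of the work took `elapsed`, the rest takes
    // elapsed * (1 - progress) / progress.
    Some(elapsed.mul_f64((1.0 - progress) / progress))
}

fn main() {
    let elapsed = Duration::from_secs(10);
    match estimate_remaining(elapsed, 0.25) {
        Some(left) => println!("estimated time left: {}s", left.as_secs()),
        None => println!("no estimate yet"),
    }
}
```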

Documentation improvements

The inline documentation has been significantly improved in 0.3. Common items now include inline guides to help you get started, such as the language page, and concept explanations, such as embeddings.

New models!

Along with the new release, Kalosm supports a few new models:

  • Quantized Whisper models are now supported, with presets for distilled versions of Whisper that run even faster
  • The Phi-3 series of models is supported by kalosm-llama. The Phi series punches above its weight on structured JSON generation tasks

Full changelog

New Contributors

Full Git Diff: v0.2.0...kalosm-0.3.0
