Generation Pipeline

This document explains the step-by-step music generation process in MIDI Sketch.

Pipeline Overview

Phase 1: Structure Building

The generator first creates the song structure based on StructurePattern:

cpp
void Generator::buildStructure() {
    arrangement_ = StructureBuilder::build(params_.structure);
}

Structure Patterns

| Pattern | Bars | Sections |
|---|---|---|
| StandardPop | 24 | A(8)-B(8)-Chorus(8) |
| BuildUp | 28 | Intro(4)-A(8)-B(8)-Chorus(8) |
| DirectChorus | 16 | A(8)-Chorus(8) |
| RepeatChorus | 32 | A(8)-B(8)-Chorus(8)-Chorus(8) |
| FullPop | 56 | Intro-A-B-Chorus-A-B-Chorus-Outro |
| FullWithBridge | 52 | Intro-A-B-Chorus-Bridge-Chorus-Outro |
| Ballad | 56 | Intro(8)-A-B-Chorus-Interlude-B-Chorus-Outro |
| ExtendedFull | 90 | Full form with bridge and extended sections |
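
As an illustration of how a pattern might expand into sections, the sketch below covers the four patterns whose per-section bar counts are fully listed above. The names `SectionSpec`, `Pattern`, `expandPattern`, and `totalBars` are hypothetical and are not taken from the MIDI Sketch source; the real `StructureBuilder::build` may work differently.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified types for illustration only.
enum class SectionType { Intro, A, B, Chorus, Bridge, Interlude, Outro };

struct SectionSpec {
    SectionType type;
    std::uint8_t bars;
};

enum class Pattern { StandardPop, BuildUp, DirectChorus, RepeatChorus };

// Expand a pattern into its ordered sections, using the bar counts from the table.
std::vector<SectionSpec> expandPattern(Pattern p) {
    using T = SectionType;
    switch (p) {
        case Pattern::StandardPop:
            return {{T::A, 8}, {T::B, 8}, {T::Chorus, 8}};
        case Pattern::BuildUp:
            return {{T::Intro, 4}, {T::A, 8}, {T::B, 8}, {T::Chorus, 8}};
        case Pattern::DirectChorus:
            return {{T::A, 8}, {T::Chorus, 8}};
        case Pattern::RepeatChorus:
            return {{T::A, 8}, {T::B, 8}, {T::Chorus, 8}, {T::Chorus, 8}};
    }
    return {};
}

// Sum the section lengths to get the total song length in bars.
int totalBars(const std::vector<SectionSpec>& sections) {
    int total = 0;
    for (const auto& s : sections) total += s.bars;
    return total;
}
```

Summing the expanded sections reproduces the Bars column for these four patterns, which makes a convenient consistency check when adding new patterns.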

Section Types

Each section has properties that affect generation:

cpp
struct Section {
    SectionType type;         // Intro, A, B, Chorus, Bridge, Interlude, Outro
    uint8_t bars;             // Length in bars
    VocalDensity vocal_density;    // Full, Sparse, None
    BackingDensity backing_density; // Normal, Thin, Thick
};

Phase 2: Rhythm Section

Bass Generation

Bass is generated first, as the harmonic foundation for the tracks that follow.

Bass Patterns:

  • Sparse: Quarter notes on beats 1 and 3 (ballad, chill)
  • Standard: Quarter note rhythm with occasional eighths
  • Driving: Eighth note patterns with approach notes
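
The rhythmic skeleton of the three patterns can be sketched as a list of note onsets per bar. This is a simplification under stated assumptions: 4/4 time, 480 ticks per quarter note, and hypothetical names (`BassPattern`, `bassOnsets`). It leaves out the occasional eighths of Standard and the approach notes of Driving.

```cpp
#include <vector>

// Assumed resolution: 480 ticks per quarter note, 4/4 time.
constexpr int kTicksPerBeat = 480;

// Hypothetical enum mirroring the three pattern names above.
enum class BassPattern { Sparse, Standard, Driving };

// Return note onset positions (ticks from the bar start) for one bar.
std::vector<int> bassOnsets(BassPattern pattern) {
    std::vector<int> onsets;
    switch (pattern) {
        case BassPattern::Sparse:    // beats 1 and 3 only
            onsets = {0, 2 * kTicksPerBeat};
            break;
        case BassPattern::Standard:  // one note per quarter
            for (int beat = 0; beat < 4; ++beat)
                onsets.push_back(beat * kTicksPerBeat);
            break;
        case BassPattern::Driving:   // one note per eighth
            for (int eighth = 0; eighth < 8; ++eighth)
                onsets.push_back(eighth * kTicksPerBeat / 2);
            break;
    }
    return onsets;
}
```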

Chord Generation

Chord voicing uses bass analysis for coordination:

cpp
void Generator::generateChord() {
    BassAnalysis bassAnalysis = analyzeBass(song_.bass);
    // Use a rootless voicing when the bass already plays the root on beat 1
    if (bassAnalysis.hasRootOnBeat1) {
        useRootlessVoicing();
    }
}

Voice Leading Algorithm:

  1. Calculate distance between consecutive voicings
  2. Minimize movement (sum of semitone distances)
  3. Maximize common tone retention
  4. Apply inversions to optimize transitions
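
The four steps above can be sketched as a small search over inversions. This is a minimal illustration, not the project's implementation: it assumes close-position voicings with the same number of voices, and the names `Voicing`, `movementCost`, `commonTones`, `inversions`, and `bestNextVoicing` are hypothetical.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

using Voicing = std::vector<int>;  // MIDI pitches, lowest voice first

// Steps 1-2: sum of semitone distances between corresponding voices.
int movementCost(const Voicing& from, const Voicing& to) {
    int cost = 0;
    for (std::size_t i = 0; i < from.size() && i < to.size(); ++i)
        cost += std::abs(from[i] - to[i]);
    return cost;
}

// Step 3: number of pitches in `to` that also appear in `from`.
int commonTones(const Voicing& from, const Voicing& to) {
    int count = 0;
    for (int p : to)
        if (std::find(from.begin(), from.end(), p) != from.end()) ++count;
    return count;
}

// Step 4: all inversions, built by moving the lowest note up an octave.
std::vector<Voicing> inversions(Voicing v) {
    std::vector<Voicing> result;
    for (std::size_t i = 0; i < v.size(); ++i) {
        result.push_back(v);
        v.front() += 12;
        std::rotate(v.begin(), v.begin() + 1, v.end());
    }
    return result;
}

// Pick the inversion with the least movement; break ties by common tones.
Voicing bestNextVoicing(const Voicing& prev, const Voicing& nextRootPosition) {
    std::vector<Voicing> candidates = inversions(nextRootPosition);
    if (candidates.empty()) return nextRootPosition;
    return *std::min_element(candidates.begin(), candidates.end(),
        [&](const Voicing& a, const Voicing& b) {
            const int ca = movementCost(prev, a);
            const int cb = movementCost(prev, b);
            if (ca != cb) return ca < cb;
            return commonTones(prev, a) > commonTones(prev, b);
        });
}
```

For example, moving from a C major triad (60, 64, 67) to F major given in root position an octave lower (53, 57, 60), the search selects the second inversion (60, 65, 69), which keeps C as a common tone and moves the other voices by one and two semitones.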

Drums Generation

Drum patterns are selected based on mood:

| Style | Characteristics | Used By |
|---|---|---|
| Sparse | Half-time feel, minimal | Ballad, Chill |
| Standard | 8th hi-hat, 2&4 snare | StraightPop |
| FourOnFloor | 4-on-floor kick | ElectroPop, IdolPop |
| Upbeat | Syncopated, 16th hi-hat | BrightUpbeat |
| Rock | Ride cymbal, crash accents | LightRock |
| Synth | Tight 16th hi-hat | Yoasobi, Synthwave |
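
The mood-to-style mapping in the table can be written as a simple lookup. The enum and function names below (`Mood`, `DrumStyle`, `drumStyleFor`) are assumptions made for this sketch, and only the moods named in the table are included; the project may define more.

```cpp
// Hypothetical enums covering only the moods and styles listed in the table.
enum class Mood {
    Ballad, Chill, StraightPop, ElectroPop, IdolPop,
    BrightUpbeat, LightRock, Yoasobi, Synthwave
};

enum class DrumStyle { Sparse, Standard, FourOnFloor, Upbeat, Rock, Synth };

// Select a drum style from the mood, following the "Used By" column.
DrumStyle drumStyleFor(Mood mood) {
    switch (mood) {
        case Mood::Ballad:
        case Mood::Chill:
            return DrumStyle::Sparse;
        case Mood::StraightPop:
            return DrumStyle::Standard;
        case Mood::ElectroPop:
        case Mood::IdolPop:
            return DrumStyle::FourOnFloor;
        case Mood::BrightUpbeat:
            return DrumStyle::Upbeat;
        case Mood::LightRock:
            return DrumStyle::Rock;
        case Mood::Yoasobi:
        case Mood::Synthwave:
            return DrumStyle::Synth;
    }
    return DrumStyle::Standard;  // fallback chosen for this sketch only
}
```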

Fill Generation:

  • Tom descend/ascend patterns
  • Snare rolls
  • Combination fills at section transitions
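
As one concrete case, a descending tom fill might be sketched as four sixteenth notes on the last beat of the bar before a section change. The placement, velocities, 480-ticks-per-beat resolution, and the names `DrumHit` and `tomDescendFill` are all assumptions for illustration; the note numbers are the General MIDI percussion keys for High Tom (50), Low-Mid Tom (47), Low Tom (45), and High Floor Tom (43).

```cpp
#include <vector>

// Hypothetical drum event type for this sketch.
struct DrumHit {
    int tick;
    int note;
    int velocity;
};

constexpr int kTicksPerBeat = 480;  // assumed resolution, 4/4 time

// Four descending toms as sixteenth notes on beat 4 of the given bar.
std::vector<DrumHit> tomDescendFill(int barStartTick) {
    constexpr int toms[] = {50, 47, 45, 43};  // GM toms, high to low
    const int fillStart = barStartTick + 3 * kTicksPerBeat;
    const int step = kTicksPerBeat / 4;
    std::vector<DrumHit> hits;
    for (int i = 0; i < 4; ++i) {
        // Slight crescendo into the next section (an arbitrary choice here).
        hits.push_back({fillStart + i * step, toms[i], 90 + 5 * i});
    }
    return hits;
}
```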

Phase 3: Melody Generation

Vocal Track

The vocal generator is the most complex in the pipeline and uses phrase caching.

Vocal Attitudes:

| Attitude | Characteristics |
|---|---|
| Clean | Chord tones only, on-beat rhythms |
| Expressive | Tensions with delayed resolution, slight timing deviation |
| Raw | Non-chord tones, phrase boundary breaking |

Non-Chord Tones:

  • 4-3 suspensions
  • Anticipations
  • Passing tones
  • Neighbor tones
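
To make one of these concrete, the sketch below inserts a passing tone between consecutive melody notes that lie a third apart. It works in C major, matching the key used during generation, and handles pitches only (no rhythm). The function names `inCMajor` and `addPassingTones` are hypothetical, and the real vocal generator is likely more selective about where it places non-chord tones.

```cpp
#include <cstdlib>
#include <vector>

// True if the pitch belongs to the C major scale.
bool inCMajor(int pitch) {
    static const bool scale[12] = {true, false, true, false, true, true,
                                   false, true, false, true, false, true};
    return scale[((pitch % 12) + 12) % 12];
}

// Insert a diatonic passing tone between notes a minor or major third apart.
std::vector<int> addPassingTones(const std::vector<int>& melody) {
    std::vector<int> out;
    for (std::size_t i = 0; i < melody.size(); ++i) {
        out.push_back(melody[i]);
        if (i + 1 == melody.size()) break;
        const int interval = melody[i + 1] - melody[i];
        if (std::abs(interval) == 3 || std::abs(interval) == 4) {
            const int dir = interval > 0 ? 1 : -1;
            // Walk toward the next note and take the first scale tone found.
            for (int p = melody[i] + dir; p != melody[i + 1]; p += dir) {
                if (inCMajor(p)) {
                    out.push_back(p);
                    break;
                }
            }
        }
    }
    return out;
}
```

Applied to the chord tones C-E-G (60, 64, 67), this yields C-D-E-F-G (60, 62, 64, 65, 67).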

Motif Track (BackgroundMotif style)

Generates repeating patterns:

cpp
MotifParams params {
    .length = MotifLength::TwoBars,    // 2 or 4 bars
    .rhythm_density = RhythmDensity::Medium,
    .motion = MotifMotion::Stepwise,
    .repeat_scope = RepeatScope::FullSong
};
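
A `FullSong` repeat scope could be realized by tiling the motif across the whole arrangement. The sketch below assumes 4/4 time at 480 ticks per quarter note and uses hypothetical names (`Note`, `repeatMotif`); it does not adapt the motif to the underlying chords, which a real generator would probably need to do.

```cpp
#include <vector>

// Hypothetical note type for this sketch.
struct Note {
    int tick;      // onset, in ticks from the song start
    int duration;  // length in ticks
    int pitch;     // MIDI pitch
};

constexpr int kTicksPerBar = 1920;  // 4/4 at 480 ticks per quarter (assumed)

// Tile a motif that is motifBars long across a song of totalBars bars.
std::vector<Note> repeatMotif(const std::vector<Note>& motif,
                              int motifBars, int totalBars) {
    std::vector<Note> out;
    if (motifBars <= 0) return out;
    const int motifTicks = motifBars * kTicksPerBar;
    const int endTick = totalBars * kTicksPerBar;
    for (int start = 0; start < endTick; start += motifTicks) {
        for (const Note& n : motif) {
            const int tick = start + n.tick;
            if (tick >= endTick) continue;  // drop notes past the song end
            out.push_back({tick, n.duration, n.pitch});
        }
    }
    return out;
}
```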

Arpeggio Track (SynthDriven style)

Generates arpeggiated patterns:

cpp
ArpeggioParams params {
    .pattern = ArpeggioPattern::UpDown,
    .speed = ArpeggioSpeed::Sixteenth,
    .octave_range = 2,
    .gate = 0.5f  // Note length ratio
};
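
To show how these parameters could interact, here is a minimal sketch of an up-down arpeggio at sixteenth-note speed. It assumes 480 ticks per quarter note and an up-down cycle that does not repeat its top and bottom notes; `ArpNote`, `upDownCycle`, and `arpeggiateBar` are hypothetical names, not the project's API.

```cpp
#include <vector>

// Hypothetical note type for this sketch.
struct ArpNote {
    int tick;
    int duration;
    int pitch;
};

// Build one up-down pitch cycle spanning octaveRange octaves.
// The top and bottom notes are not repeated at the turnaround.
std::vector<int> upDownCycle(const std::vector<int>& chord, int octaveRange) {
    std::vector<int> up;
    for (int oct = 0; oct < octaveRange; ++oct)
        for (int p : chord) up.push_back(p + 12 * oct);
    std::vector<int> cycle = up;
    for (int i = static_cast<int>(up.size()) - 2; i >= 1; --i)
        cycle.push_back(up[i]);
    return cycle;
}

// One 4/4 bar of sixteenth notes; gate is the sounding fraction of each step.
std::vector<ArpNote> arpeggiateBar(const std::vector<int>& chord,
                                   int octaveRange, float gate) {
    constexpr int kStep = 120;  // sixteenth note at 480 ticks per quarter
    std::vector<ArpNote> notes;
    const std::vector<int> cycle = upDownCycle(chord, octaveRange);
    if (cycle.empty()) return notes;
    for (int i = 0; i < 16; ++i) {
        notes.push_back({i * kStep, static_cast<int>(kStep * gate),
                         cycle[i % cycle.size()]});
    }
    return notes;
}
```

With a gate of 0.5, each sixteenth step of 120 ticks sounds for 60 ticks, giving a staccato feel; a gate of 1.0 would produce fully legato steps.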

Phase 4: Polish

Transition Dynamics

Energy transitions between sections are applied automatically:

Section Energy Multipliers:

| Section | Multiplier |
|---|---|
| Intro | 0.75 |
| A | 0.85 |
| B | 1.00 |
| Chorus | 1.20 |
| Bridge | 0.90 |
| Outro | 0.80 |
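
One plausible reading is that each multiplier scales note velocity within its section. The sketch below makes that assumption explicit; the source does not say which quantity the multiplier applies to, and `energyMultiplier` and `scaleVelocity` are hypothetical names. Results are clamped to the MIDI velocity range 1-127.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Section types that have a multiplier in the table above.
enum class SectionType { Intro, A, B, Chorus, Bridge, Outro };

// Multipliers copied from the table.
float energyMultiplier(SectionType type) {
    switch (type) {
        case SectionType::Intro:  return 0.75f;
        case SectionType::A:      return 0.85f;
        case SectionType::B:      return 1.00f;
        case SectionType::Chorus: return 1.20f;
        case SectionType::Bridge: return 0.90f;
        case SectionType::Outro:  return 0.80f;
    }
    return 1.0f;
}

// Assumption: the multiplier scales velocity. Clamp to the range 1..127.
std::uint8_t scaleVelocity(std::uint8_t velocity, SectionType type) {
    const long scaled = std::lround(velocity * energyMultiplier(type));
    return static_cast<std::uint8_t>(std::min(std::max(scaled, 1L), 127L));
}
```

A velocity of 100 becomes 75 in an Intro and 120 in a Chorus, while a velocity of 120 in a Chorus would reach 144 and is clamped to 127.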

Humanization

Adds natural variation to timing and velocity:

cpp
void applyHumanization(Song& song, float intensity) {
    // Timing: random offset ±ms
    // Velocity: random ±value
    // Not applied to drums
}
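
A minimal sketch of what such a function might do for a single non-drum track is shown below. It is not the project's implementation: it jitters timing in ticks rather than milliseconds, the maximum deviations are invented for the example, and `Note`, `humanize`, `kMaxTickJitter`, and `kMaxVelocityJitter` are hypothetical names. The caller would skip the drum track, as noted above.

```cpp
#include <algorithm>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical note type for this sketch.
struct Note {
    int tick;
    std::uint8_t velocity;
};

// Invented maximum deviations at intensity 1.0.
constexpr int kMaxTickJitter = 10;
constexpr int kMaxVelocityJitter = 8;

// Apply random timing and velocity offsets scaled by intensity (0.0 to 1.0).
void humanize(std::vector<Note>& notes, float intensity, std::mt19937& rng) {
    intensity = std::min(std::max(intensity, 0.0f), 1.0f);
    const int tickRange = static_cast<int>(kMaxTickJitter * intensity);
    const int velRange = static_cast<int>(kMaxVelocityJitter * intensity);
    std::uniform_int_distribution<int> tickDist(-tickRange, tickRange);
    std::uniform_int_distribution<int> velDist(-velRange, velRange);
    for (Note& n : notes) {
        // Keep onsets non-negative and velocities inside 1..127.
        n.tick = std::max(0, n.tick + tickDist(rng));
        const int v = n.velocity + velDist(rng);
        n.velocity = static_cast<std::uint8_t>(std::min(std::max(v, 1), 127));
    }
}
```

Passing the random engine in explicitly keeps the result reproducible for a given seed, which is useful when the same song must be regenerated identically.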

MIDI Output

Finally, the Song is converted to a Standard MIDI File (SMF) Type 1:

Track Mapping:

| Track | Channel | Program |
|---|---|---|
| Vocal | 0 | 0 (Piano) |
| Chord | 1 | 4 (E.Piano) |
| Bass | 2 | 33 (E.Bass) |
| Motif | 3 | 81 (Synth Lead) |
| Arpeggio | 4 | 81 (Saw Lead) |
| Drums | 9 | GM Drums |
| SE | 15 | Text events |
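
For the melodic tracks, the channel and program columns translate directly into MIDI Program Change messages: a status byte of `0xC0` combined with the zero-based channel, followed by the program number. The sketch below encodes that mapping; `TrackSetup`, `kTrackSetups`, and `programChange` are hypothetical names, and how MIDI Sketch actually emits these events is not shown in this document.

```cpp
#include <array>
#include <cstdint>

// Hypothetical per-track configuration for this sketch.
struct TrackSetup {
    const char* name;
    std::uint8_t channel;  // zero-based MIDI channel
    std::uint8_t program;  // program number as listed in the table
};

// Melodic tracks from the table. Drums (channel 9) and SE (text events)
// are omitted because the table lists no program number for them.
constexpr std::array<TrackSetup, 5> kTrackSetups = {{
    {"Vocal", 0, 0},
    {"Chord", 1, 4},
    {"Bass", 2, 33},
    {"Motif", 3, 81},
    {"Arpeggio", 4, 81},
}};

// Encode a Program Change message: status 0xC0 | channel, then the program.
std::array<std::uint8_t, 2> programChange(const TrackSetup& t) {
    return {static_cast<std::uint8_t>(0xC0 | (t.channel & 0x0F)),
            static_cast<std::uint8_t>(t.program & 0x7F)};
}
```

For the Bass track this produces the two bytes `0xC2 0x21` (program 33 on channel 2).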

Key Transposition

All generation happens in C major. Final transposition is applied at output:

cpp
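uint8_t MidiWriter::transposePitch(uint8_t pitch, Key key) {
    // Widen to int so the sum cannot wrap, then clamp to the top of the MIDI range.
    int transposed = pitch + static_cast<int>(key);
    return static_cast<uint8_t>(transposed > 127 ? 127 : transposed);
}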

Released under the MIT License.