pySLGR Package

Modules

pyslgr.GMMModel
pyslgr.LLFeatures
pyslgr.LLSignal
pyslgr.MFCCFeatures
pyslgr.FeatPipe
pyslgr.iVector
pyslgr.GSV
pyslgr.sad
pyslgr.Scores

pyslgr.GMMModel module

class pyslgr.GMMModel.GMMModel

Bases: object

icov()

Return the diagonal of the GMM inverse covariance matrix as a numpy array.

is_loaded()
load()

Load a GMM model.

load(model_file_name)

Parameters:
model_file_name
: string
File name of model to load
mean()

Return the GMM mean vector as a numpy array

num_fea()
num_mix()
score_models()

Score features against a list of GMM models.

score_models(LLFeatures f, list models, int topM, bool use_shortfall, float sf_delta)

Parameters:
f
: LLFeatures
Feature input
models
: list
List of GMM models to score
topM
: int
Number of Gaussians to score per frame (default: 5)
use_shortfall
: bool
Use shortfall method when evaluating Gaussian mixture model (default: True)
sf_delta
: float
Shortfall delta for pruning (default: 10.0)
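The topM parameter limits how many mixture components contribute to each frame's likelihood. The following is a minimal pure-Python sketch of that idea for a diagonal-covariance GMM, where icov holds inverse variances as returned by icov(). It is not pyslgr's implementation: it does not model the shortfall pruning controlled by use_shortfall and sf_delta, the function names are hypothetical, and averaging frame log-likelihoods into an utterance score is an assumption.

```python
import math


def log_gauss_diag(x, mean, icov):
    """Log density of a diagonal-covariance Gaussian; icov holds inverse variances."""
    d = len(x)
    log_det_icov = sum(math.log(v) for v in icov)
    mahal = sum((xi - mi) ** 2 * iv for xi, mi, iv in zip(x, mean, icov))
    return 0.5 * (log_det_icov - d * math.log(2.0 * math.pi) - mahal)


def frame_loglik_topm(x, weights, means, icovs, top_m=5):
    """Approximate log p(x) from only the top_m highest-scoring components."""
    comp = [math.log(w) + log_gauss_diag(x, m, ic)
            for w, m, ic in zip(weights, means, icovs)]
    best = sorted(comp, reverse=True)[:top_m]
    peak = best[0]
    # Log-sum-exp over the retained components, shifted by the peak for stability.
    return peak + math.log(sum(math.exp(c - peak) for c in best))


def utterance_score(frames, weights, means, icovs, top_m=5):
    """Average top-M frame log-likelihood over all frames (hypothetical scoring rule)."""
    total = sum(frame_loglik_topm(x, weights, means, icovs, top_m) for x in frames)
    return total / len(frames)
```

Dropping components can only lower the log-sum-exp, so a smaller top_m gives a cheaper, slightly pessimistic estimate.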
suff_stats()
class pyslgr.GMMModel.GMMSAD

Bases: object

Class to process a signal and produce SAD marks using a GMM

GMMSAD(feat_config, gmm_models, label_keep, label_window, min_gap_dur=0.5, min_seg_dur=0.2) or GMMSAD(config), where config is a JSON string or dictionary.

Parameters:
feat_config
: string
Configuration string from a JSON object – see MFCCFeatures.process for more details
model_dir
: string
Base directory where models are stored
gmm_models
: dict
Dictionary of models for GMMSAD scoring. Typical keys are ‘speech’, ‘music’, ‘nonspeech’.
label_keep
: string
The key of the model to keep (default: ‘speech’)
label_window
: int
Window length for smoothing frame scores
min_gap_dur
: float (default 0.5 seconds)
Minimum gap between segments – segments are combined if the gap is smaller than this time
min_seg_dur
: float (default 0.2 seconds)
Minimum segment duration – ignore segments below this duration
process()

Process an LLSignal and return a list of tuples. Each tuple holds the start time and duration, in seconds, of a segment of the detected class.

process(signal, f)

Parameters:
signal
: LLSignal
Input signal
f
: LLFeatures
Feature object (not used, but available for uniformity)
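The min_gap_dur and min_seg_dur rules can be sketched as a post-processing pass over (start, duration) tuples. This sketch assumes gaps are merged before short segments are discarded; the order GMMSAD actually uses is not stated above, and postprocess_segments is a hypothetical name.

```python
def postprocess_segments(segments, min_gap_dur=0.5, min_seg_dur=0.2):
    """Merge segments separated by short gaps, then drop short segments.

    segments: list of (start, duration) tuples in seconds.
    """
    merged = []
    for start, dur in sorted(segments):
        if merged:
            prev_start, prev_dur = merged[-1]
            prev_end = prev_start + prev_dur
            if start - prev_end < min_gap_dur:
                # Gap too small: extend the previous segment to cover this one.
                new_end = max(prev_end, start + dur)
                merged[-1] = (prev_start, new_end - prev_start)
                continue
        merged.append((start, dur))
    # Discard anything still shorter than the minimum segment duration.
    return [(s, d) for s, d in merged if d >= min_seg_dur]
```

For example, segments (0.0, 1.0) and (1.2, 0.5) are 0.2 s apart and merge into one segment of 1.7 s, while an isolated 0.1 s segment is dropped.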

pyslgr.LLFeatures module

class pyslgr.LLFeatures.LLFeatures

Bases: object

accel()

Calculate acceleration values from delta values.

accel(accel_spread)

Parameters:
accel_spread
: int
Acceleration at time t is calculated using the frames t-k,...,t,...,t+k where k is ‘accel_spread’
apply_sad()

Given that speech activity detection has been calculated or loaded, remove frames corresponding to non-speech.

apply_sad()
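As a minimal illustration of what removing non-speech frames means, assuming one 0/1 label per feature vector (apply_sad_sketch is a hypothetical helper, not part of pyslgr):

```python
def apply_sad_sketch(frames, labels):
    """Keep only the feature vectors whose SAD label is 1 (speech)."""
    if len(frames) != len(labels):
        raise ValueError("expected one SAD label per feature vector")
    return [vec for vec, lab in zip(frames, labels) if lab == 1]


print(apply_sad_sketch([[0.1], [0.2], [0.3]], [1, 0, 1]))  # [[0.1], [0.3]]
```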

delta()

Calculate delta values from base features.

delta(delta_spread)

Parameters:
delta_spread
: int
Delta at time t is calculated using the frames t-k,...,t,...,t+k where k is ‘delta_spread’
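A common way to compute deltas over frames t-k, ..., t+k is the regression formula d_t = sum_{n=1..k} n (c_{t+n} - c_{t-n}) / (2 sum_{n=1..k} n^2). The sketch below assumes that formula and replicates the first and last frames at the edges; pyslgr's exact weighting and edge handling may differ.

```python
def deltas(frames, k=2):
    """Regression deltas over frames t-k..t+k, replicating edge frames."""
    n = len(frames)
    denom = 2 * sum(i * i for i in range(1, k + 1))
    out = []
    for t in range(n):
        row = []
        for j in range(len(frames[t])):
            acc = 0.0
            for i in range(1, k + 1):
                ahead = frames[min(t + i, n - 1)][j]   # clamp at the last frame
                behind = frames[max(t - i, 0)][j]      # clamp at the first frame
                acc += i * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out
```

On a linear ramp the interior deltas equal the slope. Since accel() is described as acceleration computed from delta values, applying the same function to its own output gives a rough stand-in for acceleration features.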
delta2point()

Calculate delta values from base features using only 2 values.

delta2point(delta_spread)

Parameters:
delta_spread
: int
Delta at time t is calculated using the frames t-k and t+k where k is ‘delta_spread’
feat_norm()

Normalize each feature individually to zero mean and unit variance across all frames.

feat_norm()
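Per-feature normalization can be sketched in pure Python as follows. It uses the population variance (dividing by the number of frames) and leaves constant features at zero; both choices are assumptions rather than documented pyslgr behavior.

```python
import math


def feat_norm(frames):
    """Normalize each feature dimension to zero mean and unit variance."""
    n = len(frames)
    dim = len(frames[0])
    out = [list(f) for f in frames]
    for j in range(dim):
        col = [f[j] for f in frames]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        std = math.sqrt(var) if var > 0 else 1.0  # guard against constant features
        for t in range(n):
            out[t][j] = (frames[t][j] - mean) / std
    return out
```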

load_raw()

Load a file of raw floats into a feature store. Assumes all features are base features.

load_raw(filename, num_feat)

Parameters:
filename
: string
Name of file to load
num_feat
: int
Number of base features (dimension of feature vector)
load_sad_labels()

Load a file of SAD labels (0/1). Assumes whitespace separation between labels.

load_sad_labels(filename, sloppy)

Parameters:
filename
: string
Name of file to load
sloppy
: bool
Complain if the number of labels doesn’t match the number of feature vectors (default: True)
num_base_feat()

Return the number of base features in each vector. Base features do not include post-processing such as delta, sdc, or acceleration.

num_base_feat() -> int

num_outfeat()

Return the number of output features in each vector. Output features are set with the set_outfeat() method.

num_outfeat() -> int

num_total_feat()
num_vec()

Return the number of feature vectors.

num_vec() -> int

rasta()

Apply RASTA to all features.

rasta()

sad_labels()

Return the current SAD labels (0/1) per frame.

save_raw()

Save features as raw floats to ‘filename’. The features saved are set by ‘set_outfeat’.

save_raw(filename)

Parameters:
filename
: string
Name of file to save features in
save_sad_labels()

Save SAD labels (0/1) to file.

save_sad_labels(filename)

Parameters:
filename
: string
Name of output file
sdc()

Compute shifted-delta cepstral (SDC) features, typically used for language recognition. Note: sdc() uses the existing delta features, which must be calculated before it is invoked.

sdc(sdc_p, sdc_k)

Parameters:
sdc_p
: int
p value: shift between delta blocks (typical value: 3)
sdc_k
: int
k value: number of delta blocks to stack (typical value: 7)
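The stacking controlled by sdc_p and sdc_k can be sketched as follows: for frame t, concatenate the delta vectors at t, t+p, ..., t+(k-1)p. Frames beyond the end are clamped to the last frame here, which is an assumption; implementations differ in how they treat the edges.

```python
def sdc(deltas, p=3, k=7):
    """Stack k delta blocks spaced p frames apart: d(t), d(t+p), ..., d(t+(k-1)p)."""
    n = len(deltas)
    out = []
    for t in range(n):
        row = []
        for i in range(k):
            # Clamp indices that run past the end of the utterance (assumed behavior).
            row.extend(deltas[min(t + i * p, n - 1)])
        out.append(row)
    return out
```

The output dimension is k times the delta dimension, so 7-dimensional deltas with k = 7 yield 49 SDC features per frame.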
set_outfeat()

Set the features to output for typical operations. The value can be changed at any time. Order in the parameter string determines the stacking order. By default the ‘outfeat’ is set to ‘all’ when the feature object is created.

set_outfeat(outfeat)

Parameters:
outfeat
: string
Set the output features. If outfeat==‘all’, then all base features and calculated features are returned (except energy). Otherwise, ‘outfeat’ is read one character at a time and the corresponding features are stacked in that order: ‘f’ base features, ‘d’ delta features, ‘a’ acceleration features, ‘e’ energy, ‘s’ SDC features. E.g., set_outfeat(‘fd’) places the base features at indices 0, ..., num_base_feat-1 and the delta features at indices num_base_feat, ..., num_outfeat-1.
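The stacking order set by outfeat determines where each block lands in the output vector. The hypothetical helper below computes those index ranges from an outfeat string. Block sizes are supplied by the caller: the base and delta widths are assumed to be num_base_feat each, while the widths of the energy and SDC blocks are not stated above.

```python
def outfeat_layout(outfeat, block_sizes):
    """Map each feature-type letter in outfeat to its [start, stop) index range."""
    layout = {}
    start = 0
    for code in outfeat:
        size = block_sizes[code]
        layout[code] = (start, start + size)
        start += size
    return layout


# Example: 12 base features followed by 12 delta features.
print(outfeat_layout("fd", {"f": 12, "d": 12}))  # {'f': (0, 12), 'd': (12, 24)}
```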
xtalk()

Energy-based speech activity detection (xtalk).

xtalk(abs_min_energy, thresh, med_len=1)

Parameters:
abs_min_energy
: float
Frames with energy below this threshold are non-speech. Typical values: -10 or 0.
thresh
: float
Frames above this threshold trigger speech activity (the algorithm is adaptive).
med_len
: int (default 1)
Length of the median filter used to smooth the activity decisions. Larger values produce less abrupt changes in speech activity.
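The med_len smoothing step can be illustrated with a sliding median over 0/1 frame labels. This sketch covers only the smoothing, not xtalk's adaptive energy thresholding; it assumes an odd med_len and shrinks the window at the edges.

```python
def median_smooth(labels, med_len=1):
    """Median-filter a 0/1 activity sequence with a window of med_len frames."""
    if med_len <= 1:
        return list(labels)
    half = med_len // 2
    n = len(labels)
    out = []
    for t in range(n):
        window = sorted(labels[max(0, t - half):min(n, t + half + 1)])
        out.append(window[len(window) // 2])
    return out


print(median_smooth([0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1], med_len=3))
# [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
```

The isolated speech frame at index 2 is removed and the one-frame gap at index 8 is filled, which is the "less abrupt" behavior described for larger med_len.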

pyslgr.LLSignal module

class pyslgr.LLSignal.LLSignal

Bases: object

Class to contain and process one-dimensional signals, typically speech or audio.

LLSignal() – empty signal with zero samples.

get_f0()

Find the fundamental frequency f0 (“pitch”) from the signal using the Entropic algorithm.

get_f0(min_f0, max_f0, window_dur, frame_step) -> np.array(dtype=float)

Parameters:
min_f0
: float
Minimum allowed fundamental frequency in Hz (e.g., 100)
max_f0
: float
Maximum allowed fundamental frequency in Hz (e.g., 650)
window_duration
: float
Window duration in seconds (e.g., 0.010 – 10 milliseconds)
frame_step
: float
Increment of window position (e.g., 0.002 – 2 milliseconds)
length()

Length of the signal in samples.

x.length() -> int

load_pcm_wav()

Load a PCM-encoded Microsoft WAV file into a single-channel signal.

load_pcm_wav(filename, sum_channels=True)

Parameters:
filename
: string
Path of file to load
sum_channels
: boolean (default True)
If True, sum the channels when more than one is present; if False, an error is raised for multi-channel files.
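The effect of sum_channels can be sketched with Python's standard wave module: interleaved channel samples are summed frame by frame, or an error is raised when summing is disabled. This sketch assumes 16-bit samples and uses a hypothetical helper name; it does not reproduce pyslgr's loader.

```python
import array
import sys
import wave


def read_wav_summed(source, sum_channels=True):
    """Read 16-bit PCM WAV data (path or file object) as floats plus sampling rate."""
    with wave.open(source, "rb") as w:
        nch = w.getnchannels()
        if w.getsampwidth() != 2:
            raise ValueError("only 16-bit PCM is handled in this sketch")
        if nch > 1 and not sum_channels:
            raise ValueError("multi-channel file and sum_channels=False")
        fs = w.getframerate()
        samples = array.array("h", w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    # Samples are interleaved by channel; sum each group of nch samples.
    x = [float(sum(samples[i:i + nch])) for i in range(0, len(samples), nch)]
    return x, fs
```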
load_raw_short()

Load a single-channel, PCM-encoded file of short ints with no header.

load_raw_short(filename, sampling_frequency)

Parameters:
filename
: string
Path to file to load
sampling_frequency
: int
Sampling frequency of the file in Hz (e.g., 8000 for 8 kHz)
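Because a headerless file carries no sampling rate, the caller must supply sampling_frequency, which is what turns a sample count into a duration. A minimal sketch, assuming little-endian 16-bit samples (the byte order is not stated above) and a hypothetical function name:

```python
import array
import sys


def read_raw_short(path, sampling_frequency):
    """Read headerless 16-bit PCM samples; return (samples, duration_seconds)."""
    samples = array.array("h")
    with open(path, "rb") as fh:
        samples.frombytes(fh.read())
    if sys.byteorder == "big":
        samples.byteswap()  # assume the file itself is little-endian
    return list(samples), len(samples) / sampling_frequency
```

For example, 8000 samples at sampling_frequency=8000 correspond to one second of audio.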
load_sph()

Load a NIST SPHERE file. The channel number is 0 or 1; use 0 for single-channel files.

load_sph(filename, channel_num)

Parameters:
filename
: string
Path to file to load
channel_num
: int
Channel number to load (0 or 1)
normalize()

Normalize the amplitude of the waveform to the full 16-bit range.

normalize()

preemphasis()

Perform pre-emphasis on the waveform; i.e., filter with H(z) = 1 - alpha*z^(-1), so that y[n] = x[n] - alpha*x[n-1].

preemphasis(alpha)

Parameters:
alpha
: float
Pre-emphasis coefficient
remove_mean()

Remove the mean of the signal.

remove_mean()

resample_16k()

Resample the signal to a 16 kHz sampling rate. Note: The resample_init() method must be called before calling this method. Also, if the sample rate is below 16 kHz, no operation is performed.

resample_16k()

resample_8k()

Resample the signal to an 8 kHz sampling rate. Note: The resample_init() method must be called before calling this method.

resample_8k()

sampling_frequency()

Return the sampling frequency of the currently loaded signal.

sampling_frequency()

save_pcm_wav()

Save the current signal in pcm-encoded Microsoft wav file format.

save_pcm_wav(filename, scale=False)

Parameters:
filename
: string
Path of file to save
scale
: boolean (default False)
Scale the output to full 16-bit range when saving
save_raw_short()

Save the current signal as short ints with no header.

save_raw_short(filename, clip, scale=False)

Parameters:
filename
: string
Path of file to save
clip
: boolean
Clip the output if it is greater than the largest 16-bit value
scale
: boolean (default False)
Scale the output to full 16-bit range when saving
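One plausible reading of the clip and scale options is sketched below: scale multiplies the signal so its peak magnitude reaches 32767, and clip saturates out-of-range values instead of failing. Both interpretations, the little-endian output, and the function name are assumptions, not pyslgr's documented behavior.

```python
import array
import sys


def to_short_bytes(x, clip=True, scale=False):
    """Convert float samples to little-endian 16-bit PCM bytes."""
    if scale:
        peak = max((abs(v) for v in x), default=0.0)
        if peak > 0:
            x = [v * 32767.0 / peak for v in x]  # peak maps to the largest 16-bit value
    out = array.array("h")
    for v in x:
        s = int(round(v))
        if clip:
            s = max(-32768, min(32767, s))  # saturate to the 16-bit range
        out.append(s)  # raises OverflowError if out of range and clip=False
    if sys.byteorder == "big":
        out.byteswap()
    return out.tobytes()
```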

pyslgr.MFCCFeatures module

class pyslgr.MFCCFeatures.MFCCFeatures

Bases: pyslgr.LLFeatures.LLFeatures

config String or dictionary containing configuration parameters for MFCCs.
Parameters in the config are:

alpha Warping factor for bilinear method (no warping: 1.0)
dither 0/1 - Add low level noise to the signal (typical: 1)
fb_low Lowest filter bank frequency in Hz (typical: 300)
fb_hi Highest filter bank frequency in Hz (typical: 3140)
fb_only 0/1 - Produce the raw filter bank outputs instead of cepstral coefficients
keep_c0 0/1 - Keep the c0 cepstral coefficient; c0 represents frame energy (typical: 0)
linear true/false - linear or mel-warped scale for filter banks (typical: false)
num_cep int - number of cepstral coefficients to output, starting from c1 (typical: 7-19)
tgt_num_filt int - number of filters across the entire bandwidth; only applied for linear=true
win_inc_ms int - window increment in milliseconds (typical: 10)
win_len_ms int - window length in milliseconds (typical: 20-30)
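An illustrative configuration dictionary using the keys above, with values taken from the typical figures listed (num_cep and win_len_ms are picked from the quoted ranges; tgt_num_filt is omitted because linear is false). json.dumps stands in for the library's own MFCCFeatures.config_dict_to_str, and whether pyslgr needs any other formatting is an assumption. The num_frames helper shows the usual framing arithmetic for win_len_ms and win_inc_ms without end padding, which may not match pyslgr's framing exactly.

```python
import json

# Illustrative MFCC configuration built from the keys and typical values listed above.
mfcc_config = {
    "alpha": 1.0,        # no frequency warping
    "dither": 1,
    "fb_low": 300,       # Hz
    "fb_hi": 3140,       # Hz
    "fb_only": 0,        # produce cepstra, not raw filter bank outputs
    "keep_c0": 0,
    "linear": False,     # mel-warped filter banks, so tgt_num_filt is omitted
    "num_cep": 19,
    "win_inc_ms": 10,
    "win_len_ms": 20,
}

# Stand-in for MFCCFeatures.config_dict_to_str: serialize the dict as JSON.
config_str = json.dumps(mfcc_config)


def num_frames(duration_ms, win_len_ms, win_inc_ms):
    """Number of whole analysis windows in a signal, assuming no end padding."""
    if duration_ms < win_len_ms:
        return 0
    return (duration_ms - win_len_ms) // win_inc_ms + 1


print(num_frames(1000, mfcc_config["win_len_ms"], mfcc_config["win_inc_ms"]))  # 99
```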

static config_dict_to_str()

Convert a feature configuration dictionary to a JSON string for processing.

duration()

Return the duration that the MFCC data spans in seconds.

duration() -> float seconds

static get_lid_config()

Return the default language ID configuration as a dictionary. The entries may be modified or used as is; in either case, call the static method config_dict_to_str(config) before processing a signal with this configuration.

static get_sid_config()

Return the default speaker ID configuration as a dictionary. The entries may be modified or used as is; in either case, call the static method config_dict_to_str(config) before processing a signal with this configuration.

get_win_inc_ms()

Return the window increment in milliseconds.

get_win_inc_ms()

load_sad_marks()

Load SAD marks from a file or a list.

load_sad_marks(src)

Parameters:
src
: string or list of tuples
Name of an input file, or a list of (start, duration) tuples in seconds, e.g., [(0.0, 1.0), (2.0, 1.5)]
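SAD marks in seconds relate to per-frame labels through the window increment. The hypothetical helper below converts marks to 0/1 labels, assuming frame t starts at t * win_inc_ms and rounding mark boundaries to the nearest frame; pyslgr's own conversion may differ.

```python
def marks_to_labels(marks, num_frames, win_inc_ms=10):
    """Convert (start, duration) SAD marks in seconds to per-frame 0/1 labels."""
    labels = [0] * num_frames
    step = win_inc_ms / 1000.0                      # frame step in seconds
    for start, dur in marks:
        first = int(round(start / step))            # first speech frame
        last = int(round((start + dur) / step))     # one past the last speech frame
        for t in range(max(first, 0), min(last, num_frames)):
            labels[t] = 1
    return labels


labels = marks_to_labels([(0.0, 1.0), (2.0, 1.5)], num_frames=400)
print(sum(labels))  # 250
```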
process()

Process the signal to return mel-frequency cepstral coefficient (MFCC) features.

process(signal) -> features

Parameters:
signal
: LLSignal
Input signal
save_sad_marks()

Save SAD marks to a file.

save_sad_marks(filename)

Parameters:
filename
: string
Name of output file

pyslgr.FeatPipe module

class pyslgr.FeatPipe.FeatPipe(config, featClass, sadClass)

Bases: object

Implementation of a full feature extraction pipeline.

config : a dictionary of config parameters with two main keys, ‘pipe_config’ and ‘sad_config’.
config[‘pipe_config’] has keys:

accel_spread int
delta_spread int
delta2point True/False
do_accel True/False
do_delta True/False
do_rasta True/False
do_feat_norm True/False
do_sdc True/False
outfeat string to pass to set_outfeat
feat_config dictionary to pass directly into LLFeatures object
sdc_params a tuple to pass to sdc – typically (3,7)

config[‘sad_config’] is passed directly to the sadClass

featClass : an LLFeatures compatible class
sadClass : a class with constructor sadClass(config) and method sadClass.process(LLSignal x, LLFeatures f)
process(x)

Extract features.

x
: LLSignal
Input signal

Returns a feature object.
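An illustrative config dictionary with the structure described above. The pipe_config values mirror a common 7-1-3-7 shifted-delta setup for language recognition (7 cepstra, two-point deltas with spread 1, p = 3, k = 7) and are examples only; feat_config shows just a subset of the MFCC keys, and the xtalk thresh value is a placeholder to be tuned, not a documented default.

```python
# Illustrative FeatPipe configuration; every value is an example, not a pyslgr default.
config = {
    "pipe_config": {
        "accel_spread": 2,        # unused here because do_accel is False
        "delta_spread": 1,
        "delta2point": True,      # two-point deltas, as in the usual SDC definition
        "do_accel": False,
        "do_delta": True,
        "do_rasta": True,
        "do_feat_norm": True,
        "do_sdc": True,
        "outfeat": "fs",          # base features followed by SDC features
        "feat_config": {
            "fb_low": 300, "fb_hi": 3140, "num_cep": 7,
            "win_inc_ms": 10, "win_len_ms": 20,
            "dither": 1, "keep_c0": 0, "linear": False,
        },
        "sdc_params": (3, 7),     # (p, k)
    },
    "sad_config": {
        "abs_min_energy": -10.0,
        "thresh": 0.0,            # placeholder; tune for your data
        "med_len": 1,
    },
}

# Expected output width: num_cep base features plus k stacked delta blocks.
num_cep = config["pipe_config"]["feat_config"]["num_cep"]
p, k = config["pipe_config"]["sdc_params"]
expected_dim = num_cep + k * num_cep
print(expected_dim)  # 56
```

With pyslgr installed, such a dictionary would be passed as FeatPipe(config, MFCCFeatures, XtalkSAD), and process(x) would then be called on an LLSignal x.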

pyslgr.iVector module

class pyslgr.iVector.iVector(config)

Bases: object

iVector extractor


config : a dictionary of config parameters

tv_matrix filename for total variability matrix – raw floats
ubm_model UBM model file

process(f)

f : LLFeatures object, input features

Returns an iVector (the factors, with no scaling or transformation).
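For reference, the standard total-variability formulation of i-vector extraction is shown below. It is given as background and is not confirmed to be pyslgr's exact computation. Here $\gamma_c(t)$ is the UBM posterior of component $c$ for frame $x_t$, $m_c$ and $\Sigma_c$ are the UBM means and covariances, $T$ is the total variability matrix, $N$ is block-diagonal with blocks $N_c I$, $\Sigma$ is block-diagonal with blocks $\Sigma_c$, and $\tilde{F}$ stacks the centered first-order statistics $\tilde{F}_c$.

```latex
\begin{aligned}
N_c &= \sum_t \gamma_c(t), \qquad
\tilde{F}_c = \sum_t \gamma_c(t)\,\bigl(x_t - m_c\bigr), \\
w &= \bigl(I + T^{\top} \Sigma^{-1} N\, T\bigr)^{-1} T^{\top} \Sigma^{-1} \tilde{F}.
\end{aligned}
```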

pyslgr.GSV module

class pyslgr.GSV.GSV

Bases: object

GSV(config)

config dictionary or JSON string with config parameters
process(f)

Process input features ‘f’ and produce a GSV (GMM supervector) expansion.

f
: LLFeatures
Input features
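Assuming GSV stands for GMM supervector, a supervector is commonly formed by MAP-adapting the UBM means to an utterance and concatenating them. The sketch below shows mean-only relevance-MAP adaptation with a relevance factor of 16, a commonly used value. pyslgr's GSV may apply additional scaling (for example by mixture weights and covariances), and the function names are hypothetical.

```python
import math


def _log_gauss_diag(x, mean, icov):
    """Log density of a diagonal-covariance Gaussian; icov holds inverse variances."""
    log_det_icov = sum(math.log(v) for v in icov)
    mahal = sum((xi - mi) ** 2 * iv for xi, mi, iv in zip(x, mean, icov))
    return 0.5 * (log_det_icov - len(x) * math.log(2.0 * math.pi) - mahal)


def gmm_supervector(frames, weights, means, icovs, relevance=16.0):
    """MAP-adapt the UBM means to the frames and concatenate them into one vector."""
    n_comp, dim = len(weights), len(means[0])
    n_c = [0.0] * n_comp                          # zeroth-order statistics
    f_c = [[0.0] * dim for _ in range(n_comp)]    # first-order statistics
    for x in frames:
        logp = [math.log(w) + _log_gauss_diag(x, m, ic)
                for w, m, ic in zip(weights, means, icovs)]
        peak = max(logp)
        post = [math.exp(lp - peak) for lp in logp]
        norm = sum(post)
        for c in range(n_comp):
            gamma = post[c] / norm                # posterior of component c
            n_c[c] += gamma
            for j in range(dim):
                f_c[c][j] += gamma * x[j]
    supervector = []
    for c in range(n_comp):
        for j in range(dim):
            # Relevance MAP: interpolate between the data mean and the UBM mean.
            supervector.append((f_c[c][j] + relevance * means[c][j]) / (n_c[c] + relevance))
    return supervector
```

Components that see little data stay close to the UBM mean, while well-observed components move toward the utterance's own statistics.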

pyslgr.sad module

class pyslgr.sad.XtalkSAD(config)

Bases: object

Perform energy-based speech activity detection using xtalk.
config : a dictionary of config parameters to pass to xtalk:
‘abs_min_energy’, ‘thresh’, and ‘med_len’ (optional)
process(x, f)

pyslgr.Scores module

class pyslgr.Scores.Scores(f_scores, s, u_score)

Expects the frame scores and scores (f_scores and s, respectively) as Python lists. The argument u_score is the UBM score, as a float.