pySLGR Package¶
Modules¶
- pyslgr.GMMModel
- pyslgr.LLFeatures
- pyslgr.LLSignal
- pyslgr.MFCCFeatures
- pyslgr.FeatPipe
- pyslgr.iVector
- pyslgr.GSV
- pyslgr.sad
pyslgr.GMMModel module¶
-
class
pyslgr.GMMModel.
GMMModel
¶ Bases:
object
-
icov
()¶ Return the diagonal of the GMM inverse covariance matrix as a numpy array
-
is_loaded
()¶
-
load
()¶ Load a GMM model.
load(model_file_name)
- Parameters:
- model_file_name : string
- File name of model to load
-
mean
()¶ Return the GMM mean vector as a numpy array
-
num_fea
()¶
-
num_mix
()¶
-
score_models
()¶ Score a list of GMM models against the input features.
score_models(LLFeatures f, list models, int topM, bool use_shortfall, float sf_delta)
- Parameters:
- f : LLFeatures
- Feature input
- models
- List of GMM models to process
- topM : int
- Number of Gaussians to score per frame (default: 5)
- use_shortfall : bool
- Use shortfall method when evaluating Gaussian mixture model (default: True)
- sf_delta : float
- Shortfall delta for pruning (default: 10.0)
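The idea behind the topM parameter can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper names `log_gauss_diag` and `frame_loglike_top_m` are hypothetical, the model is assumed to have diagonal covariances (consistent with icov() returning a diagonal), and the shortfall pruning controlled by use_shortfall and sf_delta is not shown.

```python
import math

def log_gauss_diag(x, mean, icov):
    """Log density of x under a Gaussian with diagonal inverse covariance icov."""
    d = len(x)
    log_det_icov = sum(math.log(v) for v in icov)
    quad = sum(ic * (xi - mi) ** 2 for xi, mi, ic in zip(x, mean, icov))
    return 0.5 * (log_det_icov - d * math.log(2.0 * math.pi) - quad)

def frame_loglike_top_m(x, weights, means, icovs, top_m=5):
    """Approximate the GMM log-likelihood of one frame from its top_m components."""
    # Weighted log density of the frame under every mixture component.
    comp = [math.log(w) + log_gauss_diag(x, m, ic)
            for w, m, ic in zip(weights, means, icovs)]
    # Keep only the top_m best-scoring components.
    best = sorted(comp, reverse=True)[:top_m]
    # Log-sum-exp over the retained components, shifted for numerical stability.
    peak = best[0]
    return peak + math.log(sum(math.exp(c - peak) for c in best))
```

Because components far from the frame contribute almost nothing to the sum, a small top_m usually gives a score close to the full evaluation at a fraction of the cost.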
-
suff_stats
()¶
-
-
class
pyslgr.GMMModel.
GMMSAD
¶ Bases:
object
Class to process a signal and produce SAD marks using a GMM
GMMSAD (feat_config, gmm_models, label_keep, label_window, min_gap_dur=0.5, min_seg_dur=0.2) or GMMSAD (config) where config is a JSON string or dictionary.
- Parameters:
- feat_config : string
- Configuration string from a JSON object – see MFCCFeatures.process for more details
- model_dir : string
- Base directory where models are stored
- gmm_models : dict
- Dictionary of models for GMMSAD scoring. Typical keys are ‘speech’, ‘music’, ‘nonspeech’.
- label_keep : string
- The key of the model to keep (default: ‘speech’)
- label_window : int
- Window length for smoothing frame scores
- min_gap_dur : float (default 0.5 seconds)
- Minimum gap between segments – segments are combined if the gap is smaller than this time
- min_seg_dur : float (default 0.2 seconds)
- Minimum segment duration – ignore segments below this duration
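The effect of min_gap_dur and min_seg_dur can be sketched as a post-processing pass over (start, duration) tuples. This is an illustration under stated assumptions: `smooth_segments` is a hypothetical helper, and the order of operations (merge first, then drop short segments) is assumed rather than taken from the library.

```python
def smooth_segments(segments, min_gap_dur=0.5, min_seg_dur=0.2):
    """Merge (start, duration) segments split by short gaps, then drop short ones."""
    merged = []
    for start, dur in sorted(segments):
        if merged and start - (merged[-1][0] + merged[-1][1]) < min_gap_dur:
            # Gap to the previous segment is too small: extend the previous segment.
            prev_start, prev_dur = merged[-1]
            new_end = max(prev_start + prev_dur, start + dur)
            merged[-1] = (prev_start, new_end - prev_start)
        else:
            merged.append((start, dur))
    # Ignore segments shorter than the minimum duration.
    return [(s, d) for s, d in merged if d >= min_seg_dur]
```

For example, two segments separated by a 0.25 second gap are combined into one, while an isolated 0.125 second segment is discarded.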
-
process
()¶ Process LLSignal and return a list of tuples. Each tuple is a start time (seconds) and duration of the detected class.
- gmmsad (signal)
- signal : LLSignal
- f : LLFeatures object (not used, but available for uniformity)
pyslgr.LLFeatures module¶
-
class
pyslgr.LLFeatures.
LLFeatures
¶ Bases:
object
-
accel
()¶ Calculate acceleration values from delta values.
accel(accel_spread)
- Parameters:
- accel_spread : int
- Acceleration at time t is calculated using the frames t-k,...,t,...,t+k where k is ‘accel_spread’
-
apply_sad
()¶ Given that speech activity detection has been calculated or loaded, remove frames corresponding to non-speech.
apply_sad()
-
delta
()¶ Calculate delta values from base features.
delta(delta_spread)
- Parameters:
- delta_spread : int
- Delta at time t is calculated using the frames t-k,...,t,...,t+k where k is ‘delta_spread’
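A delta over the frames t-k,...,t+k is commonly computed as a linear regression slope. The sketch below assumes that conventional formula and clamps indices at the edges of the utterance; the exact weights and edge handling used by the library are not stated here, and `regression_delta` is a hypothetical name.

```python
def regression_delta(frames, spread):
    """Regression delta over frames t-spread..t+spread, with edges clamped."""
    n = len(frames)
    denom = 2.0 * sum(k * k for k in range(1, spread + 1))
    out = []
    for t in range(n):
        row = []
        for j in range(len(frames[t])):
            acc = 0.0
            for k in range(1, spread + 1):
                ahead = frames[min(t + k, n - 1)][j]   # clamp at the last frame
                behind = frames[max(t - k, 0)][j]      # clamp at the first frame
                acc += k * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out
```

On a feature that rises by one unit per frame, the interior delta values come out as 1.0 for any spread, which is a quick sanity check for the normalization.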
-
delta2point
()¶ Calculate delta values from base features using only 2 values.
delta2point(delta_spread)
- Parameters:
- delta_spread : int
- Delta at time t is calculated using the frames t-k and t+k where k is ‘delta_spread’
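The two-point variant can be sketched as a plain difference between the frames at t+k and t-k. Whether the library scales the difference (for example by 2k) is not stated, so the unscaled form and the edge clamping below are assumptions; `delta_two_point` is a hypothetical name.

```python
def delta_two_point(frames, spread):
    """Two-point delta: frame at t+spread minus frame at t-spread, edges clamped."""
    n = len(frames)
    out = []
    for t in range(n):
        ahead = frames[min(t + spread, n - 1)]   # clamp at the last frame
        behind = frames[max(t - spread, 0)]      # clamp at the first frame
        out.append([a - b for a, b in zip(ahead, behind)])
    return out
```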
-
feat_norm
()¶ Normalize each feature individually to zero mean and unit variance across all frames.
feat_norm()
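Per-feature normalization can be sketched as follows. The sketch assumes the population variance (division by the number of frames) and leaves constant features at zero; neither detail is confirmed for the library, and `normalize_features` is a hypothetical name.

```python
import math

def normalize_features(frames):
    """Scale each feature dimension to zero mean and unit variance across frames."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[j] for f in frames) / n for j in range(dim)]
    stds = []
    for j in range(dim):
        var = sum((f[j] - means[j]) ** 2 for f in frames) / n
        # Guard against division by zero for constant features.
        stds.append(math.sqrt(var) if var > 0.0 else 1.0)
    return [[(f[j] - means[j]) / stds[j] for j in range(dim)] for f in frames]
```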
-
load_raw
()¶ Load a file of raw floats into a feature store. Assumes all features are base features.
load_raw (filename, num_feat)
- Parameters:
- filename : string
- Name of file to load
- num_feat : int
- Number of base features (dimension of feature vector)
-
load_sad_labels
()¶ Load a file of SAD labels (0/1). Assumes whitespace separation between labels.
load_sad_labels(filename, sloppy)
- Parameters:
- filename : string
- Name of file to load
- sloppy : bool
- Complain if the number of labels doesn’t match the number of feature vectors (default: True)
-
num_base_feat
()¶ Return the number of base features in each vector. Base features do not include post-processing such as delta, sdc, or acceleration.
num_base_feat() -> int
-
num_outfeat
()¶ Return the number of output features in each vector. Output features are set with the set_outfeat() method.
num_outfeat() -> int
-
num_total_feat
()¶
-
num_vec
()¶ Return the number of feature vectors.
num_vec() -> int
-
rasta
()¶ Apply RASTA to all features.
rasta()
-
sad_labels
()¶ Return the current SAD labels (0/1) per frame.
-
save_raw
()¶ Save features as raw floats to ‘filename’. The features saved are set by ‘set_outfeat’.
save_raw(filename)
- Parameters:
- filename : string
- Name of file to save features in
-
save_sad_labels
()¶ Save SAD labels (0/1) to file.
save_sad_labels(filename)
- Parameters:
- filename : string
- Name of output file
-
sdc
()¶ Shifted-delta features – typically used for language recognition. Note: Uses available delta features which must be calculated before invoking ‘sdc()’.
sdc (sdc_p, sdc_k)
- Parameters:
- sdc_p : int
- p value – shift between delta blocks (typical value, 3)
- sdc_k : int
- k value – number of delta blocks to stack (typical value, 7)
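The stacking performed by shifted-delta features can be sketched as below, assuming the delta vectors have already been computed. The sketch follows the conventional definition (k delta blocks taken p frames apart, looking forward from t) and clamps at the end of the utterance; the library's edge handling is not stated, and `stack_sdc` is a hypothetical name.

```python
def stack_sdc(deltas, sdc_p, sdc_k):
    """Stack sdc_k delta vectors taken at t, t+sdc_p, ..., t+(sdc_k-1)*sdc_p."""
    n = len(deltas)
    out = []
    for t in range(n):
        stacked = []
        for i in range(sdc_k):
            # Clamp at the last frame when the shifted index runs off the end.
            stacked.extend(deltas[min(t + i * sdc_p, n - 1)])
        out.append(stacked)
    return out
```

With the typical values (3, 7) and 7 delta coefficients per frame, each output vector would hold 7 x 7 = 49 values.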
-
set_outfeat
()¶ Set the features to output for typical operations. The value can be changed at any time. Order in the parameter string determines the stacking order. By default the ‘outfeat’ is set to ‘all’ when the feature object is created.
set_outfeat(outfeat)
- Parameters:
- outfeat : string
- Set the output features. If outfeat==’all’, then all base features and calculated features are returned (except energy). Otherwise, ‘outfeat’ is examined a character at a time and features are stacked in that order: ‘f’ base features, ‘d’ delta features, ‘a’ acceleration features, ‘e’ energy, ‘s’ sdc features. E.g., set_outfeat(‘fd’) would set the output to base features in indices 0, ..., num_base_feat-1 and delta features in indices num_base_feat, ..., 2*num_base_feat-1.
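The stacking order implied by an outfeat string can be worked out with a small helper. This is only an index calculation, not a call into the library: `outfeat_layout` is a hypothetical function, and the per-type widths passed in `sizes` are assumptions supplied by the caller (delta and acceleration are assumed to have the same width as the base features).

```python
def outfeat_layout(outfeat, sizes):
    """Map each feature-type code in outfeat to its (first, last) output index."""
    layout = {}
    offset = 0
    for code in outfeat:
        width = sizes[code]
        layout[code] = (offset, offset + width - 1)
        offset += width
    # Also return the total number of output features.
    return layout, offset
```

For 7 base features, `outfeat_layout('fd', {'f': 7, 'd': 7})` places base features at 0 to 6 and delta features at 7 to 13; reversing the string to 'df' swaps the two blocks.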
-
xtalk
()¶ xtalk energy based speech-activity detection.
xtalk (abs_min_energy, thresh, med_len=1)
- Parameters:
- abs_min_energy : float
- Below this threshold is non-speech. Typical values, -10 or 0.
- thresh : float
- Above this threshold triggers speech activity (the algorithm is adaptive).
- med_len: int (default 1)
- Median filter to smooth activity. Large values imply less abrupt changes in speech activity.
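The smoothing role of med_len can be illustrated with a median filter over 0/1 frame labels. This sketch covers only the smoothing step, not the adaptive energy thresholding; `median_smooth` is a hypothetical name, and the shortened window at the edges is an assumption.

```python
def median_smooth(labels, med_len=1):
    """Median-filter a 0/1 label sequence with a window of med_len frames."""
    half = med_len // 2
    n = len(labels)
    out = []
    for t in range(n):
        # The window is truncated at the start and end of the sequence.
        window = sorted(labels[max(t - half, 0):min(t + half + 1, n)])
        out.append(window[len(window) // 2])
    return out
```

A single isolated speech frame surrounded by non-speech is removed with med_len=3, while med_len=1 leaves the labels unchanged.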
-
pyslgr.LLSignal module¶
-
class
pyslgr.LLSignal.
LLSignal
¶ Bases:
object
Class to contain and process 1-dimensional signals–typically speech or audio.
LLSignal() – empty signal with zero samples.
-
get_f0
()¶ Find the fundamental frequency f0 (“pitch”) from the signal using the Entropic algorithm.
get_f0 (min_f0, max_f0, window_dur, frame_step) -> np.array(dtype=float)
- Parameters:
- min_f0 : float
- Minimum allowed fundamental frequency in Hz (e.g., 100)
- max_f0 : float
- Maximum allowed fundamental frequency in Hz (e.g., 650)
- window_dur : float
- Window duration in seconds (e.g., 0.010 – 10 milliseconds)
- frame_step : float
- Increment of window position (e.g., 0.002 – 2 milliseconds)
-
length
()¶ Length of the signal in samples.
x.length() -> int
-
load_pcm_wav
()¶ Load a pcm-encoded file in Microsoft wav format.
load_pcm_wav(filename, sum_channels=True)
- Parameters:
- filename : string
- Path of file to load
- sum_channels : boolean (default True)
- Default True – sum channels if multiple present. Otherwise an error will be thrown for multiple channels.
-
load_raw_short
()¶ Load a single-channel pcm-encoded file of short ints with no header.
load_raw_short(filename, sampling_frequency)
- Parameters:
- filename : string
- Path to file to load
- sampling_frequency : int
- Sampling frequency of the file in Hz (e.g., 8000 for 8 kHz)
-
load_sph
()¶ Load a NIST sphere file. The channel is 0 or 1; use 0 for a single-channel file.
load_sph(filename, channel_num)
- Parameters:
- filename : string
- Path to file to load
- channel_num : int
- Number of channel, 0 or 1, to load
-
normalize
()¶ Normalize the amplitude of the waveform to 16-bits.
normalize()
-
preemphasis
()¶ Perform pre-emphasis on the waveform; i.e., filter with 1-alpha*z^(-1)
preemphasis (alpha)
- Parameters:
- alpha : float
- Pre-emphasis coefficient
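In the time domain, conventional first-order pre-emphasis computes y[n] = x[n] - alpha*x[n-1]. The sketch below assumes that conventional form and a zero initial state; `apply_preemphasis` is a hypothetical name, not the library's implementation.

```python
def apply_preemphasis(samples, alpha):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1], with x[-1] = 0."""
    out = []
    prev = 0.0
    for x in samples:
        out.append(x - alpha * prev)
        prev = x
    return out
```

A constant input is attenuated after the first sample, which reflects the high-pass character of the filter.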
-
remove_mean
()¶ Remove the mean of the signal.
remove_mean()
-
resample_16k
()¶ Resample the signal to a 16 kHz sampling rate. Note: The resample_init() method must be called before calling this method. Also, if the sample rate is below 16 kHz, no operation will be performed.
resample_16k()
-
resample_8k
()¶ Resample the signal to an 8 kHz sampling rate. Note: The resample_init() method must be called before calling this method.
resample_8k()
-
sampling_frequency
()¶ Return the sampling frequency of the currently loaded signal.
sampling_frequency()
-
save_pcm_wav
()¶ Save the current signal in pcm-encoded Microsoft wav file format.
save_pcm_wav(filename, scale=False)
- Parameters:
- filename : string
- Path of file to save
- scale : boolean (default False)
- Scale the output to full 16-bit range when saving
-
save_raw_short
()¶ Save the current signal as short ints with no header.
save_raw_short(filename, clip, scale=False)
- Parameters:
- filename : string
- Path of file to save
- clip : boolean
- Clip the output if it is greater than the largest 16-bit value
- scale : boolean (default False)
- Scale the output to full 16-bit range when saving
-
pyslgr.MFCCFeatures module¶
-
class
pyslgr.MFCCFeatures.
MFCCFeatures
¶ Bases:
pyslgr.LLFeatures.LLFeatures
config String or dictionary containing configuration parameters for MFCCs. Parameters in the config are:
- alpha : Warping factor for bilinear method (no warping: 1.0)
- dither : 0/1 - Add low level noise to the signal (typical: 1)
- fb_low : Lowest filter bank frequency in Hz (typical: 300)
- fb_hi : Highest filter bank frequency in Hz (typical: 3140)
- fb_only : 0/1 - Produce the ‘raw’ filter bank outputs instead of cepstral coefficients
- keep_c0 : 0/1 - Keep the c0 cepstral coefficient; c0 represents frame energy (typical: 0)
- linear : true/false - Linear or mel-warped scale for filter banks (typical: false)
- num_cep : int - Number of cepstral coefficients (c1-c??) to output (typical: 7-19)
- tgt_num_filt : int - Number of filters across the entire bandwidth; only applied for linear=true
- win_inc_ms : int - Window increment in milliseconds (typical: 10)
- win_len_ms : int - Window length in milliseconds (typical: 20-30)
-
static
config_dict_to_str
()¶ Converts a feature configuration dictionary to a json string for processing.
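A configuration dictionary and its JSON form might look like the sketch below. The keys are the documented MFCC parameters; the values are illustrative choices from the documented typical ranges (the fb_only value is an assumption), and `json.dumps` from the standard library stands in for config_dict_to_str, whose exact output format is not confirmed here.

```python
import json

# Illustrative MFCC configuration built from the documented parameter names.
mfcc_config = {
    "alpha": 1.0,        # no warping
    "dither": 1,
    "fb_low": 300,
    "fb_hi": 3140,
    "fb_only": 0,        # assumed value: produce cepstral coefficients
    "keep_c0": 0,
    "linear": False,     # mel-warped filter banks
    "num_cep": 19,
    "win_inc_ms": 10,
    "win_len_ms": 20,
}

# Stand-in for config_dict_to_str: serialize the dictionary to a JSON string.
config_str = json.dumps(mfcc_config)
```

Note that the Python value False is written as the JSON literal false, matching the true/false form given for the linear parameter.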
-
duration
()¶ Return the duration that the MFCC data spans in seconds.
duration() -> float seconds
-
static
get_lid_config
()¶ Returns default language id configuration as a dictionary. User can modify the entries in the dictionary or use this configuration as is. User must call static method config_dict_to_str(config) before processing a signal with such configuration.
-
static
get_sid_config
()¶ Returns default speaker id configuration as a dictionary. User can modify the entries in the dictionary or use this configuration as is. User must call static method config_dict_to_str(config) before processing a signal with such configuration.
-
get_win_inc_ms
()¶ Return the window increment in milliseconds.
get_win_inc_ms()
-
load_sad_marks
()¶ Load SAD marks from a file or list
load_sad_marks(src)
- Parameters:
- src : string
- Name of input file
or src : list of tuples
Tuples with start, duration in seconds: [(0.0,1.0),(2.0,1.5)]
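The relationship between SAD marks (start, duration in seconds) and per-frame 0/1 labels can be sketched as a conversion driven by the window increment. This is an illustration only: `marks_to_labels` is a hypothetical helper, and the rounding of segment boundaries to frame indices is an assumption.

```python
def marks_to_labels(marks, num_frames, win_inc_ms=10):
    """Convert (start, duration) marks in seconds to a 0/1 label per frame."""
    labels = [0] * num_frames
    for start, dur in marks:
        first = int(round(start * 1000.0 / win_inc_ms))
        last = int(round((start + dur) * 1000.0 / win_inc_ms))
        # Mark frames in [first, last) as speech, staying inside the valid range.
        for t in range(max(first, 0), min(last, num_frames)):
            labels[t] = 1
    return labels
```

With a 10 ms increment, the marks [(0.0,1.0),(2.0,1.5)] cover frames 0 to 99 and 200 to 349.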
-
process
()¶ Process the signal to return mel-frequency cepstral coefficient (MFCC) features.
process(signal) -> features
- Parameters:
- signal Input signal – instance of LLSignal class
-
save_sad_marks
()¶ Save SAD marks to a file.
save_sad_marks(filename)
- Parameters:
- filename : string
- Name of output file
-
static
pyslgr.FeatPipe module¶
-
class
pyslgr.FeatPipe.
FeatPipe
(config, featClass, sadClass)¶ Bases:
object
Implementation of a full feature extraction pipeline.
config : a dictionary of config parameters with two main keys ‘pipe_config’, ‘sad_config’.
config[‘pipe_config’] has keys:
- accel_spread : int
- delta_spread : int
- delta2point : True/False
- do_accel : True/False
- do_delta : True/False
- do_rasta : True/False
- do_feat_norm : True/False
- do_sdc : True/False
- outfeat : string to pass to set_outfeat
- feat_config : dictionary to pass directly into LLFeatures object
- sdc_params : a tuple to pass to sdc – typically (3,7)
config[‘sad_config’] is passed directly to the sadClass.
featClass : an LLFeatures compatible class
sadClass : a class with constructor sadClass(config) and method sadClass.process(LLSignal x, LLFeatures f)
-
process
(x)¶ - Extract features
- x Input signal
Returns a feature object
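A pipeline configuration with the two main keys might be laid out as in the sketch below. The key names come from the FeatPipe description; every value is an illustrative assumption, and the contents of ‘sad_config’ are left empty because they depend on the chosen sadClass.

```python
# Illustrative FeatPipe configuration; values are assumptions, not library defaults.
feat_pipe_config = {
    "pipe_config": {
        "accel_spread": 2,
        "delta_spread": 2,
        "delta2point": True,
        "do_accel": True,
        "do_delta": True,
        "do_rasta": True,
        "do_feat_norm": True,
        "do_sdc": False,
        "outfeat": "fda",          # base, delta, then acceleration features
        "feat_config": {           # passed directly to the feature class
            "num_cep": 19,
            "win_inc_ms": 10,
            "win_len_ms": 20,
        },
        "sdc_params": (3, 7),      # typical (p, k) values
    },
    "sad_config": {},              # contents depend on the sadClass in use
}
```

A dictionary of this shape would be passed as the config argument together with a feature class and a SAD class.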
-
pyslgr.iVector module¶
-
class
pyslgr.iVector.
iVector
(config)¶ Bases:
object
iVector extractor
config : a dictionary of config parameters
- tv_matrix : filename for total variability matrix – raw floats
- ubm_model : UBM model file
-
process
(f)¶ f : LLFeatures object, input features
returns an ivector (factors with no scaling or transformation)
-