Configuration File
==================

This file is the core of the algorithm, so it has to be filled in properly for your data. Although all key parameters of the algorithm are listed in the file, only a few are likely to be modified by a non-advanced user. The configuration file is divided into several sections. For each section, we review the parameters and point out the most important ones.

Data
----

The data section is::

    file_format    =          # Can be raw_binary, openephys, hdf5, ... See >> spyking-circus help -i for more info
    stream_mode    = None     # None by default. Can be multi-files, or anything depending to the file format
    mapping        =          # Mapping of the electrode (see http://spyking-circus.rtfd.org)
    suffix         =          # Suffix to add to generated files
    overwrite      = True     # If you want to filter or remove artefacts on site. Data are duplicated otherwise
    output_dir     =          # By default, generated data are in the same folder as the data.
    parallel_hdf5  = True     # Use the parallel HDF5 feature (if available)
    
.. warning::

    This is the most important section, as it allows the code to load your data properly. If it is not filled in properly, the results will be wrong. Note that depending on your ``file_format``, you may need to add several parameters here, such as ``sampling_rate``, ``data_dtype``, ... They will be requested if they cannot be inferred from the header of your data structure. To check that the data are properly loaded, consider using :doc:`the preview mode <../GUI/python>` before launching the whole algorithm.

Parameters that are most likely to be changed:
    * ``file_format`` You must select a supported file format (see :doc:`What are the supported formats <../code/fileformat>`) or write your own wrapper (see :doc:`Write your own data format  <../advanced/datafile>`)
    * ``mapping`` This is the path to your probe mapping (see :doc:`How to design a probe file <../code/probe>`)
    * ``stream_mode`` Whether the streams in your data (spread over multiple files, or even contained in the same file) should be processed together (see :doc:`Using multi files <../code/multifiles>`)
    * ``overwrite`` If True, data are overwritten during filtering, assuming the file format has write access. Otherwise, an external raw_binary file will be created during the filtering step, if any.
    * ``output_dir`` If you want all the files generated by SpyKING CIRCUS to be in a particular directory, instead of next to the raw data
    * ``parallel_hdf5`` Try to use the option for parallel writing of HDF5. This needs to be configured (see :doc:`how to install hdf5 <../introduction/hdf5>`)
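
As a rough illustration, a filled-in data section for a raw binary recording could look like the text embedded in the snippet below. The ``[data]`` section header, the paths, and the values of ``sampling_rate`` and ``data_dtype`` are assumptions made for this example. The snippet reads the text with Python's standard ``configparser`` only to check the layout; it is not meant to show how SpyKING CIRCUS itself loads the file.

.. code-block:: python

    import configparser

    # Hypothetical [data] section; paths and numeric values are placeholders.
    PARAMS_TEXT = """
    [data]
    file_format    = raw_binary
    stream_mode    = None
    mapping        = /path/to/probe.prb   # hypothetical probe file
    suffix         =
    overwrite      = False                # keep the raw file untouched
    output_dir     = /path/to/results     # hypothetical output folder
    parallel_hdf5  = True
    sampling_rate  = 20000                # assumed value, in Hz
    data_dtype     = int16                # assumed value
    """

    # Strip the common indentation so that every key starts in the first column.
    text = "\n".join(line.strip() for line in PARAMS_TEXT.splitlines())

    # inline_comment_prefixes lets "#" start a comment after a value.
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(text)
    data = parser["data"]

    file_format = data["file_format"]
    overwrite = data.getboolean("overwrite")
    sampling_rate = data.getfloat("sampling_rate")
    print(file_format, overwrite, sampling_rate)

Setting ``overwrite`` to False in this sketch reflects the cautious choice of leaving the raw recording untouched during filtering.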

Detection
---------

The detection section is::

    radius         = auto       # Radius [in um] (if auto, read from the prb file)
    N_t            = 5          # Width of the templates [in ms]
    spike_thresh   = 6          # Threshold for spike detection
    peaks          = negative   # Can be negative (default), positive or both
    dead_channels  =            # If not empty or specified in the probe, a dictionary {channel_group : [list_of_valid_ids]}

Parameters that are most likely to be changed:
    * ``N_t`` The temporal width of the templates. For *in vitro* data, 5ms seems a good value. For *in vivo* data, you should rather use 3 or even 2ms
    * ``radius`` The spatial width of the templates. By default, this value is read from the probe file. However, if you want to specify a larger or a smaller value [in um], you can do it here
    * ``spike_thresh`` The threshold for spike detection. 6-7 are good values
    * ``peaks`` By default, the code detects only negative peaks, but you can search for positive peaks, or both
    * ``dead_channels`` You can exclude dead channels either directly in the probe file, with the ``channels`` list, or with this ``dead_channels`` parameter. To do so, you must enter a dictionary of the following form {channel_group : [list_of_valid_ids]}
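
The ``dead_channels`` value can be tedious to write by hand on large probes. The sketch below builds a dictionary of the form ``{channel_group : [list_of_valid_ids]}`` from a list of dead channels. The helper ``valid_channels`` and the eight-channel probe are hypothetical, and how SpyKING CIRCUS parses the resulting text is not shown here.

.. code-block:: python

    import ast


    def valid_channels(channel_groups, dead):
        """Return {channel_group: [valid ids]} after dropping the dead channels."""
        dead = set(dead)
        return {
            group: [ch for ch in channels if ch not in dead]
            for group, channels in channel_groups.items()
        }


    # Hypothetical probe: one channel group (1) with eight channels, two of them dead.
    groups = {1: list(range(8))}
    entry = valid_channels(groups, dead=[3, 6])

    # Text to paste into the detection section of the configuration file.
    line = "dead_channels  = " + repr(entry)
    print(line)

    # The value is a plain Python dictionary literal, so it can be parsed back.
    parsed = ast.literal_eval(repr(entry))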
    
Filtering
---------

The filtering section is::

    cut_off        = 300, auto # Min and Max (auto=nyquist) cut off frequencies for the band pass butterworth filter [Hz]
    filter         = True      # If True, then a band-pass filtering is performed
    remove_median  = False     # If True, median over all channels is subtracted from each channel (movement artefacts)
    common_ground  =           # If you want to use a particular channel as a reference ground: should be a valid channel number

.. warning::

    The code filters your data by writing to the file itself (unless ``overwrite`` is set to False in the data section). Therefore, you **must** have a copy of your raw data elsewhere. Note that as long as you keep the parameter files, you can relaunch the code safely: the program will not filter the data multiple times, thanks to the flag ``filter_done`` at the end of the configuration file.

Parameters that are most likely to be changed:
    * ``cut_off`` The default lower bound of 300Hz has been used in various recordings, but you can change it if needed. You can also specify the upper bound of the Butterworth filter (``auto`` means the Nyquist frequency)
    * ``filter`` If your data are already filtered by a third-party program, turn that flag to False
    * ``remove_median`` Set this to True if you have some movement artefacts in your *in vivo* recording, and want to subtract the median activity over all analysed channels from each channel individually
    * ``common_ground`` If you want to use a particular channel as a reference, and subtract its activity from all others. Note that the activity on this particular channel will thus be null
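
To make the effect of ``remove_median`` and ``common_ground`` concrete, here is a minimal sketch on toy data, written with plain Python lists. The two helper functions are hypothetical and only illustrate the arithmetic described above; they are not the code used by SpyKING CIRCUS.

.. code-block:: python

    from statistics import median


    def remove_median(samples):
        """Subtract, at each time step, the median over channels from every channel.

        `samples` is a list of time steps, each a list with one value per channel.
        """
        cleaned = []
        for row in samples:
            m = median(row)
            cleaned.append([v - m for v in row])
        return cleaned


    def subtract_reference(samples, ref):
        """Subtract the signal of channel `ref` from all channels at each time step."""
        return [[v - row[ref] for v in row] for row in samples]


    # Toy data: 3 time steps x 4 channels, with a shared offset of +10 at step 1.
    data = [
        [1.0, 2.0, 3.0, 4.0],
        [11.0, 12.0, 13.0, 14.0],
        [0.0, -1.0, 1.0, 2.0],
    ]

    no_median = remove_median(data)
    referenced = subtract_reference(data, ref=0)
    print(no_median[1])    # the shared offset is gone
    print(referenced[1])   # channel 0 is now zero

In this toy case, the first two time steps become identical after median removal, because they differ only by an offset common to all channels.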

Triggers
--------

The triggers section is::

    trig_file      =            # External stimuli to be considered as putative artefacts [in trig units] (see documentation)
    trig_windows   =            # The time windows of those external stimuli [in trig units]
    trig_unit      = ms         # The unit in which times are expressed: can be ms or timestep
    clean_artefact = False      # If True, external artefacts induced by triggers will be suppressed from data
    dead_file      =            # Portion of the signals that should be excluded from the analysis [in dead units]
    dead_unit      = ms         # The unit in which times for dead regions are expressed: can be ms or timestep
    ignore_times   = False      # If True, any spike in the dead regions will be ignored by the analysis
    make_plots     =            # Generate sanity plots of the averaged artefacts [Nothing or None if no plots]

Parameters that are most likely to be changed:
    * ``trig_file`` The path to the file where your artefact times and labels are stored. See :doc:`how to deal with stimulation artefacts <../code/artefacts>`
    * ``trig_windows`` The path to the file where your artefact temporal windows are stored. See :doc:`how to deal with stimulation artefacts <../code/artefacts>`
    * ``clean_artefact`` If you want to remove any stimulation artefacts, defined in the previous files. See :doc:`how to deal with stimulation artefacts <../code/artefacts>`
    * ``make_plots`` The default format to save the plots of the artefacts, one per artefact, showing all channels. You can set it to None if you do not want any
    * ``trig_unit`` If you want times/duration in the ``trig_file`` and ``trig_windows`` to be in timestep or ms
    * ``dead_file`` The path to the file where the dead portions of the recording, that should be excluded from the analysis, are specified. See :doc:`how to deal with stimulation artefacts <../code/artefacts>`
    * ``dead_unit`` If you want times/duration in the ``dead_file`` to be in timestep or ms
    * ``ignore_times`` If you want to remove any dead portions of the recording, defined in ``dead_file``. See :doc:`how to deal with stimulation artefacts <../code/artefacts>`
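
Since ``trig_unit`` and ``dead_unit`` accept either ms or timestep, it can help to convert between the two before writing the files. The sketch below shows the conversion for an assumed sampling rate of 20 kHz. The function names and the rounding to the nearest sample are choices made for this example.

.. code-block:: python

    def ms_to_timesteps(t_ms, sampling_rate):
        """Convert a time in milliseconds to a number of samples (rounded)."""
        return int(round(t_ms * sampling_rate / 1000.0))


    def timesteps_to_ms(n_steps, sampling_rate):
        """Convert a number of samples to a time in milliseconds."""
        return 1000.0 * n_steps / sampling_rate


    SAMPLING_RATE = 20000  # Hz, assumed for this example

    window_steps = ms_to_timesteps(5, SAMPLING_RATE)     # a 5 ms window
    onset_ms = timesteps_to_ms(30000, SAMPLING_RATE)     # sample number 30000
    print(window_steps, onset_ms)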

Whitening
---------

The whitening section is::

    spatial        = True      # Perform spatial whitening
    max_elts       = 10000     # Max number of events per electrode (should be compatible with nb_elts)
    nb_elts        = 0.8       # Fraction of max_elts that should be obtained per electrode [0-1]
    output_dim     = 5         # Can be in percent of variance explain, or num of dimensions for PCA on waveforms

Parameters that are most likely to be changed:
    * ``output_dim`` If you want to save some memory usage, you can reduce the number of features kept to describe a waveform.


Clustering
----------

The clustering section is::

    extraction     = median-raw # Can be either median-raw (default), median-pca, mean-pca, mean-raw, or quadratic
    sub_dim        = 10         # Number of dimensions to keep for local PCA per electrode
    max_elts       = 10000      # Max number of events per electrode (should be compatible with nb_elts)
    nb_elts        = 0.8        # Fraction of max_elts that should be obtained per electrode [0-1]
    nb_repeats     = 3          # Number of passes used for the clustering
    make_plots     =            # Generate sanity plots of the clustering
    merging_method = nd-bhatta  # Method to perform local merges (distance, dip, folding, nd-folding, bhatta)
    merging_param  = default    # Merging parameter (see docs) (3 if distance, 0.5 if dip, 1e-9 if folding, 2 if bhatta)
    sensitivity    = 3          # The only parameter to control the cluster. The lower, the more sensitive
    cc_merge       = 0.95       # If CC between two templates is higher, they are merged
    dispersion     = (5, 5)     # Min and Max dispersion allowed for amplitudes [in MAD]
    smart_search   = True       # Parameter to activate the smart search mode

.. note::

    This is a key section, as bad clustering will imply bad results. However, the code is very robust to parameter changes.

Parameters that are most likely to be changed:
    * ``extraction`` The method to estimate the templates. ``Raw`` methods are slower, but more accurate, as data are read from the files. ``PCA`` methods are faster, but less accurate, and may lead to some distorted templates. ``Quadratic`` is slower, and should not be used.
    * ``max_elts`` The number of elements that every electrode will try to collect, in order to perform the clustering
    * ``nb_repeats`` The number of passes performed by the algorithm to refine the density landscape
    * ``smart_search`` By default, the code will collect only a subset of spikes, randomly, on all electrodes. However, for long recordings, or if you have low thresholds, you may want to select them in a smarter manner, in order to avoid missing the large ones, which are under-represented. If the smart search is activated, the code will first sample the distribution of amplitudes, on all channels, and then implement a rejection algorithm such that it will try to select spikes in order to make the distribution of amplitudes more uniform.
    * ``cc_merge`` After local merging per electrode, this step will make sure that you do not have duplicates in your templates, that may have been spread on several electrodes. All templates with a correlation coefficient higher than that parameter are merged. Remember that the more you merge, the faster the fit
    * ``merging_method`` Several methods can be used to perform greedy local merges on each electrode. Each method has a parameter, defined by ``merging_param``. This replaces the former parameters ``sim_same_elec`` and ``dip_threshold``
    * ``dispersion`` The spread of the amplitudes allowed, for every template, around the centroid.
    * ``make_plots`` By default, the code generates sanity plots of the clustering, one per electrode.
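
The idea behind ``smart_search`` can be sketched as a rejection sampling on a histogram of amplitudes: spikes falling in crowded amplitude bins are accepted with a low probability, while spikes in sparse bins are always kept. The code below is only an illustration under assumptions of our own (fixed-width bins, a ``max_per_bin`` target, and a hypothetical function name); the rule actually implemented in SpyKING CIRCUS may differ.

.. code-block:: python

    import random
    from collections import Counter


    def smart_subset(amplitudes, n_bins=10, max_per_bin=50, seed=0):
        """Rejection-sample spikes so that kept amplitudes are closer to uniform.

        A spike in a bin holding `c` spikes is accepted with probability
        min(1, max_per_bin / c).
        """
        rng = random.Random(seed)
        lo, hi = min(amplitudes), max(amplitudes)
        width = (hi - lo) / n_bins or 1.0  # avoid a zero width if all values are equal

        def bin_of(a):
            return min(int((a - lo) / width), n_bins - 1)

        counts = Counter(bin_of(a) for a in amplitudes)
        kept = []
        for a in amplitudes:
            accept = min(1.0, max_per_bin / counts[bin_of(a)])
            if rng.random() < accept:
                kept.append(a)
        return kept


    # Toy amplitudes: 1000 small spikes and only 21 large ones.
    small = [1.0 + 0.001 * i for i in range(1000)]
    large = [9.0 + 0.05 * i for i in range(21)]

    kept = smart_subset(small + large)
    n_large = sum(a >= 9.0 for a in kept)
    n_small = len(kept) - n_large
    print(n_small, n_large)

With these toy numbers, all the large spikes survive while most of the small ones are rejected, whereas a purely random subset of the same size would contain very few large spikes.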

Fitting
-------

The fitting section is::

    amp_limits     = (0.3, 30) # Amplitudes for the templates during spike detection
    amp_auto       = True      # True if amplitudes are adjusted automatically for every templates
    collect_all    = False     # If True, one garbage template per electrode is created, to store unfitted spikes
    ratio_thresh   = 0.9       # Ratio of the spike_threshold used while fitting [0-1]. The lower the slower
    
Parameters that are most likely to be changed:
    * ``collect_all`` If you want to also collect all the spike times at which no templates were fitted. This is particularly useful to debug the algorithm, and understand if something is wrong on a given channel
    * ``ratio_thresh`` If you want to get more spikes for the low amplitude templates, you can decrease this value. It will slow down the fitting procedure, but collect more spikes for the templates with an amplitude close to threshold

Merging
-------

The merging section is::

    erase_all      = True       # If False, a prompt will ask you to remerge if merged has already been done
    cc_overlap     = 0.85       # Only templates with CC higher than cc_overlap may be merged
    cc_bin         = 2          # Bin size for computing CC [in ms]
    default_lag    = 5          # Default length of the period to compute dip in the CC [ms]
    auto_mode      = 0.75       # Between 0 (aggressive) and 1 (no merging). If empty, GUI is launched
    remove_noise   = False      # If True, meta merging will remove obvious noise templates (weak amplitudes)
    noise_limit    = 0.75       # Amplitude at which templates are classified as noise
    sparsity_limit = 0.75       # Sparsity level (in percentage) for selecting templates as putative noise (in [0, 1])
    time_rpv       = 5          # Time [in ms] to consider for Refraction Period Violations (RPV) (0 to disable)
    rpv_threshold  = 0.02       # Percentage of RPV allowed while merging
    merge_drifts   = True       # Try to automatically merge drifts, i.e. non overlapping spiking neurons
    drift_limit    = 0.1        # Distance for drifts. The higher, the more non-overlapping the activities should be

To know more about how those merges are performed and how to use this option, see :doc:`Automatic Merging <../code/merging>`. Parameters that are most likely to be changed:
    * ``erase_all`` If you want to always erase former merging, and skip the prompt
    * ``auto_mode`` If your recording is stationary, you can try to perform a fully automated merging. By setting a positive value, you control the level of merging performed by the software. Values such as 0.75 should be a good start, but see :doc:`Automatic Merging <../code/merging>` for more details. The lower the value, the more aggressive the merging.
    * ``remove_noise`` If you want to automatically get rid of noise templates (very weak ones), just set this value to True.
    * ``noise_limit`` Normalized amplitude (with respect to the detection threshold) below which templates are considered as noise
    * ``sparsity_limit`` The sparsity level that templates must achieve to be considered as noise. Internally, the code sets to 0 the channels without any useful information, so the sparsity is the ratio between the number of channels with non-zero values and the number of channels that should have had a signal. Usually, noise tends to be defined on only a few channels (if not only one)
    * ``time_rpv`` When performing merges, the code will check if the merged unit has a valid ISI without any RPV. If yes, then the merge is performed; otherwise it is avoided. This is the default time used to compute RPV. If you want to disable this feature, set this value to 0.
    * ``rpv_threshold`` Percentage of RPV allowed while merging. You can increase it if you want to be less stringent.
    * ``drift_limit`` To assess whether a unit is drifting, we compute distances between the histograms of the spike times, for a given pair of cells, and assess how much they overlap. For drifting units, they should not overlap by much, and the threshold can be set by this value. The higher the value, the more distinct the histograms must be for the units to be merged.
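
The role of ``time_rpv`` and ``rpv_threshold`` can be illustrated with a small sketch that pools the spikes of two units and counts the inter-spike intervals (ISI) shorter than ``time_rpv``. The function names are hypothetical, and treating ``rpv_threshold`` as a fraction of the ISIs is an assumption of this example; see :doc:`Automatic Merging <../code/merging>` for the actual procedure.

.. code-block:: python

    def rpv_fraction(spike_times_ms, time_rpv):
        """Fraction of inter-spike intervals (ISI) shorter than `time_rpv` ms."""
        times = sorted(spike_times_ms)
        isis = [b - a for a, b in zip(times, times[1:])]
        if not isis:
            return 0.0
        return sum(isi < time_rpv for isi in isis) / len(isis)


    def merge_allowed(times_a, times_b, time_rpv=5, rpv_threshold=0.02):
        """Accept the merge only if the pooled spike train has few enough RPVs."""
        if time_rpv == 0:  # a value of 0 disables the check
            return True
        pooled = list(times_a) + list(times_b)
        return rpv_fraction(pooled, time_rpv) <= rpv_threshold


    # Toy spike trains (times in ms).
    unit_a = [0, 100, 200, 300, 400]
    unit_b = [50, 150, 250, 350, 450]   # interleaved with unit_a, no short ISI
    unit_c = [2, 102, 202, 302, 402]    # fires 2 ms after unit_a every time

    print(merge_allowed(unit_a, unit_b))
    print(merge_allowed(unit_a, unit_c))

In this sketch, merging ``unit_a`` with ``unit_c`` is refused because more than half of the pooled ISIs are shorter than 5 ms, which suggests that the two units are distinct neurons.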

Converting
----------

The converting section is::

    erase_all      = True      # If False, a prompt will ask you to export if export has already been done
    sparse_export  = True      # If True, data for phy are exported in a sparse format. Need recent version of phy
    export_pcs     = prompt    # Can be prompt [default] or in none, all, some
    export_all     = False     # If True, unfitted spikes will be exported as the last Ne templates

Parameters that are most likely to be changed:
    * ``erase_all`` If you want to always erase former export, and skip the prompt
    * ``sparse_export`` If you have a large number of templates or a very high density probe, you should use the sparse format for phy
    * ``export_pcs`` If you already know that you want to have all, some, or no PC and skip the prompt
    * ``export_all`` If you used the ``collect_all`` mode in the ``[fitting]`` section, you can export unfitted spike times to phy. In this case, the last `N` templates, if `N` is the number of electrodes, are the garbage collectors.

Extracting
----------

The extracting section is::

    safety_time    = 1         # Temporal zone around which spikes are isolated [in ms]
    max_elts       = 10000     # Max number of events per templates (should be compatible with nb_elts)
    nb_elts        = 0.8       # Fraction of max_elts that should be obtained per electrode [0-1]
    output_dim     = 5         # Percentage of variance explained while performing PCA
    cc_merge       = 0.975     # If CC between two templates is higher, they are merged
    noise_thr      = 0.8       # Minimal amplitudes are such than amp*min(templates) < noise_thr*threshold


This is an experimental section, not used by default in the algorithm, so there is nothing to change here.

Validating
----------

The validating section is::

    nearest_elec   = auto      # Validation channel (e.g. electrode closest to the ground truth cell)
    max_iter       = 200       # Maximum number of iterations of the stochastic gradient descent (SGD)
    learning_rate  = 1.0e-3    # Initial learning rate which controls the step-size of the SGD
    roc_sampling   = 10        # Number of points to estimate the ROC curve of the BEER estimate
    test_size      = 0.3       # Portion of the dataset to include in the test split
    radius_factor  = 0.5       # Radius factor to modulate physical radius during validation
    juxta_dtype    = uint16    # Type of the juxtacellular data
    juxta_thresh   = 6         # Threshold for juxtacellular detection
    juxta_valley   = False     # True if juxta-cellular spikes are negative peaks
    juxta_spikes   =           # If none, spikes are automatically detected based on juxta_thresh
    filter         = True      # If the juxta channel need to be filtered or not
    make_plots     = png       # Generate sanity plots of the validation [Nothing or None if no plots]

Please get in touch with us if you want to use this section, which is intended only for validation purposes. It is an implementation of the :doc:`BEER metric <../advanced/beer>`.
