Discussion:
[pymvpa] Parallelization
marco tettamanti
2017-11-09 09:08:11 UTC
Permalink
Dear all,
forgive me if this has already been asked in the past, but I was wondering
whether there have been any developments in the meantime.

Is there any chance that one can apply parallel computing more generally (multiple
CPUs or clusters) with pymvpa2, beyond what is already implemented for the
searchlight (pprocess)? That is, also for general cross-validation, nested
cross-validation, permutation testing, RFE, etc.?

Has anyone had successful experience with parallelization schemes such as
ipyparallel, Condor, or others?

Thank you and best wishes!
Marco
--
Marco Tettamanti, Ph.D.
Nuclear Medicine Department & Division of Neuroscience
IRCCS San Raffaele Scientific Institute
Via Olgettina 58
I-20132 Milano, Italy
Phone ++39-02-26434888
Fax ++39-02-26434892
Email: ***@hsr.it
Skype: mtettamanti
Matteo Visconti di Oleggio Castello
2017-11-09 18:35:39 UTC
Permalink
Hi Marco,
AFAIK, there is no support for parallelization at the level of
cross-validation. Usually for a small ROI (such as a searchlight) and with
standard CV schemes, the process is quite fast, and the bottleneck is
really the number of searchlights to be computed (for which parallelization
exists).

In my experience, we tend to parallelize at the level of individual
participants; for example, we might set up a searchlight analysis with as many
processes (nproc) as you have available, and then submit one such job per
participant to a cluster (using either torque or condor); see the sketch below.
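
A minimal sketch of that scheme, assuming a hypothetical load_subject() helper
and a subject id passed on the command line; everything else uses standard
PyMVPA calls, with nproc handing the searchlight centers to pprocess:

import sys
from mvpa2.suite import (CrossValidation, NFoldPartitioner, LinearCSVMC,
                         sphere_searchlight, h5save)

subj = sys.argv[1]                    # one cluster job (torque/condor) per participant
ds = load_subject(subj)               # hypothetical loader returning a PyMVPA dataset
cv = CrossValidation(LinearCSVMC(), NFoldPartitioner())
sl = sphere_searchlight(cv, radius=3, nproc=8)  # nproc parallelizes over searchlight centers
h5save('searchlight_%s.hdf5' % subj, sl(ds))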

HTH,
Matteo
--
Matteo Visconti di Oleggio Castello
Ph.D. Candidate in Cognitive Neuroscience
Dartmouth College

+1 (603) 646-8665
mvdoc.me || github.com/mvdoc || linkedin.com/in/matteovisconti
Nick Oosterhof
2017-11-10 07:20:17 UTC
Permalink
Post by marco tettamanti
Are there any chances that one can generally apply parallel computing
(multiple CPUs or clusters) with pymvpa2, in addition to what is already
implemented for searchlight (pprocess)? That is, also for general
cross-validation, nested cross-validation, permutation testing, RFE, etc.?
There have been some plans / minor attempts at using parallelisation more
broadly, but as far as I know we only support pprocess, and only for (1)
searchlight; (2) surface-based voxel selection; and (3) hyperalignment. I
do remember that parallelisation of other functions was challenging, partly
because of getting the conditional attributes set right, but that was a long
time ago.
marco tettamanti
2017-11-10 08:57:04 UTC
Permalink
Dear Matteo and Nick,
thank you for your responses.
I take this opportunity to ask some follow-up questions, because I am struggling
to make my pymvpa2 computations faster and more efficient.

I often find myself giving up on a particular analysis, because it is going to
take far more time than I can bear (weeks, months!). This happens particularly
with searchlight permutation testing (gnbsearchlight is much faster, but does
not support pprocess), and with nested cross-validation.
As for the latter, for example, I recently wanted to run nested cross-validation
in a sample of 18 patients and 18 controls (1 image per subject), training the
classifiers to discriminate patients from controls in a leave-one-pair-out
partitioning scheme. This yields 18*18=324 folds. For a small ROI of 36 voxels,
cycling over approx. 40 different classifiers takes about 2 hours for each fold
on a decent PowerEdge T430 Dell server with 128GB RAM. This means approx. 27
days for all 324 folds!
The same server is equipped with 32 CPUs. With full parallelization, the same
analysis could be completed in less than one day. This is the reason for my
interest in, and questions about, parallelization.

Is there anything that you experts do in such situations to speed up the
computation or make it more efficient?

Thank you again and best wishes,
Marco
Matteo Visconti di Oleggio Castello
2017-11-10 15:43:01 UTC
Permalink
Post by marco tettamanti
As for the latter, for example, I recently wanted to run nested
cross-validation in a sample of 18 patients and 18 controls (1 image x
subject), training the classifiers to discriminate patients from controls
in a leave-one-pair-out partitioning scheme. This yields 18*18=324 folds.
For a small ROI of 36 voxels, cycling over approx 40 different classifiers
takes about 2 hours for each fold on a decent PowerEdge T430 Dell server
with 128GB RAM. This means approx. 27 days for all 324 folds!
What do you mean by "cycling over approx 40 different classifiers"? Are
you testing different classifiers? If that's the case, a possibility is to
create a script that takes the classifier type as an argument and runs the
classification across all folds. That way you can submit 40 jobs and
parallelize across classifiers.

If that's not the case, then because the folds are independent and deterministic
I would create a script that performs the classification on blocks of folds
(say folds 1 to 30, 31 to 60, etc.), and then submit a separate job for each
block, parallelizing there (see the sketch below).
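
A rough sketch of the blocks-of-folds idea; here fds and partitionerCD stand in
for your own dataset and partitioner, run_fold() is a hypothetical per-fold
analysis, and the block index would come from the job submission:

import sys
from itertools import islice
from mvpa2.suite import h5save

block = int(sys.argv[1])        # which block of folds this job handles: 0, 1, 2, ...
block_size = 30
start, stop = block * block_size, (block + 1) * block_size

# the fold sequence is deterministic, so each job can safely slice out its own block
results = [(isplit, run_fold(partitions))
           for isplit, partitions in islice(enumerate(partitionerCD.generate(fds)),
                                            start, stop)]
h5save('folds_%03i-%03i.hdf5' % (start, stop - 1), results)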

I think that if you send a snippet of the code you're using, it will be easier
to see which are the good points for parallelization.
--
Matteo Visconti di Oleggio Castello
Ph.D. Candidate in Cognitive Neuroscience
Dartmouth College

+1 (603) 646-8665
mvdoc.me || github.com/mvdoc || linkedin.com/in/matteovisconti
marco tettamanti
2017-11-10 20:13:48 UTC
Permalink
Dear Matteo,
thank you for your willingness to look into my code.

This is taken almost verbatim from
http://dev.pymvpa.org/examples/nested_cv.html, except for the leave-one-pair-out
partitioning, and a slight reduction in the number of classifiers (in the
original example, they are around 45).

Any help or suggestion would be greatly appreciated!
All the best,
Marco


########## * ##########
##########

PyMVPA:
 Version:       2.6.3
 Hash:          9c07e8827819aaa79ff15d2db10c420a876d7785
 Path:          /usr/lib/python2.7/dist-packages/mvpa2/__init__.pyc
 Version control (GIT):
 GIT information could not be obtained due
"/usr/lib/python2.7/dist-packages/mvpa2/.. is not under GIT"
SYSTEM:
 OS:            posix Linux 4.13.0-1-amd64 #1 SMP Debian 4.13.4-2 (2017-10-15)


print fds.summary()
Dataset: ***@float32, <sa: chunks,targets,time_coords,time_indices>, <fa:
voxel_indices>, <a: imgaffine,imghdr,imgtype,mapper,voxel_dim,voxel_eldim>
stats: mean=0.548448 std=1.40906 var=1.98546 min=-5.41163 max=9.88639
No details due to large number of targets or chunks. Increase maxc and maxt if
desired
Summary for targets across chunks
  targets mean std min max #chunks
    C      0.5 0.5  0   1     18
    D      0.5 0.5  0   1     18


#Evaluate prevalent best classifier with nested crossvalidation
verbose.level = 5

partitionerCD = ChainNode([NFoldPartitioner(cvtype=2, attr='chunks'),
                           Sifter([('partitions', 2), ('targets', ['C', 'D'])])],
                          space='partitions')
# training partitions
for fds_ in partitionerCD.generate(fds):
    training = fds[fds_.sa.partitions == 1]
    #print list(zip(training.sa.chunks, training.sa.targets))
# testing partitions
for fds_ in partitionerCD.generate(fds):
    testing = fds[fds_.sa.partitions == 2]
    #print list(zip(testing.sa.chunks, testing.sa.targets))

#Helper function (partitionerCD recursively acting on dstrain, rather than on fds):
def select_best_clf(dstrain_, clfs):
    """Select best model according to CVTE
    Helper function which we will use twice -- once for proper nested
    cross-validation, and once to see how big an optimistic bias due
    to model selection could be if we simply provide an entire dataset.
    Parameters
    ----------
    dstrain_ : Dataset
    clfs : list of Classifiers
      Which classifiers to explore
    Returns
    -------
    best_clf, best_error
    """
    best_error = None
    for clf in clfs:
        cv = CrossValidation(clf, partitionerCD)
        # unfortunately we don't have ability to reassign clf atm
        # cv.transerror.clf = clf
        try:
            error = np.mean(cv(dstrain_))
        except LearnerError, e:
            # skip the classifier if data was not appropriate and it
            # failed to learn/predict at all
            continue
        if best_error is None or error < best_error:
            best_clf = clf
            best_error = error
        verbose(4, "Classifier %s cv error=%.2f" % (clf.descr, error))
    verbose(3, "Selected the best out of %i classifiers %s with error %.2f"
            % (len(clfs), best_clf.descr, best_error))
    return best_clf, best_error

#Estimate error using nested CV for model selection:
best_clfs = {}
confusion = ConfusionMatrix()
verbose(1, "Estimating error using nested CV for model selection")
partitioner = partitionerCD
splitter = Splitter('partitions')
for isplit, partitions in enumerate(partitionerCD.generate(fds)):
    verbose(2, "Processing split #%i" % isplit)
    dstrain, dstest = list(splitter.generate(partitions))
    best_clf, best_error = select_best_clf(dstrain, clfswh['!gnpp','!skl'])
    best_clfs[best_clf.descr] = best_clfs.get(best_clf.descr, 0) + 1
    # now that we have the best classifier, lets assess its transfer
    # to the testing dataset while training on entire training
    tm = TransferMeasure(best_clf, splitter,
                         postproc=BinaryFxNode(mean_mismatch_error, space='targets'),
                         enable_ca=['stats'])
    tm(partitions)
    confusion += tm.ca.stats

##########
########## * ##########
marco tettamanti
2017-11-23 11:07:48 UTC
Permalink
Dear Matteo (and others),
sorry, I am again asking for your help!

I have experimented with the analysis of my dataset using an adaptation of your
joblib-based gist.
As I wrote before, it works perfectly, except with some classifiers: SVM
classifiers always cause the code to terminate with an error.

If I set:
        myclassif=clfswh['!gnpp','!skl','!svm']    #Note that 'gnpp' and 'skl'
were excluded for independent reasons
the code runs through without errors.

However, with:
        myclassif=clfswh['!gnpp','!skl']
I get the following error:
        MaybeEncodingError: Error sending result:
        '[TransferMeasure(measure=SVM(svm_impl='C_SVC', kernel=LinearLSKernel(),
        weight=[], probability=1, weight_label=[]), splitter=Splitter(space='partitions'),
        postproc=BinaryFxNode(space='targets'), enable_ca=['stats'])]'.
        Reason: 'TypeError("can't pickle SwigPyObject objects",)'

After googling for what may cause this particular error, I found that the
situation improves slightly (i.e., more splits executed, sometimes even all
splits) by importing the following:
        import os
        from sklearn.externals.joblib import Parallel, delayed
        from sklearn.externals.joblib.parallel import parallel_backend
and then specifying just before 'Parallel(n_jobs=2)':
        with parallel_backend('threading'):
However, even in this case, the code invariably terminates with a long error
message (I only report an extract here; I can send the whole error message if needed):
        <type 'str'>: (<type 'exceptions.UnicodeEncodeError'>,
UnicodeEncodeError('ascii',
u'JoblibAttributeError\n___________________________________________________________________________\nMultiprocessing
exception:\n...........................................................................\n/usr/lib/python2.7/runpy.py
in
       _run_module_as_main(mod_name=\'ipykernel_launcher\', alter_argv=1)\n   
169     pkg_name = mod_name.rpartition(\'.\')[0]\n    170
       main_globals = sys.modules["__main__"].__dict__\n 171     if
alter_argv:\n    172         sys.argv[0] = fname\n 173     return _run_code(code,
       main_globals, None,\n--> 174


I think I have more or less understood that the problem is due to some failure in
pickling the parallelized jobs, but I have no clue whether and how it can be solved.
Do you have any suggestions?

Thank you and very best wishes,
Marco

p.s. This is again the full code:

########## * ##########
##########

PyMVPA:
 Version:       2.6.3
 Hash:          9c07e8827819aaa79ff15d2db10c420a876d7785
 Path:          /usr/lib/python2.7/dist-packages/mvpa2/__init__.pyc
 Version control (GIT):
 GIT information could not be obtained due
"/usr/lib/python2.7/dist-packages/mvpa2/.. is not under GIT"
SYSTEM:
 OS:            posix Linux 4.13.0-1-amd64 #1 SMP Debian 4.13.4-2 (2017-10-15)


print fds.summary()
Dataset: ***@float32, <sa: chunks,targets,time_coords,time_indices>, <fa:
voxel_indices>, <a: imgaffine,imghdr,imgtype,mapper,voxel_dim,voxel_eldim>
stats: mean=0.548448 std=1.40906 var=1.98546 min=-5.41163 max=9.88639
No details due to large number of targets or chunks. Increase maxc and maxt if
desired
Summary for targets across chunks
  targets mean std min max #chunks
    C      0.5 0.5  0   1     18
    D      0.5 0.5  0   1     18


#Evaluate prevalent best classifier with nested crossvalidation
verbose.level = 5

partitionerCD = ChainNode([NFoldPartitioner(cvtype=2, attr='chunks'),
                           Sifter([('partitions', 2), ('targets', ['C', 'D'])])],
                          space='partitions')
# training partitions
for fds_ in partitionerCD.generate(fds):
    training = fds[fds_.sa.partitions == 1]
    #print list(zip(training.sa.chunks, training.sa.targets))
# testing partitions
for fds_ in partitionerCD.generate(fds):
    testing = fds[fds_.sa.partitions == 2]
    #print list(zip(testing.sa.chunks, testing.sa.targets))

#Helper function (partitionerCD recursively acting on dstrain, rather than on fds):
def select_best_clf(dstrain_, clfs):
    """Select best model according to CVTE
    Helper function which we will use twice -- once for proper nested
    cross-validation, and once to see how big an optimistic bias due
    to model selection could be if we simply provide an entire dataset.
    Parameters
    ----------
    dstrain_ : Dataset
    clfs : list of Classifiers
      Which classifiers to explore
    Returns
    -------
    best_clf, best_error
    """
    best_error = None
    for clf in clfs:
        cv = CrossValidation(clf, partitionerCD)
        # unfortunately we don't have ability to reassign clf atm
        # cv.transerror.clf = clf
        try:
            error = np.mean(cv(dstrain_))
        except LearnerError, e:
            # skip the classifier if data was not appropriate and it
            # failed to learn/predict at all
            continue
        if best_error is None or error < best_error:
            best_clf = clf
            best_error = error
        verbose(4, "Classifier %s cv error=%.2f" % (clf.descr, error))
    verbose(3, "Selected the best out of %i classifiers %s with error %.2f"
            % (len(clfs), best_clf.descr, best_error))
    return best_clf, best_error

# This function will run all classifiers for one single partitions
myclassif=clfswh['!gnpp','!skl'][5:6]  #Testing a single SVM classifier
def _run_one_partition(isplit, partitions, classifiers=myclassif): #see §§
    verbose(2, "Processing split #%i" % isplit)
    dstrain, dstest = list(splitter.generate(partitions))
    best_clf, best_error = select_best_clf(dstrain, classifiers)
    # now that we have the best classifier, lets assess its transfer
    # to the testing dataset while training on entire training
    tm = TransferMeasure(best_clf, splitter,
                         postproc=BinaryFxNode(mean_mismatch_error, space='targets'),
                         enable_ca=['stats'])
    tm(partitions)
    return tm

#import os
#from sklearn.externals.joblib import Parallel, delayed
#from sklearn.externals.joblib.parallel import parallel_backend

# Parallel estimate error using nested CV for model selection
confusion = ConfusionMatrix()
verbose(1, "Estimating error using nested CV for model selection")
partitioner = partitionerCD
splitter = Splitter('partitions')
# Here we are using joblib Parallel to parallelize each partition
# Set n_jobs to the number of available cores (or how many you want to use)
#with parallel_backend('threading'):
#    tms = Parallel(n_jobs=2)(delayed(_run_one_partition)(isplit, partitions)
tms = Parallel(n_jobs=2)(delayed(_run_one_partition)(isplit, partitions)
                         for isplit, partitions in enumerate(partitionerCD.generate(fds)))
# Parallel returns a list with the results of each parallel loop, so we need to
# unravel it to get the confusion matrix
best_clfs = {}
for tm in tms:
    confusion += tm.ca.stats
    best_clfs[tm.measure.descr] = best_clfs.get(tm.measure.descr, 0) + 1

##########
########## * ##########
Post by marco tettamanti
Dear Matteo,
grazie mille, this is precisely the kind of thing I was looking for: it works
like a charm!
Ciao,
Marco
Post by Matteo Visconti di Oleggio Castello
Hi Marco,
in your case, I would then recommend looking into joblib to parallelize
your for loops (https://pythonhosted.org/joblib/parallel.html).
As an example, here's a gist containing part of PyMVPA's nested_cv
example where I parallelized the loop across partitions. I feel this is
what you might want to do in your case, since you have a lot more folds.
https://gist.github.com/mvdoc/0c2574079dfde78ea649e7dc0a3feab0
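
For readers unfamiliar with joblib, the Parallel/delayed pattern used in the gist
boils down to something like this toy example (not specific to PyMVPA):

from joblib import Parallel, delayed

def square(x):
    return x * x

# run square() on 10 inputs using 4 worker processes; results come back as an ordered list
results = Parallel(n_jobs=4)(delayed(square)(x) for x in range(10))
print results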
Matteo Visconti di Oleggio Castello
2017-11-24 17:32:33 UTC
Permalink
Hi Marco,

some ideas, in random order:

- what version of sklearn/joblib are you using? I would make sure to use
the latest joblib version (0.11), and perhaps not import it from sklearn (unless
you have the latest sklearn version, 0.19.1)
- are you running the code in a jupyter notebook? There might be some
issues with that (see https://github.com/joblib/joblib/issues/174). As a
test you might try to convert your notebook to a script and then run it
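- another thing worth trying, sketched below under the assumption that the pickling
error comes from sending the whole TransferMeasure (which still holds the trained
libsvm/SWIG object) back from the worker: have your _run_one_partition return only
picklable pieces, i.e. the confusion stats and the classifier description, instead
of the measure itself

def _run_one_partition(isplit, partitions, classifiers=myclassif):
    verbose(2, "Processing split #%i" % isplit)
    dstrain, dstest = list(splitter.generate(partitions))
    best_clf, best_error = select_best_clf(dstrain, classifiers)
    tm = TransferMeasure(best_clf, splitter,
                         postproc=BinaryFxNode(mean_mismatch_error, space='targets'),
                         enable_ca=['stats'])
    tm(partitions)
    return tm.ca.stats, best_clf.descr   # plain stats object + string, both picklable

out = Parallel(n_jobs=2)(delayed(_run_one_partition)(isplit, partitions)
                         for isplit, partitions in enumerate(partitionerCD.generate(fds)))
best_clfs = {}
confusion = ConfusionMatrix()
for stats, descr in out:
    confusion += stats
    best_clfs[descr] = best_clfs.get(descr, 0) + 1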
--
Matteo Visconti di Oleggio Castello
Ph.D. Candidate in Cognitive Neuroscience
Dartmouth College

+1 (603) 646-8665
mvdoc.me || github.com/mvdoc || linkedin.com/in/matteovisconti
marco tettamanti
2017-11-24 20:57:50 UTC
Permalink
Dear Matteo,
thank you for kindly replying!

Yes, I do have the latest versions of joblib (0.11) and sklearn (0.19.1); see the
bottom of this email.

The problem seems to be independent of whether I run the code in jupyter, or
invoke ipython or python directly in the console.

I am now wondering whether there may be something wrong in my snippet.
When first running your gist, I encountered an:

        UnboundLocalError: local variable 'best_clf' referenced before assignment

which I solved by moving the best_clfs declaration a few lines down:

-----------------------------------------
#best_clfs = {}  #moved down 7 lines
confusion = ConfusionMatrix()
verbose(1, "Estimating error using nested CV for model selection")
partitioner = partitionerCD
splitter = Splitter('partitions')
tms = Parallel(n_jobs=2)(delayed(_run_one_partition)(isplit, partitions)
                         for isplit, partitions in enumerate(partitionerCD.generate(fds)))
best_clfs = {}
for tm in tms:
    confusion += tm.ca.stats
    best_clfs[tm.measure.descr] = best_clfs.get(tm.measure.descr, 0) + 1
-----------------------------------------

But now, running the snippet in ipython/python specifically for the SVM
parallelization issue, I saw the error message popping up again:

         UnboundLocalError: local variable 'best_clf' referenced before assignment

May this be the culprit? As a reminder, the full snippet I am using is included
in my previous email.
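
One way to guard against this, assuming the error indeed originates in
select_best_clf() on folds where every classifier fails with a LearnerError (so
that best_clf is never assigned), would be to initialize best_clf explicitly and
fail loudly if nothing could be trained, roughly:

    best_clf = None
    best_error = None
    for clf in clfs:
        cv = CrossValidation(clf, partitionerCD)
        try:
            error = np.mean(cv(dstrain_))
        except LearnerError, e:
            # classifier could not learn/predict on this data; skip it
            continue
        if best_error is None or error < best_error:
            best_clf = clf
            best_error = error
    if best_clf is None:
        raise RuntimeError("select_best_clf: no classifier could be trained on this fold")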

Thank you and very best wishes,
Marco


In [21]: mvpa2.wtf(exclude=['runtime','process']) ##other possible arguments
(['sources',
Out[21]:
Current date:   2017-11-24 21:23
PyMVPA:
 Version:       2.6.3
 Hash:          9c07e8827819aaa79ff15d2db10c420a876d7785
 Path: /usr/lib/python2.7/dist-packages/mvpa2/__init__.pyc
 Version control (GIT):
 GIT information could not be obtained due
"/usr/lib/python2.7/dist-packages/mvpa2/.. is not under GIT"
SYSTEM:
 OS:            posix Linux 4.13.0-1-amd64 #1 SMP Debian 4.13.4-2 (2017-10-15)
 Distribution:  debian/buster/sid
EXTERNALS:
 Present:       atlas_fsl, cPickle, ctypes, good
scipy.stats.rv_continuous._reduce_func(floc,fscale), good
scipy.stats.rv_discrete.ppf, griddata, gzip, h5py, hdf5, ipython, joblib,
liblapack.so, libsvm, libsvm verbosity control, lxml, matplotlib, mdp, mdp ge
2.4, mock, nibabel, nose, numpy, numpy_correct_unique, pprocess, pylab, pylab
plottable, pywt, pywt wp reconstruct, reportlab, running ipython env, scipy,
skl, statsmodels
 Absent:        afni-3dinfo, atlas_pymvpa, cran-energy, datalad, elasticnet,
glmnet, good scipy.stats.rdist, hcluster, lars, mass, nipy, nipy.neurospin,
numpydoc, openopt, pywt wp reconstruct fixed, rpy2, scipy.weave, sg ge 0.6.4, sg
ge 0.6.5, sg_fixedcachesize, shogun, shogun.krr, shogun.lightsvm, shogun.mpd,
shogun.svmocas, shogun.svrlight, weave
 Versions of critical externals:
  ctypes      : 1.1.0
  h5py        : 2.7.1
  hdf5        : 1.10.0
  ipython     : 5.5.0
  joblib      : 0.11
  lxml        : 4.1.0
  matplotlib  : 2.0.0
  mdp         : 3.5
  mock        : 2.0.0
  nibabel     : 2.3.0dev
  numpy       : 1.13.1
  pprocess    : 0.5
  pywt        : 0.5.1
  reportlab   : 3.4.0
  scipy       : 0.19.1
  skl         : 0.19.1
 Matplotlib backend: TkAgg
Post by Matteo Visconti di Oleggio Castello
Hi Marco,
some ideas in random order
- what version of sklearn/joblib are you using? I would make sure to use
the latest version (0.11), perhaps not importing it from sklearn (unless
you have the latest sklearn version, 0.19.1)
- are you running the code in a jupyter notebook? There might be some
issues with that (see https://github.com/joblib/joblib/issues/174). As a
test you might try to convert your notebook to a script and then run it
Post by marco tettamanti
Dear Matteo (and others),
sorry, I am again asking for your help!
I have experimented with the analysis of my dataset using an adaptation of
your joblib-based gist.
As I wrote before, it works perfectly, but not with some classifiers: SVM
classifiers always cause the code to terminate with an error.
        myclassif=clfswh['!gnpp','!skl','svm']    #Note that 'gnnp' and 'skl'
were excluded for independent reasons
the code runs through without errors.
        myclassif=clfswh['!gnpp','!skl']
'[TransferMeasure(measure=SVM(svm_impl='C_SVC', kernel=LinearLSKernel(),
weight=[], probability=1,
         weight_label=[]), splitter=Splitter(space='partitions'),
'TypeError("can't
        pickle SwigPyObject objects",)'
After googling for what may cause this particular error, I have found that
the situation improves slightly (i.e. more splits executed, sometimes even
        import os
        from sklearn.externals.joblib import Parallel, delayed
        from sklearn.externals.joblib.parallel import parallel_backend
However, also in this case, the code invariably terminates with a long error
        <type 'str'>: (<type 'exceptions.UnicodeEncodeError'>,
UnicodeEncodeError('ascii',
u'JoblibAttributeError\n___________________________________________________________________________\nMultiprocessing
exception:\n...........................................................................\n/usr/lib/python2.7/runpy.py
in
       _run_module_as_main(mod_name=\'ipykernel_launcher\',
alter_argv=1)\n    169     pkg_name = mod_name.rpartition(\'.\')[0]\n    170
       main_globals = sys.modules["__main__"].__dict__\n 171     if
alter_argv:\n    172         sys.argv[0] = fname\n    173     return
_run_code(code,
       main_globals, None,\n--> 174
I think I have sort of understood that the problem is due to some failure in
pickling the parallelized jobs, but I have no clues if and how it can be solved.
Do you have any suggestions?
Thank you and very best wishes,
Marco
########## * ##########
##########
 Version:       2.6.3
 Hash:          9c07e8827819aaa79ff15d2db10c420a876d7785
 Path: /usr/lib/python2.7/dist-packages/mvpa2/__init__.pyc
 GIT information could not be obtained due
"/usr/lib/python2.7/dist-packages/mvpa2/.. is not under GIT"
 OS:            posix Linux 4.13.0-1-amd64 #1 SMP Debian 4.13.4-2 (2017-10-15)
print fds.summary()
voxel_indices>, <a: imgaffine,imghdr,imgtype,mapper,voxel_dim,voxel_eldim>
stats: mean=0.548448 std=1.40906 var=1.98546 min=-5.41163 max=9.88639
No details due to large number of targets or chunks. Increase maxc and maxt
if desired
Summary for targets across chunks
  targets mean std min max #chunks
    C      0.5 0.5  0   1     18
    D      0.5 0.5  0   1     18
#Evaluate prevalent best classifier with nested crossvalidation
verbose.level = 5
partitionerCD = ChainNode([NFoldPartitioner(cvtype=2, attr='chunks'),
Sifter([('partitions', 2), ('targets', ['C', 'D'])])], space='partitions')
# training partitions
    training = fds[fds_.sa.partitions == 1]
    #print list(zip(training.sa.chunks, training.sa.targets))
# testing partitions
    testing = fds[fds_.sa.partitions == 2]
    #print list(zip(testing.sa.chunks, testing.sa.targets))
    """Select best model according to CVTE
    Helper function which we will use twice -- once for proper nested
    cross-validation, and once to see how big an optimistic bias due
    to model selection could be if we simply provide an entire dataset.
    Parameters
    ----------
    dstrain_ : Dataset
    clfs : list of Classifiers
      Which classifiers to explore
    Returns
    -------
    best_clf, best_error
    """
    best_error = None
        cv = CrossValidation(clf, partitionerCD)
        # unfortunately we don't have ability to reassign clf atm
        # cv.transerror.clf = clf
            error = np.mean(cv(dstrain_))
            # skip the classifier if data was not appropriate and it
            # failed to learn/predict at all
            continue
            best_clf = clf
            best_error = error
        verbose(4, "Classifier %s cv error=%.2f" % (clf.descr, error))
    verbose(3, "Selected the best out of %i classifiers %s with error %.2f"
            % (len(clfs), best_clf.descr, best_error))
    return best_clf, best_error
# This function will run all classifiers for one single partitions
myclassif=clfswh['!gnpp','!skl'][5:6]  #Testing a single SVM classifier
def _run_one_partition(isplit, partitions, classifiers=myclassif): #see §§
    verbose(2, "Processing split #%i" % isplit)
    dstrain, dstest = list(splitter.generate(partitions))
    best_clf, best_error = select_best_clf(dstrain, classifiers)
    # now that we have the best classifier, lets assess its transfer
    # to the testing dataset while training on entire training
    tm = TransferMeasure(best_clf,
splitter,postproc=BinaryFxNode(mean_mismatch_error,space='targets'),
enable_ca=['stats'])
    tm(partitions)
    return tm
#import os
#from sklearn.externals.joblib import Parallel, delayed
#from sklearn.externals.joblib.parallel import parallel_backend
# Parallel estimate error using nested CV for model selection
confusion = ConfusionMatrix()
verbose(1, "Estimating error using nested CV for model selection")
partitioner = partitionerCD
splitter = Splitter('partitions')
# Here we are using joblib Parallel to parallelize each partition
# Set n_jobs to the number of available cores (or how many you want to use)
#    tms = Parallel(n_jobs=2)(delayed(_run_one_partition)(isplit, partitions)
tms = Parallel(n_jobs=2)(delayed(_run_one_partition)(isplit, partitions)
                         for isplit, partitions in
enumerate(partitionerCD.generate(fds)))
# Parallel retuns a list with the results of each parallel loop, so we need to
# unravel it to get the confusion matrix
best_clfs = {}
    confusion += tm.ca.stats
    best_clfs[tm.measure.descr] = best_clfs.get(tm.measure.descr, 0) + 1
##########
########## * ##########
Post by marco tettamanti
Dear Matteo,
grazie mille, this is precisely the kind of thing I was looking for: it
works like a charm!
Ciao,
Marco
Post by Matteo Visconti di Oleggio Castello
Hi Marco,
in your case, I would then recommend looking into joblib to parallelize
your for loops (https://pythonhosted.org/joblib/parallel.html).
As an example, here's a gist containing part of the PyMVPA's nested_cv
example where I parallelized the loop across partitions. I feel this is
what you might want to do in your case, since you have a lot more folds.
https://gist.github.com/mvdoc/0c2574079dfde78ea649e7dc0a3feab0
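The basic joblib idiom those docs describe is simply the following (a generic
sketch, not code from the gist; my_function stands in for whatever one
iteration of the original for loop does):

from joblib import Parallel, delayed

def my_function(i):
    # placeholder for the body of the original for loop
    return i * i

# run the 10 iterations on 4 cores; results come back as an ordered list
results = Parallel(n_jobs=4)(delayed(my_function)(i) for i in range(10))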
Post by marco tettamanti
Dear Matteo,
thank you for the willingness to look into my code.
This is taken almost verbatim from
http://dev.pymvpa.org/examples/nested_cv.html, except for the
leave-one-pair-out partitioning, and a slight reduction in the number of
classifiers (in the original example, they are around 45).
Any help or suggestion would be greatly appreciated!
All the best,
Marco
########## * ##########
##########
 Version:       2.6.3
 Hash:          9c07e8827819aaa79ff15d2db10c420a876d7785
 Path: /usr/lib/python2.7/dist-packages/mvpa2/__init__.pyc
 GIT information could not be obtained due
"/usr/lib/python2.7/dist-packages/mvpa2/.. is not under GIT"
 OS:            posix Linux 4.13.0-1-amd64 #1 SMP Debian 4.13.4-2 (2017-10-15)
print fds.summary()
imgaffine,imghdr,imgtype,mapper,voxel_dim,voxel_eldim>
stats: mean=0.548448 std=1.40906 var=1.98546 min=-5.41163 max=9.88639
No details due to large number of targets or chunks. Increase maxc and
maxt if desired
Summary for targets across chunks
  targets mean std min max #chunks
    C      0.5 0.5  0   1     18
    D      0.5 0.5  0   1     18
#Evaluate prevalent best classifier with nested cross-validation
# (assumes `from mvpa2.suite import *`, numpy as np, and the dataset `fds`)
verbose.level = 5
partitionerCD = ChainNode([NFoldPartitioner(cvtype=2, attr='chunks'),
                           Sifter([('partitions', 2),
                                   ('targets', ['C', 'D'])])],
                          space='partitions')

# sanity check: inspect training and testing partitions for each fold
for fds_ in partitionerCD.generate(fds):
    # training partitions
    training = fds[fds_.sa.partitions == 1]
    #print list(zip(training.sa.chunks, training.sa.targets))
    # testing partitions
    testing = fds[fds_.sa.partitions == 2]
    #print list(zip(testing.sa.chunks, testing.sa.targets))

def select_best_clf(dstrain_, clfs):
    """Select best model according to CVTE

    Helper function which we will use twice -- once for proper nested
    cross-validation, and once to see how big an optimistic bias due
    to model selection could be if we simply provide an entire dataset.

    Parameters
    ----------
    dstrain_ : Dataset
    clfs : list of Classifiers
      Which classifiers to explore

    Returns
    -------
    best_clf, best_error
    """
    best_clf = None
    best_error = None
    for clf in clfs:
        cv = CrossValidation(clf, partitionerCD)
        # unfortunately we don't have ability to reassign clf atm
        # cv.transerror.clf = clf
        try:
            error = np.mean(cv(dstrain_))
        except LearnerError:  # as in the nested_cv.py example
            # skip the classifier if data was not appropriate and it
            # failed to learn/predict at all
            continue
        if best_error is None or error < best_error:
            best_clf = clf
            best_error = error
        verbose(4, "Classifier %s cv error=%.2f" % (clf.descr, error))
    verbose(3, "Selected the best out of %i classifiers %s with error %.2f"
            % (len(clfs), best_clf.descr, best_error))
    return best_clf, best_error

best_clfs = {}
confusion = ConfusionMatrix()
verbose(1, "Estimating error using nested CV for model selection")
partitioner = partitionerCD
splitter = Splitter('partitions')
for isplit, partitions in enumerate(partitioner.generate(fds)):
    verbose(2, "Processing split #%i" % isplit)
    dstrain, dstest = list(splitter.generate(partitions))
    best_clf, best_error = select_best_clf(dstrain, clfswh['!gnpp', '!skl'])
    best_clfs[best_clf.descr] = best_clfs.get(best_clf.descr, 0) + 1
    # now that we have the best classifier, let's assess its transfer
    # to the testing dataset while training on the entire training set
    tm = TransferMeasure(best_clf, splitter,
                         postproc=BinaryFxNode(mean_mismatch_error,
                                               space='targets'),
                         enable_ca=['stats'])
    tm(partitions)
    confusion += tm.ca.stats
##########
########## * ##########
Post by Matteo Visconti di Oleggio Castello
What do you mean with "cycling over approx 40 different classifiers"? Are
you testing different classifiers? If that's the case, a possibility is to
create a script that takes as argument the type of classifiers and runs the
classification across all folds. In that way you can submit 40 jobs and
parallelize across classifiers.
If that's not the case, because the folds are independent and deterministic
I would create a script that performs the classification on blocks of folds
(say folds 1 to 30, 31 to 60, etc.), and then submit different jobs, so
to parallelize there.
I think that if you send a snippet of the code you're using it can be more
evident which are good points for parallelization.
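For what it's worth, a minimal sketch of such a per-classifier job script
(the script name, the dataset loading via h5load and the selection of a
classifier by index are illustrative assumptions, not code from this thread):

#!/usr/bin/env python
# run_one_clf.py -- hypothetical job script: one classifier, all folds
import sys
import numpy as np
from mvpa2.suite import *

clf_index = int(sys.argv[1])      # e.g. 0..39, one cluster job per classifier
fds = h5load('my_dataset.hdf5')   # assumes the dataset was saved with h5save

partitionerCD = ChainNode([NFoldPartitioner(cvtype=2, attr='chunks'),
                           Sifter([('partitions', 2),
                                   ('targets', ['C', 'D'])])],
                          space='partitions')

clf = clfswh['!gnpp', '!skl'][clf_index]
cv = CrossValidation(clf, partitionerCD, enable_ca=['stats'])
error = np.mean(cv(fds))
print("%s: mean error %.3f" % (clf.descr, error))

Each classifier then becomes one submitted job (e.g. "python run_one_clf.py 7"
in a condor or torque submit file); blocks of folds can be handled the same
way by passing a fold range instead of a classifier index.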
Post by marco tettamanti
Dear Matteo and Nick,
thank you for your responses.
I take the occasion to ask some follow-up questions, because I am struggling to
make pymvpa2 computations faster and more efficient.
I often find myself in the situation of giving up on a particular analysis,
because it is going to take far more time than I can bear (weeks, months!). This
happens particularly with searchlight permutation testing (gnbsearchlight is
much faster, but does not support pprocess), and nested cross-validation.
As for the latter, for example, I recently wanted to run nested cross-validation
in a sample of 18 patients and 18 controls (1 image x subject), training the
classifiers to discriminate patients from controls in a leave-one-pair-out
partitioning scheme. This yields 18*18=324 folds. For a small ROI of 36 voxels,
cycling over approx 40 different classifiers takes about 2 hours for each fold
on a decent PowerEdge T430 Dell server with 128GB RAM. This means approx. 27
days for all 324 folds!
The same server is equipped with 32 CPUs. With full parallelization, the same
analysis may be completed in less than one day. This is the reason for my
interest and questions about parallelization.
Is there anything that you experts do in such situations to speed up or make the
computation more efficient?
Thank you again and best wishes,
Marco
Post by Nick Oosterhof
There have been some plans / minor attempts for using parallelisation more
parallel, but as far as I know we only support pprocces, and only for (1)
searchlight; (2) surface-based voxel selection; and (3) hyperalignment. I
do remember that parallelisation of other functions was challenging due to
some getting the conditional attributes set right, but this is long time
ago.
Post by Matteo Visconti di Oleggio Castello
Hi Marco,
AFAIK, there is no support for parallelization at the level of
cross-validation. Usually for a small ROI (such a searchlight) and with
standard CV schemes, the process is quite fast, and the bottleneck is
really the number of searchlights to be computed (for which parallelization
exists).
In my experience, we tend to parallelize at the level of individual
participants; for example we might set up a searchlight analysis with
however n_procs you can have, and then submit one such job for every
participant to a cluster (using either torque or condor).
HTH,
Matteo
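To make that per-participant scheme concrete, a rough sketch (file names,
classifier, radius and core count are placeholder assumptions; nproc is the
searchlight-level parallelization mentioned above):

# run_subject_sl.py -- hypothetical per-participant job submitted to the cluster
import sys
from mvpa2.suite import *

subj = sys.argv[1]
ds = h5load('sub-%s_dataset.hdf5' % subj)  # assumes one dataset file per subject

cv = CrossValidation(LinearCSVMC(), NFoldPartitioner(),
                     errorfx=mean_mismatch_error)
# nproc > 1 distributes the searchlight centers over cores (via pprocess)
sl = sphere_searchlight(cv, radius=3, nproc=8)
res = sl(ds)
h5save('sub-%s_searchlight.hdf5' % subj, res)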
Post by marco tettamanti
Dear all,
forgive me if this has already been asked in the past, but I was wondering
whether there has been any development meanwhile.
Are there any chances that one can generally apply parallel computing (multiple
CPUs or clusters) with pymvpa2, in addition to what is already implemented for
searchlight (pprocess)? That is, also for general cross-validation, nested
cross-validation, permutation testing, RFE, etc.?
Has anyone had succesful experience with parallelization schemes such as
ipyparallel, condor or else?
Thank you and best wishes!
Marco
--
Marco Tettamanti, Ph.D.
Nuclear Medicine Department & Division of Neuroscience
IRCCS San Raffaele Scientific Institute
Via Olgettina 58
I-20132 Milano, Italy
Phone ++39-02-26434888
Fax ++39-02-26434892
Skype: mtettamanti
http://scholar.google.it/citations?user=x4qQl4AAAAAJ
Matteo Visconti di Oleggio Castello
2017-11-28 20:45:47 UTC
Permalink
Hi Marco,

I think there are a bunch of conflated issues here.

- First, there was an error in my code, and that's why you got the
error " UnboundLocalError:
local variable 'best_clf' referenced before assignment". I updated the
gist, and now the example code for running the parallelization should be ok
and should work as a blueprint for your code (
https://gist.github.com/mvdoc/0c2574079dfde78ea649e7dc0a3feab0).

- You are correct in changing the backend to 'threading' for this
particular case because of the pickling error (a short sketch follows below).

- However, I think that the example for nested_cv.py didn't work from the
start, even without parallelization. The last change was 6 years ago, and
I'm afraid that things changed in between and the code wasn't updated. I
opened an issue on github to keep track of it (
https://github.com/PyMVPA/PyMVPA/issues/559).
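For the record, the backend switch boils down to something like this (a
minimal sketch, assuming the _run_one_partition helper, partitionerCD and fds
from the snippets earlier in the thread; with the threading backend nothing
needs to be pickled, but the GIL can limit the actual speed-up for CPU-bound
classifiers):

from joblib import Parallel, delayed  # or sklearn.externals.joblib on older installs

tms = Parallel(n_jobs=8, backend='threading')(
    delayed(_run_one_partition)(isplit, partitions)
    for isplit, partitions in enumerate(partitionerCD.generate(fds)))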
Post by marco tettamanti
Dear Matteo,
thank you for kindly replying!
Yes, I do have the latest versions of joblib (0.11) and sklearn (0.19.1),
see at the bottom of this email.
The problem seems independent of either running in jupyter, or evoking
ipython or python directly in the console.
I am now wondering whether there may be something wrong in my snippet.
UnboundLocalError: local variable 'best_clf' referenced before
assignment
-----------------------------------------
#best_clfs = {} #moved down 7 lines
confusion = ConfusionMatrix()
verbose(1, "Estimating error using nested CV for model selection")
partitioner = partitionerCD
splitter = Splitter('partitions')
tms = Parallel(n_jobs=2)(delayed(_run_one_partition)(isplit, partitions)
for isplit, partitions in enumerate(partitionerCD.
generate(fds)))
best_clfs = {}
confusion += tm.ca.stats
best_clfs[tm.measure.descr] = best_clfs.get(tm.measure.descr, 0) + 1
-----------------------------------------
But now, running the snippet in ipython/python specifically for the SVM
UnboundLocalError: local variable 'best_clf' referenced before
assignment
May this be the culprit? As a reminder, the full snippet I am using is
included in my previous email.
Thank you and very best wishes,
Marco
In [21]: mvpa2.wtf(exclude=['runtime','process']) ##other possible
arguments (['sources',
Current date: 2017-11-24 21:23
Version: 2.6.3
Hash: 9c07e8827819aaa79ff15d2db10c420a876d7785
Path: /usr/lib/python2.7/dist-packages/mvpa2/__init__.pyc
GIT information could not be obtained due "/usr/lib/python2.7/dist-packages/mvpa2/..
is not under GIT"
OS: posix Linux 4.13.0-1-amd64 #1 SMP Debian 4.13.4-2
(2017-10-15)
Distribution: debian/buster/sid
Present: atlas_fsl, cPickle, ctypes, good
scipy.stats.rv_continuous._reduce_func(floc,fscale), good
scipy.stats.rv_discrete.ppf, griddata, gzip, h5py, hdf5, ipython, joblib,
liblapack.so, libsvm, libsvm verbosity control, lxml, matplotlib, mdp, mdp
ge 2.4, mock, nibabel, nose, numpy, numpy_correct_unique, pprocess, pylab,
pylab plottable, pywt, pywt wp reconstruct, reportlab, running ipython env,
scipy, skl, statsmodels
Absent: afni-3dinfo, atlas_pymvpa, cran-energy, datalad,
elasticnet, glmnet, good scipy.stats.rdist, hcluster, lars, mass, nipy,
nipy.neurospin, numpydoc, openopt, pywt wp reconstruct fixed, rpy2,
scipy.weave, sg ge 0.6.4, sg ge 0.6.5, sg_fixedcachesize, shogun,
shogun.krr, shogun.lightsvm, shogun.mpd, shogun.svmocas, shogun.svrlight,
weave
ctypes : 1.1.0
h5py : 2.7.1
hdf5 : 1.10.0
ipython : 5.5.0
joblib : 0.11
lxml : 4.1.0
matplotlib : 2.0.0
mdp : 3.5
mock : 2.0.0
nibabel : 2.3.0dev
numpy : 1.13.1
pprocess : 0.5
pywt : 0.5.1
reportlab : 3.4.0
scipy : 0.19.1
skl : 0.19.1
Matplotlib backend: TkAgg
Hi Marco,
some ideas in random order
- what version of sklearn/joblib are you using? I would make sure to use
the latest version (0.11), perhaps not importing it from sklearn (unless
you have the latest sklearn version, 0.19.1)
- are you running the code in a jupyter notebook? There might be some
issues with that (see https://github.com/joblib/joblib/issues/174). As a
test you might try to convert your notebook to a script and then run it
Dear Matteo (and others),
sorry, I am again asking for your help!
I have experimented with the analysis of my dataset using an adaptation of
your joblib-based gist.
As I wrote before, it works perfectly, but not with some classifiers: SVM
classifiers always cause the code to terminate with an error.
myclassif=clfswh['!gnpp','!skl','svm'] #Note that 'gnnp' and
'skl' were excluded for independent reasons
the code runs through without errors.
myclassif=clfswh['!gnpp','!skl']
'[TransferMeasure(measure=SVM(svm_impl='C_SVC', kernel=LinearLSKernel(),
weight=[], probability=1,
weight_label=[]), splitter=Splitter(space='partitions'),
'TypeError("can't
pickle SwigPyObject objects",)'
After googling for what may cause this particular error, I have found that
the situation improves slightly (i.e. more splits executed, sometimes even
import os
from sklearn.externals.joblib import Parallel, delayed
from sklearn.externals.joblib.parallel import parallel_backend
However, also in this case, the code invariably terminates with a long
error message (I only report an extract, but in case I can send the whole
<type 'str'>: (<type 'exceptions.UnicodeEncodeError'>,
UnicodeEncodeError('ascii',
u'JoblibAttributeError\n____________________________________
_______________________________________\nMultiprocessing
exception:\n................................................
...........................\n/usr/lib/python2.7/runpy.py in
_run_module_as_main(mod_name=\'ipykernel_launcher\',
alter_argv=1)\n 169 pkg_name = mod_name.rpartition(\'.\')[0]\n
170
main_globals = sys.modules["__main__"].__dict__\n 171 if
alter_argv:\n 172 sys.argv[0] = fname\n 173 return
_run_code(code,
main_globals, None,\n--> 174
I think I have sort of understood that the problem is due to some failure
in pickling the parallelized jobs, but I have no clues if and how it can be
solved.
Do you have any suggestions?
Thank you and very best wishes,
Marco
########## * ##########
##########
Version: 2.6.3
Hash: 9c07e8827819aaa79ff15d2db10c420a876d7785
Path: /usr/lib/python2.7/dist-packages/mvpa2/__init__.pyc
GIT information could not be obtained due "/usr/lib/python2.7/dist-packages/mvpa2/..
is not under GIT"
OS: posix Linux 4.13.0-1-amd64 #1 SMP Debian 4.13.4-2
(2017-10-15)
print fds.summary()
<fa: voxel_indices>, <a: imgaffine,imghdr,imgtype,
mapper,voxel_dim,voxel_eldim>
stats: mean=0.548448 std=1.40906 var=1.98546 min=-5.41163 max=9.88639
No details due to large number of targets or chunks. Increase maxc and
maxt if desired
Summary for targets across chunks
targets mean std min max #chunks
C 0.5 0.5 0 1 18
D 0.5 0.5 0 1 18
#Evaluate prevalent best classifier with nested crossvalidation
verbose.level = 5
partitionerCD = ChainNode([NFoldPartitioner(cvtype=2, attr='chunks'),
Sifter([('partitions', 2), ('targets', ['C', 'D'])])], space='partitions')
# training partitions
training = fds[fds_.sa.partitions == 1]
#print list(zip(training.sa.chunks, training.sa.targets))
# testing partitions
testing = fds[fds_.sa.partitions == 2]
#print list(zip(testing.sa.chunks, testing.sa.targets))
"""Select best model according to CVTE
Helper function which we will use twice -- once for proper nested
cross-validation, and once to see how big an optimistic bias due
to model selection could be if we simply provide an entire dataset.
Parameters
----------
dstrain_ : Dataset
clfs : list of Classifiers
Which classifiers to explore
Returns
-------
best_clf, best_error
"""
best_error = None
cv = CrossValidation(clf, partitionerCD)
# unfortunately we don't have ability to reassign clf atm
# cv.transerror.clf = clf
error = np.mean(cv(dstrain_))
# skip the classifier if data was not appropriate and it
# failed to learn/predict at all
continue
best_clf = clf
best_error = error
verbose(4, "Classifier %s cv error=%.2f" % (clf.descr, error))
verbose(3, "Selected the best out of %i classifiers %s with error %.2f"
% (len(clfs), best_clf.descr, best_error))
return best_clf, best_error
# This function will run all classifiers for one single partitions
myclassif=clfswh['!gnpp','!skl'][5:6] #Testing a single SVM classifier
def _run_one_partition(isplit, partitions, classifiers=myclassif): #see §§
verbose(2, "Processing split #%i" % isplit)
dstrain, dstest = list(splitter.generate(partitions))
best_clf, best_error = select_best_clf(dstrain, classifiers)
# now that we have the best classifier, lets assess its transfer
# to the testing dataset while training on entire training
tm = TransferMeasure(best_clf, splitter,postproc=
BinaryFxNode(mean_mismatch_error,space='targets'), enable_ca=['stats'])
tm(partitions)
return tm
#import os
#from sklearn.externals.joblib import Parallel, delayed
#from sklearn.externals.joblib.parallel import parallel_backend
# Parallel estimate error using nested CV for model selection
confusion = ConfusionMatrix()
verbose(1, "Estimating error using nested CV for model selection")
partitioner = partitionerCD
splitter = Splitter('partitions')
# Here we are using joblib Parallel to parallelize each partition
# Set n_jobs to the number of available cores (or how many you want to use)
# tms = Parallel(n_jobs=2)(delayed(_run_one_partition)(isplit,
partitions)
tms = Parallel(n_jobs=2)(delayed(_run_one_partition)(isplit, partitions)
for isplit, partitions in enumerate(partitionerCD.
generate(fds)))
# Parallel retuns a list with the results of each parallel loop, so we need to
# unravel it to get the confusion matrix
best_clfs = {}
confusion += tm.ca.stats
best_clfs[tm.measure.descr] = best_clfs.get(tm.measure.descr, 0) + 1
##########
########## * ##########
Dear Matteo,
grazie mille, this is precisely the kind of thing I was looking for: it
works like charm!
Ciao,
Marco
Hi Marco,
in your case, I would then recommend looking into joblib to parallelize
your for loops (https://pythonhosted.org/joblib/parallel.html).
As an example, here's a gist containing part of the PyMVPA's nested_cv
example where I parallelized the loop across partitions. I feel this is
what you might want to do in your case, since you have a lot more folds.
https://gist.github.com/mvdoc/0c2574079dfde78ea649e7dc0a3feab0
Dear Matteo,
thank you for the willingness to look into my code.
This is taken almost verbatim from http://dev.pymvpa.org/
examples/nested_cv.html, except for the leave-one-pair-out partitioning,
and a slight reduction in the number of classifiers (in the original
example, they are around 45).
Any help or suggestion would be greatly appreciated!
All the best,
Marco
########## * ##########
##########
Version: 2.6.3
Hash: 9c07e8827819aaa79ff15d2db10c420a876d7785
Path: /usr/lib/python2.7/dist-packages/mvpa2/__init__.pyc
GIT information could not be obtained due "/usr/lib/python2.7/dist-packages/mvpa2/..
is not under GIT"
OS: posix Linux 4.13.0-1-amd64 #1 SMP Debian 4.13.4-2
(2017-10-15)
print fds.summary()
<fa: voxel_indices>, <a: imgaffine,imghdr,imgtype,
mapper,voxel_dim,voxel_eldim>
stats: mean=0.548448 std=1.40906 var=1.98546 min=-5.41163 max=9.88639
No details due to large number of targets or chunks. Increase maxc and
maxt if desired
Summary for targets across chunks
targets mean std min max #chunks
C 0.5 0.5 0 1 18
D 0.5 0.5 0 1 18
#Evaluate prevalent best classifier with nested crossvalidation
verbose.level = 5
partitionerCD = ChainNode([NFoldPartitioner(cvtype=2, attr='chunks'),
Sifter([('partitions', 2), ('targets', ['C', 'D'])])], space='partitions')
# training partitions
training = fds[fds_.sa.partitions == 1]
#print list(zip(training.sa.chunks, training.sa.targets))
# testing partitions
testing = fds[fds_.sa.partitions == 2]
#print list(zip(testing.sa.chunks, testing.sa.targets))
"""Select best model according to CVTE
Helper function which we will use twice -- once for proper nested
cross-validation, and once to see how big an optimistic bias due
to model selection could be if we simply provide an entire dataset.
Parameters
----------
dstrain_ : Dataset
clfs : list of Classifiers
Which classifiers to explore
Returns
-------
best_clf, best_error
"""
best_error = None
cv = CrossValidation(clf, partitionerCD)
# unfortunately we don't have ability to reassign clf atm
# cv.transerror.clf = clf
error = np.mean(cv(dstrain_))
# skip the classifier if data was not appropriate and it
# failed to learn/predict at all
continue
best_clf = clf
best_error = error
verbose(4, "Classifier %s cv error=%.2f" % (clf.descr, error))
verbose(3, "Selected the best out of %i classifiers %s with error %.2f"
% (len(clfs), best_clf.descr, best_error))
return best_clf, best_error
best_clfs = {}
confusion = ConfusionMatrix()
verbose(1, "Estimating error using nested CV for model selection")
partitioner = partitionerCD
splitter = Splitter('partitions')
verbose(2, "Processing split #%i" % isplit)
dstrain, dstest = list(splitter.generate(partitions))
best_clf, best_error = select_best_clf(dstrain, clfswh['!gnpp','!skl'])
best_clfs[best_clf.descr] = best_clfs.get(best_clf.descr, 0) + 1
# now that we have the best classifier, lets assess its transfer
# to the testing dataset while training on entire training
tm = TransferMeasure(best_clf, splitter,
postproc=BinaryFxNode(mean_mismatch_error,
space='targets'), enable_ca=['stats'])
tm(partitions)
confusion += tm.ca.stats
##########
########## * ##########
What do you mean with "cycling over approx 40 different classifiers"? Are
you testing different classifiers? If that's the case, a possibility is to
create a script that takes as argument the type of classifiers and runs the
classification across all folds. In that way you can submit 40 jobs and
parallelize across classifiers.
If that's not the case, because the folds are independent and deterministic
I would create a script that performs the classification on blocks of folds
(say fold 1 to 30, 31, to 60, etc...), and then submit different jobs, so
to parallelize there.
I think that if you send a snippet of the code you're using it can be more
evident which are good points for parallelization.
Dear Matteo and Nick,
thank you for your responses.
I take the occasion to ask some follow-up questions, because I am struggling to
make pymvpa2 computations faster and more efficient.
I often find myself in the situation of giving up with a particular analysis,
because it is going to take far more time that I can bear (weeks, months!). This
happens particularly with searchlight permutation testing (gnbsearchlight is
much faster, but does not support pprocess), and nested cross-validation.
As for the latter, for example, I recently wanted to run nested cross-validation
in a sample of 18 patients and 18 controls (1 image x subject), training the
classifiers to discriminate patients from controls in a leave-one-pair-out
partitioning scheme. This yields 18*18=324 folds. For a small ROI of 36 voxels,
cycling over approx 40 different classifiers takes about 2 hours for each fold
on a decent PowerEdge T430 Dell server with 128GB RAM. This means approx. 27
days for all 324 folds!
The same server is equipped with 32 CPUs. With full parallelization, the same
analysis may be completed in less than one day. This is the reason of my
interest and questions about parallelization.
Is there anything that you experts do in such situations to speed up or make the
computation more efficient?
Thank you again and best wishes,
Marco
There have been some plans / minor attempts for using parallelisation more
parallel, but as far as I know we only support pprocces, and only for (1)
searchlight; (2) surface-based voxel selection; and (3) hyperalignment. I
do remember that parallelisation of other functions was challenging due to
some getting the conditional attributes set right, but this is long time
ago.
Hi Marco,
AFAIK, there is no support for parallelization at the level of
cross-validation. Usually for a small ROI (such a searchlight) and with
standard CV schemes, the process is quite fast, and the bottleneck is
really the number of searchlights to be computed (for which parallelization
exists).
In my experience, we tend to parallelize at the level of individual
participants; for example we might set up a searchlight analysis with
however n_procs you can have, and then submit one such job for every
participant to a cluster (using either torque or condor).
HTH,
Matteo
Dear all,
forgive me if this has already been asked in the past, but I was wondering
whether there has been any development meanwhile.
Are there any chances that one can generally apply parallel computing (multiple
CPUs or clusters) with pymvpa2, in addition to what is already implemented for
searchlight (pprocess)? That is, also for general cross-validation, nested
cross-validation, permutation testing, RFE, etc.?
Has anyone had succesful experience with parallelization schemes such as
ipyparallel, condor or else?
Thank you and best wishes!
Marco
--
Marco Tettamanti, Ph.D.
Nuclear Medicine Department & Division of Neuroscience
IRCCS San Raffaele Scientific Institute
Via Olgettina 58
I-20132 Milano, Italy
Phone ++39-02-26434888
Fax ++39-02-26434892
Skype: mtettamanti
http://scholar.google.it/citations?user=x4qQl4AAAAAJ
_______________________________________________
Pkg-ExpPsy-PyMVPA mailing list
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa
--
Matteo Visconti di Oleggio Castello
Ph.D. Candidate in Cognitive Neuroscience
Dartmouth College

+1 (603) 646-8665
mvdoc.me || github.com/mvdoc || linkedin.com/in/matteovisconti
marco tettamanti
2017-11-29 11:26:19 UTC
Permalink
Hi Matteo,
thank you again for your precious help!

I have spotted a small typo in your gist. Line 105 should read:
    for tm in tms_parallel:
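In context, the corrected unravelling loop (matching the snippets earlier in
this thread) would presumably read:

for tm in tms_parallel:
    confusion += tm.ca.stats
    best_clfs[tm.measure.descr] = best_clfs.get(tm.measure.descr, 0) + 1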


Apart from the problems with the nested_cv.py example that you refer to, I
am experiencing some more troubles:
- backend='threading' does not seem to parallelize on my machine, only
backend='multiprocessing'
- while your 'nested_cv_parallel.py' gist runs smoothly, when I adapt it to my
dataset, partitioner, etc., I get the following error:

    tms_parallel, best_clfs_parallel = zip(*out_parallel)
TypeError: zip argument #1 must support iteration
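(For reference, that TypeError usually shows up when the parallelized helper
returns a single object per fold rather than a tuple, so zip(*...) has nothing
to unpack; a minimal sketch of the return structure the gist seems to expect,
with names only illustrative and building on the snippets above:

def _run_one_partition(isplit, partitions):
    dstrain, dstest = list(splitter.generate(partitions))
    best_clf, best_error = select_best_clf(dstrain, myclassif)
    tm = TransferMeasure(best_clf, splitter,
                         postproc=BinaryFxNode(mean_mismatch_error,
                                               space='targets'),
                         enable_ca=['stats'])
    tm(partitions)
    # returning a pair lets zip(*out_parallel) yield two sequences
    return tm, best_clf.descr

out_parallel = Parallel(n_jobs=8)(
    delayed(_run_one_partition)(isplit, partitions)
    for isplit, partitions in enumerate(partitionerCD.generate(fds)))

tms_parallel, best_clf_descrs = zip(*out_parallel)
)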


I guess I am putting my nested analysis on standby for the moment. Hopefully
these issues will soon be solved.

Thank you for all!
Marco
Post by Matteo Visconti di Oleggio Castello
Hi Marco,
I think there are a bunch of conflated issues here.
- First, there was an error in my code, and that's why you got the
local variable 'best_clf' referenced before assignment". I updated the
gist, and now the example code for running the parallelization should be ok
and should work as a blueprint for your code (
https://gist.github.com/mvdoc/0c2574079dfde78ea649e7dc0a3feab0).
- You are correct in changing the backend to 'threading' for this
particular case because of the pickling error.
- However, I think that the example for nested_cv.py didn't work from the
start, even without parallelization. The last change was 6 years ago, and
I'm afraid that things changed in between and the code wasn't updated. I
opened an issue on github to keep track of it (
https://github.com/PyMVPA/PyMVPA/issues/559).
Post by marco tettamanti
Dear Matteo,
thank you for kindly replying!
Yes, I do have the latest versions of joblib (0.11) and sklearn (0.19.1), see
at the bottom of this email.
The problem seems independent of either running in jupyter, or evoking
ipython or python directly in the console.
I am now wondering whether there may be something wrong in my snippet.
        UnboundLocalError: local variable 'best_clf' referenced before assignment
-----------------------------------------
#best_clfs = {}  #moved down 7 lines
confusion = ConfusionMatrix()
verbose(1, "Estimating error using nested CV for model selection")
partitioner = partitionerCD
splitter = Splitter('partitions')
tms = Parallel(n_jobs=2)(delayed(_run_one_partition)(isplit, partitions)
                         for isplit, partitions in
enumerate(partitionerCD.generate(fds)))
best_clfs = {}
    confusion += tm.ca.stats
    best_clfs[tm.measure.descr] = best_clfs.get(tm.measure.descr, 0) + 1
-----------------------------------------
But now, running the snippet in ipython/python specifically for the SVM
         UnboundLocalError: local variable 'best_clf' referenced before
assignment
May this be the culprit? As a reminder, the full snippet I am using is
included in my previous email.
Thank you and very best wishes,
Marco
In [21]: mvpa2.wtf(exclude=['runtime','process']) ##other possible arguments
(['sources',
Current date:   2017-11-24 21:23
 Version:       2.6.3
 Hash:          9c07e8827819aaa79ff15d2db10c420a876d7785
 Path: /usr/lib/python2.7/dist-packages/mvpa2/__init__.pyc
 GIT information could not be obtained due
"/usr/lib/python2.7/dist-packages/mvpa2/.. is not under GIT"
 OS:            posix Linux 4.13.0-1-amd64 #1 SMP Debian 4.13.4-2 (2017-10-15)
 Distribution:  debian/buster/sid
 Present:       atlas_fsl, cPickle, ctypes, good
scipy.stats.rv_continuous._reduce_func(floc,fscale), good
scipy.stats.rv_discrete.ppf, griddata, gzip, h5py, hdf5, ipython, joblib,
liblapack.so, libsvm, libsvm verbosity control, lxml, matplotlib, mdp, mdp ge
2.4, mock, nibabel, nose, numpy, numpy_correct_unique, pprocess, pylab, pylab
plottable, pywt, pywt wp reconstruct, reportlab, running ipython env, scipy,
skl, statsmodels
 Absent:        afni-3dinfo, atlas_pymvpa, cran-energy, datalad, elasticnet,
glmnet, good scipy.stats.rdist, hcluster, lars, mass, nipy, nipy.neurospin,
numpydoc, openopt, pywt wp reconstruct fixed, rpy2, scipy.weave, sg ge 0.6.4,
sg ge 0.6.5, sg_fixedcachesize, shogun, shogun.krr, shogun.lightsvm,
shogun.mpd, shogun.svmocas, shogun.svrlight, weave
  ctypes      : 1.1.0
  h5py        : 2.7.1
  hdf5        : 1.10.0
  ipython     : 5.5.0
  joblib      : 0.11
  lxml        : 4.1.0
  matplotlib  : 2.0.0
  mdp         : 3.5
  mock        : 2.0.0
  nibabel     : 2.3.0dev
  numpy       : 1.13.1
  pprocess    : 0.5
  pywt        : 0.5.1
  reportlab   : 3.4.0
  scipy       : 0.19.1
  skl         : 0.19.1
 Matplotlib backend: TkAgg
Post by Matteo Visconti di Oleggio Castello
Hi Marco,
some ideas in random order
- what version of sklearn/joblib are you using? I would make sure to use
the latest version (0.11), perhaps not importing it from sklearn (unless
you have the latest sklearn version, 0.19.1)
- are you running the code in a jupyter notebook? There might be some
issues with that (see https://github.com/joblib/joblib/issues/174). As a
test you might try to convert your notebook to a script and then run it
Post by marco tettamanti
Dear Matteo (and others),
sorry, I am again asking for your help!
I have experimented with the analysis of my dataset using an adaptation of
your joblib-based gist.
As I wrote before, it works perfectly, but not with some classifiers: SVM
classifiers always cause the code to terminate with an error.
        myclassif=clfswh['!gnpp','!skl','svm']    #Note that 'gnnp' and
'skl' were excluded for independent reasons
the code runs through without errors.
        myclassif=clfswh['!gnpp','!skl']
'[TransferMeasure(measure=SVM(svm_impl='C_SVC', kernel=LinearLSKernel(),
weight=[], probability=1,
         weight_label=[]), splitter=Splitter(space='partitions'),
'TypeError("can't
        pickle SwigPyObject objects",)'
After googling for what may cause this particular error, I have found that
the situation improves slightly (i.e. more splits executed, sometimes even
        import os
        from sklearn.externals.joblib import Parallel, delayed
        from sklearn.externals.joblib.parallel import parallel_backend
However, also in this case, the code invariably terminates with a long
error message (I only report an extract, but in case I can send the whole
        <type 'str'>: (<type 'exceptions.UnicodeEncodeError'>,
UnicodeEncodeError('ascii',
u'JoblibAttributeError\n___________________________________________________________________________\nMultiprocessing
exception:\n...........................................................................\n/usr/lib/python2.7/runpy.py
in
_run_module_as_main(mod_name=\'ipykernel_launcher\', alter_argv=1)\n   
169     pkg_name = mod_name.rpartition(\'.\')[0]\n    170
       main_globals = sys.modules["__main__"].__dict__\n    171     if
alter_argv:\n    172         sys.argv[0] = fname\n 173     return
_run_code(code,
       main_globals, None,\n--> 174
I think I have sort of understood that the problem is due to some failure
in pickling the parallelized jobs, but I have no clues if and how it can be
solved.
Do you have any suggestions?
Thank you and very best wishes,
Marco
########## * ##########
##########
 Version:       2.6.3
 Hash:          9c07e8827819aaa79ff15d2db10c420a876d7785
 Path: /usr/lib/python2.7/dist-packages/mvpa2/__init__.pyc
 GIT information could not be obtained due
"/usr/lib/python2.7/dist-packages/mvpa2/.. is not under GIT"
 OS:            posix Linux 4.13.0-1-amd64 #1 SMP Debian 4.13.4-2 (2017-10-15)
print fds.summary()
<fa: voxel_indices>, <a: imgaffine,imghdr,imgtype,mapper,voxel_dim,voxel_eldim>
stats: mean=0.548448 std=1.40906 var=1.98546 min=-5.41163 max=9.88639
No details due to large number of targets or chunks. Increase maxc and maxt
if desired
Summary for targets across chunks
  targets mean std min max #chunks
    C      0.5 0.5  0   1     18
    D      0.5 0.5  0   1     18
#Evaluate prevalent best classifier with nested crossvalidation
verbose.level = 5
partitionerCD = ChainNode([NFoldPartitioner(cvtype=2, attr='chunks'),
Sifter([('partitions', 2), ('targets', ['C', 'D'])])], space='partitions')
# training partitions
    training = fds[fds_.sa.partitions == 1]
    #print list(zip(training.sa.chunks, training.sa.targets))
# testing partitions
    testing = fds[fds_.sa.partitions == 2]
    #print list(zip(testing.sa.chunks, testing.sa.targets))
    """Select best model according to CVTE
    Helper function which we will use twice -- once for proper nested
    cross-validation, and once to see how big an optimistic bias due
    to model selection could be if we simply provide an entire dataset.
    Parameters
    ----------
    dstrain_ : Dataset
    clfs : list of Classifiers
      Which classifiers to explore
    Returns
    -------
    best_clf, best_error
    """
    best_error = None
        cv = CrossValidation(clf, partitionerCD)
        # unfortunately we don't have ability to reassign clf atm
        # cv.transerror.clf = clf
            error = np.mean(cv(dstrain_))
            # skip the classifier if data was not appropriate and it
            # failed to learn/predict at all
            continue
            best_clf = clf
            best_error = error
        verbose(4, "Classifier %s cv error=%.2f" % (clf.descr, error))
    verbose(3, "Selected the best out of %i classifiers %s with error %.2f"
            % (len(clfs), best_clf.descr, best_error))
    return best_clf, best_error
# This function will run all classifiers for one single partitions
myclassif=clfswh['!gnpp','!skl'][5:6]  #Testing a single SVM classifier
def _run_one_partition(isplit, partitions, classifiers=myclassif): #see §§
    verbose(2, "Processing split #%i" % isplit)
    dstrain, dstest = list(splitter.generate(partitions))
    best_clf, best_error = select_best_clf(dstrain, classifiers)
    # now that we have the best classifier, lets assess its transfer
    # to the testing dataset while training on entire training
    tm = TransferMeasure(best_clf,
splitter,postproc=BinaryFxNode(mean_mismatch_error,space='targets'),
enable_ca=['stats'])
    tm(partitions)
    return tm
#import os
#from sklearn.externals.joblib import Parallel, delayed
#from sklearn.externals.joblib.parallel import parallel_backend
# Parallel estimate error using nested CV for model selection
confusion = ConfusionMatrix()
verbose(1, "Estimating error using nested CV for model selection")
partitioner = partitionerCD
splitter = Splitter('partitions')
# Here we are using joblib Parallel to parallelize each partition
# Set n_jobs to the number of available cores (or how many you want to use)
#    tms = Parallel(n_jobs=2)(delayed(_run_one_partition)(isplit, partitions)
tms = Parallel(n_jobs=2)(delayed(_run_one_partition)(isplit, partitions)
                         for isplit, partitions in
enumerate(partitionerCD.generate(fds)))
# Parallel retuns a list with the results of each parallel loop, so we need to
# unravel it to get the confusion matrix
best_clfs = {}
    confusion += tm.ca.stats
    best_clfs[tm.measure.descr] = best_clfs.get(tm.measure.descr, 0) + 1
##########
########## * ##########
Post by marco tettamanti
Dear Matteo,
grazie mille, this is precisely the kind of thing I was looking for: it
works like charm!
Ciao,
Marco
Post by Matteo Visconti di Oleggio Castello
Hi Marco,
in your case, I would then recommend looking into joblib to parallelize
your for loops (https://pythonhosted.org/joblib/parallel.html).
As an example, here's a gist containing part of the PyMVPA's nested_cv
example where I parallelized the loop across partitions. I feel this is
what you might want to do in your case, since you have a lot more folds.
https://gist.github.com/mvdoc/0c2574079dfde78ea649e7dc0a3feab0
Post by marco tettamanti
Dear Matteo,
thank you for the willingness to look into my code.
This is taken almost verbatim from
http://dev.pymvpa.org/examples/nested_cv.html, except for the
leave-one-pair-out partitioning, and a slight reduction in the number of
classifiers (in the original example, they are around 45).
Any help or suggestion would be greatly appreciated!
All the best,
Marco
########## * ##########
##########
 Version:       2.6.3
 Hash: 9c07e8827819aaa79ff15d2db10c420a876d7785
 Path: /usr/lib/python2.7/dist-packages/mvpa2/__init__.pyc
 GIT information could not be obtained due
"/usr/lib/python2.7/dist-packages/mvpa2/.. is not under GIT"
 OS:            posix Linux 4.13.0-1-amd64 #1 SMP Debian 4.13.4-2
(2017-10-15)
print fds.summary()
imgaffine,imghdr,imgtype,mapper,voxel_dim,voxel_eldim>
stats: mean=0.548448 std=1.40906 var=1.98546 min=-5.41163 max=9.88639
No details due to large number of targets or chunks. Increase maxc and
maxt if desired
Summary for targets across chunks
  targets mean std min max #chunks
    C      0.5 0.5  0   1     18
    D      0.5 0.5  0   1     18
#Evaluate prevalent best classifier with nested crossvalidation
verbose.level = 5
partitionerCD = ChainNode([NFoldPartitioner(cvtype=2, attr='chunks'),
Sifter([('partitions', 2), ('targets', ['C', 'D'])])], space='partitions')
# training partitions
    training = fds[fds_.sa.partitions == 1]
    #print list(zip(training.sa.chunks, training.sa.targets))
# testing partitions
    testing = fds[fds_.sa.partitions == 2]
    #print list(zip(testing.sa.chunks, testing.sa.targets))
#Helper function (partitionerCD recursively acting on dstrain, rather
    """Select best model according to CVTE
    Helper function which we will use twice -- once for proper nested
    cross-validation, and once to see how big an optimistic bias due
    to model selection could be if we simply provide an entire dataset.
    Parameters
    ----------
    dstrain_ : Dataset
    clfs : list of Classifiers
      Which classifiers to explore
    Returns
    -------
    best_clf, best_error
    """
    best_error = None
        cv = CrossValidation(clf, partitionerCD)
        # unfortunately we don't have ability to reassign clf atm
        # cv.transerror.clf = clf
            error = np.mean(cv(dstrain_))
            # skip the classifier if data was not appropriate and it
            # failed to learn/predict at all
            continue
            best_clf = clf
            best_error = error
        verbose(4, "Classifier %s cv error=%.2f" % (clf.descr, error))
    verbose(3, "Selected the best out of %i classifiers %s with error %.2f"
            % (len(clfs), best_clf.descr, best_error))
    return best_clf, best_error
best_clfs = {}
confusion = ConfusionMatrix()
verbose(1, "Estimating error using nested CV for model selection")
partitioner = partitionerCD
splitter = Splitter('partitions')
    verbose(2, "Processing split #%i" % isplit)
    dstrain, dstest = list(splitter.generate(partitions))
    best_clf, best_error = select_best_clf(dstrain, clfswh['!gnpp','!skl'])
    best_clfs[best_clf.descr] = best_clfs.get(best_clf.descr, 0) + 1
    # now that we have the best classifier, lets assess its transfer
    # to the testing dataset while training on entire training
    tm = TransferMeasure(best_clf, splitter,
postproc=BinaryFxNode(mean_mismatch_error, space='targets'),
enable_ca=['stats'])
    tm(partitions)
    confusion += tm.ca.stats
##########
########## * ##########
Post by Matteo Visconti di Oleggio Castello
What do you mean with "cycling over approx 40 different classifiers"? Are
you testing different classifiers? If that's the case, a possibility is to
create a script that takes as argument the type of classifiers and runs the
classification across all folds. In that way you can submit 40 jobs and
parallelize across classifiers.
If that's not the case, because the folds are independent and deterministic
I would create a script that performs the classification on blocks of folds
(say fold 1 to 30, 31, to 60, etc...), and then submit different jobs, so
to parallelize there.
I think that if you send a snippet of the code you're using it can be more
evident which are good points for parallelization.
Post by marco tettamanti
Dear Matteo and Nick,
thank you for your responses.
I take the occasion to ask some follow-up questions, because I am struggling to
make pymvpa2 computations faster and more efficient.
I often find myself in the situation of giving up with a particular analysis,
because it is going to take far more time that I can bear (weeks, months!). This
happens particularly with searchlight permutation testing (gnbsearchlight is
much faster, but does not support pprocess), and nested cross-validation.
As for the latter, for example, I recently wanted to run nested cross-validation
in a sample of 18 patients and 18 controls (1 image x subject), training the
classifiers to discriminate patients from controls in a leave-one-pair-out
partitioning scheme. This yields 18*18=324 folds. For a small ROI of 36 voxels,
cycling over approx 40 different classifiers takes about 2 hours for each fold
on a decent PowerEdge T430 Dell server with 128GB RAM. This means approx. 27
days for all 324 folds!
The same server is equipped with 32 CPUs. With full parallelization, the same
analysis may be completed in less than one day. This is the reason of my
interest and questions about parallelization.
Is there anything that you experts do in such situations to speed up or make the
computation more efficient?
Thank you again and best wishes,
Marco
Post by Nick Oosterhof
There have been some plans / minor attempts for using parallelisation more
parallel, but as far as I know we only support pprocces, and only for (1)
searchlight; (2) surface-based voxel selection; and (3) hyperalignment. I
do remember that parallelisation of other functions was challenging due to
some getting the conditional attributes set right, but this is long time
ago.
Post by Matteo Visconti di Oleggio Castello
Hi Marco,
AFAIK, there is no support for parallelization at the level of
cross-validation. Usually for a small ROI (such a searchlight) and with
standard CV schemes, the process is quite fast, and the bottleneck is
really the number of searchlights to be computed (for which parallelization
exists).
In my experience, we tend to parallelize at the level of individual
participants; for example we might set up a searchlight analysis with
however n_procs you can have, and then submit one such job for every
participant to a cluster (using either torque or condor).
HTH,
Matteo
Post by marco tettamanti
Dear all,
forgive me if this has already been asked in the past, but I was wondering
whether there has been any development meanwhile.
Are there any chances that one can generally apply parallel computing (multiple
CPUs or clusters) with pymvpa2, in addition to what is already implemented for
searchlight (pprocess)? That is, also for general cross-validation, nested
cross-validation, permutation testing, RFE, etc.?
Has anyone had succesful experience with parallelization schemes such as
ipyparallel, condor or else?
Thank you and best wishes!
Marco
--
Marco Tettamanti, Ph.D.
Nuclear Medicine Department & Division of Neuroscience
IRCCS San Raffaele Scientific Institute
Via Olgettina 58
I-20132 Milano, Italy
Phone ++39-02-26434888
Fax ++39-02-26434892
Skype: mtettamanti
http://scholar.google.it/citations?user=x4qQl4AAAAAJ