nevopy.processing package

Submodules

nevopy.processing.base_scheduler module

Defines a common interface for processing schedulers.

This module contains a base model for a processing scheduler, the entity responsible for managing the computation of a population’s fitness in NEvoPy’s algorithms. Schedulers allow the implementation of the computation methods (like the use of serial or parallel processing) to be separated from the implementation of the neuroevolutionary algorithms.

nevopy.processing.base_scheduler.TProcItem

TypeVar indicating an item to be scheduled for processing by a ProcessingScheduler. Alias for TypeVar("TProcItem").

Type

TypeVar

nevopy.processing.base_scheduler.TProcResult

TypeVar indicating the result of processing a TProcItem. Alias for TypeVar("TProcResult").

Type

TypeVar

class nevopy.processing.base_scheduler.ProcessingScheduler

Bases: abc.ABC

Defines a common interface for processing schedulers.

In NEvoPy, a processing scheduler is responsible for managing the computation of the fitness of a population of individuals being evolved. This abstract class defines a common interface for processing schedulers used by different algorithms. Schedulers allow the implementation of the computation methods (like the use of serial or parallel processing) to be separated from the implementation of the neuroevolutionary algorithms.

Implementing your own processing scheduler is useful when you want to customize the computation of the population’s fitness. You can, for example, implement a scheduler that makes use of multiple CPU cores or GPUs (parallel processing).
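
For illustration, below is a minimal sketch of what a custom scheduler might look like, using a thread pool; the ThreadedScheduler class and its internals are hypothetical, not part of NEvoPy:

    from concurrent.futures import ThreadPoolExecutor
    from typing import Callable, List, Sequence

    from nevopy.processing.base_scheduler import (ProcessingScheduler,
                                                  TProcItem, TProcResult)


    class ThreadedScheduler(ProcessingScheduler):
        """Hypothetical scheduler that runs items through a thread pool."""

        def __init__(self, max_workers=None):
            self._max_workers = max_workers

        def run(self, items: Sequence[TProcItem],
                func: Callable[[TProcItem], TProcResult]) -> List[TProcResult]:
            # executor.map() preserves the input ordering, as the
            # scheduler interface requires.
            with ThreadPoolExecutor(max_workers=self._max_workers) as executor:
                return list(executor.map(func, items))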

abstract run(items, func)

Processes the given items and returns a result.

Main function of the scheduler. Call it to make the scheduler manage the processing of a batch of items.

Parameters
  • items (Sequence[TProcItem]) – Iterable containing the items to be processed.

  • func (Optional[Callable[[TProcItem], TProcResult]]) – Callable (usually a function) that takes one item TProcItem as input and returns a result TProcResult as output. Generally, TProcItem is an individual in the population and TProcResult is the individual’s fitness. Since some scenarios require the fitness of the population’s individuals to be calculated together, at once, the use of this parameter is not mandatory (this decision is an implementation detail of each sub-classed scheduler). If additional arguments must be passed to the callable you want to use, it’s possible to use Python’s functools.partial or to just wrap it with a simple function (see the sketch after this method’s description).

Return type

List[TProcResult]

Returns

A list containing the results of the processing of each item. It is guaranteed that the ordering of the items in the returned list follows the order in which the items are yielded by the iterable passed as argument.
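
For example, extra arguments can be bound to the callable with functools.partial before passing it to a scheduler. The sketch below uses the SerialProcessingScheduler documented later in this page; eval_fitness and its num_episodes argument are hypothetical names:

    from functools import partial

    from nevopy.processing.serial_processing import SerialProcessingScheduler

    def eval_fitness(individual, num_episodes):
        # Hypothetical fitness function that needs an extra argument.
        return sum(individual) * num_episodes

    scheduler = SerialProcessingScheduler()
    population = [[1, 2], [3, 4]]
    # run() expects a one-argument callable, so the extra argument is
    # bound beforehand:
    fitness = scheduler.run(population, partial(eval_fitness, num_episodes=5))
    print(fitness)  # [15, 35]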

nevopy.processing.networked_scheduler module

Implementation of a processing scheduler that makes use of workers hosted on different machines in a network.

Todo

Implementation of the scheduler.

nevopy.processing.pool_processing module

Implements a processing scheduler that uses multiprocessing.Pool.

multiprocessing.Pool is a built-in Python class that facilitates parallel processing on a single machine. Note that it requires the objects it handles to be serializable with pickle.

class nevopy.processing.pool_processing.PoolProcessingScheduler(num_processes=None, chunksize=None)

Bases: nevopy.processing.base_scheduler.ProcessingScheduler

Processing scheduler that uses Python’s multiprocessing.Pool.

This scheduler implements parallel processing (on a single machine) using Python’s built-in module multiprocessing, specifically, the class Pool.

Note

Pool uses, internally, pickle as the serialization method. This might be a source of errors when the items or functions involved aren’t picklable (lambdas and nested functions, for example, are not). Make sure you read the docs carefully before using this scheduler.

Note

When the processing of individual items isn’t a very resource-demanding task (e.g., learning the 2-variable XOR), using this scheduler might yield significantly better performance than using RayProcessingScheduler (due to ray’s greater overhead). However, in most situations, the performance difference is negligible and using RayProcessingScheduler as the processing scheduler is preferable to using this class, since ray is safer, scales better and allows clustering.

Parameters
  • num_processes (Optional[int]) – Number of worker processes to use. If None, then the number returned by os.cpu_count() is used.

  • chunksize (Optional[int]) – Pool.map(), used internally by the scheduler, chops the input iterable into a number of chunks which it submits to the process pool as separate tasks. This parameter specifies the (approximate) size of these chunks.

close()

Calls the equivalent method on the scheduler’s pool object.

join()

Calls the equivalent method on the scheduler’s pool object.

run(items, func)

Processes the given items and returns a result.

Main function of the scheduler. Call it to make the scheduler manage the parallel processing of a batch of items using multiprocessing.Pool.

Note

Make sure that both items and func are serializable with pickle.

Parameters
  • items (Sequence[TProcItem]) – Iterable containing the items to be processed.

  • func (Callable[[TProcItem], TProcResult]) – Callable (usually a function) that takes one item TProcItem as input and returns a result TProcResult as output. Generally, TProcItem is an individual in the population and TProcResult is the individual’s fitness.

Return type

List[TProcResult]

Returns

A list containing the results of the processing of each item. It is guaranteed that the ordering of the items in the returned list follows the order in which the items are yielded by the iterable passed as argument.

terminate()

Calls the equivalent method on the scheduler’s pool object.
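
Below is a minimal usage sketch of this scheduler (the fitness function is a toy and num_processes=4 is an arbitrary choice). The fitness function is defined at module level so it stays pickle-friendly, and the __main__ guard matters on platforms that spawn worker processes:

    from nevopy.processing.pool_processing import PoolProcessingScheduler

    def eval_fitness(item):
        # Toy fitness function; it must be picklable, so it is defined
        # at module level (not a lambda or a nested function).
        return item * item

    if __name__ == "__main__":
        scheduler = PoolProcessingScheduler(num_processes=4)
        try:
            fitness = scheduler.run(items=range(10), func=eval_fitness)
            print(fitness)  # [0, 1, 4, ..., 81]
        finally:
            scheduler.close()
            scheduler.join()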

nevopy.processing.ray_processing module

Implements a processing scheduler that uses the ray framework.

By using ray (https://github.com/ray-project/ray), the scheduler is able to implement parallel processing, either on a single machine or on a cluster.

class nevopy.processing.ray_processing.RayProcessingScheduler(address=None, num_cpus=None, num_gpus=None, worker_gpu_frac=None, **kwargs)

Bases: nevopy.processing.base_scheduler.ProcessingScheduler

Scheduler that uses ray to implement parallel processing.

Ray is an open source framework that provides a simple, universal API for building distributed applications. This scheduler uses it to implement parallel processing. It’s possible to either run ray on a single machine or on a cluster. For more information regarding the ray framework, check out the project’s GitHub page: https://github.com/ray-project/ray.

It’s possible to view ray’s dashboard at http://127.0.0.1:8265. It contains useful information about the distribution of work and the usage of resources by ray.

When this class is instantiated, a new ray runtime is created. You should close existing ray runtimes before creating a new ray scheduler, to avoid possible conflicts. If, for some reason, you want to use a currently running ray runtime instead of creating a new one, pass True as the argument to ignore_reinit_error.

This class is, basically, a simple wrapper for ray. If you’re an advanced user and this scheduler doesn’t meet your needs, it’s recommended that you implement your own scheduler by subclassing ProcessingScheduler.

Parameters
  • address (Optional[str]) – The address of the Ray cluster to connect to. If this address is not provided, then this scheduler will start Redis, a raylet, a plasma store, a plasma manager, and some workers. It will also kill these processes when Python exits. If the driver is running on a node in a Ray cluster, using auto as the value tells the driver to detect the cluster, removing the need to specify a specific node address.

  • num_cpus (Optional[int]) – Number of CPUs the user wishes to assign to each raylet. By default, this is set based on virtual cores (value returned by os.cpu_count()).

  • num_gpus (Optional[int]) –

    Number of GPUs the user wishes to assign to each raylet. By default, this is set based on detected GPUs. If you are using TensorFlow, it’s recommended that you execute the following piece of code before importing TensorFlow:

    import os
    # Make TensorFlow allocate GPU memory on demand instead of
    # reserving all of it upfront.
    os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    

    This will prevent individual TensorFlow sessions from allocating all of the available GPU memory.

  • worker_gpu_frac (Optional[float]) – Minimum fraction of a GPU a worker needs in order to use it. If there aren’t enough GPU resources available for a worker when a task is assigned to it, it will not use any GPU resources. Here we consider the number of workers to be equal to the number of virtual CPU cores available. By default, this fraction is set to num_gpus / num_cpus (e.g., with 2 GPUs and 8 CPUs, each worker gets 0.25 of a GPU), which means that all workers will use the GPUs, each being able to access an equal fraction of them. Note that this might be a source of out-of-memory errors, since the GPU fraction assigned to each worker might be too low. It’s usually better to manually select a fraction.

  • **kwargs – Optional named arguments to be passed to ray.init(). For a complete list of the parameters of ray.init(), check ray’s official docs (https://docs.ray.io/en/master/package-ref.html).

run(items, func)

Processes the given items and returns a result.

Main function of the scheduler. Call it to make the scheduler manage the parallel processing of a batch of items using ray.

Parameters
  • items (Sequence[TProcItem]) – Sequence containing the items to be processed.

  • func (Callable[[TProcItem], TProcResult]) – Callable (usually a function) that takes one item TProcItem as input and returns a result TProcResult as output. Generally, TProcItem is an individual in the population and TProcResult is the individual’s fitness. If additional arguments must be passed to the callable you want to use, it’s possible to use Python’s functools.partial or to just wrap it with a simple function. The callable doesn’t need to be annotated with ray.remote; this is handled for you.

Return type

List[TProcResult]

Returns

A list containing the results of the processing of each item. It is guaranteed that the ordering of the items in the returned list follows the order in which the items are yielded by the iterable passed as argument.
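
Below is a minimal usage sketch (the fitness function is a toy; the resource settings are illustrative and should be adjusted to your machine or cluster):

    from nevopy.processing.ray_processing import RayProcessingScheduler

    def eval_fitness(item):
        # Toy fitness function. It doesn't need the @ray.remote
        # decorator; the scheduler handles that internally.
        return item * item

    # Starts a new ray runtime on the local machine.
    scheduler = RayProcessingScheduler(num_cpus=4, num_gpus=0)
    fitness = scheduler.run(items=list(range(10)), func=eval_fitness)
    print(fitness)  # [0, 1, 4, ..., 81]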

nevopy.processing.serial_processing module

Implements a simple wrapper for the serial processing of items.

class nevopy.processing.serial_processing.SerialProcessingScheduler

Bases: nevopy.processing.base_scheduler.ProcessingScheduler

Simple wrapper for the serial processing of items.

This scheduler is just a wrapper for the serial processing of items, i.e., the processing of one item at a time. It doesn’t involve any explicit parallel processing.

run(items, func)

Sequentially processes the input items.

Parameters
  • items (Sequence[TProcItem]) – Iterable containing the items to be processed.

  • func (Callable[[TProcItem], TProcResult]) – Callable (usually a function) that takes one item TProcItem as input and returns a result TProcResult as output. Generally, TProcItem is an individual in the population and TProcResult is the individual’s fitness. If additional arguments must be passed to the callable you want to use, it’s possible to use Python’s functools.partial or to just wrap it with a simple function.

Return type

List[TProcResult]

Returns

A list containing the results of the processing of each item. It is guaranteed that the ordering of the items in the returned list follows the order in which the items are yielded by the iterable passed as argument.
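
A quick sketch of this scheduler in action (the fitness function is a toy). Since no pickling or worker processes are involved, even a lambda works, which makes this scheduler handy for debugging:

    from nevopy.processing.serial_processing import SerialProcessingScheduler

    scheduler = SerialProcessingScheduler()
    # Items are processed one at a time, in order.
    fitness = scheduler.run(items=[1, 2, 3], func=lambda x: x * 10)
    print(fitness)  # [10, 20, 30]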

Module contents

Imports core names of nevopy.processing.