nevopy.processing package¶
Submodules¶
nevopy.processing.base_scheduler module¶
Defines a common interface for processing schedulers.
This module contains a base model for a processing scheduler, the entity responsible for managing the computation of a population’s fitness in NEvoPy’s algorithms. Schedulers allow the implementation of the computation methods (like the use of serial or parallel processing) to be separated from the implementation of the neuroevolutionary algorithms.
nevopy.processing.base_scheduler.TProcItem¶
TypeVar indicating an item to be scheduled for processing by a ProcessingScheduler. Alias for TypeVar("TProcItem").

Type: TypeVar

nevopy.processing.base_scheduler.TProcResult¶
TypeVar indicating the result of processing a TProcItem. Alias for TypeVar("TProcResult").

Type: TypeVar
class nevopy.processing.base_scheduler.ProcessingScheduler¶
Bases: abc.ABC
Defines a common interface for processing schedulers.
In NEvoPy, a processing scheduler is responsible for managing the computation of the fitness of a population of individuals being evolved. This abstract class defines a common interface for processing schedulers used by different algorithms. Schedulers allow the implementation of the computation methods (like the use of serial or parallel processing) to be separated from the implementation of the neuroevolutionary algorithms.
Implementing your own processing scheduler is useful when you want to customize the computation of the population’s fitness. You can, for example, implement a scheduler that makes use of multiple CPU cores or GPUs (parallel processing).
abstract run(items, func)¶
Processes the given items and returns a result.

Main function of the scheduler. Call it to make the scheduler manage the processing of a batch of items.

Parameters
- items (Sequence[TProcItem]) – Iterable containing the items to be processed.
- func (Optional[Callable[[TProcItem], TProcResult]]) – Callable (usually a function) that takes one item (TProcItem) as input and returns a result (TProcResult) as output. Generally, TProcItem is an individual in the population and TProcResult is the individual's fitness. Since some scenarios require the fitness of the population's individuals to be calculated together, at once, the use of this parameter is not mandatory (this decision is an implementation particularity of each subclassed scheduler). If additional arguments must be passed to the callable you want to use, you can use Python's functools.partial or just wrap it with a simple function.

Return type
List[TProcResult]

Returns
A list containing the results of the processing of each item. It is guaranteed that the ordering of the items in the returned list follows the order in which the items are yielded by the iterable passed as argument.
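As a sketch of the contract above, a minimal scheduler might look like the following. The abstract base class is re-created here as a stand-in so the example is self-contained (in practice you would subclass nevopy's ProcessingScheduler), and MyScheduler is a hypothetical name:

```python
from abc import ABC, abstractmethod
from typing import Callable, List, Sequence, TypeVar

TProcItem = TypeVar("TProcItem")
TProcResult = TypeVar("TProcResult")

# Stand-in for nevopy.processing.base_scheduler.ProcessingScheduler,
# reproduced here only so this sketch is self-contained.
class ProcessingScheduler(ABC):
    @abstractmethod
    def run(self,
            items: Sequence[TProcItem],
            func: Callable[[TProcItem], TProcResult]) -> List[TProcResult]:
        ...

class MyScheduler(ProcessingScheduler):
    def run(self, items, func):
        # Results must follow the order in which the items are yielded.
        return [func(item) for item in items]

scheduler = MyScheduler()
fitnesses = scheduler.run(items=[1, 2, 3], func=lambda genome: genome * 10)
```

A real subclass would replace the list comprehension with its own computation strategy (e.g., a process pool) while keeping the same ordering guarantee.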
nevopy.processing.networked_scheduler module¶
Implementation of a processing scheduler that makes use of workers hosted on different machines in a network.
Todo
Implementation of the scheduler.
nevopy.processing.pool_processing module¶
Implements a processing scheduler that uses multiprocessing.Pool.

multiprocessing.Pool is a built-in Python class that facilitates parallel processing on a single machine. Note that it requires compatibility with pickle.
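Since pickle compatibility is a common stumbling block, it can be checked up front. For example, module-level functions pickle fine, while lambdas do not (the fitness function below is a hypothetical stand-in):

```python
import pickle

def fitness(genome: int) -> float:
    # Module-level functions are picklable: pickle stores them by
    # qualified name, so worker processes can re-import them.
    return genome * 0.5

payload = pickle.dumps(fitness)  # succeeds

try:
    pickle.dumps(lambda genome: genome * 0.5)
    lambda_picklable = True
except (pickle.PicklingError, AttributeError, TypeError):
    # Lambdas have no importable qualified name, so pickling fails.
    lambda_picklable = False
```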
class nevopy.processing.pool_processing.PoolProcessingScheduler(num_processes=None, chunksize=None)¶
Bases: nevopy.processing.base_scheduler.ProcessingScheduler

Processing scheduler that uses Python's multiprocessing.Pool.

This scheduler implements parallel processing (on a single machine) using Python's built-in multiprocessing module, specifically the Pool class.

Note
Pool internally uses pickle as its serialization method. This might be a source of errors due to incompatibility. Make sure you read the docs carefully before using this scheduler.

Note
When the processing of individual items isn't a very resource-demanding task (e.g., learning the two-variable XOR), using this scheduler might yield significantly better performance than using RayProcessingScheduler (due to ray's greater overhead). In most situations, however, the performance difference is negligible, and using RayProcessingScheduler as the processing scheduler is preferable to using this class, since ray is safer, scales better and allows clustering.

Parameters
- num_processes (Optional[int]) – Number of worker processes to use. If None, then the number returned by os.cpu_count() is used.
- chunksize (Optional[int]) – Pool.map(), used internally by the scheduler, chops the input iterable into a number of chunks which it submits to the process pool as separate tasks. This parameter specifies the (approximate) size of these chunks.
close()¶
Calls the equivalent method on the scheduler's pool object.

join()¶
Calls the equivalent method on the scheduler's pool object.
run(items, func)¶
Processes the given items and returns a result.

Main function of the scheduler. Call it to make the scheduler manage the parallel processing of a batch of items using multiprocessing.Pool.

Note
Make sure that both items and func are serializable with pickle.

Parameters
- items (Sequence[TProcItem]) – Iterable containing the items to be processed.
- func (Callable[[TProcItem], TProcResult]) – Callable (usually a function) that takes one item (TProcItem) as input and returns a result (TProcResult) as output. Generally, TProcItem is an individual in the population and TProcResult is the individual's fitness.

Return type
List[TProcResult]

Returns
A list containing the results of the processing of each item. It is guaranteed that the ordering of the items in the returned list follows the order in which the items are yielded by the iterable passed as argument.
terminate()¶
Calls the equivalent method on the scheduler's pool object.
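Under the hood, run() behaves like Pool.map(); a sketch of the equivalent built-in call is shown below. The fitness function is a hypothetical stand-in, and it must be defined at module level so pickle can serialize it for the worker processes:

```python
import multiprocessing

def fitness(genome: int) -> float:
    # Hypothetical stand-in for an individual's fitness computation.
    return genome * 0.5

def evaluate(items, num_processes=None, chunksize=None):
    # Mirrors what the scheduler does: chop `items` into chunks,
    # evaluate them in parallel, and preserve the input order.
    # processes=None -> multiprocessing uses os.cpu_count().
    with multiprocessing.Pool(processes=num_processes) as pool:
        return pool.map(fitness, items, chunksize=chunksize)

if __name__ == "__main__":
    print(evaluate([1, 2, 3, 4], chunksize=2))  # [0.5, 1.0, 1.5, 2.0]
```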
nevopy.processing.ray_processing module¶
Implements a processing scheduler that uses the ray framework.
By using ray (https://github.com/ray-project/ray), the scheduler is able to implement parallel processing, either on a single machine or on a cluster.
class nevopy.processing.ray_processing.RayProcessingScheduler(address=None, num_cpus=None, num_gpus=None, worker_gpu_frac=None, **kwargs)¶
Bases: nevopy.processing.base_scheduler.ProcessingScheduler
Scheduler that uses ray to implement parallel processing.
Ray is an open source framework that provides a simple, universal API for building distributed applications. This scheduler uses it to implement parallel processing. It's possible to run ray either on a single machine or on a cluster. For more information regarding the ray framework, check out the project's GitHub page: https://github.com/ray-project/ray.
It's possible to view ray's dashboard at http://127.0.0.1:8265. It contains useful information about the distribution of work and the usage of resources by ray.

When this class is instantiated, a new ray runtime is created. You should close existing ray runtimes before creating a new ray scheduler, to avoid possible conflicts. If, for some reason, you want to use a currently running ray runtime instead of creating a new one, pass True as the value of the ignore_reinit_error argument.
This class is, basically, a simple wrapper for ray. If you're an advanced user and this scheduler doesn't meet your needs, it's recommended that you implement your own scheduler by inheriting from ProcessingScheduler.

Parameters
- address (Optional[str]) – The address of the Ray cluster to connect to. If this address is not provided, then this command will start Redis, a raylet, a plasma store, a plasma manager, and some workers. It will also kill these processes when Python exits. If the driver is running on a node in a Ray cluster, using "auto" as the value tells the driver to detect the cluster, removing the need to specify a specific node address.
- num_cpus (Optional[int]) – Number of CPUs the user wishes to assign to each raylet. By default, this is set based on virtual cores (the value returned by os.cpu_count()).
- num_gpus (Optional[int]) – Number of GPUs the user wishes to assign to each raylet. By default, this is set based on detected GPUs. If you are using TensorFlow, it's recommended that you execute the following piece of code before importing the module:

  import os
  os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

  This will prevent individual TensorFlow sessions from allocating the entire available GPU memory.
- worker_gpu_frac (Optional[float]) – Minimum fraction of a GPU a worker needs in order to use it. If there aren't enough GPU resources available for a worker when a task is assigned to it, it will not use any GPU resources. Here we consider the number of workers as being equal to the number of virtual CPU cores available. By default, this fraction is set to num_gpus / num_cpus, which means that all workers will use the GPUs, each being able to access an equal fraction of them. Note that this might be a source of out-of-memory errors, since the GPU fraction assigned to each worker might be too low. It's usually better to manually select a fraction.
- **kwargs – Optional named arguments to be passed to ray.init(). For a complete list of the parameters of ray.init(), check ray's official docs (https://docs.ray.io/en/master/package-ref.html).
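The default fraction described above is simple arithmetic; for example, on a hypothetical machine with 8 virtual cores and 2 GPUs:

```python
num_cpus = 8   # hypothetical: 8 virtual CPU cores (= number of workers)
num_gpus = 2   # hypothetical: 2 GPUs on the machine

# Default: every worker gets an equal slice of the available GPUs.
worker_gpu_frac = num_gpus / num_cpus  # 0.25 of a GPU per worker
```

With a fraction this small, eight workers sharing two GPUs may each get too little memory, which is why manually selecting a larger fraction is often preferable.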
run(items, func)¶
Processes the given items and returns a result.

Main function of the scheduler. Call it to make the scheduler manage the parallel processing of a batch of items using ray.

Parameters
- items (Sequence[TProcItem]) – Sequence containing the items to be processed.
- func (Callable[[TProcItem], TProcResult]) – Callable (usually a function) that takes one item (TProcItem) as input and returns a result (TProcResult) as output. Generally, TProcItem is an individual in the population and TProcResult is the individual's fitness. If additional arguments must be passed to the callable you want to use, you can use Python's functools.partial or just wrap it with a simple function. The callable doesn't need to be annotated with ray.remote; this is handled for you.

Return type
List[TProcResult]

Returns
A list containing the results of the processing of each item. It is guaranteed that the ordering of the items in the returned list follows the order in which the items are yielded by the iterable passed as argument.
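The functools.partial pattern mentioned above works independently of the scheduler; a sketch follows (the fitness function and its extra scale argument are hypothetical):

```python
from functools import partial

def fitness(genome: int, scale: float) -> float:
    # Hypothetical fitness function that needs an extra argument.
    return genome * scale

# run() expects a callable taking a single item, so bind `scale` first.
func = partial(fitness, scale=0.5)

# What a scheduler would compute for a batch of items, in order:
results = [func(genome) for genome in [2, 4, 6]]
```

Wrapping with a plain one-argument function (`def func(genome): return fitness(genome, 0.5)`) achieves the same effect.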
nevopy.processing.serial_processing module¶
Implements a simple wrapper for the serial processing of items.
class nevopy.processing.serial_processing.SerialProcessingScheduler¶
Bases: nevopy.processing.base_scheduler.ProcessingScheduler

Simple wrapper for the serial processing of items.

This scheduler is just a wrapper for the serial processing of items, i.e., the processing of one item at a time. It doesn't involve any explicit parallel processing.
run(items, func)¶
Sequentially processes the input items.

Parameters
- items (Sequence[TProcItem]) – Iterable containing the items to be processed.
- func (Callable[[TProcItem], TProcResult]) – Callable (usually a function) that takes one item (TProcItem) as input and returns a result (TProcResult) as output. Generally, TProcItem is an individual in the population and TProcResult is the individual's fitness. If additional arguments must be passed to the callable you want to use, you can use Python's functools.partial or just wrap it with a simple function.

Return type
List[TProcResult]

Returns
A list containing the results of the processing of each item. It is guaranteed that the ordering of the items in the returned list follows the order in which the items are yielded by the iterable passed as argument.
Module contents¶
Imports core names of nevopy.processing.