Custom Serialization
When we communicate data between computers, we first convert that data into a sequence of bytes that can be sent across the network.
Dask can convert data to bytes using the standard solutions of Pickle and Cloudpickle. However, pickle and cloudpickle are sometimes suboptimal, so Dask also supports custom serialization formats for special types. This helps Dask to be faster on common formats like NumPy and Pandas, and it gives power users more control over how their objects are moved across the network if they want to extend the system.
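As a point of reference, the default path can be sketched with the standard library alone. The snippet below is a minimal illustration using plain `pickle` (not Dask's own code path); the `payload` and `raw` values are made up for the example. It shows a round trip through bytes, and that pickling an object that is already bytes still wraps it in extra pickle framing.

```python
import pickle

# Hypothetical data to send; any picklable object would do.
payload = {"x": [1, 2, 3], "label": "demo"}

# Convert the object to a byte string and restore it again.
data = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(data)

# An object that is already bytes still gets wrapped in pickle opcodes,
# so the pickled form is a new, slightly longer byte string.
raw = b"\x00" * 1_000_000
pickled_raw = pickle.dumps(raw, protocol=pickle.HIGHEST_PROTOCOL)
overhead = len(pickled_raw) - len(raw)
```

The overhead in bytes is small, but the wrapped form is a second copy of the data, which is one reason a type-specific serializer can be worthwhile for large buffers.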
We include a small example and then follow with the full API documentation describing the serialize and deserialize functions, which convert objects into a msgpack header and a list of bytestrings and back.
Example
Here is how we special-case the handling of raw Python bytes objects. In this case there is no need to call pickle.dumps on the object, because the object is already a sequence of bytes.
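from distributed.protocol.serialize import register_serialization

def serialize_bytes(obj):
    header = {}  # no special metadata
    frames = [obj]  # the object is already bytes, so it is its own frame
    return header, frames

def deserialize_bytes(header, frames):
    return frames[0]

register_serialization(bytes, serialize_bytes, deserialize_bytes)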
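To make the idea of registering per-type functions concrete, here is a self-contained sketch of how such a dispatch could work. It is an assumption for illustration only, not Dask's implementation: the `_registry` dictionary and the `register`, `dumps`, and `loads` helpers are hypothetical names, and the fallback uses plain `pickle`.

```python
import pickle

# Hypothetical registry mapping a type to its (serialize, deserialize) pair.
_registry = {}

def register(cls, serialize, deserialize):
    """Remember custom functions for ``cls`` (illustrative only)."""
    _registry[cls] = (serialize, deserialize)

def _type_name(cls):
    return f"{cls.__module__}.{cls.__qualname__}"

def dumps(obj):
    """Return (header, frames), preferring a registered serializer."""
    cls = type(obj)
    if cls in _registry:
        serialize, _ = _registry[cls]
        header, frames = serialize(obj)
        # Record the type name so the receiver can pick the deserializer.
        header = dict(header, type=_type_name(cls))
    else:
        # No custom functions registered: fall back to pickle.
        header, frames = {}, [pickle.dumps(obj)]
    return header, frames

def loads(header, frames):
    """Invert ``dumps`` using the type name stored in the header."""
    name = header.get("type")
    for cls, (_, deserialize) in _registry.items():
        if _type_name(cls) == name:
            return deserialize(header, frames)
    return pickle.loads(frames[0])

def serialize_bytes(obj):
    return {}, [obj]

def deserialize_bytes(header, frames):
    return frames[0]

register(bytes, serialize_bytes, deserialize_bytes)

header, frames = dumps(b"123")      # custom path, no pickling
fallback_header, _ = dumps([1, 2])  # pickle fallback
```

Storing the type name in the header is one plausible way for the receiving side to choose the right deserializer without unpickling anything first.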
API
register_serialization(cls, serialize, …) | Register a new class for custom serialization
serialize(x) | Convert object to a header and list of bytestrings
deserialize(header, frames) | Convert serialized header and list of bytestrings back to a Python object
distributed.protocol.serialize.register_serialization(cls, serialize, deserialize)
Register a new class for custom serialization
Parameters:
cls: type
serialize: function
deserialize: function
Examples
>>> class Human(object):
...     def __init__(self, name):
...         self.name = name

>>> def serialize(human):
...     header = {}
...     frames = [human.name.encode()]
...     return header, frames

>>> def deserialize(header, frames):
...     return Human(frames[0].decode())

>>> register_serialization(Human, serialize, deserialize)
>>> serialize(Human('Alice'))
({}, [b'Alice'])
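The example above leaves the header empty. When an object needs small pieces of metadata to be rebuilt, the header is a natural place for them, with the bulk data kept in the frames. The following is a minimal sketch under that assumption; the `Samples` class and its two functions are hypothetical and use only the standard library.

```python
from array import array

class Samples:
    """Hypothetical container: a name plus a flat buffer of floats."""

    def __init__(self, name, values):
        self.name = name
        self.values = array("d", values)

def serialize_samples(obj):
    # Small, msgpack-friendly metadata goes in the header ...
    header = {"name": obj.name, "typecode": obj.values.typecode}
    # ... and the bulk payload goes in the frames as raw bytes.
    frames = [obj.values.tobytes()]
    return header, frames

def deserialize_samples(header, frames):
    # Rebuild the buffer from the raw bytes using the recorded typecode.
    values = array(header["typecode"])
    values.frombytes(frames[0])
    return Samples(header["name"], values)

header, frames = serialize_samples(Samples("temps", [20.5, 21.0, 19.75]))
restored = deserialize_samples(header, frames)
```

With distributed installed, such a pair would be registered in the same way as above, with register_serialization(Samples, serialize_samples, deserialize_samples).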
distributed.protocol.serialize.serialize(x)
Convert object to a header and list of bytestrings
This takes an arbitrary Python object and returns a msgpack-serializable header and a list of bytes or memoryview objects. By default it uses pickle/cloudpickle, but it can use special functions if they have been registered beforehand.
Returns:
header: dictionary containing any msgpack-serializable metadata
frames: list of bytes or memoryviews, commonly of length one
See also
deserialize: Convert header and frames back to object
to_serialize: Mark that data in a message should be serialized
register_serialization: Register custom serialization functions
Examples
>>> serialize(1)
({}, [b'\x80\x04\x95\x03\x00\x00\x00\x00\x00\x00\x00K\x01.'])

>>> serialize(b'123')  # some special types get custom treatment
({'type': 'builtins.bytes'}, [b'123'])

>>> deserialize(*serialize(1))
1
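The description of frames as bytes or memoryview objects can be illustrated with a short standard-library sketch. It assumes nothing about Dask internals; the `payload` buffer and the `length` header key are hypothetical. The point is that a memoryview frame refers to the original buffer rather than copying it, and that more than one frame can be used.

```python
# Hypothetical 32-byte buffer standing in for a large array.
payload = bytearray(b"abcdefgh" * 4)

# A memoryview exposes the buffer without copying it.
view = memoryview(payload)
header = {"length": len(view)}   # illustrative metadata only
frames = [view[:16], view[16:]]  # slicing a memoryview does not copy either

# A change to the original buffer is visible through the frames,
# which confirms that the frames share memory with it.
payload[0] = ord("Z")
first_byte = bytes(frames[0][:1])

# A receiver can reassemble the frames into a single bytes object.
rebuilt = b"".join(frames)
```

Avoiding copies in this way matters most for large buffers, where duplicating the data before sending it would cost both time and memory.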