13 December 2023 at 10:37 #7056
szilardio
Participant
Hello,
I was wondering what options there are for efficiently pickling the Cell objects that Nazca creates.
The intended use case is parallel generation of Nazca Cells, each of which is complex enough to take on the order of minutes to build. Python multiprocessing has its own limitations on how spawned processes can interact with the main script: each process runs in its own memory space and therefore has no direct access to the main script’s variables.
In practice, this means that the Cell objects generated by the parallel processes must be pickled (serialized) so they can be passed back to the main script. For complex Cells like these, the pickled data is huge, and it must then be unpickled in the main script to recover the Cell objects for further use. This is extremely memory-intensive and slows execution down whenever pickling or unpickling happens. It also scales very unfavorably with the number of Cells pickled/unpickled in one go (for example, when they are all collected in a single list).
While Python’s standard pickle module fails to serialize Nazca Cells, both cloudpickle and dill succeed (with cloudpickle being relatively fast), so I know the approach works in practice on my end.
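Before trimming anything, it may help to find out which attributes make an object heavy or block the standard module. The sketch below pickles each instance attribute separately with the standard `pickle` module and reports either its size in bytes or the error. `DemoCell` is a hypothetical stand-in, not Nazca’s `Cell`, and the helper assumes the object keeps its state in `__dict__` (attributes stored in `__slots__` would not be seen).

```python
import pickle
import threading
from typing import Any, Dict, Union


def attribute_pickle_report(obj: Any) -> Dict[str, Union[int, str]]:
    """Pickle each instance attribute of `obj` on its own.

    Returns attribute name -> pickled size in bytes, or an
    "unpicklable: <ErrorType>" string if the standard pickle module fails.
    """
    report: Dict[str, Union[int, str]] = {}
    for name, value in vars(obj).items():  # assumes state lives in __dict__
        try:
            report[name] = len(pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL))
        except (pickle.PicklingError, TypeError, AttributeError) as exc:
            report[name] = f"unpicklable: {type(exc).__name__}"
    return report


class DemoCell:
    """Hypothetical stand-in for a heavy layout cell; NOT Nazca's Cell class."""

    def __init__(self) -> None:
        self.name = "demo"
        # Bulky geometry: this is what dominates the pickle size.
        self.polygons = [[(i, 0.0), (i, 1.0), (i + 1, 1.0)] for i in range(1000)]
        self.callback = lambda x: x   # lambdas defeat the standard pickle module
        self.lock = threading.Lock()  # so do OS-level resources such as locks


if __name__ == "__main__":
    for attr, info in attribute_pickle_report(DemoCell()).items():
        print(f"{attr:10s} {info}")
```

Run against a real Cell instance, a report like this would show which attributes dominate the size and which ones the standard module rejects; whether any of them can be dropped safely still depends on Nazca’s internals.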
My questions are:
- Is there a way to reduce the complexity of Cell objects (maybe by removing objects within it that do not form a critical part of the put() method) so that the pickling process is much faster?
- If yes, what is the absolute minimum set of objects Nazca needs in order to still be able to put cells?
- If not, would it be possible to have a custom pickling routine for Cell objects that efficiently decomposes them into a flat dictionary format?
- If pickling Cells will always be fundamentally slow, what alternatives are there for transferring Cell objects between processes with separate memory spaces?
- If efficient multiprocessing is out of the question because the previous questions have no solution, how could Nazca Cells be made compatible with Python multithreading? In that case pickling could, in principle, be avoided, since memory is shared across the whole execution.
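On the custom-pickling question: Python’s `copyreg` module lets you register a reduction function for a class you do not control, so that only a flat, essential state is serialized and derived data is rebuilt on the unpickling side. The sketch below uses a hypothetical `ToyCell` whose `_bbox_cache` is treated as rebuildable; which attributes of a real Nazca Cell are actually essential for `put()` is exactly the open question, so the split shown here is an assumption.

```python
import copyreg
import pickle
from typing import List, Tuple

Point = Tuple[float, float]


class ToyCell:
    """Hypothetical stand-in for a heavy layout cell; NOT Nazca's Cell class."""

    def __init__(self, name: str, polygons: List[List[Point]]) -> None:
        self.name = name
        self.polygons = polygons  # essential geometry
        # Derived data: costs bytes in a pickle but is cheap to recompute.
        self._bbox_cache = [self._bbox(p) for p in polygons]

    @staticmethod
    def _bbox(poly: List[Point]) -> Tuple[float, float, float, float]:
        xs = [x for x, _ in poly]
        ys = [y for _, y in poly]
        return min(xs), min(ys), max(xs), max(ys)


def _rebuild_toycell(name: str, polygons: List[List[Point]]) -> ToyCell:
    # Called on the unpickling side: re-derives the cache instead of shipping it.
    return ToyCell(name, polygons)


def _reduce_toycell(cell: ToyCell):
    # Serialize only the flat, essential state: a name and raw coordinates.
    return _rebuild_toycell, (cell.name, cell.polygons)


# Register globally; pickle.dumps() now uses _reduce_toycell for ToyCell objects.
copyreg.pickle(ToyCell, _reduce_toycell)

if __name__ == "__main__":
    cell = ToyCell("demo", [[(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]])
    restored = pickle.loads(pickle.dumps(cell))
    print(restored.name, restored._bbox_cache)
```

Because unpickling looks up `_rebuild_toycell` by module and name, the reducer and rebuild function have to live in a module that both the worker processes and the main script import.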
Thank you very much in advance for any insights/suggestions.
Best,
Szilard
18 December 2023 at 18:17 #7060
Xaveer
Moderator
Dear Szilard,
I’m not sure what kind of complex cells you are creating that take minutes to build; maybe there is a better way to create them. I would need more information to help with that, though, if possible.
As an alternative to pickling, I would suggest generating one or more libraries of cell objects in one or more GDS files. Writing those should be quite efficient and can be done in parallel.
When writing to GDS you would lose Nazca-specific information, but in your case that is probably not an issue.
A final script would then read the libraries and place the cells. I’m not sure it would be more efficient than what you’ve tried, but I think it’s easy to implement and worth a try.
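The GDS-library workflow described here can be sketched as follows: each worker builds a cell, writes it to disk and returns only the file path, so nothing heavy is pickled between processes, and the main process loads the files afterwards. JSON stands in for GDS so the sketch runs without Nazca; in a real script the worker would export the cell with Nazca’s GDS export and the final script would load and place it with Nazca’s GDS import (for example `nd.export_gds` and `nd.load_gds`; check the Nazca documentation for their exact arguments). `build_and_write` and `build_library` are hypothetical names.

```python
import json
import os
import tempfile
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Dict, List, Type


def build_and_write(spec: Dict, outdir: str) -> str:
    """Worker: build one cell, write it to disk, return only the file path.

    HYPOTHETICAL stand-in: a real job would build a Nazca cell and export it
    to GDS here; JSON is used so this sketch runs without Nazca.
    """
    height = spec["height"]
    polygons = [[(i, 0), (i, height), (i + 1, height)]
                for i in range(spec["n"])]  # stand-in for the slow build
    path = os.path.join(outdir, f"{spec['name']}.json")
    with open(path, "w") as fh:
        json.dump({"name": spec["name"], "polygons": polygons}, fh)
    return path  # only this short string crosses the process boundary


def build_library(specs: List[Dict], outdir: str,
                  executor_cls: Type[Executor] = ProcessPoolExecutor) -> Dict[str, Dict]:
    """Build all cells in parallel, then load the written files in this process.

    Pass concurrent.futures.ThreadPoolExecutor as executor_cls for a quick
    single-process test run.
    """
    with executor_cls() as pool:
        paths = list(pool.map(build_and_write, specs, [outdir] * len(specs)))
    library: Dict[str, Dict] = {}
    for path in paths:
        with open(path) as fh:
            cell = json.load(fh)  # note: JSON turns tuples into lists
        library[cell["name"]] = cell
    return library


if __name__ == "__main__":  # guard required when processes are spawned
    specs = [{"name": f"cell_{k}", "n": 1000, "height": k + 1} for k in range(4)]
    with tempfile.TemporaryDirectory() as outdir:
        lib = build_library(specs, outdir)
        print(sorted(lib))
```

For CPU-bound builds in CPython, the process pool is what provides real parallelism; a thread pool runs the same code but is limited by the global interpreter lock, which makes it mainly useful for testing the workflow.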
Xaveer