Hello all!
I am new to Python programming and I have tried researching multiprocessing. However, I just can't seem to grasp the concept. For the code example below:
def doSomething():
    value = sayHello()
    fetchData(value)  # function that could take an extended amount of time to run
I want to use multiprocessing because fetchData is a rather lengthy function and I want sayHello to be triggered without delay when doSomething is called (it is called constantly which is why time matters). Ideally, a queue of sorts is set up where sayHello keeps running and returning values while fetchData runs separately in the background with each corresponding value.
Unsure how to do this. Let me know if it is possible. Any help would be much appreciated :)
I am not reading files, I am just making a small curses terminal simulation.
I've tried multiprocessing.Pool with imap/map, ThreadPoolExecutor, and another multithreading thing that involved a map method. I don't really remember.
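A minimal sketch of one way this could look, assuming the shape of the snippet above: sayHello and fetchData below are cheap stand-ins for the real functions, and the worker/Queue arrangement is a suggestion rather than anything from the original code. The main loop keeps calling doSomething without waiting, while a background process drains the queue and runs the slow call.

import time
from multiprocessing import Process, Queue

def sayHello():
    # stand-in for the real sayHello(): cheap and returns immediately
    return "hello"

def fetchData(value):
    # stand-in for the real fetchData(): the slow part
    time.sleep(2)
    print("fetched data for", value)

def fetch_worker(q):
    # background process: pull values off the queue and run the slow call
    while True:
        value = q.get()
        if value is None:          # sentinel tells the worker to stop
            break
        fetchData(value)

def doSomething(q):
    value = sayHello()             # returns right away
    q.put(value)                   # hand the slow work to the background process

if __name__ == '__main__':
    q = Queue()
    worker = Process(target=fetch_worker, args=(q,))
    worker.start()
    for _ in range(5):             # doSomething can be called constantly without delay
        doSomething(q)
    q.put(None)                    # shut the worker down once we're finished
    worker.join()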
Hi Gang,
I'd like to write a program with a background task that runs at the start of every minute and checks a file; if the current time matches the time stored in the file, it sets a flag, which is then picked up by another task.
For example:
variable_in_file = 1503
task_flag = False
# Background task 1 runs every minute.
# When the time is 1503 the background task sets task_flag to True.
# Background task 2 checks task_flag every 5 seconds and, when it is set to True, runs a function and then sets it back to False.
I know this explanation is a bit loose, but I've been looking into multiprocessing and I feel like my head is going to implode.
If someone can help me get my head around this I would be very grateful.
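A minimal sketch of the two-task idea, under some added assumptions: the file just contains a time like "1503" (HHMM), the file name trigger_time.txt is made up, and the shared flag is a multiprocessing.Event rather than a plain bool (a module-level bool would not be visible across separate processes).

import time
from datetime import datetime
from multiprocessing import Process, Event

def task1(flag, path="trigger_time.txt"):
    # once a minute, compare the time stored in the file with the current HHMM
    while True:
        with open(path) as f:
            target = f.read().strip()           # e.g. "1503"
        if datetime.now().strftime("%H%M") == target:
            flag.set()                          # tell the other task to run
        time.sleep(60)                          # not aligned to the exact minute, kept simple

def task2(flag):
    # check the flag every 5 seconds; when it is set, do the work and clear it
    while True:
        if flag.is_set():
            print("flag was set, running the function...")
            flag.clear()                        # set it back to False
        time.sleep(5)

if __name__ == '__main__':
    flag = Event()
    Process(target=task1, args=(flag,), daemon=True).start()
    Process(target=task2, args=(flag,), daemon=True).start()
    time.sleep(300)                             # let the demo run for five minutes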
Hi,
So I am looking into running a Python script that uses multiprocessing.
Can I increase cpus-per-task to a value higher than the number of CPUs in a node? For example: I have several nodes with 16 CPUs each, and I want to run a single task with 32 CPUs, i.e. use two nodes and all of their CPUs for one task.
Is this possible? Or am I always capped at the maximum number of CPUs in a node?
Thanks
Quick rundown of what I want to do:
>open website via selenium
>gather many objects from it (in this case it's js checkboxes)
>click them all AS QUICKLY AS POSSIBLE, which would entail engaging all my processor's cores, so that each core is responsible for approximately a quarter of the objects (since I have 4 cores)
I've already written the code so that it works like a charm WITHOUT multiprocessing.
So, if I want to use MP, the plan right now is to roughly do this:
import selenium_operations as sop  # separate file in which I've defined some selenium-related functions related to mining stuff from a webpage
from multiprocessing import Process

if __name__ == '__main__':
    boxes = sop.getBoxes(selection)  # get the boxes. There's 200 of them

    box_set_1 = Process(target=sop.clickBoxes, args=(boxes[0:50],))
    box_set_2 = Process(target=sop.clickBoxes, args=(boxes[50:100],))
    box_set_3 = Process(target=sop.clickBoxes, args=(boxes[100:150],))
    box_set_4 = Process(target=sop.clickBoxes, args=(boxes[150:200],))

    box_set_1.start()  # click the boxes
    box_set_2.start()
    box_set_3.start()
    box_set_4.start()

    box_set_1.join()
    box_set_2.join()
    box_set_3.join()
    box_set_4.join()
Look good? Or is there something I should be aware of?
P.S. I am using Chrome.
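For comparison, a sketch of the same fan-out with multiprocessing.Pool so the slices don't have to be hard-coded. One caveat, and an assumption in this sketch: live Selenium WebElement objects generally can't be pickled into another process, so the items below are stand-in locator strings and click_boxes is a dummy, not the real sop.clickBoxes (each real worker would also need its own driver).

from multiprocessing import Pool

def chunks(items, n):
    # split items into n roughly equal slices
    size = (len(items) + n - 1) // n
    return [items[i:i + size] for i in range(0, len(items), size)]

def click_boxes(locators):
    # dummy stand-in for sop.clickBoxes
    for loc in locators:
        print("clicking", loc)

if __name__ == '__main__':
    boxes = [f"checkbox-{i}" for i in range(200)]   # stand-in for sop.getBoxes(selection)
    with Pool(4) as pool:
        pool.map(click_boxes, chunks(boxes, 4))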
I have a script that does a task, and when the task runs into a certain case it stops and asks the user for input. I have seen some things that looked helpful on SO and tried to implement them, but they didn't do what I needed.
The relevant part is this one; here I am trying to get user input in the function:
elif keepgoing == False:
    feedback = q.get()
    fn = sys.stdin.fileno()
    print(feedback)
    self.Main(feedback, weaponPos, gamblePos, wipePos, colors, q, fn)
Anyway, this is my current code, a bit botched because I tried a few things myself and because I haven't done much with multithreaded programming yet:
import pyscreenshot as ImageGrab
import pydirectinput as pyautogui
import time
import os
import sys
from multiprocessing import Process, freeze_support, JoinableQueue

class AutoGambler:
    def Main(self, *args):
        #print(args)
        intervals = args[0]
        weaponPos = args[1]
        gamblePos = args[2]
        wipePos = args[3]
        colors = args[4]
        q = args[5]
        sys.stdin = os.fdopen(args[6])
        #print(colors)
        keepgoing = True
        x = 0
        print("intervals: " + str(intervals))
        print("keepgoing: " + str(keepgoing))
        while x < int(intervals) and keepgoing == True:
            self.gamble(weaponPos, gamblePos, wipePos)
            #print("gamble succeeded")
            keepgoing = self.checkPix()
            #print("checkPix succeeded")
            x += 1
            print(x)
            if x == int(intervals):
                break
        if x >= int(intervals):
            feedback = input(f"{intervals} amount of retries reached. Go again? (Enter amount of retries, default 10)") or 10
            self.Main(feedback, weaponPos, gamblePos, wipePos, colors, q)
        elif keepgoing == False:
            feedback = q.get()
            fn = sys.stdin.fileno()
            print(feedback)
            self.Main(feedback, weaponPos, gamblePos, wipePos, colors, q, fn)

    def doubleclickBox(self, horz, vert):
        # 608 | 683 "0/0"
        # 640 | 715 first box
        # 930 | 875 last box
        # 32 spacing
        xcoord = 608 + 32 * horz
        ycoord = 683 + 32 * vert
        time.sleep(0.01)
py
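For the input-inside-another-process part specifically, here is a standalone sketch of the stdin trick the code above uses: pass the parent's stdin file descriptor to the child, reopen it there, and input() works again. The worker function and the prompt text are placeholders, and this assumes a platform/start method where the descriptor is actually usable in the child (it is with fork on Unix; it may not be with spawn on Windows).

import os
import sys
from multiprocessing import Process

def worker(stdin_fd):
    # reopen the parent's stdin inside the child so input() can read from it
    sys.stdin = os.fdopen(stdin_fd)
    answer = input("Go again? (Enter amount of retries, default 10) ") or 10
    print("child received:", answer)

if __name__ == '__main__':
    p = Process(target=worker, args=(sys.stdin.fileno(),))
    p.start()
    p.join()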
Code in a Pastebin link at the bottom, explanation here.
I think it will help to explain what this project is. I'm a graduate student analyzing some of my simulations. The simulations produce a trajectory which has the positions and velocities of all the simulated atoms at a bunch of different frames, just like a movie. The code runs perfectly fine in serial and I have used that to get great results before. However, we've extended the simulations and they now have ~50k frames. The computations aren't fast so the estimated time for running in serial is ~55 hours. The cluster I run on has a 24 hour limit but also has 24 cores so this feels like the perfect place to use multiprocessing. I should also mention that I use a ton of Python in my work so I feel like I'm getting better, but I've had no formal training of any kind so I am not an expert by any means. Multiprocessing in particular is totally new to me (and a constant headache).
So the goal of the code is to simply chop up the 50k frames into n sections and have n processes compute each leg of the trajectory simultaneously. For reference, the output is 3 computed values: convexity, mcurvature and gcurvature (and I'm really bad about keeping that consistent, so also various abbreviations of those). Convexity is a single value; the curvatures are both variable-length arrays. I'm storing all the results as dictionaries because they handle the different data types and array lengths very easily and make combining and sorting different times very easy. Also note that due to the large number of frames and large data output, it's really easy to fill up the queue. I had a bunch of issues where the processes would get hung because the queue was full. The solution I worked out was to dedicate half the processes as writer processes that carry out the analysis and shove results into a Queue, and the other half as reader processes that pull from the Queue and write to a dictionary. Of course... "solution" may be an overstatement. It was a solution to a previous problem so maybe now it needs updating again.
The issue is that multiprocessing is not doing what it's supposed to be doing: decreasing the computation time. The latest version shown in the attached code, running a test of 100 frames produces the timings:
That time increase obviously isn't much, but this is only 100 frames out of ~50,000.
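Purely as a point of comparison, a minimal sketch of the chunking idea with a plain Pool and no separate reader/writer processes: each worker gets a contiguous slice of frame indices, returns a dict of per-frame results, and the parent merges the dicts at the end. The frame count, the fake compute_frame, and the merge step are all stand-ins, not the attached code.

from multiprocessing import Pool

def compute_frame(frame):
    # stand-in for the real per-frame analysis (convexity + the two curvatures)
    return {"convexity": frame * 0.1, "mcurvature": [frame], "gcurvature": [frame, frame]}

def analyse_chunk(frames):
    # one worker handles one contiguous slice and returns a dict keyed by frame index
    return {f: compute_frame(f) for f in frames}

if __name__ == '__main__':
    n_frames = 100                       # ~50,000 for the full trajectory
    n_procs = 4
    step = (n_frames + n_procs - 1) // n_procs
    slices = [range(i, min(i + step, n_frames)) for i in range(0, n_frames, step)]
    with Pool(n_procs) as pool:
        partials = pool.map(analyse_chunk, slices)
    results = {}
    for part in partials:                # merge the per-chunk dicts in frame order
        results.update(part)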
Hi everyone,
I'm trying to look into running concurrent processes. I've tried getting multiprocessing to work but I keep getting the same error, no matter how complicated the code is:
>ModuleNotFoundError: No module named 'multiprocessing.spawn'; 'multiprocessing' is not a package
This was caused by the following code:
However, when running the threading equivalent, there are no issues:
Can someone please shed some light on how to get multiprocessing to work? I've tried upgrading Python, installing Visual Studio's C++ compiler/packages and reading through Google. I'm still relatively new to Python and I get lost pretty quickly, so please use noob-friendly language :)
Thanks.
edit: Reddit butchered the formatting. Uploaded to pastebin instead.
SOLVED! K900_ and chewy1970 fixed it. In an effort to help other people with the same issue, ensure you have no other scripts called multiprocessing.py.
Hey all, was checking out some libraries for work on how we can slot in multiprocessing with minimal fuss, and decided to write up the initial investigation.
It's by no means comprehensive; I just hope it might be useful in comparing a few popular libraries and how to set them all up.
If there are good libraries I've missed, please let me know and I'll look into them!
I'm using a library that uses multiprocessing.pool.Pool in our Lambda functions, but after upgrading from Python 3.7 to 3.8, multiprocessing.pool.ThreadPool is no longer working.
The error I'm getting is:
OSError: [Errno 38] Function not implemented
File "/var/lang/lib/python3.8/multiprocessing/synchronize.py", line 57, in __init__
sl = self._semlock = _multiprocessing.SemLock
I'm pretty sure the reason is that:
- synchronize.Lock doesn't work in Lambda for any version of Python (Lambda has no /dev/shm, and no write access to /dev in Lambda - see: https://aws.amazon.com/blogs/compute/parallel-processing-in-python-with-aws-lambda)
- ThreadPool is now using synchronize.Lock from version 3.8
I can't find exactly why ThreadPool now uses synchronize.Lock, but given the wide usage of AWS Lambda and other environments that don't have /dev/shm (assuming there are a few, because the unit tests run into this as well: https://bugs.python.org/issue38377) - is there anything to work around this? Could synchronize.Lock use /tmp if /dev is not writable and /dev/shm isn't available?
Any ideas or suggestions would be much appreciated.
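One workaround that may be worth trying (a suggestion, not something from the post): if the workload is thread-based anyway, concurrent.futures.ThreadPoolExecutor uses plain threading primitives rather than _multiprocessing.SemLock, so it shouldn't need /dev/shm at all. The work function and the Lambda-style handler below are made-up placeholders.

from concurrent.futures import ThreadPoolExecutor

def work(item):
    # stand-in for whatever the library does per item
    return item * item

def handler(event=None, context=None):
    # hypothetical Lambda-style entry point
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(work, range(10)))

if __name__ == '__main__':
    print(handler())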
Context:
I have two heavy machine learning inference models running, whose webcam streams are being streamed with Flask.
These two models are different and have different weights.
Currently I have them set up as /blueprint1 and /blueprint2 for ML model 1 and ML model 2 respectively.
But I think my implementation is not safe, since Flask is switching threads between the two models and this is causing a few seconds of lag in the webcam streams. I am not sure if this thread-switching idea is even correct.
What I want to do:
I was thinking of some easy solution to run these as two different apps (multiprocessing).
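A minimal sketch of the "two different apps" idea, with everything specific (ports, route name, how the model gets loaded) made up: each process builds its own Flask app and would load its own weights, so the two streams never compete for the same thread.

from multiprocessing import Process
from flask import Flask

def run_app(name, port):
    app = Flask(name)
    # each process would load its own model weights here, e.g. model = load_model(name)

    @app.route("/stream")
    def stream():
        return f"{name} stream placeholder"    # real code would return the webcam feed

    app.run(port=port)

if __name__ == '__main__':
    Process(target=run_app, args=("model1", 5001)).start()
    Process(target=run_app, args=("model2", 5002)).start()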
Hello,
I have a small application for generating gif files from videos. I started it as a learning project and I'm still adding features as learning exercises.
The code is quite simple: a for loop scans a folder recursively, I check with magic whether each file is a video file, and if it is, I append it to a list. Once this is done, I process the items in the list with my generate_gif() method. Fairly straightforward, nothing complicated.
However, this way I can process only one video at a time. I thought it would be good if I could process multiple videos simultaneously. After doing some research I decided to use pool. Here's how I tried it.
video_files is a list of strings, the file path of each video file.
from concurrent.futures import ProcessPoolExecutor

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=4) as pool:
        for video in range(len(video_files)):
            pool.submit(generate_gif, video_files[video])
(screenshot of the code: https://gyazo.com/d32d2ea01a7f76e7c1a7f41b30cc8c29 )
As you'd expect, that didn't work. The program spits out all the file names to the console, then processes only one file and exits (gracefully). I would like to understand how I should approach this problem and what I'm doing wrong. I'd be really happy if someone could explain what I'm doing wrong, and maybe show some sample code for processing items from a list with multiple workers.
(I have the full code in github but not adding a link just in case it would look like I'm promoting).
Thanks!
Rooti
(I tried to format the code properly but I'm struggling. sorry)
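Since the post asks for sample code, here is a minimal sketch of one way to fan a list out over several workers; generate_gif below is a dummy stand-in, not the real method. Using executor.map (or checking the results of submit) matters because iterating the results re-raises any exception a worker hit, which otherwise stays hidden in the Future and can look like the pool "just exited".

import time
from concurrent.futures import ProcessPoolExecutor

def generate_gif(path):
    # dummy stand-in for the real conversion
    time.sleep(1)
    return f"{path} -> gif"

if __name__ == '__main__':
    video_files = ["a.mp4", "b.mkv", "c.avi", "d.mov"]   # placeholder paths
    with ProcessPoolExecutor(max_workers=4) as pool:
        for result in pool.map(generate_gif, video_files):
            print(result)          # iterating the results re-raises any worker exception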
Just started using parallel processing and loving it.
Currently my program is slow due to computations, and parallel processing (quad core) is a godsend. However, I have another program I'm considering parallelising that takes significantly more memory, so much so that I believe it uses the hard drive as virtual memory.
In this case, would parallel processing be a benefit? Am I correct in saying that my program would use 4x more memory?
Does anyone know a good resource for intermediate multiprocessing examples? If someone would be willing to walk me through an intermediate level example that would be even more helpful too. All I can find online is either extremely basic or so advanced I can't make any changes without breaking everything.
I've watched at least 15 hours worth of tutorials, and I've been trying for the past 3 days to get anywhere, but every tutorial just shows extremely basic examples and I can't get past anything more than the basic stuff in the tutorials. If I try to implement anything even remotely complex I run straight into a brick wall with no clear idea of what is going wrong or why.
I feel like I understand all the theoretical stuff but I just can't execute anything effectively, and it's driving me crazy
Short rant: Going through the questions on my mock exam and I'm having trouble with multiprocessing. My exam is tomorrow and I missed so much 'cause of a kidney stone and tendonitis at the same time; that pain coupled with ADHD levels of concentration means I've learnt nothing and I feel useless right now. For the first time, I'm not enjoying Python and I just feel lost and stressed.
So here is my actual mock exam question:
"By the following convergent series pi can be estimated analytically exactly (note - this does not mean that this series converges to pi!):
conv = lambda n: 1/n**2

def conv_series(n):
    sum = 0
    for i in range(1, n+1):
        sum += conv(i)
    return sum
Unfortunately, this series converges very slowly.
You would like to determine the convergence value of the series to 10 decimal places. This requires a very large number of terms in the series - n = 1000000000 or more sum elements may be necessary.
Consider how you can parallelize conv_series(n) to speed up the calculation of the series by multiprocessing. Calculate the sum with (at least) n = 1000000000 terms.
What are the 10 decimal places? (E.g. if you would have the result 2.7819512345, then enter 7819512345 as result)"
So far I have managed to get this:
import multiprocessing
from multiprocessing import Pool

n = 1000000000
x = 60  # maximum multiprocesses.. not sure if this matters?

conv = lambda n: 1 / n ** 2

def conv_series(n):
    sum = 0
    for i in range(1, n + 1):
        sum += conv(i)
    return sum

if __name__ == '__main__':
    with Pool(x) as pool:
        p = multiprocessing.Process()
        for item in pool.map(conv_series, (n,)):
            print(item)
although this is just doing the same process multiple times right?
How would I actually speed this process up as the question asks?
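Not an answer key, just a sketch of the usual splitting approach: give each worker one contiguous block of the range 1..n, sum the blocks in parallel, and add the partial sums at the end. The helper name and the one-block-per-CPU choice are assumptions; also note that plain floats summed in this order are not guaranteed to be trustworthy to all 10 decimal places, so the numerics deserve separate care.

from multiprocessing import Pool, cpu_count

def partial_sum(bounds):
    # sum of 1/i**2 over the closed range [start, end]
    start, end = bounds
    return sum(1.0 / (i * i) for i in range(start, end + 1))

if __name__ == '__main__':
    n = 1_000_000_000
    workers = cpu_count()
    step = n // workers
    # one (start, end) block per worker, together covering 1..n exactly
    blocks = [(k * step + 1, (k + 1) * step if k < workers - 1 else n)
              for k in range(workers)]
    with Pool(workers) as pool:
        total = sum(pool.map(partial_sum, blocks))
    print(f"{total:.10f}")          # the series converges to pi**2/6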
Hi all,
I have been tinkering with the multiprocessing module and have been following this tutorial (https://realpython.com/courses/functional-programming-python/), but I ran into a small problem.
When I run this code:
import collections
import time
import multiprocessing

# New namedtuple which describes a scientist
Scientist = collections.namedtuple("Scientist", ['name', 'field', 'born', 'nobel'])

# some sample scientists
scientists = (
    Scientist(name='Ada Lovelace', field='math', born=1815, nobel=False),
    Scientist(name='Emmy Noether', field='math', born=1882, nobel=False),
    Scientist(name='Marie Curie', field='physics', born=1867, nobel=True),
)

# some function
def transform(scientist):
    # scientist : namedtuple
    time.sleep(1)
    return {'name': scientist.name, 'age': 2021 - scientist.born}

start = time.time()

pool = multiprocessing.Pool()
result = pool.map(transform, scientists)
pool.close()
pool.join()
my program seems to crash with the error code:
RuntimeError:
An attempt has been made to start a new process before the current process has finished its bootstrapping phase
Moreover, the program doesn't exit but keeps re-running the script.
But when I replace the multiprocessing part it works:
if __name__ == '__main__':
    pool = multiprocessing.Pool()
    result = pool.map(transform, scientists)
    pool.close()
    pool.join()
So my question is basically why? Why does the modified part (with the "main" function) work while the other one doesn't?
Thanks in advance, any and all help (and guesses) is appreciated!
Best
How to extend an existing multiprocessing framework to mpi4py in Python?
With multiprocessing, it starts by physically splitting a big npz file into small chunks; every chunk is processed individually, then all output chunks are collected and gathered into a single processed npz file.
Input files are in npz format and contain multiple np.arrays.
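A minimal sketch of how the same split/process/gather shape might look with mpi4py (run with something like mpiexec -n 4 python script.py); the file names, the array key, and process_chunk are placeholders rather than the existing framework. Rank 0 splits the data, scatter hands one chunk to each rank, every rank processes its piece, and gather brings the results back to rank 0 for saving.

import numpy as np
from mpi4py import MPI

def process_chunk(chunk):
    # stand-in for the real per-chunk processing
    return chunk * 2.0

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    data = np.load("input.npz")["arr_0"]    # placeholder file and key
    chunks = np.array_split(data, size)     # one piece per MPI rank
else:
    chunks = None

chunk = comm.scatter(chunks, root=0)         # each rank receives its own chunk
result = process_chunk(chunk)
results = comm.gather(result, root=0)        # collect the processed chunks on rank 0

if rank == 0:
    np.savez("output.npz", np.concatenate(results))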