A list of puns related to "Multi Threaded"
Initial benchmarks of AMD's new flagship; the final numbers could well be better, as these are engineering samples:
https://wccftech.com/amd-epyc-7773x-milan-x-flagship-cpu-benchmarked-in-dual-socket-configuration-scores-almost-30000-multi-threaded-points-in-cpu-z/
Milan is already beating Intel's FUTURE server parts, the ones planned for this year after delays... and now this. Note the expected price is higher, around $9,000, versus the Chinese engineering samples on sale.
That's what KeyBanc and Wells Fargo talked about regarding AMD's datacenter revenue in 2022: EPYC gaining more market share. The PPS will follow.
It's a deadlock issue. Originally it reproduced about 1 run in 20. After some changes, 1 in 100. Now it's 1 in 10,000. It's incredibly difficult to reproduce. My question is: is it even worth debugging/fixing if it's so rare?
I realize multi-threaded Minecraft is a bit of a meme, but I want to discuss some theories around how it could be done if you had lots of money and time and such. I came across this because I was working on an old cellular automata project and realized I had no way to make it multi-threaded. If I could make it work on my project, the same thing would, in theory, work in Minecraft, right? I might try implementing the best proposal in my cellular automata project just for the heck of it, so this is not purely academic.
This is also extremely programming-heavy and doesn't really involve FeedTheBeast in any way, but this is the only place where I know I can find experienced Minecraft programmers. I know programming very well in general, but I don't know anything about Minecraft source code. And finally, I'm sure this problem has been solved somewhere before, but I haven't been able to find it.
Some specifics:
- For the sake of argument, write this in any framework/language. For example, write it for Bedrock, Minestorm, or a Rust rewrite. It doesn't even have to be for Minecraft, just any voxel-based environment of sufficient complexity, like Minetest.
- True arbitrary multiprocessing. As threads/cores/memory approach infinity, so should performance. Putting lighting or chunk loading on a different thread doesn't count.
- Two block updates in different chunks which do not affect each other should usually run on separate threads.
- Focus on server only.
- Arbitrary number of players in a world is more important than arbitrary performance in a single area. 5000 players > 5000 cows.
Vaguely, each player gets their own region. All of the chunks in their region are processed on a separate thread. As they move around, their region moves with them. If two players walk close enough that their regions touch, you get a region sync and merge. This means no matter how many players are at a base, it still only runs one thread. All block updates that exit a region add those chunks to the previous region. When a redstone line runs thousands of blocks into unloaded chunks, the region reshapes to include all of them. If the redstone line activates a chunk loader, then all of those chunks get added to the region as well. Each dimension would always have separate regions, so going to the nether would not cause a region merge, it would just move the player.
Thank you ZOS!! Finally!! I need to test this more, but so far this has fixed any weird FPS drops that I used to get for no apparent reason!! It would dip as low as 40 FPS before in random locations!! I'm running a Strix 2080 Ti, an i9 9900K, and 32 GB of RAM, at 4K with a ton of addons, ReShade, an unlimited draw distance mod, and mips set to -3. The game is smooth as butter on the 65-inch 4K TV @ 60fps.
I was baffled that my Xbox Series X version ran smoother in towns. This is no longer the case! Now all I need is the option to have all my crown purchases and CP shared between platforms!! I don't give a fuck about losing my Xbox toons and inventory, but if they could find a way for just CP and crown purchases to be shared across platforms, I would be in heaven!!
This simple program:
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>
#include <time.h>
#define N_THREADS 4 //(4*8)
#define LIMIT 1000000000
#define CONSECUTIVE_ITERS (LIMIT/N_THREADS)
long RESULTS_THREADS[N_THREADS];
clock_t TIMES_THREADS[N_THREADS];
void* threadRoutine(void* arg) {
    clock_t start = clock();
    int threadIndex = *((int*) arg);
    free(arg);
    int i = threadIndex * CONSECUTIVE_ITERS;
    long totalSum = 0;
    for (int j = i; j < i + CONSECUTIVE_ITERS; ++j)
        totalSum += j;
    RESULTS_THREADS[threadIndex] = totalSum;
    TIMES_THREADS[threadIndex] = clock() - start;
    return NULL; /* a void* thread routine must return a value */
}
int main() {
    clock_t start_parallel = clock();
    pthread_t threads[N_THREADS];
    for (int i = 0; i < N_THREADS; ++i) {
        int* index = malloc(sizeof(int));
        *index = i;
        pthread_create(&threads[i], NULL, threadRoutine, (void*) index);
    }
    long result_parallel = 0;
    for (int i = 0; i < N_THREADS; ++i) {
        pthread_join(threads[i], NULL);
        result_parallel += RESULTS_THREADS[i];
    }
    double time_parallel = ((double) (clock() - start_parallel)) / CLOCKS_PER_SEC;
    clock_t start_sequential = clock();
    long result_sequential = 0;
    for (int k = 0; k < LIMIT; ++k)
        result_sequential += k;
    double time_sequential = ((double) (clock() - start_sequential)) / CLOCKS_PER_SEC;
    printf("Parallel: \t\t%ld\n", result_parallel);
    printf("Sequential: \t\t%ld\n\n", result_sequential);
    printf("Time parallel: %fs\n", time_parallel);
    for (int i = 0; i < N_THREADS; ++i) {
        double time_thread = ((double) TIMES_THREADS[i]) / CLOCKS_PER_SEC;
        printf("Time thread %d: %fs\n", i, time_thread);
    }
    printf("Time sequential: %fs\n", time_sequential);
    exit(EXIT_SUCCESS);
}
Gives this result:
Time parallel: 2.043344s
Time thread 0: 2.043192s
Time thread 1: 2.032970s
Time sequential: 2.212173s
Which is something I just don't understand. I have an AMD Ryzen 5 3550H, so 4 cores. This should be faster, because each individual thread does half the work of the main thread, but they take practically the same time! And it actually g…
I've been working on a functional programming language for the past few years and I'd like to share it with you; it would be nice to have some feedback on it! The language is called "Clio" and you can find it here: https://github.com/clio-lang/clio or here: https://clio-lang.org
It has a minimal and noise-free syntax, a minimal type system, and also a gradual type checking system. It has a few innovations, for example, remote functions and built-in support for clustering and making distributed systems. It compiles to JavaScript, it's super fast [1], and it brings multi-threading to the browser.
Let me know what you think; any feedback is appreciated. I'm looking forward to hearing your opinions so I can improve my language!
[1] https://pouyae.medium.com/clio-extremely-fast-multi-threaded-code-on-the-browser-e78b4ad77220
I'm looking for a little advice on a home server. I'm currently running a Synology DS720+ for file storage, Plex and PiHole. Then have a Dell OptiPlex 3070 (i5-9500T - 6 cores / 6 threads) running a bunch of containers.
I want something a bit more beefy to replace the OptiPlex, specifically, I want more cores with hyperthreading and more than a single NIC. I'd like to run Proxmox for a few VMs:
I'm conscious of my power bill, so I'd like something that can be fairly low power at idle but still have a decent number of cores with hyperthreading.
I can't find any 1L / micro form factor PCs that offer more than a single NIC. I see there's an Intel NUC 11 with dual NICs coming soon, which is an option, but it's on the pricier side and would rule out running my NAS on the same machine due to the storage constraints.
I've looked at something a little bigger, the HPE MicroServer Gen10, but while the form factor, power consumption and HDD space would work for me, the CPU's a little weak. ServeTheHome has a great piece detailing upgrade options, with the Xeon E-2236 being the desired CPU; it's £270 on top of buying the machine for around £470, and that's before getting an SSD and RAM.
I could build something, and I'm open to that having built gaming PCs over the years, but not really sure what to look for with the CPU / Mobo / PSU for a home server.
TLDR - I want a smallish form factor home server with 6-8 cores hyperthreaded (12-16 threads) with multiple NICs, 2 would work, 4 would be perfect. What can you recommend?
If I have 10 blocking operations and 5 threads, I get that I can only run 5 of those operations at once, and none of my threads are free til the operation they were given completes.
I also get that if I make those operations non-blocking, I'm no longer limited in this way: I give an operation to a thread, it kicks it off, and then it's free for more work.
What I don't get is how this is so. Is the thread handing the operation off to the os and saying "tell me when it's done"? If so, how does the os not get jammed up in the same way as our blocking/5-thread app? It seems like at some point there must be a thing that waits on the operation to complete.
If it helps, here is an example of contrasting behavior: the first is async and parallel, the second blocking and parallel.
Simple example - suppose the following data are stored in the following data structure:
Design 1:
Single (but a long) Table:
e.g. daily financial data from 1960 - 2020 of a portfolio
Design 2:
Multi (but short) Table:
e.g. each table only has each year's daily financial data of a portfolio
To make things quicker for computation (such as multi-threads, multi-core processing or even API access), generally, would it be better to lump all data into one table (e.g. a dataframe) or split them up into smaller chunks of tables?
Imagine an API attempts to 'GET' values from Design 1; I'd think this would take longer than with Design 2? But for multi-threaded processing, wouldn't Design 2 involve more context switching, i.e. more cost?
Why is JavaScript single-threaded?
Wouldn't it be better if they made it multi threaded and allow for real asynchronous code instead of the current event loop stuff we have now?
Are there engineering limitations that prevent this?
https://github.com/madMAx43v3r/chia-plotter
This is a new implementation of a chia plotter which is designed as a processing pipeline, similar to how GPUs work, only the "cores" are normal software CPU threads.
As a result this plotter is able to fully max out any storage device's bandwidth, simply by increasing the number of "cores", i.e. threads.
chia_plot <pool_key> <farmer_key> [tmp_dir] [tmp_dir2] [num_threads] [log_num_buckets]
For <pool_key> and <farmer_key> see output of `chia keys show`.
<tmp_dir> needs about 200G space, it will handle about 25% of all writes. (Examples: './', '/mnt/tmp/')
<tmp_dir2> needs about 110G space and ideally is a RAM drive, it will handle about 75% of all writes.
If <tmp_dir> is not specified it defaults to current directory.
If <tmp_dir2> is not specified it defaults to <tmp_dir>.
Make sure to crank up <num_threads> if you have plenty of cores; the default is 4.
Depending on the phase more threads will be launched; the setting is just a multiplier.
RAM usage depends on <num_threads> and <log_num_buckets>.
With the default <log_num_buckets> and 4 threads it's ~2GB; with 16 threads it's ~6GB.
On a dual Xeon(R) E5-2650v2 @ 2.60GHz R720 with 256GB RAM and a 3x800GB SATA SSD RAID0, using a 110G tmpfs for <tmp_dir2>:
Number of Threads: 16
Number of Sort Buckets: 2^7 (128)
Working Directory: ./
Working Directory 2: ./ram/
[P1] Table 1 took 21.0467 sec
[P1] Table 2 took 152.6 sec, found 4295044959 matches
[P1] Lost 77279 matches due to 32-bit overflow.
[P1] Table 3 took 181.169 sec, found 4295030463 matches
[P1] Lost 62514 matches due to 32-bit overflow.
[P1] Table 4 took 223.303 sec, found 4295044715 matches
[P1] Lost 76928 matches due to 32-bit overflow.
[P1] Table 5 took 232.129 sec, found 4294967739 matches
[P1] Lost 235 matches due to 32-bit overflow.
[P1] Table 6 took 221.468 sec, found 4294932892 matches
[P1] Table 7 took 182.597 sec, found 4294838936 matches
Phase 1 took 1214.37 sec
[P2] max_table_size = 4295044959
[P2] Table 7 scan took 16.9198 sec
[P2] Table 7 rewrite took 44.796 sec, dropped 0 entries (0 %)
[P2] Table 6 scan took 47.5287 sec
[P2] Table 6 rewrite took 81.2195 sec, dropped 581301544 entries (13.5346 %)
[P2] Table 5 scan took 46.6094 sec
[P2] Table 5 rewrite took 77.9914 sec, dropped 76