Calculating the Fibonacci numbers on GPU
https://veitner.bearblog.dev/calculating-the-fibonacci-numbers-on-gpu/
#HackerNews #Calculating #Fibonacci #GPU #Performance #ParallelComputing #TechInnovation #Algorithms
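The linked article isn't excerpted above; for context, the standard trick for making Fibonacci amenable to parallel hardware is the matrix identity Q^n = [[F(n+1), F(n)], [F(n), F(n-1)]] with Q = [[1, 1], [1, 0]]. Since matrix multiplication is associative, the n-fold product can be evaluated as a reduction or scan, which is exactly the shape of computation GPUs handle well. A minimal CPU-side sketch of the identity (not code from the article):

```cpp
#include <array>
#include <cstdint>
#include <iostream>

// 2x2 matrix of unsigned 64-bit integers, row-major: {m00, m01, m10, m11}.
using Mat = std::array<std::uint64_t, 4>;

// Multiply two 2x2 matrices (wraps around past F(93); fine for a demo).
Mat mul(const Mat& a, const Mat& b) {
    return {a[0] * b[0] + a[1] * b[2], a[0] * b[1] + a[1] * b[3],
            a[2] * b[0] + a[3] * b[2], a[2] * b[1] + a[3] * b[3]};
}

// F(n) via fast exponentiation of Q = {1, 1, 1, 0}: O(log n) multiplies.
// Because matrix multiplication is associative, the same product can also
// be evaluated as a parallel reduction or scan across many threads.
std::uint64_t fib(unsigned n) {
    Mat result = {1, 0, 0, 1};  // identity matrix
    Mat q = {1, 1, 1, 0};
    for (; n > 0; n >>= 1) {
        if (n & 1) result = mul(result, q);
        q = mul(q, q);
    }
    return result[1];  // top-right entry is F(n)
}

int main() {
    for (unsigned n : {10u, 50u, 90u})
        std::cout << "F(" << n << ") = " << fib(n) << '\n';
}
```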
Link: https://mediatum.ub.tum.de/?id=601795 (It took digging to find this from the Wikipedia article [1] and the unsecured HTTP homepage for "BMDFM".)
```bibtex
@phdthesis{dissertation,
author = {Pochayevets, Oleksandr},
title = {BMDFM: A Hybrid Dataflow Runtime Parallelization Environment for Shared Memory Multiprocessors},
year = {2006},
school = {Technische Universität München},
pages = {170},
language = {en},
abstract = {To complement existing compiler-optimization methods we propose a programming model and a runtime system called BMDFM (Binary Modular DataFlow Machine), a novel hybrid parallel environment for SMP (Shared Memory Symmetric Multiprocessors), that creates a data-dependence graph and exploits parallelism of user application programs at run time. This thesis describes the design and provides a detailed analysis of BMDFM, which uses a dataflow runtime engine instead of a plain fork-join runtime library, thus providing transparent dataflow semantics on the top virtual machine level. Our hybrid approach eliminates disadvantages of the parallelization at compile-time, the directive based paradigm and the dataflow computational model. BMDFM is portable and is already implemented on a set of available SMP platforms. The transparent dataflow paradigm does not require parallelization and synchronization directives. The BMDFM runtime system shields the end-users from these details.},
keywords = {Parallel computing;Shared memory multiprocessors;Dataflow;Automatic Parallelization},
note = {},
url = {https://mediatum.ub.tum.de/601795},
}
```
[1]: https://en.wikipedia.org/wiki/Binary_Modular_Dataflow_Machine
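As a toy illustration of the run-time dataflow idea from the abstract (this is not BMDFM's API, just a sketch with standard C++ futures): each node of a small dependence graph runs as soon as its inputs are ready, with no explicit synchronization directives in the user code.

```cpp
#include <future>
#include <iostream>

// Toy dataflow evaluation (NOT BMDFM's API): each node of the dependence
// graph runs as soon as its inputs are available, so the independent
// nodes a and b may execute in parallel, and c fires once both resolve.
// The graph itself encodes the ordering.
int main() {
    auto a = std::async(std::launch::async, [] { return 2 + 3; });
    auto b = std::async(std::launch::async, [] { return 4 * 5; });
    auto c = std::async(std::launch::async,
                        [&] { return a.get() + b.get(); });  // joins a and b
    std::cout << "c = " << c.get() << '\n';  // prints c = 25
}
```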
#SMP #Parallelization #Multithreading #DependenceGraph #RunTime #DataFlow #VirtualMachine #VM #ParallelComputing #SharedMemoryMultiprocessors #AutomaticParallelization #CrossPlatform #Virtualization #Configware #Transputer
Full house at the OpenMP BOF at #ISC25: over 140 attendees joined us in Hamburg!
Our session "What to Expect from OpenMP API Version 6.0" covered:
✅ A dive into key features of OpenMP 6.0
✅ A preview of 6.1 and 7.0
✅ Updates from toolchain developers
✅ Lively Q&A to help shape future OpenMP directions
Thanks to everyone who contributed – your feedback is powering the future of parallel programming!
#OpenMP #HPC #ISC2025 #OpenMP6 #ParallelComputing #Supercomputing
We're excited to welcome NextSilicon to the OpenMP Architecture Review Board!
Their Intelligent Compute Architecture blends adaptive computing with self-optimizing hardware/software and open frameworks like OpenMP. Together, we're shaping a future of performant, portable, shared-memory parallelism.
Read the press release:
https://tinyurl.com/yksfbrah
Join us at #ISC25 for the tutorial "Advanced OpenMP: Performance and 6.0 Features" on Friday, June 13, 9:00–13:00 CEST in Hall Y12, 2nd Floor, Hamburg Congress Center.
Learn how to boost OpenMP code performance on NUMA systems and accelerators, and get hands-on insights into vectorization, data locality, and the latest features in OpenMP 6.0.
Ideal for developers who want to go beyond the basics!
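A minimal sketch of two of the tutorial's themes (first-touch initialization for NUMA data locality, and combined threading plus SIMD vectorization); the kernel and sizes are illustrative, not taken from the tutorial materials:

```cpp
#include <omp.h>
#include <cstdio>

int main() {
    const long n = 1L << 24;
    // new[] without an initializer leaves the pages untouched, so the
    // parallel loop below decides their physical (NUMA) placement.
    double* a = new double[n];
    double* b = new double[n];

    // First-touch initialization: with pinned threads (e.g.
    // OMP_PROC_BIND=close OMP_PLACES=cores) each thread's chunk of
    // pages is allocated on its own NUMA node.
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        a[i] = 0.0;
        b[i] = 2.0 * i;
    }

    // Same static iteration split, so each thread computes on the memory
    // it first touched; `simd` additionally requests vectorization.
    #pragma omp parallel for simd
    for (long i = 0; i < n; ++i)
        a[i] += 3.0 * b[i];

    std::printf("a[1] = %.1f using up to %d threads\n",
                a[1], omp_get_max_threads());
    delete[] a;
    delete[] b;
}
```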
Just published the post "Parallel and distributed computing in GNU Health." :gnu:
https://meanmicio.org/2025/05/27/parallel-and-distributed-computing-in-gnu-health/
#ParallelComputing #GNUHealth #Tryton #OpenScience #GNU
Let's look at the Fortran features introduced in 2008, with coarrays, and in 2018, to see how we can exploit all the cores of our CPUs and cut the runtime of complex scientific computations. #fortran #parallelcomputing #multithreading
https://www.youtube.com/watch?v=78_12a89MWQ
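Coarray syntax is Fortran-specific; purely as a rough analogue of the same SPMD pattern, here is a C++ threads sketch in which each "image" fills its own slot of a shared array and one image combines the results after the join (roughly what Fortran 2008 expresses with `partial[me]` coindexing and `sync all`):

```cpp
#include <cstdio>
#include <numeric>
#include <thread>
#include <vector>

// Each "image" computes a partial sum into its own slot of a shared
// array; after the join (Fortran: `sync all`), one image combines the
// slots. In Fortran the slot would be a coarray element, partial[me].
int main() {
    const int images = 4;
    const int n = 1'000'000;
    std::vector<double> partial(images, 0.0);
    std::vector<std::thread> team;
    for (int me = 0; me < images; ++me)
        team.emplace_back([&partial, me, n, images] {
            double local = 0.0;
            for (int i = me; i < n; i += images)  // round-robin work split
                local += 1.0 / (1.0 + i);
            partial[me] = local;  // each image writes only its own slot
        });
    for (auto& t : team) t.join();
    std::printf("sum = %.6f\n",
                std::accumulate(partial.begin(), partial.end(), 0.0));
}
```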
Excited to announce that https://hachyderm.io/@mppf and Shreyas from the #ChapelLang project will be at #HPSFCon in Chicago this week! Come find us, talk all things parallel computing, open source, and the future of high-performance software.
Don't miss our presentation on Day 1 – check the full schedule at https://events.linuxfoundation.org/hpsf-conference/program/schedule/
Big thanks to The Linux Foundation and HPSF for hosting!
#ChapelLang #HPC #OpenSource #HPSFCon #LinuxFoundation #ParallelComputing
Hugo Krawczyk – ACM Paris Kanellakis Theory and Practice Award
For pioneering and lasting contributions to the theoretical foundations of cryptographically secure communications, and to the protocols that form the security foundations of the Internet.
https://bit.ly/4jBJjHX
Congratulations to all the awardees shaping the future of computing!
#ACMTechnicalAwards #Cryptography #ParallelComputing #ComputerScience
I achieved a speedup of about 10,000× compared to the CPU implementation. I'm quite happy with that. Note that the chart uses a logarithmic scale; otherwise most of the runs would not be visible.
For example, the CPU took a bit more than 30 minutes to solve the biggest input, and the parallel (CPU) version took a bit more than 8 minutes. Meanwhile, my best handwritten GPU implementation takes less than 200 ms for the same problem, and the version using the optimized cuBLAS library takes just 64 ms.
Edit: uploaded the chart with a white background; the transparent version didn't fare well with my Fediverse clients.
Hi R people! Could you suggest some guides and links on how to set up parallel processing with `foreach` that works on both Windows and Linux/Mac? I'm searching the net, but most of the guides seem to have become obsolete. Thank you!
Intel Developer Tools v2025.1 is here with new OpenMP 6.0 features!
The Intel® Fortran Compiler enhances #OpenMP 6.0 support with two powerful additions: WORKDISTRIBUTE for efficient thread-level work distribution, and INTERCHANGE to reorder loop nests for improved parallelism and optimization.
A big win for HPC and embedded devs!
https://www.intel.com/content/www/us/en/developer/articles/news/oneapi-news-updates.html#2025.1
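WORKDISTRIBUTE and INTERCHANGE are Fortran directives, but the OpenMP 6.0 specification defines the same interchange loop transformation for C/C++ as well. A minimal sketch of that C/C++ spelling, assuming a compiler with 6.0 loop-transformation support:

```cpp
#include <cstdio>

int main() {
    constexpr int n = 1024, m = 1024;
    static double a[n][m];

    // As written, the nest walks a[][] down columns (stride-m accesses).
    // `interchange` swaps the two loops, so the generated nest sweeps
    // rows contiguously, and `parallel for` then parallelizes the new
    // outer (i) loop.
    #pragma omp parallel for
    #pragma omp interchange
    for (int j = 0; j < m; ++j)
        for (int i = 0; i < n; ++i)
            a[i][j] = i + 0.5 * j;

    std::printf("a[3][5] = %.1f\n", a[3][5]);  // 5.5
}
```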
Oh, the eternal quest for a "good" parallel computer – like a quest for a unicorn that can also do your taxes. Apparently, GPUs are only good for "predictable" tasks – maybe like predicting the inevitable death of your dreams of a universally versatile chip. Why not just ask for a toaster that can handle your email while you're at it?
https://raphlinus.github.io/gpu/2025/03/21/good-parallel-computer.html #parallelcomputing #unicornchip #GPU #humor #techdreams #HackerNews #ngated
Another example (Newton-Raphson zooming for the Mandelbrot set):
with 7 threads: 105.26 Watts * 12.2 seconds = 1284.2 Joules
with 1 thread: 54.17 Watts * 52.1 seconds = 2822.3 Joules
sleeping / idle overhead: 26.01 Watts
7 threads minus overhead: (105 - 26.01) W * 12.2 s ≈ 964 Joules
1 thread minus overhead: (54.17 - 26.01) W * 52.1 s ≈ 1467 Joules
if the machine would be on/idle anyway: 105.26 W * 12.2 s + 26.01 W * (52.1 - 12.2) s = 2322.0 Joules
Thus using more threads saves energy even when parallel efficiency is far from perfect: it's best to get in and out as quickly as possible, so you can turn the machine off (ideal case) or leave it fully idle (second best).
Power consumption doesn't scale linearly with load: a little load adds a lot over baseline (1 thread doubles the idle draw), while heavy load adds comparatively little more (7 threads only quadruples it).
Measured with turbostat on Debian, on an AMD 2700X CPU with the default CPU scaling governor and the usual browser/email/etc. running too.
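The same arithmetic as a self-contained snippet (the numbers are the turbostat readings quoted above; the minus-overhead step in the post rounds 105.26 W down to 105 W, hence its 964 J):

```cpp
#include <cstdio>

// E = P * t for the turbostat readings quoted above.
int main() {
    const double idle_w = 26.01;          // sleeping/idle draw
    const double w7 = 105.26, t7 = 12.2;  // 7-thread run
    const double w1 = 54.17, t1 = 52.1;   // 1-thread run

    std::printf("7 threads: %.1f J total, %.0f J above idle\n",
                w7 * t7, (w7 - idle_w) * t7);
    std::printf("1 thread:  %.1f J total, %.0f J above idle\n",
                w1 * t1, (w1 - idle_w) * t1);
    // If the machine stays on anyway, charge idle power for the extra
    // time the 1-thread run would keep it occupied:
    std::printf("7 threads, then idle: %.1f J\n",
                w7 * t7 + idle_w * (t1 - t7));
}
```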
๐๐ป "I went to an #NVIDIA event, so naturally, my entire existence now revolves around repackaging basic sorting algorithms with #CUDA. Because, really, what else is parallel computing for besides impressing friends at parties? ๐ค"
https://ashwanirathee.com/blog/2025/sort2/ #parallelcomputing #sortingalgorithms #techhumor #codinglife #HackerNews #ngated
Sorting Algorithm with CUDA
https://ashwanirathee.com/blog/2025/sort2/
#HackerNews #Sorting #Algorithm #CUDA #ParallelComputing #TechInnovation #DataScience
#ParallelComputing: not what it's cracked up to be
«#Joblib is a set of tools to provide lightweight pipelining in #Python. In particular:
Joblib is optimized to be fast and robust on large data in particular and has specific optimizations for #numpy arrays. It is BSD-licensed.»
Exciting news for HPC!
Michael Klemm sat down with Doug Eadline of HPCwire to explore OpenMP 6.0!
This release introduces features that simplify parallel programming and drive co-processor-agnostic acceleration for unmatched portable performance. Whether scaling or optimizing, OpenMP 6.0 delivers.
Watch now: https://www.hpcwire.com/livewire-interviews/