Calculating the Fibonacci numbers on GPU
https://veitner.bearblog.dev/calculating-the-fibonacci-numbers-on-gpu/
#HackerNews #Calculating #Fibonacci #GPU #Performance #ParallelComputing #TechInnovation #Algorithms
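The linked article isn't excerpted above; for context, the standard trick for making Fibonacci amenable to parallel hardware is the matrix identity Q^n = [[F(n+1), F(n)], [F(n), F(n-1)]] with Q = [[1, 1], [1, 0]]. Since matrix multiplication is associative, the n-fold product can be evaluated as a reduction or scan, which is exactly the shape of computation GPUs handle well. A minimal CPU-side sketch of the identity (not code from the article):

```cpp
#include <array>
#include <cstdint>
#include <iostream>

// 2x2 matrix of unsigned 64-bit integers, row-major: {m00, m01, m10, m11}.
using Mat = std::array<std::uint64_t, 4>;

// Multiply two 2x2 matrices (wraps around past F(93); fine for a demo).
Mat mul(const Mat& a, const Mat& b) {
    return {a[0] * b[0] + a[1] * b[2], a[0] * b[1] + a[1] * b[3],
            a[2] * b[0] + a[3] * b[2], a[2] * b[1] + a[3] * b[3]};
}

// F(n) via fast exponentiation of Q = {1, 1, 1, 0}: O(log n) multiplies.
// Because matrix multiplication is associative, the same product can also
// be evaluated as a parallel reduction or scan across many threads.
std::uint64_t fib(unsigned n) {
    Mat result = {1, 0, 0, 1};  // identity matrix
    Mat q = {1, 1, 1, 0};
    for (; n > 0; n >>= 1) {
        if (n & 1) result = mul(result, q);
        q = mul(q, q);
    }
    return result[1];  // top-right entry is F(n)
}

int main() {
    for (unsigned n : {10u, 50u, 90u})
        std::cout << "F(" << n << ") = " << fib(n) << '\n';
}
```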
Link: https://mediatum.ub.tum.de/?id=601795 (It took digging to find this from the Wikipedia article [1] and the unsecured HTTP homepage for "BMDFM".)
```bibtex
@phdthesis{dissertation,
author = {Pochayevets, Oleksandr},
title = {BMDFM: A Hybrid Dataflow Runtime Parallelization Environment for Shared Memory Multiprocessors},
year = {2006},
school = {Technische Universität München},
pages = {170},
language = {en},
abstract = {To complement existing compiler-optimization methods we propose a programming model and a runtime system called BMDFM (Binary Modular DataFlow Machine), a novel hybrid parallel environment for SMP (Shared Memory Symmetric Multiprocessors), that creates a data-dependence graph and exploits parallelism of user application programs at run time. This thesis describes the design and provides a detailed analysis of BMDFM, which uses a dataflow runtime engine instead of a plain fork-join runtime library, thus providing transparent dataflow semantics on the top virtual machine level. Our hybrid approach eliminates disadvantages of the parallelization at compile-time, the directive based paradigm and the dataflow computational model. BMDFM is portable and is already implemented on a set of available SMP platforms. The transparent dataflow paradigm does not require parallelization and synchronization directives. The BMDFM runtime system shields the end-users from these details.},
keywords = {Parallel computing;Shared memory multiprocessors;Dataflow;Automatic Parallelization},
note = {},
url = {https://mediatum.ub.tum.de/601795},
}
```
[1]: https://en.wikipedia.org/wiki/Binary_Modular_Dataflow_Machine
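As a toy illustration of the run-time dataflow idea from the abstract (this is not BMDFM's API, just a sketch with standard C++ futures): each node of a small dependence graph runs as soon as its inputs are ready, with no explicit synchronization directives in the user code.

```cpp
#include <future>
#include <iostream>

// Toy dataflow evaluation (NOT BMDFM's API): each node of the dependence
// graph runs as soon as its inputs are available, so the independent
// nodes a and b may execute in parallel, and c fires once both resolve.
// The graph itself encodes the ordering.
int main() {
    auto a = std::async(std::launch::async, [] { return 2 + 3; });
    auto b = std::async(std::launch::async, [] { return 4 * 5; });
    auto c = std::async(std::launch::async,
                        [&] { return a.get() + b.get(); });  // joins a and b
    std::cout << "c = " << c.get() << '\n';  // prints c = 25
}
```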
#SMP #Parallelization #Multithreading #DependenceGraph #RunTime #DataFlow #VirtualMachine #VM #ParallelComputing #SharedMemoryMultiprocessors #AutomaticParallelization #CrossPlatform #Virtualization #Configware #Transputer
Full house at the OpenMP BOF at #ISC25: over 140 attendees joined us in Hamburg!
Our session "What to Expect from OpenMP API Version 6.0" covered:
✅ A dive into key features of OpenMP 6.0
✅ A preview of 6.1 and 7.0
✅ Updates from toolchain developers
✅ Lively Q&A to help shape future OpenMP directions
Thanks to everyone who contributed – your feedback is powering the future of parallel programming!
#OpenMP #HPC #ISC2025 #OpenMP6 #ParallelComputing #Supercomputing
We're excited to welcome NextSilicon to the OpenMP Architecture Review Board!
Their Intelligent Compute Architecture blends adaptive computing with self-optimizing hardware/software and open frameworks like OpenMP. Together, we're shaping a future of performant, portable, shared-memory parallelism.
Read the press release:
https://tinyurl.com/yksfbrah
Join us at #ISC25 for the tutorial "Advanced OpenMP: Performance and 6.0 Features" on Friday, June 13, 9:00–13:00 CEST in Hall Y12, 2nd Floor, Hamburg Congress Center.
Learn how to boost OpenMP code performance on NUMA systems and accelerators, and get hands-on insights into vectorization, data locality, and the latest features in OpenMP 6.0.
Ideal for developers who want to go beyond the basics!
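A minimal sketch of two of the tutorial's themes (first-touch initialization for NUMA data locality, and combined threading plus SIMD vectorization); the kernel and sizes are illustrative, not taken from the tutorial materials:

```cpp
#include <omp.h>
#include <cstdio>

int main() {
    const long n = 1L << 24;
    // new[] without an initializer leaves the pages untouched, so the
    // parallel loop below decides their physical (NUMA) placement.
    double* a = new double[n];
    double* b = new double[n];

    // First-touch initialization: with pinned threads (e.g.
    // OMP_PROC_BIND=close OMP_PLACES=cores) each thread's chunk of
    // pages is allocated on its own NUMA node.
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        a[i] = 0.0;
        b[i] = 2.0 * i;
    }

    // Same static iteration split, so each thread computes on the memory
    // it first touched; `simd` additionally requests vectorization.
    #pragma omp parallel for simd
    for (long i = 0; i < n; ++i)
        a[i] += 3.0 * b[i];

    std::printf("a[1] = %.1f using up to %d threads\n",
                a[1], omp_get_max_threads());
    delete[] a;
    delete[] b;
}
```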
Just published the post "Parallel and distributed computing in GNU Health." :gnu:
https://meanmicio.org/2025/05/27/parallel-and-distributed-computing-in-gnu-health/
#ParallelComputing #GNUHealth #Tryton #OpenScience #GNU
Let's look at the Fortran features introduced in 2008, with coarrays, and in 2018, to see how we can exploit all the cores of our CPUs and cut the runtime of complex scientific computations. #fortran #parallelcomputing #multithreading
https://www.youtube.com/watch?v=78_12a89MWQ
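Coarray syntax is Fortran-specific; purely as a rough analogue of the same SPMD pattern, here is a C++ threads sketch in which each "image" fills its own slot of a shared array and one image combines the results after the join (roughly what Fortran 2008 expresses with `partial[me]` coindexing and `sync all`):

```cpp
#include <cstdio>
#include <numeric>
#include <thread>
#include <vector>

// Each "image" computes a partial sum into its own slot of a shared
// array; after the join (Fortran: `sync all`), one image combines the
// slots. In Fortran the slot would be a coarray element, partial[me].
int main() {
    const int images = 4;
    const int n = 1'000'000;
    std::vector<double> partial(images, 0.0);
    std::vector<std::thread> team;
    for (int me = 0; me < images; ++me)
        team.emplace_back([&partial, me, n, images] {
            double local = 0.0;
            for (int i = me; i < n; i += images)  // round-robin work split
                local += 1.0 / (1.0 + i);
            partial[me] = local;  // each image writes only its own slot
        });
    for (auto& t : team) t.join();
    std::printf("sum = %.6f\n",
                std::accumulate(partial.begin(), partial.end(), 0.0));
}
```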
Excited to announce that https://hachyderm.io/@mppf and Shreyas from the #ChapelLang project will be at #HPSFCon in Chicago this week! Come find us, talk all things parallel computing, open source, and the future of high-performance software.
Don't miss our presentation on Day 1 – check the full schedule at https://events.linuxfoundation.org/hpsf-conference/program/schedule/
Big thanks to The Linux Foundation and HPSF for hosting!
#ChapelLang #HPC #OpenSource #HPSFCon #LinuxFoundation #ParallelComputing
Hugo Krawczyk – ACM Paris Kanellakis Theory and Practice Award
For pioneering and lasting contributions to the theoretical foundations of cryptographically secure communications, and to the protocols that form the security foundations of the Internet.
https://bit.ly/4jBJjHX
Congratulations to all the awardees shaping the future of computing!
#ACMTechnicalAwards #Cryptography #ParallelComputing #ComputerScience
I achieved a speedup of about 10,000× compared to the CPU implementation. I'm quite happy with that. Note that the chart uses a logarithmic scale; otherwise most of the runs would not be visible.
For example, the CPU took a bit more than 30 minutes to solve the biggest input, and the parallel (CPU) version took a bit more than 8 minutes. Meanwhile, my best handwritten GPU implementation takes less than 200 ms for the same problem, and the version using the optimized cuBLAS library takes just 64 ms.
Edit: uploaded the chart with a white background; the transparent version didn't fare well with my Fediverse clients.
Hi R people! Could you suggest some guides and links on how to set up parallel processing with `foreach` that works on both Windows and Linux/Mac? I'm searching the net, but most of the guides seem to have become obsolete. Thank you!
Intel Developer Tools v2025.1 is here with new OpenMP 6.0 features!
The Intel® Fortran Compiler enhances #OpenMP 6.0 support with two powerful additions: WORKDISTRIBUTE for efficient thread-level work distribution, and INTERCHANGE to reorder loop nests for improved parallelism and optimization.
A big win for HPC and embedded devs!
https://www.intel.com/content/www/us/en/developer/articles/news/oneapi-news-updates.html#2025.1
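WORKDISTRIBUTE and INTERCHANGE are Fortran directives, but the OpenMP 6.0 specification defines the same interchange loop transformation for C/C++ as well. A minimal sketch of that C/C++ spelling, assuming a compiler with 6.0 loop-transformation support:

```cpp
#include <cstdio>

int main() {
    constexpr int n = 1024, m = 1024;
    static double a[n][m];

    // As written, the nest walks a[][] down columns (stride-m accesses).
    // `interchange` swaps the two loops, so the generated nest sweeps
    // rows contiguously, and `parallel for` then parallelizes the new
    // outer (i) loop.
    #pragma omp parallel for
    #pragma omp interchange
    for (int j = 0; j < m; ++j)
        for (int i = 0; i < n; ++i)
            a[i][j] = i + 0.5 * j;

    std::printf("a[3][5] = %.1f\n", a[3][5]);  // 5.5
}
```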
Oh, the eternal quest for a "good" parallel computer – like a quest for a unicorn that can also do your taxes. Apparently, GPUs are only good for "predictable" tasks – maybe like predicting the inevitable death of your dreams of a universally versatile chip. Why not just ask for a toaster that can handle your email while you're at it?
https://raphlinus.github.io/gpu/2025/03/21/good-parallel-computer.html #parallelcomputing #unicornchip #GPU #humor #techdreams #HackerNews #ngated
Another example (Newton-Raphson zooming for the Mandelbrot set):
with 7 threads: 105.26 Watts * 12.2 seconds = 1284.2 Joules
with 1 thread: 54.17 Watts * 52.1 seconds = 2822.3 Joules
sleeping / idle overhead: 26.01 Watts
7 threads minus overhead: (105 - 26.01) W * 12.2 s ≈ 964 Joules
1 thread minus overhead: (54.17 - 26.01) W * 52.1 s ≈ 1467 Joules
if the machine would be on/idle anyway: 105.26 W * 12.2 s + 26.01 W * (52.1 - 12.2) s = 2322.0 Joules
Thus using more threads saves energy even when parallel efficiency is far from perfect: it's best to get in and out as quickly as possible, so you can turn the machine off (ideal case) or leave it fully idle (second best).
Power consumption doesn't scale linearly with load: a little load adds a lot over baseline (1 thread doubles the idle draw), while heavy load adds comparatively little more (7 threads only quadruples it).
Measured with turbostat on Debian, on an AMD 2700X CPU with the default CPU scaling governor and the usual browser/email/etc. running too.
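The same arithmetic as a self-contained snippet (the numbers are the turbostat readings quoted above; the minus-overhead step in the post rounds 105.26 W down to 105 W, hence its 964 J):

```cpp
#include <cstdio>

// E = P * t for the turbostat readings quoted above.
int main() {
    const double idle_w = 26.01;          // sleeping/idle draw
    const double w7 = 105.26, t7 = 12.2;  // 7-thread run
    const double w1 = 54.17, t1 = 52.1;   // 1-thread run

    std::printf("7 threads: %.1f J total, %.0f J above idle\n",
                w7 * t7, (w7 - idle_w) * t7);
    std::printf("1 thread:  %.1f J total, %.0f J above idle\n",
                w1 * t1, (w1 - idle_w) * t1);
    // If the machine stays on anyway, charge idle power for the extra
    // time the 1-thread run would keep it occupied:
    std::printf("7 threads, then idle: %.1f J\n",
                w7 * t7 + idle_w * (t1 - t7));
}
```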
๐๐ป "I went to an #NVIDIA event, so naturally, my entire existence now revolves around repackaging basic sorting algorithms with #CUDA. Because, really, what else is parallel computing for besides impressing friends at parties? ๐ค"
https://ashwanirathee.com/blog/2025/sort2/ #parallelcomputing #sortingalgorithms #techhumor #codinglife #HackerNews #ngated
Sorting Algorithm with CUDA
https://ashwanirathee.com/blog/2025/sort2/
#HackerNews #Sorting #Algorithm #CUDA #ParallelComputing #TechInnovation #DataScience
#ParallelComputing: not what it's cracked up to be
«#Joblib is a set of tools to provide lightweight pipelining in #Python. In particular:
Joblib is optimized to be fast and robust on large data in particular and has specific optimizations for #numpy arrays. It is BSD-licensed.»
Exciting news for HPC!
Michael Klemm sat down with Doug Eadline of HPCwire to explore OpenMP 6.0!
This release introduces features that simplify parallel programming and drive co-processor-agnostic acceleration for unmatched portable performance. Whether scaling or optimizing, OpenMP 6.0 delivers.
Watch now: https://www.hpcwire.com/livewire-interviews/