Sciencemadness Discussion Board
Subject: Speed of C programs, relative to Java
woelen
Super Administrator
Posts: 7977
Registered: 20-8-2005
Location: Netherlands

Mood: interested

[*] posted on 14-12-2023 at 05:27


For solving polynomial equations, the so-called companion matrix is used. This indeed is a sparse matrix, but the methods for determining eigenvalues destroy this sparsity already after the first iterative step. There are optimized eigenvalue methods for tridiagonal matrices, but the companion matrix has a very different structure.
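As an illustration of that structure (a sketch of my own, not code from any solver mentioned here): the companion matrix of a monic polynomial has only 2n-1 nonzero entries, ones on the subdiagonal and the negated coefficients in the last column, and it is exactly this pattern that a QR-type eigenvalue iteration fills in.

    #include <string.h>

    /* Fill A (n x n, row-major) with the companion matrix of the monic
       polynomial x^n + a[n-1]*x^(n-1) + ... + a[0]. Its eigenvalues are
       the roots of the polynomial; only 2n-1 of the n^2 entries are
       nonzero. */
    void companion(double *A, const double *a, int n)
    {
        memset(A, 0, (size_t)n * (size_t)n * sizeof *A);
        for (int i = 1; i < n; i++)
            A[i * n + (i - 1)] = 1.0;    /* ones on the subdiagonal */
        for (int i = 0; i < n; i++)
            A[i * n + (n - 1)] = -a[i];  /* last column: -a[0] ... -a[n-1] */
    }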

And conceptually, if there were a matrix method which preserves the sparsity structure of the companion matrix while it does its iterative computations, then what would be the difference from a method which just works on the polynomial coefficients? Such a matrix method, with O(n) storage instead of O(n^2) storage, would simply be another O(n)-storage polynomial solving method. E.g. AE gives all roots simultaneously, just like most eigenvalue methods for matrices, and this removes the need for polynomial deflation, which has its own numerical stability issues.

@clearly_not_atara: The fact that the computations are very fast and the runtime may be dominated by "non-arithmetic" logic is an interesting observation. Indeed, modern JVMs use inlining, combined with so-called escape analysis, to strongly optimize the use of small objects which only have limited scope. These techniques can dramatically increase the performance of Java applications. Such things unfortunately are not easily achieved with statically compiled C programs.




teodor
National Hazard
Posts: 872
Registered: 28-6-2019
Location: Heerenveen


[*] posted on 24-12-2023 at 07:36


I did some tests and I see that hardware fma instructions (used for calculating a*b - c*d with Kahan's algorithm) are not used unless you specify -march, like this: gcc -O3 -march=skylake ...
Instead, the software implementation of fma is used (calls to library subroutines).
This can definitely slow execution down a lot.
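For concreteness, a minimal sketch of the routine meant here (Kahan's algorithm for a*b - c*d, written with the C99 fma() from <math.h>; the function name is mine). With gcc -O3 -march=skylake the fma() calls compile to single hardware FMA instructions; without a suitable -march (or -mfma), gcc emits calls into libm instead:

    #include <math.h>

    /* Kahan's algorithm for a*b - c*d. fma(-c, d, w) recovers the exact
       rounding error of w = c*d, so the result stays accurate (to about
       1.5 ulp) even when a*b and c*d nearly cancel. */
    double diff_of_products(double a, double b, double c, double d)
    {
        double w = c * d;
        double e = fma(-c, d, w);   /* rounding error of c*d */
        double f = fma(a, b, -w);   /* a*b - w, fused */
        return f + e;
    }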
woelen
Super Administrator
Posts: 7977
Registered: 20-8-2005
Location: Netherlands

Mood: interested

[*] posted on 27-12-2023 at 11:01


It does make some difference.

In my quartic solver code it hardly matters, but that code uses the fma instruction only sparingly. The quintic solver benefits more from it.

Solving 100M quintics on a single core goes from 100 seconds to 85 seconds with the -march=skylake option added.
I already used -ffast-math besides the -O3 option; adding that option had already made a difference of approx. 10% in the C code.




teodor
National Hazard
Posts: 872
Registered: 28-6-2019
Location: Heerenveen


[*] posted on 28-12-2023 at 00:42


So, what is the time difference for Java vs C for different types of equations now?
woelen
Super Administrator
Posts: 7977
Registered: 20-8-2005
Location: Netherlands

Mood: interested

[*] posted on 29-12-2023 at 02:12


For quartics, generating, solving 100M equations, and analysing the results:
- Java: 35 seconds
- C (with -O3 -ffast-math -march=skylake): 40 seconds

For quintics, same number of equations:
- Java: 89 seconds
- C (with -O3 -ffast-math -march=skylake): 85 seconds

As you can see, things have improved a lot for the C code. For quintics, it is even slightly faster.

However, I'll stick to Java as my main language. What I really like about the Java environment is that this good performance is obtained completely automatically, and the same compiled code works on all platforms I have. If a piece of hardware has FMA instructions, then these are used; otherwise a software implementation is used. I do not have to worry about the right compiler options, or about distributing the right version of the software for the right hardware. Most likely, binaries compiled with -march=skylake will not even run on an older CPU which does not have FMA instructions.
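(As an aside, and only as a sketch: on x86, gcc 6 and later can approximate Java's run-time selection with function multiversioning, so a single binary carries both an FMA and a generic version. The function below is the diff-of-products example from earlier in the thread.)

    #include <math.h>

    /* gcc function multiversioning: at load time a resolver picks the
       FMA clone on CPUs that support it and the generic clone elsewhere,
       so the same binary runs on both old and new hardware. */
    __attribute__((target_clones("fma", "default")))
    double diff_of_products(double a, double b, double c, double d)
    {
        double w = c * d;
        double e = fma(-c, d, w);
        double f = fma(a, b, -w);
        return f + e;
    }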




teodor
National Hazard
Posts: 872
Registered: 28-6-2019
Location: Heerenveen


[*] posted on 29-12-2023 at 03:13


Indeed, I am impressed by the performance of the modern JIT. But this achievement was only possible because Java has been used intensively by large businesses for many years. This is not the case for new hardware or custom libraries. Also, many scientific applications which require intensive calculation are written in C++, and using JVM libraries from C++ is not an option. But a C library can be used from virtually any language, so I think the effort to make it fast is absolutely necessary.
I will look at the quartics, thank you for presenting the current status.
It would also be very interesting to compare the performance on the new Apple M2/M3 architecture.
As you know, I am interested in DSP applications. Do you have any example of how your solver could be used in such an application: audio signal filtering, processing, etc.?

Did you compare your solver with what MATLAB has? Maybe we can make an external library for MATLAB in C++ together, if you can decide on the license conditions.


[Edited on 29-12-2023 by teodor]
Rainwater
National Hazard
Posts: 800
Registered: 22-12-2021

Mood: indisposition to activity

[*] posted on 29-12-2023 at 16:17


Are the data sets being processed identical for both platforms?



"You can't do that" - challenge accepted
View user's profile View All Posts By User
woelen
Super Administrator
Posts: 7977
Registered: 20-8-2005
Location: Netherlands

Mood: interested

[*] posted on 31-12-2023 at 11:19


@Rainwater: The data sets are not identical, but comparable. Actually, even between two runs on the same platform the data set is not the same. For each polynomial, a few pseudo-random numbers are generated, uniformly distributed between 0 and 1, and from these the coefficients are derived. The random generator is seeded with the current system time in nanoseconds. Because of the large number of polynomials tested (100M), the statistical properties of the data sets are, on average, so close to each other over different runs, and also between the C and Java versions, that a meaningful performance comparison is possible.
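In sketch form (the rand()-based generator and the mapping from uniform deviates to coefficients are placeholders of mine, not the actual test code):

    #include <stdlib.h>
    #include <time.h>

    /* Sketch of the test setup described above: seed a PRNG with the
       system time in nanoseconds, then derive each polynomial's
       coefficients from uniform [0,1) deviates. */
    static double uniform01(void)
    {
        return rand() / ((double)RAND_MAX + 1.0);
    }

    int main(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_REALTIME, &ts);
        srand((unsigned)ts.tv_nsec);            /* nanosecond seed */

        for (long k = 0; k < 100000000L; k++) { /* 100M polynomials */
            double c[6];                        /* quintic coefficients */
            for (int i = 0; i < 6; i++)
                c[i] = 2.0 * uniform01() - 1.0; /* placeholder mapping */
            /* ... solve the polynomial and analyse its roots ... */
        }
        return 0;
    }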

@teodor: I compared my polynomial solvers with Matlab's roots function. My generic polynomial solver performs better than roots, especially for somewhat larger degrees. This is because roots uses a matrix eigenvalue method for finding the roots. It is fine with me if the C code is used for making a Matlab library. At the moment I only have the Java code on my website, but I'll put the C code on it next week.



