
Kleopatra: Performance problems decrypting and encrypting large Archives
Closed, ResolvedPublic

Description

@werner this is meant as the issue for the buffer sizes etc. which we talked about yesterday.

In short:
We need to improve the performance of Kleopatra when working with large files. On my Windows system, when encrypting for example, GnuPG uses 15% CPU and Kleopatra uses 15% CPU, and the operation is not bottlenecked by disk I/O either.

A starting point for me would be:

GPGME on Windows uses a buffer size of 512 bytes. This is OK when data is only exchanged internally in GPGME, but QGpgME uses callbacks for I/O, and everything is slowed down by a lot of overhead when, for example, a QProcess is passed new data. Every read / write callback call causes a lot of code to be executed, and doing this every 512 bytes seems very wasteful.
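To put the per-callback overhead into perspective, here is a rough sketch that counts how many read / write callbacks a transfer needs at a given buffer size. The helper name `callbacks_needed` and the 64 KiB comparison value are illustrative assumptions and not part of GPGME; only the 512 byte figure comes from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a GPGME function): number of callback
 * invocations needed to move total_bytes through a buffer of buf_size
 * bytes, assuming every callback transfers a full buffer except
 * possibly the last one. */
uint64_t callbacks_needed(uint64_t total_bytes, uint64_t buf_size)
{
    assert(buf_size > 0);   /* a zero-sized buffer makes no sense */

    /* Integer ceiling division without overflow for large totals. */
    return total_bytes / buf_size + (total_bytes % buf_size != 0);
}
```

For a 1 GiB file, `callbacks_needed(1ULL << 30, 512)` gives 2097152 invocations, while a 64 KiB buffer would need 16384. Real callbacks may transfer fewer bytes than the buffer holds, so these numbers are a lower bound.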

Revisions and Commits

rKLEOPATRA Kleopatra
rM GPGME

Related Objects

Event Timeline

werner lowered the priority of this task from High to Normal.May 13 2022, 4:01 PM

We have a workaround by using a recent version of gpgtar directly. Thus lowering priority.

We have discussed this, and we think the best solution would be an extension in the engine-gpg of GpgME, either through a flag or through a new API, to use gpgtar directly with --encrypt and --decrypt. This should behave exactly like the gpg encrypt / decrypt / verify functions but would avoid the need for piping in Kleopatra. Only fairly recently has gpgtar become able to do the crypto operations by itself, which is why this was not done initially.

aheinecke raised the priority of this task from Normal to High.Jan 5 2023, 10:19 AM
aheinecke added a subscriber: ikloecker.

Since T6328 described a high-priority issue which would be fixed by this issue, I am raising the prio here.

ikloecker moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.

I did some tests. I encrypted the g10/src folder which contains multiple repos (33098 files) with a total weight of about 1.4 GiB.

Using Kleopatra 3.1.24.221202 (22.12.2) with GnuPG 2.3.8 and gpgme 1.18 (openSUSE Tumbleweed):
36.5 s
34.8 s
34.1 s
36.2 s (control after 3 runs with my dev build)

Using Kleopatra master with GnuPG 2.4.1-beta21 and gpgme master:
37.2 s
40.2 s
36.2 s
35.2 s (control after 4th run with stock build)

As you can see, the new implementation isn't faster. (Luckily, it also isn't significantly slower.)

Other observations:

  • Size of archives: The new archives are a bit larger than the old ones: 1,050 MB (new) vs. 1,023 MB (old). (For tiny test files I observed the opposite.)
  • Progress bar: The old implementation shows a static progress bar (as if the UI was blocked). The new implementation shows a progress bar which alternates between actual progress (n %) and the static progress bar. The progress also sometimes jumps back. I guess the latter is caused by gpgtar discovering new files while processing the directory tree. I have to debug why sometimes the static progress bar is shown.
  • gpgtar --list emits lots of warnings for the archives produced by the old implementation:
gpgtar: [?]: warning: name not terminated by a nul
gpgtar: [?]: header block 997801 is corrupt (size=60 type=9 nrec=1)

There are no warnings for the archives produced by the new implementation.

Using gpgtar --decrypt on an old archive I get

gpgtar: unsupported file type 2 for 'src/libassuan/src/.libs/libassuan.la' - skipped
gpgtar: unsupported file type 2 for 'src/libassuan/src/.libs/libassuan.so' - skipped
gpgtar: unsupported file type 2 for 'src/libassuan/src/.libs/libassuan.so.0' - skipped
[...]
gpgtar: unsupported file type 1 for 'src/gpg4win-appimage-test/.git/objects/pack/pack-480a1e9a96bc93fd3c9c57b7935d843bf7ba83c6.idx' - skipped
[...]
gpgtar: unsupported file type 1 for 'src/gpg4win-appimage-test/.git/objects/00/105321c98a9b556b7fc0fdb127b095ca0f8a16' - skipped

Using gpgtar --decrypt on a new archive I get

gpgtar: unsupported file type 2 for 'src/libassuan/src/.libs/libassuan.la' - skipped
gpgtar: unsupported file type 2 for 'src/libassuan/src/.libs/libassuan.so' - skipped
gpgtar: unsupported file type 2 for 'src/libassuan/src/.libs/libassuan.so.0' - skipped
[...]

i.e. I get the same errors about "unsupported file type 2", but not the errors about "unsupported file type 1".

Diffing the original src tree and the extracted src trees shows that the files for which an error was reported are missing.

"file type 2" may refer to symbolic links.

For testing the old version, did you use GNU Tar with Kleopatra or did you change the configuration to use gpgtar?

I have not tested with gpgtar on Linux a lot as Kleopatra was not using it. But T4332 might be related to the corrupted archives. My analysis back then was that we lost data blocks somewhere between QProcess / QIODevice / QGpgME's Dataprovider, and I could not think of a better solution than to do the process handling manually in Kleopatra with the windowsprocessdevice, which is why this code exists. Since then we have had no more reports of data loss on Windows. We only had this recent issue with hanging / unflushed output.

These are USTAR types:

case TF_REGULAR:   raw->typeflag[0] = '0'; break;
case TF_HARDLINK:  raw->typeflag[0] = '1'; break;
case TF_SYMLINK:   raw->typeflag[0] = '2'; break;
case TF_CHARDEV:   raw->typeflag[0] = '3'; break;
case TF_BLOCKDEV:  raw->typeflag[0] = '4'; break;
case TF_DIRECTORY: raw->typeflag[0] = '5'; break;
case TF_FIFO:      raw->typeflag[0] = '6'; break;

On Unix we store symlinks, but for other non-regular files we only write the header block. Creating symlinks, hardlinks or FIFOs does not seem to be a good idea for data received from uncertain origins.
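To read the "unsupported file type N" messages more easily, the mapping above can be turned around. The following is only a sketch: `ustar_type_name` and `is_safe_to_extract` are hypothetical names, and the policy of extracting nothing but regular files and directories is an assumption derived from the "skipped" messages in this thread, not a copy of gpgtar's actual code.

```c
#include <stdbool.h>

/* Map a USTAR typeflag character to a readable name.
 * Returns "unknown" for anything outside the list above. */
const char *ustar_type_name(char typeflag)
{
    switch (typeflag) {
    case '\0':              /* old-style tar writes NUL for regular files */
    case '0': return "regular file";
    case '1': return "hard link";
    case '2': return "symbolic link";
    case '3': return "character device";
    case '4': return "block device";
    case '5': return "directory";
    case '6': return "FIFO";
    default:  return "unknown";
    }
}

/* ASSUMPTION: a conservative extraction policy for archives of uncertain
 * origin that only accepts regular files and directories. */
bool is_safe_to_extract(char typeflag)
{
    return typeflag == '\0' || typeflag == '0' || typeflag == '5';
}
```

With this mapping, "file type 2" in the log corresponds to `ustar_type_name('2')`, a symbolic link, and "file type 1" to a hard link; both are rejected by the assumed policy.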

Running gpgtar directly only gives slightly better results. The following

GNUPGHOME=~/xxxx gpgtar --batch --status-fd 2 --gpg-args --enable-progress-filter --encrypt --gpg-args --always-trust -r D5E17E5ABC11F4CD060E02D41DD0D4BAF77BE140 -r C02C4012C09B2AE33921CF87577E88AC284DC575 --output - --directory /xxxx src >src-gpgtar.tar.gpg 2>src-gpgtar.log

took about 31.1 seconds.

Okay. So the problems with "file type 1" seem to come from git using hardlinks and tar storing them as hardlinks, but gpgtar ignores them on --decrypt. This would also explain the larger size of the archives if gpgtar stores the hardlinked files multiple times in the archive. Take home message: Don't gpgtar your git repo!
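One way to check beforehand whether a tree contains such files is to look at the link count reported by lstat(). This is a minimal sketch for POSIX systems only; the helper name `has_extra_hardlinks` is made up for illustration and is not part of gpgtar or Kleopatra.

```c
#define _POSIX_C_SOURCE 200809L   /* for lstat() under strict -std=c11 */

#include <stdbool.h>
#include <sys/stat.h>

/* Hypothetical helper: return true if path is a regular file that has
 * more than one hard link. Returns false on error and for every other
 * file type. lstat() is used so that symbolic links are not followed. */
bool has_extra_hardlinks(const char *path)
{
    struct stat st;

    if (lstat(path, &st) != 0)
        return false;               /* cannot stat: treat as "no" */

    return S_ISREG(st.st_mode) && st.st_nlink > 1;
}
```

Calling this for every file while walking the source directory would show which entries an archiver may store as "file type 1".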

I wrote T6412: Kleopatra: Inform user if some files were not extracted from encrypted archive to inform the user about not extracted files. I think this shouldn't block this issue because special files probably don't occur in normal usage of GnuPG VSD.

ikloecker changed the task status from Open to Testing.Mar 16 2023, 10:36 AM
ikloecker removed ikloecker as the assignee of this task.
ikloecker moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.

ready for testing

Oh my, so I have not really tested this for nearly three months and had my head in the sand when reviewing it. I really need to apologize for that. The performance improvement is _not_ what I hoped for, if it is there at all.

My test folder ldata-test is a real world game. Its size is 13GB with some large and some small files: ~500 files in 80 folders. The disk is dedicated to this test. Microsoft Defender is active.
I use compress-algo none.
No procmon is active.

Kleopatra settings:

Note that the temporary file should be created in the output location, so that we avoid any messy moving from disk to disk.

For Kleo the most helpful timing is the debug log IMO. I have reset the debugview right before clicking on "encrypt".

Notice that the disk I/O never went above 30% for most of the time; in the end there were some spikes above 40%. CPU load is higher across the board. Neutral timing is obtained with dbgview, which I reset right before clicking encrypt, so 0:00 coincides with the start of the operation:

Same disks, same files, same everything. One took 42 seconds, the other 248 seconds. For me this means we have to go back to the drawing board.

ikloecker changed the task status from Testing to Open.Jun 12 2023, 6:38 PM
ikloecker claimed this task.
ikloecker moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.
ikloecker changed the task status from Open to Testing.Jun 21 2023, 4:59 PM
ikloecker moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.

ready for testing

I ran the test (AES.OCB, encrypt only, no compression) with the same GnuPG 2.4 version on Linux.

Encrypting a 16GB folder now takes 17 seconds, while it took 29 seconds with Kleopatra (23.04.2). See also: https://dev.gnupg.org/T6561

ebo moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Jul 20 2023, 10:32 AM