Jun 7 2017
May 24 2017
May 23 2017
May 17 2017
May 4 2017
Apr 10 2017
This is fixed in bf8b5e9042b3d86d419b2ac1987a9298c9d21500.
Apr 7 2017
I confirmed that it's 64-bit big-endian.
I wrote a patch for testing. D421: padding is needed for 64-bit big endian
If s390x is big-endian, we need padding at the start of the cell structure so that the _flag stays compatible with the vector element.
I'll see on the porterbox myself, too.
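To make the padding argument concrete, here is a minimal sketch. It assumes a 32-bit flag field overlaid on a 64-bit vector element whose low bit carries a tag; that layout is my reading of the comment above, not a confirmed description of the gpgscm cell structure. The names `low_half_offset`, `read_flag`, and `TAGGED` are made up for illustration.

```python
import struct

def low_half_offset(byteorder):
    """Byte offset of the low 32 bits inside a 64-bit word stored in memory."""
    word = struct.pack(">Q" if byteorder == "big" else "<Q", 0xDEADBEEF)
    half = struct.pack(">I" if byteorder == "big" else "<I", 0xDEADBEEF)
    return word.index(half)

def read_flag(word, byteorder, pad):
    """Read a 32-bit field that starts `pad` bytes into the 64-bit word."""
    raw = word.to_bytes(8, byteorder)
    return int.from_bytes(raw[pad:pad + 4], byteorder)

# Made-up 64-bit value with its low (tag) bit set; not taken from the trace.
TAGGED = 0x000002AA04629839

# Little-endian: a field at offset 0 overlaps the low half, so the tag is seen.
little_unpadded = read_flag(TAGGED, "little", 0) & 1
# Big-endian: a field at offset 0 overlaps the high half, so the tag is lost.
big_unpadded = read_flag(TAGGED, "big", 0) & 1
# Big-endian with 4 bytes of padding: the field lines up with the low half again.
big_padded = read_flag(TAGGED, "big", 4) & 1
```

Under these assumptions only the big-endian, unpadded case fails to see the tag bit, which would match a bug that shows up on s390x but not on little-endian 64-bit machines.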
Apr 6 2017
I just merged the current git head over on zelenka, which includes b83903f59ec5d49ac579f263da70ebc8dc3645b5, and managed to still get the same segfaults.
@gniibe good catch! I'll fix that and we'll see if that improves things.
IIUC, cells are used as storage for vector elements.
I'm worried about what happens to the memory space not used for vector elements.
fwiw, this remains a problem on 2.1.20: https://buildd.debian.org/status/fetch.php?pkg=gnupg2&arch=s390x&ver=2.1.20-1&stamp=1491409561&raw=0
Apr 3 2017
Sure:
Fix is in 2.1.20
Mar 31 2017
Mar 30 2017
Mar 28 2017
Yes, print *a was correct. Could you please do
print *sc->load_stack[sc->file_i]->curr_line
there?
I've now pulled from the current master head
(caf00915532e6e8e509738962964edcd14fb0654), rebuilt on zelenka with -O0 -g, and
triggered the error again, causing a core file to be dumped.
I copied gpgscm-gdb.py into tests/gpgscm/ , added it to add-auto-load-safe-path
in ~/.gdbinit, and then ran "gdb -c tests/gpgscm/core tests/gpgscm/gpgscm" and
tried to print a, as requested. here's what i got:
0 (sid_s390x-dchroot)dkg@zelenka:~/src/gnupg2/gnupg2/build$ echo add-auto-load-safe-path /home/dkg/src/gnupg2/gnupg2/build/tests/gpgscm/gpgscm-gdb.py > /home/dkg/.gdbinit
0 (sid_s390x-dchroot)dkg@zelenka:~/src/gnupg2/gnupg2/build$ gdb -c tests/gpgscm/core ./tests/gpgscm/gpgscm
GNU gdb (Debian 7.12-6) 7.12.0.20161007-git
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later < GPL license >
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "s390x-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
< GDB Bugs >.
Find the GDB manual and other documentation resources online at:
< GDB Documentation >.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./tests/gpgscm/gpgscm...done.
[New LWP 7145]
Core was generated by `./gpgscm ../../../tests/gpgscm/t-child.scm'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000002aae4ecf748 in is_vector (p=0x4634508) at
../../../tests/gpgscm/scheme.c:220
220 INTERFACE INLINE int is_vector(pointer p) { return (type(p)==T_VECTOR); }
(gdb) bt
#0 0x000002aae4ecf748 in is_vector (p=0x4634508) at
../../../tests/gpgscm/scheme.c:220
#1 0x000002aae4ed3470 in vector_elem (vec=0x4634508, ielem=7) at
../../../tests/gpgscm/scheme.c:1349
#2 0x000002aae4ed975e in tailstack_flatten (sc=0x2ab046296f0,
tailstack=0x4634508, i=8, n=7, acc=0x2ab04629838) at
../../../tests/gpgscm/scheme.c:3117
#3 0x000002aae4ed99d4 in callstack_flatten (sc=0x2ab046296f0, i=8, n=7,
acc=0x2ab04629838) at ../../../tests/gpgscm/scheme.c:3155
#4 0x000002aae4ed9af0 in history_flatten (sc=0x2ab046296f0) at
../../../tests/gpgscm/scheme.c:3173
#5 0x000002aae4ed8488 in _Error_1 (sc=0x2ab046296f0, s=0x2aae4efe634 "eval:
unbound variable:", a=0x2ab0462bdd8) at ../../../tests/gpgscm/scheme.c:2777
#6 0x000002aae4eda162 in opexe_0 (sc=0x2ab046296f0, op=OP_EVAL) at
../../../tests/gpgscm/scheme.c:3298
#7 0x000002aae4ee3ef0 in Eval_Cycle (sc=0x2ab046296f0, op=OP_T0LVL) at
../../../tests/gpgscm/scheme.c:5358
#8 0x000002aae4ee5384 in scheme_load_named_file (sc=0x2ab046296f0,
fin=0x2ab04684f90, filename=0x2ab04684d80 "../../../tests/gpgscm/init.scm") at
../../../tests/gpgscm/scheme.c:5748
#9 0x000002aae4ec1ec6 in load (sc=0x2ab046296f0, file_name=0x2aae4efc7d4
"init.scm", lookup_in_cwd=0, lookup_in_path=1) at ../../../tests/gpgscm/main.c:180
#10 0x000002aae4ec22cc in main (argc=0, argv=0x3ffffe44e48) at
../../../tests/gpgscm/main.c:266
(gdb) up 5
#5 0x000002aae4ed8488 in _Error_1 (sc=0x2ab046296f0, s=0x2aae4efe634 "eval:
unbound variable:", a=0x2ab0462bdd8) at ../../../tests/gpgscm/scheme.c:2777
2777 history = history_flatten(sc);
(gdb) print a
$1 = (pointer) 0x2ab0462bdd8
(gdb) print *a
$2 = define-macro
(gdb)
maybe i'm doing something wrong? i'll ask and see whether i can give out an
account on the porterbox for you, justus.
Mar 27 2017
I have looked into this. I installed Debian on an s390 emulator (hercules), but
have been unable to reproduce the problem there, maybe due to the emulation (it
is quite slow on my system, and the gpgscm interpreter seems especially slow,
maybe because of the challenge of doing branch prediction on interpreters).
Your stack trace suggests a memory corruption early during the initialization
("init.scm", the standard library, is being loaded). We see an error being
generated due to an unbound variable (i.e. the environment hash table is
corrupted / does not perform as expected). Then we see a segfault while the
history buffer is flattened into a list for the error message (which again
hints at a corruption).
Unfortunately, memory corruption bugs are very hard to detect in gpgscm due to
its use of a custom memory allocator. The allocator allocates large segments
using malloc and hands out cells from that pool as necessary. However, memory
is never freed, so tools like valgrind can not be used to detect use-after-free,
or even most out-of-bounds accesses.
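A toy model of why such an allocator hides bugs from external tools. This is only a sketch: `CellPool` and its methods are hypothetical and much simpler than the gpgscm allocator, and `release` stands in for whatever recycles cells internally.

```python
class CellPool:
    """Toy pool: grabs whole segments, hands out cells, never returns a segment."""

    def __init__(self, segment_size=4):
        self.segment_size = segment_size
        self.segments = []   # each entry stands in for one malloc() call
        self.free_list = []  # (segment index, cell index) pairs ready for reuse

    def _grow(self):
        self.segments.append([None] * self.segment_size)
        seg = len(self.segments) - 1
        self.free_list.extend((seg, i) for i in range(self.segment_size))

    def alloc(self, value):
        if not self.free_list:
            self._grow()
        seg, i = self.free_list.pop()
        self.segments[seg][i] = value
        return (seg, i)

    def release(self, ref):
        # The cell goes back to the pool; the segment itself stays allocated.
        self.free_list.append(ref)

    def read(self, ref):
        seg, i = ref
        return self.segments[seg][i]


pool = CellPool()
a = pool.alloc("a")
pool.release(a)
b = pool.alloc("b")
stale = pool.read(a)  # use-after-release: silently sees the new occupant
```

From the system allocator's point of view the segment is live the whole time, so the stale read through `a` looks like an ordinary access to valid memory.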
I have been working on the low-level allocator last week trying to make it more
debuggable and memory errors more detectable, e.g. by moving parts of the
interpreter into readonly sections.
As Werner said, a stack trace with fewer optimizations would be helpful. Also,
is the problem always the same if it happens? If so, it would be interesting to
know what kind of variable is unbound (for that, inspect the 'a' parameter of
'_Error_1' [I'm attaching a pretty-printer for gdb, with that, do 'print a']).
Access to the porter box would be helpful as well.
Mar 25 2017
Can you rebuild using -O0 -g and try to get a backtrace again? That might be
helpful.
Mar 22 2017
Roundup won't let me include the details, but i will say that from a git bisect,
i discovered that the first commit that has this behavior is
49e2ae65e892f93be7f87cfaae3392b50a99e4b1 ("gpgscm: Use a compact vector
representation.")
The crashes that happen are segfaults.
Mar 20 2017
Mar 17 2017
This should be fixed in b1106b4 . The problem had to do with an incorrect
assumption that a key with policy 'ask' necessarily had at least one conflict.
This assumption may not hold if --tofu-default-policy is set to ask.
Thankfully, the assertion caught this.
Mar 16 2017
I was able to reproduce it again. Maybe this bug depends on which keyserver in
the pool answers. The error is the same for Tor and non-Tor connections.
Thanks for reporting this. I can reproduce it and will hopefully have a good
fix soon.
I don't know why, but it is not reproducible anymore.
Mar 15 2017
Neal, this is still not fixed in 2.1.19.
Mar 14 2017
This seems to be a bug in our new resolver library. I have contacted the author
for assistance.
Mar 13 2017
This is a duplicate of #2990.
Hey :-)
Glad to see I'm not the only one ;-)
Indeed, I can reproduce this.
PS: Hi flokli :)
Mar 10 2017
Mar 1 2017
The --hosttable option is a debugging aid and only used manually.
Feb 22 2017
Should be fixed with commit 6d50eeb for 2.1.19.
My idea on how to do a general fix turned out to be too complicated and thus I
fixed just the Polish translation.
Feb 21 2017
Are you using tor? if so, is your tor daemon up and running, and actively
connecting to the outside world?
Feb 19 2017
Feb 17 2017
I guess that is because the prompt has not been translated but the answer string
is translated.
msgid "NnCcEeOoQq"
msgstr "IiKkEeDdWw"
Thus using 'i' should give you the prompt for name.
A fix for this would be to use a different answer string for --gen-key; the one
we use is from --full-gen-key (i.e. with "(C)omment"). This would then also work
for other incomplete translations, which will have the same problem.
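A small sketch of the mismatch, assuming the answer string is a sequence of upper/lower-case pairs, one pair per menu action. The action names in `ACTIONS` are my reading of the letters N, C, E, O, Q, and `action_for_key` is a hypothetical helper, not the gpg code.

```python
# Assumed meaning of the five letter pairs in the answer string.
ACTIONS = ("name", "comment", "email", "okay", "quit")

def action_for_key(key, answers):
    """Map a single typed key to an action using the (translated) answer string."""
    if len(key) != 1:
        return None
    idx = answers.find(key)
    if idx < 0:
        return None
    return ACTIONS[idx // 2]  # two characters (upper and lower case) per action

ENGLISH = "NnCcEeOoQq"  # msgid
POLISH = "IiKkEeDdWw"   # msgstr

# The untranslated prompt tells the user to press 'n', but the translated
# answer string only knows 'i' for that action.
typed_from_prompt = action_for_key("n", POLISH)
typed_from_translation = action_for_key("i", POLISH)
```

With the Polish answer string, 'n' matches nothing while 'i' selects the name prompt, which is the behaviour described above.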
Thanks for these fixes! I'm not sure i understand why ptr lookups are needed
for keyserver --hosttable. Can we drop those too?
Feb 15 2017
I have fixed some things. In general PTR lookups are now only used when you
run the 'keyserver --hosttable' command.
Feb 14 2017
I note that even if i drop the "--trust-model tofu+pgp" and subsequently invoke
just "gpg --tofu-default-policy ask --fingerprint" i get the same crash.
however, if i just execute that in a fresh homedir without ever having set
"--trust-model tofu+pgp" i don't get a crash. so there is some sort of state
being set up that is then tickling the assertion later.
Feb 13 2017
The whole point of a daemon is that it idles in the background waiting for work.
A more useful feature would be to flush the passphrase cache when the user is
no longer logged in. But for Debian this has already been done by --supervised.
Oh well, using a curl-based keyserver helper this might have worked in the
past. We had better implement that for 2.2.
There has never been support in GnuPG for https via an http proxy.
So can we change this to a feature request?
I have seen that discussion and will take care of this bug soon.
Feb 8 2017
The unnecessary PTR lookup is causing problems for other people too, over on
https://bugs.debian.org/854359
I agree about that race condition being an important thing to consider, but i
think it's orthogonal to whether the process is self-terminating.
That is: we need to consider that race condition even in the case of deliberate
shutdown too, right?
Do we have a test case that involves two concurrent processes, one that tries to
stop the agent, and the other that tries to access it?
I can reproduce this. Our test indeed feeds a passphrase to the agent.
Feb 7 2017
One thing to look out for is a race condition between the agent deciding to shut
down, and a client trying to connect at that time, and that might lead to
intermittent failures. It may be doable correctly, but it is something to look
out for.
The other point being raised in the bug report about older daemons hanging
around over package upgrades should be discussed in a different bug. Yes,
shutting down the daemon when idle may work around this issue sometimes, but
clearly this is not a robust solution.
Feb 6 2017
Feb 5 2017
Any thoughts or progress on this?
Feb 3 2017
The Debian report is waiting since October for a reply from the orig. submitter.
Jan 26 2017
Jan 24 2017
for cases (1), (2), and (3) it sounds like you don't need the PTR at all. right?
For your case (4), i think we should reject hkps via literal IP addresses. It's
not a real-world use case, and if you want to test/experiment with hkps as a
developer, you should have at least the capacity to edit /etc/hosts (or whatever
your system's equivalent is). Anyway, trying to support this case for the
purposes of debugging doesn't make sense if support for this case is the cause
of the bugs in the first place ;)
re: duplicate hosts: I live in a part of the world where dual-stack
connectivity is sketchy at best. And, when connecting to things over Tor, it's
possible that connections to IPv4 hosts will have a different failure rate than
IPv6 connections.
So unless you already know that the host itself is down, why would you avoid
trying the other routes you have to it?
Look at it another way: when trying to reach host X, you discover that X has two
IP addresses, A and B. You try to reach A and it's not available. Isn't it
better to try B instead, rather than to avoid trying B at all just because A was
unreachable?
In a pool scenario, you might want to try to cluster addresses together by
perceived identity so that you can try an entirely different host first, rather
than a different address for the same host who happens to be in the pool twice.
But that strikes me as a very narrow optimization, certainly something that'd
only be worth implementing after we've squeezed the last bit of performance out
of other parts of the code (parallel connections, "happy eyeballs", etc).
Definitely not something to bother with at the outset. So i'd say drop that
optimization for simplicity's sake.
So the simplest approach is:
a) know the configured name of the keyserver
b) resolve it to a set of addresses
c) try to connect to those addresses, using the configured name of the server
for SNI and HTTP Host:
This is all that's needed for cases (1) and (3), and it could also be used in
case (2) if you see (b) as a two-stage resolution process (name→SRV→A/AAAA),
discarding the intermediate names from the SRV. Given that some people may
access the pool via case (1), and servers in the pool won't be able to
distinguish between how they were selected (SRV vs. A/AAAA), they'll still
accept the connections.
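The "simplest approach" (a) to (c) could be sketched roughly like this. `connection_plan` is a hypothetical helper, not dirmngr code; it only builds the list of attempts and leaves the actual connecting to the caller. The resolver is injectable so the logic can be exercised without network access.

```python
import socket

def connection_plan(configured_name, port=443, resolve=socket.getaddrinfo):
    """Return (address, tls_name) pairs for one configured keyserver name.

    tls_name is what would be used for SNI and the HTTP Host: header.
    It is always the configured name; no PTR lookup is involved.
    """
    plan = []
    seen = set()
    for _family, _type, _proto, _canon, sockaddr in resolve(
            configured_name, port, type=socket.SOCK_STREAM):
        address = sockaddr[0]
        if address not in seen:  # the resolver may return duplicates
            seen.add(address)
            plan.append((address, configured_name))
    return plan
```

A caller would then walk the plan in order and, for TLS, pass the second element as the server name (for example as `server_hostname` to `ssl.SSLContext.wrap_socket`) so the certificate is checked against the name the user configured.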
If you decide the additional complexity is worthwhile for tracking the
intermediate names in the SRV records, you can always propagate the intermediate
names wherever you like locally without changing the "simplest" algorithm.
If you really want to use the names from the SRV in collecting, then the
algorithm should change to:
a) know the configured name of the keyserver
b) resolve it to a set of intermediate names
c) resolve the intermediate names to a set of addresses
d) try to connect to those addresses, using the intermediate name of the server
for SNI and HTTP host.
But still, no PTR records are needed.
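For comparison, the SRV variant (a) to (d) as a sketch. Both resolver callables are hypothetical stand-ins: `resolve_srv` returns the intermediate names for a pool name, and `resolve_addresses` returns the addresses for one name.

```python
def srv_connection_plan(configured_name, resolve_srv, resolve_addresses):
    """Return (address, tls_name) pairs using the SRV targets as TLS names.

    Two forward lookups (name to SRV targets, target to addresses) are
    enough; nothing here needs a reverse (PTR) lookup.
    """
    plan = []
    for target in resolve_srv(configured_name):
        for address in resolve_addresses(target):
            plan.append((address, target))
    return plan
```

The only difference from the simpler scheme is which name travels with each address: the intermediate name from the SRV record instead of the configured pool name.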
Okay, I get this error now. I had to implement a new option --disable-ipv4 to
make testing easier.
I have never seen the "no permission" message, only a general "connection failed"
error. I can try your suggestion of setting an explicit NoIPv6Traffic.
We have several cases:
- A pool accessed via round-robin A/AAAA record: We do not use the canonical hostname (i.e. from the PTR) but the name of the pool for the certificate. This is the classical way keyserver pools work.
- A pool accessed via SRV records: The SRV record has the canonical name and thus we do not need a PTR lookup. But we need an address lookup.
- A keyserver specified by its name: We already have the name, thus no need for a PTR lookup.
- A keyserver specified by literal IP address: We need a host name for the certificate. Either we take it from the PTR record or we reject TLS access. I don't think that this is a real-world use case, but for debugging it is/was really helpful. Should we reject hkps via literal IP addresses?
It is quite possible that some of these cases do not work right. I
have done only manual testing and the matrix is pretty complex: We
have all combinations of direct/Tor, v4 only, v6 only, v4, v6,
interface up, network down.
Right, by "duplicate host", I mean hosts reachable by several addresses
and in particular by v4 and v6. My test back when I originally
implemented the code showed that when hosts are down their other
addresses are also down. Without marking the host dead, the code
would have tried the same request on another address and would run
into the next timeout.
I also think that most delays are due to connection problems and not due to DNS
problems. And most connection problems are due to lost network access. There
we might need to tweak the code a bit similar to what I did for ADNS.
Here's a concrete example of how using PTR records gets things mixed up.
keyserver.stack.nl offers keyserver service on port 443.
It has an A record at 131.155.141.70.
But the ptr is to mud.stack.nl:
70.141.155.131.in-addr.arpa. 69674 IN PTR mud.stack.nl.
and the https SNI and HTTP Host: directives provide an entirely different
website depending on whether you access it with:
https://mud.stack.nl/
or
https://keyserver.stack.nl/
If you access it as https://hkps.pool.sks-keyservers.net/, you get the
"keyserver" view. But if you access it by the name in the PTR record
("mud.stack.nl") then you get the mud view (and a 404 on any /pks URLs)
Even more troubling is that dirmngr successfully connects to mud.stack.nl and
does the query, even though it is configured to only talk to
hkps.pool.sks-keyservers.net
This suggests that anyone able to spoof a PTR record to me can get my dirmngr to
send my potentially-sensitive keyserver queries to an entirely different webserver.
Jan 23 2017
I've moved the discussion about the need for PTR over to
T2928
In this ticket, let's focus on what happens when Tor has the NoIPv6Traffic flag
set. How should dirmngr respond in that case?
I think if it gets a "permission denied" from its tor socket (or from any proxy)
when it's trying to connect to a host, it should treat that host as dead and
move to try next one. If dirmngr knows that it is using tor, and it knows that
the address it is trying is also IPv6, it could also log a message about the
IPv6Traffic flag.
does that seem like the right set of changes needed?
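Roughly what I have in mind, as a sketch only. `first_reachable` and the `connect` callable are hypothetical, and modelling the proxy's refusal as `PermissionError` is an assumption about how the error would surface, not something taken from dirmngr.

```python
import ipaddress

def first_reachable(addresses, connect, using_tor=False, log=print):
    """Try each address in turn; return (connection, dead_addresses).

    An address refused with a permission error is marked dead and skipped,
    and the next candidate is tried instead of failing the whole request.
    """
    dead = []
    for address in addresses:
        try:
            conn = connect(address)
        except PermissionError:
            dead.append(address)
            if using_tor and ipaddress.ip_address(address).version == 6:
                log(f"{address}: refused by proxy; check Tor's IPv6Traffic setting")
            continue
        return conn, dead
    return None, dead
```

The hint about IPv6Traffic is only logged when both conditions hold (Tor in use and an IPv6 address), so other proxy refusals do not produce a misleading message.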
if you add NoIPv6Traffic to your torrc, and restart tor, can you replicate the
problem?