diff --git a/an-advanced-introduction-to-gnupg.org b/an-advanced-introduction-to-gnupg.org index 1754598..0703604 100644 --- a/an-advanced-introduction-to-gnupg.org +++ b/an-advanced-introduction-to-gnupg.org @@ -1,2055 +1,2498 @@ # % -*- mode: org; ispell-dictionary: "american"; eval: (require 'org-ref); eval:(setq org-latex-pdf-process '("lualatex -interaction nonstopmode -output-directory %o %f" "bibtex %b" "lualatex -interaction nonstopmode -output-directory %o %f" "lualatex -interaction nonstopmode -output-directory %o %f")) -*- # This file requires the org-ref module to be installed. org-ref is # available at: https://github.com/jkitchin/org-ref . As of July # 2017, it is not in Debian. Given the large number of dependencies # (which also don't appear to be in Debian), the best way to install # appears to be via melpa. This can be done by executing the # following three lines of code (C-x C-e at the end of each) and then # installing org-ref. Note: melpa doesn't use TLS to transfer the # packages, never mind checking signatures! # # (add-to-list 'package-archives '("melpa" . "http://melpa.org/packages/") t) # (package-initialize) # (package-list-packages) # # Cheat sheet: # C-c ] Insert a reference. # org-ref-find-bad-citations #+STARTUP: indent showeverything #+latex_header: \usepackage{url} #+LaTeX_HEADER: \usepackage[T1]{fontenc} #+LaTeX_HEADER: \usepackage{mathpazo} #+LaTeX_HEADER: \linespread{1.05} #+LaTeX_HEADER: \usepackage[scaled]{helvet} #+LaTeX_HEADER: \usepackage{courier} #+LaTeX_CLASS: book # Only start making lists after 5 levels of nesting. #+OPTIONS: H:5 # Turn off the automatic placement of the TOC so that we can insert the # copyright text first. #+OPTIONS: toc:nil #+Title: An Advanced Introduction to GnuPG #+AUTHOR: Neal H. Walfield #+BEGIN_LaTeX \clearpage \ \vfill #+END_LaTeX #+LATEX:\noindent Copyright \copy 2017 g10 Code GmbH.
#+LATEX:\noindent This work is licensed under a [[http://creativecommons.org/licenses/by/4.0/][Creative Commons Attribution 4.0 International License]]. #+TOC: headlines 2 +# If we don't have at least one part, i.e., some org mode header with +# a single *, then all headers get promoted. That is, ** becomes *. +# This is a formatting disaster. Hence this hack. Perhaps at some +# point we'll use parts. * Main Matter ** Introduction GnuPG is an implementation of the OpenPGP protocol, which is used for encryption and authentication. GnuPG is used to encrypt email. But this functionality is not used only by individuals to preserve their privacy: political activists rely on it to organize their activities, journalists rely on it to protect their sources, and lawyers rely on it to protect attorney-client conversations. Jason Reich, the director of security for BuzzFeed, describes the importance of GnuPG to journalists this way: "GPG is part of a balanced breakfast of any reporter, especially one who wants to protect their sources, and be able to be reached for leaks and things of that nature."\nbsp{}cite:walfield2017jason-reich-interview. # XXX: This should be Michał Woźniak, but for some reason, when # org-mode exports it the characters are lost. They don't show up in # org-entities either. Just kill the accents for now. And Michal Wozniak, the Chief Information Security Officer at the Organized Crime and Corruption Reporting Project (OCCRP), said, "I do strongly believe that had we not been using GnuPG all of this time, many of our sources and many of our journalists, would be in danger or in jail"\nbsp{}cite:walfield2017michal-wozniak-interview. Cindy Cohn, the Executive Director of the Electronic Frontier Foundation, goes further and says that the privacy and security that GnuPG offers make it "one of the core tools that we need if we're going to have functioning self-government in the United States or around the world"\nbsp{}cite:walfield2017cindy-cohn-interview.
But GnuPG is used for more than encrypting email. GnuPG protects the software updates of nearly all operating systems based on free software, including Debian, Ubuntu, Red Hat, and SUSE. Although less common on the desktop, these systems power two-thirds of all web sites\nbsp{}cite:wikipedia-linux-market-share, and are the dominant platforms used in the cloud computing sector. That means that even if you don't directly use GnuPG, if you use the Internet, your personal data is, in part, being protected by GnuPG. And GnuPG is used for much more. People use it to protect data archives, such as backups. Software distributors sign their software with it so that users can verify the integrity of a copy. Software developers use it to sign their commits\nbsp{}cite:gerwitz2012repository-integrity. Organizations like Debian use it to secure internal processes, such as making sure that a package upload is authorized, that a vote is legitimate, and that a resignation is authentic. GnuPG is also used to secure Bitcoin wallets and to sign documents. *** History Werner Koch started GnuPG in 1997\nbsp{}cite:koch1997first-release. But GnuPG's roots lie in PGP, an encryption program originally written by Phil Zimmermann in 1991\nbsp{}cite:wikipedia-pgp. Zimmermann was a long-time political activist and wrote PGP to allow activists to securely store messages on BBSs. Although the source code for PGP was available, it wasn't free software. Further, due to its use of RSA for public-key cryptography and IDEA for symmetric encryption, PGP was patent-encumbered. Around 1996, Richard Stallman, the founder of the Free Software Foundation, started appealing to people to create a free replacement for PGP. Koch was inspired by this appeal and began working on g10, as he initially called it. The name was a reference to Article\nbsp{}10 of the /Grundgesetz/ (the German constitution), which enshrines the right to private communication in Germany.
Since the reference was considered too obscure even for most Germans, the name /GNU Privacy Guard/, or GnuPG for short, was adopted soon after the initial release. As of 2017, Koch continues to work on GnuPG as the lead developer. Since its start, the project has remained relatively small in terms of the number of contributors. But it was only in 2012 that Koch found himself working alone on GnuPG. Prior to that, the project received enough funding to employ a couple of developers. In 2012, however, GnuPG had a funding crisis, and Koch was forced to lay off his last employee. The funding situation continued to deteriorate, and in 2014 Koch had to take side jobs unrelated to GnuPG to supplement his income. The situation was unsustainable, and Koch nearly gave up. But friends convinced him to give a donation campaign one last shot. The response was amazing. Not only did he receive enough money to fund himself, but he pulled in 250,000\nbsp{}euros in small donations, and Stripe, Facebook, and the Linux Foundation each committed to donating about 50,000\nbsp{}euros per year. Together with some partially unexpected contracts from the German BSI (the Federal Office for Information Security), this allowed Koch to hire five additional developers. Since then, development of GnuPG has accelerated, and new features are being added on a regular basis. For instance, Koch developed a new key discovery protocol called the Web Key Directory (WKD)\nbsp{}cite:koch2017wkd, there is a new trust model based on TOFU\nbsp{}cite:walfield2016tofu, there is official support for a set of Python bindings, and the GnuPG developers are actively contributing to Enigmail. *** OpenPGP Criticism OpenPGP has been widely criticized. There are three main criticisms: GnuPG isn't easy to use; unlike Off-the-Record Messaging (OTR), GnuPG doesn't support deniability; and GnuPG doesn't support forward secrecy.
Respond to: https://medium.com/@mshelton/how-to-lose-friends-and-anger-journalists-with-pgp-b5b6d078a315 Respond a bit more in depth to the Matthew Green blog post: https://blog.cryptographyengineering.com/2014/08/13/whats-matter-with-pgp/ Summarize Filippo's articles: https://arstechnica.com/security/2016/12/op-ed-im-giving-up-on-pgp/ https://arstechnica.com/information-technology/2016/12/signal-does-not-replace-pgp/ **** Usability GnuPG is infamous for being hard to use. There is a fair amount of truth to this. Nevertheless, one can argue that some of the difficulties are inherent in the protection that GnuPG aims to provide. For instance, it is unavoidable that people who are worried about active attackers need to think about authentication. **** Deniability Deniability (or deniable authentication) is the property that participants in a conversation are able to authenticate each other's messages, but they cannot later prove this to a third party. In OTR, this works by having the participants use a shared key for authenticating messages. Thus, if Alice knows that she didn't send a given authenticated message, then it must have come from Bob. But since Alice could just as well have created the message herself, she cannot use it to convince a third party that Bob wrote it. This style of authentication is fundamentally different from digital signatures, which provide strong evidence that a particular person created or at least endorsed a signed message. Why deniability is perhaps not as useful as one might imagine: https://debian-administration.org/users/dkg/weblog/104 **** Forward Secrecy How important is forward secrecy? *** Modern Chat Protocols Over the past few years, the amount of activity in the encryption space has increased dramatically. One of the catalysts was almost certainly the Snowden leaks in June 2013, which not only motivated activists to do some work, but also sensitized the public to the work's importance. The area that has probably received the most attention is end-to-end encrypted instant messaging\nbsp{}cite:ermoshina2016e2e-overview.
In particular, Signal, whose protocol has been adopted by WhatsApp and Google Allo, has received very strong endorsements from many prominent members of the InfoSec community. In fact, the creators of the Signal protocol, Moxie Marlinspike and Trevor Perrin, received the 2017 Levchin Prize at the Real World Crypto Symposium for their work on the protocol. The first major difference between OpenPGP and Signal is their scope: Signal focuses exclusively on real-time communication. This narrow focus has a number of security advantages. In particular, because communication happens in near real time, clients can negotiate parameters, and it is possible to implement forward secrecy. The other major difference is that OpenPGP follows a decentralized model, whereas these solutions tend to be walled gardens. Signal, for instance, uses the user's telephone number as a stable identifier, which is a strong identifier. Unlike GnuPG, these tools focus on real-time communication. *** Privacy Address the "nothing to hide" argument (which misses the point---everyone needs privacy). *** Scope As its title suggests, this book is intended to be an advanced introduction to GnuPG. It is explicitly /not/ a reference manual. That is, the focus is not on providing a highly technical, exhaustive guide covering exactly what GnuPG does, but on gradually building up the reader's understanding. This isn't a value judgment; I believe that the two are complementary. My hope is that after reading this book, you'll have a solid understanding of GnuPG's internals and will be able to quickly use GnuPG's reference manual to fill in any required details. +** A GnuPG Primer + +Examples of how to use gpg from the command line. Cover all of the +important stuff and little to none of the esoteric options. E.g., +generating an online key, encryption, decryption, signing (inline or +detached, clearsign), verifying sigs, using the ~--edit-key~ +interface. Adding a new user id. Retiring a user id. Revoking a +key.
Signing someone's key. Setting owner trust. To armor or not to +armor. Talk about importing and exporting keys (including import and +export filters). Some useful options. + +Listing keys. Talk about the different search methods, e.g., +prefixing ~@~ to only search on the email. + +Note that the right way to interact with GPG is not by screen scraping, +but by using ~gpg~ 's ~--status-fd~ family of options or using the +GPGME library (or one of the many bindings), which remove the need to +parse ~--status-fd~ 's output. + +GPG is not a library. Talk about how this arose historically. The +tension between providing a user interface and a programming API +(the former wants convenience and implication, the latter does not). If you +want to program GnuPG, then it is recommended that you use GPGME (or a +binding built on top of GPGME). A lower-level interface is +~--status-fd~, which has been around since GnuPG 1.2. Example of why it is +important to use this interface. + +Groups/aliases + +** Cryptography + +Most readers of this book probably already understand how public-key +cryptography works. Perhaps not at the mathematical level, but at +least at the conceptual level. But most readers of this book also need to +be able to explain public-key cryptography to lay people. + +1. What is cryptography? Basically scrambling a text. + +1. Example: most people have probably secured a zip file with a + password. + +1. How does that work? A simple approach is to imagine that each + letter is a number---A is 0, B is 1, C is 2, etc.---and then add + (without carrying---that is, C (2) + Z (25) is 27; 27 is larger than + 25, so do: 27 - 26 = 1 and take 1, i.e., B; this is modulo-26 arithmetic) + the plain text to the password. For instance, consider the text + "Meet me in Mantua" and the password "tank boil throw letter". + + MEETMEINMANTUA + + TANKBOILTHROWLETTER + -------------------- + .... + + If you know the password, you can easily reverse the process.
But + if you don't know the password, it is effectively impossible to + recover the plaintext given the ciphertext. + + Note: if the password is truly random, at least as long as the text, and + never reused, this is referred to as a one-time pad, which is the + strongest form of encryption known: it is provably unbreakable. + +1. This approach doesn't scale. If you want to communicate with + multiple people, you need to remember a password for each person. + +1. This problem is solved by public-key cryptography. Instead of sharing a + password, each person has a so-called public key and a so-called + private key. Using public-key cryptography, for Romeo to encrypt + a message to Juliet, he just needs to know her public key. Juliet + can decrypt the message using her private key. The nice thing + about the public key is that it can be shared with anyone. + +1. How does it work? It is based on so-called one-way puzzles. Consider + factoring ~221~. To do this, you could try every number from 2 to + the square root of 221 and see if it evenly divides 221. For 221, + this doesn't take that long to do, but for a 1000\nbsp{}digit + number, it could take forever---even for a computer. And although + there are much better algorithms than this simple method, none is + fast enough to factor such numbers in practice. But, if I told you that the factors are 13 + and 17, then you can /verify/ that very quickly. This is basically + how public-key cryptography works. There are also other + one-way puzzles. + +1. How to imagine public-key encryption? We can think of the public + key as the blueprints for a safe that anyone can build around a + message, but once that message is in the safe, it can only be + opened using the recipient's corresponding private key. + +Explain signing. + +Give other examples of how to explain public-key cryptography. + +Talk about threat modeling. What are you trying to protect? From +whom? What resources does the adversary have?
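Both ideas from the list above, the letter-addition scheme and the factoring puzzle, can be tried out in a few lines of Python. This is a toy sketch for building intuition, not real cryptography. The function names are hypothetical, non-letters are simply dropped, and a password shorter than the text is repeated. Repeating the password turns the scheme into a Vigenère cipher rather than a one-time pad.

```python
from itertools import cycle


def letters(text: str) -> list[int]:
    """Map A..Z (case-insensitive) to 0..25, dropping spaces and punctuation."""
    return [ord(c) - ord("A") for c in text.upper() if "A" <= c <= "Z"]


def to_text(numbers) -> str:
    return "".join(chr(n + ord("A")) for n in numbers)


def encrypt(plaintext: str, password: str) -> str:
    # Add each plaintext letter to the corresponding password letter, mod 26.
    # A short password is repeated; only a random, never-reused password at
    # least as long as the text gives a one-time pad.
    key = cycle(letters(password))
    return to_text((p + k) % 26 for p, k in zip(letters(plaintext), key))


def decrypt(ciphertext: str, password: str) -> str:
    # Reverse the process: subtract the password letters, mod 26.
    key = cycle(letters(password))
    return to_text((c - k) % 26 for c, k in zip(letters(ciphertext), key))


def smallest_factor(n: int) -> int:
    """Trial division: try every d from 2 up to the square root of n."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor found, so n is prime


if __name__ == "__main__":
    ciphertext = encrypt("Meet me in Mantua", "tank boil throw letter")
    print(ciphertext)                                     # FERDNSQYFHEHQL
    print(decrypt(ciphertext, "tank boil throw letter"))  # MEETMEINMANTUA

    # Finding a factor requires a search; checking one is a single multiplication.
    p = smallest_factor(221)
    print(p, 221 // p, p * (221 // p) == 221)             # 13 17 True
```

The asymmetry in the last lines is the point of the one-way puzzle. For 221 the search is instant, but its cost grows with the size of the number, while the check stays a single multiplication.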
+ ** OpenPGP GnuPG is an implementation of OpenPGP, an encryption standard published by the Internet Engineering Task Force (IETF). The IETF's main activity is the development and promotion of standards related to the Internet. Since its formation in 1986, the IETF has standardized many ubiquitous Internet protocols, including the HyperText Transfer Protocol (HTTP) and the Transport Layer Security (TLS) protocol. Each standard is managed by a working group, and anyone can participate by joining the appropriate mailing list. The working group responsible for OpenPGP is fittingly called /The OpenPGP Working Group/. OpenPGP consists of three main parts. First, OpenPGP specifies a collection of cryptographic algorithms for encrypting and decrypting data, generating and verifying digital signatures, and deriving keys from passwords (so-called /key derivation functions/ or KDFs). These are built on top of more basic cryptographic building blocks like SHA-1 (a hash algorithm), AES (a symmetric cipher), and RSA (an asymmetric cipher, which is also known as a public-key algorithm). For the most part, the specification does not define these algorithms; it simply says which algorithms should be used where and how to use them. Second, OpenPGP defines a packet-based message format. This format is used not only for exchanging encrypted messages, but also for transferring keys and key meta-data. Finally, OpenPGP includes functionality to help manage keys. This functionality includes the ability to revoke keys and to sign keys. The first version of the OpenPGP protocol was published in 1996 as RFC\nbsp{}1991. (At that point, however, it was still known as the PGP protocol.) Since then, the protocol has undergone two major revisions. The most recent version was published in 2007 as RFC\nbsp{}4880. In 2015, the OpenPGP community re-formed the OpenPGP working group to update the specification\nbsp{}[[cite:openpgp-working-group-charter]].
The major goals for the next version are: the deprecation of some old cryptographic algorithms like SHA-1, the introduction of some new cryptographic algorithms based on elliptic curves, the addition of modern message integrity protection in the form of something like Authenticated Encryption with Associated Data (AEAD), and an updated fingerprint format. From an application programmer's or a user's perspective, the working group is not considering any major changes to the existing functionality; they are primarily tightening the standard's security and cleaning up a few issues. This is true even of SHA-1: although SHA-1 has many flaws, the way that OpenPGP uses it is still considered safe. That is, the changes mostly address weaknesses proactively---not reactively. In the words of the cryptographer Peter Gutmann, "OpenPGP is still too good enough, there's lots of things there that you can nitpick but nothing really fatal, or even close to fatal"\nbsp{}cite:gutmann-too-good-enough. *** Data at Rest OpenPGP is used to protect both data at rest and data in motion. Whereas data at rest refers to data that is stored, e.g., on a hard drive, data in motion refers to data that is transferred, e.g., via HTTP. An encryption scheme that only protects data in motion, such as TLS, removes the encryption on receipt; the data is only protected on the wire. Another way to think about the difference between data at rest and data in motion is that encryption that protects data at rest protects it in time and space, whereas encryption that protects data in motion only protects it in space. Yet another way to think about the difference is that data at rest is to the ~tar~ or ~zip~ tools as data in motion is to HTTP or XMPP. The decision to protect both data in motion and data at rest using the same scheme significantly constrains the solution space.
In particular, because data at rest may be accessed asynchronously with respect to the encryption, there is no way to negotiate parameters on the fly. Consider an encrypted backup. When you encrypt the data, you can only use the strongest encryption that is available at the time of the encryption. When you access the data 10 years later, your implementation needs to support that now-old encryption algorithm; there is no way to go back in time and say to your former self, "could you use this implementation instead?" An additional consequence is that upgrading the cryptography becomes very difficult. It is not possible to completely deprecate old algorithms, because old messages (like our backup) still need to be decrypted. Similarly, since people continue to use old software, we often cannot use the latest and greatest encryption scheme, because they might not be able to decrypt the data! Another result of this decision to protect data at rest is that enabling forward secrecy is not possible. Forward secrecy is an oft-lauded encryption property, which prevents old encrypted messages from being decrypted if the private key material is somehow compromised. Forward secrecy works by periodically replacing the key material and destroying the old keys. This scheme is fine if you never need to decrypt old messages (as is typically the case for data transferred via HTTPS, say), but doesn't work at all for data at rest: if you want to decrypt some data a week later, never mind 10 years later, then you won't be able to if you've destroyed the private key material needed to decrypt it! Forward secrecy becomes even more complicated when a user has multiple devices, and all devices should be able to decrypt all messages. OpenPGP doesn't require that those devices somehow synchronize their state after the private key is copied. But some type of synchronization is necessary for forward secrecy. This raises the question: why use a single scheme for both data in motion and data at rest?
The reason is that OpenPGP messages are often not stored on a trusted host, or even processed on a trusted host before being stored. Consider email. Email is normally stored on a mail server. Even after the mail is read, it remains on the mail server so that it can be read later---potentially years later---on a different device. Thus, even assuming that we could harden the security of the transport layer, it is not clear that data on a mail server is any less vulnerable than data on the wire. In fact, data breaches at huge companies entrusted with highly personal information from millions or even billions of users, such as Yahoo!\nbsp{} and Adult Friend Finder, are evidence that it is not. *** Unbuffered Message Processing OpenPGP is designed to allow unbuffered message processing. This is partially achieved by mandating that message packets be sorted topologically. That is, if a packet has a dependency, that dependency precedes it in the message. This property is important for several reasons. First, it allows an OpenPGP implementation to run on memory-constrained systems while being confident that the implementation can in practice process arbitrarily large messages. Second, it ensures that streaming tools can be used, e.g., something like ~... | gpg -e -r key | ssh ...~. Finally, this property helps avoid some denial-of-service attacks, which might otherwise be possible by crafting a malicious message. In practice, there are some limitations to the degree to which buffering can be avoided. Consider a pipeline in which a signed message is verified and its content is then processed. Because the OpenPGP implementation needs the whole message to verify the signature, to process this message in a streaming fashion, the OpenPGP implementation has to output the data before it has been verified.
Now, if the consumer can't process the output in a way that can be reverted in the case of a validation failure, the consumer must first buffer the data. But even if it is possible for the consumer to recover from a validation failure, doing so is probably error prone, if only because code on an error path is rarely tested. Thus, although the OpenPGP implementation could avoid buffering data in this situation, it has merely shifted the burden to the consumer. Now, there are some more advanced cryptographic constructs, such as hash chaining, that make it possible to verify the data chunk by chunk. These techniques would help ensure that the consumer only processes verified data, which is an improvement over the status quo. But they don't completely solve the problem, because they can't protect against message truncation. *** OpenPGP Messages An OpenPGP message is basically a sequence of packets. OpenPGP defines 17 different packet types, which are used not only to encrypt and sign messages, but also to transfer keys and key signatures (also called certifications), which are used in the web of trust. The format is extensible, and this has already been used to add new features. An example of a packet type is the symmetrically encrypted data (SED) packet. A SED packet contains data that has been encrypted using a symmetric algorithm, such as AES. The contents of the packet are zero or more OpenPGP packets. That is, OpenPGP messages are nested; a SED packet is a container. Typically, a SED contains either a signature packet or a compressed data packet, which in turn holds a literal data packet, but the specification doesn't impose any limitations. This flexibility in message composition is referred to as /agility/. It has both advantages and disadvantages. One useful advantage is that the format can be used in unforeseen situations. For instance, the Web Key Directory (WKD) uses the non-standard sign+encrypt+sign pattern to facilitate spam detection prior to decryption.
Two important disadvantages of this flexibility are that parsing OpenPGP messages is more complicated and that assigning meaning to unusual structures can be difficult. As an example of the latter, consider a message with two literal data packets, the first of which is signed. Assuming the signature is valid, should an implementation report that the message is valid? Probably not. The second part could have been forged. Alternatively, a mail program could show both parts and indicate that only the first part is authentic. But this requires educating the user to understand these nuances. Unfortunately, educating users is known to be extremely difficult. *** Encryption Most lay people and even many technical people assume that encryption includes both an integrity check and authentication. In reality, encryption by itself provides neither. This assumption perhaps arises from conditioning by web browsers, which not only conflate the two concepts, but also treat a connection secured with a self-signed certificate (which provides encryption, but not authentication) worse than one that uses neither encryption nor authentication. Additionally, in recent years, the term /end-to-end encryption/ has entered the mainstream. Although authentication is as important as encryption in such systems, only encryption is mentioned in the name. Be that as it may, in OpenPGP, encryption and signing are separate, independent operations. **** Hybrid Encryption OpenPGP is a hybrid cryptosystem. A hybrid cryptosystem first encrypts data using a symmetric encryption algorithm like AES with a random so-called /session key/, and then encrypts the session key using the recipient's public key. The encrypted session key is stored in a so-called /public-key encrypted session key/ (PK-ESK) packet. There are two important reasons for doing this as well as several additional advantages. First, public-key encryption is thousands of times slower than symmetric encryption.
Since only the session key is encrypted with the public key, and it fits in a single block (N\nbsp{}bits for an N\nbsp{}bit RSA key), whereas the data to encrypt could be megabytes or even gigabytes in size, this saves a lot of processing power. Second, it is not unusual to encrypt a message to multiple recipients. The most obvious example of this is email, where an encrypted email is sent to multiple people. But even in other contexts, having multiple recipients is not unusual. For instance, when encrypting data to another party, most programs will also encrypt the data to the person doing the encryption so that the data remains readable and auditable. With hybrid encryption, the data itself only needs to be encrypted once; only the session key is encrypted separately to each recipient. An advantage of this approach is that it is possible to do message-based key escrow. For instance, a company wouldn't need to have access to each employee's private key; instead, whenever an employee decrypted an email, the session key could automatically be re-encrypted with a special escrow key. Similarly, if law enforcement forces you to reveal the encryption key for some messages, it is sufficient to provide the session keys for decrypting the subpoenaed messages. If you had instead provided your private key, law enforcement could read any message that had been encrypted to you. (In GnuPG, you can extract the session key using the ~--show-session-key~ option.) Finally, using hybrid encryption, it is possible to encrypt to both public keys and passwords. To encrypt a message using a password, OpenPGP specifies a key derivation function called string-to-key (S2K), which is used to generate a symmetric key from the password. (The S2K parameters are saved in a so-called /symmetric-key encrypted session key/ (SK-ESK) packet.) OpenPGP allows the symmetric key to be used directly as the session key, but it can just as well be used to encrypt a session key, which is then also stored in the SK-ESK packet. In practice, the latter is primarily interesting when a message is encrypted to both a password and a public key: for instance, it ensures that the sender is able to later decrypt the contents of the message by also encrypting the session key to her public key.
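To make the structure of a hybrid scheme concrete, here is a toy Python sketch. It is deliberately insecure and is not how OpenPGP encodes anything. Textbook RSA with small, well-known Mersenne primes stands in for the public-key algorithm; real implementations use large random primes and padding. A SHA-256-based keystream stands in for a symmetric cipher like AES. What the sketch does show is the shape described above. The data is encrypted once with a random session key. Only the short session key is encrypted separately for each recipient, much like one PK-ESK packet per recipient. All function names are hypothetical.

```python
import hashlib
import secrets

E = 65537  # common RSA public exponent


def make_keypair(p: int, q: int) -> tuple[tuple[int, int], tuple[int, int]]:
    """Textbook RSA from two primes (toy only: no padding, tiny known primes)."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))  # modular inverse (Python 3.8+)
    return (n, E), (n, d)


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR data with SHA-256(key || block offset)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


def hybrid_encrypt(plaintext: bytes, public_keys):
    session_key = secrets.token_bytes(16)  # random 128-bit session key
    k = int.from_bytes(session_key, "big")
    # One encrypted copy of the session key per recipient (cf. PK-ESK packets).
    wrapped = [pow(k, e, n) for n, e in public_keys]
    # The bulk data is encrypted only once, with the session key.
    return wrapped, keystream_xor(session_key, plaintext)


def hybrid_decrypt(wrapped_key: int, ciphertext: bytes, private_key) -> bytes:
    n, d = private_key
    session_key = pow(wrapped_key, d, n).to_bytes(16, "big")
    return keystream_xor(session_key, ciphertext)


if __name__ == "__main__":
    # Mersenne primes: fine for a demo, catastrophic for real keys.
    juliet_pub, juliet_priv = make_keypair(2**89 - 1, 2**127 - 1)
    romeo_pub, romeo_priv = make_keypair(2**61 - 1, 2**107 - 1)

    message = b"Let us sojourn in Mantua!"
    wrapped, ciphertext = hybrid_encrypt(message, [juliet_pub, romeo_pub])
    print(hybrid_decrypt(wrapped[0], ciphertext, juliet_priv))  # Juliet's copy
    print(hybrid_decrypt(wrapped[1], ciphertext, romeo_priv))   # Romeo's copy
```

The hash-based keystream only keeps the sketch free of third-party dependencies. A real implementation would use AES from a vetted library.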
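The S2K function mentioned above comes in several variants. Section\nbsp{}3.7.1.3 of RFC\nbsp{}4880 describes an /iterated and salted/ variant. There, the salt followed by the password is hashed repeatedly until a given number of octets has been processed, which makes guessing passwords more expensive. The following Python sketch follows that description as we understand it. The choice of SHA-256 and the function names are our own assumptions, and the output has not been checked against GnuPG.

```python
import hashlib


def decode_count(coded: int) -> int:
    """Expand the one-octet coded count into the number of octets to hash."""
    return (16 + (coded & 15)) << ((coded >> 4) + 6)


def iterated_salted_s2k(password: bytes, salt: bytes, coded_count: int,
                        key_len: int = 16, hash_name: str = "sha256") -> bytes:
    data = salt + password  # the salt is 8 octets, so data is never empty
    # The full salt and password are always hashed at least once.
    count = max(decode_count(coded_count), len(data))
    key = b""
    preload = 0
    while len(key) < key_len:
        # If one digest is too short for the key, further hash contexts are
        # preloaded with one, two, ... zero octets.
        h = hashlib.new(hash_name)
        h.update(b"\x00" * preload)
        full, rest = divmod(count, len(data))
        for _ in range(full):
            h.update(data)
        h.update(data[:rest])
        key += h.digest()
        preload += 1
    return key[:key_len]


if __name__ == "__main__":
    salt = bytes.fromhex("0102030405060708")  # fixed here; random in practice
    print(decode_count(0x60))  # 65536 octets hashed
    print(iterated_salted_s2k(b"tank boil throw letter", salt, 0x60).hex())
```

Because the salt and the coded count travel with the message (in the SK-ESK packet), the recipient can repeat exactly the same computation from the password alone.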
**** Algorithm Encryption in OpenPGP is a more-or-less standard hybrid encryption scheme: 1. A random /session key/ is generated. 1. For each recipient, the OpenPGP implementation encrypts the session key using the recipient's public key and emits a /public-key encrypted session key/ (PK-ESK) packet. 1. If the data should also be decryptable using a password, the same thing is done with a key derived from the password, but instead of emitting a PK-ESK packet, a /symmetric-key encrypted session key/ (SK-ESK) packet is emitted. 1. The actual data is encrypted using the session key. OpenPGP supports multiple symmetric encryption algorithms. To determine which one to use, the OpenPGP implementation selects one from the intersection of the recipients' preferred algorithms. This information isn't negotiated in real time with the recipients (even when this might in theory be possible), but is stored alongside each recipient's public key (specifically, in a user ID's self-signature). Typically, this is just the list of algorithms that the OpenPGP implementation that generated the key supported when the key was created, but it can be updated to reflect changes in the implementation, and may be customized by expert users. Since all implementations are required to support at least TripleDES, and it implicitly appears at the end of every list, the intersection is never empty. **** An Encrypted Message To better understand how messages are laid out, the following example shows the innards of an encrypted message. This output was created using GnuPG's ~--list-packets~ option. ~hot dump~, which is part of hOpenPGP, and ~pgpdump~ can produce similar output. # GNUPGHOME=`pwd`/gnupg/romeo #+BEGIN_EXAMPLE $ echo 'Let us sojourn in Mantua!'
| \ > gpg --encrypt -r juliet.capulet@gnupg.net | \ > gpg --list-packets gpg: encrypted with 2048-bit RSA key, ID C1A010A1D38C4BB8, created 2017-07-07 "Juliet Capulet " gpg: encrypted with 2048-bit RSA key, ID 5B905AF0423ABB52, created 2017-07-07 "Romeo Montague " # off=0 ctb=85 tag=1 hlen=3 plen=268 :pubkey enc packet: version 3, algo 1, keyid C1A010A1D38C4BB8 data: [2046 bits] # off=271 ctb=85 tag=1 hlen=3 plen=268 :pubkey enc packet: version 3, algo 1, keyid 5B905AF0423ABB52 data: [2046 bits] # off=542 ctb=d2 tag=18 hlen=2 plen=85 new-ctb :encrypted data packet: length: 85 mdc_method: 2 # off=563 ctb=a3 tag=8 hlen=1 plen=0 indeterminate :compressed packet: algo=2 # off=565 ctb=cb tag=11 hlen=2 plen=32 new-ctb :literal data packet: mode b (62), created 1499445579, name="", raw data: 26 bytes #+END_EXAMPLE The example shows a message that Romeo encrypted to Juliet. (Due to limitations of the OpenPGP format---OpenPGP only supports timestamps between 1970 and 2106---Romeo forward dated the creation time of his key.) The first thing that we notice is that even though Romeo only specified a single recipient (using the ~-r~ option), the message is encrypted to two keys: his and Juliet's. This is because Romeo has the ~encrypt-to~ option set in his ~gpg.conf~ file so that he can always read messages that he encrypts to someone else. ***** Packet Metadata After listing the recipients, ~gpg~ outputs each packet. Each packet starts with a line preceded by a ~#~. This line shows some meta-data and the packet's header. Specifically, ~off~ indicates the offset of the packet within the stream (this may not be accurate if there are compressed packets); ~ctb~ (Content Tag Byte) includes the type of the packet, and some information about the length of the packet (if this is a new format packet, then ~new-ctb~ will appear towards the end of the line); ~tag~ is the type of the packet as extracted from the ~ctb~; and, ~hlen~ and ~plen~ are the header and body lengths, respectively. 
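The ~ctb~, ~tag~, ~hlen~, and ~plen~ fields can be reproduced by hand. The following Python sketch decodes a packet header following Section\nbsp{}4.2 of RFC\nbsp{}4880. Bit\nbsp{}6 of the CTB distinguishes new-format from old-format packets. A new-format CTB holds the tag in its low six bits. An old-format CTB holds the tag in bits 5 to 2 and a length type in bits 1 and 0. The sketch only handles the header, not the body, and it returns ~None~ where ~gpg~ prints ~plen=0~ for an indeterminate length. The names are our own, and the header octets in the demo are reconstructed from the ~ctb~, ~hlen~, and ~plen~ values in the listing above.

```python
from typing import NamedTuple, Optional


class Header(NamedTuple):
    new_format: bool
    tag: int
    hlen: int              # header length in octets
    plen: Optional[int]    # body length; None means indeterminate
    partial: bool = False  # True: plen is only the length of the first chunk


def parse_header(buf: bytes) -> Header:
    ctb = buf[0]
    if not ctb & 0x80:
        raise ValueError("not an OpenPGP packet: bit 7 of the CTB is clear")
    if ctb & 0x40:
        # New format: tag in bits 5..0, length encoded in the following octets.
        tag, first = ctb & 0x3F, buf[1]
        if first < 192:  # one-octet length
            return Header(True, tag, 2, first)
        if first < 224:  # two-octet length
            return Header(True, tag, 3, ((first - 192) << 8) + buf[2] + 192)
        if first < 255:  # partial body length: one chunk of a chunked body
            return Header(True, tag, 2, 1 << (first & 0x1F), partial=True)
        return Header(True, tag, 6, int.from_bytes(buf[2:6], "big"))  # five-octet
    # Old format: tag in bits 5..2, length type in bits 1..0.
    tag, length_type = (ctb >> 2) & 0x0F, ctb & 0x03
    if length_type == 3:  # indeterminate: the packet runs to the end of input
        return Header(False, tag, 1, None)
    n = 1 << length_type  # 1, 2, or 4 length octets
    return Header(False, tag, 1 + n, int.from_bytes(buf[1:1 + n], "big"))


if __name__ == "__main__":
    print(parse_header(bytes([0x85, 0x01, 0x0C])))  # PK-ESK: tag 1, hlen 3, plen 268
    print(parse_header(bytes([0xD2, 85])))          # encrypted data: tag 18, plen 85
    print(parse_header(bytes([0xA3])))              # compressed: tag 8, indeterminate
    print(parse_header(bytes([0xCB, 32])))          # literal data: tag 11, plen 32
```

The old-format length type 3 and the new-format partial lengths (first octet 224 to 254) are the two mechanisms behind ~indeterminate~ and ~partial~ in ~gpg~'s output.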
Sometimes the length of a packet is not known a priori. In this case, ~plen~ is 0 and ~indeterminate~ or ~partial~ appears towards the end of the line. This can happen when the data is streamed. ~indeterminate~ means that all data until the end of the message belongs to this packet. ~partial~ means that the packet uses a chunked encoding for its data, similar to HTTP's chunked transfer encoding. These encoding schemes are essential for supporting unbuffered operation. See Section\nbsp{}4.2.2.4 of RFC\nbsp{}4880 for more details.

***** The PK-ESK Packets

The first two packets in the message are Public-Key Encrypted Session Key (PK-ESK) packets. Each of these holds the session key encrypted to one recipient. A PK-ESK packet also includes the 64-bit key ID of the key to which the session key was encrypted. Without the key ID, a recipient wouldn't know whether a given PK-ESK packet is encrypted to her key or to someone else's, and she would have to try to decrypt the packets one by one. The obvious consequence is wasted CPU cycles. But the more important reason for avoiding unnecessary decryption attempts is that the user might have to unlock multiple private keys, which can seriously impact an application's usability.

Including the key ID in the PK-ESK packet avoids this UX annoyance, but at a cost: it leaks meta-data. In practice, however, this information is exposed in other places anyway, e.g., at the SMTP level. Nevertheless, OpenPGP provides a mechanism to hide this meta-data: setting the key ID to ~0~ marks it as speculative. Such key IDs are also referred to as wildcard key IDs. In GnuPG, a speculative key ID can be set either with ~--throw-keyids~, which clears the key ID field for all recipients, or by using ~--hidden-recipient~ instead of ~--recipient~, which clears it for a particular recipient.

***** The Encrypted Data Packet

Immediately following the PK-ESK packets is an encrypted data packet.
This ordering is mandatory. It ensures that buffering is not required, because the key needed to decrypt the packet precedes the data that it decrypts.

As already mentioned, an encrypted data packet is a container that holds zero or more OpenPGP packets. This is not obvious from the output of the ~--list-packets~ command, because the command doesn't show the message's tree structure. Here, as is usually the case, the encrypted data packet contains a single packet: the compressed packet.

In OpenPGP, there are actually two types of encrypted data packets: Symmetrically Encrypted Data (SED) packets and Symmetrically Encrypted Integrity Protected Data (SEIP) packets. Although the standard technically allows the former, they are deprecated in practice due to security concerns. For instance, an oracle attack is possible\nbsp{}[[cite:mister2005cfb-attack]], as are message extension and deletion attacks. Consequently, when GnuPG encounters such a packet, it emits a warning. GnuPG itself will not emit an encrypted packet without integrity protection. In the above output, we can tell that the encrypted data packet is integrity protected from two things: the packet's tag (18 instead of 9) and the presence of the ~mdc_method~ field.

****** Modification Detection Codes

MDC stands for Modification Detection Code. Like a message authentication code (MAC), an MDC can verify a message's integrity. Unlike a MAC, however, an MDC says nothing about the message's authenticity.

A common criticism of the MDC system is that an HMAC would have been a better choice, because HMACs are better understood. Leaving aside that the MDC system has proven sufficient for its intended purpose, an HMAC wasn't really an option when the problem was being discussed: HMACs and MDCs were developed concurrently. (For more historical notes, see\nbsp{}[[cite:callas2009mdc]].)
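The MDC construction described in the following paragraphs can be sketched in Python with ~hashlib~. This sketch makes simplifying assumptions and is not GnuPG's implementation. It operates on data that has already been decrypted. The ~prefix~ argument stands for the random block, plus a repeat of its last two octets, that starts the encrypted data; per Section\nbsp{}5.13 of RFC\nbsp{}4880, this prefix is hashed as well. The MDC packet itself consists of the two header octets ~0xD3 0x14~ (a new-format header for tag 19 with a length of 20), followed by the 20-byte SHA-1 digest.

#+BEGIN_SRC python
import hashlib
import hmac
import os

# New-format header of an MDC packet: tag 19 (0xC0 | 19 = 0xD3), length 20.
MDC_HEADER = bytes([0xD3, 0x14])
MDC_LEN = len(MDC_HEADER) + 20  # header plus SHA-1 digest


def append_mdc(prefix: bytes, plaintext: bytes) -> bytes:
    """Return plaintext followed by its MDC packet (before encryption)."""
    # The MDC header is hashed too, so the framing itself is protected.
    digest = hashlib.sha1(prefix + plaintext + MDC_HEADER).digest()
    return plaintext + MDC_HEADER + digest


def check_mdc(prefix: bytes, decrypted: bytes) -> bool:
    """Verify the MDC packet at the end of decrypted data."""
    if len(decrypted) < MDC_LEN:
        return False
    body, mdc = decrypted[:-MDC_LEN], decrypted[-MDC_LEN:]
    if mdc[:2] != MDC_HEADER:
        return False
    expected = hashlib.sha1(prefix + body + MDC_HEADER).digest()
    return hmac.compare_digest(expected, mdc[2:])


# Assume a 16-byte block cipher (e.g., AES): a random block plus a
# repeat of its last two octets.
block = os.urandom(16)
prefix = block + block[-2:]
sealed = append_mdc(prefix, b"literal data packet bytes")
print(check_mdc(prefix, sealed))                   # intact data
print(check_mdc(prefix, sealed[:5] + sealed[6:]))  # one byte removed
#+END_SRC

Because the header octets are hashed along with the data, truncating the data or appending to it changes what the recipient hashes, and the check fails.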
Before RFC\nbsp{}4880 introduced the MDC system, the only reliable way to detect integrity violations was to use signatures. Signatures, however, have the disadvantage that they expose the signer's identity, which is sometimes undesirable.

The MDC system works by computing a SHA-1 hash over the plaintext and the header of the MDC packet. (Strictly speaking, the hash also covers the random prefix that starts the encrypted data.) The body of the MDC packet is the computed hash. That is, the hash deliberately crosses the packet framing. But this is exactly the behavior that is required to fully ensure the data's integrity: because the header of the MDC packet is also included in the hash, extension and removal attacks are mitigated. The following diagram illustrates how this works:

#+BEGIN_EXAMPLE
+------+-----------------------------------+-------------+
| SEIP | Data (e.g. a literal data packet) |  MDC hash   |
+------+-----------------------------------+-------------+
        \                                     /     ^
         `-----------------------------------'      |
                        SHA-1 ----------------------'
#+END_EXAMPLE

The ~mdc_method~ parameter above seems to suggest that there are multiple MDC methods. This is not the case. Multiple methods were explicitly avoided to prevent downgrade and cross-grade attacks, and the value of 2 is simply SHA-1's OpenPGP algorithm identifier. Although SHA-1 has since been broken, the security properties relevant to the MDC system remain intact. Nevertheless, the working group is considering replacing the MDC system with one based on Authenticated Encryption with Associated Data (AEAD), which has other useful properties.

As a final note, the MDC packet is not shown in the output of ~--list-packets~. This is a technical limitation of GnuPG that has to do with the way the MDC packet is processed. But given that ~--list-packets~ is only a debugging interface and is not intended for programmatic use, this limitation is unlikely to be fixed.

***** Compressed Packet

The compressed packet is nested within the encrypted data packet.
RFC\nbsp{}4880 specifies three compression algorithms---ZIP, ZLIB, and BZip2---but notes that they are optional. Even though compression is not required, the RFC recommends it as an operationally useful (even if not rigorous) form of integrity protection. Unfortunately, compressing data before encryption has been shown to enable chosen-plaintext attacks, as demonstrated by the CRIME attack on TLS and the BREACH attack on HTTP.
# https://security.stackexchange.com/questions/39925/breach-a-new-attack-against-http-what-can-be-done/39953#39953

***** Literal Data

Nested within the compressed packet is a literal data packet. A literal data packet contains not only the cleartext, but also a bit of metadata. In particular, it includes a format field, which indicates whether the contents are binary data or text and, in the latter case, whether the text is believed to be UTF-8 encoded. The packet also contains a filename, which is helpful when transferring a file but is mostly ignored by GnuPG in practice. Finally, the packet contains a timestamp. GnuPG sets the timestamp to the time the packet is created, not to the file's ~mtime~.

When GnuPG is told to decrypt data (~gpg --decrypt~), it doesn't look for an encrypted message to decrypt. Instead, it processes the message and tries to decrypt any encrypted data that it encounters. This subtle difference in behavior can be important. If GnuPG is told to decrypt a message that consists of nothing more than a literal data packet, it simply outputs the packet's contents and does not warn the user that the data was never encrypted. If a program uses the ability to decrypt a message as an authentication check (e.g., Autocrypt's Setup Message), this behavior could lead to subtle attacks\nbsp{}[[cite:autocrypt-bad-import]].
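According to Section\nbsp{}5.9 of RFC\nbsp{}4880, the body of a literal data packet consists of five fields, in order:

1. a one-octet format (~b~, ~t~, or ~u~),
1. a one-octet filename length,
1. the filename,
1. a four-octet big-endian timestamp,
1. the data.

The Python sketch below splits such a body into its fields. The helper ~parse_literal_body~ is hypothetical. The timestamp comes from the listing above, but the payload is a placeholder, because the listing does not show the plaintext. Notice that nothing in the body records whether it was encrypted. That information comes only from the enclosing packets, which is why the ~--decrypt~ behavior described above matters.

#+BEGIN_SRC python
import struct
from datetime import datetime, timezone

FORMATS = {ord("b"): "binary", ord("t"): "text", ord("u"): "UTF-8 text"}


def parse_literal_body(body: bytes):
    """Split a literal data packet body into (format, filename, timestamp, data)."""
    fmt = body[0]
    name_len = body[1]
    name = body[2:2 + name_len]
    (timestamp,) = struct.unpack(">I", body[2 + name_len:6 + name_len])
    data = body[6 + name_len:]
    return FORMATS.get(fmt, "unknown"), name, timestamp, data


# A body shaped like the one in the listing: mode b, no filename,
# created 1499445579. The payload is a placeholder.
payload = b"placeholder payload text\n"
body = b"b" + bytes([0]) + struct.pack(">I", 1499445579) + payload
fmt, name, ts, data = parse_literal_body(body)
print(fmt, name, datetime.fromtimestamp(ts, timezone.utc).isoformat(), len(data))
#+END_SRC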
*** Signing

A signature provides cryptographic proof of both the signed data's integrity and its authenticity---assuming the key used to sign the data is trusted. Like a checksum, a signature can be used to make sure that the data was not modified in transit. Unlike a checksum, however, a signature can also prove the data's origin (or at least who signed off on it).

Note: the standard does not define the exact semantics of a signature. This is intentional, and the RFC editors view it as a feature. In the end, a signature's meaning is determined by the human users of the system---some will be more casual, and some will be more rigorous, no matter what a standard says.

**** Multiple Signers

In OpenPGP, a single message can include multiple signatures created by different keys. This is useful when disparate parties want to sign a document. For instance, multiple developers might sign released software. Rather than distributing each signature separately, it is more convenient to combine them into a single file.

In GnuPG, this can be done by specifying each of the keys on the command line. For instance:

#+BEGIN_EXAMPLE
$ echo 'Good-bye cruel world!' | gpg -s -u romeo -u juliet
#+END_EXAMPLE

A crippling disadvantage of this approach is that all keys must be available at the time the signature is generated, which is rarely practical. OpenPGP's packetized message format makes combining signatures relatively easy, but GnuPG does not support it. Nevertheless, writing an ad-hoc script is straightforward in practice (for some hints, see\nbsp{}[[cite:koch2013clearsign-text-document-with-multiple-keys]]). And in the special case where the signatures are /detached/ signatures, combining them is trivial: they just need to be concatenated, as shown below:

#+BEGIN_EXAMPLE
$ echo 'Romeo and Juliet forever!' > note.txt
$ gpg --detach-sign -u romeo --output - note.txt > note.txt.romeo.sig
$ gpg --detach-sign -u juliet --output - note.txt > note.txt.juliet.sig
$ cat note.txt.romeo.sig note.txt.juliet.sig > note.txt.sig
$ gpg --verify note.txt.sig note.txt
gpg: Signature made Tue 11 Jul 2017 11:52:48 AM CEST
gpg:                using RSA key D6636A9EB82A91E94DDEE5066B284A5BE2297415
gpg:                issuer "romeo.montague@gnupg.net"
gpg: Good signature from "Romeo Montague <romeo.montague@gnupg.net>" [full]
gpg: Signature made Tue 11 Jul 2017 11:52:59 AM CEST
gpg:                using RSA key E5156E507DCB8D63AC89E5334954FDC67A46B4C5
gpg:                issuer "juliet.capulet@gnupg.net"
gpg: Good signature from "Juliet Capulet <juliet.capulet@gnupg.net>" [full]
#+END_EXAMPLE

In the above examples, the signatures are not nested. Each one covers only the data, so either signature could be removed from the OpenPGP message without affecting the validity of the other.

Sometimes it is useful to nest signatures. For instance, a notary might want to notarize not only a document, but also the client's signature over that document. OpenPGP provides native support for this type of signature as well, and both types can even appear in the same message. GnuPG, however, does not currently support nested signatures.

**** Algorithm

As with encryption, signing is a two-step process. First, the data to be signed is hashed. Then the resulting hash is signed using public-key cryptography. This two-step approach is motivated primarily by performance.

The exact algorithm differs slightly depending on whether the signature is inline or detached. We start by describing how an inline signature is created.

1. Emit a so-called /One-Pass Signature/ (OPS) packet. An OPS packet contains meta-data (which hash algorithm is used, etc.) and framing information (specifically, whether or not the signature is nested).
1. Hash and emit the data to sign.
1. Emit a signature packet, which includes the first two bytes of the computed hash and the signature itself.
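The hashing step above can be sketched in Python. Because the OPS packet announces the hash algorithm before the data arrives, the verifier can create the hash context immediately and update it chunk by chunk, with no buffering. This sketch follows the version\nbsp{}4 rules in Section\nbsp{}5.2.4 of RFC\nbsp{}4880. After the data, the hash also covers two more items:

- the signature packet's hashed portion: version, signature class, public key algorithm, hash algorithm, and the hashed subpackets;
- a six-octet trailer.

The function name and the single placeholder subpacket are our own, so the digest will not match the ~begin of digest~ values shown later. The final RSA operation is omitted entirely.

#+BEGIN_SRC python
import hashlib
import struct

# A subset of OpenPGP hash algorithm IDs (RFC 4880, Section 9.4).
HASH_NAMES = {2: "sha1", 8: "sha256", 9: "sha384", 10: "sha512", 11: "sha224"}


def v4_signature_hash(chunks, sigclass, pubkey_algo, hash_algo, hashed_subpackets):
    # The OPS packet already named the hash algorithm, so start right away.
    h = hashlib.new(HASH_NAMES[hash_algo])
    for chunk in chunks:  # one pass over the streamed data, no buffering
        h.update(chunk)
    # Hashed portion of the version 4 signature packet.
    hashed_part = (bytes([4, sigclass, pubkey_algo, hash_algo])
                   + struct.pack(">H", len(hashed_subpackets))
                   + hashed_subpackets)
    h.update(hashed_part)
    # Trailer: version 4, 0xFF, four-octet length of the hashed portion.
    h.update(b"\x04\xff" + struct.pack(">I", len(hashed_part)))
    return h.digest()


# Placeholder: a single signature creation time subpacket (type 2).
subpackets = bytes([5, 2]) + struct.pack(">I", 1499772743)
chunks = [b"Good-bye ", b"cruel world!\n"]
digest = v4_signature_hash(chunks, 0x00, 1, 8, subpackets)  # digest 8 = SHA-256
print(digest[:2].hex())  # what the packet would store as "begin of digest"
#+END_SRC

The signature packet stores the first two bytes of this digest. This lets a verifier detect a mismatched hash cheaply, before performing the expensive public key operation.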
As its name suggests, the OPS packet makes it possible to both create and verify a signature in a single pass, without buffering any data. A detached signature is stored separately from the data, so the OPS packet would be redundant. To generate a detached signature, we therefore skip the first step (and do not emit the data). A limitation of detached signatures is that each one covers the entire signed data, so nesting them is not possible.

**** Example

Using our earlier example with inline signatures, the resulting message has the following packets:

#+BEGIN_EXAMPLE
$ echo 'Good-bye cruel world!' \
> | gpg -s -u romeo -u juliet | gpg --list-packets
# off=0 ctb=a3 tag=8 hlen=1 plen=0 indeterminate
:compressed packet: algo=1
# off=2 ctb=90 tag=4 hlen=2 plen=13
:onepass_sig packet: keyid 4954FDC67A46B4C5
    version 3, sigclass 0x00, digest 8, pubkey 1, last=0
# off=17 ctb=90 tag=4 hlen=2 plen=13
:onepass_sig packet: keyid 6B284A5BE2297415
    version 3, sigclass 0x00, digest 8, pubkey 1, last=1
# off=32 ctb=cb tag=11 hlen=2 plen=28 new-ctb
:literal data packet:
    mode b (62), created 1499772743, name="",
    raw data: 22 bytes
# off=62 ctb=89 tag=2 hlen=3 plen=333
:signature packet: algo 1, keyid 6B284A5BE2297415
    version 4, created 1499772743, md5len 0, sigclass 0x00
    digest algo 8, begin of digest 88 56
    hashed subpkt 33 len 21 (issuer fpr v4 D6636A9EB82A91E94DDEE5066B284A5BE2297415)
    hashed subpkt 2 len 4 (sig created 2017-07-11)
    hashed subpkt 28 len 24 (signer's user ID)
    subpkt 16 len 8 (issuer key ID 6B284A5BE2297415)
    data: [2048 bits]
# off=398 ctb=89 tag=2 hlen=3 plen=333
:signature packet: algo 1, keyid 4954FDC67A46B4C5
    version 4, created 1499772743, md5len 0, sigclass 0x00
    digest algo 8, begin of digest c5 e3
    hashed subpkt 33 len 21 (issuer fpr v4 E5156E507DCB8D63AC89E5334954FDC67A46B4C5)
    hashed subpkt 2 len 4 (sig created 2017-07-11)
    hashed subpkt 28 len 24 (signer's user ID)
    subpkt 16 len 8 (issuer key ID 4954FDC67A46B4C5)
    data: [2047 bits]
#+END_EXAMPLE

***** Compressed Packet

Again, we see that the message starts with a compressed packet, which acts as a container. Because the length of the data is not known a priori, the length is marked as ~indeterminate~. This means that the packet includes all of the data until the end of the message.

***** One-Pass Signature Packets

The next two packets are OPS packets. They include the hash algorithm that was used to generate the signature. This information must be available beforehand so that the signature can be verified in a streaming fashion. The hash algorithm, also known as the message digest algorithm, is given by the ~digest~ field in the output (here, 8, which is SHA-256).

To verify the data in a streaming manner, the implementation also needs to know how to interpret the data being signed. This is determined by the signature's class (~sigclass~). Normally, OPS packets are only used with documents. (Keys and user IDs are so small that buffering them isn't an issue.) OpenPGP defines two types of documents, binary data and text data, whose classes are ~0~ and ~1~, respectively. Binary documents are hashed as is. For text documents, the OpenPGP implementation first converts line endings to ~<CR><LF>~ and then hashes the result.

The OPS packets also include the signer's key ID and the public key algorithm used to generate the signature. Strictly speaking, this information is redundant, because it is also stored in the matching signature packet. But it lets the implementation detect, before computing the hash, several common cases in which it cannot verify the signature. Specifically, verification is impossible if the signer's public key is unavailable, or if the public key algorithm used to compute the signature is not supported (even if the hash algorithm is). In such cases, the implementation can fail early or skip the hashing, which saves some CPU cycles.

Finally, OPS packets include framing information. In GnuPG, this is referred to as the /last signature/ flag.
In the above output, it appears as ~last~. If ~last~ is ~1~, the signature covers all of the following data up to the OPS packet's corresponding signature packet. If ~last~ is ~0~, the signature is not nested and covers only the data following the next OPS packet whose ~last~ is ~1~. Given this definition, the first signature in the above example is not nested (its ~last~ is ~0~). It covers the same data as the second signature, whose OPS packet has ~last~ set to ~1~ and is immediately followed by the literal data packet. Thus, both signatures cover just the data; the outer signature does /not/ cover the inner signature.

To better understand how signatures nest, consider the following example of an OpenPGP message with three signatures. The first three packets are OPS packets, the middle packet is a literal data packet, and the last three packets are the signature packets corresponding to the OPS packets.

#+BEGIN_EXAMPLE
              ________________________________________________
     ,-----> /                                                \
+-----------+-----------+-----------+------+---------+---------+---------+
| A, last=1 | B, last=0 | C, last=1 | Data | C's sig | B's sig | A's sig |
+-----------+-----------+-----------+------+---------+---------+---------+
                  |           `----> \____/
                  `-----------^
#+END_EXAMPLE

Working our way in, we see that ~last~ is set for A's signature. Thus, A's signature covers everything immediately following its OPS packet up to the matching signature packet. That is, it covers not only the data, but also B's and C's signatures. In contrast, ~last~ is clear in B's OPS packet. Thus, B's signature covers everything following the next OPS packet with ~last~ set to ~1~. That means everything after C's OPS packet, up to but not including the signature packet that matches C's OPS packet. So, like C's signature, B's signature covers only the literal data packet, not the data packet /and/ C's signature.

***** Literal Data

The literal data packet contains the document to be signed. Of course, if the signatures are nested, an outer signature also covers other data.
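The ~last~ nesting rule described above can be modeled in a few lines of Python. This is a toy model, not a packet parser. Packets are represented as strings, and OPS packets are matched with signature packets like braces. Taking the description literally, a signature whose OPS packet has ~last~ set covers every packet between that OPS packet and its signature packet. The function name ~coverage~ is our own.

#+BEGIN_SRC python
def coverage(ops, data="Data"):
    """Toy model of the last-flag rule: which packets does each signature cover?

    ops lists (name, last) pairs in message order. The message is modeled
    as OPS_1 ... OPS_n, data, SIG_n ... SIG_1 (signatures nest like braces).
    """
    names = [name for name, _ in ops]
    message = ([f"{name}'s OPS" for name in names] + [data]
               + [f"{name}'s sig" for name in reversed(names)])

    def span(i):
        # Packets strictly between OPS packet i and its matching signature.
        return message[i + 1:len(message) - 1 - i]

    result = {}
    for i, (name, last) in enumerate(ops):
        j = i
        if not last:
            # Not nested: same span as the next OPS packet with last=1
            # (a well-formed message always has one).
            j = next(k for k in range(i + 1, len(ops)) if ops[k][1])
        result[name] = span(j)
    return result


print(coverage([("A", True), ("B", False), ("C", True)]))
# The two-signature message from the example: Juliet's OPS packet comes first.
print(coverage([("Juliet", False), ("Romeo", True)]))
#+END_SRC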
***** Signature Packet

The last two packets are the signature packets that match the OPS packets at the start of the message. Like braces in a programming language, the first OPS packet matches the last signature packet, and the second OPS packet matches the second-to-last signature packet.

Apart from the nesting information, the signature packet includes everything present in the OPS packet, plus some additional meta-data and the actual signature. The additional meta-data usually includes a timestamp (the Signature Creation Time subpacket), the signing key's fingerprint and key ID (the Issuer Fingerprint and Issuer subpackets), and the user ID that was used to make the signature (the Signer's User ID subpacket). Other pieces of meta-data can be added, but they are not usually set in this context.

A mail user agent can use the signer's user ID to make sure that the alleged sender matches the signer. For instance, suppose Romeo has verified his father's key, and his father tries to trick him by sending an email that appears to be from Juliet. Knowing that Romeo always checks whether a signature is valid, his father could simply sign the message with his own key. If the mail user agent only shows whether a signature is valid, Romeo might be fooled. Checking that the ~From~ header matches the signer's user ID catches this attack.

*** Keys

As mentioned above, OpenPGP messages are used to transport not only documents, but also keys and key signatures. In OpenPGP, a so-called /key/ is a lot more than just a public and private key pair. Modern OpenPGP keys normally include at least two key pairs as well as a fair amount of meta-data.

**** Multiple Public and Private Key Pairs

OpenPGP supports multiple key pairs for several reasons. First, it is possible to use the same key pair for both encryption and signing. But if you do, decrypting a message is mathematically equivalent to signing it (and vice versa), which an adversary could abuse.
In practice, this particular attack is prevented by using different padding schemes for encryption and signing. Using separate keys, however, avoids the problem altogether and guards against related issues that may be discovered in the future.

Second, having multiple keys makes it possible to largely separate identity from key lifetime. In particular, OpenPGP distinguishes between primary keys and subkeys. The primary key identifies the OpenPGP key: the key's fingerprint is derived from the primary key and is independent of any subkeys. This makes it possible for a user to revoke individual subkeys without changing her identity. For instance, each year you could generate a new encryption subkey and a new signing subkey, and revoke the old ones. You would not need to print new business cards or even tell your contacts about the new subkeys. Assuming their software is configured to refresh your key regularly, their OpenPGP implementation will find the new subkeys automatically, because your primary key did not change. In fact, this type of key rotation approximates forward secrecy\nbsp{}[[cite:brown2001forward-secrecy-for-openpgp]].

To support an arbitrary number of keys, primary keys and subkeys are marked with so-called /capabilities/. There are (perhaps surprisingly) four capabilities:

1. Encryption
1. Signing
1. Certification
1. Authentication

An encryption-capable key can be used for encryption, and a signing-capable key can be used for signing documents. Conversely, a key without the encryption capability should not be used for encryption. The certification capability indicates that a key can be used to sign /keys/ (as opposed to documents). Because a subkey needs a signature to be valid, only a certification-capable key can add a new subkey. Finally, the authentication capability is used for access control, i.e., to prove possession of the key when logging in. This is primarily useful for using an OpenPGP key with ~ssh~.

It is entirely possible for a key to have multiple capabilities.
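Within a key, these capabilities are recorded as bits in the /key flags/ subpacket of a self-signature (Section\nbsp{}5.2.3.21 of RFC\nbsp{}4880). The RFC splits encryption into separate bits for communications and storage, so a key flags octet can name more than the four capabilities listed above. The following minimal Python decoder uses labels of our own choosing and considers only the first flag octet. The sample values ~03~ and ~0C~ are the flags of the example key shown later in this chapter.

#+BEGIN_SRC python
# Key flag bits (first octet) from RFC 4880, Section 5.2.3.21.
# The short labels are our own.
KEY_FLAGS = {
    0x01: "certify",
    0x02: "sign",
    0x04: "encrypt communications",
    0x08: "encrypt storage",
    0x10: "split key",
    0x20: "authenticate",
    0x80: "group key",
}


def decode_key_flags(octet):
    """Return the names of the flags set in a key flags octet, lowest bit first."""
    return [name for bit, name in sorted(KEY_FLAGS.items()) if octet & bit]


print(decode_key_flags(0x03))  # flags on the example key's primary key
print(decode_key_flags(0x0C))  # flags on the example key's subkey
#+END_SRC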
As mentioned above, using a key for both signing and encryption is not advisable. But because certification is mathematically just signing, a key can be marked as both signing-capable and certification-capable. Whether this is advisable depends on how the user wants to manage keys. If a signing-capable subkey is compromised, it is possible to recover without generating an entirely new OpenPGP key. But if a certification-capable key is compromised, the attacker effectively owns the identity. The only way to recover is to revoke the whole OpenPGP key and create a new one. Keeping the two capabilities on separate keys only helps if users physically separate the certification key from the signing key, e.g., by storing the certification key exclusively on an offline computer. Since most users don't do this, GnuPG's default is to make the primary key both certification-capable and signing-capable.

An OpenPGP key can have multiple valid (i.e., not expired and not revoked) subkeys with the same capability. In this case, the RFC does not specify which subkey should be used; the choice is left to the implementation. If there are multiple encryption-capable subkeys, GnuPG uses the newest valid one. But this is not a /de facto/ standard, and other implementations behave differently. For instance, OpenKeychain encrypts a message to all valid encryption-capable subkeys.

The OpenKeychain behavior has an advantage: different subkeys can be stored on different devices, and if a particular device is compromised, only the subkeys on that device need to be rotated. Operationally, however, the benefit for encryption-capable subkeys is small, because an encryption-capable key protects /past/ traffic. If an encryption key is compromised, all messages encrypted to it are compromised. And since every message is encrypted to every valid encryption-capable subkey, compromising any one of them compromises every message. So, in this case, one might as well use a single encryption subkey.

This line of logic does not apply to signing-capable keys.
If a signing-capable subkey is compromised, the attacker can forge messages. But suppose the user has one signing-capable subkey per device and revokes just the one that was compromised. Then the attacker is thwarted, and only signatures created with that subkey will fail to verify once it has been revoked.

**** Self Signatures

As mentioned previously, an OpenPGP fingerprint is derived only from the primary key, not from the subkeys. This makes sense, since new subkeys can be added at any time. It also means that some mechanism is needed to associate subkeys with their primary key. A mechanism is likewise needed to associate meta-data with an OpenPGP key. Both problems are solved the same way: with self-signatures.

A self-signature is like a normal signature, except that it covers structured data (for instance, the primary key and a user ID) rather than a document, and it is stored alongside the OpenPGP key. A self-signature can only be created by a certification-capable key (or rather, it is only honored if such a key created it). Because such a signature can't be forged, it creates an unforgeable binding between the OpenPGP key and the data. Thus, to determine whether a subkey really belongs to a given OpenPGP key, it is sufficient to check for a valid self-signature.

Self-signatures do not, however, protect against the removal of data. OpenPGP packets can be combined in whatever way a user wants. As a result, an attacker who controls a user's network connection cannot modify individual packets without detection, but can still drop packets. Suppose an attacker has compromised a user's key, and the user notices and revokes it. She is still not safe if the attacker also controls the network path and filters out the revocation certificate, because this prevents other users from learning that the key was compromised.

**** Example

The following example shows Romeo's key. This key was created by GnuPG using the default parameters.
Thus, it has a primary key, which is signing- and certification-capable, and a single subkey, which is encryption-capable.

#+BEGIN_EXAMPLE
$ gpg --export romeo | gpg --list-packets
# off=0 ctb=99 tag=6 hlen=3 plen=269
:public key packet:
    version 4, algo 1, created 1499443140, expires 0
    pkey[0]: [2048 bits]
    pkey[1]: [17 bits]
    keyid: 6B284A5BE2297415
# off=272 ctb=b4 tag=13 hlen=2 plen=41
:user ID packet: "Romeo Montague <romeo.montague@gnupg.net>"
# off=315 ctb=89 tag=2 hlen=3 plen=340
:signature packet: algo 1, keyid 6B284A5BE2297415
    version 4, created 1499443140, md5len 0, sigclass 0x13
    digest algo 8, begin of digest 71 f6
    hashed subpkt 33 len 21 (issuer fpr v4 D6636A9EB82A91E94DDEE5066B284A5BE2297415)
    hashed subpkt 2 len 4 (sig created 2017-07-07)
    hashed subpkt 27 len 1 (key flags: 03)
    hashed subpkt 9 len 4 (key expires after 2y0d0h0m)
    hashed subpkt 11 len 4 (pref-sym-algos: 9 8 7 2)
    hashed subpkt 21 len 5 (pref-hash-algos: 8 9 10 11 2)
    hashed subpkt 22 len 3 (pref-zip-algos: 2 3 1)
    hashed subpkt 30 len 1 (features: 01)
    hashed subpkt 23 len 1 (keyserver preferences: 80)
    subpkt 16 len 8 (issuer key ID 6B284A5BE2297415)
    data: [2048 bits]
# off=658 ctb=b9 tag=14 hlen=3 plen=269
:public sub key packet:
    version 4, algo 1, created 1499443140, expires 0
    pkey[0]: [2048 bits]
    pkey[1]: [17 bits]
    keyid: 5B905AF0423ABB52
# off=930 ctb=89 tag=2 hlen=3 plen=310
:signature packet: algo 1, keyid 6B284A5BE2297415
    version 4, created 1499443140, md5len 0, sigclass 0x18
    digest algo 8, begin of digest 19 f8
    hashed subpkt 33 len 21 (issuer fpr v4 D6636A9EB82A91E94DDEE5066B284A5BE2297415)
    hashed subpkt 2 len 4 (sig created 2017-07-07)
    hashed subpkt 27 len 1 (key flags: 0C)
    subpkt 16 len 8 (issuer key ID 6B284A5BE2297415)
    data: [2043 bits]
#+END_EXAMPLE

***** Public Key Packet

The public key packet normally comes first. It contains only a minimal amount of information: the key's version, the public key algorithm (~algo~), the public key parameters (~pkey~), and the creation time (~created~). The ~expires~ field that GnuPG displays is not part of a version\nbsp{}4 key packet. As discussed below, a version\nbsp{}4 key's expiration time is stored in a self-signature.
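The key ID is computed rather than stored. Section\nbsp{}12.2 of RFC\nbsp{}4880 defines the computation for version\nbsp{}4 keys. The fingerprint is the SHA-1 hash of the octet ~0x99~, a two-octet length, and the public key packet body. The key ID is the fingerprint's low 64 bits. The following minimal Python sketch illustrates both steps. Its packet body is a made-up placeholder with tiny fake key parameters, not Romeo's real key. Only the step from fingerprint to key ID is checked against values from the listing.

#+BEGIN_SRC python
import hashlib
import struct


def v4_fingerprint(packet_body: bytes) -> str:
    """Fingerprint of a version 4 public key (or subkey) packet body."""
    data = b"\x99" + struct.pack(">H", len(packet_body)) + packet_body
    return hashlib.sha1(data).hexdigest().upper()


def key_id(fingerprint: str) -> str:
    """The 64-bit key ID is the last 16 hex digits of a v4 fingerprint."""
    return fingerprint[-16:]


# Romeo's fingerprint and key ID, as shown in the listing above.
print(key_id("D6636A9EB82A91E94DDEE5066B284A5BE2297415") == "6B284A5BE2297415")

# Placeholder body: version 4, creation time, algorithm 1 (RSA), and two
# tiny fake MPIs (a 2-octet bit count followed by the value).
body = bytes([4]) + struct.pack(">I", 1499443140) + bytes([1]) + b"\x00\x08\xff\x00\x02\x03"
fp = v4_fingerprint(body)
print(fp, key_id(fp))
#+END_SRC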
Although the ~--list-packets~ output shows the key ID, the key ID is not included in the packet; GnuPG shows it for convenience. Including it would be redundant, because it is derived from the packet's contents, including the creation time and the public key parameters.

The above listing has no self-signature for the public key packet. The parameters are nevertheless protected by the self-signature over each user ID packet, which covers not only the user ID packet, but also the primary key. It is possible to make signatures over just the primary key, but this is typically only done for key revocation.

Because there is no self-signature for the key itself, meta-data such as user preferences needs to be stored somewhere else. By convention, it is stored in a user ID's self-signature. Consequently, if you have multiple user IDs, you could have multiple sets of conflicting preferences. This is by design: the applicable preferences are determined by how the key is addressed, which allows different preferences in different environments. Suppose you have two user IDs, one for work and one for home. When someone uses your key to encrypt to your work email address, the preferences are taken from the work user ID. If the sender specifies only the key ID, the preferences are taken from the so-called /primary user ID/. (The primary user ID is the user ID whose self-signature has the primary user ID flag set. If several user IDs have this flag set, RFC\nbsp{}4880 recommends giving priority to the one with the most recent self-signature.) Because different user IDs can reasonably have different preferences, the key should be addressed by the intended user ID---not by the key ID---whenever that user ID is known.

By convention, self-signatures immediately follow the packet that they certify. As such, any direct key signatures would immediately follow the public key, before any user ID or subkey packets.
In practice, this is not always the case, due to implementation bugs or malicious intent. For this reason, GnuPG attempts to fix any out-of-order packets on import. This can involve some overhead, but the overhead is only incurred when the packets are actually out of order.

When some meta-data is changed, a new self-signature is created. Because published data can't easily be deleted, OpenPGP treats the key as an append-only log. As a result, a user ID packet, for instance, might have multiple self-signatures. In general, if there are multiple self-signatures over a given packet, only the newest one is used. One important exception is revocation certificates and designated revoker settings. These must be respected even if a later self-signature would seemingly override them. Otherwise, an attacker who had compromised a key could simply issue a newer self-signature to undo a revocation, which would make it impossible to revoke a compromised key.

***** User ID Packet

User IDs are stored between the public key and any subkeys. In this example, the key contains only a single user ID. A user ID packet contains just one value: a free-form string. By convention (per the RFC), this string is an RFC\nbsp{}2822-style mailbox, i.e., a UTF-8 encoded string of the form ~Name (Comment) <email>~.

A user ID doesn't normally need a comment, and most keys, like Romeo's, don't have one. Comments can (rarely!) be useful for advanced users. Even so, it is recommended that most tools not offer users the option to set one, because most people don't understand what comments are for. Comments have two main uses: to distinguish security levels and to distinguish roles.
For instance, suppose a user wants two OpenPGP keys associated with a given email address. One is for low-security communication and is stored directly on the device, which allows immediate decryption. The other is for high-security communication and is stored on, say, an air-gapped computer, which may introduce a long delay if the user is not near that computer. In this case, comments along the lines of "day-to-day key" and "high security key," respectively, might be appropriate. Similarly, if a developer has a key that is only used for signing commits and releases, "dist sig" could be a reasonable comment for that key. Daniel Kahn Gillmor takes an even more conservative stance and argues that even these comments are probably unnecessary\nbsp{}[[cite:gillmor-user-id-comments]].

An image can also be used as a user ID. In that case, the image is stored in a so-called user attribute packet. One problem with images is that they can be fairly large. Like old signatures, images can't be deleted once they are published, and they are downloaded whenever a key is retrieved. For these reasons, the current recommendation is to limit images to a few kilobytes.

Images can be useful, because many people can associate a person with that person's likeness more quickly than with her name. For example, an image could be shown in a Jabber client or a mail user agent. However, this should probably only be done for validated keys, to avoid suggesting authenticity when there is no evidence of it. Images could also be used in a graphical depiction of a path in the web of trust.

***** User ID Self Signature

By convention, the user ID self-signature immediately follows the user ID. Besides binding the user ID to the primary key, it also contains additional metadata. As noted above, there may be multiple self-signatures, and normally only the newest is used.

The signature is self-describing.
It includes the key that was used to create the signature, the algorithm, etc. The ~sigclass~ field is ~0x13~, which means that this signature is a certification of a user ID.

The signature includes a number of hashed subpackets. Hashed subpackets are effectively key-value pairs that are protected by the signature. The OpenPGP specification defines 22\nbsp{}different subpackets, including so-called /notation data/, which can be used to store arbitrary data. (Notations are described towards the end of this chapter.) In this example, there are 10 subpackets.

Some of the subpackets provide information about the signature itself. This is the case for the ~issuer fpr~, ~sig created~ and ~issuer key ID~ subpackets.

Some of them provide information about the primary key. This is the case for the ~key flags~ and ~key expires after~ subpackets. The ~key flags~ subpacket is primarily used to indicate the primary key's capabilities. The ~key expires after~ subpacket indicates when the key expires. An expiration can be extended by creating a new self-signature with a later expiration time. Note: the expiration time is relative to the key's---not the self-signature's---creation time.

The remaining subpackets describe user and implementation preferences. ~pref-sym-algos~, ~pref-hash-algos~, and ~pref-zip-algos~ specify which symmetric, hash and compression algorithms, respectively, the user's OpenPGP implementation supports and the user prefers when this user ID is used. ~features~ describes which advanced features the OpenPGP implementation supports. Currently, only one flag is defined, which indicates that the OpenPGP implementation supports the MDC system. And, ~keyserver preferences~ is a set of flags indicating how the key server should handle the key.

With the exception of the ~issuer key ID~, all of the subpackets are prefixed with ~hashed~. This indicates that this data is part of the signed data.
Subpackets that are not hashed are considered advisory, because an attacker can modify them in transit without detection.

***** Public Subkey Packet

The public subkey packets follow the user ID packets. Other than their type, these packets are effectively identical to the public key packet.

***** Public Subkey Self Signature

Like a user ID packet, a public subkey packet requires a self-signature to validate the key and bind it to the primary key. Typically, a subkey's self-signature contains just a few pieces of meta-data, because preferences are stored in user ID self-signatures. Two minor differences are worth pointing out. First, whereas the ~sigclass~ field for a user ID is ~0x13~, the ~sigclass~ for a public subkey is ~0x18~. Second, if the subkey is signing capable, then the self-signature must also contain a so-called /back signature/ in an embedded signature subpacket. The back signature is created by the subkey over the primary key and the subkey. Because of the aforementioned attacks, such a back signature should not be created for an encryption key.

*** Key Signing

OpenPGP allows users to validate each other's keys using signatures. For instance, if Romeo is convinced that Juliet controls the key ~0x4954FDC67A46B4C5~, then he can certify it (i.e., sign it) using his OpenPGP key.

There are two main reasons why Romeo would want to certify someone's key. First, a certification mechanism of this sort enables the OpenPGP implementation to determine whether a key is valid. This information is critical when Romeo wants to verify a signed document. In that case, Romeo wants to know more than whether the signature is mathematically valid and the data has not been corrupted in transit. He also wants to know whether the signature was really created by Juliet. Unfortunately, there is no way for computers to figure this out without some help from users. Likewise, when Romeo sends an email to Juliet, he wants to be confident that he is really using Juliet's key.
It is entirely possible for Romeo to have a key that allegedly belongs to Juliet, but doesn't, without realizing it (anyone can create a key with any user ID and upload it to the key servers).

The other reason that a signature is useful is that it provides a mechanism for Romeo's contacts to indirectly verify Juliet's key. Romeo might share this signature with others, e.g., by publishing it on a key server. Then people who trust him (and this is essential!) to validate other people's keys, i.e., who consider him to be a so-called /trusted introducer/, can use this signature to find a valid key for Juliet. The network induced by the signatures is referred to as the web of trust, although it would be more accurate to call it the web of verifications.

Publishing signatures has the unfortunate side effect of making the user's social graph public. This can have grave implications beyond mere privacy concerns. For instance, it could be used to link a source to a journalist.

**** Local Signatures

If a signature shouldn't be published, it is possible to mark it as unexportable by creating a local signature. In GnuPG, this is done by using ~--lsign-key~ instead of ~--sign-key~ to sign the key. At a technical level, this causes an ~Exportable Certification~ subpacket with the value ~0~ to be included in the signature. Unfortunately, local signatures are not without problems. It is possible to export local signatures and accidentally upload them to a key server, and key server implementations do not automatically strip local signatures on import.

**** Confidence

When someone verifies a key, she doesn't always have the same degree of confidence that the verification is correct. For instance, when Romeo signs Juliet's key, he is almost certainly convinced that Juliet really controls the stated key.
On the other hand, suppose Romeo meets Iago at the pub and Iago asks Romeo to sign his key. Romeo is probably less confident that Iago controls the stated key. This is the case even if Iago shows him his government-issued identification papers. It is also the case if Romeo sends an encrypted email to the email address in Iago's user ID and receives a signed reply containing a shared secret code.

OpenPGP provides a mechanism for expressing different degrees of confidence. There are three confidence levels, ranging from "the person said she controls the key" to "I'm confident she controls the stated key," as well as a generic, "no comment," level. Other than completely ignoring the weakest certification level, GnuPG does not use this information in its web of trust calculations. Thus, for all intents and purposes, it is just gratuitous meta-data. As such, it is better to always use the generic certification level\nbsp{}[[cite:gillmor-cert-level]]. This is what GnuPG does by default.

**** Trusted Introducers

When signing a key, it is possible to indicate that the key holder should be a trusted introducer. For instance, an organization may have a single key, say ~pgp@company.com~, that it uses to sign all of its employees' keys. Suppose employees sign ~pgp@company.com~ using a trust signature. Then anyone who trusts, say, ~alice@company.com~ will, as usual, consider ~pgp@company.com~ to be verified. Due to the trust signature, that person will also consider it a trusted introducer. Consequently, that person will also consider any keys that ~pgp@company.com~ signed to be verified, which, in this case, is everyone in the company.
The following example illustrates this idea:

#+BEGIN_EXAMPLE
juliet@              alice@               pgp@                bob
example -- tsign --> company -- tsign --> company -- sign --> @company
.org                 .com                 .com                .com
#+END_EXAMPLE

In GnuPG, Juliet doesn't actually have to use a trust signature to sign ~alice@company.com~'s key. She can just use a normal signature and then set the ~ownertrust~ for ~alice@company.com~ appropriately.

Trust signatures are very powerful, but they can also be very dangerous. If Romeo considers Juliet to be a trusted introducer, and Juliet has signed her father's key using ~tsign~, then any key that Juliet's father signs will be considered verified. Juliet's father could abuse this to trick Romeo into accepting a key that he forged in Juliet's name.

Trust signatures can be constrained. For instance, in the above example, Alice probably wants to limit the scope of her trust signature on ~pgp@company.com~'s key to just those user IDs associated with ~company.com~. To support this, OpenPGP allows a regular expression to be associated with a trust signature.

A trust signature can make not just immediate connections trusted, but also indirect connections. This is extremely dangerous and probably only makes sense in very limited situations. For instance, in a very large company, each department might have the equivalent of the above ~pgp@company.com~ key. There might also be a company-wide key that signs each department's key using ~tsign~. In this case, Alice might sign the company-wide key with a depth of ~2~ instead of ~1~. (A depth of ~1~ means that anyone whom the signed key verifies is considered verified. A depth of ~0~ is equivalent to a normal signature; it doesn't create any trusted introducers.)

In GnuPG, it is currently not easy to modify a signature. For instance, if you want to convert a normal signature into a trust signature, ~gpg~ will complain that the key is already signed.
To change a signature type or modify a trust signature, it is first necessary to revoke the existing signature using the ~revsig~ command in the ~--edit-key~ interface.

**** Non-Revocable Signatures

Occasionally, it can be useful to make a long-term commitment to a signature. This can be done by setting the non-revocable flag. In GnuPG, this is done using the ~nrsign~ command in the ~--edit-key~ interface.

**** Example

The following example shows Juliet's key including Romeo's signature of her key.

#+BEGIN_EXAMPLE
$ gpg --export juliet | gpg --list-packets
# off=0 ctb=99 tag=6 hlen=3 plen=269
:public key packet:
    version 4, algo 1, created 1499443081, expires 0
    pkey[0]: [2048 bits]
    pkey[1]: [17 bits]
    keyid: 4954FDC67A46B4C5
# off=272 ctb=b4 tag=13 hlen=2 plen=41
:user ID packet: "Juliet Capulet "
# off=315 ctb=89 tag=2 hlen=3 plen=340
:signature packet: algo 1, keyid 4954FDC67A46B4C5
    version 4, created 1499443081, md5len 0, sigclass 0x13
    digest algo 8, begin of digest 59 1a
    hashed subpkt 33 len 21 (issuer fpr v4 E5156E507DCB8D63AC89E5334954FDC67A46B4C5)
    hashed subpkt 2 len 4 (sig created 2017-07-07)
    hashed subpkt 27 len 1 (key flags: 03)
    hashed subpkt 9 len 4 (key expires after 2y0d0h0m)
    hashed subpkt 11 len 4 (pref-sym-algos: 9 8 7 2)
    hashed subpkt 21 len 5 (pref-hash-algos: 8 9 10 11 2)
    hashed subpkt 22 len 3 (pref-zip-algos: 2 3 1)
    hashed subpkt 30 len 1 (features: 01)
    hashed subpkt 23 len 1 (keyserver preferences: 80)
    subpkt 16 len 8 (issuer key ID 4954FDC67A46B4C5)
    data: [2047 bits]
# off=658 ctb=89 tag=2 hlen=3 plen=307
:signature packet: algo 1, keyid 6B284A5BE2297415
    version 4, created 1499445515, md5len 0, sigclass 0x10
    digest algo 8, begin of digest c6 a3
    hashed subpkt 33 len 21 (issuer fpr v4 D6636A9EB82A91E94DDEE5066B284A5BE2297415)
    hashed subpkt 2 len 4 (sig created 2017-07-07)
    subpkt 16 len 8 (issuer key ID 6B284A5BE2297415)
    data: [2046 bits]
# off=968 ctb=b9 tag=14 hlen=3 plen=269
:public sub key packet:
    version 4, algo 1, created 1499443081, expires 0
    pkey[0]: [2048 bits]
    pkey[1]: [17 bits]
    keyid: C1A010A1D38C4BB8
# off=1240 ctb=89 tag=2 hlen=3 plen=310
:signature packet: algo 1, keyid 4954FDC67A46B4C5
    version 4, created 1499443081, md5len 0, sigclass 0x18
    digest algo 8, begin of digest ee 3f
    hashed subpkt 33 len 21 (issuer fpr v4 E5156E507DCB8D63AC89E5334954FDC67A46B4C5)
    hashed subpkt 2 len 4 (sig created 2017-07-07)
    hashed subpkt 27 len 1 (key flags: 0C)
    subpkt 16 len 8 (issuer key ID 4954FDC67A46B4C5)
    data: [2047 bits]
#+END_EXAMPLE

The listing follows the usual format described above. The first packet is the public key packet, which is followed by a user ID packet and its self-signature. At the end come the subkey and its self-signature. There is one small difference, however: in this listing, Juliet's user ID is followed by not one, but two signatures. The second one is not a self-signature, but Romeo's certification signature. We can see from the ~issuer fpr~ subpacket that Romeo, not Juliet, created this signature.

There are two important things to observe here. First, Romeo's signature is attached to Juliet's key, not to his own key. Once it is clear that the signature says something about Juliet's key and not Romeo's, this makes sense. Nevertheless, many beginners don't understand this and think that they somehow own the signature. Unfortunately, this arrangement enables denial-of-service attacks. For instance, vandals could create so many signatures on a particular key that it becomes too large to import.

Second, certification signatures are associated with user IDs and not with keys. This prevents bait-and-switch attacks. Consider Paris, who convinces Romeo to sign his key. If Romeo signed the key, and not the user ID, then Paris could simply revoke the user ID and replace it with another one, say, Juliet's. Since Romeo would still consider the key to be valid, Paris could possibly trick him into believing that a message signed with the key is from Juliet.
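To make the packet order concrete, the following sketch groups a flat packet sequence, in the order printed by ~gpg --list-packets~, into a primary key, its user IDs, and its subkeys. Each signature is attached to the closest preceding user ID or subkey. This is only an illustration under simplifying assumptions. The ~Packet~, ~Component~ and ~TransferableKey~ classes are hypothetical. The model keeps only the packet tags from the listing above (6, 13, 2, and 14) and the issuer key IDs. It neither parses binary data nor verifies any signature, both of which a real implementation must do.

#+BEGIN_SRC python
from dataclasses import dataclass, field
from typing import List, Optional

# Packet tags as printed by `gpg --list-packets` (RFC 4880, section 4.3).
PUBLIC_KEY, SIGNATURE, USER_ID, PUBLIC_SUBKEY = 6, 2, 13, 14


@dataclass
class Packet:
    """Hypothetical, heavily simplified packet: a tag and a label.

    For signatures, the label is the issuer's key ID; for keys it is the
    key ID, and for user IDs it is the user ID string.
    """
    tag: int
    label: str


@dataclass
class Component:
    """A user ID or subkey plus the issuers of the signatures that follow it."""
    packet: Packet
    signatures: List[str] = field(default_factory=list)


@dataclass
class TransferableKey:
    primary: Packet
    direct_signatures: List[str] = field(default_factory=list)  # e.g., key revocations
    user_ids: List[Component] = field(default_factory=list)
    subkeys: List[Component] = field(default_factory=list)


def group_packets(packets: List[Packet]) -> TransferableKey:
    """Attach each signature to the closest preceding user ID or subkey."""
    if not packets or packets[0].tag != PUBLIC_KEY:
        raise ValueError("a key must start with a public key packet")
    key = TransferableKey(primary=packets[0])
    current: Optional[Component] = None  # None: signatures are on the primary key
    for packet in packets[1:]:
        if packet.tag == USER_ID:
            current = Component(packet)
            key.user_ids.append(current)
        elif packet.tag == PUBLIC_SUBKEY:
            current = Component(packet)
            key.subkeys.append(current)
        elif packet.tag == SIGNATURE:
            if current is None:
                key.direct_signatures.append(packet.label)
            else:
                current.signatures.append(packet.label)
        else:
            raise ValueError(f"unexpected packet tag {packet.tag}")
    return key


# The packets from Juliet's key in the listing above, in order.
juliet = group_packets([
    Packet(PUBLIC_KEY, "4954FDC67A46B4C5"),
    Packet(USER_ID, "Juliet Capulet"),
    Packet(SIGNATURE, "4954FDC67A46B4C5"),  # self-signature, sigclass 0x13
    Packet(SIGNATURE, "6B284A5BE2297415"),  # Romeo's certification, sigclass 0x10
    Packet(PUBLIC_SUBKEY, "C1A010A1D38C4BB8"),
    Packet(SIGNATURE, "4954FDC67A46B4C5"),  # subkey binding signature, sigclass 0x18
])
print(juliet.user_ids[0].signatures)  # ['4954FDC67A46B4C5', '6B284A5BE2297415']
print(juliet.subkeys[0].signatures)   # ['4954FDC67A46B4C5']
#+END_SRC

In this model, Romeo's certification lands on Juliet's user ID rather than on his own key, which mirrors the point made above. A key revocation directly follows the public key packet, as in the revocation listing in the next section. Such a signature would end up in ~direct_signatures~.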
*** Revocations

If a key has been compromised or simply retired, it is essential to revoke it so that other people don't accidentally use it. It is also important to revoke a user ID if the identity is no longer valid, e.g., when leaving an organization but keeping the same key.

Occasionally, it can be useful to revoke a user ID certification. For instance, you should revoke a certification if any of the following apply:
- you find out that you signed the wrong key;
- the person who controlled the key somehow lost control of it (e.g., he forgot the password and doesn't have a revocation certificate);
- you find out that you signed an impostor's key.

The following example shows what Juliet's key looks like when she revokes her own key (the output has been truncated):

#+BEGIN_EXAMPLE
$ gpg --gen-revoke juliet | gpg --import
...
$ gpg --export juliet | gpg --list-packets
# off=0 ctb=99 tag=6 hlen=3 plen=269
:public key packet:
    version 4, algo 1, created 1499443081, expires 0
    pkey[0]: [2048 bits]
    pkey[1]: [17 bits]
    keyid: 4954FDC67A46B4C5
# off=272 ctb=89 tag=2 hlen=3 plen=310
:signature packet: algo 1, keyid 4954FDC67A46B4C5
    version 4, created 1500052199, md5len 0, sigclass 0x20
    digest algo 8, begin of digest 04 ca
    hashed subpkt 33 len 21 (issuer fpr v4 E5156E507DCB8D63AC89E5334954FDC67A46B4C5)
    hashed subpkt 2 len 4 (sig created 2017-07-14)
    hashed subpkt 29 len 1 (revocation reason 0x02 ())
    subpkt 16 len 8 (issuer key ID 4954FDC67A46B4C5)
    data: [2048 bits]
# off=585 ctb=b4 tag=13 hlen=2 plen=41
:user ID packet: "Juliet Capulet "
...
#+END_EXAMPLE

The revocation is the second packet. It is a self-signature on the primary key. We know that the packet is a revocation certificate from its ~sigclass~ (~0x20~) as well as from the ~revocation reason~ subpacket. The ~revocation reason~ subpacket allows the user to say why the key was revoked. Here, the value is ~0x02~, which means that the key was compromised. This subpacket can also include a human-readable string.
In this case, Juliet did not provide any additional information. But, if the key is being rotated, it might be helpful to include the new key's fingerprint. Of course, this is of limited use, since the string is not machine readable.

*** Notations

RFC\nbsp{}4880 allows signatures to contain arbitrary data in the form of notations. This mechanism can be extremely useful for extending the OpenPGP system. But, despite their availability, notations aren't generally used. The Debian project considered one possible use: storing additional information about how a developer's identity was checked\nbsp{}[[cite:debian-notations]].

Notations are key-value pairs. The key must be of the form ~key@example.com~. The domain is included to avoid naming conflicts. Although the value can be arbitrary data, GnuPG currently only supports free-form strings. One limitation of notations is that, since they are stored in signature subpackets, they must fit into the 64 kilobytes of space available to signature subpackets. (Strictly speaking, the hashed area is limited to 64 kilobytes of subpackets and the unhashed area has the same limitation, but using the unhashed area is not advisable.)

*** Summary

This chapter has presented the important details of the OpenPGP standard. This introduction wasn't intended for someone who is planning to write an OpenPGP parser, but to provide a rough overview of the system. Many details have been omitted, as well as several minor features (yes, for better or worse, OpenPGP is that feature rich). For those looking for more information, the RFC is probably the best place to start: it is highly readable, and this introduction should hopefully make it easy to navigate.
+** Passwords
+
+What are passwords used for? (Symmetric encryption and protecting the
+private key.) Passwords are not used to protect asymmetric
+encryption.
+The reason for having a password is to protect the key if
+the device is compromised (e.g., by malware or theft). Thus, a weak
+password does not mean weak transport security; the security of the
+transport is provided by, e.g., RSA encryption. If the threat model is
+typical of a private individual, then using a password manager and a
+relatively weak password is acceptable.
+
+How to generate a strong password: need to be able to measure entropy.
+A long passphrase doesn't mean anything: if it is a line from a song,
+it is probably weak. The NSA probably tries all of Wikipedia in
+various forms in the first few hours of trying to crack your password.
+The only secure way is to use Diceware.
+
+Snowden: "Assume your adversary is capable of one trillion guesses per
+second." To withstand one year, need 65 bits of entropy! How to
+measure a password's entropy? Need a random password. But that's
+impossible to memorize. Unless we encode it smartly!
+
+*** Diceware
+
+Encode using a simple word list:
+
+ - /dev/random? 1k words (10 bits of entropy per word)
+ - dice? $6^4 = 1296$ words (10.3 bits of entropy per word)
+
+Secure even if adversary knows the word list!
+
+Examples:
+ 1. able
+ 2. about
+ 3. above
+
+ Required length:
+ 80 bits = good = 8 words
+ 120 bits = strong = 12 words
+
+ Examples:
+ - percent burst able smash opposite ready blind stab
+ - pipe after harm person split seize radar about
+
+Word lists: Diceware (8k). PGP Biometric word list (512). Voice of
+America's simple English word list (1.5k).
+
+** Key Generation
+
+On the surface, generating a key is easy: you just need to call
+~gpg --gen-key~. For many users, this is appropriate. But, if you have
+an elevated threat model, this is probably not the right approach.
+
+*** Private Key Management
+
+- Low security - Online key. Default for ~--gen-key~. (When is this
+  okay? Laptop is encrypted. If the keys were stolen by an unknown
+  attacker, they are unlikely to be cracked. Must trust all local
+  software.)
+
+- High security - Use a smartcard
+
+ Key stored on a smartcard (Gnuk, Nitrokey, etc.)
+
+ Should use subkeys (setup is slightly more complicated).
+
+ Should store backups on a USB stick, because you can't export the
+ private key from the smartcard. Also: what happens if the smartcard
+ gets lost?
+
+ Much higher security: crypto can only be done when the card is
+ inserted, but it is often not obvious what the operation is.
+
+ Note: easier to explain crypto when using a smartcard.
+
+
+Use Tails: Hardened. Wipes memory on shutdown.
+
+Managing the key:
+
+ - Boot from a USB stick? Medium security: BIOS might be infected,
+ etc.
+
+ - Dedicated offline computer? High security, but still susceptible
+ to BadUSB! An old IBM x40 or x60 costs <50 Euros on eBay. Remove the
+ wireless network card!
+
+*** Key Rotation
+
+Do set an expiry. (GnuPG defaults to 2 years when creating new keys
+these days.) This guarantees that the key eventually becomes invalid,
+even if it is never revoked. People forget their passphrase and they
+don't back up their revocation certificate. This is a nice emergency
+brake.
+
+Alternative: set a designated revoker. Describe what that means and
+the risks. Show example of how to do it.
+
+Note: Can easily extend expiration. Expiration bonus: forces people
+to refresh keys.
+
+Note: When generating a new key, cross-sign the keys; the revocation
+message is just for humans.
+
+How to approximate forward secrecy.
+
+
+** Validating Keys
+
+Why is validation important? (What is a MitM attack? Why can't keys
+be validated by a machine?)
+
+How validation works on the web: X.509---centralized and completely
+broken.
+
+The traditional way to do this in the OpenPGP world is to use the WoT.
+Describe how the WoT works. Talk about why it is hard to use: Key
+signing parties are for geeks. Even exchanging fingerprints in person
+is inconvenient.
+
+Alternatives? If you don't already have the key on a business card,
+just pick up the phone (note: ~--with-icao-spelling~). Talk about why
+using the same medium for getting the fingerprint is not good.
+If you want to send an email, then it might be reasonable to use, say,
+Twitter direct messages to bootstrap a conversation. Both are *much*
+more secure than no check. How to sign (~--sign-key~ and
+~--lsign-key~).
+
+Talk about TOFU as an alternative. Its limitations. Nevertheless, in
+practice, probably good. New trust model (since v2.1.10, Dec. 2015).
+Checks identity / key consistency. Model used by ~ssh~. No user
+intervention required. To enable, add the following to ~gpg.conf~:
+~trust-model tofu+pgp~. Can also set ~tofu-default-policy good~.
+
+Talk about the ~direct~ and ~always~ trust models and what they are
+good for.
+
+Talk about how to verify: a specific short key ID can be faked in just
+a few seconds. Even long key IDs are not immune to collisions.
+Talk about evil32 / the scallion tool.
+
+*** Key Discovery
+
+How do you find a key? Traditionally, there are two ways: either via
+a business card or web site, or by looking on a key server. The former
+is good, but inconvenient; the latter is very, very bad. Key servers
+are not trusted. Anyone can forge a user ID, etc. Talk about WKD and
+how it works. Give examples of how to deploy WKD.
+
+Talk about keybase.io, Autocrypt, and pEp.
+
+** GnuPG's Architecture
+
+Talk about GnuPG's split architecture. Explain why it is important to
+separate gpg from gpg-agent, scdaemon, pinentry and dirmngr:
+components run in their own address spaces, which reduces the impact
+of bugs (think Heartbleed).
+
+This is different from 1.4.
+
+GPG is for low security---session encryption, encoding, etc. GPG
+Agent for security operations: password manager, private keys, etc.
+Similar to the split between a PC and a smartcard. (In fact, it is
+possible to run gpg-agent as a different user or on a different
+machine.)
+
+Smartcard Daemon: Interacts with smartcards (directly or via PC/SC).
+Note: typically packaged separately as ~scdaemon~.
+
+Pinentry: for interacting with the user (not only passwords, but also
+questions). Started by gpg-agent.
+Talk about trusted windows and why they are important. Several
+different implementations provide tighter integration with a desktop
+environment. But, these come at a cost to security (much more
+complicated).
+
+Pinentry fallback if there is no GUI. How to set ~GPG_TTY~ and why it
+is necessary. Talk about different pinentry configuration options and
+what they are good for. In particular, talk about how password
+caching works. Talk about ~gpg-preset-passphrase~ and loopback
+pinentry (~--pinentry-mode loopback~) in this context. Also talk about
+~--keep-tty~ / ~--keep-display~.
+
+The directory manager (~dirmngr~) is really the network component.
+Interacts with keyservers (HKP, LDAP, HTTP)
+(~--search-keys email@example.com~, ~--recv-key keyid~). Certificate
+and CRL cache for gpgsm. Talk about different options. In particular,
+how to best configure ~dirmngr~ with ~use-tor~, etc.
+
+Give some details about the different sockets.
+
+*** gpg-connect-agent
+
+What it does (communicate with the different components). How to use
+it. Fact that it exposes a command line interface. Use help to
+figure out what to do.
+
+Show how to script with gpg-connect-agent, e.g., shutting down a
+server.
+
+*** Signals
+
+Talk about how, e.g., SIGUSR1 can be used to cause gpg-agent to dump
+debugging information.
+
+*** Assuan
+
+Talk about Assuan. IPC protocol. Pipe / socket based. Very simple,
+text-based interface. No interface definition language (IDL). Show
+example of a pinentry session calling ~getpin~.
+Can use ~gpg-connect-agent~ to connect to the running GPG Agent.
+
+Assuan is a separate package from gpg. Anyone can use it.
+
+*** Debugging
+
+Due to the distributed nature of the architecture, it can be hard to
+figure out what went wrong (error messages become more generic as they
+are passed further along the stack).
+
+~watchgnupg~ helps. Tool for gathering log entries.
+
+In ~gpg-agent.conf~, add:
+
+ - ~log-file socket:///home/USER/.gnupg/S.log~
+ - ~debug-level basic~ (or ~advanced~ or ~expert~)
+
+Run: ~watchgnupg --force /home/USER/.gnupg/S.log~
+
+How to set up a test environment. (Talk about ~GNUPGHOME~ /
+~--homedir~ and where the daemons live.)
+
+*** Configuration
+
+Talk about gpg.conf, gpg-agent.conf, etc.
+
+Talk about ~gpgconf~.
+
+** Good Practices and Tips
+
+*** Refresh Keys
+
+When you get a signed message, fetch the key.
+
+Refresh keys regularly. Why? New preferences. Revocation
+certificates. WoT updates.
+
+Note: Don't use ~gpg --refresh-keys~. Install parcimonie. Uses Tor.
+Random intervals between key refreshes reduce the chance of targeted
+attacks and of leaking who you are sending messages to.
+
+*** Key Disclosure
+
+What to do if you have to disclose the encryption key for a message?
+Don't disclose your private key! This allows decryption of all
+messages. Just disclose the session key. Show example of
+~--show-session-key~.
+
+*** Backups
+
+Don't back up the RNG's seed! Exclude ~.gnupg/random_seed~ from backups!
+
+*** ssh
+
+Keys Instead of Passwords. Using keys means the password is not sent
+to the server. Ever entered your password on the wrong server? You've
+just disclosed your password!
+
+
+OpenSSH stores private keys on the hard drive. Keys are protected by a
+passphrase. The passphrase is cached by ~ssh-agent~.
+
+GnuPG implements the ssh agent protocol. GnuPG can use keys stored on
+a smartcard.
+
+Configuring GnuPG's ssh agent:
+
+Set ~SSH_AUTH_SOCK~ in ~.bashrc~:
+
+#+BEGIN_EXAMPLE
+export SSH_AUTH_SOCK=$HOME/.gnupg/S.gpg-agent.ssh
+#+END_EXAMPLE
+
+Add ~enable-ssh-support~ to =~/.gnupg/gpg-agent.conf= and restart
+gpg-agent. Then add the public key to the ~.ssh/authorized_keys~ file
+on the server. The public key can be obtained by running:
+
+#+BEGIN_EXAMPLE
+$ ssh-add -L
+ssh-rsa AAAAB3NzaC1...zyt cardno:000603016636
+#+END_EXAMPLE
+
+*** Remote gpg-agent
+
+gpg can use a remote gpg-agent.
+The agent may be running on another computer or as a different user.
+
+ - Create a new user, ~gpg~.
+ - On the secure PC, add the following to ~.gnupg/gpg-agent.conf~:
+
+#+BEGIN_EXAMPLE
+extra-socket /home/gpg/.gnupg/S.gpg-agent-remote
+#+END_EXAMPLE
+
+On the insecure PC, run the following to forward the socket:
+
+#+BEGIN_EXAMPLE
+$ ssh -f -o ExitOnForwardFailure=yes -o StreamLocalBindUnlink=yes \
+> -L /home/neal/.gnupg/S.gpg-agent:/home/gpg/.gnupg/S.gpg-agent-remote \
+> gpg@localhost "bash -c '{ while sleep 5; do echo NOP; done } | gpg-connect-agent'"
+#+END_EXAMPLE
+
+Requires OpenSSH 6.7 or later, which supports forwarding Unix domain
+sockets. (The outer double quotes are needed because ssh joins its
+arguments into a single string that the remote shell parses again.)
+
+Note: ~ExitOnForwardFailure=yes~ makes ssh exit if the forwarding
+can't be set up, and ~StreamLocalBindUnlink=yes~ removes the local
+socket first if it already exists.
+
+This forwards the file ~.../S.gpg-agent~ on the insecure host to the
+file ~.../S.gpg-agent-remote~ on the secure host.
+
+Note: ssh won't expand tildes, so use absolute paths.
+
+The loop keeps the connection open and the socket forwarded. (Could
+also use ~autossh~.) It exits when gpg-agent exits.

** MUA Integration

This chapter provides some guidelines on integrating GnuPG into a mail user agent (MUA). These are recommendations based on our experience. At the time of this writing, KMail and Enigmail probably have the best GnuPG integration. This is not to say that their user interfaces could not be improved. However, they are reasonable starting points for thinking about how to integrate GnuPG support into an MUA.

*** Key Generation

Generally speaking, there are two ways to integrate GnuPG support into an MUA. GnuPG can either be supported natively (e.g., KMail and Claws Mail), or it can be added via a plug-in (e.g., GpgOL, Enigmail, and GPGTools). When GnuPG support is added via a plug-in, it is reasonable to prompt the user to create a key when the plug-in is started for the first time. If GnuPG support is native, but most users are not expected to use it, we recommend not automatically starting a key generation wizard. Instead, the wizard should be accessible via a menu.
This doesn't mean that, if the user has not configured GnuPG, all relevant functionality should be hidden. On the contrary, we strongly recommend that the mail composition window include an "encryption" button. If the user clicks on the encryption button and encryption support has not yet been configured, then the key generation wizard should be started. Similarly, when viewing a message, the message should be displayed as being insecure. If the user clicks on the insecure notice, the MUA should explain why the message is considered insecure, and provide an option to configure GnuPG support. This significantly improves discoverability.

The key generation wizard should not only prompt the user to generate a new key, but also provide an option to import an existing key. It is recommended that the key generation wizard look for an existing key using WKD and the key servers. If a key is found, then the user should be asked whether they have perhaps used GnuPG in the past and would like to import the key.

The key generation wizard should default to the user's default user ID. However, it should show something like a combo box to allow the user to select a different user ID. It is possible to associate multiple user IDs with a key. Nevertheless, the GnuPG project recommends using one key per user ID, because this simplifies the user interface, and it is essential that the initial setup phase be as simple as possible. It also simplifies revocation: users apparently find revoking a key easier to grasp than revoking a user ID. If users are expected to be more technically sophisticated, then it is reasonable to provide a series of check boxes, one for each configured email address. In either case, there should be a menu option to add an additional user ID to an existing key.

The key generation wizard should prompt the user for as little information as possible, and use as many defaults as is reasonable.
This means that the user should not be prompted for a comment, which is almost never appropriate\nbsp{}cite:gillmor-user-id-comments. Likewise, key generation parameters, including the key length, should not be shown. If it is deemed absolutely necessary that the user be allowed to tweak these settings, then they should nevertheless be hidden unless the user explicitly enables some sort of expert mode. In any case, a 2048-bit RSA key is currently considered sufficiently secure. And, if more security is required, we recommend that users focus on other aspects of their operational security rather than increasing the key size. The easiest and probably most effective improvement is to use a smartcard. Bruce Schneier argues that Snowden's leaks show that the NSA has not broken strong cryptography, but instead attacks the infrastructure and the endpoints\nbsp{}cite:schneier2013staying-secure.

There are also practical reasons for not using an overly large key. Perhaps the most important one is simply performance. Creating a signature (or decrypting a message) with a 4096-bit RSA key does not take twice as long as with a 2048-bit RSA key, but nearly an order of magnitude longer. This is because the cost of RSA private-key operations grows roughly with the cube of the key size. This performance penalty becomes particularly noticeable for 16384-bit keys.

Suppose a key has been created, but the user clicks on encrypt while using an email address that does not have an associated key. In this case, the key generation wizard should be shown. It might be reasonable to prompt the user to add the email address to an existing key. But again, this requires educating the user, and it appears to be simpler to just default to creating one key per email address.

*** Sending Mail

The message draft composition window should have a button to secure the message. There should be a menu attached to the button, which allows the user to choose between "Encrypt and Sign", "Sign-only", "Encrypt-only" and "Do not encrypt or sign." This menu could be activated by a long press.

However the menu is activated, it should be reserved for advanced users. Normal users should only have to consider two options: secure, in which the message is encrypted and signed, and insecure, in which the message is neither encrypted nor signed. The reason is that web browsers have trained users to think of an encrypted connection as providing not only confidentiality, but also authentication. The value of attempting to reeducate users here is questionable, given that both are normally wanted anyway.

In some cases, it may make sense to always enable encryption by default, and then require that the user explicitly disable it if it is not desired. This avoids mistakenly sending a message insecurely when it should have been secured. However, this default can be annoying for users who normally do not use encryption. An MUA can deal with this dilemma more intelligently by considering the context. For instance, if the user is replying to an encrypted message, then encryption should be enabled, and the user should have to explicitly disable it if it is not desired. In fact, if the user does disable encryption in this case, showing a warning is advisable. Similarly, if the user has recently sent an encrypted mail to a recipient, or there is a verified key available for the recipient, then encryption should be turned on.

It is possible to go further and automatically attempt to find keys using the ~auto-key-locate~ feature of ~gpg~. If the key is retrieved using WKD and the user is using the TOFU model, it is probably reasonable to use the key without requiring additional input from the user. If the key is fetched from a key server based just on the email address, then the key should probably be considered invalid.

When the user is adding recipients to an email, it is a good idea to display an icon next to each email address that uses color and shape to show whether any associated key is verified.
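
The context-dependent default described above can be sketched as a small policy function. Its inputs are assumptions about what the MUA tracks: how each recipient's key was obtained, and whether the user recently sent that recipient encrypted mail. Requiring every recipient to qualify is our own choice, since a message can only be encrypted if a key is available for each recipient. All names are hypothetical.

#+BEGIN_SRC python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class KeySource(Enum):
    VERIFIED = "verified according to the trust model"
    WKD = "retrieved via WKD"
    KEYSERVER_BY_EMAIL = "found on a key server by email address only"
    MISSING = "no key"


@dataclass
class Recipient:
    address: str
    key_source: KeySource
    recently_sent_encrypted: bool = False  # hypothetical MUA history flag


def key_usable(source: KeySource, using_tofu: bool) -> bool:
    """Whether a key may be used without asking the user."""
    if source is KeySource.VERIFIED:
        return True
    if source is KeySource.WKD:
        return using_tofu  # acceptable without further input under TOFU
    # Key-server lookups by email address alone are treated as invalid.
    return False


def encrypt_by_default(recipients: list[Recipient], replying_to_encrypted: bool,
                       using_tofu: bool) -> bool:
    """Decide whether the secure button starts out enabled."""
    if replying_to_encrypted:
        return True  # disabling it should then trigger a warning
    if not recipients:
        return False
    return all(r.recently_sent_encrypted or key_usable(r.key_source, using_tofu)
               for r in recipients)
#+END_SRC
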

If the user hovers over a recipient's icon or clicks on it, a short (at most, tweet-length) message should be displayed explaining why the key is considered verified (or not). If a key is not verified, there should be an option to verify the key now. This should show a key verification wizard, which is described below. In general, keys that are only marginally verified should not be shown as verified. If treating them as verified is desired, it is better to set the ~marginals-needed~ option to ~1~.

When sending a mail with ~Bcc~ recipients, the MUA should create a separate mail for each ~Bcc~ recipient. It is possible to hide a recipient's key ID in a message by using a speculative key ID. However, this still reveals to the intended recipient that the message may have been copied to other users. Using separate emails avoids this leak.

In general, if a draft is saved on the IMAP server, it should be encrypted to the user (and not to any recipients, who should only be able to decrypt the final version). It may be reasonable to relax this requirement and only encrypt drafts of mails that have been marked for encryption. However, this should only be done in cases where it is clear that the user is relying on encryption for privacy purposes and not for security reasons.

When sending a mail, it is important to also encrypt the mail to the user: most users expect to be able to decrypt the mails that they send. This can be done using the ~encrypt-to~ option of ~gpg~, or by explicitly adding the sender as a recipient when encrypting the email.

It can be very useful to have an option to attach the user's public key to an email. Receiving a key can be surprising to users who don't use or know about GnuPG. But if the email is encrypted or signed, this is not a concern. The message already contains OpenPGP data, and in the encrypted case you know that the recipients' MUAs understand OpenPGP. When attaching a key, it is reasonable to just include a minimal version of the key.
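
The recipient rules above (one copy per ~Bcc~ recipient, always encrypting to the sender) can be sketched as follows. For readability, the sketch identifies keys by email address, which ~gpg --recipient~ accepts; a real MUA would pass key fingerprints. It builds command lines rather than running them. The last function exports a minimal key for attaching, using the ~export-minimal~ export option of ~gpg~. The data structure and function names are hypothetical.

#+BEGIN_SRC python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class OutgoingCopy:
    deliver_to: tuple[str, ...]  # envelope recipients of this copy
    encrypt_to: tuple[str, ...]  # keys (here: addresses) it is encrypted to


def _unique(addresses) -> tuple[str, ...]:
    """Drop duplicates while keeping the original order."""
    return tuple(dict.fromkeys(addresses))


def plan_copies(sender: str, to: list[str], cc: list[str],
                bcc: list[str]) -> list[OutgoingCopy]:
    """Split a message into one copy for To/Cc and one per Bcc recipient."""
    copies = []
    visible = _unique([*to, *cc])
    if visible:
        # Also encrypt to the sender so that the sent copy stays readable.
        copies.append(OutgoingCopy(visible, _unique([*visible, sender])))
    for addr in _unique(bcc):
        # Each Bcc copy is encrypted only to that recipient and the sender,
        # so no copy's key IDs reveal a Bcc recipient to anyone else.
        copies.append(OutgoingCopy((addr,), _unique([addr, sender])))
    return copies


def encrypt_argv(copy: OutgoingCopy) -> list[str]:
    """Build a gpg command line that signs and encrypts one copy."""
    argv = ["gpg", "--batch", "--armor", "--sign", "--encrypt"]
    for key in copy.encrypt_to:
        argv += ["--recipient", key]
    return argv


def minimal_export_argv(fingerprint: str) -> list[str]:
    """Build a gpg command line that exports a minimal public key."""
    return ["gpg", "--armor", "--export",
            "--export-options", "export-minimal", fingerprint]
#+END_SRC
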

A minimal key does not need to include any third-party certifications: once the recipient has your key, it is easy to get the rest of the data from a key server. A minimal key can be exported by passing ~--export-options export-minimal~ to ~gpg --export~.

*** Reading Mail

**** Verifying Messages

When a user views an email, it is important to communicate whether the contents were transferred in a secure fashion. In web browsers, this type of information is usually shown using a small padlock icon in the address bar. Firefox, for instance, shows a green padlock if it transferred the website in an encrypted manner, /and/ it could authenticate the end point. It uses a gray padlock with a yellow warning triangle in two cases: if only some of the content was encrypted, so that eavesdropping was possible, or if the website used a self-signed certificate. It uses a gray padlock with a red strikethrough if a man-in-the-middle attack was possible. And it just shows a neutral "more information" icon if TLS was not used at all\nbsp{}cite:mozilla-padlock.

There are two important issues with this scheme. The first issue is that this scheme conflates encryption and authentication. It might be reasonable to demand that websites that use authentication also use encryption to be considered secure: this simplifies user training, and doesn't impose a significant deployment cost. But this argument doesn't apply in an email setting. Consider, for instance, a company that wants to sign all of its outgoing emails to help mitigate phishing. In this scenario, encryption is more of a hindrance than a help: requiring encryption would mean that the company would have to somehow find the right encryption key for each of its correspondents. When only an authentication mechanism is provided, not only are the customers' keys not required, the customers don't even need to have a key: they just need the ability to validate the signature.

Second, a TLS connection that can't be authenticated is considered worse than a connection that is completely insecure. For instance, until the recent introduction of /Let's Encrypt/, website operators who didn't want to pay for a certificate sometimes used a self-signed certificate to improve their website's security. Data protected by such certificates is not secure in the sense that the end point can't be authenticated. But such certificates enable encryption, which does protect users from passive surveillance. In other words, self-signed certificates provide more protection than nothing at all, yet websites that use self-signed certificates are shown as being less secure than sites that use no protection at all! Happily, at least the Chrome browser does not make this distinction.

Likewise, we strongly recommend that whatever mechanism is used to show that a mail can't be authenticated be applied equally to unsigned mails and to mails with a signature that can't be verified. Specifically, we recommend treating a traditional email as the baseline. An email should never be displayed in a way that would lead the user to consider it less secure than the baseline, unless there is strong evidence of an attack. It is reasonable to show unverified messages and unsigned messages in a neutral manner, and to show verified messages in a positive manner. However, it may also be reasonable to show unverified messages and unsigned messages in a negative manner. This is how MS Outlook behaves when S/MIME is enabled. This has the added advantage that it may prompt the user to learn why a message is shown as being unsafe.

The first step in checking whether a signature is authentic is to check whether the signing key is verified according to some trust model, e.g., the web of trust. When verifying an email, another step is required: it is also necessary to make sure that the key is controlled by the sender.
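
The baseline rule above can be expressed as a mapping from signature status to display state. The status values and display names are hypothetical. Treating a signature that fails to verify as evidence of an attack is our assumption. An MUA may prefer a milder treatment, since mailing lists that rewrite message bodies can also invalidate signatures.

#+BEGIN_SRC python
from enum import Enum


class SigStatus(Enum):
    UNSIGNED = "no signature"
    BAD = "signature does not verify"
    KEY_MISSING = "signing key not available"
    KEY_UNVERIFIED = "signing key not verified by the trust model"
    VERIFIED = "signing key verified and controlled by the sender"


class Display(Enum):
    NEUTRAL = "shown like a traditional email (the baseline)"
    POSITIVE = "shown as verified"
    WARNING = "shown as a likely attack"


def display_for(status: SigStatus) -> Display:
    """Map a signature status to how the message is presented."""
    if status is SigStatus.VERIFIED:
        return Display.POSITIVE
    if status is SigStatus.BAD:
        # Assumption: a signature that fails to verify counts as strong
        # evidence of tampering.
        return Display.WARNING
    # Unsigned mail is the baseline; a signature that merely cannot be
    # verified must never make a message look worse than that.
    return Display.NEUTRAL
#+END_SRC

An MUA following the Outlook approach would instead render ~NEUTRAL~ negatively. The important property is that unsigned and unverifiable mail are presented identically.
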

Checking that the key is controlled by the sender can be done by verifying that the email address in the email's ~From~ header actually appears in one of the key's verified user IDs. This is necessary to prevent an attacker from reusing a message in a different context. For instance, assuming Romeo trusts his father, his father could write an email that appears to come from Juliet, but sign it with his own key. If we didn't check that the ~From~ header and the signer's user ID agree, then Romeo would see that the message is verified.

Just checking that the sender matches a verified user ID is not actually enough to prevent replay attacks. It is also necessary to make sure that the signature's embedded timestamp is similar to (i.e., within a few hours of) the email's timestamp. If the timestamp in the email is years later than the one embedded in the signature, then the email may be part of an attempted replay attack. Similarly, it is possible to change the recipient. For instance, Juliet might send the following signed message to Paris: "Go away, I do not love you!" But Paris, realizing that Romeo and Juliet are in love and hoping to trick Romeo, might simply send a copy of the message to him with the ~From~ header set to Juliet. These types of attacks can be mitigated by also verifying the mail headers, and the Memory Hole project was started to do this\nbsp{}cite:memory-hole. Unfortunately, the standard was never fully developed. Nevertheless, there is enough information there to understand the intent, and several mail clients, including Enigmail and Mailpile, implement it.

The web of trust provides three verification levels: a key can either be fully verified, marginally verified or not verified. (Note: normally "trusted" is used here instead of "verified." We reserve the term trusted for when a key is not just verified to be controlled by the stated entity, but may act as an introducer.) And the TOFU trust model provides even finer-grained verification levels.
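
The two checks described above can be sketched with Python's standard =email.utils= module. The first check is that the ~From~ address appears in a verified user ID. The second is that the signature's creation time is close to the email's timestamp, which we assume to be the ~Date~ header. The six-hour tolerance is a hypothetical value for "a few hours", and comparing addresses case-insensitively is a simplification.

#+BEGIN_SRC python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from email.utils import parseaddr, parsedate_to_datetime

# Hypothetical tolerance for "within a few hours".
MAX_SKEW = timedelta(hours=6)


def _address(text: str) -> str:
    """Extract the bare, lowercased address from a header or user ID."""
    return parseaddr(text)[1].lower()


def sender_matches_key(from_header: str, verified_user_ids: list[str]) -> bool:
    """True if the From address appears in one of the key's verified user IDs."""
    sender = _address(from_header)
    return bool(sender) and any(_address(uid) == sender
                                for uid in verified_user_ids)


def timestamps_consistent(date_header: str, sig_created: int,
                          max_skew: timedelta = MAX_SKEW) -> bool:
    """True if the signature time (seconds since the epoch, UTC) is close
    to the time given in the message's Date header."""
    sent = parsedate_to_datetime(date_header)
    if sent.tzinfo is None:  # "-0000" means the zone is unknown; assume UTC
        sent = sent.replace(tzinfo=timezone.utc)
    signed = datetime.fromtimestamp(sig_created, tz=timezone.utc)
    return abs(sent - signed) <= max_skew
#+END_SRC

Neither check stops the change-of-recipient attack described above. That attack requires protecting the headers themselves, as Memory Hole attempts to do.
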

These verification levels convey information. As a rule of thumb, marginally verified keys should *not* be shown as having the same level of security as fully verified keys. Instead, data signed by fully verified keys should be shown in, say, green, and data signed by marginally verified keys should be shown in, say, yellow. If it is desirable that marginally verified keys have the same security level as fully verified keys, then it is better to explicitly set the ~marginals-needed~ option in the user's ~gpg.conf~ file to ~1~.

Each verified component should have its own lock icon, and there should be a menu associated with the icon. This menu should very briefly explain how the key's validity was determined. It should also offer the user the option of verifying the key. If the user selects this option, then a verification wizard should be shown. This wizard should prompt the user to check the key as it appears on a business card, or to call the key's owner. To make sure the user doesn't simply click on yes without checking the key, the wizard can show pieces of the fingerprint mixed with invalid pieces, and have the user select the right ones. This is clearly less rigorous than a full check, but it is much better than nothing. Other options for verifying fingerprints include scanning something like a QR code.

**** Multi-part Emails

Thanks to inline signatures, it is trivial to make a message that is only partially verifiable. For simplicity's sake (we don't want to confuse the user), it is tempting to treat such messages as insecure like web browsers do. However, some companies and some mailing lists automatically append a footer to all messages. This modification turns a message that is otherwise completely verifiable into one that contains a part that isn't signed. Thus, messages coming from these sources would never show up as secure. A simple solution is to mark each section individually. This can be done by using a special background, or a frame.
Ideally, the background should be unique for each user to prevent mimicry attacks in which the attacker sends an unsigned message that appears to be signed.

MIME, however, makes it easy to transfer rich content that can logically consist of multiple pieces, only some of which are signed. Thus, it is entirely possible that a single logical entity might include some content that is verified and some that is not. This suggests the following rule: if a message includes at least one verified part, then the MUA should only show the verified parts, and warn the user that the message contained unverified content that is hidden. The warning may reasonably include an option to show the unverified parts. At that point the message should be displayed as insecure.

**** Importing Keys

It is not unusual to receive a signed message whose signing key is not available locally. In this case, the MUA should fetch the signing key in the background. This isn't a problem in terms of trust, because the keyring is not trusted. A key is considered verified based on the trust model and its parameters. For instance, in the case of the web of trust, signatures and GnuPG's ownertrust setting are used to determine which keys are considered verified. Fetching missing keys automatically can also be enabled by setting GnuPG's ~auto-key-retrieve~ option. This option has the disadvantage that it can potentially block the ~gpg~ process for a relatively long time.

There are two potential disadvantages to automatically importing keys. The first disadvantage is that automatically fetching data via the network can be used as a back channel. A sophisticated attacker could create a new key for each message. When the user fetches the key, an attacker who can see the traffic learns not only that the user opened the message, but also the user's IP address. This attack can be mitigated by routing traffic via Tor (to do this, Tor must be installed, and Tor support must be enabled by adding ~use-tor~ to ~dirmngr.conf~).
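
Concretely, the two options mentioned above live in different files in the GnuPG home directory (by default =~/.gnupg=). A minimal sketch of the relevant lines, assuming GnuPG 2.2 or later:

#+BEGIN_SRC conf
# gpg.conf: fetch missing signing keys while verifying
# (this can block gpg while the key is being retrieved).
auto-key-retrieve
#+END_SRC

#+BEGIN_SRC conf
# dirmngr.conf: route network lookups, including key fetches, via Tor.
# Tor must be installed and running.
use-tor
#+END_SRC

~dirmngr~ reads its configuration when it starts, so a running instance can be restarted with ~gpgconf --kill dirmngr~. It is started again on demand.
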

Routing key fetches through Tor not only hides the user's IP address, but also requires the attacker to actually control the user's preferred key servers to observe the fetch. The second disadvantage is that GnuPG doesn't handle very large key rings (those with thousands of keys) very well. This can be partially mitigated by deferring trust calculations: set ~no-auto-check-trustdb~ in ~gpg.conf~ and run ~gpg --check-trustdb~ from cron. However, the only long-term fix is to improve the way that keys are stored on disk.

If a message is neither encrypted nor signed, the sender may still indicate that she has a key. This can be done using a special mail header\nbsp{}cite:smasher2014openpgp-mail-header. In such cases, it is reasonable to proactively fetch the indicated key. Again, locally available keys are not treated specially, but the header provides a hint about what the right key may be. Autocrypt uses a similar signaling mechanism, but includes significantly more machinery\nbsp{}cite:autocrypt.

Sometimes users include their own or other people's keys as attachments. The latter is particularly common as a way to introduce people in a multi-party discussion. And pEp does this as a matter of course. Again, it is reasonable to import these keys with the aforementioned caveats.

**** Unencrypted Version

The OpenPGP email workflow assumes that messages are stored on an untrusted server and thus continue to need protection even after the mail has been delivered. This is one of the reasons that OpenPGP doesn't provide forward secrecy. Nevertheless, it may be reasonable to cache the unencrypted version of messages locally if the user believes the device is secure. At a minimum, the mail should be stored on an encrypted partition. Storing the plaintext locally has two main advantages. First, the user only has to decrypt a given message once on each device. Second, it is possible to search the contents of the encrypted mail.
The advantage of the latter should not be underestimated: the inability to search encrypted mail is a serious usability issue.

*** Key Management

We have already discussed several aspects of key management above. Nevertheless, there are still several important points that need to be addressed.

**** Ownertrust

It is extremely important that the ~ownertrust~ be well hidden. In particular, it should only be possible to set the ~ownertrust~ if the key in question is already verified (e.g., signed). When the ~ownertrust~ option is shown, it should be well explained. Unfortunately, only experts understand what "ultimately" trusted means. Further, making someone a trusted introducer requires understanding that person's signing policy: does she just sign every key that seems reasonable so that messages show up as green? As it is so hard to get users to validate fingerprints, it is unlikely that we will convince them to discuss their security practices with each other.

**** Address Book

A key ring is effectively a backwards address book. It is backwards in the sense that keys are primary and identities are secondary. Consequently, it is better to integrate with the user's primary address book as much as possible. This is not only more familiar, but it doesn't require a paradigm shift. If the address book supports identities with multiple email addresses, then it should be possible to associate these with multiple keys. This is a convenient grouping function.

*** TODO
- kmail combines smime and openpgp in a single operation (e.g., signing).
+
+** Programming with GnuPG
+
+~--batch~, ~--status-fd~ and ~--command-fd~.
+
+Writing tests: use ~--faked-system-time~. (Talk about how it works.)
+~gpg-compose~ for creating test data.
+
+Talk about GPGME.
+
+** Misc.
+
+Topics are probably better integrated someplace else:
+
+- gpgv
+
+- ~/etc/skel/.gnupg~
+
+- keyring vs. keybox. Talk about kbxutil.
+
+- What's a keygrip.
+
+- More tools: encrypted mailing lists (schleuder), form encryption
+  (Kuvert),
+
bibliographystyle:unsrt bibliography:bib.bib
# LocalWords: cryptographic decrypted decrypts reauthenticate hoc
# LocalWords: decrypting constrainted TripleDES OpenPGP's subkey
# LocalWords: packetized subkeys unforgable subpackets unexportable
# LocalWords: introducer Introducers subpacket OpenPGP SED AES WKD
# LocalWords: cryptosystem ESK auditable reencrypted decrypt HTTP's
# LocalWords: unbuffered apriori plaintext encypted data's OPS's
-# LocalWords: revoker verifications unhashed introducers
+# LocalWords: revoker verifications unhashed introducers natively
+# LocalWords: strikethrough MUA ownertrust Mailpile keyserver cron