Skip to content ↓

zero knowledge by Vincent H. '23

^ not you after reading this post!

one of the things i’m working on outside of class this semester is a zero-knowledge cryptography project with 0xparc, a crypto research organization. i got involved with 0xparc because it’s actually led by an mit undergrad; you can read more about the group on their website. my project is to produce a zk-snark implementation of an elliptic curve pairing; those words probably don’t make much sense to you, so here’s a short (hopefully) beginner-friendly crash course on what’s going on, explained using as little math as possible

i. zero-knowledge proofs


a zero-knowledge proof is essentially when you want to prove something about an object without revealing any other information about the object. for example:

  • i want to prove to you that my friend is 21 years old without revealing their name or drivers license or other personal information
  • i want to prove to you that i know the password to an account without revealing the password
  • i want to prove to you that i took a photo with a famous celebrity without showing you the photo

at first glance you might think these kinds of proofs are impossible, but when conducted in the right setting with the right assumptions they can actually be mass-produced. here are two examples of zero-knowledge proofs: 

  • my friend is red-green color-blind. i have a red ball and a green ball that are otherwise identical, and i want to prove to my friend that the balls have different colors
    • proof: i give both balls to my friend, one in each hand, leave the room, and give them the option to either swap the balls or keep them in place. then i re-enter the room and tell my friend whether the balls were swapped or not. we repeat this process as many times as necessary until my friend is convinced that the balls have different colors
    • the point here is that for me to keep telling my friend whether the balls were swapped or not, i must have a reliable way of distinguishing the two balls, so if i eg. answer correctly 100 times in a row then my friend can be pretty confident the balls have different colors, since the probability of me guessing correctly 100 times is basically zero
    • here are some examples of arguments that don’t work: we could convince my friend the balls have different colors by asking a bunch of people what the color of each ball is to get a consensus, or we could take pictures of the balls and examine the rgb pixel values, but these procedures aren’t zero-knowledge because my friend learns the color of each ball in the process
  • i want to prove to my friend that two entrances to a building are connected
    • proof: i walk in via the first entrance and exit via the second entrance, while my friend watches from outside
    • again, note that letting my friend walk through the building themselves or showing them the building floor plans or etc. wouldn’t be zero-knowledge proofs, since they reveal information about the interior structure of the building

these are two relatively simple examples of zero-knowledge protocols; with more complicated schemes it’s possible to prove a wide variety of claims in zero knowledge (the technical formulation is that any statement in np can be proven). you can imagine how, instead of proofs about balls and colors or entrances and building layouts, we might care about zero-knowledge proofs involving more important objects like passwords or identities or bank accounts. so it shouldn’t be surprising that many of the authentication and authorization schemes keeping the internet secure are actually zero-knowledge proofs of sorts. if you want to continue learning about these topics with more mathematical context i recommend this blog post

ii. zk-snarks


zk-snark stands for zero-knowledge succinct non-interactive argument of knowledge. the zero-knowledge… argument of knowledge part basically just refers to zero-knowledge proofs from the previous section, so the only new qualifiers here are succinct and non-interactive

succinct here means that the proof takes a constant amount of space. for instance, the ball color example from the previous section is not succinct, since depending on how skeptical the verifier is, they might ask to repeat the swapping procedure 10, 100, or 1000 times before they are fully convinced that the balls have different colors, so a record of the proof might end up being arbitrarily long. by contrast, the building entrance example is clearly succinct

non-interactive means that the proof procedure doesn’t require any intermediate input from the verifier. once again, the ball color example isn’t non-interactive, since at each step the verifier decides whether to swap the balls or not and whether to enter another round or not, while the building entrance example is in fact non-interactive

so, putting all of that together, a zk-snark is a succinct non-interactive zero-knowledge proof. the most important property of zk-snarks is the following: for any function f that has a finite representation in terms of the basic arithmetic operations, and any desired target output y, there is a zk-snark proving that f(x)=y. or, to put it differently, i can take any finite function f, a secret input x, a public output y, and produce a zero-knowledge proof that f(x)=y without revealing x, and the proof will take up constant space regardless of how complex f is. this fact took me a day to wrap my head around the first time i saw it; it simply makes no sense that this is possible, using constant space, for arbitrary functions f! but cryptography is filled with results that seem absurd at first glance and obvious in hindsight, so you get used to the feeling after a while

why is this important? well, many cryptography algorithms (like rsa and bls) make use of a private key x and a public key y. for instance, in rsa encryption, the private key is basically a password you keep to yourself that lets you decrypt messages intended for you, while the public key is shared with everyone else and is used by other people to encrypt messages they want to send to you. this setup ensures that if someone else on the internet intercepts an encrypted message intended for you, they won’t be able to decrypt it since they won’t know your private key

this also means that ownership of a private key is essentially the same as ownership of an internet identity. since private keys x and public keys y are usually related by some function f with f(x)=y (f is the function used to generate the public key from the private key), if you can put the function f inside a zk-snark then you can use the zk-snark to produce a zero-knowledge proof that you own the private key corresponding to a public key. in other words, you can create zero-knowledge proofs for ownership of digital identity

if you want to learn more about how zk-snarks actually work, this blog post is pretty good

iii. zk identity


in the previous two sections i introduced zero-knowledge proofs and then talked about how zk-snarks enable zero-knowledge proofs of digital identity. this might not seem very exciting by itself, and indeed it’s true that proof-of-identity on its own doesn’t do much. the interesting applications come from mixing proof-of-identity with existing real-world identities. for instance: 

  • you can use proof-of-identity to manufacture anonymity within a group. many members of social groups are afraid of voicing dissenting opinions, the us senate being perhaps the most frustrating example. proof-of-identity could eg. allow a senator to post an anonymous opinion along with a proof that they are one of the 100 us senators, while revealing no knowledge about which senator they are. this is different from a senator giving an anonymous interview to a news reporter because there is real credibility here – we can mathematically guarantee that the anonymous opinion comes from a senator while ensuring it is cryptographically impossible to figure out which senator voiced the opinion. (by contrast, in the anonymous interview scenario, readers don’t know if the reporter is just be making up a story, and senators may be reluctant to conduct an interview because of the potential for identity leakage)
  • on the flip side, you can use proof-of-identity to give credibility to an anonymous voice. eg. maybe an author wants to try experimenting with a new writing style, but they don’t want it associated with their name because they’re unsure how it will be received, so they share it anonymously but attach a zero-knowledge proof that the author has published before or has sold some number of books in the past, so that readers take the anonymous piece more seriously
  • proof-of-identity can be used to prove an identity is consistent as well. for instance, maybe i want to buy asset 1 from marketplace 1 and asset 2 on marketplace 2, and i want people on marketplace 1 and marketplace 2 to know that i own both asset 1 and asset 2 (eg. i bought peanut butter and jelly from separate vendors to get better deals, but want to prove that i can make a peanut butter jelly sandwich). if marketplace 1 lets you import data from your marketplace 2 account or vice versa then this is easy, but these integrations may not always exist. in this case you can create a shared private key and sign the transactions for both marketplaces with the same private key to prove that the people executing both transactions share the same identity. the more general pattern is that you can show a series of actions performed by different accounts on unrelated platforms was performed by a consistent identity, without revealing your real-world identity

of course there is potential for abuse here. for instance, in the first two examples, someone could use proof-of-identity to make harmful statements anonymously while backing up those statements with status. i’m not trying to argue that proof-of-identity will be a digital panacea, but rather that it opens up more flexible ways of thinking about identity and privacy that aren’t currently possible

iv. what i actually work on


now that i’ve explained zero-knowledge proofs, zk-snarks, and zk identity, we finally have all the pieces necessary to understand what i actually work on

in section ii i mentioned that, for any function f defined in terms of basic arithmetic operations, it is possible to create a zk-snark which proves f(x)=y. basically every function of interest can be written in this form, and there are actually programming languages where you write some code to define the function f, run it on some inputs, and transform that function call into a proof that other people can verify. in my case, the group i work with uses the circom language. technically it’s a circuit design language, but writing circom really feels like writing regular code. for example, if i want to create a zero-knowledge proof that two numbers a and b multiply to 24, i can just write the following circom code: 

template CheckSum() {
    signal input a;
    signal input b;
    a*b === 24;

then i compile the code and let the user feed in inputs a and b. if a and b multiply to 24, a zero-knowledge proof is generated; if not, an error is returned. as you might imagine, for more complicated functions we write significantly longer programs (the largest circuit from my project contains 19 million constraints), but the overall process is the same

to actually create all the proof-of-identity applications from section iii, we need to first implement the relevant cryptography protocols as circuits in circom; zk-snarks are a relatively new topic, so many standard libraries and algorithms have yet to be implemented, and this is one of the major bottlenecks in the development of applications that use zero-knowledge technology. in my case, i’m working on the bls digital signature scheme which allows someone to sign a message and then have other people verify that the signature is valid. the bls signature scheme relies on a math function called an elliptic curve pairing which is too complicated to explain here, though if you’re interested there is an mit class on elliptic curves that touches on pairings. so my work consisted of first reading math to understand elliptic curve pairings (this took a few weeks), then writing out the circom code to produce the actual proofs (this took a few months), and now creating some demos and writeups for public consumption (this is still ongoing)

v. other thoughts


there’s no guarantee that zero-knowledge technology will become widespread in the future, or that any of the proof-of-identity applications i mentioned in section iii will ever be widely used. this work is all being done in a relatively niche corner of the internet, and it might be too technical for the general public to ever understand why this stuff is promising. but i’ve enjoyed my work in spite of these uncertainties – it’s been a really fun way to learn about cryptography and work on math that i otherwise would never have touched

one of the more compelling arguments i’ve heard for working on zero-knowledge tech is the following: we are by default headed towards a world where governments and corporations perform mass surveillance and collect large amounts of data that is then stored in centralized databases; this data is often handled carelessly and always runs the risk of being hacked or leaked, leading to scams, identity theft, and breakdowns in privacy. we have seen in the past few decades that legislation and reform will not stop these trends- governments benefit from such surveillance, and large corporations probably won’t really be held accountable for data malpractices because they provide invaluable services that can’t be replaced easily. however, if we can find ways to replace as much authentication on the internet as possible with zero-knowledge protocols, we may be able to shift the world towards futures where the potential impact of data malpractice is reduced (eg. if you’re using zero-knowledge proofs for authentication then hackers can’t steal your login info from a server, because the server never had access to your login info to begin with)

usually, when people suggest fighting tech with more tech like i did in the previous paragraph, my initial instinct is one of skepticism. attempts to fight fire with fire go poorly more often than they succeed. this case is no different – i’d prefer if there were solutions to data insecurity other than building a new layer of tech to move everything onto; i just haven’t heard of any good ones. but it’s also true that the entire history of cryptography is an ever-escalating fight between new protocols and new exploits, so it makes sense that this field in particular might be an exception to the rule