Remember that FHE does not ensure authenticity of your data by default; it provides only privacy.
If you send your server two encrypted numbers and ask it to multiply them for you, the server may send you their sum instead, or just encrypt some constant it chooses and send you that. And you can't check that your data was processed the way you want.
If you want authenticated FHE, that will be even slower.
For anyone who wants a database they can query privately: look at dishonest-majority generic MPC protocols. Here you will need N servers instead of just one. You secret-share your inputs between them and let them do computations on the shares. As long as at least one of the N servers is acting in your interest, you are safe. But if all servers unite their shares, they can recover your values.
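To make the secret-sharing step concrete, here is a minimal sketch of additive secret sharing over a small prime field (the field size, helper names and the non-cryptographic RNG are purely illustrative; real MPC stacks use large fields, secure randomness and more machinery):

    #include <cstdint>
    #include <iostream>
    #include <random>
    #include <vector>

    // Toy additive secret sharing mod a small prime.
    constexpr uint64_t P = 2147483647ULL;  // 2^31 - 1

    std::vector<uint64_t> share(uint64_t x, int n, std::mt19937_64& rng) {
        std::uniform_int_distribution<uint64_t> dist(0, P - 1);
        std::vector<uint64_t> shares(n);
        uint64_t sum = 0;
        for (int i = 0; i < n - 1; ++i) {        // n-1 shares are uniformly random
            shares[i] = dist(rng);
            sum = (sum + shares[i]) % P;
        }
        shares[n - 1] = (x % P + P - sum) % P;   // last share makes them sum to x
        return shares;
    }

    uint64_t reconstruct(const std::vector<uint64_t>& shares) {
        uint64_t sum = 0;
        for (uint64_t s : shares) sum = (sum + s) % P;
        return sum;
    }

    int main() {
        std::mt19937_64 rng(42);
        auto sx = share(123, 3, rng);  // three servers each hold one share of 123
        auto sy = share(456, 3, rng);
        // Each server adds its own shares locally; no single share reveals anything.
        std::vector<uint64_t> sz(3);
        for (int i = 0; i < 3; ++i) sz[i] = (sx[i] + sy[i]) % P;
        std::cout << reconstruct(sz) << "\n";  // prints 579 = 123 + 456
    }

Additions come for free like this; multiplications are where the servers have to interact, which is where the real protocols get complicated.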
> If you send your server two encrypted numbers and ask it to multiply them for you, the server may send you their sum instead, or just encrypt some constant it chooses and send you that.
That's not what FHE intends to solve. You have the exact same problem with any kind of remote processing, encrypted or not. I don't see why you think it's important to point out the obvious.
Because, sometimes it is said that fully homomorphic encryption means that you don't have to trust the server anymore. And as you pointed out, that is not true. You have to trust a server less.
These are orthogonal definitions of trust, though.
Trusting the server to perform the operation I asked it to is an entirely different domain than trusting the server with knowing the input and output data of that operation.
The poster is just making sure people understand precisely what this does and does not guarantee. That’s always one of the most important things to know about a cryptographic primitive.
Whenever you see a cryptographic protocol that does not include authentication you should be highly suspicious. In most cases there are some smart attacks.
Gnull just pointed out that FHE by itself does not provide authentication so you still need some additional means to achieve that.
FHE doesn't provide authentication of correctness (i.e. that the correct operation was performed), but you can still verify a signature on the result. You can verify that it's your data; you just can't verify what they've done with it.
For the use cases of FHE, though, that just isn't a major downside. Instead, if you rely on some consensus protocol you can get a reasonable guarantee of correctness. If a random N out of M available operators are selected and all produce the same result or hash of a result, you can be reasonably sure that the expected operation was performed.
Additionally, as the "master" who can actually read the FHE data, for most non-trivial tasks I suspect there is often a proof or heuristic you can use to judge whether the result is correct, or at the very least plausible, which should stave off a significant portion of the avenues for attack.
Is it, though? OTOH I cannot think of a single service or provider thereof that makes any guarantees about the trustworthiness of the actual calculation being performed as suggested by the GP.
While trusted computing could be used for such verification, I'm not aware of any commercial real world implementation. I'd be genuinely happy to be made aware of one, though, so please feel free to point me to one.
To my knowledge, those make no guarantee that the work done is correct, only that it is private and secure (which is effectively the same root assumption as with FHE).
The offerings that use SGX or SEV also guarantee correct execution as long as you perform remote attestation of the code before execution and as long as you trust the CPU manufacturer.
> If you send your server two encrypted numbers and ask it to multiply them for you, the server may send you their sum instead, or just encrypt some constant it chooses and send you that. And you can't check that your data was processed the way you want.
I don't see how authenticated encryption would address that. In fact, I don't see how any kind of encryption can address that. Every way I can see to verify that the server is actually carrying out the correct operation would work without encryption.
Authentication proves that you are talking to the right server and that the result you got back from the server was the result it tried to send you, but whether or not that was what it was supposed to send you is out of scope for encryption.
I don't want a database I can query privately, I want to be able to lease my idle hardware out to strangers on the internet and vice versa without needing to trust them.
Would this work better in a decentralized environment with a high number of consensus nodes? The protocol would then work to ensure the outputs are more trusted.
Usual signatures do not allow homomorphic operations on the signed values. You will have to use special fully homomorphic signatures (FHS) for that.
Suppose you did this and encrypted + signed your values with FHE and FHS as Sign(Encrypt(x)) and Sign(Encrypt(y)). Now you want to add x and y. For that, you will have to run the homomorphic addition of Encrypt(·) values using homomorphic operations of Sign(·), i.e. you do Encrypt(x) + Encrypt(y) inside Sign(·), not just x + y.
This means that the overheads introduced by FHE and FHS do not just add up, they get composed (as in composition of functions). This will be a terrible blow-up in performance. I guess, for this reason, people try to build authenticated FHE that has the properties of FHE + FHS but as a single, more efficient block.
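In rough shorthand (my own notation, not taken from any particular FHS paper):

    EvalFHS( [FHE addition circuit], Sign(Enc(x)), Sign(Enc(y)) )  ->  Sign(Enc(x + y))

    total cost ≈ cost_FHS( cost_FHE(one addition) ),  not  cost_FHE(one addition) + cost_FHS(one step)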
It's really great to see more big companies getting into this game, ease of adoption is really the key here.
When it comes to FHE, there are 3 underlying paradigms you can target with compilers:
1. boolean circuits, where you represent your program as encrypted boolean gates. The advantage is that it's as generic as it gets, the drawback is that it's slow. TFHE is great for that, and it's what is shown here.
2. arithmetic circuits, where you represent your program as a combination of encrypted additions and multiplications. This goes much faster, but you are quickly limited in terms of use cases because you can only do a certain number of arithmetic operations (a toy sketch of what this style of program looks like follows below). CKKS/SEAL targets that: https://www.microsoft.com/en-us/research/project/microsoft-s...
3. functional circuits, where you represent your program as a combination of homomorphic functions. The advantage is that you can do very complex things like deep neural networks, the drawback being that you are limited in the bits of precision for the computations. Concrete targets that: https://zama.ai/concrete/
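To illustrate the arithmetic-circuit style (paradigm 2) as a program, here is a toy sketch. The Enc wrapper is just a stand-in for a ciphertext type, not any real library's API; the point is that the whole program is a fixed composition of additions and multiplications with no data-dependent branching:

    #include <cstdint>
    #include <iostream>

    // Stand-in for a ciphertext: only + and * are available, no comparisons or
    // branches on the value. (A real scheme like CKKS would hold polynomial ring
    // elements here; this wrapper just shows the programming model.)
    struct Enc {
        int64_t v;  // pretend this is opaque to the server
        Enc operator+(Enc o) const { return {v + o.v}; }
        Enc operator*(Enc o) const { return {v * o.v}; }
    };

    // An "arithmetic circuit": a fixed sequence of adds and muls, here the
    // degree-3 polynomial 2x^3 + 3x + 1. No data-dependent control flow.
    Enc poly(Enc x) {
        Enc x2 = x * x;
        Enc x3 = x2 * x;
        return Enc{2} * x3 + Enc{3} * x + Enc{1};
    }

    int main() {
        std::cout << poly(Enc{5}).v << "\n";  // 2*125 + 3*5 + 1 = 266
    }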
These are all the same paradigm, in principle: 2 is a special case of 3, and 1 is a special case of 2. All are circuits, differing only in how powerful the processing nodes are.
Are there any benchmarks to estimate its usability?
I first encountered FHE many years ago, but while the concept is cool, the performance hit (IIRC I saw literally a million times slowdown back then) made it absolutely impractical for non-toy problems. And whenever I see reports, they tend to note that it's slow without giving any data on just how slow. This project (https://github.com/google/fully-homomorphic-encryption/tree/...) likewise notes that "the run-times of the FHE-C++ operations are likely to be too long to be practical at this time", but provides no numbers.
Are there measurements on how long the demos take to execute, so that I can evaluate whether it has the potential for some specific idea without having to set up the build system to try it out myself? E.g. run the same string capitalization demo on 100, 10 000 and 1 000 000 characters to benchmark its speed.
I've done work on low-latency FHE neural network inferencing [1] and we estimated the FLOPS to be approximately 4 times that of an Intel 8087 floating point coprocessor [2]. This was for a LeNet-5 network on the MNIST dataset with multicore evaluation on a workstation class machine.
My view is that this is already fast enough to support use cases that really need the unique capabilities of FHE. Since this work we've been focused on making FHE more usable with compilers and tooling [3]. Currently most FHE is being programmed like that Intel 8087 was: with the equivalent of assembly by directly calling functions in FHE libraries to perform arithmetic and crypto operations. Imagine having to do register allocation by hand for all of your code. The EVA compiler [4] is meant to be like a "C compiler for FHE", hiding low-level crypto concerns and providing common optimizations.
The github repo has benchmark versions of most of their examples that include built-in timers. I just checked both versions of the "reverse a string" example (a single-core and a multi-core version) in an Ubuntu 20.04 VM:
Single core: 2838.660s (47.31 minutes)
Multi-core: 843.828s (14.06 minutes)
The multi-core version was using four CPU cores, so slightly worse than a linear performance increase. Both versions lit up the fans in my MacBook Pro pretty hard.
The string in question, by the way, was "hello" -- five characters long.
Does this terrible performance match your previous findings? It is so bad I'm wondering if there is something wrong with my dev environment.
If "million times slowdown" is accurate, then by eyeballing that should take us roughly from the state of the art consumer processors of today back to the realm of the m68k. Sure, that's a giant gap. But I don't think it's outside the realm of possibility that there could be useful tasks done under those circumstances.
To me m68k seems something like 10 000 times slower than modern hardware, not a million times. From the demo descriptions I get an impression (though there are no benchmarks to be sure) that it's much more limited than what could be done on a m68k.
m68k is the equivalent of those; it's late 70s/early 80s, with speeds in the low MHz range (e.g. 7.67 MHz for the version in the Sega Genesis). But I haven't seen claims of FHE reaching even that level of performance (e.g. summing or filtering a million numbers in a second, as low MHz allows you to do), only something an order of magnitude or two slower (e.g. 13 ms/core for a bit operation might be treated as comparable to a 0.077 MHz microcomputer; you can parallelize over cores, but that computer parallelized over bits). So we're talking about much more than a 40-year regression of performance: it's slower than the first 1970s microcomputers, which themselves were slower than the 1960s minicomputers, so perhaps the 1950s early transistor computers?
Sure, if the algorithm you want to brute force is slower, it will take longer to break it.
This is e.g. why password hashes use the slow bcrypt rather than the fast SHA-256 hashing algorithm. [https://auth0.com/blog/hashing-passwords-one-way-road-to-sec...]
Oh, right now there is no shortage of needs, thanks mostly to AI applications, but I could see homomorphic encryption as the next moat that datacenters propose.
This is super cool! For a long time I've wanted a database in the cloud where my cloud provider can't quietly look through all my data. That's easy enough to do by encrypting specific database entries, but the problem comes from indexing. Most of the obvious ways you could implement an index accidentally end up leaking information as a result of their access patterns.
I'm curious if this approach could be used to build database indexes, and if so, what's the performance cost in using it?
I read about an approach a few years ago which needed hundreds of milliseconds of computation for simple operations. Has the state of the art improved?
> I read about an approach a few years ago which needed hundreds of milliseconds of computation for simple operations. Has the state of the art improved?
Google still relies on the same group you read about, probably. From the docs:
> This transpiler connects Google’s XLS library to the TFHE library.
> Each binary gate takes about 13 milliseconds single-core time to evaluate, which improves [DM15] by a factor 53, and the mux gate takes about 26 CPU-ms.
As a minor pedantic point, this is just a side project of someone who works at Google; they have disclaimers in the docs, but it's released by "Google" because of their IP policy, AFAIK.
The state of the art has definitely improved. Zama are doing something called 'programmable bootstrapping' which can enable efficient (e.g., faster) Homomorphic Inference on deep neural networks -- https://whitepaper.zama.ai/whitepaper.pdf?_ga=2.218937936.87...
Good question about database indexes; that could be an interesting use case study.
More generally you want systems that implement oblivious accesses. There’s been a lot of cool work recently in making this more practical, and I’m excited to see how it pans out in the near future :)
I will also note that for efficiency, Oblix relies on Intel SGX. I’m not sure if a purely cryptographic solution can provide the same security and efficiency properties.
ORAM provides the purely cryptographic solution, but you have to pick your scheme based on the amount of leakage you are willing to have. In the worst case accessing one row in a database would require reading the whole database.
My guess would be usability: the Google approach lets anyone who knows C++ (constrained by the supported subset of operations from the language) actually create an FHE program without having to know anything about crypto.
Maybe PALISADE is not user-friendly enough? Just a wild guess, I've never used PALISADE.
It's such a convenient parody of a classic programming exercise that it must be intentional. I can't help laughing at how that one file is saying two things at once:
"Uh, it's one-word Hangman. Dead simple. Nothing up our sleeves. I have Real Stuff to do, give me a break."
"Our product is working great, but there's some minor fixes to do. Let's hold a meeting on how to really get this multi-word support cracked. Maybe add 'privacy-aware' and 'digitalization'?"
Surprisingly, it's also making it very easy to understand what is actually relevant in the project.
IMHO the key thing is that any proper hangman implementation would require much more computing on the FHE side than the six if-statements it currently has. Currently, even for such a toy problem, as much as possible is moved to the non-FHE client.
FHE isn't obfuscation, nor is it functional encryption. it's not the cryptographic equivalent of Intel SGX; all it lets you do is put encrypted inputs into a program and get encrypted outputs back.
the real holy grail is program obfuscation, which lets you build your programs as "black boxes," containing secrets but never divulging them. there's progress on that front, but it's less feasible than FHE (which is saying something!)
yeah, I was referring to iO. I'm not an expert at all, but the most promising avenue I see is to make iO decryptors of FHE programs, which would verify that the output came from the end of the FHE program and then decrypt it.
I'm not sure whether you could successfully hide the FHE decryption key with iO, though.
> all it lets you do is put encrypted inputs into a program and get encrypted outputs back.
Indeed.
> the real holy grail is program obfuscation
Why so? FHE is a way stronger fundamental improvement that enables a completely different way to compute on individual data without compromising privacy and secrecy, in a trustless way that TEEs/SGX can't.
Obfuscation implies FHE, but the converse is not (necessarily) true. In fact the only realizable notion of obfuscation, indistinguishability obfuscation, is so “weak” that it can exist even if things like symmetric encryption don’t.
program obfuscation could trivially implement FHE. the obfuscated program would simply:
1. decrypt the inputs using its embedded private key.
2. do the computation.
3. encrypt the outputs and return them.
I'm not saying SGX is the holy grail, it's as compromised as Intel. I'm saying that an actual cryptographic primitive, like SGX but actually trustless and made of math instead of hardware, is.
I think what they meant was that the secrets would be in a form that the black box would consider decrypted-equivalent (i.e. able to perform computation on), but that wouldn't be plaintext to an outsider.
it's actually stronger than FHE, which is what I was saying. virtual black-box obfuscation would mean that an attacker could only observe the outputs and the inputs - all intermediate states and data, such as the decrypted inputs or the decryption key or logic - would themselves be encrypted. it's like having FHE with an output decryptor circuit at the end.
unfortunately VBB was shown to be impossible, but indistinguishability obfuscation - producing a program that's indistinguishable from any other program that produces the same outputs for the same inputs - is possible, as recently demonstrated.
iO is weaker but still useful. imagine that I have a program that looks like "return input == secret". with iO, you couldn't recover the value of "secret" from the program, since "return Hash(input) == HashedSecret" is equivalent to the original, and iO guarantees that all alternate implementations of a function are indistinguishable.
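To make that concrete, here are the two functionally equivalent checkers side by side (a toy sketch: std::hash stands in for a real collision-resistant hash like SHA-256, and of course no actual obfuscator is being applied here):

    #include <cstddef>
    #include <functional>
    #include <iostream>
    #include <string>

    const std::string SECRET = "hunter2";  // sits in the binary in plaintext
    bool check_naive(const std::string& input) {
        return input == SECRET;
    }

    const std::size_t HASHED_SECRET = std::hash<std::string>{}("hunter2");
    bool check_hashed(const std::string& input) {
        // Same input/output behaviour, but the plaintext secret is not stored.
        return std::hash<std::string>{}(input) == HASHED_SECRET;
    }

    int main() {
        std::cout << check_naive("hunter2") << check_hashed("hunter2") << "\n";  // 11
        std::cout << check_naive("guess")   << check_hashed("guess")   << "\n";  // 00
    }

iO's guarantee is that an attacker looking at the obfuscated program can't tell which of these two equivalent implementations it came from, so they learn no more than the hashed version already gives away.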
I really want to understand this. Let's say my focus is on retrieving the input, not the secret.
I do not want the people who are running the program to know my input. I do not want the program to know my input. Does indistinguishability obfuscation help with that?
if you want the users of the program to know the output, but you want to keep the input a secret, you could use functional encryption (another special case of iO.) users can compute a function of the input without being able to decrypt the input or learn anything else about it.
if you want the output to be encrypted too (even potentially with a different key), you can use vanilla FHE.
There is a difference between an array of encrypted numbers (which FHE requires, if I understand it correctly) and a completely encrypted bitstream (where the data could be put to hide the fact that it is a number array).
This does not make sense to me, I must be missing something important. If your input is an encrypted state of a Turing machine, then isn't FHE able to simulate the evolution of that Turing machine (hence keeping state and secrets without divulging them)? After all, there is no real difference between data and program.
oh yes. brainfreeze [0] is an example of an FHE Brainfuck machine. the problem is that your users can't decrypt any outputs at all from them. there's also a major possibility of confused deputy attacks if you blindly decrypt outputs returned to you by your users, since the users can "circuit-bend" your FHE program to a degree. for instance, nothing's stopping me from sending the output from brainfreeze back to you early - I can't even tell when it's halted!
edit: actually, that's the fatal flaw with FHEs as Turing machines. you can't ever tell when they halt, because that would require decrypting a bit and exposing it to the end user. you just have to keep running them until you get bored, or abandon Turing machines for circuits.
you might be able to use the user as an oracle, but the real issue is that the user has to be around. querying the user after every clock cycle will get old, fast.
What about programming in the style used to avoid timing attacks? My understanding is that there are plenty of situations even today where for security purposes (side-channel timing attacks) you need your program to execute in a fixed number of steps independent of the input. Is this not enough to solve the problem?
Could you please explain a bit more to the more uninitiated like me:
1) Is my understanding correct - the FHE-C++ library allows me to encrypt a photo on the client (in the browser), send it to the server, perform image operations on it, and return the photo to the client in a transformed state without the server ever being able to see what it's transforming?
2) Is the holy grail you're noting that any piece of software could be compiled and encrypted, but still take inputs and produce outputs? Do the inputs and outputs have to be encrypted with the same key as the compiled software?
2. for obfuscation, the inputs and outputs are unencrypted, but the behavior of the software can't be analyzed. as a motivating example, say that I've built some ransomware, but I can't trust my C&C server to stay up. instead, I have the worm embed the private key into a freshly-compiled black box. if you feed the black box a valid Bitcoin block containing a ransom transaction to my address, it spits out the private key and you can decrypt your files.
there are many less evil examples out there - wanting to protect your ML models but still make them redistributable, etc.
How is this possible (I have no clue about FHE)? If the server can do operations on the image, then the image can't look like random noise, or not? And if there are indeed patterns the server can operate on, then the image can be decrypted?
If you can write a program with few to no if statements that depend on the data, then you can run that program relatively efficiently under FHE.
A blur or sharpen filter would be an easy example. The server sees you take the random noise of each pixel, mix it with the nearby pixels, and output new random noise.
Sorry if I'm being too thick here, but a blur filter needs information about the neighbour pixels. It can't operate on a single pixel alone (well it changes things pixel by pixel, but needs the neighbour pixels as input). If I can extract these neighbour pixels from the encrypted image, why can't I extract the image itself?
Each pixel is encrypted separately. FHE lets the server take two encrypted numbers and add or multiply them to make a new encrypted number. There is no way for the server to ever know what any of those numbers actually are.
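A sketch of what that looks like, with a stand-in EncPixel type instead of a real ciphertext (the kernel weights are plaintext constants; the server only ever adds encrypted values and scales them, and never learns any pixel value):

    #include <cstddef>
    #include <cstdint>
    #include <iostream>
    #include <vector>

    // Stand-in ciphertext: the server can add two of these or multiply one by a
    // plaintext constant, nothing else.
    struct EncPixel {
        int64_t v;  // opaque to the server in a real scheme
        EncPixel operator+(EncPixel o) const { return {v + o.v}; }
        EncPixel operator*(int64_t k)  const { return {v * k}; }
    };

    // 1-D [1 2 1] blur over encrypted pixels. The access pattern (which
    // neighbours get touched) is fixed and public; only the values stay hidden.
    // The output is scaled by 4; the client divides by 4 after decryption.
    std::vector<EncPixel> blur(const std::vector<EncPixel>& img) {
        std::vector<EncPixel> out(img.size(), EncPixel{0});
        for (std::size_t i = 1; i + 1 < img.size(); ++i)
            out[i] = img[i - 1] * 1 + img[i] * 2 + img[i + 1] * 1;
        return out;
    }

    int main() {
        std::vector<EncPixel> img = {{10}, {10}, {50}, {10}, {10}};
        for (auto p : blur(img)) std::cout << p.v << " ";  // 0 80 120 80 0
        std::cout << "\n";
    }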
Doesn't it let you avoid decrypting the data anywhere while processing that data tho? I think that might be part of the magic that makes it valuable. It seems like if implemented well, it's another additional solid layer of security that makes an attackers job harder.
the real problem is that processing the data with FHE on someone else's computer is tens of thousands of times less efficient than doing it yourself, unencrypted, on your own computer, so there's no real benefit from offloading the computation.
except if you're mixing data sources.
for instance, say that I have a sweet Snapchat filter that I don't want to let off my servers. and you want to use my filter, but you don't want me to see your face in plaintext. in that case you can send me your encrypted photo, I run it through my filter, and send the results to you for you to decrypt.
its only real use case is that type of split data custody.
Slowness is necessary to some degree. Early stopping leaks information, so FHE algorithms wind up being worst-case for all inputs (e.g., O(N) to find the value for a key).
The key part of slowness is not the algorithms, but inherent speed of processing information this way. E.g. the TFHE library used here (https://tfhe.github.io/tfhe/) claims that "Each binary gate takes about 13 milliseconds single-core time to evaluate", and it operates on individual bits (so things like integer multiplication need many such 13ms operations), so doing something relatively trivial on non-tiny data sizes - like iterating over a megabyte of data to do something that's O(n) anyway - takes an impractical amount of time.
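Rough back-of-envelope with my own assumed gate counts (a ripple-carry full adder is about 5 binary gates per bit, and a 32x32 schoolbook multiply is roughly 32 such adds plus the partial-product ANDs):

    one 32-bit add      ≈ 5 gates/bit × 32 bits × 13 ms/gate ≈ 2.1 s
    one 32-bit multiply ≈ 32 adds + ~1000 AND gates          ≈ on the order of a minute

So even "iterate over a megabyte of 32-bit values and add them up" is on the order of days of single-core time.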
I think `O(log n)` for key-lookup, assuming fully-indexed data, should be sufficient, as that's the worst-case lookup time in a balanced tree.
Methinks this has interesting implications for storage optimizations: in many situations we significantly optimize for certain use cases at the expense of others (e.g. O(1) or sub-O(log n) retrieval at the cost of super-O(log n) inserts or modifications). But because you now know your program cannot appear to operate faster than O(log n) for any operation, you could de-optimize the sub-O(log n) parts of your program to O(log n), which in turn lets you improve the slower parts, e.g. insert/update performance, toward O(log n).
Let's say you have a sorted array, and try to do a binary search. For each 'if', the only way to calculate it is to take both branches and combine them later. So for x branches in a row, you need to do 2^x calculations. O(2^log n) = O(n).
If you have a tree that uses pointers, you're even worse off. Chasing a pointer under FHE has a cost O(y), where y is the size of your memory. Assuming y is proportional to n, that's O(n log n) just to look up a single key.
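For intuition, here is a sketch of why a single oblivious lookup touches every element. EncInt and eq_const are stand-ins (under real FHE the equality test is itself a small circuit of encrypted gates), but the shape of the computation is the point:

    #include <cstdint>
    #include <iostream>
    #include <vector>

    // Stand-in ciphertext with only the operations the server gets: add,
    // multiply, and equality-to-constant that itself returns a ciphertext.
    struct EncInt {
        int64_t v;  // opaque to the server in a real scheme
        EncInt operator+(EncInt o) const { return {v + o.v}; }
        EncInt operator*(EncInt o) const { return {v * o.v}; }
    };
    EncInt eq_const(EncInt x, int64_t c) { return {x.v == c ? 1 : 0}; }

    // Oblivious lookup: arr[idx] without revealing idx. Every element is
    // touched, which is exactly the O(n) cost discussed above.
    EncInt lookup(const std::vector<EncInt>& arr, EncInt idx) {
        EncInt acc{0};
        for (int64_t i = 0; i < (int64_t)arr.size(); ++i)
            acc = acc + eq_const(idx, i) * arr[i];  // adds 0 everywhere but the real index
        return acc;
    }

    int main() {
        std::vector<EncInt> arr = {{7}, {11}, {13}, {17}};
        std::cout << lookup(arr, EncInt{2}).v << "\n";  // 13
    }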
Citation needed. The argument isn't as simple as you made it look.
It's true that if the information needed for a computation is only stored in one location or the other of a random access memory according to a branch, accessing it will reveal which branch is taken, which is revealing too much.
But that isn't the only way to store information. For example a secret linear map or polynomial encoding of the information in a random access memory distributes the contents in such a way that you can sample multiple points to access encoded values, without obviously revealing the addressing scheme.
[0] shows that o(N) lookups are impossible in general with only O(N) space in any private information retrieval scheme. That admittedly doesn't show that O(log n) lookups are impossible, but it's a similar result.
The improvements I'm aware of include (1) storing some extra information, (2) splitting the query into multiple phases to move the bulk of the Ω(N) cost out of the hot path, and (3) reframing the problem (an actual database lookup can't be improved, but similar constructions might still satisfy the end goal, or if you're using a hashmap as a sub-component then the composite algorithm might still be made efficient by never computing those expensive intermediates).
Yes, it's still FHE, because the physical locations (e.g. RAM addresses, disk block indexes) don't reveal which logical index (in the FHE encrypted array) you are looking up. The encrypted data isn't just shuffled in memory with a secret permutation, it's encoded in a way that's analogous to a hologram but with cryptographic operations.
I follow the idea that revealing the physical location is safe under a method like this. But how does the program tell the server which physical locations to grab? The knowledge of which locations are needed is inside the encryption.
Ah. By using more complex FHE states and operations, such that part of the state can be decoded/decrypted by the server to direct the server's actions, while other parts remain opaque.
The server may or may not find those decoded parts meaningful: Meaningful would be something like console output while the computation progresses, and a "finished" signal. Non-meaningful (but still useful to the FHE program) would be something that looks like a sequence of memory requests whose locations may as well be pseudo-random.
The term homomorphism implies that the transformations on the encrypted data (such as addition and multiplication) also apply to the plaintext. Now you can't check if a value is zero, as it's a different value in the plaintext and the ciphertext, so you can't branch or make any decision based on the value of the encrypted data (I think that would severely limit the practical utility of this trick).
This has some similarities to how you do 'vectorization' of data processing to avoid branching; you can "make decisions" with conditional expressions (y = condition ? a : b ) implemented as y = condition * a + (!condition) * b. You do have to calculate both branches, but you get the conditional result depending on that decision.
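A minimal sketch of that select trick in plain integers (under FHE the 0/1 condition would itself be produced by an encrypted comparison circuit, but the arithmetic shape is the same):

    #include <cstdint>
    #include <iostream>

    // Branchless select: compute both "branches" and let the 0/1 condition
    // arithmetically pick one.
    int64_t select(int64_t cond, int64_t a, int64_t b) {
        return cond * a + (1 - cond) * b;  // cond must be 0 or 1
    }

    int main() {
        // max(x, y) without an if: compute the comparison bit, then select.
        int64_t x = 3, y = 8;
        int64_t ge = (x >= y) ? 1 : 0;
        std::cout << select(ge, x, y) << "\n";  // 8
    }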
I saw some answers, but one potential roadblock is runtime; it looks like the Google transpiler generates a boolean circuit, which is kind of like emulating a small piece of hardware in software.
>> FHE can change the way computations are performed by preserving privacy end-to-end, thereby giving users even greater confidence that their information will remain private and secure.
The idea today is that you rely on sharing a key with the server (through a secure protocol) but the server still has to decrypt your data to do anything with it and then encrypt it back and send the result to you.
With FHE, if the server runs FHE software then you can encrypt your data with your secret key without ever disclosing it to the server (as it does not need it to compute stuff on your data).
The benefits are many: the server never has to know anything about your data (imagine a MedTech company doing diagnosis, your medical data will be safe from their prying eyes).
If the server is compromised, the attacker cannot look at your data: potentially no more private information leaks!
On the regulatory side you potentially don't have to worry about GDPR anymore, since you can't access the data of your users.
Also, you don't have to care where the data is being stored, because if it's fully end-to-end encrypted, location is far less of an issue (to your GDPR point). Alleviates a lot of the talk around the security risks posed by the unrestricted sale and transfer of sensitive data to foreign adversaries (China, Russia) that dominates the political landscape in the U.S.
I'm trying to understand what kind of computations would be done on encrypted data. For cases where computation depends (conditionally) on the raw data, this would not be feasible. For other cases, such as e.g. training some ML model to map inputs to outputs, this could also be done on any non-homomorphically encrypted data. What are the actual use cases for computations on FHE data?
Edit: Thinking of it more, I could imagine some sort of image transformations (e.g. a blur filter) as applications - but those are fairly corner use cases, are there more broad ones?
Potentially doing stats on website usage data without ever seeing the actual data (the end goal is not super humanitarian, I know). Medical diagnosis could be another; the bonus is the service never sees your data.
As for conditional programs, there might be a way to do it, but it would always be the worst case (no early loop break, for example), so the runtimes would be awful and the memory state could get prohibitively large I guess (to keep track of all the branches).
HE allows third-parties to perform processing without seeing the plaintext. This allows sensitive data to be held and processed by a cloud computing platform. It could also be useful for untrusted peer-to-peer computing (something like Ethereum). It seems too inefficient to be practical at the moment, but on the other hand that hasn't stopped cryptocurrencies before...
Hospital saves data about a patient in a theoretical common file that tracks all his health info, but solely belongs to the patient.
Insurance wants to add data about some reimbursements for glasses they've done, but should not be allowed to see his medical conditions.
FHE lets you work on the data and update it without ever revealing the contents.
Accept that, in the case of your append-only log, you need to trust that someone is applying that lack of read permission to themselves. And that they do it well.
Couldn't regular asymmetric encryption take care of that? Just encrypt each entry with the patient's public key.
I think you only need homomorphic encryption if the transformation is something that the owner of data can't or shouldn't do themselves. It would be overkill for simple financial information.
Maybe it helps to think of HE you're already familiar with. For example, RSA is homomorphic under multiplication - so you can multiply two ciphertexts and get back an encrypted result. If this can be done for the right subset of operations, you can construct a fully homomorphic scheme. To get anything deeper than that intuitive understanding you'd have to dive into individual protocols :)
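A toy demonstration with textbook RSA and tiny parameters (p=61, q=53; no padding, which is exactly what makes raw RSA malleable/homomorphic like this):

    #include <cstdint>
    #include <iostream>

    // Textbook RSA: n = 61*53 = 3233, e = 17, d = 2753.
    const uint64_t N = 3233, E = 17, D = 2753;

    uint64_t powmod(uint64_t b, uint64_t e, uint64_t m) {
        uint64_t r = 1; b %= m;
        while (e) {
            if (e & 1) r = r * b % m;
            b = b * b % m;
            e >>= 1;
        }
        return r;
    }

    int main() {
        uint64_t m1 = 7, m2 = 12;
        uint64_t c1 = powmod(m1, E, N), c2 = powmod(m2, E, N);
        uint64_t c  = c1 * c2 % N;             // multiply the ciphertexts only
        std::cout << powmod(c, D, N) << "\n";  // prints 84 = 7 * 12
    }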