Fully Homomorphic Encryption (FHE) (github.com/google)
294 points by gone35 on June 15, 2021 | 150 comments


Remember that FHE does not ensure the authenticity of your data by default; it provides only privacy.

If you send your server two encrypted numbers and ask it to multiply them for you, the server may send you their sum instead, or just encrypt some constant it chooses and send you that. And you can't check that your data was processed the way you want.

If you want authenticated FHE, that will be even slower.

For anyone who wants a database to query privately: look at dishonest-majority generic MPC protocols. Here, you need N servers instead of just one. You secret-share your inputs between them and let them do computations on the shares. As long as at least one of the N servers is acting in your interest, you are safe. But if all the servers combine their shares, they can recover your values.
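
For intuition, here is a minimal C++ sketch of the additive secret sharing that underlies such protocols. The implicit 2^64 modulus and std::mt19937_64 are illustrative stand-ins, not a hardened implementation:

    #include <cstdint>
    #include <random>
    #include <vector>

    // Split a secret into n additive shares mod 2^64: any n-1 shares look
    // uniformly random on their own; summing all n recovers the secret.
    std::vector<uint64_t> share(uint64_t secret, int n) {
        std::mt19937_64 rng(std::random_device{}());  // illustrative RNG, not crypto-grade
        std::vector<uint64_t> shares(n);
        uint64_t sum = 0;
        for (int i = 0; i < n - 1; ++i) {
            shares[i] = rng();
            sum += shares[i];          // wraps mod 2^64, which is what we want
        }
        shares[n - 1] = secret - sum;  // also wraps mod 2^64
        return shares;
    }

    uint64_t reconstruct(const std::vector<uint64_t>& shares) {
        uint64_t sum = 0;
        for (uint64_t s : shares) sum += s;
        return sum;
    }

Each server can add its shares of x and y locally, and the per-server sums reconstruct to x + y without any single server learning either value. Multiplication needs an interactive protocol (e.g. Beaver triples), which is where the real cost of MPC shows up.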


> If you send your server two encrypted numbers and ask it to multiply them for you, the server may send you their sum instead, or just encrypt some constant it chooses and send you that.

That's not what FHE intends to solve. You have the exact same problem with any kind of remote processing, encrypted or not. I don't see why you think it's important to point out the obvious.


Because it is sometimes said that fully homomorphic encryption means you don't have to trust the server anymore. And as you pointed out, that is not true. You just have to trust the server less.


These are orthogonal definitions of trust, though.

Trusting the server to perform the operation I asked it to is an entirely different domain than trusting the server with knowing the input and output data of the operation.


This is the key point that the original commenter was trying to elucidate.

Well put.


I've been following FHE casually for a while now and I definitely thought it meant I didn't have to trust the server anymore.


I didn't know it. They taught me something. That's why it's important to point out the obvious: it's not obvious to everyone.


The poster is just making sure people understand precisely what this does and does not guarantee. That’s always one of the most important things to know about a cryptographic primitive.


Whenever you see a cryptographic protocol that does not include authentication, you should be highly suspicious. In most cases there are some clever attacks.

Gnull just pointed out that FHE by itself does not provide authentication, so you still need some additional means to achieve that.


FHE doesn't provide authentication of correctness (i.e. that the correct operation was performed), but you can verify its signature. You can still verify that it's your data; you just can't verify what they've done with it.

For the use cases of FHE, though, that just is not a major downside. Instead, if you rely on some consensus protocol, you can get a reasonable guarantee of correctness. If a random N out of M available operators are selected and all produce the same result (or hash of a result), you can be reasonably sure that the expected operation was performed.

Additionally, as the "master" who can actually read the FHE data, for most non-trivial tasks I suspect there is often a proof or heuristic you can use to reasonably judge whether the result is correct or at the very least reasonable which should stave off a significant portion of the avenues for attack.
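
A minimal sketch of that N-of-M agreement check; the fetch callback stands in for whatever RPC you would actually use, so this is purely illustrative:

    #include <cstdio>
    #include <functional>
    #include <string>
    #include <vector>

    // Run the same encrypted job on the randomly chosen operators and accept
    // the result only if every operator reports the same hash of its result.
    // 'fetch' abstracts the RPC to operator i and returns that hash.
    bool allOperatorsAgree(const std::vector<int>& operators,
                           const std::function<std::string(int)>& fetch) {
        std::string first;
        for (size_t i = 0; i < operators.size(); ++i) {
            std::string h = fetch(operators[i]);
            if (i == 0) first = h;
            else if (h != first) return false;  // disagreement: reject or retry
        }
        return true;
    }

    int main() {
        auto fetch = [](int) { return std::string("abc123"); };  // stub: all agree
        std::printf("%d\n", allOperatorsAgree({1, 4, 7}, fetch));  // prints 1
    }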


Because it is crucial in practice, right?


Is it, though? I cannot think of a single service or provider that makes any guarantees about the trustworthiness of the actual calculation being performed, as suggested by the GP.

While trusted computing could be used for such verification, I'm not aware of any commercial real world implementation. I'd be genuinely happy to be made aware of one, though, so please feel free to point me to one.


Most of the major cloud providers have trusted/confidential compute offerings.


To my knowledge, those make no guarantee that the work done is correct, only that it is private and secure (which is effectively the same root assumption as with FHE).


The offerings that use SGX or SEV also guarantee correct execution as long as you perform remote attestation of the code before execution and as long as you trust the CPU manufacturer.


> If you send your server two encrypted numbers and ask it to multiply them for you, the server may send you their sum instead, or just encrypt some constant it chooses and send you that. And you can't check that your data was processed the way you want.

I don't see how authenticated encryption would address that. In fact, I don't see how any kind of encryption can address that. Every way I can see to verify that the server is actually carrying out the correct operation would work without encryption.

Authentication proves that you are talking to the right server and that the result you got back from the server was the result it tried to send you, but whether or not that was what it was supposed to send you is out of scope for encryption.


I think they mean verifiable. As in, an encryption where you could verify that A*B==C given only the encrypted values of A, B, and C.


I don't want a database I can query privately, I want to be able to lease my idle hardware out to strangers on the internet and vice versa without needing to trust them.


Would this work better in a decentralized environment with a high number of consensus nodes? The protocol would work to ensure the outputs are more trusted.


Is FHE CPA secure? Is that even a sensible question to ask wrt FHE?


Couldn’t you just sign the payload to verify authenticity?


Ordinary signatures do not allow homomorphic operations on the signed values. You would have to use special fully homomorphic signatures (FHS) for that.

Suppose you did this and encrypted + signed your values with FHE and FHS as Sign(Encrypt(x)) and Sign(Encrypt(y)). Now you want to add x and y. For that, you will have to run the homomorphic addition of Encrypt(·) values using homomorphic operations of Sign(·), i.e. you do Encrypt(x) + Encrypt(y) inside Sign(·), not just x + y.

This means that the overheads introduced by FHE and FHS do not just add up, they get composed (as in composition of functions). This will be a terrible blow-up in performance. I guess for this reason people try to build authenticated FHE that has the properties of FHE + FHS but as a single, more efficient block.


Then your input is signed. And now what? How does this authenticate the output?


I suppose you can just rerun the function? Most cases I know don’t have a large nested set of operations.


Then you didn't need to do the remote operation... Some operations can be verified in less time than redoing the whole computation, though.


Maybe in the way the operation affects the signature?


It's really great to see more big companies getting into this game; ease of adoption is really the key here.

When it comes to FHE, there are 3 underlying paradigms you can target with compilers:

1. boolean circuits, where you represent your program as encrypted boolean gates. The advantage is that it's as generic as it gets; the drawback is that it's slow. TFHE is great for that, and it's what is shown here.

2. arithmetic circuits, where you represent your program as a combination of encrypted additions and multiplications. This goes much faster, but you are quickly limited in terms of use cases because you can only do a certain number of arithmetic operations. CKKS/SEAL targets that: https://www.microsoft.com/en-us/research/project/microsoft-s...

3. functional circuits, where you represent your program as a combination of homomorphic functions. The advantage is that you can do very complex things like deep neural networks; the drawback is that you have limitations on the bits of precision for the computations. Concrete targets that: https://zama.ai/concrete/


These are all the same paradigm, in principle: 2 is a special case of 3, and 1 is a special case of 2. They are all circuits, differing only in how powerful the processing nodes are.


Yes, in practice however it makes a difference. Consider for example computing the ReLU function for a neural network:

- With boolean circuits you need to run dozens of boolean gates, which means a lot of underlying crypto ops. Works but expensive.

- With arithmetic circuits, you would approximate it using polynomials. Works but not with high precision.

- With functional circuits, you encode the function as a single “bootstrapping” operation. Works in a single crypto op.

Performance / precision tradeoffs will be very different in these 3 cases.
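
To make the arithmetic-circuit case concrete, here is a plain-double sketch of the kind of low-degree polynomial a CKKS-style pipeline might substitute for ReLU. In a real system the polynomial would be evaluated on ciphertexts; the degree and interval here are illustrative:

    #include <cstdio>

    // ReLU approximated on [-1, 1] by a degree-2 polynomial, using only the
    // additions and multiplications an arithmetic-circuit scheme supports.
    // Derived from the minimax quadratic x^2 + 1/8 for |x| via
    // relu(x) = (x + |x|) / 2; the maximum error on [-1, 1] is 1/16.
    double relu_poly(double x) {
        return 0.5 * x + 0.5 * x * x + 0.0625;
    }

    int main() {
        for (double x = -1.0; x <= 1.0; x += 0.25) {
            double exact = x > 0 ? x : 0;
            std::printf("x=%5.2f  relu=%5.2f  approx=%6.4f\n", x, exact, relu_poly(x));
        }
    }

More multiplicative depth buys a higher-degree polynomial and a better fit, which is exactly the precision/performance knob described above.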


They probably like it because it will let them run whatever they want on end-user devices without having to deal with interference from the user.


Are there any benchmarks to estimate its usability?

I first encountered FHE many years ago, and while the concept is cool, the performance hit (IIRC I saw literally a million-times slowdown back then) made it absolutely impractical for non-toy problems. Whenever I see reports, they tend to note that it's slow; this project (https://github.com/google/fully-homomorphic-encryption/tree/...) likewise notes that "the run-times of the FHE-C++ operations are likely to be too long to be practical at this time", but it does not provide any data on just how slow it is.

Are there measurements on how long the demos take to execute, so that I can evaluate whether it has the potential for some specific idea without having to set up the build system and try it out myself? E.g. run the same string capitalization demo on 100, 10 000 and 1 000 000 characters to benchmark its speed.


I've done work on low-latency FHE neural network inferencing [1] and we estimated the FLOPS to be approximately 4 times that of an Intel 8087 floating point coprocessor [2]. This was for a LeNet-5 network on the MNIST dataset with multicore evaluation on a workstation class machine.

My view is that this is already fast enough to support use cases that really need the unique capabilities of FHE. Since this work we've been focused on making FHE more usable with compilers and tooling [3]. Currently most FHE is being programmed like that Intel 8087 was: with the equivalent of assembly by directly calling functions in FHE libraries to perform arithmetic and crypto operations. Imagine having to do register allocation by hand for all of your code. The EVA compiler [4] is meant to be like a "C compiler for FHE", hiding low-level crypto concerns and providing common optimizations.

[1] "CHET: Compiler and Runtime for Homomorphic Evaluation of Tensor Programs": https://arxiv.org/pdf/1810.00845.pdf

[2] https://en.wikipedia.org/wiki/Intel_8087

[3] "EVA: An Encrypted Vector Arithmetic Language and Compiler for Efficient Homomorphic Computation": https://arxiv.org/pdf/1912.11951.pdf

[4] https://github.com/microsoft/EVA


The github repo has benchmark versions of most of their examples that include built-in timers. I just checked both versions of the "reverse a string" example (a single-core and a multi-core version) in an Ubuntu 20.04 VM:

Single core: 2838.660s (47.31 minutes)
Multi-core: 843.828s (14.06 minutes)

The multi-core version was using four CPU cores, so slightly worse than a linear performance increase. Both versions lit up the fans in my MacBook Pro pretty hard.

The string in question, by the way, was "hello" -- five characters long.

Does this terrible performance match your previous findings? It is so bad I'm wondering if there is something wrong with my dev environment.


If "million times slowdown" is accurate, then by eyeballing that should take us roughly from the state of the art consumer processors of today back to the realm of the m68k. Sure, that's a giant gap. But I don't think it's outside the realm of possibility that there could be useful tasks done under those circumstances.


To me the m68k seems something like 10 000 times slower than modern hardware, not a million times. From the demo descriptions I get the impression (though there are no benchmarks to be sure) that it's much more limited than what could be done on an m68k.


More accurate: an early 6502 or Z80 (circa late 70s/early 80s) with speeds in the low MHz range.

So a ~40-year regression in performance.


The m68k is the equivalent of those: it's late 70s/early 80s, with speeds in the low MHz range (e.g. 7.67 MHz for the version in the Sega Genesis). But I haven't seen claims of FHE reaching even that level of performance (e.g. summing or filtering a million numbers in a second, as low-MHz clocks allow), only something orders of magnitude slower: 13 ms/core for a bit operation works out to roughly 77 gate evaluations per second per core (you can parallelize over cores, but those computers parallelized over bits). So we're talking about much more than a 40-year regression in performance; it's slower than the first 1970s microcomputers, which were themselves slower than the 1960s minicomputers, so perhaps the 1950s early transistor computers?


You’re right. The original 68k chip was contemporaneous with the Z80 and 6502.

I was thinking of the later chips in the 68k series, which got much faster over the years.


And a from-0-to-100 gain in privacy.

Calling this a "regression" is nonsense.


For non-English speakers: colloquially, performance == speed. This is a regression in speed; I was making GP's assessment more accurate.


Does it also make it proportionally slower to brute-force attack it?


Sure, if the algorithm you want to brute-force is slower, it will take longer to break it. This is e.g. why password hashes use the slow bcrypt rather than the fast SHA-256 hashing algorithm. [https://auth0.com/blog/hashing-passwords-one-way-road-to-sec...]


Cynical remark: processes requiring a million times more cycles are the dream of many chip-making companies.


Is it, though? If chips had fewer practical applications, fewer people would buy them.


Oh, right now there is no shortage of demand, thanks mostly to AI applications, but I could see homomorphic encryption being the next moat that datacenters offer.


This is super cool! For a long time I've wanted a database in the cloud where my cloud provider can't quietly look through all my data. That's easy enough to do by encrypting specific database entries, but the problem comes from indexing. Most of the obvious ways you could implement an index accidentally end up leaking information as a result of their access patterns.

I'm curious if this approach could be used to build database indexes, and if so, what's the performance cost of using it?

I read about an approach a few years ago which needed hundreds of milliseconds of computation for simple operations. Has the state of the art improved?


> I read about an approach a few years ago which needed hundreds of milliseconds of computation for simple operations. Has the state of the art improved?

Google still relies on the same group you read about, probably. From the docs:

> This transpiler connects Google’s XLS library to the TFHE library.

On the TFHE website (https://tfhe.github.io/tfhe/):

> Each binary gate takes about 13 milliseconds single-core time to evaluate, which improves [DM15] by a factor 53, and the mux gate takes about 26 CPU-ms.


As a minor pedantic point, this is just a side project of someone who works at Google; they have disclaimers in the docs, but it's released by "Google" because of their IP policy, AFAIK.


No, probably not. They have an entire team on it, and there's an official Google blog post: https://developers.googleblog.com/2021/06/our-latest-updates...


Yeah, I saw that post later, and was pleasantly surprised that they seem to be putting work into it.


Well, FHE may not be there yet but there are other approaches using functional encryption (?) [ref] concepts, like this one I stumbled upon: https://www.cossacklabs.com/blog/secure-search-over-encrypte...

[ref] https://cseweb.ucsd.edu/~daniele/LatticeLinks/FE.html


The state of the art has definitely improved. Zama are doing something called 'programmable bootstrapping' which can enable efficient (i.e., faster) homomorphic inference on deep neural networks -- https://whitepaper.zama.ai/whitepaper.pdf?_ga=2.218937936.87...

Good question about database indexes; that could be an interesting use case study.


For (database) index structures, you want something like Oblix: https://people.eecs.berkeley.edu/~raluca/oblix.pdf [PDF]


More generally you want systems that implement oblivious accesses. There's been a lot of cool work recently in making this more practical, and I'm excited to see how it pans out in the near future :)

I will also note that for efficiency, Oblix relies on Intel SGX. I’m not sure if a purely cryptographic solution can provide the same security and efficiency properties.


ORAM provides the purely cryptographic solution, but you have to pick your scheme based on the amount of leakage you are willing to have. In the worst case accessing one row in a database would require reading the whole database.


If it helps, we did a mini site to explain FHE: https://6min.zama.ai



I found https://homomorphicencryption.org to be a helpful resource as well.



I wonder why nobody has mentioned PALISADE, a mature, open-source FHE library.

> PALISADE now supports the BGV, BFV, CKKS, and FHEW schemes and a more secure variant of the TFHE scheme, including bootstrapping.

From https://palisade-crypto.org/


My guess would be usability: the Google approach lets anyone who knows C++ (constrained by the supported set of operations from the language) actually create an FHE program without having to know anything about crypto.

Maybe PALISADE is not user-friendly enough? Just a wild guess; I've never used PALISADE.


And also Concrete by Zama, coded in Rust https://github.com/zama-ai/concrete


Whilst demonstrating incredible bleeding-edge technology, the Hangman example is hardcoded.

https://github.com/google/fully-homomorphic-encryption/blob/...

It's such a convenient parody of a classic programming exercise that it must be intentional. I can't help laughing at how that one file is saying two things at once:

"Uh, it's one-word Hangman. Dead simple. Nothing up our sleeves. I have Real Stuff to do, give me a break."

"Our product is working great, but there's some minor fixes to do. Let's hold a meeting on how to really get this multi-word support cracked. Maybe add 'privacy-aware' and 'digitalization'?"

Surprisingly, it's also making it very easy to understand what is actually relevant in the project.


IMHO the key thing is that any proper hangman implementation would require much more computing on the FHE side than the six if-statements it currently has. Even for such a toy problem, as much as possible is moved to the non-FHE client.


Same thing with the calculator example. It's not doing anything with FHE. Do any of the examples actually do anything??


Other threads on HE compilers:

Microsoft/EVA - https://news.ycombinator.com/item?id=25193863

Alchemy - https://news.ycombinator.com/item?id=18265948

Cingulata - https://news.ycombinator.com/item?id=23437737

And a good overview of HE compilers/libraries https://arxiv.org/pdf/2101.07078.pdf


One real-world use case of HE is blind auctions, where no one can see the bids.

Electronic voting is tangential.

Encrypted storage for random cloud services should be at least decades away.


> Electronic voting is tangential.

Of possible interest is a design for an online deliberation system:

https://bytebucket.org/djarvis/world-politics/raw/master/doc...

HE would be useful to help hide the votes.


This seems pretty revolutionary, if indeed arbitrary C++ programs can be written as FHE. Are there any key limitations?


See https://github.com/google/fully-homomorphic-encryption/tree/...

- "Variable-length arrays are not supported"

- "While-loops and for-loops with a variable end-condition are not supported."

- "Floating-point data types are not supported."


FHE isn't obfuscation, nor is it functional encryption. It's not the cryptographic equivalent of Intel SGX; all it lets you do is put encrypted inputs into a program and get encrypted outputs back.

the real holy grail is program obfuscation, which lets you build your programs as "black boxes," containing secrets but never divulging them. there's progress on that front, but it's less feasible than FHE (which is saying something!)


Black box obfuscation was ruled out more than a decade ago by a simple argument in the same paper where it was formally introduced.


Indistinguishability obfuscation is now considered possible despite all the false starts.

https://eprint.iacr.org/2020/1003.pdf


yeah, I was referring to iO. I'm not an expert at all, but the most promising avenue I see is to make iO decryptors of FHE programs, which would verify that the output came from the end of the FHE program and then decrypt it.

I'm not sure whether you could successfully hide the FHE decryption key with iO, though.


What was the paper?


It might be this one: https://www.boazbarak.org/Papers/obfuscate.pdf The original paper with the same name is from 2001.

A great article on iO: https://www.quantamagazine.org/computer-scientists-achieve-c...


> all it lets you do is put encrypted inputs into a program and get encrypted outputs back.

Indeed.

> the real holy grail is program obfuscation

Why so? FHE is a way stronger fundamental improvement that enables a completely different way to compute on individual data without compromising privacy and secrecy, in a trustless way that TEEs/SGX can't.


Obfuscation implies FHE, but the converse is not (necessarily) true. In fact the only realizable notion of obfuscation, indistinguishability obfuscation, is so “weak” that it can exist even if things like symmetric encryption don’t.


program obfuscation could trivially implement FHE. the obfuscated program would simply:

1. decrypt the inputs using its embedded private key.

2. do the computation.

3. encrypt the outputs and return them.

I'm not saying SGX is the holy grail; it's as compromised as Intel is. I'm saying that an actual cryptographic primitive, like SGX but actually trustless and made of math instead of hardware, is.


> I'm saying that an actual cryptographic primitive, like SGX but actually trustless and made of math instead of hardware, is.

Does this exist, even in theory?



Wouldn’t the decrypted inputs be readable in memory?


I think what they meant was that the secrets would be in a form that the black box would consider decrypted-equivalent (i.e. able to perform computation on), but that wouldn't be plaintext to an outsider.


yeah, but it isn't equivalent to FHE, and doesn't seem as strong a guarantee.


it's actually stronger than FHE, which is what I was saying. virtual black-box obfuscation would mean that an attacker could only observe the outputs and the inputs - all intermediate states and data, such as the decrypted inputs or the decryption key or logic - would themselves be encrypted. it's like having FHE with an output decryptor circuit at the end.

unfortunately VBB was shown to be impossible, but indistinguishability obfuscation - producing a program that's indistinguishable from any other program that produces the same outputs for the same inputs - is possible, as recently demonstrated.

iO is weaker but still useful. imagine that I have a program that looks like "return input == secret". with iO, you couldn't recover the value of "secret" from the program, since "return Hash(input) == HashedSecret" is equivalent to the original, and iO guarantees that all alternate implementations of a function are indistinguishable.


I really want to understand this. Let's say my focus is on retrieving the input, not the secret.

I do not want the people who are running the program to know my input. I do not want the program to know my input. Does indistinguishability obfuscation help with that?


if you want the users of the program to know the output, but you want to keep the input a secret, you could use functional encryption (another special case of iO.) users can compute a function of the input without being able to decrypt the input or learn anything else about it.

if you want the output to be encrypted too (even potentially with a different key), you can use vanilla FHE.


Thanks, the latter is what I would desire.


There is a difference between an array of encrypted numbers (which FHE requires, if I understand it correctly) and a completely encrypted bitstream (where the data could be placed to hide the fact that it is a number array).


This does not make sense to me, I must be missing something important. If your input is an encrypted state of a Turing machine, then isn't FHE able to simulate the evolution of that Turing machine (hence keeping state and secrets without divulging them)? After all, there is no real difference between data and program.


oh yes. brainfreeze [0] is an example of an FHE Brainfuck machine. the problem is that your users can't decrypt any outputs at all from it. there's also a major possibility of confused deputy attacks if you blindly decrypt outputs returned to you by your users, since the users can "circuit-bend" your FHE program to a degree. for instance, nothing's stopping me from sending the output from brainfreeze back to you early - I can't even tell when it's halted!

0. https://github.com/joeltg/brainfreeze

edit: actually, that's the fatal flaw with FHEs as Turing machines. you can't ever tell when they halt, because that would require decrypting a bit and exposing it to the end user. you just have to keep running them until you get bored, or abandon Turing machines for circuits.


Does the user telling you when to halt leak any information?


you might be able to use the user as an oracle, but the real issue is that the user has to be around. querying the user after every clock cycle will get old, fast.


What about programming in the style used to avoid timing attacks? My understanding is that there are plenty of situations even today where, for security purposes (side-channel timing attacks), you need your program to execute in a fixed number of steps independent of the input. Is this not enough to solve the problem?


Could you please explain a bit more to the uninitiated like me:

1) Is my understanding correct: the FHE-C++ library allows me to encrypt a photo on the client (in the browser), send it to the server, perform image operations on it, and return the photo to the client in a transformed state, without the server ever being able to see what it's transforming?

2) Does the holy grail you're noting allow any piece of software to be compiled and encrypted, but still take inputs and produce outputs? Do the inputs and outputs have to be encrypted with the same key as the compiled software?


1. yes, exactly.

2. for obfuscation, the inputs and outputs are unencrypted, but the behavior of the software can't be analyzed. as a motivating example, say that I've built some ransomware, but I can't trust my C&C server to stay up. instead, I have the worm embed the private key into a freshly-compiled black box. if you feed the black box a valid Bitcoin block containing a ransom transaction to my address, it spits out the private key and you can decrypt your files.

there are many less evil examples out there - wanting to protect your ML models but still make them redistributable, etc.


> 1. yes, exactly.

How is this possible (I have no clue about FHE)? If the server can do operations on the image, then the image can't look like random noise, can it? And if there are indeed patterns the server can operate on, can't the image be decrypted?


> If the server can do operations on the image, then the image can't look like random noise, or not?

FHE is precisely the magic that allows doing operations on encrypted data and getting encrypted results that you can then decrypt into the valid end result.


If you can write a program with few to no if statements that depend on the data, then you can run that program relatively efficiently under FHE.

A blur or sharpen filter would be an easy example. The server sees you take the random noise of each pixel, mix it with the nearby pixels, and output new random noise.


> The server sees you take the random noise of each pixel, mix it with the nearby pixels, and output new random noise.

Which means the pixel positions are known to the server, which means the image isn't really encrypted?


A blur filter does the exact same thing to every pixel.

The server would know that you're doing a blur, but it would have no idea what the contents of the image are.

Your data is encrypted. Your program is not encrypted.

If you want to hide your program, then things will get much slower yet again.


Sorry if I'm being too thick here, but a blur filter needs information about the neighbour pixels. It can't operate on a single pixel alone (well it changes things pixel by pixel, but needs the neighbour pixels as input). If I can extract these neighbour pixels from the encrypted image, why can't I extract the image itself?


Each pixel is encrypted separately. FHE lets the server take two encrypted numbers and add or multiply them to make a new encrypted number. There is no way for the server to ever know what any of those numbers actually are.
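
Here is a toy sketch of that idea. It uses a throwaway pad-based stand-in, NOT real FHE (the client has to track how the pads combine, which real FHE avoids), just to show the shape of the computation: the server only ever adds ciphertexts and scales them by public constants, so it learns pixel positions but never pixel values:

    #include <cstddef>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    struct Ct { uint64_t c; };  // "ciphertext": pixel masked by a secret pad

    Ct Add(Ct a, Ct b)            { return {a.c + b.c}; }  // pads add too
    Ct MulPlain(Ct a, uint64_t w) { return {a.c * w};   }  // pad scales by w

    // Server-side 1-D blur: out[i] = px[i-1] + 2*px[i] + px[i+1].
    std::vector<Ct> Blur(const std::vector<Ct>& px) {
        std::vector<Ct> out;
        for (size_t i = 1; i + 1 < px.size(); ++i)
            out.push_back(Add(Add(MulPlain(px[i - 1], 1), MulPlain(px[i], 2)),
                              MulPlain(px[i + 1], 1)));
        return out;
    }

    int main() {
        std::vector<uint64_t> pixels = {10, 20, 30, 40, 50};
        std::vector<uint64_t> pads   = {7, 99, 3, 1234, 42};  // client-only secret
        std::vector<Ct> enc;
        for (size_t i = 0; i < pixels.size(); ++i) enc.push_back({pixels[i] + pads[i]});

        std::vector<Ct> blurred = Blur(enc);  // the server never decrypts anything

        for (size_t i = 1; i + 1 < pixels.size(); ++i) {
            uint64_t pad = pads[i - 1] + 2 * pads[i] + pads[i + 1];  // client knows the pads
            std::printf("%llu\n", (unsigned long long)(blurred[i - 1].c - pad));  // 80 120 160
        }
    }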


Doesn't it let you avoid decrypting the data anywhere while processing that data, though? I think that might be part of the magic that makes it valuable. It seems like, if implemented well, it's another solid layer of security that makes an attacker's job harder.


the real problem is that processing the data with FHE on someone else's computer is tens of thousands of times less efficient than doing it yourself, unencrypted, on your own computer, so there's no real benefit from offloading the computation.

except if you're mixing data sources.

for instance, say that I have a sweet Snapchat filter that I don't want to let off my servers. and you want to use my filter, but you don't want me to see your face in plaintext. in that case you can send me your encrypted photo, I run it through my filter, and send the results to you for you to decrypt.

its only real use case is that type of split data custody.


> Are there any key limitations?

It's slow; you would not want any part of your program that isn't security-critical using it.


This may not always be true.


Slowness is necessary to some degree. Early stopping leaks information, so FHE algorithms wind up being worst-case for all inputs (e.g., O(N) to find the value for a key).


The key part of slowness is not the algorithms, but inherent speed of processing information this way. E.g. the TFHE library used here (https://tfhe.github.io/tfhe/) claims that "Each binary gate takes about 13 milliseconds single-core time to evaluate", and it operates on individual bits (so things like integer multiplication need many such 13ms operations), so doing something relatively trivial on non-tiny data sizes - like iterating over a megabyte of data to do something that's O(n) anyway - takes an impractical amount of time.


> The key part of slowness is not the algorithms, but inherent speed of processing information this way.

They're both enormous roadblocks. They're both key.

Even if you could do a 64 bit calculation every 0.25 nanoseconds, basic algorithms becoming O(n^2) or O(n^3) will cripple what you can compute.

And even if you could use the most efficient algorithms in the world, when your CPU is a billion times slower than normal you can't do much.


I think `O(log n)` for key-lookup, assuming fully-indexed data, should be sufficient, as that's the worst-case lookup time in a balanced tree.

Methinks this has interesting implications for storage optimizations: in many situations we significantly optimize for certain use cases at the expense of others (e.g. O(1) or sub-O(log n) retrieval at the cost of super-O(log n) insert or modifications). But because you now know your program cannot appear to operate faster than O(log n) for any operation, you could de-optimize some parts of your program, which in turn allows you to improve sub-O(log n) parts to be O(log n), e.g. insert/update performance.


O(log n) lookups are impossible under FHE.

Let's say you have a sorted array, and try to do a binary search. For each 'if', the only way to calculate it is to take both branches and combine them later. So for x branches in a row, you need to do 2^x calculations: O(2^log n) = O(n).

If you have a tree that uses pointers, you're even worse off. Chasing a pointer under FHE has a cost O(y), where y is the size of your memory. Assuming y is proportional to n, that's O(n log n) just to look up a single key.
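
For a feel of why lookups degenerate to O(n), here is the branch-free "multiplexer" pattern in plaintext C++. Under FHE, the equality test and the products would be ciphertext operations, so every slot has to be touched on every lookup:

    #include <cstddef>
    #include <cstdint>
    #include <cstdio>

    // Oblivious lookup: the access pattern is identical for every key,
    // so it leaks nothing, at the price of scanning the whole table.
    uint64_t oblivious_lookup(const uint64_t* keys, const uint64_t* vals,
                              size_t n, uint64_t key) {
        uint64_t result = 0;
        for (size_t i = 0; i < n; ++i) {
            uint64_t eq = (keys[i] == key) ? 1 : 0;  // a homomorphic equality test in FHE
            result += eq * vals[i];                  // contributes 0 unless it matches
        }
        return result;
    }

    int main() {
        uint64_t keys[] = {3, 7, 11};
        uint64_t vals[] = {30, 70, 110};
        std::printf("%llu\n", (unsigned long long)oblivious_lookup(keys, vals, 3, 7));  // 70
    }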


> O(log n) lookups are impossible under FHE.

Citation needed. The argument isn't as simple as you made it look.

It's true that if the information needed for a computation is only stored in one location or the other of a random access memory according to a branch, accessing it will reveal which branch is taken, which is revealing too much.

But that isn't the only way to store information. For example a secret linear map or polynomial encoding of the information in a random access memory distributes the contents in such a way that you can sample multiple points to access encoded values, without obviously revealing the addressing scheme.


> Citation needed.

[0] shows that o(N) lookups are impossible in general with only O(N) space in any private information retrieval scheme. That admittedly doesn't show that O(log n) lookups are impossible, but it's a similar result.

The improvements I'm aware of include (1) storing some extra information, (2) splitting the query into multiple phases to move the bulk of the Ω(N) cost out of the hot path, and (3) reframing the problem (an actual database lookup can't be improved, but similar constructions might still satisfy the end goal, or if you're using a hashmap as a sub-component then the composite algorithm might still be made efficient by never computing those expensive intermediates).

[0] https://link.springer.com/content/pdf/10.1007/s00145-004-013...


You would have to decrypt part of your data to tell the host what memory locations to fetch. Is that still FHE? And how would you do it safely?


Yes, it's still FHE, because the physical locations (e.g. RAM addresses, disk block indexes) don't reveal which logical index (in the FHE encrypted array) you are looking up. The encrypted data isn't just shuffled in memory with a secret permutation, it's encoded in a way that's analogous to a hologram but with cryptographic operations.


I follow the idea that revealing the physical location is safe under a method like this. But how does the program tell the server which physical locations to grab? The knowledge of which locations are needed is inside the encryption.


Ah. By using more complex FHE states and operations, such that part of the state can be decoded/decrypted by the server to direct the server's actions, while other parts remain opaque.

The server may or may not find those decoded parts meaningful: Meaningful would be something like console output while the computation progresses, and a "finished" signal. Non-meaningful (but still useful to the FHE program) would be something that looks like a sequence of memory requests whose locations may as well be pseudo-random.


It has a dependency on XLS[cc], so they share their limitations: https://github.com/google/fully-homomorphic-encryption/tree/...


The term homomorphism implies that the transformations on the encrypted data (such as addition and multiplication) also apply to the plaintext. But you can't check if a value is zero, as it's a different value in the plaintext and the ciphertext; so you can't branch or make any decision based on the value of the encrypted data (I think that would severely limit the practical utility of this trick).


This has some similarities to how you do 'vectorization' of data processing to avoid branching; you can "make decisions" with conditional expressions (y = condition ? a : b ) implemented as y = condition * a + (!condition) * b. You do have to calculate both branches, but you get the conditional result depending on that decision.
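
A minimal plaintext rendering of that identity; under FHE, c, a and b would be ciphertexts and the arithmetic would be homomorphic:

    #include <cstdio>

    // The branch 'y = c ? a : b' as straight-line arithmetic: both "branches"
    // are always computed, which is the only form an FHE circuit can evaluate.
    int select_branchless(int c, int a, int b) {  // c must be 0 or 1
        return c * a + (1 - c) * b;
    }

    int main() {
        std::printf("%d %d\n", select_branchless(1, 10, 20),
                               select_branchless(0, 10, 20));  // prints: 10 20
    }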


I saw some answers, but one potential roadblock is runtime: it looks like the Google transpiler generates a boolean circuit, which is kind of like emulating a small piece of hardware in software.


>> FHE can change the way computations are performed by preserving privacy end-to-end, thereby giving users even greater confidence that their information will remain private and secure.

"Even greater"! Wow!

Even greater than what?


Even greater than “pinky promise”, which is the current status quo and equal to 0.


The idea today is that you rely on sharing a key with the server (through a secure protocol) but the server still has to decrypt your data to do anything with it and then encrypt it back and send the result to you.

With FHE, if the server runs FHE software then you can encrypt your data with your secret key without ever disclosing it to the server (as it does not need it to compute stuff on your data).

The benefits are many: the server never has to know anything about your data (imagine a MedTech company doing diagnosis, your medical data will be safe from their prying eyes).

If the server is compromised, the attacker cannot look at your data: potentially no more private information leaks!

On the regulatory side, you potentially don't have to worry about GDPR anymore, since you can't access the data of your users.


Also, you don't have to care where the data is being stored, because if it's fully end-to-end encrypted, location is far less of an issue (to your GDPR point). Alleviates a lot of the talk around the security risks posed by the unrestricted sale and transfer of sensitive data to foreign adversaries (China, Russia) that dominates the political landscape in the U.S.


I'm trying to understand what kinds of computations would be done on encrypted data. For cases where computation depends on the raw data (conditionally), this would not be feasible. Other cases, such as training some ML model to map inputs to outputs, could also be done on non-homomorphically encrypted data. What are the actual use cases for computations on FHE data?

Edit: Thinking about it more, I could imagine some sort of image transformations (e.g. a blur filter) as applications - but those are fairly corner use cases; are there broader ones?


Potentially doing stats on website usage data without ever seeing the actual data (the end goal is not super humanitarian, I know). Medical diagnosis could be another; a bonus is that the service never sees your data.

Image filtering made me think of this FHE demo/quickstart: https://6min.zama.ai/

As for conditional programs, there might be a way to do it, but it would always be the worst case (no early loop break, for example), so the runtimes would be awful and the memory state could get prohibitively large, I guess (to keep track of all the branches).


Medical diagnosis is a great example!


HE allows third parties to perform processing without seeing the plaintext. This allows sensitive data to be held and processed by a cloud computing platform. It could also be useful for untrusted peer-to-peer computing (something like Ethereum). It seems too inefficient to be practical at the moment, but on the other hand that hasn't stopped cryptocurrencies before...


Hospital saves data about a patient in a theoretical common file that tracks all his health info, but solely belongs to the patient. Insurance wants to add data about some reimbursements for glasses they've done, but should not be allowed to see his medical conditions.

FHE lets you work on the data and update it without ever revealing the contents.


So, you could do the same with an append-only log with no read permissions?


Except that, in the case of your append-only log, you need to trust that someone is applying that lack of read permission to themselves. And that they do it well.


Couldn't regular asymmetric encryption take care of that? Just encrypt each entry with the patient's public key.

I think you only need homomorphic encryption if the transformation is something that the owner of data can't or shouldn't do themselves. It would be overkill for simple financial information.


Absolutely. That’s why gp’s parent mentioned working on the patients’ data, not just updating it.

For example, running machine learning models on homomorphically encrypted health data.


> are there broader ones?

Indeed there are. FHE could provide a way to stay compliant with privacy regulations while outsourcing processing to external providers.

Think medical data or trade secrets.


Is that project used by Google (or anyone else) for anything? I would be interested in production usage of such a library before trying it out.


No, it's not used or usable in production, for performance reasons.


Maybe the "Encryption at rest" used in GCE products?


Is this the first published implementation of that Booleanifier technique?

Google/Bing/DuckDuckGo either return nothing or return nothing related for Booleanifier.


anyone mind explaining how this works?



FHE is useful for nation-states to comb data for certain “patterns”.


Homomorphic encryption always felt like magic to me. It blew me away when I first heard of it.


honestly FHE is voodoo to me


Maybe it helps to think of HE schemes you're already familiar with. For example, RSA is homomorphic under multiplication - so you can multiply two ciphertexts and get back an encrypted result. If this can be done for the right set of operations, you can construct a fully homomorphic scheme. To get anything deeper than that intuitive understanding you'd have to dive into individual protocols :)

https://crypto.stackexchange.com/questions/3555/homomorphic-...
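
A toy demonstration of that property with textbook RSA (tiny primes p=61, q=53, the classic worked example; real RSA adds padding precisely to destroy this homomorphism):

    #include <cstdint>
    #include <cstdio>

    // Square-and-multiply modular exponentiation.
    uint64_t powmod(uint64_t base, uint64_t exp, uint64_t mod) {
        uint64_t result = 1;
        base %= mod;
        while (exp > 0) {
            if (exp & 1) result = result * base % mod;
            base = base * base % mod;
            exp >>= 1;
        }
        return result;
    }

    int main() {
        // n = 61 * 53 = 3233, e = 17, d = 2753 (e*d = 1 mod phi(n) = 3120)
        const uint64_t n = 3233, e = 17, d = 2753;
        uint64_t a = 12, b = 41;

        uint64_t ca = powmod(a, e, n);  // Enc(a)
        uint64_t cb = powmod(b, e, n);  // Enc(b)
        uint64_t cprod = ca * cb % n;   // multiply ciphertexts only

        // Decrypting the ciphertext product yields a*b mod n: prints 492 (= 12 * 41).
        std::printf("%llu\n", (unsigned long long)powmod(cprod, d, n));
    }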


> For example, RSA is homomorphic under multiplication - so you can multiply two ciphertexts and get back an encrypted result.

That's related to why blind signatures work, and also to why RSA padding is necessary (to intentionally break this property!).

https://en.wikipedia.org/wiki/Blind_signature#Blind_RSA_sign...

https://en.wikipedia.org/wiki/RSA_(cryptosystem)#Attacks_aga...

I guess blind signatures may also be helpful for this intuition.


Check this out: https://6min.zama.ai

Helps to demystify the voodoo :)

