Launch HN: Blyss (YC W23) – Homomorphic encryption as a service
206 points by blintz on March 14, 2023 | hide | past | favorite | 75 comments
Hi everyone! I’m Samir, and my co-founder Neil and I are building Blyss (https://blyss.dev). Blyss is an open source homomorphic encryption SDK, available as a fully managed service.

Fully homomorphic encryption (FHE) enables computation on encrypted data. This is essentially the ultimate privacy guarantee - a server that does work for its users (like fetching emails, tweets, or search results), without ever knowing what its users are doing - who they talk to, who they follow, or even what they search for. Servers using FHE give you cryptographic proof that they aren’t spying on you.

Unfortunately, performing general computation using FHE is notoriously slow. We have focused on solving a simple, specific problem: retrieve an item from a key-value store, without revealing to the server which item was retrieved.

By focusing on retrievals, we achieve huge speedups that make Blyss practical for real-world applications: a password scanner like “Have I Been Pwned?” that checks your credentials against breaches, but never learns anything about your password (https://playground.blyss.dev/passwords), domain name servers that don’t get to see what domains you’re fetching (https://sprl.it/), and social apps that let you find out which of your contacts are already on the platform, without letting the service see your contacts (https://stackblitz.com/edit/blyss-private-contact-intersecti...).

Big companies (Apple, Google, Microsoft) are already using private retrieval: Chrome and Edge use this technology today to check URLs against blocklists of known phishing sites, and check user passwords against hacked credential dumps, without seeing any of the underlying URLs or passwords.

Blyss makes it easy for developers to use homomorphic encryption from a familiar, Firebase-like interface. You can create key-value data buckets, fill them with data, and then make cryptographically private retrievals. No entity, not even the Blyss service itself, can learn which items are retrieved from a Blyss bucket. We handle all the server infrastructure, and maintain robust open source JS clients, with the cryptography written in Rust and compiled to WebAssembly. We also have an open source server you can host yourself.

(Side note: a lot of what drew us to this problem is just how paradoxical the private retrieval guarantee sounds—it seems intuitively like it should be impossible to get data from a server without it learning what you retrieve! The basic idea of how this is actually possible is: the client encrypts a one-hot vector (all 0’s except a single 1) using homomorphic encryption, and the server is able to ‘multiply’ these by the database without learning anything about the underlying encrypted values. The dot product of the encrypted query and the database yields an encrypted result. The client decrypts this, and gets the database item it wanted. To the server, all the inputs and outputs stay completely opaque. We have a blog post explaining more, with pictures, that was on HN previously: https://news.ycombinator.com/item?id=32987155.)
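A plaintext sketch of that selection mechanism (with the encryption omitted, so the "query" is visible here in a way it never is to a real FHE server):

```python
# Plaintext sketch of the PIR idea: the client builds a one-hot query
# vector, and a dot product with the database selects exactly one item.
# In the real scheme the query vector is homomorphically ENCRYPTED, so
# the server computes this product without learning where the 1 is.

def one_hot_query(index: int, db_size: int) -> list[int]:
    """Client side: all 0s except a single 1 at the desired index."""
    return [1 if i == index else 0 for i in range(db_size)]

def server_dot_product(query: list[int], database: list[int]) -> int:
    """Server side: multiply and sum; with FHE, inputs stay opaque."""
    return sum(q * d for q, d in zip(query, database))

database = [42, 7, 99, 13]              # the server's key-value data
query = one_hot_query(2, len(database))  # client wants item at index 2
assert server_dot_product(query, database) == 99
```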

Neil and I met eight years ago on the first day of freshman year of college; we’ve been best friends (and roommates!) since. We are privacy nerds—before Blyss, I worked at Yubico, and Neil worked at Apple. I’ve had an academic interest in homomorphic encryption for years, but it became a practical interest when a private Wikipedia demo I posted on HN (https://news.ycombinator.com/item?id=31668814) became popular, and people started asking for a simple way to build products using this technology.

Our client and server are MIT open source (https://github.com/blyssprivacy/sdk), and we plan to make money as a hosted server. Since the server is tricky to operate at scale, and is not part of the trust model, we think this makes sense for both us and our customers. People have used Blyss to build block explorers, DNS resolvers, and malware scanners; you can see some highlights in our playground: https://playground.blyss.dev.

We have a generous free tier, and you get an API key as soon as you log in. For production use, our pricing is usage-based: $1 gets you 10k private reads on a 1 GB database (larger databases scale costs linearly). You can also run the server yourself.

Private retrieval is a totally new building block for privacy - we can’t wait to see what you’ll build with it! Let us know what you think, or if you have any questions about Blyss or homomorphic encryption in general.



Let's say I sent up a key "foo" to get the value "bar", and I did this again and again. Will either "foo" or "bar" be encrypted to the same ciphertext again and again? Or is there some kind of nonce or salt or other mechanism that will make the ciphertext always different? Congrats on launching and thank you for any answer.


Great question! The ciphertexts will be different every time, just like in standard encryption; the scheme uses something very similar to a nonce.

We are trying to avoid the "ECB Penguin", of course: https://crypto.stackexchange.com/questions/14487/can-someone...


I'm guessing this solves a very specific pet peeve of mine:

When your Bitwarden vault is not unlocked, if you log in to a website, the extension will ask if you want to store the password, even if your vault already has an entry for that website. Of course, this is by design, so that Bitwarden doesn't store the websites you have credentials for in plaintext (unlike LastPass, where that blew up in their face).

Would this allow your browser to query a database of "domains i have a password for" without a leak on bitwarden's server exposing this exact database? There are other implementation details but you get the idea.


The most logical way would be to keep a copy of your vault stored locally when you first log in (then kept in sync with the server using a shared secret) - as far as I know, most password managers work this way (no idea for Bitwarden). This would enable schemes where the client can check locally whether your vault contains a given website, rather than having to reach a server for each website you visit.


wouldn't salting and hashing be enough for this use case if you keep the salt on the client?


Or even a bloom filter?


Same problem as “hash the PIN code”: an attacker can run the algorithm over Alexa-1M.csv.
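A hypothetical sketch of the attack being described, assuming the salt eventually leaks along with the hashes (all names here are illustrative):

```python
# Why client-side hashing alone leaks: the domain space is small, so
# anyone who sees the hashes (plus the salt) can brute-force offline.
import hashlib

salt = "per-user-salt"  # even a secret salt fails once it leaks

def h(domain: str) -> str:
    return hashlib.sha256((salt + domain).encode()).hexdigest()

stored = h("example.com")  # what the server would hold

# Attacker with the salt (e.g. from a breach) replays a list of
# popular domains and recovers the plaintext.
popular = ["google.com", "example.com", "github.com"]
recovered = next(d for d in popular if h(d) == stored)
assert recovered == "example.com"
```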


this is great, thanks for launching it, I may actually be a future user of yours ;)

one important feature I see missing is that one cannot run queries with comparisons, such as "give me any message sent between 2022-10-10 and 2023-02-01". This would be very important when one doesn't have all the keys, or when the keys are too many, like in the messages example above.

Any idea for this kind of scenario?


Thanks! Yup, it's not always practical to make a huge number of queries when you expect many of them to come back empty. Instead, we first perform private lookups against a Bloom filter, to find out which keys actually hold data (e.g. messages). Then, we privately retrieve only the useful keys.

The Bloom filter is also served over Blyss, so the server still learns nothing about which keys you're interested in. We implemented this system for our private password checker, which tests passwords against almost a billion breached credentials: https://playground.blyss.dev/passwords
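A rough sketch of the two-step idea (the class and names here are illustrative, not the Blyss API; in the real system, the Bloom filter probes would themselves be private retrievals):

```python
# Two-step private lookup: probe a Bloom filter first to learn which
# keys actually hold data, then retrieve only those keys.
import hashlib

class BloomFilter:
    def __init__(self, size: int = 1024, hashes: int = 3):
        self.size, self.hashes, self.bits = size, hashes, bytearray(size)

    def _positions(self, key: str):
        # Derive k independent bit positions from SHA-256.
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key: str):
        for p in self._positions(key):
            self.bits[p] = 1

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p] for p in self._positions(key))

store = {"alice": ["hi"], "bob": ["yo"]}   # server-side key-value data
bloom = BloomFilter()
for k in store:
    bloom.add(k)

# Only keys that pass the Bloom check are privately retrieved,
# avoiding PIR queries that would come back empty.
candidates = [k for k in ["alice", "carol", "bob"] if bloom.might_contain(k)]
```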


Thanks for the answer, but I meant from your customer's perspective. My understanding is that you offer a key-value store, so the only operation available on the encrypted data is equality (==).

If my application wants to retrieve data within a certain range (< and > operators), is there anything I can do to implement it on top of your SDK?

Think of the encrypted messages app: how can I retrieve this month's messages using your SDK?

I hope this is clearer now...


Yeah, we don’t natively support range queries.

The simple way to efficiently do this kind of check would be to store an index of keys (perhaps chunk them into buckets, like 0-10, 10-20, etc), and then privately retrieve the individual items. Retrievals are fast, especially when batched, so if the ultimate number of items you’re trying to retrieve is not too large, this can work.
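A plain-Python sketch of that bucketed-index idea (names and layout are illustrative, not part of the SDK; each index and item lookup below stands in for a private retrieval):

```python
# Range queries over a key-value store via a coarse bucket index:
# group keys into buckets, fetch the relevant buckets' key lists,
# then retrieve only the items that fall in range.
BUCKET = 10

def bucket_of(key: int) -> int:
    return key // BUCKET

# Server-side layout: one index entry per bucket, listing its keys.
data = {3: "a", 7: "b", 12: "c", 25: "d"}
index: dict[int, list[int]] = {}
for k in data:
    index.setdefault(bucket_of(k), []).append(k)

def range_query(lo: int, hi: int) -> list[tuple[int, str]]:
    hits = []
    for b in range(bucket_of(lo), bucket_of(hi) + 1):
        # In the private version, this index fetch is a PIR query.
        for k in index.get(b, []):
            if lo <= k <= hi:
                hits.append((k, data[k]))  # PIR query for the item itself
    return sorted(hits)

assert range_query(5, 15) == [(7, "b"), (12, "c")]
```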

If you want to chat more about range queries, feel free to email us (founders @ blyss.dev)


You can store the nodes of a search tree (e.g. with three levels year/month/day) on Blyss and use that. Of course you’d need to maintain that search tree yourself.


yes, I was thinking about the same; basically the search functionality would be on me. If one has an estimate of how frequently data comes in (if the search is done on time), then one could find the best approach for keeping those trees within a reasonable size, maybe with a multi-level approach, as you suggest.

A generic functionality that would adapt to all situations would be nicer. Maybe Blyss could offer some libraries for that?


This capability is not exactly Fully Homomorphic Encryption (FHE). In the cryptographic literature this is typically referred to as PIR, or Private Information Retrieval [https://en.wikipedia.org/wiki/Private_information_retrieval]. Counterintuitive indeed. The idea is not totally new, though...


True! We are using FHE to perform PIR. The underlying scheme we use is a real homomorphic encryption scheme (Regev + GSW), but yeah, we explicitly do not support performing arbitrary computation on encrypted data. As it turns out, that's still quite slow - the Google FHE C++ transpiler still takes seconds to do 32-bit arithmetic operations. Our PIR system is able to achieve much more practical speed + communication overheads.


I vaguely recall that basic PIR schemes blow a database up from O(n) to O(n^2) storage. While this would be transparent to users of the service, could you comment on space overhead?


The space overhead is roughly constant (or at most logarithmic in n), and varies by the scheme. In practice, it’s something like 1.5-8x overhead. This is no big deal for storage, but does make it a pain on the memory side for processing (since the full database, including the overhead factor, needs to be resident in memory).


Awesome, thanks for taking the time to elucidate!


I haven't read their protocol, but you can easily implement PIR using FHE through polynomial evaluation.


A thing long overdue, I’d say!

Have you thought about making some ELI5 explainer on how the algo essentially works?

The post you link to is already a great start, I feel like it’s just a question of a little editing work and maybe more examples

— for the nerds to get interested and actually read the paper

— for the users to understand privacy properties better (e.g. why this is better than TLS in the case of a server infected with malware, etc.)

— and also things which it doesn’t do, which would calm anxiety in those who /need/ to understand the limitations to feel safe

— and to keep devs from thinking it’s a magic pixie dust and over-promising users, only to get hacked


A big part of this company has turned out to be figuring out how to explain FHE :)

I'm working on a higher-level "why/how to use this" blog post that should help. Thanks for the suggestions!


An advisor once told me "never build a business where your customers don't know they have the problem you solve."

Aside from the Apples and Googles of the world, how do you convince engineers and their managers that your solution solves a problem? Sure, privacy is nice, but most consumers either don't think about it or simply assume it isn't private no matter what you tell them.


I'm trying to understand too so please correct me if this is wrong.

As far as I was able to gather, it's like you send them a camera which can only hold one picture.

Then they use this camera to photograph every document they have. The trick is that the camera will only save the picture of the record you are interested in, without revealing to them which one.

When they finish going through all the documents, they send you back the camera, which now holds a single picture of the document you wanted, and that only you can access.


Awesome work, I look forward to finding applications for this.

Question: Have you considered using zk-STARKs for succinct proofs of computation? Or would that be too far off target wrt. being good at one thing?

E.g. https://github.com/TritonVM


Things tend to get pretty slow when you try to compute SNARKs over FHE computations. Some progress is getting made, but it's still pretty academic.

There is a cool company trying to instead use FHE to accelerate SNARKs: https://github.com/Sunscreen-tech/Sunscreen. They seem to be making some headway!


How do you guarantee that the server that sent the JavaScript to the browser (which stores the client secret key in the browser) didn't get hacked to also send the client secret key somewhere else after it was generated on the client?


Yeah, this is definitely a risk of any in-browser demo of this tech. The story for apps is much better, since there's a routine installation process, signatures are checked, etc. We'd like private retrievals to eventually be part of the browser itself, so that it can make a kind of "private GET" request natively.

We'd also love to bind our client JS code to a hash of our build output from GitHub, but as of now there's no simple way to do this that the browser will pin automatically - integrity checks are good, but don't prevent the server from just changing the hash. We've toyed with writing an extension for this, but haven't gotten around to it.


I wonder how Subresource Integrity could expand to cover the root document hash (other than using IPFS gateways).

Update: yeah, an extension hashing resources sounds nice too


I've wanted this too! You could include a subresource integrity hash in the URL, which the browser would check against the page. This would make things like CryptPad and Skiff, or group invite links in Signal, way more secure.


This seems like it would be a cool browser standard. The browser could check that the specified SRI hash matches one published by some other entity, and then include extra information in the ‘lock’ icon or dialog, that goes further than TLS.

Usually, when I have an idea for a standard, it turns out one exists, so maybe I’ll do some digging…


Assuming a malicious server operator, you need to obtain the client out-of-band (package manager, app store, etc.), or, if it must be a web app, through something like an IPFS gateway where you can be sure the bits received match a particular hash.

Or do a git clone (pinned to a commit hash) and host the client locally, I guess.


I was completely lost attempting to understand this at first, but I think I kind of get it now based on the documentation on the website. However, some things seem strange. If I have N items in my database, the key is N bits long: 1 million items = a 1-million-bit key. It is very likely that I am not understanding this correctly, however. Does something translate a log(N)-sized "user" key to a one-hot vector?

item = key * database

That sounds like loading the entire database every time. If true, I understand how the system couldn't possibly know which item was retrieved, but I'm not clear on how the entire database isn't loaded every time.

"Cryptographic proof that the server is not spying..."

I don't understand how this is possible. If the service is implemented to remember the key, perform the request, and return the result, how can cryptographic proof be provided?

In any case, this definitely seems like a very cool and useful project. I struggle to understand / trust it a little but perhaps I'll eventually become comfortable with it.


Thanks for checking it out! Responses inline:

> That sounds like loading the entire database every time

Yup, we do perform computation over the entire database for every read - there is zero correlation between the server's work and the client's query. We currently serve queries to a 1 GB database in under 1 second. For much larger databases (100+ GB), this becomes more a question of cost: we can stay fast (1 sec) with more expense, or go slower (e.g. 5 sec) and stay cheap.

> Cryptographic proof that the server is not spying

If you trust your client software [0], then you can be sure that your request isn't decrypted anywhere outside your device. Even a malicious Blyss server cannot determine your query, because it never got a chance to see it.

[0] This level of security depends entirely on having a trusted client. Our client software is open source, and we plan to have it formally audited. We'll also publish signed desktop apps so you can be sure that you're running the same client every time.


So, imagine your api endpoint looks like this:

  get_data(key):
    hahaGotTheKey = key
    result = do_complicated_homomorphic_stuff(key)
    hahaGotTheResult = result
    save(hahaGotTheKey, hahaGotTheResult)
    return result

I guess you are saying that since the key is encrypted, there is no way to know exactly what the user asked for. The result is encrypted, so there is no way to know what it is. The only thing we know is that the user asked for something and something was retrieved.

Of course, if the key is encrypted and the data is encrypted, how is it differentiated from a regular kvs? i.e.

  cypherKey = encr(key)
  cypherData = encr(value)
  kvs.put(cypherKey, cypherData)

Of course, I obviously do not understand it - this is merely a window into my flawed mental model.


Can you you elaborate on the differences between this and end-to-end encryption?


Sure! End-to-end encryption (E2EE) in a messaging context is about the service provider (Meta for WhatsApp, Apple for iMessage) not learning the contents of messages sent on the platform. E2EE also gets used when referring to backups, where it again refers to the service provider of the backups not learning the contents of backups.

Private retrieval is a more general concept, which refers to retrieving data from a server without letting it learn your access pattern. In a specific application, it's easier to see the contrast: for example, in our password checker (https://playground.blyss.dev/passwords), the data that Blyss helps keep encrypted, and prevents the server from learning, is which password you are checking. With standard E2EE techniques, it would not really be possible to keep your query private.

In messaging, Blyss can be used to build messaging services that not only do not learn what you say (the standard E2EE guarantee), but also do not learn who you talk to. We're working on this, but it's a tricky thing to ship.


At a previous job, we used to rent a fair number of servers from companies we couldn't trust (and even if we did, some had security practices so bad that we couldn't be sure they didn't, for example, leave easily exploited backdoors on our boxes). We put a fair bit of thought into how we could get some use out of the spare compute we were paying for. Some day, maybe that problem could be solved with FHE, if an efficient scheme could be achieved. Though I'd prefer, in the future, to never be renting shady or insecure servers, I suppose.


Yes, we all place a lot of trust in cloud vendors today. FHE is a way to move the trust boundary back to the client - let the server be as malicious or insecure as it wants. Raw compute could even become much cheaper, since any machine anywhere can be a supplier in the market for untrusted CPU time.


Are there any hardware acceleration strategies for FHE or is it all making the calculations more efficient on the software side right now? My guess is that the software needs to mature before baking silicon?


Our FHE scheme uses lots of Number Theoretic Transforms (NTTs), which are pretty computationally expensive. NTT is a good candidate for acceleration, and there is quite a bit of interest from the zk community in doing so (https://www.zprize.io/prizes/accelerating-ntt-operations-on-...).

From a hardware perspective, NTT can be done in parallel, but has a fairly large working set of data (~512 MB) with lots of unstructured accesses. This is too big to fit in even the largest CPU L3 caches, so DRAM bandwidth is still relevant. It may eventually be feasible to build an ASIC with this much on-chip memory, but in the meantime, GPUs do a pretty decent job with their massive HBM bandwidth.
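For intuition, here is a toy naive NTT over Z_p and its inverse (parameters chosen tiny for illustration; real FHE schemes use far larger rings and fast radix algorithms rather than this O(n^2) form):

```python
# Toy number-theoretic transform: a DFT over the integers mod a prime.
# p=17, n=4, root=4 work because 4 is a primitive 4th root of unity
# mod 17 (4^2 = 16 = -1, 4^4 = 1).
P, N, ROOT = 17, 4, 4

def ntt(a: list[int]) -> list[int]:
    return [sum(a[j] * pow(ROOT, i * j, P) for j in range(N)) % P
            for i in range(N)]

def intt(a: list[int]) -> list[int]:
    inv_n = pow(N, P - 2, P)        # modular inverse of N (Fermat)
    inv_root = pow(ROOT, P - 2, P)  # modular inverse of the root
    return [inv_n * sum(a[j] * pow(inv_root, i * j, P) for j in range(N)) % P
            for i in range(N)]

vec = [1, 2, 3, 4]
assert intt(ntt(vec)) == vec  # the transform round-trips
```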


Interesting prize. I wonder why they require it to be a radix-2 NTT; using a higher radix speeds things up an order of magnitude on GPU (granted, I am using a 256-bit field, so it might be more memory-bound).


Might just be my browser, but on the homepage, both the "scan for breached credentials" and "block malicious URLs" links lead to the password checker when clicked.


What is the read latency?


It's 1-2 seconds for a 1 GB database with millions of items.

(A couple years ago this was more like minutes, and about 10 years ago it would have taken hours!)


That's impressive.


As someone who cares about privacy, I would never trust something like this, no matter how many guarantees it advertised (much like DuckDuckGo, which I don't trust either). I might still use it anyway, though, since the alternatives are services that more or less spit on privacy or actively work to undermine it.


Thanks for the feedback, I understand your hesitation. We don't just want to advertise guarantees - we want you to never trust third-party servers again. Fully homomorphic encryption makes this possible by never letting sensitive data even leave your device. Our job is to make this new cryptography a web standard as ubiquitous as TLS.


Hey, tangentially- I am CEO of Fabric, a company building orders of magnitude faster hardware accelerators for next-gen cryptography on the latest fab technologies.

Would love to share notes if you're up for it!


Sure, we'd love to talk! Hardware acceleration is really cool. Send us an email: founders @ blyss.dev.


Which companies would use this and why? Data worth making private is also worth some $$ to the business hosting it.


Some data, like passwords or other credentials, isn't stuff anyone really wants to monetize - so secrets managers (things like HashiCorp Vault) and password managers are both interested in using this to allow them to collect even less data.

In other cases, for the same compliance and data security reasons behind the desire for on-prem, larger enterprises prefer that their SaaS vendors collect as little data about them as possible. Blyss can get you the best of both worlds: the data security of on-prem, with the convenience and ease-of-deployment of SaaS.


Congrats on the launch! I actually considered launching something tangential - though I never figured out who the customers would really be nor how I would pitch this to companies. Excited to see where this takes you!


Thanks! Yup, private retrieval is interesting as a product because it's a fundamentally new capability; there aren't really competitors we can show incremental improvements against. If you're still interested in the space, we'd be happy to compare notes! Feel free to email us: founders AT blyss.dev


> The SDK has not yet been security reviewed, and the public Blyss service is still in beta.

Currently, what are your plans for a security audit, both in terms of structuring it and the context in which it would make sense?


The main thing we'd like the security review to focus on is our Rust client code. We'd also really like to select for a reviewing team that has a deep level of familiarity with cryptography. We would provide the team with a summary of the sensitive operations involved in lattice-based key generation, so that lattice experience would not need to be a hard prerequisite to understanding the code.


While likely expensive, when I looked around a while back, Trail of Bits [1] seemed to me to produce the best audits for cryptographic systems, though possibly there are better/cheaper options.

[1] https://www.trailofbits.com/


Looks good, congrats!

In your landing page example, where does the secret client key fit in?


Thanks! The secret client key stays in the browser or app. It's used to encrypt queries, and decrypt the server responses.


Right, but is it generated under the hood for each query?

And how is the data that was initially written encrypted/decrypted? who holds the key for that?


Yes, it's generated in the browser for each query.

And this depends on the application - for example, for the private password checker, all the dumped password data is from a public dataset, so it's not encrypted. In messaging, the data would be encrypted under the intended recipient's public key.


Very cool. Do you offer consulting if someone wanted to bake this into their solution?


Exciting times.

OpenAI's GPT-4 announcement, Google announcing AI for Workspace, Meta additional 10k layoffs, and now we're seeing homomorphic encryption come out to the masses.

All in one day!


Seems like combining this with Tor onion service would be a natural fit putting aside potential legal or ethical issues. Any thoughts on the topic?


To answer my own question: yes, not only would PIR over a Tor onion service enable anonymous access to data via an anonymous connection, but there are other possible uses of PIR for Tor. A notable example is "PIR-Tor", a proposed architecture that would change Tor from being P2P to client-server:

https://www.usenix.org/conference/usenix-security-11/pir-tor...


Tor uses a "client-server" model at the moment. All relays have to publish their server descriptors to the directory authorities (9 Tor relays run by highly trusted community members). The article you linked is about how to scale the network after those become a bottleneck, where one possibility would be P2P or, as they propose, PIR.

What potential legal or ethical issues do you see with access via Tor onion service?


Could this somehow be used against a dynamic resource such as a chatbot to allow fully private interaction?


This is great -- sorely needed and long overdue. Thanks for sharing the code and good luck with the company!


Is this FHE or oblivious transfer?


It's FHE applied to solve a variant of oblivious transfer, called "private information retrieval" (https://en.wikipedia.org/wiki/Private_information_retrieval). PIR is very similar to oblivious transfer, except that in oblivious transfer, the privacy is mutual - the client learns exactly one element from the database; in PIR, it's ok if the client learns some number of 'extra' items other than the one it queried.


You stole my name!

I'm kidding... for a while I wanted to make a game named "blyss". I own the blyss.io domain name. I'll sell it to you if you want!


I read homophobic encryption as a service and was seriously confused


> This is essentially the ultimate privacy guarantee - a server that does work for its users (like fetching emails, tweets, or search results), without ever knowing what its users are doing - who they talk to, who they follow, or even what they search for.

Isn't this perfect for mostly criminals and all the bad actors?

Is there anything you're going to do about these people using your service?


We don't think this guarantee is only useful to bad actors, in the same way that end-to-end encryption has turned out to be useful even if you're not doing something illegal.

The businesses using Blyss want to perform tasks (like scanning for breached credentials) without seeing sensitive customer data. Even the US government's civilian cybersecurity agency, CISA, recommends that you use end-to-end encrypted solutions for credential vaults (https://www.cisa.gov/news-events/cybersecurity-advisories/aa...). Blyss is an added layer for these services, protecting even access metadata.


Individuals have a right to privacy. This right is not contingent on there being no bad actors on the planet. If anything, the existence of bad actors reinforces the right to privacy of good actors.


Crime is illegal; best to leave that up to the various law enforcement agencies.

Even where talking about a crime is itself illegal, it is the person(s) talking about the crime who are the criminals, not the letter it is written on.



