Launch HN: Blyss (YC W23) – Homomorphic encryption as a service
206 points by blintz on March 14, 2023 | hide | past | favorite | 75 comments
Hi everyone! I’m Samir, and my co-founder Neil and I are building Blyss (https://blyss.dev). Blyss is an open source homomorphic encryption SDK, available as a fully managed service.

Fully homomorphic encryption (FHE) enables computation on encrypted data. This is essentially the ultimate privacy guarantee - a server that does work for its users (like fetching emails, tweets, or search results), without ever knowing what its users are doing - who they talk to, who they follow, or even what they search for. Servers using FHE give you cryptographic proof that they aren’t spying on you.

Unfortunately, performing general computation using FHE is notoriously slow. We have focused on solving a simple, specific problem: retrieve an item from a key-value store, without revealing to the server which item was retrieved.

By focusing on retrievals, we achieve huge speedups that make Blyss practical for real-world applications: a password scanner like “Have I Been Pwned?” that checks your credentials against breaches, but never learns anything about your password (https://playground.blyss.dev/passwords), domain name servers that don’t get to see what domains you’re fetching (https://sprl.it/), and social apps that let you find out which of your contacts are already on the platform, without letting the service see your contacts (https://stackblitz.com/edit/blyss-private-contact-intersecti...).

Big companies (Apple, Google, Microsoft) are already using private retrieval: Chrome and Edge use this technology today to check URLs against blocklists of known phishing sites, and check user passwords against hacked credential dumps, without seeing any of the underlying URLs or passwords.

Blyss makes it easy for developers to use homomorphic encryption from a familiar, Firebase-like interface. You can create key-value data buckets, fill them with data, and then make cryptographically private retrievals. No entity, not even the Blyss service itself, can learn which items are retrieved from a Blyss bucket. We handle all the server infrastructure, and maintain robust open source JS clients, with the cryptography written in Rust and compiled to WebAssembly. We also have an open source server you can host yourself.

(Side note: a lot of what drew us to this problem is just how paradoxical the private retrieval guarantee sounds—it seems intuitively like it should be impossible to get data from a server without it learning what you retrieve! The basic idea of how this is actually possible is: the client encrypts a one-hot vector (all 0’s except a single 1) using homomorphic encryption, and the server is able to ‘multiply’ these by the database without learning anything about the underlying encrypted values. The dot product of the encrypted query and the database yields an encrypted result. The client decrypts this, and gets the database item it wanted. To the server, all the inputs and outputs stay completely opaque. We have a blog post explaining more, with pictures, that was on HN previously: https://news.ycombinator.com/item?id=32987155.)
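A plaintext sketch of that selection mechanism (with the encryption omitted, so the "query" is visible here in a way it never is to a real FHE server):

```python
# Plaintext sketch of the PIR idea: the client builds a one-hot query
# vector, and a dot product with the database selects exactly one item.
# In the real scheme the query vector is homomorphically ENCRYPTED, so
# the server computes this product without learning where the 1 is.

def one_hot_query(index: int, db_size: int) -> list[int]:
    """Client side: all 0s except a single 1 at the desired index."""
    return [1 if i == index else 0 for i in range(db_size)]

def server_dot_product(query: list[int], database: list[int]) -> int:
    """Server side: multiply and sum; with FHE, inputs stay opaque."""
    return sum(q * d for q, d in zip(query, database))

database = [42, 7, 99, 13]              # the server's key-value data
query = one_hot_query(2, len(database))  # client wants item at index 2
assert server_dot_product(query, database) == 99
```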

Neil and I met eight years ago on the first day of freshman year of college; we’ve been best friends (and roommates!) since. We are privacy nerds—before Blyss, I worked at Yubico, and Neil worked at Apple. I’ve had an academic interest in homomorphic encryption for years, but it became a practical interest when a private Wikipedia demo I posted on HN (https://news.ycombinator.com/item?id=31668814) became popular, and people started asking for a simple way to build products using this technology.

Our client and server are MIT open source (https://github.com/blyssprivacy/sdk), and we plan to make money as a hosted server. Since the server is tricky to operate at scale, and is not part of the trust model, we think this makes sense for both us and our customers. People have used Blyss to build block explorers, DNS resolvers, and malware scanners; you can see some highlights in our playground: https://playground.blyss.dev.

We have a generous free tier, and you get an API key as soon as you log in. For production use, our pricing is usage-based: $1 gets you 10k private reads on a 1 GB database (larger databases scale costs linearly). You can also run the server yourself.

Private retrieval is a totally new building block for privacy - we can’t wait to see what you’ll build with it! Let us know what you think, or if you have any questions about Blyss or homomorphic encryption in general.



Let's say I sent up a key "foo" to get the value "bar", and I did this again and again. Will either "foo" or "bar" be encrypted to the same ciphertext again and again? Or is there some kind of nonce or salt or other mechanism that will make the ciphertext always different? Congrats on launching and thank you for any answer.


Great question! The ciphertexts will be different every time, just like in standard encryption; the scheme uses something very similar to a nonce.

We are trying to avoid the "ECB Penguin", of course: https://crypto.stackexchange.com/questions/14487/can-someone...


I'm guessing this solves a very specific pet peeve of mine:

When your Bitwarden vault is not unlocked, if you log in to a website, the extension will ask if you want to store the password, even if your vault already has an entry for that website. Of course, this is by design, so that Bitwarden doesn't store the websites you have credentials for in plaintext (unlike LastPass, where that blew up in their face).

Would this allow your browser to query a database of "domains i have a password for" without a leak on bitwarden's server exposing this exact database? There are other implementation details but you get the idea.


The most logical way would be to keep a copy of your vault stored locally when you first log in (then kept in sync with the server using a shared secret) - as far as I know, most password managers work this way (no idea for Bitwarden). This would enable schemes where the client can check locally whether your vault contains a given website, rather than having to reach a server for each website you visit.


wouldn't salting and hashing be enough for this use case if you keep the salt on the client?


Or even a bloom filter?


Same problem as “hash the PIN code”: an attacker can run the algorithm over Alexa-1M.csv.
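A hypothetical sketch of the attack being described, assuming the salt eventually leaks along with the hashes (all names here are illustrative):

```python
# Why client-side hashing alone leaks: the domain space is small, so
# anyone who sees the hashes (plus the salt) can brute-force offline.
import hashlib

salt = "per-user-salt"  # even a secret salt fails once it leaks

def h(domain: str) -> str:
    return hashlib.sha256((salt + domain).encode()).hexdigest()

stored = h("example.com")  # what the server would hold

# Attacker with the salt (e.g. from a breach) replays a list of
# popular domains and recovers the plaintext.
popular = ["google.com", "example.com", "github.com"]
recovered = next(d for d in popular if h(d) == stored)
assert recovered == "example.com"
```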


this is great, thanks for launching it, I may actually be a future user of yours ;)

one important feature I see missing is that one cannot run queries with comparisons, such as "give me any message sent between 2022-10-10 and 2023-02-01". This would be very important when one doesn't have all the keys, or when the keys are too many, like in the messages example above.

Any idea for this kind of scenario?


Thanks! Yup, it's not always practical to make a huge number of queries when you expect many of them to come back empty. Instead, we first perform private lookups against a Bloom filter, to find out which keys actually hold data (e.g. messages). Then, we privately retrieve only the useful keys.

The Bloom filter is also served over Blyss, so the server still learns nothing about which keys you're interested in. We implemented this system for our private password checker, which tests passwords against almost a billion breached credentials: https://playground.blyss.dev/passwords
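A rough sketch of the two-step idea (the class and names here are illustrative, not the Blyss API; in the real system, the Bloom filter probes would themselves be private retrievals):

```python
# Two-step private lookup: probe a Bloom filter first to learn which
# keys actually hold data, then retrieve only those keys.
import hashlib

class BloomFilter:
    def __init__(self, size: int = 1024, hashes: int = 3):
        self.size, self.hashes, self.bits = size, hashes, bytearray(size)

    def _positions(self, key: str):
        # Derive k independent bit positions from SHA-256.
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key: str):
        for p in self._positions(key):
            self.bits[p] = 1

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p] for p in self._positions(key))

store = {"alice": ["hi"], "bob": ["yo"]}   # server-side key-value data
bloom = BloomFilter()
for k in store:
    bloom.add(k)

# Only keys that pass the Bloom check are privately retrieved,
# avoiding PIR queries that would come back empty.
candidates = [k for k in ["alice", "carol", "bob"] if bloom.might_contain(k)]
```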


Thanks for the answer, but I meant from your customer's perspective. My understanding is that you offer a key-value store, so the only operation available on the encrypted data is equality (==).

If my application wants to retrieve data within a certain range (< and > operators), is there anything I can do to implement it on top of your SDK?

Think of the encrypted messages app: how can I retrieve this month's messages using your SDK?

I hope this is clearer now...


Yeah, we don’t natively support range queries.

The simple way to efficiently do this kind of check would be to store an index of keys (perhaps chunk them into buckets, like 0-10, 10-20, etc), and then privately retrieve the individual items. Retrievals are fast, especially when batched, so if the ultimate number of items you’re trying to retrieve is not too large, this can work.
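A plain-Python sketch of that bucketed-index idea (names and layout are illustrative, not part of the SDK; each index and item lookup below stands in for a private retrieval):

```python
# Range queries over a key-value store via a coarse bucket index:
# group keys into buckets, fetch the relevant buckets' key lists,
# then retrieve only the items that fall in range.
BUCKET = 10

def bucket_of(key: int) -> int:
    return key // BUCKET

# Server-side layout: one index entry per bucket, listing its keys.
data = {3: "a", 7: "b", 12: "c", 25: "d"}
index: dict[int, list[int]] = {}
for k in data:
    index.setdefault(bucket_of(k), []).append(k)

def range_query(lo: int, hi: int) -> list[tuple[int, str]]:
    hits = []
    for b in range(bucket_of(lo), bucket_of(hi) + 1):
        # In the private version, this index fetch is a PIR query.
        for k in index.get(b, []):
            if lo <= k <= hi:
                hits.append((k, data[k]))  # PIR query for the item itself
    return sorted(hits)

assert range_query(5, 15) == [(7, "b"), (12, "c")]
```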

If you want to chat more about range queries, feel free to email us (founders @ blyss.dev)


You can store the nodes of a search tree (e.g. with three levels year/month/day) on Blyss and use that. Of course you’d need to maintain that search tree yourself.


yes, I was thinking about the same; basically the search functionality would be on me. If one has an estimate of how frequently data comes in (if the search is done on time), then one could find the best approach for keeping those trees within a reasonable size, maybe with a multi-level approach, as you suggest.

A generic functionality that would adapt to all situations would be nicer. Maybe Blyss could offer some libraries for that?


This capability is not exactly Fully Homomorphic Encryption (FHE). In the cryptographic literature this is typically referred to as PIR, or Private Information Retrieval [https://en.wikipedia.org/wiki/Private_information_retrieval]. Counterintuitive indeed. The idea is not totally new, though...


True! We are using FHE to perform PIR. The underlying scheme we use is a real homomorphic encryption scheme (Regev + GSW), but yeah, we explicitly do not support performing arbitrary computation on encrypted data. As it turns out, that's still quite slow - the Google FHE C++ transpiler still takes seconds to do 32-bit arithmetic operations. Our PIR system is able to achieve much more practical speed + communication overheads.


I vaguely recall that basic PIR schemes blow a database up from O(n) to O(n^2) storage. While this would be transparent to users of the service, could you comment on space overhead?


The space overhead is roughly constant (or at most logarithmic in n), and varies by the scheme. In practice, it’s something like 1.5-8x overhead. This is no big deal for storage, but does make it a pain on the memory side for processing (since the full database, including the overhead factor, needs to be resident in memory).


Awesome, thanks for taking the time to elucidate!


I haven't read their protocol, but you can easily implement PIR using FHE through polynomial evaluation.


A thing long overdue, I’d say!

Have you thought about making some ELI5 explainer on how the algo essentially works?

The post you link to is already a great start, I feel like it’s just a question of a little editing work and maybe more examples

— for the nerds to get interested and actually read the paper

— for the users to understand privacy properties better (e.g. why this is better than TLS in the case of a server infected with malware, etc.)

— and also things which it doesn’t do, which would calm anxiety in those who /need/ to understand the limitations to feel safe

— and to keep devs from thinking it’s a magic pixie dust and over-promising users, only to get hacked


A big part of this company has turned out to be figuring out how to explain FHE :)

I'm working on a higher-level "why/how to use this" blog post that should help. Thanks for the suggestions!


An advisor once told me "never build a business where your customers don't know they have the problem you solve."

Aside from the Apples and Googles of the world, how do you convince engineers and their managers that your solution solves a problem? Sure, privacy is nice, but most consumers either don't think about it or simply assume it isn't private no matter what you tell them.


I'm trying to understand too so please correct me if this is wrong.

As far as I was able to gather, it's like you send them a camera which can only hold one picture.

Then they use this camera to photograph every document they have. The trick is that the camera will only save the picture of the record you are interested in, without revealing to them which one.

When they finish going through all the documents, they send you back the camera, which now holds a single picture of the document you wanted, and that only you can access.


Awesome work, I look forward to finding applications for this.

Question: Have you considered using zk-STARKs for succinct proofs of computation? Or would that be too far off target wrt. being good at one thing?

E.g. https://github.com/TritonVM


Things tend to get pretty slow when you try to compute SNARKs over FHE computations. Some progress is getting made, but it's still pretty academic.

There is a cool company trying to instead use FHE to accelerate SNARKs: https://github.com/Sunscreen-tech/Sunscreen. They seem to be making some headway!


How do you guarantee that the server that sent the JavaScript to the browser (which stores the client secret key in the browser) didn't get hacked to also send the client secret key somewhere else after it was generated on the client?


Yeah, this is definitely a risk of any in-browser demo of this tech. The story for apps is much better, since there's a routine installation process, signatures are checked, etc. We'd like private retrievals to eventually be part of the browser itself, so that it can make a kind of "private GET" request natively.

We'd also love to bind our client JS code to a hash of our build output from GitHub, but as of now there's no simple way to do this that the browser will pin automatically - integrity checks are good, but don't prevent the server from just changing the hash. We've toyed with writing an extension for this, but haven't gotten around to it.


I wonder how Subresource Integrity could expand to cover the root document hash (other than using IPFS gateways).

Update: yeah, an extension hashing resources sounds nice too


I've wanted this too! You could include a subresource integrity hash in the URL, which the browser would check against the page. This would make things like CryptPad and Skiff, or group invite links in Signal, way more secure.


This seems like it would be a cool browser standard. The browser could check that the specified SRI hash matches one published by some other entity, and then include extra information in the ‘lock’ icon or dialog, that goes further than TLS.

Usually, when I have an idea for a standard, it turns out one exists, so maybe I’ll do some digging…


Assuming a malicious server operator, you need to obtain the client out-of-band (package manager, app store, etc.), or, if it must be a web app, through something like an IPFS gateway where you can be sure the bits received match a particular hash.

Or do a git clone (pinned to a commit hash) and host the client locally, I guess.


I was completely lost attempting to understand this at first, but I think I kind of get it now based on the documentation on the website. However, some things seem strange. If I have N items in my database, the key is N bits long: 1 million items = a 1-million-bit key. It is very likely that I am not understanding this correctly, however. Does something translate a log(N)-sized "user" key to a one-hot vector?

item = key * database

That sounds like loading the entire database every time. If true, I understand how the system couldn't possibly know which item was retrieved, but I'm not clear on how the entire database isn't loaded every time.

"Cryptographic proof that the server is not spying..."

I don't understand how this is possible. If the service is implemented to remember the key, perform the request, and return the result, how can cryptographic proof be provided?

In any case, this definitely seems like a very cool and useful project. I struggle to understand / trust it a little but perhaps I'll eventually become comfortable with it.


Thanks for checking it out! Responses inline:

> That sounds like loading the entire database every time

Yup, we do perform computation over the entire database for every read - there is zero correlation between the server's work and the client's query. We currently serve queries to a 1 GB database in under 1 second. For much larger databases (100+ GB), this becomes more a question of cost: we can stay fast (1 sec) with more expense, or go slower (e.g. 5 sec) and stay cheap.

> Cryptographic proof that the server is not spying

If you trust your client software [0], then you can be sure that your request isn't decrypted anywhere outside your device. Even a malicious Blyss server cannot determine your query, because it never got a chance to see it.

[0] This level of security depends entirely on having a trusted client. Our client software is open source, and we plan to have it formally audited. We'll also publish signed desktop apps so you can be sure that you're running the same client every time.


So, imagine your api endpoint looks like this:

  get_data(key):
    hahaGotTheKey = key
    result = do_complicated_homomorphic_stuff(key)
    hahaGotTheResult = result
    save(hahaGotTheKey, hahaGotTheResult)
    return result

I guess you are saying that since the key is encrypted, there is no way to know exactly what the user asked for. The result is encrypted, so there is no way to know what it is. The only thing we know is that the user asked for something and something was retrieved.

Of course, if the key is encrypted and the data is encrypted, how is it differentiated from a regular kvs? i.e.

  cypherKey = encr(key)
  cypherData = encr(value)
  kvs.put(cypherKey, cypherData)

Of course, I obviously do not understand it - this is merely a window into my flawed mental model.


Can you you elaborate on the differences between this and end-to-end encryption?


Sure! End-to-end encryption (E2EE) in a messaging context is about the service provider (Meta for WhatsApp, Apple for iMessage) not learning the contents of messages sent on the platform. E2EE also gets used when referring to backups, where it again refers to the service provider of the backups not learning the contents of backups.

Private retrieval is a more general concept, which refers to retrieving data from a server without letting it learn your access pattern. In a specific application, it's easier to see the contrast: for example, in our password checker (https://playground.blyss.dev/passwords), the data that Blyss helps keep encrypted, and prevents the server from learning, is which password you are checking. With standard E2EE techniques, it would not really be possible to keep your query private.

In messaging, Blyss can be used to build messaging services that not only do not learn what you say (the standard E2EE guarantee), but also do not learn who you talk to. We're working on this, but it's a tricky thing to ship.


At a previous job, we used to rent a fair number of servers from companies we couldn't trust (and even if we did, some had security practices so bad that we couldn't be sure they didn't, for example, leave easily exploited backdoors on our boxes). We put a fair bit of thought into how we could get some use out of the spare compute we were paying for. Some day, maybe that problem could be solved with FHE, if an efficient scheme could be achieved. Though I'd prefer, in the future, to never be renting shady or insecure servers, I suppose.


Yes, we all place a lot of trust in cloud vendors today. FHE is a way to move the trust boundary back to the client - let the server be as malicious or insecure as it wants. Raw compute could even become much cheaper, since any machine anywhere can be a supplier in the market for untrusted CPU time.


Are there any hardware acceleration strategies for FHE or is it all making the calculations more efficient on the software side right now? My guess is that the software needs to mature before baking silicon?


Our FHE scheme uses lots of Number Theoretic Transforms (NTTs), which are pretty computationally expensive. NTT is a good candidate for acceleration, and there is quite a bit of interest from the zk community in doing so (https://www.zprize.io/prizes/accelerating-ntt-operations-on-...).

From a hardware perspective, NTT can be done in parallel, but has a fairly large working set of data (~512 MB) with lots of unstructured accesses. This is too big to fit in even the largest CPU L3 caches, so DRAM bandwidth is still relevant. It may eventually be feasible to build an ASIC with this much on-chip memory, but in the meantime, GPUs do a pretty decent job with their massive HBM bandwidth.
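For intuition, here is a toy naive NTT over Z_p and its inverse (parameters chosen tiny for illustration; real FHE schemes use far larger rings and fast radix algorithms rather than this O(n^2) form):

```python
# Toy number-theoretic transform: a DFT over the integers mod a prime.
# p=17, n=4, root=4 work because 4 is a primitive 4th root of unity
# mod 17 (4^2 = 16 = -1, 4^4 = 1).
P, N, ROOT = 17, 4, 4

def ntt(a: list[int]) -> list[int]:
    return [sum(a[j] * pow(ROOT, i * j, P) for j in range(N)) % P
            for i in range(N)]

def intt(a: list[int]) -> list[int]:
    inv_n = pow(N, P - 2, P)        # modular inverse of N (Fermat)
    inv_root = pow(ROOT, P - 2, P)  # modular inverse of the root
    return [inv_n * sum(a[j] * pow(inv_root, i * j, P) for j in range(N)) % P
            for i in range(N)]

vec = [1, 2, 3, 4]
assert intt(ntt(vec)) == vec  # the transform round-trips
```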


Interesting prize. I wonder why they require it to be a radix-2 NTT; using a higher radix speeds things up an order of magnitude on GPU (granted, I am using a 256-bit field, so it might be more memory-bound).


Might just be my browser, but on the homepage, both the "scan for breached credentials" and "block malicious URLs" links lead to the password checker when clicked.


What is the read latency?


It's 1-2 seconds for a 1 GB database with millions of items.

(A couple years ago this was more like minutes, and about 10 years ago it would have taken hours!)


That's impressive.


As someone who cares about privacy, I would never trust something like this, no matter how many guarantees it advertised (much like DuckDuckGo, which I don't trust either). I might still use it anyway, though, since the alternatives are services that more or less spit on privacy or actively work to undermine it.


Thanks for the feedback, I understand your hesitation. We don't just want to advertise guarantees - we want you to never trust third-party servers again. Fully homomorphic encryption makes this possible by never letting sensitive data even leave your device. Our job is to make this new cryptography a web standard as ubiquitous as TLS.


Hey, tangentially- I am CEO of Fabric, a company building orders of magnitude faster hardware accelerators for next-gen cryptography on the latest fab technologies.

Would love to share notes if you're up for it!


Sure, we'd love to talk! Hardware acceleration is really cool. Send us an email: founders @ blyss.dev.


Which companies would use this and why? Data worth making private is also worth some $$ to the business hosting it.


Some data, like passwords or other credentials, isn't stuff anyone really wants to monetize - so secrets managers (things like HashiCorp Vault) and password managers are both interested in using this to allow them to collect even less data.

In other cases, for the same compliance and data security reasons behind the desire for on-prem, larger enterprises prefer that their SaaS vendors collect as little data about them as possible. Blyss can get you the best of both worlds: the data security of on-prem, with the convenience and ease-of-deployment of SaaS.


Congrats on the launch! I actually considered launching something tangential - though I never figured out who the customers would really be nor how I would pitch this to companies. Excited to see where this takes you!


Thanks! Yup, private retrieval is interesting as a product because it's a fundamentally new capability; there aren't really competitors we can show incremental improvements against. If you're still interested in the space, we'd be happy to compare notes! Feel free to email us: founders AT blyss.dev


> The SDK has not yet been security reviewed, and the public Blyss service is still in beta.

Currently, what are your plans for a security audit, both in terms of structuring it and the context in which it would make sense?


The main thing we'd like the security review to focus on is our Rust client code. We'd also really like to select for a reviewing team that has a deep level of familiarity with cryptography. We would provide the team with a summary of the sensitive operations involved in lattice-based key generation, so that lattice experience would not need to be a hard prerequisite to understanding the code.


While likely expensive, when I looked around a while back, Trail of Bits [1] seemed to me to produce the best audits for cryptographic systems, though possibly there are better/cheaper options.

[1] https://www.trailofbits.com/


Looks good, congrats!

In your landing page example, where does the secret client key fit in?


Thanks! The secret client key stays in the browser or app. It's used to encrypt queries, and decrypt the server responses.


Right, but is it generated under the hood for each query?

And how is the data that was initially written encrypted/decrypted? who holds the key for that?


Yes, it's generated in the browser for each query.

And this depends on the application - for example, for the private password checker, all the dumped password data is from a public dataset, so it's not encrypted. In messaging, the data would be encrypted under the intended recipient's public key.


Very cool. Do you offer consulting if someone wanted to bake this into their solution?


Exciting times.

OpenAI's GPT-4 announcement, Google announcing AI for Workspace, Meta additional 10k layoffs, and now we're seeing homomorphic encryption come out to the masses.

All in one day!


Seems like combining this with Tor onion service would be a natural fit putting aside potential legal or ethical issues. Any thoughts on the topic?


To answer my own question: yes, not only would PIR over a Tor onion service enable anonymous access to data via an anonymous connection, but there are other possible uses of PIR for Tor. A notable example is "PIR-Tor", a proposed architecture that would change Tor from being P2P to client-server:

https://www.usenix.org/conference/usenix-security-11/pir-tor...


Tor uses a "client-server" model at the moment. All relays have to publish their server descriptors to the directory authorities (9 Tor relays run by highly trusted community members). The article you linked is about how to scale the network after those become a bottleneck, where one possibility would be P2P or, as they propose, PIR.

What potential legal or ethical issues do you see with access via Tor onion service?


Could this somehow be used against a dynamic resource such as a chatbot to allow fully private interaction?


This is great -- sorely needed and long overdue. Thanks for sharing the code and good luck with the company!


Is this FHE or oblivious transfer?


It's FHE applied to solve a variant of oblivious transfer, called "private information retrieval" (https://en.wikipedia.org/wiki/Private_information_retrieval). PIR is very similar to oblivious transfer, except that in oblivious transfer, the privacy is mutual - the client learns exactly one element from the database; in PIR, it's ok if the client learns some number of 'extra' items other than the one it queried.


You stole my name!

I'm kidding... for a while I wanted to make a game named "blyss". I own the blyss.io domain name. I'll sell it to you if you want!


I read homophobic encryption as a service and was seriously confused


> This is essentially the ultimate privacy guarantee - a server that does work for its users (like fetching emails, tweets, or search results), without ever knowing what its users are doing - who they talk to, who they follow, or even what they search for.

Isn't this perfect for mostly criminals and all the bad actors?

Is there anything you're going to do about these people using your service?


We don't think this guarantee is only useful to bad actors, in the same way that end-to-end encryption has turned out to be useful even if you're not doing something illegal.

The businesses using Blyss want to perform tasks (like scanning for breached credentials) without seeing sensitive customer data. Even the US government's civilian cybersecurity agency, CISA, recommends that you use end-to-end encrypted solutions for credential vaults (https://www.cisa.gov/news-events/cybersecurity-advisories/aa...). Blyss is an added layer for these services, protecting even access metadata.


Individuals have a right to privacy. This right is not contingent on there being no bad actors on the planet. If anything, the existence of bad actors reinforces the right to privacy of good actors.


Crime is illegal; best to leave that up to the various law enforcement agencies.

Even where talking about a crime is itself illegal, it is the person(s) talking about the crime who are the criminals, not the letter it is written on.



