
>The Valid method takes a context (which is optional but has been useful for me in the past) and returns a map. If there is a problem with a field, its name is used as the key, and a human-readable explanation of the issue is set as the value.

I used to do this, but ever since reading Lexi Lambda's "Parse, Don't Validate," [0] I've found validators to be much more error-prone than leveraging Go's built-in type checker.

For example, imagine you wanted to defend against the user picking an illegal username. Like you want to make sure the user can't ever specify a username with angle brackets in it.

With the Validator approach, you have to remember to call the validator on 100% of code paths where the username value comes from an untrusted source.

Instead of using a validator, you can do this:

    type Username struct {
      value string
    }

    func NewUsername(username string) (Username, error) {
      // Validate the username adheres to our schema.
      ...

      return Username{username}, nil
    }
That guarantees that you can never forget to validate the username through any codepath. If you have a Username object, you know that it was validated because there was no other way to create the object.
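
For illustration, here is a minimal sketch of how the elided validation might be filled in, assuming the only rule is the angle-bracket restriction from the example above; the String accessor and the error message are my additions, not part of the original comment:

    import (
      "errors"
      "strings"
    )

    type Username struct {
      value string
    }

    // NewUsername is the only way other packages can obtain a populated
    // Username, so every Username in the program has passed this check.
    func NewUsername(raw string) (Username, error) {
      // Illustrative rule only: reject angle brackets, as in the example
      // above. A real schema would likely also check length and charset.
      if strings.ContainsAny(raw, "<>") {
        return Username{}, errors.New("username must not contain '<' or '>'")
      }
      return Username{value: raw}, nil
    }

    // String exposes the already-validated value for display or storage.
    func (u Username) String() string {
      return u.value
    }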

[0] https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...



Crazy that actually using your type system leads to better code. Stop passing everything around as `string`. Parse them, and type them.


There's a name for this anti-pattern: "Stringly typed"


I've also seen it called primitive obsession, which is also applicable to other primitive types like using an integer in situations where an enum would be better.


Definitely used to fall for primitive obsession. It seemed so silly to wrap objects in an intermediary type.

After playing with Rust, I changed my tune. The type system just forces you onto the correct path, so much so that a lot of code became boring because you no longer had to second-guess what-if scenarios.


> Definitely used to fall for primitive obsession. It seemed so silly to wrap objects in an intermediary type.

A lot of languages certainly don't make it easy. You shouldn't have to make a Username struct/class with a string field to have a typed username. You should be able to declare a type Username which is just a string under the hood, but with different associated functions.
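
In Go specifically, a defined type gets you most of the way there, though (as discussed further down the thread) anyone can still convert a raw string with Username(s), so it's a weaker guarantee than an unexported struct field. A rough sketch, where the Normalized method is just an invented example of an associated function:

    import "strings"

    // Username is "just a string under the hood" but is a distinct type,
    // so it can't be mixed up with ordinary strings in function signatures.
    type Username string

    // Normalized is an illustrative associated function; the lowercasing
    // rule is made up for this sketch.
    func (u Username) Normalized() Username {
      return Username(strings.ToLower(string(u)))
    }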


We use this pattern extensively in a large Java app. As long as you establish these patterns early on in the project, the team adapts to the conventions. It's worked well for us and the lack of language support doesn't get in the way much.


Yeah, modern type systems are game changers. I've soured on Rust, but if Go had the full Ocaml type system with match statements I think it would be the perfect language.


Go would need such a revamp to be anywhere close to a decent language that it would just be a straight-up different language.


Sadly enums are too advanced of a concept to be included in Go.



This term is typically used to refer to things like data structures and numerical values all being passed as strings. I don't think a reasonable person would consider storing a username in a string to be "stringly typed".


It definitely is stringly typed. It's just such a normalized example of it that people don't think of it as an antipattern.

If you want to implement what Yaron Minsky described as "make illegal states unrepresentable", then you use a username type, not a string. That rules out multiple entire classes of illegal states.

If you do that, then when you compile your program, the typechecker can provide a much stronger correctness proof, for more properties. It allows you to do "static debugging" effectively, where you debug your code before it ever even runs.


I don’t get what you’re on about. The root comment clearly presents a structure of a separate type. The fact that it happens to contain a single string field is completely irrelevant (what type should an actual username be, a float?). “Stringly typed” is about stringifying non-string values to save typing work and is not applicable here in the slightest.


I wasn't replying to the root comment, I was replying in the context of the subsequent three comments, specifically:

> > > Crazy that actually using your type system leads to better code.

> > There's a name for this anti-pattern: "Stringly typed"

> I don't think a reasonable person would consider storing a username in a string to be "stringly typed".

#1 was saying that the root comment shows better code using the type system.

#2 was clearly referring to the case where you don't do this as being an anti-pattern.

#3 is saying that storing a username in a string, without defining a distinct type for it, was not stringly typed. But as I pointed out, it certainly is.

If you doubt my interpretation of #3, the same commenter said this in another comment: "Is it really more 'programmer friendly' to create wrapper types for individual strings all over your codebase?"


I see, my apologies!


I wasn’t sure who was right. I’ll tie-break with https://wiki.c2.com/?StringlyTyped= which pretty much says what you just said.


The commenter you're replying to misunderstood the discussion. See my sibling reply.


The One True Wiki[0] says "Used to describe an implementation that needlessly relies on strings when programmer & refactor friendly options are available."

Which is exactly what's going on here. A username has a string as a payload, but that payload has restrictions (not every string will do) and methods which expect a username should get a username, not any old string.

[0]: https://wiki.c2.com/?StringlyTyped


I don't agree that this example is more "programmer friendly". Anything you want to do with the username other than null check and passing an argument is going to be based directly on the string representation. Insert into a database? String. Display in a UI? String. Compare? String comparison. Sort? String sort. Is it really more "programmer friendly" to create wrapper types for individual strings all over your codebase that need to have passthrough methods for all the common string methods? One could argue that it's worth the tradeoff but this C2 definition is far from helpful in setting a clear boundary.

Meanwhile the real world usages of this term I've seen in the past have all been things like enums as strings, lists as strings, numbers as strings, etc... Not arbitrary textual inputs from the user.


You inherit some code. Is that string a username or a phone number? Who knows. Someone accidentally swapped two parameter values. Now the phone number is a username and you’ve got a headache of trying to figure out what’s wrong.

By having stronger types this won’t come up as a problem. You don’t have to rely on having the best programmers in the world that never make mistakes (tm) to be on your team and instead rely on the computer making guard rails for you so you can’t screw up minor things like that.
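
A sketch of the swap being described, with made-up types and a made-up register function; with two plain string parameters the swapped call compiles silently, while with distinct types it fails at compile time:

    package main

    import "fmt"

    type Username string
    type PhoneNumber string

    // register takes distinct types, so the arguments can't be swapped
    // without the compiler noticing.
    func register(name Username, phone PhoneNumber) {
      fmt.Println("registering", name, phone)
    }

    func main() {
      name := Username("alice")
      phone := PhoneNumber("+1-555-0100")

      register(name, phone)
      // register(phone, name) // would not compile: a PhoneNumber is not a Username
    }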


I agree on the one hand but empirically I don’t think I have seen a bug where the problem was the string for X ended up being used as Y. Probably because the variable/field names do enough heavy lifting. But if your language makes it easy to wrap I say why not. It might aid readability and maybe avoid a bug.

I would probably type to the level of Url, Email, Name but not PersonProfileTwitterLink.


I’ve refactored a large js code base into ts. Found one such bug for every ~2kloc. The obvious ones are found quickly in untyped code, the problem is in rare cases where you e.g. check truthiness on something that ends up always true.


Of those bugs, I wondered how much a type would help. For example, is it a misunderstanding of business requirements (nosurcharge bool = iscash bool) or a “typo” / copy-paste error? If the former, types don't help; if the latter, they might.


It definitely helps in larger applications where things are named similarly, especially if you're dealing with a massive DB schema. If you have some method to update some data where all the PKs are longs and it has the signature "update(long,long)", passing the wrong long value would be disastrous. Even if this type of error is 1:10k LOC, using wrapper classes pretty much eliminates this bug.

In our codebase, we use wrapper classes and the only time we had a defect with this is when one developer got lazy and used 3 primitive Strings in a class instead of wrapper classes. Another developer needed to update the code and populated the wrong wrapper class as they were not as familiar with that part of the codebase. Had the original developer simply used wrapper classes, the person maintaining the code wouldn't have had that confusion.


> Is it really more "programmer friendly" to create wrapper types for individual strings all over your codebase that need to have passthrough methods for all the common string methods?

That can be handled transparently in languages that have good support for strong type systems, like Rust or Haskell, using traits or type classes.

What you're saying is essentially that addressing stringly typing can only be taken so far in weakly typed languages, without becoming inconvenient.

> Meanwhile the real world usages of this term I've seen in the past have all been things like enums as strings, lists as strings, numbers as strings, etc... Not arbitrary textual inputs from the user.

The definitional question is not that interesting. The point is that the concept applies just as much to a username represented as a string as it does to any other kind of value being represented as a string.

The reason is simple, which is just that "string" is a general type that can represent anything, whereas "username" is a subset of all possible strings. If you're trying to use your type system to ensure correct code, you want to be able to type check a function signature like `f(user, company, motto)`, just to take a simple example.


Bash :(


JSON


TCL


As a PHP developer I am frankly disappointed you think that we only do that with strings. I've got an array[1] full of other tools.

1. Or maybe a map? Those keys might have significance I didn't tell you about.


I originally typed out `int` and wanted to do more, but I try to keep my comments as targeted as possible to avoid the common reply pattern of derailing a topic by commenting on the smallest and least important part of it. If I type `string`, `int`, `arrays`, `maps`, `enums`... someone will write 3 paragraphs about how enums are actually an adequate usage of the type system, and everyone will focus on that instead of the overarching message.


Things have different costs.

Types limit you from making some mistakes, but they also impact your extensibility. Imagine an enum with 4 values and you want to add 1 because, 10 levels deep, one of the services needs a new value. How does it usually go with strongly typed languages? You go and update all the services until the new value is properly propagated to the lowest level that actually needs it.

Now imagine doing the same with strings: you can validate at the lowest level, and the upper levels just pass the value as-is. If upper layers have conditionals based on the value, they can still limit their logic to those values.


Why would you need to update code that isn't matching on the value? It just knows it has an X and passes it to a function that needs an X.


if you don't update the code in intermediate layers, some automated validation based on enum values will fail, which also drops the request


You only need to update the parser and the places that are using it. Depending on language, the parser might update itself (Scala generally works this way). Everyone else has an already parsed value that they're just passing around. That's the point: only run your validation at the outer layer of your application.


This is a good design pattern, but be wary of doing validation too early. The design pattern allows you to do it as early or late as you like, but doesn't tell you when to do it. Often it's best to do it as part of parsing/validating some larger object.

See Steven Witten's "I is for Intent" [1] for some ideas about the use of unvalidated data in a UI context.

[1] https://acko.net/blog/i-is-for-intent/


I read through that piece and strongly disagree with the premise that their insight is somehow at odds with leaning into the type system for correctness.

The legitimate insight that they have is that anchoring the state as close as possible to the user input is valuable—I think that that is a great insight with a lot of good applications.

However, there's nothing that says you can't take that user-centric state and put it in a strongly typed data structure as soon as possible, with a set of clearly defined and well-typed transitions mapping the user-centric state to the derived states.

Edit: looks like there was discussion on this the other day, with a number of people making similar observations—https://news.ycombinator.com/item?id=39269886


A text file and an abstract syntax tree can both be rigorously represented using types, but one is before parsing and other is after parsing. The question is which one is more suitable for editing?

Text has more possible states than the equivalent AST, many of which are useful when you haven't typed in all the code yet. Incomplete code usually doesn't parse.

This suggests that drafts should be represented as text, not an AST.

And maybe similarly for drafts of other things? Drafts will have some representation that follows some rules, but maybe they shouldn't have to follow all the rules. You may still want to save drafts and collaborate on them even though they break some rules.

In a system that's not an editor, though, maybe it makes sense to validate early. For a command-line utility, the editor is external, provided by the environment (a shell or the editor for a shell script) so you don't need to be concerned with that.


Conceptually equivalent to the ancient arts of private constructors and factory methods.


Which (in Java) were then abstracted away in... interesting annotations.


Related:

Parse, don't validate (2019) - https://news.ycombinator.com/item?id=35053118 - March 2023 (219 comments)

Parse, Don't Validate (2019) - https://news.ycombinator.com/item?id=27639890 - June 2021 (270 comments)

Parse, Don’t Validate - https://news.ycombinator.com/item?id=21476261 - Nov 2019 (230 comments)

Parse, Don't Validate - https://news.ycombinator.com/item?id=21471753 - Nov 2019 (4 comments)


I’ve found it hard to apply this pattern in Go since, if Username is embedded in a struct, and you forget to set it, you’ll get Username’s zero value, which may violate your constraints.


But if you then create a constructor / factory method for that struct, not setting it would trigger an error. But this is one of the problems with Go and other languages that have nil or no "you have to set this" built into their type system: it relies on people's self-discipline, checked by the author, reviewer, and unit tests, and ensuring there's not a problem like you describe takes a lot of diligence.
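
A sketch of that constructor idea, reusing Username/NewUsername from the top comment (the Account type and its fields are hypothetical); note that callers outside the package can still declare a zero-valued Account, which is exactly the gap being discussed:

    // Account keeps its fields unexported, so other packages can't
    // populate them directly; NewAccount is the intended entry point.
    type Account struct {
      username Username
      email    string
    }

    // NewAccount routes the raw username through NewUsername, so a
    // forgotten or invalid username surfaces as an error here rather
    // than as a silent zero value later.
    func NewAccount(rawUsername, email string) (Account, error) {
      u, err := NewUsername(rawUsername)
      if err != nil {
        return Account{}, err
      }
      return Account{username: u, email: email}, nil
    }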


It only relies on unit tests. The people can fail all day long and the unit tests will catch it every single time. Not special unit tests that attempt to seek out such issues, the same unit tests you are writing in languages that have a stricter type system.

If you forget to initialize a field and the tests don't notice, you didn't need the field in the first place, so it won't matter if it is left in an invalid state.

You just don't get the squiggly lines in your text editor. That's the tradeoff.


The pattern sounds nice in theory, but very cumbersome since now you have to obsessively ensure you have NewX calls everywhere or some form of "validated bool". In the end, you're just validating in a roundabout way and calling it "parsing".

I personally find being robust to errors and having clear error messages is the best option.

Don't focus so hard on getting things right, but rather dealing with things when they go wrong.


Why? You can easily call NewUsername inside NewAccount, for example, and just return the error. Or did I misunderstand?


Because go doesn’t have exhaustiveness checking when initialising structs. Instead it encourages “make the zero value meaningful” which is not always possible nor desirable. I usually use a linter to catch this kind of problem https://github.com/GaijinEntertainment/go-exhaustruct


I like this but in the examples would volume be calculated by width/length rather than being set?


The issue is DRY often comes to wreck this sort of thing. Some devs will see "Hmm, Username is exactly the same as just a string so let's just use a string as Username is just added complexity".

I've tried it with constructs like `Data` and `ValidatedData` and it definitely works, but you do end up with duplicate fields between the two objects or worse an ever growing inheritance tree and fields unrelated to either object shared by both.

For example, consider data looking like

    Data {
      value string
    }
and ValidatedData looking like

    ValidatedData {
      value int
    }
There's a mighty temptation for some devs to want to apply DRY and zip these two things together. Unfortunately, that can get really messy with these sorts of type changes, and the question of where validation needs to happen gets muddled.


Except Username is not exactly the same as string, and that's important. Username is a subset of string. If they were equivalent, we wouldn't need to parse/validate.

The often misinterpreted part of DRY is conflating "these are the same words, so they are the same", with "these are the same concept, so they are the same". A Username and a String are conceptually different.


DRY is just "Do not repeat yourself". And a LOT of devs take that literally. It's not "Do not repeat concepts" (which is what it SHOULD be but DRC isn't a fun acronym).

Unfortunately "This is the same character string" is all a DRY purist needs to start messing up the code base.

I honestly believe that "DRY" is an anti-pattern because of how often I see this exact behavior trotted out or espoused. It's a cargo cult thing to some devs.


That's why I like to tell people to always remember to stay MOIST - the Most Optimal is Implicitly the Simplest Thing.

When you add complexity to DRY out your code, you're adding a readability regression. DRY matters in very few contexts beyond readability, and simplicity and low cognitive load need to be in charge. Everything else you do code-style-wise should be in service of those two things.


DRY has nothing to do with readability. The fact that it might help with it is purely coincidental.

DRY is about maintainability - if you repeat rules (behavior) around the system and someone comes along and changes them, how can you be sure the change affected the whole system coherently?

I've seen this in practice: we get a demand from the PO, a more recent hire goes to make the change, the use case of interest to the PO gets accepted. A week later we have a bug on production because a different code path is still relying on the old rule.


Maintainability and readability are two sides of the same coin. It's not exactly rocket science to cook up an example situation where making a change in one place is less maintainable than making it in two, because overly DRY, overly abstracted nonsense leads to a _single_ place to change that's so far removed from where you'd expect it to be that the change takes much longer and is much more fraught with risk than just having to do it twice.

Doing something twice is not anathema, that's my point, not when doing it twice is a cognitively easier and practically faster task.

In almost every case, bugs are the result of human error, and keeping cognitive load as low as possible reduces the likelihood of human error in all cases. As DRY as possible is very rarely the lowest cognitive load possible.


In my experience (~20 years) with software development, I have developed the belief that people go through a path of applying patterns, techniques, architectures, and good practices first as dogma, then rejecting them, and finally accepting that almost all software development patterns/best practices are mostly good heuristics, which require experience to apply correctly and to know when to break or bend the rules.

DRY applied as a dogma will eventually fail, because it's not a verified mathematical proof of infallible code, it's just a practice that gives good results inside its constraints, people just don't learn the constraints until it explodes in their faces a few times.

Like any wisdom, it's unlikely to be received and understood without the rite of passage of experience.


This seems less about DRY and more a story about a hypothetical junior dev making a dumb mistake masquerading as commentary about “DRY purism”.


Man I wish it was just jr devs. I cut jrs a ton of slack, they don't know any better. However, it's the seniors with the quick quips that are the biggest issue I run into. Or perhaps senior devs with jr mentalities


most srs are just jrs with inflated egos and titles


Like everything, it depends is the right answer.


DRY vs premature optimisation is the landscape most long term devs find themselves in. You can say that FP, OO and a bunch of other paradigms affect this, but eventually you need to repeat yourself. The key is to determine when this happens without spending too much time determining when this happens.


One of the major issues with a lot of the outdated concepts in programming is that we still teach them to young people. I work a side gig as an external examiner for CS students. Especially in the early years they are taught the same OOP content that I was taught some decades ago, stuff that I haven’t used (also) for some decades. Because while a lot of the concepts may work well in theory, they never work out in a world where programmers have to write code on a Thursday afternoon after a terrible week.

It’s almost always better to repeat code. It’s obviously not something that is completely black and white, even if I prefer to never really do any form of inheritance or mutability, it’s not like I wouldn’t want you to create a “base” class with “created by” “updated by” and so on for your data classes and if you have some functions that do universal stuff for you and never change, then by all means use them in different places. But for the most part, repeating code will keep your code much cleaner. Maybe not today or the next month, but five years down the line nobody is going to want to touch that shared code which is now so complicated you may as well close your business before you let anyone touch it. Again, not because the theoretical concepts that lead to this are necessarily flawed, but because they require too much “correctness” to be useful.

Academia hasn’t really caught on though. I still grade first-semester students who have the whole “Animal” -> “duck”, “dog”, “cat” hierarchy, or whatever they use, drilled into their heads as the “correct way” to do things. Similar to how they are often taught other processes than agile, but are taught that agile is the “only” way, even though we’ve seen just how wrong that is.

I’m not sure what we can really do about it. I’ve always championed strongly opinionated dev setups where I work. Some of the things we’ve done, and are going to do, aren’t going to be great, but what we try to do is to build an environment where it’s as easy as possible for every developer to build code the most maintainable way. We want to help them get there, even when it’s 15:45 on a Thursday that has been full of shit meetings in a week that’s been full of screaming children and an angry spouse and a car that exploded. And things like DRY just aren’t useful.


It’s a balancing act, but deletable code is often preferable to purely-DRY-for-the-sake-of-DRY, overly abstracted code.


Yeah, no. Not at all. I imagine that you are taking DRY quite literally and critiquing the most stupid use cases of it, like DRYing calls to Split with spaces into SplitBySpace.

DRY's goal is to avoid defining behaviors in duplicate, resulting in having multiple points in code to change when you need to modify said behavior. Code needs to be coherent to be "good", by a number of different quality indicators.

I'm doing a "side project" right now where I'm using a newcomer payment gateway. They certainly don't DRY stuff. The same field gets serialized with camel case and snake case in different APIs, and whole structures that represent the same concept are duplicated with slightly different fields. This probably means that at 15:25 on a Thursday the dev checked in her code, happy because the reviewer never cared about DRY, and now I'm paying the price of maintaining four types of addresses in my code base.


> It’s almost always better to repeat code.

God no. Stop the copy pasta disease! It's horrible, mindless programming.

When reviewing code, I'm astonished anything was accomplished by copy pasting so much old code (complete with bugs and comment typos).

Incidentally, OOP encourages you to copy a lot. It's just an engine for generating code bloat. Want to serialize some objects? Here's your Object serializer and your overloaded Car serializer and your overloaded Boat serializer, with only a few different fields to justify the difference!

OOP is bad. Copy pasta is bad. DRY is good. All hail DRY, forever, at any cost.


Countless man-centuries have been lost looking for the perfect abstraction to cover two (or an imagined future with two) cases which look deceptively similar, then teasing them apart again.


OOP and DRY are compatible! I’ve actually done the thing that the above commenter suggests - create a base object with created on/by so that I never have to think about it. Whether or not you actually care about that, if you implement a descendant of that object you’re going to get some stuff for free, and you’re gonna like it!


Nobody, ever, is claiming no abstractions are useful or worthwhile. The issue is DRY implies that you should always look for an abstraction to avoid repeating yourself. Trust me, that way lies madness. It should be “sometimes repeat yourself, based on enough context, consideration and experience”. But that’s not as snappy.


For what it's worth, I've always had an easier time combining WET code than untangling the knot that is too-DRY code. Too little abstraction and you might have to read some extra code to understand it. Too much abstraction and no one other than the writer (and maybe not even them) may ever understand it.


There's a mistake many junior devs (and sometimes mid and senior devs) make where they confuse hiding complexity with simplicity - using a string instead of a well defined domain type is a good example, there is a certain complexity of the domain expressed by the type that they don't want to think about too deeply so they replace with a string which superficially looks simpler but in fact hides all of the inherent complexity and nuance.

It causes what I call the lumpy carpet syndrome - sweeping the complexity under the carpet causes bumps to randomly appear that when squashed tend to cause other bumps to pop up rather than actually solving the problem.


Go now has generics, so I'm confident some smart fellow will apply DRY and make a generic ValidatedData[type, validator] struct type, with a ValidatedDataFactory that applies the correct validator callback, and a ValidatorFactory that instantiates the validators based on a new validation rule DSL written in JSON or XML.

...Easy!


This is a variation on one of my favorite software design principles: Make illegal states unrepresentable. I first learned about it through Scott Wlaschin[1].

[1]: https://fsharpforfunandprofit.com/posts/designing-with-types...


> you have to remember to call the validator on 100% of code paths

But copy-pasting the same lines of code in literally every function is the Golang Way.

It makes code "simpler".


It's not guaranteed at all, that's where go's zero-values come in. E.g. nested structs, un/marshaljson magic methods etc. How do you deal with that?


Every struct requiring its zero value to be meaningful is probably one of the worst design flaws in the language.


This is where we arrive at my conclusion that go is not well-suited to implementing business logic!


On the contrary, I've found striving to make zero values meaningful makes designs far more succinct and clearer.


There is no such requirement. Common wisdom suggests that you should ensure zero values are useful, but that isn't about every random struct field – only the values you actually give others. Initialize your struct fields and you won't have to consider their zero state. They will never be zero.

It's funny seeing this beside the DRY thread. Seems programmers taking things a bit too literally is a common theme.


> Initialize your struct fields and you won't have to consider their zero state.

“Just do the right thing everywhere and you don’t have to worry!”

You can’t stop consumers of your libraries from creating zero-valued instances.


Then the zero value is their problem, not yours. You have no reason to be worried about that any more than you are worried about them not getting enough sleep, or eating unhealthy food. What are you doing to stop them from doing that? Nothing, of course. Not your problem.

Coq exists if you really feel you need a complete type system. But there is probably good reason why almost nobody uses it.


> Then the zero value is their problem, not yours.

Except for all those times you're the consumer of someone else's library and there's no way for them to indicate that creating a zero-valued struct is a bug.

Again, it's the philosophy of "Just do the right thing everywhere and you don’t have to worry!" Sometimes it's nice to work with a type system where designers of libraries can actually prevent you from writing bugs.


> Except for all those times you're the consumer of someone else's library and there's no way for them to indicate that creating a zero-valued struct is a bug.

Nonsense. Go has a built-in facility for documentation to communicate these things to other developers. Idiomatic Go strongly encourages you to use it. Consumers of the libraries expect it.

> Sometimes it's nice to work with a type system where designers of libraries can actually prevent you from writing bugs.

Well, sure. But, like I said, almost nobody uses Coq. The vast, vast, vast majority of projects – and I expect 100% of web projects – use languages with incomplete type systems, making what you seek impossible.

And there's probably a good reason for that. While complete type systems sound nice in theory, practice isn't so kind. Tradeoffs abound. There is no free lunch in life. Sorry.


> The vast, vast, vast majority of projects – and I expect 100% of web projects – use languages with incomplete type systems, making what you seek impossible.

…where, "what GP seeks" is…

> way for [library authors] to indicate that creating a zero-valued struct is a bug

I'd say that's a really low and practical bar, you really don't need Coq for that. Good old Python is enough, even without linters and type hints.

Of course it's very easy to create an equivalent of zero struct (object without __init__ called), but do you think it's possible to do it while not noticing that you are doing something unusual?


> Good old Python is enough

No, Python is not enough to "...work with a type system where designers of libraries can actually prevent you from writing bugs." Not even typed Python is going to enable that. Only with a complete type system can the types prevent you from writing those bugs. And I expect exactly nobody is writing HTTP services in a language that has a complete type system – for good reason.

> Of course it's very easy to create an equivalent of zero struct

Yes, you are quite right that you, the library consumer, can Foo.__new__(Foo) and get an object that hasn't had its members initialized, just like you can in Go. But unless the library author has specifically told you to initialize the value this way, that little tingling sensation should be telling you that you're doing something wrong. It is not conventional for libraries to have those semantics. Not in Python, not in Go.

Just because you can doesn't mean you should.


You don't have to go as far as Coq. Rust manages "parse, don't validate" extremely well with serde.

Go's zero-values are the problem, not any other lack of its type system.


> You don't have to go as far as Coq.

No, you do. Anywhere the type system is incomplete means that the consumer can do something the library didn't intend. Rust does not have a complete type system. There was no relevance to mentioning it. But I know it is time for Rust's regularly scheduled ad break. And while you are at it, enjoy a cool, refreshing Coca-Cola.

> Go's zero-values are the problem

"Sometimes it's nice to work with a type system where designers of libraries can actually prevent you from writing bugs." has nothing to do with zero-values. It doesn't even really have anything to do with Go specifically. My, the quality of advertising has really declined around here. Used to be the Rust ads at least tried to look like they fit in.


This insane perspective of “nothing is totally perfect so any improvements over what go currently does are pointless” whenever you confront a gopher with some annoying quirk of the language is one of the worst design flaws in the golang community hivemind.


Tell us, why do you hold that perspective? It's an odd one. Nobody else in this thread holds that perspective. You even admit it is insane, yet here you are telling us about this unique perspective you hold for some reason. Are you hoping that we will declare you insane and admit you in for care? I don't quite grasp the context you are trying to work within.


Any language without zero-values (or some equally destructive quality) can do "parse, don't validate". Go cannot. Rust is just an example.


Top of the hour again? Time for another Rust advertisement?

The topic at hand is about preventing library users from doing things the library author didn't intended using the type system, not "what happens if a language has zero-values". Perhaps you are not able to comprehend this because you are hungry? You're not you when you are hungry. Grab a Snickers.


What happens if a language has zero-values is that you can't "parse, don't validate".

Maybe it's time for you to finally try rust? Or any other language without zero-values, since rust seems to irritate you in particular.


Don't worry, I have tried languages without zero-values. But they have nothing to do with the discussion that was taking place before the ad break. Now back to the show: you cannot prevent library consumers from doing things you don't intend without a complete type system. Rust does not have a complete type system. It leaves holes open for library consumers to do unexpected things and as such it has no relevance here. Sorry that your client's product isn't the be-all and end-all.


> Now back to the show,

The original claim was that with go, doing certain pattern "[...] guarantees that you can never forget to validate the username through any codepath". Which is not true. It is not true, because go has its own billion-dollar-mistake called zero values.


If you go way back there was talk about that, but the discussion had long shifted to "Sometimes it's nice to work with a type system where designers of libraries can actually prevent you from writing bugs."

I get it: You were in such a rush to fill your marketing quotas that you didn't bother to read the entire thread. Maybe the lesson here is don't use HN as an advertising platform next time? You should have known better from the get go.


You manage to present a strawman and produce a No True Scotsman fallacy all at once in this comment thread.

Nobody is suggesting that Coq should be used, so stop bringing it up (strawman). And yes, Coq might have an even stricter and more expressive type system than Rust. But nobody is asking for a perfect type system (no true Scotsman). People are asking to be able to prevent users of your library from providing illegal values. Rust (and Haskell and Scala and Typescript and ….) lets you do this just fine whereas Golang doesn’t.

And personally I would much rather have the compiler or IDE tell me I’m doing something wrong than having to read the docs in detail to understand all the footguns.

My personal opinion is that - even though I’m very productive with Golang and I enjoy using it - Golang has a piss poor type system, even with the addition of Generics.


> People are asking to be able to prevent users of your library to provide illegal values. [...] and Typescript

Typescript, you say?

   const bar: Foo = {} as Foo
Hmm. Oh, right, just don't hold it wrong. But "sometimes it's nice to work with a type system where designers of libraries can actually prevent you from writing bugs."

Your example doesn’t even satisfy the base case, let alone the general case. Get back to us when you have actually read the thread and can provide something on topic.


But that is not an accident, is it? It’s someone very deliberately casting an object. It’s not the same and you probably know it.


It might be an accident. Someone uninitiated may think that is how you are expected to initialize the value. A tool like Copilot may introduce it and go unnoticed.

But let's assume the programmer knows what they are doing and there is no code coming from any other source. When would said programmer write code that isn't deliberate? What is it about Go that you think makes them, an otherwise competent programmer, flail around haphazardly without any careful deliberation?


C++ constructors actually make the guarantee, but it comes with other pains


Lots of languages handle it just fine and don’t need the mess of C++ ctors.

GP is pointing out that go specifically makes it an issue.


What language do you have in mind?


Any language which supports private state: smalltalk, haskell, ada, rust, …


I always understood "parse don't validate" a bit differently. If you are doing the validation inside of a constructor, you are still doing validation instead of parsing. It is safer to do the validation in one place you know the execution will go through, of course, but not the idea I understand "parse don't validate" to mean. I understand it to mean: "write an actual parser, whatever passes the parser can be used in the rest of the program", where a parser is a set of grammar rules for example, or PEG.


I'm not a Haskell developer, so it's possible that I misunderstood the original "Parse, Don't Validate" post.

>If you are doing the validation inside of a constructor, you are still doing validation instead of parsing.

Why would that be considered validation rather than parsing?

From the original post:

>Consider: what is a parser? Really, a parser is just a function that consumes less-structured input and produces more-structured output.

That's the key idea to me.

A parser enforces checks on an input and produces an output. And if you define an output type that's distinct from the input type, you allow the type system to "preserve" the fact that the data passed a parser at some point in its life.

But again, I don't know Haskell, so I'm interested to know if I'm misunderstanding Lexi Lambda's post.


"Parse, don't validate" means that if you want a function that converts an IP address string to a struct IpAddress{ address: string }, you don't validate that the input string is a valid IP address and then return a struct with that string inside. Instead you parse that IP into raw integers, then join those back into an IP string.

The idea is that your parsed representation and serializer are likely to produce a much smaller and more predictable set of values than whatever may pass the validator.

As an example there was a network control plane outage in GCP because the Java frontend validated an IP address then stored it (as a string) in the database. The C++ network control plane then crashed because the IP address actually contained non-ASCII "digits" that Java with its Unicode support accepted.

If instead the address was parsed into 4 or 8 integers and was reserialized before being written to the DB this outage wouldn't have happened. The parsing was still probably more lax than it should have been, but at least the value written to the DB was valid.

In this case it was funny Unicode, but it could be as simple as 1.2.3.04 vs 1.2.3.4. By parsing then re-serializing you are going to produce the more canonical and expected form.
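
A sketch of that in Go using the standard library's net/netip package (my choice of API, not something from the outage writeup): the value that reaches storage is the re-serialized canonical form, never the caller's original bytes.

    import "net/netip"

    // canonicalIP parses the input and re-serializes it, so only a
    // canonical form is ever stored. Anything ParseAddr rejects never
    // reaches the database, and anything it accepts comes back in the
    // normalized form produced by String.
    func canonicalIP(raw string) (string, error) {
      addr, err := netip.ParseAddr(raw)
      if err != nil {
        return "", err
      }
      return addr.String(), nil
    }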


Perhaps "normalize" or "canonicalize" is more appropriate. A parser can liberally interpret but I don't take it to imply some destructured form necessarily. There are countless scenarios where you want to be able to reproduce the exact input, and often preserving the input is the simplest solution.

But yes, usually you do want to split something into its elemental components, should it have any.


Thanks for that explanation! I hadn't appreciated that aspect of "parse, don't validate," before.

But even with that understanding and from re-reading the post, that seems to be an extra safety measure rather than the essence of the idea.

Going back to my original example of parsing a Username and verifying that it doesn't contain any illegal characters, how does a parser convert a string into a more direct representation of a username without using a string internally? Or if you're parsing a uint8 into a type that logically must be between 1 and 100, what's the internal type that you parse it into that isn't a uint8?


Personally I don't think I would have used the phrase "parse don't validate" for something like a username. It isn't clear to me what it would mean exactly. I generally only think of this principle for data that has some structure, not so much a username or a number from 1-100.

An IP address would be about the minimum amount of structure. Something else would be processing API requests: you can take the incoming JSON and fully parse it as much as possible, rather than just validate that it is as expected (for example, drop unknown fields).
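
In Go, a sketch of that might decode the request body straight into a typed struct (the names here are invented); json.Unmarshal ignores fields the struct doesn't declare, so downstream code only ever sees the parsed shape:

    import "encoding/json"

    // CreateUserRequest is a hypothetical request type.
    type CreateUserRequest struct {
      Username string `json:"username"`
      Email    string `json:"email"`
    }

    // parseCreateUser turns raw JSON into the typed request; unknown
    // fields are dropped as a side effect of decoding into the struct.
    func parseCreateUser(body []byte) (CreateUserRequest, error) {
      var req CreateUserRequest
      if err := json.Unmarshal(body, &req); err != nil {
        return CreateUserRequest{}, err
      }
      return req, nil
    }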


> Or if you're parsing an uint8 into a type that logically must be between 1 and 100, what's the internal type that you parse it into that isn't a uint8?

Just for the sake of example, your internal representation might start from 0, and you just add 1 whenever you output it.

Your internal type might also not be a uint8. Eg in Python you would probably just use their default type for integers, which supports arbitrarily big numbers. (Not because you need arbitrarily big numbers, but just because that's the default.)


Just do

    type Username string
And replace

      return Username{username}, nil
with

      return Username(username), nil


The problem there is that you lose the guarantee that the parser validated the string value.

A caller can just say:

    // This is returning an error for some reason, so let's do it directly.
    // username, err := parsers.NewUsername(raw)
    username := parsers.Username(raw)
You also get implicit conversions in ways you probably don't want:

    var u Username
    u = "<hello>" // Implicitly converts from string to Username


That's true I did not think of that.


If you do that, people outside the package can also do Username(x) conversions instead of calling NewUsername. Making the value field package-private means that you can only set it from outside the package using the provided functionality.


the fact that this is some special “technique” really shows how far behind Go’s type system & community around typing is


So far I like the commonly used approach in the Typescript community best:

1. Create your Schema using https://zod.dev or https://github.com/sinclairzx81/typebox or one of the other many libs.

2. Generate your types from the schema. It's very simple to create partial or composite types, e.g. UpdateModel, InsertModels, Arrays of them, etc.

3. Most modern Frameworks have first class support for validation, like Fastify (with typebox). Just reuse your schema definition.

That is very easy, obvious and effective.


This doesn’t solve the problem.

If I have a user type, inferred from a Zod schema:

> { username: string; email: string }

And a function which takes that type:

> storeUser(user: User)

There is absolutely nothing that guarantees that the user object has been parsed by Zod. You can simply:

> storeUser({ username: “”, email: “no” })

And Typescript will not shout at you.

The only way to comparably solve it with Typescript is to inject a symbol into the object during parsing which confirms it has been passed through the correct parser function.

Personally, I just do basic type parsing on input data (usually request data) and more strict parsing, where constraints like “is this a valid username, is this a valid email” are checked, during output (usually sending to the database). What happens in between I/O doesn’t matter much in many projects (CRUD), and in the places it does you can enforce more rigidity.


Depends on what the problem definition is. If it's having as bulletproof a solution as possible, you're of course right.

But there are simpler and cheaper solutions that might be good enough.

I guess my intention was also to point out that there are mature frameworks for many languages, but somehow most people in the Go community keep reinventing the wheel, and unfortunately more often worse than better. Some years ago I wrote a Go web service. Of course I found the first two versions of OP's series. They're great to read and even greater to watch on YT, but I preferred the approach of ardanlabs (Bill Kennedy). It was for sure interesting going through all of this, but incredibly time-consuming.


Well, I use ajv and they have ways of applying format validation, so not just saying: "this is a string", but rather, "this is a string and must be a valid domain name".

Now, if your complaint is rather that you can call whatever method and pass in your bogus data, I don't see the point in arguing that. It's your code, the only person who can stop you is you.


> Now, if your complaint is rather that you can call whatever method and pass in your bogus data

This entire comment thread is a discussion about how to prevent that from being a possibility. The person I responded to threw their hat in with a Typescript solution that doesn’t achieve the goal being discussed. I was simply pointing this out.


>> I've found validators to be much more error-prone than leveraging Go's built-in type checker.

>This entire comment thread is a discussion about how to prevent that from being a possibility.

No, this thread is also about how much you need to invest to be safe enough, when time and resources are limited.


> and a human-readable explanation of the issue is set as the value.

This is annoying to translate later. At least also include some error code string that is documented somewhere and isn't prone to change randomly.


I mean, you may end up just wanting something like,

    type UsernameError struct {
      name   string
      reason string
    }
    func (e *UsernameError) Error() string { 
      return fmt.Sprintf("invalid username %q: %s", e.name, e.reason)
    }
And reason can be "username cannot be empty" or "username may not contain '<'" or something like that.

This is fine for lots of different cases, because it’s likely that your code wants to know how to handle “username is invalid”, but only humans care about why.

I have personally never seen a Go codebase where you parse error strings. I know that people keep complaining about it so it must be happening out there—but every codebase I’ve worked with either has error constants (an exported var set to some errors.New() value) or some kind of custom error type you can check. Or if it doesn’t have those things, I had no interest in parsing the errors.
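
For example, the "custom error type you can check" route usually looks something like this sketch, which assumes NewUsername wraps its failures in the *UsernameError type above (the snippets in this thread don't actually wire that up):

    import "errors"

    // usernameIsValid branches on the error's type rather than on the
    // text of its message; the human-readable reason stays for logs.
    func usernameIsValid(raw string) bool {
      _, err := NewUsername(raw)
      var uerr *UsernameError
      if errors.As(err, &uerr) {
        return false // definitely a validation failure
      }
      return err == nil
    }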


I write mostly frontends. Sometimes the APIs I talk to give back beautiful English error messages - that I can't just show to the user, because they are using a different language most of the time. And I don't want to write logic that depends on that sentence, far too brittle.


Right—I think the “error code” here is going to be the error type, i.e., UsernameError, or some qualified version of that.

It’s not perfect, but software evolves through many imperfect stages as it gets better, and this is one such imperfect stage that your software may evolve through.

Including a human-readable version of the error is useful because the developers / operators will want to read through the logs for it. Sometimes that is where you stop, because not all errors from all backends will need to be localized.


You can use new types with validation too. In fact the approaches seem to be duals.

Parse, don't validate:

                    string          ParsedString
  untrusted source -------> parse --------------> rest of system
Validate, don't parse:

                    UnvalidatedString            string
  untrusted source ------------------> validate -------> rest of system


The problem is that pattern "fails open." If anyone on the team forgets to define an untrusted string as UnvalidatedString, the data skips validation.

If you default to treating primitive types as untrusted, it's hard for someone to accidentally convert an untrusted type to a trusted type without using the correct parse method.


The dual problem would be any function which forgets to accept a ParsedString instead of a string can skip parsing.

Both cases appear to depend on there being a "checkpoint" all data must go through to cross over to the rest of the system, either at parsing or at UnvalidatedString construction.


>The dual problem would be any function which forgets to accept a ParsedString instead of a string can skip parsing.

>Both cases appear to depend on there being a "checkpoint" all data must go through to cross over to the rest of the system, either at parsing or at UnvalidatedString construction.

The difference is that if string is the trusted type, then it's easy to miss a spot and use the trusted string type for an untrusted value. The mistake will be subtle because the rest of your app uses a string type as well.

The converse is not true. If string is an untrusted type and ParsedString is a trusted type, if you miss a spot and forget to convert an untrusted string into a ParsedString, that function can't interact with any other part of your codebase that expects a ParsedString. The error would be much more visible and the damage more contained.

I think UnvalidatedString -> string also kind of misses the point of the type system in general. To parse a string into some other type, you're asserting something about the value it stores. It's not just a string with a blessing that says it's okay. It's a subset of the string type that can contain a more limited set of values than the built-in string type.

For example, parsing a string into a Username, I'm asserting things about the string (e.g., it's <10 characters long, it contains only a-z0-9). If I just use the string type, that's not an accurate representation of what's legal for a Username because the string type implies any legal string is a valid value.


The example also assumes that everything is like a 'ParsedString' that contains a copy of the original untrusted value inside.


My Go is rusty, do you mean not exporting the type "Username" (ie username) to avoid default constructor usage?


In Go, capitalized identifiers are exported, whereas lowercase identifiers are not.

In the example I gave above, clients outside of the package can instantiate Username, but they can't access its "value" member, so the only way they could get a populated Username instance is by calling NewUsername.


Love it, I called it entity factories https://bower.sh/entity-factories


AKA 'Value Object' from DDD or a similar 'Quantity' accounting pattern. Another angle is that this fixes 'Primitive Obsession' code smell.


Now what? The username is in an unexported field and unusable? I can kind of see what it's going for, but it seems like a way just to add another layer of wrapping and indirection.


It would need a getter here. Probably good to keep it immutable, if you want guarantees that it will never be changed to something that violates the username rules.


> need a getter

Yeah, that's what I figured. I'm not sure if I want the tradeoff of calling .GetValue in multiple places just to save calling validate in maybe 2 or 3 places.

Not to mention I can't easily marshal/unmarshal into it, and next week a valid username is a username that doesn't already exist in the database.

Maybe this approach appeals to people and I'm hesitant to say “that’s not how Go is supposed to be written”, but for me this feels like “clever over clear”.


> Yeah, that's what I figured. I'm not sure if I want the tradeoff of calling .GetValue in multiple places just to save calling validate in maybe 2 or 3 places.

The tradeoff is not that you save calling validate, it’s that you avoid forgetting to call validate in the first place, because when you forget to validate, you get a type error.

IMO it’s a little more clear this way:

    type Ticket struct {
      requestor Username
      assignee  Username
    }
It lets you write code that is a little more obvious.


I’m not sure I understand. In your example you’ve grouped related data in a struct and validating that it matches your system’s invariants, that feels good to me.

The original example was more “wrap a simple type in an object so it's always validated when set”, which looks beautiful when you don't show the needed getters in the example, nor all the Get call sites, as opposed to the 1 or 2 New call sites. All in the name of “we don't want to set the username without validation”, but without private constructors Username{“invalid”} can be invoked, the validation circumvented, and I'm not convinced the overhead we paid was worth it.


The countless bugs I've had to deal with and all the time I've lost fixing these bugs caused by people who forgot to validate data in a certain place or didn't realize they had to do so proves to me that the overhead of calling a get on a wrapper type is totally worth it.

I value the hours wasted on diagnosing a bug far more than the extra keystrokes and couple of seconds required to avoid it in the first place.


No, you’ve achieved an illusion of that, as now you're spending hours wasted on discovering where a developer forgot to call NewUsername and instead called Username{“broken”}. I can't see the value in this abstraction in Go.


They can’t because value is not exported. They must use the NewUsername function, which forces the validation.

In my opinion, this pattern breaks when the validation must return an error and everything becomes very verbose.


Oh, that's true about it being unexported. I hadn't considered that.


But surely this is just another way of doing validation and not fundamentally "parsing"? If at the end you've just stored the input exactly as you got it, the only parsing you're potentially doing is in the validation step and then it gets thrown away.


Implementation-wise, yes, but the interface you're exposing is indistinguishable from that of a parser. For all your consumers know, you could be storing the username as a sequence of a 254-valued enum (one for each byte, except the angle brackets) and reconstructing the string on each "get" call. For more complex data you would certainly be storing it piecewise; the only reasons this example gets a pass are 1) because it is so low in surface area that a human can reasonably validate the implementation as bug-free without further aid from the type checker, and 2) because Go's type system is so inexpressive that you can't encode complex requirements with it anyway.


The validation is not completely thrown away, since the type indicates that the data has been validated. I understand "parsing" as applying more structure to a piece of data. Going from a String to an IP or a Username fits the definition.

I push my team to use this pattern in our (mostly Scala) codebase. We have too many instances of useless validations, because the fact that a piece of data has been "parsed"/validated is not reflected in its type when using simple validation.

For example using String, a function might validate the String as a Username. Lower in the call stack, a function ends up taking this String as an arg. It has no way of knowing if it has been validated or not and has to re-validate it. If the first validation gets a Username as a result, other functions down the call stack can take a Username as an argument and know for sure it's been validated / "parsed".


Encapsulation saves lives.





