Fast 2D Rendering on GPU (raphlinus.github.io)
286 points by raphlinus on June 13, 2020 | 123 comments


Fast 2D rendering on GPU has represented the last few months of concentrated work for me, and I'm happy to present the results now. It's required going pretty deep into various aspects of GPU compute, so feel free to ask me about that, 2D vector graphics, or anything related.


Hi! This looks interesting, although I confess I'm very new to this area. I'm writing a programming language specifically for UI designers, and currently I'm building a naive implementation in HTML5 canvas. I know very little about low-level rendering - just really know how to use drawing APIs like canvas and Quartz 2D.

With that being said, I'm looking to delve deeper into this subject. Do you have any recommendations on where to start? Whenever I look beyond simple drawing APIs, the focus seems to be entirely on 3D rendering (which is interesting but not my main focus right now).


It's a good question, as honestly the knowledge for 2D graphics is pretty arcane, as opposed to 3D being so widely taught. I actually started a GitHub repo for a book on 2D graphics but have no idea whether I'll actually finish it.

In the meantime, antigrain.com is one good (if old) source. The original PostScript "red book" was extremely influential in its time (it's where I learned a lot of this stuff) but is quite dated now. Best of luck, and I'm also happy to field requests for more specific areas. For example, for color theory (an important aspect of 2D graphics!), handprint.com is quite a remarkable resource.


> knowledge for 2D graphics is pretty arcane

such an unfortunate state of affairs!

i am currently learning how to render graphics using the GPU on my mac using apple metal. what i am getting is that the GPU has been optimized for 3D rendering?! GPUs make no provision or easy way for rendering 2D graphics?

it makes no sense to me... that's where you start...


I remember asking a similar question on HN a while back. The response was that 2D graphics, UIs in particular, are mostly computed on the CPU. I have no idea why this is the case, though.


See this blog post; it explains it pretty well imho: https://blog.mecheye.net/2019/05/why-is-2d-graphics-is-harde...

And: historically they've been computed mostly on CPU, but I think it's time for that to change.


During the late 1990s and early 2000s it was a lot more common for GPUs to provide 2D acceleration, and GUIs were drawn using those primitives. I remember the switch to CPU rendering happening, and the subsequent removal of 2D acceleration from GPUs, but I don't remember why.

At any rate, the 2D graphics we expect now are a lot more complex than the unantialiased lines, blits, and fills of old.


> And: historically they've been computed mostly on CPU, but I think it's time for that to change.

It would be great to wait a bit for OS & GPU power management to evolve before biting the bullet on that. My laptop goes from 6 to 2.something hours of battery as soon as I have a GL context open somewhere, likely because it powers on its discrete GPU automatically in that case.


This is changing. I've been doing power measurements as well (just didn't make the cut of this blog post), and the 1060 is surprisingly power-efficient in its low frequency modes. It's also generally the case that the GPU is always active in its role running the compositor.


> and the 1060 is surprisingly power-efficient in its low frequency modes.

Maybe? The computer on which this happens is a 1070. But please be aware that 10-series cards are in the hands of a very small percentage of people. The average laptop of non-tech people around me is easily 8 years old, often on its 2nd or 3rd battery... and these people won't be able to complain easily to anyone when their new battery's life is suddenly halved because of $SOFTWARE.


With most dual-GPU machines you do get a choice whether to power on the discrete GPU or not. It's even supported on GNOME/Wayland as of late.


Both macOS and Windows have ways for applications to specify whether they prefer a discrete GPU or integrated graphics.


Don't fool yourself, they won't do anything until most of the browser/apps do it. Then they will fix it to sell "longer" battery life.


I've done a fair bit of 2d graphics work (written a rasterizer, etc). Honestly it's because it's

  1) tricky to shoehorn 2D graphics onto the APIs that GPUs provide and 
  2) really not needed.  I can easily render eg: a world map with hundreds of thousands of lines at hundreds of frames/second with one core.


Please don’t use preformatted text to write lists. It’s a pain. Just leave a blank line between the items so that each is a paragraph.


If you need portable results pixel to pixel on various platforms then CPU based rendering is more straightforward than using the GPU.

Various libraries, such as freetype for font rasterization, only work on the CPU.

Plenty of research and implementation work is left to be done in order to use the GPU more widely.


This is an interesting and subtle point about doing "software on GPU compute." You are in complete control over what gets computed, and are not at the mercy of the hardware's fixed function pipeline for stuff like rasterization rules and sampling patterns for antialiasing. So I think portable results pixel to pixel are in fact viable.

Of course CPU rendering is always more straightforward than GPU, the higher performance comes at a significant cost in complexity.


now this may be a dumb question, but why would you start there?

as far as i understand, 2d and 3d have literally zero to do with each other in how they are rendered. one is a bunch of triangles. the other is lines, curves, thickness, gradients, and fonts (which are essentially little programs)


> the other is lines, curves, thickness, gradients, and fonts

You can reduce all these to drawing triangles.


> You can reduce all these to drawing triangles.

You can, people have tried this, and it sucks. The main problem is that the conversion of Bézier paths to triangles is a hard problem with lots of conditional branching. Even when you do it, there is the other problem of rendering triangles with really good antialiasing, MSAA forces a compromise between performance and quality. By contrast, piet-gpu does an exact-area calculation for antialiasing.

So it's not a question of whether you can do it, but whether it works well, and approaches like piet-gpu absolutely stomp triangles.
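
To give a concrete feel for the exact-area idea (an illustrative sketch in Rust, not code from piet-gpu): for an edge that crosses a pixel you can integrate the covered area analytically instead of counting MSAA sample hits, so coverage is a continuous value rather than a multiple of 1/4 or 1/16.

    /// Area of the unit pixel [0,1]x[0,1] lying below the line y(x) = y0 + slope * x.
    /// This is the analytic ("exact-area") coverage contributed by one edge that
    /// spans the pixel horizontally; a full rasterizer sums signed contributions
    /// like this per edge and per pixel.
    fn edge_coverage(y0: f64, slope: f64) -> f64 {
        // Integrate clamp(y0 + slope * x, 0.0, 1.0) over x in [0, 1], piecewise.
        if slope == 0.0 {
            return y0.clamp(0.0, 1.0);
        }
        // Where the line crosses y = 0 and y = 1, clipped to the pixel.
        let xa = ((0.0 - y0) / slope).clamp(0.0, 1.0);
        let xb = ((1.0 - y0) / slope).clamp(0.0, 1.0);
        let (x_lo, x_hi) = if xa < xb { (xa, xb) } else { (xb, xa) };
        let y = |x: f64| y0 + slope * x;
        // Left and right pieces are clamped to a constant 0 or 1; the middle
        // piece stays inside [0, 1], so the trapezoid rule is exact there.
        let left = y(0.0).clamp(0.0, 1.0) * x_lo;
        let mid = 0.5 * (y(x_lo) + y(x_hi)) * (x_hi - x_lo);
        let right = y(1.0).clamp(0.0, 1.0) * (1.0 - x_hi);
        left + mid + right
    }

    fn main() {
        // A 45-degree edge through the pixel center covers exactly half of it.
        println!("{}", edge_coverage(0.0, 1.0)); // 0.5
        // 4x MSAA would quantize coverage to multiples of 0.25; this is exact.
        println!("{}", edge_coverage(0.3, 0.2)); // 0.4
    }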


> there is the other problem of rendering triangles with really good antialiasing

Easier than you think. Here are a couple of lines of pixel shader that do that, with really good antialiasing and without MSAA:

https://github.com/Const-me/Vrmac/blob/master/Vrmac/Draw/Sha...


Doing that reduction is surprisingly difficult; it's usually a serial algorithm that runs on the CPU, and with a naive approach the resulting triangle set is not efficient for the GPU. But that's what toolkits like cairo, Direct2D, and nanovg do.

Raph is describing an architecture where path evaluation happens on the GPU, without being baked to triangles.


yes, you can draw 2d in 3d space. this thread however is about 3d being built on top of 2d. not 2d being built on top of 3d.


that's not a dumb question! it's my own fault. i only have knowledge of 2D graphics.


Haiku OS AppServer (the screen rendering component, similar to Unix X11) is a full GUI system implemented with AntiGrain Geometry as the renderer.


I think that blend2d (https://blend2d.com/) is a worthy successor to AGG, and it's under active development.


Thank you for the link. It was a great read. But I just want to point out that Blend2D is a software renderer.


So is/was AGG


Thanks!


I recommend the blog of the OP, so many gems


What's been your overall experience with using Vulkan shaders for compute? Are there basic primitives that are missing from shading languages and/or have you found any impedance mismatches between writing shaders vs. how you might describe the same algorithms in other languages?


That's a big topic. I've been able to work around the missing primitives (for example, I autogenerate code for Rust-style structs and enums), but have had much bigger struggles around two issues: tools, which are still quite primitive, and understanding performance, which is extremely difficult. These two problems intersect because I can imagine a lot better tools for digging into performance issues. One that I would have paid good money for is an instruction-level simulator that would highlight the source code to tell me where the stalls, bank conflicts, divergence problems, etc. are in the source code. Such a thing is possible (there are academic papers like [1]), but not as far as I know usable in daily development.

The "impedance mismatch" is that you (generally) have to write in a style to extract lots of parallelism. This tends to be very different than the way you'd write scalar CPU code, but not completely alien to me as it has a lot of similarity with the way you'd write SIMD. I've pretty much gotten the hang of it now. I'm thinking of a blog post of redoing path_coarse.comp from its current basically scalar style to a more parallel version, as that would I think illuminate the issues.

[1] http://comparch.gatech.edu/hparch/papers/gera_ispass18.pdf
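
As a toy illustration of the restructuring described above (mine, not piet-gpu code): the classic case is turning a loop with a serial dependency, like computing output offsets for variable-sized items, into a two-level prefix sum whose shape maps directly onto GPU workgroups.

    /// Scalar style: output offsets for variable-sized items, with a serial
    /// dependency carried through the whole loop.
    fn offsets_scalar(counts: &[u32]) -> Vec<u32> {
        let mut out = Vec::with_capacity(counts.len());
        let mut acc = 0u32;
        for &c in counts {
            out.push(acc);
            acc += c;
        }
        out
    }

    /// Parallel-friendly style: split into chunks ("workgroups"), reduce each
    /// chunk independently, scan the much smaller array of chunk totals, then
    /// do independent local scans. Apart from the tiny middle scan, every step
    /// runs in parallel across chunks.
    fn offsets_two_level(counts: &[u32], chunk: usize) -> Vec<u32> {
        // Step 1: per-chunk totals (a parallel map/reduce on a GPU).
        let totals: Vec<u32> = counts.chunks(chunk).map(|c| c.iter().sum()).collect();
        // Step 2: exclusive scan over the chunk totals.
        let mut base = 0u32;
        let bases: Vec<u32> = totals
            .iter()
            .map(|&t| { let b = base; base += t; b })
            .collect();
        // Step 3: local exclusive scan within each chunk, offset by its base.
        counts
            .chunks(chunk)
            .zip(bases)
            .flat_map(|(c, b)| {
                let mut acc = b;
                c.iter().map(|&x| { let o = acc; acc += x; o }).collect::<Vec<_>>()
            })
            .collect()
    }

    fn main() {
        let counts = [1u32, 2, 3, 4, 5];
        assert_eq!(offsets_scalar(&counts), offsets_two_level(&counts, 2));
        println!("{:?}", offsets_two_level(&counts, 2)); // [0, 1, 3, 6, 10]
    }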


I pondered on the same subject recently as I was implementing the same algorithm (i.e. Mandelbrot set) on the CPU (scalar vs SIMD) and GPU compute using fixed-point and floating-point for comparisons (if interested: https://tayfunkayhan.wordpress.com/2020/06/03/mandelbrot-in-...).

It bothers me how little progress has been made on the "shading" languages front compared to overall many-core computation models and capabilities over the years. And that is despite the fact that shaders are very often where the most time is spent in modern workloads.

Compute with Vulkan is another story. It offers some nice abstractions, but it shows that it's mostly intended for async-compute/work-offloading for rendering, IMO. Too much friction.


- Shouldn't 2D rendering be a solved problem given that it's basically a subset of 3D rendering?

- Don't libraries like Skia, Qt, Cairo use GPU rendering? I've always assumed so. I mean, this is 2020, GPUs have been around for decades.


> - Shouldn't 2D rendering be a solved problem given that it's basically a subset of 3D rendering?

The problem is that primitives artists use are different. 3D rendering tends to all consist of polygon meshes, which are relatively easy to render. 2D rendering (basically) consists of Bezier paths, which are harder. The equivalent in 3D, which is adaptive subdivision, is not really a solved problem in real-time either.

Additionally, 2D rendering quality tends to be more important than 3D rendering quality. Whereas you can get away with 4xMSAA or hacks like FXAA in 3D, true 16xAA (without hacks) is the absolute minimum for 2D rendering quality nowadays, and even it isn't considered great for some tasks like font rendering (Pathfinder and piet-gpu both use analytic AA which is effectively 256xAA).

> - Don't libraries like Skia, Qt, Cairo use GPU rendering? I've always assumed so. I mean, this is 2020, GPUs have been around for decades.

There's a difference between renderers with GPU support and renderers that are oriented around using the GPU efficiently. In many cases this results in an order-of-magnitude speedup. On the GPU, state changes are expensive, and many such renderers that have GPU support don't really go out of their way to avoid them. There are also occlusion culling optimizations that most renderers don't do, but piet-gpu and Pathfinder do.


Others have spoken to this, but as a general introduction I highly recommend Jasper's post "Why are 2D vector graphics so much harder than 3D?" In short, no, it's not a solved problem.

Also, I see a lot of variations of this question, but I should state this more clearly. There's been accelerated graphics in one form or another for a long time, but what I'm doing is a completely different type of thing. In my world, on the CPU you just encode the scene into a binary representation that's optimized for GPU but in many ways is like flatbuffers, and then the GPU runs a highly parallel program to render the whole thing. In previous approaches, the CPU is deeply involved in taking the scene apart and putting it back together in a form that's well suited to relatively dumb pixel pipes. Now that GPUs are really fast, that approach runs into limitations.

It also depends what you're trying to do. I'm focusing here on dynamic paths (and thus font rendering), while most of the libraries optimized for UI put text into texture atlases and then use the GPU to composite quads to the final surface, something they can do well.

https://blog.mecheye.net/2019/05/why-is-2d-graphics-is-harde...
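
To make the "encode the scene into a binary representation" part more concrete, here is a hypothetical sketch (my own illustration, not piet-gpu's actual format) of the kind of flat, fixed-stride encoding a CPU might build and upload for a compute shader to consume:

    // Hypothetical flat scene encoding: fixed-size tagged records in one buffer,
    // so the GPU can find element i at offset i * size_of::<Element>() with no
    // pointer chasing. A real encoding is richer; this only shows the shape.

    #[repr(u32)]
    enum ElementTag {
        FillColor = 0,
        LineSegment = 1,
    }

    /// One slot in the scene buffer: a tag plus enough payload words for the
    /// largest variant.
    #[repr(C)]
    struct Element {
        tag: u32,
        payload: [f32; 4],
    }

    #[derive(Default)]
    struct Encoder {
        buf: Vec<Element>,
    }

    impl Encoder {
        fn fill_color(&mut self, rgba: u32) {
            // Pack the color bits into a float slot; the shader reinterprets them.
            self.buf.push(Element {
                tag: ElementTag::FillColor as u32,
                payload: [f32::from_bits(rgba), 0.0, 0.0, 0.0],
            });
        }
        fn line(&mut self, p0: [f32; 2], p1: [f32; 2]) {
            self.buf.push(Element {
                tag: ElementTag::LineSegment as u32,
                payload: [p0[0], p0[1], p1[0], p1[1]],
            });
        }
        /// Raw bytes that would be uploaded to a GPU storage buffer.
        fn bytes(&self) -> &[u8] {
            unsafe {
                std::slice::from_raw_parts(
                    self.buf.as_ptr() as *const u8,
                    self.buf.len() * std::mem::size_of::<Element>(),
                )
            }
        }
    }

    fn main() {
        let mut enc = Encoder::default();
        enc.line([0.0, 0.0], [100.0, 50.0]);
        enc.fill_color(0xff00_00ff);
        println!("scene is {} bytes", enc.bytes().len());
    }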


Can you expound on the principle of tiling mentioned in your algorithm a bit more? The conventional mechanism is to use de Casteljau to divide a Bezier curve into triangles and then rasterize those triangles on the GPU. If the curve needs to be scaled, the triangulation/tessellation is done again. How is the algorithm presented in the link different? Somehow the concept of tiling seems to imply that rasterization of the curve is done on the CPU itself. What am I missing?


I recommend reading the blog post series, I'm not sure I can usefully summarize the concepts in a comment reply. But very briefly, there's a flattening step (evaluated on GPU, based on de Casteljau) that converts the Bezier into a polyline (not triangles), then a tiling step that records for each tile a "command list" that contains the complete description of how to render the pixels in a tile, finally "fine rasterization" so that each workgroup reads that command list and renders 256 pixels in parallel from it. From your question, it sounds like your mental model is pretty different from how this pipeline works.
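
For readers trying to picture that flattening step, here is a minimal CPU-side sketch (recursive for clarity; the actual GPU kernel is written in a non-recursive, data-parallel style) of de Casteljau subdivision turning a quadratic Bezier into a polyline:

    type Point = (f64, f64);

    fn lerp(a: Point, b: Point, t: f64) -> Point {
        (a.0 + (b.0 - a.0) * t, a.1 + (b.1 - a.1) * t)
    }

    /// Distance from the control point to the chord p0-p2, used as a flatness test.
    fn flatness(p0: Point, p1: Point, p2: Point) -> f64 {
        let (dx, dy) = (p2.0 - p0.0, p2.1 - p0.1);
        let num = (dx * (p1.1 - p0.1) - dy * (p1.0 - p0.0)).abs();
        num / dx.hypot(dy).max(1e-12)
    }

    /// Append the polyline for the quadratic (p0, p1, p2) to `out`, excluding p0.
    fn flatten_quad(p0: Point, p1: Point, p2: Point, tol: f64, out: &mut Vec<Point>) {
        if flatness(p0, p1, p2) <= tol {
            out.push(p2);
        } else {
            // de Casteljau: split at t = 0.5 into two smaller quadratics.
            let (q0, q1) = (lerp(p0, p1, 0.5), lerp(p1, p2, 0.5));
            let mid = lerp(q0, q1, 0.5);
            flatten_quad(p0, q0, mid, tol, out);
            flatten_quad(mid, q1, p2, tol, out);
        }
    }

    fn main() {
        let mut poly = vec![(0.0, 0.0)];
        flatten_quad((0.0, 0.0), (50.0, 100.0), (100.0, 0.0), 0.25, &mut poly);
        println!("flattened into {} line segments", poly.len() - 1);
    }

The tiling stage then bins these segments per tile, and the fine rasterizer walks each tile's command list with one workgroup (256 pixels), as described above.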


Yes, I am trying to align my mental model with yours. I read your blog posts "A sort-middle architecture for 2D graphics" and "2D Graphics on Modern GPU", but I'm still unable to grasp the fundamental guiding principles. It's not clear what commands constitute each tile, and whatever they are, what the fundamental reason is that performance ends up better than joining the polylines of a curve to get a triangle list and having those triangles rasterized by the GPU. Is there any blog/article on the fundamental principles that you would recommend?


No, it is not a subset, unless you are talking about drawing textures.

Those libraries do GPU rendering using textures, like distance fields (Qt), or simply rasterize fonts in 2D and draw them as textures using the GPU, like Apple or cairo usually do.

Distance fields are blurry when you have small fonts on display.

Things like calculating the exact area under the 2D curve are something you could easily do on the CPU, but extremely difficult to do on the GPU. You need decoupled data in order to parallelize it.


> Don't libraries like Skia, Qt, Cairo use GPU rendering? I've always assumed so. I mean, this is 2020, GPUs have been around for decades.

As a Qt user / developer - you can use the GPU for your app, e.g. with Qt Quick or QGraphicsView, but there are sometimes good reasons to stick to CPU & software rendering, e.g. it is somewhat common to want to have intertwined "native" OS widgets (which are all CPU-rendered raster things) and custom GPU-drawn scene - this is a case where things fall a bit apart.

Another thing is that pretty, freetype-like font rendering is super expensive when you have a lot of text to show, and can't really be done (at least I have definitely not seen infinality-level beauty from the state of the art) on the GPU yet. Next to that, filling rects (read: 95% of a UI) with SSE/AVX/AVX2 as Qt does is stupidly fast.


> pretty, freetype-like font rendering is super expensive when you have a lot of text to show, and can't really be done (at least I have definitely not seen infinality-level beauty from the state of the art) on the GPU yet

This is exactly what Pathfinder does. pcwalton is on this thread and is the main author of that.

https://github.com/servo/pathfinder#features says:

> Advanced font rendering. Pathfinder can render fonts with slight hinting and can perform subpixel antialiasing on LCD screens. It can do stem darkening/font dilation like macOS and FreeType in order to make text easier to read at small sizes. The library also has support for gamma correction.


> This is exactly what Pathfinder does.

from the screenshots I saw so far, pretty much not.

> It can do stem darkening/font dilation like macOS and FreeType in order to make text easier to read at small sizes.

there are tons of different ways to do that. Even freetype has a few different algorithms to do it, some not even merged if I'm not mistaken, which give wildly different results


To give you an idea that pcwalton knows what he’s been doing and has indeed been seeking to match platform rendering exactly, here are a couple of tweets about the macOS font dilation: https://twitter.com/pcwalton/status/918593367914803201, https://twitter.com/pcwalton/status/918991457532354560.

I rather like the demonstration of rendering including subpixel rendering at https://twitter.com/pcwalton/status/971475785616797698, as well.


> To give you an idea that pcwalton knows what he’s been doing and has indeed been seeking to match platform rendering exactly,

I don't intend at all to cast doubt on pcwalton's abilities - the work is brilliant without any hesitation.

But I wonder how that is possible, given that "platform rendering" has pretty much changed every other macOS version and every Windows version ("ClearType" from WinXP is definitely not "ClearType" from Win10); and let's not talk about the customization abilities of freetype, which make rendering on any two Linux boxes entirely distinct as well.


Things like multi-resolution font-rendering are actually more complicated than one might imagine.

The easy way is to just tessellate the font into polygons -- but this tessellation often depends on the zoom level. The same thing can be said about implicit curves etc.

Most libraries do use GPUs for basic draw operations (e.g. rendering a gradient), but to build something like Photoshop -- you need much more complexity.


How well does this approach work for something like 2D data visualization where most of the visual elements are the same -- i.e. can be instanced in OpenGL/etc?

Thanks for publishing this, it's awesome work! I'm looking forward to progression to wgpu hinted in the Github README.


I absolutely have data visualization in mind for this, as I think it can benefit greatly from the scale. But the pipeline I've built is very agile; it will easily handle a diverse mix of items. It's not like OpenGL etc., where there's a certain amount of overhead per draw call and so significant gains to be had from instancing and batching.

It is likely that CPU-side encoding can be made more efficient, though, by just filling in quantities to a template, rather than encoding from scratch.


any relation to the work currently being done here by pcwalton?

https://github.com/servo/pathfinder/pull/350

hopefully you guys aren't doing identical work in parallel (no pun intended) :)


Yes, there's been a lot of influence in both directions. The approaches have a lot in common, but also have significant differences. I plan to write up a description of the Pathfinder compute work soonish.


I would love that!


As Patrick indicated, there's a lot of cross-fertilization of ideas, and one of the best outcomes of this work would be for high performance compute rendering to ship in Pathfinder.


It looks like the main fruit of this is piet-gpu[1]. How close would you say it is to stable? And how complete a realisation of the concepts put forth in the OP? I'm currently working on a UI library and evaluating alternatives for hardware acceleration. The whole library wants to be in C, so I would want to rewrite it by release time to use this, but I don't want to waste time doing that now if it's still really volatile.

1. https://github.com/linebender/piet-gpu


Not very, and not very. This is research, it depends on compute capabilities that have really only gone mainstream in GPU lately, and the current codebase is about trying out the ideas. Best of luck in your project!


Wow, those numbers look very impressive on an absolute basis! Do you have an idea of how piet-gpu compares to Skia? I'm not a graphics guy; my impression was that Skia is currently the dominant vector graphics renderer.


I would like to get an idea of the capabilities of these things.

Suppose I was to make an ebook reader designed to be modelled more closely after paper, on a device like the Surface Book’s 13″ 3000×2000 display, showing two pages side-by-side. Each page might contain something like 2,000–2,500 letters. I want to be able to flip through pages like I might with a paper book, so that I might be roughly completely rendering several pages at once, and perhaps parts of several more pages; ideally it might render the page like a real 3D page, but if that’s too troublesome I’d settle for an affine transformation while flipping. Assume that the layout of the pages, with all the shaping, is all done ahead of time and is in memory.

I’ve never seen anyone attempt anything like this before. In the old way of doing things, I think anyone attempting this would render each page to a bitmap and use that as a GPU texture, and I think that could provide acceptable performance (unless you were flipping rapidly through hundreds of pages, because that’d take a lot of GPU memory to keep), but I imagine that the quality of the rendering mid-turn would be fairly atrocious—it could be the sort of thing where the appearance of the text subtly changes half a second after you finish turning the page, as it switches from one renderer to a slightly different one.

Would the performance of this new approach be sufficient to render what I describe at 60fps, while rendering each frame perfectly?

I was talking about the Surface Book; its Intel Core i7-6600U has Intel HD Graphics 520 as its GPU; probably not too far off your 630’s results. (And I’m interested in what integrated graphics can do, more than a dedicated GPU.)

My guess based upon your paper-1 results is that this is probably just barely possible with integrated graphics, so long as you employ a few tricks to reduce the amount of work required.


Short answer, yes, I believe it could work, though not with much margin to spare. It might require some dedicated optimization work to fit the frame budget, especially when you know things about the scene (it's mostly small glyphs and with basically no overdraw) as opposed to being completely general purpose.

Doing a warp transformation in the element processing kernel would probably work just fine, and give you realistic movement and razor-sharp rendering.

I'd love to see such a thing and would very much like to encourage people to build it based on my results :)
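
For the curious, the kind of warp being talked about can be surprisingly little math. A hypothetical sketch (my illustration, not from piet-gpu) of a cylindrical page-curl mapping that could be applied per path point each frame, keeping glyph outlines exact mid-turn:

    /// Map a flat page coordinate to its position mid page-turn: points left of
    /// `curl_x` stay flat, points to the right wrap around a vertical cylinder
    /// of radius `r` (orthographic projection back to 2D drops the depth term).
    fn page_curl(p: (f32, f32), curl_x: f32, r: f32) -> (f32, f32) {
        let d = p.0 - curl_x;
        if d <= 0.0 {
            p
        } else {
            // Arc length d wraps around the cylinder; past half a turn the
            // point folds back toward the curl axis, like a real page.
            let angle = d / r;
            (curl_x + r * angle.sin(), p.1)
        }
    }

    fn main() {
        // Re-evaluating the warped outlines every frame keeps text razor sharp
        // mid-turn, instead of stretching a pre-rendered page texture.
        for x in [50.0_f32, 150.0, 250.0] {
            println!("{:?} -> {:?}", (x, 100.0), page_curl((x, 100.0), 100.0, 80.0));
        }
    }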


Another need for that kind of effect is for real-time events.

I work on software that is used to create visual effects in real-time. It's used in live events such as concerts, clubs, corporate presentations. There you want to be able to render high-quality text in high resolution, even higher than 4K, and still allow a performer to add effects on text such as zooming, scrolling, doing 3D transition effects...

Today the best way to do this is via a font atlas, but it does not look nice when zooming. Also, if you need non-European characters, preparing the font atlas can be heavy.

So I am looking for a library that would allow this kind of rendering manipulation for text rendering. Do you have any suggestions?


In the free software world, Pathfinder is probably your best bet, and I'm hopeful that it will take on speedups from compute (directly inspired by my research). The library that's probably the closest fit for what you describe is Slug.


You say free software and of course, it's nice to have source code and flexibility.

I have nothing against licensing and it looks like things are moving there. Do you have a recommendation for Metal / DirectX support for that kind of rendering facilities? What matters more for me is performance.


Slug has DirectX support (it runs on all major APIs). Pathfinder does not yet, but likely will in time. As always, evaluate the offerings and choose what meets your needs best. Performance is probably comparable between the two, but that depends hugely on workloads and how it's integrated into the rest of your system.


Thank you for sharing your experience I will investigate.


For the OP's effect, remember you're probably going to want motion blur in each frame, which means you can render at a much lower resolution in the direction of motion (as long as you can do multipass rendering)


I don't think you do want motion blur, as you don't know the direction in which the user's eyes are tracking. When games and films do motion blur, they know where they expect the viewer's attention to be, so blurring things they won't be tracking improves the look (especially at the low frame-rate of films). But if you happen to track the moving page, and you are likely to, and it has blur added, then it will look worse.

Look at phone interfaces; even if you scroll fast, they don't add motion blur, and on high framerate and/or low-persistence displays, you can read things while it scrolls.


I have done some testing around this (motion blur while scrolling) and can confirm that motion blur is not practical for scrolling. It “looks nice” and is a nice effect, but makes the text completely unreadable while scrolling.

If you have access to a Mac, the easiest way to check for yourself is to add a CIMotionBlur filter to a table view's layer's filter property (you can change the strength of the blur based on the speed of scrolling if you would like).


Eek, hadn’t thought about how motion blur might be desirable. That’s going to get messy.


I think you could easily get this done in a traditional 3D pipeline if you use a texture-based font, aniso/mip filtering, and GPU instancing. You wouldn't be issuing 2,500 draw calls; you can batch much of it together.

The reason you don't see this is because no one runs a 3d engine in their text readers. Fonts are not usually shared as sprite sheets. You also lose subpixel font rendering.


I would like to point out that caching one bitmap per page will be the preferred solution for a different but important reason: You don't want to have your laptop GPU running at 100% while viewing static text. Fidelity is nice, but I don't think you'll convince many people with an ebook reader that drains your mobile device's battery like that.


I don’t believe this is a problem. It will only need to render it when things change—so maybe the GPU will be sitting at 100% while you’re flipping through pages, but other than that typically very brief time, it should be at 0%.


Noob question, but hasn't some GPU acceleration for 2D been used since forever? I keep being perplexed as to why browsers still have problems with it (at least on my older MacBook with a shitty embedded Intel GPU). I thought some stuff, like scrolling, was offloaded at least in the early/mid-2000s, making for a distinctly meh experience when proper drivers weren't installed.


GPUs derive most of their parallelism from emitting rows of related "quads" (the smallest unit of the screen that supports a difference operator). 2D graphics are "chatty", in window sizes that are more like 5–10 units in diameter. It's hell on HW perf. To make things worse, 2D applications usually want submillisecond latency. A GPU driver/HW stack will struggle to get latency below 1–2 ms. When there's lots of multipass rendering (which is a thing 2D also wants a lot of), latency can go to 10+ ms.


Well said, thanks. It's also a goal of this work to do compositing inside the "fine rasterizer" compute kernel, to avoid multipass as much as possible. Of course, in the general case for stuff like blur that's hard, but that's one reason why I've been working on special cases like blurred rounded rectangles, which I can render pretty much at lightspeed. (It's not integrated into the current code but would be quite easy)


What does latency have to do with 2D? Are you suggesting UI toolkits are writing GPU commands synchronously? You should fill display list, fire it and forget about GPU until the next update.


Pen drawing for UIs. The ideal case is to update a small portion of the screen at about 240hz, to provide a good simulation of pen/pencil feedback. Really, your latency envelope should be on the order of the propagation of sound through the barrel of the marking device, but screens don’t update that fast.


You are probably thinking of https://www.youtube.com/watch?v=vOvQCPLkPt4

Bottleneck is in the input layer. While using a desktop computer look at the mouse cursor, now move the mouse - hardware accelerated 2D graphics, imperceptible latency.

Low latency thru orthogonal multiplexing https://www.youtube.com/watch?v=t1VcC9_yhc0


Surely >95% of screens out there right now are running at 60Hz, so 240Hz "pen drawing" is pretty niche and not a priority?


And >95% of screens don’t support pen drawing.

I expect a screen that supports or is designed for pen drawing to be somewhat more likely to be above 60Hz. All of these things are niche things that not many care about, but in any case, it’d be nice to be able to do better. And like with Formula 1 race cars, benefits from high-end techniques tend to trickle down to other more mainstream targets in time.


It's even more offloaded in browsers than I expected.

I recently had to do a proof of concept displaying a huge gantt chart (I really mean huge!).

Normally, 2D drawing has some disadvantages with textures and scaling/rotating.

This proof of concept included testing out various methods on the <canvas>, including both using WebGL and standard 2D calls.

To my surprise, I was able to get the standard 2D calls way faster, even with texturing and rotations (which I did not expect). All browsers (Chrome, Firefox, Edge, IE) do GPU optimizations with standard 2D canvas drawings.

Only when I was putting a shitload of different things on there (the gantt chart looked more like a barcode at this point) did the WebGL implementation start to outperform the 2D calls.

Note that I didn't go into programming shaders. In the end, it wasn't necessary, since the normal canvas calls already provided plenty of performance.

Also note that I'm not just some random dude with no experience. I'm creating a game dev tool, https://rpgplayground.com, which runs fully in the browser using Haxe and the Kha graphics library, and I've made plenty of games during my career.


Well, some rendering acceleration was with us from the very beginning in the form of DMA (https://en.wikipedia.org/wiki/Direct_memory_access) - transfer of blocks from RAM to VRAM.

And that's pretty much what we still have now, even on modern systems.

For desktop UI purposes (but not for games), GPU acceleration started to become critical only relatively recently - on high-DPI screens. The number of pixels jumped 9 times between 96 ppi and 320 ppi (Retina) screens. CPUs haven't changed that much in that time frame, so you may experience slow rendering on an otherwise perfect screen.


At the very beginning, x86 PC computers didn't have DMA capable of transferring RAM to RAM, not to mention DMA was slower than the CPU: http://www.os2museum.com/wp/the-danger-of-datasheets/ The first PC 2D acceleration was very much on the graphics chip, in the form of either IBM 8514 or TIGA compatibility ($1K cards when introduced). The Amiga also doesn't fit the profile, seeing that most models shipped with unified shared memory (chip RAM).


> Amiga also doesnt fit the profile, seeing most models shipped with unified shared memory (chip ram).

What profile? They did RAM to RAM DMA transfer exactly for video acceleration with the Agnus chip, which would access memory through DMA channels to perform 2D blitting.


c-smile specifically said 'RAM to VRAM'. Agnus/Alice didn't have DMA access to _non-video_ RAM, so the Amiga is automatically out.

What I think he was thinking of was blitters in general, not particular implementations using a DMA controller. The first blitter-accelerated 2D graphics I read about were done on the Xerox Alto using microcode.


I'm arguing from the point of view that 'VRAM to VRAM' (or maybe "shared I/O RAM" in Amiga's case) is a subset of 'RAM to VRAM'. You seem to be arguing from the point of view that just "RAM" in this context means non-video RAM. That's a fair assumption, and I understood it from your last post, but I don't think the difference is significant to the general point that we've had various forms of DMA-based 2D video acceleration for a long time.


As others have said, the 2D pipeline right now is mostly focused on drawing textures (dumb blocks of pixel data). The work in the OP is about drawing SVG graphics (complex sets of shapes, lines, curves).


Yes, both the X Server and Windows have had 2D hardware acceleration for years now: the X Server through a multitude of interfaces [1] and Windows through GDI and now Direct2D [2][3]

[1] https://en.wikipedia.org/wiki/X.Org_Server#2D_graphics_drive...

[2] https://en.wikipedia.org/wiki/Direct2D

[3] https://docs.microsoft.com/en-us/windows/win32/direct2d/comp...


Yes, it's common to have this kind of acceleration in a video controller. There are VESA extensions for blitting to screen memory for example. Even something as basic as a CGA text screen—where rendering of bitmap fonts is offloaded to the video card and the CPU only needs to operate on indices into a table of predefined characters—could be considered a basic form of 2D acceleration.

The article and most discussion here however concerns rasterization, specifically path rendering, which is useful when rendering things like fonts and image formats like SVG. You use primitives such as lines and curves to form outlines of objects and then fill them. It's a problem that does not very easily translate into rendering textured triangles (which is what the GPU is best at) or copying/moving screen regions, so there's some work in doing it in a performant way still, and a lot of it typically happens on the CPU.


There was some form of 2D GPU acceleration used in Windows XP that is no longer used in modern GPUs. I remember a PCI Trident video card that provided such acceleration for my Windows 98-era computer.

There's also a 3D fixed pipeline that was all that the first 3D GPUs could do and it is also removed from modern GPUs.

A game I still play to this day is called rFactor, and it runs much faster in DX9 mode than in DX8 or DX7 mode in my hardware. At the very least, it behaves that way in this week's test. This means my hardware doesn't have that old fixed pipeline.

Some people with old hardware can only run it with decent frame rate in DX7.

So the answer to your question is: yes, 2D has been accelerated since forever, but now we have general purpose 3D programmable video cards, and they are so good that now we only have general purpose 3D programmable video cards and the other forms of acceleration are obsolete.

Except for things like video codecs and DRM.


Try Windows 3.1 instead of Windows XP: 2D acceleration cards started in the early '90s; see https://en.m.wikipedia.org/wiki/S3_Graphics or the Trident pages, for example.


I take your word for it.

It's just that my Win3.1 computer was too slow for me to remember the acceleration.


Even DirectX had a predecessor back in Windows 3.1 with the Win32s extensions. It was called WinG and offered hardware-accelerated blitting.

https://en.wikipedia.org/wiki/WinG


I remember playing something that ran with the Win32s extensions and was similar to Wolfenstein 3D, but it was 3D, and I don't think it had anything to do with 2D acceleration.

Also remember the glorious Fury3 game.

https://www.youtube.com/watch?v=6_8tSpXLuK8


How does this (and Pathfinder for that matter) compare to NanoVG? I've recently been experimenting with swapping out Cairo for NanoVG and it seems much faster. The lack of dashed lines may kill my experiment though unless I can think of a decent workaround.


I have to say I'm consistently impressed by NanoVG. Not because it does anything fancy, but because it doesn't. It's not the fastest or the most featureful vector renderer out there, but it resides in a very nice sweet spot that balances performance, features, and simplicity.

In general I've found NanoVG to have comparable performance to the master branch of Pathfinder. NanoVG currently performs better at text (PF's implementation is designed to get international text right first and optimization hasn't really been done yet), while PF generally does better at complex vector workloads like the tiger.


I've just released a fork of nanovg that does GPU rendering a bit like Pathfinder, so it can support arbitrary paths - nanovg's antialiasing has some issues with thin filled paths. It also adds support for dashed lines.

https://github.com/styluslabs/nanovgXC

If you give it a try, let me know how it works for you.


That is very exciting to hear. I'll give it a shot. Thanks.


I expect it to be faster, but doing performance evaluation is really hard work. Performance depends on so many variables, including the workloads you're trying to render and a ton of factors about the GPU and system. I know people are really curious about competitive performance against other renderers, and it's something I should probably do, but honestly I'd love it if somebody else took up that mantle.


I kind of meant more in terms of how they work and what they can do but I suppose I could do the work in answering that question myself by figuring out how nanovg works then comparing it to what you've written.

I've been finding Patrick's pathfinder work fascinating so it's nice to have yet another project to follow.


Got it. So far my research prototype really just does filled paths and limited strokes, but the architecture should extend to a richer set of graphics primitives. Pathfinder also started out that way but recent work is building out a good chunk of the SVG imaging model. I definitely recommend checking it out, there's a good chance it'll do what you need.


Something I'm not sure I completely understand after reading the article: in what way is this better/more desirable than the classical approach of GPU rendering (with vertices, triangles, and shaders)?


The basic problem is that, without compute, you either have to encode the vector scene into GPU primitives (triangles) as quickly as possible, and all the ways I know of to do that either involve (a) an expensive CPU process or (b) a cheap CPU process but way too much overdraw on GPU side. Compute gives you the best of both worlds, allowing you to upload outlines directly to GPU and do the processing necessary to lower them to primitives on the GPU itself.

Pathfinder in there is actually using regular old GPU rasterization with triangles and so forth, and I'm fairly confident it's about as fast as you can go at D3D10 level (i.e. no compute shaders) without sacrificing quality. Note that the numbers can vary wildly depending on hardware. On my MacBook Pro, with a powerful CPU and limiting myself to the Intel integrated GPU only, Pathfinder is actually about equal to the GPU compute approach on a lot of scenes like the tiger, though it uses a lot of CPU.


> all the ways I know of to do that either involve (a) an expensive CPU process or (b) a cheap CPU process but way too much overdraw on GPU side

That's fixable, take a look: https://github.com/Const-me/Vrmac#vector-graphics-engine

My CPU process is moderately expensive, and I use quite a few tricks to reduce overdraw, e.g. both draw calls (the complete vector image, however complex, takes 2 draw calls to render) use hardware depth buffer and early Z rejection.


Is it true for all 2D rendering?

From my intuition, this seems pretty specialized for vector-like rendering, with a lot of small bezier shapes.


Yeah, when I say 2D I mean vector art. There are a lot of things under the heading of 2D rendering, such as blitting raster sprites, that are much closer to being solved problems. (Though you might be surprised--power concerns, coupled with greatly increased pixel density, have brought renewed attention to performance of blitting lately...)


Question to the author: is the GPU rasterizer the only implementation? What about anything like WARP (https://en.wikipedia.org/wiki/Windows_Advanced_Rasterization...) - fallback rendering when a GPU is not available?


It's something I'd like to explore at some point, also Swiftshader, which might be easier to explore, as it's already Vulkan. I expect performance to be pretty good, but there are already really advanced CPU renderers such as Blend2D. Doing serious performance evaluation is hard work, so I actually hope it's something others take up, as I have pretty limited time for it myself.


I understand.

The problem is that any practical 2D rendering solution has to support both GPU and CPU rendering, unfortunately.

It would be interesting to see any GPU equivalent of something like AGG by Max Shemanarev, RIP.


Practical 2D renderers implement an abstraction layer that lets them easily redirect their output to a number of low level libraries, which can be either CPU or GPU-based (or a mix). I worked on a few such 2D stacks, including OpenGL, DirectX, WebGL, and AGG based, and it took no more than a few days to add new 2D backends to an existing pipeline. Most 2D rendering is based on concepts from PostScript so it's usually easy to do such ports -- except for AGG, that one was a bit like a library from outer space. Maxim himself worked on a hardware accelerated 2D renderer for Scaleform, and it looked nothing like AGG, mostly because AGG is practically impossible to move to a GPU implementation.
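
A minimal sketch of that abstraction-layer shape (hypothetical trait and names, not any particular library's API): application code draws against one interface, and a CPU rasterizer or a GPU command encoder lives behind it.

    trait Canvas2D {
        fn move_to(&mut self, x: f64, y: f64);
        fn line_to(&mut self, x: f64, y: f64);
        fn curve_to(&mut self, c1: (f64, f64), c2: (f64, f64), end: (f64, f64));
        fn fill(&mut self, rgba: u32);
    }

    /// Stand-in backend that just records the calls; a software rasterizer or
    /// a GPU command encoder would implement the same trait.
    struct RecordingCanvas {
        log: Vec<String>,
    }

    impl Canvas2D for RecordingCanvas {
        fn move_to(&mut self, x: f64, y: f64) {
            self.log.push(format!("move_to {x} {y}"));
        }
        fn line_to(&mut self, x: f64, y: f64) {
            self.log.push(format!("line_to {x} {y}"));
        }
        fn curve_to(&mut self, c1: (f64, f64), c2: (f64, f64), end: (f64, f64)) {
            self.log.push(format!("curve_to {c1:?} {c2:?} {end:?}"));
        }
        fn fill(&mut self, rgba: u32) {
            self.log.push(format!("fill {rgba:#010x}"));
        }
    }

    /// Application code draws against the trait and never names a backend.
    fn draw_badge(c: &mut dyn Canvas2D) {
        c.move_to(10.0, 10.0);
        c.line_to(90.0, 10.0);
        c.curve_to((100.0, 10.0), (100.0, 20.0), (100.0, 30.0));
        c.line_to(10.0, 30.0);
        c.fill(0x3366_ccff);
    }

    fn main() {
        let mut canvas = RecordingCanvas { log: Vec::new() };
        draw_badge(&mut canvas);
        println!("{}", canvas.log.join("\n"));
    }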


I know, AGG is just a set of primitives and not an abstraction like class Graphics {...}. But such an abstraction can be assembled from them. I did it once for early versions of Sciter.

Ideally, GPU and CPU rendering backends should match pixel-perfectly, which makes "adding a new 2D backend" tricky at best.


iTerm's usage of Metal on macOS is a good example of the benefits of implementing 2D rendering on the GPU[1].

[1]https://gitlab.com/gnachman/iterm2/-/wikis/Metal-Renderer


That links to iTerm2, which is what I'm guessing you meant to say? Kitty is also a GPU-accelerated terminal emulator and one that I enjoy using: https://sw.kovidgoyal.net/kitty/ Not sure if it uses Metal on Mac, though.


Naive question - but does this change to make use of DX11/shaders in newer GPUs make it more likely to have a good cross-platform UI for app development?

I’ve looked before and although OpenGL is good for windowing, widgets etc, it’s not great for sub pixel rendered/anti aliased text.


Yes, the motivation and long term goal for this work is to provide a performant foundation for cross-platform UI. There's a lot to be done though!


I wonder how it compares performance-wise against my solution of the same problem: https://github.com/Const-me/Vrmac#vector-graphics-engine

I don’t use compute shaders, tessellating input splines into polylines, and building triangular meshes from these.


This is very impressive. Have you benchmarked it against Core Graphics on a Mac? I believe they've done a similar thing - performing the render on Metal.


CoreGraphics is a pure CPU rasterizer, similar to GDI+.

Sciter (https://sciter.com) on MacOS uses Skia/OpenGL by default with fallback to CoreGraphics.

It is possible to configure Sciter to use CoreGraphics on MacOS to compare these two on the same UI by using

    SciterSetOption(NULL, SCITER_SET_GFX_LAYER, GFX_LAYER_CG);
I think it is safe to say that Skia/OpenGL is 5-10 times more performant than CG on typical UI tasks.


Interesting, thanks - I remember hearing at one of the WWDCs that Core Graphics got a 10x speed improvement using Metal. I just read the fine print - it seems only draw calls have a 10x speed improvement, I assume because they render through a layer of some sort. The CA* libraries may use Metal - animation and layers - which I guess is where the 10x draw-call improvement comes in, maybe.

Some discussion here https://arstechnica.com/civis/viewtopic.php?t=1285571 , though a lot of guessing.


Metal is definitely used extensively in CoreAnimation, and Apple UI tends to rely on that - relatively slow (and memory hungry) rendering of layer content, which is then composited very smoothly and nicely in CA.

They might use it for other stuff like glyph compositing (I think this is one reason they got rid of RGB subpixeling, to make it more amenable to GPU), but last I profiled it, it was still doing a lot of the pixels on CPU, as others have stated.


I believe that most HW optimizations in macOS are made at the DWM level - window composition and animation. A window surface is just a bitmap in RAM that needs to be populated by CoreGraphics on the CPU.

In any case, Acrylic/Vibrancy effects or "blur-behind background" (like on the screenshots here: https://sciter.com/sciter-4-2-support-of-acrylic-theming/) are achievable only on the GPU and at the DWM level.


And before that, I'm pretty sure Core Graphics used OpenGL in some situations. I remember seeing CG::OGL in stack traces for WebKit.


Core Graphics is generally slower than Cairo/Pixman.


Is GPU here synonymous with Nvidia?


No, it's designed to be portable to all GPU hardware that can support compute, which these days is a pretty good chunk of the fleet. I tested it on Linux on Intel HD 4000, and the master branch seems to run just fine there, though previous versions had problems.

A lot of the academic literature (Massively Parallel Vector Graphics, the Li et al scanline work) is dependent on CUDA, but that's just because tools for doing compute on general purpose graphics APIs are so primitive. I have a talk and a bunch of blog posts on exactly this topic, as I had to explore deeply to figure it out. See https://news.ycombinator.com/item?id=22880502 and https://raphlinus.github.io/gpu/2020/04/30/prefix-sum.html for more breadcrumbs on that.


It uses Metal, and Apple uses AMD, so no.



