Because we love it when corporations come up with their own proprietary implementations of existing open protocols with similar if not superior functionality.
Apples and oranges. multipart/form-data is used for sending a set of form information, possibly including files, all together. This announcement is that S3 will now allow you to upload a file in pieces so that you don't lose everything when an upload is interrupted.
Based on the docs, it appears that this also allows you to upload segments of files without knowing the final number of segments or the final file size.
This will be pretty damn useful for piping the output of some process generating a large file (e.g. video transcoding) and beginning the upload before the file has been fully created.
Even better: You can split a video into pieces, transcode each part on a different EC2 node, and upload the parts directly from those respective nodes.
That's an interesting idea. The only problem I can see is that every segment needs to be a minimum of 5 MB, so you'd probably need to buffer an extra segment compared to the "naive" implementation to ensure the last segment is big enough.
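For the piping use case mentioned above, here's roughly what that would look like, using boto3's low-level client as a stand-in (it postdates this thread); bucket and key names are placeholders. Since only the final part may be under 5 MB, reading the pipe in fixed 5 MB chunks is enough for the single-node case:

    # Stream a transcoder's stdout to S3 in 5 MB parts, total size unknown up front.
    import sys
    import boto3

    PART_SIZE = 5 * 1024 * 1024  # minimum for every part except the last
    s3 = boto3.client("s3")
    bucket, key = "my-bucket", "output.mp4"  # placeholder names

    mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
    parts, part_no = [], 1
    while True:
        chunk = sys.stdin.buffer.read(PART_SIZE)  # e.g. transcoder | python upload.py
        if not chunk:
            break
        resp = s3.upload_part(Bucket=bucket, Key=key, UploadId=mpu["UploadId"],
                              PartNumber=part_no, Body=chunk)
        parts.append({"PartNumber": part_no, "ETag": resp["ETag"]})
        part_no += 1

    s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=mpu["UploadId"],
                                 MultipartUpload={"Parts": parts})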
Limitations of the TCP/IP protocol make it very difficult for a single application to saturate a network connection.
I'm just curious, but exactly which "limitations" are those? I can believe that parallel connections help in practice (especially when fetching small objects), but for large objects, I find it surprising you can't get reasonably close to saturating a single network connection with a modern TCP stack (e.g., using TCP window scaling).
It's pretty much impossible to saturate even a LAN connection with a single TCP connection. There are a number of issues at play here: RTT (Round Trip Time, i.e. ping/latency), window sizes, packet loss and initcwnd (TCP's initial window).
The combination of the limitations imposed by the speed of light and TCP's windowing system means that you are buggered transferring large files over high-latency TCP connections. I haven't checked their figures, but here's a TCP rate calculator I just found which lets you tune the different parameters: http://osn.fx.net.nz/LFN/
The greater the delay, the bigger the impact. For example, if we take a standard Windows XP machine and plug in the values for a standard Gigabit LAN (typically 0.2 ms latency between hosts) we get a maximum speed of 700 Mbit/sec, but if we try it between two hosts, one of them in the USA (typically around 120 ms), the maximum transfer rate falls to 1.17 Mbit/sec.
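Quick back-of-the-envelope check of those numbers (a rough sketch in Python; the 17,520-byte figure is my assumption of XP's default receive window, 12 x 1460-byte segments): with no window scaling, at most one window can be in flight per round trip, so throughput tops out at window/RTT.

    # Max TCP throughput with a fixed receive window: one window per round trip.
    def max_throughput_mbit(window_bytes, rtt_seconds):
        return window_bytes * 8 / rtt_seconds / 1e6

    WINDOW = 17520  # assumed Windows XP default receive window (12 x 1460-byte segments)

    print(max_throughput_mbit(WINDOW, 0.0002))  # ~700 Mbit/s on a 0.2 ms LAN
    print(max_throughput_mbit(WINDOW, 0.120))   # ~1.17 Mbit/s at 120 ms across the US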
There are a number of issues at play here: RTT (Round Trip Time, i.e. ping/latency), window sizes, packet loss and initcwnd (TCP's initial window).
Initial window size: not relevant AFAICS, I'm not talking about connection startup behavior.
RTT, Window size: if the bandwidth-delay product is large, obviously you need a large window size (>>65K). Thankfully, recent TCP stacks support TCP window scaling; rough numbers in the sketch below.
Packet loss: you need relatively large buffers (by the standards of traditional TCP) and a sane scheme for recovering from packet loss (e.g., SACK), but I don't see why this is a show stopper on modern TCP stacks.
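To put numbers on the window-size point, here's the same arithmetic run the other way (a rough sketch): the bandwidth-delay product tells you how much data must be in flight to keep a link busy, which is exactly why the classic 64 KB cap matters.

    # Bandwidth-delay product: bytes that must be in flight to keep the link full.
    def required_window_bytes(link_mbit, rtt_seconds):
        return link_mbit * 1e6 / 8 * rtt_seconds

    print(required_window_bytes(1000, 0.120))  # ~15 MB to saturate 1 Gbit/s at 120 ms
    print(required_window_bytes(100, 0.120))   # ~1.5 MB even at 100 Mbit/s
    # Both far exceed 64 KB, hence the need for window scaling (RFC 1323).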
I'm not super familiar with the SPDY work, but from what I recall, it primarily addresses connection startup behavior, rather than steady-state behavior.
This is very good - uploading large files is a PITA. Now all we need is a Flash client we can add to a website, and we'll have a reliable way for website users to upload huge files.
As fashionable as it is to deride Flash, the vast majority of users find it quite stable. I believe a good Flash developer could produce a solid upload client for this service. I also believe that every developer on HN would use such a client without batting an eye.
Granted. Mobile devices are handicapped in terms of embeddable functionality, but that isn't limited to Flash. Java applets, the only obvious alternative for this use case, are in the same boat. I think that if mobile users want to upload >100 MB files over the air it's fair to make them use an app.
Perhaps, but I started the blog in November of 2004, long before EC2, S3, or any of the other services had been released. It was a clean and simple way to get a blog up and running and I've never had a compelling reason to go through the trouble to move it to another host.
I would assume that the AWS guys are all about comparative advantage. Anybody not familiar with the term will probably enjoy Wikipedia's surprisingly good explanation: https://en.wikipedia.org/wiki/Comparative_advantage
tl;dr: The AWS people get higher marginal return on their investment of time and money by hosting with TypePad and putting the time they save into making AWS better. Probably.
I did not find this in the description of the service, but I'm wondering what happens if you have a crash or power failure while doing a multi-part upload and don't have the 'upload-id' stored anywhere.
First of all, the storage for the data already uploaded is reserved and there is no way to release it, since you cannot abort the upload without the 'id'.
Second of all, there doesn't seem to be a way to list active multi-part uploads; you can only check the status of an upload for which you have the 'id'.
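One way to hedge against that would be to persist the upload id somewhere durable before sending any parts, so a crashed upload can still be aborted later and the reserved storage released. A rough sketch, again using boto3 as a stand-in (bucket, key and filenames are placeholders):

    # Persist the upload id before uploading any parts, so it survives a crash.
    import json
    import boto3

    s3 = boto3.client("s3")
    bucket, key, state_file = "my-bucket", "big-file.bin", "upload-state.json"  # placeholders

    mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
    with open(state_file, "w") as f:
        json.dump({"bucket": bucket, "key": key, "upload_id": mpu["UploadId"]}, f)

    # ... upload parts as usual ...

    # After a crash, reload the saved state and release the reserved storage.
    with open(state_file) as f:
        state = json.load(f)
    s3.abort_multipart_upload(Bucket=state["bucket"], Key=state["key"],
                              UploadId=state["upload_id"])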
I run a service that processes S3 and CloudFront logs for people. Each S3 bucket generates between 200 and 1000 logfiles every day that need to be combined to form a full day's weblog.
Part of what my service does is re-upload that combined logfile back to the bucket in question, and since for large sites it can be upwards of 200 MB zipped, it'd be nice to be able to upload it in little 5 MB chunks that can be resent if anything goes wrong.
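A rough sketch of that workflow using boto3's managed transfer as a stand-in (bucket and file names are placeholders): anything over the threshold is split into 5 MB parts, so a failed part can be retried without resending the whole 200 MB file.

    # Upload the combined day's log with 5 MB multipart chunks.
    import boto3
    from boto3.s3.transfer import TransferConfig

    config = TransferConfig(multipart_threshold=5 * 1024 * 1024,
                            multipart_chunksize=5 * 1024 * 1024)

    s3 = boto3.client("s3")
    # Placeholder names for the combined, compressed day's log.
    s3.upload_file("combined-access.log.gz", "my-log-bucket",
                   "logs/combined-access.log.gz", Config=config)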
exactly the same use case here... :)
Maybe Amazon should go a step further and let people get logs aggregated by a chosen time unit (hour, day, week), so we wouldn't need to do a round trip just to join the files.