Problems with file upload for very big files (several GB)

Added by Emeric Poupon 9 months ago

Hello,

I am using Wt 4.10.0 on Debian 12.
I noticed that when uploading a file (using a WFileDropWidget), a temporary file is created, but once the upload is complete, the file is copied into a brand-new temporary file.
Once this is done, the first temporary file is deleted and the uploaded callback is called for the newly copied one.

This is very annoying, as the copy can take a very long time for big files and requires even more disk space.
Even more annoying: for somewhat unknown reasons it sometimes fails. Both files are deleted and the uploaded signal is not fired (or at least not received?), with no explanation in the logs.

Is this known? I don't remember this behavior with previous Wt versions, where uploading and then stealing the uploaded file was very fast and convenient.

By the way, it could be handy to have an option not to automatically start uploading files once they are added. If the connection is slow, the automatic upload prevents the user from interacting with the rest of the application; it would be easier for the application to expose an "upload" button and then hide everything to show a progress bar, reducing the possible interactions.


Replies (3)

RE: Problems with file upload for very big files (several GB) - Added by Matthias Van Ceulebroeck 9 months ago

Hello Emeric,

Indeed, Wt does upload the file to a temporary file, and has been doing so for a long time.

Perhaps for your use case you can take a look at WFileUpload; it offers more functionality, like stealSpooledFile.
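
For illustration, the stealing part could look roughly like this (a minimal sketch; the destination path and the setupUpload function name are placeholders, and a single uploaded file is assumed):

#include <Wt/WFileUpload.h>

#include <cstdio>
#include <string>

// Sketch: take ownership of the spooled file instead of copying it.
void setupUpload(Wt::WFileUpload *upload) {
  upload->uploaded().connect([upload] {
    // After stealSpooledFile(), Wt no longer deletes the spooled file,
    // so it can be renamed/moved instead of copying several GB.
    upload->stealSpooledFile();
    std::string spooled = upload->spoolFileName();
    // Hypothetical destination; std::rename avoids a full copy when
    // source and target are on the same filesystem.
    std::rename(spooled.c_str(), "/var/data/uploads/final.bin");
  });
}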

If you wish to keep using the WFileDropWidget, perhaps you can take a look at setJavaScriptFilter. This function allows you to send bigger files in chunks. I believe the temporary files will then only be the size of your chunks, and each chunk will be appended to the final file.

A filterFn may look something like:

var createFilter = function(upload, chunkSize) {
  // The initial position to read the item from.
  var position = 0;

  // Set to `true` if the filter transforms the data (e.g. compression).
  upload.filtered = false;

  var readNextChunk = function() {
    var fileReader = new FileReaderSync();
    var result = fileReader.readAsArrayBuffer(upload.file.slice(position, position + chunkSize));

    position += result.byteLength;
    var last = (position >= upload.file.size);

    // This blob holds the actual data, or the compressed data if filtering is enabled.
    var blob = new Blob([new Uint8Array(result)]);

    var chunk = {};
    chunk.upload = upload;
    chunk.data = blob;
    chunk.last = last;
    return chunk;
  };

  var chunkBuffer = readNextChunk();

  return function filter(sendChunkFn) {
    sendChunkFn(chunkBuffer);
    if (!chunkBuffer.last)
      chunkBuffer = readNextChunk();
  };
};

The chunkSize variable is the size of the chunks, expressed in bytes.
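
On the C++ side, the filter is then installed via setJavaScriptFilter. A minimal sketch, assuming the JavaScript above is available as a string (e.g. in a variable filterJs) and using 10 MB chunks purely as an example value:

#include <Wt/WFileDropWidget.h>

#include <string>

void installFilter(Wt::WFileDropWidget *drop, const std::string &filterJs) {
  // The second argument is the chunk size in bytes that is passed to
  // the filter; 10 MB here is just an example.
  drop->setJavaScriptFilter(filterJs, 10 * 1024 * 1024);
}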

I do notice this section requires more documentation.

I hope this helps, if not, please let me know.

RE: Problems with file upload for very big files (several GB) - Added by Emeric Poupon 9 months ago

Thanks for your reply!
Actually, I am using stealSpooledFile from https://www.webtoolkit.eu/wt/doc/reference/html/classWt_1_1WFileDropWidget_1_1File.html / https://www.webtoolkit.eu/wt/doc/reference/html/classWt_1_1Http_1_1UploadedFile.html
I guess it is supposed to work?

But I am not sure how the filter function would solve my problem.
For example, for a 14 GB file I do see (server-side) one single Wt temporary file used for the whole upload, except at the very end, where the whole file is copied into another new one. It looks like this copy happens only server-side.
Do you mean it is a side effect of the "default" filter JavaScript function?

For information, I set this in the nginx conf:

    proxy_request_buffering off;
    proxy_buffering off;
    client_max_body_size 0;

And in wt_config.xml:

    <max-request-size>51200000000</max-request-size>

RE: Problems with file upload for very big files (several GB) - Added by Wim Dumon 9 months ago

Hello Emeric,

Wt spools the file because it spools the full POST request before it interprets its data. Off the top of my head: with a standard file upload, the data must pass through a MIME parser, and some MIME cruft is removed before you end up with the uploaded file. That is what happens between the spooled request and the spooled file, and it has been this way for a very long time.

We had an application recently where we needed to upload multi-gigabyte files. Because we wanted some extra features, like resumed uploads, we chose to implement tus (see tus.io). On the client side we use the tus JavaScript library and tie it into WFileDropWidget; on the server we implemented the tus server part as a WResource. This library does not use the browser's standard file-upload support, but makes its own HTTP calls to the tus backend. In our implementation there is still some copying of files in the case of resumed uploads, but depending on your server implementation you could probably eliminate that. Note that this tus integration is not part of standard Wt.
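
A very rough skeleton of the WResource side could look like the sketch below. This is not our actual implementation: the file path and currentOffset() are placeholders, only the HEAD/PATCH flow is hinted at, and a real tus server needs the full protocol from tus.io (creation via POST, per-upload offset bookkeeping, the mandatory tus headers).

#include <Wt/WResource.h>
#include <Wt/Http/Request.h>
#include <Wt/Http/Response.h>

#include <fstream>
#include <string>

class TusUploadResource : public Wt::WResource {
public:
  ~TusUploadResource() { beingDeleted(); }

protected:
  void handleRequest(const Wt::Http::Request &request,
                     Wt::Http::Response &response) override {
    if (request.method() == "HEAD") {
      // Tell the client how many bytes we already have, so it can resume.
      response.addHeader("Upload-Offset", std::to_string(currentOffset()));
      response.setStatus(200);
    } else if (request.method() == "PATCH") {
      // Append the request body directly to the target file: no
      // intermediate copy of the full upload is needed.
      std::ofstream out("/var/data/upload.partial",
                        std::ios::binary | std::ios::app);
      out << request.in().rdbuf();
      response.addHeader("Upload-Offset", std::to_string(currentOffset()));
      response.setStatus(204);
    } else {
      response.setStatus(405);
    }
  }

private:
  long long currentOffset() const {
    // Placeholder: a real implementation tracks the size of the
    // partial file per upload id.
    return 0;
  }
};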

Wim.
