Ticket Hash: | 94c6a431fee47acdb590ee3963704ef1d756a5cf | |||
Title: | Buffering until timeout | |||
Status: | Closed | Type: | Code Defect | |
Severity: | Important | Priority: | High | |
Subsystem: | Resolution: | Fixed | ||
Last Modified: | 2019-04-09 18:02:25 | |||
Version Found In: | 1.7.16 | |||
User Comments: | ||||
anonymous added on 2018-04-01 14:06:53:
This bug might not be in Tcltls: it might be in Tcl stacked channels, the Tcl [read] command, fileevent generation or buffering. The bug occurs when using Tcltls 1.7.16 with the stock Tcl 8.6.8 and LibreSSL 2.6.4, but has existed for a long time: it also occurs in older versions at least as far back as Tcltls 1.6.7, Tcl 8.5.13 and OpenSSL 1.02d. When a channel is opened with tls::socket, a non-blocking [read] does not always return the available data, sometimes (if -blocksize is 4096) even if it exceeds the size requested; instead it waits until the remote server closes the connection. When using http::geturl with tls::socket, a HTTPS request for an entity that is larger than the -blocksize hangs if the server response uses a persistent connection and does not use chunked encoding (but does supply an accurate Content-Length header). The client has some successful [reads], but then does nothing until the server times out, although the full response has been received and is buffered somewhere. The relevant part of http is in the command http::Event, for unchunked data with known size $state(totalsize) obtained from a Content-Length header. The relevant part of the code can be abstracted as: fconfigure $sock -blocking off -translation binary -buffersize $state(-blocksize) fileevent $sock readable [list http::Event $sock $token] proc http::Event {sock token} { variable $token upvar 0 $token state # 1. How many bytes to ask for. set reqSize $state(-blocksize) # 2. Read. set block [read $sock $reqSize] set n [string length $block] if {$n >= 0} { append state(body) $block incr state(currentsize) $n } # 3. Check for end of data. if {$state(currentsize) >= $state(totalsize)} { Eof $token } return } The bug can be worked around by precisely specifying the size of the untransferred response body. # 1. How many bytes to ask for. set reqSize [expr {$state(totalsize) - $state(currentsize)}] This workaround fails if reqSize is capped at $state(-blocksize): # 1. How many bytes to ask for. set reqSize [expr {min($reqSize, $state(-blocksize))}] When the hanging occurs, more than $state(-blocksize) bytes can be available, (if -blocksize is 4096) but they are not returned by [read] until timeout. The properties read by [fconfigure $sock] (when added to http::Event) are: -blocking 0 -buffering full -buffersize 8192 -encoding binary -eofchar {{} {}} -translation {lf crlf} The last three properties correspond (for reading) to the results of the command fconfigure $sock -translation binary read(n) describes non-blocking mode with two arguments: read channelId numChars If channelId is in nonblocking mode, the command may not read as many characters as requested: once all available input has been read, the command will return the data that is available rather than blocking for more input. If the channel is configured to use a multi-byte encoding, then there may actually be some bytes remaining in the internal buffers that do not form a complete character. These bytes will not be returned until a complete character is available or end-of-file is reached. The -nonewline switch is ignored if the command returns before reaching the end of the file. The hanging occurs even with a binary resource, fetched with -encoding binary, so the multi-byte issue does not arise. There is no problem with http, only with https where [tls::socket] is used in place of [socket]. read(n) makes clear that in nonblocking mode, the command will not wait to receive the number of characters requested. Example code that demonstrates the delay: # The http package sets both the -buffersize for the socket, and the number of # bytes requested with [read], to the http::geturl option -blocksize, which has # default value 8192. package require http set start [clock milliseconds] proc http::Log {args} { set time [expr {[clock milliseconds] - $::start}] puts stderr [list $time {*}$args] flush stderr return } package require tls http::register https 443 [list ::tls::socket -ssl2 0 -ssl3 0 -tls1 1 -tls1.1 1 -tls1.2 1] proc GetFromWeb {blocksize url} { set ::prevSize 0 set token [::http::geturl $url \ -blocksize $blocksize \ -keepalive 1 \ -progress httpProgress \ -headers {accept-encoding unsupported} \ ] array get $token # N.B. Implicit Return } proc httpProgress {token total current} { upvar 0 $token state set msg {, and therefore will not demonstrate the bug.} if { [info exists state(transfer)] && ($state(transfer) eq "chunked") } { puts "The response uses chunked transfer encoding, $msg" } elseif {$total == 0} { puts "The server has not reported the size of the entity, $msg" } if {$state(connection) eq {close}} { puts "The server will close the connection, $msg" } set time [expr {[clock milliseconds] - $::start}] set extra [expr {$current - $::prevSize}] set ::prevSize $current puts "$time Fetched $extra bytes - accumulated $current bytes of total $total" flush stdout } set blocksize 8192 # This URL serves (at least in the UK) a resource of about 300kB, unchunked, and # times out after approx 13s. set url https://www.bbc.co.uk/ # These URLs demonstrate that http sites (using [socket]) do not # trigger the bug. # set url http://news.mit.edu/sites/mit.edu.newsoffice/files/styles/news_article_image_top_slideshow/public/images/2018/MIT-Debris-Disks_0.jpg?itok=kGim01Xz # set url http://static01.nyt.com/images/2018/03/30/theater/30threetallwomen2/merlin_135911256_e2911f33-8b35-445d-90ee-d15457c19fae-superJumbo.jpg # The corresponding https URLs do trigger the bug, but are inconvenient to use # because the server does not time out for about 10 minutes. # set url https://news.mit.edu/sites/mit.edu.newsoffice/files/styles/news_article_image_top_slideshow/public/images/2018/MIT-Debris-Disks_0.jpg?itok=kGim01Xz # set url https://static01.nyt.com/images/2018/03/30/theater/30threetallwomen2/merlin_135911256_e2911f33-8b35-445d-90ee-d15457c19fae-superJumbo.jpg set start [clock milliseconds] set img [GetFromWeb $blocksize $url] set NUL {} anonymous added on 2018-04-01 14:17:03: The workaround has been applied to Tcl's http library in branch bug-46b6edad51-concurrent-http anonymous added on 2018-04-01 19:53:58: This bug shows many similarities with the bug reported here: https://sourceforge.net/p/tls/bugs/38/ and (cross-filed) here: http://core.tcl.tk/tcl/tktview/1945538fffffffffffff Regards, Erik Leunissen -- anonymous added on 2018-04-02 02:21:24: Thanks, Erik. Yes, it does seem to be the same bug. anonymous (claiming to be jinhu@stanford.edu) added on 2018-10-22 21:01:25: Hi, We have the same problem and I was able to pinpoint it after learning tcl channel frame. 1. In "tlsWatchProc" of tlsIO.c, the condition to create timer is not covering all cases. It should also check the BIO buffer. The line should read: if ((mask & TCL_READABLE) && (Tcl_InputBuffered(statePtr->self) > 0 || BIO_ctrl_pending(statePtr->bio) > 0)) { ............... } 2. In "BioCtrl" of tlsBIO.c, the result of BIO_CTRL_PENDING should be: ret = ((chan) ? Tcl_InputBuffered(chan) : 0); NOT ret = ((chan) ? 1: 0); Second error caused BIO_ctrl_pending always return 1 even no data available. Please let me know after you push our the patch. Best, Jinhu jinhu@stanford.edu rkeene added on 2019-04-09 18:02:25: Fixed in TclTLS v1.7.17+ |