Ignore low-confidence CharlockHolmes guesses when parsing link cards (#9510)

* Add failing test for windows-1251 link cards

* Ignore low-confidence CharlockHolmes guesses

Fixes #9466

* Fix NoMethodError when CharlockHolmes cannot detect a charset
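The last two items come down to two rules. Trust a CharlockHolmes guess only when its confidence clears a threshold. Treat a missing guess (`nil`) as "no guess" instead of calling methods on it. The Ruby sketch below illustrates both rules. It is not the code from this commit. It assumes the detector result is a Hash with `:encoding` and `:confidence` keys (confidence on a 0-100 scale), or `nil`. The helper name `choose_encoding`, the threshold of 60, the fallback to the declared charset, and the example inputs are all hypothetical.

```ruby
# Hypothetical minimum confidence (0-100 scale) for accepting a guess.
MIN_CONFIDENCE = 60

# Hypothetical helper: pick the encoding to hand to the HTML parser.
# `guess` mimics a CharlockHolmes detection result, e.g.
#   { encoding: 'ISO-8859-1', confidence: 25, type: :text }
# or nil when nothing could be detected.
# `declared` is the charset from the HTTP header or <meta> tag, if any.
def choose_encoding(guess, declared)
  # nil&.fetch returns nil, and nil.to_i is 0, so a missing guess
  # never raises NoMethodError.
  confidence = guess&.fetch(:confidence, 0).to_i
  if confidence >= MIN_CONFIDENCE
    guess[:encoding]
  else
    # Low-confidence or missing guess: ignore it and fall back.
    declared
  end
end

# Made-up example inputs.
p choose_encoding({ encoding: 'ISO-8859-1', confidence: 25 }, 'windows-1251') # prints "windows-1251"
p choose_encoding(nil, 'windows-1251')                                         # prints "windows-1251"
p choose_encoding({ encoding: 'UTF-8', confidence: 95 }, nil)                  # prints "UTF-8"
```

The sketch falls back to the declared charset. A real implementation might instead pass no encoding and let the HTML parser read the `<meta>` tag itself.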
ThibG 2018-12-17 19:19:45 +01:00 committed by Eugen Rochko
parent 4ede51743e
commit e709b8da0d
3 changed files with 30 additions and 1 deletion

spec/fixtures/requests/windows-1251.txt

@@ -0,0 +1,17 @@
HTTP/1.1 200 OK
server: nginx
date: Wed, 12 Dec 2018 13:14:03 GMT
content-type: text/html
content-length: 190
accept-ranges: bytes

<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1251" />
<title>ñýìïë òåêñò</title>
</head>
<body>
<p>ñýìïë òåêñò</p>
</body>
</html>
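The title in this fixture is stored as Windows-1251 bytes (`F1 FD EC EF EB 20 F2 E5 EA F1 F2`). Shown as `ñýìïë òåêñò`, those are the bytes misread as Latin-1. Decoded as Windows-1251, they read `сэмпл текст` ("sample text"). The Ruby sketch below uses only the standard library to show the difference. It assumes a trimmed stand-in for the fixture rather than the full file. The helper `decode_fixture_body` and its charset regex are hypothetical, and real code would use an HTML parser.

```ruby
# Title text from the fixture, as raw Windows-1251 bytes.
TITLE_BYTES = [0xF1, 0xFD, 0xEC, 0xEF, 0xEB, 0x20, 0xF2, 0xE5, 0xEA, 0xF1, 0xF2].pack('C*')

# Trimmed-down stand-in for the fixture (assumption): headers, blank line, body.
RAW_RESPONSE = "HTTP/1.1 200 OK\ncontent-type: text/html\n\n".b +
               '<meta http-equiv="Content-Type" content="text/html; charset=windows-1251" />'.b +
               '<title>'.b + TITLE_BYTES + '</title>'.b

# Hypothetical helper: split off the body, read the charset declared in
# the <meta> tag with a crude regex, and transcode the body to UTF-8.
def decode_fixture_body(raw)
  _headers, body = raw.split(/\r?\n\r?\n/, 2)
  charset = body[/charset=([\w-]+)/i, 1] || 'UTF-8'
  body.dup.force_encoding(Encoding.find(charset)).encode('UTF-8')
end

html = decode_fixture_body(RAW_RESPONSE)
title = html[%r{<title>(.*?)</title>}m, 1]
puts title # prints сэмпл текст

# The same bytes misread as Latin-1 reproduce the garbled text above.
misread = TITLE_BYTES.dup.force_encoding('ISO-8859-1').encode('UTF-8')
puts misread # prints ñýìïë òåêñò
```

The `<meta>` tag is the only reliable signal in this response. The `content-type` header carries no charset, so a low-confidence statistical guess should not override it.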