(parent: bc8a2ea)
[utils] Fix inconsistent output of clean_html
author    Tithen-Firion <tithen.firion.0@gmail.com>  Fri, 28 Apr 2017 15:34:27 +0000 (17:34 +0200)
committer Tithen-Firion <tithen.firion.0@gmail.com>  Fri, 28 Apr 2017 15:34:27 +0000 (17:34 +0200)
`\s` in Python 2.x doesn't match unicode whitespace characters by
default
youtube_dl/utils.py
diff --git a/youtube_dl/utils.py b/youtube_dl/utils.py
index 91e235ff2f6166106c93e4b8de5bbeb18a6d8b7a..41bc205446a7594ece9f14dc43b0617808962053 100644
--- a/youtube_dl/utils.py
+++ b/youtube_dl/utils.py
@@ -421,8 +421,8 @@ def clean_html(html):
     # Newline vs <br />
     html = html.replace('\n', ' ')
-    html = re.sub(r'\s*<\s*br\s*/?\s*>\s*', '\n', html)
-    html = re.sub(r'<\s*/\s*p\s*>\s*<\s*p[^>]*>', '\n', html)
+    html = re.sub(r'(?u)\s*<\s*br\s*/?\s*>\s*', '\n', html)
+    html = re.sub(r'(?u)<\s*/\s*p\s*>\s*<\s*p[^>]*>', '\n', html)
     # Strip html tags
     html = re.sub('<.*?>', '', html)
     # Replace html entities
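
The effect of the `(?u)` flag described in the commit message can be illustrated with a small sketch. It assumes Python 3, where `str` patterns are already Unicode-aware, so the inline `(?a)` (ASCII-only) flag is used here to approximate the Python 2.x default that the commit works around. The helper `clean_breaks`, the `NBSP` constant, and the sample string are hypothetical and not part of youtube-dl.

```python
import re

NBSP = '\u00a0'  # non-breaking space, a Unicode whitespace character


def clean_breaks(html, unicode_aware=True):
    """Replace <br> tags and their surrounding whitespace with a newline.

    Hypothetical helper for illustration only; it mirrors the first
    re.sub call changed in the diff, not the whole of clean_html.
    """
    # Assumption: '(?a)' stands in for the Python 2.x default, where \s
    # only matches ASCII whitespace unless the UNICODE flag is given.
    flag = '(?u)' if unicode_aware else '(?a)'
    return re.sub(flag + r'\s*<\s*br\s*/?\s*>\s*', '\n', html)


sample = 'first' + NBSP + '<br/>' + NBSP + 'second'

# With (?u), \s consumes the non-breaking spaces around the tag.
with_flag = clean_breaks(sample, unicode_aware=True)

# With ASCII-only matching, the non-breaking spaces are left behind.
without_flag = clean_breaks(sample, unicode_aware=False)

print(repr(with_flag))
print(repr(without_flag))
```

Under these assumptions the Unicode-aware pattern produces the same output whether the whitespace around the tag is ASCII or not, which is the inconsistency the commit title refers to.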