The regex used non-greedy matching, but it still failed on input like this:
<object class="..."> ... class="BrightcoveExperience"
The match started at the first <object> and ran on to the Brightcove one,
so it captured two objects and the intervening HTML. This commit fixes this
by not allowing a ">" to appear between "<object" and "BrightcoveExperience".
Video in question: http://www.harpercollinschildrens.com/feature/petethecat/
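
A minimal sketch of the failure, for illustration only. The two patterns are
copied from the diff; the page fragment and the names OLD, NEW, FIRST and
SECOND are made up here and are not taken from the site above.

```python
import re

# Patterns copied from the diff: OLD is the removed line, NEW is the added one.
OLD = r'<object.+?class=([\'"]).*?BrightcoveExperience.*?\1.+?</object>'
NEW = r'<object[^>]+?class=([\'"])[^>]*?BrightcoveExperience.*?\1.+?</object>'

# Hypothetical page fragment: an unrelated <object> comes before the
# Brightcove one, with some HTML in between.
FIRST = '<object class="flash"><param name="a" value="1"></object>'
SECOND = ('<object id="x" class="BrightcoveExperience">'
          '<param name="playerID" value="42"></object>')
webpage = FIRST + '<p>intervening</p>' + SECOND

old_match = re.search(OLD, webpage, re.DOTALL)
new_match = re.search(NEW, webpage, re.DOTALL)

# Non-greedy quantifiers only shorten the match from a fixed start position.
# The old pattern starts at the first <object> and, with re.DOTALL, ".*?" is
# free to cross tag boundaries until it reaches BrightcoveExperience.
print(old_match.group() == webpage)

# [^>] cannot leave the opening tag, so the first <object> is rejected and
# the search restarts at the Brightcove one.
print(new_match.group() == SECOND)
```

On this fragment the old pattern swallows the whole string, while the new one
returns only the second object.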
self.report_extraction(video_id)
# Look for BrightCove:
- m_brightcove = re.search(r'<object.+?class=([\'"]).*?BrightcoveExperience.*?\1.+?</object>', webpage, re.DOTALL)
+ m_brightcove = re.search(r'<object[^>]+?class=([\'"])[^>]*?BrightcoveExperience.*?\1.+?</object>', webpage, re.DOTALL)
if m_brightcove is not None:
self.to_screen(u'Brightcove video detected.')
bc_url = BrightcoveIE._build_brighcove_url(m_brightcove.group())