Feature #4578

EIT: scrape sub-title from the summary for UK

Added by Em Smith 2 months ago. Updated about 1 month ago.

Status: Fixed
Start date: 2017-09-09
Priority: Normal
Due date:
Assignee: Adam Sutton
% Done: 0%
Category: EPG - Grabbers
Target version: -

Description

The UK broadcaster often puts the sub-title in the summary as "subtitle: summary".

We can optionally scrape this to be the sub-title, whereas currently we use the summary as the sub-title.

Associated revisions

Revision b0ba8e37
Added by Em Smith 2 months ago

eit: Scrape sub-title from summary in OTA EIT. (#4578).

Freeview/Freesat frequently have a subtitle as part of the
summary. So we have "Treehouse of Horror IX: Three scary stories."
from which we can deduce the subtitle as "Treehouse of Horror IX".

Other variants are "...title_continuation. Subtitle" (so the real
title of the program is split into the summary), and
"x/y. Subtitle" where x/y is the episode number.

So allow scraping of this and use it as the subtitle. If we cannot
scrape a subtitle then we continue the existing practice of using
the summary buffer for the subtitle.

The subtitle is currently NOT removed from the summary.

Issue: #4578
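
To illustrate the three summary shapes described in the commit message above, here is a minimal Python sketch. The patterns, the SUBTITLE_PATTERNS list and the scrape_subtitle function are hypothetical and written for illustration only; they are not the regexes shipped with Tvheadend. The fallback mirrors the existing practice of using the summary as the subtitle when nothing can be scraped.

```python
import re

# Hypothetical patterns, tried in order. These are NOT the regexes that
# ship with Tvheadend; they only illustrate the three shapes in the commit.
SUBTITLE_PATTERNS = [
    # "x/y. Subtitle" where x/y is the episode number
    re.compile(r"^\d+/\d+\.\s+(?P<subtitle>[^.:]+)[.:]"),
    # "...title_continuation. Subtitle"
    re.compile(r"^\.\.\.[^.]*\.\s+(?P<subtitle>[^.:]+)[.:]"),
    # "Subtitle: summary"
    re.compile(r"^(?P<subtitle>[^:.]+):\s"),
]


def scrape_subtitle(summary):
    """Return a scraped subtitle, or the whole summary if none is found."""
    for pattern in SUBTITLE_PATTERNS:
        match = pattern.match(summary)
        if match:
            return match.group("subtitle").strip()
    # Nothing matched: keep the existing practice of using the summary.
    return summary


if __name__ == "__main__":
    print(scrape_subtitle("Treehouse of Horror IX: Three scary stories."))
```

Note that the summary itself is left untouched, matching the statement that the subtitle is currently not removed from it.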

Revision 5afd22d4
Added by Em Smith 2 months ago

eit: Add additional documentation on EIT scraper. (#4578)

Add some more documentation on the EIT scraper to clarify
that it does not access the Internet and that we only ship
with a few configurations at the moment.

Issue: #4578

Revision 7088f210
Added by Em Smith 2 months ago

eit: Minor fixes to regex to make them parsable as JSON for test harness. (#4578)

The strings were not parsable by the JSON parser.

Issue: #4578
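
The commit does not say why the strings failed to parse. One common cause, assumed here purely for illustration, is a regex backslash such as \d written unescaped inside a JSON string, which JSON rejects as an invalid escape. The sketch below uses a hypothetical config key "scrape_subtitle" and a hypothetical pattern to show the failure and the escaped form that both the JSON parser and Python's re module accept.

```python
import json
import re

# Hypothetical config fragments; the key name and the pattern are
# assumptions for illustration, not the shipped Tvheadend configuration.
BROKEN = r'{"scrape_subtitle": ["^\d+/\d+\. ([^.:]+)[.:]"]}'
FIXED = r'{"scrape_subtitle": ["^\\d+/\\d+\\. ([^.:]+)[.:]"]}'


def load_patterns(text):
    """Parse a JSON config string and compile each regex it lists."""
    config = json.loads(text)
    return [re.compile(p) for p in config["scrape_subtitle"]]


try:
    load_patterns(BROKEN)
    broken_parses = True
except json.JSONDecodeError:
    # "\d" is not a valid JSON escape sequence, so parsing fails.
    broken_parses = False

# With doubled backslashes the JSON parser yields the intended regex.
patterns = load_patterns(FIXED)
```

After parsing, the doubled backslashes collapse to single ones, so the compiled pattern is the regex the author intended.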

Revision f95cf7d1
Added by Em Smith 2 months ago

eit: Fix scrape subtitle regex to be compatible with Python test harness. (#4578)

Python complained about the subtitle regexes, whereas they
worked fine in Perl and Tvh. So fix them to work in
all three.

Issue: #4578.

History

#1 Updated by Mark Clarkstone 2 months ago

Em Smith wrote:

The UK broadcaster often puts the sub-title in the summary as "subtitle: summary".

We can optionally scrape this to be the sub-title, whereas currently we use the summary as the sub-title.

There was a fix for this a long time ago that was removed. I can't remember the reason, but I guess it caused issues for others. It would be nice to have it back, though.

/me mumbles something about broadcasters not following standards..

#2 Updated by Em Smith 2 months ago

"Those who forget history are doomed to repeat it".

I think there's a fine line we need to tread since it's easy to start trying to fix-up the data, but then we end up having a hundred-and-one edge cases. I tried fixing up the descriptions field but it just very quickly became messy.

I'm hoping the subtitle change gets in. After that I have one last change in this area: a new modifier for the recording filename that only uses the subtitle if it differs from the description. Then it will be nicer.

#3 Updated by Jaroslav Kysela about 1 month ago

  • Status changed from New to Fixed
