In the winter semester 2013/14, Prof. Dr. Edmund Weitz (my lecturer in Mathematics 1, Mathematics 2, Selected Topics of Media Computer Science, and Theoretical Computer Science) started recording all of his lectures and distributing them online.

Because I find the ViMP media center unintuitive and unclear, because it didn't offer download links at that time, and because it lacks features I need, I developed a simple web crawler over one weekend. It automatically searches for new videos, caches them in a local database, and generates a clear and simple index of the media center as a static HTML file. Videos are grouped by course and by lecture date, and website users can mark them via checkboxes. (The state of the checkboxes, and which structure elements are open or closed, is stored locally in the browser; it is not transmitted via cookies or similar means, so it remains unknown to the web server.)

Between the winter semester 2013/14 and the summer semester 2014, the Media Indexer was extended with the following features for a small honorarium:

  • Additional structure group: semesters.
  • Summation of video duration and number of videos.
  • Configuration file for setting the media center URL, where the data resides in the HTML DOM, and how it is parsed.
  • Configuration file for arbitrarily changing the structure of the index and the regular expressions that determine structure membership.
  • Macros for the configuration files that let the program determine the date and semester, so that they can be used as structuring elements.
  • More command line parameters (manually deleting or updating single videos, generating the index without a search, e.g. when the config files have changed, resetting the local database and starting from scratch, etc.).
  • The internal database was changed from CSV to XML.
  • The HTML generation is no longer hard-coded in the binary executable but is delegated to an XSLT file.
  • Additional output: RSS feeds (likewise via XSLT).
  • Generation of subpages for semesters and courses, so that the huge front page does not always have to be loaded.
  • Multithreading: asynchronous HTTP requests, analysis, and output generation are all parallelized. (The numbers aren't directly comparable because of all the new features, but the running time for a complete run improved from 6.3 minutes with 461 videos to 3.3 minutes with 518 videos. Usually there is no complete run over all pages and all videos, only a small run over the first few pages and the videos added since the last run.)
  • Improved exception handling, logging and reporting.
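
The step from hard-coded HTML generation to an XSLT file can be illustrated with a minimal sketch. It is written in Java against the JDK's standard javax.xml.transform API rather than in the project's Scala. The class name, the XML layout, and the stylesheet are hypothetical, since the real database format and stylesheets are not shown here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltIndexSketch {

    // Hypothetical database layout; the real XML schema is an assumption.
    static final String DATABASE_XML =
        "<videos>"
      + "<video course=\"Mathematics 1\" title=\"Lecture 1\"/>"
      + "<video course=\"Mathematics 1\" title=\"Lecture 2\"/>"
      + "</videos>";

    // Hypothetical stylesheet that renders every video as one list item.
    static final String INDEX_XSLT =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"html\" indent=\"no\"/>"
      + "<xsl:template match=\"/videos\">"
      + "<ul><xsl:for-each select=\"video\">"
      + "<li><xsl:value-of select=\"@course\"/>: <xsl:value-of select=\"@title\"/></li>"
      + "</xsl:for-each></ul>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the given stylesheet to the given XML and returns the result as a string. */
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(DATABASE_XML, INDEX_XSLT));
    }
}
```

Because the stylesheet is plain data, a further output such as an RSS feed would only need another stylesheet passed to the same method.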

Between the summer semester 2014 and the winter semester 2014/15 it was again slightly extended:

  • Bugfix: operating-system-dependent problems with umlauts in course names. (The course names are used as directory names, which form the URLs of the subpages. The program is configured to use UTF-8 wherever possible, but on one of three systems the file system nonetheless prevented subpages with umlauts from being updated.)
  • Manually implemented and optimized insertion sort, which, unlike the quicksort used by Arrays.sort, is better suited to this application. (The old elements are already sorted; new elements are usually appended to the end and rearranged only among other new elements.)
  • Additional outputs: XML and JSON, both via XSLT. (Based on the XML format, a fellow student wrote a Python script that automatically downloads all video files of one course.)
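
Why insertion sort suits an already sorted list with a few new entries at the end can be seen in a minimal sketch. It is written in Java, is not the project's optimized implementation, and uses hypothetical class, method, and sample values. Each element is shifted left only past the elements that are larger than it, so the sorted prefix costs one comparison per element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class InsertionSortSketch {

    /**
     * Sorts the list in place. The work is proportional to the list length
     * plus the number of out-of-order pairs, so a sorted list with a short
     * unsorted tail is handled in roughly linear time.
     */
    static <T> void insertionSort(List<T> items, Comparator<? super T> order) {
        for (int i = 1; i < items.size(); i++) {
            T current = items.get(i);
            int j = i - 1;
            // Shift strictly larger elements one slot to the right.
            // Using "> 0" keeps equal elements in their original order.
            while (j >= 0 && order.compare(items.get(j), current) > 0) {
                items.set(j + 1, items.get(j));
                j--;
            }
            items.set(j + 1, current);
        }
    }

    public static void main(String[] args) {
        // Hypothetical lecture dates: sorted except for the two newest entries.
        List<String> dates = new ArrayList<>(Arrays.asList(
            "2014-04-01", "2014-04-08", "2014-04-15", "2014-04-29", "2014-04-22"));
        insertionSort(dates, Comparator.<String>naturalOrder());
        System.out.println(dates);
    }
}
```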

In the winter semester 2014/15 the Media Indexer was extended with a comment counter, to highlight potentially important remarks and corrections.

The Media Indexer is used by Prof. Dr. Weitz in a private area of his website that only students with the correct password can access, and by me on my website. Both instances use the same software but differ in their configuration files. (In contrast to his, my configuration accepts video files that other lecturers might have added to the media center, and it lists videos that no rule has matched as 'Unsortiert', German for 'unsorted'.)
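
Rule-based grouping with a fallback group can be sketched as follows. This is a minimal Java illustration, not the project's Scala code. The rules, video titles, and names are hypothetical, because the real regular expressions live in the configuration files and are not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

final class CourseRulesSketch {

    // Hypothetical rules, checked in insertion order: course name -> title pattern.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("Mathematics 1", Pattern.compile("(?i)mathematik\\s*1"));
        RULES.put("Mathematics 2", Pattern.compile("(?i)mathematik\\s*2"));
        RULES.put("Theoretical Computer Science",
                  Pattern.compile("(?i)theoretische\\s+informatik"));
    }

    // Fallback group for videos that no rule has matched.
    static final String UNSORTED = "Unsortiert";

    /** Returns the course of the first rule whose pattern occurs in the title. */
    static String courseFor(String title) {
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            if (rule.getValue().matcher(title).find()) {
                return rule.getKey();
            }
        }
        return UNSORTED;
    }

    public static void main(String[] args) {
        System.out.println(courseFor("Mathematik 2, 12.05.2014"));
        System.out.println(courseFor("Gastvortrag"));
    }
}
```

A stricter configuration could skip unmatched videos instead of returning the fallback group.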

Languages: Scala, XML, XSLT, HTML, CSS, JavaScript, JSON
Technologies: Async Http Client, HtmlCleaner, Xerces, Xalan, EXSLT, Futures, Scala Regex, StringEscapeUtils, ThreadLocal, Java Properties, XPath, XSD, DTD, HTML5, jQuery, RSS
IDE: Eclipse with Scala IDE
Participants: 1

Robin C. Ladiges / Media Indexer
