Updates from September, 2012

  • Elmer Masters 3:37 am on September 27, 2012 Permalink

    Dred Scott issue solved. The file weighs in at 750K. The simple_html_dom.php library I’m using had MAX_FILE_SIZE set to 600K, so it refused to load the file. I pushed that to 10 MB just to be on the safe side. That did the trick.
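
    A pre-flight size check would flag files like this before the parser chokes on them. Here is a minimal sketch, in Python for illustration rather than the PHP used for the actual processing. The constant and function names are hypothetical, and the 600,000-byte limit is just an assumed stand-in for the 600K default mentioned above.

```python
import os

# Assumed parser limit in bytes; a stand-in for the 600K default noted above.
PARSER_MAX_BYTES = 600_000


def oversized_files(paths, limit=PARSER_MAX_BYTES):
    """Return (path, size) pairs for files larger than the parser limit."""
    flagged = []
    for path in paths:
        size = os.path.getsize(path)
        if size > limit:
            flagged.append((path, size))
    return flagged
```

    Running something like this over the collection first would list every file that needs the limit raised, instead of finding them one failure at a time.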

  • Elmer Masters 10:47 pm on September 26, 2012 Permalink  

    So, processing of SCOTUS opinions stumbles on the Dred Scott decision. No idea why. It is bigger than most, but only 750K. Same error on 2 machines. Seems like DOM is unable to open the HTML file as an object. I’m looking into stuff like strange entities, but nothing is popping out and I can open the file in Firefox.

  • Elmer Masters 6:40 pm on September 24, 2012 Permalink  

    Watching docs process on the FLR server is like watching paint dry. Every opinion has to be opened, have metadata added or modified, and then be inserted into Solr. Takes some time.
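
    The per-opinion steps (open, add metadata, hand off to Solr) look roughly like the sketch below. This is Python for illustration, not the actual FLR code. The function names and the Solr field names (id, content, and whatever goes in the metadata dict) are assumptions, and the sketch only builds the JSON body for an update request. It does not talk to a server.

```python
import html
import json
import re

# Matches an opening <head> tag without also matching <header>.
HEAD_OPEN = re.compile(r"(<head(\s[^>]*)?>)", re.IGNORECASE)


def add_meta(page, name, content):
    """Insert a <meta> tag right after <head>; unchanged if there is no <head>."""
    tag = '<meta name="%s" content="%s">' % (
        html.escape(name, quote=True),
        html.escape(content, quote=True),
    )
    return HEAD_OPEN.sub(lambda m: m.group(1) + tag, page, count=1)


def solr_doc(doc_id, page, metadata):
    """Build one document dict; field names here are hypothetical."""
    doc = {"id": doc_id, "content": page}
    doc.update(metadata)
    return doc


def solr_update_body(docs):
    """Serialize a list of documents as a JSON array for an update request."""
    return json.dumps(docs)
```

    Batching many documents into one request body, rather than posting them one at a time, is one common way to cut down on the paint-drying.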

  • Elmer Masters 6:14 pm on September 24, 2012 Permalink  

    Realized that the index.html files in all of those directories do not need to be modified or indexed. Updated the code to ignore index.html files.
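
    A minimal sketch of that filter, in Python for illustration (the real code differs). The function name is hypothetical, and it assumes the opinions are the other .html or .htm files in the same directories.

```python
import os


def opinion_files(root):
    """Yield paths of HTML files under root, skipping index.html listings."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            lowered = name.lower()
            if lowered == "index.html":
                continue  # directory listing, not an opinion
            if lowered.endswith((".html", ".htm")):
                yield os.path.join(dirpath, name)
```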

  • Elmer Masters 5:41 pm on September 24, 2012 Permalink

    Finished up testing of SCOTUS cases from the 2008 PRO collections. Ready to load them into FLR. Should be interesting.

  • Elmer Masters 6:27 pm on September 20, 2012 Permalink  

    Before we get to the EPUBs, though, we need to make sure that all of that great metadata is being put into Solr. Finding these opinions is more important than recreating the book volumes.

  • Elmer Masters 6:21 pm on September 20, 2012 Permalink
    Tags: United States Reports   

    So, since I’m already running through all these opinions, I’m going to create EPUB versions of US Reports. Watch for availability.

  • Elmer Masters 3:54 pm on September 20, 2012 Permalink  

    Back into Free Law Reporter today. I’m adding additional metadata to SCOTUS opinions. Also decided the PRO case dump from 2008 would be represented as Free Law Reporter 3rd. BTW, I enjoy making up my own reporter series.

  • Elmer Masters 8:43 pm on September 18, 2012 Permalink
    Tags: bad spiders

    More bans on the Drupal system today. An IP address that belongs to Verizon FIOS and the panscient.com spider. Both repeatedly opened thousands of connections to seemingly random pages on the site over the course of 90 minutes this afternoon. Bad Spiders.

  • Elmer Masters 6:44 pm on September 18, 2012 Permalink

    Turns out unpacking the Fastcase Advance Sheets .MOBI version is pretty straightforward. There’s a hunk of Python for that at http://wiki.mobileread.com/wiki/Mobi_unpack. Once unpacked, it’s all about processing the HTML into individual opinion files.
    It would be nice to use the .EPUB files, but at the moment those are locked away in iBooks.
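
    For the splitting step, here is a rough Python sketch. It rests on a big assumption: that each opinion in the unpacked HTML is separated by a Mobipocket page-break tag (`<mbp:pagebreak/>`). The actual Fastcase files may use some other boundary, in which case the pattern would need to change. The function name is hypothetical.

```python
import re

# Assumed opinion boundary: the Mobipocket page-break tag.
PAGEBREAK = re.compile(r"<mbp:pagebreak\s*/?>", re.IGNORECASE)


def split_opinions(page):
    """Split unpacked HTML into non-empty chunks at each page break."""
    parts = [part.strip() for part in PAGEBREAK.split(page)]
    return [part for part in parts if part]
```

    Each returned chunk could then be written out as its own opinion file.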
