CHANGELOG

Path: CHANGELOG
Last Update: Tue Aug 10 19:48:23 -0600 2010

v0.7.7 (11th September 2009)

  • Trigger callbacks contained in Form XObjects when we encounter them in a content stream
  • Fix inheritance of page resources to comply with section 3.6.2

v0.7.6 (28th August 2009)

  • Various bug fixes that increase the files we can successfully parse
    • Treat float and integer tokens differently (thanks Neil)
    • Correctly handle PDFs where the Kids element of a Pages dict is an indirect reference (thanks Rob Holland)
    • Fix conversion of PDF strings to Ruby strings on 1.8.6 (thanks Andrès Koetsier)
    • Fix decoding with ASCII85 and ASCIIHex filters (thanks Andrès Koetsier)
    • Fix extracting inline images from content streams (thanks Andrès Koetsier)
    • Fix extracting [ ] from content streams (thanks Christian Rishøj)
    • Fix conversion of text to UTF8 when the cmap uses bfrange (thanks Federico Gonzalez Lutteroth)

v0.7.5 (27th August 2008)

  • Fix a 1.8.7ism

v0.7.4 (7th August 2008)

  • Raise a MalformedPDFError if a content stream contains an unterminated string
  • Fix an bug that was causing an endless loop on some OSX systems
    • valid strings were incorrectly thought to be unterminated
    • thanks to Jeff Webb for playing email ping pong with me as I tracked this issue down

v0.7.3 (11th June 2008)

  • Add a high level way to get direct access to a PDF object, including a new executable: pdf_object
  • Fix a hard loop bug caused by a content stream that is missing a final operator
  • Significantly simplified the internal code for encoding conversions
    • Fixes YACC parsing bug that occurs on Fedora 8‘s ruby VM
  • New callbacks
    • page_count
    • pdf_version
  • Fix a bug that prevented a font‘s BaseFont from being recorded correctly

v0.7.2 (20th May 2008)

  • Throw an UnsupportedFeatureError if we try to open an encrypted/secure PDF
  • Correctly handle page content instruction sets with trailing whitespace
  • Represent PDF Streams with a new object, PDF::Reader::Stream
    • their really wasn‘t any point in separating the stream content from it‘s associated dict. You need both parts to correctly interpret the content

v0.7.1 (6th May 2008)

  • Non-page strings (ie. metadata, etc) are now converted to UTF-8 more accurately
  • Fixed a regression between 0.6.2 and 0.7 that prevented difference tables from being applied correctly when translating text into UTF-8

v0.7 (6th May 2008)

  • API INCOMPATIBLE CHANGE: any hashes that are passed to callbacks use symbols as keys instead of PDF::Reader::Name instances.
  • Improved support for converting text in some PDF files to unicode
  • Behave as expected if the Contents key in a Page Dict is a reference
  • Include some basic metadata callbacks
  • Don‘t interpret a comment token (%) inside a string as a comment
  • Small fixes to improve 1.9 compatibility
  • Improved our Zlib deflating to make it slightly more robust - still some more issues to work out though
  • Throw an UnsupportedFeatureError if a pdf that uses XRef streams is opened
  • Added an option to PDF::Reader#file and PDF::Reader#string to enable parsing of only parts of a PDF file(ie. only metadata, etc)

v0.6.2 (22nd March 2008)

  • Catch low level errors when applying filters to a content stream and raise a MalformedPDFError instead.
  • Added support for processing inline images
  • Support for parsing XRef tables that have multiple subsections
  • Added a few callbacks to improve the way we supply information on page resources
  • Ignore whitespace in hex strings, as required by the spec (section 3.2.3)
  • Use our "unknown character box" when a single character in an Identity-H string fails to decode
  • Support ToUnicode CMaps that use the bfrange operator
  • Tweaked tokenising code to ensure whitespace doesn‘t get in the way

v0.6.1 (12th March 2008)

  • Tweaked behaviour when we encounter Identity-H encoded text that doesn‘t have a ToUnicode mapping. We just replace each character with a little box.
  • Use the same little box when invalid characters are found in other encodings instead of throwing an ugly NoMethodError.
  • Added a method to RegisterReceiver that returns all occurrences of a callback

v0.6.0 (27th February 2008)

  • all text is now transparently converted to UTF-8 before being passed to the callbacks. before this version, text was just passed as a byte level copy of what was in the PDF file, which was mildly annoying with some encodings, and resulted in garbled text for Unicode encoded text.
  • Fonts that use a difference table are now handled correctly
  • fixed some 1.9 incompatible syntax
  • expanded RegisterReceiver class to record extra info
  • expanded rspec coverage
  • tweaked a README example

v0.5.1 (1st January 2008)

  • Several documentation tweaks
  • Improve support for parsing PDFs under windows (thanks to Jari Williamsson)

v0.5 (14th December 2007)

  • Initial Release

[Validate]