New features
- Added `docx` as an input format (Jesse Rosenthal). The docx reader includes conversion of native Word equations to pandoc LaTeX `Math` elements. Metadata is taken from paragraphs at the beginning of the document with styles `Author`, `Title`, `Subtitle`, `Date`, and `Abstract`.
- Added `epub` as an input format (Matthew Pickering). The epub reader includes conversion of MathML to pandoc LaTeX `Math` elements.
- Added `t2t` (Txt2Tags) as an input format (Matthew Pickering). Txt2tags is a lightweight markup format described at http://txt2tags.org/.
- Added `dokuwiki` as an output format (Clare Macrae).
- Added `haddock` as an output format.
- Added `--extract-media` option to extract media contained in a zip container (docx or epub) while adjusting image paths to point to the extracted images.
- Added a new markdown extension, `compact_definition_lists`, that restores the syntax for definition lists of pandoc 1.12.x, allowing tight definition lists with no blank space between items, and disallowing lazy wrapping. (See below under behavior changes.)
- Added an extension `epub_html_exts` for parsing HTML in EPUBs.
- Added extensions `native_spans` and `native_divs` to activate parsing of material in HTML span or div tags as Pandoc Span inlines or Div blocks.
- `--trace` now works with the Markdown, HTML, Haddock, EPUB, Textile, and MediaWiki readers. This is an option intended for debugging parsing problems; ordinary users should not need to use it.
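The `--extract-media` behavior can be pictured with a short Python sketch. It assumes the media have already been read out of the zip container into a dict mapping archive paths to bytes; the helper names `extract_media` and `rewrite_image_paths` are hypothetical and do not correspond to pandoc's Haskell API.

```python
import os
from typing import Dict, List


def extract_media(media: Dict[str, bytes], out_dir: str) -> Dict[str, str]:
    """Write each media item under out_dir (hypothetical helper).

    Returns a map from the original archive path (e.g. "media/image1.png")
    to the path of the extracted copy on disk.
    """
    new_paths: Dict[str, str] = {}
    for archive_path, data in media.items():
        target = os.path.join(out_dir, archive_path)
        # Recreate the archive's directory structure below out_dir.
        os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
        with open(target, "wb") as fh:
            fh.write(data)
        new_paths[archive_path] = target
    return new_paths


def rewrite_image_paths(targets: List[str], new_paths: Dict[str, str]) -> List[str]:
    """Point each image at its extracted copy; leave other paths unchanged."""
    return [new_paths.get(t, t) for t in targets]
```

The sketch does not guard against archive paths containing `..`; a real extractor should reject or normalize such paths before writing.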
Behavior changes
- Changed behavior of the `markdown_attribute` extension, to bring it in line with PHP markdown extra and multimarkdown. Setting `markdown="1"` on an outer tag affects all contained tags, recursively, until it is reversed with `markdown="0"` (#1378).
- Revised markdown definition list syntax (#1429). Both the reader and writer are affected. This change brings pandoc's definition list syntax into alignment with that used in PHP markdown extra and multimarkdown (with the exception that pandoc is more flexible about the definition markers, allowing tildes as well as colons). Lazily wrapped definitions are now allowed. Blank space is required between list items. Whether there is blank space before a definition determines whether it is a paragraph or a "plain" element. WARNING: This change may break existing documents! Either check your documents for definition lists without blank space between items, or use `markdown+compact_definition_lists` for the old behavior.
- `.numberLines` now works in fenced code blocks even if no language is given (#1287, jgm/highlighting-kate#40).
- Improvements to `--filter`:
  - Don't search PATH for a filter with an explicit path. This fixes a bug wherein `--filter ./caps.py` would run `caps.py` from the system path, even if there was a `caps.py` in the working directory.
  - Respect shebang if filter is executable (#1389).
  - Don't print misleading error message. Previously pandoc would say that a filter was not found, even in a case where the filter had a syntax error.
- HTML reader:
  - Parse `div` and `span` elements even without `--parse-raw`, provided the `native_divs` and `native_spans` extensions are set. Motivation: these now generate native pandoc Div and Span elements, not raw HTML.
  - Parse EPUB-specific elements if the `epub_html_exts` extension is enabled. These include `switch`, `footnote`, `rearnote`, and `noteref`.
- Org reader:
  - Support for inline LaTeX. Inline LaTeX is now accepted and parsed by the org-mode reader. Both math symbols (like `\tau`) and LaTeX commands (like `\cite{Coffee}`) can be used without any further escaping (Albert Krewinkel).
- Textile reader and writer:
  - The `raw_tex` extension is no longer set by default. You can enable it with `textile+raw_tex`.
- DocBook reader:
  - Support `equation`, `informalequation`, and `inlineequation` elements with `mml:math` content. This is converted into LaTeX and put into a Pandoc Math inline.
- Revised `plain` output, largely following the style of Project Gutenberg:
  - Emphasis is rendered with `_underscores_`, strong emphasis with ALL CAPS.
  - Headings are rendered differently, with space to set them off, not with setext style underlines. Level 1 headers are ALL CAPS.
  - Math is rendered using unicode when possible, but without the distracting emphasis markers around variables.
  - Footnotes use a regular `[n]` style.
- Markdown writer:
  - Horizontal rules are now a line across the whole page.
  - Prettier pipe tables. Columns are now aligned (#1323).
  - Respect the `raw_html` extension. `pandoc -t markdown-raw_html` no longer emits any raw HTML, including span and div tags generated by Span and Div elements.
  - Use span with style for `SmallCaps` (#1360).
- HTML writer:
  - Autolinks now have class `uri`, and email autolinks have class `email`, so they can be styled.
- Docx writer:
  - Document formatting is carried over from `reference.docx`. This includes margins, page size, page orientation, header, and footer, including images in headers and footers.
  - Include abstract (if present) with `Abstract` style (#1451).
  - Include subtitle (if present) with `Subtitle` style, rather than tacking it on to the title (#1451).
- Org writer:
  - Write empty span elements with an id attribute as org anchors. For example, `Span ("uid",[],[]) []` becomes `<<uid>>`.
- LaTeX writer:
  - Put table captions above tables, to match the conventional standard. (Previously they appeared below tables.)
  - Use `\(..\)` instead of `$..$` for inline math (#1464).
  - Use `\nolinkurl` in email autolinks. This allows them to be styled using `\urlstyle{tt}`. Thanks to Ulrike Fischer for the solution.
  - Use `\textquotesingle` for `'` in inline code. Otherwise we get curly quotes in the PDF output (#1364).
  - Use `\footnote<.>{..}` for notes in beamer, so that footnotes do not appear before the overlays in which their markers appear (#1525).
  - Don't produce a `\label{..}` for a Div or Span element. Do produce a `\hyperdef{..}` (#1519).
- EPUB writer:
  - If the metadata includes `page-progression-direction` (which can be `ltr` or `rtl`), the `page-progression-direction` attribute will be set in the EPUB spine (#1455).
- Custom lua writers:
  - Custom writers now work with `--template`.
  - Removed HTML header scaffolding from `sample.lua`.
  - Made citation information available in lua writers.
- `--normalize` and `Text.Pandoc.Shared.normalize` now consolidate adjacent `RawBlock`s when possible.
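The recursive `markdown_attribute` rule can be illustrated with Python's standard `html.parser`. This is a minimal sketch, not pandoc's parser: it assumes markdown is off at the top level and that the input is well formed (every non-void tag is closed); the class name `MarkdownAttrTracker` is hypothetical.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "br", "col", "hr", "img", "input", "link", "meta", "source"}


class MarkdownAttrTracker(HTMLParser):
    """Record, for every start tag, whether markdown parsing is active
    inside it: markdown="1" switches it on for all descendants and
    markdown="0" switches it off again (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.stack = [False]  # assumption: markdown is off at top level
        self.result = []      # (tag, markdown_active) in document order

    def handle_starttag(self, tag, attrs):
        active = self.stack[-1]  # inherit from the enclosing tag
        value = dict(attrs).get("markdown")
        if value == "1":
            active = True
        elif value == "0":
            active = False
        self.result.append((tag, active))
        if tag not in VOID_TAGS:
            self.stack.append(active)

    def handle_endtag(self, tag):
        # Assumes well-formed input: each end tag closes the innermost tag.
        if tag not in VOID_TAGS and len(self.stack) > 1:
            self.stack.pop()
```

Feeding it `<div markdown="1"><section>...</section></div>` marks `section` as markdown-active even though it carries no attribute of its own, which is the inheritance the new behavior introduces.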
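The two `--filter` lookup fixes can be sketched in Python as follows. The helpers `resolve_filter` and `filter_command` are hypothetical, and the extension-to-interpreter table used for non-executable filters is an illustrative assumption rather than pandoc's actual list.

```python
import os
import shutil
from typing import List, Optional


def resolve_filter(name: str) -> Optional[str]:
    """Locate a filter (hypothetical helper).

    A name with an explicit path, such as "./caps.py", is used exactly as
    given; only a bare name such as "caps.py" is looked up on PATH.
    """
    if os.path.dirname(name):
        return name if os.path.isfile(name) else None
    return shutil.which(name)  # None when nothing on PATH matches


def filter_command(path: str) -> List[str]:
    """Build the command used to run a filter (hypothetical helper).

    An executable filter is run directly, so the OS honours its shebang
    line. The interpreter table for non-executable filters is an
    illustrative assumption.
    """
    if os.access(path, os.X_OK):
        return [path]
    interpreters = {".py": "python", ".rb": "ruby", ".pl": "perl"}
    ext = os.path.splitext(path)[1]
    return [interpreters[ext], path] if ext in interpreters else [path]
```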
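Column alignment in pipe tables amounts to padding each cell to the width of the widest entry in its column. A minimal Python sketch, assuming plain string cells and ignoring the alignment colons and cell wrapping a full writer must handle; `render_pipe_table` is a hypothetical name.

```python
from typing import List


def render_pipe_table(header: List[str], rows: List[List[str]]) -> str:
    """Render a pipe table whose columns line up (hypothetical helper).

    Each cell is left-justified to the width of the widest entry in its
    column; alignment colons and cell wrapping are not handled.
    """
    cols = range(len(header))
    widths = [max(len(r[i]) for r in [header] + rows) for i in cols]
    widths = [max(w, 3) for w in widths]  # keep at least "---" per column

    def line(cells: List[str]) -> str:
        return "| " + " | ".join(c.ljust(w) for c, w in zip(cells, widths)) + " |"

    rule = "|" + "|".join("-" * (w + 2) for w in widths) + "|"
    return "\n".join([line(header), rule] + [line(r) for r in rows])
```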
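The `RawBlock` consolidation performed by `--normalize` can be sketched in Python over a toy block representation, with tuples standing in for pandoc's AST. Merging only blocks with the same format, and joining their texts with a newline, are assumptions of this sketch.

```python
from typing import List, Tuple

# Toy blocks: ("RawBlock", format, text) or, e.g., ("Para", text).
Block = Tuple[str, ...]


def consolidate_raw_blocks(blocks: List[Block]) -> List[Block]:
    """Merge runs of adjacent RawBlocks that share a format (sketch)."""
    out: List[Block] = []
    for b in blocks:
        prev = out[-1] if out else None
        if (prev is not None and b[0] == "RawBlock"
                and prev[0] == "RawBlock" and prev[1] == b[1]):
            # Same raw format back to back: fold into the previous block.
            out[-1] = ("RawBlock", b[1], prev[2] + "\n" + b[2])
        else:
            out.append(b)
    return out
```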
API changes
- Added `Text.Pandoc.Readers.Docx`, exporting `readDocx` (Jesse Rosenthal).
- Added `Text.Pandoc.Readers.EPUB`, exporting `readEPUB` (Matthew Pickering).
- Added `Text.Pandoc.Readers.Txt2Tags`, exporting `readTxt2Tags` (Matthew Pickering).
- Added `Text.Pandoc.Writers.DokuWiki`, exporting `writeDokuWiki` (Clare Macrae).
- Added `Text.Pandoc.Writers.Haddock`, exporting `writeHaddock`.
- Added `Text.Pandoc.MediaBag`, exporting `MediaBag`, `lookupMedia`, `insertMedia`, `mediaDirectory`, and `extractMediaBag`. The docx and epub readers return a pair of a `Pandoc` document and a `MediaBag` with the media resources they contain. These resources can be extracted using `--extract-media`. Writers that incorporate media (PDF, Docx, ODT, EPUB, RTF, or HTML formats with `--self-contained`) will look for resources in the `MediaBag` generated by the reader, in addition to the file system or web.
- `Text.Pandoc.Readers.TexMath`: Removed deprecated `readTeXMath`. Renamed `readTeXMath'` to `texMathToInlines`.
- `Text.Pandoc`: Added `Reader` data type (Matthew Pickering). `readers` now associates names of readers with `Reader` structures. This allows inclusion of readers, like the docx reader, that take binary rather than textual input.
- `Text.Pandoc.Shared`:
  - Added `capitalize` (Artyom Kazak), and replaced uses of `map toUpper` (which gives bad results for many languages).
  - Added `collapseFilePath`, which removes intermediate `.` and `..` from a path (Matthew Pickering).
  - Added `fetchItem'`, which works like `fetchItem` but searches a `MediaBag` before looking on the net or file system.
  - Added `withTempDir`.
  - Added `removeFormatting`.
  - Added `extractSpaces` (from HTML reader) and generalized its type so that it can be used by the docx reader (Matthew Pickering).
  - Added `ordNub`.
  - Added `normalizeInlines`, `normalizeBlocks`.
  - `normalize` is now `Pandoc -> Pandoc` instead of `Data a => a -> a`. Some users may need to change their uses of `normalize` to the newly exported `normalizeInlines` or `normalizeBlocks`.
- `Text.Pandoc.Options`:
  - Added `writerMediaBag` to `WriterOptions`.
  - Removed deprecated and no longer used `readerStrict` in `ReaderOptions`. This is handled by `readerExtensions` now.
  - Added `Ext_compact_definition_lists`.
  - Added `Ext_epub_html_exts`.
  - Added `Ext_native_divs` and `Ext_native_spans`. This allows users to turn off the default pandoc behavior of parsing contents of div and span tags in markdown and HTML as native pandoc Div blocks and Span inlines.
- `Text.Pandoc.Parsing`:
  - Generalized `readWith` to `readWithM` (Matthew Pickering).
  - Export `runParserT` and `Stream` (Matthew Pickering).
  - Added `HasQuoteContext` type class (Matthew Pickering).
  - Generalized types of `mathInline`, `smartPunctuation`, `quoted`, `singleQuoted`, `doubleQuoted`, `failIfInQuoteContext`, and `applyMacros` (Matthew Pickering).
  - Added custom `token` (Matthew Pickering).
  - Added `stateInHtmlBlock` to `ParserState`. This is used to keep track of the ending tag we're waiting for when we're parsing inside HTML block tags.
  - Added `stateMarkdownAttribute` to `ParserState`. This is used to keep track of whether the markdown attribute has been set in an enclosing tag.
  - Generalized type of `registerHeader`, using new type classes `HasReaderOptions`, `HasIdentifierList`, and `HasHeaderMap` (Matthew Pickering). These allow certain common functions to be reused even in parsers that use custom state (instead of `ParserState`), such as the MediaWiki reader.
  - Moved `inlineMath` and `displayMath` from the Markdown reader to Parsing, and generalized their types (Matthew Pickering).
- `Text.Pandoc.Pretty`:
  - Added `nestle`.
  - Added `blanklines`, which guarantees a certain number of blank lines (and no more).
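A toy Python model of the `MediaBag` idea: a store keyed by path that holds a MIME type and the raw bytes. The method names loosely mirror `insertMedia`, `lookupMedia`, and `mediaDirectory`, but their signatures here are assumptions, not the Haskell API; in particular, guessing the MIME type from the file extension when none is given is this sketch's own choice.

```python
import mimetypes
from typing import Dict, List, Optional, Tuple


class MediaBag:
    """Toy model of a media store: path -> (MIME type, contents)."""

    def __init__(self) -> None:
        self._items: Dict[str, Tuple[str, bytes]] = {}

    def insert_media(self, path: str, mime: Optional[str], data: bytes) -> None:
        # Assumption: fall back to guessing the MIME type from the extension.
        if mime is None:
            mime = mimetypes.guess_type(path)[0] or "application/octet-stream"
        self._items[path] = (mime, data)

    def lookup_media(self, path: str) -> Optional[Tuple[str, bytes]]:
        return self._items.get(path)  # None if the reader stored nothing here

    def media_directory(self) -> List[Tuple[str, str, int]]:
        # (path, MIME type, size in bytes), sorted by path.
        return [(p, m, len(d)) for p, (m, d) in sorted(self._items.items())]
```

A writer following the `fetchItem'` approach would consult `lookup_media` first and fall back to the file system or network only when it returns `None`.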
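A Python sketch of what `collapseFilePath` is described as doing: dropping `.` segments and cancelling `name/..` pairs textually, without touching the file system. It assumes POSIX `/` separators, and `collapse_file_path` is a hypothetical name; Python's own `os.path.normpath` does essentially the same job, so the loop is shown only to make the rule explicit.

```python
from typing import List


def collapse_file_path(path: str) -> str:
    """Drop '.' segments and resolve 'name/..' pairs purely textually.
    Leading '..' segments with nothing to cancel are kept."""
    absolute = path.startswith("/")
    parts: List[str] = []
    for seg in path.split("/"):
        if seg in ("", "."):
            continue
        if seg == ".." and parts and parts[-1] != "..":
            parts.pop()              # cancel the preceding directory
        elif seg == ".." and absolute and not parts:
            continue                 # cannot go above the root
        else:
            parts.append(seg)
    collapsed = "/".join(parts)
    return "/" + collapsed if absolute else collapsed
```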
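The point of an `ordNub`-style function is to remove duplicates while preserving first-occurrence order, without the quadratic cost of Haskell's `nub`. A Python sketch using a hash set; a Haskell version would more typically use `Data.Set`, and that detail is an assumption about pandoc's implementation.

```python
from typing import Hashable, Iterable, List, TypeVar

T = TypeVar("T", bound=Hashable)


def ord_nub(items: Iterable[T]) -> List[T]:
    """Remove duplicates, keeping first occurrences in their original order.

    Checking membership in a set of already-seen elements makes this
    roughly linear, instead of comparing each element against every
    earlier one as `nub` does.
    """
    seen = set()
    out: List[T] = []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out
```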
Bug fixes
- Markdown reader:
  - Fixed parsing of indented code in list items. Indented code at the beginning of a list item must be indented eight spaces from the margin (or edge of the container), or four spaces from the list marker, whichever is greater.
  - Fixed small bug in HTML parsing with `markdown_attribute`, which caused incorrect tag nesting for input like `<aside markdown="1">*hi*</aside>`.
  - Fixed regression with intraword underscores (#1121).
  - Improved parsing of inline links containing quote characters (#1534).
  - Slight rewrite of `enclosure`/`emphOrStrong` code.
  - Revamped raw HTML block parsing in markdown (#1330). We no longer include trailing spaces and newlines in the raw blocks. We look for closing tags for elements (but without backtracking). Each block-level tag is its own `RawBlock`; we no longer try to consolidate them (though `--normalize` will do so).
  - Combine consecutive LaTeX environments. This helps when you have two minipages which can't have blank lines between them (#690, #1196).
  - Support small caps through span. `<span style="font-variant:small-caps;">foo</span>` will be parsed as a `SmallCaps` inline, and will work in all output formats that support small caps (#1360).
  - Prevent spurious line breaks after list items (#1137). When the `hard_line_breaks` extension was enabled, pandoc would formerly produce a spurious line break after a tight list item.
  - Fixed table parsing bug (#1333).
  - Handle `c++` and `objective-c` as language identifiers in GitHub-style fenced blocks (#1318).
  - Inline math must have a non-space character before the closing `$` (#1313).
- LaTeX reader:
  - Handle comments at the end of tables. This resolves the issue illustrated in http://stackoverflow.com/questions/24009489.
  - Correctly handle table rows with too few cells. LaTeX seems to treat them as if they have empty cells at the end (#241).
  - Handle leading/trailing spaces in `\emph` better. `\emph{ hi }` gets parsed as `[Space, Emph [Str "hi"], Space]` so that we don't get things like `* hi *` in markdown output. Also applies to `\textbf` and some other constructions (#1146).
  - Don't assume preamble doesn't contain environments (#1338).
  - Allow (and discard) optional argument for `\caption` (James Aspnes).
- HTML reader:
  - Fixed major parsing problem with HTML tables. Table cells were being combined into one cell (#1341).
  - Fixed performance issue with malformed HTML tables. We let a `</table>` tag close an open `<tr>` or `<td>` (#1167).
  - Allow space between `<col>` and `</col>`.
  - Added `audio` and `source` in `eitherBlockOrInline`.
  - Moved `video`, `svg`, `progress`, `script`, and `noscript` from `blockTags` to `eitherBlockOrInline`. `map` and `object` were mistakenly in both lists; they have been removed from `blockTags`.
  - Ignore `DOCTYPE` and `xml` declarations.
- MediaWiki reader:
  - Don't parse backslash escapes inside `<source>` (#1445).
  - Tightened up template parsing. The opening `{{` must be followed by an alphanumeric or `:`. This prevents the exponential slowdown in #1033.
  - Support "Bild" for images.
- DocBook reader:
  - Better handle elements inside code environments. Pandoc's document model does not allow structure inside code blocks, but at least this way we preserve the text (#1449).
  - Support `<?asciidoc-br?>` (#1236).
- Textile reader:
  - Fixed list parsing. Lists can now start without an intervening blank line (#1513).
  - HTML block-level tags that do not start a line are parsed as inline HTML and do not interrupt paragraphs (as in RedCloth).
- Org reader:
  - Make tildes create inline code (#1345). Also relabeled `code` and `verbatim` parsers to accord with the org-mode manual.
  - Respect `:exports` header argument in code blocks (Craig Bosma).
  - Fixed tight lists with sublists (#1437).
- EPUB writer:
  - Avoid excess whitespace in `nav.xhtml`. This should improve TOC view in iBooks (#1392).
  - Fixed regression on cover image. In 1.12.4 and 1.12.4.2, the cover image would not appear properly, because the metadata id was not correct. Now we derive the id from the actual cover image filename, which we preserve rather than using "cover-image."
  - Keep newlines between block elements. This allows easier diff-ability (#1424).
  - Use `stringify` instead of custom `plainify`.
  - Use `renderTags'` for all tag rendering. This properly handles tags that should be self-closing. Previously `<hr/>` would appear in EPUB output as `<hr></hr>` (#1420).
  - Better handle HTML media tags.
  - Handle multiple dates with OPF `event` attributes. Note: in EPUB3 we can have only one `dc:date`, so only the first one is used.
- LaTeX writer:
  - Correctly handle figures in notes. Notes can't contain figures in LaTeX, so we fake it to avoid an error (#1053).
  - Fixed strikeout + highlighted code (#1294). Previously strikeout highlighted code caused an error.
- ConTeXt writer:
  - Improved detection of autolinks with URLs containing escapes.
- RTF writer:
  - Improved image embedding: `fetchItem'` is now used to get the images, and calculated image sizes are indicated in the RTF.
  - Avoid extra paragraph tags in metadata (#1421).
- HTML writer:
  - Deactivate "incremental" inside slide speaker notes (#1394).
  - Don't include empty items in the table of contents for slide shows. (These would result from creating a slide using a horizontal rule.)
- MediaWiki writer:
  - Minor renaming of `st`-prefixed names.
- AsciiDoc writer:
  - Double up emphasis and strong emphasis markers in intraword contexts, as required by asciidoc (#1441).
- Markdown writer:
  - Avoid wrapping that might start a list, blockquote, or header (#1013).
  - Use Span instead of (hackish) `SmallCaps` in `plainify`.
  - Don't use braced attributes for fenced code (#1416). If `Ext_fenced_code_attributes` is not set, the first class attribute will be printed after the opening fence as a bare word.
  - Separate adjacent lists of the same kind with an HTML comment (#1458).
- PDF writer:
  - Fixed treatment of data URIs for images (#1062).
- Docx writer:
  - Use Compact style for empty table cells (#1353). Otherwise we get overly tall lines when there are empty table cells and the other cells are compact.
  - Create overrides per-image for `media/` in reference docx. This should be somewhat more robust and cover more types of images.
  - Improved `entryFromArchive` to avoid an unneeded parse.
  - Section numbering carries over from `reference.docx` (#1305).
  - Simplified `abstractNumId` numbering. Instead of sequential numbering, we assign numbers based on the list marker styles.
- `Text.Pandoc.Options`:
  - Removed `Ext_fenced_code_attributes` from `markdown_github` extensions.
- `Text.Pandoc.ImageSize`:
  - Use default instead of failing if image size not found in exif header (#1358).
  - Ignore unknown exif header tag rather than crashing. Some images seem to have tag type of 256, which was causing a runtime error.
- `Text.Pandoc.Shared`:
  - `fetchItem`: unescape URI encoding before reading local file (#1427).
  - `fetchItem`: strip a fragment like `?#iefix` from the extension before doing MIME lookup, to improve MIME type guessing.
  - Improved logic of `fetchItem`: absolute URIs are fetched from the net; other things are treated as relative URIs if `sourceURL` is `Just _`, otherwise as file paths on the local file system.
  - `fetchItem` now properly handles links without a protocol (#1477).
  - `fetchItem` now escapes characters not allowed in URIs before trying to parse the URIs.
  - Fixed runtime error with `compactify'DL` on certain lists (#1452).
- `pandoc.hs`: Don't strip the path from `writerSourceURL`: the path is needed to resolve relative URLs when we fetch resources (#750).
- `Text.Pandoc.Parsing`:
  - Simplified `dash` and `ellipsis` (#1419).
  - Removed `(>>~)` in favor of the equivalent `(<*)` (Matthew Pickering).
  - Generalized functions to use `ParsecT` (Matthew Pickering).
  - Added `isbn` and `pmid` to list of recognized schemes (Matthew Pickering).
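The inline math rule can be expressed as a regular expression. This Python sketch assumes, in addition to the non-space before the closing `$` stated above, that the opening `$` must be followed by a non-space and that the closing `$` must not be followed by a digit (as pandoc's README describes for `tex_math_dollars`; treat those details as assumptions here). The pattern and function names are hypothetical, and display math (`$$...$$`) is out of scope.

```python
import re
from typing import List

# "$" + non-space, then a body without "$" that ends in a non-space just
# before the closing "$", which must not be followed by a digit.
INLINE_MATH = re.compile(r"\$(?=[^\s$])([^$]*?[^\s$])\$(?!\d)")


def find_inline_math(text: str) -> List[str]:
    """Return the TeX source of each inline math span in text."""
    return INLINE_MATH.findall(text)
```

With this rule, prose such as `costs $5 and $10` yields no math, because the would-be closing `$` is preceded by a space.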
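The `\emph{ hi }` fix amounts to moving boundary whitespace outside the emphasis node. A Python sketch over a simplified representation, where `"Space"` stands for a `Space` inline and `("Emph", text)` for an `Emph` node whose contents are collapsed to one string (both simplifications of pandoc's AST); the function name is hypothetical.

```python
from typing import List, Tuple, Union

Inline = Union[str, Tuple[str, str]]  # "Space" or ("Emph", text)


def emph_with_outer_spaces(content: str) -> List[Inline]:
    """Build inlines for an emph argument, moving any leading or trailing
    whitespace outside the Emph node."""
    result: List[Inline] = []
    stripped = content.strip()
    if content[:1].isspace():
        result.append("Space")
    if stripped:
        result.append(("Emph", stripped))
        if content[-1:].isspace():
            result.append("Space")
    return result
```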
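The Markdown writer separates adjacent lists because two lists of the same kind with only a blank line between them are read back as a single list. A Python sketch over already-rendered blocks; the `(kind, text)` representation, the function name, and the exact `<!-- -->` comment text are assumptions.

```python
from typing import List, Optional, Tuple


def join_markdown_blocks(blocks: List[Tuple[str, str]]) -> str:
    """Join rendered blocks, given as (kind, markdown) pairs such as
    ("BulletList", "- a"), inserting an empty HTML comment between two
    consecutive lists of the same kind so they stay separate."""
    pieces: List[str] = []
    prev_kind: Optional[str] = None
    for kind, text in blocks:
        if kind == prev_kind and kind in ("BulletList", "OrderedList"):
            pieces.append("<!-- -->")
        pieces.append(text)
        prev_kind = kind
    return "\n\n".join(pieces)
```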
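The revised `fetchItem` decision logic, and the `?#iefix` stripping before MIME lookup, can be sketched in Python with `urllib.parse`. The function names are hypothetical, and treating only `http` and `https` as absolute network URIs is a simplifying assumption.

```python
from typing import Optional, Tuple
from urllib.parse import unquote, urljoin, urlparse


def classify_resource(source_url: Optional[str], item: str) -> Tuple[str, str]:
    """Decide where a resource comes from (hypothetical helper).

    Absolute http(s) URIs are fetched from the net; anything else is
    resolved against source_url when one is given, and otherwise read
    as a local file after undoing percent-encoding (%20 -> space).
    """
    if urlparse(item).scheme in ("http", "https"):
        return ("net", item)
    if source_url is not None:
        return ("net", urljoin(source_url, item))
    return ("file", unquote(item))


def mime_lookup_path(item: str) -> str:
    """Drop a query or fragment such as '?#iefix' so that the file
    extension can be used for MIME type guessing."""
    return item.split("?", 1)[0].split("#", 1)[0]
```

The `urljoin` call also shows why `writerSourceURL` must keep its path: resolving `img/a.png` against `http://example.com/docs/` gives `http://example.com/docs/img/a.png`, whereas a source URL stripped to `http://example.com/` would give `http://example.com/img/a.png`.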
Template changes
- Added haddock template.
- EPUB3: Added `type` attribute to `link` tags. They are supposed to be "advisory" in HTML5, but kindlegen seems to require them.
- EPUB3: Put title page in section with `epub:type="titlepage"`.
- LaTeX: Made `\subtitle` work properly (#1327).
- LaTeX/Beamer: Removed conditional around date (#1321).
- LaTeX: Added `lot` and `lof` variables, which can be set to get `\listoftables` and `\listoffigures` (#1407). Note that these variables can be set at the command line with `-Vlot -Vlof` or in YAML metadata.
Under the hood improvements
- Rewrote normalize for efficiency (#1385).
- Rewrote Haddock reader to use `haddock-library` (#1346).
  - This brings pandoc's rendering of haddock markup in line with the new haddock.
  - Fixed line breaks in `@` code blocks.
  - alex and happy are no longer build dependencies.
- Added `Text.Pandoc.Compat.Directory` to allow building against different versions of the `directory` library.
- Added `Text.Pandoc.Compat.Except` to allow building against different versions of `mtl`.
- Code cleanup in some writers, using Reader monad to avoid passing options parameter around (Matej Kollar).
- Improved readability in `pandoc.hs`.
- Miscellaneous code cleanups (Artyom Kazak).
- Avoid `import Prelude hiding (catch)` (#1309, thanks to Michael Thompson).
- Changed `http-conduit` flag to `https`. Depend on `http-client` and `http-client-tls` instead of `http-conduit`. (Note: pandoc still depends on `conduit` via `yaml`.)
- Require `highlighting-kate >= 0.5.8.5` (#1271, #1317, Debian #753299). This change to highlighting-kate means that PHP fragments no longer need to start with `<?php`. It also fixes a serious bug causing failures with ocaml and fsharp.
- Require latest `texmath`. This fixes `\tilde{E}` and allows `\left` to be used with `]`, `)`, etc. (#1319), among many other improvements.
- Require latest `zip-archive`. This has fixes for unicode path names.
- Added tests for plain writer.
- `Text.Pandoc.Templates`:
  - Fail informatively on template syntax errors. With the move from parsec to attoparsec, we lost good error reporting. In fact, since we weren't testing for end of input, malformed templates would fail silently. Here we revert to Parsec for better error messages.
  - Use `ordNub` (#1022).
- Benchmarks:
  - Made benchmarks compile again (Artyom Kazak).
  - Ensured that the failure of one benchmark does not prevent others from running (Artyom Kazak).
  - Use `nfIO` instead of the `getLength` trick to force full evaluation.
  - Changed benchmark to use only the test suite, so that benchmarks run more quickly.
- Windows build script:
  - Add `-windows` to file name.
  - Use one install command for pandoc, pandoc-citeproc.
  - Force install of pandoc-citeproc.
- `make_osx_package`: Call zip file `pandoc-VERSION-osx.zip`. The zip should not be named `SOMETHING.pkg.zip`, or OS X Finder will extract it into a folder named `SOMETHING.pkg`, which it will interpret as a defective package (#1308).
- `README`:
  - Made headers for all extensions so they have IDs and can be linked to (Beni Cherniavsky-Paskin).
  - Fixed typos (Phillip Alday).
  - Fixed documentation of attributes (#1315).
  - Clarified documentation on small caps (#1360).
  - Better documentation for `fenced_code_attributes` extension (Caleb McDaniel).
  - Documented fact that you can put YAML metadata in a separate file (#1412).