PDF to text

I need to create a text file from the content of a PDF file. I tried to do that using PDFsharp, but it doesn't work. If someone knows how to do that using PDFsharp, please help me.

There is a companion library to PDFsharp called MigraDoc, which has RTF-to-PDF and PDF-to-RTF conversion. You can convert the PDF into RTF and then easily extract the plain text by loading the RTF into a RichTextBox control (WinForms).
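If you already have the RTF, the RichTextBox step might look like the sketch below. It only covers getting plain text out of RTF, not the PDF-to-RTF conversion itself. The class and method names (RtfText, FromRtf, FromFile) are made up for illustration, and it assumes a Windows project that references System.Windows.Forms.

```csharp
using System.IO;
using System.Windows.Forms;

static class RtfText
{
    // Hypothetical helper: parses RTF markup with a RichTextBox that is
    // never shown on screen and returns the text without formatting.
    public static string FromRtf(string rtf)
    {
        using (var box = new RichTextBox())
        {
            box.Rtf = rtf;      // the control parses the RTF markup
            return box.Text;    // plain text, formatting stripped
        }
    }

    // Hypothetical helper: same as above, but reads the RTF from a file.
    public static string FromFile(string rtfPath)
    {
        return FromRtf(File.ReadAllText(rtfPath));
    }
}
```

You could then write the result out with File.WriteAllText("output.txt", RtfText.FromFile("input.rtf")), where both file names are placeholders.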

Basically, Word to PDF is fine, but PDF to Word means the converter has to guess at the attributes.

It is generally used for converting book formats like mobi to epub, but it can also do PDF to docx. It doesn't provide a great conversion in terms of formatting, but I have used it to extract info from PDFs. It has a command-line interface you can call from C#.

Another solution I have used is to convert the PDF pages to images and embed them into a docx. I used Ghostscript to convert the pages to jpg or png files and ImageMagick to trim the margins and shrink them a little. Both Ghostscript and ImageMagick are command-line programs you can call from C#. This worked very well for small 10-page reports. You can set the resolution for the images in Ghostscript.
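Calling Ghostscript from C# for the page-to-image step could look roughly like this. It is a minimal sketch, not the exact code I used: the class and method names are made up, the executable name is an assumption (typically gswin64c on 64-bit Windows, gs on Linux), and it assumes Ghostscript is on the PATH. The switches themselves are standard Ghostscript options: -sDEVICE=png16m selects 24-bit PNG output and -r sets the resolution in DPI.

```csharp
using System.Diagnostics;

static class PdfToImages
{
    // Hypothetical helper: builds the Ghostscript argument string.
    // outputPattern may contain %03d, which Ghostscript replaces with the page number.
    public static string BuildArgs(string pdfPath, string outputPattern, int dpi)
    {
        return "-dNOPAUSE -dBATCH -dSAFER -sDEVICE=png16m " +
               $"-r{dpi} \"-sOutputFile={outputPattern}\" \"{pdfPath}\"";
    }

    // Hypothetical helper: runs Ghostscript and returns its exit code (0 means success).
    // gsExe is an assumption, e.g. "gswin64c" on Windows or "gs" on Linux.
    public static int Convert(string gsExe, string pdfPath, string outputPattern, int dpi)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = gsExe,
            Arguments = BuildArgs(pdfPath, outputPattern, dpi),
            UseShellExecute = false,   // start the program directly, no shell
            CreateNoWindow = true      // do not pop up a console window
        };

        using (var process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

For example, PdfToImages.Convert("gswin64c", "report.pdf", "page-%03d.png", 150) would write page-001.png, page-002.png and so on at 150 DPI. The ImageMagick trimming step can be started the same way with its own executable and arguments.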

In your code above, I think you are using two third-party products, so you should post your issue to the corresponding forums for professional support. Besides, some other tools have been suggested above that you could try. Or you could try converting the PDF to XML and then converting the XML to a Word document.