Apr 14, 2011: Depending on exactly what you want to do with the PDFs and their contents (that is, do you want a Perl module that essentially replaces Acrobat Reader, or do you just want to extract and print the text of the documents?), CPAN may well provide what you want, as it contains quite a few PDF-related modules; a CPAN search for PDF modules lists them.
Perl is widely known as 'the duct tape of the Internet'. It can handle encrypted Web data, including e-commerce transactions, and it can be embedded into web servers to speed up processing by as much as 2000%. Perl's mod_perl, for example, embeds a Perl interpreter in the Apache web server, so scripts no longer pay the cost of starting a separate interpreter for each request.
I am trying to extract text from PDF files using Perl. I have been running pdftotext.exe from the command line (i.e., via Perl's system function) to extract the text, and this method works fine. The problem is that the PDF files contain symbols like α, β and other special characters, which do not appear correctly in the generated .txt file.
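One common cause is encoding: pdftotext's -enc option selects the encoding of the text file it writes, and Perl then has to decode that file as UTF-8 when reading and encode it again when printing. The sketch below assumes pdftotext (Xpdf or Poppler) is on the PATH and that the PDF's fonts carry Unicode mappings for the symbols; if they do not, no encoding setting can recover them. The helper names pdf_to_text_utf8 and read_utf8_file are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: run pdftotext with UTF-8 output, then read the
# result back as decoded Perl characters so symbols like α and β survive.
sub pdf_to_text_utf8 {
    my ($pdf, $txt) = @_;
    # List form of system() avoids shell quoting problems with file names.
    my $status = system('pdftotext', '-enc', 'UTF-8', $pdf, $txt);
    die "pdftotext failed (exit status $?)\n" if $status != 0;
    return read_utf8_file($txt);
}

# Hypothetical helper: read a UTF-8 encoded text file and return its
# decoded contents as a single string.
sub read_utf8_file {
    my ($path) = @_;
    open my $fh, '<:encoding(UTF-8)', $path
        or die "Cannot open $path: $!\n";
    local $/;              # slurp mode: read the whole file at once
    my $text = <$fh>;
    close $fh;
    return $text;
}

if (@ARGV == 2) {
    # Encode characters back to UTF-8 on output, avoiding
    # "Wide character" warnings and mangled symbols.
    binmode STDOUT, ':encoding(UTF-8)';
    print pdf_to_text_utf8(@ARGV);
}
```

Usage, with hypothetical file names: perl pdf2txt.pl input.pdf output.txt. On a Windows console the symbols may still display oddly even when the output file itself is correct UTF-8.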
Description
The read function reads, or attempts to read, LENGTH bytes (characters, if the filehandle has a decoding layer such as :encoding(UTF-8)) from the file associated with FILEHANDLE into BUFFER. If OFFSET is specified, the data read is placed into BUFFER starting at that offset.
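A minimal sketch of how OFFSET affects the buffer, using an in-memory filehandle so no file on disk is needed. Without OFFSET the buffer is replaced; with OFFSET the new data is written at that position and the buffer ends right after the last character read. A negative OFFSET counts back from the end of the buffer.

```perl
use strict;
use warnings;

# Open an in-memory filehandle so the example needs no file on disk.
my $data = "ABCDEFGHIJ";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my $buffer = "xyz";
# Read 4 bytes; without OFFSET they replace the buffer's contents.
my $n = read($fh, $buffer, 4);
print "$n [$buffer]\n";          # 4 [ABCD]

# Read 3 more bytes and place them at offset length($buffer),
# i.e. append them to what is already there.
$n = read($fh, $buffer, 3, length $buffer);
print "$n [$buffer]\n";          # 3 [ABCDEFG]

# A negative OFFSET counts back from the end of the buffer; the
# buffer ends right after the newly read data.
$n = read($fh, $buffer, 2, -1);
print "$n [$buffer]\n";          # 2 [ABCDEFHI]

close $fh;
```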
Syntax
Following is the simple syntax for this function −
read FILEHANDLE, BUFFER, LENGTH, OFFSET
read FILEHANDLE, BUFFER, LENGTH
Return Value
This function returns the number of bytes actually read, 0 at end of file, or the undefined value if an error occurred (in which case $! is set).
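Because 0 (end of file) and undef (error) are both false, a read loop should test for them separately. A minimal sketch that reads a file in fixed-size chunks and totals the bytes; the helper name count_bytes and the 4096-byte default chunk size are illustrative choices, not part of read itself.

```perl
use strict;
use warnings;

# Hypothetical helper: count the bytes in a file by reading it in
# fixed-size chunks, checking read()'s return value on every call.
sub count_bytes {
    my ($path, $chunk_size) = @_;
    $chunk_size ||= 4096;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!\n";
    my ($total, $buf) = (0, '');
    while (1) {
        my $n = read($fh, $buf, $chunk_size);
        die "Read error on $path: $!\n" unless defined $n;  # undef: error
        last if $n == 0;                                     # 0: end of file
        $total += $n;                                        # n: bytes read
    }
    close $fh;
    return $total;
}

print count_bytes($ARGV[0]), "\n" if @ARGV;
```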
Example
Following is example code showing its basic usage −
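A minimal sketch of basic usage: open a file in binary mode and read its first five bytes to see whether it starts with the "%PDF-" signature that begins PDF files. The helper name looks_like_pdf is hypothetical, and the check only inspects the header; it does not validate the rest of the file.

```perl
use strict;
use warnings;

# Hypothetical helper: report whether a file starts with the "%PDF-"
# signature by reading its first five bytes with read().
sub looks_like_pdf {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    binmode $fh;                        # raw bytes, no newline translation
    my $n = read($fh, my $header, 5);
    die "Read error on $path: $!\n" unless defined $n;
    close $fh;
    # A file shorter than five bytes cannot carry the signature.
    return $n == 5 && $header eq '%PDF-';
}

for my $file (@ARGV) {
    printf "%s: %s\n", $file, looks_like_pdf($file) ? 'PDF' : 'not a PDF';
}
```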
I want to parse the text from a PDF file in Perl without converting the PDF into any other format. Is it possible?
Yes, you can.
Take a look at the CAM::PDF package.
You can use this module to pull the text out.
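A minimal sketch of pulling the text out with CAM::PDF, assuming the module is installed from CPAN and relying on its numPages and getPageText methods (check the module's documentation for your version). The helper name pdf_page_texts is hypothetical, and the extracted text only approximates the page layout.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text of every page in a PDF.
# CAM::PDF is loaded at run time so this file still compiles where
# the module is not installed.
sub pdf_page_texts {
    my ($path) = @_;
    no warnings 'once';    # $CAM::PDF::errstr is set inside the module
    require CAM::PDF;
    my $pdf = CAM::PDF->new($path)
        or die "Cannot parse $path: $CAM::PDF::errstr\n";
    # Pages are numbered from 1; treat a page with no text as empty.
    return map { $pdf->getPageText($_) // '' } 1 .. $pdf->numPages();
}

if (@ARGV) {
    my @pages = pdf_page_texts($ARGV[0]);
    for my $i (0 .. $#pages) {
        print "--- page ", $i + 1, " ---\n", $pages[$i], "\n";
    }
}
```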