Does this tool upload my PDF to any server?

No. The entire extraction process runs inside your browser using JavaScript. Your file is read locally via the FileReader API and processed in memory. Nothing is transmitted to any server at any point. Once the page has loaded, you can disconnect from the internet and the tool will work identically.

Why does it say no images were found, even though I can clearly see photos in my PDF?

This happens when the PDF uses an image encoding this tool does not reconstruct, such as CCITTFaxDecode (common in scanned black-and-white documents) or JBIG2 (used in some OCR pipelines). It can also happen if the visual content is drawn as vector graphics rather than embedded raster images, or if parts of the PDF's object structure are packed into compressed object streams (a PDF 1.5 feature), which a byte-level scan cannot read directly.

Will the extracted JPEG images have the same quality as the originals?

Yes, for images extracted as JPEG. The tool copies the raw JPEG bytes directly out of the PDF binary without re-encoding or re-compressing them. The output file is byte-for-byte identical to what was embedded in the PDF, so the quality and resolution are exactly what the document author included.

What does the RAW badge mean on some extracted images?

RAW means the image was stored in the PDF as a FlateDecode compressed pixel stream (raw RGB or grayscale data compressed with DEFLATE). The tool decompresses the pixel data and exports it as a PNG file. The pixel content is preserved exactly, but the file format has been converted from the internal compressed format to PNG for compatibility.

Can I extract images from password-protected PDFs?

Only if the PDF has already been unlocked before you load it into this tool. Encrypted PDFs use the PDF standard security handler (RC4 or AES) to encrypt all stream data, which means the raw bytes in the file are ciphertext and cannot be parsed as images. This also applies to files that open without a password prompt but carry permission restrictions, because their streams are encrypted too. If you have the password and your PDF viewer can open the file, some viewers allow you to save an unencrypted copy, which this tool can then process.
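A tool can warn about this before scanning. The sketch below is a hypothetical TypeScript heuristic, not a description of how this extractor behaves: it searches the raw bytes for the /Encrypt key that an encrypted PDF's trailer (or cross-reference stream dictionary) uses to point at its encryption dictionary. The function name `looksEncrypted` is illustrative.

```typescript
// Hypothetical heuristic, not a full PDF parser: an encrypted PDF names its
// encryption dictionary with an /Encrypt key in the trailer (or in the
// cross-reference stream dictionary). Searching the raw bytes for that key
// is a cheap way to warn before scanning ciphertext for image signatures.
function looksEncrypted(pdf: Uint8Array): boolean {
  const key = Array.from("/Encrypt", (c) => c.charCodeAt(0));
  outer: for (let i = 0; i + key.length <= pdf.length; i++) {
    for (let j = 0; j < key.length; j++) {
      if (pdf[i + j] !== key[j]) continue outer;
    }
    // Ignore longer names that merely start with "Encrypt", e.g. /EncryptMetadata
    const next = i + key.length < pdf.length ? pdf[i + key.length] : 0x20;
    if (!/[A-Za-z0-9]/.test(String.fromCharCode(next))) return true;
  }
  return false;
}
```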

What is the ZIP download and how does it work?

The Download All as ZIP button packages every extracted image into a single ZIP archive without any server involvement. The ZIP file format is implemented entirely in JavaScript following the PKZIP specification, using the STORE (no compression) method, because JPEG and PNG images are already compressed and re-compressing them gains little or nothing. Each file entry gets a correct CRC-32 checksum computed in the browser. The resulting ZIP can be opened by any standard archive utility.
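A minimal sketch of a STORE-only ZIP writer in TypeScript, assuming the image bytes are already in memory. The names `crc32`, `zipStore`, and `ZipEntry` are hypothetical, not the tool's own code. Each entry gets a local file header and a central directory record, and the archive ends with an end-of-central-directory record, following the PKZIP layout. The sketch omits ZIP64, so it assumes fewer than 65,535 entries and an archive under 4 GiB.

```typescript
// CRC-32 lookup table for the reflected IEEE polynomial 0xEDB88320 used by ZIP.
const CRC_TABLE = (() => {
  const table = new Uint32Array(256);
  for (let n = 0; n < 256; n++) {
    let c = n;
    for (let k = 0; k < 8; k++) c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
    table[n] = c >>> 0;
  }
  return table;
})();

function crc32(data: Uint8Array): number {
  let crc = 0xffffffff;
  for (let i = 0; i < data.length; i++) {
    crc = CRC_TABLE[(crc ^ data[i]) & 0xff] ^ (crc >>> 8);
  }
  return (crc ^ 0xffffffff) >>> 0;
}

interface ZipEntry {
  name: string;     // path inside the archive, e.g. "image-001.jpg"
  data: Uint8Array; // file bytes, stored as-is
}

// Build a ZIP archive using method 0 (STORE). No ZIP64 support.
function zipStore(entries: ZipEntry[]): Uint8Array {
  const utf8 = new TextEncoder();
  const DOS_DATE = 0x0021; // 1980-01-01, the earliest date a ZIP header can hold
  const parts: Uint8Array[] = [];
  const central: Uint8Array[] = [];
  let offset = 0;

  for (const { name, data } of entries) {
    const nameBytes = utf8.encode(name);
    const crc = crc32(data);

    // Local file header (30 bytes) + name + data
    const local = new Uint8Array(30 + nameBytes.length + data.length);
    const lv = new DataView(local.buffer);
    lv.setUint32(0, 0x04034b50, true);        // local file header signature
    lv.setUint16(4, 20, true);                // version needed to extract (2.0)
    lv.setUint16(6, 0x0800, true);            // flag bit 11: name is UTF-8
    lv.setUint16(8, 0, true);                 // compression method 0 = STORE
    lv.setUint16(12, DOS_DATE, true);         // mod time (offset 10) stays 00:00
    lv.setUint32(14, crc, true);
    lv.setUint32(18, data.length, true);      // compressed size equals size for STORE
    lv.setUint32(22, data.length, true);      // uncompressed size
    lv.setUint16(26, nameBytes.length, true); // extra field length (28) stays 0
    local.set(nameBytes, 30);
    local.set(data, 30 + nameBytes.length);
    parts.push(local);

    // Central directory header (46 bytes) + name
    const cd = new Uint8Array(46 + nameBytes.length);
    const cv = new DataView(cd.buffer);
    cv.setUint32(0, 0x02014b50, true);        // central directory signature
    cv.setUint16(4, 20, true);                // version made by
    cv.setUint16(6, 20, true);                // version needed to extract
    cv.setUint16(8, 0x0800, true);            // same flags as the local header
    cv.setUint16(14, DOS_DATE, true);         // method (10) and time (12) stay 0
    cv.setUint32(16, crc, true);
    cv.setUint32(20, data.length, true);
    cv.setUint32(24, data.length, true);
    cv.setUint16(28, nameBytes.length, true);
    cv.setUint32(42, offset, true);           // where this entry's local header starts
    cd.set(nameBytes, 46);
    central.push(cd);

    offset += local.length;
  }

  // End of central directory record (22 bytes)
  const cdSize = central.reduce((sum, c) => sum + c.length, 0);
  const eocd = new Uint8Array(22);
  const ev = new DataView(eocd.buffer);
  ev.setUint32(0, 0x06054b50, true);
  ev.setUint16(8, entries.length, true);  // entries on this disk
  ev.setUint16(10, entries.length, true); // total entries
  ev.setUint32(12, cdSize, true);
  ev.setUint32(16, offset, true);         // central directory follows the last entry

  const out = new Uint8Array(offset + cdSize + eocd.length);
  let pos = 0;
  for (const part of [...parts, ...central, eocd]) {
    out.set(part, pos);
    pos += part.length;
  }
  return out;
}
```

In a browser, the returned bytes could be wrapped in a `Blob` with type `application/zip` and offered for download through `URL.createObjectURL`.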

PDF Image Extractor — Justconvertpdf

Why Getting Images Out of a PDF Is Harder Than It Looks

You open a PDF report, see a crisp product photograph or a detailed technical diagram, and you want that image. You right-click. Nothing useful appears. You try a screenshot tool, but a screenshot is a smudged copy: you capture the viewer's resampled rendering instead of the stored pixels, and the pixel count shrinks to whatever your monitor happens to display. The original image sitting inside that PDF might be a 4000-pixel-wide, high-quality JPEG. What you capture from a screenshot is a 1200-pixel shadow of it.

This gap between "there is clearly an image here" and "I can extract that image cleanly" frustrates designers, researchers, publishers, and anyone who works with documents professionally. The PDF format was not built with easy media extraction in mind. It was built to make documents look identical on every device and printer. Images are embedded as binary streams inside this container, stored as objects called image XObjects that each page references by name from its resource dictionary. To get them out correctly, you have to understand that structure, not just copy pixels off a screen.

What Is Actually Happening Inside a PDF

A PDF file is a structured binary format. It contains objects (dictionaries, streams, arrays) that are located through a cross-reference table, or in PDF 1.5 and later a cross-reference stream, near the end of the file. Images are stored as stream objects whose dictionary carries the entry /Subtype /Image. That dictionary also holds the image's metadata: width, height, bits per component, color space, and, crucially, the filter used to compress the raw pixel data.
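To make that concrete, here is a minimal TypeScript sketch that pulls the image-related entries out of one dictionary given as text. It assumes direct values and a single /Filter name; real files may use indirect references such as `/Length 12 0 R` or filter arrays, which this sketch does not resolve. `parseImageDict` and `ImageInfo` are hypothetical names.

```typescript
interface ImageInfo {
  width: number;
  height: number;
  bitsPerComponent: number | null;
  colorSpace: string | null; // e.g. "DeviceRGB"; null if absent or not a plain name
  filter: string | null;     // e.g. "DCTDecode"; null if absent or an array
}

// Hypothetical helper: read image metadata from one dictionary's text,
// assuming direct values (no "12 0 R" references) and a single /Filter name.
function parseImageDict(dict: string): ImageInfo | null {
  if (!/\/Subtype\s*\/Image\b/.test(dict)) return null; // not an image XObject
  const num = (key: string): number | null => {
    const m = new RegExp(`/${key}\\s+(\\d+)`).exec(dict);
    return m ? Number(m[1]) : null;
  };
  const name = (key: string): string | null => {
    const m = new RegExp(`/${key}\\s*/([A-Za-z0-9]+)`).exec(dict);
    return m ? m[1] : null;
  };
  const width = num("Width");
  const height = num("Height");
  if (width === null || height === null) return null;
  return {
    width,
    height,
    bitsPerComponent: num("BitsPerComponent"),
    colorSpace: name("ColorSpace"),
    filter: name("Filter"),
  };
}
```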

The two most common image encodings you encounter are DCTDecode and FlateDecode. DCTDecode means the image data is a raw JPEG bitstream — the exact same bytes you would get in a standalone .jpg file. FlateDecode means the pixel data has been compressed using the DEFLATE algorithm, the same compression used inside ZIP files and PNG images. Less commonly, PDFs contain CCITTFaxDecode streams (used for scanned black-and-white documents) or JBIG2Decode streams (efficient for text-heavy scans).
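The filter name decides what an extractor has to do with the stream bytes. A hypothetical TypeScript dispatch, with illustrative category names that are not the tool's own, might look like this:

```typescript
// Hypothetical categories for how an extractor like this one might treat
// each /Filter value; the names are illustrative, not the tool's own.
type Handling = "copy-jpeg" | "inflate-pixels" | "unsupported" | "other";

function classifyFilter(filter: string | null): Handling {
  switch (filter) {
    case "DCTDecode":
      return "copy-jpeg"; // stream bytes are already a complete JPEG file
    case "FlateDecode":
      return "inflate-pixels"; // DEFLATE-compressed raw samples
    case "CCITTFaxDecode":
    case "JBIG2Decode":
      return "unsupported"; // bilevel scan encodings this tool does not rebuild
    default:
      return "other"; // no filter, or filters such as JPXDecode or LZWDecode
  }
}
```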

When a PDF contains a JPEG image, the raw JPEG bytes are embedded directly in the file; the PDF writer stores them without transforming them. That means extracting a JPEG from a PDF is, at its core, a matter of finding the JPEG start signature (the byte sequence 0xFF 0xD8 0xFF), locating the end-of-image marker (0xFF 0xD9), and copying everything from the start signature through the end marker. The result is a byte-perfect replica of the original JPEG at its original quality level, with no re-encoding and no quality loss.
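A minimal TypeScript sketch of that signature scan, assuming DCTDecode is the stream's only filter (if FlateDecode is chained in front of it, the JPEG bytes are themselves deflated and invisible to this scan). It also assumes the first 0xFF 0xD9 after a start signature ends the image; a JPEG carrying an EXIF thumbnail contains an earlier 0xFF 0xD9, so a sturdier scanner would walk the marker segments instead. `findJpegs` is a hypothetical name.

```typescript
// Signature scan: copy each run of bytes from an SOI signature (FF D8 FF)
// through the first EOI marker (FF D9) that follows it.
function findJpegs(pdf: Uint8Array): Uint8Array[] {
  const found: Uint8Array[] = [];
  let i = 0;
  while (i + 2 < pdf.length) {
    if (pdf[i] === 0xff && pdf[i + 1] === 0xd8 && pdf[i + 2] === 0xff) {
      let end = -1;
      for (let j = i + 2; j + 1 < pdf.length; j++) {
        if (pdf[j] === 0xff && pdf[j + 1] === 0xd9) {
          end = j + 2; // include the EOI marker itself
          break;
        }
      }
      if (end === -1) break; // no EOI before end of file: truncated, stop scanning
      found.push(pdf.slice(i, end)); // slice() copies, so each result is standalone
      i = end;
    } else {
      i++;
    }
  }
  return found;
}
```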

Why Online Tools and Desktop Apps Often Fail

Many PDF image extraction tools convert the entire PDF page to an image first, then hand you that. This defeats the purpose entirely. If a tool renders the page at 72 DPI but the page contains a photograph embedded at 300 DPI, you get 72 DPI output. The photograph's extra detail is discarded before you ever see it.
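The size of that loss is simple arithmetic. For a hypothetical photograph placed 4 inches wide and 3 inches tall on the page, the pixel count along each side is its length in inches times the resolution:

```latex
\begin{aligned}
\text{pixels per side} &= \text{inches} \times \text{DPI} \\
300\ \text{DPI}: \quad 1200 \times 900 &= 1{,}080{,}000\ \text{pixels} \\
72\ \text{DPI}: \quad 288 \times 216 &= 62{,}208\ \text{pixels}
\end{aligned}
```

Rendering at 72 DPI keeps 24% of the detail along each side (288 / 1200) and about 5.8% of the pixels.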

Other tools upload your file to a server, extract the images there, and send them back. Beyond the obvious privacy concern — PDFs often contain confidential contracts, medical records, financial statements, internal presentations — this approach also introduces a bottleneck. Large PDFs time out, server queues back up, and the extracted files sit on someone else's infrastructure for an indeterminate period.

A third class of tools simply crashes or returns an empty result for PDFs that use certain encoding combinations, because they hard-coded support for only one filter type.

How This Extractor Works

This tool reads your PDF file entirely inside your browser using the FileReader API. Your file never leaves your device. The JavaScript engine runs three sequential scans over the raw binary data.
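A minimal TypeScript sketch of the read step, assuming the `File` comes from a file input or a drop event (a `File` is a `Blob`). It uses `Blob.arrayBuffer()`, a promise-based equivalent of `FileReader.readAsArrayBuffer()`; both read the bytes locally. The `%PDF-` header check and the name `readPdfBytes` are illustrative additions, and because some viewers tolerate junk before the header, a more lenient check might search the first kilobyte instead.

```typescript
// Read a user-chosen file into memory without sending it anywhere.
// File extends Blob, so this works for <input type="file"> and drag-and-drop.
async function readPdfBytes(file: Blob): Promise<Uint8Array> {
  const bytes = new Uint8Array(await file.arrayBuffer());
  // A well-formed PDF starts with the ASCII header "%PDF-" (e.g. "%PDF-1.7")
  const header = new TextDecoder().decode(bytes.subarray(0, 5));
  if (header !== "%PDF-") {
    throw new Error("File does not start with a %PDF- header");
  }
  return bytes;
}
```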

The first scan hunts for JPEG start-of-image (SOI) markers throughout the entire file. Every valid JPEG found, identified by its SOI marker and matched to its corresponding end-of-image (EOI) marker, is extracted as a standalone byte sequence. The tool then reads the JPEG's SOF (Start of Frame) segment to determine the actual pixel dimensions, so it can display width and height without first loading the image into a canvas. Duplicates are detected using a fast hash computed over sampled bytes, so if the same image is embedded separately on twenty pages of a document, you still get just one copy.
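Both steps can be sketched in TypeScript. `jpegSize` walks the marker segments from the start of the file until it reaches a SOF marker (0xC0 to 0xCF, excluding 0xC4, 0xC8, and 0xCC, which are other segment types), whose payload holds the precision, height, and width. `sampledHash` is a hypothetical duplicate key using FNV-1a, a swapped-in choice since the source does not name its hash, over the byte length plus at most 64 evenly spaced bytes. It is fast, but different images can collide, so a cautious tool might confirm a match with a full byte comparison.

```typescript
// Read width and height from a JPEG's Start of Frame segment by walking the
// marker segments that follow the SOI marker.
function jpegSize(jpeg: Uint8Array): { width: number; height: number } | null {
  let pos = 2; // skip SOI (FF D8)
  while (pos + 3 < jpeg.length) {
    if (jpeg[pos] !== 0xff) return null; // not at a marker: malformed data
    const marker = jpeg[pos + 1];
    if (marker === 0xff) { pos++; continue; } // fill byte before a marker
    if (marker === 0xd9 || marker === 0xda) return null; // EOI or scan data before any SOF
    if (marker === 0x01 || (marker >= 0xd0 && marker <= 0xd7)) { pos += 2; continue; } // no length field
    const isSof = marker >= 0xc0 && marker <= 0xcf && marker !== 0xc4 && marker !== 0xc8 && marker !== 0xcc;
    if (isSof) {
      if (pos + 8 >= jpeg.length) return null;
      // After the 2-byte length: precision (1 byte), height (2), width (2), big-endian
      return {
        height: (jpeg[pos + 5] << 8) | jpeg[pos + 6],
        width: (jpeg[pos + 7] << 8) | jpeg[pos + 8],
      };
    }
    const length = (jpeg[pos + 2] << 8) | jpeg[pos + 3]; // includes the length bytes
    pos += 2 + length;
  }
  return null;
}

// Hypothetical duplicate key: 32-bit FNV-1a over the byte length plus at
// most `samples` evenly spaced bytes. Different images can collide.
function sampledHash(bytes: Uint8Array, samples = 64): number {
  let h = 0x811c9dc5; // FNV offset basis
  const mix = (b: number): void => {
    h = Math.imul(h ^ b, 0x01000193) >>> 0; // FNV prime
  };
  for (let shift = 0; shift < 32; shift += 8) mix((bytes.length >>> shift) & 0xff);
  const step = Math.max(1, Math.ceil(bytes.length / samples));
  for (let i = 0; i < bytes.length; i += step) mix(bytes[i]);
  return h;
}
```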

The second scan searches for the PNG signature (the fixed 8-byte sequence 0x89 0x50 0x4E 0x47 0x0D 0x0A 0x1A 0x0A, which is always followed by the mandatory IHDR chunk). PNGs embedded in PDFs are less common than JPEGs but do appear, particularly with logos, screenshots, and illustrations created by vector-to-raster workflows. The IHDR chunk stores the width and height as 4-byte big-endian integers at fixed offsets (bytes 16 to 23, counted from the start of the signature), making dimension detection deterministic.
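A TypeScript sketch of that scan: find the signature, read the dimensions from IHDR, then walk the chunks (4-byte length, 4-byte type, data, 4-byte CRC) until IEND to find where the embedded file ends. `findPngs` and `typeIs` are hypothetical names, and the sketch does not verify chunk CRCs.

```typescript
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

interface FoundPng {
  bytes: Uint8Array; // the complete PNG file, signature through IEND
  width: number;
  height: number;
}

// Compare a 4-character ASCII chunk type at a given offset.
function typeIs(buf: Uint8Array, at: number, type: string): boolean {
  for (let k = 0; k < 4; k++) if (buf[at + k] !== type.charCodeAt(k)) return false;
  return true;
}

function findPngs(pdf: Uint8Array): FoundPng[] {
  const found: FoundPng[] = [];
  const view = new DataView(pdf.buffer, pdf.byteOffset, pdf.byteLength);
  // 33 bytes = 8-byte signature + a complete IHDR chunk (4 + 4 + 13 + 4)
  for (let i = 0; i + 33 <= pdf.length; i++) {
    if (!PNG_SIGNATURE.every((b, k) => pdf[i + k] === b)) continue;
    if (!typeIs(pdf, i + 12, "IHDR")) continue; // IHDR must be the first chunk
    const width = view.getUint32(i + 16); // DataView reads big-endian by default
    const height = view.getUint32(i + 20);
    // Walk chunks: 4-byte length, 4-byte type, data, 4-byte CRC
    let pos = i + 8;
    let end = -1;
    while (pos + 12 <= pdf.length) {
      const isEnd = typeIs(pdf, pos + 4, "IEND");
      pos += 12 + view.getUint32(pos);
      if (isEnd) { end = pos; break; }
    }
    if (end === -1 || end > pdf.length) continue; // truncated or corrupt
    found.push({ bytes: pdf.slice(i, end), width, height });
    i = end - 1; // resume scanning right after this PNG
  }
  return found;
}
```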

The third scan, which can be disabled with the "Include compressed/raw streams" option, targets FlateDecode image streams. These are identified by scanning the PDF's dictionary syntax, which is stored as readable text, for entries specifying /Subtype /Image, /Width, /Height, and /Filter /FlateDecode, followed by the /Length of the compressed data. The tool uses the browser's native DecompressionStream API to decompress the DEFLATE data in memory, then writes the resulting raw pixels to an offscreen canvas. From the canvas it exports a PNG file, which is offered to you as a download.
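A minimal TypeScript sketch of the decode step, assuming /FlateDecode with no /DecodeParms predictor, 8 bits per component, and DeviceGray or DeviceRGB samples. PDF Flate data carries a zlib header, which matches the `"deflate"` format of `DecompressionStream`. Streams that use PNG predictors (/Predictor 10 and above) would need each row un-filtered before the RGBA step. `inflate` and `toRgba` are hypothetical names.

```typescript
// Decompress a /FlateDecode stream. PDF Flate data carries a zlib header,
// which is what DecompressionStream calls the "deflate" format.
async function inflate(compressed: Uint8Array): Promise<Uint8Array> {
  // slice() gives the Blob its own copy backed by a plain ArrayBuffer
  const stream = new Blob([compressed.slice()])
    .stream()
    .pipeThrough(new DecompressionStream("deflate"));
  return new Uint8Array(await new Response(stream).arrayBuffer());
}

// Expand 8-bit gray (1 component) or RGB (3 components) samples into the
// RGBA layout that canvas ImageData expects.
function toRgba(samples: Uint8Array, width: number, height: number, components: 1 | 3): Uint8ClampedArray {
  const count = width * height;
  if (samples.length < count * components) {
    throw new Error("decoded stream is shorter than Width x Height");
  }
  const rgba = new Uint8ClampedArray(count * 4);
  for (let p = 0; p < count; p++) {
    const s = p * components;
    rgba[p * 4] = samples[s];
    rgba[p * 4 + 1] = samples[components === 3 ? s + 1 : s];
    rgba[p * 4 + 2] = samples[components === 3 ? s + 2 : s];
    rgba[p * 4 + 3] = 255; // fully opaque; any /SMask transparency is ignored here
  }
  return rgba;
}
```

In the browser, the RGBA array could then be wrapped in an `ImageData`, drawn onto an `OffscreenCanvas` with `putImageData`, and exported with `convertToBlob({ type: "image/png" })`.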

Interpreting the Results

Each extracted image is shown with its pixel dimensions and file size. The JPEG badge indicates a losslessly extracted JPEG — the bytes are identical to what the author embedded. The PNG badge marks a directly embedded PNG. The "RAW" badge means the image was reconstructed from a compressed pixel stream and exported as PNG; quality is preserved but the file format has changed from the original stored format.

If a PDF shows no extractable images, this typically means the document uses vector graphics (described in PDF's own drawing language, not pixel data), consists entirely of text, or uses an encoding such as CCITTFaxDecode or JBIG2 that this tool does not currently reconstruct. A scanned document that looks like a photo in a PDF viewer might actually store its content as CCITTFaxDecode, essentially a fax compression format, rather than as JPEG or PNG.

Getting the Best Results

For PDFs made from design tools like InDesign, Illustrator, or professional photography workflows, expect primarily JPEG extractions at the same resolution the designer embedded. For PDFs exported from Microsoft Office or Google Docs, expect a mix of JPEG and PNG depending on the original asset types used in the document. For scanned PDFs run through OCR software, results depend on the scanning software: many modern scanners embed images as JPEG, which this tool handles cleanly.

Disabling the "Skip tiny images" filter can reveal icons, bullets, watermarks, and decorative elements that are technically images but fall below the default minimum size threshold (50x50 px). Disabling "Skip duplicate images" lets you retrieve every instance of a repeated image, which is useful if you need to verify that embedded copies are consistent across a document.
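The two options amount to a filter pass over the results. In this hypothetical TypeScript sketch, each image carries its dimensions and a duplicate key such as a sampled hash, and "tiny" is read as width or height under 50 px; the tool's exact rule is an assumption. `applyOptions` and `ExtractedImage` are illustrative names.

```typescript
interface ExtractedImage {
  width: number;
  height: number;
  key: number; // duplicate key, e.g. a sampled hash of the image bytes
}

// Hypothetical filter pass for the two options. "Tiny" is read here as
// width OR height under minSize pixels, which is an assumption.
function applyOptions<T extends ExtractedImage>(
  images: T[],
  skipTiny: boolean,
  skipDuplicates: boolean,
  minSize = 50,
): T[] {
  const seen = new Set<number>();
  return images.filter((img) => {
    if (skipTiny && (img.width < minSize || img.height < minSize)) return false;
    if (skipDuplicates) {
      if (seen.has(img.key)) return false; // keep only the first occurrence
      seen.add(img.key);
    }
    return true;
  });
}
```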

Large PDFs with hundreds of embedded photographs may take several seconds to scan. The progress bar tracks the multi-pass scan so you can see where the tool is in the process. The "Download All as ZIP" button assembles all extracted images into a ZIP archive using a pure JavaScript implementation of the ZIP file format, with correct CRC-32 checksums and no third-party library required.
