Our drivers will come and pick up your print archive, whether it's a warehouse full or a single volume.
Digital issues can be sent by Dropbox, FTP or post.
We use high-speed industrial scanners to digitise your issues, producing full-colour 300 DPI images of each page. If your content was bound in volumes, we rebuild and name each individual issue.
Our OCR cloud then takes your pages and extracts the text.
We support over 100 languages, including English, Spanish, German, French, Chinese, and Arabic.
Our systems then extract basic layout elements from each page such as columns, paragraphs, headings, and images.
Each page is hand-tagged to split out individual articles and join articles across page boundaries.
By extracting each article we can create pages that are highly targeted for strong keyword matching, SEO, and searchability.
Using all of the data generated by the previous steps, we create the text for each article, following the flow of the tagging. We also correct for line-break hyphenation and drop caps.
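As a rough illustration of the hyphenation and drop cap corrections, here is a minimal Python sketch. It is not our production code: the function names join_lines and attach_drop_cap are hypothetical, and it assumes the layout step has already delivered the lines in reading order and isolated the drop cap as its own element.

```python
def join_lines(lines):
    """Join OCR text lines into one paragraph, repairing line-break hyphenation.

    Assumption: a line ending in "-" followed by a line starting with a
    lowercase letter is one word split across the break. Real hyphenated
    compounds that happen to break at the hyphen would need a dictionary check.
    """
    text = ""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if text.endswith("-") and line[0].islower():
            # Split word: drop the hyphen and join the two halves directly.
            text = text[:-1] + line
        elif text:
            text += " " + line
        else:
            text = line
    return text


def attach_drop_cap(drop_cap, body):
    """Reattach a drop cap that the layout step isolated from its paragraph."""
    return drop_cap + body.lstrip()


paragraph = join_lines([
    "Your content is now free from its print for-",
    "matting and can be",
    "reused anywhere.",
])
opening = attach_drop_cap("T", " he archive")
```

Here paragraph reads as a single sentence with "formatting" restored, and opening becomes "The archive".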
Your content is now free from the chains of its print formatting! It can be repurposed for the web, tablets, phones, syndication, anything you can think of!
Named entities like people, places, buildings, and companies are identified.
We categorise your articles into topics such as sport, politics, human interest, and music.
An article graph is created that links topics, entities, and other attributes to create a complete map of your content.
This allows deeper control of searches and the ability to suggest highly related content.
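To give a feel for how an article graph can drive related-content suggestions, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of our actual system: the ArticleGraph class, its method names, and the scoring rule (count of shared topics and entities) are all hypothetical.

```python
from collections import Counter, defaultdict


class ArticleGraph:
    """Toy bipartite graph linking articles to their topics and entities."""

    def __init__(self):
        self.attrs_of = defaultdict(set)       # article id -> {(kind, value)}
        self.articles_with = defaultdict(set)  # (kind, value) -> {article id}

    def add(self, article_id, topics=(), entities=()):
        nodes = [("topic", t) for t in topics] + [("entity", e) for e in entities]
        for node in nodes:
            self.attrs_of[article_id].add(node)
            self.articles_with[node].add(article_id)

    def related(self, article_id, limit=5):
        """Rank other articles by how many topics/entities they share."""
        scores = Counter()
        for node in self.attrs_of.get(article_id, ()):
            for other in self.articles_with[node]:
                if other != article_id:
                    scores[other] += 1
        # Highest score first; ties broken by article id for stable output.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


graph = ArticleGraph()
graph.add("a1", topics=["sport"], entities=["Wembley", "FA"])
graph.add("a2", topics=["sport"], entities=["Wembley"])
graph.add("a3", topics=["politics"], entities=["FA"])
graph.add("a4", topics=["music"])

suggestions = graph.related("a1")
```

In this example, a2 shares a topic and an entity with a1, a3 shares one entity, and a4 shares nothing, so a2 is suggested first and a4 is not suggested at all.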