LOAR Repository

Machine learning


Datasets for machine learning algorithms


Two word2vec dictionaries have been added, built from the following corpora: 1) 65,000 Gutenberg e-books; 2) 32 million Danish newspaper pages.

Recent Submissions

  • Egense, Thomas
    About 30 million Danish newspaper pages from 1880 to 2005 that have been digitized in the mediestream.dk project. Over 98% of the pages are in Danish, but a few other languages are present in the corpus as well. This ...
  • Egense, Thomas
    Description: 55,000 e-books from Project Gutenberg (http://www.gutenberg.org/). About 35,000 books are in English, but over 50 different languages are represented. The word2vec algorithm does a good job of separating the ...
