IT Development

 

Recent Submissions

  • Egense, Thomas
    About 30 million Danish newspaper pages from 1880 to 2005 that have been digitized in the mediestream.dk project. Over 98% of the pages are in Danish, but a few other languages are present in the corpus as well. This ...
  • Egense, Thomas
    Description: 55,000 e-books from Project Gutenberg (http://www.gutenberg.org/). About 35,000 of the books are in English, but over 50 different languages are represented. The word2vec algorithm does a good job of separating the ...