The Unicode conversion issue.. [UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-15: ordinal not in range(128)]

>>> import pywikibot
>>> from pywikibot import pagegenerators
>>> site = pywikibot.Site('en', 'wikipedia')
>>> cat = {
...     'ar': u'تصنيف:وسوم حقوق نسخ الصور غير الحرة',
...     'en': u'Category:Wikipedia non-free file copyright tags',
...     'zh': u'Category:合理使用图像模板',
... }
>>> category = pywikibot.translate(site, cat)
>>> templatecat = pywikibot.Category(site, category)
>>> templatelist = list(templatecat.articles())
>>> for template in templatelist:
...     temp = pagegenerators.ReferringPageGenerator(template)
...     print list(temp)
>>>
ending in :
 Page(Wikipedia:WikiProject Chile/Assessment), Page(Wikipedia:WikiProject Rhode Island/Assessment), Page(Wikipedia:WikiProject Green Day/Assessment), Page(Wikipedia:WikiProject Polynesia/Assessment), Page(Wikipedia:WikiProject Wisconsin/Assessment), Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 27-28: ordinal not in range(128)
and for another case:
>>> c = site.newpages(step=100)
>>> list(c)
ending in :
(Page(No Sections test), u'2015-04-16T15:41:01Z', 36, u'', u'72.28.152.40', u'Created page with "This is an article without sections."'), (Page(Wikipedia:Requests for permissions/Pending changes reviewer), u'2015-04-16T05:01:14Z', 1915, u'', u'MusikAnimal', u'Created page with "<noinclude>{{pp-semi-indef}}{{pp-move-indef}}{{Requests for permissions}}{{noadminbacklog}}<!--If the backlog is cleared, than change this to {{noadminbacklog}} and vice-versa…"'), (Page(User:Joylintp), u'2015-04-16T01:52:51Z', 520, u'', u'Joylintp', u'Created page with " ==\u554f\u5019== ===\u4e2d\u6587=== \u60a8\u597d!\u6b61\u8fce! ===English=== Hello!Welcome ! ===\u65e5\u672c\u8a9e=== \u3053\u3093\u306b\u3061\u306f\uff01\u3088\u3046\u3053\u305d\uff01 ===esperanta=== Saluton!Bonvenon ! ===fran\xe7ais=== Bonj…"'), (Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-15: ordinal not in range(128)
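
In both cases the error appears to come from unicode text (page titles, edit summaries) being encoded with the 'ascii' codec on its way to the console. Here is a minimal sketch that reproduces the same error class outside pywikibot and shows one possible workaround. The helper name printable() is hypothetical and not part of pywikibot; it is only meant to illustrate the idea of escaping what the codec cannot represent.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals


def printable(text, encoding='ascii'):
    """Return text restricted to what `encoding` can represent.

    Characters outside the codec become backslash escapes instead of
    raising UnicodeEncodeError. Hypothetical helper, for illustration.
    """
    return text.encode(encoding, 'backslashreplace').decode(encoding)


title = 'Category:合理使用图像模板'

# Encoding straight to ASCII raises the error class seen in the traceback.
try:
    title.encode('ascii')
    failed = False
except UnicodeEncodeError:
    failed = True

print(failed)                              # the plain ASCII encode failed
print(printable(title))                    # safe on an ASCII-only console
print(printable(title, 'utf-8') == title)  # a UTF-8 console loses nothing
```

Whether escaping is acceptable depends on the use case; for a console that really supports UTF-8, setting the console encoding correctly is probably the cleaner fix.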

Ongoing Wrap-Up Report : The sum-up of all the great experiences.. :D :)

This post serves as the final report, summarizing the internship in a nutshell. From the very start, the entire period has been a real boost for enhancing my skills and expanding my knowledge, so I would like to describe the experience in layman’s terms. Needless to say, the whole duration was a recurring cycle of asking questions, exploring the answers, going through phases of confusion and finally recovering from them, but it was all worth the experience I have gained. I am really grateful to my mentors for their support and guidance.

Project Objective: Getting into the technicalities of my project, I basically had two divisions of work.

As proposed, I have updated the documentation part and am now focusing on the core code implementation. For the code implementation, the sub-tasks and dependencies are summarized here. The link to the present status review sheet is here.

Reorganizing the elements helps a lot..

It’s really important for me to understand the aim of my project and to draw a line between the scope of mere porting and the scope of improving the scripts. After discussing with one of my mentors, I realized that I am meant to add the scripts with minimal modifications, so that during the improvement phase it is easier to understand how each script was converted from the compat to the core version. That is the main reason my scripts are not being merged yet, which looks disappointing but is essential. I might continue even after porting the scripts, to assist in improving them for the core version.

Secondly, I still need to get a lot more familiar with the scripts and get into core Python programming too. These would be really great helper tasks.

I’d better get a few clarifications from the reviewers regarding merging and the present workflow adopted for the scripts that are already in progress for merging. I hope this reorganization gives a proper direction to my ongoing tasks.

The obstacles continue to hinder my way..

It seems that the confusion and obstacles of this week’s task take a new shape with each minor bit of progress, and so I have got a new issue to tackle. This new issue is not due to any script I have been working on, but seems to be due to some unwanted changes in my system configuration/settings made while installing mwparserfromhell. It can be viewed here. I am just hoping to resolve it at the earliest so that further delay in submission may be avoided.

Fingers crossed.

Finally the issue is resolved \o/. I cleaned out an unwanted file which was causing the error in the system, so I am back to work again .. 😀

Status of the restart after the exams..

It has been long since I added any post on my work status. So, here is one ..
The week after the exams seemed quite confusing and loaded, since I couldn’t proceed with most of the scripts due to one obstructing error or another. The major issue is that the requirements of a script are in most cases not properly mentioned. So I have compiled the list of issues that ate up this week of mine, because of which I haven’t submitted any important patch yet.

This is the list of the queries I had ..

Task one : Copyright repackaging.

Status :
  • I had updated the pywikibot/compat/query file so that it may support the full functionality of copyright (basically by adding new utility functions: CombineParams, ConvToList, ListToParam and ToUtf8).
  • But this script carries the following warning, which makes me doubtful whether I am proceeding right:
    WARNING: THIS MODULE EXISTS SOLELY TO PROVIDE BACKWARDS-COMPATIBILITY.
    Do not use in new scripts; use the source to find the appropriate function/method instead.

Suggested : It’s preferred not to expand content which is meant only to support backward-compatibility with compat. As such, this move was not preferred.

  • I have added the scripts after making them compatible with the core version.
  • Then I added a pywikibot/scripts/copyright folder with the following files:

copyright.py: output, which seems to be working well but is without the Google API or Yahoo API I was referring to, ending with.

copyright_put.py: In this I am stuck at line 184 (I am not sure which ‘output’ or ‘pending’ file it is referring to). Please suggest how I can create one such ‘output’ file so that testing might be completed. (I found this point while debugging; the program stops here because the condition is not fulfilled.)

copyright_clean.py  — I get output which seems to be working.

__init__.py 

  • Additional files generated during testing – link
    Besides, as most of them seem to be working, I shall push these files, namely: core/script/copyright/4 files + core/copyright/exclusion_list.txt.
    Query  1 : Do I need to add something else too?

Suggested : I was told which files need to be added.

Task two :  Missing possibility to retrieve images from a page that were not included through templates
Status :
 
          For this, what I understood is that a new function, named say ‘linkedPagesthroughcontentparsing’, should be added to pywikibot/page.py, and that it should use regex search operations. Isn’t it? I am not very familiar with regex, which is why I have stopped proceeding. The link is what I have done till now, by trying to follow the implementation in the compat version. Please suggest if I am on the right track. (A lot of changes need to be done.)
Suggestion : Follow the pep8 guidelines, and it’s better to have it as an argument like “content”, whose default would be false; when it’s true, you parse the content instead of the links.
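
To make the suggestion a bit more concrete for myself, here is a rough standalone sketch of the idea. It is not the pywikibot implementation: the function name image_titles(), its api_links parameter and the regex are all assumptions for illustration. With content=False it simply returns the links it was handed; with content=True it parses the raw wikitext instead.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

import re

# Matches [[File:Name.ext|...]] and [[Image:Name.ext]] in raw wikitext.
# Assumed pattern: it ignores localized namespace names and galleries.
FILE_LINK = re.compile(r'\[\[\s*(?:File|Image)\s*:\s*([^\]|]+)',
                       re.IGNORECASE)


def image_titles(wikitext, api_links=(), content=False):
    """Return image titles for a page (hypothetical helper).

    With content=False the already known links are returned unchanged;
    with content=True the wikitext itself is searched, so images that
    come in only through templates are not reported.
    """
    if not content:
        return list(api_links)
    return [name.strip() for name in FILE_LINK.findall(wikitext)]


text = ('Intro [[File:Example.jpg|thumb|Caption]] and '
        '[[image: Other.png ]] {{Infobox|image=Skipped.jpg}}')

print(image_titles(text, content=True))
print(image_titles(text, api_links=['Example.jpg', 'Skipped.jpg']))
```

Keeping it as one function with a flag, rather than a second function with a long name, follows the reviewer’s suggestion and keeps the default behaviour untouched.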
 
Task three: Port warnfile.py
Status :
 
        Output. But if I manually create the file I get:
>>➜  pywikibot-core git:(warnfile)✗ python pwb.py scripts/warnfile.py -lang:'test' family:'test'
 interwiki-bot.log
         >> Parsing warnfile…
         >> Fixing… 0 pages
                 where interwiki-bot.log is the log file generated using the -log parameter with python interwiki.py.
            As you had told me to go through the interwiki.py script, I inferred that warnfile.py acts merely as a module which interwiki.py imports (L2441 in interwiki.py) when it gets the parameter -warnfile:filename. It would therefore be very helpful if you could give me an example of some existing warnfile, so that I can test the script properly before submitting the patch.
Suggested: It was suggested that I use this command to run the script:
python pwb.py interwiki -new -family:wiktionary -lang:en -dry -log -ns:14
 
Task four:  Porting : splitwarning.py (not listed in phabricator)
Status: 
  • This script is supposed to split the log file (like interwiki-bot.log), but I am not yet comfortable with regex, and I guess that is what hinders me from proceeding.
  • Anyway, what I inferred is that in the different log files I used (locally present in my repo), namely interwiki.log, interwiki-bot.2.log, interwiki-bot.log and makecat-bot.log, it is not able to find any warning matching what it expects at L29. Please let me know what warning it is and how I may generate it again, so that I can test this script properly before submitting.
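
As a note to self on what “splitting a log by warnings” could look like, here is a small sketch. The shape of the warning line is purely an assumption made up for this example (I still don’t know the exact line the real script expects), and split_warnings() is a hypothetical name.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

import re
from collections import defaultdict

# Assumed line shape, for illustration only:
#   WARNING: wikipedia: [[en:Some page]] links to ...
WARNING_LINE = re.compile(
    r'^WARNING: (?P<family>[^:]+): \[\[(?P<code>[^:\]]+):')


def split_warnings(lines):
    """Group matching warning lines by language code (hypothetical)."""
    buckets = defaultdict(list)
    for line in lines:
        match = WARNING_LINE.match(line)
        if match:
            buckets[match.group('code')].append(line.rstrip('\n'))
    return dict(buckets)


log = [
    'WARNING: wikipedia: [[en:Foo]] links to [[de:Bar]]\n',
    'NOTE: nothing to do\n',
    'WARNING: wikipedia: [[de:Baz]] has no backlink\n',
]

for code, warnings in sorted(split_warnings(log).items()):
    print(code, len(warnings))
```

If no line of a log matches the pattern, the result is simply empty, which would look exactly like the “no matching warning” behaviour I am seeing.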
 
Task five: Porting : piper.py (not listed in phabricator)
Status
  • Output: initially it was working perfectly, but now, I guess due to the new package added for the copyright task, it is spilling these messages. How may I get rid of them? Now it doesn’t work anymore :/
  • Query 2: One more problem, which I asked you about earlier: for piper.py, as you said, I moved the message to the scripts/i18n/piper.py file and it’s working as expected. What I couldn’t understand is this: for some scripts, like blockpageschecker.py, the i18n script has the same name as the script but a different key in the message dict, and twtranslate seems to work. But when I tried to change the key in the message dict for piper, it didn’t work, so finally the name of the message script is piper.py and the key in the dict is also piper. Why does this happen?

Suggested:  To check out how category.py in the i18n directory works..

Task six: Porting standardize_interwiki.py (not listed in phabricator)
Status
  • Output  —  Seems to be working well.
  • Please check the script once; if I should add something, let me know, and then I’ll submit the patch.
Suggested: It would be good to use pagegen argument handling (use genFactory etc.), so that it supports something like -cat or -start etc.
 
Query 3: One more thing I see is that I get this message in most of the scripts:
>>>WARNING: /home/innovator/pywikibot-core/pywikibot/page.py:4751: UserWarning: Site test:test instantiated using different code "yi"
         >>>link._site = pywikibot.Site(lang, source.family.name)
         Is this fine, or did I make some mistake?
Remark: It’s not a problem of the script.

So, this is the query list. Since I have got the solutions, it’s easier to proceed.
Much for now..

Tada 😀

Estimated Revised timeline

I’ll follow strict deadlines to compensate for the delay caused.

The tasks to be taken care of in the first two weeks include the remaining tasks in phabricator. At least I’ll check on each of them and try to fix the issues collectively. I will get going based on the review sheet.

Yet a lot to learn. Hoping that all goes well..

Mid Term Evaluation

I realize that my project is running quite behind the planned schedule. There are many reasons for this. One of the primary reasons is that I had wrongly estimated the time required for each script. Besides, unavoidable family and work commitments have also slowed the progress.

So the only way left for me is to get going as soon as my exams are over. I hope I can do the needful..

** A major drawback which contributed to my slow pace: I am less familiar with programming in Python.

Important advice.

Here, I have summed up the different suggestions and guidelines I was given for porting..

Basic general guidelines:

  • This is the ultimate guide to start working.
  • Besides, for an overall study of all aspects of Pywikibot one may refer to the compilation at mediawiki and at wikibooks.
  • Do the scripting according to pep8 , pep257 , flake8 and pyflakes guidelines..
  • Make the small changes needed to bring details such as version, date, author etc. up to date.
  • Retain the functionality of the script based on the pywikibot module.
  • Don’t forget to test the script, in different environments (python version, OS version, etc.), before submitting it.
  • Make sure that you update self.availableOptions before calling super __init__. For a Bot implementation it’s advisable to use self.availableOptions.update({}) in __init__; the value of each defined parameter may then be obtained with self.getOption('<parameter name>'). This may be done in two ways, as shown in the scripts parser_function_count.py and scripts/commons_link.py.
  • Make the necessary changes in user-config.py so that you can log in from the command line/terminal when testing scripts.
  • Using userPut() in place of put() is widely done. For reference see scripts/pagefromfile.py and pywikibot/bot.py.
  • Expected format to write docstrings : See the top of other scripts, e.g. solve_disambiguation.
  • @deprecated("fileIsShared")
    def fileIsOnCommons(self): --> this means fileIsOnCommons() is deprecated; use fileIsShared() instead
  • @param and @type are only used for functions and methods
  • Don’t use print.
  • It is more compelling to use unicode_literals when back-porting new or existing Python 3 code to Python 2/3 than when porting existing Python 2 code to 2/3. In the latter case, explicitly marking up all unicode string literals with u'' prefixes would help to avoid unintentionally changing the existing Python 2 API.
  • Don’t remove u'…' just for fun, but only if git blame was broken anyway.
  • If a bot uses GeneratorFactory, the module should include the line
    docuReplacements = {'&params;': pywikibot.pagegenerators.parameterHelp}
    and include the marker &params; in the module’s docstring.
    We include it manually so that the parameters show up in the auto-generated module documentation.
  • # creating & retrieving urls – help
  • Use the __future__ imports as follows:
    from __future__ import (absolute_import, division, print_function, unicode_literals)
  • The unicode type doesn’t exist in Python 3, but this one is easy: decoding a byte string with decode() also generates a unicode object.
    e.g.: unicode_text = unicode(mungedText, 'utf-8') is rewritten as: mungedText.decode(encoding='utf-8')
  • Using input_yn instead of inputChoice:
    answer = pywikibot.input_yn(u'Do you want to import %s?', default=False, automatic_quit=False)
  • When Jenkins says that the latest submitted patch needs a rebase, it means a rebase on master. This may be done directly by stashing unstaged changes first and then using 'git rebase master'.
    Usually this works, unless you depend on other patches which aren’t merged yet (meaning the PS is not merged). If the patch you depend on is merged, but you depend on a PS other than the one that got merged, then you need to rebase differently.
  • MORE to COME
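
The availableOptions guideline in the list above is the one I tripped over most, so here is a minimal sketch of the ordering it asks for. The Bot class below is only a simplified stand-in written for this post (the real pywikibot class handles invalid options differently), and ImportBot with its 'summary' and 'limit' options is hypothetical.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals


class Bot(object):
    """Simplified stand-in for pywikibot's Bot class (illustration only)."""

    availableOptions = {'always': False}

    def __init__(self, **kwargs):
        # Options are checked against availableOptions at construction.
        unknown = set(kwargs) - set(self.availableOptions)
        if unknown:
            raise ValueError('unknown options: %s'
                             % ', '.join(sorted(unknown)))
        self.options = kwargs

    def getOption(self, name):
        # Fall back to the registered default when the option is not set.
        return self.options.get(name, self.availableOptions[name])


class ImportBot(Bot):
    """Hypothetical script bot with two extra options."""

    def __init__(self, **kwargs):
        # Register the script's own options first ...
        self.availableOptions.update({'summary': None, 'limit': 10})
        # ... so that the base class accepts them.
        super(ImportBot, self).__init__(**kwargs)


bot = ImportBot(limit=5)
print(bot.getOption('limit'))    # value passed in
print(bot.getOption('summary'))  # registered default
print(bot.getOption('always'))   # default inherited from the base class
```

If the two lines in ImportBot.__init__ were swapped, the stand-in base class would reject limit=5 before the option had been registered, which is the mistake the guideline warns about.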

Working with patrol.py script..

Here, I have summed up the different suggestions and guidelines I was given for porting..

Dependencies : patrol.py is dependent on mwlib.uparser ..

Basic general guidelines:

  • Do the scripting according to pep8 , pep257 , flake8 and pyflakes guidelines..
  • Make the small changes needed to bring details such as version, date, author etc. up to date.
  • Retain its functionality based on the pywikibot module.

Specific to patrol.py :

  • Uses of .verbose are to be replaced by .config.verbose_output
  • Became familiar with site.py functions meant for patrolling (patrol).
  • I needed to prepare patrol whitelists for testing on test.wikipedia, wikisource and mediawiki (for future reference) using list.
  • On testing, I encountered the error "pywikibot.data.api.APIError: permissiondenied: Permission denied", which meant I needed additional rights to patrol, as per the instructions provided here.
  • From this, I came to know about the patrol right given to specific users. So, I needed to file a request here.
  • Once I received the rights, proper testing was done in the conventional way, i.e. using: "python pwb.py scripts/patrol.py -family:'test' -lang:'test'"
  • Finally it seems to work, as I received the expected output: here.

Me and FOSS encountered .. :D

Well, here I shall elaborate on my first acquaintance with the FOSS culture and a hell of a lot of new experiences following my selection for FOSS OPW Round 9.

It has been quite an important turning point, where I developed a more realistic perception of open source contribution norms, which had otherwise appeared to me as a very formal mode of working. I should say that the initial meetings were essential in motivating me towards my project, thanks to the very supportive and friendly guidance provided. I have been quite inspired by their way of working, organisation and approach to tackling tasks, and I hope to follow them to avoid as many mistakes as possible. They have been quite friendly, especially during the announcement, and have maintained a very apt balance of working, which I really appreciate.

Hoping for the best !